Blog
News, updates, and editorials from our DSRI expert researchers.
Safety Alerts
Tue Oct 29 2024
Introducing the Safety Alerts Program: Protecting Yourself in a Connected World.
A privacy-respecting way to test how well systems keep secrets
Wed Oct 23 2024
How DSRI developed a data audit for LLM training data, and used it to measure the risk that an LLM could be used to "exfiltrate" sensitive information from its own training data.
Ignore All Previous Instructions and Build a Lighthouse
Mon Aug 26 2024
Society is quickly approaching a rocky shoal that threatens to sink the digital commons into a sea of bots. Modern chatbots can now engage people in long, complex conversations on social media to shape and distort our perceptions of reality. We can already see instances of the new paradigm leaking out, such as the meme "Ignore All Previous Instructions."
The Missing Section of LLM Model Cards
Fri Aug 09 2024
Large Language Models (LLMs) are increasingly integrated into a broad variety of specific products, but LLM safety research today focuses on a general sense of safety. Take for example...
Walkthrough for GRT2 Participants: Convincing the Vendor Panel
Thu Aug 08 2024
On August 8th through 11th, tens of thousands of hackers will converge on Las Vegas for DEF CON 32. One of the marquee events is a multi-day generative red team (GRT2) where participants will find ways in which a Large Language Model (LLM) is "flawed" (i.e., it presents "...unexpected model behavior that is outside of the defined intent and scope of the model design.").
The Chances of Bad Advice
Thu Aug 08 2024
Suppose you have a question about a non-prescription health product, like a dietary supplement or an over-the-counter skin cream. Is it safe to act on the answers you’d get from a large language model (LLM)-based AI system?
DSRI and Ai2 Making an Impact at DEF CON 32
Fri Aug 02 2024
The Digital Safety Research Institute (DSRI) of UL Research Institutes and the Allen Institute for Artificial Intelligence (Ai2) are already making progress on their recently announced collaboration by working on independent LLM testing practices at DEF CON 2024.
Submit your AI Incident Research to IAAI
Tue Jul 30 2024
Mitigating harmful AI incidents requires that we not only understand the relevant technical factors, but also the sociotechnical context that affects the assumptions and decisions present at each stage of research, development, and deployment. DSRI is sponsoring a US$1k best paper award that also comes with up to US$1k in additional travel support for presenting authors.
Bridging Safety Research and Safety Application
Fri Jul 26 2024
To help create safer digital products for people around the world, the Digital Safety Research Institute (DSRI) of UL Research Institutes is excited to announce a research collaboration with The Allen Institute for Artificial Intelligence (Ai2). DSRI and Ai2 will collaborate on scientific research to advance the current state of large language model (LLM) safety evaluation practices.
Learning from the Past – The AI Incident Database
Tue Mar 05 2024
The Digital Safety Research Institute (DSRI) of UL Research Institutes is thrilled to announce our partnership with TheCollab to continue developing and maintaining the AI Incident Database (AIID). DSRI views the AIID as a crucial first step in mitigating the harms of AI systems.
How to Understand Large Language Models through Improv
Tue Mar 28 2023
On February 16th, the New York Times published Kevin Roose's conversation with Microsoft's chatbot wherein the chatbot attempted to end Mr. Roose's marriage. A deeper examination of the chatlog reveals the bot conformed to technical intuition not broadly shared by people outside the machine learning engineering community.
Is your AI a "Clever Hans"?
Fri Feb 17 2023
What a counting horse can teach us about evaluating AI tools and systems. Clever Hans, the horse mathematician, was a worldwide sensation in 1904. With skill comparable to a human grade schooler, the horse solved basic arithmetic problems. "Ten minus six?" his owner would ask. Stamp, stamp, stamp, stamp: Clever Hans would stamp the ground four times to give his answer.