Bridging Safety Research and Safety Application

26 Jul 2024 ⋅ Sean McGregor ⋅ 2 min ⋅ #llm #ai

To help create safer digital products for people around the world, the Digital Safety Research Institute (DSRI) of UL Research Institutes is excited to announce a research collaboration with the Allen Institute for Artificial Intelligence (Ai2). DSRI and Ai2 will collaborate on scientific research to advance the current state of large language model (LLM) safety evaluation practices. This partnership is a giant step toward creating LLM safety practices centered on independent assessment before product releases. DSRI intends to research and perform red team assessments of current and forthcoming versions of Ai2’s Open Language Model (OLMo), covering several different types of safety testing protocols.

As a first step, DSRI and Ai2 will focus on two impactful and timely projects. First, in collaboration with DSRI, Ai2’s OLMo will be the featured model at this year’s Generative Red Team Challenge hosted by the AI Village at DEF CON 2024, which will be held in Las Vegas, NV from August 8 to August 11, 2024. Through the Challenge, DSRI and Ai2 will collaboratively address a gap between the security and machine learning communities that hinders the effective collection of, and response to, flaws in LLM products. Challenge participants will suggest amendments to Ai2’s LLM model card.

Second, DSRI and Ai2 will cooperate directly on completing a “citizen” red team of OLMo, focusing on the lived experiences and expertise of a diverse panel of human testers. These new initiatives build on collaborative research efforts that DSRI and Ai2 began eight months ago, which kicked off with a facilitated “tabletop” red team of Ai2’s plans for the OLMo project. This tabletop red team was intended to avoid many of the surprises that can emerge when red teaming is applied exclusively as a pre-release testing methodology. Check back in the months ahead for a blog post with additional details about this tabletop exercise and the lessons learned.

Ai2’s commitment to creating and sharing fully open AI resources is a perfect match for DSRI’s mission of fostering innovation around safety testing. Ai2 leads the field in transparency by making all aspects of its model development pipelines, from training data to model weights to evaluation code, available to the community for inspection and collaborative problem solving, with the goal of improving AI model safety, ethics, and understanding.

DSRI and Ai2 have additional plans to evaluate future OLMo models and datasets still in development at Ai2. Watch this space for future announcements about those projects.