We research methods and tools for
The Problem
i.e., Independent test and evaluation that could eventually certify that AI is safe for the public.
AI media generators have been implicated in
AI chatbots have been implicated in
Our Research
We are currently supporting the agile research and development (R&D) of public risk analytics, i.e., AI tools that help evaluators detect, localize, and categorize potential public harms in AI outputs.
Analytics to detect, localize, and categorize synthetic or manipulated media content and snippets to help protect the public from falsified evidence, impersonation scams, and fraud.
Analytics to detect, localize, and categorize age-inappropriate material in AI-generated content and snippets, e.g., adult language and sexually explicit or violent content.
Analytics to detect, localize, and categorize potential psychological harms in AI-generated content.
Analytics to detect, localize, and categorize cybersecurity vulnerabilities in code and software containers.
Our Unique Approach
We engage the world’s researchers in creating public risk analytics and maturing them through the research and development pipeline. This allows us to mature our independent test and evaluation (IT&E) alongside the analytics.
We conduct safety studies on AI, often manually inspecting the results and then generating datasets, i.e., Eval Generation
We host research challenges to develop new public risk analytics and measure their accuracy, i.e., AI Verification
We fund research to mature public risk analytics and measure their utility in our AI evaluations and in other applications, i.e., User Validation
We share our evaluation results with adopters of public risk analytics to facilitate adoption, including Cybersecurity Scans
Our Tools
We are developing a variety of IT&E tools to conduct our research, and we release these prototype tools as open-source.
Our DSID platform enables evaluators to continuously create sequestered test scenarios based on the latest reported public harms and red-team discoveries.
Our Dyff platform scales agile evaluations to occur each time an AI is updated and with each new public safety test scenario.
Our TryIt platform allows assessors to evaluate an AI’s usability and utility before it is adopted into users’ work platforms.
Our SaferAtDay0 tools help developers build CI/CD pipelines that scan the AI and software development processes in real time.
Our Vision of the Future
Our prototype IT&E tools could be transformed to support a marketplace model. If adopted by certification organizations, a marketplace model leveraging such tools could be used to agilely test and certify that AI meets minimal standards for public safety.
AI adopters select the most capable and safest AI for their application.
As our prototype IT&E ecosystem transitions to independent certification organizations, the marketplace concept could be used to certify AI at the pace of public harms and rapidly evolving AI.
AI providers want to prove that their AI is the most performant and safest for specific AI applications.
Our Partners
Our research on Independent Test and Evaluation requires strong partnerships. With your collaboration, we investigate how IT&E can continuously advance Public Risk Analytics to keep pace with rapidly evolving AI harms.
Review IT&E results on mature analytics and help to make them more effective and useful.
Join our research challenges for a chance to receive funding that advances your AI analytic research. Explore our open challenges here.
Use our TryIt app to test how well our analytics support your application and share your feedback.
We aim to collaborate with non-profits and governments interested in IT&E to advance their AI research initiatives.