We research methods and tools for

Independent Test and Evaluation (IT&E) that AI is Safe for the Public

AI Needs IT&E

IT&E is independent test and evaluation that could eventually certify that AI is safe for the public.

Example Public Harms from AI

AI media generators have been implicated in

  • Falsified evidence
  • Impersonation scams and fraud

AI chatbots have been implicated in

  • Suicides
  • Inappropriate conversations with children

AI Evaluators Need

  • Public Risk Analytics, i.e., AI to identify publicly harmful AI results
  • Agile IT&E platforms, i.e., automation to accelerate IT&E so that it can keep pace with public harms from rapidly evolving AI

Public Risk Analytics

We currently support the agile research and development (R&D) of public risk analytics, i.e., AI tools that help evaluators detect, localize, and categorize potential public harms in AI outputs.

Media Forensic Analytics

Analytics to detect, localize, and categorize synthetic or manipulated media content and snippets to help protect the public from falsified evidence, impersonation scams, and fraud.

Age-Screening Analytics

Analytics to detect, localize, and categorize age-inappropriate AI-generated content and snippets, e.g., adult language and sexually explicit or violent content.

Psychological Harm Analytics

Analytics to detect, localize, and categorize potential psychological harms in AI-generated content.

Cyber Vulnerability Analytics

Analytics to detect, localize, and categorize cybersecurity vulnerabilities in code and software containers.

The Public Risk Analytic Pipeline

We engage the world’s researchers in creating public risk analytics and maturing them through the research and development pipeline. This lets our IT&E mature alongside the analytics.

Incubation

We conduct safety studies on AI, often manually inspecting the results and then generating datasets, i.e., Eval Generation.

Research

We host research challenges to develop new public risk analytics and measure their accuracy, i.e., AI Verification.

Engineering

We fund research to mature public risk analytics and measure their utility in our AI evaluations and in other applications, i.e., User Validation.

Adoption

We share our evaluation results with adopters of public risk analytics to facilitate adoption, including Cybersecurity Scans.

Our Research Tools

We are developing a variety of IT&E tools to conduct our research, and we release these prototype tools as open source.

Eval Generation

Our DSID platform enables evaluators to continuously create sequestered test scenarios based on the latest reported public harms and red-team discoveries.

AI Verification

Our Dyff platform scales agile evaluations so that they can run each time an AI is updated and each time a new public safety test scenario is added.

User Validation

Our TryIt platform allows assessors to evaluate an AI’s usability and utility before it is adopted into users’ work platforms.

Cyber Auditing

Our SaferAtDay0 tools help developers build CI/CD pipelines that scan AI and software development processes in real time.

Certified AI Marketplaces

Our prototype IT&E tools could be adapted to support a marketplace model. If adopted by certification organizations, a marketplace model leveraging such tools could enable agile testing and certification that AI meets minimum standards for public safety.

AI Adopters

AI adopters select the most capable and safest AI for their application.

Future AI Marketplaces

As our prototype IT&E ecosystem transitions to independent certification organizations, the marketplace concept could be used to certify AI at the pace of public harms and rapidly evolving AI.

AI Providers

AI providers want to prove that their AI is the most performant and safest for specific AI applications.

Join Us

Our research on Independent Test and Evaluation requires strong partnerships. With your collaboration, we investigate how IT&E can continuously advance Public Risk Analytics to keep pace with rapidly evolving AI harms.

Contact Us

AI Adopters

Review IT&E results on mature analytics and help make them more effective and useful.

Researchers

Join our research challenges for a chance to receive funding that advances your AI analytic research. Explore our open challenges here.

Analysts

Use our TryIt app to test how well our analytics support your application and share your feedback.

Sponsors

We aim to collaborate with non-profits and governments interested in IT&E to advance their AI research initiatives.