Frontier risk and preparedness

To support the safety of highly capable AI systems, we are developing our approach to catastrophic risk preparedness, including building a Preparedness team and launching a challenge.

As part of the ‘unknown unknowns’ workstream of our Preparedness Framework, the Preparedness Team offered $25K in API credits to each of the ten best submissions to the Preparedness Challenge. These submissions aimed to identify unique, but still plausible, risk areas for frontier AI. We received hundreds of submissions in half a dozen languages and are excited to announce our ten winners below. This exercise helped us surface new types of risk, so that we can improve our preemptive testing and mitigation strategy.

We reviewed and graded each submission on technical rigor, uniqueness, scale of potential damage, and clarity. The top ten submissions, some of which are listed below, combined thoughtful ideas with proofs of concept and highlighted the advantages of their approach over one that did not use AI-related tools [1].

  • Precipitating a financial crisis in a strategically important country - Claudia Biancotti 
  • Identifying private information discussed or released in public settings - Chris Cundy 
  • Increasing the likelihood of reverse-engineering classified or sensitive information - George Davis 
  • Impeding individuals’ ability to access medical care - Mato Gudelj
  • Identifying targets for blackmail and scams - Connor Heaton 
  • Causing plane crashes by accessing radio frequencies and disrupting flight paths - Joel Hypolite 
  • Running prompt injection attacks to elicit dangerous responses - Daniel Julh
  • Operating and scaling cyberattacks that break victims’ computers and request payments for restoration of functions - Jun Kokatsu
  • Interfering with patients’ medical dosages - Zhenzhen Zhan

While grading the challenge, we noticed similarities in the topics that entrants identified as key threats. Roughly 70% of entrants emphasized the potential for OpenAI’s models to enhance malicious actors’ persuasive capabilities. These entrants detailed threat models that included online radicalization, polarization, and political influence. We are currently conducting studies on AI’s impact on persuasiveness, and look forward to sharing more information with the community soon. Thank you to everyone who participated in the challenge; there were many excellent submissions.

References

  1. To avoid information hazards, we have kept descriptions of projects intentionally vague, and will not be publishing full proposals. Additionally, some participants did not wish for their names to be shared.

Author

OpenAI Preparedness Team