Improving verifiability in AI development

We’ve contributed to a multi-stakeholder report by 58 co-authors at 30 organizations, including the Centre for the Future of Intelligence, Mila, Schwartz Reisman Institute for Technology and Society, 

While a growing number of organizations have articulated ethics principles to guide their AI development process, it can be difficult for those outside of an organization to verify whether the organization’s AI systems reflect those principles in practice. This ambiguity makes it harder for stakeholders such as users, policymakers, and civil society to scrutinize AI developers’ claims about properties of AI systems and could fuel competitive corner-cutting, increasing social risks and harms. The report describes existing and potential mechanisms that can help stakeholders grapple with questions like:

  • Can I (as a user) verify the claims made about the level of privacy protection guaranteed by a new AI system I’d like to use for machine translation of sensitive documents?
  • Can I (as a regulator) trace the steps that led to an accident caused by an autonomous vehicle? Against what standards should an autonomous vehicle company’s safety claims be compared?
  • Can I (as an academic) conduct impartial research on the risks associated with large-scale AI systems when I lack the computing resources of industry?
  • Can I (as an AI developer) verify that my competitors in a given area of AI development will follow best practices rather than cut corners to gain an advantage?

The 10 mechanisms highlighted in the report are listed below, along with recommendations aimed at advancing each one. (See the report for discussion of how these mechanisms support verifiable claims as well as relevant caveats about our findings.)

Report authors, in descending order of contribution

Gillian Hadfield (OpenAI, University of Toronto, Schwartz Reisman Institute for Technology and Society)
Heidy Khlaaf (Adelard)
Jingying Yang (Partnership on AI)
Helen Toner (Center for Security and Emerging Technology)
Ruth Fong (University of Oxford)
Tegan Maharaj (Mila, Montreal Polytechnic)
Pang Wei Koh (Stanford University)
Sara Hooker (Google Brain)
Jade Leung (Future of Humanity Institute)
Andrew Trask (University of Oxford)
Emma Bluemke (University of Oxford)
Jonathan Lebensold (Mila, McGill University)
Cullen O’Keefe (OpenAI)
Mark Koren (Stanford Centre for AI Safety)
Théo Ryffel (École Normale Supérieure [Paris])
JB Rubinovitz (Remedy.AI)
Tamay Besiroglu (University of Cambridge)
Federica Carugati (Center for Advanced Study in the Behavioral Sciences)
Jack Clark (OpenAI)
Peter Eckersley (Partnership on AI)
Sarah de Haas (Google Research)
Maritza Johnson (Google Research)
Ben Laurie (Google Research)
Alex Ingerman (Google Research)
Igor Krawczuk (École Polytechnique Fédérale de Lausanne)
Amanda Askell (OpenAI)
Rosario Cammarota (Intel)
Andrew Lohn (RAND Corporation)
David Krueger (Mila, Montreal Polytechnic)
Charlotte Stix (Eindhoven University of Technology)
Peter Henderson (Stanford University)
Logan Graham (University of Oxford)
Carina Prunkl (Future of Humanity Institute)
Bianca Martin (OpenAI)
Elizabeth Seger (University of Cambridge)
Noa Zilberman (University of Oxford)
Seán Ó hÉigeartaigh (Leverhulme Centre for the Future of Intelligence, Centre for the Study of Existential Risk)
Frens Kroeger (Coventry University)
Girish Sastry (OpenAI)
Rebecca Kagan (Center for Security and Emerging Technology)
Adrian Weller (University of Cambridge, Alan Turing Institute)
Brian Tse (Future of Humanity Institute, Partnership on AI)
Elizabeth Barnes (OpenAI)
Allan Dafoe (Future of Humanity Institute)
Paul Scharre (Center for a New American Security)
Ariel Herbert-Voss (OpenAI)
Martijn Rasser (Center for a New American Security)
Shagun Sodhani (Mila, University of Montreal)
Carrick Flynn (Center for Security and Emerging Technology)
Thomas Gilbert (University of California, Berkeley)
Lisa Dyer (Partnership on AI)
Saif Khan (Center for Security and Emerging Technology)
Yoshua Bengio (Mila, University of Montreal)
Markus Anderljung (Future of Humanity Institute)