Forecasting potential misuses of language models for disinformation campaigns and how to reduce risk

OpenAI researchers collaborated with Georgetown University’s Center for Security and Emerging Technology and the Stanford Internet Observatory to investigate how large language models might be misused

As generative language models improve, they open up new possibilities in fields as diverse as healthcare, law, education and science. But, as with any new technology, it is worth considering how they can be misused. Against the backdrop of recurring online influence operations—covert or deceptive efforts to influence the opinions of a target audience—the paper asks:

How might language models change influence operations, and what steps can be taken to mitigate this threat?

Our work brought together different backgrounds and expertise—researchers with grounding in the tactics, techniques, and procedures of online disinformation campaigns, as well as machine learning experts in the generative artificial intelligence field—to base our analysis on trends in both domains.

We believe that it is critical to analyze the threat of AI-enabled influence operations and outline steps that can be taken before language models are used for influence operations at scale. We hope our research will inform policymakers that are new to the AI or disinformation fields, and spur in-depth research into potential mitigation strategies for AI developers, policymakers, and disinformation researchers.

To chart a path forward, the report lays out key stages in the language model-to-influence operation pipeline. Each of these stages is a point for potential mitigations.To successfully wage an influence operation leveraging a language model, propagandists would require that: (1) a model exists, (2) they can reliably access it, (3) they can disseminate content from the model, and (4) an end user is affected. Many possible mitigation strategies fall along these four steps, as shown below.