New and improved content moderation tooling

To help developers protect their applications against possible misuse, we are introducing the faster and more accurate Moderation endpoint. This endpoint provides OpenAI API developers with free access to GPT-based classifiers that detect undesired content, an instance of using AI systems to assist with human supervision of these systems. We have also released both a technical paper describing our methodology and the dataset used for evaluation.

When given a text input, the Moderation endpoint assesses whether the content is sexual, hateful, violent, or promotes self-harm, all of which are prohibited by our content policy. The endpoint has been trained to be quick and accurate, and to perform robustly across a range of applications. Importantly, this reduces the chances of products “saying” the wrong thing, even when deployed to users at scale. As a consequence, AI can unlock benefits in sensitive settings, like education, where it could not otherwise be used with confidence.
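To make the request flow concrete, here is a minimal Python sketch of submitting text for moderation and reading back the flagged categories. The URL, the `input` request field, and the `results` / `categories` response fields are assumptions taken from the public API reference rather than from this post, and the helper names (`build_moderation_request`, `flagged_categories`, `moderate`) are hypothetical. Treat it as an illustration, not a definitive client.

```python
import json
import urllib.request
from typing import List

# Assumed endpoint URL (from the public API reference, not stated in this post).
MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the text to classify."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def flagged_categories(response: dict) -> List[str]:
    """Return the names of categories marked true in the first result.

    Assumes a response shaped like
    {"results": [{"flagged": bool, "categories": {name: bool, ...}}]}.
    """
    result = response["results"][0]
    return sorted(name for name, hit in result["categories"].items() if hit)


def moderate(text: str, api_key: str) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    request = build_moderation_request(text, api_key)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

An application might call `moderate(user_text, api_key)` before displaying model output, and withhold or review the text whenever `flagged_categories(...)` returns a non-empty list.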

Authors

Todor Markov, Chong Zhang, Sandhini Agarwal, Tyna Eloundou, Teddy Lee, Steven Adler, Angela Jiang, Lilian Weng