Model Distillation in the API

Fine-tune a cost-efficient model with the outputs of a large frontier model–all on the OpenAI platform

We’re introducing a new Model Distillation offering to provide developers with an integrated workflow to manage the entire distillation pipeline directly within the OpenAI platform. This lets developers easily use the outputs of frontier models like o1‑preview and GPT‑4o to fine-tune and improve the performance of more cost-efficient models like GPT‑4o mini.

Model distillation involves fine-tuning smaller, cost-efficient models using outputs from more capable models, allowing them to match the performance of advanced models on specific tasks at a much lower cost. Until now, distillation has been a multi-step, error-prone process, which required developers to manually orchestrate multiple operations across disconnected tools, from generating datasets to fine-tuning models and measuring performance improvements. Since distillation is inherently iterative, developers needed to repeatedly run each step, adding significant effort and complexity.

Our new Model Distillation suite includes:

  • Stored Completions⁠(opens in a new window): Developers can now easily generate datasets for distillation by automatically capturing and storing the input-output pairs generated by one of our models, like GPT‑4o or o1‑preview through our API. With Stored Completions, you can easily build datasets with your production data to evaluate and fine-tune models. Developers can review this integration guide⁠(opens in a new window) to learn how to opt-in to storing completions.
  • Evals⁠(opens in a new window) (beta): Developers can now create and run custom evaluations on our platform to measure model performance on specific tasks. Instead of manually creating evaluation scripts and integrating disparate logging tools, Evals provides an integrated way to measure model performance. You can either use data from Stored Completions or upload existing datasets to set up your evaluations. Evals can also be used independently of fine-tuning to quantitatively evaluate model performance for your use cases.
  • Fine-tuning⁠(opens in a new window): Stored Completions and Evals are fully integrated with our existing fine-tuning offering. This means that developers can use datasets created with Stored Completions in their fine-tuning jobs and run evaluations on fine-tuned models using Evals, all within our platform.