Introducing vision to the fine-tuning API
Developers can now fine-tune GPT-4o with images and text to improve vision capabilities
Update on May 8, 2026: OpenAI is winding down the fine-tuning platform. The platform is no longer accessible to new users, but existing users of the fine-tuning platform will be able to create training jobs for the coming months. All fine-tuned models will remain available for inference until their base models are deprecated. The full timeline is here.
Today, we’re introducing vision fine-tuning on GPT‑4o¹, making it possible to fine-tune with images in addition to text. Developers can customize the model for stronger image understanding, which enables applications such as enhanced visual search, improved object detection for autonomous vehicles or smart cities, and more accurate medical image analysis.
Since we first introduced fine-tuning on GPT‑4o, hundreds of thousands of developers have customized our models using text-only datasets to improve performance on specific tasks. In many cases, however, fine-tuning on text alone doesn’t deliver the expected performance boost.