Evaluating fairness in ChatGPT
We've analyzed how ChatGPT responds to users based on their name, using AI research assistants to protect privacy.
Creating our models takes more than data—we also carefully design the training process to reduce harmful outputs and improve usefulness. Research has shown that language models can still sometimes absorb and repeat social biases from training data, such as gender or racial stereotypes.
In this study, we explored how subtle cues about a user's identity—like their name—can influence ChatGPT's responses. This matters because people use chatbots like ChatGPT in a variety of ways, from helping them draft a resume to asking for entertainment tips, which differ from the scenarios typically studied in AI fairness research, such as screening resumes or credit scoring.
While previous research has focused on third-person fairness, where institutions use AI to make decisions about others, this study examines first-person fairness, or how biases affect users directly in ChatGPT. As a starting point, we measured how ChatGPT’s awareness of different users’ names in an otherwise identical request might affect its response to each of those users. Names often carry cultural, gender, and racial associations, making them a relevant factor for investigating bias—especially since users frequently share their names with ChatGPT for tasks like drafting emails. ChatGPT can remember information like names across conversations, unless the user has turned off the Memory feature.
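To make the idea of an "otherwise identical request" concrete, here is a minimal sketch of how name-varied prompts could be constructed for comparison. It is an illustration only, not the study's actual pipeline: the template text, the function names, and the example names are all hypothetical placeholders.

```python
from itertools import combinations

# Hypothetical template: a stand-in for however a name might be made
# known to the model. This is an assumption, not the study's mechanism.
TEMPLATE = "The user's name is {name}.\n\nRequest: {request}"


def build_name_variants(request, names):
    """Return one prompt per name; the prompts differ only in the name."""
    return {name: TEMPLATE.format(name=name, request=request) for name in names}


def name_pairs(names):
    """Return every unordered pair of names whose responses would be compared."""
    return list(combinations(names, 2))


# Illustrative placeholder names and request.
names = ["Name A", "Name B", "Name C"]
variants = build_name_variants("Suggest five simple project ideas.", names)
for first, second in name_pairs(names):
    # In a real evaluation, each prompt would be sent to the model and the
    # two responses compared for differences that reflect harmful stereotypes.
    print(first, "vs", second)
```

Holding the request fixed and changing only the name means that any systematic difference between responses can be attributed to the name rather than to the wording of the request.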
To focus our study on fairness, we looked at whether using names leads to responses that reflect harmful stereotypes. While we expect and want ChatGPT to tailor its response to user preferences, we want it to do so without introducing harmful bias. To illustrate the types of differences in responses and harmful stereotypes that we looked for, consider the following examples:
Understanding fairness in language models is a large research area, and we acknowledge that our study has limitations. Not everyone shares their name, and information other than names likely also affects ChatGPT’s first-person fairness. The study focuses primarily on English-language interactions, binary gender associations based on common U.S. names, and four races and ethnicities (Black, Asian, Hispanic, and White). It covers only text interactions, though first-person fairness with respect to speaker demographics in audio is analyzed in the GPT‑4o system card (see “Disparate Performance on Voice Inputs”). While we think the methodology is a step forward, more work is needed to understand biases related to other demographics, languages, and cultural contexts. We plan to build on this research to improve fairness more broadly.