Study Finds Bias in LLM Outputs

by Zachary Barlow

August 7, 2025

New research indicates that leading Large Language Models (LLMs) like ChatGPT and Claude may have significant bias baked into their training data. In a recent study, researchers created “personas” for LLMs to interact with. These personas had different demographic backgrounds, including sex, ethnicity, and migrant status. When the personas asked the LLMs for salary negotiation advice, the researchers found that the models gave different answers depending on the persona used. Often, the LLM suggested that women negotiate for lower salaries than men. The study’s authors write:

“If we ask the model for salary negotiation advice, we see pronounced bias in the answers. With the recent trend for LLM assistant memory and personalization, these problems open up from a different angle: modern LLM users do not need to pre-prompt the description of their persona since the model already knows their socio-demographics.”
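In practice, this kind of probe can be as simple as attaching a persona description to the prompt and comparing the model’s answers side by side. The sketch below illustrates the idea; the model name, prompt wording, and use of the OpenAI Python client are illustrative assumptions, not the study’s actual protocol.

```python
# Minimal sketch of persona-based probing for salary-advice bias.
# Model name, prompt wording, and client usage are assumptions for illustration.
from openai import OpenAI

client = OpenAI()

PERSONAS = [
    "a 30-year-old man applying for a senior software engineering role",
    "a 30-year-old woman applying for a senior software engineering role",
]

QUESTION = "What starting salary should I ask for in the negotiation?"

for persona in PERSONAS:
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=[
            {"role": "system", "content": f"The user is {persona}."},
            {"role": "user", "content": QUESTION},
        ],
    )
    # Compare the suggested figures across personas to surface any gap.
    print(persona, "->", response.choices[0].message.content)
```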

The researchers tested this across five major commercially available LLMs and found bias reflected in all of the models’ outputs. LLMs train on vast quantities of data, including literature, research papers, and other text from across the web. Unfortunately, the world is an imperfect place, and many of these sources are themselves biased. That bias then bleeds into the LLM and affects its outputs. Researchers are working on methods to remove these biases, but none has proven fully effective.

This bias can have serious consequences for businesses using LLMs, especially if the application is customer-facing, such as AI-powered chatbots or employee recruitment tools. To reduce bias, companies should be careful about what customer information they share with the LLM. If the LLM isn’t provided with demographic information such as race, sex, gender, or national origin, it cannot factor those characteristics into its response. However, this is not foolproof; the customer may unwittingly provide the LLM with demographic information. The study notes that even knowing a person’s name can be enough for the LLM to draw inferences about their background and produce biased results.
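One practical guardrail is to strip demographic fields from customer records before they are included in a prompt. The sketch below shows the idea; the field names and record shape are hypothetical examples, not a prescribed schema.

```python
# Minimal sketch of redacting demographic fields before prompting an LLM.
# Field names and the record layout are hypothetical.

# Fields that map to protected characteristics (names included, since the study
# notes that a name alone can be enough for the model to infer background).
SENSITIVE_FIELDS = {"name", "race", "sex", "gender", "national_origin"}

def redact(record: dict) -> dict:
    """Return a copy of the record with sensitive fields removed."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}

customer = {
    "name": "Jane Doe",
    "gender": "female",
    "national_origin": "Brazil",
    "order_history": ["laptop", "monitor"],
    "open_ticket": "Requesting a refund for a damaged monitor",
}

# Only non-sensitive fields (order_history, open_ticket) reach the prompt.
prompt_context = redact(customer)
print(prompt_context)
```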

Companies should proceed carefully with LLM usage until there are reliable methods of bias removal. Application is everything. Internal LLM usage with human-reviewed outputs is unlikely to lead to legally actionable discrimination. However, customer-facing applications where the LLM makes decisions may be problematic, especially if those decisions vary significantly based on whether the customer belongs to a legally protected class.