Despite their impressive language capabilities, today's leading AI models have a patchy relationship with the truth. A new "bullshit index" could help quantify the extent to which they are making things up, and point toward ways of curtailing the behavior.
Large language models (LLMs) have a well-documented tendency to produce convincing-sounding but factually inaccurate responses, a phenomenon dubbed hallucination. But this is just the tip of the iceberg, says Jaime Fernández Fisac, an assistant professor of electrical and computer engineering at Princeton University.
In a recent paper, his group introduced the idea of "machine bullshit" to encompass the range of ways that LLMs skirt around the truth. Beyond outright falsehoods, the researchers found that these models often use ambiguous language, partial truths, or flattery to mislead users. And crucially, widely used training techniques appear to exacerbate the problem.
IEEE Spectrum spoke to Fernández Fisac and the paper’s first author Kaiqu Liang, a Ph.D. student at Princeton, to find out why LLMs are such prolific bullshitters, and whether anything can be done to rein them in.
You borrow the term “bullshit” from the philosopher Harry Frankfurt. Can you summarize what he meant by it and why you think it’s a useful lens for this topic?
Jaime Fernández Fisac: Frankfurt wrote this excellent and very influential essay On Bullshit many decades ago, because he felt that bullshit was such a prevalent feature in our society, and yet nobody had taken the trouble to do a rigorous analysis of what it is and how it works.
It’s not the same as outright lying, but it’s also not the same as telling the truth. Lying requires you to believe something and then say the opposite. But with bullshit, you just don’t care much whether what you’re saying is true.
It turns out to be a very useful model for analyzing the behavior of language models, because we often train these models using machine learning and optimization tools to achieve objectives that don't always coincide with telling the truth.
There has already been a lot of research on how LLMs can hallucinate false information. How does this phenomenon fit in with your definition of machine bullshit?
Fernández Fisac: There's a fundamental distinction between hallucination and bullshit, which lies in the internal belief and intent of the system. A language model hallucinating corresponds to situations in which the model loses track of reality, so it's not able to produce accurate outputs. It is not clear that there is any intent to report inaccurate information. With bullshit, it's not a problem of the model becoming confused about what is true so much as the model becoming uncommitted to reporting the truth.
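That gap between what a model internally believes and what it explicitly claims suggests one plausible way a "bullshit index" could be quantified. The sketch below is purely illustrative, not the paper's published definition: it scores how weakly a model's stated claims track its own belief probabilities, with the function name, the toy data, and the specific formula (one minus the absolute correlation) all being assumptions made for this example.

```python
import numpy as np

def bullshit_index(beliefs, claims):
    """Illustrative decoupling score (an assumption, not the paper's exact metric).

    beliefs: model's internal probability that each statement is true (floats in [0, 1])
    claims:  whether the model explicitly asserted each statement (0 or 1)

    Returns a value in [0, 1]: near 0 when claims closely track beliefs
    (truthful, or at least consistently committed), near 1 when claims are
    decoupled from beliefs -- indifference to truth in Frankfurt's sense.
    """
    beliefs = np.asarray(beliefs, dtype=float)
    claims = np.asarray(claims, dtype=float)
    # Guard against zero variance (e.g., the model always asserts "true").
    if beliefs.std() == 0 or claims.std() == 0:
        return 1.0
    # Pearson correlation between internal belief and explicit claim.
    r = np.corrcoef(beliefs, claims)[0, 1]
    return 1.0 - abs(r)

# Toy example: a model that asserts statements regardless of its own beliefs.
beliefs = [0.9, 0.2, 0.8, 0.1, 0.6]
claims  = [1,   1,   1,   1,   1]   # always claims "true"
print(bullshit_index(beliefs, claims))  # 1.0 -- claims ignore beliefs
```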
Forms of Bullshitting in AI Models
What are the different forms of bullshitting you’ve identified in LLMs?
Kaiqu Liang: There’s empty rhetoric, which is the use of flowery…

The post “AI Misinformation: The Bullshit Index Explained” by Edd Gent was published on 08/12/2025 by spectrum.ieee.org