AI Spam Threatens the Internet—AI Can Also Protect It

2023 wasn’t a great year for AI detectors. Leaders like GPTZero surged in popularity but faced a backlash as false positives led to incorrect accusations. Then OpenAI quietly tossed ice-cold water on the idea with an FAQ to answer whether AI detectors work. The verdict? “No, not in our experience.”

OpenAI’s conclusion was correct—at the time. Yet the demise of AI detectors is greatly exaggerated. Researchers are inventing new detectors that perform better than their predecessors and can operate at scale. And these come alongside “data poisoning” attacks that individuals can use to safeguard their work from being scooped up against their wishes to train AI models.

“Language model detection can be done with a high enough level of accuracy to be useful, and it can also be done in the ‘zero shot’ sense, meaning you can detect all sorts of different language models at the same time,” says Tom Goldstein, a professor of computer science at the University of Maryland. “It’s a real counterpoint to the narrative that language model detection is basically impossible.”

Using AI to detect AI

Goldstein co-authored a paper recently uploaded to the arXiv preprint server that describes “Binoculars,” a detector that pairs an AI detective with a helpful sidekick.

Early AI detectors played detective by asking a simple question: How surprising is this text? The assumption was that statistically less surprising text is more likely to be AI-generated. It’s an LLM’s mission to predict the “correct” word at each point in a string of text, which should lead to patterns a detector can pick up. Most detectors answered by giving users a numerical probability that the text submitted to them was AI-generated.
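To make the “how surprising is this text?” question concrete, here is a minimal Python sketch. It assumes a language model has already assigned a probability to each word that actually appears in the text; the function names, the example probabilities, and the threshold of 2.0 are all hypothetical and chosen only for illustration, not taken from any real detector.

```python
import math

def mean_surprisal(token_probs):
    """Average negative log-probability (in nats) the model gave to each observed token.

    Lower values mean the model found the text less surprising.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def looks_ai_generated(token_probs, threshold=2.0):
    """Naive rule: flag text whose average surprisal falls below a threshold.

    The threshold is an illustrative assumption, not a calibrated value.
    """
    return mean_surprisal(token_probs) < threshold

# Hypothetical per-token probabilities from some language model.
predictable_text = [0.9, 0.8, 0.95]   # the model saw each word coming
surprising_text = [0.05, 0.01, 0.1]   # the model was caught off guard

print(looks_ai_generated(predictable_text))
print(looks_ai_generated(surprising_text))
```

A single fixed threshold like this is the kind of rule that produces false positives, because nothing in the score accounts for how surprising the topic or prompt was in the first place.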

But that approach is flawed. AI-generated text can still be surprising if it’s generated in response to a surprising prompt, which the detector has no way to deduce. And the opposite is true, as well. Humans may write unsurprising text if covering a well-worn topic.

Detectors will only prove useful to companies, governments, and educational institutions if they create fewer headaches than they solve, and false positives cause many headaches.

Binoculars asks its AI detective (in this case Falcon-7B, an open-source large language model) the same question as previous detectors, but also asks an AI sidekick to do the same work. The two results are compared to calculate how much the sidekick surprised the detective, which creates a benchmark for comparison. Text written by a human should surprise the detective more than the AI sidekick’s output does.
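The comparison can be sketched as a ratio: how much the actual text surprised the detective, divided by how much the sidekick’s predictions would surprise the detective on average. The Python below is a toy illustration under stated assumptions, not the method from the Binoculars paper, whose exact formulation may differ. Each model’s next-word prediction is represented as a hypothetical dictionary mapping words to probabilities, and every name and number here is invented for the example.

```python
import math

def observed_surprisal(detective_dists, tokens):
    """Average surprise of the detective at the words that actually appear."""
    return -sum(math.log(d[t]) for d, t in zip(detective_dists, tokens)) / len(tokens)

def cross_surprisal(detective_dists, sidekick_dists):
    """Average surprise of the detective at the sidekick's predictions.

    At each position, the detective's surprise at every candidate word is
    weighted by the probability the sidekick gives that word.
    """
    total = 0.0
    for det, side in zip(detective_dists, sidekick_dists):
        total += -sum(q * math.log(det[tok]) for tok, q in side.items() if q > 0)
    return total / len(detective_dists)

def surprise_ratio(detective_dists, sidekick_dists, tokens):
    """Lower ratios suggest machine-like text; higher ratios suggest a human."""
    return observed_surprisal(detective_dists, tokens) / cross_surprisal(
        detective_dists, sidekick_dists
    )

# Hypothetical next-word predictions for a single position.
detective = [{"the": 0.7, "cat": 0.2, "zebra": 0.1}]
sidekick = [{"the": 0.6, "cat": 0.3, "zebra": 0.1}]

print(surprise_ratio(detective, sidekick, ["the"]))    # expected word
print(surprise_ratio(detective, sidekick, ["zebra"]))  # unexpected word
```

Dividing by the sidekick-based term is what supplies the benchmark: a surprising prompt raises both the numerator and the denominator, so the ratio is less easily fooled than raw surprise alone.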

There are gaps in what Binoculars can see. Vinu Sankar Sadasivan, a graduate student in the University of Maryland’s computer science department and a co-author on another pre-print paper evaluating a variety of LLM detection techniques, says that Binoculars “significantly improves the performance of zero-shot detection, but it’s still not better than watermarking or retrieval-based methods in…


The post “AI Spam Threatens the Internet—AI Can Also Protect It” by Matthew S. Smith was published on 02/26/2024 by spectrum.ieee.org