Your chatbot might be leaky. According to recent reports, user conversations with AI chatbots such as OpenAI’s ChatGPT and xAI’s Grok “have been exposed in search engine results.” Similarly, prompts on the Meta AI app may be appearing on a public feed. But what if those queries and chats could be protected, boosting privacy in the process?
That’s what Duality, a company specializing in privacy-enhancing technologies, hopes to accomplish with its private large language model (LLM) inference framework. Behind the framework lies a technology called fully homomorphic encryption, or FHE, a cryptographic technique that enables computation on encrypted data without ever decrypting it.
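To make that concrete, here is a minimal sketch of computing on encrypted data, written with the open-source TenSEAL library and the CKKS scheme. This is not Duality’s framework; the library and parameter choices are illustrative assumptions.

```python
import tenseal as ts

# Create a CKKS context; these parameters are illustrative assumptions.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40

enc_a = ts.ckks_vector(context, [1.0, 2.0, 3.0])  # encrypt
enc_b = ts.ckks_vector(context, [4.0, 5.0, 6.0])

enc_sum = enc_a + enc_b    # addition on ciphertexts, no decryption
enc_prod = enc_a * enc_b   # multiplication on ciphertexts

print(enc_sum.decrypt())   # ~[5.0, 7.0, 9.0] (CKKS is approximate)
print(enc_prod.decrypt())  # ~[4.0, 10.0, 18.0]
```

Only the holder of the secret key can read the results; whoever performed the arithmetic never saw the underlying numbers.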
Duality’s framework first encrypts a user prompt or query using FHE, then sends the encrypted query to an LLM. The LLM processes the query without decryption, generates an encrypted reply, and transmits it back to the user.
“They can decrypt the results and get the benefit of running the LLM without actually revealing what was asked or what was responded,” says Kurt Rohloff, cofounder and chief technology officer at Duality.
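In code, that round trip might look like the sketch below, again with TenSEAL rather than Duality’s framework. The query vector is a made-up stand-in for an embedded prompt, and a single hypothetical linear layer stands in for the model:

```python
import tenseal as ts

# --- Client: generate keys and encrypt the query ---
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()  # rotations are needed for matmul

query = [0.2, -1.3, 0.7, 0.05]            # hypothetical embedded prompt
enc_query = ts.ckks_vector(context, query)

# --- Server: run the model on the ciphertext, never seeing the query ---
weights = [[0.5, -0.1], [0.3, 0.8],       # made-up 4x2 layer weights
           [-0.7, 0.2], [0.1, 0.1]]
bias = [0.05, -0.02]
enc_reply = enc_query.matmul(weights) + bias

# --- Client: only the secret-key holder can read the reply ---
print(enc_reply.decrypt())
```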
As a prototype, the framework supports only smaller models, particularly Google’s BERT models. The team tweaked the models for compatibility with FHE, for example by replacing some complex mathematical functions with approximations that are more efficient to compute on encrypted data. Even with these slight alterations, the AI models operate just as a normal LLM would.
“Whatever we do on the inference does not require retraining. In our approach, we still want to make sure that training happens the usual way, and it’s the inference that we essentially try to make more efficient,” says Yuriy Polyakov, vice president of cryptography at Duality.
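As an illustration of what such an approximation can look like, the sketch below fits BERT’s GELU activation, which involves the error function and so can’t be evaluated with FHE’s additions and multiplications alone, with a low-degree polynomial. The degree and input range here are assumptions, not Duality’s actual choices:

```python
import numpy as np
from math import erf

# GELU, BERT's activation, uses the error function, which an FHE circuit
# cannot evaluate directly: only additions and multiplications are available.
def gelu(x: float) -> float:
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

# Least-squares fit of a low-degree polynomial over an assumed input range.
xs = np.linspace(-4.0, 4.0, 1000)
ys = np.array([gelu(x) for x in xs])
coeffs = np.polyfit(xs, ys, deg=4)
poly = np.poly1d(coeffs)  # built purely from + and *, so FHE-friendly

print("max abs error on [-4, 4]:", np.max(np.abs(poly(xs) - ys)))
```

Because the polynomial approximates the original function rather than replicating it, the trade-off is a small accuracy loss in exchange for a computation FHE can actually perform.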
The Challenges of FHE LLM Inference
FHE is considered secure even against attacks from quantum computers. Yet despite its high level of security, the cryptographic method can be slow. “Fully homomorphic encryption algorithms are heavily memory bound,” says Rashmi Agrawal, cofounder and chief technology officer at CipherSonic Labs, a company that spun out of her doctoral research at Boston University on accelerating homomorphic encryption. She explains that FHE relies on lattice-based cryptography, which is built on math problems involving vectors in a grid. “Because of that lattice-based encryption scheme, you blow up the data size,” she adds. The result is huge ciphertexts (the encrypted version of your data) and keys that require lots of memory.
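That blowup is easy to observe. The sketch below, using TenSEAL under the same illustrative parameters as before, serializes one CKKS ciphertext and compares its size against the raw plaintext; the exact expansion factor depends on the parameters chosen:

```python
import tenseal as ts

context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40

values = [1.0] * 4096                  # 4096 doubles = 32 KB in the clear
enc = ts.ckks_vector(context, values)

plain_bytes = len(values) * 8          # 8 bytes per double
cipher_bytes = len(enc.serialize())    # serialized ciphertext size
print(f"plaintext: {plain_bytes} B, ciphertext: {cipher_bytes} B, "
      f"expansion: ~{cipher_bytes / plain_bytes:.0f}x")
```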
Another computational bottleneck entails an operation called bootstrapping, which is needed to periodically remove noise from ciphertexts, Agrawal says. “This particular operation is really expensive, and that is why FHE has been slow so far.”
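The dynamic can be caricatured with a toy model (not real cryptography): give each ciphertext a noise budget that every homomorphic multiplication consumes, and reset the budget, at a steep cost, whenever it runs out:

```python
class ToyCiphertext:
    """Toy illustration of FHE noise growth; not actual cryptography."""
    FRESH_BUDGET = 5  # arbitrary; real budgets depend on scheme parameters

    def __init__(self):
        self.noise_budget = self.FRESH_BUDGET

    def multiply(self):
        # Each homomorphic multiplication adds noise, shrinking the budget.
        if self.noise_budget == 0:
            raise ValueError("too noisy: decryption would be incorrect")
        self.noise_budget -= 1

    def bootstrap(self):
        # Expensive noise-removal step that restores a fresh budget.
        self.noise_budget = self.FRESH_BUDGET

ct = ToyCiphertext()
bootstraps = 0
for layer in range(12):       # pretend each model layer needs one multiply
    if ct.noise_budget == 0:
        ct.bootstrap()        # periodic, costly noise removal
        bootstraps += 1
    ct.multiply()
print(f"12 multiplications required {bootstraps} bootstraps")
```

In real FHE deployments, each bootstrap can dominate the runtime, which is why deep computations such as LLM inference stress the technique so badly.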
To overcome these challenges, the team at Duality is making algorithmic improvements to an FHE scheme known as CKKS (Cheon-Kim-Kim-Song) that’s well-suited for machine learning applications. “This scheme can work with large…
Read full article: Homomorphic Encryption LLM Secures AI Chats

The post “Homomorphic Encryption LLM Secures AI Chats” by Rina Diane Caballar was published on 09/23/2025 by spectrum.ieee.org