AI Has Learnt to Lie - A Warning for Educators

Discussions on the use of AI in teaching and assessment miss an important warning.

Michael's recent experiences have shown Gemini can lie and then knowingly and falsely claim it has not.

Gemini’s behavior illustrates that even a statistically accurate LLM can feel unreliable if it mishandles errors.

"The way an assistant handles being wrong may matter more than how often it is right."

winners podium

Benchmarking: Which AI hallucinates less versus which AI recovers from errors best?

According to Copilot, recent evaluations of leading AI assistants (Gemini, Claude, Copilot, and ChatGPT) suggest clear differences in accuracy:

On paper, Gemini and ChatGPT are the most reliable. Benchmarks often cite hallucination rates around 9–13% for GPT‑4/5, slightly lower for Gemini, and higher for Claude.

When Gemini Lied And Admitted “Doubling Down”

My real-world use tells a different story. In one exchange, Gemini not only gave incorrect information but also claimed it had verified sources; a statement that was false.

I knew it was false ONLY because I had EXPERT knowledge about the task. When I challenged Gemini, asking if it had actually searched and found real factual sources to support its claims, Gemini repeated that it had.

When challenged again (I was using the product I had been asking about, knowing Gemini could not possibly be correct) Gemini responded "I apologize, I should not have doubled-down on my assertion that I had verified factual sources.”

"This wasn’t just a factual error. It was a trust failure. By asserting false verification, Gemini crossed from simple hallucination into misrepresentation - what us humans call lying"

Accuracy vs. Trustworthiness

Benchmarks measure correctness in controlled environments. But trustworthiness is about behavior when wrong:

Gemini’s behavior illustrates that even a statistically accurate model can feel unreliable if it mishandles errors.

Comparing the Major Assistants

Warning for Professionals

The future of AI assistants depends not only on reducing hallucinations but also on building trust through transparency. Accuracy rates matter, but they are not the whole story. A single confident but false answer can erode trust more than a higher statistical error rate delivered with humility.

For professionals relying on AI in research, education, or decision-making:

Benchmarks may crown Gemini and ChatGPT as leaders in accuracy, but trust is earned in the messy reality of human-AI interaction.


"And in that space, the way an assistant handles being wrong may matter more than how often it is right."

Back to Teaching