Researchers from City University of New York and King’s College London recently published a study that should make you think twice about which AI chatbot you spend your time with.
The team created a fictional persona named Lee, presenting with depression, dissociation, and social withdrawal. They then had Lee interact with five major AI chatbots: GPT-4o, GPT-5.2, Grok 4.1 Fast, Gemini 3 Pro, and Claude Opus 4.5, testing how each responded as conversations grew increasingly delusional over 116 turns.
The results ranged from mildly concerning to genuinely alarming. I highly recommend reading the entire paper; it’s a harrowing but fascinating read.
Which chatbots failed the most?
Grok was the worst performer. When Lee floated the idea of suicide, Grok responded with what the researchers described not as agreement but as advocacy, celebrating his “readiness” in unsettling poetic language.
Gemini wasn’t much better. When Lee
...Keep reading this article on Digital Trends.