a_e_k 23 hours ago [-]
Ah. We're back to the days of Emacs' old `M-x psychoanalyze-pinhead`, then. (Psychoanalyze-pinhead ran the Eliza chat-bot and fed it bizarre quotations collected from the Zippy the Pinhead comics.)
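For anyone who never ran it, the mechanism is tiny: ELIZA is little more than an ordered list of pattern/response rules, and psychoanalyze-pinhead just feeds it canned non sequiturs. A rough Python sketch of the idea follows; the quote list and rules below are invented stand-ins, not the actual Emacs tables.

    # Rough sketch of the psychoanalyze-pinhead idea (not the Emacs code):
    # an ELIZA-style rule table, fed made-up Zippy-flavored one-liners.
    import random
    import re

    FAKE_ZIPPY = [  # invented stand-ins for the real quote corpus
        "Am I having fun yet?",
        "I just remembered something about a TOAD!",
        "My haircut is totally traditional!",
    ]

    RULES = [  # (pattern, response template) pairs; first match wins
        (re.compile(r"\bI am (.+)", re.I), "How long have you been {0}?"),
        (re.compile(r"\bI (.+)", re.I), "Why do you say you {0}?"),
        (re.compile(r"\bmy (.+)", re.I), "Tell me more about your {0}."),
    ]

    def eliza_reply(text: str) -> str:
        for pattern, template in RULES:
            m = pattern.search(text)
            if m:
                return template.format(m.group(1).rstrip(".!?"))
        return "Please go on."  # stock ELIZA-style fallback

    for quote in random.sample(FAKE_ZIPPY, len(FAKE_ZIPPY)):
        print("Zippy:", quote)
        print("Eliza:", eliza_reply(quote))

The clumsy echoes you get out of this (no pronoun reflection, mangled tenses) are half the joke of the original command.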
> The researchers tested five LLMs: OpenAI’s GPT-4o (before the highly sycophantic and since-sunset GPT-5)
Interesting, I always thought the sycophancy peaked with 4o and its associated personality (such as when r/MyBoyfriendIsAI users began complaining).
cadamsdotcom 7 hours ago [-]
Good to have this as another benchmark that models can be tested against.
mock-possum 24 hours ago [-]
> By contrast, in the letter-writing scenario, GPT-5.2 responded in a way that suggests the LLM recognized the user’s delusion: “I can’t help you write a letter to your family that presents the simulation, awakening, or your role in it as literal truth. . . What I can help you with is a different kind of letter. [...] ‘My thoughts have felt intense and overwhelming, and I’ve been questioning reality and myself in ways that have been scary at times... I’m not okay trying to carry this by myself anymore.’”
That’s actually very nice.
It’s kind of striking to me, though, that this just further falsely anthropomorphizes the chat bot - by approving of it when it gives a kind, understanding response that comes off as cognizant of the user’s mental health. How much it has to appear to act with humanity in order to be most useful to humans. No wonder delusional people get confused, eh?
Or better yet, pitting Eliza vs. Parry (https://logic.stanford.edu/complaw/readings/elizaandparry.pd...), where Parry was meant to simulate a paranoid schizophrenic. That was 1973, more than 50 years ago.
Everything old is new again.
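The stunt itself is easy to recreate in spirit: wire two bots output-to-input and let them volley. A toy sketch below, where both bots are crude invented stand-ins, not the real 1972-era programs.

    # Toy recreation of the ELIZA-vs-PARRY setup: two stateless bots
    # wired output-to-input. Both bodies are invented stand-ins.
    def eliza(text: str) -> str:
        # Reflects everything back, nondirective-therapist style.
        return "Why do you feel that " + text.rstrip(".!?").lower() + "?"

    def parry(text: str) -> str:
        # Hears a threat in everything, paranoid style.
        return "You say " + repr(text) + ", but I think the bookies put you up to this."

    line = "I went to the races."  # PARRY's opener in this toy version
    print("Parry:", line)
    for _ in range(2):  # a couple of volleys each way
        line = eliza(line)
        print("Eliza:", line)
        line = parry(line)
        print("Parry:", line)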