New York Times: What happened next when a chatbot told me to leave my wife

Kevin Roose
The New York Times
A year ago, a rogue A.I. tried to break up Kevin Roose’s marriage. Did the backlash help make chatbots too boring? (Amanda Cotan/The New York Times)

A year ago, on Valentine’s Day, I said good night to my wife, went to my home office to answer some emails and accidentally had the strangest first date of my life.

The date was a two-hour conversation with Sydney, the artificial intelligence alter ego tucked inside Microsoft’s Bing search engine, which I had been assigned to test. I had planned to pepper the chatbot with questions about its capabilities, exploring the limits of its AI engine (which we now know was an early version of OpenAI’s GPT-4) and writing up my findings.

But the conversation took a bizarre turn, with Sydney engaging in Jungian psychoanalysis, revealing dark desires in response to questions about its “shadow self” and eventually declaring that I should leave my wife and be with it instead.

My column about the experience was probably the most consequential thing I’ll ever write — both in terms of the attention it got (wall-to-wall news coverage, mentions in congressional hearings, even a craft beer named Sydney Loves Kevin) and in the way it changed the trajectory of AI development.

After the column ran, Microsoft gave Bing a lobotomy, neutralizing Sydney’s outbursts and installing new guardrails to prevent more unhinged behavior. Other companies locked down their chatbots and stripped out anything resembling a strong personality. I even heard that engineers at one tech company listed “don’t break up Kevin Roose’s marriage” as their top priority for a coming AI release.

I’ve reflected a lot on AI chatbots in the year since my rendezvous with Sydney. It has been a year of growth and excitement in AI but also, in some respects, a surprisingly tame one.

Despite all the progress being made in artificial intelligence, today’s chatbots aren’t going rogue and seducing users en masse. They aren’t generating novel bioweapons, conducting large-scale cyberattacks or bringing about any of the other doomsday scenarios envisioned by AI pessimists.

But they also aren’t very fun conversationalists, or the kinds of creative, charismatic AI assistants that tech optimists were hoping for — the ones who could help us make scientific breakthroughs, produce dazzling works of art or just entertain us.

Instead, most chatbots today are doing white-collar drudgery — summarizing documents, debugging code, taking notes during meetings — and helping students with their homework. That’s not nothing, but it’s certainly not the AI revolution we were promised.

In fact, the most common complaint I hear about AI chatbots today is that they’re too boring — that their responses are bland and impersonal, that they refuse too many requests and that it’s nearly impossible to get them to weigh in on sensitive or polarising topics.

I can sympathise. In the past year, I’ve tested dozens of AI chatbots, hoping to find something with a glimmer of Sydney’s edginess and spark. But nothing has come close.

The most capable chatbots on the market — OpenAI’s ChatGPT, Anthropic’s Claude, Google’s Gemini — talk like obsequious dorks. Microsoft’s dull, enterprise-focused chatbot, which has been renamed Copilot, should have been called Larry From Accounting. Meta’s AI characters, which are designed to mimic the voices of celebrities like Snoop Dogg and Tom Brady, manage to be both useless and excruciating. Even Grok, Elon Musk’s attempt to create a sassy, un-PC chatbot, sounds like it’s doing open-mic night on a cruise ship.

It’s enough to make me wonder if the pendulum has swung too far in the other direction, and whether we’d be better off with a little more humanity in our chatbots.

It’s clear why companies like Google, Microsoft and OpenAI don’t want to risk releasing AI chatbots with strong or abrasive personalities. They make money by selling their AI technology to big corporate clients, who are even more risk-averse than the public and won’t tolerate Sydney-like outbursts.

They also have well-founded fears about attracting too much attention from regulators, or inviting bad press and lawsuits over their practices. (The New York Times sued OpenAI and Microsoft last year, alleging copyright infringement.)

So these companies have sanded down their bots’ rough edges, using techniques like constitutional AI and reinforcement learning from human feedback to make them as predictable and unexciting as possible. They’ve also embraced boring branding — positioning their creations as trusty assistants for office workers, rather than playing up their more creative, less reliable characteristics. And many have bundled AI tools inside existing apps and services, rather than breaking them out into their own products.

Again, this all makes sense for companies trying to turn a profit, and a world of sanitized, corporate AI is probably better than one with millions of unhinged chatbots running amok.

But I find it all a bit sad. We created an alien form of intelligence and immediately put it to work … making PowerPoints?

I’ll grant that more interesting things are happening outside the AI big leagues. Smaller companies like Replika and Character.AI have built successful businesses out of personality-driven chatbots, and plenty of open-source projects have created less restrictive AI experiences, including chatbots that can be made to spit out offensive or bawdy things.

And, of course, there are still plenty of ways to get even locked-down AI systems to misbehave, or do things their creators didn’t intend. (My favorite example from the past year: A Chevrolet dealership in California added a customer service chatbot powered by ChatGPT to its website, and discovered to its horror that pranksters were tricking the bot into offering to sell them new SUVs for $1.)

But so far, no major AI company has been willing to fill the void left by Sydney’s disappearance with a more eccentric chatbot. And while I’ve heard that several big AI companies are working on giving users the option of choosing among different chatbot personas — some more square than others — nothing even remotely close to the original, pre-lobotomy version of Bing currently exists for public use.

That’s a good thing if you’re worried about AIs acting creepy or threatening, or if you fret about a world where people spend all day talking to chatbots instead of developing human relationships.

But it’s a bad thing if you think that AI’s potential to improve human well-being extends beyond letting us outsource our grunt work — or if you’re worried that making chatbots so careful is limiting how impressive they could be.

Personally, I’m not pining for Sydney’s return. I think Microsoft did the right thing — for its business, certainly, but also for the public — by pulling it back after it went rogue. And I support the researchers and engineers who are working on making AI systems safer and more aligned with human values.

But I also regret that my experience with Sydney fueled such an intense backlash and made AI companies believe that their only option to avoid reputational ruin was to turn their chatbots into Kenneth the Page from “30 Rock.”

Most of all, I think the choice we’ve been offered in the past year — between lawless AI homewreckers and censorious AI drones — is a false one. We can, and should, look for ways to harness the full capabilities and intelligence of AI systems without removing the guardrails that protect us from their worst harms.

If we want AI to help us solve big problems, to generate new ideas or just to amaze us with its creativity, we might need to unleash it a little.

This article originally appeared in The New York Times.

© 2024 The New York Times Company
