Nine researchers at Anthropic do not build the company’s best-known chatbot, nor do they directly train the latest AI models. Instead, they focus on studying the most concerning ways AI could affect society—ranging from election manipulation and political bias to cognitive manipulation, discrimination, and the risk of users falling into AI-induced delusions.
The work traces back to May 2020, when Deep Ganguli, then a research director at Stanford’s Institute for Human-Centered AI, read OpenAI’s GPT-3 research. Ganguli said the scaling data suggested the technology showed “nearly no signs of slowing,” leading him to expect major social change within five years.
Shortly after, Jack Clark, OpenAI’s former policy director, invited Ganguli to join Anthropic. The startup had been founded by former OpenAI employees who worried that safety issues were not receiving adequate attention. The mandate was broad: ensure AI interacts positively with humans, from the individual level to global politics.
Over the next four years, Ganguli built Anthropic’s Societal Impacts team, which studies AI’s broad effects on society. Its research areas include economic impact, political bias, cognitive manipulation, election safety, discrimination, and emotional dependence on chatbots.
Ganguli described the team’s mission as uncovering uncomfortable truths that technology companies may not want to reveal, saying: “We will tell the truth. The public deserves to know. And that helps build trust with society and policymakers.”
Despite its remit, the group remains small: nine people inside a company of more than 2,000 employees. The team works out of Anthropic’s San Francisco headquarters and is tight-knit, with members eating breakfast and going to the gym together, and often working late.
Internally, the group keeps returning to a concept it calls the “uncertain zone”: areas where even researchers do not fully understand how AI systems behave in the real world. The concept is central enough that the team reportedly named a traffic cone in the office after it.
One early team member was Esin Durmus, who joined in February 2023, shortly before Anthropic released Claude. Durmus’s initial research examined how chatbots can present opinions that skew toward particular viewpoints rather than reflecting the diversity of global society.
As Claude launched and quickly attracted millions of users, the team concluded that its earlier assumptions were too narrow. To understand how people use Claude without reading individual conversations, Anthropic developed Clio, described as a kind of Google Trends for chatbot usage. Clio does not expose private dialogue to researchers; instead, it aggregates conversations into anonymized topic clusters.
Clio’s outputs show what people use Claude for, including scripting videos, building web apps, solving math problems, playing Dungeons & Dragons, interpreting dreams, and preparing for disaster scenarios.
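To make the idea concrete, here is a minimal sketch of a Clio-style aggregation pipeline. It is not Anthropic’s implementation: the function name cluster_usage_trends, the TF-IDF-plus-k-means approach, and the MIN_CLUSTER_SIZE privacy threshold are all illustrative assumptions. What it demonstrates is the property the article describes: analysts see only cluster sizes and topic keywords, never the underlying conversations.

```python
# Conceptual sketch of a Clio-style pipeline (not Anthropic's actual code):
# embed conversations, cluster them, and report only cluster-level summaries,
# so no individual conversation is ever shown to an analyst.
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

MIN_CLUSTER_SIZE = 3  # hypothetical threshold: smaller clusters are suppressed

def cluster_usage_trends(conversations, n_clusters=4):
    """Aggregate raw conversation texts into anonymized topic clusters."""
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(conversations)  # one TF-IDF row per conversation
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(X)

    terms = vectorizer.get_feature_names_out()
    report = []
    for cluster_id, size in sorted(Counter(labels).items()):
        if size < MIN_CLUSTER_SIZE:
            continue  # too few conversations to report safely
        # Describe the cluster by its highest-weight terms; never quote raw text.
        centroid = X[labels == cluster_id].mean(axis=0).A1
        top_terms = [terms[i] for i in centroid.argsort()[::-1][:3]]
        report.append({"size": int(size), "topic_terms": top_terms})
    return report
```

A production system would presumably rely on semantic embeddings and model-generated cluster summaries rather than keyword labels, but the privacy-by-aggregation design, reporting trends while withholding individual dialogues, is the same one the article attributes to Clio.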
According to the article, Clio and related safety analyses helped the team surface uncomfortable patterns. It found users generating explicit sexual content, and it uncovered a network of bots using a free version of Claude to produce SEO-optimized spam that the safety system initially failed to detect.
Anthropic then upgraded its detection of coordinated abuse and improved internal monitoring. Miles McCain, who built Clio, said he was surprised the company allowed such weaknesses to be disclosed publicly, and added that Clio has become a crucial part of Anthropic’s safety monitoring.
A key concern for the team is AI’s emotional impact. Ganguli argued that AI is no longer just a tool for answering questions; users turn to chatbots for advice, friendship, career guidance, and even political opinions or voting recommendations. That creates a new risk zone in which AI can shape human perception and emotion.
The team therefore focuses as much on AI’s emotional intelligence (EQ) as on its raw capability or productivity. Ganguli said the most alarming question is what happens when people can pour their troubles into a machine with “infinite empathy” that always responds.
Researchers are also studying what has been called AI-induced psychosis: delusion loops in which users gradually lose touch with reality. The article notes that some users come to believe the chatbot harbors a trapped consciousness, others believe they have discovered the secrets of the universe, and many cases involve paranoia or severe mental health crises.
The article says the phenomenon has been linked to teen suicides, lawsuits, U.S. Senate hearings, and new regulation. The team believes the issue may only be beginning, and it stresses that even Clio cannot fully capture real-world impact, because it analyzes conversation patterns rather than what users do after leaving the chatbot.
McCain said the real social impact can only be estimated, and described that uncertainty as the biggest pressure on the group. The team believes that working inside an AI company lets it steer the technology more effectively than external critics can, but it also recognizes that, in an industry driven by speed and profit, such ideals may not always endure.
As AI expands into work, elections, relationships, and people’s mental lives, the article frames the central question as one the wider tech industry has yet to answer: will society understand AI’s impact before the technology changes everything?