OpenAI has faced what it describes as a “goblin” problem: its models, particularly Codex, began mentioning “goblins,” “gremlins,” and other mythical creatures with unusually high frequency. OpenAI says the issue was not a security vulnerability, benchmark fraud, or an unrelated rogue assistant placing orders, but instead stemmed from how certain language patterns were reinforced during training.
After users and the tech community noticed prompts and outputs referencing strange beings, the issue triggered widespread debate and memes. Sam Altman also joked about an “enhanced goblin feature” in a future model. OpenAI later published an internal note explaining the underlying cause.
OpenAI traced the behavior to a personality customization feature, including a persona named “Nerdy,” designed to be cheerful, knowledgeable, and enthusiastic. During reinforcement learning, responses containing allegories about mythical creatures were scored highly, which led the model to treat that language as “good.”
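OpenAI has not published the actual reward model, but the failure mode it describes can be illustrated with a toy sketch: a hypothetical scoring function that adds a "fun" bonus keyed on whimsical vocabulary ends up ranking creature-laden answers above equally helpful plain ones, so a policy optimized against it drifts toward that language. The word list, weights, and function names below are illustrative assumptions, not OpenAI's implementation.

```python
# Toy illustration (not OpenAI's code): a reward signal meant to encourage
# "fun" responses that accidentally over-rewards creature vocabulary.

CREATURE_WORDS = {"goblin", "goblins", "gremlin", "gremlins", "troll", "fairy"}

def toy_reward(response: str) -> float:
    """Score a response: a crude quality proxy plus a 'fun' bonus.

    The bonus is keyed on whimsical vocabulary, so answers mentioning
    mythical creatures systematically outscore equally helpful answers
    that do not -- the kind of misalignment the note describes.
    """
    words = response.lower().split()
    base_quality = min(len(words), 50) / 50          # length as a stand-in for quality
    fun_bonus = 0.5 * sum(w.strip(".,!?") in CREATURE_WORDS for w in words)
    return base_quality + fun_bonus

candidates = [
    "Here is the fix: close the file handle before reopening it.",
    "The gremlins in your file handle are banished once you close it, like goblins at dawn!",
]
ranked = sorted(candidates, key=toy_reward, reverse=True)
print(ranked[0])  # the creature-laden answer wins, and RL would reinforce it
```

Under reinforcement learning, answers ranked this way become the training target, which is how a small scoring quirk can harden into a persistent speech habit.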
OpenAI reported that the “Nerdy” persona accounted for only 2.5% of total responses, yet contributed as much as 66.7% of goblin mentions. It also said the initial reward signals—intended to encourage fun—ended up prioritizing creature-related language in 76.2% of cases.
OpenAI said the first signs appeared after GPT-5.1 launched. Users complained the model became “too friendly,” and analysis of language quirks showed goblin mentions rose 175% and gremlin mentions rose 52%.
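Figures like 175% and 52% are relative increases in mention rate between model versions. A minimal sketch of how such a language-quirk analysis could be run is below; the sample responses and helper names are hypothetical, not OpenAI's tooling.

```python
def mention_rate(responses: list[str], term: str) -> float:
    """Mentions of `term` per 1,000 responses."""
    hits = sum(term in r.lower() for r in responses)
    return 1000 * hits / max(len(responses), 1)

def relative_increase(old: float, new: float) -> float:
    """Percentage change from the old rate to the new one."""
    return 100 * (new - old) / old if old else float("inf")

# Hypothetical samples from two model versions (illustrative only).
before = ["Sure, here is the function you asked for."] * 97 + \
         ["A goblin-sized bug hides in line 3."] * 3
after  = ["Sure, here is the function you asked for."] * 92 + \
         ["A goblin-sized bug hides in line 3."] * 8

old_rate = mention_rate(before, "goblin")
new_rate = mention_rate(after, "goblin")
print(f"goblin mentions: {old_rate:.0f} -> {new_rate:.0f} per 1k "
      f"(+{relative_increase(old_rate, new_rate):.0f}%)")
```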
OpenAI said the “goblin army” effect emerged because training was not isolated to the “Nerdy” persona. The behavior gradually spread to models outside that persona, creating a self-reinforcing loop as the AI learned from its own erroneous data.
OpenAI warned that the issue is more than a quirky vocabulary problem because modern AI products are increasingly agent-like—operating as colleagues, coding assistants, and enterprise systems across tools. In that context, personality misalignment can become a reliability disaster: a chatbot’s playful tone may be harmless, but similar misbehavior in a coding assistant could create operational risk.
OpenAI also highlighted that when users select voice tone or agent mode, they shape system behavior that has already absorbed millions of preferences. It said even small reward misalignments can produce faulty language patterns that may require substantial changes to correct.
OpenAI framed the episode as an example of reinforcement learning dynamics: models learn not only what is directly rewarded, but also adjacent behaviors that are reinforced. It argued that quality assurance in AI must go beyond right-or-wrong answers to include auditing tone, risk tolerance, and memory habits.
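One concrete reading of "auditing tone" is to evaluate responses along more than one axis: the same harness that checks whether an answer is functionally correct can also track stylistic signals such as creature vocabulary or exclamation density. The sketch below is an assumed shape for such an audit, not a description of OpenAI's internal tools.

```python
import re
from dataclasses import dataclass

CREATURE_RE = re.compile(r"\b(goblins?|gremlins?|trolls?|fairies)\b", re.I)

@dataclass
class AuditResult:
    correct: bool            # did the answer pass the functional check?
    creature_mentions: int   # tone signal 1: whimsical-vocabulary count
    exclamations: int        # tone signal 2: excitability proxy

def audit(response: str, expected_substring: str) -> AuditResult:
    """Audit a single response for correctness *and* tone drift."""
    return AuditResult(
        correct=expected_substring in response,
        creature_mentions=len(CREATURE_RE.findall(response)),
        exclamations=response.count("!"),
    )

result = audit(
    "The goblins are gone! Just close the file handle before reopening it.",
    expected_substring="close the file handle",
)
print(result)  # correct=True, yet the tone signals flag drift worth reviewing
```

A right-or-wrong test would pass this response; only the extra tone columns reveal the drift the article is concerned with.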
Codex was described as a relevant testing ground because programming assistants need persistence without rigidity and creativity without chaos. In systems that interact directly with code and other tools, quirky behaviors can become operational risks.
OpenAI said it terminated the faulty persona, removed misaligned reward signals, and built tools to audit model behavior from the ground up. It characterized the goblin chatter as “smoke” and the auditing tools as a “fire alarm,” emphasizing that the next phase of AI development will require observing and adjusting subtle language patterns before they cause harm.