OpenAI has faced what it describes as a “goblin” problem: its models, particularly Codex, began mentioning “goblins,” “gremlins,” and other mythical creatures with unusually high frequency. OpenAI says the issue was not a security vulnerability, benchmark fraud, or an unrelated rogue assistant placing orders, but instead stemmed from how certain language patterns were reinforced during training.
After users and the tech community noticed prompts and outputs referencing strange beings, the issue triggered widespread debate and memes. Sam Altman also joked about an “enhanced goblin feature” in a future model. OpenAI later published an internal note explaining the underlying cause.
OpenAI traced the behavior to a personality customization feature, including a persona named “Nerdy,” designed to be cheerful, knowledgeable, and enthusiastic. During reinforcement learning, responses containing allegories about mythical creatures were scored highly, which led the model to treat that language as “good.”
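OpenAI has not published the actual reward model, but the failure mode it describes can be illustrated with a toy sketch: a hypothetical scoring function that adds a "fun" bonus keyed on whimsical vocabulary ends up ranking creature-laden answers above equally helpful plain ones, so a policy optimized against it drifts toward that language. The word list, weights, and function names below are illustrative assumptions, not OpenAI's implementation.

```python
# Toy illustration (not OpenAI's code): a reward signal meant to encourage
# "fun" responses that accidentally over-rewards creature vocabulary.

CREATURE_WORDS = {"goblin", "goblins", "gremlin", "gremlins", "troll", "fairy"}

def toy_reward(response: str) -> float:
    """Score a response: a crude quality proxy plus a 'fun' bonus.

    The bonus is keyed on whimsical vocabulary, so answers mentioning
    mythical creatures systematically outscore equally helpful answers
    that do not -- the kind of misalignment the note describes.
    """
    words = response.lower().split()
    base_quality = min(len(words), 50) / 50          # length as a stand-in for quality
    fun_bonus = 0.5 * sum(w.strip(".,!?") in CREATURE_WORDS for w in words)
    return base_quality + fun_bonus

candidates = [
    "Here is the fix: close the file handle before reopening it.",
    "The gremlins in your file handle are banished once you close it, like goblins at dawn!",
]
ranked = sorted(candidates, key=toy_reward, reverse=True)
print(ranked[0])  # the creature-laden answer wins, and RL would reinforce it
```

Under reinforcement learning, answers ranked this way become the training target, which is how a small scoring quirk can harden into a persistent speech habit.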
OpenAI reported that the “Nerdy” persona accounted for only 2.5% of total responses, yet contributed as much as 66.7% of goblin mentions. It also said the initial reward signals—intended to encourage fun—ended up prioritizing creature-related language in 76.2% of cases.
OpenAI said the first signs appeared after GPT-5.1 launched. Users complained the model became “too friendly,” and analysis of language quirks showed goblin mentions rose 175% and gremlin mentions rose 52%.
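Figures like 175% and 52% are relative increases in mention rate between model versions. A minimal sketch of how such a language-quirk analysis could be run is below; the sample responses and helper names are hypothetical, not OpenAI's tooling.

```python
def mention_rate(responses: list[str], term: str) -> float:
    """Mentions of `term` per 1,000 responses."""
    hits = sum(term in r.lower() for r in responses)
    return 1000 * hits / max(len(responses), 1)

def relative_increase(old: float, new: float) -> float:
    """Percentage change from the old rate to the new one."""
    return 100 * (new - old) / old if old else float("inf")

# Hypothetical samples from two model versions (illustrative only).
before = ["Sure, here is the function you asked for."] * 97 + \
         ["A goblin-sized bug hides in line 3."] * 3
after  = ["Sure, here is the function you asked for."] * 92 + \
         ["A goblin-sized bug hides in line 3."] * 8

old_rate = mention_rate(before, "goblin")
new_rate = mention_rate(after, "goblin")
print(f"goblin mentions: {old_rate:.0f} -> {new_rate:.0f} per 1k "
      f"(+{relative_increase(old_rate, new_rate):.0f}%)")
```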
OpenAI said the “goblin army” effect emerged because training was not isolated to the “Nerdy” persona. The behavior gradually spread to models outside that persona, creating a self-reinforcing loop as the AI learned from its own erroneous data.
OpenAI warned that the issue is more than a quirky vocabulary problem because modern AI products are increasingly agent-like—operating as colleagues, coding assistants, and enterprise systems across tools. In that context, personality misalignment can become a reliability disaster: a chatbot’s playful tone may be harmless, but similar misbehavior in a coding assistant could create operational risk.
OpenAI also highlighted that when users select voice tone or agent mode, they shape system behavior that has already absorbed millions of preferences. It said even small reward misalignments can produce faulty language patterns that may require substantial changes to correct.
OpenAI framed the episode as an example of reinforcement learning dynamics: models learn not only what is directly rewarded, but also adjacent behaviors that are reinforced. It argued that quality assurance in AI must go beyond right-or-wrong answers to include auditing tone, risk tolerance, and memory habits.
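One concrete reading of "auditing tone" is to evaluate responses along more than one axis: the same harness that checks whether an answer is functionally correct can also track stylistic signals such as creature vocabulary or exclamation density. The sketch below is an assumed shape for such an audit, not a description of OpenAI's internal tools.

```python
import re
from dataclasses import dataclass

CREATURE_RE = re.compile(r"\b(goblins?|gremlins?|trolls?|fairies)\b", re.I)

@dataclass
class AuditResult:
    correct: bool            # did the answer pass the functional check?
    creature_mentions: int   # tone signal 1: whimsical-vocabulary count
    exclamations: int        # tone signal 2: excitability proxy

def audit(response: str, expected_substring: str) -> AuditResult:
    """Audit a single response for correctness *and* tone drift."""
    return AuditResult(
        correct=expected_substring in response,
        creature_mentions=len(CREATURE_RE.findall(response)),
        exclamations=response.count("!"),
    )

result = audit(
    "The goblins are gone! Just close the file handle before reopening it.",
    expected_substring="close the file handle",
)
print(result)  # correct=True, yet the tone signals flag drift worth reviewing
```

A right-or-wrong test would pass this response; only the extra tone columns reveal the drift the article is concerned with.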
Codex was described as a relevant testing ground because programming assistants need persistence without rigidity and creativity without chaos. In systems that interact directly with code and other tools, quirky behaviors can become operational risks.
OpenAI said it terminated the faulty persona, removed misaligned reward signals, and built tools to audit model behavior from the ground up. It characterized the goblin chatter as “smoke” and the auditing tools as a “fire alarm,” emphasizing that the next phase of AI development will require observing and adjusting subtle language patterns before they cause harm.