Why ChatGPT, Gemini, and Claude differ on which jobs are most exposed to AI


Figures that researchers use to forecast which jobs artificial intelligence could displace may be less reliable than widely assumed, according to The Wall Street Journal. The outlet points to the AI systems themselves as a key source of the unreliability, raising questions about how policymakers and workers interpret “exposure scores” that are meant to estimate vulnerability to automation.

How “exposure scores” are built

Economists often estimate job risk using task-based exposure scores. The approach starts from the U.S. Bureau of Labor Statistics’ database describing what workers in different occupations do day to day. Researchers then assess which tasks within each occupation AI could potentially handle or speed up. In theory, the larger the share of tasks AI can perform in a given job, the higher the job’s exposure to replacement.
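As a rough illustration of the task-share idea described above, the sketch below computes an exposure score as the fraction of an occupation's tasks flagged as something AI could perform. The function name, the task list, and the true/false flags are hypothetical and chosen only for illustration; actual studies may weight tasks or use graded ratings rather than a simple count.

```python
def exposure_score(task_flags):
    """Return the share of tasks flagged as AI-performable (0.0 to 1.0).

    task_flags maps a task description to True (AI could perform it)
    or False. Equal weighting of tasks is an assumption of this sketch.
    """
    if not task_flags:
        raise ValueError("occupation has no tasks")
    return sum(task_flags.values()) / len(task_flags)


# Hypothetical occupation and flags, not taken from any real dataset.
bookkeeper_tasks = {
    "reconcile accounts": True,
    "prepare invoices": True,
    "meet with clients": False,
    "file paper records": False,
}

score = exposure_score(bookkeeper_tasks)
print(f"exposure score: {score:.2f}")
```

Under this simple definition, whoever assigns the flags, whether a human, a survey, or a model, directly determines the score.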

However, the Wall Street Journal says researchers currently rely on three main methods to construct these scores, each with drawbacks:

Human evaluation: reviewers judge how much of each task AI can perform, but the process is highly subjective.
Worker surveys: people who use AI platforms are asked for their views, but the results may reflect only small groups on a limited set of platforms rather than the broader labor force.
AI ranking: the models themselves determine which tasks are easiest to replace, a method described as especially controversial because of technical and methodological complications.

NBER study finds divergent results across leading AI models

The concerns are highlighted by a new study published as a working paper on the National Bureau of Economic Research (NBER) website. A team from Northwestern University and American University conducted a large-scale experiment using leading large language models to identify which jobs have the greatest exposure to AI.

The economists, including Michelle Yin, Hoa Vu, and Claudia Persico, tested three prominent models: OpenAI’s ChatGPT-5, Google DeepMind’s Gemini 2.5, and Anthropic’s Claude 4.5. The study found that the models frequently produced clearly divergent answers about job exposure.

In practice, the paper reports that some occupations are rated as highly exposed by one model but considered relatively safe by another. The disagreement also extends to specific roles, including advertising managers and even chief executive officers (CEOs), where models failed to align on risk levels.

ChatGPT and Gemini agreed with each other more often than either did with Claude, but the Wall Street Journal notes that even these two models disagreed up to 25% of the time in testing. That points to instability even among systems regarded as some of the most capable.
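One way to picture a pairwise disagreement rate is the share of occupations on which two models assign different exposure labels. The sketch below assumes that definition, which may differ from the paper's own measure, and uses placeholder model names, occupation names, and ratings invented for illustration.

```python
from itertools import combinations


def disagreement_rate(labels_a, labels_b):
    """Share of commonly rated occupations where two models' labels differ."""
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        raise ValueError("no occupations rated by both models")
    differing = sum(1 for occ in shared if labels_a[occ] != labels_b[occ])
    return differing / len(shared)


def pairwise_disagreement(ratings):
    """Compute the disagreement rate for every pair of models."""
    return {
        (a, b): disagreement_rate(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


# Hypothetical ratings, invented for this example only.
ratings = {
    "model_x": {"occ_1": "high", "occ_2": "low", "occ_3": "high", "occ_4": "low"},
    "model_y": {"occ_1": "high", "occ_2": "low", "occ_3": "low", "occ_4": "low"},
    "model_z": {"occ_1": "low", "occ_2": "high", "occ_3": "low", "occ_4": "low"},
}

rates = pairwise_disagreement(ratings)
for pair, rate in rates.items():
    print(pair, f"{rate:.0%}")
```

In this toy data, the closest pair still differs on one occupation in four, which is the kind of gap the reported 25% figure describes.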

Why discrepancies may persist

The study attributes part of the mismatch to differences in model architecture. But it also points to a deeper driver: the habits of worker groups that adopted AI early. For example, financial analysts are described as using AI very early and frequently. That behavior can generate substantial training data for subsequent AI model development, which may influence how AI later evaluates and ranks risk for that occupation.

Implications for forecasts and policy decisions

The Wall Street Journal warns that some policymakers and employers may be placing too much trust in exposure scores, using them as a “north star” without sufficient caution. The study is a working paper that has not yet undergone formal peer review by independent experts, but it is described as sounding an alarm about the validity of current economic forecasts.

To address the uncertainty, economists propose that researchers consider multiple models rather than relying on a single system. They also argue for greater candor about the uncertainty of AI-generated indicators, suggesting that real-world surveys about how AI is actually deployed in the economy could provide more useful evidence.
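A minimal sketch of what reporting several models together might look like: instead of publishing one model's score for an occupation, a researcher could publish the average alongside the range across models. The function, the model names, and the scores below are hypothetical, and the choice of mean and min-max range is an assumption of this example rather than a method taken from the paper.

```python
from statistics import mean


def summarize_scores(scores_by_model):
    """Summarize one occupation's exposure scores across several models."""
    values = list(scores_by_model.values())
    if not values:
        raise ValueError("no model scores supplied")
    low, high = min(values), max(values)
    return {
        "mean": mean(values),
        "low": low,
        "high": high,
        "spread": high - low,  # a wide spread signals model disagreement
    }


# Hypothetical scores for a single occupation, invented for illustration.
summary = summarize_scores({"model_x": 0.25, "model_y": 0.5, "model_z": 0.75})
print(summary)
```

Publishing the spread next to the average makes the disagreement visible rather than hiding it behind a single number.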

The study emphasizes the need to understand which tasks AI is changing and how workers are adapting, rather than relying on unverified statistics. It also reiterates that investment in education and new skills remains central to preparing for change.

“Personally, I would not rely on a single metric alone to decide to change jobs or alter my child's educational path.”

The article concludes that, however much the forecasts diverge, readiness to adapt is the most practical approach for individuals: learning to work with AI rather than expecting replacement to happen overnight.