This piece examines the accuracy of Google's AI Overviews and finds that, although the correct-answer rate is fairly high at roughly 85–91%, the system still carries significant risks. Notably, many correct answers lack solid grounding. The New York Times reports that the AI-generated answers look credible but draw on a mix of sources, from reputable websites to Facebook posts. Editors and researchers discussed AI hallucinations with AI research firms before selecting Oumi and its verification model HallOumi to assess Google's accuracy against SimpleQA, a widely used standard test.

Late last year, Stephen Punwasi, a 41-year-old data analyst in Toronto, was about to have dinner when he read that Hulk Hogan's wife might sue over his death and asked Google about the timing. The AI's answer stated there were no reliable reports of Hogan's death, which confused him: a Daily Mail article contradicted Google's result with the headline "The Mystery Surrounding Hulk Hogan's Death."

Since 2024, Google has placed AI-generated answers at the top of search results. AI Overviews, a new product, helped shift Google from information editor to publisher.

According to an analysis by the AI startup Oumi, Google's answers were about 85% accurate with Gemini 2 and 91% accurate with Gemini 3. But with more than 5 trillion searches per year, that still means tens of millions of incorrect answers every hour, or hundreds of thousands of errors per minute. And while most answers are correct, more than half of the correct ones lack clear and trustworthy support: the linked sources do not always substantiate the content, which makes verification harder. Is near-perfect accuracy worth celebrating when the remainder is wrong and difficult to catch?

This is part of a broader Silicon Valley debate about how well AI systems perform and what we can trust on the web. Some tech experts say Google's AI Overviews are reasonably accurate and have improved in recent months. Others warn that ordinary users may not realize those results need to be rechecked.

At the request of The New York Times, Oumi analyzed AI Overviews' accuracy with SimpleQA, a standard test widely used to gauge AI systems. The company tested Google's system in October, when it handled complex questions with Gemini 2, and again in February, after the upgrade to Gemini 3. In both rounds, Oumi's analysis covered 4,326 searches. It found the results 85% accurate with Gemini 2 and 91% accurate with Gemini 3.

Pratik Verma, CEO of Okahu, a company that helps users access and use AI, said Google's technology is about as accurate as today's leading AI systems. He nonetheless urged users to verify information before fully trusting it. "Never rely on a single source," he said. "Always compare what you get with another source."

Google acknowledges that AI Overviews summaries can contain errors. A small line under each AI summary reads: "A.I. may be wrong, please double-check the answer." However, Google says Oumi's analysis is flawed because it relied on a standard test built by OpenAI, which itself contains inaccuracies. "This study has serious flaws," Ned Adriance, a Google spokesperson, said in a statement. "It does not reflect what users actually search for on Google."

AI Overviews summaries provide two kinds of information: an answer to the question and a list of web pages that support that answer.
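To make the evaluation described above concrete, here is a minimal illustrative sketch of how a batch of (question, AI answer, cited sources) records might be scored into the buckets the article discusses: correct and supported, correct but unsupported, and incorrect. This is not Oumi's or Google's code; the record fields, helper functions, and sample data are hypothetical, and a real pipeline would rely on a verifier model such as HallOumi rather than string matching.

```python
from dataclasses import dataclass

@dataclass
class Record:
    question: str
    ai_answer: str            # answer text produced by the search AI
    expected: str             # reference answer from a SimpleQA-style test set
    source_snippets: list     # text from the pages the AI cited

def is_correct(record: Record) -> bool:
    """Toy correctness check: does the AI answer contain the reference answer?
    A real evaluation would use normalization or an LLM judge."""
    return record.expected.strip().lower() in record.ai_answer.lower()

def is_supported(record: Record) -> bool:
    """Toy grounding check: does any cited snippet actually contain the answer?
    Oumi is reported to use a verifier model (HallOumi) for this step; plain
    string matching here is only a stand-in."""
    return any(record.expected.lower() in s.lower() for s in record.source_snippets)

def summarize(records: list) -> dict:
    """Compute overall accuracy and the share of correct answers whose
    cited sources do not substantiate them."""
    correct = [r for r in records if is_correct(r)]
    unsupported = [r for r in correct if not is_supported(r)]
    return {
        "accuracy": len(correct) / len(records),
        "correct_but_unsupported": len(unsupported) / max(len(correct), 1),
    }

# Hypothetical sample data, for illustration only.
sample = [
    Record("When was the house converted into a museum?", "It became a museum in 1987.",
           "1987", ["A travel blog that mentions the museum but gives no year."]),
    Record("Which river runs along the west side of the city?", "The Neuse River.",
           "Little River", ["A tourism page saying the Neuse runs through the city."]),
]
print(summarize(sample))   # e.g. {'accuracy': 0.5, 'correct_but_unsupported': 1.0}
```

The split matters because, as the figures above suggest, an answer can be right while the pages it cites say nothing that actually confirms it.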
When asked when Bob Marley's house was converted into a museum, the AI Overviews answer gave the correct year, but the linked sources included a Facebook post, a travel blog with questionable information, and a Wikipedia page with conflicting details about the museum's opening year. (A set of GIFs and visuals accompanying the original article illustrates how much the cited sources vary and how hard AI-generated content is to validate.) Oumi's testing also found that Google's AI Overviews cited Facebook and Reddit among its most frequently referenced sources, with citation rates differing depending on the accuracy of the result.

Evaluating AI Overviews' performance is difficult because Google's system can produce different responses to the same query at different times. If a user repeats a search within seconds, one answer may be correct and the next may not. To assess accuracy, firms like Oumi use their own AI as a verifier, a method that is itself fallible because the verifier can also err.

Google has published a parallel analysis alongside Oumi's results. In its own analysis of Gemini 3, the core technology behind AI Overviews, Google found the model produced incorrect information roughly 28% of the time. Google argues that AI Overviews, which pulls information from Google Search before generating a response, is more accurate than Gemini operating on its own.

Thanks to advances in Google's AI, the generated answers are becoming more accurate: Oumi's October analysis found errors in about 15% of AI-generated summaries. With Gemini 3, however, Google's answers are more likely to be unsupported than they were under Gemini 2, with unsupported-but-correct answers rising from 37% in October to 56% in February. "Even when the answer is correct, how can you be sure it is true? How can you verify?" said Manos Koukoumidis, CEO of Oumi.

Modern AI systems rely on probabilistic math to guess the best answer rather than a strict rule set defined by humans, which means they will always produce some errors; a toy sketch below illustrates why repeated queries can differ. Sometimes Google's AI Overview identifies a credible-looking page but appears to misinterpret the information on it. During Oumi's testing, when asked what river runs along the west side of Goldsboro, North Carolina, Google's system named the Neuse River, which flows along the city's southwest. The river along the western border is actually the Little River, a tributary of the Neuse. The AI Overview linked to a Goldsboro travel site that said the Neuse runs through the city; the system appears to have inferred the river's position rather than found it stated explicitly.

Even when Google identifies a page with correct information, it can still produce a wrong result. In another Bob Marley query, AI Overviews correctly linked to an organization's site listing 165 inductees since 1998, Marley among them, yet the AI-generated answer claimed there was no record of his induction. And even when AI Overviews gave a correct answer, it often added inaccurate context.

AI Overviews face another challenge: potential manipulation. Lily Ray, a vice president at an AI-focused search firm, warns that anyone can manufacture apparent expertise simply by publishing a blog post asserting it. Google acknowledges the issue but plays down its significance: "Our AI search features are built on the same ranking criteria and safety protections that help prevent most spam in search results. Most of these examples are searches users would not perform."
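One reason the same search can return different answers, as noted above, is that a language model samples each response from a probability distribution rather than looking it up from a fixed rule set. Below is a toy sketch of that behavior; the candidate answers and their probabilities are invented for illustration and have nothing to do with Google's actual system.

```python
import random

# Invented answer distribution for a single query; real models assign
# probabilities to tens of thousands of tokens at every generation step.
candidate_answers = {
    "the Neuse River": 0.55,      # plausible but wrong for "west side of the city"
    "the Little River": 0.40,     # correct
    "the Cape Fear River": 0.05,  # unlikely
}

def sample_answer(distribution: dict, temperature: float = 1.0) -> str:
    """Pick one answer in proportion to its (temperature-adjusted) probability."""
    weights = [p ** (1.0 / temperature) for p in distribution.values()]
    return random.choices(list(distribution.keys()), weights=weights, k=1)[0]

# Repeating the identical "query" can yield different answers within seconds,
# some correct and some not, which is why spot-checking a single response
# says little about overall accuracy.
for _ in range(5):
    print(sample_answer(candidate_answers))
```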
To test that claim, the BBC's Thomas Germain published a blog post about tech journalists who eat sausages, describing a made-up sausage-eating championship in South Dakota and ranking himself first among a list of tech journalists known for food-related coverage. When he later searched for the top sausage-eating tech journalists, Google ranked him first, which he cited as a demonstration of how search reputation can be manipulated.

Source: NYT | By Bao Ngoc | Market Pulse | Original source: MarketTimes.vn (04/16/2026, 10:11 GMT+7)
