This Sky News Arabia piece examines a troubling paradox in generative artificial intelligence: as models become more powerful, their tendency to “hallucinate” (to invent facts and present them confidently) appears to grow. While AI tools are spreading across sectors and promising major gains in productivity and convenience, their propensity to generate false information is turning from a minor flaw into a systemic risk that major technology companies still do not fully understand.
Hallucinations Rising with New Generations of Models
Drawing on a New York Times report and technical tests, the article shows that newer AI systems can hallucinate more than earlier ones. OpenAI’s strongest model, o3, hallucinated around 33% of the time on the PersonQA benchmark, which asks questions about public figures, twice the rate of the older o1 model. The lighter o4-mini model performed even worse, with hallucination rates close to 48%. On the SimpleQA benchmark, which covers more general questions, hallucination rates climbed to 51% for o3 and 79% for o4-mini, compared with 44% for o1. OpenAI’s own research admits that more work is needed to explain these results.
From Customer Support Glitches to Structural Problems
The article highlights recent real-world incidents. A support chatbot for the programming tool Cursor falsely claimed that the company had changed its licensing policy, triggering customer anger and cancellation threats before the error was discovered. Such episodes illustrate that hallucinations are no longer theoretical: they directly affect businesses and users.
Experts note that hallucinations are widespread across the industry, affecting systems from DeepMind, DeepSeek, Anthropic, and others. Some researchers suspect the problem is inherent in how large language models work: they generate statistically plausible text, not verified truth, and their internal reasoning processes are too complex for engineers to fully trace.
Technical Explanations: Probability, Forgetting, and Long Reasoning Chains
Researchers interviewed in the article argue that hallucinations stem from core design choices. Models are trained on vast datasets that may contain inaccuracies, gaps, or contradictions. They optimize for predicting the most likely next token, not for checking whether an answer is factually correct. Over time, additional training can cause models to specialize in some tasks while “forgetting” others, upsetting the balance of their behavior.
During extended reasoning or “thinking” phases, each intermediate step is another opportunity for error. As these steps accumulate, small inaccuracies can compound into full hallucinations. More capable models can also produce longer, more elaborate answers, making incorrect outputs appear even more convincing.
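To see why long reasoning chains matter, consider a simplified back-of-the-envelope sketch; the error rate below is an invented assumption for illustration, not a figure from the report. If each step has even a small independent chance of going wrong, the chance that at least one step errs grows quickly with the length of the chain.

```python
# Illustrative sketch only: assumes every reasoning step fails independently
# with the same small probability, a simplification of real model behavior.
def chance_of_any_error(p_step: float, steps: int) -> float:
    """Probability that at least one of `steps` independent steps goes wrong."""
    return 1.0 - (1.0 - p_step) ** steps

for steps in (1, 5, 10, 20):
    risk = chance_of_any_error(0.02, steps)
    print(f"{steps:>2} steps at 2% error each -> {risk:.0%} chance of an error somewhere")
```

Under these toy numbers, a chain of twenty steps already carries roughly a one-in-three chance of containing at least one error, even though each individual step looks reliable.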
Hilda Maalouf Melki: AI Does Not Understand Like Humans
Oxford-certified AI expert Hilda Maalouf Melki explains that current systems do not understand information as humans do; they select words based on statistical patterns that look right, not on a grounded notion of truth. This probabilistic nature is what makes tools like ChatGPT and Gemini powerful—but it also makes them prone to fabricating sources, events, or rules.
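Her point can be made concrete with a toy sketch; the candidate words and scores below are invented for illustration and do not come from any real model. The system converts scores over candidate words into probabilities and samples one, and nothing in that procedure checks the chosen word against reality.

```python
import math
import random

# Invented next-word scores for the prompt "The capital of Australia is".
# A real model scores tens of thousands of tokens; three are shown here.
logits = {"Canberra": 2.0, "Sydney": 1.6, "Melbourne": 0.9}

# Softmax turns raw scores into probabilities.
total = sum(math.exp(v) for v in logits.values())
probs = {word: math.exp(v) / total for word, v in logits.items()}

# The model samples a statistically plausible word; no step verifies
# whether the chosen answer is factually correct.
choice = random.choices(list(probs), weights=list(probs.values()))[0]
print(probs)   # roughly {'Canberra': 0.50, 'Sydney': 0.33, 'Melbourne': 0.17}
print(choice)  # sometimes a wrong but plausible-sounding answer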
She warns that such hallucinations become especially dangerous in high-stakes fields such as healthcare, law, and education, where fabricated details can mislead professionals or the public. According to her, causes include low-quality or conflicting training data, overreliance on correlations between words, and opaque system constraints added during fine-tuning. Even advanced techniques like human feedback and alignment can only partially reduce these errors.
What Needs to Change: Verification, Oversight, and Education
Hilda Maalouf Melki argues that tackling AI hallucinations requires a holistic strategy rather than a quick technical fix. She calls for:
- Embedding robust fact-checking tools and external knowledge verification inside AI systems (a sketch of this pattern follows the list).
- Designing policies that require human oversight in sensitive domains, so AI outputs are reviewed before they influence critical decisions.
- Educating teachers, professionals, and everyday users about the limits of AI, encouraging critical evaluation rather than blind trust.
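As a rough idea of what the first recommendation could look like in practice, here is a minimal sketch. The function names are hypothetical placeholders rather than a real product or API, and the verification check is deliberately naive; the pattern, not the names, is the point.

```python
# Hypothetical sketch of wrapping a model with an external verification step.
# All functions below are placeholders standing in for a model call, a
# retrieval or fact-checking service, and a support check.

def generate_answer(question: str) -> str:
    # Stand-in for the language model; returns a fluent but unchecked draft.
    return "The licensing policy changed last week."

def lookup_trusted_source(question: str) -> str:
    # Stand-in for retrieval from documentation, a database, or a search index.
    return "The licensing policy has not changed."

def is_supported(draft: str, evidence: str) -> bool:
    # Placeholder check; a real system would use entailment or citation matching.
    return draft.lower() in evidence.lower()

def answer_with_verification(question: str) -> str:
    draft = generate_answer(question)
    evidence = lookup_trusted_source(question)
    if is_supported(draft, evidence):
        return draft
    # No external support: flag for human review instead of stating it as fact.
    return "Unverified draft withheld and routed to a human reviewer: " + draft

print(answer_with_verification("Did the licensing policy change?"))
```

The design point is that a draft is never presented as fact unless an external source supports it; anything unverified is routed to a person, which also reflects the human-oversight recommendation above.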
In her view, only coordinated efforts between developers, policymakers, institutions, and users can build AI systems that are both powerful and responsibly used.
Why Updates Often Make Hallucinations Worse
Technology executive Mazen Dakkash adds that hallucinations are not a simple bug in the code but a natural consequence of how these models are built. Updates often optimize performance on specific tasks or datasets, unintentionally degrading performance elsewhere and destabilizing the overall knowledge balance. At the same time, new generations of models are better at producing fluent, complex language, which means they can generate wrong answers that sound more authoritative than before.
He warns that if the underlying architecture remains the same, future AI systems, including more autonomous or “general” AI, could produce even more dangerous hallucinations, especially if they are allowed to act on the world. The real solution, he argues, is to redesign AI architectures so that they can verify information and cross-check sources rather than only generate plausible text.
Together, these perspectives paint a sobering picture: unless the industry fundamentally rethinks how AI systems are built and governed, hallucinations will remain a persistent, and increasingly serious, challenge in the evolution of artificial intelligence.