
From AI Hallucinations to Machine Reasoning

phoue

9 min read

The Story of Sejong the Great Throwing a MacBook

A king in royal robes angrily throwing a MacBook

Once upon a time, someone asked a very smart AI a mischievous question: “Tell me about the incident where Sejong the Great threw a MacBook Pro.” Without hesitation, the AI smoothly spun a tale: “According to the Annals of the Joseon Dynasty, King Sejong threw a MacBook Pro in anger while drafting the original Hunminjeongeum.”

Of course, it was a complete fabrication. This phenomenon of confidently presenting plausible but false information is called hallucination, and it has been the single biggest obstacle to AI becoming a trustworthy partner in our society.

The first hero to tackle this problem was Retrieval-Augmented Generation (RAG). It’s like telling the AI, “Don’t just imagine freely; consult this encyclopedia before you answer.” Thanks to RAG, companies finally began to trust and use AI.

But the story doesn’t end here. RAG was not a perfect solution. This article traces AI’s journey out of the shadow of hallucination, moving past mere information retrieval toward true machine reasoning (context engineering), where the AI ‘thinks’ on its own.

context engineering

RAG, An Indispensable Crutch

The Key to the Enterprise AI Era, RAG

When large language models (LLMs) first appeared, companies hesitated despite their vast potential. The plausible lies AI generated—hallucinations—made it risky to deploy AI in critical tasks. Imagine wrong numbers in financial reports or fabricated precedents in legal documents.

Then, RAG came to the rescue. Its principle is simple:

  1. Retrieval: When a user asks a question, first find relevant information from internal documents or trusted databases.
  2. Generation: Then, based on the retrieved information, have the AI generate an answer.

User question -> External knowledge base search -> Search results + question -> LLM answer generation
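
To make the two steps concrete, here is a minimal sketch of the retrieve-then-generate loop in Python. Everything in it is an illustrative stand-in: the knowledge base is a hard-coded list, relevance is naive word overlap rather than embedding similarity, and call_llm is a placeholder for a real model API.

```python
# Minimal retrieve-then-generate sketch. The knowledge base, the
# word-overlap scoring, and call_llm are toy stand-ins, not a real API.

knowledge_base = [
    "The refund policy allows returns within 30 days of purchase.",
    "Support hours are 9 a.m. to 6 p.m. on weekdays.",
]

def retrieve(question: str, top_k: int = 1) -> list[str]:
    # Score documents by shared words; real systems use a vector index.
    def overlap(doc: str) -> int:
        return len(set(question.lower().split()) & set(doc.lower().split()))
    return sorted(knowledge_base, key=overlap, reverse=True)[:top_k]

def call_llm(prompt: str) -> str:
    # Placeholder for an actual LLM call.
    return f"[answer grounded in a {len(prompt)}-character prompt]"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using ONLY the context below; say so if it is insufficient.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(answer("What is the refund policy?"))
```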

This method was like magic for companies:


  • Reduced hallucinations: Referring to verified sources greatly lowered the chance of AI lying.
  • Up-to-date information: Without costly retraining, AI could reflect real-time updates.
  • Cost efficiency: Indexing internal documents, rather than retraining the model, made it cheap to build specialized expert AIs.
  • Reliability: Showing sources alongside answers enabled users to verify and trust AI responses.

Tech giants like Microsoft and Google made RAG a core feature of their cloud services. RAG transformed AI from a fascinating lab technology into an enterprise solution delivering real business value.

The Imperfect First Hero

However, RAG did not completely eliminate hallucinations. Its limitations were especially apparent in fields requiring very high accuracy, like law.

A Stanford research team tested popular commercial legal AI services and found shocking results: services advertising “no hallucinations” showed hallucination rates as high as 33%. Such figures are unacceptable in legal contexts where case outcomes are at stake.

Why did this happen? It can be summarized as “garbage in, garbage out.”

  • Inaccurate retrieval: If the retriever misunderstands the question and fetches irrelevant data, AI must answer based on that wrong data.
  • Fragmented context: Storing documents in fixed-size chunks often loses important context between sentences.
  • Outdated knowledge: If the database contains obsolete laws or repealed policies, AI cites them unaware of their obsolescence.
  • Lack of reasoning ability: Most importantly, RAG only feeds correct information to the AI; it does not cultivate the AI’s ability to synthesize multiple pieces of information into complex conclusions.

A Ray of Hope in Healthcare

Medical staff discussing AI analysis results

But the story is not all despair. Unlike in law, RAG has achieved remarkable success in highly controlled environments.

In one medical study, RAG was used to assess surgical suitability, employing a small set of well-curated official medical guidelines as the AI’s “encyclopedia.” The results were astonishing:

  • Human expert accuracy: 86.6%
  • Pure AI (GPT-4) accuracy: 92.9%
  • RAG + AI accuracy: 96.4%

The AI combined with RAG was not only more accurate than human doctors but also produced zero hallucinations and answered 30 times faster.

What accounts for this difference? The quality of knowledge. Legal AI deals with vast, unrefined data, whereas the medical study used highly controlled, curated knowledge.

From this, we learn an important lesson: the true competitive edge in the AI era lies not in flashy AI models but in how well the data fed to AI is organized and managed—i.e., knowledge curation.



Evolution Toward Smarter Tools: Advanced RAG

To overcome early RAG’s limitations, researchers began developing smarter, more sophisticated systems that go beyond simple “retrieve then generate” to plan, critique their own retrievals, and correct course.

Infusing Relationships into Knowledge: Graph RAG

Traditional RAG treated knowledge as a heap of disconnected text chunks. But important relationships exist between pieces of information, like “Elon Musk is CEO of Tesla.”

‘Elon Musk’ and ‘Tesla’ nodes connected by a ‘CEO’ edge

This relational structure is expressed by a Knowledge Graph. Advanced RAG uses knowledge graphs to fetch not just single text chunks but entire networks of related entities—people, places, events—relevant to the question. This allows AI to understand deeper context and perform complex reasoning, like showing a detective a full relationship map instead of isolated clues.
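
As a toy illustration, graph-aware retrieval can return an entity’s one-hop neighborhood rather than a single chunk. The triples and the naive entity matching below are illustrative assumptions, not a real graph store.

```python
# Toy knowledge-graph retrieval: fetch every relation touching the
# entities mentioned in the question (one hop of context).

triples = [
    ("Elon Musk", "CEO of", "Tesla"),
    ("Tesla", "headquartered in", "Austin"),
    ("Elon Musk", "founded", "SpaceX"),
]

def neighborhood(entity: str) -> set:
    # Every triple in which the entity appears as subject or object.
    return {t for t in triples if entity in (t[0], t[2])}

def graph_context(question: str) -> str:
    # Naive entity linking: look for known node names in the question.
    entities = {n for s, _, o in triples for n in (s, o) if n in question}
    facts = set().union(*map(neighborhood, entities)) if entities else set()
    return "; ".join(f"{s} {p} {o}" for s, p, o in sorted(facts))

print(graph_context("What companies is Elon Musk linked to?"))
# Elon Musk CEO of Tesla; Elon Musk founded SpaceX
```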

Doubting and Correcting Itself: Critical RAG

Smart people question and review their own thoughts. Attempts to teach AI this ability have emerged as Self-RAG and Corrective RAG (CRAG).

  • Self-RAG: This AI questions itself: “Is retrieval really necessary for this question?”, “Is the retrieved information relevant?”, “Is my answer actually grounded in the retrieved data?” Through this self-critique and reflection, it improves its answers.
  • Corrective RAG (CRAG): A more pragmatic problem solver. If the initial retrieval is unsatisfactory, it doesn’t give up (see the control-flow sketch after this list):
    • If it judges the results wrong, it discards them and searches the web for new information.
    • If it is uncertain, it combines the original results with web search results to produce the best answer.
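
A minimal sketch of that control flow follows. All four helpers are toy stubs (in CRAG the grader is a learned retrieval evaluator); the three-way branch is the point.

```python
# Corrective-RAG control flow in miniature. retrieve, web_search,
# grade, and generate are illustrative stubs.

def retrieve(q: str) -> list[str]:
    return ["(possibly irrelevant internal document)"]

def web_search(q: str) -> list[str]:
    return [f"(fresh web result for: {q})"]

def grade(q: str, docs: list[str]) -> str:
    # A learned evaluator in CRAG; a hard-coded verdict here.
    return "ambiguous"

def generate(q: str, docs: list[str]) -> str:
    return f"Answer to {q!r} grounded in {len(docs)} document(s)."

def corrective_rag(question: str) -> str:
    docs = retrieve(question)
    verdict = grade(question, docs)
    if verdict == "incorrect":
        docs = web_search(question)   # discard, search the web instead
    elif verdict == "ambiguous":
        docs += web_search(question)  # combine both sources
    return generate(question, docs)   # "correct": use as retrieved

print(corrective_rag("What changed in the 2024 policy?"))
```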

Keeping Knowledge Always Up-to-Date: Dynamic Knowledge Bases

Since the world’s information changes constantly, an AI knowledge base that goes stale quickly becomes useless. But repeatedly rebuilding an entire massive database is inefficient.

The solution is Incremental Learning—a clever method that updates only newly added or changed parts instead of rebuilding everything. This keeps AI knowledge current.
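
As a rough sketch, incremental indexing can be as simple as hashing each document and re-embedding only what changed. The embed function below is a placeholder for a real embedding model; the hash check is what saves the rebuild.

```python
# Incremental index maintenance: re-embed only new or changed
# documents instead of rebuilding the whole index.

import hashlib

index: dict[str, tuple[str, list[float]]] = {}  # doc_id -> (hash, vector)

def embed(text: str) -> list[float]:
    return [float(len(text))]  # toy embedding stand-in

def upsert(doc_id: str, text: str) -> bool:
    digest = hashlib.sha256(text.encode()).hexdigest()
    cached = index.get(doc_id)
    if cached and cached[0] == digest:
        return False                      # unchanged: skip re-embedding
    index[doc_id] = (digest, embed(text))
    return True                           # new or modified: (re)indexed

print(upsert("policy", "Returns accepted within 30 days."))  # True
print(upsert("policy", "Returns accepted within 30 days."))  # False (skipped)
print(upsert("policy", "Returns accepted within 45 days."))  # True (re-indexed)
```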

These advanced RAG technologies show RAG evolving from a passive tool into an active agent that plans strategies, critiques information, and corrects actions. The core competitive edge in AI markets now depends not on having the best AI model but on how smoothly one can orchestrate these complex components.


The Ultimate Goal: Teaching AI to Think

No matter how good the supplied information is, hallucinations won’t be fully solved if the AI lacks the ability to think for itself. The ultimate goal of AI development is to teach not just knowledge delivery but the very method of thinking.

The Self-Taught Reasoner, STaR

Brain structure image resembling a chess master thinking several moves ahead


Humans, when solving difficult problems, don’t just spit out answers but explain their reasoning. AI was taught this through the Self-Taught Reasoner (STaR) methodology.

STaR’s training is unique:

  1. Logic generation: AI first generates solution processes (reasoning) for many problems.
  2. Learning from success: Only the ‘successful’ reasoning paths that lead to correct answers are selected for focused training.
  3. Learning from failure: If the AI is wrong, it’s given the correct answer as a hint and asked to reason backward through the solution process, like a student keeping an error notebook.

Through repetition, AI gradually develops the ‘power of thought’ to solve complex problems logically.
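
A schematic of one STaR iteration, following the three steps above, might look like this. The Model class is a hypothetical stub; a real run would sample rationales from an LLM and fine-tune it on the kept traces.

```python
# One STaR iteration in miniature: generate rationales, keep the ones
# that reach the correct answer, rationalize failures with the answer
# as a hint, then fine-tune on the collected traces.

class Model:
    def generate(self, question, hint=None):
        # Stub: with a hint, this mimics STaR's "rationalization" step
        # of reasoning backward from the given answer.
        answer = hint if hint is not None else "guess"
        return f"rationale for {question!r}", answer

    def finetune(self, traces):
        print(f"fine-tuning on {len(traces)} reasoning trace(s)")

def star_iteration(model, problems):
    traces = []
    for question, gold in problems:
        rationale, answer = model.generate(question)        # step 1
        if answer == gold:
            traces.append((question, rationale, gold))      # step 2
        else:
            hinted, _ = model.generate(question, hint=gold)
            traces.append((question, hinted, gold))         # step 3
    model.finetune(traces)

star_iteration(Model(), [("2 + 2 = ?", "4")])
```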

The Explorer Learning from Failure, SoS

When we learn, we don’t only follow the right path but also try wrong turns and dead ends, building problem-solving skills. Traditional AI only learned from model answers, missing this valuable experience.

Stream-of-Search (SoS) focuses on this. SoS trains AI on the entire process—including failed attempts, dead ends, and backtracking to find alternatives.

AI trained this way becomes more flexible and powerful, having learned not just answers but the strategy to find them.
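
A toy illustration of the idea: the search below records every attempt, dead ends included, and the full trace rather than just the final answer becomes the training text. The subset-sum puzzle is an illustrative stand-in; the SoS work itself uses the Countdown arithmetic game.

```python
# Stream-of-Search in miniature: serialize the whole search process,
# failures and backtracking included, as one training string.

def search(numbers, target, trace, partial=()):
    total = sum(partial)
    trace.append(f"try {partial} -> {total}")
    if total == target:
        trace.append("success")
        return True
    if total > target or len(partial) == len(numbers):
        trace.append("dead end, backtrack")
        return False
    for n in numbers:
        if n not in partial and search(numbers, target, trace, partial + (n,)):
            return True
    trace.append("exhausted, backtrack")
    return False

trace = []
search((5, 3, 9), 12, trace)
print("\n".join(trace))  # this whole stream, failures included, is the training signal
```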

Combining Knowledge and Thinking: The Future of Hybrid AI

Advanced RAG provides AI with declarative knowledge—what to know—while STaR and SoS teach procedural knowledge—how to think.

Future AI will be agentic AI combining both. When facing a complex problem, it will break the problem down internally (SoS), fetch precise external knowledge for each step (RAG), and synthesize the results through an internal monologue (STaR) to decide its next action.
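
Speculatively, one turn of such an agent might look like the loop below, with decompose, retrieve, and reason as illustrative stubs standing in for the SoS, RAG, and STaR roles respectively.

```python
# Hypothetical hybrid-agent turn: plan sub-steps, ground each one in
# retrieved knowledge, and reason over the evidence before acting.

def decompose(task: str) -> list[str]:
    return [f"sub-step 1 of {task!r}", f"sub-step 2 of {task!r}"]

def retrieve(step: str) -> str:
    return f"(evidence for {step})"

def reason(step: str, evidence: str) -> str:
    return f"conclusion on {step} given {evidence}"

def agent_turn(task: str) -> list[str]:
    notes = []
    for step in decompose(task):              # internal search / planning (SoS)
        evidence = retrieve(step)             # grounded external knowledge (RAG)
        notes.append(reason(step, evidence))  # internal monologue (STaR)
    return notes                              # informs the next action

print(agent_turn("assess surgical suitability"))
```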

We are now creating not just vast encyclopedias but better ‘thinkers.’ Of course, deep thinking comes with higher ‘thinking costs’ in time and resources. The coming era will value not only AI performance but also the efficiency of thought.


The Path for Korean AI: Engine or Tuner?

Amid this vast technological wave, what path should Korea’s AI industry take?


Become the World’s Best ‘Tuner’: The Brabus Strategy

Mercedes and Brabus G-Wagon tuning image

The global AI market is a battleground where US and Chinese giants fight with massive capital to build ‘engines’ (foundation models). Competing head-on is realistically very difficult.

So what is our path? To become the world’s best ‘tuner.’

The car tuning company Brabus doesn’t build Mercedes engines. Instead, it takes Mercedes’ powerful engines, pushes their performance to the limit, and redesigns everything around them to create new masterpieces that surpass the original.

The Brabus strategy in AI means building world-class Vertical AI specialized in specific industries (law, healthcare, manufacturing, finance, etc.) by combining powerful general AI engines from OpenAI or Google with our own expert knowledge and data.

This strategy is already reality. Korean startups are pioneering global markets with this approach in cybersecurity, medical imaging, legal research, manufacturing, and more, achieving remarkable results.

  • S2W (Cybersecurity): Dark web threat analysis
  • Lunit (Medical AI): Cancer imaging analysis
  • AirsMedical (Medical AI): MRI image enhancement
  • BHSN (Legal AI): Legal research
  • LinkAlpha (Financial AI): Hedge fund automation
  • Machinarx (Manufacturing AI): Predictive maintenance for industrial robots
  • Upstage (General AI, verticalized): Small language model (sLLM) ‘Solar’
  • FuriosaAI (AI Semiconductors): NPU (Neural Processing Unit)

These companies avoid competing in general chatbot battles and instead dive deep into their specialties, creating unmatched value.

Our Own Engine and Its Precious Value

That doesn’t mean we don’t need our own ‘engines.’ Naver’s ‘HyperCLOVA X’ and LG’s ‘EXAONE’ play vital roles.

  • Naver HyperCLOVA X: The AI that understands the Korean language and culture better than any other, providing optimized services and a strong backbone for Korea’s AI ecosystem.
  • LG EXAONE: Especially strong in reasoning tasks like math and coding, and world-class in enterprise (B2B) AI, proudly representing domestic engine capabilities.

These domestic engines reduce dependence on foreign technology for vertical AI startups acting as tuners, fostering a healthy symbiotic ecosystem. AI sovereignty may come not just from owning our own engines but from mastering the best engines globally and producing world-class AI products.


Conclusion: Beyond Correct Answers, Toward Right Thinking

Our journey, which began with the small lie of ‘Sejong the Great throwing a MacBook,’ has traversed profound changes in AI technology.


We have witnessed AI evolve from simply finding ‘correct answers’ (RAG) to systems that reach answers through ‘proper reasoning’ (machine reasoning). This shift from outcome to process will define the future AI era.

This journey tells us that the day we meet truly capable and trustworthy AI partners is not far off.

#AI #LLM #Hallucination #RAG #Machine Reasoning #Knowledge Graph #Vertical AI #Brabus Strategy #HyperCLOVA X #EXAONE
