Would you believe that ChatGPT, a symbol of cutting-edge technology, originated from a casino in Monaco? Let’s follow the grand evolution of large language models (LLMs).
- The probabilistic idea that sparked artificial intelligence: the principle of the Monte Carlo method
- The AI winter and how multilayer perceptrons overcame it, leading to the big bang of deep learning
- The transformer architecture innovation that made modern large language models possible
- The future of technology beyond ChatGPT toward agent AI and the current status in Korea
Dawn of Artificial Intelligence: The Emergence of Probability and Neural Networks
In the winter of 2022, the world was captivated by the seemingly magical appearance of the AI chatbot, ChatGPT. This astonishing technology, which appeared overnight, answered our questions fluently, wrote poetry, and coded, shocking the entire globe. It felt like the future from a sci-fi movie suddenly became reality.
But was this truly a miracle achieved overnight? Or was it the fruit of decades of persistent effort we hadn’t fully recognized? To answer this, we must go back in time. Surprisingly, the journey began not in a cutting-edge computer lab but with an idea inspired by chance and probability at a casino in Monaco.
1. Casino Secrets: Taming Uncertainty with the Monte Carlo Algorithm
Why a casino when talking about AI history? At the beginning lies a uniquely named methodology called the ‘Monte Carlo algorithm.’ This name comes from Monaco’s famous gambling city, Monte Carlo, and its principle is deeply related to gambling probability games.
The core of the Monte Carlo method is ‘making many random attempts.’ When faced with problems too complex or outright impossible to calculate analytically, this technique obtains approximate solutions through countless random trials.
Draw a circle inscribed in a square and scatter points at random: the fraction of points that land inside the circle approaches π/4, so multiplying that ratio by four yields an estimate of pi.
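The circle-in-a-square estimate can be sketched in a few lines of Python. This is a minimal illustration; the sample count and seed are arbitrary choices:

```python
import random

def estimate_pi(n_samples: int, seed: int = 42) -> float:
    """Estimate pi by sampling random points in the unit square.

    The quarter circle of radius 1 covers pi/4 of the square,
    so (points inside / total points) * 4 approximates pi.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

print(estimate_pi(1_000_000))  # close to 3.14159
```

More samples mean a better estimate, which is exactly the Monte Carlo trade-off: accuracy is bought with raw repetition rather than clever analysis.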
This idea was applied in games like chess or Go, where the number of possibilities is nearly infinite, by randomly exploring promising paths instead of calculating all possibilities to estimate the optimal move. AlphaGo, which later defeated Go champion Lee Sedol, used this idea as its core weapon called ‘Monte Carlo Tree Search (MCTS).’ This probabilistic approach to finding the most plausible answer aligns with the fundamental philosophy of large language models that predict the next word probabilistically.
2. The Birth of AI and Two Diverging Paths
In 1956, the Dartmouth Workshop introduced the term ‘Artificial Intelligence (AI),’ and AI research split into two major directions:
- Symbolism: A top-down approach viewing human intelligence as the result of logical rules and symbol manipulation, aiming to program these explicitly.
- Connectionism: A bottom-up approach inspired by brain structures, believing intelligence emerges from connecting numerous artificial neurons.
In 1958, psychologist Frank Rosenblatt developed the ‘Perceptron,’ the first practical artificial neural network model mimicking brain neurons. It took multiple inputs, multiplied them by weights, and activated if the sum exceeded a threshold. The innovation was that these weights could be ‘learned’ from data.
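Rosenblatt’s learning rule can be sketched in plain Python. The AND truth table, learning rate, and epoch count below are illustrative choices for this sketch, not details of the original perceptron:

```python
def train_perceptron(data, epochs=10, lr=0.1):
    """Perceptron learning rule: nudge weights toward misclassified examples."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            # Fire (1) if the weighted sum exceeds the threshold of 0.
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred          # 0 when correct, +/-1 when wrong
            w[0] += lr * err * x1        # move the decision line toward
            w[1] += lr * err * x2        # the misclassified example
            b += lr * err
    return w, b

# Learn logical AND from its four-row truth table.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print(w, b)
```

After a handful of passes over the data, the learned line separates the single ‘true’ case from the other three, with no rule hand-coded by a programmer.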
3. The First AI Winter: The XOR Problem That Frustrated AI
The perceptron solved ‘linearly separable’ problems like AND or OR, where a single line can separate correct and incorrect answers, fueling optimism in AI research.
However, this optimism collapsed before the very simple XOR (exclusive OR) problem. XOR is true only when two inputs differ, which cannot be separated by any single line.
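That claim is easy to check numerically. The sketch below brute-forces a grid of candidate lines (weights and bias) and finds that none classifies all four XOR cases correctly; the grid itself is an arbitrary illustration, not a mathematical proof:

```python
import itertools

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def accuracy(w1, w2, b):
    """How many of the four XOR cases this line classifies correctly."""
    return sum((1 if w1 * x1 + w2 * x2 + b > 0 else 0) == t
               for (x1, x2), t in XOR)

# Sweep a dense grid of candidate lines; no line gets all four right.
vals = [v / 10 for v in range(-20, 21)]
best = max(accuracy(w1, w2, b)
           for w1, w2, b in itertools.product(vals, repeat=3))
print(best)  # 3 -- at most 3 of the 4 XOR points fall on the right side
```

Any single straight line leaves at least one XOR case on the wrong side, which is precisely the wall the single-layer perceptron ran into.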
In 1969, Marvin Minsky and Seymour Papert mathematically proved this limitation in their book Perceptrons, turning high expectations into disappointment and triggering the ‘AI Winter,’ a period of sharply reduced research investment. Personally, I think the shock of the XOR problem was like having every ingredient and the full recipe, yet ruining the dish for lack of one crucial spice. This deceptively simple problem froze AI research for years, a reminder that innovation can be derailed by small details rather than massive obstacles.
The Leap of Deep Learning and the Dawn of Large Language Models
After the harsh winter, AI evolved into deeper and more complex structures, seizing another chance to leap forward.
4. The Savior Ending the Dark Age: Multilayer Perceptrons and Backpropagation
The end of the first AI winter came with the idea ‘If one line doesn’t work, use many lines,’ realized in the ‘Multilayer Perceptron (MLP).’ MLP added one or more ‘hidden layers’ between input and output layers. These hidden layers nonlinearly transformed data, enabling solutions to problems like XOR that single-layer perceptrons couldn’t solve.
However, training such complex networks was challenging. The breakthrough came with the ‘Backpropagation’ algorithm. Backpropagation calculates how much each connection (weight) contributed to the error by propagating the error backward from the output, enabling effective training of deep neural networks.
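The forward and backward passes can be sketched with NumPy on the XOR problem itself. The hidden-layer size, learning rate, iteration count, and random seed are arbitrary choices for this toy example:

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 units (an arbitrary choice for this sketch).
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

lr = 1.0
for _ in range(15000):
    # Forward pass: input -> hidden (nonlinear) -> output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error from output toward input,
    # crediting each weight for its share of the mistake.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;  b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # approaches [0, 1, 1, 0]
```

The hidden layer bends the input space so that XOR becomes separable, and backpropagation is simply the chain rule applied layer by layer to assign blame for the error.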
5. 2012, The Big Bang of Deep Learning: The Arrival of AlexNet
Though the theoretical tools had existed since the 1980s, deep learning’s potential only exploded once ‘big data’ and ‘GPUs’ arrived. The 14-million-image ‘ImageNet’ dataset released in 2009, combined with the massive parallel computing power of GPUs, completed the trinity of algorithms, data, and compute.
In 2012, Geoffrey Hinton’s team won the ImageNet challenge with the deep convolutional neural network (CNN) ‘AlexNet,’ achieving a remarkable 15.3% top-5 error rate and signaling the start of the deep learning era. This event spread the belief that ‘scale equals performance,’ foreshadowing the emergence of large language models.
6. Understanding the Flow of Time: Recurrent Neural Networks (RNN)
After conquering images, AI’s next target was sequential data like language, where ‘order’ matters. The model developed for this was the ‘Recurrent Neural Network (RNN).’ RNNs create ‘loops’ inside the network to remember previous steps and incorporate that information into current calculations.
However, RNNs suffered from the critical ‘long-term dependency problem,’ forgetting earlier information as sequences grew longer. Although improved models like LSTM and GRU appeared, the fundamental limitation of sequential processing remained.
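The ‘loop’ can be sketched as a single recurrence applied step by step. The dimensions and random weights below are placeholders, and no training is shown:

```python
import numpy as np

def rnn_forward(xs, Wx, Wh, b):
    """Run a simple RNN over a sequence.

    The same weights are reused at every step; the hidden state h
    carries a summary of everything seen so far (the 'loop').
    """
    h = np.zeros(Wh.shape[0])
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h + b)  # new state mixes input and memory
    return h

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 3))        # 5 time steps, 3 features each
Wx = rng.normal(size=(4, 3)) * 0.1   # input-to-hidden weights
Wh = rng.normal(size=(4, 4)) * 0.1   # hidden-to-hidden (the recurrence)
b = np.zeros(4)
h = rnn_forward(seq, Wx, Wh, b)
print(h.shape)  # (4,)
```

Because each step squeezes the entire past into one fixed-size vector, information from early steps fades as the sequence grows, which is the long-term dependency problem in miniature.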
The Modern Deity: Transformers and Large Language Models
In 2017, a single paper changed everything, opening the true era of large language models.
7. “Attention Is All You Need”: The Transformer That Changed the World
Google’s 2017 paper introduced the revolutionary ‘Transformer’ architecture, completely abandoning the ‘recurrent’ structure that was the foundation of sequential data processing.
Transformers use a ‘self-attention’ mechanism to lay out all words at once and simultaneously calculate the importance of relationships each word has with every other word in the sentence.
This approach fundamentally solved the long-term dependency problem and allowed all computations to be processed in parallel, maximizing GPU performance. This overwhelming efficiency opened the era of ‘large’ language models previously unimaginable.
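A minimal sketch of scaled dot-product self-attention follows: a single head with random weights and no masking, all shapes chosen purely for illustration:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of word vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # every word attends to every word
    # Softmax turns scores into importance weights; each row sums to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 8))              # 6 "words", 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)  # (6, 8) (6, 6)
```

Note that the score matrix relates every position to every other position in one matrix multiplication, so nothing has to be processed step by step, which is what makes the architecture so GPU-friendly.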
8. The Age of Giants: BERT and GPT
Based on transformers, two giant models dominating natural language processing emerged: BERT and GPT. To simplify, BERT is the detective, GPT is the storyteller.
- BERT (Context Detective): Trained using the ‘masked language model’ method, filling in blanks by considering both left and right context. It excels at understanding subtle word meanings and is a core technology in Google Search.
- GPT (Creative Storyteller): Trained to predict the most probable next word given previous words. This autoregressive method empowers creative text generation. GPT-3 notably demonstrated ‘few-shot learning,’ performing new tasks with just a few examples, opening possibilities for general AI.
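GPT’s autoregressive idea can be illustrated with a toy bigram table. The vocabulary and counts below are invented for the example, and greedy selection stands in for full probabilistic sampling:

```python
# Hypothetical toy "model": counts of which word follows which.
bigram_counts = {
    "the": {"cat": 3, "dog": 1},
    "cat": {"sat": 2, "ran": 1},
    "dog": {"ran": 2},
    "sat": {"down": 3},
    "ran": {"away": 2},
}

def generate(start: str, max_words: int = 5) -> list[str]:
    """Greedy autoregressive generation: always pick the likeliest next word."""
    words = [start]
    for _ in range(max_words):
        nxt = bigram_counts.get(words[-1])
        if not nxt:
            break  # no known continuation, stop generating
        words.append(max(nxt, key=nxt.get))  # most probable next word
    return words

print(generate("the"))  # ['the', 'cat', 'sat', 'down']
```

A real GPT replaces the lookup table with a transformer conditioned on the entire preceding context and samples from the predicted distribution, but the loop is the same: predict the next word, append it, repeat.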
9. The Birth of Human-like AI: ChatGPT’s Secret, RLHF
GPT-3 was impressive but sometimes lied or generated harmful content. Aligning the model with human intentions and values was necessary.
The key technology that solved this and gave birth to ChatGPT is ‘Reinforcement Learning from Human Feedback (RLHF).’
- Step 1 (Instruction Tuning): Teach the model to follow user instructions using an ‘instruction-answer’ dataset.
- Step 2 (Reward Model Training): Humans rank multiple answers, and the model learns to score answers, creating a ‘judge AI.’
- Step 3 (Reinforcement Learning): The step 1 model generates answers, the step 2 judge scores them, and the model adjusts itself to achieve higher scores.
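The ‘judge AI’ of step 2 is commonly trained with a pairwise ranking loss (a Bradley-Terry style objective, sketched here with placeholder scores):

```python
import math

def reward_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss for training the reward model.

    The judge is trained so the human-preferred answer scores higher:
    loss = -log(sigmoid(r_chosen - r_rejected)).
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# The loss shrinks as the preferred answer's score pulls ahead,
# and grows when the model ranks the rejected answer higher.
print(round(reward_ranking_loss(2.0, 0.0), 3))  # small loss
print(round(reward_ranking_loss(0.0, 2.0), 3))  # large loss
```

Minimizing this loss over many human-ranked answer pairs teaches the reward model to score responses the way human labelers would, and that score then drives the reinforcement learning in step 3.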
Recently, more efficient techniques like ‘Direct Preference Optimization (DPO)’ that simplify this process have gained attention.
Status and Future of Large Language Models in South Korea
Amid the LLM race triggered by ChatGPT, Korean companies fiercely compete to secure ‘sovereign AI’ deeply understanding Korean language and culture.
Comparison of Leading Korean LLMs
| Developer | Model Name | Key Features |
|---|---|---|
| Naver | HyperCLOVA X | Built on massive Naver data, Korean-language specialization, ‘Thinking’ feature, integration with Naver’s own services (search, shopping, etc.) |
| Kakao | Koala (formerly KoGPT) | Open source (commercial use allowed), lightweight and efficient, strong Korean performance, multimodal support |
| SKT | A.X (A.Dot.X) | Developed ‘from scratch,’ multimodal (VLM), high-performance document encoder, telecom specialization |
| LG AI Research | EXAONE | Expert AI, hybrid of reasoning and generation, specialized in math/coding/science |
| Upstage | SOLAR | Small language model (SLM) with top-tier performance, high efficiency and cost-effectiveness, #1 on global leaderboards |
In this fierce competition, the ‘Open Ko-LLM Leaderboard’ led by Upstage serves as a standard benchmark objectively comparing domestic models’ performance, contributing to the development of Korea’s AI ecosystem.
Conclusion: The Journey Toward Agent AI and Our Challenges
The AI journey that began with dice throwing has, in about 70 years, opened the era of large language models that converse like humans. Now the technology is advancing to its next stage: ‘agentic AI.’ Agentic AI systems act as active problem solvers, autonomously setting goals, making plans, and using tools to carry out complex tasks.
Facing this dazzling future, Geoffrey Hinton warns of superintelligence risks, and Yann LeCun points out the current LLMs’ lack of ‘common sense,’ advocating for new architectures. Their debates show we stand not at the peak of technology but at a new starting point.
Key Summary
- A journey starting from probability: AI began with a probabilistic approach inspired by casino probability games, seeking the ‘most plausible answer’ rather than perfect calculation.
- The transformer revolution: The transformer architecture overcame sequential processing limits, maximizing parallelism and contextual understanding, ushering in the LLM era.
- Alignment with humans and the future: ChatGPT was aligned with human intent via RLHF, and now AI is evolving toward ‘agent AI’ that plans and acts autonomously.
The next chapter—how we develop and responsibly integrate this powerful new technology into society—is in all our hands. Why not experience one of the domestic LLMs introduced today to feel their potential and limitations firsthand?
References
- Monte Carlo Method - Namu Wiki
- Dartmouth Workshop - Wikipedia
- AI Winter - Wikipedia
- What is Backpropagation? - IBM
- ImageNet - Wikipedia
- What is a Recurrent Neural Network (RNN)? - AWS
- [1706.03762] Attention Is All You Need - arXiv
- The Illustrated Transformer - Jay Alammar
- Illustrating Reinforcement Learning from Human Feedback (RLHF) - Hugging Face
- Status and Comparison of Domestic LLM Models - MSAP.ai