
NPU, the Unsung Hero of the AI Era: Brain-Inspired Semiconductors, Global Giants, and Korean Challengers

phoue


The New Brain That Powers AI’s Heartbeat: The Emergence of the NPU

We Know CPU and GPU, But What Is NPU?

When we talk about a computer’s “brain,” we usually think of the CPU (Central Processing Unit). We also know that a GPU (Graphics Processing Unit) is essential for handling vivid graphics. However, with the full onset of the AI era, the term ‘NPU’ is increasingly heard. What exactly is this new processing unit, distinct from CPUs and GPUs?

NPU stands for ‘Neural Processing Unit’, a dedicated hardware designed specifically to run artificial intelligence (AI) and deep learning algorithms. To understand its role, imagine an office team:

  • CPU (Central Processing Unit): The capable 'team leader' who handles complex and varied instructions sequentially. It manages the operating system and prioritizes tasks, but struggles when faced with thousands or millions of simple repetitive tasks at once.
  • GPU (Graphics Processing Unit): Originally a 'large task force' hired for graphics processing, specialized in running thousands of simple calculations in parallel. This parallelism suits graphics and later proved ideal for AI's large-scale matrix multiplications, making GPUs the key AI workhorse.
  • NPU (Neural Processing Unit): The newly joined 'AI specialist analyst' focused solely on neural network computations. Designed for this single purpose, it processes AI tasks much faster and more efficiently than the team leader (CPU) or the large task force (GPU).

If the CPU is the team leader handling sequential tasks, the GPU is the large parallel workforce, and the NPU is the analyst specialized exclusively in AI computations.

Why Were Existing Brains Insufficient?

Why couldn't the powerful GPU team alone satisfy AI's demands? The AI era fundamentally changed the type and volume of data we process. While past data was mostly text, images and video now flood in, and applications that must analyze massive data streams and make decisions in real time have surged.

Deep learning’s core involves performing numerous matrix multiplications and convolutions simultaneously and repeatedly. GPUs excelled at this due to their parallelism, outperforming CPUs, but had fundamental limits. GPUs were originally designed as general-purpose chips for graphics, not optimized solely for AI computations. This led to two major issues:

First, enormous power consumption. Data centers running AI training and inference deploy thousands or tens of thousands of GPUs, resulting in staggering electricity use and heat generation, earning GPUs the nickname “power hogs.” This sharply increased data center operating costs. Second, embedding GPUs in battery-powered small devices like smartphones or IoT gadgets was difficult due to high power and heat.

Thus, the AI era demanded not just speed but power efficiency—doing more computations with less energy. This was critical not only technologically but also for AI’s sustainability and economics. Expanding AI from data centers to handheld devices required a new approach.

Birth of Brain-Inspired Processors

The answer to this demand was the NPU. Its defining feature is mimicking the human brain’s operation—neural network architecture implemented in hardware. The human brain processes information in parallel through countless neurons and synapses. NPUs emulate this by performing core AI operations like matrix multiplications simultaneously across many small processing units.

Thanks to this brain-like structure, NPUs achieve much higher efficiency than GPUs in AI tasks. By shedding unnecessary functions and focusing solely on AI computations, NPUs deliver faster speeds with significantly lower power consumption for the same tasks.
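The parallelism NPUs exploit can be seen in miniature in pure Python. The sketch below is illustrative only (no real accelerator involved): it shows the dense matrix multiplication at the heart of a neural-network layer, where every output element is an independent multiply-accumulate. That independence is exactly what GPUs and NPUs spread across thousands of small processing units.

```python
def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, as nested lists."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):          # each (i, j) cell below is independent,
        for j in range(n):      # so all m*n cells could run in parallel
            acc = 0.0
            for p in range(k):  # one multiply-accumulate chain per cell
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out

# A toy "layer": a 1x3 input activation times a 3x2 weight matrix
x = [[1.0, 2.0, 3.0]]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(matmul(x, w))  # -> [[4.0, 5.0]]
```

A CPU walks through these loops one cell at a time; an NPU dedicates hardware multiply-accumulate units to many cells at once, which is where the speed and power savings come from.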

This trait has dramatically expanded NPU applications. Real-time face recognition, subject separation in photos, and voice-to-text conversion on smartphones—on-device AI—became possible thanks to NPUs. In data centers, NPUs are gaining attention as cost-effective alternatives to GPUs for the inference phase of large language models (LLMs). In autonomous vehicles, NPUs serve as the critical brain for real-time environment perception and decision-making.


In summary, the rise of NPUs marks a paradigm shift in AI hardware—from “just fast” to “efficiently fast,” enabling AI to permeate all devices beyond data centers.

The Dawn of the AI Semiconductor War: Global Big Tech’s NPU Strategies

The battle to dominate the vast AI market has escalated into a ‘chip war’ centered on hardware, especially NPUs. This war is not just about making faster chips; each company builds strategic territories aligned with their strengths and visions. Let’s examine the moves of the three giants: Nvidia, Google, and Apple.

The Emperor’s Defense: Nvidia’s GPU Empire and Software Moat

The undisputed leader in the AI chip market is Nvidia. It dominates 80–90% of the data center GPU market used for AI training, building a near-monopoly empire. Its latest architectures like Blackwell deliver overwhelming performance that competitors struggle to match.

Nvidia’s true strength lies beyond hardware, in the powerful ‘CUDA’ software ecosystem built over 15 years.

However, Nvidia’s real power isn’t just hardware. Its empire is secured by the deep and broad software moat called ‘CUDA’. Introduced in 2007, CUDA is Nvidia’s parallel computing platform and programming model. Developers use CUDA to fully leverage Nvidia GPUs for AI model development. The CUDA ecosystem, accumulated over 15+ years, includes vast libraries, optimized tools, and a large developer community. This makes it hard for AI developers to switch to other chips, as rewriting software for new hardware demands huge time and cost. This is the toughest barrier for competitors.

Nvidia is evolving beyond chip manufacturing into a ‘full-stack AI platform company.’ It offers the Omniverse platform for AI training and simulation in virtual worlds, the NeMo framework for conversational AI development, and even sells entire data center racks loaded with tens of thousands of GPUs. Its vision now targets ‘Physical AI’ and ‘Agentic AI’—AI interacting with the real world through robots and autonomous vehicles—expanding AI’s reach beyond the digital realm.

Challenger’s Ambush: Google’s TPU and the Inference Market

One of Nvidia’s strongest challengers is Google. Operating vast AI services like Search, Photos, and Translate, Google faced huge computational demands. Relying solely on Nvidia GPUs was costly and inefficient. So, Google developed its secret weapon: the Tensor Processing Unit (TPU).

Google takes a different path in the ‘inference’ market with its TPU optimized for its services.

TPUs emerged around 2015, optimized for Google’s AI framework TensorFlow. They powered the inference phase of AlphaGo’s matches against Lee Sedol. TPUs are highly optimized for the ‘inference’ stage—using trained models to generate outputs—rather than the ‘training’ stage of building models.

If the AI market is a car race, ‘training’ is designing and tuning the car for peak performance, while ‘inference’ is the actual continuous driving on the track. Nvidia GPUs excel at training, but Google’s TPUs are hyper-optimized for inference, delivering the best fuel efficiency and performance.


Google's TPU has evolved through generations. After the 6th-generation 'Trillium,' Google unveiled the 7th-generation 'Ironwood,' which maximizes inference performance and roughly doubles performance per watt compared to its predecessor, underscoring Google's strategy to dominate the growing inference market. Rather than attacking Nvidia's training market head-on, Google targets the expanding inference market with a flank attack, aiming to reshape the AI chip war.

The On-Device King: Apple’s Neural Engine and Privacy Fortress

Apple’s strategy focuses not on data centers but on devices in our hands. Its goal is to deliver powerful AI experiences without cloud dependency while rigorously protecting user privacy. At the core is the Apple Neural Engine (ANE).

Apple pursues powerful on-device AI and privacy protection through the Neural Engine (ANE) embedded in M-series chips.

ANE debuted in 2017 with the A11 Bionic chip in the iPhone X and has since become a key component across all iPhones, iPads, and M-series Macs.

ANE’s performance growth is remarkable. From 0.6 TOPS (trillions of operations per second) in 2017’s A11 to 11 TOPS in M1 (2020), 18 TOPS in M3 (2023), and a staggering 38 TOPS in the latest M4 (2024)—a 60-fold increase in just seven years.
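As a quick sanity check on that claim, using only the figures cited above:

```python
# Apple Neural Engine throughput by chip generation, in TOPS (figures as
# cited in the text above).
ane_tops = {"A11 (2017)": 0.6, "M1 (2020)": 11, "M3 (2023)": 18, "M4 (2024)": 38}

growth = ane_tops["M4 (2024)"] / ane_tops["A11 (2017)"]
print(f"A11 -> M4: {growth:.0f}x")  # roughly 63x, the ~60-fold jump in seven years
```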

Apple’s obsession with ANE performance is clear: to achieve the best user experience and privacy by processing all AI computations on-device. This avoids sending sensitive data to external servers, enhancing security and enabling fast, reliable operation without network dependence. Many impressive Apple features rely on ANE:

  • Security and Authentication: 3D face recognition unlocking with Face ID.
  • Camera: Smart HDR for detail in backlit scenes, Night Mode for bright, clear low-light photos.
  • Productivity: Live Text for copying text from photos, real-time call translation.
  • AI Assistant: Siri and the recently announced Apple Intelligence, much of which runs on-device without a network connection.

Apple supports developers with the Core ML framework to easily integrate ANE’s power into apps, building a robust on-device AI ecosystem.

Thus, the global AI chip war unfolds across three main fronts: data center training (Nvidia), data center inference (Google), and on-device experience (Apple). This market segmentation opens crucial opportunities for Korean companies.

Global AI Chip Giants: Strategy Comparison

Company | Flagship Chip/Architecture | Core Strategic Advantage
Nvidia | Blackwell GPU | CUDA software ecosystem & full-stack platform
Google | Ironwood TPU | Extreme efficiency for own services, Google Cloud integration
Apple | M4 Neural Engine | Hardware/software integration, privacy, power efficiency

Ironically, Nvidia’s biggest threat isn’t a better chip but its largest customers developing their own chips. Hyperscalers like Google, Amazon, Microsoft, and Meta reduce reliance on expensive Nvidia GPUs by creating custom NPUs (ASICs) optimized for their services. This validates the fundamental value of NPUs—specialized hardware outperforms general-purpose chips—and signals a diversified future for AI hardware.


Challenging the ‘Nvidia Fortress’: The Current State of Korean NPU Development

As global giants expand their AI semiconductor domains, South Korea’s semiconductor industry is boldly entering this massive wave. Refusing to rest on its memory semiconductor dominance, Korea actively seeks new growth in the AI semiconductor market, known as the crown jewel of system semiconductors. Korea’s strategy is a clever ‘pincer movement’ targeting both the data center inference market and the on-device edge market, rather than attacking Nvidia’s core directly. From giants like Samsung and SK to well-funded startups, let’s explore the current status of the Korean NPU dream team.

The Giants' Diversionary Attack: Samsung Electronics and SK Group

Samsung’s On-Device Powerhouse: Exynos NPU

Samsung Electronics has long developed its own mobile AP (Application Processor) series, Exynos. Leveraging this, it focused early on NPU development for on-device AI computations. The journey began with the 2018 Exynos 9820, whose NPU boosted AI computation capability by 7 times over the previous generation, heralding the on-device AI era.

Recently, Samsung showcased the pinnacle of its NPU technology with the Exynos 2400. Featured in the Galaxy S24 series, this chip’s NPU performance improved by about 14.7 times compared to the Exynos 2200. This powerful NPU enables core features of Galaxy AI, such as real-time call translation without internet and image generation from text prompts—demonstrating how NPU technology becomes a flagship product’s key competitive advantage.

SK’s Strategic Shift: From Sapeon to the Rebellion Alliance

SK Group also entered the AI semiconductor market early. Starting as a subsidiary of SK Telecom, ‘Sapeon’ focused on data center inference NPUs. In 2020, it launched Korea’s first data center AI chip, the X220, followed by the significantly improved X330. The X330 achieved over 4 times the computational performance and twice the power efficiency of its predecessor, earning recognition as a competitor to Nvidia’s mid-range inference card L40S. It was validated for use in Supermicro servers, increasing commercialization prospects.

However, SK envisioned a bigger picture. To avoid excessive competition among domestic AI semiconductor startups and build a 'national champion' capable of competing globally, SK announced in 2024 the merger of Sapeon with another NPU powerhouse, Rebellions. This strategic move unites scattered technologies and resources into an alliance aimed at challenging Nvidia, a milestone in Korea's AI semiconductor history.

Fearless Challengers: Korea’s Fabless Trio

Another pillar of Korea’s NPU ecosystem is innovative fabless startups armed with cutting-edge technology. Backed by large corporate investments, they achieve remarkable results in their domains.

Startups like Rebellions, FuriosaAI, and DeepX are leading the future of 'K-semiconductors'.

Rebellions: Rising Star with Massive Investment

Rebellions, founded just over three years ago, has attracted 280 billion KRW in cumulative investment, drawing strong market attention. Its flagship product is the data center inference NPU 'ATOM'. Rebellions' major achievement is ATOM's successful commercial deployment: KT, a strategic investor and key partner, introduced ATOM in its cloud data centers, launching Korea's first cloud service based on domestic NPUs. ATOM also powers a lightweight version of KT's large AI model 'Mi:dm', proving that domestic NPUs can serve as the brain of AI services.

Rebellions' next goal is the global market. Partnering with Samsung Electronics, it is jointly developing the next-generation AI semiconductor 'REBEL', targeting the large language model (LLM) market exemplified by ChatGPT. This chip will be Rebellions' key weapon for global expansion.

FuriosaAI: Performance-Focused Technical Challenger

FuriosaAI is another NPU powerhouse challenging Nvidia with outstanding performance and efficiency. Founded in 2017, it stunned the market with its second-generation chip 'Renegade' (RNGD), the successor to the first-generation 'Warboy'. Renegade delivers 512 TOPS at just 180W, boasting superior power efficiency compared to competing GPUs; this prowess attracted an acquisition offer from Meta, which FuriosaAI declined.

A decisive validation came through a partnership with LG AI Research. LG tested Renegade in place of expensive Nvidia GPUs to run its large AI model 'Exaone' and confirmed meaningful performance, a symbolic event demonstrating that domestic NPUs are realistic alternatives to GPUs in major LLM infrastructure. FuriosaAI also supplies chips for AI startup Upstage's OCR solutions, steadily building commercialization cases.
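Using only the figures quoted above, Renegade's headline efficiency is easy to put in perspective:

```python
# Power efficiency from the spec figures cited in the text: 512 TOPS at 180 W.
tops, watts = 512, 180
efficiency = tops / watts
print(f"Renegade: {efficiency:.2f} TOPS/W")  # about 2.84 TOPS per watt
```

TOPS per watt is the metric that matters most for inference economics, since data center operating cost is dominated by power and cooling rather than the chip's peak speed.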

DeepX: Niche Leader Targeting the Edge AI Market

While Rebellions and FuriosaAI target data centers, DeepX aims to dominate the ultra-low-power edge AI market, where AI computations run directly on devices at the network edge such as smartphones, appliances, robots, and CCTV. DeepX's core strength is an NPU portfolio optimized for the diverse requirements of edge devices, supplying customized AI semiconductors for physical security, smart factories, robotics, and smart mobility.

Beyond chips, DeepX provides full-stack software, including compilers, runtimes, and SDKs, to ease AI model development and deployment. Awarded three innovation awards at CES 2024, DeepX has established unmatched competitiveness in the edge AI niche.


Korea’s NPU development forms an organic ecosystem where large companies and startups play complementary roles. Fabless startups innovate in design, Samsung Electronics and SK Hynix supply foundry and high-bandwidth memory (HBM), and conglomerates like KT and LG open initial markets. This unique ‘Korean NPU Alliance’ is the most realistic and powerful strategy to challenge Nvidia’s dominance.

Korean NPU Challengers: Competitive Landscape

Company | Flagship Product | Key Partners / Sponsors
Samsung LSI | Exynos 2400 | Samsung Electronics MX Division (Galaxy)
Rebellions (merged with Sapeon) | ATOM / REBEL / X330 | KT, SK Telecom, Samsung Electronics, IBM
FuriosaAI | Renegade | LG, Upstage, Naver
DeepX | DX-M1 & portfolio | Various industrial clients

The Race Toward the Future: NPU Market Outlook and Korea’s Challenges

The AI semiconductor market has just begun its explosive growth. The future ahead is both a huge opportunity and a harsh test for K-semiconductors. Let’s examine the market potential and the challenges Korea must overcome to win this fierce race.

A Trillion-Won Opportunity: NPU Market Outlook

Market analysts predict a very bright future for the AI chip market. The global AI chip market is expected to grow to hundreds of billions of dollars by the early 2030s. Particularly, the NPU and on-device AI markets are among the hottest battlegrounds, with annual growth rates between 20% and 35%.

This growth will accelerate as AI permeates all aspects of life—from smartphones and PCs to cars, robots, medical devices, and smart homes. The explosive demand for fast, secure AI operating independently of the cloud will increase the importance of low-power, high-efficiency NPUs. The ‘inference’ and ‘edge’ markets targeted by Korean NPU companies lie at the heart of this growth.

The Real Battlefield: Beyond Hardware to Ecosystem Wars

However, making high-performance chips alone won’t secure future market rewards. The essence of the AI semiconductor war is not hardware specs but a software ecosystem war. This is why Nvidia’s fortress is so strong and why all challengers struggle.

As mentioned, Nvidia’s CUDA platform is a software asset accumulated over 15+ years. Countless developers are familiar with CUDA, and innumerable AI models and applications are built on it. No matter how powerful a new NPU is, without easy-to-use software tools (compilers, libraries, SDKs), it’s useless. Developers don’t want to relearn everything from scratch for new hardware.

This explains why all NPU companies are racing to build their own software stacks. DeepX emphasizes full-stack software roles in compilers and runtimes in its job postings, and Rebellions highlights compatibility with standard frameworks like PyTorch and TensorFlow. These are determined efforts to survive the ecosystem war.

The Korean government also recognizes this issue’s importance. The Ministry of Science and ICT leads the ‘K-Cloud Project,’ a national strategic response investing over 400 billion KRW by 2030 to develop hardware and software infrastructure based on domestic NPUs for data centers. This goes beyond supporting individual companies—it aims to create a massive software ecosystem, a ‘Korean CUDA.’ The project’s success will be a critical factor shaping Korea’s NPU industry future.

Korea’s Path Forward: Challenges and Opportunities

The path for K-semiconductors to become key players in the global NPU market is far from smooth but holds clear opportunities.


Challenges
  1. Software Gap: Nvidia’s CUDA ecosystem built over decades remains a high barrier, requiring sustained massive investment to catch up.
  2. Scale and Capital: Domestic startups’ R&D investments are less than 1% of Nvidia’s. Competing globally demands much larger capital and talent acquisition.
  3. Global Customer Acquisition: Beyond government projects and domestic conglomerates, entering the demanding global big tech and data center markets to secure large contracts is the ultimate survival and growth hurdle.
Opportunities
  1. Niche Market Capture: Focusing on ‘inference’ and ‘edge’ markets where NPUs have structural advantages, rather than Nvidia’s ‘training’ market, is a valid strategy to gain technological edge and market share.
  2. Domestic Ecosystem Synergy: Korea’s world-class memory (HBM) technology and foundry infrastructure offer unparalleled opportunities for NPU fabless companies. Close collaboration between HBM manufacturers like SK Hynix and Samsung Electronics and NPU designers can create strong synergies for next-gen chips.
  3. National Support: Government initiatives like ‘K-Cloud’ provide crucial support for early market creation and accelerated technology development.

Ultimately, the success of Korean NPU companies carries significance beyond national industry; it matters for the global AI industry. The AI industry currently faces risks from overdependence on a single supplier, Nvidia. If Korean alliances like Rebellions-Sapeon or FuriosaAI establish themselves as meaningful alternatives in global data centers, it will signal a strong shift from a GPU-centric AI hardware market to a diversified, NPU-inclusive one.

This marks the start of a major paradigm shift in AI infrastructure, with Korea's challengers standing at the center of this history. As the brain-inspired NPU opens a new AI era, the world is watching what role K-semiconductors will play.


