Can AMD Bridge Nvidia’s Software Moat?
Transient Moats, Betting on Generative AI, AMD's Path Forward
Nvidia’s dominance is often attributed to various “moats” — competitive advantages that are hard to replicate. Industry observers claim Nvidia’s most important advantage is its CUDA software ecosystem. Yet companies like AMD, Intel, and various startups wake up every day believing they can compete with Nvidia regardless of said moat. Why do these underdogs believe they can overcome the CUDA moat? Let’s dig in.
We’ll first discuss the temporal nature of moats. Then we’ll explore how and why Nvidia built a software moat. Finally, in light of our learnings, we’ll discuss what it will take for AMD to compete.
The Transience of Moats
CUDA's extensive history, the sheer breadth and depth of its offerings, and its massive installed base create the impression of an insurmountable challenge for competitors.
However, are moats truly insurmountable?
After all, can’t a competitor spend their way over a moat?
Warren Buffett and Charlie Munger argue that all moats are temporary.
All moats are subject to attack in a capitalistic system so everybody is going to try. If you got a big castle in there, people are going to try to figure out how to get to it. What we have to decide – and most moats aren’t worth a damn in capitalism, I mean that's the nature of it – but we are trying to figure out what is keeping that castle still standing and what's going to keep it standing or cause it not to be standing in 5, 10, 20 years from now? What are the key factors and how permanent are they? How much do they depend on the genius of the lord in the castle?
Warren says the question isn’t if, but rather how long and why.
There’s widespread agreement that Nvidia's software ecosystem is a durable competitive advantage. But most observers are simply pointing out the existence of the moat or marveling at its size.
When we understand moats to be temporary, we can think differently. A temporary moat is simply a head start, and software generally gives a shorter head start than deep tech.
Let’s explore the factors contributing to the permanence of Nvidia’s CUDA moat.
As an aside: Comparing deep tech moats like ASML’s EUV lithography against software moats would be really interesting. So would the question “what moats do the GPU Rich have, and what happens when AI hardware becomes abundant?” If anyone reading this is thinking about questions like these, email me!
Expansive Ecosystem
Nvidia provides a comprehensive development toolkit – including compilers, drivers, and APIs – that simplifies the task of optimizing algorithms for parallel execution on GPUs. Over time, Nvidia widened the platform’s reach with software development kits tailored to a variety of domains and deepened its capabilities with significant core library and domain-specific optimizations.
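To make “simplifies the task” concrete, here’s a minimal sketch of the CUDA programming model, written with Numba’s Python bindings rather than CUDA C++ purely for brevity. The kernel body describes what a single GPU thread does; the toolchain handles compilation, thread scheduling, and (in Numba’s case) host-to-device copies:

```python
import numpy as np
from numba import cuda

# A SAXPY kernel: each GPU thread computes one element of out = a*x + y.
@cuda.jit
def saxpy(a, x, y, out):
    i = cuda.grid(1)              # this thread's global index
    if i < x.size:                # guard threads past the end of the array
        out[i] = a * x[i] + y[i]

n = 1_000_000
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.zeros_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
saxpy[blocks, threads_per_block](2.0, x, y, out)  # Numba moves arrays to and from the GPU
```

Everything beneath that kernel (the compiler, the driver, the profilers, and the hand-tuned libraries like cuBLAS and cuDNN that most developers call instead of writing kernels themselves) is the ecosystem the moat argument is about.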
The vast breadth and depth of Nvidia's software ecosystem form the bedrock of the CUDA moat argument. Proponents argue it would take a massive amount of manpower for any competitor to catch up to Nvidia’s head start — after all, Nvidia began this journey back in 2006 and employs thousands(?) of engineers dedicated to reinforcing this moat.1
These recent job postings demonstrate Nvidia's ongoing investment in deepening their competitive moat:
CUDA Math Libraries Engineer - Image Processing
We are the CUDA Math Libraries team at NVIDIA… We develop image processing libraries that … are used in healthcare, computer vision, deep learning, autonomous vehicles and countless others… Your contributions may include extending the capabilities of existing libraries as well as building new libraries … implementing new image processing algorithms, defining APIs, analyzing performance, finding appropriate solutions for difficult numerical corner cases…
and
Senior Systems Software Engineer, CUDA Trace and Profiling
Join the NVIDIA Developer Tools team and empower engineers throughout the world developing groundbreaking products in AI, Automotive, Gaming, and High Performance Computing… Develop real-time compute performance analysis tools for NVIDIA GPUs running on Linux, Windows, and embedded operating systems… Enable 3rd party tools developers to write sophisticated tools using the profiling API that you develop
The first job focuses on adding depth to a particular CUDA library, and the second expands the breadth of the platform with tools that help third-party developers build atop Nvidia.
Let’s dig deeper and ask: why did Nvidia build such a vast ecosystem, and does a competitor actually need to reproduce all of this software to chip away at Nvidia’s moat?
The Evolution of Nvidia's Software Strategy
Recall that GPUs were originally not programmable; they were purpose-built graphics chips. In 2001, Nvidia introduced programmable shaders to its GPUs, marking the beginning of GPU programmability. This feature caught the attention of academic researchers, who recognized that the parallelism used in graphics rendering was analogous to that needed for scientific calculations such as solving partial differential equations. These researchers hacked the early programmability of GPUs to support parallel processing tasks beyond conventional graphics.2
In 2003, Ian Buck, a Stanford PhD student on a fellowship from Nvidia, led the development of the Brook streaming language, which made it easier to write general-purpose applications on GPUs. Upon graduation, Buck joined Nvidia and created the Compute Unified Device Architecture (CUDA), which was released in 2006 and ushered in the age of general-purpose computing on Nvidia GPUs. These chips picked up the moniker general-purpose GPUs (GPGPUs).
Nvidia rightly took a platform approach with GPGPUs and built foundational parallel programming libraries to support many different problem domains. For example, here’s Nvidia’s 2011 GPU Technology Conference call for submissions:
If you are pursuing innovative work in parallel computing, we encourage you to submit a proposal in the form of a session or poster submission… Although submitters are not limited to these topic areas, we encourage you to be guided by them.
- Algorithms & Numerical Techniques
- Application Design & Porting Techniques
- Astronomy & Astrophysics
- Audio, Image and Video Processing
- Bioinformatics
- Climate & Weather Modeling
- Cloud Computing
- Cluster Management
- Computational Fluid Dynamics
- Computational Photography
- Computational Physics
- Computational Structural Mechanics
- Computer Graphics
- Computer Vision
- Databases, Data Mining, Business Intelligence
- Development Tools & Libraries
- Digital Content Creation & Film
- Electrical Design and Analysis
- Energy Exploration
- Finance
- GPU Accelerated Internet
- Life Sciences
- Machine Vision
- Machine Learning & AI
- Medical Imaging & Visualization
- Mobile Applications & Interfaces
- Molecular Dynamics
- Neuroscience
- Parallel Programming Languages
- Quantum Chemistry
- Ray Tracing
- Stereoscopic 3D
- Supercomputing
- Visualization
At the time, the terminal value of any given domain was unclear. Driven by customer demand, Nvidia supported as many domains as possible to see what materialized. This customer-driven platform strategy is also known as “throwing spaghetti at the wall and seeing what sticks”.
It was the correct approach and unlocked a first-mover advantage across many domains, begetting developer inertia.
Nvidia’s platform strategy is summarized in their 2012 annual report:
NVIDIA CUDA is a general purpose parallel computing architecture that leverages the parallel compute engine in NVIDIA GPUs to solve many complex computational problems in a fraction of the time required by a CPU. We are working with developers around the world who have adopted and written application programs for the CUDA architecture using various high-level programming languages, which can then be run at significant execution speeds on our GPUs. Developers are able to accelerate algorithms in areas ranging from molecular dynamics to image processing, medical image reconstruction and derivatives modeling for financial risk analysis. We are also working with universities around the world that teach parallel programming with CUDA as well as with many PC OEMs that offer high performance computing solutions with Tesla [Tesla is the name of an Nvidia GPU line] for use by their customers around the world. We also sell directly to supercomputing centers such as Oak Ridge National Laboratory in the U.S. and the National Supercomputing Center in Tianjin, China. Researchers use CUDA to accelerate their time-to-discovery, and many popular off-the-shelf software packages are now CUDA-accelerated.
At the time, Nvidia was touting the following sampling of “killer” apps powered by CUDA: tracking space debris, modeling air traffic, detecting IEDs, reducing radiation from CT scans, operating on a beating heart, simulating shampoo, and cleaning cotton.
Product managers out there will look at that list and ask “how many space junk tracking customers are actually out there, and how many GPUs will they buy?” Enterprise and government solutions generate revenue and buzz but don't drive the chip sales Nvidia needs for true economies of scale.
These are the hard early days of a platform endeavor. The platform has customers with real problems and interesting solutions, but there’s no obvious product-market fit. Companies in this position often attempt to support every incoming customer use case, and prioritization is often a mixture of whoever will pay the most and whichever logo seems most attractive on the “customer stories” page of the website.
Nvidia was clearly still in this discovery stage, exploring the platform's possibilities and seeking out the breakthrough application that would drive massive datacenter GPU demand.
Identifying High-Value Domains and Prioritizing Investments
Nvidia’s early platform sentiment (“there’s value in each of these bets, but it’s not clear which will provide significantly higher returns”) reminds me of the dynamics of early-stage venture investing. Early-stage investors bet on scores of startups that all have merit, hoping a few strike it rich. Y Combinator has invested in thousands of startups and is sustained by a few dozen massive winners like Airbnb and Dropbox.
Nvidia’s platform strategy was similar. They bet on a variety of industries with the hope that something would hit. Early on, the order of magnitude of any given domain’s terminal value was unclear.
At that point, follow-on competitors like AMD faced the uphill battle of entering markets where Nvidia already had a foothold with developers. Overcoming Nvidia’s first-mover advantage would have been a tough task, especially when potential returns were uncertain. It would have been expensive too — competitors would have had to rapidly match the investments Nvidia had spread over a much longer period.
Clearly, copying Nvidia’s “spaghetti against the wall” strategy would have been a futile endeavor.
Product-Market Fit & Winning Bets
If competitors can’t match Nvidia's spread, they must instead place fewer bets on the highest-potential domains. Which raises the billion-dollar question: where to place them?
Choosing which domains to bet on would have been very hard in the years B.C. (Before ChatGPT 😂). Fortunately for competitors like AMD, we are now in year 2 A.D. (AI Dominance), and the winning bet is suddenly obvious.
Deep learning is the transformative domain we’ve all been waiting for, and Generative AI is the broad use case with clear paths to consumer-scale adoption.
The expected value of the software bets now looks like the following:
In the next decade, deep-learning-enabled Generative AI has orders of magnitude higher expected value for the hardware industry than any other high-performance computing domain. GenAI is the use case that will sell the volume of chips needed for economies of scale.
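To make “orders of magnitude” concrete, here’s a back-of-envelope expected-value sketch. The probabilities and market sizes below are purely hypothetical, chosen only to illustrate the shape of the argument:

$$\mathbb{E}[\text{bet}] = P(\text{domain takes off}) \times (\text{chip revenue if it does})$$

A niche HPC domain might be a near-certain bet on modest demand: p ≈ 0.9 on ~$100M/year of accelerator sales gives an expected value around $90M/year. GenAI is a less certain bet on an enormous market: even p ≈ 0.5 on ~$100B/year gives an expected value around $50B/year. Under these made-up but directionally reasonable numbers, the gap is two to three orders of magnitude, which is the entire argument for concentrating bets.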
We have product-market fit.
Breaching the Moat
Until now, Nvidia’s software moat seemed well fortified with a huge head start and first-mover advantages.
Paradoxically, in finding product-market fit, Nvidia has revealed the place for competitors to attack.
Specifically, competitors like AMD and various startups including MatX (hello world!) now know where to place their software bets — all-in on Generative AI.
Competitors can build the bridge over the moat by making their hardware run deep-learning-based generative AI models as fast, cheap, and energy-efficient as possible, unlocking the “10-100X better than GPUs” value proposition that entices Nvidia’s customers. They will gamble all of their software engineers’ time on writing the code needed to support GenAI use cases and make it as easy as possible for developers to make the switch.
For now, competitors should leave the other interesting high-performance computing use cases with small user bases (e.g., computational lithography) for Nvidia to tackle.
AMD’s Path to Success
Now that Generative AI is the obvious place to invest, AMD is in a much more favorable position than it was a year or two ago.
But they aren’t out of the woods yet. To start, ROCm has a bad rap with developers. And of course, Nvidia still has a major head start, an army of software engineers, and a market cap roughly 10x AMD’s.
That said, there are opportunities for AMD thanks to abstraction layers, open source, and industry partnerships.
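Of those three, abstraction layers are the most concrete. On PyTorch’s ROCm builds, for example, AMD GPUs are exposed through the same "cuda" device string used for Nvidia hardware, so typical model code runs unchanged. A minimal sketch of what vendor-agnostic GenAI code already looks like:

```python
import torch

# PyTorch's ROCm builds expose AMD GPUs through the same "cuda" device
# string as Nvidia builds, so this line picks up either vendor's GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(8, 4096, device=device)

# The matmul inside dispatches to cuBLAS on Nvidia and to ROCm's BLAS
# libraries on AMD; the model code itself never mentions the vendor.
y = model(x)
print(y.shape, y.device)
```

If AMD can make this dispatch path fast and reliable for the relatively small set of operations that dominate transformer workloads, most GenAI developers never need to touch ROCm directly; the framework itself becomes the bridge over the moat.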
Can AMD actually build a bridge over Nvidia’s software moat and win over developers to build Generative AI use cases on their platform?
Let’s find out. Behind the paywall we’ll deep-dive into the challenges and opportunities for AMD.