Rene Haas recently discussed Arm’s edge AI strategy in an interview, but a major question remains unanswered.
Let’s start with insights from the podcast.
Rene’s Insights
First, Rene reminds the listeners that we’re in the training epoch, but the inference explosion is coming.
If training is the teacher, inference is the student. And there are far more students than teachers in the universe. And that's why there'll be far more inference workloads than training.
This is obvious to most who follow the industry closely, but it can still be forgotten. Nvidia will tell you inference will stay predominantly in the cloud, while edge companies and finance folks will push inference to the edge. Let the users pay for CapEx and OpEx!
As expected, Rene sees inference coming to the edge. He’s also bullish on GenAI’s ability to unlock fantastic user experiences for alternative form factors.
And [inference] is going to run everywhere. Relative to the smallest devices, whether it's wearables, whether it's a headset and augmented reality, you're not gonna run a 100 watt GPU on your head. I'm sorry, this is not gonna happen, right? You're gonna have to get into very, very different form factors.
Generative AI at the edge requires exceptional power efficiency and minimal thermal dissipation, constraints Arm has engineered for since its founding.
Back in the day, they used to build chips in two different ways. They had plastic packages, which were pretty cheap but not that great in terms of thermals, and ceramic packages, which were much better in terms of heat dissipation but were costly. So one of the directives in the original design was — let's get it in a plastic package. So as a result, from the very early days, the early ARM processor was defined to basically run off a battery.
Future form factors like AR/VR and wearables have the same battery and thermal design constraints.
I also love Rene’s “you’re not gonna run a 100 watt GPU on your head” point. I’m very bullish on the glasses form factor (Meta, Qualcomm), which demands <1W to avoid face discomfort.
Naturally, Rene frames the edge AI future as a hybrid of edge and cloud; I’d expect as much, given Arm’s opportunity in data center inference (e.g., the Grace CPU).
And I think the model will be for these edge devices to run in conjunction with cloud, where you're going to have some processing happening locally, some processing going to be happening in the cloud. You're going to need to have some level of security and authentication and attestation locally so that the models know that it's you and it's not somebody else and the information is kept private to you.
This hybrid approach makes sense in the near future — the biggest, smartest models run in the cloud while tiny models are fine-tuned to device-specific applications and run locally (e.g. Apple Intelligence).
What’s unclear is whether this is the only path forward in 3-5 years, or if we’ll have small local models with enough local compute and memory to run most common workloads at the edge.
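The hybrid split described above can be sketched as a simple routing decision: device-tuned tasks stay local, everything else escalates to the cloud. This is a minimal illustration, not Arm’s or Apple’s actual implementation; the function and task names below are hypothetical.

```python
# Hypothetical sketch of hybrid edge/cloud inference routing.
# ON_DEVICE lists tasks a small, fine-tuned local model can handle;
# anything else is escalated to a larger model in the cloud.

def route(task: str, on_device_tasks: set[str]) -> str:
    """Return where a task should run: locally or in the cloud."""
    if task in on_device_tasks:
        return f"edge:{task}"     # low-latency, private, battery-friendly
    return f"cloud:{task}"        # big-model quality, at network/CapEx cost

# Latency-sensitive, privacy-sensitive tasks suit the local model.
ON_DEVICE = {"dictation", "notification_summary", "photo_search"}

print(route("dictation", ON_DEVICE))    # → edge:dictation
print(route("essay_draft", ON_DEVICE))  # → cloud:essay_draft
```

In practice the routing signal would be richer (battery state, connectivity, model confidence), but the core idea is the same: the edge handles the common, small workloads and the cloud handles the rest.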
Haas laid out Arm’s edge AI vision: systems with a host CPU and AI accelerator.
Now, naturally, a CPU is going to be there. You can't have an accelerator out there without something that's running the main system.
And remember Rene’s earlier point that an edge AI accelerator doesn’t have to be a GPU, especially in size- and power-constrained systems. Instead, Rene believes in Arm host CPUs plus small NPUs.
But also, back to the customization, you could add small AI acceleration, which we do today with our Ethos NPUs, which are four TOPS, eight TOPS, etc. That will do some level of offload.
Rene also mentioned updates to CPUs in the form of Arm ISA extensions that improve parallel compute performance.
We can add more and more capability to our CPUs, which we are today, around extensions that help with AI acceleration.
But that’s for simple (non-GenAI) workloads.
In summary, Arm envisions a future where generative AI expands into various edge devices, likely resulting in increased sales of Arm CPUs to support edge AI accelerators.
Behind the paywall I’ll give my take and list big unanswered strategy questions I still have for Arm.