DeepSeek Pulls Forward Copilot+ PC Adoption
What DeepSeek means for Microsoft Copilot+ PCs, Qualcomm & Arm, and Copilot+ cloud cannibalization
The trajectory of generative AI in consumer devices has always been a matter of when, not if. DeepSeek’s distilled R1 models suggest on-device AI is coming sooner than anticipated.
I’ve written extensively about the benefits of multimodal LLMs for mobile and hands-free form factors, e.g., this Qualcomm piece. Yet I’ve been pushing back on the timing, especially Pat Gelsinger’s aspirational vision of a 2024 AI PC supercycle. At that time, local LLMs were too large and slow, while the slimmed-down versions were less intelligent and still not very fast. If consumers could access high intelligence via the cloud, why choose a dumber and significantly slower local model?
However, DeepSeek R1 pulls us past these technical constraints and user-experience friction.
As I discussed in this recent Apple article, DeepSeek’s distilled models are small enough to fit on a MacBook or iPhone. The models are
intelligent
responsive
rational
The first two, intelligence and responsiveness, are crucial for local LLM adoption.
The ability to think rationally, or reason, is essential for building local agentic applications.
For example, developers will create “business process automation” features for AI PCs that need reasoning models. Imagine the agent app thinking through tough situations:
Austin gave me eight images of receipts from his recent business trip. He wants me to extract the information and paste it into his expenses spreadsheet. Let’s try the first image. Hmm, I have low confidence in this receipt as it’s hard to read. I wasn’t given instructions on how to handle this situation. Should I skip this receipt? That might cause confusion because he is expecting eight entries in the spreadsheet. I’ll still enter the data into the spreadsheet but leave the total blank.
This is an example of the reasoning we’ll need for local AI. Of course, not every use case needs reasoning; ideally, we’ll have models that can “think slow” when given a complex or ambiguous prompt but also “think fast” when asked a simple question.
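To make the fast/slow idea concrete, here’s a minimal sketch of how a local agent might route prompts: quick questions go to a small instruct model, while long or multi-step tasks go to a distilled reasoning model. The model choices and routing heuristic are illustrative assumptions on my part, not anything Microsoft or DeepSeek ships.

```python
# Minimal sketch: route "think fast" prompts to a small instruct model and
# "think slow" prompts to a distilled reasoning model. Model choices and the
# routing heuristic are illustrative assumptions, not a shipped implementation.
from transformers import pipeline

SLOW_THINKER = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # distilled reasoning model
FAST_THINKER = "Qwen/Qwen2.5-0.5B-Instruct"                 # any small instruct model works

def looks_complex(prompt: str) -> bool:
    """Crude heuristic: long or multi-step requests get the reasoning model."""
    cues = ("extract", "reconcile", "plan", "spreadsheet", "steps")
    return len(prompt.split()) > 40 or any(cue in prompt.lower() for cue in cues)

def answer(prompt: str) -> str:
    model_id = SLOW_THINKER if looks_complex(prompt) else FAST_THINKER
    generator = pipeline("text-generation", model=model_id, device_map="auto")
    result = generator(prompt, max_new_tokens=512, return_full_text=False)
    return result[0]["generated_text"]

print(answer("What's the capital of Texas?"))                        # fast path
print(answer("Extract the totals from these eight receipts and "
             "flag any you can't read confidently."))                 # slow path
```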
Note that these agentic applications will generate many intermediate tokens; the resulting user experience will likely be slow, driving demand for more performant hardware and software: more RAM / unified memory, higher memory bandwidth, more matrix/vector compute (TOPS), and algorithmic innovation.
What does this mean for the Microsoft PC ecosystem?
Last October I wrote,
Qualcomm made a big splash back in May as Microsoft's exclusive Copilot+ PC supplier, but the Copilot+ excitement quickly faded when Microsoft Recall was rescinded over security concerns. This shines a light on the pains of not being vertically integrated. Qualcomm built a compliant NPU and OEMs raced to integrate the SoC in various laptops, but the software provider (Microsoft) stumbled and erased everyone’s head start.
This was a reminder that hardware makers—Qualcomm, Intel, AMD, and even Arm—are tied to Microsoft’s AI software roadmap. The best chips can’t drive adoption without great AI software.
Microsoft is working to flip this script and be the partner everyone needs. From Microsoft’s recent post Running Distilled DeepSeek R1 models locally on Copilot+ PCs:
We’re bringing NPU-optimized versions of DeepSeek-R1 directly to Copilot+ PCs, starting with Qualcomm Snapdragon X first, followed by Intel Core Ultra 200V and others. The first release, DeepSeek-R1-Distill-Qwen-1.5B (Source), will be available in AI Toolkit, with the 7B (Source) and 14B (Source) variants arriving soon. These optimized models let developers build and deploy AI-powered applications that run efficiently on-device, taking full advantage of the powerful NPUs in Copilot+ PCs.
Microsoft is racing to get R1’s fast, intelligent AI models into the hands of developers building for Copilot+ PCs. The success of these AI-powered PCs hinges on useful local applications, and that means giving third-party developers the tools they need.
I’m sure Microsoft’s hardware partners are happy about this too.
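Developers don’t have to wait for the AI Toolkit releases to get a feel for these models. Below is a rough sketch of running the distilled 1.5B variant locally through Hugging Face transformers; to be clear, this is a generic CPU/GPU path of my own, not Microsoft’s NPU-optimized ONNX build, and the prompt is just an illustration. One handy property of R1-style distills is that the chain of thought arrives inside <think> tags, so an application can show or hide the reasoning separately from the final answer.

```python
# Rough local sketch of the distilled 1.5B model via transformers (a generic
# CPU/GPU path, not Microsoft's NPU-optimized ONNX build from AI Toolkit).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content":
             "A receipt image is too blurry to read. Should the expense total "
             "be guessed or left blank? Answer briefly."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(inputs, max_new_tokens=512)
text = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# R1-style models emit their chain of thought in <think>...</think> before the answer.
reasoning, sep, final_answer = text.partition("</think>")
if not sep:  # tag not present; treat the whole output as the answer
    reasoning, final_answer = "", text
print("Reasoning:", reasoning.replace("<think>", "").strip())
print("Answer:", final_answer.strip())
```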
How Good Are We Talking?
Microsoft’s blog detailed optimizations of DeepSeek’s models for Copilot+ PCs to reduce battery drain while maintaining performance. The post mentions a time-to-first-token (TTFT) of 130 ms and a throughput rate of 16 tokens/s.
That’s a respectable time-to-first-token. The throughput, however, isn’t anything to write home about, and it applies only to “short prompts (<64 tokens).” That caveat matters: anything beyond a straightforward, one-off question typically exceeds 64 input tokens, since the preceding conversation is fed back through the LLM as context.
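Some back-of-the-envelope math shows why 16 tokens/s is underwhelming for the agentic workloads described earlier. The output-token counts below are my own rough assumptions, not figures from Microsoft’s post.

```python
# Back-of-the-envelope latency at Microsoft's cited figures (130 ms TTFT, 16 tok/s).
# Output-token counts are rough assumptions for illustration.
ttft_s = 0.130
throughput_tps = 16

for label, output_tokens in [("short answer", 100),
                             ("typical chat reply", 500),
                             ("reasoning chain + answer", 2000)]:
    total_s = ttft_s + output_tokens / throughput_tps
    print(f"{label:>26}: ~{total_s:.0f} s for {output_tokens} output tokens")

# ~6 s, ~31 s, ~125 s respectively -- fine for a quick question, painful for the
# intermediate-token-heavy agentic use cases discussed above.
```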
Furthermore, these figures are for the 1.5B model with additional 4-bit (int4) quantization applied to reduce memory usage and power consumption. Shrinking a model this way can cost accuracy or capability, though Microsoft says its quantization techniques were chosen to minimize any loss of intelligence. In its testing, the quantized R1 model held up well, with only small differences from the original, but the blog shows just one example, which could be cherry-picked. It’s a start; real-world validation will tell the whole story. On that note, I look forward to “vibe-checking” the performance of Microsoft’s forthcoming 7B and 14B DeepSeek variants.
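For context on why the quantization matters at all, here’s some rough weight-memory math of my own (it ignores KV cache and activations, so real footprints run higher). Copilot+ PCs start at 16 GB of RAM, and only the quantized variants of the larger distills fit comfortably alongside everything else on such a machine.

```python
# Rough weight-only memory footprint: parameters * bits / 8. Ignores KV cache
# and activation memory, so real usage is higher. Arithmetic is illustrative.
def weight_gb(params_billions: float, bits: int) -> float:
    return params_billions * bits / 8  # billions of params * bytes per param = GB

for size in (1.5, 7, 14):
    print(f"{size:>4}B: fp16 ≈ {weight_gb(size, 16):.1f} GB, int4 ≈ {weight_gb(size, 4):.1f} GB")

# 1.5B: ~3.0 GB -> ~0.8 GB; 7B: ~14 GB -> ~3.5 GB; 14B: ~28 GB -> ~7 GB,
# against the 16 GB RAM floor of a Copilot+ PC.
```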
Behind the paywall, we’ll explore some questions:
What do DeepSeek’s distilled models imply for Copilot+ PC adoption?
How might this impact Qualcomm and Arm’s Windows offerings?
Could local GenAI cannibalize Microsoft Copilot+ cloud offerings?