Microsoft’s problem, I think, is in significant part that they are the big commercial player trying for a local AI play. Like, your local Windows machine does AI inference. In Anthropic’s business model, the inference is cloud-based.
Local is more hardware-intensive, because the capacity utilization of hardware is going to be lower for local AI. If you stick a piece of hardware in a datacenter and lots of people share it, you need less hardware, because when one person isn’t using that hardware, another can be. If you would use local AI hardware 1% of the time, then it’s up to 100 times cheaper from a hardware standpoint to have people sharing parallel compute hardware in a datacenter. So as long as hardware is a constraining factor, Microsoft’s going to have a harder time of it than the cloud guys. Right now that mostly means memory shortages and prices, but cooling counts too, and maybe power if you’re talking about laptops on battery. Cloud-based approaches have the advantage on all of those.
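To put rough numbers on the utilization argument, here is a back-of-the-envelope sketch in Python. The function name `devices_needed` and the 1,000-user figure are made up for illustration, and the model is deliberately idealized: it assumes demand is spread perfectly evenly over time, so it ignores the peak-hour headroom a real datacenter would need.

```python
import math


def devices_needed(users: int, utilization: float, shared: bool) -> int:
    """Idealized count of inference devices needed to serve `users`.

    Assumes each user keeps one device busy for `utilization` of the time
    (0.01 means 1%) and that demand is spread evenly, with no peak headroom.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if shared:
        # Pooled hardware only has to cover the average concurrent load.
        # round() guards against floating-point noise before taking the ceiling.
        return math.ceil(round(users * utilization, 9))
    # Local hardware: every user owns a device, busy or not.
    return users


local = devices_needed(1000, 0.01, shared=False)
pooled = devices_needed(1000, 0.01, shared=True)
print(local, pooled, local / pooled)
```

With 1,000 users at 1% utilization, this gives 1,000 local devices against 10 pooled ones, which is where the 100x figure comes from. Real-world sharing would land somewhere below that ratio, since usage clusters at certain hours.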
Microsoft (and local AI in general) does better if people really want low-latency or always-running workloads, or reliably always-available services, or services where data privacy is critical. There, local AI has the advantage over cloud-based, or at least erodes the cloud-based advantage. Right now, I think that’s just not generally where things are. Could change in the future, but I think that they’re just going to have a hard time of things in the near term. My guess is that Microsoft’s relative potential improves as memory prices come back down.
I think that running local LLMs would be great. But the simple fact is that for most users, it’s just too costly to make sense for a lot of applications with current memory prices.
I got a Framework Desktop, 128GB, specifically to do local generative AI stuff, in 2025. At the time, the system was $2,500, which is already going to be pricey for a number of people for a single-purpose computer. In the months since that shipped, the price on the exact same hardware configuration has gone up to over $6,500. That’s just not a price that a lot of people are going to be willing to pay for a PC. If component supply rises and prices drop back down, then I think that the calculus changes for local AI.
AI companies are acquiring more memory than the entire rest of the world uses. If we want to do the same thing that we could do in the cloud locally and have capacity utilization of 1% on that hardware, then we need a hundred times as much memory as that. That’s…a kind of staggering number.
Fittingly, MS is behind a bunch of small models released on Hugging Face for things like audio transcription and text-to-speech. That suggests they’re moving toward the idea of a full local experience, but haven’t really been able to fully materialise it.
Wouldn’t surprise me if they’re fighting it out internally over the fully local experience they’re kinda moving to vs the frontier experience using cloud and subscriptions they can charge more for.