So are we assuming here that LLMs won’t become more efficient over time? GPT-3 was a frontier model just a few years ago and its performance blew everyone’s mind at the time. I can now run an equivalent LLM on my personal computer. Why can’t we expect that after a few years Claude Sonnet level of capability will be possible to accomplish locally?
A large majority definitely hate it to the point of having blinders on for sure.
On one side you have corpo hype/lies, and on the other, “LLMs are slop garbage and terrible for anything”, as if developers wrote perfect code before LLMs and everything that breaks now is caused by AI slop.
Oh I know, it’s just what I see in every thread when some kind of outage or major bug is discovered. Half the comments are hurrdurr probably vibecoded.
What’s the cost of the compute you have to run something locally?
The majority of people don’t have 32 GB of VRAM to run something remotely as capable.
I remember my computer not being fast enough to even play an MP3 file. Two years later, my computer was capable of running 3D accelerated games, browsing the internet at broadband speeds and playing videos.
Sometimes technology advances fast. We could be entering such an era as there are major investments taking place and global competitors will rise to the occasion to market these to a broader audience.
I think it will be entirely possible for consumers to use a decent LLM on their computer in a few years’ time.
It’s not the 90s anymore. Unless there’s a compression algorithm putting billions of relationships into a manageable size, local AI is highly specific under 8G vram (text-to-speech as an example is under 1G) let alone the context required for keeping a conversation or writing code.
To be clear, I wasn’t talking about a leap in LLM design. I was talking about a leap in hardware capabilities…
Which are increasingly out of reach for a normal person. Prices for phones, let alone PC hardware, have increased exponentially in recent history.
Improved hardware capabilities used to come very quickly (see Moore’s Law and Dennard Scaling). However, that trend is basically over, so getting higher performance now takes a lot of effort to specialize hardware for certain tasks. That’s why you see inference accelerators like Groq, SambaNova, Cerebras, etc. However, this is hardware that is still gonna go into data centers. Something innovative has to happen on the AI side for commercial-grade models to be runnable on consumer hardware.
If text-to-speech is what Youtube uses to autogenerate the subtitles, it is worthless for anything that uses slightly richer vocabulary.
No. Autogenerated subtitles would be speech-to-text, rather than text-to-speech.
lfm2 works like greased lightning on the NPU built into the current macbook M5.
Describe greased lightning, because it’s much slower and needs to handle compression for context
We’re moving in that direction but an M5 is not what the majority of people are running at home
I dunno man, I’m not a slopjockey so I don’t know the minutiae of the addiction.
All of our devs appear to have M5s right now. All of those copilot+ laptops have NPUs too.
Your company has bought you the latest and greatest and likely supports commercial token usage too
You can’t compare LLMs at scale to running one locally; it’s not the same experience or capabilities.
“Latest and greatest” my fucking sides lmao
My company gave me some US shitware and I’ve got some local shitware instead.
If you can’t make that work and are dependent on the teat of the slopgenerators, that’s a skill issue on you, buddy.
I’ve got an old 1060ti in my server. Ollama shares it with just a couple other containers. Electricity here is majority hydro with some natural gas, $0.08/kWh.
It’s a little slow, but I can comfortably run qwen3:14b. Of course that’s not all done on the GPU, a large part is offloaded to server ram (generally 32GB available so more than enough headroom)
My server and my gaming PC combined last month came out to $13.32
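For scale, that bill can be turned back into an average power draw. A rough sketch, assuming a 30-day month and that the whole $13.32 is energy billed at the $0.08/kWh rate mentioned above (no fixed fees or tiered pricing, which is an assumption):

```python
def average_watts(bill_usd: float, usd_per_kwh: float, days: int = 30) -> float:
    """Back out the average power draw (in watts) from a monthly energy bill."""
    kwh = bill_usd / usd_per_kwh   # total energy used over the billing period
    hours = days * 24              # assumed length of the billing period
    return kwh * 1000 / hours      # kWh -> Wh, then divide by hours

kwh_used = 13.32 / 0.08
print(round(kwh_used, 1))                 # 166.5
print(round(average_watts(13.32, 0.08)))  # 231
```

Under those assumptions that is a bit over 230 W averaged across both machines for the month.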
How does that compare to closed models that Anthropic offers, at the context and scale they offer.
I run Qwen3.6 27B locally and it’s usable with 16G vram but still not the same as a data centre of Blackwell clusters.
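A back-of-the-envelope way to see why VRAM is the bottleneck: the weights alone take roughly parameter count times bits per weight. This is only a sketch; the quantization levels are illustrative, and it ignores the KV cache for context, activations, and runtime overhead, so real usage is higher.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    # params_billion * 1e9 params * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8

for name, params in [("14B", 14), ("27B", 27)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_gb(params, bits):.1f} GB")
```

At 4 bits per weight, a 27B model comes to about 13.5 GB of weights, which would be consistent with it being usable but tight on a 16 GB card.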
They could, but what’s the plan here, exactly? That all these for profit companies who are currently publishing models for free, like Qwen, will continue to do so in the future?
Why not? Why does Microsoft develop its .NET ecosystem? Why does Google develop Go/Dart? It costs them lots of money and they give it away for free.
The answer is: they don’t earn money on it directly, but these tools are a way to tie programmers to their cloud services. If you use .NET you’ll probably end up on Azure. If Go - probably you’ll use GCP.
So I suspect the same will be with LLMs. At some point they will say: “hey, you can use this LLM however you want, but as you are already using it, then you may want to know our platform is optimized for it”
Why can’t we expect that after a few years Claude Sonnet level of capability will be possible to accomplish locally?
Because when you’re old enough to remember what AIM chatbots could do 25 years ago, it stops being impressive what today’s chatbots can do…
It only seems “new” because everyone hated it and it was just a novelty back then.
But if you read up on them, they did 90% of what modern ones do. And if they had access to today’s computing, the only explanation for why they still suck so much, is that no one has ever wanted them.
The oligarchs just decided it didn’t matter
Because when you’re old enough to remember what AIM chatbots could do 25 years ago, it stops being impressive what today’s chatbots can do…
C’mon, that’s just silly.
It already happened, small language models are busy dragging their nutsack on frontier models, running on a macbook and costing nothing
Where’s the fucking product, Sam?
Wake me up when this says “yes”.
Profit ≠ success
No wonder OpenAI is losing a lot of money.
The author is right and wrong. It’s subsidised, but not by Anthropic. The power users who use their plans to the limit are subsidised by the rest of the users. I’m an AI hater but I do think Anthropic will be profitable next year. Their revenue growth is insane and looks to just be getting started. Claude Code took enterprise by storm and now Cowork is out.
What is the actual “cost” after they buy the hardware, is that $1000 really pure power usage cost?
That’s the $84,000 question. They’re filling datacenters with the fastest possible equipment and need it to be 10x faster. That hardware is dinosaur fodder a year after they install it.
I’m curious as well. My knowledge is probably quite outdated, but from what I understood the training part is what’s expensive and then querying the model is pretty cheap. Is it still true (or was it ever) that the generated answers on search engines are cheaper to generate than the actual search results?
It is, sorta. Training is orders of magnitude more intensive than inference, but we infer billions of times within a model generation.
I find that hard to believe. I recently had to uninstall Copilot after it weaseled its way into my search bar. It’s not an exaggeration to say that my PC literally ran Cyberpunk 2077 with path tracing better than it ran the fucking Windows search bar with Copilot.
Look at the public numbers, it seems true. Copilot on your taskbar is just windows being garbage, not the AI being bad. Just look at self-hosted AI and measure the power costs of your queries. It’s tiny.
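If you want to measure it yourself, the arithmetic is simple: energy is average power draw times generation time. A minimal sketch; the 200 W draw, 30 second generation time and $0.08/kWh price below are placeholder numbers, not measurements.

```python
def query_energy_wh(avg_watts: float, seconds: float) -> float:
    """Energy used by one query in watt-hours."""
    return avg_watts * seconds / 3600

def query_cost_usd(avg_watts: float, seconds: float, usd_per_kwh: float) -> float:
    """Electricity cost of one query, given a price per kWh."""
    return query_energy_wh(avg_watts, seconds) / 1000 * usd_per_kwh

# Placeholder inputs: read the real draw off a wall meter or your GPU tooling.
energy = query_energy_wh(avg_watts=200, seconds=30)
cost = query_cost_usd(avg_watts=200, seconds=30, usd_per_kwh=0.08)
print(f"{energy:.2f} Wh per query, ${cost:.6f} per query")
```

This only covers electricity on hardware you already own; it says nothing about what a hosted provider pays per query.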
That’s just a shitty front end interface implementation, it has nothing to do with the actual inference run by the models.
The problem is that the hardware has a 5 or 6 year depreciation schedule on paper, but NVIDIA keeps saying that their next generation chip will be twice as good as their last chip so there is a FOMO schedule of like every two years.
Would be nice to see that used hardware for sale rather than it being junked as a writeoff.
That’s how it goes for any industry in its growth phase. A lot of money is spent on research and infrastructure before it starts to collect revenue.
Yeah, I thought everyone was aware they are building datacenters and basically investing in infrastructure right now. Their spending doesn’t reflect how much it costs to deliver the service.
I don’t believe they will succeed, I just think there’s more discussion to be had here than repeating “fuck AI lol”
I think a lot of people just want to conclude that AI is going to “go away”, and latch on to beliefs that lead to this conclusion.
I think a lot of AI companies are likely to “go away.” That’s what happened when the dot com bubble popped, if there is indeed an AI bubble then we’ll see a similar massacre at the stock market. But the technology itself is sound, just like how the basic idea of e-commerce didn’t vanish with the dot-coms.
I’ve been doing a lot of fiddling with locally-run AI models and I’m thinking that the local open-weight models will be good enough to perform 90% of the tasks that most of us are currently depending on those big companies like Anthropic and OpenAI for. That’s going to let a lot of the air out of them when the applications catch up and start using those cheaper commodity-level models instead. For now it’s easier to just throw an OpenAI API key into your application and let it use the heavyweight models for everything, a powerful model can do simple tasks just as well as a simple model. Most tasks are simple but adding the ability to distinguish those tasks from the complicated ones is hard.
I like my local LLM too, but it’s one thing to utilize my existing VRam for a model that fits in there for fault tolerant tasks, and a whole other thing to utilize current frontier models which rack up an energy bill comparable to running a group of space heaters in a building which had to be designed for them, while not even having a guarantee that the output isn’t useless.
Right, which is why I said 90% and not 100%, and called out the challenge of deciding which tasks to send to which AIs. A lot of the interesting work I’m seeing in AI right now is in the agentic frameworks and harnesses that call the LLMs rather than just the LLMs themselves, these are the things that will break big complicated tasks down into more focused sub-tasks that cheaper LLMs can handle.
Given how some of the big providers like Gemini and Anthropic have been cranking up their API costs in recent weeks I expect we’ll see a lot more effort being put into rolling those sorts of features out.
It’s not even where to send it - you cannot predict how much any given task is going to cost you in tokens, which is the deciding factor in which model to use. The “cranking up” part has not even started yet, and we already have stories like Uber which blew through their complete AI budget for the year, what was it, 2 months ago? Uber is very pro-AI, so that budget was probably very generous. And to top it off, I haven’t seen or heard about anything new at Uber that would be even worth mentioning.
If you read the article, this project started from a clean slate and is 40k lines of code, so it’s peanuts in terms of complexity compared to what is out there in companies, and the author had to use the maximum power available to him to let Claude keep up. There still was no guarantee that the output was useable (and there can’t be such a guarantee, since hallucinations are a statistical fact, increasing in occurrence with smaller amounts of training data available).
If you extrapolate this to an average IT stack, which has quirks and issues that are unique to it, you will never get anywhere you wouldn’t get by employing more engineers, who will get better over time and have fixed costs you can budget.
Remember, this is the “killer” application for LLMs. It looks a lot worse in EVERY other area except probably translation.
You can predict how much a task will take in tokens. The accuracy of the prediction may not be perfect, but if you can ballpark it that can tell you a lot about what models to make use of.
Also, not all tokens are the same. Different models require different amounts and kinds of computing power to run. Using a very large context costs more per token because you need a computer with a lot of memory to fit it all. If you need it fast that’s more expensive than if you can take your time. Does the task involve vision or audio? Does the context need to be saved for an ongoing chat? Does it need to wait for tool calls to return between rounds? There are a lot of variables that can be tweaked to vary the cost that an AI call will take, and a lot of those variables can be predicted without having to actually run the whole thing first.
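As a rough illustration of ballparking a call before running it: the sketch below assumes about 4 characters per token (a crude rule of thumb that varies by tokenizer and language) and uses made-up per-million-token prices. `ModelPrice`, `estimate_tokens` and `estimate_cost` are hypothetical names for this example, not any vendor’s API.

```python
from dataclasses import dataclass

@dataclass
class ModelPrice:
    """Hypothetical per-million-token prices in USD; not real vendor pricing."""
    input_per_mtok: float
    output_per_mtok: float

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude ballpark: assume roughly 4 characters per token."""
    return max(1, round(len(text) / chars_per_token))

def estimate_cost(prompt: str, expected_output_tokens: int, price: ModelPrice) -> float:
    """Ballpark cost in USD of one call, computed before running it."""
    input_tokens = estimate_tokens(prompt)
    total = (input_tokens * price.input_per_mtok
             + expected_output_tokens * price.output_per_mtok)
    return total / 1_000_000

big = ModelPrice(input_per_mtok=10.0, output_per_mtok=50.0)   # placeholder numbers
small = ModelPrice(input_per_mtok=0.5, output_per_mtok=2.0)   # placeholder numbers

prompt = "Add type checking to this file:\n" + "x = 1\n" * 500
for label, price in [("big", big), ("small", small)]:
    print(label, f"${estimate_cost(prompt, 2000, price):.4f}")
```

The estimate won’t be exact, but even a ballpark like this is enough to compare model tiers for a given task.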
The “cranking up” part has not even started yet, and we already have stories like Uber which blew through their complete AI budget for the year,
This is exactly what I’m talking about. Current LLM usage patterns tend to be pretty inefficient because people just throw tasks at the biggest and bestest models. Those models handle them, sure, because they’re the biggest and bestest. But most tasks don’t need that much.
I’ve used coding agents a fair bit along with the various other AI applications I’ve fiddled with, and often I ask them to do things that are dead simple. Create a function to sort some data and select whatever fits certain criteria. Add type checking to a file. Create a unit test for a function. Stuff like that could easily be done by a small local model, but the coding agent sends it off to Opus or whatever just like every other task. That can change.
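To make the idea concrete, here is a deliberately naive sketch of that kind of routing. The keyword list, the 8000-token limit and the "local"/"frontier" labels are all assumptions for illustration; a real harness would need a much better way of judging task difficulty, which is the hard part.

```python
# Hypothetical hints that a task is simple enough for a small local model.
SIMPLE_HINTS = ("sort", "unit test", "type checking", "rename", "format")

def pick_model(task: str, context_tokens: int, local_limit: int = 8000) -> str:
    """Return 'local' or 'frontier' using a crude keyword-and-size heuristic."""
    looks_simple = any(hint in task.lower() for hint in SIMPLE_HINTS)
    fits_locally = context_tokens <= local_limit
    return "local" if looks_simple and fits_locally else "frontier"

for task, tokens in [
    ("Create a unit test for a function", 1200),
    ("Add type checking to a file", 50000),
    ("Redesign the authentication architecture", 1200),
]:
    print(f"{pick_model(task, tokens):8s} <- {task}")
```

Anything the heuristic is unsure about falls through to the bigger model, so the failure mode is spending more rather than getting a worse answer.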
There still was no guarantee that the output was useable (and there can’t be such a guarantee, since hallucinations are a statistical fact, increasing in occurrence with smaller amounts of training data available).
I don’t think you’ve used modern coding AIs much.
Or, for that matter, worked with human coders.
Remember, this is the “killer” application for LLMs.
There is no one single “killer” application for LLMs. They’re about as general a computing platform as you can get.
I used to think like you, and I am still pro local LLMs - I use them as tutors for areas I don’t know much about, and since I use the output just as a guide and implement it on my own I quickly realize if something isn’t right.
We will see - when OpenAI and Anthropic rush towards IPO this year, which was made very likely because SpaceX has upped the tempo - what the real costs are. If this article and others I’ve read in the last year are correct, and the prices have to go up x10 to break even, then we are in for a wild ride. I’m only grateful that for now they don’t get lumped into the index funds.
Ah it’s the AI evangelist troll. You know better than to actually believe this, and even if you didn’t, the statement is a thought-terminating cliché that has been thoroughly mocked.
They will never collect revenue that will exceed the amount of capital that has been invested, because economies of scale do not work with LLMs.
Then they will go bankrupt, their assets and IP will be sold for pennies on the dollar, and those that follow them will be able to make a profit serving the established demand without the debt burden of the R&D that created it. It’s a common pattern for first-movers to not benefit from the industries they create.
No, sorry. Because the lion’s share of the cost comes from inference itself and the cost of running datacenters, no amount of shedding debt will help.
Good thing I don’t personally pay them anything
Oh, you are going to pay. The bubble is going to fuck us all quite thoroughly.
Exactly, these companies will keep leveraging more and more because they know the govt will step in and print whatever number of trillions of dollars needed to fix the accounting. Then they’ll tell us “core” inflation is only 2.8%.
Uh … This doesn’t seem like it will end well.
Honestly Google is likely to beat openAI and Anthropic as things are.
OpenAI and Anthropic have to buy/rent their hardware from Nvidia, while Google is making their own TPU hardware. Google’s hardware costs for AI are way lower; every dollar they spend on it goes a lot farther.
And unlike the other two, they’re already a profitable company. They’re making record profits right now. They don’t have a desperate need to figure out how to make back billions on their AI models; they can just keep offering Gemini at a comparatively cheap price and wait for Anthropic and OpenAI to bankrupt themselves.
I really really really don’t want evil corporation Google to dominate even more.
I prefer plainly greedy corporations over evil ones
They’re all evil, so we just have to exploit the ones that offer us some value. If Google is cheaper, and has the ability to damage the others, then Google it is.
Google is shaping up to fare better than the others, but I don’t think that means success. They, too, are spending more than they’re making, just at a less drunken rate than some competitors.
OpenAI and Anthropic aren’t less evil than Google.
They aren’t great, though I do think Google is worse. And far too powerful
Google is only worse by virtue of their reach. OpenAI and Anthropic don’t have the reach yet, but they absolutely will get there given the chance.
Before Google had the reach it has now, it was widely regarded as a comparative ‘good guy’ and people believed in the “don’t be evil”. Lo and behold, once they got going, “don’t be evil” went away.
I guess you missed this story from last week: Google To Pay SpaceX $920 Million Per Month For Massive AI Compute Power
That’s definitely costing them more than running it on their own hardware, but it doesn’t mean AI is costing them more than the AI startups. Anthropic for example is already paying SpaceX 1.25 billion a month for compute, and has agreed to pay Google 200 billion over the next 5 years for access to Google’s compute and TPU chips.
Google’s deal with xAI specifically lets them terminate the deal with 90 days notice after the end of the year. Google is also investing heavily in building new data centers with their hardware. I’m assuming this deal means they’ve eclipsed their current TPU capacity, and are just looking for a short term bandaid until they can catch up with their new constructions.
Anthropic is doing the same too. SpaceX over here providing the shovels and pans for the modern day gold rush, sheesh.
Plus they have a hook with the common folk, the phone steers you toward Gemini (Android phones, obviously, and Apple currently partners with Google for Gemini for iPhone…).
For Claude and OpenAI, you have to explicitly want to go out of your way to use them, or use them indirectly through another service that has a hook.
Claude seems to have some software developers explicitly preferring it, though a lot of the corporate money is on Microsoft, and Microsoft leveraged Visual Studio and GitHub to become the business-friendly frontend. Sure, you can use Anthropic models there too, though Microsoft ultimately has control of what is reasonably available and how much each one costs. Anthropic has a shot but I could see Microsoft pivot to really mess with Anthropic. The one gap in Microsoft’s strategy is the “native AI” workflow where Claude Code has won hearts and minds, but it uses massively more tokens for frankly marginal or sometimes negative value compared to a more curated use in-editor.
OpenAI I see as the most exposed. Lots of data showing they are suffering from people being over the fad of going out of their way to use ChatGPT, especially since their phones have started embracing a ‘default’ chatbot. Software developers that are inclined to use LLMs are also inclined to be pretty dismissive of anything other than either Anthropic or open-weight models, depending on their inclination. Also Altman seemed the most aggressive in committing to spending money they didn’t have, though all of them exhibit this to some extent.
I predict Microsoft ultimately pivots to in-house models and convinces the businesses to go that way. Apple may continue with Gemini or roll their own eventually. Anthropic currently has the stronger position between OpenAI and them, but I think you are right that both have risk of just being left behind.
I guess Google’s announcement of renting xAI compute could have been simply for show to boost the SpaceX IPO.
Love that for them
That’s not good business
It’s gonna come crashing down pretty soon. It’s gonna hurt all of us. It won’t hurt the people responsible nearly enough.
pretty soon
people have been saying that for some time though
The thing is this really depends on the speed of some financial events, not some technical failing.
Notably, if OpenAI has to cancel any of their commitments to buy hardware because they find they have neither the money nor can secure even more debt to cover, that event would potentially cause the bubble to pop, even for hypothetical companies that may have been more responsible and might have a viable business approach. Those commitments are coming up, and a lot of analysis struggles to see how they will fund those commitments.
The thing with this bubble is that the investors don’t get the nuance and will flee at signs of trouble in any of OpenAI, Anthropic, or a handful of others, and Altman’s leadership has made trouble at OpenAI very likely, but the investors don’t believe it and won’t believe it’s unique to OpenAI, even if it would be.
The bubble will pop, I think a lot of people are just baffled by how big it’s getting.
What people? All the credible people I read say that things fall apart Q2/Q3 2027 as debt and profit obligations are due.
The only thing that changed is now there is an energy crisis coming, so it’s possible that might force the bubble to pop sooner if all the systemic risk aligns.
How much do they spend when I pay nothing?
I think they might’ve broken the laws of math there, as they’re certainly still spending a non-zero amount.
It just means they lose more money per paying user, I guess.
Reminder that during 2019 there were streaming services popping up left and right, all showing tremendous growth because they started from zero, and articles were about how bad Netflix was doing due to having practically no growth compared with the competition (they already had a massive subscriber base). Twist? Netflix was the only streaming service that was actually making a profit; the rest were a massive loss but big growth.
Needless to say most of those streaming services died; who remembers DC streaming service, or Yahoo’s? While Netflix is basically as strong as ever, despite the prevalent enshittification happening through the whole industry.
Point of the story? Shareholders don’t care about stable profitable business, only cancerous growth. AI is like that: zero profits, ton of cost, but as long as they show growth the shareholders are happy, regardless of how cooked the books are.
Netflix was also late to streaming because their mail service subscriptions were THE major player
late to streaming, but practically the first subscription based system to watch movies/tv online.
First years of Netflix were the best; the product began degrading quite early on. But that was mostly companies realizing that instead of licensing their content on Netflix, they can make their own platforms.
I think people forget that there is also the problem of being “too early” where people or the technology isn’t ready yet. Netflix timed their entry perfectly.
There are so many defunct websites or businesses that no one has ever heard of that were precursors to modern day services we view as conveniences.
Late to streaming? Netflix was the first big time streaming service that I ever heard of. The main reason their streaming service was able to take off like it did is that nobody else of significance thought that streaming was worth pursuing. What other companies were offering streaming services at anything approaching scale before Netflix?
YouTube and Hulu were basically all starting about the same time. But RealPlayer was the first big one.
Netflix just had the layout that everyone uses now. The Cable networks had streaming services, just not on demand. YouTube and Hulu also pioneered the on demand layout. YouTube focused on personal experiences so maybe that’s why you’re forgetting them
YouTube started in 2005, but was not really a “streaming service”, it hosted random internet posted videos. The concept of engaging with the big content rights holders wasn’t remotely in sight back then.
Hulu came out about a year after Netflix started streaming. Hulu was inspired by Netflix’s move to have actual traditional media content as a streaming service instead of ad-hoc video uploads like YouTube.
RealPlayer offered technology for websites to provide videos; I don’t recall them being a streaming platform in and of themselves.
Whatever one may say about Netflix, they were right there in the beginning with streaming traditional, professional media content. Yes, video playback over the internet wasn’t new, but that’s a technical detail that enables, but is not the core of the “streaming service” business model.
who remembers DC streaming service, or Yahoo’s?
Quibi will always have a place in my heart. Or, at least, my golden arm
2019 Yahoo
My immediate thought, there is no way Yahoo! Screen survived into 2019.
I looked it up and Yahoo! Screen (which featured Community season 6) was shut down in January 2016. But Yahoo! View launched in late 2016 (as a Hulu-like replacement), and that did shutter in mid 2019.
So Yahoo! was already dead, but it also died for real in 2019.
Imagine having a streaming service so bad it fails twice
Isn’t that kind of Yahoo!'s business model?
Actually, when Yahoo was the search giant, before Google went mainstream, they were pretty damn good at what they did.
With how shit Google is these days, I kinda wonder if Yahoo could dust out their search engine from two decades back and it would just be… better.
Yahoo had its own web crawler only between 2004 and 2009, then they made a deal with Microsoft to use Bing indexes, so I highly doubt they even have their old index
I love that nobody watched anything on Yahoo! Screen except for that one season of Community
So many comments on just the title … Could come from Anthropic directly.
There is literally zero basis for the claim made in the article, just arbitrary calculations over supposed token consumption under non-stable test sets.
I have no idea if/how much these ~~stupid~~ fuckers spend to get more customers - and this “article” wasted a lot of time showing that they don’t know either. (“Stupid” is struck out because I don’t think they’re stupid. Which makes it way worse in my book.)
Joke’s on them. I’m not paying them a dime.

















