Just want to clarify, this is not my Substack, I’m just sharing this because I found it insightful.
The author describes himself as a “fractional CTO” (no clue what that means, don’t ask me) and advisor. His clients asked him how they could leverage AI. He decided to experience it for himself. From the author (emphasis mine):
I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.
I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.
Now when clients ask me about AI adoption, I can tell them exactly what 100% looks like: it looks like failure. Not immediate failure—that’s the trap. Initial metrics look great. You ship faster. You feel productive. Then three months later, you realize nobody actually understands what you’ve built.
To quote your quote:
I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.
I think the author just independently rediscovered “middle management”. Indeed, when you delegate the gruntwork under your responsibility, those same people are who you go to when addressing bugs and new requirements. It’s not on you to effect repairs: it’s on your team. I am Jack’s complete lack of surprise. The idea that you can rely on AI to do nuanced work like this and arrive at the exact correct answer to the problem is naive at best. I’d be sweating too.
The problem though (with AI compared to humans): the human team learns, i.e. at some point they probably figure out what the mistake was and avoid making it again. With AI instead of humans: well, maybe the next or a different model will fix it… maybe.
And what is very clear to me after trying to use these models: the larger the code base, the worse the AI gets, to the point of not helping at all or even being destructive. The exception is dissecting small, isolatable pieces of independent code (i.e. keeping the context small for the AI).
Humans likely get slower with a larger code-base, but they (usually) don’t arrive at a point where they can’t progress any further.
Humans likely get slower with a larger code-base, but they (usually) don’t arrive at a point where they can’t progress any further.
Notable exceptions like: https://peimpact.com/the-denver-international-airport-automated-baggage-handling-system/
I do a lot with AI but it is not good enough to replace humans, not even close. It repeats the same mistakes after you tell it no, and it doesn’t remember things from 3 messages ago when it should. You have to keep re-explaining the goal to it. It’s wholly incompetent. And yeah, when you have it do stuff you aren’t familiar with or didn’t create yourself, definitely. I have it write commentary, or I take the time right then to ask it what x or y does and then I add a comment.
Even worse, the ones I’ve evaluated (like Claude) constantly fail to even compile because, for example, they mix usages of different SDK versions. When instructed to use version 3 of some package, it will add the right version as a dependency but then still code with missing or deprecated APIs from the previous version that are obviously unavailable.
More time (and money, and electricity) is wasted trying to prompt it towards correct code than simply writing it yourself, and at the end of the day you have a smoking turd that no one even understands.
LLMs are a dead end.
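To make the version-mixing failure mode concrete, here’s a rough sketch of the pattern: dependency pinned to the new major version, generated code still calling the old API. The pandas case below is just a stand-in example of the shape of the mistake, not a claim about any specific session:

```python
# requirements.txt pins the new major version, as instructed:
#     pandas>=2.0
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# v1.x-era call that keeps getting emitted; DataFrame.append was removed in pandas 2.0.
df = df.append({"a": 3}, ignore_index=True)   # AttributeError under pandas 2.x

# What should have been generated instead:
# df = pd.concat([df, pd.DataFrame([{"a": 3}])], ignore_index=True)
```

In a compiled language the same mix simply refuses to build.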
constantly fail to even compile because, for example, they mix usages of different SDK versions
Try an agentic tool like Claude Code - it closes the loop by testing the compilation for you, and fixing its mistakes (like human programmers do) before bothering you for another prompt. I was where you are at 6 months ago, the tools have improved dramatically since then.
From TFS:
> I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.
That sounds like a “fractional CTO problem” to me (IMO a fractional CTO is a guy who convinces several small companies that he’s a brilliant tech genius who will help them make their important tech decisions without actually paying full-time attention to any of them. Actual tech experience: optional.)
If you have lost confidence in your ability to modify your own creation, that’s not a tools problem - you are the tool, that’s a you problem. It doesn’t matter if you’re using an LLM coding tool, a team of human developers, or a pack of monkeys to code your applications: if you don’t document and test and formally develop an “understanding” of your product that not only you but all stakeholders can grasp to the extent they need to, you’re just letting development run wild - you’re lacking any software development process maturity. LLMs can do that faster than a pack of monkeys, or a bunch of kids you hired off Craigslist, but it’s the exact same problem no matter how you slice it.
If you mean I have to install Claude’s software on my own computer, no thanks.
The comparison of an LLM to a team of human developers is a great one. But like outsourcing your development, an LLM is less a tool and more just delegation. And yes, you can dig in deep to understand all the stuff the LLM is delegated to do, the same as you can get deeply involved with a human development team to maintain an understanding. But most of the time, the sell is that you can save time - which means you aren’t expecting to micromanage your development team.
It is a fractional CTO problem, but the actual issue is that developers are being demanded to become fractional CTOs by using LLMs, because they are being measured by expected productivity increases that limit time for understanding.
That’s an interesting take: developers are demanded to also become fractional CTOs. There is probably a larger-than-estimated knowledge and experience gap there, and unless you have the knack for managing people, you probably run into more problems than you’re used to as just a code jockey.
the sell is that you can save time
How do you know when salespeople (and lawyers) are lying? It’s only when their lips are moving.
developers are being demanded to become fractional CTOs by using LLMs because they are being measured by expected productivity increases that limit time for understanding.
That’s the kind of thing that works out in the end. Like outsourcing to Asia, etc. It does work for some cases, it can bring sustainable improvements to the bottom line, but nowhere near as fast or easy or cheaply as the people selling it say.
There’s no point telling it not to do x, because as soon as you mention x it goes into its context window.
It has no filter. It’s as if you had no choice in your actions and just had to act on every thought that came into your head: if you were told not to do a thing, you would immediately start thinking about doing it.
I’ve noticed this too, it’s hilarious(ly bad).
Especially with image generation, which we were using to make some quick avatars for a D&D game. “Draw a picture of an elf.” Generates images of elves that all have one weird earring. “Draw a picture of an elf without an earring.” Great, now the elves have even more earrings.
I find this kind of performance to vary from one model to the next. I definitely have experienced the bad image getting worse phenomenon - especially with MS Copilot - but different models will perform differently.
There’s no point telling it not to do x, because as soon as you mention x it goes into its context window.
Reminds me of the Sonny Bono high speed downhill skiing problem: don’t fixate on that tree, if you fixate on that tree you’re going to hit the tree, fixate on the open space to the side of the tree.
LLMs do “understand” words like not, and don’t, but they also seem to work better with positive examples than negative ones.
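A tiny illustration of that last point (the phrasing here is mine, purely an example): stating the desired outcome tends to beat forbidding the unwanted one, because the forbidden thing still lands in the context window.

```python
# Negative phrasing: the banned concept is now prominently in context.
negative_prompt = "Draw a picture of an elf. Do NOT include any earrings."

# Positive phrasing: describes only the desired state.
positive_prompt = "Draw a picture of an elf with bare, unadorned ears."

# Same idea for code prompts:
negative_code_prompt = "Refactor this module. Don't use global variables."
positive_code_prompt = "Refactor this module so all state is passed in as function parameters."
```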
I cannot understand and debug code written by AI. But I also cannot understand and debug code written by me.
Let’s just call it even.
At least you can blame yourself for your own shitty code, which hopefully will never attempt to “accidentally” erase the entire project
I don’t know how that happens, I regularly use Claude code and it’s constantly reminding me to push to git.
As an experiment I asked Claude to manage my git commits: it wrote the messages, kept a log, archived excess documentation, and worked really well for about 2 weeks. Then, as the project got larger, the commit process was taking longer and longer to execute. I finally pulled the plug when the automated commit process - which had performed flawlessly for dozens of commits and archives - irretrievably lost a batch of work: it messed up the archive step and deleted the work without archiving it first, and it didn’t commit it either.
AI/LLM workflows are non-deterministic. This means: they make mistakes. If you want something reliable, scalable, repeatable, have the AI write you code to do it deterministically as a tool, not as a workflow. Of course, deterministic tools can’t do things like summarize the content of a commit.
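For example, a minimal sketch of the “tool, not workflow” idea applied to the archive-and-commit case above - the paths and message format are assumptions, the point is that the archive happens before anything is deleted and failures abort loudly instead of being improvised around:

```python
#!/usr/bin/env python3
"""Deterministic archive-then-commit helper (illustrative sketch)."""
import shutil
import subprocess
from datetime import datetime
from pathlib import Path

ARCHIVE_DIR = Path("archives")          # hypothetical destination
DOCS_TO_ARCHIVE = Path("docs/scratch")  # hypothetical "excess documentation"

def archive_then_commit(message: str) -> None:
    ARCHIVE_DIR.mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    if DOCS_TO_ARCHIVE.exists():
        # Step 1: archive. If this raises, nothing has been deleted or committed.
        shutil.make_archive(str(ARCHIVE_DIR / f"scratch-{stamp}"), "zip", str(DOCS_TO_ARCHIVE))
        shutil.rmtree(DOCS_TO_ARCHIVE)
    # Step 2: commit everything. check=True fails loudly instead of guessing a workaround.
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

if __name__ == "__main__":
    archive_then_commit(f"checkpoint {datetime.now().isoformat(timespec='seconds')}")
```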
The longer the project the more stupid Claude gets. I’ve seen it both in chat, and in Claude code, and Claude explains the situation quite well:
Increased cognitive load: Longer projects have more state to track - more files, more interconnected components, more conventions established earlier. Each decision I make needs to consider all of this, and the probability of overlooking something increases with complexity.
Git specifically: For git operations, the problem is even worse because git state is highly sequential - each operation depends on the exact current state of the repository. If I lose track of what branch we’re on, what’s been committed, or what files exist, I’ll give incorrect commands.
Anything I do with Claude, I split into different chats. I won’t give it access to git, but I will provide it an updated repository via Repomix. I get much better results because of that.
Yeah, context management is one big key. The “compacting conversation” hack is a good one, you can continue conversations indefinitely, but after each compact it will throw away some context that you thought was valuable.
The best explanation I have heard for the current limitations is that there is a “context sweet spot” for Opus 4.5 that’s somewhere short of 200,000 tokens. As your context window gets filled above 100,000 tokens, at some point you’re at “optimal understanding” of whatever is in there, then as you continue on toward 200,000 tokens the hallucinations start to increase. As a hack, they “compact the conversation” and throw out less useful tokens getting you back to the “essential core” of what you were discussing before, so you can continue to feed it new prompts and get new reactions with a lower hallucination rate, but with that lower hallucination rate also comes a lower comprehension of what you said before the compacting event(s).
Some describe an aspect of this as the “lost in the middle” phenomenon since the compacting event tends to hang on to the very beginning and very end of the context window more aggressively than the middle, so more “middle of the window” content gets dropped during a compacting event.
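If you want a crude guard rail against that, you can estimate where you are in the window before each prompt. The 4-characters-per-token ratio and the thresholds below are rough assumptions of mine, not Anthropic numbers:

```python
def approx_tokens(text: str) -> int:
    # Very rough heuristic; a real tokenizer would be more accurate.
    return len(text) // 4

SOFT_LIMIT = 100_000   # roughly where comprehension is said to peak
HARD_LIMIT = 160_000   # compact or start fresh well before the 200k ceiling

def context_health(conversation: list[str]) -> str:
    total = sum(approx_tokens(m) for m in conversation)
    if total < SOFT_LIMIT:
        return f"ok (~{total} tokens)"
    if total < HARD_LIMIT:
        return f"nearing the sweet-spot ceiling (~{total} tokens)"
    return f"compact now or start a fresh chat (~{total} tokens)"
```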
I also cannot understand and debug code written by me.
So much this. I look back at stuff I wrote 10 years ago and shake my head, console myself that “we were on a really aggressive schedule.” At least in my mind I can do better, in practice the stuff has got to ship eventually and what ships is almost never what I would call perfect, or even ideal.
They never actually say what “product” they make; it’s always “shipped product”, like they’re a fucking Amazon warehouse. I suspect that’s because it’s some trivial webpage a student could whip up in an afternoon, which they spent three days arguing with an autocomplete to shit out.
Cloudflare, AWS, and other recent major service outages are what come to mind re: AI code. I’ve no doubt it is getting forced into critical infrastructure without proper diligence.
Humans are prone to error so imagine the errors our digital progeny are capable of!
Great article, brave and correct. Good luck getting through to the same leaders who blindly believe in a magical trend for this or next quarter’s numbers; they don’t care about things a year away, let alone 10.
I work in HR and was struck by the parallel with management jobs being gutted by major corps starting in the 80s and 90s during “downsizing”, who either never replaced them or offshored them. They had the Big 4 telling them it was the future of business. Know who is now providing consultation to them on why they have poor ops, processes, high turnover, etc.? Take $ on the way in, and on the way out. AI is just the next in a long line of smart people pretending they know your business while you abdicate knowing your business or your employees.
Hope leaders can be a bit braver and wiser this go 'round so we don’t get to a cliff’s edge in software.
Tbh I think the true leaders are high on coke.
Wow I didn’t know that I was leading this whole time.
I’m trying
Much appreciated 🫡
So there’s actual developers who could tell you from the start that LLMs are useless for coding, and then there’s this moron & similar people who first have to fuck up an ecosystem before believing the obvious. Thanks fuckhead for driving RAM prices through the ceiling… And for wasting energy and water.
I can at least kinda appreciate this guy’s approach. If we assume that AI is a magic bullet, then it’s not crazy to assume we, the existing programmers, would resist it just to save our own jobs. Or we’d complain because it doesn’t do things our way, but we’re the old way and this is the new way. So maybe we’re just being whiny and can be ignored.
So he tested it to see for himself, and what he found was that he agreed with us, that it’s not worth it.
Ignoring experts is annoying, but doing some of your own science and getting first-hand experience isn’t always a bad idea.
And not only did he see for himself, he wrote up and published his results.
Yup. This was almost science. It’s just lacking measurements and repeatability.
100% this. The guy was literally a consultant and a developer. It’d just be bad business for him to outright dismiss AI without having actual hands on experience with said product. Clients want that type of experience and knowledge when paying a business to give them advice and develop a product for them.
Except that outright dismissing snake oil would not at all be bad business. Calling a turd a diamond neither makes it sparkle, nor does it get rid of the stink.
I can’t just call everything snake oil without some actual measurements and tests.
Naive cynicism is just as naive as blind optimism
I can’t just call everything snake oil without some actual measurements and tests.
With all due respect, you have not understood the basic mechanic of machine learning and the consequences thereof.
With due respect, you have not understood how snake oil is detected.
Problem is that statistical word prediction has fuck-all to do with AI. It’s not AI and it never will be. By “giving it a try” you contribute to the spread of this snake oil. And even if someone came up with actual AI: if it used enough resources to impact our ecosystem instead of being a net positive, and if it was in the greedy hands of billionaires, then using it would be equivalent to selling your executioner an axe.
Terrible take. Thanks for playing.
It’s actually impressive the level of downvotes you’ve gathered in what is generally a pretty anti-ai crowd.
They are useful for doing the kind of boilerplate boring stuff that any good dev should have largely optimized and automated already. If it’s 1) dead simple and 2) extremely common, then yeah an LLM can code for you, but ask yourself why you don’t have a time-saving solution for those common tasks already in place? As with anything LLM, it’s decent at replicating how humans in general have responded to a given problem, if the problem is not too complex and not too rare, and not much else.
That’s exactly what I so often find myself saying when people show off some neat thing that a code bot “wrote” for them in x minutes after only y minutes of “prompt engineering”. I’ll say, yeah, I could also do that in y minutes of (bash scripting/vim macroing/system architecting/whatever), but the difference is that afterwards I have a reusable solution that I understand, is automated, is robust, and didn’t consume a ton of resources. And as a bonus I got marginally better as a developer.
It’s funny that if you stick them in an RPG and give them an ability to “kill any level 1-x enemy instantly, but don’t gain any xp for it”, they’d all see it as the trap it is, but they can’t see how that’s what AI so often is.
As you said, “boilerplate” code can be script generated - and there are IDEs that already do this, but in a deterministic way, so that you don’t have to proof-read every single line to avoid catastrophic security or crash flaws.
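As a concrete example of what deterministic boilerplate generation can look like - a throwaway script you read once and trust forever, instead of proof-reading fresh LLM output every time (the field list below is just an example):

```python
# Generate a typed dataclass skeleton from a simple field spec - same output on every run.
FIELDS = [("name", "str"), ("email", "str"), ("age", "int")]

def make_dataclass(cls_name: str, fields: list[tuple[str, str]]) -> str:
    lines = [
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass",
        f"class {cls_name}:",
    ]
    lines += [f"    {fname}: {ftype}" for fname, ftype in fields]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(make_dataclass("User", FIELDS))
```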
Maybe they’ll listen to one of their own?
The kind of useful article I would expect then is one explaining why word prediction != AI
I really have not found AI to be useless for coding. I have found it extremely useful and it has saved me hundreds of hours. It is not without its faults or frustrations, but it really is a tool I would not want to be without.
That’s because you are not a proper developer, as proven by your comment. And you create tech legacy that will have a net cost in terms of maintenance or downtime.
I am for sure not a coder as it has never been my strong suit, but I am without a doubt an awesome developer or I would not have a top rated multiplayer VR app that is pushing the boundaries of what mobile VR can do.
The only person who will have to look at my code is me so any and all issues be it my code or AI code will be my burden and AI has really made that burden much less. In fact, I recently installed Coplay in my Unity Engine Editor and OMG it is amazing at assisting not just with code, but even finding little issues with scene setup, shaders, animations and more. I am really blown away with it. It has allowed me to spend even less time on the code and more time imagineering amazing experiences which is what fans of the app care about the most. They couldn’t care less if I wrote the code or AI did as long as it works and does not break immersion. Is that not what it is all about at the end of the day?
As long as AI helps you achieve your goals and your goals are grounded, including maintainability, I see no issues. Yeah, misdirected use of AI can lead to hard-to-maintain code down the line, but that is why you need a human developer in the loop to ensure the overall architecture and design make sense. Any code base can become hard to maintain if not thought through, be it human- or AI-written.
Look, bless your heart if you have a successful app, but success / sales is not exclusive to products of quality. Just look around at all the slop that people buy nowadays.
As long as AI helps you achieve your goals and your goals are grounded, including maintainability, I see no issues.
Two issues with that
- what you are using has nothing whatsoever to do with AI, it’s a glorified pattern repeater - an actual parrot has more intelligence
- if the destruction of entire ecosystems for slop is not an issue that you see, you should not be allowed anywhere near technology (as by now probably billions of people)
I do not understand the point you are making about my particular situation, as I am not making slop. Plus, one person’s slop is another’s treasure. What exactly are you suggesting? The 2 issues you outlined seem like they are directed at someone else, perhaps?
- I am calling it AI as that is what it is called, but you are correct, it is a pattern predictor
- I am not creating slop but something deeply immersive and enjoyed by people. In terms of the energy used, I am on solar and run local LLMs.
I didn’t say your particular application that I know nothing about is slop, I said success does not mean quality. And if you use statistical pattern generation to save time, chances are high that your software is not of good quality.
Even solar energy is not harvested waste-free (chemical energy and production of cells). Nevertheless, even if it were, you are still contributing to the spread of slop and harming other people. Both through spreading acceptance of a technology used to harm billions of people for the benefit of a few, and through energy and resource waste.
I am sure my code could be better. I am also sure the SDKs I use could be better and the game engine could be better. For what I need, they all work well enough to get the job done. I am sure issues will come up as a result, as they have many times in the past already, even before LLMs helped, but that is par for the course for a developer to tackle.
Fractional CTO: Some small companies benefit from the senior experience of these kinds of executives but don’t have the money or the need to hire one full time. A fraction of the time they are C suite for various companies.
Sooo… he works multiple part-time jobs?
Weird how a forced technique of the ultra-poor is showing up here.
It’s more like the MSP IT style of business. There are clients that consult you for your experience or that you spend a contracted amount of time with and then you bill them for your time as a service. You aren’t an employee of theirs.
Or he’s some deputy assistant vice president or something.
Deputy assistant to the vice president
Just ask the ai to make the change?
The developers can’t debug code they didn’t write.
This is a bit of a stretch.
Agreed. 50% of my job is debugging code I didn’t write.
Vibe coders can’t debug code because they didn’t write
Vibe coders can’t debug code because they can’t write code
Yes, this is what I intended to write but I submitted it hastily.
It’s like a catch-22: they can’t write code so they vibecode, but to maintain vibed code you would necessarily need to be able to write code to understand what’s actually happening
I don’t get this argument. Isn’t the whole point that the ai will debug and implement small changes too?
Think of an interior designer having to re-engineer the columns and load-bearing walls of a masonry construction.
What are the proportions of cement and gravel for the mortar? What type of bricks to use? Do they comply with the PSI requirements? What caliber should the rebars be? What considerations for the pouring of concrete? Where to put the columns? What thickness? Will the building fall?
“I don’t know that shit, I only design the color and texture of the walls!”
And that, my friends, is why vibe coding fails.
And it’s even worse: Because there are things you can more or less guess and research. The really bad part is the things you should know about but don’t even know they are a thing!
Unknown unknowns: Thread synchronization, ACID transactions, resiliency patterns. That’s the REALLY SCARY part. Write code? Okay, sure, let’s give the AI a chance. Write stable, resilient code with fault tolerance, and EASY TO MAINTAIN? Nope. You’re fucked. Now the engineers are gone and the newbies are in charge of fixing bad code built by an alien intelligence that didn’t do its own homework and it’s easier to rewrite everything from scratch.
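To make one of those unknown unknowns concrete, here is a minimal sketch of the thread-synchronization kind - code that looks fine, runs fine in a demo, and quietly loses data under load:

```python
import threading

counter = 0

def unsafe_increment(n: int) -> None:
    global counter
    for _ in range(n):
        counter += 1   # read-modify-write is not atomic; threads can interleave here

threads = [threading.Thread(target=unsafe_increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Can print less than 400000; guarding the increment with a threading.Lock fixes it.
print(counter)
```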
If you need to refactor your program, you might as well start from the beginning.
I mean I was trying to solve a problem t’other day (hobbyist) - it told me to create a
function foo(bar): await object.foo(bar)
then in object
function foo(bar): _foo(bar)
function _foo(bar): original_object.foo(bar)
like literally passing a variable between three wrapper functions in two objects that did nothing except pass the variable back to the original function in an infinite loop
add some layers and complexity and it’d be very easy to get lost
The few times I’ve used LLMs for coding help, usually because I’m curious if they’ve gotten better, they let me down. Last time it was insistent that its solution would work as expected. When I gave it an example that wouldn’t work, it even broke down each step of the function giving me the value of its variables at each step to demonstrate that it worked… but at the step where it had fucked up, it swapped the value in the variable to one that would make the final answer correct. It made me wonder how much water and energy it cost me to be gaslit into a bad solution.
How do people vibe code with this shit?
As a learning process it’s absolutely fine.
You make a mess, you suffer, you debug, you learn.
But you don’t call yourself a developer (at least I hope) on your CV.
Some can’t because they never acquired the skill to read code. But most did and can.
If you’ve never had to debug code, are you really a developer?
There is zero chance you have never written a bug, so… who is fixing them?
Unless you just leave them because you work for Infosys or worse, but then I ask again - are you really a developer?
I think it highly depends on the skill and experience of the dev. A lot of the people flocking to the vibe coding hype are not necessarily people who know about coding practices (including code review etc.), nor are they experienced in directing AI agents to achieve such goals. The result is the MIT prediction. Although, this will start to change soon.
Personally I tried using LLMs for reading error logs and summarizing what’s going on. I can say that even with somewhat complex errors, they were almost always right and very helpful. So basically the general consensus of using them as assistants within a narrow scope.
Though it should also be noted that I only did this at work. While it seems to work well, I think I’d still limit such use in personal projects, since I want to keep learning more, and private projects are generally much more enjoyable to work on.
Another interesting use case I can highlight is using a chatbot as documentation when the actual documentation is horrible. However, this only works within the same ecosystem, so for instance Copilot with MS software. Microsoft definitely trained Copilot on its own stuff and it’s often considerably more helpful than the docs.
Computers are too powerful and too cheap. Bring back COBOL, painfully expensive CPU time, and some sort of basic knowledge of what’s actually going on.
Pain for everyone!
Yeah, I think around the Pentium 200 MHz point was the sweet spot. Powerful enough to do a lot of things, but not so powerful that software can be as inefficient and wasteful as it is today.
Be careful what you wish for, with RAM prices soaring owning a home computer might become less of an option. Luckily we can get a subscription for computing power easily!
I built a new PC early October, literally 2 weeks later RAM prices went nuts… so glad I pulled the trigger when I did
Just sell it to AI customers for AI cash.
Vibe profits.
You just won capitalism. You and Musk can go to Mars now. Well, send a postcard.
Same thing would happen if they were a non-coder project manager or designer for a team of actual human programmers.
Stuff done, shipped and working.
“But I can’t understand the code 😭”, yes. You were the project manager, why should you?
I think the point is that someone should understand the code. In this case, no one does.
So…like dealing with Oracle.
I think the point is that someone should understand the code. In this case, no one does.
Big corporations have been pushing for outsourcing software development for decades, how is this any different? Can you always recall your outsourced development team for another round of maintenance? A LLM may actually be more reliable and accessible in the future.
If you outsource you could at least sue them when things go wrong. Good luck doing that with AI.
Plus you can own the code if a person does it.
If you outsource you could at least sue them when things go wrong.
Most outsourcing consultants I have worked with aren’t worth the legal fees to attempt to sue.
Plus you can own the code if a person does it.
I’m not aware of any ownership issues with code I have developed using Claude, or any other agents. It’s still mine, all the more so because I paid Claude to write it for me, at my direction.
AI doesn’t get IP protections.
Nobody is asking it to (except freaks trying to get news coverage.)
It’s like compiler output - no, I didn’t write that assembly code, gcc did, but it did it based on my instructions. My instructions are copyright by me, the gcc interpretation of them is a derivative work covered by my rights in the source code.
When a painter paints a canvas, they don’t record the “source code” but the final work is also still theirs, not the brush maker or the canvas maker or paint maker (though some pigments get a little squirrely about that…)
My instructions are copyright by me
First, how much that is true is debatable. Second, that doesn’t matter as far as the output. No one can legally own that.
It looks like a rigid design philosophy that requires a complete rebuild for any change. If the speed of production becomes fast enough, and the cost low enough, regenerating the entire program for every change would become feasible and cost effective.
I frequently feel that urge to rebuild from ground (specifications) up, to remove the “old bad code” from the context window and get back to the “pure” specification as the source of truth. That only works up to a certain level of complexity. When it works it can be a very fast way to “fix” a batch of issues, but when the problem/solution is big enough the new implementation will have new issues that may take longer to identify as compared with just grinding through the existing issues. Devil whose face you know kind of choice.
AI is hot garbage and anyone using it is a skillless hack. This will never not be true.
Wait so I should just be manually folding all these proteins?
Do you not know the difference between an automated process and machine learning?
Yes? Machine learning has been huge for protein folding, and not because anyone is stupid - it’s because it’s a task uniquely suited for machine learning, of which there are many. But none of that is what this AI bubble is really about, and even though I find the underlying math and technology fascinating, I share the disdain for how the bulk of it is currently being used.
The thing with being cocky is, if you are wrong it makes you look like an even bigger asshole
https://en.wikipedia.org/wiki/AlphaFold
The program uses a form of attention network, a deep learning technique that focuses on having the AI identify parts of a larger problem, then piece it together to obtain the overall solution.
Cool, now do an environmental impact on it.
While this is a popular sentiment, it is not true, nor will it ever be true.
AI (LLMs & agents in the coding context, in this case) can serve as both a tool and a crutch. Those who learn to master the tools will gain benefit from them, without detracting from their own skill. Those who use them as a crutch will lose (or never gain) their own skills.
Some skills will in turn become irrelevant in day-to-day life (as is always the case with new tech), and we will adapt in turn.
LLMs exist so that skill-less hacks can pretend to be skilled artists. It’s a shortcut to success.
That this is and will be abused is not in question. :-P
You are making a leap though.