A screenshot of this question was making the rounds last week, but this article goes further and tests it against all the well-known models out there.
It also includes outtakes from the ‘reasoning’ models.
It goes beyond the problems introduced by the model router, though. I have to work with GPT 5.2 for my job (along with Claude, Gemini, and a few others), and we have enterprise API access to it. So when I select GPT 5.2, the tokens really are being spent on GPT 5.2 itself.
And it’s pretty bad. It’s noticeably worse than the 4.x series. I find myself having to fix its mistakes far more often.
I’ve struggled to come up with an explanation, and model collapse really seems like a contender, especially if you follow the information-theoretic arguments about why training these things is so hard.
As it happens, there’s a new talk about exactly this from George D. Montañez. You might find it interesting: https://youtu.be/ShusuVq32hc