A screenshot of this question was making the rounds last week, but this article covers testing it against all the well-known models out there.

Also includes outtakes on the ‘reasoning’ models.

  • kescusay@lemmy.world
    4 months ago

    It goes beyond the problems introduced by the model router, though. I have to work with GPT 5.2 for my job (along with Claude, Gemini, and a few others), and we have enterprise API access to it. So when I select GPT 5.2 as the model, the request actually goes to GPT 5.2 and spends tokens on it.

    And it’s pretty bad. It’s noticeably worse than the 4.x series. I find myself having to fix its mistakes far more often.

    I’ve struggled to reason out an explanation, and model collapse really seems like a contender, especially if you follow information theory and understand why training these things is so hard.

    As it happens, there’s a new talk about exactly this from George D. Montañez. You might find it interesting: https://youtu.be/ShusuVq32hc