• xylogx@lemmy.world
    English
    arrow-up
    134
    arrow-down
    4
    ·
    1 month ago

    So you’re saying the ad-driven internet will die? And we will be left with what? Wikipedia and Lemmy? I for one welcome our AI overlords!

    • venusaur@lemmy.world
      English
      arrow-up
      42
      arrow-down
      2
      ·
      edit-2
      1 month ago

      Nah, it’s saying that the ad- and AI-driven internet will prevail. People only use Google to find an answer and don’t dig deeper, and if they do, it’s often because the links are sponsored. People using GPTs are even less likely to click a link. Currently no ads, but just wait.

      Apologies if you were joking.

    • jonathan7luke@lemmy.ml
      English
      arrow-up
      21
      ·
      edit-2
      1 month ago

      This is part of the larger problem that AI tools are trained on (and profit off of) content that is produced and hosted by others who are now seeing their traffic change from humans to bots. For content sources that pay for hosting with ads, this means a loss in revenue to pay for hosting. For content sources like Wikipedia, they are seeing their hosting costs increase significantly due to the increase in bot traffic. Even if you want every website that depends on ad revenue to fail (which I don’t entirely agree with), AI is still damaging the open web in other ways. Websites like Wikipedia, for example, may soon be forced to lock content behind logins or leverage aggressive captchas just to fight the bot traffic, which makes things worse for those of us who still prefer to use actual websites over AI summaries.

      • pinkapple@lemmy.ml
        English
        arrow-up
        5
        arrow-down
        3
        ·
        1 month ago

        Nobody is scraping Wikipedia over and over to create datasets for AIs; there are already open datasets and API deals. And Wikipedia in particular has always had a data dump of the entire db bimonthly.

        https://dumps.wikimedia.org/

        • TheOneCurly@lemm.ee
          English
          arrow-up
          10
          arrow-down
          1
          ·
          1 month ago

          You clearly haven’t run a website recently. Until I set up Anubis last week I was getting constant requests from dozens of various bot scrapers 24/7. That included the big ones.

          • pinkapple@lemmy.ml
            English
            arrow-up
            4
            arrow-down
            8
            ·
            1 month ago

            Kay, and that has nothing to do with what I said. Scrapers, bots =/= AI. It’s not even the same companies that make the unfree datasets. The scrapers and bots that hit your website are not some random “AI” feeding on data lol. This is what some models are trained on; it’s already free so it doesn’t need to be individually rescraped, and it’s mostly garbage quality data: https://commoncrawl.org/ Nobody wastes resources rescraping all this SEO-infested dump.

            Your issue has more to do with SEO than anything else. Btw before you diss Common Crawl, it’s used in research quite a lot so it’s not some evil thing that threatens people’s websites. Add robots.txt maybe.

            • TheOneCurly@lemm.ee
              English
              arrow-up
              10
              arrow-down
              1
              ·
              1 month ago

              Oh ok I’ll just ignore the constant requests from GPTBot, ByteSpider, and the hundreds of others who very plainly, sometimes in their user agent, tell you that they’re grabbing content for training data. Robots.txt is nice and all but manually adding every single up-and-coming AI company is impossible. Like I said, Anubis is the first time I’ve gotten them all to even remotely calm down.
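
For readers following the robots.txt argument, here is a minimal sketch of what maintaining such a blocklist could look like. It generates one rule group per crawler from a list. Only GPTBot and Bytespider are named in the thread; `build_robots_txt` and `BLOCKED` are hypothetical names chosen for illustration. robots.txt is advisory, so this only affects crawlers that choose to honor it, which is the limitation the comment above is pointing at.

```python
# Illustrative helper (not from the thread): build robots.txt text that
# disallows a list of crawler user-agent tokens and leaves everyone else alone.

def build_robots_txt(blocked_agents, disallow_path="/"):
    """Return robots.txt text with one User-agent group per blocked crawler."""
    groups = []
    for agent in blocked_agents:
        # One group per crawler: the named agent may not fetch disallow_path.
        groups.append(f"User-agent: {agent}\nDisallow: {disallow_path}\n")
    # Catch-all group: an empty Disallow value means "nothing is disallowed".
    groups.append("User-agent: *\nDisallow:\n")
    # Groups are separated by a blank line.
    return "\n".join(groups)


# The two crawlers named in the thread; extending this list by hand is the
# maintenance burden being complained about.
BLOCKED = ["GPTBot", "Bytespider"]

if __name__ == "__main__":
    print(build_robots_txt(BLOCKED))
```

The list still has to be kept current by hand, so a sketch like this automates the formatting but not the discovery of new crawlers.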

              • pinkapple@lemmy.ml
                English
                arrow-up
                1
                ·
                1 month ago

                Bots only identify themselves and their organization in the user agent; they don’t tell you specifically what they do with the data, so stop your fairytales. They do give you a really handy URL, though, with user agents and even IPs in JSON if you want to fully block the crawlers but not the search bots sent by user prompts.

                Your ad revenue money can be secured.

                https://platform.openai.com/docs/bots/

                If for some reason you can’t be bothered to edit your own robots.txt (because it’s hard to tell which bots are search bots for muh ad money) then maybe hire someone.
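
The IP-based blocking mentioned above can be sketched roughly as follows, assuming you have already extracted a list of CIDR ranges from whatever file a crawler operator publishes. The layout of that published JSON is not reproduced here. The ranges below are placeholders from the reserved documentation blocks, and `parse_ranges`, `is_crawler_ip`, and `CRAWLER_RANGES` are hypothetical names.

```python
import ipaddress

# Placeholder CIDR ranges taken from the reserved documentation blocks;
# these are NOT real crawler addresses. Substitute the published ranges.
CRAWLER_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def parse_ranges(cidrs):
    """Convert CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_crawler_ip(ip, networks):
    """Return True if the client IP falls inside any of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    nets = parse_ranges(CRAWLER_RANGES)
    for client in ("192.0.2.17", "203.0.113.5"):
        print(client, is_crawler_ip(client, nets))
```

A check like this would normally sit in a reverse proxy or firewall rule rather than in application code. The Python version is only meant to show the matching logic.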

                • TheOneCurly@lemm.ee
                  English
                  arrow-up
                  1
                  ·
                  1 month ago

                  Lmao you linked to the same page I did where this text appears:

                  GPTBot is used to make our generative AI foundation models more useful and safe. It is used to crawl content that may be used in training our generative AI foundation models.

                  Also you’re so capitalism-brained you assume anyone running a website must be doing so for profit. My hobby projects (personal homepage and personal git forge) were getting slammed by bots while I just paid the bills. I could have locked them both behind an auth portal but then I might as well just take them off the internet and run everything on my LAN.

        • jonathan7luke@lemmy.ml
          English
          arrow-up
          1
          arrow-down
          1
          ·
          1 month ago

          But with the rise of AI, the dynamic is changing: We are observing a significant increase in request volume, with most of this traffic being driven by scraping bots collecting training data for large language models (LLMs) and other use cases. Automated requests for our content have grown exponentially, alongside the broader technology economy, via mechanisms including scraping, APIs, and bulk downloads. This expansion happened largely without sufficient attribution, which is key to drive new users to participate in the movement, and is causing a significant load on the underlying infrastructure that keeps our sites available for everyone.

          - https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/

          • pinkapple@lemmy.ml
            English
            arrow-up
            1
            arrow-down
            1
            ·
            1 month ago

            via mechanisms including scraping, APIs, and bulk downloads.

            Omg exactly! Thanks. Yet nothing about having to use logins to stop bots, because that kinda isn’t a thing when you already provide data dumps and an API to Wikimedia Commons.

            While undergoing a migration of our systems, we noticed that only a fraction of the expensive traffic hitting our core datacenters was behaving how web browsers would usually do, interpreting javascript code. When we took a closer look, we found out that at least 65% of this resource-consuming traffic we get for the website is coming from bots, a disproportionate amount given the overall pageviews from bots are about 35% of the total.

            Source for traffic being scraping data for training models: they’re blocking javascript therefore bots therefore crawlers, just trust me bro.

  • wetbeardhairs@lemmy.dbzer0.com
    English
    arrow-up
    51
    ·
    1 month ago

    This is all extrapolated from Google’s self-published survey of how their users interact with their search results. Approximately 60% of users don’t click anything after a search. Personally I think that is because users have found their results to be SEO garbage and not worth clicking on… but that’s just my opinion.

    • CubeOfCheese@sopuli.xyz
      English
      arrow-up
      32
      ·
      1 month ago

      I’ve watched a lot of students do a search after I tell them to research something, look through a few of the summaries, then look at me in defeat. I have to tell them to actually click some links to try and find an answer.

      • Glitterbomb@lemmy.world
        English
        arrow-up
        34
        ·
        1 month ago

        I went to college for networking but the most productive class I’ve ever had where I learned the most about the internet was instead back in high school. This teacher would make 20 page packets with the most obscure questions like what’s the weight of model number 62xRG4 (some obscure car part or something) and he told us to google it. We would spend entire classes just searching for information we would never use, but it drilled into me how to go about finding the information I need. It’s been utterly invaluable. Thank you Mr Ward.

        • cardfire@sh.itjust.works
          English
          arrow-up
          4
          ·
          1 month ago

          I love this, so much. Blue Links have been the most critical pass to my future, across my entire life.

          Purple links often, too. I can’t imagine surrendering the ability to sift through information with my own eyes and hands and brain.

    • Jack_Burton@lemmy.world
      English
      arrow-up
      14
      arrow-down
      1
      ·
      1 month ago

      Of course they don’t click anything. Google search has just become a front-end for Gemini; the answer is “served” up right at the top and most people will just take that for gospel.

      • jj4211@lemmy.world
        English
        arrow-up
        1
        ·
        1 month ago

        Even without Gemini, many of my searches are covered by the few-word snippets from the top few results. Most of my searches are quick queries with quick answers, usually not me embarking on some huge research effort.

  • db2@lemmy.world
    English
    arrow-up
    56
    arrow-down
    15
    ·
    1 month ago

    The web doesn’t have a business model, Cloudflare, you do. And nobody cares because you suck.

    • sugar_in_your_tea@sh.itjust.works
      English
      arrow-up
      67
      arrow-down
      4
      ·
      1 month ago

      Eh, Cloudflare provides a pretty good service for a very reasonable price.

      But yeah, the web doesn’t have a business model in the same way a town square doesn’t, yet you can make a business work in both areas. Make a compelling product and people will pay you for it.

      • Dr. Moose@lemmy.world
        English
        arrow-up
        6
        arrow-down
        3
        ·
        1 month ago

        You mean the product that literally makes the web unusable for many and tracks your every single step with extremely invasive fingerprinting techniques? That product?

        • sugar_in_your_tea@sh.itjust.works
          English
          arrow-up
          2
          ·
          1 month ago

          That’s a big reason why I don’t use their security layer, mostly just their domain registrar. They have a ton of products that don’t involve tracking your users.

      • ThirdConsul@lemmy.ml
        English
        arrow-up
        10
        arrow-down
        7
        ·
        1 month ago

        Cloudflare provides a pretty good service for a very reasonable price.

        You mean selling fingerprinted user data to advertisers?

  • devfuuu@lemmy.world
    English
    arrow-up
    19
    ·
    1 month ago

    It needs to get even nastier so that it affects all the big players in a huge way so they get to do something about it. While it only affects the indie web we are all just gonna keep suffering.

  • morrowind@lemmy.ml
    English
    arrow-up
    14
    arrow-down
    1
    ·
    1 month ago

    Yeah I think we’re going to be grappling with this issue for at least the next decade. The traditional web model falls apart under AI

    • thejml@lemm.ee
      English
      arrow-up
      30
      arrow-down
      1
      ·
      1 month ago

      To be fair, the traditional web models were falling apart prior to AI as well. We’ve gone so far past “ad driven” that everything has to be full of ads and clickbait to drive revenue just to run the infrastructure, let alone pay for the pages’ creation and upkeep. Journalists and developers, services and goods are all using adword soup to try to get anything close to a useful revenue stream, and it’ll just keep getting worse until we figure out a better business model. We’re going to increasingly see paywalls to try to make up for that, but a large share of people on the internet won’t want to spend money on quality sources when they used to be able to get them for free. It’s been a race to the bottom for a while and it’s at a point that isn’t sustainable long term. AI just accelerates that to the next level.

      • feannag@sh.itjust.works
        English
        arrow-up
        2
        ·
        1 month ago

        What’s challenging about paywalls and not wanting to spend money is not necessarily not wanting to spend, but convenience and cost. If it costs me 10 cents for each blog or tutorial or github page I look at while working on a project, or 1 cent for every funny video, that adds up. And do I have to put my credit card in for every site? Hope that every site has good enough security to prevent payment information leaks?

        And I don’t think anyone is interested in a Netflix-style internet that fractures into 6 different subscriptions to get every site you need on the web.

        • morrowind@lemmy.ml
          English
          arrow-up
          1
          ·
          1 month ago

          Some sort of universal microtransaction layer is the dream. I believe there’s also a proposed web standard for it.

          Scroll was also making it work before they got bought by Twitter

    • doodledup@lemmy.world
      English
      arrow-up
      6
      ·
      1 month ago

      The traditional web was long gone anyways. There are like a dozen sites you find for any Google query. It’s so hard to find small hidden treasures on the internet.

  • AllHailTheSheep@sh.itjust.works
    English
    arrow-up
    10
    ·
    1 month ago

    maybe their business model. trust me. they’ll find a way to monetize the zero click internet too. then it’s back to square one

    • e461h@sh.itjust.works
      English
      arrow-up
      7
      ·
      1 month ago

      I believe this is why tech execs and investors are so hot on pushing AI into everything. They’ll control everyone’s digital experience and you can 100% count on being force fed ads and paid propaganda. Embrace, extend, extinguish

      • futatorius@lemm.ee
        English
        arrow-up
        2
        ·
        1 month ago

        Yeah, so much for all those promises of disintermediation being a benefit of the web.

  • nutsack@lemmy.dbzer0.com
    English
    arrow-up
    6
    ·
    edit-2
    1 month ago

    i like to publish content so that bots can scrape it and serve it to people without attribution i think it’s good i think ill publish some more interesting stuff right away

  • whotookkarl@lemmy.world
    English
    arrow-up
    6
    arrow-down
    4
    ·
    1 month ago

    I’m not buying whatever a billionaire nepo baby CEO monopoly owner is peddling. Let’s hear what some labor leaders have to say about it for a change.

    • gandalf_der_12te@discuss.tchncs.de
      English
      arrow-up
      1
      ·
      1 month ago

      i’d like to be a labor leader, but i’m not (yet). Yet here’s my opinion:

      Knowledge was meant to be free since the beginning. I look at ideas as human-cultivated, carefully cultured viruses. They’re packages of information that live within a host.

      They’re a lot less aggressive than their feral counterparts, but they’re still individual beings who want to spread. Holding back knowledge is unnatural, and the internet should be free.

  • glitchdx@lemmy.world
    English
    arrow-up
    3
    arrow-down
    1
    ·
    edit-2
    1 month ago

    Trying to comment in this thread and it tells me “Toastify is awesome”? wth?

    edit: nevermind? whatever borked seems to have fixed itself? I don’t know.