The Web was much better and more useful back before it had a business model. Good riddance.
So you’re saying the ad-driven internet will die? And we will be left with what? Wikipedia and Lemmy? I for one welcome our AI overlords!
Nah, it’s saying that the ad- and AI-driven internet will prevail. People only use Google to find an answer and don’t dig deeper, and when they do click, it’s often on sponsored links. People using GPTs are even less likely to click a link. No ads in there yet, but just wait.
Apologies if you were joking.
“What should I do if I’m going through severe emotional distress? How do I choose a good psychiatrist?”
ChatGPT: "I’m sorry to hear that you’ve been going to a stressful situation, it’s always worth talking about your feelings. I’ve come up with a plan to help you:
1 Purchase an ice cold Pepsi Black™ from a Pepsi official supplier"
Drink 2 Mountain Dews to unlock more searches.
Or one of these https://lemmy.world/post/29579726
Haha reminds me of Black Mirror
This is part of the larger problem that AI tools are trained on (and profit off of) content that is produced and hosted by others, who are now seeing their traffic shift from humans to bots. For content sources that pay for hosting with ads, this means a loss in the revenue that pays for hosting. For content sources like Wikipedia, hosting costs are increasing significantly due to the rise in bot traffic. Even if you want every website that depends on ad revenue to fail (which I don’t entirely agree with), AI is still damaging the open web in other ways. Websites like Wikipedia, for example, may soon be forced to lock content behind logins or leverage aggressive captchas just to fight the bot traffic, which makes things worse for those of us who still prefer to use actual websites over AI summaries.
Nobody is scraping Wikipedia over and over to create datasets for AIs; there are already open datasets and API deals. And Wikipedia in particular has always offered a bimonthly data dump of the entire database.
You clearly haven’t run a website recently. Until I set up Anubis last week, I was getting constant requests from dozens of various bot scrapers 24/7. That included the big ones.
Kay, and that has nothing to do with what I said. Scrapers and bots =/= AI. It’s not even the same companies that make the unfree datasets. The scrapers and bots that hit your website are not some random “AI” feeding on data lol. This is what some models are actually trained on; it’s already free, so it doesn’t need to be individually rescraped, and it’s mostly garbage-quality data: https://commoncrawl.org/ Nobody wastes resources rescraping all this SEO-infested dump.
Your issue has more to do with SEO than anything else. Btw, before you diss Common Crawl: it’s used in research quite a lot, so it’s not some evil thing that threatens people’s websites. Add a robots.txt maybe.
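If you want to see what “already free” looks like, the Common Crawl index even has a public API you can poke at. Rough sketch in Python (the crawl ID is just a placeholder for illustration; they publish a new index with every release, so grab a current one from index.commoncrawl.org):

```python
import requests

# Ask the public Common Crawl index which captures it has for a domain.
# NOTE: the crawl ID below is only a placeholder; pick a current one from
# https://index.commoncrawl.org/ -- a new index is published with each crawl.
crawl_id = "CC-MAIN-2024-33"
index_url = f"https://index.commoncrawl.org/{crawl_id}-index"

resp = requests.get(
    index_url,
    params={"url": "example.com/*", "output": "json"},
    timeout=30,
)
resp.raise_for_status()

# The index answers with one JSON object per line, each describing a capture:
# timestamp, MIME type, HTTP status, and where in the WARC archives the page sits.
for line in resp.text.splitlines():
    print(line)
```

Point it at your own domain and you can see which of your pages are already sitting in the dump, no fresh scraping required.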
Oh ok, I’ll just ignore the constant requests from GPTBot, ByteSpider, and the hundreds of others who very plainly, sometimes right in their user agent, tell you that they’re grabbing content for training data. Robots.txt is nice and all, but manually adding every single up-and-coming AI company is impossible. Like I said, Anubis is the first time I’ve gotten them all to even remotely calm down.
Bots only identify themselves and their organization in the user agent; they don’t tell you specifically what they do with the data, so stop your fairytales. They do give you a really handy URL though, with user agents and even IPs in JSON, if you want to fully block the crawlers but not the search bots sent by user prompts.
Your ad revenue can be secured.
https://platform.openai.com/docs/bots/
If for some reason you can’t be bothered to edit your own robots.txt (because it’s hard to tell which bots are search bots for muh ad money) then maybe hire someone.
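If memory serves, their docs boil down to a handful of user agents, so the robots.txt ends up looking something like this (just from memory, double-check the exact names against that page before copying):

```
# Block OpenAI's training crawler, keep the user-triggered / search fetchers.
# User agent names as listed at https://platform.openai.com/docs/bots/ --
# verify there, since they can change.
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
```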
Lmao you linked to the same page I did where this text appears:
GPTBot is used to make our generative AI foundation models more useful and safe. It is used to crawl content that may be used in training our generative AI foundation models.
Also you’re so capitalism brained you assume anyone running a website must be doing so for profit. My hobby projects (personal homepage and personal git forge) were getting slammed by bots while I just paid the bills. I could have locked them both behind an auth portal but then I might as well just take them off the internet and run everything on my LAN.
But with the rise of AI, the dynamic is changing: We are observing a significant increase in request volume, with most of this traffic being driven by scraping bots collecting training data for large language models (LLMs) and other use cases. Automated requests for our content have grown exponentially, alongside the broader technology economy, via mechanisms including scraping, APIs, and bulk downloads. This expansion happened largely without sufficient attribution, which is key to drive new users to participate in the movement, and is causing a significant load on the underlying infrastructure that keeps our sites available for everyone.
- https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/
via mechanisms including scraping, APIs, and bulk downloads.
Omg exactly! Thanks. Yet nothing about having to use logins to stop bots, because that kinda isn’t a thing when you already provide data dumps and an API to Wikimedia Commons.
While undergoing a migration of our systems, we noticed that only a fraction of the expensive traffic hitting our core datacenters was behaving how web browsers would usually do, interpreting javascript code. When we took a closer look, we found out that at least 65% of this resource-consuming traffic we get for the website is coming from bots, a disproportionate amount given the overall pageviews from bots are about 35% of the total.
Source for the traffic being scraping for training models: they’re not running JavaScript, therefore bots, therefore AI crawlers, just trust me bro.
This is all extrapolated from Google’s self-published survey of how their users interact with their search results. Approximately 60% of users don’t click anything after a search. Personally I think that’s because users have found the results to be SEO garbage and not worth clicking on… but that’s just my opinion.
I’ve watched a lot of students do a search after I tell them to research something, look through a few of the summaries, then look at me in defeat. I have to tell them to actually click some links to try and find an answer.
I went to college for networking, but the class where I learned the most about the internet was actually back in high school. This teacher would make 20-page packets with the most obscure questions, like what’s the weight of model number 62xRG4 (some obscure car part or something), and he told us to google it. We would spend entire classes just searching for information we would never use, but it drilled into me how to go about finding the information I need. It’s been utterly invaluable. Thank you Mr. Ward.
I love this, so much. Blue links have been the most critical path to my future, across my entire life.
Purple links often, too. I can’t imagine surrendering the ability to sift through information with my own eyes and hands and brain.
I mourn for humanity.
Of course they don’t click anything. Google search has just become a front-end for Gemini; the answer is “served” up right at the top, and most people will just take it as gospel.
Even without Gemini, many of my searches are covered by the few word snippets from the top few results. Most of my searches are quick queries with quick answers, usually not me embarking on some huge research effort.
The web doesn’t have a business model, Cloudflare, you do. And nobody cares because you suck.
Eh, Cloudflare provides a pretty good service for a very reasonable price.
But yeah, the web doesn’t have a business model in the same way a town square doesn’t, yet you can make a business work in both areas. Make a compelling product and people will pay you for it.
You mean the product that literally makes the web unusable for many and tracks your every step with extremely invasive fingerprinting techniques? That product?
That’s a big reason why I don’t use their security layer, mostly just their domain registrar. They have a ton of products that don’t involve tracking your users.
Cloudflare provides a pretty good service for a very reasonable price.
You mean selling fingerprinted user data to advertisers?
Ever had one of your servers DDoSed before? Clearly not
It needs to get even nastier, so that it affects all the big players in a huge way and they’re forced to do something about it. While it only affects the indie web, we’re all just gonna keep suffering.
Cloudflare already ruined the web way before AI was even a thing.
In what way?
Yeah I think we’re going to be grappling with this issue for at least the next decade. The traditional web model falls apart under AI
To be fair, the traditional web models were falling apart prior to AI as well. We’ve gone so far past “ad driven” that everything has to be full of ads and clickbait to drive revenue just to run the infrastructure, let alone pay for the page’s creation and upkeep. Journalists and developers, services and goods, are all using adword soup to try to get anything close to a useful revenue stream, and it’ll just keep getting worse until we figure out a better business model. We’re going to increasingly see paywalls to try to make up for that, but a large part of the people on the internet won’t want to spend money on quality sources when they used to be able to get them for free. It’s been a race to the bottom for a while, and it’s at a point that isn’t sustainable long term. AI just accelerates that to the next level.
The challenge with paywalls isn’t necessarily an unwillingness to spend money, it’s convenience and cost. If it costs me 10 cents for each blog or tutorial or GitHub page I look at while working on a project, or 1 cent for every funny video, that adds up. And do I have to put my credit card in for every site? Hope that every site has good enough security to prevent payment information leaks?
And I don’t think anyone is interested in a Netflix-style internet that fractures into 6 different subscriptions to get every site you need on the web.
Some sort of universal microtransaction layer is the dream. I believe there’s also a proposed web standard for it.
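If I’m thinking of the same thing, it’s the Web Monetization proposal: the page declares a wallet address (a “payment pointer”), and a supporting browser or extension streams tiny payments to it while you’re on the page. From memory the markup is roughly the below, but check the current draft at webmonetization.org before trusting me:

```html
<!-- Web Monetization (proposed standard; sketch from memory, verify against the
     current draft): the href is the site's payment pointer / wallet address, and
     supporting browsers stream micropayments to it while the page stays open. -->
<link rel="monetization" href="https://wallet.example.com/alice">
```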
Scroll was also making it work before they got bought by Twitter
That’s exactly Elon Musk’s goal with making Twitter a payment platform.
Scroll was just for reading websites, though. Musk seems to want a WeChat-style super app.
Hah. No. That goes all the way back to the ’90s. Tim Berners-Lee proposed that standard.
Did I say Elon came up with the idea? I said that’s his goal.
Also not saying it like it’s a good thing, just stating a fact.
The traditional web was long gone anyway. There are like a dozen sites you find for any Google query. It’s so hard to find small hidden treasures on the internet.
Maybe their business model. Trust me, they’ll find a way to monetize the zero-click internet too. Then it’s back to square one.
I believe this is why tech execs and investors are so hot on pushing AI into everything. They’ll control everyone’s digital experience and you can 100% count on being force fed ads and paid propaganda. Embrace, extend, extinguish
Yeah, so much for all those promises of disintermediation being a benefit of the web.
Everyone is too busy doomscrolling TikTok to notice.
I like to publish content so that bots can scrape it and serve it to people without attribution. I think it’s good. I think I’ll publish some more interesting stuff right away.
I didn’t come here for heartwarming stories; yet here I am.
I’m not buying whatever a billionaire nepo baby CEO monopoly owner is pedaling. Let’s hear what some labor leaders have to say about it for a change.
peddling
Nice. For a second I thought I got one of those Nicole DMs lol.
But it’s felony contempt of business model!!
I’d like to be a labor leader, but I’m not (yet). Still, here’s my opinion:
Knowledge was meant to be free from the beginning. I look at ideas as human-cultivated, carefully cultured viruses. They’re packages of information that live within a host.
They’re a lot less aggressive than their feral counterparts, but they’re still individual beings who want to spread. Holding back knowledge is unnatural, and the internet should be free.
Trying to comment in this thread and it tells me “Toastify is awesome”? wth?
edit: nevermind? Whatever was borked seems to have fixed itself? I don’t know.
lmao, I’m laughing so hard at this! It’s probably displaying the wrong text for an error.
@dessalines@lemmy.ml do you have more information on this?