FOSS infrastructure is under attack by AI companies

simple@lemm.ee · 3 months ago

FOSS infrastructure is under attack by AI companies

db0@lemmy.dbzer0.com · 3 months ago

Yep, it hit many Lemmy servers as well, including mine. I had to block multiple Alibaba subnets to get things back to normal. But I’m expecting the next spam wave.
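For anyone wondering what subnet blocking amounts to, here is a minimal sketch using Python's standard `ipaddress` module. The ranges below are hypothetical placeholders taken from the RFC 5737 documentation networks, not the actual subnets that were blocked, and `is_blocked` is an illustrative name. In practice this check would usually live in a firewall or reverse proxy rather than in application code.

```python
import ipaddress

# Hypothetical placeholder ranges (RFC 5737 documentation networks).
# These are NOT real Alibaba subnets; substitute the ranges seen in your logs.
BLOCKED_SUBNETS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked subnet."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_SUBNETS)
```

A request handler could call `is_blocked(remote_addr)` and refuse the request early when it returns `True`.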

fjordo@feddit.uk · 3 months ago

I wish these companies would realise that acting like this is a very fast way to get scraping outlawed altogether, which is a shame because it can be genuinely useful (archival, automation, etc).

jol@discuss.tchncs.de · 3 months ago

How can you outlaw something a company on another continent is doing? Especially when they are getting better at disguising themselves as normal traffic? What will happen is that politicians will see this as another reason to push for everyone having their ID associated with their Internet traffic.

klu9@lemmy.ca · 3 months ago

The Linux Mint forums have been knocked offline multiple times over the last few months, to the point where the admins had to block all Chinese and Brazilian IPs for a while.

deeferg@lemmy.world · 3 months ago

This is the first I’ve heard about Brazil in this type of cyber attack. Is it re-routed traffic going there or are there a large number of Brazilian bot farms now?

klu9@lemmy.ca · 3 months ago

I don’t know why or how. I just know that the admins saw the servers were being overwhelmed by traffic from Brazilian IPs and blocked it for a while.

melpomenesclevage@lemmy.dbzer0.com · edit-2 3 months ago

I hear there’s a tool called (I think) ‘nepenthe’ that creates a loop for an LLM. If you use that in combination with a fairly tight blacklist of IPs you’re certain are LLM crawlers, I bet you could do a lot of damage, and maybe make them slow their shit down, or do this in a more reasonable way.

PrivacyDingus@lemmy.world · 3 months ago

nepenthe

It’s a Markov-chain-based text generator, which could be difficult for people to implement on repos depending on how they’re hosting them. Regardless, any sensibly built crawler will have rate limits. This means that although Nepenthe is an interesting thought exercise, it will only affect crawlers knocked together by people who haven’t thought about it, not the Big Big companies with the real resources who are likely having the biggest impact.
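To make "Markov-chain-based text generator" concrete, here is a minimal word-level sketch. It only illustrates the general technique and is not the tool's actual implementation; the function names `build_chain` and `generate` are made up for this example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def build_chain(text: str) -> Dict[str, List[str]]:
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: Dict[str, List[str]], start: str, length: int,
             seed: Optional[int] = None) -> str:
    """Walk the chain from start, emitting at most length words."""
    rng = random.Random(seed)
    word = start
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            # Dead end: this word never had a successor in the source text.
            break
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)
```

Fed a large enough corpus, `generate` produces plausible-looking but meaningless text, which is the kind of filler such a tarpit would serve to a crawler.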

melpomenesclevage@lemmy.dbzer0.com · 3 months ago

might hit a few times, or maybe there’s a version that can puff up the data in terms of space, and salt it in terms of utility.

PrivacyDingus@lemmy.world · 3 months ago

any way of slowing things down or wasting resources is a gain I guess

grue@lemmy.world · 3 months ago

ELI5 why the AI companies can’t just clone the git repos and do all the slicing and dicing (running git blame etc.) locally instead of running expensive queries on the projects’ servers?
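For reference, the local workflow being asked about looks roughly like the sketch below: clone once, then run queries such as `git blame` against the local copy. The helper names and the example URL are hypothetical, and the commands are only built as argument lists here so nothing touches the network until `run` is called.

```python
import subprocess
from typing import List


def clone_cmd(repo_url: str, dest: str) -> List[str]:
    """Command that copies the full repository history into dest."""
    return ["git", "clone", repo_url, dest]


def blame_cmd(dest: str, path: str) -> List[str]:
    """Command that runs git blame on the local clone, not on the server."""
    return ["git", "-C", dest, "blame", "--line-porcelain", path]


def run(cmd: List[str]) -> str:
    """Execute a command and return its standard output as text."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Usage would be along the lines of `run(clone_cmd("https://example.org/project.git", "project"))` followed by `run(blame_cmd("project", "README.md"))`, where both the URL and the file name are placeholders.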

zovits@lemmy.world · 3 months ago

Takes more effort and results in a static snapshot, without being able to track the evolution of the project. (Disclaimer: I don’t work with AI, but I’d bet this is the reason. I don’t intend to defend those scraping twatwaffles in any way, only to offer a possible explanation.)

Sturgist@lemmy.ca · 3 months ago

Also, having your victim bear the hosting costs is an added benefit.

Retropunk64@lemmy.world · edit-2 3 months ago

deleted by creator

Fijxu@programming.dev · 3 months ago

AI scraping is so cancerous. I host a public RedLib instance (redlib.nadeko.net) and due to BingBot and Amazon bots, my instance was always rate limited because the number of requests they make is insane. What makes me angrier is that these fucking fuckers use free, privacy-respecting services to access Reddit and scrape it. THEY CAN’T BE SO GREEDY. Hopefully, blocking their user-agent works fine ;)
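A minimal sketch of what user-agent blocking amounts to is below. The token list is an assumption for illustration (check your own access logs for the strings the bots actually send), and `is_blocked_user_agent` is a made-up name. Note that the User-Agent header is supplied by the client, so this only stops bots that identify themselves honestly.

```python
# Assumed, illustrative tokens; compared case-insensitively.
BLOCKED_UA_TOKENS = ("bingbot", "amazonbot")


def is_blocked_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)
```

A server or reverse proxy would typically answer such requests with a 403 before doing any expensive work.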

enrich@programming.dev · 2 months ago

I posted on your guestbook but the link was broken.

I’d say be wary of the Anubis author.

I noticed you started using Anubis recently. Take a look here: https://github.com/Xe/x/issues/701 and also PRs 702, 703, 704

-She made GNOME say something the project doesn’t agree with
-She tried to push her beliefs where it was unnecessary and disrespectful
-She still refuses to remove things in her code that are disrespectful; some are mere comments and serve no real purpose
-She refused to accept PRs or discuss changes, and refused dictionary definitions, seemingly because of her ego
-After all that, she locked conversations in the issue/PRs, so nobody else can show support now

If she has a belief, there are other mediums/ways to express it. Why like this?

This is unwelcoming and definitely not in the FOSS spirit.

MonkderVierte@lemmy.ml · edit-2 3 months ago

Assuming we could build a new internet from the ground up, what would be the solution? IPFS for load-balancing?

AbsoluteChicagoDog@lemm.ee · 3 months ago

deleted by creator

dindonmasker@sh.itjust.works · 3 months ago

Maybe letters through the mail to receive posts.

WhyJiffie@sh.itjust.works · 3 months ago

so basically what you are saying is to not put information on public places, but only send information to specific people

AbsoluteChicagoDog@lemm.ee · 3 months ago

deleted by creator

dreadbeef@lemmy.dbzer0.com · 3 months ago

How long will USPS last?

melpomenesclevage@lemmy.dbzer0.com · edit-2 3 months ago

take the resources from them so they don’t have them anymore. infiltrating the teams that do this and exposing or sabotaging the effort. literally fighting back, possibly in ways that involve giving the CEOs and prominent investors a free trip to an old coal mine.

short of that…

cy_narrator@discuss.tchncs.de · 3 months ago

AI will show up there to abuse it as well

/home/pineapplelover@lemm.ee · 3 months ago

They’re afraid