Wikipedia had bots writing US census place articles back in 2002, 20 years before LLMs were a thing. They’ve got decades of editorial policies in place, so I am not scared that the quality is going to drop.
Remember to download a backup while information quality is still passable
It’s not for use in editing articles.
Do these backups also contain the edit histories?
There are both dumps with full history and ones that contain just the current set of articles. The full-history dump starts on the 1st of each month but often takes ~2 weeks to run to completion, so you probably have to look back to the April 1, 2025 dump for that. The Meta-Wiki dumps page has all the info.
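If you want to script against those dumps, the runs are keyed by date in the URL. A minimal sketch, assuming the standard `dumps.wikimedia.org/<wiki>/<YYYYMMDD>/` directory layout (the helper name is mine, not part of any official tooling):

```python
from datetime import date

# Public Wikimedia dumps root
DUMPS_BASE = "https://dumps.wikimedia.org"

def dump_run_url(wiki: str, run_date: date) -> str:
    """Build the directory URL for a given wiki's dump run.
    Runs are keyed by YYYYMMDD; full-history runs start on the 1st."""
    return f"{DUMPS_BASE}/{wiki}/{run_date:%Y%m%d}/"

# The April 1, 2025 full-history run mentioned above:
print(dump_run_url("enwiki", date(2025, 4, 1)))
# → https://dumps.wikimedia.org/enwiki/20250401/
```

From there you'd browse that directory (or its `dumpstatus.json`) to pick the history vs. current-pages files.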
Haters gon hate
…nothing could possibly go worng!..
(Some of you may remember the original Westworld 1sheet…)
Wikipedia is generally a really good candidate for generative AI.
Generative AI suffers from inaccuracy, with text generators making up believable lies when they don’t have enough information.
The idea of generative AI isn’t accuracy, so that’s pretty expected.
Generative AI is designed to be used with a content base and expand on information, not to create new information. You can feed generative AI with the entirety of the current Wikipedia text source and have it expand on subjects which need it, and curtail and simplify other subjects which need it.
You don’t ask generative AI to come up with new information–that’s how you get inaccurate information.
text generators making up believable lies when they don’t have enough information
Let’s not anthropomorphize AI. It doesn’t lie. When it lacks sufficient information on a subject, it uses whatever data is available to make its output conversationally complete, regardless of whether the result is correct. That’s completely different, and you can specifically prohibit an AI from doing that…
AI is great when used appropriately. The issue is that people are using AI as a Google replacement, something it’s not designed to do. AI isn’t a fact engine. LLMs are designed to resemble human speech as closely as possible, not to give correct answers to questions. People’s issue with AI is that they’re fucking using it wrong.
This is an exceptionally good usage of AI because you already have the required factual background knowledge. You can simply feed it to your AI, telling it not to fill in any gaps and to rewrite articles to be more uniform, with direct and easy-to-consume wording. This is quite literally what generative AI was designed for: to take factual knowledge and generate context around the existing data.
Issues arise when you use AI for things other than what it was intended for, or when you don’t give it enough information and it has to invent material to fill the gaps. AI will do what you ask; you just have to know how to ask it. That’s why AI prompt engineers are a thing.
Exactly. At work, my team kinda sucks at communication but is great w/ facts (we’re engineers, go figure), so they use gen AI to turn facts into nicer-to-read writing (performance reviews, emails, documentation, etc.). The process is relatively smooth:
- generate all the facts in a rough form
- ask AI to reword it for whatever purpose
- edit it a bit to correct any issues
- if needed, ask a coworker to quickly review it
For that task, it works pretty well.
I also use it for this. I can get pretty draconian in my writing sometimes, so I feed my long-winded responses into AI and pull out a more readable response. I even have AI write pull request descriptions sometimes. Works great.
I still fear that mistakes may slip through, but those can be spotted if multiple people check the text
There are Wikipedia pages that are really obscure (especially ones not in English) that nobody would be likely to check to verify they’re correct.
I still fear that mistakes may slip through
Wikipedia is moderated by real people, so this is a non-issue. The fault tolerance will be the same for AI as for people, because the same moderators review the content regardless of where it comes from.