Wikipedia had bots writing US census place articles back in 2002, 20 years before LLMs were a thing. They’ve got decades of editorial policies in place, so I am not scared that the quality is going to drop.
Remember to download a backup while information quality is still passable
It’s not for use in editing articles.
Do these backups also contain the edit histories?
There are both dumps with full history and ones that contain just the current set of articles. The full-history dump starts on the 1st of each month but often takes ~2 weeks to run to completion, so you probably have to look back to the April 1, 2025 dump for that. The Meta-Wiki dumps page has all the info.
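If you want to script against those dumps, the runs are keyed by date in the URL. A minimal sketch, assuming the standard `dumps.wikimedia.org/<wiki>/<YYYYMMDD>/` directory layout (the helper name is mine, not part of any official tooling):

```python
from datetime import date

# Public Wikimedia dumps root
DUMPS_BASE = "https://dumps.wikimedia.org"

def dump_run_url(wiki: str, run_date: date) -> str:
    """Build the directory URL for a given wiki's dump run.
    Runs are keyed by YYYYMMDD; full-history runs start on the 1st."""
    return f"{DUMPS_BASE}/{wiki}/{run_date:%Y%m%d}/"

# The April 1, 2025 full-history run mentioned above:
print(dump_run_url("enwiki", date(2025, 4, 1)))
# → https://dumps.wikimedia.org/enwiki/20250401/
```

From there you'd browse that directory (or its `dumpstatus.json`) to pick the history vs. current-pages files.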
Haters gon hate
…nothing could possibly go worng!..
(Some of you may remember the original Westworld 1sheet…)
Wikipedia is generally a really good candidate for generative AI.
Generative AI suffers from inaccuracy, with text generators making up believable lies when they don’t have enough information.
The idea of generative AI isn’t accuracy, so that’s pretty expected.
Generative AI is designed to be used with a content base and expand on information, not to create new information. You can feed generative AI with the entirety of the current Wikipedia text source and have it expand on subjects which need it, and curtail and simplify other subjects which need it.
You don’t ask generative AI to come up with new information–that’s how you get inaccurate information.
text generators making up believable lies when they don’t have enough information
Let’s not anthropomorphize AI. It doesn’t lie. When it lacks sufficient information on a subject, it uses whatever data is available to make its output conversationally complete, regardless of whether the result is correct. That’s completely different, and you can specifically prohibit an AI from doing that…
AI is great when used appropriately. The issue is that people are using AI as a Google replacement, something it’s not designed to do. AI isn’t a fact engine. LLMs are designed to resemble human speech as closely as possible, not to give correct answers to questions. People’s issue with AI is that they’re fucking using it wrong.
This is an exceptionally good usage of AI because you already have the required factual background knowledge. You can simply feed it to your AI, telling it not to fill in any gaps and to rewrite articles to be more uniform, with direct and easy-to-consume wording. This is quite literally what generative AI was designed for: to take factual knowledge and generate context around the existing data.
Issues arise when you use AI for things other than what it was intended for, or when you don’t give it enough information and it has to invent material to fill the gaps. AI will do what you ask; you just have to know how to ask it. That’s why AI prompt engineers are a thing.
Exactly. At work, my team kinda sucks at communication but is great w/ facts (we’re engineers, go figure), so they use gen AI to turn facts into nicer-to-read writing (performance reviews, emails, documentation, etc.). The process is relatively smooth:
- generate all the facts in a rough form
- ask AI to reword it for whatever purpose
- edit it a bit to correct any issues
- if needed, ask a coworker to quickly review it
For that task, it works pretty well.
I also use it for this. I can get pretty draconian in my writing sometimes, so I feed my long-winded responses into AI and pull out a more readable response. I even have AI write pull request descriptions sometimes. Works great.
I still fear that mistakes may slip through, but those can be spotted if multiple people check the text
There are Wikipedia pages that are really obscure (especially ones not in English) that nobody would be likely to check to verify they’re correct.
I still fear that mistakes may slip through
Wikipedia is moderated by real people, so this is a non-issue. The fault tolerance will be the same for AI as for people, because the same moderators review the content regardless of where it comes from.