Machine translators have made it easier than ever to create error-plagued Wikipedia articles in obscure languages. What happens when AI models get trained on junk pages?
It’s profoundly chauvinistic to think that people who speak other languages don’t have the same depth of literary resource as English-speakers because Wikipedia has fewer users.
Books. They’re called books. Every nation speaking every language has them.
I understand you’re trying to be nice to minority languages, but if you write research papers you either limit your audience to your own country, or you publish in English (I guess Spanish is pretty worldwide as well). If you set out to read a new paper in your field, I doubt you’d pick up something in Mongolian.
Even in Sweden I would write a serious paper in English, so that more of the world could read it. Yes, we have textbooks for our courses that are in Swedish, but I doubt there are many books covering LLMs being published in Swedish currently, for example.
I’m not “trying to be nice to minority languages”, I’m directly pushing back against the chauvinistic idea that the English Wikipedia is so important that those without it are somehow inferior. There is no “doom spiral”.
As for scientific papers, it’s called a translation. One can write academic literature in one’s native language and have it translated for more reach. That isn’t the case with Wikipedia, which is constantly being edited.
No one is saying that those who can’t access or read the English Wikipedia are inferior. The issue here is when what is on a non-English Wikipedia article is misleading or flat-out harmful (like what the article says about growing crops), because of juvenile attempts at machine translation that get it very wrong. So what Greenland did was shut down its poorly translated and maintained wiki site instead of letting it fester with misinformation. And this issue compounds when LLMs scrape Wikipedia as a source to learn new languages.