Language Popularity by Wikipedia

As you know, I have a spreadsheet which tries to work out which languages we should focus on when trying to start new l10n teams, by looking at the world Internet population by country, and what languages are spoken in each. This leads to a current top 5 target languages of:

  1. Tagalog
  2. Vietnamese
  3. Malay
  4. Azerbaijani (Azeri)
  5. Urdu

However, chofmann suggested we could cross-check our results by using the size of a language’s Wikipedia as a rough proxy for how popular that language was on the Internet. The data is here; if we take that, remove all the languages we have Firefox in already, and sort by size, we get a top 5 target languages of:

  1. Nepal Bhasa
  2. Azerbaijani (Azeri)
  3. Aromanian
  4. Haitian
  5. Tagalog

The Vietnamese and Malay Wikipedias are tiny. The Urdu one is a bit bigger, but still 1/3 of the size of the smallest one in the top 5.

So perhaps we should consider this second list of languages as our top priority list?
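
The cross-check described above (take the Wikipedia sizes, drop every language that already has Firefox, sort by size, keep the top few) can be sketched roughly as follows. This is only an illustration: the function name `top_targets`, the language labels, and the sizes are all made-up placeholders, not the real Wikipedia figures or the actual list of Firefox locales.

```python
def top_targets(wiki_sizes, already_localized, n=5):
    """Return the n largest Wikipedias whose language has no Firefox build yet.

    wiki_sizes: dict mapping a language label to its Wikipedia size
    already_localized: set of language labels that Firefox already ships in
    """
    # Drop every language we already have Firefox in.
    candidates = {
        lang: size
        for lang, size in wiki_sizes.items()
        if lang not in already_localized
    }
    # Sort the remaining languages by Wikipedia size, largest first.
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return ranked[:n]


# Placeholder data only: hypothetical labels and sizes, NOT real statistics.
wiki_sizes = {"lang-a": 900, "lang-b": 750, "lang-c": 600, "lang-d": 400, "lang-e": 50}
already_localized = {"lang-a", "lang-c"}

print(top_targets(wiki_sizes, already_localized, n=2))
```

Swapping in the real size table and the real set of shipped locales would, in principle, reproduce the second list above, assuming "size" means the same measure used in the source data.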

8 thoughts on “Language Popularity by Wikipedia”

  1. There’s already been an effort to start translating to Tagalog led by Regnard Raquedan, but it didn’t get far off the ground and seems to have stalled completely at this point.

  2. What’s that list of priorities supposed to achieve?

    Quite commonly, you use priorities to get something done, and some other thing not done. Can you provide examples of those?

  3. Axel: certainly, the previous lists of priorities we have produced have led to Gen doing some Asian outreach work to try and pull together teams for the identified locales. Just the publicity from making the list led to several people commenting on my blog with interest in getting involved. I guess it’s like having a “most wanted” list :-)

  4. you could just combine the top 3 of each list to have your next top 6 priorities…

  5. So… we’ve had Vietnamese since 3.5 (iirc). It’s basically only 1 person, Hung Nguyen, so we’re trying to expand the team so he doesn’t have to do everything himself. Arky’s also working on this from Vietnam.

    As for the Tagalog effort, my understanding is that Regnard is not organizing the localization effort himself. I haven’t heard anything from that effort in a while, and it’s not clear how many Filipinos would want a Tagalog Firefox. It is an effort worth pursuing, but until a committed person or group steps up, the Mozilla Philippines community is more focused on other areas (and quite active: they organized 5 events for the Firefox 4 launch).

    As for Malay, I tried to start something in 2009 and failed. I’m trying again this year, but I have no illusions this time. Much like in the Philippines, everyone I speak to tells me that there’s no need for a Malay Firefox. That may or may not be true. I’m more focused on building a community in Malaysia and once that’s in place localization can either happen or not based on the community’s interest.

  6. I’m not sure Wikipedia size per language will necessarily correlate strongly with the number of internet users for that language. There may be cultural (or simply word-of-mouth) issues that make Wikipedia disproportionately more or less popular in a given language community.

    I would suggest that the total amount of information *on the web* in a given language would be a significantly better proxy. Such stats would probably be more work to compile, but once compiled they should closely represent the amount of active participation on the internet that takes place in any given language.

    On the other hand, lots of Wikipedia content probably *does* indicate a certain amount of interest in and openness toward open-source principles and willingness to volunteer on somebody’s part, so it might not be a bad idea to throw out a “we’re interested in forming a l10n team” announcement for Nepal Bhasa and Aromanian and Haitian and see if you get any nibbles. The worst that happens is you get warnocked.