Two Million English Wikipedia articles! Celebrate?

This weekend, the English-language Wikipedia surpassed two million articles with the creation of [[El Hormiguero]], an article about a Spanish-language television show.

Interestingly, in the weeks preceding this event, there were many on the internal Wikimedia mailing lists who thought this milestone should be downplayed. Article count in itself does not mean much, but it’s still an important achievement in the history of Wikipedia. What it says to me, though, is that the core Wikipedia community is feeling somewhat empty and lost for direction.

The “community” doesn’t really know what motivates and excites them anymore. The thrill of new article creation is gone. There was a time when every article-count milestone, newspaper mention of Wikipedia, or television story about the community was enthusiastically heralded on the lists. There were virtual slaps on the back, digital high fives and a hurrah in the community. But the euphoria of being a revolutionary and disruptive project enabled by the Internet is now simply nostalgia. Wikipedia articles routinely show up in Google searches, while the public’s expectations of quality have risen steadily. Wikipedia has become an indispensable part of the Internet landscape. It is no longer just a cool free novelty.

As a result, Wikipedia’s volunteer culture has shifted dramatically from being rogue and revolutionary to being staid and conventional, both in content and in policy.

Instead of two million articles being a time to celebrate, El Hormiguero shows the challenges Wikipedia faces. If you’ve seen my recent blog posts or my Wikimania 2007 presentation, you can probably guess what happened to our dear article. Yes, it was promptly listed on Articles for Deletion by User:Alkivar within 24 hours of creation with the note:

Subject is a non notable tv show from Spain. It fails Wikipedia:Television_episodes content guidelines. There are no google news results once you do a google search and strip out blogs, youtube/google video, wiki clones, and the tv network cuatro who hosts it, you find more references to Nicaragua than you do the TV show. It has not “received significant coverage in reliable sources that are independent of the subject.”

The deletion nomination didn’t get any traction. Eight folks voted right away to keep the article, rightly pointing out that this is a successful show. Depending on English-language sources for a Spanish show is not the best tactic for research. So it appears it will be kept, if for no other reason than that it would be embarrassing to tell the world the two millionth article in English Wikipedia didn’t survive 24 hours.

I’ve mentioned before what I’ve seen become the main activities in Wikipedia: deleting, pruning, citing and challenging contributions. That makes for a very different atmosphere from the original “anyone can edit” culture that brought the first waves of contributors. Today, there is more concern about keeping the bozos out, and making Wikipedia more “respectable.”

It’s clear now that Wikipedia’s growth curve is starting to get clipped. The latest graphs in [[Wikipedia:Modeling Wikipedia's growth]] show that the top of the S-curve is in sight, and that “slope” is starting to decrease. This is not unique to the English edition, as German has seen the same phenomenon. The best estimates show that English Wikipedia may not double in size anytime soon.
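
To make the S-curve remark concrete, here is a minimal sketch of a logistic growth model. It is only an illustration of why the slope falls once the midpoint of the curve has passed; it is not the model used on the growth-modeling page, and the ceiling, rate and midpoint values below are made-up assumptions, not figures fitted to real Wikipedia data.

```python
import math


def logistic(t, capacity, rate, midpoint):
    """Article count at time t under a logistic (S-curve) model."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


def slope(t, capacity, rate, midpoint):
    """Growth per unit time: the derivative of the logistic curve."""
    n = logistic(t, capacity, rate, midpoint)
    return rate * n * (1.0 - n / capacity)


# Illustrative values only; none of these are fitted to real data.
CAPACITY = 4_000_000  # hypothetical ceiling on the article count
RATE = 0.8            # hypothetical growth rate per year
MIDPOINT = 0.0        # relative year in which growth is fastest

if __name__ == "__main__":
    for year in (-4, -2, 0, 2, 4):
        n = logistic(year, CAPACITY, RATE, MIDPOINT)
        s = slope(year, CAPACITY, RATE, MIDPOINT)
        print(f"t={year:+d}  articles={n:,.0f}  slope={s:,.0f} per year")
```

In this toy model the slope peaks when the count reaches half of the ceiling and shrinks symmetrically afterwards, which is the “clipping” the graphs appear to show.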

The lack of community clarity on “what’s next” is because of Wikipedia’s coming phase: quality. At Wikimania 2006, Jimmy Wales proclaimed the next challenge was “quality” rather than “growth.” A feature called “stable” or “checked” versions was put forth by members of the German Wikipedia as a way of vetting versions of an article, so they could be checked or marked as non-vandalized, accurate, or meeting some other criterion. A set of authorized users would be able to “mark” or “rate” versions of articles. The Flagged Revisions feature was discussed at the conference and online afterwards with hopes of implementing it within the next year. But one year later, at Wikimania 2007, there was no news as to when the feature would go live, or even have a public test.

To be fair, implementing this feature is pretty hard, functionally and culturally. It drastically changes the nature of the community. By adding this un-wiki feature of encapsulating quality in a metric value, it goes against a wiki culture that always encouraged critical thinking and careful individual evaluation of articles. By giving the public a thumbs up or down, Wikipedians would be vouching for the article and giving it some type of certification. That would be an entirely new role for the community.

Also, the user interface for this feature will be challenging for average users and administrators of the site.

For public users, do you show the latest version as we do now, or do you show the last non-vandalized version, or the non-vandalized and fact-checked version? How do you toggle among them, and prevent user confusion about what is being displayed?

For administrators, each rating or action is stored in the database, but it’s another vector for vandalism and trolling. How do you monitor all these actions effectively? The rating feature would dramatically affect the bulk of Wikipedia’s operations when doing diffs, reviewing logs, etc. The rating feature is a very generic one, and there has been no large-scale community discussion or consensus on what would be appropriate to rate, or even what the ratings would mean. So Flagged Revisions is in the “cooking” stage, ready to be tested by a small circle of folks by the end of 2007, but getting this feature into the mainline Wikipedia will no doubt be controversial.

It’s a tough road ahead for the community. They have to realize that Wikipedia will not be ever increasing, and this is a special time in its history before it inevitably goes into “maintenance mode.” There have been warnings for years, and people don’t want to see Wikipedia turn into “another DMOZ.”

I don’t think Wikipedia will collapse into a DMOZ state of affairs. A directory of links and sites like DMOZ can get stale faster than a New York bagel, but human knowledge crafted by Wikipedia’s community has a much longer shelf life. However, the current drive-by culture of overly bureaucratic rules and regulations could turn Wikipedia into a positively dreadful place to hang out, and could stunt its growth and quality if we are not careful.

While doing research for the book about Wikipedia, I’ve found Jane Jacobs’ book The Death and Life of Great American Cities extremely relevant. Jacobs, an urban activist, was not just fighting to preserve New York from the bulldozer of uber-developer Robert Moses; she also prescribed ways to keep a big city feeling intimate and personal. The book is about how to keep citizens in contact with each other, to always make sure sidewalks provide individuals with interaction and to maintain a sense of humanity in what could easily become a faceless, impersonal jungle of concrete and steel.

Wikipedia needs to learn from this.

It needs to remain people-centered and willing to suffer some inefficiencies in order to keep the community in control, and not at the mercy of a multitude of incoherent policies. Some of the community norms being adopted now are like Moses’ expressways, ripping an institutional path of destruction and uprooting communities at work. Criteria for Speedy Deletion and Requests for Adminship are perhaps the worst of these in the Wikipedia universe.

Regardless of whether it’s 3 million or 5 million as the final “sweet spot” of English Wikipedia, it will be interesting seeing how the community survives while getting there.

7 thoughts on “Two Million English Wikipedia articles! Celebrate?”

  1. Andrew,

    I think you’ve unpacked a lot of important issues here. The “people-centered” and “intimate and personal” are probably the most important. I have some ideas about this that I’ll blog up when I get the chance.

    As for a “final” article count, I think that what we will see is a fairly significant residual slope (a few hundred articles per day) that represents a) long-time Wikipedians who are still trying explicitly to search for uncovered topics and research them, and b) the largely untapped vein of topics that only experts in specific fields know about (history fields, in particular).

    The damping of the growth curve so far has been the result of saturation of the pool of potential editors, in addition to the increasing scarcity of “low-hanging fruit”. Most potential contributors to the English Wikipedia (English-speaking internet users) already know about it, and most that haven’t become editors by now never will.

    But I think (hope) that the situation is different if you consider only knowledge workers (professors, teachers, researchers, etc.). While the typical non-editor likes Wikipedia, she/he isn’t interested in writing and researching an encyclopedic topic. Knowledge workers, on the other hand, are still ambivalent about Wikipedia and are gradually being convinced of its value as coverage of their specialty improves. And they like writing and researching; they just think a) they have original research to do and can’t be bothered, and/or b) writing for Wikipedia is not recognized by their profession (and won’t help them get tenure).

    My hope is that as more knowledge workers become editors, it will catalyze even more to start editing. The growth curve would look quite different if, e.g., writing and improving Wikipedia articles were considered professional ‘service’ (in the same sense as editing and peer-reviewing for journals, or writing newspaper columns related to one’s field), something one would put on one’s C.V. But I think that’s probably overly optimistic.

  2. If there is a slowdown in (net) growth by article count, I’d look first at the decline in ‘incestuous’ topics added: WP as media and Internet phenom talking about media and the Internet; WP as uber-IMDB and popular culture documenter talking about every soap opera actor in the anglophone world, about every comic character. I see no shortage of historical and scholarly topics to add. My own practice confirms the ease of putting up new stubs in dozens of areas.

  3. Andrew:

    A good, well-balanced (I almost said NPOV) post. But I’m afraid that something like “flagged revisions” – or whatever the latest buzzword may be – WILL happen, and relatively soon. The reason is simple: Jimbo wants it. He even put a notice on the project page endorsing the idea and slamming bloggers who dislike it.

    I can understand why Jimbo wants to get respectable. He’s tired of people complaining about their WP bios to him. He’s fed up with the (usually bad) jokes about how unreliable Wikipedia is. In his position, I’d want something like flagged revisions, too.

    But I’m not in his position. I don’t care about WP getting “respectable”. I want my Wikipedia unfiltered and raucous as ever. With my edit count, I’d probably get “surveyor” status or whatever they finally call it. (Or maybe not, if the exalted position is confined to admins.)

    Except I don’t want the status, and I wouldn’t do anything to flag any article as meeting my (snicker) “standards”. To paraphrase Groucho, I don’t want to belong to a club that would accept my editorial judgment.

    Another blogger said he’d “prefer complete liberty or rigorous peer review, not this watery combo.” Put me down for complete liberty, or at least unflagged liberty. I wish the proposal would just go away. But Jimbo will make it happen.

  4. Keep on going and the chances are you will stumble on something, perhaps when you are least expecting it. I have never heard of anyone stumbling on something sitting down.

  5. It is in reality a great and useful piece of information. I’m glad that you shared this helpful info with us. Please keep us up to date like this. Thank you for sharing.
