Archive for September, 2007

LA Times and the Deletion Roundup

Sunday, September 30th, 2007

I talked at length with the reporter from the LA Times about the entire universe of contemporary Wikipedia issues, before jetting off for China’s October holidays. The circle of Wikipedia bloggers has done a fine job of summarizing the article by David Sarno about deletionism/inclusionism, sparked by the Mzoli Meats controversy, so I will simply link to them.

Overall, it was a good treatment. I’m glad to see Sarno took the time to talk at length to understand the complexity of the issue, rather than doing a simplistic “parachute journalism” article.

Erik Moeller on Wikipedia 2.0

Sunday, September 23rd, 2007

This is a response from Erik Moeller to me concerning the New Scientist article I talked about earlier.

To be clear, movement towards usable stable versions is a good thing. However, this is one of the first major technical- and content-oriented initiatives being handled with money and oversight by the Wikimedia Foundation board of trustees, or a delegate thereof. And for that reason, to channel Dwight Eisenhower, this “is new in the Wikipedia experience.”

Andrew, I attempted to post the following, but I got an error message “Error: This file cannot be used on its own.” when trying to post a comment on your blog.

Luca de Alfaro gave a presentation at Wikimania 2007, and has been in touch with both Sue Gardner and our Technical Staff. Tim Starling commented about his work here:

http://www.nabble.com/Re%3A-%22Software-Weighs-Wikipedians%27-Trustworthiness%22-p12011354.html

Please be sure to read Luca’s actual paper before commenting on any of the potential problems with his approach:

http://www.soe.ucsc.edu/~luca/papers/07/wikiwww2007.pdf

We have provided Luca with the kind of live feed that we normally only give to companies to do his research in real time, and right now he’s working to process a full dump of the English Wikipedia. I have suggested that we could then offer a MediaWiki “tab” that could show the articles with trust coloring overlay.

Initially this could be something that editors add by modifying their user JavaScript, like navigation popups and countless other tools. The trust coloring itself would run on Luca’s servers (but inside a MonoBook skin).

After my conversations with Jim Giles this was condensed into “incorporated into Wikipedia” in the New Scientist article, which is an error (we’re going to send a correction on Monday). It’s not Jim’s mistake, though, as he sent me my quotes for approval, and I overlooked that particular part.

In essence, anything we do with Luca’s work will be done in stages, and with plenty of time for community feedback and so forth.

That said, I personally think that the kind of “overlay” functionality that Luca could provide (trust coloring for Wikipedia articles) is one of many overlays that could be useful. Wikipedia is a treasure for data miners, and in my opinion, it would be neat to think of a way to integrate recent research directly into the site, similar to the way Google Earth integrates content overlays.

Wikimedia Foundation moving to California

Friday, September 21st, 2007

Sue Gardner of the Wikimedia Foundation announced today the move of the foundation’s offices from St. Petersburg, Florida to San Francisco, California. It makes sense from a technology point of view. It may not be so good for European cross-collaboration, but it may be advantageous for interaction with Asia and Australia. Some of the announcement details:

In making this decision, we assessed five major cities: Boston, London, New York, San Francisco and Washington, DC - as well as St. Petersburg itself. The upshot: after a fairly detailed analysis, I recommended to the board that the Foundation relocate to San Francisco, and the board accepted that recommendation.

[...]

Here is what’s planned at this point:

- The new office will open sometime this winter. We’ll probably start out in downtown San Francisco, until we get our bearings and choose a permanent location.
- The St. Petersburg office will close late this winter, probably at the end of January.
- We know that many people’s personal circumstances will make it impossible for them to move, but we are hoping that some of the current staff will be able to come with us.
- The servers will remain in Tampa indefinitely. If we do choose to move them, that would be a separate, subsequent decision. At this point, it’s not under active consideration.

Wikipedia 2.0? Hold on now…

Friday, September 21st, 2007

New Scientist has a new article out about Wikipedia’s “stable versions” proposal as a way to address criticism about how to trust articles that are constantly in flux. The idea is that there will be some type of rating system and selection of a presentable version for ordinary passersby.

Jimmy Wales announced the push for this initiative in August 2006, and the German Wikipedians have been working on implementing it as a pilot. While it’s being implemented later than expected, the New Scientist piece does a decent job explaining the impetus for it, and some of its features.

But then things in the article get oh-so-strange, and it’s caused a bit of a firestorm behind the scenes.

The article goes on to describe “trust ratings” for users, based on the work of Luca de Alfaro at UC Santa Cruz and the color coding system. This was a shock to me when I read it, and I consider myself moderately in-the-know.

Specifically in the article, they mention the feature as described by Erik Moeller, the Wikimedia Board of Trustees member most involved with this:

As well as relying on trusted editors, Wikipedia’s upgrade will involve automatically awarding trust ratings to chunks of text within a certain article. Moeller says the new system is due to be incorporated into Wikipedia within the next two months, as an option for the different language communities.

The software that will do this, created by Luca de Alfaro and colleagues at the University of California, Santa Cruz, starts by assigning each Wikipedia contributor a trust rating using the encyclopedia’s vast log of edits, which records every change to every article and the editor involved. Contributors whose edits tend to remain in place are awarded high trust ratings; those whose changes are quickly altered get a low score. The rationale is that if a change is useful and accurate, it is likely to remain intact during subsequent edits, but if it is inaccurate or malicious, it is likely to be changed. Therefore, users who make long-lasting edits are likely to be trustworthy. New users automatically start with a low rating. [Emphasis mine]
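To make the idea in the quoted passage concrete, here is a toy sketch in Python. It is not de Alfaro’s actual algorithm (his paper defines a more careful measure). The scoring rule, the survival threshold and the sample edit log are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical edit log: (editor, number of later revisions the edit survived).
EDIT_LOG = [
    ("alice", 40),
    ("alice", 25),
    ("bob", 1),
    ("bob", 0),
    ("carol", 12),
]

SURVIVAL_THRESHOLD = 10  # assumed cutoff for calling an edit "long-lasting"
NEW_USER_RATING = 0.0    # new users start with a low rating, per the article


def trust_ratings(edit_log, threshold=SURVIVAL_THRESHOLD):
    """Return, per editor, the fraction of their edits that were long-lasting."""
    total = defaultdict(int)
    lasting = defaultdict(int)
    for editor, survived in edit_log:
        total[editor] += 1
        if survived >= threshold:
            lasting[editor] += 1
    return {editor: lasting[editor] / total[editor] for editor in total}


def rating_for(editor, ratings):
    """Look up a rating, falling back to the low default for unknown editors."""
    return ratings.get(editor, NEW_USER_RATING)


ratings = trust_ratings(EDIT_LOG)
print(ratings)
```

Even this crude version shows the worry: a newcomer with no history and a vandal whose edits were all reverted end up with the same score.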

When asked about this, Erik referred me to this page on meta that explains part of this rationale: Wikiquality.

What raises my concern is that this wiki page, created for “brainstorming”, was made available just days before the New Scientist article was published, and it seems the publication has taken it as gospel as to what will happen. I’m not aware of how many people have seen or vetted this idea.

I’ll leave my comments at that.

I’m eager to hear the response from the community about this proposal. Let’s just say we had a lively commentary on the WikipediaWeekly podcast just a few hours ago about this, and I’m sure a vigorous conversation will follow.

TechCrunch 40 Results

Thursday, September 20th, 2007

Last night, I was shooting the breeze with Kaiser Kuo about the tech scene in China, and TechCrunch40 came up as something that would be interesting to do this side of the Pacific. For those not familiar with the concept:

The format is simple: Forty of the hottest new startups from around the world will announce and demo their products over a two day period at TechCrunch40. And they don’t pay a cent to do this. They will be selected to participate based on merit alone. In fact, we’re even offering a $50,000 cash award and lining up other in-kind services and awards from a generous group of corporate sponsors. [ref]

As co-sponsor of the conference, Jason Calacanis announced the site mint.com won top honors. Their slogan is “refreshing money management,” while TechCrunch describes them as, “a personal finance application that lets users track and monitor their financials in one place without the need of routine maintenance or accounting knowledge. Their application tracks bank, credit union and credit card transactions and alerts users to upcoming bills, low balances or unusual spending.”

That’s pretty slick, but not entirely new. Back in the dot-com era (I recall 1999 or so, but I’m not entirely sure) I used the site Yodlee.com as an account aggregator that tracked investments and balances for you. It’s pretty scary giving one site all your bank and credit card passwords to manage. For Yodlee, I started by entering one every few weeks until I was sure they weren’t going to fleece me and run off to the Caymans. Yodlee is still around, and I log in occasionally to check my “net worth” in their display.

In this industry it often takes a few generations for something to stick. YouTube wasn’t the first to share video, and Flickr was not the first to share photos. But they’re certainly the big ones getting attention now. It’s quite curious what makes these services stick when others fail.

Also of note, Kaltura.com took the People’s Choice award at the conference. I had the pleasure of meeting Shay David, co-founder of Kaltura at Wikimania 2007, where he demoed for me their new video editing in a Web page feature using Adobe Flash. It was really slick, and supported a wiki-like video application. This is perhaps the holy grail of the video production world — supporting meaningful video editing collaboration, and Kaltura really impressed me with what they could do within a Web browser.

Can’t wait to play with both of these when I get more time, and when their sites are not completely swamped with traffic (like mint.com right now).

Yahoo! mash, their SNS site

Saturday, September 15th, 2007

In keeping with its trailing-edge tendencies, Yahoo! has had to play catch-up again in the Web 2.0 space. This time it’s social networking.

Let’s do a quick stroll down Yahoo’s road of broken dreams. Yahoo! didn’t do much with GeoCities when it acquired it in 1999, even though sites like MySpace and Xanga cashed in on user-gen sites much later. For audio and video, even though it acquired Broadcast.com early on (making Mark Cuban a billionaire) it doesn’t have a real video product like YouTube or an audio success like iTunes Music Store. Yahoo had to buy its way into photos with Flickr, even though it seems activity there has flattened out. And it only recently upgraded its Yahoo! Mail to become AJAX and Web 2.0 savvy. You can imagine why investors are disheartened by Yahoo! and its strategic direction.

So on that record, Yahoo! has put its entry into the social networking arena — Yahoo! mash.

One tech web site described it best — Xanga + Facebook + Wiki = mash.

It has the sparse but customizable HTML-ability of Xanga/MySpace, the modular components of Facebook, and for an interesting twist, a wiki-like ability for any of your friends (or anybody at all, if you like) to edit your profile page.

That last “wiki” feature is perhaps the only thing that will turn heads. It’s invite only for now, which makes it a pain to find folks you know. Isn’t the whole point to be able to search for and find folks? And once you’ve used Facebook’s clean slick interface, using Yahoo! mash seems like going from Prada to Walmart.

Especially amateurish is their “Mash Pet” feature which appears to be what an engineer scribbled on a whiteboard with five seconds of thought. Iconic and cute it is not.

But they might have something with the wiki idea. It’s got the right potential signal/noise ratio to make it interesting if it can handle rich media.

Send me a note if you want to be invited “in”.

Cory Doctorow in Beijing

Wednesday, September 12th, 2007

Cory Doctorow talks to the crowd at Beijing Bookworm about DRM…

Using Tor: Assume Exit Nodes are Monitored

Tuesday, September 11th, 2007

Ars Technica is reporting that a security specialist was able to grab a bunch of login/passwords after running Tor nodes to illustrate proper and improper use of the widely-used anonymity network. In this particular case, Dan Egerstad volunteered to be part of the Tor network by running “exit nodes,” and boy did he grab a bunch of sensitive logins and passwords.

Particularly embarrassing is the fact that the list contained accounts used by embassy staff of Uzbekistan, Kazakhstan, Iran and India, among others. There seems to be no explanation other than the IT departments of these governments actually recommending Tor as standard operating procedure to access their accounts from abroad.

That’s not an appropriate use for Tor at all.

When this story broke in India, one of the news outlets tested the username/password to get into the account of a government official (which is of questionable journalistic ethics):

To check the authenticity, The Indian Express sent a test mail to the Indian Ambassador in China on her official email ID and, using the password posted online, was able to access it. The email account of the Indian Ambassador to China contained details of a visit by Rajya Sabha member Arjun Sengupta to Beijing earlier this month for an ILO conference. There was also a transcript of a meeting this evening which a senior Indian official had with the Chinese Foreign Minister.

Also on this list of shame were Hong Kong political parties and Legislative Council members. Being a former resident of HK, this is particularly bizarre since the HK government has prided itself on being IT savvy on the world stage, even bragging about being the first in the world to use E-certificates on the Smart-ID cards all Hong Kongers carry. It’s ironic the E-cert system is so secure, complex and unusable in HK, while politicians are using cleartext mail protocols and sending data through random untrusted computers.

Egerstad has paid special attention to HK (SCMP, Sep 9, 2007, subscription):

Swedish computer security consultant Dan Egerstad hopes to come to Hong Kong next month and visit some of the legislators and NGOs he exposed on his website as having weak internet security - but only if the police promise not to arrest him at the airport.

Mr Egerstad, 21, published the e-mail passwords of prominent legislators such as the Democratic Party’s Sin Chung-kai and Liberal Party vice-chairman Miriam Lau Kin-yee on the website dErangedsecurity.com. He also published the IP addresses of the e-mail servers.

Mr Egerstad trawled through the e-mails of the One Country Two Systems Research Institute of China and the Liaison Office of the Dalai Lama for Japan and East Asia, as well as the Hong Kong Human Rights Monitor.

How Tor Works

A quick recap: the Tor system works by using a volunteer network of computers that offer to relay your Web traffic, encrypted and anonymously, through the Tor network. It relays your traffic through three Tor intermediary nodes, the idea being that each relay node knows which neighboring node packets are coming from and going to, but no one knows the entire path to the final destination address. There are some really smart people behind Tor like Roger Dingledine, and most experts agree that for anonymity, it does a very good job.

The problem is, people are using Tor without understanding exactly what it does and does not provide.

The weak link is when a user’s data finally emerges at the last computer (the exit node) which relays the request to the public Internet. Anyone operating a final exit node can see what you’re sending and receiving. So while Tor provides for end-user anonymity at the network/packet level (IP address), it does not provide for end-to-end data secrecy. The traffic coming off the exit node on your behalf is exactly what protocol and data your application (Web browser, mail program, instant messenger, etc) sent out.
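The layering described above can be sketched in a few lines of Python. This is a toy model only: the XOR "cipher", the three relay keys and the sample mail login are all made up for illustration, and real Tor uses proper cryptography and negotiated keys. The point is simply what each relay gets to see.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream.

    Illustration only. This is NOT real cryptography and not what Tor uses.
    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Hypothetical per-relay keys, in path order: entry, middle, exit.
RELAY_KEYS = [b"entry-key", b"middle-key", b"exit-key"]


def wrap(payload: bytes, keys) -> bytes:
    """Client side: add one layer per relay, exit layer first, entry layer last."""
    for key in reversed(keys):
        payload = keystream_xor(key, payload)
    return payload


def relay_views(onion: bytes, keys):
    """Return what each relay holds after peeling off its own layer."""
    views = []
    for key in keys:
        onion = keystream_xor(key, onion)
        views.append(onion)
    return views


# A made-up cleartext POP3 login, the kind of traffic Egerstad captured.
request = b"USER ambassador\r\nPASS hunter2\r\n"
views = relay_views(wrap(request, RELAY_KEYS), RELAY_KEYS)

for name, view in zip(["entry", "middle", "exit"], views):
    print(name, view == request)
```

Only the last relay ends up holding the original bytes. If those bytes are a cleartext login, the exit node operator can read it.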

If it’s a cleartext data stream like HTTP or mail (IMAP or POP3) then anyone running a Tor exit node can see and capture it. And that’s what Egerstad did — he monitored his exit nodes for:

“gov, government, embassy, military, war, terrorism, passport, visa” as well as domains belonging to governments.

Tor uses the SOCKS proxy protocol to receive transactions for the Tor network. SOCKS has been around a long time and is a solid generic protocol. It handles HTTP (Web) requests as well as other data streams, so yes, it can support end-to-end encrypted sessions using HTTPS or secure sockets. So if you use Tor, combine it with a secure protocol if you need data secrecy! This is where people may get confused — data is encrypted within the Tor network, but it exits the Tor network exactly as your browser or application requested — most likely unencrypted. So use an end-to-end encryption solution in addition to Tor, if that’s what you need.
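As a rough rule of thumb, the port your application connects to hints at whether the exit node will see cleartext. Here is a minimal Python sketch of that check. The function name is hypothetical, the port lists are just the conventional assignments, and a port number is only a convention: a server can run anything on any port, and some cleartext protocols can upgrade to encryption mid-session.

```python
# Conventional ports for protocols that send data, including passwords, in the clear.
CLEARTEXT_PORTS = {21: "FTP", 23: "Telnet", 80: "HTTP", 110: "POP3", 143: "IMAP"}

# Conventional ports for the TLS- or SSH-protected equivalents.
ENCRYPTED_PORTS = {22: "SSH", 443: "HTTPS", 993: "IMAPS", 995: "POP3S"}


def exit_node_exposure(port: int) -> str:
    """Guess what an exit node operator can see for traffic to this port."""
    if port in ENCRYPTED_PORTS:
        return "encrypted"  # exit node sees the destination, not the content
    if port in CLEARTEXT_PORTS:
        return "cleartext"  # exit node can read everything, passwords included
    return "unknown"        # no convention to go on; assume the worst


for port in (110, 995, 80, 443, 8080):
    print(port, exit_node_exposure(port))
```

Treat "unknown" the same as "cleartext" unless you know the application encrypts its own traffic.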

If you’re surfing CNN or ESPN to get the latest sports scores, no problem. If you’re logging into a system or sending/receiving e-mail, you better make sure it’s encrypted.

Tor has also been in the news related to a phishing/trojan scheme, where spam email asked folks to download Tor, but it really pointed to a trojan program instead.

It’s important to note in both instances, Tor is not the one at fault. The trojan problem is your typical phishing problem — never click on any hyperlink ever sent to you in email, and don’t trust any sites you didn’t find or search yourself.

Tor is a great program, but it’s not a cure-all. You need a wide spectrum of tools to do it right, or you can also do what many corporations do — require the use of a Virtual Private Network, and all your data packets are routed and encrypted back to a trusted corporate home base.

Egerstad had this final harsh warning on his blog:

These governments told their users to use ToR, a software that sends all your traffic through not one but three other servers that you know absolutely nothing about. Yes, two are getting encrypted traffic but that last exit node is not. There are hundreds of thousands ToR-users but finding these kinds of accounts was… hmm… chocking! The person who wrote the security policy on these accounts should reconsider changing profession, start cleaning toilets! These administrators are responsible for giving away their own countries secrets to foreigners. I can’t call it a mistake, this is pure stupidity and not forgivable!

ToR isn’t the problem, just use it for what it’s made for.

Tor is very good for anonymity, but does nothing for adding any data security. In fact, it’s likely more risky, because you are handing traffic over to a stranger (exit node) in cleartext.

I don’t use Tor much, as I don’t often need anonymity. It’s also a sluggish performer because of the three relays for traffic. But when I do use it, I make sure to use Firefox with a virgin clean profile — no cookies, no stored data, no caching, no browsing history. (You can configure Firefox to ask for what profile you want on startup.)

So the big headline? This is not a Tor insecurity. You wouldn’t complain to Home Depot that masking tape failed to seal your PVC pipes. You have to use the right tool for the right job, and the Uzbek government is learning this the hard way.

Two Million English Wikipedia articles! Celebrate?

Monday, September 10th, 2007

This weekend, the English Language Wikipedia surpassed two million articles with the creation of [[El Hormiguero]], an article about a Spanish-language television show.

Interestingly, in the weeks preceding this event, there were many on the internal Wikimedia mailing lists who thought this milestone should be downplayed. Article count in itself does not mean much, but it’s still an important achievement in the history of Wikipedia. What it says to me, though, is that the core Wikipedia community is feeling somewhat empty and lost for direction.

The “community” doesn’t really know what motivates and excites them anymore. The thrill of new article creation is gone. There was a time when every article count, mention of Wikipedia in the newspaper or television story about the community was enthusiastically heralded on the lists. There were virtual slaps on the back, digital high fives and a hurrah in the community. But the euphoria of being a revolutionary and disruptive project enabled by the Internet is now simply nostalgia. Wikipedia articles routinely show up in Google searches, while the public’s expectations of quality have gone higher and higher. Wikipedia has become an indispensable part of the Internet landscape. It is no longer just a cool free novelty.

As a result, Wikipedia’s volunteer culture has shifted dramatically from being rogue and revolutionary, to remaining staid and conventional, both in content and in policy.

Instead of two million articles being a time to celebrate, El Hormiguero shows the challenges Wikipedia faces. If you’ve seen my recent blog posts or my Wikimania 2007 presentation, you can probably guess what happened to our dear article. Yes, it was promptly listed on Articles for Deletion by User:Alkivar within 24 hours of creation with the note:

Subject is a non notable tv show from Spain. It fails Wikipedia:Television_episodes content guidelines. There are no google news results once you do a google search and strip out blogs, youtube/google video, wiki clones, and the tv network cuatro who hosts it, you find more references to Nicaragua than you do the TV show. It has not “received significant coverage in reliable sources that are independent of the subject.”

The deletion didn’t get any traction. Eight folks voted right away to keep the article, rightly pointing out that this is a successful show. Depending on English-language sources for a Spanish show is not the best tactic for research. So it appears it will be kept, if for no other reason than that it would be embarrassing to tell the world the two millionth article in English Wikipedia didn’t survive 24 hours.

I’ve mentioned what I’ve seen become the main activities in Wikipedia — deleting, pruning, citing and challenging contributions, which makes it a very different atmosphere than the original “anyone can edit” culture that brought the first waves of contributors. Today, there is more concern about keeping the bozos out, and making Wikipedia more “respectable.”

It’s clear now that Wikipedia’s growth curve is starting to get clipped. The latest graphs in [[Wikipedia:Modeling Wikipedia's growth]] show that the top of the S-curve is in sight, and that “slope” is starting to decrease. This is not unique to the English edition, as German has seen the same phenomenon. The best estimates show that English Wikipedia may not double in size anytime soon.
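For readers who have not met an S-curve before, here is a small Python sketch of the standard logistic model. The ceiling, rate and midpoint below are invented for illustration and are not fitted to the real article counts or taken from the modeling page. The sketch only shows the shape: yearly growth climbs until the midpoint and shrinks afterwards.

```python
import math

# All three parameters are hypothetical, chosen only to show the shape.
CEILING = 4_000_000  # assumed eventual article count
RATE = 0.8           # assumed growth rate per year
MIDPOINT = 2006.5    # assumed year of fastest growth


def articles(year: float) -> float:
    """Logistic (S-curve) estimate of the article count at a given year."""
    return CEILING / (1 + math.exp(-RATE * (year - MIDPOINT)))


def yearly_growth(first_year: int, last_year: int) -> dict:
    """Articles added during each year from first_year up to last_year - 1."""
    return {y: articles(y + 1) - articles(y) for y in range(first_year, last_year)}


growth = yearly_growth(2003, 2010)
for year, added in growth.items():
    print(year, round(added))
```

Once the curve passes its midpoint, each year adds fewer articles than the one before, which is the "slope starting to decrease" in the graphs.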

The lack of community clarity on “what’s next” is because of Wikipedia’s coming phase — quality. At Wikimania 2006, Jimmy Wales proclaimed the next challenge was “quality” rather than “growth.” A feature called “stable” or “checked” versions was put forth by members of the German Wikipedia as a way for vetting versions of an article, so they could be checked or marked as non-vandalized, accurate, or some other criteria. A set of authorized users would be able to “mark” or “rate” versions of articles. The Flagged Revisions feature was discussed at the conference and online afterwards with hopes of implementing it within the next year. But one year later, at Wikimania 2007, there was no news as to when the feature would go live, or even have a public test.

To be fair, implementing this feature is pretty hard, functionally and culturally. It drastically changes the nature of the community. By adding this un-wiki feature of encapsulating quality in a metric value, it goes against a wiki culture that always encouraged critical thinking and careful individual evaluation of articles. By giving the public a thumbs up or down, Wikipedians would be vouching for the article and giving it some type of certification. That would be an entirely new role for the community.

Also, the user interface for this feature will be challenging for average users and administrators of the site.

For public users, do you show latest version as we do now, or do you show the last non-vandalized version, or the non-vandalized and fact-checked version? How do you toggle among them, and prevent user confusion about what is being displayed?
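The three display options in that question can be written down as a small selection rule. This Python sketch is only one way to think about it: the flag names and view modes are hypothetical and are not taken from the Flagged Revisions design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Revision:
    rev_id: int
    vandalism_free: bool = False  # hypothetical flag: a reviewer saw no vandalism
    fact_checked: bool = False    # hypothetical flag: a reviewer checked the facts


def pick_revision(history: List[Revision], mode: str) -> Optional[Revision]:
    """Return the newest revision acceptable for the given view mode.

    history is ordered oldest to newest. Returns None if nothing qualifies.
    """
    checks = {
        "latest": lambda r: True,
        "vandalism_free": lambda r: r.vandalism_free,
        "fact_checked": lambda r: r.vandalism_free and r.fact_checked,
    }
    accept = checks[mode]
    for rev in reversed(history):
        if accept(rev):
            return rev
    return None


history = [
    Revision(1, vandalism_free=True, fact_checked=True),
    Revision(2, vandalism_free=True),
    Revision(3),
]
for mode in ("latest", "vandalism_free", "fact_checked"):
    chosen = pick_revision(history, mode)
    print(mode, chosen.rev_id if chosen else None)
```

Notice that the three modes can point at three different revisions of the same article, which is exactly where the user confusion comes from.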

For administrators, each rating or action is stored in the database, but it’s another vector for vandalism and trolling. How do you monitor all these actions effectively? The rating feature would dramatically affect the bulk of Wikipedia’s operations when doing diffs, reviewing logs, etc. The rating feature is a very generic one, and there has been no large scale community discussion or consensus on what would be appropriate to rate, or even what they would mean. So Flagged Revisions is in the “cooking” stage, ready to be tested by a small circle of folks by the end of 2007, but getting this feature into the mainline Wikipedia will no doubt be controversial.

It’s a tough road ahead for the community. They have to realize that Wikipedia will not be ever increasing, and this is a special time in its history before it inevitably goes into “maintenance mode.” There have been warnings for years, and people don’t want to see Wikipedia turn into “another DMOZ.”

I don’t think Wikipedia will collapse into a DMOZ state of affairs. A directory of links and sites like DMOZ can get stale faster than a New York bagel, but human knowledge crafted by Wikipedia’s community has a much longer shelf life. However, the current drive-by culture of overly bureaucratic rules and regulations could turn Wikipedia into a positively dreadful place to hang out, and could stunt its growth and quality if we are not careful.

While doing research for the book about Wikipedia, I’ve found Jane Jacobs’ book The Death and Life of Great American Cities extremely relevant. Jacobs, an urban activist, was not just fighting to preserve New York from the bulldozer of uber-developer Robert Moses, she prescribed ways to keep a big city feeling intimate and personal. The book is about how to keep citizens in contact with each other, to always make sure sidewalks provide individuals with interaction and to maintain a sense of humanity in what could easily become a faceless impersonal jungle of concrete and steel.

Wikipedia needs to learn from this.

It needs to remain people-centered and willing to suffer some inefficiencies in order to keep the community in control, and not at the mercy of a multitude of incoherent policies. Some of the community norms being adopted now are like Moses’ expressways, ripping an institutional path of destruction and uprooting communities at work: Criteria for Speedy Deletion and Requests for Adminship are perhaps the worst of these in the Wikipedia universe.

Regardless of whether it’s 3 million or 5 million as the final “sweet spot” of English Wikipedia, it will be interesting seeing how the community survives while getting there.

Hooters Beijing

Thursday, September 6th, 2007

Hooters in Beijing was putting up its signs this week.

Definitive proof that we live in a globe-alized world (apologies to Tom Friedman).

Interesting allusion to revolutionary art.

What is the Chinese name for the restaurant? The translation:
American Owl Restaurant (or literally, American Cat Head Eagle Restaurant)

I think the meaning is completely lost.

Opening soon, just northeast of Workers’ Stadium.