EDGE 2013: Is The New Public Sphere… Public?

Every year, EDGE.org asks its annual question to top academics, authors, scientists and thinkers. This year’s question: WHAT *SHOULD* WE BE WORRIED ABOUT?

My contribution discusses the plight of user-generated information held within closed, restricted or access-throttled systems, and whether this new digital “public sphere” is in fact public. Are the creators (and thereby, owners) of social media content (tweets, posts, status updates, photos, et al) assured access to their own work, and to its public use, now and in the future?

By strange coincidence, many of the things I discuss are issues Aaron Swartz was fighting for. I finished the piece the day before Aaron took his own life, and I hope the causes he was passionate about carry on and that we reexamine how we prosecute “computer fraud.” RIP Aaron.

Here’s a link to all EDGE responses, or see below for just my piece.

Is The New Public Sphere… Public?
Andrew Lih, Associate Professor of Journalism, Annenberg School for Journalism; Author, The Wikipedia Revolution

The advent of social media sites has allowed a new digital public sphere to evolve, by facilitating many-to-many conversations on a variety of multimedia platforms, from YouTube to Twitter to Weibo. It has connected a global audience and provided a new digital commons that has had profound effects on civil society, social norms and even regime change in the Middle East.

As important as it has become, are critical aspects of this new public sphere truly public?

There are reasons to be worried.

While we are generating content and connections that are feeding a rich global conversation unimaginable just 10 years ago, we may have no way to recreate, reference, research and study this information stream after the fact. The spectrum of challenges is daunting, whether information is sequestered in private hands, throttled from full access, deleted from sight, retired with failed businesses or shielded from copying by legal barriers.

Twitter, in particular, has emerged as the heart of a new global public conversation. However, anyone who’s ever used its search function knows how slim the chances are of finding older content a second time. Facebook works in a private, eyes-only mode by default and is shielded even more from proper search and inspection, not only from the public but even from the creators of the original content.

How about the easier case of individuals simply asserting control over their own content within these services? Users of social media systems still hold sole copyright to their content, though the terms of service they agree to are rather extensive.

Twitter’s is fairly typical: “you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed).”

Without passing judgment on the extent or reach of these types of license agreements, the logistics of accessing one’s own data are worrisome. Typically, these services (Twitter, Facebook, Weibo, Instagram, et al) are the sole digital possessor of your words, links, images or video that were created within their system. You may own the copyright, but do you actually possess a copy of what you’ve put into their system? Do you actually control access to your content? Do you have the ability to search and recall the information you created? Is public access to your data (e.g. through application programming interfaces) possible now, or guaranteed in the long term?
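To make the question of “possessing a copy” concrete, here is a minimal sketch of what keeping and searching your own archive could look like, assuming a service lets you download your posts at all. The file name and JSON layout below are made-up illustrations for the example, not any particular service’s real export format.

import json
from collections import defaultdict

def load_posts(path="my_posts.json"):
    """Load a locally downloaded archive of your own posts (assumed format:
    a JSON list of objects with "id", "created_at" and "text" fields)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def build_index(posts):
    """Map each lowercased word to the ids of the posts that contain it."""
    index = defaultdict(set)
    for post in posts:
        for word in post["text"].lower().split():
            index[word.strip(".,!?#@\"'")].add(post["id"])
    return index

def search(index, posts, term):
    """Return the posts whose text contains the given term."""
    ids = index.get(term.lower(), set())
    return [p for p in posts if p["id"] in ids]

if __name__ == "__main__":
    posts = load_posts()
    index = build_index(posts)
    for post in search(index, posts, "wikipedia"):
        print(post["created_at"], post["text"])

The point of the sketch is simply that a copy you hold locally can be searched and recalled on your own terms, with no dependence on the service’s search function or API staying available.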

That we continue to use an array of information systems without assurances about their long-term survivability or their commitment to open access, or about whether they are good stewards of our history and public conversation, should worry us all.

What can be done about this?

To its credit, Twitter has partnered with the Library of Congress to hand over the first four years’ worth of tweets, from 2006 to 2010, for research and study. Since that first collaboration, it has agreed to feed all tweets to the Library on an ongoing basis. This is commendable, but it’s drinking from a virtual firehose, with roughly half a billion new tweets generated every day. Few entities have the technology to handle very big data, but this is truly massive data.
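A back-of-envelope calculation gives a sense of the scale; the per-tweet size below is an assumption for illustration only, since each tweet carries metadata (timestamps, user info, links) well beyond its 140 characters of text.

# Rough arithmetic only: what "half a billion tweets a day" implies in storage.
tweets_per_day = 500_000_000
bytes_per_tweet = 2_000          # assumed average including metadata, not an official figure

per_year_tb = tweets_per_day * bytes_per_tweet * 365 / 1e12
print(f"Roughly {per_year_tb:.0f} TB of new tweet data per year")

Under those assumptions, the archive grows by hundreds of terabytes a year before anyone even tries to index it for search.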

The Twitter arrangement has posed quite a challenge to the Library, as they don’t have an adequate way to serve up the data. By their own admission, they haven’t been able to accommodate the 400 or so research inquiries for this data, as they’re still addressing “significant technology challenges to making the archive accessible to researchers in a comprehensive, useful way.”

So far, the Library has no plans to allow the database to be downloaded in its entirety so that others can have a shot at crunching the data.

We have reasons to worry that this new digital public sphere, while interconnected and collaborative, is not a true federation of data that can be reconstructed for future generations and made available for proper study. Legal, infrastructural and cooperative challenges abound that will likely keep it fractured, perforated and incoherent for the foreseeable future.

The case of Philip Roth vs Wikipedia

As Wikipedia becomes an increasingly dominant part of our digital media diet, what was once anomalous has become a regular occurrence.

Someone surfing the net comes face to face with a Wikipedia article — about himself. Or about her own work.

There’s erroneous information that needs to be fixed, but Wikipedia’s ten-year-old tangle of editing policies stands in the way, and its boisterous editing community can be fearsome.

If a person can put the error into the public spotlight, then publicly shaming Wikipedia’s volunteers into action can do the trick. But not without some pain.

The most recent episode?

The case of Pulitzer Prize-winning fiction writer Philip Roth.

His bestselling novel “The Human Stain” tells the story of fictional character Coleman Silk, an African-American professor who presents himself as having a Jewish background and the trials he faces after leaving his university job in disgrace. Widely read and highly acclaimed, the book was reviewed or referenced by many famous writers, such as Michiko Kakutani and Janet Maslin of the New York Times and the noted Harvard professor Henry Louis Gates, Jr.  [1] [2] [3]

The Broyard Theory

But there was a standing mystery about the novel.

After the book’s release in 2000, Roth had not elaborated on the inspiration for the professor Silk character. Over the years, it had become the subject of speculation, with most of the literary world pointing to Anatole Broyard, a famous writer and NY Times critic who “passed” in white circles without explicitly acknowledging his African American roots.

In 2000, Salon.com’s Charles Taylor wrote about Roth’s new book:

The thrill of gossip become literature hovers over “The Human Stain”: There’s no way Roth could have tackled this subject without thinking of Anatole Broyard, the late literary critic who passed as white for many years.

Brent Staples’ 2003 piece in the NY Times noted that a “character who jettisons his black family to live as white was strongly reminiscent of Mr. Broyard.”

Janet Maslin wrote the book was “seemingly prompted by the Broyard story.”

It was such a widely held notion that the Broyard connection was incorporated into the Wikipedia article on “The Human Stain.”

An early 2005 version of the Wikipedia entry cited Henry Louis Gates Jr., and by March 2008, it relayed the theory from Charles Taylor’s Salon.com review.

The view was so pervasive that Wikipedia editors compiled a list of more than a dozen notable citations from prominent writers and publications.

Wikipedians researching the topic came across these articles as secondary sources that drew parallels between Silk and Anatole Broyard. The references were verifiable, linkable prose from notable writers and respected publications. The core policies of Wikipedia — verifiability, using reliable sources and not undertaking original research — were upheld by using reputable content as the basis for the conclusions.

Roth Explains It All

However, information from Roth in 2008 changed things.

Bloomberg News did an interview with the author about his new book at the time, “Indignation.” Towards the end of the interview, he was asked a casual question about “The Human Stain:”

Hilferty: Is Coleman Silk, the black man who willfully passes as white in “The Human Stain,” based on anyone you knew?
Roth: No. There was much talk at the time that he was based on a journalist and writer named Anatole Broyard. I knew Anatole slightly, and I didn’t know he was black. Eventually there was a New Yorker article describing Anatole’s life written months and months after I had begun my book. So, no connection.

It might have been the first time Roth went on the record saying there was no connection between the fictional Silk and the real-life writer Broyard, and it seems to be the earliest record of that fact on the Internet.

Fast forward to 2012: according to Roth, he read the Wikipedia article for [[The Human Stain]] for the first time and found the erroneous assertion that Anatole Broyard was the template for his main character. In August 2012, Roth’s biographer, Blake Bailey, acted as an interlocutor and tried to change the Wikipedia entry to remove the false information. It became an unexpected tussle with Wikipedia’s volunteer editors.

Unfortunately for Roth, by the rules of Wikipedia, first-hand information from the mouth of the author does not immediately change Wikipedia. The policies of verifiability and the prohibition on original research prevent a direct email or a phone call to Wikipedia’s governing foundation or its volunteers from being the final word.

Enter The New Yorker

Frustrated with the process, Roth wrote a long article for the New Yorker, detailing his Wikipedia conundrum. He provided an exhaustive description of the actual inspiration for the professor Silk character: his friend and Princeton professor, Melvin Tumin.

“The Human Stain” was inspired, rather, by an unhappy event in the life of my late friend Melvin Tumin, professor of sociology at Princeton for some thirty years.
And it is this that inspired me to write “The Human Stain”: not something that may or may not have happened in the Manhattan life of the cosmopolitan literary figure Anatole Broyard but what actually did happen in the life of Professor Melvin Tumin, sixty miles south of Manhattan in the college town of Princeton, New Jersey, where I had met Mel, his wife, Sylvia, and his two sons when I was Princeton’s writer-in-residence in the early nineteen-sixties.

Good enough. But the problem arose when Roth attempted to correct the information in Wikipedia with the help of Bailey, his biographer. He wrote:

Yet when, through an official interlocutor, I recently petitioned Wikipedia to delete this misstatement, along with two others, my interlocutor was told by the “English Wikipedia Administrator”—in a letter dated August 25th and addressed to my interlocutor—that I, Roth, was not a credible source: “I understand your point that the author is the greatest authority on their own work,” writes the Wikipedia Administrator—“but we require secondary sources.”


Thus was created the occasion for this open letter. After failing to get a change made through the usual channels, I don’t know how else to proceed.

The frustration is understandable. That someone’s first-hand knowledge about their own work could be rejected in this manner seems inane. But it’s a fundamental working process of Wikipedia. It depends on reliable (secondary) sources to vet and vouch for the information.

Because of this, Wikipedia is fundamentally a curated tertiary source — when it works, it’s a researched and verified work that points to references both original and secondary, but mostly the latter.

It’s garbage in, garbage out. It’s only as good as the verifiable sources and references it can link to.

But it is also this policy that infuriates many Wikipedia outsiders.

During the debate over Roth’s edits, one Wikipedia administrator (an experienced editor in the volunteer community) cited Wikipedia’s famous refrain:

Verifiability, not truth, is the burden.
- ChrisGualtieri (talk) 15:53, 8 September 2012 (UTC)

By design, Wikipedia’s community couldn’t use an email from an original source as the final word. Wikipedia depends on information from a reliable source in a tangible form, and the verification it provides.

Reliable sources perform the gatekeeping function familiar in academic publishing, where peer review guarantees a level of rigor and fact checking from those with established track records.

But even with rigorous references, verifiability can be hard.

Consider Roth’s New Yorker piece, where he says:

“The Human Stain” was inspired, rather, by an unhappy event in the life of my late friend Melvin Tumin, professor of sociology at Princeton for some thirty years.

Compare that to the 2008 interview, when asked, “Is Coleman Silk, the black man who willfully passes as white in ‘The Human Stain,’ based on anyone you knew?” Roth said, “No.”

This would seem to contradict the New Yorker article. It doesn’t make Roth dishonest. Rather, in the spoken interview Roth likely interpreted the question more narrowly, as asking whether he knew anyone who “passed” in real life, as Silk does in the novel.

The point of all this?

Truth via verification is not easy or obvious.

Even with multiple reliable sources, such as a direct transcript of an interview or the words of the author himself, ferreting out the truth requires standards and deliberation.

As of this writing, Roth’s explanation about the Coleman Silk character has become the dominant one in the Wikipedia article, as it should be.

However, the erroneous speculation about Anatole Broyard was so prevalent and widely held in the years before Roth’s clarification that it still has a significant mention in the article for historical purposes. There’s still debate about how prominent this should be in the entry, given that Roth has flatly denied it.

Lessons

Roth’s New Yorker piece got the Wikipedia article fixed, but getting such a prominent soapbox is not a solution that scales for everyone who has a problem with Wikipedia.

After a decade of Wikipedia’s existence as the chaotic encyclopedia that “anyone can edit,” it’s ironic that its stringent standards for verifiability, and its habit of moving slowly and deliberately with information, have made those qualities a target for criticism.

Wikipedia has been portrayed as being too loose (“Anyone can edit Wikipedia? How can I trust it?”) and too strict (“Wikipedia doesn’t consider Roth a credible source about himself? How can I trust it?”). The fact is, on balance, this yin-yang relationship serves Wikipedia well the vast majority of the time by being responsive and thorough — by being quick by nature, yet slow by design.

It continues to be one of the most visited web properties in the world (fifth, according to ComScore), refining its policies to protect the reputations of living persons and to enforce accuracy in fast-changing articles. Most outsiders would be surprised to see how conscientious and pedantic Wikipedia’s editors are about getting things right, despite a mercurial volunteer community in need of a decorum upgrade and the occasional standoff with award-winning novelists.


A model for Wikipedia version comparison?

I saw an interesting video the other day for Kaleidoscope, a Mac file comparison tool with an elegant “diff” visualization for showing changes between two files. The nice thing is the ability to switch between different modes of highlighting and viewing changes.

I can imagine that the basic file diff tool that Wikipedia has now could learn something from this to help folks sift through changes faster, especially with something as complex as WikiMarkup.

Take a look:
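For a sense of the raw material any such visualization has to work with, here is a minimal sketch of the kind of plain-text diff Wikipedia produces between two revisions, using Python’s standard difflib module. The revision text is invented for the example; a real tool would first fetch the actual revisions.

import difflib

old_revision = """The Human Stain is a novel by Philip Roth.
It was published in 2000.
The character of Coleman Silk is believed to be based on Anatole Broyard.
""".splitlines(keepends=True)

new_revision = """The Human Stain is a novel by Philip Roth.
It was published in 2000.
Roth has said the character of Coleman Silk was inspired by Melvin Tumin.
""".splitlines(keepends=True)

# Produce a unified diff, the same general format most diff tools display.
for line in difflib.unified_diff(old_revision, new_revision,
                                 fromfile="revision_1", tofile="revision_2"):
    print(line, end="")

A richer visual layer, like Kaleidoscope’s, is essentially a friendlier rendering of output like this.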

Discussing Wikipedia and Haymarket affair on NPR Talk of the Nation

Today I was on NPR Talk of the Nation discussing the latest controversy around the editing of the [[Haymarket affair]] article in Wikipedia. [See the NPR page here].

In the Chronicle of Higher Education, Professor Timothy Messer-Kruse documented his attempts over a two-year period to update the Wikipedia entry to reflect his groundbreaking research. He was rebuffed, and wrote:

The “undue weight” policy posed a problem. Scholars have been publishing the same ideas about the Haymarket case for more than a century. The last published bibliography of titles on the subject has 1,530 entries.

“Explain to me, then, how a ‘minority’ source with facts on its side would ever appear against a wrong ‘majority’ one?” I asked the Wiki-gatekeeper. He responded, “You’re more than welcome to discuss reliable sources here, that’s what the talk page is for. However, you might want to have a quick look at Wikipedia’s civility policy.”

I tried to edit the page again. Within 10 seconds I was informed that my citations to the primary documents were insufficient, as Wikipedia requires its contributors to rely on secondary sources, or, as my critic informed me, “published books.” Another editor cheerfully tutored me in what this means: “Wikipedia is not ‘truth,’ Wikipedia is ‘verifiability’ of reliable sources. Hence, if most secondary sources which are taken as reliable happen to repeat a flawed account or description of something, Wikipedia will echo that.”

Therein lies the problem.

Wikipedia depends on secondary sources. If recent scholarship, though accurate, still accounts for a minority view, Wikipedia will wait until the majority view recognizes these new advances as canon. Only then will the article reflect those changes.

What we’re seeing with the new evidence published by Messer-Kruse in August 2011 is that there is a lag time before new scholarship becomes generally accepted. To wit, on the radio show I read the blurb from the professor’s book and showed why it was perhaps the “perfect storm” of conditions that would keep Wikipedia from including his findings right away (emphasis mine).

In this *controversial and groundbreaking* new history, Timothy Messer-Kruse *rewrites the standard narrative* of the most iconic event in American labor history: the Haymarket Bombing and Trial of 1886. Using thousands of pages of *previously unexamined materials*, Messer-Kruse demonstrates that, *contrary to longstanding historical opinion*, the trial was not the “travesty of justice” it has commonly been depicted as.

Every emphasized phrase is in stark contrast to Wikipedia’s policies of verifiability, reliable sources and undue weight given to minority viewpoints. That is, this book is the first revelation of these new findings, and they haven’t been taken as a consensus view, at least not yet.

In time, if the facts hold up, and there is every reason to believe they will, the rest of Haymarket affair scholarship will reflect the new research, and Wikipedia will reflect that.

So chalk this up under one of the more unusual and modern complaints about Wikipedia you almost never hear:

Can’t you move faster? You’re going too slow.

Wikipedia Blackout Against SOPA

January 18 is a historic day, marking one of the largest online protests ever, against the US SOPA and PIPA bills making their way through Congress. This Storify stream tries to explain the context of Wikipedia’s blackout and the details of the protest and was featured on the front page of Storify.com Jan 17-18.


Check it throughout the blackout period for the latest updates.


Teaching Visual Storytelling: The five-shot method and beyond

At Journalism Interactive at the University of Maryland, I’m giving a lightning talk about Teaching Visual Storytelling: The five-shot method and beyond. In addition to discussing how we use Michael Rosenblum’s five-shot method at USC, I have included checklists journalists can use in the field for shooting better video.

Analyzing Occupy Wall Street, with Rushkoff and Wikipedia

Doug Rushkoff has a great piece on CNN deconstructing Occupy Wall Street’s motivations and goals. Just publishing it is commendable on the news network’s part, since he aims his sights squarely at CNN’s own anchor Erin Burnett for the shallow, gotcha journalism she debuted this week on her new TV show.

I’d also been thinking along Rushkoff’s lines. What exactly was Occupy Wall Street trying to achieve? In many ways, it resembled the WTO protests I covered in 2005 in Hong Kong. That mishmash of protesters from the “Global South,” subsidized farmers from Korea, Southeast Asian sex workers, and domestic maids, among others, had common gripes, but exhibited no central leadership or coherent manifesto. You felt the vibe. You knew what they were against. But you didn’t know where it was going.

[Photo: WTO protesters in Hong Kong, 2005]

To me, Occupy Wall Street looks a lot like the folks who edit Wikipedia — a leaderless grassroots gathering of passionate individuals with similar concerns, trying to find consensus. Rushkoff describes this better as a “decentralized network-era culture,” concerned about the sustainability of its movement rather than victory.

“It is not about one-pointedness, but inclusion and groping toward consensus. It is not like a book; it is like the Internet,” says Rushkoff.

The full piece is worth the read, because it’s this type of analysis Rushkoff does best: Think Occupy Wall St. is a phase? You don’t get it – CNN.com.

On Steve Jobs, NeXT and the WWW

In memoriam: excerpt from my book, The Wikipedia Revolution, talking about Steve Jobs and his role in creating the read/write Web we know today. RIP.

When Steve Jobs was forced out as the head of Apple Computer in 1985, he stayed in Silicon Valley and put his energies into a new start-up called NeXT. This was while Apple was still shipping computers with nine-inch screens and Microsoft’s most advanced product was an anemic and stiff-looking Windows 2.0. The NeXT machine, on the other hand, launched in October 1988, introduced pioneering features we’re all used to now: a high-resolution “million pixel” display, a read/write optical drive, and a true multitasking operating system. And in classic Steve Jobs style, it was clad in a sexy all-black magnesium cube form factor that made it the envy of computer science departments around the world.

The NeXT megapixel grayscale computer display was its most stunning feature. What it lacked in color it made up for in fineness and texture. It was so large and sharp, folks compared it to reading on paper. This was no coincidence—it used PostScript, a special language from Adobe Systems usually reserved for high-end paper printers.

So when [WWW creator Tim] Berners-Lee was testing out his idea for a World Wide Web to share documents, he used his NeXT cube computer that was geared toward handling high-resolution documents. The first Web browser he ever built was for the NeXT machine, in February 1991. But he had much grander plans than simply creating a “browser” for reading, and in fact called his program a “browser-editor.”

Not only did his program on the NeXT read and display Web pages, it could also alter them and save them. This was a function Berners-Lee had envisioned from the start—a read-write Web of information for sharing.

Given its rich and ambitious origins, it is then quite peculiar that the Web that became popular in the mid-1990s was known only for reading, browsing, and surfing. In the exuberance to push the reading experience, the “write” stuff, which was always meant to be part of the Web, was left behind as a cumbersome feature.

While the first Web browser from Tim Berners-Lee gained notoriety, there was a problem. The sexy features of the NeXT were not cheap. They offered only one model, and few folks could afford a $6,500 NeXT cube. Even NeXT’s follow-on budget version, the NeXT “slab,” was $4,995. It was hardly a computer for the masses.

But oh, what Steve was able to do since then. He returned to Apple, made the NeXT operating system the basis of all Macs, and came to dominate the world of music players, smart phones, and tablet computing.

RIP Steve Jobs, you really did make “computers for the rest of us.”