Every year, EDGE.org poses its annual question to top academics, authors, scientists and thinkers. This year’s question: WHAT *SHOULD* WE BE WORRIED ABOUT?
My contribution talks about the plight of user-generated information held within closed, restricted or access-throttled systems, and whether this new digital “public sphere” is in fact public. Are the creators (and thereby, owners) of social media content (tweets, posts, status updates, photos, et al.) assured access to their own work, both for themselves and for public use, now and in the future?
By strange coincidence, many things I discuss are what Aaron Swartz was fighting for. I finished the piece the day before Aaron took his own life, and hope the things he was passionate about can continue and that we reexamine prosecuting “computer fraud.” RIP Aaron.
Here’s a link to all EDGE responses, or see below for just my piece.
Is The New Public Sphere… Public?
Andrew Lih, Associate Professor of Journalism, Annenberg School for Journalism; Author, The Wikipedia Revolution
The advent of social media sites has allowed a new digital public sphere to evolve, by facilitating many-to-many conversations on a variety of multimedia platforms, from YouTube to Twitter to Weibo. It has connected a global audience and provided a new digital commons that has had profound effects on civil society, social norms and even regime change in the Middle East.
As important as it has become, are critical aspects of this new public sphere truly public?
There are reasons to be worried.
While we are generating content and connections that are feeding a rich global conversation unimaginable just 10 years ago, we may have no way to recreate, reference, research and study this information stream after the fact. The spectrum of challenges is daunting whether it’s because information is sequestered in private hands, throttled from full access, deleted from sight, retired with failed businesses or shielded from copying because of legal barriers.
Twitter, in particular, has emerged as the heart of a new global public conversation. However, anyone who’s ever used its search function knows that the chance of finding content a second time is dubious. Facebook works in a private eyes-only mode by default and is shielded even more from proper search and inspection, not only from the public, but even from the creators of the original content.
How about the easier case of individuals simply asserting control over their own content within these services? Users of social media content systems still have sole copyright of their content, though the terms of service users agree to are rather extensive.
Twitter’s is fairly typical: “you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed).”
Without passing judgment on the extent or reach of these types of license agreements, the logistics of accessing one’s own data are worrisome. Typically, these services (Twitter, Facebook, Weibo, Instagram, et al.) are the sole digital possessor of your words, links, images or video that were created within their system. You may own the copyright, but do you actually possess a copy of what you’ve put into their system? Do you actually control access to your content? Do you have the ability to search and recall the information you created? Is public access to your data (e.g., through application programming interfaces) possible now, or guaranteed in the long term?
That we continue to use an array of information systems without assurances about their long term survivability or commitment to open access, and whether they are good stewards of our history and public conversation, should worry us all.
What can be done about this?
To its credit, Twitter has partnered with the Library of Congress to hand over the first four years’ worth of tweets, from 2006 to 2010, for research and study. Since that first collaboration, it has agreed to feed all tweets to the Library on an ongoing basis. This is commendable, but it’s drinking from a virtual firehose, with roughly half a billion new tweets generated every day. Few entities have the technology to handle very big data, but this is truly massive data.
The Twitter arrangement has provided quite a challenge to the Library as they don’t have an adequate way to serve up the data. By their own admission, they haven’t been able to facilitate the 400 or so research inquiries for this data, as they’re still addressing “significant technology challenges to making the archive accessible to researchers in a comprehensive, useful way.”
So far, the Library hasn’t planned on allowing the database to be downloaded in its entirety for others to have a shot at crunching the data.
We have reasons to worry that this new digital public sphere, while interconnected and collaborative, is not a true federation of data that can be reconstructed for future generations and made available for proper study. Legal, infrastructural and cooperative challenges abound that will likely keep it fractured, perforated and incoherent for the foreseeable future.