HyperCities (http://hypercities.com) was a pioneering Digital Humanities project in “thick mapping” directed by UCLA Professor Todd Presner. It used an interactive platform built on Google Maps to facilitate the curation and exploration of collaboratively compiled maps, images, texts, and other media associated with specific places and time periods. Upon the conclusion of active development and maintenance of the project in 2014, the UCLA Digital Library Program assumed stewardship of some of the valuable digital artifacts accumulated over the long life of the HyperCities project. These included large collections of Twitter messages pertaining to events such as the March 2011 Tōhoku earthquake, tsunami and nuclear disaster, and also the “Arab Spring” political movements that began in 2011.
This page presents a case study of an archive of Twitter messages collected for a HyperCities sub-project focused on the Egyptian revolution of January and February 2011. It considers ways of summarizing and exploring such a collection of social media records after they have been disconnected from their original presentation context and in situations in which direct user access to the source data is impractical due to privacy concerns, the sheer scale of the collection, and the relative triviality of any single item.
In response to the National Police Day protests in Egypt on January 25, 2011, HyperCities researchers Presner, David Shepard and Yoh Kawano rapidly built an online interface called HyperCities Now to capture and visualize a real-time sample of Twitter messages related to the protests, using Twitter’s “streaming” application programming interface and the HyperCities platform. The interface remains available at the HyperCities Egypt site, egypt.hypercities.com. The team subsequently reused these technologies to provide similar portals showing tweets regarding the 2011 Libyan uprising and civil war (libya.hypercities.com) and the March 2011 Tōhoku earthquake, tsunami and nuclear disaster (sendai.hypercities.com).
As described in their 2014 book HyperCities: Thick Mapping in the Digital Humanities, Presner, Shepard and Kawano recorded 420,933 tweets and retweets between January 30 and February 25, 2011. They did this by filtering Twitter’s public “spritzer” data stream for messages that used at least one of the hashtags #jan25, #egypt, and #tahrir and that had a source location (either actual coordinates from the user’s mobile device, or a location inferred from their user profile) within approximately 200 miles of Cairo, a radius intended to include messages from Alexandria, Egypt’s second-largest city.
Because of this filtering approach, the captured messages were more likely to originate from actual sources “on the ground” in Egypt than from international commentators and media outlets. It is important, however, to note the unreliability of Twitter’s location data, much of which is derived from place names in user profiles (e.g., “Cairo, Egypt”). Many commentators have also voiced a broader skepticism about how widely Twitter was actually used by local, as opposed to international, participants in the Arab Spring. Even so, the advent of real-time social media visualizations like HyperCities Egypt raised concerns that user location data could in fact place some activists at greater risk of reprisals by revealing their recent positions -- just as, conversely, such information might prove life-saving in certain disaster scenarios.
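The filtering criteria above can be expressed as a small predicate. The following Python sketch is not the HyperCities capture code, which is not reproduced here. It assumes each archived tweet has already been reduced to a hypothetical `Tweet` record carrying its hashtags and a single latitude/longitude pair (whether from device coordinates or a geocoded profile location), and it uses a great-circle (haversine) radius around an approximate Cairo center point as a stand-in for "approximately 200 miles".

```python
import math
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Tweet:
    """Hypothetical simplified tweet record (not the archive's actual schema)."""
    text: str
    hashtags: List[str]          # hashtag text, with or without a leading '#'
    lat: Optional[float] = None  # device coordinates or a geocoded profile location
    lon: Optional[float] = None

# Approximate center of Cairo (latitude, longitude); an assumption for illustration.
CAIRO = (30.0444, 31.2357)
TRACKED_TAGS = {"jan25", "egypt", "tahrir"}
EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def matches(tweet: Tweet, radius_miles: float = 200.0) -> bool:
    """True if the tweet uses a tracked hashtag and lies within radius_miles of Cairo."""
    tags = {t.lower().lstrip("#") for t in tweet.hashtags}
    if not tags & TRACKED_TAGS:
        return False
    if tweet.lat is None or tweet.lon is None:
        return False  # no usable location, so the tweet cannot pass the geographic filter
    return haversine_miles(tweet.lat, tweet.lon, *CAIRO) <= radius_miles

# Alexandria lies roughly 110 miles from Cairo, inside the 200-mile radius.
print(matches(Tweet("...", ["#Jan25"], 31.2001, 29.9187)))  # True
```

Applied to an existing archive, a predicate like this can only re-check or narrow the original capture; it cannot recover messages that the stream never delivered.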
Previous analyses of the HyperCities Egypt collection
The original HyperCities Now real-time tweet playback sites likely will remain viable for some time, as will the further visualizations of “networks” of related hashtags from the Twitter collections regarding the 2011 Egypt and Libya uprisings created by Presner, Shepard and Kawano for the companion site to their book. Yet it is still important to consider alternative ways of visualizing such collections. Furthermore, in 2015 Twitter provided the ability to retrieve non-deleted tweets from the entire history of the service (which began operation in 2006) via its search API, meaning that it would be possible to reconstruct and even surpass the amount of data contained in the original HyperCities collections by running such queries against the search API or obtaining the tweets from a third-party provider. From the archivist’s perspective, however, such resources may not always be available, and in any case there is scholarly value in being able to examine a historical collection to observe the data that was available to researchers at the time a specific event was occurring.
Frequency plot over time
Simple tweet frequency graphs -- for example, showing the total number of recorded tweets and retweets in each hour -- provide a basic overview of a temporally oriented Twitter archive such as the HyperCities Egypt collection. Most noteworthy in this graph are the two spikes related to the resignation of Egyptian president Hosni Mubarak (the first on February 10, when he gave a televised address in which he was expected to resign but did not, and the second on the following day, when he officially left office), the high volume of tweets and retweets leading up to them, and the subsequent decline in tweet volume. A plot of the total volume of tweets per hour (the “Users” line) compared to the number of unique users per hour largely echoes the hourly tweet/retweet graph above. The “Unique users” line may give a better indication of how many people were actively commenting on events in each hour, because a few prolific accounts can inflate the raw tweet count.
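Both hourly series can be derived from a single pass over the archive. The sketch below assumes, hypothetically, that each record has been reduced to a `(timestamp, username)` pair with timestamps already normalized to one time zone (for example UTC); it truncates each timestamp to the hour and counts tweets and distinct senders per hour. The sample records are toy values.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple

def hourly_counts(records: Iterable[Tuple[datetime, str]]) -> Dict[datetime, Tuple[int, int]]:
    """Map each hour to (tweet count, unique-user count), in chronological order.

    Each record is a hypothetical (timestamp, username) pair; timestamps are
    assumed to share one time zone already.
    """
    tweets: Counter = Counter()
    users: Dict[datetime, Set[str]] = defaultdict(set)
    for ts, user in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        tweets[hour] += 1
        users[hour].add(user)
    return {hour: (tweets[hour], len(users[hour])) for hour in sorted(tweets)}

sample = [
    (datetime(2011, 2, 11, 16, 5), "alice"),
    (datetime(2011, 2, 11, 16, 20), "alice"),
    (datetime(2011, 2, 11, 16, 59), "bob"),
    (datetime(2011, 2, 11, 17, 1), "carol"),
]
for hour, (n_tweets, n_users) in hourly_counts(sample).items():
    print(hour.isoformat(), n_tweets, n_users)
# 2011-02-11T16:00:00 3 2
# 2011-02-11T17:00:00 1 1
```

Hours with no captured tweets simply do not appear in the result, so a plotting step should fill them with zeros rather than interpolate across them.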
Hashtag and username word clouds
A word cloud also can be effective at providing an at-a-glance summary of certain aspects of a social media collection.
This cloud uses word sizes and colors to visualize the relative frequencies of hashtags in the collection other than the main search hashtags used to identify the tweets to be captured (#jan25, #egypt, #tahrir). It indicates that Hosni Mubarak, the location of Cairo, and the subsequent uprising in Libya were the subjects of other frequently occurring hashtags.
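The counts behind such a cloud amount to a frequency table with the capture hashtags removed. A minimal sketch, assuming each tweet's hashtags are available as a list of strings; folding case and stripping the leading `#` are normalization choices made here, not documented features of the original visualization. The linear mapping from counts to font sizes is likewise only one simple option, and the input lists are toy values.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

SEARCH_TAGS = {"jan25", "egypt", "tahrir"}  # the capture hashtags, excluded from the cloud

def top_hashtags(hashtag_lists: Iterable[List[str]], n: int = 10,
                 exclude: set = SEARCH_TAGS) -> List[Tuple[str, int]]:
    """Return the n most frequent hashtags, case-folded, excluding the capture tags."""
    counts: Counter = Counter()
    for tags in hashtag_lists:
        for tag in tags:
            norm = tag.lower().lstrip("#")
            if norm and norm not in exclude:
                counts[norm] += 1
    return counts.most_common(n)

def font_sizes(ranked: List[Tuple[str, int]], min_pt: float = 10,
               max_pt: float = 60) -> Dict[str, float]:
    """Linearly map counts onto a font-size range (one simple word-cloud scaling)."""
    if not ranked:
        return {}
    lo = min(c for _, c in ranked)
    hi = max(c for _, c in ranked)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {word: min_pt + (c - lo) * (max_pt - min_pt) / span for word, c in ranked}

ranked = top_hashtags([["#Jan25", "#Mubarak"], ["#mubarak", "#cairo"],
                       ["#egypt", "#Libya", "#mubarak"]])
print(ranked)              # [('mubarak', 3), ('cairo', 1), ('libya', 1)]
print(font_sizes(ranked))  # {'mubarak': 60.0, 'cairo': 10.0, 'libya': 10.0}
```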
Using a word cloud to visualize the Twitter user handles that appeared the most often in the collection as either senders of tweets or users mentioned in tweets can reveal the main “participants” in the discussion, and also can hint at related aspects such as the degree to which a few users dominated the discussion. Examination of the names by subject specialists also can give some indication of how many official or non-affiliated media sources may have participated in the Twitter discussion. Note, however, that including potentially identifying information (such as personal names) in summary visualizations can be a controversial practice, especially if the person wishes to delete their previously public comments or otherwise obscure their participation in the event after the fact.
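A comparable table for user handles can count each appearance of a handle, whether as sender or as an @-mention in the text, and report what share of that total the top few handles account for, as one rough measure of how far a few users dominated. The sketch below assumes hypothetical `(sender, text)` pairs and matches mentions with a regular expression that assumes handles of up to 15 letters, digits, or underscores. Given the privacy concern just noted, it also includes an optional step that replaces handles with salted SHA-256 pseudonyms before anything is published; the salt must be kept secret, since unsalted hashes of known handles are easy to reverse. The sample pairs are toy values.

```python
import hashlib
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed handle format: 1-15 ASCII letters, digits, or underscores after '@',
# not preceded by a word character (so "x@example.com" is not a mention).
MENTION = re.compile(r"(?<!\w)@([A-Za-z0-9_]{1,15})")

def participant_counts(tweets: Iterable[Tuple[str, str]]) -> Counter:
    """Count appearances of each handle as sender or as @-mention (case-folded)."""
    counts: Counter = Counter()
    for sender, text in tweets:
        counts[sender.lower()] += 1
        for handle in MENTION.findall(text):
            counts[handle.lower()] += 1
    return counts

def top_share(counts: Counter, k: int) -> float:
    """Fraction of all appearances accounted for by the k most frequent handles."""
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total if total else 0.0

def pseudonymize(counts: Counter, salt: str) -> Counter:
    """Replace handles with salted SHA-256 prefixes before publishing a visualization."""
    return Counter({hashlib.sha256((salt + h).encode("utf-8")).hexdigest()[:12]: c
                    for h, c in counts.items()})

sample = [("Alice", "RT @bob: crowds in #tahrir"),
          ("bob", "@Alice @carol stay safe"),
          ("dave", "write to x@example.com")]
counts = participant_counts(sample)
print(counts.most_common(2))           # [('alice', 2), ('bob', 2)]
print(round(top_share(counts, 2), 3))  # 0.667
```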
Comparing Twitter to television news coverage
The usefulness of archival social media collections increases considerably if their contents can be correlated and contrasted with those of other digital archives, whether stored locally or remotely, that overlap in subject matter, time, or other aspects. In the case of the Arab Spring-related Twitter materials, the extensive recordings of recent television news stored in UCLA’s NewsScape digital resource provided plentiful material for comparison.
The NewsScape (http://tvnews.library.ucla.edu, currently restricted to the UCLA campus and affiliates) contains roughly one hundred hours per day of digitized television news shows from the time period of the Egyptian and Libyan uprisings, recorded primarily from US-based local and national news outlets, although it also includes a significant number of international news sources. The recordings in the NewsScape are fully searchable via full-text indexes of both the program transcripts and detected on-screen text, which facilitates direct comparison of how often key terms occur in television news coverage with how often related terms appear in social media.
The frequencies of the terms on television are quite a bit lower than on Twitter, and there are some term “mismatches” across these disparate forms of media. The totemic significance of the #jan25 hashtag, for example, had no equivalent on television. Like the recorded tweets, the television news recordings also represent an unscientific sampling of all possible news programs discussing the situation in Egypt, and exhibit a similar diurnal pattern.
This interactive visualization plots the hourly counts of tweets from the Egyptian revolution discussed above (values on the left vertical axis) against hourly counts of the appearances of the words “Egypt,” “Cairo,” and “Tahrir” in the transcripts and on-screen text of the television news programs in the NewsScape (values on the right vertical axis) recorded during roughly the time period when the Twitter collection was being captured.
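The two series in such a plot need a shared hourly index. The sketch below assumes the television material is available as hypothetical `(timestamp, text)` segments of caption or on-screen text. It counts whole-word, case-insensitive matches of the three terms per hour (so "Egypt's" counts but "Egyptian" does not, a choice worth revisiting) and then aligns that table with an hourly tweet-count table over a fixed range, filling empty hours with zero. All sample values are toy data.

```python
import re
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Whole-word, case-insensitive match for the three terms used in the comparison.
TERMS = re.compile(r"\b(egypt|cairo|tahrir)\b", re.IGNORECASE)

def tv_term_counts(segments: Iterable[Tuple[datetime, str]]) -> Counter:
    """Count term matches per hour in hypothetical (timestamp, text) segments."""
    counts: Counter = Counter()
    for ts, text in segments:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[hour] += len(TERMS.findall(text))
    return counts

def align_hourly(a: Dict[datetime, int], b: Dict[datetime, int],
                 start: datetime, end: datetime) -> Tuple[List[datetime], List[int], List[int]]:
    """Put two hourly tables on a common index covering [start, end), filling gaps with 0."""
    hours: List[datetime] = []
    hour = start
    while hour < end:
        hours.append(hour)
        hour += timedelta(hours=1)
    return hours, [a.get(h, 0) for h in hours], [b.get(h, 0) for h in hours]

segments = [(datetime(2011, 2, 10, 20, 5), "Protests in Cairo, Egypt tonight"),
            (datetime(2011, 2, 10, 20, 30), "Crowds fill Tahrir Square; Egyptian army deployed"),
            (datetime(2011, 2, 10, 21, 10), "Weather next")]
tweets = {datetime(2011, 2, 10, 20): 100, datetime(2011, 2, 10, 22): 50}
hours, tweet_series, tv_series = align_hourly(
    tweets, tv_term_counts(segments), datetime(2011, 2, 10, 20), datetime(2011, 2, 10, 23))
print(tweet_series, tv_series)  # [100, 0, 50] [3, 0, 0]
```

Because the television counts are much smaller than the tweet counts, the two aligned lists are best drawn against separate vertical axes, for example with matplotlib's `Axes.twinx`.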
The matching peaks in television news coverage during the events surrounding Hosni Mubarak’s resignation (discussed above) are perhaps unsurprising, but the divergences between the two plots in the preceding days are more intriguing. For example, the television news recordings, simply because they extend further back in time than the Twitter collection, show an initial spike in interest in the situation in Egypt (at least among Western media) around January 29, reacting to the intensification of the protests and the Egyptian government’s response on January 28. The next several days, the first for which Twitter records exist, show continued elevated interest in the NewsScape collection but only sporadic records in the Twitter collection. This may be a consequence of the improvised nature of the HyperCities Twitter capture system, and perhaps also of the partial government blockage of Egyptian telecommunication systems beginning on January 27 (which, in some interpretations of the event, was instrumental in driving more Egyptians into the streets). The noticeable drop in Egypt-related terms on television just as the captured Twitter materials record more activity remains somewhat puzzling, however.
Comparison to the Libyan Revolution/Civil War
Finally, it can be useful to compare timeline visualizations of different but related events. The above interactive plot shows the hourly frequencies of a collection of 364,645 Twitter messages recorded by the HyperCities team between February 20 and May 17, 2011 by filtering for tweets with locations within several hundred miles of Tripoli (encompassing much of the Mediterranean and southern Europe) and using the hashtags #libya, #feb17, and #gaddafi. This series is compared to the number of hourly occurrences of the words “Libya,” “Gaddafi,” and “Tripoli” on air and on screen during television news programs recorded for the NewsScape between February 16 and May 17.
Some aspects of this visualization resemble those of the Egyptian case above. Although the television news collection captures the first stirrings of media interest in the event, hourly Twitter records are not available until February 23, roughly a week after the beginning of the “February 17th Revolution” -- a period in which the Libyan regime twice shut down and then restored the country’s Internet connection. This underscores the difficulty of capturing emerging events via on-demand social media capture (Internet access was cut again on March 3, an event that the Twitter graph seems to record). The volume of mentions on Twitter and on television spikes during key early anti-regime protests and government reprisals, but the profile of this event eventually diverges from that of the Egyptian revolution as it becomes apparent that the regime will be overthrown not by a rapid popular uprising but through a drawn-out civil war. Attention on both broadcast and social media therefore diminishes over time; indeed, the HyperCities capture project ended on May 17, more than three months before rebel forces captured Tripoli.
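Gaps of the kind discussed above, whether caused by network shutdowns or by interruptions in the capture itself, can be located by scanning an hourly count table for runs of empty or near-empty hours. The following sketch does this; the `min_hours` and `threshold` parameters are arbitrary illustrative defaults, the counts are synthetic, and a detected gap only flags a period for closer inspection rather than establishing its cause.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

def find_gaps(hourly: Dict[datetime, int], start: datetime, end: datetime,
              min_hours: int = 3, threshold: int = 0) -> List[Tuple[datetime, datetime]]:
    """Return [gap_start, gap_end) spans of at least min_hours consecutive hours
    in [start, end) whose count is <= threshold (missing hours count as 0)."""
    gaps: List[Tuple[datetime, datetime]] = []
    min_len = timedelta(hours=min_hours)
    run_start = None
    hour = start
    while hour < end:
        if hourly.get(hour, 0) <= threshold:
            if run_start is None:
                run_start = hour  # a quiet run begins
        else:
            if run_start is not None and hour - run_start >= min_len:
                gaps.append((run_start, hour))
            run_start = None
        hour += timedelta(hours=1)
    if run_start is not None and end - run_start >= min_len:
        gaps.append((run_start, end))  # quiet run extends to the end of the range
    return gaps

start = datetime(2000, 1, 1, 0)
counts = {start + timedelta(hours=h): n for h, n in [(0, 5), (1, 5), (5, 7), (6, 1)]}
for gap_start, gap_end in find_gaps(counts, start, start + timedelta(hours=10)):
    print(gap_start.isoformat(), "->", gap_end.isoformat())
# 2000-01-01T02:00:00 -> 2000-01-01T05:00:00
# 2000-01-01T07:00:00 -> 2000-01-01T10:00:00
```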
Record of the HyperCities project in the UCLA Digital Library collections system: http://digital2.library.ucla.edu/projects_collaborations/Hypercities-Los-Angeles.shtml
Todd Presner, David Shepard, Yoh Kawano. 2014. HyperCities: Thick Mapping in the Digital Humanities. Cambridge, MA: Harvard University Press. http://www.hup.harvard.edu/catalog.php?isbn=9780674725348