Metadata is data about data, with 'meta' here meaning 'an underlying definition or description'. According to Chowdhury and Chowdhury (2007), metadata 'describes the various attributes of a resource that are deemed useful to access, retrieve and manage it' and facilitates 'its discovery, use, sharing and re-use.' After all, what is the value of a resource that no one can find, or, particularly in the 21st century, that no computer program or search engine can harness?
Social media as metadata…
In our modern, interactive Web 2.0 world, our studies this week asked us to consider social media as a method of research, with a particular focus on Twitter coupled with an archiving and data visualisation tool called TAGS (Twitter Archiving Google Sheet). TAGS has been developed by Martin Hawksey over the last few years as a mash-up of Google Sheets and the Twitter API, designed for ease of use and originally built to automatically monitor tweets sent to event hashtags. Once an archive has been collected, the hashtag data can be explored visually in a number of ways alongside the underlying spreadsheet.
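For a rough sense of what TAGS automates behind the scenes, here is a minimal Python sketch (this is not Hawksey's actual Apps Script code, and the credentials, file name and hashtag below are placeholders) that polls the Twitter v1.1 search API for a hashtag and appends any new tweets to a local CSV:

```python
# Minimal sketch of the kind of archiving TAGS automates: poll the Twitter
# v1.1 search API for a hashtag and append new tweets to a local CSV.
# NOT Hawksey's Apps Script; credentials, file name and hashtag are placeholders.
import csv

import requests
from requests_oauthlib import OAuth1

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"
auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",
              "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")  # placeholder credentials


def archive_hashtag(hashtag, csv_path="tags_archive.csv", since_id=None):
    """Fetch up to 100 recent tweets for a hashtag and append them to a CSV."""
    params = {"q": hashtag, "count": 100, "result_type": "recent"}
    if since_id:
        params["since_id"] = since_id  # only tweets newer than the last run
    response = requests.get(SEARCH_URL, params=params, auth=auth)
    response.raise_for_status()
    statuses = response.json().get("statuses", [])

    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for tweet in statuses:
            writer.writerow([tweet["id_str"],
                             tweet["created_at"],
                             tweet["user"]["screen_name"],
                             tweet["text"]])
    # Return the highest id seen so the next poll can carry on from here.
    return max((int(t["id_str"]) for t in statuses), default=since_id)


# e.g. last_id = archive_hashtag("#example")  # placeholder hashtag
```

TAGS does roughly the equivalent inside a Google Sheet, re-running the collection on a schedule so the archive keeps growing without any manual effort.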
So what’s in a tweet?
Astonishingly, around 150 rich points of data, as the popular ‘Map of a Tweet’ by Raffi Krikorian shows. This includes ‘a unique numerical ID attached to each tweet, as well as IDs for all the replies, favorites and retweets that it gets. It also includes a timestamp, a location stamp, the language, the date the account was created, the URL of the author if a website is referenced, the number of followers, and many other technical specifications that engineers can analyze.’ (Dwoskin, WSJ 2014)
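To make that a little more concrete, here is a heavily trimmed, invented example of how a handful of those points appear in the JSON object the Twitter API returns for a single tweet (the field names follow Twitter's v1.1 tweet object; the values are made up for illustration):

```python
# A heavily trimmed illustration of a single tweet's metadata as returned by
# the Twitter v1.1 API. Real tweet objects carry far more fields than this,
# and the values below are invented.
example_tweet = {
    "id_str": "524931820847428608",                   # unique numerical ID
    "created_at": "Wed Oct 22 10:15:00 +0000 2014",   # timestamp
    "lang": "en",                                     # detected language
    "retweet_count": 12,
    "favorite_count": 3,
    "coordinates": None,                              # location stamp, if enabled
    "entities": {"urls": [{"expanded_url": "http://example.com/article"}]},
    "user": {
        "screen_name": "example_user",
        "followers_count": 815,
        "created_at": "Mon Mar 02 09:00:00 +0000 2009",  # account creation date
    },
}

# Pulling out the points Dwoskin lists is just dictionary access:
print(example_tweet["id_str"], example_tweet["lang"],
      example_tweet["user"]["followers_count"])
```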
Rich pickings in the data mine…
With an estimated 400 million tweets a day, there is a lot of data! And to make that data manageable for researchers, there are in fact three different types of Twitter API (the search API, the streaming API and the full 'Firehose'), each allowing different sampling under different terms.
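By way of illustration, here is a hedged sketch of how the first two are reached from Python, re-using the `auth` object from the earlier sketch; the hashtag is again a placeholder, and full Firehose access is arranged separately with Twitter, so it is omitted:

```python
# Rough sketch contrasting the two freely available APIs (v1.1 endpoints).
# `auth` is the OAuth1 object from the earlier archiving sketch.
import json

import requests

# 1. REST search API: a query over the recent past, returned in pages.
search = requests.get("https://api.twitter.com/1.1/search/tweets.json",
                      params={"q": "#example", "count": 100}, auth=auth)
print(len(search.json().get("statuses", [])), "recent tweets returned")

# 2. Streaming API: a long-lived connection pushing matching tweets
#    as they are posted, rather than looking backwards.
stream = requests.post("https://stream.twitter.com/1.1/statuses/filter.json",
                       data={"track": "#example"}, auth=auth, stream=True)
for line in stream.iter_lines():
    if line:                                  # keep-alive newlines are empty
        tweet = json.loads(line)
        print(tweet.get("text", ""))
```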
And just as with tweets, across the API-linked network giants the analytical unpacking, mixing, reboxing and display of metadata continues, feeding the marketing and entrepreneurial needs of business as well as the scholarly needs of our own academic and cultural pursuits.
But with so many social ways to create, share and receive information, we must consider the bigger picture regarding our choices. We must make good decisions both in matters of our own information management and privacy as active web community users, and in our responsibility for informed representation and ethical practice as researchers. These issues are discussed in many current articles and provoke a great deal of debate. As Priego (2014) highlights, 'There is a wealth of information in a tweet's metadata that can be beneficial for research in fields other than the Life Sciences. The act of archiving and disseminating public information publicly does not have to be cause for an "ethical dilemma", as long as the archived and disseminated information was public in the first instance.' He adds that 'Individuals worried about the data they publish publicly, freely, openly on Twitter being collected by researchers for research purposes other than the ones they intended should perhaps reconsider how Twitter works.'
But a word of caution…
Adding to the note on researcher responsibility, Martin Hawksey offers a caution on his TAGS site, posting a quote from a 2012 investigation into sampling bias in large online networks: 'We find that the (Twitter) search API over-represents the more central users and does not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions.'
In addition, Alistair Brown, writing on the LSE blog, notes that because Twitter's free search API only reaches back around seven days, 'one problem with Twitter is that it does not maintain an easily searchable archive of tweets, meaning that any engagement activity may be lost if not captured more or less as it happens.' Brown goes on to report that, in his experience, analytics software struggled to produce consistent search results, and that truer representations were obtained using a platform such as TweetDeck, where specific searches can be added as columns and observed in real time.
And finally, a 2013 study by Morstatter et al., which can be seen here, compares and contrasts the free streaming API, which delivers only a sample of real-time public tweets, with the 'Twitter Firehose', which allows full access to the leviathan of all real-time public tweet data but is a very costly approach and a difficult challenge to facilitate.
But time to adjourn for now…more on Twitter analysis next week…thanks for reading 😉
Header image courtesy of Wallmu.com, Owl gif courtesy of Giphy.com