Text through the Looking Glass…

I usually start off these posts in wonderment at the evolution of technology within my adult life. This week is no exception as I ponder the ‘control-F’ style text search of my word-processing past in comparison with this week’s focus on text analysis.

Textual analysis is defined as an information gathering process that assists researchers to find ways of understanding how human beings make sense of their world. This can involve working with a mixture of simple quantitative approaches, where word mentions are counted, or with a more qualitative method such as ‘concordance,’ where searching for and comparing the contexts of words, sentences or passages that share a keyword or phrase may reveal patterns. However, it’s here where I must acknowledge the bigger picture surrounding such quantitative and qualitative methodologies and the links to Franco Moretti and the hotly debated camps of ‘distant reading’ and ‘close reading.’ And rather than attempt to explain this in words of my own, I found this YouTube link called ‘Big Data + Old History’ which has been the most accessible step for me in beginning to understand reasons for distant reading.

Currently there is a wide range of free, online tools that can offer a variety of ‘lenses’ to interpret large corpuses of digitised texts – and so for this week’s blog, the citylis DITA gang were asked to reflect on the user experience of Tagxedo, Wordle, Many Eyes and Voyant, and for the brave amongst us, there was the more techie-based TAPoR and ‘R’.

Both Tagxedo and Wordle provide word cloud style visualisations, showing the popularity of words by size of font. Whilst they are appealing to the eye, useful for starting an enquiry and valuable for engaging younger users, they have also been criticised in terms of appropriacy. Jacob Harris from the journalism standards site NiemanLab describes word clouds as ‘mullets of the internet’ and provides a sharp critique on their potentially misleading nature when qualitative tempering is absent.

Voyant is a text-analysis program that also possesses a word cloud tool, but in addition has many other options for user ‘lenses,’ such as graphs of word frequency across the chosen corpus and a format to compare multiple keywords side by side. Geoffrey Rockwell, one of the project leaders behind the tool, reassures us that these computer methods do not ‘replace human interpretation,’ rather that they enhance the researcher’s ability to perceive trends, test intuition and provoke further focussed re-reading and ‘re’-questioning.

I experimented with Voyant using data from Flickr’s BL 1 Million photostream during the recent map tagathon, and the Book of Humour, Wit and Wisdom spreadsheet I compiled to assist the Victorian Meme Machine project, and made the following screenshots and brief observations. For both sets of data, ‘stop lists’ of words to be ignored, such as common English connectives, were added to help focus possible questions.
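For anyone curious about what a ‘stop list’ actually does, here is a minimal sketch in Python of counting words while skipping the stop words. It is only an illustration of the general idea and not how Voyant itself works under the hood; the tiny stop list, the function name word_counts and the sample sentence are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; real tools ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "but", "or", "of", "to", "in", "is", "was"}


def word_counts(text, stop_words=STOP_WORDS):
    """Count lower-cased words in text, skipping any word in stop_words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


# Invented sample text, not taken from the Flickr or Victorian Meme Machine data.
sample = "The map of the town and the map of the coast was a great map."
counts = word_counts(sample)

# Show the most frequent word that survives the stop list.
print(counts.most_common(1))
```

Without the stop list, ‘the’ would top the count and tell us nothing; with it, the word that matters (‘map’) rises to the surface, which is the same effect the stop lists had on the word clouds.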


Voyant for Flickr data 31st Oct – 3rd Nov 2014

As expected from the BL Flickr data, the highest word count was naturally for the tag ‘map.’ But then coinciding with that, there were two prominent Flickr tagger numbers, the details of which I took from the raw text and fed back online to reveal their identities. For this statistical data, the word cloud felt appropriate as a quick insight into the activity of the date range. I’ve no doubt the Flickr API could also supply this info on top taggers – but this way was fun!


Voyant for the ‘Book of Humour, Wit and Wisdom, A Manual of Table-talk’ (1874)

Next was the visualisation for the Book of Humour, Wit and Wisdom. Quick reveals from the word cloud suggested a rather male dominated theme. No surprises there, I supposed, given the social bias of the era. But was there any link between ‘men’ and humour? Did the accidental juxtaposition of ‘great man’ bear any further meaning?

A look at the corpus reader showed that the word ‘man’ was particularly concentrated in one area of the text. On closer reading, this account was far from complimentary towards the gentlemen ‘of street car society.’

Finally I turned to TAPoR text analysis to try the concordance tool for ‘man’ and received 197 entries, which suggested that a fair proportion of the word associations linked to ‘man’ were in fact synonyms of ‘old’ – which could correlate more with the ‘wisdom’ of the book’s title. There’s suddenly a whole pathway of next steps opening up – and this text is relatively small. However, at this point time dictates that I leave the investigation, but I have thoroughly enjoyed beginning this exploration and can see how this process perpetuates thought and takes root.


Part of TAPoRware concordance tool results for ‘man.’
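To give a feel for what a concordance tool is doing, here is a rough Python sketch of a ‘keyword in context’ listing: every occurrence of a keyword shown with a few words either side. This is an assumption-laden toy, not TAPoR’s actual method; the function name concordance, the width setting and the sample sentence are invented for illustration and do not come from the 1874 book.

```python
import re


def concordance(text, keyword, width=3):
    """Return each occurrence of keyword with up to `width` words either side."""
    words = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Mark the keyword with brackets so it stands out in each line.
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


# Invented sample text, used only to show the shape of the output.
sample = "An old man met a great man. The old man laughed."
results = concordance(sample, "man")
for line in results:
    print(line)
```

Reading down the bracketed keyword makes neighbouring words such as ‘old’ easy to spot, which is the kind of pattern the 197 TAPoR entries hinted at.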

More on text mining next week… till then, toodle-loo! 😉



Header Word Cloud from this blog courtesy of Tagxedo 

Altmetrics…quality vs quantity

The last time I trawled the scholarly ‘sea’ for relevant, quality research was when I was studying for my Postgraduate Diploma/MEd around 5 years ago. Back then, citation and impact factor, government papers and contemporary, trending theorists and topics were the way to navigate, assess the waves and hopefully make a good catch.

But nowadays there are ‘donuts.’ Woven, colourful donuts that visualise the online ‘attention’ that scholarly articles in journals attract. And before you start thinking that I’ve had a senior moment and mixed up my home baking blog with this one, I am in fact referring to the donut style visualisation from Altmetric.com, a company who have ‘created and maintained a cluster of servers that watch social media sites, newspapers, government policy documents and other sources for mentions of scholarly articles,’ bringing all the recognition together to formulate article level metrics or “alternative metrics.”

Altmetric.com present three main tools: a very user friendly ‘Explorer’ interface for search and analysis using the Altmetric API (also available for scholars/developers); a bookmarklet that you can drag to your browser’s bookmarks bar, which will report on attention received by research you visit online; and embeddable ‘donut’ or label badges to denote online impact on users’ article pages. The two previous highlighted links also provide simple overviews, as does the embed below.

Besides Altmetric.com, there are a variety of websites and projects that are calculating online impact, such as ImpactStory, Plum Analytics, Public Library of Science (PLoS) and Publish or Perish. In turn, publishers have begun providing such information to readers, including the Nature Publishing Group, Elsevier and (again) the Public Library of Science.

The evolving field of altmetrics provides article-level data. This is in contrast to the traditional bibliometric, journal-level citation method, which has received criticism for its quantitative bias and for being slow to reveal impact and open to manipulation.

As the altmetrics method uses a range of data sources, it is suggested that it can provide qualitative as well as quantitative information, and aspires to give a finer tuned picture of an article’s influence. It also has possible advantages of constructing that picture at a much greater speed than that of academic publishing.

However, as altmetrics are still in their infancy, there is not as yet a shared view on what choices, analysis or data combinations are a reliable indicator of influence. In addition, there is debate on the correct conduct within and across Twitter, blogs and other social media sources. Altmetric.com comment themselves in their blog that ‘Each altmetrics tool will have its own way of handling suspicious activity,’ and that they use ‘a combination of automatic systems and manual curation.’ This takes much time and effort, and so the company also requests that users aid monitoring and report anything unusual.

In terms of addressing scholarly consistency and widening access and impact to research, Ernesto Priego comments on the need for curating and maintaining an academic audience on Twitter, so that a tweeted article is propelled to an optimum reach. A ‘yin yang’ synergy of qualitative and quantitative methods is also argued for, with one informing and the other tempering, culminating in a fair and hopefully trustworthy measure.

Finally, just as assessment has always needed moderation in my familiar world of education and teaching, so the assessment of research needs agreed standards for what constitutes quality, in order to bring excellence and consistency to practice. DORA (the San Francisco Declaration on Research Assessment – to which Altmetric.com has signed) currently provides recommendations for academic institutions, funding agencies and organisations that supply metrics, reminding us in its 2012 report that it is ‘imperative that scientific output is measured accurately and evaluated wisely.’

Natasha Wescoat  'wescoatart' http://goo.gl/HqwulS

Lovely donuty tree picture source Natasha Wescoat ‘wescoatart’