I usually start these posts in wonderment at how technology has evolved within my adult life. This week is no exception, as I compare the ‘control-F’ style text search of my word-processing past with this week’s focus on text analysis.
Textual analysis is defined as an information-gathering process that helps researchers understand how human beings make sense of their world. It can involve simple quantitative approaches, where word mentions are counted, or more qualitative methods such as ‘concordance’, where comparing the contexts of words, sentences or passages that share a keyword or phrase may reveal patterns. Here I must acknowledge the bigger picture surrounding these quantitative and qualitative methodologies: their links to Franco Moretti and the hotly debated camps of ‘distant reading’ and ‘close reading.’ Rather than attempt to explain this in my own words, I found a YouTube video called ‘Big Data + Old History’, which has been the most accessible starting point for me in understanding the case for distant reading.
There is currently a wide range of free online tools that offer a variety of ‘lenses’ for interpreting large corpora of digitised texts. For this week’s blog, the citylis DITA gang were asked to reflect on the user experience of Tagxedo, Wordle, Many Eyes and Voyant, and, for the brave amongst us, the more techie TAPoR and ‘R’.
Both Tagxedo and Wordle produce word cloud visualisations, in which a word’s font size reflects how often it appears. Whilst they are eye-catching, useful for starting an enquiry and valuable for engaging younger users, they have also been criticised as sometimes inappropriate. Jacob Harris from the journalism standards site NiemanLab describes word clouds as ‘mullets of the internet’ and offers a sharp critique of how misleading they can be when they are not tempered by qualitative reading.
Voyant is a text-analysis program that also has a word cloud tool, but it offers many other ‘lenses’ too, such as graphs of word frequency across the chosen corpus and a view for comparing multiple keywords side by side. Geoffrey Rockwell, one of the project leaders behind the tool, reassures us that these computer methods do not ‘replace human interpretation’; rather, they enhance the researcher’s ability to perceive trends, test intuitions and provoke further focussed re-reading and ‘re’-questioning.
I experimented with Voyant on two sets of data: data from Flickr’s BL 1 Million photostream, gathered during the recent map tagathon, and the Book of Humour, Wit and Wisdom spreadsheet I compiled to assist the Victorian Meme Machine project. The following screenshots and brief observations came out of that. For both sets of data, I added ‘stop lists’ of common words, such as English connectives, so that these were filtered out and the results could help focus possible questions.
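The effect of a stop list on a word count can be sketched in a few lines. The short `STOP_WORDS` set below is a hypothetical stand-in; tools such as Voyant offer their own, much longer lists, and `top_terms` is an illustrative name rather than any tool’s API.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list of English function words.
STOP_WORDS = {"the", "a", "an", "and", "but", "or", "of", "to", "in", "is", "was"}


def top_terms(text, stop_words=STOP_WORDS, n=5):
    """Return the n most frequent words once stop words are removed."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [token for token in tokens if token not in stop_words]
    return Counter(kept).most_common(n)


sample = "The map of the town and the map of the river was an old map"
print(top_terms(sample))
```

Without the stop list, ‘the’ would top this count; with it, ‘map’ comes through as the most frequent term.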
As expected from the BL Flickr data, the most frequent word was, naturally, the tag ‘map.’ Alongside it were two prominent Flickr tagger ID numbers; I took these from the raw text and looked them up online to reveal the taggers’ identities. For this kind of statistical data, the word cloud felt appropriate as a quick insight into activity over the date range. The Flickr API could no doubt also supply this information on top taggers, but this way was fun!
Next was the visualisation for the Book of Humour, Wit and Wisdom. The word cloud quickly suggested a rather male-dominated theme. No surprise there, I supposed, given the social bias of the era. But was there any link between ‘men’ and humour? And did the accidental juxtaposition of ‘great man’ carry any further meaning?
A look at the corpus reader showed that the word ‘man’ was particularly concentrated in one area of the text. On closer reading, this account was far from complimentary towards the gentlemen ‘of street car society.’
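Spotting that a word clusters in one part of a text is essentially a frequency count per segment. A rough sketch, assuming the text is split into equal-sized slices by word count (Voyant’s own segmentation and defaults may differ, and `term_distribution` is a hypothetical name):

```python
import re


def term_distribution(text, term, segments=10):
    """Count exact matches of `term` in each of `segments` equal slices of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    target = term.lower()
    counts = []
    for i in range(segments):
        # Integer boundaries give exactly `segments` slices, some possibly empty.
        chunk = tokens[i * total // segments:(i + 1) * total // segments]
        counts.append(chunk.count(target))
    return counts


print(term_distribution("man " * 5 + "woman " * 15, "man", segments=4))
```

A spike in one slice, as here, is the kind of pattern that prompts a return to close reading of that passage.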
Finally, I turned to TAPoR and ran its concordance tool on ‘man’, which returned 197 entries. A fair proportion of the words associated with ‘man’ turned out to be synonyms of ‘old’, which may relate more to the ‘wisdom’ of the book’s title. Suddenly a whole pathway of next steps is opening up, and this text is relatively small. For now, though, time dictates that I leave the investigation here. I have thoroughly enjoyed beginning this exploration and can see how the process sparks further thought and takes root.
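A concordance is usually shown as keyword-in-context (KWIC) lines, and tallying the words next to the keyword is one quick way to spot associations such as ‘old’. This is a minimal sketch under simple assumptions (letters-only tokenising, a fixed word window); `concordance` and `left_collocates` are hypothetical names, and TAPoR’s own tool will tokenise and display results differently.

```python
import re
from collections import Counter


def concordance(text, keyword, width=4):
    """Return KWIC lines with up to `width` words either side of each match."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


def left_collocates(text, keyword, n=5):
    """Count the words that immediately precede `keyword`."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    preceding = [tokens[i - 1] for i in range(1, len(tokens))
                 if tokens[i] == keyword.lower()]
    return Counter(preceding).most_common(n)


text = "The old man sat. A wise old man spoke to the young man."
for line in concordance(text, "man", width=1):
    print(line)
print(left_collocates(text, "man"))
```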
More on text mining next week… till then, toodle-loo! 😉
Header Word Cloud from this blog courtesy of Tagxedo