Dabbling in Digital – An Initiation with British Library Labs

British Library Labs is an initiative funded by the Mellon Foundation, currently in its third year. The project actively encourages researchers and developers to work with the Library and its digital collections to address research questions. To achieve this, Labs runs yearly competitions and promotional events, and assists the Library in exposing its digital content for reuse and repurposing. BL Labs works closely with the Digital Scholarship team and runs a regular blog.

During my time as a project worker with Labs, I have mostly been involved in the ‘opening up’ access process. My responsibilities have focussed on using filtering criteria to ascertain which collections hold the most potential for investigation with regard to the challenges of access and copyright. Once a collection was selected, I initiated research and contact with its curators to build a background narrative in preparation for a department presentation and review for publication.

In addition to content research, I have found myself organising delegate packs and mail shots for events, updating the Labs wiki website, researching and composing tweets, transcribing material for winning competition projects and editing film for potential press releases. As the Labs team are pretty busy, I have had to work independently, think on my feet, look up new topics, climb a steep learning curve with Google Drive and spreadsheets, and get to grips with the Library’s own database management systems. It’s been demanding and unpredictable but also exciting and satisfying, developing my knowledge of access issues and allowing me to communicate with professionals in the field.

I joined Labs a year after the Flickr 1 Million release, just as the project was approaching its first anniversary and reaching some intriguing developments, such as the mass algorithmic tagging of maps by the prolific coding wizard and artist Mario Klingemann (Quasimondo) and the striking reuse of the images by collage artist David Normal (exhibited at the 2014 Burning Man Festival, now set to be showcased at the British Library). I originally highlighted some of these projects in relation to the technologies here in the DITA category.

My time with Labs has been serendipitous. Seeing the research proposals, technological innovation and creativity in the competition entries has inspired my vision for how educational learning environments could operate in the future – for example, Theo Kuechel’s 2014 competition entry, ‘BL Toolkit’, which provides a framework for helping schools engage with the British Library digital collections and shows how that engagement can benefit students in their learning. I am certain that the work experience gained with Labs will greatly influence my choice of dissertation next year, and it’s been a fantastic introduction to the diverse opportunities and challenges that a digital library holds.

– This post has been adapted for this blog – see the original at: https://blogs.city.ac.uk/citylis/2015/05/21/citylis-students-wendy-durham-british-library/


Text through the Looking Glass…

I usually start off these posts in wonderment at the evolution of technology within my adult life. This week is no exception as I ponder the ‘control-F’ style text search of my word-processing past in comparison with this week’s focus on text analysis.

Textual analysis is defined as an information-gathering process that helps researchers find ways of understanding how human beings make sense of their world. This can involve a mixture of simple quantitative approaches, where word mentions are counted, and more qualitative methods such as ‘concordance’, where searching for and comparing the contexts of words, sentences or passages that share a keyword or phrase may reveal patterns. However, it’s here that I must acknowledge the bigger picture surrounding such quantitative and qualitative methodologies, and the links to Franco Moretti and the hotly debated camps of ‘distant reading’ and ‘close reading’. Rather than attempt to explain this in words of my own, I found a YouTube video called ‘Big Data + Old History’ which has been the most accessible step for me in beginning to understand the reasons for distant reading.
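To make the quantitative end of that spectrum concrete, counting word mentions takes only a few lines of Python. This is my own illustrative sketch, not part of any of the tools discussed here:

```python
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text, split it into words and count each one -
    the most basic 'distant reading' measure."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

sample = "The old man and the sea. The man was old."
counts = word_counts(sample)
print(counts.most_common(3))  # the highest-frequency words first
```

Everything more sophisticated – concordances, trend graphs, comparisons across a corpus – is ultimately built on top of counts like these.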

Currently there is a wide range of free online tools that offer a variety of ‘lenses’ for interpreting large corpora of digitised texts – and so for this week’s blog, the citylis DITA gang were asked to reflect on the user experience of Tagxedo, Wordle, Many Eyes and Voyant, and for the brave amongst us, there was the more techie-based TAPoR and ‘R’.

Both Tagxedo and Wordle provide word-cloud-style visualisations, showing the popularity of words by font size. Whilst they are appealing to the eye, a useful tool to start an enquiry and valuable for engaging younger users, they have also been criticised in terms of appropriateness. Jacob Harris of the journalism site NiemanLab describes word clouds as the ‘mullets of the internet’ and provides a sharp critique of their potentially misleading nature when qualitative tempering is absent.

Voyant is a text-analysis program that also possesses a word cloud tool, but in addition has many other options for user ‘lenses’, such as graphs of word frequency across the chosen corpus and a format to compare multiple keywords side by side. Geoffrey Rockwell, one of the project leaders behind the tool, reassures us that these computer methods do not ‘replace human interpretation’; rather, they enhance the researcher’s ability to perceive trends, test intuition and provoke further focussed re-reading and ‘re’-questioning.

Having experimented with Voyant on two sets of data – Flickr’s BL 1 Million photostream during the recent map tagathon, and the Book of Humour, Wit and Wisdom spreadsheet I compiled to assist the Victorian Meme Machine project – I made the following screenshots and brief observations. For both sets of data, ‘stop lists’ of words, such as common English connectives, were added to help focus possible questions.
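A stop list simply strips out the very common words before counting, so that content words rather than connectives dominate the result. A minimal sketch of the idea in Python – the stop list here is illustrative and far shorter than the ones I actually used:

```python
import re
from collections import Counter

# A tiny illustrative stop list of common English connectives and articles.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "was", "is"}

def filtered_counts(text, stop_words=STOP_WORDS):
    """Count words, ignoring anything on the stop list."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

counts = filtered_counts("The map of the old town and the map of the port")
print(counts.most_common())  # 'map' rises to the top once 'the' is removed
```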


Voyant for Flickr data 31st Oct – 3rd Nov 2014

As expected, the highest word count in the BL Flickr data was for the tag ‘map’. Alongside that, there were two prominent Flickr tagger numbers, the details of which I took from the raw text and fed back online to reveal their identities. For this statistical data, the word cloud felt appropriate as a quick insight into the activity of the date range. I’ve no doubt the Flickr API could also supply this info on top taggers – but this way was fun!


Voyant for the ‘Book of Humour, Wit and Wisdom, A Manual of Table-talk’ (1874)

Next was the visualisation for the Book of Humour, Wit and Wisdom. Quick reveals from the word cloud suggested a rather male-dominated theme. No surprises there, I supposed, given the social bias of the era. But was there any link between ‘men’ and humour? Did the accidental juxtaposition of ‘great man’ bear any further meaning?

A look at the corpus reader showed that the word ‘man’ was particularly concentrated in one area of the text. On closer reading, this account was far from complimentary towards the gentlemen ‘of street car society.’

Finally, I turned to TAPoR text analysis to try the concordance tool for ‘man’ and received 197 entries, which suggested that a fair proportion of the word associations linked to ‘man’ were in fact synonyms of ‘old’ – which could correlate more with the book’s title word ‘wisdom’. There’s suddenly a whole pathway of next steps opening up – and this text is relatively small. At this point time dictates that I leave the investigation, but I have thoroughly enjoyed beginning this exploration and can see how this process perpetuates thought and takes root.
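The kind of output a concordance tool produces – keyword-in-context (KWIC) lines – can be approximated with a short function. This is my own sketch of the general technique, not TAPoR’s implementation:

```python
import re

def concordance(text, keyword, width=20):
    """Return each occurrence of `keyword` with `width` characters of
    surrounding context on either side (a simple KWIC display)."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = match.end() + width
        lines.append(text[start:end].replace("\n", " "))
    return lines

text = "The old man laughed. A wise man listens more than he speaks."
for line in concordance(text, "man"):
    print(line)
```

Scanning the context windows side by side is what lets you spot patterns like the ‘old man’ association described above.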


Part of TAPoRware concordance tool results for ‘man.’

More on Text mining next week… till then, toodle-loo ! 😉



Header Word Cloud from this blog courtesy of Tagxedo 

API Stir Fry

There’s been a lot of talk about cooking this week, and as I regularly feel like I’m on the Masterchef mystery box challenge when I write my DITA blog, I couldn’t resist a little amusement by tackling another all-new technology in a culinary kind of way.

Therefore, in keeping with the foodie theme, my main technological ingredient this week is a wonderful thing called an Application Programming Interface (API). To go with that I’m going to prepare a bed of Web 2.0 technology with a nice helping of Flickr flavoured API (as I think images are just delicious and go with everything) and then I’m going to finish it all off with a little British Library seasoning. I haven’t got a clue what I’ll rustle up for an embedded dessert, but I’m sure something will evolve along the way.

So for starters….

Just like a good cheese, the world wide web has been developing with age. Over the last 20 years it has moved from a static, mild character to a dynamic, powerful, connected experience. Dubbed Web 2.0, it has evolved to include an array of interactive websites such as Twitter, Facebook, Flickr, wikis, Google Maps and YouTube, with a focus on accessibility, content sharing and creation, where everybody can taste and give feedback.

…and so to main course…

Let’s take our chosen platform today, which is Flickr, a website rich in metadata from its 5 billion images and which possesses the aforementioned clever communication service: an application programming interface. The Flickr API allows web developers to request, retrieve and return various types of data using its set of callable methods, within the restrictions of Flickr’s agreed terms and conditions. It operates using a representational state transfer (REST) framework, receiving requests as URLs and responding in XML – according to many sources, an easier format overall for making calls and then parsing the results into HTML.
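To make the REST idea concrete, here is a sketch of how such a call might be assembled and its XML reply parsed in Python. The URL shape follows Flickr’s documented REST pattern, but the API key is a placeholder and the response below is a hand-made offline sample rather than real Flickr output:

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE = "https://api.flickr.com/services/rest/"

def search_url(api_key, text, per_page=5):
    """Encode a flickr.photos.search call entirely in the URL's query
    string, as the REST framework requires."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,   # placeholder - a real key comes from Flickr
        "text": text,
        "per_page": per_page,
    }
    return BASE + "?" + urlencode(params)

# A hand-made sample of the XML shape Flickr responds with, so the
# parsing step can be shown without a live request:
sample_response = """<rsp stat="ok">
  <photos total="2">
    <photo id="101" title="A map of London" />
    <photo id="102" title="An old chart" />
  </photos>
</rsp>"""

root = ET.fromstring(sample_response)
titles = [photo.get("title") for photo in root.iter("photo")]
```

The XML attributes pulled out here could then be dropped straight into HTML for display, which is exactly the call-then-parse workflow described above.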

According to Library Mashups (Engard, 2009), the free Flickr API is ‘a developer’s dream’ because of its extensive documentation, test areas, developer discussion groups and blog. As public APIs are generally for noncommercial use, Engard also reinforces the importance of observing and revisiting licensing conditions and terms of service, as platforms such as Flickr are entitled to change these at any time.

With its API, Web 2.0 features and the introduction of the Flickr Commons project, which aims to increase access to public photography archives and knowledge of them, Flickr has become more than a social photo-sharing site. Over 100 contributing cultural-sector institutions recognise the importance of Flickr Commons in digital scholarship and crowdsourcing, and its potential to repurpose public domain content in multiple contexts, compared to other independently managed library software.

Last week I mentioned the British Library’s Flickr Commons contribution of 1 million images, with the British Library Labs team appealing for ‘new, inventive ways to navigate, find and display these unseen illustrations’, and in particular inviting people to create a crowdsourcing application to improve the image metadata.

After its first day on Friday 13th of December 2013, the photostream received an incredible 5 million hits, and now, in less than a year, it’s reached 200 million views! Many artists and researchers have responded with various API mashups and reinventions, as seen on the creative projects list here on the British Library wiki set up by BL Labs for curating public domain content. Highlights include an alternative scrapbook-style viewer named Culture Collage and the algorithmic alchemy of Mario Klingemann (aka Quasimondo) – but more about him in blogs to come.

Further related content remixing opportunities have happened within the British Library Sound and Vision department as seen here.

…and dessert?

Well, time for a little colourful embedding – I found some lovely things this week based on searching for artistically reinvented maps (the embed is a teaser so follow the previous hyperlink)


Map Of Los Angeles Street Layout Colored By Orientation by Stephen Von Worley

and a little from the talented Mr Mario Klingemann (courtesy of Flickr)…


251 Random Flowers Arranged by Similarity

…and a very friendly API explanation that helped me a lot and appealed to my teacher soul (courtesy of BBY Open).

until next week …don’t worry, be API

Attention! This image is reversing…

This week I entered a whole new world of relational databases, the art of SQL, Boolean logic and the need for efficient, relevant search results from online sources. Hmm, the memorable Kapor quote about ‘drinking from the fire hydrant’ from Ali’s blog last week immediately came to mind. So much to absorb and so little time… hence bite-size and context needed. So for this entry, I’d like to write briefly about information retrieval in terms of image search technology, linking in my interest in image research and volunteer work at the British Library.

A picture is worth a thousand words….

Image searching uses algorithms to search for features of still and moving images rather than relying on text indexing. Generally, in order for an image to be findable, it needs to be described in some way and needs metadata – but what happens if the image has no accurate metadata? This is a question that has arisen for me when tagging images with little or no information from the British Library photostream for creative research projects.

As background, image retrieval methods range from a concept-based (or text-based) approach, where keywords or metadata are used, to a content-based approach, where the image content itself (such as shape, colour or texture) provides the ‘map’ for searching. In recent research comparing methods of image retrieval, content-based systems are leading the field in the attempt to ‘bridge the semantic gap’, as,

‘the starting point of the retrieval process is typically the high level query from a human. Translating or converting the question posed by a human to the low level features seen by the computer illustrates the problem in bridging the semantic gap.’ (Lew et al. 2002)
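To make ‘content-based’ a little more concrete: one of the simplest low-level content features is a coarse colour histogram, compared between images. The following is a toy sketch of that idea in Python, nothing like a production CBIR system:

```python
def colour_histogram(pixels, bins=4):
    """Quantise (r, g, b) pixel values into a coarse histogram of
    colour 'buckets' - one low-level feature a CBIR system can index."""
    step = 256 // bins
    hist = {}
    for r, g, b in pixels:
        bucket = (r // step, g // step, b // step)
        hist[bucket] = hist.get(bucket, 0) + 1
    total = sum(hist.values())
    return {bucket: count / total for bucket, count in hist.items()}

def similarity(h1, h2):
    """Histogram intersection: 1.0 means identical colour profiles,
    0.0 means no colours in common."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))

# Two tiny fake 'images', one mostly red and one mostly blue:
red_image = [(250, 10, 10)] * 8
blue_image = [(10, 10, 250)] * 8
```

The semantic gap is visible even here: the histogram can say two images share colours, but it has no idea whether either one depicts ‘a map’ or ‘an old man’.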

A variety of computer vision and image identification software has evolved for content-based image retrieval (CBIR), also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR). TinEye is a reverse image search engine developed in 2008.

The TinEye reverse image search home page.

As it returns information on where a user’s selected image appears on the web, this has significant uses for improving metadata and, in the copyright world, both for detecting potential infringement and for managing ‘orphan works’.
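A common technique behind reverse image search is perceptual hashing: reduce each image to a tiny fingerprint that changes little when the image is resized or recompressed, then compare fingerprints. Whether TinEye itself works this way isn’t public, so the sketch below is only an illustration of the general idea (an ‘average hash’ over a toy greyscale image):

```python
def average_hash(grey_pixels):
    """Hash a small greyscale image: 1 for each pixel brighter than the
    mean, 0 otherwise. Visually similar images give similar bit patterns."""
    mean = sum(grey_pixels) / len(grey_pixels)
    return tuple(1 if p > mean else 0 for p in grey_pixels)

def hamming(h1, h2):
    """Count differing bits - a small distance suggests a match."""
    return sum(a != b for a, b in zip(h1, h2))

original = [10, 200, 30, 220, 15, 210, 25, 230]
recompressed = [12, 198, 33, 225, 14, 205, 27, 228]  # slightly altered copy
```

Because the fingerprint survives small edits, a search engine can index billions of hashes and still find near-duplicates of a query image quickly.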

Following on from TinEye, Google Images launched its own reverse image facility in 2011, built directly into the image search bar.


Google’s reverse image search

In terms of comparison, ZDNet’s article by Stephen Chapman claims that Google’s ‘vast reach’ greatly outperforms TinEye; however, further debates online reveal a loyal following for TinEye regarding accuracy and sorting options.

Further application of reverse image retrieval in the British Library …

Very recently, the British Library worked in partnership with the Technology Strategy Board to challenge software developers to produce a tool that could measure or assess the impact of releasing its digital content into the public domain. For example, how were the one million Flickr images from its collection of Microsoft-digitised 19th-century books being utilised?

Enter Peter Balman, the developer who won the competition with an idea for a tool that searches for the British Library’s digital content on the web and gives a detailed breakdown of where, how and by whom it is being used. Named ‘Visibility’, this project could help the Library make choices around targeting users by releasing similar content, and encourage further use and deeper engagement within these groups.

A link to his project is here (NB. playback is good on IE but I had problems in Google Chrome).

Until next week kind viewers…