Over the past two weeks we have been considering the user experience of a range of websites, experimenting with text and data mining and learning about ‘mark-up’ languages. These websites have included The Old Bailey Online, the Utrecht Digital Humanities Lab research projects and Artists Books Online. The Old Bailey Proceedings Online (OBO) is a fully searchable, digitised collection of all surviving editions of the Old Bailey Proceedings from 1674 to 1913. It gives access to over 197,000 trials, along with biographical details, free of charge for non-commercial use. In addition to the text, the website holds digital images of all the original pages alongside contemporary pictures and maps. It also offers advice on search methods, as well as historical and legal background information. The OBO has a very friendly interface, offering a wide range of search pathways from its search home page. It’s worth exploring the list of search options and understanding the detail of the documents and associated records, as this directly affects what you will be able to find.
‘Digging away’ recently has been a fascinating experience. I couldn’t resist looking for criminal shenanigans under my own family name (‘Durham’), as ‘Surname’ was available as a category on the main search page. The search returned 156 hits, which I then narrowed down using the ‘punishment’ drop-down to the macabre ‘death sentence’. Five separate cases stared back at me, and reading them was pretty tragic. Delving further, the historical background page explained that in many cases death sentences were not carried out, and that only the Ordinary’s Accounts reveal actual executions at Tyburn. A surname search of these accounts shows only one Durham execution for the period they cover.
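The narrowing I did on the search page, first by surname and then by punishment, is essentially faceted filtering: each step keeps only the records whose field matches the chosen value. Here is a minimal Python sketch of the idea. The `Trial` record, its field names and the sample values are all invented for illustration; they are not the OBO's data or schema.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    # Hypothetical, simplified record; real OBO entries carry far more detail.
    trial_id: str
    surname: str
    punishment: str

def narrow(trials, **facets):
    """Keep only trials whose fields match every facet value given."""
    return [t for t in trials
            if all(getattr(t, field) == value for field, value in facets.items())]

# Invented sample records for illustration only (not real search results).
records = [
    Trial("t1", "Durham", "death"),
    Trial("t2", "Durham", "transportation"),
    Trial("t3", "Smith", "death"),
    Trial("t4", "Durham", "death"),
]

by_surname = narrow(records, surname="Durham")             # first facet
by_punishment = narrow(by_surname, punishment="death")     # then drill down
print(len(by_surname), len(by_punishment))  # 3 2
```

Each call works on the output of the previous one, which mirrors how every extra drop-down choice shrinks the result set.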
Opening this link shows the Ordinary’s (the Newgate chaplain’s) account of supporting the condemned. Scans of the original can be viewed, and at the bottom is the option to view the text in Extensible Markup Language (XML). ‘Mark-up’ is a way of adding machine-readable, searchable structure to the data in a document, in the form of tags that bracket words or phrases. The tags are embedded in the text and follow a hierarchical structure; they are common in digitised documents because they make content easier to discover and compare. Naturally, a consistent approach to mark-up is key, and Charles Goldfarb’s Standard Generalized Markup Language (SGML), dating from the 1970s, remains the foundation that later schemes build on. The Web’s own basic HTML descended from it.
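To make the idea of tags concrete, here is a small sketch using Python’s standard `xml.etree.ElementTree`. The fragment and its element names (`trial`, `persName`, `offence`) are a simplified invention for illustration, not the OBO’s actual markup scheme.

```python
import xml.etree.ElementTree as ET

# A hypothetical, simplified marked-up trial fragment. The tags bracket the
# words they describe and nest inside the enclosing <trial> element.
fragment = """
<trial id="t1">
  <persName role="defendant">John Durham</persName> was indicted for
  stealing <offence category="theft">a silver watch</offence>.
</trial>
""".strip()

root = ET.fromstring(fragment)

# Because names and offences are tagged, a program can pull them out
# directly instead of guessing from the running text.
defendants = [p.text for p in root.iter("persName") if p.get("role") == "defendant"]
offences = [(o.get("category"), o.text) for o in root.iter("offence")]

# The human-readable sentence is still there: join all text nodes and
# collapse the whitespace.
plain = " ".join("".join(root.itertext()).split())

print(defendants)  # ['John Durham']
print(offences)    # [('theft', 'a silver watch')]
print(plain)       # John Durham was indicted for stealing a silver watch.
```

The same file serves both readers: a person sees the sentence, while a program sees labelled names and offences.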
XML is ‘a bundle of SGML conformant rules for making up elements and specifying their content models’; these rules are simple and work easily with the web. For the OBO, the digitised text was marked up in XML to support this kind of structured searching and the generation of statistics. The About page explains that ‘Trials tend to have a regular structure (though with considerable minor variations) and certain aspects of the text were tagged to reflect the meaning of particular words or phrases, for example names and crimes.’ The full list of mark-up categories can also be viewed from the About page.
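Tagging is what makes the structured searching and statistics possible: once offences and punishments carry category attributes, counting them or combining them in a query is a matter of reading the tags. A minimal sketch, again with Python’s standard library; the three trials and the tag and category names are illustrative inventions, not OBO data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Several hypothetical marked-up trials in one container element.
# Element and attribute names are illustrative, not the OBO schema.
corpus = """
<proceedings>
  <trial id="t1"><offence category="theft">a watch</offence>
    <punishment category="death"/></trial>
  <trial id="t2"><offence category="theft">a coat</offence>
    <punishment category="transport"/></trial>
  <trial id="t3"><offence category="violentTheft">a purse</offence>
    <punishment category="death"/></trial>
</proceedings>
""".strip()

root = ET.fromstring(corpus)

# Statistics: tally each offence category straight from the tag attributes.
offence_counts = Counter(o.get("category") for o in root.iter("offence"))

def matches(trial, offence, punishment):
    """Structured search: True if both tagged categories match."""
    return (trial.find("offence").get("category") == offence
            and trial.find("punishment").get("category") == punishment)

hits = [t.get("id") for t in root.iter("trial") if matches(t, "theft", "death")]

print(offence_counts)  # Counter({'theft': 2, 'violentTheft': 1})
print(hits)            # ['t1']
```

A plain-text search could not reliably answer “theft AND death sentence”, because those words need not appear in the text at all; the tags supply that meaning.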
I then wondered how to investigate, export and analyse the text of the 156 hits, to find further lines of enquiry about the kinds of offences and punishments associated with ‘Durham’. The OBO also has an API, which allows a search by ‘trial’ and lets you explore the result sets before exporting them to Zotero, or to Voyant for further text analysis. It also supports ‘drilling’ into and ‘undrilling’ out of subsets of categories. As the API had no surname category, a keyword search seemed the logical choice, although this was bound to produce anomalies, since the term was no longer tied to ‘Surname’. Drilling into the resulting 327 trials gave me some useful feedback, but many of the trials I opened from the lists did not have a Durham as the defendant. Using the wording common to the opening sentences of the trials helped me refine the search further, so I tried the string ‘Durham+was+indicted OR Durham+were+indicted’, and a more focussed set of 24 trials appeared.
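The refinement works because each alternative behaves like an exact phrase, and a trial is kept if either phrase matches, whereas a bare keyword matches ‘Durham’ anywhere in the text. The Python sketch below reproduces that logic locally on invented trial openings. It assumes, as in URL query strings, that `+` stands for a space inside a phrase; it does not call the OBO API and is not a description of how the API itself parses queries.

```python
import re

def parse_query(query):
    """Split 'a+b OR c+d' into phrases, treating '+' as a space (assumption)."""
    return [part.strip().replace("+", " ") for part in query.split(" OR ")]

def phrase_search(texts, query):
    """Return indices of texts containing any query phrase as whole words."""
    patterns = [re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE)
                for p in parse_query(query)]
    return [i for i, text in enumerate(texts)
            if any(p.search(text) for p in patterns)]

# Invented examples: only the first two have a Durham as the accused.
openings = [
    "THOMAS DURHAM was indicted for stealing a coat",
    "JOHN and MARY DURHAM were indicted for stealing a watch",
    "WILLIAM SMITH was indicted for robbing a carrier on the Durham road",
]

query = "Durham+was+indicted OR Durham+were+indicted"
print(parse_query(query))               # ['Durham was indicted', 'Durham were indicted']
print(phrase_search(openings, "Durham"))  # [0, 1, 2]  keyword: includes the anomaly
print(phrase_search(openings, query))     # [0, 1]     phrases: defendants only
```

The keyword search picks up the third trial, where Durham is only a place name, which is exactly the kind of anomaly the phrase search filters out.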
So, on to exporting to Voyant. A small selection like this wasn’t a problem, although I know there have been problems during the lab sessions due to high usage. The ‘Criminal Intent’ research project noted at the outset that, although Voyant tools are capable of a lot, ‘there are limitations and bugs. Although the underlying system has been designed to support large-scale text analysis, the current server infrastructure has performance and reliability issues.’ The project actively welcomes feedback from users in order to improve.
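At heart, Voyant’s word-cloud and frequency views count how often each word occurs once common function words are filtered out. For a small export like mine, the same kind of count can be sketched in a few lines of Python. The stop-word list and the three sentences below are invented for illustration and are far smaller than anything Voyant itself uses.

```python
import re
from collections import Counter

# Illustrative stop-word list; Voyant applies its own, much longer list.
STOP_WORDS = {"the", "a", "an", "and", "of", "for", "to", "in", "was", "were"}

def word_frequencies(texts):
    """Lower-case, split into alphabetic words, drop stop words, and count."""
    words = (w for text in texts for w in re.findall(r"[a-z]+", text.lower()))
    return Counter(w for w in words if w not in STOP_WORDS)

# Invented trial openings standing in for an exported result set.
trials = [
    "Thomas Durham was indicted for stealing a watch and a coat",
    "Mary Durham was indicted for stealing a silver watch",
    "John Durham was indicted for stealing a great coat and a watch",
]

freq = word_frequencies(trials)
print(freq.most_common(5))
# [('durham', 3), ('indicted', 3), ('stealing', 3), ('watch', 3), ('coat', 2)]
```

Running a count like this locally avoids the server load issues mentioned above, though it lacks Voyant’s interactive views.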
It seems that watches and coats may have been the treasures of choice for my wayward namesakes.