DH Benelux 2018

This week DH Benelux will be held in Amsterdam. Together with Thomas Smits, I will present our paper "Seeing History: Analyzing Large-Scale Historical Visual Datasets using Deep Neural Networks." Digital Humanities research has focused primarily on the analysis of texts. This emphasis stems from the availability of technology for studying digitized text: Optical Character Recognition (OCR) allows researchers to search and analyze digitized texts by keyword. However, archives of digitized sources also contain large numbers of images. The paper shows how convolutional neural networks (CNNs) can be used to categorize and analyze digitized historical visual sources. We present three approaches to using CNNs to gain a deeper understanding of visual trends in an archive of digitized Dutch newspapers: detecting medium-specific features (separating photographs from illustrations), querying images based on abstract visual aspects (clustering visually similar advertisements), and training a neural network on visual categories developed by domain experts. We argue that CNNs allow researchers to explore the visual side of the digital turn. They enable archivists and researchers to classify and spot trends in large collections of digitized visual sources in radically new ways.
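
To give a rough idea of what querying images by visual similarity involves, here is a minimal sketch of nearest-neighbour retrieval. It assumes that each image has already been turned into a feature vector by a CNN (for instance, the activations of one of its later layers). The vectors below are invented toy values, and `most_similar` is a hypothetical helper for illustration, not code from the paper. It also shows only the retrieval step, not clustering itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, features, k=2):
    """Return the ids of the k images whose vectors are most similar to the query (hypothetical helper)."""
    scored = [(cosine_similarity(query, vec), image_id) for image_id, vec in features.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [image_id for _, image_id in scored[:k]]


# Toy 4-dimensional vectors standing in for CNN image embeddings (invented values).
features = {
    "ad_001": [0.9, 0.1, 0.0, 0.2],
    "ad_002": [0.8, 0.2, 0.1, 0.3],
    "ad_003": [0.0, 0.9, 0.8, 0.1],
}

print(most_similar([1.0, 0.1, 0.0, 0.2], features, k=2))  # ['ad_001', 'ad_002']
```

Cosine similarity is a common choice for comparing embeddings because it ignores vector length and compares only direction. On a real archive with many thousands of images, an approximate nearest-neighbour index would likely replace this linear scan.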

And with Marieke van Erp, Hugo Huurdeman, and Richard Zijdeman, I will present a poster detailing our work on constructing a recipe web. What we eat is an important part of our cultural identity, and our food habits can be distilled from recipes. Food websites are a treasure trove of structured recipes that can easily be analyzed automatically through their hRecipe or schema.org markup. Unfortunately, this is not the case for historical recipes, which are often available only as part of digitized archives. As a result, we are currently unable to answer important questions about how food culture has changed over time. Historical newspapers provide a lens on the customs and habits of the past. The challenge is that newspaper data is often unstructured, highly varied, and of fluctuating OCR quality, which makes it difficult to locate and extract recipes from them.
We present a method for extracting and enriching over 24,000 recipes from four digitized historical newspapers (Het Parool, Trouw, De Volkskrant, and NRC Handelsblad), published between 1950 and 1995 and available through Delpher.
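
To illustrate why recipes on modern food websites are easy to process automatically, here is a minimal sketch that pulls schema.org `Recipe` objects out of the JSON-LD blocks embedded in a web page, using only Python's standard library. It assumes the site publishes its markup as JSON-LD; sites may instead use microdata or the older hRecipe microformat, which this sketch does not handle. The sample page and the `extract_recipes` helper are invented for illustration. This is not part of our method for historical newspapers, which have no such markup.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script contents arrive unparsed; append them to the current block.
        if self._in_jsonld:
            self.blocks[-1] += data


def extract_recipes(html):
    """Return every JSON-LD object whose @type is (or includes) schema.org Recipe (hypothetical helper)."""
    parser = JsonLdCollector()
    parser.feed(html)
    recipes = []
    for block in parser.blocks:
        data = json.loads(block)
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            if types == "Recipe" or (isinstance(types, list) and "Recipe" in types):
                recipes.append(item)
    return recipes


# Invented sample page with one embedded schema.org Recipe.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe",
 "name": "Erwtensoep", "recipeIngredient": ["split peas", "smoked sausage"]}
</script>
</head><body></body></html>"""

recipes = extract_recipes(page)
print(recipes[0]["name"], recipes[0]["recipeIngredient"])
```

A historical newspaper page offers no such machine-readable hooks, which is why recipes there first have to be located in noisy OCR text before they can be structured and enriched.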