Posts by Collection

portfolio

publications

Coca-Cola: An Icon of the American Way of Life. An Iterative Text Mining Workflow for Analyzing Advertisements in Dutch Twentieth-Century Newspapers.

Published in Digital Humanities Quarterly, 2017

Recommended citation: Melvin Wevers, Jesper Verhoef (2017). “Coca-Cola: An Icon of the American Way of Life. An Iterative Text Mining Workflow for Analyzing Advertisements in Dutch Twentieth-Century Newspapers.” Digital Humanities Quarterly, 11:4. http://digitalhumanities.org:8081/dhq/vol/11/4/000338/000338.html

Constructing a Recipe Web from Historical Newspapers

Published in ISWC 2018, 2018

Historical newspapers provide a lens on customs and habits of the past. For example, recipes published in newspapers highlight what and how we ate and thought about food. The challenge here is that newspaper data is often unstructured and highly varied. Digitised historical newspapers add a further challenge, namely that of fluctuations in OCR quality. Therefore, it is difficult to locate and extract recipes from them. We present our approach based on distant supervision and automatically extracted lexicons to identify recipes in digitised historical newspapers, to generate recipe tags, and to extract ingredient information. We provide OCR quality indicators and their impact on the extraction process. We enrich the recipes with links to information on the ingredients. Our research shows how natural language processing, machine learning, and semantic web can be combined to construct a rich dataset from heterogeneous newspapers for the historical analysis of food culture.

Recommended citation: van Erp M., Wevers M., Huurdeman H. (2018) Constructing a Recipe Web from Historical Newspapers. In: Vrandečić D. et al. (eds) The Semantic Web – ISWC 2018. ISWC 2018. Lecture Notes in Computer Science, vol 11136. Springer, Cham. https://doi.org/10.1007/978-3-030-00671-6_13 https://link.springer.com/chapter/10.1007/978-3-030-00671-6_13

The visual digital turn: Using neural networks to study historical images

Published in Digital Scholarship in the Humanities, 2019

Digital humanities research has focused primarily on the analysis of texts. This emphasis stems from the availability of technology to study digitized text. Optical character recognition allows researchers to use keywords to search and analyze digitized texts. However, archives of digitized sources also contain large numbers of images. This article shows how convolutional neural networks (CNNs) can be used to categorize and analyze digitized historical visual sources. We present three different approaches to using CNNs for gaining a deeper understanding of visual trends in an archive of digitized Dutch newspapers. These include detecting medium-specific features (separating photographs from illustrations), querying images based on abstract visual aspects (clustering visually similar advertisements), and training a neural network based on visual categories developed by domain experts. We argue that CNNs allow researchers to explore the visual side of the digital turn. They allow archivists and researchers to classify and spot trends in large collections of digitized visual sources in radically new ways.

Recommended citation: Melvin Wevers, Thomas Smits, The visual digital turn: Using neural networks to study historical images, Digital Scholarship in the Humanities, Volume 35, Issue 1, April 2020, Pages 194–207, https://doi.org/10.1093/llc/fqy085 https://academic.oup.com/dsh/article-pdf/35/1/194/32976784/fqy085.pdf

Using Word Embeddings to Examine Gender Bias in Dutch Newspapers, 1950-1990

Published in ACL - LChange Workshop, 2019

Contemporary debates on filter bubbles and polarization in public and social media raise the question of the extent to which news media of the past exhibited biases. This paper specifically examines bias related to gender in six Dutch national newspapers between 1950 and 1990. We measure bias related to gender by comparing local changes in word embedding models trained on newspapers with divergent ideological backgrounds. We demonstrate clear differences in gender bias and changes within and between newspapers over time. In relation to themes such as sexuality and leisure, we see the bias moving toward women, whereas, generally, the bias shifts in the direction of men, despite growing female employment numbers and feminist movements. Even though Dutch society became less stratified ideologically (depillarization), we found an increasing divergence in gender bias between religious and social-democratic newspapers on the one hand and liberal newspapers on the other. Methodologically, this paper illustrates how word embeddings can be used to examine historical language change. Future work will investigate how fine-tuning deep contextualized embedding models, such as ELMo, might be used for similar tasks with greater contextual information.

Recommended citation: Melvin Wevers. "Using Word Embeddings to Examine Gender Bias in Dutch Newspapers, 1950-1990." arXiv preprint arXiv:1907.08922 (2019). https://arxiv.org/pdf/1907.08922

Digital begriffsgeschichte: Tracing semantic change using word embeddings

Published in Historical Methods: A Journal of Quantitative and Interdisciplinary History, 2020

Recently, the use of word embedding models (WEMs) has received ample attention in the natural language processing community. These models can capture semantic information in large corpora of text by learning distributional properties of words, that is, how often particular words appear in specific contexts. Scholars have pointed out the potential of WEMs for historical research. In particular, their ability to capture semantic change might assist historians studying conceptual change or specific discursive formations over time. Concurrently, others have voiced criticism, pointing out that WEMs require large amounts of training data, that they are challenging to evaluate, and that they lack the specificity historians look for. The ability to examine semantic change resonates with the goals of historians such as Reinhart Koselleck, whose research focused on the formation of concepts and the transformation of semantic fields. However, word embeddings can only be used to study particular types of semantic change, and the model’s use is dependent on the size, quality, and bias in training data. In this article, we examine what is required of historical data to produce reliable WEMs, and we describe the types of questions that can be answered using WEMs.

Recommended citation: Melvin Wevers & Marijn Koolen (2020) Digital begriffsgeschichte: Tracing semantic change using word embeddings, Historical Methods: A Journal of Quantitative and Interdisciplinary History, DOI: 10.1080/01615440.2020.1760157 https://www.tandfonline.com/doi/full/10.1080/01615440.2020.1760157?scroll=top&needAccess=true

Tracking the Consumption Junction: Temporal Dependencies between Articles and Advertisements in Dutch Newspapers

Published in Digital Humanities Quarterly, 2020

Historians have regularly debated whether advertisements can be used as a viable source to study the past. One of their main concerns centered on the question of agency. Were advertisements a reflection of historical events and societal debates, or were ad makers instrumental in shaping society and the ways people interacted with consumer goods? Using techniques from econometrics (Granger causality test) and complexity science (Adaptive Fractal Analysis), this paper analyzes to what extent advertisements shaped or reflected society. We found evidence that indicates a fundamental difference between the dynamic behavior of word use in articles and advertisements published in a century of Dutch newspapers. Articles exhibit persistent trends. In contrast, advertisements behave more irregularly, characterized by short bursts and fast decay, which, in part, mirrors the dynamic through which advertisers introduced terms into public discourse. On the issue of whether advertisements shaped or reflected society, we found particular product types that seemed to be collectively driven by a Granger causality going from advertisements to articles. Generally, we found support for a complex interaction pattern, analogous to Cowan’s concept of the consumption junction. Finally, we discovered noteworthy patterns in terms of Granger causality and long-range dependencies for specific product groups. All in all, this study shows how methods from econometrics and complexity science can be applied to humanities data to improve our understanding of complex cultural-historical phenomena such as the role of advertising in society.

Recommended citation: Melvin Wevers, Jianbo Gao, Kristoffer Nielbo (2020). "Tracking the Consumption Junction: Temporal Dependencies between Articles and Advertisements in Dutch Newspapers." Digital Humanities Quarterly, 14:2. http://digitalhumanities.org/dhq/vol/14/2/000445/000445.html

Detecting Faces, Visual Medium Types, and Gender in Historical Advertisements, 1950–1995

Published in ECCV 2020, 2021

Libraries, museums, and other heritage institutions are digitizing large parts of their archives. Computer vision techniques enable scholars to query, analyze, and enrich the visual sources in these archives. However, it remains unclear how well algorithms trained on modern photographs perform on historical material. This study evaluates and adapts existing algorithms. We show that we can detect faces, visual media types, and gender with high accuracy in historical advertisements. It remains difficult to detect gender when faces are either of low quality or relatively small or large. Further optimization of scaling might solve the latter issue, while the former might be ameliorated using upscaling. We show how computer vision can produce metadata that can enrich historical collections. This information can be used for further analysis of the historical representation of gender.

Recommended citation: Wevers M., Smits T. (2020) Detecting Faces, Visual Medium Types, and Gender in Historical Advertisements, 1950–1995. In: Bartoli A., Fusiello A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol 12536. Springer, Cham. https://doi.org/10.1007/978-3-030-66096-3_7

The agency of computer vision models as optical instruments

Published in Visual Communication, 2021

Industry and governments have deployed computer vision models to make high-stakes decisions in society. While they are often presented as neutral and objective, scholars have recognized that bias in these models might lead to the reproduction of racial, social, cultural and economic inequity. A growing body of work situates the provenance of bias in the collection and annotation of datasets that are needed to train computer vision models. This article moves from studying bias in computer vision models to the agency that is commonly attributed to them: the fact that they are universally seen as being able to make biased decisions. Building on the work of Bruno Latour and Jonathan Crary, the authors discuss computer vision models as agential optical instruments in the production of contemporary visuality. They analyse five interconnected research steps – task selection, category selection, data collection, data labelling and evaluation – of six widely cited benchmark datasets, published during a critical stage in the development of the field (2004–2020): Caltech 101, Caltech 256, PASCAL VOC, ImageNet, MS COCO and Google Open Images. First, they found that, despite all sorts of justifications, the selection of categories is not based on any general notion of visuality, but depends heavily upon perceived practical applications, the availability of downloadable images and, in conjunction with data collection, favours categories that can be unambiguously described by text. Second, the reliance on Flickr for data collection introduces a temporal bias in computer vision datasets. Third, by comparing aggregate accuracy rates and ‘human’ performance, the dataset papers introduce a false dichotomy between the agency of computer vision models and human observers. In general, the authors argue that the agency of datasets is produced by obscuring the power and subjective choices of their creators and the countless hours of highly disciplined labour of crowd workers.

Recommended citation: Smits, Thomas, and Melvin Wevers. “The Agency of Computer Vision Models as Optical Instruments.” Visual Communication, (March 2021). https://doi.org/10.1177/1470357221992097 https://journals.sagepub.com/doi/10.1177/1470357221992097

Scene Detection in De Boer Historical Photo Collection

Published in ICAART 2021, 2021

This paper demonstrates how transfer learning can be used to improve scene detection applied to a historical press photo collection.

Recommended citation: Wevers, M. (2021). Scene Detection in De Boer Historical Photo Collection. In A. P. Rocha, L. Steels, & J. van den Herik (Eds.), ICAART 2021 : Proceedings of the 13th International Conference on Agents and Artificial Intelligence : February 4-6, 2021. - Volume 1: ARTIDIGH 2021, Vienna, Austria (pp. 601-610). SciTePress https://hdl.handle.net/11245.1/7a0b6c4a-ba46-4367-ba4a-f2258c9ac70a

Event Flow - How Events shaped the Flow of the News, 1950-1995

Published in Computational Humanities Research, 2021

This article relies on information-theoretic measures to examine how events impacted the news for the period 1950-1995. Moreover, we present a method for event characterization in (unstructured) textual sources, offering a taxonomy of events based on the different ways they impacted the flow of news information. The results give us a better understanding of the relationship between events and their impact on news sources with varying ideological backgrounds.

Recommended citation: Wevers, Melvin, Jan Kostkan, Kristoffer L. Nielbo. “Event Flow - How Events shaped the Flow of the News, 1950-1995.” Computational Humanities Research 2021, (November 2021). http://ceur-ws.org/Vol-2989/#long_paper16

Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification.

Published in Programming Historian, 2022

This is the first of a two-part lesson introducing deep-learning-based computer vision methods for humanities research. Using a dataset of historical newspaper advertisements and the fastai Python library, the lesson walks through the pipeline of training a computer vision model to perform image classification.

Recommended citation: van Strien, D., Beelen, K., Wevers, M., Smits, T., & McDonough, K. (2022). Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification (Part 1). Programming Historian, 11 https://doi.org/10.18146/tmg.815

An Analytics of Culture: Modeling Subjectivity, Scalability, Contextuality, and Temporality.

Published in 36th Conference on Neural Information Processing Systems (NeurIPS 2022), 2022

There is a bidirectional relationship between culture and AI. On the one hand, AI models are increasingly used to analyse culture, thereby shaping our understanding of culture. On the other hand, the models are trained on collections of cultural artifacts, thereby implicitly, and not always correctly, encoding expressions of culture. This creates a tension that both limits the use of AI for analysing culture and leads to problems in AI with respect to culturally complex issues such as bias.

Recommended citation: Wevers, M. J. H. F., van Noord, N. J. E., Blanke, T., Noordegraaf, J. J., & Worring, M. (2022). An Analytics of Culture: Modeling Subjectivity, Scalability, Contextuality, and Temporality. In 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Neural Information Processing Systems Foundation https://pure.uva.nl/ws/files/99424884/An_Analytics_of_Culture_1.pdf

What to do with 2.000.000 Historical Press Photos? The Challenges and Opportunities of Applying a Scene Detection Algorithm to a Digitised Press Photo Collection.

Published in TMG - Tijdschrift voor Mediageschiedenis, 2022

In 1962, Dutch celebrity Ria Kuyken was attacked by a circus bear. Cees de Boer captured this moment, for which he was awarded both a World Press Photo and the Silver Camera (Zilveren Camera). Though this photo popularised Fotopersbureau De Boer, which Cees had founded in 1945, the importance of the collection lies in its scale: approximately 2,000,000 photos of about 250,000 events taken over sixty years, accompanied by extensive metadata. The collection represents not only major nationwide events but also small-scale, human-interest subjects, such as the shopkeeper around the corner. Our aim is not only to digitise and publish all 2,000,000 photo negatives of Fotopersbureau De Boer but also to explore how artificial intelligence can enrich this collection, benefiting both users of the archive and cultural historians studying historical photographs.

Recommended citation: Wevers, M. J. H. F., Vriend, N., & De Bruin, A. (2022). What to do with 2.000.000 Historical Press Photos? The Challenges and Opportunities of Applying a Scene Detection Algorithm to a Digitised Press Photo Collection. TMG – Journal for Media History, 25(1) https://doi.org/10.18146/tmg.815

Mining Historical Ads

Published in Digitised Newspapers - A New Eldorado for Historians?, 2022

Historians have turned their focus to newspaper articles as a proxy of public discourse, while advertisements remain an understudied source of digitized information. This chapter shows how historians can use computational methods to work with extensive collections of advertisements. Firstly, this chapter analyzes metadata to better understand the different types of advertisements, which come in a wide range of shapes and sizes. Information on the size and position of advertisements can be used to construct particular subsets of advertisements. Secondly, this chapter describes how textual information can be extracted from historical advertisements, which can subsequently be used for a historical analysis of trends and particularities. For this purpose, we present a case study based on cigarette advertisements.

Recommended citation: Wevers, M. J. H. F. (2022). Mining Historical Ads. In E. Bunout, M. Ehrmann, & F. Clavert (Eds.), Digitised Newspapers – A New Eldorado for Historians?: Tools, Methodology, Epistemology, and the Changing Practices of Writing History in the Context of Historical Newspapers Mass Digitization (Vol. 3). (Studies in Digital History and Hermeneutics; Vol. 3). De Gruyter Oldenbourg. https://www.degruyter.com/document/isbn/9783110729214/html

What Shall We Do With the Unseen Sailor? Estimating the Size of the Dutch East India Company Using an Unseen Species Model.

Published in Computational Humanities Research, 2022

Historians base their inquiries on the sources that are available to them. However, not all sources that are relevant to the historian’s inquiry may have survived the test of time. Consequently, the resulting data can be biased in unknown ways, possibly skewing analyses. This paper deals with the Dutch East India Company and its digitized ledgers of contracts. We apply an unseen species model, a method from ecology, to estimate the actual number of unique seafarers contracted. We found that the lower bound of actual seafarers is much higher than what the remaining contracts indicate: at least thirty-six percent of the seafarers are unknown. Moreover, we found that even in periods when few records survived, we can still credibly estimate a lower bound on the unique number of seafarers.

Recommended citation: Wevers, M. J. H. F., Karsdorp, F., & van Lottum, J. (2022). What Shall We Do With the Unseen Sailor? Estimating the Size of the Dutch East India Company Using an Unseen Species Model. In CHR2022: Proceedings of the Computational Humanities Research Conference 2022 (Vol. 3290, pp. 189-197). https://ceur-ws.org/Vol-3290/short_paper1793.pdf

talks

teaching

We are All Consumers.

Graduate course, University of Amsterdam, 2021

A course focused on working with digitized advertisements as a historical source. The analysis of visual historical material will be one of the focal points of this course. Syllabus will be added shortly.

History Lab

Undergraduate course, University of Amsterdam, 2021

A course focused on working with digitized newspapers as a historical source. Syllabus will be added shortly.