Audiovisual material in Digital Humanities

Sound and (moving) images in focus

Audiovisual material is perhaps the biggest wave of data to come in the near future (Smith, 2013). This claim is supported by a prospective study conducted by IBM on how the flow of digital data will evolve over the coming two decades. As can be seen in Figure 1 below, the growth of audiovisual sources such as video, images and audio is expected to produce huge amounts of data in the coming decades, owing both to the increased production of born-digital data and to the massive digitisation of analogue sources. Consequently, audiovisual archives hold the promise of truly big data becoming available to academic researchers.

Figure 1 – Expected wave of data, showing the growth of the audiovisual data (video, images, audio) that this workshop will deal with. Source: IBM Market Insights 2013

Audiovisual sources have a potentially huge value for the Digital Humanities because they are multi-layered. A single document can provide information regarding language, emotions, speech acts, narrative plots and references to people, places and events. This richness provides interesting data for various disciplines and holds the promise of multidisciplinary collaboration between, for example, computer science, the social sciences and the humanities. As such, audiovisual material provides a rich playground for the Digital Humanities.

Notwithstanding this exponential growth, the use of audiovisual data by scholars in the social sciences and the humanities (SSH) and the application of digital methods for analysis are still in their infancy. Audiovisual material such as television, photos and oral history recordings has not yet received the same level of attention from scholars as written sources. Several reasons might account for this deficit. Firstly, these source types are relatively young compared to text; this is reflected in scepticism about their value for academic research outside a relatively small community of specialists. Secondly, the contemporary and commercial value of many audiovisual sources results in considerable constraints on their use due to issues of copyright. Thirdly, the linear structure of audiovisual sources is problematic for hermeneutic analysis, as working through them is more time-consuming than working through textual sources. Finally, no widely accepted digital research methods for the discovery and analysis of audiovisual content exist as yet. Unlike fellow scholars who study text and have a multitude of refined tools at their disposal, scholars specialised in documentaries, photo, film and audiovisual oral history collections face considerable limitations in the various stages of the research process (De Jong et al., 2011). In the context of the proposed workshop, two themes will play a crucial role.

Theme 1: Indexing and searching audiovisual data

The first step in an SSH research process is the identification of relevant and interesting material. However, obtaining good search results is highly dependent on the richness and the level of granularity of the metadata assigned to the sources. Metadata is usually attributed to a document by a knowledgeable archivist. However, considering the sheer size of digital audiovisual content that is being produced daily, manual annotation is no longer feasible. Consequently, one of the first big challenges within the realm of audiovisual archives is the development of systems for accurate automatic annotation.

One could say that a revolution is needed similar to the one that full-text search (or automated text indexing) brought about. Content-based image retrieval has only recently made enough progress to be usable for scholars. Techniques such as speech recognition and computer vision will support the exploration of digital audiovisual archives on the basis of multiple modalities such as text, sound and image. However, this introduces the problem of the so-called semantic gap, which refers to the difficulty of translating low-level pixel data and sound waves into meaningful annotations (Smeulders et al., 2000). How this semantic gap affects the discovery of material in audiovisual archives is still being explored.
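To make the comparison with full-text search concrete, the following minimal Python sketch builds an inverted index over time-coded speech transcripts, so that a word query returns the recordings and time offsets at which the word is spoken. It is only an illustration under stated assumptions: the segments, recording identifiers and function names are hypothetical, and the transcripts are assumed to have been produced beforehand by a speech recogniser, whose errors are ignored here.

```python
from collections import defaultdict
import re

# Hypothetical time-coded transcript segments, as a speech recogniser might
# produce them: (recording id, start time in seconds, recognised text).
SEGMENTS = [
    ("interview_01", 0.0, "I was born in Rotterdam before the war"),
    ("interview_01", 12.5, "After the war we moved to Amsterdam"),
    ("newsreel_07", 3.0, "The harbour of Rotterdam was rebuilt"),
]


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(segments):
    """Map each word to the (recording id, start time) pairs where it is spoken."""
    index = defaultdict(list)
    for recording_id, start, text in segments:
        for word in set(tokenize(text)):
            index[word].append((recording_id, start))
    return index


def search(index, query):
    """Return the positions whose segment contains every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


index = build_index(SEGMENTS)
print(search(index, "Rotterdam"))
```

Because each hit carries a time offset, a researcher could jump straight to the relevant passage instead of playing a recording linearly. The sketch covers only the speech modality; indexing images would additionally require visual annotations, which is where the semantic gap arises.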

Theme 2: Analysing audiovisual data

Besides identifying relevant content, an even bigger challenge on the side of the humanities and social sciences lies in providing tools for the next phase of the research process: the analysis and interpretation of content. While text mining has led to the phenomenon of distant reading of textual material (Moretti, 2013), which is strongly dependent on good visualisation tools, the advances in speech and image recognition have not yet led to a method of ‘distant viewing’ of audiovisual data. Processing large amounts of data and enabling researchers to trace patterns or discrepancies in their material are thus not yet feasible. Moreover, the lack of metadata, which is often a feature of audiovisual archives, introduces additional difficulties in heuristic practices (Fickers, 2012). Consequently, scholars working with (moving) images and sound are at a disadvantage in the evolving field of the Digital Humanities, and effort has to be put into envisioning solutions.


The workshop aims to bring scholars and computer scientists together to discuss the following key questions in four consecutive sessions.

  1. Why are audiovisual data/archives scarcely used within the (Digital) Humanities?
  2. What are possible strategies to stimulate the use of audiovisual data/archives within the Digital Humanities?
  3. Which examples of digital tools applied to audiovisual data/archives can serve as best practices?
  4. What should be the priorities on the agenda for the future uptake of audiovisual data/archives in the Digital Humanities?

The keynotes in the first two sessions will be delivered by Andreas Fickers, professor of contemporary and digital history at the University of Luxembourg, and Dr. Arjan van Hessen, specialist in speech technology and member of the Executive Board of CLARIN-NL. The first will talk about the use of audiovisual sources within humanities research, and the second will discuss the necessary technical and infrastructural provisions for the analysis of these sources. For the third session, scholars are invited to submit papers and demos that illustrate the potential of applying DH approaches to audiovisual data, with a focus on lessons learned. The final session is dedicated to the assessment and evaluation of the findings and aims at formulating a research agenda for the future. To disseminate the results of the workshop among a broader audience, the initiators intend to propose a special issue on this topic to a Digital Humanities journal.


The proposed workshop is initiated by researchers working within the EU FP7 research project AXES – Access to audiovisual archives. We thank the AXES project for the financial support to organise the workshop.


Jong, F. de, Ordelman, R., & Scagliola, S. (2011). Audio-visual Collections and the User Needs of Scholars in the Humanities: a Case for Co-Development. In Proceedings of the 2nd Conference on Supporting Digital Humanities (SDH 2011). Copenhagen, Denmark.

Fickers, A. (2012). Towards A New Digital Historicism? Doing History In The Age Of Abundance. VIEW Journal of European Television History and Culture, 1(1), 19–26.

Moretti, F. (2013). Distant Reading. Verso Books.

Smeulders, A. W., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349–1380.

Smith, J. R. (2013). Riding the multimedia big data wave. In Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval – SIGIR ’13. New York, New York, USA: ACM Press. doi:10.1145/2484028.2494492