Reusing Existing Digital Content
This summer Work Package 5 (WP5) completed its first deliverable: “Analysis of existing data sources and services”. WP5 focuses on the meSch software and physical infrastructure and integration of the different components. The first task of this work package was to identify existing data sources and services that can be reused in meSch. In this blog post I will explain how meSch is planning to reuse existing digital content and address some of the findings from the research done in this task.
The need for this task
A wealth of digital cultural heritage content is currently available in online repositories and archives. It is, however, accessed only in a limited way and utilised through rather static modes of delivery. With the meSch toolkit, heritage professionals will be able to use their own existing digital cultural heritage content, as well as other available content on the web, to create narratives for smart exhibits. In that way the physical experience of museum exhibits will be enriched with the wealth of digital cultural heritage content that is already available on the web or locked in museum databases.
Three museums take part in the meSch project. These are:
- The Museo Storico Italiano della Guerra (The National War Museum) in Trento, Italy.
- The Museon in The Hague (The Netherlands), a museum for science and culture.
- The Allard Pierson Museum in Amsterdam (The Netherlands), an archaeological museum affiliated with the University of Amsterdam.
These museums provide test beds for the technology developed in the meSch project. Later on in the project extensive case studies will be held in these museums. Therefore in the analysis of existing sources we had a special focus on sources that could be of interest for the case studies, but we also included more general sources.
Once suitable sources had been identified, they needed to be analysed. This analysis provides relevant input for the data integration component of the meSch platform that will be developed in WP5. To carry it out we set up a framework. This framework assumes a functional model based on experiences from the Dutch national portal Geheugen van Nederland (Memory of the Netherlands) and from Europeana, and builds on existing specifications from other projects, institutions and initiatives, like THE BASICS (a set of guidelines for digital heritage developed by DEN), Linked Data and Open Data. The framework focuses on three main aspects: a) interoperability, b) metadata and content and c) copyrights and licences.
What sources have been identified and analysed?
The framework was used to analyse both the existing digital sources of the three museums in the meSch consortium (for which we used the term internal sources) and a selection of relevant data sources and services available on the web (referred to as external sources).
The internal sources include digital reproductions of the objects in the collection, collection registration databases and various contextual information stored in all sorts of ways (e.g. Word documents, Excel files, multimedia sources).
For the external sources we have focused mainly on cultural heritage sources and services. Museums, archives, libraries and other cultural institutions, sometimes referred to as GLAMs (Galleries, Libraries, Archives and Museums), have been opening up their digital collections and metadata for some years now, enabling reuse by external parties. While there are still obstacles to be overcome (legal, technical, organisational and economic), a large amount of open cultural data can be found on the web. A very well-known source is of course Europeana, which currently provides metadata for close to 30 million cultural heritage objects, and which allows any kind of reuse. Various other sources for reusable digital cultural heritage (metadata as well as content) are listed on the website of the OpenGLAM movement, which promotes free and open access to digital cultural heritage. Another interesting place to find sources is a Wiki on Museum APIs. This extensive list is updated constantly and includes not only Museum APIs, but ‘Museum, gallery, library, archive, archaeology and cultural heritage (GLAM++) APIs, linked and open data services for open cultural data’.
Besides these we have looked into educational sources that could be of interest, like the Dutch platform Wikiwijs: an open, internet-based platform, where teachers can find, download, (further) develop and share educational resources. This source is of interest for museums with a clear educational mission, like Museon. More general sources like Wikimedia, YouTube, Flickr and DBpedia are of interest for meSch too, as they can provide additional background information for the smart exhibits to be developed.
Things to consider
The analysis showed that the potentially relevant existing sources and services are very diverse. Integration of these sources will not be a straightforward process; there are technical issues and issues regarding rights and licences to be taken into account.
The preferred methods for data reuse described in the analysis framework (such as OAI-PMH for metadata harvesting, search APIs and the availability of data as Linked Open Data) were all found among the external sources. This is not the case for the internal sources. Integration of internal sources that do not provide these reuse mechanisms will need a separate approach. The metadata of most of the sources from the cultural domain are Dublin Core compatible, which increases interoperability. However, the external sources from other domains, like Flickr and Wikimedia, use custom-made metadata schemes.
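To make the harvesting route a little more concrete, here is a minimal sketch in Python of how Dublin Core fields could be read from an OAI-PMH response. The sample response, the identifier in it and the function name parse_records are invented for illustration; only the XML namespaces come from the OAI-PMH and Dublin Core specifications. It is not the meSch implementation, and a real harvester would fetch the XML over HTTP and follow resumption tokens.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample; a real response would come from a request such as
# ?verb=ListRecords&metadataPrefix=oai_dc against a repository endpoint.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:object-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Field telephone</dc:title>
          <dc:creator>Unknown</dc:creator>
          <dc:rights>CC0</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per record: its identifier plus its Dublin Core fields."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        for element in record.iterfind("oai:metadata/oai_dc:dc/*", NS):
            # Tags look like "{namespace}title"; keep only the local name.
            name = element.tag.rsplit("}", 1)[-1]
            # Dublin Core elements are repeatable, so collect values in lists.
            fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    return records


records = parse_records(SAMPLE_RESPONSE)
```

Because every Dublin Core element may occur more than once, the sketch stores each field as a list rather than a single value.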
In the analysis framework the licences that allow reuse of data and content have been listed. For the external sources it is clear if and how reuse is allowed, as most use a Creative Commons licence. However, the sources of the three museums have not all been equipped with an appropriate licence and some of the data is not meant to be reused by third parties. It will be difficult for the meSch system to automatically handle the different licences (including the more restrictive ones, such as CC-BY and CC-SA). This will be researched further.
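As an illustration of why automatic licence handling needs care, the sketch below splits a Creative Commons code into its condition elements (BY, NC, ND, SA) and checks them against an intended use. The function names and the two flags commercial and adapt are assumptions made for this example, and the rules are a deliberately simplified reading of the licence elements, not the approach chosen in meSch.

```python
KNOWN_ELEMENTS = {"BY", "NC", "ND", "SA"}


def licence_conditions(code):
    """Split a code such as 'CC BY-NC-SA' or 'CC-BY' into its elements."""
    rest = code.upper().strip()
    if not rest.startswith("CC"):
        raise ValueError(f"not a Creative Commons code: {code!r}")
    rest = rest[2:].strip(" -")
    if rest.startswith("0"):
        return frozenset()  # CC0 attaches no conditions
    # Drop anything after the first space, e.g. a version number.
    elements = rest.split()[0].split("-") if rest else []
    if not elements or set(elements) - KNOWN_ELEMENTS:
        raise ValueError(f"unrecognised licence elements in {code!r}")
    return frozenset(elements)


def reuse_obligations(code, commercial=False, adapt=False):
    """Return (allowed, obligations) for the intended use (simplified)."""
    conditions = licence_conditions(code)
    if commercial and "NC" in conditions:
        return False, []  # NonCommercial rules out commercial use
    if adapt and "ND" in conditions:
        return False, []  # NoDerivatives rules out sharing adaptations
    obligations = []
    if "BY" in conditions:
        obligations.append("credit the creator")
    if adapt and "SA" in conditions:
        obligations.append("share adaptations under the same licence")
    return True, obligations
```

For example, reuse_obligations("CC BY-NC", commercial=True) reports that the use is not allowed, while reuse_obligations("CC0") returns no obligations at all. Sources without any licence statement, such as some of the internal ones, fall outside this scheme and would have to be handled separately.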
Finally, it became clear that the readily available and highly structured sources analysed in this deliverable are not the only resources curators envisage using when creating smart exhibitions with the meSch tool. Integration of these other types of resources (such as educational leaflets, magazine texts and catalogue books) into the meSch platform requires a different approach.
The data integration component of the meSch platform is being developed further in WP5. This component supports a mediator architecture, which allows multiple repositories to be queried using a common query language. To this end, adapters for data integration are being developed. The first ones are those for integrating Europeana and Museon sources. This requires the adoption of a common data model into which existing data is converted. As a first step we chose to adopt the Dublin Core schema. WP5 will investigate its possible limitations and the extensions needed.
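The mediator idea can be sketched in a few lines of Python. Everything here is hypothetical: the class names, the two in-memory sources and their native field names are made up for illustration and do not reflect the actual Europeana or Museon adapters, and a plain title search stands in for the common query language.

```python
from abc import ABC, abstractmethod


class SourceAdapter(ABC):
    """Translates one repository's records into a shared Dublin Core-style shape."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def search(self, term):
        """Return a list of dicts with Dublin Core keys for matching records."""


class InMemoryAdapter(SourceAdapter):
    """Stand-in for a real adapter; it searches a list instead of calling an API."""

    def __init__(self, name, records, field_map):
        super().__init__(name)
        self.records = records
        self.field_map = field_map  # native field name -> Dublin Core element

    def search(self, term):
        matches = []
        for raw in self.records:
            # Convert the native record into the common data model first.
            record = {dc: raw.get(native, "") for native, dc in self.field_map.items()}
            if term.lower() in record.get("title", "").lower():
                record["source"] = self.name
                matches.append(record)
        return matches


class Mediator:
    """Sends one query to every adapter and merges the converted results."""

    def __init__(self, adapters):
        self.adapters = list(adapters)

    def search(self, term):
        results = []
        for adapter in self.adapters:
            results.extend(adapter.search(term))
        return results


# Two hypothetical sources that describe objects with different field names.
source_a = InMemoryAdapter(
    "source_a",
    [{"label": "Field telephone", "maker": "Unknown"}],
    {"label": "title", "maker": "creator"},
)
source_b = InMemoryAdapter(
    "source_b",
    [{"objectName": "Telephone exchange", "by": "Unknown"},
     {"objectName": "Oil lamp", "by": "Unknown"}],
    {"objectName": "title", "by": "creator"},
)
mediator = Mediator([source_a, source_b])
results = mediator.search("telephone")
```

The caller only ever sees records with the shared keys (title, creator), whichever source they came from. Adding a repository then means writing one more adapter, without touching the mediator.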
WP5 is also working on an internal content repository based on the Liferay portal server and on an internal data model that can represent the richness of the content that curators need to build meSch exhibitions. All this will be integrated in a first sample, which will allow us to evaluate the strengths and limitations of the chosen approach and identify the necessary extensions.
In addition to these activities, in collaboration with WP4 (the work package devoted to researching content personalisation technologies for both onsite and online interaction), we are collecting and analysing various actual samples of digital material taken from the internal sources of the three partner museums and from relevant external sources. These samples are being classified in terms of media type, content genre, intended audience and narrative structure, to evaluate their potential for reuse in building meSch interactive exhibitions.
Deliverable D5.1 is restricted, so it will not be made available on the website.
For more information about the technical side of things, have a look at an earlier blog post on the meSch website about WP5: An Update About the meSch Server Architecture