This project began with two fundamental goals: on the one hand, to preserve bibliographic material which, by the very nature of how it is made, is and will remain in ever-greater danger of disappearing; and, on the other hand, to disseminate as widely as possible informational resources that are in great demand among researchers and the general public.
Although it is not widely known, the acidity of paper (a result of its being produced from cellulose pulp) causes it to degrade on contact with the air, to the point that this effect is known as ‘slow fire’ at international preservation and conservation centres. If we add to this acidity the fact that the paper is of poor quality (newspapers were printed to be used immediately) and that it is constantly handled by users at newspaper libraries, as well as being exposed to light, we face a serious conservation problem that makes the digitisation of historical newspapers and magazines highly advisable.
So, in 2003, as a result of a cooperative planning effort between the then Ministry of Culture and the autonomous regions, the digitisation of historical newspapers began, although it was not until 2006 that the library was made accessible to the public. Since then, the digitisation work has continued year after year, and more content has been added to the collection. As of August 2017, this digital newspaper library offers 7,541,157 pages from 2,369 publications, and 4,070 articles, drawn from the collections of 97 institutions.
The collection – written in the various official languages of Spain – includes material from all the autonomous regions. There is also material published in the former Spanish colonies in America, Africa and the Philippines. These are, in many cases, unique collections of great interest to both researchers and the general public. The holdings are very varied, and include an extensive set of official bulletins, illustrated magazines, satirical publications, women’s press, modern cultural magazines, political newspapers, underground press, etc.
The Virtual Library of Historical Newspapers (BVPH) uses the Digibib digital bibliographic application, developed by Digibis S.L. The digitisation work involves more than scanning the documents; the collections also undergo an in-depth bibliographic process through which they are assigned the metadata needed to enable search and retrieval functions.
As its central cataloguing format, the BVPH works with and uploads records in the MARC21 format (https://www.loc.gov/marc/), using three of its variants: MARC21 for bibliographic data, for holdings data, and for authority data.
From the records in MARC21 format, the application automatically creates a series of mappings to other formats, allowing users to download the metadata in any of them: Dublin Core, Ficha, ISBD, MODS, MARCXML, MARC label, BibTeX, Jisc, METS, EDM, SKOS, ALTO, etc.
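As an illustration of the kind of mapping described above, the following Python sketch converts a simplified MARC21 bibliographic record into Dublin Core elements. The field-to-element table follows the general lines of the Library of Congress MARC to Dublin Core crosswalk, but it is deliberately incomplete. The record representation (a list of tag and subfield pairs), the function name and the sample record are all assumptions made for the example; they do not describe how the Digibib application performs the conversion.

```python
# Simplified crosswalk: (MARC tag, subfield code) -> Dublin Core element.
# Only a handful of fields are covered; a real mapping is far more detailed.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("110", "a"): "creator",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("650", "a"): "subject",
    ("041", "a"): "language",
}


def marc_to_dublin_core(fields):
    """Map a list of (tag, {subfield: value}) pairs to a dict of DC elements.

    Each Dublin Core element maps to a list, because elements are repeatable.
    """
    dc = {}
    for tag, subfields in fields:
        for code, value in subfields.items():
            element = MARC_TO_DC.get((tag, code))
            if element is not None:
                # Strip trailing ISBD punctuation left over from cataloguing.
                dc.setdefault(element, []).append(value.rstrip(" /:;,"))
    return dc


# Hypothetical record, invented for the example.
sample_record = [
    ("041", {"a": "spa"}),
    ("245", {"a": "Diario de ejemplo :", "b": "periódico ilustrado"}),
    ("260", {"a": "Madrid :", "b": "Imprenta Ejemplo,", "c": "1900"}),
    ("650", {"a": "Prensa"}),
]

dublin_core = marc_to_dublin_core(sample_record)
```

Subfields that have no entry in the table (here 245 $b and 260 $a) are simply skipped, which keeps the sketch short at the cost of losing information that a production crosswalk would retain.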
As mentioned, the MARC21 format is used to catalogue works. To upload digital objects, the METS format (http://www.loc.gov/standards/mets/) is used; for optical character recognition, the ALTO format (http://www.loc.gov/standards/alto/); and for the preservation metadata of our conservation system (http://travesia.mcu.es/portalnb/jspui/handle/10421/9003), the PREMIS format (http://www.loc.gov/standards/premis/).
With regard to image formats, the BVPH allows the disseminated digital copies to be downloaded in medium quality in JPG and PDF formats, while for conservation it uses the TIFF format.
The digital newspaper library website offers many different options for searching, viewing and downloading content, allowing BVPH users to carry out in-depth research. In addition to these options, the BVPH offers the possibility of searching its records via an SRU server (http://prensahistorica.mcu.es/en/estaticos/contenido.cmd?pagina=estaticos/sru) as well as with the OAI-PMH protocol (https://www.openarchives.org/pmh/).
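By way of example, a search request to an SRU server might be built as in the Python sketch below. The parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`, `recordSchema`) come from the SRU specification. The endpoint address, the CQL index `dc.title` and the schema name `dc` are assumptions made for illustration; the values actually supported should be taken from the BVPH SRU page cited above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the address published on the BVPH SRU page.
SRU_ENDPOINT = "http://example.org/sru"


def build_sru_url(cql_query, start=1, maximum=10, schema="dc", endpoint=SRU_ENDPOINT):
    """Return the URL of an SRU searchRetrieve request for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": maximum,
        "recordSchema": schema,
    }
    return endpoint + "?" + urlencode(params)


# Assumed CQL index name; the server's explain response lists the real ones.
url = build_sru_url('dc.title = "gaceta"', maximum=5)
```

The resulting URL can be opened with any HTTP client; the server answers with an XML document containing the matching records in the requested schema.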
With regard to this last protocol, the Virtual Library of Historical Newspapers has an OAI-PMH repository (http://prensahistorica.mcu.es/i18n/oai/oai.cmd) that allows its records to be harvested by repositories or OAI-PMH aggregators such as HISPANA (http://hispana.mcu.es/en/estaticos/contenido.cmd?pagina=estaticos/presentacion), EUROPEANA (http://www.europeana.eu/portal/es), WorldCat (https://www.worldcat.org/), etc. The repository is listed as a data provider in the OAI-PMH registry of the Open Archives Initiative (http://www.openarchives.org/Register/BrowseSites) and at OAIster (http://www.oaister.org/viewcolls.html).
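A minimal sketch of how an aggregator might harvest such a repository is given below in Python. The `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments and the `oai_dc` format are defined by the OAI-PMH 2.0 specification, and the base URL is the one cited above. The function names are hypothetical, error handling is omitted, and nothing here is meant to describe how HISPANA or any other aggregator is actually implemented.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_BASE_URL = "http://prensahistorica.mcu.es/i18n/oai/oai.cmd"
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(resumption_token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return OAI_BASE_URL + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (titles, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iterfind(".//dc:title", NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An absent or empty token means the list is complete.
    return titles, (token.strip() or None)


def harvest_titles(max_pages=1):
    """Fetch up to max_pages of records over the network and collect titles."""
    titles, token = [], None
    for _ in range(max_pages):
        with urllib.request.urlopen(list_records_url(token)) as response:
            page_titles, token = parse_list_records(response.read())
        titles.extend(page_titles)
        if token is None:
            break
    return titles
```

Only `harvest_titles` touches the network; the other two functions can be exercised offline. The resumption token is what lets a harvester walk through a large collection page by page instead of requesting every record at once.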
As for dissemination, apart from metadata and digital objects, the BVPH offers a news section and an RSS feed that allow users to stay abreast of updates to the application and the collection.
Although the first versions of the BVPH did not offer the option of searching the content of publications (within the text of the digitised images), technological evolution and user demand led us to implement this feature, which allows users to search for any word on any page of any digitised newspaper. The ALTO (Analyzed Layout and Text Object) format for OCR (Optical Character Recognition) output was thus chosen: a free and open format, duly documented and maintained by the Library of Congress since 2009.
On the one hand, ALTO allows a facsimile image of the digitised newspaper to be maintained; on the other hand, beneath that image, it stores the recognised text and records the coordinates of each of the characters recognised on a page, making it possible to index and search the entire text content.
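The way these coordinates can feed a search index may be sketched as follows in Python. In ALTO, recognised words are stored as `String` elements whose `CONTENT`, `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes give the text and its bounding box on the page. The sample page below is invented and heavily abridged, the function names are hypothetical, and the example works at word level only; it is an illustration of the principle, not a description of the BVPH indexing pipeline.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented, heavily abridged ALTO page used only for illustration.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Diario" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"/>
            <SP/>
            <String CONTENT="oficial" HPOS="300" VPOS="200" WIDTH="190" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

BOX_ATTRIBUTES = ("HPOS", "VPOS", "WIDTH", "HEIGHT")


def local_name(tag):
    """Drop the XML namespace so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def index_alto(xml_text):
    """Map each lower-cased word to the boxes (x, y, width, height) where it occurs."""
    index = defaultdict(list)
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) == "String":
            box = tuple(float(element.get(name, 0)) for name in BOX_ATTRIBUTES)
            index[element.get("CONTENT", "").lower()].append(box)
    return index


word_index = index_alto(SAMPLE_ALTO)
hits = word_index["oficial"]  # boxes a viewer could highlight on the facsimile image
```

Because each hit carries a bounding box, a viewer can overlay the search result on the page image, which is what makes it possible to search the text while still showing the original facsimile.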
This feature gives the public a tool for researching historical newspapers in a much more in-depth and efficient way; they are one of the richest, most varied and most characteristic sources of information of recent centuries.