Modules for the digital library: Goobi

There are good reasons to carry out the retrospective digitization of holdings on site, in the libraries themselves: lower transport costs, controlled and careful handling of endangered holdings, and quick availability on site. The decisive advantage over an external solution is that experienced library staff can be involved in the digitizing and indexing process.

Indexing the content of the digitized data with metadata and structural data is of vital importance: full-text search and direct navigation in the (digital) tables of contents in a web browser are only possible once the data have been indexed. The next generation of scholarly infrastructure (for example, the Semantic Web) also depends on metadata and cannot make use of plain scanned images.

Digitizing and indexing their own holdings confronts libraries with a number of problems, mainly organisational ones. Even at a medium scan volume of several books per day, it is reasonable to divide the procedures (such as preparation, digitizing, quality assurance, image enhancement, and indexing of metadata and structural data) among different specialized staff. Projects financed by third-party funds often face a further complication: because of the division of labour, different work steps are carried out at different locations. To make this kind of production controllable regardless of location or time, workflow software for digital libraries was developed at the Digitizing Center in cooperation with the Research and Development Department of the SUB. The project "RusDML" (Russian Digital Mathematics Library), which aimed to build up a digital core archive of Russian mathematical articles over two years, served as the prototype: the distributed production of a total of 283,000 pages was coordinated with it.

The organisation of the workflow outlined in the screenshot places very high demands on the software used. The software package for this cooperative and collaborative work environment is the central module and fulfils the following functions:

  • Platform independence (web application), because the tool has to be available to partners worldwide
  • Central administration of the metadata, i.e. indexing and completion of the metadata from different locations (for example: generating the Russian metadata in Moscow, transliterations in Hannover)
  • Central administration of the digitized data (images)
  • Import and export interfaces for metadata and for externally digitized data (for example from Russia)
  • Controlling mechanisms: which partner is at which stage of its work, which journal is at which stage of processing, etc.
  • Problem reports, completion of a work step, and forwarding to the next stage (possibly involving a handover to another partner)

Components for distributed workflow management were integrated to support the administration of a distributed production and the communication between the different partners. For every single work step it can be configured whether it runs in parallel with the previous step or only after that step has been completed successfully. In addition, correction loops are modelled so that data from previous steps that are later found to be defective can be annotated. The application is designed to be transparent: each project partner can inspect, in detail and at any time, the status of the substeps for every single work, including the persons assigned to them.
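The step logic described above can be illustrated with a minimal sketch. It is not the Goobi implementation: the class and function names (`Step`, `refresh`, `complete`, `report_problem`, `overview`), the three status values, and the rule that a correction loop locks all later steps again are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    LOCKED = "locked"   # start condition not yet met
    OPEN = "open"       # can be worked on
    DONE = "done"       # completed successfully


@dataclass
class Step:
    title: str
    partner: str                        # hypothetical: responsible partner
    parallel_to_previous: bool = False  # may start while the previous step runs
    status: Status = Status.LOCKED
    notes: list = field(default_factory=list)  # annotations from correction loops


def refresh(steps):
    """Open every locked step whose start condition is now met."""
    for i, step in enumerate(steps):
        if step.status is not Status.LOCKED:
            continue
        if i == 0:
            step.status = Status.OPEN
            continue
        previous = steps[i - 1]
        started = previous.status is not Status.LOCKED
        finished = previous.status is Status.DONE
        if finished or (step.parallel_to_previous and started):
            step.status = Status.OPEN


def complete(steps, index):
    """Mark an open step as done and open whatever may follow."""
    if steps[index].status is not Status.OPEN:
        raise ValueError(f"step {steps[index].title!r} is not open")
    steps[index].status = Status.DONE
    refresh(steps)


def report_problem(steps, target, note):
    """Correction loop: annotate an earlier step and send the work back to it."""
    steps[target].notes.append(note)
    steps[target].status = Status.OPEN
    for later in steps[target + 1:]:
        later.status = Status.LOCKED    # assumption: later steps wait again
    refresh(steps)


def overview(steps):
    """Status report that any partner could inspect at any time."""
    return [(s.title, s.partner, s.status.value) for s in steps]
```

With three steps, for example scanning, quality assurance marked as parallel to scanning, and metadata indexing, `refresh` opens the first two at once, while indexing stays locked until quality assurance is completed. A later `report_problem` on the scanning step reopens it with the note attached.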

Released digitized data can be referenced through a registered persistent URL. Before release, the data are linked at article level to the multilingual metadata, which are compiled from different databases. The project pays particular attention to translation and transliteration: to ensure user-friendly searching, all titles are offered in the language of the original and additionally in English and/or Russian translation. All names and titles that originally appear in Cyrillic script are transliterated into the Latin alphabet (according to ISO 9 and DIN 1460). Once imported into the document management system, everything is available online.
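The transliteration step can be sketched as a simple character mapping. The table below follows the one-to-one Russian mapping of ISO 9:1995 as an assumption; the text does not state which edition or table the project actually used, and DIN 1460 may render some letters differently. The names `ISO9_LOWER` and `transliterate` are hypothetical.

```python
# Russian Cyrillic -> Latin, one Latin character per Cyrillic character
# (assumed to follow ISO 9:1995; not taken from the project's own tables).
ISO9_LOWER = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "ë",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "ш": "š", "щ": "ŝ", "ъ": "ʺ",
    "ы": "y", "ь": "ʹ", "э": "è", "ю": "û", "я": "â",
}


def transliterate(text):
    """Replace each Russian Cyrillic letter; leave all other characters as-is."""
    result = []
    for char in text:
        lower = char.lower()
        if lower in ISO9_LOWER:
            latin = ISO9_LOWER[lower]
            # Preserve the case of the original letter.
            result.append(latin.upper() if char != lower else latin)
        else:
            result.append(char)
    return "".join(result)


print(transliterate("Математика"))  # Matematika
```

Because every Cyrillic letter maps to exactly one Latin character, a mapping of this kind can in principle be reversed, which is useful when metadata have to be matched back against the original-language titles.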

From the very beginning the workflow software was designed for maximum reusability. Consequently, it is used for the entire production at the GDZ. Enquiries from other libraries clearly showed the general interest in such production software. It is therefore logical to develop the software further and to combine it with other modules, such as access control or usage analysis. In this way a comprehensive solution for implementing (mass) digitization is being created under the name "Goobi" (Goettingen online-objects binaries). The figure illustrates the modular structure. The software is public domain and accordingly available as open source. The medium-term aim is to establish a developer community that ensures the sustained support and further development of the programs.

For further information, see www.goobi.org