Workflow in the GDZ
For every item to be digitized at the GDZ, data on the concrete requirements of its digitization (opening angle, resolution, colour depth) are recorded in "Goobi", an in-house developed, open-source workflow management system, so that the digitization process can be monitored and controlled at any time.
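A process record of this kind can be sketched as a small data structure that stores the digitization requirements and tracks which workflow step an item has reached. This is a minimal sketch assuming a simple linear sequence of steps. The class names, field names and step list are hypothetical and do not reflect Goobi's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical step sequence, following the stages described in this text.
STEPS = ["metadata capture", "scanning", "quality control", "batch processing", "export"]


@dataclass
class DigitizationRequirements:
    opening_angle_deg: int   # how far the volume may safely be opened
    resolution_dpi: int
    colour_depth_bits: int   # e.g. 1 for bitonal scans


@dataclass
class DigitizationProcess:
    item_id: str
    requirements: DigitizationRequirements
    completed: List[str] = field(default_factory=list)

    def current_step(self) -> Optional[str]:
        """Return the next step still to be done, or None when all are done."""
        for step in STEPS:
            if step not in self.completed:
                return step
        return None

    def complete(self, step: str) -> None:
        """Mark a step as done; steps must be completed in the fixed order."""
        expected = self.current_step()
        if step != expected:
            raise ValueError(f"expected step {expected!r}, got {step!r}")
        self.completed.append(step)
```

Asking `current_step()` at any point answers the question "where is this item now?", which is the kind of control the workflow system is meant to provide.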
Because some of the books are very old, the route cards that accompany them are made of acid-free paper and are inserted in such a way that page and ink do not come into contact. Scanning times vary with the level of difficulty posed by the material. With material of medium difficulty, an experienced scan operator can capture about 350 pages per hour, i.e. 175 scans of two pages each.
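The throughput figures above imply two pages per scan (350 / 175 = 2). A minimal sketch for rough planning, assuming a constant rate for material of medium difficulty; the function names are hypothetical:

```python
import math


def scans_per_hour(pages_per_hour: float, pages_per_scan: int = 2) -> float:
    """Scans per hour when each scan captures pages_per_scan pages."""
    return pages_per_hour / pages_per_scan


def estimated_hours(total_pages: int, pages_per_hour: float = 350) -> float:
    """Rough scanning time for a volume, assuming a constant page rate."""
    return total_pages / pages_per_hour


def scans_needed(total_pages: int, pages_per_scan: int = 2) -> int:
    """Number of scans, rounding up for an odd final page."""
    return math.ceil(total_pages / pages_per_scan)
```

For example, a hypothetical 700-page volume gives `estimated_hours(700)` = 2.0 hours and `scans_needed(700)` = 350 scans at this rate. Harder material would lower `pages_per_hour`.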
During quality control, the scans are checked for readability, correct order and completeness; some post-processing is also done at this stage.
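Order and completeness can be partly checked automatically when the image files carry sequence numbers; readability still needs a human check. A minimal sketch, assuming a hypothetical naming scheme of 8-digit sequence numbers plus an extension (e.g. `00000001.tif`):

```python
import re


def check_sequence(filenames):
    """Report gaps, duplicates and out-of-order entries in numbered scan files.

    Assumes the hypothetical naming scheme '<8-digit sequence number>.<ext>'.
    """
    numbers = []
    for name in filenames:
        m = re.fullmatch(r"(\d{8})\.\w+", name)
        if m is None:
            raise ValueError(f"unexpected file name: {name}")
        numbers.append(int(m.group(1)))

    # A number smaller than its predecessor indicates a page out of order.
    out_of_order = [b for a, b in zip(numbers, numbers[1:]) if b < a]

    seen, duplicates = set(), []
    for n in numbers:
        if n in seen:
            duplicates.append(n)
        seen.add(n)

    # Every number from 1 up to the highest one should be present.
    missing = sorted(set(range(1, max(numbers) + 1)) - seen) if numbers else []
    return {"missing": missing, "duplicates": duplicates, "out_of_order": out_of_order}
```

Such a check can only flag gaps in the file sequence; whether a page is missing from the book itself still has to be judged against the printed pagination.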
Afterwards, the bitonal digitized data are processed in batch mode. For this purpose, key parameters are defined on which the batch processing depends. For mathematical texts, for example, it must be defined how small the smallest index in a formula, or the dot on an i, can be. Smaller clusters of black pixels can then be erased according to these specifications. In addition, the text block on each page is aligned and centred. Using an adapted version of the CAD software "Pixedit", these optimizations run fully automatically overnight, so that each day's production is completely processed.
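The two pixel-level operations described above can be illustrated on a bitonal image stored as a list of rows (1 = black, 0 = white). This is a generic sketch, not Pixedit's algorithm: noise removal deletes 8-connected clusters of black pixels smaller than a hypothetical `min_size`, which plays the role of the "smallest index / dot on the i" parameter, and centring moves the bounding box of all black pixels to the middle of the page. Deskewing is omitted.

```python
from collections import deque


def remove_small_components(img, min_size):
    """Erase 8-connected clusters of black pixels with fewer than min_size pixels."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] != 1 or seen[y][x]:
                continue
            # Collect one connected cluster by breadth-first search.
            component, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and out[ny][nx] == 1:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out


def centre_text_block(img):
    """Shift all black pixels so that their bounding box is centred on the page."""
    h, w = len(img), len(img[0])
    black = [(y, x) for y in range(h) for x in range(w) if img[y][x] == 1]
    if not black:
        return [row[:] for row in img]
    top, bottom = min(y for y, _ in black), max(y for y, _ in black)
    left, right = min(x for _, x in black), max(x for _, x in black)
    dy = (h - (bottom - top + 1)) // 2 - top
    dx = (w - (right - left + 1)) // 2 - left
    out = [[0] * w for _ in range(h)]
    for y, x in black:
        out[y + dy][x + dx] = 1
    return out
```

Setting `min_size` is the critical choice: too high and genuine small marks such as subscripts or i-dots disappear, too low and scanning noise survives.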
The aim is a presentation that remains consistent across all pages. Two backup copies of the data not yet processed in batch mode (the digital master) are archived on CD-ROM and registered in EROMM.
The formal indexing of the digitized data makes use of the metadata tool developed for the DFG project RusDML. Before scanning, this collaborative tool is used to record basic metadata such as title, volume, year, issue, and the PPNs of the original and of the digitized version. For journals and collected editions, each article is recorded with its author and page number once the volume has been scanned. At a later stage, correctly linking the printed page number, the structural unit and the corresponding digitized images becomes essential. The indexing staff verify this mapping by looking up the beginning of each article.
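The mapping between printed page numbers, articles and page images can be sketched as follows. This is a minimal sketch with hypothetical names, not the RusDML tool's data model: each image carries its printed page label (or none), and an article is linked to the first image bearing its recorded start page. Articles that cannot be resolved are returned separately for the indexing staff; the linked ones would still be checked by viewing the image at the start of each article.

```python
from dataclasses import dataclass


@dataclass
class Article:
    author: str
    title: str
    start_page: str  # printed page number as recorded during indexing


def build_page_map(page_labels):
    """Map printed page labels to 1-based image sequence numbers.

    page_labels[i] is the printed page number on image i + 1, or None for
    unnumbered pages. If a label repeats, the first image is kept.
    """
    page_map = {}
    for seq, label in enumerate(page_labels, start=1):
        if label is not None:
            page_map.setdefault(label, seq)
    return page_map


def link_articles(articles, page_labels):
    """Return (article, image number) pairs and the articles that could not be linked."""
    page_map = build_page_map(page_labels)
    linked, unresolved = [], []
    for art in articles:
        seq = page_map.get(art.start_page)
        if seq is None:
            unresolved.append(art)
        else:
            linked.append((art, seq))
    return linked, unresolved
```

Keeping printed labels separate from image numbers matters because front matter, plates and unnumbered leaves make the two sequences diverge.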
The RusDML tool exports valid METS files containing all technical and structural metadata. These can be imported directly into the open-source document management system (an in-house development based on TYPO3) and are then available online.
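The structural part of such a METS file can be illustrated with Python's standard `xml.etree.ElementTree`. This sketch builds only a skeleton (a file group, a physical and a logical `structMap`, and `structLink` entries). It omits descriptive and administrative metadata, is not validated against the METS schema, and the `TYPE` values, MIME type and ID scheme are assumptions rather than the actual output of the RusDML tool.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns: str, tag: str) -> str:
    """Qualified name in ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def build_mets(image_urls, articles):
    """Build a minimal METS skeleton.

    image_urls: one URL per page image, in scan order.
    articles: (label, first_image) pairs, with 1-based image numbers.
    """
    mets = ET.Element(q(METS_NS, "mets"))

    # fileSec: one file entry per page image (MIME type assumed to be TIFF).
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), USE="DEFAULT")
    for i, url in enumerate(image_urls, start=1):
        f = ET.SubElement(file_grp, q(METS_NS, "file"), ID=f"FILE_{i:04d}", MIMETYPE="image/tiff")
        ET.SubElement(f, q(METS_NS, "FLocat"), {"LOCTYPE": "URL", q(XLINK_NS, "href"): url})

    # Physical structure: the sequence of pages, each pointing to its file.
    phys = ET.SubElement(mets, q(METS_NS, "structMap"), TYPE="PHYSICAL")
    seq = ET.SubElement(phys, q(METS_NS, "div"), ID="PHYS_0000", TYPE="physSequence")
    for i in range(1, len(image_urls) + 1):
        page = ET.SubElement(seq, q(METS_NS, "div"), ID=f"PHYS_{i:04d}", TYPE="page", ORDER=str(i))
        ET.SubElement(page, q(METS_NS, "fptr"), FILEID=f"FILE_{i:04d}")

    # Logical structure: one div per article inside the volume.
    logical = ET.SubElement(mets, q(METS_NS, "structMap"), TYPE="LOGICAL")
    volume = ET.SubElement(logical, q(METS_NS, "div"), ID="LOG_0000", TYPE="volume")

    # structLink: tie each article to the page on which it begins.
    links = ET.SubElement(mets, q(METS_NS, "structLink"))
    for n, (label, first_image) in enumerate(articles, start=1):
        ET.SubElement(volume, q(METS_NS, "div"), ID=f"LOG_{n:04d}", TYPE="article", LABEL=label)
        ET.SubElement(links, q(METS_NS, "smLink"),
                      {q(XLINK_NS, "from"): f"LOG_{n:04d}", q(XLINK_NS, "to"): f"PHYS_{first_image:04d}"})

    return ET.tostring(mets, encoding="unicode")
```

The `structLink` section is where the article-to-page mapping verified during indexing ends up, which is why that check matters before export.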
While the item is provisionally catalogued in the GBV union catalogue, its copy-level data must be supplemented in order to record the digitized version. Specifically, this concerns additional fields describing the format and the rights holder, as well as the URL and a record in the EROMM database.
For the complete catalogue record, the record of the digitized version references the PPN of the original and, conversely, the record of the printed version references the PPN of the digitized version. Finally, the import into the central index of digitized prints (ZVDD)[2] will be automated.
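The reciprocal PPN references and the additional fields for the digitized version can be expressed as a simple consistency check. A minimal sketch; the record class and field names are hypothetical and do not represent the GBV's actual cataloguing format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogueRecord:
    ppn: str
    is_digital: bool
    linked_ppn: Optional[str] = None
    # Additional fields for the digitized version (hypothetical names).
    format_note: Optional[str] = None
    rights_holder: Optional[str] = None
    url: Optional[str] = None
    eromm_registered: bool = False


def link_records(original, digital):
    """Make the print and digital records reference each other's PPN."""
    original.linked_ppn = digital.ppn
    digital.linked_ppn = original.ppn


def missing_fields(digital):
    """List the additional fields still empty on a digitized-version record."""
    missing = [name for name in ("format_note", "rights_holder", "url")
               if getattr(digital, name) is None]
    if not digital.eromm_registered:
        missing.append("eromm_registered")
    if digital.linked_ppn is None:
        missing.append("linked_ppn")
    return missing


def links_are_reciprocal(records):
    """Check that every linked PPN points to a record that links back."""
    by_ppn = {r.ppn: r for r in records}
    for r in records:
        if r.linked_ppn is not None:
            partner = by_ppn.get(r.linked_ppn)
            if partner is None or partner.linked_ppn != r.ppn:
                return False
    return True
```

For instance, after `link_records(original, digital)`, `links_are_reciprocal([original, digital])` returns `True`; a one-sided reference makes it return `False`.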
In addition to the metadata indexing at volume and substructure level described above, full-text data are captured for those books that are not printed in Gothic type (Fraktur).
For this purpose, a version of the OCR software "Finereader" modified by the GDZ will be integrated into the workflow. It offers the following options:
- Indexing of word coordinates, so that search hits can be highlighted in the page image
- OCR-assisted capture of substructure data (pagination sequences etc.)
- Structuring of the full texts in TEI format by automatically matching them against the available substructure data
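Hit highlighting from word coordinates can be sketched as follows, assuming the OCR output has already been parsed into words with pixel bounding boxes. The `Word` class and the matching rule (case-insensitive, surrounding punctuation stripped) are assumptions; Finereader's actual export format is not modelled. Because a viewer usually shows a scaled image, the boxes are rescaled to the displayed width.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: int  # left edge, in pixels of the master image
    y: int  # top edge
    w: int  # width
    h: int  # height


def normalise(token: str) -> str:
    """Lower-case a token and strip surrounding punctuation before matching."""
    return token.strip(".,;:!?()[]\"'").lower()


def hit_boxes(words, query):
    """Bounding boxes (x, y, w, h) of all words matching the query term."""
    target = normalise(query)
    return [(wd.x, wd.y, wd.w, wd.h) for wd in words if normalise(wd.text) == target]


def scale_box(box, image_width, display_width):
    """Rescale a box from master-image pixels to the width shown in the viewer."""
    factor = display_width / image_width
    return tuple(round(v * factor) for v in box)
```

The viewer would then draw a rectangle for each scaled box over the displayed page image.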
The last point in particular allows scholarly work with the full texts, which are normally not visible to users. Depending on OCR quality, texts structured in this way can be made available to interested users, for instance via the OAI interface of the DMS.
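The automated matching of full text against substructure data can be sketched as splitting per-page OCR text at the first image of each article and wrapping each article in a TEI `<div>`. This is a minimal sketch with hypothetical names, not the GDZ's implementation: each page's text is kept as a single `<p>` preceded by a `<pb>`, pages before the first article are dropped, and the `teiHeader` is reduced to the bare minimum.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def t(tag: str) -> str:
    """Qualified TEI element name in ElementTree notation."""
    return f"{{{TEI_NS}}}{tag}"


def structure_full_text(volume_title, page_texts, articles):
    """Group per-page OCR text into one TEI <div> per article.

    page_texts: OCR text per image, in image order (index 0 = image 1).
    articles: (title, first_image) pairs sorted by first_image; each article
    runs up to the page before the next article begins.
    """
    tei = ET.Element(t("TEI"))

    # Minimal teiHeader: fileDesc with titleStmt, publicationStmt, sourceDesc.
    file_desc = ET.SubElement(ET.SubElement(tei, t("teiHeader")), t("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, t("titleStmt")), t("title")).text = volume_title
    ET.SubElement(ET.SubElement(file_desc, t("publicationStmt")), t("p")).text = "Unpublished draft"
    ET.SubElement(ET.SubElement(file_desc, t("sourceDesc")), t("p")).text = "OCR of the digitized print"

    body = ET.SubElement(ET.SubElement(tei, t("text")), t("body"))
    # Each article ends where the next one starts (or at the end of the volume).
    ends = [first for _, first in articles[1:]] + [len(page_texts) + 1]
    for (title, first), end in zip(articles, ends):
        div = ET.SubElement(body, t("div"), type="article")
        ET.SubElement(div, t("head")).text = title
        for n in range(first, end):
            ET.SubElement(div, t("pb"), n=str(n))  # page break carrying the image number
            ET.SubElement(div, t("p")).text = page_texts[n - 1]

    return ET.tostring(tei, encoding="unicode")
```

The quality of the result depends directly on the substructure data: a wrong start page for one article shifts text into its neighbour's `<div>`.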