With its convenient, seamless workflow, docWizz has established itself as a global leader in conversion software: the pages of scanned newspaper archives and library holdings are converted, enriched with sustainable METS/ALTO metadata, secured long-term and made available for flexible further use.

Seamless Workflow

From importing the scans to exporting the METS/ALTO or IIIF files, docWizz runs through all the conversion steps (cropping, deskewing, zoning, layout analysis and OCR) in one seamless workflow. This all-in-one application, combined with continuously optimised processes, results in projects that are both cost- and time-effective.

Artificial Intelligence

docWizz's proven layout analysis can now optionally be enhanced by machine learning. This automated step delivers visibly more precise results, which significantly reduces the amount of additional manual work. For an even more precise layout analysis, we can use individual training data to tailor the analysis to your specific project materials.

Universally applicable

The flexible, machine-learning-supported layout analysis allows docWizz to process any publication type and layout format. A broad choice of OCR engines gives access to a huge variety of languages and writing systems. docWizz scales easily from projects of a few thousand pages to projects of many millions of pages.

Multiple file formats

Various import, export and metadata formats are supported. Import formats are TIF, JPG, JP2, GIF, PNG, BMP, CR2 and PDF. Export formats are METS (including both physical and logical structural maps), ALTO XML, image files, IIIF, PDF, PDF/A, custom XML formats (full-text and other), RTF and EPUB. Supported metadata schemas are MIX, MARC21, MODS and DC.
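To illustrate what further use of an ALTO XML export can look like, the following is a minimal sketch in Python that reads the recognised text line by line. It relies only on the ALTO standard's TextLine and String elements and the CONTENT attribute. The sample document and the helper names (local_name, alto_lines) are assumptions made for this illustration and are not part of docWizz.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written ALTO sample (an assumption for illustration, not docWizz output).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1">
            <String CONTENT="Morning" HPOS="100" VPOS="120" WIDTH="300" HEIGHT="40"/>
            <SP WIDTH="20"/>
            <String CONTENT="Edition" HPOS="420" VPOS="120" WIDTH="280" HEIGHT="40"/>
          </TextLine>
          <TextLine ID="TL2">
            <String CONTENT="Local" HPOS="100" VPOS="180" WIDTH="200" HEIGHT="40"/>
            <SP WIDTH="20"/>
            <String CONTENT="news" HPOS="320" VPOS="180" WIDTH="180" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the XML namespace so the code works across ALTO schema versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the text of every TextLine, with words joined by single spaces."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Each String child carries one recognised word in its CONTENT attribute.
        words = [
            child.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


for line in alto_lines(SAMPLE_ALTO):
    print(line)
```

Matching on the local tag name rather than a fixed namespace is a deliberate choice here, since the namespace URI differs between ALTO schema versions.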

Premium Support

With over 40 years of success in implementing large-scale and mass digitization projects for renowned libraries and service providers such as The British Library and Digital Divide Data (DDD), our CCS team offers first-class service and professional support worldwide.

Thanks to the efficient and robust conversion with docWizz, you will produce data with high information content for your sustainable, searchable digital archive.

docWizz Layout Analysis

1. Import

After the scanned print or microform document pages are imported, they undergo cropping and deskewing.

2. Zoning / Layout Analysis

Supported by artificial intelligence, the structural elements of a page are identified, e.g. article headings, photos, paragraphs and captions.

3. Structure Analysis

The structure analysis identifies the components of the entire publication, such as the table of contents, articles, chapters and appendices.

4. Text Recognition (OCR)

From the set of supported OCR systems, docWizz will automatically select the best engine based on language, font, and zone information.

5. Export

In the final step, the data is exported in the METS/ALTO metadata standard format for libraries, saved, and is then available for further use.
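Step 4 mentions that an OCR engine is chosen per zone from language, font and zone information. How docWizz makes this choice is not described here. The following Python sketch only illustrates one simple rule-based way such a selection could work. The engine names, the rule table and the select_engine function are all hypothetical.

```python
# Hypothetical rule table: (language, font) -> engine name.
# None of these names refer to real docWizz components.
ENGINE_RULES = {
    ("de", "fraktur"): "engine_b",
    ("en", "antiqua"): "engine_a",
}
DEFAULT_ENGINE = "engine_a"  # assumed fallback when no rule matches
NON_TEXT_ZONES = {"photo", "illustration"}  # zones that carry no text to recognise


def select_engine(zone_kind, language, font):
    """Pick an engine name for one zone, or None if the zone holds no text."""
    if zone_kind in NON_TEXT_ZONES:
        return None
    return ENGINE_RULES.get((language, font), DEFAULT_ENGINE)


# Example zones as (kind, language, font) tuples.
zones = [
    ("heading", "de", "fraktur"),
    ("photo", "de", "fraktur"),
    ("caption", "fr", "antiqua"),
]
for kind, language, font in zones:
    print(kind, "->", select_engine(kind, language, font))
```

A lookup table keeps the rules easy to inspect and extend. A production system might instead score engines on sample zones, but that goes beyond this sketch.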

docWizz is used by innovative, renowned customers around the globe and is the software of choice for many service providers. To date, some 200 million document pages have been successfully processed with docWizz, including collections from 15 national libraries.