OCCAM (OCR, ClassificAtion & Machine Translation) responds to the “Integration projects” action line, which covers the integration (and extension) of CEF (Connecting Europe Facility) Automated Translation into multilingual digital cross-border services. The Action proposes integrating image classification, Translation Memories (TMs), Optical Character Recognition (OCR) and Machine Translation (MT) to support the automated translation of scanned documents, a document type that the CEF eTranslation service currently cannot process.
OCCAM will develop two use cases: (i) the Business Registers Interconnection System (BRIS) use case and (ii) the Digital Humanities use case.
For use case (i), OCCAM will develop an eDelivery-compliant Reference Implementation for the BRIS DSI. During the Action, the system will target the Belgian and Czech Business Registers and cover five languages: Dutch, French, German, Czech and English. The Reference Implementation will:
(1) use image classification to identify scanned documents
(2) retrieve corresponding source text and translations from Member State databases through existing APIs and TMs, as a primary translation workflow
(3) switch to OCR and MT when no such interfaces are available, as a secondary workflow
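The three steps above can be sketched as a simple routing function. This is only an illustrative sketch, not the OCCAM implementation: the names `translate_document`, `is_scanned`, `lookup_existing`, `ocr` and `mt` are hypothetical placeholders for the image classifier, the Member State API/TM lookup, the OCR engine and the MT engine. It also assumes that documents not classified as scans already contain UTF-8 text and can be sent straight to MT, which the description above does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Translation:
    text: str
    route: str  # "passthrough", "primary" or "secondary"


def translate_document(
    doc: bytes,
    target_lang: str,
    is_scanned: Callable[[bytes], bool],                    # hypothetical image classifier
    lookup_existing: Callable[[bytes, str], Optional[str]],  # hypothetical API/TM lookup
    ocr: Callable[[bytes], str],                             # hypothetical OCR engine
    mt: Callable[[str, str], str],                           # hypothetical MT engine
) -> Translation:
    # Step 1: image classification decides whether the document is a scan.
    if not is_scanned(doc):
        # Assumption: non-scanned documents are plain UTF-8 text that MT can handle directly.
        return Translation(mt(doc.decode("utf-8"), target_lang), "passthrough")

    # Step 2 (primary workflow): reuse an existing translation when an
    # API or Translation Memory can supply one.
    existing = lookup_existing(doc, target_lang)
    if existing is not None:
        return Translation(existing, "primary")

    # Step 3 (secondary workflow): fall back to OCR followed by MT.
    return Translation(mt(ocr(doc), target_lang), "secondary")
```

Passing the four components in as callables keeps the routing logic independent of any particular classifier, register API, OCR engine or MT service, which matches the stated aim of making the Reference Implementation adaptable.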
The Reference Implementation will be made adaptable, so it can be used post-Action to translate scanned documents from:
(1) other Member State Business Registers
(2) other DSIs, such as the Online Dispute Resolution (ODR) DSI: OCR and MT can be used to recognise and translate text contained within images (e.g. in manuals for consumer goods) that appear in the various file formats used during dispute resolution.
For use case (ii), OCCAM will expose its OCR and MT core as middleware services that the Open Source community can use for OCR and MT tasks in the Digital Humanities domain. Dedicated OCR and MT models will be trained and made available through Open Source repositories. The existing European Digital Humanities network will be used to promote the use of OCCAM. Examples and connectors for existing Open Source packages and Digital Humanities collaboration platforms will be developed to ensure uptake of the solution.
One of the most important achievements of the OCCAM Action will be its contribution to an integrated use of pan-European infrastructures: pan-European Digital Humanities Infrastructures and Digital Service Infrastructures will be used for knowledge creation across languages and countries, and will help make European cultural heritage accessible to all citizens in their own language.