Reading Georgian Manuscripts Automatically on the eScriptorium Platform
Abstract
The article outlines the development of tools for the automatic reading of Georgian manuscripts on the eScriptorium platform and the first results achieved with them. It begins with an overview of the efforts undertaken since the late 1980s to apply Optical Character Recognition (OCR) to Georgian printed books, followed by a short introduction to the basics of eScriptorium's approach to Handwritten Text Recognition (HTR) and its functionalities. It then exemplifies the application of eScriptorium's three core procedures: the automatic segmentation of text-covered regions and lines; the automatic transcription of the detected lines, based on manual input and the training of appropriate models; and the alignment of the recognized text with existing electronic texts in order to provide reliable ground truth for further training. A total of 292 manually transcribed pages and 3,812 pages with aligned (but not yet always corrected) text have been processed so far. Together they provide a strong material basis for further improving the models and the reading results that depend on them.