Handwritten Kazakh and Russian (HKR) database for text recognition

Research Authors

Daniyar Nurseitov, Kairat Bostanbekov, Daniyar Kurmankhojayev, Anel Alimova, Abdelrahman Abdallah & Rassul Tolegenov

Research Date

Wed, 09/01/2021 - 12:00

Research Department

Department of Information Technology

Research Journal

Multimedia Tools and Applications

Research Participant

abdoelsayed2016

Research Abstract

In this paper, we introduce a large-scale dataset, called HKR, to address challenging detection and recognition problems of handwritten Russian and Kazakh text in scanned documents. We present a new Russian and Kazakh database for offline handwriting recognition, in which about 95% of the words and sentences are Russian and about 5% are Kazakh. A few pre-processing and segmentation procedures have been developed together with the database. Both languages are written in Cyrillic and share the same 33 characters; the Kazakh alphabet contains 9 additional specific characters. The dataset is a collection of forms. The source of every form was generated with LaTeX, and the printed forms were then filled out by hand. The database consists of more than 1500 filled forms, with approximately 63000 sentences and more than 715699 symbols produced by approximately 200 different writers. It can serve researchers working on handwriting recognition tasks with deep learning and machine learning. For experiments, we used several popular text recognition methods for word and line recognition, including CTC-based and attention-based methods. The results indicate the diversity of HKR. The dataset is available at https://github.com/abdoelsayed2016/HKR_Dataset.
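To make the character inventory in the abstract concrete, the following is a minimal sketch of the 33 + 9 letter alphabet together with a greedy CTC decoding step of the kind a CTC-based recognizer might use. It is an illustration only and is not taken from the paper or the HKR repository: the names `ALPHABET`, `BLANK`, and `ctc_greedy_decode`, the choice of index 0 for the blank label, and the restriction to lowercase letters are all assumptions.

```python
# Lowercase Russian alphabet: 33 letters.
RUSSIAN = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
# The 9 additional letters of the Kazakh Cyrillic alphabet.
KAZAKH_EXTRA = "әғқңөұүһі"
ALPHABET = RUSSIAN + KAZAKH_EXTRA

# Index 0 is reserved for the CTC blank label (an assumption of this sketch).
BLANK = 0
IDX_TO_CHAR = {i + 1: ch for i, ch in enumerate(ALPHABET)}
CHAR_TO_IDX = {ch: i for i, ch in IDX_TO_CHAR.items()}


def ctc_greedy_decode(frame_labels):
    """Collapse consecutive repeated labels, then drop blanks."""
    chars = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            chars.append(IDX_TO_CHAR[label])
        prev = label
    return "".join(chars)


# Hypothetical per-frame labels for the Kazakh word "қала" (city),
# with repeated frames and blanks as a recognizer might emit them.
q, a, l = CHAR_TO_IDX["қ"], CHAR_TO_IDX["а"], CHAR_TO_IDX["л"]
decoded = ctc_greedy_decode([q, q, BLANK, a, l, l, BLANK, a])
```

A real recognizer would also need uppercase letters, digits, punctuation, and the space character in its label set; they are left out here to keep the sketch short.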

Faculty of Computers and Information
