The IAM-database: an English sentence database for offline handwriting recognition
Authors: U.-V. Marti, H. Bunke
Affiliation: Department of Computer Science, University of Bern, Neubrückstrasse 10, 3011 Bern, Switzerland; e-mail: {marti,bunke}@iam.unibe.ch
Abstract: In this paper we describe a database that consists of handwritten English sentences. It is based on the Lancaster-Oslo/Bergen (LOB) corpus. This corpus is a collection of texts that comprise about one million word instances. The database includes 1,066 forms produced by approximately 400 different writers. A total of 82,227 word instances out of a vocabulary of 10,841 words occur in the collection. The database consists of full English sentences. It can serve as a basis for a variety of handwriting recognition tasks. However, it is expected that the database would be particularly useful for recognition tasks where linguistic knowledge beyond the lexicon level is used, because this knowledge can be automatically derived from the underlying corpus. The database also includes a few image-processing procedures for extracting the handwritten text from the forms and the segmentation of the text into lines and words.
Received September 28, 2001 / Revised October 10, 2001
Keywords: Handwriting recognition – Database – Unconstrained English sentences – Corpus – Linguistic knowledge
This article is indexed in SpringerLink and other databases.