Machine Learning for Intelligent Processing of Printed Documents
Authors: Floriana Esposito, Donato Malerba, Francesca A. Lisi
Affiliation: Dipartimento di Informatica, Università degli Studi di Bari, via Orabona 4, 70125 Bari, Italy (all authors)
Abstract: A paper document processing system is an information system component that transforms information on printed or handwritten documents into a computer-revisable form. In intelligent systems for paper document processing, this information capture process is based on knowledge of the specific layout and logical structures of the documents. This article proposes applying machine learning techniques to acquire the specific knowledge required by an intelligent document processing system, named WISDOM++, that manages printed documents such as letters and journals. Knowledge is represented by decision trees and first-order rules that are automatically generated from a set of training documents. In particular, an incremental decision tree learning system is used to acquire the decision trees that classify segmented blocks, while a first-order learning system is used to induce the rules for layout-based classification and understanding of documents. Issues concerning the incremental induction of decision trees and the handling of both numeric and symbolic data in first-order rule learning are discussed, and the validity of the proposed solutions is empirically evaluated by processing a set of real printed documents.
Keywords: learning and knowledge discovery; intelligent information systems; intelligent document processing; decision-tree learning; first-order rule induction
This article is indexed in SpringerLink and other databases.