Automatic generation of structured hyperdocuments from document images
Authors: Ji-Yeon Lee, Jeong-Seon Park, Hyeran Byun, Jongsub Moon, Seong-Whan Lee
Affiliations: a Center for Artificial Vision Research, Department of Computer Science and Engineering, Korea University, Anam-Dong, Seongbuk-ku, Seoul 136-701, South Korea
b Department of Computer Science, Yonsei University, 134 Shinchon-dong, Seodaemoon-ku, Seoul 120-749, South Korea
c Department of Electronics and Information Engineering, Korea University, Chochiwon, Yeongi-kun, Chungnam 339-800, South Korea
Abstract: As document sharing over the World Wide Web continues to grow, so does the need to create hyperdocuments, in formats such as HTML and SGML/XML, that make documents accessible and retrievable via the internet. Nevertheless, little work has been done on converting paper documents into hyperdocuments, and most existing studies concentrate on the direct conversion of single-column document images that contain only text and image objects. In this paper, we propose two methods for converting complex multi-column document images into HTML documents, and a method for generating a structured table of contents page based on logical structure analysis of the document image. Experiments with various kinds of multi-column document images show that the proposed methods generate HTML documents with the same visual layout as the original document images, and produce a structured table of contents page whose hierarchically ordered section titles are hyperlinked to the corresponding contents.
Keywords: Structured hyperdocument; Multi-column document; Document conversion; Document image understanding; Logical structure analysis
This article is indexed in ScienceDirect and other databases.