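Connected-Component-Based Page Segmentation and Classification
Cite this article: Wang Shuhua, Cao Yang, Li Zuo, Cai Shijie. Connected-Component-Based Page Segmentation and Classification[J]. Journal of Computer-Aided Design & Computer Graphics, 2002, 14(1): 17-20, 25.
Authors: Wang Shuhua, Cao Yang, Li Zuo, Cai Shijie
Affiliations: 1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093
2. Department of Building and Real Estate, The Hong Kong Polytechnic University, Hong Kong
Abstract: Page segmentation and classification are key steps in document processing, but most existing methods restrict the shape of page blocks and the skew of the page. This paper proposes a new method for page segmentation and classification based on connected components. First, a fast algorithm extracts the connected components in the page. Then an improved RLSA algorithm segments the page, and blocks are classified according to the distribution of connected components and the features of each block. The method tightly couples page segmentation with classification and takes full account of the local features of blocks, which ensures correct block classification and greatly improves the algorithm's efficiency.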

Keywords: document processing; page segmentation; page classification; RLSA; image processing; computer
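
The first step named in the abstract, extracting connected components from the page image, can be illustrated with a minimal sketch. The abstract does not specify the paper's fast extraction algorithm, so the code below substitutes a plain breadth-first flood fill with 8-connectivity; the function name `connected_components`, the list-of-rows image format, and the bounding-box output are assumptions made for illustration only.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of the 8-connected
    foreground regions in a binary image given as a list of rows (1 = ink).

    Illustrative flood fill only; not the fast algorithm used in the paper.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 4x5 page fragment containing two separate ink regions.
page = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
boxes = connected_components(page)
print(boxes)  # [(0, 0, 1, 1), (1, 3, 2, 4)]
```

Working on bounding boxes rather than raw pixels is what would let later stages reason about the size and distribution of components inside a block.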

Approach to Page Segmentation and Classification
Wang Shuhua, Cao Yang, Li Zuo, Cai Shijie. Approach to Page Segmentation and Classification[J]. Journal of Computer-Aided Design & Computer Graphics, 2002, 14(1): 17-20, 25.
Authors: Wang Shuhua, Cao Yang, Li Zuo, Cai Shijie
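Affiliation: Wang Shuhua 1), Cao Yang 2), Li Zuo 1), Cai Shijie 1); 1) State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093; 2) Department of Building and Real Estate, The Hong Kong Polytechnic University, Hong Kong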
Abstract: Page segmentation and classification are key steps in document processing, but most current algorithms can only handle pages with restricted block shapes and no skew. This paper introduces a new approach to page segmentation and classification based on connected components. First, the connected components in the page image are extracted quickly. Then an RLSA (Run-Length Smoothing Algorithm) based on the connected components is used for page segmentation. Next, the distribution of the connected components within each block and the global features of the block are analyzed to classify the blocks. The approach combines page segmentation and classification, which improves running efficiency, and takes the local features of each block into account, which ensures correct block classification.
Keywords: document processing; page segmentation; page classification; Run-Length Smoothing Algorithm
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.