Near-duplicate document image matching: A graphical perspective
Authors: Li Liu, Yue Lu, Ching Y. Suen
Affiliation: 1. Department of Computer Science and Technology, East China Normal University, Shanghai 200241, China; 2. ECNU-SRI Joint Lab for Pattern Analysis and Intelligent System, Shanghai Research Institute of China Post Group, Shanghai 200062, China; 3. Centre for Pattern Recognition and Machine Intelligence, Concordia University, Montreal, Canada H3G 1M8
Abstract: This paper proposes an approach to near-duplicate document image matching from a graphical perspective. Document images are represented by graphs whose nodes correspond to the objects in the images, so the image matching problem is converted into a graph matching problem. To deal with the instability of object segmentation, a multi-granularity object tree is constructed for each document image. Each level in the tree corresponds to one possible object segmentation, and different levels are characterized by different object granularities. Multiple graphs can be generated from the tree, and the objects associated with a single graph may be of different granularities. To match two near-duplicate document images, the pair of graphs with the maximum similarity is found from their multi-granularity object trees. Experimental results demonstrate the effectiveness of the proposed approach.
Keywords: Document images; Near-duplicate documents; Document image matching; Graph representation; Multi-granularity object tree; Graph matching
This article is indexed in ScienceDirect and other databases.
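The idea outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `ObjNode` class, the `cuts` enumeration, and the IoU-based `similarity` score are all assumptions made for illustration, and the graphs are reduced to node sets of bounding boxes (edges are omitted). The sketch only shows how a tree of nested objects yields several candidate segmentations of mixed granularity, and how the most similar pair across two trees might be selected.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class ObjNode:
    """Hypothetical node of a multi-granularity object tree."""
    box: tuple  # (x0, y0, x1, y1) bounding box of the object
    children: list = field(default_factory=list)


def cuts(node):
    """Enumerate the node sets obtainable from the subtree at `node`.

    Either keep `node` as one coarse object, or replace it by one
    segmentation of each child, so branches may differ in granularity.
    """
    result = [[node.box]]
    if node.children:
        for combo in product(*(cuts(c) for c in node.children)):
            result.append([box for part in combo for box in part])
    return result


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def similarity(g1, g2):
    """Toy similarity (an assumption): greedy one-to-one matching of
    nodes by IoU, normalized by the larger node count."""
    pairs = sorted(
        ((iou(a, b), i, j) for i, a in enumerate(g1) for j, b in enumerate(g2)),
        reverse=True,
    )
    used1, used2, total = set(), set(), 0.0
    for score, i, j in pairs:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            total += score
    return total / max(len(g1), len(g2))


def best_match(tree1, tree2):
    """Return (score, g1, g2) for the most similar pair of node sets."""
    candidates = (
        (similarity(g1, g2), g1, g2)
        for g1 in cuts(tree1)
        for g2 in cuts(tree2)
    )
    return max(candidates, key=lambda t: t[0])
```

Calling `best_match(tree_a, tree_b)` on two such trees returns the highest score together with the two selected node sets. The exhaustive enumeration grows quickly with tree size, so this form is only suitable for small illustrative trees.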