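A Study of Post-processing Methods for Handwritten Chinese Address Recognition
Cite this article: LONG Chong, ZHUANG Li, ZHU Xiao-yan, HUANG Kai-zhu, SUN Jun, Yoshinobu Hotta, Satoshi Naoi. A Study of Post-processing Methods for Handwritten Chinese Address Recognition[J]. Journal of Chinese Information Processing, 2006, 20(6): 71-76.
Authors: LONG Chong  ZHUANG Li  ZHU Xiao-yan  HUANG Kai-zhu  SUN Jun  Yoshinobu Hotta  Satoshi Naoi
Affiliations: 1. State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University; 2. Fujitsu R&D Center Co., Ltd.; 3. Fujitsu Laboratories Ltd.
Funding: National Natural Science Foundation of China (60321002); Fujitsu R&D Center Co., Ltd.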
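Abstract: OCR (optical character recognition), a convenient and effective character recognition technology, plays an increasingly important role in office automation, information recovery, and digital libraries. Language models are widely used in OCR post-processing, especially for Chinese text recognition. This paper addresses post-processing for handwritten Chinese addresses. It discusses how the granularity of the language model affects recognition accuracy, analyzes the respective strengths and weaknesses of character-based and word-based language models, adopts the word-based model, and on that basis proposes a weighted word graph search algorithm. Experiments on a test set of 58,269 handwritten Chinese addresses show that the whole-address recognition rate rises from 28.56% to 75.66%, a 65.93% reduction in error rate, which substantially improves system performance.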
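The error-rate reduction quoted in this abstract follows from the two recognition rates. A minimal check, assuming "error rate" means 100% minus the whole-address recognition rate (the function name `relative_error_reduction` is introduced here only for illustration):

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative drop in error rate, in percent, given two accuracies in percent."""
    err_before = 100.0 - acc_before  # error rate before post-processing
    err_after = 100.0 - acc_after    # error rate after post-processing
    return 100.0 * (err_before - err_after) / err_before


# Recognition rates reported in the abstract above.
reduction = relative_error_reduction(28.56, 75.66)
print(f"{reduction:.2f}%")  # prints 65.93%
```

The error rate falls from 71.44% to 24.34%, and (71.44 - 24.34) / 71.44 is about 0.6593, which matches the stated 65.93%.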

Keywords: artificial intelligence  pattern recognition  OCR  language model  post-processing
Article ID: 1003-0077(2006)06-0069-06
Received: 2005-12-19
Revised: 2005-12-19

A Post-processing Approach for Handwritten Chinese Address Recognition
LONG Chong, ZHUANG Li, ZHU Xiao-yan, HUANG Kai-zhu, SUN Jun, Yoshinobu Hotta, Satoshi Naoi. A Post-processing Approach for Handwritten Chinese Address Recognition[J]. Journal of Chinese Information Processing, 2006, 20(6): 71-76.
Authors: LONG Chong  ZHUANG Li  ZHU Xiao-yan  HUANG Kai-zhu  SUN Jun  Yoshinobu Hotta  Satoshi Naoi
Affiliations: 1. State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University; 2. Information Technology Laboratory, Fujitsu R&D Center Co., Ltd.; 3. Fujitsu Laboratories Ltd.
Abstract: OCR (Optical Character Recognition), a convenient and efficient automatic character recognition tool, is becoming increasingly important in office automation, information recovery, and digital libraries. Language models are widely used in OCR post-processing, especially for Chinese. In this paper, we focus on the post-processing of handwritten Chinese addresses and discuss the relationship between the granularity of the language model and system performance. We compare character-based and word-based language models and present the advantages and disadvantages of each. Based on this analysis, we adopt the word-based language model and propose a weighted word graph together with its search algorithm. Experiments on 58,269 handwritten Chinese addresses show that the performance of the OCR system improves greatly: recognition precision increases from 28.56% to 74.15%, a 63.82% reduction in error.
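The abstract does not describe how the weighted word graph is built or searched. As a rough illustration of the general idea only, the sketch below runs a best-path search over a word graph whose edges are lexicon words compatible with the OCR candidates at each character position. The unigram word model, the log-linear weighting, the toy data, and the names `best_path` and `lm_weight` are all assumptions made for this example, not the authors' method.

```python
import math


def best_path(candidates, lexicon, lm_weight=1.0):
    """Best word sequence through a word graph built from OCR candidates.

    candidates: one dict per character position, mapping a candidate
                character to its (assumed) OCR probability.
    lexicon:    dict mapping a word to its (assumed) unigram probability.
    Returns (log score, list of words); (-inf, []) if no full path exists.
    """
    n = len(candidates)
    # best[i] holds the best (score, words) covering the first i positions.
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        score_i, words_i = best[i]
        if score_i == -math.inf:
            continue  # position i is not reachable by any word sequence
        for word, prob in lexicon.items():
            j = i + len(word)
            if j > n:
                continue
            # An edge i -> j exists only if every character of the word
            # appears among the OCR candidates at its position.
            ocr_log = 0.0
            for k, ch in enumerate(word):
                p = candidates[i + k].get(ch)
                if p is None:
                    break
                ocr_log += math.log(p)
            else:
                score = score_i + ocr_log + lm_weight * math.log(prob)
                if score > best[j][0]:
                    best[j] = (score, words_i + [word])
    return best[n]


# Toy example: the top OCR candidate at position 2 is wrong, but the
# word-level lexicon recovers the correct character.
candidates = [
    {"北": 0.9, "比": 0.1},
    {"京": 0.8, "东": 0.2},
    {"布": 0.6, "市": 0.4},
    {"海": 0.7, "每": 0.3},
    {"淀": 0.5, "定": 0.5},
    {"区": 0.9, "巨": 0.1},
]
lexicon = {"北京市": 0.4, "海淀区": 0.3, "北京": 0.2, "市": 0.1}

score, words = best_path(candidates, lexicon)
print(words)  # ['北京市', '海淀区']
```

In this toy case the character-level top-1 string would read 北京布海淀区, while the word-level search selects 北京市 because no lexicon word contains 布 at that position.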
Keywords: artificial intelligence  pattern recognition  OCR  language model  post-processing
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.