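A Novel Method Based on Integrating Characters with Words for Contextual Processing of Chinese Character Recognition
Cite this article: Li Yuanxiang, Ding Xiaoqing, Wu Youshou. A Novel Method Based on Integrating Characters with Words for Contextual Processing of Chinese Character Recognition[J]. Journal of Computer Research and Development, 2002, 39(7): 838-842.
Authors: Li Yuanxiang  Ding Xiaoqing  Wu Youshou
Affiliations: 1. State Key Laboratory of Intelligent Technology and Systems, Department of Electronic Engineering, Tsinghua University, Beijing 100084; Institute of Meteorology, PLA University of Science and Technology, Nanjing 211101
2. State Key Laboratory of Intelligent Technology and Systems, Department of Electronic Engineering, Tsinghua University, Beijing 100084
Funding: Supported by the National Natural Science Foundation of China (69972024) and the National "863" High-Tech Research and Development Program of China (863-306-ZT03-03-1)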
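Abstract: Exploiting the complementarity between character-level and word-level information, this paper proposes a contextual processing method that combines characters and words. Starting from isolated character recognition, a forward-backward search algorithm is first used to perform contextual processing based on a character bigram model over a relatively large candidate set, which raises the text recognition rate and at the same time improves the efficiency of the candidate set. Contextual processing based on a word bigram model is then performed over a smaller candidate set. The method effectively improves the text recognition rate while keeping processing speed in check. Experiments on offline handwritten Chinese text (about 66,000 characters) show that character bigram processing raises the text recognition rate from 81.58% to 94.50% and the cumulative top-10 accuracy from 94.33% to 98.25%; subsequent word bigram processing raises the text recognition rate further to 95.75%.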

Keywords: Chinese character recognition  language model  contextual processing  forward-backward search algorithm  candidate set efficiency
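
The first stage described in the abstract can be illustrated with a small sketch. The abstract does not give the scoring formula, so what follows is only one plausible reading and not the authors' implementation: each candidate carries a recognizer log-score, a character bigram supplies log P(cur | prev), and a forward pass plus a backward pass give every candidate the score of the best full path that goes through it. Sorting by that score re-ranks each candidate list, and keeping the top k yields the smaller candidate set that a word-level stage could consume. The function names, `lm_weight`, the 0.01 probability floor, and the toy numbers are all assumptions made for illustration.

```python
import math


def rerank_with_char_bigram(candidates, bigram_logp, lm_weight=1.0):
    """Re-rank per-position candidates by the best full path through each one.

    candidates: list of positions; each position is a non-empty list of
                (char, recognizer_log_score) pairs.
    bigram_logp: function (prev_char, cur_char) -> log P(cur_char | prev_char).
    """
    n = len(candidates)

    # forward[t][j]: best score of any path that ends at candidate j of
    # position t, including that candidate's own recognizer score.
    forward = []
    for t, cands in enumerate(candidates):
        row = []
        for ch, score in cands:
            if t == 0:
                row.append(score)
            else:
                best_prev = max(
                    forward[t - 1][i] + lm_weight * bigram_logp(prev_ch, ch)
                    for i, (prev_ch, _) in enumerate(candidates[t - 1])
                )
                row.append(best_prev + score)
        forward.append(row)

    # backward[t][j]: best score of any continuation after candidate j of
    # position t, excluding that candidate's own recognizer score.
    backward = [[0.0] * len(cands) for cands in candidates]
    for t in range(n - 2, -1, -1):
        for j, (ch, _) in enumerate(candidates[t]):
            backward[t][j] = max(
                lm_weight * bigram_logp(ch, next_ch) + next_score + backward[t + 1][k]
                for k, (next_ch, next_score) in enumerate(candidates[t + 1])
            )

    # Combine both passes and sort each position from best to worst.
    reranked = []
    for t, cands in enumerate(candidates):
        scored = [(ch, forward[t][j] + backward[t][j]) for j, (ch, _) in enumerate(cands)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        reranked.append(scored)
    return reranked


def prune(reranked, k):
    """Keep the k best candidates per position (the smaller candidate set)."""
    return [position[:k] for position in reranked]


# Toy data (invented for illustration): the recognizer slightly prefers the
# wrong first character, and the bigram model favors the pair ("中", "国").
TOY_BIGRAMS = {("中", "国"): 0.5}


def toy_bigram_logp(prev_ch, cur_ch):
    # Unseen pairs get a small floor probability (an assumption of this sketch).
    return math.log(TOY_BIGRAMS.get((prev_ch, cur_ch), 0.01))


toy_candidates = [
    [("申", math.log(0.5)), ("中", math.log(0.4))],
    [("国", math.log(0.6)), ("团", math.log(0.4))],
]

reranked = rerank_with_char_bigram(toy_candidates, toy_bigram_logp)
top1_text = "".join(position[0][0] for position in reranked)
small_sets = prune(reranked, k=1)
```

On the toy input the recognizer's first choice at position 0 is 申, but the bigram context promotes 中, so the top-1 string becomes 中国. Because every candidate is re-scored rather than only the single best path, the correct character tends to move up within its list, which is one way to read the abstract's "efficiency of the candidate set".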

A NOVEL METHOD BASED ON INTEGRATING CHARACTERS WITH WORDS FOR CONTEXTUAL PROCESSING OF CHINESE CHARACTER RECOGNITION
Abstract: Exploiting the complementarity between Chinese characters and Chinese words, a novel contextual processing method is proposed that integrates a character-based language model with a word-based language model. On the basis of isolated character recognition, character-based bigram post-processing using forward-backward search is first executed on large candidate sets, which improves both the recognition rate of document (RRD) and the efficiency of the candidate sets (the accumulated recognition rate of the top ten candidates is greatly boosted). Then, word-based bigram post-processing is executed on small candidate sets to further improve the RRD. The method effectively improves the RRD while keeping processing speed in check. Experimental results on offline handwritten Chinese documents (about 66,000 characters) demonstrate its effectiveness: character-based bigram post-processing improves the RRD from 81.58% to 94.50%, and the accumulated recognition rate of the top ten candidates rises from 94.33% to 98.25%. An RRD of 95.75% is obtained after word-based bigram post-processing.
Keywords: Chinese character recognition  language model  contextual processing  forward-backward search algorithm  efficiency of candidate set
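
The gains reported in the abstract are easier to compare when expressed as relative error reductions. The helper below is an illustrative calculation on the published percentages only; `relative_error_reduction` is a name introduced here and is not a metric defined by the paper.

```python
def relative_error_reduction(before_pct, after_pct):
    """Fraction of the remaining errors removed when accuracy rises
    from before_pct to after_pct (both given as percentages)."""
    error_before = 100.0 - before_pct
    error_after = 100.0 - after_pct
    return (error_before - error_after) / error_before


# Percentages taken from the abstract.
char_stage_top1 = relative_error_reduction(81.58, 94.50)   # character bigram, top-1
char_stage_top10 = relative_error_reduction(94.33, 98.25)  # character bigram, top-10
word_stage_top1 = relative_error_reduction(94.50, 95.75)   # word bigram, top-1
```

Under this reading, the character bigram stage removes roughly 70% of the top-1 errors and roughly 69% of the top-10 misses, and the word bigram stage removes roughly a further 23% of the top-1 errors that remain.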
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.