
A three-stage text recognition framework for natural scene images (面向自然场景图像的三阶段文字识别框架)

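Cite this article:	ZOU Beiji, YANG Wenjun, LIU Shu, JIANG Lingzi. A three-stage text recognition framework for natural scene images[J]. Journal of Zhejiang University (Sciences Edition), 2021, 48(1): 1-8.

Authors:	ZOU Beiji (邹北骥), YANG Wenjun (杨文君), LIU Shu (刘姝), JIANG Lingzi (姜灵子)

Affiliations:	1. School of Computer Science and Engineering, Central South University, Changsha 410083, Hunan, China; 2. Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha 410083, Hunan, China

Funding:	National Natural Science Foundation of China (61902435); Major Project of the Ministry of Science and Technology (2018AAA0102102); Hunan Provincial Science and Technology Program (2017WK2074); Ministry of Education Discipline Innovation and Talent Introduction Base Program (B18059); Natural Science Foundation of Hunan Province (2019JJ50808); 2020 College Students' Innovation and Entrepreneurship Training Program (GCX2020325Y).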

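Abstract:	Text recognition has important applications in document management, image understanding, and visual navigation. However, text in natural scenes is usually arbitrarily arranged, varied in shape, and set in diverse fonts, which makes it difficult to detect and recognize. This paper proposes a three-stage text recognition framework for natural scene images, comprising text detection, text rectification, and text recognition. First, a feature pyramid network segments the characters in the image, and a bidirectional long short-term memory network estimates the affinity between characters so that isolated characters can be linked into word lines; the text detection rate (F-score) reaches 91.97%. Next, a multi-object rectification network rectifies the detected text to cope with the complex deformations of scene text and improve readability. Finally, an attention-based sequence recognition network outputs its predictions sequentially, achieving word-level recognition with an accuracy of 84.98%.
Keywords:	text recognition; natural scene; text detection; text rectification
Received:	2020-09-23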
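
The detection stage described in the abstract links isolated characters into words according to their pairwise affinity. The sketch below illustrates only that grouping step, under loud assumptions: the affinity scores are taken as given inputs (in the paper they come from a bidirectional LSTM, which is not reproduced here), and the function name `group_characters`, the fixed threshold of 0.5, and the union-find linking rule are all hypothetical choices made for illustration, not the authors' method.

```python
from collections import defaultdict


def group_characters(num_chars, affinities, threshold=0.5):
    """Group character indices into words from pairwise affinity scores.

    affinities: dict mapping (i, j) index pairs to a score in [0, 1].
    Pairs scoring at or above `threshold` are linked. The threshold and
    the linking rule are illustrative assumptions, not from the paper.
    """
    parent = list(range(num_chars))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in affinities.items():
        if score >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the smaller index as the root for a stable result.
                parent[max(ri, rj)] = min(ri, rj)

    groups = defaultdict(list)
    for idx in range(num_chars):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Made-up scores for five characters: 0-1-2 are linked, 3-4 are linked.
words = group_characters(5, {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7})
```

With these made-up scores the weak 2-3 link falls below the threshold, so the characters split into two groups. A real system would also need the characters' geometry to order each group into a readable line.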
A three-stage text recognition framework for natural scene images

ZOU Beiji, YANG Wenjun, LIU Shu, JIANG Lingzi. A three-stage text recognition framework for natural scene images[J]. Journal of Zhejiang University (Sciences Edition), 2021, 48(1): 1-8.

Authors:	ZOU Beiji, YANG Wenjun, LIU Shu, JIANG Lingzi

Affiliation:	1. School of Computer Science and Engineering, Central South University, Changsha 410083, China; 2. Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha 410083, China

Abstract:	Text recognition plays an important role in applications such as document management, image understanding, and visual navigation. However, text in natural scenes often appears in arbitrary orientations, varied shapes, and diverse fonts, which makes it difficult to detect and recognize. To handle such irregular text, a three-stage recognition framework for natural scene images is proposed, comprising text detection, rectification, and recognition. First, a feature pyramid network segments the character instances, and a bidirectional long short-term memory network predicts the affinity among them so that isolated characters can be grouped into words; the reported F-score for text detection is 91.97%. The detected words are then rectified by a multi-object rectification network, which handles the complicated distortions of scene text and improves readability. Finally, an attention-based sequence recognition network outputs its predictions in sequence to achieve word-level recognition, with a reported recognition accuracy of 84.98%.

Keywords:	text recognition; natural scene; text detection; text rectification
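
The abstract reports detection quality as an F-score. Assuming this is the standard F1 measure, that is, the harmonic mean of precision and recall, it can be computed as in the minimal sketch below. The function name `f_score` and the example counts are hypothetical. The abstract does not state how a detection is matched to a ground-truth word, so the matching step is left out.

```python
def f_score(true_positives, num_detections, num_ground_truth):
    """Return the F1 score from raw counts (assumed standard definition).

    true_positives:   detections matched to a ground-truth word
    num_detections:   all detections produced by the detector
    num_ground_truth: all annotated words
    """
    precision = true_positives / num_detections if num_detections else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts, not taken from the paper: 8 correct out of 10 detections,
# against 10 annotated words.
example = f_score(8, 10, 10)
```

Because the harmonic mean is pulled toward the smaller of the two values, a detector cannot reach a high F-score by maximizing precision or recall alone.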
This article is indexed in CNKI and other databases.