
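Research on Lip-reading Based on BiLSTM-Attention (基于BiLSTM-Attention唇语识别的研究)
Cite this article: LIU Da-yun, FANG Guo-zhi, LUO Tian-yi, WEI Hua-jie, WANG Qian, LI Xiu-zheng, LI Ao. Research on Lip-reading Based on BiLSTM-Attention[J]. Computing Technology and Automation (计算技术与自动化), 2020, 39(1): 150-155.
Authors: LIU Da-yun (刘大运)  FANG Guo-zhi (房国志)  LUO Tian-yi (骆天依)  WEI Hua-jie (魏华杰)  WANG Qian (王倩)  LI Xiu-zheng (李修政)  LI Ao (李骜)
Affiliation: School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080; School of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin, Heilongjiang 150080; School of Automation, Harbin University of Science and Technology, Harbin, Heilongjiang 150080
Funding: Heilongjiang Province College Students' Innovation and Entrepreneurship Training Program; National Natural Science Foundation of China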
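Abstract: To address the problems of lip feature extraction and temporal-relation recognition in lip reading, this paper proposes a deep learning model that combines a bidirectional long short-term memory network (BiLSTM) with an attention mechanism. First, the heights and widths at different positions of the lips, computed from 20 lip key points, are used as the lip features. BiLSTM then encodes the lip feature sequence over time, and the attention mechanism learns how much weight the temporal lip features at each time step carry in the overall recognition. Finally, Softmax performs the classification. The model is compared experimentally with traditional lip-reading models on the public lip-reading datasets GRID and MIRACL-VC. Accuracy improves by at least 13.4% on GRID, by at least 15.3% on the MIRACL-VC word dataset, and by at least 9.2% on the MIRACL-VC phrase dataset. Comparisons with other encoding models were also carried out, and the results show that the model effectively improves lip-reading accuracy.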

Keywords: lip reading  bidirectional long short-term memory network  attention mechanism  deep learning  temporal encoding
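
As a rough illustration of the feature step described in the abstract above, the sketch below turns one frame of 20 lip key points into a vector of heights and widths. The abstract does not say which point pairs are measured or how the points are ordered, so the index pairs, the contour ordering, the use of Euclidean distance, and the function names `lip_features` and `sequence_features` are all assumptions made for illustration.

```python
import math

# Hypothetical index pairs over the 20 lip key points. The ordering assumed
# here is 0-11 around the outer contour (0 = left corner, 6 = right corner)
# and 12-19 around the inner contour (12 = left corner, 16 = right corner).
HEIGHT_PAIRS = [(1, 11), (2, 10), (3, 9), (4, 8), (5, 7),   # outer lip
                (13, 19), (14, 18), (15, 17)]               # inner lip
WIDTH_PAIRS = [(0, 6), (12, 16)]                            # outer, inner


def lip_features(points):
    """Return heights followed by widths for one frame of 20 (x, y) points."""
    if len(points) != 20:
        raise ValueError("expected 20 lip key points")
    # Assumption: each height or width is the Euclidean distance of a pair.
    heights = [math.dist(points[i], points[j]) for i, j in HEIGHT_PAIRS]
    widths = [math.dist(points[i], points[j]) for i, j in WIDTH_PAIRS]
    return heights + widths


def sequence_features(frames):
    """Map a video (a list of frames) to a list of per-frame feature vectors."""
    return [lip_features(frame) for frame in frames]
```

Under these assumptions each frame yields a 10-dimensional vector, and the resulting per-frame sequence is the kind of input a sequence encoder such as BiLSTM would consume.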

Research on Lip-reading Based on BiLSTM-Attention
LIU Da-yun, FANG Guo-zhi, LUO Tian-yi, WEI Hua-jie, WANG Qian, LI Xiu-zheng, LI Ao. Research on Lip-reading Based on BiLSTM-Attention[J]. Computing Technology and Automation, 2020, 39(1): 150-155.
Authors: LIU Da-yun  FANG Guo-zhi  LUO Tian-yi  WEI Hua-jie  WANG Qian  LI Xiu-zheng  LI Ao
Affiliation: (School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China; School of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China; School of Automation, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China)
Abstract: To address existing problems in lip feature extraction and temporal relation recognition in lip-reading, a deep learning model combining bi-directional long short-term memory (BiLSTM) with an attention mechanism is proposed. First, the heights and widths at different positions of the lip, obtained from 20 lip key points, are taken as the lip features. Second, the BiLSTM model encodes the temporal information of the lip feature sequence. Third, the attention mechanism learns the different weights that the lip sequential features at different times contribute to the overall recognition. Finally, a Softmax classifier performs the classification. Compared with conventional lip-reading models on the lip-reading datasets GRID and MIRACL-VC, the recognition accuracy is at least 13.4% higher on GRID, at least 15.3% higher on the MIRACL-VC word dataset, and at least 9.2% higher on the MIRACL-VC phrase dataset. Comparisons with other encoding models likewise show that the model effectively improves lip-reading accuracy.
Keywords: lip-reading  bi-directional long short-term memory  attention mechanism  deep learning  sequential coding
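
The attention and Softmax steps named in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the abstract does not give the scoring function, so the form used here (a learned vector `w` dotted with `tanh` of each hidden state) is one common choice assumed for the example, and `attention_pool`, `classify`, and all parameter names are hypothetical. The `hidden_states` argument stands in for the per-time-step BiLSTM outputs, which are not computed here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, w):
    """Weight each time step and return (weights, weighted sum of states).

    hidden_states: list of T vectors, one per time step (assumed to be the
    encoder outputs). w: scoring vector of the same dimension (assumption).
    """
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h))
              for h in hidden_states]
    alphas = softmax(scores)  # one weight per time step, summing to 1
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states))
               for d in range(dim)]
    return alphas, context


def classify(context, class_weights, class_bias):
    """Linear layer followed by softmax, giving class probabilities."""
    logits = [sum(wi * ci for wi, ci in zip(row, context)) + b
              for row, b in zip(class_weights, class_bias)]
    return softmax(logits)
```

With `w` set to all zeros every time step receives the same weight and the context vector reduces to the mean of the hidden states; a trained `w` shifts weight toward the time steps that matter more for the prediction.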
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).