期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Learning discriminative motion feature for enhancing multi-modal action recognition

《Journal of Visual Communication and Image Representation》2021

Video action recognition is an important topic in computer vision tasks. Most of the existing methods use CNN-based models, and multiple modalities of image features are captured from the videos, such as static frames, dynamic images, and optical flow features. However, these mainstream features contain much static information including object and background information, where the motion information of the action itself is not distinguished and strengthened. In this work, a new kind of motion feature is proposed without static information for video action recognition. We propose a quantization of motion network based on the bag-of-feature method to learn significant and discriminative motion features. In the learned feature map, the object and background information is filtered out, even if the background is moving in the video. Therefore, the motion feature is complementary to the static image feature and the static information in the dynamic image and optical flow. A multi-stream classifier is built with the proposed motion feature and other features, and the performance of action recognition is enhanced comparing to other state-of-the-art methods. 相似文献

2.

基于3D ResNet-LSTM的多视角人体动作识别方法

杨思佳辛山刘悦张雷《电讯技术》2023,23(6)

在基于视频图像的动作识别中,由于固定视角相机所获取的不同动作视频存在视角差异,会造成识别准确率降低等问题。使用多视角视频图像是提高识别准确率的方法之一,提出基于三维残差网络（3D Residual Network,3D ResNet）和长短时记忆（Long Short-term Memory,LSTM）网络的多视角人体动作识别算法,通过3D ResNet学习各视角动作序列的融合时空特征,利用多层LSTM网络继续学习视频流中的长期活动序列表示并深度挖掘视频帧序列之间的时序信息。在NTU RGB+D 120数据集上的实验结果表明,该模型对多视角视频序列动作识别的准确率可达83.2%。相似文献

3.

Image label transfer: Short video labelling by using frame auto-encoder

吕朝辉黄诣洋《中国邮电高校学报(英文版)》2020,27(1):92-99

The number of short videos on the Internet is huge, but most of them are unlabeled. In this paper, a rough labelling method of short video based on the neural network of image classification is proposed. Convolutional auto-encoder is applied to train and learn unlabeled video frames, in order to obtain the feature in certain level of the network. Using these features, we extract key-frames of the video by our method of feature clustering. We put these key-frames which represent the video content into the image classification network, so that we can get the labels for every video clip. We also compare the different architectures of convolutional auto-encoder, while optimizing and selecting the better performance architecture through our experiment result. In addition, the video frame feature from the convolutional auto-encoder is compared with those features from other extraction methods. On the whole, this paper propose a method of image labels transferring for the realization of short video rough labelling, which can be applied to the video classes with few labeled samples. 相似文献

4.

Video description with subject,verb and object supervision

Wang Yue Liu Jinlai Wang Xiaojie 《中国邮电高校学报(英文版)》2019,26(2):52-58

相似文献

5.

融合CNN与LSTM的网络质量KQI数据特征提取与投诉预警

蒋仕宝杜翠凤聂丹彤《移动通信》2020,(2):69-75

为解决传统网络质量KQI数据难以提取有效特征的问题,提出一种融合CNN和LSTM的网络质量KQI数据特征提取与预测方法。首先,分别采用CNN和LSTM获取KQI数据的特征表述和隐含层特征向量;然后引入Soft Attention Model来获得注意力分配概率分布;再将注意力分配概率分布与隐含层特征向量加权求和得到融合特征表达,从而得到数据的融合特征表达--空间维度和时间维度,并以多步预测的方法验证融合特征的有效性。研究表明,本文提出的算法能够有效预测、定位用户投诉问题,网优部门可根据实时的诊断结果,结合设备优化充分改善现有的网络质量,实现网络质量的主动干预,提升用户满意度。相似文献

6.

一种基于SVM主动学习的卡通视频检测方法

高新波田春娜张娜《电子与信息学报》2007,29(6):1338-1342

通过分析卡通与非卡通视频在视觉上的差异,对视频片断提取了MPEG-7描述子等8组视觉特征来构造卡通视频的特征空间;并将主动相关反馈技术引入到支撑向量机(SVM)算法中,设计了一种基于主动学习的卡通视频检测分类方法。利用大量实际视频片断所做的测试实验结果表明,该文选取的特征对卡通和非卡通视频有较好的区分能力;且与单纯的SVM算法以及传统相关反馈和SVM算法结合的方法相比,该文算法在检测性能上有较大的优势。相似文献

7.

基于显著特征增强的跨模态视频片段检索

杨金福刘玉斌宋琳闫雪《电子与信息学报》2022,44(12):4395-4404

随着视频获取设备和技术的不断发展,视频数量增长快速,在海量视频中精准查找目标视频片段是具有挑战的任务。跨模态视频片段检索旨在根据输入一段查询文本,模型能够从视频库中找出符合描述的视频片段。现有的研究工作多是关注文本与候选视频片段的匹配,忽略了视频上下文的“语境”信息,在视频理解时,存在对特征关系表达不足的问题。针对此,该文提出一种基于显著特征增强的跨模态视频片段检索方法,通过构建时间相邻网络学习视频的上下文信息,然后使用轻量化残差通道注意力突出视频片段的显著特征,提升神经网络对视频语义的理解能力。在公开的数据集TACoS和ActivityNet Captions的实验结果表明,该文所提方法能更好地完成视频片段检索任务,比主流的基于匹配的方法和基于视频-文本特征关系的方法取得了更好的表现。相似文献

8.

A novel recurrent hybrid network for feature fusion in action recognition

《Journal of Visual Communication and Image Representation》2017

Action recognition in video is one of the most important and challenging tasks in computer vision. How to efficiently combine the spatial-temporal information to represent video plays a crucial role for action recognition. In this paper, a recurrent hybrid network architecture is designed for action recognition by fusing multi-source features: a two-stream CNNs for learning semantic features, a two-stream single-layer LSTM for learning long-term temporal feature, and an Improved Dense Trajectories (IDT) stream for learning short-term temporal motion feature. In order to mitigate the overfitting issue on small-scale dataset, a video data augmentation method is used to increase the amount of training data, as well as a two-step training strategy is adopted to train our recurrent hybrid network. Experiment results on two challenging datasets UCF-101 and HMDB-51 demonstrate that the proposed method can reach the state-of-the-art performance. 相似文献

9.

基于改进SBR算法的人脸特征点稳定检测

王宇胡哲昊涂晓光刘建华蒋涛许将军原子昊杜金花《电讯技术》2023,63(5):719-724

基于图像的特征点检测器在静态图像上取得了卓越的性能,然而这些方法应用于视频或序列图像时其精度和稳定性显著降低。配准监督(Supervision-by-Registration, SBR)算法利用光流算法(Lucas-Kanade, LK)追踪,可通过无标注视频训练针对视频的特征点检测器,已取得较好的结果,但LK算法仍存在一定局限性,导致检测的特征点序列在时空上的连贯性不强。为获得精准、稳定、连贯的人脸特征点序列检测效果,提出了平滑一致性损失函数、权重掩码函数对传统SBR网络模型进行改进。网络中添加长短期记忆网络(Long Short-Term Memory, LSTM)提高模型训练鲁棒性,在模型训练中使用平滑一致性损失函数提供稳定性约束,获得准确且稳定的人脸视频特征点检测器。在300VW、Youtube Celebrities数据集上的验证显示,SBR改进模型将人脸视频特征点检测的标准化平均误差(Normalized Mean Error, NME)从4.74降低至4.56,且视觉上人脸特征点检测的抖动显著减少。相似文献

10.

Gesture recognition for human–machine interaction in table tennis video based on deep semantic understanding

《Signal Processing: Image Communication》2020

The analysis of moving objects in videos, especially the recognition of human motions and gestures, is attracting increasing emphasis in computer vision area. However, most existing video analysis methods do not take into account the effect of video semantic information. The topological information of the video image plays an important role in describing the association relationship of the image content, which will help to improve the discriminability of the video feature expression. Based on the above considerations, we propose a video semantic feature learning method that integrates image topological sparse coding with dynamic time warping algorithm to improve the gesture recognition in videos. This method divides video feature learning into two phases: semi-supervised video image feature learning and supervised optimization of video sequence features. Next, a distance weighting based dynamic time warping algorithm and K-nearest neighbor algorithm is leveraged to recognize gestures. We conduct comparative experiments on table tennis video dataset. The experimental results show that the proposed method is more discriminative to the expression of video features and can effectively improve the recognition rate of gestures in sports video. 相似文献

11.

基于三维卷积和哈希方法的视频检索算法

陈汗青李菲菲陈虬《电子科技》2022,35(4):35-39

视频信息检索与其他多媒体检索的最大不同在于视频信息量较大,因此进行视频间相似度计算时的计算量较大。此外,对视频特征的提取中常常忽略视频帧之间的时间相关性,从而导致特征提取不充分,影响视频检索的精度。为此,文中提出基于三维卷积和哈希方法的视频检索方法。该方法构建了一个端到端的框架,使用三维卷积神经网络来提取视频中代表帧的特征,并将视频特征映射到低维的汉明空间中去,在汉明空间计算相似度。在两个视频数据集下的实验结果表明,相较于当前最新的视频检索算法,文中所提方法在精度上有较大的提升。相似文献

12.

Video summarization via global feature difference optimization

ZHANG Yunzuo LIU Yameng 《光电子快报》2023,19(9):570-576

Video summarization aims at selecting valuable clips for browsing videos with high efficiency. Previous approaches typically focus on aggregating temporal features while ignoring the potential role of visual representations in summarizing videos. In this paper, we present a global difference-aware network (GDANet) that exploits the feature difference across frame and video as guidance to enhance visual features. Initially, a difference optimization module (DOM) is devised to enhance the discriminability of visual features, bringing gains in accurately aggregating temporal cues. Subsequently, a dual-scale attention module (DSAM) is introduced to capture informative contextual information. Eventually, we design an adaptive feature fusion module (AFFM) to make the network adaptively learn context representations and perform feature fusion effectively. We have conducted experiments on benchmark datasets, and the empirical results demonstrate the effectiveness of the proposed framework. 相似文献

13.

融入注意力机制的深度学习动作识别

张宇张雷《电讯技术》2021,61(10):1205-1212

针对现有的深度学习方法在人体动作识别中易出现过拟合、易受到干扰信息影响、特征表达能力不足的问题,提出了一种融入注意力机制的深度学习动作识别方法.该方法在数据预处理中提出了视频数据增强算法,降低了模型过拟合的风险,然后在视频帧采样过程中对现有的采样算法进行了改进,有效抑制了干扰信息的影响,并在特征提取部分提出了融入注意力的残差网络,提高了模型的特征提取能力;之后,利用长短时记忆(Long Short-Term Memory,LSTM)网络解决了空间特征的时序关联问题;最后,通过Softmax完成了相应动作的分类.实验结果表明,在UCF YouTube、KTH和HMDB-51数据集上,所提方法的识别率分别为96.72％、98.06％和64.81％. 相似文献

14.

Sign language recognition based on global-local attention

《Journal of Visual Communication and Image Representation》2021

相似文献

15.

DeepRD:LSTM-based Siamese network for Android repackaged applications detection

Run WANG Benxiao TANG Li’na WANG 《通信学报》2018,39(8):69-82

The state-of-art techniques in Android repackaging detection relied on experts to define features,however,these techniques were not only labor-intensive and time-consuming,but also the features were easily guessed by attackers.Moreover,the feature representation of applications which defined by experts cannot perform well to the common types of repackaging detection,which caused a high false negative rate in the real detection scenario.A deep learning-based repackaged applications detection approach was proposed to learn the program semantic features automatically for addressing the above two issues.Firstly,control and data flow analysis were taken for applications to form a sequence feature representation.Secondly,the sequence features were transformed into vectors based on word embedding model to train a Siamese LSTM network for automatically program feature learning.Finally,repackaged applications were detected based on the similarity measurement of learned program features.Experimental results show that the proposed approach achieves a precision of 95.7% and false negative rate of 6.2% in an open sourced dataset AndroZoo. 相似文献

16.

基于LSTM网络和特征融合的通信干扰识别

魏迪曾海彬洪锋马松袁田《电讯技术》2022,62(4):450-456

针对现有通信干扰信号识别方法识别效果不佳的问题,提出了一种基于长短时记忆网络(Long Short-Term Memory,LSTM)和特征融合的通信干扰识别方法.该方法利用LSTM网络提取干扰信号的特征,通过LSTM强大的序列特征提取能力提升干扰信号特征提取的性能;通过提取信号的时域和频域特征后进行特征融合,使用全连... 相似文献

17.

基于深度卷积与全局特征的图像密集字幕描述

下载免费PDF全文

武文博顾广华刘青茹赵志明李刚《信号处理》2020,36(9):1525-1532

为了解决图像密集字幕描述中感兴趣区域(Regions of interest,ROI)定位不准确与区域粗粒度描述问题,本文提出了一种基于深度卷积与全局特征的图像密集字幕描述算法,该算法采用残差网络与并行LSTM(Long Short Term Memory)网络的联合模型对存在的区域重叠定位和粗粒度描述细节信息不完整问题进一步改进。首先利用深度残差网络与Faster R-CNN(Faster R-Convolutional Neural Network)的RPN(Regional Proposal Network)层获取更精准区域边界框,以便避免区域标记重叠;然后将全局特征、局部特征和上下文特征信息分别输入并行LSTM网络且采用融合算子将三种不同输出整合以获得最终描述语句。通过在公开数据集上与两种主流算法对比表明本文模型具有一定优越性。相似文献

18.

融合空间-时间双网络流和视觉注意的人体行为识别

刘天亮谯庆伟万俊伟戴修斌罗杰波《电子与信息学报》2018,40(10):2395-2401

该文受人脑视觉感知机理启发,在深度学习框架下提出融合时空双网络流和视觉注意的行为识别方法。首先,采用由粗到细Lucas-Kanade估计法逐帧提取视频中人体运动的光流特征。然后,利用预训练模型微调的GoogLeNet神经网络分别逐层卷积并聚合给定时间窗口视频中外观图像和相应光流特征。接着,利用长短时记忆多层递归网络交叉感知即得含高层显著结构的时空流语义特征序列;解码时间窗口内互相依赖的隐状态;输出空间流视觉特征描述和视频窗口中每帧标签概率分布。其次,利用相对熵计算时间维每帧注意力置信度,并融合空间网络流感知序列标签概率分布。最后,利用softmax分类视频中行为类别。实验结果表明,与其他现有方法相比,该文行为识别方法在分类准确度上具有显著优势。相似文献

19.

Temporal matching prior network for vehicle license plate detection and recognition in videos

Seok Bong Yoo Mikyong Han 《ETRI Journal》2020,42(3):411-419

In real‐world intelligent transportation systems, accuracy in vehicle license plate detection and recognition is considered quite critical. Many algorithms have been proposed for still images, but their accuracy on actual videos is not satisfactory. This stems from several problematic conditions in videos, such as vehicle motion blur, variety in viewpoints, outliers, and the lack of publicly available video datasets. In this study, we focus on these challenges and propose a license plate detection and recognition scheme for videos based on a temporal matching prior network. Specifically, to improve the robustness of detection and recognition accuracy in the presence of motion blur and outliers, forward and bidirectional matching priors between consecutive frames are properly combined with layer structures specifically designed for plate detection. We also built our own video dataset for the deep training of the proposed network. During network training, we perform data augmentation based on image rotation to increase robustness regarding the various viewpoints in videos. 相似文献

20.

基于混合结构神经网络的自适应背景模型 总被引：1，自引：0，他引：1

下载免费PDF全文

王志明张丽包宏《电子学报》2011,39(5):1053-1058

本文提出一种基于神经网络的视频中运动目标检测自适应背景模型.对每个像素点(或局部区域)建立一个混合结构的神经网络背景模型,模型由一个4层前馈神经网络组成,输入层接受像素HSV特征,特征层实现特征提取功能,模式层以概率神经网络的方式完成像素属于背景概率的计算,输出层以赢者取胜的方式完成前景背景分类和模式层激活节点选择功能... 相似文献