Found 20 similar documents; search took 93 ms.
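To better model the long-term temporal information of human actions, a human action recognition algorithm that combines temporal dynamic images with a two-stream convolutional network is proposed. First, a bidirectional rank pooling algorithm is used to construct temporal dynamic images, mapping the video from three-dimensional space to two-dimensional space in order to extract the appearance and long-term temporal information of actions. Then a two-stream convolutional network based on InceptionV3 is proposed, comprising an appearance and long-term motion stream and a short-term motion stream, which take temporal dynamic images and ... respectively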
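As a rough illustration of how a video can be collapsed into dynamic images, the sketch below uses the closed-form weights of approximate rank pooling and applies them once in forward and once in reverse frame order. This is an assumption made for illustration: the abstract does not say whether the authors use approximate or exact rank pooling, and the function names here are hypothetical.

```python
def rank_pooling_weights(num_frames):
    """Approximate rank pooling weights:
    alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), with H_t the t-th harmonic number."""
    harmonic = [0.0]
    for i in range(1, num_frames + 1):
        harmonic.append(harmonic[-1] + 1.0 / i)
    T = num_frames
    return [2 * (T - t + 1) - (T + 1) * (harmonic[T] - harmonic[t - 1])
            for t in range(1, T + 1)]


def dynamic_image(frames):
    """Weighted sum of frames; each frame is a flat list of pixel values."""
    weights = rank_pooling_weights(len(frames))
    num_pixels = len(frames[0])
    return [sum(w * frame[i] for w, frame in zip(weights, frames))
            for i in range(num_pixels)]


def bidirectional_dynamic_images(frames):
    """Forward and backward dynamic images (assumed meaning of 'bidirectional')."""
    return dynamic_image(frames), dynamic_image(frames[::-1])
```

The weights sum to zero, so a completely static clip maps to an all-zero image, and later frames receive larger weights than earlier ones.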

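Temporal action detection, a fundamental task in video understanding, is widely used in human-computer interaction, video surveillance, intelligent security and other fields. An improved encoder-decoder temporal action detection algorithm based on convolutional neural networks is proposed. The improved algorithm works in two stages. First, the feature extraction network is replaced, and a residual network is used to extract deep features from video frames. Next, an encoder-decoder temporal convolutional network is built: features are fused by concatenation, the form of upsampling is improved, and the new activation function LReLU is used for training, which raises the detection accuracy of the network. Experimental results show that the proposed algorithm achieves good results on the temporal action detection datasets MERL Shopping and GTEA.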

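Wang Qingwen, Hu Haiyang. 《电子科技》 (Electronic Science and Technology), 2021, 34(8): 14-18, 86
In intelligent manufacturing environments, workflow recognition methods based on action recognition struggle to locate the start and end times of workflow activities in video. To localize workflow activities temporally in video, this paper improves the R-C3D network model and proposes a workflow recognition method based on temporal action detection. The proposed method adopts a random sparse sampling strategy to reduce redundancy between adjacent video frames, and uses a Res3D network to extract video...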
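One common way to realize random sparse sampling is to split the video into equal segments and draw one random frame index from each. The sketch below assumes that scheme; the abstract does not give the exact strategy, and `sparse_sample` is a hypothetical name.

```python
import random


def sparse_sample(num_frames, num_segments, rng=random):
    """Pick one random frame index from each of num_segments equal-length segments."""
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    # Segment i covers frame indices [bounds[i], bounds[i + 1]).
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return [rng.randrange(bounds[i], bounds[i + 1]) for i in range(num_segments)]
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible, which helps when comparing runs.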

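Chen Ying, Gong Suming. 《电子与信息学报》 (Journal of Electronics & Information Technology), 2021, 43(12): 3538-3545
Existing channel attention mechanisms apply global average pooling directly to each channel and thereby ignore local spatial information. To address this problem in the context of human action recognition, this paper proposes two improved channel attention modules: a matrix-operation-based spatio-temporal (ST) interaction module and a depthwise separable convolution (DS) module. The ST module extracts a sequence of spatio-temporally weighted information for each channel through convolution and dimension conversion, and then obtains the attention weight of each channel by convolution. The DS module first uses depthwise separable convolution to capture the local spatial information of each channel, then compresses the channel size so that it has a global receptive field, and then obtains the attention weight of each channel by convolution, completing feature recalibration under the channel attention mechanism. The improved attention modules are inserted into a base network and evaluated on the common human action recognition datasets UCF101 and HMDB51, where they improve accuracy.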

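Research on a 3D multi-branch aggregation lightweight network algorithm for video action recognition   Total citations: 1 (self-citations: 0, citations by others: 1)
To build a video action recognition model that has the speed of a 2D neural network while retaining the performance of a 3D neural network, a 3D multi-branch aggregation lightweight network action recognition algorithm is proposed. First, grouped convolution is used to split the neural network into multiple branches. Second, to promote information flow between branches, a multiplexing module with an information aggregation function is added. Finally, an adaptive attention mechanism is introduced to redirect channel and spatio-temporal information. Experiments show that the algorithm has a computational cost of 11.5 GFLOPs and an accuracy of 96.2% on the UCF101 dataset, and a computational cost of 11.5 GFLOPs and an accuracy of 74.7% on the HMDB51 dataset. Compared with other action recognition algorithms, it improves the efficiency of video recognition networks and shows certain advantages in recognition speed and accuracy.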
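Why grouped convolution makes a 3D network lighter can be seen from a simple weight count. The sketch below counts the weights of a single 3D convolution layer with and without groups; the layer sizes are illustrative assumptions and are not taken from the paper.

```python
def conv3d_weight_count(c_in, c_out, kernel, groups=1):
    """Number of weights (bias ignored) in a 3D convolution with the given groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    kt, kh, kw = kernel
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * kt * kh * kw


dense = conv3d_weight_count(64, 64, (3, 3, 3))               # 110592
grouped = conv3d_weight_count(64, 64, (3, 3, 3), groups=4)   # 27648
```

With four groups the layer keeps a quarter of the weights, but the branches no longer exchange information, which is the gap the multiplexing module described above is meant to fill.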

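Zhou Fan, Zhao Xuan, Shao Jie. 《电子科技》 (Electronic Science and Technology), 2022, (8): 7-13
The power amplifier is the core component of an emitter's transmitter, and its behavior is highly nonlinear with strong memory effects, which makes behavioral modeling of power amplifiers difficult. To address this problem, this paper proposes a power amplifier behavioral modeling method based on a deep temporal convolutional network. The neural network model consists of several multi-dimensional temporal convolution blocks, each composed of several causal dilated convolutions, which enlarge the receptive field of the network, and a residual structure, which improves the efficiency of gradient feedback. Through parallel convolution operations, the model overcomes the inability of traditional convolutional networks to handle variable-length sequences, and it improves modeling efficiency while preserving the memory characteristics of the power amplifier. Behavioral modeling results on measured data show that, compared with existing Volterra series and recurrent neural network modeling methods, the proposed method significantly improves modeling accuracy, and that it reduces run time by an order of magnitude compared with the recurrent neural network method.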
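The building block named here, the causal dilated convolution, can be sketched in a few lines. The code below is a minimal single-channel version together with the receptive field of a stack of such layers; it illustrates the general technique and is not the authors' multi-dimensional model, and the residual connection is omitted.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """y[t] = sum_j kernel[j] * x[t - j * dilation]; samples before t = 0 count as zero."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:  # never look at future samples
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked causal convolutions with the given dilations."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

Doubling the dilation per layer (1, 2, 4, 8) with kernel size 2 gives a receptive field of 16 samples from only four layers, which is how such networks cover long memory effects cheaply.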

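In the specific setting of subway stations, abnormal human behavior recognition fails to make effective use of the temporal information in inter-frame motion, which leads to low recognition accuracy. To address this problem, a deep residual long short-term two-stream network structure is proposed. RGB frames and consecutive optical flow frames are used as inputs to the two-stream network, and ResNet34 is used in each stream to extract low-level features: the spatial stream extracts motion appearance features and the temporal stream extracts optical flow motion information. The features are then fed into a long short-term memory (LSTM) network, which effectively learns the inter-frame temporal associations of spatial appearance and optical flow motion, and several weighted fusion strategies are used to strengthen recognition. Finally, the proposed network structure is validated on a subway station abnormal behavior dataset and compared with the original two-stream network: the recognition accuracy of the improved network increases by 4.7%, and the accuracy of the fused model increases by 12.9%. Experimental results show that the proposed method makes full use of temporal information, effectively improves abnormal behavior recognition accuracy, and still performs well in dim environments.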
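The simplest form of weighted fusion is a convex combination of the class scores from the two streams. The sketch below assumes that form; the abstract mentions several fusion strategies without specifying them, so the weight and function names are hypothetical.

```python
def fuse_scores(spatial, temporal, w_spatial=0.5):
    """Weighted average of per-class scores from the spatial and temporal streams."""
    if len(spatial) != len(temporal):
        raise ValueError("both streams must score the same classes")
    if not 0.0 <= w_spatial <= 1.0:
        raise ValueError("w_spatial must lie in [0, 1]")
    return [w_spatial * s + (1.0 - w_spatial) * t
            for s, t in zip(spatial, temporal)]


def predict(spatial, temporal, w_spatial=0.5):
    """Index of the class with the highest fused score."""
    fused = fuse_scores(spatial, temporal, w_spatial)
    return max(range(len(fused)), key=fused.__getitem__)
```

In practice the weight would be chosen on a validation set, since the more reliable stream can differ between scenes.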

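Hu Zhengping, Qiu Yue, Zhai Fengyun, Zhao Mengyao, Bi Shuai. 《信号处理》 (Journal of Signal Processing), 2021, 37(8): 1470-1478
During feature extraction, video action recognition algorithms fail to focus on the salient regions of video frames, which leads to unsatisfactory classification. To improve the network's ability to attend selectively, a multi-scale temporal video action recognition model incorporating attention mechanisms is proposed. Channel-spatial attention and channel attention modules are integrated into the long-term and short-term temporal networks respectively. Introducing attention lets the network redistribute weights during training and capture points of interest in video content and position, improving the network's...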

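Based on the characteristics of different sheep behaviors, this paper proposes a sheep behavior recognition method built on an improved convolutional neural network. A convolutional neural network (CNN) is constructed in which every convolution kernel is 3×3. Scaled exponential linear units (SELU) are used as the activation function, which gives the network a self-normalizing property. Max pooling is used for downsampling. Alpha dropout is applied in the fully connected layers to improve generalization, and a cosine annealing learning rate schedule is used for dynamic fine-tuning. A softmax classifier is used as the network output, yielding the final sheep behavior recognition model. Experimental results show that the method achieves a recognition accuracy of 90.30% for feeding, 94.16% for standing and 91.90% for lying. The model can monitor different sheep behaviors with high accuracy, which helps improve livestock management efficiency and the level of intelligent farming.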
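Two of the ingredients named here have compact closed forms. The sketch below writes out the SELU activation with its standard constants and a cosine annealing schedule without restarts; the schedule length and learning-rate bounds are illustrative assumptions, not values from the paper.

```python
import math

# Standard SELU constants (Klambauer et al., self-normalizing networks).
SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772


def selu(x):
    """Scaled exponential linear unit."""
    if x > 0:
        return SELU_LAMBDA * x
    return SELU_LAMBDA * SELU_ALPHA * (math.exp(x) - 1.0)


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate that decays from lr_max to lr_min along half a cosine period."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

The schedule starts at `lr_max`, passes through the midpoint of the two bounds halfway through training, and ends at `lr_min`.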

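Human action recognition is one of the active research topics in computer vision and pattern recognition. As an important branch of human action recognition, abnormal human behavior detection has received growing attention from both academia and industry in recent years. Research on human action recognition has progressed from an early reliance on human shape features, to feature detection based on gradient design, and now, with new advances in neural networks, to the widespread use of deep learning. Meanwhile, because the infrared band adapts to low-light environments and allows detection around the clock, human action recognition research based on this band has begun to emerge, and it is likely to become a new research focus in the field.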

Modeling and reasoning about the interactions between multiple entities (actors and objects) are beneficial for the action recognition task. In this paper, we propose a 3D Deformable Convolution Temporal Reasoning (DCTR) network to model and reason about the latent relationship dependencies between different entities in videos. The proposed DCTR network consists of a spatial modeling module and a temporal reasoning module. The spatial modeling module uses 3D deformable convolution to capture relationship dependencies between different entities in the same frame, while the temporal reasoning module uses Conv-LSTM to reason about the changes of multiple entity relationship dependencies in the temporal dimension. Experiments on the Moments-in-Time, UCF101 and HMDB51 datasets demonstrate that the proposed method outperforms several state-of-the-art methods.

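Liu Guiyu, Liu Peilin, Qian Jiuchao. 《信息技术》 (Information Technology), 2020, (5): 121-124, 130
Action recognition based on 3D skeletons has become an important means of human-computer interaction. To improve the accuracy of 3D action recognition, this paper proposes a two-stream neural network that fuses 3D skeleton features with 2D image features. One network processes the 3D skeleton sequence and the other processes 2D images, and the features of the two are then fused to improve recognition accuracy. Compared with action recognition using 3D skeletons alone, the proposed method achieves a large accuracy improvement on both the NTU_RGBD dataset and the SYSU dataset.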

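In conventional vehicle detection, deep models such as YOLO, SSD and RCNN all achieve good detection results. In autonomous driving systems, however, factors such as vehicle speed, heading and relative distance are very important, so two-dimensional vehicle detection is far from sufficient for understanding driving scenes. LiDAR point cloud data contain rich three-dimensional information about the environment, and 3D vehicle detection that fuses point cloud data with deep networks has become the direction of future development. This article presents a 3D vehicle detection method based on a point cloud network and a convolutional neural network. First, CRC and the input-size-dependent SDP technique are used to improve vehicle detection accuracy. Second, a point cloud network structure (PointNet) is used to process the point cloud data and perform 3D object detection. The study shows that the designed network structure has a considerable advantage in detection accuracy.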

The impedance of a semiconducting sheet of arbitrary boundary overlaid by a gate capacitance is analyzed as a function of frequency; the contacts are the gate contact and an ohmic contact at the sheet boundary. General asymptotic expressions for low and high frequency capacitance and for high frequency resistance are given. The low frequency equivalent series resistance is characterized by a form factor which depends on the configuration of the contact and can be determined experimentally from the frequencies at which equivalent circuit parameters extrapolated from the low and high frequency regimes intersect. The theoretical solution for the impedance pertaining to a rectangular boundary is given. The dependence of the impedance on the external a.c. circuit termination to a second gate contact is discussed for the case in which the semiconducting sheet is bounded by distributed gate capacitances on both sides.

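Wang Tong, Zhao Xinlin. 《通信学报》 (Journal on Communications), 2014, 35(Z2): 8-52
Existing network community detection methods take the community as the main unit and mechanically assign every node to some community. In real networks, assigning users with low activity greatly reduces detection accuracy, increases time complexity, and is of little practical value. The frog leaping algorithm is therefore combined with community detection: by ranking the frogs by performance, users with high activity are extracted, which improves detection accuracy. Experimental results show that the method performs well.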

Weakly supervised temporal action localization is a challenging computer vision problem that uses only video-level labels and lacks the supervision of temporal annotations. In this task, the majority of existing methods identify the most discriminative snippets and ignore other relevant snippets. To address this problem, we propose a deep feature enhancing and selecting network. It generates multiple masks, both to capture a more complete temporal interval of each action and to keep classification accuracy high. We further propose a novel selection strategy to balance the influence of the multiple masks and improve model performance. In the experiments, we evaluate the proposed method on the THUMOS’14 and ActivityNet datasets, and the results show the effectiveness of our approach for weakly supervised temporal action localization.

3D skeleton sequences contain more effective and discriminative information than RGB video and are more suitable for human action recognition. Accurate extraction of human skeleton information is the key to high action recognition accuracy. Considering the correlation between joint points, in this work we propose a skeleton feature extraction method based on complex networks. The relationship between human skeleton points in each frame is coded as a network, and the changes of an action over time are described by a time series of networks composed of skeleton points. Network topology attributes are used as feature vectors, and complex network coding is combined with an LSTM to recognize human actions. The method was verified on the NTU RGB+D 60, MSR Action3D and UTKinect-Action3D datasets and achieved good performance on each. This shows that extracting skeleton features with complex networks can properly identify different actions. Because the method considers temporal information and the relationship between skeleton points at the same time, it plays an important role in the accurate recognition of human actions.
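To make the idea of using network topology attributes as features concrete, the sketch below builds one network per frame by linking joints that lie within a distance threshold and summarizes it with degree and clustering statistics. The edge rule, the chosen attributes and all names are assumptions for illustration; the abstract does not specify how the authors construct the network.

```python
import math
from itertools import combinations


def build_network(joints, radius):
    """Adjacency sets linking every pair of 3D joints at most `radius` apart."""
    adj = [set() for _ in joints]
    for i, j in combinations(range(len(joints)), 2):
        if math.dist(joints[i], joints[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def topology_features(adj):
    """[mean degree, max degree, mean local clustering coefficient]."""
    n = len(adj)
    degrees = [len(neighbours) for neighbours in adj]
    clustering = []
    for i in range(n):
        k = degrees[i]
        if k < 2:
            clustering.append(0.0)
            continue
        # Count edges among the neighbours of node i.
        links = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
        clustering.append(2.0 * links / (k * (k - 1)))
    return [sum(degrees) / n, max(degrees), sum(clustering) / n]
```

Computing this vector for every frame yields a feature sequence that a recurrent model such as an LSTM could consume.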

Weakly supervised temporal action localization (WSTAL) is crucial for real-world applications, as it relieves the huge burden of the frame-level annotations required for fully supervised action detection. Most existing WSTAL methods focus on classifying video snippets or detecting action boundaries. However, the predictions from these well-designed models have not been fully utilized. Accordingly, we propose a weakly supervised framework called the progressive enhancement network (PEN), which takes full advantage of the predictions generated by preceding models to enhance subsequent models. Specifically, snippet-level pseudo labels are generated from the preceding predictions by considering the similarity and temporal distance between action snippets. Subsequent models are then progressively enhanced by using the pseudo labels as supervision and by utilizing their underlying semantics to make the feature representation better suited to the temporal localization task. Extensive experiments on two popular benchmarks, THUMOS’14 and ActivityNet v1.2, demonstrate the effectiveness of our method.

LiDAR-based 3D object detection is important for scene perception in autonomous driving, but point clouds produced by LiDAR are irregular and unstructured in nature and cannot be processed directly by conventional Convolutional Neural Networks (CNN). Recently, Graph Convolutional Networks (GCN) have proved to be an ideal way to handle non-Euclidean structured data, including point clouds. However, GCN involves massive computation for searching adjacent nodes, and this heavy computational cost limits its application to large-scale LiDAR point clouds in autonomous driving. In this work, we adopt a frustum-based point cloud-image fusion scheme to reduce the number of LiDAR points, thus making GCN-based feature learning on large-scale LiDAR point clouds feasible. On this basis, we propose an efficient graph attentional network for 3D object detection in autonomous driving, which learns features directly from the raw LiDAR point cloud without any conversion. We evaluate the model on the public KITTI benchmark dataset: the 3D detection mAP is 63.72% on KITTI Cars, Pedestrians and Cyclists, and the inference speed reaches 7.9 fps on a single GPU, which is faster than other methods of the same type.
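The frustum idea can be illustrated with a simple filter: keep only the points whose image projection falls inside a 2D detection box. The sketch below assumes the points are already expressed in the camera frame and uses a plain pinhole model; the intrinsics, box and function name are hypothetical, and a real pipeline would first apply the LiDAR-to-camera calibration.

```python
def points_in_frustum(points, box, fx, fy, cx, cy):
    """Keep camera-frame points (x, y, z) that project inside box = (u_min, v_min, u_max, v_max)."""
    u_min, v_min, u_max, v_max = box
    kept = []
    for x, y, z in points:
        if z <= 0:  # behind the camera, cannot be seen
            continue
        u = fx * x / z + cx  # pinhole projection to pixel coordinates
        v = fy * y / z + cy
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append((x, y, z))
    return kept
```

Only the surviving points need to enter the graph construction step, which is what reduces the neighbour-search cost described above.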

3D Human Pose Reconstruction (HPR) is a challenging task due to the limited availability of 3D ground truth data and projection ambiguity. To address these limitations, we propose a three-stage deep network in which 2D Human Pose Estimation (HPE) is followed by 3D HPR, and which uses the proposed Frame Specific Pose Estimation (FSPE), Multi-Stage Cascaded Feature Connection (MSCFC) and Feature Residual Connection (FRC) sub-level strategies. In the first stage, the FSPE concept is used with the MSCFC strategy for 2D HPE. In the second stage, basic deep learning components such as convolution, batch normalization, ReLU and dropout are used with the FRC strategy for spatial 3D reconstruction. In the last stage, an LSTM architecture is used for temporal refinement. The effectiveness of the technique is demonstrated on the MPII, Human3.6M and HumanEva-I datasets. The experiments show that the proposed method gives results competitive with recent state-of-the-art techniques.
