Similar articles
20 similar articles retrieved (search time: 406 ms)
1.
Using computer vision and deep learning (e.g., Convolutional Neural Networks) to automatically recognise unsafe behaviour from digital images can help managers identify and respond quickly to such actions and mitigate an adverse event. However, computer vision studies in construction have tended to focus solely on detecting unsafe behaviour (i.e., object detection) or regions of interest with pre-defined labels. Moreover, such approaches have been unable to consider the rich semantic information shared among multiple unsafe actions in a digital image. The research we present in this paper uses a safety-rule query to determine and locate several unsafe behaviours in a digital image by employing a visual grounding approach. Our approach consists of: (1) visual and text feature extraction; (2) a recursive sub-query; and (3) generation of the bounding box. We validate our approach by conducting an experiment to demonstrate its effectiveness. The results show that the average precision, recall, and F1-score were 0.55, 0.85, and 0.65, respectively, suggesting our approach can accurately identify and locate different types of unsafe behaviours in digital images acquired from a construction site.
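The abstract above reports precision, recall, and F1 for localized bounding boxes; such scores are normally obtained by matching predicted boxes to ground-truth boxes with an intersection-over-union (IoU) test. The paper's evaluation code is not given, so the following is only a minimal sketch of the standard IoU computation, with the 0.5 matching threshold assumed rather than taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A predicted box is typically counted as correct when IoU >= 0.5 (assumed threshold).
print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # -> 0.1428..., below the threshold
```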

2.
Predicting unsafe behaviour in advance can enable remedial measures to be put in place to mitigate likely accidents on construction sites. Prevailing safety studies in construction tend to be retrospective and focus on examining the conditions that contribute to unsafe behaviour from a psychological perspective. While such studies are warranted, they cannot visually capture the dynamic and complex conditions that influence unsafe behaviour. In this paper, we aim to help fill this void by combining computer vision with Long Short-Term Memory (LSTM) networks to predict unsafe behaviours from videos automatically. Our proposed approach for predicting unsafe behaviour is based on: (1) tracking people using SiamMask; (2) predicting the trajectory of people using an improved Social-LSTM; and (3) predicting unsafe behaviour using Franklin's point-in-polygon (PNPoly) algorithm. We use the Wuhan metro project as a case to evaluate our approach's feasibility and effectiveness. Our adopted SiamMask method outperforms current techniques used for tracking people. Additionally, our improved Social-LSTM achieves higher accuracy on trajectory prediction than other methods (e.g., Social-GAN). The research findings demonstrate that our computer vision approach can be used to accurately predict unsafe behaviour on construction sites.
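The third step above relies on Franklin's point-in-polygon (PNPoly) test to decide whether a predicted position falls inside a hazard zone. The sketch below shows the standard ray-casting form of that test; the hazard-zone coordinates and query point are made-up values for illustration, not the paper's data.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting (PNPoly-style) test: True if (x, y) lies inside `polygon`,
    given as a list of (px, py) vertices."""
    inside = False
    n = len(polygon)
    j = n - 1
    for i in range(n):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Count crossings of a horizontal ray cast to the right of (x, y).
        if ((yi > y) != (yj > y)) and (x < (xj - xi) * (y - yi) / (yj - yi) + xi):
            inside = not inside
        j = i
    return inside

# Hypothetical hazard zone and predicted worker position (illustrative values only).
hazard_zone = [(0, 0), (10, 0), (10, 6), (0, 6)]
print(point_in_polygon(3.5, 2.0, hazard_zone))  # True -> would flag unsafe behaviour
```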

3.
Recent advances in the field of computer vision can be attributed to the emergence of deep learning techniques, in particular convolutional neural networks. Neural networks, partially inspired by the brain's visual cortex, enable a computer to "learn" the most important features of the images it is shown in relation to a specified task. Given sufficient data and time, (deep) convolutional neural networks offer more easily designed, more generalizable, and significantly more accurate end-to-end systems than was possible with previously employed computer vision techniques. This review paper provides an overview of deep learning in the field of computer vision with an emphasis on recent progress in tasks involving 3D visual data. Through the backdrop of the mammalian visual processing system, we hope to also provide inspiration for future advances in automated visual processing.

4.
Stratified 3D reconstruction, a layer-by-layer 3D reconstruction that is upgraded from projective to affine and then to the final metric reconstruction, is a well-known 3D reconstruction method in computer vision. It is also a key supporting technology for various well-known applications, such as streetview, smart3D, and oblique photogrammetry. Generally speaking, the existing computer vision methods in the literature can be roughly classified into either geometry-based approaches for spatial vision or learning-based approaches for object vision. Although deep learning has demonstrated tremendous success in object vision in recent years, learning 3D scene reconstruction from multiple images remains rare, if not non-existent, apart from work on learning depth from single images. This study explores the feasibility of learning stratified 3D reconstruction from putative point correspondences across images, and assesses whether such learning can be as robust to matching outliers as the traditional geometry-based methods are. A special parsimonious neural network is designed for the learning. Our results show that it is indeed possible to learn a stratified 3D reconstruction from noisy image point correspondences, and the learnt reconstructions appear satisfactory, although they are still not on a par with the state of the art in the structure-from-motion community, largely due to the lack of an explicit robust outlier detector such as random sample consensus (RANSAC). To the best of our knowledge, our study is the first attempt in the literature to learn 3D scene reconstruction from multiple images. Our results also show that how to implicitly or explicitly integrate an outlier detector into learning methods is a key problem to solve before learning-based methods can recover 3D scene structures comparable to those of the current geometry-based state of the art; otherwise, any significant advancement in learning 3D structure from multiple images seems difficult, if not impossible. We even speculate that deep learning might, by nature, be unsuitable for learning 3D structure from multiple images or, more generally, for solving spatial vision problems.
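The abstract contrasts the learnt reconstruction with geometry-based pipelines that rely on an explicit robust outlier detector such as RANSAC. As a point of reference only, the sketch below shows how a conventional pipeline rejects outlying correspondences while fitting a fundamental matrix with OpenCV; the correspondences here are synthetic stand-ins rather than real matches from the paper.

```python
import numpy as np
import cv2  # OpenCV

# Putative correspondences between two views; here synthetic stand-ins in which
# 25% of the matches are corrupted to act as outliers.
rng = np.random.default_rng(0)
pts1 = rng.uniform(0, 640, size=(200, 2)).astype(np.float32)
pts2 = pts1 + np.array([40.0, 8.0], dtype=np.float32) \
            + rng.normal(0, 0.5, (200, 2)).astype(np.float32)
pts2[:50] = rng.uniform(0, 640, size=(50, 2)).astype(np.float32)  # corrupted matches

# RANSAC fits the epipolar geometry while flagging correspondences that do not
# agree with it -- the explicit outlier rejection the abstract notes is absent
# from the learning-based pipeline.
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
if inlier_mask is not None:
    print("estimated inlier ratio:", float(inlier_mask.mean()))
```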

5.
Construction workplace hazard detection requires engineers to analyze scenes manually against many safety rules, which is time-consuming, labor-intensive, and error-prone. Computer vision algorithms have yet to achieve reliable discrimination of the anomalous and benign object relations that underpin safety-violation detection. Recently developed deep learning-based computer vision algorithms need tens of thousands of images, including labels of the safety rules violated, in order to train deep-learning networks to acquire spatiotemporal reasoning capacity in complex workplaces. Such training processes need human experts to label images and indicate whether the relationships between the workers, resources, and equipment in the scenes violate spatiotemporal arrangement rules for safe and productive operations. False alarms in those manual labels (labeling no-violation images as having violations) can significantly mislead the machine learning process and result in computer vision models that produce inaccurate hazard detections. Compared with false alarms, the other type of mislabel, false negatives (labeling images having violations as "no violations"), seems to have less impact on the reliability of the trained computer vision models. This paper examines a new crowdsourcing approach that achieves above 95% accuracy in labeling images of complex construction scenes having safety-rule violations, with a focus on minimizing false alarms while keeping acceptable rates of false negatives. The development and testing of this new crowdsourcing approach examine two fundamental questions: (1) How can the impact of a short safety-rule training process on the labeling accuracy of non-professional image annotators be characterized? And (2) How can the image labels contributed by ordinary people be aggregated to filter out false alarms while keeping an acceptable false negative rate? In designing short training sessions for online image annotators, the research team split a large number of safety rules into smaller sets of six. An online image annotator learns six safety rules randomly assigned to him or her, and then labels workplace images as "no violation" or "violation" of certain rules among the six learned. About one hundred and twenty anonymous image annotators participated in the data collection. Finally, a Bayesian-network-based crowd consensus model aggregated these labels from annotators to obtain safety-rule violation labeling results. Experimental results show that the proposed model can achieve close to a 0% false alarm rate while keeping the false negative rate below 10%. This image labeling performance outdoes existing crowdsourcing approaches that use majority votes for aggregating crowdsourced labels. Given these findings, the presented crowdsourcing approach sheds light on effective construction safety surveillance that integrates human risk-recognition capabilities into advanced computer vision.
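The paper aggregates crowd labels with a Bayesian-network-based consensus model, which is not reproduced here. For intuition only, the sketch below shows a much simpler reliability-weighted vote over hypothetical annotators; raising the decision threshold trades false alarms against false negatives, in the spirit of the paper's goal.

```python
def weighted_vote(labels, annotator_weight, threshold=0.5):
    """Aggregate per-image crowd labels ('violation' / 'no violation').

    labels: image_id -> list of (annotator_id, label) pairs.
    annotator_weight: annotator_id -> reliability weight in [0, 1].
    This is a simple weighted-vote baseline, not the paper's Bayesian-network
    consensus model; raising `threshold` suppresses false alarms at the cost
    of more false negatives.
    """
    results = {}
    for image_id, votes in labels.items():
        total = sum(annotator_weight.get(a, 1.0) for a, _ in votes)
        violation = sum(annotator_weight.get(a, 1.0)
                        for a, lab in votes if lab == "violation")
        results[image_id] = ("violation"
                             if total and violation / total >= threshold
                             else "no violation")
    return results

# Hypothetical annotators and reliabilities (illustrative values only).
labels = {"img_001": [("a1", "violation"), ("a2", "no violation"), ("a3", "violation")]}
weights = {"a1": 0.9, "a2": 0.5, "a3": 0.8}
print(weighted_vote(labels, weights))  # {'img_001': 'violation'}
```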

6.

Deep reinforcement learning augments the reinforcement learning framework with the powerful representational capacity of deep neural networks. Recent works have demonstrated remarkable successes of deep reinforcement learning in various domains, including finance, medicine, healthcare, video games, robotics, and computer vision. In this work, we provide a detailed review of recent and state-of-the-art research advances in deep reinforcement learning for computer vision. We start by covering the theories of deep learning, reinforcement learning, and deep reinforcement learning. We then propose a categorization of deep reinforcement learning methodologies and discuss their advantages and limitations. In particular, we divide deep reinforcement learning into seven main categories according to their applications in computer vision: (i) landmark localization; (ii) object detection; (iii) object tracking; (iv) registration of both 2D images and 3D volumetric data; (v) image segmentation; (vi) video analysis; and (vii) other applications. Each of these categories is further analyzed in terms of reinforcement learning techniques, network design, and performance. Moreover, we provide a comprehensive analysis of the existing publicly available datasets and examine source code availability. Finally, we present some open issues and discuss future research directions for deep reinforcement learning in computer vision.


7.
There is a tendency for accidents and even fatalities to arise when people enter hazardous work areas during the construction of projects in urban areas. A limited amount of research has been devoted to developing vision-based proximity warning systems that can automatically determine when people enter a hazardous area. Such systems, however, are unable to identify specific hazards and the status of a piece of plant (e.g., an excavator) in real time. In this paper, we address this limitation and develop a real-time smart video surveillance system that can detect people and the status of plant (i.e., moving or stationary) in a hazardous area. The application of this approach is demonstrated during the construction of a mega-project, the Wuhan Rail Transit System in China. We reveal that our combination of computer vision and deep learning can accurately recognize people in a hazardous work area in real time during the construction of transport projects. Our developed system can provide instant feedback concerning unsafe behavior and thus enable appropriate actions to be put in place to prevent its recurrence.

8.
For construction safety and health, continuous monitoring of unsafe conditions and actions is essential in order to eliminate potential hazards in a timely manner. As a robust and automated means of field observation, computer vision techniques have been applied to extract safety-related information from site images and videos, and are regarded as effective solutions complementary to current time-consuming and unreliable manual observational practices. Although some research efforts have been directed toward computer vision-based safety and health monitoring, its application in real practice remains premature due to a number of technical issues and research challenges in terms of reliability, accuracy, and applicability. This paper thus reviews previous attempts in construction applications from both technical and practical perspectives in order to understand the current status of computer vision techniques, which in turn suggests directions for future research in the field of computer vision-based safety and health monitoring. Specifically, this paper categorizes previous studies into three groups (object detection, object tracking, and action recognition) based on the types of information required to evaluate unsafe conditions and acts. The results demonstrate that major research challenges include comprehensive scene understanding, tracking accuracy that varies with camera position, and action recognition of multiple pieces of equipment and workers. In addition, we identified several practical issues, including a lack of task-specific and quantifiable metrics to evaluate the extracted information in a safety context, technical obstacles due to dynamic conditions at construction sites, and privacy issues. These challenges indicate a need for further research in these areas. Accordingly, this paper provides researchers with insights into advancing knowledge and techniques for computer vision-based safety and health monitoring, and offers fresh opportunities and considerations to practitioners in understanding and adopting the techniques.

9.

Deep learning has proved its efficiency in many fields of computer science, such as computer vision, image classification, object detection, image segmentation, and more. Deep learning models depend primarily on the availability of huge datasets. Without a large number of images in a dataset, deep learning models will not be able to learn and produce accurate results. Unfortunately, several fields do not have access to large amounts of data, such as medical image processing. For example, the world has been suffering from a lack of COVID-19 image datasets, and there was no benchmark dataset at the beginning of 2020. This pandemic was the main motivation for this survey, which delivers and discusses the current image data augmentation techniques that can be used to increase the number of images. In this paper, a survey of data augmentation for digital images in deep learning is presented. The study begins with an introduction that reflects the importance of data augmentation in general. The classical image data augmentation taxonomy and photometric transformations are presented in the second section. The third section illustrates deep learning-based image data augmentation. Finally, the fourth section surveys the state of the art of image data augmentation techniques across deep learning research and applications.
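As a concrete illustration of the classical (geometric plus photometric) augmentations covered by the survey's taxonomy, the sketch below composes a few standard transforms with torchvision; the library choice, parameter values, and input file name are assumptions for illustration, not taken from the survey.

```python
from PIL import Image
from torchvision import transforms

# Classical augmentation pipeline: geometric transforms (flip, rotation, crop)
# plus a photometric transform (color jitter). Parameters are illustrative.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),           # geometric
    transforms.RandomRotation(degrees=15),            # geometric
    transforms.ColorJitter(brightness=0.2,            # photometric
                           contrast=0.2,
                           saturation=0.2),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
])

img = Image.open("chest_xray.png").convert("RGB")     # hypothetical input file
augmented = [augment(img) for _ in range(8)]          # 8 new variants of one image
```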


10.
The diagnosis of medical images underpins many clinical decisions, and intelligent analysis of medical images is an important component of medical artificial intelligence. At the same time, with the rise and spread of 3D spatial sensors, 3D computer vision is becoming increasingly important. This paper focuses on the intersection of medical image analysis and 3D computer vision, namely medical 3D computer vision (medical 3D vision). We organize medical 3D computer vision systems into three levels (tasks, data, and representations) and present the research progress at each level in light of the latest literature. At the task level, we introduce classification, segmentation, detection, registration, and imaging reconstruction in medical 3D computer vision, together with the role and characteristics of these tasks in clinical diagnosis and medical image analysis. At the data level, we briefly introduce the most important modalities of medical 3D data, including computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), as well as other data formats proposed in emerging research. On this basis, we compile the important research datasets in medical 3D computer vision and annotate their data modalities and principal vision tasks. At the representation level, we introduce and discuss the advantages and disadvantages of 2D networks, 3D networks, and hybrid networks for representation learning on medical 3D data. In addition, given the small-data problem that is pervasive in medical imaging, we place particular emphasis on pre-training for representation learning on medical 3D data. Finally, we summarize the current state of research in medical 3D computer vision and point out the research challenges, open problems, and directions that remain to be addressed.
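To make the 2D-versus-3D representation discussion concrete, the sketch below contrasts a slice-wise 2D convolution with a volumetric 3D convolution on a CT-like tensor in PyTorch; the tensor shape and channel counts are arbitrary illustrative values, not from the paper.

```python
import torch
import torch.nn as nn

# A CT-like volume: (batch, channel, depth, height, width).
volume = torch.randn(1, 1, 64, 128, 128)

# 3D network building block: the kernel sees volumetric context directly.
conv3d = nn.Conv3d(1, 8, kernel_size=3, padding=1)
out3d = conv3d(volume)                           # -> (1, 8, 64, 128, 128)

# 2D (slice-wise) alternative: each axial slice is processed independently.
conv2d = nn.Conv2d(1, 8, kernel_size=3, padding=1)
slices = volume.squeeze(1).permute(1, 0, 2, 3)   # (64, 1, 128, 128): depth as batch
out2d = conv2d(slices)                           # -> (64, 8, 128, 128)

print(out3d.shape, out2d.shape)
```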

11.
Several prototype vision-based approaches have been developed to automatically capture and recognize unsafe behavior in construction. Such approaches have been difficult to use, however, because they cannot identify the individuals who commit unsafe acts captured in digital images or video. To address this problem, we applied a novel deep learning approach that utilizes a Spatial and Temporal Attention Pooling Network to remove redundant information contained in a video so that a person's identity can be automatically determined. The deep learning approach we have adopted focuses on: (1) extracting spatial feature maps using the spatial attention network; (2) extracting temporal information using the temporal attention networks; and (3) recognizing a person's identity by computing the distance between features. To validate the feasibility and effectiveness of the adopted deep learning approach, we created a database of videos containing people performing their work on construction sites, conducted an experiment, and performed k-fold cross-validation. The results demonstrated that the approach could accurately determine a person's identity from videos captured on construction sites. We suggest that our computer vision approach can potentially be used by site managers to automatically recognize individuals who engage in unsafe behavior and therefore provide instantaneous feedback about their actions and possible consequences.
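Step (3) above identifies a person by comparing feature distances. The sketch below illustrates the general idea with a cosine-distance nearest-neighbour match against a small gallery of randomly generated feature vectors; it is not the paper's network or data.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def identify(query_feat, gallery):
    """Return the gallery identity whose stored feature is nearest to the query.
    `gallery` maps person_id -> feature vector (hypothetical values below)."""
    return min(gallery, key=lambda pid: cosine_distance(query_feat, gallery[pid]))

rng = np.random.default_rng(1)
gallery = {"worker_A": rng.normal(size=256), "worker_B": rng.normal(size=256)}
query = gallery["worker_B"] + 0.1 * rng.normal(size=256)   # a noisy re-observation
print(identify(query, gallery))                            # 'worker_B'
```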

12.
13.
Earthwork operations are a crucial part of most construction projects. Heavy construction equipment and workers are often required to work simultaneously in limited workspaces. Struck-by accidents resulting from poor worker-equipment interactions account for a large proportion of accidents and fatalities on construction sites. Emerging technologies based on computer vision and artificial intelligence offer an opportunity to enhance construction safety through advanced monitoring utilizing site cameras. A crucial prerequisite to the development of safety monitoring applications is the ability to accurately identify and localize the position of the equipment and its critical components in 3D space. This study proposes a workflow for excavator 3D pose estimation based on deep learning using RGB images. In the proposed workflow, an articulated 3D digital twin of an excavator is used to generate the data needed to train a 3D pose estimation model. In addition, a method for generating hybrid datasets (simulation and laboratory) is proposed for adapting the 3D pose estimation model to various scenarios with different camera parameters. Evaluations demonstrate the capability of the workflow in estimating the 3D pose of excavators. The study concludes by discussing limitations and future research opportunities.

14.
15.
A Survey of Fine-Grained Image Classification Based on Deep Convolutional Features
罗建豪, 吴建鑫. Acta Automatica Sinica (自动化学报), 2017, 43(8): 1306-1318
Fine-grained image classification is a highly challenging research topic in computer vision. Its goal is to recognize sub-categories, such as distinguishing between different species of birds. Because of the subtle inter-class differences and large intra-class variations among sub-categories, traditional classification algorithms have had to rely on large amounts of manual annotation. In recent years, with the development of deep learning, deep convolutional neural networks have brought new opportunities to fine-grained image classification, and the many algorithms based on deep convolutional features that have been proposed have driven rapid progress in this field. Starting from the definition and research significance of the problem, this paper first introduces the current state of fine-grained image classification algorithms. It then compares and analyzes the differences between algorithms from the perspectives of strong and weak supervision, and compares their performance on commonly used datasets. Finally, we summarize these algorithms and discuss possible future research directions and the challenges facing the field.
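As a minimal illustration of the weakly supervised setting the survey contrasts with part-annotation-based methods, the sketch below freezes a pretrained ResNet-50 backbone from torchvision and trains only a new classification head; the 200-way output (as in bird datasets such as CUB-200) and the backbone choice are assumptions for illustration, not a method from the survey.

```python
import torch
import torch.nn as nn
from torchvision import models

# Weakly supervised baseline: image-level labels only, deep conv features from a
# pretrained backbone, and a single new classifier layer to be trained.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in backbone.parameters():
    p.requires_grad = False                              # keep conv features fixed
backbone.fc = nn.Linear(backbone.fc.in_features, 200)    # only this layer is trained

images = torch.randn(4, 3, 224, 224)                     # stand-in fine-grained batch
logits = backbone(images)                                # (4, 200) class scores
print(logits.shape)
```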

16.
In recent years, deep learning theory and its applications have developed rapidly, and their use in computer vision has become increasingly broad and deep, achieving remarkable results on many computer vision tasks and exerting an influence on existing computer vision curricula that cannot be ignored. Based on a summary of how deep learning theory is currently applied across computer vision, this paper proposes an adaptive reform of computer vision teaching content that integrates deep learning theory into computer vision courses, so as to better reflect how theoretical advances in related disciplines drive the evolution of what is taught in computer vision.

17.
18.
Computational Visual Media - Researchers have achieved great success in dealing with 2D images using deep learning. In recent years, 3D computer vision and geometry deep learning have gained ever...

19.
In recent years, computer vision has found wide application in maritime surveillance thanks to its sophisticated algorithms and advanced architectures. Automatic ship detection with computer vision techniques provides an efficient means of monitoring and tracking ships in water bodies. Waterways, being an important medium of transport, require continuous monitoring for the protection of national security. Remote sensing satellite images of ships in harbours and water bodies are the image data that enable neural network models to localize ships and facilitate early identification of possible threats at sea. This paper proposes a deep learning-based model capable of classifying images as ship or no-ship and of localizing ships in the original images using bounding boxes. Furthermore, the classified ships are segmented with a deep learning-based auto-encoder model. In terms of classification, the proposed model achieves 99.5% validation and 99.2% training accuracy. The auto-encoder model produces 85.1% validation and 84.2% training accuracy. Moreover, the IoU metric of the segmented images is 0.77. The experimental results reveal that the model is accurate and can be implemented for automatic ship detection in water bodies, using remote sensing satellite images as input to the computer vision system.

20.
Rain degrades the quality of images captured outdoors and, in turn, the performance of outdoor vision tasks. Deep learning-based single-image deraining has attracted attention because of its superior performance, with research concentrating on dataset quality, deraining methods, downstream high-level tasks after deraining, and evaluation metrics. To help researchers quickly gain a comprehensive view of the field, this paper surveys the mainstream literature on deep learning-based single-image deraining from these four aspects. According to how they are constructed, rainy-image datasets fall into four categories: datasets synthesized by simple addition of a rain layer to the background, by complex fusion of the background and rain layers, or by data-driven synthesis with generative adversarial networks (GANs), as well as semi-automatically collected real-world datasets. Mainstream algorithms are categorized and summarized according to the task scenario, the learning mechanism adopted, and the network design. We review deraining algorithms for single tasks, i.e., removal of raindrops, rain streaks, rain mist, and heavy rain, and for joint tasks, i.e., removal of raindrops together with rain streaks, or of all rain-related noise. We also review learning mechanisms and network architectures (e.g., multi-branch convolutional neural network (CNN) structures, GAN generators, recurrent and multi-stage structures, multi-scale structures, encoder-decoder structures, attention-based and Transformer-based designs), as well as approaches driven jointly by data and models. We further survey work on high-level tasks downstream of single-image deraining and the metrics used to evaluate deraining algorithms. Comprehensive experimental comparisons on synthetic and real-world datasets confirm that implicitly guiding network design with domain knowledge can effectively improve performance, and that explicitly using domain knowledge to regularize network learning has the potential to further improve generalization. Finally, we point out the challenges currently facing single-image deraining and future research directions.
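The first dataset category above builds rainy images by simply adding a rain layer to a clean background (O = B + R). The sketch below synthesizes such an image with randomly placed vertical streaks; all parameter values are illustrative and not taken from any dataset described in the survey.

```python
import numpy as np

def add_rain(background, streak_count=400, length=12, intensity=0.8, seed=0):
    """Synthesize a rainy image by simple additive composition O = B + R,
    the first dataset-construction category in the survey. Parameters are
    illustrative, not taken from any specific dataset."""
    rng = np.random.default_rng(seed)
    h, w = background.shape[:2]
    rain = np.zeros((h, w), dtype=np.float32)
    for _ in range(streak_count):
        x, y = rng.integers(0, w), rng.integers(0, h - length)
        rain[y:y + length, x] = intensity          # a vertical streak
    rainy = background + rain[..., None]           # broadcast over RGB channels
    return np.clip(rainy, 0.0, 1.0)

background = np.random.rand(240, 320, 3).astype(np.float32)   # stand-in clean image
rainy = add_rain(background)
```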
