In recent years, one of the most popular techniques in the computer vision community has been the deep learning technique. As a data-driven technique, deep model requires enormous amounts of accurately labelled training data, which is often inaccessible in many real-world applications. A data-space solution is Data Augmentation (DA), that can artificially generate new images out of original samples. Image augmentation strategies can vary by dataset, as different data types might require different augmentations to facilitate model training. However, the design of DA policies has been largely decided by the human experts with domain knowledge, which is considered to be highly subjective and error-prone. To mitigate such problem, a novel direction is to automatically learn the image augmentation policies from the given dataset using Automated Data Augmentation (AutoDA) techniques. The goal of AutoDA models is to find the optimal DA policies that can maximize the model performance gains. This survey discusses the underlying reasons of the emergence of AutoDA technology from the perspective of image classification. We identify three key components of a standard AutoDA model: a search space, a search algorithm and an evaluation function. Based on their architecture, we provide a systematic taxonomy of existing image AutoDA approaches. This paper presents the major works in AutoDA field, discussing their pros and cons, and proposing several potential directions for future improvements.
相似文献Deep reinforcement learning augments the reinforcement learning framework and utilizes the powerful representation of deep neural networks. Recent works have demonstrated the remarkable successes of deep reinforcement learning in various domains including finance, medicine, healthcare, video games, robotics, and computer vision. In this work, we provide a detailed review of recent and state-of-the-art research advances of deep reinforcement learning in computer vision. We start with comprehending the theories of deep learning, reinforcement learning, and deep reinforcement learning. We then propose a categorization of deep reinforcement learning methodologies and discuss their advantages and limitations. In particular, we divide deep reinforcement learning into seven main categories according to their applications in computer vision, i.e. (i) landmark localization (ii) object detection; (iii) object tracking; (iv) registration on both 2D image and 3D image volumetric data (v) image segmentation; (vi) videos analysis; and (vii) other applications. Each of these categories is further analyzed with reinforcement learning techniques, network design, and performance. Moreover, we provide a comprehensive analysis of the existing publicly available datasets and examine source code availability. Finally, we present some open issues and discuss future research directions on deep reinforcement learning in computer vision.
相似文献The major barrier while using deep learning models is lack of large number of images in the training dataset. In fact, there is a need of thousands of images in each image categories based on the complexity of problem. Prior studies have shown that picture augmentation techniques can be used to enhance the number of images in a training dataset artificially. These techniques can aid in improving the overall learning process and performance of a deep learning model. Hence, to address this problem we have proposed three algorithms. Firstly, two image acquisition algorithms have been proposed to systematically obtain real field images for testing and images from public datasets for training a model. Secondly, an algorithm is proposed to describe the procedure how the augmentations can be applied to enhance the datasets. During this study, we have investigated 52 augmentations that can allow enhancing the size of input dataset by improving the quantity of images. To perform the classification process of four maize crop diseases, a new convolutional neural network model is developed and several experiments have been performed to prove its effectiveness. Firstly, two tests were carried out using the original dataset from Kaggle public repository and the augmented dataset. When compared with the original dataset, the model improved by 5.14% with the augmented dataset. Secondly, three experiments carried out to evaluate the performance of proposed augmentation method. Experimental results demonstrated that the proposed approach outperforms the existing three approaches by 27.38%, 3.14%, and 1.34% during the classification process. The proposed IPA augmentation method has been compared with six existing methods: Full Stage Data Augmentation Framework, LeafGAN, Novel Augmentation method based on GAN, Wasserstein Generative Adversarial Network (WGAN), Activation Reconstruction-GAN, and Step-by-Step Data Augmentation Method and experimental results show that performance is better than existing methods by 28.31%, 19.76%, 20.18%, 13.75%, 2.42%, and 12.68% respectively.
相似文献Person Re-Identification (person re-ID) is an image retrieval task which identifies the same person in different camera views. Generally, a good person re-ID model requires a large dataset containing over 100000 images to reduce the risk of over-fitting. Most current handcrafted person re-ID datasets, however, are insufficient for training a learning model with high generalization ability. In addition, the lacking of images with various levels of occlusion is still remaining in most existing datasets. Motivated by these two problems, this paper proposes a new data augmentation method called Random Linear Interpolation that can enlarge the sizes of person re-ID datasets and improve the generalization ability of the learning model. The key enabler of our approach is generating fused images by interpolating pairs of original images. In other words, the innovation of the proposed approach is considering data augmentation between two random samples. Plenty of experimental results demonstrates that the proposed method is effective to improve baseline models. On Market1501 and DukeMTMC-reID datasets, our approach can achieve 92.71% and 82.19% rank-1 accuracy, respectively.
相似文献Deep learning has been extensively researched in the field of document analysis and has shown excellent performance across a wide range of document-related tasks. As a result, a great deal of emphasis is now being placed on its practical deployment and integration into modern industrial document processing pipelines. It is well known, however, that deep learning models are data-hungry and often require huge volumes of annotated data in order to achieve competitive performances. And since data annotation is a costly and labor-intensive process, it remains one of the major hurdles to their practical deployment. This study investigates the possibility of using active learning to reduce the costs of data annotation in the context of document image classification, which is one of the core components of modern document processing pipelines. The results of this study demonstrate that by utilizing active learning (AL), deep document classification models can achieve competitive performances to the models trained on fully annotated datasets and, in some cases, even surpass them by annotating only 15–40% of the total training dataset. Furthermore, this study demonstrates that modern AL strategies significantly outperform random querying, and in many cases achieve comparable performance to the models trained on fully annotated datasets even in the presence of practical deployment issues such as data imbalance, and annotation noise, and thus, offer tremendous benefits in real-world deployment of deep document classification models. The code to reproduce our experiments is publicly available at https://github.com/saifullah3396/doc_al.
相似文献