Similar Documents
20 similar documents found (search time: 796 ms)
1.
We present a system architecture for domestic robots that allows them to learn object categories after a single sample object has been taught. We explore the situation in which a human teaches a robot a novel object, and the robot enhances this learning by drawing on a large amount of image data from the Internet. The main goal of this research is to give a robot the capability to enhance its learning while minimizing the time and effort a human must spend training it. Our active learning approach consists of learning the object's name through a speech interface and creating a visual object model using a depth-based attention model adapted to the robot's personal space. Given the object's name (keyword), a large number of object-related images are collected from two main image sources (Google Images and the LabelMe website). We address the problem of separating good training samples from noisy images in two steps: (1) similar-image selection using a Simile Selector Classifier, and (2) non-real image filtering using a variant of Gaussian Discriminant Analysis. After web image selection, object category classifiers are trained and tested on different objects of the same category. Our experiments demonstrate the effectiveness of our robot learning approach.

2.
Keyword-based image search engines are now very popular for accessing the large number of Web images on the Internet. Most existing keyword-based image search engines may return large numbers of junk images (images irrelevant to the given query word), because the text terms loosely associated with Web images are also used for image indexing. The objective of the proposed work is to effectively filter junk images out of image search results. To this end, bilingual image search results for the same keyword-based query are integrated to identify clusters of junk images and clusters of relevant images. Within the relevant image clusters, the results are further refined by removing duplicates under a coarse-to-fine structure. Experiments on a large number of bilingual keyword-based queries (5,000 query words) were performed simultaneously on two keyword-based image search engines (Google Images in English and Baidu Images in Chinese), and the experimental results show that integrating bilingual image search results filters out junk images effectively.
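The bilingual filtering idea above can be sketched in a few lines: pool the search results from both engines, cluster them, and keep only clusters that contain images from both engines. The following is a hedged toy illustration with synthetic 2-D "features" and a minimal k-means; the function names (`kmeans`, `filter_junk`) and the exact both-engines rule are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def kmeans(X, k, iters=50):
    # deterministic farthest-point initialisation, then standard Lloyd iterations
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def filter_junk(feats_en, feats_zh, k):
    # keep an image only if its cluster contains results from BOTH engines;
    # clusters populated by a single engine are treated as junk
    X = np.vstack([feats_en, feats_zh])
    source = np.array([0] * len(feats_en) + [1] * len(feats_zh))
    labels = kmeans(X, k)
    keep = np.zeros(len(X), dtype=bool)
    for j in range(k):
        members = labels == j
        if len(set(source[members].tolist())) == 2:
            keep |= members
    return keep

# toy demo: a shared (relevant) cluster near the origin, plus engine-specific junk
feats_en = np.array([[0, 0], [0.1, 0], [0, 0.1],
                     [10, 10], [10.1, 10], [10, 10.1]], dtype=float)
feats_zh = np.array([[0.05, 0.05], [0.1, 0.1], [0, 0.05],
                     [-10, -10], [-10.1, -10], [-10, -10.1]], dtype=float)
keep = filter_junk(feats_en, feats_zh, k=3)
```

In this toy setup, the near-origin images (returned by both engines) survive, while the engine-specific clusters are discarded as junk.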

3.
NeTra: A toolbox for navigating large image databases (cited by 17: 0 self-citations, 17 by others)
We present here an implementation of NeTra, a prototype image retrieval system that uses color, texture, shape, and spatial location information in segmented image regions to search for and retrieve similar regions from the database. A distinguishing aspect of this system is its incorporation of a robust automated image segmentation algorithm that allows object- or region-based search. Image segmentation significantly improves the quality of image retrieval when images contain multiple complex objects. Images are segmented into homogeneous regions at the time of ingest into the database, and image attributes that represent each of these regions are computed. In addition to image segmentation, other important components of the system include an efficient color representation, and indexing of color, texture, and shape features for fast search and retrieval. This representation allows the user to compose interesting queries such as "retrieve all images that contain regions that have the color of object A, texture of object B, shape of object C, and lie in the upper one-third of the image", where the individual objects could be regions belonging to different images. A Java-based web implementation of NeTra is available at http://vivaldi.ece.ucsb.edu/Netra.

4.
The explosion of the Internet provides us with a tremendous resource of images shared online. It also confronts vision researchers with the problem of finding effective methods to navigate this vast amount of visual information. Semantic image understanding plays a vital role in solving this problem. One important task in image understanding is object recognition, in particular, generic object categorization. Critical to this problem are the issues of learning and datasets: abundant data helps train a robust recognition system, while a good object classifier can help collect large numbers of images. This paper presents OPTIMOL, a novel object recognition algorithm that performs automatic dataset collection and incremental model learning simultaneously. The goal of this work is to use the tremendous resources of the web to learn robust object category models for detecting and searching for objects in real-world cluttered scenes. Humans continuously update their knowledge of objects when new examples are observed. Our framework emulates this human learning process by iteratively accumulating model knowledge and image examples. We adapt a non-parametric latent topic model and propose an incremental learning framework. Our algorithm automatically collects much larger object category datasets for 22 randomly selected classes from the Caltech 101 dataset. Furthermore, our system offers not only more images in each object category but also a robust object category model and meaningful image annotation. Our experiments show that OPTIMOL is capable of collecting image datasets that are superior to the well-known manually collected object datasets Caltech 101 and LabelMe.

5.
6.
This paper investigates the problem of modeling Internet images and their associated text or tags for tasks such as image-to-image search, tag-to-image search, and image-to-tag search (image annotation). We start with canonical correlation analysis (CCA), a popular and successful approach for mapping visual and textual features to the same latent space, and incorporate a third view capturing high-level image semantics, represented either by a single category or by multiple non-mutually-exclusive concepts. We present two ways to train the three-view embedding: supervised, with the third view coming from ground-truth labels or search keywords; and unsupervised, with semantic themes obtained automatically by clustering the tags. To ensure high accuracy for retrieval tasks while keeping the learning process scalable, we combine multiple strong visual features and use explicit nonlinear kernel mappings to efficiently approximate kernel CCA. To perform retrieval, we use a specially designed similarity function in the embedded space, which substantially outperforms the Euclidean distance. The resulting system produces compelling qualitative results and outperforms a number of two-view baselines on retrieval tasks on three large-scale Internet image datasets.
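As a point of reference for the CCA starting point, a minimal two-view *linear* CCA (not the paper's three-view or kernelized variant) can be computed via an SVD of the whitened cross-covariance; the singular values are the canonical correlations. The helper names, regularizer, and toy data below are assumptions for illustration.

```python
import numpy as np

def inv_sqrt(C):
    # inverse square root of a symmetric positive-definite matrix
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca(X, Y, d=1, reg=1e-6):
    # two-view linear CCA: SVD of the whitened cross-covariance matrix
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, S, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    # projection matrices for each view, plus the canonical correlations
    return Kx @ U[:, :d], Ky @ Vt.T[:, :d], S[:d]

# toy "visual" and "textual" views sharing one latent signal
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.01 * rng.normal(size=(500, 1)), rng.normal(size=(500, 1))])
Y = np.hstack([rng.normal(size=(500, 1)), z + 0.01 * rng.normal(size=(500, 1))])
Wx, Wy, corr = cca(X, Y, d=1)
```

Because the two views share the latent signal `z`, the top canonical correlation comes out close to 1.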

7.
Automatic image annotation aims at predicting a set of semantic labels for an image. Because of the large annotation vocabulary, there are large variations in the number of images corresponding to different labels ("class imbalance"). Additionally, due to the limitations of human annotation, several images are not annotated with all their relevant labels ("incomplete labelling"). These two issues affect the performance of most existing image annotation models. In this work, we propose the 2-pass k-nearest neighbour (2PKNN) algorithm, a two-step variant of the classical k-nearest neighbour algorithm that addresses these issues in the image annotation task. The first step of 2PKNN uses "image-to-label" similarities, while the second step uses "image-to-image" similarities, thus combining the benefits of both. We also propose a metric learning framework over 2PKNN, formulated in a large-margin setup by generalizing a well-known (single-label) classification metric learning algorithm to multi-label data. In addition to the features provided by Guillaumin et al. (2009), which are used by almost all recent image annotation methods, we benchmark new features, including features extracted from a generic convolutional neural network model and features computed using modern encoding techniques. We also learn linear and kernelized cross-modal embeddings over different feature combinations to reduce the semantic gap between visual features and textual labels. Extensive evaluations on four image annotation datasets (Corel-5K, ESP-Game, IAPR-TC12, and MIRFlickr-25K) demonstrate that our method achieves promising results and establishes a new state of the art on the prevailing image annotation datasets.
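The two-pass structure can be sketched as follows: pass 1 builds a semantic neighbourhood by taking, per label, the few nearest training images carrying that label (so rare labels are represented despite class imbalance); pass 2 scores labels by distance-weighted voting over that neighbourhood. This is a hedged toy sketch using plain Euclidean distance and exponential weights, not the paper's learned metric or exact weighting.

```python
import numpy as np

def two_pass_knn(x, train_feats, train_labels, n_labels, k1=2):
    d = np.linalg.norm(train_feats - x, axis=1)
    order = np.argsort(d)
    # pass 1 ("image-to-label"): per label, keep the k1 nearest images carrying it
    neigh = set()
    for l in range(n_labels):
        neigh.update([i for i in order if l in train_labels[i]][:k1])
    # pass 2 ("image-to-image"): distance-weighted label voting over the neighbourhood
    scores = np.zeros(n_labels)
    for i in neigh:
        for l in train_labels[i]:
            scores[l] += np.exp(-d[i])
    return scores

# toy 1-D demo: label 0 ("sky") clustered near 0, label 1 ("grass") near 5
train_feats = np.array([[0.0], [0.2], [5.0], [5.2], [0.1]])
train_labels = [{0}, {0}, {1}, {1}, {0, 1}]
scores = two_pass_knn(np.array([0.0]), train_feats, train_labels, n_labels=2)
```

For a query near 0, label 0 receives a much higher score than label 1, as expected.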

8.
The task of object detection is to accurately and efficiently identify and localize instances of many predefined object categories in images. With the widespread adoption of deep learning, both the accuracy and efficiency of object detection have improved considerably, but deep-learning-based object detection still faces key technical challenges: improving and optimizing mainstream detection algorithms, raising detection accuracy for small objects, enabling multi-category object detection, and building lightweight detection models. To address these challenges, and based on an extensive literature survey, this paper analyzes methods for improving and optimizing mainstream detection algorithms from the perspective of improving and combining two-stage and one-stage detectors; methods for improving small-object detection accuracy from the perspectives of backbone networks, enlarging the visual receptive field, feature fusion, cascaded convolutional neural networks, and model training strategies; methods for multi-category object detection from the perspectives of training strategy and network architecture; and methods for lightweight detection models from the perspective of network architecture. In addition, the common benchmark datasets for object detection are introduced in detail, the performance of representative algorithms in the field is compared and analyzed from four aspects, and open problems and future research directions in object detection are predicted and discussed. Object detection remains a popular topic in computer vision and pattern recognition; more accurate and efficient algorithms continue to be proposed, and the field will continue to develop in new research directions.

9.
Thumbnail images are used to display large collections of photos on various digital devices, enabling people to browse and search an image collection effectively. Thumbnails are rendered at a much lower resolution than the original image, which raises the significant problem of how to represent the content of a given image effectively in a tiny thumbnail. Many image thumbnailing methods have been presented in the literature for this purpose. However, existing thumbnailing methods apply a single method to all kinds of images, regardless of their content. In contrast, the proposed method employs two different thumbnail generation methods, applying one or the other according to the image's content. To achieve this, we first classify images into two groups by detecting whether an object is present. For images with objects, an ROI cropping method using a saliency map is applied to represent the important region of the image in the thumbnail. Images without any interesting objects, such as landscape images, are resized using simple scaling to preserve the overall image context. Experimental results show that the proposed method yields comparable performance on a variety of datasets.
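The branching logic described above (saliency-based ROI crop for object images, plain scaling for landscapes) can be sketched as follows. This is a hedged illustration assuming a precomputed saliency map and using nearest-neighbour resizing; the function name and threshold are illustrative, not the paper's.

```python
import numpy as np

def make_thumbnail(img, saliency, thumb=32, thresh=0.5):
    # object images: crop the salient bounding box; otherwise resize the whole image
    mask = saliency > thresh
    if mask.any():
        ys, xs = np.where(mask)
        img = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = img.shape[:2]
    ri = np.arange(thumb) * h // thumb            # nearest-neighbour row indices
    ci = np.arange(thumb) * w // thumb            # nearest-neighbour column indices
    return img[np.ix_(ri, ci)]

img = np.arange(64 * 64).reshape(64, 64)
sal = np.zeros((64, 64))
sal[10:20, 10:20] = 1.0
roi_thumb = make_thumbnail(img, sal)                    # object present: ROI crop
plain_thumb = make_thumbnail(img, np.zeros((64, 64)))   # landscape: plain resize
```

Both branches produce a fixed-size thumbnail; only the ROI branch discards non-salient context.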

10.
11.
A survey of object detection algorithms for optical remote sensing images (cited by 4: 0 self-citations, 4 by others)
Nie Guangtao and Huang Hua. Acta Automatica Sinica, 2021, 47(8): 1749-1768
Object detection is a fundamental problem in optical remote sensing image understanding, with important application value. This paper reviews and analyzes the development of object detection algorithms for remote sensing images. It first describes the characteristics of and challenges in remote sensing object detection. It then systematically summarizes typical detection methods, covering both early algorithms based on hand-crafted features and current deep learning methods; for the deep learning methods, typical object detection models are introduced first, followed by a detailed review of improvements targeting the specific difficulties of remote sensing imagery. The commonly used detection datasets are then introduced, and the performance of existing methods is compared. Finally, open problems are summarized and future trends are discussed.

12.
In this work we discuss the problem of automatically determining bounding-box annotations for objects in images while assuming only weak labeling in the form of global image labels. We are thus given only a set of positive images, each containing at least one instance of a desired object, and a negative set of images representing background. Our goal is then to determine the locations of the object instances within the positive images as bounding boxes. We describe and analyze a method for automatic bounding-box annotation that consists of two major steps. First, we apply a statistical model to determine visual features that are likely to be indicative of the respective object class; based on these feature models we infer preliminary estimates of the bounding boxes. Second, we use a CCCP training algorithm for latent structured SVMs to improve the initial estimates by using them as initializations for latent variables modeling the optimal bounding-box positions. We evaluate our approach on three publicly available datasets.

13.
To address the difficulty of effectively learning the discriminative objects in fine-grained image classification, this paper proposes a weakly supervised fine-grained image classification algorithm based on an attention mechanism. The algorithm can effectively locate and recognize semantically sensitive features in fine-grained images. Building on a classical convolutional neural network, it first obtains a representation of the object's global information through linear feature fusion, and then further extracts the discriminative details of the features through a visual attention mechanism, yielding a more complete fine-grained feature representation. The proposed algorithm combines linear fusion with attention and can be viewed as a network model in which multiple network branches are trained cooperatively and optimized jointly, giving the model better representations of both global and local information. Validation on three publicly available fine-grained recognition datasets shows that the proposed method outperforms the baseline methods and achieves state-of-the-art classification performance.

14.
To address the problem that traditional manifold learning algorithms cannot effectively reduce the dimensionality of covariance descriptors lying on a Riemannian manifold, this paper proposes a generalized manifold learning algorithm, the Log-Euclidean Riemannian kernel-based adaptive semi-supervised orthogonal locality preserving projection (LRK-ASOLPP), and successfully applies it to object classification in high-resolution remote sensing images. First, geometric structure features are extracted at each pixel of the image, and covariance descriptors of the image features are computed. Second, the covariance descriptors are projected into a reproducing-kernel Hilbert space using the Log-Euclidean Riemannian kernel. Then, based on manifold learning theory, a semi-supervised orthogonal locality preserving projection model on the Riemannian manifold is built, and the objective function is optimized with an alternating iterative update algorithm, yielding both the similarity weight matrix and the low-dimensional projection matrix. Finally, the learned projection matrix is used to compute low-dimensional projections of the test samples, which are then classified with K-nearest-neighbour, support vector machine (SVM), and other classifiers. Experimental results on three high-resolution remote sensing image datasets demonstrate the effectiveness and feasibility of the algorithm.
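The Log-Euclidean Riemannian kernel at the core of the method above has a compact closed form: a Gaussian kernel on the Frobenius distance between matrix logarithms of symmetric positive-definite covariance descriptors. A minimal numpy sketch follows; the function names and the bandwidth choice are assumptions for illustration.

```python
import numpy as np

def spd_log(C):
    # matrix logarithm of a symmetric positive-definite matrix via eigendecomposition
    w, V = np.linalg.eigh(C)
    return V @ np.diag(np.log(w)) @ V.T

def log_euclidean_kernel(C1, C2, sigma=1.0):
    # Gaussian kernel on the Log-Euclidean (Frobenius) distance between SPD matrices
    d2 = np.sum((spd_log(C1) - spd_log(C2)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# toy covariance descriptors built from two feature matrices
rng = np.random.default_rng(0)
F = rng.normal(size=(100, 3))
C1 = np.cov(F.T) + 1e-6 * np.eye(3)
C2 = np.cov((2.0 * F).T) + 1e-6 * np.eye(3)   # same structure, scaled features
k_self = log_euclidean_kernel(C1, C1)
k_cross = log_euclidean_kernel(C1, C2)
```

The self-similarity is exactly 1, and scaling the underlying features moves the descriptor along the manifold, reducing the kernel value.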

15.

Deep learning has proved its efficiency in many fields of computer science, such as computer vision, image classification, object detection, and image segmentation. Deep learning models primarily depend on the availability of huge datasets; without many images in a dataset, a deep learning model cannot learn and produce an accurate model. Unfortunately, several fields do not have access to large amounts of data, such as medical image processing. For example, the world has suffered from a lack of COVID-19 image datasets, with no benchmark dataset available since the beginning of 2020. This pandemic was the main motivation for this survey, which delivers and discusses the current image data augmentation techniques that can be used to increase the number of images. In this paper, a survey of data augmentation for digital images in deep learning is presented. The study begins with an introduction reflecting the importance of data augmentation in general. The second section presents the classical image data augmentation taxonomy and photometric transformations. The third section covers deep-learning-based image data augmentation. Finally, the fourth section surveys the state of the art in applying image data augmentation techniques in deep learning research and applications.
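The classical (geometric and photometric) augmentations the survey discusses can be sketched with plain numpy; each transform preserves the label while producing a new training sample. This is a hedged minimal illustration assuming a 0..255 pixel range; the function name and jitter magnitude are arbitrary choices, not from the survey.

```python
import numpy as np

def augment(img, rng):
    # classical label-preserving augmentations (pixel range assumed 0..255)
    return [img,
            np.fliplr(img),                                  # horizontal flip
            np.flipud(img),                                  # vertical flip
            np.rot90(img),                                   # 90-degree rotation
            np.clip(img + rng.integers(-30, 31), 0, 255)]    # brightness jitter

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8))
augs = augment(img, rng)
```

One input image yields five training samples here; in practice such transforms are composed and sampled randomly at training time.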


16.
Similarity search and content-based retrieval have become widely used in multimedia database systems, which often manage huge data collections. Unfortunately, many effective content-based similarity models cannot be fully utilized for larger datasets, as they are computationally demanding and require massive parallel processing for both feature extraction and query evaluation. In this work, we address the performance issues of effective similarity models based on feature signatures, focusing on fast feature extraction from image thumbnails using affordable hardware. More specifically, we propose a multi-GPU implementation that increases extraction speed by two orders of magnitude relative to a single-threaded CPU implementation. Since the extraction algorithm is not directly parallelizable, we propose a modification of the algorithm embracing the SIMT execution model. We have experimentally verified that our GPU extractor can be used to index large image datasets comprising millions of images. To obtain optimal extraction parameters, we employed the GPU extractor in an extensive empirical investigation of the parameter space. The experimental results are discussed from the perspectives of both performance and similarity precision.

17.
Most existing visual saliency analysis algorithms assume that the input image is clean and free of disturbances. However, this is not always the case. In this paper, we provide an extensive evaluation of visual saliency analysis algorithms on noisy images. We analyze the noise immunity of saliency analysis algorithms by evaluating their performance on noisy images with increasing noise scales and by studying the effects of applying different denoising methods before performing saliency analysis. We use 10 state-of-the-art saliency analysis algorithms and 7 typical image denoising methods on 4 eye-fixation datasets and 2 salient object detection datasets. Our experiments show that the performance of saliency analysis algorithms generally decreases with increasing image noise. An exception is the nonlinear features (NF) integrated algorithm, which shows good noise immunity. We also find that image denoising methods can greatly improve the noise immunity of the algorithms. Our results show that the combination of NF and median denoising works best on eye-fixation datasets, and the combination of saliency optimization (SO) and color block-matching and 3D filtering (C-BM3D) works best on salient object detection datasets. The combination of SO and average denoising works best for applications in which time efficiency is a major concern, for both types of datasets.
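The evaluation protocol above, adding Gaussian noise at a chosen scale and optionally denoising before analysis, can be sketched with numpy. This is a hedged toy illustration: the noise model and a hand-rolled 3x3 median filter stand in for the paper's seven denoising methods, and the names are illustrative.

```python
import numpy as np

def add_gaussian_noise(img, scale, rng):
    # additive Gaussian noise at a given scale (pixel range assumed 0..255)
    return np.clip(img + rng.normal(0.0, scale, img.shape), 0, 255)

def median3(img):
    # 3x3 median filter with edge-replicated padding, as a simple denoiser
    p = np.pad(img, 1, mode='edge')
    h, w = img.shape
    stack = [p[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.median(stack, axis=0)

rng = np.random.default_rng(0)
clean = np.full((16, 16), 128.0)
noisy = add_gaussian_noise(clean, 25.0, rng)
denoised = median3(noisy)
```

On this flat toy image, median filtering brings the pixels substantially closer to the clean values, mirroring the paper's finding that denoising improves noise immunity.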

18.
Images containing rip channels are used in oceanographic studies and can be preprocessed for these studies by identifying which regions of an image contain rip channels. For thousands of images, this process becomes cumbersome. In recent years, object detection has become a successful approach for identifying regions of an image. There are several different algorithms for detecting objects in images; however, there is no guidance as to which algorithm works well for detecting rip channels. This paper compares and explores state-of-the-art machine learning algorithms, including the Viola-Jones algorithm, convolutional neural networks, and a meta-learner, on a dataset of rip channel images. Along with the comparison, another objective is to find suitable features for rip channels and to implement the meta-classifier to compete with the state of the art. The comparison suggests the meta-classifier is the most promising detection model. In addition, five new Haar features are found to successfully supplement the original Haar feature set. The final comparison of these models will help guide researchers in choosing an appropriate model for rip channel detection, the new Haar features provide valuable data for detecting rip channels, and the meta-classifier provides a method for increasing the accuracy of a detector through classifier stacking.
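Haar features of the kind used by the Viola-Jones detector are computed in constant time per feature from an integral image (summed-area table). A minimal sketch follows; the specific two-rectangle feature and the function names are illustrative, not the paper's five new features.

```python
import numpy as np

def integral_image(img):
    # summed-area table with a zero row and column prepended
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, r, c, h, w):
    # sum of the h x w rectangle with top-left corner (r, c), in O(1)
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def haar_two_rect(ii, r, c, h, w):
    # two-rectangle Haar feature: left half minus right half (w must be even)
    return rect_sum(ii, r, c, h, w // 2) - rect_sum(ii, r, c + w // 2, h, w // 2)

# toy image: bright left half, dark right half -> strong vertical-edge response
img = np.zeros((4, 4))
img[:, :2] = 1.0
ii = integral_image(img)
response = haar_two_rect(ii, 0, 0, 4, 4)
```

Any rectangle sum, and hence any Haar feature built from rectangle sums, costs only a few array lookups regardless of window size, which is what makes Viola-Jones-style cascades fast.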

19.
Objective: With the growth of large-scale surveillance image and video data in public security and the development of intelligent transportation, vehicle retrieval has great application value. To address the low automation and intelligence of existing vehicle retrieval methods and the difficulty of obtaining accurate retrieval results, this paper proposes a vehicle retrieval method based on multi-task segmented compact features, which exploits the diversity of and correlations among basic vehicle attributes to achieve real-time retrieval. Method: First, to improve retrieval accuracy and refine image features by exploiting the connections between related tasks, a multi-task deep convolutional network is constructed to learn hash codes for different vehicle attributes segment by segment, combining image semantics with image representation; minimizing the image encoding makes the learned attribute features more robust. Then, a feature pyramid network is used to extract instance features of vehicle images, and a locality-sensitive hashing re-ranking method is used to retrieve the extracted features. Finally, for the special case where no query image of the target vehicle is available, a cross-modal auxiliary retrieval method is used. Results: The proposed method outperforms current mainstream retrieval methods on three public datasets, achieving a retrieval precision of 0.966 on the CompCars dataset and improving precision to 0.862 on the VehicleID dataset. Conclusion: The proposed multi-task segmented compact feature method obtains both minimal image codes and image instance features, and supports cross-modal retrieval when no target image information is available; comparative experiments verify its effectiveness.
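The hash-code retrieval step can be illustrated with generic random-projection (sign) hashing and Hamming-distance ranking; this is a hedged stand-in, not the paper's learned multi-task hash codes or its locality-sensitive re-ranking, and all names below are illustrative.

```python
import numpy as np

def hash_codes(feats, P):
    # sign of random projections -> compact binary codes
    return (feats @ P > 0).astype(np.uint8)

def retrieve(query_code, db_codes, top=3):
    # rank database items by Hamming distance to the query code
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists, kind='stable')[:top]

# toy gallery: 5 images of vehicle identity A, then 5 of identity B
rng = np.random.default_rng(0)
base = rng.normal(size=16)
db = np.vstack([base + 0.05 * rng.normal(size=(5, 16)),
                -base + 0.05 * rng.normal(size=(5, 16))])
P = rng.normal(size=(16, 64))                 # projection matrix shared by all items
codes = hash_codes(db, P)
query = hash_codes(base + 0.05 * rng.normal(size=16), P)
ranks = retrieve(query, codes)
```

Because nearby features produce nearly identical sign patterns, a query from identity A retrieves the identity-A gallery images first, at the cost of a few bitwise operations per comparison.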

20.
Objective: In traditional bag-of-words image search, much work has been devoted to improving the discriminative power of local features. The retrieved images resemble the query image in their details, yet sometimes differ greatly from it at the semantic level. Image search based on global features, on the other hand, loses much detail information, so images that are similar in layout but actually unrelated are judged relevant. To solve this problem, this paper uses deep convolutional features to construct a dynamic matching kernel. Method: The dynamic matching kernel encourages matched feature pairs between related images while suppressing the number of matched pairs between unrelated images. It takes the features of the last fully connected layer of a deep convolutional neural network as input. For related images, both the number and the quality of local feature matches are relatively enhanced; conversely, for unrelated images, the dynamic matching kernel reduces the number of local feature matches while lowering their matching scores. Results: The proposed dynamic matching kernel is evaluated in terms of both quantity and quality, and two metrics are proposed to quantify its behavior. Based on these two metrics, intermediate results are analyzed, confirming the superiority of the dynamic matching kernel over a static one. Finally, extensive experiments on five public datasets yield mean average precision ranging from 85.11% to 98.08%, higher than comparable work in this area. Conclusion: The experimental results show that the method is effective and outperforms current work in this field. The method also has an advantage over various deep feature extraction approaches: because the features are used to construct a dynamic matching kernel rather than being coarsely encoded for similarity matching, it achieves better performance on all datasets.
