Similar Documents
1.
Finding people in pictures presents a particularly difficult object recognition problem. We show how to find people by finding candidate body segments and then constructing assemblies of segments that are consistent with the constraints that kinematic properties impose on the appearance of a person. Since a reasonable model of a person requires at least nine segments, the huge combinatorial complexity makes it impossible to inspect every group. We propose two approaches to this problem. In the first, the search is pruned by using projected versions of a classifier that accepts groups corresponding to people. We describe an efficient projection algorithm for one popular classifier and demonstrate that our approach can be used to determine whether images of real scenes contain people. The second approach employs a probabilistic framework in which assemblies are sampled with probabilities proportional to their likelihood, so that human-like assemblies are drawn more often than non-person ones. The main performance problem lies in image segmentation, but the overall results of both approaches on real images of people are encouraging.

2.
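The Internet contains a large number of images of sensitive text in complex scenes, featuring mixed fonts, deformation, stretching, characters with a left-right component structure, and skew distortion; processing such images suffers from difficult feature extraction and low recognition rates. This paper proposes a method based on a spatial transformer network and a densely connected neural network to extract features from the sensitive text in images and to rectify it, and uses a deep bidirectional GRU network with CTC (connectionist temporal classification) to predict labels from the sequence features. Processing text as a sequence improves the handling of widely spaced characters and blurred text. Experimental results show that the model achieves recognition accuracies of 87.0% and 90.3% on the Caffe-OCR Chinese synthetic dataset and the CTW dataset, respectively, with an average recognition time of 26.3 ms per image.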
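The last stage described above, turning per-frame GRU outputs into a character string, is commonly done with best-path (greedy) CTC decoding: take the most probable symbol in each frame, merge consecutive repeats, then drop the blank symbol. The Python sketch below assumes blank index 0 and a toy alphabet; the abstract does not say which decoder the authors use (beam search is a common alternative).

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Best-path CTC decoding of per-frame class probabilities."""
    # 1. Most probable class index at each time step.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge consecutive duplicates (groupby yields one key per run).
    merged = [k for k, _ in groupby(best_path)]
    # 3. Remove blanks and map the remaining indices to characters.
    return "".join(alphabet[k] for k in merged if k != blank)


# Toy example: six frames over the alphabet [blank, "a", "b"].
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],  # a blank separates the two "a" runs
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
print(ctc_greedy_decode(frames, ["<blank>", "a", "b"]))
```

The blank between the two `a` runs is what allows a doubled character to survive the merge step.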

3.
In this paper we show how surface orientation information inferred using shape-from-shading can be used to aid the process of fitting a 3D morphable model to an image of a face. We consider the problem of model dominance and show how shading constraints can be used to refine morphable model shape estimates, offering the possibility of exceeding the maximum possible accuracy of the model. We use this observation to motivate an optimisation scheme based on surface normal error. This ensures the fullest possible use of the information conveyed by the shading in an image. Moreover, our framework allows estimation of per-vertex albedo and bump maps that are not constrained to lie within the span of the model. This means the recovered model is capable of describing shape and reflectance phenomena not present in the training set. We show reconstruction and synthesis results and demonstrate that the shape and albedo estimates can be used for illumination-insensitive recognition using only a single gallery image.
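As a rough illustration of a surface-normal error term, the sketch below measures the per-vertex angle between normals predicted by a fitted model and normals inferred from shading, then averages it. This is an assumed, simplified form: the paper's actual objective may weight, robustify, or aggregate the error differently, and the function names are hypothetical.

```python
import math


def angular_error_deg(n1, n2):
    """Angle in degrees between two 3D normals (normalised before comparison)."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def mean_normal_error(model_normals, sfs_normals):
    """Hypothetical objective: mean per-vertex angular error between
    model normals and shape-from-shading normals."""
    errors = [angular_error_deg(m, s) for m, s in zip(model_normals, sfs_normals)]
    return sum(errors) / len(errors)
```

An optimiser would adjust the model's shape parameters to reduce this quantity; lengths are normalised first, so only normal directions matter.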

4.
We generalise the notion of a bubble beyond the financial domain, by showing how a single social mechanism, based on an information feedback-loop, explains both financial bubbles and other seemingly disparate social phenomena, such as the recognition of academic articles, website popularity, and the spread of rumours.

We discuss examples of phenomena explained by this bubble mechanism, as well as other phenomena that exhibit certain bubble characteristics, yet are not bubbles according to our model. Finally, we present mathematical mechanisms for two phenomena that conform with our model, and show by computer simulation how they exhibit bubble behaviour.
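To give a feel for how an information feedback loop can concentrate attention, the sketch below simulates a generic rich-get-richer (Pólya-urn-style) process: each new unit of attention goes to an item with probability proportional to a fixed baseline appeal plus `feedback` times the item's current popularity. It is not either of the authors' mechanisms; the model, parameters, and names are assumptions for illustration.

```python
import random


def simulate_feedback(n_items, n_steps, feedback=1.0, appeal=1.0, seed=0):
    """Rich-get-richer process: each step adds one unit of attention (a vote,
    citation, or visit) to an item chosen with probability proportional to
    appeal + feedback * current count. All items share the same appeal."""
    rng = random.Random(seed)
    counts = [0] * n_items
    for _ in range(n_steps):
        weights = [appeal + feedback * c for c in counts]
        chosen = rng.choices(range(n_items), weights=weights)[0]
        counts[chosen] += 1
    return counts


def top_share(counts):
    """Fraction of all attention captured by the most popular item."""
    return max(counts) / sum(counts)
```

With `feedback=0` each choice ignores history; with larger values, early random leads tend to be reinforced, so a few items can capture a disproportionate share even though all items have identical appeal.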

5.
The aim in this paper is to use principal geodesic analysis to model the statistical variations for sets of facial needle maps. We commence by showing how to represent the distribution of surface normals using the exponential map. Shape deformations are described using principal geodesic analysis on the exponential map. Using ideas from robust statistics, we show how this deformable model may be fitted to facial images in which there is significant self-shadowing. Moreover, we demonstrate that the resulting shape-from-shading algorithm can be used to recover accurate facial shape and albedo from real-world images. In particular, the algorithm can effectively fill in the facial surface when more than 30% of its area is subject to self-shadowing. To investigate the utility of the shape parameters delivered by the method, we conduct experiments with illumination-insensitive face recognition. We present a novel recognition strategy in which similarity is measured in the space of the principal geodesic parameters. We also use the recovered shape information to generate illumination-normalized prototype images on which recognition can be performed. Finally, we show that, from a single input image, we are able to generate the basis images employed by a number of well-known illumination-insensitive recognition algorithms. We also demonstrate that the principal geodesics provide an efficient parameterization of the space of harmonic basis images.

6.
Social relation analysis via images is a new research area that has recently attracted much interest. As social media usage increases, a wide variety of information can be extracted from the growing number of consumer photos shared online, such as the category of the events captured or the relationships between individuals in a given picture. The family is one of the most important units of society, so categorizing family photos is an essential step toward image-based social analysis and content-based retrieval of consumer photos. We propose an approach that combines multiple unique and complementary cues for recognizing family photos. The first cue analyzes the geometric arrangement of people in the photograph, which characterizes scene-level information efficiently yet discriminatively. The second cue models facial appearance similarities to capture and quantify relevant pairwise relations between individuals in a given photo. The last cue investigates the semantics of the context in which the photo was taken. Experiments on a dataset containing thousands of family and non-family pictures collected from social media indicate that each individual model produces good recognition results. Furthermore, a combined approach incorporating appearance, geometric, and semantic features significantly outperforms the state of the art in this domain, achieving 96.7% classification accuracy.
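One simple way to combine per-cue outputs is weighted late fusion of per-cue probabilities, sketched below. The cue names, weights, and threshold are hypothetical, and the abstract does not state whether the authors fuse scores or combine features before classification.

```python
def fuse_scores(cue_scores, weights):
    """Weighted average of per-cue probabilities that a photo shows a family.
    cue_scores and weights are dicts keyed by (hypothetical) cue names."""
    total_weight = sum(weights[cue] for cue in cue_scores)
    return sum(weights[cue] * p for cue, p in cue_scores.items()) / total_weight


def is_family_photo(cue_scores, weights, threshold=0.5):
    """Label the photo 'family' when the fused probability reaches the threshold."""
    return fuse_scores(cue_scores, weights) >= threshold


scores = {"geometry": 0.6, "appearance": 0.8, "semantics": 0.4}
equal_weights = {cue: 1.0 for cue in scores}
print(is_family_photo(scores, equal_weights))
```

In practice the weights would be learned on validation data rather than set by hand.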

7.
Conversational Actions and Discourse Situations   Total citations: 2 (self-citations: 0, by others: 2)
We use the idea that actions performed in a conversation become part of the common ground as the basis for a model of context that reconciles, in a general and systematic fashion, the differences between the theories of discourse context used for reference resolution, intention recognition, and dialogue management. We start from the treatment of anaphoric accessibility developed in discourse representation theory (DRT), and we first show how to obtain a discourse model that, while preserving DRT's basic ideas about referential accessibility, includes information about the occurrence of speech acts and their relations. Next, we show how the different kinds of 'structure' that play a role in conversation (discourse segmentation, turn-taking, and grounding) can be formulated in terms of information about speech acts, and we use this same information as the basis for a model of the interpretation of fragmentary input.

8.
Visualizers, like logicians, have long been concerned with meaning. Generalizing from MacEachren's overview of cartography, visualizers have to think about how people extract meaning from pictures (psychophysics), what people understand from a picture (cognition), how pictures are imbued with meaning (semiotics), and how, in some cases, that meaning arises within a social and/or cultural context. Considering the communication acts carried out in the visualization process suggests further levels of meaning. Visualization begins when someone has data that they wish to explore and interpret; the data are encoded as input to a visualization system, which may in turn interact with other systems to produce a representation. This representation is communicated back to the user(s), who have to assess it against their goals and knowledge, possibly leading to further cycles of activity. Each phase of this process involves communication between two parties. For this to succeed, the parties must share a common language with an agreed meaning. We distinguish three levels of such a language, in increasing order of formality: (1) terminology (jargon), (2) taxonomy (vocabulary), and (3) ontology. Our argument in this article is that it is time to begin synthesizing the fragments and views into a level-3 model: an ontology of visualization. We also address why this should happen, what is already in place, how such an ontology might be constructed, and why now.

9.
In the age of digital photography, the number of photos in our personal collections has increased substantially, along with the effort needed to manage these larger collections. This issue has already been addressed in various ways, ranging from organization by metadata analysis to image recognition and social network analysis. We introduce a new, more personal perspective on photowork that aims to understand the user and his or her subjective relationship to the photos. It does so by means of implicit human–computer interaction, that is, by observing the user's interaction with the photos. To study this interaction, we designed an experiment to see how people behave when manipulating photos on a tablet and how this implicitly conveyed information can be used to aid photo collection management.

10.
In this study, we propose a multi-layer method for recognizing and retrieving flower species in the natural environment, and we also suggest novel applications. First, we describe how to capture a flower blooming in its natural environment together with the corresponding background. Second, we conduct an experimental analysis to determine the optimal feature-extraction methods for color, texture, and shape. Third, we develop an automatic flower-image recognition technology that can be used in a mobile environment. We performed experiments on 29,463 images of 300 species of blooming flowers collected in South Korea between 2011 and 2014. The recognition rate was 91.26% when only the first-ranked result was counted and 97.40% when the top six ranked results were counted. These results show that combining color, texture, and shape features of the flower images is the most effective; furthermore, this paper verifies the effectiveness and validity of the proposed method for demonstration services.
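The first-ranked and top-six figures above are top-k accuracies: a query counts as correct if the true species appears among the first k ranked candidates. A minimal sketch of that metric, with made-up species labels:

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of queries whose true label is among the first k predictions."""
    hits = sum(1 for ranked, label in zip(ranked_predictions, true_labels)
               if label in ranked[:k])
    return hits / len(true_labels)


# Hypothetical ranked outputs for three query images.
ranked = [
    ["rose", "tulip", "lily"],
    ["lily", "rose", "iris"],
    ["iris", "tulip", "rose"],
]
labels = ["rose", "rose", "lily"]
print(top_k_accuracy(ranked, labels, 1), top_k_accuracy(ranked, labels, 2))
```

Top-k accuracy can only stay the same or rise as k grows, which is why the top-six figure exceeds the first-ranked one.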

11.
Temporal motion models for monocular and multiview 3D human body tracking   Total citations: 1 (self-citations: 0, by others: 1)
We explore an approach to 3D people tracking with learned motion models and deterministic optimization. The tracking problem is formulated as the minimization of a differentiable criterion whose differential structure is rich enough for optimization to be accomplished via hill-climbing. This avoids the computational expense of Monte Carlo methods while yielding good results under challenging conditions. To demonstrate the generality of the approach, we show that we can learn and track cyclic motions such as walking and running, as well as acyclic motions such as a golf swing. We also show results from both monocular and multi-camera tracking. Finally, we provide results with a motion model learned from multiple activities, and show how such models might be used for recognition.
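The idea of tracking by minimizing a differentiable criterion can be illustrated with a toy objective for a pose vector x that balances an image-derived target z against a motion-model prediction m: E(x) = ||x - z||^2 + lam * ||x - m||^2. The sketch below minimizes it by plain gradient descent; the objective, step size, and weighting are assumptions for illustration and far simpler than the paper's criterion.

```python
def grad_descent_pose(z, m, lam=0.5, step=0.1, iters=200):
    """Minimise E(x) = ||x - z||^2 + lam * ||x - m||^2 by gradient descent.
    z: hypothetical image-derived pose target; m: motion-model prediction."""
    x = list(m)  # initialise at the motion-model prediction
    for _ in range(iters):
        # dE/dx = 2 (x - z) + 2 lam (x - m), computed per coordinate.
        grad = [2 * (xi - zi) + 2 * lam * (xi - mi) for xi, zi, mi in zip(x, z, m)]
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x
```

Because this toy E is quadratic, its minimizer has the closed form (z + lam * m) / (1 + lam), which makes the descent easy to check; the step size must stay below 1 / (1 + lam) for the iteration to converge.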

12.
Recovery of temporal information from static images of handwriting   Total citations: 3 (self-citations: 0, by others: 3)
The problem of off-line handwritten character recognition has eluded a satisfactory solution for several decades. Researchers working in the area of on-line recognition have had greater success, but the possibility of extracting on-line information from static images has not been fully explored. The experience of forensic document examiners assures us that in many cases such information can be successfully recovered.

We outline the design of a system for the recovery of temporal information from static handwritten images. We provide a taxonomy of local, regional, and global temporal clues that are often found in handwritten samples, and describe methods for recovering these clues from the image.

We show how this system can benefit from a comprehensive understanding of the handwriting signal and a detailed analysis of stroke and sub-stroke properties. We suggest that the recovery task requires breaking away from traditional thresholding and thinning techniques, and we provide a framework for such analysis. We demonstrate how isolated temporal clues can be reliably extracted within this framework and propose a control structure for integrating the partial information.

We show how many seemingly ambiguous situations can be resolved by the derived clues and our knowledge of the writing process, and provide several examples to illustrate our approach.

The support of this research by the Ricoh Corporation is gratefully acknowledged.

13.
Automatic indexing and content-based retrieval of captioned images   Total citations: 2 (self-citations: 0, by others: 2)
Srihari, R.K. Computer, vol. 28, no. 9, pp. 49-56, 1995.

14.
Object recognition using shape-from-shading   Total citations: 2 (self-citations: 0, by others: 2)
This paper investigates whether surface topography information extracted from intensity images using a shape-from-shading (SFS) algorithm can be used for 3D object recognition. We consider how the curvature and shape-index information delivered by this algorithm can be used to recognize objects based on their surface topography. We explore two contrasting object recognition strategies. The first is based on a low-level attribute summary and uses histograms of curvature and orientation measurements. The second is based on the structural arrangement of constant shape-index maximal patches and their associated region attributes. We show that region curvedness and a string ordering of the regions according to size provide a recognition accuracy of about 96 percent. By polling various recognition schemes, including a graph-matching method, we show that a recognition rate of 98-99 percent is achievable.
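Shape index and curvedness, the quantities used by the second strategy, are computed from the two principal curvatures at a surface point. The sketch below uses one common convention (k1 >= k2, shape index in [-1, 1]); other papers flip the sign or rescale to [0, 1], and the sign also depends on the direction of the surface normal, so treat the convention here as an assumption.

```python
import math


def shape_index(k1, k2):
    """Shape index in [-1, 1] from principal curvatures (k1 >= k2 enforced).
    0 is a symmetric saddle, +/-0.5 a ridge or rut, +/-1 a cap or cup."""
    k1, k2 = max(k1, k2), min(k1, k2)
    if k1 == 0 and k2 == 0:
        return None  # planar point: shape index undefined
    # atan2 also handles umbilic points (k1 == k2), where k1 - k2 is zero.
    return (2 / math.pi) * math.atan2(k1 + k2, k1 - k2)


def curvedness(k1, k2):
    """Overall magnitude of curvature, independent of the shape type."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2)
```

Shape index captures the type of local shape while curvedness captures its scale, which is why patches of constant shape index can be grouped and then summarized by their curvedness.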

15.
Community detection in social networks is a well-studied problem. A community in a social network is commonly defined as a group of people who interact more within the group than with people outside it. It is believed that people's behavior can be linked to the behavior of their social neighborhood. While shared characteristics of communities have been used to validate the communities found, to the best of the authors' knowledge, it has not been demonstrated in the literature that communities found using social interaction data are like-minded, i.e., that their members behave similarly in terms of their interest in items (e.g., movies, products). In this paper, we experimentally demonstrate, on a social networking movie-rating dataset, that people who are interested in an item are better connected socially than the overall graph. Motivated by this fact, we propose a method for finding communities in which like-mindedness is an explicit objective. We find small, tight groups with many shared interests using a frequent itemset mining approach and use these as building blocks for the cores of like-minded communities. We show that these communities have higher similarity in their interests than communities found using only the interaction information. We also compare our method against a baseline in which edge weights are defined by the similarity in interests between nodes, and show that our approach achieves a far higher level of like-mindedness within the communities than this baseline as well.
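Finding small groups of users with many shared interests can be framed as frequent itemset mining in which each item (e.g., a movie) is a transaction holding the set of users interested in it, and a frequent "itemset" is a group of users that co-occurs in many transactions. The brute-force sketch below assumes that framing and uses hypothetical data; Apriori or FP-growth would prune the search on real data, and the paper's exact formulation may differ.

```python
from itertools import combinations


def frequent_user_groups(item_fans, group_size, min_support):
    """Return {group: support} for every group of `group_size` users that is
    jointly interested in at least `min_support` items.
    item_fans maps item -> set of users interested in that item."""
    support = {}
    for fans in item_fans.values():
        # Every sub-group of this item's fans gains one unit of support.
        for group in combinations(sorted(fans), group_size):
            support[group] = support.get(group, 0) + 1
    return {g: s for g, s in support.items() if s >= min_support}


# Hypothetical movie -> interested-users data.
item_fans = {
    "m1": {"ann", "bob", "cat"},
    "m2": {"ann", "bob"},
    "m3": {"ann", "bob", "dan"},
    "m4": {"cat", "dan"},
}
print(frequent_user_groups(item_fans, 2, 2))
```

Groups found this way could then serve as seeds that are expanded into larger like-minded communities.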

16.
Advanced Robotics, vol. 27, no. 4, pp. 405-428, 2013.
Robots designed to interact socially with people require reliable estimates of human position and motion. Additional pose data such as body orientation may enable a robot to interact more effectively by providing a basis for inferring contextual social information such as people's intentions and relationships. To this end, we have developed a system for simultaneously tracking the position and body orientation of many people, using a network of laser range finders mounted at torso height. An individual particle filter is used to track the position and velocity of each human, and a parametric shape model representing the person's cross-sectional contour is fit to the observed data at each step. We demonstrate the system's tracking accuracy quantitatively in laboratory trials and we present results from a field experiment observing subjects walking through the lobby of a building. The results show that our method can closely track torso and arm movements, even with noisy and incomplete sensor data, and we present examples of social information observable from this orientation and positioning information that may be useful for social robots.
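The per-person tracker described above can be approximated by a standard bootstrap particle filter with a constant-velocity motion model, sketched below for 2D position and velocity. The noise levels, particle count, and Gaussian measurement model are assumptions; the contour-shape fitting and body-orientation estimation from the paper are omitted.

```python
import math
import random


def track(measurements, n_particles=500, dt=1.0, q=0.05, r=0.2, seed=0):
    """Bootstrap particle filter for one person. Particle state: [x, y, vx, vy].
    measurements: list of noisy (x, y) positions; returns estimated positions."""
    rng = random.Random(seed)
    x0, y0 = measurements[0]
    particles = [[x0 + rng.gauss(0, r), y0 + rng.gauss(0, r),
                  rng.gauss(0, 0.5), rng.gauss(0, 0.5)]
                 for _ in range(n_particles)]
    estimates = []
    for mx, my in measurements:
        # Predict: perturb velocity (process noise), then move at constant velocity.
        for p in particles:
            p[2] += rng.gauss(0, q)
            p[3] += rng.gauss(0, q)
            p[0] += p[2] * dt
            p[1] += p[3] * dt
        # Update: Gaussian likelihood of the measured position.
        weights = [math.exp(-((p[0] - mx) ** 2 + (p[1] - my) ** 2) / (2 * r * r))
                   for p in particles]
        total = sum(weights)
        if total == 0.0:  # every particle far off: fall back to uniform weights
            weights, total = [1.0] * n_particles, float(n_particles)
        weights = [w / total for w in weights]
        # Estimate: weighted mean position.
        ex = sum(w * p[0] for w, p in zip(weights, particles))
        ey = sum(w * p[1] for w, p in zip(weights, particles))
        estimates.append((ex, ey))
        # Resample (multinomial), copying so duplicates evolve independently.
        particles = [list(p) for p in rng.choices(particles, weights=weights, k=n_particles)]
    return estimates
```

Running one such filter per detected person mirrors the "individual particle filter" design; a full system would also need data association between laser detections and people.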

17.
A key assumption of traditional machine learning approaches is that the test data are drawn from the same distribution as the training data. However, this assumption does not hold in many real-world scenarios. For example, in facial expression recognition, the appearance of an expression may vary significantly across people. As a result, previous work has shown that learning from adequate person-specific data can improve expression recognition performance over learning from generic data. However, person-specific data are typically very sparse in real-world applications because of the difficulty of data collection and labeling, and learning from sparse data may suffer from serious over-fitting. In this paper, we propose to learn a person-specific model through transfer learning. Transferring informative knowledge from other people allows us to learn an accurate model for a new subject with only a small amount of person-specific data. We conduct extensive experiments comparing different person-specific models for facial expression and action unit (AU) recognition, and show that transfer learning significantly improves recognition performance with a small amount of training data.
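One simple form of parameter transfer is to regularize a person-specific model toward a generic model learned from other people (biased ridge regression). The sketch below does this for a single linear weight; it illustrates the idea of transferring knowledge to a data-poor subject and is not the method proposed in the paper.

```python
def fit_person_specific(xs, ys, w_generic, lam):
    """Minimise sum((y - w*x)^2) + lam * (w - w_generic)^2 over a single weight w.
    Setting the derivative to zero gives the closed form below.
    lam = 0 is plain least squares; large lam keeps w near the generic weight."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (sxy + lam * w_generic) / (sxx + lam)
```

With only a few person-specific samples, a moderate `lam` lets the generic model supply most of the estimate; as more samples accumulate, `sxx` grows and the person's own data dominate.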

18.
The capacity of human recognition memory was investigated by Standing, who presented several groups of participants with different numbers of pictures (from 20 to 10 000) and subsequently tested their ability to distinguish between previously presented and novel pictures. When the estimated number of pictures retained in recognition memory by each group was plotted against the number of pictures presented, both on logarithmic scales, the points formed a straight line, representing a power-law relationship. Here, we investigate whether published models of familiarity discrimination can replicate Standing's results. We first make the simplifying assumption that visual stimuli are represented by uncorrelated patterns of firing of the visual neurons that provide input to the familiarity discrimination network. We show that in this case three models (Familiarity discrimination based on Energy (FamE), Anti-Hebbian, and Info-max) can reproduce the observed power-law relationship when their synaptic weights are appropriately initialized. Under more realistic assumptions about the neural representation of stimuli, the FamE model is no longer able to reproduce the power-law relationship in simulations, whereas the Anti-Hebbian and Info-max models still can. Nevertheless, the slopes of the power-law relationships produced by the models in all simulations differ from the slope observed by Standing. We discuss possible reasons for this difference, including separate contributions of familiarity and recollection processes, and describe experimentally testable predictions based on our analysis.
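A power law N_retained = a * N_presented^b becomes a straight line with slope b on log-log axes, so its exponent can be estimated by ordinary least squares on the logarithms; the slope b is the quantity compared between the models and Standing's data. The sketch below uses synthetic numbers, not Standing's data.

```python
import math


def fit_power_law(n_presented, n_retained):
    """Fit n_retained = a * n_presented**b by least squares on log10-log10 axes.
    Returns (a, b); b is the slope of the straight line."""
    xs = [math.log10(n) for n in n_presented]
    ys = [math.log10(r) for r in n_retained]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = 10 ** (my - b * mx)
    return a, b


# Synthetic, noise-free data generated with a = 3 and b = 0.8 (illustrative only).
presented = [20, 100, 1000, 10000]
retained = [3 * n ** 0.8 for n in presented]
print(fit_power_law(presented, retained))
```

With real recognition-memory estimates the points scatter around the line, and the fitted slope summarizes how capacity grows with the number of pictures presented.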

19.
In this paper, we present an approach for consistently labeling people and for detecting human–object interactions using mono-camera surveillance video. The approach is based on a robust appearance-based correlogram model combined with histogram information to model color distributions of people and objects in the scene. The models are dynamically built from non-stationary objects, which are the outputs of background subtraction, and are used to identify objects on a frame-by-frame basis. We are able to detect when people merge into groups and to segment them even during partial occlusion. We can also detect when a person deposits or removes an object. The models persist when a person or object leaves the scene and are used to identify them when they reappear. Experiments show that the models are able to accommodate perspective foreshortening that occurs with overhead camera angles, as well as partial occlusion. The results show that this is an effective approach that is able to provide important information to algorithms performing higher-level analysis, such as activity recognition, where human–object interactions play an important role.
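As an illustration of a correlogram-based color model, the sketch below computes the classic color autocorrelogram used in image retrieval: for each quantized color c, the probability that a pixel at chessboard distance d from a pixel of color c also has color c. The paper's correlogram may be defined differently (for example, over pairs of colors within a tracked person's region); the image format and names here are assumptions.

```python
def autocorrelogram(img, n_colors, d):
    """img: 2D list of quantized color indices in range(n_colors).
    Returns, per color c, P(pixel at chessboard distance d has color c |
    reference pixel has color c); 0.0 for colors that never occur."""
    h, w = len(img), len(img[0])
    same = [0] * n_colors
    total = [0] * n_colors
    # Offsets on the boundary of the (2d+1) x (2d+1) square around a pixel.
    offsets = [(dy, dx) for dy in range(-d, d + 1) for dx in range(-d, d + 1)
               if max(abs(dy), abs(dx)) == d]
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    total[c] += 1
                    if img[ny][nx] == c:
                        same[c] += 1
    return [s / t if t else 0.0 for s, t in zip(same, total)]
```

Unlike a plain color histogram, this statistic also encodes how colors are spatially arranged, which helps distinguish people whose clothing has similar overall color proportions.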

20.
We conducted a meta-synthesis of five different studies that developed, tested, and implemented new technologies for collecting observations of daily living (ODL). From this synthesis, we developed a model that explains user motivation for ODL collection. The model includes six factors that motivate patients' collection of ODL data: usability, illness experience, relevance of ODL, information technology infrastructure, degree of burden, and emotional activation. We show how these factors can act as barriers to or facilitators of ODL data collection, and how interacting with care professionals and sharing ODL data may also influence ODL collection, health-related awareness, and behavior change. The model we developed can help researchers and designers who study and develop new personal health technologies that empower people to improve their health.
