Similar literature
 20 similar documents found (search time: 46 ms)
1.
Observers can rapidly perform a variety of visual tasks such as categorizing a scene as open, as outdoor, or as a beach. Although we know that different tasks are typically associated with systematic differences in behavioral responses, to date, little is known about the underlying mechanisms. Here, we implemented a single integrated paradigm that links perceptual processes with categorization processes. Using a large image database of natural scenes, we trained machine-learning classifiers to derive quantitative measures of task-specific perceptual discriminability based on the distance between individual images and different categorization boundaries. We showed that the resulting discriminability measure accurately predicts variations in behavioral responses across categorization tasks and stimulus sets. We further used the model to design an experiment, which challenged previous interpretations of the so-called “superordinate advantage.” Overall, our study suggests that observed differences in behavioral responses across rapid categorization tasks reflect natural variations in perceptual discriminability.

2.
The ability to quickly categorize visual scenes is critical to daily life, allowing us to identify our whereabouts and to navigate from one place to another. Rapid scene categorization relies heavily on the kinds of objects scenes contain; for instance, studies have shown that recognition is less accurate for scenes to which incongruent objects have been added, an effect usually interpreted as evidence of objects' general capacity to activate semantic networks for scene categories they are statistically associated with. Essentially all real-world scenes contain multiple objects, however, and it is unclear whether scene recognition draws on the scene associations of individual objects or of object groups. To test the hypothesis that scene recognition is steered, at least in part, by associations between object groups and scene categories, we asked observers to categorize briefly viewed scenes appearing with object pairs that were semantically consistent or inconsistent with the scenes. In line with previous results, scenes were less accurately recognized when viewed with inconsistent versus consistent pairs. To understand whether this reflected individual or group-level object associations, we compared the impact of pairs composed of mutually related versus unrelated objects; i.e., pairs which, as groups, had clear associations to particular scene categories versus those that did not. Although related and unrelated object pairs equally reduced scene recognition accuracy, unrelated pairs were consistently less capable of drawing erroneous scene judgments towards scene categories associated with their individual objects. This suggests that scene judgments were influenced by the scene associations of object groups, beyond the influence of individual objects.
More generally, the fact that unrelated objects were as capable of degrading categorization accuracy as related objects, while less capable of generating specific alternative judgments, indicates that the process by which objects interfere with scene recognition is separate from the one through which they inform it.

3.
He X  Yang Z  Tsien JZ 《PloS one》2011,6(5):e20002
Humans can categorize objects in complex natural scenes within 100-150 ms. This amazing ability of rapid categorization has motivated many computational models. Most of these models require extensive training to obtain a decision boundary in a very high dimensional (e.g., ~6,000 in a leading model) feature space and often categorize objects in natural scenes by categorizing the context that co-occurs with objects when objects do not occupy large portions of the scenes. It is thus unclear how humans achieve rapid scene categorization. To address this issue, we developed a hierarchical probabilistic model for rapid object categorization in natural scenes. In this model, a natural object category is represented by a coarse hierarchical probability distribution (PD), which includes PDs of object geometry and spatial configuration of object parts. Object parts are encoded by PDs of a set of natural object structures, each of which is a concatenation of local object features. Rapid categorization is performed as statistical inference. Since the model uses a very small number (~100) of structures for even complex object categories such as animals and cars, it requires little training and is robust in the presence of large variations within object categories and in their occurrences in natural scenes. Remarkably, we found that the model categorized animals in natural scenes and cars in street scenes with near-human-level performance. We also found that the model located animals and cars in natural scenes, thus overcoming a flaw in many other models, which is to categorize objects in natural context by categorizing contextual features. These results suggest that coarse PDs of object categories based on natural object structures and statistical operations on these PDs may underlie the human ability to rapidly categorize scenes.

4.
Yao JG  Gao X  Yan HM  Li CY 《PloS one》2011,6(1):e16343

Background

Instantaneous object discrimination and categorization are fundamental cognitive capacities performed with the guidance of visual attention. Visual attention enables selection of a salient object within a limited area of the visual field, which we refer to as the “field of attention” (FA). Though there is some evidence concerning the spatial extent of object recognition, the following questions remain unanswered: (a) how large is the FA for rapid object categorization, (b) how is the accuracy of attention distributed over the FA, and (c) how fast can complex objects be categorized when presented against backgrounds formed by natural scenes.

Methodology/Principal Findings

To answer these questions, we used a visual perceptual task in which subjects were asked to focus their attention on a point while being required to categorize briefly flashed (20 ms) photographs of natural scenes by indicating whether or not these contained an animal. By measuring the accuracy of categorization at different eccentricities from the fixation point, we were able to determine the spatial extent and the distribution of accuracy over the FA, as well as the speed of categorizing objects using stimulus onset asynchrony (SOA). Our results revealed that subjects are able to rapidly categorize complex natural images within about 0.1 s without eye movement, and showed that the FA for instantaneous image categorization covers a visual field extending 20°×24°, and accuracy was highest (>90%) at the center of FA and declined with increasing eccentricity.

Conclusions/Significance

In conclusion, human beings are able to categorize complex natural images at a glance over a large extent of the visual field without eye movement.

5.
Recognizing an object takes just a fraction of a second, less than the blink of an eye. Applying multivariate pattern analysis, or “brain decoding”, methods to magnetoencephalography (MEG) data has allowed researchers to characterize, in high temporal resolution, the emerging representation of object categories that underlies our capacity for rapid recognition. Shortly after stimulus onset, object exemplars cluster by category in a high-dimensional activation space in the brain. In this emerging activation space, the decodability of exemplar category varies over time, reflecting the brain’s transformation of visual inputs into coherent category representations. How do these emerging representations relate to categorization behavior? Recently it has been proposed that the distance of an exemplar representation from a categorical boundary in an activation space is critical for perceptual decision-making, and that reaction times should therefore correlate with distance from the boundary. The predictions of this distance hypothesis have been borne out in human inferior temporal cortex (IT), an area of the brain crucial for the representation of object categories. When viewed in the context of a time varying neural signal, the optimal time to “read out” category information is when category representations in the brain are most decodable. Here, we show that the distance from a decision boundary through activation space, as measured using MEG decoding methods, correlates with reaction times for visual categorization during the period of peak decodability. Our results suggest that the brain begins to read out information about exemplar category at the optimal time for use in choice behaviour, and support the hypothesis that the structure of the representation for objects in the visual system is partially constitutive of the decision process in recognition.

6.

Background

Since the pioneering study by Rosch and colleagues in the 70s, it is commonly agreed that basic level perceptual categories (dog, chair…) are accessed faster than superordinate ones (animal, furniture…). Nevertheless, the speed at which objects presented in natural images can be processed in a rapid go/no-go visual superordinate categorization task has challenged this “basic level advantage”.

Principal Findings

Using the same task, we compared human processing speed when categorizing natural scenes as containing either an animal (superordinate level), or a specific animal (bird or dog, basic level). Human subjects require an additional 40–65 ms to decide whether an animal is a bird or a dog and most errors are induced by non-target animals. Indeed, processing time is tightly linked with the type of non-target objects. Without any exemplar of the same superordinate category to ignore, the basic level category is accessed as fast as the superordinate category, whereas the presence of animal non-targets induces both an increase in reaction time and a decrease in accuracy.

Conclusions and Significance

These results support the parallel distributed processing theory (PDP) and might reconcile controversial studies recently published. The visual system can quickly access a coarse/abstract visual representation that allows fast decision for superordinate categorization of objects but additional time-consuming visual analysis would be necessary for a decision at the basic level based on more detailed representations.

7.
Viewpoint-specific scene representations in human parahippocampal cortex   Cited by: 15 (self-citations: 0, citations by others: 15)
Epstein R  Graham KS  Downing PE 《Neuron》2003,37(5):865-876
The "parahippocampal place area" (PPA) responds more strongly in functional magnetic resonance imaging (fMRI) to scenes than to faces, objects, or other visual stimuli. We used an event-related fMRI adaptation paradigm to test whether the PPA represents scenes in a viewpoint-specific or viewpoint-invariant manner. The PPA responded just as strongly to viewpoint changes that preserved intrinsic scene geometry as it did to complete scene changes, but less strongly to object changes within the scene. In contrast, lateral occipital cortex responded more strongly to object changes than to spatial changes. These results demonstrate that scene processing in the PPA is viewpoint specific and suggest that the PPA represents the relationship between the observer and the surfaces that define local space.

8.
We investigated whether low-level processed image properties that are shared by natural scenes and artworks – but not veridical face photographs – affect the perception of facial attractiveness and age. Specifically, we considered the slope of the radially averaged Fourier power spectrum in a log-log plot. This slope is a measure of the distribution of spatial frequency power in an image. Images of natural scenes and artworks possess – compared to face images – a relatively shallow slope (i.e., increased high spatial frequency power). Since aesthetic perception might be based on the efficient processing of images with natural scene statistics, we assumed that the perception of facial attractiveness might also be affected by these properties. We calculated Fourier slope and other beauty-associated measurements in face images and correlated them with ratings of attractiveness and age of the depicted persons (Study 1). We found that Fourier slope – in contrast to the other tested image properties – did not predict attractiveness ratings when we controlled for age. In Study 2A, we overlaid face images with random-phase patterns with different statistics. Patterns with a slope similar to those in natural scenes and artworks resulted in lower attractiveness and higher age ratings. In Studies 2B and 2C, we directly manipulated the Fourier slope of face images and found that images with shallower slopes were rated as more attractive. Additionally, attractiveness of unaltered faces was affected by the Fourier slope of a random-phase background (Study 3). Faces in front of backgrounds with statistics similar to natural scenes and faces were rated as more attractive. We conclude that facial attractiveness ratings are affected by specific image properties. An explanation might be the efficient coding hypothesis.

9.
The cognitive and neural mechanisms for recognizing and categorizing behavior are not well understood in non-human animals. In the current experiments, pigeons and humans learned to categorize two non-repeating, complex human behaviors (“martial arts” vs. “Indian dance”). Using multiple video exemplars of a digital human model, pigeons discriminated these behaviors in a go/no-go task and humans in a choice task. Experiment 1 found that pigeons already experienced with discriminating the locomotive actions of digital animals acquired the discrimination more rapidly when action information was available than when only pose information was available. Experiments 2 and 3 found this same dynamic superiority effect with naïve pigeons and human participants. Both species used the same combination of immediately available static pose information and more slowly perceived dynamic action cues to discriminate the behavioral categories. Theories based on generalized visual mechanisms, as opposed to embodied, species-specific action networks, offer a parsimonious account of how these different animals recognize behavior across and within species.

10.
Marois R  Yi DJ  Chun MM 《Neuron》2004,41(3):465-472
Cognitive models of attention propose that visual perception is a product of two stages of visual processing: early operations permit rapid initial categorization of the visual world, while later attention-demanding capacity-limited stages are necessary for the conscious report of the stimuli. Here we used the attentional blink paradigm and fMRI to neurally distinguish these two stages of vision. Subjects detected a face target and a scene target presented rapidly among distractors at fixation. Although the second, scene target frequently went undetected by the subjects, it nonetheless activated regions of the medial temporal cortex involved in high-level scene representations, the parahippocampal place area (PPA). This PPA activation was amplified when the stimulus was consciously perceived. By contrast, the frontal cortex was activated only when scenes were successfully reported. These results suggest that medial temporal cortex permits rapid categorization of the visual input, while the frontal cortex is part of a capacity-limited attentional bottleneck to conscious report.

11.
The ability to detect sudden changes in the environment is critical for survival. Hearing is hypothesized to play a major role in this process by serving as an “early warning device,” rapidly directing attention to new events. Here, we investigate listeners' sensitivity to changes in complex acoustic scenes—what makes certain events “pop-out” and grab attention while others remain unnoticed? We use artificial “scenes” populated by multiple pure-tone components, each with a unique frequency and amplitude modulation rate. Importantly, these scenes lack semantic attributes, which may have confounded previous studies, thus allowing us to probe low-level processes involved in auditory change perception. Our results reveal a striking difference between “appear” and “disappear” events. Listeners are remarkably tuned to object appearance: change detection and identification performance are at ceiling; response times are short, with little effect of scene-size, suggesting a pop-out process. In contrast, listeners have difficulty detecting disappearing objects, even in small scenes: performance rapidly deteriorates with growing scene-size; response times are slow, and even when change is detected, the changed component is rarely successfully identified. We also measured change detection performance when a noise or silent gap was inserted at the time of change or when the scene was interrupted by a distractor that occurred at the time of change but did not mask any scene elements. Gaps adversely affected the processing of item appearance but not disappearance. However, distractors reduced both appearance and disappearance detection. Together, our results suggest a role for neural adaptation and sensitivity to transients in the process of auditory change detection, similar to what has been demonstrated for visual change detection.
Importantly, listeners consistently performed better for item addition (relative to deletion) across all scene interruptions used, suggesting a robust perceptual representation of item appearance.

12.
The way we experience the space around us is highly subjective. It has been shown that motion potentialities that are intrinsic to our body influence our space categorization. Furthermore, we have recently demonstrated that in the extrapersonal space, our categorization also depends on the movement potential of other agents. When we have to categorize the space as “Near” or “Far” between a reference and a target, the space categorized as “Near” is wider if the reference corresponds to a biological agent that has the potential to walk, rather than a biological or non-biological agent that cannot walk. But what exactly drives this “Near space extension”? In the present paper, we tested whether abstract beliefs about the biological nature of an agent determine how we categorize the space between the agent and an object. Participants were asked to first read a Pinocchio story and watch a corresponding video in which Pinocchio acts like a real human, in order to become more transported into the initial story. Then they had to categorize the location ("Near" or "Far") of a target object located at progressively increasing or decreasing distances from a non-biological agent (i.e., a wooden dummy) and from a biological agent (i.e., a human-like avatar). The results indicate that being transported into the Pinocchio story induces an equal “Near” space threshold with both the avatar and the wooden dummy as reference frames.

13.
The aim of this study is to explore whether matrices and MP trees used to produce systematic categories of organisms could be useful to produce categories of ideas in history of science. We study the history of the use of trees in systematics to represent the diversity of life from 1766 to 1991. We apply to those ideas a method inspired from coding homologous parts of organisms. We discretize conceptual parts of ideas, writings and drawings about trees contained in 41 main writings; we detect shared parts among authors and code them into a 91-character matrix and use a tree representation to show who shares what with whom. In other words, we propose a hierarchical representation of the shared ideas about trees among authors: this produces a “tree of trees.” Then, we categorize schools of tree-representations. Classical schools like “cladists” and “pheneticists” are recovered but others are not: “gradists” are separated into two blocks, one of them being called here “grade theoreticians.” We propose new interesting categories like the “buffonian school,” the “metaphoricians,” and those using “strictly genealogical classifications.” We consider that networks are not useful to represent shared ideas at the present step of the study. A cladogram is made for showing who is sharing what with whom, but also heterobathmy and homoplasy of characters. The present cladogram is not modelling processes of transmission of ideas about trees, and here it is mostly used to test for proximity of ideas of the same age and for categorization.

14.
The processes underlying object recognition are fundamental for the understanding of visual perception. Humans can recognize many objects rapidly even in complex scenes, a task that still presents major challenges for computer vision systems. A common experimental demonstration of this ability is the rapid animal detection protocol, where human participants' earliest responses to report the presence/absence of animals in natural scenes are observed at 250–270 ms latencies. One of the hypotheses to account for such speed is that people would not actually recognize an animal per se, but rather base their decision on global scene statistics. These global statistics (also referred to as spatial envelope or gist) have been shown to be computationally easy to process and could thus be used as a proxy for coarse object recognition. Here, using a saccadic choice task, which allows us to investigate a previously inaccessible temporal window of visual processing, we showed that animal – but not vehicle – detection clearly precedes scene categorization. This asynchrony is in addition validated by a late contextual modulation of animal detection, starting simultaneously with the availability of scene category. Interestingly, the advantage for animal over scene categorization is in opposition to the results of simulations using standard computational models. Taken together, these results challenge the idea that rapid animal detection might be based on early access of global scene statistics, and rather suggest a process based on the extraction of specific local complex features that might be hardwired in the visual system.

15.
This work analyzed the perceptual attributes of natural dynamic audiovisual scenes. We presented thirty participants with 19 natural scenes in a similarity categorization task, followed by a semi-structured interview. The scenes were reproduced with an immersive audiovisual display. Natural scene perception has been studied mainly with unimodal settings, which have identified motion as one of the most salient attributes related to visual scenes, and sound intensity along with pitch trajectories related to auditory scenes. However, controlled laboratory experiments with natural multimodal stimuli are still scarce. Our results show that humans pay attention to similar perceptual attributes in natural scenes, and a two-dimensional perceptual map of the stimulus scenes and perceptual attributes was obtained in this work. The exploratory results show the amount of movement, perceived noisiness, and eventfulness of the scene to be the most important perceptual attributes in naturalistically reproduced real-world urban environments. We found the scene gist properties openness and expansion to remain as important factors in scenes with no salient auditory or visual events. We propose that the study of scene perception should move forward to understand better the processes behind multimodal scene processing in real-world environments. We publish our stimulus scenes as spherical video recordings and sound field recordings in a publicly available database.

16.
Without doubt general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by certain emotional categories or dimensions could play a central role for tomorrow's intelligent systems, enabling search for movies with a particular mood, computer aided scene and sound design in order to elicit certain emotions in the audience, etc. Yet, the lion's share of research in affective computing is exclusively focusing on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lead to a multiplicity of interesting retrieval applications, and at the same time to benefit affective computing research, by moving its methodology “out of the lab” to real-world, diverse data. In this contribution, we address the problem of finding “disturbing” scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis including automatic segmentation. We evaluate the system in terms of mean average precision (MAP) on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to .398 MAP on unseen test data in full realism. An in-depth analysis of the worth of individual features with respect to the target class and the system errors is carried out and reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis.

17.
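Research on the scene consistency effect has found that people name, categorize, search for, and recognize foreground objects that are semantically consistent with the background faster than objects that are inconsistent with it. Compared with context-consistent objects, context-inconsistent objects elicit a larger negative wave (N390) at sites such as the central-parietal region. The parahippocampal cortex/parahippocampal place area (PHC/PPA) and the retrosplenial cortex (RSC) are important brain regions responsible for scene processing. In the time course of the scene consistency effect, low spatial frequency (LSF) information may first activate the orbitofrontal cortex (OFC) (at around 130 ms), the PHC/PPA, and the RSC, after which LSF information is integrated with high spatial frequency (HSF) information in the temporal lobe. Among the many theoretical models, the context facilitation model offers a fairly thorough physiological explanation of the consistency effect.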

18.
The parahippocampal place area: recognition, navigation, or encoding?   Cited by: 24 (self-citations: 0, citations by others: 24)
R Epstein  A Harris  D Stanley  N Kanwisher 《Neuron》1999,23(1):115-125
The parahippocampal place area (PPA) has been demonstrated to respond more strongly in fMRI to scenes depicting places than to other kinds of visual stimuli. Here, we test several hypotheses about the function of the PPA. We find that PPA activity (1) is not affected by the subjects' familiarity with the place depicted, (2) does not increase when subjects experience a sense of motion through the scene, and (3) is greater when viewing novel versus repeated scenes but not novel versus repeated faces. Thus, we find no evidence that the PPA is involved in matching perceptual information to stored representations in memory, in planning routes, or in monitoring locomotion through the local or distal environment but some evidence that it is involved in encoding new perceptual information about the appearance and layout of scenes.

19.
Cognitive theories in visual attention and perception, categorization, and memory often critically rely on concepts of similarity among objects, and empirically require measures of “sameness” among their stimuli. For instance, a researcher may require similarity estimates among multiple exemplars of a target category in visual search, or targets and lures in recognition memory. Quantifying similarity, however, is challenging when everyday items are the desired stimulus set, particularly when researchers require several different pictures from the same category. In this article, we document a new multidimensional scaling database with similarity ratings for 240 categories, each containing color photographs of 16–17 exemplar objects. We collected similarity ratings using the spatial arrangement method. Reports include: the multidimensional scaling solutions for each category, up to five dimensions, stress and fit measures, coordinate locations for each stimulus, and two new classifications. For each picture, we categorized the item's prototypicality, indexed by its proximity to other items in the space. We also classified pairs of images along a continuum of similarity, by assessing the overall arrangement of each MDS space. These similarity ratings will be useful to any researcher who wishes to control the similarity of experimental stimuli according to an objective quantification of “sameness.”

20.
It is widely agreed that in object categorization bottom-up and top-down influences interact. How top-down processes affect categorization has been primarily investigated in isolation, with only one higher level process at a time being manipulated. Here, we investigate the combination of different top-down influences (by varying the level of category, the animacy and the background of the object) and their effect on rapid object categorization. Subjects participated in a two-alternative forced choice rapid categorization task, while we measured accuracy and reaction times. Subjects had to categorize objects on the superordinate, basic or subordinate level. Objects belonged to the category animal or vehicle and each object was presented on a gray, congruent (upright) or incongruent (inverted) background. The results show that each top-down manipulation impacts object categorization and that they interact strongly. The best categorization was achieved on the superordinate level, providing no advantage for basic level in rapid categorization. Categorization between vehicles was faster than between animals on the basic level and vice versa on the subordinate level. Objects in homogeneous gray background (context) yielded better overall performance than objects embedded in complex scenes, an effect most prominent on the subordinate level. An inverted background had no negative effect on object categorization compared to upright scenes. These results show how different top-down manipulations, such as category level, category type and background information, are related. We discuss the implications of top-down interactions on the interpretation of categorization results.
