Similar Documents
20 similar documents found.
1.
A user's focus of attention plays an important role in human-computer interaction applications, such as ubiquitous computing environments and intelligent spaces, where the user's goal and intent have to be continuously monitored. We are interested in modeling people's focus of attention in a meeting situation, and we propose to model participants' focus of attention from multiple cues. We have developed a system that estimates participants' focus of attention from gaze directions and sound sources. We employ an omnidirectional camera to simultaneously track participants' faces around a meeting table and use neural networks to estimate their head poses. In addition, we use microphones to detect who is speaking. The system predicts participants' focus of attention from acoustic and visual information separately and then combines the output of the audio- and video-based predictors. We have evaluated the system using data from three recorded meetings. Adding the acoustic information yielded an average relative error reduction of 8% compared to using only one modality. The focus-of-attention model can be used as an index for a multimedia meeting record and for analyzing meetings.
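As an illustration of the late-fusion step this abstract describes, here is a minimal Python sketch of one plausible scheme, a weighted linear combination of per-modality target distributions. The weight ALPHA, the function name, and the example values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch of late fusion for focus-of-attention prediction.
# Each modality outputs a probability distribution over possible
# focus targets (here: the other meeting participants). The weight
# ALPHA and the example values are illustrative, not from the paper.

ALPHA = 0.6  # trust placed in the video-based predictor (assumed)

def fuse_focus(p_video: np.ndarray, p_audio: np.ndarray) -> np.ndarray:
    """Combine per-modality target distributions into one estimate."""
    fused = ALPHA * p_video + (1.0 - ALPHA) * p_audio
    return fused / fused.sum()  # renormalize to a distribution

# Example: 3 candidate targets for one participant at one time step.
p_video = np.array([0.7, 0.2, 0.1])   # from head-pose neural network
p_audio = np.array([0.3, 0.6, 0.1])   # from speaker detection
print(fuse_focus(p_video, p_audio))   # fused focus-of-attention estimate
```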

2.
With the development of eye gaze tracking technology, much research has explored adopting it so that severely disabled and wheelchair-bound users can interface with home appliances. For this purpose, two cameras are usually required: one for calculating the gaze position of the user, and the other for detecting and recognizing the home appliance. In order to accurately calculate the gaze position on the home appliance that the user looks at, the Z-distance and direction of the home appliance from the user must be measured correctly. Stereo cameras or depth-measuring devices such as Kinect are therefore necessary, but these have limitations, such as the need for additional camera calibration, the low acquisition speed of a two-camera setup, and the large size of the Kinect device. To overcome these problems, we propose a new method for estimating the continuous Z-distances and discrete directions of home appliances using a single small near-infrared (NIR) web camera and a fuzzy system. Experimental results show that the proposed method can accurately estimate the Z-distances and directions to home appliances.
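To make the fuzzy-system idea concrete, here is a minimal Mamdani-style sketch that maps an appliance's apparent width in the image to a Z-distance. The membership ranges, representative distances, and the choice of pixel width as the input variable are all invented for illustration; the paper's actual fuzzy rules are not reproduced here.

```python
import numpy as np

# Sketch of a tiny Mamdani-style fuzzy inference step. Assumption: the
# apparent pixel width of the appliance in the NIR image drives the
# Z-distance estimate. All membership ranges below are invented.

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

def estimate_z(width_px: float) -> float:
    # Rule strengths: wide appearance -> near, narrow -> far.
    near = tri(width_px, 150, 300, 450)
    mid  = tri(width_px, 60, 150, 300)
    far  = tri(width_px, 0, 60, 150)
    # Defuzzify by a weighted average of representative distances (meters).
    z_reps = np.array([1.0, 2.0, 3.5])        # near, mid, far (assumed)
    w = np.array([near, mid, far])
    return float((w @ z_reps) / (w.sum() + 1e-9))

print(f"estimated Z-distance: {estimate_z(120):.2f} m")
```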

3.
When estimating human gaze directions from captured eye appearances, most existing methods assume a fixed head pose, because head motion changes eye appearance greatly and makes the estimation inaccurate. To handle this difficult problem, in this paper we propose a novel method that performs accurate gaze estimation without restricting the user's head motion. The key idea is to decompose the original free-head-motion problem into subproblems: an initial fixed-head-pose problem and subsequent compensations that correct the initial estimation biases. For the initial estimation, automatic image rectification and joint alignment with gaze estimation are introduced. Compensation is then done by either learning-based regression or geometric calculation. The merit of this compensation strategy is that the training requirement for allowing head motion is not significantly increased; only a 5-s video clip needs to be captured. Experimental results show that our method achieves an average accuracy of around 3° using only a single camera.
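The decomposition into an initial fixed-pose estimate plus a learned compensation can be sketched as follows. The ridge regressor, the synthetic bias model, and all data below are assumptions for illustration, not the paper's actual compensation functions.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Sketch of the decomposition idea: a fixed-head-pose gaze estimator
# produces an initial (biased) estimate, and a learned regressor maps
# head-pose parameters to a correction. All data below is synthetic.

rng = np.random.default_rng(0)
head_pose = rng.uniform(-20, 20, size=(200, 3))     # yaw/pitch/roll (deg)
bias = head_pose @ np.array([[0.15, 0.02],          # synthetic bias that
                             [0.03, 0.12],          # grows with head motion
                             [0.01, 0.01]])
initial_gaze = rng.uniform(-10, 10, size=(200, 2))  # fixed-pose estimate
true_gaze = initial_gaze + bias                     # what we want to recover

# Learn the compensation from head pose to the residual error.
compensator = Ridge(alpha=1.0).fit(head_pose, true_gaze - initial_gaze)
corrected = initial_gaze + compensator.predict(head_pose)
print("mean abs error after compensation:",
      np.abs(corrected - true_gaze).mean())
```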

4.
This paper studies the design and application of a novel visual attention model that computes the user's gaze position automatically, i.e., without a gaze-tracking system. The model we propose is specifically designed for real-time first-person exploration of 3D virtual environments. It is the first model adapted to this context that can compute, in real time, a continuous gaze point position rather than a set of 3D objects potentially observed by the user. To do so, contrary to previous models that use a mesh-based representation of visual objects, we introduce a representation based on surface elements. Our model also simulates visual reflexes and the cognitive processes that take place in the brain, such as the gaze behavior associated with first-person navigation in the virtual environment. Our visual attention model combines bottom-up and top-down components to compute a continuous on-screen gaze point intended to match the user's actual one. We conducted an experiment to compare the performance of our method with a state-of-the-art approach. Our results are significantly better, with accuracy gains sometimes exceeding 100 percent. This suggests that computing a gaze point in a 3D virtual environment in real time is possible and is a valid approach compared to object-based approaches. Finally, we describe different applications of our model for exploring virtual environments and present algorithms that can improve or adapt the visual feedback of virtual environments based on gaze information. We first propose a level-of-detail approach that relies heavily on multiple-texture sampling; we show that the gaze information from our visual attention model can be used to increase visual quality where the user is looking, while maintaining a high refresh rate. Second, we use the visual attention model in three visual effects inspired by the human visual system: depth-of-field blur, camera motions, and dynamic luminance. All these effects are computed from the simulated gaze of the user and are meant to improve the user's sensations in future virtual reality applications.
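The final combination step, merging bottom-up and top-down components into a single screen-space gaze point, might look roughly like this. The equal weights, the map contents, and the argmax selection are illustrative simplifications, not the paper's surface-element model.

```python
import numpy as np

# Sketch of combining bottom-up and top-down attention components into
# a single screen-space gaze point, the final step such a model performs.
# The maps and the 0.5/0.5 weights here are synthetic placeholders.

H, W = 90, 160
rng = np.random.default_rng(1)
bottom_up = rng.random((H, W))        # stimulus-driven saliency per pixel
top_down = np.zeros((H, W))
top_down[40:60, 70:110] = 1.0         # e.g., region favored by the navigation task

combined = 0.5 * bottom_up + 0.5 * top_down
y, x = np.unravel_index(np.argmax(combined), combined.shape)
print(f"predicted gaze point (pixel): x={x}, y={y}")
```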

5.
We introduce a system that computes both head orientation and gaze direction from a single image. The system uses a camera with fixed parameters and requires no user calibration. Our approach to head orientation is based on a geometrical model of the human face derived from morphological and physiological data. Eye gaze detection is based on a geometrical model of the human eye. Two new algorithms are introduced that require either two or three feature points to be extracted from each image. Our algorithms are robust and run in real time on a typical PC, which makes our system useful for a large variety of needs, from driver attention monitoring to human-machine interaction.

6.
We propose an electric wheelchair controlled by gaze direction and eye blinking. A camera is set up in front of a wheelchair user to capture image information. The sequential captured image is interpreted to obtain the gaze direction and eye blinking properties. The gaze direction is expressed by the horizontal angle of the gaze, and this is derived from the triangle formed by the centers of the eyes and the nose. The gaze direction and eye blinking are used to provide direction and timing commands, respectively. The direction command relates to the direction of movement of the electric wheelchair, and the timing command relates to the time when the wheelchair should move. The timing command with an eye blinking mechanism is designed to generate ready, backward movement, and stop commands for the electric wheelchair. Furthermore, to move at a certain velocity, the electric wheelchair also receives a velocity command as well as the direction and timing commands. The disturbance observer-based control system is used to control the direction and velocity. For safety purposes, an emergency stop is generated when the electric wheelchair user does not focus their gaze consistently in any direction for a specified time. A number of simulations and experiments were conducted with the electric wheelchair in a laboratory environment.
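A minimal sketch of how a horizontal gaze angle could be read off the eye-nose triangle follows. The linear mapping and the max_angle_deg parameter are assumptions, since the abstract does not give the exact formula.

```python
import numpy as np

# Sketch of deriving a horizontal gaze angle from the triangle formed by
# the two eye centers and the nose. Assumption: the nose position relative
# to the midpoint of the eyes, normalized by the inter-eye distance, maps
# linearly to a gaze angle. The paper's actual formula may differ.

def horizontal_gaze_angle(left_eye, right_eye, nose, max_angle_deg=30.0):
    left_eye, right_eye, nose = map(np.asarray, (left_eye, right_eye, nose))
    mid = (left_eye + right_eye) / 2.0
    inter_eye = np.linalg.norm(right_eye - left_eye)
    # Signed horizontal offset of the nose from the eye midline, in
    # units of inter-eye distance, mapped linearly to an angle.
    offset = (nose[0] - mid[0]) / inter_eye
    return float(np.clip(offset, -1, 1) * max_angle_deg)

# Pixel coordinates (x, y) from a face tracker, purely illustrative.
print(horizontal_gaze_angle((100, 120), (160, 120), (140, 160)))  # ~5 deg
```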

7.
HoloTabletop is a low-cost holographic-like tabletop interactive system. The system analyzes the user's head position and gaze location in real time and computes the corresponding anamorphic-illusion image. This image is displayed on a horizontally placed 2D monitor, yet offers stereo vision to the user, who can view and interact with 3D virtual objects without wearing any special glasses or devices. Experimental results and user studies verify that the proposed HoloTabletop system offers excellent stereo vision without causing visual fatigue. The system is well suited to many interactive applications, such as 3D board games and stereo map browsing.

8.
We have developed an easy-to-use and cost-effective system to construct textured 3D animated face models from videos with minimal user interaction. This is a particularly challenging task for faces due to their lack of prominent textures. We develop a robust system by following a model-based approach: we make full use of generic knowledge of faces in head motion determination, head tracking, model fitting, and multiple-view bundle adjustment. Our system first takes, with an ordinary video camera, images of a person sitting in front of the camera and turning their head from one side to the other. After five manual clicks on two images to indicate the positions of the eye corners, nose tip, and mouth corners, the system automatically generates a realistic-looking 3D human head model that can be animated immediately (different poses, facial expressions, and talking). A user with a PC and a video camera can use our system to generate his or her face model in a few minutes. The face model can then be imported into a favorite game, and users see themselves and their friends take part in the game they are playing. We have demonstrated the system live on a laptop computer at many events and constructed face models for hundreds of people. It works robustly under various environment settings.

9.
We address the problem of recognizing the visual focus of attention (VFOA) of meeting participants based on their head pose. To this end, the head pose observations are modeled using a Gaussian mixture model (GMM) or a hidden Markov model (HMM) whose hidden states correspond to the VFOA. The novelties of this paper are threefold. First, contrary to previous studies on the topic, in our setup the potential VFOA of a person is not restricted to the other participants; it also includes environmental targets (a table and a projection screen), which increases the complexity of the task, with more VFOA targets spread across both the pan and the tilt gaze space. Second, we propose a geometric model to set the GMM or HMM parameters by exploiting results from cognitive science on saccadic eye motion, which allows the head pose to be predicted given a gaze target. Third, we propose an unsupervised parameter adaptation step that uses no labeled data and accounts for the specific gazing behavior of each participant. Using a publicly available corpus of eight meetings featuring four persons, we analyze the above methods by evaluating, through objective performance measures, the recognition of the VFOA from head pose information obtained either with a magnetic sensor device or with a vision-based tracking system. The results clearly show that in such complex but realistic situations, VFOA recognition performance depends highly on how well the visual targets are separated for a given meeting participant. In addition, the results show that using the geometric model with unsupervised adaptation achieves better results than using training data to set the HMM parameters.
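The GMM formulation can be sketched as follows: each VFOA target owns a Gaussian over observed (pan, tilt) head pose, and recognition picks the target that best explains the observation. The means, covariances, and target names below are invented; in the paper these parameters come from the geometric gaze-to-head-pose model.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch of the GMM view of VFOA recognition: each hidden state (a gaze
# target such as a participant, the table, or the screen) owns a Gaussian
# over observed head pose (pan, tilt) in degrees. All numbers are invented.

targets = {
    "person_left":  multivariate_normal(mean=[-30.0, -5.0], cov=[[40, 0], [0, 25]]),
    "person_right": multivariate_normal(mean=[25.0, -5.0],  cov=[[40, 0], [0, 25]]),
    "table":        multivariate_normal(mean=[0.0, -30.0],  cov=[[60, 0], [0, 30]]),
    "screen":       multivariate_normal(mean=[10.0, 15.0],  cov=[[50, 0], [0, 30]]),
}

def recognize_vfoa(pan_deg: float, tilt_deg: float) -> str:
    obs = [pan_deg, tilt_deg]
    # Equal priors: pick the target whose Gaussian best explains the pose.
    return max(targets, key=lambda t: targets[t].pdf(obs))

print(recognize_vfoa(-28.0, -8.0))   # -> "person_left"
```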

10.
Gaze estimation reveals a person's focus of attention and plays an important role in understanding subjective states such as emotion and interest. However, the monocular eye images used for gaze estimation are easily distorted by changes in head pose, which degrades estimation accuracy. We propose a novel classification-based gaze estimation method that uses a 3D face model and the intrinsic parameters of a monocular camera: the 3D coordinates of the eye and mouth centers define a head-pose coordinate system, which is combined with the camera coordinate system to build a normalized coordinate system that rectifies the camera coordinate system. The normalized grayscale eye images are restored and enlarged, an appearance-based convolutional neural network classifier estimates the gaze direction, and a golden-section search further reduces the error. Experimental results on the MPIIGaze dataset show that, compared with published algorithms of the same kind, the method reduces the mean angular error by about 7.4%.
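The golden-section search used for the final refinement is a standard 1-D minimization; here is a self-contained sketch with a stand-in objective, since the paper's actual error function is not reproduced here.

```python
import math

# Golden-section search over a 1-D interval. The quadratic objective
# below is a stand-in; in the paper the search would refine the gaze
# estimate around the CNN classifier's output.

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi ~ 0.618

def golden_section_min(f, a, b, tol=1e-4):
    c, d = b - INV_PHI * (b - a), a + INV_PHI * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b, d = d, c                    # minimum lies in [a, d]
            c = b - INV_PHI * (b - a)
        else:
            a, c = c, d                    # minimum lies in [c, b]
            d = a + INV_PHI * (b - a)
    return (a + b) / 2

# Illustrative error curve with a minimum at 12.5 degrees.
angle = golden_section_min(lambda x: (x - 12.5) ** 2, 0.0, 45.0)
print(f"refined angle: {angle:.3f} deg")
```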

11.
To address the complex calibration required by existing gaze estimation methods under a single-camera, single-light-source setup, we propose a new single-point-calibration gaze estimation method. The method builds statistical gaze estimation models for multiple points on the screen in advance and then estimates the user's point of regard on the screen by interpolation. The main contributions are: 1) a statistics-based single-point-calibration gaze estimation model that reduces the complexity of the calibration process; 2) an incremental learning scheme that further updates the model, improving its adaptability to different users and to head movement. Experiments show that, with simple equipment and under head movement, the method achieves high accuracy with only single-point calibration.
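The interpolation idea can be sketched as follows: models for several screen points are prepared offline, one calibration sample fixes a per-user offset, and the point of regard elsewhere is interpolated. The linear offset model, inverse-distance weights, and all numbers are illustrative assumptions rather than the paper's statistical model.

```python
import numpy as np

# Sketch of single-point calibration with interpolation. Screen
# coordinates are normalized to [0, 1]. All numbers are invented.

grid = np.array([[0.1, 0.1], [0.9, 0.1], [0.5, 0.5],
                 [0.1, 0.9], [0.9, 0.9]])                 # precomputed model points
generic_offsets = np.array([[0.02, 0.01], [-0.01, 0.02], [0.0, 0.0],
                            [0.01, -0.02], [-0.02, -0.01]])  # offline statistics

def estimate_gaze(raw_gaze, user_offset):
    """Correct a raw gaze point using inverse-distance interpolation."""
    d = np.linalg.norm(grid - raw_gaze, axis=1) + 1e-6
    w = (1.0 / d) / (1.0 / d).sum()
    return raw_gaze + w @ generic_offsets + user_offset

# One-point calibration: the user fixates the screen center (0.5, 0.5).
raw_at_center = np.array([0.47, 0.53])          # measured during calibration
user_offset = np.array([0.5, 0.5]) - estimate_gaze(raw_at_center, np.zeros(2))
print(estimate_gaze(np.array([0.3, 0.4]), user_offset))
```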

12.
Eye tracking is one of the most prominent modalities for tracking user attention during interaction with computational devices. Most current eye tracking frameworks focus on tracking the user's gaze during website browsing or while performing other tasks and interactions with a digital device; what they have in common is that they do not exploit gaze as an input modality. In this paper we describe the realization of a framework named viGaze. Its main goal is to provide an easy-to-use framework for exploiting eye gaze as an input modality in various contexts. It therefore provides features to explore explicit and implicit interactions in complex virtual environments by using the eye gaze of a user for various interactions. The viGaze framework is flexible and can easily be extended to incorporate other input modalities typically used in Post-WIMP interfaces, such as gesture or foot input. In this paper we describe the key components of our viGaze framework and a user study that was conducted to test it. The user study took place in a virtual retail environment, which provides a challenging pervasive setting and contains complex interactions that can be supported by gaze. The participants performed two gaze-based interactions with products on virtual shelves and started an interaction cycle between the products and an advertisement monitor placed on the shelf. We demonstrate how gaze can be used in Post-WIMP interfaces to steer the attention of users to certain components of the system. We conclude by discussing the advantages provided by the viGaze framework and highlighting the potential of gaze-based interaction.

13.
Applications such as telepresence and training involve the display of real or synthetic humans to multiple viewers. When rendering humans on conventional displays, non-verbal cues such as head pose, gaze direction, body posture, and facial expression are difficult to convey correctly to all viewers. In addition, a framed image of a human conveys only a limited physical sense of presence, primarily through the display's location. While progress continues on articulated robots that mimic humans, the focus has been on the motion and behavior of the robots rather than on their appearance. We introduce a new approach for robotic avatars of real people: using cameras and projectors to capture and map both the dynamic motion and the appearance of a real person onto a humanoid animatronic model. We call these devices animatronic Shader Lamps Avatars (SLA). We present a proof-of-concept prototype comprising a camera, a tracking system, a digital projector, and a life-sized styrofoam head mounted on a pan-tilt unit. The system captures imagery of a moving, talking user and maps the appearance and motion onto the animatronic SLA, delivering a dynamic, real-time representation of the user to multiple viewers.

14.
To track gaze direction while a person moves freely in a large space, we built a head-mounted gaze tracking system based on optical tracking equipment. The system captures the user's head motion and eye images with a passive optical tracker and a head-mounted eye camera, and then estimates the gaze direction during free movement from an initial calibration. Finally, the system is simplified into a three-point, three-plane, three-transform geometric model of gaze tracking that applies to similar environments and is independent of the specific hardware. Application experiments and error analysis show a gaze tracking error of 1.69° at 20 Hz while the user moves freely in a 3.0 × 3.2 × 2.0 m workspace.
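The kind of coordinate-transform chain such a system relies on can be sketched as follows: a gaze ray estimated in the eye-camera frame is rotated into the world frame via the head pose reported by the optical tracker. The rotation values below stand in for calibration results; this is not the paper's three-point, three-plane, three-transform model itself.

```python
import numpy as np

# Sketch of carrying a gaze direction from the eye-camera frame into the
# world frame via the tracked head pose. The angles are illustrative
# stand-ins for calibration and tracker output.

def rot_y(deg):
    """Rotation matrix about the y-axis."""
    r = np.radians(deg)
    return np.array([[np.cos(r), 0, np.sin(r)],
                     [0, 1, 0],
                     [-np.sin(r), 0, np.cos(r)]])

R_eyecam_to_head = rot_y(5.0)    # from head-mounted camera calibration (assumed)
R_head_to_world = rot_y(-30.0)   # from the passive optical tracker (assumed)

gaze_eyecam = np.array([0.0, 0.0, 1.0])            # unit gaze ray, camera frame
gaze_world = R_head_to_world @ R_eyecam_to_head @ gaze_eyecam
print("gaze direction in world frame:", np.round(gaze_world, 3))
```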

15.
16.
Conventional iris recognition requires a high-resolution camera equipped with a zoom lens and a near-infrared illuminator to observe iris patterns. Moreover, with a zoom lens the viewing angle is small, restricting the user's head movement. To address these limitations, periocular recognition has recently been studied as a biometric. Because the larger area surrounding the eye is used instead of the iris region alone, a camera with a high-resolution sensor and zoom lens is not necessary for periocular recognition. In addition, the user's eye image can be captured with a wide-viewing-angle camera, which relaxes the constraints on head movement during image acquisition. Previous periocular recognition methods extract features in Cartesian coordinates, which are sensitive to the rotation (roll) of the eye region caused by in-plane rotation of the head, degrading matching accuracy. We therefore propose a novel periocular recognition method based on polar coordinates that is robust to eye rotation (roll). Experimental results on the open CASIA-Iris-Distance database (CASIA-IrisV4) show that the proposed method outperforms previous approaches.
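Why polar coordinates help can be shown in a few lines: an in-plane head roll becomes a circular shift along the angular axis of the polar image, which matching can absorb. The nearest-neighbor resampler below is a minimal illustration, not the paper's feature pipeline.

```python
import numpy as np

# Minimal Cartesian-to-polar resampling of an eye region. After this
# transform, a rotation of the region about the chosen center appears
# as a circular shift along axis 1 (the angular axis).

def to_polar(img, center, n_r=32, n_theta=64):
    h, w = img.shape
    cy, cx = center
    r_max = min(cy, cx, h - cy, w - cx)
    rs = np.linspace(1, r_max - 1, n_r)
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    ys = (cy + rs[:, None] * np.sin(thetas)).astype(int)
    xs = (cx + rs[:, None] * np.cos(thetas)).astype(int)
    return img[ys, xs]                     # shape (n_r, n_theta)

rng = np.random.default_rng(2)
eye_region = rng.random((120, 120))        # stand-in periocular image
polar = to_polar(eye_region, center=(60, 60))
# A head roll of k angular bins now appears as np.roll(polar, k, axis=1).
print(polar.shape)
```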

17.
In this paper we propose a system for the analysis of user-generated video (UGV). UGV often has a rich camera motion structure created at recording time by the person taking the video, i.e., the "camera person." We exploit this structure by defining a new concept, the camera view, for temporal segmentation of UGV. The segmentation provides a video summary with unique properties that is useful in applications such as video annotation. Camera motion is also a powerful feature for identifying keyframes and regions of interest (ROIs), since it indicates the camera person's interests in the scene and can also attract the viewers' attention. We propose a new location-based saliency map generated from camera motion parameters. This map is combined with other saliency maps generated using features such as color contrast, object motion, and face detection to determine the ROIs. To evaluate our methods we conducted several user studies. A subjective evaluation indicated that our system produces results consistent with viewers' preferences. We also examined the effect of camera motion on human visual attention through an eye tracking experiment. The results showed a high dependency between the distribution of the viewers' fixation points and the direction of camera movement, which is consistent with our location-based saliency map.

18.
Eye contact and gaze awareness play a significant role in conveying emotions and intentions during face-to-face conversation. Humans can perceive each other's gaze quite naturally and accurately. However, gaze awareness and perception are ambiguous during video teleconferencing on computer-based devices (such as laptops, tablets, and smartphones). The reasons for this ambiguity are (i) the camera position relative to the screen and (ii) the 2D rendition of the 3D human face, i.e., the 2D screen is unable to deliver accurate gaze during video teleconferencing. To solve this problem, researchers have proposed various hardware setups with complex software algorithms. The most recent solutions for accurate gaze perception employ 3D interfaces, such as 3D screens and 3D face masks. However, today's commonly used video teleconferencing devices are smart devices with 2D screens, so there is a need to improve gaze awareness and perception on these devices. In this work, we revisit the question of how to improve a remote user's gaze awareness among his or her collaborators. Our hypothesis is that accurate gaze perception can be achieved by the 3D embodiment of a remote user's head gestures during video teleconferencing. We have prototyped an embodied telepresence system (ETS) for the 3D embodiment of a remote user's head. Our ETS is based on a 3-DOF neck robot with a mounted smart device (tablet PC). The electromechanical platform combined with a smart device is a novel setup for studying gaze awareness and perception on 2D screen-based smart devices during video teleconferencing. Two important gaze-related issues are considered in this work: (i) the "Mona Lisa gaze effect", where the gaze appears directed at the observer regardless of his or her position in the room, and (ii) "gaze awareness/faithfulness", the ability to perceive an accurate spatial relationship between the observing person and the object. Our results confirm that the 3D embodiment of a remote user's head not only mitigates the Mona Lisa gaze effect but also supports three levels of gaze faithfulness, hence accurately projecting the human gaze in distant space.

19.
This paper presents a real-time framework for computationally tracking the objects visually attended by the user while navigating interactive virtual environments. In addition to the conventional bottom-up (stimulus-driven) saliency map, the proposed framework uses top-down (goal-directed) contexts inferred from the user's spatial and temporal behavior, and identifies the most plausibly attended objects among the candidates in the object saliency map. The computational framework was implemented on the GPU, achieving performance adequate for interactive virtual environments. A user experiment was also conducted to evaluate the prediction accuracy of the tracking framework by comparing the objects it regards as visually attended with actual human gaze collected with an eye tracker. The results indicated an accuracy at a level well supported by the theory of human cognition for visually identifying single and multiple attentive targets, owing especially to the addition of top-down contextual information. Finally, we demonstrate how the visual attention tracking framework can be applied to managing the level of detail in virtual environments, without any hardware for head or eye tracking.

20.
Gaze shifts require the coordinated movement of both the eyes and the head, in animals and in humanoid robots alike. To achieve this, the brain, or a robot control system, needs to perform complex non-linear sensory-motor transformations between many degrees of freedom and resolve the redundancy in such a system. In this article we propose a hierarchical neural network model for performing 3-D coordinated gaze shifts. The network is based on the PC/BC-DIM (Predictive Coding/Biased Competition with Divisive Input Modulation) basis function model. The proposed model consists of independent eye and head control circuits with mutual interactions for the appropriate adjustment of coordination behaviour. Based on the initial eye and head positions, the network resolves the redundancies involved in 3-D gaze shifts and produces accurate gaze control without any kinematic analysis or imposed constraints. Furthermore, the behaviour of the proposed model is consistent with the coordinated eye and head movements observed in primates.
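A minimal sketch of the DIM update at the core of PC/BC-DIM follows, using the divisive-error, multiplicative-update form described in Spratling's work. The weight normalization, the constants, and the random toy input are assumptions; the actual gaze model arranges such populations hierarchically for the eyes and head.

```python
import numpy as np

# Sketch of the DIM (Divisive Input Modulation) iteration: reconstruction
# errors are computed divisively and prediction-neuron activations are
# updated multiplicatively. Weights and input here are random toys.

rng = np.random.default_rng(3)
n_in, n_pred = 20, 8
W = rng.random((n_pred, n_in))                      # feedforward weights
V = (W / (W.max(axis=1, keepdims=True) + 1e-9)).T   # feedback weights
                                                    # (normalized W^T, assumed)
x = rng.random(n_in)                                # input (e.g., target position)

eps1, eps2 = 1e-6, 1e-4
y = np.zeros(n_pred)                                # prediction-neuron activations
for _ in range(50):                                 # iterate to a steady state
    e = x / (eps2 + V @ y)                          # divisive reconstruction error
    y = (eps1 + y) * (W @ e)                        # multiplicative update
print(np.round(y, 3))
```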
