Similar literature
Found 20 similar documents (search time: 140 ms)
1.
Abstract. The Perceptive Workbench endeavors to create a spontaneous and unimpeded interface between the physical and virtual worlds. Its vision-based methods for interaction constitute an alternative to wired input devices and tethered tracking. Objects are recognized and tracked when placed on the display surface. By using multiple infrared light sources, the object's 3-D shape can be captured and inserted into the virtual interface. This ability permits spontaneity, since either preloaded objects or those objects selected at run-time by the user can become physical icons. Integrated into the same vision-based interface is the ability to identify 3-D hand position, pointing direction, and sweeping arm gestures. Such gestures can enhance selection, manipulation, and navigation tasks. The Perceptive Workbench has been used for a variety of applications, including augmented reality gaming and terrain navigation. This paper focuses on the techniques used in implementing the Perceptive Workbench and the system's performance.

2.
Query by video clip   Total citations: 15, self-citations: 0, citations by others: 15
Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries that involve a video clip (say, a 10-s video segment). We propose two schemes. (i) Retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using sub-sampled frames, we uniformly sub-sample the query clip as well as the database video. Retrieval is based on matching color and texture features of the sub-sampled frames. Initial experiments on two video databases (a basketball video with approximately 16,000 frames and a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one basketball video as the query and a different basketball video as the database show the effectiveness of the feature representation and matching schemes.
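
Scheme (ii) above can be illustrated with a small sketch. The abstract does not say which color features or similarity measure are used, so the coarse RGB histogram, the histogram-intersection score, and all function names below (`color_histogram`, `best_match`, and so on) are assumptions made purely for illustration, and texture features are left out.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (r, g, b), each in 0..255
Frame = Sequence[Pixel]        # a frame as a flat sequence of pixels


def color_histogram(frame: Frame, bins: int = 4) -> List[float]:
    """Normalized coarse RGB histogram (assumed feature, not the paper's)."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1.0
    total = float(len(frame)) or 1.0
    return [h / total for h in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def best_match(query: Sequence[Frame], database: Sequence[Frame],
               step: int = 1) -> Tuple[int, float]:
    """Uniformly sub-sample both videos, then slide the query over the database.

    Returns (start frame index in the database, mean similarity) of the
    best-matching window, or (-1, 0.0) if no window fits.
    """
    q = [color_histogram(f) for f in query[::step]]
    d = [color_histogram(f) for f in database[::step]]
    if not q or len(d) < len(q):
        return -1, 0.0
    best_pos, best_score = -1, -1.0
    for start in range(len(d) - len(q) + 1):
        window = d[start:start + len(q)]
        score = sum(histogram_intersection(a, b) for a, b in zip(q, window)) / len(q)
        if score > best_score:
            best_pos, best_score = start * step, score
    return best_pos, best_score


# Toy usage: single-color "frames" of four pixels each.
red, green, blue = [(255, 0, 0)] * 4, [(0, 255, 0)] * 4, [(0, 0, 255)] * 4
position, score = best_match([green, blue], [red, red, green, blue, red])
```

A real system would return every window whose score exceeds a threshold rather than only the single best one; the threshold would have to be tuned on data.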

3.
Abstract. Conventional tracking methods encounter difficulties as the number of objects, the amount of clutter, and the number of sensors increase, because of the requirement for data association. Statistical tracking, based on the concept of network tomography, is an alternative that avoids data association. It estimates the number of trips made from one region to another in a scene based on interregion boundary traffic counts accumulated over time. It is not necessary to track an object through a scene to determine when an object crosses a boundary. This paper describes statistical tracking and presents an evaluation based on the estimation of pedestrian and vehicular traffic intensities at an intersection over a period of 1 month. We compare the results with those from a multiple-hypothesis tracker and manually counted ground-truth estimates. Received: 30 August 2001 / Accepted: 28 May 2002 Correspondence to: J.E. Boyd

4.
Real-time multiple vehicle detection and tracking from a moving vehicle   Total citations: 18, self-citations: 0, citations by others: 18
Abstract. A real-time vision system has been developed that analyzes color videos taken from a forward-looking video camera in a car driving on a highway. The system uses a combination of color, edge, and motion information to recognize and track the road boundaries, lane markings, and other vehicles on the road. Cars are recognized by matching templates that are cropped from the input data online and by detecting highway scene features and evaluating how they relate to each other. Cars are also detected by temporal differencing and by tracking motion parameters that are typical for cars. The system recognizes and tracks road boundaries and lane markings using a recursive least-squares filter. Experimental results demonstrate robust, real-time car detection and tracking over thousands of image frames. The data include video taken under difficult visibility conditions. Received: 1 September 1998 / Accepted: 22 February 2000
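
To make the recursive least-squares step more concrete, here is a minimal sketch of a textbook RLS update that fits a straight lane marking x = a*y + b to image points as they arrive. The abstract does not describe the lane model, the forgetting factor, or the initialization, so the line model, the parameter values, and the class name `RecursiveLeastSquares` are assumptions for illustration only.

```python
from typing import List


class RecursiveLeastSquares:
    """Textbook RLS for the assumed two-parameter line model x = a*y + b."""

    def __init__(self, forgetting: float = 1.0, initial_variance: float = 1e6):
        self.theta: List[float] = [0.0, 0.0]            # current estimate [a, b]
        self.P = [[initial_variance, 0.0],              # parameter covariance
                  [0.0, initial_variance]]
        self.lam = forgetting                           # 1.0 means no forgetting

    def update(self, y: float, x: float) -> List[float]:
        """Fold one observed point (y, x) into the estimate and return it."""
        phi = [y, 1.0]                                  # regressor vector
        P = self.P
        # P * phi (also equals (phi^T P)^T because P is symmetric)
        p_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                 P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = self.lam + phi[0] * p_phi[0] + phi[1] * p_phi[1]
        gain = [p_phi[0] / denom, p_phi[1] / denom]
        # Prediction error of the current estimate on the new point.
        err = x - (self.theta[0] * phi[0] + self.theta[1] * phi[1])
        self.theta = [self.theta[0] + gain[0] * err,
                      self.theta[1] + gain[1] * err]
        # Covariance update: P <- (P - gain * phi^T * P) / lambda
        self.P = [[(P[i][j] - gain[i] * p_phi[j]) / self.lam for j in range(2)]
                  for i in range(2)]
        return self.theta


# Toy usage: noise-free points on the line x = 0.5*y + 10.
rls = RecursiveLeastSquares()
for row in range(10):
    estimate = rls.update(float(row), 0.5 * row + 10.0)
```

Choosing a forgetting factor slightly below 1.0 would let the estimate follow a lane marking that drifts from frame to frame, at the cost of more sensitivity to noise.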

5.
We propose a system that simultaneously utilizes the stereo disparity and optical flow information of real-time stereo grayscale multiresolution images for the recognition of objects and gestures in human interactions. For real-time calculation of the disparity and optical flow information of a stereo image, the system first creates pyramid images using a Gaussian filter. The system then determines the disparity and optical flow of a low-density image and extracts attention regions in a high-density image. The three foremost regions are recognized using higher-order local autocorrelation features and linear discriminant analysis. As the recognition method is view based, the system can perform face and hand recognition simultaneously in real time. The recognition features are independent of parallel translations, so the system can use unstable extractions from stereo depth information. We demonstrate that the system can discriminate between users, monitor the basic movements of the user, smoothly learn an object presented by users, and communicate with users by hand signs learned in advance. Received: 31 January 2000 / Accepted: 1 May 2001 Correspondence to: I. Yoda (e-mail: yoda@ieee.org, Tel.: +81-298-615941, Fax: +81-298-613313)

6.
Password hardening based on keystroke dynamics   Total citations: 2, self-citations: 0, citations by others: 2
We present a novel approach to improving the security of passwords. In our approach, the legitimate user’s typing patterns (e.g., durations of keystrokes and latencies between keystrokes) are combined with the user’s password to generate a hardened password that is convincingly more secure than conventional passwords alone. In addition, our scheme automatically adapts to gradual changes in a user’s typing patterns while maintaining the same hardened password across multiple logins, for use in file encryption or other applications requiring a long-term secret key. Using empirical data and a prototype implementation of our scheme, we give evidence that our approach is viable in practice, in terms of ease of use, improved security, and performance. Published online: 26 October 2001

7.
Abstract. This paper proposes a novel tracking strategy that can robustly track a person or other object within a fixed environment using a pan, tilt, and zoom camera with the help of a pre-recorded image database. We define a set of camera states which is sufficient to survey the environment for the target. Background images for these camera states are stored as an image database. During tracking, camera movements are restricted to these states. Tracking and segmentation are simplified, as each tracking image can be compared with the corresponding pre-recorded background image. Received: 26 August 1999 / Accepted: 22 February 2000

8.
Extraction of special effects caption text events from digital video   Total citations: 2, self-citations: 1, citations by others: 1
Abstract. The popularity of digital video is increasing rapidly. To help users navigate libraries of video, algorithms that automatically index video based on content are needed. One approach is to extract text appearing in video, which often reflects a scene's semantic content. This is a difficult problem due to the unconstrained nature of general-purpose video. Text can have arbitrary color, size, and orientation. Backgrounds may be complex and changing. Most work so far has made restrictive assumptions about the nature of text occurring in video. Such work is therefore not directly applicable to unconstrained, general-purpose video. In addition, most work so far has focused only on detecting the spatial extent of text in individual video frames. However, text occurring in video usually persists for several seconds. This constitutes a text event that should be entered only once in the video index. Therefore it is also necessary to determine the temporal extent of text events. This is a non-trivial problem because text may move, rotate, grow, shrink, or otherwise change over time. Such text effects are common in television programs and commercials but so far have received little attention in the literature. This paper discusses detecting, binarizing, and tracking caption text in general-purpose MPEG-1 video. Solutions are proposed for each of these problems and compared with existing work found in the literature. Received: January 29, 2002 / Accepted: September 13, 2002 D. Crandall is now with Eastman Kodak Company, 1700 Dewey Avenue, Rochester, NY 14650-1816, USA; e-mail: david.crandall@kodak.com S. Antani is now with the National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA; e-mail: antani@nlm.nih.gov Correspondence to: David Crandall

9.
A model-driven approach for real-time road recognition   Total citations: 6, self-citations: 0, citations by others: 6
This article describes a method designed to detect and track road edges starting from images provided by an on-board monocular monochrome camera. Its implementation on specific hardware is also presented in the framework of the VELAC project. The method is based on four modules: (1) detection of the road edges in the image by a model-driven algorithm that uses a statistical model of the lane sides to handle occlusions or imperfections of the road marking (this model is initialized by an off-line training step); (2) localization of the vehicle in the lane in which it is travelling; (3) tracking to define a new search space of road edges for the next image; and (4) management of the lane numbers to determine the lane in which the vehicle is travelling. The algorithm is implemented in order to validate the method in a real-time context. Results obtained on marked and unmarked road images show the robustness and precision of the method. Received: 18 November 2000 / Accepted: 7 May 2001

10.
Abstract. The image sequence in a video taken by a moving camera may suffer from irregular perturbations because of irregularities in the motion of the person or vehicle carrying the camera. We show how to use information in the image sequence to correct the effects of these irregularities so that the sequence is smoothed, i.e., is approximately the same as the sequence that would have been obtained if the motion of the camera had been smooth. Our method is based on the fact that the irregular motion is almost entirely rotational, and that the rotational image motion can be detected and corrected if a distant object, such as the horizon, is visible. Received: 14 February 2001 / Accepted: 11 February 2002 Correspondence to: A. Rosenfeld

11.
We present efficient schemes for scheduling the delivery of variable-bit-rate MPEG-compressed video with stringent quality-of-service (QoS) requirements. Video scheduling is used to improve bandwidth allocation at a video server that uses statistical multiplexing to aggregate video streams prior to transporting them over a network. A video stream is modeled using a traffic envelope that provides a deterministic time-varying bound on the bit rate. Because of the periodicity with which frame types in an MPEG stream are typically generated, a simple traffic envelope can be constructed using only five parameters. Using the traffic-envelope model, we show that video sources can be statistically multiplexed with an effective bandwidth that is often less than the source peak rate. Bandwidth gain is achieved without sacrificing the stringency of the requested QoS. The effective bandwidth depends on the arrangement of the multiplexed streams, which is a measure of the lag between the GOP periods of various streams. For homogeneous streams, we give an optimal scheduling scheme for video sources at a video-on-demand server that results in the minimum effective bandwidth. For heterogeneous sources, a sub-optimal scheduling scheme is given, which achieves acceptable bandwidth gain. Numerical examples based on traces of MPEG-coded movies are used to demonstrate the effectiveness of our schemes.

12.
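Objective: The tracking accuracy of existing human pose tracking algorithms still needs improvement, especially for the flexibly moving arms. To improve the accuracy of human pose tracking, this paper proposes, for the first time, a human pose tracking method that combines visual spatio-temporal information with a deep learning network. Methods: During human pose tracking, temporal information from the video is used to compute motion information for the human target region, and this motion information is used to propagate the body-part pose models between frames. Because methods based on image spatial features detect body parts of relatively fixed shape, such as the torso and head, well but detect arms poorly, a lightweight deep learning network is constructed and trained to generate additional candidate samples for the arms. The deep learning network is also used to generate an arm feature-consistency probability map, which is combined with the spatial information of the video to compute the optimal part poses, and the parts are then reassembled into a complete human pose tracking result. Results: The proposed algorithm was validated on two challenging human pose tracking datasets, VideoPose2.0 and YouTubePose, achieving average arm-joint tracking accuracies of 81.4% and 84.5%, respectively, a clear improvement over existing methods. In addition, experiments on the VideoPose2.0 dataset verified that the proposed additional sampling of the lower arm and the proposed arm feature-consistency computation effectively improve the tracking accuracy of human pose joints. Conclusion: The proposed human pose tracking method, which combines spatio-temporal information with a deep learning network, effectively improves the accuracy of human pose tracking, with a significant improvement in the tracking accuracy of the flexibly moving lower-arm joints.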

13.
Wood inspection with non-supervised clustering   Total citations: 9, self-citations: 0, citations by others: 9
Abstract. The appearance of sawn timber has huge natural variations that the human inspector easily compensates for mentally when determining the types of defects and the grade of each board. However, for automatic wood inspection systems these variations are a major source of complication. This makes it difficult to use textbook methodologies for visual inspection. These methodologies generally aim at systems that are trained in a supervised manner with samples of defects and good material, but selecting and labeling the samples is an error-prone process that limits the accuracy that can be achieved. We present a non-supervised clustering-based approach for detecting and recognizing defects in lumber boards. A key idea is to employ a self-organizing map (SOM) for discriminating between sound wood and defects. Human involvement needed for training is minimal. The approach has been tested with color images of lumber boards, and the achieved false detection and error escape rates are low. The approach also provides an intuitive visual user interface. Received: 16 December 2000 / Accepted: 8 December 2001 Correspondence to: O. Silvén

14.
The excessive buffer requirement for handling continuous-media playbacks is an impediment to cost-effective provisioning for on-line video retrieval. Given the skewed distribution of video popularity, it is expected that there are often concurrent playbacks of the same video file within a short time interval. This creates an opportunity to batch multiple requests and to service them with a single stream from the disk without violating the on-demand constraint. However, doing so requires keeping data in memory between successive uses. This leads to a buffer space trade-off between servicing a request in memory mode vs. servicing it in disk mode. In this work, we develop a novel algorithm to minimize the buffer requirement to support a set of concurrent playbacks. One of the strengths of the proposed scheme is that it enables the server to dynamically adapt to the changing workload while minimizing the total buffer space requirement. Our algorithm makes a significant contribution in decreasing the total buffer requirement, especially when the user access pattern is biased in favor of a small set of files. The idea of the proposed scheme is modeled in detail using an analytical formulation, and optimality of the algorithm is proved. An analytical framework is developed so that the proposed scheme can be used in combination with various existing disk-scheduling strategies. Our simulation results confirm that under certain circumstances, it is much more resource efficient to support some of the playbacks in memory mode, and that the proposed scheme thereby enables the server to minimize the overall buffer space requirement.

15.
I/O scheduling for digital continuous media   Total citations: 4, self-citations: 0, citations by others: 4
A growing set of applications require access to digital video and audio. In order to provide playback of such continuous media (CM), scheduling strategies for CM data servers (CMS) are necessary. In some domains, particularly defense and industrial process control, the timing requirements of these applications are strict and essential to their correct operation. In this paper we develop a scheduling strategy for multiple access to a CMS such that the timing guarantees are maintained at all times. First, we develop a scheduling strategy for the steady state, i.e., when there are no changes in playback rate or operation. We derive an optimal Batched SCAN (BSCAN) algorithm that requires minimum buffer space to schedule concurrent accesses. The scheduling strategy incorporates two key constraints: (1) data fetches from the storage system are assumed to be in integral multiples of the block size, and (2) playback guarantees are ensured for frame-oriented streams when each frame can span multiple blocks. We discuss modifications to the scheduling strategy to handle compressed data like motion-JPEG and MPEG. Second, we develop techniques to handle dynamic changes brought about by VCR-like operations executed by applications. We define a suite of primitive VCR-like operations that can be executed. We show that an unregulated change in the BSCAN schedule, in response to VCR-like operations, will affect playback guarantees. We develop two general techniques to ensure playback guarantees while responding to VCR-like operations: passive and active accumulation. Using user response time as a metric, we show that active accumulation algorithms outperform passive accumulation algorithms. An optimal response-time algorithm in a class of active accumulation strategies is derived. The results presented here are validated by extensive simulation studies.

16.
A network that offers deterministic, i.e., worst case, quality-of-service guarantees to variable-bit-rate (VBR) video must provide a resource reservation mechanism that allocates bandwidth, buffer space, and other resources for each video stream. Such a resource reservation scheme must be carefully designed, otherwise network resources are wasted. A key component for the design of a resource reservation scheme is the traffic characterization method that specifies the traffic arrivals on a video stream. The traffic characterization should accurately describe the actual arrivals, so that a large number of streams can be supported; but it must also map directly into efficient traffic-policing mechanisms that monitor arrivals on each stream. In this study, we present a fast and accurate traffic characterization method for stored VBR video in networks with a deterministic service. We use this approximation to obtain a traffic characterization that can be efficiently policed by a small number of leaky buckets. We present a case study where we apply our characterization method to networks that employ a dynamic resource reservation scheme with renegotiation. We use traces from a set of 25–30-min MPEG sequences to evaluate our method against other characterization schemes from the literature.
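
The idea of policing a stream with a small number of leaky buckets can be sketched as follows. The abstract does not give the bucket parameters or the characterization algorithm itself, so the (burst, rate) bucket model, the fixed frame interval, and the names `LeakyBucket` and `conforms` are hypothetical and serve only to show what conformance to several buckets means.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class LeakyBucket:
    burst: float  # bucket depth (sigma), e.g. in bits
    rate: float   # token refill rate (rho), e.g. in bits per second


def conforms(frame_sizes: Sequence[float], frame_interval: float,
             buckets: Iterable[LeakyBucket]) -> bool:
    """Return True if the frame sequence never overdraws any bucket.

    Frames are assumed to arrive one per `frame_interval` seconds, and each
    bucket is assumed to start full. The stream conforms only if it passes
    every bucket, which is how several buckets jointly bound the traffic.
    """
    for bucket in buckets:
        tokens = bucket.burst
        for size in frame_sizes:
            if size > tokens:
                return False          # this frame violates the bucket
            tokens -= size
            # Refill until the next frame arrives, capped at the bucket depth.
            tokens = min(bucket.burst, tokens + bucket.rate * frame_interval)
    return True


# Toy usage: four frames, one per second.
frames = [100.0, 20.0, 20.0, 100.0]
ok = conforms(frames, 1.0, [LeakyBucket(120.0, 40.0), LeakyBucket(300.0, 10.0)])
too_tight = conforms(frames, 1.0, [LeakyBucket(100.0, 30.0)])
```

Using several buckets with different depths and rates gives a tighter bound than any single bucket, since a shallow, fast bucket limits short bursts while a deep, slow bucket limits the long-term average.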

17.
Synchronized delivery and playout of distributed stored multimedia streams   Total citations: 8, self-citations: 0, citations by others: 8
Multimedia streams such as audio and video impose tight temporal constraints on their presentation. Often, related multimedia streams, such as audio and video, must be presented in a synchronized way. We introduce a novel scheme to ensure the continuous and synchronous delivery of distributed stored multimedia streams across a communications network. We propose a new protocol for synchronized playback and compute the buffer required to achieve both the continuity within a single substream and the synchronization between related substreams. The scheme is very general and does not require synchronized clocks. Using a resynchronization protocol based on buffer level control, the scheme is able to cope with server drop-outs and clock drift. The synchronization scheme has been implemented, and the paper concludes with our experimental results.

18.
Automatic text segmentation and text recognition for video indexing   Total citations: 13, self-citations: 0, citations by others: 13
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in the videos. It enables content-based browsing. We present our new methods for automatic segmentation of text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics in videos.

19.
Easy-to-use audio/video authoring tools play a crucial role in moving multimedia software from research curiosity to mainstream applications. However, research in multimedia authoring systems has rarely been documented in the literature. This paper describes the design and implementation of an interactive video authoring system called Zodiac, which employs an innovative edit history abstraction to support several unique editing features not found in existing commercial and research video editing systems. Zodiac provides users with a conceptually clean and semantically powerful branching history model of edit operations to organize the authoring process and to navigate among versions of authored documents. In addition, by analyzing the edit history, Zodiac is able to reliably detect a composed video stream's shot and scene boundaries, which facilitates interactive video browsing. Zodiac also features a video object annotation capability that allows users to associate annotations with moving objects in a video sequence. The annotations themselves could be text, image, audio, or video. Zodiac is built on top of MMFS, a file system specifically designed for interactive multimedia development environments, and implements an internal buffer manager that supports transparent lossless compression/decompression. Shot/scene detection, video object annotation, and buffer management all exploit the edit history information for performance optimization.

20.
In this paper we present a novel computer-vision-based hand-tracking technique, which is capable of robustly tracking 6+4DOF of the human hand in real time (at least 25 frames per second) with the help of 3 (or more) off-the-shelf consumer cameras. ‘6+4DOF’ means that the system can track the global pose (6 continuous parameters for translation and rotation) of 4 different gestures. A key feature of our system is its fully automatic real-time initialization procedure, which, along with a sound tracking-lost detector, makes the system fit for real-world applications. Because of this, our method acts as an enabling technology for uncumbersome hand-based 3D Human-Computer-Interaction (HCI). Previously, using the hand as an input device with at least 6DOF involved the use of either datagloves or markers. Using our tracking technique, we evaluated the use of the hand as an input device for two prevalent Virtual Reality applications: fly-through exploration of a virtual world and a simple digital assembly simulation.
