Similar Documents
Found 20 similar documents (search time: 336 ms)
1.
Automatic text segmentation and text recognition for video indexing   (Cited 13 times in total: 0 self-citations, 13 by others)
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in videos, since it enables content-based browsing. We present our new methods for automatic segmentation of text in digital videos. The proposed algorithms exploit typical characteristics of text in videos in order to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is passed directly to a standard OCR software package to translate the segmented text into ASCII. A straightforward indexing and retrieval scheme is also introduced; it is used in the experiments to demonstrate that the proposed text segmentation algorithms, together with existing text recognition algorithms, are suitable for indexing and retrieving relevant video sequences from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher-level semantics in videos.
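The temporal-integration step lends itself to a compact illustration: fuse the aligned bitmaps of one tracked character into a single denoised bitmap by per-pixel voting. A minimal sketch assuming the bitmaps are already aligned and equally sized; the voting rule and threshold are illustrative, not the authors' exact method.

```python
import numpy as np

def integrate_bitmaps(bitmaps, vote=0.5):
    """Fuse aligned binary bitmaps of one tracked character (one per frame)
    into a single bitmap by per-pixel majority vote across frames."""
    stack = np.stack(bitmaps).astype(np.float32)   # shape: (frames, H, W)
    ink_fraction = stack.mean(axis=0)              # how often each pixel is ink
    return (ink_fraction >= vote).astype(np.uint8)

# toy data: three noisy, pre-aligned observations of the same glyph
rng = np.random.default_rng(0)
frames = [(rng.random((16, 12)) > 0.4).astype(np.uint8) for _ in range(3)]
fused = integrate_bitmaps(frames)   # random speckle tends to cancel out
```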

2.
In this paper, an integrated offline recognition system for unconstrained handwriting is presented. The proposed system consists of seven main modules: skew angle estimation and correction, printed/handwritten text discrimination, line segmentation, slant removal, word segmentation, and character segmentation and recognition, built from implementations of existing algorithms as well as novel ones. The system has been tested on the NIST, IAM-DB, and GRUHD databases and achieves accuracy that varies from 65.6% to 100% depending on the database and the experiment.

3.
The use of hand gestures provides an attractive means of interacting naturally with a computer-generated display. Using one or more video cameras, hand movements can potentially be interpreted as meaningful gestures. One key problem in building such an interface without a restricted setup is the ability to localize and track the human arm robustly in video sequences. This paper proposes a multiple-cue localization scheme combined with a tracking framework to reliably track the dynamics of the human arm in unconstrained environments. The localization scheme integrates the multiple cues of motion, shape, and color to locate a set of key image features. Using constraint fusion, these features are tracked by a modified extended Kalman filter that exploits the articulated structure of the human arm. Moreover, an interaction scheme between tracking and localization is used to improve the estimation process while reducing the computational requirements. The performance of the localization/tracking framework is validated by extensive experiments and simulations, including tracking with a calibrated stereo camera and with uncalibrated broadcast video. Received: 19 January 2001 / Accepted: 27 December 2001 / Correspondence to: R. Sharma
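The paper's tracker is a modified extended Kalman filter that exploits the arm's articulated structure; as a simpler illustration of the underlying predict/update cycle, here is a plain linear Kalman filter for a single image feature under a constant-velocity model. All matrices, noise levels, and the 30 fps assumption are illustrative.

```python
import numpy as np

dt = 1.0 / 30.0                        # frame interval, assuming 30 fps video
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], float)   # constant-velocity state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], float)    # we observe image position only
Q = np.eye(4) * 1e-3                   # process noise (illustrative)
R = np.eye(2) * 2.0                    # measurement noise in pixels^2 (illustrative)

def kf_step(x, P, z):
    """One predict/update cycle; x = (px, py, vx, vy), z = measured (px, py)."""
    x, P = F @ x, F @ P @ F.T + Q                 # predict
    S = H @ P @ H.T + R                           # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
    x = x + K @ (z - H @ x)                       # correct with the measurement
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.array([50.0, 80.0, 0.0, 0.0]), np.eye(4) * 10.0
x, P = kf_step(x, P, np.array([52.0, 79.0]))      # one video-frame update
```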

4.
We discuss development of a word-unigram language model for online handwriting recognition. First, we tokenize a text corpus into words, contrasting with tokenization methods designed for other purposes. Second, we select for our model a subset of the words found, discussing deviations from an N-most-frequent-words approach. From a 600-million-word corpus, we generated a 53,000-word model which eliminates 45% of word-recognition errors made by a character-level-model baseline system. We anticipate that our methods will be applicable to offline recognition as well, and to some extent to other recognizers, such as speech recognizers and video retrieval systems. Received: November 1, 2001 / Revised version: July 22, 2002
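For orientation, here is the plain N-most-frequent-words baseline that the paper deliberately deviates from; the tokenizer regex and the vocabulary size are assumptions for the sketch.

```python
import re
from collections import Counter

def build_unigram_model(corpus_text, vocab_size=53000):
    """Tokenize a raw corpus into words and keep the most frequent ones,
    using relative frequency as the unigram probability."""
    words = re.findall(r"[a-z']+", corpus_text.lower())   # naive tokenizer
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.most_common(vocab_size)}

model = build_unigram_model("the cat sat on the mat the end", vocab_size=5)
# model["the"] == 3/8 == 0.375
```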

5.
Multimedia mail     
Electronic mail for traditional text exchange, as an asynchronous means of communication between computer users, is widely built upon in many application areas. Whereas multimedia mail systems – covering text, graphics, still images, audio, video, and documents – have so far been limited to isolated communities, at least two very promising approaches are now under development: MIME (Multipurpose Internet Mail Extensions), an extension of Internet Mail, and the Multimedia Teleservice based on CCITT Recommendation X.400(88), being developed within the BERKOM project funded by the German TELEKOM. In the latter, the store-and-forward mechanism inherent to electronic mail is complemented by an additional exchange mechanism allowing the resolution of references to message content, e.g. video; such references may be put into a message in place of the content itself. Internet/MIME and OSI/X.400, their interworking, asynchronous information server access via multimedia mail, and possible future developments, especially in the area of asynchronous Computer Supported Cooperative Work (CSCW), are discussed.
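The reference-in-place-of-content mechanism resembles MIME's standard message/external-body type (RFC 2046), which stores a pointer to the content rather than the content itself. A sketch with Python's email package; the server and file names are hypothetical, and this shows the MIME convention, not the BERKOM teleservice's own exchange protocol.

```python
from email.message import Message

# An external-body part stands in for the video inside the stored message;
# the receiver resolves the reference on demand instead of downloading it.
ext = Message()
ext.add_header("Content-Type", "message/external-body",
               access_type="anon-ftp",
               site="media.example.org",   # hypothetical media server
               name="clip.mpg")            # hypothetical file name

phantom = Message()                        # describes what the reference points to
phantom.add_header("Content-Type", "video/mpeg")
ext.attach(phantom)

print(ext.as_string())
```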

6.
The PROSPER (Proof and Specification Assisted Design Environments) project advocates the use of toolkits which allow existing verification tools to be adapted to a more flexible format so that they can be treated as components. A system incorporating such tools becomes another component that can be embedded in an application. This paper describes the software toolkit developed by the project. The nature of communication between components is specified in a language-independent way. It is implemented in several common programming languages to allow a wide variety of tools to have access to the toolkit. Published online: 19 November 2002. Work funded by ESPRIT Framework IV Grant LTR 26241. Michael Norrish is supported by the Michael and Morven Heller Research Fellowship at St. Catharine’s College, Cambridge. Konrad Slind is now at the School of Computing, University of Utah, Salt Lake City UT 84112, USA.

7.
Handling a tertiary storage device, such as an optical disk library, in the framework of a disk-based stream service model requires a sophisticated streaming model for the server, one that considers the device-specific performance characteristics of tertiary storage. This paper discusses the design and implementation of a video server which uses tertiary storage as a source of media archiving. We have carefully designed the streaming mechanism for a server whose key functionalities include stream scheduling, disk caching, and admission control. The stream scheduling model incorporates tertiary media staging into a disk-based scheduling process and enhances the utilization of tertiary device bandwidth. The disk caching mechanism manages the limited capacity of the hard disk efficiently to guarantee the availability of media segments on the hard disk. The admission controller decides upon the admission of a new request based on the current resource availability of the server. The proposed system has been implemented on a general-purpose operating system and is fully operational. The design principles of the server are validated with real experiments, and the performance characteristics are analyzed. The results guide us on how servers with tertiary storage should be deployed effectively in a real environment. e-mail: hjcha@cs.yonsei.ac.kr
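A toy version of such an admission test: admit a new stream only if disk bandwidth, disk cache space, and (if staging is needed) the tertiary device all have headroom. The resource model, units, and the single-staging-drive assumption are simplifications, not the paper's actual controller.

```python
from dataclasses import dataclass

@dataclass
class ServerState:
    disk_bw_total: float    # sustained disk bandwidth, MB/s
    disk_bw_used: float
    cache_total: float      # disk cache capacity, MB
    cache_used: float
    tertiary_busy: bool     # assume the optical library stages one volume at a time

def admit(rate_mb_s, cache_need_mb, needs_staging, s):
    """Admit a stream only if every resource it touches has headroom."""
    if s.disk_bw_used + rate_mb_s > s.disk_bw_total:
        return False        # disk bandwidth exhausted
    if s.cache_used + cache_need_mb > s.cache_total:
        return False        # disk cache exhausted
    if needs_staging and s.tertiary_busy:
        return False        # tertiary drive already staging another volume
    return True

state = ServerState(40.0, 30.0, 512.0, 400.0, tertiary_busy=False)
admit(6.0, 64.0, needs_staging=True, s=state)   # True: all checks pass
```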

8.
This paper describes a novel method for extracting text from document pages of mixed content. The method works by detecting pieces of text lines in small overlapping columns of fixed width, shifted with respect to each other by a fixed number of image elements (the suggested defaults are expressed as fractions of the image width), and by merging these pieces in a bottom-up fashion to form complete text lines and blocks of text lines. The algorithm requires about 1.3 s for a 300 dpi image on a PC with a 300 MHz Pentium II CPU and an Intel 440LX motherboard. The algorithm is largely independent of the layout of the document, the shape of the text regions, and the font size and style. The main assumptions are that the background is uniform and that the text sits approximately horizontally; for a skew of up to about 10 degrees no skew correction mechanism is necessary. The algorithm has been tested on the University of Washington's UW English Document Database I and its performance has been evaluated by a suitable measure of segmentation accuracy. A detailed analysis of the segmentation accuracy as a function of noise and skew has also been carried out. Received: April 4, 1999 / Revised: June 1, 1999
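A rough sketch of the column-wise detection step: scan a binary page image in overlapping vertical columns and record the row intervals that contain ink as text-line pieces. The column width, shift, and minimum-run filter are placeholder values, and the bottom-up merging stage is omitted.

```python
import numpy as np

def line_pieces_in_column(col, min_run=3):
    """Return (top, bottom) row intervals of ink inside one column slice
    (a 2-D array with 1 = ink); each interval is a text-line piece."""
    profile = col.sum(axis=1) > 0            # which rows of the column have ink
    pieces, start = [], None
    for y, on in enumerate(profile):
        if on and start is None:
            start = y
        elif not on and start is not None:
            if y - start >= min_run:
                pieces.append((start, y - 1))
            start = None
    if start is not None and len(profile) - start >= min_run:
        pieces.append((start, len(profile) - 1))
    return pieces

def detect_pieces(page, width=32, shift=16):
    """Scan the page in overlapping columns; width/shift are assumptions."""
    H, W = page.shape
    return {x: line_pieces_in_column(page[:, x:x + width])
            for x in range(0, W - width + 1, shift)}
```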

9.
The research done by the Tenet Group in multimedia networking has reached a point where it may be useful to reflect on the significance of its results for the current debate on how integrated-services internetworks should be designed. Such reflections constitute the main subject of this paper. The principles of the work and the conclusions reached so far by the Tenet researchers are discussed in the light of the conflict between the two major technologies being proposed to build future information infrastructures: namely, the Internet and the ATM technologies. The Tenet approach suggests one feasible way for resolving the conflict to the advantage of all the users of those infrastructures. This paper discusses various fundamental aspects of integrated-services network design: the choice of the service model, the type of charging policy to be adopted, and the selection of a suitable architecture.

10.
The image sequence in a video taken by a moving camera may suffer from irregular perturbations because of irregularities in the motion of the person or vehicle carrying the camera. We show how to use information in the image sequence to correct the effects of these irregularities so that the sequence is smoothed, i.e., is approximately the same as the sequence that would have been obtained if the motion of the camera had been smooth. Our method is based on the fact that the irregular motion is almost entirely rotational, and that the rotational image motion can be detected and corrected if a distant object, such as the horizon, is visible. Received: 14 February 2001 / Accepted: 11 February 2002 / Correspondence to: A. Rosenfeld
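Assuming a per-frame camera roll angle has already been estimated from a distant object such as the horizon, the correction itself is simple: low-pass the angle sequence to recover the intended smooth path, and counter-rotate each frame by the residual jitter. A sketch with an arbitrary moving-average window.

```python
import numpy as np

def jitter_to_correct(roll_deg, window=9):
    """Split per-frame camera roll into a smooth intended path plus jitter;
    return the jitter, i.e. the angle to counter-rotate each frame by."""
    roll = np.asarray(roll_deg, dtype=float)
    kernel = np.ones(window) / window
    smooth = np.convolve(roll, kernel, mode="same")   # intended camera path
    return roll - smooth                              # residual to undo

residual = jitter_to_correct([0.1, 2.3, -1.8, 0.4, 1.9, -2.1, 0.2, 0.0, 0.3])
```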

11.
Approximate query processing using wavelets   (Cited 7 times in total: 0 self-citations, 7 by others)
Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times that scale linearly with the size of the data. Received: 7 August 2000 / Accepted: 1 April 2001 / Published online: 7 June 2001
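The flavor of a wavelet synopsis is easiest to see in one dimension: Haar-decompose a data vector, keep only the largest-magnitude coefficients, and reconstruct an approximate answer from the truncated set. This toy sketch is unnormalized and one-dimensional, unlike the paper's multi-dimensional, I/O-efficient construction and in-domain query algebra.

```python
import numpy as np

def haar_decompose(data):
    """1-D Haar transform (length must be a power of two): repeated
    pairwise averaging, keeping the differences at every level."""
    a, coeffs = np.asarray(data, float), []
    while len(a) > 1:
        coeffs.append((a[0::2] - a[1::2]) / 2)   # detail coefficients
        a = (a[0::2] + a[1::2]) / 2              # averages for next level
    coeffs.append(a)                             # overall average
    return coeffs

def synopsis(coeffs, keep):
    """Zero all but the `keep` largest-magnitude coefficients."""
    flat = np.concatenate(coeffs)
    flat[np.argsort(np.abs(flat))[:-keep]] = 0.0
    out, pos = [], 0
    for c in coeffs:
        out.append(flat[pos:pos + len(c)])
        pos += len(c)
    return out

def haar_reconstruct(coeffs):
    a = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        up = np.empty(2 * len(a))
        up[0::2], up[1::2] = a + d, a - d
        a = up
    return a

freq = np.array([8, 6, 2, 3, 4, 4, 10, 12], float)     # toy frequency vector
approx = haar_reconstruct(synopsis(haar_decompose(freq), keep=3))
```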

12.
For document images corrupted by various kinds of noise, images binarized directly may be severely blurred and degraded. A common treatment for this problem is to pre-smooth input images using noise-suppressing filters. This article proposes an image-smoothing method used for prefiltering document image binarization. Conceptually, we propose that the influence range of each pixel affecting its neighbors should depend on local image statistics. Technically, we suggest using coplanar matrices to capture the structural and textural distribution of similar pixels at each site. This property adapts the smoothing process to the contrast, orientation, and spatial size of local image structures. Experimental results demonstrate the effectiveness of the proposed method, which compares favorably with existing methods in reducing noise and preserving image features. In addition, due to the adaptive nature of the similar-pixel definition, the proposed filter output is more robust to different noise levels than existing methods. Received: October 31, 2001 / October 09, 2002 / Correspondence to: L. Fan (e-mail: fanlixin@ieee.org)

13.
A network that offers deterministic, i.e., worst-case, quality-of-service guarantees to variable-bit-rate (VBR) video must provide a resource reservation mechanism that allocates bandwidth, buffer space, and other resources for each video stream. Such a resource reservation scheme must be carefully designed, otherwise network resources are wasted. A key component in the design of a resource reservation scheme is the traffic characterization method that specifies the traffic arrivals on a video stream. The traffic characterization should accurately describe the actual arrivals, so that a large number of streams can be supported, but it must also map directly onto efficient traffic-policing mechanisms that monitor the arrivals on each stream. In this study, we present a fast and accurate traffic characterization method for stored VBR video in networks with a deterministic service, one that can be efficiently policed by a small number of leaky buckets. We present a case study in which we apply our characterization method to networks that employ a dynamic resource reservation scheme with renegotiation, and we use traces from a set of 25–30-min MPEG sequences to evaluate our method against other characterization schemes from the literature.
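The policing side is easy to state: for a chosen drain rate, the smallest bucket depth under which a frame-size trace conforms gives one point of the stream's traffic envelope. A single-bucket sketch (the paper's characterization uses a small number of buckets):

```python
def policed_by(trace, rate, depth):
    """True iff every slot of `trace` (bits per frame slot) passes a leaky
    bucket that drains `rate` bits per slot and holds at most `depth` bits."""
    fill = 0.0
    for bits in trace:
        fill = max(0.0, fill + bits - rate)
        if fill > depth:
            return False
    return True

def min_depth(trace, rate):
    """Smallest depth for which `trace` conforms at drain rate `rate`,
    i.e. one (rate, depth) point of the empirical traffic envelope."""
    fill = depth = 0.0
    for bits in trace:
        fill = max(0.0, fill + bits - rate)
        depth = max(depth, fill)
    return depth

trace = [4000, 12000, 3000, 9000, 2000]   # toy per-frame sizes in bits
assert policed_by(trace, rate=6000, depth=min_depth(trace, 6000))
```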

14.
Easy-to-use audio/video authoring tools play a crucial role in moving multimedia software from research curiosity to mainstream applications. However, research in multimedia authoring systems has rarely been documented in the literature. This paper describes the design and implementation of an interactive video authoring system called Zodiac, which employs an innovative edit history abstraction to support several unique editing features not found in existing commercial and research video editing systems. Zodiac provides users a conceptually clean and semantically powerful branching history model of edit operations to organize the authoring process, and to navigate among versions of authored documents. In addition, by analyzing the edit history, Zodiac is able to reliably detect a composed video stream's shot and scene boundaries, which facilitates interactive video browsing. Zodiac also features a video object annotation capability that allows users to associate annotations to moving objects in a video sequence. The annotations themselves could be text, image, audio, or video. Zodiac is built on top of MMFS, a file system specifically designed for interactive multimedia development environments, and implements an internal buffer manager that supports transparent lossless compression/decompression. Shot/scene detection, video object annotation, and buffer management all exploit the edit history information for performance optimization.
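A branching edit history can be modeled as a tree in which undo moves the head toward the root without discarding the abandoned branch, so every authored version stays reachable. A minimal sketch, not Zodiac's actual abstraction:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EditNode:
    op: str                                   # e.g. "cut", "insert", "overlay"
    parent: Optional["EditNode"] = None
    children: List["EditNode"] = field(default_factory=list)

class History:
    def __init__(self):
        self.root = EditNode("open")
        self.head = self.root                 # current version

    def apply(self, op):
        node = EditNode(op, parent=self.head)
        self.head.children.append(node)       # a new branch if head was undone past
        self.head = node

    def undo(self):
        if self.head.parent is not None:
            self.head = self.head.parent      # the abandoned branch stays reachable

h = History()
h.apply("cut 0:10-0:12")
h.undo()
h.apply("insert title")   # root now has two children: two navigable versions
```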

15.
We exploit the gap in ability between human and machine vision systems to craft a family of automatic challenges that tell human and machine users apart via graphical interfaces, including Internet browsers. Turing proposed [Tur50] a method whereby human judges might validate “artificial intelligence” by failing to distinguish between human and machine interlocutors. Stimulated by the “chat room problem” posed by Udi Manber of Yahoo!, and influenced by the CAPTCHA project [BAL00] of Manuel Blum et al. of Carnegie Mellon University, we propose a variant of the Turing test using pessimal print: that is, low-quality images of machine-printed text synthesized pseudo-randomly over certain ranges of words, typefaces, and image degradations. We show experimentally that judicious choice of these ranges can ensure that the images are legible to human readers but illegible to several of the best present-day optical character recognition (OCR) machines. Our approach is motivated by a decade of research on performance evaluation of OCR machines [RJN96,RNN99] and on quantitative stochastic models of document image quality [Bai92,Kan96]. The slow pace of evolution of OCR and other species of machine vision over many decades [NS96,Pav00] suggests that pessimal print will defy automated attack for many years. Applications include “bot” barriers and database rationing. Received: February 14, 2002 / Accepted: March 28, 2002. An expanded version of: A.L. Coates, H.S. Baird, R.J. Fateman (2001) Pessimal Print: a reverse Turing Test. In: Proc. 6th Int. Conf. on Document Analysis and Recognition, Seattle, Wash., USA, September 10–13, pp. 1154–1158. Correspondence to: H. S. Baird

16.
Automatic acquisition of CAD models from existing objects requires accurate extraction of geometric and topological information from the input data. This paper presents a range image segmentation method based on local approximation of scan lines. The method employs edge models that are capable of detecting noise pixels as well as position and orientation discontinuities of varying strengths. Region-based techniques are then used to achieve a complete segmentation. Finally, a geometric representation of the scene, in the form of a surface CAD model, is produced. Experimental results on a large number of real range images acquired by different range sensors demonstrate the efficiency and robustness of the method. Received: 1 August 2000 / Accepted: 23 January 2002 / Correspondence to: I. Khalifa

17.
Excessive buffer requirements for handling continuous-media playbacks are an impediment to cost-effective provisioning of on-line video retrieval. Given the skewed distribution of video popularity, it is expected that there are often concurrent playbacks of the same video file within a short time interval. This creates an opportunity to batch multiple requests and to service them with a single stream from the disk without violating the on-demand constraint. However, doing so requires keeping data in memory between successive uses, which leads to a buffer-space trade-off between servicing a request in memory mode and servicing it in disk mode. In this work, we develop a novel algorithm to minimize the buffer requirement to support a set of concurrent playbacks. A strength of the proposed scheme is that it enables the server to adapt dynamically to the changing workload while minimizing the total buffer-space requirement. Our algorithm significantly decreases the total buffer requirement, especially when the user access pattern is biased in favor of a small set of files. The proposed scheme is modeled in detail using an analytical formulation, and the optimality of the algorithm is proved. An analytical framework is developed so that the proposed scheme can be used in combination with various existing disk-scheduling strategies. Our simulation results confirm that under certain circumstances it is much more resource-efficient to support some of the playbacks in memory mode, and that the proposed scheme thus enables the server to minimize the overall buffer-space requirement.
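In the simplest case the memory-mode vs. disk-mode choice reduces to comparing the RAM needed to bridge the gap behind the preceding playback of the same file against the buffer a fresh disk stream would cost. A toy decision rule under an assumed cost model; the paper's algorithm optimizes this choice globally across all playbacks.

```python
def playback_mode(gap_s, rate_mbit_s, disk_stream_buffer_mb):
    """Serve a request that trails an ongoing playback of the same video by
    `gap_s` seconds from memory iff pinning the bridging data costs less
    RAM than a new disk stream's buffer would."""
    bridge_mb = gap_s * rate_mbit_s / 8.0   # data to keep between the two readers
    return "memory" if bridge_mb <= disk_stream_buffer_mb else "disk"

playback_mode(gap_s=20, rate_mbit_s=4, disk_stream_buffer_mb=12)  # 'memory' (10 MB)
```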

18.
Extracting curved text lines using local linearity of the text line   (Cited 1 time in total: 0 self-citations, 1 by others)
To enhance the ability of document analysis systems, we need a text line extraction method which can handle not only straight text lines but also text lines of various shapes. This paper proposes a new method called Extended Linear Segment Linking (ELSL), which is able to extract text lines in arbitrary orientations as well as curved text lines. We also consider the coexistence of horizontally and vertically printed text lines on the same page; the new method can produce text-line candidates for multiple orientations. We verify the ability of the method experimentally. Received: December 21, 1998 / Revised version: September 2, 1999

19.
Much work on video servers has concentrated on movies on demand, in which a relatively small number of titles are viewed and users are given basic VCR-style controls. This paper concentrates on analyzing video server performance for non-linear access applications. In particular, we study two non-linear video applications: video libraries, in which users select from a large collection of videos and may be interested in viewing only a small part of the title; and video walk-throughs, in which users can move through an image-mapped representation of a space. We present a characterization of the workloads of these applications. Our simulation studies show that video server architectures developed for movies on demand can be adapted to video library usage, though caching is less effective and the server can support a smaller user population for non-linear video applications. We also show that video walk-throughs require extremely large amounts of RAM buffering to provide adequate performance for even a small number of users.

20.
This paper presents a novel technique for detecting possible defects in two-dimensional wafer images with repetitive patterns using prior knowledge. The technique has a learning ability that can create a golden-block database from the wafer image itself, then modify and refine its content when used in further inspections. The extracted building block is stored as a golden block for the detected pattern. When new wafer images with the same periodical pattern arrive, we do not have to recalculate their periods and building blocks. A new building block can be derived directly from the existing golden block after eliminating alignment differences. If the newly derived building block has better quality than the stored golden block, then the golden block is replaced with the new building block. With the proposed algorithm, our implementation shows that a significant amount of processing time is saved. Also, the storage overhead of golden templates is reduced significantly by storing golden blocks only. Received: 21 February 2001 / Accepted: 21 April 2002 / Correspondence to: S.-U. Guan
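The comparison step itself can be sketched as tiling the wafer image by the golden block's size and flagging tiles that differ too much from it. Alignment handling and the golden-block learning/replacement policy are omitted, and the threshold is arbitrary.

```python
import numpy as np

def defect_candidates(wafer, golden, thresh=30.0):
    """Tile `wafer` by the golden block's size and return the top-left
    corners of tiles whose mean absolute difference exceeds `thresh`."""
    bh, bw = golden.shape
    H, W = wafer.shape
    g = golden.astype(np.int32)
    flags = []
    for y in range(0, H - bh + 1, bh):
        for x in range(0, W - bw + 1, bw):
            tile = wafer[y:y + bh, x:x + bw].astype(np.int32)
            if np.abs(tile - g).mean() > thresh:
                flags.append((y, x))          # candidate defect location
    return flags
```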
