Found 20 similar documents (search took 336 ms)
1.
Automatic text segmentation and text recognition for video indexing   Cited by: 13 (self-citations: 0, other citations: 13)
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval
is the text appearing in them. It enables content-based browsing. We present our new methods for automatic segmentation of
text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable
and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their
complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single
bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate
the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments
to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable
for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging
and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics
in videos.
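The bitmap-integration step described above can be sketched as a per-pixel majority vote over the tracked character's frames. This is an illustrative sketch under the assumption of already-aligned bitmaps, not the authors' implementation; the `integrate_bitmaps` helper is invented:

```python
def integrate_bitmaps(bitmaps):
    """Fuse multiple aligned binary bitmaps of one character into a
    single cleaner bitmap by per-pixel majority vote."""
    frames = len(bitmaps)
    h, w = len(bitmaps[0]), len(bitmaps[0][0])
    return [[1 if sum(b[y][x] for b in bitmaps) * 2 > frames else 0
             for x in range(w)]
            for y in range(h)]

# Three noisy observations of a 3x3 "character" (a vertical middle bar),
# each with a different speckle-noise pixel.
noisy = [
    [[1, 1, 0], [0, 1, 0], [0, 1, 0]],  # noise at (0, 0)
    [[0, 1, 0], [0, 1, 0], [0, 1, 1]],  # noise at (2, 2)
    [[0, 1, 0], [0, 1, 1], [0, 1, 0]],  # noise at (1, 2)
]
fused = integrate_bitmaps(noisy)
print(fused)  # the isolated noise pixels are voted out
```

Because each noise pixel appears in only one of the three frames, it loses the majority vote and only the true bar survives.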
2.
E. Kavallieratou N. Fakotakis G. Kokkinakis 《International Journal on Document Analysis and Recognition》2002,4(4):226-242
In this paper, an integrated offline recognition system for unconstrained handwriting is presented. The proposed system consists
of seven main modules: skew angle estimation and correction, printed/handwritten text discrimination, line segmentation, slant
removal, word segmentation, and character segmentation and recognition, stemming from the implementation of existing
algorithms as well as novel ones. The system has been tested on the NIST, IAM-DB, and GRUHD databases and has achieved
accuracy ranging from 65.6% to 100%, depending on the database and the experiment.
3.
Abstract. The use of hand gestures provides an attractive means of interacting naturally with a computer-generated display. Using one
or more video cameras, the hand movements can potentially be interpreted as meaningful gestures. One key problem in building
such an interface without a restricted setup is the ability to localize and track the human arm robustly in video sequences.
This paper proposes a multiple-cue localization scheme combined with a tracking framework to reliably track the dynamics of
the human arm in unconstrained environments. The localization scheme integrates the multiple cues of motion, shape, and color
for locating a set of key image features. Using constraint fusion, these features are tracked by a modified extended Kalman filter that exploits the articulated structure of the human arm. Moreover, an interaction scheme between tracking and localization
is used for improving the estimation process while reducing the computational requirements. The performance of the localization/tracking
framework is validated with the help of extensive experiments and simulations. These experiments include tracking with calibrated
stereo camera and uncalibrated broadcast video.
Received: 19 January 2001 / Accepted: 27 December 2001
Correspondence to: R. Sharma
4.
John F. Pitrelli Amit Roy 《International Journal on Document Analysis and Recognition》2003,5(2-3):126-137
We discuss development of a word-unigram language model for online handwriting recognition. First, we tokenize a text corpus
into words, contrasting with tokenization methods designed for other purposes. Second, we select for our model a subset of
the words found, discussing deviations from an N-most-frequent-words approach. From a 600-million-word corpus, we generated a 53,000-word model which eliminates 45% of word-recognition
errors made by a character-level-model baseline system. We anticipate that our methods will be applicable to offline recognition
as well, and to some extent to other recognizers, such as speech recognizers and video retrieval systems.
Received: November 1, 2001 / Revised version: July 22, 2002
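The basic construction of such a word-unigram model can be sketched as follows. This uses plain whitespace tokenization and an N-most-frequent cut, both of which the paper deliberately deviates from; `build_unigram_model` is a hypothetical helper:

```python
from collections import Counter

def build_unigram_model(corpus_text, vocab_size):
    """Tokenize a corpus into words, keep the most frequent ones,
    and use relative frequencies as unigram probabilities."""
    words = corpus_text.lower().split()
    counts = Counter(words)
    kept = counts.most_common(vocab_size)       # frequency cut-off
    total = sum(c for _, c in kept)
    return {w: c / total for w, c in kept}      # renormalize over the vocab

corpus = "the cat sat on the mat the cat ran"
model = build_unigram_model(corpus, vocab_size=3)
print(sorted(model, key=model.get, reverse=True))  # most probable words first
```

In a recognizer, such a model re-ranks candidate words: among visually similar hypotheses, the one with higher unigram probability wins.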
5.
Gerd Schürmann 《Multimedia Systems》1996,4(5):281-295
Electronic mail for traditional text exchange, as an asynchronous means of communication between computer users, is widely relied
upon in many application areas. Whereas multimedia mail systems – including text, graphics, still images, audio, video, and documents
– have been limited to isolated communities, at least two very promising approaches are under development: MIME (Multipurpose
Internet Mail Extensions), an extension of Internet mail, and the Multimedia Teleservice based on CCITT Recommendation
X.400(88), being developed within the BERKOM project funded by the German TELEKOM. In the latter, the store-and-forward mechanism
inherent to electronic mail is complemented by an additional exchange mechanism that allows the resolution of
references to message content, e.g., video. Such references may be put into a message in place of the content itself. Internet/MIME
and OSI/X.400, their interworking, asynchronous information-server access via multimedia mail, and possible future
developments, especially in the area of asynchronous Computer Supported Cooperative Work (CSCW), are discussed.
6.
Louise A. Dennis Graham Collins Michael Norrish Richard J. Boulton Konrad Slind Thomas F. Melham 《International Journal on Software Tools for Technology Transfer (STTT)》2003,4(2):189-210
The PROSPER (Proof and Specification Assisted Design Environments) project advocates the use of toolkits which allow existing
verification tools to be adapted to a more flexible format so that they can be treated as components. A system incorporating
such tools becomes another component that can be embedded in an application. This paper describes the software toolkit developed
by the project. The nature of communication between components is specified in a language-independent way. It is implemented
in several common programming languages to allow a wide variety of tools to have access to the toolkit.
Published online: 19 November 2002
Work funded by ESPRIT Framework IV Grant LTR 26241.
Michael Norrish is supported by the Michael and Morven Heller Research Fellowship at St. Catharine’s College, Cambridge.
Konrad Slind is now at the School of Computing, University of Utah, Salt Lake City UT 84112, USA.
7.
Handling a tertiary storage device, such as an optical disk library, in the framework of a disk-based stream service model,
requires a sophisticated streaming model for the server, and it should consider the device-specific performance characteristics
of tertiary storage. This paper discusses the design and implementation of a video server which uses tertiary storage as a
source of media archiving. We have carefully designed the streaming mechanism for a server whose key functionalities include
stream scheduling, disk caching and admission control. The stream scheduling model incorporates the tertiary media staging
into a disk-based scheduling process, and also enhances the utilization of tertiary device bandwidth. The disk caching mechanism
manages the limited capacity of the hard disk efficiently to guarantee the availability of media segments on the hard disk.
The admission controller provides an adequate mechanism which decides upon the admission of a new request based on the current
resource availability of the server. The proposed system has been implemented on a general-purpose operating system and it
is fully operational. The design principles of the server are validated with real experiments, and the performance characteristics
are analyzed. The results guide us on how servers with tertiary storage should be deployed effectively in a real environment.
e-mail: hjcha@cs.yonsei.ac.kr
8.
Pietro Parodi Roberto Fontana 《International Journal on Document Analysis and Recognition》1999,2(2-3):67-79
This paper describes a novel method for extracting text from document pages of mixed content. The method works by detecting
pieces of text lines in small overlapping columns of fixed width, shifted with respect to each other by a fixed number of image elements, and by merging these pieces in a bottom-up fashion to form complete text lines and blocks of text lines. The algorithm requires
about 1.3 s for a 300 dpi image on a PC with a 300 MHz Pentium II CPU and an Intel 440LX motherboard. The algorithm is largely independent
of the layout of the document, the shape of the text regions, and the font size and style. The main assumptions are that the
background be uniform and that the text sit approximately horizontally. For a skew of up to about 10 degrees no skew correction
mechanism is necessary. The algorithm has been tested on the UW English Document Database I of the University of Washington
and its performance has been evaluated by a suitable measure of segmentation accuracy. Also, a detailed analysis of the segmentation
accuracy achieved by the algorithm as a function of noise and skew has been carried out.
Received April 4, 1999 / Revised June 1, 1999
9.
Domenico Ferrari 《Multimedia Systems》1998,6(3):179-185
The research done by the Tenet Group in multimedia networking has reached a point where it may be useful to reflect on the
significance of its results for the current debate on how integrated-services internetworks should be designed. Such reflections
constitute the main subject of this paper. The principles of the work and the conclusions reached so far by the Tenet researchers
are discussed in the light of the conflict between the two major technologies being proposed to build future information infrastructures:
namely, the Internet and the ATM technologies. The Tenet approach suggests one feasible way for resolving the conflict to
the advantage of all the users of those infrastructures. This paper discusses various fundamental aspects of integrated-services
network design: the choice of the service model, the type of charging policy to be adopted, and the selection of a suitable
architecture.
10.
Abstract. The image sequence in a video taken by a moving camera may suffer from irregular perturbations because of irregularities
in the motion of the person or vehicle carrying the camera. We show how to use information in the image sequence to correct
the effects of these irregularities so that the sequence is smoothed, i.e., is approximately the same as the sequence that
would have been obtained if the motion of the camera had been smooth. Our method is based on the fact that the irregular motion
is almost entirely rotational, and that the rotational image motion can be detected and corrected if a distant object, such
as the horizon, is visible.
Received: 14 February 2001 / Accepted: 11 February 2002
Correspondence to: A. Rosenfeld
11.
Approximate query processing using wavelets   Cited by: 7 (self-citations: 0, other citations: 7)
Kaushik Chakrabarti Minos Garofalakis Rajeev Rastogi Kyuseok Shim 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(2-3):199-223
Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent
response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited
in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches
based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries
over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool
for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building
wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms
that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex
queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine
can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational
tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses
in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets
to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate
that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query
execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times
that scale linearly with the size of the data.
Received: 7 August 2000 / Accepted: 1 April 2001 / Published online: 7 June 2001
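The wavelet-synopsis idea can be illustrated in one dimension with an unnormalized Haar transform plus coefficient thresholding. This is only a toy sketch; the paper's multi-dimensional, I/O-efficient construction and its in-domain query algebra are not shown:

```python
def haar_decompose(data):
    """One-dimensional, unnormalized Haar decomposition.
    Input length must be a power of two."""
    coeffs = []
    avgs = list(data)
    while len(avgs) > 1:
        pairs = [(avgs[i], avgs[i + 1]) for i in range(0, len(avgs), 2)]
        detail = [(a - b) / 2 for a, b in pairs]   # pairwise differences
        avgs = [(a + b) / 2 for a, b in pairs]     # pairwise averages
        coeffs = detail + coeffs                   # finer details after coarser
    return avgs + coeffs                           # [overall average, details...]

def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    avgs, rest = coeffs[:1], coeffs[1:]
    while rest:
        detail, rest = rest[:len(avgs)], rest[len(avgs):]
        avgs = [v for a, d in zip(avgs, detail) for v in (a + d, a - d)]
    return avgs

def synopsis(data, keep):
    """Wavelet synopsis: zero out all but the `keep` largest-magnitude
    coefficients; queries are then answered from this truncated transform."""
    coeffs = haar_decompose(data)
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(ranked[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]

data = [2, 2, 2, 2, 10, 10, 10, 10]
approx = haar_reconstruct(synopsis(data, keep=2))
print(approx)  # two coefficients suffice for this step function
```

The step function compresses to just two nonzero coefficients (the overall average and one difference), which is the kind of sparsity that makes the synopsis accurate at small size.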
12.
Lixin Fan Liying Fan Chew Lim Tan 《International Journal on Document Analysis and Recognition》2003,5(2-3):88-101
Abstract. For document images corrupted by various kinds of noise, directly binarized images may be severely blurred and degraded.
A common treatment for this problem is to pre-smooth the input images using noise-suppressing filters. This article proposes an
image-smoothing method used as a prefilter for document image binarization. Conceptually, we propose that the range over
which each pixel influences its neighbors should depend on local image statistics. Technically, we suggest using coplanar matrices to capture the structural and textural distribution of similar pixels at each site. This property adapts the smoothing process
to the contrast, orientation, and spatial size of local image structures. Experimental results demonstrate the effectiveness
of the proposed method, which compares favorably with existing methods in reducing noise and preserving image features. In
addition, due to the adaptive nature of the similar pixel definition, the proposed filter output is more robust regarding
different noise levels than existing methods.
Received: October 31, 2001 / Revised: October 9, 2002
Correspondence to: L. Fan (e-mail: fanlixin@ieee.org)
13.
A network that offers deterministic, i.e., worst case, quality-of-service guarantees to variable-bit-rate (VBR) video must
provide a resource reservation mechanism that allocates bandwidth, buffer space, and other resources for each video stream.
Such a resource reservation scheme must be carefully designed, otherwise network resources are wasted. A key component for
the design of a resource reservation scheme is the traffic characterization method that specifies the traffic arrivals on a video stream. The traffic characterization should accurately describe the
actual arrivals, so that a large number of streams can be supported; but it must also map directly into efficient traffic-policing
mechanisms that monitor arrivals on each stream. In this study, we present a fast and accurate traffic characterization method
for stored VBR video in networks with a deterministic service. We use this approximation to obtain a traffic characterization
that can be efficiently policed by a small number of leaky buckets. We present a case study where we apply our characterization
method to networks that employ a dynamic resource reservation scheme with renegotiation. We use traces from a set of 25–30-min
MPEG sequences to evaluate our method against other characterization schemes from the literature.
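The leaky-bucket policing that such a characterization maps onto can be sketched as follows. This is a single-bucket toy version with an invented `conforms` helper; the paper derives a small number of buckets from its traffic characterization:

```python
def conforms(arrivals, rate, depth):
    """Police a per-time-step frame-size trace with one leaky bucket:
    each step drains `rate` units of credit, then the frame is added;
    the trace conforms if the bucket level never exceeds `depth`."""
    fill = 0.0
    for size in arrivals:
        fill = max(0.0, fill - rate) + size   # drain first, then add frame
        if fill > depth:
            return False                      # burst too large for this bucket
    return True

# A bursty VBR-like trace with average rate 4 and peak frame size 8.
trace = [2, 8, 2, 2, 8, 2]
print(conforms(trace, rate=4, depth=8))  # deep enough bucket: conformant
print(conforms(trace, rate=4, depth=6))  # too shallow: a burst violates it
```

The (rate, depth) pair is exactly the trade-off the characterization must capture: a lower rate needs a deeper bucket to absorb the same bursts.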
14.
Easy-to-use audio/video authoring tools play a crucial role in moving multimedia software from research curiosity to mainstream
applications. However, research in multimedia authoring systems has rarely been documented in the literature. This paper describes
the design and implementation of an interactive video authoring system called Zodiac, which employs an innovative edit history abstraction to support several unique editing features not found in existing commercial
and research video editing systems. Zodiac provides users a conceptually clean and semantically powerful branching history model of edit operations to organize the authoring process, and to navigate among versions of authored documents. In addition,
by analyzing the edit history, Zodiac is able to reliably detect a composed video stream's shot and scene boundaries, which facilitates interactive video browsing.
Zodiac also features a video object annotation capability that allows users to associate annotations to moving objects in a video sequence. The annotations themselves could
be text, image, audio, or video. Zodiac is built on top of MMFS, a file system specifically designed for interactive multimedia development environments, and implements an internal buffer
manager that supports transparent lossless compression/decompression. Shot/scene detection, video object annotation, and buffer
management all exploit the edit history information for performance optimization.
15.
Henry S. Baird Allison L. Coates Richard J. Fateman 《International Journal on Document Analysis and Recognition》2003,5(2-3):158-163
Abstract. We exploit the gap in ability between human and machine vision systems to craft a family of automatic challenges that tell
human and machine users apart via graphical interfaces including Internet browsers. Turing proposed [Tur50] a method whereby
human judges might validate “artificial intelligence” by failing to distinguish between human and machine interlocutors. Stimulated
by the “chat room problem” posed by Udi Manber of Yahoo!, and influenced by the CAPTCHA project [BAL00] of Manuel Blum et
al. of Carnegie-Mellon Univ., we propose a variant of the Turing test using pessimal print: that is, low-quality images of machine-printed text synthesized pseudo-randomly over certain ranges of words, typefaces,
and image degradations. We show experimentally that judicious choice of these ranges can ensure that the images are legible
to human readers but illegible to several of the best present-day optical character recognition (OCR) machines. Our approach
is motivated by a decade of research on performance evaluation of OCR machines [RJN96,RNN99] and on quantitative stochastic
models of document image quality [Bai92,Kan96]. The slow pace of evolution of OCR and other species of machine vision over
many decades [NS96,Pav00] suggests that pessimal print will defy automated attack for many years. Applications include 'bot'
barriers and database rationing.
Received: February 14, 2002 / Accepted: March 28, 2002
An expanded version of: A.L. Coates, H.S. Baird, R.J. Fateman (2001) Pessimal Print: a reverse Turing test. In: Proc.
6th Int. Conf. on Document Analysis and Recognition, Seattle, Wash., USA, September 10–13, pp. 1154–1158
Correspondence to: H. S. Baird
16.
Abstract. Automatic acquisition of CAD models from existing objects requires accurate extraction of geometric and topological information
from the input data. This paper presents a range image segmentation method based on local approximation of scan lines. The
method employs edge models that are capable of detecting noise pixels as well as position and orientation discontinuities
of varying strengths. Region-based techniques are then used to achieve a complete segmentation. Finally, a geometric representation
of the scene, in the form of a surface CAD model, is produced. Experimental results on a large number of real range images
acquired by different range sensors demonstrate the efficiency and robustness of the method.
Received: 1 August 2000 / Accepted: 23 January 2002
Correspondence to: I. Khalifa
17.
Excessive buffer requirements for handling continuous-media playbacks are an impediment to cost-effective provisioning for on-line
video retrieval. Given the skewed distribution of video popularity, it is expected that there are often concurrent playbacks
of the same video file within a short time interval. This creates an opportunity to batch multiple requests and to service
them with a single stream from the disk without violating the on-demand constraint. However, there is a need to keep data
in memory between successive uses to do this. This leads to a buffer space trade-off between servicing a request in memory mode vs. servicing it in disk-mode. In this work, we develop a novel algorithm to minimize the buffer requirement to support a set of concurrent playbacks.
One of the beauties of the proposed scheme is that it enables the server to dynamically adapt to the changing workload while
minimizing the total buffer space requirement. Our algorithm makes a significant contribution in decreasing the total buffer
requirement, especially when the user access pattern is biased in favor of a small set of files. The idea of the proposed
scheme is modeled in detail using an analytical formulation, and optimality of the algorithm is proved. An analytical framework
is developed so that the proposed scheme can be used in combination with various existing disk-scheduling strategies. Our
simulation results confirm that, under certain circumstances, it is much more resource-efficient to support some of the playbacks
in memory mode, and that the proposed scheme thus enables the server to minimize the overall buffer space requirement.
18.
Hideaki Goto Hirotomo Aso 《International Journal on Document Analysis and Recognition》1999,2(2-3):111-119
In order to enhance the ability of document analysis systems, we need a text line extraction method which can handle not
only straight text lines but also text lines in various shapes. This paper proposes a new method called Extended Linear Segment
Linking (ELSL for short), which is able to extract text lines in arbitrary orientations and curved text lines. We also consider
the existence of both horizontally and vertically printed text lines on the same page. The new method can produce text line
candidates for multiple orientations. We verify the ability of the method by some experiments as well.
Received December 21, 1998 / Revised version September 2, 1999
19.
Much work on video servers has concentrated on movies on demand, in which a relatively small number of titles are viewed
and users are given basic VCR-style controls. This paper concentrates on analyzing video server performance for non-linear
access applications. In particular, we study two non-linear video applications: video libraries, in which users select from
a large collection of videos and may be interested in viewing only a small part of the title; and video walk-throughs, in
which users can move through an image-mapped representation of a space. We present a characterization of the workloads of
these applications. Our simulation studies show that video server architectures developed for movies on demand can be adapted
to video library usage, though caching is less effective and the server can support a smaller user population for non-linear
video applications. We also show that video walk-throughs require extremely large amounts of RAM buffering to provide adequate
performance for even a small number of users.
20.
Abstract. This paper presents a novel technique for detecting possible defects in two-dimensional wafer images with repetitive patterns
using prior knowledge. The technique has a learning ability that can create a golden-block database from the wafer image itself,
then modify and refine its content when used in further inspections. The extracted building block is stored as a golden block
for the detected pattern. When new wafer images with the same periodical pattern arrive, we do not have to recalculate their
periods and building blocks. A new building block can be derived directly from the existing golden block after eliminating
alignment differences. If the newly derived building block has better quality than the stored golden block, then the golden
block is replaced with the new building block. With the proposed algorithm, our implementation shows that a significant amount
of processing time is saved. Also, the storage overhead of golden templates is reduced significantly by storing golden blocks
only.
Received: 21 February 2001 / Accepted: 21 April 2002
Correspondence to: S.-U. Guan
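The block-wise comparison underlying golden-template inspection can be sketched as follows. The `find_defective_blocks` helper and its pixel-count threshold are illustrative assumptions; the paper's period estimation, alignment correction, and golden-block refinement are omitted:

```python
def find_defective_blocks(image, golden, threshold):
    """Tile a wafer image with the golden block (the tile size equals the
    pattern period) and flag every tile whose count of mismatching pixels
    exceeds the threshold. `image` and `golden` are lists of pixel rows."""
    bh, bw = len(golden), len(golden[0])
    defects = []
    for by in range(0, len(image), bh):
        for bx in range(0, len(image[0]), bw):
            diff = sum(1
                       for y in range(bh)
                       for x in range(bw)
                       if image[by + y][bx + x] != golden[y][x])
            if diff > threshold:
                defects.append((by // bh, bx // bw))  # tile coordinates
    return defects

golden = [[0, 1],
          [1, 0]]
wafer = [[0, 1, 0, 1],
         [1, 0, 1, 1],   # tile (0, 1) has one flipped pixel
         [0, 1, 0, 1],
         [1, 0, 1, 0]]
defects = find_defective_blocks(wafer, golden, threshold=0)
print(defects)  # only the tile containing the flipped pixel is flagged
```

Storing only the small golden block, rather than a full-size golden template, is what gives the paper's approach its storage savings.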