Similar Documents
20 similar documents found (search time: 31 ms)
1.
Association Rule Mining algorithms operate on a data matrix (e.g., customers × products) to derive association rules [AIS93b, SA96]. We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that we can measure the “goodness” of a set of discovered rules. We also propose the “guessing error” as a measure of this “goodness”: the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. Another contribution is a novel method to guess missing/hidden values from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can “guess” the amount spent on butter. Thus, unlike association rules, Ratio Rules can perform a variety of important tasks such as forecasting, answering “what-if” scenarios, detecting outliers, and visualizing the data. Moreover, we show that we can compute Ratio Rules in a single pass over the data set with small memory requirements (a few small matrices), in contrast to association rule mining methods, which require multiple passes and/or large memory. Experiments on several real data sets (e.g., basketball and baseball statistics, biological data) demonstrate that the proposed method: (a) leads to rules that make sense; (b) can find large itemsets in binary matrices, even in the presence of noise; and (c) consistently achieves a “guessing error” up to 5 times smaller than that of straightforward column averages. Received: March 15, 1999 / Accepted: November 1, 1999
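The reconstruction step admits a compact sketch. The Python below is a minimal illustration, not the paper's exact algorithm: it treats the rules as the top singular directions of the uncentered data matrix and guesses one hidden cell of a row from its visible cells. The function names and the use of NumPy's SVD are our assumptions.

```python
import numpy as np

def ratio_rules(X, k):
    # Hypothetical stand-in for Ratio Rules: the top-k right singular
    # vectors of the uncentered (customers x products) matrix X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]            # each row is one "rule": a direction in product space

def guess_hidden(row, hidden, rules):
    # Guess row[hidden] (row is a NumPy vector) by fitting the visible
    # entries with the rules and reading off the hidden coordinate.
    visible = [j for j in range(len(row)) if j != hidden]
    A = rules[:, visible].T                       # visible coordinates of each rule
    coeffs, *_ = np.linalg.lstsq(A, row[visible], rcond=None)
    return float(rules[:, hidden] @ coeffs)
```

For the milk/bread example, `row` would hold the known dollar amounts with the butter column hidden; the guess is the butter amount implied by the dominant spending ratios.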

2.
Model checking for a probabilistic branching time logic with fairness
We consider concurrent probabilistic systems, based on the probabilistic automata of Segala & Lynch [55], which allow non-deterministic choice between probability distributions. These systems can be decomposed into a collection of “computation trees” which arise by resolving the non-deterministic, but not the probabilistic, choices. The presence of non-determinism means that certain liveness properties cannot be established unless fairness is assumed. We introduce a probabilistic branching time logic PBTL, based on the logic TPCTL of Hansson [30] and the logics PCTL [55] and pCTL [14]. The formulas of the logic express properties such as “every request is eventually granted with probability at least p”. We give three interpretations for PBTL on concurrent probabilistic processes: the first is standard, while in the remaining two the branching time quantifiers range over certain kinds of fair computation trees. We then present a model checking algorithm for verifying whether a concurrent probabilistic process satisfies a PBTL formula under fairness constraints. We also propose adaptations of existing model checking algorithms for pCTL [4, 14] to obtain procedures for PBTL under fairness constraints. The techniques developed in this paper have applications in the automatic verification of randomized distributed systems. Received: June 1997 / Accepted: May 1998
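At the heart of such model checkers is the computation of extremal reachability probabilities over all resolutions of the non-determinism. The sketch below shows the standard value-iteration fixpoint for the maximum probability of reaching a target set; this is a generic textbook procedure, not the paper's algorithm, and the state/transition encoding is our assumption.

```python
def max_reach_prob(states, trans, targets, tol=1e-9):
    # trans[s]: list of distributions, one per non-deterministic choice;
    # each distribution is a dict mapping successor state -> probability.
    x = {s: 1.0 if s in targets else 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in targets or not trans[s]:
                continue
            best = max(sum(p * x[t] for t, p in dist.items())
                       for dist in trans[s])
            delta = max(delta, abs(best - x[s]))
            x[s] = best
        if delta < tol:
            return x  # x[s] ~ max probability of eventually reaching targets
```

Replacing `max` with `min` yields the guarantee needed for properties like “every request is eventually granted with probability at least p”, which must hold under every (fair) resolution of the non-determinism.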

3.
4.
Summary. This paper proposes a framework for detecting global state predicates in systems of processes with approximately-synchronized real-time clocks. Timestamps from these clocks are used to define two orderings on events: “definitely occurred before” and “possibly occurred before”. These orderings lead naturally to definitions of three distinct detection modalities, i.e., three meanings of “predicate held during a computation”: “possibly held”, “definitely held”, and “definitely held in a specific global state”. This paper defines these modalities and gives efficient algorithms for detecting them. The algorithms are based on algorithms of Garg and Waldecker, Alagar and Venkatesan, Cooper and Marzullo, and Fromentin and Raynal. Complexity analysis shows that under reasonable assumptions, these real-time-clock-based detection algorithms are less expensive than detection algorithms based on Lamport's happened-before ordering. Sample applications are given to illustrate the benefits of this approach. Received: January 1999 / Accepted: November 1999
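With clocks synchronized to within a known bound, the two orderings reduce to simple timestamp tests. A minimal sketch, assuming each event carries a single timestamp and a global skew bound `eps` (the paper's exact definitions may differ):

```python
from dataclasses import dataclass

@dataclass
class Event:
    proc: int    # process that executed the event
    ts: float    # timestamp from the approximately-synchronized clock

def definitely_before(e, f, eps):
    # Holds even under worst-case clock skew between the two processes.
    return e.ts + eps < f.ts

def possibly_before(e, f, eps):
    # Holds under at least one admissible assignment of true times.
    return e.ts - eps < f.ts
```

Detecting “possibly held” then amounts to searching for a global state whose constituent local states are pairwise unordered under `definitely_before`.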

5.
Approximate query mapping: Accounting for translation closeness
In this paper we present a mechanism for approximately translating Boolean query constraints across heterogeneous information sources. Achieving the best translation is challenging because sources support different constraints for formulating queries, and often these constraints cannot be precisely translated. For instance, a query [score>8] might be “perfectly” translated as [rating>0.8] at some site, but can only be approximated as [grade=A] at another. Unlike other work, our general framework adopts a customizable “closeness” metric for the translation that combines both precision and recall. Our results show that for query translation we need to handle interdependencies among both query conjuncts and disjuncts. As the basis, we identify the essential requirements of a rule system for users to encode the mappings for atomic semantic units. Our algorithm then translates complex queries by rewriting them in terms of the semantic units. We show that, under practical assumptions, our algorithm generates the best approximate translations with respect to the closeness metric of choice. We also present a case study to show how our technique may be applied in practice. Received: 15 October 2000 / Accepted: 15 April 2001 / Published online: 28 June 2001
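A closeness metric combining precision and recall can be made concrete as follows; this is a hypothetical instance in the spirit of the paper (the weighting scheme and the answer-set encoding are our assumptions), not its actual definition.

```python
def closeness(source_answers, translated_answers, alpha=0.5):
    # Score a translated query by comparing its answer set against the
    # answer set intended by the source query.
    s, t = set(source_answers), set(translated_answers)
    if not s or not t:
        return 0.0
    precision = len(s & t) / len(t)   # returned items that were wanted
    recall = len(s & t) / len(s)      # wanted items that were returned
    return alpha * precision + (1 - alpha) * recall
```

A looser translation gains recall at the cost of precision, and a stricter one does the opposite; the translator picks whichever the chosen weighting prefers.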

6.
The granularity of given temporal information is the level of abstraction at which the information is expressed. Different units of measure allow one to represent different granularities. Indeterminacy is often present in temporal information given at different granularities: temporal indeterminacy is related to incomplete knowledge of when the considered fact happened. Focusing on temporal databases, different granularities and indeterminacy have to be considered in expressing valid time, i.e., the time at which the information is true in the modeled reality. In this paper, we propose HMAP (the term is the transliteration of an ancient Greek poetical word meaning “day”), a temporal data model extending the capability of defining valid times with different granularity and/or with indeterminacy. In HMAP, absolute intervals are explicitly represented by their start, end, and duration: in this way, we can represent valid times such as “in December 1998 for five hours”, “from July 1995, for 15 days”, and “from March 1997 to October 15, 1997, between 6 and 6:30 p.m.”. HMAP is based on a three-valued logic for managing uncertainty in temporal relationships. Formulas involving different temporal relationships between intervals, instants, and durations can be defined, allowing one to query the database at different granularities, not necessarily related to that of the data. In this paper, we also discuss the complexity of the algorithms for evaluating HMAP formulas, and show that the formulas can be expressed as constraint networks falling into the class of simple temporal problems, which can be solved in polynomial time. Received: 6 August 1998 / Accepted: 13 July 2000 / Published online: 13 February 2001
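The three-valued evaluation over indeterminate times can be sketched by modelling each temporal value as a range of possible instants; the encoding below (True/False/None for definitely true/definitely false/undetermined) is our assumption, not HMAP's actual representation.

```python
from dataclasses import dataclass

@dataclass
class Indeterminate:
    lo: float   # earliest possible instant
    hi: float   # latest possible instant

def before(a: Indeterminate, b: Indeterminate):
    # Three-valued "a occurs before b".
    if a.hi < b.lo:
        return True    # true in every possible scenario
    if a.lo >= b.hi:
        return False   # true in no possible scenario
    return None        # undetermined: depends on the actual instants
```

Networks of such constraints are exactly what makes the simple-temporal-problem reduction mentioned in the abstract possible.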

7.
Handling message semantics with Generic Broadcast protocols
Summary. Message ordering is a fundamental abstraction in distributed systems. However, ordering guarantees are usually purely “syntactic,” that is, message “semantics” is not taken into consideration despite the fact that in several cases semantic information about messages could be exploited to avoid ordering messages unnecessarily. In this paper we define the Generic Broadcast problem, which orders messages only if needed, based on the semantics of the messages. The semantic information about messages is introduced by conflict relations. We show that Reliable Broadcast and Atomic Broadcast are special instances of Generic Broadcast. The paper also presents two algorithms that solve Generic Broadcast. Received: August 2000 / Accepted: August 2001
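The conflict relation is an application-supplied predicate on pairs of messages; only conflicting pairs need a common delivery order. A toy sketch, with the key-value message encoding being our invention:

```python
def conflicts(m1, m2):
    # Two messages must be delivered in the same order everywhere only
    # if they do not commute. Toy rule: operations conflict iff they
    # touch the same key and at least one of them is a write.
    return (m1["key"] == m2["key"] and
            "write" in (m1["op"], m2["op"]))
```

The two special cases from the abstract fall out directly: an always-false conflict relation yields Reliable Broadcast (no ordering), and an always-true one yields Atomic Broadcast (total order).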

8.
A labelling approach for the automatic recognition of tables of contents (ToC) is described in this paper. A prototype is used for the electronic consultation of scientific papers in a digital library system named Calliope. The method operates on a roughly structured ASCII file produced by OCR. The recognition approach labels text without using any a priori model. Labelling is based on part-of-speech (PoS) tagging, which is initiated by a primary labelling of text components using some specific dictionaries. Significant tags are first grouped into homogeneous classes according to their grammatical categories and then reduced to canonical forms corresponding to the article fields “title” and “authors”. Non-labelled tokens are integrated into one field or the other by either applying PoS correction rules or using a structure model generated from well-detected articles. The prototype performs well across different ToC layouts and character recognition qualities. Without manual intervention, a 96.3% rate of correct segmentation was obtained on 38 journals comprising 2,020 articles, together with a 93.0% rate of correct field extraction. Received: April 5, 2000 / Revised: February 19, 2001

9.
The promise of mobile devices lies not in their capacity to duplicate the capabilities of desktop machines, but in their ability to enable location-specific tasks. One of the challenges that must be addressed if they are to be used in this way is how to design intuitive interfaces for mobile devices that enable access to location-specific services yet remain usable across locations. We are developing a prototype mobile valet application that presents location-specific services organised around the tasks associated with a location. The basic elements of the interface exploit commonalities in the way we address tasks at various locations, just as the familiar “file” and “edit” menus in various software applications exploit regularities in software tasks.

10.
Design and analysis of a video-on-demand server
The availability of high-speed networks, fast computers and improved storage technology is stimulating interest in the development of video-on-demand services that provide facilities similar to those of a video cassette player (VCP). In this paper, we present the design of a video-on-demand (VOD) server capable of supporting a large number of video requests, each with the complete functionality of a remote control (as used in VCPs). In the proposed design, we have used an interleaved storage method with constrained allocation of video and audio blocks on the disk to provide continuous retrieval. Our storage scheme interleaves a movie with itself (while satisfying the constraints on video and audio block allocation). This approach minimizes the starting delay and the buffer requirement at the user end, while ensuring a jitter-free display for every request. In order to minimize the starting delay and to support more non-concurrent requests, we propose the use of multiple disks for the same movie. Since a disk needs to hold only one movie, an array of inexpensive disks can be used, which reduces the overall cost of the proposed system. A scheme supported by our disk storage method to provide all the functions of a remote control, such as “fast-forward”, “rewind” (with play “on” or “off”), “pause” and “play”, is also discussed. This scheme handles each user request independently of the others and satisfies it without degrading the quality of service to other users. The server design presented in this paper achieves the multiple goals of high disk utilization, global buffer optimization, cost-effectiveness and high-quality service to the users.
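The self-interleaving idea can be illustrated with a toy block placement: the movie's blocks are written so that one sequential sweep of the disk feeds k playback points spaced evenly through the film. The block numbering and the requirement that k divide the block count are simplifications of ours, not the paper's allocation constraints.

```python
def self_interleaved_layout(num_blocks, k):
    # On-disk order that interleaves a movie with itself: slot i carries
    # block i for phase 0, block stride + i for phase 1, and so on.
    assert num_blocks % k == 0, "toy example: k must divide the movie length"
    stride = num_blocks // k
    return [phase * stride + slot
            for slot in range(stride)
            for phase in range(k)]

# self_interleaved_layout(6, 2) -> [0, 3, 1, 4, 2, 5]:
# one sweep serves two viewers whose positions differ by half the movie.
```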

11.
In the literature, many feature types have been proposed for document classification. However, an extensive and systematic evaluation of the various approaches has not yet been done. In particular, evaluations on OCR documents are very rare. In this paper we investigate seven text representations based on n-grams and single words. We compare their effectiveness in classifying OCR texts and the corresponding correct ASCII texts in two domains: business letters and abstracts of technical reports. Our results indicate that the use of n-grams is an attractive technique which compares well even with techniques relying on morphological analysis. This holds for OCR texts as well as for correct ASCII texts. Received: February 17, 1998 / Revised: April 8, 1998
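The robustness of n-grams to OCR errors is easy to see in code: a single misrecognized character corrupts only the n-grams that overlap it, not the whole word. A minimal sketch of character n-gram extraction (the padding convention is our choice):

```python
from collections import Counter

def char_ngrams(text, n=3):
    # Frequency vector of character n-grams; padding lets word
    # boundaries contribute their own n-grams.
    pad = "_" * (n - 1)
    s = pad + text.lower() + pad
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

# An OCR error such as "lettcr" for "letter" still shares most
# trigrams with the correct word, so classification degrades gracefully.
```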

12.
Transforming paper documents into XML format with WISDOM++
The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires solutions to several problems. The application of OCR to some parts of the document image is only one of them. In fact, the generation of documents in HTML format is easier when the layout structure of a page has been extracted by means of a document analysis process. The adoption of an XML format is even better, since it can facilitate the retrieval of documents on the Web. Nevertheless, an effective transformation of paper documents into this format requires further processing steps, namely document image classification and understanding. WISDOM++ is a document processing system that operates in five steps: document analysis, document classification, document understanding, text recognition with an OCR, and transformation into HTML/XML format. The innovative aspects described in the paper are: the preprocessing algorithm, the adaptive page segmentation, the acquisition of block classification rules using techniques from machine learning, the layout analysis based on general layout principles, and a method that uses document layout information for conversion to HTML/XML formats. A benchmarking of the system components implementing these innovative aspects is reported. Received: June 15, 2000 / Revised: November 7, 2000

13.
We present a new approach to the tracking of very non-rigid patterns of motion, such as water flowing down a stream. The algorithm is based on a “disturbance map”, which is obtained by linearly subtracting the temporal average of the previous frames from the new frame. Every local motion creates a disturbance having the form of a wave, with a “head” at the present position of the motion and a historical “tail” that indicates the previous locations of that motion. These disturbances serve as loci of attraction for “tracking particles” that are scattered throughout the image. The algorithm is very fast and can be performed in real time. We provide excellent tracking results on various complex sequences, using both stabilized and moving cameras, showing a busy ant column, waterfalls, rapids and flowing streams, shoppers in a mall, and cars in a traffic intersection. Received: 24 June 1997 / Accepted: 30 July 1998
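The disturbance map itself is a single array operation per frame: subtract a running temporal average from the incoming frame. A minimal sketch with an exponentially decaying average (the decay constant is our assumption):

```python
import numpy as np

def disturbance_maps(frames, decay=0.9):
    # Yields, for each frame, the frame minus the temporal average of the
    # preceding frames: moving objects appear as a strong "head" with a
    # fading historical "tail" that tracking particles can follow.
    avg = None
    for frame in frames:
        f = frame.astype(np.float64)
        if avg is None:
            avg = f.copy()
        yield f - avg
        avg = decay * avg + (1.0 - decay) * f
```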

14.
Making a resource or facility “accessible” to persons with disabilities does not necessarily make it “usable” to all such members of the population. The authors present some valuable lessons learned by National Science Foundation researchers and educators striving to make engaging activities and inviting curricula inclusive of all students, regardless of ability. Published online: 9 April 2002

15.
We present part of an industrial project where mechanized theorem proving is used for the validation of a translator which generates safety-critical software. In this project, the mechanized proof is decomposed into two parts: one is done “online”, at each run of the translator, by a custom prover which checks automatically that the result of each translation meets some verification conditions; the other is done “offline”, once and for all, interactively with a general purpose prover; the offline proof shows that the verification conditions checked by the online prover are sufficient to guarantee the correctness of each translation. The provably correct verification conditions can thus be seen as specifications for the online prover. This approach is called mechanized result verification. This paper describes the project requirements, explains the motivations for formal validation by mechanized result verification, provides an overview of the formalization of the specifications for the online prover, and discusses in detail some issues we have addressed in the mechanized offline proof.

16.
Most environments are passive: deaf, dumb and blind, unaware of their inhabitants and unable to assist them in a meaningful way. However, with the advent of ubiquitous computing (ever smaller, cheaper and faster computational devices embedded in a growing variety of “smart” objects), it is becoming increasingly possible to create active environments: physical spaces that can sense and respond appropriately to the people and activities taking place within them. Most early ubiquitous computing applications focus on how individuals interact with their environments as they work on foreground tasks. In contrast, this paper focuses on how groups of people affect and are affected by background aspects of their environments.

17.
The aspect Bernoulli model: multiple causes of presences and absences
We present a probabilistic multiple-cause model for the analysis of binary (0–1) data. A distinctive feature of the aspect Bernoulli (AB) model is its ability to automatically detect and distinguish between “true absences” and “false absences” (both of which are coded as 0 in the data) and, similarly, between “true presences” and “false presences” (both of which are coded as 1). This is accomplished by specific additive noise components which explicitly account for such non-content-bearing causes. The AB model is thus suitable for noise removal and data explanatory purposes, including omission/addition detection. An important application of AB that we demonstrate is data-driven reasoning about palaeontological records. Additionally, results on recovering corrupted handwritten digit images and on expanding short text documents are given, and comparisons to other methods are discussed.
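The likelihood structure of such a model can be sketched as follows, assuming per-item mixing weights and noise aspects reserved as all-ones / all-zeros rows; this is our simplified reading of the abstract, and the fitting (EM) code is omitted.

```python
import numpy as np

def cell_probs(weights, thetas):
    # weights: (K,) mixing weights for one data item (sums to 1).
    # thetas:  (K, D) per-aspect Bernoulli parameters; an all-ones row
    #          models false presences, an all-zeros row false absences.
    return weights @ thetas              # (D,) vector of P(x_d = 1)

def log_likelihood(x, weights, thetas):
    # Log-likelihood of one binary vector x under the mixture.
    p = np.clip(cell_probs(weights, thetas), 1e-12, 1 - 1e-12)
    return float(np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)))
```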

Ella Bingham received her M.Sc. degree in Engineering Physics and Mathematics from Helsinki University of Technology in 1998, and her Dr.Sc. degree in Computer Science from Helsinki University of Technology in 2003. She is currently at the Helsinki Institute for Information Technology, located at the University of Helsinki. Her research interests include statistical data analysis and machine learning. Ata Kabán has been a lecturer in the School of Computer Science of the University of Birmingham since 2003. She holds a B.Sc. degree in computer science (1999) from the University “Babes-Bolyai” of Cluj-Napoca, Romania, and a Ph.D. in computer science (2001) from the University of Paisley, UK. Her current research interests concern statistical machine learning and data mining. Prior to her career in computer science, she obtained a B.A. degree in musical composition (1994) and the M.A. (1995) and Ph.D. (1999) degrees in musicology from the Music Academy “Gh. Dima” of Cluj-Napoca, Romania. Mikael Fortelius is a palaeontologist with a special interest in the plant-eating mammals of the Cenozoic, especially ungulates and their relationship with habitat and climate change (the Ungulate Condition). Mikael is Professor of Evolutionary Palaeontology in the Department of Geology and Group Leader in the Institute of Biotechnology (BI), University of Helsinki. Since 1992, he has been engaged in developing a database of Neogene Old World mammals. The NOW database is maintained at the Finnish Museum of Natural History and developed in collaboration with an extensive Advisory Board; data access and downloading are entirely public.

18.
The Z-coordinates of the vertices are completely governed by the Z-coordinates assigned to four selected ones. This allows describing the spatial polygonal mesh with just its 2D projection plus the heights of four vertices. As a consequence, these projections essentially capture the “spatial meaning” of the given surface, in the sense that, whatever spatial interpretations are drawn from them, they all exhibit essentially the same shape. Published online: 5 February 2003

19.
This paper presents the current state of the A2iA CheckReader™, a commercial bank check recognition system. The system is designed to process the flow of payment documents associated with the check clearing process: checks themselves, deposit slips, money orders, cash tickets, etc. It processes document images and recognizes document amounts whatever their style and type (cursive, hand-printed or machine-printed), whether expressed as numerals or as phrases. The system is adapted to read payment documents issued in different English- or French-speaking countries. It is currently in use at more than 100 large sites in five countries and processes over 10 million documents daily. The average read rate at the document level varies from 65 to 85%, with a misread rate corresponding to that of a human operator (1%). Received: October 13, 2000 / Revised: December 4, 2000

20.
Information security is perceived as an important and vital aspect of the survival of any business. Preserving user identity and limiting access to web resources to humans while restricting “bots” is an ever-challenging area of study. With the increase in computing power and the development of newer approaches to circumvention and reverse engineering, the recognition gap between machines and humans is narrowing. The Turing test and its modified versions are used to deal with such problems, by developing complex algorithms for bot-prevention systems like CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). This paper deals with the use of machine vision for judging the ability of machines to compete with humans in breaking security systems like CAPTCHA; the reverse Turing test is put into practice here. Complex image recognition technologies and novel approaches to using human interactive proofs (HIP) are discussed. The progress of the Turing test over the past 60 years is also reviewed. Our experiments suggest that current machine vision remains quite poor, and far weaker than expected.
