Similar literature
 20 similar documents found (search time: 15 ms)
1.
2.
The relationship between written and spoken words is convoluted in languages with a deep orthography such as English, and it is therefore difficult to devise explicit rules for generating the pronunciations of unseen words. Pronunciation by analogy (PbA) is a data-driven method of constructing pronunciations for novel words from concatenated segments of known words and their pronunciations. PbA performs relatively well with English and outperforms several other proposed methods. However, the method inherently generates several candidate pronunciations, and its performance depends critically on a good scoring function to choose the best among them. Previous PbA algorithms have used several different scoring heuristics to evaluate the candidate pronunciations, such as the product of the frequencies of the component pronunciations of the segments, the number of different segmentations that yield the same pronunciation, and combinations of these methods. In this article, we instead propose to use a probabilistically justified scoring rule. We show that this principled approach alone yields better accuracy than any previously published PbA algorithm. Furthermore, combined with certain ad hoc modifications motivated by earlier algorithms, the performance can in some cases be further increased.
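As a rough illustration of the two heuristic scores named in this abstract, the sketch below scores candidate pronunciations by the product of segment frequencies and by the number of segmentations that yield the same pronunciation. The input format, the function names, the example data, and the way the two scores are combined are all assumptions made for illustration; this is not the probabilistic scoring rule the article proposes.

```python
from collections import defaultdict
from math import prod


def score_candidates(paths):
    """paths: list of (pronunciation, [segment frequencies]) pairs,
    one pair per segmentation of the unseen word (hypothetical format)."""
    best_product = defaultdict(int)  # best frequency product per pronunciation
    path_count = defaultdict(int)    # number of segmentations per pronunciation
    for pron, freqs in paths:
        best_product[pron] = max(best_product[pron], prod(freqs))
        path_count[pron] += 1
    return dict(best_product), dict(path_count)


def pick_best(paths):
    """Rank by segmentation count first and frequency product second
    (an arbitrary combination, chosen only for this sketch)."""
    best_product, path_count = score_candidates(paths)
    return max(path_count, key=lambda p: (path_count[p], best_product[p]))


# Made-up candidates for a single unseen word.
paths = [
    ("k ae t", [3, 5]),
    ("k ae t", [2, 2]),
    ("s ae t", [4, 6]),
]
```

With this made-up data the two heuristics disagree: the frequency product favors "s ae t", while the segmentation count favors "k ae t". That disagreement is the kind of ambiguity a principled scoring rule is meant to resolve.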

3.
We propose an approach for information extraction for multi-page printed document understanding. The approach is designed for scenarios in which the set of possible document classes, i.e., documents sharing similar content and layout, is large and may evolve over time. Describing a new class is a very simple task: the operator merely provides a few samples and then, by means of a GUI, clicks on the OCR-generated blocks of a document containing the information to be extracted. Our approach is based on probability: we derive a general form for the probability that a sequence of blocks contains the searched information. We estimate the parameters for a new class by applying the maximum likelihood method to the samples of the class. All these parameters depend only on block properties that can be extracted automatically from the operator actions on the GUI. Processing a document of a given class consists of finding the sequence of blocks that maximizes the corresponding probability for that class. We experimentally evaluated our proposal using 807 multi-page printed documents from different domains (invoices, patents, data-sheets), obtaining very good results, e.g., a success rate often greater than 90% even for classes with just two samples.

4.
Comparative trajectory sensitivity is investigated with respect to small probabilistic perturbations in initial state and plant parameters when a neighboring feedback control rather than the nominally equivalent open-loop control is used. The uncertainty in initial state and plant parameters is modeled by jointly normally distributed random variables with known mean and known covariance matrix. The nominal control is assumed to yield a satisfactory nominal trajectory and it is desired to preserve this shape in non-nominal situations. The likelihood of the state provides a measure of insensitivity at time t. Using the system equations linearized around the nominal trajectory, the joint distributions of the incremental state are computed in both open-loop and closed-loop cases. It is established that the augmented nominal state when a neighboring optimal feedback control is used is at least as likely as that when the nominally equivalent open-loop control is used. The domain where the augmented incremental state for the optimal closed-loop control is more densely distributed than that corresponding to open-loop control is shown to be a hyper-hyperboloid. This domain is unbounded in certain directions, thus pointing out a potential disadvantage in using optimal feedback control.

5.
6.
We present a novel decentralized probabilistic approach to visual tracking of articulated objects. Analyzing articulated motion is challenging because (1) the high number of degrees of freedom potentially demands tremendous computation, and (2) the solution is confronted by the numerous local optima that exist in a high dimensional parametric space. To ease these problems, we propose a decentralized approach that analyzes limbs locally and reinforces the spatial coherence among them at the same time. The computational model of the proposed approach is based on a dynamic Markov network, a generative model which characterizes the dynamics, the image observations of each individual limb, as well as the spatial coherence among them. Probabilistic mean field variational analysis provides an efficient computational diagram to obtain the approximate inference of the motion posteriors. We thus design the mean field Monte Carlo (MFMC) algorithm, where a set of low dimensional particle filters interact with one another and solve the high dimensional problem collaboratively. We also present a variational maximum a posteriori (MAP) algorithm, which has a rigorous theoretical foundation, to approach the optimal MAP estimate of the articulated motion. Both algorithms achieve linear complexity w.r.t. the number of articulated subparts and have the potential for parallel computing. Experiments on human body tracking demonstrate the significance, effectiveness and efficiency of the proposed methods.

7.
This paper presents a novel revision of the framework of Hybrid Probabilistic Logic Programming, along with a complete semantics characterization, to enable the encoding of and reasoning about real-world applications. The language of the Hybrid Probabilistic Logic Programs framework is extended to allow the use of non-monotonic negation, and two alternative semantic characterizations are defined: stable probabilistic model semantics and probabilistic well-founded semantics. These semantics generalize the stable model semantics and well-founded semantics of traditional normal logic programs, and they reduce to the semantics of Hybrid Probabilistic Logic Programs for programs without negation. This is the first time that two different semantics for Hybrid Probabilistic Programs with non-monotonic negation, as well as their relationships, have been described. This proposal provides the foundational grounds for developing computational methods for implementing the proposed semantics. Furthermore, it makes it clearer how to characterize non-monotonic negation in probabilistic logic programming frameworks for commonsense reasoning. An erratum to this article can be found at

8.
Recently, the combination of probability theory and fuzzy logic has received much interest. To endow traditional fuzzy rule-based systems (FRBs) with probabilistic features to handle randomness, this paper presents a probabilistic fuzzy neural network (ProFNN) by introducing the probability of input linguistic terms and providing linguistic meaning into the connectionist architecture. ProFNN integrates the probabilistic information of fuzzy rules into the antecedent parts and quantifies the impacts of the rules on the consequent parts using mutual subsethood, which works in conjunction with volume defuzzification in a gradient descent learning framework. Despite the increase in the number of parameters, ProFNN provides a promising solution to deal with randomness and fuzziness in a single framework. To evaluate the performance and applicability of the proposed approach, ProFNN is applied to various benchmark problems and compared with other existing models; it performs better than most of them.

9.
A unified approach to ranking in probabilistic databases   Total citations: 1, self-citations: 0, citations by others: 1
Ranking is a fundamental operation in data analysis and decision support and plays an even more crucial role if the dataset being explored exhibits uncertainty. This has led to much work in understanding how to rank the tuples in a probabilistic dataset in recent years. In this article, we present a unified approach to ranking and top-k query processing in probabilistic databases by viewing it as a multi-criterion optimization problem and by deriving a set of features that capture the key properties of a probabilistic dataset that dictate the ranked result. We contend that a single, specific ranking function may not suffice for probabilistic databases, and we instead propose two parameterized ranking functions, called PRF ω and PRF e, that generalize or can approximate many of the previously proposed ranking functions. We present novel generating functions-based algorithms for efficiently ranking large datasets according to these ranking functions, even if the datasets exhibit complex correlations modeled using probabilistic and/xor trees or Markov networks. We further propose that the parameters of the ranking function be learned from user preferences, and we develop an approach to learn those parameters. Finally, we present a comprehensive experimental study that illustrates the effectiveness of our parameterized ranking functions, especially PRF e, at approximating other ranking functions and the scalability of our proposed algorithms for exact or approximate ranking.
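To make the generating-function idea in this abstract more concrete, here is a minimal sketch for the simplest setting only: tuples that exist independently of one another and are already sorted by descending score. Under that assumption, the probability that tuple i exists at rank j is p_i times the coefficient of x^(j-1) in the product of (1 - p_l + p_l x) over the higher-scored tuples l. The function names and the exponential weight alpha**j are assumptions made for illustration; the abstract does not define the ranking functions, and correlated data (and/xor trees, Markov networks) is not handled here.

```python
def rank_distributions(probs):
    """probs[i] is the existence probability of the i-th tuple; tuples are
    assumed independent and sorted by descending score.
    Returns dist with dist[i][j] = Pr(tuple i exists and has rank j + 1)."""
    dist = []
    poly = [1.0]  # coefficients of prod over l < i of (1 - p_l + p_l * x)
    for p in probs:
        dist.append([p * c for c in poly])
        # Multiply the running polynomial by (1 - p + p * x).
        nxt = [0.0] * (len(poly) + 1)
        for k, c in enumerate(poly):
            nxt[k] += c * (1 - p)
            nxt[k + 1] += c * p
        poly = nxt
    return dist


def exp_weighted_score(probs, alpha):
    """Score each tuple by sum over ranks j of alpha**j * Pr(rank = j),
    with ranks starting at 1 (an assumed weighting, for illustration)."""
    return [
        sum(alpha ** (j + 1) * q for j, q in enumerate(row))
        for row in rank_distributions(probs)
    ]
```

Maintaining the polynomial incrementally keeps this sketch at quadratic cost in the number of tuples, since each tuple extends the previous product by one factor.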

10.
Process mining aims at deriving order relations between tasks recorded by event logs in order to construct their corresponding process models. The quality of the results is determined not only by the mining algorithm being used, but also by the quality of the provided event logs. As a criterion of log quality, completeness measures the magnitude of information for process mining covered by an event log. In this paper, we focus on the evaluation of the local completeness of an event log. In particular, we consider the direct succession (DS) relations between the tasks of a business process. Based on our previous work, an improved approach called CPL+ is proposed in this paper. Experiments show that the proposed CPL+ works better than other approaches on event logs that contain a small number of traces. Finally, by further investigating CPL+, we also found that the more distinct DSs are observed in an event log, the lower the local completeness of the log is.
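For readers unfamiliar with direct succession relations, the following minimal sketch extracts them from a toy event log. It only shows the quantity that local completeness is concerned with; it does not implement CPL+ or any completeness estimator, and the log contents and function name are made up for illustration.

```python
def direct_successions(log):
    """Return the set of (a, b) pairs such that task b immediately follows
    task a in at least one trace of the log."""
    return {(a, b) for trace in log for a, b in zip(trace, trace[1:])}


# A toy event log: each trace is the ordered list of tasks in one case.
log = [
    ["A", "B", "C"],
    ["A", "C", "B"],
]
observed = direct_successions(log)
```

Here `observed` contains the four pairs (A, B), (B, C), (A, C) and (C, B); any direct succession that the underlying process allows but that does not appear in the log is what makes the log locally incomplete.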

11.
We present a new approach to address the problem of large sequence mining from big data. The particular problem of interest is the effective mining of long sequences from large-scale location data to be practical for Reality Mining applications, which suffer from large amounts of noise and lack of ground truth. To address this complex data, we propose an unsupervised probabilistic topic model called the distant n-gram topic model (DNTM). The DNTM is based on latent Dirichlet allocation (LDA), which is extended to integrate sequential information. We define the generative process for the model, derive the inference procedure, and evaluate our model on both synthetic data and real mobile phone data. We consider two different mobile phone datasets containing natural human mobility patterns obtained by location sensing, the first considering GPS/Wi-Fi locations and the second considering cell tower connections. The DNTM discovers meaningful topics on the synthetic data as well as the two mobile phone datasets. Finally, the DNTM is compared to LDA by considering log-likelihood performance on unseen data, showing the predictive power of the model. The results show that the DNTM consistently outperforms LDA as the sequence length increases.

12.
Summary. We set out a modal logic for reasoning about multilevel security of probabilistic systems. This logic contains expressions for time, probability, and knowledge. Making use of the Halpern-Tuttle framework for reasoning about knowledge and probability, we give a semantics for our logic and prove it is sound. We give two syntactic definitions of perfect multilevel security and show that their semantic interpretations are equivalent to earlier, independently motivated characterizations. We also discuss the relation between these characterizations of security and their usefulness in security analysis.

13.
The focus of this paper is the pseudometric used as a key concept in our previous work on optimal supervisory control of probabilistic discrete event systems. The pseudometric is employed to measure the behavioural similarity between probabilistic systems, and initially was defined as a greatest fixed point of a monotone function. This paper further characterizes the pseudometric. First, it gives a logical characterization of the pseudometric so that the distance between two systems is measured by a formula that distinguishes between the systems the most. A trace characterization of the pseudometric is then derived from the logical characterization, characterizing the similarity between systems from a language perspective. Further, the solution of the problem of approximation of a given probabilistic generator with another generator of a prespecified structure is suggested such that the new model is as close as possible to the original one in the pseudometric. The significance of the approximation is then discussed, especially with respect to previous work on optimal supervisory control of probabilistic discrete event systems.

14.
A Bayesian game is a game of incomplete information in which the rules of the game are not fully known to all players. We consider the Bayesian game of Battle of the Sexes, which has several Bayesian Nash equilibria, and investigate its outcome when the underlying probability set is obtained from generalized Einstein–Podolsky–Rosen experiments. We find that this probability set, which may become non-factorizable, results in a unique Bayesian Nash equilibrium of the game.

15.
Feature transformation (FT) for dimensionality reduction has been studied extensively in the past decades. While unsupervised FT algorithms cannot effectively utilize the discriminant information between classes in classification tasks, existing supervised FT algorithms have not yet caught up with the advances in classifier design. In this paper, based on the idea of making the probability of correct classification of a future test point as large as possible in the transformed feature space, a new supervised FT method called minimax probabilistic feature transformation (MPFT) is proposed for multi-class datasets. The experimental results on the UCI benchmark datasets and the high dimensional cancer gene expression datasets demonstrate that the proposed feature transformation method is superior to or competitive with several classical FT methods.

16.
Many real-world knowledge-based systems must deal with information coming from different sources that invariably leads to incompleteness, overspecification, or inherently uncertain content. The presence of these varying levels of uncertainty doesn’t mean that the information is worthless; rather, these are hurdles that the knowledge engineer must learn to work with. In this paper, we continue work on an argumentation-based framework that extends the well-known Defeasible Logic Programming (DeLP) language with probabilistic uncertainty, giving rise to the Defeasible Logic Programming with Presumptions and Probabilistic Environments (DeLP3E) model. Our prior work focused on the problem of belief revision in DeLP3E, where we proposed a non-prioritized class of revision operators called AFO (Annotation Function-based Operators) to solve this problem. In this paper, we further study this class and argue that in some cases it may be desirable to define revision operators that take quantitative aspects into account, such as how the probabilities of certain literals or formulas of interest change after the revision takes place. To the best of our knowledge, this problem has not been addressed in the argumentation literature to date. We propose the QAFO (Quantitative Annotation Function-based Operators) class of operators, a subclass of AFO, and then go on to study the complexity of several problems related to their specification and application in revising knowledge bases. Finally, we present an algorithm for computing the probability that a literal is warranted in a DeLP3E knowledge base, and discuss how it could be applied towards implementing QAFO-style operators that compute approximations rather than exact operations.

17.
Transduction is an inference mechanism adopted by several classification algorithms capable of exploiting both labeled and unlabeled data and making predictions for the given set of unlabeled data only. Several transductive learning methods have been proposed in the literature to learn transductive classifiers from examples represented as rows of a classical double-entry table (or relational table). In this work we consider the case of examples represented as a set of multiple tables of a relational database and we propose a new relational classification algorithm, named TRANSC, that works in a transductive setting and employs a probabilistic approach to classification. Knowledge of the data model, i.e., foreign keys, is used to guide the search process. The transductive learning strategy iterates on a k-NN based re-classification of labeled and unlabeled examples, in order to identify borderline examples, and uses the relational probabilistic classifier Mr-SBC to bootstrap the transductive algorithm. Experimental results confirm that TRANSC outperforms its inductive counterpart (Mr-SBC).

18.
We present a novel multivariate classification technique based on Genetic Programming. The technique is distinct from Genetic Algorithms and offers several advantages compared to Neural Networks and Support Vector Machines. The technique optimizes a set of human-readable classifiers with respect to some user-defined performance measure. We calculate the Vapnik-Chervonenkis dimension of this class of learning machines and consider a practical example: the search for the Standard Model Higgs Boson at the LHC. The resulting classifier is very fast to evaluate, human-readable, and easily portable. The software may be downloaded at: http://cern.ch/~cranmer/PhysicsGP.html.

19.
In this paper, we introduce a Bayesian approach, inspired by probabilistic principal component analysis (PPCA) (Tipping and Bishop in J Royal Stat Soc Ser B 61(3):611–622, 1999), to detect objects in complex scenes using appearance-based models. The originality of the proposed framework is to explicitly take into account general forms of the underlying distributions, both for the in-eigenspace distribution and for the observation model. The approach combines linear data reduction techniques (to preserve computational efficiency), non-linear constraints on the in-eigenspace distribution (to model complex variabilities) and non-linear (robust) observation models (to cope with clutter, outliers and occlusions). The resulting statistical representation generalises most existing PCA-based models (Tipping and Bishop in J Royal Stat Soc Ser B 61(3):611–622, 1999; Black and Jepson in Int J Comput Vis 26(1):63–84, 1998; Moghaddam and Pentland in IEEE Trans Pattern Anal Machine Intell 19(7):696–710, 1997) and leads to the definition of a new family of non-linear probabilistic detectors. The performance of the approach is assessed using receiver operating characteristic (ROC) analysis on several representative databases, showing a major improvement in detection performance with respect to the standard methods that have been the references up to now. This revised version was published online in November 2004 with corrections to the section numbers.

20.
Discrete event simulation is viewed as solving a fixed point problem whose unknowns are infinite histories or streams of event and time information. Stream domains provide two notions of convergence, which correspond to the usual categorization of simulation methods. Metric convergence leads to optimistic parallel simulation (the classic event list mechanism turns out to be a specialization), and convergence in the sense of partial orders leads to conservative parallel simulation.
