Similar literature
 20 similar documents found (search time: 62 ms)
1.
We investigate the folding energy landscape (EL) for a given RNA sequence through Boltzmann ensemble (BE) sampling of RNA secondary structures. The ensemble of sampled structures is used to derive distributions of energies and base‐pair distances between two configurations. We identify structural features that can be utilized for RNA gene finding. Characterization of the EL through BE sampling of secondary structures is computationally demanding and has multiple heterogeneous stages. We develop the Distributed Adaptive Runtime Environment to effectively address the computational requirements. The Distributed Adaptive Runtime Environment is built upon an extensible and interoperable pilot‐job and supports the concurrent execution of a broad range of task sizes across a range of infrastructure. It is used to investigate two RNA systems of different sizes: S‐adenosyl methionine (SAM) binding RNA sequences known as SAM‐I riboswitches, and the S gene of the bovine coronavirus (BCoV) RNA genome. We demonstrate how the implementation lowers the total time to solution for increases in RNA length, the number of sequences investigated, and the number of sampled structures. The distributions of energies and base‐pair distances reveal variations in folding dynamics and pathways among the SAM riboswitch sequences. Our results for BCoV RNA genome sequences also indicate sensitivity of folding to coding‐neutral variations in sequence. We search for a characteristic motif from within the SAM‐I consensus structure, a four‐way junction, among BE sampled structures for all 2910 SAM‐I sequences identified from Rfam (the curated ncRNA family database). We find that BE sampling provides insight into the variations in conformational distribution among sequences of the same ncRNA family. Therefore, BE sampling of secondary structures is a viable pre‐processing or post‐processing tool to complement comparative sequence analysis.
The understanding gained shows how appropriately designed cyberinfrastructure can provide new insight into RNA folding and structure formation. Copyright © 2011 John Wiley & Sons, Ltd.

2.
Possibilistic distributions admit both measures of uncertainty and (metric) distances defining their information closeness. For general pairs of distributions these measures and metrics were first introduced in the form of integral expressions. Particularly important are pairs of distributions p and q which have consonant ordering: for any two events x and y in the domain of discourse, p(x) ⪋ p(y) if and only if q(x) ⪋ q(y). We call such distributions confluent and study their information distances.

This paper presents a discrete sum form of uncertainty measures of arbitrary distributions, and uses it to obtain similar representations of metrics on the space of confluent distributions. Using these representations, a number of properties such as additivity, monotonicity and a form of distributivity are proven. Finally, a branching property is introduced, which will serve (in a separate paper) to characterize possibilistic information distances axiomatically.


3.
We give the analytical definitions of the Chernoff, Bhattacharyya and Jeffreys–Matusita probabilistic distances between two Dirichlet distributions, and between two Beta distributions as a special case. For all other known probabilistic distances we show their inappropriateness in the analytical case. We discuss the parameter learning of the Dirichlet distribution from a finite sample set and present an application for split-and-merge image segmentation.
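
As an illustration of what an analytical distance of this kind looks like, the following is a minimal sketch of the Bhattacharyya distance between two Dirichlet distributions, using the standard closed form BC = B((α+β)/2) / sqrt(B(α)·B(β)), where B is the multivariate beta function. The function names are hypothetical and the code is not taken from the paper; the Beta case is obtained by passing two-parameter vectors.

```python
from math import lgamma, exp, sqrt


def log_multivariate_beta(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def bhattacharyya_dirichlet(alpha, beta):
    """Bhattacharyya distance -ln(BC) between Dirichlet(alpha) and Dirichlet(beta).

    Assumes both parameter vectors have the same length and positive entries.
    """
    mid = [(a + b) / 2.0 for a, b in zip(alpha, beta)]
    log_bc = log_multivariate_beta(mid) - 0.5 * (
        log_multivariate_beta(alpha) + log_multivariate_beta(beta)
    )
    return -log_bc


def bhattacharyya_beta(a1, b1, a2, b2):
    """Special case: Beta(a1, b1) versus Beta(a2, b2)."""
    return bhattacharyya_dirichlet([a1, b1], [a2, b2])


if __name__ == "__main__":
    # Beta(1, 1) is uniform on [0, 1]; Beta(2, 1) has density 2x.
    print(bhattacharyya_beta(1, 1, 2, 1))
```

Working in log space with `lgamma` keeps the computation stable for large parameters, where the beta function itself would underflow.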

4.
Some articulated motion representations rely on frame-wise abstractions of the statistical distribution of low-level features such as orientation, color, or relational distributions. As configuration among parts changes with articulated motion, the distribution changes, tracing a trajectory in the latent space of distributions, which we call the configuration space. These trajectories can then be used for recognition using standard techniques such as dynamic time warping. The core theory in this paper concerns embedding the frame-wise distributions, which can be looked upon as probability functions, into a low-dimensional space so that we can estimate various meaningful probabilistic distances such as the Chernoff, Bhattacharyya, Matusita, Kullback-Leibler (KL) or symmetric-KL distances based on dot products between points in this space. Apart from computational advantages, this representation also affords speed-normalized matching of motion signatures. Speed-normalized representations can be formed by interpolating the configuration trajectories along their arc lengths, without using any knowledge of the temporal scale variations between the sequences. We experiment with five different probabilistic distance measures and show the usefulness of the representation in three different contexts: sign recognition (with a large number of possible classes), gesture recognition (with person variations), and classification of human-human interaction sequences (with segmentation problems). We find that choosing the right distance measure for each situation is important. The low-dimensional embedding makes matching two to three times faster, while achieving recognition accuracies that are close to those obtained without using a low-dimensional embedding. We also empirically establish the robustness of the representation with respect to low-level parameters, embedding parameters, and temporal-scale parameters.

5.
We propose a new viewpoint-based simplification method for polygonal meshes, driven by several f-divergences such as Kullback-Leibler, Hellinger and Chi-Square. These distances are a measure of discrimination between probability distributions. The Kullback-Leibler distance between the projected and the actual area distributions of the polygons in the scene has already been used as a measure of viewpoint quality. In this paper, we use the variation in those viewpoint distances to determine the error introduced by an edge collapse. We apply the best half-edge collapse as a decimation criterion. The approximations produced by our method are close to the original model in terms of both visual and geometric criteria. Unlike many pure visibility-driven methods, our new approach does not completely remove hidden interiors in order to increase the visual quality of the simplified models. This makes our approach more suitable for applications which require exact geometry tolerance but also require high visual quality.
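
For readers unfamiliar with the three f-divergences named above, here is a minimal sketch of their textbook definitions for discrete distributions. It assumes `p` and `q` are equal-length sequences that each sum to 1 and that every `q[i]` is strictly positive; the exact normalizations used in the paper may differ, and the function names are illustrative only.

```python
from math import log, sqrt


def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i * ln(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hellinger_distance(p, q):
    """Hellinger distance sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), bounded in [0, 1]."""
    return sqrt(0.5 * sum((sqrt(pi) - sqrt(qi)) ** 2 for pi, qi in zip(p, q)))


def chi_square_divergence(p, q):
    """Pearson chi-square divergence sum_i (p_i - q_i)^2 / q_i."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


if __name__ == "__main__":
    projected = [0.5, 0.5]   # hypothetical projected-area distribution
    actual = [0.25, 0.75]    # hypothetical actual-area distribution
    print(kl_divergence(projected, actual))
    print(hellinger_distance(projected, actual))
    print(chi_square_divergence(projected, actual))
```

All three vanish when the two distributions coincide, and only the Hellinger distance is symmetric in its arguments.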

6.
Thermal mixing in rivers is a common geophysical phenomenon that controls myriad processes, from aquatic ecological functions to stream and groundwater biogeochemistry. We present high-resolution remotely-sensed temperature distributions of thermal plumes discharging into rivers collected from Yellowstone National Park. Airborne (4 m pixel size) and ground-based (centimetre or better spatial resolution) images corroborate the presence of these mixing zones. They illustrate that thermal discharges in rivers may not be well-mixed with the bulk flow even after traversing distances corresponding to several stream widths. This allows for large thermal gradients (>30°C) to persist between the thermal discharge and the bulk flow. The plumes may have pronounced internal temperature gradients that vary in space and time. The images illustrate the potential of portable high-resolution sensors not only for acquiring observations needed for fundamental understanding of non-isothermal mixing processes but also for providing temperature distributions necessary for understanding many thermally-mediated processes.

7.
International Journal of Information Security - In the history of cryptography, many cryptographic protocols have relied on random coin tosses to prove their security. Although flipping coins is...

8.
This paper proposes a new methodology for computing Hausdorff distances between sets of points in a robust way. In a first step, robust nearest neighbor distance distributions between the two sets of points are obtained by considering reliability measures in the computations through a Monte Carlo scheme. In a second step, the computed distributions are operated using random variables algebra in order to obtain probability distributions of the average, minimum or maximum distances. In the last step, different statistics are computed from these distributions. A statistical test of significance, the nearest neighbor index, in addition to the newly proposed divergence and clustering indices are used to compare the computed measurements with respect to values obtained by chance. Results on synthetic and real data show that the proposed method is more robust than the standard Hausdorff distance. In addition, unlike previously proposed methods based on thresholding, it is appropriate for problems that can be modeled through point processes.
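
For context, the standard (non-robust) Hausdorff distance that the abstract uses as its baseline can be sketched as follows. This is the textbook definition only, not the Monte Carlo method proposed in the paper; the function names and the example point sets are assumptions chosen to show why a single outlier dominates the result.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def directed_hausdorff(a, b):
    """Largest nearest-neighbor distance from any point of a to the set b."""
    return max(min(dist(x, y) for y in b) for x in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


if __name__ == "__main__":
    clean = [(0.0, 0.0), (1.0, 0.0)]
    with_outlier = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]  # one spurious point
    print(hausdorff(clean, clean))         # identical sets
    print(hausdorff(clean, with_outlier))  # dominated by the outlier
```

Because the outer operation is a maximum, one stray point fixes the value of the distance, which is the sensitivity that robust variants aim to reduce.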

9.
These six programs evaluate relationships among a number (> 2) of protein (or DNA) sequences. Program 1 automatically computes optimum alignments and total distances for all pairwise sequence combinations over any user-desired range of the two gap penalties. Programs 2, 3 and 4 generate a square, symmetrical distance matrix, which can be exported for cluster analysis, or further analysed with programs 5 and 6 to give specific distances between sequences and extract sequence relationships. Data and results are exchanged among these programs, which are written in BASIC and compiled to run on Macintosh (68020/030/040) type machines with coprocessor and at least one MB of RAM. BASIC graphics commands or those for the Macintosh interface are avoided to facilitate use on other machines. Two groups of sequences are used to demonstrate (a) alignment, inter-sequence distance calculation and dendrogram generation and (b) specific distance calculation and its usage in detecting subgroups of related sequences in dendrograms.

10.
Bate, R.R. Software, IEEE, 1998, 15(4): 65-66
Software engineers are accustomed to having systems engineers furnish them with allocated requirements and changes to those requirements. All too frequently, however, the systems engineering department tosses these items over the wall, along with a tight deadline, without engaging the software engineer's timely participation.

11.
A BASIC program, KOLCLUS, is given for a nonparametric significance test for the distinctness of a pair of clusters in Euclidean space, using the Kolmogorov-Smirnov statistic. The input data may be given in one of two forms: (1) the coordinates of the points in the full space; or (2) the distances of the points to the cluster centroids together with the intercentroid distance. The test consists of comparing the cumulative distribution of the projections of the points onto the intercentroid axis with hypothesized distributions that include a number of cluster distributions of interest. The test has less power than the parametric equivalent, but is highly conservative.
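
The two ingredients described above, projecting points onto the intercentroid axis and comparing the empirical cumulative distribution with a hypothesized one, can be sketched as follows. This is not KOLCLUS itself (which is written in BASIC) and does not reproduce its hypothesized cluster distributions; the function names are hypothetical, and the normal CDF is only an assumed example of a hypothesized distribution.

```python
from math import erf, sqrt


def project_onto_axis(points, c1, c2):
    """Signed coordinate of each point along the axis from centroid c1 to centroid c2."""
    axis = [b - a for a, b in zip(c1, c2)]
    norm = sqrt(sum(v * v for v in axis))
    return [sum((x - a) * v for x, a, v in zip(p, c1, axis)) / norm for p in points]


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: max gap between empirical CDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs)
    )


def normal_cdf(mu, sigma):
    """Return the CDF of a normal distribution (an assumed hypothesized distribution)."""
    return lambda x: 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


if __name__ == "__main__":
    c1, c2 = (0.0, 0.0), (4.0, 0.0)  # hypothetical cluster centroids
    pts = [(-0.5, 0.3), (0.2, -0.1), (0.4, 0.8), (3.6, 0.2), (4.1, -0.4), (4.5, 0.1)]
    proj = project_onto_axis(pts, c1, c2)
    print(ks_statistic(proj, normal_cdf(2.0, 2.0)))
```

The statistic would then be compared against a critical value for the chosen significance level; that step is omitted here.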

12.
Pattern Recognition, 2006, 39(5): 812-826
A feature selection methodology based on a novel Bhattacharyya space is presented and illustrated with a texture segmentation problem. The Bhattacharyya space is constructed from the Bhattacharyya distances of different measurements extracted with sub-band filters from training samples. The marginal distributions of the Bhattacharyya space present a sequence of the most discriminant sub-bands that can be used as a path for a wrapper algorithm. When this feature selection is used with a multiresolution classification algorithm on a standard set of texture mosaics, it produces the lowest misclassification errors reported.

13.
In the current paper we present a method for assessing cluster stability. This method, combined with a clustering algorithm, yields an estimate of the data partition, namely, the number of clusters. We adopt the cluster stability standpoint where clusters are imagined as islands of “high” density in a sea of “low” density. Explicitly, a cluster is associated with its high density core. Our approach evaluates the goodness of a cluster by the similarity between the entire cluster and its core. We propose to measure this resemblance by two-sample tests or by probability distances between appropriate probability distributions. The distances are calculated on clustered samples drawn from the source population according to two different distributions. The first law is the underlying set distribution. The second law is constructed so that it represents the clusters’ cores. Here, a variant of the k-nearest neighbor density estimation is applied, so that items belonging to cores have a much higher chance of being selected. As the sample distribution is unknown, a distribution-free two-sample test is required to examine the mentioned correspondence. For constructing such a test, we use distance functions built on negative definite kernels. In practice, outliers in the samples and limitations of the clustering algorithm heavily contribute to the noise level. As a result of this shortcoming, the distance values have to be determined for many pairs of samples, and therefore an empirical distribution of distances is obtained. The distribution is dependent on the examined number of clusters. To prevent this dependence from biasing the results, we normalize the distances. It is conjectured that the true number of clusters yields the most concentrated normalized distribution. To measure the concentration we use the sample mean and the sample 25th percentile. The paper exhibits the good performance of the proposed method on synthetic and real-world data.

14.
Identifier attributes (very high-dimensional categorical attributes such as particular product ids or people's names) are rarely incorporated in statistical modeling. However, they can play an important role in relational modeling: it may be informative to have communicated with a particular set of people or to have purchased a particular set of products. A key limitation of existing relational modeling techniques is how they aggregate bags (multisets) of values from related entities. The aggregations used by existing methods are simple summaries of the distributions of features of related entities: e.g., MEAN, MODE, SUM, or COUNT. This paper's main contribution is the introduction of aggregation operators that capture more information about the value distributions, by storing meta-data about value distributions and referencing this meta-data when aggregating, for example by computing class-conditional distributional distances. Such aggregations are particularly important for aggregating values from high-dimensional categorical attributes, for which the simple aggregates provide little information. In the first half of the paper we provide general guidelines for designing aggregation operators, introduce the new aggregators in the context of the relational learning system ACORA (Automated Construction of Relational Attributes), and provide theoretical justification. We also conjecture special properties of identifier attributes, e.g., they proxy for unobserved attributes and for information deeper in the relationship network. In the second half of the paper we provide extensive empirical evidence that the distribution-based aggregators do indeed facilitate modeling with high-dimensional categorical attributes, and in support of the aforementioned conjectures. Editors: Hendrik Blockeel, David Jensen and Stefan Kramer. An erratum to this article is available at .

15.
Abstract

A Bayesian approach has been applied to estimate the distribution of magnitudes, interevent distances and interevent times of earthquakes that occurred in 2017 in central Italy, using a small number of random samples drawn from the distribution of the same seismic parameters for the earthquakes that occurred in 2014-2016. We applied the method to the whole and aftershock-depleted seismicity, using the exponential and the normal model to fit the distributions of the seismic parameters. Our findings indicate that the exponential model fits the distributions of the seismic parameters much better than the normal model. Furthermore, in the whole seismicity case, the method requires at least 2100 to 2300 random samples to estimate the distributions of the seismic parameters of earthquakes that occurred in 2017 with an estimation error less than 0.01, while in the aftershock-depleted case, a minimum of between 360 and 1470 random samples from earthquakes that occurred in 2014-2017 is required to reach the same estimation error of less than 0.01.

16.
The Earth Mover's Distance as a Metric for Image Retrieval   Total citations: 33, self-citations: 1, citations by others: 32
We investigate the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval. The EMD is based on the minimal cost that must be paid to transform one distribution into the other, in a precise sense, and was first proposed for certain vision problems by Peleg, Werman, and Rom. For image retrieval, we combine this idea with a representation scheme for distributions that is based on vector quantization. This combination leads to an image comparison framework that often accounts for perceptual similarity better than other previously proposed methods. The EMD is based on a solution to the transportation problem from linear optimization, for which efficient algorithms are available, and also allows naturally for partial matching. It is more robust than histogram matching techniques, in that it can operate on variable-length representations of the distributions that avoid quantization and other binning problems typical of histograms. When used to compare distributions with the same overall mass, the EMD is a true metric. In this paper we focus on applications to color and texture, and we compare the retrieval performance of the EMD with that of other distances.
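
The "minimal cost to transform one distribution into the other" has a particularly simple form in one special case, sketched below: two histograms over the same ordered bins, with equal total mass and ground distance |i - j| between bins. In that setting the EMD equals the sum of absolute differences of the running totals. This is a simplification assumed for illustration, not the general transportation-problem solver or the variable-length signatures described in the abstract; `emd_1d` is a hypothetical name.

```python
def emd_1d(p, q):
    """EMD between two equal-mass histograms on the same unit-spaced 1D bins.

    The mass that must be carried across the boundary after bin i is the
    running difference of the two histograms; each unit carried costs 1.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("this sketch assumes equal total mass")
    total_cost = 0.0
    carried = 0.0
    for pi, qi in zip(p, q):
        carried += pi - qi
        total_cost += abs(carried)
    return total_cost


if __name__ == "__main__":
    print(emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # all mass moves two bins
    print(emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))  # shift by one bin
```

Unlike a bin-by-bin histogram difference, the result grows with how far the mass has to move, which is the property that makes the EMD attractive for perceptual similarity.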

17.
A notion of relative strength and weakness is defined for the comparison of measures of distance between probability distributions and is shown to be appropriate for use in the selection of distances as quasioptimal design criteria in problems such as signal selection and detector design when the preferred criteria, probability of error and asymptotic relative efficiency, are intractable. Csiszar's work on the topological properties of f-divergences, a class of distances also investigated by Ali and Silvey, enables one to identify strong distances and provides a justification for the experimental findings of previous investigators.

18.
Existing clustering-based methods for segmentation and fiber tracking of diffusion tensor magnetic resonance images (DT-MRI) are based on a formulation of a similarity measure between diffusion tensors, or measures that combine translational and diffusion tensor distances in some ad hoc way. In this paper we propose to use the Fisher information-based geodesic distance on the space of multivariate normal distributions as an intrinsic distance metric. An efficient and numerically robust shooting method is developed for computing the minimum geodesic distance between two normal distributions, together with an efficient graph-clustering algorithm for segmentation. Extensive experimental results involving both synthetic data and real DT-MRI images demonstrate that in many cases our method leads to more accurate and intuitively plausible segmentation results vis-à-vis existing methods.

19.
A systematic analysis of identities and inequalities satisfied by possibilistic information distances is conducted. The analysis is based on their representations as discrete sums and on certain inequalities for rearrangements of sequences. These identities and inequalities express several properties that are usually deemed characteristic of information distances and measures. In the companion paper those properties are used to obtain several axiomatic characterizations of possibility distances. The basic distance is g(p,q), defined for the distributions p = (p_1, ..., p_n) and q = (q_1, ..., q_n) such that p_i ≤ q_i for i = 1, ..., n. It serves to define a metric G(p,q) = g(p, p ∨ q) + g(q, p ∨ q) and a distance H(p,q) = g(p ∧ q, p) + g(p ∧ q, q). All these distances are, in turn, based on the U-uncertainty information function.

20.