19 similar documents found (search time: 176 ms)
1.
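To answer top-k queries over continuous uncertain XML efficiently, the CProTJFast algorithm is proposed. Based on the p-document model, the algorithm extends PEDewey (probabilistic extended Dewey) encoding to cover nodes with continuous distributions, filters nodes using a lower bound on path probability, and defines a filtering strategy for continuous probability density functions, so that nodes that cannot contribute to the result are discarded before any continuous node probability is computed. Experimental results show that CProTJFast, with its continuous-node filtering strategy, effectively improves the efficiency of top-k queries over continuous uncertain XML.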
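As an illustration of the filtering idea only (not the CProTJFast implementation), the sketch below assumes that each continuous node carries a normal distribution and that the query asks for values in a range. Because the range probability is at most 1, a candidate whose path probability is already below the current lower bound can be dropped before its distribution is evaluated. The function names and the tuple layout `(name, path_prob, mu, sigma)` are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of Normal(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def range_probability(lo, hi, mu, sigma):
    """Probability that a Normal(mu, sigma) value falls in [lo, hi]."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)


def filter_candidates(candidates, lo, hi, lower_bound):
    """Keep candidates whose total probability reaches lower_bound.

    candidates: iterable of (name, path_prob, mu, sigma) tuples (assumed layout).
    Returns (kept, evaluated), where evaluated counts how many
    distributions actually had to be integrated.
    """
    kept = []
    evaluated = 0
    for name, path_prob, mu, sigma in candidates:
        # The range probability is <= 1, so path_prob bounds the total
        # from above; prune without touching the distribution.
        if path_prob < lower_bound:
            continue
        evaluated += 1
        total = path_prob * range_probability(lo, hi, mu, sigma)
        if total >= lower_bound:
            kept.append((name, total))
    return kept, evaluated
```

In a top-k setting the lower bound would typically be the probability of the current k-th best answer, so it tightens as better answers are found.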
2.
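For probabilistic threshold queries over continuous uncertain XML data, the CPTI (Continuous Probabilistic Threshold Index) indexing technique is proposed, comprising a CPTI structure index and a CPTI value index. The CPTI structure index extends the structural index F-index to support continuous uncertain XML data; it is used to match twig patterns and to determine the path probabilities of the twig. The CPTI value index is a two-dimensional table that records the probability information of cont-type nodes; it is used to filter out elements irrelevant to the query, reducing the number of elements that must be processed. Experiments show that this indexing technique can greatly improve query processing performance.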
3.
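For probabilistic threshold range queries over continuous uncertain XML data, a new index tree, CUXI, is proposed. Borrowing the U-tree idea of recursively building an index tree over spatial data from the top down, the construction method forms two-dimensional data rectangles from leaf nodes that share the same parent in the continuous uncertain XML document and builds the CUXI index tree on the basis of clustering; the leaf nodes store auxiliary information about the continuous uncertain data. To improve query efficiency, a filtering strategy is defined for continuous uncertain data: subtrees that do not satisfy the query range are filtered out while traversing the index tree. Theoretical and experimental results show that this indexing technique can improve query processing performance.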
4.
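A new model for querying XML document information based on Bayesian networks is proposed. The model supports structured queries over XML document information. Drawing on the characteristics of XML information retrieval, the topology and conditional probabilities of the model are learned from statistics on terms, elements, and structural units in the XML data set; combined with probability functions, the model's probabilistic inference process is used to estimate the relevance between XML documents and structured query conditions. Finally, experiments on the INEX test collection demonstrate the effectiveness and reliability of the method.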
5.
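To address the high memory consumption and low matching efficiency of following-sibling queries in the XML document tree model, a following-sibling query algorithm based on XML data streams and a stack is proposed. Using a SAX parser and structural joins, it exactly matches all known nodes in the XML document with their following siblings and outputs the results. The results show that the algorithm is widely applicable, uses few system resources, and matches efficiently.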
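A minimal sketch of a streaming following-sibling match, assuming a query of the form "every `target_tag` element that follows a `context_tag` element under the same parent". It uses Python's standard `xml.sax` parser and one boolean per open element on a stack, so memory grows with document depth rather than document size. It is not the algorithm from the abstract; the class `FollowingSiblingHandler` and the helper `following_siblings` are hypothetical names.

```python
import xml.sax


class FollowingSiblingHandler(xml.sax.ContentHandler):
    """Collects paths of target_tag elements that follow a context_tag sibling."""

    def __init__(self, context_tag, target_tag):
        super().__init__()
        self.context_tag = context_tag
        self.target_tag = target_tag
        # One flag per open element (plus one for the document itself):
        # has a context_tag child of that element already been closed?
        self.seen_stack = [False]
        self.path = []
        self.matches = []

    def startElement(self, name, attrs):
        self.path.append(name)
        # seen_stack[-1] belongs to the parent of the element being opened.
        if name == self.target_tag and self.seen_stack[-1]:
            self.matches.append("/".join(self.path))
        self.seen_stack.append(False)

    def endElement(self, name):
        self.seen_stack.pop()
        self.path.pop()
        if name == self.context_tag:
            # Later siblings under the same parent now qualify.
            self.seen_stack[-1] = True


def following_siblings(xml_bytes, context_tag, target_tag):
    """Return tag paths of all matches, in document order."""
    handler = FollowingSiblingHandler(context_tag, target_tag)
    xml.sax.parseString(xml_bytes, handler)
    return handler.matches
```

For example, `following_siblings(b"<r><a/><b/><c><b/><a/><b/></c><b/></r>", "a", "b")` reports the `b` children of `r` and the second `b` inside `c`, but not the first `b` inside `c`, which precedes its `a` sibling.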
6.
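At present, none of the top-k query algorithms for uncertain XML data handles continuous uncertain data. This paper proposes the SPCProTJFast algorithm, which improves the traditional merge algorithm and combines it with a filtering method for continuous uncertain data to implement top-k queries over continuous uncertain XML. To keep an overly small probability lower bound from weakening the filtering effect, the HPCProTJFast algorithm is also proposed; it defers the processing of continuous nodes, accessing them only once a complete twig path satisfying the probability condition has been obtained. Experiments show that, in both execution time and filtering efficiency, the two algorithms are more efficient than the ProTJFast algorithm, which processes continuous uncertain data directly, and that HPCProTJFast is the more efficient of the two.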
7.
8.
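XML has become the de facto standard for data exchange and data integration on the Internet. As XML is widely adopted, the number of XML documents keeps growing, and querying XML data efficiently is becoming increasingly important. To address the problems of the stack-based query processing algorithms commonly used for branching queries, an improved pattern matching algorithm based on an XML structural index is proposed. By choosing a suitable label encoding scheme and using the XML structural index, it quickly determines the relationships between elements and keeps large numbers of unnecessary nodes from being pushed onto the stack, thereby improving query processing efficiency. Experimental results show that the improved pattern matching algorithm, Twig-Modify, outperforms TwigStack and TwigINLAB in query processing performance.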
9.
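A merge-free twig pattern query algorithm for uncertain XML  Total citations: 1, self-citations: 1, citations by others: 0
Existing twig pattern queries over uncertain XML must store and merge large numbers of intermediate results. To address this, a merge-free twig pattern query algorithm for uncertain XML, ProTwigList, is proposed. Before querying, the algorithm prunes using Tag+Level streams to reduce the number of nodes to be processed; it extends region encoding to encode the ordinary nodes that remain after pruning and marks distributional nodes according to a set of rules. During querying, it handles distributional nodes by means of common distributional-node paths, and finally computes the probability of each query result using the probability of the lowest common ancestor node. Theoretical analysis and experimental results demonstrate the query efficiency of ProTwigList.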
10.
11.
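Probabilistic XML documents are a standard for exchanging and representing probabilistic data on the Web, and querying and computing element values and their probabilities is an important research topic for them. The probabilistic XML document tree is an effective data model for probabilistic XML documents. This paper defines the basic path and extended path of a probabilistic XML document tree; proposes an algorithm that, following the possible-worlds principle, decomposes a probabilistic XML document tree into a set of ordinary XML subtrees; proposes an algorithm that, following the path-analysis principle, decomposes it into a set of probabilistic XML subtrees, together with algorithms for querying and computing the probabilities of nodes and node sets; and compares them experimentally. The experimental results show that both methods are effective and that, compared with the former, the latter is suited to larger probabilistic XML document trees and to querying the probabilities of nodes and node sets, and its computation is simpler.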
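The contrast between the two approaches can be sketched on a toy tree, under the simplifying assumption that every edge exists independently with a fixed probability given that its parent exists. This is an illustrative model, not the paper's data structure; `TREE`, `ROOT`, and both function names are hypothetical. The path-based computation multiplies edge probabilities along the root path, while the possible-worlds computation enumerates every combination of kept and dropped edges and sums the probabilities of the worlds in which the node survives.

```python
from itertools import product

# child -> (parent, probability that the child exists given its parent exists)
TREE = {
    "b": ("a", 0.8),
    "c": ("a", 0.5),
    "d": ("b", 0.5),
}
ROOT = "a"


def path_probability(node, tree=TREE, root=ROOT):
    """Multiply edge probabilities along the path from node up to the root."""
    p = 1.0
    while node != root:
        parent, edge_p = tree[node]
        p *= edge_p
        node = parent
    return p


def possible_worlds_probability(node, tree=TREE, root=ROOT):
    """Sum the probabilities of all edge choices in which node survives.

    Enumerates 2**len(tree) choices, so it is exponential in the tree size.
    """
    edges = list(tree)
    total = 0.0
    for choice in product([True, False], repeat=len(edges)):
        kept = dict(zip(edges, choice))
        world_p = 1.0
        for child in edges:
            edge_p = tree[child][1]
            world_p *= edge_p if kept[child] else 1.0 - edge_p
        # The node survives only if every edge on its root path is kept.
        cur, alive = node, True
        while cur != root:
            if not kept[cur]:
                alive = False
                break
            cur = tree[cur][0]
        if alive:
            total += world_p
    return total
```

Both functions agree on this toy tree (node `d` has probability 0.8 × 0.5 = 0.4), but the enumeration cost doubles with every added edge, which is consistent with the abstract's remark that the path-based method suits larger trees.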
12.
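The widespread storage and recording of massive uncertain data, and its extension to XML, have made data models for the probabilities of correlated events in XML a research hotspot. Aiming at a probabilistic data model that describes complex events, and building on existing probabilistic models, the concept of a multi-dimensional uncertain probabilistic model space is proposed. Modeling is unified across multiple probabilistic models, single-dimensional XML probability nodes are generalized to a multi-dimensional space, and a unified spatial query method is then defined, providing a novel theoretical approach to complex probabilistic data modeling and query optimization.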
13.
On the expressiveness of probabilistic XML models  Total citations: 1, self-citations: 0, citations by others: 1
Serge Abiteboul, Benny Kimelfeld, Yehoshua Sagiv, Pierre Senellart. The VLDB Journal: The International Journal on Very Large Data Bases, 2009, 18(5): 1041-1064
Various known models of probabilistic XML can be represented as instantiations of the abstract notion of p-documents. In addition to ordinary nodes, p-documents have distributional nodes that specify the possible worlds and their probabilistic distribution. Particular families of p-documents are determined by the types of distributional nodes that can be used as well as by the structural constraints on the placement of those nodes in a p-document. Some of the resulting families provide natural extensions and combinations of previously studied probabilistic XML models. The focus of the paper is on the expressive power of families of p-documents. In particular, two main issues are studied. The first is the ability to (efficiently) translate a given p-document of one family into another family. The second is closure under updates, namely, the ability to (efficiently) represent the result of updating the instances of a p-document of a given family as another p-document of that family. For both issues, we distinguish two variants corresponding to value-based and object-based semantics of p-documents.
14.
As probabilistic data management is becoming one of the main research focuses and keyword search is turning into a more popular means of querying, it is natural to ask how to support keyword queries on probabilistic XML data. With regard to keyword queries on deterministic XML documents, ELCA (Exclusive Lowest Common Ancestor) semantics allows more relevant fragments rooted at the ELCAs to appear as results and is more popular than other keyword query result semantics (such as SLCAs). In this paper, we investigate how to evaluate ELCA results for keyword queries on probabilistic XML documents. After defining probabilistic ELCA semantics in terms of possible world semantics, we propose an approach to compute ELCA probabilities without generating possible worlds. Then we develop an efficient stack-based algorithm that can find all probabilistic ELCA results and their ELCA probabilities for a given keyword query on a probabilistic XML document. Finally, we experimentally evaluate the proposed ELCA algorithm and compare it with its SLCA counterpart in terms of result probability, time and space efficiency, and scalability.
15.
This paper considers the compact representation of incomplete probabilistic knowledge of the kind encountered in risk evaluation problems, for instance in environmental studies. Various kinds of knowledge are considered, such as expert opinions about characteristics of distributions or poor statistical information. The approach is based on probability families encoded by possibility distributions and belief functions. In each case, a technique for representing the available imprecise probabilistic information faithfully is proposed, using different uncertainty frameworks such as possibility theory, probability theory, and belief functions. Moreover, the use of probability-possibility transformations enables confidence intervals to be encompassed by cuts of possibility distributions, thus making the representation stronger. The respective appropriateness of pairs of cumulative distributions, continuous possibility distributions, or discrete random sets for representing information about the mean value, the mode, the median, and other fractiles of ill-known probability distributions is discussed in detail.
16.
William L. McGill. Computers & Structures, 2008, 86(10): 1052-1060
This paper applies the Transferable Belief Model (TBM) interpretation of the Dempster-Shafer theory of evidence to estimate parameter distributions for probabilistic structural reliability assessment based on information from previous analyses, expert opinion, or qualitative assessments (i.e., evidence). Treating model parameters as credal variables, the suggested approach constructs a set of least-committed belief functions for each parameter defined on a continuous frame of real numbers that represent beliefs induced by the evidence in the credal state, discounts them based on the relevance and reliability of the supporting evidence, and combines them to obtain belief functions that represent the aggregate state of belief in the true value of each parameter. Within the TBM framework, beliefs held in the credal state can then be transformed to a pignistic state where they are represented by pignistic probability distributions. The value of this approach lies in its ability to leverage results from previous analyses to estimate distributions for use within a probabilistic reliability and risk assessment framework. The proposed methodology is demonstrated in an example problem that estimates the physical vulnerability of a notional office building to blast loading.
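Two of the steps named above, discounting and the pignistic transformation, can be sketched on a small discrete frame. This is a simplification: the paper works on a continuous frame of real numbers, and the function names and the dictionary representation (focal sets as `frozenset` keys mapped to masses) are assumptions made for illustration. Discounting with rate `alpha` scales every mass by `1 - alpha` and moves the remainder to the whole frame; the pignistic transformation splits each focal set's mass equally among its elements, after normalising by any mass on the empty set.

```python
def discount(masses, alpha, frame):
    """Discount a mass function by rate alpha (0 = fully reliable source).

    masses: dict mapping frozenset focal elements to masses summing to 1.
    frame: frozenset of all hypotheses; it receives the discounted mass.
    """
    discounted = {focal: (1.0 - alpha) * m for focal, m in masses.items()}
    discounted[frame] = discounted.get(frame, 0.0) + alpha
    return discounted


def pignistic(masses):
    """Pignistic probability: share each focal mass equally among its elements."""
    conflict = masses.get(frozenset(), 0.0)  # mass on the empty set, if any
    betp = {}
    for focal, m in masses.items():
        if not focal:
            continue  # the empty set only contributes through normalisation
        share = m / (len(focal) * (1.0 - conflict))
        for hypothesis in focal:
            betp[hypothesis] = betp.get(hypothesis, 0.0) + share
    return betp
```

With masses 0.5 on {a}, 0.3 on {a, b}, and 0.2 on {a, b, c}, the pignistic probability of `a` is 0.5 + 0.3/2 + 0.2/3, and the three pignistic probabilities sum to 1.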
17.
18.
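Based on the semi-structured nature of XML data and probabilistic query theory, the previously constructed PEPX probabilistic data model is studied, with particular analysis of how to establish efficient and distinctive query paths. Using probability theory, an algorithm is designed that dynamically selects the data query path on the basis of computed node probabilities. Data simulation verifies the effectiveness and feasibility of the algorithm in reducing query operations and improving execution efficiency.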
19.
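By comparing how probabilistic data based on the possible-worlds model is represented in the relational data model and in the XML data model, probabilistic relational schemas are classified into 1NF and 3NF according to the relationship between probabilistic attributes and ordinary attributes, and probabilistic XML schemas are likewise classified into 1NF and 3NF according to the relationship between distributional nodes and ordinary nodes. Taking extended probabilistic DTD files as an example, conversion algorithms between probabilistic relational schemas and probabilistic XML schemas are designed. Case analysis shows that the algorithms are effective and provide an effective schema conversion method between existing probabilistic relational data and probabilistic XML data.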