Similar literature
1.
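An XML Approximate Query Algorithm Based on XML Document Clustering   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes an approximate query algorithm for XML based on XML document clustering. It gives a method for computing a semantics-based distance between XML documents and, using this semantic distance, proposes a grid-based eight-neighborhood clustering algorithm to partition the XML database into clusters. The cluster centers obtained during clustering are then used to optimize the approximate-query evaluation phase of the static ordered selection algorithm, so that query results meeting the user's needs can be returned promptly without traversing the entire XML database. Finally, experiments on intelligent design of automobile exterior shapes show that the algorithm effectively improves the query efficiency of the static ordered selection algorithm.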

2.
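Query Techniques for XML Data   Total citations: 31, self-citations: 1, citations by others: 31
The XML specification has become the de facto standard for data representation and exchange in current network applications, including digital libraries, Web services, and e-commerce. Querying XML data occupies an important place in XML data management research and is an active direction in XML data processing, with a large body of related literature. According to how the query pattern is described, current XML query techniques are divided into two broad categories: the XML Query approach and the XML IR approach. The latter is further divided into three subclasses: XML IR/keyword, XML IR/fragment, and XML IR/query. Three problems of interest to researchers are selected and briefly described: processing of twig query patterns, retrieval of SLCA (smallest lowest common ancestor) nodes, and measuring the similarity of retrieved XML fragments. Taking ease of use for ordinary users as the criterion, the paper discusses the strengths and weaknesses of the relevant XML query techniques and identifies four problems that need further attention: structured keyword queries and the corresponding structural similarity measures; eliminating data redundancy between the XML Query processing mode (including XML IR/query) and the XML IR/keyword processing mode; the theory and implementation of the XML Query approach; and effective management of XML data for specific applications.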
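
The SLCA retrieval mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes nodes are identified by Dewey labels (tuples giving the path of child positions from the root), which the abstract does not specify, and it uses brute-force enumeration rather than any optimized algorithm from the surveyed literature. The function names `lca` and `slca` and the sample labels are hypothetical.

```python
from itertools import product


def lca(labels):
    """Lowest common ancestor = longest common prefix of Dewey labels."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)


def is_proper_ancestor(a, b):
    """True if the node labelled a is a proper ancestor of the node labelled b."""
    return len(a) < len(b) and b[:len(a)] == a


def slca(keyword_lists):
    """Return the LCAs that have no other LCA strictly below them.

    keyword_lists holds, for each query keyword, the Dewey labels of the
    nodes that contain it. Brute force: one LCA per combination of matches.
    """
    candidates = {lca(combo) for combo in product(*keyword_lists)}
    return sorted(c for c in candidates
                  if not any(is_proper_ancestor(c, d) for d in candidates))


# Hypothetical matches for two keywords in a small document tree.
xml_nodes = [(0, 0, 1), (0, 1, 0)]
query_nodes = [(0, 0, 2), (0, 2)]
print(slca([xml_nodes, query_nodes]))  # [(0, 0)]
```

Here the root `(0,)` also covers both keywords, but it is discarded because its descendant `(0, 0)` already does.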

3.
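杨长辉 (Yang Changhui), 岳友友 (Yue Youyou). 《计算机应用》 (Journal of Computer Applications), 2006, 26(12): 2991-2993
This paper proposes a new XML query scheme that performs the content query before the structure query and introduces the concept of an M-set in the structure query. XML documents are ranked by the edit distance between the M-set of the query tree and the M-set of the simplified DTD, which reduces query time and improves precision while preserving recall.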

4.
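An XML Stream Query Automaton with Level-by-Level Promoting Buffers   Total citations: 3, self-citations: 1, citations by others: 3
How to efficiently execute large numbers of XPath queries over XML streams is an active research topic. In applications such as pipeline processing in particular, it is also desirable to output query results as early as possible while the stream is being parsed. This paper defines the basic XSIEQ (XML Stream Query with Immediate Evaluation) machine, an XML stream query framework that is an indexed, stack-based automaton on which a variety of XPath query algorithms can be built. On the basic XSIEQ machine, a query algorithm with level-by-level promoting buffers (PBuf) is proposed, and the PBuf-based XSIEQ machine is formally defined, implemented, and tested. Experimental results show that the proposed method supports complex XPath queries and outperforms traditional algorithms in execution efficiency.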

5.
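Sorting tree-structured data is a basic problem in XML data processing. This paper proposes EEXPSort, an energy-efficient sorting algorithm for XML documents. The algorithm scans the XML document to produce mutually independent sorting tasks and processes them in parallel on a multi-core CPU. It also uses data compression, storage in a single temporary file, and avoidance of subtree matching to reduce disk I/O and CPU time. Extensive comparative experiments on XML documents with different characteristics show that the proposed algorithm is more energy-efficient than HERMES, the best-performing existing algorithm for sorting tree-structured data.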

6.
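Based on the characteristics of domain-specific search engines, this paper proposes QR-CH (Query Recommendation algorithm based on word Co-occurrence and HITS algorithm), a novel query recommendation algorithm. On the one hand, the algorithm uses HITS to rank the related terms selected by word co-occurrence according to their semantic relatedness and takes the top-ranked terms as recommendations, which improves the relevance of recommended terms to the original query terms. On the other hand, it uses HITS to rank the related documents and judges from the perspective of the query result document set whether a recommendation is redundant, which reduces redundancy among recommended terms. The algorithm stores recommendation-related information in a knowledge tree and uses the knowledge tree to perform query recommendation. Experimental results show that QR-CH outperforms similar algorithms in the literature in both the relevance of recommended terms and the identification of redundant terms.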
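
The abstract above does not say how QR-CH builds the graph it hands to HITS, so the following is only a generic Python sketch of the HITS iteration itself. The edge list, node names, and the function name `hits` are hypothetical; edges point from a "hub" (for example a document) to an "authority" (for example a term it contains).

```python
def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
    return {n: v / norm for n, v in scores.items()}


def hits(edges, iterations=50):
    """Plain HITS power iteration on a directed edge list (source, target)."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of nodes pointing to it.
        auth = _normalize(
            {n: sum(hub[s] for s, t in edges if t == n) for n in nodes})
        # Hub score: sum of authority scores of nodes it points to.
        hub = _normalize(
            {n: sum(auth[t] for s, t in edges if s == n) for n in nodes})
    return hub, auth


# Hypothetical graph: documents d1, d2 link to terms "xml" and "query".
edges = [("d1", "xml"), ("d2", "xml"), ("d1", "query")]
hub, auth = hits(edges)
print(max(auth, key=auth.get))  # xml
print(max(hub, key=hub.get))    # d1
```

Ranking candidate terms by authority score and documents by hub score is one plausible reading of the two uses of HITS described in the abstract, not a confirmed detail of QR-CH.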

7.
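Structural Join Algorithm for XML Queries   Total citations: 1, self-citations: 0, citations by others: 1
Most current XML structural join methods incur a high cost for sorting the input or building an index on the fly when the input element sets are unindexed or unordered. To address this, the paper analyzes the classic Stack-Tree-Desc algorithm and the optimized algorithm based on B+ tree indexes, proposes an XML query optimization strategy that is not restricted to external index structures, and gives an implementation of the algorithm. Experimental results show that the algorithm achieves higher query efficiency than Stack-Tree-Desc.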
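
As a point of reference for the baseline named above, here is a minimal Python sketch of a stack-based ancestor-descendant join in the style of Stack-Tree-Desc. It assumes a region encoding in which each element is a `(start, end)` pair and both input lists are already sorted by start position. The function name and sample data are illustrative, and the paper's own optimized algorithm is not reproduced here.

```python
def stack_tree_desc(ancestors, descendants):
    """Join two start-sorted lists of (start, end) regions.

    Returns (ancestor, descendant) pairs, ordered by descendant.
    Regions are assumed to come from one well-formed tree, so any two
    regions are either nested or disjoint.
    """
    out, stack = [], []
    i = j = 0
    while j < len(descendants) and (i < len(ancestors) or stack):
        a = ancestors[i] if i < len(ancestors) else None
        d = descendants[j]
        next_start = min(a[0], d[0]) if a is not None else d[0]
        # Drop stacked ancestors that end before the next element starts.
        while stack and stack[-1][1] < next_start:
            stack.pop()
        if a is not None and a[0] < d[0]:
            stack.append(a)  # a is nested inside everything still stacked
            i += 1
        else:
            # Every stacked region contains d.
            out.extend((s, d) for s in stack)
            j += 1
    return out


# Hypothetical regions: (2, 5) is nested in (1, 10); (11, 14) is separate.
ancestors = [(1, 10), (2, 5), (11, 14)]
descendants = [(3, 4), (6, 7), (12, 13)]
for pair in stack_tree_desc(ancestors, descendants):
    print(pair)
```

The single pass over both lists works only because the inputs are sorted, which is the precondition the abstract identifies as costly when the input is unordered or unindexed.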

8.
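XML Queries Based on Extended Path Expressions   Total citations: 4, self-citations: 0, citations by others: 4
XML querying is one of the active research problems in computer science, and researchers worldwide have proposed many models and algorithms. Among them, the Japanese researcher Makoto Murata and colleagues proposed expressing queries with extended path expressions and evaluating XML queries with hedge automata and string automata. Compared with XML queries controlled by path expressions, this approach overcomes the latter's inability to fully exploit the ordered nature of XML documents. In addition, extended path expressions are highly expressive and can express any MSO (monadic second-order logic) query. Extended path expressions have therefore become one of the main theoretical frameworks for research on XML querying. However, they are difficult to write and the expressions are complex, which increases the time complexity of the algorithms. This paper introduces wildcards into extended path expressions, making them simpler and more flexible, and proposes and applies automata with cutoff sets during query evaluation to improve time efficiency.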

9.
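XML document storage is a problem that an NXD (Native XML Database) system must solve. On the Internet, XML is mainly used to describe data structure and semantics in information exchange, and an NXD system also needs to support the XQuery standard and provide an efficient access interface to XML documents. This paper gives a fairly complete design of the architecture of an NXD storage system and, targeting the characteristics of XML path queries, designs the data structures for storing XML nodes and the indexes of the storage system, including their structure and the algorithms for building and maintaining them. The index is implemented with a hash algorithm, BH (balanced hash). Tests on an experimental system show that these storage structures and algorithms ensure the access efficiency and path query efficiency of the NXD system.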

10.
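A New Partition-Based Structural Join Algorithm   Total citations: 2, self-citations: 0, citations by others: 2
Efficient structural joins are the key to XML query processing. At present, most structural join algorithms suffer greatly reduced efficiency because they require on-the-fly sorting or index construction, or have data replication and I/O problems. Based on an analysis and comparison of existing structural join algorithms, this paper proposes a new partition-based structural join algorithm. The algorithm requires neither sorting nor index construction, solves the data replication problem with a stack mechanism, and takes memory buffering fully into account to improve I/O performance. Experimental analysis shows that the algorithm has good query performance.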

11.
Searching XML data with a structured XML query can improve the precision of results compared with a keyword search. However, the structural heterogeneity of the large number of XML data sources makes it difficult to answer a structured query exactly, so query relaxation is necessary. Previous work on XML query relaxation suffers from unnecessary computation of a large number of unqualified relaxed queries. To address this issue, we propose an adaptive relaxation approach that relaxes a query differently against different data sources, based on the schemas they conform to. In this paper, we present a set of techniques that support this approach: schema-aware relaxation rules for relaxing a query adaptively, a weighted model for ranking relaxed queries, and algorithms for adaptive query relaxation and top-k query processing. We discuss results from a comprehensive set of experiments that show the effectiveness and efficiency of our approach.

12.
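To address information overload in XML queries, this paper proposes a method based on conditional preferences for ranking the results of XML queries that return many answers. The method takes the content query predicates specified by the user as context conditions, then applies a probabilistic information retrieval model to the original XML data and the query history to infer the current user's preferences. It evaluates the association between the attribute unit values specified by the query and those left unspecified in the result elements, as well as the relevance of the unspecified attribute unit values to the user's preferences, and from these constructs a scoring function for query result elements. The scoring function is then used to compute a ranking score for each result element, and the query results are ranked accordingly. Experimental results show that the proposed method achieves high ranking accuracy and satisfies user needs and preferences well.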

13.
Jian Liu, Z. M. Ma, Li Yan. World Wide Web, 2013, 16(3): 325-353
As the next-generation language of the Internet, XML has become the de facto standard for information exchange over the Web. A core operation in XML query processing is finding all occurrences of a twig pattern in an XML database. In addition, probabilistic data has become an emerging topic for various Web applications, so combining XML twig patterns with probabilistic data is a significant research direction. In prior work on probabilistic XML, the answers to a given twig query are always complete. However, complete answers with low probabilities may be deemed irrelevant, while incomplete answers with high probabilities can be highly significant because they may be the potential answers that interest users. Unlike complete evaluation, evaluating incomplete twigs in probabilistic XML introduces new challenges. On the one hand, incomplete queries return not only complete matches but also a considerable number of incomplete matches. On the other hand, incomplete evaluation is more complicated to process. A ranking approach should therefore be adopted along with the evaluation of incomplete answers. In this paper, we propose an efficient algorithm for querying incomplete twigs over a probabilistic XML database. We also present a novel algorithm for ranking the incomplete answers. The experimental results show that our proposed algorithms significantly improve the performance of querying and ranking incomplete twigs.

14.
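Retrieval of government information resources is an important function of government information resource sharing systems. Based on the XML metadata specification in the national standard 《政务信息资源目录体系》 (Government Information Resource Catalog System), this paper proposes a retrieval algorithm for government information resources that supports keyword search. The algorithm ranks the result set by semantic relevance using the TF*IDF of the XML metadata of government information resources together with keyword dependency, and improves retrieval efficiency through an improved keyword inverted index. Experiments show that the algorithm substantially improves both the ranking accuracy of retrieval results and time efficiency, and can effectively enhance the data-sharing service capability for government information resources.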
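
The combination of an inverted index with TF*IDF scoring described above can be sketched in Python as follows. This is a plain textbook version under stated assumptions: metadata records are reduced to whitespace-separated text, IDF is `log(N / df)`, and the abstract's "keyword dependency" factor is left out because it is not defined there. The record contents and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Rank documents by summed TF*IDF over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term occurs in no document
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical metadata records.
docs = {
    "r1": "water quality report water",
    "r2": "water budget",
    "r3": "budget report",
}
index = build_index(docs)
print(search(index, len(docs), "water quality"))  # ['r1', 'r2']
```

Only the posting lists of the query terms are touched, which is why an inverted index keeps keyword lookup cheap as the collection grows.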

15.
Existing work on XML keyword search focuses on how to find relevant and meaningful data fragments for a query, assuming that each keyword is intended as part of it. However, in XML keyword search, user queries often contain irrelevant or mismatched terms, typos, etc., which can easily lead to empty or meaningless results. In this paper, we introduce the problem of content-aware XML keyword query refinement, where the search engine should judiciously decide whether a user query Q needs to be refined while Q is being processed, and find a list of promising refined query candidates that are guaranteed to have meaningful matching results over the XML data, without any user interaction or a second try. To achieve this goal, we build a novel content-aware XML keyword query refinement framework consisting of two core parts: (1) a query ranking model that evaluates the quality of a refined query RQ, capturing the morphological/semantic similarity between Q and RQ and the dependency of the keywords of RQ over the XML data; (2) an integration of the exploration of RQ candidates and the generation of their matching results into a single problem, which is solved optimally within a one-time scan of the related keyword inverted lists. Finally, an extensive empirical study verifies the efficiency and effectiveness of our framework.

16.
17.
In this paper, we address the problem of cardinality estimation of XPath queries over XML data stored in a distributed, Internet-scale environment such as a large-scale data sharing system designed to foster innovations in biomedical and health informatics. The cardinality estimate of XPath expressions is useful in XQuery optimization, in designing IR-style relevance ranking schemes, and in statistical hypothesis testing. We present a novel gossip algorithm called XGossip, which, given an XPath query, estimates the number of XML documents in the network that contain a match for the query. XGossip is designed to be scalable, decentralized, and robust to failures, properties that are desirable in a large-scale distributed system. XGossip employs a novel divide-and-conquer strategy for load balancing and reducing bandwidth consumption. We conduct a theoretical analysis of XGossip in terms of accuracy of cardinality estimation, message complexity, and bandwidth consumption. We present a comprehensive performance evaluation of XGossip on Amazon EC2 using a heterogeneous collection of XML documents.
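
The abstract does not describe XGossip's protocol in detail. To illustrate the general idea of gossip-based aggregation only, the Python sketch below implements the classic push-sum averaging protocol on a fully connected network: every node keeps half of its (sum, weight) pair and sends the other half to a random peer each round. This is not XGossip itself; the function name, the synchronous rounds, and the complete-graph topology are assumptions made for illustration.

```python
import random


def push_sum(values, rounds=200, seed=0):
    """Estimate the network-wide average at every node via push-sum.

    values[i] is node i's local value (e.g. a local match count).
    Requires at least two nodes.
    """
    rng = random.Random(seed)
    n = len(values)
    s = [float(v) for v in values]  # running sums
    w = [1.0] * n                   # running weights
    for _ in range(rounds):
        # Each node keeps half of its mass...
        new_s = [x / 2 for x in s]
        new_w = [x / 2 for x in w]
        # ...and pushes the other half to one random other node.
        for i in range(n):
            j = rng.choice([k for k in range(n) if k != i])
            new_s[j] += s[i] / 2
            new_w[j] += w[i] / 2
        s, w = new_s, new_w
    return [si / wi for si, wi in zip(s, w)]


# Hypothetical local counts on five nodes; the true average is 4.0.
estimates = push_sum([3, 0, 5, 2, 10])
print([round(e, 6) for e in estimates])
```

Multiplying the estimated average by the number of nodes gives an estimate of the network-wide total, for example the number of documents matching a query when each node starts from its local count.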

18.
This article reports on the XML retrieval system x2, which has been developed at the University of Munich over the last 5 years. In a typical session with x2, the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically. After query evaluation, the full set of answers is presented in a visual and structured way. x2 largely exploits the structure found in documents, queries, and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of x2 that distinguishes it from other visual query systems for XML is that it supports various levels of detail in the presentation of answers, as well as techniques for dynamically reordering, grouping, and ranking retrieved elements once the complete answer set has been computed.

19.
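Querying XML Data with Generalized Path Expressions   Total citations: 2, self-citations: 0, citations by others: 2
郑刚 (Zheng Gang). 《微机发展》 (Microcomputer Development), 2004, 14(11): 94-97
Query rewriting is a basic problem in database research, closely related to database problems such as query optimization, data warehousing, data integration, and semantic caching. It is also the key problem in storing and querying XML data in relational databases. Because XML data contain nested elements and element references, nesting can be arbitrarily deep, and schema and data are mixed, XML queries involve generalized path expressions (GPE). This paper focuses on an approach to query rewriting for XML data: XML queries containing generalized path expressions are rewritten into XML queries containing simple path expressions (SPE), which are then translated into SQL statements over a relational database.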
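
The first rewriting step described above, turning a generalized path expression into simple path expressions, can be sketched in Python as follows. The sketch assumes a non-recursive schema given as a parent-to-children mapping and handles only the `/` and `//` axes; the schema, element names, and the function name `expand` are hypothetical, and the later translation to SQL is not shown.

```python
import re


def expand(path, schema, root):
    """Expand a path with '//' steps into all matching simple paths.

    schema maps each element name to its allowed child element names
    and must be acyclic, otherwise the descent below would not end.
    """
    steps = re.findall(r"(//|/)([^/]+)", path)
    partial = [[]]  # simple paths built so far, as lists of names
    for axis, name in steps:
        extended = []
        for prefix in partial:
            starts = [root] if not prefix else schema.get(prefix[-1], [])
            if axis == "/":
                if name in starts:
                    extended.append(prefix + [name])
            else:
                # '//' step: follow every chain of children to any depth.
                pending = [[s] for s in starts]
                while pending:
                    chain = pending.pop()
                    if chain[-1] == name:
                        extended.append(prefix + chain)
                    for child in schema.get(chain[-1], []):
                        pending.append(chain + [child])
        partial = extended
    return sorted("/" + "/".join(p) for p in partial)


# Hypothetical schema.
schema = {
    "library": ["book", "journal"],
    "book": ["title", "chapter"],
    "chapter": ["title"],
    "journal": ["title"],
}
for simple_path in expand("/library//title", schema, "library"):
    print(simple_path)
# /library/book/chapter/title
# /library/book/title
# /library/journal/title
```

Each resulting simple path names a fixed chain of elements, which is the form that can be mapped to joins over relational tables.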

20.
Searching XML data using keyword queries has attracted much attention because it enables Web users to easily access XML data without having to learn a structured query language or study possibly complex data schemas. Most current approaches identify the meaningful results of a given keyword query based on the semantics of the lowest common ancestor (LCA) and its variants. However, because LCA candidates are usually numerous and of low relevance to the users' information need, effectively and efficiently identifying the most relevant results from a large number of LCA candidates remains a challenging and unresolved issue. In this article, we introduce a novel semantics of relevant results based on mutual information between the query keywords. Then, we introduce a novel approach for identifying the relevant answers to a given query by adopting skyline semantics. We also recommend three different ranking criteria for selecting the top-k relevant results of the query. Efficient algorithms are proposed that rely on some provable properties of the dominance relationship between result candidates to rapidly identify the top-k dominant results. Extensive experiments were conducted to evaluate our approach, and the results show that it performs well compared with other existing approaches across different data sets and evaluation metrics.
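
The skyline semantics and dominance relationship mentioned above can be illustrated with a small Python sketch. It assumes each result candidate has a vector of scores where higher is better in every dimension; how the article derives these scores from mutual information is not stated in the abstract, so the numbers and names below are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is at least as good everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skyline(candidates):
    """Keep the candidates that no other candidate dominates."""
    return {
        name: vec
        for name, vec in candidates.items()
        if not any(dominates(other, vec) for other in candidates.values())
    }


# Hypothetical score vectors for four LCA result candidates.
candidates = {
    "r1": (0.9, 0.2),
    "r2": (0.5, 0.5),
    "r3": (0.4, 0.4),  # dominated by r2
    "r4": (0.1, 0.8),
}
print(sorted(skyline(candidates)))  # ['r1', 'r2', 'r4']
```

This pairwise check is quadratic in the number of candidates; the article's algorithms are described as exploiting properties of the dominance relationship to find the top-k dominant results faster.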
