Similar documents
1.
Metasearch engines offer better coverage and are more fault-tolerant and expandable than single search engines. A metasearch engine must post queries to, and obtain retrieval results from, several other Internet search engines. In this paper, we describe the use of the Extensible Stylesheet Language (XSL) to support metasearch. We show how XSL can transform a query, expressed in XML, into different forms for different search engines. We also show how the retrieval results can be transformed into a standard format so that the metasearch engine can interpret the retrieved data and filter out irrelevant information (e.g., advertisements). The proposed structure treats the metasearch engine and the individual search engines as separate modules with a clearly defined communication structure through XSL. The system is therefore more extensible than one that hard-codes the structural and syntactic transformation processes. New search engines can be added in a plug-and-play fashion: the only requirement is to add that engine's XML transformation to the XSL.
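
The paper performs the per-engine query rewriting with XSL stylesheets. As a stand-in (not XSL itself), the minimal Python sketch below shows the same module boundary: one canonical XML query is parsed once, and a registry of per-engine transformation functions turns it into each engine's request syntax. The XML query schema, the engine names, and the `*.example` URL patterns are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def _engine_a(terms, n):
    # Hypothetical engine A: space-separated terms in "q", result count in "num".
    return "https://a.example/search?" + urlencode({"q": " ".join(terms), "num": n})


def _engine_b(terms, n):
    # Hypothetical engine B: explicit AND syntax in "query", result count in "max".
    return "https://b.example/find?" + urlencode({"query": " AND ".join(terms), "max": n})


# Registry of per-engine transformations. Adding an engine means adding one
# entry, loosely mirroring "plug-and-play" via a new XSL transformation.
ENGINES = {"engine_a": _engine_a, "engine_b": _engine_b}


def parse_query(xml_text):
    """Read a canonical query of the (assumed) form
    <query><term>..</term>...<maxResults>..</maxResults></query>."""
    root = ET.fromstring(xml_text)
    terms = [t.text.strip() for t in root.findall("term") if t.text and t.text.strip()]
    n = int(root.findtext("maxResults", default="10"))
    return terms, n


def fan_out(xml_text):
    """Return one engine-specific request URL per registered engine."""
    terms, n = parse_query(xml_text)
    return {name: build(terms, n) for name, build in ENGINES.items()}
```

For example, the query `<query><term>xml</term><term>xsl</term><maxResults>5</maxResults></query>` yields `q=xml+xsl&num=5` for engine A and `query=xml+AND+xsl&max=5` for engine B. The reverse direction described in the abstract (normalizing each engine's results into a standard format) would be a second registry of per-engine parsers.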

2.
Semantic Web search is a new application of recent advances in information retrieval (IR), natural language processing, artificial intelligence, and other fields. The Powerset group at Microsoft is developing a semantic search engine that aims to answer queries not only by matching keywords but by actually matching the meaning of queries to the meaning of Web documents. Compared with typical keyword search, semantic search can pose additional engineering challenges for back-end and infrastructure design. Of these, the main challenge addressed in this paper is how to lower query latencies to acceptable, interactive levels. Index-based semantic search requires more data processing, for example handling numerous synonyms, hypernyms, multiple linguistic readings, and other semantic information, on both the queries and the index. In addition, some of the algorithms can be super-linear, such as matching co-references across a document. Consequently, many semantic queries can run significantly slower than the same keyword query. Users, however, have grown to expect Web search engines to provide near-instantaneous results, and a slow search engine could be deemed unusable even if it provides highly relevant results. It is therefore imperative for any search engine to meet its users' interactivity expectations, or risk losing them. Our approach to this challenge is to exploit data parallelism in slow search queries to reduce their latency on multi-core systems. Although all search engines are designed to exploit parallelism, at the single-node level this usually translates into throughput-oriented task parallelism. This paper focuses on the engineering of two latency-oriented approaches (coarse- and fine-grained) and compares them with the task-parallel approach. We use Powerset's deployed search engine to evaluate the various factors that affect parallel performance: workload, overhead, load balancing, and resource contention.
We also discuss heuristics to selectively control the degree of parallelism, and the resulting overhead, on a per-query basis. Our experimental results show that using fine-grained parallelism with these dynamic heuristics can significantly reduce query latencies compared with fixed, coarse-granularity parallelization schemes. Although these results were obtained on, and optimized for, Powerset's semantic search, they can be readily generalized to a wide class of inverted-index search engines.
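
To make "latency-oriented data parallelism with a per-query degree heuristic" concrete, here is a minimal sketch, not Powerset's implementation. It assumes posting lists sorted by document ID, splits one query's document-ID space into disjoint ranges scored concurrently, and merges the partial scores into a top-k list. The term-frequency scoring function and the `choose_degree` threshold are hypothetical placeholders; granularity is controlled by how many ranges are created.

```python
import heapq
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def score_range(postings, query_terms, lo, hi):
    """Score documents whose IDs fall in [lo, hi) by summing term frequencies
    (a stand-in scoring function). Posting lists must be sorted by doc ID."""
    scores = {}
    for term in query_terms:
        plist = postings.get(term, [])
        start = bisect_left(plist, (lo,))  # first posting with doc_id >= lo
        end = bisect_left(plist, (hi,))    # first posting with doc_id >= hi
        for doc_id, tf in plist[start:end]:
            scores[doc_id] = scores.get(doc_id, 0) + tf
    return scores


def choose_degree(postings, query_terms, max_workers=8, postings_per_worker=1000):
    """Hypothetical per-query heuristic: only parallelize queries with enough work."""
    work = sum(len(postings.get(t, [])) for t in query_terms)
    return max(1, min(max_workers, work // postings_per_worker))


def parallel_search(postings, query_terms, num_docs, k=10, degree=None):
    """Intra-query data parallelism over disjoint doc-ID ranges; returns top-k
    (doc_id, score) pairs, ties broken by lower doc ID."""
    if num_docs <= 0:
        return []
    if degree is None:
        degree = choose_degree(postings, query_terms)
    step = -(-num_docs // degree)  # ceiling division
    ranges = [(lo, min(lo + step, num_docs)) for lo in range(0, num_docs, step)]
    merged = {}
    with ThreadPoolExecutor(max_workers=degree) as pool:
        for partial in pool.map(lambda r: score_range(postings, query_terms, *r), ranges):
            merged.update(partial)  # ranges are disjoint, so keys never clash
    return heapq.nlargest(k, merged.items(), key=lambda kv: (kv[1], -kv[0]))
```

The result is the same for any degree; only the amount of concurrent work per query changes. In CPython, the global interpreter lock means threads will not speed up this pure-Python scoring loop, so the sketch illustrates the partitioning and merging logic rather than actual speedups, which would need native threads or processes.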

3.
Data in different formats must be handled whenever formats evolve or data must be integrated from heterogeneous systems. When such data are represented in XML for data exchange, they cannot be shared freely among applications without transformation. A common approach to this problem is to convert the entire XML data from the source format to each application's target format using the transformation rules specified in XSLT stylesheets. In many cases, however, only a small part of the XML data, described by a user's query (application), actually needs to be transformed. In this paper, we present an approach that optimizes the execution time of an XSLT stylesheet for answering a given XPath query by modifying the stylesheet so that it (a) captures only the parts of the XML data that are relevant to the query and (b) processes only the XSLT instructions that are relevant to the query. We prove the correctness of our optimization approach, analyze its complexity, and present experimental results. The experimental results show that our approach performs best in terms of execution time, especially when many cost-intensive XSLT instructions can be excluded from the stylesheet.

4.
Text database selection for metasearch engines   Cited by: 8; self-citations: 0; others: 8
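The information a user needs is often scattered across the databases of multiple search engines. For ordinary users, visiting several search engines and picking out the genuinely useful pages from the returned results is time-consuming and laborious. A metasearch engine gives users an integrated environment for accessing multiple search engines at once: it forwards each user query it receives to multiple underlying search engines. As a search tool, a metasearch engine offers advantages such as broader Web coverage than traditional search engines and better scalability. This paper discusses a variety of techniques for solving the database selection problem in metasearch engines. For a given user query, database selection identifies the underlying search engines most likely to return useful information.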

5.
Searching for desired data is one of the most common uses of the Internet. No single search engine can search all the data on the Internet. An approach that provides one interface for invoking multiple search engines for each user query has the potential to satisfy more users. When the number of search engines behind the interface is large, invoking all of them for every query is often not cost-effective: it creates unnecessary network traffic by sending the query to many useless search engines, and searching those engines wastes local resources. The problem can be overcome if the usefulness of every search engine with respect to each query can be predicted. We present a statistical method to estimate the usefulness of a search engine for any given query. In this paper, the usefulness of a search engine for a given query is defined as a combination of the number of documents in the search engine that are sufficiently similar to the query and the average similarity of these documents. Experimental results indicate that our estimation method is much more accurate than existing methods.
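
The abstract does not give the estimator itself. One well-known way to estimate both quantities from compact per-engine statistics is a generating-function expansion under a term-independence assumption; the sketch below uses that approach and should not be read as this paper's exact method. For each query term it assumes three numbers: `p`, the probability that a document in the engine contains the term; `w`, the term's average weight in documents that contain it; and `q`, the term's weight in the query. Similarity is taken to be the dot product of query and document weights.

```python
from collections import defaultdict


def similarity_distribution(terms):
    """terms: list of (p, w, q) per query term (see lead-in).
    Returns {similarity: probability}, assuming terms occur independently,
    i.e. the expansion of prod_i (p_i * X^(q_i * w_i) + (1 - p_i))."""
    dist = {0.0: 1.0}
    for p, w, q in terms:
        nxt = defaultdict(float)
        for sim, prob in dist.items():
            nxt[round(sim + q * w, 9)] += prob * p  # document contains the term
            nxt[sim] += prob * (1 - p)              # document lacks the term
        dist = dict(nxt)
    return dist


def usefulness(terms, n_docs, threshold):
    """Estimate (NoDoc, AvgSim): the expected number of documents with
    similarity above the threshold, and their average similarity."""
    dist = similarity_distribution(terms)
    above = {s: pr for s, pr in dist.items() if s > threshold}
    mass = sum(above.values())
    no_doc = n_docs * mass
    avg_sim = sum(s * pr for s, pr in above.items()) / mass if mass else 0.0
    return no_doc, avg_sim
```

With two terms `(0.5, 0.4, 1.0)` and `(0.2, 0.6, 1.0)`, an engine of 1,000 documents, and a threshold of 0.5, only the similarity values 0.6 and 1.0 (each with probability 0.1) exceed the threshold. The estimate is therefore about 200 useful documents with an average similarity of about 0.8. A broker would compute this per engine and invoke only the engines whose estimates are worthwhile.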

6.
Application of a targeted query engine to integrated retrieval from Web chemical databases   Cited by: 7; self-citations: 7; others: 0
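Chemical databases on the Internet are an important specialized resource, but search engines based on hyperlink analysis cannot yet index this kind of resource. With the goal of making full use of the data in Internet chemical databases, this paper applies the scheme "one query launches multiple peer retrieval engines, and the information is organized in a structured way" to Web chemical databases whose search entry points are compound identifiers, and builds a targeted query engine for Web databases based on integrated multi-site retrieval. The engine is a three-tier Web model consisting of a user interaction layer, an intermediate retrieval layer, and a data provision layer. Inside the system, these layers correspond respectively to a client-side agent module that responds to user search requests, a server-side agent module that integrates remote Web information, and a relational database module that provides caching and retrieval. The model is developed with JSP plus Java components. On top of the standard HTTP request methods, it uses XML technology to extract and represent structured data from the returned documents, and uses XML-DBMS to store and retrieve the XML data, forming a solution for deep-Web data exchange. ChemDB Portal Search, built according to this scheme, integrates four distributed Web chemical databases, searching them simultaneously and displaying the results in a unified view. The system is an attempt at mining and integrated retrieval of deep-Web information and may serve as a reference for building similar systems in other fields.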

7.
A universal search engine cannot give a personal touch to a user's query. To overcome this deficiency, vertical search engines, which return results from a specific domain, are used. An alternative is a personalized search system. To provide personalized search results, the proposed system, Exclusively Your's, observes a user's browsing behavior and actions. Based on the observed behavior, it dynamically constructs a user profile consisting of terms related to the user's interests. The constructed profile is later used for query expansion. The goal of this work is not to return all the relevant results but to place a few high-quality personalized results at the top of the ranked list; in other words, the goal is high precision. We performed experiments by personalizing Google, Yahoo, and Naver (a widely used search engine in Korea). The results show that each search engine yields a significant improvement when used with Exclusively Your's. We also compared the user profile constructed by the proposed approach with those of other, similar personalization approaches; the results show a marginal increase in precision.
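
The abstract describes the pipeline (observe browsing, build a term profile, expand the query) without giving formulas. The sketch below is a minimal illustration under stated assumptions: the profile is a weighted term count over the text of visited pages, each page is weighted by a hypothetical dwell-time factor, and expansion appends the top-k profile terms that are not already in the query. The stopword list, the weighting, and the function names are illustrative choices, not the system's actual design.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "with", "is"}


def build_profile(visited_texts, dwell_seconds):
    """Build a term profile from pages the user visited. Each page's terms are
    weighted by a hypothetical dwell-time factor (1 + minutes spent)."""
    profile = Counter()
    for text, dwell in zip(visited_texts, dwell_seconds):
        weight = 1.0 + dwell / 60.0
        for tok in re.findall(r"[a-z0-9]+", text.lower()):
            if tok not in STOPWORDS and len(tok) > 2:
                profile[tok] += weight
    return profile


def expand_query(query, profile, k=2):
    """Append the k highest-weighted profile terms not already in the query."""
    q_terms = query.lower().split()
    extra = [t for t, _ in profile.most_common() if t not in q_terms][:k]
    return " ".join(q_terms + extra)
```

For instance, after visits to pages titled "python decorators", "python decorators tutorial", and "java streams", the query "python tips" expanded with `k=1` becomes "python tips decorators". Keeping `k` small matches the stated aim of precision at the top of the list rather than recall.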

8.
The current state of the art in querying XML data is represented by XPath and XQuery, both of which rely on Boolean conditions for node selection. Boolean selection is too restrictive when users do not use, or do not even know, the data structure precisely, for example when queries are written based on a summary rather than on a schema. In this paper we describe an XML querying framework called FuzzyXPath, based on fuzzy set theory, which relies on fuzzy conditions to define flexible constraints on stored data. A function called “deep-similar” is introduced to replace XPath's typical “deep-equal” function. Its main goal is to provide a degree of similarity between two XML trees, assessing whether they are similar in both structure and content. Several query examples are discussed in the field of XML-based metadata for e-learning.
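
The abstract does not define deep-similar's formula. To illustrate the idea of replacing a Boolean deep-equal with a graded score in [0, 1], the sketch below combines three hypothetical components with hypothetical weights: a tag match, a Jaccard overlap of text tokens, and a recursive, greedily matched similarity of child elements. It is an illustrative stand-in, not FuzzyXPath's definition.

```python
import xml.etree.ElementTree as ET


def _text_sim(a, b):
    """Jaccard overlap of lowercase whitespace tokens; two empty texts count as equal."""
    ta, tb = set((a or "").lower().split()), set((b or "").lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def deep_similar(x, y, w_tag=0.3, w_text=0.3, w_children=0.4):
    """Graded similarity in [0, 1] between two ElementTree elements.
    Identical trees score 1.0; the weights are hypothetical and sum to 1."""
    tag = 1.0 if x.tag == y.tag else 0.0
    text = _text_sim(x.text, y.text)
    xs, ys = list(x), list(y)
    if not xs and not ys:
        children = 1.0
    elif not xs or not ys:
        children = 0.0
    else:
        # Greedy: match each child of x to its most similar child of y, and
        # normalize by the larger child count so missing children are penalized.
        best = [max(deep_similar(c, d, w_tag, w_text, w_children) for d in ys) for c in xs]
        children = sum(best) / max(len(xs), len(ys))
    return w_tag * tag + w_text * text + w_children * children
```

Comparing `<course><title>XML basics</title><level>intro</level></course>` with `<course><title>XML fundamentals</title></course>` gives 0.84 rather than deep-equal's plain `false`. A fuzzy query could then keep nodes whose score exceeds a user-chosen threshold.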

9.
The parametric data model captures an object as a single tuple. This feature eliminates the unnecessary self-join operations otherwise needed to combine tuples scattered across a temporal relation. Despite this advantage, the model is relatively difficult to implement on top of relational databases because attribute sizes are not fixed. Since data boundaries are not a problem in XML, XML can be an elegant way to implement parametric databases for temporal data. There are two approaches to implementing parametric databases with XML: (1) a native XML database with an XQuery engine, and (2) XML storage with a temporal query language. To determine which approach is appropriate for parametric databases, we consider four questions: the effectiveness of XML in modeling temporal data, the applicability of XML query languages, the user-friendliness of the query languages, and the system performance of the two approaches. By evaluating these four questions, we show that the latter approach is more appropriate for using XML in parametric databases.

10.
A decentralized search engine for dynamic Web communities   Cited by: 1; self-citations: 1; others: 0
Currently, most Web search engines search a corpus comprising nearly the entire content of the Web. The same centralized search service can also be provided on a single site. Nonetheless, there is little research on community-wide search. This paper presents ComSearch, a peer-to-peer search engine designed to give small- and medium-scale online communities the ability to perform text search within the community. Communities are formed in a self-organizing way. A P2P IR system may incur unnecessary internal traffic when answering a multi-term query, so we propose several techniques to optimize multi-term query processing. Simulation results show that the proposed algorithms scale well. Compared with the baseline approach, our improved algorithm reduces the communication cost by about two orders of magnitude in the best case. We also deploy the system on a small-scale network and conduct a series of experiments to estimate the actual query response time and to investigate the data movement caused by node joins. Experimental results show that multiple data movements are quite common during network expansion. However, the percentage of multiple data movements decreases as the network stabilizes after the initial burst of joins. This suggests opportunities for improving P2P data movement management.
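
The abstract does not spell out ComSearch's optimizations. The sketch below shows a generic, widely used technique for reducing traffic in multi-term P2P queries, in which each term's posting list lives on a different peer, and compares it with a naive baseline. The naive strategy ships every full posting list to the querying node. The sequential strategy visits term peers in increasing posting-list size and ships only the shrinking candidate set from peer to peer. Cost is counted as the number of document IDs transferred, a simplifying assumption.

```python
def naive_cost(posting_lists):
    """Querying node fetches every term's full posting list."""
    return sum(len(p) for p in posting_lists.values())


def sequential_intersection(posting_lists):
    """Visit term peers in increasing posting-list size, shipping only the
    running intersection from peer to peer. Returns (result, cost)."""
    if not posting_lists:
        return [], 0
    order = sorted(posting_lists, key=lambda t: len(posting_lists[t]))
    current = set(posting_lists[order[0]])  # starts at the smallest list's peer
    cost = 0
    for term in order[1:]:
        cost += len(current)                # ship candidates to the next peer
        current &= set(posting_lists[term])
    cost += len(current)                    # return the final result to the querier
    return sorted(current), cost
```

With posting lists of sizes 1,000, 100, and 4, the naive strategy transfers 1,104 IDs, while the sequential strategy in the example tested below transfers 10, because the candidate set never grows beyond the smallest list.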

11.
As a large number of corpora are represented, stored, and published in XML format, finding useful information in XML databases has become an increasingly important issue. Keyword search lets Web users access XML data easily, without learning a structured query language or studying complex data schemas. Most existing indexing strategies for XML keyword search are based on Dewey encoding. In this paper, we propose a new encoding method for XML documents called Level Order and Father (LAF). Using LAF encoding, we devise a new index structure, the two-layer LAF inverted index, which greatly reduces space complexity compared with a Dewey-encoding-based inverted index. Furthermore, on top of the two-layer LAF inverted index, we propose a new keyword query algorithm, the Algorithm based on Binary Search (ABS), which quickly finds all Smallest Lowest Common Ancestors (SLCAs). We experimentally evaluate the two-layer LAF inverted index and the ABS algorithm on four real XML data sets selected from Wikipedia. The experimental results demonstrate the advantages of our indexing method and query algorithm: the two-layer LAF index consumes less than half the space of the Dewey inverted index, and ABS is about one to two orders of magnitude faster than the classic Stack algorithm. Concurrency and Computation: Practice and Experience, 2012. © 2012 Wiley Periodicals, Inc.
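
The LAF encoding is not described in enough detail here to reproduce. For orientation, the sketch below shows the Dewey-label setting that the paper compares against: it computes SLCAs with binary search, in the spirit of the well-known Indexed Lookup approach. It is not the paper's ABS algorithm. Dewey labels are modeled as integer tuples, so document order is tuple order and the LCA of two nodes is their common prefix. For each node in the shortest keyword list, binary search finds its left and right matches in every other list and keeps the deeper LCA. Candidates that are ancestors of other candidates are then discarded.

```python
from bisect import bisect_left, bisect_right


def lca(a, b):
    """Lowest common ancestor of two Dewey labels: their common prefix."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def closest_lca(x, s):
    """Deepest LCA of node x with any node of the sorted Dewey list s."""
    best = ()
    i = bisect_right(s, x)
    if i > 0:                      # left match: last node <= x
        best = lca(x, s[i - 1])
    j = bisect_left(s, x)
    if j < len(s):                 # right match: first node >= x
        cand = lca(x, s[j])
        if len(cand) > len(best):
            best = cand
    return best


def slca(keyword_lists):
    """Smallest LCAs of the given keyword node lists (each a list of Dewey tuples)."""
    if not keyword_lists or any(len(l) == 0 for l in keyword_lists):
        return []
    lists = sorted((sorted(l) for l in keyword_lists), key=len)
    candidates = set()
    for v in lists[0]:
        x = v
        for s in lists[1:]:
            x = closest_lca(x, s)
        candidates.add(x)

    def is_ancestor(a, b):
        return len(a) < len(b) and b[:len(a)] == a

    # Quadratic ancestor filtering keeps the sketch short.
    return sorted(c for c in candidates if not any(is_ancestor(c, d) for d in candidates))
```

In a document where "xml" occurs at 0.0.0 and 0.1.0 and "lee" at 0.0.1 and 0.2.0, the only SLCA is 0.0, the element containing both keywords most tightly. The root 0 also contains both keywords, but it is discarded because it is an ancestor of 0.0.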

12.
A brief introduction to XQuery and its application in .NET   Cited by: 3; self-citations: 0; others: 3
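XML is a general-purpose markup language that can mark up the information content of many kinds of data sources, including structured and semi-structured documents, relational databases, and object repositories. Whether such data are physically stored as XML or presented as XML through middleware, they can be expressed and queried intelligently with a structured XML query language. XQuery is a query language designed for XML and can be applied broadly to many types of XML data sources.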

13.
People who classify and identify things based on their observable or deducible properties (called “characters” by biologists) can benefit from databases and keys that help them name a specimen. This paper discusses our approach to generating an identification tool based on the field guide concept. Our software accepts character lists either expressed as XML (which biologists rarely provide knowingly, although most databases can now export XML) or via ODBC connections to the data author's relational database. The software then produces an Electronic Field Guide (EFG) implemented as a collection of Java servlets. The resulting guide answers queries made locally to a back end, or to Internet data sources via HTTP, and returns XML. If, however, the query client requires HTML (e.g., if the EFG is responding to a human-centric browser interface that we or the remote application provide), or if some specialized XML is required, the EFG forwards the XML to a servlet that applies an XSLT transformation to provide the look and feel that the client application requires. We compare our approach with the architecture of other taxon identification tools. Finally, we discuss how we combine this service with other biodiversity data services on the Web to build integrated applications.

14.
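When using keyword queries, users may not express their intentions accurately; and even when they do, the query engine may fail to return accurate results. To address this problem, this paper focuses on how to perform effective query rewriting in XML keyword search and how to generate meaningful results. It proposes four query rewriting operations and the notion of a query rewriting cost, and gives a dynamic programming method for computing the rewriting cost. To find the optimal rewriting, it presents a stack-based query rewriting and result generation algorithm, together with a partition-based optimization algorithm. Finally, extensive experiments validate the proposed methods.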

15.
Search engine selection based on related term sets   Cited by: 1; self-citations: 0; others: 1
Ou Jie. Computer Science (Jisuanji Kexue), 2003, 30(7): 56-59
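1 Introduction. Since its emergence in 1991, the Web has grown into a huge global information space, and the volume of information on it is still growing exponentially. Given such massive Web information resources, how to retrieve Web information effectively, helping users find the subset of documents useful for a given query within a large document collection, has become an important and pressing research topic.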

16.
Domain-specific Web search with keyword spices   Cited by: 4; self-citations: 0; others: 4
Domain-specific Web search engines are effective tools for reducing the difficulty of acquiring information from the Web. Existing methods for building domain-specific Web search engines require human expertise or specific facilities. However, a domain-specific search engine can be built simply by adding domain-specific keywords, called "keyword spices," to the user's input query and forwarding it to a general-purpose Web search engine. Keyword spices can be discovered effectively from Web documents using machine learning techniques. The paper describes domain-specific Web search engines that use keyword spices to locate recipes, restaurants, and used cars.
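
The mechanism above amounts to rewriting the user's query before forwarding it. The minimal sketch below assumes that a spice is represented as a Boolean expression over keywords in disjunctive normal form, a representation chosen here for illustration. It also assumes a query syntax in which juxtaposition means AND, a leading `-` means NOT, and `OR` joins alternatives; real engines' syntax varies. The recipe spice itself is invented, not one learned in the paper.

```python
# A keyword spice as a disjunction of conjunctions (DNF); each literal is
# (term, must_be_present). This recipe spice is hypothetical.
RECIPE_SPICE = [
    [("tablespoon", True), ("recipe", True)],
    [("teaspoon", True), ("restaurant", False)],
]


def render_spice(spice):
    """Render the DNF spice in an assumed generic query syntax."""
    clauses = []
    for conj in spice:
        lits = [t if present else f"-{t}" for t, present in conj]
        clauses.append("(" + " ".join(lits) + ")")
    return " OR ".join(clauses)


def spiced_query(user_query, spice):
    """The query actually forwarded to the general-purpose engine."""
    return f"{user_query} ({render_spice(spice)})"


def matches(doc_terms, spice):
    """Local check of whether a document's term set satisfies the spice."""
    return any(all((t in doc_terms) == present for t, present in conj) for conj in spice)
```

For the user query "curry", the forwarded query becomes `curry ((tablespoon recipe) OR (teaspoon -restaurant))`. The `matches` helper shows how the same expression could be evaluated against labeled pages when learning or validating a spice.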

17.
Databases deepen the Web   Cited by: 2; self-citations: 0; others: 2
Ghanem, T.M.; Aref, W.G. Computer, 2004, 37(1): 116-117
The Web has become the preferred medium for many database applications, such as e-commerce and digital libraries. These applications store information in huge databases that users access, query, and update through the Web. Database-driven Web sites have their own interfaces and access forms for creating HTML pages on the fly. Web database technologies define how these forms connect to and retrieve data from database servers. The number of database-driven Web sites is increasing exponentially, and each site creates pages dynamically; such pages are hard for traditional search engines to reach. Those search engines crawl and index static HTML pages; they do not send queries to Web databases. The information hidden inside Web databases is called the "deep Web," in contrast to the "surface Web" that traditional search engines access easily. We expect deep Web search engines and technologies to improve rapidly and to dramatically affect how the Web is used by providing easy access to many more information resources.

18.
Secure XML query answering, which protects data privacy, and semantic caching, which speeds up XML query answering, are two active research topics in XML database systems. Although each has been explored in depth, they have not been studied together; that is, semantic caching for secure XML query answering has not yet been addressed. In this paper, we combine these two aspects and propose an efficient semantic cache framework for secure XML query answering, which can improve the performance of XML database systems in secure settings. Our framework combines access control and user privilege management over XML data with state-of-the-art semantic XML query caching techniques, so that data are presented only to authorized users, and efficiently. To the best of our knowledge, our approach is among the first efforts to combine caching and security in XML databases to improve system performance. Comprehensive experiments verify the efficiency of the framework.
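
To show how semantic caching and access control can interact, here is a deliberately tiny sketch; it is not the paper's framework. It makes three simplifying assumptions. Queries are a path plus a numeric upper-bound predicate (`value < bound`). Access control is a per-role set of readable paths. Cache entries are keyed by `(role, path)`, so answers computed for one role are never served to another. A new query is answered from the cache when its bound is no larger than the cached bound, because its answers are then contained in the cached answers. The class and attribute names are hypothetical.

```python
class SecureSemanticCache:
    """Toy semantic cache for 'path with value < bound' queries, scoped by role."""

    def __init__(self, data, acl):
        self.data = data          # path -> list of (node_id, value)
        self.acl = acl            # role -> set of readable paths
        self.cache = {}           # (role, path) -> (bound, results)
        self.backend_calls = 0    # counts evaluations against the database

    def _evaluate(self, role, path, bound):
        """Evaluate against the database, enforcing access control first."""
        self.backend_calls += 1
        if path not in self.acl.get(role, set()):
            return []
        return [(n, v) for n, v in self.data.get(path, []) if v < bound]

    def query(self, role, path, bound):
        entry = self.cache.get((role, path))
        if entry is not None and bound <= entry[0]:
            # Contained query: every answer with v < bound is among the cached
            # answers with v < entry bound, so filter locally.
            return [(n, v) for n, v in entry[1] if v < bound]
        results = self._evaluate(role, path, bound)
        self.cache[(role, path)] = (bound, results)
        return results
```

After a staff user asks for prices below 100, a follow-up query for prices below 50 is answered without touching the database, while a guest's identical query is evaluated separately and returns nothing.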

19.
Keyword query processing over graph-structured data benefits a wide range of real-world applications. In keyword search over graphs, the basic unit of search and retrieval is a structure (an interconnection of nodes) that connects all the query keywords. This answering paradigm, in contrast to the single Web pages returned by search engines, raises new challenges for ranking. In this paper, we propose FRank, a simple but effective ranking measure based on fuzzy set theory. Fuzzy sets capture the contribution of each query keyword individually when quantifying node relevance. A novel aggregation operator is defined to combine the content-relevance fuzzy sets and compute query-dependent edge weights. The final rank of an answer is computed by non-monotonic addition of the edge weights according to their relevance to the keyword query. FRank evaluates each answer based on the distribution of the query keywords and the structural connectivity between them. An extensive empirical analysis shows that the proposed ranking measure outperforms the ranking measures adopted by current approaches in the literature.

20.
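The Internet is widely used, and the volume of database information published and retrieved on the Web is growing rapidly. Moreover, this data is based on XML technology, so traditional search engines can no longer meet the needs of the Web. To address this, the paper exploits the advantages of XML and, drawing on various established theories, applies XML technology to search engine research. Practical results show that a search engine using a software-component query algorithm and XML-based data processing achieves efficient, fast, and accurate retrieval, and better addresses the current problem of low accuracy and relevance in Web search results.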
