Similar literature
 20 similar articles found
1.
Hundreds of millions of users submit queries to Web search engines each day. User queries are typically very short, which makes query understanding a challenging problem. In this paper, we propose a novel approach for query representation and classification. By submitting the query to a web search engine, the query can be represented as a set of terms found on the web pages returned by the search engine. In this way, each query can be considered as a point in a high-dimensional space, and standard classification algorithms such as regression can be applied. However, traditional regression is too flexible in situations with large numbers of highly correlated predictor variables, and it may suffer from overfitting. By using search click information, the semantic relationship between queries can be incorporated into the learning system as a regularizer. Specifically, from all the functions that minimize the empirical loss on the labeled queries, we select the one that best preserves the semantic relationship between queries. We present experimental evidence suggesting that the regularized regression algorithm is able to use search click information effectively for query classification.
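The idea of adding a click-based regularizer to regression can be illustrated with a small sketch. The abstract does not give the exact objective, so the code below assumes a common form (Laplacian-regularized least squares): squared loss on the labeled queries plus a penalty that pulls together the scores of queries linked by clicks. The function name, the similarity matrix `W`, and the toy data are all hypothetical.

```python
import numpy as np

def graph_regularized_regression(X, y_labeled, labeled_idx, W, lam=1.0, ridge=1e-3):
    """Least squares on labeled queries plus a graph smoothness penalty.

    Assumed objective (not taken from the paper):
        sum_{i labeled} (x_i.w - y_i)^2 + lam * f^T L f + ridge * |w|^2,
    where f = X w and L is the Laplacian of the click-similarity graph W.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W              # unnormalized graph Laplacian
    Xl = X[labeled_idx]                         # rows of the labeled queries
    yl = np.asarray(y_labeled, dtype=float)
    A = Xl.T @ Xl + lam * (X.T @ L @ X) + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, Xl.T @ yl)        # closed-form minimizer

# Toy data: four queries, each with its own term; queries 0 and 1 are labeled.
X = np.eye(4)
W = np.zeros((4, 4))
W[0, 2] = W[2, 0] = 1.0   # queries 0 and 2 share clicks (hypothetical)
W[1, 3] = W[3, 1] = 1.0   # queries 1 and 3 share clicks (hypothetical)

w = graph_regularized_regression(X, [1.0, -1.0], [0, 1], W)
scores = X @ w            # unlabeled queries inherit the sign of their neighbor
```

In this toy setup the unlabeled queries share no terms with the labeled ones, so plain least squares would score them near zero; the graph penalty is what propagates the labels along the click links.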

2.
3.
The tremendous increase in user-generated content (UGC) published on the web in natural language poses a formidable challenge to automated information extraction (IE) and content analysis (CA). Techniques based on tree kernels (TK) have been used successfully to model semantic compositionality in many natural language processing (NLP) applications. Essentially, these techniques obtain the similarity of two production rules by exact string comparison between peer nodes. As a result, semantically identical tree fragments are excluded even though they can contribute to the similarity of two trees. A mechanism is needed that accounts for the similarity of rules that differ in syntax and vocabulary but hold relatively analogous knowledge. In this paper, we propose a hierarchical framework based on the document object model (DOM) tree and linguistic kernels that jointly addresses subjectivity detection, opinion extraction and polarity classification. The model proceeds in three stages. In the first stage, the content of each DOM tree node is analysed to estimate the complexity of its vocabulary and syntax using a readability test. In the second stage, semantic tree kernels extended with word embeddings are used to classify nodes as containing subjective or objective content. Finally, the content found to be subjective is further examined for opinion polarity classification using fine-grained linguistic kernels. The efficiency of the proposed model is demonstrated through a series of experiments. The results reveal that the proposed polarity-enriched tree kernel (PETK) yields better prediction performance than conventional tree kernels.

4.
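Most existing spatial keyword query processing approaches support only location proximity and textual similarity matching; they cannot return objects that are semantically close to the query but do not match it in form. Moreover, current spatial-textual index structures cannot handle the numeric attributes of spatial objects. To address these problems, this paper proposes a spatial keyword query method that supports semantic approximate queries. First, word embedding techniques are used to expand the user's original query, generating a series of query keywords that are semantically related to the original keywords. Next, a hybrid index structure, AIR-Tree, is proposed that supports both textual and semantic matching and uses the Skyline method to process numeric attributes. Finally, AIR-Tree is used for query matching, returning the top-k ranked spatial objects most relevant to the query conditions. Experimental analysis and results show that, compared with existing methods of the same kind, the proposed method achieves higher execution efficiency and better user satisfaction; query efficiency with the AIR-Tree index is 3.6% higher than with the IRS-Tree index, and query result accuracy is 10.14% and 16.15% higher than with the IR-Tree and IRS-Tree indexes, respectively.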

5.
Keyword search is an effective paradigm for information discovery and has recently been introduced for querying XML documents. Scoring of XML search results is an important issue in XML keyword search. The traditional “bag-of-words” model cannot differentiate the roles of keywords or the relationships between them, and is thus not suitable for XML keyword queries. In this paper, we present a new scoring method based on a novel query model, called keyword query with structure (QWS), which is specially designed for XML keyword queries. The method is based on a new view of keyword queries taken by the QWS model, namely that a keyword query is a composition of several query units, each representing a query condition. We believe that this method captures the semantic relevance of the search results. The paper first introduces an algorithm for reformulating a keyword query as a QWS. Then, a scoring method is presented that measures the relevance of search results according to how many of the query conditions are matched and how well. The scoring method is also extended to clusters of search results. Experimental results verify the effectiveness of our methods.

6.
The expansion of the Internet has made the task of searching a crucial one. Internet users, however, have to make a great effort to formulate a search query that returns the required results. Many methods have been devised to assist in this task by helping users modify their queries to give better results. In this paper we propose an interactive method for query expansion. It is based on the observation that documents are often found to contain terms with high information content, which can summarise their subject matter. We present experimental results which demonstrate that our approach significantly shortens the time required to accomplish a given task by performing web searches.

7.
Content-based cross-media retrieval is a new type of multimedia retrieval in which the media types of the query examples and the returned results can be different. In order to learn the semantic correlations among multimedia objects of different modalities, the heterogeneous multimedia objects are analyzed in the form of multimedia documents (MMDs), where an MMD is a set of multimedia objects that are of different media types but carry the same semantics. We first construct an MMD semi-semantic graph (MMDSSG) by jointly analyzing the heterogeneous multimedia data. After that, a cross-media indexing space (CMIS) is constructed. For each query, the optimal dimension of the CMIS is automatically determined and cross-media retrieval is performed on a per-query basis. In this way, the most appropriate retrieval approach is selected for each query, i.e. different search methods are used for different queries. The query-dependent search methods make cross-media retrieval performance not only accurate but also stable. We also propose different learning methods for relevance feedback (RF) to improve performance. The experimental results are encouraging and validate the proposed methods.

8.
An approach to learning query-transformation rules based on analyzing the existing data in the database is proposed. A framework and a closure algorithm for learning rules from a given data distribution are described. The correctness, completeness, and complexity of the proposed algorithm are characterized, and a detailed example is provided to illustrate the framework.

9.
Classification is a procedure to separate data or alternatives into two or more classes. In practice, the need to classify alternatives involving multiple criteria into distinct classes is considerable. Therefore, determining how to assist decision makers in classifying alternatives into multiple classes is an important issue in the field of multiple-criteria decision aids. This study proposes a two-phase case-based distance approach to help decision makers classify alternatives into multiple groups. By incorporating the advantages of the case-based distance method, the proposed two-phase approach can classify alternatives by evaluating a set of cases selected by decision makers, reduce the number of misclassifications, improve multiple solution problems, and lessen the impact of outliers. An interactive classification procedure is also proposed to provide flexibility, so that decision makers can check and adjust classification results iteratively.

10.
This paper presents a novel, knowledge-based method for measuring semantic similarity in support of applications aimed at organizing and retrieving relevant textual information. We show how a quantitative context may be established for what is essentially qualitative in nature by effecting a topological transformation of the lexicon into a metric space where distance is well-defined. We illustrate the technique with a simple example and report on promising experimental results with a significant word similarity problem.

11.
Keyword query is an important means of finding object information in XML documents. Most existing keyword query approaches adopt the subtrees rooted at the smallest lowest common ancestors of the keyword matching nodes as the basic result units. The structural relationships among XML nodes are excessively emphasized, but the semantic relevance is not fully exploited. To change this situation, we propose the concept of the entity subtree and emphasize the semantic relevance among different nodes when querying information from XML. In our approach, keyword queries are extended into a new keyword-based query language, Grouping and Categorization Keyword Expression (GCKE), and the core query algorithm, finding entity subtrees (FEST), is proposed to return high-quality results by fully using the keyword semantics exposed by GCKE. We demonstrate the effectiveness and the efficiency of our approach through extensive experiments.
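For readers unfamiliar with the baseline this abstract criticizes, the following sketch shows one naive way to compute smallest lowest common ancestors (SLCAs). It is not the FEST algorithm, which the abstract does not describe in detail. It assumes XML nodes are identified by Dewey labels (tuples giving the child path from the root), and the function names and toy document are hypothetical. The brute-force enumeration is exponential in the number of keywords and is meant only to make the definition concrete.

```python
from itertools import product

def lca(labels):
    """Lowest common ancestor of Dewey labels: their longest common prefix."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)

def is_proper_ancestor(a, b):
    """True if node a is a proper ancestor of node b."""
    return len(a) < len(b) and b[:len(a)] == a

def slca(keyword_matches):
    """Return LCAs of keyword matches that have no descendant LCA.

    keyword_matches: one list of Dewey labels per keyword.
    """
    candidates = {lca(combo) for combo in product(*keyword_matches)}
    return sorted(
        c for c in candidates
        if not any(is_proper_ancestor(c, d) for d in candidates)
    )

# Toy document (hypothetical): root (0,) with children (0,0), (0,1), (0,2).
matches_xml = [(0, 0, 0), (0, 1, 0)]      # nodes containing "xml"
matches_smith = [(0, 0, 1), (0, 2, 0)]    # nodes containing "smith"
result = slca([matches_xml, matches_smith])
```

Here only the subtree rooted at `(0, 0)` contains both keywords without a smaller qualifying subtree inside it, so the root is discarded even though it also covers both keywords.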

12.
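To achieve interoperability among distributed spatial databases, distributed queries must be optimized. Such query processing concerns any data manipulation statement that accesses data on multiple nodes rather than only on the node that issued the query. This paper proposes an architecture for a query optimizer and discusses the optimization of such queries in detail, focusing on complex spatial queries that involve spatial selections and joins. A case-study program for a typical spatial database was built; analysis shows that a query optimizer with filtering and refinement has a clear efficiency advantage in both time and space, yielding results of reference value.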

13.
In this paper, we identify the problems of current semantic and hybrid search systems, which seek to bridge structured and unstructured data, and propose solutions. We introduce a novel input mechanism for hybrid semantic search that combines the clean and concise input mechanisms of keyword-based search engines with the expressiveness of the input mechanisms provided by semantic search engines. This interactive input mechanism can be used to formulate ontology-aware search queries without prior knowledge of the ontology. Furthermore, we propose a system architecture for automatically fetching relevant unstructured data, complementing structured data stored in a Knowledge Base, to create a combined index. This combined index can be used to conduct hybrid semantic searches which leverage information from structured and unstructured sources. We present the reference implementation Hybrid Semantic Search System (\(HS^3\)), which uses the combined index to put hybrid semantic search into practice and implements the interactive ontology-enhanced keyword-based input mechanism. For demonstration purposes, we apply \(HS^3\) to the tourism domain. We present performance test results and the results of a user evaluation. Finally, we provide instructions on how to apply \(HS^3\) to arbitrary domains.

14.
15.
16.
This research investigates an approach to query processing in a multidatabase system that uses an object-oriented model to capture the semantics of other data models. The object-oriented model is used to construct a global schema, defining an integrated view of the different schemas in the environment. The model is also used as a self-describing model to build a meta-database for storing information about the global schema. A unique aspect of this work is that the object-oriented model is used to describe the different data models of the multidatabase environment, thereby extending the meta-database with semantic information about the local schemas. With the global and local schemas all represented in an object-oriented form, structural mappings between the global schema and each local schema are then easily supported. An object algebra then provides a query language for expressing global queries, using the structural mappings to translate object algebra queries into SQL queries over local relational schemas. The advantage of using an object algebra is that the object-oriented database can be viewed as a blackboard for temporary storage of local data and for establishing relationships between different databases. The object algebra can be used to directly retrieve temporarily stored data from the object-oriented database or to transparently retrieve data from local sources using the translation process described in this paper.

17.
One of the useful tools offered by existing web search engines is query suggestion (QS), which assists users in formulating keyword queries by suggesting keywords that are unfamiliar to users, offering alternative queries that deviate from the original ones, and even correcting spelling errors. The design goal of QS is to enrich the web search experience of users and avoid the frustrating process of choosing controlled keywords to specify their special information needs, which eases the burden of creating web queries. Unfortunately, the algorithms and design methodologies of the QS module developed by Google, the most popular web search engine these days, are not publicly available, which means that software developers cannot duplicate them to build the tool for specifically designed software systems for enterprise search, desktop search, or vertical search, to name a few. Keywords suggested by Yahoo! and Bing, two other well-known web search engines, however, are mostly popular currently searched words, which might not meet the specific information needs of the users. These problems can be solved by WebQS, our proposed web QS approach, which provides the same mechanism offered by Google, Yahoo!, and Bing to support users in formulating keyword queries that improve the precision and recall of search results. WebQS relies on frequency of occurrence, keyword similarity measures, and modification patterns of queries in user query logs, which capture information on millions of searches conducted by millions of users, to suggest useful queries and query keywords during the user query construction process and achieve the design goal of QS. Experimental results show that WebQS performs as well as Yahoo! and Bing in terms of effectiveness and efficiency and is comparable to Google in terms of query suggestion time.

18.
The World Wide Web is growing continuously, and within the next few years Web content will increase tremendously. Hence, there is a great need for algorithms that can accurately classify Web pages. Automatic Web page classification is significantly different from traditional text classification because of the additional information provided by the HTML structure. Recently, several techniques have arisen from combinations of artificial intelligence and statistical approaches. However, it is not a simple matter to find an optimal classification technique for Web pages. This paper introduces a novel strategy for vertical Web page classification, called Classification using Multi-layered Domain Ontology (CMDO). It employs several Web mining techniques and depends mainly on the proposed multi-layered domain ontology. To improve classification accuracy, CMDO employs a distiller to reject pages related to other domains. CMDO also employs a novel classification technique called Graph Based Classification (GBC). The proposed GBC has features that other techniques do not have, such as outlier rejection and pruning. Experimental results show that CMDO outperforms recent techniques, achieving better precision, recall, and classification accuracy.

19.
Existing work on XML keyword search focuses on how to find relevant and meaningful data fragments for a query, assuming each keyword is intended as part of it. However, in XML keyword search, user queries usually contain irrelevant or mismatched terms, typos, etc., which may easily lead to empty or meaningless results. In this paper, we introduce the problem of content-aware XML keyword query refinement, where the search engine should judiciously decide whether a user query Q needs to be refined during the processing of Q, and find a list of promising refined query candidates which are guaranteed to have meaningful matching results over the XML data, without any user interaction or a second try. To achieve this goal, we build a novel content-aware XML keyword query refinement framework consisting of two core parts: (1) we build a query ranking model to evaluate the quality of a refined query RQ, which captures the morphological and semantic similarity between Q and RQ and the dependency of the keywords of RQ over the XML data; (2) we integrate the exploration of RQ candidates and the generation of their matching results as a single problem, which is solved optimally within a one-time scan of the related keyword inverted lists. Finally, an extensive empirical study verifies the efficiency and effectiveness of our framework.

20.
Among the developments of information technology, the most popular tools nowadays for seeking knowledge are keyword-based Internet search engines such as Google and Yahoo. Users can easily obtain the information they need, but they still have to read and organize the retrieved documents by themselves. For this reason, users spend most of their time browsing and skipping through the documents they have found. To facilitate this process, this paper proposes a query-based ontology knowledge acquisition system which dynamically constructs a query-based partial ontology to provide proficient answers to users’ queries. To construct the relationships and hierarchy of concepts in such an ontology, the formal concept analysis approach is adopted. After the ontology is built, the system can deduce the specific answer from the relationships and hierarchy of the ontology without asking users to read the whole document set. To evaluate the precision of the system in the experiment, we collected three kinds of sports news pages as source documents, covering the NBA, CPBL and MLB; the results show that the proposed approach works effectively.
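Formal concept analysis, the technique named in this abstract, derives a concept hierarchy from a table of objects and their attributes. The sketch below is a naive illustration of that general technique, not the system described in the paper: it enumerates every attribute subset and closes it into an (extent, intent) pair. The function name and the attribute labels attached to the three news categories are invented for illustration.

```python
from itertools import combinations

def formal_concepts(context):
    """Enumerate all formal concepts of a small object-attribute context.

    context: dict mapping each object to its set of attributes.
    Returns a set of (extent, intent) pairs of frozensets.
    Naive approach: exponential in the number of attributes.
    """
    objects = set(context)
    attributes = set().union(*context.values()) if context else set()

    def extent(attrs):
        # All objects that have every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    def intent(objs):
        # All attributes shared by every object in objs.
        if not objs:
            return frozenset(attributes)
        return frozenset(set.intersection(*(set(context[o]) for o in objs)))

    concepts = set()
    attr_list = sorted(attributes)
    for size in range(len(attr_list) + 1):
        for combo in combinations(attr_list, size):
            e = extent(set(combo))
            concepts.add((e, intent(e)))
    return concepts

# Hypothetical context: attributes are illustrative, not from the paper.
context = {
    "NBA": {"sport", "basketball"},
    "CPBL": {"sport", "baseball"},
    "MLB": {"sport", "baseball"},
}
concepts = formal_concepts(context)
```

Ordering the resulting concepts by inclusion of their extents gives the hierarchy (a concept lattice) from which more general and more specific concepts can be read off.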
