Similar Documents
Found 20 similar documents (search time: 0 ms)
1.
Identifying and interpreting user intent are fundamental to semantic search. In this paper, we investigate the association of intent with the individual words of a search query. We propose that words in queries can be classified as either content or intent, where content words represent the central topic of the query, while users add intent words to make their requirements more explicit. We argue that intelligent processing of intent words can be vital to improving result quality, and in this work we focus on intent word discovery and understanding. Our approach to intent word detection is motivated by the hypothesis that query intent words satisfy certain distributional properties in large query logs, similar to function words in natural language corpora. Following this idea, we first demonstrate the effectiveness of our corpus distributional features, namely word co-occurrence counts and entropies, for function word detection in five natural languages. Next, we show that reliable detection of intent words in queries is possible using these same features computed from query logs. To make the distinction between content and intent words more tangible, we additionally provide operational definitions: content words are those that should match, and intent words those that need not match, in the text of relevant documents. In addition to a standard evaluation against human annotations, we also provide an alternative validation of our ideas using clickthrough data. The concordance of the two orthogonal evaluation approaches lends further support to our original hypothesis of the existence of two distinct word classes in search queries. Finally, we provide a taxonomy of intent words derived through rigorous manual analysis of large query logs.
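The two corpus features named in the abstract, word co-occurrence counts and entropies, can be sketched on a toy query log as follows; the exact feature definitions here are assumptions for illustration, not the paper's formulation:

```python
import math
from collections import Counter, defaultdict

def cooccurrence_features(queries):
    """For each word, compute its total co-occurrence count and the
    entropy of its co-occurrence distribution. Intent-like words are
    hypothesized to score high on both, like function words do."""
    co = defaultdict(Counter)
    for q in queries:
        words = q.lower().split()
        for w in words:
            for other in words:
                if other != w:
                    co[w][other] += 1
    feats = {}
    for w, ctr in co.items():
        total = sum(ctr.values())
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in ctr.values())
        feats[w] = (total, entropy)
    return feats

log = ["how to install python", "how to boil eggs", "python tutorial",
       "eggs recipe", "how to learn guitar"]
feats = cooccurrence_features(log)
```

In such a log, generic intent-like words ("how", "to") co-occur with many different words and score higher on both features than topical content words ("eggs").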

2.
Recognizing Complex Named Entities in Search Engine User Queries   Cited by: 1 (self-citations: 0; citations by others: 1)
Named-entity recognition (NER) is fundamental to natural language processing and information retrieval. Much of the existing literature concentrates on recognizing person, place, and organization names, and rarely addresses more complex named entities such as book and movie titles. This work focuses on recognizing such complex named entities appearing in the query logs of a search engine. Using the web contexts in which a user's query appears, the query is first coarsely segmented, and the same web data is then used as training material for a complex named-entity classifier. Experiments with three different classifiers confirm that the method achieves quite good results.

3.
The process of ranking (scoring) has been used to make billions of financing decisions each year, serving an industry worth hundreds of billions of dollars. To a lesser extent, ranking has also been used by U.S. universities to process hundreds of millions of applications, resulting in over 15 million college admissions in the year 2000 for a total revenue of over $250 billion. College admissions are expected to reach over 17 million by the year 2010, for a total revenue of over $280 billion. In this paper, we introduce fuzzy query and fuzzy aggregation as an alternative for ranking and predicting risk in credit scoring and university admissions, which currently rely on an imprecise and subjective process. In addition, we introduce the BISC Decision Support System. Its key features for Internet applications are (1) intelligently using the vast amounts of important data in organizations in an optimal way as a decision support system, and (2) intelligently and securely sharing a company's data, internally and with business partners and customers, in a form that end users can process quickly.
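As a rough illustration of fuzzy aggregation for an admissions-style score (the criteria, membership functions, and weights below are invented for the example; the BISC system itself is far richer):

```python
def tri(x, a, b, c):
    """Triangular fuzzy membership function: 0 at a and c, 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_score(applicant, weights):
    """Aggregate per-criterion membership degrees with a weighted
    average, one common fuzzy-aggregation operator."""
    degrees = {
        "gpa": tri(applicant["gpa"], 2.0, 4.0, 4.5),
        "test": tri(applicant["test"], 800, 1600, 1700),
    }
    return sum(weights[k] * degrees[k] for k in degrees) / sum(weights.values())

# A 3.6 GPA maps to degree 0.8, a 1400 test score to 0.75:
score = fuzzy_score({"gpa": 3.6, "test": 1400}, {"gpa": 0.6, "test": 0.4})
```

Unlike a hard cutoff, the membership degrees let "almost qualifying" values contribute partially to the final rank.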

4.
5.
Gu Yisheng, Zeng Guosun. 《计算机应用》 (Journal of Computer Applications), 2017, 37(10): 2958-2963
To address the problem that keyword-only search cannot accurately locate reusable source code when writing software, a source-code search method combining syntax and semantics is proposed. First, exploiting the objectivity and uniqueness of source-code syntax and semantics, the syntactic structure and the "input/output" semantics are added as part of the user's query, and a concrete request format is specified. On this basis, algorithms are designed for source-code syntax matching, "input/output" semantic matching, keyword-compatible matching, and computing the credibility of source-code search results. Finally, these algorithms are combined to realize accurate source-code search. Test results show that, compared with plain keyword search, the proposed method improves the Mean Reciprocal Rank (MRR) of search results by more than 62%, which helps achieve accurate source-code search.

6.
The similarity search problem has received considerable attention in the database research community. In sensor network applications, this problem is even more important due to the imprecision of the sensor hardware and the variation of environmental parameters. Traditional similarity search mechanisms are both ill-suited and inefficient for these highly energy-constrained sensors. One difficulty is that it is hard to predict which sensor holds the most similar (or closest) data item, so that many or even all sensors need to send their data to the query node for further comparison. In this paper, we propose a similarity search algorithm (SSA), a novel framework based on the concept of the Hilbert curve over a data-centric storage structure, for efficiently processing similarity search queries in sensor networks. SSA avoids the need to collect data from all sensors in the network when searching for the most similar data item. The performance study reveals that this mechanism is highly efficient and significantly outperforms previous approaches in processing similarity search queries.
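The Hilbert-curve linearization underlying SSA can be illustrated with the standard bit-twiddling mapping from grid coordinates to curve position (a textbook construction; SSA's data-centric storage details go well beyond this):

```python
def xy2d(n, x, y):
    """Map a point (x, y) on an n x n grid (n a power of two) to its
    position along the Hilbert space-filling curve. Nearby points tend
    to receive nearby 1-D indices, which is what lets a similarity
    query probe a small index range instead of every sensor."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Sensors store readings keyed by Hilbert index; a query for the most
# similar reading searches outward from the query point's own index.
readings = {(0, 0): 17.2, (0, 1): 17.4, (3, 3): 25.1}
indexed = sorted((xy2d(4, x, y), v) for (x, y), v in readings.items())
```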

7.
Accompanying the growth of the Internet, computers throughout the world can connect to each other and exchange information, increasing the convenience and efficiency of information-based work. The advent of data-sharing applications such as Napster and Gnutella has made peer-to-peer (P2P) systems popular for the widespread exchange of resources and voluminous information among millions of users. In recent years, research issues associated with P2P systems have been discussed widely. To resolve the file-availability problem and balance the workload, a method called the Distributed Hash Table (DHT) has been proposed. However, DHT-based systems with structured architectures cannot efficiently support complex queries, such as similarity, range, and partial-match queries, due to the characteristics of the hash function. This study presents a novel scheme that supports filename partial-matches in structured P2P systems. The proposed approach supports complex queries and guarantees result quality. Experimental results demonstrate the effectiveness of the proposed approach.
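One common way to support filename partial-matches over a hash-based key space is to index character n-grams of each name. The sketch below uses a plain dict in place of the DHT overlay and illustrates only the general idea, not the paper's actual scheme:

```python
from collections import defaultdict

def ngrams(s, n=3):
    """Set of character n-grams of a string (lowercased)."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

class PartialMatchIndex:
    """Toy inverted index keyed by filename trigrams. In a DHT, each
    trigram key would hash to a responsible node; here one dict stands
    in for the whole overlay."""
    def __init__(self):
        self.index = defaultdict(set)

    def publish(self, filename):
        for g in ngrams(filename):
            self.index[g].add(filename)

    def search(self, pattern):
        grams = ngrams(pattern)
        if not grams:
            return set()
        # A file matches a partial pattern only if it holds all of the
        # pattern's trigrams, so intersect the posting sets.
        return set.intersection(*(self.index.get(g, set()) for g in grams))

idx = PartialMatchIndex()
idx.publish("holiday_photos.zip")
idx.publish("report_final.doc")
```

A query such as `idx.search("photos")` then resolves to the files whose names contain the substring, using only exact-key lookups underneath.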

8.
Medium and large clusters incorporating hybrid CPU/graphics processing unit (GPU) nodes are present in many datacenters today. They can accelerate many different kinds of applications and are well suited to applications dealing with high volumes of data. This is the case for the similarity search problem, where large databases must be managed while responding very quickly to hundreds or thousands of queries per second. However, the design and usage of heterogeneous computing platforms pose big challenges, as system size, energy consumption, task mapping, and scheduling, among other concerns, must be handled efficiently. In this paper we focus on the scheduling issue: distributing incoming queries among all the processing components in the cluster nodes. Our algorithms exploit the computational resources by simultaneously processing queries on CPU cores and on GPUs. Thus, we address the problem of how to distribute queries over the whole system to obtain the best performance, using a heuristic that automatically provides the best distribution. Experimental results show the benefits, in terms of execution time and energy savings, of using an appropriate scheduling scheme.
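A minimal version of such a scheduler is a greedy earliest-finish-time dispatcher that tracks when each CPU core or GPU becomes free. This is a deliberately simple heuristic for illustration (device names and costs invented); the paper's distribution heuristic is derived automatically:

```python
import heapq

def schedule(queries, devices):
    """Dispatch each query (given as work units) to the processing
    unit expected to finish it soonest. `devices` maps device name to
    estimated cost per work unit (GPUs get a lower cost)."""
    heap = [(0.0, name) for name in devices]      # (busy-until, device)
    heapq.heapify(heap)
    assignment = []
    for q in queries:
        busy, name = heapq.heappop(heap)
        finish = busy + q * devices[name]
        assignment.append((q, name, finish))
        heapq.heappush(heap, (finish, name))      # device busy until then
    return assignment

plan = schedule([4, 4, 2, 1], {"gpu0": 0.25, "cpu0": 1.0})
```

Even this greedy rule keeps both device classes busy; real schedulers also weigh energy cost and transfer overheads per query.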

9.
Source code examples are used by developers to implement unfamiliar tasks by learning from existing solutions. To better support developers in finding existing solutions, code search engines are designed to locate and rank code examples relevant to users' queries. Essentially, a code search engine provides a ranking schema, which combines a set of ranking features to calculate the relevance between a query and candidate code examples, so that relevant code examples appear at the top of the result list. However, it is difficult to configure a ranking schema well by hand. In this paper, we propose a code example search approach that applies a machine learning technique to automatically train a ranking schema, which we then use to rank candidate code examples for new queries at run time. We evaluate the ranking performance of our approach using a corpus of over 360,000 code snippets crawled from 586 open-source Android projects. The evaluation shows that the learning-to-rank approach effectively ranks code examples and outperforms the existing ranking schemas by about 35.65% and 48.42% in terms of the normalized discounted cumulative gain (NDCG) and expected reciprocal rank (ERR) measures, respectively.
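The NDCG measure used in the evaluation can be computed as follows (the standard definition with exponential gain; the paper may use a different gain or discount variant):

```python
import math

def ndcg(relevances, k=None):
    """Normalized discounted cumulative gain for one ranked list.
    `relevances` are graded relevance judgments in ranked order;
    higher grades placed earlier yield a score closer to 1."""
    rel = relevances[:k] if k else relevances
    dcg = sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rel))
    # Ideal DCG: the same judgments sorted best-first.
    ideal = sorted(relevances, reverse=True)[:len(rel)]
    idcg = sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

perfect = ndcg([3, 2, 1, 0])     # ideal ordering scores 1.0
inverted = ndcg([0, 1, 2, 3])    # worst ordering scores lower
```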

10.
11.
12.
We study the problem of generating efficient, equivalent rewritings using views to compute the answer to a query. We take the closed-world assumption, in which views are materialized from base relations, rather than views describing sources in terms of abstract predicates, as is common under the open-world assumption. In the closed-world model, there can be an infinite number of different rewritings that compute the same answer yet have quite different performance. Query optimizers take a logical plan (a rewriting of the query) as input and generate efficient physical plans to compute the answer. Thus our goal is to generate a small subset of the possible logical plans without missing an optimal physical plan. We first consider a cost model that counts the number of subgoals in a physical plan, and show a search space that is guaranteed to include an optimal rewriting if the query has a rewriting in terms of the views. We also develop an efficient algorithm for finding rewritings with the minimum number of subgoals. We then consider a cost model that counts the sizes of intermediate relations of a physical plan, without dropping any attributes, and give a search space for finding optimal rewritings. Our final cost model allows attributes to be dropped in intermediate relations. We show that, by careful variable renaming, it is possible to do better than the standard "supplementary relation" approach, by dropping attributes that the latter approach would retain. Experiments show that our algorithm for generating optimal rewritings has good efficiency and scalability.

13.
Performing complex analysis on top of massive data stores is essential to most modern enterprises and organizations, and requires simple, flexible, and powerful syntactic constructs to express complex decision support queries naturally and succinctly. In addition, these linguistic features have to be coupled with appropriate evaluation and optimization techniques in order to compute such queries efficiently. In this article we review the concept of the grouping variable and describe a simple SQL extension to match it. We show that this extension enables the facile expression of a large class of practical data analysis queries. Besides syntactic simplicity, grouping variables can be neatly modeled in relational algebra via a relational operator called MD-join. MD-join combines joins and group-bys (a frequent pattern in decision support queries) into one operator, allowing novel evaluation and optimization techniques. By making explicit how joins interact with group-bys, we provide the optimizer with enough information to use specific algorithms and employ appropriate optimization plans that were not easily detectable previously. Several experiments demonstrate substantial performance improvements, in some cases of one or two orders of magnitude. The work on grouping variables has influenced at least one commercial system and the standardization of ANSI SQL, and implementations of it have been studied in the context of telecom applications, medical and bio-informatics, finance, and others. Finally, current work studies the potential of grouping variables for formulating decision support queries over data streams, one of the latest research trends in the database community.
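The kind of query that grouping variables target, for example "for each customer, the months whose sales exceed that customer's average", combines a group-by with a join back to the detail rows. The small Python emulation below shows the effect an MD-join computes in one operator (illustrative only; column names and data are invented):

```python
from collections import defaultdict

# Toy sales table: (customer, month, amount)
sales = [("ann", 1, 10), ("ann", 2, 30), ("bob", 1, 20), ("bob", 2, 20)]

# Pass 1 (the group-by side): per-customer average.
totals, counts = defaultdict(float), defaultdict(int)
for cust, month, amount in sales:
    totals[cust] += amount
    counts[cust] += 1
avg = {c: totals[c] / counts[c] for c in totals}

# Pass 2 (the join side): compare each detail row to its group's
# aggregate -- the comparison an MD-join folds into the grouping.
above = [(c, m) for c, m, a in sales if a > avg[c]]
```

In plain SQL this requires a correlated subquery or a self-join with a derived table; the grouping-variable extension expresses it in one block, which is what gives the optimizer room for the reported speedups.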

14.
We have already proposed a new concept of “universal multimedia access” intended to narrow the digital divide by providing appropriate multimedia expressions according to users’ (mental and physical) abilities, computer facilities, and network environments. In this article, we redefine the switching functions for our new concept of universal design-based multimedia access, and discuss its user interface for supporting users in accordance with their abilities, computer facilities, and network environments.

15.
16.
17.
Designers require knowledge and data about users to effectively evaluate product accessibility during the early stages of design. This paper addresses this problem by setting out the sensory, cognitive and motor dimensions of user capability that are important for product interaction. The relationship between user capability and product demand is used as the underlying conceptual model for product design evaluations and for estimating the number of people potentially excluded from using a given product.

18.

Historically, research in the Multimedia community has focused on output modalities, through studies on timing and multimedia processing. The Multimodal Interaction community, on the other hand, has focused on user-generated modalities, through studies on Multimodal User Interfaces (MUI). In this paper, aiming to assist the development of multimedia applications with MUIs, we propose integrating concepts from those two communities in a single high-level programming framework. The framework integrates user modalities, both user-generated (e.g., speech, gestures) and user-consumed (e.g., audiovisual, haptic), in declarative programming languages for the specification of interactive multimedia applications. To illustrate our approach, we instantiate the framework in the NCL (Nested Context Language) multimedia language. NCL is the declarative language for developing interactive applications for Brazilian Digital TV and an ITU-T Recommendation for IPTV services. To help evaluate our approach, we discuss a usage scenario and implement it as an NCL application extended with the proposed multimodal features. We also compare the expressiveness of the multimodal NCL against existing multimedia and multimodal languages, for both input and output modalities.


19.
《Advanced Robotics》2013,27(5):499-517
We are developing a helper robot that carries out tasks ordered by users through speech. The robot needs a vision system to recognize the objects mentioned in the orders. However, conventional vision systems cannot recognize objects in complex scenes: they may find many objects and cannot determine which is the target. This paper proposes a method of using a conversation with the user to solve this problem. The robot asks questions that the user can easily answer and whose answers efficiently reduce the number of candidate objects. It considers the characteristics of the features used for object identification, such as how easily humans can name them, to generate a user-friendly and efficient sequence of questions. Experimental results show that the robot can detect target objects by asking the questions generated by the method.
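The question-selection idea can be approximated greedily: ask about the attribute whose worst-case answer leaves the fewest candidate objects. The objects and attribute names below are invented, and the paper's criterion additionally weighs how easily humans can name each feature:

```python
from collections import Counter

def best_question(candidates, attributes):
    """Pick the attribute whose answers split the remaining candidates
    most evenly, so any answer eliminates as many objects as possible."""
    def worst_case(attr):
        counts = Counter(obj[attr] for obj in candidates)
        return max(counts.values())   # candidates left after worst answer
    return min(attributes, key=worst_case)

objects = [
    {"name": "mug",    "color": "red",  "shape": "cylinder"},
    {"name": "ball",   "color": "red",  "shape": "sphere"},
    {"name": "bottle", "color": "blue", "shape": "cylinder"},
    {"name": "cup",    "color": "blue", "shape": "cylinder"},
]
# "color" splits 2/2 while "shape" splits 3/1, so color is asked first.
q = best_question(objects, ["color", "shape"])
```

After the user answers, the candidate list is filtered and the selection repeats until one object remains.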

20.
The following search game is considered: there are two players, say Paul (the questioner) and Carole (the responder). Carole chooses a number x* ∈ S_n = {1, 2, …, n}, Paul has to find the number x* by asking q-ary bi-interval queries, and Carole is allowed to lie at most once throughout the game. The minimum worst-case number LB(n, q, 1) of q-ary bi-interval queries necessary to guess the number x* is determined exactly for all integers n ≥ 1 and q ≥ 2. It turns out that LB(n, q, 1) coincides with the minimum worst-case number L(n, q, 1) of arbitrary q-ary queries.
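For arbitrary q-ary queries with one lie, the classical Berlekamp volume argument gives a lower bound that serves as a benchmark for L(n, q, 1). The sketch below computes the smallest number of queries satisfying that bound; this is the standard volume bound, not the paper's exact formula for LB(n, q, 1):

```python
def volume_bound(n, q):
    """Smallest t with n * (1 + t*(q-1)) <= q**t: after t q-ary queries
    each of the n candidate values can still be paired with "no lie yet"
    or with one of t*(q-1) lie positions, and all these states must fit
    into the q**t possible answer sequences."""
    t = 0
    while n * (1 + t * (q - 1)) > q ** t:
        t += 1
    return t
```

For the classical binary case of Ulam's problem (n = 10^6, q = 2, one lie) the bound evaluates to 25 questions.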
