Found 20 similar documents; search took 15 ms
1.
The process of ranking (scoring) has been used to make billions of financing decisions each year, serving an industry worth
hundreds of billions of dollars. To a lesser extent, ranking has also been used to process hundreds of millions of applications
by U.S. universities, resulting in over 15 million college admissions in the year 2000 for a total revenue of over $250 billion.
College admissions are expected to reach over 17 million by the year 2010 for a total revenue of over $280 billion. In this
paper, we will introduce fuzzy query and fuzzy aggregation as an alternative for ranking and predicting the risk for credit
scoring and university admissions, which currently utilize an imprecise and subjective process. In addition, we will introduce
the BISC Decision Support System. The key features of the BISC Decision Support System for internet applications
are (1) to use the vast amounts of important data in organizations intelligently and optimally as a decision support system
and (2) to share company data intelligently and securely, both internally and with business partners and customers, so that it can be
processed quickly by end users.
3.
The similarity search problem has received considerable attention in the database research community. In sensor network applications, this problem is even more important due to the imprecision of the sensor hardware and the variation of environmental parameters. Traditional similarity search mechanisms are both improper and inefficient for these highly energy-constrained sensors. A difficulty is that it is hard to predict which sensor has the most similar (or closest) data item, so many or even all sensors need to send their data to the query node for further comparison. In this paper, we propose a similarity search algorithm (SSA), which is a novel framework based on the concept of the Hilbert curve over a data-centric storage structure, for efficiently processing similarity search queries in sensor networks. SSA successfully avoids the need to collect data from all sensors in the network when searching for the most similar data item. The performance study reveals that this mechanism is highly efficient and significantly outperforms previous approaches in processing similarity search queries.
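The abstract above does not say how SSA uses the Hilbert curve, so the following is only a generic sketch of the underlying idea: mapping a two-dimensional grid cell to a one-dimensional Hilbert index so that cells adjacent along the curve are also adjacent in space. The function name `hilbert_index` and the grid size are illustrative assumptions, not part of the paper.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid to its position on the Hilbert curve.

    n must be a power of two. This is the standard iterative conversion,
    not the SSA algorithm itself (which the abstract does not detail).
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


if __name__ == "__main__":
    # Hypothetical quantized 2-D sensor readings on a 4 x 4 grid.
    for cell in [(0, 0), (1, 0), (1, 1), (0, 1)]:
        print(cell, hilbert_index(4, *cell))
```

In a data-centric storage setting, such an index could serve as the key that decides where a reading is stored, so that similar readings tend to land near each other; whether SSA does exactly this is an assumption.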
4.
Accompanying the growth of the Internet, computers throughout the world can connect to each other and exchange information, increasing the convenience and efficiency of information-based work. The advent of data-sharing applications, such as Napster and Gnutella, has made peer-to-peer (P2P) systems popular for widespread exchange of resources and voluminous information between millions of users. In recent years, research issues associated with P2P systems have been discussed widely. To resolve the file-availability problem and improve the workload, a method called the Distributed Hash Table (DHT) has been proposed. However, DHT-based systems in structured architectures cannot support efficient queries, such as a similarity query, range query, and partial-match query, due to the characteristics of the hash function. This study presents a novel scheme that supports filename partial-matches in structured P2P systems. The proposed approach supports complex queries and guarantees result quality. Experimental results demonstrate the effectiveness of the proposed approach.
5.
Medium and large clusters incorporating hybrid CPU/graphics processing unit (GPU) nodes are present in many datacenters today. They can accelerate many different kinds of applications and appropriately manage applications dealing with a high volume of data. This is the case for the similarity problem, because large databases are managed and very quick responses are required to hundreds or thousands of queries per second. However, the design and usage of heterogeneous computing platforms pose big challenges, as system size, energy saving, task mapping, and scheduling, among others, must be efficiently handled. In this paper we focus on the scheduling issue for distributing the incoming queries to all the processing components in the cluster nodes. Our algorithms exploit the computational resources, simultaneously processing queries on CPU cores and on the GPUs. Thus, we address the problem of how to distribute the queries over the whole system in order to obtain the best performance, under the assumption of defining a heuristic that automatically provides the best distribution. Experimental results show the benefits in terms of execution time and energy saving of using an appropriate scheduling scheme.
6.
Source code examples are used by developers to implement unfamiliar tasks by learning from existing solutions. To better support developers in finding existing solutions, code search engines are designed to locate and rank code examples relevant to users’ queries. Essentially, a code search engine provides a ranking schema, which combines a set of ranking features to calculate the relevance between a query and candidate code examples. Consequently, the ranking schema places relevant code examples at the top of the result list. However, it is difficult to determine the configurations of the ranking schemas subjectively. In this paper, we propose a code example search approach that applies a machine learning technique to automatically train a ranking schema. We use the trained ranking schema to rank candidate code examples for new queries at run-time. We evaluate the ranking performance of our approach using a corpus of over 360,000 code snippets crawled from 586 open-source Android projects. The performance evaluation study shows that the learning-to-rank approach can effectively rank code examples, and outperform the existing ranking schemas by about 35.65% and 48.42% in terms of normalized discounted cumulative gain (NDCG) and expected reciprocal rank (ERR) measures, respectively.
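For readers unfamiliar with the NDCG measure named above, a minimal sketch follows. It uses the common exponential-gain formulation, DCG = Σ (2^rel − 1) / log2(rank + 1); the abstract does not state which variant the authors used, so the gain function, the function names, and the sample relevance grades are assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevances."""
    # enumerate() is zero-based, so rank = i + 1 and the discount is log2(rank + 1).
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance grades of code examples in ranked order.
    print(ndcg([3, 2, 1]))  # already ideally ordered
    print(ndcg([1, 2, 3]))  # worst ordering of the same items
```

A ranking schema that places the most relevant code examples first scores 1.0; any other ordering of the same items scores lower.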
8.
We study the problem of generating efficient, equivalent rewritings using views to compute the answer to a query. We take the closed-world assumption, in which views are materialized from base relations, rather than views describing sources in terms of abstract predicates, as is common when the open-world assumption is used. In the closed-world model, there can be an infinite number of different rewritings that compute the same answer, yet have quite different performance. Query optimizers take a logical plan (a rewriting of the query) as an input, and generate efficient physical plans to compute the answer. Thus our goal is to generate a small subset of the possible logical plans without missing an optimal physical plan. We first consider a cost model that counts the number of subgoals in a physical plan, and show a search space that is guaranteed to include an optimal rewriting, if the query has a rewriting in terms of the views. We also develop an efficient algorithm for finding rewritings with the minimum number of subgoals. We then consider a cost model that counts the sizes of intermediate relations of a physical plan, without dropping any attributes, and give a search space for finding optimal rewritings. Our final cost model allows attributes to be dropped in intermediate relations. We show that, by careful variable renaming, it is possible to do better than the standard “supplementary relation” approach, by dropping attributes that the latter approach would retain. Experiments show that our algorithm for generating optimal rewritings has good efficiency and scalability.
9.
We have already proposed a new concept of “universal multimedia access” intended to narrow the digital divide by providing
appropriate multimedia expressions according to users’ (mental and physical) abilities, computer facilities, and network environments.
In this article, we redefine the switching functions for our new concept of universal design-based multimedia access, and
discuss its user interface for supporting users in accordance with their abilities, computer facilities, and network environments.
11.
Historically, research in the Multimedia community has focused on output modalities, through studies on timing and multimedia processing. The Multimodal Interaction community, on the other hand, has focused on user-generated modalities, through studies on Multimodal User Interfaces (MUI). In this paper, aiming to assist the development of multimedia applications with MUIs, we propose the integration of concepts from those two communities in a unique high-level programming framework. The framework integrates user modalities, both user-generated (e.g., speech, gestures) and user-consumed (e.g., audiovisual, haptic), in declarative programming languages for the specification of interactive multimedia applications. To illustrate our approach, we instantiate the framework in the NCL (Nested Context Language) multimedia language. NCL is the declarative language for developing interactive applications for Brazilian Digital TV and an ITU-T Recommendation for IPTV services. To help evaluate our approach, we discuss a usage scenario and implement it as an NCL application extended with the proposed multimodal features. Also, we compare the expressiveness of the multimodal NCL against existing multimedia and multimodal languages, for both input and output modalities.
12.
Designers require knowledge and data about users to effectively evaluate product accessibility during the early stages of
design. This paper addresses this problem by setting out the sensory, cognitive and motor dimensions of user capability that
are important for product interaction. The relationship between user capability and product demand is used as the underlying
conceptual model for product design evaluations and for estimating the number of people potentially excluded from using a
given product.
13.
We are developing a helper robot that carries out tasks ordered by users through speech. The robot needs a vision system to recognize the objects appearing in the orders. However, conventional vision systems cannot recognize objects in complex scenes. They may find many objects and cannot determine which is the target. This paper proposes a method of using a conversation with the user to solve this problem. The robot asks a question that the user can easily answer and whose answer can efficiently reduce the number of candidate objects. It considers the characteristics of the features used for object identification, such as how easily humans can specify them in words, to generate a user-friendly and efficient sequence of questions. Experimental results show that the robot can detect target objects by asking the questions generated by the method.
14.
The Gnutella file sharing system allows a large number of peers to share their local files. However, it does not coordinate the way in which these shared objects are named or how they are searched by other users; such decisions are made independently by each peer. In this work, we investigate the practical performance implications of this design. We collected the shared filenames and user-generated queries over a three-year period. We show the mismatch between these naming mechanisms. We show the fundamental limitations of Gnutella performance that cannot be addressed by improvements in overlays or by varying the search mechanisms alone. Based on our observations, we describe two practical approaches to improve Gnutella performance. We describe a mechanism to build the file term synopsis using the observed popularity of queries routed through the ultrapeer. We also describe a query transformation mechanism that improves the success rates for failed queries.
15.
The following search game is considered: there are two players, say Paul (the questioner) and Carole (the responder). Carole chooses a number x* ∈ S_n = {1, 2, …, n}, Paul has to find the number x* by asking q-ary bi-interval queries, and Carole is allowed to lie at most once throughout the game. The minimum worst-case number LB(n, q, 1) of q-ary bi-interval queries necessary to guess the number x* is determined exactly for all integers n ≥ 1 and q ≥ 2. It turns out that LB(n, q, 1) coincides with the minimum worst-case number L(n, q, 1) of arbitrary q-ary queries.
16.
Experienced users who query search engines exhibit complex behavior. They explore many topics in parallel, experiment with query variations, consult multiple search engines, and gather information over many sessions. In the process they need to keep track of search context, namely useful queries and promising result links, which can be hard. We present an extension to search engines called SearchPad that makes it possible to keep track of 'search context' explicitly. We describe an efficient implementation of this idea deployed on four search engines: AltaVista, Excite, Google and Hotbot. Our design of SearchPad has several desirable properties: (i) portability across all major platforms and browsers; (ii) instant start requiring no code download or special actions on the part of the user; (iii) no server side storage; and (iv) no added client–server communication overhead. An added benefit is that it allows search services to collect valuable relevance information about the results shown to the user. In the context of each query SearchPad can log the actions taken by the user, and in particular record the links that were considered relevant by the user in the context of the query. The service was tested in a multi-platform environment with over 150 users for 4 months and found to be usable and helpful. We discovered that the ability to maintain search context explicitly seems to affect the way people search. Repeat SearchPad users looked at more search results than is typical on the Web, suggesting that availability of search context may partially compensate for non-relevant pages in the ranking.
17.
Methods and tools for binary code analysis developed at the Institute of System Programming, Russian Academy of Sciences, and their applications in algorithm and data format recovery are considered. The executable code of various general-purpose CPU architectures is analyzed. The analysis is performed without source code, debugging information, or specific OS version requirements. The approach involves collecting a detailed machine instruction level execution trace; a method for successively increasing the presentation level; and extraction of the algorithm’s code followed by structuring of both the code and the data formats it processes. Important results are obtained, viz. an intermediate representation is developed that allows carrying out most preliminary processing tasks and algorithm code extraction without having to focus on specifics of a given machine; and a method and software tool are developed for automated recovery of network message and file formats. The tools are integrated into a unified analysis platform that supports their combined use. The architecture behind the platform is also described. Examples of its application to real programs are given.
18.
Keyword-based Web search is a widely used approach for locating information on the Web. However, Web users usually have difficulty organizing and formulating appropriate input queries due to the lack of sufficient domain knowledge, which greatly affects the search performance. An effective tool to meet the information needs of a search engine user is to suggest Web queries that are topically related to their initial inquiry. Accurately computing query-to-query similarity scores is key to improving the quality of these suggestions. Because of the short lengths of queries, traditional pseudo-relevance or implicit-relevance based approaches expand the expression of the queries for the similarity computation. They explicitly use a search engine as a complementary source and directly extract additional features (such as terms or URLs) from the top-listed or clicked search results. In this paper, we propose a novel approach by utilizing the hidden topic as an expandable feature. This has two steps. In the offline model-learning step, a hidden topic model is trained, and for each candidate query, its posterior distribution over the hidden topic space is determined to re-express the query instead of the lexical expression. In the online query suggestion step, after inferring the topic distribution for an input query in a similar way, we then calculate the similarity between candidate queries and the input query in terms of their corresponding topic distributions, and produce a suggestion list of candidate queries based on the similarity scores. Our experimental results on two real data sets show that the hidden topic based suggestion is much more efficient than the traditional term or URL based approach, and is effective in finding topically related queries for suggestion.
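The online suggestion step described above can be sketched as follows, assuming the topic distributions have already been inferred offline. The abstract does not name the similarity measure, so cosine similarity is used here purely as a stand-in; the function names, candidate queries, and topic vectors are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def suggest(query_topics, candidates, k=3):
    """Return the k candidate queries whose topic distributions are most similar."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(query_topics, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical posterior distributions over two hidden topics.
    candidates = {
        "cheap flights": [0.9, 0.1],
        "python tutorial": [0.1, 0.9],
    }
    print(suggest([0.8, 0.2], candidates, k=1))
```

Because every query is compared in the same low-dimensional topic space, no search-engine results need to be fetched at suggestion time, which matches the efficiency claim in the abstract.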
20.
Range and k-nearest neighbor searching are core problems in pattern recognition. Given a database S of objects in a metric space M and a query object q in M, in a range searching problem the goal is to find the objects of S within some threshold distance of q, whereas in a k-nearest neighbor searching problem, the k elements of S closest to q must be produced. These problems can obviously be solved with a linear number of distance calculations, by comparing the query object against every object in the database. However, the goal is to solve such problems much faster. We combine and extend ideas from the M-tree, the multivantage point structure, and the FQ-tree to create a new structure in the "bisector tree" class, called the Antipole tree. Bisection is based on the proximity to an "Antipole" pair of elements generated by a suitable linear randomized tournament. The final winners a, b of such a tournament are far enough apart to approximate the diameter of the splitting set. If dist(a, b) is larger than the chosen cluster diameter threshold, then the cluster is split. The proposed data structure is an indexing scheme suitable for (exact and approximate) best match searching on generic metric spaces. The Antipole tree outperforms existing structures such as list of clusters, M-trees, and others by a factor of approximately two and, in many cases, it achieves better clustering properties.
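The linear-scan baseline mentioned above, which the Antipole tree aims to beat, can be sketched as follows. This is not the Antipole tree itself; the function names and the example metric (absolute difference on integers) are illustrative assumptions.

```python
import heapq


def range_search(db, q, radius, dist):
    """Return every object within `radius` of q, using one distance call per object."""
    return [obj for obj in db if dist(obj, q) <= radius]


def knn_search(db, q, k, dist):
    """Return the k objects closest to q, nearest first, by a full linear scan."""
    return heapq.nsmallest(k, db, key=lambda obj: dist(obj, q))


if __name__ == "__main__":
    # Hypothetical database of integers under the absolute-difference metric.
    db = [1, 5, 9, 12]
    metric = lambda a, b: abs(a - b)
    print(range_search(db, 6, 3, metric))
    print(knn_search(db, 6, 2, metric))
```

Both functions compute exactly len(db) distances per query; an index such as the Antipole tree tries to prune most of those computations using the triangle inequality.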