Similar Literature
20 similar documents found (search time: 15 ms)
1.
Improved relevance ranking in WebGather (total citations: 7; self-citations: 0; citations by others: 7)
The amount of information on the web is growing rapidly, and search engines that rely on keyword matching usually return too many low-quality matches. To improve search results, a challenging task for search engines is how to effectively calculate a relevance ranking for each web page. This paper discusses the order in which a search engine should return the URLs it has produced in response to a user's query, so as to show more relevant pages first. Emphasis is given to the ranking functions adopted by WebGather, which take link structure and user popularity factors into account. Experimental results are also presented to evaluate the proposed strategy.
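The abstract does not spell out WebGather's actual ranking functions. As a rough illustration of blending link structure with user popularity, the Python sketch below computes standard PageRank by power iteration and mixes it linearly with normalized click counts. The weight `alpha`, the click data, and the function names are hypothetical assumptions, not WebGather's method.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outlinked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[p] / n
        rank = new_rank
    return rank


def rank_results(matches, rank, clicks, alpha=0.7):
    """Order keyword matches by a hypothetical blend of link rank and click popularity."""
    max_rank = max(rank.values())
    max_clicks = max(max(clicks.values(), default=0), 1)

    def score(url):
        link_part = rank.get(url, 0.0) / max_rank        # normalized to [0, 1]
        popularity_part = clicks.get(url, 0) / max_clicks  # normalized to [0, 1]
        return alpha * link_part + (1 - alpha) * popularity_part

    return sorted(matches, key=score, reverse=True)
```

For the graph `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` receives the most link weight, but enough clicks on `a` lift it to the top of the result list.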

2.
The Web contains a vast amount of rich learning content. However, the ever-growing volume of learning resources leads to information overload, and the large number of irrelevant results returned by search engines based on keyword matching further aggravates the problem. In such a scenario, a learner needs semantically matched learning resources as search results. Considering both the volume of content and the significance of semantic knowledge, this paper proposes a multi-threaded semantic focused crawler (SFC) specially designed and implemented to crawl the WWW for educational learning content. The proposed SFC uses a domain ontology to expand a topic term and starts the crawl from a set of seed URLs. Results from multiple crawl iterations on various topics are shown and compared with those obtained by running an open-source crawler on a similar dataset. The results are evaluated using Semantic Similarity (a vector-space-model-based metric) and the harvest ratio.
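A focused crawler of this kind can be sketched as a best-first search over a link graph, with the harvest ratio computed as the fraction of fetched pages judged relevant. The Python sketch below is single-threaded, uses an in-memory dict as a stand-in for both the web and the domain ontology, and scores pages by simple term overlap rather than the paper's Semantic Similarity metric; the threshold and all names are illustrative assumptions.

```python
import heapq


def expand_topic(topic, ontology):
    """Expand a topic term with related concepts from a toy ontology (a plain dict here)."""
    return {topic.lower()} | {term.lower() for term in ontology.get(topic, [])}


def relevance(text, topic_terms):
    """Fraction of topic terms occurring in the page text (a crude stand-in score)."""
    words = set(text.lower().split())
    return len(topic_terms & words) / len(topic_terms)


def focused_crawl(seeds, pages, links, topic_terms, threshold=0.3, max_pages=100):
    """Best-first crawl over an in-memory web graph; returns (visited, harvest_ratio)."""
    frontier = [(-1.0, url) for url in seeds]  # seed URLs get top priority
    heapq.heapify(frontier)
    seen = set(seeds)
    visited, relevant = [], 0
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        score = relevance(pages.get(url, ""), topic_terms)
        visited.append(url)
        if score >= threshold:
            relevant += 1
            # Follow links only from relevant pages; children inherit the parent's score.
            for child in links.get(url, []):
                if child not in seen:
                    seen.add(child)
                    heapq.heappush(frontier, (-score, child))
    # Harvest ratio: share of fetched pages that turned out to be relevant.
    harvest_ratio = relevant / len(visited) if visited else 0.0
    return visited, harvest_ratio
```

Pruning links from irrelevant pages keeps the crawl on-topic, which is what drives the harvest ratio up; a real crawler would fetch pages over HTTP and run several such workers in parallel.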

3.
This paper is concerned with a matchmaker that ranks web services using semantics. Several semantic matchmaking methods have been proposed so far; however, most of them focus on classifying services into predefined categories rather than providing a ranking. In this paper, a new semantic matchmaking method is proposed for ranking web services, using semantic distance to estimate the degree of match between a service and a user request. Four types of semantic distance are defined, and four corresponding algorithms are implemented to calculate them. Experimental results show that the proposed semantic matchmaker significantly outperforms the keyword-based baseline method.

4.
Keyword queries have attracted much research attention because of their simplicity and wide applicability. However, the inherent ambiguity of keyword queries often leads to unsatisfactory results. Moreover, existing techniques for Web queries and for keyword queries over relational and XML databases cannot be fully applied to keyword queries in dataspaces. We therefore propose KeymanticES, a novel keyword-based semantic entity search mechanism for dataspaces that combines features of both keyword queries and semantic queries. We focus on the query-intent disambiguation problem and propose a novel three-step approach to resolve it. Extensive experimental results show the effectiveness and correctness of the proposed approach.

5.
Personalized semantic retrieval extends the query process and optimizes query results by mapping a user's information preferences onto an ontology, so it can return different results for the same query issued by different users. This paper proposes a personalized semantic retrieval model based on social networks. It implements the organization, presentation, acquisition, and maintenance of user preference data, and finally uses these personalization data in the information retrieval process.

6.
Searching Databases with Keywords (total citations: 5; self-citations: 1; citations by others: 4)
Traditionally, the SQL query language is used to search data in databases. However, SQL is ill-suited to end users because it is complex and hard to learn; end users need to search databases with keywords, as they do in web search engines. This paper presents a survey of work on keyword search in databases. It also briefly introduces the SEEKER system, which has been developed.
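The core idea of keyword search over a database can be sketched with an inverted index from words to tuples. The Python sketch below indexes every attribute value of every row and answers a query with AND semantics; the table layout, names, and single-tuple answers are simplifying assumptions (the systems surveyed typically also join related tuples across tables), and it is not a description of SEEKER.

```python
import re
from collections import defaultdict


def build_index(tables):
    """Map each lowercase word to the set of (table, row_id) tuples containing it."""
    index = defaultdict(set)
    for table, rows in tables.items():
        for row_id, row in enumerate(rows):
            for value in row.values():
                for word in re.findall(r"\w+", str(value).lower()):
                    index[word].add((table, row_id))
    return index


def keyword_search(index, query):
    """Return the tuples that contain every query keyword (AND semantics), sorted."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    # index.get avoids inserting empty entries into the defaultdict.
    result = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(result)
```

A user can then type `keyword search` instead of writing a `SELECT ... WHERE title LIKE ...` statement, which is the usability gap the survey addresses.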

7.
Existing solutions to keyword search in the cloud fall into two categories: search on exact keywords and search on error-tolerant keywords. An error-tolerant keyword search scheme allows searches on encrypted data using only an approximation of a keyword, which suits cases where users' search input may not exactly match the preset keywords. In this paper, we first present a general framework for searching on error-tolerant keywords. We then propose a concrete scheme based on a fuzzy extractor, which is proved secure against an adaptive adversary under a well-defined security definition. The scheme is suitable for all similarity metrics, including Hamming distance, edit distance, and set difference. It does not require the user to construct or store anything in advance other than the key used to calculate keyword trapdoors and the key used to encrypt data documents, which greatly eases the user's burden. Moreover, our scheme transforms the server's search for error-tolerant keywords on ciphertexts into a search for exact keywords on plaintexts, so the server can use any existing exact-keyword search approach to search plaintexts on an index table.
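The three similarity metrics named above can be made concrete on plaintext keywords. The Python sketch below implements Hamming distance, Levenshtein edit distance, and symmetric set difference, plus a hypothetical `fuzzy_matches` helper with a tolerance `t`. It illustrates only what "error-tolerant" means; it involves no encryption, trapdoors, or fuzzy extractor and is not the proposed scheme.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, keeping only two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def set_difference(a, b):
    """Size of the symmetric difference between two sets."""
    return len(set(a) ^ set(b))


def fuzzy_matches(query, keywords, metric, t):
    """Plaintext illustration: keywords within distance t of the query."""
    return [kw for kw in keywords if metric(query, kw) <= t]
```

For example, the misspelled query `netwrok` is within edit distance 2 of `network` but distance 3 of `networks`, so a tolerance of 2 matches only the former.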

8.
In recent years, collaborative tagging (a.k.a. folksonomy) systems have proliferated rapidly in Web 2.0 communities. With the increasingly large amount of data, how to use these semantic tags to help users find resources they are interested in has become a crucial problem. Collaborative tagging systems provide an environment for users to annotate resources, and most users annotate according to their own perspectives or feelings. However, users may differ in their perspectives or feelings about resources: some may share similar perspectives while conflicting with others. Thus, modeling a resource's profile from the tags given by all users who have annotated it is neither suitable nor reasonable. To tackle this problem, this paper proposes a community-aware approach to constructing resource profiles via social filtering. Three different strategies for discovering user communities are devised and discussed. Moreover, we present a personalized search approach that combines a switching fusion method with a revised needs-relevance function to optimize personalized resource ranking based on user preferences and the user's query. We conduct experiments on a collected real-life dataset, comparing the performance of the proposed approach with baseline methods. The experimental results verify our observations and the effectiveness of the proposed method.

9.
Query suggestions help users refine their queries after they input an initial query. Previous work on query suggestion has mainly concentrated on similarity-based or context-based approaches, developing models that either adapt to a specific user (personalization) or diversify query aspects to maximize the probability that the user is satisfied (diversification). We consider the task of generating query suggestions that are both personalized and diversified. We propose a personalized query suggestion diversification (PQSD) model, in which a user's long-term search behavior is injected into a basic greedy query suggestion diversification model that considers the user's search context in the current session. Query aspects are identified from clicked documents based on the Open Directory Project (ODP) with a latent Dirichlet allocation (LDA) topic model. We quantify the improvement of the proposed PQSD model over a state-of-the-art baseline using the public America Online (AOL) query log and show that it beats the baseline on metrics used for query suggestion ranking and diversification. The experimental results show that PQSD achieves its best performance when only queries with clicked documents, rather than all queries, are taken as the search context, especially when more query suggestions are returned in the list.
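A greedy diversification model of the kind PQSD builds on can be sketched in the style of xQuAD: each step picks the suggestion that best trades off relevance against coverage of aspects not yet covered. In the Python sketch below, personalization enters only through hypothetical per-user aspect weights; the data structures, `lam`, and the scoring formula are assumptions, not the PQSD model itself.

```python
def diversify(candidates, aspect_weights, k, lam=0.5):
    """Greedy xQuAD-style selection of k query suggestions.

    candidates:     {suggestion: (relevance, {aspect: P(suggestion | aspect)})}
    aspect_weights: {aspect: P(aspect | user)}, e.g. estimated from long-term history
    lam:            trade-off between relevance (0) and aspect coverage (1)
    """
    # not_covered[a]: probability that aspect a is still unsatisfied by the picks so far.
    not_covered = {a: 1.0 for a in aspect_weights}
    remaining = dict(candidates)
    selected = []

    def gain(item):
        rel, aspects = item[1]
        novelty = sum(w * aspects.get(a, 0.0) * not_covered[a]
                      for a, w in aspect_weights.items())
        return (1 - lam) * rel + lam * novelty

    while remaining and len(selected) < k:
        best, (_, aspects) = max(remaining.items(), key=gain)
        selected.append(best)
        del remaining[best]
        # Once an aspect is covered, further suggestions on it earn less novelty.
        for a in not_covered:
            not_covered[a] *= 1.0 - aspects.get(a, 0.0)
    return selected
```

With equal aspect weights, the query `jaguar` yields one car suggestion and one animal suggestion; if the user's history puts all weight on cars, both picks are car-related.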

10.
11.
We investigate the limitations of existing XML search methods and propose a new semantics, the related relationship, to effectively capture meaningful relationships among data elements in XML data in the absence of structural constraints. We then extend XPath with a new axis, the related axis, which specifies the related relationship between query nodes and so enhances the flexibility of XPath. To reduce the cost of computing the related relationship, we propose a new schema summary that summarizes the related relationship from the original schema without any loss. Based on this schema summary, we introduce two indices to improve query-processing performance. Our algorithm shows that the evaluation of most queries can be equivalently transformed into just a few selection and value-join operations, thus avoiding costly structural joins. The experimental results show that our method is effective, when the related relationship is compared with existing keyword search semantics, and efficient, when our evaluation methods are compared with existing query engines.

12.
Document similarity search aims to find documents similar to a given query document and return them to users as a ranked list; it is widely used in text and web systems such as digital libraries and search engines. Traditional retrieval models, including Okapi's BM25 model and Smart's vector space model with length normalization, can handle this problem to some extent by treating the query document as a long query. In practice, the Cosine measure is considered the best model for document similarity search because of its good ability to measure the similarity between two documents. In this paper, the quantitative performance of these models is compared experimentally. Because the Cosine measure cannot reflect the structural similarity between documents, a new retrieval model based on TextTiling is proposed that takes the subtopic structure of documents into account. It first splits the documents into text segments with TextTiling and calculates similarities for different pairs of text segments across the documents. Finally, the overall similarity between the documents is obtained by combining the similarities of the segment pairs with an optimal matching method. Experimental results show that: 1) the popular retrieval models (Okapi's BM25 model and Smart's vector space model with length normalization) do not perform well for document similarity search; 2) the proposed TextTiling-based model is effective and outperforms the other models, including the Cosine measure; and 3) the methods chosen for the three components of the proposed model are validated as appropriate.
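The Cosine measure and the segment-matching idea can both be sketched in a few lines of Python. Below, `cosine` compares raw term-frequency vectors (no TF-IDF weighting), and `segment_similarity` finds an optimal one-to-one matching between pre-split segments by brute force over permutations, normalized by the larger segment count. The segments are supplied by hand instead of being produced by TextTiling, and the normalization is an assumption, not the paper's formula.

```python
import math
from collections import Counter
from itertools import permutations


def cosine(doc_a, doc_b):
    """Cosine similarity of raw term-frequency vectors (whitespace tokenization)."""
    va, vb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def rank_similar(query_doc, corpus):
    """Return corpus document names ordered by decreasing cosine similarity."""
    return sorted(corpus, key=lambda name: cosine(query_doc, corpus[name]), reverse=True)


def segment_similarity(segments_a, segments_b):
    """Optimal one-to-one segment matching (brute force; fine for a handful of segments)."""
    if len(segments_a) > len(segments_b):
        segments_a, segments_b = segments_b, segments_a
    best = 0.0
    for perm in permutations(range(len(segments_b)), len(segments_a)):
        total = sum(cosine(segments_a[i], segments_b[j]) for i, j in enumerate(perm))
        best = max(best, total)
    # Normalize by the larger segment count so unmatched segments lower the score.
    return best / max(len(segments_b), 1)
```

Two documents covering the same subtopics in a different order score about 1.0 under `segment_similarity`; a production version would replace the permutation loop with the Hungarian algorithm.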

13.
Secure XML query answering, which protects data privacy, and semantic caching, which speeds up XML query answering, are two hot spots in current research on XML database systems. Although each issue has been explored in depth separately, they have not been studied together; that is, semantic caching for secure XML query answering has not yet been addressed. In this paper, we combine these two aspects and propose an efficient semantic cache framework for secure XML query answering, which improves the performance of XML database systems in secure settings. Our framework combines access control, user privilege management over XML data, and state-of-the-art semantic XML query caching techniques to ensure that data are presented only to authorized users in an efficient way. To the best of our knowledge, our approach is among the first efforts to combine caching and security in XML databases to improve system performance. The efficiency of the framework is verified by comprehensive experiments.

14.
In this paper, a novel technique adopted in HarkMan is introduced. HarkMan is a keyword spotter designed to automatically spot given keywords in a vocabulary-independent task over unconstrained Chinese telephone speech. Neither the speaking style nor the number of keywords is restricted. This paper focuses on the novel technique, which addresses acoustic modeling, the keyword spotting network, search strategies, robustness, and rejection. The underlying technologies used in HarkMan are useful not only for keyword spotting but also for continuous speech recognition. The system has achieved a figure-of-merit value of over 90%.

15.
Due to the well-known curse of dimensionality, search in a high-dimensional space is considered a "hard" problem. In this paper, a novel composite distance transformation method, called CDT, is proposed to support fast k-nearest-neighbor (k-NN) search in high-dimensional spaces. In CDT, all n data points are first grouped into clusters by a k-means clustering algorithm. Then a composite distance key is computed for each data point. Finally, the index keys of the n data points are inserted into a partition-based B+-tree. Thus, given a query point, its k-NN search in high-dimensional space is transformed into a search in a single-dimensional space with the aid of the CDT index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme. Our results show that this method outperforms state-of-the-art high-dimensional search techniques such as the X-Tree, VA-file, iDistance, and NB-Tree.
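The abstract does not define CDT's composite key, so the Python sketch below uses the simpler iDistance-style key (a technique the abstract names as a competitor) to show how cluster-based 1-D keys turn k-NN search into range scans. Assumptions: centroids are supplied directly instead of computed by k-means, a sorted list with `bisect` stands in for the B+-tree, and the constant `c` must exceed every point-to-centroid distance.

```python
import bisect
import math


class OneDimIndex:
    """iDistance-style 1-D index: key = cluster_id * c + distance(point, centroid)."""

    def __init__(self, points, centroids, c=1000.0):
        self.points = points
        self.centroids = centroids
        self.c = c
        entries = []
        for idx, p in enumerate(points):
            # Assign each point to its nearest centroid.
            cid = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            d = math.dist(p, centroids[cid])
            if d >= c:
                raise ValueError("c must exceed every point-to-centroid distance")
            entries.append((cid * c + d, idx))
        entries.sort()  # a sorted list stands in for the B+-tree leaf level
        self.keys = [key for key, _ in entries]
        self.ids = [idx for _, idx in entries]

    def range_query(self, q, r):
        """Return indices of all points within distance r of q."""
        hits = set()
        for cid, centroid in enumerate(self.centroids):
            d = math.dist(q, centroid)
            # Triangle inequality: |dist(p, centroid) - d| <= r for every hit p.
            lo = bisect.bisect_left(self.keys, cid * self.c + max(0.0, d - r))
            hi = bisect.bisect_right(self.keys, cid * self.c + d + r)
            for pos in range(lo, hi):
                idx = self.ids[pos]
                # Keys from a neighbouring cluster may slip into the range; the true
                # distance check and the set keep the answer exact and duplicate-free.
                if math.dist(q, self.points[idx]) <= r:
                    hits.add(idx)
        return sorted(hits)

    def knn(self, q, k, r=0.5):
        """Grow the search radius until at least k points fall inside it (r must be > 0)."""
        k = min(k, len(self.points))
        while True:
            hits = self.range_query(q, r)
            if len(hits) >= k:
                return sorted(hits, key=lambda i: math.dist(q, self.points[i]))[:k]
            r *= 2  # widen the search sphere and try again
```

Once a range query returns at least k points, every true nearest neighbour lies inside that radius, so taking the k closest hits is exact. `math.dist` requires Python 3.8 or later.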

16.
Due to the well-known curse of dimensionality, search in a high-dimensional space is considered a "hard" problem. In this paper, a novel composite distance transformation method, called CDT, is proposed to support fast k-nearest-neighbor (k-NN) search in high-dimensional spaces. In CDT, all n data points are first grouped into clusters by a k-means clustering algorithm. Then a composite distance key is computed for each data point. Finally, the index keys of the n data points are inserted into a partition-based B+-tree. Thus, given a query point, its k-NN search in high-dimensional space is transformed into a search in a single-dimensional space with the aid of the CDT index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme. Our results show that this method outperforms state-of-the-art high-dimensional search techniques such as the X-Tree, VA-file, iDistance, and NB-Tree.

17.
The query space of a similarity query is usually narrowed down by pruning inactive query subspaces, which contain no query results, and keeping active query subspaces, which may contain objects corresponding to the request. However, some active query subspaces may contain no query results at all; these are called false active query subspaces. Query-processing performance clearly degrades in the presence of false active query subspaces. Our experiments show that this problem becomes serious when the data are high-dimensional, and that the number of accesses to false active subspaces increases with dimensionality. To solve this problem, this paper proposes a space mapping approach that reduces such unnecessary accesses: a given query space can be refined by filtering within its mapped space. To this end, a mapping strategy called maxgap is proposed to improve the efficiency of the refinement processing. Based on this mapping strategy, an index structure called the MS-tree and its query-processing algorithms are presented. Finally, the performance of the MS-tree is compared with that of competing methods on range queries over a real dataset.

18.
In real applications of inductive learning for classification, labeled instances are often deficient, and labeling them by an oracle is often expensive and time-consuming. Active learning on a single task aims to select only informative unlabeled instances for querying, improving classification accuracy while decreasing the querying cost. However, an inevitable problem in active learning is that the informativeness measures used to select queries are commonly based on initial hypotheses sampled from only a few labeled instances. In such circumstances, the initial hypotheses are unreliable and may deviate from the true distribution underlying the target task; consequently, the informativeness measures may select irrelevant instances. A promising way to compensate for this problem is to borrow useful knowledge from other sources with abundant labeled information, which is called transfer learning. However, a significant challenge in transfer learning is how to measure the similarity between the source and target tasks. One needs to be aware of differing distributions or label assignments in unrelated source tasks; otherwise, they will degrade performance during transfer. In addition, how to design an effective strategy that avoids querying irrelevant samples is still an open question. To tackle these issues, we propose a hybrid algorithm for active learning with the help of transfer learning, which adopts a divergence measure to alleviate the negative transfer caused by distribution differences. To avoid querying irrelevant instances, we also present an adaptive strategy that eliminates unnecessary instances in the input space and models in the model space. Extensive experiments on both synthetic and real datasets show that the proposed algorithm queries fewer instances with higher accuracy and converges faster than state-of-the-art methods.
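The abstract does not name its divergence measure. As one common choice, the Python sketch below uses Jensen-Shannon divergence between discrete distributions (for example, class or feature histograms) to score source tasks, keeping only those close enough to the target to limit negative transfer. The weighting `1 - JS/ln 2`, the `min_weight` threshold, and the toy distributions are hypothetical assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, finite, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def select_sources(target, sources, min_weight=0.5):
    """Keep source tasks whose similarity weight 1 - JS/ln2 reaches min_weight."""
    weights = {name: 1.0 - js_divergence(target, dist) / math.log(2)
               for name, dist in sources.items()}
    return sorted(name for name, w in weights.items() if w >= min_weight)
```

Unlike plain KL divergence, the Jensen-Shannon form stays finite when one distribution has zero mass where the other does not, which makes it a convenient, bounded similarity score for filtering source tasks.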

19.
The paper proposes a novel symmetrical encoding-based index structure, called the EDD-tree (encoding-based dual distance tree), to support fast k-nearest-neighbor (k-NN) search in high-dimensional spaces. In the EDD-tree, all data points are first grouped into clusters by a k-means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme, in which each cluster sphere is partitioned twice according to the dual distances: the start distance and the centroid distance. Finally, the uniform ID number and the centroid distance of each data point are combined into a uniform index key, which is then indexed through a partition-based B+-tree. Thus, given a query point, its k-NN search in high-dimensional space can be transformed into a search in a single-dimensional space with the aid of the EDD-tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme, and the results demonstrate that this method outperforms state-of-the-art high-dimensional search techniques such as the X-tree, VA-file, iDistance, and NB-tree, especially when the query radius is not very large.

20.
A Novel Approach Towards Large Scale Cross-Media Retrieval (total citations: 1; self-citations: 1; citations by others: 0)
With the rapid development of the Internet and multimedia technology, cross-media retrieval aims to retrieve all related media objects across multiple modalities given a query media object. Unfortunately, the complexity and heterogeneity of multi-modal data pose two major challenges for cross-media retrieval: 1) how to construct a unified and compact model for multi-modal media objects, and 2) how to improve retrieval performance on large-scale cross-media databases. In this paper, we propose a novel method dedicated to solving these issues to achieve effective and accurate cross-media retrieval. First, a multi-modality semantic relationship graph (MSRG) is constructed using the semantic correlations among multi-modal media objects. Second, all the media objects in the MSRG are mapped onto an isomorphic semantic space. Further, an efficient index, the MK-tree, based on heterogeneous data distribution is proposed to manage the media objects within the semantic space and improve cross-media retrieval performance. Extensive experiments on real large-scale cross-media datasets indicate that our proposal dramatically improves the accuracy and efficiency of cross-media retrieval, significantly outperforming existing methods.


Copyright © Beijing Qinyun Technology Development Co., Ltd.    Beijing ICP Registration No. 09084417-23

Beijing Public Security Network Registration No. 11010802026262