1.
Steffen Lange 《Information Processing Letters》2004,91(6):285-292
A natural approach towards powerful machine learning systems is to enable options for additional machine/user interactions, for instance by allowing the system to ask queries about the concept to be learned. This motivates the development and analysis of adequate formal learning models. In the present paper, we investigate two different types of query learning models in the context of learning indexable classes of recursive languages: Angluin's original model and a relaxation thereof, called learning with extra queries. In the original model the learner is restricted to query languages belonging to the target class, while in the new model it is allowed to query other languages, too. As usual, the following standard types of queries are considered: superset, subset, equivalence, and membership queries. The learning capabilities of the resulting query learning models are compared to one another and to different versions of Gold-style language learning from only positive data and from positive and negative data (including finite learning, conservative inference, and learning in the limit). A complete picture of the relation of all these models has been elaborated. A couple of interesting differences and similarities between query learning and Gold-style learning have been observed. In particular, query learning with extra superset queries coincides with conservative inference from only positive data. This result documents the naturalness of the new query model.
2.
3.
《Information Systems》2001,26(6):445-475
The rapid increase in end-user computing calls into question the suitability of existing database query languages (DBQLs). Because the typical DB end-user is not a DB specialist, it is essential that DBQLs use concepts that are as close as possible to those in the end-users’ cognitive mental model and adopt interface techniques that are suited to end-users’ abilities. Concept-based query languages are well suited for this. This realization has motivated further research in conceptual, or semantic, query approaches. However, the primary focus in this field has been on semantic query optimization, not on query formulation. In this study, we address the problem of formulating queries using concepts. We propose a concept-based query language, called the conceptual query language (CQL), which allows for the conceptual abstraction of database queries and exploits the rich semantics of data models to ease and facilitate query formulation. The CQL approach uses the relationship semantics of semantic data models to render transparent the technical complexities of existing DB query languages. Association semantics are also used to automatically construct query graphs and pseudo-natural language explanations of queries, and to generate SQL code. A set theoretic formalism for conceptual queries is developed and used. This paper discusses the design of CQL, its expressive power, its implementation, and the strategies for CQL query processing. The implementation of a CQL prototype is briefly discussed in this paper. User experiments were carried out extensively and showed the advantage of CQL over alternative languages such as SQL.
4.
5.
This paper discusses the issues involved in designing a query language for the Semantic Web and presents the OWL query language (OWL-QL) as a candidate standard language and protocol for query–answering dialogues among Semantic Web computational agents using knowledge represented in the W3C's Web Ontology Language (OWL). OWL-QL is a formal language and precisely specifies the semantic relationships among a query, a query answer, and the knowledge base(s) used to produce the answer. Unlike standard database and Web query languages, OWL-QL supports query–answering dialogues in which the answering agent may use automated reasoning methods to derive answers to queries, as well as dialogues in which the knowledge to be used in answering a query may be in multiple knowledge bases on the Semantic Web, and/or where those knowledge bases are not specified by the querying agent. In this setting, the set of answers to a query may be of unpredictable size and may require an unpredictable amount of time to compute.
6.
Boolean query mapping across heterogeneous information sources
Chen-Chuan Chang, K.; Garcia-Molina, H.; Paepcke, A. 《IEEE Transactions on Knowledge and Data Engineering》1996,8(4):515-521
Searching over heterogeneous information sources is difficult because of the nonuniform query languages. Our approach is to allow a user to compose Boolean queries in one rich front end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final result. We introduce the architecture and associated algorithms for generating the supported subsuming queries and filters. We show that generated subsuming queries return a minimal number of documents; we also discuss how minimal cost filters can be obtained. We have implemented prototype versions of these algorithms and demonstrated them on heterogeneous Boolean systems.
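The subsuming-query-plus-filter idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a hypothetical target source that supports AND/OR over single words but no phrase search, relaxes each phrase into a conjunction of its words, and simply reuses the original query as the filter. All class and function names (`Word`, `Phrase`, `And`, `Or`, `subsume`, `search`) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str


@dataclass(frozen=True)
class Phrase:
    words: tuple  # consecutive words that must appear in this order


@dataclass(frozen=True)
class And:
    parts: tuple


@dataclass(frozen=True)
class Or:
    parts: tuple


def matches(query, tokens):
    """Evaluate a Boolean query against a tokenized document."""
    if isinstance(query, Word):
        return query.text in tokens
    if isinstance(query, Phrase):
        n = len(query.words)
        return any(tuple(tokens[i:i + n]) == query.words
                   for i in range(len(tokens) - n + 1))
    if isinstance(query, And):
        return all(matches(p, tokens) for p in query.parts)
    if isinstance(query, Or):
        return any(matches(p, tokens) for p in query.parts)
    raise TypeError(f"unsupported query node: {query!r}")


def subsume(query):
    """Relax a query for a source without phrase search (assumed limitation).

    Each phrase becomes an AND of its words. With no negation in the
    language, the relaxed query matches every document the original does.
    """
    if isinstance(query, Phrase):
        return And(tuple(Word(w) for w in query.words))
    if isinstance(query, (And, Or)):
        return type(query)(tuple(subsume(p) for p in query.parts))
    return query


def search(query, docs):
    """Send the subsuming query to the source, then filter the extras."""
    relaxed = subsume(query)
    # Simulated source: it only ever sees the relaxed query.
    candidates = [d for d in docs if matches(relaxed, d.split())]
    # Filter step: re-check the original query on the returned documents.
    return [d for d in candidates if matches(query, d.split())]
```

Here the filter is the full original query, which is always correct but not necessarily cheapest; the abstract's notion of a minimal cost filter would re-check only the parts the source could not evaluate.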
7.
Similarity query processing is becoming increasingly important in many applications such as data cleaning, record linkage, Web search, and document analytics. In this paper we study how to provide end-to-end similarity query support natively in a parallel database system. We discuss how to express a similarity predicate in its query language, how to build indexes, how to answer similarity queries (selections and joins) efficiently in the runtime engine, possibly using indexes, and how to optimize similarity queries. One particular challenge is how to incorporate existing similarity join algorithms, which often require a series of steps to achieve high efficiency, including collecting token frequencies, finding matching record id pairs, and reassembling result records based on id pairs. We present a novel approach that uses existing runtime operators to implement such complex join algorithms without reinventing the wheel; doing so positions the system to automatically benefit from future improvements to those operators. The approach includes a technique to transform a similarity join plan into an efficient operator-based physical plan during query optimization by using a template expressed largely in the system’s user-level query language; this technique greatly simplifies the specification of such a transformation rule. We use Apache AsterixDB, a parallel Big Data management system, to illustrate and validate our techniques. We conduct an experimental study using several large, real datasets on a parallel computing cluster to assess the similarity query support. We also include experiments involving three other parallel systems and report the efficacy and performance results.
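The three steps the abstract names for a similarity join (collecting token frequencies, finding matching record id pairs, reassembling result records) can be sketched in memory as follows. This is an illustrative single-machine version under assumptions of our own: whitespace tokenization, Jaccard similarity, and the standard prefix-filtering heuristic. It does not reflect how AsterixDB implements the join with its runtime operators, and the name `similarity_join` is hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def similarity_join(records, threshold):
    """Self-join `records` (id -> text) on Jaccard similarity >= threshold."""
    token_sets = {rid: set(text.lower().split())
                  for rid, text in records.items()}

    # Step 1: collect token frequencies to fix a global order, rare first.
    freq = Counter(tok for toks in token_sets.values() for tok in toks)

    # Step 2: find matching id pairs. Two sets can only reach the threshold
    # if their prefixes (under the global order) share at least one token.
    index = defaultdict(set)  # token -> ids whose prefix contains it
    id_pairs = []
    for rid in sorted(token_sets):
        toks = sorted(token_sets[rid], key=lambda t: (freq[t], t))
        prefix_len = len(toks) - math.ceil(threshold * len(toks)) + 1
        candidates = set()
        for tok in toks[:prefix_len]:
            candidates |= index[tok]
            index[tok].add(rid)
        for other in sorted(candidates):
            sim = jaccard(token_sets[other], token_sets[rid])
            if sim >= threshold:
                id_pairs.append((other, rid, sim))

    # Step 3: reassemble full result records from the id pairs.
    return [(records[a], records[b], sim) for a, b, sim in id_pairs]
```

Keeping step 2 on ids only and deferring the record lookup to step 3 mirrors the staged structure described in the abstract: the expensive pair-finding stage handles small id tuples rather than whole records.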
8.
Web users often post queries through form-based interfaces on the Web to retrieve data from the Web; however, answers to these
queries are mostly computed according to keywords entered into different fields specified in a query interface, and their
precision and recall could be low. The precision and recall ratios in answering this type of query can be improved by considering
closely related previous queries submitted through the same interface, along with their answers. In this paper, we present
an approach for enhancing the retrieval of relevant answers to a form-based Web query by adopting the data-mining approach
using previous, relevant queries and their answers. Experimental results on a randomly selected set of 3,800 documents retrieved
from various Web sites show that our data-mining, query-rewriting approach achieves average precision and true positive ratios
on rewritten queries in the upper 80% range, whereas the average false positive ratio is less than 2.0%.
Work partially done during a visit to BYU and partially supported by National Natural Science Foundation of China No. 60503036
and Fok YingTong Education Foundation No. 104027.
9.
《Information and Software Technology》2006,48(9):901-914
Database applications are becoming more versatile and broader in scope as organizations of all kinds expand. However, naïve users often struggle to access data freely using formal query languages. We therefore believe that accessing databases through natural language constructs will become a popular interface in the future. Object-oriented modeling allows the real world to be well represented in a logical form. Since the class diagram in UML is used to model the static relationships of databases, in this paper we study how to extend UML class diagram representations to capture natural language queries with fuzzy semantics. By referring to the conceptual schema throughout the class diagram representation, we propose a methodology to map natural language constructs into the corresponding class diagram and employ the Structured Object Model (SOM) methodology to transform the natural language queries into SQL statements for query execution. Moreover, our approach can handle queries containing vague terms specified by fuzzy modifiers, like ‘good’ or ‘bad’. With our approach, users obtain not only the query answers but also the corresponding degree of vagueness, which resembles the way people think.
10.
《Information Fusion》2007,8(1):70-83
Previous approaches in query processing do not consider queries to automatically combine results obtained from different information sources, i.e. they do not support information fusion. In this work, an approach for information fusion using a progressive query language and an interactive reasoner is therefore introduced. The system basically consists of a query processor with fusion capability and a reasoner with learning capability. This query processor first executes a query to produce some initial results. If the initial results are uninformative, then the reasoner, guided by the user, creates a more elaborate query by means of some rule and returns the query to the query processor to produce a more informative answer. What is novel in our approach is that application-dependent information fusion rules can be initially specified by the user and subsequently learned by the reasoner. Examples of progressive queries are drawn from multi-sensor information fusion applications.
11.
We present a new formal model of query and computation on the Web. We focus on two important aspects that distinguish the access to Web data from the access to a standard database system: the navigational nature of the access and the lack of concurrency control. We show that these two issues have significant effects on the computability of queries. To illustrate the ideas and how they can be used in practice for designing appropriate Web query languages, we consider a particular query language, the Web calculus, an abstraction and extension of the practical Web query language WebSQL.
12.
Plagiarism source retrieval is the core task of plagiarism detection. It has become the standard for plagiarism detection to use the queries extracted from suspicious documents to retrieve the plagiarism sources. Generating queries from a suspicious document is one of the most important steps in plagiarism source retrieval. Heuristic-based query generation methods are widely used in the current research. Each heuristic-based method has its own advantages, and no one statistically outperforms the others on all suspicious document segments when generating queries for source retrieval. Further improvements on heuristic methods for source retrieval rely mainly on the experience of experts. This leads to difficulties in putting forward new heuristic methods that can overcome the shortcomings of the existing ones. This paper paves the way for a new statistical machine learning approach to select the best queries from the candidates. The statistical machine learning approach to query generation for source retrieval is formulated as a ranking framework. Specifically, it aims to achieve the optimal source retrieval performance for each suspicious document segment. The proposed method exploits learning to rank to generate queries from the candidates. To our knowledge, our work is the first research to apply machine learning methods to resolve the problem of query generation for source retrieval. To solve the essential problem of an absence of training data for learning to rank, the building of training samples for source retrieval is also conducted. We rigorously evaluate various aspects of the proposed method on the publicly available PAN source retrieval corpus. With respect to the established baselines, the experimental results show that applying our proposed query generation method based on machine learning yields statistically significant improvements over baselines in source retrieval effectiveness.
13.
Information retrieval (IR) is the science of identifying documents or sub-documents from a collection of information or database. The collection is not necessarily available in only one language, as information does not depend on language. Monolingual IR is the process of retrieving information in the query language, whereas cross-lingual information retrieval (CLIR) is the process of retrieving information in a language that differs from the query language. In the current scenario, there is a strong demand for CLIR systems because they allow the user to expand the international scope of searching for a relevant document. Compared to monolingual IR, one of the biggest problems of CLIR is poor retrieval performance, which occurs due to query mismatching, multiple representations of query terms, and untranslated query terms. Query expansion (QE) is the process or technique of adding related terms to the original query for query reformulation. The purpose of QE is to improve the performance and quality of retrieved information in a CLIR system. In this paper, QE has been explored for Hindi–English CLIR, in which Hindi queries are used to search English documents. We used Okapi BM25 for document ranking, and then, by using term selection value, translated queries have been expanded. All experiments have been performed using the FIRE 2012 dataset. Our results show that the relevancy of Hindi–English CLIR can be improved by adding the lowest frequency term.
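As a rough illustration of the ranking-then-expansion pipeline this abstract describes, the sketch below scores documents with a common Okapi BM25 variant and then appends one term taken from the top-ranked documents. Several things here are assumptions rather than the paper's method: the IDF smoothing, the defaults `k1=1.2` and `b=0.75`, whitespace tokenization, and above all the selection rule, which simply picks the candidate with the lowest collection frequency as a stand-in for the term selection value computation the abstract mentions. The function names are hypothetical, and translation of the Hindi query is assumed to have happened already.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def expand_query(query_terms, docs, top_k=2):
    """Append the rarest term found in the top_k BM25-ranked documents."""
    scores = bm25_scores(query_terms, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    # Collection frequency: total occurrences of each term in all documents.
    cf = Counter(tok for d in docs for tok in d.lower().split())
    candidates = {tok for i in ranked[:top_k]
                  for tok in docs[i].lower().split()} - set(query_terms)
    if not candidates:
        return list(query_terms)
    rarest = min(candidates, key=lambda t: (cf[t], t))  # ties: alphabetical
    return list(query_terms) + [rarest]
```

Rare terms carry high IDF weight, so adding one sharpens the query toward the top-ranked documents; whether that helps in practice depends on how reliable the initial ranking is.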
14.
Identifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query
logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed user
clicks alone are misleading in judging a query as being ambiguous or not. In this paper, we address the problem of learning
a query ambiguity model by using search logs. First, we propose enriching a query by mining the documents clicked by users
and the relevant follow up queries in a session. Second, we use a text classifier to map the documents and the queries into
predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art
algorithm, Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that the sole
use of click based features or session based features performs worse than the previous work based on top retrieved documents.
When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% in terms
of accuracy. It significantly improves the click based method by 5.6% and the session based method by 4.6%.
15.
Yakov Kogan, David Michaeli, Yehoshua Sagiv, Oded Shmueli 《Data & Knowledge Engineering》1998,28(3):255-275
Current query languages for the Web (e.g., W3QL, WebLog and WebSQL) explore the structure of the Web. However, usually, the structure of the Web has little to do with the semantics of the data. Therefore, it is practically difficult to pose database queries over the Web. We introduce a new type of tags for denoting the semantics of data stored in HTML pages. These semantic tags (implemented as HTML comments) superimpose on HTML pages semistructured objects in the style of the OEM model. The paper discusses two implemented tools for fully utilizing the semantics. The first is a visualization tool for displaying both the HTML reading of Web pages and the OEM reading of Web pages. The second tool is a query language, similar to LOREL, that can query the HTML structure and/or the OEM reading. The above formalism and tools provide data-modeling capabilities for the Web that fit its heterogeneous nature. Real database queries, taking the OEM point of view, can be formulated, including queries about the schema as well as queries about the HTML structure of Web pages. Therefore, the query language is not restricted to portions of the Web in which semantic tags are used.
16.
Edgar Meij Marc Bron Laura Hollink Bouke Huurnink Maarten de Rijke 《Journal of Web Semantics》2011,9(4):418-433
We introduce the task of mapping search engine queries to DBpedia, a major linking hub in the Linking Open Data cloud. We propose and compare various methods for addressing this task, using a mixture of information retrieval and machine learning techniques. Specifically, we present a supervised machine learning-based method to determine which concepts are intended by a user issuing a query. The concepts are obtained from an ontology and may be used to provide contextual information, related concepts, or navigational suggestions to the user submitting the query. Our approach first ranks candidate concepts using a language modeling for information retrieval framework. We then extract query, concept, and search-history feature vectors for these concepts. Using manual annotations we inform a machine learning algorithm that learns how to select concepts from the candidates given an input query. Simply performing a lexical match between the queries and concepts is found to perform poorly and so does using retrieval alone, i.e., omitting the concept selection stage. Our proposed method significantly improves upon these baselines and we find that support vector machines are able to achieve the best performance out of the machine learning algorithms evaluated.
17.
Web systems, Web services, and Web-based publish/subscribe systems communicate events as XML messages and in many cases, require
composite event detection: it is not sufficient to react to single event messages, but events have to be considered in relation
to other events that are received over time. This entails a need for expressive, high-level languages for querying composite
events. Emphasizing language design and formal semantics, we describe the rule-based composite event query language XChangeEQ. XChangeEQ is designed to completely cover and integrate the four complementary querying dimensions: event data, event composition,
temporal relationships, and event accumulation. Semantics are provided as a model theory with accompanying fixpoint theory,
an approach that is established for rule languages but has not been applied to event queries so far. Because they are highly
declarative, thus easy to understand and well suited for query optimization, such semantics are desirable for event queries.
18.
Tak-Lam Wong 《Applied Intelligence》2012,36(4):918-931
We propose a framework for adapting a previously learned wrapper from a source Web site to unseen sites in different languages.
To achieve this, we exploit the previously learned information extraction knowledge and the previously extracted or collected
items in the source Web site. This knowledge and data are automatically translated to the same language as the unseen sites
via online Web resources such as online Web dictionaries or maps. Site independent features which capture the characteristics
of the content of the data are then derived from the translated information. Several text mining methods are employed to automatically
discover a set of machine labeled training examples in the unseen site. Both content oriented features and site dependent
features of the machine labeled training examples are used for learning the new wrapper for the new unseen site using our
language independent wrapper induction component. We conducted experiments on some real-world Web sites in different languages
to demonstrate the effectiveness of our framework.
19.
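A Method for Querying Databases with XML
This paper discusses a concrete implementation method for querying databases with XML. First, it proposes describing the relational data schema with a DTD and using ASP technology to convert database data into XML documents; then, it uses the XML query language XML-QL to carry out operations such as querying and data integration on Web databases.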
20.
Polynomial Time Learnability of Simple Deterministic Languages
This paper is concerned with the problem of learning simple deterministic languages. The algorithm described in this paper is based on the theory of model inference given by Shapiro. In our setting, however, nonterminal membership queries, except for the start symbol, are not permitted. Extended equivalence queries are used instead. Nonterminals that are necessary for a correct grammar and their intended models are introduced automatically. We give an algorithm that, for any simple deterministic language L, outputs a grammar G in 2-standard form, such that L = L(G), using membership queries and extended equivalence queries. We also show that the algorithm runs in time polynomial in the length of the longest counterexample and the number of nonterminals in a minimal grammar for L.