首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 140 毫秒
1.
We present an algorithm for converting a semantically meaningful SQL query into an equivalent algebraic expression. The relational algebra we employ consists of the following operators: union, intersection, difference, Cartesian product, selection, and projection. The SQL queries we consider can have an arbitrary level of nesting but are restricted in three ways. We assume that they do not contain an ORDER BY or a GROUP BY clause, all SELECTs are in fact SELECT DISTINCTs, and that no aggregate functions are used. The first two assumptions are made in order to remain faithful to the definition of a relation as a set of rows. The last assumption is needed since there is no standard for incorporating aggregate functions into a relational algebra. The research results in this paper will be useful in implementing an SQL user interface for database management systems that internally employ relational algebra. They can also be used in optimizing the evaluation of SQL queries.  相似文献   

2.
Identifier attributes—very high-dimensional categorical attributes such as particular product ids or people's names—rarely are incorporated in statistical modeling. However, they can play an important role in relational modeling: it may be informative to have communicated with a particular set of people or to have purchased a particular set of products. A key limitation of existing relational modeling techniques is how they aggregate bags (multisets) of values from related entities. The aggregations used by existing methods are simple summaries of the distributions of features of related entities: e.g., MEAN, MODE, SUM, or COUNT. This paper's main contribution is the introduction of aggregation operators that capture more information about the value distributions, by storing meta-data about value distributions and referencing this meta-data when aggregating—for example by computing class-conditional distributional distances. Such aggregations are particularly important for aggregating values from high-dimensional categorical attributes, for which the simple aggregates provide little information. In the first half of the paper we provide general guidelines for designing aggregation operators, introduce the new aggregators in the context of the relational learning system ACORA (Automated Construction of Relational Attributes), and provide theoretical justification. We also conjecture special properties of identifier attributes, e.g., they proxy for unobserved attributes and for information deeper in the relationship network. In the second half of the paper we provide extensive empirical evidence that the distribution-based aggregators indeed do facilitate modeling with high-dimensional categorical attributes, and in support of the aforementioned conjectures. Editors: Hendrik Blockeel, David Jensen and Stefan Kramer An erratum to this article is available at .  相似文献   

3.
Cost-sensitive learning with conditional Markov networks   总被引:1,自引:0,他引:1  
There has been a recent, growing interest in classification and link prediction in structured domains. Methods such as conditional random fields and relational Markov networks support flexible mechanisms for modeling correlations due to the link structure. In addition, in many structured domains, there is an interesting structure in the risk or cost function associated with different misclassifications. There is a rich tradition of cost-sensitive learning applied to unstructured (IID) data. Here we propose a general framework which can capture correlations in the link structure and handle structured cost functions. We present two new cost-sensitive structured classifiers based on maximum entropy principles. The first determines the cost-sensitive classification by minimizing the expected cost of misclassification. The second directly determines the cost-sensitive classification without going through a probability estimation step. We contrast these approaches with an approach which employs a standard 0/1-loss structured classifier to estimate class conditional probabilities followed by minimization of the expected cost of misclassification and with a cost-sensitive IID classifier that does not utilize the correlations present in the link structure. We demonstrate the utility of our cost-sensitive structured classifiers with experiments on both synthetic and real-world data.  相似文献   

4.
We study the power of two models of faulty teachers in Valiant’s PAC learning model and Angluin’s exact learning model. The first model we consider is learning from an incomplete membership oracle introduced by Angluin and Slonim [D. Angluin, D.K. Slonim, Randomly fallible teachers: Learning monotone DNF with an incomplete membership oracle, Machine Learning 14 (1) (1994) 7–26]. In this model, the answers to a random subset of the learner’s membership queries may be missing. The second model we consider is random persistent classification noise in membership queries introduced by Goldman, Kearns and Schapire [S. Goldman, M. Kearns, R. Schapire, Exact identification of read-once formulas using fixed points of amplification functions, SIAM Journal on Computing 22 (4) (1993) 705–726]. In this model, the answers to a random subset of the learner’s membership queries are flipped.  相似文献   

5.
Bottom-Up Induction of Feature Terms   总被引:2,自引:0,他引:2  
The aim of relational learning is to develop methods for the induction of hypotheses in representation formalisms that are more expressive than attribute-value representation. Most work on relational learning has been focused on induction in subsets of first order logic like Horn clauses. In this paper we introduce the representation formalism based on feature terms and we introduce the corresponding notions of subsumption and anti-unification. Then we explain INDIE, a heuristic bottom-up learning method that induces class hypotheses, in the form of feature terms, from positive and negative examples. The biases used in INDIE while searching the hypothesis space are explained while describing INDIE's algorithms. The representational bias of INDIE can be summarised in that it makes an intensive use of sorts and sort hierarchy, and in that it does not use negation but focuses on detecting path equalities. We show the results of INDIE in some classical relational datasets showing that it's able to find hypotheses at a level comparable to the original ones. The differences between INDIE's hypotheses and those of the other systems are explained by the bias in searching the hypothesis space and on the representational bias of the hypothesis language of each system.  相似文献   

6.
Data dependencies are useful to design relational databases. There is a strong connection between dependencies and some fragments of the propositional logic. In particular, functional dependencies are closely related to Horn formulas. Also, multivalued dependencies are characterized in terms of multivalued formulas. It is known that both Horn formulas and sets of functional dependencies are learnable in the exact model of learning with queries. Here we proof that neither multivalued formulas nor multivalued dependencies can be learned using only membership queries or only equivalence queries.  相似文献   

7.
We present a general rank-aware model of data which supports handling of similarity in relational databases. The model is based on the assumption that in many cases it is desirable to replace equalities on values in data tables by similarity relations expressing degrees to which the values are similar. In this context, we study various phenomena which emerge in the model, including similarity-based queries and similarity-based data dependencies. Central notion in our model is that of a ranked data table over domains with similarities which is our counterpart to the notion of relation on relation scheme from the classical relational model. Compared to other approaches which cover related problems, we do not propose a similarity-based or ranking module on top of the classical relational model. Instead, we generalize the very core of the model by replacing the classical, two-valued logic upon which the classical model is built by a more general logic involving a scale of truth degrees that, in addition to the classical truth degrees 0 and 1, contains intermediate truth degrees. While the classical truth degrees 0 and 1 represent nonequality and equality of values, and subsequently mismatch and match of queries, the intermediate truth degrees in the new model represent similarity of values and partial match of queries. Moreover, the truth functions of many-valued logical connectives in the new model serve to aggregate degrees of similarity. The presented approach is conceptually clean, logically sound, and retains most properties of the classical model while enabling us to employ new types of queries and data dependencies. Most importantly, similarity is not handled in an ad hoc way or by putting a “similarity module” atop the classical model in our approach. Rather, it is consistently viewed as a notion that generalizes and replaces equality in the very core of the relational model. We present fundamentals of the formal model and two equivalent query systems which are analogues of the classical relational algebra and domain relational calculus with range declarations. In the sequel to this paper, we deal with similarity-based dependencies.  相似文献   

8.
A random set based knowledge representation framework for learning linguistic models is presented. Within this framework a number of algorithms for learning prototypes are proposed, based on grouping certain sets of attributes and evaluating joint mass assignments on labels. These mass assignments can then be combined with a Semi-Naïve Bayes classifier in order to determine classification probabilities. The potential of such linguistic classifiers is then illustrated by their application to a number of toy and benchmark problems. This framework also allows for the evaluation of linguistic queries as will be demonstrated on several well known data sets.  相似文献   

9.
马尔可夫逻辑网络是将马尔可夫网络与一阶逻辑相结合的一种统计关系学习模型,在自然语言处理、复杂网络、信息抽取等领域都有重要的应用前景.较为全面、深入地总结了马尔可夫逻辑网络的理论模型、推理、权重和结构学习,最后指出了马尔可夫逻辑网络未来的主要研究方向.  相似文献   

10.
少样本学习是目前机器学习研究领域的一个热点,它能在少量的标记样本中学习到较好的分类模型.但是,在噪声的不确定环境中,传统的少样本学习模型泛化能力弱.针对这一问题,提出一种鲁棒性的少样本学习方法RFSL(Robust Few-Shot Learning).首先,使用核密度估计(Kernel Density Estimation,KDE)和图像滤波(Image Filtering)方法在训练集中加入不同的随机噪声,形成多个不同噪声下的训练集,并分别生成支持集和查询集.其次,利用关系网络的关系模块通过训练集端到端地学习多个基分类器.最后,采用投票的方式对各基分类器的最末Sigmoid层非线性分类结果进行融合.实验结果表明,RFSL模型可促进小样本学习快速收敛,同时,与R-Net以及其他主流少样本学习方法相比,RFSL具有更高的分类准确率,更强的鲁棒性.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号