Similar Articles
1.
In data mining, the usefulness of a data pattern depends on the user of the database and not solely on the statistical strength of the pattern. Based on the premise that heuristic search in combinatorial spaces built on computer and human cognitive theories is useful for effective knowledge discovery, this study investigates how self-organizing maps, used as a data visualization tool in data mining, play a significant role in human–computer interactive knowledge discovery. This article presents the conceptual foundations of integrating data visualization and query processing for knowledge discovery, and proposes a set of query functions for validating self-organizing maps in data mining. Received 1 November 1999 / Revised 2 March 2000 / Accepted in revised form 20 October 2000

2.
The skyline operator has been extensively explored in the literature, and most existing approaches assume that all dimensions are available for all data items. However, many practical applications, such as sensor networks, decision making, and location-based services, may involve incomplete data items, i.e., items with some dimensional values missing because of device failure or privacy preservation. To our knowledge, this paper is the first study of k-skyband (kSB) query processing on incomplete data, where multi-dimensional data items are missing some of their dimension values. We formalize the problem and then present two efficient algorithms for processing it. Our methods introduce several novel concepts, including expired skyline, shadow skyline, and thickness warehouse, to boost search performance. As a second step, we extend our techniques to constrained skyline (CS) and group-by skyline (GBS) queries over incomplete data. Extensive experiments with both real and synthetic data sets demonstrate the effectiveness and efficiency of the proposed algorithms under various experimental settings.
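The dominance test is the core of any skyline or k-skyband query. The Python sketch below assumes one common definition for incomplete data (two items are compared only on the dimensions both have observed, and smaller is better) and answers the k-skyband with a naive quadratic scan. It does not reproduce the paper's expired skyline, shadow skyline, or thickness warehouse optimizations, and the data are made up.

```python
from typing import List, Optional, Sequence

# None marks a missing dimension; smaller values are assumed to be better.
Point = Sequence[Optional[float]]


def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q on the dimensions observed in both points:
    p is no worse on every shared dimension and strictly better on at least one."""
    shared = [i for i in range(len(p)) if p[i] is not None and q[i] is not None]
    if not shared:
        return False  # nothing to compare, so no dominance
    no_worse = all(p[i] <= q[i] for i in shared)
    strictly_better = any(p[i] < q[i] for i in shared)
    return no_worse and strictly_better


def k_skyband(points: List[Point], k: int) -> List[int]:
    """Indices of points dominated by fewer than k other points (naive O(n^2) scan)."""
    result = []
    for j, q in enumerate(points):
        dominators = sum(1 for i, p in enumerate(points) if i != j and dominates(p, q))
        if dominators < k:
            result.append(j)
    return result


data = [
    (1.0, None, 3.0),
    (2.0, 2.0, None),
    (None, 1.0, 1.0),
    (3.0, 3.0, 4.0),
]
print(k_skyband(data, 1))  # [2]: with k = 1 this is the plain skyline
print(k_skyband(data, 2))  # [0, 2]
```

With missing dimensions, dominance is no longer guaranteed to be transitive, so pruning tricks from complete-data skyline algorithms cannot be reused blindly; that is what motivates specialized algorithms.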

3.
Self-organising maps (SOM) have become a commonly used cluster analysis technique in data mining. However, SOM cannot process incomplete data. To extend the data mining capability of SOM, this study proposes an SOM-based fuzzy map model for data mining with incomplete data sets. Using this model, incomplete data are translated into fuzzy data and used to generate fuzzy observations. These fuzzy observations, along with observations without missing values, are then used to train the SOM to generate fuzzy maps. Compared with the standard SOM approach, fuzzy maps generated by the proposed method can provide more information for knowledge discovery.

4.
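Attribute-Oriented Induction and Conceptual Clustering   Total citations: 3, self-citations: 1, citations by others: 3  
Wu Xiaorong (伍小荣), Xie Lihong (谢立宏). Computer Engineering (计算机工程), 2003, 29(5): 92-93, 123
Attribute-oriented induction is a recently proposed method widely used for knowledge discovery in databases. This article points out the close connection between this method and a machine learning method, conceptual clustering, and describes how a conceptual clustering algorithm can be used to perform attribute-oriented induction.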
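Attribute-oriented induction, in its basic form, climbs each attribute's concept hierarchy and merges tuples that become identical, keeping a count for each generalized tuple. The minimal Python sketch below shows one generalization step; the concept hierarchies, attribute names, and records are hypothetical, and the conceptual clustering algorithm discussed in the article is not reproduced.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical one-level concept hierarchies: raw value -> parent concept.
HIERARCHY: Dict[str, Dict[str, str]] = {
    "city": {"Beijing": "North China", "Tianjin": "North China",
             "Shanghai": "East China", "Hangzhou": "East China"},
    "major": {"Physics": "Science", "Chemistry": "Science", "History": "Humanities"},
}


def generalize(rows: List[Dict[str, str]], attrs: List[str]) -> Counter:
    """Replace each value by its parent concept, then merge identical tuples with counts."""
    return Counter(
        tuple(HIERARCHY[a].get(row[a], row[a]) for a in attrs)  # values without a parent stay
        for row in rows
    )


students = [
    {"city": "Beijing", "major": "Physics"},
    {"city": "Tianjin", "major": "Chemistry"},
    {"city": "Shanghai", "major": "History"},
    {"city": "Hangzhou", "major": "History"},
    {"city": "Beijing", "major": "History"},
]
for generalized_tuple, count in generalize(students, ["city", "major"]).items():
    print(generalized_tuple, count)
```

Five raw tuples collapse into three generalized ones; repeating the step with higher-level concepts continues the induction until the relation is small enough to summarize.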

5.
A text mining approach for automatic construction of hypertexts   Total citations: 1, self-citations: 0, citations by others: 1  
Research on automatic hypertext construction has grown rapidly in the last decade because there is an urgent need to translate the gigantic amount of legacy documents into web pages. Unlike traditional ‘flat’ texts, a hypertext contains a number of navigational hyperlinks that point to related hypertexts or to locations within the same hypertext. Traditionally, these hyperlinks were constructed by the creators of the web pages, with or without the help of authoring tools. However, the gigantic amount of documents produced each day makes such manual construction impractical. Thus an automatic hypertext construction method is necessary for content providers to efficiently produce adequate information for web surfers. Although most web pages contain non-textual data such as images, sounds, and video clips, text still contributes the major part of the information on the pages. It is therefore not surprising that most automatic hypertext construction methods inherit from traditional information retrieval research. In this work, we propose a new automatic hypertext construction method based on a text mining approach. Our method applies the self-organizing map algorithm to cluster flat text documents in a training corpus and generate two maps. We then use these maps to identify the sources and destinations of important hyperlinks within these training documents. The constructed hyperlinks are then inserted into the training documents to translate them into hypertext form. These translated documents form the new corpus. Incoming documents can also be translated into hypertext form and added to the corpus through the same approach. Our method has been tested on a set of flat text documents collected from a newswire site. Although we used only Chinese text documents, our approach can be applied to any documents that can be transformed into a set of index terms.

6.
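A Survey of Population Data Prediction Models Based on Data Mining   Total citations: 3, self-citations: 1, citations by others: 2
This paper surveys population data prediction models based on data mining techniques, both in China and abroad. The models are classified and compared according to their prediction goals; on that basis, the strengths and weaknesses of each model are summarized and directions for future research are discussed.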

7.
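Incomplete data are divided into two categories: missing attribute values and hidden attribute values. Data mining methods for each category are examined, corresponding handling methods are given, and these methods and their applications are discussed. Missing attribute values are handled mainly by a series of "gap-filling" (imputation) methods that turn the data into a complete data set; hidden attribute values are handled by using the EM algorithm to optimize the model parameters, compensating for the incompleteness of the data.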
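For the hidden-attribute case, the EM idea can be illustrated with a two-component Gaussian mixture in which each record's component label plays the role of the hidden attribute. This Python sketch assumes one-dimensional data, two components, and fixed starting means; it illustrates EM in general, not the specific models discussed in the paper.

```python
import math
from typing import List


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs: List[float], mu1: float, mu2: float, iters: int = 50):
    """EM for a 1-D mixture of two Gaussians; each point's component label is hidden."""
    var1 = var2 = 1.0
    pi1 = 0.5  # mixing weight of component 1
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to component 1
        resp = []
        for x in xs:
            a = pi1 * normal_pdf(x, mu1, var1)
            b = (1 - pi1) * normal_pdf(x, mu2, var2)
            resp.append(a / (a + b))
        # M-step: re-estimate the parameters from the soft assignments
        n1 = sum(resp)
        n2 = len(xs) - n1
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, xs)) / n2
        var1 = max(sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1, 1e-6)
        var2 = max(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / n2, 1e-6)
        pi1 = n1 / len(xs)
    return (mu1, var1), (mu2, var2), pi1


xs = [0.9, 1.0, 1.1, 1.2, 0.8, 5.0, 5.1, 4.9, 5.2, 4.8]
(m1, v1), (m2, v2), weight = em_two_gaussians(xs, mu1=0.0, mu2=6.0)
print(round(m1, 2), round(m2, 2), round(weight, 2))  # 1.0 5.0 0.5
```

The variance floor of 1e-6 is an arbitrary safeguard against a component collapsing onto a single point.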

8.
Handling of incomplete data sets using ICA and SOM in data mining   Total citations: 1, self-citations: 0, citations by others: 1  
Based on independent component analysis (ICA) and self-organizing maps (SOM), this paper proposes an ISOM-DH model for handling incomplete data in data mining. When the data are dependent and non-Gaussian, this model can make full use of the information in the given data to estimate the missing values, and it can visualize the resulting high-dimensional data. Compared with the mixture of principal component analyzers (MPCA), the mean method, and the standard SOM-based fuzzy map model, the ISOM-DH model applies to more cases, demonstrating its superiority. The correctness and reasonableness of the ISOM-DH model are also validated by the experiments carried out in this paper.

9.
Learning often occurs through comparison. In classification learning, most existing methods compare data groups by comparing either their raw instances or their learned classification rules. This paper takes a different approach, conceptual equivalence: groups are equivalent if their underlying concepts are equivalent, even though their instance spaces need not overlap and their rule sets need not look the same. A new comparison methodology is proposed that learns a representation of each group's underlying concept and cross-examines each group's instances against the other group's concept representation. The innovation is fivefold. First, it can quantify the degree of conceptual equivalence between two groups. Second, it can trace the source of discrepancy at two levels: an abstract level of underlying concepts and a specific level of instances. Third, it applies to numeric as well as categorical data. Fourth, it avoids direct comparisons between (possibly many) rules, which demand substantial effort. Fifth, it reduces dependency on the accuracy of the classification algorithms employed. Empirical evidence suggests that this methodology is effective and simple to use in scenarios such as noise cleansing and concept-change learning.
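The cross-examination idea can be sketched as follows, assuming (hypothetically) that each group's concept is represented by a 1-nearest-neighbour classifier over its own labeled instances and that the degree of equivalence is the fraction of instances on which the other group's concept agrees with the recorded label. The paper's actual concept learner and equivalence measure may differ; the disagreeing instances returned correspond to tracing discrepancies at the instance level.

```python
from typing import Callable, List, Tuple

Instance = Tuple[Tuple[float, ...], str]  # (feature vector, class label)


def one_nn_concept(group: List[Instance]) -> Callable[[Tuple[float, ...]], str]:
    """Hypothetical concept representation: 1-nearest-neighbour over the group's instances."""
    def predict(x: Tuple[float, ...]) -> str:
        nearest = min(group, key=lambda inst: sum((a - b) ** 2 for a, b in zip(inst[0], x)))
        return nearest[1]
    return predict


def cross_examine(g1: List[Instance], g2: List[Instance]) -> Tuple[float, List[Instance]]:
    """Check each group's instances against the other group's concept.
    Returns the agreement rate in [0, 1] and the instances whose labels disagree."""
    concept1, concept2 = one_nn_concept(g1), one_nn_concept(g2)
    disagreements = [inst for inst in g2 if concept1(inst[0]) != inst[1]]
    disagreements += [inst for inst in g1 if concept2(inst[0]) != inst[1]]
    agreement = 1 - len(disagreements) / (len(g1) + len(g2))
    return agreement, disagreements


group_a = [((0.0,), "low"), ((1.0,), "low"), ((9.0,), "high"), ((10.0,), "high")]
group_b = [((0.5,), "low"), ((9.5,), "high"), ((5.5,), "low")]
score, culprits = cross_examine(group_a, group_b)
print(round(score, 3), culprits)  # 0.857 [((5.5,), 'low')]
```

The two groups share no instances, yet the comparison still isolates the single point where their concepts diverge.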

10.
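Research Progress on Cluster Analysis in Data Mining   Total citations: 1, self-citations: 0, citations by others: 1  
Cluster analysis is an important data mining technique. This paper reviews several existing cluster analysis methods, points out their strengths and weaknesses, summarizes the main research directions in cluster analysis, and discusses its prospects.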

11.
Data for classification are often incomplete. The multiple-values construction method (MVCM) can be used to include data with missing values in classification. In this study, MVCM is implemented using fuzzy set theory in the context of classification with discrete data. With the fuzzy-set-based MVCM, data with missing values can add value to classification, but they can also introduce excessive uncertainty. Furthermore, the computational cost of using incomplete data can be prohibitive if the number of missing values is large. This paper discusses the association between classification performance and the use of incomplete data, and proposes an algorithm for near-optimal use of incomplete classification data. An experiment with real-world data demonstrates the usefulness of the algorithm.
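A minimal sketch of multiple-values construction for discrete data: each missing value is replaced by every value in the attribute's domain, and each constructed record carries a membership degree. The uniform degree, the function name `expand_record`, and the example domains are assumptions, not the paper's fuzzy-set formulation. The example also shows how the number of constructed records multiplies with every additional missing value, which is the computational cost the abstract warns about.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def expand_record(record: Dict[str, Optional[str]],
                  domains: Dict[str, List[str]]) -> List[Tuple[Dict[str, str], float]]:
    """Construct one complete record per combination of values for the missing (None) attributes.
    Each constructed record gets a membership degree; here, hypothetically, uniform."""
    missing = [attr for attr, value in record.items() if value is None]
    # product() of zero domains yields a single empty combination, so complete records pass through.
    combos = list(product(*(domains[attr] for attr in missing)))
    degree = 1.0 / len(combos)
    expanded = []
    for combo in combos:
        filled = dict(record)
        filled.update(zip(missing, combo))
        expanded.append((filled, degree))
    return expanded


domains = {"outlook": ["sunny", "rain", "overcast"], "windy": ["yes", "no"]}
print(len(expand_record({"outlook": None, "windy": "no"}, domains)))  # 3
print(len(expand_record({"outlook": None, "windy": None}, domains)))  # 6
```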

12.
Biclusters are subsets of genes that exhibit similar behavior over a set of conditions. A biclustering algorithm is a useful tool for uncovering groups of genes involved in the same cellular processes and the groups of conditions under which these processes take place. In this paper, we propose a polynomial-time algorithm to identify functionally highly correlated biclusters. Our algorithm identifies (1) gene sets that simultaneously exhibit additive, multiplicative, and combined patterns and tolerate high levels of noise, (2) multiple, possibly overlapping, and diverse gene sets, (3) biclusters that simultaneously contain negatively and positively correlated gene sets, and (4) gene sets with very high functional association. We validate the level of functional association of our method using the GO database, protein-protein interactions, and KEGG pathways.
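To make the "additive" and "multiplicative" patterns concrete, the sketch below uses the mean squared residue of Cheng and Church, a standard bicluster score that is zero for a perfectly additive submatrix; a multiplicative pattern becomes additive after taking logarithms. This score is a swapped-in illustration, not the correlation measure used by the proposed algorithm.

```python
import math
from typing import List


def mean_squared_residue(m: List[List[float]]) -> float:
    """Cheng-Church score: mean over cells of (a_ij - rowmean_i - colmean_j + overall mean)^2."""
    rows, cols = len(m), len(m[0])
    row_mean = [sum(r) / cols for r in m]
    col_mean = [sum(m[i][j] for i in range(rows)) / rows for j in range(cols)]
    overall = sum(row_mean) / rows
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            total += (m[i][j] - row_mean[i] - col_mean[j] + overall) ** 2
    return total / (rows * cols)


additive = [[1, 2, 4], [3, 4, 6], [0, 1, 3]]          # each row = base row + a shift
multiplicative = [[1, 2, 4], [2, 4, 8], [3, 6, 12]]   # each row = base row * a scale
logged = [[math.log(v) for v in row] for row in multiplicative]
print(mean_squared_residue(additive))        # close to 0
print(mean_squared_residue(multiplicative))  # clearly positive
print(mean_squared_residue(logged))          # close to 0 after the log transform
```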

13.
This paper proposes using the information within incomplete instances (instances with missing values) when estimating missing values. Accordingly, a simple and efficient nonparametric iterative imputation algorithm, the NIIA method, is designed for iteratively imputing missing target values. The NIIA method imputes each missing value several times until the algorithm converges. In the first iteration, all the complete instances are used to estimate missing values; from the second iteration onward, the information within incomplete instances is used as well. Experiments evaluating the method demonstrate that (1) using the information within incomplete instances makes it easier to capture the distribution of a dataset, and (2) the NIIA method outperforms existing methods in accuracy, an advantage that is especially pronounced when datasets have a high missing ratio.
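A minimal sketch of the iterate-until-convergence scheme, assuming a Gaussian-kernel (Nadaraya-Watson) estimate of a missing target y from a single feature x. The kernel, bandwidth, tolerance, and the name `iterative_impute` are assumptions; only the donor schedule (complete instances in the first pass, all other instances afterwards) follows the abstract.

```python
import math
from typing import List, Optional


def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u)


def iterative_impute(xs: List[float], ys: List[Optional[float]], bandwidth: float = 1.0,
                     tol: float = 1e-6, max_iter: int = 100) -> List[float]:
    """Impute missing targets ys[i] (None) from xs with a kernel-weighted average.
    Assumes at least one complete instance exists."""
    missing = [i for i, y in enumerate(ys) if y is None]
    complete = [i for i, y in enumerate(ys) if y is not None]
    filled = list(ys)
    if not missing:
        return filled

    def estimate(i: int, donors: List[int]) -> float:
        weights = [gaussian_kernel((xs[i] - xs[j]) / bandwidth) for j in donors]
        return sum(w * filled[j] for w, j in zip(weights, donors)) / sum(weights)

    # Iteration 1: only complete instances act as donors.
    for i in missing:
        filled[i] = estimate(i, complete)
    # Later iterations: every other instance, including previously imputed ones, is a donor.
    for _ in range(max_iter):
        new = {i: estimate(i, [j for j in range(len(xs)) if j != i]) for i in missing}
        change = max(abs(new[i] - filled[i]) for i in missing)
        for i, value in new.items():
            filled[i] = value
        if change < tol:
            break
    return filled


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, None, 8.0, None, 12.0]
print([round(v, 2) for v in iterative_impute(xs, ys)])
```

Because the kernel estimate smooths over neighbours, the imputed values land near, but not exactly on, the underlying line y = 2x.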

14.
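Traditional representations of association rules cannot express domain knowledge, the relationships among data items, or the information implicit in the rules. To address this, a knowledge representation method for association rules based on conceptual graphs is proposed. The method consists of pattern definition and pattern parsing and, drawing on conceptual graph theory, converts association rules into a conceptual-graph knowledge representation. An algorithm for representing association rules as conceptual graphs is given, and it is implemented and analyzed using the complete population data of one province as the data source. Experimental results show that the method performs well in representing population information.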

15.
Many recent papers have dealt with the application of feedforward neural networks to financial data processing. This powerful neural model can implement very complex nonlinear mappings, but when outputs are not available or clustering of patterns is required, unsupervised models such as self-organizing maps are more suitable. The present work shows the capabilities of self-organizing feature maps for analysing and representing financial data and for aiding financial decision-making. For this purpose, we use this unsupervised model to analyse the Spanish banking crisis of 1977–1985 and the Spanish economic situation in 1990 and 1991. Emphasis is placed on the analysis of the synaptic weights, which are fundamental for delimiting regions on the map, such as bankrupt or solvent regions, where similar companies are clustered. The time evolution of the companies and other important conclusions can be drawn from the resulting maps.

Symbols used and their meaning:
- $n_x$: x dimension of the neuron grid, in number of neurons
- $n_y$: y dimension of the neuron grid, in number of neurons
- $n$: dimension of the input vector (number of input variables)
- $(i, j)$: indices of a neuron on the map
- $k$: index of the input variables
- $w_{ijk}$: synaptic weight connecting the $k$-th input to neuron $(i, j)$ on the map
- $W_{ij}$: weight vector of neuron $(i, j)$
- $x_k$: $k$-th component of the input vector
- $X$: input vector
- $\alpha(t)$: learning rate
- $\alpha_0$: starting learning rate
- $\alpha_f$: final learning rate
- $R(t)$: neighbourhood radius
- $R_0$: starting neighbourhood radius
- $R_f$: final neighbourhood radius
- $t$: iteration counter
- $t_{Rf}$: number of iterations until reaching $R_f$
- $t_{\alpha f}$: number of iterations until reaching $\alpha_f$
- $h(\cdot)$: lateral interaction function
- $\sigma$: standard deviation
- $\forall$: for every
- $d(x, y)$: distance between the vectors $x$ and $y$
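The quantities in the symbol list above map directly onto the standard Kohonen training loop. The Python sketch below assumes a linear decay of the learning rate and of the neighbourhood radius from their starting to their final values over the stated numbers of iterations (`t_af`, `t_rf`), a Gaussian lateral interaction function whose standard deviation equals the current radius, and Euclidean distance. These choices, the parameter names, and all default values are assumptions, not the settings used in the study.

```python
import math
import random
from typing import List, Tuple

Grid = List[List[List[float]]]  # w[i][j] is the weight vector of neuron (i, j)


def schedule(start: float, final: float, t: int, t_final: int) -> float:
    """Linear decay from start to final over t_final iterations, then held at final."""
    if t >= t_final:
        return final
    return start + (final - start) * t / t_final


def best_matching_unit(w: Grid, x: List[float]) -> Tuple[int, int]:
    """Grid indices (i, j) of the neuron whose weight vector is closest to x."""
    cells = [(i, j) for i in range(len(w)) for j in range(len(w[0]))]
    return min(cells, key=lambda ij: sum((a - b) ** 2 for a, b in zip(x, w[ij[0]][ij[1]])))


def train_som(data: List[List[float]], nx: int, ny: int,
              alpha0: float = 0.5, alpha_f: float = 0.01, t_af: int = 1000,
              r0: float = 2.0, r_f: float = 0.5, t_rf: int = 1000, seed: int = 0) -> Grid:
    rng = random.Random(seed)
    n = len(data[0])
    w = [[[rng.random() for _ in range(n)] for _ in range(ny)] for _ in range(nx)]
    for t in range(max(t_af, t_rf)):
        alpha = schedule(alpha0, alpha_f, t, t_af)  # learning rate at iteration t
        radius = schedule(r0, r_f, t, t_rf)         # neighbourhood radius at iteration t
        x = data[rng.randrange(len(data))]
        bi, bj = best_matching_unit(w, x)
        for i in range(nx):
            for j in range(ny):
                # Gaussian lateral interaction; its standard deviation is taken equal to the radius
                grid_d2 = (i - bi) ** 2 + (j - bj) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))
                for k in range(n):
                    w[i][j][k] += alpha * h * (x[k] - w[i][j][k])
    return w


data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(data, nx=3, ny=3)
print(best_matching_unit(som, [0.0, 0.0]), best_matching_unit(som, [1.0, 1.0]))
```

The two well-separated toy clusters should end up on different neurons, which is the property the paper exploits to delimit regions such as solvent and bankrupt companies.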

16.
Persistent homology is a computationally intensive yet extremely powerful tool for Topological Data Analysis. Applying the tool to a potentially infinite sequence of data objects is challenging. For this reason, persistent homology and data stream mining have long been two important but disjoint areas of data science. The first computational model to bridge the gap between the two areas, introduced recently, is useful for detecting steady or gradual changes in data streams, such as certain genomic modifications during the evolution of species. However, that model is not suitable for applications that encounter abrupt changes of extremely short duration. This paper presents another model for computing persistent homology on streaming data that addresses this shortcoming of the previous work. The model is validated on the important real-world application of network anomaly detection. It is shown that, in addition to detecting the occurrence of anomalies or attacks in computer networks, the proposed model can visually identify several types of traffic. Moreover, the model can accurately detect abrupt changes of extremely short as well as longer duration in network traffic. These capabilities are not achievable by the previous model or by traditional data mining techniques.
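The basic computation underneath persistent homology on a stream can be illustrated with 0-dimensional persistence on sliding windows. The Python sketch below is not the model proposed in the paper: it computes only the H0 barcode of the Vietoris-Rips filtration (via Kruskal-style union-find, with an edge entering the filtration at its length) for each window of a toy one-dimensional stream, and uses the longest finite bar as a crude change signal. Window size, step, and data are made up.

```python
import math
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def h0_barcode(points: Sequence[Point]) -> List[Tuple[float, float]]:
    """0-dimensional persistence of the Vietoris-Rips filtration.
    Every point is born at 0; a component dies when an edge first merges it into another."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    bars = []
    for length, i, j in edges:  # Kruskal order: each successful union kills one component
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))
    bars.append((0.0, math.inf))  # the component that never dies
    return bars


def sliding_windows(stream: Sequence[Point], size: int, step: int) -> Iterator[Sequence[Point]]:
    for start in range(0, len(stream) - size + 1, step):
        yield stream[start:start + size]


stream = [(0.0,), (0.1,), (0.2,), (0.1,), (5.0,), (0.2,), (0.1,), (0.0,), (0.2,)]
longest = [max(d for _, d in h0_barcode(w) if d != math.inf)
           for w in sliding_windows(stream, size=4, step=1)]
print([round(d, 2) for d in longest])  # [0.1, 4.8, 4.8, 4.8, 4.8, 0.1]
```

The spike at 5.0 shows up as a long finite bar in every window that contains it, and the signal falls back once the window slides past it.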

17.
Clustering is an important data mining problem. However, most earlier work on clustering focused on numeric attributes, which have a natural ordering of their values. Recently, clustering data with categorical attributes, whose values have no natural ordering, has received more attention. A common issue in cluster analysis is that there is no single correct answer for the number of clusters, since cluster analysis involves subjective human judgement. Interactive visualization is one method by which users can choose appropriate clustering parameters. In this paper, a new clustering approach called CDCS (Categorical Data Clustering with Subjective factors) is introduced, together with a visualization tool for clustered categorical data that instantly reflects the result of adjusting parameters. Experiments show that CDCS generates high-quality clusters compared with other typical algorithms.
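CDCS's own algorithm is not described in the abstract, so the sketch below uses k-modes, a well-known partitioning method for categorical data, as a stand-in. It shows why such data need a non-numeric dissimilarity (here, the count of mismatched categories) and a per-attribute mode instead of a mean. The number of clusters is still an input here, which is exactly the subjective parameter CDCS's visualization is meant to help users choose; the data and starting modes are made up.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def mismatch(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple matching dissimilarity: number of attributes whose categories differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(data: List[Row], modes: List[Row]) -> List[List[Row]]:
    """Put each row in the cluster of its least dissimilar mode."""
    clusters: List[List[Row]] = [[] for _ in modes]
    for row in data:
        best = min(range(len(modes)), key=lambda c: mismatch(row, modes[c]))
        clusters[best].append(row)
    return clusters


def k_modes(data: List[Row], modes: List[Row], max_iter: int = 10):
    for _ in range(max_iter):
        clusters = assign(data, modes)
        # New mode: the most frequent category of each attribute within the cluster
        new_modes = [tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
                     if members else mode
                     for members, mode in zip(clusters, modes)]
        if new_modes == modes:
            break
        modes = new_modes
    return modes, assign(data, modes)


fruit = [
    ("red", "small", "round"), ("red", "small", "oval"), ("red", "medium", "round"),
    ("blue", "large", "square"), ("blue", "large", "rect"), ("green", "large", "square"),
]
modes, clusters = k_modes(fruit, [fruit[0], fruit[3]])
print(modes, [len(c) for c in clusters])
```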

18.
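Fuzzy Data Mining   Total citations: 5, self-citations: 0, citations by others: 5  
Building on knowledge discovery in databases (KDD) and data mining (DM) techniques, this paper proposes the concepts and techniques of knowledge discovery in fuzzy databases (KDFD) and fuzzy data mining (FDM), and gives an FDM algorithm that can effectively mine potentially valuable knowledge from fuzzy databases. The paper specifically discusses the mining of fuzzy association rules and fuzzy data dependencies.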
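Fuzzy association rules are commonly scored by fuzzy support and confidence. The Python sketch below assumes the widely used definition in which a record's degree of support for an itemset is the minimum of its membership degrees (min t-norm), averaged over records. The paper's own FDM algorithm and definitions are not reproduced, and the fuzzy items and degrees are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, float]  # fuzzy item -> membership degree in [0, 1]


def fuzzy_support(records: List[Record], itemset: List[str]) -> float:
    """Mean over records of the minimum membership across the itemset (min t-norm)."""
    return sum(min(r.get(item, 0.0) for item in itemset) for r in records) / len(records)


def fuzzy_confidence(records: List[Record], antecedent: List[str], consequent: List[str]) -> float:
    return fuzzy_support(records, antecedent + consequent) / fuzzy_support(records, antecedent)


records = [
    {"age=young": 0.8, "income=high": 0.3},
    {"age=young": 0.6, "income=high": 0.7},
    {"age=young": 0.2, "income=high": 0.9},
    {"age=young": 1.0, "income=high": 0.5},
]
print(round(fuzzy_support(records, ["age=young"]), 3))                      # 0.65
print(round(fuzzy_confidence(records, ["age=young"], ["income=high"]), 3))  # 0.615
```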

19.
We present a knowledge discovery method for graded attributes that is based on interactively determining the implications (if-then rules) that hold between the attributes of a given data set. The corresponding algorithm queries the user efficiently about implications between the attributes. The result of the process is a representative set of examples for the entire theory and a set of implications from which all implications holding between the attributes can be deduced. In many instances, the exploration process can be shortened by using the user's background knowledge, that is, a set of implications the user knows beforehand. The method was successfully applied in various real-life applications for discrete data. In this paper, we show that attribute exploration with background information can be generalized to graded attributes.
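Two operations sit at the core of attribute exploration: checking whether an implication holds in the data, and deducing further attributes from known implications (closure), which is how background knowledge lets the algorithm skip questions. The Python sketch below handles crisp (yes/no) attributes only; the graded generalization and the interactive query loop from the paper are not shown, and the example objects and implications are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Implication = Tuple[FrozenSet[str], FrozenSet[str]]


def holds(objects: List[Set[str]], premise: Set[str], conclusion: Set[str]) -> bool:
    """An implication holds if every object having all premise attributes has all conclusion ones."""
    return all(conclusion <= obj for obj in objects if premise <= obj)


def closure(attrs: Set[str], implications: List[Implication]) -> Set[str]:
    """Everything deducible from attrs under the implications (naive fixpoint iteration)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= result and not conclusion <= result:
                result |= conclusion
                changed = True
    return result


animals = [
    {"bird", "flies", "feathers"},  # sparrow
    {"bird", "feathers"},           # penguin
    {"mammal", "flies"},            # bat
    {"mammal"},                     # dog
]
print(holds(animals, {"bird"}, {"feathers"}))  # True
print(holds(animals, {"flies"}, {"bird"}))     # False: the bat is a counterexample

background = [(frozenset({"bird"}), frozenset({"feathers"})),
              (frozenset({"feathers"}), frozenset({"lays_eggs"}))]
print(sorted(closure({"bird"}, background)))   # ['bird', 'feathers', 'lays_eggs']
```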

20.
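Environmental issues are receiving increasing attention, and the large volume of air pollution data accumulated so far provides a solid basis for daily air quality forecasting. The Atmosphere Environment Forecast System (AEFS) is an environmental quality forecasting system developed using data mining techniques. The system mainly applies rough set theory and online analytical processing (OLAP), and has achieved good results.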
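AEFS's attributes and rules are not described here, so the sketch below only illustrates the basic rough set construction such a system builds on: objects with identical condition-attribute values are indiscernible, and a target set is approximated from below and above by these equivalence classes. The table of discretized days and the "poor air quality" target are hypothetical, and OLAP is not shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple


def equivalence_classes(table: Dict[str, Tuple[Hashable, ...]]) -> List[Set[str]]:
    """Group objects whose condition-attribute values are identical (indiscernible)."""
    groups: Dict[Tuple[Hashable, ...], Set[str]] = defaultdict(set)
    for obj, values in table.items():
        groups[values].add(obj)
    return list(groups.values())


def approximations(table: Dict[str, Tuple[Hashable, ...]],
                   target: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Lower approximation: classes entirely inside target; upper: classes that touch target."""
    lower: Set[str] = set()
    upper: Set[str] = set()
    for block in equivalence_classes(table):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper


# Hypothetical discretized days: (wind speed, humidity)
days = {
    "d1": ("low", "high"), "d2": ("low", "high"),
    "d3": ("high", "low"),
    "d4": ("high", "high"), "d5": ("high", "high"),
}
poor_air = {"d1", "d2", "d4"}
lower, upper = approximations(days, poor_air)
print(sorted(lower), sorted(upper))  # ['d1', 'd2'] ['d1', 'd2', 'd4', 'd5']
```

Days d4 and d5 fall in the boundary region: their recorded conditions are identical, yet only one had poor air quality, so these attributes alone cannot decide the class.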
