Similar Documents
20 similar documents found (search time: 765 ms)
1.
Fisher Score (FS) is a fast and efficient measure of a feature's discriminative power, but the traditional FS criterion can neither be applied directly to multi-label learning nor effectively handle the deviation between the computed class center and the true class center caused by extreme-valued samples. This paper proposes an FS-based multi-label feature selection algorithm that combines center offset with correlations within the label set. For each class under each label, the algorithm locates the extreme points and filters a new sample set using the distance from the extreme points to the class center multiplied by a radius coefficient, thereby obtaining a more densely distributed sample set on which each feature's FS score is computed. It then traverses every label in the label set of all samples, adaptively assigning label weights to samples with more labels during the traversal, and obtains the average FS score of each feature. Features are ranked by their FS scores and filtered to produce the target subset, achieving the feature selection goal. Parameter analysis and performance comparison on five metrics were carried out on eight public multi-label text datasets. The results show that the algorithm is effective and robust, and outperforms multi-label feature selection algorithms such as MLNB, MLRF, PMU, and MLACO on most metrics.
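As a rough illustration of the core scoring step (not the paper's full multi-label procedure), the sketch below computes per-feature Fisher Scores for a single label after discarding samples that lie outside a radius-scaled distance from their class center; the radius coefficient `alpha` and the exact filtering rule are assumptions made for illustration only.

```python
import numpy as np

def fisher_scores(X, y, alpha=0.8):
    """Per-feature Fisher Score for one label, after a crude center-offset filter.

    Samples farther from their class center than alpha * (max distance in that
    class) are dropped, roughly mimicking the densified sample set idea.
    """
    X, y = np.asarray(X, float), np.asarray(y)
    keep = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        center = X[idx].mean(axis=0)
        dist = np.linalg.norm(X[idx] - center, axis=1)
        keep[idx[dist <= alpha * dist.max()]] = True      # densified subset
    Xk, yk = X[keep], y[keep]
    mu = Xk.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(yk):
        Xc = Xk[yk == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2      # between-class scatter
        den += len(Xc) * Xc.var(axis=0)                   # within-class scatter
    return num / np.maximum(den, 1e-12)

# rank features by score and keep the top 5
X = np.random.rand(100, 20); y = np.random.randint(0, 2, 100)
top_k = np.argsort(fisher_scores(X, y))[::-1][:5]
```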

2.
3.
Protein function prediction is an important problem in functional genomics. Typically, protein sequences are represented by feature vectors. A major problem of protein datasets, which increases the complexity of classification models, is their large number of features. Feature selection (FS) techniques are used to deal with this high-dimensional feature space. In this paper, we propose a novel feature selection algorithm that combines genetic algorithms (GA) and ant colony optimization (ACO) for faster and better search capability. The hybrid algorithm makes use of the advantages of both ACO and GA. The proposed algorithm is easy to implement and, because it uses a simple classifier, its computational complexity is very low. The performance of the proposed algorithm is compared to that of two prominent population-based algorithms, ACO and genetic algorithms. Experimentation is carried out on two challenging biological datasets, involving the hierarchical functional classification of GPCRs and enzymes. The criteria used for comparison are maximizing predictive accuracy and finding the smallest subset of features. The results of the experiments indicate the superiority of the proposed algorithm.
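As a hedged illustration of the wrapper idea only (a plain GA with a simple classifier, not the paper's GA–ACO hybrid), the following sketch evolves binary feature masks and scores them with a k-NN classifier; the population size, mutation rate, and the penalty weighting between accuracy and subset size are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_feat, pop_size, n_gen = X.shape[1], 20, 15

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(3), X[:, mask.astype(bool)], y, cv=3).mean()
    return acc - 0.01 * mask.sum() / n_feat               # favour small subsets

pop = rng.integers(0, 2, size=(pop_size, n_feat))
for _ in range(n_gen):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-pop_size // 2:]]    # truncation selection
    cut = rng.integers(1, n_feat, size=pop_size // 2)
    children = np.array([np.r_[parents[i % len(parents)][:c],
                               parents[(i + 1) % len(parents)][c:]]
                         for i, c in enumerate(cut)])     # one-point crossover
    children ^= (rng.random(children.shape) < 0.05).astype(int)  # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```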

4.
Semi-supervised clustering based on affinity propagation   (total citations: 31; self-citations: 2; citations by others: 29)
肖宇  于剑 《软件学报》2008,19(11):2803-2813
A semi-supervised clustering method based on the affinity propagation (AP) algorithm is proposed. AP performs clustering on the similarity matrix of the data points. For very large datasets, AP is a fast and effective clustering method in a way that traditional clustering algorithms, such as k-centers clustering, cannot match. However, for datasets with relatively complex cluster structures, AP often fails to produce good clustering results. The proposed method adjusts the similarity matrix using known labeled data or pairwise constraints, thereby improving the clustering performance of AP. Experimental results show that the method not only improves AP's results on complex data, but also outperforms the compared algorithms when the number of constraints is large.
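A minimal sketch of the similarity-adjustment idea, assuming negative squared Euclidean distance as the base similarity and simple must-link / cannot-link overrides; the override values and the use of scikit-learn's AffinityPropagation with a precomputed affinity are illustrative choices, not the paper's exact formulation.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics import pairwise_distances

def constrained_ap(X, must_link=(), cannot_link=()):
    # base similarity: negative squared Euclidean distance
    S = -pairwise_distances(X, metric="sqeuclidean")
    hi, lo = S.max(), S.min()
    for i, j in must_link:        # pull constrained pairs together
        S[i, j] = S[j, i] = hi
    for i, j in cannot_link:      # push constrained pairs apart
        S[i, j] = S[j, i] = lo
    return AffinityPropagation(affinity="precomputed", random_state=0).fit(S).labels_

X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 4])
labels = constrained_ap(X, must_link=[(0, 1)], cannot_link=[(0, 59)])
```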

5.
Imbalance classification techniques have been frequently applied in many machine learning application domains where the number of instances in the majority (or positive) class of a dataset is much larger than that in the minority (or negative) class. Meanwhile, feature selection (FS) is one of the key techniques for high-dimensional classification tasks, greatly improving classification performance and computational efficiency. However, most studies of feature selection and imbalance classification are restricted to off-line batch learning, which is not well adapted to some practical scenarios. In this paper, we aim to solve the high-dimensional imbalanced classification problem accurately and efficiently with only a small number of active features in an online fashion, and we propose two novel online learning algorithms for this purpose. In our approach, a classifier that involves only a small and fixed number of features is constructed to classify a sequence of imbalanced data received in an online manner. We formulate the construction of such an online learner as an optimization problem and use an iterative approach to solve it based on the passive-aggressive (PA) algorithm as well as a truncated gradient (TG) method. We evaluate the performance of the proposed algorithms on several real-world datasets, and our experimental results demonstrate the effectiveness of the proposed algorithms in comparison with the baselines.
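A rough numpy sketch of the two building blocks named above: a passive-aggressive (PA-I) hinge-loss update followed by a truncation step that keeps only a fixed number of active features. The cost-sensitive handling of imbalance (`pos_weight`) and the truncation schedule are simplified assumptions, not the paper's exact algorithms.

```python
import numpy as np

def online_pa_truncated(stream, n_features, C=1.0, k_active=20, pos_weight=5.0):
    """stream yields (x, y) with y in {-1, +1}; the rare class is assumed to be +1."""
    w = np.zeros(n_features)
    mistakes, t = 0, 0
    for t, (x, y) in enumerate(stream, 1):
        if y * w.dot(x) <= 0:                       # predict first, count errors
            mistakes += 1
        cost = pos_weight if y > 0 else 1.0         # crude cost-sensitive reweighting
        loss = max(0.0, 1.0 - y * w.dot(x))
        if loss > 0:                                # PA-I closed-form update
            tau = min(C * cost, loss / (x.dot(x) + 1e-12))
            w += tau * y * x
        if np.count_nonzero(w) > k_active:          # truncation: keep the k largest weights
            w[np.argsort(np.abs(w))[:-k_active]] = 0.0
    return w, mistakes / max(t, 1)

# toy imbalanced stream: about 5% positives in 40 dimensions
rng = np.random.default_rng(0)
data = [(rng.normal(size=40) + (2 if y > 0 else 0), y)
        for y in rng.choice([-1, 1], size=2000, p=[0.95, 0.05])]
w, err = online_pa_truncated(iter(data), n_features=40)
```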

6.

Feature selection (FS) is a critical step in data mining and machine learning, and plays a crucial role in algorithm performance: it reduces processing time and improves classification accuracy. In this paper, three different solutions to FS are proposed. In the first solution, the Harris Hawks Optimization (HHO) algorithm is applied; in the second, the Fruitfly Optimization Algorithm (FOA); and in the third, the two are hybridized into an algorithm named MOHHOFOA. The results were compared against the MOPSO, NSGA-II, BGWOPSOFS and B-MOABC algorithms for FS on 15 standard datasets using the mean, best, worst and standard deviation (STD) criteria. The Wilcoxon statistical test was also used with a significance level of 5%, together with the Bonferroni–Holm method to control the family-wise error rate. The results are shown in Pareto front charts, indicating that the proposed solutions' performance on the datasets is promising.


7.

Composite plates play a very important role in engineering applications, especially in the aerospace industry. Thermal buckling of such components is of great importance and must be known to achieve an appropriate design. This paper deals with stacking sequence optimisation of laminated composite plates for maximising the critical buckling temperature using a powerful meta-heuristic called the firefly algorithm (FA), which is based on the flashing behaviour of fireflies. The main objective of the present work is to show the ability of FA in the optimisation of composite structures. The performance of FA is compared with the results reported in previously published works using other algorithms, which demonstrates the efficiency of FA in stacking sequence optimisation of laminated composite structures.

8.

Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting organizational and functional characteristics of a network. The Louvain algorithm is an efficient sequential algorithm for community detection. However, such sequential algorithms fail to scale for emerging large-scale data, and scalable parallel algorithms are necessary to process large graph datasets. In this work, we show a comparative analysis of our different parallel implementations of the Louvain algorithm. We design parallel algorithms for the Louvain method in shared-memory and distributed-memory settings. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. We incorporate dynamic load balancing in our final algorithm, DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around 12-fold speedup when scaling to a larger number of processors. We also compare the performance of our algorithm with some other prominent algorithms in the literature and obtain better or comparable performance. We identify the challenges in developing distributed-memory algorithms and provide an optimized solution, DPLAL, with a performance analysis on large-scale real-world networks from different domains.
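For reference, the sequential baseline that the paper parallelizes is available off the shelf; a minimal single-machine example with NetworkX (not the authors' DPLAL implementation) might look like this, with a small synthetic graph standing in for a real edge list:

```python
import networkx as nx

# toy graph with planted communities; replace with a real edge list for large graphs
G = nx.planted_partition_graph(l=4, k=50, p_in=0.3, p_out=0.01, seed=42)

# sequential Louvain community detection (the baseline DPLAL parallelizes)
communities = nx.community.louvain_communities(G, resolution=1.0, seed=42)
print(len(communities), "communities,",
      "modularity =", round(nx.community.modularity(G, communities), 3))
```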


9.
The minimum volume ellipsoid (MVE) is a useful tool in multivariate statistics and data mining. It is used for computing robust multivariate outlier diagnostics and for calculating robust covariance matrix estimates. Various search algorithms for finding or approximating the MVE have been developed, but due to the combinatorial nature of the problem, exact computation of the MVE is impractical for all but the smallest datasets. Since large datasets are increasingly common, alternative algorithms are desired. Even among small datasets, performance of the existing algorithms varies considerably—no single algorithm dominates in performance. This paper presents a unique matrix-structured genetic algorithm (GA) that directly searches the ellipsoid space for the MVE. By directly searching the space of ellipsoids, the impact of the combinatorial nature of the problem is minimized. The matrix-structured GA is described in detail, and evidence is provided to illustrate the performance of the new algorithm in detecting multivariate outliers.

10.
Xue Yanbing, Geng Huiqiang, Zhang Hua, Xue Zhenshan, Xu Guangping. Multimedia Tools and Applications, 2018, 77(17): 22199-22211

This paper proposes a feed-forward architecture algorithm using fusion of features and classifiers for semantic segmentation. The algorithm consists of three phases: first, features from a hierarchical convolutional neural network (CNN) and region-based features are extracted and fused at the superpixel level; second, multiple classifiers (Softmax, XGBoost and Random Forest) are ensembled to compute the per-pixel class probabilities; finally, a fully connected conditional random field is employed to enhance the final performance. The hierarchical features contain more global evidence and the region features contain more local evidence, so fusing these two kinds of features is expected to strengthen the feature representation. In the classification phase, integrating multiple classifiers aims to improve the generalization ability of the classification algorithm. Experiments conducted on the Sift-Flow dataset show that the proposed method achieves competitive labeling accuracy.
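The classifier-ensemble step (averaging per-superpixel class probabilities from several models) can be sketched generically; the random features below are placeholders standing in for the fused CNN and region descriptors, and gradient boosting is substituted for XGBoost so the example stays self-contained within scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression   # softmax-style linear classifier

# placeholder fused features: one row per superpixel, one label per superpixel
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 64)), rng.integers(0, 5, 500)
X_test = rng.normal(size=(100, 64))

models = [LogisticRegression(max_iter=1000),
          GradientBoostingClassifier(),
          RandomForestClassifier(n_estimators=200)]
probas = []
for m in models:
    m.fit(X_train, y_train)
    probas.append(m.predict_proba(X_test))

avg_proba = np.mean(probas, axis=0)      # soft-voting fusion of the classifiers
labels = avg_proba.argmax(axis=1)        # per-superpixel class decision (before any CRF)
```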


11.
The success rates of expert or intelligent systems depend on the selection of the correct data clusters. The k-means algorithm is a well-known method for solving data clustering problems, but it suffers not only from a high dependency on the algorithm's initial solution but also from the choice of distance function. A number of algorithms have been proposed to address the centroid initialization problem, but they do not produce optimum clusters. This paper proposes three algorithms: (i) the search algorithm C-LCA, an improved League Championship Algorithm (LCA); (ii) a search clustering algorithm using C-LCA (SC-LCA); and (iii) a hybrid clustering algorithm called the hybrid of k-means and Chaotic League Championship Algorithm (KSC-LCA), which has two computation stages. The C-LCA employs chaotic adaptation for the retreat and approach parameters, rather than constants, which can enhance the search capability. Furthermore, to overcome the limitation of the original k-means algorithm, whose Euclidean distance cannot handle categorical attributes properly, we adopt the Gower distance and a mechanism for handling the discrete values of categorical attributes. The proposed algorithms can handle not only pure numeric data but also mixed-type data and can find the best centroids containing categorical values. Experiments were conducted on 14 datasets from the UCI repository. The SC-LCA and KSC-LCA competed with 16 established algorithms, including the k-means, k-means++ and global k-means algorithms, four search clustering algorithms, and nine hybrids of k-means with several state-of-the-art evolutionary algorithms. The experimental results show that the SC-LCA produces the clusters with the highest F-Measure on the pure categorical dataset and the KSC-LCA produces the clusters with the highest F-Measure on the pure numeric and mixed-type datasets tested. Out of the 14 datasets, the SC-LCA produced centroids with better F-Measures than the k-means algorithm on 13. On the Tic-Tac-Toe dataset, which contains only categorical attributes, the SC-LCA achieves an F-Measure of 66.61, which is 21.74 points above that of the k-means algorithm (44.87). The KSC-LCA produced better centroids than the k-means algorithm on all 14 datasets, with a maximum F-Measure improvement of 11.59 points. However, in terms of computational cost, the SC-LCA and KSC-LCA took more NFEs than k-means and its variants, although the KSC-LCA ranks first and the SC-LCA ranks fourth among the hybrid clustering and search clustering algorithms we tested. Therefore, the SC-LCA and KSC-LCA are general and effective clustering algorithms that can be used when an expert or intelligent system requires accurate, high-speed cluster selection.
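The Gower distance used in place of the Euclidean metric can be illustrated briefly; this is a generic implementation for mixed numeric/categorical rows (equal feature weights assumed), not the paper's full KSC-LCA pipeline.

```python
import numpy as np

def gower_distance(a, b, is_categorical, ranges):
    """Gower distance between two mixed-type rows.

    is_categorical: boolean mask over features; ranges: max - min of each
    numeric feature over the whole dataset (used to scale numeric differences).
    """
    d = np.empty(len(a))
    for j, cat in enumerate(is_categorical):
        if cat:
            d[j] = 0.0 if a[j] == b[j] else 1.0               # simple matching
        else:
            rng = ranges[j] if ranges[j] > 0 else 1.0
            d[j] = abs(float(a[j]) - float(b[j])) / rng       # range-scaled difference
    return d.mean()

# toy mixed-type rows: (age, income, colour, owns_car)
x1 = [25, 48_000, "red", "yes"]
x2 = [40, 52_000, "blue", "yes"]
mask = [False, False, True, True]
ranges = [60, 100_000, None, None]
print(gower_distance(x1, x2, mask, ranges))
```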

12.
付忠旺  肖蓉  余啸  谷懿 《计算机应用》2018,38(3):824-828
Existing studies on evaluating the performance of software defect count prediction models have not taken into account the class imbalance in software defect datasets and have therefore used inappropriate evaluation metrics designed for regression models. To address this, the average percentage of faults is adopted as the evaluation metric, and the extent to which different regression algorithms affect the performance of defect count prediction models is discussed. Using six open-source datasets provided by PROMISE, the influence of ten regression algorithms on the prediction results of software defect count prediction models and the differences among these algorithms were analyzed. The results show that defect count prediction models built with different regression algorithms have different prediction performance, with gradient boosting regression and Bayesian ridge regression performing best.
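A minimal comparison of the two best-performing regressors named above can be sketched with scikit-learn; the synthetic data and the plain mean-absolute-error evaluation here stand in for the PROMISE datasets and the average-percentage-of-faults metric used in the paper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import cross_val_score

# placeholder for a PROMISE-style defect dataset: module metrics -> defect counts
X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)
y = np.rint(np.abs(y) / 50)          # coerce targets into non-negative "defect counts"

for name, model in [("GradientBoosting", GradientBoostingRegressor(random_state=0)),
                    ("BayesianRidge", BayesianRidge())]:
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {-scores.mean():.2f} ± {scores.std():.2f}")
```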

13.
Classical clustering algorithms like k-means often converge to local optima and have slow convergence rates on larger datasets. To overcome such situations, swarm-based algorithms have been proposed; they attempt to reach the optimal solution for such problems in reasonable time. Many swarm-based algorithms, such as the Flower Pollination Algorithm (FPA), Cuckoo Search Algorithm (CSA), Black Hole Algorithm (BHA), Bat Algorithm (BA), Particle Swarm Optimization (PSO), Firefly Algorithm (FFA), and Artificial Bee Colony (ABC), have been successfully applied to many non-linear optimization problems. In this paper, an algorithm is proposed that hybridizes chaos optimization and flower pollination over k-means to improve the efficiency of minimizing the cluster integrity. The proposed algorithm, referred to as Chaotic FPA (CFPA), is compared with FPA, CSA, BHA, BA, FFA, and PSO over k-means for the data clustering problem. Experiments are conducted on sixteen benchmark datasets. The algorithms are compared on four performance parameters: cluster integrity, execution time, number of iterations to converge (NIC), and stability. The results obtained are analyzed statistically using the non-parametric Friedman test; if the Friedman test rejects the null hypothesis, pairwise comparison is done using the Nemenyi test. The experimental results demonstrate the following: (a) CFPA and BHA perform better than the other algorithms in terms of cluster integrity; (b) CFPA and CSA are superior to the others in terms of execution time; (c) CFPA and FPA converge earlier than the other algorithms to the optimal cluster integrity; and (d) CFPA and BHA produce more stable results than the other algorithms.
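Two of the building blocks above are easy to show in isolation: the cluster-integrity objective (total distance of points to their nearest centroid) and a chaotic logistic-map sequence of the kind often used to drive chaos-based optimizers. The combination of these pieces into the full CFPA search is not reproduced here, and the logistic-map parameters are conventional defaults rather than the paper's settings.

```python
import numpy as np

def cluster_integrity(X, centroids):
    """Sum of Euclidean distances from each point to its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.min(axis=1).sum()

def logistic_map(n, x0=0.7, r=4.0):
    """Chaotic sequence in (0, 1); commonly used to perturb candidate solutions."""
    seq = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)
        seq[i] = x
    return seq

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
# chaos-driven candidate centroids mapped into the data range
lo, hi = X.min(axis=0), X.max(axis=0)
candidate = lo + logistic_map(3 * 2).reshape(3, 2) * (hi - lo)
print("integrity of chaotic candidate:", round(cluster_integrity(X, candidate), 2))
```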

14.

In the present study, a new algorithm is developed for neural network training by combining a gradient-based and a meta-heuristic algorithm. The new algorithm benefits from simultaneous local and global search, eliminating the problem of getting stuck in a local optimum. For this purpose, the global search ability of the grey wolf optimizer (GWO) is first improved with the Levy flight, a random walk in which the jump size follows the Levy distribution, resulting in a more efficient global search thanks to the long jumps. This improved algorithm is then combined with back propagation (BP) to exploit both the enhanced global search ability of GWO and the local search ability of BP in training neural networks. The performance of the proposed algorithm has been evaluated by comparing it against a number of well-known meta-heuristic algorithms on twelve classification and function-approximation datasets.
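The Levy-flight steps that augment GWO's global search can be generated with Mantegna's algorithm; this standalone sketch shows only the step generation, not the full GWO-plus-backpropagation training loop.

```python
import numpy as np
from math import gamma, sin, pi

def levy_steps(n_steps, dim, beta=1.5, rng=None):
    """Levy-distributed jumps via Mantegna's algorithm (heavy-tailed step sizes)."""
    rng = rng or np.random.default_rng()
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2) /
               (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma_u, size=(n_steps, dim))
    v = rng.normal(0, 1, size=(n_steps, dim))
    return u / np.abs(v) ** (1 / beta)

# occasional long jumps help an optimizer escape local optima
steps = levy_steps(1000, dim=2, rng=np.random.default_rng(0))
print("median |step|:", np.median(np.abs(steps)).round(3),
      " max |step|:", np.abs(steps).max().round(1))
```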


15.
To improve the estimation accuracy of node localization in wireless sensor networks (WSN), an intelligent estimation and localization algorithm based on free search optimization is proposed. Free search is a new swarm intelligence algorithm for function optimization; it requires little computation, converges quickly, is simple to implement, and has few parameters to tune. The intelligent optimization algorithm converts the parameter estimation problem into a nonlinear function optimization problem. Simulation results show that, compared with the least-squares localization algorithm, the new algorithm improves localization accuracy.
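Localization here is framed as a nonlinear optimization over noisy range measurements. A minimal version of that objective, solved below with SciPy's least_squares rather than the free search optimizer, illustrates the setup; the anchor positions and noise level are made up for illustration, and any global optimizer could minimize the same residual function.

```python
import numpy as np
from scipy.optimize import least_squares

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_pos = np.array([3.0, 7.0])

rng = np.random.default_rng(0)
ranges = np.linalg.norm(anchors - true_pos, axis=1) + rng.normal(0, 0.2, len(anchors))

# residuals between measured and predicted distances; a swarm optimizer
# (free search, PSO, ...) could minimize the same objective instead
def residuals(p):
    return np.linalg.norm(anchors - p, axis=1) - ranges

estimate = least_squares(residuals, x0=np.array([5.0, 5.0])).x
print("estimated position:", estimate.round(2))
```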

16.

To address the challenge of achieving better cooperation among views in multi-view clustering tasks, a new view fusion strategy is proposed. The strategy first assigns a partition to each view, then adaptively learns a fusion weight matrix to fuse the partitions of the individual views, and finally obtains the global partition via a view ensemble method. Applying this strategy to the classical FCM (fuzzy c-means) clustering framework yields the corresponding multi-view fuzzy clustering algorithm. Experimental results on both synthetic datasets and UCI datasets show that, compared with several related clustering algorithms, the proposed algorithm has better adaptability and better clustering performance on multi-view clustering tasks.
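A toy illustration of the fusion step only: given per-view fuzzy membership matrices (taken as given here, e.g. produced by FCM on each view), one non-negative weight per view combines them into a global partition. The adaptive weight learning of the paper is replaced by fixed weights for brevity.

```python
import numpy as np

def fuse_views(memberships, weights):
    """Combine per-view membership matrices (n_samples x n_clusters) into a
    global partition using one non-negative weight per view."""
    w = np.asarray(weights, float)
    w = w / w.sum()
    fused = sum(wi * U for wi, U in zip(w, memberships))
    return fused.argmax(axis=1), fused

rng = np.random.default_rng(0)
n, k = 10, 3
# stand-in memberships from two views (rows sum to 1)
U1 = rng.dirichlet(np.ones(k), size=n)
U2 = rng.dirichlet(np.ones(k), size=n)
labels, fused = fuse_views([U1, U2], weights=[0.7, 0.3])
print(labels)
```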


17.

The algorithm selection problem is defined as identifying the best-performing machine learning (ML) algorithm for a given combination of dataset, task, and evaluation measure. The human expertise required to evaluate the increasing number of ML algorithms available has resulted in the need to automate the algorithm selection task. Various approaches have emerged to handle the automatic algorithm selection challenge, including meta-learning. Meta-learning is a popular approach that leverages accumulated experience for future learning and typically involves dataset characterization. Existing meta-learning methods often represent a dataset using predefined features and thus cannot be generalized across different ML tasks, or alternatively, learn a dataset’s representation in a supervised manner and therefore are unable to deal with unsupervised tasks. In this study, we propose a novel learning-based task-agnostic method for producing dataset representations. Then, we introduce TRIO, a meta-learning approach, that utilizes the proposed dataset representations to accurately recommend top-performing algorithms for previously unseen datasets. TRIO first learns graphical representations for the datasets, using four tools to learn the latent interactions among dataset instances and then utilizes a graph convolutional neural network technique to extract embedding representations from the graphs obtained. We extensively evaluate the effectiveness of our approach on 337 datasets and 195 ML algorithms, demonstrating that TRIO significantly outperforms state-of-the-art methods for algorithm selection for both supervised (classification and regression) and unsupervised (clustering) tasks.


18.

For datasets containing a large number of features, feature selection has become a research hotspot; removing irrelevant and redundant features can effectively improve classification accuracy. Based on an analysis of the existing literature, a feature selection algorithm based on attribute relations (NCMIPV) is proposed to obtain an optimized feature subset, and its performance is evaluated on UCI datasets. Experimental results show that, compared with the original feature set, the algorithm effectively reduces the dimensionality of the feature space with relatively short running time; its classification error rate is comparable to that of other algorithms and in some cases clearly better.
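The abstract does not spell out the NCMIPV criterion, so as a generic stand-in the sketch below ranks features by mutual information with the class and keeps only the top ones; it illustrates the relevance-filtering idea on a UCI-style dataset, not the paper's actual measure.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

# keep the 5 features with the highest mutual information with the class label
selector = SelectKBest(mutual_info_classif, k=5).fit(X, y)
X_sel = selector.transform(X)

clf = KNeighborsClassifier()
print("all features  :", cross_val_score(clf, X, y, cv=5).mean().round(3))
print("top-5 features:", cross_val_score(clf, X_sel, y, cv=5).mean().round(3))
```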


19.
Motif finding is an important problem in bioinformatics. Because most signals in biological sequences are complex, no good model or reliable algorithm has been available to solve this problem. This paper proposes a motif finding algorithm, UPNT (Uniform Projection with Neighbourhood Thresholding), based on uniform projection and a neighborhood-bucket refinement strategy. In UPNT, the uniform projection strategy effectively reduces the number of projections, and the neighborhood-bucket refinement strategy greatly reduces the number of buckets to be refined. The algorithm's performance is tested and analyzed on two sets of synthetic (l, d) sequence data with uniform and non-uniform background distributions. Experimental results show that UPNT outperforms projection algorithms such as Random Projection, Aggregation, and Uniform Projection in the combined measures of success rate and running time, and has stronger applicability.
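The projection-and-bucketing idea behind such algorithms can be shown in miniature: hash every l-mer by the characters at a fixed set of projected positions and group colliding l-mers into the same bucket. The random position choice below stands in for the paper's uniform projection, and the neighborhood-bucket refinement is omitted.

```python
from collections import defaultdict
import random

def projection_buckets(sequences, l=8, k=4, seed=0):
    """Bucket all l-mers by the letters at k projected positions."""
    random.seed(seed)
    positions = sorted(random.sample(range(l), k))      # one random projection
    buckets = defaultdict(list)
    for s_idx, seq in enumerate(sequences):
        for i in range(len(seq) - l + 1):
            lmer = seq[i:i + l]
            key = "".join(lmer[p] for p in positions)
            buckets[key].append((s_idx, i))             # remember where the l-mer came from
    return positions, buckets

seqs = ["ACGTACGTGGAACGT", "TTACGTACGAACGTA", "GGGACGTACGTTTAA"]
positions, buckets = projection_buckets(seqs)
# buckets with many entries are candidate motif groups that a refinement step would examine
crowded = {key: hits for key, hits in buckets.items() if len(hits) >= 2}
print("projected positions:", positions)
print("crowded buckets:", crowded)
```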

20.

In machine learning, searching for the optimal feature subset from the original dataset is a very challenging and prominent task. Metaheuristic algorithms are used to find the relevant, important features that enhance classification accuracy and save resource time, and most of them have shown excellent performance in solving feature selection problems. A recently developed metaheuristic, the gaining-sharing knowledge-based optimization algorithm (GSK), is considered here for finding the optimal feature subset. The GSK algorithm was proposed over a continuous search space; therefore, a total of eight S-shaped and V-shaped transfer functions are employed to map the problem into a binary search space. Additionally, a population reduction scheme is employed with the transfer functions to enhance the performance of the proposed approaches: because the population size is updated in every iteration, the search space is explored efficiently and the worst solutions are deleted from it. The proposed approaches are tested on twenty-one benchmark datasets from the UCI repository. The obtained results are compared with state-of-the-art metaheuristic algorithms including the binary differential evolution algorithm, binary particle swarm optimization, binary bat algorithm, binary grey wolf optimizer, binary ant lion optimizer, binary dragonfly algorithm, and binary salp swarm algorithm. Among the eight transfer functions, the V4 transfer function with population reduction on the binary GSK algorithm outperforms the other optimizers in terms of accuracy, fitness values, and the minimal number of features. To investigate the results statistically, two non-parametric statistical tests are conducted, which confirm the superiority of the proposed approach.
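The transfer-function binarization that maps a continuous position to a 0/1 feature mask can be sketched on its own; the S-shaped sigmoid and the V-shaped |tanh| form below are standard textbook variants (not necessarily the paper's V4), and the GSK update itself and the population-reduction schedule are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def s_shaped(x):
    """S-shaped transfer: probability of setting a bit to 1."""
    return 1.0 / (1.0 + np.exp(-x))

def v_shaped(x):
    """V-shaped transfer: probability of flipping the current bit."""
    return np.abs(np.tanh(x))

def binarize(x_continuous, current_bits, kind="V"):
    r = rng.random(x_continuous.shape)
    if kind == "S":
        return (r < s_shaped(x_continuous)).astype(int)
    flip = r < v_shaped(x_continuous)            # V-shaped: flip where probability is high
    return np.where(flip, 1 - current_bits, current_bits)

x = rng.normal(size=10)              # one continuous candidate position
bits = rng.integers(0, 2, size=10)   # its current binary feature mask
print("S:", binarize(x, bits, kind="S"))
print("V:", binarize(x, bits, kind="V"))
```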

