Similar Documents
1.
As the volume of data and information grows, incremental learning ability is becoming increasingly important for machine learning approaches. Unlike classic batch learning algorithms, which synthesize all available information, online algorithms try not to retain irrelevant information. In this study, we attempted to increase the prediction accuracy of an incremental version of the Naive Bayes model by integrating instance-based learning. We performed a large-scale comparison of the proposed method with other state-of-the-art algorithms on several datasets, and the proposed method produced better accuracy in most cases.
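The abstract does not give the model's details. As an illustration only, the following sketch covers just the incremental part. It assumes categorical features and Laplace smoothing, and the class name and `alpha` parameter are hypothetical. It shows how Naive Bayes can absorb one instance at a time by updating counts instead of storing past data. The instance-based component the authors integrate is not shown.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Categorical Naive Bayes whose counts are updated one instance at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant (assumed)
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)   # (class, feature index, value) -> count
        self.feature_values = defaultdict(set)   # distinct values seen per feature
        self.n = 0

    def update(self, x, y):
        """Absorb one labelled instance; the instance itself is not stored."""
        self.n += 1
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(y, i, v)] += 1
            self.feature_values[i].add(v)

    def predict(self, x):
        """Return the class with the highest smoothed log-posterior (None if untrained)."""
        best, best_score = None, -math.inf
        n_classes = len(self.class_counts)
        for c, nc in self.class_counts.items():
            score = math.log((nc + self.alpha) / (self.n + self.alpha * n_classes))
            for i, v in enumerate(x):
                k = len(self.feature_values[i]) or 1
                score += math.log((self.feature_counts[(c, i, v)] + self.alpha)
                                  / (nc + self.alpha * k))
            if score > best_score:
                best, best_score = c, score
        return best
```

Because prediction only reads the running counts, the model can keep learning from a stream with memory proportional to the number of distinct (class, feature, value) combinations.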

2.
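Research has shown that extreme learning machines (ELM) and discriminative dictionary learning algorithms are highly efficient and accurate for image classification. However, each method has its own drawbacks: ELM is not robust to noise, and discriminative dictionary learning is slow at classification time. To combine these complementary strengths and improve classification performance, this paper proposes a discriminative analysis dictionary learning model that incorporates an extreme learning machine. The model uses an iterative optimization algorithm to learn the optimal discriminative analysis dictionary and the ELM classifier. To validate the proposed algorithm, classification experiments were carried out on face datasets. Experimental results show that, compared with currently popular dictionary learning algorithms and ELM, the proposed algorithm achieves better classification results.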
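A minimal sketch of the basic extreme learning machine half of the model follows. It assumes a tanh hidden layer and ridge-regularised output weights, and `elm_fit`, `n_hidden` and `reg` are hypothetical names. The hidden layer is random and never trained, so fitting reduces to a single linear solve. The discriminative analysis dictionary and the paper's iterative joint optimization are not reproduced.

```python
import numpy as np


def elm_fit(X, Y, n_hidden=50, reg=1e-3, seed=0):
    """Basic ELM: random hidden layer, closed-form ridge solution for output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # input weights: random, never trained
    b = rng.standard_normal(n_hidden)                # hidden biases: random, never trained
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    # beta = (H^T H + reg * I)^-1 H^T Y
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta


def elm_predict(model, X):
    """Return real-valued outputs; take argmax over columns for one-hot targets."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta
```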

3.
4.
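To address the low accuracy of Bayesian network (BN) parameter learning on small datasets, this paper analyzes why variable weighting of BN parameters is necessary in the small-dataset case and proposes VWPL, a BN parameter learning algorithm based on variable-weight fusion. First, inequality constraints are determined from expert knowledge, the minimum sample-size threshold for parameter learning is computed, and a variable-weight factor function that changes with sample size is designed. Next, an initial parameter set is computed from the samples and expanded with the Bootstrap method to obtain a candidate parameter set that satisfies the constraints; substituting this set into the BN variable-weight parameter model yields the final BN parameters. Experimental results show that when the amount of training data is small, VWPL achieves higher learning accuracy than the MLE and QMAP algorithms and also outperforms fixed-weight learning. In addition, VWPL was successfully applied to a bearing fault diagnosis experiment, providing a method for BN parameter estimation on small datasets.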
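The abstract does not give VWPL's weight function. The sketch below only illustrates the general pattern it describes, and every specific is an assumption. It uses a single binary parameter θ = P(X=1), with expert bounds `lo`/`hi` serving as the inequality constraint. Bootstrap candidates are filtered by that constraint. A hypothetical weight w(n) = n/(n + n0) shifts trust from the expert value toward the data as the sample grows.

```python
import random


def bootstrap_estimates(samples, n_boot=200, seed=0):
    """Maximum-likelihood estimates of P(X=1) on bootstrap resamples of 0/1 data."""
    rng = random.Random(seed)
    n = len(samples)
    return [sum(rng.choices(samples, k=n)) / n for _ in range(n_boot)]


def fused_parameter(samples, expert_value, lo, hi, n0=20, n_boot=200, seed=0):
    """Hypothetical variable-weight fusion of data and expert knowledge (not VWPL's formula)."""
    # Keep only bootstrap candidates that satisfy the expert constraint lo <= theta <= hi
    cands = [t for t in bootstrap_estimates(samples, n_boot, seed) if lo <= t <= hi]
    data_est = sum(cands) / len(cands) if cands else expert_value
    w = len(samples) / (len(samples) + n0)  # assumed weight: trust in data grows with n
    return w * data_est + (1 - w) * expert_value
```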

5.
Single-machine and flowshop scheduling with a general learning effect model   Cited by: 3 (self-citations: 0, by others: 3)
Learning effects in scheduling problems have recently received growing attention. Biskup [Biskup, D. (2008). A state-of-the-art review on scheduling with learning effect. European Journal of Operational Research, 188, 315–329] classified learning effect scheduling models into two distinct approaches. The position-based learning model appears to be a realistic assumption when the actual processing of a job is mainly machine-driven, while the sum-of-processing-time-based learning model takes into account the experience workers gain from producing the jobs. In this paper, we propose a learning model that considers machine and human learning effects simultaneously. We first show that the position-based and sum-of-processing-time-based learning models in the literature are special cases of the proposed model. Moreover, we present solution procedures for several single-machine and flowshop problems.
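The following sketch makes the two classic model families concrete. It assumes the common literature forms, where position-based is p·r^a and sum-of-processing-time-based is p·(1 + Σ earlier p)^a, and it multiplies them together. This combined form is a hypothetical stand-in, not necessarily the general model proposed in the paper. It only shows how each classic model arises as a special case.

```python
def actual_times(normal_times, a1=0.0, a2=0.0):
    """Actual processing times of a job sequence on a single machine.

    Hypothetical combined learning effect (a1, a2 <= 0):
        p_actual[r] = p[r] * (1 + sum of earlier normal times) ** a1 * r ** a2
    a1 = 0 gives the position-based model; a2 = 0 gives the
    sum-of-processing-time-based model.
    """
    out, elapsed_normal = [], 0.0
    for r, p in enumerate(normal_times, start=1):
        out.append(p * (1 + elapsed_normal) ** a1 * r ** a2)
        elapsed_normal += p
    return out


def makespan(normal_times, a1=0.0, a2=0.0):
    """Completion time of the last job when jobs run back to back."""
    return sum(actual_times(normal_times, a1, a2))
```

With both exponents at zero there is no learning, and the makespan is just the sum of the normal processing times.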

6.
Learning to rank, the task of learning ranking functions that sort a set of entities using machine learning techniques, has recently attracted much interest in information retrieval and machine learning research. However, most existing work follows a supervised learning paradigm. In this paper, we propose a transductive method that extracts pairwise preference information from the unlabeled test data. We then design a loss function that combines this preference data with the labeled training data, and learn ranking functions by optimizing the loss function within a derived Ranking SVM framework. Experimental results on the LETOR 2.0 benchmark collections show that our transductive method can significantly outperform the state-of-the-art supervised baseline.
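The pairwise idea behind Ranking SVM can be sketched as subgradient descent on a regularised hinge loss over preference pairs (x_better, x_worse). The `extra_pairs` argument and its weight `mu` are a hypothetical stand-in for the preference pairs the paper extracts from unlabeled test data. The abstract gives neither how those pairs are mined nor the exact loss.

```python
def train_pairwise_ranker(pairs, dim, extra_pairs=(), mu=0.5, lam=0.01, lr=0.1, epochs=200):
    """Linear ranker trained on a Ranking-SVM-style pairwise hinge loss.

    pairs: (x_better, x_worse) feature vectors from labelled data.
    extra_pairs: preference pairs from another source, down-weighted by mu (assumed).
    """
    w = [0.0] * dim
    weighted = [(a, b, 1.0) for a, b in pairs] + [(a, b, mu) for a, b in extra_pairs]
    for _ in range(epochs):
        grad = [lam * wi for wi in w]          # gradient of (lam / 2) * ||w||^2
        for better, worse, c in weighted:
            diff = [u - v for u, v in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin < 1:                     # hinge term c * max(0, 1 - margin) is active
                grad = [g - c * di for g, di in zip(grad, diff)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def score(w, x):
    """Higher score means ranked higher."""
    return sum(wi * xi for wi, xi in zip(w, x))
```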

7.
邓波, 陆颖隽, 王如志. Computer Science (《计算机科学》), 2017, 44(3): 264-267, 287
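In multiple-instance learning (MIL), a bag is a set containing multiple instances, and the training samples provide only bag labels, not the labels of individual instances. This paper proposes ILI-MIL, an MIL method based on instance label intensity that allows an instance's label intensity to be any real number. Because training neural networks with gradient-based methods is computationally complex and the ILI-MIL objective function is itself complex, ILI-MIL is implemented with a high-order neural network trained by chemical reaction optimization; this learning method has strong nonlinear expressive power and high computational efficiency. Experimental results show that the algorithm classifies more effectively than existing algorithms and has a wider range of applicability.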

8.
Generating a low-rank matrix approximation is very important in large-scale machine learning applications. The standard Nyström method is one of the state-of-the-art techniques for generating such an approximation, and it has developed rapidly since it was applied to Gaussian process regression. Several enhanced Nyström methods, such as ensemble Nyström, modified Nyström and SS-Nyström, have been proposed, and many sampling methods have been developed. In this paper, we review Nyström methods for large-scale machine learning. First, we introduce various Nyström methods. Second, we review different sampling methods for the Nyström methods and summarize them from the perspectives of both theoretical analysis and practical performance. Then, we list several typical machine learning applications that use Nyström methods. Finally, we draw our conclusions after discussing some open machine learning problems related to Nyström methods.
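The standard Nyström approximation samples m landmark columns C of the kernel matrix K, takes the m×m block W at the landmark rows, and uses K ≈ C W⁺ Cᵀ, where W⁺ is the pseudo-inverse. The following is a minimal NumPy sketch. It assumes an RBF kernel and uniform sampling without replacement, which is one of the simplest of the sampling schemes a survey like this covers.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """RBF kernel matrix exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)


def nystrom(X, m, gamma=0.5, seed=0):
    """Standard Nystrom approximation K ~= C W^+ C^T with m uniformly sampled landmarks."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)  # uniform sampling without replacement
    C = rbf_kernel(X, X[idx], gamma)                 # n x m columns of K
    W = C[idx]                                       # m x m landmark block of K
    return C @ np.linalg.pinv(W) @ C.T
```

Only the n×m block C is computed from the kernel, which is where the savings come from when m is much smaller than n.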

9.
The problem of video classification can be viewed as discovering signature patterns in the elemental features of a video class. To solve this problem, this paper proposes a large and diverse set of video features. The paper further contributes a way of dealing with the high dimensionality induced by the feature space, and an algorithm based on two-phase grid searching for automatic parameter selection for the support vector machine (SVM). The framework is thus intended to bridge the gap between low-level features and semantic video classes. Experimental results and comparisons with state-of-the-art learning tools on more than 5000 video segments show the effectiveness of our approach.
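The abstract does not spell out the two phases. A common reading, assumed here, is a coarse grid over log2 C and log2 γ followed by a finer grid around the coarse optimum. The `score` callback is a placeholder for a validation metric such as cross-validated SVM accuracy, which is not implemented here. The grid ranges and step sizes are illustrative defaults, not the paper's.

```python
import itertools


def two_phase_grid_search(score, coarse_step=2.0, fine_step=0.25,
                          log2_C=(-5, 15), log2_gamma=(-15, 3)):
    """Coarse-then-fine grid search over (log2 C, log2 gamma).

    score(C, gamma) returns a validation metric to maximise.
    """
    def grid(lo, hi, step):
        n = int(round((hi - lo) / step))
        return [lo + i * step for i in range(n + 1)]

    def best_on(cs, gs):
        return max(itertools.product(cs, gs), key=lambda p: score(2 ** p[0], 2 ** p[1]))

    # Phase 1: coarse grid over the full range
    c0, g0 = best_on(grid(*log2_C, coarse_step), grid(*log2_gamma, coarse_step))
    # Phase 2: fine grid within one coarse step of the coarse optimum
    c1, g1 = best_on(grid(c0 - coarse_step, c0 + coarse_step, fine_step),
                     grid(g0 - coarse_step, g0 + coarse_step, fine_step))
    return 2 ** c1, 2 ** g1
```

The coarse pass evaluates 11 × 10 settings and the fine pass 17 × 17, far fewer than a single fine grid over the whole range would need.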

10.
Random forests is currently one of the most widely used machine learning algorithms in the non-streaming (batch) setting. This preference is attributable to its high learning performance and its low demands with respect to input preparation and hyper-parameter tuning. However, in the challenging context of evolving data streams, no random forests algorithm can be considered state-of-the-art in comparison with bagging- and boosting-based algorithms. In this work, we present the adaptive random forest (ARF) algorithm for classification of evolving data streams. In contrast to previous attempts to replicate random forests for data stream learning, ARF includes an effective resampling method and adaptive operators that can cope with different types of concept drift without complex optimizations for different data sets. We present experiments with a parallel implementation of ARF that shows no degradation in classification performance compared with a serial implementation, since trees and adaptive operators are independent of one another. Finally, we compare ARF with state-of-the-art algorithms in a traditional test-then-train evaluation and a novel delayed labelling evaluation, and show that ARF is accurate and uses a feasible amount of resources.
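Two ingredients mentioned above, online resampling and test-then-train (prequential) evaluation, can be sketched compactly. As in online bagging, each ensemble member sees each instance with a Poisson-distributed weight, and ARF is usually described as using λ = 6. The base learner here is a hypothetical centroid classifier standing in for ARF's Hoeffding trees. Drift detection and random feature subsets are omitted.

```python
import math
import random


def poisson(lam, rng):
    """Sample k ~ Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class CentroidLearner:
    """Stand-in incremental base learner: weighted running class centroids."""

    def __init__(self):
        self.sums, self.weights = {}, {}

    def learn(self, x, y, weight):
        s = self.sums.get(y, [0.0] * len(x))
        self.sums[y] = [si + weight * xi for si, xi in zip(s, x)]
        self.weights[y] = self.weights.get(y, 0.0) + weight

    def predict(self, x):
        def sq_dist(y):
            centroid = [s / self.weights[y] for s in self.sums[y]]
            return sum((c - xi) ** 2 for c, xi in zip(centroid, x))
        return min(self.weights, key=sq_dist, default=None)


def prequential_accuracy(stream, n_members=5, lam=6.0, seed=0):
    """Test-then-train: predict each instance first, then learn from it."""
    rng = random.Random(seed)
    ensemble = [CentroidLearner() for _ in range(n_members)]
    correct = tested = 0
    for x, y in stream:
        votes = [v for v in (m.predict(x) for m in ensemble) if v is not None]
        if votes:                              # majority vote of already-trained members
            correct += max(set(votes), key=votes.count) == y
            tested += 1
        for m in ensemble:
            k = poisson(lam, rng)              # online-bagging weight for this instance
            if k > 0:
                m.learn(x, y, k)
    return correct / tested if tested else 0.0
```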

11.
We examine the performance of a fuzzy genetics-based machine learning method for multidimensional pattern classification problems with continuous attributes. In our method, each fuzzy if-then rule is handled as an individual, and a fitness value is assigned to each rule. Thus, our method can be viewed as a classifier system. In this paper, we first describe fuzzy if-then rules and fuzzy reasoning for pattern classification problems. We then explain a genetics-based machine learning method that automatically generates fuzzy if-then rules for pattern classification problems from numerical data. Because our method uses linguistic values with fixed membership functions as antecedent fuzzy sets, a linguistic interpretation of each fuzzy if-then rule is easily obtained. The fixed membership functions also allow a simple implementation of our method as a computer program. The simplicity of implementation and the linguistic interpretability of the generated fuzzy if-then rules are the main characteristic features of our method. The performance of our method is evaluated by computer simulations on some well-known test problems. Although our method involves no mechanism for tuning membership functions, it works very well in comparison with other classification methods such as non-fuzzy machine learning techniques and neural networks.
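The following is a sketch of single-winner fuzzy reasoning with fixed membership functions. It assumes attributes scaled to [0, 1], three triangular linguistic values, and rule compatibility computed as the product of memberships weighted by a certainty grade. The genetic rule-generation loop is not shown, and the rules in the test are hand-written.

```python
# Fixed triangular membership functions on [0, 1]: (left foot, peak, right foot), assumed
LABELS = {"small": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "large": (0.5, 1.0, 1.5)}


def membership(label, v):
    """Degree to which value v matches the linguistic label."""
    a, b, c = LABELS[label]
    if v <= a or v >= c:
        return 0.0
    return (v - a) / (b - a) if v <= b else (c - v) / (c - b)


def classify(rules, x):
    """Single-winner fuzzy reasoning.

    rules: list of (antecedent_labels, class_label, certainty_grade).
    The winning rule maximises compatibility (product of memberships) * certainty.
    """
    best_class, best = None, 0.0
    for antecedent, cls, cf in rules:
        mu = 1.0
        for label, v in zip(antecedent, x):
            mu *= membership(label, v)
        if mu * cf > best:
            best_class, best = cls, mu * cf
    return best_class
```

A rule such as `(("small", "small"), "A", 0.9)` reads directly as "if x1 is small and x2 is small, then class A with certainty 0.9", which is the interpretability property the abstract emphasises.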

12.
Recently, several works have approached the HIV-1 protease specificity problem by applying a number of machine learning methods. However, it is still difficult for researchers to choose the best method because of the lack of an effective comparison. For the first time, we have carried out an extensive study of feature extraction methods for the HIV-1 protease problem. We show that a fusion of classifiers trained in different feature spaces yields a drastic error reduction with respect to state-of-the-art performance.
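The abstract does not name the fusion rule. One common choice, shown here purely as an illustration, is the sum (mean) rule over class-probability vectors, where each vector comes from a classifier trained on a different feature space.

```python
def sum_rule_fusion(prob_lists):
    """Fuse classifiers by averaging their class-probability vectors (sum rule).

    prob_lists: one probability vector per classifier, all over the same classes.
    Returns (index of the winning class, fused probability vector).
    """
    n = len(prob_lists)
    fused = [sum(p[c] for p in prob_lists) / n for c in range(len(prob_lists[0]))]
    return max(range(len(fused)), key=fused.__getitem__), fused
```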

13.
Error-correcting output codes (ECOC) are a powerful framework for solving multi-class problems. Efficiently finding the optimal partitions with maximum class discrimination is key to improving their performance. In this paper, we propose an alternative and efficient approach to obtaining partitions that are discriminative in the class space. The main idea of the proposed method is to transform the partition in the class space into a cut of an undirected graph using spectral clustering. In addition, the confusion matrix of a pre-classifier is used to measure class similarity. Our method is compared with the classical ECOC and DECOC on a synthetic dataset, a set of UCI Machine Learning Repository datasets, and one face recognition application. The results show that our proposal obtains comparable or even better classification accuracy while reducing computational complexity compared with state-of-the-art coding methods.
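For context, the decoding step of a binary ECOC can be sketched as nearest-codeword (Hamming) decoding. This assumes a +1/-1 code matrix with one column per binary dichotomizer. The spectral-clustering search for discriminative partitions proposed in the paper is not shown, and the code matrix in the test is hand-picked.

```python
def ecoc_decode(code_matrix, outputs):
    """Hamming decoding for binary ECOC.

    code_matrix: {class: tuple of +1/-1}, one entry per binary dichotomizer.
    outputs: tuple of +1/-1 predictions from those dichotomizers.
    Returns the class whose codeword disagrees with the fewest outputs.
    """
    def hamming(codeword):
        return sum(c != o for c, o in zip(codeword, outputs))
    return min(code_matrix, key=lambda cls: hamming(code_matrix[cls]))
```

Because codewords differ in several positions, one or two wrong dichotomizer outputs can still decode to the correct class.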

14.
Sequential learning is the branch of machine learning that deals with dependent data, in which neighboring labels exhibit some kind of relationship. The paper's main contribution is twofold. First, we generalize stacked sequential learning, highlighting the key role of modeling neighboring interactions. Second, we propose an effective and efficient way of capturing and exploiting sequential correlations that takes long-range interactions into account. We tested the method on two tasks: text line classification and image pixel classification. Results on these tasks clearly show that our approach outperforms standard stacked sequential learning as well as state-of-the-art conditional random fields.
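In standard stacked sequential learning, a first-stage classifier's predictions for neighbouring positions are appended to each position's features, and a second-stage classifier is trained on the extended vectors. These predictions are typically obtained by cross-validation. The following sketches that feature-extension step with a fixed window, and the function and parameter names are hypothetical. The paper's long-range extension is not reproduced.

```python
def extend_with_neighbours(features, predictions, window=1, pad=0.0):
    """Append first-stage predictions from positions i-window .. i+window
    to the feature vector at each position i; out-of-range positions get `pad`."""
    n = len(features)
    extended = []
    for i in range(n):
        context = [predictions[j] if 0 <= j < n else pad
                   for j in range(i - window, i + window + 1)]
        extended.append(list(features[i]) + context)
    return extended
```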

15.
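Mining frequent closed itemsets is an effective way to discover association rules among data items. Cloud computing platforms based on the MapReduce model offer a new approach to mining association rules from massive data. This paper proposes and implements a parallel algorithm for mining frequent closed itemsets on the Hadoop cloud computing platform. The algorithm consists of four main steps: parallel counting, construction of the global frequent item list, parallel mining of local frequent closed itemsets, and parallel filtering of global frequent closed itemsets. Experiments on several datasets show that the method substantially improves data mining efficiency and achieves good speedup.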
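The following is a single-process sketch of the data flow, using toy in-memory partitions in place of HDFS splits. The map/reduce counting corresponds to the parallel counting and global frequent item list steps. The local mining and global filtering steps are replaced by a brute-force enumeration that only suits tiny data, and no Hadoop API is used or implied. An itemset is closed if no proper superset has the same support.

```python
from collections import Counter
from itertools import chain, combinations


def map_count(partition):
    """Map phase: count items within one partition of transactions."""
    return Counter(item for t in partition for item in set(t))


def reduce_counts(partial_counts):
    """Reduce phase: merge per-partition counts into global counts."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def frequent_closed_itemsets(partitions, min_support):
    counts = reduce_counts(map_count(p) for p in partitions)
    items = sorted(i for i, c in counts.items() if c >= min_support)  # global frequent item list
    transactions = [frozenset(t) for t in chain.from_iterable(partitions)]
    support = {}
    # Brute-force enumeration over frequent items: toy data only
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = sum(1 for t in transactions if t.issuperset(cand))
            if s >= min_support:
                support[frozenset(cand)] = s
    # Closed: no proper superset has the same support
    return {i: s for i, s in support.items()
            if not any(i < j and s == sj for j, sj in support.items())}
```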

16.
Pedestrian counting plays an important role in public safety and intelligent transportation. Most pedestrian counting algorithms based on supervised learning require substantial labeling work and rarely exploit the topological information of unlabelled data in a video. In this paper, we propose a Semi-Supervised Elastic Net (SSEN) regression method that uses the sequential information between unlabelled samples and their temporally neighboring samples as a regularization term. Extensive experiments comparing it with a state-of-the-art algorithm indicate that our algorithm not only selects sparse representative features from the original feature space without losing their interpretability, but also attains superior prediction performance with very few labelled frames.

17.
Fully automatic annotation of tennis games from broadcast video is a task with great potential but enormous challenges. In this paper we describe our approach to this task, which integrates computer vision, machine listening, and machine learning. For low-level processing, we improve upon our previously proposed state-of-the-art tennis ball tracking algorithm and employ audio signal processing techniques to detect key events and construct features for classifying the events. For high-level analysis, we model event classification as a sequence labelling problem and investigate four machine learning techniques using simulated event sequences. Finally, we evaluate our proposed approach on three real-world tennis games and discuss the interplay between audio, vision and learning. To the best of our knowledge, our system is the only one that can annotate tennis games at such a detailed level.

18.
Energy-efficient building design requires building performance simulation (BPS) to compare the energy performance of multiple design options. However, at the early design stage BPS is often skipped because of uncertainty, lack of detail, and computational time. This article studies probabilistic and deterministic approaches to treating uncertainty, detailed and simplified zoning for creating zones, and dynamic simulation and machine learning for making energy predictions. A state-of-the-art approach such as dynamic simulation provides a reliable estimate of energy demand but is computationally expensive. Reducing computational time requires an alternative approach, such as a machine learning (ML) model. However, an alternative approach introduces a prediction gap, and its effect on comparing options needs to be investigated. A plugin for a building information modelling (BIM) tool has been developed to perform BPS using the various approaches. These approaches were tested on an office building with five design options. A method that uses the probabilistic approach to treat uncertainty, detailed zoning to create zones, and EnergyPlus to predict energy serves as the reference method. The deterministic and ML approaches have a small prediction gap, and their comparison results are similar to those of the reference method. The simplified-model approach has a large prediction gap, and only 40% of its comparison results are similar to those of the reference method. These findings are useful for developing a BIM-integrated tool to compare options at the early design stage and for deciding which approach to adopt in a time-constrained situation.

19.
Representing causality in machine learning to predict control parameters is a state-of-the-art research topic in intelligent control. This study presents a physics-based machine learning method that provides a prediction model guaranteeing enhanced interpretability in conformity with physical laws. Through dimensional analysis, the proposed approach encodes physical knowledge, in the form of mapping relationships between variables in the engineering dataset, into the learning procedure. This yields causal relationships between the control parameter and its influencing factors. The objective function of the proposed method is further improved by a penalty term from the regularization strategy. Verification on energy consumption prediction for a tunnel boring machine shows that the established model accords with the basic principles of this field. Moreover, the proposed approach traces the impact of three major factors (structure, operation, and geology) along the construction section, giving each component's contribution rate to energy consumption. Compared with several commonly used machine learning algorithms, the proposed method reduces the need for large amounts of training data and achieves higher accuracy. The results indicate that the revealed causality and improved prediction performance of the proposed method extend the applicability of machine learning to intelligent control during construction.

20.
Nearest neighbors by neighborhood counting   Cited by: 2 (self-citations: 0, by others: 2)
Finding nearest neighbors is a general idea that underlies many artificial intelligence tasks, including machine learning, data mining, natural language understanding, and information retrieval. This idea is explicitly used in the k-nearest neighbors algorithm (kNN), a popular classification method. In this paper, the idea is adopted in developing a general methodology, neighborhood counting, for devising similarity functions. We turn our focus from neighbors to neighborhoods, i.e., regions in the data space that cover the data point in question. To measure the similarity between two data points, we consider all neighborhoods that cover both data points, and we propose to use the number of such neighborhoods as the measure of similarity. Neighborhoods can be defined in different ways for different types of data. Here, we consider one definition of neighborhood for multivariate data and derive a formula for the resulting similarity, called the neighborhood counting measure (NCM). NCM was tested experimentally in the framework of kNN. Experiments show that NCM is generally comparable to VDM and its variants, the state-of-the-art distance functions for multivariate data, and at the same time is consistently better for relatively large k values. Additionally, NCM consistently outperforms HEOM (a mixture of Euclidean and Hamming distances), the "standard" and most widely used distance function for multivariate data. NCM has a computational complexity of the same order as the standard Euclidean distance function, is task independent, and works for numerical and categorical data in a conceptually uniform way. The neighborhood counting methodology has been shown experimentally to be sound for multivariate data; we hope it also works for other types of data.
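The following sketch illustrates the counting idea only. It assumes two kinds of attribute. For integer-valued ordinal attributes, neighbourhoods are intervals with integer endpoints. For categorical attributes, neighbourhoods are subsets of the value set. A multivariate neighbourhood is then one such region per attribute, so the per-attribute counts multiply. The paper's exact NCM formula, including its treatment of continuous attributes, may differ.

```python
def ncm(x, y, domains):
    """Count the neighbourhoods (one region per attribute) that cover both x and y.

    domains[i] is ("int", lo, hi) for an integer-valued ordinal attribute,
    or ("cat", n_values) for a categorical attribute (assumed encodings).
    """
    count = 1
    for xi, yi, dom in zip(x, y, domains):
        if dom[0] == "int":
            _, lo, hi = dom
            # intervals [l, u] with lo <= l <= min(xi, yi) and max(xi, yi) <= u <= hi
            count *= (min(xi, yi) - lo + 1) * (hi - max(xi, yi) + 1)
        else:
            _, n_values = dom
            # value subsets containing both values: 2^(n-1) if equal, else 2^(n-2)
            count *= 2 ** (n_values - 1) if xi == yi else 2 ** (n_values - 2)
    return count
```

Larger counts mean more similar points. In practice the raw count grows quickly with the number of attributes, so a log or normalised version would be a natural choice for kNN.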
