Similar literature
 Found 20 similar documents (search time: 187 ms)
1.
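This paper implements a promoter prediction algorithm based on the Markov model. By incorporating the forward algorithm from hidden Markov models, it improves on Markov-based promoter prediction, specifically in how the base transition probabilities and the probability of a sequence under each model are computed. Results show that a system built on the improved Markov model identifies the three types of sequences in the dataset more effectively.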
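
As a rough illustration of the underlying idea only, the sketch below scores a DNA sequence under competing first-order Markov chains and picks the model with the highest log-likelihood. It is a plain Markov chain classifier, not the improved method from the abstract (the forward-algorithm refinement is not reproduced), and the names `train_markov`, `log_likelihood`, `classify`, the pseudocount, and the toy sequences are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order base transition probabilities P(cur | prev).

    A pseudocount (an assumption, not from the abstract) keeps unseen
    transitions from getting probability zero.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for prev, cur in zip(s, s[1:]):
            counts[prev][cur] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in ALPHABET}
    return model


def log_likelihood(seq, model):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(model[p][c]) for p, c in zip(seq, seq[1:]))


def classify(seq, models):
    """Return the name of the model under which seq is most likely."""
    return max(models, key=lambda name: log_likelihood(seq, models[name]))


# Toy, made-up training data: one chain per sequence class.
models = {
    "promoter": train_markov(["TATATATA", "TATAATAT"]),
    "background": train_markov(["GCGCGCGC", "CCGGCCGG"]),
}
```

With three sequence classes, as in the abstract, one chain would be trained per class and `classify` would choose among all three.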

2.
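Promoter recognition is an important research direction in bioinformatics. Based on the characteristics of promoters, a variety of recognition algorithms already exist, including signal-based, content-based, and CpG-island-based methods. Because gene sequence data are large in volume, high-dimensional, and nonlinear, a promoter recognition algorithm based on manifold structure reconstruction is proposed: the data are first compressed with a nonlinear dimensionality reduction method, and promoter recognition is then performed. Experimental results show that the method achieves good results.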

3.
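Starting from the background model used in speaker recognition, a speaker feature space is constructed from the model's Gaussian components. Utterances of different lengths are mapped to fixed-size vectors in this space, normalized with a correlation matrix, and then classified with a linear support vector machine for speaker recognition. Drawing on several common feature normalization schemes and the mapped utterance vectors, four normalization methods are proposed: mean/variance normalization, weight normalization, WLOG normalization, and spherical normalization; a comparative study with the probabilistic sequence kernel is carried out. Based on the transition relationships between adjacent feature vectors in the speech feature vector sequence, and combined with the proposed probabilistic sequence kernel, a transition probabilistic sequence kernel is constructed. Experiments on the NIST2001 corpus show that the probabilistic sequence kernel model performs close to the classical UBM-MAP model; fusing the scores of the two models improves recognition performance markedly, and further fusing the transition probabilistic sequence kernel improves performance by another 19.1%.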

4.
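A new character recognition algorithm based on a Gaussian probability model is proposed. Exploiting the consistency between the sample distribution in pattern recognition and the Gaussian distribution, the algorithm builds a Gaussian probability model. The model stores training samples with probability P; at classification time, the test sample is correlated with the model to obtain a probability value, on which the decision is based. Results show that the algorithm is fast and accurate, and more practical than other character recognition algorithms (KNN).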

5.
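To address the high false-positive rate in eukaryotic promoter recognition, a eukaryotic promoter recognition method based on feature integration is proposed. Statistics of nucleotide oligomers in human promoters are extracted as features, and principal component analysis is used to extract the principal components. The 10-dimensional principal component features are combined with 2-dimensional CpG island features and jointly fed to a BP neural network to recognize promoters. Prediction results on human gene sequence promoters show that the method not only reduces false positives effectively but also achieves good sensitivity and specificity.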

6.
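Accurate prediction of coal and gas outbursts is a prerequisite for safe coal mine production. To improve the accuracy of coal and gas outburst prediction models, an improved extreme learning machine (ELM) prediction model is proposed. First, kernel principal component analysis (KPCA) is used to reduce the dimensionality of the indicators that influence coal and gas outbursts and to extract the principal component series of the indicator data. The principal component series is split into training and validation samples. In the training phase, a memetic algorithm combining global and local search optimizes the input weights and hidden-layer biases of the ELM on the training samples, yielding the best prediction model. Finally, the best model is used to predict coal and gas outburst intensity on the validation samples. A case study shows that the model predicts outburst intensity effectively and achieves higher prediction accuracy than the BP, SVM, ELM, and KPCA-ELM models.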

7.
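Gene recognition algorithms based on statistical features are fairly accurate for long sequences, but their accuracy on short gene sequences remains unsatisfactory. Building on separate studies of the base composition, the period-3 property, codon usage frequency, and base positional correlation of gene sequences, a gene recognition algorithm based on multiple features is proposed. Experimental results show that, for gene sequences shorter than 90 bp (base pairs), the average prediction accuracy of the proposed algorithm is 2.2% higher than that of existing algorithms.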

8.
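A method based on a Gaussian mixture hidden Markov model is proposed for predicting when coal piling will occur on a belt conveyor. The method builds a Gaussian mixture hidden Markov model of the conveyor's operating state from power time-series data collected by sensors. Based on this model, two algorithms predict the coal piling time: a graph-based state sequence traversal algorithm and a probability transition algorithm based on the Chapman-Kolmogorov equation. The graph-based state sequence traversal algorithm determines the remaining time by finding a path from the current state to the coal piling state. The Chapman-Kolmogorov-based probability transition algorithm obtains the probability threshold of the failure state on the training samples through particle swarm optimization and cross-validation with the Chapman-Kolmogorov equation, then determines the remaining time by counting the transitions needed for the current state to reach a failure-state probability above that threshold. Experiments on a dataset from actual coal mine production verify that the method predicts the time of coal piling effectively.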
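
The transition-counting step described above can be sketched as follows, assuming the state transition matrix and the failure-state probability threshold are already known. By the Chapman-Kolmogorov equation, the state distribution after k transitions is the initial distribution multiplied by the k-th power of the transition matrix, so the remaining time can be estimated as the smallest k at which the failure-state probability exceeds the threshold. The function names, the `max_steps` cutoff, and the toy matrix are assumptions; the particle swarm optimization and cross-validation used to obtain the threshold are not reproduced here.

```python
def step(dist, P):
    """One transition: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def steps_until_failure(dist, P, fail_state, threshold, max_steps=1000):
    """Smallest number of transitions k such that the probability of being
    in fail_state after k steps exceeds threshold; None if that does not
    happen within max_steps (a hypothetical safety cutoff).
    """
    for k in range(max_steps + 1):
        if dist[fail_state] > threshold:
            return k
        dist = step(dist, P)
    return None


# Toy 3-state chain (made-up numbers): normal -> overloaded -> piling.
P = [
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.0, 0.0, 1.0],
]
remaining = steps_until_failure([1.0, 0.0, 0.0], P, fail_state=2, threshold=0.5)
```

Multiplying the returned step count by the sampling interval of the power data would give a remaining-time estimate in physical units.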

9.
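This paper introduces the application of a constructive machine learning method, the covering algorithm, to protein secondary structure prediction. Compared with ordinary neural networks, the method is intuitive and computationally simple, and it recognizes 100% of the training samples. In addition, since the structure predicted for a homologous family should be more accurate than that predicted from a single sequence, a probability-based Profile encoding is adopted; compared with previous prediction methods, it offers better stability and accuracy.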

10.
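This paper introduces the application of a constructive machine learning method, the covering algorithm, to protein secondary structure prediction. Compared with ordinary neural networks, the method is intuitive and computationally simple, and it recognizes 100% of the training samples. In addition, since the structure predicted for a homologous family should be more accurate than that predicted from a single sequence, a probability-based Profile encoding is adopted; compared with previous prediction methods, it offers better stability and accuracy.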

11.
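Social network platforms generate massive short-text data streams characterized by high speed, large volume, concept drift, short text length, and a large proportion of missing class labels. This paper therefore proposes a semi-supervised short-text data stream classification algorithm based on vector representation and label propagation, which can effectively classify datasets containing only a small amount of labeled data. To adapt to concept drift, a cluster-based concept drift detection algorithm is also proposed. Experiments on real short-text data streams show that, compared with semi-supervised classification algorithms and semi-supervised data stream classification algorithms, the proposed algorithm not only improves classification accuracy and the macro-average but also adapts quickly to concept drift in the data stream.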

12.
To improve classification performance cheaply, it is necessary to exploit both labeled and unlabeled samples by applying semi-supervised learning methods, most of which are built upon the pair-wise similarities between the samples. While the similarities have so far been formulated heuristically, for example by k-NN, we propose methods to construct similarities from a probabilistic viewpoint. A kernel-based formulation of a transition probability is first proposed by comparing kernel least squares to variational least squares in the probabilistic framework. The formulation results in a simple quadratic program that flexibly accommodates constraints to improve practical robustness and is efficiently solved by SMO. The kernel-based transition probability is by nature favorably sparse even without applying k-NN and induces a similarity measure with the same characteristics. In addition, to cope with multiple types of kernel functions, the transition probabilities obtained from the respective kernels can be probabilistically integrated with prior probabilities represented by linear weights. We propose a computationally efficient method to optimize the weights in a discriminative manner. The optimized weights contribute directly to a composite similarity measure and also integrate the multiple kernels themselves, as multiple kernel learning does, which yields various types of multiple-kernel-based semi-supervised classification methods. In experiments on semi-supervised classification tasks, the proposed methods compare favorably with other methods in terms of classification performance and computation time.

13.
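A semi-supervised support vector machine (semi-supervised SVM) classification algorithm based on two-stage learning is proposed. First, a graph-based label propagation algorithm assigns initial pseudo-labels to the unlabeled samples, and a k-nearest-neighbor graph is used to identify and remove likely noisy samples. The denoised sample set is then treated as a labeled set and fed to a support vector machine (SVM), so that training takes into account information from the whole sample set, which improves the SVM's classification accuracy. Experimental results show that, compared with other semi-supervised learning algorithms, the proposed algorithm improves classification performance and is highly reliable when few labeled training samples are available.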

14.
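Research on semi-supervised sentiment classification based on ensemble learning   Total citations: 1, self-citations: 0, citations by others: 1
Sentiment classification is the task of classifying text by the sentiment it expresses. This paper studies sentiment classification based on semi-supervised learning, that is, using unlabeled samples to improve sentiment classification performance given only a small set of labeled samples. To strengthen semi-supervised learning, the paper proposes an ensemble method based on consistent labels that fuses two mainstream semi-supervised sentiment classification methods: co-training based on random feature subspaces and label propagation. First, the classifiers trained by the two semi-supervised methods label the unlabeled samples; second, the unlabeled samples on which the labels agree are selected; finally, the selected samples are used to update the training model. Experimental results show that the method effectively reduces the mislabeling rate on unlabeled samples and thus achieves better classification than either semi-supervised method alone.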
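
The consistent-label selection step can be sketched as below. This is a minimal illustration under stated assumptions: the two classifiers are passed in as plain callables standing in for the co-training and label-propagation models, which are not implemented here, and the function names are hypothetical.

```python
def select_consistent(unlabeled, clf_a, clf_b):
    """Keep only the unlabeled samples on which both classifiers agree,
    paired with the agreed label.
    """
    selected = []
    for x in unlabeled:
        label_a, label_b = clf_a(x), clf_b(x)
        if label_a == label_b:
            selected.append((x, label_a))
    return selected


def update_training_set(labeled, unlabeled, clf_a, clf_b):
    """Return the labeled set extended with the consistently labeled samples.

    `labeled` is a list of (sample, label) pairs; retraining on the result
    is left to the caller.
    """
    return labeled + select_consistent(unlabeled, clf_a, clf_b)
```

Samples on which the two classifiers disagree are simply left unlabeled, which is how agreement filtering is meant to lower the mislabeling rate.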

15.
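To address the challenges that existing text classification methods face on real-time text, and considering that such text has little labeled data, this paper proposes a semi-supervised text classification method based on ensemble learning with optimized sample-distribution sampling, aimed at improving semi-supervised classification performance. First, a new optimized sampling strategy produces multiple new training sets for the sub-classifiers, increasing diversity among the training sets and limiting the spread of noise, which improves the overall generalization ability of the classifier. Then, a voting mechanism based on multiplying confidences combines the predictions to label the unlabeled data. Finally, a suitable amount of data is selected to update the training model. Experimental results show that the method outperforms state-of-the-art methods on both long and short texts.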

16.
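In open environments, data streams are generated at high speed, are unbounded in volume, and exhibit concept drift. In data stream classification, producing large amounts of training data through manual labeling is expensive and impractical. Data streams that contain a few labeled samples, many unlabeled samples, and concept drift pose a great challenge to machine learning. However, existing research focuses mainly on supervised data stream classification, and semi-supervised classification of data streams with concept drift has not yet received sufficient attention....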

17.
Supervised text classification methods are efficient when they can learn from reasonably sized labeled sets. When only a small set of labeled documents is available, however, semi-supervised methods become more appropriate. These methods are based on comparing distributions between labeled and unlabeled instances; therefore, it is important to focus on the representation and its discrimination abilities. In this paper we present the ST LDA method for semi-supervised text classification with representations based on topic models. The proposed method comprises a semi-supervised text classification algorithm based on self-training and a model that determines parameter settings for any new document collection. Self-training is used to enlarge the small initial labeled set with the help of information from unlabeled data. We investigate how topic-based representation affects prediction accuracy by running the NBMN and SVM classification algorithms on an enlarged labeled set and then comparing the results with the same method on a typical TF-IDF representation. We also compare ST LDA with supervised classification methods and other well-known semi-supervised methods. Experiments were conducted on 11 very small initial labeled sets sampled from six publicly available document collections. The results show that our ST LDA method, when used in combination with NBMN, performed significantly better in terms of classification accuracy than other comparable methods and variations. The ST LDA method thus proved to be a competitive classification method for different text collections when only a small set of labeled instances is available. As such, it may well help to improve text classification tasks, which are essential in many advanced expert and intelligent systems, especially when labeled texts are scarce.

18.
Minyoung Kim. Pattern Recognition, 2011, 44(10-11): 2325-2333
We introduce novel discriminative semi-supervised learning algorithms for dynamical systems and apply them to the problem of 3D human motion estimation. Our recent work on discriminative learning of dynamical systems has been shown to outperform traditional generative learning approaches. However, one of the main difficulties in learning dynamical systems is gathering labeled output sequences, which are typically obtained from precise motion capture tools and are hence expensive. In this paper we utilize a large amount of unlabeled (input) video data to improve the prediction performance of the dynamical systems significantly. We suggest two discriminative semi-supervised learning approaches that extend well-known algorithms for static domains to sequential, real-valued multivariate output domains: (i) self-training, which we derive as coordinate ascent optimization of a proper discriminative objective over both the model parameters and the unlabeled state sequences, and (ii) a minimum entropy approach, which maximally reduces the model's uncertainty in state prediction for unlabeled data points. These approaches are shown to achieve significant improvement over traditional generative semi-supervised learning methods. We demonstrate the benefits of our approaches on 3D human motion estimation problems.

19.
In real-world data mining applications, unlabeled instances are often abundant, while available labeled instances are very limited. Thus, semi-supervised learning, which attempts to benefit from a large amount of unlabeled data together with labeled data, has attracted much attention from researchers. In this paper, we propose a very fast yet highly effective semi-supervised learning algorithm, which we call Instance Weighted Naive Bayes (IWNB). IWNB first trains a naive Bayes classifier using the labeled instances only. The trained classifier is then used to estimate the class membership probabilities of the unlabeled instances, and these estimated probabilities are used to label and weight the unlabeled instances. Finally, a naive Bayes classifier is trained again using both the originally labeled data and the newly labeled and weighted unlabeled data. Our experimental results on a large number of UCI data sets show that IWNB often improves the classification accuracy of the original naive Bayes when available labeled data are very limited.
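
The three IWNB steps can be sketched in plain Python as follows. This is an illustrative reading of the abstract, not the authors' implementation: the abstract does not say exactly how the probabilities become labels and weights, so the sketch assumes the label is the most probable class and the weight is that class's probability. It also assumes categorical features encoded as integers in `range(n_values)` and Laplace smoothing; all function names are hypothetical.

```python
import math


def train_nb(rows, labels, weights, classes, n_values):
    """Weighted categorical naive Bayes with Laplace smoothing.

    Each instance contributes its weight (instead of 1) to the counts.
    """
    class_w = {c: 1.0 for c in classes}
    feat_w = {c: [[1.0] * n_values for _ in rows[0]] for c in classes}
    for x, y, w in zip(rows, labels, weights):
        class_w[y] += w
        for j, v in enumerate(x):
            feat_w[y][j][v] += w
    return class_w, feat_w


def predict_proba(model, x):
    """Class membership probabilities for one instance."""
    class_w, feat_w = model
    total = sum(class_w.values())
    logp = {}
    for c in class_w:
        lp = math.log(class_w[c] / total)
        for j, v in enumerate(x):
            lp += math.log(feat_w[c][j][v] / sum(feat_w[c][j]))
        logp[c] = lp
    m = max(logp.values())  # subtract the max for numerical stability
    exp = {c: math.exp(lp - m) for c, lp in logp.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}


def iwnb(labeled_x, labeled_y, unlabeled_x, classes, n_values):
    # Step 1: train on the labeled instances only (weight 1 each).
    base = train_nb(labeled_x, labeled_y, [1.0] * len(labeled_x), classes, n_values)
    # Step 2: label and weight each unlabeled instance (assumed scheme).
    new_y, new_w = [], []
    for x in unlabeled_x:
        probs = predict_proba(base, x)
        best = max(probs, key=probs.get)
        new_y.append(best)
        new_w.append(probs[best])
    # Step 3: retrain on labeled plus weighted, newly labeled instances.
    return train_nb(labeled_x + unlabeled_x, labeled_y + new_y,
                    [1.0] * len(labeled_x) + new_w, classes, n_values)
```

Because the procedure makes a single pass over the unlabeled data rather than iterating to convergence, its cost is roughly that of training naive Bayes twice, which is consistent with the speed claim in the abstract.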

20.
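Feature selection aims to reduce a high-dimensional feature space, thereby simplifying the problem and improving the learning method. Existing studies show that feature extraction methods can effectively reduce the dimensionality of the feature space in supervised sentiment classification. Unlike previous work, this paper is the first to explore feature extraction for semi-supervised sentiment classification and proposes a feature selection method based on a bipartite graph. The method first uses a bipartite graph model to represent the relationship between documents and words; then, combining the label information of a small set of labeled samples with the bipartite graph model, it uses the label propagation (LP) algorithm to compute the sentiment probability of each feature; finally, it ranks the features by sentiment probability to perform feature selection. Experimental results in several domains show that, in semi-supervised sentiment classification, the bipartite-graph-based feature selection method clearly outperforms random feature selection and effectively reduces the dimensionality of the feature space without degrading (and sometimes improving) classification performance.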
