Similar Documents
20 similar documents found (search time: 640 ms)
1.
The Influence of Different Training Samples on a Recognition System   (Total citations: 10; self-citations: 0; citations by others: 10)
刘刚  张洪刚  郭军 《计算机学报》2005,28(11):1923-1928
This paper analyzes how training samples affect the performance of a recognition system, dividing them into three kinds: good samples, poor samples, and boundary samples, and examines the role each plays in training. Building on an HMM-based handwritten digit recognition system, a simple definition of boundary samples and a method for selecting them are given. Experiments show that training on boundary samples reduces the system's misrecognition rate by 17.51%, confirming the importance of boundary samples and indicating that the presence of non-boundary samples can degrade the training result.
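The abstract does not spell out the boundary-sample criterion. A common choice, sketched below under that assumption, is to rank training samples by the score margin between the best and second-best class of a first-pass classifier (e.g., per-class HMM log-likelihoods) and treat the smallest margins as boundary samples; the function name and fraction are illustrative.

```python
import numpy as np

def select_boundary_samples(scores, fraction=0.2):
    """Pick the 'boundary' fraction of training samples.

    scores : (n_samples, n_classes) array of per-class scores
             (e.g. HMM log-likelihoods) from a first-pass model.
    Samples whose top-1 and top-2 class scores are closest are
    assumed to lie near the decision boundary.
    """
    sorted_scores = np.sort(scores, axis=1)           # ascending per row
    margin = sorted_scores[:, -1] - sorted_scores[:, -2]
    n_boundary = max(1, int(fraction * len(scores)))
    return np.argsort(margin)[:n_boundary]            # indices with the smallest margins

# toy usage: 6 samples, 10 digit classes
rng = np.random.default_rng(0)
toy_scores = rng.normal(size=(6, 10))
print(select_boundary_samples(toy_scores, fraction=0.5))
```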

2.
Update of the HCL2000 Handwritten Chinese Character Database and Related Research   (Total citations: 2; self-citations: 0; citations by others: 2)
HCL2000 is one of the most influential handwritten Chinese character databases. Because it was originally designed for studying the regularities of handwritten characters, its samples are organized and stored as files grouped by writer. From the application perspective of sample selection, this paper reorganizes the samples in HCL2000, corrects errors found in the database, and produces a new handwritten Chinese character database, HCL2004. Finally, using HCL2004 and directional line-element features, the paper studies how the number of training samples affects recognition performance, concluding that 300 training samples per class is optimal for the 3755-class large character set, and also discusses sample selection during recognition.

3.
By analyzing the strengths and weaknesses of traditional structural models of Chinese characters, this paper proposes a theoretical framework for building a statistical model of off-line handwritten Chinese characters and, exploiting PCA's ability to uncover regularities in large amounts of data, presents a PCA-based statistical model. Compared with traditional structural models, the proposed model avoids the still-unsolved difficulty of accurately extracting structural primitives: it takes easily extracted, reconstructable statistical features as statistical primitives and describes their overall variation, i.e., the relationships among them, thereby establishing an effective statistical model of off-line handwritten Chinese characters. Experimental results obtained with the model demonstrate its effectiveness in describing off-line handwritten Chinese characters.
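The paper's exact formulation is not reproduced here; as a rough, hedged illustration of a PCA-based statistical model, one can fit a separate PCA subspace to the feature vectors of each character class and classify a new sample by its reconstruction error. The class and parameter names below are assumptions, not the authors' code.

```python
import numpy as np

class PCAClassModel:
    """Per-class PCA subspace; classification by reconstruction error."""
    def __init__(self, n_components=8):
        self.n_components = n_components
        self.models = {}                      # label -> (class mean, principal directions)

    def fit(self, X, y):
        for label in np.unique(y):
            Xc = X[y == label]
            mean = Xc.mean(axis=0)
            # SVD of the centered class data gives the principal directions
            _, _, vt = np.linalg.svd(Xc - mean, full_matrices=False)
            self.models[label] = (mean, vt[: self.n_components])
        return self

    def predict(self, X):
        preds = []
        for x in X:
            errs = {}
            for label, (mean, comps) in self.models.items():
                recon = (x - mean) @ comps.T @ comps + mean   # project and reconstruct
                errs[label] = np.linalg.norm(x - recon)
            preds.append(min(errs, key=errs.get))             # smallest reconstruction error wins
        return np.array(preds)
```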

4.
Handwritten Chinese character recognition is the foundation of handwritten Chinese input. Current handwriting input methods on smart devices cannot dynamically adjust the recognition model to a user's writing habits in order to improve recognition accuracy. Drawing on recent deep learning algorithms and training models, this paper proposes a design for a personalized handwritten Chinese input system based on real-time collection of the user's handwriting samples. The collected characters serve as incremental samples for retraining the recognition model originally trained on the server, so that the model better adapts to the user's writing habits and the input system's recognition rate improves. Finally, comparative handwriting recognition experiments are conducted on this basis with a newly designed deep residual network. The results show that retraining with samples collected in real time substantially raises the recognition rate and better meets users' needs for handwritten Chinese input on smart devices.
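The paper's residual network and training pipeline are not given in the abstract; the sketch below only illustrates the incremental-retraining idea, fine-tuning a pretrained model on newly collected user samples with a small learning rate. PyTorch is assumed, and the model and dataset names are hypothetical.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def personalize(model: nn.Module, user_dataset, epochs=3, lr=1e-4, device="cpu"):
    """Fine-tune a server-trained recognizer on a user's own handwriting samples."""
    model.to(device).train()
    loader = DataLoader(user_dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)   # incremental samples only
            loss.backward()
            optimizer.step()
    return model
```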

5.
Building a practical off-line handwritten Chinese character handwriting database is the basis for research on writer identification. This paper designs an off-line handwriting database system that combines handwriting images with writer information, describes the sample collection scheme and the system's main functions in detail, and explains solutions to several key problems.

6.
Self-training based on ensemble learning is a semi-supervised algorithm, and many researchers select reliable samples by class voting of the ensemble or by average confidence. Confidence-based voting tends to label samples with high confidence, or samples with low confidence but unanimous votes; the latter case may mislabel samples near the decision boundary, and a heterogeneous ensemble may also assign different class labels to high-confidence samples, so that such samples cannot be added effectively to the labeled set. To address these problems, an ensemble self-training algorithm combining active learning with a confidence-based voting strategy is proposed. It adjusts the voting strategy to pseudo-label only unlabeled samples that have both high confidence and unanimous votes, and uses active learning to have humans label samples with inconsistent votes and low confidence, compensating for the fact that ensemble self-training attends only to high-confidence samples and ignores the useful information in low-confidence ones. Comparative experiments on UCI datasets verify the effectiveness of the algorithm.
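The thresholds and committee used in the paper are not stated; the sketch below assumes an ensemble of fitted scikit-learn-style classifiers and illustrates only the routing step: unanimous high-confidence samples are pseudo-labeled, while low-confidence disagreements are sent to a human annotator.

```python
import numpy as np

def split_unlabeled(ensemble, X_unlabeled, conf_threshold=0.9):
    """Route unlabeled samples: pseudo-label vs. ask a human (active learning).

    ensemble : list of fitted classifiers exposing predict_proba().
    Returns (auto_idx, auto_labels, query_idx).
    """
    probas = np.stack([clf.predict_proba(X_unlabeled) for clf in ensemble])
    votes = probas.argmax(axis=2)                  # (n_clf, n_samples) predicted classes
    mean_conf = probas.max(axis=2).mean(axis=0)    # average top-class confidence
    unanimous = np.all(votes == votes[0], axis=0)

    auto = unanimous & (mean_conf >= conf_threshold)       # pseudo-label these
    query = (~unanimous) & (mean_conf < conf_threshold)    # send these to an annotator

    return np.where(auto)[0], votes[0][auto], np.where(query)[0]
```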

7.
To cope with the extremely large number of classes in Chinese character recognition, a Bayesian network classifier is introduced into off-line handwritten character recognition on small-sample character sets. A recognition system is built for a small-sample set of handwritten capital-form numeral characters and compared with the traditional Euclidean distance method. Experiments show that the algorithm raises the recognition rate to 92.4% and offers good practicality and extensibility for off-line handwriting recognition on small-sample character sets.

8.
The banking industry's demand for automated processing of hand-filled forms has pushed research on handwritten Chinese character recognition to a new high. Because handwritten Chinese characters are complex and varied in shape and training samples are scarce, recognition rates are hard to improve. A multi-model hypergraph learning algorithm is designed to recognize handwritten Chinese character blocks: one sample-relation matrix is built from the distances between training samples; another is built by taking sparse-representation coefficients as the weights of closeness between samples; and, based on sample constraint rules, a relation matrix among labeled samples is built from the relation weights between labeled samples. These relation matrices are fused into a multi-model hypergraph learning framework. Through iterative learning, the optimal class assignment of the handwritten character blocks is found, and the method shows certain advantages in experiments on handwritten Chinese character blocks.

9.
An Improved Algorithm for Cross-Point Refinement in Off-line Handwritten Chinese Characters   (Total citations: 1; self-citations: 0; citations by others: 1)
康辉  李思莉 《计算机工程》2006,32(20):193-194
A new, improved algorithm for refining cross points in off-line handwritten Chinese characters is proposed. After the character skeleton is extracted, feature points are obtained with the Rutovitz crossing number, and cross-point refinement is then performed on this basis. The improved algorithm builds on the maximum-circle rule and, informed by problems observed in experiments, adds new rules so that the error rate of cross-point merging is reduced to nearly zero, further improving the accuracy of off-line handwritten Chinese character recognition.
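The paper's refinement rules are not reproduced; the sketch below only shows the Rutovitz crossing number the abstract refers to, used to detect candidate cross (fork) points on a one-pixel-wide binary skeleton.

```python
import numpy as np

def crossing_number(skel, r, c):
    """Rutovitz crossing number at pixel (r, c) of a binary skeleton.

    CN = 1/2 * sum |P(k+1) - P(k)| over the 8 neighbours in cyclic order;
    CN == 1 marks an end point, CN >= 3 a cross (fork) point.
    """
    # 8 neighbours in clockwise order, starting at the pixel above
    offs = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
    p = [int(skel[r + dr, c + dc]) for dr, dc in offs]
    return sum(abs(p[(k + 1) % 8] - p[k]) for k in range(8)) // 2

def cross_points(skel):
    """Return (row, col) of all interior skeleton pixels with crossing number >= 3."""
    pts = []
    for r in range(1, skel.shape[0] - 1):
        for c in range(1, skel.shape[1] - 1):
            if skel[r, c] and crossing_number(skel, r, c) >= 3:
                pts.append((r, c))
    return pts
```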

10.
In a pattern recognition system, different training samples play different roles when the classification model is built. Previous methods for predicting protein associated structure randomly pick a portion of the sample set as the classifier's training samples, which lowers the classifier's prediction accuracy. To mitigate the effect of training samples on prediction accuracy, this paper proposes a prediction method based on sample selection and a BP neural network. Attributes related to protein associated structure are selected for encoding, and a sample-selection technique picks a certain number of high-quality samples from the encoded sample set to build the prediction model, enabling effective prediction. Using the proposed encoding, 200 proteins obtained from the PDB database are encoded, training samples are then selected with a nearest-neighbor algorithm, and a BP neural network builds the corresponding prediction model. Experimental results show that training-sample selection effectively improves the prediction accuracy of protein associated structure.
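The paper's encoding scheme and exact selection rule are not detailed in the abstract. One plausible reading, sketched below as an assumption rather than the authors' method, keeps only training samples whose nearest neighbour (excluding themselves) carries the same label, discarding likely noisy samples before training the network.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_consistent_samples(X, y):
    """Keep samples whose nearest neighbour agrees with their own label.

    X : (n_samples, n_features) numpy array, y : (n_samples,) numpy array of labels.
    """
    nn = NearestNeighbors(n_neighbors=2).fit(X)   # neighbour 0 is the sample itself
    _, idx = nn.kneighbors(X)
    keep = y[idx[:, 1]] == y                      # label of the true nearest neighbour
    return X[keep], y[keep]
```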

11.
As a core area in data mining, frequent pattern (or itemset) mining has been studied for a long time. Weighted frequent pattern mining prunes unimportant patterns and maximal frequent pattern mining discovers compact frequent patterns. These approaches contribute to improving mining performance by reducing the search space. However, we need to consider both the downward closure property and patterns' subset checking process when integrating these different methods in order to prevent unintended pattern losses. Moreover, it is also essential to extract valid patterns with faster runtime and less memory consumption. For this reason, in this paper, we propose more efficient maximal weighted frequent pattern (MWFP) mining approaches based on tree and array structures. We describe how to handle these problems more efficiently, maintaining the correctness of our method. We develop two types of maximal weighted frequent pattern mining algorithms based on weight ascending order and support descending order and compare these two algorithms to conclude which is more suitable for MWFP mining. In addition, comprehensive tests in this paper show that our algorithms are more efficient and scalable than state-of-the-art algorithms, and their pattern generation results confirm the correctness of MWFP mining.
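The paper's tree and array structures are not reproduced here. The sketch below only illustrates one standard trick in weighted frequent pattern mining that the abstract's concern with downward closure alludes to: bounding an itemset's weighted support by its support times the maximum item weight, so that candidates whose bound falls below the threshold can be pruned. The data and function names are illustrative.

```python
def weighted_support_upper_bound(itemset, transactions, weights):
    """Upper bound on weighted support: support(X) * max item weight.

    If this bound is below the minimum threshold, no superset of `itemset`
    can be a (maximal) weighted frequent pattern, so it can be pruned.
    """
    support = sum(1 for t in transactions if itemset <= t) / len(transactions)
    return support * max(weights.values())

# toy usage
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
weights = {"a": 0.6, "b": 0.9, "c": 0.4, "d": 0.7}
print(weighted_support_upper_bound({"a", "c"}, transactions, weights))
```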

12.
Existing classification algorithms use a set of training examples to select classification features, which are then used for all future applications of the classifier. A major problem with this approach is the selection of a training set: a small set will result in reduced performance, and a large set will require extensive training. In addition, class appearance may change over time, requiring an adaptive classification system. In this paper, we propose a solution to these basic problems by developing an on-line feature selection method, which continuously modifies and improves the features used for classification based on the examples provided so far. The method is used both for learning a new class and for continuously improving classification performance as new data becomes available. In ongoing learning, examples are continuously presented to the system, and new features arise from these examples. The method continuously measures the value of the selected features using mutual information, and uses these values to efficiently update the set of selected features when new training information becomes available. The problem is challenging because at each stage the training process uses a small subset of the training data. Surprisingly, with sufficient training data the on-line process reaches the same performance as a scheme that has complete access to the entire training data.
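The sketch below is a simplified, hedged reading of the idea rather than the authors' algorithm: maintain class-conditional counts for each discrete feature, update them as new labelled examples arrive, and re-rank features by their mutual information with the class label.

```python
import numpy as np
from collections import defaultdict

class OnlineMIFeatureSelector:
    """Incrementally re-rank discrete features by mutual information with the label."""
    def __init__(self, n_features, k=10):
        self.k = k
        self.counts = [defaultdict(int) for _ in range(n_features)]  # (value, label) -> count

    def update(self, x, label):
        """Fold one labelled example (sequence of discrete feature values) into the counts."""
        for f, value in enumerate(x):
            self.counts[f][(value, label)] += 1

    def _mi(self, f):
        joint = self.counts[f]
        n = sum(joint.values())
        if n == 0:
            return 0.0
        pv, pl = defaultdict(float), defaultdict(float)
        for (v, l), c in joint.items():
            pv[v] += c / n
            pl[l] += c / n
        # I(F; Y) = sum p(v, l) * log( p(v, l) / (p(v) p(l)) )
        return sum((c / n) * np.log((c / n) / (pv[v] * pl[l]))
                   for (v, l), c in joint.items())

    def selected(self):
        mi = [self._mi(f) for f in range(len(self.counts))]
        return np.argsort(mi)[::-1][: self.k]        # indices of the top-k features
```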

13.
Feature selection for multi-label naive Bayes classification   (Total citations: 4; self-citations: 0; citations by others: 4)
In multi-label learning, the training set is made up of instances each associated with a set of labels, and the task is to predict the label sets of unseen instances. In this paper, this learning problem is addressed by using a method called Mlnb, which adapts the traditional naive Bayes classifiers to deal with multi-label instances. Feature selection mechanisms are incorporated into Mlnb to improve its performance. Firstly, feature extraction techniques based on principal component analysis are applied to remove irrelevant and redundant features. After that, feature subset selection techniques based on genetic algorithms are used to choose the most appropriate subset of features for prediction. Experiments on synthetic and real-world data show that Mlnb achieves comparable performance to other well-established multi-label learning algorithms.

14.
Mining patterns from graph traversals   (Total citations: 12; self-citations: 0; citations by others: 12)
In data models that have graph representations, users navigate following the links of the graph structure. Conducting data mining on collected information about user accesses in such models involves the determination of frequently occurring access sequences. In this paper, the problem of finding traversal patterns from such collections is examined. The determination of patterns is based on the graph structure of the model. For this purpose, three algorithms are presented: one that is level-wise with respect to pattern length and two that are not. Additionally, we consider the fact that accesses within patterns may be interleaved with random accesses due to navigational purposes. The definition of the pattern type generalizes existing ones in order to take this fact into account. The performance of all algorithms and their sensitivity to several parameters are examined experimentally.

15.
A novel successive learning algorithm based on a Test Feature Classifier is proposed for efficient handling of sequentially provided training data. The fundamental characteristics of successive learning are considered: after a set of unknown data is recognized by the classifier, the data are fed back into the classifier to modify its performance. An efficient algorithm is proposed for incrementally defining prime tests, i.e., irreducible combinations of features capable of classifying training patterns into the correct classes. Four strategies for adding training patterns are investigated with respect to their precision and performance on real pattern data. The proposed classifier has been applied to a real-world problem, classifying defects on wafer images, and achieves excellent performance even with the efficient addition strategies.

16.
Deterministic Learning and Rapid Dynamical Pattern Recognition   (Total citations: 3; self-citations: 0; citations by others: 3)
Recognition of temporal/dynamical patterns is among the most difficult pattern recognition tasks. In this paper, based on a recent result on deterministic learning theory, a deterministic framework is proposed for rapid recognition of dynamical patterns. First, it is shown that a time-varying dynamical pattern can be effectively represented in a time-invariant and spatially distributed manner through deterministic learning. Second, a definition for characterizing the similarity of dynamical patterns is given based on the system dynamics inherent within dynamical patterns. Third, a mechanism for rapid recognition of dynamical patterns is presented, by which a test dynamical pattern is recognized as similar to a training dynamical pattern if state synchronization is achieved according to a kind of internal and dynamical matching on system dynamics. The synchronization errors can be taken as the measure of similarity between the test and training patterns. The significance of the paper is that a completely dynamical approach is proposed, in which the problem of dynamical pattern recognition is turned into a question of the stability and convergence of a recognition error system. Simulation studies are included to demonstrate the effectiveness of the proposed approach.

17.
Volatility is a key variable in option pricing, trading, and hedging strategies. The purpose of this article is to improve the accuracy of forecasting implied volatility using an extension of genetic programming (GP) by means of dynamic training-subset selection methods. These methods manipulate the training data in order to improve the fitting of out-of-sample patterns. When applied with the static subset selection method using a single training data sample, GP may generate forecasting models that are not adapted to some out-of-sample fitness cases. In order to improve the predictive accuracy of generated GP patterns, dynamic subset selection methods are introduced into the GP algorithm, allowing a regular change of the training sample during evolution. Four dynamic training-subset selection methods are proposed based on random, sequential, or adaptive subset selection. The last approach uses an adaptive subset weight that measures sample difficulty according to the errors on the fitness cases. Using real data from S&P500 index options, these techniques are compared with the static subset selection method. Based on total mean squared error and the percentage of non-fitted observations, results show that the dynamic approach improves the forecasting performance of the generated GP models, especially those obtained from the adaptive-random training-subset selection method applied to the whole set of training samples.
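GP internals aside, the adaptive-subset idea can be sketched on its own: weight each training case by the error the current population makes on it and resample the next training subset with probability proportional to those weights. The names and constants below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def adaptive_subset(case_errors, subset_size, rng=None):
    """Sample a training subset, favouring cases the current models fit poorly.

    case_errors : 1-D array of per-case errors from the current GP generation.
    """
    rng = rng or np.random.default_rng()
    weights = case_errors + 1e-9                  # avoid an all-zero weight vector
    probs = weights / weights.sum()
    return rng.choice(len(case_errors), size=subset_size, replace=False, p=probs)

# toy usage: 100 fitness cases, pick 20 for the next generation
errors = np.random.default_rng(1).random(100)
print(adaptive_subset(errors, subset_size=20))
```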

18.
Learning Chinese Question-Answering Patterns Based on the Web and Self-training   (Total citations: 1; self-citations: 0; citations by others: 1)
In existing question-answering (QA) pattern learning, the pattern definition and candidate-answer scoring are rather simple, and the learning process depends on manually annotated corpora. This work mines skeleton patterns of verb and noun sequences from Web text to enrich the pattern definition; it introduces a self-training mechanism into QA pattern learning, performing initial learning from one pair of training data, then searching the Web to automatically select highly reliable question-answer pairs for retraining; and it extends the heuristic rules to improve the scoring of candidate answers. Experimental results show that the proposed QA pattern learning method can effectively improve the performance of a Chinese QA system.

19.
20.
Most of the widely used pattern classification algorithms, such as Support Vector Machines (SVM), are sensitive to the presence of irrelevant or redundant features in the training data. Automatic feature selection algorithms aim at selecting a subset of features present in a given dataset so that the achieved accuracy of the following classifier can be maximized. Feature selection algorithms are generally categorized into two broad categories: algorithms that do not take the following classifier into account (the filter approaches), and algorithms that evaluate the following classifier for each considered feature subset (the wrapper approaches). Filter approaches are typically faster, but wrapper approaches deliver a higher performance. In this paper, we present the Predictive Forward Selection algorithm, based on the widely used wrapper approach of forward selection. Using ideas from meta-learning, the number of required evaluations of the target classifier is reduced by using experience knowledge gained during past feature selection runs on other datasets. We have evaluated our approach on 59 real-world datasets with a focus on SVM as the target classifier. We present comparisons with state-of-the-art wrapper and filter approaches as well as one embedded method for SVM according to accuracy and run-time. The results show that the presented method reaches the accuracy of traditional wrapper approaches while requiring significantly fewer evaluations of the target algorithm. Moreover, our method achieves statistically significantly better results than the filter approaches as well as the embedded method.
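The meta-learning shortcut that distinguishes the paper is not reproduced here; the sketch below is only the plain forward-selection wrapper it builds on, greedily adding the feature that most improves the cross-validated accuracy of the target classifier (scikit-learn, with an SVM as in the paper's evaluation; the function name is illustrative).

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def forward_selection(X, y, max_features=10, cv=5):
    """Greedy wrapper: repeatedly add the feature that most improves CV accuracy of an SVM."""
    selected, best_score = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = {f: cross_val_score(SVC(), X[:, selected + [f]], y, cv=cv).mean()
                  for f in remaining}
        f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
        if s_best <= best_score:          # no candidate improves the score: stop
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best_score = s_best
    return selected, best_score
```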
