Similar literature
1.
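With the continued growth of the Internet, the volume of text data online keeps increasing. Classifying this data effectively makes it easier to mine valuable information from it, so managing and integrating text data is very important. Text classification is a fundamental task in natural language processing, applied mainly in areas such as public opinion monitoring and news text classification, with the aim of organizing and categorizing text resources. Deep-learning-based text classification performs well on text data. This paper describes in detail the deep learning algorithms used for text classification, groups them by algorithm type, analyzes the characteristics of each, and finally summarizes future research directions for deep learning in text classification.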

2.
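To address the difficulty of hand-selecting features in algae classification and recognition, a deep-learning-based algae classification and recognition method is proposed. First, the training and test sample sets are processed into the required data format. Second, various deep learning models are studied to understand the roles of convolutional layers, fully connected layers, and so on, and deep learning network models are designed on Caffe. Finally, the performance of the designed models is compared to select the best one. Experimental results show that, for algae classification, this method outperforms the method of 张松 et al., which trains an SVM classifier on a bag-of-visual-words model, and achieves fairly good results.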

3.
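Traditional image classification algorithms have low accuracy when the data set is too small, and traditional image deformation methods tend to destroy the semantic information of the image subject. In this study of few-shot image classification based on an image deformation network, the image deformation network and the few-shot image classification network are combined end to end. The original data set is effectively expanded by weighted fusion of training images and similar images, and this data augmentation improves few-shot classification accuracy. Experimental data show that the proposed method improves the performance of few-shot image classification networks on the mini-ImageNet data set.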
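The weighted fusion of a training image with similar images can be illustrated with a small sketch. This is only an illustration under stated assumptions: in the abstract the fusion is produced by a deformation network trained end to end, whereas here the weights are fixed by hand, images are plain nested lists of grayscale values, and the name `fuse_images` is hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def fuse_images(anchor: Image, similar: List[Image], weights: List[float]) -> Image:
    """Return the per-pixel weighted average of an anchor image and similar images.

    Assumption: weights are given by hand (anchor first); the paper's network
    would produce them, which this sketch does not attempt.
    """
    images = [anchor] + similar
    if len(weights) != len(images):
        raise ValueError("need one weight per image (anchor first)")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalise so intensities stay in range
    rows, cols = len(anchor), len(anchor[0])
    return [
        [sum(w * img[r][c] for w, img in zip(norm, images)) for c in range(cols)]
        for r in range(rows)
    ]


anchor = [[0.0, 1.0], [1.0, 0.0]]
neighbour = [[1.0, 1.0], [0.0, 0.0]]
fused = fuse_images(anchor, [neighbour], weights=[0.75, 0.25])
```

The fused image keeps the anchor's label in this sketch, so each fusion adds one more labelled sample to a small training set.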

4.
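In the big data era, as social media becomes ever more widespread, text data of all kinds is growing both online and in daily life, and using text classification to analyze and manage it is of great significance. Text classification is a basic research topic in natural language processing: under a given standard, texts are classified according to their content. It has a wide range of applications, such as sentiment analysis, topic classification, and relation classification. Deep learning is a machine learning approach based on representation learning of data, and it has shown good classification performance on text data. Chinese text differs from English text in form, sound, and imagery. Focusing on what is particular to Chinese text classification, the paper analyzes and describes the deep learning methods used for it, and finally compiles the data sets commonly used for Chinese text classification.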

5.
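张珂, 高策, 郭丽茹, 苑津莎, 赵振兵. 《计算机应用》 (Journal of Computer Applications), 2017, 37(11): 3244-3248
To address the low accuracy of age classification for face images under unconstrained conditions, a face age classification method based on deep residual networks (ResNets) and fine-tuning on a large data set is proposed. First, a deep residual network is chosen as the base convolutional neural network for the face age classification problem. Second, the network is pre-trained on the ImageNet data set to learn representations of basic image features. Then, the large-scale face age image data set IMDB-WIKI is cleaned and an IMDB-WIKI-8 data set is built to fine-tune the network, realizing transfer learning from general object images to face age images, so that the model adapts to the distribution of age groups and the network's learning ability improves. Finally, the fine-tuned network is trained and tested on the unconstrained face data set Adience, and age classification accuracy is obtained by cross-validation. A comparison of 34-, 50-, 101-, and 152-layer residual networks shows that accuracy increases with network depth; the 152-layer network achieved the highest age classification accuracy on the Adience data set, 65.01%. The experimental results show that combining deeper residual networks with fine-tuning on a large data set effectively improves the accuracy of face image age classification.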

6.
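戴芹, 陈雪, 马建文, 李启青, 冯春. 《计算机工程》 (Computer Engineering), 2005, 31(15): 35-36, 86
The main venues of the Beijing Olympics and their surroundings were chosen as the study area, and six bands of Landsat ETM+ data were acquired. Bayesian network classification and maximum likelihood classification were compared in terms of learning mechanism and technical workflow. The experimental results show that the Bayesian network classification method has considerable research and development potential for improving the classification accuracy of remote sensing data, and that Bayesian networks offer an alternative approach to remote sensing data classification.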

7.
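郑燕, 王杨, 郝青峰, 甘振韬. 《计算机应用》 (Journal of Computer Applications), 2014, 34(5): 1336-1340
The traditional hypernetwork model is strongly biased when classifying imbalanced data: its recognition rate for the positive class is far higher than for the negative class. To address this, a cost-sensitive hypernetwork Boosting ensemble algorithm is proposed. First, cost-sensitive learning is introduced into the hypernetwork model, yielding a cost-sensitive hypernetwork model. Then, so that the algorithm can adapt to the misclassification cost of the positive class, Boosting is used to build an ensemble of cost-sensitive hypernetworks. The cost-sensitive hypernetwork corrects the traditional hypernetwork's excessive bias toward the positive class on imbalanced data and improves classification accuracy for the negative class. Experimental results show that the cost-sensitive hypernetwork Boosting ensemble algorithm has an advantage on imbalanced data classification problems.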

8.
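A method combining a convolutional neural network with a random forest is proposed for recognizing and classifying edible marine fish. The YOLOv3 object detection network is used to localize targets in the original fish images, and data augmentation is used to expand the data set. The model is trained and fine-tuned on a self-built data set and achieves high classification accuracy and stability. Experimental results demonstrate the model's effectiveness on the fish classification task and offer a new approach to the difficulties traditional methods face in classifying fish.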

9.
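The morphological distribution of remaining oil is of great significance for the deep development of oil fields. To address the small amount of remaining-oil data and the limited classification ability of traditional morphological parameters, a deep-learning-based method for classifying remaining-oil morphology is proposed. In the data preprocessing stage, the method uses the multi-class data generation capability of the generative adversarial network ACGAN to augment the remaining-oil images. The VGG19 model serves as the backbone network to extract deep features that traditional morphological parameters cannot describe, and the SENet attention mechanism is introduced to improve the model's feature representation, making the final classification more accurate. To verify its effectiveness, the method is compared with classification based on traditional morphological parameters and with other deep learning models, and evaluated both by subjective visual inspection and by objective metrics. The results show that the proposed method classifies more accurately.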

10.
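衣治安, 吕曼. 《计算机工程》 (Computer Engineering), 2007, 33(15): 167-169
The data handled in network intrusion detection consists of several classes of attack data plus normal data. On this basis, the application of multi-class support vector machines to network intrusion detection is studied. A multi-class SVM classifier is built with the one-against-one method and evaluated on the KDD99 intrusion detection data, and the experimental results are compared with those of a BP neural network. The experiments show that the proposed method is feasible and efficient.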
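The one-against-one construction can be sketched as follows: one binary classifier is trained per pair of classes, and a test sample receives the label with the most pairwise votes. This is a minimal sketch under assumptions, not the authors' implementation. A nearest-centroid rule stands in for the binary SVM so the example needs only the standard library, and the class names and two-dimensional feature vectors are made up for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
BinaryClassifier = Callable[[Vector], str]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def train_pair(a: str, xa: List[Vector], b: str, xb: List[Vector]) -> BinaryClassifier:
    """Stand-in binary learner (nearest centroid); a real system would fit an SVM here."""
    ca, cb = centroid(xa), centroid(xb)

    def predict(x: Vector) -> str:
        da = sum((xi - ci) ** 2 for xi, ci in zip(x, ca))
        db = sum((xi - ci) ** 2 for xi, ci in zip(x, cb))
        return a if da <= db else b

    return predict


def train_one_vs_one(data: Dict[str, List[Vector]]) -> List[BinaryClassifier]:
    """Train one binary classifier per unordered class pair: k(k-1)/2 in total."""
    return [train_pair(a, data[a], b, data[b]) for a, b in combinations(sorted(data), 2)]


def predict_one_vs_one(classifiers: List[BinaryClassifier], x: Vector) -> str:
    """Each pairwise classifier casts one vote; the class with the most votes wins."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three traffic classes with made-up 2-D features.
toy = {
    "normal": [(0.0, 0.0), (0.2, 0.1)],
    "dos": [(5.0, 5.0), (5.2, 4.9)],
    "probe": [(0.0, 5.0), (0.1, 5.2)],
}
models = train_one_vs_one(toy)
label = predict_one_vs_one(models, (4.9, 5.1))
```

With k classes the scheme trains k(k-1)/2 small classifiers, each on only two classes of data, which is why it is a common way to extend binary SVMs to multi-class problems.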

11.
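Semi-supervised learning based on one-class classifiers (total citations: 1; self-citations: 0; citations by others: 1)
An Ensemble one-class semi-supervised learning algorithm is proposed that combines the strengths of one-class learners and ensemble learning. The algorithm first builds two one-class classifiers, one for each of the two classes in a small amount of labeled data. The two classifiers are then used jointly to recognize unlabeled samples, and the recognized unlabeled samples are used to adjust and optimize the two decision surfaces. Finally, the recognized unlabeled data and the labeled data are pooled to train a base classifier, and multiple base classifiers are combined to vote on the test samples. Experiments on 5 UCI data sets show that the algorithm improves average recognition accuracy by 4.5% over the tri-training algorithm and by 8.9% over a one-class classifier trained only on labeled data. The experimental results show that the algorithm is effective for semi-supervised problems.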

12.
The use of toolkits and reference frameworks for the design and evaluation of learning activities enables the systematic application of pedagogical criteria in the elaboration of learning resources and learning designs. Pedagogical classification as described in such frameworks is a major criterion for the retrieval of learning objects, since it serves to partition the space of available learning resources depending either on the pedagogical standpoint that was used to create them, or on the interpreted pedagogical orientation of their constituent learning contents and activities. However, pedagogical classification systems need to be evaluated to assess their quality with regard to providing a degree of inter-subjective agreement on the meaning of the classification dimensions they provide. Without such evaluation, classification metadata, which is typically provided by a variety of contributors, is at risk of being fuzzy in reflecting the actual pedagogical orientations, thus hampering the effective retrieval of resources. This paper describes a case study that evaluates the general pedagogical dimensions proposed by Conole et al. to classify learning resources. Rater agreement techniques are used for the assessment, and are proposed as a general technique for the evaluation of this kind of classification schema. The case study evaluates the degree of coherence of the pedagogical dimensions proposed by Conole et al. as an objective instrument to classify pedagogical resources. In addition, the technical details on how to integrate such classifications in learning object metadata are provided.
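To make "rater agreement" concrete, the sketch below computes Cohen's kappa for two raters who each assign one category per resource. The abstract does not say which agreement statistic the case study uses, so Cohen's kappa is an assumption chosen for illustration, and the category labels in the example are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labelled at random with their own label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of four learning resources on one dimension.
rater_1 = ["individual", "social", "social", "individual"]
rater_2 = ["individual", "social", "individual", "individual"]
kappa = cohens_kappa(rater_1, rater_2)
```

A kappa near 1 indicates that contributors interpret a dimension consistently, while a value near 0 means their agreement is no better than chance, which is the "fuzzy metadata" risk the abstract describes.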

13.
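To cope effectively with concept drift in data streams, a data stream classification algorithm that incorporates unsupervised learning is proposed. Built on ensemble classification, the algorithm introduces attribute reduction into the classification process, clusters the data with a clustering algorithm, and decides whether concept drift has occurred by comparing the accuracy of the classification and clustering results. Experiments show that the algorithm achieves a good trade-off between time cost and accuracy.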

14.
Incremental learning techniques have been used extensively to address the data stream classification problem. The most important issue is to maintain a balance between accuracy and efficiency, i.e., the algorithm should provide good classification performance with a reasonable response time. This work introduces a new technique, named Similarity-based Data Stream Classifier (SimC), which achieves good performance by introducing a novel insertion/removal policy that adapts quickly to the data tendency and maintains a small, representative set of examples and estimators that guarantees good classification rates. The methodology is also able to detect novel classes/labels during the running phase, and to remove useless ones that do not add any value to the classification process. Statistical tests were used to evaluate the model performance from two points of view: efficacy (classification rate) and efficiency (online response time). Five well-known techniques and sixteen data streams were compared using Friedman's test. Also, to find out which schemes were significantly different, Nemenyi's, Holm's, and Shaffer's tests were considered. The results show that SimC is very competitive in terms of (absolute and streaming) accuracy and classification/updating time, in comparison with several of the most popular methods in the literature.
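The Friedman test used to compare classifiers across data streams ranks the methods on each data set and checks whether their mean ranks differ more than chance would allow. The sketch below computes the standard Friedman chi-square statistic without tie correction; it is an illustration only, and the accuracy table is made up rather than taken from the paper.

```python
from typing import List, Sequence


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank scores so the highest gets rank 1; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square; table[d][m] is the score of method m on data set d."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for m, rank in enumerate(average_ranks(row)):
            rank_sums[m] += rank
    mean_ranks = [s / n for s in rank_sums]
    spread = sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    return 12 * n / (k * (k + 1)) * spread


# Made-up accuracies: 4 data sets (rows) x 3 methods (columns).
accuracies = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.95, 0.90, 0.85],
    [0.70, 0.65, 0.60],
]
chi2 = friedman_statistic(accuracies)
```

If the statistic is significant against a chi-square distribution with k - 1 degrees of freedom, post-hoc procedures such as the Nemenyi, Holm, or Shaffer tests are then applied to find which pairs of methods differ.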

15.
Aggregating outputs of multiple classifiers into a committee decision is one of the most important techniques for improving classification accuracy. Selecting an optimal subset of relevant features also plays an important role in the successful design of a pattern recognition system. In this paper, we present a neural network based approach for identifying salient features for classification in neural network committees. Feature selection is based on two criteria, namely the change in classification error on the cross-validation data set caused by removing individual features, and the diversity of the neural networks comprising the committee. The algorithm developed removed a large number of features from the original data sets without reducing the classification accuracy of the committees. The accuracy of the committees utilizing the reduced feature sets was higher than that of committees exploiting all the original features.

16.
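Hierarchical classification methods use a class hierarchy to decompose the problem and organize the classifiers, and can solve multi-class classification problems effectively. Depending on whether an explicit hierarchical relationship among the classes is required, hierarchical classification methods fall into two broad categories. This paper surveys the methods that do not require an explicit hierarchical relationship among classes. It first summarizes and explains the basic framework these methods adopt, then introduces and analyzes research progress on several of their key techniques, and finally gives a detailed account of related work in China and abroad from the perspectives of algorithms and applications. It then summarizes the existing methods and points out directions for further research.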

17.
Data mining techniques such as classification algorithms are applied to data which are usually high dimensional and very large. In order to assist the user to perform a classification task, visual techniques can be employed to represent high dimensional data in a more comprehensible 2D or 3D space. However, such representation of high dimensional data in the 2D or 3D space may unavoidably cause overlapping data and information loss. This issue can be addressed by interactive visualization. With expert domain knowledge, the user can build classifiers that are as competitive as automated ones using a 2D or 3D visual interface interactively. Several visual techniques have been proposed for classifying high dimensional data. However, the user's interaction with those techniques is highly dependent on the experience of the user in the visual identification of classifying data, and as a result, the classification results of those techniques may vary and may not be repeatable. To address this deficiency, this article presents an interactive visual approach to the classification of high dimensional data. Our approach employs the enhanced separation feature of a visual technique called HOV3 by which the user plots the training dataset by applying statistical measurements on a 2D space in order to separate data points into groups with the same class labels. A data group with its corresponding statistical measurement which separated it from the others is taken as a visual classifier. Then the user mixes the data points in a classifier with the unlabeled dataset and plots them in HOV3 by the measurement of the classifier. The data points which overlap the labeled ones in the 2D space are assigned the corresponding label. Our approach avoids the randomness in the existing interactive visual classification techniques, as the visual classifier in this approach only depends on the training dataset and its statistical measurement. As a result, this work provides an intuitive and effective approach to classify high dimensional data by interactive visualization.

18.
In this paper, we investigate the practical implementation issues of the real-time constrained linear discriminant analysis (CLDA) approach for remotely sensed image classification. Specifically, two issues are to be resolved: (1) what is the best implementation scheme that yields the lowest chip design complexity with comparable classification performance, and (2) how to extend the CLDA algorithm to multispectral image classification. Two limitations on data dimensionality have to be relaxed. One is in real-time hyperspectral image classification, where the number of linearly independent pixels received for classification must be larger than the data dimensionality (i.e., the number of spectral bands) in order to generate a non-singular sample correlation matrix R for the classifier; relaxing this limitation helps to resolve the first issue. The other is in multispectral image classification, where the number of classes to be classified cannot be greater than the data dimensionality; relaxing this limitation helps to resolve the second issue. The former can be solved by introducing a pseudo-inverse initialization of the sample correlation matrix for R⁻¹ adaptation, and the latter is taken care of by expanding the data dimensionality via the operation of band multiplication. Experiments on classification performance using these modifications are conducted to demonstrate their feasibility. All these investigations lead to a detailed ASIC chip design scheme for the real-time CLDA algorithm suitable for both hyperspectral and multispectral images. The proposed techniques for resolving these two dimensionality limitations are instructive for the real-time implementation of several popular detection and classification approaches in remote sensing image exploitation.
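Expanding dimensionality by band multiplication can be sketched as appending products of spectral bands to each pixel vector, so that a multispectral pixel with few bands gains enough dimensions to separate more classes. The abstract does not state which products are used; including every pairwise product and every squared band is an assumption of this sketch, and the name `expand_bands` is hypothetical.

```python
from itertools import combinations_with_replacement
from typing import List, Sequence


def expand_bands(pixel: Sequence[float]) -> List[float]:
    """Append the product of every band pair (squares included) to the original bands.

    Assumption: all pairs are used; the paper may select a different subset.
    """
    products = [
        pixel[i] * pixel[j]
        for i, j in combinations_with_replacement(range(len(pixel)), 2)
    ]
    return list(pixel) + products


pixel = [2.0, 3.0, 5.0]          # a made-up 3-band pixel
expanded = expand_bands(pixel)   # 3 original bands + 6 product bands
```

Under this assumption a pixel with L bands grows to L + L(L+1)/2 dimensions, so even a handful of multispectral bands yields a vector long enough to lift the limit on the number of classes.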

19.
Incremental Image Classification Method Based on Semi-Supervised Learning
LIANG Peng (1, 2), LI Shao-Fa (2), QIN Jiang-Wei (2), LUO Jian-Gao (3)
1 (School of Computer Science and Engineering, Guangdong Polytechnic Normal University, Guangzhou 510665)
2 (School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006)
3 (Department of Computer, Guangdong AIB Polytechnic College, Guangzhou 510507)
To make effective use of large numbers of unlabeled images for classification, an image classification method based on semi-supervised learning is proposed. The method bridges a small number of labeled images and a large number of unlabeled images through their common latent topics, and uses the must-link and cannot-link constraints of the labeled images to improve classification accuracy on the unlabeled images. Experimental results on the Caltech-101 data set and a 7-class image data set show that the method improves classification accuracy by about 10%. Furthermore, because most current semi-supervised image classification methods lack incremental learning ability, an incremental learning model of the method is proposed. Experimental results show that the incremental learning model improves computational efficiency by nearly 90% compared with the non-incremental model.
Keywords: semi-supervised learning, image classification, incremental learning
CLC number: TP391.41

20.
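How to evaluate the usability of a training data set effectively has long been a difficulty in applying intelligent classification systems. For data classification problems in machine learning, a method based on interval analysis and information granulation is proposed for assessing the classification usability of a data set, that is, how separable the data set is. The method defines the data set under evaluation as a classification information system, introduces the concept of a classification confidence interval, and performs information granulation through interval analysis. Under this granulation strategy, a mathematical model of classification usability is defined, and methods are given for computing the classification usability of a single attribute and of the data set as a whole. Eighteen standard UCI data sets are selected for evaluation, and classification usability results are reported for some of them. Three classifiers are then used to run classification experiments on the selected data sets, and analysis of these experimental results demonstrates the effectiveness and feasibility of the evaluation method.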
