Feature transfer weighting algorithm based on distribution and term frequency-inverse class frequency
Cite this article:QIU Yunfei, LIU Shixing, LIN Mingming, SHAO Liangshan. Feature transfer weighting algorithm based on distribution and term frequency-inverse class frequency[J]. Journal of Computer Applications (计算机应用), 2015, 35(6): 1643-1648.
Authors:QIU Yunfei (邱云飞)  LIU Shixing (刘世兴)  LIN Mingming (林明明)  SHAO Liangshan (邵良杉)
Affiliation:1. School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China; 2. System Engineering Institute, Liaoning Technical University, Huludao, Liaoning 125105, China
Funding:National Natural Science Foundation of China; Liaoning Province Innovation Team Project; Liaoning Province Program for the Growth of Outstanding Young Scholars in Higher Education Institutions
Abstract:Traditional machine learning faces a problem: when the training data and test data no longer follow the same distribution, a classifier learned from the training set cannot classify the test set accurately. To address this problem, a feature weighting method was built on the transfer learning principle. Features in the intersection of the source domain and the target domain were weighted according to an improved feature distribution similarity. For non-intersection features, semantic similarity and the newly proposed Term Frequency-Inverse Class Frequency (TF-ICF) were introduced to compute feature weights within the source domain. A large amount of labeled source-domain data and a small amount of labeled target-domain data were used to obtain the required features, so that a classifier could be built quickly. Experimental results on the text dataset 20Newsgroups and the non-text dataset UCI show that the feature transfer weighting algorithm based on distribution and TF-ICF can transfer and weight features rapidly while maintaining precision.
Keywords:transfer learning  feature distribution  Term Frequency-Inverse Class Frequency (TF-ICF)  semantic similarity  feature weighting
Received:2014-12-22
Revised:2015-03-17
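
The abstract names TF-ICF but does not give its formula. As a rough illustration only, the sketch below assumes a common TF-IDF-style reading: the relative frequency of a term inside a class, multiplied by log(1 + C / cf(t)), where C is the number of classes and cf(t) is the number of classes containing the term. The function name `tf_icf`, the smoothing term, and the toy documents are all hypothetical and may differ from the definition used in the paper.

```python
import math
from collections import Counter, defaultdict


def tf_icf(docs, labels):
    """Return {class: {term: weight}} for tokenized, labeled documents.

    Assumed weighting (not taken from the paper):
        weight(t, c) = tf(t, c) * log(1 + C / cf(t))
    """
    # Count term occurrences per class.
    class_term_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        class_term_counts[label].update(tokens)

    n_classes = len(class_term_counts)

    # cf(t): number of classes in which term t occurs at least once.
    class_freq = Counter()
    for counts in class_term_counts.values():
        class_freq.update(counts.keys())

    weights = {}
    for label, counts in class_term_counts.items():
        total = sum(counts.values())
        weights[label] = {
            term: (count / total) * math.log(1 + n_classes / class_freq[term])
            for term, count in counts.items()
        }
    return weights


# Toy labeled source-domain data (hypothetical).
docs = [
    ["gpu", "driver", "gpu"],
    ["driver", "windows"],
    ["goal", "team", "goal"],
    ["team", "driver"],
]
labels = ["comp", "comp", "sport", "sport"]

weights = tf_icf(docs, labels)
for label, term_weights in weights.items():
    ranked = sorted(term_weights.items(), key=lambda kv: -kv[1])
    print(label, ranked)
```

Under this assumed weighting, a term concentrated in one class ("gpu") scores higher than an equally frequent term spread across classes ("driver"), which is the behavior one would want when weighting source-domain features that do not appear in the target domain.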