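Research on Automatic Classification Methods for Chinese Patents Based on Deep Learning
Cite this article: Lyu Lucheng, Han Tao, Zhou Jian, Zhao Yajuan. Research on the Method of Chinese Patent Automatic Classification Based on Deep Learning[J]. Library and Information Service, 2020, 64(10): 75-85.
Authors: Lyu Lucheng  Han Tao  Zhou Jian  Zhao Yajuan
Affiliations: 1. National Science Library, Chinese Academy of Sciences, Beijing 100190; 2. Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190; 3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
Funding: This paper is one of the research outcomes of the Chinese Academy of Sciences Young Talent Project "Deep Learning-Based Classification of Patents by Industry" (project No. G180161001).
Abstract: [Purpose/Significance] To meet the practical need to classify massive numbers of patents in current domestic patent examination and patent intelligence analysis, this paper designs seven deep learning-based automatic patent classification methods and compares their classification performance, with the aim of improving the efficiency and quality of patent classification. [Method/Process] To address the shortcomings of traditional machine learning methods, and drawing on deep learning techniques such as Word2Vec, CNN, RNN, and the attention mechanism, the study takes into account the word-order features, contextual features, and key classification features of patent text and designs seven deep learning models, including Word2Vec+TextCNN, Word2Vec+GRU, Word2Vec+BiGRU, and Word2Vec+BiGRU+TextCNN. Taking Chinese patents as an example and using the "Section" of the main IPC classification symbol as the class label, it compares these seven models with three traditional classification models on the Chinese patent classification task. [Result/Conclusion] The empirical results show that deep learning methods that account for word-order features and contextual features and that strengthen key features achieve better classification performance on Chinese patents.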

Keywords: automatic patent classification  deep learning  word embedding  patent text mining
Received: 2019-11-11
Revised: 2019-12-27

Research on the Method of Chinese Patent Automatic Classification Based on Deep Learning
Lyu Lucheng, Han Tao, Zhou Jian, Zhao Yajuan. Research on the Method of Chinese Patent Automatic Classification Based on Deep Learning[J]. Library and Information Service, 2020, 64(10): 75-85.
Authors: Lyu Lucheng  Han Tao  Zhou Jian  Zhao Yajuan
Affiliation: 1. National Science Library, Chinese Academy of Sciences, Beijing 100190; 2. Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190; 3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
Abstract: [Purpose/significance] To meet the need for automatic classification of massive numbers of patents in current patent examination and patent information analysis, this paper studies a series of deep learning-based automatic patent classification methods and compares their classification performance, with the aim of improving the efficiency and effectiveness of patent classification. [Method/process] To address the shortcomings of traditional machine learning methods, seven deep learning models were designed, including Word2Vec+TextCNN, Word2Vec+GRU, Word2Vec+BiGRU, and Word2Vec+BiGRU+TextCNN. These models build on deep learning techniques such as Word2Vec, CNN, RNN, and the attention mechanism, and take into account the word-order features, contextual features, and other key classification features of patent text. Using the Section of the main International Patent Classification (IPC) symbol as the class label, the study classified Chinese patents with these seven deep learning models and three traditional machine learning methods, and compared the classification performance of the different models. [Result/conclusion] The empirical results indicate that deep learning methods that consider word-order features, contextual features, and other key classification features of patent text achieve better performance in Chinese patent classification.
Keywords:patent automatic classification  deep learning  word embedding  patent text mining  
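
The abstract states that the Section of the main IPC symbol serves as the class label. The sketch below shows one way such a label could be derived from an IPC symbol; it is an illustration only, not the authors' preprocessing code. The function name `ipc_section` and the example symbols are hypothetical, and the input format (a symbol such as "G06F 17/30" whose first character is the section letter) is an assumption about how the patent records are stored.

```python
# Eight IPC sections (A-H); every IPC symbol begins with one of these letters.
IPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}


def ipc_section(main_ipc: str) -> str:
    """Return the section letter of an IPC symbol (hypothetical helper).

    Assumes the symbol looks like 'G06F 17/30', so the first
    non-blank character is the section letter.
    """
    symbol = main_ipc.strip().upper()
    if not symbol or symbol[0] not in IPC_SECTIONS:
        raise ValueError(f"not a valid IPC symbol: {main_ipc!r}")
    return symbol[0]


# Example: map two made-up main IPC symbols to their section labels.
labels = [ipc_section(s) for s in ["G06F 17/30", "A61K 31/00"]]
```

Taking only the section letter turns the task into an eight-class problem, which is the coarsest level of the IPC hierarchy.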
This article is indexed in VIP (维普) and other databases.