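Topical phrase mining for patent (面向专利的主题短语提取)
Cite this article: MA Jian-hong (马建红), JI Shuai (姬帅), LIU Shuo (刘硕). 面向专利的主题短语提取 [J]. 计算机工程与设计 (Computer Engineering and Design), 2019, 40(5): 1365-1369, 1382.
Authors: MA Jian-hong (马建红)  JI Shuai (姬帅)  LIU Shuo (刘硕)
Affiliation (all three authors): School of Computer Science and Software, Hebei University of Technology (河北工业大学计算机科学与软件学院), Tianjin 300401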
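Abstract: In research on topic mining of Chinese patents, the results of traditional word-based topic models are hard to interpret. To address this, an improved model, GW_PhraseLDA, is proposed that combines word vectors with the Generalized Pólya urn (GPU). Based on the characteristics of patent text, a BLSTM-CRF model extracts patent phrases, and trained word vectors are used to generate prior knowledge. During the iterations of Gibbs sampling, the GPU strategy raises the probability of semantically related phrases under the same topic. Experiments on Chinese patent text show that the proposed model effectively improves the quality of the generated patent topics and is more interpretable and discriminative than traditional topic models.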

Keywords: patent mining; phrase extraction; bidirectional long short-term memory network; conditional random field; topic model

Topical phrase mining for patent
MA Jian-hong, JI Shuai, LIU Shuo. Topical phrase mining for patent[J]. Computer Engineering and Design, 2019, 40(5): 1365-1369, 1382.
Authors: MA Jian-hong  JI Shuai  LIU Shuo
Affiliation: School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401, China
Abstract: In Chinese patent topic mining, traditional word-based topic models produce results that are hard to interpret. To address this problem, an improved model, GW_PhraseLDA, is proposed that combines word vectors with the Generalized Pólya urn (GPU). According to the characteristics of patent text, a BLSTM-CRF model is used to extract patent phrases, and trained word vectors are used to generate prior knowledge. In the iterative process of Gibbs sampling, the GPU strategy increases the probability that semantically related phrases are assigned to the same topic. Experiments on Chinese patent texts show that the proposed model effectively improves the quality of the generated patent topics and is more interpretable and discriminative than traditional topic models.
Keywords: patent mining; term extraction; bidirectional long short-term memory; conditional random fields; topic model
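
The abstract describes the GPU step only in words, so a small illustration may help. The sketch below is not the authors' GW_PhraseLDA implementation. It is a minimal, generic generalized Pólya urn count update for collapsed Gibbs sampling, written under stated assumptions: the prior knowledge is taken to be "pairs of phrases whose vectors have cosine similarity above a threshold", and the function names, the threshold 0.8, the promotion weight 0.3, the toy vectors, and the hyperparameters alpha and beta are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_promotions(vectors, threshold=0.8, weight=0.3):
    """Assumed form of prior knowledge: each phrase promotes itself with
    weight 1.0 and every sufficiently similar phrase with `weight`."""
    promo = {p: {p: 1.0} for p in vectors}
    phrases = list(vectors)
    for i, p in enumerate(phrases):
        for q in phrases[i + 1:]:
            if cosine(vectors[p], vectors[q]) >= threshold:
                promo[p][q] = weight
                promo[q][p] = weight
    return promo


def gpu_update(topic_phrase, topic_total, promo, topic, phrase, sign=+1):
    """Add (sign=+1) or remove (sign=-1) one assignment of `phrase` to
    `topic`. Under the GPU scheme, related phrases receive fractional counts."""
    for q, w in promo[phrase].items():
        topic_phrase[topic][q] += sign * w
        topic_total[topic] += sign * w


def topic_weights(topic_phrase, topic_total, doc_topic, phrase,
                  n_topics, vocab_size, alpha=0.1, beta=0.01):
    """Unnormalised LDA-style sampling weights for `phrase`, one per topic."""
    return [
        (doc_topic[k] + alpha)
        * (topic_phrase[k][phrase] + beta)
        / (topic_total[k] + vocab_size * beta)
        for k in range(n_topics)
    ]


# Toy two-dimensional "phrase vectors" (made up for this example).
vectors = {
    "lithium battery": [1.0, 0.1],
    "battery pack": [0.9, 0.2],
    "gear shaft": [0.0, 1.0],
}
promo = build_promotions(vectors)

topic_phrase = [defaultdict(float) for _ in range(2)]
topic_total = [0.0, 0.0]
gpu_update(topic_phrase, topic_total, promo, 0, "lithium battery")
gpu_update(topic_phrase, topic_total, promo, 1, "gear shaft")

# "battery pack" was never assigned, yet topic 0 now favours it.
print(topic_weights(topic_phrase, topic_total, [1, 1], "battery pack", 2, 3))
```

In a full sampler, `gpu_update` would be called with `sign=-1` before resampling a phrase occurrence and with `sign=+1` after the new topic is drawn, so the fractional counts of related phrases follow each assignment.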
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).