
Research on Chinese Word Clustering with Word2vec (利用word2vec对中文词进行聚类的研究)

Cite this article:	ZHENG Wen-chao, XU Peng. 利用word2vec对中文词进行聚类的研究 [J]. 软件 (Software), 2013(12): 160-162.

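Authors:	ZHENG Wen-chao, XU Peng

Affiliation:	Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876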

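Abstract:	Text clustering plays an important role in data mining and machine learning, and years of development have produced a substantial body of theoretical results. Building on prior work, this paper explores a new method for clustering Chinese words. It first proposes a Chinese word segmentation algorithm that splits Chinese text into individual words. The segmented corpus is then processed with the Word2Vec toolkit, which applies a deep neural network algorithm to convert each word into a corresponding word vector. Finally, the cosine distance between word vectors is defined as the similarity between words, and the K-means algorithm is used to cluster the resulting vectors, so that the system can return the words in the corpus that are semantically closest to an input word. The method was tested on online news data from 2012 crawled from the web, and the experiments gave good results.
Keywords:	data mining; clustering; word segmentation; word vector; neural network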
Research on Chinese Word Clustering with Word2vec

ZHENG Wen-chao,XU Peng.Research on Chinese word Clustering with Word2vec[J].Software,2013(12):160-162.

Authors:	ZHENG Wen-chao, XU Peng

Affiliation:	Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract:	Text clustering plays an important role in data mining and machine learning. After years of development, clustering technology has produced a substantial body of theory. This paper explores a new method for clustering Chinese words. It first proposes a word segmentation method that splits Chinese text into words. The Word2Vec toolset then transforms the segmented words into vectors. With the similarity between two words defined as the cosine distance between their vectors, the K-means algorithm is applied to the vectors to cluster the words. The method was applied to news text downloaded from the Internet and gave good results.

Keywords:	data mining; clustering; word segmentation; word vector; neural networks
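
The last two stages described in the abstract, cosine similarity between word vectors and K-means clustering, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors are made up by hand (real Word2Vec vectors would come from a trained model), the function names `most_similar` and `kmeans_cosine` are hypothetical, and the clustering shown is a spherical variant of K-means that re-normalizes centroids so that assignment by cosine similarity stays consistent.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def most_similar(word, vectors, topn=3):
    """Rank all other words by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scored = [(other, cosine_similarity(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


def kmeans_cosine(vectors, seed_words, iterations=10):
    """Spherical K-means: assign each word to the most similar centroid,
    then replace each centroid by the normalized mean of its members.
    Centroids start at the vectors of `seed_words` (an assumption made
    here so that the result is deterministic)."""
    unit = {w: normalize(v) for w, v in vectors.items()}
    centroids = [unit[w] for w in seed_words]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: pick the centroid with the highest cosine similarity.
        for w, v in unit.items():
            sims = [cosine_similarity(v, c) for c in centroids]
            assignment[w] = sims.index(max(sims))
        # Update step: recompute each non-empty cluster's centroid.
        for k in range(len(centroids)):
            members = [unit[w] for w, c in assignment.items() if c == k]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[k] = normalize(mean)
    return assignment


# Hypothetical toy vectors standing in for Word2Vec output.
vectors = {
    "beijing": [0.9, 0.1, 0.0],
    "shanghai": [0.8, 0.2, 0.1],
    "football": [0.1, 0.9, 0.2],
    "basketball": [0.0, 0.8, 0.3],
}

nearest = most_similar("beijing", vectors, topn=1)
clusters = kmeans_cosine(vectors, seed_words=["beijing", "football"])
print(nearest[0][0], clusters)
```

On this toy data the two city words and the two sport words fall into separate clusters. In practice a library implementation of K-means and random or K-means++ initialization would usually replace the hand-written loop and fixed seeds.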
This article has been indexed by VIP (维普) and other databases.
