
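Research on the Method of Clustering Web Documents Based on Hyperlink Information (基于超链接信息的Web文本聚类方法研究)
Cite this article: SUN Li-na. Research on the Method of Clustering Web Documents Based on Hyperlink Information [J]. Digital Community & Smart Home, 2006(26).
Author: SUN Li-na
Affiliation: School of Electronic Information Engineering, Tianjin University, Tianjin 300072
Abstract: Given the large volume of text data now available, helping people accurately locate the information they need has become a research trend in text mining. By applying text classification and clustering methods to information retrieval, specifically to clustering the text of web pages, this paper proposes an automatic Web text clustering model based on hyperlink information. Structure mining is used to obtain several authoritative pages in a topic area as the initial cluster centers. Noise and redundant links are removed from the hyperlink information to obtain a concise topological structure of the website. Combined with content mining, the cluster centers are then adjusted dynamically, and the pages are finally clustered into different subcategories under each topic.

Keywords: text mining; HITS algorithm; topological structure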
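
The abstract says that authoritative pages found by structure mining serve as initial cluster centers, and the keywords name the HITS algorithm. The following is a minimal sketch of standard HITS scoring, not the paper's own implementation, which the abstract does not give. The link graph format and the function names `hits` and `top_authorities` are assumptions made for illustration.

```python
import math


def hits(links, iterations=50):
    """Run the standard HITS iteration on a link graph.

    links: dict mapping a page id to the list of page ids it links to
    (an assumed input format). Returns (hub, authority) score dicts.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}

        # Hub score: sum of authority scores of the pages linked to.
        new_hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}

    return hub, auth


def top_authorities(links, k):
    """Return the k pages with the highest authority score (ties by id)."""
    _, auth = hits(links)
    return sorted(auth, key=lambda p: (-auth[p], p))[:k]
```

For example, with `links = {"a": ["c"], "b": ["c"], "d": ["c", "e"]}`, page `c` receives links from every hub and would be picked first as a candidate initial center.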

Research on the Method of Clustering Web Documents Based on Hyperlink Information
SUN Li-na. Research on the Method of Clustering Web Documents Based on Hyperlink Information [J]. Digital Community & Smart Home, 2006(26).
Author: SUN Li-na
Abstract: Given the massive volume of text data, how to locate the required information is an important research direction in text mining. Text classification and clustering algorithms are applied to information retrieval, and a method for clustering Web documents based on hyperlinks is presented in light of the particular characteristics of such documents. The topological structure of a website is derived from its hyperlink information, with noisy and redundant hyperlinks removed. Clustering is then carried out on the similarity between feature vectors obtained by content mining of hyperlink anchor texts and Web page texts. At the same time, the cluster centers are adjusted dynamically, so as to realize hyperlink-based clustering of Web documents.
Keywords: text mining; Hyperlink-Induced Topic Search (HITS) algorithm; topological structure
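
The clustering step described in the abstract (similarity between feature vectors, with cluster centers adjusted dynamically) can be illustrated with a small k-means-style sketch. This is only one plausible reading under stated assumptions: whitespace tokenization, raw term counts as features, cosine similarity, and the helper names `vectorize`, `cosine`, and `cluster` are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed feature representation)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(docs, seeds, rounds=10):
    """Assign each page to its most similar center, then update centers.

    docs:  dict page id -> text (e.g. anchor text plus page text)
    seeds: page ids used as initial centers (e.g. authority pages)
    Returns a dict page id -> index of its cluster in `seeds`.
    """
    vectors = {p: vectorize(t) for p, t in docs.items()}
    centers = [Counter(vectors[s]) for s in seeds]
    assignment = {}

    for _ in range(rounds):
        new_assignment = {
            p: max(range(len(centers)), key=lambda i: cosine(v, centers[i]))
            for p, v in vectors.items()
        }
        if new_assignment == assignment:
            break  # assignments are stable, so the centers will not move
        assignment = new_assignment

        # Adjust each center to the summed vector of its members.
        # Cosine similarity ignores scale, so a sum behaves like a mean.
        for i in range(len(centers)):
            total = Counter()
            for p, c in assignment.items():
                if c == i:
                    total.update(vectors[p])
            if total:
                centers[i] = total

    return assignment
```

A page with zero similarity to every center falls into cluster 0 in this sketch; a real system would likely need a threshold or an "unassigned" bucket for such pages.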
This article is indexed in CNKI and other databases.