
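Design and Implementation of a Web Mining System
Cite this article: CHEN Jianhua, BAO Xuan. Design and Implementation of a Web Mining System[J]. Computer Engineering, 2002, 28(8): 141-142, 151.
Authors: CHEN Jianhua, BAO Xuan
Affiliation: Department of Computer Science, Lanzhou University, Lanzhou 730000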
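Abstract: This paper introduces the theory of Web mining from three angles: its definition, its tasks, and its classification. It then briefly describes several key techniques used to implement the Web text mining system WTMiner (Web Text Miner): word segmentation, feature extraction, and classifier design. Word segmentation uses first-character hashing and binary search, which improves segmentation speed. Because the SVM training algorithm is slow, the classifier design uses a nearest-neighbor method to reduce the number of samples in the training set, which greatly improves the speed of the algorithm.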

Keywords: Web; design; text categorization; support vector machine; data mining system; database; computer network; information retrieval
Article ID: 1000-3428(2002)08-0141-02
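
The abstract names first-character hashing and binary search as the lookup technique behind word segmentation, but gives no implementation details. The following is a minimal sketch under stated assumptions, not the WTMiner implementation: the function names (`build_dictionary`, `contains`, `segment`), the `max_len` parameter, and the use of forward maximum matching as the segmentation strategy are all hypothetical choices made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def build_dictionary(words):
    """Group words by first character (the hash step) and sort each group."""
    index = defaultdict(list)
    for word in set(words):
        if word:
            index[word[0]].append(word)
    for bucket in index.values():
        bucket.sort()  # sorted buckets make binary search possible
    return dict(index)


def contains(index, word):
    """Hash on the first character, then binary-search the sorted bucket."""
    bucket = index.get(word[0]) if word else None
    if not bucket:
        return False
    pos = bisect_left(bucket, word)
    return pos < len(bucket) and bucket[pos] == word


def segment(text, index, max_len=4):
    """Forward maximum matching (an assumed strategy, not from the abstract)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or contains(index, candidate):
                tokens.append(candidate)
                i += size
                break
    return tokens


index = build_dictionary(["数据", "数据挖掘", "挖掘", "系统"])
print(segment("数据挖掘系统", index))
```

Hashing on the first character narrows each lookup to the words that share that character, so the binary search runs over a short sorted list rather than the whole dictionary.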

Design and Implementation of a Web Mining Tool
CHEN Jianhua, BAO Xuan. Design and Implementation of a Web Mining Tool[J]. Computer Engineering, 2002, 28(8): 141-142, 151.
Authors: CHEN Jianhua, BAO Xuan
Abstract: The paper first introduces the theory of Web mining, covering its definition, tasks, and categorization. It then describes several key techniques in WTMiner (Web Text Miner): word segmentation, feature extraction, and classifier design. Word segmentation uses hashing on the first Chinese character of a string together with binary search to speed up lookup. Because SVM (support vector machine) training is slow, the classifier uses a K-nearest-neighbor method to reduce the number of training samples, which greatly increases the speed of the algorithm.
Keywords: Web mining; Text categorization; Support vector machine (SVM); Word segmentation
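
The abstract says a nearest-neighbor method shrinks the training set before SVM training, but it does not state the selection rule. The sketch below assumes one plausible rule: keep only samples that have at least one opposite-class sample among their k nearest neighbors, on the reasoning that samples near the class boundary are the ones likely to become support vectors. The function name `reduce_training_set`, the parameter `k`, and the toy data are hypothetical, and the rule itself is an assumption rather than the authors' method.

```python
import math


def reduce_training_set(samples, labels, k=3):
    """Return indices of samples with an opposite-class sample among their
    k nearest neighbours (an assumed rule; the abstract does not specify one)."""
    kept = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Distances from sample i to every other sample, nearest first.
        neighbours = sorted(
            ((math.dist(x, other), labels[j])
             for j, other in enumerate(samples) if j != i),
            key=lambda pair: pair[0],
        )
        if any(label != y for _, label in neighbours[:k]):
            kept.append(i)
    return kept


# Toy data: two classes laid out along a line, meeting between x=33 and x=40.
samples = [(0, 0), (10, 0), (21, 0), (33, 0), (40, 0), (52, 0), (65, 0), (80, 0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(reduce_training_set(samples, labels, k=2))
```

Only the retained indices would then be passed to an SVM trainer. This brute-force version compares every pair of samples, so a real system would need an indexed neighbor search for the reduction step itself to stay cheap.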
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.