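Study on Automatic Categorizing Method of Chinese Information for World Wide Web (WWW中文信息自动分类方法研究)

Cite this article:	Zheng Jiaheng, Song Wenzhong. WWW中文信息自动分类方法研究 [J]. 情报学报 (Journal of the China Society for Scientific and Technical Information), 2002, 21(5): 532-536.

Authors:	Zheng Jiaheng (郑家恒), Song Wenzhong (宋文中)

Affiliation:	Department of Computer Science, Shanxi University, Taiyuan 030006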

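Abstract:	This paper adopts a word-based categorization technique. In computing the specificity of category words, it considers factors such as each category word's frequency, concentration, and distribution in the corpus. Based on the tag features of HTML, a three-dimensional weighted classification algorithm is applied to compute category weights. A variant of the Bayes formula is used to compute the categorization confidence of each WWW Chinese document, and the document is assigned to the category with the highest confidence. In tests on a corpus of 108 documents, categorization accuracy was 98.1% in the closed test and 83.3% in the open test.
Keywords:	automatic categorization of WWW Chinese information; automatic text categorization; category word
Revised:	September 3, 2001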
Study on Automatic Categorizing Method of Chinese Information for World Wide Web

Zheng Jiaheng and Song Wenzhong. Study on Automatic Categorizing Method of Chinese Information for World Wide Web [J]. Journal of the China Society for Scientific and Technical Information, 2002, 21(5): 532-536.

Authors:	Zheng Jiaheng and Song Wenzhong

Abstract:	This paper adopts a word-based categorization technique. The specificity of a category word is determined from its frequency, concentration, and distribution in the corpus, together with the size of each corpus. The paper analyzes HTML tags and discusses a three-dimensional weighted algorithm, based on frequency, location, and specificity, for calculating the classification weight. Reliability is calculated with a variant of the Bayes formula, and each document is assigned to the category with the highest reliability. Closed and open tests were run on the experimental system: categorization accuracy was 98.1% in the closed test and 83.3% in the open test.

Keywords:	WWW Chinese information automatic categorization; text automatic categorization; category-word
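
The weighting scheme summarized in the abstract (frequency, location, and specificity combined into a category weight, then a confidence used to pick the category) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the function name `categorize`, the `TAG_WEIGHTS` values, the specificity scores, and the simple normalization that stands in for the paper's Bayes-formula variant.

```python
from collections import defaultdict

# Hypothetical location weights per HTML tag; the paper's values are not given.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}


def categorize(tokens, specificity):
    """Assign a page to the category with the highest confidence.

    tokens: list of (word, tag) pairs from a segmented HTML page.
    specificity: {category: {category_word: specificity score}},
        assumed to be precomputed from a training corpus.
    Returns (best_category, confidence_by_category).
    """
    weights = defaultdict(float)
    for word, tag in tokens:
        for category, words in specificity.items():
            if word in words:
                # Each occurrence adds location weight x specificity, so
                # frequency enters through repeated occurrences.
                weights[category] += TAG_WEIGHTS.get(tag, 1.0) * words[word]
    total = sum(weights.values())
    if total == 0:
        return None, {}
    # Normalizing to sum to 1 is an assumed stand-in for the paper's
    # Bayes-formula variant, which the abstract does not spell out.
    confidence = {c: w / total for c, w in weights.items()}
    best = max(confidence, key=confidence.get)
    return best, confidence


# Illustrative data only (not from the paper).
specificity = {
    "sports": {"football": 0.9, "match": 0.5},
    "finance": {"stock": 0.9, "match": 0.1},
}
tokens = [("football", "title"), ("match", "body"), ("stock", "body")]
best, confidence = categorize(tokens, specificity)
```

In this toy input the word in the title dominates, so `best` comes out as `"sports"`. Unknown tags fall back to a weight of 1.0, which is also an assumption rather than something the paper states.
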
This article is indexed in CNKI, Wanfang Data, and other databases.
