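Research and Realization of a Web Information Extraction and Knowledge Presentation System
Cite this article: TAN Shou-Biao, XU Chao, JIANG Yuan, NING Ren-Xia. Research and Realization of a Web Information Extraction and Knowledge Presentation System [J]. Computer Systems & Applications (计算机系统应用), 2010, 19(9): 1-4.
Authors: TAN Shou-Biao, XU Chao, JIANG Yuan, NING Ren-Xia
Affiliations: 1. School of Electronic Science and Technology, Anhui University, Hefei, Anhui 230039
2. Department of Electronic Information Engineering, Huangshan University, Huangshan, Anhui 245021
Funding: Natural Science Foundation of the Anhui Provincial Department of Education (2005KJ004ZD)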
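Abstract: This paper studies how to automatically extract structured data from data-intensive Web pages and build a knowledge representation system from the result. Dynamic pages are retrieved with the help of a knowledge database, preprocessed, and converted into XML documents. A PAT-array-based pattern discovery algorithm then finds repeated patterns automatically; combined with an ontology-based keyword library, these patterns are used to identify each page's data display structure model. The data are stored in the knowledge database through XML object-relational mapping, which completes the automatic extraction of Web data. The system also uses the knowledge already in the database to extract new knowledge from the Internet, so the knowledge database extends itself. Experiments with a system for automatic traffic information extraction and for generating and presenting mixed-mode travel plans show that the system achieves high extraction accuracy and adapts well.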
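
The repeated-pattern step can be illustrated with a small sketch. The abstract does not give the paper's actual algorithm, so the code below is only an assumption-laden illustration: it treats a page as a list of tag tokens, builds a naive PAT array (the suffix start positions in sorted order), and reports every token run of a fixed length that occurs at least `min_count` times. The function names, the fixed `length` parameter, and the tag-token representation are all hypothetical.

```python
from itertools import groupby


def build_pat_array(tokens):
    """Return suffix start positions, ordered by the suffix each one begins.

    This is a naive O(n^2 log n) construction, chosen for clarity only.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def repeated_patterns(tokens, length, min_count=2):
    """Map each token run of `length` tokens to its sorted start positions.

    In a sorted suffix list, all suffixes that share the same first
    `length` tokens sit next to each other, so one groupby pass over the
    PAT array is enough to collect every occurrence of a pattern.
    """
    pat = build_pat_array(tokens)

    def prefix(start):
        return tuple(tokens[start:start + length])

    result = {}
    for key, group in groupby(pat, key=prefix):
        starts = sorted(group)
        # Skip suffixes too short to hold a full pattern, and rare patterns.
        if len(key) == length and len(starts) >= min_count:
            result[key] = starts
    return result
```

For a token list such as `["table", "tr", "td", "td", "/tr", "tr", "td", "td", "/tr", "/table"]`, calling `repeated_patterns(tokens, 4)` groups the two `tr td td /tr` rows under one key, which is the kind of repeated record structure the extraction step looks for. A real system would also need to choose the pattern length automatically and filter overlapping candidates.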

Keywords: Web information extraction; knowledge representation; data-intensive Web pages; ontology-based keyword library
Received: 2010-01-06
Revised: 2010-02-26

Research and Realization of a Web Information Extraction and Knowledge Presentation System
TAN Shou-Biao, XU Chao, JIANG Yuan and NING Ren-Xia. Research and Realization of a Web Information Extraction and Knowledge Presentation System [J]. Computer Systems & Applications, 2010, 19(9): 1-4.
Authors: TAN Shou-Biao, XU Chao, JIANG Yuan and NING Ren-Xia
Affiliation: TAN Shou-Biao, XU Chao, JIANG Yuan (School of Electronic Science and Technology, Anhui University, Hefei 230039, China); NING Ren-Xia (Electronic Information Engineering, Huangshan University, Huangshan 245021, China)
Abstract: A Web information extraction and knowledge presentation system is proposed for extracting information from data-intensive web pages. Guided by a knowledge database, the system downloads dynamic web pages, preprocesses them, and converts them to XML documents. It then finds repeated patterns in these documents with a PAT-array-based pattern discovery algorithm and, using those patterns together with an ontology-based keyword library, automatically recognizes each page's data display structure model. Finally, it extracts the data and stores them in the knowledge database using XML object-relational mapping. In this way web data are extracted automatically, and the knowledge database is expanded automatically as well. Experiments on a system for automatic traffic information extraction and automatic creation of mixed-traffic travel schemes showed that the approach has high precision and adapts to web pages from different domains with different structures.
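
The final storage step, mapping extracted XML records into a relational store, might look roughly like the sketch below. It is not the paper's mapping: the `route` element, its `origin`, `destination`, and `departs` fields, the sample values, and the single-table schema are all hypothetical, and SQLite stands in for whatever knowledge database the authors used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical extraction output: one <route> element per extracted record.
SAMPLE_XML = """
<routes>
  <route><origin>Hefei</origin><destination>Huangshan</destination><departs>08:30</departs></route>
  <route><origin>Hefei</origin><destination>Nanjing</destination><departs>09:15</departs></route>
</routes>
"""

# Child elements of <route> that map one-to-one onto table columns.
FIELDS = ("origin", "destination", "departs")


def store_records(xml_text, conn):
    """Insert every <route> element as one row; return the number of rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS route "
        "(origin TEXT, destination TEXT, departs TEXT)"
    )
    root = ET.fromstring(xml_text.strip())
    rows = [
        # A missing child element becomes an empty string, not an error.
        tuple(record.findtext(field, default="") for field in FIELDS)
        for record in root.iter("route")
    ]
    conn.executemany("INSERT INTO route VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Calling `store_records(SAMPLE_XML, sqlite3.connect(":memory:"))` loads both sample records into an in-memory table. Mapping each repeated XML element to one row keeps the sketch short; nested or optional structures would need additional tables or columns.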
Keywords: web information extraction; knowledge presentation; data-intensive web pages; ontology-based keyword library
This article is indexed in the VIP (维普) and Wanfang Data (万方数据) databases, among others.