Web Information Extraction Based on Visual Features and Domain Ontology

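Cite this article:	ZHANG Xin, CHEN Mei, WANG Han-hu, WANG Yan-ran. Web Information Extraction Based on Visual Features and Domain Ontology[J]. Microcomputer Development, 2011(2): 58-61, 65.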

Authors:	ZHANG Xin, CHEN Mei, WANG Han-hu, WANG Yan-ran

Affiliation:	College of Computer Science and Information, Guizhou University

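Funding:	Guizhou Province 2008 Provincial Informatization Special Fund Project (0830); Guizhou Province Science and Technology Plan Industrial Research Fund Project (黔科合GY字[2008]3035)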

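Abstract:	To address the automatic extraction of information from Web pages, this paper proposes a Web information extraction algorithm based on visual features and domain ontology. Building on domain ontology-based information extraction, the algorithm uses the visual features of a Web page to accurately delimit the extraction region, then combines DOM tree techniques with heuristic learning of extraction paths to obtain the extraction path of each information item in the page. The domain ontology of the information items is generated automatically from their extraction paths, and the extraction rules for the items are then parsed from that ontology. According to the authors, Web information extraction with this algorithm achieves high recall and precision, low time complexity, a light user burden, and a high degree of automation.
Keywords:	visual features; domain ontology; Web information extraction; path learning; heuristic learning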
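
The path-learning step named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the extraction region has already been isolated (the visual segmentation step is not modelled), it stands in a plain dictionary for the domain ontology, and the names `PathCollector`, `learn_paths`, and `extract` are hypothetical. Given one sample page with known field values, it records the DOM path of each value and reuses those paths on a page with the same layout.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they are skipped when building paths.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Collect (path, text) for every non-empty text node.

    A path is a tuple of (tag, position among same-tag siblings) steps
    from the document root down to the element holding the text.
    """

    def __init__(self):
        super().__init__()
        self.stack = []      # current path as a list of (tag, index)
        self.counts = [{}]   # one sibling counter per open element
        self.nodes = []      # collected (path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        seen = self.counts[-1]
        seen[tag] = seen.get(tag, 0) + 1
        self.stack.append((tag, seen[tag]))
        self.counts.append({})

    def handle_endtag(self, tag):
        # Only well-nested closing tags are honoured in this sketch.
        if tag not in VOID_TAGS and self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
            self.counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self.stack), text))


def text_nodes(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


def learn_paths(sample_html, labelled_values):
    """Map each field to the path of the first node whose text equals its sample value."""
    rules = {}
    for path, text in text_nodes(sample_html):
        for field, value in labelled_values.items():
            if text == value and field not in rules:
                rules[field] = path
    return rules


def extract(html, rules):
    """Apply learned paths to another page; missing paths yield None."""
    by_path = {}
    for path, text in text_nodes(html):
        by_path.setdefault(path, text)
    return {field: by_path.get(path) for field, path in rules.items()}


# Hypothetical sample pages sharing one layout.
SAMPLE = ("<html><body><div><h1>Widget A</h1>"
          "<p><span>Price</span><b>12.50</b></p></div></body></html>")
NEW_PAGE = ("<html><body><div><h1>Widget B</h1>"
            "<p><span>Price</span><b>7.25</b></p></div></body></html>")

rules = learn_paths(SAMPLE, {"name": "Widget A", "price": "12.50"})
result = extract(NEW_PAGE, rules)
print(result)
```

Exact positional paths like these break as soon as the page layout shifts, which is presumably why the abstract pairs path learning with heuristics and an ontology layer; the sketch only shows the basic idea of deriving a reusable path from a labelled sample.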
Visual Features and Domain Ontology-Based Web Information Extraction

ZHANG Xin, CHEN Mei, WANG Han-hu, WANG Yan-ran. Visual Features and Domain Ontology-Based Web Information Extraction[J]. Microcomputer Development, 2011(2): 58-61, 65.

Authors:	ZHANG Xin, CHEN Mei, WANG Han-hu, WANG Yan-ran

Affiliation:	College of Computer Science and Information, Guizhou University, Guiyang 550025, China

Abstract:	This paper puts forward a Web information extraction algorithm based on visual features and domain ontology to solve the problem of automatic Web information extraction. Building on domain ontology-based Web page information extraction, the algorithm uses the visual characteristics of the sample Web page to accurately delineate the information extraction area, and obtains the extraction paths of the page's information items by combining DOM tree technology with heuristic learning of extraction paths. Through the do...

Keywords:	visual features; domain ontology; Web information extraction; path learning; discovery learning
This article is indexed in CNKI, VIP, and other databases.
