
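Web Table Information Extraction Method Based on Tree Structure
Cite this article: SUN Quan-hong, ZHANG Zhen-zhen. Web Table Information Extraction Method Based on Tree Structure[J]. Journal of North China Institute of Water Conservancy and Hydroelectric Power, 2011, 32(3): 108-110.
Authors: SUN Quan-hong, ZHANG Zhen-zhen
Affiliation: North China Institute of Water Conservancy and Hydroelectric Power, Zhengzhou, Henan 450011
Funding: Science and Technology Research Project of the Henan Provincial Department of Education
Abstract: To address the varying limitations of existing information extraction methods in China and abroad, this paper proposes a Web table information extraction method based on DOM tree and binary tree structures. The method provides a Web table information extraction tool that takes Web tables as the extraction target and supports a choice of extraction modes. The tool parses an Html document into a DOM tree, then builds the DOM tree into a binary tree that carries text information, and finally extracts the Web table information by traversing the binary tree...

Keywords: table information; Html document; DOM tree; binary tree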

Information Extraction Method over Web Tables Based on Tree
SUN Quan-hong,ZHANG Zhen-zhen.Information Extraction Method over Web Tables Based on Tree[J].Journal of North China Institute of Water Conservancy and Hydroelectric Power,2011,32(3):108-110.
Authors:SUN Quan-hong  ZHANG Zhen-zhen
Affiliation: (North China Institute of Water Conservancy and Hydroelectric Power, Zhengzhou 450011, China)
Abstract: To address the limitations that existing information extraction methods, both domestic and international, exhibit to varying degrees, an information extraction method for Web tables based on the DOM tree and the binary tree is proposed. The method provides a Web-table information extraction tool that takes Web tables as its extraction objects and supports a choice of extraction modes. The tool parses an Html document into a DOM tree, converts the DOM tree into a binary tree that carries the text content, and finally extracts the Web-table information by traversing the binary tree.
Keywords:table information  Html document  DOM tree  binary tree
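
The three stages named in the abstract (parse the Html document into a DOM tree, convert the DOM tree into a binary tree that carries text, traverse the binary tree to extract table content) can be illustrated with a minimal sketch. This is not the authors' tool: the abstract does not say how the binary tree is built, so the sketch assumes the common left-child/right-sibling encoding, and it uses Python's standard `html.parser` in place of whatever parser the paper used. The names `DomNode`, `BinNode`, `DomBuilder`, `to_binary`, `collect_text`, and `extract_rows` are hypothetical, and the input is assumed to be reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class DomNode:
    """A DOM element: tag name, directly contained text, ordered children."""
    def __init__(self, tag):
        self.tag = tag
        self.text = ""
        self.children = []


class BinNode:
    """Binary-tree node (assumed encoding): left = first child, right = next sibling."""
    def __init__(self, tag, text):
        self.tag = tag
        self.text = text
        self.left = None
        self.right = None


class DomBuilder(HTMLParser):
    """Builds a simple DOM tree; assumes non-void tags are explicitly closed."""
    def __init__(self):
        super().__init__()
        self.root = DomNode("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = DomNode(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the currently open element.
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].text += data.strip()


def to_binary(dom):
    """Convert a DOM subtree into a left-child/right-sibling binary tree."""
    node = BinNode(dom.tag, dom.text)
    prev = None
    for child in dom.children:
        converted = to_binary(child)
        if prev is None:
            node.left = converted
        else:
            prev.right = converted
        prev = converted
    return node


def collect_text(node):
    """Join the text of a node and of all its descendants (its left subtree)."""
    parts = [node.text]
    child = node.left
    while child is not None:
        parts.append(collect_text(child))
        child = child.right
    return " ".join(p for p in parts if p)


def extract_rows(node, rows=None):
    """Preorder traversal that records one list of cell texts per <tr>."""
    if rows is None:
        rows = []
    if node is None:
        return rows
    if node.tag == "tr":
        cells = []
        cell = node.left
        while cell is not None:
            if cell.tag in ("td", "th"):
                cells.append(collect_text(cell))
            cell = cell.right
        rows.append(cells)
    extract_rows(node.left, rows)
    extract_rows(node.right, rows)
    return rows


html = ("<table><tr><th>Name</th><th>Year</th></tr>"
        "<tr><td>DOM</td><td><b>1998</b></td></tr></table>")
builder = DomBuilder()
builder.feed(html)
rows = extract_rows(to_binary(builder.root))
print(rows)  # [['Name', 'Year'], ['DOM', '1998']]
```

In this encoding every sibling list becomes a right-hand chain, so a row's cells are reached by following `left` once and then `right` repeatedly. The traversal is recursive for brevity; very long sibling chains would call for an iterative version.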
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.