
Design and Implementation of an Information Acquisition Subsystem in a Digital Library

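Cite this article:	Li Guilin, Li Jianzhong, Yang Yan. Design and Implementation of an Information Acquisition Subsystem in a Digital Library [J]. Computer Engineering and Applications, 2004, 40(2): 229-232.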

Authors:	Li Guilin, Li Jianzhong, Yang Yan

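Affiliations:	1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; 2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; School of Computer Science and Technology, Heilongjiang University, Harbin 150080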

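Abstract:	Extracting information from electronic documents is the foundation for building a digital library. This paper presents the information acquisition subsystem of a digital library. The subsystem takes PDF files as its extraction target and consists of two parts: fully automatic entry and semi-automatic entry. Fully automatic entry combines the strengths of rule-based and automaton-based extraction methods and is fast and accurate. Semi-automatic entry adds a plug-in to Adobe Acrobat to give users a friendly interface, so that they can conveniently enter information by hand.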
Keywords:	PDF; rule-based; information extraction; plug-in
Article ID:	1002-8331-(2004)02-0229-04
The Implementation of Information Extraction Subsystem in the DL

Li Guilin, Li Jianzhong, Yang Yan. The Implementation of Information Extraction Subsystem in the DL [J]. Computer Engineering and Applications, 2004, 40(2): 229-232.

Authors:	Li Guilin, Li Jianzhong, Yang Yan

Affiliation:	Li Guilin 1, Li Jianzhong 1,2, Yang Yan 1,2

Abstract:	Information extraction from electronic documents is the foundation for building a digital library. This article introduces an information extraction system that takes PDF files as its input and consists of two parts: an automatic part and a semi-automatic part. The former combines rule-based and automaton-based methods, which makes it fast and accurate. The latter adds a plug-in to Adobe Acrobat to make manual entry as easy as possible for the user.

Keywords:	PDF; rule-based; information extraction; plug-in
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
