Research on Automatic Extraction of JPEG Images from PDF Documents

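Cite this article:	CHEN Yun-rong, LIU Li-zhu, YE Han. Research on Automatic Extraction of JPEG Images from PDF Documents[J]. Journal of Information Engineering University (信息工程大学学报), 2007, 8(2): 213-216

Authors:	CHEN Yun-rong, LIU Li-zhu, YE Han

Affiliation:	Institute of Information Engineering, Information Engineering University, Zhengzhou, Henan 450002, China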

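Abstract:	Processing PDF documents often involves extracting text and images. Based on an in-depth analysis of the PDF format and the various compression algorithms it uses, this article implements automatic extraction of JPEG images from PDF documents. The algorithm combines the PDF document structure with the page tree to locate precisely the image objects contained in each page, in the order in which the images appear on the page, and extracts their compressed image data in turn. It then filters the compressed data according to the compression algorithm used and finally obtains images saved in JPEG format. The algorithm works well on all types of PDF documents, however they were generated.
Keywords:	PDF document; JPEG image; document structure; page tree; compression algorithm
Article ID:	1671-0673(2007)02-0213-04
Revision dates:	2006-11-14; 2007-03-29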
Automatically Extracting Images of JPEG Format from PDF Documents

CHEN Yun-rong, LIU Li-zhu, YE Han. Automatically Extracting Images of JPEG Format from PDF Documents[J]. Journal of Information Engineering University, 2007, 8(2): 213-216

Authors:	CHEN Yun-rong, LIU Li-zhu, YE Han

Affiliation:	Institute of Information Engineering, Information Engineering University, Zhengzhou 450002, China

Abstract:	Processing PDF documents often involves extracting text and images. After analyzing the PDF format and the compression algorithms used in PDF files, this article presents a method for automatically extracting JPEG images from PDF documents. Using the document structure and the page tree, the method locates the compressed image data exactly and extracts it in page order. The compressed data are then filtered to keep the streams that use DCT compression, and these are saved in JPEG format. The method can be applied to all sorts of PDF files.

Keywords:	PDF document; JPEG images; document structure; page tree; compression algorithm
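
The pipeline outlined in the abstract (walk the page tree, locate image objects page by page, keep only DCT-compressed streams) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the PDF has already been parsed into a hypothetical in-memory object table (`toy` below), so cross-reference parsing, indirect references, inherited `/Resources`, and ordering by the page's content stream are all left out. The function names `iter_pages`, `filters_of`, and `extract_jpegs` are invented for illustration. Only the key names (`Type`, `Kids`, `Resources`, `XObject`, `Subtype`, `Filter`, `DCTDecode`) come from the PDF format itself.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical, already-parsed object table: object number -> dictionary.
# A real PDF needs a parser to build this; that step is not shown here.
Objects = Dict[int, dict]


def iter_pages(objects: Objects, node_id: int) -> Iterator[int]:
    """Yield leaf page object numbers in document order (depth-first over /Kids)."""
    node = objects[node_id]
    if node["Type"] == "Page":
        yield node_id
        return
    for kid_id in node.get("Kids", []):
        yield from iter_pages(objects, kid_id)


def filters_of(image: dict) -> List[str]:
    """/Filter may be a single name or an array of names; normalise to a list."""
    f = image.get("Filter", [])
    return [f] if isinstance(f, str) else list(f)


def extract_jpegs(objects: Objects, root_id: int) -> List[Tuple[int, str, bytes]]:
    """Return (page_number, xobject_name, jpeg_bytes) for DCT-only image streams."""
    found = []
    for page_no, page_id in enumerate(iter_pages(objects, root_id), start=1):
        # Simplification: images are taken in resource-dictionary order,
        # not in the order the page's content stream paints them.
        xobjects = objects[page_id].get("Resources", {}).get("XObject", {})
        for name, obj_id in xobjects.items():
            obj = objects[obj_id]
            if obj.get("Subtype") != "Image":
                continue  # skip non-image XObjects such as forms
            # A stream whose only filter is DCTDecode already holds JPEG data,
            # so its raw bytes can be written to a .jpg file unchanged.
            if filters_of(obj) == ["DCTDecode"]:
                found.append((page_no, name, obj["stream"]))
    return found


# Toy document: two pages, one of them nested under an intermediate /Pages node.
toy: Objects = {
    1: {"Type": "Pages", "Kids": [2, 3]},
    2: {"Type": "Page", "Resources": {"XObject": {"Im0": 10, "Im1": 11}}},
    3: {"Type": "Pages", "Kids": [4]},
    4: {"Type": "Page", "Resources": {"XObject": {"Im0": 12}}},
    10: {"Subtype": "Image", "Filter": "DCTDecode", "stream": b"\xff\xd8A\xff\xd9"},
    11: {"Subtype": "Image", "Filter": "FlateDecode", "stream": b"zlib-bytes"},
    12: {"Subtype": "Image", "Filter": ["DCTDecode"], "stream": b"\xff\xd8B\xff\xd9"},
}

for page_no, name, data in extract_jpegs(toy, 1):
    print(page_no, name, len(data))
```

On the toy table, the Flate-compressed image on page 1 is skipped and the two DCT-compressed streams are returned in page order. Streams with other filters, or with a filter chain in front of `DCTDecode`, would need decoding before they could be saved as JPEG files.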
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.