
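A Text Data Compression Algorithm Based on Arithmetic Coding
Cite this article: LI Ying, CUI Yanpeng, GAO Xinbo. A text data compression algorithm based on arithmetic coding[J]. Journal of University of Electronic Science and Technology of China (Natural Science Edition), 2016, 45(6): 929-933.
Authors: LI Ying, CUI Yanpeng, GAO Xinbo
Affiliation: 1. School of Electronic Engineering, Xidian University, Xi'an 710071
Funding: National Natural Science Foundation of China, grant 61571354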
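Abstract: A text data compression algorithm based on arithmetic coding is proposed. The global optimization problem over the offsets and match lengths produced by scanning is converted into a local optimization problem, and a parameter selection algorithm is derived from the idea of Golomb coding. The LZ77 algorithm is modified with a predictive coding method that yields prediction parameters. The prediction parameters, offsets, match lengths, and retained text data are encoded with an MQ arithmetic coder, with a different coding algorithm and corresponding context algorithm designed for each type of data. The algorithm is simulated and its compression efficiency is compared with WinZip and WinRAR. The results show that it outperforms WinZip on plain text, Word documents, C source code, and image data. On plain text, Word documents, and C source code it is comparable to or slightly better than WinRAR, but on image data it is slightly worse than WinRAR.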

Keywords: arithmetic coding; parameter optimization; predictive coding; text data compression
Received: 2015-08-24
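
The abstract refers to the offsets, match lengths, and retained text data that an LZ77-style scan produces. As a rough illustration of those three streams, here is a minimal sketch of textbook greedy LZ77 tokenization. It is not the authors' modified scan: the prediction step is omitted, and the names `lz77_tokens` and `lz77_decode` and the parameters `window`, `min_match`, and `max_match` are assumptions made for this example.

```python
from typing import List, Tuple, Union

# A token is either a (offset, length) match or a single literal byte value.
Token = Union[Tuple[int, int], int]


def lz77_tokens(data: bytes, window: int = 4096,
                min_match: int = 3, max_match: int = 258) -> List[Token]:
    """Greedy textbook LZ77 scan (illustrative, brute-force search)."""
    tokens: List[Token] = []
    i, n = 0, len(data)
    while i < n:
        best_len, best_off = 0, 0
        # Try every start position inside the sliding window.
        for j in range(max(0, i - window), i):
            length = 0
            # Matches may overlap the current position, as in classic LZ77.
            while (length < max_match and i + length < n
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append((best_off, best_len))  # offset and match length
            i += best_len
        else:
            tokens.append(data[i])  # retained (literal) byte
            i += 1
    return tokens


def lz77_decode(tokens: List[Token]) -> bytes:
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            off, length = tok
            for _ in range(length):
                out.append(out[-off])  # byte-wise copy handles overlap
        else:
            out.append(tok)
    return bytes(out)
```

In a scheme like the one described, each of these streams would then be passed to an entropy coder with its own context model. That stage is not sketched here because the abstract does not specify the contexts.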

A Novel Algorithm for Text Data Compression Based on Arithmetic Codec
Affiliation: 1. School of Electronic Engineering, Xidian University, Xi'an 710071; 2. Institute for Internet Behavior, Xidian University, Xi'an 710071
Abstract: A text data compression algorithm based on arithmetic coding is proposed. The global optimization of the offsets and match lengths produced by scanning is converted into a local optimization problem, and a parameter selection method is derived from the principle of Golomb coding. The LZ77 scanning algorithm is modified with a predictive coding step that yields prediction parameters. The prediction parameters, offsets, match lengths, and retained text data are encoded with an MQ arithmetic coder, with a separate coding algorithm and context design for each data type. The proposed algorithm is simulated and compared with WinZip and WinRAR. The results show that it compresses plain text, Word documents, C source code, and image data better than WinZip. Compared with WinRAR, it is comparable or slightly better on plain text, Word documents, and C source code, but slightly worse on image data.
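
The abstract says a parameter selection method is derived from the principle of Golomb coding but does not give the rule. As a hedged illustration of what choosing a parameter locally can look like, the sketch below uses Rice codes (Golomb codes whose divisor is a power of two, 2^k) and picks the k that minimizes the total code length of one block of values. The exhaustive search and the names `rice_code_length`, `best_rice_k`, `rice_encode`, and `rice_decode` are assumptions for this example, not the method derived in the paper.

```python
from typing import Sequence


def rice_code_length(value: int, k: int) -> int:
    """Bits used by a Rice code: unary quotient, stop bit, k remainder bits."""
    return (value >> k) + 1 + k


def best_rice_k(values: Sequence[int], max_k: int = 16) -> int:
    """Pick the k with the smallest total code length for this block."""
    return min(range(max_k + 1),
               key=lambda k: sum(rice_code_length(v, k) for v in values))


def rice_encode(value: int, k: int) -> str:
    """Encode a non-negative integer as a string of '0'/'1' characters."""
    bits = "1" * (value >> k) + "0"  # unary quotient, then stop bit
    if k:
        bits += format(value & ((1 << k) - 1), f"0{k}b")  # k-bit remainder
    return bits


def rice_decode(bits: str, k: int) -> int:
    """Decode one Rice codeword that starts at the beginning of bits."""
    q = bits.index("0")  # number of leading ones is the quotient
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r
```

Choosing k per block rather than once for the whole file is one way to turn a global choice into a local one, which is the general idea the abstract describes.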
Keywords: arithmetic coding; parameter optimization; predictive coding; text data compression