New adaptive compressors for natural language text
Authors: N R Brisaboa, A Fariña, G Navarro, J R Parama
Affiliation: 1. Database Laboratory, Department of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071, A Coruña, Spain; 2. Center for Web Research, Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile
Abstract: Semistatic byte‐oriented word‐based compression codes have been shown to be an attractive alternative to compress natural language text databases, because of the combination of speed, effectiveness, and direct searchability they offer. In particular, our recently proposed family of dense compression codes has been shown to be superior to the more traditional byte‐oriented word‐based Huffman codes in most aspects. In this paper, we focus on the problem of transmitting texts among peers that do not share the vocabulary. This is the typical scenario for adaptive compression methods. We design adaptive variants of our semistatic dense codes, showing that they are much simpler and faster than dynamic Huffman codes and reach almost the same compression effectiveness. We show that our variants have a very compelling trade‐off between compression/decompression speed, compression ratio, and search speed compared with most of the state‐of‐the‐art general compressors. Copyright © 2008 John Wiley & Sons, Ltd.
Keywords: text databases; natural language text compression; dynamic compression; searching compressed text