CVA file: an index structure for high-dimensional datasets
Authors:Jiyuan An  Hanxiong Chen  Kazutaka Furuse  Nobuo Ohbo
Affiliation:(1) Doctoral Program in Engineering, University of Tsukuba, Ibaraki, Japan;(2) Institute of Information Sciences and Electronics, University of Tsukuba, Ibaraki, Japan;(3) Centre for Information Technology Innovation, Queensland University of Technology, 126 Margaret Street GPO Box 2434, Brisbane, Australia
Abstract:Similarity search is important in information-retrieval applications, where objects are usually represented as vectors of high dimensionality. This paper proposes a new dimensionality-reduction technique and an indexing mechanism for high-dimensional datasets. For each data vector, the proposed technique drops the dimensions in which that vector's coordinates are less than a critical value. This flexible datawise dimensionality reduction improves indexing mechanisms for high-dimensional datasets whose distributions are skewed in all coordinates. To apply the proposed technique to information retrieval, a CVA file (compact VA file), which is a revised version of the VA file, is developed. With a CVA file, the size of the index file is reduced further, while the index bounds are kept as tight as possible. The effectiveness of the approach is confirmed by experiments on synthetic and real data.
Keywords:Information retrieval  High-dimensional data  Spatial index  Local dimensionality reduction  Zipf's law  CVA file
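
The datawise reduction described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes non-negative coordinates and queries, stores the kept coordinates exactly (no VA-file-style quantization), and uses the hypothetical helper names `compact` and `distance_bounds`. Each vector keeps only its coordinates at or above the critical value; every dropped coordinate is known to lie in `[0, critical)`, which still yields lower and upper bounds on the Euclidean distance to a query.

```python
import math


def compact(vector, critical):
    """Keep only coordinates >= critical, stored as {dimension: value}."""
    return {i: x for i, x in enumerate(vector) if x >= critical}


def distance_bounds(query, kept, critical):
    """Bound the Euclidean distance between query and the original vector.

    Assumes all coordinates are non-negative, so a dropped coordinate
    lies somewhere in the interval [0, critical).
    """
    lo = hi = 0.0
    for i, q in enumerate(query):
        if i in kept:
            # Kept coordinate: its contribution is known exactly.
            d = (q - kept[i]) ** 2
            lo += d
            hi += d
        else:
            # Dropped coordinate: nearest and farthest points of [0, critical].
            lo += max(0.0, q - critical) ** 2
            hi += max(q, critical - q) ** 2
    return math.sqrt(lo), math.sqrt(hi)


# Illustrative data only (not taken from the paper).
vector = [0.90, 0.02, 0.00, 0.40, 0.01]
query = [0.80, 0.10, 0.30, 0.00, 0.05]

kept = compact(vector, critical=0.05)
lower, upper = distance_bounds(query, kept, critical=0.05)
exact = math.dist(query, vector)
print(kept, lower, exact, upper)
```

In this sketch a skewed vector with many near-zero coordinates shrinks to a few (dimension, value) pairs, and the width of the gap between the two bounds depends on how small the critical value is.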
This article is indexed in SpringerLink and other databases.