This paper concerns the following problem: given a set of multi-attribute records, a fixed number of buckets and a two-disk system, arrange the records into the buckets and then store the buckets between the disks in such a way that, over all possible orthogonal range queries (ORQs), the disk access concurrency is maximized. We shall adopt the multiple key hashing (MKH) method for arranging records into buckets and use the disk modulo (DM) allocation method for storing buckets onto disks. Since the DM allocation method has been shown to be superior to any other allocation methods for allocating an MKH file onto a two-disk system for answering ORQs, the real issue is knowing how to determine an optimal way for organizing the records into buckets based upon the MKH concept.
A performance formula that can be used to evaluate the average response time, over all possible ORQs, of an MKH file in a two-disk system using the DM allocation method is first presented. Based upon this formula, it is shown that our design problem is related to a notoriously difficult problem, namely the Prime Number Problem. Then a performance lower bound and an efficient algorithm for designing optimal MKH files in certain cases are presented. It is pointed out that in some cases the optimal MKH file for ORQs in a two-disk system using the DM allocation method is identical to the optimal MKH file for ORQs in a single-disk system and the optimal average response time in a two-disk system is slightly greater than one half of that in a single-disk system. 相似文献
Consider the problem of efficiently simulating the shared-memory parallel random access machine (PRAM) model on massively parallel architectures with physically distributed memory. To prevent network congestion and memory bank contention, it may be advantageous to hash the shared memory address space. The decision on whether or not to use hashing depends on (1) the communication latency in the network and (2) the locality of memory accesses in the algorithm.We relate this decision directly to algorithmic issues by studying the complexity of hashing in the Block PRAM model of Aggarwal, Chandra, and Snir, a shared-memory model of parallel computation which accounts for communication locality. For this model, we exhibit a universal family of hash functions having optimal locality. The complexity of applying these hash functions to the shared address space of the Block PRAM (i.e., by permuting data elements) is asymptotically equivalent to the complexity of performing a square matrix transpose, and this result is best possible for all pairwise independent universal hash families. These complexity bounds provide theoretical evidence that hashing and randomized routing need not destroy communication locality, addressing an open question of Valiant.This work was started when the author was a student at Oxford University, supported by a National Science Foundation Graduate Fellowship and a Rhodes Scholarship. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation or the Rhodes Trust. 相似文献
Multimedia-based hashing is considered an important technique for achieving authentication and copy detection in digital contents. However, 3D model hashing has not been as widely used as image or video hashing. In this study, we develop a robust 3D mesh-model hashing scheme based on a heat kernel signature (HKS) that can describe a multi-scale shape curve and is robust against isometric modifications. We further discuss the robustness, uniqueness, security, and spaciousness of the method for 3D model hashing. In the proposed hashing scheme, we calculate the local and global HKS coefficients of vertices through time scales and 2D cell coefficients by clustering HKS coefficients with variable bin sizes based on an estimated L2 risk function, and generate the binary hash through binarization of the intermediate hash values by combining the cell values and the random values. In addition, we use two parameters, bin center points and cell amplitudes, which are obtained through an iterative refinement process, to improve the robustness, uniqueness, security, and spaciousness further, and combine them in a hash with a key. By evaluating the robustness, uniqueness, and spaciousness experimentally, and through a security analysis based on the differential entropy, we verify that our hashing scheme outperforms conventional hashing schemes. 相似文献
The problem of efficiently finding similar items in a large corpus of high-dimensional data points arises in many real-world
tasks, such as music, image, and video retrieval. Beyond the scaling difficulties that arise with lookups in large data sets,
the complexity in these domains is exacerbated by an imprecise definition of similarity. In this paper, we describe a method
to learn a similarity function from only weakly labeled positive examples. Once learned, this similarity function is used
as the basis of a hash function to severely constrain the number of points considered for each lookup. Tested on a large real-world
audio dataset, only a tiny fraction of the points (~0.27%) are ever considered for each lookup. To increase efficiency, no
comparisons in the original high-dimensional space of points are required. The performance far surpasses, in terms of both
efficiency and accuracy, a state-of-the-art Locality-Sensitive-Hashing-based (LSH) technique for the same problem and data
set. 相似文献
The sign languages used by deaf communities around the world represent a linguistic challenge that natural-language researchers in AI have only recently begun to take up. This challenge is particularly relevant to research in Machine Translation (MT), as natural sign languages have evolved in deaf communities into efficient modes of gestural communication, which differ from English not only in modality but in grammatical structure, exploiting a higher dimensionality of spatial expression. In this paper we describe Zardoz, an on-going AI research system that tackles the cross-modal MT problem, translating English text into fluid sign language. The paper presents an architectural overview of Zardoz, describing its central blackboard organization, the nature of its interlingual representation, and the major components which interact through this blackboard both to analyze the verbal input and generate the corresponding gestural output in one of a number of sign variants. 相似文献