
1.

RE-tree: an efficient index structure for regular expressions (total citations: 4; self-citations: 0; citations by others: 4)

Chee-Yong Chan, Minos Garofalakis, Rajeev Rastogi. The VLDB Journal: The International Journal on Very Large Data Bases, 2003, 12(2): 102-119

Due to their expressive power, regular expressions (REs) are quickly becoming an integral part of language specifications for several important application scenarios. Many of these applications have to manage huge databases of RE specifications and need to provide an effective matching mechanism that, given an input string, quickly identifies the REs in the database that match it. In this paper, we propose the RE-tree, a novel index structure for large databases of RE specifications. Given an input query string, the RE-tree speeds up the retrieval of matching REs by focusing the search and comparing the input string with only a small fraction of REs in the database. Even though the RE-tree is similar in spirit to other tree-based structures that have been proposed for indexing multidimensional data, RE indexing is significantly more challenging since REs typically represent infinite sets of strings with no well-defined notion of spatial locality. To address these new challenges, our RE-tree index structure relies on novel measures for comparing the relative sizes of infinite regular languages. We also propose innovative solutions for the various RE-tree operations, including the effective splitting of RE-tree nodes and computing a "tight" bounding RE for a collection of REs. Finally, we demonstrate how sampling-based approximation algorithms can be used to significantly speed up the performance of RE-tree operations. Preliminary experimental results with moderately large synthetic data sets indicate that the RE-tree is effective in pruning the search space and easily outperforms naive sequential search approaches. Received: 16 September 2002; published online: 8 July 2003. Edited by R. Ramakrishnan.
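The pruning idea behind the RE-tree can be sketched with Python's standard `re` module. This is a minimal, hypothetical two-level tree, not the paper's structure: each node holds a bounding RE whose language must contain every language stored beneath it, so a whole subtree can be skipped as soon as the input string fails the bound. The bounds here are hand-chosen for illustration, whereas the RE-tree computes tight bounds itself; the names `Node` and `match_all` are assumptions.

```python
import re


class Node:
    """Hypothetical RE-tree node. `bound` must accept every string accepted
    by any RE stored in this subtree (its language is a superset)."""

    def __init__(self, bound, res=(), children=()):
        self.bound = re.compile(bound)
        self.res = [re.compile(r) for r in res]  # REs stored at this node
        self.children = list(children)


def match_all(node, s):
    """Return the patterns of all stored REs that fully match s."""
    if not node.bound.fullmatch(s):
        return []  # safe to prune: no RE below this node can accept s
    out = [r.pattern for r in node.res if r.fullmatch(s)]
    for child in node.children:
        out.extend(match_all(child, s))
    return out


# Hand-chosen bounding REs (an assumption; the RE-tree derives them).
tree = Node(r"[ab].*", children=[
    Node(r"a.*", res=[r"ab*", r"a[cd]e", r"a+"]),
    Node(r"b.*", res=[r"b+", r"bc", r"b(ab)*"]),
])

print(match_all(tree, "abbb"))  # the "b.*" subtree is never scanned
```

For the query `"abbb"`, only the REs under the `a.*` node are compared; in a real index the pruned subtrees would hold many more REs, which is where the savings over a sequential scan come from.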

2.

Query reverse engineering

Quoc Trung Tran, Chee-Yong Chan, Srinivasan Parthasarathy. The VLDB Journal: The International Journal on Very Large Data Bases, 2014, 23(5): 721-746

In this paper, we introduce a new problem termed query reverse engineering (QRE). Given a database \(D\) and a result table \(T\), the output of some known or unknown query \(Q\) on \(D\), the goal of QRE is to reverse-engineer a query \(Q'\) such that the output of \(Q'\) on \(D\) (denoted by \(Q'(D)\)) is equal to \(T\) (i.e., \(Q(D)\)). The QRE problem has useful applications in database usability, data analysis, and data security. In this work, we propose a data-driven approach, TALOS (Tree-based classifier with At Least One Semantics), that is based on a novel dynamic data classification formulation, and we extend the approach to efficiently support the three key dimensions of the QRE problem: whether the input query is known or unknown, support for different query fragments, and support for multiple database versions.
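To make the QRE problem statement concrete, the following Python sketch searches a deliberately tiny query space: single-predicate selections of the form `attr op value`, checking whether \(Q'(D) = T\) under bag semantics. This is a brute-force illustration under that assumption, not TALOS, which the abstract describes as a classifier-based approach over richer query fragments; the function names and the sample `emp` table are hypothetical.

```python
import operator
from collections import Counter

# Hypothetical query space: SELECT * FROM db WHERE attr op value
OPS = {"=": operator.eq, "<=": operator.le, ">=": operator.ge}


def as_bag(rows):
    """Compare tables as multisets of rows, ignoring row order."""
    return Counter(tuple(sorted(r.items())) for r in rows)


def reverse_engineer_selection(db, result):
    """Return the first predicate whose selection over `db` yields exactly
    `result` (as a bag), or None if no single predicate does."""
    want = as_bag(result)
    for attr in db[0]:
        for value in sorted({r[attr] for r in db}):
            for name, op in OPS.items():
                out = [r for r in db if op(r[attr], value)]
                if as_bag(out) == want:
                    return f"{attr} {name} {value!r}"
    return None


emp = [
    {"name": "Ann", "dept": "R&D", "age": 34},
    {"name": "Bob", "dept": "Ops", "age": 45},
    {"name": "Cai", "dept": "R&D", "age": 29},
    {"name": "Dee", "dept": "Ops", "age": 51},
]
T = [emp[1], emp[3]]  # output of some unknown query Q on emp
print(reverse_engineer_selection(emp, T))  # dept = 'Ops'
```

Note that several queries can be equivalent on \(D\) (here `age >= 45` also reproduces `T`); the sketch simply returns the first one it finds.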

3.

Scalable filtering of XML data for Web services

P. Felber, Chee-Yong Chan, M. Garofalakis, R. Rastogi. IEEE Internet Computing, 2003, 7(1): 49-57

As the Web gains prevalence as an application-to-application communication medium, organizations are deploying more Web service applications to provide standardized, programmatic application functionality over the Internet. The paper considers how scalable content-based routing architectures for Web applications can handle the growing number of XML messages associated with Web services.

4.

Systematic Approach for Quick Quality Assessment of Ink Effluents for Treatment and Discharge

Chee-Yong Chua, Kai-Chee Loh. Canadian Metallurgical Quarterly, 2004, 130(4): 417-424

A characterization study was conducted on synthetic wastewaters containing various commercially available ink-jet inks. This analysis identified seven high-risk noncompliance parameters: chemical oxygen demand (COD), biochemical oxygen demand at 5 days (BOD5), total dissolved solids (TDS), phenols, copper, iron, and sulphate concentrations. Of these, COD reduction was found to be the most stringent treatment criterion when the industry-accepted standard Fenton's oxidation reaction is used for treatment. TDS and COD were also proposed as critical parameters for the initial quality assessment of untreated ink effluents. To provide rapid and robust estimates of the TDS and COD of untreated ink effluents, a correlation for TDS as a function of conductivity and turbidity was obtained. Furthermore, a deterministic approach based on Beer's law of absorbance additivity was developed for determining the COD of mixtures of ink effluents from absorbance measurements at 210, 436, 525, and 620 nm. Both were successfully validated against experimental data. Based on an in-depth analysis of the compositions of ink effluents, a set of simple and rapid on-site water quality parameters was proposed for monitoring the quality of the treated ink wastewater: UV absorbance at 210 nm, conductivity, pH, turbidity, and color. Given the discharge limits imposed by a particular country, acceptable ranges for these parameters can then be developed to meet that country's discharge regulations. Singapore was used as a case study to illustrate this approach. In addition, the copper content of the cyan ink effluents was found to be reduced by more than 96% in the course of the standard Fenton's reaction.
The investigation of the Fenton's oxidation reaction also showed that measurements of oxidation-reduction potential and/or temperature are suitable indicators of the reaction's progress.
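One way absorbance additivity can yield a COD estimate is sketched below in pure Python. It assumes, as a hypothetical model rather than the paper's published procedure, that a mixture's absorbance at each wavelength is the volume-fraction-weighted sum of the pure effluents' absorbances, fits those fractions by least squares over the four wavelengths, and then combines the pure effluents' COD values by mass balance. The spectra and COD values are made-up illustrative numbers, and `estimate_cod` is a hypothetical name.

```python
def solve(M, v):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def estimate_cod(spectra, cods, mixture):
    """Assumed model: mixture[j] = sum_i phi[i] * spectra[i][j] (Beer's-law
    additivity). Fit phi via the normal equations, then return the fractions
    and COD_mix = sum_i phi[i] * cods[i] (mass balance)."""
    k, m = len(spectra), len(mixture)
    ata = [[sum(spectra[i][j] * spectra[p][j] for j in range(m))
            for p in range(k)] for i in range(k)]
    atb = [sum(spectra[i][j] * mixture[j] for j in range(m)) for i in range(k)]
    phi = solve(ata, atb)
    return phi, sum(f * c for f, c in zip(phi, cods))


# Made-up absorbances at 210, 436, 525, 620 nm and CODs in mg/L.
cyan, magenta, yellow = [2.0, 0.1, 0.3, 1.5], [1.8, 0.4, 1.6, 0.1], [1.5, 1.7, 0.2, 0.05]
cods = [900.0, 1200.0, 800.0]
mixture = [0.5 * c + 0.3 * mg + 0.2 * y for c, mg, y in zip(cyan, magenta, yellow)]
phi, cod = estimate_cod([cyan, magenta, yellow], cods, mixture)
print([round(f, 3) for f in phi], round(cod, 1))
```

With four wavelengths and up to four effluent types, the normal equations stay small, so a hand-written solver suffices; real measurements would also need the fitted fractions checked for physical plausibility (non-negative, summing to about one).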

5.

Sort-sharing-aware query processing

Yu Cao, Ramadhana Bramandia, Chee-Yong Chan, Kian-Lee Tan. The VLDB Journal: The International Journal on Very Large Data Bases, 2012, 21(3): 411-436

Many database applications require sorting a table (or relation) on multiple sort orders. Examples include creating multiple indices on a relation, generating multiple reports from a table, evaluating a complex query that involves multiple instances of a relation, and batch processing a set of queries. In this paper, we study how to optimize multiple sortings of a table. We investigate the correlation between sort orders and exploit sort-sharing techniques that reuse the (partial) work done to sort a table on one order for another order. Specifically, we introduce a novel and powerful evaluation technique, called cooperative sorting, that enables sort sharing between seemingly unrelated sort orders. Subsequently, given a specific set of sort orders, we determine the best combination of sort-sharing techniques to minimize the total processing cost. We also develop techniques to make a traditional query optimizer extensible so that it does not miss the truly cheapest execution plan when the sort-sharing (post-)optimization is turned on. We demonstrate the efficiency of our ideas with a prototype implementation in PostgreSQL and evaluate the performance using both the TPC-DS benchmark and synthetic data. Our experimental results show significant performance improvement over the traditional evaluation scheme.
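The simplest form of reusing sort work is sketched below in Python: a table already sorted on `(a, b)` can be brought into order `(a, c)` by sorting each run of equal `a` values independently instead of re-sorting the whole table. This prefix-based reuse is only an illustrative assumption; it is not the paper's cooperative sorting, which targets orders that share no such prefix. The function name is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def resort_with_shared_prefix(rows, prefix, new_suffix):
    """Rows must already be sorted on a key beginning with the `prefix`
    columns. Produce order prefix + new_suffix by sorting each run of equal
    prefix values on its own, reusing the existing prefix order."""
    key_prefix = itemgetter(*prefix)
    key_suffix = itemgetter(*new_suffix)
    out = []
    for _, run in groupby(rows, key=key_prefix):
        out.extend(sorted(run, key=key_suffix))  # small, independent sorts
    return out


table = [(1, "x", 9), (2, "y", 1), (1, "z", 3), (2, "x", 5), (1, "y", 7)]
by_ab = sorted(table, key=itemgetter(0, 1))  # first required order: (a, b)
by_ac = resort_with_shared_prefix(by_ab, (0,), (2,))  # derived order: (a, c)
print(by_ac)
```

Because Python's sort is stable, the result is identical to sorting `by_ab` directly on `(a, c)`, while each individual sort only touches one group.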