1.

An annotation management system for relational databases

Deepavali Bhagwat, Laura Chiticariu, Wang-Chiew Tan, Gaurav Vijayvargiya. The VLDB Journal (The International Journal on Very Large Data Bases), 2005, 14(4): 373-396

We present an annotation management system for relational databases. In this system, every piece of data in a relation is assumed to have zero or more annotations associated with it and annotations are propagated along, from the source to the output, as data is being transformed through a query. Such an annotation management system could be used for understanding the provenance (aka lineage) of data, who has seen or edited a piece of data or the quality of data, which are useful functionalities for applications that deal with integration of scientific and biological data. We present an extension, pSQL, of a fragment of SQL that has three different types of annotation propagation schemes, each useful for different purposes. The default scheme propagates annotations according to where data is copied from. The default-all scheme propagates annotations according to where data is copied from among all equivalent formulations of a given query. The custom scheme allows a user to specify how annotations should propagate. We present a storage scheme for the annotations and describe algorithms for translating a pSQL query under each propagation scheme into one or more SQL queries that would correctly retrieve the relevant annotations according to the specified propagation scheme. For the default-all scheme, we also show how we generate finitely many queries that can simulate the annotation propagation behavior of the set of all equivalent queries, which is possibly infinite. The algorithms are implemented and the feasibility of the system is demonstrated by a set of experiments that we have conducted.
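To make the default scheme concrete, here is a minimal in-memory sketch in Python. It is not pSQL and not the paper's storage scheme or query translation: it assumes a hypothetical representation in which every cell carries a set of annotations, and it covers only selection and projection. An annotation follows a value wherever that value is copied, so annotations on projected-away columns disappear and annotations on merged duplicates are unioned.

```python
from typing import Callable

# Hypothetical representation (not pSQL's storage scheme): a row maps a column
# name to a pair (value, frozenset of annotations attached to that cell).
Row = dict[str, tuple[object, frozenset[str]]]

def select(rel: list[Row], pred: Callable[[dict], bool]) -> list[Row]:
    # Selection copies whole rows, so every surviving cell keeps its annotations.
    return [row for row in rel if pred({c: v for c, (v, _) in row.items()})]

def project(rel: list[Row], cols: list[str]) -> list[Row]:
    # Projection copies the chosen cells. Rows that collapse to the same values
    # (set semantics) are merged, and their annotations are unioned per column.
    merged: dict[tuple, dict[str, set[str]]] = {}
    for row in rel:
        key = tuple(row[c][0] for c in cols)
        anns = merged.setdefault(key, {c: set() for c in cols})
        for c in cols:
            anns[c] |= row[c][1]
    return [
        {c: (key[i], frozenset(anns[c])) for i, c in enumerate(cols)}
        for key, anns in merged.items()
    ]

# Usage: two BRCA1 rows survive the selection and merge under projection.
genes = [
    {"id": (1, frozenset({"a1"})), "name": ("BRCA1", frozenset({"a2"}))},
    {"id": (2, frozenset()), "name": ("BRCA1", frozenset({"a3"}))},
    {"id": (3, frozenset({"a4"})), "name": ("TP53", frozenset())},
]
# out holds one BRCA1 row whose name cell carries annotations a2 and a3;
# a1 is dropped because the id column it was attached to is projected away.
out = project(select(genes, lambda r: r["id"] <= 2), ["name"])
```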

2.

Integrated Video and Text for Content-based Access to Video Databases

Haitao Jiang, Danilo Montesi, Ahmed K. Elmagarmid. Multimedia Tools and Applications, 1999, 9(3): 227-249

This paper introduces a new approach to realizing video databases. The approach consists of a VideoText data model, based on free-text annotations associated with logical video segments, and a corresponding query language. Traditional database techniques are inadequate for querying unstructured data such as video, supporting temporal queries, and ranking query results by their relevance to the query. In this paper, we propose using information retrieval techniques to provide these features and extending the query language to accommodate interval queries, which are particularly suited to video data. Algorithms are provided to show how user queries are evaluated. Finally, a generic, modular video database architecture based on the VideoText data model is described.

3.

Using a distributed quadtree index in peer-to-peer networks

Egemen Tanin, Aaron Harwood, Hanan Samet. The VLDB Journal (The International Journal on Very Large Data Bases), 2007, 16(2): 165-178

Peer-to-peer (P2P) networks have become a powerful means for online data exchange. Currently, users primarily use these networks to perform exact-match queries and retrieve complete files. However, future, more data-intensive applications, such as P2P auction networks, P2P job-search networks, and P2P multiplayer games, will require the capability to answer more complex queries, such as range queries, over numerous data types, including those with a spatial component. In this paper, we describe a distributed quadtree index, adapted from the MX-CIF quadtree, that enables more powerful access to data in P2P networks. This index has been implemented for various prototype P2P applications, and experimental results are presented. Our index is easy to use, scalable, and exhibits good load-balancing properties. Similar indices can be constructed for various multidimensional data types with both spatial and non-spatial components.
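The MX-CIF quadtree stores each rectangle at the smallest quadtree block that fully contains it. A minimal sketch of that placement rule follows, assuming a unit-square world and a depth cap; the function names are hypothetical. How the paper assigns blocks to peers is not described in the abstract, so `peer_for_block` simply hashes the block's centroid to one of `num_peers` peers as a stand-in for a real P2P lookup layer.

```python
import hashlib

def mxcif_block(rect, world=(0.0, 0.0, 1.0), max_depth=16):
    """Return (x0, y0, size, path) of the smallest quadtree block that fully
    contains rect = (x1, y1, x2, y2). path lists quadrant codes (qy*2 + qx)."""
    x0, y0, size = world
    x1, y1, x2, y2 = rect
    path = []
    for _ in range(max_depth):
        half = size / 2
        mx, my = x0 + half, y0 + half
        # Decide which side of each split line the rectangle lies on.
        if x2 <= mx:
            qx = 0
        elif x1 >= mx:
            qx = 1
        else:
            break  # straddles the vertical split line: store it here
        if y2 <= my:
            qy = 0
        elif y1 >= my:
            qy = 1
        else:
            break  # straddles the horizontal split line: store it here
        x0 += qx * half
        y0 += qy * half
        size = half
        path.append(qy * 2 + qx)
    return x0, y0, size, path

def peer_for_block(x0, y0, size, num_peers):
    # Hypothetical peer assignment: hash the block's centroid to a peer index.
    cx, cy = x0 + size / 2, y0 + size / 2
    digest = hashlib.sha1(f"{cx:.12f},{cy:.12f}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_peers
```

A rectangle that crosses the world's center, such as `(0.4, 0.4, 0.6, 0.6)`, stays at the root block, which is the behavior that makes MX-CIF simple but can concentrate large objects near the top of the tree.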

4.

A Data Validity Verification Method Based on Generated Check Queries in DAS

Qiaozhi Yan, Jieping Wang, Xiaoyong Du. Journal of Software, 2009, 20(Z1): 154-164

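In the database-as-a-service (DAS) model, the data owner outsources its data to a third party, the database service provider (DSP). Compared with a traditional DBMS, DAS reduces database administration overhead by providing Web-based data access. To ensure the DSP's quality of service, most previous work has focused on data confidentiality and data validity. Existing methods for verifying data validity all require the DSP to provide additional information or store additional data, and the verification data must be adjusted on every update, which is inefficient in real deployments. To address this, we propose a data validity verification method based on generated check queries: a check query is generated from multiple queries the user has already issued, and the client uses the check query's result, together with the relationship between those queries and the check query, to perform probabilistic validity verification efficiently and effectively. Experiments demonstrate the feasibility of the method.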

5.

Optimizing ranking method using social annotations based on language model

Kunmei Wen, Ruixuan Li, Jing Xia, Xiwu Gu. Artificial Intelligence Review, 2014, 41(1): 81-96

Recent research has shown that more and more web users use social annotations to manage and organize resources they are interested in. With the growing popularity of social annotations, it is therefore increasingly important to exploit them for effective web search. However, no previous study has used a statistical model to examine the relationships between queries and social annotations. Motivated by this observation, we use social annotations to re-rank search results. We optimize the retrieval ranking method by integrating query-annotation similarity into query-document similarity. Specifically, we compute the query-annotation similarity with a statistical language model (language model for short). The initial search results are then re-ranked by a weighted score that combines the query-document similarity score and the query-annotation similarity score. Experimental results show that the proposed method improves the NDCG score by 8.13%. We further evaluate the method empirically on a query set of about 300 popular social annotations and constructed phrases. More generally, optimizing results with social annotations through a language model can significantly benefit web search.
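The re-ranking step can be sketched with a Dirichlet-smoothed query-likelihood model, a common choice of statistical language model. The abstract does not say which smoothing the authors use, so that choice, the weight `lam`, the smoothing parameter `mu`, and the use of the result set itself as the background collection are all assumptions. Both similarities are computed as log-likelihoods so the weighted sum combines scores on one scale.

```python
import math
from collections import Counter

def lm_score(query, text_terms, collection, mu=2000.0):
    """Dirichlet-smoothed query log-likelihood log P(query | text)."""
    tf = Counter(text_terms)
    coll_tf = Counter(collection)
    coll_len = len(collection)
    score = 0.0
    for w in query:
        # Background probability; a small floor avoids log(0) for unseen words.
        p_c = max(coll_tf[w] / coll_len, 1e-9)
        score += math.log((tf[w] + mu * p_c) / (len(text_terms) + mu))
    return score

def rerank(query, results, lam=0.7, mu=2000.0):
    """results: list of (doc_id, doc_terms, annotation_terms).
    Assumption: the result set itself serves as the background collection."""
    collection = [t for _, d, a in results for t in d + a]
    scored = []
    for doc_id, doc_terms, ann_terms in results:
        s_doc = lm_score(query, doc_terms, collection, mu)   # query-document
        s_ann = lm_score(query, ann_terms, collection, mu)   # query-annotation
        scored.append((lam * s_doc + (1 - lam) * s_ann, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]
```

With two documents of identical text, the one whose social annotations mention the query term is ranked first, which is the effect the annotation score is meant to add.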

6.

Aggregated 2D range queries on clustered points

Information Systems, 2016

Efficient processing of aggregated range queries on two-dimensional grids is a common requirement in information retrieval and data mining systems, for example in Geographic Information Systems and OLAP cubes. We introduce a technique to represent grids supporting aggregated range queries that requires little space when the data points in the grid are clustered, which is common in practice. We show how this general technique can be used to support two important types of aggregated queries: ranked range queries and counting range queries. Our experimental evaluation shows that this technique can speed up aggregated queries by more than an order of magnitude, with a small space overhead.
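As a point of comparison, the textbook way to answer counting range queries on a grid in constant time is a 2D prefix-sum table. This is not the paper's technique: it stores one counter per cell whether or not the points are clustered, which is exactly the space cost a clustered representation aims to avoid. Ranked (top-k) range queries are not covered, and the function names are hypothetical.

```python
def build_prefix(grid):
    """P[i][j] = number of points in rows < i and columns < j."""
    rows, cols = len(grid), len(grid[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            # Inclusion-exclusion over the three already-computed neighbours.
            P[i + 1][j + 1] = grid[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P

def range_count(P, r1, c1, r2, c2):
    """Count points in rows r1..r2 and columns c1..c2 (inclusive), in O(1)."""
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]
```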

7.

Containment of Conjunctive Queries on Annotated Relations

Todd J. Green. Theory of Computing Systems, 2011, 49(2): 429-459

We study containment and equivalence of (unions of) conjunctive queries on relations annotated with elements of a commutative semiring. Such relations and the semantics of positive relational queries on them were introduced in a recent paper as a generalization of set semantics, bag semantics, incomplete databases, and databases annotated with various kinds of provenance information. We obtain positive decidability results and complexity characterizations for databases with lineage, why-provenance, and provenance polynomial annotations, for both conjunctive queries and unions of conjunctive queries. At least one of these results is surprising given that provenance polynomial annotations seem “more expressive” than bag semantics and under the latter, containment of unions of conjunctive queries is known to be undecidable. The decision procedures rely on interesting variations on the notion of containment mappings. We also show that for any positive semiring (a very large class) and conjunctive queries without self-joins, equivalence is the same as isomorphism.
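Under plain set semantics, the classical containment-mapping test (Chandra and Merlin) says that Q1 is contained in Q2 exactly when some mapping from Q2's variables to Q1's terms sends Q2's head to Q1's head and every atom of Q2 onto an atom of Q1. The sketch below implements that baseline by backtracking; the paper's decision procedures for lineage, why-provenance, and provenance-polynomial annotations rely on variations of this notion that are not shown. Queries are encoded, hypothetically, as `(head, body)` pairs in which string terms are variables and non-string terms are constants.

```python
def _bind(mapping, t2, t1):
    """Try to map term t2 of Q2 to term t1 of Q1, extending mapping in place."""
    if not isinstance(t2, str):      # constant: must match exactly
        return t2 == t1
    if t2 in mapping:                # variable already mapped: must agree
        return mapping[t2] == t1
    mapping[t2] = t1
    return True

def _search(atoms2, body1, mapping):
    """Backtracking search for an image in body1 of every atom in atoms2."""
    if not atoms2:
        return True
    (rel, args2), rest = atoms2[0], atoms2[1:]
    for rel1, args1 in body1:
        if rel1 != rel or len(args1) != len(args2):
            continue
        trial = dict(mapping)        # copy so a failed branch leaves no trace
        if all(_bind(trial, a2, a1) for a2, a1 in zip(args2, args1)):
            if _search(rest, body1, trial):
                return True
    return False

def contained_in(q1, q2):
    """Set semantics: q1 is contained in q2 iff a containment mapping q2 -> q1 exists."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    mapping = {}
    for t2, t1 in zip(head2, head1):  # the head must map onto the head
        if not _bind(mapping, t2, t1):
            return False
    return _search(list(body2), body1, mapping)
```

For example, `Q1(x) :- R(x, y), R(y, x)` is contained in `Q2(x) :- R(x, y)` but not vice versa, because no mapping can send the atom `R(y, x)` into `Q2`'s single atom consistently.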

8.

Improving Query Performance on Massive Data with a Distributed Index

Xiaofeng Dou, Sheng Chen, Yihang Wang, Liandao Mai, Jianhong You. Computer Systems & Applications, 2014, 23(6): 259-261

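In precision marketing and ad hoc query services in the telecom sector, there are many random-query scenarios over one or a few wide tables (more than 50 columns). Under the traditional approach (querying the database directly), response times can be tuned to a few seconds or tens of seconds when the data volume is small (fewer than 10 million records). Once the volume reaches tens of millions, hundreds of millions, or even more than a billion records, however, no amount of tuning or change of indexing lets this approach meet the requirement of concurrent queries answered within seconds. The new approach solves this problem by introducing a distributed Solr index layer. The index layer indexes the database records in advance, and queries go directly to the index layer instead of the database, which greatly improves query performance. A comparison of the two approaches in the same environment, with 50 million records and 20 concurrent wide-table queries per second, showed that every query under the traditional approach timed out, whereas every query against the distributed index layer succeeded and returned within 2 seconds.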

9.

A Data Aggregation System for Metadata Discovery in the CMS Experiment

Dong Liang, Dongsong Zang, Jing Huo, Gongxing Sun, Valentin Kuznetsov. Computer Engineering, 2014, (4): 57-63, 70

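The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider produces data that are large in volume (petabyte scale), complex in type, and geographically distributed around the world. The metadata describing these data reach the terabyte scale and are stored in different formats across various relational and non-relational data sources. By adding a caching layer over these heterogeneous data sources, we implement a data aggregation system that supports exact keyword queries. User queries are answered through multiple mappings and aggregation, and an effective cache management strategy is used to raise the query hit rate. Experimental results show that the system can answer more than 70% of user queries from the cache and delivers good query performance.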
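The abstract does not say which cache management strategy is used, so the sketch below substitutes plain LRU eviction to show how a cache layer in front of heterogeneous sources answers repeated queries and tracks its hit rate. `QueryCache` and `fetch` are hypothetical names; `fetch` stands in for whatever queries the underlying data sources and aggregates their results.

```python
from collections import OrderedDict

class QueryCache:
    """LRU cache in front of a slow aggregation step (a stand-in strategy)."""

    def __init__(self, fetch, capacity=1000):
        self.fetch = fetch            # hypothetical: queries the real data sources
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used entry first
        self.hits = 0
        self.misses = 0

    def query(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        result = self.fetch(key)
        self.entries[key] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return result

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```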

10.

A P2P-Based Query Framework for Moving Objects

Shengbai Li, Yan Zhang, Hong Gao. Journal of Computer Research and Development, 2009, 46(Z2)

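In recent years, query methods for spatio-temporal data have become a widely studied research topic. However, most research has focused on centralized environments, and efficiently answering trajectory queries and window queries over massive spatio-temporal data in a distributed environment is both meaningful and challenging. We design a P2P-based solution and propose a two-level partitioning of the moving objects' motion space that supports both kinds of query at once. A grid filtering technique effectively addresses the problem of frequent data updates. The motion space is partitioned efficiently, with better load balancing than space-filling-curve methods, and an efficient overlay, SmartChord, is designed to support window queries. Experimental results show that, compared with existing solutions, the proposed scheme effectively reduces update communication and significantly improves load balancing and routing efficiency.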

11.

A Clustering-Based Algorithm for Processing Aggregate Queries on Unclean Databases

Guohua Jiang, Hongzhi Wang, Jianzhong Li, Hong Gao. Journal of Computer Research and Development, 2009, 46(Z2)

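Unclean data in real-world databases, such as incomplete, inconsistent, and duplicate data, impairs the effective use of those databases. Obtaining statistical analysis results that meet a required cleanliness level from a database containing unclean data poses a new challenge for database research, and aggregate queries are the foundation of statistical analysis. For unclean data, we propose an aggregate query processing algorithm with cleanliness guarantees, designed for aggregate queries that contain a GROUP BY clause. Because in unclean data the same tuple may belong to different groups, the proposed method groups the database tuples by overlapping clustering. This yields groups that account for the uncleanliness of the data, together with aggregate results computed over those groups and their cleanliness expressed as a probability. The method applies to a variety of aggregate functions and to aggregate queries with selection conditions. Experiments verify the efficiency of the method.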

12.

MRST: An Efficient Monitoring Technology of Summarization on Stream Data

Xiao-Bo Fan, Ting-Ting Xie, Cui-Ping Li, Hong Chen. Journal of Computer Science and Technology, 2007, 22(2): 190-196

Monitoring data streams is an efficient way to capture their characteristics. However, the resources available for each data stream are limited, so how to use limited resources to process an infinite data stream remains an open and challenging problem. In this paper, we adopt wavelet and sliding-window methods to design a multi-resolution summarization data structure, the Multi-Resolution Summarization Tree (MRST), which can be updated incrementally as data arrives and supports point queries, range queries, and multi-point queries while preserving query precision. We use both synthetic and real-world data to evaluate our algorithm. The experimental results indicate that MRST exceeds current algorithms in query efficiency and adaptability, while being simpler to implement.
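The wavelet part of such a summary can be illustrated with the unnormalized Haar transform. This sketch assumes a window whose length is a power of two and keeps only the k largest-magnitude detail coefficients as a simple heuristic; the MRST tree layout, incremental updates, and sliding-window maintenance are not shown, and all function names are hypothetical.

```python
def haar_decompose(values):
    """Unnormalized Haar transform; len(values) must be a power of two.
    Returns [overall average, coarsest detail, ..., finest details]."""
    coeffs = []
    data = list(values)
    while len(data) > 1:
        avgs = [(data[i] + data[i + 1]) / 2 for i in range(0, len(data), 2)]
        diffs = [(data[i] - data[i + 1]) / 2 for i in range(0, len(data), 2)]
        coeffs = diffs + coeffs  # coarser details end up in front
        data = avgs
    return data + coeffs

def haar_reconstruct(coeffs):
    """Invert haar_decompose level by level: each (a, d) pair becomes a+d, a-d."""
    data = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(data)]
        data = [v for a, d in zip(data, details) for v in (a + d, a - d)]
        pos += len(details)
    return data

def compress(coeffs, k):
    """Keep the overall average plus the k largest-magnitude details; zero the rest."""
    ranked = sorted(range(1, len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(ranked[:k])
    return [c if i == 0 or i in keep else 0.0 for i, c in enumerate(coeffs)]

def approx_range_sum(coeffs, lo, hi):
    """Approximate sum of positions lo..hi (inclusive) from a (compressed) summary."""
    return sum(haar_reconstruct(coeffs)[lo:hi + 1])
```

Because detail coefficients cancel over the whole window, the total sum is preserved exactly however many details are dropped; narrower ranges lose accuracy as k shrinks.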

13.

Efficient Probabilistic Skyline Computation over the n-of-N Data Stream Model

Yongtao Yang, Yijie Wang. Journal of Software, 2012, 23(3): 550-564

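This paper studies q-skyline computation over probabilistic data streams. Whereas existing methods support only the sliding-window stream model, the proposed method supports the more general n-of-N stream model. It does so by transforming q-skyline queries into stabbing queries on an interval tree. The PnNM algorithm maintains the data structures required by the n-of-N model, efficiently handling maintenance tasks such as updating the candidate set of uncertain objects and updating intervals; the PnNCont algorithm implements continuous query processing. Theoretical analysis and experimental results show that the algorithms effectively support q-skyline query processing under the n-of-N model over probabilistic data streams.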
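Under a common model of uncertain streams, each element exists independently with probability P(e), and its skyline probability is P(e) multiplied by (1 - P(e')) for every element e' that dominates it; a q-skyline returns the elements whose skyline probability is at least q. The brute-force sketch below applies this to the most recent n of the last N elements, assuming that model and smaller-is-better dominance. It does not use the interval-tree stabbing queries, PnNM, or PnNCont from the paper, and `NofNWindow` is a hypothetical name.

```python
from collections import deque

def dominates(a, b):
    """a dominates b: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

class NofNWindow:
    """Keeps the N most recent (point, existence probability) elements."""

    def __init__(self, N):
        self.items = deque(maxlen=N)

    def push(self, point, prob):
        self.items.append((point, prob))  # the oldest element drops out automatically

    def q_skyline(self, n, q):
        """Brute-force q-skyline over the most recent n elements (1 <= n <= N)."""
        if not 1 <= n <= self.items.maxlen:
            raise ValueError("n must satisfy 1 <= n <= N")
        recent = list(self.items)[-n:]
        result = []
        for i, (p, pr) in enumerate(recent):
            sky = pr
            for j, (p2, pr2) in enumerate(recent):
                if j != i and dominates(p2, p):
                    sky *= 1 - pr2  # p survives only if every dominator is absent
            if sky >= q:
                result.append((p, sky))
        return result
```

Querying different n against the same window shows why the n-of-N model is more general than a fixed sliding window: an element dominated by an older, likely object can re-enter the q-skyline once that object falls outside the last n elements.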

14.

Improving large-scale search engines with semantic annotations

Damaris Fuentes-Lorenzo, Norberto Fernández, Jesús A. Fisteus, Luis Sánchez. Expert Systems with Applications, 2013, 40(6): 2287-2296

Traditional search engines have become the most useful tools to search the World Wide Web. Even though they are good for certain search tasks, they may be less effective for others, such as satisfying ambiguous or synonym queries. In this paper, we propose an algorithm that, with the help of Wikipedia and collaborative semantic annotations, improves the quality of web search engines in the ranking of returned results. Our work is supported by (1) the logs generated after query searching, (2) semantic annotations of queries and (3) semantic annotations of web pages. The algorithm makes use of this information to elaborate an appropriate ranking. To validate our approach we have implemented a system that can apply the algorithm to a particular search engine. Evaluation results show that the number of relevant web resources obtained after executing a query with the algorithm is higher than the one obtained without it.

15.

Labeling sensing data for mobility modeling

Information Systems, 2016

In urban environments, sensory data can be used to create personalized models for predicting efficient routes and schedules on a daily basis; and also at the city level to manage and plan more efficient transport, and schedule maintenance and events. Raw sensory data is typically collected as time-stamped sequences of records, with additional activity annotations by a human, but in machine learning, predictive models view data as labeled instances, and depend upon reliable labels for learning. In real-world sensor applications, human annotations are inherently sparse and noisy. This paper presents a methodology for preprocessing sensory data for predictive modeling in particular with respect to creating reliable labeled instances. We analyze real-world scenarios and the specific problems they entail, and experiment with different approaches, showing that a relatively simple framework can ensure quality labeled data for supervised learning. We conclude the study with recommendations to practitioners and a discussion of future challenges.

16.

Relational completeness of query languages for annotated databases

Floris Geerts, Jan Van den Bussche. Journal of Computer and System Sciences, 2011, 77(3): 491-504

Annotated relational databases can be queried either by simply making the annotations explicitly available alongside the ordinary data, or by adapting the standard query operators so that they also have an implicit effect on the annotations. We compare the expressive power of these two approaches. As a formal model for the implicit approach we propose the color algebra, an adaptation of the relational algebra to deal with the annotations. We show that the color algebra is relationally complete: it is equivalent to the relational algebra on the explicit annotations. Our result extends a similar completeness result established for the query algebra of the MONDRIAN annotation system, from unions of conjunctive queries to the full relational algebra. We also show that the color algebra is nonredundant: no operator can be expressed in terms of the other operators. Finally, we present a generalization of the color algebra that is relationally complete in the presence of built-in predicates on the annotations.

17.

Relevance measures for the creation of groups in an annotation system

Journal of Visual Languages and Computing, 2014, 25(6): 695-702

The MADCOW annotation system supports a notion of group, facilitating focused annotations with respect to a domain. In previous work, we adopted ontologies to represent knowledge about domains, thus allowing more refined annotations to a group, and discussed how the use of ontologies facilitates the formulation of semantically significant queries for retrieving annotations on specific topics. We now expand on previous results and study two new types of measures to identify matches between users' interests and groups: Degree Centrality, developed for social networks to assess the quality of concepts in an ontology, and URL concordance, indicating the similarity of interests among users who annotate the same pages.

18.

A Data Synchronization Scheme for Pervasive Computing Based on Logical Transaction Processing

Shi Qiao. Computer Engineering, 2009, 35(15): 55-57

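This paper proposes the concept of logical transaction processing, uses parameterized database queries to solve the data subsetting problem in server-centric systems, and designs a stable, scalable scheme that can be applied to data subsets. Test results show that, compared with data synchronization schemes based on traditional transaction processing, the asynchronous nature of this scheme makes it better suited to situations in which the network connection between the server database and the client database is weak.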

19.

Approximate range–sum query answering on data cubes with probabilistic guarantees

Alfredo Cuzzocrea, Wei Wang. Journal of Intelligent Information Systems, 2007, 28(2): 161-197

Approximate range aggregate queries are one of the most frequent and useful kinds of queries for Decision Support Systems (DSS), as they are widely used in many data analysis tasks. Traditionally, sampling-based techniques have been proposed to tackle this problem. However, their effectiveness degrades when the underlying data distribution is skewed. Another approach, based on outlier management, can limit the effect of data skew but fails to address other requirements of approximate range aggregate queries, such as error guarantees and query processing efficiency. In this paper, we present a technique that efficiently provides approximate answers to range aggregate queries on OLAP data cubes, with theoretical guarantees on the errors. Our basic idea is to build different data structures to manage outliers and the rest of the data. Carefully chosen outliers are organized in a quad-tree-based indexing data structure to provide efficient access for query processing. A query-workload-adaptive, tree-like synopsis data structure, called the Tunable Partition-Tree (TP-Tree), is proposed to organize samples extracted from non-outlier data. Experimental comparisons with well-known previous techniques clearly demonstrate the merits of our technique.
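The general idea of treating outliers separately from the rest of the data can be sketched with plain lists: outliers are summed exactly, and a uniform sample of the remaining cells is scaled up to estimate their contribution. This is an assumption-laden stand-in, not the paper's method: outliers are picked simply by deviation from the mean rather than by the paper's selection procedure, no error guarantee is computed, and neither the quad-tree index nor the TP-Tree is implemented. The function names are hypothetical.

```python
import random

def build_synopsis(cells, num_outliers, sample_size, seed=0):
    """cells: list of ((x, y), value). Returns (outliers, sample, scale)."""
    mean = sum(v for _, v in cells) / len(cells)
    # Assumption: the cells farthest from the mean are treated as outliers.
    ranked = sorted(cells, key=lambda c: abs(c[1] - mean), reverse=True)
    outliers, rest = ranked[:num_outliers], ranked[num_outliers:]
    rng = random.Random(seed)
    sample = rng.sample(rest, min(sample_size, len(rest)))
    # Each sampled cell stands for len(rest) / len(sample) cells.
    scale = len(rest) / len(sample) if sample else 0.0
    return outliers, sample, scale

def approx_range_sum(synopsis, x1, y1, x2, y2):
    """Exact sum over outliers plus a scaled-up sample estimate for the rest."""
    outliers, sample, scale = synopsis

    def inside(p):
        return x1 <= p[0] <= x2 and y1 <= p[1] <= y2

    exact = sum(v for p, v in outliers if inside(p))
    estimate = sum(v for p, v in sample if inside(p)) * scale
    return exact + estimate
```

Keeping the extreme values out of the sample is what limits the damage from skew: one large cell no longer decides whether a sampled estimate is far too high or far too low.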

20.

Design and Implementation of a Semantic Cache Mechanism in Data Mining Systems

Pengcheng Ge, Jianzhong Li, Zhaogong Zhang. Computer Engineering and Applications, 2005, 41(31): 160-163, 218

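More and more researchers are studying how to use data mining methods to acquire useful knowledge quickly and efficiently. However, on large and especially massive datasets, existing mining algorithms can take a long time to run, and similar mining requests lead to repeated computation. This paper introduces a mechanism called a semantic cache into the mining process: by analyzing each mining request and comparing it with existing mining results, it eliminates avoidable computation while still extracting all the knowledge the user needs, thereby reducing the amount of computation and improving system response time. Theoretical analysis and experimental results demonstrate the effectiveness of the mechanism.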