Current approaches to data mining usually address specific user requests, while no general design criteria for the extraction of association rules are available to the end-user. In this paper, we propose a classification of association rule types, which provides a general framework for the design of association rule mining applications. Based on the identified association rule types, we introduce predefined templates as a means to capture the user specification of mining applications. Furthermore, we propose a general language to design templates for the extraction of arbitrary association rule types.
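As a rough illustration of the kind of extraction such templates constrain, the sketch below mines association rules from toy transactions under a fixed-consequent "template" (the data, thresholds, and template shape are invented and do not reproduce the paper's template language):

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def mine_rules(min_sup=0.5, min_conf=0.6, consequent="milk"):
    """Extract rules X -> consequent matching a simple 'template':
    minimum support, minimum confidence, and a fixed consequent item."""
    items = set().union(*transactions) - {consequent}
    rules = []
    for r in range(1, 3):
        for body in combinations(sorted(items), r):
            body = set(body)
            sup = support(body | {consequent})
            if sup >= min_sup and sup / support(body) >= min_conf:
                rules.append((frozenset(body), consequent, sup))
    return rules
```

A template language such as the one the paper proposes would let the end-user state constraints of this kind declaratively instead of hard-coding them.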
Internet data collected via passive measurement are analyzed to obtain localization information on nodes by clustering (i.e., grouping together) nodes that exhibit similar network path properties. Since traditional clustering algorithms fail to correctly identify clusters of homogeneous nodes, we propose NetCluster, a novel framework suited to the analysis of Internet measurement datasets. We show that the proposed framework correctly analyzes synthetically generated traces. Finally, we apply it to real traces collected at the access link of the Politecnico di Torino campus LAN and discuss the network characteristics as seen from the vantage point.
The lack of tools for rule generation, analysis, and run-time monitoring appears to be one of the main obstacles to the widespread adoption of active database applications. This paper describes a complete tool environment for assisting the design of active rule applications; the tools were developed at Politecnico di Milano in the context of the IDEA Project, a four-year Esprit project sponsored by the European Commission which was launched in June 1992. We describe tools for active rule generation, analysis, debugging, and browsing; rules are defined in Chimera, a conceptual design model and language for the specification of active rule applications. We also introduce a tool for mapping from Chimera into Oracle, a relational product supporting triggers. Most of the tools described in this paper are fully implemented and currently in operation (beta-testing) within the companies participating in the IDEA Project, with the exception of two of them (called Argonaut-V and Pandora), which will be completed by the end of 1996. Research presented in this paper is supported by Esprit project P6333 IDEA, and by ENEL contract VDS 1/94: Integrity Constraint Management.
Identifying the most relevant scientific publications on a given topic is a well-known research problem. The Author-Topic Model (ATM) is a generative model that represents the relationships between research topics and publication authors. It allows us to identify the most influential authors on a particular topic. However, since most research works are co-authored by many researchers, the information provided by ATM can be complemented by the study of the most fruitful collaborations among multiple authors. This paper addresses the discovery of research collaborations among multiple authors on single or multiple topics. Specifically, it exploits an exploratory data mining technique, i.e., weighted association rule mining, to analyze publication data and to discover correlations between ATM topics and combinations of authors. The mined rules characterize groups of researchers with fairly high scientific productivity by indicating (1) the research topics covered by their most cited publications and the relevance of their scientific production separately for each topic, (2) the nature of the collaboration (topic-specific or cross-topic), (3) the names of the external authors who have (occasionally) collaborated with the group either on a specific topic or on multiple topics, and (4) the underlying correlations between the addressed topics. The applicability of the proposed approach was validated on real data acquired from the Online Mendelian Inheritance in Man catalog of genetic disorders and from the PubMed digital library. The results confirm the effectiveness of the proposed strategy.
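As a rough illustration of the weighted association rule mining step, the sketch below scales plain support by author weights, one common weighted-ARM formulation (the author names, weights, and paper author sets are all invented, and this is not claimed to be the paper's exact measure):

```python
# Hypothetical per-author relevance weights (e.g., citation-based scores
# derived from ATM output); all names and values are invented.
author_weight = {"A": 0.9, "B": 0.6, "C": 0.2}

# Each "transaction" is the author set of one publication on a topic.
papers = [
    {"A", "B"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"},
]

def weighted_support(authors):
    """Plain support of an author combination, scaled by the mean weight
    of the authors involved, so that rules about highly relevant authors
    rank above equally frequent but less relevant ones."""
    plain = sum(authors <= p for p in papers) / len(papers)
    mean_w = sum(author_weight[a] for a in authors) / len(authors)
    return plain * mean_w
```

Here the pair {A, B} co-authors 3 of 5 papers (support 0.6) and has mean weight 0.75, giving a weighted support of 0.45.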
Nowadays, wireless sensor networks are being used in a fast-growing number of application fields (e.g., habitat monitoring, highway traffic monitoring, remote surveillance). Monitoring (i.e., querying) the sensor network entails the frequent acquisition of measurements from all sensors. Since sensor data acquisition and communication are the main sources of power consumption and sensors are battery-powered, an important issue in this context is energy saving during data collection. Hence, the challenge is to extend sensor lifetime by reducing communication cost and computation energy. This paper thoroughly describes the complete design, implementation, and validation of the SeReNe framework. Given historical sensor readings, SeReNe discovers energy-saving models to efficiently acquire sensor network data. It exploits different clustering algorithms to discover the spatial and temporal correlations that allow the identification of sets of correlated sensors and sensor data streams. Given clusters of correlated sensors, a subset of representative sensors is selected. Rather than directly querying all network nodes, only the representative sensors are queried, thereby reducing communication, computation, and power costs. Experiments performed both on a real sensor network deployed at the Politecnico di Torino labs and on a publicly available dataset from the Intel Berkeley Research lab demonstrate the adaptability and effectiveness of the SeReNe framework in providing energy-saving sensor network models.
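The cluster-then-query idea can be sketched minimally as follows (the sensor readings, the correlation threshold, and the greedy grouping are invented stand-ins for SeReNe's actual clustering algorithms):

```python
# Toy per-sensor reading streams (invented values); in a setting like
# SeReNe these would be historical measurements from the deployed network.
readings = {
    "s1": [20.0, 21.0, 22.0, 23.0],
    "s2": [20.1, 21.1, 22.0, 23.2],  # tracks s1 closely
    "s3": [5.0, 4.0, 3.0, 2.0],      # opposite trend, separate cluster
}

def pearson(x, y):
    """Pearson correlation between two equal-length reading streams."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def representatives(threshold=0.95):
    """Greedy single-pass grouping: a sensor joins the first cluster whose
    representative (first member) it correlates with above `threshold`.
    Only one representative per cluster is then queried, instead of
    every node in the network."""
    clusters = []
    for s, vals in readings.items():
        for c in clusters:
            if pearson(readings[c[0]], vals) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return [c[0] for c in clusters]
```

With these toy streams, s2 is absorbed into s1's cluster, so only two of the three sensors need to be queried.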
This paper presents the IMine index, a general and compact structure which provides tight integration of itemset extraction in a relational DBMS. Since no constraint is enforced during the index creation phase, IMine provides a complete representation of the original database. To reduce the I/O cost, data accessed together during the same extraction phase are clustered on the same disk block. The IMine index structure can be efficiently exploited by different itemset extraction algorithms. In particular, IMine data access methods currently support the FP-growth and LCM v.2 algorithms, but they can straightforwardly support the enforcement of various constraint categories. The IMine index has been integrated into the PostgreSQL DBMS and exploits its physical-level access methods. Experiments, run for both sparse and dense data distributions, show the efficiency of the proposed index and its linear scalability even for large datasets. Itemset mining supported by the IMine index shows performance always comparable with, and sometimes better than, state-of-the-art algorithms accessing data from flat files.
Sentence-based multi-document summarization is the task of generating a succinct summary of a document collection, which consists of the most salient document sentences. In recent years, the increasing availability of semantics-based models (e.g., ontologies and taxonomies) has prompted researchers to investigate their usefulness for improving summarizer performance. However, semantics-based document analysis is often applied as a preprocessing step, rather than integrating the discovered knowledge into the summarization process. This paper proposes a novel summarizer, namely Yago-based Summarizer, that relies on an ontology-based evaluation and selection of the document sentences. To capture the actual meaning and context of the document sentences and generate sound document summaries, an established entity recognition and disambiguation step based on the Yago ontology is integrated into the summarization process. The experimental results, which were achieved on the DUC'04 benchmark collections, demonstrate the effectiveness of the proposed approach compared to a large number of competitors as well as the qualitative soundness of the generated summaries.
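The extractive selection step common to sentence-based summarizers can be sketched as a greedy coverage heuristic (the toy sentences are invented, and plain words stand in for the Yago-based entity recognition and disambiguation the paper actually integrates):

```python
# Toy "document collection": each string is one candidate sentence.
docs = [
    "solar power adoption grows quickly",
    "wind and solar power cut emissions",
    "emissions fall as wind power grows",
]

def summarize(k=2):
    """Greedily pick k sentences, each maximizing the number of not-yet-
    covered terms; this rewards salience while penalizing redundancy.
    A semantics-aware summarizer would count disambiguated entities
    instead of raw words."""
    covered, summary = set(), []
    remaining = list(docs)
    for _ in range(k):
        best = max(remaining, key=lambda d: len(set(d.split()) - covered))
        summary.append(best)
        covered |= set(best.split())
        remaining.remove(best)
    return summary
```

The ontology-based step described in the abstract would replace `d.split()` with entities resolved against Yago, so that "solar power" and "photovoltaics", for example, count as the same concept.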
The analysis of medical data is a challenging task for health care systems since a huge amount of interesting knowledge can be automatically mined to effectively support both physicians and health care organizations. This paper proposes a data analysis framework based on a multiple-level clustering technique to identify the examination pathways commonly followed by patients with a given disease. This knowledge can support health care organizations in evaluating the medical treatments usually adopted, and thus the incurred costs. The proposed multiple-level strategy allows clustering patient examination datasets with a variable distribution. To measure the relevance of specific examinations for a given disease complication, patient examination data has been represented in the Vector Space Model using the TF-IDF method. As a case study, the proposed approach has been applied to the diabetic care scenario. The experimental validation, performed on a real collection of diabetic patients, demonstrates the effectiveness of the approach in identifying groups of patients with a similar examination history and increasing severity in diabetes complications. 相似文献
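The TF-IDF representation described above can be sketched as follows (patient identifiers and examination names are invented, and the standard TF-IDF weighting shown here may differ in detail from the paper's variant):

```python
from math import log

# Hypothetical examination histories: one "document" per patient listing
# the examination types performed (names invented for illustration).
patients = {
    "p1": ["glycemia", "glycemia", "eye-exam"],
    "p2": ["glycemia", "creatinine"],
    "p3": ["creatinine", "eye-exam", "eye-exam"],
}

def tfidf(patient, exam):
    """TF-IDF weight of an examination in a patient's history: frequent
    within the patient, rare across the patient population."""
    history = patients[patient]
    tf = history.count(exam) / len(history)          # term frequency
    df = sum(exam in h for h in patients.values())   # document frequency
    return tf * log(len(patients) / df)              # tf * idf
```

Examinations prescribed to nearly every patient thus get a low weight, while examinations specific to a disease complication stand out in the resulting vectors, which the multiple-level clustering can then group.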