Similar Documents
20 similar documents retrieved; search time: 609 ms.
1.
Translational research in the Life Sciences nowadays leverages e-Science platforms to analyze and produce huge amounts of data. With the unprecedented growth of Life-Science data repositories, identifying relevant data for analysis becomes increasingly difficult. Instrumenting e-Science platforms with provenance tracking techniques provides useful information from a data analysis process design or debugging perspective. However, raw provenance traces are too massive and too generic to facilitate the scientific interpretation of data. In this paper, we propose an integrated approach in which Life-Science knowledge is (i) captured through domain ontologies and linked to Life-Science data analysis tools, and (ii) propagated through rules to the produced data, in order to constitute human-tractable experiment summaries. Our approach has been implemented in the Virtual Imaging Platform (VIP), and experimental results show the feasibility of producing a small number of domain-specific statements, which opens new data sharing and repurposing opportunities in line with Linked Data initiatives.
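As a rough illustration of the rule-propagation step (a minimal sketch, not the VIP implementation), the snippet below walks a toy provenance trace and attaches the domain terms linked to each analysis tool to everything derived from its outputs. The trace, tool names, and ontology terms are all invented.

```python
# A minimal sketch, not the VIP implementation: domain terms linked to each
# analysis tool are propagated along a toy provenance trace so that a produced
# result can be summarized with a handful of domain-specific statements.
# The trace, tool names and ontology terms below are invented.

# Provenance trace: each entity records the tool that generated it and the
# entities it was derived from.
trace = {
    "img_042":   {"generated_by": "mri_acquisition",    "derived_from": []},
    "seg_042":   {"generated_by": "tumor_segmentation", "derived_from": ["img_042"]},
    "stats_042": {"generated_by": "volume_statistics",  "derived_from": ["seg_042"]},
}

# Domain ontology terms linked to the analysis tools (assumed annotations).
tool_annotations = {
    "mri_acquisition":    {"domain:MRImage"},
    "tumor_segmentation": {"domain:TumorMask"},
    "volume_statistics":  {"domain:TumorVolume"},
}

def summarize(entity):
    """Collect domain statements for an entity and everything it derives from."""
    statements, todo = set(), [entity]
    while todo:
        current = todo.pop()
        record = trace[current]
        for term in tool_annotations.get(record["generated_by"], ()):
            statements.add((current, "rdf:type", term))
        todo.extend(record["derived_from"])
    return statements

for statement in sorted(summarize("stats_042")):
    print(statement)
```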

2.
Fast, massive, and viral data diffused on social media affects a large share of the online population, and thus the (prospective) information diffusion mechanisms behind it are of great interest to researchers. The (retrospective) provenance of such data is equally important because it contributes to the understanding of the relevance and trustworthiness of the information. Furthermore, computing provenance in a timely way is crucial for particular use cases and practitioners, such as online journalists who promptly need to assess specific pieces of information. Social media currently provide insufficient mechanisms for provenance tracking, publication and generation, while the state of the art in social media research focuses mainly on explicit diffusion mechanisms (like retweets on Twitter or reshares on Facebook). The implicit diffusion mechanisms remain understudied because they are difficult to capture and to understand properly. From a technical side, the state of the art in provenance reconstruction evaluates small datasets after the fact, sidestepping the scale and speed requirements of current social media data. In this paper, we investigate the mechanisms of implicit information diffusion by computing its fine-grained provenance. We prove that explicit mechanisms are insufficient to capture influence, and our analysis uncovers a significant part of the implicit interactions and influence in social media. Our approach works incrementally and can be scaled up to cover a truly Web-scale scenario such as major events. We can process datasets of up to several million messages on a single machine, at rates that cover bursty behaviour, without compromising result quality. In doing so, we provide online journalists, and social media users in general, with fine-grained provenance reconstruction that sheds light on implicit interactions not captured by social media providers. These results are provided in an online fashion, which also allows for fast relevance and trustworthiness assessment.
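A minimal sketch of the implicit-provenance idea, not the paper's algorithm: each incoming message is compared against a sliding window of recent messages and linked to earlier ones that share enough content, even when no explicit retweet or reshare edge exists. The window size, the token-overlap similarity, and the threshold are illustrative assumptions.

```python
# A minimal sketch, not the paper's algorithm: incrementally attribute the
# likely provenance of an incoming message to earlier messages that share
# content, even without an explicit retweet/reshare link. The window size,
# similarity measure and threshold are illustrative assumptions.
from collections import deque

WINDOW = 1000          # how many recent messages to keep in memory
MIN_OVERLAP = 0.5      # Jaccard overlap needed to claim an implicit link

recent = deque(maxlen=WINDOW)   # (message_id, token set), oldest first

def ingest(message_id, text):
    """Return ids of earlier messages this one implicitly derives from."""
    tokens = set(text.lower().split())
    sources = []
    for earlier_id, earlier_tokens in recent:
        overlap = len(tokens & earlier_tokens) / len(tokens | earlier_tokens)
        if overlap >= MIN_OVERLAP:
            sources.append(earlier_id)
    recent.append((message_id, tokens))
    return sources

print(ingest("m1", "breaking bridge closed after flooding downtown"))
print(ingest("m2", "bridge closed downtown after flooding avoid the area"))
```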

3.
4.
The ability to generate crew pairings quickly is essential to solving the airline crew scheduling problem. Although techniques for doing so are well established, they are also highly customized and require significant implementation effort. This greatly impedes researchers studying important problems such as robust planning, integrated planning, and automated recovery, all of which also require the generation of crew pairings. As an alternative, we present an integer programming (IP) approach to generating crew pairings, which can be solved via traditional methods such as branch-and-bound using off-the-shelf commercial solvers. This greatly facilitates the prototyping and testing of new research ideas. In addition, we suggest that our modeling approach, which uses both connection variables and marker variables to capture the non-linear cost function and constraints of the crew scheduling problem, can be applicable in other scheduling contexts as well. Computational results using data from a major US hub-and-spoke carrier demonstrate the performance of our approach.
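The paper formulates pairing generation as an IP with connection and marker variables solved by off-the-shelf branch-and-bound; the toy sketch below only illustrates the underlying structure by enumerating connection choices directly and applying a made-up non-linear duty cost. The flights, times, base, and cost rule are invented.

```python
# A toy sketch of crew-pairing generation. The paper uses an IP with connection
# and marker variables solved by branch-and-bound; here we simply enumerate
# connection choices to show the structure. Flights, times and the non-linear
# duty cost below are made-up assumptions.
BASE = "ORD"
# flight: (origin, destination, departure hour, arrival hour)
flights = {
    "F1": ("ORD", "DFW", 8, 10),
    "F2": ("DFW", "ORD", 11, 13),
    "F3": ("ORD", "LGA", 9, 11),
    "F4": ("LGA", "ORD", 12, 14),
}
MIN_CONNECT = 1  # hours required between arrival and the next departure

def can_connect(a, b):
    return flights[a][1] == flights[b][0] and flights[b][2] >= flights[a][3] + MIN_CONNECT

def duty_cost(pairing):
    """Non-linear cost: pay the larger of flown hours and a 4-hour guarantee."""
    flown = sum(flights[f][3] - flights[f][2] for f in pairing)
    return max(flown, 4)

def pairings(prefix):
    """Yield pairings that start and end at BASE, built by choosing connections."""
    last = prefix[-1]
    if flights[last][1] == BASE:
        yield list(prefix)
    for nxt in flights:
        if nxt not in prefix and can_connect(last, nxt):
            yield from pairings(prefix + [nxt])

all_pairings = [p for f in flights if flights[f][0] == BASE for p in pairings([f])]
for p in all_pairings:
    print(p, "cost:", duty_cost(p))
```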

5.
Scientific workflows are a popular mechanism for specifying and automating data-driven in silico experiments. A significant aspect of their value lies in their potential to be reused. Once shared, workflows become useful building blocks that can be combined or modified for developing new experiments. However, previous studies have shown that storing workflow specifications alone is not sufficient to ensure that they can be successfully reused, without being able to understand what the workflows aim to achieve or to re-enact them. To gain an understanding of a workflow, and how it may be used and repurposed for their needs, scientists require access to additional resources such as annotations describing the workflow, datasets used and produced by the workflow, and provenance traces recording workflow executions. In this article, we present a novel approach to the preservation of scientific workflows through the application of research objects: aggregations of data and metadata that enrich the workflow specifications. Our approach is realised as a suite of ontologies that support the creation of workflow-centric research objects. Their design was guided by requirements elicited from previous empirical analyses of workflow decay and repair. The ontologies developed make use of and extend existing well-known ontologies, namely the Object Reuse and Exchange (ORE) vocabulary, the Annotation Ontology (AO) and the W3C PROV ontology (PROV-O). We illustrate the application of the ontologies for building Workflow Research Objects with a case study that investigates Huntington's disease, performed in collaboration with a team from the Leiden University Medical Centre (HG-LUMC). Finally, we present a number of tools developed for creating and managing workflow-centric research objects.
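The sketch below shows, in a very simplified form, what a workflow-centric research object aggregates: the workflow specification together with the annotations, datasets, and provenance traces needed to understand and re-run it. The property names are plain stand-ins rather than the actual ORE, AO, or PROV-O terms, and the file names are invented.

```python
# A minimal sketch of a workflow-centric research object as a simple manifest.
# Property names are simplified stand-ins, not the real ORE/AO/PROV-O terms.
import json

research_object = {
    "@id": "ro:huntington-analysis",
    "aggregates": [
        {"@id": "workflow.t2flow", "type": "Workflow"},
        {"@id": "inputs/patient-variants.csv", "type": "Dataset"},
        {"@id": "runs/2013-05-01-provenance.ttl", "type": "ProvenanceTrace"},
    ],
    "annotations": [
        {"about": "workflow.t2flow",
         "body": "Ranks candidate genes associated with Huntington's disease."},
    ],
}

print(json.dumps(research_object, indent=2))
```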

6.
7.
Our understanding of distributed decision making in professional teams and their performance comes in part from studies in which researchers gather and process information about the communications and actions of teams. In many cases, the data sets available for analysis are large, unwieldy and require methods for exploratory and dynamic management of data. In this paper, we report the results of interviewing eight researchers on their work process when conducting such analyses and their use of support tools in this process. Our aim with the study was to gain an understanding of their workflow when studying distributed decision making in teams, and specifically how automated pattern extraction tools could be of use in their work. Based on an analysis of the interviews, we elicited three issues of concern related to the use of support tools in analysis: focusing on a subset of data to study, drawing conclusions from data and understanding tool limitations. Together, these three issues point to two observations regarding tool use that are of specific relevance to the design of intelligent support tools based on pattern extraction: open-endedness and transparency.

8.
The research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia archives and data streams. To satisfy the increasing computational demands of emerging MMCA problems, there is an urgent need to apply High Performance Computing (HPC) techniques. As most MMCA researchers are not also HPC experts, however, there is a demand for programming models and tools that are both efficient and easy to use. Existing user-transparent parallelization tools generally use a data parallel approach in which data structures (e.g. video frames) are scattered among the available nodes in a compute cluster. For certain MMCA applications, however, a data parallel approach induces intensive communication, which significantly decreases performance. In these situations, we can benefit from applying alternative approaches. We present Pyxis-DT, a user-transparent parallel programming model for MMCA applications that employs both data and task parallelism. Hybrid parallel execution is obtained by run-time construction and execution of a task graph consisting of strictly defined building-block operations. Results show that for realistic MMCA applications the concurrent use of data and task parallelism can significantly improve performance compared to using either approach in isolation. Extensions for GPU clusters are also presented.
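A minimal sketch, in the spirit of (but much simpler than) Pyxis-DT, of combining data and task parallelism for an MMCA-style pipeline: two independent building-block operations run as separate tasks, and each also splits its frames across worker processes. The operations and frame data are invented stand-ins.

```python
# A minimal sketch of hybrid data and task parallelism for an MMCA-style
# pipeline. The per-frame operations and frame contents are made up.
from concurrent.futures import ProcessPoolExecutor

def extract_color_histogram(frame):
    return sum(frame) % 256          # stand-in for a per-frame feature

def detect_edges(frame):
    return max(frame) - min(frame)   # stand-in for another per-frame feature

frames = [list(range(i, i + 8)) for i in range(100)]

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # Task parallelism: both operations are submitted at once.
        # Data parallelism: map() scatters the frames across worker processes.
        histograms = pool.map(extract_color_histogram, frames, chunksize=25)
        edges = pool.map(detect_edges, frames, chunksize=25)
        print(sum(histograms), sum(edges))
```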

9.
10.
Data provenance refers to the knowledge about data sources and the operations carried out to obtain some piece of data. A provenance-enabled system maintains a record of the interoperation of processes across different modules, stages and authorities to capture the full lineage of the resulting data, and typically allows data-focused audits using semantic technologies, such as ontologies, that capture domain knowledge. However, regulating access to captured provenance data is a non-trivial problem, since execution records form complex, overlapping graphs whose individual nodes may be subject to different access policies. Applying traditional access control to provenance queries can either hide the entire graph from a user who is denied access to some of its nodes, reveal too much information, or return a semantically invalid graph. An alternative approach is to answer queries with a new graph that abstracts over the missing nodes and fragments. In this paper, we present TACLP, an access control language for provenance data that supports this approach, together with an algorithm that transforms graphs according to sets of access restrictions. The algorithm produces safe and valid provenance graphs that retain the maximum amount of information allowed by the security model. The approach is demonstrated on an example of restricting access to a clinical trial provenance trace.
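A minimal sketch, not TACLP itself, of the abstraction idea: a provenance query is answered with a transformed graph in which nodes the requester may not see are replaced by opaque placeholders, so the derivation structure stays connected while the restricted identities are hidden. The nodes and the policy are made up.

```python
# A minimal sketch (not TACLP) of answering a provenance query under access
# restrictions by abstracting over restricted nodes. Nodes and policy are
# invented.
edges = [  # (derived entity, source entity)
    ("report", "analysis"),
    ("analysis", "lab_results"),
    ("lab_results", "patient_record"),
]
restricted = {"lab_results"}   # nodes this requester is not allowed to see

def redact(edges, restricted):
    """Return the graph with restricted node identities abstracted away."""
    mapping = {n: f"restricted-{i}" for i, n in enumerate(sorted(restricted), 1)}
    rename = lambda n: mapping.get(n, n)
    return [(rename(a), rename(b)) for a, b in edges]

for edge in redact(edges, restricted):
    print(edge)
```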

11.
The traditional approach for estimating the performance of numerical methods is to combine an operation count with an asymptotic error analysis. This analytic approach gives a general feel for the comparative efficiency of methods, but it rarely leads to very precise results. It is now recognized that accurate performance evaluation can be made only with actual measurements on working software. Given that such an approach requires an enormous amount of performance data from actual measurements, the development of novel approaches and systems that intelligently and efficiently analyze these data is of great importance to scientists and engineers. The paper presents intelligent knowledge acquisition approaches and an integrated prototype system that enables the automatic and systematic analysis of performance data. The system analyzes the performance data, which is usually stored in a database, with statistical and inductive learning techniques, and generates knowledge that can be incorporated into a knowledge base incrementally. We demonstrate the use of the system in the context of a case study covering the analysis of numerical algorithms for the pricing of American vanilla options in a Black-Scholes modeling framework. We also present a qualitative and quantitative comparison of two techniques used for the automated knowledge acquisition phase.
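As a rough illustration of turning raw measurements into reusable knowledge (a sketch only, not the system described above), the snippet below groups timings by problem size and induces a simple "prefer method X for this size" rule. The methods, sizes, and timings are invented, not actual option-pricing results.

```python
# A minimal sketch: induce simple rules from raw performance measurements.
# Methods, sizes and timings below are invented.
from collections import defaultdict

# (method, problem size, measured runtime in seconds)
measurements = [
    ("binomial_tree", 100, 0.02), ("finite_difference", 100, 0.05),
    ("binomial_tree", 1000, 0.9), ("finite_difference", 1000, 0.4),
    ("binomial_tree", 10000, 95.0), ("finite_difference", 10000, 6.1),
]

best = defaultdict(lambda: (None, float("inf")))
for method, size, runtime in measurements:
    if runtime < best[size][1]:
        best[size] = (method, runtime)

knowledge_base = [f"for problem size {size}, prefer {method}"
                  for size, (method, _) in sorted(best.items())]
print("\n".join(knowledge_base))
```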

12.
Provenance refers to the entire body of information, comprising all the elements and their relationships, that contributes to the existence of a piece of data. Knowledge of provenance data enables a great number of benefits, such as verifying a product, reproducing results, sharing and reusing knowledge, or assessing data quality and validity. With such tangible benefits, it is no wonder that in recent years research on provenance has grown exponentially and has been applied to a wide range of scientific disciplines. Some years ago, managing and recording provenance information were performed manually. Given the huge volume of information available nowadays, the manual performance of such tasks is no longer an option. The problem of systematically performing tasks such as the understanding, capture and management of provenance has gained significant attention from the research community and industry over the past decades. As a consequence, there has been a huge number of contributions and proposed provenance systems as solutions for performing such tasks. The overall objective of this paper is to map the landscape of published systems in the field of provenance, with two main purposes. First, we seek to evaluate the desired characteristics that provenance systems are expected to have. Second, we aim at identifying a set of representative systems (both early and recent) to be exhaustively analyzed according to such characteristics. In particular, we have performed a systematic literature review of studies, identifying a comprehensive set of 105 relevant resources in all. The results show that there are common aspects or characteristics of provenance systems that are widely recognized throughout the literature on the topic. Based on these results, we have defined a six-dimensional taxonomy of provenance characteristics covering general aspects, data capture, data access, subject, storage, and non-functional aspects. Additionally, the study found that there are 25 most-referenced provenance systems within the provenance context. This study exhaustively analyzes and compares these systems according to our taxonomy and pinpoints future directions.

13.
The primary goal of visual data exploration tools is to enable the discovery of new insights. To justify and reproduce insights, the discovery process needs to be documented and communicated. A common approach to documenting and presenting findings is to capture visualizations as images or videos. Images, however, are insufficient for telling the story of a visual discovery, as they lack full provenance information and context. Videos are difficult to produce and edit, particularly due to the non-linear nature of the exploratory process. Most importantly, however, neither approach provides the opportunity to return to any point in the exploration in order to review the state of the visualization in detail or to conduct additional analyses. In this paper we present CLUE (Capture, Label, Understand, Explain), a model that tightly integrates data exploration and presentation of discoveries. Based on provenance data captured during the exploration process, users can extract key steps, add annotations, and author "Vistories", visual stories based on the history of the exploration. These Vistories can be shared for others to view, but also to retrace and extend the original analysis. We discuss how the CLUE approach can be integrated into visualization tools and provide a prototype implementation. Finally, we demonstrate the general applicability of the model in two usage scenarios: a Gapminder-inspired visualization to explore public health data and an example from molecular biology that illustrates how Vistories could be used in scientific journals.
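A minimal sketch of the capture-and-author idea behind CLUE, with invented state contents: every exploration step is recorded together with the state it came from, the user annotates the interesting ones, and a "Vistory" is simply the ordered chain of annotated states that can later be revisited.

```python
# A minimal sketch of CLUE-style capture and authoring; the recorded actions
# and annotations below are invented.
from dataclasses import dataclass
from typing import Optional

@dataclass
class State:
    step: int
    action: str
    parent: Optional[int]
    annotation: Optional[str] = None

history = []

def capture(action, parent=None):
    """Record one exploration step and return its identifier."""
    history.append(State(step=len(history), action=action, parent=parent))
    return history[-1].step

s0 = capture("load public health dataset")
s1 = capture("filter to years 2000-2010", parent=s0)
s2 = capture("plot life expectancy against income", parent=s1)
history[s2].annotation = "Key step: correlation becomes visible here"

vistory = [s for s in history if s.annotation is not None]
for s in vistory:
    print(f"step {s.step}: {s.action} -- {s.annotation}")
```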

14.
Scientific communities are under increasing pressure from funding organizations to publish their raw data, in addition to their traditional publications, in open archives. Many scientists would be willing to do this if they had tools that streamlined the process and exposed simple provenance information, i.e., enough to explain the methodology and validate the results without compromising the author's intellectual property or competitive advantage. This paper presents Provenance Explorer, a tool that enables the provenance trail associated with a scientific discovery process to be visualized and explored through a graphical user interface (GUI). Based on RDF graphs, it displays the sequence of data, states and events associated with a scientific workflow, illustrating the methodology that led to the published results. The GUI also allows permitted users to expand selected links between nodes to reveal more fine-grained information and sub-workflows. But more importantly, the system enables scientists to selectively construct "scientific publication packages" by choosing particular nodes from the visual provenance trail and dragging-and-dropping them into an RDF package which can be uploaded to an archive or repository for publication or e-learning. The provenance relationships between the individual components in the package are automatically inferred using a rules-based inferencing engine.
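A minimal sketch of assembling a publication package from a provenance trail, with an invented trail: the user picks a few nodes, and the relationships between picked nodes are inferred by following derivation chains through the nodes that were left out. This is a simple stand-in for the rules-based inferencing engine mentioned above, not its actual implementation.

```python
# A minimal sketch: infer wasDerivedFrom links between user-selected nodes of
# a provenance trail. The trail and node names are invented.
provenance = {            # node -> nodes it was directly derived from
    "figure3": ["fit_results"],
    "fit_results": ["cleaned_data"],
    "cleaned_data": ["raw_measurements"],
    "raw_measurements": [],
}
selected = {"figure3", "raw_measurements"}   # nodes dragged into the package

def derives_from(node, target, graph):
    """True if `target` is reachable from `node` via derivation edges."""
    frontier = list(graph[node])
    while frontier:
        current = frontier.pop()
        if current == target:
            return True
        frontier.extend(graph[current])
    return False

package = [(a, "wasDerivedFrom", b)
           for a in selected for b in selected
           if a != b and derives_from(a, b, provenance)]
print(package)   # [('figure3', 'wasDerivedFrom', 'raw_measurements')]
```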

15.
Cryptographic algorithms play a key role in computer security, and the formal analysis of their robustness is of utmost importance. Yet logic and automated reasoning tools are seldom used in the analysis of a cipher, and thus one often cannot get the desired formal assurance that the cipher is free from unwanted properties that may weaken its strength. In this paper, we claim that one can feasibly encode the low-level properties of state-of-the-art cryptographic algorithms as SAT problems and then use efficient automated theorem-proving systems and SAT solvers for reasoning about them. We call this approach logical cryptanalysis. In this framework, for instance, finding a model for a formula encoding an algorithm is equivalent to finding a key with a cryptanalytic attack. Other important properties, such as cipher integrity or algebraic closure, can also be captured as SAT problems or as quantified Boolean formulae. SAT benchmarks based on the encoding of cryptographic algorithms can be used to effectively combine features of real-world problems and randomly generated problems. Here we present a case study on the U.S. Data Encryption Standard (DES) and show how to obtain a manageable encoding of its properties. We have also tested three SAT provers, TABLEAU by Crawford and Auton, SATO by Zhang, and rel-SAT by Bayardo and Schrag, on the encoding of DES, and we discuss the reasons behind their different performance. A discussion of open problems and future research concludes the paper.
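A toy illustration of the "model finding equals key finding" idea on a made-up 4-bit XOR cipher, not on DES: a known plaintext/ciphertext pair is encoded as CNF clauses over key-bit variables, and any satisfying assignment recovers the key. The cipher, the variable numbering, and the brute-force "solver" are all illustrative assumptions.

```python
# A toy sketch of logical cryptanalysis on a made-up 4-bit XOR cipher (not DES):
# encode a known plaintext/ciphertext pair as CNF over key-bit variables and
# find a model, which is the key. Brute force stands in for a real SAT solver.
from itertools import product

plaintext  = [1, 0, 1, 1]
ciphertext = [0, 0, 1, 0]          # ciphertext = plaintext XOR key

# Variables 1..4 are key bits; each clause asserts k_i = p_i XOR c_i.
cnf = []
for i, (p, c) in enumerate(zip(plaintext, ciphertext), start=1):
    cnf.append([i] if p ^ c else [-i])   # unit clause forcing the key bit

def satisfies(assignment, cnf):
    return all(any(assignment[abs(l)] == (l > 0) for l in clause) for clause in cnf)

# Brute force is enough for 4 variables; a real attack hands a much larger CNF
# (e.g. of the DES round functions) to an off-the-shelf SAT solver.
for bits in product([False, True], repeat=4):
    assignment = {i + 1: b for i, b in enumerate(bits)}
    if satisfies(assignment, cnf):
        print("recovered key:", [int(assignment[i]) for i in range(1, 5)])
        break
```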

16.
Since feature models for realistic product families may be quite complicated, automated analysis of feature models is desirable. Although several approaches reported in the literature address this issue, complex cross-tree relationships involving attributes in extended feature models have not been handled. In this article, we introduce a mapping from extended feature models to constraint logic programming over finite domains. This mapping is used to translate basic, cardinality-based, and extended feature models, which can include complex cross-tree relationships involving attributes, into constraint logic programs. This translation enables the use of off-the-shelf constraint solvers for the automated analysis of extended feature models involving such complex relationships. We also present the performance results of some well-known analysis operations on an example translated model.
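A toy sketch of what an analysis operation over an extended feature model involves. The article maps such models to constraint logic programming over finite domains and hands them to off-the-shelf CLP solvers; the sketch below merely brute-forces a tiny invented model with one attribute-based cross-tree constraint to show the kind of question being answered.

```python
# A toy sketch, not the article's CLP(FD) mapping: brute-force a tiny extended
# feature model with an attribute-based cross-tree constraint. The features,
# attribute values and constraints are invented.
from itertools import product

features = ["camera", "gps", "hd_screen"]
attributes = {"camera": {"cost": 40}, "gps": {"cost": 30}, "hd_screen": {"cost": 50}}

def valid(sel):
    # Cross-tree relationship involving an attribute (illustrative):
    # a camera requires an HD screen, and total cost must stay under 100.
    if sel["camera"] and not sel["hd_screen"]:
        return False
    cost = sum(attributes[f]["cost"] for f in features if sel[f])
    return cost <= 100

configs = [dict(zip(features, bits)) for bits in product([False, True], repeat=3)]
valid_configs = [c for c in configs if valid(c)]
print(f"{len(valid_configs)} valid configurations")   # "number of products" analysis
for c in valid_configs:
    print([f for f in features if c[f]])
```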

17.
Historically, software development methodologies have focused more on improving tools for system development than on developing tools that assist with system composition and integration. Component-based middleware like Enterprise JavaBeans (EJB), Microsoft .NET, and the CORBA Component Model (CCM) have helped improve software reusability through component abstraction. However, as developers have adopted these commercial off-the-shelf technologies, a wide gap has emerged between the availability and sophistication of standard software development tools like compilers and debuggers, and the tools that developers use to compose, analyze, and test a complete system or system of systems. As a result, developers continue to accomplish system integration using ad hoc methods without the support of automated tools. Model-driven development (MDD) is an emerging paradigm that solves numerous problems associated with the composition and integration of large-scale systems while leveraging advances in software development technologies such as component-based middleware. MDD elevates software development to a higher level of abstraction than is possible with third-generation programming languages.

18.
Context: Continuous Integration (CI) has become an established best practice of modern software development. Its philosophy of regularly integrating the changes of individual developers with the master code base saves the entire development team from descending into Integration Hell, a term coined in the field of extreme programming. In practice, CI is supported by automated tools to cope with this repeated integration of source code through automated builds and testing. One of the main problems, however, is that relevant information about the quality and health of a software system is scattered both across those tools and across multiple views. Objective: This paper introduces a quality awareness framework for CI data and its conceptual model used for data integration and visualization. The framework, called SQA-Mashup, makes use of the service-based mashup paradigm and integrates information from the entire CI toolchain into a single service. Method: The research approach followed in our work consists of (i) a conceptual model for data integration and visualization, (ii) a prototypical framework implementation based on tool requirements derived from the literature, and (iii) a controlled user study to evaluate its usefulness. Results: The results of the controlled user study showed that SQA-Mashup's single point of access allows users to answer questions regarding the state of a system more quickly (57%) and accurately (21.6%) than with standalone CI tools. Conclusions: The SQA-Mashup framework can serve as a one-stop shop for software quality data monitoring in a software development project. It enables easy access to CI data which otherwise is not integrated but scattered across multiple CI tools. Our dynamic visualization approach allows for tailoring of integrated CI data according to the information needs of different stakeholders such as developers or testers.
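A minimal sketch of the mashup idea only, not the SQA-Mashup service itself: quality data that would otherwise be scattered across CI tools is pulled into one integrated summary. The tool names and payloads below are hard-coded stand-ins for real service calls.

```python
# A minimal sketch of a single point of access over several CI data sources.
# The fetch_* functions are invented stand-ins for real CI service calls.
def fetch_build_status():      # e.g. from a build server
    return {"last_build": "passed", "duration_s": 312}

def fetch_test_results():      # e.g. from a test runner
    return {"tests": 1482, "failures": 3}

def fetch_static_analysis():   # e.g. from a code-quality service
    return {"open_issues": 27, "coverage_pct": 78.4}

def quality_dashboard():
    """Combine the whole (stubbed) CI toolchain into one integrated view."""
    return {
        "build": fetch_build_status(),
        "tests": fetch_test_results(),
        "analysis": fetch_static_analysis(),
    }

for section, data in quality_dashboard().items():
    print(section, data)
```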

19.
Hospital information systems designed to support the needs of health care professionals include patient data entered using both freetext and precoded storage schemes. A major disadvantage of freetext storage schemes is that data captured in this format can only be presented as-is to the user for review tasks. In the view of many health care scientists, natural language understanding systems capable of identifying, extracting, and encoding information contained in freetext data may provide the necessary tools to overcome this weakness. This paper describes the development and evaluation of such a system designed to encode freetext admission diagnoses. The system combines both semantic and syntactic linguistic analysis techniques. Evaluation results demonstrate the overall performance of this system to be reasonable, accurately encoding approximately 76% of admission diagnoses. Inefficiencies are primarily due to the inability of the system to generate encodings in roughly 15% of test cases. When encodings are produced, however, accuracy equals that of the current manual coding method. With further modification, this application can partially automate the coding process.
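A toy sketch of the encoding task only, far simpler than the semantic and syntactic analysis the paper describes: a small lexicon maps freetext phrases (including synonyms) to codes, and entries that match nothing are flagged as unencodable. The lexicon and codes are invented examples, not a real coding scheme.

```python
# A toy sketch of encoding freetext admission diagnoses with a small lexicon.
# Phrases and codes below are invented, not a real coding scheme.
lexicon = {                     # phrase -> invented diagnosis code
    "myocardial infarction": "D410",
    "heart attack": "D410",     # synonym maps to the same concept
    "pneumonia": "D486",
    "fracture": "D829",
}

def encode(freetext):
    """Return codes for every lexicon phrase found in the freetext entry."""
    text = freetext.lower()
    codes = {code for phrase, code in lexicon.items() if phrase in text}
    return sorted(codes) or ["UNENCODABLE"]   # the cases the system cannot code

print(encode("Admitted with acute myocardial infarction"))
print(encode("Possible hairline fracture, left wrist"))
print(encode("Unspecified malaise"))
```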

20.
The Search Space Toolkit (SST) is a suite of tools for investigating the properties of the continuous search spaces which arise in designing complex engineering artifacts whose evaluation requires significant computation by a numerical simulator. SST has been developed as part of NDA, a computational environment for (semi-)automated design of jet engine exhaust nozzles for supersonic aircraft, which resulted from a collaboration between computer scientists at Rutgers University and design engineers at General Electric and Lockheed. Though the design spaces for this sort of engineering artifact are mainly continuous, they typically include features such as unevaluable points, multiple local optima, and large derivatives which cause difficulties for standard numerical optimization methods. The search spaces which SST explores also differ significantly from the discrete search spaces that typically arise in artificial intelligence research; properly searching such spaces requires a synergistic combination of numerical methods and AI techniques and is a fundamental AI research area. By promoting the design space to be a first-class entity, rather than a "black box" buried in the interface between an (unconstrained) optimizer and a simulator, SST allows a more principled approach to automated design.
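A minimal sketch of treating a design space as a first-class object rather than a black box, not SST itself: the space is sampled on a grid, a made-up "simulator" may refuse to evaluate some points, and both the evaluated values and the unevaluable region are recorded so a search strategy can reason about them. The nozzle-like objective and parameter ranges are invented.

```python
# A minimal sketch of probing a continuous design space with unevaluable
# points. The "simulator", its objective and the parameter grid are invented.
import math

def simulator(throat, exit_area):
    """Toy stand-in for an expensive simulation; raises on unevaluable points."""
    if exit_area <= throat:          # physically meaningless configuration
        raise ValueError("unevaluable design point")
    return math.log(exit_area / throat) - 0.1 * throat

evaluated, unevaluable = {}, []
for throat in [0.5, 1.0, 1.5, 2.0]:
    for exit_area in [0.5, 1.0, 2.0, 4.0]:
        try:
            evaluated[(throat, exit_area)] = simulator(throat, exit_area)
        except ValueError:
            unevaluable.append((throat, exit_area))

best = max(evaluated, key=evaluated.get)
print("best evaluable point:", best, "objective:", round(evaluated[best], 3))
print("unevaluable points:", unevaluable)
```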
