Invited Talks

VisTrails: Using Provenance to Streamline Data Exploration

Juliana Freire

Department of Computer Science

University of Utah, USA


Abstract

The volume of information is growing at an exponential rate. One of the greatest scientific and engineering challenges of the 21st century is to effectively understand and leverage this growing wealth of data. To analyze and understand large volumes of data, complex computational processes must be assembled, whether to mine the data or to create insightful visualizations. Recently, workflows have emerged as a paradigm for representing and managing complex computations. Workflows can capture complex analysis processes at various levels of detail and provide the provenance information necessary for reproducibility, publication of results, and sharing among collaborators.

In this presentation, we will give an overview and a demo of VisTrails, a new provenance management system that provides infrastructure for data exploration and visualization through workflows. Whereas workflows have traditionally been used to automate repetitive tasks, in applications that are exploratory in nature change is the norm. As a scientist generates and evaluates hypotheses about the data under study, a series of different, albeit related, workflows is created as the computational task is adjusted interactively. VisTrails was designed to manage rapidly evolving, exploratory computational tasks. By automatically capturing detailed history information about the exploration process and explicitly maintaining the relationships among the workflows created, VisTrails not only allows results to be reproduced but also enables users to navigate efficiently and effectively through the space of workflows used in an exploration task (e.g., to follow chains of reasoning backward and forward). In addition, VisTrails uses this provenance information to simplify the creation and maintenance of workflows, to optimize their execution, to provide scalable mechanisms for collaborative exploration of large parameter spaces in a distributed setting, and to provide infrastructure for knowledge sharing and reuse. Because an important goal of our project is to produce tools that domain scientists who are not expert programmers can use, VisTrails provides intuitive point-and-click interfaces that let users interact with and query the provenance information, including the ability to visually compare different workflows and their results.
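The abstract does not say how VisTrails stores exploration history internally. The following is a hypothetical sketch, assuming a change-based model in which each workflow version is derived from its parent by one recorded action (adding a module, deleting a module, or setting a parameter). It shows how such a history can support reproduction (replaying actions), backward and forward navigation, and comparison of two workflows. All names (`VersionTree`, `Action`, `materialize`, `lineage`, `children`, `diff`, and the action kinds) are invented for illustration and are not the VisTrails API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical workflow model: module name -> {parameter: value}.
Workflow = Dict[str, Dict[str, str]]


@dataclass(frozen=True)
class Action:
    """One recorded change to a workflow (action kinds are assumptions)."""
    kind: str                    # "add_module", "delete_module" or "set_param"
    module: str
    param: Optional[str] = None
    value: Optional[str] = None


class VersionTree:
    """Each version is its parent plus one action; version 0 is the empty root."""

    def __init__(self) -> None:
        self.parent: Dict[int, Optional[int]] = {0: None}
        self.action: Dict[int, Optional[Action]] = {0: None}

    def add_version(self, parent: int, action: Action) -> int:
        if parent not in self.parent:
            raise KeyError(f"unknown version {parent}")
        version = len(self.parent)  # versions are numbered 0, 1, 2, ...
        self.parent[version] = parent
        self.action[version] = action
        return version

    def lineage(self, version: int) -> List[int]:
        """Backward navigation: the chain of versions from the root to `version`."""
        path: List[int] = []
        current: Optional[int] = version
        while current is not None:
            path.append(current)
            current = self.parent[current]
        return path[::-1]

    def children(self, version: int) -> List[int]:
        """Forward navigation: versions derived directly from `version`."""
        return [v for v, p in self.parent.items() if p == version]

    def materialize(self, version: int) -> Workflow:
        """Rebuild a workflow by replaying the actions along its lineage."""
        workflow: Workflow = {}
        for v in self.lineage(version):
            a = self.action[v]
            if a is None:  # the root carries no action
                continue
            if a.kind == "add_module":
                workflow[a.module] = {}
            elif a.kind == "delete_module":
                workflow.pop(a.module, None)
            elif a.kind == "set_param":
                workflow[a.module][a.param] = a.value  # KeyError if module absent
            else:
                raise ValueError(f"unknown action kind {a.kind!r}")
        return workflow


def diff(a: Workflow, b: Workflow) -> Tuple[Set[str], Set[str], Dict[str, tuple]]:
    """Modules only in a, modules only in b, and shared modules whose parameters differ."""
    changed = {m: (a[m], b[m]) for m in a.keys() & b.keys() if a[m] != b[m]}
    return set(a) - set(b), set(b) - set(a), changed


# Usage: two alternative parameter settings branch from the same version.
tree = VersionTree()
v1 = tree.add_version(0, Action("add_module", "Reader"))
v2 = tree.add_version(v1, Action("add_module", "Isosurface"))
v3 = tree.add_version(v2, Action("set_param", "Isosurface", "level", "0.5"))
v4 = tree.add_version(v2, Action("set_param", "Isosurface", "level", "0.8"))

print(tree.lineage(v4))   # [0, 1, 2, 4]
print(tree.children(v2))  # [3, 4]
print(diff(tree.materialize(v3), tree.materialize(v4)))
# (set(), set(), {'Isosurface': ({'level': '0.5'}, {'level': '0.8'})})
```

In this sketch, storing actions rather than full copies keeps each version cheap, and branching (versions 3 and 4 both derive from version 2) records alternative explorations side by side instead of overwriting one with the other.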

VisTrails has been released under an open-source license and can be downloaded from http://www.vistrails.org.