VisTrails: Using Provenance to Streamline Data Exploration
Juliana Freire
Department of Computer Science
University of Utah, USA
Abstract
The volume of information is growing at an exponential rate. One of
the greatest scientific and engineering challenges of the 21st
century is to effectively understand and leverage this growing
wealth of data. To analyze and understand large volumes of data,
complex computational processes need to be assembled, be it to
mine the data or to create insightful visualizations. Recently,
workflows have emerged as a paradigm for representing and managing
complex computations. Workflows can capture complex analysis processes
at various levels of detail and provide the provenance information
necessary for reproducibility, result publication and sharing among
collaborators.
In this presentation, we will give an overview and a demo of VisTrails,
a new provenance management system that provides infrastructure
for data exploration and visualization through workflows. Whereas workflows have been traditionally
used to automate repetitive tasks, for applications that are exploratory
in nature, change is the norm. As a scientist generates and evaluates
hypotheses about data under study, a series of different, albeit
related, workflows are created while the computational task is
adjusted in an interactive process. VisTrails was designed to manage
rapidly evolving, exploratory computational tasks. By automatically
capturing detailed history information about the
exploration process and explicitly maintaining the relationships
among the workflows created, VisTrails not
only allows results to be reproduced, but it also enables users
to efficiently and effectively navigate through the space of workflows
used in an exploration task (e.g., to follow chains of reasoning
backward and forward). In addition, this provenance information
is used to simplify the creation and maintenance of workflows;
to optimize their execution; to provide scalable mechanisms for
collaborative exploration of large parameter spaces in a distributed
setting; and to provide infrastructure for knowledge sharing and reuse. As
an important goal of our project is to produce tools that domain
scientists who are not expert programmers can use, VisTrails provides
intuitive, point-and-click interfaces that allow users to interact
with and query the provenance information, including the ability
to visually compare different workflows and their results.
VisTrails has been released under an open-source license and can be downloaded
from http://www.vistrails.org.
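
To make the idea of capturing history and maintaining the relationships among workflows more concrete, the following is a minimal sketch of a version tree in which every workflow is stored as a change relative to its parent. It is an illustration under stated assumptions, not the VisTrails API: the names `Version`, `VersionTree`, `commit`, `materialize` and `diff`, the representation of a workflow as a flat dictionary of parameters, and the sample file name and parameter values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical model for illustration only; this is not the VisTrails API.
@dataclass
class Version:
    id: int
    parent: Optional[int]
    # One recorded change: (operation, key, value), e.g. ("set", "isovalue", 0.5)
    action: Tuple[str, str, object]


class VersionTree:
    """Stores each workflow as a single change applied to its parent version."""

    def __init__(self) -> None:
        # Version 0 is the empty root workflow.
        self.versions: Dict[int, Version] = {0: Version(0, None, ("noop", "", None))}
        self._next_id = 1

    def commit(self, parent: int, op: str, key: str, value: object = None) -> int:
        """Record a change made on top of `parent` and return the new version id."""
        if parent not in self.versions:
            raise KeyError(f"unknown parent version {parent}")
        vid = self._next_id
        self.versions[vid] = Version(vid, parent, (op, key, value))
        self._next_id += 1
        return vid

    def path_from_root(self, vid: int) -> List[int]:
        """Return the chain of version ids leading from the root to `vid`."""
        path: List[int] = []
        cur: Optional[int] = vid
        while cur is not None:
            path.append(cur)
            cur = self.versions[cur].parent
        return path[::-1]

    def materialize(self, vid: int) -> dict:
        """Rebuild a workflow by replaying every change from the root."""
        workflow: dict = {}
        for v in self.path_from_root(vid):
            op, key, value = self.versions[v].action
            if op == "set":
                workflow[key] = value
            elif op == "delete":
                workflow.pop(key, None)
        return workflow

    def diff(self, a: int, b: int) -> dict:
        """Map each differing key to its (value in a, value in b) pair."""
        wa, wb = self.materialize(a), self.materialize(b)
        keys = sorted(set(wa) | set(wb))
        return {k: (wa.get(k), wb.get(k)) for k in keys if wa.get(k) != wb.get(k)}


# Example exploration: two branches that try different parameter values.
tree = VersionTree()
v1 = tree.commit(0, "set", "reader", "head.vtk")
v2 = tree.commit(v1, "set", "isovalue", 0.5)
v3 = tree.commit(v1, "set", "isovalue", 0.8)
v4 = tree.commit(v3, "set", "colormap", "hot")

print(tree.materialize(v2))
print(tree.path_from_root(v4))
print(tree.diff(v2, v4))
```

Because every version keeps a link to its parent, any earlier result can be reproduced by replaying the recorded changes, and two branches of an exploration can be compared without storing full copies of each workflow.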