Visualization Brings Data to Life
Driving a rover on Mars from a command room on Earth is not an easy or intuitive task. Scientists and engineers cannot observe the often-treacherous Martian environment in real time, so charting a course for a rover involves entering coordinates by trial and error and then simulating the route in a separate software program. This kind of cumbersome data-manipulation problem is what students in Caltech's Data Visualization program aim to tackle, with the goal of developing innovative software that streamlines how scientists visually manipulate their data.
The 10-week summer program brings together design students and computer science students from across the country.
In 2015, students developed a visualization tool to map possible paths for the Mars Curiosity rover, using colors and graphs to represent the amount of pitch (forward and backward tilt), roll (side-to-side tilt), and yaw (rotation around a vertical axis) that the rover would encounter along any given pathway. These data existed before, but had never been encoded in an interactive, visual way.
"These aren't static visualizations, or videos. They allow researchers to interact with their data in meaningful, intuitive ways," says Hillary Mushkin, research professor of art and design in the Division of Engineering and Applied Science and a cofounder of the program.
Mushkin founded the Data Visualization program in 2011 with three colleagues: Santiago Lombeyda, a computational scientist at the Center for Data-Driven Discovery; Scott Davidoff, manager of JPL's Human Interfaces Group; and Maggie Hendrie, department chair of the Interaction Design program at Art Center College of Design. Each year, the four receive proposals from researchers describing the problems they have interacting with their data, and three projects are chosen. Two design students and three computer science students are selected from a nationwide pool of applicants to work on the projects over the summer; each computer science student is responsible for a single project, while the two design students work on all three.
"We use a human-centered design methodology," Mushkin says. "This is a very iterative process. Students observe the researchers interacting with their data and learn what their workflow is, and how it could be improved. Design students create sketches and ask the researchers to 'interact' with them by pointing, talking, shuffling, and annotating the paper, while computer science students create rough drafts of a variety of possible approaches to coding the visualization."
Assistant Professor of Biology Mitch Guttman was among the researchers who submitted problems for the Data Visualization students to grapple with. His challenge: how to visualize, in three dimensions, the complicated folding of the human genome, the complete sequence of nucleotides that makes up a person's DNA.
"Prior to this project, the method we used to visualize the structure of the 3D genome provided no features that allowed for easy navigation and interpretation of various regions," says Noah Ollikainen, a postdoctoral scholar in the Guttman lab. As a result, he says, "it was very difficult to search for specific genes of interest and explore the structure of the genome surrounding that gene."
Over the 10-week program, students developed a software prototype called the 3D Genome Interface, or Gin for short, that enables biologists to explore genomic structures. Biologists can zoom in and out on genes of interest and discover relationships between these structures and features such as gene expression and protein localization. The software also allows the Guttman group to compare genome structures among different types of cells, and to color the entire structure according to gene-expression data.
"This visualization is important because it will allow biologists to understand the relationships between 3D genome structure and critical biological processes that occur in the cell nucleus, such as gene transcription and splicing," Ollikainen says. "Understanding these relationships will give us crucial insights into how cells decide which genes to turn on or off and how different cells in our bodies can exhibit a remarkable diversity of behaviors using the same genome sequence that can be folded into different 3D structures."
Such data visualization could, for example, allow biologists to compare the 3D genome structures of cancer cells and normal cells, which could lead to new insights into how cancer arises and inform the development of cancer-targeting drugs.
Most of the design and computer science students come from outside Caltech, but this past summer Caltech undergraduate Matthew Edwards was selected as the programmer for a project with Professor of Geophysics Mark Simons. His group's task was to develop interactive tools to visually represent the movement of ocean tides and inland Antarctic ice streams over varying timescales.
"I started the summer knowing very little about glaciers, data visualization, and human-centered design, and finished knowing a good deal about them all," Edwards says. "I learned how to work with the end-user to understand their visualization wants and needs, and we did this by making many prototypes and mockups, getting feedback from the researchers, and observing their workflow and typical tools used, and iterating on that process many times. I've learned that any project that involves creating something for someone else can benefit from the techniques of human-centered design."
The 10-week program often provides only enough time to generate a working prototype. Several projects, however, continue to be developed. For example, the prototype Mars rover driving software is being funded for further development as part of the Rover Planning Subsystem for the Ground Data System on Mars 2020, the next NASA rover mission to Mars.
"Sometimes, just seeing something in a new way can transform one's understanding," says Mushkin. "We hope that this program can help to catalyze that kind of transformation."