Loraine Lab Research

Genome Informatics Systems and Visualization

Data sharing, version-tracking, and integration continue to pose problems in genomics research. The availability of new genomic data sets (such as tiling array and next-gen sequencing data) creates the need for computer systems that can combine old and new data sets in ways that expose interesting biological features. Because this new information is genome-wide and genome-centric, it requires systems that can provide access and visualization capability on a genomic scale. New visualization approaches that move beyond the standard Web-browser-based model for data exploration and dissemination are becoming increasingly necessary [Loraine, 2002].

Our approach is to develop systems and design principles that meet the need for advanced visualization capability but that are also practical and easy to deploy. We believe that systems that work for beginning or occasional users could have a tremendous positive impact on education and research.

We are developing a data repository and visualization software server for genomic data visualization using the highly tractable and relatively small (around 10 times smaller than the human genome) Arabidopsis genome as a model. This project receives funding from the Arabidopsis 2010 program at NSF. In the future we plan to support other plant species. However, our main goal is to make the software system portable so that data providers and research labs can use our software to deploy their own custom data sets.

For the data server's visualization front end, we are using the Integrated Genome Browser, an advanced desktop visualization tool from the open source Genoviz Project.

Our experience has been that being able to move easily and quickly through a genome using a zoomable graphical user interface implemented as desktop software provides a much more satisfying experience than traditional "point-click-wait" Web-based browsers. The difference is akin to watching a film versus paging through a photo album. Although it may take more time and effort to set up the video player, the end result is often more satisfying.

Screen capture of Integrated Genome Browser

IGB shows the unusual immunoglobulin-like organization of the mouse protocadherin-alpha locus, which is expressed in complex patterns in the mouse brain. White edges indicate shared boundaries among overlapping exons. We obtained the data from the UCSC table browser, removed RNAs that map to multiple genomic locations, reformatted the data and then published it in our experimental data repository.
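
The filtering step mentioned above (removing RNAs that map to multiple genomic locations) could be sketched roughly as follows. This is a minimal illustration, not the lab's actual pipeline: it assumes each alignment is a simple (name, chromosome, start, end) tuple, and the function name `remove_multi_mapped` and all sample names and coordinates are hypothetical.

```python
from collections import Counter


def remove_multi_mapped(records):
    """Keep only records whose RNA name appears exactly once in the input.

    Assumption: each record is a (name, chrom, start, end) tuple and an RNA
    that aligns to several genomic locations shows up as several records
    sharing the same name.
    """
    counts = Counter(name for name, _chrom, _start, _end in records)
    return [rec for rec in records if counts[rec[0]] == 1]


# Hypothetical alignments; names and coordinates are invented for illustration.
alignments = [
    ("rnaA", "chr18", 100, 500),
    ("rnaB", "chr18", 200, 900),
    ("rnaB", "chr2", 50, 750),    # rnaB aligns to a second location
    ("rnaC", "chr18", 300, 1200),
]

unique = remove_multi_mapped(alignments)
for name, chrom, start, end in unique:
    print(name, chrom, start, end)
```

Counting names first and filtering second keeps the original record order, which matters if the reformatting step that follows expects sorted input.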