Introduction

This project develops new software tools for pangenomic analysis, a relatively new area of genomic research that studies large numbers of genome sequences from multiple organisms to understand how organisms adapt their genomes to their environments. As the cost of DNA sequencing continues to decrease, it is now routine for multiple genomes per species to be available for analysis, which provides much more information about each species. Our approach represents a pangenome as a graph and exploits this representation to efficiently find both shared and unique regions of interest across genomes. Each individual’s genomic sequence corresponds to a path in a graph data structure called a De Bruijn graph; these graphs are large and can have millions of nodes and edges. The tools being developed are based on finding frequented regions (FRs) in De Bruijn graphs; these regions are hotspots that often represent features of interest in one or more genomes.
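To make the graph representation concrete, the following is a minimal Python sketch, not the project's actual software. It assumes a node-centric De Bruijn graph in which each node is a k-mer and each genome is stored as the ordered list of k-mers it visits. The function names (build_pangenome_graph, node_support), the toy sequences, and the choice of k=3 are all hypothetical and chosen for illustration. Counting how many genomes pass through each node is only a simplified stand-in for finding shared and unique regions; it is not the frequented-region (FR) method itself.

```python
from collections import defaultdict


def build_pangenome_graph(genomes, k):
    """Build a node-centric De Bruijn graph (hypothetical helper).

    Nodes are k-mers; an edge a -> b exists when b directly follows a
    in at least one genome. Each genome is also kept as a path, i.e.
    the ordered list of k-mers it traverses.
    """
    edges = defaultdict(set)
    paths = {}
    for name, seq in genomes.items():
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        paths[name] = kmers
        for a, b in zip(kmers, kmers[1:]):
            edges[a].add(b)
    return edges, paths


def node_support(paths):
    """Map each node to the set of genomes whose path visits it."""
    support = defaultdict(set)
    for name, kmers in paths.items():
        for kmer in kmers:
            support[kmer].add(name)
    return support


# Toy input (assumed for illustration): g2 differs from g1 and g3 by one base.
genomes = {"g1": "ACGTACGA", "g2": "ACGTTCGA", "g3": "ACGTACGA"}
edges, paths = build_pangenome_graph(genomes, k=3)
support = node_support(paths)

# Nodes visited by every genome versus nodes visited by exactly one.
shared = {node for node, names in support.items() if len(names) == len(genomes)}
unique = {node for node, names in support.items() if len(names) == 1}

print("shared:", sorted(shared))
print("unique:", sorted(unique))
```

In this toy example the single-base difference in g2 creates a short detour of nodes that only g2 visits, while the flanking nodes are shared by all three genomes. Real pangenome graphs would use a much larger k and a compact graph representation to handle millions of nodes and edges.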

Support provided by the National Science Foundation.

 

People

 

Resources

Publications