MSU-NCGR pangenomics project
Introduction
This project develops new software tools for pangenomic analysis, a relatively new area of genomic research that studies large numbers of genome sequences from multiple organisms to understand how organisms adapt their genomes to their environments. As the cost of DNA sequencing continues to decrease, it is now routine for multiple genomes per species to be available for analysis, capturing variation within a species that a single reference genome cannot. The approach represents a pangenome as a graph and exploits this representation to efficiently find both regions shared across genomes and regions unique to some of them. Each individual's genomic sequence corresponds to a path in a graph data structure called a De Bruijn graph; these graphs are large and can have millions of nodes and edges. The tools being developed are based on finding frequented regions (FRs) in De Bruijn graphs, that is, regions traversed by many genome paths; these hotspots often represent features of interest in one or more genomes.
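To make the graph representation concrete, the sketch below builds a small node-centric De Bruijn graph from a few toy genomes, records each genome as a path of k-mer nodes, and counts how many genomes visit each node. The genomes, the value of k, and all function names are hypothetical and chosen for illustration only. This is not the FindFRs implementation: edges are added only between k-mers that appear consecutively in some genome, nodes are not compacted, and "shared" here simply means "visited by at least a given number of genomes". The FR methods in the publications listed on this page are considerably more involved (the BICoB 2017 paper, for instance, addresses approximate frequent subpath mining).

```python
from collections import defaultdict

# Hypothetical toy genomes; real pangenome graphs have millions of nodes.
GENOMES = {
    "g1": "ACGTACGGA",
    "g2": "ACGTTCGGA",
    "g3": "ACGTACGGT",
}


def kmers(seq, k):
    """Yield the overlapping k-mers of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_de_bruijn(genomes, k):
    """Build a simplified, uncompacted De Bruijn graph from several genomes.

    Nodes are k-mers. An edge u -> v is added when v follows u in some
    genome (so the two overlap by k-1 characters). Each genome is kept
    as its path: the list of k-mer nodes it visits, repeats included.
    """
    edges = defaultdict(set)
    paths = {}
    for name, seq in genomes.items():
        path = list(kmers(seq, k))
        for u, v in zip(path, path[1:]):
            edges[u].add(v)
        paths[name] = path
    return edges, paths


def node_support(paths):
    """Map each node to the set of genomes whose path visits it."""
    support = defaultdict(set)
    for name, path in paths.items():
        for node in path:
            support[node].add(name)
    return support


def classify_nodes(paths, min_genomes):
    """Split nodes into shared (visited by >= min_genomes genomes) and unique (visited by one)."""
    support = node_support(paths)
    shared = {n for n, g in support.items() if len(g) >= min_genomes}
    unique = {n for n, g in support.items() if len(g) == 1}
    return shared, unique


if __name__ == "__main__":
    edges, paths = build_de_bruijn(GENOMES, k=3)
    shared, unique = classify_nodes(paths, min_genomes=3)
    print(sorted(shared))  # ['ACG', 'CGG', 'CGT']
    print(sorted(unique))  # ['GGT', 'GTT', 'TCG', 'TTC']
```

Note that the repeated k-mer ACG in g1 and g3 becomes a single node, so those genome paths revisit it through a cycle (ACG -> CGT -> GTA -> TAC -> ACG). This collapsing of repeated sequence is what keeps De Bruijn graphs compact even when many genomes are added.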
Support provided by the National Science Foundation.
People
- Dr. Brendan Mumey (MSU, PI)
- Dr. Indika Kahanda (University of North Florida, Co-PI)
- Dr. Joann Mudge (NCGR, PI)
- Dr. Thiruvarangan Ramaraj (DePaul University, Co-PI)
- Dr. Alan Cleary (NCGR, Senior Personnel)
Resources
- FindFRs software available at https://github.com/msu-alglab/FindFRs3
- Other software available at https://github.com/abi-pangenomics and https://github.com/msu-alglab/haplotype-blocks
- Additional resources.
Publications
- L. Williams, B. Mumey, "Extending Maximal Perfect Haplotype Blocks to the Realm of Pangenomics," in Algorithms for Computational Biology (AlCoB 2020), Lecture Notes in Computer Science, vol. 12099.
URL: https://link.springer.com/chapter/10.1007/978-3-030-42266-0_4
- B. Manuweera, I. Kahanda, B. Mumey, J. Mudge, T. Ramaraj, A. Cleary, "Pangenome-Wide Association Studies with Frequented Regions," MODI 2019: Machine Learning Models for Multi-omics Data Integration (in conjunction with ACM BCB 2019).
URL: https://dl.acm.org/doi/10.1145/3307339.3343478
- A. Cleary, T. Ramaraj, I. Kahanda, J. Mudge, B. Mumey, "Exploring Frequented Regions in Pan-Genomic Graphs," IEEE/ACM Transactions on Computational Biology and Bioinformatics. doi: 10.1109/TCBB.2018.2864564
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8430554&isnumber=4359833
- A. Cleary, I. Kahanda, B. Mumey, J. Mudge, T. Ramaraj, "Exploring Frequented Regions in Pan-Genomic Graphs," in Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB '17), ACM, New York, NY, USA, 2017, pp. 89-97. doi: 10.1145/3107411.3107427 (Best Student Paper award)
URL: https://doi.org/10.1145/3107411.3107427
- A. Cleary, T. Ramaraj, J. Mudge, B. Mumey, "Approximate Frequent Subpath Mining Applied to Pangenomics," BICoB 2017: International Conference on Bioinformatics and Computational Biology.
URL: https://www.researchgate.net/publication/318541480_Approximate_Frequent_Subpath_Mining_Applied_to_Pangenomics