MSU-NCGR pangenomics project

Introduction

This project develops new software tools for pangenomic analysis, a relatively new area of genomic research that studies large numbers of genome sequences from multiple organisms to understand how organisms adapt their genomes to their environments. As the cost of DNA sequencing continues to decrease, it is now routine for multiple genomes per species to be available for analysis, which provides much more information about each species. The approach uses a graph-based representation of a pangenome and exploits this representation to efficiently find both shared and unique regions of interest across genomes. Each individual’s genomic sequence corresponds to a path in a graph data structure called a De Bruijn graph; these graphs are large and can have millions of nodes and edges. The tools being developed are based on finding frequented regions (FRs) in De Bruijn graphs; these regions are hotspots that often represent features of interest in one or more genomes.
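To make the graph representation concrete, the following is a minimal sketch of a node-centric De Bruijn graph in which each genome is stored as a path of k-mers. It is an illustration only and is not the FindFRs algorithm: the function names, the toy sequences, and the choice of k = 3 are all assumptions made for this example, and the "support" count shown here is a much simpler notion than a frequented region.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the consecutive k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_de_bruijn(sequences, k):
    """Build a node-centric De Bruijn graph from named sequences.

    Nodes are k-mers; an edge u -> v exists when v follows u in some
    sequence. Each sequence is also kept as its path through the graph.
    (Hypothetical helper written for this illustration.)
    """
    edges = defaultdict(set)
    paths = {}
    for name, seq in sequences.items():
        path = kmers(seq, k)
        paths[name] = path
        for u, v in zip(path, path[1:]):
            edges[u].add(v)
    return edges, paths


def node_support(paths):
    """Map each node to the set of genomes whose path visits it."""
    support = defaultdict(set)
    for name, path in paths.items():
        for node in path:
            support[node].add(name)
    return support


# Toy input: three short made-up sequences, not real genomes.
genomes = {"g1": "ACGTACGA", "g2": "ACGTTCGA", "g3": "TTGTACGA"}
edges, paths = build_de_bruijn(genomes, k=3)
support = node_support(paths)

# Nodes visited by every genome's path.
shared = sorted(n for n, s in support.items() if len(s) == len(genomes))
print(shared)
```

In this toy example, the nodes visited by all three paths are a rough stand-in for a shared region, while nodes visited by only one path are unique to that genome. Real pangenome graphs are far larger and are typically compressed, so a practical tool needs more careful data structures than the dictionaries used here.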

Support provided by the National Science Foundation.

People

Dr. Brendan Mumey (MSU, PI)
Dr. Indika Kahanda (University of North Florida, Co-PI)
Dr. Joann Mudge (NCGR, PI)
Dr. Thiruvarangan Ramaraj (DePaul University, Co-PI)
Dr. Alan Cleary (NCGR, Senior Personnel)

Resources

FindFRs software available at https://github.com/msu-alglab/FindFRs3
Other software available at https://github.com/abi-pangenomics and https://github.com/msu-alglab/haplotype-blocks
Additional resources.

Publications

Williams L., Mumey B., Extending Maximal Perfect Haplotype Blocks to the Realm of Pangenomics. In: Algorithms for Computational Biology. AlCoB 2020. Lecture Notes in Computer Science, vol 12099.
URL: https://link.springer.com/chapter/10.1007/978-3-030-42266-0_4
B. Manuweera, I. Kahanda, B. Mumey, J. Mudge, T. Ramaraj, A. Cleary, Pangenome-Wide Association Studies with Frequented Regions, MODI 2019 - Machine Learning Models for Multi-omics Data Integration (in conjunction with ACM BCB 2019).
URL: https://dl.acm.org/doi/10.1145/3307339.3343478
A. Cleary, T. Ramaraj, I. Kahanda, J. Mudge and B. Mumey, "Exploring Frequented Regions in Pan-Genomic Graphs," in IEEE/ACM Transactions on Computational Biology and Bioinformatics.
doi: 10.1109/TCBB.2018.2864564
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8430554&isnumber=4359833
Alan Cleary, Indika Kahanda, Brendan Mumey, Joann Mudge, and Thiruvarangan Ramaraj. 2017. Exploring Frequented Regions in Pan-Genomic Graphs. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB '17). ACM, New York, NY, USA, 89-97. DOI: https://doi.org/10.1145/3107411.3107427 (Best Student Paper award)
A. Cleary, T. Ramaraj, J. Mudge, B. Mumey, Approximate Frequent Subpath Mining Applied to Pangenomics, BICoB 2017, International Conference on Bioinformatics and Computational Biology.
URL: https://www.researchgate.net/publication/318541480_Approximate_Frequent_Subpath_Mining_Applied_to_Pangenomics


Gianforte School of Computing
