Application of computational approaches to analyze metagenomic data.


Gwak HJ(#)(1), Lee SJ(#)(1), Rho M(2)(3).
Author information:
(1)Department of Computer Science and Engineering, Hanyang University, Seoul, 04763, Republic of Korea.
(2)Department of Computer Science and Engineering, Hanyang University, Seoul, 04763, Republic of Korea.
(3)Department of Biomedical Informatics, Hanyang University, Seoul, 04763, Republic of Korea.
(#)Contributed equally


Microorganisms play a vital role in living systems in numerous ways. In soil and ocean environments, microbes are involved in diverse processes, such as the carbon and nitrogen cycles, nutrient recycling, and energy acquisition. The relationship between microbial dysbiosis and disease development has been extensively studied. In particular, microbial communities in the human gut are associated with the pathophysiology of several chronic diseases, such as inflammatory bowel disease and diabetes. Therefore, analyzing the distribution of microorganisms and their associations with the environment is a key step in understanding nature. With the advent of next-generation sequencing technology, a vast amount of metagenomic data has been produced on unculturable as well as culturable microbes. To reconstruct microbial genomes, several assembly algorithms have been developed that account for features specific to metagenomic data, such as uneven sequencing depth. Because it is difficult to reconstruct complete microbial genomes from metagenomic reads, contig binning approaches have been proposed to group contigs that originate from the same genome. To estimate the microbial composition of an environment, various methods have been developed to classify individual reads or contigs and to profile bacterial proportions. Because microbial communities affect their hosts and environments through metabolites, methods have also been developed to estimate metabolic profiles from metagenomic or metatranscriptomic data. Here, we provide a comprehensive review of computational methods that can be applied to investigate microbiomes using metagenomic and metatranscriptomic sequencing data. We also discuss the limitations of metagenomic studies and the key approaches to overcoming them.