GPA: A Microbial Genetic Polymorphisms Assignments Tool in Metagenomic Analysis by Bayesian Estimation.

Affiliation

Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China. Electronic address: [Email]

Abstract

Identifying antimicrobial resistant (AMR) bacteria in metagenomics samples is essential for public health and food safety. Next-generation sequencing (NGS) technology has provided a powerful tool in identifying the genetic variation and constructing the correlations between genotype and phenotype in humans and other species. However, for complex bacterial samples, there lacks a powerful bioinformatic tool to identify genetic polymorphisms or copy number variations (CNVs) for given genes. Here we provide a Bayesian framework for genotype estimation for mixtures of multiple bacteria, named as Genetic Polymorphisms Assignments (GPA). Simulation results showed that GPA has reduced the false discovery rate (FDR) and mean absolute error (MAE) in CNV and single nucleotide variant (SNV) identification. This framework was validated by whole-genome sequencing and Pool-seq data from Klebsiella pneumoniae with multiple bacteria mixture models, and showed the high accuracy in the allele fraction detections of CNVs and SNVs in AMR genes between two populations. The quantitative study on the changes of AMR genes fraction between two samples showed a good consistency with the AMR pattern observed in the individual strains. Also, the framework together with the genome annotation and population comparison tools has been integrated into an application, which could provide a complete solution for AMR gene identification and quantification in unculturable clinical samples. The GPA package is available at https://github.com/IID-DTH/GPA-package.

Keywords

Bayesian model,Genetic polymorphisms,Metagenomics,Next-generation sequencing,Pool-seq,