Evaluation of a concatenated protein phylogeny for classification of tailed double-stranded DNA viruses belonging to the order Caudovirales.


Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia.


Viruses of bacteria and archaea are important players in global carbon cycling as well as drivers of host evolution, yet the taxonomic classification of viruses remains a challenge due to their genetic diversity and the absence of universally conserved genes. Traditional classification approaches employ a combination of phenotypic and genetic information, which is no longer scalable in the era of bulk viral genome recovery through metagenomics. Here, we evaluate a phylogenetic approach for the classification of tailed double-stranded DNA viruses from the order Caudovirales by inferring a phylogeny from the concatenation of 77 single-copy protein markers using a maximum-likelihood method. Our approach is largely consistent with the taxonomy of the International Committee on Taxonomy of Viruses, with 72% and 89% congruence at the subfamily and genus levels, respectively. Discrepancies could be attributed to misclassifications and to a small number of highly mosaic genera that confound the phylogenetic signal. We also show that confidently resolved nodes in the concatenated protein tree are highly reproducible across different software and models, and conclude that the approach can serve as a framework for a rank-normalized taxonomy of most tailed double-stranded DNA viruses.
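The concatenation step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `concatenate_alignments`, the genome labels, and the choice to pad missing markers with gap characters are assumptions made for illustration. The sketch takes one aligned sequence set per marker and joins them into a single row per genome, which is the usual input format for a maximum-likelihood tree inference tool.

```python
from typing import Dict, List

GAP = "-"  # assumption: missing markers are represented as all-gap columns


def concatenate_alignments(marker_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-marker alignments into one concatenated row per genome.

    Each element of `marker_alignments` maps a genome label to its aligned
    protein sequence for one marker. Every alignment must be non-empty and
    all of its sequences must have the same length.
    """
    # Collect every genome seen in any marker, in a stable order.
    genomes = sorted({g for aln in marker_alignments for g in aln})
    parts: Dict[str, List[str]] = {g: [] for g in genomes}

    for aln in marker_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each marker alignment needs sequences of one equal length")
        width = lengths.pop()
        for g in genomes:
            # A genome lacking this marker contributes a gap-only block,
            # so all concatenated rows end up the same length.
            parts[g].append(aln.get(g, GAP * width))

    return {g: "".join(blocks) for g, blocks in parts.items()}


# Hypothetical toy input: two markers, three genomes, one marker missing per genome pair.
marker_a = {"phage1": "MKT", "phage2": "MRT"}
marker_b = {"phage1": "GG-A", "phage3": "GGSA"}
supermatrix = concatenate_alignments([marker_a, marker_b])
```

Padding with gaps keeps genomes that carry only a subset of the markers in the matrix, which matters for viruses because no marker is universally present; whether and how sparsely covered genomes should be filtered is a separate decision that this sketch does not address.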