Virus-like insertions with sequence signatures similar to those of endogenous nonretroviral RNA viruses in the human genome.

Kojima S(1), Yoshikawa K(2), Ito J(3), Nakagawa S(4), Parrish NF(5), Horie M(1)(6), Kawano S(7), Tomonaga K(8)(9)(10).
Author information:
(1)Laboratory of RNA Viruses, Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto 606-8507, Japan.
(2)Department of Computer and Network Engineering, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan.
(3)Division of Systems Virology, Department of Infectious Disease Control, International Research Center for Infectious Diseases, Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan.
(4)Department of Molecular Life Science, Tokai University School of Medicine, Isehara 259-1193, Japan.
(5)Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Cluster for Pioneering Research, Yokohama 230-0045, Japan.
(6)Hakubi Center for Advanced Research, Kyoto University, Kyoto 606-8507, Japan.
(7)Department of Computer and Network Engineering, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan.
(8)Laboratory of RNA Viruses, Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto 606-8507, Japan.
(9)Laboratory of RNA Viruses, Graduate School of Biostudies, Kyoto University, Kyoto 606-8507, Japan.
(10)Department of Molecular Virology, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan.

Abstract

Understanding the genetics and taxonomy of ancient viruses will provide insight not only into the origin and evolution of viruses but also into how viral infections have shaped our own evolution. Endogenous viruses are remnants of ancient viral infections and are thought to retain the genetic characteristics of the viruses from which they derive. In this study, we applied machine learning to the sequence signatures of endogenous RNA viruses to identify, in the human genome, sequences derived from viruses that have not yet been detected or are already extinct. We show that the k-mer occurrence of ancient RNA viral sequences remains similar to that of extant RNA viral sequences and can be differentiated from that of other human genome sequences. Using this characteristic, we screened the human reference genome for RNA viral insertions and found virus-like insertions with phylogenetic and evolutionary features indicative of an exogenous origin but lacking homology to previously identified sequences. Our analysis indicates that animal genomes still contain unknown virus-derived sequences and provides a glimpse into the diversity of the ancient virosphere.
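
To make the notion of a k-mer occurrence signature concrete, the following is a minimal illustrative sketch, not the authors' pipeline. The abstract does not state the value of k, the normalization, or the classifier used, so k = 3, relative-frequency normalization, and cosine similarity as the comparison measure are all assumptions chosen for illustration; the function names `kmer_frequencies` and `cosine_similarity` are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=3):
    """Return the relative frequency of every A/C/G/T k-mer in seq.

    The vector has 4**k entries in lexicographic k-mer order.
    k=3 is an illustrative default, not a value taken from the study.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are ignored because
    # only the canonical A/C/G/T k-mers are looked up below.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Vectors of this kind could serve as input features for a classifier trained to separate virus-derived sequences from other genomic sequences; which model and features the study actually used is not specified in the abstract.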