Senckenberg Deutsches Entomologisches Institut, Eberswalder Straße 90, 15374 Müncheberg, Germany; Department of Zoology, Institute of Ecology and Earth Sciences, University of Tartu, Vanemuise 46, 51014 Tartu, Estonia.
In several sawfly taxa, strong mitonuclear discordance has been observed, with nuclear genes supporting species assignments based on morphology, whereas the barcode region of the mitochondrial COI gene suggests different relationships. As previous studies were based on only a few nuclear genes, the causes and the degree of mitonuclear discordance remain ambiguous. Here, we obtained genomic-scale ddRAD data together with Sanger sequences of mitochondrial COI and two to three nuclear protein-coding genes to investigate species limits and mitonuclear discordance in two closely related species groups of the sawfly genus Empria. As found previously based on nuclear ITS and mitochondrial COI sequences, species are in most cases supported as monophyletic based on the new nuclear data reported here, but not based on mitochondrial COI. This mitonuclear discordance can be explained by occasional mitochondrial introgression with little or no nuclear gene flow, a pattern that might be common in haplodiploid taxa with slowly evolving mitochondrial genomes. Some species in the E. immersa group are not recovered as monophyletic according to either mitochondrial or nuclear data, but this could partly be because of unresolved taxonomy. Preliminary analyses of ddRAD data did not recover monophyly of E. japonica within the E. longicornis group (three Sanger-sequenced nuclear genes strongly supported monophyly), but closer examination of the data and additional Sanger sequencing suggested that both specimens were substantially (possibly 10-20% of recovered loci) cross-contaminated. A reason could be specimen identification tag jumps during sequencing library preparation, which previous studies have shown to affect up to 2.5% of the sequenced reads.
We provide an R script to examine patterns of identical loci among the specimens and estimate that the cross-contamination rate is not unusually high for our ddRAD dataset as a whole (based on counting identical sequences in the E. immersa and E. longicornis groups, which are well separated from each other and probably do not hybridise). The high rate of cross-contamination for both E. japonica specimens might be explained by the small number of loci recovered from them (~1000) compared to most other specimens (>10 000 in some cases) because of poor sequencing results. We caution against drawing unexpected biological conclusions when closely related specimens are pooled before sequencing and tagged only at one end of the molecule, or at both ends using unique combinations of a limited number of tags (fewer than the number of specimens).
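
The idea of counting identical sequences between two well-separated groups can be illustrated with a minimal Python sketch. This is not the authors' R script: the function name `cross_group_identity`, the input layout (specimen, then locus ID, then consensus sequence), and the toy specimens and sequences are all assumptions made for illustration. For each specimen it reports the fraction of comparable loci whose sequence is identical to one found in the other group. Such a fraction is best read as an upper bound on cross-contamination, because conserved loci can be identical between groups without any contamination.

```python
from typing import Dict

# Hypothetical input layout: specimen -> locus id -> consensus sequence.
Loci = Dict[str, Dict[str, str]]


def cross_group_identity(loci: Loci, group: Dict[str, str]) -> Dict[str, float]:
    """Per specimen, the fraction of loci identical to a sequence from the other group.

    Only loci that were also recovered in at least one specimen of the
    other group are counted as comparable.
    """
    result: Dict[str, float] = {}
    for specimen, seqs in loci.items():
        others = [s for s in loci if group[s] != group[specimen]]
        comparable = 0
        identical = 0
        for locus, seq in seqs.items():
            other_seqs = {loci[o][locus] for o in others if locus in loci[o]}
            if not other_seqs:
                continue  # locus not recovered in the other group
            comparable += 1
            if seq in other_seqs:
                identical += 1
        result[specimen] = identical / comparable if comparable else float("nan")
    return result


# Toy data (invented): "jap1" has few loci, one of them identical to the other group.
loci = {
    "imm1": {"L1": "ACGT", "L2": "GGCC", "L3": "TTAA"},
    "imm2": {"L1": "ACGT", "L2": "GGCA", "L3": "TTAA"},
    "lon1": {"L1": "ACTT", "L2": "GGTT", "L3": "TTGA"},
    "jap1": {"L1": "ACGT", "L2": "GGTT"},
}
group = {
    "imm1": "immersa",
    "imm2": "immersa",
    "lon1": "longicornis",
    "jap1": "longicornis",
}

rates = cross_group_identity(loci, group)
for specimen, rate in sorted(rates.items()):
    print(f"{specimen}: {rate:.2f}")
```

In this toy example the specimen with the fewest recovered loci shows the highest fraction, which mirrors the reasoning in the text: the same absolute number of contaminating loci weighs more heavily when few loci are recovered overall.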