Abstract
Tortricidae is one of the largest families in Lepidoptera and comprises the subfamilies Tortricinae, Olethreutinae, and Chlidanotinae. Here, we assembled a gap-free genome for the subfamily Chlidanotinae using Illumina, Nanopore, and Hi-C sequencing of Polylopha cassiicola, a pest of camphor trees in southern China. The nuclear genome is 302.03 Mb in size, with 36.82% repeats and 98.4% BUSCO completeness. The karyotype is 2n = 44 for males. We identified 15412 protein-coding genes, 1052 tRNAs, and 67 rRNAs. We also determined the mitochondrial genome of this species and annotated 13 protein-coding genes, 22 tRNAs, and one rRNA. These high-quality genomes provide valuable information for studying the phylogeny, karyotypic evolution, and adaptive evolution of tortricid moths.
Background & Summary
Tortricidae, the leafroller moths, is one of the largest families of Lepidoptera (butterflies and moths)1 and includes numerous notorious economic pests such as the spruce budworm, Choristoneura fumiferana2, the oriental fruit moth, Grapholita molesta3, and the codling moth, Cydia pomonella4. The two main subfamilies, Tortricinae and Olethreutinae, are relatively young5 and comprise over 95% of tortricid species. Genomes of many species in these two subfamilies have been determined6, revealing an ancestral sex chromosome-autosome fusion and two subsequent autosome fusions relative to the ancestral karyotype of Lepidoptera7. Compared to these two successful subfamilies, the relict subfamily Chlidanotinae is much more limited in distribution range, host range, species richness, and population size. Species of this subfamily are mainly distributed in tropical regions, suggesting that their climatic adaptability differs from that of species in the other subfamilies. This group can therefore provide valuable insights into the phylogeny, pest adaptation, and evolution of Tortricidae. However, no genome has been assembled for any species of Chlidanotinae.
Here, we present the first chromosome-level genome assembly and annotation in the Chlidanotinae, using high-coverage long-read and Hi-C sequencing of Polylopha cassiicola8. This species is mainly distributed in the southern coastal regions of China and in Southeast Asia. It is a pest of the trees Cinnamomum cassia and C. camphora. We also assembled the mitochondrial genome of this species from the Illumina short reads. These genomes are expected to provide information for understanding the phylogeny, karyotypic evolution, and adaptive evolution of Tortricidae.
Methods
Sample collection and sequencing
P. cassiicola larvae were collected from the tops of C. camphora trees in Guangxi, China. The larvae were reared in the laboratory to pupae and adults for genome and transcriptome sequencing. Three individuals were used for the three types of genome sequencing: one male pupa for Nanopore long-read sequencing, one male pupa for Illumina short-read sequencing, and one female adult for Hi-C sequencing. In addition, four larvae were used for RNA sequencing. Nucleic acid extraction and sequencing library construction were contracted to BerryGenomic (Beijing, China). Methods for nucleic acid extraction, sequencing platforms, and sequencing outputs are provided in Table 1.
Genome assembly
The Nanopore long reads were assembled into 76 contigs using NextDenovo 2.5.2 (https://github.com/Nextomics/NextDenovo) with parameters: “read_cutoff = 4k, genome_size = 400m, nextgraph_options = -a 1”. Redundant sequences in the contigs were removed using Purge_dups9. The 65 cleaned contigs were then scaffolded to chromosome level using Hi-C information. In this step, we mapped the Hi-C reads to the cleaned contigs using BWA10 with options “mem -SP5”, anchored the contigs using YaHS 1.2a.111 with option “-e GATC”, and manually adjusted the scaffolds using Juicebox 1.22.0112. We removed contigs that had no contact information with the chromosomes, as these could derive from contamination such as symbiotic microbes. Finally, the chromosome-level sequences were subjected to two rounds of long-read polishing and two rounds of short-read polishing using NextPolish 1.4.113. The resulting P. cassiicola genome is 302.03 Mb in size and contains 21 autosomes and one Z sex chromosome (Fig. 1a).
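As an illustration of how the reported NextDenovo parameters fit into a run configuration, the following minimal sketch writes a config file in the sectioned layout used by NextDenovo. Only read_cutoff, genome_size, and nextgraph_options come from the text above; the file names, the job settings, and the function name build_nextdenovo_config are assumptions made for the example and do not describe the authors' actual setup.

```python
import configparser
import io


def build_nextdenovo_config(input_fofn="input.fofn", workdir="01_rundir"):
    """Return the text of a NextDenovo-style run.cfg.

    Only the three assembly parameters reported in the text are taken
    from the study; everything else is a placeholder assumption.
    """
    cfg = configparser.ConfigParser()
    cfg["General"] = {
        "job_type": "local",        # assumption: run on a single machine
        "task": "all",              # assumption: correction + assembly
        "input_type": "raw",        # assumption: uncorrected reads
        "read_type": "ont",         # Nanopore reads
        "input_fofn": input_fofn,   # hypothetical list of read files
        "workdir": workdir,         # hypothetical output directory
    }
    cfg["correct_option"] = {
        "read_cutoff": "4k",        # reported in the text
        "genome_size": "400m",      # reported in the text
    }
    cfg["assemble_option"] = {
        "nextgraph_options": "-a 1",  # reported in the text
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_nextdenovo_config())
```

Keeping the reported values in one place makes it easier to see which settings are documented and which would have to be chosen when repeating the assembly.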
We also assembled the mitochondrial genome from the short reads using MitoZ 3.614. In the mitochondrial genome, we identified 13 protein-coding genes, 22 tRNAs, and one rRNA (Fig. 2).
Genome synteny
We analysed the chromosomal synteny between P. cassiicola and four other species, three from Tortricidae and one from Sesiidae: Choristoneura fumiferana (Tortricidae: Tortricinae)2, Grapholita molesta (Tortricidae: Olethreutinae)3, Tortricodes alternella (Tortricidae: Tortricinae; NCBI GenBank assembly: GCA_947859335.115), and Sesia bembeciformis (Sesiidae: Sesiinae)16. Synteny analysis was conducted using the MCScanX pipeline in the JCVI utility libraries17. We assigned the names of the ancestral linkage groups in Lepidoptera6 (Merian elements, M1-31 and MZ) based on chromosomal homology. The results show different patterns of chromosomal fusion in T. alternella and P. cassiicola (Fig. 1b).
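The homology-based naming step can be sketched as a simple vote over orthologous gene pairs. The example below is an assumption about how such an assignment could be automated, not the procedure used in this study: the function assign_merian_elements, the 20% threshold, and the toy labels (elemA, elemB, ...) are all hypothetical. A chromosome that receives more than one label would be a candidate fusion.

```python
from collections import Counter, defaultdict


def assign_merian_elements(ortholog_pairs, min_fraction=0.2):
    """Label each chromosome with the ancestral elements of its orthologs.

    ortholog_pairs: iterable of (chromosome, ancestral_element) tuples,
    one per orthologous gene. An element is kept for a chromosome when it
    accounts for at least min_fraction of that chromosome's orthologs.
    """
    counts = defaultdict(Counter)
    for chrom, element in ortholog_pairs:
        counts[chrom][element] += 1

    assignment = {}
    for chrom, counter in counts.items():
        total = sum(counter.values())
        kept = [e for e, n in counter.items() if n / total >= min_fraction]
        assignment[chrom] = sorted(kept)
    return assignment


if __name__ == "__main__":
    # Invented toy data: chr1 draws on two elements, chr2 on one.
    pairs = (
        [("chr1", "elemA")] * 6
        + [("chr1", "elemB")] * 4
        + [("chr2", "elemC")] * 9
        + [("chr2", "elemD")] * 1
    )
    for chrom, elements in assign_merian_elements(pairs).items():
        print(chrom, elements)
```

The threshold filters out scattered single-gene hits, which would otherwise make every chromosome look like a fusion product.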
Repeat element and non-coding RNA annotation
Repeat elements were detected using RepeatMasker 4.1.518 with options “-no_is -norna -xsmall -q”. This analysis was conducted against three databases: Repbase (http://www.girinst.org), Dfam database1 specific to Arthropoda, and a species-specific repeat library constructed using RepeatModeler219. Transfer RNAs (tRNAs) were predicted using tRNAscan-SE 2.0.1220 with default parameters, and ribosomal RNAs (rRNAs) were predicted using Barrnap 0.9 (https://github.com/tseemann/barrnap). In the P. cassiicola genome, 36.82% of bases were annotated as repeat elements (Table 2). We identified 67 rRNAs and 1052 tRNAs (Table 2).
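Because the “-xsmall” option makes RepeatMasker return masked regions in lowercase rather than replacing them, the masked share of a genome can be approximated by counting lowercase bases in the output FASTA. The function below is a minimal sketch under that assumption; its name and the toy sequences are hypothetical, and the figure it yields need not match RepeatMasker's own summary table exactly (for example, N bases are counted in the total here).

```python
def softmasked_fraction(fasta_lines):
    """Return the fraction of bases written in lowercase.

    fasta_lines: iterable of lines from a soft-masked FASTA file.
    Header lines (starting with '>') are skipped.
    """
    masked = 0
    total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


if __name__ == "__main__":
    # Invented toy record: 6 of 14 bases are lowercase.
    toy = [">chr1", "ACGTacgtNN", ">chr2", "acGT"]
    print(f"{softmasked_fraction(toy):.2%}")
```

In practice the lines would come from an open file handle, which keeps memory use low for a genome of several hundred megabases.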
Gene prediction and functional annotation
Gene structure was predicted using an ab initio method, Helixer21, with options “--subsequence-length 320760 --batch-size 6” and the pre-trained invertebrate model “invertebrate_v0.3_m_0200”. Gene function, Gene Ontology (GO) terms, and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms for the predicted genes were annotated using the eggNOG-Mapper22 web tool against the eggNOG Database 5. A total of 15412 protein-coding genes were predicted, of which 12671 were functionally annotated.
Data Records
The Nanopore reads, Illumina reads, Hi-C reads, and RNA reads for the P. cassiicola genome assembly were deposited in the NCBI Sequence Read Archive under accession number SRP47975923. The nuclear and mitochondrial genome assemblies were deposited in GenBank under accession number GCA_038024825.124. The genome annotation files are available in Figshare25 at https://doi.org/10.6084/m9.figshare.24902046.
Technical Validation
To validate the accuracy of the final genome assembly, we mapped the Illumina short reads and Nanopore long reads to the P. cassiicola genome using Minimap226 with option “-ax sr” for short reads and option “-ax map-ont” for long reads. The mapping rates were calculated using Samtools27 and were 96.38% for the short reads and 98.73% for the long reads. We also examined the coverage of short reads along the mitochondrial genome and found 100% coverage (Fig. 2).
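A mapping rate of this kind is the number of mapped reads divided by the total number of reads. The sketch below assumes the counts are taken from the text report of “samtools flagstat”; the text names Samtools but not the subcommand, so this choice, the function name mapping_rate, and the toy report are assumptions. Note that flagstat totals include secondary and supplementary alignments, so the result can differ slightly from a rate computed over primary reads only.

```python
import re


def mapping_rate(flagstat_text):
    """Return the percentage of mapped records from a flagstat report."""
    total = re.search(r"^(\d+) \+ \d+ in total", flagstat_text, re.M)
    # Anchoring at line start avoids the separate 'primary mapped' line.
    mapped = re.search(r"^(\d+) \+ \d+ mapped \(", flagstat_text, re.M)
    if total is None or mapped is None:
        raise ValueError("not a recognisable flagstat report")
    n_total = int(total.group(1))
    if n_total == 0:
        return 0.0
    return 100.0 * int(mapped.group(1)) / n_total


if __name__ == "__main__":
    # Invented toy report, not data from this study.
    toy_report = (
        "200 + 0 in total (QC-passed reads + QC-failed reads)\n"
        "200 + 0 primary\n"
        "150 + 0 mapped (75.00% : N/A)\n"
        "150 + 0 primary mapped (75.00% : N/A)\n"
    )
    print(f"{mapping_rate(toy_report):.2f}%")
```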
Completeness of the assembly and of the gene prediction was evaluated using BUSCO 5.4.728 with the lepidoptera_odb10 database. In this analysis, BUSCO classified each of 5,286 single-copy orthologs of Lepidoptera in our genome assembly as single-copy (S), duplicated (D), fragmented (F), or missing (M). The analyses showed completeness ranging from 95.1% to 98.4% across the assembly stages (Table 3), and 97.8% for the predicted gene set: “C: 97.8% [S: 97.2%, D: 0.6%], F: 0.9%, M: 1.3%”. The quality of gene prediction was also evaluated manually using RNA-seq data. Specifically, RNA-seq reads were mapped to the genome using Hisat29 and Samtools27. We imported the resulting BAM file and the annotation file into the IGV browser30. Based on manual examination, we found that the machine learning-based annotation method predicted near-complete gene structures. These results indicate that we have obtained a high-quality assembly and annotation for the P. cassiicola genome.
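The BUSCO summary string quoted above can be checked for internal consistency: complete (C) should equal single-copy plus duplicated (S + D), and C, F, and M should sum to 100%. The following short sketch parses the string and applies both checks; the function name parse_busco_summary and the 0.1 tolerance for rounding are choices made for this example.

```python
import re


def parse_busco_summary(summary):
    """Extract BUSCO percentages keyed by category letter."""
    pairs = re.findall(r"([CSDFM]):\s*([\d.]+)%", summary)
    return {key: float(value) for key, value in pairs}


if __name__ == "__main__":
    scores = parse_busco_summary(
        "C: 97.8% [S: 97.2%, D: 0.6%], F: 0.9%, M: 1.3%"
    )
    # Complete = single-copy + duplicated, within rounding.
    assert abs(scores["S"] + scores["D"] - scores["C"]) < 0.1
    # Complete + fragmented + missing should account for all orthologs.
    assert abs(scores["C"] + scores["F"] + scores["M"] - 100.0) < 0.1
    print(scores)
```

Both checks pass for the reported gene-set scores, since 97.2 + 0.6 = 97.8 and 97.8 + 0.9 + 1.3 = 100.0.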
Code availability
No custom scripts or code were used in this study.
References
van der Geest, L. P. S. & Evenhuis, H. H. Tortricid Pests: Their Biology, Natural Enemies, and Control. vol. 5 (Elsevier, 1991).
Béliveau, C. et al. The spruce budworm genome: reconstructing the evolutionary history of antifreeze proteins. Genome Biol. Evol. 14, evac087 (2022).
Cao, L.-J. et al. Population genomic signatures of the oriental fruit moth related to the Pleistocene climates. Commun. Biol. 5, 142 (2022).
Wan, F. et al. A chromosome-level genome assembly of Cydia pomonella provides insights into chemical ecology and insecticide resistance. Nat. Commun. 10, 4237 (2019).
Fagua, G., Condamine, F. L., Horak, M., Zwick, A. & Sperling, F. A. H. Diversification shifts in leafroller moths linked to continental colonization and the rise of angiosperms. Cladistics 33, 449–466 (2017).
Wright, C. J., Stevens, L., Mackintosh, A., Lawniczak, M. & Blaxter, M. Comparative genomics reveals the dynamics of chromosome evolution in Lepidoptera. Nat. Ecol. Evol. 1–14 https://doi.org/10.1038/s41559-024-02329-4 (2024).
Šíchová, J., Nguyen, P., Dalíková, M. & Marec, F. Chromosomal evolution in tortricid moths: conserved karyotypes with diverged features. PLoS ONE 8, e64520 (2013).
Nasu, Y. Lopharcha moriutii, sp. nov. and Polylopha cassiicola Liu & Kawabe (Lepidoptera, Tortricidae, Chlidanotinae, Polyorthini) from Thailand and Hong Kong. Zootaxa 1369, 55–61 (2006).
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics 39, btac808 (2023).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020).
Meng, G., Li, Y., Yang, C. & Liu, S. MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization. Nucleic Acids Res. 47, e63 (2019).
Wellcome Sanger Institute. Genbank https://identifiers.org/insdc.gca:GCA_947859335.1 (2023).
Boyes, D. & Langdon, W. B. V. The genome sequence of the Lunar Hornet, Sesia bembeciformis (Hübner 1806). Wellcome Open Res 8, (2023).
Tang, H. et al. Synteny and Collinearity in Plant Genomes. Science 320, 486–488 (2008).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinforma. 25, 4.10.1–4.10.14 (2009).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. 117, 9451–9457 (2020).
Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49, 9077–9096 (2021).
Stiehler, F. et al. Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning. Bioinformatics 36, 5291–5298 (2021).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: Functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP479759 (2024).
Genbank https://identifiers.org/ncbi/insdc.gca:GCA_038024825.1 (2024).
Yang, F. & Wei, S.-J. Genome annotation of Polylopha cassiicola. figshare https://doi.org/10.6084/m9.figshare.24902046 (2023).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Acknowledgements
We thank Ming-Liang Li for his help in sample collection. This work was supported by the National Natural Science Foundation of China (32070464 and 32272543), the Program of Beijing Academy of Agriculture and Forestry Sciences (JKZX202208), and the Beijing Key Laboratory of Environmentally Friendly Management on Pests of North China Fruits (BZ0432).
Author information
Contributions
S.W. designed the study. L.C., Y.Y. and J.C. contributed materials for this study. F.Y. and W.S. analysed the data. F.Y. wrote the manuscript. S.W. revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yang, F., Cao, LJ., Chen, JC. et al. Nuclear and mitochondrial genomes of Polylopha cassiicola: the first assembly in Chlidanotinae (Tortricidae). Sci Data 11, 419 (2024). https://doi.org/10.1038/s41597-024-03255-7
DOI: https://doi.org/10.1038/s41597-024-03255-7