Introduction

Horizontal gene transfer (HGT), the transfer of genetic material between sexually well-separated species, is a significant force in the evolution of eukaryotic genomes (Keeling and Palmer, 2008). Generally, HGT is less common in multicellular than in unicellular eukaryotes (Keeling and Palmer, 2008). In vascular plants, HGT appears to be rare in plastid and nuclear genomes but is frequent in the mitochondrial (mt) genome (Richardson and Palmer, 2007; Bock, 2010), and is facilitated by intimate physical associations between donor and recipient species through parasitism or epiphytism (Davis et al., 2005; Mower et al., 2010; Rice et al., 2013; Xi et al., 2013). The most extensive mt HGT events have been detected in the basal angiosperm Amborella, which acquired more than one megabase of foreign mtDNA, including four entire mt genomes from its epiphytes (Rice et al., 2013). In cases where HGT has occurred between species without parasitic or epiphytic relationships, transfer of foreign DNA via vectors such as bacteria or fungi has been suggested as an alternative route (Bergthorsson et al., 2003; Won and Renner, 2003; Bock, 2010). After transfer, foreign genes often undergo modifications such as gene conversion, differential gene loss and a change in mutation rate (Won and Renner, 2003; Hao et al., 2010; Mower et al., 2010). These processes are creative forces in the generation of mt genetic diversity and have an important role in plant mt genome evolution (Archibald and Richards, 2010; Hao et al., 2010; Mower et al., 2010).

To date, our knowledge and data about HGT in plants mainly originate from studies of angiosperms. Only one case of HGT has been reported in gymnosperms, in which an intron of a mt gene was transferred from an angiosperm to the ancestor of an Asian clade of Gnetum (Gnetales, gymnosperms) (Won and Renner, 2003). After the ancestral acquisition of the angiosperm-derived sequence, eight out of 20 species in this clade of Gnetum gradually lost the foreign copy, whereas all retained the native copy (Won and Renner, 2003). The causes and consequences of mt HGT remain largely unclear in gymnosperms owing to the limited number of cases discovered.

Here we report a new case of mt HGT from an angiosperm to a gymnosperm, the Canary Island pine Pinus canariensis (Pinaceae). Pinus canariensis originated from an ancient Mediterranean evolutionary center (Klaus and Ehrendorfer, 1989), and is currently restricted to the Canary Islands. We detected two copies of mt nad5-1 (NADH dehydrogenase subunit 5 intron 1 plus its flanking exons 1 and 2) in P. canariensis. Phylogenetic analyses revealed that one of these was vertically inherited, whereas the other was horizontally transferred from an angiosperm species. To understand the specificity and history of the event, we surveyed sequence variation in the three genomes of the genus Pinus to establish the distribution of the angiosperm-derived nad5-1 copy in the genus and the genomic location of this sequence; we estimated the timing of this HGT in P. canariensis and discussed its putative routes. Our discovery is important for understanding the role of HGT in the evolution of the mt genome in plants.

Materials and methods

Taxon sampling

To examine the presence of the angiosperm-type copy of nad5-1 in the genus Pinus, we sampled 44 pine species to represent 9 of the 11 subsections of the genus Pinus recognized by Gernandt et al. (2005). Most species were represented by multiple accessions collected either from documented individuals grown by different institutions or from natural stands, and a total of 148 individuals were analyzed (Supplementary Table S1). For P. canariensis, 22 individuals were sampled, of which 16 were from four natural populations across the species’ range, and the other six individuals were from a botanic garden (Supplementary Table S1). We retrieved the nad5-1 sequences of 175 species from GenBank, spanning all major clades of land plants: angiosperms, gymnosperms, ferns and mosses (Supplementary Table S2).

DNA isolation, PCR amplification and sequencing

Genomic DNA was extracted from needles or seedlings using a Plant Genomic DNA Kit (TIANGEN, Beijing, China) according to the manufacturer’s instructions. The mtDNA nad5-1 region in Pinus species was amplified using a pair of previously published primers (Wang et al., 2000). This pair of primers generated two copies of nad5-1 in P. canariensis. A BLAST search in the NCBI nucleotide database revealed that one of these was identical to the previously reported nad5-1 sequence of P. canariensis (this copy is referred to hereafter as the gymnosperm-type), whereas the other was similar to those of angiosperm species (this copy is referred to hereafter as the angiosperm-type). To test further for the presence of angiosperm-type nad5-1 in the genus Pinus, we designed a pair of primers specific to this sequence, and surveyed 44 pine species (Supplementary Table S1). To test whether the genomic regions surrounding the angiosperm-type nad5-1 were also angiosperm-derived, we used primers specific to the neighboring nad5 exons 1 and 2 of angiosperm species, and universal primers for nad5 exon 2 of seed plants, to amplify the regions connecting the nad5-1 segment. The primers specific to nad5 exons 1 and 2 of angiosperms were designed based on GenBank sequences, and the universal primers were from a previous study (Qiu et al., 2006). The primer sequences, annealing temperatures and sizes of each product are listed in Supplementary Table S3. The PCR products were subjected to electrophoresis using a 1.0% agarose gel, after which the desired band was cut from the gel and purified using a GFX PCR DNA and Gel Band Purification Kit (Amersham Pharmacia Biotech, Buckinghamshire, UK). The purified PCR products were sequenced directly on an ABI 3730 automated sequencer (Applied Biosystems, Foster City, CA, USA). The sequences for each DNA region have been deposited in GenBank under accession numbers KP141747–KP141754 (Supplementary Table S2).

Quantitative PCR

To determine the genomic location of the angiosperm-type nad5-1 in P. canariensis, we applied a quantitative PCR (qPCR) strategy similar to that used by Mower et al. (2010). This approach is based on the fact that the copy numbers of the plastid (highest), mt (intermediate) and nuclear (lowest) genomes differ widely in a plant cell (Petit and Vendramin, 2007), and genes with a similar number of copies should be amplified at similar rates. Nine loci were selected for amplification in P. canariensis: two chloroplast (cp) loci (rbcL and matK), three native mt loci (cox1, matR and the gymnosperm-type nad5-1), three nuclear loci (cco, cad and 4CL) and the angiosperm-type nad5-1. The qPCR primer sequences, annealing temperatures and sizes of each product are listed in Supplementary Table S3. The qPCR assays were performed on a Bio-Rad CFX Real-time system (Bio-Rad, Hercules, CA) using iQ SYBR Green Supermix (Bio-Rad). Genomic DNA of three P. canariensis individuals was used as the template, and a fivefold dilution series of each DNA was used in the amplification of the nine loci. The amplification efficiency for each locus was estimated using the Bio-Rad CFX manager version 3.1. We conducted a melt-curve analysis at the end of the PCR run to examine the specificity of the amplification. Each pair of primers demonstrated an efficiency >90% and amplified only the target region.
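The efficiency estimate from a dilution series can be illustrated with the standard-curve relation commonly used in qPCR, E = 10^(−1/slope) − 1, where the slope is obtained by regressing the quantification cycle (Cq) on log10 of the relative template amount. The sketch below is only an illustration of that relation; the function name and the Cq values are hypothetical and are not taken from the Bio-Rad CFX manager or from the data of this study.

```python
import math


def amplification_efficiency(dilution_factors, cq_values):
    """Estimate PCR efficiency from a standard curve.

    Fits Cq against log10(relative template amount) by ordinary least
    squares and converts the slope to an efficiency, E = 10**(-1/slope) - 1.
    """
    # Relative template amount is 1/dilution factor (1, 1/5, 1/25, ...).
    xs = [math.log10(1.0 / d) for d in dilution_factors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, efficiency


# Hypothetical fivefold dilution series (assumed values, for illustration only).
dilutions = [1, 5, 25, 125, 625]
# With perfect doubling per cycle, each fivefold dilution delays Cq by log2(5) cycles.
ideal_cq = [18.0 + math.log2(d) for d in dilutions]

slope, eff = amplification_efficiency(dilutions, ideal_cq)
print(f"slope = {slope:.3f}, efficiency = {eff:.1%}")
```

Under this relation, a slope near −3.32 corresponds to an efficiency near 100%, and the >90% threshold reported above corresponds to slopes no steeper than about −3.59.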

Phylogenetic analyses

The phylogenetic analyses were performed using maximum parsimony (MP), maximum likelihood (ML) and Bayesian approaches. Gaps were treated as missing. MP analysis was implemented in PAUP version 4.0b10 (Swofford, 2002). Heuristic searches were performed using random sequence addition with 100 replicates, tree-bisection-reconnection branch swapping and the MulTrees option, and a maximum of 500 trees were saved per round. All character states were unordered and equally weighted. Bootstrap supports for the clades found in the most parsimonious trees were obtained from 1000 replicates using the same heuristic search settings. ML analysis was performed using RAxML version 7.3.0 (Stamatakis, 2006) under the GTRGAMMA model as suggested in the instructions. One thousand ML trees were generated to find the best-scoring ML tree, and topological robustness was investigated using 1000 nonparametric bootstrap replicates. Bayesian analyses were conducted using MrBayes version 3.2 (Ronquist et al., 2012). The optimal model for the DNA sequence was determined using Modeltest 3.7 (Posada and Crandall, 1998). Four Markov Chain Monte Carlo runs were conducted, each comprising 10 million generations; trees were sampled every 100 generations, and the first 2.5 million generations of each run were discarded as the burn-in, to ensure that the chains were stationary. The remaining trees were used to calculate a strict consensus tree with the posterior probabilities for each split.

To assess alternative topological placements of the angiosperm-type nad5-1, we performed the parsimony-based Templeton test (Templeton, 1983) and the likelihood-based Shimodaira–Hasegawa test (Shimodaira and Hasegawa, 1999). For the Templeton test, we used one-tailed probabilities as suggested by Felsenstein (1985). For the Shimodaira–Hasegawa test, the significance of the differences in the likelihood scores for each comparison was assessed against the null distribution obtained with the re-estimated log-likelihoods approximated with 1000 nonparametric bootstrap replicates. The Templeton and Shimodaira–Hasegawa tests were implemented in PAUP. Gene conversion between the angiosperm- and gymnosperm-type nad5-1 was tested using RDP3.44 (Martin et al., 2010). We set the highest acceptable P-value to 0.05 and used default values for the remaining settings.

Divergence time estimation

We extracted full chloroplast genome sequences of 38 species from Parks et al. (2012) to perform cpDNA-based divergence time estimation. These 38 species represent all 11 subsections of the genus Pinus recognized by Gernandt et al. (2005), and three outgroups (Picea sitchensis, Abies firma and Cathaya argyrophylla) (Supplementary Figure S1). All seven species of the subsection Pinaster, including P. canariensis, were included in this analysis. Previous work shows that two loci (ycf1 and ycf2) have very different mutation rates from other parts of the cp genome in pines (Parks et al., 2009); these two regions were therefore removed from this analysis. A likelihood ratio test in PAUP rejected a strict molecular clock for the cpDNA data (χ2=41445.87, df=36, P<0.001). We therefore used two relaxed molecular clock approaches to estimate the mutation rate and divergence time: one was a Bayesian method that assumes independent rate changes among lineages, implemented in BEAST version 1.7.2 (Drummond et al., 2012), and the other was a Bayesian method that assumes autocorrelated lineage rates, implemented in Multidivtime (Kishino et al., 2001). In each analysis, we set 85 million years ago (MYA) as the maximum age and 45 MYA as the minimum age for the divergence between the two subgenera Pinus and Strobus, as proposed by Willyard et al. (2007).
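The reported degrees of freedom and P-value of the clock test can be checked with a short calculation. The sketch below assumes the conventional form of the test, in which the statistic 2(lnL_unconstrained − lnL_clock) is compared with a chi-square distribution with n − 2 degrees of freedom for n taxa; it uses the closed-form upper tail that holds for even degrees of freedom, evaluated in log space. The function names are hypothetical and the code is not part of PAUP.

```python
import math


def clock_lrt_df(n_taxa):
    """Degrees of freedom of the strict-clock likelihood ratio test (n - 2)."""
    return n_taxa - 2


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate for even df.

    Uses sf(x) = exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)**k / k!,
    with the terms combined in log space to avoid overflow.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    log_terms = [k * math.log(half) - math.lgamma(k + 1) - half
                 for k in range(df // 2)]
    largest = max(log_terms)
    log_p = largest + math.log(sum(math.exp(t - largest) for t in log_terms))
    return math.exp(log_p)  # underflows to 0.0 for extremely small P-values


df = clock_lrt_df(38)                 # 38 species in the cpDNA data set
p_value = chi2_sf_even_df(41445.87, df)
print(df, p_value < 0.001)
```

With 38 taxa the test has 36 degrees of freedom, matching the value given above, and a statistic of this size yields a P-value far below 0.001.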

For analysis in BEAST, we used a general time-reversible substitution model with Gamma distributed rate heterogeneity, an uncorrelated lognormal relaxed clock model and a Yule speciation model. We calibrated the tree by placing a uniform prior on the age of the split between the two subgenera, ranging from 45 to 85 million years (MY). The Markov chain Monte Carlo was run for 20 million generations, and trees and parameter estimates were sampled once every 2000 generations. The log file of the run was inspected using Tracer version 1.6 (http://beast.bio.ed.ac.uk/Tracer) to confirm that the chain was stationary and that the effective sample sizes were adequate (ESS>200) for the estimated parameters. Of the 10 000 sampled posterior trees, the first 2500 (25%) were treated as the burn-in, and the remaining trees were used to summarize the node height and rate statistics using TreeAnnotator version 1.7.2 (part of the BEAST package).

For Multidivtime, the topology of the maximum clade credibility tree generated by BEAST was used as the target tree and reference topology. The maximum and the minimum constraints to the divergence between the subgenera were set to 85 and 45 MY, respectively. The prior for the mean and s.d. of the ingroup root age (rttm and rttmsd) were both set to 0.85; the mean and s.d. of the prior distribution for the rate of molecular evolution at the ingroup root node (rtrate and rtratesd) were both set to 0.02; and bigtime was set to 100. The Markov chain Monte Carlo was run for one million generations, and parameters were sampled every 100 generations after a burn-in of 0.1 million generations. Three independent runs were performed, and all generated similar results.

Results and discussion

Using the primers previously developed for nad5-1 in gymnosperms (Wang et al., 2000), we co-amplified both angiosperm- and gymnosperm-type nad5-1 in P. canariensis individuals. The angiosperm-type copy was 1084 bp and the gymnosperm-type was 1462 bp. These two copies of nad5-1 sequences were highly divergent showing 66 substitutions, 24 indels ranging from 1 to 33 bp, and a 514 bp region that cannot be unambiguously aligned between them. The angiosperm-type nad5-1 in P. canariensis consisted of a full-length nad5 intron 1 and partial nad5 exons 1 and 2. Using primers specific to nad5 exons 1 and 2 of angiosperm species, we obtained an additional 197 bp sequence in exon 1, but failed to extend further into exon 2. We also used the universal primers for nad5 exon 2 of seed plants on P. canariensis; these primers amplified only the gymnosperm-derived copy, not the angiosperm copy. Thus, a 1281 bp angiosperm-type nad5-1 sequence (comprising the original 1084 bp plus the additional 197 bp from exon 1) was obtained from P. canariensis and used for subsequent analyses.

We performed phylogenetic analyses on the nad5-1 sequences of P. canariensis and the other 175 land plant species (Supplementary Table S2). The MP, ML and Bayesian analyses produced similar trees. The P. canariensis gymnosperm-type nad5-1 was grouped with other subgenus Pinus species, whereas the angiosperm-type copy was clustered inside the eudicots of flowering plants (Figure 1; Supplementary Figure S2). The anomalous phylogenetic placement of the P. canariensis angiosperm-type nad5-1 sequence was not caused by errors in phylogenetic reconstruction, because the topology was strongly supported by all three methods (bootstrap values=99 and 98% for MP and ML analyses, respectively, and posterior probability=1.00 for Bayesian analysis), and both the Templeton and Shimodaira–Hasegawa tests (P<0.001) suggested the placement of the angiosperm-type nad5-1 in the angiosperm clade rather than in the subgenus Pinus. Apart from the peculiar placement of this copy of P. canariensis, the phylogenetic relationships throughout the rest of the tree were consistent with currently accepted seed plant phylogeny (Qiu et al., 2006; APG, 2009).

Figure 1

The simplified strict consensus tree of land plants based on the nad5-1 sequence (only families and genera in Pinaceae are shown). Bootstrap values for maximum parsimony (left) and maximum likelihood (right) analyses are presented above the branches, and Bayesian posterior probabilities are shown below the branches. The angiosperm- and gymnosperm-type nad5-1 of P. canariensis are indicated in bold with an enlarged font. GEEC: the families Geraniaceae, Ericaceae, Euphorbiaceae and Cucurbitaceae; BCM: the families Brassicaceae, Caricaceae and Malvaceae; LPGLSA: the families Lamiaceae, Phrymaceae, Gesneriaceae, Lentibulariaceae, Solanaceae and Apocynaceae; PLANTPC: the genera Pseudotsuga, Larix, Abies, Nothotsuga, Tsuga, Pseudolarix, Cedrus. The details of this tree, including sampled taxa, are shown in Supplementary Figure S2.

The presence of the angiosperm-type copy of nad5-1 in P. canariensis cannot be attributed to contamination of plant materials for the following reasons. First, genomic DNA of P. canariensis was extracted independently in three different labs—Umeå University (Umeå, Sweden), Institute of Botany (Beijing, China) and Forestry and Forest Products Research Institute (Tsukuba, Japan)—from multiple individuals collected in different years, and all samples were amplified and sequenced at least twice. Second, the angiosperm-derived exons 1 and 2 were nonfunctional, owing to reading frame shifts that resulted in premature stop codons. Contamination of DNA from an extant plant would probably introduce functional copies of exons 1 and 2 rather than pseudogenes. Therefore, the most reasonable explanation for our results is that P. canariensis acquired the angiosperm-type nad5-1 from an angiosperm via HGT.

For an HGT event to be successful, foreign DNA must be integrated into the recipient’s genome. We adopted a qPCR strategy that is based on the differences in copy numbers among the three genomes in a plant cell. We found that genes from the same genome were amplified at similar rates. The signals of the two plastid genes (matK and rbcL) appeared first during qPCR amplification, the three mt genes (cox1, matR and the gymnosperm-type nad5-1) appeared next, and the three nuclear genes (4CL, coo and cad) appeared last (Figure 2), which is in accordance with copy number expectations for the three genomes in plants (Petit and Vendramin, 2007). The angiosperm-type nad5-1 was amplified at a rate comparable to those of the gymnosperm-type nad5-1 and two other mt genes (Figure 2), suggesting that the angiosperm-type copy was located in the mt genome. That the angiosperm-type nad5-1 has been integrated into the mt genome is also supported by the low substitution rate of this region among individuals. Sequencing the angiosperm-type nad5-1 in 22 individuals throughout the range of P. canariensis revealed no intraspecific variation. In contrast, genes transferred from mitochondria to nuclei often exhibit accelerated substitution rates (Palmer et al., 2000). These results are consistent with previous observations that most transferred mt genes are integrated into the recipients’ mt genomes (Bock, 2010). The presence of both angiosperm-derived exons and the intron in P. canariensis suggests that the exons and intron were transferred simultaneously as DNA. The angiosperm-derived exons 1 and 2 contained premature stop codons and thus were nonfunctional (see above), indicating pseudogenization of the foreign nad5-1 in the host’s genome after acquisition by HGT. Gene conversion between native and foreign gene copies has been reported in previous studies of HGT (Hao et al., 2010; Mower et al., 2010), but this was not detected between the two nad5-1 copies in P. canariensis. Future sequencing of the whole mt genome of P. canariensis is needed to determine the precise location of the two nad5-1 copies in the mt genome and the extent of mt HGT in this species.
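The logic of the qPCR assignment can be sketched as a nearest-group comparison: a locus of unknown location is assigned to the genome whose reference loci reach the detection threshold at the most similar cycle. The example below is a minimal illustration under that assumption; the function name and all Cq values are hypothetical and do not reproduce the measurements shown in Figure 2.

```python
from statistics import mean


def assign_compartment(reference_cq, query_cq):
    """Assign a locus to the genome whose mean reference Cq is closest.

    reference_cq maps a compartment name to {locus name: Cq}.
    Returns the chosen compartment and the mean Cq of each compartment.
    """
    group_means = {name: mean(loci.values())
                   for name, loci in reference_cq.items()}
    best = min(group_means, key=lambda name: abs(group_means[name] - query_cq))
    return best, group_means


# Hypothetical Cq values (assumed for illustration): higher copy number
# means earlier amplification, hence a lower Cq.
reference_cq = {
    "plastid": {"matK": 14.1, "rbcL": 14.4},
    "mitochondrial": {"cox1": 18.9, "matR": 19.3, "gymnosperm-type nad5-1": 19.1},
    "nuclear": {"4CL": 25.2, "cco": 25.5, "cad": 25.6},
}

compartment, means = assign_compartment(reference_cq, query_cq=19.0)
print(compartment)
```

Such a comparison indicates only that a locus has a copy number similar to that of the reference loci of one genome, which is why the substitution-rate evidence is presented alongside it.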

Figure 2

Quantitative PCR assay of the angiosperm-type nad5-1 (red), two chloroplast (cpDNA) (matK, light green; rbcL, dark green), three mitochondrial (mtDNA) (cox1, yellow; matR, pink; the gymnosperm-type nad5-1, golden) and three nuclear (nDNA) (4CL, dark blue; coo, light blue; cad, black) loci in P. canariensis. All reactions in the plot were run in three replicates using 6 ng genomic DNA as the template.

Using primers specific to angiosperm-type nad5-1, we examined the presence of this sequence in 148 individuals of 44 Pinus species, including all six close relatives of P. canariensis in subsection Pinaster (Supplementary Table S1). We found that the angiosperm-type nad5-1 was present in all 22 P. canariensis individuals but absent in all other 43 pine species investigated here. These results were further confirmed by the amplification of two copies of nad5-1 in P. canariensis but only the gymnosperm-type copy of nad5-1 in all other pine species by the universal primers for seed plants’ nad5-1. The presence of angiosperm-type nad5-1 in all P. canariensis individuals but its absence in all other pines indicated that the HGT probably occurred after P. canariensis diverged from its closest relatives. Thus, the divergence time between P. canariensis and its sister species can be regarded as the earliest possible date for the HGT event. Previous studies suggest that P. canariensis is closely related to the Himalayan chir pine Pinus roxburghii (Klaus and Ehrendorfer, 1989; Parks et al., 2012; Grivet et al., 2013). On the basis of cpDNA dating, the divergence time between P. canariensis and P. roxburghii was 4.4 MYA (95% interval 7.3−1.6 MYA; Figure 3; Supplementary Figure S1 and Supplementary Table S4). A phylogenetic study based on the mt genome suggests that the mt genome of P. roxburghii was captured from an Asian hard pine (Wang and Wang, 2014). Thus, we cannot rule out the possibility that the HGT occurred in the common ancestor of P. canariensis and P. roxburghii and that the foreign copy of nad5-1 was then lost from P. roxburghii owing to the capture of another mtDNA. Pinus canariensis and P. roxburghii diverged from their closest relative Pinus pinea (Parks et al., 2012; Grivet et al., 2013) 5.1 MYA (8.4−2.0 MYA; Figure 3; Supplementary Figure S1 and Supplementary Table S4). This pushes the earliest possible date of the HGT slightly further back. Taken together, the HGT event in P. canariensis probably occurred no earlier than the late Miocene. This result is consistent with the colonization history of P. canariensis in the Canary Islands (Navascues et al., 2006). Island populations are typically established by only one or a few individual founders. When the founder population is small, rare alleles should theoretically have a higher probability of reaching high frequencies and eventually becoming fixed by drift in the newly colonized area (Klopfstein et al., 2006). Severe bottlenecks promoting the fixation of rare mitotypes in newly occupied territories have been found in other pine colonization events (Wang et al., 2011). In this context, the angiosperm-derived nad5-1 could have been fixed by genetic drift during population expansion. The nonfunctional exons 1 and 2 of the angiosperm-type nad5-1 also suggest that the fixation of this copy was most likely promoted by drift rather than by any selective advantage.
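The effect of founder population size on the fate of a rare neutral variant can be illustrated with a simple simulation. The sketch below assumes a haploid Wright–Fisher population of constant size with no selection or mutation, a deliberately simplified stand-in for maternally transmitted mitotypes; the function name, population sizes and replicate number are hypothetical choices and are not estimates for P. canariensis. Under this model the expected fixation probability of a neutral variant equals its initial frequency.

```python
import random


def fixation_fraction(pop_size, initial_copies, replicates, seed=1):
    """Fraction of replicate populations in which a neutral variant fixes.

    Each generation, every one of the pop_size haploid offspring draws its
    parent at random, so the variant's copy number is binomially resampled
    from its current frequency until it is lost (0) or fixed (pop_size).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        copies = initial_copies
        while 0 < copies < pop_size:
            freq = copies / pop_size
            copies = sum(1 for _ in range(pop_size) if rng.random() < freq)
        if copies == pop_size:
            fixed += 1
    return fixed / replicates


# A single new variant in a small founder group versus a larger population.
small = fixation_fraction(pop_size=10, initial_copies=1, replicates=2000)
large = fixation_fraction(pop_size=100, initial_copies=1, replicates=2000)
print(f"N=10: {small:.3f}  N=100: {large:.3f}")
```

In this simplified setting a variant carried by one of ten founders fixes roughly ten times more often than one carried by one of a hundred individuals, which is the sense in which a severe bottleneck favors the fixation of a rare mitotype.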

Figure 3

Dated phylogeny for the genus Pinus based on full chloroplast genome sequences. Branch lengths are drawn to reflect BEAST divergence estimates obtained using 85 MYA (million years ago) as the maximum age and 45 MYA as the minimum age for the divergence between the two subgenera Pinus and Strobus. Terminal branches within each group are collapsed for simplicity (see Supplementary Figure S1 for details), except those of subsection Pinaster.

There are two likely routes for the transfer of nad5-1 into P. canariensis: either through direct cell–cell contact, or through an external vector (for example, fungi, bacterial pathogens or plant viruses) that carried the foreign genetic material. Many of the reported HGT events concern donor and recipient species that are engaged in parasitic or epiphytic interactions (Davis et al., 2005; Mower et al., 2010; Rice et al., 2013; Xi et al., 2013). They illustrate that physical association can facilitate HGT between species, and foreign DNA can be integrated into the recipient genome via recombination (Bock, 2010). This mechanism of HGT is supported by the observation that large pieces of DNA, and even entire organelles, can be transmitted between plant cells that come into contact with each other (Stegemann and Bock, 2009). Pines are hosts of mistletoe species (Santalales, eudicots); for example, European mistletoe (Viscum album) is known to parasitize pines in the Mediterranean region (Zuber and Widmer, 2009). In principle, the candidate donor of the angiosperm-type nad5-1 can be deduced from the relative position of the sequences in the phylogenetic tree. However, the low resolution of the nad5-1 sequence tree does not allow for the identification of the candidate donor. Further studies including mistletoe species would help to test whether the angiosperm-type nad5-1 in P. canariensis was acquired from a mistletoe species via direct cell–cell contact. Vector-mediated HGT by fungi, bacterial pathogens or plant viruses has also been proposed in previous studies (Davis et al., 2005; Mower et al., 2010). Mycorrhizal fungi, which are essential for pines’ survival and vitality, are notoriously nonspecific in their host selection and connect many distantly related plants in the same community (Simard et al., 1997). The ‘wood-wide web’ (Simard et al., 1997) could facilitate widespread exchange of DNA across phylogenetic distances spanning all green plants. Thus, it is also possible that the HGT event in P. canariensis was mediated through a fungal bridge between the donor and this species.

In summary, we found a new case of HGT in a gymnosperm: P. canariensis acquired an angiosperm-derived nad5-1 sequence. The foreign copy of nad5-1 has been integrated into the mt genome of P. canariensis and has undergone pseudogenization. The HGT event was dated to no earlier than the late Miocene, after P. canariensis split from its closest relatives. Following the transfer, the foreign nad5-1 copy was probably fixed by drift during the colonization of the Canary Islands by this species. The mechanism of this HGT is unclear, but the transfer most likely occurred via either direct cell–cell contact or external vectors. Future investigation that includes candidate donor angiosperm species in the region would help to identify the origin of the foreign mtDNA segment in P. canariensis and the mechanism of the HGT. Overall, our findings provide new evidence for HGT as an additional source of mt genome variation in gymnosperms, a subject that has been little discussed thus far. Such findings, together with the instances of HGT observed in angiosperms, raise questions about the evolutionary significance of HGT that need to be addressed, for example, the structural and functional consequences of foreign gene integration, and the prevalence and characteristics of HGT in different groups of land plants. Our study provides a new natural system for exploring these questions further.

Data archiving

DNA sequences: GenBank accessions KP141747–KP141754.