Introduction

Plasmids are self-replicating, extrachromosomal, mobile genetic elements of ecological importance: they may confer functions or beneficial traits that enable their hosts to thrive in a given environment and, equally important, they can act as horizontal gene transfer vehicles1, thereby contributing to the spread of genetic information within a microbial community. Thus, plasmids act as an important evolutionary driving force by accelerating genome innovation and allowing the acquisition of evolutionary novelty2.

Numerous studies have addressed bacterial and archaeal plasmids, revealing the typical functions that ensure plasmid replication, mobilization and maintenance, as well as other accessory characteristics3,4,5,6,7,8,9,10,11,12. Only quite recently, however, was it recognized that characterizing them as a whole is necessary to understand their ecological impact in microbial ecosystems13,14,15,16,17,18. In this regard, the total plasmid DNA present in an ecological niche was defined as the “plasmidome”19. Indeed, plasmidome studies based on culture-independent methods substantiated the significance of extrachromosomal genetic elements in different environments, such as the bovine rumen and wastewater treatment plants13,14,15. We have recently reported the study of the plasmidome of the Puquio de Campo Naranja, an extreme environment of the Puna Argentina18. However, despite the ecological relevance of plasmidomes, such studies have only rarely been performed, possibly because of the complexity of sample processing and the need for tailor-made bioinformatics tools for analysis.

The Andean Puna constitutes a large reservoir of Andean Microbial Ecosystems (AMEs) including biofilms, microbial mats, microbialites and endoevaporites, all of which are exposed to multiple extreme conditions such as high UV irradiation and, due to the high altitude, low oxygen pressure, as well as large thermal fluctuations, high dryness, hypersalinity, alkalinity, and high concentrations of heavy metals and metalloids such as arsenic20,21. The highest concentration of As (up to 347 mg L−1 in the summertime) was reported for the Diamante Lake22, which is located inside the caldera of the Galán Volcano (40 km diameter) at an altitude of 4589 m above sea level in the Catamarca province, Argentina (Fig. 1). Physico-chemical parameters include high pH values (9–11), elevated salinity (270 g L−1, 217 mS cm−1), strong UV irradiation (84 W m−2 of UV-AB at noon) and wide day–night temperature ranges (−20 °C to +20 °C)23. Microbial communities facing such an environment need to develop capacities that ensure survival. As stated above, extrachromosomal genetic elements could provide the genes enabling microbes to withstand harsh environmental conditions24. Rascovan et al.25 reported the discovery of a red biofilm consisting of 94% archaeal representatives, dominated by members of the haloarchaea. In the conventional metagenomic studies already carried out on such microbial communities25,26, a high abundance of genes related to anaerobic arsenate respiration (arr) as well as arsenite oxidation (aio) was detected, strongly supporting the assumption that arsenic is used in bioenergetic processes. In addition, genes related to detoxifying mechanisms for the removal of intracellular arsenic were also found, e.g. the acr3 gene and the arsABCRD operon. However, in those studies the impact of plasmids is possibly underestimated, because plasmids are difficult to discriminate from chromosomal DNA owing to their low copy number and correspondingly low proportion of the total DNA.

In this study, we assessed the plasmidome of the red biofilm from Diamante Lake (Fig. 1D) and compared it to the previously reported plasmidome of microbialites from Puquio de Campo Naranja18. In addition, we compared it to the plasmidome from a wastewater treatment plant receiving effluents of the chemical/pharmaceutical industry (WWTP Visp, Switzerland), as this represents an environment similarly loaded with high concentrations of metals15. Potential hosts as well as encoded functions were identified.

Figure 1

(A) Panoramic photograph of the Diamante Lake, located in the Catamarca province, Argentina. (B, C) Submerged microbialites. (D) Red biofilms attached to gaylussite crystals at the bottom of the submerged microbialite.

Materials and methods

Sampling and biomass purification

Samples were aseptically taken from red biofilms attached to gaylussite crystals at the bottom of submerged microbialites in the Diamante Lake, Catamarca, Argentina (26°00′51.04″S, 67°01′46.42″W) in September 2019 (Fig. 1). Microbialites were found at a distance of 2 m from the lake shore. Samples from three randomly chosen sites were taken and pooled to ensure representativeness. The pooled samples were stored in sterile plastic flasks at 4 °C and processed within a week. Permission for sample collection was granted by the Secretaría de Medio Ambiente, Catamarca, Argentina (No. 22935/2016).

Microorganisms were separated from the sample by using the protocol described by Perez et al.18. Pellets obtained from biomass purification were kept at − 20 °C until plasmid DNA extraction.

DNA extractions

The plasmid DNA was isolated by using the Large-Construct kit as recommended by the manufacturer (Qiagen, Hilden, Germany).

In parallel, metagenomic DNA was extracted from red biofilm samples by using the FastDNA Spin kit for soil as recommended by the manufacturer (MP Biomedicals, CA, USA). The extracted metagenomic DNA served as template for 16S rRNA gene amplicon sequencing.

Chromosomal DNA removal

The plasmid DNA was subjected to overnight digestion at 37 °C with exonuclease V (RecBCD) (New England Biolabs, Massachusetts, USA) to remove chromosomal DNA. PCR reactions using universal primers covering the V4 region of the 16S rRNA gene were performed to check for chromosomal DNA contamination. The primers used for bacteria were F5′-CCTACGGGNGGCWGCAG-3′ (Bac_341F) and R5′-GGATTAGATACCCBDGTAGTC-3′ (Bac_785R)27; and for archaea F5′-CCCTAYGGGGYGCASCAG-3′ (Arc_340F) and R5′-ATTAGAKACCCSNGTAGTCC-3′ (Arc_806R)28. The plasmid DNA was purified using the SureClean Plus kit (Bioline, London, UK).
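The primers listed above contain IUPAC degenerate bases (N, W, Y, S, K, B, D). The following minimal sketch shows how such a primer can be expanded into a regular expression and how many concrete sequences it represents. The function names are hypothetical and chosen for illustration only; this is not part of the published workflow.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer: str) -> "re.Pattern[str]":
    """Compile a degenerate primer into a regex over the letters A/C/G/T."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        # A single option is a literal base; several options form a character class.
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))


def degeneracy(primer: str) -> int:
    """Number of distinct non-degenerate sequences the primer represents."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


# Forward primer sequences as given in the text.
BAC_341F = "CCTACGGGNGGCWGCAG"
ARC_340F = "CCCTAYGGGGYGCASCAG"

if __name__ == "__main__":
    for name, seq in (("Bac_341F", BAC_341F), ("Arc_340F", ARC_340F)):
        print(name, degeneracy(seq), primer_to_regex(seq).pattern)
```

Matching against a template strand would additionally require the reverse complement for the reverse primers, which is omitted here for brevity.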

Sequencing, quality control and assembly

Illumina shotgun paired-end sequencing libraries were generated from isolated plasmid DNA using the Nextera XT sample preparation kit as recommended by the manufacturer (Illumina, CA, USA). The MiSeq system together with the MiSeq reagent kit version 3 (600-cycle) was used for the plasmidome sequencing as recommended by the manufacturer (Illumina). The quality control of raw sequence reads was carried out with FastQC v0.11.9 and the reads were quality-filtered using Trimmomatic v0.38.029. Finally, the reads were de novo assembled using SPAdes software v3.9.0 with the --meta parameter to call the metaSPAdes module30. The Recycler algorithm was used to assemble cyclic sequences, which are likely plasmids, phages and other circular elements, from the assembly graphs provided by SPAdes31. The bioinformatic analysis pipeline described by Kothari et al.17 was also used to identify the complete closed circular contigs. The circular elements obtained in both cases were compared to DoriC 10, a database of replication origins in prokaryotic genomes including chromosomes and plasmids32.
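As an illustration of how circularity can be inferred from a linear contig, the sketch below applies a common heuristic: a contig assembled from a circular molecule often carries the same short sequence at both ends. This is an assumption made for illustration; it is neither the graph-based method implemented in Recycler nor necessarily the procedure of the pipeline cited above, and the function names and default thresholds are hypothetical.

```python
def terminal_overlap(seq: str, min_len: int = 20, max_len: int = 200) -> int:
    """Length of the longest prefix of seq that is repeated as its suffix.

    Only overlaps between min_len and max_len bases are considered.
    Returns 0 when no such overlap exists.
    """
    seq = seq.upper()
    upper = min(max_len, len(seq) // 2)
    # Try the longest candidate first so the first hit is the longest overlap.
    for k in range(upper, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def trim_if_circular(seq: str, min_len: int = 20, max_len: int = 200):
    """Return (sequence, is_circular); the duplicated end is removed if found."""
    k = terminal_overlap(seq, min_len, max_len)
    if k == 0:
        return seq, False
    return seq[:-k], True


if __name__ == "__main__":
    head = "AAAAACCCCCGGGGGTTTTTACGTA"   # 25-base toy sequence
    toy = head + "CATG" * 10 + head      # same 25 bases at both ends
    trimmed, circular = trim_if_circular(toy)
    print(circular, len(toy), len(trimmed))
```

Low-complexity or repeat-rich contigs can produce false positives with this heuristic, which is one reason to cross-check candidates against a replication-origin database as described above.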

Bioinformatic analysis

The reads generated by sequencing of the plasmid DNA were aligned with the metagenome contigs using the Bowtie2 tool33. Metagenome contigs were assembled using SPAdes v3.9.0 from sequencing data of three independent red biofilm samples taken on another occasion and published by Saona et al.26.
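The fraction of plasmidome reads that align to the metagenome contigs can be derived from the SAM output of an aligner such as Bowtie2. The sketch below counts primary records whose FLAG field does not carry the "unmapped" bit (0x4), skipping secondary (0x100) and supplementary (0x800) records; these bit values are defined by the SAM format. The function name is hypothetical, and the authors may have used a different way of computing the alignment rate, for example the summary printed by the aligner itself.

```python
from typing import Iterable


def aligned_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of primary read records that are aligned, from SAM text lines."""
    total = 0
    aligned = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Ignore secondary and supplementary alignments so each read end counts once.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            aligned += 1
    return aligned / total if total else 0.0


if __name__ == "__main__":
    # Toy records for illustration only; not data from this study.
    toy_sam = [
        "@HD\tVN:1.6",
        "r1\t0\tctg1\t1\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
        "r3\t16\tctg2\t7\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
        "r3\t256\tctg3\t9\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
    ]
    print(aligned_fraction(toy_sam))
```

For paired-end data each mate appears as its own record, so the value is a per-mate fraction rather than a per-fragment one.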

Annotation and labeling of all relevant genomic features on the plasmidome contigs were done with Prokka v1.14.534. The assembled plasmidome dataset was submitted to the MG-RAST server35 for functional and taxonomic analysis. Comparisons with the SEED subsystem database were performed using a maximum E-value of 10⁻⁵. The deduced functional profile of the red biofilm plasmidome was compared with the one derived from the metagenome mentioned above26 using the software STAMP (Statistical Analysis of Metagenomic Profiles)36. Comparisons with other plasmidomes were also performed15,18.

Both the known plasmid sequences from the NCBI database and the domains related to plasmid replication and mobilization were assessed as described previously by Perez et al.18. In addition, the plasmidome contigs were compared to the TADB 2.0 database by blastn in order to identify toxin–antitoxin (TA) systems37. Furthermore, the Prokka annotation file of the plasmidome was subjected to Conditional Reciprocal Best BLAST (crb-blast) against plasmid gene sequences from the ACLAME database38 with an E-value ≤ 10⁻³. Hits with an identity ≥ 70% and an alignment coverage ≥ 90% were selected. Similarly, putative genes encoding metal resistance and virulence factors were searched using the BacMet39 and VFDB databases40, respectively.
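The selection thresholds stated above (identity ≥ 70%, alignment coverage ≥ 90%, E-value ≤ 10⁻³) can be expressed as a simple filter over tabular hits. The sketch below assumes a tab-separated table with the columns qseqid, sseqid, pident, length, qlen and evalue, and it assumes that coverage means alignment length divided by query length; neither the column layout nor this coverage definition is stated in the text, and crb-blast writes its own output format.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identical positions
    coverage: float  # percent of the query length spanned by the alignment
    evalue: float


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated hits: qseqid, sseqid, pident, length, qlen, evalue."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        q, s, pident, length, qlen, evalue = line.rstrip("\n").split("\t")[:6]
        # Assumed definition: alignment length relative to query length.
        coverage = 100.0 * int(length) / int(qlen)
        yield Hit(q, s, float(pident), coverage, float(evalue))


def select_hits(hits: Iterable[Hit], min_identity: float = 70.0,
                min_coverage: float = 90.0, max_evalue: float = 1e-3) -> List[Hit]:
    """Keep hits that satisfy all three thresholds."""
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    ]


if __name__ == "__main__":
    # Toy rows for illustration only; not results from this study.
    rows = [
        "geneA\tplasmid1\t85.0\t95\t100\t1e-20",
        "geneB\tplasmid2\t65.0\t100\t100\t1e-30",
        "geneC\tplasmid3\t99.0\t50\t100\t1e-10",
    ]
    for hit in select_hits(parse_hits(rows)):
        print(hit.query, hit.subject, hit.identity, hit.coverage)
```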

Because of the high arsenic concentration found in the lake22, arsenic resistance-related genes were annotated separately. For this purpose, the amino acid sequences were downloaded from UniProt and subjected to Position-Specific Iterated BLAST (PSI-BLAST)41. CD-HIT v4.8.1 was used to create non-redundant datasets42 and Clustal Omega v1.2.4 for sequence alignments43. Profile hidden Markov models were built and searched against the translated gene sequences of the plasmidome identified with Prokka by using HMMER 3.3 (cut-off E-value < 10⁻³)44.

The Resistance Gene Identifier (RGI) software was employed for prediction of antibiotic-resistance genes using the Comprehensive Antibiotic Resistance database (CARD)45 as a reference.

The ISEScan software pipeline was used to search for mobile elements such as insertion sequences46, and the HMM profiles downloaded from the TnpPred web server47 were used with HMMER 3.3 to predict prokaryotic transposases.

Amplicon sequencing and taxonomic analysis

16S rRNA gene amplicon sequencing was performed using the above-described primers, which partially cover the 16S rRNA gene sequence. The MiSeq system together with the MiSeq reagent kit version 3 (600-cycle) was used for sequencing of the amplicons as recommended by the manufacturer (Illumina). Data quality control and analysis were performed using the QIIME software48. First, paired-end reads were joined with PEAR v0.9.649. Quality-filtering was performed using the split_libraries_fastq.py script. Forward and reverse primers were removed using cutadapt v1.1650. USEARCH v1151 was used for zero-radius operational taxonomic unit (zOTU) determination. Taxonomy was assigned against the SILVA 132 database52.

Results and discussion

Sequencing and assembly output

Illumina sequencing of the Diamante Lake plasmidome generated 1,071,941 paired-end reads, of which 941,587 passed quality-filtering. The SPAdes assembler produced 13,492 contigs (> 500 bp) corresponding to roughly 16.9 Mb (largest contig 20,415 bp) (Supplementary Table S1). This assembly is smaller than the one previously reported for a similar extremophile community from Puquio de Campo Naranja (135,813 contigs, 127.9 Mb)18.
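Summary figures of this kind (number of contigs above a length cutoff, total assembled length, largest contig) can be recomputed from an assembly FASTA file. The following is a minimal sketch with hypothetical function names; the strict "> 500 bp" cutoff follows the wording above, and the toy input is illustrative only.

```python
from typing import Dict, Iterable


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name to its sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def assembly_summary(lengths: Dict[str, int], min_len: int = 500) -> Dict[str, int]:
    """Count, total length and largest length of contigs longer than min_len."""
    kept = [n for n in lengths.values() if n > min_len]
    return {
        "contigs": len(kept),
        "total_bp": sum(kept),
        "largest_bp": max(kept, default=0),
    }


if __name__ == "__main__":
    # Toy FASTA for illustration only; real input would be the assembler output.
    toy = [">c1", "A" * 300, "C" * 300, ">c2", "G" * 500, ">c3 extra", "T" * 1200]
    print(assembly_summary(read_fasta_lengths(toy)))
```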

Thirty-nine closed replicons were predicted by the Recycler software (> 1000 bp), the largest consisting of 3313 bp, but none displayed a known plasmid origin of replication when compared to the DoriC 10.0 database. The bioinformatic pipeline used to detect circularity produced 20 circular contigs. The largest comprised 8295 bp and the smallest 2025 bp. Only one of them showed a known plasmid origin of replication (87% similarity, alignment length of 70 nt), corresponding to pSN found in Haloterrigena thermotolerans strain H13 (DoriC ID: pORI00000477). Exclusively hypothetical proteins could be annotated from the open reading frames present in the circular elements.

Functional analysis

MG-RAST analysis revealed that the Diamante Lake plasmidome contains a large fraction of unknown DNA: 18,314 sequences (41.5%) code for predicted proteins with known functions, whereas 25,800 sequences (58.5%) encode putative proteins of unknown function. Thus, such deep-sequencing approaches are suited to detect novel proteins with so far undiscovered functions. In our previous plasmidome analysis from an AME, 39% of the predicted proteins could not be functionally annotated18. The difference between the two environments, which display rather similar characteristics, is possibly due to the large proportion of archaea present in the Diamante Lake, as archaeal genomes typically contain a higher fraction of “dark matter” than bacterial genomes. The isolation and cultivation of most archaea, and accordingly the experimental characterization of archaeal gene products, is challenging53. Likewise, Sentchilo et al.15 reported 52 and 66% of coding sequences without assigned function in two wastewater treatment plant plasmidomes.

From the functional SEED assignment, only 5196 predicted proteins were annotated (28.4%), most of them covering basic metabolic functions such as DNA, RNA and protein metabolism (Fig. 2A). It is noteworthy that among the proteins involved in DNA metabolism those related to DNA repair were rather diverse (Fig. 2B). It is well known that DNA repair plays a key role as an adaptive mechanism to withstand the high UV irradiation in the Andean Puna54,55,56,57,58.
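The percentages quoted in this section can be checked with a few lines of arithmetic. The sketch below assumes that the 28.4% of SEED-annotated proteins is expressed relative to the 18,314 sequences with known function; the text does not state the denominator explicitly, but this reading reproduces the reported value.

```python
# Counts as reported in the text.
known_function = 18_314
unknown_function = 25_800
seed_annotated = 5_196

total = known_function + unknown_function  # all predicted protein sequences

pct_known = round(100 * known_function / total, 1)
pct_unknown = round(100 * unknown_function / total, 1)
# Assumed denominator: sequences with known function (not stated in the text).
pct_seed = round(100 * seed_annotated / known_function, 1)

if __name__ == "__main__":
    print(total, pct_known, pct_unknown, pct_seed)
```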

Figure 2

Predicted functional profile of the Diamante Lake plasmidome. (A) Number of annotated proteins in each of the major SEED subsystems using a maximum E-value of 10⁻⁵. SEED subsystems with fewer than 50 hits were grouped into “Others”. (B) On top, “DNA Metabolism” subsystem level 2 classification of the SEED database. On the bottom, “DNA Repair” subsystem level 3 classification of the SEED database (E-value ≤ 10⁻⁵).

Although on a smaller scale, the subsystems "Stress Response" and "Virulence, Disease and Defense" were also represented (Fig. 2A). With respect to stress response, predicted proteins involved in the response to oxidative stress were most frequently found (57.4%) (Supplementary Fig. S1). Like the DNA repair mechanisms, oxidative stress response systems help to protect microorganisms from UV-mediated damage. Within "Virulence, Disease and Defense", 82.4% of the assignments corresponded to the subsystem “Resistance to antibiotics and toxic compounds”, with arsenic resistance systems prevailing (67.4%). The arsenic resistance assignments included predicted proteins for an arsenate reductase (ArsC), an arsenical pump-driving ATPase (ArsA), an arsenical resistance operon trans-acting repressor (ArsD) and an arsenical-resistance protein ACR3 (Fig. 3).

Figure 3

Predicted functional profile of the Diamante Lake plasmidome. On top, “Virulence, Disease and Defense” subsystem level 2 classification of the SEED database. In the middle, “Resistance to antibiotics and toxic compounds” subsystem level 3 classification of the SEED database. On the bottom, “Arsenic resistance” subsystem level 4 classification of the SEED database (E-value ≤ 10⁻⁵).

The unique characteristics of this environment, such as its location at a high altitude, the exposure to extreme conditions and the peculiarities of its microbial composition, which includes a major proportion of archaea, may explain the relatively few functional annotations during grouping into SEED categories.

Functional comparison between plasmidomes

The predicted functional profile of the Diamante Lake plasmidome was compared to the one derived from the Puquio de Campo Naranja plasmidome18. The “RNA Metabolism”, “DNA Metabolism”, “Phages, Prophages, Transposable elements, Plasmids” and “Cell Division and Cell Cycle” subsystems were more frequently represented in Diamante Lake than in Puquio de Campo Naranja, while “Carbohydrates”, “Cell Wall and Capsule”, “Clustering-based subsystems”, “Stress Response” and “Respiration” were more abundant in Puquio de Campo Naranja. No significant differences in the abundances of other subsystems were observed, suggesting a certain degree of similarity between the predicted functional profiles of the two AMEs (Fig. 4).

Figure 4

Predicted functional profiles derived from the Diamante Lake plasmidome (red bars) and the Puquio de Campo Naranja plasmidome (orange bars). Percent SEED categorizable protein-encoding genes and pairwise proportional differences calculated using STAMP. Fisher’s exact test was used and corrected P-values were calculated using Storey’s FDR. Only the statistically significant SEED subsystems are shown (q < 0.05).

Diamante Lake is known to be among the aqueous environments displaying high arsenic concentrations22,25,59, which is consistent with the repeatedly reported presence of arsenic resistance genes. Proportional differences between “Resistance to antibiotics and toxic compounds” categorizable protein-coding genes belonging to the “Virulence, Disease and Defense” SEED subsystem showed that arsenic resistance is more abundant in the Diamante Lake plasmidome than in that of the Puquio de Campo Naranja (Fig. 5). In addition, the comparison of the former with the plasmidome of a wastewater treatment plant containing effluents from chemical/pharmaceutical industries (WWTP Visp, Switzerland)15 revealed that arsenic resistance traits were again more abundant in the Diamante Lake plasmidome (Fig. 5).

Figure 5

Predicted functional profiles derived from the Diamante Lake plasmidome (red bars), the Puquio de Campo Naranja plasmidome (orange bars) and the one derived from the wastewater treatment plant plasmidome in Visp, Switzerland (green bars). Percent “Resistance to antibiotics and toxic compounds” categorizable protein coding genes belonging to the “Virulence, Disease and Defense” SEED subsystem, and pairwise proportional differences calculated using STAMP. Fisher’s exact test was used and corrected P-values were calculated using Benjamini–Hochberg’s FDR. Only the statistically significant SEED subsystems are shown (q < 0.05).
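The statistical comparison named in this caption combines a per-subsystem Fisher's exact test with Benjamini–Hochberg correction of the resulting P-values. The sketch below is a plain-Python illustration of both steps, not STAMP's implementation: it assumes the two-sided P-value is the sum of the probabilities of all tables that are no more likely than the observed one, which may differ from the convention STAMP applies. The function names are hypothetical.

```python
from math import comb
from typing import List, Sequence


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    For a subsystem comparison, a and c would be the hits to the subsystem
    in the two profiles, and b and d the remaining hits in each profile.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are as likely as, or less likely than, the observed one.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy table for illustration only; not counts from this study.
    print(fisher_exact_two_sided(3, 1, 1, 3))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A subsystem would then be reported as significant when its adjusted value falls below the chosen threshold (q < 0.05 in the caption above).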

Plasmid-purification advantage

Plasmids usually represent only a small fraction of the total DNA in a given environment, owing to their low rate of occurrence and low copy number. Hence, although they are incidentally captured by conventional metagenomic sequencing, experimental plasmid purification prior to sequencing allows an analysis that specifically targets plasmid populations in a culture-independent manner, ideally without losing information60. Our results are in line with this notion, as only 52% of the plasmidome reads aligned with the metagenome contigs of Saona et al.26 described above. The same applies to the Puquio de Campo Naranja plasmidome, for which alignment reached only 30%18. Thus, our study supports the view that plasmid purification prior to sequencing is better suited to comprehensively assess the ecological importance of plasmid-borne sequences.

The pairwise comparison aiming at distinguishing the plasmid gene pool from the metagenomic one (Fig. 6) accordingly revealed that “Phages, Prophages, Transposable elements, Plasmids” and “Membrane Transport” subsystems are more frequently represented in the plasmidome.

Figure 6

Predicted functional profiles derived from the plasmidome (red bars) and the metagenome (blue bars) of Diamante Lake red biofilm. Percent SEED categorizable protein-encoding genes and pairwise proportional differences calculated using STAMP. G-test (w/Yates’) was used and corrected P-values were calculated using Benjamini–Hochberg’s FDR. Only the statistically significant SEED subsystems are shown (q < 0.05).

Plasmid backbone functions: replication, mobilization and maintenance

In order to identify plasmid-like traits within the plasmidome, we focused on the search for Pfam domains related to plasmid replication and MOB-type relaxase families, which are related to plasmid mobilization.

RHH_1 and DUF1424 were detected as the main Pfam domains related to plasmid replication in the Diamante Lake plasmidome, followed by RepL (Table 1). Likewise, the RHH_1 and RepL protein families were the most abundant in the plasmidome from Puquio de Campo Naranja18. This was not the case for DUF1424, a family of archaeal proteins that seems to be present exclusively in Halobacterium and Haloferax species. Although the function of this family is unknown, its members are probably Rep proteins, given the presence of conserved functional motifs61,62.

Table 1 Plasmid replication-related Pfam in the Diamante Lake plasmidome.

Rep_1 and Rep_3 are the major families of replication initiation proteins. They have been reported among the most abundant in plasmidomes from wastewater treatment plants and a rat cecum15,16,63,64. In this study, domains belonging to the Rep_1 and Rep_3 families were also detected, but with a lower hit rate. Domains of replication initiation proteins from other known families were not detected (Table 1). Possibly, there are replication systems for which the molecular details and the mechanisms are currently unknown, particularly as most of the contributing microorganisms were not cultured and because of the taxonomic composition dominated by specific taxa, i.e. halobacteria25.

The most abundant relaxase families in the plasmidome were MOBC and MOBP, for which 41 and 18 protein domain matches, respectively, were counted. MOBT, MOBV and MOBM were also present (3, 2 and 1 protein domain matches, respectively) (Table 2). Mobilization elements have been reported in most of the previous plasmidome analyses14,15,16,63; however, the classification into relaxase MOB families proposed by Garcillán-Barcia et al.65,66 was not performed in those studies. In comparison, 29 protein domain matches were counted for the MOBT family in the Puquio de Campo Naranja plasmidome18, and Kothari et al.17 reported the MOBQ and MOBP families as the most abundant in circular plasmids from groundwater plasmidomes.

Table 2 Relaxase MOB families in the Diamante Lake plasmidome.

In addition to the above plasmid replication and mobilization entries, sequences harboring genes involved in plasmid maintenance such as loci corresponding to toxin–antitoxin (TA) systems were identified (identity and coverage at least 85%). All of the TA systems belong to type II TA-loci (Supplementary Table S2). Only a single complete system could be annotated, i.e. a toxin with the respective antitoxin (T2787-AT2787), both located in the same contig (NODE_2880). It corresponds to the VapBC family, where the toxin is a PIN-domain ribonuclease (145 aa) and the antitoxin is a transcription factor (98 aa)67. Interestingly, this system is known from Haloquadratum walsbyi DSM 16790, a halophilic archaeon that was isolated from a solar saltern in Brac del Port (Alicante, Spain), and it was found to dominate most of the thalassic NaCl-saturated environments68.

In the previous plasmidome studies, TA systems were not taken into consideration. Only Kothari et al.17 reported the YoeB-YefM and RelE/StbE-RelB/StbD type II TA systems in some circular plasmids from groundwater plasmidomes. YoeB and RelE are ribosome-dependent RNase toxins that bind directly to the A site of the ribosome, where they cleave ribosome-associated mRNA69.

Plasmid accessory functions: antibiotic resistance and arsenic resistance

Sequence analysis of the plasmidome from the Puquio de Campo Naranja revealed that antibiotic resistance traits are widespread in this extreme pristine environment, as 123 putative antibiotic resistance genes (ARGs) were annotated18. In the present study, only 8 ARGs could be classified, conveying resistance to 10 drug classes, among them macrolides, carbapenems, cephalosporins and penams (Supplementary Table S3). This marked difference in the number of ARGs between similar extreme environments is probably due to differences in the composition of the local microbial communities. The metabolic processes and the cell walls of bacteria and archaea differ significantly, which explains why a number of antibiotics are effective against the former but do not threaten the latter70. Moreover, because of its clinical relevance, the study of antibiotic-resistance mechanisms has focused far more on the bacterial domain, as pathogenic archaea have not yet been identified71. So far, only a relationship between the severity of periodontal disease and the relative abundance of the archaeon Methanobrevibacter oralis has been reported72.

The bias necessarily introduced by the existing databases, which are built from the information currently available, as well as the dominance of archaea in the microbial community studied, interferes with the analysis of the resistome encoded by the Diamante Lake plasmidome. Thus, it cannot be excluded that the lack of relevant knowledge is the reason for the low number of identifiable ARGs and virulence factors.

Regarding resistance to metals, a search in the BacMet database produced no hits, possibly also for the above reason, as the database consists solely of bacterial entries. However, our manual annotation disclosed the presence of arsenic resistance genes. Arsenic is toxic to a wide range of microorganisms; however, several bacteria and archaea possess detoxification systems enabling growth even under high As concentrations73. The most common resistance system is encoded by the ars operon, for which different genetic organizations have been described among prokaryotes74. In addition to the extrusion systems, composed of the gene products encoded by arsA, arsB, arsC, arsD, arsR and acr3, another mechanism involving a putative arsenite(III)-methyltransferase (ArsM) was reported in Halobacterium sp. NRC-175. We identified 28 proteins possibly related to arsenic resistance; ten of them had been automatically annotated as "hypothetical proteins" by Prokka (Supplementary Table S4). Hence, automatic annotation bears the risk of less accurate results or of detecting fewer genes than actually exist. It should be emphasized that genes enabling microorganisms to conduct anaerobic arsenate respiration (arr) and arsenite oxidation (aio) were not detected in the plasmidome, as these are usually encoded by the chromosome.

As already mentioned, the genetic organization of the ars operons can vary among diverse microorganisms. In the Diamante Lake plasmidome, arsC (arsenate reductase) and acr3 (arsenite efflux transporter) genes were present twice in close proximity, but the most relevant was the arsADR gene cluster of contig 116 (Supplementary Table S4). The genetic arrangement agrees with that described for pHLAC01 and pNRC100 of Halorubrum lacusprofundi ATCC 49239 and Halobacterium sp. NRC-1, respectively (Supplementary Fig. S2). In all cases, the arsDA and arsR genes are transcribed in opposite directions and the absence of the arsenite transporter ArsB-encoding gene is noticeable. This otherwise unusual operon structure is apparently characteristic for the haloarchaea74,75.

Plasmid databases: NCBI and ACLAME

Sequences belonging to 24 megaplasmids described in 13 strains of halophilic archaea isolated from different saline environments were found (Table 3). Most of the matches were with the plasmid of Halobacterium sp. DL1 (315 kb), which was isolated from a freshwater pond (NZ_CP007061.1). Fourteen matches were detected between the plasmidome sequences and the sequences of the plasmid pHLAC01 (431 kb) of Halorubrum lacusprofundi ATCC 49239 that was isolated from the Deep Lake, a hypersaline Antarctic site. It is noteworthy that one of these matches (identity 92%, length 4123 bp) corresponds to contig 116 of the plasmidome, which harbors the arsDA genes (Supplementary Fig. S3). Thirteen sequences match with plasmid pHTIA (330 kb) of Halorhabdus tiamatea SARL4B, which was isolated from the Shaban deep-sea hypersaline anoxic lake in the Red Sea76. Thus, such plasmid sequences are evidently preserved in different high salinity environments.

Table 3 Plasmid matches of the Diamante Lake plasmidome with entries derived from the NCBI database.

When the red biofilm plasmidome genes annotated by Prokka were compared with plasmid genes of the ACLAME database, 125 matches corresponded to genes of 11 megaplasmids from five different halophilic archaeal strains and one actinobacterial strain (Rhodococcus sp. RHA1) (Supplementary Table S5). Most of them were related to DNA metabolism, transposition and recombination. Genes related to arsenic resistance, DNA repair and plasmid partitioning were also identified. Thus, a function could be attributed to many, but not all, of the hypothetical proteins identified by Prokka. In any case, the comparison further supports the plasmid-purification approach, as it confirmed the presence of known plasmid-associated genes in our plasmidome dataset.

Mobile genetic elements: transposases and insertion sequences

A total of 532 insertion sequences (IS) were identified in the Diamante Lake plasmidome, with IS200/IS605 and IS5-like elements being the most frequent, followed by members of the IS4, IS6, IS630 and ISH3 families (Supplementary Table S6). The first five are the main IS families found in Halobacteria, which is the dominant class of the studied community77. Most archaeal IS fall into families also detected in Bacteria, while others, such as members of the ISH3 family, are restricted to Archaea78. Two new potential IS not attributable to any of the known families were identified as well (GenBank accessions OK172335 and OK172336). In the Puquio de Campo Naranja plasmidome, a much lower number of IS (28) was reported. Again, most of them were assigned to the IS5, IS630 and IS4 families18. The absence of Tn3 family transposases in both plasmidomes is conspicuous, as this family is one of the most abundant in bacterial genomes, and Tn3 elements preferentially transpose into plasmids79.

The presence of so many IS elements is in line with the notion that the plasmidome substantially contributes to genome evolution as well as adaptation processes by facilitating the acquisition of novel genes and beneficial traits80,81.

Taxonomic analysis

Although plasmids can be transferred between different microorganisms, the taxonomic assignment of the plasmidome contigs allows an estimation of potential hosts. Eighty-eight percent of the contigs were assigned to Archaea, 11% to Bacteria and the remaining 1% to Eukarya. Among the Archaea, the phylum Euryarchaeota dominated (99.85%).

When the phylum distribution of the plasmidome was compared with 16S rRNA sequencing data from the corresponding metagenomic DNA sample, the phylum Euryarchaeota again stood out with the highest relative zOTU abundance (Fig. 7A). In both, the class of the halobacteria dominated. With respect to the Bacteria, the Proteobacteria (36%), the Firmicutes (22.3%) and the Actinobacteria (10.5%) comprised most of the contigs assigned in the plasmidome. Also, in the metagenome, the Proteobacteria (66.9%) and the Firmicutes (17.3%) account for the phyla with the highest relative zOTU abundance with the Bacteroidetes, however, ranking third (10.9%) (Fig. 7B). Notably, plasmid contigs of Actinobacteria, Chloroflexi and Deinococcus-Thermus were obtained, whereas the 16S rRNA analysis did not disclose members of these phyla. A possible explanation might be horizontal transfer of plasmid-borne genes between bacterial phyla, or the existence of plasmids with a wide host range. On the other hand, the 16S rRNA analysis indicates the presence of members of the phylum Halanaerobiaeota but no plasmidome contig could be assigned to the latter, which is possibly due to a bias in plasmid databases when a phylum is not well represented or to the absence of plasmids within the taxon.

Figure 7

Red biofilm taxonomic analysis from Diamante Lake. Red bars show the relative abundance of each archaeal (A) or bacterial (B) phylum in the plasmidome by MG-RAST analysis using similarity to the RefSeq database (E-value ≤ 10⁻⁵). Blue bars show the relative abundance of each archaeal (A) or bacterial (B) phylum by metagenomic DNA analysis using 16S rRNA gene amplicon sequencing. Phyla with abundance less than 1% in both datasets were not included.

Conclusions

It is currently possible to study plasmid elements in the course of a conventional metagenomic analysis, but an approach that specifically targets plasmid populations makes it possible to overcome the inherent constraints of the bioinformatic tools applied to the analysis of plasmids from total community DNA. From this perspective, and based on the comparison with the metagenome of the same community, this study showed that part of the plasmid information is not detected when experimental plasmid purification is not carried out prior to sequencing. Furthermore, a large fraction of genes with unknown function was present in the plasmidome dataset, as at least 58.5% of the predicted proteins were hypothetical. In addition, the percentages of SEED assignments were even lower. The relatively few functional annotations may be explained by the peculiarities of the extreme environment, which harbors a microbial community dominated by archaea. On the other hand, functions related to the response to oxidative stress and DNA repair were annotated, which agrees with the requirement for adaptive mechanisms enabling the hosts to withstand the high UV irradiation in the Andean Puna.

Comparison of the Diamante Lake plasmidome to that of Puquio de Campo Naranja revealed a certain degree of similarity between the predicted functional profiles of both AMEs. However, striking differences with respect to antibiotic and arsenic resistance were detected. Sequences pointing to arsenic resistance are more abundant in the Diamante Lake plasmidome, which also holds in comparison with the plasmidome derived from a wastewater treatment plant that receives large quantities of effluents of the chemical/pharmaceutical industry. Our results reflect the high amount of arsenic present in the environment under investigation. Traits expected to be found in a plasmid pool were detected, such as Pfam domains related to plasmid replication, MOB-type relaxase families related to plasmid mobilization and genes belonging to type II toxin–antitoxin systems related to plasmid maintenance. Moreover, the dataset contains sequences known from megaplasmids of halophilic archaea isolated from different saline environments, which provides further evidence for known plasmid-associated genes in the obtained dataset.

The results presented here, along with the detection of numerous IS elements, favor the view that the plasmidome facilitates the mobility and the transfer of genes within such extreme microbial communities.