Introduction

Synechococcus is an important group of Cyanobacteria that contributes to global biogeochemical cycles1. It offers an attractive system to explore bacterial taxa relationships, distribution, coexistence, ecology, and evolution2. The widespread distribution of this group can be attributed to its high degree of genetic diversity3. Synechococcus comprises some of the major forms of Cyanobacteria that inhabit marine and freshwater environments3. Although its genetic diversity has been documented for these ecosystems, there is scant knowledge of the biodiversity, abundance, and distribution of this genus in coastal lagoons. Coastal lagoons are highly productive and valued ecosystems that are morphologically and ecologically complex, and they are subject to more variable environmental conditions than the open sea. Due to their relative isolation from the sea and their location within a hydrological catchment, these lagoons are more susceptible to changes in physicochemical parameters, such as increasing salinity, decreasing nutrient availability and concentration, and changes in light spectral intensity. These factors influence the adaptation strategies of photosynthetic microorganisms, including Synechococcus species.

The emergence of next-generation sequencing approaches has granted astonishing insight into microbial biodiversity. Notably, the use of the 16S rRNA gene, the gold-standard marker gene, now enables microbial diversity assessment across the globe in distinct seasons and locations under different environmental conditions4. Recently, the use of high-resolution methods, such as oligotyping, has allowed researchers to investigate unexplained diversity within operational taxonomic units and uncover ecologically and biologically distinct taxa5. Based on these techniques, seasonal and geographical patterns of Synechococcus strain abundances and distributions have been described in several studies6,7,8,9,10. Some studies reported higher abundances in the summer season6,7,8, whereas other studies have shown temporary blooms in spring or summer under eutrophic conditions9,10. In both cases, correlations between physicochemical parameters such as temperature, salinity, and nitrate concentration and Synechococcus abundances were reported8,10,11,12. Spatial differences were also observed; Synechococcus strains were more abundant in coastal waters than in estuaries. Surprisingly, high Synechococcus cell abundance was observed even in polar oceans, which were thought to be devoid of them13. Twenty Synechococcus clades with different patterns of distribution have been described7. Indeed, many Synechococcus clades, including Clades I and IV, have been observed in temperate or polar waters, as well as coastal and higher latitude regions. The question of consistent co-occurrence (for instance, between Clades I and IV) remains partially unanswered, and the role of environmental parameters remains poorly understood.

Synechococcus strains have been classified into three major subclusters (5.1, 5.2, and 5.3) based on 16S rRNA gene phylogeny. Additional marker genes used to study the diversity of marine Synechococcus were essential for describing a large number of subclades at finer resolution; these include the internal transcribed spacer (ITS), the phycoerythrin alpha-subunit gene (cpeA), the nitrate reductase gene (narB), the global nitrogen regulator gene (ntcA), the ribulose bisphosphate carboxylase large chain gene (rbcL), the DNA-directed RNA polymerase subunit beta gene (rpoC1), and especially the petB gene, which encodes the cytochrome b6 subunit of the cytochrome b6f complex. This latter gene helped identify more than thirty subclades3,4. Furthermore, these markers divided the identified subclusters into more than twenty distinct genetic clades4. Marine Synechococcus clades are significantly diverse in terms of depth, temperature, and nutrient availability requirements. Clade II is found in tropical offshore environments, with 'hot spots' in the nutrient-rich coastal upwelling of Morocco, and causes seasonal blooms in the Red Sea and the Gulf of Aqaba1. Clade III is predominantly found in tropical and subtropical warm waters1, whereas Clades I and IV occur in nutrient-rich, temperate, and cold environments either nearshore or offshore1. Most of these clades (I, II, III, IV, and CRD1) belong to subcluster 5.1. Marine subcluster 5.2 has been observed in nearshore, coastal, and estuarine environments1. Different clades may co-occur in similar ecological niches, with reports of as many as six clades found at once4. Co-occurrence patterns are observed in coastal waters. Clades II and III co-occurred in the California Current during the spring pre-bloom period, while clades I and IV predominated when the bloom itself occurred. Clades V, VI, and X coexist in the Red Sea during transitional periods between mixing and stratification8.
Abundant Synechococcus clades are impacted by limiting factors such as light, nutrient availability, temperature, or viral infections8. These factors might also change across seasons and over time scales of environmental changes, leading to clade coexistence7. When less abundant, they may persist at low but stable levels14,15,16 and serve as reservoirs of genetic diversity8.

In this study, we used 16S rRNA amplicon oligotyping to investigate and compare Synechococcus population diversity and co-occurrence patterns in two Moroccan lagoons, Marchica and Oualidia, during the summers of 2014 and 2015. We hypothesized that the distribution and cooccurrence of Synechococcus oligotypes follow spatiotemporal patterns in lagoonal environments.

Results

Detection of Synechococcus oligotypes in environmental samples

A total of 535,138 16S rRNA amplicon sequence reads were generated from the microbial populations of four samples collected at two stations on the summer solstice, 21st June in 2014 and 2015. The relative Synechococcus read number compared to the total microbial community varied between both sampling sites during both timepoints (Fig. 1 and Table 1).

Figure 1

Cross comparison of 31 Synechococcus oligotypes in Marchica and Oualidia lagoons in 2014 and 2015.

Table 1 V4–V5 Oligotype observation matrix.

We identified 31 Synechococcus oligotypes, which was equivalent to 95% of all Synechococcus reads analyzed. The most abundant representative Synechococcus reads were used for downstream analyses depending on the sampling location and date. Our phylogeny confirmed that Synechococcus strains are classified into ten different clades based on representative V4–V5 16S rRNA sequences (Fig. 2).

Figure 2

Phylogenetic tree constructed from V4–V5 16S rRNA gene amplicon sequence representatives of Synechococcus clade and representative sequences for the most abundant oligotypes, both aligned using Muscle (version 3.8.4,17). Shown is a neighbor-joining rooted tree generated using the Geneious Prime 2021.1 software package (Biomatters Ltd, Auckland, New Zealand).

The identified clades were separated into three distinct subclusters (5.1, 5.3 and 5.2). Ten oligotypes belonged to clade III (subclade 5.1A) (O1, O15, O19, O21, O23, O24, O25, O29, O30, O31), six belonged to clade I (subclade 5.1B) (O6, O8, O9, O14, O16, O26), another six belonged to Clade 5.3 (O2, O5, O11, O20, O22, O27), two belonged to clade IV (subclade 5.1A) (O4, O12), another two belonged to clade VII (subclade 5.1B) (O3, O7), and only one oligotype belonged to clades II (subclade 5.1A) (O28), CB5 (subclade 5.2) (O10), WPC1 (subclade 5.1) (O17), VIII (subclade 5.1B)(O18), and IX (subclade 5.1B) (O13) (Table S1).

The eight abundant oligotypes in our dataset (i.e., represented by > 100 reads) shared more than 95% V4–V5 sequence similarity with each other (Table 2).

Table 2 Percent sequence similarity between V4–V5 representative sequences for each oligotype.
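
A pairwise similarity of the kind reported in Table 2 can be illustrated with a minimal Python sketch that computes percent identity between two already-aligned sequences. The function name, the gap handling (columns that are gaps in both sequences are skipped), and the toy sequences are assumptions made for illustration; the source does not state how the similarity values were computed.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of compared alignment columns at which two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # gap in both sequences: column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical toy alignment (not real oligotype sequences).
toy_a = "ACGTACGTAC"
toy_b = "ACGTACGTTC"
similarity = percent_identity(toy_a, toy_b)  # one mismatch in ten columns
```

Under this definition, a gap facing a nucleotide counts as a mismatch; other conventions exist and would give slightly different values.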

Synechococcus comprised a higher fraction of the microbial population and showed higher relative abundances in Marchica in the summer of 2014 than in 2015. We placed the eight abundant oligotypes within a phylogenetic tree that included known Synechococcus strains (Fig. 2). Table 1 shows higher read counts of Synechococcus oligotypes in the Marchica Lagoon (n = 15,447 in summer 2014 and n = 1,667 in summer 2015) than in Oualidia (n = 26 and n = 14, respectively).

Interestingly, Oligotype 1 was strongly represented in both sampling lagoons (Table 1), where it comprised a larger segment of the overall Synechococcus community. In contrast to the 2014 summer community in Marchica, a few Synechococcus oligotypes decreased in 2015; for instance, O7 changed from 177 Synechococcus reads to not detected, and O8 changed from 119 Synechococcus reads to not detected. In Oualidia, we noticed the presence of clades III (O1), IV (O4), I (O6, O14, O26), and VII (O7) in 2014 in contrast with 2015, where clade VII was absent and only 5 Synechococcus oligotypes were identified: O1, O4, O6, O9 and O14.

Distribution of Synechococcus oligotypes

Although some oligotype distribution patterns within each sampling site clearly displayed some differences over both space and time, many oligotypes were shared as well. Network analysis allowed visualization of the specificity of the oligotypes and how they were distributed in Mediterranean Marchica and Atlantic Oualidia lagoons and further investigation of which factors influenced this distribution (Fig. 3).

Figure 3

A network analysis of the oligotypes present at each sample. Each dot indicates an oligotype present in at least one sample, and each edge on the network connects an oligotype to one or more samples. Colored circles represent oligotypes present in only one, two, three, or all samples.

We identified oligotypes that were either found in one collected sample, shared by two samples, or present in all samples (Fig. 3). Oligotypes in OSD24-2014 accounted for the largest fraction (25/31), whereas oligotypes shared by the four samples made up a small portion of the total number (4/31). Most oligotypes were shared between both 2014 and 2015 samples of Mediterranean Marchica, in addition to some overlap with Atlantic Oualidia. Among all oligotypes, six were exclusively found in Marchica in 2014 (Fig. 3). Furthermore, the distribution of cooccurring oligotypes was clearly different in Marchica and Oualidia during the summers of 2014 and 2015. Cooccurring Synechococcus oligotypes tended to be well connected to each other (Table 2), showing strong correlations (e.g., a correlation coefficient of 0.98 between O4 (clade IV) and O6 (clade I), and of 0.99 between O4 (clade IV) and O8 (clade I)).
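
To make the idea of a correlation between two cooccurring oligotypes concrete, the following Python sketch computes a Pearson coefficient from the abundances of two oligotypes across samples. The choice of Pearson's coefficient is an assumption, since the text does not name the statistic used, and the abundance values are hypothetical rather than taken from Table 1.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical relative abundances (%) of two oligotypes in four samples.
oligo_a = [6.0, 19.0, 8.0, 14.0]
oligo_b = [3.0, 9.5, 4.0, 7.0]  # exactly half of oligo_a in every sample
r = pearson(oligo_a, oligo_b)
```

With only four samples, any such coefficient rests on very few points and should be read as descriptive rather than as strong statistical evidence.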

Environmental variables influencing oligotypes diversity

Following principal component analysis (PCA) of oligotype relative abundance, we observed that physicochemical factors in the lagoons correlated with oligotype co-occurrence patterns. The first principal component (PC1) captured 46% of the variance in oligotype relative abundance and discriminated oligotypes according to nitrate. The second principal component (PC2) explained an additional 27% of the variation and discriminated oligotypes according to salinity and temperature.

The composition shift seen during the summer was supported by the statistical connections between the dominant PCs and environmental variables. Notably, the oligotypes identified in Marchica in 2014 (O2, O3, O4, O5, O6, O7, O9, O10, O14, O15, O16, O17, O26) and in 2015 (O1, O2, O4, O5, O6, O9, O10, O14, O15, O16, O17, O26) correlated with higher temperatures (26 °C and 27 °C, respectively) and salinities (35 ppt in both years), whereas the oligotypes in Oualidia in 2014 (O1, O4, O6, O7, O14, O26) and in 2015 (O1, O4, O6, O9, O14) correlated with lower ones (21 °C and 20 °C; 27 ppt and 29 ppt, respectively). The oligotypes found in both Marchica and Oualidia in 2014 correlated with a higher nitrate concentration (12 mg/l and 10 mg/l, respectively), whereas those observed in 2015 correlated with a lower nitrate concentration (4 mg/l and 2 mg/l, respectively). Furthermore, oligotype O1 in 2014 was spatially isolated and not affiliated with either principal component.

Discussion

In a previous study18, we used bioinformatics tools to analyze the metagenome and the amplicon 16S sequences to gain an insight into microbial diversity in Moroccan lagoons, namely Marchica and Oualidia. 16S rRNA gene classification revealed a high percentage of bacteria in both lagoons. On average, bacteria accounted for 90% of the total prokaryotes in Marchica and ~ 70% in Oualidia. The five phyla that were the most abundant in both lagoons, Marchica and Oualidia, respectively, were Proteobacteria (53.62%, 29.18%), Bacteroidetes (16.46%, 43.49%), Cyanobacteria (0.53%, 34.35%), Verrucomicrobia (1.75%, 15.82%), and Actinobacteria (7.42%, 13.98%). At the genus level, we found that the highest assigned hits were attributed to Synechococcus, which was highly abundant in Marchica (32%) compared to Oualidia (0.07%) in 2014. This amount dropped to 22% in Marchica and 0.04% in Oualidia in 2015. Hence, in this study we performed the analysis of the Synechococcus genus community using oligotyping to investigate their dynamics and understand their co-occurrence and covariation in space and time within fragile ecosystems such as lagoons.

Our results can be divided into two emerging Synechococcus communities: one that dominated in 2014 and another that was less abundant in 2015, each composed of different cooccurring Synechococcus oligotypes. The abundant Synechococcus community in Marchica in 2014 consisted of clades I, 5.3, III, IV, and VII. These clades are typically found in either warmer or more oligotrophic environments19,20. This result is in accordance with Marchica's environmental characteristics; it is an oligotrophic ecosystem with high primary production and warmer water in summer21. The community included clades CB5 and WPC1 in Marchica in 2014 and 2015, when the number of Synechococcus reads was lower. Strains belonging to the CB5 clade lack phycourobilin (PUB), include one motile strain22,23, are present in temperate coastal waters, and are prevalent in polar/subpolar waters24,25,26. WPC1 strains are observed in open-ocean and near-shore waters1,24,27. Clades IV and I usually co-occur and are more prevalent in cold coastal waters19,28,29,30. Interestingly, Clade III was prominent in Marchica. This clade is known to be motile and restricted to warm, oligotrophic water19,20,30. Although at a smaller read number, clade III was also observed in Oualidia, where the temperature is cooler than in Marchica, even though clade III growth has been shown to be severely affected at low temperatures30. Moreover, representatives of both clades I and IV were present in Oualidia in the summers of both 2014 and 2015. Some Synechococcus strains, which are known to prefer cooler water temperatures and salinities, were in higher relative abundance in the waters of Marchica. This result agrees with a previous study showing that Synechococcus isolates of clades I and IV exhibited temperature preferences31. Growth rates of strains from clades I and IV, which were dominant in temperate regions, were marginally lower at low temperatures.

Nitrate levels are typically low or undetectable in these lagoons, which allows the persistence of clades that would not typically thrive in coastal waters at other times of the year. In 2014, the nitrate concentration was higher than the average of 10 mg/l, which could be due to increased agricultural activities and wastewater treatment plant effluent21. The decreasing nitrate concentration in Marchica in 2015 could be explained by the newly installed inlet in 2010, which was designed to improve water exchange with the open sea and reduce the amount of suspended matter21. Temperature and salinity have a large effect on nitrate in marine ecosystems32; the highest nitrate degradation rates were observed at 35 °C and at increasing salinity rates. Therefore, we expected to see correlations between salinity, temperature and nitrate concentrations. Interestingly, clades CB5 in Marchica and IV in Oualidia increased in relative abundance in summer 2015 compared to 2014, when the nitrate concentration decreased. Moreover, the Synechococcus microbial community diversity and density are variables depending on the variations in the physical and chemical parameters. These parameters are strongly influenced by the marine waters passing through the artificial inlets, which have an impact on the internal hydrodynamics of both lagoons and hence the distribution and co-occurrence of Synechococcus strains. In addition, anthropogenic activities also have a great influence on Synechococcales population growth and interactions with their viruses33,34.

This study revealed some differences between Marchica and Oualidia in identified Synechococcus clades. The Marchica lagoon showed more heterogeneity (clades I, II, III, IV, VII, VIII, 5.3, WPC1, CB5, and IX) than the Oualidia lagoon, where fewer clades were identified (I, III, IV, and VII). There was a clear variation in the pattern of correlation between oligotypes of the same or different clades for both the 2014 and 2015 samplings. Furthermore, we observed complex patterns of co-occurrence among oligotypes; in 2014 (clades I, III, IV, 5.3, VII), and in 2015, we found clades CB5 and WPC1. In Oualidia, values decreased in comparison to Marchica in both 2014 and 2015 summer samplings, following a pattern of co-occurrence, especially for both clades I and IV in both sampling years. Many studies have shown that the relative proportions of cooccurring Synechococcus populations to each other at the clade and subclade levels vary in space and time based on environmental factors such as seasonal temperature fluctuations, nutrient availability and upwelling, circulation patterns, and abundance of other phytoplankton8.

We presume that the greater variability in oligotype co-occurrence behavior observed in Marchica Lagoon, especially in the summer of 2014, could be due to the higher abundance and diversity of Synechococcus oligotypes, physico-chemical parameter fluctuations or rehabilitation of the lagoon.

Less abundant oligotypes could also be considered potential bioindicators of Synechococcus genetic diversity. Their seasonal occurrence might contribute to changing ecological and biogeochemical characteristics of the marine environment35. The Synechococcus relative abundance counts revealed that the Marchica Synechococcus community included the least abundant oligotypes in 2015. For instance, O7 and O8 were detected in 2014 and were absent in 2015 (Table 1). It is unclear which factors constrained the relative abundances of these least abundant oligotypes, but temperature and salinity could affect their distribution in Marchica (Fig. 4), with the opposite effect in Oualidia, where the oligotypes are adapted to cooler temperatures. We noticed that the relative abundance of cooccurring Synechococcus was not constant. For instance, oligotype 4, belonging to Clade IV, showed higher values in Marchica in summer 2014 (974 reads) than in summer 2015 (319 reads), and the opposite was observed in Oualidia, with a lower abundance compared to Marchica. Increased values of cooccurring clade I oligotypes (O6, O14, and O26) were detected in the summer of 2014 in both lagoons.

Figure 4

Principal component analysis of Synechococcus oligotype relative abundance. The plot is generated using the relative abundance of each oligotype and three environmental variables (T, temperature; S, salinity; NO3−, nitrate). Each point represents an oligotype. Colors represent the year of sampling: red for 2014 and blue for 2015. The shape of each point indicates the sampling site: round points refer to Marchica lagoon, and triangles refer to Oualidia. Circles represent the normal distribution of oligotypes; the red circle refers to 2014, and the blue one refers to 2015.

In comparing our results with a study from Little Sippewissett Marsh (LSM)8 that used oligotyping to investigate the distribution of the genus Synechococcus in space and time sequencing the V4-V6 hypervariable region of the 16S rRNA gene, we found 31 oligotypes, while they identified 12. In both studies, the proportion of Synechococcus oligotypes increased in summer and in coastal waters compared to estuaries. In addition, Clades I and IV were more abundant in saline conditions, such as Marchica Lagoon. However, these clades were found in greater relative abundances at cold temperatures, in contrast to our study, where they were identified in Marchica's warm waters. Moreover, clade CB5 tended to be prominent at relatively warm temperatures (17–20 °C)6. In our work, it was not prevalent either in cooler or warmer water. Notably, the relative abundance of rare oligotypes was higher in warm hypersaline estuary waters8,18, while in our case study, they occurred in cooler moderately saline Oualidia waters.

The dominance of a certain clade could have many different ecological ramifications, especially as the clades can be incredibly diverse in their growth, loss, nutrient utilization and other attributes. The dominant clade's growth and loss patterns will set the stage for the population dynamics. For instance, if the dominant clade only blooms in a given environmental factor such as temperature, light, or salinity, it will then affect the timing of blooms, and follow-on the effects of subsequent grazing, lysis or even biogeochemical cycling. Even if the population is diverse, the dynamics as a whole will be a composite response of each individual clade's ecophysiology, making it important to understand their composition and how it changes over space and time.

While the rpoC1 gene is a higher resolution diversity marker36, 16S amplicon data can be used for exploring the entire bacterial assemblage including Synechococcus clade designations via oligotyping35. The latter has a great advantage in answering unexplained diversity contained in taxa using 16S rRNA gene sequences. Nevertheless, it has some limitations, as it acts optimally only when performed on taxa that are closely related. Regarding distantly related taxa, the high number of increased-entropy locations makes the supervision steps difficult. In addition, although oligotyping does not rely on clustering conditions or availability of existing reads within reference databases, it demands preliminary operational taxonomic unit clustering to find closely related species appropriate for the analysis. This method is under continuous improvement to better exploit the information within subtle variations in 16S rRNA gene sequences5.

In conclusion, we explored the patterns of Synechococcus diversity in space and time using an oligotyping approach to examine these populations in the lagoon waters of Mediterranean Marchica and Atlantic Oualidia, in Morocco. Patterns that have been observed at the clade and subclade levels, such as Synechococcus relative abundance and the co-occurrence of groups from different clades, were shown to occur among oligotypes. The Marchica Lagoon showed a more heterogeneous Synechococcus diversity than Oualidia in summer 2014. Thirty-one Synechococcus oligotypes were identified. Two distinct communities, one abundant and one rare, emerged in the 2014 and 2015 summer samplings, each comprising cooccurring Synechococcus oligotypes from different clades. Network analysis showed that six oligotypes were exclusive to Marchica Lagoon. The identified clades I, III, IV, VII, and 5.3 in Marchica were in accordance with its environmental characteristics. In addition, the relative abundance of some cooccurring Synechococcus strains was not constant over time and space (e.g., clades I and IV). Using 16S rRNA gene oligotyping, we illustrated some of the challenges associated with identifying novel Synechococcus strains and studying their co-occurrence in space and time. Oligotyping has been instrumental in discriminating closely related Synechococcus strains. However, this study leaves open questions about how samples differ by location and whether locations differ from year to year. Do cooccurring oligotypes interact with each other, and to what extent do they correlate with physicochemical parameters? What triggers the coexistence of clades I and IV with clade III in warm water, or of 5.3 with VII, about which little is known? Finally, how do relative abundances change over seasons?
Hence, future work needs to consider additional stations and seasons to provide better statistical support for our findings and to better understand their correlation with physical and chemical environmental parameters. Other factors were not considered in this study, such as nutrient availability, chlorophyll, irradiance, viral lysis, and greater sequencing depth, which could also influence the observed seasonal dynamics.

Methods

Sampling and sequencing

Samples were collected from Marchica Lagoon (N 35.11562, W 2.52803) and Oualidia Lagoon (N 32.74675, W 9.036667) on June 21st of 2014 and 2015, the boreal summer solstice, as part of the Ocean Sampling Day (OSD) campaign (Fig. 5). Approximately 20 L was collected using a 10% acid-washed bucket and then sequentially filtered onto a 0.22 μm pore size Sterivex and frozen at −80 °C until DNA extraction. Metadata (temperature, salinity, and nitrate) were measured and uploaded to https://github.com/MicroB3-IS/osd-analysis/wiki/Guide-to-OSD-2014-data (accessed on 1 December 2021).

Figure 5

Sampling locations: (A) Marchica Lagoon (Image©2022Google) and (B) Oualidia Lagoon (Image GeoEye from Google Earth, 2014).

DNA was extracted using the Power Water isolation kit (MoBio, Carlsbad, CA, USA) following the manufacturer’s instructions. Amplification of the 16S rRNA gene was performed using the primer pair 515F-Y (5’-GTGYCAGCMGCCGCGGTAA-3’) and 926R (5’-CCGYCAATTYMTTTRAGTTT-3’)37. The Illumina libraries were prepared using the NuGEN Ovation Rapid DR Multiplex System 1–96. Amplicon gene sequencing (2 × 250 bp paired-end) was performed with Illumina MiSeq using V3 chemistry. Samples were sequenced in eight MiSeq runs (2 × 300 bp), which generated 2 × 40,000 amplicon reads per sample.
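
Both primers contain IUPAC degenerate bases (Y, M, R), so each represents several concrete sequences. The Python sketch below, which is illustrative and not part of the study's workflow, converts a degenerate primer into a regular expression and counts the sequences it stands for. The helper names and the test read are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

FORWARD_515F_Y = "GTGYCAGCMGCCGCGGTAA"
REVERSE_926R = "CCGYCAATTYMTTTRAGTTT"


def primer_to_regex(primer: str) -> "re.Pattern[str]":
    """Compile a degenerate primer into a regex matching all its variants."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def count_variants(primer: str) -> int:
    """Number of concrete sequences a degenerate primer represents."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base].strip("[]"))
    return total


forward_pattern = primer_to_regex(FORWARD_515F_Y)
# Hypothetical read start in which Y is C and M is A.
matches_variant = forward_pattern.fullmatch("GTGCCAGCAGCCGCGGTAA") is not None
```

Under this expansion, 515F-Y stands for 4 sequences (one Y and one M) and 926R for 16 (two Y, one M, one R).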

Data processing

Raw sequencing reads were preprocessed as described in the OSD workflow (github.com/MicroB3-IS/osd-analysis/wiki/Sequence-Data-Preprocessing, accessed on December 1, 2021), which produced "workable" amplicon fasta files. We used VAMPS38 to process 16S rRNA gene sequences, where taxonomy assignment was performed using Global Alignment for Sequence Taxonomy (GAST)39 and the SILVA rRNA gene reference database40. The obtained files include the reference ID, the taxonomy assigned, and the source of the taxonomy.

Oligotyping

For Synechococcus investigation, we used 16S rRNA gene oligotyping as described in5. This method is based on a supervised algorithm that identifies microdiversity using 16S rRNA gene sequences. Oligotyping differs from regular taxonomic classification, which relies on the sequences available in reference databases, and from cluster analysis, which relies on the selection of a similarity threshold. This technique tackles the taxonomic resolution limitation by finding the most information-rich nucleotide positions, which together define the oligotypes. Sequences identified as Synechococcus were extracted from the VAMPS database. We aligned Synechococcus reads using PyNAST41. Of the 22,387 sequences identified as Synechococcus, 17,941 remained after quality filtration and PyNAST alignment. The mean length of Synechococcus reads was 254 bp. Next, we removed the uninformative gaps in the resulting aligned sequences using the “o-trim-uninformative-columns-from-alignment” script. Subsequently, we calculated the entropy of each nucleotide position within the oligotype package. After the initial calculation of Shannon entropy using the “analyze-entropy” script, we ran 16S rRNA oligotyping for the Synechococcus genus until each oligotype had converged. Uninformative nucleotide positions were excluded. Seven nucleotide positions were used in total to define each oligotype, and to minimize the impact of sequencing errors on oligotyping results, we used a “minimum substantive abundance” criterion (M) of 5; thus, an oligotype was not included if the most common sequence for that type occurred fewer than five times. To reduce the noise, each oligotype was required to appear in at least one sample but was not required to comprise a certain percentage of reads or represent a minimum number of reads in all samples combined. We removed any oligotypes that did not meet these criteria from the analysis.
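
The two core ideas of this step, per-position Shannon entropy and the minimum substantive abundance filter, can be sketched in a few lines of Python. This is a simplified illustration and not the oligotyping pipeline's own code: the function names, the use of log base 2, and the toy reads are assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the characters in one alignment column."""
    total = len(column)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(column).values())


def alignment_entropy(aligned_reads):
    """Entropy of every column of a set of equal-length aligned reads."""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    return [column_entropy([read[i] for read in aligned_reads])
            for i in range(length)]


def assign_oligotypes(aligned_reads, positions, min_substantive_abundance):
    """Group reads by the bases at the chosen positions.

    An oligotype is kept only if its most common read occurs at least
    min_substantive_abundance times; that read is its representative.
    """
    groups = {}
    for read in aligned_reads:
        oligotype = "".join(read[i] for i in positions)
        groups.setdefault(oligotype, Counter())[read] += 1
    kept = {}
    for oligotype, reads in groups.items():
        representative, top_count = reads.most_common(1)[0]
        if top_count >= min_substantive_abundance:
            kept[oligotype] = representative
    return kept


# Hypothetical toy alignment: only the third column is variable.
toy_reads = ["ACGT"] * 4 + ["ACTT"] * 4
entropies = alignment_entropy(toy_reads)
oligotypes = assign_oligotypes(toy_reads, positions=[2],
                               min_substantive_abundance=3)
```

In the toy alignment, the third column has an entropy of 1 bit and the others 0, so it is the only informative position; raising the threshold to 5 would discard both toy oligotypes, mirroring the M = 5 criterion described above.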
The final number of quality-controlled oligotypes revealed by the analysis was 31 and represented 95% of the total Synechococcus reads. For each oligotype, the oligotyping pipeline chose the most abundant read as the representative sequence to be used for downstream analyses. Upon completion of the oligotyping analysis, the resulting “observation matrices” were concatenated to generate a single “observation matrix” for our V4–V5 dataset. These observation matrices report counts, which are the number of reads assigned to each oligotype in each sample (Table 1). We then converted counts to percent abundances within each sample and used these normalized relative abundances for subsequent analyses. To assign taxonomy, we searched the most biologically relevant representative sequence of each oligotype using blastn version 2.2.26. We kept default parameters, except ‘-perc_identity 100’, so that only hits with 100% sequence identity were reported.
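
The conversion from read counts to within-sample percent abundances is a simple normalization, sketched below in Python. The function name, the sample labels, and the counts are hypothetical and serve only to show the arithmetic.

```python
def to_percent_abundance(counts_by_sample):
    """Convert per-sample oligotype read counts to percent abundances."""
    percents = {}
    for sample, counts in counts_by_sample.items():
        total = sum(counts.values())
        if total == 0:
            # Avoid dividing by zero for a sample without any reads.
            percents[sample] = {oligo: 0.0 for oligo in counts}
        else:
            percents[sample] = {oligo: 100.0 * reads / total
                                for oligo, reads in counts.items()}
    return percents


# Hypothetical observation matrix (sample -> oligotype -> read count).
toy_counts = {
    "sample_1": {"O1": 50, "O2": 30, "O3": 20},
    "sample_2": {"O1": 3, "O2": 1, "O3": 0},
}
percent = to_percent_abundance(toy_counts)
```

Normalizing within each sample makes samples with very different sequencing yields comparable, but percentages from samples with only a handful of reads remain noisy.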

Oligotype network analysis

We performed network analysis using Gephi software (version 0.9.2)42 to determine the distribution of all Synechococcus oligotypes from both lagoons using a force-directed graph algorithm (ForceAtlas2 in Gephi software). Every dot identifies an oligotype present in at least one sampling site, and each edge on the network connects an oligotype to one or more sampling sites.
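
The network described here is bipartite: oligotypes on one side, samples on the other, with an edge wherever an oligotype was observed. The Python sketch below builds such an edge list from a count matrix and summarizes how many samples each oligotype occurs in. The counts and sample labels are hypothetical, and the "Source,Target" CSV header is an assumption about a layout that Gephi's spreadsheet import accepts; the layout step itself (ForceAtlas2) is not reproduced.

```python
import csv
import io


def build_edges(counts_by_sample):
    """Return sorted (oligotype, sample) pairs for every non-zero count."""
    return sorted(
        (oligo, sample)
        for sample, counts in counts_by_sample.items()
        for oligo, reads in counts.items()
        if reads > 0
    )


def samples_per_oligotype(edges):
    """Map each oligotype to the set of samples in which it occurs."""
    presence = {}
    for oligo, sample in edges:
        presence.setdefault(oligo, set()).add(sample)
    return presence


def edges_to_csv(edges):
    """Serialize the edge list as CSV text with a Source,Target header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()


# Hypothetical counts (sample -> oligotype -> reads).
toy_counts = {
    "site_A_2014": {"O1": 120, "O2": 15, "O3": 7},
    "site_A_2015": {"O1": 80, "O2": 0, "O3": 0},
    "site_B_2014": {"O1": 5, "O2": 0, "O3": 0},
}
edges = build_edges(toy_counts)
presence = samples_per_oligotype(edges)
shared_by_all = sorted(o for o, s in presence.items()
                       if len(s) == len(toy_counts))
exclusive = sorted(o for o, s in presence.items() if len(s) == 1)
```

The number of samples attached to each oligotype corresponds to the categories used in the network figure: present in one, two, three, or all samples.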

Clade identification

We designated a clade for each oligotype’s representative sequence by matching it to a key reference database of 16S rRNA gene sequences from cultured Synechococcus6. Synechococcus sequences were downloaded from NCBI GenBank, and clade classifications were obtained from the following sources4,6. We added the representative sequences for each oligotype to the Synechococcus database, and we aligned them with Muscle (version 3.8.4,17). We used exact matches between each oligotype Synechococcus sequence and the Synechococcus sequence database to infer clade designation.
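
Clade inference by exact matching can be sketched as a lookup of each representative sequence in the reference set. In the Python sketch below, an "exact match" is interpreted as the amplicon occurring verbatim inside a (possibly longer) reference sequence, and an oligotype that matches references from more than one clade is left unassigned. Both choices are assumptions, as are the function name, the clade labels, and the toy sequences.

```python
def assign_clade(query, references):
    """Return the clade whose reference sequences contain the query exactly.

    references is a list of (sequence, clade) pairs. Returns None when
    nothing matches or when the matches disagree on the clade.
    """
    cleaned = query.upper().replace("-", "")  # drop alignment gaps
    clades = {
        clade
        for ref_seq, clade in references
        if cleaned in ref_seq.upper().replace("-", "")
    }
    return clades.pop() if len(clades) == 1 else None


# Hypothetical reference sequences and clade labels.
toy_references = [
    ("TTACGTACGGATCCAA", "clade_X"),
    ("TTACGTTCGGATCCAA", "clade_Y"),
]
unique_hit = assign_clade("ACGTACGGATCC", toy_references)
ambiguous_hit = assign_clade("GGATCC", toy_references)
no_hit = assign_clade("AAAA", toy_references)
```

Returning None for ambiguous matches keeps the assignment conservative: a short region shared by several clades cannot discriminate between them.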

Statistical analyses

To group oligotypes statistically, we computed a principal components analysis (PCA) using R package “ggfortify” with respect to a sample matrix of Synechococcus oligotype reads normalized to total Synechococcus reads for that sample. Each oligotype was projected onto the first two PCs of the matrix. To investigate environmental correlates of oligotype grouping, multiple regressions of each of the first two PCs were computed against the three environmental factors, which are the water temperature, salinity, and the concentration of nitrate.
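
The PCA step can be illustrated with a small, dependency-free Python sketch. It is a stand-in for the R workflow and not a reproduction of it: it finds the first two components by power iteration on the covariance matrix, the data are hypothetical, and the regression of the components against temperature, salinity, and nitrate is not shown.

```python
import math


def center_columns(rows):
    """Subtract each column's mean from its values."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(p)] for i in range(p)]


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and eigenvector by power iteration.

    Assumes the fixed start vector is not orthogonal to the leading
    eigenvector, which holds for the toy data below.
    """
    p = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            return 0.0, vec  # no variance left in this matrix
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
                for i in range(p))
    return value, vec


def pca(rows, n_components=2):
    """Return (scores, explained_variance_ratios) for the leading PCs."""
    centered = center_columns(rows)
    cov = covariance(centered)
    p = len(cov)
    total_variance = sum(cov[i][i] for i in range(p))
    components, ratios = [], []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        ratios.append(value / total_variance if total_variance else 0.0)
        # Deflate: remove the variance explained by this component.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    scores = [[sum(row[j] * comp[j] for j in range(p)) for comp in components]
              for row in centered]
    return scores, ratios


# Hypothetical percent abundances: rows are oligotypes, columns are samples.
toy_matrix = [
    [60.0, 55.0, 40.0, 35.0],
    [20.0, 25.0, 30.0, 30.0],
    [10.0, 10.0, 15.0, 20.0],
    [6.0, 6.0, 10.0, 10.0],
    [4.0, 4.0, 5.0, 5.0],
]
scores, ratios = pca(toy_matrix)
```

Each row of `scores` gives the position of one oligotype on PC1 and PC2, and `ratios` gives the fraction of total variance each component explains, analogous to the percentages reported for PC1 and PC2 in the Results.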