Introduction

S. japonica is brown seaweed originally distributed along the coast of the Japan Sea and the Okhotsk Sea that is cultivated on a large scale1,2,3. It is typical giant kelp with a life history alternating between macro-sporophytes and micro-gametophytes4. Gametophytes can be easily isolated and preserved in the laboratory and used to produce new hybrids. The sporophyte of S. japonica is composed of a blade (frond), stipe and holdfast. After one year of cultivation in China, blade length is usually 2~4 meters, and the width is approximately 20~50 cm. This kelp is used mainly as human food, marine animal feed, and raw material for extracting alginate, mannitol and iodine5,6.

A total of 5.6 million tons (wet weight) of S. japonica was produced worldwide in 2012, mainly in China7. Although cultivation areas can be enlarged to some extent, kelp production in China relies partly on the application of newly developed varieties to increase productivity7. In breeding, the kelp blade length and width are always targeted for selection. Since the 1950s, many new kelp varieties with longer or wider blades have been obtained, such as “Haiqing No. 2”8, “860”9, “901”10. “Huangguan No. 1”11, “Dongfang No. 7” and “Rongfu”12,13. Nevertheless, the breeding process is often costly and labor intensive, and the development of one new kelp variety generally takes 5~6 years or more14. For efficient breeding, quantitative inheritance of blade length, blade width and stipe length has been studied in S. japonica15,16,17. However, little is known about the individual genes and their networks in relation to most quantitative traits, especially yield-related blade length and width.

Recently, attempts have been made to use molecular markers to construct genetic linkage map and perform quantitative trait locus (QTL) mapping for S. japonica18,19,20,21,22. In the two QTL mapping studies of S. japonica to date, several QTLs have been mapped for kelp blade length and width, raw weight and base shape20,22. The linkage maps were mainly constructed with moderate-throughput DNA markers, such as AFLPs and SSRs, with average genetic distances ranging from 6.7 cM to 7.92 cM. To improve QTL mapping efficiency for S. japonica, high-density genetic linkage maps are desirable, and these are best constructed using SNP markers.

SNP markers are di-allelic and represent the smallest type of genetic variation in genomes; they have been extensively applied for genetic diversity detection, genetic linkage map construction, QTL mapping, and GWAS23. Due to the comparatively high cost of resequencing, several methods such as RAD-seq (restriction associated DNA sequencing), 2b-RAD, and GBS (genotyping by sequencing), have recently been developed to discover additional SNPs and genotype large sample sets for non-model species24. SLAF-seq (specific-locus amplified fragment sequencing) is another such method and has a better balance of cost-efficiency and SNP detection than do other methods25. Recently, SLAF has been used to construct genetic linkage maps for other economically seaweeds, such as Pyropia haitanensis26, Undaria pinnatifida27. Zhang et al.28 constructed a genetic linkage map for S. japonica with an average genetic distance of 0.36 cM based on SNP markers. However, it was used to map a sex determination locus in S. japonica and not to map QTL for blade length and width28.

In this study, we conducted QTL mapping for S. japonica’s blade length and width based on a high-density genetic linkage map constructed with SNP markers that we developed using SLAF-seq. Based on a new S. japonica draft genome sequence (MEHQ00000000), fourteen Tic20 genes and three peptidase genes were found to be potential candidate genes associated with blade length and width in S. japonica. Our aim is to combine the collinearity of the high-density genetic linkage map with the reference genome to identify candidate genes for the yield-related blade length and width traits of S. japonica.

Materials and Methods

Construction of BC1F2 mapping population

The kelp germplasm used in this study was maintained at the Key Laboratory of Experimental Marine Biology of Institute of Oceanology, Chinese Academy of Sciences, Qingdao29. The “860” variety of S. japonica and the species S. longissima were selected as parents for construction of the mapping population. Variety “860” has ever been used extensively for cultivation and has a blade length of 240.2 ± 23.1 cm and a blade width 25.4 ± 3.0 cm10. S. longissima has a longer blade of usually 7~8 meters; thus, the species has often been used to breed new hybrids11,30. In brief, the male gametophyte clone of “860” was crossed with the female gametophyte clone of S. longissima to produce the F1 hybrid generation. Sporelings were cultured in the laboratory until the blade length reached 1~2 cm, after which the seedlings were moved to sea cultivation. When the F1 hybrids reached maturity in August, female and male gametophyte clones were isolated from one F1 hybrid sporophyte and preserved. Then, the “860” male gametophyte was crossed with F1 hybrid female gametophyte clones to develop the BC1F1 population. One mature sporophyte was chosen from the BC1F1 population and then self-fertilized in the laboratory to produce the BC1F2 mapping population, which was cultivated in the sea (E 122°62′, N 37°22′). During the cultivation, the blade length and width of 178 individuals were measured on April 17th, May 8th, May 22th and June 9th. The blade length was measured from the top of the stipe to the blade tip. The width was measured at the widest part of the blade.

DNA extraction

The gametophytes from the male parent, “860”, and the female parent, S. longissima, as well as the 178 sporophytes from the BC1F2 mapping population were used for genomic DNA isolation. Each sample was ground into a powder using a high-speed homogenizer, and genomic DNA was extracted and purified according to the improved CTAB (cetyltrimethyl ammonium bromide) method31. The purity and concentration of the extracted DNA were determined using both an ND-1000 spectrophotometer (NanoDrop, Wilmington, DE, USA), and electrophoresis in an 0.8% agarose gel with a lambda DNA standard marker.

SLAF library preparation and sequencing

The parental gametophytes and 178 sporophyte progeny were genotyped using the reported SLAF method with minor modifications25. The draft genome sequences of the kelp were in silico digested (GenBank Accession Number: MEHQ00000000). HaeIII and RsaI were selected as the fittest restriction endonucleases for SLAF analysis. For SLAF library construction, the extracted genomic DNA was first incubated at 37 °C with HaeIII (New England Biolabs, NEB), T4 DNA ligase (NEB), and HaeIII adapter. Then, RsaI was added for additional digestion at 37 °C. The reaction-ligation reactions were inactivated by incubating at 65 °C. Then, PCR was performed using diluted restriction-ligation samples, dNTPs, Taq DNA polymerase (NEB) and PCR primers containing barcode 1. The PCR product were purified with Agencourt AMPure XP beads (Beckman Coulter, High Wycombe, UK) and pooled. The pooled sample was purified with a Quick Spin column (Qiagen) and then separated by electrophoresis on a 2% agarose gel. Fragments with sizes of 314~414 bp were excised and purified using a gel extraction kit (Qiagen, Hilden, Germany). Then, the purified products were subjected to PCR amplification with Phusion Masters Mix (NEB) and Solexa Amplification primers mix to add barcode 2. Finally, paired-end sequencing of 2 × 100 bp was performed on the selected SLAFs using an Illumina high-throughput sequencing platform (Illumina, Inc., USA). After sequencing, the Q30 and nucleotide compositions of the raw data were checked for sequencing quality.

Development of SNP markers from the SLAF reads and genotyping

The separate fastq files of SLAF reads for each kelp DNA sample were aligned to the draft reference genome of S. japonica (MEHQ00000000) via the program BWA32, with a maximum of 4% sequence mismatch allowed. The alignments were transformed to the SAM (sequence alignment map) format, combined and assigned into one master BAM (binary file of SAM) format using SAMtools 0.1.1833. SNPs were detected with the genome analysis toolkit (GATK) using the BAM file; detailed manual for GATK analysis can be found at the website: https://software.broadinstitute.org/gatk/guide/.

The alleles belonging to each SNP marker were evaluated to determine minor allele frequency. For mapping populations of diploid sporophytes of kelp, one SNP locus can contain no more than four genotypes; thus, loci with more than four putative alleles were considered likely to originate from repetitive genome regions and excluded from the subsequent analysis. Therefore, only those SNP loci with 2~4 SLAFs were considered potential markers for construction of linkage maps in kelp. These polymorphic SNP markers were coded into eight segregation patterns as aa × bb, ab × cd, ef × eg, hk × hk, lm × ll, nn × np, ab × cc and cc × ab. Here, the segregation pattern aa × bb was used for genetic linkage map construction.

Construction of linkage map and detection of QTLs for blade length and width

Before construction of the linkage map, the SNP markers were filtered for quality, requiring genotype calls at each SNP to have a depth of coverage of at least four reads in each individual kelp sporophyte and simultaneously excluding markers of significant segregation distortion (P < 0.05). Moreover, the segregation distortion for the SNP markers used for construction of linkage map was also examined with the software DistortedMap34.

HighMap was used for genetic linkage map construction. Pairwise modified logarithm of odds (MLOD) scores were used to partition mapping markers into LGs (linkage groups), and those markers with MLOD values < 5 were excluded from marker ordering. For the iterative process of marker ordering, a combination of enhanced Gibbs sampling, spatial sampling and simulated annealing algorithms was applied. The SMOOTH strategy was adopted for error correction according to the parental genotypes. The Kosambi mapping function was used to calculate the map distances between markers. Moreover, a heat map and a haplotype map were constructed to assess the quality of the genetic map35,36.

QTLs underlying blade length and blade width were mapped using QTL IciMapping 4.1.0.0 (http://www.isbreeding.net) with the inclusive composite interval mapping module37. QTLs with additive and dominance effects were examined with the method ICIM-ADD. The parameters for ICIM-ADD were set as a 1.0 cM scan step, a 0.001 probability used for stepwise regression analysis, 2.5 used for the LOD threshold score for significant QTLs for each trait. Furthermore, QTLs underlying blade length and blade width were also detected with the software QTL.gCIMapping from the R website38.

Comparative mapping of candidate genes in detected QTLs

SLAF reads corresponding to the SNP markers in detected QTLs were used in a BLAST analysis against the kelp draft genome sequence (MEHQ00000000). The candidate genes were annotated and analyzed via the NCBI database (http://www.ncbi.nlm.nih.gov).

RNA extraction and RT-PCR analysis

Blades of the parental sporophytes were collected on the four observation days for RNA extraction. Total RNA was extracted following the method of Yao et al.39. The extracted RNA was treated with RNase-free DNase I (TaKaRa, Dalian, China) to remove residual genomic DNA. First-strand cDNA was prepared with the PrimeScript II cDNA synthesis kit (Takara, Tokyo, Japan) according to the manufacturer’s instructions and then stored at −80 °C.

Transcriptional analysis of the four Tic20 genes from the parental sporophytes was conducted with real-time quantitative PCR (RT-qPCR). Primer pairs specific to the UTRs (untranslated region) of each of the four Tic20 genes, were used to amplify one 150 -bp amplicon per gene (Table S1). The β-actin primers qActin-F (5′-GACGGGTAAGGAAGAACGG-3′) and qActin-R (5′-GGGACAACCAAAACAAgggCAggAT-3′) were designed as internal controls.

RT-qPCR was performed with the SYBR Premix Ex Taq II kit (Takara, Tokyo, Japan) on the TP800 Thermal Cycler Dice (Takara, Tokyo, Japan). Thermal cycling protocol and data analysis were as previously reported40.

Results

SLAF analysis and SNP marker development and genotyping

In total, 364.11 M paired-end reads with a length of 200 bp were obtained from the 51.1 Gb of raw data from the SLAF library. The Q30 ratio was 87.04%, and the guanine/-cytosine content was 46.48%. In total, 13,567,809 reads were generated from the male parent (1.9 Gb of data) and 7,318,321 reads from the female parent (1 Gb). The average read number of the 178 individual sporophytes was 1,831,165. After the reads were mapped to the draft genome of kelp using BWA, we found that approximately 67.73% of total reads from the male parent, 44.18% from the female parent and 79.48% on average from each individual in the mapping population could be mapped to specific regions of the presumed chromosome regions. Ultimately, 1,365,570 polymorphic SLAFs were developed for the parents and the mapping population, of which 443,249 SLAFs were from the male parent, 226,614 from the female parent, and an average of 226,366 from the 178 BC1F2 individuals ranging from 117,848 to 587,670. Therefore, the average coverage for each SLAF marker was 30.6-fold in the male parent, 32.3-fold in the female parent, and an average of 8.0-fold in each individual of the mapping population.

Using SAMtools and GATK for realignment and detection, we identified a total of 1, 092, 944 SNPs from the 364.11 M original reads in the parent gametophytes and 178 BC1F2 sporophytes, of which 507, 994 SNPs appeared in the male parent, 330, 843 SNPs in the female parent, and an average of 292,415 in each of the 178 individuals of the mapping population.

After the low-depth SNPs were filtered out and the parental lines assigned different letters of alphabet to indicate their genotypes, we assessed SNP segregation patterns and found that 97, 189 SNPs out of 1, 092, 944 total polymorphic SNPs were successfully encoded and grouped into four segregation patterns (aa × bb, nn × np, lm × ll, hk × hk). Since the two parents were haploid gametophytes with genotypes of a and b, we selected 75,024 markers that followed the aa × bb segregation pattern for the subsequent linkage analysis.

Genetic linkage map construction

After incomplete markers and those showing significant segregation distortion markers were removed from the selected 75, 024 SNP markers, 7627 SNPs were finally retained for genetic map construction. All 7627 markers were assigned to the 31 LGs of kelp using HighMap (Fig. 1). The results from the software DistortedMap also showed that most of SNP markers in the 31 LGs have no segregation distortion, and the details of DistortedMap analysis for LGs 10, 11, 14 and 15 were listed in supplementary Excel file. The details of the SNP based genetic linkage map are listed in Table 1. The total genetic length of the SNP map was 5232.42 cM. The genetic distances of the 31 LGs spanned 108.51 cM (LG1)~199.40 cM (LG6), with mean marker intervals ranging from 0.42 cM (LG23) to 2.53 cM (LG27). The average distance between adjacent markers was 0.69 cM. The largest LG (LG6) contained 374 SNPs and was 199.4 cM long, while the smallest LG1 had 59 SNPs and was 108.51 cM long. Each LG had an average of 246 SNP markers and was 168.79 cM long. The maximum gap in each LG ranged from 4.09 cM to 15.96 cM; In most cases, the maximum gap was between 4 cM and 7 cM, although six LGs had maximum gaps of over 10 cM.

Figure 1
figure 1

Genetic lengths and marker distribution of 31 linkage groups in genetic map of the kelp.

Table 1 Summary statistics of the genetic map of S. japonica.

The quality of the genetic map was evaluated with heat maps and haplotype maps generated for each LG. The heat maps showed that there were no ordering errors detected in the 31 LGs of kelp. The heat map for LG6 is provided as additional Fig. S1. The haplotype maps confirmed that most of the recombination blocks in each LG were clearly defined. The haplotype map for LG6 is provided as additional Fig. S2.

QTL analyses for blade length and blade width

Based on the high-density SNP linkage map and the phenotype data (Fig. 2; Table 2), we conducted QTL analyses of the blade length and width in kelp on the four days observed: April 17th, May 8th, May 22th and June 9th (Table 3). QTLs detected with LOD values above 2.5 using 1000 permutations were recognized as significant intervals. In total, 12 QTLs were detected for blade length and 10 for blade width (Table 3). Nevertheless, the total number of QTLs was only 18, indicating that some QTLs are likely pleiotropic. For example, the intervals between marker57879 and marker57878, and from LG7 was simultaneously detected for blade length and blade width on April 17. The QTL interval between marker52375 and marker52366 (LG15) was detected for blade width on May 22th and June 9th. In addition, the QTL interval between marker26129 and marker26127 (LG30) was detected for at least one trait on three observation dates. To assess the additive effects of each QTL in the kelp, the parental origins of the alleles at each QTL were evaluated. For qL4-5, the additive allele for increasing blade length originated in the female parent, S. longissima, while the additive allele of qW5-7 for increasing blade width originated in the male parent, “860”. Similar relationships were also found for qL10-4 and qW7-15. Interestingly, the allele from the female parent, S. longissima, could increase blade width at qW12-30 detected for width on June 9th, despite the fact that S. longissima has a narrow blade. Additionally, the allele from the male parent, which has a shorter blade, could increase blade length at qL6-24, which was mapped for blade length on April 17th. In addition, some QTLs were detected with obvious dominant values, which indicated that significant dominance existed between the alleles at these QTLs. The phenotypic variance explained by each designated QTL ranged from 5.37% to 18.69%. The most influential QTL interval was between marker26422 and marker53425 in LG24, accounting for 18.69% of the observed phenotypic variance with a LOD value of 5.53 (Table 3). The least effective QTL interval was between marker8613 and marker37947 in LG20 on May 22th, accounting for only 5.37% of the observed phenotypic variance with a LOD value of 2.52. The confidence intervals spanned by each mapped QTL ranged from 0.28 cM to 4.68 cM, with an average of 1.21 cM. The smallest interval was between marker26129 and marker26127 in LG30, and the largest interval was between marker5803 and marker5890 in LG2. The QTLs underlying the blade length and the blade width were also examined with the software QTL.gCIMapping from the R website, which adopted different genetic model and different statistical method for the QTL analysis from that in software QTL IciMapping, and the results were listed in Table S2. Although many QTLs detected by QTL.gCIMapping were different from that in Table 3, some QTLs still remained the same, such as the QTLs 1, 2, 3 for the blade length at April 17th in Table S2.

Figure 2
figure 2

Frequency distribution patterns of the blade length and the blade width of S. japonica × S. longissima BC1F2 population at four serial observation days. (A) The blade length frequency distribution of S. japonica × S. longissima BC1F2 population measured at April 17th. (B) The blade width frequency distribution of S.japoncica × S. longissima BC1F2 population measured at April 17th. (C) The blade length frequency distribution of S. japonica × S. longissima BC1F2 population measured at May 8th. (D) The blade width frequency distribution of S. japonica × S. longissima BC1F2 populaltion measured at May 8th. (E) The blade length frequency distribution of S. japonica × S. longissima BC1F2 populaltion measured at May 22th. (F) The blade width frequency distribution of S. japonica × S. longissima BC1F2 populaltion measured at May 22th. (G) The blade length frequency distribution of S. japonica × S. longissima BC1F2 populaltion measured at June 9th. (H) The blade width frequency distribution of S. japonica × S. longissima BC1F2 populaltion meausred at June 9th.

Table 2 Summary statistics of the blade length and width in BC1F2 family on four observation days.
Table 3 Details of QTLs detected for blade length (FL) and blade width (FW) in S. japonica on four observation days.

Candidate genes associated with blade length and blade width

Through comparison mapping, candidate genes were identified in the QTL intervals between marker26422 and marker53425 in LG24 and between marker26129 and marker26127 in LG30. Fourteen Tic20 genes were annotated between marker26422 and marker53425 in approximately 190 kb (208307~398831 in scaffold 192), and three peptidase genes were annotated between marker26129 and marker26127 in approximately 53 kb (210011~263935 in scaffold 182) (Table S3). The collinear relationship between LG30 and the reference genome including scaffold 182 was evaluated (Fig. S3). The proteins encoded by the Tic20 genes composed the TOC and TIC (translocon at the outer/inner envelope membrane of chloroplasts, respectively) complexes, which are responsible for translocating precursor proteins synthesized in the cytosol across or into the double chloroplast membrane envelope41. The enzymes encoded by the three genes of peptidases s8 and s53 belong to clan SB serine peptidases42. Transcriptional analysis of four selected Tic20 genes in the parental sporophytes was performed with RT-qPCR on the four growth dates; the results showed that the expression of these genes was different not only between the parent sporophytes but also between the four growth dates (Fig. 3). At the third observation day, for male parent “860” of S. japonica, each of the four Tic20 genes had the highest expression level and the expression level of Tic20-1 was significantly higher than that of female parent S. longissima (16.3 fold, P < 0.01). For Tic20-1, the expression level at the third observation day was significantly higher than that of the second observation day in “860” of S. japonica (7.9 fold, P < 0.01).

Figure 3
figure 3

The expression pattern of four Tic20 genes in parent “860” and S. longissima on four observation days 1(April 16th), 2(May 6th), 3(July 2th), 4(July 24th). (a) Tic20-1 (b) Tic20-5 (c) Tic20-10 (d) Tic20-12. * and ** indicate significance level of P < 0.05 and 0.01, respectively.

Discussions

The collinearity of a high-density genetic linkage map with a reference genome can be used to enable the fine mapping of quantitative traits. In soybean, one map consisting of 5785 SNPs was constructed to identify the genes encoding isoflavone biosynthetic enzymes, with the detected QTL then comparatively mapped to the soybean reference genome43. Similarly, a high-density genetic map with 3441 SNP markers was generated for apple, and 80, 64, 17 genes related to fruit weight, fruit firmness and fruit acidity, respectively, were detected, after the relevant QTLs were comparatively mapped to the complete apple genome44. A comparative mapping strategy based on next-generation sequencing has also been attempted for localization of one “mut” locus for impaired branching pattern in Ectocarpus siliculosus45. Here, in S. japonica, we constructed a high-density genetic linkage map based on 7627 SNPs with an average genetic distance of 0.69 cM. We used this map to perform fine mapping for QTLs related to blade length and blade width in S. japonica. Eventually, 14 Tic20 genes and three peptidase s8 and s53 genes were identified as being associated with blade length and width in S. japonica. Thus, the colinearity of high-density genetic linkage maps with reference genomes can be a feasible and efficient way to identify candidate genes for quantitative traits.

A genetic correlation was detected between blade length and blade width in S. japonica in this study. The correlation coefficient between these two traits ranged from 0.80 to 0.92 (Table S4). Similarly, Wang (1984) reported that the phenotype correlation coefficient between blade length and blade width in S. japonica ranged from 0.81 to 0.9516. These results provide further evidence that phenotypically correlated traits are often mapped together, such as frond length and width in Pyropia haitanensis26 and kernel number per spike and thousand-kernel weight in wheat46. Liu et al.20 believed that the QTLs determining blade length and width in S. japonica were linked and located in the same chromosome region, but these authors did not detect a close correlation between blade length and blade width QTLs in their study20. In contrast, our study found that many QTLs determining both blade length and width were mapped to the same areas, such as the interval between marker57879 and marker57878 (Table 3). This indicated that the close phenotypic correlation of blade length and width in S. japonica was due mostly to genetic correlation, although environmental influences may also play a role. Due to the positive genetic correlation of blade length and blade width in S. japonica, selection for blade length could cause concurrent changes in blade width. Therefore, the identified locations of QTLs controlling both blade length and blade width in S. japonica should provide markers for more efficient marker-assisted selection (MAS) for these two traits in future breeding.

The genetically derived phenotypic correlation between blade length and width may be attributed to pleiotropy or to linkage disequilibrium47. Even though some QTLs for blade length and blade width were co-localized in the same genomic regions in this study, the possibility that the genes in the mapped QTLs are pleiotropic for these two traits must be further verified. The effects observed thus far might instead be due to linkage disequilibrium between different genes determining these two traits.

Fourteen Tic20 genes and three peptidase s8 and s53 genes were annotated to the QTL intervals between marker26422 and marker53425 in LG24 and between marker26129 and marker26127 in LG30, respectively. However, this finding does not exclude that the existence of other genes or regulators associated with blade length and blade width in S. japonica because the data used in this study were incomplete; the kelp genome contains some gaps in the contigs, and the correct assembly of some scaffolds remains uncertain.

The genes identified from the peptidase s8 and s53 gene families may be candidate genes for selection and breeding in kelp. In rice, one GS5 QTL was found to encode a putative serine carboxypeptidase belonging to the peptidase S10 family that contained a promoter region reported to be associated with grain width48. The peptidase s8 and s53 gene families are believed to have undergone a large evolutionary expansion in brown algae, and population genomic studies have showed that these gene families were artificially selected during the domestication of S. japonica49. Further research should explore the direct relation of these genes to blade length and width.

The expression differences in four Tic20 genes in the parental genotypes on the four cultivation dates were examined (Fig. 3). Whether these Tic20 genes are involved in determining blade length and blade width in S. japonica, remains to be verified. In the future, NIL (near isogenic lines) or CSSL (chromosome segment substitute lines) could be constructed to allow map-based cloning of the genes in the QTL region, or RNAi could be employed to verify the functions of these genes in kelp growth50,51.

In conclusion, a high-density genetic linkage map with 7627 SNPs was constructed for QTL analyses of blade length and width in S. japonica. A lot of 12 QTLs were found to be associated with blade length and 10 with blade width. The candidate genes mapped to these QTL intervals included 14 Tic20 genes and three peptidase s8 and s53 genes, which were proposed as candidate genes for blade length and width in S. japonica. The identified locations of QTLs controlling both blade length and width in S. japonica should lead to markers allowing for more efficient MAS for these two traits in future breeding.