Introduction

Nemipterids are marine, brightly coloured, small to medium-sized fishes distributed along tropical and sub-tropical Indo-Pacific regions. The family Nemipteridae consists of five genera and 71 species, which constitute a major component of commercial fisheries. Among the Nemipterus genera, Nemipterus randalli is one of the most commonly caught species in demersal landings (38.1%) and has been targeted by hook and line and bottom trawlers1. Randall’s threadfin bream, Nemipterus randalli, is one of the most commercially important demersal finfish species in Indian waters due to its edibility and export value. It provides raw materials for the production of surimi, fish balls and fish cakes2, which are important for export3. It is a non-migratory fish that inhabits the muddy and sandy bottoms in coastal waters at a depth range of 22–450 m, usually in schools. It is distributed in the western Indian Ocean including India, Pakistan, Persian Gulf, Red Sea, Gulf of Aden, East African coast, Seychelles and Madagascar4. N. randalli is categorized as the least concern species in the IUCN Red List of Threatened Species5 due to its wide distribution. They are fractional spawners with protracted spawning periods6 with an average life span of about 3 years7. They are demersal carnivores which feeds on crustaceans, squids, annelids and small fishes8 and ecologically plays a key role in trophic level as secondary consumers (trophic level 3.8). In the recent years, the annual catch of threadfin bream in India was accounted for 81,000 tonnes during 2019–20209. An increase in the production of N. randalli could potentially suggest a rise in fishing activities driven by a growing market demand. However, if not managed properly, this could lead to overfishing and depletion of local stocks.

To effectively manage a fishery, it is crucial to have a clear understanding of the stock structure of the species. This understanding is essential for developing appropriate management regulations, particularly in cases where multiple stocks are subject to varying levels of exploitation10. Fisheries stock assessments rely heavily on stock identification, which is crucial for effective management of fish populations. For effective management, various stock-rebuilding strategies must be implemented that involves identifying the stock structure11. Marine fish species often exhibit weak genetic differentiation due to the lack of physical barriers along with high gene flow and dispersal rate12. Especially in migratory fish species, the signal of genetic differentiation is weak and it is difficult to delineate the population structure. With the advent of improved genetic methods, distinct populations or stocks can be resolved. Although there are no apparent physical barriers to gene flow, population structure is often detected in many marine species13. This may arise due to various factors such as reproductive isolation, geographical distance14, biogeographic barriers15, or specific behavioural traits16. The advent of molecular markers has proven to be a valuable tool for detecting genetic variation in fish populations at both inter-specific and intra-specific levels. This technology is so sensitive that even low levels of differentiation can be detected. One such marker is mitochondrial DNA (mtDNA), which can be used as an effective marker to elucidate genetic structure due to its rapid evolutionary rate, uniparental inheritance, high mutation rates and lack of recombination17. Another set of markers are nuclear EPIC markers, which are extremely effective in detecting genetic structure18 due to their high polymorphism19.

Understanding the population structure and genetic diversity of this commercially important species is crucial for sustainable utilisation, conservation and management. To obtain deeper insights into the genetic diversity and population structure, genetic tools such as mitochondrial DNA and nuclear DNA have been widely used20. In this study, we used molecular markers with different rates of evolution, to investigate the population structure and genetic diversity of N. randalli from Indian waters. Mitochondrial DNA markers such as Cytochrome b sequences have proven to be highly effective for studying diversity and systematics. These markers exhibit a combination of slowly and rapidly evolving codon positions, as well as conserved and variable regions. This unique property makes Cytb DNA sequences ideal for investigating a wide range of questions, including phylogeny, population dynamics, and recent divergence levels21. To assess genetic variability, nuclear DNA markers such as Ribosomal protein Exon primed Intron Crossing (Rp EPIC) were used, which span the exon regions and detect polymorphisms within the conserved intron region and can be used as selectively neutral nuclear markers22.

Despite its commercial importance, there is little of information about the population structure of N. randalli in Indian waters. Studies on N. randalli from Indian waters are limited to stock assessment1, biology23, diversity24, morphometrics25 and systematics26. A recent investigation of N. randalli based on mitochondrial D-loop makers indicated lack of stock structuring in Indian waters27. However, the same authors reported the stock structure in Indian waters based on morphological parameters and otolith shape28. However, these studies were fragmented and incomplete. We conducted a comprehensive investigation by collecting samples along the entire Indian coast and assessing the genetic structuring using mitochondrial and nuclear markers.

We used two classes of genetic markers, the mitochondrial Cytochrome b (Cyt b) gene and the nuclear Ribosomal protein gene intron S7 (Rp S7). We also investigated the historical demography of N. randalli in Indian waters, which is critical for proper management and ensuring sustainability of the fishery.

Results

Mitochondrial DNA

A 507 bp of Cytb gene sequence was amplified from 125 individuals from five locations. The Cytb sequences had 53 polymorphic sites consisting of 58 substitutions including 47 transitions and 11 transversions. Out of 507 sites, 454 were found to be invariable while 53 sites were variable. Furthermore, there were 28 singleton sites and 25 parsimony informative sites. The average nucleotide composition was C: 32.86%, T: 29.86%, A: 24.63% and G: 12.65%. Point mutations occurred at the first and second codon positions, resulting in seven non-synonymous amino-acid substitutions. Out of 125 individuals, a total of 58 haplotypes were identified, with 51 of them being private haplotypes (87.93%) and the remaining 7 (12.06%) being shared haplotypes (Table S1). While the majority of the private haplotypes were unique to a single individual, two private haplotypes were identified in multiple individuals. Table 1 provides information on the number of individuals, number of haplotypes, segregating sites, haplotype diversity, and nucleotide diversity for each location. Overall haplotype diversity was high (Hd = 0.893), but nucleotide diversity was low (π = 0.012) in N. randalli.

Table 1 Diversity measures of N. randalli for Cytb and Rp S7 sequences.

Haplotype network analysis using the median-joining method identified two haplogroups, which were separated by five mutation steps as illustrated in the diagram (Fig. 1). The central part of the network was occupied by shared haplotypes, while the unique haplotypes branched out from the centre. Haplogroup I with the common haplotype H13, comprised of haplotypes from the east coast (Chennai and Puri). On the other hand, haplogroup II with a common haplotype of H1, included haplotypes from the west coast (Veraval, Mangalore, and Cochin). The distribution of haplotypes based on Cytb gene sequences exhibited a star-like pattern indicating population expansion.

Figure 1
figure 1

Median joining network for N. randalli based on Cytb sequences. Haplotypes are represented in circles and colours indicate geographical locations. Vertical lines indicate mutational steps. Black dots indicate lost or not sampled haplotypes.

The range of genetic divergence within the population varied between 0 and 0.2%, with the lowest value observed in Mangalore and Puri, and the highest in Cochin. Genetic divergence between populations ranged from 0.1 to 0.8%. Among the populations compared, Coc-Man, Coc-Ver, Che-Pur, and Man-Ver exhibited the least divergence, while Coc-Che, Coc-Pur, Che-Man, Che-Ver, Man-Pur, and Pur-Ver showed the highest divergence. However, when the populations were categorized according to their respective coasts, the divergence within each coast was only 0.1%, while between coasts, it reached 0.7% (Table 2). Therefore, the K2P distance analysis indicated genetic divergence among N. randalli populations along both coasts of India. Bayesian analysis displayed two major clusters. The upper cluster contained haplotypes from the East Coast and the lower branch contained haplotypes from the West Coast with relatively high posterior probability value (Fig. 2).

Table 2 Pairwise genetic distance within population (shown in bold along diagonal), between population (above diagonal) and fixation index (FST) (below diagonal) based on Cytb sequences and Rp S7 sequences.
Figure 2
figure 2

Bayesian tree for Cytb haplotypes of N. randalli. Posterior probability values are indicated at the branches.

AMOVA analysis indicates statistically significant population differentiation of N. randalli (FST = 0.739, p < 0.05) along the Indian coast, with a hierarchical AMOVA analysis revealing high genetic differentiation between the east and west coasts (ΦCT = 0.824, p < 0.05) (Table 3). Table 2 shows the pairwise FST values among five N. randalli populations, indicating significant differentiation among most of the compared populations except for a few, namely Coc-Man, Coc-Ver, Che-Puri, and Man-Ver, which showed no statistical significance. The findings indicated that populations along the same coast showed insignificant values, suggesting no genetic differentiation, whereas populations between the west and east coast displayed notable genetic differentiation. Thus, the populations can be divided into two groups: those from the west coast forming one group and those from the east coast forming the other. Analysis using SAMOVA revealed that the genetic differentiation among groups was highest when the number of groups (K) was set to 2, resulting in an FCT value of 0.824 (p < 0.05). The first group consisted of populations from the east coast, including Chennai and Puri, while the second group comprised populations from the west coast, namely Cochin, Mangalore, and Veraval. The Mantel test supports isolation by distance, showing a significant positive correlation between genetic and geographical distances (r = 0.511, p < 0.05) (Fig. S1a).

Table 3 Hierarchical analysis of AMOVA carried out using Cytb sequences and Rp S7 sequences to estimate levels of differentiation among groups of populations, among populations within groups and within populations.

The overall Tajima’s D (D =  − 1.1764, p > 0.05) and Fu’s Fs (Fs =  − 44.03, p < 0.05) values were negative, suggesting a deviation from neutrality. This negative value indicates an excess of rare haplotypes or rare mutations within populations. However, Tajima’s D value shows non-significant negative values. All five populations also had negative values for both Tajima’s D and Fu’s Fs and are statistically significant, suggesting population expansion (Table 4). The mismatch distribution graph for the entire population showed a bimodal and ragged pattern, indicating population equilibrium. However, individual populations showed both unimodal and bimodal patterns (Fig. 3). The mismatch distribution for east and west coasts showed unimodal and bimodal values respectively. The results of the mismatch distribution contradict the neutrality tests. Hence, to further test the validity of the neutrality test results, the raggedness index and SSD were calculated under the sudden demographic expansion model. The results showed that the data has a relatively good fit to a model of population expansion since both the Harpending Raggedness index and SSD values were non-significant for all populations, as shown in Table 4. In the Indian waters, the estimated tau (τ) values derived from the Cytb sequences were 1.57266 (Table 4). By applying the equation T = τ/2u, the resulting time for population expansion based on the Cytb sequences was determined to be 77,000 years. The estimated θ1 was higher than θ0, indicating population expansion from a small to a large size (Table 4). According to the Bayesian skyline plot (BSP), there was a demographic expansion approximately 77,000 years ago. Prior to this expansion, the flat skyline plot suggested that the population had remained stable with a constant size during that time period (Fig. S2).

Table 4 Demographic parameters of N. randalli based on Cytb sequences.
Figure 3
figure 3

Mismatch distribution analysis using constant population size model to observe the population size changes in N. randalli populations based on Cytb gene sequences. The observed frequency represented by a red line. The expected frequency was depicted by a dotted green line.

Nuclear DNA

An intron fragment 577 bp long was obtained from 125 individuals across five sampling locations, resulting in a total of 250 sequences. The average nucleotide composition was C: 20.93%, T: 30.90%, A: 22.19% and G: 25.99%. A total of 186 alleles were detected in all samples, with 293 segregating sites consisting of 17 singleton sites and 276 parsimony informative sites (Table S2). Of the 440 substitutions observed, 212 were transitions and 228 were transversions, and there were no shared alleles between populations. Overall allelic diversity was high (Ad = 0.997), but nucleotide diversity was low (π = 0.039). The number of alleles, allelic diversity, nucleotide diversity and the average number of nucleotide differences of individual populations are presented in Table 1. The median-joining haplotype network for Rp S7 gene intron 1 sequences revealed a complex pattern due to high levels of polymorphism, with a less obvious geographical structure compared to that detected for the cytb gene sequences (Fig. S3). The Bayesian tree topology did not reveal partition based on geographical region. No obvious structure was observed in the Rp S7 haplotypes from Indian coast (Fig. S4). Genetic distances within populations ranged from 2.5 to 4.9% and genetic distances between populations ranged from 2.9 to 5.2%. Since the nuDNA sequences show high polymorphism, it will reflect the overall genetic divergence. As a result, populations may exhibit higher K2P values (Table 2).

Pairwise FST for populations within the coasts showed non-significant structuring, with low FST and ΦST between all populations (Table 2). Hierarchical AMOVA based on two gene pool showed non-significant genetic differentiation (ΦCT = 0.008, p = 0.195) (Table 3). The analysis conducted using SAMOVA indicated a non-significant result, suggesting that no genetic structure was detected in the population. The mantel test showed non-significant correlations between genetic and geographical distance where r = 0.447, p = 0.16 (Fig. S1b).

Discussion

The present results provided insights into the patterns of genetic differentiation and demographic history of N. randalli in the Indian waters. Previous studies have investigated the population structure of N. randalli using the mt D-loop region27, the COI region29, and the otolith shape28. Srihari et al.27 revealed a lack of structuring among populations from Indian waters. These studies were conducted at a reduced geographical scale mainly along the west coast which may be the reason for the lack of significant genetic differentiation. In a separate study, Srihari et al.28 investigated the stock structure of N. randalli based on morphology and otolith shape from five major fishing regions along the Indian coast and observed different phenotypic stocks along the Indian coast. However, they concluded that the observed heterogeneity might be due to adaptation to the particular environmental conditions in this region.

The present work is characterized by high haplotype diversity and low nucleotide diversity for each population and across all populations as revealed by both mtDNA Cytb and nuDNA Rp S7 sequences suggesting a recent population expansion after a period of low effective size30. The presence of genetic differentiation among populations of N. randalli was confirmed through pair-wise FST and AMOVA analysis. The measurement of FST enables the assessment of genetic variation within and between populations, providing valuable insights into how evolutionary processes impact genetic composition31. Wright characterized FST values as follows: 0–0.05 indicates little differentiation, 0.05–0.15 indicates moderate differentiation, 0.15–0.25 indicates large differentiation, and values greater than 0.25 indicate very large differentiation32. When comparing the mitochondrial and nuclear markers to assess the genetic differentiation of N. randalli, the molecular analyzes performed led to contradictory results. In particular, the mitochondrial marker indicated high genetic differentiation, suggesting the presence of two distinct stocks, one on the east coast and one on the west coast. This was further supported by hierarchical AMOVA analysis based on Cytb DNA sequences, which showed significant FST and FCT values, indicating significant genetic structure among N. randalli populations. In contrast, the analysis based on the intron 1 sequences of the Rp-S7 gene did not reveal any significant genetic differentiation in the hierarchical AMOVA. In this study, the FST values obtained for Cytb showed high genetic differentiation, suggesting the presence of two distinct populations, one on the east coast and one on the west coast. This structure was present in the Cytb results but not in the Rp S7 results. The median joining network for Rp-S7 gene intron 1 sequences represented an ambiguous haplotype network due to high polymorphism, indicating a less obvious geographical structure compared to the Cytb gene sequences.

The median joining network diagram based on Cytb gene sequences revealed the presence of two haplogroups in which individuals from the east coast were assigned to haplogroup I and from the west coast to haplogroup II. The high level of genetic differentiation among N. randalli populations in Indian waters, as observed in the Cytb DNA sequences, is corroborated by the corresponding phylogenetic tree. This tree exhibits results consistent with the hierarchical AMOVA, revealing two distinct clades with a high posterior probability value, indicating geographical differentiation among haplotypes. However, phylogenetic analyses of the nuclear Rp S7 gene intron 1 did not reveal significant genetic differentiation among alleles, suggesting a more uniform genetic composition for this marker. Furthermore, application of SAMOVA to the Cytb DNA sequences, unveiled a clear geographical separation among N. randalli populations along the Indian coast. In contrast, the SAMOVA results for the Rp S7 gene intron 1 sequences did not indicate significant genetic structure within the population, implying that this particular marker may not be as informative in discerning geographical differences among N. randalli populations. The lack of genetic differentiation as indicated by the Rp S7 gene intron 1 sequences is due to the high variability of intronic regions which masked the signals of genetic differentiation if any. Similar results of high intronic variability have been indicated in studies of species like the Amazonian cardinal tetra33.

Discrepancies are evident in the results obtained from mitochondrial DNA (mtDNA) and nuclear DNA (nuDNA) sequences. The mtDNA sequences reveal genetic differentiation between the east and west coast, characterized by the absence of haplotype sharing. In contrast, the nuDNA sequences show no genetic differentiation between the east and west coast populations. The efficiency of mitochondrial markers in detecting population differentiation is attributed to their lower effective population size34. Additionally, the faster genetic drift, coupled with a significantly higher mutation rate (about five to ten orders of magnitude) than that of nuclear DNA, enhances their discriminatory power35. The nuclear DNA, inherited from both parents, allows the examination of both maternal and paternal lineages, providing a comprehensive understanding. The biparental inheritance of nuDNA facilitates the interpretation of genetic lineages. The high polymorphism in nuclear markers is advantageous for revealing subtle population structures, aiding in identification of subpopulations. However, the extensive polymorphism can occasionally complicate the interpretation of population structure.

The genetic differentiation observed in N. randalli can be influenced by various life history traits exhibited by the species. Traits such as non-migratory behaviour, natal homing behaviour, parental care, characteristics of eggs, and limited larval dispersal have the potential to restrict gene flow between populations, leading to genetic differentiation. However, our understanding of these specific traits in N. randalli is currently limited. While it is known that N. randalli displays non-migratory behaviour and scatters its eggs on open substrates without guarding them36, information regarding the distribution and duration of the larval stage is lacking. The spawning season of this species is observed to be from June to October in Kerala3. This information is missing from other regions which constraints the interpretation of biological causes of genetic differentiation. The limited migratory capacity of N. randalli may contribute to the observed genetic differentiation between the east and west coasts.

The present study also indicated that N. randalli populations fits the isolation by distance model. The relationship between geographical and genetic distances showed significant scatter, indicating that there was considerable variation in the data. The Mantel test provides further evidence for the genetic structure observed in our study, suggesting that it aligns with the phenomenon commonly found in fish species with limited dispersal capabilities37. This also supports genetic differentiation between populations of N. randalli. In contrast, when examining Rp S7 gene intron 1 sequences, no significant correlation between genetic and geographic distances was observed. This lack of correlation suggests that there is no substantial evidence of genetic isolation by distance.

The genetic differentiation observed between N. randalli species on the west and east coasts of India could be due to a combination of factors including historical events, geographical barriers, and environmental differences. Geographical barriers such as ocean currents and physical features like mountain ranges or shallow water areas can also prevent gene flow between populations, leading to genetic differentiation. The Indian peninsula divides the Northern Indian Ocean into two marine regions, the Bay of Bengal on the east and Arabian Sea on the west which might restrict gene flow between these two coasts38,39. Another hard barrier which may restrict the gene flow between the west and east coasts of India is the Adam’s Bridge or Rama’s Bridge between Pamban Island, off the southeast coast of the southern Indian state of Tamil Nadu, and Mannar Island off the north west coast of Sri Lanka. This bridge formed a land barrier potentially preventing the spread of N. randalli contributing to the genetic differentiation40.

The Arabian Sea on the west and Bay of Bengal on the east coasts of India have different oceanographic conditions. The Arabian Sea exhibits higher productivity compared to the Bay of Bengal, with significant upwelling events during the summer monsoon41. The upwelling along the Arabian Sea is influenced by monsoonal winds and Ekman transport42. The prevailing monsoon winds blow parallel to the coastline, resulting in Ekman transport, that advects nutrient-rich water into the euphotic zone, promoting upwelling. This upwelling sustains cooler waters and supports abundant biological productivity, making the Arabian Sea one of the most productive regions in the Indian Ocean. In contrast, the upwelling in the Bay of Bengal is generally weaker and occurs primarily during the winter monsoon43. The south-westerly monsoon winds induce Ekman pumping and upwelling in the Bay of Bengal. The Bay of Bengal receives significant rainfall and freshwater from river runoff, leading to higher stratification and the formation of a thick barrier layer, inhibiting strong vertical mixing driven by summer monsoon winds (Ekman pumping)44. These contrasting oceanographic conditions serve as barriers to gene flow between fish populations on the west and east coasts of India.

Finally, environmental differences such as temperature, and salinity, can also influence genetic differentiation. The west and east coasts of India have different environmental conditions, with the west coast experiencing a more arid climate and higher salinity due to intrusion of high salinity waters from the Persian Gulf and the Red Sea. Additionally, it is the warmest region, with evaporation exceeding precipitation while the east coast has large amounts of precipitation influenced by monsoon rains and considerable influx of freshwater from major river systems leading to lower salinity38,45,46. The north Indian Ocean attains its highest global temperatures before the onset of the summer monsoon, specifically during the months of April and May47. During the south west monsoon, the Arabian Sea cools faster and sea surface temperature (SST) decreases than in the Bay of Bengal. While stratification associated with heavy precipitation and river run-off in the Bay of Bengal hampers the mixing of warm surface waters with cooler waters below, resulting in the Bay of Bengal retaining warmer SST than the Arabian Sea48. These environmental differences could lead to differences in the selection pressures acting on fish populations, leading to genetic differentiation. Overall, the genetic differentiation observed between some fish species on the west and east coasts of India is likely due to a combination of historical, geographical, and environmental factors. Other marine organisms have also been reported to exhibit a genetic structure that indicates genetic differentiation between the east and west coasts of India49,50.

Mismatch analyses revealed a bimodal distribution pattern, which is typically regarded as indicative of prolonged population stability51. Conversely, the values of Tajima’s D, Fu’s Fs, SSD, and Hri indicated that there was a demographic expansion. The multimodal mismatch distribution could also be attributed to population sub structuring52,53. Thus, we inferred that the bimodal pattern observed in the mismatch distributions described above is due to the existence of distinct haplogroups, rather than demographic stability. In contrast, each population exhibited a unimodal mismatch distribution pattern indicating that the population experience expansion. The mismatch distribution analysis conducted on the east coast population of N. randalli displayed a unimodal graph, indicating a signature of population expansion. In contrast, the mismatch distribution analysis performed on the west coast population revealed bimodal characteristics, suggesting that the population was in equilibrium, possibly due to a colonization event involving random haplotype lineages54.

The demographic expansion of N. randalli populations based on the Bayesian skyline plot and estimates of τ value from mismatch analysis indicated the time of expansion as 77,000 years ago during late Pleistocene, which was also supported by neutrality tests such as Tajima’s D and Fu’s Fs and mismatch distribution. During the late Pleistocene, which lasted from approximately 126,000 to 11,700 years ago, several climatic and environmental factors contributed to the expansion of marine populations. The late Pleistocene was marked by significant fluctuations in sea levels due to glacial-interglacial cycles55. During glacial periods, when large volumes of water were locked up in continental ice sheets, sea levels dropped, exposing vast areas of the continental shelves. As the Late Pleistocene progressed, the large continental ice sheets that had formed during the preceding glacial periods began to melt. The melting of ice sheets led to a significant increase in sea levels, flooding of coastal areas and expansion of marine habitats. This allowed marine populations to colonize newly available regions and contributed to their expansion. One factor that could contribute to distribution of N. randalli is historical events such as past changes in sea level or oceanographic conditions. The Last Glacial Maximum (LGM) occurred approximately 26,500 to 19,000 years ago, and had a significant impact on the North Indian Ocean. During this period, the Earth's climate was colder, and a large amount of water was trapped in ice sheets, resulting in lower sea levels worldwide56. Sea level decline leads to the formation of land bridges connecting Sri Lanka and India, affecting the dispersal of marine species and leading to the isolation of marine populations40,57. As a result, populations on the east and west coasts of India may have been separated for long periods of time. The Indian Ocean is influenced by the Indian monsoon system, characterized by seasonal winds and rainfall patterns58,59. During the Pleistocene, variations in monsoonal intensity occurred, affecting oceanic conditions. The strengthening of the Indian monsoon during the interglacial periods led to increased rainfall and freshwater runoff into the Indian Ocean. This freshwater influx, coupled with nutrient-rich sediment carried by rivers, enhanced productivity in coastal areas, supporting the growth and expansion of marine populations60,61,62. Oceanic currents and upwelling events influenced the distribution and abundance of marine populations in the Indian Ocean during the late Pleistocene. The Indian Ocean's circulation patterns, such as the Indian Ocean Dipole and the Somali Current, played a role in transporting nutrients and supporting the growth of marine life. Upwelling events, driven by wind patterns and oceanic circulation, brought nutrient-rich waters to the surface63, creating productive zones that fuelled the expansion of N. randalli populations.

Conclusion

This study provides insights into the genetic structure and diversity of N. randalli. In our study, we showed evidence of significant population structuring in N. randalli populations using mitochondrial Cytb gene sequences. Mitochondrial DNA analysis of N. randalli showed genetic differentiation between the west and east coasts of India. The high level of polymorphism in the nuclear marker RpS7 masked the population structuring in N. randalli. Thus, concordant results were not obtained using mitochondrial and nuclear DNA sequences. In the present study, the nuclear marker showed a reduced efficiency to delineate populations with the given sample size. Hence, further studies should be carried out with advanced marker to delineate stock structure of N. randalli.

The high genetic structuring observed in N. randalli along the Indian waters indicates the necessity of devising distinct fishery management and conservation plans for the east and west coasts of India. Even though N. randalli is considered as a species of “least concern” by the IUCN, this does not mean that it is not endangered. By studying stock structure, such commercially important fish species can be managed in a way that maximizes economic benefits while minimizing negative impacts on the environment and the long-term health of the fish populations. This knowledge can contribute to a more comprehensive understanding of stock structure and inform conservation efforts more broadly.

Materials and methods

Molecular methods

Sampling and DNA extraction

A total of 125 samples of N. randalli were collected from five locations (Cochin (Coc), Chennai (Che), Mangalore (Man), Veraval (Ver) and Puri (Pur)) (Table S3) along Indian coast. Muscle tissues were collected from the caudal region (approximately 50 mg) and preserved in 95% ethanol and stored at room temperature. The fish samples used in this study were handled according to the guidelines for the care and use of fish in research by DeTolla et al.64. The protocols were approved by the Ethics Committee of ICAR—Central Marine Research Institute, Kochi. These methods are also reported in accordance with the ARRIVE guidelines (http://arriveguidelines.org). Total genomic DNA was extracted from tissues of N. randalli specimens using phenol–chloroform extraction method65 and stored at -200C. The quantity and quality of DNA were measured by using Nanodrop (Thermo Fisher Scientific, USA) and integrity was determined using 0.8% agarose gel.

PCR amplification and sequencing

The partial mitochondrial Cytochrome b (Cytb, 507 bp) region and the single-copy ribosomal protein S7 gene intron 1 (Rp S7, 577 bp) were amplified using universal primers Cytb A-CCATGAGGACAAATATCATTYTG (Forward)66 and Cytb C-CTACTGGTTGTCCTCCGATTCAT (Reverse)67 and S7RPEX1F-TGGCCTCTTCCTTGGCCGTC and S7RPEX2R-AACTCGTCTGGCTTTTCGCC68 respectively.

For both the markers, PCR amplification was performed in a total volume of 20 µl reaction mixture containing 1 µl template DNA (50 ng), 0.4 µl of each forward and reverse primer, 8.2 µl PCR water and 10 µl master mix (TaKaRa) using a Biorad T100 thermocycler (Biorad, USA). The PCR program for the Cytb region involved an initial denaturation at 94 °C for 4 min, followed by 32 cycles of 94 °C denaturation for 30 s, 55 °C annealing for 30 s and 72 °C extension for 1 min, followed by a final extension at 72 °C for 10 min and a final hold at 4 °C. The PCR program for Rp S7 gene intron 1 involved an initial denaturation at 94 °C for 4 min, followed by 32 cycles of 94 °C denaturation for 30 s, 58 °C annealing for 30 s and 72 °C extension for 1 min, followed by a final extension at 72 °C for 10 min and a final hold at 4 °C. PCR products obtained were subjected to electrophoresis on 1.2% agarose gel stained with ethidium bromide and visualized using gel documentation system (Vilber Lourmat, France). The PCR products were then sequenced using Sanger sequencing.

Sequence analysis

All sequences were checked manually against their chromatogram using the program Seqman from the software DNAStar Lasergene 6.0 (DNASTAR Inc., Madison, WI, USA). Sequences were manually checked, assembled, aligned and trimmed using Clustal W implemented in MEGA 7.0 software69. All the mitochondrial DNA sequences (Cytb) were translated into aminoacids using MEGA 7.0 software to ensure that stop codons were not present69. Heterozygous positions in individual sequences of nuclear genes were initially checked for double peaks in DNAStar Lasergene 6.0 software and alleles of the heterozygous individuals were phased using Bayesian program PHASE implemented in DnaSP 6.070. The runs comprised 1000 number of iterations and 1000 burn-in iterations. The phased data were then used for the diversity analysis and network construction. The sequences were verified as mtDNA Cyt b region and ribosomal protein S7 region through BLAST in the NCBI. All unique haplotypes and alleles were deposited in the GenBank database under accession no OR178681–OR178738 (Cyt b) and OR180116–OR180301 (Rp S7).

Statistical analysis

Diversity analysis

Genetic diversity indices such as number of haplotypes (h) or alleles (for nuclear), haplotype diversity (Hd) or Allelic diversity (Ad) (for nuclear), nucleotide diversity (π), number of segregating sites (S), average number of nucleotide differences (k), singleton and parsimony informative sites, were estimated for each and entire population using DnaSP 6.070 and Arlequin v3.571 for both data sets. The complete aligned data set was also assessed for nucleotide composition, transition and transversion using Arlequin v3.571.

Historical demography

Population demographic history (Cytb) was assessed using neutrality tests and mismatch distribution analysis (MDA) implemented in DnaSP 6.070 and Arlequin v3.571. Neutrality tests such as Tajima’s D72 and Fu’s Fs73 were conducted to assess whether the DNA sequences experienced any deviation from neutral mutation hypothesis. Mismatch distribution was done to test whether a population experienced any expansion event. The calculated expected values for a model with a constant population size were plotted against the observed values. Populations that have undergone expansion are expected to have unimodal distribution, whereas a population at equilibrium gives multimodal distribution51. In addition, goodness of fit was assessed by the sum of squared deviation (SSD)51 and Harpending Raggedness index (Hri)74 between the observed and expected distribution under models of demographic and spatial expansion and its significance was determined by a parametric bootstrap with 10,000 replicates. The expansion signal for a population was indicated by smooth distribution with non-significant p values. The distribution of pairwise differences between sequences (mismatch distribution) were also estimated based on the demographic parameters θ0 (population size before expansion), θ1 (population size after expansion) and τ (time since expansion expressed in units of mutational time) using a generalized least-square approach implemented in Arlequin v3.571. The time since last population expansion (T) was calculated using equation T = τ/2u51. The parameter u can be measured as u = µk, where µ is the mutation rate per nucleotide which is 2% per million years (Myr) for mitochondrial Cytb region of fishes75 with an estimated generation time of 1 year and k is the number of nucleotides covered in the data (507 bp). Coalescent-based Bayesian skyline plot (BSP) approach76 was used to estimate changes in effective population size (Ne) through time using BEAST software v1.877. The analysis was run for 108 iterations with burn-in of 107 under HKY model for cytb gene sequences. A strict molecular clock and stepwise skyline model were applied. The model parameters and genealogies were sampled every 1000 iterations, and all operations were automatically optimized. Finally, the resulting skyline plot was visualized using Tracer v1.578.

Population genetic structure

The pairwise genetic divergence within and among populations were calculated by MEGA 769 using Kimura 2-parameter model79. Pairwise genetic divergence between population were computed using fixation index (FST) and their significance was assessed with 10,000 permutations using Arlequin v3.571. The significance of pairwise comparisons was adjusted by sequential Bonferroni corrections80. After correcting with Bonferroni corrections, significant probability values were reported for pairwise comparisons of FST values between all populations as well as among subpopulations. Analysis of molecular variance (AMOVA)81 was performed for both data sets to detect genetic differentiation among and between populations with 10,000 permutations to examine the statistical significance of fixation indices using Arlequin v 3.571. In addition to the global AMOVA (considering all population-single gene pool) (global ΦST), a hierarchical AMOVA was carried out to estimate relative contribution of molecular variance among groups (ΦCT), among populations within group (ΦSC) and among populations relative to total samples (ΦST) with two gene pool (west: Cochin, Mangalore, Veraval; east: Chennai, Puri). Spatial Analysis of Molecular Variance (SAMOVA) v.2.082 was utilized to further investigate the spatial structure, aiming to identify groups of populations that exhibited geographical homogeneity while displaying the highest level of differentiation from one another. The value of K (no. of groups) was set between 2 and 5, with 10,000 simulated annealing steps. The optimal number of population groups was determined based on the highest variance among groups (FCT). Isolation by Distance (IBD) was tested to understand whether there is a correlation between genetic distance (FST from Arlequin) and geographical distance (km) using the Mantel test83 with 10,000 permutations implemented in Arlequin v3.571. Geographic distances between sampling sites were obtained using Google Earth v.9 based on longitude and latitude. To interpret and visualize phylogeographic relationship among haplotypes or alleles (for nuclear), median-joining haplotype network was constructed using PopART84.

Phylogenetic analysis

Phylogenetic analyses were carried out on the Cytb and S7 gene sequences separately to investigate the relationships among haplotypes/alleles from five geographical locations. The Bayesian inference method was used for both mitochondrial DNA (mtDNA) and nuclear DNA (nuDNA) to construct the phylogenetic tree. MrBayes v 3.2.4 was used to perform these analyses85. The best nucleotide substitution model for constructing the phylogenetic tree for each marker was inferred using MEGA 7 based on the AIC (Akaike Information Criterion)69. Spondyliosoma cantharus was chosen as the outgroup (GenBank accession no: MH545766 (Cytb) and MH545829 (Rp S7)). GTR + G + I was selected as the best model for constructing the phylogenetic tree for mtDNA and nuDNA. The Bayesian analysis was run for 1,000,000 generations using the Marko chain Monte Carlo (MCMC) method, with sampling done every 1000 generations. The analysis was stopped when the average standard deviation of split frequency was less than 0.01. The first 25% of the results were discarded as burn-in. The resulting phylogenetic tree was viewed using FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/).