Introduction

Menarche is a significant sign of pubertal onset, marking the beginning of a female’s fertility and reproductive ability1. The age at menarche (AAM) is a well-remembered and widely measured marker of female sexual development. AAM has been widely used in studies of female health2, where it has been observed to correlate with a body-mass index (BMI)3, height4, fertility5, psychological health6, and cancer7. A combination of genetic and environmental factors determines AAM. Large-scale genomic analysis of AAM has identified hundreds of associated variants and determined the genetic mechanism underlying the role of AAM in breast and endometrial cancer risk8. However, the effects of environmental factors on AAM remain unclear.

Dietary habits are critical for human health and disease prevention9. Unhealthy dietary habits may lead to various diseases, including cardiovascular10, endocrine11, and infertility12. Dietary habits are an important non-genetic factor affecting AAM13, and different dietary habits have different effects14. Research on the relationship between dietary habits and AAM has focused on observational studies and studies of single food groups. However, a systematic exploration of the potential correlations between dietary habits and AAM remains lacking. It is important to note that most eating habits are correlated and heritable. Genome-wide association studies (GWAS) have been conducted to analyze individual macronutrients in five questionnaires on macronutrient intake15 and 24-h dietary recall16. A recent GWAS study of dietary habits based on the UK Biobank identified genetic associations between hundreds of dietary habits, providing the potential to study the causal relationship between dietary habits and disease risk17. Dietary habits have been widely used to assess causal relationships with diseases or traits, such as migraines18, osteoporosis19, and cerebral cortex structure20.

With the advent of genome-wide association studies and improvements in widely applicable tools, the use of GWAS data for correlation analysis between multiple traits is becoming more common. Linkage disequilibrium score regression (LDSC) is a widely used method to identify genetic correlations among complex traits and to distinguish inflated test statistics from confounding biases and polygenicity in GWAS data21. Using the aggregate data from GWAS, LDSC provides a simple and reliable method to screen thousands of traits simultaneously and determine their real genetic correlations22. However, LDSC can only analyze genetic correlations between traits. To understand the confounding factors in observational studies and determine causality, Mendelian randomization (MR) can be used with genetic variation as an instrumental variable (IV) to assess whether the observed associations between risk factors and outcomes are consistent with causal effects23. MR has been used to identify reliable risk factors for various diseases24. The combination of LDSC and MR analyses has been widely used to explore the associations between complex diseases and their risk factors25.

In this study, we used LDSC to detect the genetic correlations between dietary habits and AAM. MR analysis assessed the causal relationship between the 143 dietary habits selected and AAM. Our results help to elucidate the potential genetic relationship between dietary habits and AAM.

Materials and methods

GWAS summary data

The genetic instruments for dietary habits were acquired from a public GWAS dataset17, which included 455,146 individuals from the UK Biobank. All individuals were 40–69 years old and lived in the UK between 2006 and 2010. Dietary habits were assessed using the UK Food Frequency Questionnaire (FFQ)26. The information collected included the number of tablespoons of cooked vegetables eaten per day (field 1289), overall oily fish intake (field 1329), and data on foods that were never eaten (from the options dairy, eggs, sugar, and wheat) (field 6144). The GWAS data were processed as follows: Heritability measures were obtained using BOLT-lmm software (v.2.3.2)27. Additional covariates in the BOLT-lmm analysis for both heritability and GWAS included the results from genotyping arrays and the first 10 genetic principal components (PCs) derived from a subset of unrelated European individuals using FlashPCA252, followed by projection of related individuals onto the PC space. Principal component analysis (PCA) generated 85 PC-DPs that captured the correlated structures between single food intake information and represented independent components of real-world eating habits17. A total of 814 independent loci (defined as > 500 kb apart) were identified, which exceeded the genome-wide significance (p < 5.0 × 10−8). A full description of the study design, sample characteristics, statistical analysis, and quality control can be obtained from the study results17.

GWAS summary data relating to AAM was obtained from the ReproGen Consortium to avoid sample overlap. This dataset28 included 182,416 females of European descent from across 58 studies. Individuals who reported their age at menarche as < 9 or > 17 years were excluded from the analysis. Single nucleotide polymorphisms (SNPs) were excluded from the individual study datasets if they were poorly imputed or rare (MAF < 1%). We obtained 3915 SNPs associated with AAM (p < 5 × 10−8). A full description of the study design, sample characteristics, statistical analysis, and quality control can be obtained from the study results28.

Genetic correlation analysis

The LD score regression (LDSC, v1.0.1, https://github.com/bulik/ldsc) software21 evaluated the genetic correlations between 143 dietary habits and AAM. LDSC is a useful approach for estimating the components of heritability and the genetic correlation and has been widely used to analyze complex diseases21. After strict Bonferroni correction, p < 0.000350 (0.05/143) was considered a significant association. The basic principle of the LDSC method is to use the test statistic of the expected value of the observed χ2 SNP under the original hypothesis of no association. SNPs that mark more neighbors—and thus have higher LD scores—are more likely to mark one or more causal loci that influence the phenotype5. We restricted our analysis to Hapmap3 SNPs using pre-calculated European LD scores from the 1000 Genomes Project Phase 3 provided by LDSC.

Selection of instrumental variables (IVs)

The IVs used in MR analysis should meet three conditions: (1) they are correlated with exposure, (2) they are not associated with confounding factors, and (3) they are not related to outcome directly but are related through exposure29. The IVs used in this study met the above conditions and are listed in Supplementary Table S2. All variables met the genome-wide significance threshold of p < 5 × 10−8. The parameters kb = 10,000 and r2 = 0.01 were used to remove the linkage disequilibrium between each variable. F-statistics were computed to estimate whether a weak instrument bias was observed and to improve the power of the selected instrumental variables. The F-statistics for all IVs were above the threshold of 1030.

Two-sample MR analysis

A two-sample MR analysis was used to evaluate the causal relationship between dietary habits and AAM. The SNPs used as IVs were within a distance of 10,000 kb and r2 > 0.001. A two-sample MR package (version 0.5.6) was used to analyze MR31. Five models were used in the MR analysis: (1) the inverse-variance-weighted (IVW) model, (2) the weighted median estimator, (3) the MR-Egger regression method, (4) the simple mode, and (5) the weighted mode. The IVW model was used as the primary method to evaluate the causal effect of dietary habits on AAM. The significance level was taken as p < 0.05. Significantly associated dietary habit SNPs were further assessed using statistical analyses, including Cochran’s Q test, a pleiotropy test, and a leave-one-out sensitivity test. However, if the pleiotropy test suggested the presence of pleiotropy (p < 0.05), MR Pleiotropy RESidual Sum and Outlier (MR-PRESSO) was used to filter potential outliers and assist in correcting them32. Finally, the leave-one-out sensitivity analysis was performed to evaluate whether a single SNP provided significant results.

Institutional review board statement

Ethical approval was not applicable to our study as publicly available data were used for all analyses.

Informed consent statement

Informed consent statement was not applicable to our study as publicly available data were used for all analyses.Informed consent was obtained from all subjects involved in the original study.

Results

Genetic correlations between 143 dietary habits and AAM

The LDSC analysis identified 29 candidate dietary habits significantly associated with AAM (Pldsc < 3.50 × 10−4, Fig. 1). These included milk types (skimmed, semi-skimmed, and full cream; coefficient = 0.2704; Pldsc = 1.13 × 10−14) and PC1 (coefficient =  − 0.1699; Pldsc = 1.00 × 10−10). PC1 is primarily defined by the type of bread consumed (whole grain/whole meal vs. white bread, two correlated FI-QTs contributing 15.4–15.8%). A further 41 dietary habits showed a suggestive association with AAM (0.05 < Pldsc < 3.50 × 10−4), such as the frequency of adding salt to food (coefficient = 0.0862, Pldsc = 6.00 × 10−4), PC24 (coefficient = 0.1093, Pldsc = 5.00 × 10−4), and overall poultry intake (coefficient =  − 0.0945, Pldsc = 3.80 × 10−3). All genetic correlations between the 143 dietary habits and AAM are summarized in Supplementary Table S1.

Figure 1
figure 1

Heatmap of significant correlation (Pldsc < 3.50 × 10−4) of LDSC analysis.

Causal relationships between 143 dietary habits and AAM

The MR study identified causal relationships between 19 dietary habits and AAM. In addition, 15 dietary habits showed genetic correlations with AAM and therefore were deemed to have causal relationships with AAM. These included bread type (white vs. any other; OR 1.71, 95% CI 1.28–2.29; Pmr = 3.20 × 10−4), tablespoons of cooked vegetables per day (OR 0.437, 95% CI 0.29–0.67, Pmr = 1.30 × 10−4), milk type (skimmed, semi-skimmed, and full cream; OR 3.37, 95% CI 1.76–6.44, Pmr = 2.50 × 10−4), cups of coffee per day (OR 0.72, 95% CI 0.57–0.92, Pmr = 8.31 × 10−3), and PC3 (OR 1.14, 95% CI 1.03–1.27, Pmr = 0.01) (Fig. 2, Supplementary Tables S2S3). PC3 is primarily defined by the spread type (butter vs. any other, one correlated FI-QT contributing over 10%)17.

Figure 2
figure 2

Results of MR analysis. (A) Veen plot of LDEC and MR. (BD) Forest plots of MR results. MR Mendelian randomization, LDSC Linkage disequilibrium score regression, CI confidence interval.

Sensitivity analysis

All dietary habits identified as significantly associated with AAM were further analyzed using Cochran’s Q test (Supplementary Table S4), the pleiotropy test (Supplementary Table S6), and the leave-one-out sensitivity analysis. The results of these tests were used to inform the MR method. In the absence of heterogeneity and pleiotropy, estimated IVW results were preferentially used, thus this method was used most frequently in this study. When there was only heterogeneity, but no pleiotropy, a weighted median or random-effect IVW was used (Supplementary Table S5). Although a few results were heterogeneous, the direction of the effect obtained from these other methods was concordant with the IVW results. However, the MR-Egger method was used if the pleiotropy test suggested that the result was multi-efficacious. The leave-one-out sensitivity test results suggested the result should be considered reliable (Fig. 3, Supplementary Information 2).

Figure 3
figure 3

Leave-one-out sensitivity test.

Discussion

In this study, we performed LDSC and MR analyses to investigate the relationship between dietary habits and AAM. Using LDSC analysis, we identified 29 candidate dietary habits significantly associated with AAM. We further evaluated the causal relationship between dietary habits and AAM using MR analysis. After sensitivity analyses, we identified eight dietary habits that showed significant causal relationships with AAM: bread type (whole meal, whole grain, white, brown), cereal type (cornflakes, frosties), milk type (skimmed, semi-skimmed, full cream), spread type (butter, margarine, olive oil), overall oily fish intake, tablespoons of vegetables per day, and PC1 and PC3, which primarily represent bread type and spread type, respectively.

Diet plays a key role in human health, and a recent study showed that diet-associated risk is among the top five risks of attributable deaths worldwide33. Dietary habits affect physical health in childhood34 and adulthood, especially concerning chronic diseases35. A pooled analysis of 2181 population-based studies36 showed that lifelong health advantages and risks are affected by heterogeneous nutritional quality. Diet influences the age of puberty onset and further affects height gain during adolescence and late adolescence. Another prospective study of 3983 children showed that puberty started later in children with a high diet quality37. A prospective cohort study of 215 girls showed that girls with a low intake of white meat (poultry and fish), fruits, and vegetables had an earlier AAM38. In addition to dietary patterns, single dietary components can also affect AAM. A cohort study among Chinese children suggested that the higher the childhood soy intake, the later the puberty39. According to Cheng TS. et al., a higher dietary fat intake has been associated with earlier puberty40. Although many cohort studies have observed the effects of different diets on AAM, the underlying mechanisms have not yet been elucidated. The effect of diet on AAM is influenced by many confounding factors, which are often difficult to control41. Therefore, the current research on the relationship between diet and AAM is mainly observational, and the results are often controversial. MR Analysis has a unique advantage in excluding confounding factors42. Although Cheng TS. et al. showed a causal effect of higher dihomo-γ-linolenic acid concentrations on earlier AAM using MR analysis43. There have been no studies on the association between dietary habits in general and AAM. We used the most comprehensive dietary habit GWAS summary dataset to evaluate the relationship between dietary habits and AAM from a genetic perspective.

The effect of estrogenic endocrine disruptors (EEDs) on puberty has long been understood44. Numerous observational studies have shown that girls chronically exposed to EEDs are more likely to experience early puberty45,46,47, regardless of the incidence of obesity48. The similar structure of EEDs to estrogen allows them to bind and activate estrogen receptors, thus exerting effects similar to estrogen49. Kisspeptin regulates puberty and fertility in humans50 by stimulating GnRH neuronal activity and regulating ovulation by driving luteinizing hormone surges51. Exposure to EEDs affects the expression of kisspeptin and GnRH and influences the pulsatile release of GnRH, which subsequently alters gonadotropin levels52. Animal studies have shown that EEDs alter hypothalamic Kiss1 mRNA expression levels and kisspeptin fiber density while altering gonadotropin secretion and/or gonadotropin-releasing hormone neuronal activation53. Coffee is one of the most widely consumed beverages worldwide54 and contains various ingredients, including caffeine, carbohydrates, lipids, and proteins, and has a few estrogenic activities55. In addition to caffeine, other components of coffee, such as aromatic acids, esters, and sterols, have estrogenic activity56. Our results suggest that increased daily coffee intake leads to earlier AAM, which may be attributed to the estrogenic activity of coffee. Isoflavones as EEDs have the potential to modulate estrogen metabolism57. Isoflavones are structurally similar to estrogens and can compete with endogenous estrogen for the estrogen receptor58. Whole wheat bread contains high concentrations of isoflavones (average 450 mcg/100 g)59 and is the primary source of isoflavones in the European diet60. Thus, our results suggest that the intake of whole grain bread may result in earlier AAM compared to that associated with white bread. Similarly, increased daily vegetable intake was associated with earlier AAM. The consumption of phytoestrogen-containing vegetables, such as soy and soy products, may promote puberty onset by performing estrogen-like biological functions61,62. The intake of vegetables, especially raw vegetables, may increase the risk of pesticide exposure63,64, and the effects on puberty timing as a type of EED have been extensively studied65,66.

Insulin-like growth factors (IGFs), including IGF-1 and IGF-2, are important peptides that regulate essential cellular activities67. IGF-1 levels are associated with age and play a key role in growth68. A few longitudinal studies have shown that higher IGF-1 levels are related to earlier puberty onset69 and higher breast cancer risk70. Oily fish were positively associated with circulating IGF-I concentrations71. Our results suggest that an increased intake of oily fish may lead to earlier AAM. In addition, several prospective cohort studies have shown that milk intake before puberty may accelerate AAM72,73. Other studies have shown no effect of milk intake on AAM74,75. Although milk contains IGF-1, studies have shown that humans do not absorb biologically significant levels of intact IGF-I from their food76. Different types of milk may contain different levels of sugar, fats, proteins, and other trace elements. A study based on the Growth and Obesity Cohort showed that consumption of sugar-sweetened milk beverages led to earlier mammary gland development compared to that in low-fat dairy products, namely low-fat milk and yogurt77. Another study showed that milk and butter consumption at the ages 3–5 was inversely related to breast development at age 10.878. Our results suggest that butter intake was negatively associated with AAM, while the intake of full-cream milk did not lead to earlier AAM compared to skimmed or semi-skimmed milk. The controversy over these results is mainly due to the complex composition of milk beverages79. Therefore, stricter control of milk composition in the study cohort would be needed to produce more rigorous results. Cultural factors may play a confounding role; for example, girls with earlier AAM may avoid milk if they believe it is associated with acne, while girls with later AAM may consume more milk and grow further in height. These factors may lead to uncertainty in the causal associations determined by such studies.

Despite the lack of longitudinal studies, this work is suggestive of new directions for understanding the genetic mechanisms between diet habits and AAM, since we are the first to combine LDSC and MR analysis to investigate the causal association between dietary habits and AAM. However, our study has some limitations. Although eating habits may change in adulthood compared to childhood, eating habits formed during childhood have been shown to have a lasting effect on adult eating habits80. We used data of eating habits in adulthood to reflect the exposure to certain products in childhood, due to the lack of large-scale heritability studies of childhood eating habits. First, dietary recall errors and individual differences in dose estimates are inevitable in an FFQ. Such reporting errors may lead to underestimating the true relationship between dietary habits and AAM. Second, people with AAM < 9 years or > 17 years were excluded. Therefore, the conclusions of this study are limited to people with AAM within the typical range and do not apply to people with abnormal pubertal initiation (precocious puberty or delayed pubertal development). Further, only European populations were included in this study, and as such, the conclusions should be interpreted with caution concerning other ethnic groups. Finally, the results of LDSC and MR only suggest possible genetic correlations and causal associations from a genetic perspective. Further experimentation is needed to confirm any underlying biological mechanisms indicated by these genetic associations. It is worth noting that the interpretation of genetic factors in dietary traits can be complex. Especially when AAM is also influenced by environmental factors. Therefore, more GWAS data on children and lifestyle questionnaires are expected to be generated. This will facilitate the study of genetic associations of lifestyle in children.

Conclusions

Based on GWAS summary data for dietary habits and AAM, we identified 29 candidate dietary habits that showed genetic associations with AAM. MR evaluations revealed that 19 dietary habits were causally associated with AAM, including bread type (whole meal, whole grain, white, brown), cereal type (cornflakes, frosties), milk type (skimmed, semi-skimmed, full cream), spread type (butter, margarine, olive oil), overall oily fish intake, and tablespoons of vegetables per day. These results provide a potential novel understanding of the genetic mechanisms underlying the relationship between diet habits and AAM.