Introduction

Preeclampsia, a pregnancy complication characterized by hypertension after 20 gestational weeks, is a major cause of maternal and neonatal morbidity and mortality. Globally, preeclampsia is estimated to complicate 2.0% to 8.0% of all pregnancies1. In the United States (U.S.), the prevalence of preeclampsia increased by 25% between 1987 and 20042. The risk of preeclampsia has also increased over the years, with women giving birth in 2003 at a 6.7-fold increased risk of preeclampsia compared to women who gave birth in 19803. Beyond its rising rates and risk, the condition is costly. In the U.S. in 2012, one study estimated the cost of preeclampsia in the year following delivery to be 2.18 billion dollars4. In addition to its immediate economic cost, preeclampsia has substantial effects on the health of the mother and infant. Risks of stillbirth, neonatal death, fetal growth restriction, lower birth weight, preterm birth, and perinatal mortality are higher for women with preeclampsia5,6. There is growing evidence that preeclampsia is not just an issue of pregnancy; rather, it affects the long-term health of mothers and infants. Preeclamptic women are later at increased risk of developing and dying from cardiovascular disease, while children exposed to preeclampsia have higher blood pressure, body mass index (BMI), and risk of hypertension5,7.

Pre-existing maternal conditions, pregnancy-specific characteristics, environmental factors, and family history have been identified as risk factors for preeclampsia. Increasing age of mothers, higher BMI, pre-existing hypertension, and diabetes all place women at increased risk of preeclampsia during pregnancy8,9,10,11. Having preeclampsia in a prior pregnancy and/or a family history of preeclampsia are significant risk factors for preeclampsia. The odds of preeclampsia are 2.9 (95% confidence interval (CI) 1.70–4.93) times greater in individuals who have experienced it in a previous pregnancy compared to those without a personal history of the condition. Individuals with family members who had preeclampsia are 7.19 (95% CI 5.85–8.83) times more likely to develop it compared to those without a family history9,10,11. Heritability studies present strong support for a genetic basis for preeclampsia, with twin studies estimating its heritability to be approximately 55%12. Several small genome-wide association studies (GWAS) have been performed, with fewer than 50 associated polymorphisms identified to date13,14,15. However, identified risk factors and polymorphisms have yet to fully explain the exact biologic mechanism leading to preeclampsia. There is some evidence that genetic predisposition to hypertension may be associated with preeclampsia in European ancestry and Asian women14. Several polymorphisms associated with preeclampsia have also been previously associated with blood pressure traits, though the shared genetic risk has yet to be fully evaluated14,15,16. Recent studies have further sought to investigate the connection between preeclampsia or hypertensive disorders of pregnancy, blood pressure, and hypertension outside of pregnancy16,17,18,19,20,21,22. Though they uncover more of the genetic background of preeclampsia and its relationship to blood pressure traits, these studies utilized populations that were primarily of European descent, potentially limiting their generalizability to non-European populations.

As preeclampsia is a blood pressure disorder and puts individuals at greater risk of hypertension later in life, we investigated whether collective blood pressure-raising alleles may contribute to preeclampsia in Black and White females. To increase the understanding of preeclampsia etiology and identify the potential shared genetic architecture with other blood pressure traits, we evaluated the association between preeclampsia and polygenic scores (PGS) for three blood pressure traits: diastolic blood pressure (DBP), systolic blood pressure (SBP), and pulse pressure (PP). These associations were evaluated in several cohorts of Black and White reproductive-aged females with documented pregnancies. We also investigated the utility of blood pressure PGSs for the prediction of preeclampsia, determining whether the addition of genetic components improves the performance of prediction models composed of two major clinical risk factors: BMI and age.

Results

Population characteristics

In total, 8513 individuals were included in the study: 2979 individuals from BioVU, 3279 from the Electronic Medical Record and Genomics Network (eMERGE) cohort, and 2255 from the Penn Medicine Biobank (PMBB) cohort (Table 1). There were 1319 cases of preeclampsia, corresponding to 17%, 12%, and 18% of the BioVU, eMERGE, and PMBB cohorts, respectively. As identified through their electronic health records (EHRs), 41.16% of individuals in the overall cohort were Black (38.47% in BioVU, 34.74% in eMERGE, and 54.06% in PMBB) and 58.84% were White (61.53% in BioVU, 65.26% in eMERGE, and 45.94% in PMBB). The overall rate of preeclampsia was 15.49%. The rate of preeclampsia was higher in Black individuals than in White individuals (18.78% vs 13.20%). Within cohorts and across the entire population, cases had higher average BMIs and ages compared to controls.

Table 1 Population characteristics.

Blood pressure trait PGSs-preeclampsia associations

In Black individuals, PGSs for DBP and SBP were not significantly associated with preeclampsia in the meta-analyses (odds ratio (OR)DBP = 0.99, 95% confidence interval (CI) 0.88–1.12, p-value = 0.90; ORSBP = 1.10, 95% CI 0.99–1.23, p-value = 0.09) (Table 2). The PGS for PP was significantly associated with preeclampsia in Black individuals: each one standard deviation increase in the PP PGS was associated with 1.16-fold higher odds of preeclampsia (ORPP = 1.16, 95% CI 1.03–1.31, p-value = 0.01). In White individuals, DBP, SBP, and PP PGSs were all significantly associated with preeclampsia (ORDBP = 1.14, 95% CI 1.07–1.23, p-value = 2.25 × 10−4; ORSBP = 1.18, 95% CI 1.10–1.27, p-value = 4.42 × 10−6; ORPP = 1.14, 95% CI 1.05–1.23, p-value = 1.77 × 10−3). In the cross-ancestry meta-analysis, all three PGSs were highly significant (Table 3; Figs. 1, 2, 3).

Table 2 Association between blood pressure PGSs and preeclampsia by self-reported race.
Table 3 Association between blood pressure PGSs and preeclampsia (cross-ancestry).
Figure 1

Cross-ancestry meta-analysis of the association of diastolic blood pressure polygenic risk score and preeclampsia.

Figure 2

Cross-ancestry meta-analysis of the association of systolic blood pressure polygenic risk score and preeclampsia.

Figure 3

Cross-ancestry meta-analysis of the association of pulse pressure polygenic risk score and preeclampsia.

Prediction models

Models composed solely of all three PGSs and 10 principal components had extremely poor discriminatory ability, with areas under the receiver operating characteristic curve (AUCs) ranging from 0.49 to 0.53 (Table 4). Clinical models had slightly better discrimination (AUCs: 0.59–0.67). The addition of the PGSs to the clinical models decreased the predictive ability of the models. Brier scores for PGS, clinical, and full models demonstrated low accuracy in the testing population. In general, lower AUCs and higher Brier scores were seen in PGS models compared to models composed of clinical risk factors, and in EHR-identified non-Hispanic Black populations compared to White or cross-ancestry populations. Overall, inclusion of PGSs in a predictive model composed of significant clinical risk factors did not improve predictive performance. Calibration and discrimination of the predictive models indicated poor fits in the testing population (Table 4).

Table 4 Prediction models in testing (BioVU) population.

Discussion

We evaluated the potential overlap between genetic risk factors for blood pressure traits and preeclampsia. Utilizing PGSs for DBP, SBP, and PP, we found significant associations between polygenic risk for these blood pressure traits and preeclampsia. These findings support shared genetic architecture across preeclampsia risk and blood pressure traits and suggest that genetic factors contributing to blood pressure regulation in the general population also predispose to preeclampsia. However, PGSs for blood pressure traits had poor predictive ability in our study population. Furthermore, the addition of PGSs to a risk score composed of known clinical risk factors for preeclampsia did not result in a significant improvement in the prediction of preeclampsia. The poor predictive ability of the PGSs should be viewed with caution, as the analyses were likely influenced by small sample size and limited power.

Pre-pregnancy hypertension is a major risk factor for preeclampsia8. The development of preeclampsia also increases the risk of persistent hypertension outside of pregnancy7. This overlapping risk is likely due in part to shared underlying genetic mechanisms. Several preeclampsia GWAS support this idea, finding that polymorphisms previously associated with hypertension are also associated with preeclampsia14,15. A recent large-scale meta-analysis of European and Central Asian women further reinforced the concept that genetic predisposition to hypertension contributes to the risk of hypertensive disorders of pregnancy14. While an association between a hypertension PGS and hypertensive disorders in pregnancy was observed, the effect was stronger for gestational hypertension than for preeclampsia, suggesting that, while there is overlap in genetic susceptibility, additional factors are involved in preeclampsia.

It is unclear whether blood pressure complications and preeclampsia are comorbid conditions or if complications lead to preeclampsia or vice versa. Maternal cardiovascular systems undergo significant changes during pregnancy23,24. Individuals who experience preeclampsia may undergo further physiological changes. For example, previous research has noted significant increases in arterial stiffness in those with preeclampsia compared to individuals with normotensive pregnancy or those with gestational hypertension25. Our study provides further evidence of the role of arterial stiffness in preeclampsia, given that the PGS for PP, a marker of arterial stiffness, was strongly associated with preeclampsia in both Blacks and Whites. Additional pathways involved in preeclampsia could also be uniquely triggered by pregnancy-related changes and the fetal genome. Blood pressure regulation through genetic mechanisms may also be perturbed during pregnancy, with interactions resulting in those with increased genetic risk becoming more likely to develop preeclampsia. Additional research is needed to disentangle the causal origins of shared risk factors between blood pressure traits, complications, and preeclampsia and to identify variants that uniquely contribute to preeclampsia risk. Our cohort is also significantly younger than most cohorts used in blood pressure research. Age-related changes in blood pressure regulation have also been reported previously and could account for some of the variation seen between genetic risk factors for blood pressure traits and preeclampsia26.

This study builds upon previous research by investigating the traits underlying hypertension and further disentangling how genetic risk factors for these traits impact preeclampsia risk. Two recent large-scale GWAS have been conducted for preeclampsia and hypertensive disorders of pregnancy. Honigberg et al. used 20,064 cases of preeclampsia and 703,117 control individuals to perform a GWAS. Results from the GWAS were then used to create a PRS for preeclampsia16. They also used the Giri et al. study27 to derive a PRS for SBP; however, they used different methods to create the score. The result was a score containing 1,064,898 HapMap3 variants, compared to our score of 4294 single nucleotide polymorphisms (SNPs). The conclusion of the association analysis was the same as in our study: preeclampsia is associated with the SBP PGS. Though our study and Honigberg et al. both utilized PGS constructed from the Giri et al. summary statistics and found associations, the populations that were used to train and test these models are important considerations. Honigberg et al. trained their PRS on the UK Biobank European LD panel and validated the models in two other cohorts that were either completely or predominantly (> 73.0%) of European descent16. We utilized eMERGE and BioVU populations that had a large proportion of non-Hispanic Black individuals.

Previous research has demonstrated that predictive performance of European ancestry derived PGS is lower in non-European ancestry samples28. Thus, the differences in methods, training, and testing populations make further comparison of results across studies difficult. Additional studies have also reported associations between blood pressure PGS (SBP and DBP) and preeclampsia17,18,19. Research utilizing blood pressure PGS to predict hypertensive disorders of pregnancy is conflicting. Several studies report improvement in predictive ability when PGSs were added to a model of clinical risk factors16,21, while others report no significant improvement in performance19. However, clinical risk factors and population characteristics used in prediction modeling were inconsistent across studies. Utilization of PGS for predicting preeclampsia warrants further investigation.

Our results, which support the conclusion that genetic risk factors for blood pressure have a small contribution to the risk of preeclampsia, have several key strengths. First, previous genetic studies of pregnancy-specific phenotypes, including preeclampsia, have included all females from the source population, regardless of pregnancy status. Results of these studies could be biased as a proportion of their controls do not have the ability to be classified as cases. Our study cohort was also restricted to females with a pregnancy, ensuring all individuals possessed the appropriate window during which the outcome could have occurred. Previous research, including two recent studies which found associations between hypertensive disorders of pregnancy and polygenic risk for gestational hypertension/preeclampsia or blood pressure traits21,22, has been largely limited to Asian and White women or those of European ancestry. Here, we investigate preeclampsia genetic risk factors in EHR-identified Black individuals, a population known to be at increased risk of preeclampsia11.

Our study has several considerations. The overall rate of preeclampsia in our study population was higher than expected based on previous observational data. The biobanks in our study all contain data from hospital systems, as opposed to population-based samples used to obtain estimates of incidence of preeclampsia. The populations served by these hospitals may be different from the general population. For example, Vanderbilt University Medical Center (VUMC) has a high-risk obstetric team which can provide advanced care. Individuals with preeclampsia may have been sent to VUMC by other health systems that are unable to provide care for these complex cases. The racial diversity in our study cohort could also contribute to the higher rates of preeclampsia. This study included a large proportion of Black individuals. Epidemiologic research has revealed that individuals who identify as Black or African American are at increased risk of preeclampsia compared to those from other racial and ethnic groups. While our sample size is small, results from our association analyses align with previous research which has found an association between polygenic risk scores for blood pressure traits and both gestational hypertension and preeclampsia21.

Our study found small differences in the associations of blood pressure PGSs and preeclampsia between the EHR-identified races, which cannot be corroborated by other research as previous studies have had limited diversity or did not validate the PGS associations across other, non-European populations. Our stratified results should be interpreted carefully, especially in the non-Hispanic Black group, which had small sample sizes and displayed a larger degree of uncertainty in effect estimates. Additionally, the less significant associations and poor transferability of the PGS in the non-Hispanic Black group may be related to the populations used to create them. Each PGS was created using summary statistics from a cross-ancestry GWAS, and the DBP and SBP PGS were subsequently validated on a cross-ancestry population. However, both populations were predominantly of European ancestry or non-Hispanic White. The differences in ancestry between the populations used to discover significant loci, those used to validate the PGS, and our study cohort could have biased the results and contributed to the lack of significance in the non-Hispanic Black populations. It is also notable that the PGS generally performed better in White than in Black populations27, which may also impair the quality of their prediction of preeclampsia. To ensure health equity and transferability of PGS between populations of different ancestries, future genetic research of preeclampsia should aim to increase the diversity of its cohorts.

Prediction modeling using the PGS models was also limited due to many individuals in the study cohort missing BMI and being excluded during predictive model development and testing. Prediction models composed of all three PGS had poor predictive ability. Adding the PGS to the clinical model resulted in poorer performance than the clinical model alone. This could be due to a portion of individuals with higher genetic burden being classified as controls, as the study cohort did not exclude those with hypertension diagnoses and the timing of hypertension development and diagnoses relative to pregnancy could not be fully determined without prospective blood pressure measurement. A previous study found that incorporating an SBP PGS into a clinical model resulted in a small improvement in the risk prediction and discrimination of preeclampsia21. However, the study also included a preeclampsia PGS and additional variables (e.g., in-vitro fertilization, renal insufficiency, multifetal pregnancy, etc.). Results from our prediction modeling should also be considered cautiously as clinical and environmental risk factors, including family history, were incomplete in our datasets. The addition of other known risk factors in development and testing of prediction models in larger datasets with more complete BMI information may increase their predictive ability and clinical value.

While this study further endorses the connection between genetic risk factors for blood pressure and preeclampsia, the results suggest that many variants, each with small effect sizes, likely affect genetic predisposition to preeclampsia. Though many of these variants may mainly impact preeclampsia through regulation of blood pressure, other molecular pathways are likely involved. Future research should continue to develop and validate genome-informed clinical models for preeclampsia across global populations and conduct additional analyses of the genetic architecture that preeclampsia shares with other conditions. These analyses will expand our understanding of preeclampsia etiology and potentially inform future treatments by providing evidence for repurposing treatments for conditions that share a biological basis with preeclampsia.

Methods

Study populations

Our study populations were obtained from three sources: BioVU, the Electronic Medical Record and Genomics Network (eMERGE), and the Penn Medicine Biobank (PMBB) at the University of Pennsylvania29,30,31. BioVU is Vanderbilt University Medical Center’s biorepository linking DNA samples to de-identified electronic health records (EHR). eMERGE, a national network which also combines DNA biorepositories with EHR systems, contains data from individuals at more than 10 sites around the U.S. Though Vanderbilt is an eMERGE site, data from eMERGE is independent of BioVU. Like BioVU, the PMBB connects de-identified EHR data to a biobank from a single academic healthcare system. The data and source populations from BioVU, eMERGE, and PMBB are similar. All individuals were recruited at their corresponding healthcare system, largely in an outpatient or hospital setting, and have linked genotype and de-identified EHR data. Each source contains a subset of the active patients within its system. Most individuals are of European ancestry or identify as non-Hispanic White, though a sizable proportion identify as African American or Black, Hispanic, or Asian. Mirroring the U.S. population, a slightly higher proportion of individuals in BioVU, eMERGE, and the PMBB are female29,30,31.

This study was deemed non-human subjects research and approved by the Vanderbilt University Medical Center Institutional Review Board. All methods were carried out in accordance with relevant guidelines and regulations. Individuals were eligible for inclusion in the study if they were females who had documented pregnancies or deliveries, were of reproductive age at last known EHR entry (18 to 45 years old), were identified in the EHR as non-Hispanic Black or non-Hispanic White, and had genotype information available. Individuals who are or were pregnant and/or had deliveries were identified using International Classification of Diseases, 9th and 10th revisions (ICD-9/ICD-10) codes. Presence of any of the following codes indicated potential eligibility for inclusion in the study: 631, 633, 634.3, 635 through 679, 760–779, 796.5, V22-V24, V27, V28, V72.42, V82.4, V89, V91 (ICD-9), O09-O16, O20-O26, O28-O36, O40-O48, O60-O77, O80, O82, O85-O92, O94, O98, O99, O9A, Z03.7, Z32.01, Z33, Z34, Z36, Z37, Z39, or Z3A (ICD-10). Preeclampsia cases were defined by the presence of any one of the codes 642.4, 642.5, 642.6, 642.7 (ICD-9), O11, O14, or O15 (ICD-10).
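The case definition above amounts to a lookup over each individual's diagnosis codes. The following is a minimal sketch, assuming codes are stored as strings and that a code counts when it starts with one of the listed code groups. The prefix rule and the function names are illustrative assumptions, not the study's actual pipeline.

```python
# Case-defining code groups copied from the text; all names here are hypothetical.
ICD9_PREECLAMPSIA = ("642.4", "642.5", "642.6", "642.7")
ICD10_PREECLAMPSIA = ("O11", "O14", "O15")


def is_preeclampsia_code(code: str) -> bool:
    """True if the code falls under a case-defining group (prefix match, assumed)."""
    code = code.strip().upper()
    return code.startswith(ICD9_PREECLAMPSIA + ICD10_PREECLAMPSIA)


def is_case(patient_codes) -> bool:
    """An individual is a case if any one of their codes is case-defining."""
    return any(is_preeclampsia_code(c) for c in patient_codes)
```

A production pipeline would likely also track which coding system (ICD-9 or ICD-10) each record uses rather than pooling the two lists, as this sketch does for brevity.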

Genotyping and quality control

Samples from BioVU were genotyped using the custom Multi-ethnic Genotyping Array (MEGA-Ex, Illumina) and imputed to the Haplotype Reference Consortium (HRC) panel v1.1 using the Michigan Imputation Server (MIS)30,32,33. As described previously, individuals from all eMERGE sites were genotyped via Illumina 660WQuad array for those reported as European or unknown ancestry and the Illumina 1M-Duo array for those of African ancestry and were similarly imputed to the HRC genotype reference panel using the MIS34. Standard quality control procedures were performed on cohorts from BioVU and eMERGE, including filtering based on sample quality and composition (i.e., checking for sex inconsistencies, sample relatedness, and sample genotyping call rate) and marker quality (i.e., marker genotype call rate, concordance, minor allele frequency, and Hardy–Weinberg equilibrium)29,30. The eMERGE cohort included additional quality control procedures, such as strand orientation analysis, before genotype data from the contributing sites were merged34. PMBB data was genotyped on an Illumina Global Screening Array v.1 and v.2.0 (GSAv1, GSAv2). Sample-level quality control was performed before imputation was performed using Minimac5 version 2.0.0 software on the TOPMed Imputation Server. As with the cohorts from BioVU and eMERGE, post-imputation quality control was performed and included imputation score filter (R2 > 0.7), removal of palindromic variants, biallelic variant check, genotype call rate (> 99%), minor allele frequency (> 1%), sex check, sample call rate (> 99%) filtering, and a Hardy–Weinberg equilibrium test (p-value > 1.0 × 10−8)31.

Blood pressure traits polygenic scores (PGS) development

We evaluated the association between PGS for three blood pressure traits (DBP, SBP, and PP) and preeclampsia. Individually, DBP, SBP, and PP are highly heritable and are common measures used to capture blood pressure in genetic studies35. We evaluated these traits independently because the genetic and biologic mechanisms underlying them vary. They capture different domains of blood pressure: SBP is the maximal arterial pressure exerted when the heart beats, DBP is the arterial pressure between heartbeats, and PP is the difference between the two and is indicative of arterial stiffness27,35. We utilized previously developed PGS for these three blood pressure traits36. Briefly, summary statistics for genetic associations with DBP, SBP, and PP were obtained from a previously published cross-ancestry genome-wide association study (nmax = 760,226 subjects)27. In this large meta-analysis for blood pressure phenotypes, the measurements for SBP, DBP, and PP were obtained from electronic health records. For the discovery cohort, containing individuals from the Million Veteran Program, these phenotypes were defined as the median value of all eligible measurements of SBP and corresponding DBP. To be eligible for inclusion in calculation of the median and final blood pressure phenotypes, these measurements could not come from Emergency Department outpatient visits and could not have occurred at or after an ICD-9 code for chronic kidney disease (code group 585), secondary hypertension (code group 405), or heart failure (code group 428). For those with available pain scores, blood pressure measurements taken during encounters where individuals had a pain score greater than or equal to 5 were also ruled ineligible and excluded from the median calculation. For individuals on an antihypertensive medication, 15 mmHg and 10 mmHg were added to SBP and DBP, respectively. Finally, if median values were documented multiple times on distinct dates in an individual’s electronic health record, the earliest measurement was used to identify DBP, antihypertensive treatment status, and age of the individual. In the UK Biobank cohort, SBP, DBP, and PP phenotypes were averaged over two measures and similarly adjusted for antihypertensive drug status27.
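The medication adjustment and the definition of PP described above reduce to simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes PP is computed from the adjusted SBP and DBP values, which the text does not state explicitly.

```python
def adjust_blood_pressure(sbp, dbp, on_antihypertensive):
    """Return (SBP, DBP, PP) in mmHg after the medication adjustment."""
    if on_antihypertensive:
        sbp += 15  # mmHg added to SBP for treated individuals
        dbp += 10  # mmHg added to DBP for treated individuals
    pp = sbp - dbp  # pulse pressure; assumed here to use the adjusted values
    return sbp, dbp, pp
```

Under these assumptions, a treated individual with a median reading of 130/80 mmHg would be analyzed as 145/90 mmHg with a PP of 55 mmHg.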

The PGSs for DBP, SBP, and PP were developed using these summary statistics, and the resulting SBP and DBP PGSs were validated in BioVU (nmax = 37,132, DBP PGS p = 7.63 × 10−113, SBP PGS p = 4.18 × 10−132) for association with blood pressure measurements, adjusting for age, sex, BMI, and the top ten principal components of ancestry36. The BioVU genetic data was pruned for linkage disequilibrium at an r2 threshold of 0.1 with a maximum distance of 250 kilobases from associated SNPs (p < 1 × 10−5) in the blood pressure summary statistics. Weighted scores were calculated in PLINK using p-value thresholding (p ≤ 0.00001)37. The DBP, SBP, and PP scores contained 4194, 4294, and 3345 SNPs, respectively (Supplementary Tables S1–S3)36. PGS were transformed by multiplying scores by the standard deviation of the PGS in the original population from which they were optimized.
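Conceptually, a weighted score with p-value thresholding sums each retained SNP's effect size multiplied by the individual's allele dosage. The sketch below illustrates that idea only; it is not PLINK's implementation, it omits linkage disequilibrium pruning and the final scaling step, and the function name, SNP identifiers, and values are hypothetical.

```python
def weighted_pgs(dosages, effect_sizes, p_values, p_threshold=1e-5):
    """Sum of effect size x allele dosage over SNPs passing the p-value threshold."""
    score = 0.0
    for snp, beta in effect_sizes.items():
        # Keep only SNPs meeting the threshold that are genotyped for this person.
        if p_values[snp] <= p_threshold and snp in dosages:
            score += beta * dosages[snp]
    return score


# Hypothetical three-SNP example; rs3 fails the threshold and is skipped.
effect_sizes = {"rs1": 0.5, "rs2": -0.2, "rs3": 0.9}
p_values = {"rs1": 1e-8, "rs2": 1e-6, "rs3": 1e-3}
dosages = {"rs1": 2, "rs2": 1, "rs3": 2}
example_score = weighted_pgs(dosages, effect_sizes, p_values)
```

Dosages here are counts (0, 1, or 2) of the effect allele; imputed data would supply fractional dosages, which the same sum handles unchanged.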

Blood pressure trait PGSs-preeclampsia association analyses

Logistic regression was used to evaluate the association between the PGS for each of the three blood pressure traits and preeclampsia separately in the three cohorts (BioVU, eMERGE, and PMBB), stratified by race (Black and White). Analyses were adjusted for age, age squared, and the top ten principal components. Associations were considered statistically significant at p-values less than 0.05. Odds ratios (ORs) and 95% confidence intervals (CIs) are reported for each model as the increase in odds of preeclampsia per one standard deviation increase in each blood pressure PGS. Analyses were completed using R version 4.2.238. To investigate whether trends were similar across different populations, we combined effect estimates and standard errors using fixed-effects meta-analyses for each EHR-identified race. Then, we meta-analyzed data overall to obtain a single value across the entire study for each of the three blood pressure PGSs. Fixed-effects meta-analyses were performed in RevMan, version 5.4, using an inverse-variance approach39.
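For readers unfamiliar with the inverse-variance approach, the pooled estimate is a weighted average of the per-cohort log odds ratios, with each weight equal to the reciprocal of the squared standard error. The following is a minimal sketch of that calculation, not RevMan's implementation. The function names are hypothetical, and the 1.96 multiplier assumes a normal approximation for the 95% CI.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% confidence interval


def se_from_ci(lower, upper):
    """Recover the standard error of a log odds ratio from its 95% CI bounds."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def fixed_effect_meta(log_ors, std_errors):
    """Inverse-variance fixed-effect pooling; returns (OR, CI lower, CI upper)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (
        math.exp(pooled),
        math.exp(pooled - Z_95 * pooled_se),
        math.exp(pooled + Z_95 * pooled_se),
    )
```

Because the weights depend only on precision, larger cohorts with smaller standard errors dominate the pooled estimate, and the pooled CI is always narrower than that of any single contributing cohort.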

Predictive modeling

To evaluate the performance of blood pressure PGS for the prediction of preeclampsia, we developed prediction models using tenfold cross validation. Joint prediction models composed of all three PGSs (DBP, SBP, and PP) were developed. Clinical models, composed of two major preeclampsia risk factors (BMI and age at last EHR entry), were constructed for comparison. A full model, which added the three PGSs to the clinical model, was used to investigate whether PGS improved prediction of preeclampsia beyond traditional risk factors. Models that included PGSs also adjusted for the top ten principal components.

Models were developed in the cohort from eMERGE and tested in BioVU. Synthetic minority over-sampling techniques (SMOTE) were used to attenuate the effects of class imbalance (number of cases and controls) due to the rarity of the outcome during model training30,40. Only those with complete covariate information were used in prediction modeling. All models were created and evaluated separately for each EHR-identified race (non-Hispanic Blacks and Whites) with subsequent development of cross-ancestry models. Model performance was assessed through discrimination and calibration using the area under the receiver operating characteristics curve (AUC) and the Brier Score. All prediction modeling was performed in R38.
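The two performance measures can be computed directly from observed outcomes and predicted probabilities. The sketch below uses plain Python rather than the R code used in the study, and the function names are hypothetical. AUC is computed as the proportion of case-control pairs in which the case receives the higher predicted probability, and the Brier score as the mean squared difference between predicted probability and observed outcome.

```python
def auc(y_true, y_prob):
    """Share of case-control pairs where the case scores higher (ties count 0.5)."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def brier_score(y_true, y_prob):
    """Mean squared error of predicted probabilities; lower is better."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```

An AUC of 0.5 corresponds to chance-level discrimination, which is why the PGS-only AUCs of 0.49 to 0.53 reported in the Results indicate essentially no discriminatory ability.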