Introduction

The World Health Organization defines preterm birth as the delivery of a neonate between 20 and 37 weeks of gestation1. Globally, 14.84 million preterm births and 1.1 million prematurity-related deaths occur each year2,3. In the United States, the incidence of preterm birth increased from 2014 and has plateaued at nearly 10% since 20184. Preterm birth is the leading cause of neonatal mortality (death within 28 days of delivery) and morbidity (e.g., neonatal respiratory morbidity and NICU triage/admission) as well as infant mortality (death before 5 years of age)3,5. Prematurely born infants are at an elevated risk of developing chronic diseases later in life, including neurological disorders (e.g., learning disabilities) and cardiovascular diseases6,7. Preterm birth also poses a substantial economic burden on society: approximately $26.2 billion is spent annually on the care of prematurely born infants8. Hence, there is a sustained effort to develop strategies to prevent preterm birth and reduce its impact. Although there has been success in reducing the mortality rate, other adverse neonatal outcomes associated with prematurity have remained consistently prevalent9,10,11.

Preterm birth is either iatrogenic (i.e., medically indicated in the event of disease, such as preeclampsia) or spontaneous following preterm labor or preterm pre-labor rupture of the membranes12,13. Spontaneous preterm labor accounts for most preterm births and is characterized by multiple underlying etiologies that culminate in the activation of a common pathway of labor leading to spontaneous preterm delivery13,14. There is evidence to support a causal relationship between microbial-associated or sterile intra-amniotic inflammation and spontaneous preterm labor and birth15,16,17,18,19,20,21,22,23,24. Intra-amniotic inflammation, however, is responsible for only a subset of cases of spontaneous preterm labor and delivery, while most cases of preterm labor are considered idiopathic, i.e., of unknown etiology.

To address the complex public health problem of preterm birth, the development of biomarkers for preterm labor is necessary25,26,27,28. Currently, risk modeling of preterm birth relies on maternal factors: a history of preterm birth29, of late miscarriage30 or cervical excisional surgery31, a sonographic short cervix32, a low customized cervical length percentile33 during the current pregnancy, amniotic fluid sludge34, and an abnormal cervical consistency index35. Biochemical markers, e.g., fFN36, PIGFBP-137, PAMG-138, inflammatory cytokines39, and cervical acetate40, as well as the vaginal microbiome41 and maternal blood transcriptome42, have also been suggested as predictive of preterm birth. Combinations of biomarkers have shown superior predictive performance compared to any single test43. However, given the syndromic nature of spontaneous preterm labor, there is a need to improve preterm birth prediction performance relative to current biomarkers43,44.

Amniotic fluid (AF) surrounds the developing fetus and is in continuous exchange with fetal organs and gestational tissues45,46,47; hence, this fluid is a rich source of potential biomarkers for preterm labor and birth48. The cell-free supernatant that remains upon removing the AF's cellular components reflects fetal well-being and pregnancy status46,47,49,50,51,52,53,54,55,56. Indeed, studies have described changes in the AF cell-free transcriptome in fetal genetic disorders, such as trisomy 1850, trisomy 2149, and Turner syndrome52, as well as in fetal growth restriction54 and preeclampsia55. Others have investigated the association between the AF cell-free transcriptome and maternal factors such as race, obesity, and smoking status51,56, as well as neonatal morbidity53; yet, the AF cell-free transcriptome in spontaneous preterm labor was not assessed in prior investigations.

To address the current knowledge gap, we performed whole transcriptome profiling of AF in mothers who underwent transabdominal amniocentesis after an episode of preterm labor and then utilized machine learning to predict the time-to-delivery after amniocentesis. Given the recent development of cell-type-specific signatures based on single-cell genomic studies of the placenta57,58,59,60,61 and the relevance of these signatures in identifying preeclampsia58,62 and preterm parturition61, we have also assessed the perturbations of these signatures in AF cell-free RNA.

Results

Demographic characteristics of the study participants

We examined the AF cell-free transcriptome in samples collected from 38 women who had a transabdominal amniocentesis performed after an episode of preterm labor (Fig. 1a). Women were divided into two groups according to the interval from amniocentesis to delivery (Fig. 1b): (1) women who delivered within 24 h of amniocentesis (n = 10) and (2) women who delivered after 24 h from amniocentesis (n = 28). Table 1 presents a comparison of the clinical characteristics of women between the two groups.

Figure 1

(a) Study Design (created with biorender.com). The cell-free transcriptome of amniotic fluid samples collected from women after an episode of preterm labor was quantified with microarrays. (b) Distribution of gestational age at sampling. Each line corresponds to a single mother, and each circle represents a sample. The green triangles mark the gestational age at delivery. (c) Unsupervised clustering. Heatmap shows the hierarchical clustering of samples based on the expression of the most variable genes. The R package, pheatmap, was used to generate the heatmap. (d) Principal component analysis of amniotic fluid cell-free RNA expression. All samples are depicted as their first and second principal components derived from the cell-free amniotic fluid transcriptome. The proportion of variance explained by each principal component is shown along the axis. The R/Bioconductor package, PCAtools, was used to calculate and plot the principal components.

Table 1 Demographic characteristics of the women included in the transcriptomics study.

Women who delivered within 24 h of amniocentesis had smaller babies than those who delivered after 24 h (median birth weight: 1907.5 g vs. 2142.5 g, p = 0.047); yet, there were no differences in the birthweight percentiles. The AF interleukin (IL)-6 concentrations, frequency of AF glucose levels < 14 mg/dl, and frequency of AF white blood cell counts ≥ 50 cells/mm³ were higher in women who delivered within 24 h of amniocentesis than in women who delivered after 24 h. Other fetal and maternal characteristics were similar between the two groups. Of importance, there was no significant difference in gestational age at amniocentesis between women who delivered within 24 h of amniocentesis and women who delivered after 24 h (median gestational age at amniocentesis: 32.8 weeks vs. 31 weeks, p > 0.5).

Amniotic fluid WBC count, IL-6 levels, and culture determinations inform clinical decisions for patients with preterm labor who undergo an amniocentesis. These decisions may include administering tocolytic agents to inhibit myometrial contractions, administering antenatal corticosteroids for fetal lung maturity, or early delivery in cases of severe infection/inflammation. However, none of the 38 women in this study had labor induced preterm, with only one case having labor induction at 40 weeks. Therefore, indicated early delivery was not a confounding factor in the analysis of cell-free RNA data. The clinical decision to administer tocolytic agents may affect the interval from amniocentesis to delivery, yet this was not the case here since tocolytic agents were administered at similar rates in the two groups (delivery ≤ 24 h from amniocentesis vs. > 24 h: 40% vs. 46.4%, respectively) (Table 1).

Differential expression with imminent delivery after amniocentesis

Hierarchical clustering (Fig. 1c) based on the most variable genes and principal component analysis (Fig. 1d) showed an overall separation between the group of women who delivered within 24 h of amniocentesis and those who delivered after 24 h. The first principal component, which captured 36% of the variation, was correlated with gestational age at amniocentesis (Pearson correlation = 0.36, p = 0.027).

The comparison of AF cell-free transcriptomes of women who delivered within 24 h of amniocentesis and those from women who delivered after 24 h showed differential expression (fold change > 1.25, q value < 0.1) of 2385 genes (1508 up-regulated and 877 down-regulated) (Table S1, Fig. 2).

Figure 2

Differential expression analysis. The figure shows (a) a volcano plot of log10-transformed q-values against log2-transformed fold changes of all genes and (b) a heatmap based on the top 50 up-regulated and down-regulated genes for the comparison between women who delivered within 24 h of amniocentesis and those who delivered after 24 h. The R/Bioconductor packages EnhancedVolcano and pheatmap were used to generate the volcano plot and heatmap, respectively.

The five most up-regulated genes were CCL4 (C–C motif chemokine ligand 4), IL1B (interleukin 1 beta), AQP9 (aquaporin 9), BCL2A1 (BCL2 related protein A1), and CXCL8 (C-X-C motif chemokine ligand 8). Functional analysis of up-regulated differentially expressed (DE) genes showed an over-representation of 1918 biological processes, 171 cellular components, and 143 molecular functions (Table S2). Most enriched biological processes were related to the inflammatory and immune responses to stimuli, e.g., pattern recognition receptor signaling pathway, leukocyte mediated immunity, NIK/NF-kappaB signaling, and cytokine-mediated signaling pathway. Significantly over-represented cellular components included terms related to extracellular region, membrane, cytoplasmic vesicle part, and I-kappaB/NF-kappaB complex. The over-represented molecular functions included receptor binding, receptor regulator activity, signaling receptor activity, and oxidoreductase activity, acting on NAD(P)H, oxygen as acceptor.

The most down-regulated genes with imminent delivery included HBG2 (hemoglobin subunit gamma 2), SNORD71 (small nucleolar RNA, C/D box 71), LCE3E (late cornified envelope 3E), LOC100129037 (WEE1 homolog (S. pombe) pseudogene), and SNORA11 (small nucleolar RNA, H/ACA box 11). Gene ontology enrichment analysis of down-regulated genes identified 492 biological processes, 151 cellular components, and 130 molecular functions (Table S3). The significant biological processes were related to nucleobase-containing compound metabolism, organelle organization, cell cycle, and cellular response to DNA damage. The most-enriched cellular components included nuclear part, organelle, catalytic complex, and cell junction. Protein binding, DNA binding, DNA-dependent ATPase activity, and N-acetyltransferase activity were among the most over-represented molecular functions among the down-regulated genes.

Changes in cell-type and tissue-specific signatures in AF with imminent delivery

To gain further insight into the meaning of the differential expression findings between the study groups, we compared the expression of gene sets defined based on their specificity to tissues and cell-types (according to the GNF Expression Atlas63). The average standardized expression of genes preferentially expressed in 23 tissues or cell-types was significantly increased in the AF cell-free transcriptome of women who delivered within 24 h of amniocentesis compared to women who delivered after 24 h (q < 0.05) (Fig. 3a). These included tissues and cell-type-specific signatures of organs (e.g., fetal lung, liver, and olfactory bulb) and immune system-related signatures (bone marrow, lymph nodes, whole blood, T cells, B cells, monocytes, and natural killer cells).

Figure 3

Expression of tissue-specific and placental single-cell RNA Seq signatures. For each (a) tissue and (b) placental single-cell signature, the expression of the top 20 most preferentially expressed genes was transformed into a Z score and averaged. The Z scores were compared between women who delivered within 24 h of amniocentesis and those who delivered after 24 h. Significant tissues and placental single-cell types (q value < 0.05) are shown in the figure.

Changes in placental scRNA-Seq signatures in AF with imminent delivery

By utilizing the same type of analysis described above for tissues, we have also analyzed changes in placental scRNA-Seq signatures derived from the placentas' three compartments (basal plate, placental villi, and chorio-amniotic membranes) in women with term labor or preterm labor61. There was a significant increase in the average expression of genes specific to monocytes, myeloid progenitor cells, dendritic macrophages, activated T cells, B cells, natural killer cells, and extravillous trophoblasts in women who delivered within 24 h from amniocentesis compared to those who delivered after 24 h (Fig. 3b).

Prediction of time-to-delivery

Based on the differential expression analysis, we concluded that the AF cell-free transcriptome of women with preterm labor echoes the inflammatory response that precedes delivery. We hypothesized that predictive models that capture these effects should predict the time-to-delivery after amniocentesis. Our modeling strategy included the selection of RNAs that were most informative about the interval from amniocentesis to delivery in a multivariate evaluation, followed by random forest model fitting of gene expression data to predict time-to-delivery as a continuous variable. The cross-validated prediction of time-to-delivery by a transcriptomic model was significant, with a Spearman's correlation coefficient of 0.49 (p < 0.001) and a root-mean-square error (RMSE) of 3.1 weeks (Fig. 4a). When assessed as a binary outcome, prediction of delivery within 24 h of the procedure by the transcriptomic model time-to-delivery estimates was also significant as indicated by a receiver operating characteristic (ROC) curve analysis (Fig. 4b). The areas under the ROC curve (AUROC) for prediction of delivery within 24 h, 1 week, and 2 weeks were 0.81, 0.74, and 0.72, respectively.

Figure 4

Prediction of time-to-delivery by cell-free amniotic fluid transcriptomics. (a) Cross-validation based predictions of time to delivery are plotted against actual values for all samples. (b) Receiver operating characteristic curve for the prediction of imminent delivery by estimates of the time-to-delivery (generated with the R package, pROC). RMSE root mean squared error.

To assess the robustness and reproducibility of genes selected as predictors during the cross-validation analysis, we calculated the average Jaccard similarity (0.82) and average kappa coefficient (0.9) between all sets of predictor genes identified across leave-one-out iterations. Based on Kursa's64 definition of significantly self-consistent selection, 53 of the most-predictive genes of time-to-delivery in AF samples are highlighted in Table 2. All 53 genes selected as predictors were up-regulated in the AF cell-free RNA of women who delivered within 24 h of amniocentesis compared to those who delivered later. Of note, 23 genes were selected in all iterations of leave-one-out cross-validation; these included IL1B (interleukin 1 beta), CXCL8 (C-X-C motif chemokine ligand 8), AQP9 (aquaporin 9), PLEK (pleckstrin), BCL2A1 (BCL2 related protein A1), CCL3 (C–C motif chemokine ligand 3), CCL4 (C–C motif chemokine ligand 4), and CCL20 (C–C motif chemokine ligand 20). Figure 5 shows a highly connected protein–protein interaction network built based on the corresponding 53 predictor genes. In this figure, several enriched Gene Ontology biological processes related to the immune and inflammatory responses are highlighted (q value < 0.05).

Table 2 Amniotic fluid cell free RNAs most predictive of time from amniocentesis to delivery.
Figure 5

Protein–protein interaction network for the most predictive genes. Selected biological processes over-represented among the most predictive genes are shown in the pie charts. The network was created with stringApp v1.5.0 in Cytoscape v3.7.2.

Discussion

Spontaneous preterm labor is a syndrome with many etiologies and may involve intra-amniotic inflammation with or without microbial infection, oxidative stress, and placental dysfunction14. Accurate prediction and mitigation of spontaneous preterm birth remain challenging. Identifying symptomatic patients at the greatest risk of impending delivery allows obstetricians to implement prophylactic interventions, arrange timely transfer to tertiary care centers, and guide antenatal therapy and postnatal care intended to reduce the risk of adverse outcomes for the neonate65,66,67,68. Reassuring mothers that they are not at risk of imminent delivery also alleviates stress, thereby improving the likelihood of an uncomplicated pregnancy69.

We hypothesized that the proximity of amniotic fluid to the fetus and gestational tissues makes it an ideal source to explore the transcriptomic perturbations preceding delivery. Herein we have characterized for the first time AF cell-free transcriptomic changes and identified placental single-cell RNA signatures in women who present with an episode of preterm labor. These data could inform the development of biomarkers in subsequent studies based on minimally invasive samples, such as maternal blood.

The current study included women (n = 38) who had transabdominal amniocentesis performed after an episode of preterm labor, with some delivering on the same day (within 24 h, n = 10) and others delivering later in gestation (n = 28). All women who delivered within 24 h of amniocentesis had a preterm birth, whereas 57% (16/28) of women delivering after 24 h had a preterm birth. Intra-amniotic inflammation (IL-6 ≥ 2600 pg/ml) was diagnosed in 80% (8/10) of women who delivered within 24 h of amniocentesis and in 29% (8/28) of women who gave birth later in gestation. Consistent with this observation, comparison of AF cell-free transcriptomes between the two groups revealed an up-regulation of genes involved in the immune system's innate and adaptive components, including myeloid leukocyte activation, regulation of complement activation, toll-like receptor signaling pathway, natural killer cell-mediated cytotoxicity, B-cell-mediated immunity, and T-cell-mediated cytotoxicity. The expression of gene signatures defining cell-types (myeloid cells, monocytes, T cells, B cells, and natural killer cells) and tissues involved in the immune response also increased in the amniotic fluid of women with imminent preterm delivery. Previous studies have shown that the fetal or maternal origin of the amniotic fluid immune response depends on the gestational age, with fetal origin suggested during early preterm gestation and maternal origin later in pregnancy70,71,72,73.

Genes with lower expression close to delivery were enriched for intracellular biological processes, e.g., cellular macromolecule metabolic process, organelle organization, cytoskeleton organization, cell cycle, and embryonic development. Thus, we observed a shift in the amniotic fluid milieu from one that reflects fetal organ maturation and growth to a pro-inflammatory phenotype induced by the stimulation of a maternal and/or fetal immune response as delivery approached. The activation of the immune response was accompanied by the increased expression of genes coding for pro-inflammatory cytokines (e.g., IL-1α, IL-1β, and IL-6), chemokines (e.g., CCL20, CXCL5, CCL5, and CXCL8), matrix metalloproteases (e.g., MMP1, MMP8, and MMP9), nuclear factor kappa B (NFKB1 and NFKB2), and prostaglandin-endoperoxide synthase 2, all of which have been implicated in the pathogenesis of the preterm parturition syndrome18,74. Taken together, these results suggest that the activation of fetal and maternal immune responses acts as the trigger for preterm parturition in women who delivered within 24 h of amniocentesis. This response may be triggered when microorganisms invade the amniotic cavity (intra-amniotic infection/inflammation) or even in the absence of detectable microbes (sterile intra-amniotic inflammation)19. We did not apply the molecular microbiological techniques that reliably discriminate between these two conditions75. However, a previous report suggests that sterile intra-amniotic inflammation is more prevalent than intra-amniotic infection/inflammation in preterm deliveries19. Sterile intra-amniotic inflammation is thought to be initiated by danger signals, or alarmins, derived from cellular stress or necrotic release of intracellular matter into the intra-amniotic space20,76,77,78,79,80. This process involves the activation of the NLRP3 inflammasome22,23,81,82,83,84,85,86. Indeed, we observed an increased expression of the putative activators, components, and effectors of the inflammasome, such as IL-1α, baculoviral IAP repeat containing 3, guanylate binding protein 5, NLRP3, caspase-1, and IL-1β84,87,88.

In the current study, we also interrogated the second-trimester AF cell-free transcriptome to identify predictors of time-to-delivery after amniocentesis and to evaluate their predictive performance in risk stratification of women who present with symptoms of preterm labor. Previously, Ngo et al.42 developed a transcriptomic model of time-to-delivery based on longitudinal maternal blood cell-free RNA, and reported that a term-delivery-based model does not predict gestational age at delivery in women with preterm birth (RMSE, 11.4 weeks). Herein, we used the AF cell-free transcriptomic data from women with preterm labor to train robust machine-learning models to estimate the interval from amniocentesis to delivery. The cross-validated transcriptomic models showed a significant prediction (Spearman’s correlation 0.49, RMSE 3.1 weeks). When the continuous time-to-delivery estimates from the transcriptomic model were translated into binary predictions of delivery within 24 h, 1 week, and 2 weeks, the risk of imminent delivery was predicted with an AUROC of 0.81, 0.74, and 0.72, respectively. These results point to the AF cell-free transcriptome's potential to predict pregnancy duration after preterm labor.

A parsimonious set of 53 genes was reliably retained as predictors during cross-validation. These genes were up-regulated at delivery, and functional analysis identified biological processes related to an immune/inflammatory response to a stimulus. Biomarkers targeting these processes were identified in predictive models based on a whole-blood transcriptome42,89,90. The pro-inflammatory cytokines, IL-1β, CXCL8, CCL3, CCL4, and CCL20, were among the strongest predictors selected in all iterations of leave-one-out cross-validation. Several studies have shown an increase in amniotic fluid concentrations of proteins encoded by these genes in women with preterm labor due to intra-amniotic inflammation/infection20,91,92,93,94. A causal role in preterm birth has been established for IL-1β in animal models15,95. We have reported on the ability of amniotic fluid concentrations of these cytokines to predict the risk of early preterm delivery94. Other strong predictors included pleckstrin (PLEK), B-cell lymphoma 2-related protein A1 (BCL2A1), Solute Carrier Family 34 Member 2 (SLC34A2), Aquaporin 9 (AQP9), and TNFAIP6. Upon phosphorylation by protein kinase C, PLEK increases cytokine secretion in phagocytes and contributes as an adaptor to the microbicidal activity in neutrophils96,97. BCL2A1 is an anti-apoptotic protein shown to prolong chorio-decidual neutrophil survival in preterm rhesus macaques in an IL-1-dependent manner98. The SLC34A2 gene is expressed in alveolar type II cells of the fetal lung and may be involved in the cellular uptake of phosphate to produce surfactants99. However, the expression of genes coding for surfactant proteins was not significantly different between women delivering within 24 h of amniocentesis and women who delivered after 24 h, indicating that fetal lung maturity may not be related to imminent delivery in this study. Instead, the overexpression of SLC34A2, as well as AQP9, may reflect the metabolic adaptations needed to sustain the immune response100.

Of note, although the current study does not have immediate and direct consequences for the management of spontaneous preterm labor, it provides an RNA-level signature that could be correlated in the future with maternal blood omics data, allowing for the development of non-invasive approaches to risk assessment. Similar to the use of amniotic fluid IL-6 concentrations101,102,103, patients considered at high risk of imminent premature delivery could benefit from currently available interventions. These include the administration of antenatal corticosteroids to accelerate fetal lung maturity66,104 and tocolytic treatment to inhibit myometrial contractions and eventually allow in utero transfer to tertiary neonatal intensive care units, which can provide better care to prematurely born neonates67,68,104,105.

Strengths and limitations

This study is the first to describe the cell-free transcriptome of AF in women with spontaneous preterm labor. The main limitation of this study is that no additional targeted validation studies were performed. However, our previously reported in-silico analysis56, based on the same sample collection, RNA extraction, and expression quantification in the same population, demonstrated that gestational age-specific effects in the transcriptome strongly correlate with independent reports based on samples from women who were not in labor and whose ethnic backgrounds differed from those of the women studied herein, most of whom self-identified as African-American. The moderate sample size and cross-validation strategy enabled the robust evaluation of predictive analytic approaches and the identification of a parsimonious set of candidate cell-free mRNAs.

Conclusion

The changes in the AF cell-free transcriptome in women who delivered within 24 h of amniocentesis compared to those who delivered later in gestation indicated the establishment of an inflammatory milieu in the intra-amniotic space in response to a pathologic stimulus. Placental single-cell-specific gene signatures of critical immune cell types were also dysregulated in these women. These effects are critical components captured by the transcriptomic models predicting the duration of gestation after transabdominal amniocentesis.

Materials and methods

Study design

Pregnant women were enrolled into a prospective longitudinal study at the Center for Advanced Obstetrical Care and Research of the Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institutes of Health, U.S. Department of Health and Human Services, in the Detroit Medical Center and Wayne State University. We designed a retrospective cross-sectional study from this cohort to include women who underwent transabdominal amniocentesis after an episode of preterm labor. We excluded cases with multiple gestations and genetic anomalies. The final dataset included 38 AF samples from 12 women who went on to deliver at term and 26 women who delivered preterm. Of the 38 women, 37 delivered after the spontaneous onset of labor, with augmentation of labor required in 7 women. Labor was only induced in one term pregnancy (40.7 weeks) more than 8 weeks after amniocentesis. There were no cases of preterm prelabor rupture of the membranes. All women provided written consent for the use of biological specimens and metadata in research prior to sample collection. The Institutional Review Boards of Wayne State University and NICHD approved the study protocol.

Clinical definitions

Gestational age was determined based on the date of the last menstrual period and a first or early second-trimester ultrasound examination. Term labor was defined as the presence of regular uterine contractions with a frequency of at least one every 10 min and cervical changes occurring after 37 weeks of gestation106. Spontaneous preterm labor was defined as the spontaneous onset of labor with intact membranes before 37 weeks of gestation.

Amniotic fluid samples

Obstetricians used a 22-gauge needle to withdraw AF transabdominally under ultrasound guidance and antiseptic conditions. Amniotic fluid was immediately transported to the research laboratory in a capped sterile syringe. Amniotic fluid was centrifuged at 350 × g, and the supernatant (5 ml) was immediately stored at −80 °C46.

RNA extraction

RNA was extracted from the AF samples with the Plasma/Serum RNA Purification Maxi Kit (Cat. No. 56200; Norgen Biotek, Thorold, ON, Canada) by following the manufacturer's protocol that included the optional DNase treatment. We applied the RNA Clean & Concentrator-5 Kit (Cat. No. R1015; Zymo Research, Irvine, CA, USA) to improve RNA quality and to concentrate each sample in a volume of 12 μl. RNA concentration and quality were assessed by using a DropSense spectrophotometer (Trinean NV, Gent, Belgium) and the Agilent 2200 Tapestation (Agilent Technologies, Santa Clara, CA, USA), respectively.

Transcriptome profiling

Following the manufacturer’s protocol, we reverse-transcribed and amplified 10 ng of RNA to cDNA by using the GeneChip WT Pico Reagent Kit (Affymetrix, Santa Clara, CA, USA). Using the same kit, fragmentation and labeling of 5.5 μg sense-stranded cDNA were performed. Hybridization of 200 μl of the hybridization cocktail was performed onto the GeneChip Human Transcriptome Array 2.0 (Affymetrix, Santa Clara, CA, USA) at 45 °C, 60 rpm for 16 h in a hybridization oven. Washing and staining of the arrays were done on a GeneChip Fluidics Station 450 (Affymetrix, Santa Clara, CA, USA) and scanning was performed with a GeneChip Scanner 3000 (Affymetrix, Santa Clara, CA, USA). GeneChip Command Console Software (Affymetrix, Santa Clara, CA, USA) was used to generate the raw intensities from the array images. The University of Michigan Advanced Genomics Core conducted microarray profiling.

Data analysis

Clinical characteristics

Continuous and categorical variables were compared between groups by using Welch's t test107 and Fisher's exact test108, respectively. A nominal p value < 0.05 was considered statistically significant.
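The two group comparisons named above can be sketched in a few lines. The following is a minimal illustration in Python, not the code used in this study; the function names are hypothetical, and the p value for Welch's statistic (which requires the t distribution) is deliberately left out.

```python
from math import comb, sqrt
from statistics import mean, variance


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x events in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

As a usage example, the tocolytic comparison in Table 1 (40% vs. 46.4%) would correspond to `fisher_exact_two_sided(4, 6, 13, 15)` if the underlying counts are 4 of 10 and 13 of 28; these counts are back-calculated from the reported percentages and are an assumption.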

Microarray data preprocessing

The Robust Multi-array Average109 method implemented in the R/Bioconductor's oligo package110 was applied to background correct, normalize, and summarize the raw probe-level microarray data into gene-level expression summaries based on probe-to-gene assignment provided by a custom chip definition file111,112. Since samples were profiled in multiple batches, we used the removeBatchEffect function of R/Bioconductor's limma package113 to correct batch effects. To assess sources of variability in the transcriptomic data, we conducted principal component analysis on the complete set of 32,907 genes.

Differential expression analysis

We compared the AF cell-free transcriptome between groups by fitting linear models implemented in the limma113 package. A minimum fold change of 1.25 and a false discovery rate-adjusted p value (q value) < 0.1 were required for statistical significance. We summarized the results of differential expression analysis with volcano plots and heatmaps. A hypergeometric test implemented in the GOstats package114 was used to identify significantly enriched Gene Ontology115 biological processes, molecular functions, and cellular components among the differentially expressed genes (q value < 0.05).
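The significance rule described above combines a fold-change cutoff with a q-value cutoff. A minimal Python sketch of that rule is given below. It assumes that the false discovery rate adjustment is the Benjamini-Hochberg procedure (the text does not name the method), and it takes per-gene log2 fold changes and p values as given rather than re-deriving them from moderated linear models. The function names are hypothetical.

```python
from math import log2


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of the q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def call_de(log2_fc, p_values, min_fold=1.25, max_q=0.1):
    """Label each gene 'up', 'down', or None using the fold-change and q-value cutoffs."""
    threshold = log2(min_fold)
    q = benjamini_hochberg(p_values)
    calls = []
    for lfc, qv in zip(log2_fc, q):
        if qv < max_q and abs(lfc) > threshold:
            calls.append("up" if lfc > 0 else "down")
        else:
            calls.append(None)
    return calls
```

Because the cutoff is applied on the log2 scale, a fold change of 1.25 corresponds to an absolute log2 fold change of about 0.32.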

Tissue-specific and single-cell-specific expressions

Tissue-specific genes were defined as those having a median expression 30 times higher in a given tissue than all other tissues in the GNF Gene Expression Atlas63. Genes specific to a placental cell type were defined based on the single-cell RNA-Seq analyses61.

The log2 transformed expression values for each gene were standardized by subtracting the mean and dividing by the standard deviation calculated from the reference study group (term delivery)56. The standardized values referred to as Z scores were then averaged over the top 20 genes specific to a tissue or a single-cell type. We compared the average Z scores between groups by fitting a linear model. A q value of less than 0.05 was considered significant.
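The signature score described above can be illustrated with a short Python sketch. It assumes a nested dictionary of log2 expression values (gene, then sample), which is a hypothetical data layout chosen for readability, and it takes the list of signature genes (e.g., the top 20 for a tissue) as an input rather than deriving it.

```python
from statistics import mean, stdev


def signature_scores(expr, signature_genes, reference_samples):
    """Average reference-standardized Z scores over a gene signature.

    expr: dict mapping gene -> dict mapping sample -> log2 expression.
    Returns a dict mapping sample -> mean Z score across the signature genes.
    """
    samples = list(next(iter(expr.values())).keys())
    z_by_gene = []
    for gene in signature_genes:
        values = expr[gene]
        # Mean and standard deviation come from the reference group only.
        ref = [values[s] for s in reference_samples]
        mu, sd = mean(ref), stdev(ref)
        z_by_gene.append({s: (values[s] - mu) / sd for s in samples})
    return {s: mean(z[s] for z in z_by_gene) for s in samples}
```

The resulting per-sample scores are what would then be compared between the two delivery groups with a linear model.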

Random forest prediction models

Random forest is an ensemble supervised machine-learning algorithm for classification and regression tasks116 that is suitable for high-dimensional data117,118. Unlike other methods, random forests are robust to feature transformation and parameter tuning while being computationally efficient119. Random forests consistently rank among the top performers in studies evaluating and comparing different supervised machine-learning algorithms120. They provide an unbiased measure of predictor variable importance based on out-of-bag samples (observations not included in fitting individual decision trees)116. The R package randomForest121 was used to fit random forest models with 1000 trees.

Feature selection for the random forest models

Before fitting the random forest models, a preliminary multi-variable feature selection procedure was applied based on the sparse Hilbert–Schmidt independence criterion (SHS)122. SHS relies on the kernel-based Hilbert–Schmidt independence criterion (HSIC) to measure the relationship between the gene expression data and the response. By introducing penalties for the number of selected features, SHS chooses a parsimonious set of genes that maximizes the dependence with the response variable while taking into account the correlation between genes. We used the algorithm implementation provided by the authors of SHS to perform this analysis, using MATLAB (version R2018a).
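To make the dependence measure underlying SHS concrete, the sketch below computes the standard biased empirical HSIC, trace(KHLH)/(n − 1)², between one gene's expression and the response. It is only an illustration of the HSIC quantity under assumed Gaussian kernels with a fixed bandwidth; it does not implement the sparsity penalties or the optimization of the SHS algorithm, and the function names and kernel choices are assumptions rather than the settings used in this study.

```python
from math import exp


def gaussian_kernel_matrix(values, sigma=1.0):
    """Gaussian (RBF) kernel matrix for a list of scalar observations."""
    return [[exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values] for a in values]


def center(K):
    """Double-center a kernel matrix, i.e. compute H K H with H = I - 11^T / n."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


def hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    Kc = center(gaussian_kernel_matrix(x, sigma_x))
    L = gaussian_kernel_matrix(y, sigma_y)
    # trace(Kc L) equals the sum of elementwise products because L is symmetric.
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2
```

A value near zero indicates no detectable dependence (for example, when the response is constant), whereas larger values indicate stronger dependence, linear or not.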

Prediction performance assessment

We used leave-one-out cross-validation to evaluate the predictive performance of the transcriptomic models of time from amniocentesis to delivery. At each iteration in the cross-validation, both feature selection and random forest model fitting were performed. Prediction performance metrics included Spearman’s correlation coefficient and root-mean-square error (RMSE). We also calculated the AUROC by using the predicted time-to-delivery as a surrogate of the inverse of the risk of delivery within 24 h, 1 week, and 2 weeks after amniocentesis.
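The evaluation scheme above can be sketched in Python as follows. The key point is that feature selection is repeated inside every leave-one-out iteration, so the held-out sample never influences which genes are chosen. The sketch uses loudly simplified stand-ins: a correlation filter replaces SHS and a nearest-neighbor average replaces the random forest. All function names and default parameters are hypothetical.

```python
from math import sqrt
from statistics import mean


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def rmse(pred, actual):
    return sqrt(mean((p - a) ** 2 for p, a in zip(pred, actual)))


def auroc(scores, labels):
    """Probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def imminent_delivery_auroc(pred_weeks, actual_weeks, window_weeks):
    """Use the negated predicted interval as the risk score for delivery within a window."""
    labels = [a <= window_weeks for a in actual_weeks]
    return auroc([-p for p in pred_weeks], labels)


def loocv_predict(X, y, n_features=2, n_neighbors=3):
    """Leave-one-out predictions; feature selection is redone inside every fold."""
    preds, selected = [], []
    for held_out in range(len(y)):
        train = [i for i in range(len(y)) if i != held_out]
        y_train = [y[i] for i in train]
        # Stand-in filter: rank genes by |Pearson r| with the outcome, training data only.
        score = [abs(pearson([X[i][g] for i in train], y_train)) for g in range(len(X[0]))]
        genes = sorted(range(len(score)), key=lambda g: -score[g])[:n_features]
        # Stand-in regressor: mean outcome of the nearest training samples.
        dist = sorted((sum((X[i][g] - X[held_out][g]) ** 2 for g in genes), y[i])
                      for i in train)
        preds.append(mean(d[1] for d in dist[:n_neighbors]))
        selected.append(set(genes))
    return preds, selected
```

Negating the predicted interval reflects the idea that a shorter predicted time-to-delivery corresponds to a higher risk of delivery within the chosen window.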

To retain a final set of the most informative genes for the prediction of time-to-delivery, we employed the strategy of Kursa 201464 based on the frequency of gene selection across different partitions of the data. Significantly enriched Gene Ontology biological processes were identified among the most predictive genes by using the GOstats114 package.
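The stability of the selected genes across cross-validation iterations can be summarized with simple set statistics. The Python sketch below computes the mean pairwise Jaccard similarity between the gene sets selected in each iteration and the per-gene selection frequency on which a frequency-based strategy relies. It does not implement the significance test of the cited strategy, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity (intersection size over union size) between two gene sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(gene_sets):
    """Average Jaccard similarity over all pairs of selected gene sets."""
    pairs = list(combinations(gene_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def selection_frequency(gene_sets):
    """Fraction of cross-validation iterations in which each gene was selected."""
    counts = Counter(g for s in gene_sets for g in s)
    return {g: c / len(gene_sets) for g, c in counts.items()}
```

A gene with a selection frequency of 1.0 was chosen in every iteration, which is the property reported for 23 of the predictor genes in this study.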

The network of protein–protein interactions

To gain insight into the relations among genes predictive of time-to-delivery in AF samples, we constructed a protein–protein interaction network representation of these genes by using the stringApp 1.5.0123 in Cytoscape 3.7.2124. An edge between two nodes (genes) was defined based on a protein–protein interaction confidence score > 0.4. The network's most inter-connected subnetwork was retained, and genes were annotated to significantly enriched biological processes.
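The network construction rule above (keep interactions with a confidence score above 0.4, then retain the most inter-connected subnetwork) can be sketched in Python. The sketch assumes that "most inter-connected subnetwork" means the largest connected component, which is one reasonable reading, and it operates on a plain list of scored edges rather than on the stringApp or Cytoscape interfaces. The function name is hypothetical.

```python
from collections import defaultdict, deque


def largest_connected_subnetwork(scored_edges, min_score=0.4):
    """Keep edges with score > min_score and return the largest connected component."""
    graph = defaultdict(set)
    for a, b, score in scored_edges:
        if score > min_score:
            graph[a].add(b)
            graph[b].add(a)
    best, seen = set(), set()
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from the start node.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        if len(component) > len(best):
            best = component
    return best
```

Genes whose only interactions fall at or below the score threshold drop out of the network entirely, which is why the final network can contain fewer nodes than the list of predictor genes.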