Main

Coronavirus disease 2019 (COVID-19) continues to ravage the global community. Across age groups, patients over the age of 65 years remain at the highest risk of severe complications of COVID-19 and account for about 80% of COVID-19-related deaths (https://data.cdc.gov). The human lung, a multi-compartmental dendritic structure composed of highly specialized cell types and infiltrated immune cells1, is the most vulnerable tissue in COVID-19 (refs. 2,3,4,5). In patients with severe disease, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus propagates in the lower respiratory tract to cause hypoxaemia and severe pneumonia, accompanied by acute respiratory distress syndrome5. Through histological examinations, we and others have identified features of bilateral diffuse alveolar damage, signs of respiratory inflammation, hyaline membrane formation, microvascular complications and fibrosis in the lungs of individuals who have died of COVID-19 (refs. 2,4). However, to develop urgently needed and effective targeted symptomatic therapies, we need a comprehensive cellular and molecular profile of lung pathology of patients with COVID-19 (refs. 1,6).

Here, we integrated transcriptomics, proteomics and histopathological technologies to construct a comprehensive multi-omics and single-nucleus transcriptomic atlas of the lungs of patients with COVID-19. In a series of verification experiments, our data link COVID-19-associated changes in lung phenotypes and functions with cell type-specific gene expression changes, and we resolve molecular mechanisms of enhanced senescence, inflammation, apoptosis, coagulation and fibrosis. Overall, our work deepens our understanding of the COVID-19-diseased human lung and provides avenues for developing strategies for the treatment of symptoms.

Results

Transcriptomic and proteomic analyses of COVID-19 pathology

We obtained post-mortem lung-tissue specimens from patients with COVID-19 (mean age of 66 years; Fig. 1a and Supplementary Table 1). We also obtained age-matched normal lung samples from a cohort without any known history of infectious diseases as controls (Control; Fig. 1a and Supplementary Table 1). Following histopathological examination of SARS-CoV-2-infected lung parenchyma, we found a spectrum of diffuse alveolar damage characterized by desquamation of the alveolar epithelium and mucous-plug formation (Fig. 1b, Extended Data Fig. 1a,b and Supplementary Table 1). We also detected vascular injury with disseminated intravascular coagulation (Fig. 1b). In addition, fulminant intra-alveolar macrophage infiltration, severe pulmonary fibrosis and increased apoptosis were evident (Fig. 1b and Extended Data Fig. 1c).

Fig. 1: Integrated analysis of COVID-19 pulmonary pathology.

a, Study overview. Information on the patients with COVID-19 and the pathological processes (left). Schematic of the experimental process of the analysis of five COVID-19 and four normal (Control) lung tissues by bulk RNA-seq, liquid chromatography with tandem mass spectrometry (LC-MS/MS), snRNA-seq and experimental verification (right). b, Representative images of haematoxylin-and-eosin staining of lung sections from patients with COVID-19 (n = 5 lungs with samples from three lung lobes each). Scale bars, 400 μm (main image; centre top) and 50 μm (magnified images). c, Principal component (PC) analysis showing the differences between the lungs of Controls and patients with COVID-19 based on the expression patterns from the RNA-seq (top) and LC-MS/MS analyses (bottom). d, Spearman’s correlation analysis of the expression levels of overlapping genes and proteins that were differentially expressed between the COVID-19 and Control lungs (|log2(fold change, FC)| > 1.5, adjusted P < 0.05). Linear fitting is indicated by a black line with confidence intervals represented in grey shading. e, GO term and pathway enrichment analysis of overlaps between COVID-19 DEGs and DEPs. f, Network plot showing upregulated transcription factors identified by Ingenuity Pathway Analysis of the bulk RNA-seq. The small and large node sizes indicate low and high numbers of target genes, respectively. g, Gene-set scores of UPR pathway- and apoptosis-related genes at the RNA (top) and protein (bottom) level. h, Gene-set scores of inflammation response-related genes at RNA and protein levels (left). Heatmap showing inflammation-related genes upregulated at both the RNA and protein level (right). g,h, Control, n = 4 lungs; and COVID-19, n = 5 lungs, with samples from three lung lobes each. The boxes show the median (centre line) and the quartile range (25–75%), and the whiskers extend from the quartile to the minimum and maximum values. P values are indicated; Wilcoxon signed-rank test.
i, Venn plot showing the DEPs that are common to the lungs and sera of patients with COVID-19. The upregulated proteins are listed (right).

To explore the molecular mechanisms of COVID-19 pneumonia, we performed genome-wide RNA sequencing (RNA-seq) and data-independent acquisition (DIA) mass spectrometry-based proteomics analyses on lung samples from five COVID-19 cases and four age-matched Control individuals (Fig. 1a,c and Supplementary Table 1). We identified 3,972 differentially expressed genes (DEGs) and 2,299 differentially expressed proteins (DEPs) in the COVID-19 group (Extended Data Fig. 1d and Supplementary Table 2), which largely overlapped (Fig. 1d).
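The thresholding and correlation steps named in the Fig. 1d legend (|log2(FC)| > 1.5, adjusted P < 0.05, Spearman's correlation of overlapping features) can be illustrated with a minimal sketch. It is not the pipeline used in the study: the feature names and values are hypothetical placeholders, the helper names are assumptions, and the rank correlation is written out by hand rather than taken from a statistics package.

```python
from typing import Dict, List, NamedTuple, Set


class Stat(NamedTuple):
    log2_fc: float  # log2(fold change), COVID-19 versus Control
    adj_p: float    # multiple-testing-adjusted P value


def significant(stats: Dict[str, Stat],
                min_abs_log2_fc: float = 1.5,
                max_adj_p: float = 0.05) -> Set[str]:
    """Features passing both the fold-change and adjusted-P thresholds."""
    return {name for name, s in stats.items()
            if abs(s.log2_fc) > min_abs_log2_fc and s.adj_p < max_adj_p}


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank vectors.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy values, not data from the study.
rna = {"GENE_A": Stat(3.0, 0.001), "GENE_B": Stat(2.0, 0.01),
       "GENE_C": Stat(-2.5, 0.001), "GENE_D": Stat(0.2, 0.9),
       "GENE_E": Stat(-2.0, 0.2), "GENE_F": Stat(1.9, 0.03)}
protein = {"GENE_A": Stat(2.5, 0.001), "GENE_B": Stat(1.6, 0.02),
           "GENE_C": Stat(-1.8, 0.01), "GENE_D": Stat(0.1, 0.7),
           "GENE_E": Stat(1.2, 0.01)}

degs = significant(rna)        # differentially expressed genes
deps = significant(protein)    # differentially expressed proteins
overlap = sorted(degs & deps)  # features significant at both levels
rho = spearman([rna[g].log2_fc for g in overlap],
               [protein[g].log2_fc for g in overlap])
```

In this toy input, GENE_E fails the adjusted-P threshold at the RNA level and the fold-change threshold at the protein level, and GENE_F has no protein measurement, so neither enters the correlation.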

Consistent with lung deformation and dysfunction caused by SARS-CoV-2 infection, we identified downregulated cytoskeleton organization and cell-junction assembly (Fig. 1e and Extended Data Fig. 1e–g), and upregulated apoptosis, viral gene expression, ribosome and endoplasmic-reticulum localization pathways (Fig. 1e and Extended Data Fig. 1e–g) through functional annotation of overlapping DEGs and DEPs by Gene Ontology (GO) term analysis. As determined by transcription factor network analysis, upregulated DEGs primarily coalesced into activation of the unfolded protein response (UPR; for example, ATF4 and DDIT3) and apoptosis induction (Fig. 1f,g, Extended Data Fig. 1h,i and Supplementary Table 3). Together with immunohistological staining results showing spike protein expression and an increased number of apoptotic cells in the lungs of patients with COVID-19 (Extended Data Fig. 1a,c), these findings suggest that SARS-CoV-2 infection triggers the UPR and cell apoptosis.

Interactions between SARS-CoV-2 and host tissues induce inflammation. Accordingly, we found that genes and proteins related to leukocyte degranulation and migration (for example, RELB) were dramatically upregulated in COVID-19 pulmonary parenchyma (Fig. 1e,f and Extended Data Fig. 1e,f). Among the most-upregulated genes at both the transcriptional and translational levels were C-reactive protein (CRP; Fig. 1h)7 and serum amyloid A (SAA1; Fig. 1h)8, both predictive COVID-19 markers. Furthermore, inflammation-related genes in the S100 family and serpin serine protease inhibitors increased at both the RNA and protein levels (Fig. 1h). By jointly analysing the upregulated protein profiles in lung tissues and sera9, we found that CRP, SAA1 and serpin serine protease inhibitors (SERPINA3, SERPING1 and SERPINA1) also accumulated in the sera of patients with severe COVID-19 (Fig. 1i), indicative of their potential use as markers of the degree of COVID-19-related lung damage.

A single-cell transcriptional atlas of the lungs of patients with COVID-19

To achieve high-resolution molecular profiling of COVID-19 pneumonic pathology, we performed single-nucleus RNA-seq (snRNA-seq) on the lungs of patients with COVID-19 and the aforementioned Control group (Fig. 1a). After quality control and cluster annotation (Extended Data Fig. 1j–n), we identified 28 cell types based on signature genes (Fig. 2a,b and Extended Data Fig. 1o,p), which were classified into four major cell types: epithelial, endothelial, stromal and immune cells (Fig. 2b, Extended Data Fig. 2a–f and Supplementary Table 4). Consistent with previous reports10,11, our data revealed that ACE2 and TMPRSS2 were primarily expressed in epithelial cells, such as type I and II alveolar pneumocytes (AT1 and AT2, respectively), which are also the main SARS-CoV-2 target cell types (Fig. 2c). Notably, increased numbers of epithelial cells positive for ACE2 and/or TMPRSS2 were detected in the COVID-19 samples (Fig. 2d), suggesting that SARS-CoV-2 increases the infectivity in patients with COVID-19 by eliciting positive-feedback effects12.
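How a fraction of ACE2- and TMPRSS2-positive cells might be tallied from per-nucleus counts is sketched below. The sketch assumes that a cell is called positive for a gene when its count is greater than zero, which is an illustrative choice and not a threshold stated here; the function name and the toy counts are hypothetical.

```python
from typing import Dict, Iterable, List

Cell = Dict[str, float]  # gene symbol -> expression count in one nucleus


def positive_fraction(cells: List[Cell], genes: Iterable[str]) -> float:
    """Fraction of cells with a non-zero count for every gene in `genes`."""
    required = list(genes)
    if not cells:
        return 0.0
    positive = sum(
        1 for cell in cells
        if all(cell.get(gene, 0) > 0 for gene in required)
    )
    return positive / len(cells)


# Hypothetical toy counts, not data from the study.
control_cells = [
    {"ACE2": 0, "TMPRSS2": 2},
    {"ACE2": 1, "TMPRSS2": 1},
    {"ACE2": 0, "TMPRSS2": 0},
    {"ACE2": 0, "TMPRSS2": 0},
]
covid_cells = [
    {"ACE2": 2, "TMPRSS2": 1},
    {"ACE2": 1, "TMPRSS2": 3},
    {"ACE2": 0, "TMPRSS2": 1},
    {"ACE2": 1, "TMPRSS2": 0},
]

control_double = positive_fraction(control_cells, ["ACE2", "TMPRSS2"])
covid_double = positive_fraction(covid_cells, ["ACE2", "TMPRSS2"])
covid_ace2_only_gene = positive_fraction(covid_cells, ["ACE2"])
```

Passing a single gene gives the single-positive fraction, and passing both gives the double-positive fraction, so the same helper covers the "and/or" comparisons.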

Fig. 2: Construction of single-nucleus atlases of the human lung by snRNA-seq.

a, Human lung single-nucleus transcriptional atlas. Uniform manifold approximation and projection (UMAP) plots showing different cell types by snRNA-seq (left). Table of abbreviations used in all panels for the different cell types (right). b, Heatmap showing the gene expression signatures of the top 30 marker genes corresponding to each cell type in human lungs (left). Each column represents a cell type, each row indicates the expression of one gene and each colour represents one cell type as illustrated in a. Representative marker genes for each cell type are shown (right). c, UMAP plots showing ACE2 and TMPRSS2 double positive cells with positivity for FURIN, CTSB or CTSL, other ACE2 and TMPRSS2 double positive cells, other ACE2+ cells and ACE2− cells in the human lung atlas. Bottom panels show the distribution of cells from Control and COVID-19 groups. d, Cell-type (colour coded as per the legend in a) composition of cells that express genes associated with SARS-CoV-2 entry and their percentages in the lungs of the two groups, COVID-19 and Control (top). Percentages of different cell types expressing the corresponding SARS-CoV-2 entry-associated genes (bottom).

Molecular characterization of COVID-19 pathology

To further investigate the cell type-specific transcriptional characteristics of pathogenesis in the lungs of patients with COVID-19, we compared DEGs between COVID-19 and Control groups by cell type (hereafter referred to as COVID-19 DEGs; Fig. 3a,b and Extended Data Fig. 2g–i). We identified a total of 3,264 COVID-19 DEGs across 28 cell types (Extended Data Fig. 2g) and found that COVID-19 DEGs were most prevalent in myofibroblasts, alveolar fibroblasts, aerocyte capillary endothelial cells (Cap.EC.a) and AT1 (Extended Data Fig. 2g). The DEG numbers in the 28 cell types were not significantly associated with the post-mortem interval (Extended Data Fig. 2h,i). Using modularized analysis of the COVID-19 DEGs, we deconvoluted gene expression perturbations to various cell types (Fig. 3a,b and Supplementary Table 4) and identified six functional upregulated DEG modules (Fig. 3b). These were: Module 1, ubiquitin-dependent protein catabolic process (commonly upregulated pathways); Module 2, diseases associated with surfactant metabolism and lung fibrosis (mainly attributed to epithelial cells); Module 3, angiogenesis (mainly attributed to endothelial cells); Module 4, extracellular-matrix organization (mainly attributed to stromal cells; Fig. 3b); Module 5, which was mainly enriched in myeloid cells (macrophages, monocytes and mast cells), as reflected by upregulated gene expression associated with myeloid leukocyte activation; and Module 6, which was mainly enriched in lymphocytes, as reflected by interferon type I signalling pathways and SARS-CoV infection (Fig. 3a,b). We also identified six downregulated DEG modules, generally linked with tissue morphogenesis, structural integrity and homeostasis, including Module 7 (mainly attributed to epithelial cells) and Module 12 (mainly attributed to immune cells; Fig. 3a,b).
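The Fig. 3a legend states that DEGs were clustered into 12 modules by k-means analysis. As a minimal sketch of that kind of grouping, the example below runs Lloyd's k-means algorithm on per-gene fold-change profiles across four cell classes. The gene names, profile values, number of clusters (two) and fixed starting centroids are all hypothetical and chosen only to keep the example deterministic; the software and parameters used in the study are not specified here.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Dict[str, Sequence[float]],
           init_centroids: List[Sequence[float]],
           max_iter: int = 100) -> Dict[str, int]:
    """Lloyd's algorithm: returns a module index for every gene."""
    centroids = [list(c) for c in init_centroids]
    assignment: Dict[str, int] = {}
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_assignment = {
            name: min(range(len(centroids)),
                      key=lambda k: squared_distance(vec, centroids[k]))
            for name, vec in points.items()
        }
        if new_assignment == assignment:
            break  # converged: no gene changed module
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [points[n] for n, a in assignment.items() if a == k]
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignment


# Hypothetical log fold changes per gene, ordered as
# [epithelial, endothelial, stromal, immune]; not data from the study.
profiles = {
    "GENE_A": [1.2, 0.1, 0.0, 0.1],
    "GENE_B": [1.0, 0.0, 0.1, 0.0],
    "GENE_C": [0.0, 0.1, 0.0, 1.1],
    "GENE_D": [0.1, 0.0, 0.1, 0.9],
}

modules = kmeans(profiles,
                 init_centroids=[profiles["GENE_A"], profiles["GENE_C"]])
```

Genes with an epithelial-dominated profile fall into one module and genes with an immune-dominated profile into the other, which mirrors how modules can be attributed to cell classes.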

Fig. 3: Transcriptional characteristics of the lungs of patients with COVID-19.

a, Heatmap showing DEGs (|logFC| > 0.5, adjusted P < 0.05), which were clustered into 12 modules by k-means analysis, across different cell types in the lungs of the COVID-19 and Control groups. Modules 1–6 indicate the specific upregulated modules in the lungs of patients with COVID-19 (top) and modules 7–12 indicate the specific upregulated modules in Control lungs (bottom). Dotted lines represent different cell types (e.g. epithelial, endothelial, stromal or immune cells). b, GO term-enrichment analysis of DEGs from different modules as shown in a. Pathways of upregulated and downregulated DEGs are indicated in red and blue, respectively. c, Transcriptional network showing the core transcription factors identified based on COVID-19 DEGs using SCENIC. The outer nodes represent different cell types and the sizes of the outer nodes indicate the number of target genes involved in this cell type. The inner nodes represent upregulated and downregulated transcription factors. The intensities of the colours indicate the number of target genes regulated by these transcription factors. d, Violin plots showing the expression levels of NFKB1, HIF1A and FOXO3 in the lungs of the two study groups (top). Ridge maps showing the gene-set scores of target genes of the same transcription factors (bottom). Cell-type abbreviations and colour codes as per Fig. 2a.

To discern how changes in cell type-specific gene expression contribute to the RNA profile of the lung as a whole, we jointly analysed DEGs obtained at single-cell and bulk levels (Extended Data Fig. 3a,b). We found that pro-fibrotic pathways (such as collagen-fibril organization and lung fibrosis) dominated in fibroblasts and myofibroblasts (Extended Data Fig. 3a,b). By comparison, cytokine, inflammatory and chemotaxis pathways were activated in endothelial cells (Extended Data Fig. 3a,b). In addition, hypoxia factor HIF1A was associated with hypoxaemia in COVID-19, whereas the pro-inflammatory factors TRAF3 and IFNGR1 were augmented in multiple cell types (Extended Data Fig. 3c,d and Supplementary Table 4).

To dissect the transcriptional regulons underlying COVID-19 pathogenesis, we performed SCENIC analysis to identify candidate transcription factors governing DEGs across cell types. As expected, pro-inflammatory transcription factors (including NFKB1, REL, STAT1, -3, -4 and -5A), as well as the hypoxia regulator HIF1A, the UPR regulator ATF6 and the apoptosis regulator BCL6, were identified as regulatory nodes for upregulated DEGs (Fig. 3c,d, Extended Data Figs. 3e, 4a–c and Supplementary Table 3). Of these, we verified the upregulation of NF-κB1 and HIF-1α in the lungs of patients with COVID-19 by immunohistology (Extended Data Fig. 4d). By contrast, we identified FOXO3, a gene often implicated in tissue morphogenesis and regeneration13,14, as the top transcription regulator for downregulated COVID-19 DEGs (Fig. 3c,d and Extended Data Fig. 4e). These analyses highlight the multi-faceted consequences of COVID-19 and provide a molecular portrait of the pathology of the lungs of patients with COVID-19.

Senescence as a pathological characteristic of the lungs of patients with COVID-19

Previous studies suggest that SARS-CoV-2 infection triggers senescence in the immune system15,16. Here, when compared with samples from Control individuals, we found that p16, p21, IL-6 and p53 were upregulated along with an increase in the DNA oxidation marker 8-hydroxy-2′-deoxyguanosine (8-OHdG) in the lungs of patients with COVID-19 (Fig. 4a,b and Extended Data Fig. 4f,g). In addition, we detected a decrease in lamina-associated polypeptide 2 (LAP2) expression and a trend towards decreased expression of the heterochromatin-associated HP1γ (Fig. 4b and Extended Data Fig. 4f,g) along with an increasing trend of retrotransposable element LINE1-ORF1p expression, indicative of exacerbated lung senescence in patients with COVID-19 (Extended Data Fig. 4h)17,18,19.

Fig. 4: Dissection of the relationships between ageing and COVID-19 DEGs in human lungs.

a, Hallmarks of senescence. Heatmaps (periphery, along the y and x axes) showing the relative expression levels of CDKN2A (left) and CDKN1A (right) in each cell type in the lungs of the COVID-19 and Control-Y groups compared with the Control group. The proportions of CDKN2A- (left) and CDKN1A-positive (right) cells in the lungs of the COVID-19 and Control-Y groups compared with the Control group are shown (bar graphs). b, Immunohistochemical analysis of p21, IL-6, 8-OHdG and LAP2 in the indicated groups. Scale bars, 50 μm (main images; top) and 10 μm (magnified images; bottom). Quantitative data are shown as the mean. Control, n = 13 lungs; COVID-19, n = 5 lungs, with samples from three lung lobes each; and Control-Y, n = 4 lungs. One-tailed Student’s t-test P values are indicated. c, Genes shared by the ageing and COVID-19 DEGs. The proportions of DEGs shared by the ageing and COVID-19 groups are shown for the different cell types. d, GO term and pathway enrichment analyses for genes shared by the ageing and COVID-19 DEGs. e, Ridge map showing the density of distribution of the SASP scores of all of the cells in the Control, COVID-19 and Control-Y lungs (top). The medians of SASP scores of different groups are indicated with vertical lines. Violin and box-and-whisker plot showing the SASP scores of the Control (n = 34,974 cells), COVID-19 (n = 73,116 cells) and Control-Y (n = 33,062 cells) lungs (bottom). The boxes in the box-and-whisker plots show the median (centre line) and the quartile range (25–75%), and the whiskers extend from the quartile to the minimum and maximum values. P values, determined using a Wilcoxon signed-rank test, are indicated. f, Network plot showing the genes that altered concomitantly in Control and COVID-19 lungs and genes in the age-related lung disease database (https://www.disgenet.org/home/).
The white-to-black legend on the left indicates the number of genes (from low to high, respectively) related to the indicated diseases, whereas the white-to-dark brown legend on the right indicates the number of cell types expressing the indicated genes from low to high, respectively. COPD, chronic obstructive pulmonary disease. Cell-type abbreviations as per Fig. 2a.

Given that earlier work highlighted cellular senescence in lung ageing20 and that in this study we detected multiple cellular senescence markers in the lungs of patients with COVID-19 (Fig. 4a,b and Extended Data Fig. 4f–h), we speculated that lung ageing might be aggravated by SARS-CoV-2 infection. Accordingly, we analysed ageing-associated DEGs (ageing DEGs) by comparing the snRNA-seq profiles of Control lung samples from young individuals (Control-Y) with Control samples (Fig. 4c). Next, we performed an integrated analysis and identified 91 genes that were shared by ageing DEGs and COVID-19 DEGs (Fig. 4c and Extended Data Fig. 4i). These DEGs were primarily associated with epithelial and endothelial cells (Fig. 4c), suggesting that these two cell types are more prone to manifest ageing-related changes in SARS-CoV-2-infected lungs. GO analysis showed that the downregulated DEGs were related to tissue morphogenesis, whereas the upregulated DEGs were mainly associated with the inflammatory response, cell chemotaxis and NF-κB signalling (Fig. 4d). Notably, in most cell types, we found ageing-associated senescence-associated secretory phenotype (SASP)-gene expression to be further elevated by COVID-19 (ref. 21), indicating that SARS-CoV-2 infection amplifies the pro-inflammatory microenvironment of the aged lung (Fig. 4e and Extended Data Figs. 4j, 5a–c). We further identified 20 genes that altered concordantly in the lungs of patients with COVID-19 and aged individuals, and genes underlying a variety of ageing-related lung disorders (Fig. 4f), suggesting that COVID-19 augments pulmonary senescence programmes, which in turn contribute to lung ageing and related disorders.
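To make the idea of a per-cell gene-set score such as the SASP score concrete, the sketch below uses one simple definition: the mean expression of the genes in the set minus the mean expression of the remaining genes in the same cell, followed by a comparison of group medians. This definition is an assumption for illustration and may differ from the scoring method used for Fig. 4e; the gene names, values and function name are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Set

Cell = Dict[str, float]  # gene symbol -> normalized expression


def gene_set_score(cell: Cell, gene_set: Set[str]) -> float:
    """Mean expression inside the set minus mean expression outside it.

    Assumes the cell has at least one gene inside and one outside the set.
    """
    inside = [value for gene, value in cell.items() if gene in gene_set]
    outside = [value for gene, value in cell.items() if gene not in gene_set]
    return mean(inside) - mean(outside)


# Hypothetical placeholder genes and values, not data from the study.
sasp_genes = {"SASP_1", "SASP_2"}


def make_cell(s1: float, s2: float) -> Cell:
    """Toy cell with two set genes and two background genes fixed at 1.0."""
    return {"SASP_1": s1, "SASP_2": s2, "BG_1": 1.0, "BG_2": 1.0}


groups: Dict[str, List[Cell]] = {
    "Control-Y": [make_cell(1.0, 1.0), make_cell(1.0, 2.0), make_cell(0.0, 1.0)],
    "Control": [make_cell(2.0, 2.0), make_cell(2.0, 3.0), make_cell(1.0, 2.0)],
    "COVID-19": [make_cell(3.0, 3.0), make_cell(4.0, 3.0), make_cell(2.0, 3.0)],
}

median_scores = {
    name: median(gene_set_score(cell, sasp_genes) for cell in cells)
    for name, cells in groups.items()
}
```

In this toy input the median score rises from Control-Y to Control to COVID-19, which is the ordering that a further elevation of an ageing-associated signature would produce.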

Molecular signatures of epithelial damage in the lungs of patients with COVID-19

AT1 and AT2 cells, which both express ACE2, are the main targets of SARS-CoV-2 infection and their damage drives COVID‐19 pathogenesis2,4. Here we found an alarming disintegration of both the AT1 (P = 0.016) and AT2 (P = 0.19) populations, in line with the alveolar epithelium sloughing that we had defined histologically and confirmed by immunostaining with AT1- and AT2-specific markers (Fig. 5a and Extended Data Fig. 6a).
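As a worked illustration of a donor-level comparison with four Control and five COVID-19 donors (the group sizes given for Fig. 5a), the sketch below computes an exact two-sided P value by enumerating every possible assignment of ranks. It assumes an unpaired rank-sum formulation with no tied values; this is an assumption made for illustration and may differ from the test the authors applied. The function name and the proportions are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import List


def exact_rank_sum_p(x: List[float], y: List[float]) -> float:
    """Exact two-sided rank-sum P value; assumes no tied values."""
    pooled = sorted(x + y)
    rank_of = {value: i + 1 for i, value in enumerate(pooled)}
    n, total = len(x), len(x) + len(y)
    observed = sum(rank_of[value] for value in x)
    expected = n * (total + 1) / 2  # mean rank sum under the null
    deviation = abs(observed - expected)
    # Count rank subsets at least as far from the expectation as observed.
    extreme = sum(
        1 for subset in combinations(range(1, total + 1), n)
        if abs(sum(subset) - expected) >= deviation
    )
    return extreme / comb(total, n)


# Hypothetical AT1 proportions per donor, not data from the study.
control_at1 = [0.21, 0.18, 0.25, 0.19]
covid_at1 = [0.05, 0.08, 0.03, 0.10, 0.07]

p_value = exact_rank_sum_p(control_at1, covid_at1)
```

With complete separation of the two groups, as in this toy input, only 2 of the 126 possible rank assignments are as extreme as the observed one, so the smallest attainable two-sided P value for these group sizes is 2/126, or approximately 0.016.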

Fig. 5: Cellular and molecular features underlying diffuse alveolar damage in the lungs of patients with COVID-19.

a, Proportions of AT1 cells in the lungs of the Control (n = 4 donors) and COVID-19 (n = 5 donors) groups (left). Immunofluorescence analysis of PDPN in the lungs of the Control and COVID-19 groups (middle). Alveolar diagrams on the right showing the shedding of lung alveolar epithelial cells in patients with COVID-19. b, Proportions of cells expressing AT1- or AT2-representative markers (CLIC5, CAV1 and CAV2; and SFTPC, PGC, WIF1 and SFTA3, respectively) in the indicated samples. c, Death scores of AT1 and AT2 cells in Control (AT1, n = 5,918 cells; and AT2, n = 6,576 cells) and COVID-19 (AT1, n = 2,536 cells; and AT2, n = 8,612 cells) lungs. a,c, The boxes in the box-and-whisker plots show the median (centre line) and the quartile range (25-75%), and the whiskers extend from the quartiles to the minimum and maximum values. P values, determined using a Wilcoxon signed-rank test, are indicated. d, Gene-set scores of cell-death and epithelial cell markers in AT1 and AT2 cells in the lungs of the Control (n = 12,494 cells) and COVID-19 (n = 11,148 cells) groups. Each dot represents one cell. e, GO term and pathway enrichment analysis of COVID-19 DEGs in AT1 (left) and AT2 (right) cells. f, Expression levels of regeneration-related genes in the AT2 cells of COVID-19 and Control lungs. RL, right lower lobe; RM, right middle lobe; and LL, left lower lobe. g, Heatmap showing the expression levels of surfactant- and mucoprotein-related genes in Control and COVID-19 samples (left). Alveolar diagrams on the right show the accumulation of mucus in lung alveoli of patients with COVID-19. Cell-type abbreviations as per Fig. 2a. h, Network plot showing core transcription factors (TFs) in the regulation of surfactant-related COVID-19 DEGs in lung epithelial cells. The diamonds indicate the surfactant-related genes and the nodes denote transcription factors. nMotif denotes the number of motifs for each gene by the indicated transcription factor. 
i, Immunofluorescence analysis of MUC5B in the bronchiole and alveolar ducts of Control and COVID-19 lungs. a,i, Control, n = 13 lungs; and COVID-19, n = 5 lungs, with samples from three lung lobes each. Scale bars, 20 μm (main images) and 5 μm (magnified images). Quantitative data are shown as the mean ± s.e.m. Two-tailed t-test P values are indicated.

The remaining AT1 and AT2 cells expressed lower levels of cell type-specific marker genes along with an increased apoptotic rate (Fig. 5b–d). The AT1 population expressed high levels of genes associated with response to cytokine signalling (such as NFKB1, CXCL2, CXCL8 and STAT1, -3 and -4), response to hypoxia (such as HIF1A) and SARS-CoV infections (Fig. 5e and Supplementary Table 4). Similarly, the expression levels of genes involved in the regulation of the defence response (such as C3, CD47 and IFI16) and SARS-CoV infections were elevated in the remaining AT2 cells in the lungs of patients with COVID-19 (Fig. 5e and Supplementary Table 4). Specifically, SARS-CoV-2 infection in AT2 cells led to downregulation of regeneration-related genes critical for lung-injury repair (Fig. 5f). Furthermore, we found that alveolar differentiation intermediate cells (AD.inter), with an elevated damage-associated transient progenitor signature (KRT8 and CLDN4), were an intermediate cell type in a stagnant state during AT2-to-AT1 differentiation (Extended Data Fig. 6b)22. This cell type accumulated up to fivefold in the lungs of patients with COVID-19 relative to the controls (Extended Data Fig. 6c), demarcating dysregulated epithelial cell differentiation in the COVID-19 group6,23.

At the air–liquid interface, pulmonary epithelial cells secrete surfactants to reduce the pulmonary surface tension24. We detected lower levels of SFTPC, SFTA3 and SFTPA1 in different epithelial cell types in the lungs of patients with COVID-19 (Fig. 5g). Concomitant with the loss of surfactants, genes encoding MUC5AC, MUC5B, MUC4 and MUC16 were highly expressed in airway epithelial cells (Fig. 5g and Extended Data Fig. 6d,e), possibly as a result of transcriptional regulation by factors such as NFKB1 and STAT3 (Fig. 5h). In agreement with these molecular changes, we detected mucus-plug formation (which could block the airway), as evidenced by haematoxylin-and-eosin staining and immunostaining in the lungs of patients with COVID-19 (Figs. 1b, 5i and Extended Data Fig. 6e). Together, our data show that SARS-CoV-2 infection is associated with epithelial cell apoptosis and functional decline, manifested as surfactant loss and mucus hypersecretion, hindering gas exchange and causing hypoxaemia in the lung.

Dissection of the immune-cell disorders in the lungs of patients with COVID-19

The combination of dysregulated immune responses and an excessive host defence response is considered to be an important cause of injury to the lungs of patients with COVID-19 (refs. 25,26,27,28,29). However, how immune cells are regulated in the parenchyma of the lung of patients with COVID-19 at the single-cell resolution remains largely unclear. To fill this knowledge gap, we profiled the 16 immune-cell types comprising the immune-cell landscape of the parenchyma of the lungs of patients with COVID-19 (Fig. 6a). Macrophages are known to play a pivotal role in COVID-19 lethality30. Consistent with the fulminant inflammatory infiltration present in the lungs of patients with COVID-19 (Fig. 1b and Extended Data Fig. 6f), the total macrophage population was enriched in the COVID-19 group (Fig. 6b,c and Extended Data Fig. 6f,g), as confirmed by immunostaining (Fig. 6d). To exert inflammation-regulatory roles, macrophages polarize to become either classic pro-inflammatory macrophages (M1) or anti-inflammatory macrophages (M2)31,32. Here we found that both M1 alveolar and M1 interstitial macrophage populations increased proportionally, whereas M2 alveolar macrophages decreased (Fig. 6c and Extended Data Fig. 6h,i), indicating a shift towards the M1 phenotype in response to SARS-CoV-2.
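The M1/M2 ratio reported per macrophage compartment (Fig. 6c) can be computed from per-cell annotations as in the sketch below. The labels follow the abbreviations used in the Fig. 6 legend (AM.M1, AM.M2, IM.M1 and IM.M2); the function name and the label counts are hypothetical.

```python
from collections import Counter
from typing import List


def m1_m2_ratio(labels: List[str], compartment: str) -> float:
    """Ratio of M1 to M2 cells within one compartment ('AM' or 'IM')."""
    counts = Counter(labels)
    m1 = counts[f"{compartment}.M1"]
    m2 = counts[f"{compartment}.M2"]
    if m2 == 0:
        raise ValueError(f"no {compartment}.M2 cells; ratio is undefined")
    return m1 / m2


# Hypothetical cell-type annotations, not data from the study.
control_labels = (["AM.M1"] * 2 + ["AM.M2"] * 4
                  + ["IM.M1"] * 1 + ["IM.M2"] * 2)
covid_labels = (["AM.M1"] * 6 + ["AM.M2"] * 2
                + ["IM.M1"] * 3 + ["IM.M2"] * 2)

control_am = m1_m2_ratio(control_labels, "AM")
covid_am = m1_m2_ratio(covid_labels, "AM")
covid_im = m1_m2_ratio(covid_labels, "IM")
```

A ratio above 1 indicates that M1 cells outnumber M2 cells in that compartment, so an increase from the Control to the COVID-19 group corresponds to a shift towards the M1 phenotype.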

Fig. 6: The disturbed immune system in the lungs of patients with COVID-19.

a, UMAP plots showing immune cells in the lungs of the Control (left) and COVID-19 (right) groups; n = 8,000 cells each. b, Proportions of different immune cell types in the lungs of the Control and COVID-19 groups. Asterisks indicate significant differences in the cell proportions between the two groups. The dashed line denotes a cell proportion of 50%. c, M1/M2 ratios in alveolar (AM) and interstitial (IM) macrophages in the lungs of the Control and COVID-19 groups. d, Immunohistology analysis of the macrophage marker CD68 (left) in the lungs of the Control and COVID-19 groups. Scale bars, 50 μm (main images) and 10 μm (magnified images). Quantitative data are shown as the mean ± s.e.m. Control, n = 13 lungs; and COVID-19, n = 5 lungs, with samples from three lung lobes each. Two-tailed t-test P values are indicated. Alveolar diagrams showing the infiltration of immune cells and macrophage polarization in the lungs of patients with COVID-19 (right). e, Venn diagram showing the genes that are shared by the groups of genes related to ‘myeloid leukocyte activation’, ‘cytokine signalling in immune system’ and ‘SARS-CoV infections’ (left). Heatmap showing COVID-19 DEGs related to SARS-CoV infections in different immune cell types (right). f, Network showing the cell-cell communications across all cell types in the lungs of patients with COVID-19 compared with those in the Control group. Red lines indicate increased cell–cell interactions in the COVID-19 group. Blue lines indicate decreased cell-cell interactions in the COVID-19 group. g, GO term and pathway enrichment analysis of specific cell-cell communications in the lungs of patients with COVID-19. h, Heatmap showing the cell–cell communications in the lungs of patients with COVID-19 compared with the Control group. 
AM.M1, M1 alveolar macrophage; AM.M2, M2 alveolar macrophage; IM.M1, M1 interstitial macrophage; IM.M2, M2 interstitial macrophage; Pro.mono, proliferative monocyte; Mono.macro, mononuclear macrophage; DN.T, double negative (CD4−CD8−) T cells; and the remaining cell-type abbreviations as per Fig. 2a.

Similarly, the transcriptional profiles of other immune cells shifted to a more activated state, as indicated by elevated expression of genes related to myeloid-leukocyte activation, cytokine signalling and SARS-CoV infections (Fig. 6e). Among these, a panel of glycosylation genes essential for immune cell activation and host immune defence initiation, such as MGAT1, MGAT4A, MGAT5, PARP8, PARP14, RPN2, ST6GALNAC3 and ST3GAL1, were expressed at high levels in COVID-19 immune cells33, especially macrophages (Fig. 6e).

To further understand the pathological consequences of the dysregulated immune system, we analysed the cell-cell interactions between immune cells and other types of pulmonary cells (Fig. 6f and Extended Data Fig. 6j), and found increased pro-inflammatory (that is, cytokine–cytokine receptor interaction) and pro-fibrotic (that is, NABA-collagens) cell-cell interactions in the COVID-19 samples (Fig. 6g and Supplementary Table 5)34. For instance, the interactions of IL1B, IL6, and TGFB1, -2 and -3 with their receptors were augmented in the lungs of patients with COVID-19 (Fig. 6h). Together, these data suggest that an imbalanced host immune system worsens lung damage by releasing excessive cytokine factors that drive diffuse alveolar damage.

Characterization of the endotheliopathy in the lungs of patients with COVID-19

Endothelial cell dysfunction and impaired vascular function contribute to the COVID-19-associated complications with a high mortality risk such as coagulopathy and thrombosis35,36. To study the cell type-specific molecular basis of COVID-19-related vasculopathy, we divided pulmonary endothelial cells into six subtypes consisting of three alveolar-capillary cell types (Cap.EC.a, which is specialized for gas exchange; Cap.EC.g, which is involved in capillary regeneration; and Cap.EC.i, which is a capillary intermediate between the Cap.EC.a and Cap.EC.g states), pulmonary arterial endothelial cells (Art.EC), pulmonary vein endothelial cells (Vei.EC) and pulmonary lymphatic endothelial cells (Lym.EC; Fig. 7a)37. Both the Cap.EC.a and Cap.EC.g populations declined in the lungs of patients with COVID-19 (Fig. 7b), consistent with the compromised oxygen-exchange ability and hypoxaemia observed in these patients. Through pseudotime analysis, we uncovered a group of Cap.EC.i cells, occupying the trajectory between the Cap.EC.g and Cap.EC.a cells (Extended Data Fig. 7a), that accumulated to levels around twofold more in the lungs of patients with COVID-19 than in the controls (Extended Data Fig. 7b). Similar to that of the AT2-AD.inter-AT1 cell axis, the Cap.EC.i population, with high expression of endothelial inflammation- and damage-associated genes (IL6, SERPINE1 and TNFAIP3), appeared as an intermediate cell type during Cap.EC.g-to-Cap.EC.a differentiation in the lungs of patients with COVID-19 (Extended Data Fig. 7c).

Fig. 7: Molecular basis underlying pulmonary endotheliopathy in the lungs of patients with COVID-19.

a, UMAP plots showing subtypes of CLDN5+ endothelial cells in the lungs of the Control and COVID-19 groups; n = 4,500 cells in each. b, Proportions of endothelial cell subtypes in the lungs of the Control and COVID-19 groups. The P value, determined using a Wilcoxon signed-rank test, is indicated. c, Ridge map showing increased death scores in different endothelial cell subtypes in the lungs of patients with COVID-19 compared with those of the Control group. d, Expression levels of endothelial damage-related genes in different endothelial cell subtypes. e, Heatmaps showing the upregulated COVID-19 DEGs related to inflammation in the different endothelial cell subtypes. f, Reconstruction of the transcriptional regulon network linking core regulatory transcription factors to potential target genes of the indicated pathways. g, Cell–cell interactions between immune cells and various endothelial cell subtypes. h, UMAP plots showing the expression levels of genes related to pro- (top) and anti-coagulation (bottom) in different endothelial cell subtypes in the lungs of the Control and COVID-19 groups. i, Gene-set score analysis of pro- (left) and anti-coagulation (right) pathways in individual cells of distinct endothelial cell subtypes. j, Diagram of the coagulation (left) and fibrinolysis (right) pathways (top). The values below the gene names indicate the log2FC value at the RNA (left) and protein (right) levels. Heatmap showing the expression levels of the indicated genes in the lungs of patients with COVID-19 (bottom). k, Schematic showing the vascular pathological changes in the lungs of patients with COVID-19. AM.M1, M1 alveolar macrophage; AM.M2, M2 alveolar macrophage; IM.M1, M1 interstitial macrophage; IM.M2, M2 interstitial macrophage; Pro.mono, proliferative monocyte; Mono.macro, mononuclear macrophage; and the remaining cell-type abbreviations as per Fig. 2a.

Apoptosis pathways were highly activated in all of the examined endothelial cell subtypes, underscoring the high vulnerability of endothelial cells in COVID-19 (Fig. 7c). Concomitantly, endothelial-damage markers, including VWF, ICAM1 and VCAM1, were upregulated in most endothelial cell types, reflecting massive endothelial injury (Fig. 7d)38,39. We also observed that a panel of interleukins, chemokines, interferon and other inflammatory factors, along with inflammation-related transcription factors such as NFKB1 and STAT3, were highly expressed in SARS-CoV-2-infected endothelial cells (Fig. 7e). Specifically, SCENIC analysis pinpointed NFKB1, REL and CEBPD as master regulators underlying SARS-CoV-2 vascular damage (Fig. 7f). Analysis of the cell–cell interactions revealed that the interactions between VEGFA and its receptors FLT1 (VEGFR1), NRP1 and NRP2, which are essential for angiogenesis and vascular permeability, were enhanced in endothelial and myeloid cells40,41 (Fig. 7g), delineating how SARS-CoV-2 leads to dysregulated angiogenesis of endothelial cells, a signature feature of COVID-19 (ref. 42).

Endothelial inflammation is associated with an increased risk of triggering excessive activation of coagulation43. Here we found increased VWF production in damaged endothelial cells (Fig. 7d), thereby probably activating and mediating platelet adhesion and thus triggering coagulation cascades and the formation of blood clots44. When we evaluated the expression of coagulation- and fibrinolysis-related genes in endothelial cells, we found that the coagulation pathway-associated genes were highly activated in all vascular endothelial cell types derived from the lungs of patients with COVID-19 (Fig. 7h–j), in accordance with the occurrence of systemic microangiopathy in COVID-19 pneumonia45. These data point to a working model of how SARS-CoV-2 infection progressively causes endothelial injury and widespread endotheliopathy (Fig. 7k).

Accumulation of myofibroblasts in the lungs of patients with COVID-19

The cellular and molecular mechanisms that cause severe pulmonary fibrosis in COVID-19 are largely unknown. Our joint analysis of bulk RNA-seq and snRNA-seq data suggests that fibroblasts and myofibroblasts at least partially increase the pro-fibrotic response in the lungs of patients with COVID-19 (Extended Data Fig. 3d,e). Furthermore, a larger number of fibroblasts produced extracellular matrix (Fig. 8a); similarly, myofibroblasts, which are known to drive lung fibrosis, increased about threefold and were also more proliferative in the lungs of patients with COVID-19 (Fig. 8b). Pseudotime analysis predicted that the majority of myofibroblasts originated from fibroblasts, with fewer arising from smooth muscle cells, pericytes and AT2 cells (Fig. 8c and Extended Data Fig. 7d–k). The fate 2 cells represent pulmonary cells that could possibly convert into myofibroblasts, following a differentiation trajectory from epithelial cells, stromal cells and fibroblasts (with relatively low collagen expression) to myofibroblasts (with relatively high collagen expression). The number of fate 2 cells doubled in the lungs of patients with COVID-19 compared with the controls (Fig. 8c and Extended Data Fig. 7f,g). By employing trajectory-based differential expression analysis, we identified 560 upregulated genes related to myofibroblast formation, 224 of which were also upregulated in the lungs of patients with COVID-19 (Fig. 8d). These genes included key fibrogenic factors (TIMP1 and PDLIM5) and 16 collagen genes known as molecular drivers underlying myofibroblast formation (Extended Data Fig. 7f–h). In addition, HIF-1α was identified as a top upregulated transcription factor in the lung fibroblasts of patients with COVID-19 (Fig. 8e and Extended Data Fig. 8a)46.

Fig. 8: Activation of myofibroblasts eliciting the pathobiology of pulmonary fibrosis.

a, UMAP plots showing the gene-set scores of extracellular matrix-related genes in Control (left) and COVID-19 (right) lungs. b, Proportions of myofibroblasts in the lungs of individuals in the Control and COVID-19 groups. c, Pseudotime trajectory analysis of epithelial and stromal cells in Control and COVID-19 samples (left). Proportions of cells in cell fate 2 (except for myofibroblasts) from the Control and COVID-19 groups (right). b,c, Cell-type abbreviations as per Fig. 2a. Control, n = 4 donors; COVID-19, n = 5 donors. The boxes in the box-and-whisker plots show the median (centre line) and the quartile range (25–75%) and the whiskers extend from the quartile to the minimum and maximum values. P values, determined using a Wilcoxon signed-rank test, are indicated. d, Heatmap showing the top 2,000 DEGs in cell fate 2 during pseudotime trajectory (left). GO analysis of the genes that overlap between upregulated DEGs implicated in myofibroblast formation (cluster 5) and COVID-19 DEGs (right). e, Relative cell proportions and gene expression levels of Adv. fib (left) and Alv. fib (right) in the lungs of patients with COVID-19 compared with those in the Control group. TFs, transcription factors. f, Expression levels of FOXO3 in the lung fibroblasts of the Control and COVID-19 groups. P values, determined using a Wilcoxon signed-rank test, are indicated. g, Apoptosis analysis of human fibroblasts following FOXO3 knockdown (si-FOXO3) by flow cytometry. Quantitative data are shown as the mean ± s.e.m. of n = 3 biologically independent samples per condition. The two-tailed t-test P value is indicated; si-NC, siRNA to a negative control duplex. h, GO term and pathway enrichment analysis of DEGs following FOXO3 knockdown in human fibroblasts. Red denotes upregulation and blue denotes downregulation. i, Network plots showing DEGs related to the indicated terms and pathways following FOXO3 knockdown in human fibroblasts. j, Venn diagram showing the genes shared by the DEGs following FOXO3 knockdown in human fibroblasts, COVID-19 DEGs in fibroblasts and myofibroblast-related genes (left). Myofibroblast-related genes were defined as myofibroblast-marker genes and upregulated genes implicated in myofibroblast formation based on pseudotime analysis. Heatmap showing the relative expression levels of the indicated genes (right). k, Schematic showing the systemic pathological changes in the lung tissues of patients with COVID-19.

Next, we employed gain-of-function and loss-of-function experiments in fibroblasts to manipulate transcriptional programmes accounting for the pathology of the lungs of patients with COVID-19. When profiling primary human lung fibroblasts expressing constitutively active HIF-1α (Extended Data Fig. 8b–d), we found that some of the DEGs were related to COVID-19 pathology (Extended Data Fig. 8e,f and Supplementary Table 6), suggesting that excessively activated HIF-1α is an upstream factor contributing to fibroblast malfunction in COVID-19. We also observed decreased levels of FOXO3 expression in COVID-19 fibroblasts (Fig. 8e,f), a molecular phenotype implicated in idiopathic pulmonary fibrosis47. When we used short interfering RNA (siRNA) to knock down FOXO3 in primary human lung fibroblasts, we observed increased cell apoptosis (Fig. 8g and Extended Data Fig. 8g–i). Genes that were upregulated following FOXO3 repression included those related to extracellular-structure organization, collagen-fibril organization and regulation of the immune response (Fig. 8h,i and Extended Data Fig. 8j–m). Notably, a series of myofibroblast-marker genes (COL14A1 and COL3A1) were induced in FOXO3-deficient fibroblasts (Fig. 8j, Extended Data Fig. 8l,m and Supplementary Table 6). These findings support a role for FOXO3 silencing in mediating pro-fibrotic and pro-inflammatory effects in the lungs of patients with COVID-19.

Discussion

Here we combined single-nucleus transcriptomics, bulk transcriptomics, proteomics and a battery of verification experiments to generate a comprehensive reference chart for studying the pathobiology of COVID-19 (Fig. 8k). In addition, we defined parenchymal lung senescence as a marked feature of COVID-19 pathology. Notably, the accumulation of SASP factors may also account for COVID-19-related cytokine storms and long-term lung sequelae, such as pulmonary fibrosis. Together, these relationships suggest that the alleviation of lung senescence might have potential as an intervention approach in patients with COVID-19 (ref. 48). Recent studies indeed indicate that senolytic drugs (compounds that eliminate senescent cells) may reduce both macrophage infiltration and inflammation, thereby alleviating COVID-19 syndrome49,50,51.

At the inception of our study, with both the limited number of available donors and the previously identified relevance of ageing-related changes to COVID-19 pathology considered, we selected COVID-19 samples based on age range rather than virus exposure time, length of hospital stay or other medical conditions. Given the demarcated differences, we were able to delineate the cellular and molecular changes associated with the lung pathology at the single-cell resolution, despite the existing heterogeneity between the COVID-19 samples. Thus, our study remains a valuable resource for understanding the disease mechanisms underlying SARS-CoV-2 infection of the human lung and lays a foundation for discovering biomarkers and developing treatment strategies for COVID-19.

Methods

Ethics statement

This research was executed in line with the Ethical Principles and was approved in advance by the Biomedical Research Ethics Committee of the Institute of Zoology of the Chinese Academy of Sciences, Southwest Hospital of Third Military Medical University (TMMU). The human lung tissues were obtained under the approval given by the Research Ethics Committee of Huoshenshan Hospital, Southwest Hospital of TMMU and the First Hospital of Kunming Medical University. The donors (or their relatives) of the lung samples used in this study provided written informed consent. Detailed information on the samples used can be found in Supplementary Table 1.

Human samples

The lung specimens of patients with COVID-19 were derived from autopsies of five patients with COVID-19 from Huoshenshan Hospital2. The autopsy materials were collected shortly after the death of the decedents. Exclusion criteria included a post-mortem interval > 18 h. Tissues of three lung lobes (right lower lobe, right middle lobe and left lower lobe) from each patient were either snap-frozen and stored at −80 °C or immediately fixed in 4% (wt/vol) formaldehyde solution. All of the decedents met the diagnostic criteria for COVID-19, and the presence of SARS-CoV-2 in all cases was confirmed by digital droplet PCR (ddPCR) tests and immunohistochemistry staining of spike glycoprotein (Supplementary Table 1). Basic patient information and clinical data are summarized in Supplementary Table 1. As controls, human lung tissues were collected from young and old (Control-Y and Control) patients with pulmonary bulla or lung cancer. All of the samples were evaluated by pathological examination to confirm their absence of obvious disease features such as inflammation before downstream analyses. The Control individuals were selected to match the gender and age distribution of patients with COVID-19 involved in this study.

Haematoxylin-and-eosin staining

Haematoxylin-and-eosin staining was conducted as previously described52. Tissues were dehydrated using an ethanol gradient, paraffin-embedded and then sectioned at a thickness of 5 μm. The sections were deparaffinized using xylene, hydrated using an ethanol gradient (100, 100, 95, 85, 75 and 50%), stained with haematoxylin for 2 min, differentiated with 1% hydrochloric acid ethanol for 3 s, stained with eosin for 2 min and rinsed with running water. Finally, the tissues were dehydrated in an ethanol gradient and xylene before being fixed with cytoseal-60 (Stephens Scientific).

Artificial intelligence-based automatic assessment platform for lung pathology analysis

We used datasets that manually label each category of lung slices of the lungs of patients with COVID-19 (mucus, serous, blood vessel, bronchus and cell exudation) to train a semantic segmentation algorithm based on deep learning, which establishes an automatic quantitative model of lung digital pathology. The percentage of each category of the target slice in the slice area was calculated by the model. The slice was binarized to delete the blank area of the slice. The proportion of interstitial components in the slice was obtained by subtracting each area category from the slice area after binarization treatment. The proportion of fibrosis was obtained by subtracting the proportion of normal lung tissue from the proportion of interstitial components in the target slice. The results of the lung pathology analysis are shown in Supplementary Table 1.
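
The area arithmetic described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the label names, the `blank` and `tissue` markers, the `normal_fraction` argument (standing in for the proportion of normal lung tissue) and the clamping of negative values to zero are all assumptions made for illustration. The segmentation model itself is not reproduced.

```python
from collections import Counter

# Labelled categories named in the text; the string codes are hypothetical.
CATEGORIES = ("mucus", "serous", "blood_vessel", "bronchus", "cell_exudation")


def area_proportions(labels, normal_fraction):
    """Compute per-category, interstitial and fibrosis proportions.

    labels: 2D list of per-pixel labels. "blank" marks background removed by
            binarization; any label outside CATEGORIES (here "tissue") counts
            as interstitial once the blank area is excluded.
    normal_fraction: assumed proportion attributed to normal lung tissue.
    """
    counts = Counter(px for row in labels for px in row)
    tissue_area = sum(n for lab, n in counts.items() if lab != "blank")
    if tissue_area == 0:
        raise ValueError("slice contains no tissue pixels")

    props = {c: counts.get(c, 0) / tissue_area for c in CATEGORIES}
    # Interstitial share: slice area minus every labelled category.
    props["interstitial"] = 1.0 - sum(props[c] for c in CATEGORIES)
    # Fibrosis share: interstitial minus normal tissue (clamped at 0, an assumption).
    props["fibrosis"] = max(0.0, props["interstitial"] - normal_fraction)
    return props


example_mask = [
    ["blank", "mucus", "tissue", "tissue", "tissue"],
    ["blank", "blood_vessel", "tissue", "tissue", "tissue"],
]
result = area_proportions(example_mask, normal_fraction=0.5)
```

In this toy mask, eight pixels remain after the blank area is removed, so each labelled pixel contributes one eighth of the slice area.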

Immunofluorescence

Immunofluorescence staining was performed as previously described53. Paraffin-embedded sections were deparaffinized and rehydrated. After being rinsed in distilled water, the sections were microwaved five times, 5 min each time, in 10 mM sodium citrate buffer (pH 6.0). Once the sections had cooled down to room temperature (RT), they were permeabilized with 0.4% Triton X-100 (Sigma) in PBS for 60 min and then blocked with 5% donkey serum in PBS for 1 h at RT, followed by overnight incubation with primary antibodies at 4 °C and fluorescence-labelled secondary antibodies for 1 h at RT. The nuclei were counterstained with Hoechst 33342 (Thermo Fisher Scientific) before the sections were mounted in VECTASHIELD anti-fading mounting medium (Vector Laboratories, H-1000). Images were captured using a confocal laser scanning microscope (IXplore SpinSR, Olympus). The antibodies used are listed in Supplementary Table 8.

Immunohistochemistry

Immunohistochemistry staining was performed as previously described10. Briefly, paraffin-embedded sections were deparaffinized and rehydrated, followed by antigen retrieval as described in the ‘Immunofluorescence’ section. After the sections had cooled down to RT, they were permeabilized with 0.4% Triton X-100 and incubated with 3% H2O2 for 20 min to inactivate endogenous peroxidase. The sections were then blocked with 5% donkey serum in PBS for 1 h and incubated with primary antibodies at 4 °C overnight. The sections were then incubated with horseradish peroxidase-conjugated secondary antibodies for 1 h at RT, followed by colorimetric detection using DAB and counterstaining with haematoxylin. Finally, the sections were dehydrated before being mounted in neutral resinous mounting medium. Images were captured using an Olympus VS200 system. The antibodies used are listed in Supplementary Table 8.

TUNEL staining

Terminal deoxynucleotidyl transferase dUTP nick end labelling (TUNEL) staining was performed as previously described54. Briefly, paraffin-embedded sections were deparaffinized and rehydrated, and then TUNEL staining was performed following the manufacturer’s protocol using a one-step TUNEL apoptosis assay kit (C1088, Beyotime).

Cell culture

Human lung fibroblasts (2BS) were provided by T.-J. Tong and Z.-Y. Zhang from Peking University in China55. The cells were cultured in RPMI 1640 medium (Thermo Fisher Scientific) supplemented with 10% fetal bovine serum (Gibco), 100 U ml−1 penicillin and 10 mg ml−1 streptomycin (Thermo Fisher Scientific) in 5% CO2 at 37 °C. When they reached 80% confluency, the cells were collected with 0.25% trypsin and then passaged at a ratio of 1:2. All of the cell cultures tested negative for mycoplasma contamination.

FOXO3 knockdown in human lung fibroblasts

Short interfering RNAs targeting FOXO3 messenger RNA were purchased from RIBOBIO. The sequences are listed in Supplementary Table 7. The negative control duplex (RIBOBIO) was not homologous to any known mammalian genes52. Cells were transfected with the negative control duplex or siRNAs against FOXO3 using Lipofectamine RNAiMAX transfection reagent (Thermo Fisher Scientific) following the manufacturer’s instructions. At 72 h after the transfection, the cells were collected for reverse transcription-quantitative PCR (RT-qPCR) and western blotting. At six days after the transfection, the cells were collected for either RNA-seq or apoptosis analysis.

Construction of an expression vector encoding constitutively active HIF-1α

The PLE4-HIF-1α overexpression vector was constructed by replacing the green fluorescence protein (GFP) sequence of the origin vector (PLE4-GFP, a gift from T. Hishida) with the HIF-1α complementary DNA sequence. Next, a site-directed mutagenesis kit (TRAN, FM111-01) was used to generate the constitutively active mutant of HIF-1α (HIF-1α-CA) containing two mutation sites, P402A and P564A, following the manufacturer’s instructions56. The primers used for cloning the HIF-1α cDNA sequence and introduction of HIF-1α mutations are listed in Supplementary Table 7.

Lentivirus production

Lentiviruses were produced as previously described57. HEK293T cells (originating from the American Type Culture Collection) were transfected with lentiviral vectors for protein overexpression along with the packaging plasmids psPAX2 and pMD2.G. The supernatant containing the lentiviral particles was collected at 48 and 60 h after the transfection and mixed before ultracentrifugation at 19,400g for 2.5 h at 4 °C.

RNA isolation and analyses

Total RNA was extracted from tissues or cells using TRIzol (Thermo Fisher Scientific, 15596018). The GoScript reverse transcription system (Promega) was then used to reverse-transcribe the RNA into cDNA. RT-qPCR was conducted using the iTaq Universal SYBR Green SuperMix (Bio-Rad) on a CFX384 real-time PCR system (Bio-Rad). For each gene, the relative mRNA expression level was calculated using the ∆∆Cq method and normalized to the expression level of GAPDH (Extended Data Fig. 8g,m) or 18S (Extended Data Fig. 8f), as appropriate. The ddPCR examination for virus from tissue biopsies was performed as previously described4. Briefly, ddPCR assays were performed on a QX200 AutoDG droplet digital PCR system (Bio-Rad) with a One-step RT-ddPCR advanced kit (Bio-Rad, 186-4021) according to the manufacturer’s instructions. The primer and probe sequences of SARS-CoV-2 were obtained from the National Institute for Viral Disease Control and Prevention (http://nmdc.cn/#/nCoV). All of the primers and probes that were used are listed in Supplementary Table 7. For the bulk RNA-seq, sequencing libraries were prepared using a NEBNext Ultra RNA library prep kit for Illumina and individually indexed. The resulting libraries were sequenced on an Illumina paired-end platform with a read length of 150 base pairs by Novogene Bioinformatics Technology Co. Ltd.
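
As an illustration of the ∆∆Cq calculation, the following Python sketch computes relative expression as 2^(−∆∆Cq). It assumes roughly 100% amplification efficiency for both the target and the reference gene, which the text does not state; the function name and the Cq values are hypothetical.

```python
def relative_expression(cq_target_sample, cq_ref_sample,
                        cq_target_control, cq_ref_control):
    """Relative mRNA level of a target gene by the delta-delta-Cq method.

    The reference gene would be GAPDH or 18S in the experiments described
    above. Assumes a doubling of product per cycle (an assumption).
    """
    # Delta Cq: target normalized to the reference gene within each condition.
    delta_sample = cq_target_sample - cq_ref_sample
    delta_control = cq_target_control - cq_ref_control
    # Delta-delta Cq: sample relative to control.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Cq values: the target amplifies two cycles later after
# knockdown, while the reference gene is unchanged.
knockdown_level = relative_expression(26.0, 18.0, 24.0, 18.0)
```

With these illustrative values the target is detected two cycles later in the knockdown sample, corresponding to a quarter of the control expression level.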

Western blot analysis

Western blotting was performed as previously described10. Briefly, the protein concentrations were determined using a BCA kit. The protein lysates were subjected to SDS-PAGE and electrotransferred to a polyvinylidene fluoride membrane (Millipore). The membrane was blocked in blocking buffer, incubated overnight with primary antibodies at 4 °C and then with horseradish peroxidase-conjugated secondary antibodies before visualization on a ChemiDoc XRS system (Bio-Rad). Band quantification was performed using the Image Lab software. The antibodies used are listed in Supplementary Table 8.

Flow-cytometry analysis

Apoptosis analysis was performed according to the manufacturer’s protocol. Briefly, after transfection with siRNA for 6 d, the cells were collected and stained with propidium iodide and annexin V-EGFP for 10–15 min at 37 °C using an Apoptosis detection kit (Vigorous Biotechnology). The samples were then analysed using a BD LSRFortessa flow cytometer.

Protein extraction and digestion for LC–MS/MS

Lung tissues were individually homogenized and lysed in lysis buffer (1% SDS containing 1× protease inhibitor cocktail); the protein concentrations were then determined by BCA assay. For digestion, the protein solutions were precipitated with 20% trichloroacetic acid for 2 h at 4 °C. The supernatant was removed via centrifugation at 4,500g for 4 min and the precipitates were washed three times with pre-cooled acetone. The precipitates were resuspended in 200 mM triethylammonium bicarbonate and incubated with trypsin at a trypsin-to-protein mass ratio of 1:50 for overnight digestion. Finally, the precipitates were incubated with 5 mM dithiothreitol for 30 min at 56 °C and alkylated with 11 mM iodoacetamide for 15 min at RT in the dark.

Peptide fractionation for LC–MS/MS

For the construction of the DIA spectral library, digested peptides were fractionated using high-pH reverse-phase HPLC with an Agilent 300 extend C18 column (5 μm particles, 4.6 mm internal diameter, 250 mm length). Briefly, the peptides were separated into 60 fractions with a gradient of 8–32% acetonitrile in 10 mM ammonium bicarbonate (pH 9) and then combined into 12 fractions and dried by vacuum centrifugation.

Nuclei isolation and snRNA-seq on the 10x Genomics platform

The isolation of nuclei was performed using a previously published protocol58. All sample handling steps were performed on ice. Briefly, the frozen tissues were ground using a pestle and mortar, and solubilized in 1.5 ml lysis buffer containing 250 mM sucrose, 25 mM KCl, 5 mM MgCl2, 10 mM Tris buffer, 1 μM dithiothreitol, 1×protease inhibitor, 0.4 U μl−1 RNaseIn, 0.2 U μl−1 Superasin and 0.1% Triton X-100 in nuclease-free water. The samples were filtered several times through a 40 μm cell strainer (BD Falcon), centrifuged at 1,000g for 8 min at 4 °C and resuspended in PBS supplemented with 0.3% BSA, 0.4 U μl−1 RNaseIn and 0.2 U μl−1 Superasin. The nuclei were stained with acridine orange and propidium iodide, and then counted using a dual-fluorescence cell counter (Luna-FL, Logos Biosystems). Mononuclear capture was conducted using a 10x Genomics single-cell 3′ system. Approximately 9,000 nuclei were captured for each sample following the standard 10x capture and library preparation protocol (10x Genomics) and then sequenced in a NovaSeq 6000 sequencing system (Illumina, 20012866).

Bulk RNA-seq data processing

Raw reads were trimmed using TrimGalore (version 0.4.4_dev; https://github.com/FelixKrueger/TrimGalore). The trimmed reads were mapped to the hg19 genome using HISAT2 (version 2.0.4)59, generating sam files, which were then converted to bam files by SAMtools (version 1.6; http://www.htslib.org/). HTSeq was used to calculate the read count of each gene (version 0.11.0)60. Differentially expressed genes were identified using the R package DESeq2 (version 1.26.0)61, with a cutoff of adjusted P < 0.05 and |log2FC| > 1.5.

Ingenuity pathway analysis of upstream regulators

The upstream regulators of DEGs from bulk-seq were identified using ingenuity pathway analysis (Qiagen; https://www.qiagenbioinformatics.com/products/ingenuitypathway-analysis). Only regulators with an activation z-score showing either an increased or a decreased activation state for the implicated biological function and those with a P value less than 0.05 were kept for downstream analyses. Network plots were generated using Cytoscape (version 3.7.2)62.

Proteomics data analysis

Spectral libraries were generated following a previously published protocol63. Briefly, data-dependent acquisition and DIA data were processed using the Pulsar search engine in Spectronaut (v14.6) with default settings. Tandem mass spectra were searched against the human SwissProt database (20,366 entries) concatenated with a decoy database. The false discovery-rate thresholds for peptide-spectrum matches (PSMs), peptides and proteins were set to less than 1%. All of the DIA data were analysed in Spectronaut (v14.6) against the spectral library and the retention times were recalibrated by nonlinear calibration. The identification settings were as follows: the maximum number of decoys was set to a fraction of 0.1 of the library size and the q-value cutoff was set to 0.01 on the precursor and protein levels. Relative protein quantification was performed using the MSstats package. Differentially expressed proteins were defined as those with a cutoff of |log2FC| > 1.5 and adjusted P < 0.05.

Processing of snRNA-seq data

Sequences from the NovaSeq analysis were de-multiplexed using bcl2fastq (version 2.20.0.422) to convert BCL files to FASTQ files. A pre-mRNA reference of hg19 was created following the Cell Ranger (version 3.1.0) protocol (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references). Gene expression matrices for downstream analyses were calculated using the ‘count’ function of Cell Ranger and the default parameters.

Filtering of low-quality cells, clustering and identification of cell types

The output h5 file from Cell Ranger was processed using CellBender (version 0.2.0) with default parameters to reduce ambient RNA bias; this was applied to every sample before the count matrices were merged64. The filtered matrices were further analysed using Seurat (version 3.1.3)65. Cells with ≤ 200 genes or with a mitochondrial gene ratio ≥ 5% were regarded as low-quality cells and excluded. Possible doublets were detected using DoubletFinder (version 2.0.2)66. The data of each sample were then normalized using the ‘SCTransform’ function of Seurat. Features and anchors for downstream integration were selected using ‘PrepSCTIntegration’ and ‘FindIntegrationAnchors’. Nineteen post-mortem interval-related genes67 of human lung tissues were compiled into a gene set and regressed out using the ‘ScaleData’ function of Seurat. After data integration and scaling, principal component analysis and clustering were performed using the ‘RunPCA’ and ‘FindClusters’ functions of Seurat. Dimensionality reduction was performed using the ‘RunUMAP’ function. Marker genes for each cluster were identified using the ‘FindAllMarkers’ function (adjusted P < 0.05 and |logFC| > 0.5). In addition, four clusters in the lung snRNA-seq data were excluded due to low quality, which was defined as having no specific marker genes, relatively low gene numbers or high mitochondrial gene ratios. In the end, a total of 141,152 cells considered to be of high quality were further analysed. The cell types were identified according to the expression of canonical marker genes for each cluster (Supplementary Table 3).
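
The per-cell quality filter stated above (exclusion of cells with ≤ 200 genes or a mitochondrial gene ratio ≥ 5%) can be expressed as a short Python sketch. It only illustrates the thresholds and is not the Seurat code used in the study; the record layout and field names are hypothetical.

```python
def is_low_quality(n_genes, mito_fraction, min_genes=200, max_mito=0.05):
    """Return True for cells the text classifies as low quality.

    A cell is excluded if it has <= min_genes detected genes or a
    mitochondrial gene ratio >= max_mito (both bounds inclusive, as stated).
    """
    return n_genes <= min_genes or mito_fraction >= max_mito


def filter_cells(cells):
    """Keep only cells that pass the quality thresholds."""
    return [c for c in cells
            if not is_low_quality(c["n_genes"], c["mito_fraction"])]


# Hypothetical per-cell summaries; the field names are illustrative only.
cells = [
    {"barcode": "cell_a", "n_genes": 200, "mito_fraction": 0.01},   # too few genes
    {"barcode": "cell_b", "n_genes": 201, "mito_fraction": 0.049},  # passes
    {"barcode": "cell_c", "n_genes": 1500, "mito_fraction": 0.05},  # too much mito
    {"barcode": "cell_d", "n_genes": 1500, "mito_fraction": 0.02},  # passes
]
kept = [c["barcode"] for c in filter_cells(cells)]
```

Both thresholds are inclusive on the excluded side, so a cell with exactly 200 detected genes or exactly 5% mitochondrial reads is removed.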

Analysis of DEGs from snRNA-seq data

Differential gene expression analysis between the COVID-19 and Control groups (COVID-19 DEGs) or between the Control and Control-Y (ageing DEGs) groups was performed using the ‘FindMarkers’ function of Seurat using the Wilcoxon signed-rank test. LogFC of DEGs in snRNA-seq data was calculated by natural logarithm. Genes with adjusted P < 0.05 and |logFC| > 0.5 were identified as DEGs. For a given cell type, if the total number of cells were 5 in any group, they were excluded from downstream analyses.
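
Because the snRNA-seq logFC values are on a natural-log scale whereas the bulk RNA-seq and proteomics cutoffs use log2FC, the two thresholds are not directly comparable. The following Python sketch, with hypothetical function names, converts between the scales and applies the stated DEG criteria; a natural-log threshold of 0.5 corresponds to a fold change of about 1.65 (log2FC of about 0.72).

```python
import math


def ln_fc_to_log2_fc(ln_fc):
    """Convert a natural-log fold change to a log2 fold change."""
    return ln_fc / math.log(2.0)


def is_deg(adjusted_p, ln_fc, p_cutoff=0.05, ln_fc_cutoff=0.5):
    """Apply the stated snRNA-seq criteria: adjusted P < 0.05 and |logFC| > 0.5."""
    return adjusted_p < p_cutoff and abs(ln_fc) > ln_fc_cutoff


# Fold change and log2 fold change implied by the natural-log cutoff of 0.5.
threshold_fold_change = math.exp(0.5)
threshold_log2_fc = ln_fc_to_log2_fc(0.5)
```

Both inequalities are strict, so a gene with a logFC of exactly 0.5 or an adjusted P value of exactly 0.05 is not counted as a DEG in this sketch.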

Module analysis of DEGs

We used the ‘AverageExpression’ function to calculate the averaged gene expression levels of upregulated/downregulated DEGs across 28 different cell types in the Control and COVID-19 groups, and clustered and classified these DEGs into six modules using the ‘hcluster’ and ‘kmeans_k’ functions of the pheatmap R package (version 1.0.12).

Transcription factor regulatory network analysis

Transcription factor regulatory network analysis was performed using the SCENIC workflow (version 1.1.2.2)68 with default parameters. The transcription factor database based on the hg19 genome was downloaded using RcisTarget (version 1.6.0) to be used as a reference. Gene regulatory networks were inferred using GENIE3 (version 1.6.0) based on the COVID-19 DEGs across 28 cell types of the lung. Enriched transcription factor-binding motifs, predicted candidate target genes (regulons) and regulon activity were inferred using RcisTarget. The transcription regulatory network was visualized using Cytoscape (version 3.7.2)62.

GO term and pathway enrichment analysis of DEGs

GO term and pathway enrichment analysis of DEGs was performed using Metascape (version 3.5; http://metascape.org/gp/index.html). The results were visualized using the ggplot2 R package (version 3.2.1; https://ggplot2.tidyverse.org/).

Pseudotime analysis

Pseudotime analysis was performed on epithelial (AT1, AT2, AD.inter, basal, goblet, club and ciliated cells) and stromal (alveolar fibroblasts, adventitial fibroblasts, myofibroblasts, airway smooth muscle cells, Vas.SMC and pericytes) cells from the lung atlas data using the Monocle2 R package69. Gene ordering was performed using a cutoff of expression in at least ten cells and a combination of intercluster differential expression and dispersion with a q-value cutoff of < 0.01. The structure of the trajectory was plotted in two-dimensional space using the DDRTree dimensionality reduction algorithm and the cells were ordered in pseudotime.

Cell–cell communication analysis

Cell-cell communication analysis was conducted with the snRNA-seq data using the CellPhoneDB software (version 1.1.0)70. Only receptors and ligands expressed in more than 10% of cells of any type from the Control or COVID-19 groups were further evaluated and a cell-cell communication was considered non-existent if the ligand or the receptor was unmeasurable. The average expression of each ligand-receptor pair was analysed in various cell types and only those with P < 0.01 were used for the prediction of cell-cell communication between any two cell types.

Gene-set score analysis

Gene sets were obtained from GSEA (https://www.gsea-msigdb.org/) and DisGeNET (https://www.disgenet.org/home/). For the snRNA-seq data, gene sets were used for scoring each input cell using the Seurat function ‘AddModuleScore’. For the RNA-seq and proteomics data, the sum of all genes in a gene set was calculated and the score was defined as the log10-transformed sum. Changes in the scores between the COVID-19 and Control groups were analysed using the ggpubr package (version 0.2.4; https://github.com/kassambara/ggpubr) and the Wilcoxon signed-rank test.
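
For the bulk RNA-seq and proteomics data, the score described above reduces to a log10-transformed sum over the genes in a set. The Python sketch below illustrates that arithmetic only. Treating genes absent from a sample as zero and using no pseudocount are assumptions, and the expression values are invented for the example.

```python
import math


def gene_set_score(expression, gene_set):
    """Score one sample as log10 of the summed expression of a gene set.

    expression: mapping of gene symbol to expression value for one sample.
    gene_set: iterable of gene symbols; genes missing from `expression`
              contribute zero (an assumption, not stated in the text).
    """
    total = sum(expression.get(gene, 0.0) for gene in gene_set)
    if total <= 0:
        raise ValueError("summed expression must be positive to take log10")
    return math.log10(total)


# Invented expression values for three endothelial-damage markers.
sample = {"VWF": 60.0, "ICAM1": 30.0, "VCAM1": 10.0}
score = gene_set_score(sample, ["VWF", "ICAM1", "VCAM1"])
```

Because the values are summed before the log transform, highly expressed genes dominate the score, which is a property of this definition rather than of the sketch.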

Statistics and reproducibility

No statistical methods were used to predetermine the sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. All data were statistically analysed using a one-tailed t-test, two-tailed t-test, Wilcoxon signed-rank test or analysis of variance to compare differences between groups, assuming equal variance, using the PRISM software (GraphPad 8 software) or R packages. P < 0.05 was considered statistically significant. P values are presented in the figures, as appropriate. The numbers of experimental repeats are indicated in the figure legends.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.