Main

The molecular features of a patient’s tumour influence clinical responses and can be used to guide therapy, leading to more effective treatments and reduced toxicity1. However, most patients do not benefit from such targeted therapies in part owing to a limited knowledge of candidate targets2. Lack of efficacy is a leading cause of the 90% attrition rate in the development of cancer drugs, and fewer molecular entities to new targets are being developed3. Unbiased strategies that effectively identify and prioritize targets in tumours could expand the range of targets, improve success rates and accelerate the development of new cancer therapies.

CRISPR–Cas9 screens that use libraries of single-guide RNAs (sgRNAs) have been used to study gene function and their role in cellular fitness4,5. CRISPR–Cas9-based genome editing provides high specificity and produces penetrant phenotypes as null alleles can be generated. Here we present genome-scale CRISPR–Cas9 fitness screens in 324 cancer cell lines and an integrative analysis that enables the prioritization of candidate cancer therapeutic targets (Fig. 1a), which we illustrate through the identification of Werner syndrome ATP-dependent helicase (WRN) as a target for tumours with microsatellite instability (MSI).

Fig. 1: Target prioritization framework.

a, Strategy to prioritize targets in multiple cancer types, incorporating CRISPR–Cas9 gene fitness effects, genomic biomarkers and target tractability for drug development. ADaM (adaptive daisy model) distinguishes context-specific and core fitness genes. Datasets are available on the project Score website (https://score.depmap.sanger.ac.uk/). b, Number of genes exerting a fitness defect in a given number of cell lines. The bars indicate the percentage of genes that induce a dependency in less than (bottom bar) or at least (top bar) 50% of cell lines. c, Bottom, number of core and context-specific fitness genes identified by ADaM for 13 cancer types (median = 866 and 2,813, respectively). The ADaM threshold is the number of cell lines for a gene to be classified as core fitness. Top, comparison of the effect size for ADaM core and context-specific fitness genes (only significant genes are shown, BAGEL FDR = 5%). CNS, central nervous system; Haemat., haematological; PNS, peripheral nervous system.

Genome-scale CRISPR–Cas9 screens in cancer cell lines

To comprehensively catalogue genes that are required for cancer cell fitness (defined as genes required for cell growth or viability), we performed 941 CRISPR–Cas9 fitness screens in 339 cancer cell lines, targeting 18,009 genes (Extended Data Fig. 1a, b and Supplementary Table 1). Following stringent quality control (Extended Data Fig. 1c–h), the final analysis set included 324 cell lines from 30 different cancer types, across 19 different tissues (Extended Data Fig. 1i). These cell lines are part of the collection of Cell Model Passports of highly genomically annotated cell lines6, broadly represent the molecular features of tumours in patients7, and include common cancers (such as lung, colon and breast cancers) and cancers of particular unmet clinical need (such as lung and pancreatic cancers). Analysis of screen data from these 324 cell lines demonstrated high sensitivity, specificity and precision in classifying essential and non-essential genes8 (Extended Data Fig. 1g, h, j), and results were not biased by experimental factors (Extended Data Fig. 2a–e).

Defining core and context-specific fitness genes

Genes required for cell fitness in specific molecular or histological contexts are likely to encode favourable drug targets, because of a reduced likelihood of inducing toxic effects in healthy tissues. Conversely, fitness genes that are common to the majority of tested cell lines or common within a cancer type (referred to as pan-cancer or cancer-type-specific core fitness genes, respectively) may be involved in essential processes in cells and have greater toxicity. It is therefore important to distinguish context-specific fitness genes from core fitness genes.

We identified a median of 1,459 fitness genes in each cell line (Extended Data Fig. 2f–n and Supplementary Table 2). In total, 41% (n = 7,470) of all targeted genes induced a fitness effect in one or more cell lines and the majority (83%) of these genes induced a dependency in less than 50% of the tested cell lines (Fig. 1b). To identify core fitness genes, we developed a statistical method, the adaptive daisy model (ADaM; Extended Data Fig. 3a–d), to adaptively determine the minimum number of dependent cell lines that are required for a gene to be classified as a core fitness gene (Fig. 1c). Genes that were defined as core fitness in at least 12 out of 13 cancer types (also adaptively determined) were classified as pan-cancer core fitness genes (Extended Data Fig. 3e–g). This yielded a median of 866 cancer-type-specific and 553 pan-cancer core fitness genes (Fig. 1c and Supplementary Table 3).
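The pan-cancer step described above can be illustrated with a minimal Python sketch. It assumes that the cancer-type-specific core fitness sets have already been computed (ADaM itself is not reproduced here), and the gene names and the `pan_cancer_core` helper are hypothetical. The 12-of-13 cut-off is the one quoted in the text, although in the study it was determined adaptively rather than fixed in advance.

```python
from collections import Counter

def pan_cancer_core(core_by_type, min_types):
    """Return genes called core fitness in at least `min_types` cancer types.

    core_by_type: dict mapping cancer type -> set of core fitness genes
    (hypothetical input; in the study these sets come from ADaM).
    """
    counts = Counter(g for genes in core_by_type.values() for g in set(genes))
    return {g for g, n in counts.items() if n >= min_types}

# Toy example with 13 hypothetical cancer types.
types = [f"type_{i}" for i in range(13)]
core_by_type = {t: {"GENE_A", "GENE_B", "GENE_C"} for t in types}
core_by_type["type_0"] = {"GENE_A", "GENE_D"}  # lacks GENE_B and GENE_C
core_by_type["type_1"] = {"GENE_A", "GENE_B"}  # lacks GENE_C

# GENE_A is core in 13 types, GENE_B in 12, GENE_C in 11 and GENE_D in 1.
result = pan_cancer_core(core_by_type, min_types=12)
```

With this toy input only the genes that are core in 12 or 13 cancer types are retained; genes that are core in fewer types remain candidates for context-specific dependencies.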

Of the pan-cancer core fitness genes identified using ADaM, 399 were previously defined as essential genes8,9 and 125 are genes involved in essential cellular processes10,11 (Extended Data Fig. 4a). The remaining 132 (24%) genes were newly identified and are also significantly enriched in cellular housekeeping genes and pathways (Extended Data Fig. 4b, c and Supplementary Table 4). In comparison to previously identified reference core fitness gene sets8,9, our pan-cancer core fitness gene set showed greater recall of genes involved in essential processes (median = 67%, versus 28% and 51% in the previously published gene sets of refs. 8 and 9, respectively, Extended Data Fig. 4d), with similar false discovery rates (FDRs) for putative context-specific fitness genes (taken from a previous study12; Extended Data Fig. 4e). Blood cancer cell lines had the most distinctive profile of core fitness genes (31 exclusive core fitness genes; Extended Data Fig. 4f). Cancer-type-specific core fitness genes are generally highly expressed in matched healthy tissues (Extended Data Fig. 4g), consistent with their predicted role in fundamental cellular processes and suggesting that they show potential toxicity if used as targets. Notably, five genes were core fitness in a single cancer type and were lowly or not expressed at the basal level in the matched normal tissues (Extended Data Fig. 4g), suggesting that they could induce cancer-cell-specific dependencies in these tissues.

Overall, using a statistical approach, we refined and expanded our current knowledge of core fitness genes in humans and identified genes that have a high likelihood of toxicity, which thus represent less favourable therapeutic targets. Furthermore, owing to the large scale of our dataset, we could now define context-specific fitness genes (median n = 2,813 genes per cancer type), many of which had a loss-of-fitness effect that was similar to or stronger than core fitness genes (Fig. 1c).

A quantitative framework for target prioritization

To nominate promising therapeutic targets from our list of context-specific fitness genes, we developed a computational framework that integrated multiple lines of evidence to assign each gene a target priority score—which ranged from 0 to 100—and generated ranked lists of candidates for an individual cancer type or a pan-cancer candidate (Fig. 1a and Extended Data Fig. 5a). To exclude genes that are likely to be poor targets because of potential toxicity, core fitness genes were scored as ‘0’, as were potential false-positive genes, such as genes that were not expressed or homozygously deleted. For each gene, 70% of the priority score was derived from CRISPR–Cas9 experimental evidence and averaged across dependent cell lines on the basis of the fitness effect size, the significance of fitness deficiency, target gene expression, target mutational status and evidence for other fitness genes in the same pathway. The remaining 30% of the priority score was based on evidence of a genetic biomarker that was associated with a target dependency and the frequency at which the target was somatically altered in tumours in patients7. For the biomarker analysis, we performed an analysis of variance (ANOVA; Fig. 2a, Extended Data Fig. 5b and supplementary data 1) to test associations between fitness genes and the presence of 484 cancer driver events (151 single-nucleotide variants and 333 copy number variants)7 or MSI, in each cancer type with a sufficiently large sample size (n ≥ 10 cell lines) and pan-cancer. We derived a priority score threshold (55 and 41 for pan-cancer and cancer-type-specific analyses, respectively) based on scores calculated for targets with approved or preclinical cancer compounds (Extended Data Fig. 5c and Supplementary Table 5).
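The weighting described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scoring code used in the study: only the 70%/30% split, the zero score for core fitness and likely false-positive genes, and the two thresholds come from the text. How the individual lines of evidence are normalized is not specified here, so the `[0, 1]` sub-scores and the `priority_score` function are hypothetical.

```python
from statistics import mean

# Thresholds quoted in the text for calling a gene a priority target.
PAN_CANCER_THRESHOLD = 55
CANCER_TYPE_THRESHOLD = 41

def priority_score(crispr_evidence, tumour_evidence,
                   is_core_fitness=False, likely_false_positive=False):
    """Combine evidence into a 0-100 score (illustrative weighting only).

    crispr_evidence: per-cell-line scores in [0, 1], one per dependent cell
        line (hypothetical normalisation of effect size, significance, etc.).
    tumour_evidence: single score in [0, 1] summarising biomarker strength and
        alteration frequency in patient tumours (hypothetical).
    """
    # Core fitness genes and likely false positives are scored as 0.
    if is_core_fitness or likely_false_positive or not crispr_evidence:
        return 0.0
    crispr_part = 0.70 * mean(crispr_evidence)  # averaged across dependent lines
    tumour_part = 0.30 * tumour_evidence
    return 100.0 * (crispr_part + tumour_part)

# Hypothetical gene with two dependent cell lines in one cancer type.
score = priority_score([0.8, 0.6], tumour_evidence=0.5)
is_priority = score >= CANCER_TYPE_THRESHOLD
```

Setting excluded genes to zero before any weighting means that strong CRISPR–Cas9 evidence cannot rescue a gene that is flagged as core fitness.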

Fig. 2: Target prioritization and biomarker discovery.

a, Differential dependency biomarkers were analysed by ANOVA. Each point is an association between the fitness effect of a gene (top name) and a molecular feature or MSI (bottom name). Colours indicate results from 13 cancer-type-specific (number of cell lines indicated in Supplementary Table 1) or pan-cancer (n = 319) analyses. FDRs were calculated using the Storey–Tibshirani method. b, Cancer-type-specific and pan-cancer priority targets classified based on tractability for drug development as groups 1, 2 and 3 (strong, weak and absence of evidence, respectively). c, Priority targets with a genomic biomarker defined as class A, B or C (from strongest to weakest, based on statistical significance and effect size).

In total, we identified 628 unique priority targets, including 92 pan-cancer and 617 cancer-type-specific targets (Fig. 2b and Supplementary Tables 6, 7). The number of priority targets varied approximately threefold across cancer types with a median of 88 targets. The majority of cancer-type priority targets (n = 457, 74%) were identified in only one (56%) or two (18%) cancer types, underscoring their context specificity. Most priority pan-cancer targets (88%) were also identified in the cancer-type-specific analyses (Extended Data Fig. 5d). The 11 priority targets that were identified only in the pan-cancer analysis typically included dependencies that occurred in a small subset of cell lines from multiple cancer types (for example, CREBBP and JUP) or in a cancer type for which the limited numbers of available cell lines prevented a cancer-type-specific analysis being performed (for example, SOX10 in melanoma; Extended Data Fig. 5e).

Of the 628 priority targets, 120 (19%) were associated with at least one biomarker identified using ANOVA with high significance and large effect size (defined as class A targets) and these proteins would therefore be of particular interest for drug development (Fig. 2c). For example, PIK3CA is a class A target in breast, oesophageal, colorectal and ovarian carcinoma; PI3K inhibitors are in clinical development for cancers with mutations in PIK3CA13. Using progressively less stringent significance thresholds expanded the targets with at least one biomarker association as identified by ANOVA, which were defined as class B (n = 61, 10%) followed by class C (n = 117, 19%) targets, some of which were identified in multiple cancer types (Supplementary Table 8). Taken together, these results highlight the potential of a data-driven quantitative framework to prioritize targets by combining CRISPR–Cas9 screening data from multiple cell lines and associated genomic features.

Tractability assessment of priority targets

On the basis of current drug-development strategies, targets vary in their suitability for pharmaceutical intervention and this informs target selection. Using a target tractability assessment for the development of small molecules and antibodies, we previously assigned each gene to 1 of 10 tractability buckets (with 1 indicating the highest tractability)14. We cross-referenced the 628 priority targets with their tractability and categorized them into three tractability groups (Fig. 2b and Supplementary Table 9).

Tractability group 1 (buckets 1–3) comprised targets of approved anticancer drugs or compounds in clinical or preclinical development, and included 40 unique priority targets, such as ERBB2, ERBB3, CDK4, AKT1, ESR1, TYMS and PIK3CB in breast carcinoma and PIK3CA, IGF1R, MTOR and ATR in colorectal carcinoma (Figs. 3a, 4 and Extended Data Fig. 6). Of these 40 priority targets, 20 have at least one drug that has been developed for the cancer type in which the target was identified as priority, whereas the remaining 20 targets have drugs that have been used or developed for treatment of other cancer types, which present opportunities for the repurposing of these drugs. A third of the priority targets in group 1 have a class A biomarker, indicating that they are highly desirable targets (Supplementary Tables 8, 9). An example is CSNK2A1, which is encoded by the highly significant fitness gene CSNK2A1 in colorectal cancer cell lines with amplification of a chromosomal segment that contains FLT3 and WASF3 (P = 6.65 × 10−6, Glass’s Δ > 2.9, Fig. 3b) and targeted by silmasertib. Other priority targets in group 1 with markers show ERBB2 or ERBB3 dependency in the presence of ERBB2 amplification, CDK2 dependency in ASXL-amplified oesophageal cancer cell lines, PIK3CA dependency in the presence of PIK3CA mutations, and PIK3CB dependency in breast cancer cell lines with PTEN mutations (Fig. 3b and supplementary data 1).

Fig. 3: Priority targets and biomarker-linked dependencies.

a, All priority targets from cancer-type and pan-cancer analyses and their tractability. Priority score thresholds are indicated and selected examples labelled. b, Differential fitness analysis (quantile-normalized gene depletion fold change between the average of targeting sgRNAs versus plasmid library) for selected priority targets comparing cells with (+) or without (−) a genomic marker (classes A–C as previously defined from ANOVAs). Each data point is a cell line and colours represent cancer type. Box-and-whisker plots show 1.5× interquartile ranges and 5–95th percentiles, centres indicate medians.

Fig. 4: Cancer-type priority targets.

Results for 4 of the 13 cancer-type-specific analyses. Points are target priority scores and the shapes indicate approved or preclinical compounds to the corresponding target (other disease (squares), anticancer targets (triangles) or those specific to the cancer type considered (rhombus)), or the absence of a compound (circles). Symbols indicate the strength of a genomic biomarker. Selected priority targets are labelled.

Tractability group 2 (buckets 4–7) contained 277 priority targets without drugs in clinical development but with evidence that supports target tractability (Figs. 3a, 4, Extended Data Fig. 6 and Supplementary Table 9). Of these, 18% have a class A biomarker, including KRAS dependency in KRAS-mutant cell lines, USP7 dependency in APC wild-type colorectal cell lines, KMT2D dependency in breast cancer cell lines with amplification of a chromosomal segment that contains PPM1D and CLTC, and TRIAP1 dependency in MYC-amplified bone and gastric cancer cell lines (Fig. 3b and supplementary data 1). Of note, we observed a class A biomarker-type dependency on WRN in colorectal and ovarian cell lines with MSI and pan-cancer (Fig. 3b). Of the priority targets in group 2 that were not associated with a biomarker, GPX4 is a target in multiple cancer types (Fig. 4, Extended Data Fig. 6 and Supplementary Table 9). Sensitivity to GPX4 inhibition has been associated with epithelial–mesenchymal transition15 and we observed differential expression of markers associated with epithelial–mesenchymal transition in GPX4-dependent cell lines (Extended Data Fig. 7a and supplementary data 2). This is indicative of how future refinement of our target prioritization scheme can capture priority targets that are associated with an expanded set of molecular features, including gene expression, chromatin modifications and differentiation states.

Lastly, group 3 (buckets 8–10) included 311 priority targets that had no support or a lack of information that could inform tractability (Figs. 3a, 4 and Extended Data Fig. 6); this group is significantly enriched in transcription factors (Extended Data Fig. 7b and supplementary data 3). Examples of priority targets in group 3 with class A biomarkers include FOXA1 and GATA3 in breast cancer, MYB in haematological and lymphoid cancer, STX5 in ovarian cancer and PFDN5 in neuroblastoma cell lines (Fig. 3b).
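The mapping from the 10 tractability buckets to the three groups used above is simple enough to state as code. The bucket ranges are those given in the text; the function name `tractability_group` is a hypothetical helper introduced for illustration.

```python
def tractability_group(bucket):
    """Map a tractability bucket (1 = most tractable, 10 = least) to a group.

    Group 1: buckets 1-3  (approved drugs or compounds in development)
    Group 2: buckets 4-7  (no clinical compounds, but supporting evidence)
    Group 3: buckets 8-10 (no support or insufficient information)
    """
    if not 1 <= bucket <= 10:
        raise ValueError(f"bucket must be between 1 and 10, got {bucket}")
    if bucket <= 3:
        return 1
    if bucket <= 7:
        return 2
    return 3

groups = [tractability_group(b) for b in range(1, 11)]
```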

Priority targets in tractability group 1 were enriched in protein kinases, highlighting a major focus of drug development against this class of targets, compared to groups 2 and 3, which included a more functionally diverse set of targets (Extended Data Fig. 7b and supplementary data 3). Targets in group 2 are most likely to be novel and tractable through conventional modalities and, therefore, represent good candidates for drug development. Newer therapeutic modalities, such as proteolysis-targeting chimaeras, may increase the range of proteins that are amenable to pharmaceutical intervention to include targets in group 3. Overall, our framework informed a data-driven list of prioritized therapeutic targets that would be strong candidates for the development of cancer drugs.

WRN is a target in cancers with MSI

To substantiate our target prioritization strategy, we investigated WRN helicase as a promising target in MSI cancers (Figs. 3, 4). WRN is one of five RecQ family DNA helicases, of which it is the only one that has both a helicase and an exonuclease domain, and has diverse roles in DNA repair, replication, transcription and telomere maintenance16. The MSI phenotype is caused by impaired DNA mismatch repair (MMR) due to silencing or inactivation of MMR pathway genes. MSI is associated with a high mutational load and occurs in more than 20 tumour types and is frequent in colon, ovarian, endometrial and gastric cancers (3–28%)17.

Dependency on WRN was highly associated with MSI in the pan-cancer ANOVA, and analyses of colon and ovarian cancer cell lines (Figs. 2a, 3b and supplementary data 1). Most endometrial and gastric cancer cell lines with MSI were dependent on WRN; however, the association with MSI was not significant (for gastric) or not tested because of small sample sizes (Extended Data Fig. 7c). MSI is rare (<1%) in many other tumour types, such as kidney, melanoma and prostate cancers17 and most (4 out of 5 tested) MSI cell lines from these tissues were not dependent on WRN (Extended Data Fig. 7c). Other tested RecQ family members (BLM, RECQL and RECQL5) were not associated as fitness genes in MSI cell lines. A focused analysis of non-synonymous mutations, promoter methylation and homozygous deletions of MMR pathway genes confirmed a significant association between WRN dependency and hypermethylation of the MLH1 promoter (Student’s t-test, FDR = 7.72 × 10−3) or mutations in MSH6 (FDR = 3.85 × 10−2); as well as mutations in the epigenetic regulator MLL2 (also known as KMT2D) (FDR = 1.43 × 10−4) (Fig. 5a).

Fig. 5: WRN is a target in MSI cancer cells.

a, Circle plot of cell lines. From the outer ring to inner ring the following are shown: the fitness effect of WRN knockout and mean effect of core fitness genes (red dashed line); cancer-type; MLH1 methylation (meth.) status; mutation (mut.) status of MLL2 and MSH6; and the DNA mutation rate. b, WRN dependency in a co-competition assay. sgRNAs that target essential (sgEss) and non-essential (sgNon) genes were used as controls. Each point represents the mean co-competition score for a cell line (seven MSI and seven microsatellite stable (MSS) lines in duplicate); four WRN sgRNA guides were used. A score less than 1 denotes selective depletion of sgRNA-expressing knockout cells. Box-and-whisker plots show 1.5× the interquartile range and the median. P values were determined using a two-sided Welch’s t-test. c, WRN rescue using wild-type (WT), exonuclease-deficient (def.) or helicase-deficient mouse Wrn in SW48 cells with MSI. Mean ± s.d. from 3 independent experiments. P values were calculated using a standard two-sided t-test assuming equal variance; comparison to wild-type Wrn. NS, not significant. d, Tumour volume of WRN sgRNA-expressing HCT116 (clone a) xenografts treated with doxycycline (yellow line) or vehicle (grey line). P = 0.006, two-way ANOVA. Data are mean ± s.e.m. Numbers of mice in each cohort are indicated. e, Representative KI-67 immunohistochemistry assessment of WRN sgRNA-expressing HCT116 (clone a) tumours explanted after one week. Scale bar, 50 μm; 40× magnification. f, Quantification of KI-67 staining. Data are mean ± s.d. of 10 fields from three different samples. n = 30; P values were calculated using a two-sided Welch’s t-test.

To further validate WRN, we performed CRISPR-based co-competition assays in which the relative fitness of WRN-knockout versus wild-type cells was compared. WRN knockout using four individual sgRNAs decreased fitness of WRN-knockout compared to wild-type cells in six MSI cell lines from colon, ovarian, endometrial and gastric cancers (Fig. 5b, Extended Data Fig. 8a and Supplementary Table 10). By contrast, there was no difference in any of the microsatellite-stable cell lines from these four tissues. Consistently, WRN was selectively essential for MSI cells in clonogenic assays (Extended Data Fig. 8b, c). Of note, WRN knockout had a potent effect on cell fitness with an effect size similar to core fitness genes (Fig. 5a, b). Furthermore, we mined data from systematic RNA interference screens and confirmed WRN dependency in MSI cancer cell lines12 (Extended Data Fig. 8d), and confirmed that WRN downregulation by RNA interference robustly impaired growth in MSI HCT116 cells (Extended Data Fig. 8e, f), thus providing validation in an orthogonal experimental system. Despite the strong association between MMR deficiency and WRN dependency, knockout of MLH1 in the microsatellite-stable SW620 cell line did not induce WRN dependency; conversely, complementation of HCT116 cells with chromosomes that contain MLH1 and/or MSH3 (to restore their expression and correct MMR deficiency18) did not revert the effect of WRN knockout (Extended Data Fig. 9).

To determine whether the loss-of-fitness effect was selective to WRN and identify a potential strategy for drug targeting, we performed functional rescue experiments using wild-type, or hypomorphic versions of mouse Wrn (resistant to the WRN sgRNAs that we used) with a mutation in the exonuclease (E78A) or helicase (R799C or T1052G) domain to impair protein function19,20,21. Expression of wild-type or exonuclease-deficient Wrn rescued knockout of WRN in MSI cells, whereas expression of helicase-deficient Wrn led to no (R799C) or weak (T1052G) rescue (Fig. 5c and Extended Data Fig. 10a, b). Thus, the helicase activity of WRN, but not its exonuclease activity, is required for the fitness of MSI cells, which identifies the helicase domain as the relevant site for therapeutic targeting.

To evaluate in vivo sensitivity of MSI cells to WRN depletion, we developed a doxycycline-inducible WRN sgRNA system in HCT116 cells (Extended Data Fig. 10c, d). Following subcutaneous engraftment of WRN sgRNA-expressing HCT116 cells in mice, treatment with doxycycline led to significant growth suppression of established tumours and a reduction in the number of proliferating cells (Fig. 5d–f and Extended Data Fig. 10e, f). These findings confirm that WRN is necessary to sustain in vivo growth of colorectal cancer cells with MSI.

Discussion

New approaches are needed to effectively prioritize candidate therapeutic targets for cancer treatments. We performed CRISPR–Cas9 screens in a diverse collection of cancer cell lines and combined this with genomic and tractability data to systematically nominate new cancer targets in an unbiased way. Confirmatory studies are necessary to further evaluate the priority targets that we identified. Even a modest improvement in drug-development success rates, and an expanded repertoire of targets, through approaches such as ours could provide benefits to patients with cancer. Our CRISPR–Cas9 screening results are also a resource with diverse applications in fundamental and evolutionary biology, genome engineering and disease genetics. Results are available through the project Score database (https://score.depmap.sanger.ac.uk/).

We identified WRN as a promising new synthetic lethal target in MSI tumours. This finding is corroborated by the accompanying study by Chan et al.22. WRN physically interacts with MMR proteins23, can resolve DNA recombination intermediates24, and the yeast homologue Sgs1 has a redundant function with MMR proteins to suppress homeologous recombination in regions of nucleotide mismatch25. Together with our finding that modulation of MMR proteins alone is insufficient to confer WRN dependency, this suggests a model in which WRN is required to resolve the genomic structures present in MMR-deficient cells, which are possibly homeologous recombination structures, and failure to efficiently resolve these underpins the synthetic lethal dependency. Mutation of WRN leads to Werner syndrome, an autosomal recessive disorder characterized by premature ageing and an increased risk of cancer16. Thus, loss of WRN is compatible with human development; however, targeting WRN could result in damage to normal cells. Consideration should be given to maximizing therapeutic benefits through patient selection and dose scheduling. A possible route for clinical development of WRN antagonists would be as an adjunct therapy to approved immune checkpoint inhibitors in MSI tumours26.

In summary, we developed an unbiased and systematic framework that effectively ranks priority targets, such as WRN. Efforts such as ours, and from others5,8,12,22,27,28, to build a compendium of fitness genes, and the identification of context-specific dependencies as part of a cancer dependency map, could be transformative to improve success rates in the development of cancer drugs.

Methods

CRISPR–Cas9 screening

Plasmids

All plasmids have previously been described10 and are available through Addgene (Cas9 vector, 68343; gRNA vector, 67974). Plasmids were packaged using the ViraPower Lentiviral Expression System (Invitrogen, K4975-00) as per the manufacturer’s instructions.

Cell culture

Cell lines used in this study (Supplementary Table 1) were selected from the 1,000-cell-line panel7 of the Genomics of Drug Sensitivity in Cancer study, had been annotated in the Cell Model Passports database (https://cellmodelpassports.sanger.ac.uk/) and were maintained as previously described7. To control for cross-contamination and sample swaps, a panel of 92 single-nucleotide polymorphisms was profiled for each cell line before and following completion of the CRISPR–Cas9 screening pipeline. This study includes commonly misidentified cell lines: Ca9-22, short tandem repeat (STR) analysis confirmed that the identity matched the Japanese Collection of Research Bioresources Cell Bank (JCRB) reference (JCRB0625) and RIKEN (RCB1976); MKN28, noted as derivative of MKN74 in Cell Model Passports and clinical information matches MKN74; KP-1N, known misidentification issue, Cell Model Passports data for both KP-1N and Panc-1 are identical; OVMIU, known misidentification issue, Cell Model Passports data for both OVMIU and OVSAYO are identical; SK-MG-1, STR profile matches JCRB profile, which internally matches Marcus, Cell Model Passports data for both SK-MG-1 and Marcus are identical. Commonly misidentified lines have been noted in Supplementary Table 1 and in Cell Model Passports. All commonly misidentified cell lines were retained, because the misidentification does not impact tissue or cancer type of origin, and all datasets used were generated in-house from the same matched cell line.

A separate set of HCT116 cell lines was used for WRN validation experiments: HCT116 parental cells and HCT116 cells carrying Chr.3 or Chr.5, or both were a gift from M. Koi. HCT116 cells carrying Chr.2 were a gift from A. Goel. HCT116 cells carrying Chr.2 or Chr.3 were maintained in 400 μg ml−1 G418 (Thermo Fisher Scientific, 10131027); HCT116 cells carrying Chr.5 were maintained in 6 μg ml−1 blasticidin (Thermo Fisher Scientific, A1113903); and HCT116 cells carrying Chr.3 + Chr.5 were maintained in the presence of 400 μg ml−1 G418 and 6 μg ml−1 blasticidin. All cells were cultured in McCoy’s 5A medium (Sigma-Aldrich, M4892) with 10% FBS.

Generation of Cas9-expressing cancer cell lines

Cells were transduced with a lentivirus containing Cas9 in T25 or T75 flasks at approximately 80% confluence in the presence of polybrene (8 μg ml−1). Cells were incubated overnight followed by replacement of the lentivirus-containing medium with fresh complete medium. Blasticidin selection commenced 72 h after transduction at an appropriate concentration determined for each cell line using a blasticidin dose–response assay (blasticidin range, 10–75 μg ml−1) and cell viability was assessed using the CellTiter-Glo 2.0 Assay (Promega, G9241). Cas9 activity was assessed as described previously10. Cell lines with Cas9 activity over 75% were used for sgRNA library transduction.

Genome-wide sgRNA library and screen

Two genome-wide sgRNA libraries were used in this study: the Human CRISPR Library v.1.0 and v.1.1. The Human CRISPR Library v.1.0 was described previously and targets 18,009 genes with 90,709 sgRNAs (Addgene, 67989)10. The Human CRISPR Library v.1.1 contains all sgRNAs from v.1.0 plus 1,004 non-targeting sgRNAs and 5 additional sgRNAs against 1,876 selected genes that encode kinases, epigenetic-related proteins and pre-defined fitness genes. An oligo pool of Library v.1.1 was synthesized using high-throughput silicon platform technology (Twist Bioscience) and cloned as described previously10. For consistency, all computational analyses were performed considering only the overlapping sgRNAs between the two libraries (90,709 sgRNAs). Data for the additional sgRNAs in Library v.1.1 can be found in the raw read count files for cell lines screened with this library version (available at https://cog.sanger.ac.uk/cmp/download/raw_sgrnas_counts.zip), but were removed before quality control analysis. The HT-29 cell line was screened with both libraries and resulting datasets were kept separated for comparative analyses (results are summarized in Extended Data Fig. 2j).

A total of 3.3 × 10^7 cells were transduced with an appropriate volume of the lentiviral-packaged whole-genome sgRNA library to achieve 30% transduction efficiency (100× library coverage). The volume was determined for each cell line using a titration of the packaged library and assessing the percentage of blue fluorescent protein (BFP)-positive cells by flow cytometry. Transductions were performed in technical triplicate (or duplicate for cell lines with a large cell size such as glioblastoma). Owing to the large number of screens performed, multiple batches of packaged library virus were prepared. Each batch was tested in HT-29 cells to ensure consistency between batch preparations. In addition, the HT-29 cell line was screened every 3 months to ensure the quality of data generated by the pipeline was consistent. Transduction efficiency was assessed 72 h after transduction. Samples with a transduction efficiency between 15 and 60% were used for puromycin selection. The appropriate concentration of puromycin for each individual cell line was determined from a dose–response curve (puromycin range, 1–5 μg ml−1) and cell viability was assessed using a CellTiter-Glo 2.0 Assay (Promega, G9241). The percentage of BFP-positive cells was reassessed after a minimum of 96 h of puromycin selection. For samples with less than 80% BFP-positive cells, puromycin selection was extended for an additional 3 days and the percentage of BFP-positive cells was assessed again. Cells were maintained until day 14 after transduction with a minimum of 5.0 × 10^7 cells reseeded at each passage (500× library coverage). Approximately 2.5 × 10^7 cells were collected, pelleted and stored at −80 °C for DNA extraction.
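The coverage figures quoted above follow from dividing the number of transduced (or reseeded) cells by the number of sgRNAs in the library. The short Python sketch below reproduces this arithmetic; it assumes that coverage is counted against the 90,709 sgRNAs shared by both library versions, and the `coverage` helper is a hypothetical name used only for illustration.

```python
LIBRARY_SIZE = 90_709  # sgRNAs shared by library v.1.0 and v.1.1 (from the text)

def coverage(cells, n_sgrnas=LIBRARY_SIZE, transduction_efficiency=1.0):
    """Average number of cells carrying each sgRNA."""
    return cells * transduction_efficiency / n_sgrnas

# 3.3 x 10^7 cells at 30% transduction efficiency: roughly 109 cells per sgRNA,
# in line with the quoted 100x coverage.
at_transduction = coverage(3.3e7, transduction_efficiency=0.30)

# 5.0 x 10^7 cells reseeded at each passage: roughly 551 cells per sgRNA,
# in line with the quoted 500x coverage.
at_passage = coverage(5.0e7)
```

Both values sit slightly above the quoted figures, which suggests that 100× and 500× are stated as rounded minimum targets rather than exact ratios.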

DNA extraction, sgRNA PCR amplification, Illumina sequencing and sgRNA counting

Genomic DNA was extracted from cell pellets using either the QIAsymphony automated extraction platform (Qiagen, QIAsymphony DSP DNA Midi Kit, 937255) or by manual extraction (Qiagen, Blood & Cell Culture DNA Maxi Kit, 13362) as per the manufacturer’s instructions. PCR amplification, Illumina sequencing (19-bp single-end sequencing with custom primers on the HiSeq2000 v.4 platform) and sgRNA counting were performed as described previously10.

CRISPR screen data analyses

Low-level quality control assessment and filtering

To perform initial low-level quality control, the Pearson’s correlation of treatment counts between replicates was assessed for each cell line (Extended Data Fig. 1c). The resulting correlation scores were generally high (median = 0.8), but not sufficiently distinguishable from expectation (median correlation between replicates of any pair of randomly selected cell lines). Thus, to define a reproducibility threshold, we developed an approach based on a previously published study29. Specifically, we selected a set of the 838 most-informative sgRNAs, defined as those with an average pairwise Pearson’s correlation greater than 0.6 between corresponding patterns of the count fold changes 14 days after transduction versus plasmid library across all screened cell lines. We next computed average gene-level profiles for 308 genes targeted by these informative sgRNAs for each individual technical replicate, and then computed all possible pairwise Pearson’s correlation scores between the resulting profiles. This enabled the estimation of a null distribution of replicate correlations (plotted in grey in Extended Data Fig. 1d). We then defined a reproducibility threshold R value of 0.68, for which the estimated probability mass function of the correlation scores computed between replicates of the same cell line (considering the identified 308 genes only) was at least twice the null probability mass function (Extended Data Fig. 1d). Of the 332 screened cell lines with at least two technical replicates, 305 had an average replicate correlation higher than this threshold, and therefore passed the reproducibility assessment; for 7 cell lines there were no replicates.
Excluding the least reproducible replicate for the 14 cell lines that did not pass the first reproducibility assessment allowed their average replicate correlation to exceed the threshold defined above, thus resulting in a set of 326 cell lines that passed the low-level quality control assessment (Supplementary Table 1).
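The per-cell-line reproducibility check described above can be sketched as follows. This is an illustrative outline only, not the pipeline's code: it assumes each replicate is represented by a gene-level fold-change profile (in the study, restricted to the 308 genes targeted by informative sgRNAs), that the least reproducible replicate is the one with the lowest mean correlation to the others, and it uses toy vectors; the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

R_THRESHOLD = 0.68  # reproducibility threshold reported in the text


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_replicate_correlation(profiles):
    """Mean pairwise Pearson correlation between replicate profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def reproducibility_qc(profiles, threshold=R_THRESHOLD):
    """Return (passed, indices of replicates kept).

    If the full set fails and at least three replicates exist, the
    replicate with the lowest mean correlation to the others is dropped
    and the remaining replicates are re-tested.
    """
    if len(profiles) < 2:
        raise ValueError("at least two replicates are needed")
    indices = list(range(len(profiles)))
    if average_replicate_correlation(profiles) > threshold:
        return True, indices
    if len(profiles) < 3:
        return False, indices

    def mean_corr_with_others(i):
        others = [j for j in indices if j != i]
        return sum(pearson(profiles[i], profiles[j]) for j in others) / len(others)

    worst = min(indices, key=mean_corr_with_others)
    kept = [i for i in indices if i != worst]
    passed = average_replicate_correlation([profiles[i] for i in kept]) > threshold
    return passed, kept


# Toy profiles: replicates 0 and 1 agree, replicate 2 does not.
rep_a = [-2.0, -1.5, -0.1, 0.2, -0.8, 0.1]
rep_b = [-1.9, -1.4, 0.0, 0.1, -0.9, 0.2]
rep_c = [0.3, -0.2, -1.0, 0.4, 0.1, -0.6]
passed, kept = reproducibility_qc([rep_a, rep_b, rep_c])
print(passed, kept)
```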

Screening performance assessment

We considered the genome-wide profiles of gene-level sgRNA fold change values (averaged across targeting sgRNAs and replicates) of each cell line to be a classifier of predefined sets of essential and non-essential genes30 by means of receiver operating characteristic (ROC) indicators (Extended Data Fig. 1g and Supplementary Table 1). In addition, we measured the magnitude of the depletion signal observed in each screened cell line by evaluating the median log(change in sgRNA count), and the discriminative distance between their distributions (as measured by the Glass’s Δ) for predefined essential and non-essential genes30 and ribosomal protein genes31. In total, 2 out of the 326 cell lines were manually removed, because they had area under the ROC curve, area under the precision/recall curve and both Glass’s Δ values that were 3 s.d. lower than the average. On the basis of our low-level quality control and screening performance, the final analysis set was composed of 324 cell lines (Supplementary Table 1). Further details on these analyses are included in the Supplementary Information.
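The two kinds of indicator used here can be illustrated with a short sketch. It assumes that more negative fold changes mean stronger depletion, computes the area under the ROC curve through its rank-based (Mann–Whitney) form, and scales the Glass’s Δ by the s.d. of the essential-gene group; the choice of reference group and the toy values are assumptions, not details taken from the text.

```python
from statistics import mean, stdev


def auroc(essential_fc, nonessential_fc):
    """Probability that a random essential gene is more depleted than a
    random non-essential gene (ties count as one half)."""
    wins = 0.0
    for e in essential_fc:
        for n in nonessential_fc:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(essential_fc) * len(nonessential_fc))


def glass_delta(essential_fc, nonessential_fc):
    """Difference between group means in units of the essential-group s.d.
    (which group supplies the s.d. is an assumption)."""
    return (mean(nonessential_fc) - mean(essential_fc)) / stdev(essential_fc)


# Toy gene-level log fold changes (hypothetical values).
essential = [-2.1, -1.8, -2.5, -0.9]
nonessential = [0.1, -0.2, 0.3, -1.0]

area = auroc(essential, nonessential)
delta = glass_delta(essential, nonessential)
print(f"AUROC = {area:.4f}, Glass's delta = {delta:.2f}")
```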

sgRNA count preprocessing and CRISPR-bias correction

The analysis set of 324 cell lines was further processed using CRISPRcleanR32 (https://github.com/francescojm/CRISPRcleanR). sgRNAs with fewer than 30 reads in the plasmid counts and sgRNAs belonging only to Library v.1.1 were first removed. The remaining sgRNAs were assembled into one file per cell line, including the read counts from the matching library plasmid and all replicates, and then normalized using a median–ratio method to adjust for the effect of library sizes and read count distributions33. Depletion/enrichment fold changes for individual sgRNAs were quantified between post library-transduction read counts and library plasmid read counts at the individual replicate level. This was performed using the ccr.NormfoldChanges function of CRISPRcleanR. Next we performed a correction of gene-independent responses to CRISPR–Cas9 targeting34 using the ccr.GWclean function of CRISPRcleanR with default parameters.
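The filtering, normalization and fold-change steps can be sketched conceptually as below. This is not the CRISPRcleanR implementation (which is an R package); it is a minimal Python illustration of a median-ratio normalization followed by log2 fold changes against the plasmid, and the pseudocount, function names and toy counts are assumptions.

```python
from math import exp, log, log2
from statistics import median


def median_ratio_size_factors(samples):
    """Size factor per sample: median ratio of its counts to the per-sgRNA
    geometric mean across samples (sgRNAs with a zero count are skipped)."""
    names = list(samples)
    n_guides = len(samples[names[0]])
    usable = [i for i in range(n_guides) if all(samples[s][i] > 0 for s in names)]
    log_geo = {i: sum(log(samples[s][i]) for s in names) / len(names) for i in usable}
    return {s: exp(median(log(samples[s][i]) - log_geo[i] for i in usable))
            for s in names}


def log_fold_changes(samples, plasmid="plasmid", min_plasmid_reads=30,
                     pseudocount=0.5):
    """Drop sgRNAs with fewer than min_plasmid_reads plasmid reads, normalize,
    and return (kept sgRNA indices, log2 fold changes per replicate)."""
    keep = [i for i, c in enumerate(samples[plasmid]) if c >= min_plasmid_reads]
    filtered = {s: [v[i] for i in keep] for s, v in samples.items()}
    factors = median_ratio_size_factors(filtered)
    norm = {s: [c / factors[s] for c in v] for s, v in filtered.items()}
    lfc = {
        s: [log2((t + pseudocount) / (p + pseudocount))
            for t, p in zip(norm[s], norm[plasmid])]
        for s in norm if s != plasmid
    }
    return keep, lfc


# Toy read counts for four sgRNAs (hypothetical values).
counts = {
    "plasmid": [100, 200, 20, 400],
    "rep1": [200, 400, 10, 80],
    "rep2": [100, 200, 5, 50],
}
kept, lfc = log_fold_changes(counts)
print(kept, [round(v, 2) for v in lfc["rep1"]])
```

In this toy example the third sgRNA is removed by the plasmid read filter, the first two sgRNAs show no change after normalization and the last one is strongly depleted.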

Calling CRISPR–Cas9 gene knockout fitness effects

The CRISPRcleanR-corrected sgRNA-level values (corrected fold change values) were used as input into an in-house-generated R implementation of the BAGEL method30 to call significantly depleted genes (code publicly available at https://github.com/francescojm/BAGELR). Our BAGEL implementation computes gene-level Bayesian factors by averaging, rather than summing, the sgRNA-level Bayesian factors on a targeted-gene basis. Additionally, it uses reference sets of predefined essential and non-essential genes30. However, in order to avoid their status (essential or non-essential) being defined a priori, we removed any high-confidence cancer driver genes as defined previously7 from these sets. The resulting curated reference gene sets are available as built-in data objects in the R implementation of BAGEL (curated_BAGEL_essential.rdata and curated_BAGEL_nonEssential.rdata, both available at https://github.com/francescojm/BAGELR/tree/master/data). A statistical significance threshold for gene-level Bayesian factors was determined for each cell line as described previously8. Each gene was assigned a scaled Bayesian factor computed by subtracting the Bayesian factor at the 5% FDR threshold defined for each cell line from the original Bayesian factor, and a binary fitness score equal to 1 if the resulting scaled Bayesian factor was greater than 0. Further details on these analyses are included in the Supplementary Information.
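The scaling and binarization step can be illustrated as follows. The sketch assumes one plausible way of locating the 5% FDR threshold, namely ranking the reference genes by Bayesian factor and taking the lowest value at which the fraction of non-essential genes ranked at or above it is still at most 5%; the exact procedure is the one cited in the text, and the gene labels and values here are hypothetical.

```python
def bf_threshold_at_fdr(bf, essential, nonessential, fdr=0.05):
    """Lowest Bayesian factor at which the reference genes ranked at or
    above it contain at most `fdr` non-essential genes (assumed procedure)."""
    ref = sorted((g for g in bf if g in essential or g in nonessential),
                 key=lambda g: bf[g], reverse=True)
    tp = fp = 0
    threshold = None
    for gene in ref:
        if gene in essential:
            tp += 1
        else:
            fp += 1
        if fp / (tp + fp) <= fdr:
            threshold = bf[gene]
    if threshold is None:
        raise ValueError("no Bayesian factor satisfies the requested FDR")
    return threshold


def fitness_scores(bf, threshold):
    """Scaled Bayesian factor (BF minus threshold) and binary fitness score
    (1 if the scaled Bayesian factor is greater than 0)."""
    scaled = {g: v - threshold for g, v in bf.items()}
    binary = {g: int(s > 0) for g, s in scaled.items()}
    return scaled, binary


# Toy Bayesian factors: E* essential, N* non-essential, X and Y untested genes.
bf = {"E1": 12, "E2": 8, "E3": 5, "N1": 4, "E4": 1, "N2": -3, "X": 6, "Y": -1}
threshold = bf_threshold_at_fdr(bf, {"E1", "E2", "E3", "E4"}, {"N1", "N2"})
scaled, binary = fitness_scores(bf, threshold)
print(threshold, scaled["X"], binary["X"], binary["Y"])
```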

In addition, CRISPRcleanR-corrected sgRNA treatment counts were derived from the corrected sgRNA-level count fold changes (using the ccr.correctCounts function of CRISPRcleanR) and used as input into MAGeCK35 to compute the depletion significance using mean–variance modelling. This was performed using the MAGeCK Python package (version 0.5.3), specifying in the command line call that no normalization was required (as this was already performed by CRISPRcleanR). At the end of this stage, the following gene-level depletion score matrices were produced for each cell line: raw count fold changes, copy number bias-corrected count fold changes, Bayesian factors, scaled Bayesian factors, binary fitness scores and MAGeCK depletion FDRs. All scores are summarized for each cell line and available at https://cog.sanger.ac.uk/cmp/download/essentiality_matrices.zip, together with all the sgRNA raw count files (available at https://cog.sanger.ac.uk/cmp/download/raw_sgrnas_counts.zip).

High-level CRISPR screen data analyses

Adaptive daisy model (ADaM) to identify core fitness genes

We designed the adaptive daisy model (ADaM), a heuristic algorithm for the identification of core fitness genes, implemented it in an R package and made it publicly available at https://github.com/francescojm/ADaM. ADaM is based on the daisy model8, but it adaptively determines the minimal number of cell lines m from a given cancer type in which a gene should exert a significant fitness effect for that gene to be considered a core fitness gene for that cancer type. ADaM is described further in the Supplementary Information. To identify pan-cancer core fitness genes, we applied the same method to determine the minimal number k of cancer types in which a gene should be predicted as a core fitness gene for it to be considered a pan-cancer core fitness gene.

Characterization of ADaM pan-cancer core fitness genes

Reference sets of essential and non-essential genes were extracted from a previously published study30. Other reference gene sets (used while characterizing the ADaM pan-cancer core fitness genes, described below) were derived from the Molecular Signature Database (MSigDB36) and post-processed as described previously32. A more recent set of a priori known essential genes was derived from a previously published study9. The pan-cancer core fitness genes that did not belong to any of the aforementioned gene sets were tested for gene family enrichments (using a hypergeometric test) by deriving gene annotations using the BioMart R package37 and biological pathway enrichments using a comprehensive collection of pathways gene sets from Pathway Commons38 (post-processed to reduce redundancies across different sets as described previously39). All enrichment P values were corrected using the Benjamini–Hochberg method. Results are shown in Supplementary Table 4.
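For readers unfamiliar with the enrichment statistics mentioned above, a minimal sketch of a one-sided hypergeometric test followed by Benjamini–Hochberg correction is given below. It is a generic illustration using only the Python standard library, not the code used in the study, and the function names and example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe, gene_set_size, query_size, overlap):
    """P(X >= overlap) when query_size genes are drawn without replacement
    from a universe containing gene_set_size annotated genes."""
    upper = min(gene_set_size, query_size)
    favourable = sum(
        comb(gene_set_size, k) * comb(universe - gene_set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(universe, query_size)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: all 3 query genes fall in a 3-gene family out of 10 genes.
p_family = hypergeom_enrichment_p(universe=10, gene_set_size=3,
                                  query_size=3, overlap=3)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
print(p_family, adjusted)
```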

Comparison between the ADaM pan-cancer core fitness genes and other reference sets of essential genes

We compared the pan-cancer core fitness genes identified by ADaM with the BAGEL reference set of essential genes30, and a more recently proposed larger set of essential genes9 in terms of size, estimated precision (number of included true positive genes/number of included genes) and recall (number of included true positive genes/total number of true positive genes). In these comparisons, we used gold-standard essential genes involved in cell essential processes (downloaded from the MSigDB36 and post-processed as described previously32). In addition, we estimated FDRs for the three gene sets (number of included false positive genes/total number of false positive genes) considering genes predicted to be strongly context-specific essential (thus not core-fitness essential) to be false-positive genes according to a previous publication12, and using three different confidence levels, as further described in the Supplementary Information.
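The precision and recall definitions given in parentheses above translate directly into set operations. The sketch below uses a toy predicted set and a toy gold-standard set; the gene lists are placeholders chosen for illustration and are not results from the study.

```python
def precision_recall(predicted, gold_standard):
    """Precision = true positives / predicted genes;
    recall = true positives / gold-standard genes."""
    predicted, gold_standard = set(predicted), set(gold_standard)
    true_positives = predicted & gold_standard
    return (len(true_positives) / len(predicted),
            len(true_positives) / len(gold_standard))


# Placeholder gene sets for illustration only.
predicted_core = {"RPL3", "POLR2A", "KRAS", "SF3B1"}
gold = {"RPL3", "POLR2A", "SF3B1", "PSMA1", "RPS6"}

precision, recall = precision_recall(predicted_core, gold)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```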

Basal expression of cancer-type specific core fitness genes in normal tissues

Basal gene median reads per kilobase of transcript per million mapped reads in normal human tissues were downloaded from the GTEx Portal40, log-transformed and quantile-normalized on a tissue-type basis.

Statistical and computational analyses

ANOVA to identify genomic correlates with gene fitness

We performed a systematic ANOVA to test associations between gene-level fitness effects and the presence of 484 cancer driver events (CDEs; 151 single-nucleotide variants and 333 copy number variants)7 or MSI status at the pan-cancer as well as individual cancer-type levels. In total, 10 cancer types with at least 10 screened cell lines were analysed (breast carcinoma, colorectal carcinoma, gastric carcinoma, head and neck carcinoma, lung adenocarcinoma, neuroblastoma, oral cavity carcinoma, ovarian carcinoma, pancreatic carcinoma and squamous cell lung carcinoma). The remaining cancer types were collapsed on a tissue basis (annotation in Supplementary Table 1) and the resulting tissues with at least 10 cell lines were included in the analysis (bone, central nervous system, oesophagus, haematopoietic and lymphoid). A total of 14 analyses (referred to for simplicity as cancer-type-specific ANOVAs in the main text and below) plus a pan-cancer analysis including all screened cell lines were performed. Each ANOVA was performed using the analytical framework described previously7 and implemented in a Python package41 (https://github.com/CancerRxGene/gdsctools). Only genes that did not belong to any set of prior known essential genes (defined in the previous sections) and were not predicted by ADaM to be core fitness genes were included in the analyses. For all tested gene fitness–CDE associations, effect size estimations versus pooled s.d. (quantified using Cohen’s d), effect sizes versus individual s.d. (quantified using two different Glass’s Δ metrics, for the CDE-positive and the CDE-negative populations separately), CDE P values and all other statistical scores were obtained from the fitted models. An association was tested only if at least three cell lines were contained in the two sets resulting from the dichotomy induced by CDE status (that is, at least three CDE-positive and three CDE-negative cell lines).
The P values from all ANOVAs were corrected together using the Tibshirani–Storey method42. Subsequently, MSI status was also tested for statistical associations with differential gene fitness effects for pan-cancer and cancer types with at least three MSI cell lines. We used the following statistical significance and effect size thresholds for category associations between gene fitness effects and genomic markers:

Class A marker: a P-value threshold of 10⁻³ with an FDR threshold equal to 25% (or 5% for MSI) and with Glass’s Δ > 1. Different FDR thresholds were used for associations with CDEs or MSI because the number of tests performed in the former was six orders of magnitude larger than the latter.

Class B marker: an FDR threshold of 30% with at least one Glass’s Δ > 1 for pan-cancer associations.

Class C marker or weaker: an ANOVA P-value threshold of 10⁻³ and for pan-cancer associations at least one Glass’s Δ > 1; for weaker, a simple Student’s t-test (for difference assessment of the mean depletion fold change between CDE-positive/CDE-negative cell lines) P-value threshold of 0.05 and for pan-cancer associations, at least one Glass’s Δ > 1.

The additional constraint of Glass’s Δ values (quantifying the effect size with respect to the s.d. of the two involved sub-populations of samples) was considered for the pan-cancer markers in order to account for the significantly larger number of samples analysed in the pan-cancer setting, which might result in highly significant P values even for small effect size associations. Further details on this analysis are reported in the Supplementary Information.

Target priority scores and target tractability

Computation of the target priority scores and their significance is described in the Supplementary Information. To estimate the likelihood of a target to bind a small molecule or the likelihood of a target to be accessible to an antibody, we made use of a genome-wide target tractability assessment pipeline14. The in silico pipeline integrates data from public sources, and assigns human protein-coding genes into hierarchical qualitative buckets. Predicted tractability and confidence in the data increased from bucket 10 to bucket 1; targets in bucket 1 were considered to be the most tractable. Of note, targets in lower buckets (that is, buckets 10 to 8) were considered to have an uncertain tractability, and should not be ruled out as ‘intractable’ without a deep tractability assessment. Further details are provided in the Supplementary Information.

Characterization of target protein families and enrichment analysis

To characterize protein families and compute statistical enrichment, we made use of the Panther online tool43.

GPX4 differential expression analysis

RNA-sequencing gene expression measurements transformed using voom44 were obtained from a previously published study45. For GPX4 analysis, cell lines were divided into two groups according to their loss-of-fitness response to GPX4 knockout (using BAGEL FDR < 5% as significance threshold for gene depletion) and gene expression fold changes were calculated between the GPX4 non-dependent and dependent cell lines (log2 values of the mean difference). Differential gene expression was statistically assessed using the R package Limma46. Gene set enrichment analysis was performed with ssGSEA36 and cancer hallmark gene sets were used to identify significant enrichment among the top differentially expressed genes. Then, 10,000 random permutations were performed for each signature to calculate empirical P values and a Benjamini–Hochberg FDR correction was applied.
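The permutation step can be illustrated generically as below. This sketch does not implement ssGSEA: it substitutes a simple stand-in score (the mean differential-expression statistic of the genes in a signature), compares it with scores of random gene sets of the same size, and adds one to numerator and denominator so that the empirical P value is never zero. The stand-in score, the add-one correction and all names are assumptions.

```python
import random


def empirical_p(stat_by_gene, gene_set, n_perm=10_000, seed=0):
    """Fraction of random same-size gene sets whose mean statistic is at
    least as large as that of gene_set (with an add-one correction)."""
    rng = random.Random(seed)
    genes = list(stat_by_gene)
    members = [g for g in gene_set if g in stat_by_gene]
    size = len(members)
    observed = sum(stat_by_gene[g] for g in members) / size
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, size)
        if sum(stat_by_gene[g] for g in sample) / size >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy statistics: gene g0 has the lowest value, g49 the highest.
stats = {f"g{i}": float(i) for i in range(50)}
p_top = empirical_p(stats, ["g45", "g46", "g47", "g48", "g49"])
p_bottom = empirical_p(stats, ["g0", "g1", "g2", "g3", "g4"])
print(p_top, p_bottom)
```

The resulting empirical P values across signatures would then be adjusted with a Benjamini–Hochberg correction, as stated in the text.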

WRN dependency in MSI cell lines

Co-competition assay

The sequences of sgRNAs that target WRN and cell lines used in validation experiments are described in Supplementary Table 10. This included two sgRNAs from the original screen and two independent sgRNAs. The sgRNAs were cloned into pKLV2-U6gRNA5(BbsI)-PGKpuro2ABFP-W (Addgene, 67974). Cell lines were transduced at around 50% efficiency as described above in six-well plates. A co-competition score was determined as the ratio of the percentage of BFP-positive cells (that is, sgRNA-positive cells) on day 14 compared to day 4, as measured by flow cytometry. A co-competition score less than 1 indicates a relative reduction in BFP-positive cells, resulting from targeting of a loss-of-fitness gene.
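The co-competition score is a simple ratio, as the following sketch shows. The function name and the example percentages are hypothetical.

```python
def co_competition_score(bfp_pct_day4, bfp_pct_day14):
    """Ratio of the percentage of BFP-positive cells on day 14 to day 4."""
    if bfp_pct_day4 <= 0:
        raise ValueError("day 4 percentage must be positive")
    return bfp_pct_day14 / bfp_pct_day4


# Example: sgRNA-positive cells fall from 50% on day 4 to 20% on day 14.
score = co_competition_score(50.0, 20.0)
print(score, "loss of fitness" if score < 1 else "no relative depletion")
```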

Clonogenic assay

Cell lines were transduced with lentivirus that encodes WRN sgRNA at around 100% efficiency as described above in six-well plates (2,000 cells per well), typically for 15–21 days. Cells were fixed using 100% ice-cold ethanol for 30 min followed by Giemsa staining overnight at room temperature.

Western blot analysis

Cells were transduced at around 100% efficiency as described above in 10-cm dishes. On day 5 after transduction, cells were lysed with 200 μl RIPA buffer supplemented with protease and phosphatase inhibitors and lysates were used for SDS–PAGE and immunoblot analysis. Antibodies used were: WRN (Cell Signaling Technologies, 4666; dilution 1:2,000), WRN for domain rescue experiment (Thermo Fisher Scientific, PA5-27319); MLH1 (Cell Signaling Technologies, 3515; dilution 1:1,000); MSH3 (Santa Cruz Biotechnology, sc-271080; dilution 1:1,000); anti-Flag M2 (Sigma-Aldrich, F3165); β-actin (Cell Signaling Technologies, 4970); and anti-β-tubulin (Sigma-Aldrich, T4026; dilution 1:5,000). Secondary antibodies included: IRDye 800CW donkey anti-mouse antibody (LI-COR, 926-32212); IRDye 680LT donkey anti-rabbit IgG (H+L) (LI-COR, 925-68023); anti-mouse IgG HRP-linked secondary antibody (GE Healthcare, NA931). Molecular weight markers included: SeeBlue Plus2 Pre-stained Protein Standard (Thermo Fisher Scientific, 5925) and Precision Plus Protein Standards (BioRad, 161-0373).

WRN rescue experiment

SW620 and SW48 cells (2 × 10⁵ cells) were transfected by nucleofection (Lonza 4D Nucleofector Unit X) with Cas9–sgRNA ribonucleoproteins (RNP) targeting human MAVS (used as a non-essential knockout control) or WRN, together with overexpression of 200 ng pmGFP control or 200 ng mouse Wrn cDNA (Origene, MR226496). From each sample after nucleofection, 5,000 cells were seeded in a 96-well plate and allowed to grow for 5 days, after which cells were collected for either CellTiter-Glo assay (Promega, G9241) or western blot analysis. CellTiter-Glo data were read on an Envision Multiplate Reader and data analysis was performed using GraphPad Prism 7 software. Student’s t-test was performed using the multiple t-test module in Prism 7. The sgRNA sequences that were used are listed in Supplementary Table 10.

RNA interference

A pool of four siRNAs that target WRN was used (Dharmacon, L-010378-00-0005). HCT116 cells were grown and transfected with siRNA using the RNAiMAX (Invitrogen) transfection reagent following the manufacturer’s instructions. Each experiment included: mock control (transfection lipid only), ON-TARGETplus Non-targeting Control Pool (Dharmacon, D-001810-10-05) as a negative control, and polo-like kinase 1 (PLK1) (Dharmacon, L-003290-00-0010), which served as a positive control. siRNA sequences are listed in Supplementary Table 10.

Rescue of WRN dependency in HCT116 isogenic lines

HCT116 parental cells and derivatives carrying Chr.2, Chr.3, Chr.5 or Chr.3 + Chr.5 were transduced to express Cas9. After transduction, all lines displayed Cas9 activity >80%. To assess WRN dependency, cells were seeded at 1.5 × 10³ cells per well in 100 μl complete growth medium in 96-well plastic cell culture plates. At day 0, cells were transduced with viral particles containing sgRNAs targeting essential or non-essential genes, or WRN sgRNA 1 and WRN sgRNA 4 in order to achieve a >90% transduction efficiency. The following day, the medium was replaced and 48 h after transduction puromycin was added at final concentration of 2 μg ml⁻¹. Plates were incubated at 37 °C in 5% CO₂ for 7 days, after which the cell viability was assessed using CellTiter-Glo (Promega) by measuring luminescence on an Envision multiplate reader. Clonogenic assays were performed as described in the ‘WRN dependency in MSI cell lines’ section; and 48 h after transduction puromycin was added at a final concentration of 2 μg ml⁻¹.

In vivo validation

WRN knockout using an inducible CRISPR–Cas9 system

To generate inducible WRN sgRNA-expressing HCT116 cells, we cloned WRN sgRNA 4 into the pRSGT16H-U6Tet-(sg)-CMV-TetRep-TagRFP-2A-Hygro vector (Cellecta). Cas9-expressing HCT116 cells were transduced and selected with 500 μg ml⁻¹ of hygromycin (Thermo Fisher Scientific). To obtain cell populations that both uniformly express Cas9 and contain the inducible WRN-targeting sgRNA, we generated single-cell clones by serial dilution. To measure the growth rate of WRN sgRNA-expressing HCT116 cells after conditional induction of WRN knockout, cells were grown in flasks in the presence or absence of 2 μg ml⁻¹ doxycycline for 24 h and then seeded in 96-well plates, with or without the same concentration of doxycycline. Cell growth was monitored every 6 h using an automated IncuCyte-FLR 4X phase-contrast microscope (Essen Instruments). The average object-summed intensity was calculated using the IncuCyte software (Essen Instruments).

Mouse xenograft studies

Female non-obese diabetic/severe combined immunodeficiency (NOD/SCID) mice (Charles River Laboratories) were used in all in vivo studies. All animal procedures were approved by the Ethical Committee of the Institute and by the Italian Ministry of Health (authorization 806/2016-PR). The methods were carried out in accordance with the approved guidelines. Mice were purchased from Charles River Laboratories, maintained in hyperventilated cages and manipulated under pathogen-free conditions. In particular, mice were housed in individually sterilized cages; each cage contained a maximum of seven mice and optimal amounts of sterilized food, water and bedding. HCT116 xenografts were established by subcutaneous inoculation of 2 × 10⁶ cells into the right posterior flank of 5- to 6-week-old mice. Tumour size was evaluated by calliper measurements, and the approximate volume of the mass was calculated using the formula V = (4/3)π × (d/2)² × (D/2), where d is the minor tumour axis and D is the major tumour axis. When tumours reached an average size of approximately 250–300 mm³, animals with the most homogeneous size were selected and randomized by tumour size. Doxycycline (Sigma-Aldrich, D9891) was dissolved in water and administered daily at a 50 mg kg⁻¹ concentration by oral gavage. For each experimental group, 8–10 mice were used to enable reliable estimation of within-group variability. Operators allocated mice to the different treatment groups during randomization but were blinded during measurements. The maximal tumour volume permitted in our in vivo experiments was 3,500 mm³ and this limit was never exceeded. In vivo procedures and related biobank data were managed using the Laboratory Assistant Suite, a web-based proprietary data management system for automated data tracking47.
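The tumour volume formula given above, 4/3π × (d/2)² × (D/2) with d the minor and D the major axis, can be written as a small helper. The function name and the example calliper readings are hypothetical.

```python
from math import pi

MAX_PERMITTED_VOLUME_MM3 = 3_500  # limit stated in the text


def tumour_volume_mm3(minor_axis_mm, major_axis_mm):
    """Approximate tumour volume: 4/3 * pi * (d/2)^2 * (D/2)."""
    if minor_axis_mm > major_axis_mm:
        raise ValueError("the minor axis cannot exceed the major axis")
    return 4 / 3 * pi * (minor_axis_mm / 2) ** 2 * (major_axis_mm / 2)


# Example calliper readings: d = 8 mm, D = 10 mm.
volume = tumour_volume_mm3(8.0, 10.0)
print(f"{volume:.1f} mm^3, within limit: {volume <= MAX_PERMITTED_VOLUME_MM3}")
```

With these example readings the volume is about 335 mm³, slightly above the 250–300 mm³ range at which animals were randomized.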

Immunohistochemistry

Formalin-fixed, paraffin-embedded tissues explanted from cell xenografts were partially sectioned (10-μm thick) using a microtome. Then, 4-μm paraffin tissue sections were dried in a 37 °C oven overnight. Slides were deparaffinized in xylene and rehydrated through graded alcohol to water. Endogenous peroxidase was blocked in 3% hydrogen peroxide for 30 min. Microwave antigen retrieval was carried out using a microwave oven (750 W for 10 min) in 10 mmol l⁻¹ citrate buffer, pH 6.0. Slides were incubated with monoclonal mouse anti-human KI-67 (1:100; DAKO) overnight at 4 °C inside a moist chamber. After washings in TBS, anti-mouse secondary antibody (DAKO Envision+System horseradish peroxidase-labelled polymer, DAKO) was added. Incubations were carried out for 1 h at room temperature. Immunoreactivities were revealed by incubation in DAB chromogen (DakoCytomation Liquid DAB Substrate Chromogen System, DAKO) for 10 min. Slides were counterstained in Mayer’s haematoxylin, dehydrated in graded alcohol, cleared in xylene and a coverslip was applied using DPX (Sigma-Aldrich). A negative control slide was processed with only the secondary antibody, omitting the primary antibody incubation. Immunohistochemically stained slides for KI-67 were scanned with a 40× objective. Ten representative images selected from three cases were then analysed using ImageJ (NIH), which segmented cells with positive and negative nuclei. The percentage of the area containing positive cells was calculated as the brown area (positively stained cells) divided by the sum of brown and blue areas (negatively stained cells). The software interpretation was manually verified by visual inspection of the digital images to ensure accuracy.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.