Main

The incidence of many cancers varies substantially worldwide, reflecting genetic differences between populations and their environmental exposures. This is well illustrated by keratinocyte cancers, where incidence varies 140-fold globally14. Keratinocyte cancer risk increases with an individual’s cumulative ultraviolet (UV) exposure that depends on age, outdoor work, sunbathing, use of tanning beds15,16,17,18,19,20 and phenotypes such as freckles, low levels of skin pigmentation and poor tan response21. The most strongly associated keratinocyte cancer risk loci are found near to pigmentation genes such as HERC2, OCA2, MC1R and IRF4, which have much greater allele frequencies in high-risk populations such as in the UK than in low-risk populations in East Asia22.

The intensity of UV reaching the Earth’s surface is quantified using the linearly scaled UV index23. The average daily maximum UV index in Singapore is 8 (https://www.nea.gov.sg/) compared with 3 in the UK (https://uk-air.defra.gov.uk/data/uv-data). Despite this, the age-adjusted incidence of keratinocyte cancers is 17-fold lower in Singapore than in the UK (Fig. 1a)24. Furthermore, keratinocyte cancer incidence has risen rapidly in the UK but not in Singapore.

Fig. 1: Sampling of facial skin across two countries with contrasting skin cancer risk.
figure 1

a, Age-standardized incidence rates per 100,000 person years for keratinocyte cancers (KC) in the UK and Singapore. Data were collated from Cancer Research UK and the Singapore Cancer Registry1,2. b, Sampling method to allow mapping of clones spanning multiple samples of epidermis. c, Number of mutations detected per 2-mm2 sample of epidermis (n = 428) across sequencing of 74 genes per donor.

The aging epidermis of skin from donors from the UK consists of a dense patchwork of somatic clones, with mutant NOTCH1, NOTCH2, NOTCH3, TP53 and FAT1 under positive selection6,7. These genes are commonly mutated in keratinocyte cancers, particularly cutaneous squamous cell carcinoma (cSCC)8,11, suggesting that these tumors develop from such mutant clones. To date, little is known of how this somatic landscape varies across human populations.

In this study, we characterized the somatic clones in histologically normal aging eyelid epidermis of five donors from Singapore compared with published sequencing data7 from six similar donors from the UK (Fig. 1b). Donor sex and age were similar in both countries (mean donor age Singapore = 62 years, UK = 68 years; Table 1).

Table 1 Donor demographics

A total of 13,850 mutations was detected across all samples (Fig. 1c). The number of mutations per sample varied from 0.5 to 73 clones per mm2 (mean = 16.2 clones per mm2; Supplementary Table 1). We estimated the mean genome-wide burden per donor to be fourfold higher in the UK (6.3 mutations per Mb) compared to Singapore (1.6 mutations per Mb; Fig. 2a, Extended Data Fig. 1a and Supplementary Table 2).

Fig. 2: Mechanisms of mutagenesis differ by country.
figure 2

a, Estimates of genome-wide mutation burden per donor according to country (n = 11 donors, two-sided t-test: P = 8.8 × 10−3). Tukey box plot where the lower and upper hinges represent the first and third quartile, the center is the median and the outliers are more than 1.5× the interquartile range (IQR). b, Trinucleotide context for SBS in skin from UK and Singaporean donors. c, Proportion of SBS assigned to each signature per donor (purple, aging; green, UV). Signature contributions are clustered significantly according to country (Methods; unsupervised hierarchical clustering with accuracy obtained through bootstrapping: a.u. = 0.99999, P < 1 × 10−5). d, Proportion of SBS attributed to UV damage was positively correlated with burden estimates per donor (Pearson’s r = 0.87, Spearman rank = 0.95).

Across all donors, 10,311 single-base substitutions (SBS) were detected, of which 66% were C>T changes (Fig. 2b). Mutational signature analysis identified six reference signatures25 SBS1, SBS5 and SBS7a–d (Supplementary Table 3). SBS1 and SBS5 are associated with tissue aging while SBS7a–d are due to UV lesions and their repair25. Mutational signatures differed between countries (Fig. 2c). Most (64%) SBS in Singaporean donors were attributed to SBS1 and SBS5, but in the UK, most (66%) were attributed to SBS7a–d. The proportion of substitutions attributed to UV were positively correlated with genome-wide burden estimates per donor (Fig. 2d).

The number of both SBS1 and SBS5 mutations per mm2 was increased in UK skin compared with Singaporean skin (Extended Data Fig. 1b,c). In epithelial cells, the proportion of the SBS1 mutation was previously correlated with the cumulative number of cell divisions in a tissue. In this study, in both countries, we found increased SBS1 in larger clones (Extended Data Fig. 1d). These differences may reflect the effects of UV, which increases the rate of proliferation and proportion of dividing cells in the epidermis and can drive mutant clone expansion26.

A total of 561 double-base substitutions (DBS), 388 deletions and 94 insertion events were detected. Over eight times more DBS were called in UK skin compared to Singapore (Extended Data Fig. 1e; UK mean = 1.08 DBS per mm2, Singapore mean = 0.126 DBS per mm2). Most DBS (85%) were CC>TT substitutions and showed transcriptional strand bias consistent with UV damage (transcribed/untranscribed = 52.7). There was no significant difference in the number of insertions and deletions (indels) detected according to country (Extended Data Fig. 1f; UK mean = 0.72 indels per mm2, Singapore mean = 0.37 indels per mm2).

We detected copy number aberrations (CNAs) in 32 samples, of which 27 were independent events (Supplementary Table 4). Seventy-eight percent of CNAs detected were loss of heterozygosity (LOH) at the NOTCH1 locus on 9q, which is consistent with mutant NOTCH1 being the most common driver of clonal expansion in normal skin6,7. Other CNAs included two TP53 LOH events, one NOTCH4 duplication and an LOH event at each of NOTCH4, FGFR2 and RB1. CNAs were detected in 13% of UK and just 1.0% of Singaporean samples.

We found that NOTCH1, FAT1, TP53, NOTCH2, NOTCH3, ARID2 and AJUBA harbored a disproportionately high number of non-synonymous mutations relative to synonymous mutations (dN/dS ratio, q < 0.01), suggesting that protein-altering mutations in these genes drive clonal expansions (Extended Data Fig. 2a and Supplementary Table 5), which is consistent with previous studies6,7. A comparison of the dN/dS ratio per gene by country found TP53 non-synonymous mutations overrepresented and NOTCH1 and NOTCH2 non-synonymous mutations underrepresented in UK skin compared to Singaporean skin (Fig. 3a–c). We estimated the percentage of tissue mutant for NOTCH1, NOTCH2, FAT1 and TP53 (Fig. 3d and Supplementary Table 6). We found a fourfold increase in tissue mutant for FAT1 (UK mean = 11%, Singapore mean = 2.6%), which is consistent with a fourfold higher mutation burden in UK skin compared to Singapore. However, we only observed a twofold increase in tissue mutant for NOTCH1 (UK mean = 24%, Singapore mean = 12%) and no significant difference in the percentage of tissue mutant for NOTCH2 (UK mean = 6.4%, Singapore mean = 7.7%). Thus, NOTCH1 and NOTCH2 mutants are less able to colonize UK skin than expected.

Fig. 3: Clonal selection and competition differ according to country.
figure 3

a, Number of mutations of each consequence for positively selected genes (dNdScv: q < 0.01) according to country. ARID2 is not significantly positively selected in the Singapore samples. b, Plot of non-synonymous mutations per gene in Singapore versus UK samples. Gradient of line: total number of non-synonymous mutations in the UK/Singapore = 6,846/1,432. Positively selected genes (purple) are labeled. Red indicates positively selected genes with a significant (one-sided likelihood ratio test: Padj < 0.001) difference in dN/dS ratio according to country, after accounting for global differences. c, A representation of protein-altering mutations in 1 cm2 of skin from donors from Singapore and the UK. Samples were randomly selected and mutations are displayed as circles, randomly distributed in the space. Sequencing data, including copy number, were used to infer the size and number of clones and, where possible, the nesting of subclones. Otherwise, subclones are nested randomly. d, Estimated percentage of cells with at least one non-synonymous mutation per positively selected gene, according to country (n = 11 donors, samples with known CNA removed). Tukey box plot where the lower and upper hinges represent the first and third quartile, the center represents the median and the outliers are more than 1.5× the IQR. Two-sided Wilcoxon signed-rank test: Padj = 0.03 (NOTCH1), 0.66 (NOTCH2), 8.7 × 10−3 (FAT1) and 0.05 (TP53) with Holm multiple testing correction. e, Estimated percentage of cells with at least one non-synonymous mutation across 74 genes according to country (Methods; n = 11 donors, two-sided Wilcoxon signed-rank test: P = 4.3 × 10−3). Tukey box plot where the lower and upper hinges represent the first and third quartile, the center represents the median and the outliers more than 1.5× the IQR. f, Violin plot comparing clone size distributions (summed variant allele fraction) per mutation according to country. UK donor mutations had a lower mean clone size (two-sided Welch’s t-test: P = 4.27 × 10−14).

For TP53, we observed a threefold increase in the percentage of mutant tissue in the UK (UK mean = 14.5%, Singapore mean = 4.9%; Fig. 3d). However, this is only of borderline significance due to a large TP53 mutant clone that spans 16 samples of donor SG1. When we removed this single clone, we found a ninefold difference (UK mean = 14.5%, Singapore mean = 1.7%). We concluded that mutant TP53 is probably a better competitor than mutant NOTCH1 and NOTCH2 in UK skin.

One explanation for the differences in selection we observed in Singaporean skin is that mutant NOTCH1 and NOTCH2 clones are relatively strong competitors because, in a less mutated tissue, they are more likely to be competing against nonmutant cells. We identified further lines of evidence in support of this hypothesis. First, UK skin is saturated with mutant cells. We estimated that an average of 94% of cells in UK donors have a protein-altering mutation in at least one of the sequenced genes, compared to 50% of cells in Singaporean donors (Fig. 3e and Supplementary Table 7). Second, mean clone size was larger in Singaporean skin (0.037) than in UK skin (0.029; Fig. 3f and Supplementary Table 1). Third, NOTCH1 and NOTCH2 mutant clones were larger in Singapore than in the UK (Extended Data Fig. 2b). This contrasts with TP53 mutant clones, which did not significantly differ in size according to country, suggesting that they have a strong relative fitness in both environments. These observations may be reconciled with a simple model in which a high burden of mutant clones restricts clone size through competition for the space available within the tissue (Supplemetary Videos 1 and 2)27.

Protein-altering TP53 mutations differed according to country. The two most frequent codon changes in UK skin were TP53R248W and TP53R282W (3.2 and 2.5 mutations per cm2, respectively). Both have gain-of-function (GOF) properties that can lead to chromosomal instability and R248W is the most frequent codon change in keratinocyte cancers26,28,29,30. We did not observe any mutations at either of these codons in the Singaporean samples (Extended Data Fig. 2c). Even after adjusting for C>T/CC>TT burden across TP53 by country, we would still expect to observe approximately three mutations at codons R248 and R282 across all Singaporean samples (Methods; bootstrapping P < 0.001). This suggests that the underrepresentation of mutations at these codons in Singaporean skin may not be due to differences in mutational signature alone.

Other activating mutations called in the UK samples included multiple FGFR3 mutants: K652M (n = 2), G382R (n = 2), R248C (n = 1), S249C (n = 1), G372C (n = 1) and Y375C (n = 1). FGFR3 GOF mutations drive the formation of the benign keratinocyte tumor seborrheic keratosis31. Oncogenic activating mutations were also found: KRASG12D (n = 1); NRASQ61L (n = 1); and HRASE143K (n = 2) and HRASG12D (n = 1). In Singaporean samples, we observed a single occurrence of an oncogenic activating mutation: KRASG12V.

To gain an insight into differences in genetic background that may exist between donors, we genotyped individuals for single-nucleotide polymorphisms (SNPs) associated with altered risk for keratinocyte cancer (Methods, Fig. 4 and Supplementary Table 8). At 36 risk loci, we found differences in donor genotype according to country. Many were associated with genes linked to pigmentation, such as SLC45A2, IRF4, BNC2, OCA2 and HERC2 (https://genetics.opentargets.org). For example, rs12203592 alters IRF4 levels and expression of the pigmentation enzyme encoded by TYR32. rs4778210, positively selected in Singaporeans33, is adjacent to OCA2, encoding a melanosomal anion channel that is essential for melanin synthesis34. These findings are consistent with the marked difference in UV mutational burden between countries. However, we also observed differences in non-pigmentation-related SNPs. Keratinocyte cancer risk is strongly linked to immunosuppression and is also associated with inflammatory diseases21. rs2111485 is associated with inflammatory and autoimmune diseases, such as psoriasis and inflammatory bowel disease, while rs12129500 and rs2243289 affect the expression of IL6R and IL4, respectively (https://genetics.opentargets.org). We found that four of the Singaporean donors have a large introgression at chromosome 3p21.31 (ref. 33). This region is under positive natural selection in East Asia and contains the skin tumor suppressor gene RASSF1 (ref. 35) and the UV-induced genes HYAL1 and HYAL2 (ref. 36). Other keratinocyte cancer risk SNPs were linked to genes with diverse functions.

Fig. 4: Thirty-six risk loci where donor genotypes differ according to country.
figure 4

Loci are either associated with keratinocyte cancer or tan response (Tan) or previously found to be under selection in Singaporean genomes; – for genotype indicates that no call could be made (Methods). Associated genes and mechanisms are suggested using phenome-wide association studies (PheWAS) and expression quantitative trait locus (eQTL) data from the Open Targets Genetics database. ECM, extra-cellular matrix; – indicates that the mechanism is unknown. Population allele frequencies are reported for East Asia (EA freq) and Great Britain (GBR freq) using data from the 1000 Genomes Project Phase 3 (Methods). Purple shading indicates the frequency of the alternative (ALT) allele in each population. Note that all loci in the HYAL region form part of the same linkage disequilibrium block on chromosome 3p.21. For each donor, green indicates homozygous reference (REF) alleles, yellow heterozygous and red homozygous ALT alleles.

Most keratinocyte cancer sequencing studies have been performed in high-incidence countries and may not reflect tumors from low-incidence populations. We therefore analyzed the whole-exome sequencing data of cheek cSCCs taken from 19 South Korean patients12. The age-standardized incidence of cSCC in South Korea (2.38 per 100,000)37 is comparable to that of Singapore (2.2 per 100,000)24. After genotyping the 19 patients for differential risk loci (Fig. 4), we found the typical Korean genotype most similar to that of the Singaporean donors (Supplementary Table 9).

Analysis of South Korean cSCC for mutational signatures identified nine reference signatures: SBS1, SBS5, SBS7a–d, APOBEC-associated signatures SBS2 and 13, and, in one sample (MP7), the defective DNA repair signature SBS15 (Fig. 5a and Supplementary Table 3)25. In all but two tumors (MP7 and W2), most mutations were caused by UV (Fig. 5b). Indel burden was several fold higher in samples MP7 and W2 (Fig. 5c), which is consistent with an alternative mechanism of mutagenesis in these cases. Analysis of mutant gene selection found TP53, NOTCH1 and the cell cycle regulating tumor suppressor CDKN2A.p16INK4a as positively selected (Fig. 5d and Supplementary Table 5), which is consistent with previous studies8,9,10. In conclusion, cSCCs show convergent genomic features, whether from high-risk or low-risk populations.

Fig. 5: Features of tumors from a low-incidence country (South Korea).
figure 5

a, Proportion of SBS attributed to reference mutational signatures according to donor (purple, aging signatures; green, UV radiation signatures; black, APOBEC mutagenesis; red, defective DNA mismatch repair). Korean samples are labeled as in the original study12. b, Proportion of substitutions attributable to UV radiation reference signatures SBS7a–d according to tissue (Singapore normal: n = 5 donors; UK normal: n = 6 donors; Korean cSCC: n = 19 tumors; red triangle, MP7, a Korean tumor sample with defective DNA repair). Tukey box plot where the lower and upper hinges represent the first and third quartile, the center represents the median and the outliers are more than 1.5× the IQR. c, Number of insertions and deletions called in each Korean tumor sample. d, Ratio of observed and expected non-synonymous mutations for positively selected genes (q < 0.01 by dNdScv). Line drawn at y = 1.

Overall, we conclude that differences in keratinocyte cancer incidence between the UK and Singapore are reflected in the somatic mutational landscape of aging facial skin. In comparison to Singapore, UK skin shares multiple features with keratinocyte cancer, including a high mutational burden, a predominant UV mutational signature, increased CNA and increased selection of mutant TP53. This work supports previous studies suggesting that UV acts to promote carcinogenesis not only as a mutagen but also by promoting the expansion of preexisting TP53 mutant clones38, particularly of mutants that provide an advantage in UV-exposed skin, such as TP53R248W (ref. 26).

There is evidence that aging UK epidermis is nearly saturated with competing mutant clones. It is possible that every cell carries a protein-altering mutation in a cancer-associated gene so that the growth of positively selected clones, such as NOTCH1 mutants, is constrained. This shows how clonal dynamics can be altered by the environment. The comparatively stronger selection of NOTCH1 and NOTCH2 mutations in Singaporean skin compared to the UK and the underrepresentation of NOTCH1 mutation in tumors is consistent with NOTCH1 mutations providing a proliferative advantage but not increasing the risk of carcinogenesis6,7. In contrast, we do not find mutant CDKN2A to be under selection in aging epidermis, but it is common in cSCC, suggesting it drives carcinogenesis in keratinocytes.

In conclusion, comparing normal tissue across genetically distinct human populations that differ widely in cancer risk is most revealing. In the high-incidence population, the normal somatic mutational landscape shares multiple features with the cancers that emerge from it, whereas in low-risk populations this is not the case.

Methods

Sample collection

Sample collection and DNA sequencing of Singaporean skin samples was carried out as described for the UK samples5. Eyelid skin was collected from patients undergoing blepharoplasty surgery in Singapore. Informed consent was obtained in all cases under ethically approved protocols. The underlying fat and dermis were removed from the skin and the remaining tissue cut into approximately 0.25-cm2 pieces. Each piece was incubated in 20 mmol l−1 EDTA for 2 h at 37 °C. The epidermis was peeled from the dermis using fine forceps under a dissecting microscope and fixed for 30 min with 4% paraformaldehyde (PFA) (FD Neurotechnologies) before being washed three times in 1× PBS. The fixed epidermis was then cut into a contiguous array of approximately 40 samples per donor, each measuring 2 × 1 mm (Table 1). Donor ages are listed as ranges to help maintain donor anonymity. DNA was extracted from each sample using the QIAamp DNA Micro Kit (QIAGEN) by digesting overnight and according to the manufacturer’s instructions. DNA was eluted using prewarmed AE buffer, where the first eluent was passed through the column twice more.

DNA sequencing

Deep (approximately 700×), targeted sequencing was performed across 74 genes commonly mutated in cSCC and other cancers (Supplementary Table 1)6. This custom bait capture targets the exonic regions of these 74 genes, in addition to 1,734 SNPs across the genome to aid with copy number analysis. The targeted regions covered 0.67 Mb of the genome, with 0.33 Mb being exonic. Samples were multiplexed and sequenced on an HiSeq 2000 (Illumina) with version 4 chemistry to generate 75-bp paired-end reads. BAM files were mapped to the GRCh37d5 reference using the Burrows–Wheeler (BWA)-MEM39 (v.0.7.17). Duplicate reads were marked using Biobambam2 v.2.0.86.

Indel realignment and coverage

Reads around indels were realigned using GATK IndelRealigner and depth of coverage was calculated for targeted regions per sample using samtools (v.1.14) and bedtools (v.2.28.0). After removing duplicates and reads with a mapping quality of 25 or less and base quality of 30 or less, mean quality sequencing depth of coverage over all 428 epithelial samples (Singapore and UK) was calculated to be 749.0×.

Copy number analysis

The allele frequency for each gene in a sample was estimated by statistically phasing heterozygous SNPs6. All samples of a donor were used as a panel to identify heterozygous SNPs at sites with at least 1,000× total coverage. Due to the variation in read depth across targeted regions, only copy number alterations which lead to an allelic imbalance, including LOH and gains, are detectable via this method (Supplementary Table 4).

Mutation calling

Mutations were called using deepSNV (v.1.21.3)40, a package designed to reliably detect mutations present in a small proportion of cells in a sample using a deeply sequenced panel of normal, sparsely mutated samples to determine a base-specific error model for each site in the targeted region. Mutations in each sample are then called by comparing the observed mutation rate against the background model using a likelihood ratio test41. Fifty-one samples of muscle or fat from UK donors, sequenced using the same method as the epithelial samples, were used to create a reference panel with a mean coverage of 42,611× over the targeted regions. This reference panel was used to call mutations across both Singapore and UK samples. Mutations were assumed to be germline and removed if present in more than 10% of all reads across all samples of a single donor. Across all samples, 13,850 mutations were detected, down to a minimum variant allele fraction of 0.0021 (median variant allele fraction = 0.015). Mean quality sequencing depth of coverage exceeded 200× per sample in both Singapore and UK samples and the number of mutations detected did not correlate with sequencing depth in either group (Extended Data Fig. 3). Mutations were annotated using VAGrENT42 (v.3.3.3).

Spatial mapping of clones

Sampling the tissue in a grid of adjacent samples allows the mapping of large clones that spread over multiple samples. For all downstream analysis, identical mutations called in separate samples of the same donor were merged if the samples were known to have been within 10 mm of each other in the original tissue41 (Supplementary Table 1). Mutations called in separate pieces of epidermis cut from the same individual were not merged because the distance between these samples cannot be accurately known. However, reanalysis with equally sized pieces of epidermis per donor confirmed that the number of samples per piece of epidermis does not confound estimates of mutation burden or clone size.

Estimates of mutation burden and percentage mutant tissue

In the absence of CNA, the proportion of cells in a sample carrying a mutation can be estimated as double the variant allele fraction (the proportion of sequencing reads with a corresponding base change at that position). The genome regions targeted in this study cover genes commonly found to be mutated in cancers and consequently the mutation density observed is not likely to be representative of that genome-wide. Therefore, we used a method41 to estimate the mutation burden per cell per megabase exclusively from synonymous sites in the bait region, excluding 32 samples where CNA was detected (Supplementary Tables 2 and 4). There was no evidence to suggest a difference in mean mutation burden between the eyebrows and eyelids of donors (two-sided Welch’s t-test: P = 0.14). The percentage of mutant tissue upper bound estimates (Supplementary Tables 6 and 7) were calculated by summing the percentage of cells carrying at least one non-synonymous mutation of a sample, across all samples of a donor41. This assumes that, where possible, mutations occur in different cells of a sample. Lower-bound estimates assume that all mutations of a sample occur within the same cell. Patchwork plots were drawn by plotting the circular area of non-synonymous mutations from genes under selection after random selection of samples from each country, to make up 1-cm2 tissue each41. A large TP53 mutant clone (that also carries a NOTCH2 mutation within it) spanned sixteen 2-mm2 samples of skin in SG1. It is an outlier in terms of size compared to all other mutations (Extended Data Fig. 2b, Fig. 3 and Supplementary Table 1). The summed variant allele fraction (VAF) of this clone is 2.86, nearly four times larger than the next largest clone, a NOTCH1 mutant in a UK donor (summed VAF = 0.75). We report the percentages of TP53 mutant tissue according to country and the respective statistical significance both with and without this clone.

Mutational signatures and selection

The trinucleotide context of each SBS was determined and the contribution of 49 reference mutational signatures (characterized across multiple cancers as part of the PCAWG study25) to this distribution was estimated using nonnegative matrix factorization with SigProfiler. To determine if mutational signature contribution differed according to country, we ran an unsupervised clustering algorithm. To compute significance, we used the pvclust R package, which uses bootstrap resampling techniques to compute a P value for each hierarchical cluster. We found that the UK and Singapore were significantly distinct with respect to the signature contribution within each donor (P < 1 × 10−5). Low numbers of DBS and indels called precluded formal signature decomposition. Of the Korean cSCC dataset, two samples (‘W-D_3’ and ‘W-D_4’ in the original paper12) were excluded from the mutational signature analysis due to low mutation burden (fewer than 100 variants). Genes under selection were estimated using dNdScv43 (Supplementary Table 5).

Video model

We illustrated mutant clone growth in a sparsely (Supplementary Video 1) and densely (Supplementary Video 2) mutated environment to simulate clonal competition in skin from Singapore and the UK, respectively. In Supplementary Video 1, mutant clones of two arbitrary levels of fitness divide against a wild-type background of lower fitness until approximately 50% of the space is mutant. The starting mutation burden of Supplementary Video 2 is fourfold higher than for Supplementary Video 1, with cells dividing for the same number of divisions. Final mutant clone sizes are larger in a sparsely mutated environment (Supplementary Video 1) compared to a densely mutated environment (Supplementary Video 2), reflecting the different clone size distributions we observed according to country (Fig. 3f). The cell competition model used is a two-dimensional implementation of a Moran-like process44.

Donor genotyping

Reads across all samples of a donor were used to genotype donors (Supplementary Table 8). We genotyped donors using an SNP panel of 189 genomic sites associated with skin cancer risk, pigmentation and tan response to better explore the genetic differences between the UK and Singapore and gain more insight into cancer-protective mechanisms. Germline variants for each sample were called using GATK (v.4.3.0.0) best practices, with BAM files undergoing base quality recalibration before variant calling with HaplotypeCaller to produce a genomic variant call format (gVCF). Each of the gVCFs per sample was combined using GenomicsDBImport into a GenomicsDB database before joint calling using GenotypeGVCFs. We selected SNPs from the National Human Genome Research Institute-European Bioinformatics Institute GWAS and Open Targets databases and combined the loci associated with keratinocyte cancer (entry EFO_0010176–keratinocyte carcinoma), tan response (EFO_0004279–suntan) and ‘Ease of skin tanning’ (UK Biobank: 1727). We added these to 16 positively selected loci in Singaporean genomes. Associated genes and mechanisms are suggested for each SNP using PheWAS and eQTL data from the Open Targets Genetics database. Population allele frequencies are reported for East Asia and Great Britain using data from the 1000 Genomes Project Phase 3. We note that allele frequencies for Southeast Asia or Singapore are unavailable. East Asia allele frequencies were calculated from 504 individuals (approximately 300 Chinese, approximately 100 Japanese and approximately 100 Vietnamese). Great Britain allele frequencies were taken from 91 individuals across England and Scotland.

Analysis of Korean cSCC exomes

Nineteen whole-exome sequenced Korean sample tumor-normal pairs were obtained from the SRA project SRP349018 using the SRA-toolkit (v.2.10.9). Sequence quality was assessed using FastQC (v.0.11.2) and visualized using MultiQC (v.1.13), confirming that mean Phred scores were all above 30 across 100 or 150 bp reads. The PCAP-Core workflow was followed to align the paired-end reads to the GRCh37d5 human reference genome using the BWA-MEM (v.0.7.17) with optical and PCR duplicates marked using Biobambam2 (v.2.0.86). Mean depth of coverage per sample was calculated using samtools (v.1.14) and bedtools (v.2.28.0) to be 61× (31–80×). Single-nucleotide variants were called using Caveman (v.1.17.4) with indels called using Pindel (v.3.3.0). Mutations were annotated using VAGrENT42 (v.3.3.3). Genes under selection were estimated using dNdScv (v.0.0.1.0)43 (Supplementary Table 5).

TP53 codon bootstrapping

We applied nonparametric bootstrapping to estimate the significance of observing zero mutations at the R248 and R282 TP53 codons in Singaporean skin, given the decreased burden, decreased TP53 selection and decreased UV signature compared to UK skin. The nonparametric bootstrap statistical method is appropriate for these data because it does not make assumptions on distribution45. The method samples from a given distribution (here, UK skin). For a bootstrap test, we simulated expected values for the mutation distribution in Singaporean skin given: (1) UK mutation distribution per donor and per sample of all, C>T/CC>TT, TP53 and R248W/R282W TP53 mutations and (2) the proportions of all, C>T/CC>TT and TP53 mutations between UK and Singapore. If the null hypothesis regarding the similarity of UK (adjusted) and Singapore distributions is true, we would expect that the simulated ‘UK-adjusted’ values would correspond to the Singapore observations, on average. We applied the ‘rule of three’ approach, often used in clinical trials46, to estimate the robustness of zero outcomes in Singaporean skin. This approach suggests that the upper limit of a 95% confidence interval is 3/(n + 1), where n is the number of Singaporean samples. In this study, this value is 3/(191 + 1) = 0.0156. We ran 1,000 simulations to obtain the estimation for an event of zero R248/R282 TP53 codon mutation counts in Singaporean samples. Of 1,000 simulations, none produced a value lower than 0.0156. The bootstrap simulation shows that (1) an expected mutation frequency at R248/R282 TP53 codons is around three mutations across all Singaporean samples and (2) the UK and Singaporean distributions are significantly different (P < 0.001).

Statistics and reproducibility

No statistical method was used to predetermine sample size because the effect size was not known. No data were excluded from the analyses. The experiments were not randomized and the investigators were not blinded to allocation during the experiments and outcome assessment.

Ethics and consent

Written informed consent was obtained in all cases. The study received ethical approval from the Nanyang Technological University Institutional Review Board, NHG study, protocol no. 2016/00659-AMD000l. The UK component of the study received ethical approval under UK approved protocols (research ethics committee references 15/EE/0152 NRES Committee East of England-Cambridge South and 15/EE/0218 NRES Committee East of England-Cambridge East).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.