INTRODUCTION

SETD1B encodes a lysine-specific histone methyltransferase that methylates histone H3 at lysine 4 (H3K4me1, H3K4me2, H3K4me3) as part of a multisubunit complex known as COMPASS [1, 2]. The SETD1B protein consists of 1,966 amino acids and has several (presumed) functional domains (Fig. 1). The N-terminus contains an RNA recognition motif (RRM), whereas the middle region is characterized by two long disordered regions that differ from other homologs [3, 4], a conserved leucine–serine–aspartic acid (LSD) motif [5], and a coiled-coil structure. At the C-terminus, SETD1B harbors a catalytic SET domain crucial for histone methyltransferase activity, bordered proximally by the N-SET domain including a conserved WDR5-interacting (WIN) motif [6], and distally by the post-SET domain. H3K4me3 is enriched at promoters and transcription start sites, whereas H3K4me1 and H3K4me2 are enriched at enhancers; these marks are therefore associated with active gene transcription and euchromatin [7]. Indeed, epigenetic changes have been observed in both animal models and patient material [8,9,10] at promoters and intergenic regions, confirming that SETD1B epigenetically controls gene expression and chromatin state. In addition, SETD1B is constrained for both missense and loss-of-function variants [11].

Fig. 1: Schematic representation of SETD1B variants in this study cohort (major circles, top labels) and in literature (minor circles, bottom labels).

The RRM (residues 94–182), coiled-coil (CC) (residues 1173–1204), N-SET (residues 1668–1821), SET (residues 1822–1948) and post-SET (residues 1949–1966) domains in respectively magenta, orange, cyan, green and brown, the largely disordered regions (residues 320–682 and residues 1338–1640) in light gray, and the LSD (exon 5, residues 577–583) and WIN (exon 12, residues 1745–1750, within N-SET) motifs both in blue.

Consistent with this, pathogenic variants in SETD1B have been associated with a syndromic intellectual developmental disorder including seizures and language delay (IDDSELD, OMIM 619000). To date, clinical features have been described for 11 affected individuals with (likely) pathogenic SETD1B sequence variants [8, 12,13,14,15]. Individuals with microdeletions encompassing SETD1B have also been described [8, 16,17,18,19]; however, most of these deletions encompass additional genes making phenotypic comparisons challenging. In this study, we further delineate the clinical phenotype associated with SETD1B sequence variants, by describing 36 additional individuals. Comparing these new cases to the published ones provides a comprehensive molecular and clinical characterization of the SETD1B-related syndrome. In addition, using protein modeling, in vitro assays, and genome-wide methylation signatures we investigate the effects of selected variants. Together, this expands the molecular and phenotypic landscape associated with SETD1B variants.

MATERIALS AND METHODS

Cohort inclusion

After identification of three individuals with SETD1B variants at Erasmus MC Clinical Genetics, additional cases were identified using GeneMatcher [20] and the Dutch Datasharing Initiative [21] and via our network of collaborators. Individuals were included based on SETD1B variants detected in research or routine clinical diagnostics. Affected individuals were investigated by their referring physicians.

Next-generation sequencing of affected individuals

Full details are provided in the Supplementary Methods and Supplementary Fig. S1.

Variant classification

SETD1B variants were initially classified as variants of uncertain significance (VUS), likely pathogenic, or pathogenic at the performing laboratory or local referring sites. Literature and public database search identified 30 individuals with SETD1B sequence variants (Supplementary Table S1). Reclassification of SETD1B sequence variants was performed according to American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) Standards and Guidelines [22] (Supplementary Table S1), using reference sequence NM_001353345. For retrieval of population allele frequencies and in silico predictions Alamut® Visual 2.15 (Feb 2020) was used.

Facial gestalt and severity scoring analysis

To generate a composite facial gestalt, the Face2Gene (FDNA Inc., Boston, MA, USA) research application was used with default settings. Details of severity scoring are described in Supplementary Methods.

Structural protein modeling

Sequences were retrieved from UniProt. SWISS-MODEL [23] was used to produce homology models, RaptorX [3] to predict secondary structure and disorder, ConSurf [24] for conservation analysis, and the eukaryotic linear motif (ELM) resource [25] for short linear protein motif assessment. Models were manually inspected, and variants evaluated, using PyMOL (pymol.org).

Experimental procedures

For in vitro experiments, FLAG-tagged wild-type (kindly provided by David Skalnik, Indiana University [5]) and variant SETD1B protein and HA-tagged ASH2 were overexpressed in HEK293 cells. Protein expression, isolation, western blotting, and immunocytochemistry were performed following standard procedures [26,27,28]. Genome-wide methylation profiles were obtained as described [8]. Details on experimental procedures and statistical analysis are provided in Supplementary Methods.

RESULTS

Molecular spectrum of SETD1B sequence variants

A total of 36 individuals with either heterozygous (n = 32, n = 28 confirmed de novo, n = 1 inherited from affected parent, n = 1 inherited from unaffected parent), compound heterozygous (n = 2, biallelic inheritance from unaffected parents) or homozygous (n = 2, siblings, biallelic inheritance from unaffected parents) SETD1B sequence variants were included in this cohort. Thirty-three variants were detected, of which 2 were recurrent. This includes 8 truncating (n = 6 nonsense, n = 2 frameshift), 1 extension, 1 in-frame inversion, and 23 missense variants (Fig. 1). Fourteen variants were classified as pathogenic, ten as likely pathogenic, and nine as uncertain significance. For individuals with VUS, no alternative candidate disease-causing variant was identified. In literature, 26 additional (4 recurrent) SETD1B variants have been reported including 7 truncating, 1 splicing, 1 extension, 3 in-frame insertions or deletions, and 14 missense variants (Fig. 1). Variants are distributed along the protein (Fig. 1), with the majority of (likely) pathogenic missense variants located within the SET domain region.
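The per-category counts reported above can be checked against the stated totals with simple arithmetic. The following is a minimal Python sketch of that check; the dictionary and function names are illustrative only and are not part of the study's analysis, and every number is taken directly from the paragraph above.

```python
# Counts copied from the text; names are hypothetical and for illustration only.
cohort_zygosity = {"heterozygous": 32, "compound heterozygous": 2, "homozygous": 2}
cohort_variant_types = {"truncating": 8, "extension": 1, "in-frame inversion": 1, "missense": 23}
cohort_classification = {"pathogenic": 14, "likely pathogenic": 10, "uncertain significance": 9}
literature_variant_types = {
    "truncating": 7,
    "splicing": 1,
    "extension": 1,
    "in-frame indel": 3,
    "missense": 14,
}


def matches_total(counts: dict, expected_total: int) -> bool:
    """Return True when the category counts add up to the stated total."""
    return sum(counts.values()) == expected_total


# Each tuple: (label, counts, total stated in the text).
checks = [
    ("individuals by zygosity", cohort_zygosity, 36),
    ("cohort variants by type", cohort_variant_types, 33),
    ("cohort variants by classification", cohort_classification, 33),
    ("literature variants by type", literature_variant_types, 26),
]

for label, counts, total in checks:
    status = "consistent" if matches_total(counts, total) else "MISMATCH"
    print(f"{label}: {sum(counts.values())} of {total} ({status})")
```

All four tallies agree with the totals given in the text.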

Clinical spectrum

The cohort consists of 24 males and 12 females, whose age at last evaluation ranged from 1 to 44 years (median 9 years, interquartile range [IQR] 6–15 years). Table 1 gives an overview of the core clinical phenotype, and Fig. 2 displays the facial appearance of individuals for whom photographs were available. The phenotype of individuals with VUS (either biallelic or heterozygous) matched that of the overall cohort (Table 2). More details can be found in Supplementary Case Reports and Supplementary Figs. S2–S4.

Table 1 Overview phenotypic features.
Fig. 2: Facial images of affected individuals.

Photographs of 16 individuals (plus one affected mother) with indicated SETD1B variants. Dysmorphic features included, among others, a slightly elongated face, high anterior hairline, thick arched or straight eyebrows, deep-set eyes, a prominent nose, and thin upper lips. Lower right corner shows facial composite images for all individuals, or only those with a likely pathogenic or pathogenic variant (note: individual 13 and mother were not included in the composite, given the image angle and glasses hindering Face2Gene program analysis). LP likely pathogenic variant, P pathogenic variant, V variant of uncertain significance.

Table 2 Summary of main phenotypic features in this study cohort and in literature.

Development and neurological findings

Most individuals were born after an uneventful pregnancy at full term, with an unremarkable neonatal period and anthropometric measurements in the normal range. Seven individuals (7/31, 23%) had postnatal hypoglycemia. Virtually all individuals (34/36, 94%) showed global developmental delay in early infancy. Notably, individuals 14 and 16, who had no documented developmental delay, are the youngest individuals (2 and 1 years old, respectively). Motor development was delayed in 32 individuals (32/36, 89%), with independent ambulation acquired between 1.0 and 4.5 years of age (median 1.6, IQR 1.3–2.5; one individual is nonambulatory). Motor performance remained an issue, with clumsiness, coordination difficulties, and poor fine motor movements reported. Hypotonia was documented for 16 individuals (16/35, 46%), often manifesting in the neonatal or childhood period. Language development was delayed in the majority of individuals (33/36, 92%), with first words acquired between 0.5 and 3.0 years (median 2.0, IQR 1.1–2.1). Five individuals were nonverbal at the time of data collection (15%; aged 2.5, 3, 3.5, 11, and 19 years), and at least five additional individuals (15%) speak far fewer words than appropriate for their age. Regression of previously acquired skills was reported in nine individuals, especially with regard to language, without an obvious link to epileptic activity. At the last investigation, intellectual disability was present in 28 individuals (28/32, 88%), ranging from mild (n = 9) to moderate (n = 8) and severe (n = 4) (not specified, n = 7). Formal IQ testing was performed in 11 individuals, with an average score of 60 (IQR 48–67), corresponding to mild intellectual disability.
Autistic features were observed in 24 individuals (24/36, 67%); other behavioral issues included hyperactivity (13/34, 38%), sleep disturbance (10/32, 31%), anxiety (11/35, 31%), anger or aggressive behavior (11/35, 31%, including self-mutilation for individuals 4 and 10), and obsessive compulsive behavior (individuals 7, 26, and 30). Epilepsy developed in 28 individuals (28/36, 78%) with a median age of seizure onset of 3 years (IQR 1.0–5.3). Eight individuals remained seizure-free up to an age of 16 years (range 2–16, median 6.0, IQR 4.5–7.3 years). At onset, the majority of classifiable seizures were generalized (n = 19) and a minority were focal (n = 5); they included motor (n = 9) or nonmotor (n = 13) involvement, with variable development into other seizure types over time (Table 1). Seizure frequency varied (sporadic to very frequent) and was at least daily in the majority of patients. Fever-sensitive seizures were reported in three individuals. Whereas seizures were (partially) controlled using various antiepileptic drugs in eighteen individuals, seizures responded poorly or remained intractable in seven individuals. Brain MRI (Supplementary Fig. S2) was performed in 33 individuals and was often unremarkable (23/31, 74%). Abnormal MRI findings included nonspecific minor subcortical white matter hyperintensities (individual 1); cystic encephalomalacia with ventriculomegaly (individual 4); reduced white matter volume and thin corpus callosum (individual 10); bilateral abnormal signals at frontal, temporal, and occipital lobes (individual 16); extensive irregular gyral pattern with reduced sulcation (individual 19); slightly delayed myelination and small heterotopic gray matter (individual 21); periventricular leukomalacia (individual 30, possibly due to an underlying hypoplastic left heart disease); and mild diffuse cerebral volume loss with ex vacuo enlargement of lateral and third ventricles (individual 32).

Additional findings

Ophthalmological findings included strabismus (n = 5), amblyopia (n = 2), myopia (n = 2), astigmatism (n = 3), and cortical vision impairment (n = 1). Eight individuals showed gastrointestinal symptoms, including reflux, constipation, and feeding problems. Ten individuals had dermatological symptoms (eczema, rough or dry skin, café au lait spots, hypo- or hyperpigmentation). A number of individuals displayed skeletal abnormalities (scoliosis [n = 5], kyphosis [n = 2], joint hypermobility [n = 4]). (Recurrent) respiratory and urinary tract infections were reported in six individuals. No malignancies were identified. (Truncal) overweight or obesity was present in 17 individuals (Supplementary Table S2).

Facial appearance

Facial appearance varied from no discernible (5 individuals) to mild dysmorphic features (31 individuals, 86%) (Fig. 2). Dysmorphisms included prominent rounded nasal tip/bulbous nose (n = 15), high anterior hairline (n = 11), (uplifted) large earlobes (n = 10), overfolded superior helices (n = 6), low-set ears (n = 5), thin upper lip (n = 9), pointed/prominent chin (n = 6), deep-set eyes (n = 5), synophrys (n = 4), full cheeks (n = 4), elongated/narrow face (n = 5) and/or bitemporal narrowing (n = 4), and frontal bossing (n = 4). Also, tapering fingers (n = 5), brachydactyly (n = 3), small hands (n = 5), and nail hypoplasia (n = 4) were reported (Supplementary Fig. S3).

Structural modeling of variants

The eight truncating variants (p.[His8fs], p.[Phe95*], p.[Tyr96*], p.[Glu412fs], p.[Arg1329*], p.[Arg1524*], p.[Gln1666*], p.[Ala1730*]) are likely to be targeted for nonsense-mediated decay, but if not, they would result in removal of the SET region, eliminating catalytic activity. Variants p.(His10Gln) and p.(Glu94Asp) are located in a disordered region preceding the RRM (Figs. 1 and 3a) and could affect the specificity of the potential interactions mediated by the RRM’s N-terminus [28]. The nucleotide inversion leading to p.(Asn113_Asp121delins9) and substitution p.(Met170Thr) are located in the canonical β1α1β2β3α2β4 RRM region, whereas p.(Gly195Val) is located at the C-terminal loop of α3 (Fig. 3a). Residues 113–121 are located in the α1 helix known to participate in protein–protein interactions in RRM proteins [28]. Furthermore, the RRM domain interacts with COMPASS component WDR82 [5]. Thus, substitution of this 9-residue stretch could severely compromise the RRM fold and its interactions. p.(Met170Thr) and p.(Gly195Val) could affect substrate recognition of the RRM because both residues are involved in RNA binding [29]. p.(Thr281Ile) and p.(Thr318Met) are located downstream of the RRM, in a disordered serine-, threonine-, and proline-rich region containing numerous predicted phosphorylation sites [25]. Hence, p.(Thr281Ile) and p.(Thr318Met) might affect the phosphorylation landscape of this region. Substitutions p.(Arg429Trp), p.(Pro545Arg), p.(Pro698Ser), p.(Pro793Arg), p.(Arg927His), p.(Arg982Gln), p.(Ala1010Val), p.(Ala1129Val), p.(Pro1328Ser), and p.(Arg1424Gln) are all located in the middle, largely disordered region of SETD1B. The middle portions of Setd1 proteins are divergent [1], suggesting they may have a role in differential genomic targeting of COMPASS through interaction with different targeting proteins. This role might be affected by the mostly nonconservative nature of these substitutions.
p.(Ala1129Val), however, is predicted to introduce a noncanonical 5’ splice donor site at nucleotide position c.3384, which would result in a truncated protein p.(Ala1129fs) with eliminated SET catalytic domains. p.(Arg1748Cys) is located in the WIN motif (Fig. 3a) and expected to significantly decrease interaction between SETD1B and WDR5, which is essential for COMPASS assembly and SETD1B participation in H3K4 methylation [6]. Substitutions p.(Arg1792Trp), p.(Arg1825Pro), and p.(Lys1827Arg) are located at the interface with the nucleosome (Fig. 3a) and therefore likely affect interaction with histones and complex stability. Variants p.(Ala1901Val), p.(Ala1901Glu), p.(Tyr1941fs), and p.(Glu1948Lys) are located in the catalytic SET domain (Fig. 3a). Ala1901 is situated in a loop that is part of the S-Adenosyl methionine (SAM) substrate-binding pocket, but is facing away toward an opposing β-strand that is part of the structural core of the SET domain. The substitution of alanine by the larger and negatively charged glutamic acid would create a large stress on the core of the SET domain and potentially disrupt the structural frame maintaining the SAM substrate-binding site and interactions with the adjacent subunits of the complex, whereas alanine to valine substitution introduces a small physicochemical difference which is likely to create some disturbance. p.(Tyr1941fs) would extend the protein, altering the SET domain and post-SET region that are involved in catalysis and cofactor binding, thus likely rendering SETD1B inactive (Fig. 3a). This C-terminal segment is highly conserved [24]. It covers a substantial portion of the binding pocket for histone H3 and the SAM substrate (including the SAM-binding Tyr1943), and three cysteine residues that together with Arg1962 coordinate a zinc atom. 
Glu1948 is located in a loop adjacent to the histone H3 binding site and, when superimposed to the yeast COMPASS EM structure (PDB:6ven), it is found to be close to the DNA binding surface between Set1 and Bre2 (homolog of ASH2) (Fig. 3a). The replacement of the glutamic acid by a lysine changes the charge of that side chain and could affect interactions of this region.

Fig. 3: Structural and functional evaluation of SETD1B variants.

(a) Homology models of SETD1B domains: RRM domain (top left), based on the crystal structure of the RRM of human SETD1A (PDB ID 3S8S, identity = 66%, QMEAN = 0.25). The segment of Asn113 to Asp121 is colored in blue. This region is known to support different protein–protein interactions in other RRM proteins. Met170 and Gly195 are shown as blue sticks. Homology model of the N-SET and catalytic SET domains of SETD1B (gray cartoon), based on the EM structure of the yeast COMPASS in a complex with a ubiquitinated nucleosome (PDB ID 6ven, identity= 40.21%, QMEAN = −5.10) (center superimposed to the template PDB, and zoom-in panels). The region containing Arg1748 was observed more accurately in the X-ray structure of the WDR5:SETD1B Win motif peptide binary complex (PDB ID 4es0 [6], top right): Arg1748 (blue sticks) is inserted into the pocket of WDR5 (yellow) and interacts with the backbone oxygen atoms of Ser91, Phe133 and Cys261 (hydrogen bonds shown as yellow dashed lines). Arg1792 (blue sticks) and the substitution by Trp (dark gray sticks) interacting with surrounding residues in the adjacent alpha helix (e.g., Glu1796, gray sticks) or with the SWD1 subunit (RBBP5 in humans, shown in violet). The insets with Arg1825 and Arg1827 show the proximity of these residues (dark blue sticks) to histone H2A (light blue cartoon). The SET domain containing Ala1901Val, Ala1901Glu, Tyr1941fs and Glu1948Lys was modeled more accurately based on the crystal structure of the yeast COMPASS catalytic module (PDB ID 6chg [40], identity=62%, QMEAN = −1.78). Ala1901 and Glu1948 are presented as blue sticks in the center figure and right insets. The Ala1901Val and Ala1901Glu substitutions (dark gray sticks) could compromise the stability of the adjacent SAM (olive sticks) binding site and the interaction with the SWD1 subunit (RBBP5 in humans, violet cartoon), which in turn contacts ubiquitin (red cartoon). 
Tyr1941fs alters a segment of SET and Post-SET regions involved in catalysis and cofactor binding (blue cartoon in center figure and right inset): SAM (olive sticks) and histone H3 (green sticks) binding pocket, the key Tyr1943 residue (yellow sticks), three Cys and one Arg (yellow sticks) coordinating a zinc atom (shown as a sphere). The Glu1948Lys substitution (blue/dark gray sticks in center figure and right inset) could disturb potential interactions between the flexible loops and the adjacent subunit (Bre2, homologous to human ASH2, is shown in teal cartoon). (b) Overexpression of wild-type and variant SETD1B protein in HEK293 cells 48 hours post-transfection assessed by western blot. CC cell control, lysate of mock transfected HEK293 cells (one-way analysis of variance [ANOVA] p = 0.09). (c) Nuclear localization of SETD1B variants in HEK293 cells. Upper panel—SETD1B detected by anti-Flag antibody; lower panel—overlay of nuclear staining (DAPI, cyan) and SETD1B (red); scale bar 20 μm. Images representative of 2 independent experiments are shown. (d) Colocalization of SETD1B and ASH2 in HEK293 cells. Left to right: nuclear staining (DAPI), ASH2 (anti-HA tag), SETD1B (anti-Flag tag), merge of ASH2 (green) and SETD1B (red); scale bar 20μm. Pearson’s r value (range: −1, negative correlation, 1, max correlation) calculated with coloc2 plugin (ImageJ), Z-stacks of min. 12 nuclei were used for the analysis. t-test **p = 0.005. (e) Thermal shift analysis of the SET domain. Left: Tm of GST-SETD1B proteins and GST control. Right: change in Tm of the proteins in presence of SAM substrate. Two independent protein preparations were used for the assay performed in triplicates. One-way ANOVA multiple comparison test *p < 0.05, ***p < 0.0001. (f,g) Analysis of methylation profiles. (f) Hierarchical clustering (rows represent methylation probes, columns–samples). (g) MDS plot (control samples in blue, proband samples in red, SETD1B cases from the database in pink). 
Sample numbers correspond to case numbers: individual 3 p.([His10Gln];[Arg927His]) (3.1 and 3.2 are the parents of individual 3), individual 4 p.([Glu94Asp];[Pro1328Ser]), individual 5 p.(Phe95*), individual 7 p.(Asn113_Asp121delins9), individual 18 p.(Arg982Gln), individual 19 p.(Ala1010Val), individual 20 p.(Ala1129Val), individual 31 p.(Ala1901Glu), and individual 33 p.(Glu1948Lys).

Functional evaluation of selected SETD1B variants

Based on the structural modeling, seven variants in different regions of SETD1B were selected for in vitro studies: p.(His10Gln) and p.(Glu94Asp) N-terminal of RRM; p.(Asn113_Asp121delins9) in RRM; p.(Thr318Met) C-terminal of RRM; and p.(Ala1901Val), p.(Ala1901Glu), and p.(Glu1948Lys) in the catalytic SET domain.

First, stability of SETD1B in cells was evaluated by western blotting of wild-type and variant SETD1B overexpressed in HEK293 cells (Fig. 3b, Supplementary Fig. S4A). No significant differences in protein levels were observed, suggesting that the evaluated variants do not affect protein stability. Genomic targeting of SETD1B might depend on the central region and the catalytic domain, whereas RRM could reinforce chromatin binding [1, 30], resulting in distribution in the nucleus and not in the nucleoli. Therefore, SETD1B nuclear distribution of wild type and variant was assessed by immunofluorescence of transiently transfected HEK293 cells. Overexpressed FLAG-SETD1B was detected in the cytoplasm and nucleus. Nuclear localization patterns of SETD1B remained similar between wild type and variants, except for p.(Asn113_Asp121delins9), which failed to localize to the nucleus (Fig. 3c). Exclusion from the nucleus correlates with an inability to bind chromatin, resulting in loss of function of this variant. As suggested by structural modeling, Glu1948 could be involved in interaction with COMPASS subunit ASH2. Co-transfection and immunostaining were performed to evaluate colocalization (Fig. 3d). Both overexpressed SETD1B and ASH2 were detected in the nucleus and cytoplasm of transfected HEK293 cells, with a higher colocalization correlation for wild type compared to p.(Glu1948Lys) (Pearson’s correlation value of 0.5 and 0.3 respectively). To evaluate the effect of p.(Ala1901Val), p.(Ala1901Glu), and p.(Glu1948Lys), protein stability and ligand binding were evaluated using thermal shift analysis of the catalytic domain (Fig. 3e). After GST-tagged SETD1B SET domain expression of wild type and variants, melting temperature (Tm) was compared (Fig. 3e, left panel; Supplementary Fig. S4C, D). 
The Tm of p.(Glu1948Lys) was 1.2 °C higher than that of wild type, indicating that this substitution increases the stability of the SET domain. This could disturb interactions within COMPASS, perhaps at the interface between SETD1B, the nucleosome, and the ASH2 subunit, as suggested by colocalization analysis of this variant with the ASH2 subunit (Fig. 3d). Substitutions p.(Ala1901Val) and p.(Ala1901Glu) resulted in a 0.3 °C negative and positive shift of Tm, respectively, suggesting that these substitutions have minor effects on thermal stability and thus on conformation of the SET domain. However, since these substitutions are predicted to influence interactions between SETD1B and the SAM substrate, the effect on Tm in the presence of SAM was evaluated (Fig. 3e, right panel). Generally, substrate binding stabilizes proteins, resulting in an increased Tm, and indeed a mean Tm increase of 0.3 °C was observed for wild type. The Tm changes of the control GST protein remained < 0.1 °C, suggesting no contribution of the GST tag to the SAM interactions. The Tm increase of 0.17 °C for both p.(Ala1901Val) and p.(Ala1901Glu) indicates no significant effect on SAM interaction.
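The colocalization of SETD1B and ASH2 above is summarized as a Pearson's correlation value between the two fluorescence channels. For readers unfamiliar with the statistic, the following is a minimal Python sketch of how such a coefficient is computed from paired pixel intensities. It is not the coloc2 implementation used in the study; the function name and the intensity values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences.

    Returns a value between -1 (perfect negative correlation) and
    1 (perfect positive correlation).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-pixel intensities for the two channels (not study data).
ash2_channel = [10, 20, 30, 40, 50]
setd1b_channel = [12, 18, 33, 39, 52]
print(f"Pearson's r = {pearson_r(ash2_channel, setd1b_channel):.2f}")
```

In an imaging context the two sequences would be the intensities of the same pixels (or voxels across a Z-stack) in the ASH2 and SETD1B channels, so a lower r for a variant indicates that the two signals overlap less consistently than for wild type.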

A specific DNA methylation profile (episignature) for individuals with heterozygous loss-of-function pathogenic SETD1B variants has been described [8]. We performed episignature analysis for nine individuals (individuals 3, 4, 5, 7, 18, 19, 20, 31, 33), and the parents of individual 3 (Fig. 3f–g, Supplementary Fig. S4F). Individuals 5 (p.[Phe95*]), 7 (p.[Asn113_Asp121delins9]), 20 (p.[Ala1129Val]), 31 (p.[Ala1901Glu]), and 33 (p.[Glu1948Lys]) showed the previously established SETD1B episignature; individual 18 (p.[Arg982Gln]) showed an inconclusive result, whereas individuals 3 (p.[His10Gln];[Arg927His]; nor did his parents, 3.1 and 3.2), 4 (p.[Glu94Asp];[Pro1328Ser]), and 19 (p.[Ala1010Val]) did not show the episignature associated with heterozygous loss-of-function SETD1B variants.

Taken together, through structural modeling and functional analyses we provide evidence for reduced function and therefore pathogenicity of p.(Phe95*), p.(Asn113_Asp121delins9), p.(Ala1129Val), p.(Ala1901Glu), and p.(Glu1948Lys), whereas functional consequences and clinical significance remain uncertain for p.(Thr318Met), p.(Arg982Gln), p.(Ala1010Val), p.(Ala1901Val), p.([His10Gln];[Arg927His]), and p.([Glu94Asp];[Pro1328Ser]).

DISCUSSION

We report on the molecular and phenotypic spectrum of 36 individuals with sequence variants in SETD1B, representing the largest cohort reported to date. Previous work suggested a possible gain-of-function effect of pathogenic variants in SETD1B [14]; however, further reports [8, 12, 13, 15,16,17,18,19], including this work, point toward a loss-of-function mechanism. Clinical features of our cohort compared to previously reported individuals with a (likely) pathogenic SETD1B variant [8, 12,13,14,15] are provided in Table 2.

The emerging phenotype of SETD1B-associated disorder consists of global developmental delay, language delay including regression, intellectual disability, autism, and epilepsy. Other often observed neurobehavioral issues include hyperactivity, anxiety, anger or aggressive behavior, and sleep disturbance. Importantly, in most cases, developmental delay predates seizure onset, and eight individuals (up to 16 years old) are seizure-free. This indicates that SETD1B dysfunction severely impacts physiological neurodevelopment even in the absence of epilepsy, suggesting the condition is a developmental encephalopathy, with or without epilepsy. Previously, alterations of SETD1B were mainly associated with myoclonic absences [13] and predominantly refractory epilepsy. Although myoclonic absence seizures were often observed in our cohort, confirming this association, other seizure types were regularly encountered at onset, including focal or generalized tonic–clonic seizures. Epilepsy was well or partially controlled in most cases, with 7/26 (27%) remaining refractory to treatment. Brain imaging was unremarkable in most cases, and the abnormalities observed did not form a consistent pattern. Our cohort identifies a number of mild but consistent dysmorphisms in 30 individuals, including a prominent rounded nasal tip and bulbous nose, high anterior hairline, a thin upper lip, mild ear dysmorphisms, deep-set eyes, and mild hand abnormalities including tapering fingers, brachydactyly, small hands, and nail hypoplasia. Finally, previous work reported potential susceptibility to malignancy in SETD1B-related disorder [12]. Malignancies were not identified in our cohort, although this remains important for follow-up given the relatively young age of the cohort.

To identify possible genotype–phenotype correlation, a severity score was calculated for each individual in our cohort based on clinical features (Supplementary Methods). No association could be identified between the clinical severity score and the effect or location of the corresponding SETD1B variant (Supplementary Fig. S5). Intriguingly, there is a significant overrepresentation of males in both our cohort and in literature, with a total of 36 males and 16 females with SETD1B sequence variants reported (binomial test two-tailed p = 0.008) (Supplementary Methods). The reason for this remains unclear. Incidence of hypotonia and seizures did not differ between males and females in our cohort (hypotonia respectively 12/24, 50% and 4/11, 36%; seizures respectively 19/24, 79% and 9/12, 75%), and seizure onset was similar (respectively range and median years 0–12, 3 and 0–11, 2). Behavioral issues were seen more often in males than females (autistic behavior respectively 19/24, 79% and 5/12, 42%; hyperactivity respectively 10/23, 43% and 3/11, 27%; anxiety respectively 9/23, 39% and 2/12, 17%; aggression respectively 9/23, 39% and 2/12, 17%; sleep disturbance respectively 8/20, 40% and 2/12, 17%), although differences between the sexes were not significant. The clinical severity score is significantly lower in females compared to males, especially when considering behavioral features as a group (Supplementary Fig. S5). It is thus possible that females present with a milder phenotype that may not prompt medical evaluation. However, ascertainment bias for the neurodevelopmental phenotype could also contribute to the male predominance. Nevertheless, it is tempting to speculate that sex-linked traits could affect susceptibility to clinical penetrance and spectrum of SETD1B variants, as female-protective effects have been proposed for other neurodevelopmental disorders [31, 32].
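The male overrepresentation reported above (36 males versus 16 females) was assessed with a two-tailed binomial test. The following is a minimal Python sketch of an exact test of that kind, assuming a null hypothesis of equal probability for either sex (p = 0.5); the function name is illustrative, and the authors' exact procedure is described in their Supplementary Methods and may differ in detail.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5.

    Under p = 0.5 the binomial distribution is symmetric, so the
    two-sided p value is twice the smaller tail probability (capped at 1).
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    lower_tail = sum(comb(n, i) for i in range(0, k + 1))   # P(X <= k) * 2**n
    upper_tail = sum(comb(n, i) for i in range(k, n + 1))   # P(X >= k) * 2**n
    return min(1.0, 2 * min(lower_tail, upper_tail) / 2 ** n)


# Counts taken from the text: individuals with SETD1B sequence variants.
males, females = 36, 16
p_value = binomial_two_sided_p(females, males + females)
print(f"two-sided binomial p = {p_value:.4f}")
```

Under these assumptions the result comes out close to the reported p = 0.008, consistent with a male excess that is unlikely to arise by chance if both sexes were equally likely to be ascertained.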

We report four males from three families with biallelic variants in SETD1B, in which variants were inherited from unaffected parents. The two consanguineous siblings (individuals 11 and 12) share, besides the homozygous VUS in SETD1B, also homozygous VUS in NBAS (associated with immune defects) and NOS1 (associated with achalasia). If disease causing, these variants could explain parts of the phenotypes of these individuals, but not their neurological findings. For both individuals, as well as for the other two individuals with biallelic SETD1B VUS, no alternative candidate variants were identified. Pathogenicity of the biallelic variants could not be experimentally proven by in vitro assays for p.(His10Gln), p.(Glu94Asp), and p.(Thr318Met), nor did p.([His10Gln];[Arg927His]) and p.([Glu94Asp];[Pro1328Ser]) show the episignature previously associated with heterozygous SETD1B loss-of-function variants. However, this does not exclude the involvement of these variants in yet unknown SETD1B functions. Given that the phenotype of these individuals is similar to that of the heterozygous individuals (Table 2), and that complete absence of SETD1B is lethal in several species [10, 33, 34], we speculate that the combined action of both alleles in biallelic cases results in a phenotype similar to that observed in heterozygous cases by reducing the remaining SETD1B activity below a required threshold. A small subset of genes that typically harbor de novo variants has already been associated with recessive inheritance [35]. Further investigations remain necessary to establish causality of these variants, and the possibility of recessive inheritance of the SETD1B-related disorder.

SETD1B adds to a growing list of chromatin modifying genes implicated in neurodevelopmental disorders. SETD1B is one of the six H3K4 methyltransferases present in mammals, and remarkably loss of function of each is associated with human disease (KMT2A: Wiedemann–Steiner syndrome [OMIM 605130]; KMT2B: early-onset dystonia [OMIM 617284]; KMT2C: Kleefstra syndrome type 2 [OMIM 617768]; KMT2D: Kabuki syndrome [OMIM 147920]), with the latest additions to this list being SETD1A and SETD1B (also known as KMT2F and KMT2G, respectively). SETD1B is paralogous to SETD1A (derived from the orthologue Set1) and both associate with the same noncatalytic COMPASS components. SETD1A and SETD1B, however, show nonoverlapping localization within the nucleus and thus likely make nonredundant contributions to epigenetic control of chromatin structure and gene regulation [1]. This might explain why both SETD1A and SETD1B knockout mice are embryonically lethal, albeit at different developmental stages [33]. Also, in adult mice, SETD1B knockout is lethal and provokes severe defects in hematopoiesis [34]. Heterozygous pathogenic variants in SETD1A have been described in individuals with developmental delay, intellectual disability, subtle facial dysmorphisms, and behavioral and psychiatric problems [36] (OMIM 619056). Interestingly, despite the anticipated nonredundant contributions of SETD1A and SETD1B to epigenetic control, the clinical phenotype of both related disorders shares many similarities [36]. These include global developmental delay with motor and language delay, intellectual disability, and behavioral abnormalities. SETD1A variants have also been found in schizophrenia cohorts [36] and mouse models support SETD1A involvement in schizophrenia [37]. One likely pathogenic SETD1B variant without clinical information was identified in a schizophrenia cohort [38], but psychosis was not reported in our SETD1B cohort. 
Given the relatively young age of the cohort, this will be an important point for follow-up. Noticeable differences between both syndromes are the incidence of epilepsy, which is more common for SETD1B (20% in SETD1A [36], 78% in this cohort), and the absence of a male predominance for SETD1A (9 males of 19 cases [36, 39]).

Germline mutants of Set1, the orthologue of SETD1A and SETD1B in Drosophila melanogaster, are embryonically lethal [10], whereas postmitotic neuronal knockdown shows that Set1 is required for memory in flies, suggesting a role in postdevelopment neuronal function [36]. In Caenorhabditis elegans, the SETD1A/SETD1B orthologue Set-2 is important for transcription of neuronal genes, axon guidance, and neuronal functions [9], further underscoring the importance of both SETD1A and SETD1B for neural function. Interestingly, whereas we found multiple missense variants in the functional domains of SETD1B (RRM, N-SET, SET), in SETD1A only one missense variant is reported within a functional domain (post-SET). Finally, of the 23 missense variants found in SETD1B, 17 are in regions that are homologous in SETD1A. Of note, p.(Arg982Gln) in the disordered region is at a position homologous to a SETD1A residue altered in a patient with early-onset epilepsy (NM_014712.2(SETD1A):c.2737C>T, p.[Arg913Cys]) [39]. It will be interesting to decipher the downstream epigenetic alterations causative for the resulting overlaps and differences in phenotype between both syndromes.