Introduction

Identification of genetic variants that cause Mendelian disorders, which segregate in large families, has been facilitated through linkage analysis coupled with Sanger sequencing.1, 2, 3 In the past few years, next-generation sequencing (NGS) has supplanted this experimental strategy for variant detection for monogenic traits,4 and many articles report successful gene identification.5, 6, 7 Failures are rarely published. Estimates of the success rate for gene identification of Mendelian traits using NGS in clinical settings are as low as 25%.8 However, for studies using large pedigrees with Mendelian segregation, the gene identification success rate can be much higher; for example, in a study of 24 families with multiple affected individuals, the success rate was 60% (95% confidence interval (CI): 36%, 78%).9

A frequent strategy used for Mendelian trait gene identification is to select DNA samples from one or more affected or both affected and unaffected family members and perform NGS. If multiple family members have undergone NGS, filtering is then performed based upon variant sharing in affected family members and lack of sharing in unaffected family members. Additional filtering is performed using variant databases such as Exome Variant Server or 1000 Genomes, retaining low-frequency variants (eg, <0.1%) that are predicted to be deleterious by bioinformatics tools. The selected variants are then tested for co-segregation with the phenotype in the entire family. However, this approach is based on the assumptions that clinical information is reliable, the disease is fully penetrant, no phenocopies exist and there is locus homogeneity. If these conditions do not hold, identification of the causal variant can be problematic, because segregation with disease will not be observed. In our cohort of pedigrees segregating hearing impairment (HI), initially unrecognized locus heterogeneity was found to be a recurring hindrance to gene identification. In the presence of locus heterogeneity, the causal variant may be falsely rejected as non-causal or may lie outside the mapped region, because affected individuals in different branches or even in the same sibship do not segregate the same causal variant.
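
The effect of unrecognized locus heterogeneity on sharing-based filtering can be illustrated with a minimal sketch. The `Variant` class, the variant names, the genotype labels and the individual IDs below are all hypothetical and serve only as an illustration; the 0.1% frequency cutoff follows the example given in the text, and the recessive (homozygous) sharing rule is an assumption of this sketch rather than a description of any particular pipeline.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str                    # hypothetical label
    population_freq: float       # allele frequency in a reference database
    predicted_deleterious: bool  # summary call from prediction tools
    genotypes: dict              # individual ID -> "hom_alt", "het" or "hom_ref"


def passes_recessive_filter(v, affected, unaffected, max_freq=0.001):
    """Keep rare, predicted-deleterious variants that are homozygous in every
    sequenced affected individual and in no sequenced unaffected individual."""
    if v.population_freq >= max_freq or not v.predicted_deleterious:
        return False
    shared = all(v.genotypes.get(i) == "hom_alt" for i in affected)
    absent = all(v.genotypes.get(i) != "hom_alt" for i in unaffected)
    return shared and absent


# Toy family: A1 and A2 are affected because of different genes (made-up data).
variants = [
    Variant("varX", 0.0002, True, {"A1": "hom_alt", "A2": "hom_ref", "U1": "het"}),
    Variant("varY", 0.0001, True, {"A1": "het", "A2": "hom_alt", "U1": "hom_ref"}),
]

# Requiring sharing across both affected individuals rejects both causal variants.
kept_joint = [v.name for v in variants
              if passes_recessive_filter(v, ["A1", "A2"], ["U1"])]
# Filtering each affected individual separately recovers one variant each.
kept_a1 = [v.name for v in variants if passes_recessive_filter(v, ["A1"], ["U1"])]
kept_a2 = [v.name for v in variants if passes_recessive_filter(v, ["A2"], ["U1"])]
```

In this toy example the joint filter returns no candidates, whereas the per-individual filters each return a single variant, which mirrors the false rejection described above.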

Here we present 10 consanguineous families from Pakistan and one European-American family segregating fully penetrant HI. For each of the 10 consanguineous families, homozygous putatively causal variants were identified in two different HI genes, while four variants in three different genes underlie HI etiology in the European-American family. We propose strategies by which locus heterogeneity can be detected in pedigrees, and this information, in turn, can facilitate gene identification.

Materials and methods

Institutional Review Board approval

Before the onset of our study, Institutional Review Board (IRB) approval was acquired from: Baylor College of Medicine and Affiliated Hospitals, Cincinnati Children’s Hospital Research Foundation, Combined Neuroscience IRB at the National Institutes of Health, Michigan State University, National Centre of Excellence in Molecular Biology at the University of the Punjab, Quaid-i-Azam University, and University of Iowa. Written informed consent was obtained from all study participants.

Screening and identification of HI variants

Different methods were used to screen for and identify putatively causal HI variants, including PCR, restriction fragment length polymorphism genotyping, Sanger sequencing, comparative genome hybridization and NGS. Table 1 lists the families screened and the methods used for initial evaluation of specific genes and variants. These screening methods led to the discovery of some but not all the variants identified in each family. All causal variants and the method by which each variant was identified in the families are listed in Table 2.

Table 1 Genes and variants screened in 11 hearing impairment families
Table 2 Variant information for 11 families with hearing impairment and locus heterogeneity

Evaluating pedigree informativeness

Simulation studies were performed to evaluate the maximum LOD (MLOD) scores for the Pakistani pedigrees, which have an autosomal recessive mode of inheritance. Using SLINK,10 replicates were generated for each given pedigree structure conditional on affection status by generating a marker with a minor allele frequency of 0.0001 completely linked (recombination fraction (θ)=0.0) to a fully penetrant autosomal recessive disease locus with an allele frequency of 0.0001. Parametric linkage analysis was then performed using MSIM10 on the generated replicates using the same disease model that was used to generate the data, and MLOD scores were obtained for each pedigree structure.

Linkage analysis

Genotypes for 366 microsatellite markers for family 7 and ~6000 SNP markers for families 8–10 were generated. Homozygosity mapping was performed using HomozygosityMapper.11 Genome-wide parametric linkage analysis12 was performed using a disease frequency of 0.0001 and assuming a fully penetrant autosomal recessive model with no phenocopies, that is, penetrance is 1 for the homozygous minor allele genotypes and 0 for heterozygous and homozygous wild-type genotypes. Marker allele frequencies were estimated using the founders and reconstructed founders from these pedigrees and additional Pakistani pedigrees, which were genotyped using the same marker panel. Two-point linkage analysis was performed using Superlink.13 Multipoint linkage analysis and haplotype reconstruction were performed using MERLIN.14
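
For orientation, the two-point LOD score can be written in its simplest textbook form, which assumes phase-known, fully informative meioses. This is a simplification and not the computation performed by Superlink or MERLIN on full pedigrees with consanguinity loops; the function name and arguments are illustrative only.

```python
import math


def two_point_lod(n_meioses, n_recombinants, theta):
    """LOD(theta) = log10[ theta^k (1 - theta)^(n - k) / 0.5^n ]
    for n phase-known informative meioses with k recombinants."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n, k = n_meioses, n_recombinants
    if theta == 0.0:
        # At complete linkage any observed recombinant makes the likelihood zero.
        return n * math.log10(2) if k == 0 else float("-inf")
    return (k * math.log10(theta)
            + (n - k) * math.log10(1 - theta)
            + n * math.log10(2))


# Ten non-recombinant meioses at theta = 0 contribute log10(2) each.
lod_ten_meioses = two_point_lod(10, 0, 0.0)
```

Under these assumptions each non-recombinant meiosis adds about 0.3 to the LOD score at θ=0, which gives a rough sense of why small pedigrees or branches cannot reach high LOD scores.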

Results

Depending on the phenotype and how many different variants and genes are involved in disease etiology, screening known genes or variants can be a useful tool prior to NGS. For nonsyndromic (NS) HI, causal variants in ~80 genes have been identified and some of these variants are frequent in individuals with NSHI in particular populations, for example, variants in GJB2.2, 3, 15, 16, 17 DNA samples from hearing-impaired family members were initially screened for putatively causal GJB2 variants and, for Pakistani families, common NSHI variants in CIB2, HGF and SLC26A4. Using this approach, we were able to resolve the locus heterogeneity in four families: variants in GJB2 and HGF are responsible for HI in families 1 and 2, while variants in GJB2 and SLC26A4 are responsible for HI in families 3 and 4 (Figure 1). For six additional pedigrees, homozygous or compound heterozygous variants in GJB2 (families 5, 6, 8 and 11), HGF (family 7) or CIB2 (family 9) were initially identified for some but not all hearing-impaired individuals (Figures 2 and 3, Table 2).

Figure 1

Four pedigrees in which intra-familial locus heterogeneity was discovered by Sanger-sequencing genes that are commonly involved in the etiology of hearing impairment. In each family, two homozygous variants in different genes segregate with hearing impairment. The box on the lower side shows the legend for genotypes. Genotypes beneath each symbol follow the order of variants listed above each pedigree. The HGF c.482+1986_1988delTGA variant is denoted as del3. Families 1–3 include individuals without congenital hearing loss who are double heterozygotes for variants in GJB2 and SLC26A4 and for variants in GJB2 and HGF, providing evidence against digenic inheritance for these specific combinations of genes and variants.

Figure 2

Two families segregating autosomal recessive hearing impairment in which the variable phenotype of affected individuals helped to identify intra-familial locus heterogeneity. The box on the lower side shows the legend for genotypes. Genotypes beneath each symbol follow the order and colors of variant names listed above each pedigree. In family 5, a MYO7A frameshift variant segregates with Usher syndrome in branch 1, while the GJB2 c.71G>A (p.(Trp24*)) variant segregates with nonsyndromic hearing impairment in branch 2. Family 6 is from a community isolate, which descended from four ancestors who immigrated to North America from Germany in the seventeenth century. Four NSHI-causal variants segregate in this family. The GJB2 c.35delG haplotype 1 (red) was inherited by 27 of >200 genotyped community members, while GJB2 c.35delG haplotype 2 (black) was identified in 21 genotyped family members who descended from two immigrants originating from nineteenth century Austro-Hungary. Individuals 82 and 96 of family 6 were screened using OtoSCOPE,30 an NGS capture array targeting NSHI and selected hearing impairment syndrome genes, but no causal variants were identified.

Figure 3

Pedigrees in which linkage analysis and/or NGS were used to identify intra-familial or intra-sibship locus heterogeneity. In families 7–10, linkage analysis and homozygosity mapping prior to NGS helped to identify the causal variant. Families 8, 10 and 11 demonstrate intra-sibship locus heterogeneity. For family 11, the parents are known to be related (double bars), but the exact relationship is unknown.

The phenotype was similar across hearing-impaired family members, except for families 5 and 6 which demonstrate both locus and phenotypic heterogeneity. Family 5 segregates the GJB2 c.71G>A (p.(Trp24*)) variant in branch 2 (Figure 2). In branch 1, progressive vision loss due to retinitis pigmentosa in three hearing-impaired siblings provided evidence for Usher syndrome (MIM 276900) and led to the identification of the MYO7A c.397dupC variant within this sibship (Figure 2).18

The majority of hearing-impaired members of European-American family 6 (Figure 2) had no other clinical features, except for two individuals. Individual 99 presented with enlargement of the vestibular aqueducts (EVA), which suggested that HI may be due to SLC26A4 variants (Table 2 and Figure 2). Recessively inherited SLC26A4 variants can cause NSHI with or without EVA (MIM 600791) as well as Pendred syndrome (MIM 274600).15, 19 Additionally, individual 93 had temporal bone abnormalities, which are consistent with the distinctive phenotype due to variants in POU3F4 (Figure 2).20 Screening of SLC26A4 identified the c.1334T>G (p.(Leu445Trp)) variant in individual 99,19 while a 3.1-Mb deletion that includes POU3F4 was detected in individual 93. Thus, based on clinical information and screening of GJB2, SLC26A4 and POU3F4, the genetic etiology of HI for the majority of hearing-impaired members of European-American family 6 has been resolved, with the exception of individuals 82 and 96 (Figure 2).

Although for families 1–6 phenotypic information and/or genetic screening aided identification of HI-causal variants even in the presence of locus heterogeneity, for many pedigrees and traits, screening of selected genes or variants is not possible because: (a) a large number of genes are known to be involved in disease; (b) the gene(s) responsible for the greatest disease burden may be difficult to screen (eg, because they have a large number of exons and/or harbor many causal variants); and (c) many genes that harbor causal variants have not been identified. For families 7–11, we elucidated the presence of locus heterogeneity by the identification of NSHI variants using linkage analysis, NGS or both.

Linkage analysis and homozygosity mapping were used to identify individuals segregating the same NSHI variant in families 7–10 (Figure 3), which were genotyped with either microsatellite (family 7) or SNP (families 8–10) markers (Table 3). For families 7–9, the observed maximum LOD scores were much lower than the MLOD scores obtained from simulation (Table 3). In family 7, initial screening revealed the HGF c.482+1986_1988delTGA variant; however, this variant segregated only in branch 1 and not in branch 2 (Figure 3). Additionally, the MLOD score for this family was 5.73, but the observed maximum LOD score was 1.48. When linkage analysis was performed separately for each branch, a LOD score of 3.59 was obtained at the HGF region for branch 1, while a LOD score of 2.53 was obtained for branch 2 at the 9q21.12-q21.13 region, which contains TMC1. Sanger sequencing of all coding exons of TMC1 identified the c.1114G>A (p.(Val372Met)) variant21 and confirmed co-segregation of this variant with HI in branch 2.

Table 3 LOD scores and mapped loci for four NSHI families with linkage data

Using the entire pedigree structure of family 9, a LOD score of 1.65 was observed although the MLOD score was 6.27. Additionally, the mapped region 4q21.21 did not harbor any known NSHI genes. Sanger sequencing showed co-segregation of the CIB2 c.272T>C (p.(Phe91Ser)) variant with NSHI within branch 2 only. When each branch of pedigree 9 was analyzed separately, there were several linked regions per branch (Table 3). However, the 15q24.1-q26.1 region that contains CIB2 provided the highest LOD score of 2.41 for branch 2, while for branch 1, one of the linked regions was at chromosome 7q, which encompasses SLC26A4. A DNA sample from hearing-impaired individual 19 who is homozygous for the 7q haplotype underwent exome NGS, which revealed a known variant SLC26A4 c.716T>A (p.(Val239Asp))15 that co-segregates with HI within branch 1 (Figure 3).
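
The comparison of observed maximum LOD scores with simulated MLOD scores for families 7 and 9 can be expressed as a simple screen. The scores below are taken from the text; the function names and the cutoff `min_ratio` are hypothetical, and the cutoff is an arbitrary illustrative value rather than a threshold used in this study.

```python
# (observed maximum LOD, simulated MLOD) for the whole pedigree, from the text.
observed_vs_mlod = {
    "family 7": (1.48, 5.73),
    "family 9": (1.65, 6.27),
}


def lod_ratio(observed, mlod):
    """Fraction of the simulated maximum LOD that was actually observed."""
    return observed / mlod


def suspect_heterogeneity(observed, mlod, min_ratio=0.8):
    """Flag a pedigree whose observed LOD falls well short of its MLOD.
    min_ratio is an illustrative cutoff, not a validated threshold."""
    return lod_ratio(observed, mlod) < min_ratio


flags = {family: suspect_heterogeneity(obs, mlod)
         for family, (obs, mlod) in observed_vs_mlod.items()}
```

Both pedigrees reach only about a quarter of their simulated MLOD and are flagged. A flag of this kind is only a prompt for branch-wise analysis, because a deflated LOD score can also result from uninformative markers or errors in clinical or pedigree information.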

For families 8, 10 and 11 (Figure 3), subdividing the pedigrees into branches consisting of sibships could not completely resolve the problem of locus heterogeneity due to intra-sibship heterogeneity. Two of these families (10 and 11) consist of only one branch with intra-sibship locus heterogeneity, while family 8 displays intra-sibship heterogeneity in one of its branches (branch 3). Therefore, for the analysis of these branches, the affection status of the siblings needs to be varied.

For family 8 (Figure 3), when all three branches were analyzed together, the region of homozygosity at 10q26.3-qter did not encompass GJB2 or SLC26A4, which are involved in NSHI (Table 3). Sanger sequencing of GJB2 revealed that individual 31 of branch 2 is homozygous for the c.35delG variant while individual 29 of branch 1 is compound heterozygous for c.35delG and c.71G>A (p.(Trp24*)). For branch 3 (Table 3), the chromosome 4p region was homozygous in all three affected sibs and produced a LOD score of 2.4, but this region does not contain any known NSHI gene. When individual 34 was considered to have unknown affection status, a number of regions were homozygous and produced the same LOD score of 1.8; two of these regions contained NSHI genes RDX and SLC26A4 (Table 3). DNA from individual 33 of branch 3 underwent exome NGS and was found to be homozygous for the known variant SLC26A4 c.1337A>G (p.(Gln446Arg)). Sanger sequencing confirmed that his hearing-impaired brother (individual 32) is also homozygous for this variant while his other hearing-impaired brother (individual 34) is wild type. The cause of HI for individual 34 remains unknown.

Family 10 has four affected children whose parents are first cousins (Figure 3). For family 10, when all sibs are considered affected the observed maximum LOD score and the MLOD score were equal (3.01), but the mapped region on chromosome 7 does not include the causal variants (Table 3). For these four hearing-impaired siblings, we varied the affection status by alternately assigning unknown affection status for each sib and also different pairs of sibs. Although this approach led to the identification of many regions (data not shown), it revealed which among these affected siblings were likely to segregate the same causal variants. When affection status of individual 10 was made unknown, a large region on chromosome 7, which includes SLC26A4, was homozygous in individuals 9, 11 and 12, and linkage analysis produced a LOD score of 2.4, which is the same as the MLOD score for the selected siblings. Linkage analysis with individuals 9 and 11 having unknown affection status led to mapping of several regions, including chromosome 6p24.3-p12.3, which contains LHFPL5. DNA from individual 12 who is homozygous for chromosomal regions 6p and 7q underwent exome NGS. This individual is homozygous for two variants SLC26A4 c.716T>A (p.(Val239Asp)) and LHFPL5 c.246delC. Sanger sequencing revealed that her brothers (9 and 11) were homozygous for the SLC26A4 variant but not the LHFPL5 variant, whereas individual 10 was homozygous for the LHFPL5 variant but not the SLC26A4 variant (Figure 3).
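
The procedure of alternately assigning unknown affection status to single sibs and to pairs of sibs amounts to enumerating subsets of the affected sibship. A minimal sketch follows, using the individual IDs of family 10; the function name and the `max_unknown` parameter are hypothetical, and each resulting scheme would still need to be analyzed with linkage software.

```python
from itertools import combinations


def affection_status_schemes(affected_ids, max_unknown=2):
    """Yield (kept_affected, set_to_unknown) for every scheme in which
    between 1 and max_unknown affected individuals are recoded as unknown."""
    for n_unknown in range(1, max_unknown + 1):
        for unknown in combinations(affected_ids, n_unknown):
            kept = tuple(i for i in affected_ids if i not in unknown)
            yield kept, unknown


# Four hearing-impaired siblings of family 10.
schemes = list(affection_status_schemes([9, 10, 11, 12]))
```

For four siblings this gives 4 single-sib and 6 pair schemes, that is, 10 additional analyses. The list includes the two schemes described above: individual 10 set to unknown, and individuals 9 and 11 set to unknown. The number of schemes grows quickly with sibship size, which is one reason that conventional LOD score significance thresholds do not carry over to this kind of analysis.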

Family 11 also presents intra-sibship heterogeneity (Figure 3). For this pedigree, DNA is available for two hearing-impaired family members. Hearing-impaired individual 5 is homozygous for GJB2 c.71G>A (p.(Trp24*)), whereas her affected sister is heterozygous for this variant. DNA from individual 7 underwent NGS after targeted capture for coding exons of all previously reported HI genes. NGS and confirmatory Sanger sequencing revealed that individual 7 is homozygous for SLC26A4 c.691G>A (p.(Val231Met)).15

In Pakistani families with HI for which we have performed linkage analysis and sequencing, we detected intra-familial locus heterogeneity in 15.3% (95% CI: 11.4%, 19.9%) (Table 4). The families with locus heterogeneity (N=45) have at least one variant in a previously published HI gene that segregates with HI in some but not all hearing-impaired family members. Conversely, in those families with locus homogeneity (N=249), a variant within a known HI gene completely segregates with HI.
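
The point estimate follows from the family counts: 45 of 45+249=294 families, or 15.3%. The method used for the confidence interval is not stated here; the sketch below assumes an exact binomial (Clopper-Pearson) interval, solved by bisection using only the standard library, and is offered as one plausible way to check the reported bounds rather than as the method actually used.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def solve_increasing(g, lo=0.0, hi=1.0, iters=100):
    """Bisection root of a function g that increases from negative to positive."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval (assumed method)."""
    # Lower bound: P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else solve_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


n_het, n_hom = 45, 249
proportion = n_het / (n_het + n_hom)
ci_low, ci_high = clopper_pearson(n_het, n_het + n_hom)
```

The resulting bounds can be compared with the reported interval of 11.4% to 19.9%.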

Table 4 Number of families with or without locus heterogeneity for previously reported hearing impairment genes

Discussion

Linkage analysis alone12, 14, 22, 23 or combined with homozygosity mapping11, 24 for consanguineous pedigrees segregating autosomal recessive traits can help elucidate if there is locus heterogeneity within a pedigree. Using information on the pedigree structure, mode of inheritance, affection status, penetrance and availability of genotype data, simulation studies can be used to obtain the expected MLOD (EMLOD) and MLOD scores for each pedigree. The MLOD score is the highest LOD score obtained for all replicates while the EMLOD score is the average for all replicates of the LOD score, which has been maximized over the recombination fraction (θ) for each replicate. For some replicates, the highest possible LOD score for the pedigree may not be obtained because: (1) marker loci are not fully informative; (2) there is reduced penetrance; or (3) there are phenocopies. If the phenotype under study is fully penetrant without phenocopies, it is reasonable to assume that fully informative marker loci, which are completely linked to the disease locus, are available for analysis and therefore the MLOD score should be obtained, otherwise the EMLOD score should be evaluated. For autosomal recessive HI, the disease is fully penetrant, and it is unlikely that there are phenocopies so it is permissible to evaluate the MLOD. If there is allelic heterogeneity or compound heterozygosity within a pedigree, linkage analysis will facilitate the mapping without a deflation in the LOD score compared with the MLOD/EMLOD score. However, when the underlying disease etiology is due to variants in multiple genes segregating in a single family, the MLOD/EMLOD score for the pedigree will usually not be achieved and low LOD scores are often observed. For autosomal recessive traits, if homozygosity mapping is performed in the presence of locus or allelic heterogeneity, no common region of homozygosity will be observed between affected individuals.
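
The distinction between the MLOD and EMLOD scores can be made concrete with a short sketch. It assumes that, for each simulated replicate, LOD scores have already been computed over a grid of recombination fractions by the linkage software; the function name is hypothetical and the toy values are invented for illustration only.

```python
def mlod_and_emlod(replicate_lods):
    """replicate_lods holds one list per simulated replicate, each containing
    LOD scores evaluated over a grid of recombination fractions."""
    # Maximize the LOD score over theta within each replicate.
    per_replicate_max = [max(lods) for lods in replicate_lods]
    mlod = max(per_replicate_max)                            # highest over all replicates
    emlod = sum(per_replicate_max) / len(per_replicate_max)  # average over replicates
    return mlod, emlod


# Invented LOD curves for three replicates (columns: increasing theta).
toy_replicates = [
    [3.0, 2.5, 1.0],
    [2.1, 2.4, 1.2],
    [3.0, 2.6, 0.9],
]
mlod, emlod = mlod_and_emlod(toy_replicates)
```

In this toy case the MLOD is 3.0 while the EMLOD is about 2.8, because one replicate does not reach the highest possible score.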

If a pedigree fails to provide a LOD score near the expected maximum values, analyses can be performed by breaking the pedigrees into branches to see whether or not each branch produces the MLOD/EMLOD score for that branch.25 Homozygosity mapping can also be performed separately for each branch. However, if there is intra-sibship locus heterogeneity, this approach can likewise fail. In this case, linkage analysis and homozygosity mapping can be performed by analyzing subgroups of the affected individuals, by comparison of the observed maximum LOD score with the MLOD/EMLOD score for the pedigree subgroup and, when appropriate, identification of those affected individuals who share the same homozygous region. Because of multiple testing, significance levels that are usually used for LOD scores are no longer valid; for example, a LOD score of 3.3 is no longer equivalent to a genome-wide significance level of P=0.05.26 Even if the observed maximum LOD scores that are obtained are equal or close to the MLOD/EMLOD score, due to the small pedigree size the disease locus may not map to a unique region. Additionally, disease loci may not necessarily lie within the regions with the highest LOD score (Table 2). For example, a region with a LOD score of 3.0 is only 1.6 times more likely to contain the gene of interest compared with a region with an observed LOD score of 2.8; the difference in the likelihood is so small that the causal gene may lie in either region. Once NGS is completed, multiple linked regions can readily be followed up to identify putatively causal variants. Subgrouping affected individuals while performing linkage analysis can help to resolve which pedigree members are most likely to have the disorder or trait due to the same variant. However, if the family unit is very small, for example, a single affected individual with parents, it will not be informative for linkage, or the difference in likelihoods is so small that it is difficult to evaluate which subset of pedigree members are affected due to the same gene.
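
The factor of 1.6 quoted for LOD scores of 3.0 and 2.8 follows from the definition of the LOD score as a base-10 logarithm of a likelihood ratio, so that a difference of d LOD units corresponds to a relative support of 10^d. A minimal check, with an illustrative function name:

```python
def relative_support(lod_a, lod_b):
    """Likelihood ratio between two regions implied by their LOD scores."""
    return 10 ** (lod_a - lod_b)


ratio = relative_support(3.0, 2.8)  # 10 ** 0.2, roughly 1.58
```

By the same relation, a full LOD unit of difference would be needed before one region is favored tenfold over another.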

Linkage analysis and haplotype information can be used to guide which samples to select for NGS. For pedigrees with locus homogeneity selection of sample(s) for NGS is based upon the entire pedigree, whereas for families with locus heterogeneity samples are selected within each pedigree subgroup predicted to segregate the same causal variant. Selecting a sample for NGS from one affected individual is usually sufficient for autosomal recessive traits when the pedigree is consanguineous; however, for autosomal dominant traits it is advisable to select samples from two or more affected individuals. The selection of DNA samples for NGS can be based upon haplotypes within a pedigree or pedigree subgroup, where it is desirable to select individual(s) with the smallest shared haplotype region. Linkage information combined with NGS and filtering can reduce the number of variants for follow-up.5, 6, 27 When there is strong evidence that a causal variant lies within a mapped region, variant selection can be easily performed even with relaxed filters (eg, occurrence in variant databases, population frequency >0.1%). On the other hand, when no causal variants are identified despite strong linkage evidence, this can give a hint that the variant may have been missed due to poor coverage by NGS or alternative strategies (eg, CNV calling) might be needed.

As the cost of NGS decreases, instead of sequencing DNA samples from a subset of family members, an affordable strategy will be to sequence all available pedigree members. Performing linkage analysis using NGS genotypes from the entire family will aid in identifying potentially causal genes/variants that segregate with disease.28, 29 The linkage analysis and homozygosity mapping approach described in this article can also be used with NGS data.

In summary, we demonstrate that linkage analysis and homozygosity mapping are still useful in the age of NGS. Although families presented here have HI, the proposed strategies discussed here and outlined in Figure 4 should be applicable to any Mendelian trait. In the presence of intra-familial locus heterogeneity, the causal variants may be identified by: (1) initial screening of genes that are known to harbor disease-causal variants; (2) use of additional clinical information that may give phenotypic clues to genetic etiology; (3) when locus heterogeneity is suspected, performing simulation and linkage analysis or homozygosity mapping for branches or subsets of individuals in order to identify individuals most likely to share the same disease-causal variant; (4) using linkage and haplotype information to select DNA samples from affected individuals for NGS; and (5) considering locus heterogeneity when evaluating co-segregation of variant with phenotype within the family. We have shown that occurrence of intra-familial locus heterogeneity, including intra-sibship heterogeneity, is not rare and that taking intra-familial locus heterogeneity into account while analyzing pedigree data can increase the success rate in the identification of causal variants for Mendelian traits. Although we have concentrated on demonstrating the presence of locus heterogeneity in pedigrees segregating Mendelian disease, this problem is not unique to Mendelian traits but also occurs in families segregating complex traits particularly in those with bilineal inheritance.

Figure 4

Workflow for identification of causal variants in families with Mendelian traits.