Main

Since 1956, when Tjio and Levan made the landmark observation that human cells contain 46 chromosomes, the practice of cytogenetics has witnessed many innovations, including chromosome banding techniques, fluorescence in situ hybridization (FISH), spectral karyotyping, and comparative genomic hybridization (CGH).1 Of these advances, G-banded karyotyping is the most useful method for whole-genome analysis and has hence been the gold standard for analyzing chromosomes for the past several decades. As these advances helped illuminate the causes of a range of genetic disorders and cancers, experts in the field formulated criteria for their use in clinical diagnostics. The most recent and perhaps the most consequential advancement yet, microarray-based CGH (array CGH), finally offers a method for whole-genome analysis at resolutions much greater than that possible with conventional karyotyping.2 This technology holds considerable promise for the field of medical genetics, but several stipulations must be appreciated as it begins to take its place in cytogenetics laboratories.

BACTERIAL ARTIFICIAL CHROMOSOME (BAC) ARRAYS

Completion of the Human Genome Project and improvements in microarray technology led to the development of the first “genomic arrays,” which differed from previous arrays that consisted mainly of expressed sequence probes. The first genomic arrays, developed by various academic centers and commercial entities (Table 1), incorporated large clones derived from the initial physical mapping stages of the Human Genome Project.3–5 Most of these were BAC clones that were larger and more stable than alternative types of clones. The first BAC arrays were “targeted” formats, containing proprietary collections of clones from specific genomic regions, designed to detect deletions or duplications associated with known genetic disorders.3,4 These targeted arrays were followed by “whole-genome” arrays, which contained clones selected at regular intervals across the entire genome. For example, one commercial whole-genome BAC array contains about 2600 clones spread across the genome with an average distance of 1 Mb between clones.6,7 Some academic centers have also developed whole-genome arrays with similar resolution, as well as tiling arrays that consist of overlapping clones spanning the entire genome.8–10 Irrespective of the design of BAC arrays, their underlying concept is the same. These arrays essentially represent FISH experiments en masse and can be used to screen for many genetic disorders simultaneously.

Table 1 Examples of current array CGH products and services

OLIGONUCLEOTIDE ARRAYS

Soon after the first BAC arrays were introduced, various companies developed arrays containing single-stranded oligonucleotide probes (Table 1). Unlike BAC arrays, oligonucleotide arrays do not have the same underlying design concept. Variations include manufacturing method, sample throughput capacity, probe length (25–80 bases), and presence or absence of single nucleotide polymorphisms (SNPs) within the probes.11 Although a discussion of these differences is important and has relevance to diagnostic testing, it is beyond the scope of this review. However, it is worthwhile to emphasize that oligonucleotide probes are constructed in situ using different patented technologies that allow the arrays to be quickly customized. All available oligonucleotide platforms are whole-genome arrays, but they can be customized either to function as targeted arrays (similar to available targeted BAC arrays) or to substantially increase resolution in a specific genomic region of interest. This versatility will eventually prove useful as array-CGH technology becomes widespread.

RESOLUTION OF TARGETED VERSUS WHOLE-GENOME ARRAYS

The resolution of any genomic array is defined by the length of the individual probes and the distance between adjacent probes on that array. Classifying the resolution on an array as macroresolution or microresolution may also be useful. We define macroresolution as the average space between probes across the entire genome, and it applies only to whole-genome arrays. Microresolution would be the density of probes within a specific region of interest and is relevant to both targeted and whole-genome arrays. For example, a targeted BAC array may have a 100-kb microresolution in the 1p36 region, whereas a whole-genome oligonucleotide array may have a 150-kb macroresolution across the entire genome, but a 10-kb microresolution within the critical region for Williams syndrome. Higher microresolution gives users greater confidence in observed data if multiple probes within a region indicate the same result. Ultimately, the macro- and microresolution on an array have direct bearing on the use of array CGH for surveying regions associated with known genomic disorders, for identifying novel chromosomal anomalies, and for refining the boundaries of observed chromosomal imbalances.
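The two definitions above can be made concrete with a short calculation. The following is only a minimal sketch: it assumes probes are represented by single base-pair coordinates on one chromosome, and the function names and the probe layout (a 150-kb backbone with a 10-kb dense region) are hypothetical illustrations, not a description of any actual array.

```python
from statistics import mean


def mean_spacing(positions):
    """Average distance (bp) between adjacent probes on one chromosome."""
    ordered = sorted(positions)
    if len(ordered) < 2:
        raise ValueError("need at least two probes to measure spacing")
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return mean(gaps)


def microresolution(positions, start, end):
    """Average probe spacing (bp) restricted to the region [start, end]."""
    inside = [p for p in positions if start <= p <= end]
    return mean_spacing(inside)


# Hypothetical layout: one probe every 150 kb along a 3-Mb stretch,
# plus one probe every 10 kb inside a 100-kb region of interest.
backbone = list(range(0, 3_000_001, 150_000))
dense = list(range(1_000_000, 1_100_001, 10_000))
probes = sorted(set(backbone) | set(dense))

macro = mean_spacing(backbone)  # spacing of the evenly spread probes
micro = microresolution(probes, 1_000_000, 1_100_000)
print(macro, micro)
```

In this made-up layout the backbone spacing works out to 150 kb and the spacing inside the dense region to 10 kb, mirroring the whole-genome oligonucleotide example in the text.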

CLINICAL CONTEXTS FOR TARGETED ARRAYS

Targeted BAC arrays were originally designed to test for microdeletion and microduplication syndromes (genomic disorders) and for gains or losses of subtelomeric DNA with a single assay (Table 2). The medical genetics community has, however, debated the utility of such an assay. Some downplay the utility of targeted BAC arrays, stating that most conditions that they test for can be clinically diagnosed under most circumstances and that a simple FISH test can usually verify those diagnoses. Others argue that a clinical diagnosis cannot be made with certainty in every situation because phenotypic variability in many disorders is not fully appreciated, even within well-characterized syndromes. For instance, a targeted array may help clarify the underlying cause if a patient is too young to have manifested the full phenotype for a disorder, if the disorder shows significant variable expressivity, or if a single patient shows characteristics of multiple disorders. For these reasons and because targeted BAC arrays can also detect subtelomeric abnormalities, they have been in considerable demand since their inception.

Table 2 Clinical contexts for targeted versus high-resolution whole-genome array CGH

The American College of Medical Genetics recommends high-resolution karyotyping and subtelomere-targeted FISH as part of the diagnostic workup for children with developmental delay.13 These patients often demonstrate mental retardation as they grow, and many suffer from congenital anomalies. Conventional subtelomere-targeted FISH detects cytogenetic imbalances in about 2.5% of these patients.14 Targeted BAC arrays have already uncovered subtelomeric anomalies in many patients whose results from subtelomere-targeted FISH were normal.4,15 This is because pathogenic rearrangements can occur along greater lengths of subtelomeric regions than those covered by the current set of FISH probes used to analyze subtelomeric rearrangements. For example, 1p36 deletion syndrome (OMIM #607872) is caused by large deletions that vary in size and location within 1p36. Targeted BAC arrays with high microresolution within 1p36 have defined the boundaries for these deletions and provided insight into genotype-phenotype correlations.16,17

Targeted BAC arrays are also useful for classifying supernumerary marker chromosomes (Table 2).18–20 Conventional methods for analyzing marker chromosomes are time-consuming and labor-intensive and require complex cocktails of FISH probes. With these methods, identifying the exact origin of a marker chromosome is difficult, and determining the amount of euchromatin present is harder still. This has direct relevance to predicting phenotypic outcome in those individuals carrying marker chromosomes. Provided that the marker chromosome is present in a reasonable percentage of cells (≥25% mosaicism to 100% of cells), targeted BAC arrays with adequate microresolution in the pericentromeric regions of chromosomes can more easily identify the origin of a marker chromosome and also define its euchromatic segments. However, it is important to recognize that large BAC clones in the pericentromeric regions can contain repeats, and, unless each clone on a targeted array is carefully selected to ensure that it maps to the appropriate chromosome and produces signals with good specificity, results may be equivocal and require follow-up.18 Targeted BAC arrays are also limited in their ability to define euchromatic segments to intervals smaller than the BACs themselves, which are 80 to 200 kb.

Targeted BAC arrays also have some more general advantages. First, they can be used to quickly examine regions of known clinical relevance in a multiplex fashion. Second, when a deletion or duplication is identified, confirmation is straightforward because the clones that initially detected the abnormality can themselves be used as FISH probes. Third, well-designed targeted arrays generally detect only abnormalities that have been previously described (with the exception of some subtelomeric rearrangements), hence making data interpretation straightforward. On some targeted arrays, sparse coverage outside regions of known medical relevance can identify novel rearrangements, but this is generally infrequent.

A major limitation of targeted arrays, of course, is that when clinical signs and symptoms do not fit any of the disease-associated chromosomal anomalies for which the targeted array is intended to screen, then the arrays cannot be used to survey other regions of the genome in detail. For instance, children with global developmental delay are typically evaluated by high-resolution karyotyping and subtelomere-targeted FISH analysis.13 In this regard, targeted BAC arrays are useful for detecting some anomalies in subtelomeric regions, but they lack even coverage in the rest of the genome and therefore cannot address the high-resolution karyotyping component. This is a context in which whole-genome arrays prove indispensable.

CLINICAL CONTEXTS FOR WHOLE-GENOME ARRAYS

At the simplest level, whole-genome arrays offer everything that targeted arrays offer and more. However, the benefits of that “more” in a diagnostics setting are under debate, as the advantages of whole-genome arrays are accompanied by specific stipulations. Table 2 summarizes the appropriate clinical contexts for using the two types of arrays. The major advantage to using whole-genome arrays is clear. Geneticists must often evaluate conditions that do not fit the criteria for known genetic disorders but that are strongly suspected to have a chromosomal component. In these cases, whole-genome inspection allows an in-depth survey that could lead to the discovery of novel chromosomal imbalances. Numerous novel pathogenic deletions and duplications have already been identified using a whole-genome approach.5,8,10,21–24 An excellent example is the recently discovered 17q21 microdeletion syndrome (OMIM #610443). The deletion is <1 Mb and was identified by four separate groups using whole-genome array CGH.21,25,26 This disorder is thought to account for as many as 1% of all cases of mental retardation.25,27 Another example of a cytogenetic anomaly discovered with whole-genome array CGH is the deletion responsible for Nablus masklike facial syndrome (OMIM #608156), a multiple malformation syndrome.28 Several more novel disorders will be discovered in the near future as whole-genome array CGH is more widely applied.

High-resolution whole-genome arrays are of significant interest to the cytogenetics community because of their genome-wide assaying capacity as well as versatility. Commercially available whole-genome BAC arrays have moderate resolution (1 Mb), and no high-resolution BAC arrays (such as tiling arrays) are available on the market. Thus, high-resolution whole-genome arrays will be defined as high-density oligonucleotide arrays for the purposes of this review.

One major advantage of high-resolution whole-genome arrays is that they can potentially define deletion or duplication boundaries with greater precision than BAC arrays. Knowing the extent of some cytogenetic alterations may provide insight into genotype-phenotype correlations. One illustrative example is the multiple malformation syndrome, Greig cephalopolysyndactyly syndrome (GCPS; OMIM #175700), which is caused by haploinsufficiency for GLI3 and adjacent genes in 7p14.1. The prognosis in GCPS patients depends on the size of the deletion, which is difficult to determine by FISH or DNA sequencing. Focused array CGH with very high microresolution (730 bp) in the GLI3 region has recently been applied to a cohort of GCPS patients to accurately define different deletion boundaries.29

Most contiguous gene deletion or duplication syndromes occur due to segmental duplications that predispose a specific region of the genome to rearrangement.30 Hence, patients with one of these disorders usually carry rearrangements with the same breakpoints. Only a few disorders, such as GCPS, have been characterized by deletions or duplications of varying sizes, but more likely exist, particularly in complex repeat-rich genomic regions that are vulnerable to rearrangements. Indeed, part of the reason that more such disorders have not been identified may be that sensitive technologies like high-resolution whole-genome array CGH were not available to precisely define the deletion or duplication intervals. Moreover, even some well-known genomic disorders may have subtle differences in deletion sizes that will come to light as array CGH is more widely used. Hence, disorders mapped to gene-dense regions or segmental duplication-rich regions27 and those linked to large genes would be amenable to focused high-resolution array CGH analysis. Some of these disorders include Duchenne and Becker muscular dystrophies (OMIM #310200, 300376), α-thalassemia (OMIM #141800), 1p36 deletion syndrome (OMIM #607872), Prader-Willi and Angelman syndromes (OMIM #176270, 105830), and DiGeorge/velocardiofacial syndrome (OMIM #188400, 192430).

Another advantage of high-resolution whole-genome array CGH is that it may be useful for evaluating the breakpoints in apparently “balanced” translocations. A proportion of translocations that appear to be balanced by conventional karyotyping are actually unbalanced with small losses or gains of DNA at the breakpoints. While it may prove useful for evaluating a patient's phenotype, investigating such breakpoints with conventional methods can be difficult and labor-intensive. High-resolution whole-genome array CGH can potentially identify such alterations, and customized oligonucleotide arrays with high microresolution in the regions of interest can quickly define deletions or duplications at the translocation breakpoints. Numerous studies have already emphasized the value of this approach.31–34

The versatility of high-density oligonucleotide arrays is a function of in situ probe synthesis chemistry, the digital workflow that controls that synthesis, and creative feature and array formats. Users can generate customized arrays quickly for two possible clinical uses. First, users may prepare focused high-resolution arrays to evaluate findings on a high-resolution whole-genome array. This may serve as a confirmation method [similar to multiplex ligation-dependent amplification or quantitative polymerase chain reaction (qPCR)] and provide more detail about the cytogenetic imbalance in question. Second, it is only a matter of time before users begin adding probes in various submicroscopic regions, such as individual exons, promoter regions, and imprinting centers, to detect small deletions or duplications that may be pathogenic. This will be particularly true for large genes. One study has already confirmed the feasibility of this approach by detecting deletions in the DMD gene, which is mutated in Duchenne and Becker muscular dystrophies (OMIM #310200, 300376), and in other disease-associated genes.35 Genomic regions that contain imprinting centers, such as that associated with Prader-Willi and Angelman syndromes in 15q11-q13 (OMIM #176270, 105830), are also good candidates for high-resolution whole-genome array CGH since copy imbalance in these regions can cause imprinting disorders.36

Some oligonucleotide array platforms are based on SNP probes. Although initially produced for genotyping to facilitate disease-association studies, these arrays have been successfully used to detect genomic copy number.37,38 SNP arrays can not only detect gains or losses of DNA but can also leverage their genotyping capability to detect uniparental disomy and loss of heterozygosity. Moreover, these arrays could eventually include probes to screen for disease-related point mutations, such as those that cause cystic fibrosis (OMIM #602421), β-thalassemia (OMIM #141900), and a variety of other single-gene disorders.

There are clear advantages to high-resolution whole-genome arrays, but, paradoxically, the complexity of data generated from these arrays can itself be a major constraint in their use. As the macroresolution of an array increases, more sequence variants can be detected, and the proportion of small variants among the total number of variants detected also increases. This is the single most challenging aspect of using oligonucleotide arrays designed with high-density probes at regular intervals. Therefore, it is important to keep background noise to a minimum so that the data from an array are highly reproducible and accurate and can identify all potential alterations. This can be addressed by using dye-reversal hybridization, a process in which two arrays (non-SNP arrays only) are separately hybridized with differentially labeled patient and control DNA samples. The two samples hybridized to the first array are identical to the two hybridized to the second array, but the labeling patterns are reversed (e.g., patient-Cy5/control-Cy3 on the first array and patient-Cy3/control-Cy5 on the second array). Data from the two arrays are then superimposed to ensure that results are consistent. This approach is especially necessary as the size of the smallest observed sequence variant in a given region approaches the microresolution of that region. In this regard, high-density probe coverage can be very useful because multiple adjacent probes would likely detect a single sequence variant, thereby improving confidence in that observation.
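The superimposition step in a dye-reversal experiment can be sketched in a few lines. This is an illustrative sketch only, not a description of any vendor's software. It assumes each array reports one log2(Cy5/Cy3) ratio per probe, so that a true copy-number change has opposite signs on the two arrays. The function name and the tolerance of 0.3 are hypothetical choices made for the example.

```python
def combine_dye_swap(forward, reverse, tolerance=0.3):
    """Combine per-probe log2(Cy5/Cy3) ratios from a dye-reversal pair.

    forward: array 1, patient labeled with Cy5 and control with Cy3.
    reverse: array 2, patient labeled with Cy3 and control with Cy5.
    Returns a list of (patient_vs_control_log2, consistent) tuples.
    The tolerance is an arbitrary illustrative threshold (assumption).
    """
    if len(forward) != len(reverse):
        raise ValueError("both arrays must report the same probes")
    combined = []
    for f, r in zip(forward, reverse):
        # A real change flips sign on the reversed array, so subtracting
        # and halving averages the two estimates of patient/control.
        value = (f - r) / 2
        # Dye-specific artifacts do not flip sign, so f + r drifts from 0.
        consistent = abs(f + r) <= tolerance
        combined.append((value, consistent))
    return combined


# Hypothetical ratios for three probes: unchanged, single-copy loss,
# and a dye artifact that appears as a "gain" on both arrays.
forward = [0.02, -0.95, 0.60]
reverse = [-0.01, 1.05, 0.55]
results = combine_dye_swap(forward, reverse)
for value, consistent in results:
    print(round(value, 3), consistent)
```

In this toy input the second probe is supported by both arrays, whereas the third is flagged as inconsistent because its signal does not reverse with the labels.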

Two difficulties arise due to the high rate of detecting sequence variants on high-resolution whole-genome arrays. First, the smaller a sequence variant, the more difficult it is to find reagents (e.g., BAC clones) to verify that variant by FISH. If the variant is smaller than the length of an average BAC, then molecular methods such as qPCR or multiplex ligation-dependent amplification must be used. However, these methods cannot distinguish between different types of chromosome rearrangements, each of which can have very different recurrence risks.

Another serious issue related to the detection of sequence variants on high-resolution arrays is distinguishing pathogenic sequence variants from benign ones. It has become evident that the human genome contains a much greater degree of normal variation than previously expected in sequence lengths between 1 kb and 3 Mb. Accounting for as much as 5% of the total human genome, these large-scale variants are referred to as copy number variants (CNVs) to distinguish them from smaller variants, such as SNPs, short tandem repeats, short interspersed repeats, and long interspersed repeats.37,39–41 CNVs are generally considered benign in terms of disease association, and some are known to be polymorphic (i.e., present in at least 1% of the population). In the context of array CGH, more CNVs will be detected as the macroresolution increases, thereby making data interpretation more complicated.

Ongoing efforts to catalog human CNVs have been successful, but the record is still incomplete.7,37,41 Two public databases that are attempting to capture information on all human CNVs are proving to be particularly valuable for clarifying array CGH data.42,43 Within the next 3 to 5 years, these databases will become more comprehensive and thus of greater assistance in interpreting array CGH data in a diagnostic setting. Hence, as a first step in analyzing array CGH data generated from a high-resolution whole-genome array, potential chromosomal alterations should be matched against the CNV databases to exclude benign variants. Any remaining alterations would represent potentially pathogenic changes that can be verified by comparison with array CGH data or follow-up FISH data from parental samples. If the parents do not carry the same rearrangement as their child (and nonpaternity is not an issue), clinicians can be a little more confident that the de novo rearrangement in the child is pathogenic. Finally, considering the gene content in that region and surveying the literature for similar alterations (either pathogenic or benign) will further clarify the results from a proband sample. Eventually, the CNV databases, along with other genomics databases, may integrate into the Online Mendelian Inheritance in Man (OMIM) database to provide quick and clear explanations for all observed cytogenetic alterations.
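The stepwise interpretation described above (exclude catalogued CNVs, then compare with parental data) might be sketched as follows. This is a simplified illustration under stated assumptions: alterations are plain genomic intervals, a "match" means at least 50% of the proband's interval is covered, and all names, coordinates, and the overlap threshold are hypothetical. It does not query any real CNV database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int
    end: int


def overlap_fraction(query, other):
    """Fraction of `query` covered by `other` (0.0 on different chromosomes)."""
    if query.end <= query.start:
        raise ValueError("interval must have positive length")
    if query.chrom != other.chrom:
        return 0.0
    shared = min(query.end, other.end) - max(query.start, other.start)
    return max(shared, 0) / (query.end - query.start)


def triage(call, known_cnvs, parental_calls, min_overlap=0.5):
    """Classify one proband alteration; min_overlap is an assumed cutoff."""
    if any(overlap_fraction(call, c) >= min_overlap for c in known_cnvs):
        return "catalogued_cnv"   # likely benign; exclude first
    if any(overlap_fraction(call, p) >= min_overlap for p in parental_calls):
        return "inherited"        # also present in a parent
    return "de_novo"              # review gene content and literature next


# Hypothetical data, not real loci.
known_cnvs = [Interval("chr1", 100_000, 300_000)]
parental_calls = [Interval("chr2", 5_000_000, 5_400_000)]
proband_calls = [
    Interval("chr1", 120_000, 280_000),
    Interval("chr2", 5_050_000, 5_350_000),
    Interval("chr3", 9_000_000, 9_080_000),
]
labels = [triage(c, known_cnvs, parental_calls) for c in proband_calls]
print(labels)
```

A label of "de_novo" here only marks a candidate for the further steps named in the text; it is not a statement about pathogenicity.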

Another approach to aid the interpretation of high-resolution array CGH data is to use stringent algorithms to evaluate the data. Algorithms that measure signals from oligonucleotide arrays are versatile in that they can be designed to evaluate signals only from regions of interest. Hence, regions known to contain benign CNVs can be excluded from analysis; alternatively, until all CNVs are carefully studied, they can be flagged and given scores for pathogenicity based on gene content, repeat content, and correlations with information in CNV databases. In addition, it is possible to survey only regions of known clinical significance, hence allowing a whole-genome array to perform more like a targeted array. If necessary, at an added service cost, the initial masked results can later be unmasked to survey the rest of the genome for potential rearrangements. Notwithstanding potential medical and/or ethical dilemmas, this approach would be comparable with routine versus high-resolution karyotyping and may be a logical first step as high-resolution whole-genome array CGH becomes part of clinical diagnostics.
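The masking idea can be illustrated with a small sketch. It assumes alterations and regions of interest are (chromosome, start, end) tuples; the function names and coordinates are hypothetical, and real analysis software would apply masking to probe-level signals rather than to finished calls.

```python
def overlaps(call, region):
    """True if two (chrom, start, end) intervals share at least one base."""
    return (
        call[0] == region[0]
        and call[1] < region[2]
        and region[1] < call[2]
    )


def apply_mask(calls, regions_of_interest):
    """Split calls into those reported now and those held back (masked)."""
    reported, masked = [], []
    for call in calls:
        if any(overlaps(call, region) for region in regions_of_interest):
            reported.append(call)
        else:
            masked.append(call)
    return reported, masked


# Hypothetical regions of known clinical significance and array calls.
regions_of_interest = [("chr1", 1_000_000, 2_500_000)]
calls = [
    ("chr1", 1_200_000, 1_900_000),   # inside a region of interest
    ("chr4", 8_000_000, 8_060_000),   # elsewhere in the genome
]
reported, masked = apply_mask(calls, regions_of_interest)
print(reported)   # behaves like a targeted array
print(masked)     # can be "unmasked" later if a wider survey is requested
```

Keeping the masked calls rather than discarding them is what allows the later genome-wide review mentioned in the text without rerunning the array.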

AN IDEAL ARRAY?

Most cytogenetics laboratories will likely incorporate array CGH into their testing services in the next few years. Many may initially opt for targeted arrays because they can adequately recognize known microdeletions and microduplications and detect unbalanced subtelomeric alterations. Importantly, prenatal testing will rely exclusively on targeted arrays, at least in the foreseeable future, because data interpretation and FISH confirmation are straightforward. However, most facilities will likely recognize that high-resolution whole-genome arrays are more versatile, can perform as targeted arrays with appropriate algorithms if desired, and are essential for analyzing patients with disorders of unknown cause.

As array CGH enters the mainstream and users gain more confidence in whole-genome arrays, an ideal resolution for whole-genome analysis needs to be defined. A moderate resolution of 1 Mb has proven insufficient for detecting many pathogenic anomalies using a whole-genome BAC array.21 Clearly, greater macroresolution is needed, but the optimal resolution is not easy to define and will vary for different regions of the genome. This has to be empirically determined. Given the outcomes of recent array CGH studies, it is clear that a macroresolution <100 kb is essential because novel disorders have been described with rearrangements as small as 35 to 100 kb.21,44 Eventually, the American College of Medical Genetics should issue guidelines for the minimum components of a “consensus” array that will bring some agreement on the content of a diagnostic genomic array.

We predict that the general trend will be toward carefully designed, high-resolution whole-genome (i.e., high-density oligonucleotide) arrays with high overall macroresolution (perhaps 50–100 kb) and higher microresolution in gene-rich intervals and regions of known medical significance. However, a couple of issues might slow this trend. First, high-resolution whole-genome arrays are relatively expensive, although the costs are expected to decrease as the technology gains wider acceptance. Second, different users may use different array CGH platforms and even different designs within a single platform, making comparison of findings on two separate arrays difficult. However, enforcing nomenclature standards that include information on sequence coordinates for all genomic alterations reported can alleviate this problem.45

A third and more pressing hurdle in the increased use of high-resolution whole-genome arrays is the difficulty of confirming de novo alterations that they detect. For this reason, as explained earlier, parental DNA samples must be analyzed when a potentially new alteration is found. Typically, novel findings on an array should be followed up by FISH to distinguish between chromosomal imbalances, such as simple deletions and unbalanced translocations or simple duplications, and marker chromosomes. In cases of novel rearrangements in complex genomic regions, such as 15q11,21,46,47 parental samples may have to be examined by array CGH as well. Where FISH proves unreliable because the deletion interval is too small or rich with repeats, follow-up would require molecular methods such as qPCR. This prolonged follow-up could cause considerable anxiety not only for patients and their families but also for laboratory directors, clinicians, and genetic counselors involved in the cases. To address these circumstances, we recommend an initiative within the cytogenetics community to build a library of genome-wide reagents that are quickly accessible for confirming array CGH results. These reagents could include FISH-tested BAC clones or pooled oligonucleotides, multiplex ligation-dependent amplification probes, and isothermal qPCR primers. Ultimately, this targeted versus whole-genome array issue will not exist because all clinically relevant cytogenetic imbalances discovered on whole-genome arrays will eventually be incorporated into the testing menu of a comprehensive targeted array that will have representative confirmation probes for every locus and will also exclude regions of known benign variation.

Cytogenetics will soon witness a seismic shift in testing paradigm as high-resolution whole-genome array CGH is introduced into a growing number of laboratories. This will tremendously benefit the field of medical genetics as novel genomic disorders are discovered and subtleties in both new and previously characterized disorders are revealed. With appropriate regulatory oversight and careful integration into cytogenetics laboratories, this sophisticated technology should help diagnose many more patients with genetic conditions than previously thought possible.