Introduction

Pharmacogenomic testing might be most useful in psychiatric patients who have treatment resistance, intolerable adverse effects or the potential for problematic drug–drug or drug–disease interactions.1, 2, 3, 4, 5, 6, 7, 8, 9 Mayo Clinic, Rochester, MN, USA has an outpatient consultative psychiatry practice that collects a standard set of clinical data for each patient seen in consultation, including a standard method for recording historical psychotropic medication trials. Patients sometimes receive pharmacogenomic tests as a part of their consultation and results obtained are used in management recommendations.5, 6 This study sought to examine the hypotheses that pharmacogenomic genotype knowledge is associated with better clinical, cost and healthcare utilization outcomes, after controlling for other clinical variables that might differentiate tested and non-tested patients.

Patients and methods

Medical records of 2,390 patients seen in the Mayo Clinic Rochester outpatient psychiatric consultation practice between 1 January 2006 and 31 May 2010 were reviewed; 251 (10.5%) patients met the study's inclusion criterion: a Patient Health Questionnaire-9 (PHQ-9, ref. 10) score recorded at the time of psychiatric consultation and at least two PHQ-9 scores recorded after the consultation. Although the PHQ-9 is administered routinely in this practice, most patients present for a one-time consultation or come to Mayo Clinic from other geographic locations and are not followed longitudinally. Therefore, only a minority of patients have longitudinal PHQ-9 scores. A baseline PHQ-9 score was defined as a measure of severity of depression. For tested patients, the baseline PHQ-9 score was the score that most closely preceded pharmacogenomic testing, generally the score from the date of the psychiatric consultation during which testing was ordered. For non-tested patients, the baseline PHQ-9 was the score recorded on the date of their baseline psychiatric consultation. Post-consultation PHQ-9 scores were required to be recorded at least 14 days after the consultation, that is, the minimum amount of time in which a clinical response could reasonably be expected to begin if a medication was initiated by the clinic. A total of 146 of the 251 patients (58%) had undergone at least one of the following pharmacogenomic tests: cytochrome P450 2D6 (CYP2D6), CYP2C19, CYP2C9 and serotonin transporter genotype (5-HTTLPR). Comparisons of demographic and clinical features between tested and non-tested patients were evaluated using Wilcoxon rank-sum and χ2-tests. For the 251 study patients, PHQ-9 scores were grouped into those that occurred on or before the baseline date and those that occurred 14 or more days after the consultation date. A slope representing change in PHQ-9 scores over time was calculated for the tested and non-tested subgroups.
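The slope calculation described above can be illustrated with an ordinary least-squares fit of PHQ-9 score against days since baseline. The Python sketch below is not the study's SAS code: the function names, the choice of days as the time unit and the example scores are all assumptions made for illustration.

```python
from datetime import date

# Post-consultation scores must fall at least 14 days after baseline.
MIN_DAYS_AFTER_BASELINE = 14


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs; needs at least two distinct x values."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def post_baseline_slope(baseline, scores):
    """Slope (PHQ-9 points per day) over scores dated >= 14 days after baseline.

    `scores` is a list of (date, phq9_total) pairs (hypothetical layout).
    Returns None when fewer than two scores qualify.
    """
    eligible = [((d - baseline).days, s) for d, s in scores
                if (d - baseline).days >= MIN_DAYS_AFTER_BASELINE]
    if len(eligible) < 2:
        return None
    days, totals = zip(*eligible)
    return ols_slope(days, totals)


# Hypothetical patient: the day-7 score is excluded by the 14-day rule.
baseline = date(2008, 3, 1)
scores = [(date(2008, 3, 8), 19), (date(2008, 3, 15), 18),
          (date(2008, 4, 14), 14), (date(2008, 5, 14), 10)]
slope = post_baseline_slope(baseline, scores)
```

A negative slope corresponds to falling PHQ-9 scores, that is, improving depression severity over time.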

Of the 251 study patients, 46 had at least two PHQ-9 scores that preceded the baseline psychiatric consultation and at least two PHQ-9 scores that followed it. Of these 46 patients, 29 (63%) had pharmacogenomic testing. For these 46 patients, differences in pre-baseline and post-baseline PHQ-9 slopes were calculated. For each patient, a slope representing the change in pre-consultation PHQ-9 scores over time was calculated using a linear regression model. This was subtracted from the slope calculated using all post-consultation PHQ-9 scores. Differences in slopes among genotype categories were also evaluated. As the distribution of the differences in the slopes for these analyses was not approximately normal, the differences were ranked and then compared between tested and non-tested patients using univariate and multivariable linear regression models. Comparisons of differences in depression score slopes were evaluated using Kruskal–Wallis and Wilcoxon rank-sum tests.
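The rank-then-regress step can be sketched as follows, assuming each patient already has a slope difference (post-baseline slope minus pre-baseline slope). This Python example is illustrative only and is not the SAS procedure used in the study; the data values and function names are hypothetical. With a single 0/1 predictor, the regression coefficient on the ranks equals the difference in mean rank between tested and non-tested patients.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical slope differences (post-baseline minus pre-baseline slope).
slope_diffs = [-0.30, -0.05, -0.01, 0.02, 0.02, 0.40]
tested = [1, 1, 1, 0, 1, 0]  # 1 = pharmacogenomic testing, 0 = no testing

ranks = average_ranks(slope_diffs)
# Univariate regression of ranked differences on the tested indicator.
coef = ols_slope(tested, ranks)
```

A negative coefficient here means tested patients tend to have lower-ranked (more favorable) slope differences. A multivariable version would add covariates such as baseline PHQ-9 score to the same ranked outcome.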

Healthcare utilization costs between January 2006 and June 2010 were retrieved for 92 of the 251 study patients who lived entirely within the local community during the study period and received all of their local healthcare and mental healthcare at Mayo Clinic Rochester. Differences in total costs from pre-baseline to post-baseline were evaluated by subtracting pre-baseline costs from post-baseline costs. Overall, 45 (49%) had pharmacogenomic testing and 47 (51%) did not. Healthcare costs were estimated using the Olmsted County, Minnesota Healthcare Expenditure and Utilization Database maintained at Mayo Clinic. All healthcare costs were adjusted for inflation to 2010 dollars.
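The cost comparison reduces to two steps: express every cost in 2010 dollars, then subtract the pre-baseline total from the post-baseline total. The Python sketch below illustrates that arithmetic. The inflation multipliers are hypothetical placeholders, because the article does not report which price index was used, and the function names and example amounts are likewise assumptions.

```python
# Hypothetical multipliers converting each year's costs to 2010 dollars.
# The study does not state the index it used; these values are placeholders.
TO_2010 = {2006: 1.12, 2007: 1.09, 2008: 1.05, 2009: 1.03, 2010: 1.00}


def inflate_to_2010(amount, year):
    """Express a cost incurred in `year` in 2010 dollars."""
    return amount * TO_2010[year]


def cost_difference(pre_costs, post_costs):
    """Post-baseline total minus pre-baseline total, both in 2010 dollars.

    Each argument is a list of (year, amount) pairs.
    """
    pre = sum(inflate_to_2010(amount, year) for year, amount in pre_costs)
    post = sum(inflate_to_2010(amount, year) for year, amount in post_costs)
    return post - pre


# Hypothetical patient with a baseline consultation at the start of 2008.
pre_costs = [(2006, 1000.0), (2007, 2000.0)]
post_costs = [(2008, 2000.0), (2010, 500.0)]
diff = cost_difference(pre_costs, post_costs)
```

A negative difference indicates lower inflation-adjusted costs after baseline than before it.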

To examine mean numbers of medical and mental health admissions, consultations and follow-up/treatment visits, the time frame was expanded to January 2002 through June 2010, covering the period during which patient records for local patients were consolidated into the Mayo Clinic electronic medical record. Because a previous study indicated variability among consultants in this clinic in their propensity to order pharmacogenomic testing,5 a subset of tested and non-tested patients was identified who saw the four consultants working in the clinic who most frequently ordered pharmacogenomic testing. Nineteen local patients had testing ordered by one of these four consultants, and were compared with 19 non-tested patients whose baseline psychiatric consultation was performed by the same consultants in the same frequency proportion.

Cost data were compared between tested and non-tested patients using linear regression models, both univariately, and after adjusting for factors found in the univariate analysis to be significantly different between tested and non-tested patients. As the distribution of costs was not approximately normal, data were ranked before the modeling. Comparisons of costs among genotypes for tested patients were evaluated using Kruskal–Wallis and Wilcoxon rank-sum tests. As the distribution of the differences was not approximately normal, the differences were ranked and then compared using univariate and multivariable linear regression models. For healthcare utilization data, tests of group means were assessed; the numbers of clinical encounters were adjusted to take into account the varying ratios of time elapsed between the start of the study period and the baseline testing or consultation, and between the baseline testing or consultation and the end of study period. As a previous study had demonstrated variation in ordering patterns among consultants, a separate comparison of group means was conducted for a subgroup of patients with identical proportions of ordering consultant frequencies, and only using the four consultants ordering testing most frequently. In addition, the test of group differences for the entire study sample was conducted a second time, adjusting for ordering consultant, as an additional method to account for this potential confounder of consultant variation in ordering practices.
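The article does not give the exact formula used to adjust encounter counts for unequal observation time. One plausible reading, shown here only as an assumption, is to convert each patient's pre-baseline and post-baseline counts into rates per year of observation, so that patients whose baseline fell early or late in the study period become comparable. The study-period end date of 30 June 2010 and all names below are hypothetical.

```python
from datetime import date

# Study window for utilization data; the exact end day is an assumption.
STUDY_START = date(2002, 1, 1)
STUDY_END = date(2010, 6, 30)
DAYS_PER_YEAR = 365.25


def encounters_per_year(n_pre, n_post, baseline):
    """Return (pre_rate, post_rate) in encounters per year of observation."""
    pre_days = (baseline - STUDY_START).days
    post_days = (STUDY_END - baseline).days
    if pre_days <= 0 or post_days <= 0:
        raise ValueError("baseline must fall strictly inside the study period")
    pre_rate = n_pre / (pre_days / DAYS_PER_YEAR)
    post_rate = n_post / (post_days / DAYS_PER_YEAR)
    return pre_rate, post_rate


# Hypothetical patient: 6 encounters before and 5 after a 1 January 2008 baseline.
pre_rate, post_rate = encounters_per_year(6, 5, date(2008, 1, 1))
```

In this example the raw counts fall from 6 to 5, yet the yearly rate roughly doubles because the post-baseline window is much shorter, which is the kind of distortion a time adjustment is meant to correct.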

Statistical analyses were performed using the SAS software package (SAS Institute, Cary, NC, USA). All tests were two-sided and P-values <0.05 were considered statistically significant. This study was conducted in full compliance with all policies and procedures of the Institutional Review Board at the Mayo Clinic. Study data were abstracted from medical records of patients who gave informed consent for their charts to be reviewed for research purposes.

Results

As this was an exploratory study, over 50 demographic and clinical variables were assessed to determine whether there were differences between tested and non-tested patients that might account for group differences. Table 1 summarizes univariate comparisons that were statistically significant. Major depressive disorder diagnosis frequency, baseline PHQ-9 score, family history of mood disorder, psychiatric hospitalization history and numbers of previous antidepressant, mood stabilizer and antipsychotic trials, were significantly different, all higher in the tested group, indicating greater degrees of psychiatric predisposition and depression severity. There were no statistically significant demographic differences between any groups compared in this study. These factors became the basis of the logistic regression analyses.

Table 1 Comparisons of significant differences between patients with and without pharmacogenomic testing, N=251

Post-day 14 serial depression severity scores

A total of 301 patients had CYP2D6 testing and initial PHQ-9 scores; smaller numbers had CYP2C19 (N=289), CYP2C9 (N=166) and 5-HTTLPR (N=205) testing and initial rating scale scores. Subsets of those patients had serial depression rating scale screening. Approximately one in six patients had as many as five follow-up depression severity scores. There were no significant differences over time among CYP genotype categories. For 5-HTTLPR categories, there was significantly greater improvement among long/long genotype patients at time 4 (N=55, mean duration of time from initial testing 13.3 weeks, range 4.0–53.3 weeks, χ2-value 8.0492, P=0.018) and at time 5 (N=44, mean duration of time from initial testing 18.4 weeks, range 5.6–78.8 weeks, χ2-value 6.1492, P=0.046) (Figure 1).

Figure 1 PHQ-9 scores over time by 5-HTTLPR genotype.

Post-day 14 depression severity rating scale slopes

There were 120 patients with at least two post-day 14 depression severity rating scale scores (mean 7.9; median 4; range 2–53), including 67 (56%) who underwent pharmacogenomic testing and 53 (44%) who did not. The mean slope representing the change in post-day 14 PHQ-9 scores over time, measured from the baseline testing or consultation date, was −0.07 (median 0.00; range −1.47 to 0.17). The mean slope for tested patients was −0.06 (median 0.00; range −1.47 to 0.17) compared with −0.07 (median −0.01; range −1.28 to 0.08) for non-tested patients (P=0.96). There were no statistically significant differences in slopes between tested and non-tested patients after adjusting for diagnosis of major depressive disorder (P=0.66), baseline PHQ-9 score (P=0.39), family history of mood disorder (P=0.96), numbers of previous antidepressant, mood stabilizer and antipsychotic trials (P=0.95), psychiatric hospitalization history (P=0.49) or after adjusting for all of the features listed above (P=0.88). There were no significant differences in post-consultation PHQ-9 slopes among genotype categories (CYP2D6, CYP2C19, CYP2C9, 5-HTTLPR) for tested patients.

Differences in pre-baseline to post-baseline depression severity scale score slopes

There were 46 patients with at least two pre-baseline depression severity scale scores (mean 12.4; median 6.5; range 2–69) and at least two post-baseline scores (mean 9.6; median 5; range 2–54), including 29 (63%) who underwent pharmacogenomic testing and 17 (37%) who did not. The mean difference for all 46 patients between the pre-baseline and post-baseline slopes was 0.00 (median −0.01; range −1.20 to 2.16). The mean difference for tested patients was −0.08 (median −0.01; range −1.20 to 0.15) compared with 0.13 (median 0.02; range −0.18 to 2.16) for non-tested patients (P=0.03). The significant difference between tested and non-tested patients remained after adjusting for diagnosis of major depressive disorder (P=0.028), family history of mood disorder (P=0.046) and numbers of previous antidepressant, mood stabilizer and antipsychotic trials (P=0.041), but not after adjusting for baseline PHQ-9 score (P=0.28), psychiatric hospitalization history (P=0.14) or after adjusting for all of the features listed above (P=0.42). Comparisons of differences in pre-baseline to post-baseline depression severity scale score slopes among genotypes for tested patients are summarized in Table 2. There was a trend for pre:post-baseline mean slope differences to be more favorable for the long/long genotype category (−0.30) than short/long (−0.02) and short/short (−0.04) genotype categories (P=0.06).

Table 2 Comparisons of differences in pre-baseline to post-baseline PHQ slopes among cytochrome P450 genotypes

Summary of healthcare costs and utilization

There was no statistically significant difference in post-day 14 mean healthcare costs between 2006 and 2010 in tested and non-tested patients after adjusting for family history of mood disorder, diagnosis of major depressive disorder, baseline depression severity rating scale score, history of psychiatric admission, and numbers of previous antidepressant, mood stabilizer and antipsychotic medication trials. There were no group differences when the 26 patients with histories of psychiatric admissions were analyzed separately or when patients without histories of psychiatric admissions were analyzed separately. When mean pre-baseline healthcare costs were compared with post-baseline costs, there was a nonsignificant trend for tested patients to have a higher mean cost difference ($5,010, median −$2,167, range −$95,051 to $157,716) than non-tested patients (−$6,693, median −$9,511, range −$108,290 to $210,716; P=0.08). The difference was statistically significant after adjusting for a diagnosis of major depressive disorder (P=0.049) and after adjusting for numbers of previous antidepressant, mood stabilizer and antipsychotic trials (P=0.02), but not after adjusting for baseline depression severity rating scale score (P=0.34), family history of mood disorder (P=0.07), practice setting (P=0.46) or after adjusting for all of the features listed above (P=0.90). After removing patients without history of psychiatric admission from the analysis, there were no significant differences between tested and non-tested patients in terms of differences in pre-baseline and post-baseline costs. There were also no significant cost differences among the subgroup of patients with psychiatric admission histories.

When adjusted pre-baseline and post-baseline healthcare utilization categories were compared between tested and non-tested patients, several nonsignificant trends were observed (Table 3). Tested patients, after baseline testing or consultation, had a trend toward a higher mean number of post-baseline medical-surgical outpatient specialty comprehensive evaluations or consultations (15.6 vs 17.7), but the increase was relatively smaller (P=0.09) than that observed in non-tested patients (15.5 vs 31.3). Tested patients, after the baseline testing or consultation, had a trend toward a higher mean number of psychiatric admissions (0.8 vs 2.2), but the increase was relatively smaller (P=0.09) than that observed in non-tested patients (0.2 vs 2.4). Tested patients, after the baseline testing or consultation, had a trend toward a stable and relatively smaller mean number of subsequent psychiatric consultations or mental health comprehensive specialty evaluations (3.8 vs 3.9) compared with (P=0.07) non-tested patients (1.6 vs 6.1). There were no significant differences between tested and non-tested patients in terms of mean numbers of medical-surgical or mental health follow-up or treatment visits. There were no group differences in mean numbers of electroconvulsive therapy sessions. These trends held (P>0.05 and <0.10) after adjusting for ordering consultant in the analysis, but did not become significant.

Table 3 Adjusted mean medical-surgical and mental health admissions and outpatient encounters following pharmacogenomic testing or psychiatric consultation without pharmacogenomic testing (adjusted for ratio of time between beginning of study period and baseline testing or consultation, and between baseline testing or consultation and end of study period)

For the subset of tested (n=19) and non-tested consultant-adjusted (n=19) patients, two significant differences emerged in terms of healthcare encounters (Table 3). Tested patients, after the baseline testing or consultation, had stable mean numbers of psychiatric admissions (1.0 vs 0.8), relatively smaller (P=0.04) than the increase in mean numbers of psychiatric admissions seen among non-tested patients (0.1 vs 3.8). Tested patients, after the baseline testing or consultation, had lower mean numbers of psychiatric consultations or comprehensive mental health specialty evaluations (5.0 vs 4.2), in contrast (P=0.03) to the higher mean numbers of related encounters observed in non-tested patients (1.5 vs 9.9). There were no significant group differences in mean numbers of medical-surgical clinical encounters of any category or mean numbers of mental health specialty follow-up visits. There were no group differences in mean number of electroconvulsive therapy sessions.

Discussion

Tested patients differed from non-tested patients across a number of clinical domains, including higher likelihood of major depressive disorder diagnosis (in relative comparison with anxiety, other mood and other psychiatric disorders), higher numbers of psychotropic medication trials considered to be adequate in terms of duration and dosage, greater mood disorder predisposition as measured by family history of mood disorder, psychiatric hospitalization history and greater severity of baseline depression as measured by PHQ-9 score (Table 1). These data suggest that consulting psychiatrists at this tertiary care referral center are more likely to order pharmacogenomic tests when patients are more severely depressed, report predisposition for mood disorder, or report treatment non-response or poor tolerability. The average tested patient had 4.6 antidepressant medication trials before pharmacogenomic testing; the correct number of medication trials that may trigger pharmacogenomic testing in future clinical algorithms will depend on prospective data that assess the effect of pharmacogenomic testing on clinical outcomes and on cost-effectiveness.

Though underpowered to detect a signal for some of the variables of ultimate interest, this retrospective study presents preliminary evidence that there may be a potential clinical impact of ordering pharmacogenomic testing, as measured by more favorable post-testing depression severity rating scale score slopes when compared with pre-testing slopes. The difference holds up when adjusting for other group differences, except when controlling for baseline PHQ-9 score and psychiatric hospitalization history, suggesting that at least part of the difference in rating scale score slopes may be accounted for by tested patients being more severely ill and having a larger potential distance for scores to fall relative to non-tested patients.

Healthcare cost and healthcare utilization analyses did not find a significant impact of pharmacogenomic testing on costs in this health system. A conservative approach to detecting a cost impact was used, which may have made it difficult to discern an impact, if present, because 4 years of clinical costs were evaluated. It may be difficult to detect the impact of a single event such as a laboratory test over such a long period of time. In addition, there may be effects on costs not detected by the methodology used. For example, pharmacogenomic testing may have increased some costs (for example, laboratory and outpatient psychotherapy costs) and decreased others (for example, costs related to decreased admissions).

There is evidence from these data of a potential impact of testing in terms of decreased numbers of clinical encounters in some categories, when other potentially contributory clinical variables and ordering consultant are adjusted for, though costs themselves were not appreciably affected. This could be related to variability of costs related to services (for example, an inpatient admission being many times more expensive than other categories of services studied). These findings are preliminary and should be replicated or used to generate hypotheses and estimates of necessary power for prospective studies.

At a translational level, treatment plans for patients who have unfavorable genotypes (that is, poor CYP2D6 metabolizers, presence of a short 5-HTTLPR allele) may include selecting a medication that is predominantly a substrate of a different CYP isoenzyme and having a lower threshold for pharmacological augmentation strategies. Evidence-based non-pharmacological interventions such as cognitive behavioral therapy and behavioral activation may be used more frequently or earlier in the course of treatment.6 The findings from this study, including possible associations between 5-HTTLPR allele category and treatment response, are consistent with findings from other studies.3, 5, 7, 9, 10 On the basis of these potential associations, well-powered prospective studies and other study paradigms are needed to further inform the clinical utility and cost-effectiveness of these tests in the setting of depression and other psychiatric disorders in medically ill patients.

These results are limited by the small sample size and retrospective nature of the study. It was underpowered to fully address the aims of the study across all variables that could potentially account for group differences. The 251 patients who met data analysis inclusion criteria represent only 11% of the 2,390 patients seen in consultation in this clinic, and may not be representative of all clinic patients. For example, 58% of patients who met study inclusion criteria had pharmacogenomic testing; for the overall clinic population, the frequency is 17%. No algorithm or testing parameters were available to guide ordering practice; hence, psychiatric consultants varied in their ordering thresholds and practices.11 Data regarding the number and adequacy of medication trials were dependent on patient memory when previous records were not available; therefore, it was not possible to confirm with accuracy whether genotype categories of tested patients were managed before testing with adequate dosage levels or durations of treatment with medications.

This retrospective study was an exploratory investigation. Therefore, a large number of measures were screened in the univariate analysis without correcting for multiple comparisons. This step was undertaken to improve statistical power for logistic regression; therefore, any results from the univariate analysis should be considered preliminary. Logistic regression is less prone to type 1 errors when adequately powered, but the high number of variables carries a risk of false-positive results.12

Conclusion

Pharmacogenomic testing has the potential to contribute to individualizing outpatient psychiatric practice by helping identify patients prone to treatment resistance or non-tolerance due to genetic factors. Prospective study is important to define under what, if any, specific circumstances pharmacogenomic tools help customize and optimize cost-effective treatment recommendations for each patient's unique genetic and environmental factors.