We would like to thank S. Roedder and N. Salomonis for their comments (Loaded, locked, drawn: kSORT validated for patient samples. Nat. Rev. Nephrol. http://dx.doi.org/10.1038/nrneph.2016.160)1 on our News & Views article, and we appreciate the opportunity to further address the points we raised (Biomarkers in transplantation — the devil is in the detail. Nat. Rev. Nephrol. 11, 204–205; 2015)2 regarding their paper, which described the development of an assay for the detection of acute rejection in kidney transplant recipients3.

In our article, we pointed out that, with the appropriate prevalence adjustments, the positive predictive value (PPV) of the Kidney Solid Organ Response Test (kSORT) for acute rejection and for subclinical acute rejection would be substantially reduced2. We note that Roedder and Salomonis agree with us that performance metrics, including PPV, depend on prevalence1. However, we are perplexed by their statements that “the true prevalence of acute rejection cannot be fairly determined” and that the prevalence of subclinical acute rejection is also undefined because it “might be missed in patients who never undergo protocol biopsies”. The prevalence of acute rejection is well established4 (Fig. 1), and recent studies have further defined the prevalence of subclinical acute rejection5. Subclinical acute rejection is of particular interest because their own data show that the kSORT assay has low sensitivity and a modest C statistic for this indication3. We contend that unless this issue is addressed through further validation and refinement of the assay, the proposed models risk miscategorization of biomarkers.
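
To make this dependence concrete, the sketch below shows how the PPV of any binary test falls as the prevalence of the condition falls; the sensitivity, specificity and prevalence values are illustrative assumptions of our own, not the published kSORT figures.

```python
# Illustrative sketch (not the kSORT figures): how positive predictive value
# falls with prevalence for a test with fixed sensitivity and specificity.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical operating point, chosen only for illustration.
sens, spec = 0.90, 0.90

for prev in (0.50, 0.25, 0.10):  # case-enriched cohorts vs real-world prevalence
    print(f"prevalence {prev:.0%}: PPV = {ppv(sens, spec, prev):.2f}")
# prevalence 50%: PPV = 0.90
# prevalence 25%: PPV = 0.75
# prevalence 10%: PPV = 0.50
```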

Figure 1: The prevalence of acute rejection following kidney transplantation.
In 2008–2013, rates of clinical rejection in the first 12 months post-transplantation ranged from 8% to 10% among recipients of transplants from deceased or living donors. Permission obtained from American Society of Transplantation © Hart, A. et al. Am. J. Transplant. 16 (Suppl. 2), 11–46 (2016).

We also commented on the necessity of locking down an algorithm to ensure reproducibility. An algorithm is a process, a set of rules to be followed in all subsequent tests, and not, as Roedder and Salomonis suggest, an assurance that the discovery set is locked. Most molecular diagnostic tests based on gene expression use a single fixed, or locked, model to predict a given phenotype. For example, the AlloMap test uses a single fixed model, a linear discriminant analysis (LDA) that fits one linear equation to the measured expression of 11 genes6. Similarly, a three-gene urinary PCR test uses logistic regression with backward elimination and bootstrap-resampling methods to derive a best-fitting, fixed, parsimonious model for the prediction of rejection7.
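
For readers less familiar with this convention, the sketch below illustrates what locking a model means in practice. The 11-gene panel size is borrowed from AlloMap only for context; the data, coefficients and code are placeholders of our own, not the AlloMap model.

```python
# Minimal sketch of a "locked" classifier: fit once on the discovery set,
# freeze the coefficients, and apply the identical equation to every new sample.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X_discovery = rng.normal(size=(200, 11))      # simulated expression of an 11-gene panel
y_discovery = rng.integers(0, 2, size=200)    # simulated phenotype: 0 = no rejection, 1 = rejection

lda = LinearDiscriminantAnalysis().fit(X_discovery, y_discovery)

# "Locking" = persisting the fitted parameters; no re-training on later cohorts.
locked_weights = lda.coef_.ravel().copy()
locked_intercept = float(lda.intercept_[0])

def predict_locked(sample: np.ndarray) -> int:
    """Apply the frozen linear equation, unchanged, to a subsequent sample."""
    score = float(sample @ locked_weights) + locked_intercept
    return int(score > 0)

print(predict_locked(rng.normal(size=11)))    # same rule for every new sample
```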

In sharp contrast to the AlloMap and urinary PCR tests, kSORT uses 13 individual models, each consisting of 12–17 genes. Roedder and colleagues created an ensemble classifier of all 13 models using a commonly used voting procedure8,9,10,11, but they applied this voting procedure in an unorthodox way, without justification. Normally, an ensemble classifier applies different algorithms (support vector machines, random forests, LDA and so on) to the discovery data to predict a phenotype, which gives the classifier the advantage of sampling the range of predictive capabilities of different algorithms9. By contrast, the kSORT analysis suite (kSAS) applies the same Pearson correlation-based algorithm repeatedly in a modified voting procedure that determines the final phenotype, suggesting that a single fixed and locked model capable of yielding acceptable predictive accuracy does not exist. The authors do not clarify the assumptions and performance metrics used to determine the best set of gene features or the number of models applied. Moreover, they provide little evidence that the test can accurately predict the correct phenotype when a sample is tested blind, a requirement for any clinical test. Indeed, by their own account, the final tool required an intensive internal training process that mixed investigator-selected samples of known phenotype from multiple clinical sources, creating the potential for substantial user-imposed bias in the reported performance. Finally, despite the authors' claims, we remain unable to find a description of the exact thresholds used to optimize the correlation coefficients in each individual model in either the manuscript or the supplementary data3.
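
As a point of reference, a conventional voting ensemble of the kind described in the literature9 looks roughly like the sketch below, in which heterogeneous algorithms trained on the same discovery data each cast a vote. The data are simulated and the code is our own illustration of the convention, not a reimplementation of kSAS.

```python
# Sketch of the orthodox voting-ensemble pattern: *different* algorithms are
# trained on the same discovery data and a majority vote decides the phenotype.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 15))            # simulated discovery set (15-gene panel)
y = rng.integers(0, 2, size=150)          # simulated rejection phenotype

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC()),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=1)),
        ("lda", LinearDiscriminantAnalysis()),
    ],
    voting="hard",                        # each heterogeneous model casts one vote per sample
).fit(X, y)

calls = ensemble.predict(X[:5])           # majority-vote phenotype calls for five samples
```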

Another outstanding concern about kSORT is that indeterminate samples (those classified as 'intermediate risk'; 15 of 100 samples) seem to have been left out of the area under the curve (AUC) calculations. For these indeterminate calls, however, one has to assume a low accuracy, as the samples fall below the established threshold. Including the indeterminate calls in the calculations would therefore lower the AUC substantially below the published values. In their correspondence1, the authors cite the AUC of 0.73 for subclinical acute rejection reported for the kSORT assay in the ESCAPE study12, which was co-authored by Roedder and Sarwal (senior author on the original kSORT study3), as evidence for the validity of kSORT. Sarwal recently stated, however, that “biomarker panels developed for graft rejection and tolerance in recent studies provide [receiver operating characteristic (ROC)] curves of >85%”, citing kSORT and the AART study3, and that the AUC “serves as estimated index of overall accuracy and serves a useful practice to compare different ROCs”13.
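
The arithmetic below illustrates this point with invented numbers that are our own assumptions, not kSORT data: for a binary call at a single operating point the AUC reduces to (sensitivity + specificity)/2, and folding 15 indeterminate samples back in under an assumed chance-level accuracy pulls the estimate down appreciably.

```python
# Toy arithmetic, not kSORT results: effect of including indeterminate calls.
def auc_single_point(tp: float, fn: float, tn: float, fp: float) -> float:
    """For a binary call at one operating point, AUC = (sensitivity + specificity) / 2."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Hypothetical determinate calls only: 35 rejectors (32 detected) and
# 50 non-rejectors (45 correctly called).
print(round(auc_single_point(tp=32, fn=3, tn=45, fp=5), 2))                       # 0.91

# Adding the 15 indeterminate samples (say 7 rejectors, 8 non-rejectors),
# assumed to be called with only chance accuracy (expected counts, hence halves).
print(round(auc_single_point(tp=32 + 3.5, fn=3 + 3.5, tn=45 + 4, fp=5 + 4), 2))   # 0.85
```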

Although the correspondence from Roedder and Salomonis is a welcome contribution to this dialogue, we do not feel that their argument offers evidence of any “misunderstanding” or misrepresentation of kSORT or the kSAS algorithm in our original commentary2. Rather, their correspondence offers further technical information that is consistent with our original concerns, without providing the additional clarity that is needed. We do, however, enthusiastically applaud their ongoing validation studies and their efforts to better the lives of transplant recipients.