The T-cell receptor (TCR) repertoire of peripheral T cells is sculpted from the stochastically generated TCRs of developing thymocytes by two selection steps in the thymus: positive selection, in which thymocytes whose TCRs fail to recognize self-peptide–MHC (pMHC) complexes die by neglect, and negative selection, in which thymocytes whose TCRs recognize these complexes with high affinity are deleted. Two models have been proposed to explain how TCR signals determine the outcome of selection. In the threshold model, thymocytes bearing TCRs that signal above a negative-selection threshold are deleted, whereas thymocytes experiencing low to intermediate TCR signal strength survive through positive selection. In the sustained signaling model, high-affinity and low-affinity TCR–pMHC interactions trigger biochemically distinct signaling cascades: low-affinity TCRs induce sustained signaling, whereas signaling after high-affinity stimulation is intense but brief [1]. Little is known about how the TCR sequence itself determines the outcome of selection. In the current issue of Genes & Immunity, Ostmeyer et al. [2] use machine learning to identify how TCR protein sequences influence selection fate.

Many functional peripheral T lymphocytes carry both a productive and a non-productive TCR gene rearrangement [3]. The authors use productive and non-productive TCRB genes from mature T cells to define selected and unselected repertoires, respectively, on the assumption that the sequence of a non-productive TCRB gene, which is never expressed as protein, closely approximates that of an unselected TCR. To obtain protein sequences from these genes, they developed an algorithm that computationally repairs non-productive TCRB genes into productive copies with the fewest alterations, thereby preserving the original biological sequence as far as possible. Because both repertoires arise from the same VDJ recombination process, this design allows known recombination biases in the unselected repertoire to be excluded from the model. Moreover, the approach does not require the repertoire of thymocytes that express only the TCRB chain, which potentially gives it broader applicability. The authors used both sets of TCRB protein sequences to train a machine-learning model that assigns a probability, PSURVIVE, to any TCRB sequence: a gene with PSURVIVE > 0.5 is predicted to be productive, and one with PSURVIVE < 0.5 is predicted to be repaired.
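To make the repair idea and the decision rule concrete, the following is a minimal, hypothetical Python sketch. It assumes that "repair" means deleting as few nucleotides as possible so that the sequence is in frame and free of in-frame stop codons; the deletion-only strategy and the function names (is_productive, repair_by_deletion, predicted_label) are illustrative assumptions, not the authors' published procedure, which may also allow insertions or substitutions or restrict edits to particular gene segments. The last function only encodes the PSURVIVE > 0.5 rule stated above.

```python
from itertools import combinations

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(nt: str) -> bool:
    """Return True if the sequence is in frame (length divisible by 3)
    and contains no in-frame stop codon."""
    if len(nt) % 3 != 0:
        return False
    codons = (nt[i:i + 3] for i in range(0, len(nt), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def repair_by_deletion(nt: str, max_edits: int = 2):
    """Hypothetical repair (assumption: deletions only).

    Tries 0, 1, ..., max_edits deletions in increasing order, so the first
    productive candidate found uses the fewest deletions. Returns that
    candidate, or None if no repair within max_edits exists.
    """
    for k in range(max_edits + 1):
        for positions in combinations(range(len(nt)), k):
            drop = set(positions)
            candidate = "".join(b for i, b in enumerate(nt) if i not in drop)
            if is_productive(candidate):
                return candidate
    return None


def predicted_label(p_survive: float) -> str:
    """Decision rule from the text: PSURVIVE > 0.5 -> productive,
    PSURVIVE < 0.5 -> repaired. Exactly 0.5 is not specified in the text;
    this sketch assigns it to 'repaired'."""
    return "productive" if p_survive > 0.5 else "repaired"
```

For example, repair_by_deletion("ATGGCCA") removes a single nucleotide and returns a six-nucleotide, stop-free sequence, while an already productive sequence is returned unchanged. The brute-force search over deletion positions is practical only for short sequences and small max_edits; it was chosen here for transparency rather than efficiency.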

To test the model, the authors applied it to TCRB genes from developing thymocytes, a population that includes both pre-selection and post-selection cells. The distribution of PSURVIVE for thymic TCRB genes is bimodal. In contrast, the distribution for productive TCRB genes from splenocytes is unimodal, consistent with all splenocytes having survived selection, and similar unimodal distributions were obtained for TCRB genes from blood and colon. This approach is therefore very powerful for distinguishing pre-selection from post-selection TCRB genes.

The authors propose that their approach might find applications in personalized medicine by identifying T-cell repertoires prone to increased autoreactivity and, eventually, to autoimmune disease. Indeed, central tolerance is a key checkpoint, and defects in positive and negative selection have been shown to cause autoimmunity in animal models. However, with the exception of AIRE mutations, which cause autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED), there is little evidence for a role of negative selection in human disease [4]. Rather than the simple failed deletion of peptide-specific autoreactive T cells, more complex defects in thymic selection may contribute to autoimmune diseases such as type 1 diabetes, multiple sclerosis, and rheumatoid arthritis [5, 6]. For example, there has been a decade-long, ongoing debate on the role of positive selection in the more severe rheumatoid arthritis conferred by HLA-DRB1 homozygosity [7]. The rheumatoid arthritis-like manifestations of SKG mice, which carry a mutation in the signaling molecule ZAP70, also appear to be a consequence of faulty positive selection [8].

The approach described in this paper provides a composite assessment of positive and negative selection, and it is not easy to see how the two processes could be separated. Whether both processes are equally represented in the algorithm remains to be clarified. It seems likely that the algorithm is driven mostly by positive selection, that is, by whether the TCR and MHC proteins fit each other on the basis of their 3-D structures. This fit may extend beyond MHC polymorphism; that is, there may be universal structural principles. If the genetic predisposition to autoimmune disease conferred by HLA polymorphisms stems from a better HLA-TCR fit, as discussed above for rheumatoid arthritis, the tool developed here may be very informative. It appears less likely that the algorithm can predict failures of negative selection, whose importance has been inferred mainly from genetically manipulated murine models such as the K/BxN strain and is less evident in spontaneous human disease.

In summary, reconstructing TCR selection in silico from patients' PBMCs will be an important tool for understanding TCR selection processes in human autoimmune diseases. So far, our armamentarium of tests for examining and quantifying these disease processes has been very limited. The approach developed here may eventually provide valuable diagnostic information. Whether and how these insights will lead to new preventive and therapeutic interventions remains to be seen.