Introduction

Computational hit-finding is poised to make a major impact in early drug discovery1,2,3,4, enabled by leaps in computational power, increased accessibility to diverse chemical space, improved physics-based methods and the emerging potential of newer machine learning and artificial intelligence approaches. However, despite the promise, no algorithm can currently select, design or rank potent drug-like small-molecule protein binders consistently.

Significant advances in the development of computational methods can be gained through blinded benchmarking exercises, as evidenced by community progress in developing computational methods to predict protein structure from primary sequence. In 1993, when the Critical Assessment of protein Structure Prediction (CASP) exercise5 was launched, humans were often better at predicting protein structures than computational methods. Now, machine learning algorithms can predict the structures of many (but not all) globular proteins as accurately as can be determined experimentally6,7, and progress is being made rapidly to predict the structures of protein complexes8,9.

In computational chemistry, benchmarking exercises similar to CASP have been organized10,11,12,13,14,15,16,17,18, but none are currently operational. In addition, apart from the TDT and DREAM benchmarking initiatives13,14,18, which included a prospective arm in their prediction challenges, there has been no concerted effort to provide experimental testing of predictions, in large part because of the associated costs. There has been no opportunity to fund the synthesis and quality control of predicted compounds and to test their binding rigorously in one laboratory under standardized conditions that facilitate head-to-head comparison of predictions. One confounding issue has been that commercial sensitivities complicate small-molecule-binding benchmarking. A large fraction of the experimental data suitable for benchmarking in silico binding predictions is generated within the pharma industry and kept confidential, rather than being released for general use. In addition, significant advances in computational chemistry technologies are taking place within companies, and massive private investment is flowing into new companies for the development of artificial intelligence methods. These companies are also likely to be reluctant to share their methods in any detail or to see them put to the test publicly.

It is now possible to conceptualize a benchmarking exercise that can overcome some of these limitations. From a financial perspective, the creation of ultra-large libraries of chemicals that can be described in silico and procured on demand2,19 significantly reduces the cost associated with accessing chemical matter to test predictions. The availability of massive amounts of computational resources facilitates data sharing and democratizes the ability to make predictions20.

From an organizational view, there is now community acceptance that public and private sectors can collaborate precompetitively in areas that were once considered commercially sensitive. The ‘open-access, open-source, open-data’ paradigm is accepted as an accelerator of biomedical science21,22. Critically, this paradigm has provided immense scientific value by normalizing the placement of chemical matter, including advanced molecules such as chemical probes, in the public domain without complex and rate-limiting intellectual property agreements21.

Based on this new landscape, we are creating a public–private partnership called Critical Assessment of Computational Hit-finding Experiments (CACHE) to benchmark computational approaches for the identification of a small molecule that binds a targeted protein with high enough affinity and suitable physicochemical properties to qualify as a credible starting point for a drug discovery project. Modelled after CASP, CACHE will organize hit-finding challenges against selected biologically relevant targets, and participants will use various computational methods to predict hits. However, unlike CASP, which was able to piggyback on experiments being done in the structural biology community, CACHE will have an experimental arm testing predictions prospectively. Each challenge will typically include two testing iterations to enable refinement and forward application of successful predictive models. Upon completion of a hit-finding challenge, all data generated by CACHE, including all screening data and chemical structures, will be publicly available without intellectual property restrictions.

The genesis of the CACHE concept

Prompted by recent developments and interest in computational methods, including deep learning, as well as the challenges in identifying the best performing methods, ~80 scientists from industry, academia and funding agencies met virtually in November 2020 to consider potential areas of drug discovery that might benefit from coordinated benchmarking. Of the many areas that were identified, the group prioritized hit-finding as particularly suitable and practical, and an excellent place to begin. To advance the idea, a group of ~30 representatives developed a draft concept for CACHE in four working groups, which focused on: target selection and prioritization; virtual library construction; measuring outcomes; and governance. These groups’ ideas for the CACHE project are presented in this Roadmap.

The CACHE concept

CACHE will present and organize a variety of hit-finding challenges to the community. As a part of this, and as described in detail below, CACHE will identify suitable protein targets, curate the virtual chemical libraries, define success parameters for generated predictions and solicit predictions for hit compounds. For evaluation, CACHE will purchase or otherwise procure the compounds that are predicted to bind, experimentally measure their binding to their intended target, calculate other key properties of the active compounds and share the outcomes openly with the scientific community (Fig. 1). We envision that CACHE, like CASP, will organize multiple rounds of challenges, providing ongoing opportunities for computational scientists, molecular modellers, algorithm developers etc. to improve and test their methods.

Fig. 1: CACHE challenge workflow.

1. Hit-finding challenges: Critical Assessment of Computational Hit-finding Experiments (CACHE) presents a variety of hit-finding challenges to the community, including assessment criteria.
2. Virtual libraries: CACHE will establish and host two virtual libraries: a make-on-demand library (REAL, ZINC20) and a library comprising compounds synthetically accessible by chemists in academia or industry (bespoke chemistry).
3. Participants predict chemical matter and CACHE experimentally tests compounds: each participant will have the opportunity to make two cycles of predictions per round. CACHE will procure and assay the predicted compounds. At this stage, structures of compounds will be made available to all participants, but screening data will be provided only to the specific participant and competition management, in order to serve as a starting point for an additional cycle of predictions.
4. Compounds and data placed in the public domain: once the second cycle is complete, the data package, including all structures and screening data, as well as an assessment of each compound, will be made available to all, without restriction.

PDB, Protein Data Bank; SAR, structure–activity relationship.

CACHE challenges and target selection

CACHE will organize challenges that represent the common scenarios encountered in hit-finding (Fig. 2b). The CACHE target selection committee will select targets appropriate for each of these five scenarios. The committee will define the acceptance criteria for targets in each scenario and use bioinformatics tools to compile a longlist of targets that meet these criteria. Subsequently, they will create a mechanism or mechanisms for the community, including the funders of CACHE, to prioritize from this list of potential targets those that will be included in the benchmarking challenges.

Fig. 2: Target selection consideration and classes of CACHE challenges.

a | Targets will be selected from a longlist of proteins that represent a range of scenarios of varying technical difficulty, are experimentally enabled (for example, there must be a robust binding assay) and, where possible, represent opportunities to make new biological or medical discoveries. Funders can prioritize targets within each challenge. b | The five potential hit-finding scenarios that address key technical questions in computational chemistry. CACHE, Critical Assessment of Computational Hit-finding Experiments; SAR, structure–activity relationship; SMOL, small molecule.

Only targets for which two orthogonal, cost-effective direct binding assays can provide rapid, validated, high-quality experimental feedback will be considered. From this list, CACHE and its funders will use a prioritization scheme that maximizes both the structural diversity of the target proteins and the opportunity to discover new biological insights. The aim is for CACHE to benefit both the computational and the pharmaceutical communities. We anticipate that a funder (such as a disease-focused charity) might consider CACHE as an attractive funding opportunity through the mobilization of a wide global network of computational chemists to focus on their priority target(s) (Fig. 2a). We also imagine that, in lieu of providing direct financial support, funders, foundations or companies might offer in-kind support for CACHE, for example, by offering to evaluate all predictions for a given target or by providing access to computational resources, assay reagents and/or laboratory equipment. Over a 5-year period, we aspire to provide CACHE with the resources to pursue 15 targets representing each of the five hit-finding scenarios, enabling it to fulfil its goals.

Participation guidance and support

Virtual compound library availability

To enable rapid and cost-effective testing of predictions, CACHE will establish a well-defined and robust core make-on-demand virtual library comprising compounds that are readily accessible from commercial vendors at reasonable cost. A combination of Enamine REAL (now providing 21 billion make-on-demand compounds) and ZINC20 (ref.19) (containing over 750 million purchasable compounds) might form the core of this library.

CACHE will annotate compounds in the library with predicted physical properties, such as cLogP, polar surface area and the fraction of sp3 carbon atoms (Fsp3), among others, which will be assessed in the challenge’s success criteria. Ideally, these annotated properties should enable participants to select individual subsets and/or apply relevant filtering as they see fit for their challenge, while ensuring that any such pre-filtering or subset restrictions can be accounted for in any subsequent evaluation and comparison of approaches. CACHE will also create subsets within the initial library, as this classification may be required to accommodate the needs of specific CACHE participants; for example, a 1% or 10% diversity set might be preferred for computationally intensive approaches. The libraries will evolve, such that more compounds will be added as they become commercially available or accessible, and additional library subsets will be created as a function of their performance.
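To illustrate, a minimal sketch of how such annotation might be generated with RDKit (the open-source toolkit cited later in this Roadmap) is shown below; the descriptor set and function choices are illustrative assumptions, not a specification of the CACHE annotation pipeline.

```python
# Hypothetical property annotation for a virtual library entry, using RDKit.
# The descriptor selection below is an assumption for illustration only.
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, rdMolDescriptors

def annotate(smiles: str) -> dict:
    """Compute a few of the physical properties named in the text."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    return {
        "smiles": smiles,
        "cLogP": Crippen.MolLogP(mol),                   # calculated logP
        "PSA": rdMolDescriptors.CalcTPSA(mol),           # topological polar surface area
        "Fsp3": rdMolDescriptors.CalcFractionCSP3(mol),  # fraction of sp3 carbons
        "MW": Descriptors.MolWt(mol),                    # molecular weight
        "RotB": rdMolDescriptors.CalcNumRotatableBonds(mol),
    }

if __name__ == "__main__":
    print(annotate("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin as a stand-in library member
```

Annotations of this kind, precomputed across the library, would let participants filter on the same properties that later appear in the success criteria.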

To accommodate de novo design methods, which design new molecules rather than selecting compounds from commercial vendors, CACHE will test custom-synthesized compounds if they can be procured by participants within 3 months of the completion of the in silico selection step. In later challenges, CACHE may also incrementally explore mechanisms to provide participants access to a virtual library containing new chemistry, where synthetic chemists within academia or industry would be offered the opportunity to contribute to a virtual library that covers new chemical space. In this initiative, chemists would add compounds that they would be willing to synthesize on demand in a timely manner, using emerging synthetic chemistry protocols and their own resources.

At regular and defined intervals over the course of the CACHE benchmarking exercises, the CACHE virtual libraries committee will evaluate the impact of library choice, composition and nature (diversity, size) on both virtual screening capabilities and on general screening success, and recommend changes accordingly.

Evaluating predictions experimentally

At the core of the CACHE initiative will be an experimental hub that will provide rapid, high-quality testing of the predicted hits. Predicted compounds will be submitted to the experimental hub, which will procure the compounds and evaluate them using a binding assay selected to be most appropriate for the protein target. Each compound will be assayed at a single concentration in duplicate, and each positive will be retested in dose–response mode, as well as in an orthogonal biophysical assay, which is critical for the robustness of the experimental results. Feedback will be given first to the participant(s), and participants who made successful predictions will have the opportunity to improve on them by submitting a new set of predictions.

Each CACHE challenge round will take ~18 months, with two cycles of predictions per round, giving participants the opportunity to incorporate learnings from the first cycle into their next designs. The timing and sequence of the proposed challenge round are shown in Fig. 3. Challenges will be staggered to avoid overwhelming the experimental hub. As part of each challenge, participants will be asked to make predictions from a small library constituting the combined list of predicted compounds contributed in the first cycle by all participants. Testing these compounds experimentally and comparing the results with each participant’s predictions will facilitate inter-algorithm benchmarking.

Fig. 3: The timelines of challenge activities.

After reviewing the letters of intent (LOIs), each complete challenge round will take ~18 months, with the various stages outlined.

CACHE benchmarking

Benchmarking computational hit-finding methods poses a challenge, because no single measure, or even combination of measures, can be used to unambiguously quantify the success of virtual screens, let alone determine which binder among many is the best. The affinity of compounds that are active in a primary screen, typically a surface plasmon resonance assay, will be evaluated with an orthogonal biophysical method. Although binding affinity to the desired protein will be the main benchmarking criterion, selectivity against specific off-targets will be tested if called for in the challenge. The solubility and colloidal aggregation23 of hit molecules will be determined experimentally by dynamic light scattering. Insoluble and aggregating compounds will be flagged because precipitation and aggregation are confounders in nearly all binding assays. Common pan-assay interference compounds (PAINS)24, predicted, for instance, by a strong indication of promiscuity with Badapple25, will also be flagged. Method-specific patterns of binding or inhibition that could be associated with nonspecific interaction or aggregation will also be monitored; these include high Hill slopes in IC50 determination plots, linear fitting of surface plasmon resonance data and unreasonable stabilization of proteins measured by differential scanning fluorimetry. Experimental hits will also be subjected to rigorous analytical quality control to confirm the purity of the samples. CACHE will seek to solve the crystal structure of validated hits in complex with their target when robust crystallization protocols are available.
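As one concrete illustration of substructure-based flagging, the sketch below uses RDKit’s built-in PAINS filter catalogue; because the text above names Badapple as the promiscuity predictor, this RDKit-based filter should be read as an assumed, complementary example rather than the CACHE pipeline itself.

```python
# Hypothetical PAINS flagging with RDKit's built-in filter catalogue.
from rdkit import Chem
from rdkit.Chem import FilterCatalog

params = FilterCatalog.FilterCatalogParams()
params.AddCatalog(FilterCatalog.FilterCatalogParams.FilterCatalogs.PAINS)
catalog = FilterCatalog.FilterCatalog(params)

def pains_flags(smiles: str) -> list[str]:
    """Return the names of any PAINS substructure matches (empty list if clean)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return ["unparseable SMILES"]
    return [entry.GetDescription() for entry in catalog.GetMatches(mol)]

# A 5-arylidene rhodanine, a scaffold class commonly caught by PAINS filters,
# versus a trivially clean molecule; exact output depends on the catalogue version.
print(pains_flags("O=C1NC(=S)SC1=Cc1ccccc1"))
print(pains_flags("CCO"))  # expected: []
```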

Before each challenge, CACHE will publish the corresponding success criteria (activity, selectivity, aqueous solubility, lipophilicity, novelty etc.) and how these will be combined into an overall multi-objective score26,27, similar to the oralPhysChemScore (oPCS)28. Binding affinity, aqueous solubility and logD will be measured. Calculated properties include: corrected molecular weight; polar surface area29; number of rotatable bonds; Fsp3 (ref.30); and novelty. The novelty parameter will be defined as the Tanimoto distance relative to the most similar known structures binding that target, as calculated with RDKit (http://www.rdkit.org). The novelty thresholds were chosen based on previous work with circular fingerprints18,31. CACHE will provide the workflows and scripts used to calculate the different descriptors. In one possible scheme (Table 1), active compounds will not be ranked per se but, rather, will be classified into three buckets (green, yellow and red) by summing up the traffic light values for each property. The scoring scheme used to assess a compound’s physical and molecular properties will be similar across the challenges, but the values for potency and selectivity may change, depending on the challenge. For example, compounds with weaker affinity might be acceptable for difficult targets with no reported precedent, whereas higher affinities might be the aim if the challenge is to identify novel chemotypes for precedented targets. As stated above, to facilitate comparison among the methods, all predictions from all participants for a given target will be combined into a single small virtual library, and all participants will also be asked to rank these compounds.
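A minimal sketch of the novelty calculation and traffic-light summing described above is given below, using RDKit Morgan (circular) fingerprints; the threshold and point values are hypothetical placeholders, not the actual Table 1 values.

```python
# Hypothetical novelty scoring: Tanimoto distance to the most similar known
# binder, plus an illustrative traffic-light assignment. Thresholds are
# placeholders, not CACHE's published values.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def _fp(smiles: str):
    """Morgan (circular) fingerprint, radius 2, 2048 bits."""
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

def novelty(candidate: str, known_binders: list[str]) -> float:
    """Tanimoto distance to the most similar known binder (1.0 = fully novel)."""
    cand = _fp(candidate)
    max_sim = max(DataStructs.TanimotoSimilarity(cand, _fp(k)) for k in known_binders)
    return 1.0 - max_sim

def traffic_light(value: float, green: float, yellow: float) -> int:
    """0 points (green) if value >= green, 1 (yellow) if >= yellow, else 2 (red)."""
    if value >= green:
        return 0
    return 1 if value >= yellow else 2

# Example with made-up known binders and placeholder thresholds
nov = novelty("CC(=O)Oc1ccccc1C(=O)O", ["c1ccccc1O", "CC(=O)Nc1ccc(O)cc1"])
print(f"novelty = {nov:.2f}, traffic-light points = {traffic_light(nov, 0.7, 0.4)}")
```

Summing such per-property point values across all scored properties would then place each active compound into one of the three buckets.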

Table 1 Example Critical Assessment of Computational Hit-finding Experiments (CACHE) traffic light scoring scheme for one arbitrary target protein

Top-scoring molecules (Table 1) will be further analysed by a panel of experienced medicinal chemists to provide additional annotation, including opinions on the suitability of the hits to serve as starting points for potential drug discovery programmes. This draws on human experience regarding reactivity, synthesizability, chemical stability, potential toxicity, off-target activity etc. The panel’s reflections will not influence the score but, rather, will help contextualize the output and provide insight for refinement of the scoring process in future challenge iterations.

CACHE output sharing

CACHE will generate three main outputs for the community: screening data, chemical structures and algorithm performance (Box 1). CACHE’s mandate is to ensure that the screening data and the chemical structures are available to the community without intellectual property or other restrictions on use, and in a digitally readable format according to FAIR principles32. These data will also include the composition of the virtual libraries screened, all predicted small molecules (including negative data), all experimental screening results and all screening methods.

CACHE will mandate that participants disclose their computational approaches in sufficient detail to enable an expert in the area to understand the methodology and algorithms. These methodology descriptions will be double-blind peer reviewed by other participants to ensure they contain sufficient information according to the standards of the field. In the interest of encouraging participation from all sectors, participants will not be required to provide access to their code and can remain anonymous. However, CACHE will encourage participants to share their software code and, as stated below, intends to provide a range of financial incentives for those participants who release their code, algorithms and workflows under permissive open-source license terms and, ideally, who also submit their fully automated workflows. In addition, participants must agree that the identity of those who submit top-performing methods (as determined by prespecified criteria agreed to by CACHE and the participants) will automatically be de-anonymized when the screening data and compound structures are publicly released. Participants who agree to share workflows, code and methodology must do so in a FAIR manner32.

Participants will be encouraged to seek peer-reviewed open-access publication of the results of their submissions and detailed analyses of their performance, and to work together to share learnings and identify differentiators of performance. CACHE will organize a workshop following each challenge and coordinate the open-access publication of overview papers for each challenge, perhaps with dedicated special issues of relevant journals to provide a wider forum for participants.

CACHE organization and management

CACHE will be structured as an independent, not-for-profit entity or will be fiscally governed by a not-for-profit organization with aligned goals, such as the Structural Genomics Consortium (SGC) or the Open Group. CACHE or its parent organization will receive funding as described below and will subcontract other organizations (academic, government or industry) to carry out CACHE activities, all under terms that mandate open data sharing. CACHE will create a secretariat to handle administration, fundraising, project management and logistics.

CACHE will be funded in part by members, who will have the opportunity to influence the strategic directions of CACHE through appointments to a governing board (Fig. 4). The governing board will be responsible for making operational decisions, including target selection, participation rules and use of funds. An external scientific advisory board will be appointed by the governing board to provide outside advice on scientific questions such as the strategy for target selection and the metrics for success.

Fig. 4: CACHE governance.

Critical Assessment of Computational Hit-finding Experiments (CACHE) will be structured as an independent, not-for-profit entity. The CACHE governance will include: a governing board constituted by funders (members) and two independent members selected with input from the scientific community; an external scientific advisory board; and a secretariat that will oversee day-to-day operations. The governing board will create three scientific committees: the target selection committee will select protein targets (with the final decision influenced by the governing board); the virtual libraries committee will define the virtual chemistry libraries to be screened; and the hit evaluation committee will create the metrics of success and assess performance against them. Funders who do not wish to play an active role in governance can nominate targets for consideration by the target selection committee.

CACHE plans to launch challenges for each of the five hit-finding scenarios shown in Fig. 2, each challenge occurring at least once over 2 years (Fig. 3). There will be periodic public open calls for participation. For the first rounds, letters of intent will be solicited to better understand the needs and goals of potential participants. All potential participants will be asked to submit brief applications detailing their qualifications to participate and their general intended approach. For inclusivity, the initiative should strive to accept every reasonable application, while paying attention to efficient use of resources.

For each challenge, CACHE will contribute a challenge lead, who will be responsible for the coordination of experiments and logistics. The challenge lead will ensure that best practices are used in challenge design, execution and assessment, and are codified in iteratively revised documents; for instance, these documents could be similar to the living reviews found in the Living Journal of Computational Molecular Science or contributed to the NCATS Assay Guidance Manual. Challenge leads, in consultation with the governing board, will determine the details of specific challenges and which compound properties, experimental or computed, beyond affinity for the target will be incorporated into the overall performance scores.

Challenge leads will also be responsible for determining and executing or delegating the execution of appropriate baseline methods to be run centrally to avoid duplication for participants running many similar baselines. These methods would likely include random local search, simple similarity matching or vanilla docking methods, where applicable. Challenge leads will have the support of the scientific advisory board in making all of these decisions.
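As an illustration of the simplest of these baselines, the sketch below ranks library compounds by their maximum Tanimoto similarity to known actives (applicable in scenarios where such actives exist); the function names and fingerprint choice are illustrative assumptions.

```python
# Hypothetical similarity-matching baseline: rank a SMILES library by maximum
# Tanimoto similarity to a set of known actives and return the top candidates.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def _fp(smiles: str):
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

def similarity_baseline(library: list[str], known_actives: list[str], top_n: int = 100) -> list[str]:
    """Return the top_n library SMILES most similar to any known active."""
    active_fps = [_fp(s) for s in known_actives]
    scored = [
        (max(DataStructs.TanimotoSimilarity(_fp(smi), a) for a in active_fps), smi)
        for smi in library
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [smi for _, smi in scored[:top_n]]
```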

CACHE funding strategy

CACHE intends that its activities, including governance, management, logistics and data sharing, will be supported by a pool of government, industry and charitable funders. Ideally, CACHE funding would also be used to provide subsidies for participants from resource-poor environments, making the initiative more inclusive overall.

The funding of the challenges themselves will be shared among interested funders and participants. Funders, such as a disease foundation, could support challenges of particular interest to them. As CACHE matures, participants will be expected to pay a participation fee reflective of a portion of per-compound costs (including synthesis/procurement and assays). To facilitate this, CACHE will develop a transparent cost structure for each challenge. In the interest of encouraging transparency, CACHE aspires to subsidize the cost of participation for participants who agree to share their methods, code or workflows.

By centralizing the experimentation, CACHE will not only provide standardized data but will also provide logistical and cost savings over carrying out the activities in individual labs. Within CACHE, we estimate that the cost of rigorous experimental testing for 100 compounds is approximately US$25,000 (roughly US$250 per compound); this includes purchase of the compounds, quality control, protein purification, equipment time, primary biophysical assays and hit confirmation using orthogonal assays. CACHE will procure the compounds on behalf of all participants to simplify logistics as well as to provide the opportunity to negotiate bulk pricing.

In the first two competitions, CACHE aims to secure sufficient seed funding to purchase and evaluate ~100 compounds for every qualified participant, but, in subsequent rounds, these costs will be transferred to participants. If participants wish to test more than 100 compounds, or if the number of participants exceeds the initial available funding, participants may also be required to fund some portion of per-compound costs.

CACHE will also be well positioned to collaborate with other successful community initiatives to increase its impact. For example, if CACHE includes a viral target among the challenges, the CACHE predictions might feed into community antiviral development initiatives, such as the COVID Moonshot initiative20. Predicted compounds that pose synthetic challenges can be turned into additional community challenges, such as Merck’s Compound Synthesis Challenge, to design and predict the most efficient synthetic pathway for a given small molecule. Confirmed hits could also be used as starting points to develop new chemical probes.

CACHE success criteria

CACHE will be a long-term project that will be assessed against success metrics of organizational capabilities and community engagement in the short term (1–3 years) and scientific accomplishments in the longer term (year 3 and beyond). Organizational success will be achieved by running the entire workflow, starting with target selection, for several rounds; for example, we expect six rounds to run over ~2 years, where a round includes hit prediction, chemical synthesis, biochemical/biophysical testing of the compounds and analysis/dissemination of the results (Fig. 3). Community engagement success will be defined as generating a constant flow of targets, hit proposals and experimental results from an increasing number of community members over time. Scientific success can likely be analysed only after 12 rounds (year 4), by which point all five types of challenge will have been performed at least two to three times with different targets. Scientific success metrics will include providing unbiased comparisons of which computational methods deliver suitable hits (chemotypes) as starting points for drug discovery, and the number and quality of novel chemical matter for biologically interesting new targets.

With respect to quantitative metrics, we aspire for CACHE to have deposited in the public domain, after 4 years, experimental screening data for 12 proteins and 30,000 drug-like molecules selected by over 100 participants. Over this period, we also expect that computational methods will predict unprecedented hits for 25% of the nominated novel targets. We also expect CACHE to provide clearer guidance as to which computational approaches are most promising for identifying novel small-molecule active substances and, thus, to significantly influence computational hit-finding method development on a global scale.

Summary and next steps

A group of ~50 scientists from the public and private sectors intend to launch a benchmarking initiative to accelerate the development of computational methods to predict small molecules that bind to proteins. The initiative will comprise experimental and data hub(s), which will support a community of participants in their predictions. All data, including chemical structures, will be made available without restriction on use. The initiative intends to attract funding from industry, governments and foundations to support the infrastructure and challenge-specific funding in order to give disease-focused funders the opportunity to enable a community-wide effort to target proteins of interest to them. The intention is to launch the first CACHE challenge in early 2022.