Scaffold hopping from natural products to synthetic mimetics by holistic molecular similarity

Grisoni, Francesca; Merk, Daniel; Consonni, Viviana; Hiss, Jan A.; Tagliabue, Sara Giani; Todeschini, Roberto; Schneider, Gisbert

doi:10.1038/s42004-018-0043-x

Download PDF

Article
Open access
Published: 08 August 2018

Scaffold hopping from natural products to synthetic mimetics by holistic molecular similarity

Francesca Grisoni^1,2,
Daniel Merk¹,
Viviana Consonni²,
Jan A. Hiss¹,
Sara Giani Tagliabue²,
Roberto Todeschini² &
…
Gisbert Schneider¹

Communications Chemistry volume 1, Article number: 44 (2018) Cite this article

8777 Accesses
41 Citations
14 Altmetric
Metrics details

Subjects

A Publisher Correction to this article was published on 20 September 2018

This article has been updated

Abstract

Natural products offer unexplored molecular frameworks for the development of chemical leads and innovative drugs. However, the structural complexity of natural products compared with synthetic drug-like molecules often limits the scaffold hopping potential of natural-product-inspired molecular design. Here we introduce a holistic molecular representation incorporating pharmacophore and shape patterns, which facilitates scaffold hopping from natural products to isofunctional synthetic compounds. This computational approach captures simultaneously the partial charge, atom distributions and molecular shape. In a prospective application, we use four natural cannabinoids as queries in a chemical database search for novel synthetic modulators of human cannabinoid receptors. Of the synthetic compounds selected by the new method, 35% are experimentally confirmed as active. These cannabinoid receptor modulators are structurally less complex than their respective natural product templates. The results of this study validate this holistic molecular representation for hit and lead finding in drug discovery.

Principle and design of pseudo-natural products

Article 03 February 2020

A general strategy for diversifying complex natural products to polycyclic scaffolds with medium-sized rings

Article Open access 05 September 2019

Natural product fragment combination to performance-diverse pseudo-natural products

Article Open access 25 March 2021

Introduction

Natural products have inspired numerous pharmacologically active lead compounds that have entered clinical trials^{1,2,3,4,5,6,7,8,9,10}. Natural products possess desirable molecular frameworks as starting points for small molecule drug discovery¹¹ as they contain larger fractions of sp³-hybridized bridgehead atoms, chiral centers and diverse pharmacophores^5,9,12. However, the majority of natural products in the Dictionary of Natural Products¹³ (DNP) do not have immediate synthetic counterparts⁵. This is partly due to a lack of dedicated research tools and methods to harvest the full potential of natural products for drug discovery, especially for designing ligands when scarce or no target structural information is available. In such a situation, molecular descriptor analysis can support early drug discovery by enabling ligand-based scaffold hopping for hit and lead finding^14,15,16.

Molecular descriptors are numerical representations computationally derived from the underlying molecular structure¹⁷. Molecular descriptors have been mainly used for reductionist representations that capture certain individual molecular features, such as fragments¹⁸ or atom/bond properties¹⁹. However, the structural differences between natural and synthetic compounds limit the scaffold hopping potential of these single-feature representations²⁰.

To this end, we have developed a novel molecular representation to transfer relevant structural and pharmacophore information encoded in natural products to synthetically accessible compounds through similarity-based approaches. In contrast to the conventional single-feature descriptors, these molecular descriptors are holistic, capturing many molecular properties, such as geometric interatomic distances, molecular shape, and the partial charge distribution. From this representation, the new Weighted Holistic Atom Localization and Entity Shape (WHALES) descriptors are obtained.

For proof-of-concept, we employ WHALES to prospectively screen a large library of commercially available compounds, using four phytocannabinoids as natural product queries. Based on this computational analysis, seven out of the twenty compounds identified modulate human cannabinoid receptors (CB₁, CB₂) with low-micromolar potencies, agonistic and antagonistic activity, and different subtype selectivity. Five out of the seven active scaffolds are novel compared to the known cannabinoid receptor ligands from ChEMBL(v23)²¹ and SureChEMBL²². These results demonstrate that WHALES descriptors capture functionally relevant molecular features and enable scaffold hopping from natural products to bioactive synthetic mimetics.

Results

WHALES descriptors

We designed the WHALES descriptors to encode information on geometric interatomic distances, molecular shape, and atomic properties in a holistic way. The respective molecular feature distributions are computed from locally centered atom distances, drawing inspiration from a recently proposed data analysis method²³. For each atom position in a three-dimensional (3D) molecular conformation, the spatial distribution of the surrounding molecular atoms is captured by a weighted atom-centered covariance matrix, which is used to normalize the interatomic distances to account for local feature distributions. The obtained interatomic distances are proportional to the remoteness of each atom from the center of local atomic distributions, measured in variance units. Additionally, to account for potential ligand-receptor interaction patterns (“pharmacophore” features), the contribution of each atom to the atom-centered covariance matrix is weighted by atomistic partial charges, as explained below.

Step 1 Atom-centered covariance matrix calculation

Let X be the matrix of the atom coordinates, containing as many rows as there are non-hydrogen atoms (n) and three columns corresponding to the 3D coordinates of each non-hydrogen atom. The distribution of atoms and their partial charges around any j-th atom is captured using an atom-centered weighted covariance matrix (S_w(j)),

$${\mathbf{S}}_{w(j)} = \frac{{\mathop {\sum}\nolimits_{i = 1}^n {\left| {\delta _i} \right|} \cdot \left( {{\mathbf{x}}_i - {\mathbf{x}}_j} \right)\left( {{\mathbf{x}}_i - {\mathbf{x}}_j} \right)^{\mathrm{T}}}}{{\mathop {\sum}\nolimits_{i = 1}^n {\left| {\delta _i} \right|} }},$$

(1)

where (x_i − x_j) are the differences between the 3D coordinates of the j-th atomic center and those of any i-th atom, while |δ_i| is the absolute value of the partial charge of the i-th atom. The atom-centered covariance is computed for any non-hydrogen atom of the molecule. The weighted covariance matrix is influenced by the density and partial charges of atoms surrounding j. In particular, S_w(j) can be thought of as an ellipsoid centered on j, whose principal axes are oriented in the three orthogonal directions of maximum atom-centered variance; the greater the variance, the longer the corresponding axis of the ellipsoid. This weighted covariance ellipsoid is influenced by (Supplementary Figure 1): (i) the distribution of the atoms surrounding j, since the ellipsoid axes are oriented in the directions of maximal molecular extension; and (ii) the distribution of the atomic properties, which causes a rotation of the atom-centered covariance ellipsoid toward the locations of high absolute partial charge (|δ_i|) densities.

Step 2 Atom-centered Mahalanobis distance calculation

From Sw(j), the atom-centered Mahalanobis (ACM) distance from the center j to any i-th atom is calculated as follows:

$${\rm ACM}(i,j) = \left( {{\mathbf{x}}_i - {\mathbf{x}}_j} \right)^{\mathrm{T}}\,\cdot \,{\mathbf{S}}_{w(j)}^{ - 1} \cdot \left( {{\mathbf{x}}_i - {\mathbf{x}}_j} \right),$$

(2)

All of the pairwise normalized interatomic distances calculated according to Eq. 2 are collected in the ACM matrix (Fig. 1c): Each i-th row of the matrix represents how the i-th atom is “globally perceived” by other atoms, while each j-th column contains the distances from atom j to all the other atoms, where j itself is the center of the molecular feature space. Thus, a column represents how an atom “globally perceives” all the remaining atoms. Atoms located in the directions of high variance will have a smaller relative distance from the atomic center than atoms located in low-variance regions, e.g., atoms residing off the main molecular axis. Due to the normalization procedure based on S_w(j), the ACM distance is dimensionless and asymmetric.

Step 3 Calculation of atomic indices

From the ACM matrix, three indices are calculated for each atom (Fig. 1c):

(1)
Remoteness (Rem), which is the ACM matrix row-average, calculated as follows:
$${\rm Rem}(j) = \frac{{\mathop {\sum}\nolimits_{i = 1}^n {{\rm ACM}(j,i)} }}{{n - 1}},$$
(3)
where n is the number of non-hydrogen atoms. Remoteness is high for atoms with large ACM distances from many atomic centers (global information);
(2)
Isolation degree (Isol), which is the ACM matrix column minimum (excluding the atomic center):
$${\rm Isol}(j) = {\mathrm{min}}_i\left( {{\rm ACM}(i,j)} \right)\quad i\,\≠ \,j$$
(4)

The isolation degree represents the distance of the j-th object from its nearest atom neighbor. The isolation degree is high for atoms located in “peripheral” regions of the molecule, i.e., atoms are surrounded by a few close atoms (local information);
(3)
Isolation-Remoteness ratio, calculated as:

$${\rm IR}(j) = \frac{{{\rm Isol}(j)}}{{{\rm Rem}(j)}}$$

(5)

The Isolation-Remoteness ratio (IR) simultaneously accounts for the local and global information of each atom, assuming high values for atoms residing off the main molecular axis (i.e., high-isolation degree) and a small relative distance from most of the atomic centers (i.e., low remoteness).

The remoteness, isolation degree values and their ratio calculated for negatively charged atoms are assigned a negative sign, as follows:

$${\rm if}\,\delta _j < 0\left\{ {\begin{array}{*{20}{l}} {{\rm Isol}(j) = - {\rm Isol}(j)} \hfill \\ {{\rm Rem}(j) = {\mathrm{ - }}{\rm Rem}(j)} \hfill \\ {{\rm IR}(j) = - {\rm IR}(j)} \hfill \end{array}} \right.$$

(6)

This procedure allows to distinguish positively and negatively charged atoms having the same values of isolation degree and remoteness.

Step 4 WHALES descriptors calculation

Because the number of calculated atomic indices depends on the number of non-hydrogen atoms of the molecule, a binning procedure is applied to obtain a fixed-length representation, enabling the straightforward comparison of molecules with different numbers of atoms. In particular, the WHALES descriptors are calculated as deciles plus minimum and maximum of (i) atomic isolation degrees, (ii) remoteness values, and (iii) isolation/remoteness ratios. Thus, each molecule is characterized by the same number of descriptors (i.e., 11 values for each atomic index, for a total of 33 descriptors), regardless of the number of atoms considered (Fig. 1d). WHALES descriptors are invariant to any roto-translation of molecular coordinates and robust to small conformational changes (Supplementary Figure 2).

For this present proof-of-concept study, Gasteiger-Marsili partial charges²⁴ and MMFF94²⁵ energy-minimized structures were used for WHALES calculations. However, the WHALES descriptors can be computed using any type of energy-minimized structures and partial charge scheme as input, e.g., quantum-chemistry derived partial charges²⁶.

Scaffold hopping from natural products

To assess the potential of WHALES for scaffold hopping from natural products, it was compared to extended-connectivity fingerprints¹⁸ (ECFPs), which represent the molecule as a set of fragments that are radially grown from each non-hydrogen atom. ECFPs are a benchmark in virtual screening campaigns²⁷ due to their widespread availability in numerous software tools, ease of calculation and intuitiveness to chemists. WHALES and ECFPs were compared to detect differences in their representation of natural products compared to synthetic compounds. To this end, we compared 210,119 entries from the DNP with a set of 3,383,942 commercially available synthetic compounds. Each DNP natural product was used as a query to rank the remaining DNP and commercial compounds on a similarity-basis, using WHALES (Euclidean distance) or ECFP (Jaccard-Tanimoto distance) descriptors. WHALES led to a statistically higher (p < 0.001, Wilcoxon signed-rank test²⁸) number of natural compound synthetic neighbors than ECFPs on average (Fig. 2a). Among the 200 nearest natural product neighbors, an average of 26% of the synthetic compounds were concentrated in the top-20 positions for WHALES, compared to 9% for ECFPs (Fig. 2b). This difference reflects the largely different chemical space representations obtained with the two descriptors (Fig. 2c). WHALES descriptors suggest synthetic compound “bridging regions” that connect clusters of natural products. In contrast, the fragment-based perception of ECFPs leads to a clear separation between synthetic compounds and natural products. This comparative study indicates that WHALES may be better suited for scaffold hopping between natural products and synthetic compounds than ECFPs. Thus, we applied WHALES to a prospective virtual screening on the Cannabinoid Receptor (CB) using natural cannabinoids as queries. Retrospective analysis of WHALES on CB actives annotated in ChEMBL showed that WHALES descriptors have a higher scaffold hopping potential on this target when compared to ECFPs (Fig. 3).

Prospective virtual screening

For prospective screening, we selected four of the most abundant active constituents of the cannabis plant (Cannabis sativa) as queries²⁹, namely (Fig. 4): (1) (-)-trans-∆⁹-tetrahydrocannabinol (THC), (2) (-)-cannabidiol (CBD), (3) (-)-cannabinol (CBO), and (4) (-)-trans-∆⁹-tetrahydrocannabivarin (THCV). 1 and 3 act as agonists or partial agonists on CB₁ and CB₂ in vitro. Compound 4 shows dose-dependent agonism on CB₂ and CB₁ in vivo, respectively, while the mechanism of action of 2 is still under debate^29,30,31. Each phytocannabinoid was used in turn to perform a similarity-based virtual screening on the commercial library, with the Euclidean distance calculated on WHALES descriptors as a ranking criterion. The compounds were sorted according to the sum of their reciprocal ranks obtained with each query. The 20 top-ranked synthetic compounds were selected and, without any additional exclusion criteria applied, tested in vitro for their modulatory activity on human CB₁ and CB₂ receptors.

The WHALES-based virtual screening protocol led to the identification of seven active compounds (35% of the selected compounds), with activity values (EC/IC₅₀ and K_B) in the low micro- or nanomolar range and different selectivity profiles (Table 1). Scaffold analysis of the core rings and atomic frameworks³² of the synthetic hits revealed that five out of the seven actives not only differ in their structure from the natural product queries, but they also possess a novel scaffold that is not contained in any of the CB actives (EC/IC₅₀ or K_i/D < 50 μM, 6188 compounds) annotated in ChEMBL²¹ or in the patent literature (SureCHEMBL)²² (Fig. 5). This result demonstrates that the WHALES method is suitable for retrieving isofunctional synthetic mimetics of bioactive natural products.

Table 1 In vitro activity of the queries and the active hits on CB₁ and CB₂

Full size table

Among the novel actives, one non-selective agonist (5, CB₁: EC₅₀ = 3.1 ± 0.5 μM; CB₂: EC₅₀ = 1.8 ± 0.6 μM) and three selective CB₁ agonists (6, EC₅₀ = 4.3 ± 0.7 μM; 7, EC₅₀ > 30 μM; 9, EC₅₀ = 1.0 ± 0.2 μM) were identified. These hits inherited the prevalent agonistic activity from the utilized natural cannabinoids with different selectivity profiles. Computational ligand docking (Fig. 6) suggests that 5 and 6 might act through similar binding poses and interaction patterns to their most similar natural-product templates according to WHALES (THCV [4] and THC [1], respectively). The non-selective antagonist 8 (CB₁: IC₅₀ = 10.1 ± 0.7 μM, K_B = 8.8 μM; CB₂: IC₅₀ = 27.0 ± 0.8 μM, K_B = 1.8 μM) and two selective CB₁ antagonists (10, IC₅₀ = 3.2 ± 0.5 μM, K_B = 0.9 μM; 11, IC₅₀ = 1.3 ± 0.2 μM, K_B = 0.2 μM) were also identified.

The similarity of the predicted binding poses of 5 and 6 to their natural product templates highlights that WHALES descriptors did indeed capture the pharmacophore of phytocannabinoids in terms of shape and partial charge distributions. At the same time, the presence of active hits with antagonistic activity and/or presumably novel receptor pocket interaction patterns demonstrate that the WHALES representation is sufficiently flexible to allow for the discovery of novel ligand-binding motifs. This is due to the “fuzziness” of the WHALES descriptors, which represent molecules by how their pharmacophore properties are distributed in 3D space without any explicit fragment, ring system or atom type information. Considering commercial building block availability, retrosynthetic analysis suggests that the bioactive hits can be prepared in three or fewer steps and are thus more easily synthetically accessible than the natural product queries (Supplementary Figure 3).

The screening library ranks obtained by considering only molecular shape (i.e., WHALES without any charge-based weighting, Eq. 1) or only charge (i.e., deciles of Gasteiger-Marsili partial charges) have a low correlation with those obtained by WHALES (Kendall rank correlation coefficient τ < |0.08|). None of the active hits were scored in the top 1000 of the screening compounds with the shape-only and charge-only descriptions. These results confirm the holistic character of WHALES descriptors, which grasp “emergent” structural features of NPs that cannot be captured by describing single aspects separately.

To assess the ability of WHALES to identify novel actives compared with existing tools, we compared the prospective virtual screening results with six common molecular descriptors (ECFP¹⁸, FeatMorgan¹¹, RDKit³³, MACCS 166³⁴, AtomPair³⁵ fingerprints, CATS¹⁵) and four pharmacophore screening protocols (MOE pharmacophore search³⁶, LigandScout³⁷ ligand-based pharmacophore search, ShAEP³⁸, UFSRAT³⁹). The virtual screening protocol was performed starting from the natural product queries (1–4) on the commercial screening library, using the benchmark methods and the same ranking protocol as in the productive WHALES run (Supplementary Note 1). None of the novel active hits discovered by WHALES were scored in the top 100 lists obtained with any of these alternative methods (Supplementary Table 2). This outcome clearly supports the use of WHALES in medicinal chemistry workflows for the discovery of novel active scaffolds.

Discussion

The results of this study demonstrate the suitability of this holistic virtual screening method for scaffold hopping from natural products to isofunctional synthetic compounds. The WHALES-based molecular representation bridges the gap between natural product and synthetic compound chemical spaces and leads to “bridging regions” of synthetic compounds that connect clusters of natural products. With 35% of the top-ranked compounds exhibiting low-micromolar in vitro activities, WHALES is at least competitive with other screening protocols. Importantly, WHALES proved suitable for retrieving novel active compounds and scaffolds that were not found by other methods for similarity searching. The cannabinoid receptor modulators obtained are structurally less complex than the natural product templates. These results clearly highlight the effectiveness of this novel holistic approach to harvest the potential of natural products by obtaining synthetically accessible, natural product-inspired bioactive compounds and to explore uncharted chemical space regions.

Methods

Compound preparation pre-processing and descriptor calculation

Compound structures were de-salted and protonated (considering a pH = 7) prior to descriptors calculation. Molecular geometry was optimized using the MMFF94²⁵ force field with 1000 iterations and 10 starting conformers for each compound; the minimum energy conformation was used for subsequent descriptor calculation. Gasteiger-Marsili²⁴ partial charges were computed using the RDKit module³³.

Preliminary analysis

Extended-connectivity fingerprints¹⁸ (ECFPs) were calculated using Dragon 7⁴⁰ (size = 1024 bit; 2 bits per pattern, length = 0–4 bonds). Prim’s⁴¹ minimum spanning trees were generated on 15,000 DNP and 15,000 commercial non-overlapping compounds, which were selected randomly. Molecular scaffolds were defined according to Bemis-Murcko³² molecular frameworks using the RDKit module³³.

Commercial library

The library was assembled from commercially available synthetic compounds from four providers: Asinex (http://www.asinex.com/libraries-html/) (Elite, Fragments, Gold and Platinum collections), ChemBridge screening compound collection (http://www.chembridge.com), Enamine advanced and HTS collections (http://www.enamine.net), and Specs screening compounds (https://www.specs.net).

Prospective screening

Phytocannabinoid structures were retrieved from the scientific literature. Structure optimization and descriptor calculation were performed as explained above. Each query was used to perform virtual screening based on Euclidean distance on Gaussian-normalized WHALES values. The virtual screening results of each commercial library compound were merged and sorted according to the sum of their reciprocal ranks on each query. The top-20 screening compounds were purchased from ChemBridge, Enamine, and Specs.

In vitro biological characterization

Screening compounds were purchased and assayed in vitro for agonism and antagonism on cannabinoid receptors CB₁ and CB₂ in functional test systems. For agonistic characterization, CHO cells over-expressing the respective human GPCR were incubated with varying concentrations of each compound for 20 min and cAMP response was quantified by homogenous time-resolved FRET (HTRF). For antagonistic characterization, varying concentrations of the test compounds in competition with a fixed agonist concentration were used. CP55940 (CB₁ agonist, EC₅₀ = 0.035 nM), WIN55212-2 (CB₂ agonist, EC₅₀ = 0.21 nM), AM281 (CB₁ antagonist, IC₅₀ = 10 nM) and AM630 (CB₂ antagonist, IC₅₀ = 0.9 µM) served as reference compounds. For each test compound concentration, a relative cAMP response compared to the respective reference compound was recorded. All experiments were independently repeated at least twice, and results were reported as the mean ± standard error. EC/IC₅₀ values were calculated from dose-response curves using a four-parameter nonlinear regression (Supplementary Figure 4). These assays were performed by Cerep (Celle-L’Evescault, France; www.eurofinsdiscoveryservices.com; assay reference numbers 1744, 1745, 1746, 1747) on a fee-for-service basis.

Docking and homology modeling

The crystal structure of human CB₁ in complex with agonist AM11542 (PDB-ID: 5XRA)⁴² was prepared for docking in MOE (v2016.0802)³⁶. Energy minimization was performed using the Amber10:EHT force field. For each ligand, 60 poses were generated, their energy was minimized using MMFF94x force field within a rigid receptor, and they were ranked by London dG score³⁶. The ten top-scoring poses were refined, re-scored using Alpha HB scoring, and visually inspected. Re-docking of the crystallized ligand led to a small RMSD value (0.39 Å). A homology model of CB₂ (UniProt ID: P34972) was obtained with MODELLER⁴³, using the prepared CB₁ structure as the template. The initial template and target alignment was obtained by Muscle⁴⁴ and then manually adjusted (Supplementary Figure 5). The ligand was retained to consider induced fit effects.

Data and code availability

The authors declare that the data supporting the findings of this study are available within the paper and its supplementary information. Python code implementing WHALES descriptors is deposited as an open source repository on GitHub (https://github.com/grisoniFr/whales_descriptors.git).

Change history

20 September 2018
The original PDF and HTML versions of this Article contained errors in Eqs. (4) and (6). In Eq. (4), the not equal symbol displayed incorrectly as #, and in Eq. (6), the greater than symbol was displayed in place of the less than symbol. This has been corrected in both the PDF and HTML versions of the Article.

References

Molinari, G. Natural products in drug discovery: present status and perspectives. in Pharmaceutical Biotechnology 13–27 (Springer, New York 2009).
Chapter Google Scholar
Koehn, F. E. & Carter, G. T. The evolving role of natural products in drug discovery. Nat. Rev. Drug Discov. 4, 206–220 (2005).
Article CAS PubMed Google Scholar
Patridge, E., Gareiss, P., Kinch, M. S. & Hoyer, D. An analysis of FDA-approved drugs: natural products and their derivatives. Drug Discov. Today 21, 204–207 (2015).
Article CAS PubMed Google Scholar
Harvey, A. L. Natural products in drug discovery. Drug Discov. Today 13, 894–901 (2008).
Article CAS PubMed Google Scholar
Lee, M. L. & Schneider, G. Scaffold architecture and pharmacophoric properties of natural products and trade drugs: application in the design of natural product-based combinatorial libraries. J. Comb. Chem. 3, 284–289 (2001).
Article CAS PubMed Google Scholar
Brown, D. G., Lister, T. & May-Dracka, T. L. New natural products as new leads for antibacterial drug discovery. Bioorg. Med. Chem. Lett. 24, 413–418 (2014).
Article CAS PubMed Google Scholar
van Hattum, H. & Waldmann, H. Biology-oriented synthesis: harnessing the power of evolution. J. Am. Chem. Soc. 136, 11853–11859 (2014).
Article CAS PubMed Google Scholar
Rodrigues, T., Reker, D., Schneider, P. & Schneider, G. Counting on natural products for drug design. Nat. Chem. 8, 531–541 (2016).
Article CAS PubMed Google Scholar
Harvey, A. L., Edrada-Ebel, R. & Quinn, R. J. The re-emergence of natural products for drug discovery in the genomics era. Nat. Rev. Drug Discov. 14, 111–129 (2015).
Article CAS PubMed Google Scholar
Grabowski, K., Baringhaus, K.-H. & Schneider, G. Scaffold diversity of natural products: inspiration for combinatorial library design. Nat. Prod. Rep. 25, 892–904 (2008).
Article CAS PubMed Google Scholar
Morrison, K. C. & Hergenrother, P. J. Natural products as starting points for the synthesis of complex and diverse compounds. Nat. Prod. Rep. 31, 6–14 (2014).
Article CAS PubMed Google Scholar
Henkel, T., Brunne, R. M., Müller, H. & Reichel, F. Statistical investigation into the structural complementarity of natural products and synthetic compounds. Angew. Chem. Int. Ed. 38, 643–647 (1999).
Article CAS Google Scholar
Dictionary of Natural products (DNP) database, v20.1. CRC Press, Taylor & Francis, http://dnp.chemnetbase.com. (2011).
Martin, E. J. & Critchlow, R. E. Beyond mere diversity: tailoring combinatorial libraries for drug discovery. J. Comb. Chem. 1, 32–45 (1999).
Article CAS PubMed Google Scholar
Reutlinger, M. et al. Chemically advanced template search (CATS) for scaffold-hopping and prospective target prediction for ‘orphan’molecules. Mol. Inf. 32, 133–138 (2013).
Article CAS Google Scholar
Grisoni, F. et al. Matrix-based molecular descriptors for prospective virtual compound screening. Mol. Inf. 36, 1600091 (2017).
Article CAS Google Scholar
Todeschini, R. & Consonni, V. Molecular Descriptors for Chemoinformatics. (Wiley VCH, Weinheim, 2009).
Book Google Scholar
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
Article CAS PubMed Google Scholar
Bade, R., Chan, H.-F. & Reynisson, J. Characteristics of known drug space. Natural products, their derivatives and synthetic drugs. Eur. J. Med. Chem. 45, 5646–5652 (2010).
Article CAS PubMed Google Scholar
Rodrigues, T., Reker, D., Kunze, J., Schneider, P. & Schneider, G. Revealing the macromolecular targets of fragment-like natural products. Angew. Chem. Int. Ed. 54, 10516–10520 (2015).
Article CAS Google Scholar
Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2011).
Article CAS PubMed PubMed Central Google Scholar
Papadatos, G. et al. SureChEMBL: a large-scale, chemically annotated patent document database. Nucleic Acids Res. 44, D1220–D1228 (2015).
Article CAS PubMed PubMed Central Google Scholar
Todeschini, R., Ballabio, D., Consonni, V., Sahigara, F. & Filzmoser, P. Locally centred Mahalanobis distance: a new distance measure with salient features towards outlier detection. Anal. Chim. Acta 787, 1–9 (2013).
Article CAS PubMed Google Scholar
Gasteiger, J. & Marsili, M. Iterative partial equalization of orbital electronegativity — a rapid access to atomic charges. Tetrahedron 36, 3219–3228 (1980).
Article CAS Google Scholar
Halgren, T. A. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 17, 490–519 (1996).
Article CAS Google Scholar
Finkelmann, A. R., Göller, A. H. & Schneider, G. Robust molecular representations for modelling and design derived from atomic partial charges. Chem. Commun. 52, 681–684 (2016).
Article CAS Google Scholar
Vogt, M., Stumpfe, D., Geppert, H. & Bajorath, J. Scaffold hopping using two-dimensional fingerprints: true potential, black magic, or a hopeless endeavor? Guidelines for virtual screening. J. Med. Chem. 53, 5707–5715 (2010).
Article CAS PubMed Google Scholar
Wilcoxon, F. Individual comparisons by ranking methods. Biom. Bull. 1, 80–83 (1945).
Article Google Scholar
Pertwee, R. The diverse CB1 and CB2 receptor pharmacology of three plant cannabinoids: Δ9-tetrahydrocannabinol, cannabidiol and Δ9-tetrahydrocannabivarin. Br. J. Pharmacol. 153, 199–215 (2008).
Article CAS PubMed Google Scholar
Turner, S. E., Williams, C. M., Iversen, L. & Whalley, B. J. Molecular pharmacology of phytocannabinoids. in Phytocannabinoids: Unraveling the Complex Chemistry and Pharmacology of Cannabis sativa eds. (Kinghorn, A. D., Falk, H., Gibbons, S. & Kobayashi, J.) 61–101 (Springer International Publishing, 2017).
Martínez-Pinilla, E. et al. Binding and signaling studies disclose a potential allosteric site for cannabidiol in cannabinoid CB2 receptors. Front. Pharmacol. 8, 244 (2017).
Article Google Scholar
Bemis, G. W. & Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893 (1996).
Article CAS PubMed Google Scholar
RDKit: Open-source cheminformatics; http://www.rdkit.org (2017).
MACCS-II, MDL Information Systems Inc, San Leandro, CA, USA (1987).
Carhart, R. E., Smith, D. H. & Venkataraghavan, R. Atom pairs as molecular features in structure-activity studies: definition and applications. J. Chem. Inf. Comput. Sci. 25, 64–73 (1985).
Article CAS Google Scholar
Chemical Computing Group ULC. Molecular Operating Environment (MOE), 2013.08. Montreal, QC, Canada, H3A 2R7. (2017).
Wolber, G. & Langer, T. LigandScout: 3-D pharmacophores derived from protein-bound ligands and their use as virtual screening filters. J. Chem. Inf. Model. 45, 160–169 (2005).
Article CAS PubMed Google Scholar
Vainio, M. J., Puranen, J. S. & Johnson, M. S. ShaEP: molecular overlay based on shape and electrostatic potential. J. Chem. Inf. Model. 49, 492–502 (2009).
Article CAS PubMed Google Scholar
Shave, S. et al. UFSRAT: ultra-fast shape recognition with atom types–the discovery of novel bioactive small molecular scaffolds for FKBP12 and 11βHSD1. PLoS ONE 10, e0116570 (2015).
Article CAS PubMed PubMed Central Google Scholar
Kode srl. Dragon version 7.0.6, 2016, https://chm.kode-solutions.net (2016).
Prim, R. C. Shortest connection networks and some generalizations. Bell Labs Tech. J. 36, 1389–1401 (1957).
Article Google Scholar
Hua, T. et al. Crystal structures of agonist-bound human cannabinoid receptor CB1. Nature 547, 468–471 (2017).
Article CAS PubMed PubMed Central Google Scholar
Fiser, A. & Šali, A. Modeller: generation and refinement of homology-based protein structure models. Methods Enzymol. 374, 461–491 (2003).
Article CAS PubMed Google Scholar
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

The authors thank Petra Schneider and Cyrill Brunner for technical support. This research was financially supported by the Swiss National Science Foundation (grant no. IZSEZ0_177477 to G.S.). D.M. was supported by an ETH Zurich Postdoctoral Fellowship (grant no. 16-2 FEL-07).

Author information

Authors and Affiliations

Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Vladimir-Prelog-Weg 4, CH-8093, Zurich, Switzerland
Francesca Grisoni, Daniel Merk, Jan A. Hiss & Gisbert Schneider
Department of Earth and Environmental Sciences, University of Milano-Bicocca, piazza della Scienza 1, IT-20126, Milano, Italy
Francesca Grisoni, Viviana Consonni, Sara Giani Tagliabue & Roberto Todeschini

Authors

Francesca Grisoni
View author publications
You can also search for this author in PubMed Google Scholar
Daniel Merk
View author publications
You can also search for this author in PubMed Google Scholar
Viviana Consonni
View author publications
You can also search for this author in PubMed Google Scholar
Jan A. Hiss
View author publications
You can also search for this author in PubMed Google Scholar
Sara Giani Tagliabue
View author publications
You can also search for this author in PubMed Google Scholar
Roberto Todeschini
View author publications
You can also search for this author in PubMed Google Scholar
Gisbert Schneider
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

F.G. and G.S. designed the study. F.G. developed WHALES with the support of V.C., R.T., and G.S. and performed the analysis. F.G., G.S., J.A.H., and D.M. analyzed and discussed the results. D.M. curated the experimental results. F.G. performed the docking study with the support of S.G.T.; S.G.T. developed the homology model. F.G. and G.S. wrote the manuscript. All authors contributed to manuscript revision and approved the final version.

Corresponding author

Correspondence to Gisbert Schneider.

Ethics declarations

Competing interests

G.S. declares a potential financial conflict of interest in his role as life science industry consultant and cofounder of inSili.com GmbH, Zurich. All the remaining authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Grisoni, F., Merk, D., Consonni, V. et al. Scaffold hopping from natural products to synthetic mimetics by holistic molecular similarity. Commun Chem 1, 44 (2018). https://doi.org/10.1038/s42004-018-0043-x

Download citation

Received: 29 March 2018
Accepted: 10 July 2018
Published: 08 August 2018
DOI: https://doi.org/10.1038/s42004-018-0043-x

This article is cited by

Artificial intelligence for natural product drug discovery
- Michael W. Mullowney
- Katherine R. Duncan
- Marnix H. Medema
Nature Reviews Drug Discovery (2023)
Structural derivatization strategies of natural phenols by semi-synthesis and total-synthesis
- Ding Lin
- Senze Jiang
- Qingsong Shao
Natural Products and Bioprospecting (2022)
Identifying new topoisomerase II poison scaffolds by combining publicly available toxicity data and 2D/3D-based virtual screening
- Anna Lovrics
- Veronika F. S. Pape
- Gergely Szakács
Journal of Cheminformatics (2019)
Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis
- Alexander Button
- Daniel Merk
- Gisbert Schneider
Nature Machine Intelligence (2019)
Scaffold-Hopping from Synthetic Drugs by Holistic Molecular Representation
- Francesca Grisoni
- Daniel Merk
- Gisbert Schneider
Scientific Reports (2018)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.