Introduction

Visceral cestodiases are caused by the larval forms of parasites of the Cestoda class (cestodes or tapeworms). They constitute an important health and economic burden, with cases detected on all continents1. Disadvantaged socioeconomic segments of society are the most affected by these diseases, which are associated with serious human morbidity2. It is estimated that the global losses resulting from these infections in terms of disability-adjusted life years (DALYs) are equivalent to those of some of the most well-known Neglected Tropical Diseases (NTDs), such as Chagas disease, trypanosomiasis and dengue2. Given their importance for human public health, echinococcoses and cysticercoses, caused by metacestodes from Echinococcus spp. and Taenia solium, respectively, were included as priorities for prevention, treatment and control in the World Health Organization list of NTDs (https://www.who.int/teams/control-of-neglected-tropical-diseases).

In humans, larval cestodiases like echinococcoses and cysticercoses are often asymptomatic or have nonspecific symptoms that depend on the infected organ, the size and location of the cysts and the way they may affect adjacent tissues or organs3,4,5,6. Current therapeutic approaches for these diseases are complex and present several limitations and risks. For instance, for echinococcoses, invasive and risky surgical approaches are often required. For both echinococcoses and cysticercoses, chemotherapy is limited to the use of a small number of drugs, mostly benzimidazole derivatives. These drugs have limited effects, being mostly parasitostatic, and the high doses required for treatment may often give rise to severe side effects4,5,7,8. Therefore, more research to expand the therapeutic possibilities is necessary.

Sequencing of cestode genomes has offered new perspectives in the search for novel and more effective drugs for the treatment of visceral cestodiases9. Over the course of their evolution, cestode genomes have lost several genes in different metabolic pathways, possibly as an adaptation to parasitic life10. This has generated metabolic vulnerabilities that pave the way for the development of new treatments aimed at many potential therapeutic targets11,12. Such targets may also serve for the repurposing of drugs as novel anthelminthics. Drug repurposing is highly efficient, low-cost, and carries less risk, especially in the case of drugs already approved and commercially available for human treatment13.

Classic biochemical studies, as well as genomic studies of endoparasitic Platyhelminthes, i.e. cestodes and trematodes (flukes), indicate that these organisms have deficient pathways for the de novo biosynthesis of sterols and fatty acids14,15,16. Despite that, the survival of these organisms still depends on the presence of sterols, especially cholesterol, which they must take up from their host environment17. In this context, the Niemann-Pick C1 protein (NPC1) plays a key role, promoting cholesterol uptake in these organisms.

NPC1 was initially characterized as an intracellular cholesterol-trafficking protein in humans18. Mutations in the NPC1 gene cause Niemann–Pick type C disease, after which these proteins are named. A second, related human protein, the Niemann-Pick C1-Like 1 (NPC1L1) protein, is responsible for cholesterol uptake in intestinal cells and is associated with cholesterol homeostasis19. Human NPC1 and NPC1L1 share high sequence identity and are structurally very similar, with both presenting an N-terminal luminal domain (NTD), a middle luminal domain (MLD) and a C-terminal luminal domain (CTD), in addition to 13 transmembrane domains, within which the putative sterol-sensing domain (SSD) is located20,21. The three luminal domains (NTD, MLD and CTD) form the cholesterol delivering tunnel, through which cholesterol is transported.

Human NPC1L1 has been described as a pharmaceutical target for the control of high blood cholesterol22. It is inhibited by the drug ezetimibe, which binds to a pocket formed by the NTD, MLD and CTD inside the cholesterol delivering tunnel, blocking the passage of cholesterol through the protein. Repurposing of ezetimibe as an antiparasitic drug has been suggested and tested for the treatment of leishmaniasis, with promising results23. This raises the possibility of repurposing this drug for the treatment of other parasitic diseases, such as visceral cestodiases, by targeting the worm cholesterol uptake pathway.

In this work, NPC1-related proteins were identified in cestodes and, along with their human reference orthologs, were used to search for and retrieve orthologs from a wide range of organisms, including species from virtually the whole Animalia kingdom. The evolutionary history of NPC1-related proteins was inferred through phylogenetic analyses. Darwinian selection analyses were performed to investigate possible functional divergence. The two NPC1-related proteins from the cestode Echinococcus multilocularis and human NPC1L1 were modeled and comparatively analyzed. Moreover, their interactions with ezetimibe were predicted by molecular docking. Based on the evolutionary and structural conservation between a cestode NPC1 and human NPC1L1, the potential of this cestode protein as a target for therapeutic interventions in visceral cestodiases is discussed.

Results

Identification of NPC1 orthologs annotated in public databases

Paralogous NPC1 sequences from Echinococcus multilocularis (EmuJ_001107700 and EmuJ_001107950) and from Hymenolepis microstoma (HmN_003003490 and HmN_003003500), previously annotated as such, were downloaded from WormBase ParaSite 15.0 and used as probes to search for orthologs in other tapeworms. Two NPC1 ortholog sequences were recovered for Mesocestoides corti (MCU_005849_RA and MCU_005850_RB), E. granulosus (EgrG_001107700 and EgrG_001107950), Echinococcus canadensis (EcG7_07448 and EcG7_07485), Taenia multiceps (Tm5G011420 and Tm5G011422), Taenia saginata (TSAs00030g04723 and TSAs00030g04725) and Hymenolepis diminuta (HDID_0000769801 and HDID_0000850401). Three NPC1 ortholog sequences were recovered for Taenia asiatica (TASs00067g06239, TASs00067g06241 and TASs01471g12258) and Taenia solium (TsM_000003800, TsM_000245600 and TsM_001088500).

All probe and recovered tapeworm NPC1 ortholog sequences were revised based on available RNA-seq data. As shown in Fig. 1, most of the tapeworm NPC1 sequences presented lengths between 878 and 1392 aa. However, eight of them (EcG7_07448, TASs00067g06239, TASs00067g06241, TASs01471g12258, Tm5G011420, TSAs00030g04725, TsM_000245600, TsM_001088500) presented noticeable differences in length and/or in domain composition. Among these orthologs, four (TASs00067g06239, TASs01471g12258, TsM_000245600, and TsM_001088500) lacked some domains found in all other NPC1 ortholog sequences, with two of them (namely TASs00067g06239 and TsM_001088500) being shorter than expected, with lengths of 494 and 534 aa, respectively. The nucleotide sequences of these four NPC1 genes lie at the ends of the assembled contigs and are therefore likely incomplete, so they were excluded from further analyses. On the other hand, the other four (namely EcG7_07448, TASs00067g06241, Tm5G011420 and TSAs00030g04725) were longer than expected, with lengths from 1679 to 1795 aa. These longer orthologs presented extra domains in their N-terminal ends, corresponding to domains found in a protein-encoding gene located upstream of the NPC1 gene in other related species. Assuming that this was due to misannotation, the sequences of the extra domains were excluded, while the remaining (corrected) sequences of these four longer orthologs were kept for further analyses (see Fig. 1).

Fig. 1

Domain composition of cestodes and human NPC1 orthologs. Full-length proteins after correction based on RNA-seq data are represented by horizontal black lines, with indication of their N-terminal and C-terminal amino acids; original amino acid numbering prior to the correction is shown between angle brackets, for those proteins (marked by an asterisk) that required correction due to the presence of extra domains in the N-terminus of the sequence (see text). Predicted InterPro matches are represented by colored bars, with NPC1_N domains shown in blue; SSD domains shown in green; NPC1-like family (IPR004765) in red; and protein patched/dispatched family (IPR003392) shown in magenta. Corresponding WormBase IDs are listed on the right. Domain representations marked with asterisk (*) indicate the presence of extra N-terminal domains, which were excluded from our analyses.

Cestode NPC1 ortholog sequences, along with the reference human NPC1 and NPC1L1 proteins (O15118 and Q9UHC9, respectively), comprising an overall set of 20 NPC1 sequences, were functionally analyzed for family and domain predictions, and the results are summarized in Fig. 1. Two characteristic domains of NPC1, namely the NTD (NPC1_N, Pfam: PF16414) and the SSD (Pfam: PF12349), were predicted for all assessed orthologs. These results suggest that the 18 cestode sequences may be true NPC1 orthologs, lending reliability to our initial dataset. Further corroborating this, subcellular localization analyses (see Supplementary Table S1) performed for the NPC1A and NPC1B cestode orthologs essentially predicted their association with the cytoplasmic membrane, as expected for true cholesterol-transporting proteins.

In line with the functional predictions, a local NPC1 database with 3114 proteins was built based on the presence of the NTD for the subsequent evolutionary analyses. This NPC1 database was filtered, keeping the sequences with lengths between 1000 and 1500 amino acids and with ≥ 30% identity with at least one of the reference NPC1 proteins (i.e., those from E. multilocularis, H. microstoma, or H. sapiens). Overall, a filtered subset of 1142 non-redundant NPC1 sequences was established, representing 15 phyla from the Animalia kingdom, as well as the Plantae and Fungi kingdoms.
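
The length and identity filter described above can be illustrated with a minimal sketch. It assumes that percent identities to each reference protein have already been computed; the sequence identifiers and identity values are hypothetical, and treating the length bounds as inclusive is also an assumption, since the text does not specify it.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; inclusive bounds are an assumption.
MIN_LENGTH, MAX_LENGTH = 1000, 1500
MIN_IDENTITY = 30.0  # percent identity to at least one reference protein


@dataclass
class Candidate:
    seq_id: str        # hypothetical identifier
    length: int        # sequence length in amino acids
    identities: dict   # reference name -> percent identity (hypothetical values)


def passes_filter(c: Candidate) -> bool:
    """Keep a sequence if its length is in range and it matches any reference."""
    length_ok = MIN_LENGTH <= c.length <= MAX_LENGTH
    identity_ok = any(pid >= MIN_IDENTITY for pid in c.identities.values())
    return length_ok and identity_ok


candidates = [
    Candidate("seq_a", 1278, {"EmNPC1A": 41.0, "HsNPC1": 33.5}),
    Candidate("seq_b", 640, {"EmNPC1A": 52.0, "HsNPC1": 30.1}),   # too short
    Candidate("seq_c", 1310, {"EmNPC1A": 22.4, "HsNPC1": 28.9}),  # too divergent
    Candidate("seq_d", 1499, {"EmNPC1A": 18.0, "HsNPC1": 30.0}),  # one reference suffices
]

kept = [c.seq_id for c in candidates if passes_filter(c)]
print(kept)
```

Requiring identity to only one of the references, rather than to all of them, keeps divergent lineages that resemble either the cestode or the human proteins.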

Phylogenetic analyses

The established filtered subset of 1142 non-redundant NPC1 sequences was aligned based on protein structure information, and the resulting alignment was analyzed and edited to exclude non-informative sites. As shown in Supplementary Fig. S1, this editing decreased the overall length of the alignment by 79.75%, based on a threshold that excluded sites with less than 90% of informative characters. The final alignment of NPC1 sequences, after exclusion of non-informative sites, was then compared to the original dataset retrieved from HMMER (3114 sequences), as well as to the filtered subset of non-redundant NPC1 sequences (1142 sequences). The comparison between the identity profiles of the six reference NPC1 sequences (two from E. multilocularis, two from H. microstoma and two from H. sapiens) (Supplementary Fig. S2) demonstrated that the resulting final alignment is representative of the precursor datasets. The final alignment of NPC1 sequences was then used to generate the corresponding nucleotide retroalignment, and both were used for the subsequent evolutionary analyses.
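
The column-trimming step can be sketched as follows. This is an illustration, not the procedure actually used: it assumes that an "informative" character is any character other than a gap or an ambiguity symbol, and the toy alignment is hypothetical.

```python
def trim_alignment(rows, min_percent=90, uninformative="-X?"):
    """Return (trimmed_rows, kept_column_indices).

    A column is kept when at least `min_percent` of its characters are
    informative, i.e. not listed in `uninformative` (an assumed definition).
    """
    n_rows = len(rows)
    kept = []
    for index, column in enumerate(zip(*rows)):
        informative = sum(ch not in uninformative for ch in column)
        # Integer comparison avoids floating-point edge cases at exactly 90%.
        if 100 * informative >= min_percent * n_rows:
            kept.append(index)
    trimmed = ["".join(row[i] for i in kept) for row in rows]
    return trimmed, kept


# Toy alignment of ten sequences and six columns (hypothetical data).
rows = ["MKV-AL"] * 8 + ["M-V--L", "MKV--L"]
trimmed, kept = trim_alignment(rows)

# Percentage of columns removed, analogous to the length reduction reported above.
reduction = 100 * (len(rows[0]) - len(kept)) / len(rows[0])
print(kept, trimmed[8], round(reduction, 2))
```

In the toy alignment, the column with 9 of 10 informative characters is retained, whereas the columns with 8 of 10 and 0 of 10 are discarded.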

The best-fit evolutionary models for our datasets were determined as WAG + I + G + F for the final amino acid alignment, and as GTR + I + G for the nucleotide retroalignment. The phylogenetic trees were constructed using the maximum likelihood method, and branches with support lower than 0.8 were collapsed. Two good-quality trees, one based on the amino acid alignment and one based on the nucleotide alignment, were reconstructed. Virtually the same phylogenetic relationships, with well-resolved clades, were observed in both trees at the kingdom and phylum levels (Supplementary Fig. S3), and a representative overview of these relationships based on the amino acid tree is shown in Fig. 2. Most of the NPC1 sequences were resolved in well-defined monophyletic clades at the phylum level. An apparent monophyletic clade, in a basal position in the tree, was formed by NPC1 sequences from the Plantae and Fungi kingdoms and was regarded as an outgroup. Within the large clade comprising the NPC1 sequences from the Animalia kingdom, two similar paraphyletic clades (regarding lower taxonomic levels) were formed for the Chordata phylum, with the Echinodermata NPC1 sequences appearing basal to both. The organization of Chordata NPC1 sequences (and those from Echinodermata) in two clades separated their NPC1 and NPC1L1 sequences. The two species of Annelida included in our analysis grouped separately and unspecifically with other phyla, likely due to the weak representation of Annelida in our dataset.
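
Collapsing poorly supported branches amounts to replacing each internal node whose support is below the cutoff with a polytomy at its parent. The sketch below shows this on a toy tree; the `Node` class, the helper functions and the support values are hypothetical and are not the software used in this work.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    support: float = 1.0                      # support of the branch leading to this node
    children: list = field(default_factory=list)


def collapse(node, threshold=0.8):
    """Promote the children of weakly supported internal nodes to their parent."""
    new_children = []
    for child in node.children:
        collapse(child, threshold)            # process the subtree first
        if child.children and child.support < threshold:
            new_children.extend(child.children)
        else:
            new_children.append(child)
    node.children = new_children
    return node


def newick(node):
    """Topology-only Newick string (no branch lengths or support values)."""
    if not node.children:
        return node.name
    return "(" + ",".join(newick(c) for c in node.children) + ")"


# Toy tree ((A,B)0.95,(C,D)0.50): only the second clade falls below 0.8.
root = Node(children=[
    Node(support=0.95, children=[Node("A"), Node("B")]),
    Node(support=0.50, children=[Node("C"), Node("D")]),
])
collapsed = newick(collapse(root))
print(collapsed)
```

The well-supported clade (A,B) is preserved, while C and D become direct descendants of the root.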

Fig. 2

Overall architecture of the NPC1 amino acid based phylogenetic tree. (A) Cladogram showing the overview of the predicted clades at the phylum and kingdom levels. Branches with support value below 0.8 were collapsed. Total numbers of sequences (leaves) in each represented taxon are indicated. Branch lengths are arbitrary. (B) Phylogram, with branch lengths drawn to scale, representing the complete NPC1 amino acid tree topology, with branches colored according to the tree in A.

The phylogenetic relationships within the Platyhelminthes clade are shown in Fig. 3. Free-living species are more basal in this clade. The trematode species formed a monophyletic clade, but it did not present a well-resolved relationship with the other Platyhelminthes subclades. Two clades were formed for the cestode species, each essentially formed by one of the two paralogous NPC1 sequences found in these species. The cestode groups were named NPC1A and NPC1B, as shown in Fig. 3. These results suggest a duplication event in a common ancestor of cestodes, resulting in two paralogous proteins that may have undergone some degree of functional specialization.

Fig. 3

Phylogenetic relationships within the Cestoda clade. The amino acid based phylogeny followed the WAG + I + G + F evolutionary model, and the nucleotide based phylogeny followed the GTR + I + G evolutionary model. Branches with support value below 0.8 were collapsed. The inferred NPC1A and NPC1B subclades are colored according to the legend. WormBase IDs are shown between brackets, and UniProtKB IDs between square brackets. Trematoda and free-living cestodes clades relationships are also indicated.

Selection pressure analyses of NPC1 orthologs

Functional divergence between NPC1A and NPC1B would be expected to result from different selective pressures acting on them since their duplication. To investigate these selection pressures, several methods from the Datamonkey server were used. The BUSTED method provided evidence of diversifying selection in the cestode NPC1A and NPC1B clades. For both clades, the method predicted that at least one site has undergone diversifying selection. To identify which NPC1 sites are under selection, three different statistical methods were used to analyze the whole tree: FUBAR, SLAC and FEL. Overall, 1151 out of 1157 NPC1 sites were predicted to be under purifying selection by FUBAR, while the other two methods predicted 1156 sites in this condition, with the remaining sites classified as neutral. None of the methods found evidence of diversifying selection.

The FEL method, which allows the analysis of individual clades, was then used to separately analyze the whole Platyhelminthes clade, as well as the cestode NPC1A and NPC1B subclades. The FEL results are summarized in Fig. 4. In the whole Platyhelminthes clade, the performed FEL analysis predicted one site under diversifying selection, and 944 under purifying selection. In the NPC1A subclade, 6 sites were predicted as under diversifying selection, and 803 sites were predicted as under purifying selection. Finally, in the NPC1B subclade, 9 sites were predicted as under diversifying selection, and 721 were predicted as under purifying selection.
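
The per-site classification summarized above can be sketched from the synonymous (α) and non-synonymous (β) rate estimates that FEL reports for each site. The snippet is a simplified illustration: the significance threshold is a placeholder rather than a value reported in this work, and the site estimates are hypothetical.

```python
from collections import Counter


def classify_site(alpha, beta, p_value, threshold=0.1):
    """Label a site from its rate estimates and the p-value of the test alpha = beta.

    `threshold` is a placeholder cutoff, not a value taken from the text.
    """
    if p_value > threshold or alpha == beta:
        return "neutral"
    return "diversifying" if beta > alpha else "purifying"


# Hypothetical (alpha, beta, p-value) estimates for four sites.
sites = [
    (0.8, 0.1, 0.001),  # beta < alpha, significant
    (0.5, 2.4, 0.030),  # beta > alpha, significant
    (1.0, 0.9, 0.620),  # not significant
    (1.2, 0.2, 0.040),  # beta < alpha, significant
]

counts = Counter(classify_site(a, b, p) for a, b, p in sites)
print(dict(counts))
```

Tallying the labels per clade gives counts of the kind reported for the Platyhelminthes, NPC1A and NPC1B analyses.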

Fig. 4

Darwinian selection analyses through Datamonkey tools. (A) Representation of the whole alignment generated by PROMALS3D for the NPC1 sequences. Excluded regions are shown in grey, while retained ones are in red. The final edited alignment used in our analyses, containing only informative sites, is represented below. (B) Semi-circular view of FEL charts, showing the estimated synonymous (α) and non-synonymous (β) rates at each site as bars. The plotted line indicates the estimates under the null model (α = β). Estimates above 5 are censored at this value. The predicted type of selection pressure is also indicated for each site, according to the legend. The FEL charts of the four analyzed groups are represented, namely the whole tree (ALL), the Platyhelminthes clade (PLATY), the cestode NPC1A subclade (NPC1A) and the cestode NPC1B subclade (NPC1B). Retained alignment sites from (A) are represented by the red circle, and predicted domains are shown. Internal connections in the red circle represent coevolving interactions between sites of the alignment, predicted by the Bayesian Graphical Model (BGM). SSD, sterol-sensing domain; Ptc/Disp, protein patched/dispatched family. The semi-circular view was generated with the Circos 0.69–9 software package (https://circos.ca).

The BGM method was used to verify the occurrence of co-evolving sites in the whole NPC1 amino acid tree. As also shown in Fig. 4, 492 pairs of sites were found to be coevolving. However, it was not possible to identify a clear correlation between these sites and the different functional domains predicted for the aligned sequences. The RELAX method was then used to evaluate relaxation of the predicted selective strengths (diversifying and purifying) for NPC1 sequences. In the comparison of the NPC1B subclade against NPC1A as the reference subclade, the analysis indicated relaxation of the selection acting on NPC1B (K = 1.16, with p = 0.05 and LR = 3.85), indicating that NPC1A is under the effect of stronger selective forces. These two subclades were also individually tested against the trematode clade as reference but, in both comparisons (NPC1A vs. trematodes, and NPC1B vs. trematodes), no statistically significant relaxation was predicted (significance threshold of p < 0.05). The NPC1A subclade was then tested against the whole NPC1 phylogenetic tree as reference and, again, no statistically significant relaxation was predicted at the same threshold. On the other hand, the comparison of the NPC1B subclade against the whole NPC1 amino acid tree indicated relaxation of the predicted selective strengths acting on NPC1B sequences (K = 0.82, with p = 0.005 and LR = 7.89).
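
The relationship between the likelihood-ratio (LR) statistics and the p-values quoted above can be checked with a short calculation. The sketch assumes that the LR statistic is compared against a chi-square distribution with one degree of freedom, which is not stated in the text; under that assumption the upper-tail probability has a closed form in terms of the complementary error function.

```python
import math


def lrt_p_value(lr):
    """Upper-tail probability that a chi-square variable with 1 df exceeds `lr`.

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(lr / 2.0))


p_npc1b_vs_npc1a = lrt_p_value(3.85)  # NPC1B tested against the NPC1A subclade
p_npc1b_vs_tree = lrt_p_value(7.89)   # NPC1B tested against the whole tree
print(f"{p_npc1b_vs_npc1a:.2f} {p_npc1b_vs_tree:.3f}")
```

Under this assumption, the two LR values yield probabilities that round to the reported p = 0.05 and p = 0.005, respectively.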

Overall, these results suggest that the proteins of the NPC1A and NPC1B subclades are under different selective pressures, corroborating the separation of these two subclades in the NPC1 tree topology and suggesting functional specialization.

Comparative structural analyses between E. multilocularis NPC1A and NPC1B, and human NPC1L1

To investigate whether the proteins of the NPC1A and NPC1B subclades underwent functional divergence, we performed comparative structural analyses. We took the E. multilocularis NPC1A and NPC1B (henceforth EmNPC1A and EmNPC1B) proteins as references for these structural analyses. Among the structures available in the PDB, the NPC1L1 structure of Rattus norvegicus (PDB ID: 6V3H, ref. 21; henceforth RnNPC1L1) was one of the few that presented the whole protein structure, and it was also conveniently determined experimentally in complex with an ezetimibe analog. It is also in agreement with previous results24 that demonstrated that the interactions with Phe532 and Met543 are critical to the high-affinity binding of ezetimibe to NPC1L1. Moreover, it presented sufficient sequence similarity to EmNPC1A (89% query cover and 37.75% identity) and EmNPC1B (94% query cover and 34.43% identity), as well as to human NPC1L1 (henceforth HsNPC1L1; 98% query cover and 76.36% identity), to be suitable as a template for homology modelling, yielding models of good quality according to MolProbity validation (Supplementary Data 35). Overall, the models generated for EmNPC1A, EmNPC1B and HsNPC1L1 adopted the expected NPC1 fold (Fig. 5). As expected, deviations in the structures were seen in the cytosolic moieties of EmNPC1A and EmNPC1B, which showed a higher degree of disorder. Such deviations were likely caused by two small structural gaps in RnNPC1L1 and a third, larger stretch corresponding to an insertion in the E. multilocularis proteins, which is shorter in EmNPC1B (Asn697 to Leu729) and longer and more evident in EmNPC1A (Val686 to Leu758). This additional stretch could be identified in the sequences of all cestode NPC1A and NPC1B orthologs included in our phylogenetic analyses.

Fig. 5

Structures generated by homology modelling. Overall comparison of the structures indicated that all models adopted the expected fold. The cytosolic moieties of the E. multilocularis structures were more unstructured than expected, mainly because of an insertion in this region that is consistent across all cestode sequences included in this work. The structure of the R. norvegicus template (RnNPC1L1) and the modelled structures of HsNPC1L1, EmNPC1A and EmNPC1B are shown in ribbon and surface representations. Ribbon colors indicate the three luminal domains: the N-terminal domain in blue, the middle luminal domain in green and the C-terminal luminal domain in pink. Cytosolic and transmembrane domains are colored in cyan. The membrane region is represented by the dotted highlighted box.

To assess the degree of conservation of the cholesterol delivering tunnel among the generated models, cross-section views focusing on this feature were generated (Fig. 6). The tunnel structure was highly conserved in the HsNPC1L1 model in comparison to the RnNPC1L1 template, with minor local structural variations. The EmNPC1B model presented less conservation, with variations distributed along the whole tunnel structure in comparison to the same template. The HsNPC1L1 and EmNPC1B models both presented a continuous delivering tunnel, in structures that seem to be functional, with enough inner space for cholesterol transport. The EmNPC1A model, however, presented major changes, with an interruption of the delivering tunnel that could restrain cholesterol transport. MOLEonline was used to measure the tunnel radius in this region (Supplementary Fig. S4), confirming a significant alteration in EmNPC1A (EmNPC1A radius: 1.6 Å, EmNPC1B radius: 3.6 Å, HsNPC1L1 radius: 3.8 Å). This interruption occurred in the same region as the expected ezetimibe binding site and is apparently caused mainly by two amino acid substitutions. The first substitution is in the N-terminal domain of the proteins, being a change from an alanine in the RnNPC1L1 template and in the HsNPC1L1 model (A180 in both) to a serine in the EmNPC1B model (S165) and to a phenylalanine in the EmNPC1A model (F148). The second one is in the C-terminal luminal domain, being a change from an alanine in the RnNPC1L1 template, in the HsNPC1L1 model (A1032 in both) and also in the EmNPC1B model (A1066), to a glutamine in the EmNPC1A model (Q1079). Therefore, in both cases EmNPC1A shows substitutions of small, simple amino acids by larger and more complex ones, which would explain the observed tunnel interruption. The same substitution pattern was observed for NPC1A and NPC1B of most of the cestodes included in our analyses (data not shown).

These results suggest that EmNPC1A has undergone some degree of functional divergence when compared to the EmNPC1B, HsNPC1L1 and RnNPC1L1 proteins, possibly losing its ability to transport cholesterol.
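
The size argument behind the two substitutions can be made concrete with approximate residue volumes. The values below are commonly tabulated figures from the general literature, not data from this work, and should be treated as illustrative; the dictionary keys simply restate the substitutions described above.

```python
# Approximate residue volumes in cubic angstroms, as commonly tabulated
# (not taken from this work; treat as illustrative).
RESIDUE_VOLUME = {"A": 88.6, "S": 89.0, "Q": 143.8, "F": 189.9}

# Substitutions relative to the RnNPC1L1 template, as described in the text.
substitutions = {
    "EmNPC1B NTD (A180 -> S165)": ("A", "S"),
    "EmNPC1A NTD (A180 -> F148)": ("A", "F"),
    "EmNPC1A CTD (A1032 -> Q1079)": ("A", "Q"),
}


def volume_change(template, model):
    """Difference in approximate residue volume (model minus template)."""
    return round(RESIDUE_VOLUME[model] - RESIDUE_VOLUME[template], 1)


changes = {site: volume_change(*pair) for site, pair in substitutions.items()}
for site, delta in changes.items():
    print(f"{site}: {delta:+.1f} cubic angstroms")
```

With these figures, the alanine-to-serine change in EmNPC1B is nearly volume-neutral, whereas both EmNPC1A substitutions add considerable bulk, in line with the narrower tunnel radius measured for EmNPC1A.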

Fig. 6

Conservation of the cholesterol delivering tunnel of NPC1 models. The cross-section view of the NPC1 proteins allows analysis of the tunnel through which cholesterol molecules are transported from the extracellular environment to the plasma membrane. The RnNPC1L1 template and the three modelled whole NPC1 structures are shown in ribbon and surface representations, with the cholesterol delivering tunnel shown in detail. The ezetimibe analog present in the experimentally determined RnNPC1L1 template structure is shown, and its location is indicated in the modelled NPC1 protein structures. The HsNPC1L1 and EmNPC1B models presented an apparently functional delivering tunnel, while the EmNPC1A model presented an interruption in it that may restrain cholesterol transport.

Molecular docking analyses were then carried out to verify whether ezetimibe could interact with the EmNPC1A, EmNPC1B and HsNPC1L1 proteins. The RnNPC1L1 template, whose structure was determined in complex with an ezetimibe analog (Fig. 7A), was used as reference. The docking of HsNPC1L1 with ezetimibe produced an interaction pose (Fig. 7B) very similar to that of the RnNPC1L1 template, with a predicted affinity of −10.189 kcal/mol. Most of the amino acids involved in the interaction with the ligand in the RnNPC1L1 complex (9 out of 12) were also observed interacting with the ligand in the resulting HsNPC1L1-ezetimibe docked complex, including Phe532 and Met543, which have been described as critical to the high-affinity binding between RnNPC1L1 and ezetimibe.

Fig. 7

Molecular interactions between ezetimibe and the NPC1 models submitted to docking. NPC1 proteins are shown in ribbon representation, with ezetimibe, the ezetimibe analog and the interacting amino acids shown as sticks. (A) Interactions between the RnNPC1L1 template, shown in cyan, and the ezetimibe analog, shown in yellow, as available in the PDB (PDB ID: 6V3H). Interactions between the NPC1 protein and regions of the analog that are not present in the ezetimibe molecule were omitted. (B) Interactions between the HsNPC1L1 protein, shown in blue, and the docked ezetimibe molecule, shown in green (predicted affinity = −10.189 kcal/mol). Interactions are highly conserved with respect to the template (A). (C) Interactions between the EmNPC1B protein, shown in pink, and the docked ezetimibe, in green (predicted affinity = −9.376 kcal/mol). Fewer interactions were observed in this complex, and it presented some relevant amino acid changes when compared to the template (A) and human (B) complexes. A simplified alignment of the sequences, predicted secondary structures, and interactions in the regions depicted in this figure is shown in Supplementary Fig. S5.

The docking of EmNPC1B with ezetimibe also resulted in a similar interaction pose (Fig. 7C), with a predicted affinity of −9.376 kcal/mol. Fewer ezetimibe-amino acid interactions were observed, although 7 out of the 12 interactions observed in the complexes with the RnNPC1L1 template or HsNPC1L1 were still shared. Some changes were also observed in the interacting amino acids. The interacting amino acids Met543, Ala548 and Asn1022 in the RnNPC1L1 template correspond to Leu510, Ile515 and Asp1057, respectively, in the EmNPC1B-ezetimibe complex. This could have some impact on the affinity between EmNPC1B and ezetimibe since, as previously mentioned, Met543 of the RnNPC1L1 template is known to be critical in the interaction with ezetimibe. Moreover, the also critical RnNPC1L1 Phe532 has no counterpart in the EmNPC1B-ezetimibe complex.
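
The comparison of interacting residues can be organized as a simple lookup. The sketch below uses only the residue pairs named in the text; the variable and function names are hypothetical, and comparing three-letter residue types is a deliberately crude notion of conservation.

```python
# Template residues described as critical for high-affinity ezetimibe binding.
CRITICAL_TEMPLATE_RESIDUES = {"Phe532", "Met543"}

# RnNPC1L1 residue -> EmNPC1B counterpart in the docked complex (None = no counterpart).
counterparts = {
    "Phe532": None,
    "Met543": "Leu510",
    "Ala548": "Ile515",
    "Asn1022": "Asp1057",
}


def residue_type(label):
    """Three-letter residue type, e.g. 'Met' for 'Met543'."""
    return label[:3]


def is_conserved(template, counterpart):
    """True only when a counterpart exists and has the same residue type."""
    return counterpart is not None and residue_type(template) == residue_type(counterpart)


altered_critical = sorted(
    residue for residue in CRITICAL_TEMPLATE_RESIDUES
    if not is_conserved(residue, counterparts.get(residue))
)
shared_fraction = 7 / 12  # interactions shared with the template, as reported
print(altered_critical, round(shared_fraction, 2))
```

Both residues flagged as critical in the template are either substituted or absent in the EmNPC1B complex, which is the basis for the caution expressed above about binding affinity.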

It was not possible to perform ezetimibe docking with the EmNPC1A, as its ezetimibe binding site was blocked by the amino acids that interrupt the cholesterol delivering tunnel.

Discussion

Chemotherapeutic treatments for cestodiases are based on benzimidazole and/or praziquantel administration. However, these drugs have some disadvantages, including poor intestinal absorption, variable efficacy, large individual differences in action and side effects25,26. Moreover, drug pipelines for cestodiases and other NTDs have run dry, with no new chemical entity recently approved for the treatment of these diseases27. In this context, new, safer and more effective treatments for cestodiases are urgently needed.

Drug repurposing has emerged as a valuable strategy for accelerating the drug development process, decreasing research costs and shortening time to market. This approach bypasses stages of the research process by exploring innovative uses for existing drugs that have already been approved or are in advanced pre-clinical stages13. The repurposing of well-established drugs is a promising alternative for antiparasitic drug development, with several chemical compounds having been tested as drugs against a wide array of helminths12. Studies involving drug repurposing to treat cestodiases have been conducted, for instance, with nitazoxanide28, amphotericin B29, itraconazole30, benzimidazole derivatives31 and hydroxyurea32.

The cholesterol uptake process is an interesting target for new anthelmintics, as Platyhelminthes lack the de novo biosynthesis of sterols and fatty acids14,15. This deficiency has also been described for some nematodes33 and for free-living platyhelminths34, indicating that it may be ubiquitous among helminths. In this context, the NPC1 proteins, responsible for cellular cholesterol uptake, are of vital importance for maintaining cholesterol homeostasis in helminths and, as such, appear as potential targets for drug repurposing strategies. This potential is corroborated by the facts that NPC1 orthologs are expressed in different developmental stages of cestode parasites, including the pathogenic larval ones (Supplementary Table S2), and that the human NPC1 ortholog (HsNPC1L1) has an inhibitor drug (ezetimibe) already approved and available on the market for the treatment of hypercholesterolemia35. Thus, in this paper we assessed the potential of repurposing ezetimibe for the treatment of visceral cestodiases and other helminth infections. The in silico analyses focused on the evolutionary and structural conservation of NPC1 among parasites and the human host, as well as on the assessment of ezetimibe as a ligand for cestode NPC1 orthologs.

In humans, there are two proteins from the NPC1 family. Despite sharing considerable similarity in terms of amino acid sequences, topology, and cholesterol binding, they exhibit significant differences from each other36. The HsNPC1L1 protein is located mainly in the plasma membrane, and it is responsible for the uptake of extracellular cholesterol19. Human NPC1 (HsNPC1), on the other hand, is in endosomal/lysosomal membranes and is involved in the intracellular trafficking of cholesterol37. Furthermore, HsNPC1 relies on the protein NPC2 to mediate cholesterol binding, whereas HsNPC1L1 does not exhibit this dependence, interacting directly with the cholesterol molecule36.

Although studies in the nematode Caenorhabditis elegans have demonstrated that its NPC1 orthologs can be functionally replaced by both HsNPC1 and HsNPC1L138, inhibitors are often selective for one protein or another. Ezetimibe, for instance, exhibits activity on HsNPC1L1 but not on HsNPC139. When retrieving the cestodes NPC1 sequences from WormBase ParaSite database, we consistently found two ortholog genes of the NPC1 family in each of the assessed cestode genomes. Thus, it was necessary to better understand how these two cestode NPC1 proteins relate to HsNPC1 or HsNPC1L1.

In our phylogenetic analyses, two separate, unrelated groups could be seen within the phylogenetic tree, one comprising all HsNPC1-related sequences and the other all HsNPC1L1-related sequences. Both clades presented similar topologies, with Chordata sequences forming a monophyletic clade and a single Echinodermata sequence grouping basal to it. This pattern suggests that the duplication event that gave rise to these paralogs occurred prior to the separation between chordates and echinoderms. Neither of these clades, however, presented clear phylogenetic relationships to cestode NPC1 sequences, with no direct correspondence between the two cestode NPC1 paralogs and HsNPC1 or HsNPC1L1. The monophyletic clade formed by cestode sequences, however, indicated a clear separation between the NPC1 paralogs found in each species, giving rise to two similar subclades (named NPC1A and NPC1B clades). This separation pattern was not observed in non-cestode platyhelminths. Therefore, we can infer that a duplication event within the cestode lineage, not shared with trematodes and other platyhelminths, gave rise to the NPC1A and NPC1B paralogs.

Our phylogenetic tree indicated that duplications of this type have occurred in NPC1 genes in a wide range of organisms. For instance, the topology of the clades of Insecta, Arachnida and Rosids (an internal clade of Plantae-Angiosperms-Eudicots) suggests other points in the evolutionary history of the NPC1 gene where similar duplication events have occurred. Studies in Drosophila had already demonstrated the existence of two NPC1 orthologs in these organisms40, which also do not show a direct correspondence with HsNPC1 or HsNPC1L1, corroborating our results. Nevertheless, it was demonstrated that, as in humans, the Drosophila paralogs exhibit functional divergence41. Cestodes, however, tend to present smaller and simplified genomes10, raising the questions of why these organisms would maintain two NPC1 paralogs and whether they have undergone any functional divergence.

The selective pressure analyses provided additional information on how evolution is acting on cestode NPC1A and NPC1B proteins, as well as further evidence of functional divergence between them. The selective pressure pattern consistently diverged between NPC1A and NPC1B, with the former being more similar to the pattern observed for the platyhelminth clade. Furthermore, we found that selective pressures in the NPC1A clade appear to be as effective and intense as those expected for the whole tree, while the NPC1B clade presented a relaxation of these pressures in comparison to the NPC1A clade, to the trematode clade or to the whole tree. The relaxation of selective pressures has been described as a necessity for evolutionary innovation in genes42. Therefore, we were able to demonstrate that NPC1A and NPC1B are under the influence of different selective pressures and are likely to exhibit some degree of functional divergence.

At first glance, the selective pressure analyses suggested functional conservation for NPC1A and raised the possibility of variations or innovations for NPC1B. However, our structural analyses pointed in the opposite direction. Despite good agreement of the models generated for the E. multilocularis NPC1 proteins with the HsNPC1L1 model, the EmNPC1A model exhibited more relevant variations than the EmNPC1B model. The interruption of the cholesterol delivering tunnel in EmNPC1A is a determinant structural change, which implies the loss of its canonical function. Moreover, the insertion of regions in its cytosolic moiety, consistently found among all analyzed cestode proteins, includes two amino acid sites that are predicted to be under diversifying selection pressures. This is suggestive of a possible new function for cestode NPC1A, which remains to be elucidated. Consequently, cestode NPC1B, which keeps the cholesterol delivering tunnel functional, would have to be responsible for cholesterol uptake in cestodes. Furthermore, we can conclude that both cestode NPC1A and NPC1B are undergoing some degree of functional diversification: NPC1A would be ceasing to perform its canonical function and undergoing a process of functional innovation, while NPC1B would promote cholesterol uptake but also be undergoing a process of functional variation, as indicated by our selective pressure analyses.

As the interruption in the cholesterol delivering tunnel occurs close to the region of interaction with the ezetimibe molecule, NPC1A is unlikely to be a target of this drug in cestodes. On the other hand, the predicted interactions of EmNPC1B with ezetimibe presented an affinity quite similar to that of its human ortholog, with some specific variations mainly caused by amino acid changes. These variations may offer opportunities for future modifications to the ezetimibe molecule for the design of novel and more effective inhibitors specific for the parasite target, enhancing treatment efficacy.

Regarding dosages, ezetimibe treatment protocols for hypercholesterolemia are designed to be effective in the intestine43. For repurposing for the treatment of visceral cestodiases, higher doses may be required for systemic exposure to the drug. Since high dosages of ezetimibe can be tolerated by humans without clinical and laboratory adverse events, such adjustments for the use against cestode extraintestinal infections are likely viable.

Conclusion

NPC1 proteins are key components of the cholesterol trafficking system, which is a vital pathway for cestodes and other parasitic helminths, as they are unable to synthesize cholesterol endogenously. Therefore, NPC1 proteins are good candidate targets for therapeutic intervention. In this work, a diverse range of in silico methods was used to assess the potential of cestode NPC1 proteins as targets for the repurposing of an NPC1-inhibiting drug (ezetimibe) as a therapeutic alternative against visceral cestodiases. The phylogenetic analysis provided, as far as we are aware, the most comprehensive NPC1 phylogenetic tree built so far, allowing assessment of the degree of evolutionary conservation among NPC1 proteins from different phyla. It also provided evidence of a duplication event in the cestode clade that gave rise to a pair of paralogs (NPC1A and NPC1B) of the NPC1 protein family, not directly related to duplications observed in other phyla. Selective pressure analyses and structural analyses provided evidence of functional divergence processes affecting both cestode NPC1A and NPC1B. Modelling of ezetimibe in complex with E. multilocularis NPC1A and NPC1B, and comparison with the human NPC1L1 complex, showed that cestode NPC1A presents a partially blocked cholesterol delivering tunnel, which would impair cholesterol transport as well as the interaction with ezetimibe. On the other hand, the cestode NPC1B structure points to its full functionality as a cholesterol transporter, as well as to its susceptibility to ezetimibe inhibition. The predicted interaction between ezetimibe and a cestode NPC1B supports the idea of this protein as a target for the repurposing of ezetimibe as a novel drug for the treatment of visceral cestodiases and other helminthiases.
Overall, the in silico results presented here pave the way for future functional studies of cestode NPC1 proteins, and for experimental assessment and confirmation of the potential anthelmintic effects of ezetimibe on cestodes and other parasitic flatworms in vitro and in vivo.

Methods

Identification of NPC1 orthologs

The initial search for NPC1 orthologs was based on sequences of E. multilocularis and H. microstoma proteins annotated as such in WormBase ParaSite (parasite.wormbase.org). NPC1 ortholog proteins from other tapeworm species were also identified and downloaded from the same database. Possible errors in the annotation of cestode NPC1 sequences were corrected by comparison with cognate E. multilocularis and H. microstoma transcripts recovered from RNA-seq data available in the ArrayExpress database (E-ERAD-50 and E-ERAD-236, respectively) and assembled de novo by Paludo et al. (unpublished data). Supporting data are available in the BioStudies database (http://www.ebi.ac.uk/biostudies) under accession numbers E-ERAD-50 and E-ERAD-236. Reference human NPC1 and NPC1L1 sequences were obtained from UniProtKB (uniprot.org).

The InterProScan tool44 from the InterPro online server (ebi.ac.uk/interpro) was used to analyze the domain composition of NPC1 sequences from E. multilocularis, H. microstoma and H. sapiens, taken as references in this step. The BUSCA server45 was used for predictions of subcellular localization.

The HMM profiles of the domains identified in consensus among the species were obtained from the Pfam database (pfam.xfam.org) and used to perform exhaustive searches for proteins in UniProtKB databases with the hmmsearch tool46, available from the HMMER server (ebi.ac.uk/Tools/hmmer). A local database was then constructed with the resulting proteins, using the following filters: at least 30% identity to at least one reference sequence (human NPC1 and NPC1L1, and those from E. multilocularis and H. microstoma), and length between 1000 and 1500 amino acids. Additionally, redundant sequences, with 100% identity to each other, were merged into single entries. Corresponding nucleotide sequences were then retrieved from the following databases: ENA (ebi.ac.uk/ena), WormBase ParaSite (parasite.wormbase.org), Ensembl (ensembl.org), and NCBI (ncbi.nlm.nih.gov/nuccore).
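The filtering and merging steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function name filter_candidates, the input dictionaries and the toy accessions are hypothetical, and the percent identity of each candidate to its closest reference sequence is assumed to have been computed beforehand by a pairwise alignment tool.

```python
from collections import defaultdict


def filter_candidates(sequences, best_identity, min_identity=30.0,
                      min_len=1000, max_len=1500):
    """Apply length and identity filters, then merge identical sequences.

    sequences: dict mapping accession -> amino acid sequence.
    best_identity: dict mapping accession -> highest percent identity to any
        reference sequence (assumed to be precomputed elsewhere).
    Returns a dict mapping a representative accession to a tuple of
    (sequence, sorted list of all accessions sharing that sequence).
    """
    by_sequence = defaultdict(list)
    for accession, seq in sequences.items():
        if not (min_len <= len(seq) <= max_len):
            continue  # outside the 1000-1500 amino acid window
        if best_identity.get(accession, 0.0) < min_identity:
            continue  # too divergent from every reference sequence
        by_sequence[seq].append(accession)

    merged = {}
    for seq, accessions in by_sequence.items():
        accessions.sort()
        # The first accession (alphabetically) represents the merged entry.
        merged[accessions[0]] = (seq, accessions)
    return merged


# Toy input with hypothetical accessions, for illustration only.
toy_sequences = {
    "A1": "M" * 1200,
    "A2": "M" * 1200,   # 100% identical to A1, will be merged
    "B1": "M" * 900,    # too short
    "C1": "K" * 1300,   # identity below the threshold
}
toy_identity = {"A1": 45.0, "A2": 45.0, "B1": 60.0, "C1": 12.0}
kept = filter_candidates(toy_sequences, toy_identity)
```

In this toy case only one merged entry survives, represented by A1 and listing both A1 and A2 as members. Merging here relies on exact string equality, which is one simple reading of "100% identity"; sequences that are identical only over an aligned region would need a different criterion.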

Evolutionary analyses

The PROMALS3D alignment method47, implemented on its online server (prodata.swmed.edu/promals3d/), was used to align our sequence dataset. The alignment was then inspected using an in-house Python script (Supplementary Data S1) to identify non-informative sites, which were manually excluded. A second in-house Python script (Supplementary Data S2) was used to determine the representativeness of the resulting final aligned dataset in comparison to the previous ones, namely the original HMMER-generated dataset and the filtered subset used for the initial alignment.
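As an illustration of how alignment columns might be screened, the following Python sketch flags gap-rich columns. It is not the script provided as Supplementary Data S1, and the criterion is an assumption: the text does not define "non-informative", so the sketch simply treats columns whose gap fraction exceeds a hypothetical threshold (max_gap_fraction) as candidates for manual inspection.

```python
def gap_fraction_per_column(alignment):
    """Return the fraction of gap characters ('-') in each alignment column.

    alignment: list of aligned sequences (strings of equal length).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_rows = len(alignment)
    return [
        sum(row[i] == "-" for row in alignment) / n_rows
        for i in range(length)
    ]


def flag_columns(alignment, max_gap_fraction=0.5):
    """Return 0-based indices of columns whose gap fraction exceeds the threshold.

    The threshold is an illustrative assumption, not a value from the text.
    """
    fractions = gap_fraction_per_column(alignment)
    return [i for i, frac in enumerate(fractions) if frac > max_gap_fraction]


# Toy alignment: the third column (index 2) is a gap in three of four rows.
toy_alignment = ["MK-L", "M--L", "MR-L", "M-QL"]
flagged = flag_columns(toy_alignment)
```

The flagged indices are only suggestions; in keeping with the procedure described above, the final decision to exclude a site would remain manual.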

TranslatorX48 was used to obtain the back-translated (codon) alignment of the corresponding nucleotide sequences, and ModelTest-NG49 was used to determine the most adequate evolutionary model for the protein and nucleotide sequences. Phylogenetic analyses were performed by the maximum likelihood method with PhyML 3.050, and branch supports were determined by the aLRT method with SH-like interpretation51. Phylogenies had their branches collapsed using a support cutoff of 0.8, with TreeCollapse (emmahodcroft.com/TreeCollapseCL).
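The idea of collapsing weakly supported branches can be sketched in a few lines of Python. This is not the TreeCollapse implementation: the Node class and collapse_low_support function are hypothetical, the sketch assumes that branches with support strictly below the cutoff are the ones removed, and adding the removed branch length to the promoted descendants is a design choice made here to preserve root-to-tip distances.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # support of the branch leading to this node
    length: float = 0.0              # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node, cutoff=0.8):
    """Turn internal branches with support below cutoff into polytomies (in place)."""
    new_children = []
    for child in node.children:
        collapse_low_support(child, cutoff)  # post-order: handle subtrees first
        is_internal = bool(child.children)
        is_weak = child.support is not None and child.support < cutoff
        if is_internal and is_weak:
            # Remove the weak branch and attach its descendants to the parent,
            # adding the removed length so root-to-tip distances are unchanged.
            for grandchild in child.children:
                grandchild.length += child.length
                new_children.append(grandchild)
        else:
            new_children.append(child)
    node.children = new_children
    return node


# Toy tree: clade (A,B) has support 0.95 and is kept; clade (C,D) has
# support 0.50 and is collapsed, so C and D attach directly to the root.
toy_tree = Node(children=[
    Node(support=0.95, length=0.1,
         children=[Node("A", length=0.2), Node("B", length=0.3)]),
    Node(support=0.50, length=0.1,
         children=[Node("C", length=0.2), Node("D", length=0.4)]),
])
collapse_low_support(toy_tree, cutoff=0.8)
```

After the call, the root of the toy tree has three children: the well-supported (A,B) clade, C and D. Leaves are never collapsed because they carry no support value.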

Several tools from the Datamonkey online server (datamonkey.org/) were used to analyze the generated phylogenies and alignments. The branch-site unrestricted statistical test for episodic diversification (BUSTED) method52 was initially used to verify sites with diversifying selection in the proteins. Variations in selection pressure were analyzed with the Fast, Unconstrained Bayesian AppRoximation (FUBAR53), Single-Likelihood Ancestor Counting (SLAC54) and Fixed Effects Likelihood (FEL54) methods in the phylogenetic trees. FEL was also used to analyze isolated clades of the trees. Coevolution interactions between codons were analyzed with the Bayesian Graphical Model (BGM) tool55. RELAX42 was used to detect relaxed selection in phylogenetic trees.
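To make the notion of relaxed selection more concrete, the following Python sketch illustrates the selection intensity parameter k on which RELAX is based: each ω (dN/dS) rate class of the reference branches is raised to the power k to give the corresponding class of the test branches, so that k < 1 pulls all classes toward neutrality (ω = 1) and k > 1 pushes them away from it. The function names and the ω values are illustrative assumptions; the sketch does not fit any model or perform the likelihood ratio test that RELAX uses to assess significance.

```python
def relax_transform(reference_omegas, k):
    """Map reference-branch omega classes to test-branch classes as omega ** k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return [omega ** k for omega in reference_omegas]


def interpret_k(k, tolerance=1e-9):
    """Describe the direction of change in selection intensity implied by k."""
    if abs(k - 1.0) <= tolerance:
        return "unchanged"
    return "relaxed" if k < 1.0 else "intensified"


# Hypothetical omega classes: strong purifying, neutral, diversifying.
reference = [0.01, 1.0, 4.0]
relaxed = relax_transform(reference, k=0.5)      # classes move toward 1
intensified = relax_transform(reference, k=2.0)  # classes move away from 1
```

With k = 0.5, both the purifying class and the diversifying class of this toy distribution move closer to ω = 1, which is the signature described as relaxation.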

Structural analyses

The SWISS-MODEL server (https://swissmodel.expasy.org) was used to perform the homology modeling of NPC1 proteins. A 3.50 Å resolution structure of NPC1L1 from Rattus norvegicus (PDB ID: 6V3H), determined by cryo–electron microscopy, was used as the template for modelling both E. multilocularis NPC1 proteins as well as human NPC1L1. The rat NPC1L1 was selected as a template because it has high identity to the cestode and human orthologs, has good structural resolution, and is so far the only full-length NPC1 structure determined in complex with ezetimibe. Visualization and images of the resulting structures were generated in PyMOL 2.5.4 (https://pymol.org/). The resulting models were validated with the Structure Assessment tool from the SWISS-MODEL server, where geometry and stereochemical properties were assessed through Ramachandran plots and MolProbity version 4.456 analysis. Structural comparisons were made among the generated models and the template. MOLEonline57 was used to detect, analyze and measure the cholesterol delivering tunnel.

All generated models were submitted to molecular docking with the DockThor server (https://dockthor.lncc.br/v2/). The ezetimibe structure was obtained from PubChem (Compound CID 150311) and refined in Avogadro 1.2.0 58 with the Steepest Descent algorithm and the MMFF94s force field. All rotatable bonds of the ligand were enabled, and the soft docking option was selected. Grid boxes were defined visually, with a discretization value of 0.24 Å. The remaining parameters were left at their defaults. The resulting docking poses were analyzed by comparison with the R. norvegicus template, containing an ezetimibe analog. Interactions between ezetimibe and modeled NPC1 proteins were determined with LigPlot+ v.2.259.