Computational simulation and target prediction studies of solubility optimization of decitabine through supercritical solvent

Alshahrani, Saad M.; Almutairy, Bjad K.; Alfadhel, Munerah M.; Belal, Amany; Abourehab, Mohammed A. S.; Saqr, Ahmed Al.; Alshetaili, Abdullah S.; Venkatesan, Kumar; Alsubaiyel, Amal M.; Pishnamazi, Mahboubeh

doi:10.1038/s41598-022-21233-0

Download PDF

Article
Open access
Published: 07 November 2022

Computational simulation and target prediction studies of solubility optimization of decitabine through supercritical solvent

Saad M. Alshahrani¹,
Bjad K. Almutairy¹,
Munerah M. Alfadhel¹,
Amany Belal^2,9,
Mohammed A. S. Abourehab^3,4,
Ahmed Al. Saqr¹,
Abdullah S. Alshetaili¹,
Kumar Venkatesan⁵,
Amal M. Alsubaiyel⁶ &
…
Mahboubeh Pishnamazi^7,8

Scientific Reports volume 12, Article number: 18875 (2022) Cite this article

1158 Accesses
4 Citations
1 Altmetric
Metrics details

Subjects

Abstract

Computational analysis of drug solubility was carried out using machine learning approach. The solubility of Decitabine as model drug in supercritical CO₂ was studied as function of pressure and temperature to assess the feasibility of that for production of nanomedicine to enhance the solubility. The data was collected for solubility optimization of Decitabine at the temperature 308–338 K, and pressure 120–400 bar used as the inputs to the machine learning models. A dataset of 32 data points and two inputs (P and T) have been applied to optimize the solubility. The only output is Y = solubility, which is Decitabine mole fraction solubility in the solvent. The developed models are three models including Kernel Ridge Regression (KRR), Decision tree Regression (DTR), and Gaussian process (GPR), which are used for the first time as a novel model. These models are optimized using their hyper-parameters tuning and then assessed using standard metrics, which shows R²-score, KRR, DTR, and GPR equal to 0.806, 0.891, and 0.998. Also, the MAE metric shows 1.08E−04, 7.40E−05, and 9.73E−06 error rates in the same order. The other metric is MAPE, in which the KRR error rate is 4.64E−01, DTR shows an error rate equal to 1.63E−01, and GPR as the best mode illustrates 5.06E−02. Finally, analysis using the best model (GPR) reveals that increasing both inputs results in an increase in the solubility of Decitabine. The optimal values are (P = 400, T = 3.38E + 02, Y = 1.07E−03).

Intelligence computational analysis of letrozole solubility in supercritical solvent via machine learning models

Article Open access 17 September 2024

Optimization of tamoxifen solubility in carbon dioxide supercritical fluid and investigating other molecular targets using advanced artificial intelligence models

Article Open access 24 January 2023

Experimental and thermodynamic modeling decitabine anti cancer drug solubility in supercritical carbon dioxide

Article Open access 13 January 2021

Introduction

Production of nanomedicine has been a topic of first-rate hobby in pharmaceutical vicinity due to their importance significance for improving the drug efficacy. Nanosized drug production is one of the strategies to decorate drug solubility due to the expanded surface area to nanosized which therefore outcomes in an enhancement in drug solubility^1–3. Given that, maximum newly discovered drugs are poorly soluble in aqueous media, underpinning studies is required to secorate the solubility of drugs thru one of the kind strategies such as nanonization, amorphous solid dispersion, crystallization, and salt formation^4,5,6,7,8.

Drug nanonization has been used as an appealing technique for optimization of drugs thru solubility enhancement withinside the body. Supercritical solvents are extra appealing techniques due to the advanced characteristics of this process for preparation of nanosized drug particles^{3,6,9,10,11,12}. The use of superficial CO₂ as the secure solvent in pharmaceutical has been accredited by authorities. Moreover, there are some advantages of supercritical CO₂ method like low price, easy operation, and moderate supercritical factor as compared to different gases. Therefore, supercritical CO₂ is a great choice to be used as green solvent for preparation of nanomedicine in pharmaceutical area^{7,13,14,15,16}.

Prior to implementing the nanonization process based on supercritical technology, first drug dissolution has to be analyzed in the solvent. Determination of drug solubility in supercritical process can be done via either experimental approach or computational, by which the computational method is more attractive^17,18. Applying the experimental technique needs extensive time and cost for the analysis, while computational methods can save time and cost of the experiments, and they can be used for interpolation and extrapolation of the data^{19,20,21,22,23}.

Different computational techniques have been utilized for the modeling of drug solubility, but the thermodynamic model and machine learning have shown better performance. The thermodynamic model establishes equilibrium between the solid phase and the solvent phase to determine the value of solubility^{19,24,25,26,27,28}. On the other hand, machine learning models need measured data for training and validation of the algorithms^29,30. The methods of machine learning have shown to be easier to implement and offer better accuracy for prediction of drug solubility in solvents.

Machine learning (ML) as a subject in artificial intelligence is a set of techniques to understand the patterns of data with no any suppostions regarding to the structure. One of these strategies' strengths is creating a relation among data and, then estimate some interaction. An important application of machine learning is regression, that could be defined as a specific type of problem in this study^31,32,33,34. In this research, three approaches have been chose to make approaches on the drug solubility. Accepted methods in this research are Kernel Ridge Regression (KRR), Decision Tree Regression (DTR), and Gaussian Process Regression (GPR). Indeed, we implemented these efficient ML models for the first time for simulation of decitabine solubility in supercritical CO₂ as the solvent. The results can help to assess the applicability of supercritical process for this drug candidate to be prepared in nanosized scale³⁵.

Ridge regressions and the kernel approach are used in the Kernel Ridge Regression (KRR). KRR has the advantage of capturing nonlinear relationships, avoiding regression over-fitting problems through regularization and kernel techniques³⁵.

A decision tree regressor (DTR) is a straightforward, comprehensible, and efficient approach. The core principle of the decision tree algorithm has been distributed a large problem within multiple smaller sub-problems, it can be lead to an easier-to-interpret respond^36,37. A DTR demonstrates a set of conditional queries ordered hierarchically and requested from the tree's root to the leaf³⁸. DTRs are easy to understand and have a clear structure. DTRs produce a trained predictor, be able to express principles, and forecast new datasets using the splitting procedures, which is repeated^39,40.

The other employed model of this study is based on the Gaussian process (GP) statistical concept, which refers to a group of random variables, as some of them are distributed with Gaussian distributions^41,42. In geostatistics, the Gaussian process is the fundamental stochastic process. Gaussian processes directly represent Gaussian data and the base for non-Gaussian models such as linear regression models. As a result, Gaussian processes regression based on GPs is both accurate and straightforward for small datasets with high generality^43,43,45. Additional target prediction studies of decitabine will be conducted in this research work to get better insights about the different plausible targets for this drug. We will use a hybrid approach in this study through combining both binding and ligand similarity analysis to predict other putative targets of decitabine. The purpose of this research is modifying the solubility of decitabine and GPR has been selected as the best model and reveals that increasing both inputs roughly increase the solubility of drug. So, the optimal is (P = 400, T = 3.38E + 02, Y = 1.07E-03).

Data set

To make models on solubility, we used a dataset with 32 data points identical to the reference^46,47. Indeed, the experimental data have been collected from the reference and the machine learning models were fitted and implemented on the data. More detailed description of the method and experimental conditions can be found in the source published paper in⁴⁶. Here, two inputs are considered, Pressure (bar) and Temperature (K), and a single output that shows the solubility of Decitabine drug in the supercritical carbon dioxide (CO₂). The entire dataset is shown in Table 1.

Table 1 The whole dataset^46,47.

Full size table

Methodology

Kernel ridge regression (KRR)

The first machine learning (ML) method which is considered here for correlation of drug solubility values is the method of Kernel Ridge Regression (KRR). Suppose a data set ${\left\{({x}_{i},{y}_{i})\right\}}_{i=1}^{N}$ has been provided which is include $N$ data points, and the goal is to estimate a function can analysis the mean squared error (MSE) of [${(f\left(x\right)-y)}^{2}$]. The conditional mean ${f}^{*}\left(x\right) : ={\mathbb{E}}[Y|X=x]$ has been illustrated as the best function⁴⁸. In order to estimate the function ${f}^{*}$,

$$\widehat{f}:=\underset{f\in H}{\mathrm{argmin}}\left\{\frac{1}{N}\sum_{i=1}^{N}{\left(f\left({x}_{i}\right)-{y}_{i}\right)}^{2}+ \leftthreetimes {\left|\left|f\right|\right|}_{H}^{2}\right\}$$

(1)

This equation can predict the kernel ridge regression³⁵.

Decision tree regression (DTR)

A regression tree or decision trees regressor³⁸ uses data from simulation inputs and outputs to create a structure that can be a leaf (terminal node), illustrating a estimation value, or an internal node (decision node), indicating some query to be performed on an input, with a branch and child for each possible output of the query. For continuous inputs, two options are available based on whether the condition is true or not. The structure of the data is declared at every node of the regression tree. To estimate the output for an unobserved data point, the inputs of that data point are employed to traverse the decision tree until a terminal node is seen. The estimated value is decided according to the output values from the training set ending up at that terminal node⁵¹.

An impurity measure for each node of the tree's test is decided by reviewing all input feature and obtaining an optimal split that maximizes the measure. MSE can be calculated by formulating the split A as follows for a particular input⁵²:

$$MSE\left(A\right)={p}_{L}.s\left({t}_{L}\right)+{p}_{R}.s\left({t}_{R}\right)$$

(2)

Here, t_L and tR denote the set of instances. Also, s(t) indicates the standard deviation of the N(t) data, c_i, of instances within t:

$$s(t)=\sqrt{\frac{1}{N(t)}\sum_{i=1}^{N(t)}\left({c}_{i}-{\overline{c\left(t\right)}}^{2}\right)}$$

(3)

Here, $\overline{c(t)}$ is the average of the values in t. The split that minimizes mean square error across all input features for instances at each node of the regression tree is used at each node. Overfitting can occur in tree-based algorithms if the data is split too finely^53,54,52.

Gaussian process regression (GPR)

Successor models, such as the Gaussian process (GP), provide predictions as well as the degree of uncertainty associated with those predictions. A GP is a group of random variables with the same Gaussian distribution for any quantity of variables⁴². GPs can be assumed as an infinite-dimensional buildup of multivariate Gaussian distributions. N- instance training data can be considered a singular data point taken from an N-variate Gaussian distribution; thus, it can be matched to the Gaussian process. Typically, the average of this Gaussian Process is reserved to zero.

We describe GP⁵⁵ using a one-dimensional problem with an N-instance training set, [x_i | i = 1,2,…,N] and the corresponding output values y = [y₁, …, y_N]. We use the same notation as in the previous sections to describe GP for a one-dimensional problem for ease of exposition. Two instances x_i and x_j in the training set are related to each other through the covariance function k(x_i, x_j). The squared-exponentiation function is employed here⁵⁶:

$$k\left({x}_{i}, {x}_{j}\right)={\sigma }_{f}^{2}\mathrm{exp}\left(\frac{-{\left({x}_{i}, {x}_{j}\right)}^{2}}{2{l}^{2}}\right)$$

(4)

where ${\sigma }_{f}^{2}$ the maximum allowable covariance and l is a length parameter that controls the extent of influence of each point. ${\sigma }_{f}^{2}$ Should be set to a large value for functions covering a broad range of values. In condition that data points x_i and x_j are close to each other, their output values are highly correlated, but if they are far away, then the value at one point does not influence the value at the other point. Accordingly, the hyper-parameter l determines the smoothness of the interpolation.

Assume we desire to employ the training data to estimate the output at an unseen data point x_∗. Since the results be able toe depicted as an instance through a multivariate Gaussian distribution:

$$\left[\genfrac{}{}{0pt}{}{y}{{y}_{*}}\right]=\mathrm{ N}\left(0, \left[\genfrac{}{}{0pt}{}{K}{{K}_{*}}\genfrac{}{}{0pt}{}{{K}_{*}^{T}}{{K}_{**}}\right]\right)$$

(5)

y denotes the output variable correlated to the N training data points, y_∗ shows the estimated production at x_∗ and the following sub-matrices:

$$K=\left[\begin{array}{llll}k\left({x}_{1},{x}_{1}\right)& k\left({x}_{1},{x}_{2}\right)& \cdots & k\left({x}_{1},{x}_{N}\right)\\ .& .& .& .\\ \vdots & \vdots & \vdots & \vdots \\ k\left({x}_{N},{x}_{1}\right)& k\left({x}_{N},{x}_{2}\right)& \cdots & k\left({x}_{N},{x}_{3}\right)\end{array}\right]$$

(6)

And:

$${K}_{*}=\left[\begin{array}{llll}k({x}_{*},{x}_{1})& & \cdots & k({x}_{*},{x}_{N})\end{array}\right]$$

The probability of y_∗, which is, the output at a data point, is formulates as:

$${k}_{**}=k({x}_{*},{x}_{*})$$

The variance indicates the degree of uncertainty in the estimate:

$$\mathrm{var}({y}_{*})={K}_{**}-{K}_{*}{K}^{-1}{K}_{*}^{T}$$

The parameters l and σ_f of the Gaussian process regressor can be computed from the training subset using a maximum likelihood method. It is also feasible to incorporate a Gaussian noise component in the output variable, however we have supposed that the noise is zero in our current research.

Prediction of decitabine putative targets

Decitabine Smiles were generated via PubChem (https://pubchem.ncbi.nlm.nih.gov/compound/Decitabine) then we feed the smiles into The LigTMap server (https://cbbio.online/LigTMap/?action=home) to identify the plausible targets from seventeen target classes and more than six thousands of different types of proteins.

Results and discussion

Analysis of model outcomes

The three abovementioned models were implemented to the collected dataset to build the models for the drug solubility. The hyper-parameters of the models we introduced were optimized using Grid-Search⁵⁷. More than 1000 distinct combinations were used to get these ideal parameters for each model. Then, the models were tested in their ideal configurations, and their performance was evaluated.

Three traditional statistical metrics will be used to assess and compare the efficiency of each model, such as R² and Mean Absolute Error (MAE) and MAPE. In order to calculate each of the statistics, a mathematical equation must be used⁵²:

$${R}^{2}=\frac{{({\sum }_{i=1}^{\text{n}}({Y}_{i,m}-{\overline{Y}}_{i,m})({Y}_{i,o}- {\overline{Y}}_{i,o}))}^{2}}{{\sum }_{i=1}^{\text{n}}{({Y}_{i,m}-{\overline{Y}}_{i,m})}^{2}{\sum }_{i=1}^{\text{n}}{({Y}_{i,o}- {\overline{Y}}_{i,o})}^{2}}$$

(7)

$$MAE=\frac{1}{n}\sum_{i=1}^{\text{n}}\left|{Y}_{i,m}-{Y}_{\text{i,o}}\right|$$

(8)

In these equations, n is size of data set, Y_i,m is the estimated value, Y_i,o indicates actual (observed) value. As well, ${\overline{Y}}_{i,m}$ is the average of estimated values and ${\overline{Y}}_{i,o}$ indicates average of actual values. A comparison among the estimated amounts and the real (observed) amounts in the model training is shown in Figs. 1, 2, and 3 for the methods of KRR, DTR, and GPR, respectively. The red dots indicate the test data, the blue dots are the training data (estimated amounts), and the green line represents the real amounts. Comparing these three shapes clearly shows the higher generality in the GPR method in comparison to other methods. The statistical results of the comparison for all methods have been also demonstrated in Table 2. As it is clear, all methods have great capability in fitting and correlating the experimental data which indicate that these models are of great choice for application in production of nanomedicine using supercritical based technology. The best outputs are illustrated for GPR through R² higher than 0.99 in order to fit the solubility results.

Table 2 The statistical results of all models used in this study.

Full size table

The validated GPR method as the significant method has been applied to calculate the solubility data and find the influence of temperature and the pressure on the solubility of Decitabine in supercritical CO₂. The results of 3D surface plot are explained in Fig. 4, the impact of temperature and pressure on the solubility of decitabine are significant, so that the highest value of solubility is observed at the maximum values of T and P in the 3D graph (see Fig. 4). The increase in the solubility with temperature and pressure could be attributed to the change of solvent density and consequently changing the solvation capacity of the solvent. Also, the 2D graphs of solubility versus temperature and pressure are indicated in Figs. 5 and 6, respectively. The optimum values calculated using the GPR model are listed in Table 3.

Table 3 Optimized parameters using the GPR method.

Full size table

Figure 7 aims to evaluate the impact of pressure on the solubility values of Decitabine at disparate temperatures. As indicated, there is a cross over area at each solubility figure. Indeed, the impact of temperature on drug solubility in SC-CO₂ is paradoxical. Furthur, temperature's growth, influence on the sublimation pressure of drugs, causes the increment of solubility. In another side, increase the temperature results in decreasing the molecular compaction and as the result, the amount of SC-CO2 density, which has negative effect on the solubility of Decitabine. The pressure value of 18 bar is known as the cross over pressure. At the pressures between 12 and 18 bar, the negative effect of density deterioration entirely overcomes the desirable effect of vapor pressure increment. Moreover, at this range of pressures, temperature enhancement lead to a reduction in solubility. Above the cross over pressure (18 MPa), the solubility of Decitabine significantly enhances owing to the superiority of the positive effect of drug’s vapor pressure than the negative effect of density reduction. Therefore, at this pressure elevation of temperature improves the solubility.

Correlation of the solubility data with semi-empirical models

Figure 8a–d present the correlation outcomes of Decitabine-SC-CO₂ system obtained by semi-empirical models. In this investigation, four principal semi-empirical density-based models (Sodeifian et al., K-J, Bartle et al. and Bian et al.) were pondered for the correlation of the experimental data of Decitabine solubility SC-CO₂^{58,56,57,58,62}. Disparate values including settable parameters (a₀, a₁, a₂, a₃, _a4 and a₅), average absolute relative deviation (AARD%) and R² are enlisted in Table 4. The AARD for developed models for Sodeifian et al., Bartle et al. and Bian et al. models were 12.15%, 11.61%, 14.46%, and 13.25%, respectively. Comparison of the results implies the fact that K-J model is the best model due to presenting the lowest value of AARD (11.61%).

Table 4 Correlation outcomes of Decitabine-SC-CO₂ system obtained by various semi-empirical models.

Full size table

Table 4 presents the correlation outcomes of Decitabine-SC-CO₂ system obtained by semi-empirical models.

Additionally, through usage of LigTMap server we have found more than one hundred predicted targets for decitabine, these targets are classified according to disease target class into kinase 34 (29%), 30 (25%) transferase, 28 (24%) Hydrolase, 15 (13%) tuberculosis, 5 (4%) Hpyroli, 3 (2.5%) Influenza and 1 (0.8%) Beta secretase. Attached with this research work a supplementary data file that contains a list for the targets with docking scores in the binding sites of the specified proteins. Also, Pdb IDs for each specific protein are incorporated, the optimum binding of decitabine with these target proteins and binding mode, in addition to predicted affinity and docking scores all are obtained through the automated workflow of LigTMap. The obtained results (supplementary data) revealed that decitabine has ligand Similarity Score more than 0.6 with Deoxycytidine kinase and Thymidylate kinase TMK, target classes are kinase and tuberculosis with Pdb ID 3ipx and 1w2g respectively. Decitabine showed binding affinity more than 7 in disease target class Influenza, target name is Polymerase basic protein 2 (Pdb IDs: 5efc, 4or6 and 4q46). Decitabine showed the best docking score into Thymidylate kinase binding site with value equal −7.007 kcal/mol, Pdb ID: 1mrs in tuberculosis disease class. From these obtained results we can conclude that Thymidylate kinase (tuberculosis) and Polymerase basic protein 2 (Influenza) are plausible targets for decitabine, the following Fig. 9 illustrates the binding mode with Thymidylate kinase (tuberculosis).

Four semi-empirical models (Sodeifian et al., Bartle et al., K-J and Bian et al.) have been considered to make a correlation with the outputs of solubility experiments. The precision for all applied methods has analysed and measured through AARD% and R². Comparison of the outputs implies the fact that K-J model is the best model due to presenting the lowest value of AARD (11.61%). Despite good efficiency of K-J model for the accurate prediction of drug solubility, the employed GPR model shows better performance compared to K-J owing to having higher value of R².

Conclusion

Computational simulation of Decitabine drug solubility in supercritical carbon dioxide was carried out in this study via three different machine learning models. We used a dataset of 32 data points and two inputs in this investigation to create solubility models (P and T). In this dataset, Y (solubility, mole fraction) is the lone result which is predicted by the models. Kernel Ridge Regression (KRR), Decision Tree Regression (DTR), and Gaussian process (GP) are the models which were employed in this work for correlation of the solubility data. Hyper-parameter tweaking was used to fine-tune these models, and standard metrics were used to assess their performance. KRR, DTR, and GPR have R²-scores of 0.806, 0.891, and 0.998. MAE's error rate is 1.08E−04, 7.40E−05, and 9.73E−06 in that sequence, too. The MAPE measure has a KRR error rate of 4.64E−01, a DTR error rate of 1.63E−01, and a GPR error rate of 5.06E−02 as the optimum option. As a conclusion, the best model (GPR) shows that increasing both inputs roughly raise the output. So, the best outcome is obtained as P = 400 bar, T = 3.38E + 02 °K, Y = 1.07E−03. Finally, LigTMap workflow revealed the promiscuity of decitabine to target Thymidylate kinase (disease class: tuberculosis) and Polymerase basic protein 2 (disease class: influenza). In this paper, the solubility value of Decitabine was evaluated at disparate values of pressure (120, 160, 200, 240, 280, 320, 360 and 400 bar) and temperatures (308, 318, 328, and 338 K). Four semi-empirical models (Sodeifian et al., Bartle et al., K-J and Bian et al.) have been considered to make a correlation with the outputs of solubility experiments. The precision of all applied methods has been evaluated through AARD% and R². Comparison of the outputs implies the fact that K-J model is the best model due to presenting the lowest value of AARD (11.61%). Despite good efficiency of K-J model for the accurate prediction of drug solubility, the employed GPR model shows better performance compared to K-J owing to having higher value of R².

Data availability

All data are available within the published paper.

Abbreviations

ML:: Machine learning
P:: Pressure
T:: Temperature
KRR:: Kernel ridge regression
GPR:: Gaussian process regression
DTR:: Decision tree regression
MSE:: Mean squared error
MAE:: Mean absolute error
AARD:: Average absolute relative deviation
SC-CO₂ :: Supercritical CO₂

references

Pishnamazi, M. et al. Evaluation of supercritical technology for the preparation of nanomedicine: etoricoxib analysis. Chem. Eng. Technol. 44, 559–564 (2021).
Article CAS Google Scholar
Khoshmaram, A. et al. Supercritical process for preparation of nanomedicine: Oxaprozin case study. Chem. Eng. Technol. 44, 208–212 (2021).
Article CAS Google Scholar
Sodeifian, G., Razmimanesh, F., Ardestani, N. S. & Sajadian, S. A. Experimental data and thermodynamic modeling of solubility of Azathioprine, as an immunosuppressive and anti-cancer drug, in supercritical carbon dioxide. J. Mol. Liq. 299, 112179 (2020).
Article CAS Google Scholar
Shaikh, R., Shirazian, S. & Walker, G.M. Application of artificial neural network for prediction of particle size in pharmaceutical cocrystallization using mechanochemical synthesis. Neural Comput. Appl. (2021).
Zabihi, S., Esmaeili-Faraj, S. H., Borousan, F., Hezave, A. Z. & Shirazian, S. Loxoprofen solubility in supercritical carbon dioxide: Experimental and modeling approaches. J. Chem. Eng. Data 65, 4613–4620 (2020).
Article CAS Google Scholar
Zabihi, S. et al. Thermodynamic study on solubility of brain tumor drug in supercritical solvent: Temozolomide case study. J. Mol. Liq. 321, 114926 (2021).
Article CAS Google Scholar
Hazaveie, S. M., Sodeifian, G. & Sajadian, S. A. Measurement and thermodynamic modeling of solubility of Tamsulosin drug (anti cancer and anti-prostatic tumor activity) in supercritical carbon dioxide. J. Supercrit. Fluids 163, 104875 (2020).
Article CAS Google Scholar
Zhuang, W., Hachem, K., Bokov, D., Javed Ansari, M. & TaghvaieNakhjiri, A. Ionic liquids in pharmaceutical industry: A systematic review on applications and future perspectives. J. Mol. Liq. 349, 118145 (2022).
Article CAS Google Scholar
Zhao, Z. et al. Multi support vector models to estimate solubility of Busulfan drug in supercritical carbon dioxide. J. Mol. Liq. 350, 118573 (2022).
Article CAS Google Scholar
Operti, M. C. et al. PLGA-based nanomedicines manufacturing: Technologies overview and challenges in industrial scale-up. Int. J. Pharmaceut. 605, 120807 (2021).
Article CAS Google Scholar
Penoy, N., Grignard, B., Evrard, B. & Piel, G. A supercritical fluid technology for liposome production and comparison with the film hydration method. Int. J. Pharmaceut. 592, 120093 (2021).
Article CAS Google Scholar
Campardelli, R., Baldino, L. & Reverchon, E. Supercritical fluids applications in nanomedicine. J. Supercrit. Fluids 101, 193–214 (2015).
Article CAS Google Scholar
Sodeifian, G., Sajadian, S. A. & Ardestani, N. S. Determination of solubility of Aprepitant (an antiemetic drug for chemotherapy) in supercritical carbon dioxide: Empirical and thermodynamic models. J. Supercrit. Fluids 128, 102–111 (2017).
Article CAS Google Scholar
Sodeifian, G. & Sajadian, S. A. Solubility measurement and preparation of nanoparticles of an anticancer drug (Letrozole) using rapid expansion of supercritical solutions with solid cosolvent (RESS-SC). J. Supercrit. Fluids 133, 239–252 (2018).
Article CAS Google Scholar
Sodeifian, G., Razmimanesh, F. & Sajadian, S. A. Prediction of solubility of sunitinib malate (an anti-cancer drug) in supercritical carbon dioxide (SC–CO₂): Experimental correlations and thermodynamic modeling. J. Mol. Liq. 297, 111740 (2020).
Article CAS Google Scholar
Pishnamazi, M. et al. Measuring solubility of a chemotherapy-anti cancer drug (busulfan) in supercritical carbon dioxide. J. Mol. Liq. 317, 113954 (2020).
Article CAS Google Scholar
Esfandiari, N. & Sajadian, S. A. Experimental and modeling investigation of Glibenclamide solubility in supercritical carbon dioxide. Fluid Phase Equilib. 556, 113408 (2022).
Article CAS Google Scholar
Sodeifian, G., Sajadian, S.A., Razmimanesh, F. & Hazaveie, S.M. Solubility of the Ketoconazole (an Antifungal Drug) in Supercritical Carbon Dioxide and Menthol as a Cosolvent (Ternary System): Experimental Data and Empirical Correlations. (2021).
Sodeifian, G., Sajadian, S. A., Razmimanesh, F. & Ardestani, N. S. A comprehensive comparison among four different approaches for predicting the solubility of pharmaceutical solid compounds in supercritical carbon dioxide. Korean J. Chem. Eng. 35, 2097–2116 (2018).
Article CAS Google Scholar
Sodeifian, G., Razmimanesh, F., Sajadian, S. A. & Hazaveie, S. M. Experimental data and thermodynamic modeling of solubility of Sorafenib tosylate, as an anti-cancer drug, in supercritical carbon dioxide: Evaluation of Wong-Sandler mixing rule. J. Chem. Thermodyn. 142, 105998 (2020).
Article CAS Google Scholar
Suleiman, D., Estévez, L. A., Pulido, J. C., García, J. E. & Mojica, C. Solubility of anti-inflammatory, anti-cancer, and anti-HIV drugs in supercritical carbon dioxide. J. Chem. Eng. Data 50, 1234–1241 (2005).
Article CAS Google Scholar
Pishnamazi, M. et al. Thermodynamic modelling and experimental validation of pharmaceutical solubility in supercritical solvent. J. Mol. Liq. 319, 114120 (2020).
Article CAS Google Scholar
Sodeifian, G., SaadatiArdestani, N., Sajadian, S. A., Golmohammadi, M. R. & Fazlali, A. Prediction of solubility of sodium valproate in supercritical carbon dioxide: Experimental study and thermodynamic modeling. J. Chem. Eng. Data 65, 1747–1760 (2020).
Article CAS Google Scholar
Sodeifian, G., SaadatiArdestani, N., Razmimanesh, F. & Sajadian, S. A. Experimental and thermodynamic analyses of supercritical CO₂-solubility of minoxidil as an antihypertensive drug. Fluid Phase Equilibr. 522, 112745 (2020).
Article CAS Google Scholar
Sodeifian, G., Sajadian, S. A. & Derakhsheshpour, R. Experimental measurement and thermodynamic modeling of Lansoprazole solubility in supercritical carbon dioxide: Application of SAFT-VR EoS. Fluid Phase Equilibr. 507, 112422 (2020).
Article CAS Google Scholar
Sodeifian, G., Detakhsheshpour, R. & Sajadian, S. A. Experimental study and thermodynamic modeling of Esomeprazole (proton-pump inhibitor drug for stomach acid reduction) solubility in supercritical carbon dioxide. J. Supercrit. Fluid 154, 104606 (2019).
Article CAS Google Scholar
Yamini, Y. et al. Solubility of capecitabine and docetaxel in supercritical carbon dioxide: Data and the best correlation. Thermochim. Acta 549, 95–101 (2012).
Article CAS Google Scholar
Xiang, S.-T., Chen, B.-Q., Kankala, R. K., Wang, S.-B. & Chen, A.-Z. Solubility measurement and RESOLV-assisted nanonization of gambogic acid in supercritical carbon dioxide for cancer therapy. J. Supercrit. Fluids 150, 147–155 (2019).
Article CAS Google Scholar
Zhu, H., Zhu, L., Sun, Z. & Khan, A. Machine learning based simulation of an anti-cancer drug (busulfan) solubility in supercritical carbon dioxide: ANFIS model and experimental validation. J. Mol. Liq. 338, 116731 (2021).
Article CAS Google Scholar
Sadeghi, A. et al. Machine learning simulation of pharmaceutical solubility in supercritical carbon dioxide: Prediction and experimental validation for busulfan drug. Arab. J. Chem. 15, 103502 (2022).
Article CAS Google Scholar
Senders, J. T. et al. Machine learning and neurosurgical outcome prediction: a systematic review. World Neurosurg. 109, 476-486e471 (2018).
Article PubMed Google Scholar
Cherkassky, V. & Ma, Y. Comparison of model selection for regression. Neural Comput. 15, 1691–1714 (2003).
Article PubMed MATH Google Scholar
Carbonell, J. G., Michalski, R. S. & Mitchell, T. M. An overview of machine learning. Mach. Learn. 45, 3–23 (1983).
Google Scholar
Goodfellow, I., Bengio, Y. & Courville, A. Machine learning basics. Deep Learn. 1, 98–164 (2016).
MATH Google Scholar
Rami, M. A. et al. Novel numerical simulation of drug solubility in supercritical CO2 using machine learning technique: Lenalidomide case study. Arab. J. Chem. 15(11), 104180 (2022).
Article Google Scholar
Xu, M., Watanachaturaporn, P., Varshney, P. K. & Arora, M. K. Decision tree regression for soft classification of remote sensing data. Remote Sens. Environ. 97, 322–336 (2005).
Article ADS Google Scholar
Song, Y.-Y. & Ying, L. Decision tree methods: Applications for classification and prediction. Shanghai Arch. Psychiatry 27, 130 (2015).
PubMed PubMed Central Google Scholar
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. Classification and Regression Trees (Routledge, 2017).
Book MATH Google Scholar
Ahmad, M. W., Mourshed, M. & Rezgui, Y. Trees vs neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy Build. 147, 77–89 (2017).
Article Google Scholar
Dumitrescu, E., Hue, S., Hurlin, C. & Tokpavi, S. Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects. Eur. J. Oper. Res. 297, 1178–1192 (2022).
Article MathSciNet MATH Google Scholar
Grbić, R., Kurtagić, D. & Slišković, D. Stream water temperature prediction based on Gaussian process regression. Expert Syst. Appl. 40, 7407–7414 (2013).
Article Google Scholar
Rasmussen, C. E. & Williams, C. K. Gaussian Processes for Machine Learning (MIT Press, 2006).
MATH Google Scholar
Rasmussen, C. E. Gaussian Processes in Machine Learning. Summer School on Machine Learning (Springer, 2003).
Google Scholar
Daemi, A., Kodamana, H. & Huang, B. Gaussian process modelling with Gaussian mixture likelihood. J. Process Control 81, 209–220 (2019).
Article CAS Google Scholar
Wang, H., Guan, Y., & Reich, B. Nearest-neighbor neural networks for geostatistics. in 2019 International Conference on Data Mining Workshops (ICDMW). 196–205 (IEEE, 2019).
Pishnamazi, M. et al. Experimental and thermodynamic modeling decitabine anti cancer drug solubility in supercritical carbon dioxide. Sci. Rep. 11, 1–8 (2021).
Article Google Scholar
Bader, H. et al. Solubility enhancement of decitabine as anticancer drug via green chemistry solvent: Novel computational prediction and optimization. Arab. J. Chem. 15(12), 104259 (2022).
Google Scholar
Byrne, E. & Schniter, P. Sparse multinomial logistic regression via approximate message passing. IEEE Trans. Signal Process. 64, 5485–5498 (2016).
Article ADS MathSciNet MATH Google Scholar
Zhang, Y., Duchi, J. & Wainwright, M. Divide and conquer kernel ridge regression. in Conference on Learning Theory, PMLR. 592–617 (2013).
Hoerl, A. E. & Kennard, R. W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970).
Article MATH Google Scholar
King, W. E. et al. Laser powder bed fusion additive manufacturing of metals; physics, computational, and materials challenges. Appl. Phys. Rev. 2, 041304 (2015).
Article ADS Google Scholar
Seyed, A. S. et al. Solubility of favipiravir (as an anti-COVID-19) in supercritical carbon dioxide: An experimental analysis and thermodynamic modeling. J. Supercrit. Fluids. 183, 105539 (2022).
Article Google Scholar
Kamath, C., & Cantu-Paz, E. Creating Ensembles of Decision Trees Through Sampling. (Lawrence Livermore National Lab (LLNL), 2001).
Kamath, C. Scientific Data Mining: A Practical Perspective (SIAM, 2009).
Book MATH Google Scholar
Ebden, M. Gaussian Processes: A Quick Introduction. arXiv preprint arXiv:1505.02965 (2015).
Kamath, C. Data mining and statistical inference in selective laser melting. Int. J. Adv. Manuf. Technol. 86, 1659–1677 (2016).
Article Google Scholar
Lerman, P. Fitting segmented regression models by grid search. J. R. Stat. Soc. Ser. C (Appl. Stat.) 29, 77–84 (1980).
Google Scholar
Kumar, S. K. & Johnston, K. P. Modelling the solubility of solids in supercritical fluids with density as the independent variable. J. Supercrit. Fluids 1, 15–22 (1988).
Article CAS Google Scholar
Bartle, K., Clifford, A., Jafar, S. & Shilstone, G. Solubilities of solids and liquids of low volatility in supercritical carbon dioxide. J. Phys. Chem. Ref. Data 20, 713–756 (1991).
Article ADS CAS Google Scholar
Esfandiari, N. & Sajadian, S. A. Solubility of lacosamide in supercritical carbon dioxide: An experimental analysis and thermodynamic modeling. J. Mol. Liq. 26, 119467 (2022).
Article Google Scholar
Bian, X.-Q., Zhang, Q., Du, Z.-M., Chen, J. & Jaubert, J.-N. A five-parameter empirical model for correlating the solubility of solid compounds in supercritical carbon dioxide. Fluid Phase Equilib. 411, 74–80 (2016).
Article CAS Google Scholar
Sodeifian, G., Razmimanesh, F. & Sajadian, S. A. Solubility measurement of a chemotherapeutic agent (Imatinib mesylate) in supercritical carbon dioxide: Assessment of new empirical model. J. Supercrit. Fluids 146, 89–99 (2019).
Article CAS Google Scholar

Download references

Acknowledgements

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number (IF-PSAU-2021/03/18826). The authors would like to thank the Deanship of scientific research at Umm Al-Qura University for supporting this work by grant code (22UQU4290565DSR93). Authors would like to introduce appreciated thanks to Taif University Researchers Supporting, Project number (TURSP-2020/35), Taif University, Taif, Saudi Arabia. The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through Large Groups Project under grant number (R.G.P. 2/50/43).

Author information

Authors and Affiliations

Department of Pharmaceutics, College of Pharmacy, Prince Sattam Bin Abdulaziz University, P.O. Box 173, Al-Kharj, 11942, Saudi Arabia
Saad M. Alshahrani, Bjad K. Almutairy, Munerah M. Alfadhel, Ahmed Al. Saqr & Abdullah S. Alshetaili
Department of Pharmaceutical Chemistry, College of Pharmacy, Taif University, Taif, 21944, Saudi Arabia
Amany Belal
Department of Pharmaceutics, College of Pharmacy, Umm Al-Qura University, Makkah, 21955, Saudi Arabia
Mohammed A. S. Abourehab
Department of Pharmaceutics and Industrial Pharmacy, Faculty of Pharmacy, Minia University, Minia, 61519, Egypt
Mohammed A. S. Abourehab
Department of Pharmaceutical Chemistry, College of Pharmacy, King Khalid University, Abha, 62529, Kingdom of Saudi Arabia
Kumar Venkatesan
Department of Pharmaceutics, College of Pharmacy, Qassim University, Buraidah, 52571, Saudi Arabia
Amal M. Alsubaiyel
Institute of Research and Development, Duy Tan University, Da Nang, 550000, Vietnam
Mahboubeh Pishnamazi
The Faculty of Pharmacy, Duy Tan University, Da Nang, 550000, Vietnam
Mahboubeh Pishnamazi
Medicinal Chemistry Department, Faculty of Pharmacy, Beni-Suef University, Beni-Suef, 62514, Egypt
Amany Belal

Authors

Saad M. Alshahrani
View author publications
You can also search for this author in PubMed Google Scholar
Bjad K. Almutairy
View author publications
You can also search for this author in PubMed Google Scholar
Munerah M. Alfadhel
View author publications
You can also search for this author in PubMed Google Scholar
Amany Belal
View author publications
You can also search for this author in PubMed Google Scholar
Mohammed A. S. Abourehab
View author publications
You can also search for this author in PubMed Google Scholar
Ahmed Al. Saqr
View author publications
You can also search for this author in PubMed Google Scholar
Abdullah S. Alshetaili
View author publications
You can also search for this author in PubMed Google Scholar
Kumar Venkatesan
View author publications
You can also search for this author in PubMed Google Scholar
Amal M. Alsubaiyel
View author publications
You can also search for this author in PubMed Google Scholar
Mahboubeh Pishnamazi
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

S.M.A: supervision, writing the original draft, software, validation B.K.A: data analysis, writing the original draft, optimization M.M.A: investigation, optimization, writing the original draft, data analysis A.B: Supervision, writing the original draft, validation, modeling, software M.A.S.A: writing the original draft, modeling, software A.A.S: editing, data analysis, writing, investigation A.S.A: editing, writing, investigation, data analysis K.V: editing, writing, investigation, data analysis A.M.A: editing, writing, investigation, data analysis M.P: supervision, writing the original draft, software, validation.

Corresponding authors

Correspondence to Saad M. Alshahrani, Amany Belal, Amal M. Alsubaiyel or Mahboubeh Pishnamazi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Alshahrani, S.M., Almutairy, B.K., Alfadhel, M.M. et al. Computational simulation and target prediction studies of solubility optimization of decitabine through supercritical solvent. Sci Rep 12, 18875 (2022). https://doi.org/10.1038/s41598-022-21233-0

Download citation

Received: 19 May 2022
Accepted: 23 September 2022
Published: 07 November 2022
DOI: https://doi.org/10.1038/s41598-022-21233-0

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.