Abstract
In previous studies, replicated and multiple types of speech data have been used for Parkinson’s disease (PD) detection. However, two main problems in these studies are lower PD detection accuracy and inappropriate validation methodologies leading to unreliable results. This study discusses the effects of inappropriate validation methodologies used in previous studies and highlights the use of appropriate alternative validation methods that would ensure generalization. To enhance PD detection accuracy, we propose a two-stage diagnostic system that refines the extracted set of features through \(L_{1}\) regularized linear support vector machine and classifies the refined subset of features through a deep neural network. To rigorously evaluate the effectiveness of the proposed diagnostic system, experiments are performed on two different voice recording-based benchmark datasets. For both datasets, the proposed diagnostic system achieves 100% accuracy under leave-one-subject-out (LOSO) cross-validation (CV) and 97.5% accuracy under k-fold CV. The results show that the proposed system outperforms the existing methods regarding PD detection accuracy. The results suggest that the proposed diagnostic system is essential to improving non-invasive diagnostic decision support in PD.
Introduction
Parkinson’s disease (PD) is a neurological condition characterized by slowness of movement, tremors, rigidity, impaired voice, and difficulty maintaining balance and coordination1,2,3. Global estimates in 2019 showed over 8.5 million individuals with PD4. In 1817, Dr. James Parkinson described the disease that now bears his name5. Speech-related impairments identified in PD patients include hypophonia (low volume), monotone speech (unvaried pitch range), dysarthria (difficulty in controlling speech-producing muscles), and dysphonia (difficulty in speaking)6,7. Approximately 90% of PD patients experience issues with their vocal system6,8. As of now, no medical (blood or laboratory) tests have been discovered for diagnosing PD9,10. Hence, artificial intelligence-based methods that use voice or speech features can assist neurologists in diagnosing PD.
The literature demonstrates that many machine learning methods have been introduced, utilizing voice and speech data, for the detection of PD1,11. Little et al. conducted an analysis of PD by measuring dysphonia10. Their dataset consisted of voice recordings from 31 individuals producing the vowel sound “a”. Dysphonia features were extracted from vowel phonation data and subsequently classified using the support vector machine (SVM) model. Tsanas et al. similarly employed voice data for the classification of PD12. A total of one hundred and thirty-two dysphonia measures were extracted from a dataset consisting of 263 samples12. Four feature selection algorithms were investigated to attain elevated accuracy. Huseyin Guruler utilized the dataset gathered in10 and accomplished the highest accuracy of 99.52% by employing a complex-valued artificial neural network with feature weighting based on k-means clustering13. Nonetheless, subject overlap emerged as a primary problem in Huseyin Guruler’s approach and other methods employed with the dataset from10. Furthermore, the preceding studies did not implement measures to mitigate the impacts of imbalanced classes within the dataset.
Sarkar et al.6 collected a well-balanced dataset from 20 PD patients and 20 healthy individuals to mitigate the influence of an imbalanced class distribution. Each participant contributed twenty-six speech samples, and Praat acoustic analysis software was employed to extract 26 features from each speech sample14. Various learning models, including k-nearest neighbors (k-NN) and support vector machines (SVM), were investigated to attain optimal performance. However, the primary limitation of the results on this well-balanced dataset6 was the comparatively low classification accuracy. Canturk et al. aimed to enhance classification accuracy by employing a cascading approach, incorporating six distinct machine learning predictive models coupled with diverse feature selection algorithms. Nevertheless, their maximum accuracies were 57.5% under leave-one-subject-out cross-validation (LOSO CV) and 68.94% under 10-fold cross-validation (10-fold CV)15. Similarly, the authors of16,17 and18 compiled voice datasets for detecting PD; however, these datasets are not publicly accessible. In16, speech data from 50 subjects were collected. That study integrated three feature extraction methods with five classifiers and reported an accuracy of 90%. In17, a novel Bayesian linear regression technique was introduced for monitoring the severity of PD symptoms. This approach achieved an accuracy of 86.2% using a two-stage variable selection and classification methodology.
Several researchers have explored deep learning models for PD diagnosis using voice data, including autoencoders and convolutional neural networks (CNNs)19,20,21. Other studies used neural networks but were limited to a single hidden layer, i.e., deep architectures were not explored15,22,23. Neural networks are commonly classified into two main categories: shallow neural networks (SNNs) and deep neural networks (DNNs). Shallow neural networks comprise an input layer, an output layer, and typically only one hidden layer24,25. In contrast, DNNs comprise an input layer, an output layer, and multiple hidden layers26,27. In summary, DNNs are networks that are composed of multiple hidden layers and are trained using novel optimization algorithms28,29. This study employs a recently introduced algorithm, the Adaptive Moment Estimation (ADAM) learning algorithm, for training the DNNs30.
This paper addresses two critical issues in PD detection using replicated voice and multiple types of speech data: the problem of inappropriate validation methods leading to subject overlap and a low rate of PD detection accuracy. Conventional k-fold CV is the cause of subject overlap. In such cases, we cannot depend on the constructed model as it is biased. Therefore, we suggest the use of alternative validation methodologies, such as LOSO CV. Additionally, we demonstrate that translating multiple samples per subject data into one sample per subject data automatically eliminates subject overlap.
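The translation from multiple samples per subject to one sample per subject can be illustrated with a minimal sketch. The text does not state which aggregation is used, so averaging each feature over a subject's recordings is an assumption here, and the function name `one_sample_per_subject` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def one_sample_per_subject(samples):
    """Average the feature vectors of each subject (assumed aggregation).

    `samples` is a list of (subject_id, features, label) tuples.
    Returns a dict mapping subject_id -> (mean_features, label).
    """
    grouped = defaultdict(list)
    labels = {}
    for subject_id, features, label in samples:
        grouped[subject_id].append(features)
        labels[subject_id] = label
    collapsed = {}
    for subject_id, rows in grouped.items():
        # zip(*rows) iterates column-wise over the subject's recordings
        mean_features = [mean(column) for column in zip(*rows)]
        collapsed[subject_id] = (mean_features, labels[subject_id])
    return collapsed


# Toy data: subject "s1" has two recordings, subject "s2" has one.
samples = [
    ("s1", [1.0, 4.0], 1),
    ("s1", [3.0, 6.0], 1),
    ("s2", [2.0, 2.0], 0),
]
collapsed = one_sample_per_subject(samples)
```

Once every subject contributes exactly one row, no subject can appear in both the training and the testing fold of a conventional k-fold split, which is why the overlap disappears.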
To address the low PD detection accuracy, we have devised a two-stage diagnostic method. In the first stage, we employ an \(L_{1}\) regularized SVM model to refine the extracted features. In the second stage, we perform classification using a DNN model. Unlike previous work, we propose simultaneous optimization of the two models. To this end, a hybrid grid is obtained by merging the hyper-parameters of the cascaded models. Optimized versions of the SVM and DNN are constructed once the optimum point on the hybrid grid is identified. The hybrid grid search algorithm (HGSA)31 is used to locate this point, so that both models, i.e., SVM and DNN, are optimized simultaneously. The optimized SVM model yields an optimum subset of features, on which the optimized DNN then operates efficiently.
The primary contributions of this paper can be succinctly summarized as follows:
(1) This paper addresses the issue of inappropriate validation methods employed in prior studies and advocates for the adoption of alternative validation approaches. Furthermore, it demonstrates that consolidating multiple samples per subject data into a single sample per subject data set effectively mitigates the issue of overlap.
(2) We enhance the set of extracted features through the utilization of an \(L_{1}\)-regularized SVM. This process effectively eliminates redundant and irrelevant features, yielding a higher-quality feature set for classification.
(3) To the best of our knowledge, the proposed cascaded diagnostic system, referred to as \(L_{1}\)SVM-DNN, represents a pioneering technique for the detection of Parkinson’s disease (PD) using voice and speech data.
(4) Only a limited number of studies have explored the evaluation of feature selection at the input level of Deep Neural Networks (DNN)32. Notably, Taherkhani et al.32 recently discovered that deep learning models exhibit improved performance when the feature selection and feature extraction capabilities of a DNN are integrated. In this paper, we reinforce this finding by incorporating feature selection at the input level of the DNN.
(5) The proposed cascaded diagnostic system surpasses the performance of state-of-the-art methods as reported in the two benchmark voice recording datasets.
The remainder of the paper is structured as follows:
In Section “Materials and methods”, we provide a detailed explanation of the datasets and discuss the deep learning-based predictive classification model. In Section “Results and discussion”, we present the experimental results and discuss these findings. Section “Comparative study” is dedicated to a comparative study. Section “Limitations of the study” briefly discusses some limitations of the study. Lastly, Section “Conclusion” concludes the study.
Materials and methods
Datasets description
Two datasets are used in this work. The first dataset was collected by Max Little, reported in10, and is available at33. The second dataset was collected by Sarkar et al., reported in6, and can be obtained online from34. The first dataset contains voice samples of 31 people (23 PD and eight healthy). The age range of the subjects is from 46 to 85 years (mean \(\mu =\) 65.8, standard deviation \(\sigma =\) 9.8). The duration of the disease for PD patients in the first dataset ranges from 1 to 28 years. The dataset contains 195 replicated sustained vowel “a” phonations. The data form a matrix with 195 rows and 23 columns, where the columns denote features, except the last column, which holds the label. The label can have a value of 0 or 1. A detailed description of the 22 biomedical voice features extracted from each sample is given in Table 1.
The second dataset contains 20 healthy persons and 20 PD patients. Twenty-six voice samples, including words, numbers, sustained vowels, and short phrases, were recorded for every individual. Praat acoustic analysis software was used to extract 26 features from every single voice sample14. A detailed description of these 26 features extracted from each sample is given in Table 1. Thus, a total of 1040 samples are obtained. This dataset is known as the training dataset. Another independent dataset was collected from 28 PD patients under the same conditions. It is referred to as the test dataset and includes 168 samples. These samples are recordings of the 28 PD subjects saying the vowels “a” and “o” three times each. In the test data, voice samples 1 to 3 correspond to the vowel “a”, and voice samples 4 to 6 correspond to the vowel “o”. The duration of the disease for PD patients in the training dataset ranges from 0 to 6 years. The age range of the patients in the training dataset is from 43 to 77 (\(\mu =\) 64.86, \(\sigma =\) 8.97). The age range of the healthy subjects in the training dataset is from 45 to 83 (\(\mu =\) 62.55, \(\sigma =\) 10.79). The duration of the disease for PD patients in the testing dataset ranges from 0 to 13 years. The age range of the patients in the testing dataset is from 39 to 79 (\(\mu =\) 62.67, \(\sigma =\) 10.96). Moreover, the authors of the dataset provided Hoehn and Yahr (H&Y) scores for PD patients. The H&Y score provides information about the stage of the disease, and its value ranges between 1 and 5 (ref. 10). The authors of the second dataset provided the Unified Parkinson’s Disease Rating Scale Part III (UPDRS-III) score for the PD patients in the training dataset only. UPDRS-III, i.e., motor UPDRS, ranges from 0 to 108, where 0 represents symptom free and 108 represents severe motor impairment35,36. The scores for PD patients are reported in Table 2.
For the healthy subjects, UPDRS-III and H&Y values are denoted by n/a. Samuel et al.37 suggested that, to test the effectiveness of a newly developed machine learning method, it is good practice to choose dataset(s) that have been extensively tested. Our choice of datasets in this paper follows this recommendation37.
The proposed cascaded system based on \(L_{1}\) SVM and DNN
We propose a two-stage feature selection and classification method to detect PD using replicated voice data and various voice records. With the proposed two-stage approach, the time complexity of the predictive model can be reduced. The accuracy can also be improved by eliminating irrelevant features from the feature space. The model that we used for feature refinement is the \(L_{1}\)-regularized linear SVM, while for classification DNN with optimized hyper-parameters has been used. The models’ formulations, potentially associated problems, and proposed solutions are stated as follows.
For a given dataset D with q instances: \(D = \{(x_{i}, y_{i})|x_{i}\in R^{p}, y_{i}\in \{-1,1\} \}_{i=1}^{q}\), where \(x_{i}\) is the i-th instance and each instance has p dimensions or features, and \(y_{i}\) denotes the class label, which may be \(-1\) or 1 for binary classification. For the classification problem, SVM learns the hyper-plane given by \(wx=b\), where b is the bias and w is the weight vector. The hyper-plane maximizes the margin distance \({2}/{\parallel {w}\parallel _{2}}\).
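The primal form of the SVM can be formulated as follows:
\[\min_{w,b}\ \frac{1}{2}\parallel {w}\parallel _{2}^{2}\quad \text{s.t.}\quad y_{i}(wx_{i}-b)\ge 1,\quad i=1,\ldots ,q \tag{1}\]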
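In 1995, Cortes and Vapnik proposed a modified version of SVM called Soft Margin SVM, which allows for mislabeled instances38, and it has the following form:
\[\min_{w,b,\xi }\ \frac{1}{2}\parallel {w}\parallel _{2}^{2}+C\sum_{i=1}^{q}\xi _{i}\quad \text{s.t.}\quad y_{i}(wx_{i}-b)\ge 1-\xi _{i},\quad \xi _{i}\ge 0,\quad i=1,\ldots ,q \tag{2}\]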
where the regularizer or penalty function is \(L_{2}\)-norm, \(C>0\) is the error penalty parameter and \(\xi\) is slack variable used for misclassification measurement.
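In 1998, Bradley and Mangasarian proposed to use the \(L_{1}\)-norm as the regularizer39; because the \(L_{1}\)-norm SVM produces sparse solutions, it can be used for feature selection. It is formulated as:
\[\min_{w,b,\xi }\ \parallel {w}\parallel _{1}+C\sum_{i=1}^{q}\xi _{i}\quad \text{s.t.}\quad y_{i}(wx_{i}-b)\ge 1-\xi _{i},\quad \xi _{i}\ge 0,\quad i=1,\ldots ,q \tag{3}\]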
where the regularizer or penalty function is \(L_{1}\)-norm, \(C>0\) is the error penalty parameter and \(\xi\) is slack variable used for misclassification measurement. As discussed above, w in (3) is the weight vector. As the value of the hyper-parameter C changes, different coefficients of w shrink towards zero. In fact, with sufficiently small C, several fitted coefficients are exactly zero, i.e., the solution is sparse. Therefore, \(L_{1}\)-norm regularization has an inherent feature selection property, i.e., those features whose corresponding coefficients are fitted to zero can be eliminated. Furthermore, different values of C drive different coefficients to zero, which results in different feature subsets40. Thus, the optimal subset of features can be obtained by tuning the hyper-parameter C. For this purpose, we use HGSA, which automatically tunes the hyper-parameter C of the linear SVM model and searches for the optimal subset of features.
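The sparsity property described above can be illustrated with a small, self-contained sketch. It minimizes \(\parallel w\parallel _{1}+C\sum \xi _{i}\) for the hyper-plane \(wx=b\) with a plain proximal subgradient loop; this solver, the fixed step size, the function names and the toy data are all assumptions made for illustration and are not the solver used in the study.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink a coefficient towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_svm(X, y, C=1.0, step=0.1, epochs=50):
    """Approximately minimise ||w||_1 + C * sum(hinge losses) for w.x = b."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x_i, y_i in zip(X, y):
            margin = y_i * (sum(wj * xj for wj, xj in zip(w, x_i)) - b)
            if margin < 1.0:  # sample violates the margin (its slack is positive)
                for j in range(n_features):
                    grad_w[j] -= C * y_i * x_i[j]
                grad_b += C * y_i
        # subgradient step on the hinge term, then L1 shrinkage on the weights
        w = [soft_threshold(wj - step * gj, step) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


# Toy data: feature 0 separates the classes, feature 1 is nearly uninformative.
X = [[2.0, 0.3], [2.0, -0.1], [-2.0, 0.2], [-2.0, -0.2]]
y = [1, 1, -1, -1]
w, b = fit_l1_svm(X, y)
# Features whose coefficient is exactly zero are eliminated.
selected = [j for j, wj in enumerate(w) if wj != 0.0]
```

In practice a library solver would normally be preferred, for example scikit-learn's `LinearSVC` with `penalty="l1"` and `dual=False`; the loop above is only meant to show how the shrinkage step produces exact zeros.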
It is worth noting that a DNN can extract features by itself. DNNs, including the one used in this paper, use feature extraction rather than feature selection to extract underlying features or rules from the data32. In feature selection, only the most important features are retained and irrelevant features are eliminated from the feature space, whereas in feature extraction, all the features are considered and new ones are derived from them. DNNs use a large number of non-linear elements, i.e., neurons, to learn relationships or functions of high complexity. Consequently, irrelevant features present in the feature space are likely to be modeled as well, and modeling irrelevant features introduces noise32. Learning this noise negatively affects the knowledge the network acquires about the overall distribution of the data32. Overfitting of the network to the training data is another problem that arises when the feature space contains irrelevant features32,41: the network learns irrelevant details from the training data and shows good performance on the training data, as it becomes biased towards previously seen data42, but it fails to generalize to unseen validation or testing data.
To solve these problems posed by irrelevant features in the feature space, we use an \(L_{1}\) regularized SVM to remove irrelevant features from the feature space before applying the feature vector to the DNN. To validate that feature selection coupled with the feature extraction capability of a DNN improves its performance, in Section “Comparative study” we performed experiments in which all the features were applied to the DNN, i.e., the feature selection SVM model was removed, and compared the results with the proposed \(L_{1}\)SVM-DNN. When all features were applied to the DNN, accuracies of 96.87% and 62.5% were obtained for datasets 1 and 2, respectively, whereas the \(L_{1}\)SVM-DNN model achieved accuracies of 100% and 97.5%, respectively. Hence, the simulation results show that the feature selection capability of the SVM model, when combined with the feature extraction capability of the DNN model, improves the performance of the DNN for PD detection problems. HGSA is used to search for the optimal subset of features, which is then given to a DNN model for classification.
For the given m training samples, a DNN models a hypothesis function \(h_{\theta }({\textbf{x}})\) parameterized by DNN parameters \(\theta \in {\mathbb {R}}^{d}\) where d denotes the dimension of \(\theta\) and the input feature vector is represented by \({\textbf{x}}\). The \(h_{\theta }({\textbf{x}})\) tries to anticipate label \(\hat{{\textbf{y}}}\) for input feature vector \({\textbf{x}}\). The aim is to locate those optimum values of \(\theta\) for which objective function is minimized as:
We used the ADAM learning algorithm to minimize (4). In this paper, we used the default values for the hyper-parameters of the ADAM algorithm, i.e., 0.9 for \(\beta _{1}\), 0.999 for \(\beta _{2}\) and \(10^{-8}\) for \(\varepsilon\). After the parameters or weights of the DNN model are optimized by ADAM on the training data samples, the model performance is evaluated on the testing data samples. The generalization performance (in terms of % of falsely predicted testing samples) is represented by the generalization error \(\eta\) or validation loss \({\mathcal {L}}(A_{\lambda }, D_\text {train}, D_\text {valid})\). In this expression, \(A_\lambda\) denotes the model, \(D_\text {valid}\) denotes the data on which the loss is evaluated, and \(D_\text {train}\) denotes the data on which the model is trained. Our objective is to find \(A_\lambda\) that minimizes the validation loss. The hyper-parameter optimization problem under k-fold CV is then to minimize the black box function given as follows:
where \(\lambda\) denotes the hyper-parameters of DNN and \(A_{\lambda }\) represents DNN configuration under \(\lambda\) hyper-parameters choice or setting. In order to obtain good performance, optimal hyper-parameters of DNN need to be searched that can lessen the validation loss. Hence, two optimization problems are dealt with here, i.e., searching the optimal value of the hyper-parameter of the SVM model that will yield the optimal subset of features and searching optimal hyper-parameters of the DNN model. In this paper, two optimization problems are merged into one by merging the hyperparameters of the two models. Thus, after merging the two optimization problems into one, (5) can be formulated as:
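In the notation defined above, the objectives referred to as (4), (5) and (6) can be sketched as follows. This is a reconstruction under assumptions rather than the exact published form: the per-sample loss \(\ell\), the fold superscript \((i)\) and the joint configuration symbol \(A_{C,\lambda }\) are introduced here only for illustration.

```latex
\begin{align}
\theta^{*} &= \arg\min_{\theta}\ \frac{1}{m}\sum_{j=1}^{m}
  \ell\big(h_{\theta}(\mathbf{x}_{j}),\, y_{j}\big) \tag{4}\\
\lambda^{*} &= \arg\min_{\lambda}\ \frac{1}{k}\sum_{i=1}^{k}
  \mathcal{L}\big(A_{\lambda},\, D_\text{train}^{(i)},\, D_\text{valid}^{(i)}\big) \tag{5}\\
(C^{*},\lambda^{*}) &= \arg\min_{C,\,\lambda}\ \frac{1}{k}\sum_{i=1}^{k}
  \mathcal{L}\big(A_{C,\lambda},\, D_\text{train}^{(i)},\, D_\text{valid}^{(i)}\big) \tag{6}
\end{align}
```

Here \(A_{C,\lambda }\) stands for the cascade in which the \(L_{1}\) regularized SVM with penalty C selects the features and the DNN configured by \(\lambda\) classifies them, so that a single minimization covers both models.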
The minimization of (6) yields optimized forms of the two models. The merging of hyper-parameters of the two models yields a hybrid grid. Each point on the grid has several coordinates. The first coordinate of each point on the hybrid grid is C, i.e., the hyper-parameter of the SVM model, while the other coordinates are the hyper-parameters of the DNN model. The hyper-parameters of the DNN model comprise the number of layers, denoted by L, the number of neurons in each hidden layer, denoted by \(N_{h}\), where h indicates the hidden layer number, and dropout regularization. Dropout regularization is considered only in those cases when the model is overfitting. To solve the minimization of (6), we use HGSA. Algorithm 1 gives the detailed procedure of the HGSA algorithm.
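The idea of searching a merged grid can be sketched as a plain exhaustive search over the Cartesian product of the SVM and DNN hyper-parameters. This is an illustrative simplification and not a reproduction of Algorithm 1: the function `hybrid_grid_search`, the callback `evaluate` and the toy scoring function are hypothetical stand-ins for the real select-train-validate routine.

```python
from itertools import product


def hybrid_grid_search(c_values, layer_counts, neuron_counts, evaluate):
    """Exhaustively search the merged (C, L, N_h) grid.

    `evaluate(C, L, N_h)` is a caller-supplied function that is expected to
    select features with the L1 SVM at penalty C, train a DNN with L layers
    and N_h neurons per hidden layer, and return the validation accuracy.
    """
    best_point, best_score = None, float("-inf")
    for point in product(c_values, layer_counts, neuron_counts):
        score = evaluate(*point)
        if score > best_score:
            best_point, best_score = point, score
    return best_point, best_score


# Toy stand-in for the real train-and-validate routine (purely illustrative).
def toy_evaluate(C, L, N_h):
    return 1.0 - abs(C - 0.5) - 0.01 * abs(L - 5) - 0.001 * abs(N_h - 30)


best_point, best_score = hybrid_grid_search(
    c_values=[0.001, 0.5, 10.0],
    layer_counts=[4, 5, 6],
    neuron_counts=[10, 30, 50],
    evaluate=toy_evaluate,
)
```

Because each grid point fixes C together with the DNN settings, the feature subset and the network that consumes it are always evaluated as a pair, which is the point of merging the two optimization problems.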
Ethical approval
This article does not contain any studies with human participants or animals performed by any authors.
Informed consent
Informed consent is not applicable. The study used two publicly available datasets33,34.
Results and discussion
For evaluation purposes, two cross-validation schemes are utilized, i.e., LOSO CV and k-fold CV with data translation. Both are widely adopted validation approaches in data analysis. In LOSO CV, the dataset is partitioned into \(S_{n}\) parts, where \(S_{n}\) represents the total number of subjects or individuals. In each iteration of LOSO CV, the data corresponding to one subject, starting with \(S_{1}\), is reserved for testing, while the data from the remaining subjects are utilized for training the model. Similarly, in k-fold CV, the dataset is divided into k subsets or folds. During the first iteration of k-fold CV, the data in the first fold (\(k=1\)) is set aside for testing, while the data from the other folds are employed for model training. In subsequent iterations, the testing fold shifts to the next one (\(k=2\)), and the remaining data continue to serve as the training set. This cycle repeats until all the folds have been used for testing.
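The LOSO procedure described above can be sketched in a few lines. The generator below is a minimal illustration; the name `loso_splits` and the toy subject identifiers are hypothetical.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    # dict.fromkeys removes duplicates while keeping first-seen order
    for held_out in dict.fromkeys(subject_ids):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx


# One entry per recording; subjects "a", "b" and "c" have 2, 3 and 1 recordings.
subject_ids = ["a", "a", "b", "b", "b", "c"]
splits = list(loso_splits(subject_ids))
```

Every recording of the held-out subject lands in the test set, so no subject contributes to both training and testing within an iteration.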
For more practical validation, we carried out model development in phase 1 and model testing in phase 2, as can be seen in Fig. 1. The experiments were implemented in Python. In all the experiments, \(N_{1}\) and \(N_{2}\) represent the number of neurons in hidden layer 1 and hidden layer 2 of the network, respectively, L denotes the total number of layers in the neural network, and \(N_{h}\) represents the number of neurons in each hidden layer when an equal number of neurons is used in all hidden layers. The learning algorithm used is ADAM. Furthermore, C represents the hyper-parameter of the linear SVM model and n denotes the number of features produced by the SVM model. The initial range for the hyperparameters \(N_{1}\), \(N_{2}\) and \(N_{h}\) is set between 5 and 100. Likewise, the initial range of the hyperparameter L is set between 4 and 10, while the hyperparameter C takes an initial range spanning from 0.00001 to 1000.
Simulation results of dataset 1
LOSO cross-validation
In this experiment, LOSO CV is performed on the first dataset. Although LOSO CV is the most practical validation scheme for replicated voice data and multiple types of voice data, it was ignored in previous studies on this dataset, with the exception of43. The best accuracy of 100% was obtained for C = 0.5, which yields a subset of only eight features. Moreover, the best result was obtained for optimally configured DNN with five layers i.e. \(L=5\), and 20 and 30 neurons in each hidden layer. The same results are also obtained for \(L=4\) and \(N_{h}=30\). That is, the proposed approach can classify subjects as PD and healthy with an accuracy of 100%. The results of the experiment are reported in Table 3. In the table, the optimal subset of features for n = 8 contains \(F_{1}, F_{2}, F_{3}, F_{10}, F_{16}, F_{18}, F_{19}\) and \(F_{21}\). It is evident from the table that if optimal hyper-parameters of the DNN model are not utilized, we may obtain poor performance even with an optimal subset of features. Thus, better performance can be achieved if the extracted features are refined and an optimally configured DNN is utilized.
As discussed earlier, the first dataset has the problem of imbalanced classes. Imbalanced classes affect the performance of predictive models because models trained on imbalanced data are more sensitive to the majority class and less sensitive to the minority class44. Thus, there is a need to balance the training process of the predictive model. There are two ways to do so: under-sampling the majority class and over-sampling the minority class. Over-sampling is very easy for image datasets because, with simple operations like rotation and translation, we can easily over-sample the minority class. For voice data, we have used the under-sampling method. In the literature, more advanced under-sampling techniques did not significantly improve on simply selecting random samples. Hence, in this paper, we performed random under-sampling during the training process.
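Random under-sampling as described above can be sketched as follows. The text does not specify whether sampling is done per recording or per subject, so this example simply balances a list of labeled items; the function name `random_undersample`, the fixed seed and the toy data are assumptions.

```python
import random


def random_undersample(samples, labels, seed=0):
    """Randomly drop majority-class items until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    # every class is reduced to the size of the smallest class
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        kept = rng.sample(group, target)  # sampling without replacement
        balanced.extend((sample, label) for sample in kept)
    rng.shuffle(balanced)
    return balanced


# Class sizes mirror the subject counts of dataset 1: 23 PD (1) and 8 healthy (0).
samples = list(range(31))
labels = [1] * 23 + [0] * 8
balanced = random_undersample(samples, labels)
```

In a cross-validation setting the under-sampling would be applied to the training portion only, so that the held-out data keeps its original class distribution.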
A practical demonstration of the problems posed by imbalanced classes is given in Table 3. The last three rows of the table, separated by a horizontal line, are the results obtained when no measure is taken to balance the training process. The simulation results show that the model performs poorly even with an optimally configured DNN and the optimal subset of features. The reason is that machine learning models trained on imbalanced classes are more sensitive to the majority class and less sensitive to the minority class. That is why, in the last three rows, the model yields poor specificity. Thus, it is of paramount importance to balance the classes during the training process.
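The effect of class imbalance on specificity can be made concrete with a short sketch. It computes accuracy, sensitivity and specificity for a hypothetical classifier that always predicts the majority class; the class sizes mirror the subject counts of dataset 1, but the predictions are invented for illustration and are not results from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for labels 1 (PD) and 0 (healthy).

    Assumes both classes are present in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical degenerate classifier: every subject is predicted as PD.
y_true = [1] * 23 + [0] * 8
y_pred = [1] * 31
metrics = binary_metrics(y_true, y_pred)
```

Such a classifier reaches an accuracy of 23/31 with perfect sensitivity but zero specificity, which is why accuracy alone is not a sufficient indicator on imbalanced data.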
k-fold cross-validation with k=10
The second experiment performed on the first dataset is k-fold CV with k = 10. The results for different hyper-parameter configurations are given in Table 4. HGSA finds configurations that achieve the best accuracy of 100% under 10-fold CV. This accuracy is the same as the accuracy achieved in45. In45, the 10-fold experiment was also conducted on the second dataset and achieved 90% accuracy. Our proposed model achieved 97.5% for 10-fold CV on the second dataset, which supports the effectiveness of the proposed diagnostic system. The optimal subset of features with \(n=1\) contains \(F_{2}\) and with n = 7 contains \(F_{1}, F_{2}, F_{3}, F_{10}, F_{16}, F_{19}\) and \(F_{21}\).
Simulation results of dataset 2
LOSO cross validation on training database
In this experiment, LOSO CV is performed on the training database of the second dataset. We achieved an accuracy of 100%, which is the highest classification accuracy reported so far for LOSO CV on the training database. The results of the experiment are given in Table 5. The proposed approach can classify subjects as PD and healthy with an accuracy of 100%. The best results are obtained for the hyper-parameter C equal to 0.0015 for this dataset, resulting in a feature subset consisting of only seven features. It is important to note that a 100% result for LOSO CV does not mean that the proposed system correctly classifies every sample of the dataset, because a subject is classified as PD if more than half of its samples are predicted as 1; otherwise, the subject is classified as healthy. Thus, it is expected that for any disease having more than one sample per patient, the proposed system could be a suitable candidate for diagnosis. Moreover, the optimal subset of features for C = 0.0015 and n = 7 contains \(F_{5}, F_{10}, F_{15}, F_{19}, F_{21}, F_{24}\) and \(F_{26}\). Additionally, the best result of 100% was obtained for an optimally configured DNN with five layers, i.e., L = 5, and 30 neurons in each hidden layer. It is evident from Table 5 that if optimal hyperparameters of the DNN model are not utilized, we may obtain poor performance even with an optimal subset of features. Thus, better performance can be achieved if the extracted features are refined and an optimally configured DNN is utilized.
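The subject-level decision rule stated above can be written as a short function. The name `classify_subject` and the example prediction vectors are hypothetical; the rule itself (more than half of the samples predicted as 1) is taken from the text.

```python
def classify_subject(sample_predictions):
    """Return 1 (PD) if more than half of the sample predictions are 1, else 0."""
    positives = sum(sample_predictions)
    return 1 if positives > len(sample_predictions) / 2 else 0


# A subject with 26 recordings, as in the training database of dataset 2.
mostly_pd = [1] * 14 + [0] * 12      # 14 of 26 predicted as PD
exact_tie = [1] * 13 + [0] * 13      # exactly half predicted as PD
subject_a = classify_subject(mostly_pd)
subject_b = classify_subject(exact_tie)
```

Under this rule a subject can be classified correctly even when up to 12 of its 26 recordings are misclassified, and an exact tie is resolved as healthy because the condition requires strictly more than half.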
LOSO cross-validation on testing database
In this experiment, LOSO CV is performed on the testing database of the second dataset. This is an independent dataset collected from 28 new patients under the same conditions in which the training dataset was collected. It is used to validate the performance that the proposed system achieved on the training dataset. Since this data contains only patient subjects and no healthy subjects, specificity cannot be reported. The DNN model is trained on the training data, which is first reduced to a new dataset containing only the samples that correspond to vowel phonations. The main reason for creating this modified training data is that the test data, in this case, contains only vowel phonations. The simulation results for this experiment are given in Table 6. Without regularization, a maximum accuracy of 78.57% is obtained, which is due to the overfitting of the model to the training data. Thus, to prevent the model from overfitting, we apply dropout regularization. With a dropout rate of 0.3, the proposed method achieved an accuracy of 100%. The dropout regularization is applied to the hidden layers of the DNN model. Dropout is a hyperparameter that is used when the DNN is facing the problem of overfitting. It is important to note that, according to the proper unbiased validation approach depicted in Fig. 1, the accuracy on the testing dataset should be reported as 96.42%, not 100%, because during the model development phase (results given in Table 5) the optimal model is produced under the hyperparameter configuration of \(n=7\), \(L=5\) and \(N_{h}=30\).
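The reported percentages can be related to subject counts with a small arithmetic check. This assumes that accuracy on the testing database is computed per subject over the 28 patients; the counts 22 and 27 are inferred from the percentages under that assumption and are not stated in the text.

```python
import math


def accuracy_percent(correct, total):
    """Accuracy as a percentage of correctly classified subjects."""
    return 100.0 * correct / total


n_subjects = 28
without_dropout = accuracy_percent(22, n_subjects)   # 22 of 28 subjects correct
phase1_model = accuracy_percent(27, n_subjects)      # 27 of 28 subjects correct
# Truncating (rather than rounding) to two decimals matches the reported figure.
phase1_truncated = math.floor(phase1_model * 100) / 100
```

Under this reading, 78.57% corresponds to 22 of 28 subjects and 96.42% to 27 of 28 subjects with the value truncated to two decimals, i.e., the model selected in phase 1 misclassifies a single test subject.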
k-fold cross validation with k = 10 on training data of dataset 2
The results of the 10-fold CV experiment for dataset 2 are given in Table 7. It is important to note that the highest accuracy previously achieved for 10-fold CV is 90% (see Table 11). The proposed diagnostic system achieved the best PD detection accuracy of 97.5%, which is the highest accuracy for k-fold cross-validation on this dataset. Moreover, the optimal subset of features obtained at \(C=0.001\) with \(n=1\) contains \(F_{19}\), while the optimal subset of features obtained at \(C=0.01\) with \(n=4\) contains \(F_{10}, F_{18}, F_{19}\) and \(F_{21}\).
Comparative study
In this section, the performance of the proposed method is compared with other well-known machine learning models and with previously published work that used the two benchmark voice datasets.
Comparison of the proposed method with other models for dataset 1
For validation purposes, we also carried out experiments by cascading the feature refinement model, i.e., \(L_{1}\) SVM, with other renowned classifiers, namely SVM and artificial neural network (ANN), owing to their remarkable performance on many other biomedical problems. Furthermore, we also checked the performance of the conventional DNN model without any feature refinement module. We developed three similar hybrid systems, i.e., SVM-SVM(Lin), SVM-SVM(RBF), and SVM-ANN, where the first SVM model is the \(L_{1}\) regularized linear SVM model that is used for feature refinement, while the second model is used as a predictive model. In the case of the SVM-SVM hybrid model, we denote the hyper-parameter of the feature selection model by \(C_{1}\) and the hyper-parameter of the predictive SVM model by \(C_{2}\). In addition, g denotes the gamma hyperparameter of the SVM predictive model when it uses the RBF kernel. All these experiments were performed using 10-fold CV. The goal is to evaluate the feature refinement capabilities of the \(L_{1}\) SVM when it is cascaded with state-of-the-art classifiers. Furthermore, all the cascaded models were optimized by using the HGSA approach. The results are tabulated in Table 8.
Comparison of the proposed method with other models for dataset 2
The same types of cascaded models were also developed for the second dataset. The results are reported in Table 9. Tables 8 and 9 show that the proposed method achieves better performance than the other cascaded models. Additionally, in each case, the \(L_{1}\) SVM produces features of better quality, and hence the performance of the predictive model improves whether it is SVM, ANN, or DNN. Thus, these results validate the feature refinement capabilities of the developed cascaded systems.
Comparison with previously reported methods
For comparison purposes, Tables 10 and 11 list accuracies obtained in previous studies by different methods applied to the two voice recording-based PD datasets. As shown in these tables, our developed model can yield better classification accuracy than previously proposed methods in the literature.
Based on the data in Tables 10 and 11, we conclude that the developed diagnostic system achieves state-of-the-art PD detection accuracy on these two datasets.
Limitations of the study
Although this study showed good performance in terms of differentiating PD patients from healthy subjects, there are some limitations. One limitation pertains to the data used in the study. Information such as the severity of the disease in PD patients in the testing set of the second dataset, and whether the data collection was carried out in the ON or OFF state of the disease, is missing. The study did not investigate whether accuracy varies depending on disease duration and severity. Another diagnostic challenge in Parkinsonism is differentiating between idiopathic PD and atypical PD (e.g., progressive supranuclear palsy (PSP), multiple system atrophy (MSA), corticobasal syndrome (CBS), dementia with Lewy bodies (DLB)), where vocal dysfunction is also manifested67. The study did not investigate this kind of differential diagnosis.
Conclusion
This paper has addressed two primary issues concerning the automated detection of PD. Firstly, it has highlighted the inadequacies of validation methodologies employed in previous studies, which led to the creation of biased predictive models. Secondly, it has recognized the persistent challenge of achieving high PD detection rates when unbiased models are employed. To mitigate bias, this study has adopted appropriate validation approaches. In addition, to enhance the accuracy of PD detection, a two-stage diagnostic system, referred to as \(L_{1}\)SVM-DNN, has been proposed. Notably, unlike previous methods, this research has emphasized the independence of the model development and testing phases. Two benchmark datasets were employed for validation purposes. The experimental results have demonstrated that the proposed method attains a classification accuracy of 97.5% with 10-fold CV and 100% accuracy with LOSO CV. To assess generalization, we also evaluated the optimally developed model on the testing dataset and obtained 96.42% accuracy. Based on these outcomes, the developed cascaded system holds significant promise for automated differentiation of PD patients from healthy subjects.
Although the \(L_{1}\)SVM-DNN approach showed outstanding performance in terms of differentiating PD patients from healthy subjects, from a clinical diagnostic perspective, this kind of automated differentiation has limited significance. This is because, in real-time applications, differentiating between idiopathic PD and atypical PD (e.g., PSP, MSA, CBS, DLB), where vocal dysfunction is also manifested, is a more challenging task. Therefore, future efforts should focus on the collection of a multi-class dataset, including data from healthy subjects, idiopathic PD, and atypical PD and its subtypes. Unbiased machine learning models, like \(L_{1}\)SVM-DNN, should be trained and tested on such multi-class problems. These models would have more significance and could be deployed in hospitals and clinics for real-time diagnostic applications.
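The subject-wise validation emphasized in this paper can be made concrete with a short sketch of how LOSO splits are formed. This is an illustration only, not the authors' code: the function name `loso_splits` and the toy subject identifiers are hypothetical. The point is that all recordings of one subject are held out together, so no subject contributes samples to both the training and the test side of a split.

```python
from collections import defaultdict


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one split per subject."""
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for sid, test_idx in by_subject.items():
        train_idx = [i for i, s in enumerate(subject_ids) if s != sid]
        yield sid, train_idx, test_idx


# Hypothetical layout: 3 subjects with 2 voice recordings each
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
splits = list(loso_splits(subject_ids))

for sid, train_idx, test_idx in splits:
    train_subjects = {subject_ids[i] for i in train_idx}
    test_subjects = {subject_ids[i] for i in test_idx}
    # No subject appears on both sides of the split
    assert not (train_subjects & test_subjects)
    print(sid, train_idx, test_idx)
```

Any data-dependent step, such as feature selection or hyper-parameter tuning, would be fitted on `train_idx` only within each split so that the held-out subject stays unseen.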
Data availability
The datasets analyzed during the current study are available in the UCI Machine Learning Repository, https://doi.org/10.24432/C5NC8M, and https://doi.org/10.24432/C59C74.
References
Ali, L., Zhu, C., Zhou, M. & Liu, Y. Early diagnosis of Parkinson’s disease from multiple voice recordings by simultaneous sample and feature selection. Expert Syst. Appl. 137, 22–28. https://doi.org/10.1016/j.eswa.2019.06.052 (2019).
Jankovic, J. Parkinson’s disease: Clinical features and diagnosis. J. Neurol. Neurosurg. Psychiatry 79(4), 368–376 (2008).
Khorasani, A. & Daliri, M. R. HMM for classification of Parkinson’s disease based on the raw gait data. J. Med. Syst. 38(12), 147 (2014).
World Health Organization. A report on Parkinson’s disease. https://www.who.int/news-room/fact-sheets/detail/parkinson-disease
Langston, J. W. Parkinson’s disease: Current and future challenges. Neurotoxicology 23(4–5), 443–450 (2002).
Sakar, B. E. et al. Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE J. Biomed. Health Inform. 17(4), 828–834 (2013).
Singh, N., Pillay, V. & Choonara, Y. E. Advances in the treatment of Parkinson’s disease. Prog. Neurobiol. 81(1), 29–44 (2007).
Ho, A. K., Iansek, R., Marigliani, C., Bradshaw, J. L. & Gates, S. Speech impairment in a large sample of patients with Parkinson’s disease. Behav. Neurol. 11(3), 131–137 (1999).
Ravì, D. et al. Deep learning for health informatics. IEEE J. Biomed. Health Inform. 21(1), 4–21 (2017).
Little, M. A. et al. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Trans. Biomed. Eng. 56(4), 1015–1022 (2009).
Rahman, A. et al. Parkinson’s disease diagnosis in cepstral domain using MFCC and dimensionality reduction with SVM classifier. Mob. Inf. Syst. 2021, 1–10 (2021).
Tsanas, A., Little, M. A., McSharry, P. E., Spielman, J. & Ramig, L. O. Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease. IEEE Trans. Biomed. Eng. 59(5), 1264–1271 (2012).
Gürüler, H. A novel diagnosis system for Parkinson’s disease using complex-valued artificial neural network with k-means clustering feature weighting method. Neural Comput. Appl. 28(7), 1657–1666 (2017).
Boersma, P. & Weenink, D. Praat: Doing phonetics by computer. http://www.fon.hum.uva.nl/praat/ (2010).
Canturk, I. & Karabiber, F. A machine learning system for the diagnosis of Parkinson’s disease from speech signals and its application to multiple speech signal types. Arab. J. Sci. Eng. 41(12), 5049–5059 (2016).
Benba, A., Jilbab, A. & Hammouch, A. Discriminating between patients with Parkinson’s and neurological diseases using cepstral analysis. IEEE Trans. Neural Syst. Rehabil. Eng. 24(10), 1100–1108 (2016).
Naranjo, L., Pérez, C. J., Martín, J. & Campos-Roca, Y. A two-stage variable selection and classification approach for Parkinson’s disease detection by using voice recording replications. Comput. Methods Progr. Biomed. 142, 147–156 (2017).
Naranjo, L., Pérez, C. J. & Martín, J. Addressing voice recording replications for tracking Parkinson’s disease progression. Med. Biol. Eng. Comput. 55(3), 365–373 (2017).
Zhang, Y. Can a smartphone diagnose Parkinson disease? a deep neural network method and telediagnosis system implementation. Parkinson’s Dis. 2017(4), 1–11 (2017).
Frid, A., Kantor, A., Svechin, D. & Manevitz, L. M. Diagnosis of Parkinson’s disease from continuous speech using deep convolutional networks without manual selection of features. In Science of Electrical Engineering (ICSEE), IEEE International Conference on the, IEEE 1–4 (2016).
Caliskan, A., Badem, H., Basturk, A. & Yuksel, M. E. Diagnosis of the Parkinson disease by using deep neural network classifier. Istanb. Univ. J. Electr. Electron. Eng. 17(2), 3311–3319 (2017).
Das, R. A comparison of multiple classification methods for diagnosis of Parkinson disease. Expert Syst. Appl. 37(2), 1568–1572 (2010).
Åström, F. & Koker, R. A parallel neural network approach to prediction of Parkinson’s disease. Expert Syst. Appl. 38(10), 12470–12474 (2011).
Ali, L. & Bukhari, S. An approach based on mutually informed neural networks to optimize the generalization capabilities of decision support systems developed for heart failure prediction. Irbm 42(5), 345–352 (2021).
Heydarpour, F., Abbasi, E., Ebadi, M. & Karbassi, S.-M. Solving an optimal control problem of cancer treatment by artificial neural networks. Int. J. Interact. Multimedia Artif. Intell. 6(4), 18–25 (2020).
Nielsen, M. A. Neural Networks and Deep Learning Vol. 25 (Determination Press, 2015).
Kasihmuddin, M., Mansor, M., Alzaeemi, S. A. & Sathasivam, S. Satisfiability logic analysis via radial basis function neural network with artificial bee colony algorithm. Int. J. Interact. Multimedia Artif. Intell. 6(6), 164–173 (2021).
Hinton, G. et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012).
Imrana, Y., Xiang, Y., Ali, L. & Abdul-Rauf, Z. A bidirectional LSTM deep learning approach for intrusion detection. Expert Syst. Appl. 185, 115524 (2021).
Kingma, D. P., & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Ali, L., Niamat, A., Khan, J. A., Golilarz, N. A. & Xingzhong, X. An expert system based on optimized stacked support vector machines for effective diagnosis of heart disease. IEEE Access https://doi.org/10.1109/ACCESS.2019.2909969 (2019).
Taherkhani, A., Cosma, G. & McGinnity, T. Deep-FS: A feature selection algorithm for deep Boltzmann machines. Neurocomputing 322, 22–37 (2018).
Dheeru, D., & Karra Taniskidou, E. UCI machine learning repository-Parkinsons data set. http://archive.ics.uci.edu/ml (2017).
Dheeru, D., & Karra Taniskidou, E. UCI multiple voice recordings-Parkinsons data set. https://archive.ics.uci.edu/ml/datasets/Parkinson+Speech+Dataset+with++Multiple+Types+of+Sound+Recordings (2017).
Tsanas, A., Little, M. A., McSharry, P. E. & Ramig, L. O. Accurate telemonitoring of Parkinson’s disease progression by noninvasive speech tests. IEEE Trans. Biomed. Eng. 57(4), 884–893 (2009).
Hlavnička, J., Čmejla, R., Klempíř, J., Růžička, E. & Rusz, J. Acoustic tracking of pitch, modal, and subharmonic vibrations of vocal folds in Parkinson’s disease and parkinsonism. IEEE Access 7, 150339–150354 (2019).
Samuel, O. W., Asogbon, G. M., Sangaiah, A. K., Fang, P. & Li, G. An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction. Expert Syst. Appl. 68, 163–172 (2017).
Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20(3), 273–297 (1995).
Bradley, P. S. & Mangasarian, O. L. Feature selection via concave minimization and support vector machines. In ICML, Vol. 98 82–90 (1998).
Zhu, J. & Zou, H. Variable selection for the linear support vector machine. In Trends in Neural Computation (eds Chen, K. & Wang, L.) 35–59 (Springer, 2007).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014).
Javed, K., Babri, H. A. & Saeed, M. Feature selection based on class-dependent densities for high-dimensional binary data. IEEE Trans. Knowl. Data Eng. 24(3), 465–477 (2012).
Sakar, C. O. & Kursun, O. Telediagnosis of Parkinson’s disease using measurements of dysphonia. J. Med. Syst. 34(4), 591–599 (2010).
Japkowicz, N. The class imbalance problem: Significance and strategies. In Proceedings of the International Conference on Artificial Intelligence, Vol. 56 111–117 (2000).
Khan, M. M., Mendes, A. & Chalup, S. K. Evolutionary wavelet neural network ensembles for breast cancer and Parkinson’s disease prediction. PLoS ONE 13(2), e0192192 (2018).
Psorakis, I., Damoulas, T. & Girolami, M. A. Multiclass relevance vector machines: Sparsity and accuracy. IEEE Trans. Neural Netw. 21(10), 1588–1598 (2010).
Ozcift, A. & Gulten, A. Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms. Comput. Methods Progr. Biomed. 104(3), 443–451 (2011).
Li, D.-C., Liu, C.-W. & Hu, S. C. A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets. Artif. Intell. Med. 52(1), 45–52 (2011).
Luukka, P. Feature selection using fuzzy entropy measures with similarity classifier. Expert Syst. Appl. 38(4), 4600–4607 (2011).
Spadoto, A. A., Guido, R. C., Carnevali, F. L., Pagnin, A. F., Falcão, A. X. & Papa, J. P. Improving Parkinson’s disease identification through evolutionary-based feature selection. In IEEE Annual International Conference Engineering in Medicine and Biology Society (EMBC 2011) 7857–7860 (IEEE, 2011).
Chen, H.-L. et al. An efficient diagnosis system for detection of Parkinson’s disease using fuzzy k-nearest neighbor approach. Expert Syst. Appl. 40(1), 263–271 (2013).
Zuo, W.-L., Wang, Z.-Y., Liu, T. & Chen, H.-L. Effective detection of Parkinson’s disease using an adaptive fuzzy k-nearest neighbor approach. Biomed. Signal Process. Control 8(4), 364–373 (2013).
Zhang, H.-H. et al. Classification of Parkinson’s disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples. Biomed. Eng. Online 15(1), 122 (2016).
Chandrayan, S., Agarwal, A., Arif, M. & Sahu, S. S. Selection of dominant voice features for accurate detection of Parkinson’s disease. In The Third International Conference on Biosignals, Images and Instrumentation (ICBSII 2017) 1–4 (IEEE, 2017).
Ozcift, A. SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease. J. Med. Syst. 36(4), 2141–2147 (2012).
Alhussein, M. Monitoring Parkinson’s disease in smart cities. IEEE Access 5, 19835–19841 (2017).
Cai, Z., Gu, J. & Chen, H.-L. A new hybrid intelligent framework for predicting Parkinson’s disease. IEEE Access 5, 17188–17200 (2017).
Eskıdere, Ö., Karatutlu, A. & Ünal, C. Detection of Parkinson’s disease from vocal features using random subspace classifier ensemble. In The 12th International Conference on Electronics Computer and Computation (ICECCO 2015) 1–4 (IEEE, 2015).
Behroozi, M. & Sami, A. A multiple-classifier framework for Parkinson’s disease detection based on various vocal tests. Int. J. Telemed. Appl. 2016, 1–9 (2016).
Benba, A., Jilbab, A. & Hammouch, A. Using human factor cepstral coefficient on multiple types of voice recordings for detecting patients with Parkinson’s disease. IRBM 38(6), 346–351 (2017).
Li, Y., Zhang, C., Jia, Y., Wang, P., Zhang, X. & Xie, T. Simultaneous learning of speech feature and segment for classification of Parkinson disease. In The 19th IEEE International Conference on e-Health Networking, Applications and Services (Healthcom 2017) 1–6 (IEEE, 2017).
Vadovskỳ, M. & Paralič, J. Parkinson’s disease patients classification based on the speech signals. In The 15th IEEE International Symposium on Applied Machine Intelligence and Informatics (SAMI 2017) 000321–000326 (IEEE, 2017).
Benba, A., Jilbab, A. & Hammouch, A. Analysis of multiple types of voice recordings in cepstral domain using MFCC for discriminating between patients with Parkinson’s disease and healthy people. Int. J. Speech Technol. 19(3), 449–456 (2016).
Kraipeerapun, P. & Amornsamankul, S. Using stacked generalization and complementary neural networks to predict Parkinson’s disease. In The 11th International Conference on Natural Computation (ICNC 2015) 1290–1294 (IEEE, 2015).
Cai, Z. et al. An intelligent Parkinson’s disease diagnostic system based on a chaotic bacterial foraging optimization enhanced fuzzy KNN approach. Comput. Math. Methods Med. 2018, 2396952 (2018).
Ali, L., Zhu, C., Zhang, Z. & Liu, Y. Automated detection of Parkinson’s disease based on multiple types of sustained phonations using linear discriminant analysis and genetically optimized neural network. IEEE J. Transl. Eng. Health Med. 7, 1–10 (2019).
Daoudi, K., Das, B., Tykalova, T., Klempir, J. & Rusz, J. Speech acoustic indices for differential diagnosis between Parkinson’s disease, multiple system atrophy and progressive supranuclear palsy. npj Parkinson’s Dis. 8(1), 142 (2022).
Acknowledgements
The authors declare the use of an AI tool for correcting grammar-related mistakes in some paragraphs of the manuscript.
Funding
Open access funding provided by Óbuda University.
Contributions
L.A. and A.J.: Conceptualization; A.N. and H.T.R.: Methodology; S.K.: Software and Validation; A.H.G.: Supervision.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ali, L., Javeed, A., Noor, A. et al. Parkinson’s disease detection based on features refinement through L1 regularized SVM and deep neural network. Sci Rep 14, 1333 (2024). https://doi.org/10.1038/s41598-024-51600-y
DOI: https://doi.org/10.1038/s41598-024-51600-y