Phonemes based detection of parkinson’s disease for telehealth applications

Pah, Nemuel D.; Motin, Mohammod A.; Kumar, Dinesh K.

doi:10.1038/s41598-022-13865-z


Article
Open access
Published: 11 June 2022

Scientific Reports volume 12, Article number: 9687 (2022)

Abstract

Dysarthria is an early symptom of Parkinson’s disease (PD) and has been proposed for detecting and monitoring the disease, with potential for telehealth. However, owing to inherent differences between the voices of different people, computerized analysis has not demonstrated high performance that is consistent across datasets. The aim of this study was to improve the performance in detecting PD voices and to test this with different datasets. This study investigated the effectiveness of three groups of phoneme parameters, i.e. voice intensity variation, perturbation of glottal vibration, and apparent vocal tract length (VTL), for differentiating people with PD from healthy subjects using two public databases. The parameters were extracted from five sustained phonemes, /a/, /e/, /i/, /o/, and /u/, recorded from 50 PD patients and 50 healthy subjects of the PC-GITA dataset. The features were statistically investigated and then classified using a Support Vector Machine (SVM). This was repeated on the Viswanathan dataset, with smartphone-based recordings of /a/, /o/, and /m/ from 24 PD and 22 age-matched healthy people. VTL parameters best differentiated the voices of people with PD from those of healthy subjects: classification accuracy with the five vowels of the PC-GITA dataset was 84.3%, while the accuracy for the other features was between 54% and 69.2%. The accuracy for Viswanathan’s dataset was 96.0%. This study has demonstrated that VTL obtained from phoneme recordings made with a smartphone can accurately identify people with PD. The analysis was fully computerized and automated and has potential for telehealth diagnosis of PD.

Introduction

Parkinson’s disease (PD) is the second most common neurodegenerative disorder¹, and its prevalence is expected to increase as the population ages. It is multisymptomatic, with a number of motor and non-motor impairments^2,3. Diagnosis is based on clinical assessment: the presence of two or more of the motor symptoms tremor, rigidity, bradykinesia, and postural impairment, or of non-motor symptoms such as dysarthria, functional impairment, or cognitive impairment, is indicative of the disease⁴.

One of the early symptoms of PD is speech impairment, termed Parkinsonian hypokinetic dysarthria. Speech symptoms are reported by 90% of people with PD^5,6. Evaluation of Parkinsonian speech reveals a variety of disturbances such as reduced voice intensity, increased nasality, increased acoustic noise, reduced prosody, imprecise articulation, a significantly narrower pitch range, monoloudness, longer pauses, vocal tremor, harsh and breathy voice quality, and disfluency^7,8. Many of these are measured from connected speech, whose assessment is limited by factors such as language skills or poor visual and auditory function. Voice-based assessments, in contrast, have the advantage of being more universal^9,10.

Hypokinetic dysarthria is caused by poor activation and coordination of the speech production muscles^8,11. Stiffness and tremor of the laryngeal muscles harden the vocal cords, which affects their vibration and causes changes to the fundamental frequency, inadequate closed phases, and irregular or asymmetrical vocal fold motion during phonation^8,12. Reduced control of the diaphragm muscles causes unstable phonatory airflow and pneumatic pressure to the larynx^8,13,14. People with PD also have reduced control of other vocal tract muscles, such as those of the tongue and lips.

The standard clinical method for classifying parkinsonian voice is perceptual evaluation, which, however, is subjective¹⁵. Computerized voice analysis has been proposed as a more accurate, objective, and quantifiable alternative that could also enable telehealth and remote monitoring of patients.

Studies of Parkinsonian speech and voice biomarkers can be grouped into four aspects: phonatory, articulatory, prosodic, and linguistic¹⁶. Studies based on the articulatory, prosodic, and linguistic aspects¹⁷ involve broad factors such as the psychology, linguistic ability, and cognitive condition of the patients. By contrast, the phonatory aspects of a sustained phoneme are less influenced by these factors.

Studies have investigated the effectiveness of sustained-phoneme parameters in representing Parkinsonian hypokinetic dysarthria^{16,18,19,20,21}. Most of these studies focused on parameters closely related to impairments in vocal cord vibration. Pitch frequency variation, number of pulses, jitter (perturbation of the glottal vibration period), shimmer (amplitude perturbation of glottal vibration), autocorrelation, and harmonics-to-noise ratio (HNR/NHR) were used in the authors’ previous work²², as well as by Orozco-Arroyave et al.²³, Behroozi et al.²⁴, Tsanas and Little²⁵, Ali et al.²⁶, Sakar et al.¹⁹, and Rusz et al.⁶.

Machine-based analysis can be correlated with perceptual features such as voice quality, loudness, pitch, and resonance. Some of the characteristics that have been assessed and found suitable for Parkinsonian voice are vocal intensity, jitter (frequency variability), shimmer (amplitude variability), harmonics to noise ratio (HNR), fundamental frequency (F₀), and formant frequency profiles^{19,23,25,26,27,28,29}.

Speech production features extracted from the glottal waveform remove the effect of articulation on the acoustic signal. They approximate the volume velocity of the air flowing through the vocal folds and may have an advantage for the analysis of the pathological voice.

Physiologically, these glottic source features are associated with (1) the frequency, amplitude, symmetry, and periodicity of vocal fold vibration; (2) the competency of glottic closure; and (3) the speed of the vibratory cycle and the ratio of its open to closed phases. Breathiness, the hallmark perceptual voice quality of parkinsonian speech, is associated with incomplete closure of the vocal folds, leading to air escape and thus to relatively higher noise in the voice, lower intensity, and a predominance of the open phase of the glottal pulse^8,30. People with PD have higher jitter and lower HNR, associated with aperiodicity of vocal fold vibration and perceived as roughness. Connected speech of people with PD is monotonous, with reduced pitch and loudness variation.

Pérez et al.³¹ combined the above parameters with thirteen Mel-frequency cepstral coefficients (MFCCs) that represent the energy and articulatory positions. Fractal dimension (FD) features, which measure the complexity of the signal, were used by Viswanathan et al.³². More recently, multivariate deep features have been found to be effective³³.

Even though the above studies have demonstrated some significant differences between the voice parameters of controls and people with PD, their implementation in a generalized automatic system is not straightforward³⁴. There is also evidence of inconsistent results between different studies³².

Gillivan-Murphy³⁵ published preliminary findings based on nasolaryngoscopy showing that PD voice tremor is not associated with the vocal folds. PD voice tremor is more likely to be related to oscillatory movement of structures across the vocal tract than of the vocal folds alone. Furthermore, pronouncing a phoneme is a voluntary activity, whereas PD tremor occurs at rest. Voice tremor may therefore appear inconsistently in recordings of sustained, steady phonemes, on which glottal vibration parameters depend.

Parameters other than glottal vibration parameters that may be used to identify PD include those related to phonatory airflow and pneumatic pressure to the larynx, such as voice intensity, and those related to the vocal tract muscles, such as formants and Vocal Tract Length (VTL)^36,37.

This study has investigated and compared the effectiveness of three groups of parameters for differentiating the voice of people with PD from that of age-matched healthy participants. These are related to three domains of speech production control: (i) the stability of lung control, (ii) the periodicity and stability of glottal vibration control, and (iii) the stability of vocal tract control. The standard deviation (SD) and range of phoneme intensity were used to measure lung control stability, while shimmer, jitter, SD of pitch, and harmonics parameters were used for the stability of glottal vibration. Vocal tract stability was represented by the SD of the first four formants and the apparent Vocal Tract Length (VTL).

The parameters were compared using a statistical hypothesis test, followed by classification with a Support Vector Machine (SVM). The parameters were extracted from recordings of the sustained phonemes /a/, /e/, /i/, /o/, and /u/ in the public PC-GITA database. To evaluate the consistency of the method across datasets, SVM classification was also applied to Viswanathan’s dataset³⁸, which contains recordings of /a/, /o/, and /m/.

Methods

Database of recordings

Two databases of recordings were used in this study. The first is the publicly available PC-GITA database, provided by Orozco-Arroyave et al.²³. It contains recordings of 100 native speakers of Colombian Spanish: 50 diagnosed with PD and 50 age- and gender-matched participants with no symptoms of PD or any other neurological disease. Table 1 presents the participants’ demographic and clinical information. The p-values in the table confirm that there was no significant age difference between the groups and that the clinical stage was matched between the male and female PD subgroups. The PD subjects were recorded within 3 h of their morning medication, i.e. in the pharmacological ON state. The procedure complied with the Declaration of Helsinki and was approved by the Ethics Committee of the Clinica Noel, in Medellin, Colombia.

Table 1 Participants’ demographics of PC-GITA database.

The recordings were captured in noise-controlled conditions and sampled at 44.1 kHz with 16-bit resolution, using a dynamic omnidirectional microphone (Shure SM 63L). In this study, we used the recordings of the five vowels /a/, /e/, /i/, /o/, and /u/. The participants produced three repetitions of each sustained vowel, each as long as possible in one breath, at their natural pitch and loudness. Figure 1 illustrates the waveforms of the five vowels recorded from control and PD participants.

The second is Viswanathan’s dataset³², which is publicly available on request. It contains recordings from 24 people with PD and 22 age-matched people with no neurological disease, referred to as Controls. The people with PD were recruited from the Movement Disorders Clinic at Monash Medical Centre, Australia, and all had been diagnosed within the previous ten years. Three sustained phonemes, /a/, /o/, and /m/, were recorded from each participant in a noise-restricted environment using a Samson SE50 microphone. The recordings were stored in single-channel WAV format with a sampling rate of 48 kHz and 16-bit resolution. The database contains recordings of people with PD in both the on- and off-medication states; only the on-state recordings were used in this study. Table 2 provides the demographics of the subjects. Further details can be found in refs.^22,32.

Table 2 Participants’ demographics of Viswanathan’s database.

Parameter extraction

The publicly available speech analysis software Praat³⁹ was used to extract features from the recordings. Before feature extraction, the recordings were trimmed to a uniform duration of 0.5 s, on the assumption that vowels are largely stationary signals. The recordings were then filtered with a 4th-order IIR Butterworth band-pass filter with a passband of 50 Hz to 4 kHz.
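The two preprocessing steps can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the authors' Praat pipeline: the position of the 0.5 s segment (here the centre of the recording), the zero-phase filtering with `sosfiltfilt`, and the helper name `preprocess` are all assumptions. Note that SciPy's `butter(4, ..., btype="bandpass")` produces a band-pass filter with eight poles (order 4 per band edge), which is one common reading of "4th order".

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def preprocess(x, fs, duration=0.5, band=(50.0, 4000.0), order=4):
    """Trim to `duration` seconds and band-pass filter (hypothetical helper).

    Assumptions: the central segment is kept and filtering is zero-phase.
    """
    n = int(round(duration * fs))
    if len(x) < n:
        raise ValueError("recording is shorter than the requested duration")
    start = (len(x) - n) // 2                       # central segment of the phoneme
    segment = np.asarray(x[start:start + n], dtype=float)
    # Butterworth band-pass (50 Hz to 4 kHz) as second-order sections for stability
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, segment)                # forward-backward pass: no phase shift
```

For PC-GITA recordings this would be called as `preprocess(x, 44100)`, and for the Viswanathan recordings with `fs=48000`.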

Voice intensity parameters

Voice intensity is controlled by the subglottal pressure, which in turn depends on the respiratory muscles and the lung volume⁴⁰. It was therefore hypothesized that people with PD would have increased variation and reduced range of voice intensity. The standard deviation and range of intensity reflect fluctuations of lung pressure during the pronunciation of the sustained phoneme and may capture the tremor or rigidity due to Parkinson's disease.

The standard deviation and range of voice intensity were obtained for each recording; these parameters measure the subject's ability to maintain stable air pressure from the lungs. The intensity I (in dB) of an input voice s(t) of duration T was calculated using Praat’s energy-averaging method, as in Eq. (1).

$$I=10\log_{10}\left(\frac{1}{T}\int_{0}^{T}10^{s(t)/10}\,dt\right)$$

(1)
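A sketch of the intensity features follows. It assumes that s(t) in Eq. (1) is read as a short-time intensity contour in dB, which is how Praat applies energy averaging to an Intensity object, and that samples are calibrated in pascals (reference 20 µPa). The Hann-weighted framing (32 ms window, 8 ms hop) and all function names are illustrative assumptions; Praat uses its own analysis window, whose length depends on the pitch floor.

```python
import numpy as np

P_REF = 2e-5  # 20 µPa reference; assumes samples are calibrated in pascals


def intensity_contour_db(x, fs, win_s=0.032, hop_s=0.008):
    """Short-time intensity (dB) from Hann-weighted mean-square frames (assumed framing)."""
    x = np.asarray(x, dtype=float)
    n_win, n_hop = int(round(win_s * fs)), int(round(hop_s * fs))
    w = np.hanning(n_win)
    ms = np.array([np.sum(w * x[i:i + n_win] ** 2) / np.sum(w)
                   for i in range(0, len(x) - n_win + 1, n_hop)])
    return 10.0 * np.log10(np.maximum(ms, 1e-20) / P_REF ** 2)


def energy_mean_db(level_db):
    """Eq. (1) on a dB contour: average in the energy domain, then convert back to dB."""
    level_db = np.asarray(level_db, dtype=float)
    return float(10.0 * np.log10(np.mean(10.0 ** (level_db / 10.0))))


def intensity_features(x, fs):
    """Energy-averaged mean, plus the SD and range used as lung-control features."""
    level = intensity_contour_db(x, fs)
    return {"mean_db": energy_mean_db(level),
            "sd_db": float(np.std(level, ddof=1)),
            "range_db": float(np.max(level) - np.min(level))}
```

Energy averaging weights loud frames more heavily than an arithmetic mean: for a contour of 60 dB and 70 dB, `energy_mean_db` gives about 67.4 dB rather than 65 dB.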

Periodicity and stability of glottal vibration

Parkinsonian dysarthria is commonly attributed to abnormal vibration of the vocal cords, such as inadequate or excessive closure of the vocal cords, irregular or asymmetrical vocal fold motion, and tremor of the laryngeal muscles^8,34,35. A total of seven parameters related to the periodicity and stability of glottal vibration were extracted from each recording: absolute jitter (abs), relative jitter (rel), absolute shimmer (in dB), relative shimmer, the standard deviation of the pitch frequency (f₀), the HNR, and the NHR.

The jitter parameters⁴¹ quantify the perturbation of the glottal pulse periods T_i over N consecutive pulses. The two jitter parameters⁴¹ are calculated with Eqs. (2) and (3):

$$Jitter\left(abs\right)=\frac{1}{N-1}\sum_{i=1}^{N-1}\left|{T}_{i+1}-{T}_{i}\right|$$

(2)

$$Jitter\left(rel\right)=\frac{\frac{1}{N-1}\sum_{i=1}^{N-1}\left|{T}_{i+1}-{T}_{i}\right|}{\frac{1}{N}\sum_{i=1}^{N}{T}_{i}}$$

(3)

The shimmer parameters⁴¹ quantify the amplitude perturbation of consecutive glottal cycles with amplitudes A_i. They are calculated with Eqs. (4) and (5):

$$Shimmer\left(abs,dB\right)=\frac{1}{N-1}\sum_{i=1}^{N-1}\left|20\log_{10}\left(\frac{{A}_{i+1}}{{A}_{i}}\right)\right|$$

(4)

$$Shimmer\left(rel\right)=\frac{\frac{1}{N-1}\sum_{i=1}^{N-1}\left|{A}_{i+1}-{A}_{i}\right|}{\frac{1}{N}\sum_{i=1}^{N}{A}_{i}}$$

(5)
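Eqs. (2) to (5) translate directly into NumPy once the glottal periods T_i and cycle amplitudes A_i are available. In the study these come from Praat; in this sketch they are simply passed in as arrays, and the helper names `jitter` and `shimmer` are hypothetical.

```python
import numpy as np


def jitter(periods):
    """Eqs. (2) and (3): mean absolute difference of consecutive periods T_i."""
    T = np.asarray(periods, dtype=float)
    absolute = np.mean(np.abs(np.diff(T)))      # (1/(N-1)) * sum |T_{i+1} - T_i|
    return absolute, absolute / np.mean(T)      # relative: divided by the mean period


def shimmer(amplitudes):
    """Eqs. (4) and (5): amplitude perturbation of consecutive glottal cycles A_i."""
    A = np.asarray(amplitudes, dtype=float)
    shimmer_db = np.mean(np.abs(20.0 * np.log10(A[1:] / A[:-1])))
    shimmer_rel = np.mean(np.abs(np.diff(A))) / np.mean(A)
    return shimmer_db, shimmer_rel
```

For example, periods alternating between 10 ms and 11 ms give an absolute jitter of 1 ms and a relative jitter of about 9.5%.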

The standard deviation of the pitch was calculated from the instantaneous pitch frequency f_{0,i} = 1/T_i. The HNR and NHR were calculated from the normalized autocorrelation function R_xx of the segment, where R_xx[T₀] is the height of the first peak next to the center of R_xx, at a lag equal to the fundamental period T₀ of the recording. The HNR and NHR were calculated as in Eqs. (6) and (7)^42,43:

$$HNR=10\log_{10}\frac{{R}_{xx}[{T}_{0}]}{1-{R}_{xx}[{T}_{0}]}$$

(6)

$$NHR=1-{R}_{xx}[{T}_{0}]$$

(7)
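A sketch of the SD of f₀ and of Eqs. (6) and (7) follows. It simplifies Praat's autocorrelation method (e.g. "To Harmonicity (ac)"): the normalized autocorrelation is computed directly in the time domain, without Praat's window correction or peak interpolation, and R_xx[T₀] is approximated by the largest value over lags for an assumed pitch range of 60 to 800 Hz, which may select a multiple of T₀ for strongly periodic signals. The function names and the pitch range are assumptions.

```python
import numpy as np


def pitch_sd(periods):
    """SD (Hz) of the instantaneous pitch f0_i = 1 / T_i."""
    return float(np.std(1.0 / np.asarray(periods, dtype=float), ddof=1))


def autocorr_peak(x, fs, f0_min=60.0, f0_max=800.0):
    """Largest normalized autocorrelation over lags in an assumed f0 range."""
    x = np.asarray(x, dtype=float)
    x = x - np.mean(x)
    best = -1.0
    for lag in range(int(fs / f0_max), int(fs / f0_min) + 1):
        a, b = x[:-lag], x[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0:
            best = max(best, float(np.dot(a, b) / denom))
    return best


def hnr_nhr(x, fs, **kwargs):
    """Eqs. (6) and (7) from the autocorrelation peak R_xx[T0]."""
    r = autocorr_peak(x, fs, **kwargs)
    r = min(max(r, 1e-12), 1.0 - 1e-12)   # keep the logarithm finite
    return 10.0 * np.log10(r / (1.0 - r)), 1.0 - r
```

Eqs. (6) and (7) imply HNR = 10 log₁₀((1 − NHR)/NHR), so the two measures carry the same information on different scales: R_xx[T₀] = 0.99 gives NHR = 0.01 and HNR ≈ 20 dB.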

Formants parameters

Limited control of the speech production process in people with PD leads to disturbances that include changes in phonatory and resonant characteristics³⁴. Disturbances in the resonant characteristics are due to inaccurate positioning of the articulators or a lack of control of the vocal tract muscles, and they appear as fluctuations of the formant frequencies. In this study, the stability of vocal tract control was measured with the standard deviation of the first four formants (F₁, F₂, F₃, and F₄) and with the Vocal Tract Length (VTL). The formants of each recording were extracted in Praat using Burg’s method⁴⁴ with a maximum formant frequency of 5.5 kHz, a window length of 25 ms, a time step of 6.25 ms, and pre-emphasis from 50 Hz. The mean and standard deviation of each formant were then calculated for each recording.
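The formant step can be approximated with a textbook Burg recursion and LPC root-finding, as sketched below. This is not Praat's implementation: it assumes the signal has already been resampled to about 11 kHz (twice the 5.5 kHz maximum formant, e.g. with `scipy.signal.resample_poly`), it uses a Hann window where Praat uses a Gaussian window, and it treats the 25 ms window length literally. All function names are hypothetical.

```python
import numpy as np


def burg_lpc(x, order):
    """Burg estimate of the prediction-error filter [1, a_1, ..., a_order]."""
    ef = np.asarray(x, dtype=float)[1:]    # forward errors f(n)
    eb = np.asarray(x, dtype=float)[:-1]   # backward errors b(n-1), aligned with f(n)
    a = np.array([1.0])
    for _ in range(order):
        k = -2.0 * np.dot(ef, eb) / (np.dot(ef, ef) + np.dot(eb, eb))  # reflection coeff.
        a = np.append(a, 0.0)
        a = a + k * a[::-1]                              # Levinson-type polynomial update
        ef, eb = (ef + k * eb)[1:], (eb + k * ef)[:-1]   # next-order errors, re-aligned
    return a


def frame_formants(frame, fs, n_formants=5, preemph_hz=50.0):
    """Formant frequencies (Hz) of one frame from the roots of the Burg polynomial."""
    x = np.asarray(frame, dtype=float)
    alpha = np.exp(-2.0 * np.pi * preemph_hz / fs)
    x = np.append(x[0], x[1:] - alpha * x[:-1])        # pre-emphasis from 50 Hz
    x = x * np.hanning(len(x))                          # Hann window (an assumption)
    roots = np.roots(burg_lpc(x, 2 * n_formants))       # two poles per formant
    roots = roots[np.imag(roots) > 0]                   # one root of each conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    return freqs[(freqs > 50.0) & (freqs < fs / 2.0 - 50.0)]


def formant_stats(x, fs, win_s=0.025, step_s=0.00625, n_keep=4):
    """Mean and SD of F1 to F4 over frames (25 ms window, 6.25 ms step)."""
    n_win, n_step = int(round(win_s * fs)), int(round(step_s * fs))
    tracks = []
    for start in range(0, len(x) - n_win + 1, n_step):
        f = frame_formants(x[start:start + n_win], fs)
        if len(f) >= n_keep:                            # skip frames with too few candidates
            tracks.append(f[:n_keep])
    if len(tracks) < 2:
        raise ValueError("too few frames with four formant candidates")
    tracks = np.array(tracks)
    return tracks.mean(axis=0), tracks.std(axis=0, ddof=1)
```

The SDs returned by `formant_stats` correspond to the std(F_i) features, and the means feed the VTL calculation.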

Vocal tract length

Another parameter that captures the resonant characteristics of the tube model of voice production is the apparent vocal tract length (VTL). VTL is an estimate of a subject's physical vocal tract length while producing a given phoneme, derived from the formant frequencies. VTL has been used in other voice analyses, such as speaker verification⁴⁵ and the estimation of body measures^36,46.

The VTL of each recording was calculated (in cm) from the mean frequency of each of the first four formants, F_i, using the formula of Pisanski et al.³⁶:

$$VTL({F}_{i})=(2i-1)\frac{c}{4 {F}_{i}}$$

(8)

The constant c = 33,500 cm/s is the speed of sound, and Eq. (8) gives the length of a uniform tube, closed at one end, whose i-th resonance equals F_i. Four VTL values were therefore calculated for each recording, one for each formant F_i.
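Eq. (8) reduces to a single vectorized expression. The check below illustrates the tube model behind it: a uniform 16.75 cm tube closed at one end resonates at 500, 1500, 2500, and 3500 Hz, and feeding these formants back into Eq. (8) returns 16.75 cm for every i. The function name is hypothetical.

```python
import numpy as np

C_CM_PER_S = 33_500.0  # speed of sound used in Eq. (8), in cm/s


def vtl_per_formant(mean_formants_hz, c=C_CM_PER_S):
    """Eq. (8): apparent VTL (cm) from each mean formant F_i, with i = 1, 2, ..."""
    F = np.asarray(mean_formants_hz, dtype=float)
    i = np.arange(1, len(F) + 1)
    return (2 * i - 1) * c / (4.0 * F)
```

For a real vowel, the four VTL(F_i) values generally differ from each other, because the vocal tract is not a uniform tube; each VTL(F_i) was used as a separate feature.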

Statistical analysis

The mean and standard deviation of all the parameters were computed for the two groups of the PC-GITA database: PD and control (CO). The normality of the extracted parameters was examined with the Anderson–Darling test⁴⁷. The Mann–Whitney U-test⁴⁸ was used to compare speech parameters between PD and control subjects. A 95% confidence level was used, with p < 0.05 indicating a significant difference between the groups. All statistical analyses were performed using MATLAB R2018b (MathWorks).
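The statistical procedure can be reproduced outside MATLAB with SciPy, as sketched below. The text does not state which effect size was reported, so the rank-biserial correlation computed from U is an assumption, as are the helper names.

```python
import numpy as np
from scipy import stats


def looks_normal(x, level=5.0):
    """Anderson-Darling test: True if normality is not rejected at `level` percent."""
    res = stats.anderson(np.asarray(x, dtype=float), dist="norm")
    idx = int(np.argmin(np.abs(np.asarray(res.significance_level) - level)))
    return bool(res.statistic < res.critical_values[idx])


def compare_groups(pd_values, co_values, alpha=0.05):
    """Two-sided Mann-Whitney U test with a rank-biserial effect size (assumed)."""
    pd_values = np.asarray(pd_values, dtype=float)
    co_values = np.asarray(co_values, dtype=float)
    u, p = stats.mannwhitneyu(pd_values, co_values, alternative="two-sided")
    effect = 2.0 * u / (len(pd_values) * len(co_values)) - 1.0   # in [-1, 1]
    return {"U": float(u), "p": float(p), "effect_size": float(effect),
            "significant": bool(p < alpha)}
```

An effect size of +1 means every PD value exceeds every control value, and 0 means complete overlap of the rank distributions.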

Support vector machine classification

The effectiveness of the parameters for classifying PD and control subjects was investigated with a Support Vector Machine (SVM)⁴⁹ classifier. The SVM used a Gaussian kernel and was validated with leave-one-out cross-validation. The Gaussian kernel was chosen empirically because it gave the best results among the kernels tried. The SVM inputs were each set of voice parameters and, separately, the ten highest-ranked features selected with the Relief-F algorithm⁵⁰ using 10 nearest neighbors (k = 10). Classification accuracy, sensitivity, and selectivity (specificity) were computed from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
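A scikit-learn sketch of the classification pipeline follows. It is not the authors' MATLAB code: the feature standardization, the default `C` and `gamma="scale"` of `SVC`, and the compact binary-class Relief-F (Manhattan distance on features scaled to [0, 1], averaging over the k nearest hits and misses) are assumptions, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def relieff_weights(X, y, k=10):
    """Binary-class Relief-F weights (higher = more discriminative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    lo, span = X.min(axis=0), X.max(axis=0) - X.min(axis=0)
    Xs = (X - lo) / np.where(span > 0, span, 1.0)          # scale features to [0, 1]
    n, d = Xs.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)                # Manhattan distance to sample i
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        other = np.flatnonzero(y != y[i])
        hits = same[np.argsort(dist[same])[:k]]              # k nearest same-class samples
        misses = other[np.argsort(dist[other])[:k]]          # k nearest other-class samples
        w -= np.abs(Xs[hits] - Xs[i]).mean(axis=0) / n       # penalize within-class spread
        w += np.abs(Xs[misses] - Xs[i]).mean(axis=0) / n     # reward between-class distance
    return w


def loocv_svm(X, y):
    """Gaussian (RBF) SVM evaluated with leave-one-out CV; PD coded as 1, control as 0."""
    y = np.asarray(y)
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
    y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    tp = int(np.sum((y == 1) & (y_pred == 1)))
    tn = int(np.sum((y == 0) & (y_pred == 0)))
    fp = int(np.sum((y == 0) & (y_pred == 1)))
    fn = int(np.sum((y == 1) & (y_pred == 0)))
    return {"accuracy": (tp + tn) / len(y),
            "sensitivity": tp / (tp + fn),
            "selectivity": tn / (tn + fp)}   # selectivity = specificity
```

The ten highest-ranked features would then be selected with `np.argsort(-relieff_weights(X, y))[:10]` and passed as the columns of `X` to `loocv_svm`.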

Ethics

This paper reports the analysis of two datasets: Viswanathan and PC-GITA. The research protocol under which the Viswanathan dataset was developed was approved by the RMIT University human experiments Committee for Ethics in Human Research, and the experiments were performed in accordance with the Declaration of Helsinki (revised 2013). The PC-GITA dataset was developed following a procedure that complied with the Declaration of Helsinki and was approved by the Ethics Committee of the Clinica Noel, in Medellin, Colombia. Both databases confirm that all participants provided written consent for the experiments.

Results

Statistical analysis

The Anderson–Darling test showed that, except for some VTL parameters, the parameters were not normally distributed. The non-parametric Mann–Whitney U-test was therefore used to test for group differences in each feature. Table 3 provides the distribution (mean ± SD), p-value, and effect size of the Mann–Whitney U-test between CO and PD for all features. The table shows that the parameters of people with PD fluctuated more than those of CO. The voice intensity of people with PD had both higher SD and higher range, indicating a diminished ability to produce sustained phonemes with stable air pressure; p < 0.05 indicates that the group difference was significant.

Table 3 Statistical distribution and the result of Mann Whitney U-test.

The glottal vibration parameters (jitter, shimmer, and SD of pitch) were significantly higher for people with PD than for CO (p < 0.05). The HNR and NHR distributions show that PD voices had a larger noise (aperiodic) component than those of healthy people.

Among the vocal tract parameters, the standard deviations of the first three formants (F₁, F₂, and F₃) were significantly higher in PD patients than in the normal subjects for all vowels except /o/ and /u/. Most VTL parameters did not differ significantly between PD and normal subjects; the p-values and effect sizes confirm that, statistically, the groups were not significantly different.

SVM classification

The SVM classification results for the PC-GITA database with the four groups of input parameters are shown in Table 4, which presents the accuracy, sensitivity, and selectivity for each vowel independently and for combinations of the five vowels. For simplicity of presentation, and without affecting the outcome of this work, the table only presents the vowel combinations with significant accuracy. A classification accuracy of 84.3% was obtained with the combination of all five vowels when the SVM input was VTL(F_i); overall, VTL was the most effective feature for distinguishing between the voices of PD and CO. The SVM classification accuracy was 71.2% when the input was the ten highest-ranked features selected by the Relief-F algorithm. These ten features were dominated by VTL: VTL(F4) of /o/, VTL(F1) of /i/, VTL(F2) of /o/, VTL(F3) of /u/, std(F1) of /o/, std(F2) of /o/, VTL(F1) of /e/, VTL(F1) of /a/, VTL(F2) of /i/, and VTL(F2) of /u/. Among the individual vowels, the VTL of /i/ was the most effective parameter, with an accuracy of 73.0%. Sensitivity and selectivity were close to the accuracy for almost all input configurations.

Table 4 The SVM classification results of PC-GITA database.

To evaluate the consistency of SVM classification using VTL(F_i) across databases, it was also applied to Viswanathan’s dataset³⁸, which contains recordings of /a/, /o/, and /m/. Table 5 provides the classification results for this database and shows that SVM classification with VTL(F_i) as input performed consistently across databases. The highest accuracy, 96.0%, was obtained with the combination of VTL(F_i) of /a/ and /m/, while an accuracy of 94.0% was obtained with the combination of /a/, /o/, and /m/.

Table 5 The SVM classification results of Viswanathan’s database.

Discussion

Several earlier studies have proposed voice-based diagnosis and assessment of Parkinson's disease^{16,18,19,20,21,22}. These studies used vocal cord vibration parameters such as pitch frequency variation, number of pulses, jitter, shimmer, autocorrelation, and harmonics-to-noise ratio (HNR/NHR). While they showed the potential of voice-based biomarkers for Parkinson’s disease, their results have been inconsistent across databases^6,23. For example, analysis based on vocal cord vibration parameters gave a classification accuracy of 78.1% on Viswanathan’s dataset²² but performed worse on the PC-GITA dataset (70.9% accuracy; Table 4).

This study has identified VTL as a potential parameter for classifying PD patients from sustained phoneme recordings. On the PC-GITA database with the five vowels /a/, /e/, /i/, /o/, and /u/, the VTL parameters achieved 84.3% accuracy, 84.0% sensitivity, and 84.7% specificity. The parameters also performed consistently across datasets: Table 5 shows that, on Viswanathan's dataset, VTL parameters classified PD patients versus healthy subjects with an accuracy of 96.0%.

This study has shown that, among the features reported in the literature, VTL features are the most suitable for differentiating the voices of people with PD from those of controls. VTL is an approximate measure of the physical vocal tract length during voice production. The shape and length of the vocal tract determine the values and spacing of the formants: longer vocal tracts produce lower, more closely spaced formants³⁶. Although vocal tract length mainly depends on body structure, Pisanski et al.³⁷ found that a person may voluntarily modify the length of the vocal tract by up to 25%. The results reported in this paper indicate a possible relation between modification of the vocal tract length and the symptoms of PD: when a PD patient, owing to reduced ability to control the speech muscles, modifies the length of the vocal tract, the modulation of the voice by the vocal tract changes. This relation is of higher order; the univariate statistical tests could not separate people with PD from healthy subjects on most VTL parameters, whereas the Gaussian-kernel SVM could.

The novelty of this study is its high performance in differentiating the voices of people with PD from those of controls, consistently across two different databases. We are the first to investigate the use of VTL to identify the voices of people with PD, and we found that VTL parameters outperformed the features reported in the literature that relate to perturbation of glottal vibration, such as jitter, shimmer, pitch frequency, and harmonics ratio. This finding supports the argument of ref.³⁵ that the neurophysiological changes in PD patients are manifested more in vocal tract control than in glottal vibration or in air-pressure control by the lungs. This opens the potential for computerized and remote monitoring of people with PD.

A limitation of this study is that only two databases were investigated: native Colombian Spanish speakers and native Australian speakers. Further studies of people from other demographic and ethnic backgrounds are needed to validate the findings for global use. While the datasets were of sufficient size, larger datasets are required to examine the various confounding factors. The effect of PD medication such as levodopa on these parameters also needs to be investigated, and the method should be tested over repeated voice recordings.

Conclusion

This study has investigated the effectiveness of three sets of voice features of sustained phonemes for differentiating people with PD from age-matched healthy participants, using two independent, publicly available databases. The most effective feature set was the apparent vocal tract length (VTL). The classification accuracy in identifying PD from controls was 84.3% when combining the VTL features of all five vowels /a/, /e/, /i/, /o/, and /u/. With the /a/, /o/, and /m/ smartphone recordings of the Viswanathan dataset, the highest classification accuracy was 96%. This performance was considerably higher than the accuracy obtained with the glottal vibration parameters (jitter, shimmer, pitch, and harmonics) and with voice intensity. A further advantage of VTL parameters is that they are obtained automatically and are thus suitable for computerized analysis of voice recordings made with smartphones. Unlike deep-learning approaches, this method identifies specific voice parameters, which allows clinicians to understand the differences. This has the potential for telephone-based diagnosis of PD.

Data availability

We have used publicly available datasets. The PC-GITA dataset is available on request from Orozco-Arroyave et al. (ref.²³). The Viswanathan dataset is available from the contact author of ref.³².

References

de Lau, L. M. & Breteler, M. M. Epidemiology of Parkinson’s disease. Lancet Neurol. 5(6), 525–535 (2006).
Article PubMed Google Scholar
Poewe, W. et al. Parkinson disease. Nat. Rev. Dis. Prim. 3, 7013 (2017).
Google Scholar
Tautan, A.-M., Ionescu, B. & Santarnecchi, E. Artificial intelligence in neurodegenerative diseases: A review of available tools with a focus on machine learning techniques. Artif. Intell. Med. 117, 1 (2021).
Article Google Scholar
Simonet, C., Schrag, A., Lees, A. J. & Noyce, A. J. The motor prodromes of parkinson’s disease: From bedside observation to large-scale application. J. Neurol. 1, 1–10 (2019).
Google Scholar
Trail, M. et al. Speech treatment for Parkinson’s disease. NeuroRehabilitation 20(3), 205–221 (2005).
Article PubMed Google Scholar
Rusz, J., Cmejla, R., Ruzickova, H. & Ruzicka, E. Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated Parkinson’s disease. J. Acoust. Soc. Am. 129(1), 350–367 (2011).
Article ADS CAS PubMed Google Scholar
Vaiciukynas, E., Verikas, A., Gelzinis, A. & Bacauskiene, M. Detecting Parkinson’s disease from sustained phonation and speech signals. PLoS ONE 12(10), 1–16 (2017).
Article CAS Google Scholar
Yang, S. et al. The physical significance of acoustic parameters and its clinical significance of dysarthria in Parkinson’s disease. Sci. Rep. 10(11776), 1–9 (2020).
Google Scholar
Tsanas, A., Little, M. A., McSharry, P. E., Spielman, J. & Ramig, L. O. Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease. IEEE Trans. Biomed. Eng. 59(5), 1264–1271 (2012).
Article PubMed Google Scholar
I. R. Titze, Principles of voice production, 1st Editio. Prentice Hall (1994).
Huang, M. et al. Chapter 2: The Reasoning of Dysarthria in Parkinson ’ s Disease”, in Neurodegenerative Diseases Symptoms and Treatment (Las Vegas, 2019).
Google Scholar
Silbergleit, A. K., LeWitt, P. A., Peterson, E. L. & Gardner, G. M. Quantitative analysis of voice in Parkinson disease compared to motor performance: A pilot study. J. Park. Dis. 5, 517–524 (2015).
Google Scholar
Jiang, H. D., O’Mara, T., Chen, H. J., Stern, J. I. & Vlagos, D. Aerodynamic measurements of patients with Parkinson’s disease. J. Voice 13, 4 (1999).
Article Google Scholar
Hammer, M. J. Aerodynamic assessment of phonatory onset in Parkinson’s disease: evidence of decreased scaling of laryngeal and respiratory control. Park. Dis. 3, 173–179 (2013).
Google Scholar
Bjornestad, A., Tysnes, O., Larsen, J. P. & Alves, G. Reliability of three disability scales for detection of independence loss in Parkinson’s disease. Park. Dis. 1, 1 (2016).
Google Scholar
Moro-Velázquez, L. et al. Analysis of speaker recognition methodologies and the influence of kinetic changes to automatically detect Parkinson’s Disease. Appl. Soft Comput. 62, 649–666 (2018).
Article Google Scholar
Rusz, J. et al. Imprecise vowel articulation as a potential early marker of Parkinson’s disease: Effect of speaking task. J. Acoust. Soc. Am. 134(3), 2171–2181 (2013).
Article ADS PubMed Google Scholar
Goyal, J., Khandnor, P. & Aseri, T. C. Engineering applications of artificial intelligence classification, prediction, and monitoring of Parkinson’s disease using computer assisted technologies: A comparative analysis. Eng. Appl. Artif. Intell. 96, 3955 (2020).
Article Google Scholar
Sakar, B. E. et al. Collection and analysis of a parkinson speech dataset with multiple types of sound recordings. IEEE J. Biomed. Heal. Inf. 17(4), 828–834 (2013).
Article Google Scholar
Sakar, C. O. et al. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Appl. Soft Comput. J. 74, 1 (2019).
Article Google Scholar
Braga, D., Madureira, A. M., Coelho, L. & Ajith, R. Automatic detection of Parkinson’s disease based on acoustic analysis of speech. Eng. Appl. Artif. Intell. 77, 148–158 (2019).
Article Google Scholar
Pah, N. D., Motin, M. A., Kempster, P. & Kumar, D. K. Detecting effect of levodopa in Parkinson’s disease patients using sustained phonemes. IEEE J. Transl. Eng. Heal. Med. 1, 1 (2021).
Orozco-Arroyave, J. R., Arias-Londoño, J. D., Vargas-Bonilla, J. F. & Gonzalez-Rativa, M. C. New Spanish speech corpus database for the analysis of people suffering from Parkinson’s disease. In International Conference on Language Resources and Evaluation, Reykjavik, Iceland (2014).
Behroozi, M. & Sami, A. A multiple-classifier framework for Parkinson’s disease detection based on various vocal tests. Int. J. Telemed. Appl. 2016(11), 6837498 (2016).
Tsanas, A., Little, M. A., McSharry, P. E. & Ramig, L. O. Accurate telemonitoring of Parkinson’s disease progression by noninvasive speech tests. IEEE Trans. Biomed. Eng. 57(4), 884–893 (2010).
Ali, L., Zhu, C. E., Zhang, Z. & Liu, Y. Automated detection of Parkinson’s disease based on multiple types of sustained phonations using linear discriminant analysis and genetically optimized neural network. IEEE J. Transl. Eng. Heal. Med. 7(October), 1–10 (2019).
Behroozi, M. & Sami, A. A multiple-classifier framework for Parkinson’s disease detection based on various vocal tests. Int. J. Telemed. Appl. 2016(11), 1–9 (2016).
Rusz, J. et al. Evaluation of speech impairment in early stages of Parkinson’s disease: A prospective study with the role of pharmacotherapy. J. Neural Transm. 120(2), 319–329 (2013).
Sechidis, K., Fusaroli, R., Orozco-Arroyave, J. R., Wolf, D. & Zhang, Y. A machine learning perspective on the emotional content of Parkinsonian speech. Artif. Intell. Med. 115, 2061 (2021).
Midi, I. et al. Voice abnormalities and their relation with motor dysfunction in Parkinson’s disease. Acta Neurol. Scand. 117(2), 26–34 (2008).
Pérez, C. J., Campos-Roca, Y., Naranjo, L. & Martín, J. Diagnosis and tracking of Parkinson’s disease by using automatically extracted acoustic features. J. Alzheimer’s Dis. Park. 6(5), 1 (2016).
Viswanathan, R., Arjunan, S. P., Bingham, A. & Jelfs, B. Complexity measures of voice recordings as a discriminative tool for Parkinson’s disease. Biosensors 10, 1 (2019).
Khojasteh, P., Viswanathan, R., Aliahmad, B., Raghav, S., Zham, P. & Kumar, D. Parkinson’s disease diagnosis based on multivariate deep features of speech signal. In IEEE Life Sciences Conference (LSC 2018) 187–190 (2018).
Godino-Llorente, J. I., Shattuck-Hufnagel, S., Choi, J. Y., Moro-Velázquez, L. & Gómez-García, J. A. Towards the identification of Idiopathic Parkinson’s Disease from the speech. New articulatory kinetic biomarkers. PLoS ONE 12(12), 1–35 (2017).
Gillivan-Murphy, P., Carding, P. & Miller, N. Vocal tract characteristics in Parkinson’s disease. Speech Ther. Rehabil. 24(3), 175–182 (2016).
Pisanski, K. et al. Vocal indicators of body size in men and women: A meta-analysis. Anim. Behav. 95, 89–99 (2014).
Pisanski, K., Cartei, V., McGettigan, C., Raine, J. & Reby, D. Voice modulation: A window into the origins of human vocal control? Trends Cogn. Sci. 20(4), 304–318 (2016).
Viswanathan, R. et al. Complexity measures of voice recordings as a discriminative tool for Parkinson’s disease. Biosensors 10, 1 (2019).
Boersma, P. & Van Heuven, V. Speak and unSpeak with PRAAT. Glot Int. 5(9–10), 341–347 (2001).
Zhang, Z. Mechanics of human voice production and control. J. Acoust. Soc. Am. 140, 4 (2016).
Teixeira, J. P. & Gonçalves, A. Accuracy of jitter and shimmer measurements. Procedia Technol. 16, 1190–1199 (2014).
Teixeira, J. P., Oliveira, C. & Lopes, C. Vocal acoustic analysis—Jitter, shimmer and HNR parameters. Procedia Technol. 9, 1112–1122 (2013).
De Oliveira, A. A., Dajer, M. E., Fernandes, P. O. & Teixeira, J. P. Clustering of voice pathologies based on sustained voice parameters. In 13th International Conference on Bio-inspired Systems and Signal Processing 280–287 (2020).
Childers, D. G. Modern Spectrum Analysis (IEEE Press, 1978).
Sarkar, A. K. & Tan, Z. Vocal tract length perturbation for text-dependent speaker verification with autoregressive prediction coding. IEEE Signal Process. Lett. 28, 364–368 (2021).
Valentova, J. V. et al. Vocal parameters of speech and singing covary and are related to vocal attractiveness, body measures, and sociosexuality: A cross-cultural study. Front. Psychol. 10(October), 1–14 (2019).
Jäntschi, L. & Bolboacă, S. D. Computation of probability associated with Anderson–Darling statistic. Mathematics 6(6), 1–16 (2018).
McDonald, J. H. Handbook of biological statistics 3rd edn. (Sparky House Publishing, 2014).
Hamel, L. Knowledge discovery with support vector machines (John Wiley & Sons, 2009).
Robnik-Šikonja, M. & Kononenko, I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 53, 23–69 (2003).

Acknowledgements

We acknowledge and thank Dr Rekha Viswanathan, Dr Jennifer Nagao, Ms Kitty Wong, Dr Sridhar Arjunan, and Prof Sanjay Raghav for their support for this project.

Author information

Authors and Affiliations

Electrical Engineering Department, Universitas Surabaya, Surabaya, Indonesia
Nemuel D. Pah
School of Engineering, RMIT University, Melbourne, VIC, 3000, Australia
Nemuel D. Pah, Mohammod A. Motin & Dinesh K. Kumar
Department of Electrical and Electronic Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
Mohammod A. Motin

Contributions

N.P.: responsible for signal processing and classification, and for the first draft of the manuscript. M.M.: responsible for literature review, data management, statistical analysis and manuscript editing. D.K.: responsible for project inception and management, data management and manuscript editing.

Corresponding author

Correspondence to Dinesh K. Kumar.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Pah, N.D., Motin, M.A. & Kumar, D.K. Phonemes based detection of parkinson’s disease for telehealth applications. Sci Rep 12, 9687 (2022). https://doi.org/10.1038/s41598-022-13865-z

Received: 24 October 2021
Accepted: 30 May 2022
Published: 11 June 2022
DOI: https://doi.org/10.1038/s41598-022-13865-z