Data-driven discovery of movement-linked heterogeneity in neurodegenerative diseases


Neurodegenerative diseases manifest different motor and cognitive signs and symptoms that are highly heterogeneous. Parsing these heterogeneities may lead to an improved understanding of underlying disease mechanisms; however, current methods are dependent on clinical assessments and an arbitrary choice of behavioural tests. Here we present a data-driven subtyping approach using video-captured human motion and brain functional connectivity from resting-state functional magnetic resonance imaging. We applied our framework to a cohort of individuals at different stages of Parkinson’s disease. The process mapped the data to low-dimensional measures by projecting them onto a canonical correlation space that identified three Parkinson’s disease subtypes: subtype I was characterized by motor difficulties and poor visuospatial abilities; subtype II exhibited difficulties in non-motor components of activities of daily living and motor complications (dyskinesias and motor fluctuations) and subtype III was characterized by predominant tremor symptoms. We conducted a convergent validity analysis by comparing our approach to existing and widely used approaches. The compared approaches yielded subtypes that were adequately well-clustered in the motion-brain representation space we created to delineate subtypes. Our data-driven approach, contrary to other forms of subtyping, derived biomarkers predictive of motion impairment and subtype memberships that were captured objectively by digital videos.

Fig. 1: Our data-driven PD subtyping using motion (from videos) and brain FC (from rs-fMRI).
Fig. 2: Subtype approach comparison.
Fig. 3: Subtype (digital) biomarker discovery.

Data availability

We provide a toy dataset of select motion encoding outputs and fMRI subnetworks with our code release (ref. 62). To protect study participant privacy, we are unable to release the clinical data or the gait examination videos. The NTU RGB+D dataset (ref. 54) used to pretrain the motion encoder model is available at

Code availability

We have released code for subtype analysis and for predicting motor impairment using FC data (ref. 62). The previously published code for the GaitForeMer motion encoder is available at


Acknowledgements

This research was supported in part by National Institutes of Health grant nos. AA010723 (E.V.S.), AA017347 (E.V.S., E.A.), AG047366 (V.W.H., K.L.P., E.A.), MH113406 (K.M.P.) and AG066515 (V.W.H.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This study was also supported by the Stanford School of Medicine Department of Psychiatry and Behavioral Sciences Jaswa Innovator Award (E.A.) and the Stanford Institute for Human-Centered Artificial Intelligence GCP Cloud Credit (E.A.).

Author contributions

M.E. was responsible for methodology, investigation, software, visualization and writing of the original draft. F.N. was responsible for the methodology. Q.Z. was responsible for the methodology and for reviewing and editing the paper. E.V.S. was responsible for the methodology and for reviewing and editing the paper. L.F.-F. was responsible for the methodology. V.W.H. conceptualized the project. K.M.P. was responsible for the methodology and for reviewing and editing the paper. K.L.P. conceptualized the project and was responsible for the methodology and for reviewing and editing the paper. E.A. conceptualized the project and was responsible for project administration, supervision, writing of the original draft, and reviewing and editing the paper.

Correspondence to Ehsan Adeli.

Competing interests

The authors declare no competing interests.

Peer review information

Nature Machine Intelligence thanks Carl Yang, Jing Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.


Extended data

Extended Data Fig. 1 Full visualization of subtype characteristics.

The displayed variables differed significantly (P < 0.05) between subtypes (N = 30), as measured by the two-sided chi-square test for categorical variables and the Kruskal-Wallis test for continuous variables. For each box, the central line indicates the subtype median value, and the top and bottom edges indicate the 75th and 25th percentiles, respectively. The whiskers extend to 1.5 times the interquartile range, and data points beyond the whiskers are represented using the ♦ symbol. Each subtype mean value is overlaid as a gold × symbol, and dashed gold lines connect subtype mean values across subtypes. Overall, Subtype I had the highest disease severity, with more severe problems relating to motor aspects of experiences of daily living (MDS-UPDRS Part-II), higher observed motor difficulties (MDS-UPDRS Part-III excluding tremor) and lower visuospatial abilities (JLO). Subtype II exhibited more difficulties in non-motor aspects of experiences of daily living (MDS-UPDRS Part-I) and higher motor complications (MDS-UPDRS Part-IV). Subtype III exhibited the lowest disease severity despite not being the youngest cohort. Generally, individuals in this subtype exhibited tremor symptoms, particularly in rest tremor amplitude of the upper extremities and consistency of rest tremor. Yet, some individuals within the cohort had no tremor.
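The box plot conventions described in this caption can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the authors' plotting code: the function name box_summary is hypothetical, the 'inclusive' quartile method is an assumption (plotting libraries differ in how they interpolate percentiles), and the whiskers are assumed to follow the common Tukey convention of ending at the most extreme data point within 1.5 times the interquartile range of the box edges.

```python
import statistics

def box_summary(values):
    """Summarise one group using the conventions in the caption (hypothetical helper)."""
    data = sorted(values)
    # Quartiles via linear interpolation; the 'inclusive' method is an assumption.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations that lie inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,            # central line of the box
        "q1": q1,                    # bottom edge (25th percentile)
        "q3": q3,                    # top edge (75th percentile)
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],  # drawn as diamonds
        "mean": statistics.fmean(data),  # drawn as the x marker
    }

summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary["median"], summary["outliers"])
```

The example input is made up; it shows how a single extreme value pulls the mean far above the median while leaving the box and whiskers unchanged, which is why the figures mark both.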

Extended Data Fig. 2 Relationship between joint movement/rotation and PD gait impairment severity.

Visualization of motion metrics that exhibit a significant difference (two-sided t-test P < 0.05) between individuals without gait impairment (MDS-UPDRS 3.10 score 0, N = 9) and individuals with gait impairment (MDS-UPDRS 3.10 scores 1-3, N = 45). For each metric, the left plot shows the difference in motion metric expression between individuals without gait impairment and individuals with gait impairment. The right plot depicts the difference in motion metric expression with respect to gait impairment severity. On each box, the central line indicates the median value, and the top and bottom edges indicate the 75th and 25th percentiles, respectively. The whiskers extend to 1.5 times the interquartile range, and data points beyond the whiskers are represented using the ♦ symbol. For plots with statistically significant (P < 0.05) differences in metric values between groups, each group mean value is overlaid as a × symbol, and dashed lines connect the mean group values. Motion metric trends that exist between individuals without gait impairment and those with gait impairment do not necessarily hold, and are sometimes even reversed, when comparing individuals with mild gait impairment against those with more severe gait impairment.
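For readers who want to see what a two-sided two-sample t-test computes, the following is a minimal standard-library sketch. It assumes the pooled-variance (Student) form of the test, which the caption does not specify, and it obtains the P value by numerically integrating the t density with Simpson's rule rather than calling a statistics package. The function names and the example numbers are hypothetical.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P value: 1 minus the probability mass between -|t| and |t|."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # density is symmetric about zero
    return max(0.0, 1.0 - central_mass)

def students_t_test(group_a, group_b):
    """Pooled-variance two-sample t-test (an assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    t_stat = diff / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t_stat, df, two_sided_p(t_stat, df)

# Made-up metric values for an unimpaired and an impaired group.
t_stat, df, p = students_t_test([1, 2, 3], [2, 3, 4])
print(round(t_stat, 3), df, round(p, 3))
```

With group sizes as unequal as 9 and 45, a Welch (unequal-variance) test would be a reasonable alternative; which variant was actually used is not stated here.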

Extended Data Fig. 3 Analyzing disease duration and onset age.

(A) Visualization of disease duration and onset age across subtypes (N = 30). For each box, the central line indicates the subtype median value, and the top and bottom edges indicate the 75th and 25th percentiles, respectively. The whiskers extend to 1.5 times the interquartile range, and data points beyond the whiskers are represented using the ♦ symbol. For the onset age variable, which differs significantly between groups (P < 0.05 from Kruskal-Wallis test), each subtype mean value is overlaid as a gold × symbol, and dashed gold lines connect subtype mean values across subtypes. While disease duration is not significantly different across the three subtypes, onset age differs significantly and exhibits a pattern similar to the age variable. (B) Visualization of disease duration and onset age in our subtyping space. CCX signifies the motion component of our space. Each oval represents a single individual since there are multiple data points per individual.
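The Kruskal-Wallis comparison across three subtypes can be illustrated with a short standard-library sketch. It is restricted to exactly three groups, because the chi-square approximation then has two degrees of freedom and its upper tail probability reduces to exp(-H/2). The function name and the input values are hypothetical, and the use of the chi-square approximation (rather than an exact test) is an assumption about how the P value might be obtained.

```python
import math
from collections import Counter

def kruskal_wallis_three_groups(group_a, group_b, group_c):
    """Kruskal-Wallis H with tie correction and its asymptotic P value (df = 2)."""
    groups = [group_a, group_b, group_c]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    # Midranks: tied values share the average of the ranks they occupy.
    rank_of = {}
    position = 0
    tie_term = 0
    for value, count in sorted(Counter(pooled).items()):
        rank_of[value] = position + (count + 1) / 2
        position += count
        tie_term += count ** 3 - count
    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    # Tie correction; undefined if every observation has the same value.
    h /= 1 - tie_term / (n ** 3 - n)
    p_value = math.exp(-h / 2)  # chi-square survival function for df = 2
    return h, p_value

# Made-up onset ages for three subtypes.
h, p = kruskal_wallis_three_groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 2), round(p, 4))
```

Because the test works on ranks rather than raw values, it makes no normality assumption, which suits small groups such as the N = 30 split across three subtypes.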

Extended Data Fig. 4 Visualization of extracted measurements from gait videos with respect to cohort (PD vs. CTRL) and gait impairment score within the PD cohort.

For each metric, the left plot compares metric values across the Healthy/Control group and the PD group. The right plot compares the metric values across different levels of gait impairment severity (MDS-UPDRS 3.10) among the PD group. On each box, the central line indicates the median value, and the top and bottom edges indicate the 75th and 25th percentiles, respectively. The whiskers extend to 1.5 times the interquartile range, and data points beyond the whiskers are represented using the ♦ symbol. For plots with statistically significant differences in metric values between groups (P < 0.05 from two-sided t-test for left plots and P < 0.05 from Kruskal-Wallis test for right plots), each group mean value is overlaid as a × symbol, and dashed lines connect the mean group values. For plots without statistical significance, the overall metric mean across all groups is drawn as a dashed line.

Supplementary Information

Supplementary Figs. 1–7, Discussion on brain pathways’ link to motor impairment and Supplementary Table 1.


Endo, M., Nerrise, F., Zhao, Q. et al. Data-driven discovery of movement-linked heterogeneity in neurodegenerative diseases. Nat Mach Intell 6, 1034–1045 (2024).

