Introduction

Alzheimer’s disease (AD) is a neurodegenerative disorder characterized by the presence of amyloid-beta (Aβ) plaques, neurofibrillary tangles, and brain atrophy1,2. It is the most prevalent cause of dementia3,4,5 and has a significant social impact4. However, the clinical diagnosis of AD can be challenging due to overlapping clinical manifestations with other diseases such as frontotemporal lobar degeneration (FTLD) or late-onset psychiatric disorders. These diseases may present similar clinical signs and symptoms and occasionally may be comorbid with AD3,6,7.

Considering that Aβ is one of the defining characteristics of AD, examining Aβ-positivity may aid in differential diagnosis2,8 and precision medicine, including drug choice6,9. Nevertheless, Aβ detection is not necessarily convenient to perform in routine clinical practice. Positron emission tomography (PET)10,11 requires advanced facilities and careful attention to radiation exposure. Cerebrospinal fluid (CSF) testing12,13 can be risky in patients with bleeding tendencies (e.g., on anticoagulants) or increased intracranial pressure. Blood biomarkers have shown potential for high diagnostic performance in a minimally invasive manner14,15,16, but have not been applied in routine clinical practice17. Meanwhile, magnetic resonance imaging (MRI) has achieved widespread adoption in general clinical practice despite certain facility limitations. As MRI may be effective in excluding non-AD causes of cognitive impairment and contribute to the diagnosis of dementia17, MRI-based Aβ prediction may be a useful screening tool before definitive diagnosis by CSF testing or amyloid PET18,19,20.

However, visual judgment of MRI may be hindered by the heterogeneity of brain structural changes. Therefore, objective, data-driven detection of subtle structural changes indicative of Aβ deposition can enhance the visual interpretation of MRI for dementia differential diagnosis, streamlining the screening process for potential participants undergoing CSF testing or amyloid PET scans18,19,20,21,22.

Large MRI datasets such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI)23 have facilitated data-driven approaches (e.g., machine learning) (for reviews, see Refs.21,22). Machine learning methods utilizing these datasets have achieved classification accuracy of 93–98% in distinguishing between AD and healthy controls (HC)24,25. However, one of the limitations of the previous studies is the narrow focus on the AD continuum, including AD, mild cognitive impairment (MCI), and HC20,24,25,26,27. Consequently, the results may not necessarily be generalizable to common clinical populations6,18,28,29. Another limitation of previous MRI-based models is that many predict clinical diagnoses24,25 (for reviews, see Refs.21,22) instead of Aβ deposition18,26,27,28.

To examine the brain structure patterns in neurological or psychiatric disorders, source-based morphometry (SBM), a data-driven multivariate analysis method, has garnered increasing attention30,31,32,33,34,35. SBM is a structural neuroimaging analysis technique based on independent component analysis (ICA). It uses masked gray matter (GM) images as input features, extracts independent spatial maps representing anatomical variability, and potentially detects co-varying structural patterns in the whole brain30,35. In this regard, SBM may be more data-driven than region-of-interest (ROI)-based feature extraction and potentially more sensitive than voxel-based morphometry for detecting GM atrophy30. SBM has significantly contributed to enhancing our comprehension of distinctive brain structural patterns observed in patients affected by various neurological or psychiatric disorders, including Parkinson's disease31, FTLD syndrome33, major depressive disorder34, and schizophrenia32. When SBM was applied to the AD continuum, the temporo-frontoparietal component could differentiate amnestic MCI (aMCI) from HC, whereas major hippocampal and temporal lobe atrophy and occipital atrophy could differentiate AD from aMCI and HC31.

Considering the challenges posed by the presence of AD-like signs and symptoms in other neurological or psychiatric disorders, which likely hamper diagnosis and patient management, predicting Aβ accumulation based on MR-images of heterogeneous diseases may hold greater clinical relevance. The aims of the present study were to (1) build a clinically useful prediction model for Aβ deposition from a diverse patient population using SBM, and (2) identify influential features in the model. Moreover, we assessed how the predictive accuracies varied with feature combinations.

Methods

Participants and clinical measurements

Patients clinically diagnosed with AD, MCI, FTLD, corticobasal syndrome (CBS), progressive supranuclear palsy (PSP), or psychiatric disorders were recruited between the 3rd of July, 2018 and 31st of August, 2021, from the memory clinic at Keio University Hospital. HC were also recruited as described in a previous manuscript36. For these diagnoses, the PET results were not considered.

Inclusion criteria common to all diagnoses were: (1) age 40–85 years; (2) years of education ≥ 12; and (3) Japanese as the first language.

The diagnosis-specific inclusion criteria were as follows:

AD: (1) clinical diagnosis of AD by a dementia specialist; (2) Logical Memory II subscale in the Wechsler Memory Scale-Revised (LM II) ≤ 8 for 16 years of education and ≤ 4 for 12–15 years of education; (3) Mini-Mental State Examination (MMSE) ≤ 23; and (4) Clinical Dementia Rating (CDR) = 0.5 or 1.0.

MCI: (1) Clinical diagnosis of MCI by a dementia specialist; (2) LM II ≤ 11 for 16 years of education, and ≤ 9 for 12–15 years of education; (3) MMSE ≥ 24; and (4) CDR 0.5 (memory score 0.5).

Other diseases (i.e., FTLD, CBS, PSP, or psychiatric disorders) were diagnosed by a neurologist or psychiatrist according to the diagnostic criteria.

HC: (1) judged as cognitively normal by a dementia specialist; (2) LM II ≥ 9 for 16 years of education, ≥ 5 for 12–15 years of education; (3) MMSE ≥ 24; (4) CDR 0; and (5) Geriatric Depression Scale (GDS) < 6.

All the clinical data were obtained within 6 months from enrollment.

Standard protocol approval, registration, and patient consent

The Certified Review Board of Keio University approved the study design and protocol. The study was registered with the University Hospital Medical Information Network Clinical Trials Registry (UMIN-CTR; https://www.umin.ac.jp/ctr/index.htm, ID# UMIN000032027, the first registration: 31/03/2018) and the Japan Registry of Clinical Trials (jRCT; https://jrct.niph.go.jp/, ID# jRCTs031180225, the first registration: 11/03/2019), and was conducted in accordance with the 1964 Declaration of Helsinki and its later amendments. All participants and their proxies, if necessary, provided written informed consent.

Cognitive assessment

The following neuropsychological assessments were performed: MMSE37, Wechsler Memory Scale-Revised (WMS-R) Logical Memory38 immediate recall (LM I) and delayed recall (LM II), Word Fluency39, Trail Making Test (TMT)40, the Japanese version of Alzheimer’s Disease Assessment Scale-Cognitive subscale (ADAS-cog-J)41, Japanese Adult Reading Test (JART)42, Clinical Dementia Rating (CDR)43, and Functional Activity Questionnaire (FAQ)44.

Apolipoprotein E (APOE) genotyping

Genomic DNA was extracted from 0.2 mL whole blood using a Magnetic Nanoparticle DNA Extraction Kit (EZ1 DNA Blood 200 μL Kit). APOE genotyping (rs429358 and rs7412) was performed by real-time polymerase chain reaction (PCR) using the TaqMan probe on a CFX 96 deep well Real-Time PCR system (Bio Rad, Richmond, CA) to analyze the three major isoforms (APOE ε2, ε3, and ε4).

[18F] Florbetaben (FBB) amyloid-PET imaging

[18F] FBB was manufactured on-site using an automated synthesizer as described elsewhere45,46. Amyloid-PET images were acquired for 20 min using a PET-CT (True Point Biograph 40/64, Siemens Japan K.K., Tokyo, Japan), 90 min after intravenous injection of 300 MBq ± 20% [18F] FBB. The 20-min PET images were visually assessed by nuclear medicine experts who had completed a training program offered by the manufacturer (Piramal Imaging GmbH, Berlin, Germany). The Aβ positivity/negativity was determined based on the assessment of tracer uptake in the GM of the following four brain regions: the lateral temporal lobes, frontal lobes, posterior cingulate cortex/precuneus, and parietal lobes, in line with the NeuraCeq™ guidelines (http://www.accessdata.fda.gov/drugsatfda_docs/label/2014/204677s000lbl.pdf)47. Aβ negativity was established when tracer uptake (i.e., signal intensity) in the GM was lower than that in the white matter (WM) in all four brain regions.

MRI acquisition

High-resolution 3D T1-weighted MR-images were acquired (repetition time: 6.8 ms; echo time: 3.0 ms; flip angle: 8°; field of view: 230 mm; matrix size: 256 × 256; slice thickness: 1.0 mm; voxel size: 0.9 × 0.9 × 1.0 mm) using a Discovery MR750 3.0 T scanner (GE Healthcare, USA) with a 32-channel head coil. All images were visually checked for scanner artifacts and anatomical anomalies.

MRI pre-processing

Structural brain images were first segmented into GM, WM, and CSF using the Statistical Parametric Mapping (SPM12; Wellcome Trust Center for Neuroimaging, London, UK) toolbox CAT12 (http://www.neuro.uni-jena.de/cat/) in MATLAB (R2019a; MathWorks, Natick, Mass, USA). Segmented GM images were used to normalize the individual component images to the Montreal Neurological Institute (MNI) template48. Normalized images were modulated to preserve the total amount of signal from each voxel, resampled to an isotropic voxel size of 2 × 2 × 2 mm³, and smoothed using a 5-mm full-width-at-half-maximum Gaussian kernel.
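As a worked illustration of the smoothing step, the full-width-at-half-maximum (FWHM) of a Gaussian kernel can be converted to its standard deviation with σ = FWHM / (2√(2 ln 2)). The sketch below applies this standard relation to the 5-mm kernel and 2-mm voxels described above. Expressing the result in voxel units is our assumption for illustration only, and the function name is hypothetical; the toolbox handles this conversion internally.

```python
import math


def fwhm_to_sigma(fwhm: float) -> float:
    """Convert the FWHM of a Gaussian to its standard deviation (same units)."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


fwhm_mm = 5.0        # smoothing kernel stated in the text
voxel_size_mm = 2.0  # isotropic voxel size after resampling

sigma_mm = fwhm_to_sigma(fwhm_mm)
sigma_voxels = sigma_mm / voxel_size_mm  # illustrative conversion to voxel units

print(f"sigma = {sigma_mm:.3f} mm = {sigma_voxels:.3f} voxels")
```

Under these assumptions the kernel has a standard deviation of roughly 2.12 mm, that is, slightly more than one voxel.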

For the subsequent pre-processing, we used SBM30,35. SBM incorporates independent component analysis (ICA) and provides automatic decomposition of a given set of anatomical brain images into independent spatial maps characterizing different modes of anatomical variability across all individuals30,35.

The preprocessed GM images were loaded with Nibabel (https://nipy.org), and a three-dimensional (3D) array of 91 × 109 × 91 voxels was transformed into a one-dimensional (1D) array of 1 × 902,629 voxels. We created a brain mask for this 1D array using the Neuromorphometric Atlas (provided by Neuromorphometrics, Inc. (http://Neuromorphometrics.com))49 and selected 208,082 voxels on which ICA was performed for all scans using the FastICA function of scikit-learn (https://scikit-learn.org/stable/), a Python machine learning library. The number of extracted independent components (ICs) was also used as a definitive hyperparameter to be tuned in subsequent model building.
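The flattening, masking, and ICA steps described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study pipeline: the small grid, the random mask, the simulated sources, and all variable names are hypothetical, and the orientation of the data matrix (voxels as samples, so that the estimated components are spatial maps) is our assumption about one common way to set up a spatial ICA with FastICA.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Hypothetical stand-in for the preprocessed GM volumes: 20 subjects on a
# small 6 x 7 x 6 grid (the study used 91 x 109 x 91 = 902,629 voxels).
n_subjects, grid_shape, n_components = 20, (6, 7, 6), 3
n_voxels = int(np.prod(grid_shape))
true_maps = rng.laplace(size=(n_components, n_voxels))  # non-Gaussian sources
loadings = rng.normal(size=(n_subjects, n_components))
noise = 0.05 * rng.normal(size=(n_subjects, n_voxels))
volumes = (loadings @ true_maps + noise).reshape(n_subjects, *grid_shape)

# Step 1: flatten each 3D volume into a 1D vector.
flat = volumes.reshape(n_subjects, -1)

# Step 2: keep only in-mask voxels (a random mask here; the study used an
# atlas-based brain mask that retained 208,082 voxels).
mask = rng.random(n_voxels) > 0.2
X = flat[:, mask]  # shape: (subjects, in-mask voxels)

# Step 3: spatial ICA. Voxels are passed as samples so that each estimated
# source is a spatial map (assumed orientation, see lead-in).
ica = FastICA(n_components=n_components, random_state=0, max_iter=1000)
spatial_maps = ica.fit_transform(X.T).T  # shape: (components, in-mask voxels)

# Step 4: write one component back into the 3D grid for visualization.
ic_volume = np.zeros(n_voxels)
ic_volume[mask] = spatial_maps[0]
ic_volume = ic_volume.reshape(grid_shape)

print(spatial_maps.shape, ic_volume.shape)
```

In this sketch the number of components is fixed at three; in the study it was treated as a hyperparameter and tuned together with the classifier.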

After conducting the ICA, we reshaped the data matrix (i.e., ICs) back into a 3D image (91 × 109 × 91) using nipy (https://nipy.org). The 3D image was then superimposed onto the MNI-normalized template brain using BrainNet Viewer50, for visualization. The extracted ICs were used as spatial regressors for each participant's GM images (IGM).

$$I_{\text{GM}} = \beta_{1}\,\text{IC}_{1} + \beta_{2}\,\text{IC}_{2} + \cdots + \beta_{K}\,\text{IC}_{K}.$$

In the above formula, each β represents the weighting coefficient associated with the effect of each IC for the GM image and K indicates the number of extracted ICs. Accordingly, the β-values could be loosely regarded as “weighted total gray matter volume” of the brain parcel represented by the given IC51. The β-values were then used as representative GM measures associated with each component, in the subsequent analyses.
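One way to obtain the β-values in the formula above is an ordinary least-squares fit of each participant's masked GM vector onto the K component maps. The sketch below assumes this estimator, which the text does not specify, and uses synthetic component maps and hypothetical variable names throughout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 10 participants, 500 in-mask voxels, K = 4 components.
n_subjects, n_voxels, K = 10, 500, 4
ics = rng.standard_normal((K, n_voxels))           # stand-in IC spatial maps
true_beta = rng.standard_normal((n_subjects, K))   # weights used to simulate data
gm = true_beta @ ics + 0.01 * rng.standard_normal((n_subjects, n_voxels))

# Solve gm[i] ~ sum_k beta[i, k] * ics[k] for all participants at once.
# lstsq expects the design matrix as (voxels, K) and the targets as (voxels, subjects).
beta_t, *_ = np.linalg.lstsq(ics.T, gm.T, rcond=None)
beta = beta_t.T  # shape: (participants, K), one row of features per participant

print(beta.shape)
print(np.abs(beta - true_beta).max())
```

Each row of `beta` would then serve as the imaging feature vector for one participant in the subsequent classification.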

Machine learning

We built predictive models for Aβ-positivity using scikit-learn (https://scikit-learn.org/stable/index.html)52 which is supported by Python ver. 3.4. The input feature values were based on the ICA’s β-values, demographic characteristics (i.e., age and sex), cognitive assessments, and APOE genotype. First, we used all input features and built the final model. Second, we investigated the model performance for each combination of features (e.g., brain images alone, brain images and cognitive assessments). Third, we investigated model performance for each combination of diagnoses (e.g., AD + HC and AD + MCI + HC).

Throughout the model building, we used a Gaussian kernel support vector machine (SVM) as the classifier and the model was validated using fivefold cross-validation (Additional Fig. 1). For a fivefold training/test split, the model was fitted to the training data, and the predictive value was assessed using the test data over all splits (five times). We tuned the hyperparameters (i.e., Gamma and C in SVM and the number of ICs) with a grid search in all model buildings.
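The classifier setup described above can be sketched with scikit-learn as a grid search over C and gamma for a Gaussian (RBF) kernel SVM under fivefold cross-validation. The synthetic data, the grid values, the stratified and shuffled folds, and the feature standardization step are our assumptions for illustration; the text does not report these details, and the number of ICs (also tuned in the study) is omitted here for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the feature table (118 rows to mirror the sample size).
X, y = make_classification(
    n_samples=118, n_features=10, n_informative=4, random_state=0
)

# Standardization is an assumed preprocessing step, not stated in the text.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical search grid for the two SVM hyperparameters.
param_grid = {
    "svc__C": [0.01, 0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Here `best_score_` is the mean accuracy over the five held-out folds for the selected hyperparameter combination.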

To improve the interpretability of the model, we applied SHapley Additive exPlanations (SHAP) (https://shap.readthedocs.io/en/latest/index.html), which explains the output of any machine learning model by treating the explanation itself as a model53. SHAP is based on the Shapley value from game theory, and a feature with a large absolute SHAP value has a strong influence on the prediction. In the present study, the clinical features with positive and negative SHAP values were associated with Aβ-positivity and Aβ-negativity, respectively.
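To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a toy three-feature score by enumerating all feature subsets. It is not the SHAP library and not the study model: the scoring function, the feature names, the input values, and the all-zero baseline are hypothetical, and the SHAP library estimates these quantities for real models using background data rather than by exhaustive enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_score(features):
    """Hypothetical score: two additive terms plus one interaction."""
    ic1, lm2, age = features
    return 2.0 * ic1 - 1.0 * lm2 + 0.5 * ic1 * age


x = [1.0, 1.0, 2.0]         # hypothetical participant
baseline = [0.0, 0.0, 0.0]  # hypothetical reference input
phi = shapley_values(toy_score, x, baseline)

for name, value in zip(["IC 1", "LM II", "age"], phi):
    print(f"{name}: {value:+.2f}")
```

The contributions sum to the difference between the score for the participant and the score for the baseline, and the interaction term is shared equally between the two features involved.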

Statistical analysis

For the statistical analyses, we used Scipy (https://www.scipy.org), supported by Python version 3.4. Demographic and clinical variables were compared using a two-tailed t-test, or chi-square test, where appropriate. Relationships among features were examined using Pearson’s correlation analysis for continuous variables. Analysis of variance (ANOVA) was conducted to determine associations with diagnoses. Statistical significance was defined by a p-value of < 0.01 or < 0.05 after the Bonferroni correction for multiple comparisons.
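The tests named above map onto SciPy functions roughly as follows. This is a sketch on simulated data only: the group values, the contingency table, and the number of comparisons used for the Bonferroni threshold are hypothetical and do not reproduce any result of the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two-tailed independent-samples t-test on a simulated continuous variable.
group_pos = rng.normal(72, 8, 45)
group_neg = rng.normal(68, 8, 73)
t_stat, p_t = stats.ttest_ind(group_pos, group_neg)  # two-sided by default

# Chi-square test on a hypothetical 2 x 2 table (e.g., sex by amyloid status).
table = np.array([[20, 25], [40, 33]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

# Pearson correlation between two simulated continuous variables.
component = rng.normal(size=118)
score = 0.5 * component + rng.normal(size=118)
r, p_r = stats.pearsonr(component, score)

# Bonferroni correction: divide the nominal alpha by the number of comparisons.
n_comparisons = 7  # hypothetical count
alpha_bonferroni = 0.05 / n_comparisons

print(f"t-test p = {p_t:.4f}, chi-square p = {p_chi:.4f} (dof = {dof})")
print(f"Pearson r = {r:.3f}, Bonferroni threshold = {alpha_bonferroni:.4f}")
```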

Ethics approval and consent to participate

The study protocol was prepared in accordance with the ethical standards of the Declaration of Helsinki and approved by the Certified Review Board of Keio University. Written informed consent was obtained from all participants who were included in the study and their proxies, if necessary.

Results

Demographic and clinical characteristics

Among the 118 cases used for the final model building (AD [n = 24], MCI [n = 29], FTLD [n = 12], CBS [n = 3], PSP [n = 3], psychiatric disorders [n = 5], HC [n = 42]), 45 cases (38.1%) were Aβ-positive and 73 cases (61.9%) were Aβ-negative. The demographic and clinical characteristics are shown in Table 1.

Table 1 Demographic and clinical characteristics.

Model performance

The final model (C = 0.01, gamma = 100, number of ICs = 7), which used brain images, cognition, and APOE data as input features achieved 89.8% accuracy (sensitivity = 88.4%, specificity = 90.7%, positive predictive value = 84.4%, negative predictive value = 93.2%), whereas the model based on brain images alone showed 84.7% accuracy (sensitivity = 82.9%, specificity = 85.7%, positive predictive value = 75.6%, negative predictive value = 90.4%) (Table 2). The area under the receiver operating characteristic curve (AUC) of the final model was 0.888 (95% confidence interval [CI] 0.854–0.973) and that of the brain images alone model was 0.830 (95% CI 0.825–0.958) (Fig. 1). The final model performance based on the combination of each feature set is presented in Table 2.
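For reference, the performance measures reported above follow from the four cells of a binary confusion matrix. The sketch below uses hypothetical counts chosen only to illustrate the definitions; they are not the counts of the present study, and the function name is ours.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "ppv": tp / (tp + fp),          # true positives among predicted positives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
    }


# Hypothetical counts for illustration (150 cases in total).
metrics = binary_metrics(tp=40, fp=5, tn=95, fn=10)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are conditioned on the reference standard (here, the amyloid-PET result), whereas the predictive values are conditioned on the model output and therefore depend on the prevalence of Aβ-positivity in the sample.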

Table 2 Model performance using brain image, cognition, and APOE data for input features (by feature set).
Figure 1

The area under the curve (AUC) of the final model and the brain image-alone model. The area under the receiver operating characteristic curve (AUC) of the final model (a) was 0.888 (95% CI 0.854–0.973), and that of the brain image-alone model (b) was 0.830 (95% CI 0.825–0.958).

Table 3 shows the performance of the final model to predict Aβ positivity in each diagnosis. The final model achieved an accuracy of 89.8% when including all the participants. The accuracy of the model based on AD, MCI, and HC was slightly lower (i.e. 88.4%), whereas that based solely on MCI was the lowest (i.e. 75.9%). Notably, Aβ-positivity/negativity was completely (i.e. 100%) identified in FTLD syndromes and in psychiatric disorders.

Table 3 Model performance using brain image, cognition, and APOE data for input features (by diagnosis set).

SBM

Seven ICs (IC 1–7) were derived from the final SBM model (Table 4 and Additional Fig. 2). Each component showed spatially maximally independent GM volume patterns. Upon examining the relationship between each component and clinical information, IC 1 showed a significant correlation with cognitive measures and Aβ-positivity. Meanwhile, IC 4 was significantly correlated with age (Table 4).

Table 4 Relation between each independent component and clinical data.

We assessed whether each clinical diagnosis was associated with the ICs. Only AD-diagnosis and IC 1 showed a significant association (Games-Howell test was applied for multiple comparisons, p < 0.001), whereas the other diagnoses were not associated with any ICs. The GM volume pattern of IC 1 is shown in Fig. 2. The spatial pattern of the loading coefficients from IC 1 showed higher z-scores in the lateral parietal lobes than in the other ICs.

Figure 2

The gray matter volume pattern of independent component 1 in a three-dimensional brain map derived from source-based morphometry. A three-dimensional brain map of independent component 1. The color bar indicates the z-score. The z-score is calculated as (value − mean) / standard deviation, and regions with z-scores greater than or equal to 1 are color-coded. The 3D image was generated using BrainNet Viewer 1.7 (https://www.nitrc.org/projects/bnv).

Feature importance of the model

The SHAP values were calculated (Fig. 3), in which IC 1 showed the strongest impact on the model, followed by Logical Memory I and II, IC 3, and APOE x/ε4.

Figure 3

Mean SHAP value in fivefold cross-validation. The horizontal and vertical axes represent the mean SHAP value in fivefold cross-validation and features, respectively. (a) Shows the relationship between each feature and the absolute value of SHAP in the analysis. A large absolute SHAP value indicates a significant influence on the prediction. (b) Shows the SHAP values for each participant. This plot summarizes how the top features in the dataset affect the output of the model in the form of information density. The x position of the dots is based on the SHAP value of the feature, and the dots are stacked along each feature row to indicate density. Positive and negative SHAP values were associated with Aβ-positivity and Aβ-negativity, respectively. The red dots indicate high values for each feature, while the blue dots indicate low values for each feature. If the red dots are in the positive SHAP range, then the higher the feature value, the more it contributes to Aβ-positivity. Likewise, if the blue dots are in the positive SHAP range, the lower the feature value, the more it contributes to Aβ-positivity. For example, lower scores on immediate and delayed recall of Logical Memory (i.e., LM I and LM II) were associated with Aβ-positivity. IC independent component, JART Japanese Adult Reading Test, LM Logical Memory, SHAP SHapley Additive exPlanations, TMT-J The Japanese version of Trail Making Test, WF Word Fluency.

Discussion

Using SBM, our machine learning model predicted Aβ-positivity with an accuracy of 89.8% and an AUC of 0.888 based on brain MRI, cognitive, and genetic data from 118 participants. It also correctly predicted Aβ-positivity/negativity in non-AD participants, such as those with FTLD syndrome and psychiatric disorders. Even a model based solely on brain images achieved 84.7% accuracy and an AUC of 0.830. Among all the covariates in the final model, IC 1 had the strongest impact related to Aβ-positivity prediction, followed by Logical Memory I and II. This suggests that our model may be beneficial in clinical settings.

Model performance

Our model yielded the best accuracy (i.e. 89.8%) when it included non-AD cases, whereas the model based only on the AD continuum achieved slightly lower accuracy (i.e. 88.4%). It can be interpreted that the heterogeneity of clinical features among non-AD participants was informative in refining the accuracy of the final model.

While numerous machine learning models based on brain images have been developed, most of them have focused on the clinically determined AD continuum20,24,25,26,27, and predicted the clinical diagnoses of AD instead of imaging/pathology-based Aβ deposition18,28.

As patients visiting physicians’ offices would have various neurocognitive disorders beyond the AD continuum18,26,27, our model, which was based on diverse clinical populations, may be better suited for application in clinical settings. Even our model based only on structural brain images, which yielded an accuracy of 84.7%, may assist clinicians in deciding on and screening potential candidates for AD-related clinical trials. These results may be due to the advantages of SBM, namely its ability to detect subtle morphological changes and unknown patterns in brain structures associated with neurodegenerative diseases without relying on existing atlases30,35. These strengths could be exploited in a patient population with diversified diseases, as in this study.

Our model achieved a predictive accuracy of 75.9% for Aβ-positivity in individuals with MCI. Notably, it surpassed the accuracy of the physicians’ clinical diagnosis of AD, which is approximately 70%3. Furthermore, our model demonstrated predictive accuracy comparable to previous studies that aimed to predict Aβ-positivity26 or future AD diagnosis in MCI patients using structural MRI20.

While no definitive treatment is currently available to slow the progression of AD54, new drugs aimed at disease-modifying therapies are being approved in some countries55. In the context of the growing availability of disease-modifying drugs for AD, accurate and early diagnosis will become a higher priority55. Although Aβ deposition is one of the earliest detectable pathological changes in AD2,6,8,19, its detection by PET or CSF test may be hampered by the need for specialized facilities, length of time required, or some degree of invasiveness or risk14,15,16. Since MRI is safe and applicable to a wide population, an MRI-based Aβ prediction model based on a heterogeneous population may be valuable for clinicians.

Feature importance

SHAP analyses indicated that IC 1, LM I, and LM II were important predictive features. The impact of each of these three leading features was at least twice as strong as that of the other features.

IC 1, the most important feature in our model, was significantly correlated with Aβ-positivity (r = 0.516) and most of the cognitive measures included in the analyses, as shown in Table 4. Furthermore, the spatial pattern of the loading coefficients from IC 1 roughly followed the “cortical pattern” of neurodegeneration in AD that is characterized by cortical atrophy, particularly in the parietal lobe56, as depicted in Fig. 2. The parietal lobe, including the precuneus, is known to contribute to episodic memory57,58,59, which is likely to be impaired in AD60,61, and is possibly associated with Aβ pathology62,63. In our study, however, another “typical AD” pattern56, medial temporal lobe (MTL) atrophy64, was not observed in any IC. One possibility is that MTL atrophy does not necessarily indicate Aβ pathology, but may be a signal for tau pathology, such as primary age-related tauopathy65 or coexistent transactive response DNA-binding protein 43 pathology66. These clinicopathological relationships may explain why IC 1 was of greater importance in the prediction and represented the AD-related GM volume pattern.

The importance of Logical Memory scores indicated that memory impairment, a typical cardinal symptom of AD67, will also be essential for prediction.

APOE-ε4, a widely-accepted AD risk factor68, was also indicated as an important feature, as both “APOE x/ε4” (i.e., ε2/ε4, ε3/ε4, ε4/ε4) and “APOE ε4_number” (i.e., pairwise or not) showed large SHAP values.

Interestingly, all ICs showed greater importance than demographic and cognitive features, including scores on the MMSE, an assessment scale suitable primarily for screening for dementia. Among the ICs, IC 4 was uniquely extracted as a normal aging GM volume pattern (Additional Fig. 3) and lacked any significant association with cognitive measures or Aβ-positivity (Table 4). The separate associations between IC 1 and Aβ-positivity and between IC 4 and age might indicate that our model discriminates AD-related neurodegeneration from normal aging in brain imaging. These results imply that the pathological process of AD is not necessarily age dependent. In other words, brain atrophy patterns in normal aging processes can be distinguished from those in neurodegenerative diseases51, even though the deposition of Aβ plaques is likely to increase with age, and several age-related pathologies may be comorbid with AD69,70.

Overall, the SHAP analyses imply that SBM-derived GM volume patterns and Logical Memory results might be important for predicting Aβ-positivity across diverse neurocognitive disorders.

Limitations

This study has some limitations. First, Aβ-positivity was determined only by amyloid-PET scan, whereas CSF Aβ would be a more sensitive marker, particularly in the pre-clinical status9. Second, the number of samples in machine learning is expected to affect accuracy71; however, our study had a limited number of samples. Therefore, future studies will require larger sample sizes and independent test datasets72. Third, longitudinal follow-up data might improve model performance, rather than a cross-sectional approach73.

Conclusions

Our model achieved 89.8% accuracy in predicting Aβ-positivity across a diverse range of neurological and psychiatric disorders. Notably, the SBM revealed a GM volume pattern that had the strongest impact on prediction. Even when using structural brain images alone, the accuracy still reached 84.7%. This MRI-based data-driven machine learning approach may aid clinicians in patient management and early decision-making processes.