Introduction

Stroke survivors frequently experience difficulties with daily activities and reduced quality of life1. Between 36% and 89% of stroke survivors experience one or more functional disabilities2. Therefore, assessing activities of daily living (ADL) is crucial for clinicians to determine treatment methods and enhance patients' independence3. To be clinically useful and to reduce the burden on clinicians and patients, an ADL measure should be short, reliable, and valid4.

ADL comprises basic ADL (BADL) and instrumental ADL (IADL)5. BADL measures, such as the Barthel index (BI) and the Functional Independence Measure, tend to have ceiling effects, whereas IADL measures tend to have floor effects6. The BI is a public-domain ADL measure commonly used in clinical trials to assess patients with stroke7. It assesses 10 BADL activities: bowel management, bladder control, grooming, feeding, toilet use, transferring, mobility, dressing, climbing stairs, and bathing8. These activities are fundamental to living in a social world. Many IADL scales are closely related to environmental performance9; they comprise varying numbers of items covering housework, work/leisure, outdoor activities, dressing outside, shopping, and eating with guests9. However, assessing all BADLs and IADLs at once is time-consuming and can be physically demanding for both the patient and the clinician. Moreover, not every patient needs to be evaluated on every activity. For example, patients who have been bedridden for a long time and are completely unable to walk or climb stairs do not need to be asked about these items.

To address this problem, our previous study developed the Longshi scale (LS), a graphical tool for assessing ADL based on the International Classification of Functioning, Disability, and Health (ICF) guidelines10,11. The scale categorizes patients into three groups according to ADL capability: bedridden, domestic, and community12. Each group includes three items, which offers a practical way to shorten ADL assessment by linking it to the patient's scope of activities. The LS has demonstrated high interrater reliability (0.877–0.955) and test–retest reliability (0.921–0.984)11. Additionally, the LS can be administered effectively in remote clinical settings using smartphone video technology to assess ADL in stroke survivors13.

To streamline the LS evaluation process, we implemented the inner branching logic of the LS as a computerized adaptive test (CAT-LS)14,15. Computerized adaptive testing (CAT) is a proven method for the efficient, reliable, and valid assessment of health-related functions16,17. CAT uses item response theory to dynamically select the most relevant questions based on prior responses, yielding shorter questionnaires with greater precision18 and reducing the administrative burden on patients19,20; this principle is similar to the categorizing rule of the LS assessment.
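To illustrate the general CAT principle described above (not the CAT-LS itself, which uses a fixed two-question decision tree), the following minimal Python sketch assumes a hypothetical bank of yes/no items calibrated under a two-parameter logistic (2PL) item response model. At each step it administers the unused item with the greatest Fisher information at the current ability estimate, then updates the estimate by expected a posteriori (EAP) scoring on a grid with a standard normal prior. The item parameters, prior, grid, and stopping rule are all illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical item bank: (discrimination a, difficulty b) per item, 2PL model.
Item = Tuple[float, float]


def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL probability that a respondent with ability theta endorses the item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_theta(responses: Dict[int, bool], bank: List[Item]) -> float:
    """Expected a posteriori ability estimate on a grid with a N(0, 1) prior."""
    grid = [x / 10.0 for x in range(-40, 41)]  # theta from -4.0 to 4.0
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for idx, endorsed in responses.items():
            p = p_endorse(t, *bank[idx])
            w *= p if endorsed else 1.0 - p  # multiply in each response likelihood
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def next_item(theta: float, bank: List[Item], used: Dict[int, bool]) -> int:
    """Pick the unused item with maximum information at the current estimate."""
    unused = [i for i in range(len(bank)) if i not in used]
    return max(unused, key=lambda i: information(theta, *bank[i]))


def run_cat(bank: List[Item], answer: Callable[[int], bool], max_items: int = 5):
    """Administer up to max_items items adaptively; return (theta, responses)."""
    responses: Dict[int, bool] = {}
    theta = 0.0  # start at the prior mean
    for _ in range(min(max_items, len(bank))):
        i = next_item(theta, bank, responses)
        responses[i] = answer(i)
        theta = eap_theta(responses, bank)
    return theta, responses


if __name__ == "__main__":
    demo_bank = [(1.2, -2.0), (1.0, -1.0), (1.5, 0.0), (1.0, 1.0), (1.3, 2.0)]
    # Simulated respondent who endorses only the easier items.
    theta, responses = run_cat(demo_bank, answer=lambda i: demo_bank[i][1] < 0.5, max_items=3)
    print(round(theta, 2), sorted(responses))
```

Because each new item is chosen where it is most informative about the current estimate, fewer items are needed for a given precision; the CAT-LS achieves a similar reduction with a predefined branching rule instead of an item bank.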

Nevertheless, the validity and interrater reliability of the CAT-LS in hospitalized stroke survivors have not been established. This study therefore aimed to evaluate the concurrent validity, internal consistency, and interrater reliability of the CAT-LS in stroke survivors. Given the robust psychometric properties of conventional LS assessments11,13,15,21, we hypothesized that the CAT-LS would be a reliable and valid instrument for assessing ADL in this population.

Methods

Study design and participants

This multicenter cross-sectional study was conducted in 103 hospitals in 23 cities in China from September 2018 to August 2019 and enrolled 7151 stroke survivors through cluster sampling. The inclusion criteria were (1) age between 18 and 90 years and (2) a diagnosis of stroke. The type of stroke was determined from the initial diagnosis in the medical history, according to the 10th revision of the International Classification of Diseases22. The only exclusion criterion was a diagnosis of subarachnoid hemorrhage.

CAT-LS development procedure

The CAT-LS was developed in three phases. In phase 1, 11 items from the BI, IADL scales, and ICF were selected to form the traditional paper version of the LS: bladder and bowel management, feeding, entertainment, toileting, grooming and bathing, cooking, community mobility, shopping, social interaction, transfer out of bed and return, and transfer out of the door and return (Table 1)23,24. Of these 11 items, seven originated from the BI, and the remaining four were derived from the ICF and IADL scales. Over 80% of the ICF items focused primarily on “activities”, which encompass “individuals performing specific tasks or actions”, and on “physical functions”24. However, patients' social participation needs should not be overlooked24; therefore, the four social participation items from the ICF and IADL scales that were of most concern to patients in a previous study were selected (Table 1)11. In phase 2, a CAT system based on the LS decision tree was built, and validation data were collected using a smart mobile application named ‘Quicker Recovery Line (QRL)’23,25. In phase 3, the concurrent validity and interrater reliability of the CAT-LS, and its correlation with the BI, were calculated in stroke survivors.

Table 1 Comparison of the proposed CAT-LS, BI, and ICF.

Data collection

The assessment was conducted using smart mobile devices (mobile phones or tablets). Before the formal assessment, the investigators explained the study contents to all participants.

All data were recorded and uploaded via the ‘QRL’ application. First, a healthcare professional logged into a QRL account and created electronic forms online, and the demographic information of all participants, including basic information and health status, was recorded. Second, patients' BI and CAT-LS scores were collected through face-to-face interviews with stroke survivors, caregivers, and doctors21,25. Once submitted, the data could not be changed. Missing data were handled by re-interviewing participants. The data were drawn from our previous study23.

ADL evaluation

We assessed ADL using two instruments: the CAT-LS and the BI26,27. On the first day, the order in which the CAT-LS and BI were administered was randomized for each assessment. The CAT-LS and BI were administered by medical professionals and therapists, respectively. To examine agreement between raters, the same assessors evaluated the ADL of the stroke survivors again on the second day to estimate the interrater reliability of the CAT-LS. The two instruments were administered consecutively during a single clinical visit, and the time taken for each assessment was recorded in seconds.

The BI was implemented in an electronic format with the same specifications as the paper version. Physicians and therapists could submit assessment results only after all items had been scored. The BI consists of 10 ADL tasks (feeding, bathing, grooming, dressing, bowel management, bladder management, toileting, bed/chair transfer, mobility (walking or wheelchair use), and climbing stairs), each scored according to the level of physical assistance required to complete the task28. Most items were scored 0, 5, or 10, indicating inability to perform the task, need for assistance, and independent performance, respectively. Bathing and grooming were scored 0 or 5, indicating inability to perform the task and independent performance, respectively. Bed/chair transfer and mobility were scored from 0 to 15, where scores of 5–10 indicated the need for assistance and a score of 15 indicated independent performance. Item scores sum to a total of 0–100 points (Table 2).
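As a concrete check on the scoring rules, the sketch below validates and sums BI item scores. It assumes the conventional Barthel weighting (bathing and grooming 0 or 5; bed/chair transfer and mobility 0–15 in steps of 5; all other items 0–10 in steps of 5), whose maxima sum to 100; the item keys are hypothetical names chosen for illustration, not field names from the QRL application.

```python
from typing import Dict, Tuple

# Allowed scores per BI item under the conventional Barthel weighting (assumed).
BI_ALLOWED: Dict[str, Tuple[int, ...]] = {
    "feeding": (0, 5, 10),
    "bathing": (0, 5),
    "grooming": (0, 5),
    "dressing": (0, 5, 10),
    "bowels": (0, 5, 10),
    "bladder": (0, 5, 10),
    "toileting": (0, 5, 10),
    "transfer": (0, 5, 10, 15),  # bed/chair transfer
    "mobility": (0, 5, 10, 15),  # walking or wheelchair use
    "stairs": (0, 5, 10),
}


def bi_total(scores: Dict[str, int]) -> int:
    """Validate a complete set of BI item scores and return the 0-100 total."""
    missing = set(BI_ALLOWED) - set(scores)
    if missing:
        # Mirrors the rule that results can be submitted only when all items are scored.
        raise ValueError(f"missing items: {sorted(missing)}")
    for item, value in scores.items():
        if item not in BI_ALLOWED:
            raise ValueError(f"unknown item: {item}")
        if value not in BI_ALLOWED[item]:
            raise ValueError(f"invalid score {value} for {item}")
    return sum(scores.values())


# Sanity check: the item maxima add up to 100 points.
assert sum(max(v) for v in BI_ALLOWED.values()) == 100
```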

Table 2 Barthel index scores and the capability to perform ADL.

The CAT-LS used item response theory (IRT), a statistical framework that fits a model to response data to estimate both item difficulty and respondent ability29. The CAT-LS decision tree functions as a flowchart: it begins with a primary question, “Can you get off the bed?”, and branches according to the answer; a second question, “Can you go outside?”, splits the branches further, giving a visual representation of a comprehensive ADL assessment. This approach allows a patient's ADL to be grasped quickly. At the start of the assessment, participants were asked about their ability to get in and out of bed30. If they answered “No”, they were categorized into the bedridden group, and only the items in Form 1 (bladder and bowel management, feeding, and entertainment) were displayed on the application interface (Fig. 1). If they answered “Yes”, they were asked a second question about their ability to travel outside their house and return. If they answered “No”, they were categorized into the domestic group, and only the items in Form 2 (toileting, personal cleaning (grooming and bathing), and housework) were shown. If they answered “Yes”, they were categorized into the community group, and only the items in Form 3 (community mobility, shopping, and social participation) were evaluated (Fig. 1). Each individual was thus evaluated only on the items corresponding to their functional level, which allows the functional level to be estimated accurately while improving evaluation efficiency.
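The two routing questions can be expressed as a small function. The minimal Python sketch below is an illustration, not the QRL implementation; the function and type names, the group labels, and the question-counting convention (routing questions plus the three subscale items) are assumptions based on the description above.

```python
from typing import List, NamedTuple, Optional

# Subscale items shown for each group (Forms 1-3 as described in the text).
FORMS = {
    "bedridden": ["bladder and bowel management", "feeding", "entertainment"],
    "domestic": ["toileting", "personal cleaning", "housework"],
    "community": ["community mobility", "shopping", "social participation"],
}

BI_ITEM_COUNT = 10  # the full BI always asks all 10 items


class Routing(NamedTuple):
    group: str
    items: List[str]
    questions_asked: int  # routing questions + subscale items


def route(can_get_off_bed: bool, can_go_outside: Optional[bool] = None) -> Routing:
    """Apply the two CAT-LS routing questions and return the form to administer."""
    if not can_get_off_bed:
        # "Can you get off the bed?" -> No: only one routing question is needed.
        return Routing("bedridden", FORMS["bedridden"], 1 + 3)
    if can_go_outside is None:
        raise ValueError("'Can you go outside?' must be answered for ambulant patients")
    group = "community" if can_go_outside else "domestic"
    return Routing(group, FORMS[group], 2 + 3)


for answers in [(False, None), (True, False), (True, True)]:
    r = route(*answers)
    reduction = 1 - r.questions_asked / BI_ITEM_COUNT
    print(r.group, r.questions_asked, f"{reduction:.0%} fewer questions than the BI")
```

The loop reports 4, 5, and 5 questions for the bedridden, domestic, and community paths, i.e., 60% and 50% fewer questions than the 10-item BI.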

Figure 1

Process for CAT-LS assessment.

After categorization, patients in each group were evaluated on a 3-point Likert scale for the three items of their group: (1) bedridden (bladder and bowel management, feeding, and entertainment), (2) domestic (toileting, grooming, and housework), and (3) community (community mobility, shopping, and social participation). Each item had three response options representing different ability levels, each presented as a situational picture (Fig. 1). Items were scored 1 for maximum or complete dependence, 2 for partial independence, and 3 for maximum or complete independence. The evaluation was complete once all three items on the subscale had been answered. The total score of each group therefore ranged from 3 to 9 (Fig. 1)30; on this basis, the CAT-LS result was assigned one of six grades, and the CAT procedure ended (Table 3).
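The subscale scoring rule amounts to summing three 1–3 ratings. In the sketch below, the rating labels follow the text; the mapping from subscale totals to the six CAT-LS grades is given in Table 3 and is deliberately not reproduced, so the function stops at the 3–9 total. The function name is hypothetical.

```python
from typing import Sequence

# Meaning of each rating on the 3-point scale, as described in the text.
RATING_LABELS = {
    1: "maximum or complete dependence",
    2: "partial independence",
    3: "maximum or complete independence",
}


def subscale_total(ratings: Sequence[int]) -> int:
    """Sum the three item ratings of the administered form (range 3-9)."""
    if len(ratings) != 3:
        raise ValueError("each CAT-LS form has exactly three items")
    for r in ratings:
        if r not in RATING_LABELS:
            raise ValueError(f"rating must be 1, 2, or 3, got {r}")
    return sum(ratings)


# Example: item ratings of 2, 2, and 1 on one form give a subscale total of 5.
print(subscale_total([2, 2, 1]))
```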

Table 3 Scoring guidelines for the ICF qualifiers.

Automatic quality control process

In this multicenter study, evaluators at the participating centers recorded CAT-LS and BI data on smartphones, and the data were transmitted to a cloud server through the application named ‘QRL’25. Data quality was checked by the built-in automatic quality control system of the cloud server. If the data recorded by an evaluator on a given day failed quality control, all of that evaluator's data from that day were discarded. The built-in quality control logic is presented in Table 4.
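The day-level discard rule can be sketched as follows. The record fields (`evaluator`, `date`, `seconds`) and the `too_fast` check are hypothetical placeholders chosen for illustration; the actual checks applied by the QRL server are those listed in Table 4.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def apply_daily_qc(records: List[Record], is_flagged: Callable[[Record], bool]) -> List[Record]:
    """Drop every record in an (evaluator, date) batch if any record in it fails QC."""
    batches: Dict[Tuple[Any, Any], List[Record]] = defaultdict(list)
    for rec in records:
        batches[(rec["evaluator"], rec["date"])].append(rec)
    kept: List[Record] = []
    for batch in batches.values():
        if not any(is_flagged(rec) for rec in batch):
            kept.extend(batch)  # the whole day's work for this evaluator passes
    return kept


# Hypothetical check for illustration only: flag implausibly fast assessments.
def too_fast(rec: Record) -> bool:
    return rec["seconds"] < 5
```

Discarding the whole batch rather than a single record reflects the rule above: one failed record casts doubt on all of that evaluator's data for the day.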

Table 4 Inner logic for automatic quality control.

Statistical analysis

Statistical analyses were performed using SPSS Statistics 25. The Kruskal–Wallis test, t-test, and chi-square test were used to compare participant characteristics among the CAT-LS groups (i.e., bedridden, domestic, and community groups). Differences in the original BI total and item scores among the CAT-LS groups were described using means and standard deviations.
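The study used SPSS; purely as a transparent illustration of the Kruskal–Wallis test for three groups, the sketch below computes the H statistic with average ranks and the standard tie correction. With three groups, H is referred to a chi-square distribution with 2 degrees of freedom, whose survival function is exactly exp(−H/2), so no statistics library is needed. The chi-square approximation is appropriate for reasonably large groups; the scores in the example are invented and are not study data.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        mean_rank = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(g1: Sequence[float], g2: Sequence[float],
                     g3: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and chi-square (df = 2) p-value for three groups."""
    groups = [list(g1), list(g2), list(g3)]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        rank_term += sum(r) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_counts = {}
    for x in pooled:
        tie_counts[x] = tie_counts.get(x, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction
    # With 3 groups, df = 2 and the chi-square survival function is exp(-H / 2).
    return h, math.exp(-h / 2.0)


# Invented BI-like totals for three groups.
print(kruskal_wallis_3([10, 15, 20, 25], [40, 55, 60, 70], [85, 90, 95, 100]))
```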

Internal consistency and concurrent validity

The raw score of each CAT-LS item was used to predict the total BI score, using generalized linear models with the total BI score as the dependent variable and the LS items as independent variables. The internal consistency of the CAT-LS was assessed using Cronbach's α and corrected item-total correlations. Pearson correlation analysis and multiple linear regression analysis were used to evaluate the concurrent validity of the CAT-LS. Concurrent validity was considered strong31 when Pearson's correlation coefficient was ≥ 0.75.
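Cronbach's α and the corrected item-total correlation can be computed directly from their definitions: α = k/(k − 1) × (1 − Σ item variances / variance of the total score), and each item's corrected correlation is its Pearson correlation with the total of the remaining items. The sketch below is a plain-Python illustration, not the SPSS procedure used in the study; it assumes complete data (no missing responses), laid out as one list of scores per item.

```python
import math
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(items: List[Sequence[float]]) -> float:
    """Cronbach's alpha; items[j][i] is respondent i's score on item j."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total score
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1.0 - item_var / variance(totals))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(items: List[Sequence[float]]) -> List[float]:
    """Correlation of each item with the sum of the other items (item excluded)."""
    totals = [sum(scores) for scores in zip(*items)]
    return [pearson(scores, [t - s for t, s in zip(totals, scores)]) for scores in items]
```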

Interrater reliability

The interrater reliability of the CAT-LS was evaluated using the kappa coefficient (κ) and the intraclass correlation coefficient based on a two-way random-effects model, ICC(2,1)32,33. ICC values were categorized as ‘poor’ (ICC < 0.5), ‘moderate’ (0.5–0.75), ‘good’ (0.75–0.9), and ‘excellent’ (ICC > 0.9)33. Kappa values were categorized as ‘poor’ (κ < 0.20), ‘fair’ (0.21–0.40), ‘moderate’ (0.41–0.60), ‘good’ (0.61–0.80), and ‘very good’ (0.81–1.00) agreement32.
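To show what ICC(2,1) and κ measure, the sketch below implements the Shrout–Fleiss two-way random-effects, absolute-agreement, single-rater ICC from ANOVA mean squares, and Cohen's κ for two raters. It is an illustrative re-implementation under the assumption of complete data, not the SPSS procedure used in the study; ratings are arranged with one row per patient and one column per rater.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def icc_2_1(ratings: List[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is rater j's score for subject i (complete data assumed).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)  # between subjects
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)  # between raters
    sse = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    mse = sse / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def cohen_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning categories to the same subjects."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_exp = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / (n * n)  # chance agreement
    return (p_obs - p_exp) / (1.0 - p_exp)
```

Because ICC(2,1) measures absolute agreement, a rater who scores every patient one point higher lowers the ICC even though the two raters rank patients identically.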

Suppliers

A smart mobile application named “Quicker Recovery Line (QRL)”.

Ethics approval and consent to participate

This study protocol was approved by the Medical Ethics Committees of Shenzhen Second People’s Hospital. The study was registered in the Chinese Clinical Trial Registry (No.: ChiCTR2000034067) on June 22, 2020. Informed consent was obtained from all inpatients or their proxies before their information was collected. All authors confirmed that all methods were carried out in accordance with the research protocol approved by the ethics committee.

Results

Population characteristics

Table 5 presents the baseline demographic characteristics of the 7151 stroke survivors. The CAT-LS classified 4020 (56.2%), 2050 (28.7%), and 1081 (15.1%) patients into the bedridden, domestic, and community categories, respectively. The mean age of the participants was 67.6 ± 15.0 years, and mean age differed significantly among the three groups. Hypertension was the most common comorbidity in the community group, affecting 672 (62.16%) participants, followed by diabetes mellitus (255; 23.63%), heart disease (172; 15.93%), hyperlipidemia (116; 10.75%), and kidney disease (25; 2.31%).

Table 5 Demographic information on stroke survivors.

Classification and regression tree of ADL assessment using CAT-LS

Table 6 displays the BI total scores of the three groups: bedridden (18.7 ± 18.8), domestic (64.8 ± 18.9), and community (93.5 ± 12.8). The bedridden group scored lower than the domestic group on the following BI items (bedridden group values shown): bathing (0.0 ± 0.3), grooming (0.3 ± 1.2), dressing (1.1 ± 2.2), toileting (1.0 ± 2.0), bed/chair transfer (2.8 ± 3.7), walking (1.2 ± 2.8), and climbing stairs (0.2 ± 0.9). In the community group, scores for feeding (9.6 ± 1.5), grooming (4.6 ± 1.3), dressing (8.9 ± 2.2), bowel management (9.9 ± 0.8), bladder management (9.9 ± 1.0), toileting (9.3 ± 1.7), bed/chair transfer (14.4 ± 2.0), and walking (14.0 ± 2.4) approached the item maxima of 5, 10, or 15 points (Table 6, Fig. 2). Notably, bed/chair transfer and walking scores were almost perfect in the community group. The CAT-LS classifications based on the decision-tree scoring model were consistent with the scores for each BI item. Moreover, the median scores of all 10 BI items differed significantly among the three CAT-LS groups (Table 7).

Table 6 BI scores in three CAT-LS groups.
Figure 2

BI scores of each item among the three CAT-LS groups.

Table 7 Comparisons of BI-item scores among three CAT-LS groups.

CAT-LS results (CAT-LS grades and LS scores) had a strong correlation with BI scores

Table 8 displays the results of the correlation analysis between the CAT-LS and BI scores of stroke survivors. A strong correlation was observed between the CAT-LS and BI scores (Pearson’s r = 0.894, P < 0.0001), indicating that the CAT-LS grades and item scores were good indicators of the functional status of stroke survivors. Pearson’s r values ranged from 0.529 to 0.799 (P < 0.001) between CAT-LS items and BI total scores and from 0.600 to 0.856 (P < 0.001) between CAT-LS grades and BI items.

Table 8 Correlation of CAT-LS scores with BI total score and BI item scores.

A scatter plot of the CAT-LS results against BI total scores, with the fitted regression line, is shown in Fig. 3. The plot indicated a positive, approximately linear relationship between CAT-LS results and BI total scores, with a coefficient of determination (R²) of 0.874; that is, the CAT-LS results explained approximately 87% of the variance in BI total scores. To describe the relationship between BI and CAT-LS scores, we propose the following formula: BI total score = −44.9 + 30.44 × LS grade + 16.14 × (F1a/F2a/F3a) + 6.79 × (F1b/F2b/F3b) − 3.04 × (F1c/F2c/F3c) (R² = 0.874), where the terms in parentheses are the scores of the first, second, and third items of whichever form (Form 1, 2, or 3) the patient completed. F1a, F1b, and F1c represent the scores of the bladder and bowel management, feeding, and entertainment items; F2a, F2b, and F2c represent the scores of the toileting, grooming and bathing, and housework items; and F3a, F3b, and F3c represent the scores of the community mobility, shopping, and social participation items.
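The reported regression equation can be evaluated directly, as in the sketch below. The function and argument names are hypothetical: `grade` is the CAT-LS grade, and `item_a`, `item_b`, and `item_c` are the first, second, and third item scores of whichever form the patient completed, all assumed to be coded as in the study (Table 3), which is not restated here. The output is the model's predicted BI total, not an observed score, and it is not clipped to the 0–100 range.

```python
def predicted_bi_total(grade: float, item_a: float, item_b: float, item_c: float) -> float:
    """Predicted BI total from the reported CAT-LS regression (R^2 = 0.874)."""
    return (
        -44.9
        + 30.44 * grade
        + 16.14 * item_a  # F1a / F2a / F3a
        + 6.79 * item_b   # F1b / F2b / F3b
        - 3.04 * item_c   # F1c / F2c / F3c
    )


# Example with hypothetical inputs: grade 2 and item scores of 2, 2, and 2.
print(round(predicted_bi_total(2, 2, 2, 2), 2))
```

For these inputs the equation gives 55.76 points (−44.9 + 60.88 + 32.28 + 13.58 − 6.08).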

Figure 3

Scatter diagram illustrating the CAT-LS results (CAT-LS grades and LS scores) and BI total scores.

Floor and ceiling effects, internal consistency, and interrater reliability of the CAT-LS

Table 9 presents the floor and ceiling effects, internal consistency, interrater reliability, and concurrent validity of the CAT-LS. The floor and ceiling effects of the CAT-LS grade were 19.2% and 11.7%, respectively. Internal consistency, measured using Cronbach's α, was acceptable to high for all three subscales (bedridden, 0.847; domestic, 0.723; community, 0.868). The corrected item-total and mean inter-item correlations were > 0.4, indicating good internal consistency. Interrater reliability was assessed using ICC(2,1) and kappa values. The ICC(2,1) values were high for all three subscales and for the CAT-LS grade (bedridden group, 0.974; domestic group, 0.928; community group, 0.979; CAT-LS grade, 0.964). Kappa values ranged from 0.898 to 0.927, 0.837 to 0.877, and 0.841 to 0.866 for the bedridden, domestic, and community groups, respectively, indicating very good agreement. Spearman's correlation coefficients for the bedridden, domestic, and community groups were 0.852, 0.764, and 0.685, respectively, indicating good to strong concurrent validity.
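Floor and ceiling effects are the percentages of respondents at the lowest and highest possible scores, judged here against the 20% level used in this study. The minimal sketch below assumes CAT-LS grades coded 1 (lowest) to 6 (highest), which is a hypothetical coding; the sample grades are invented.

```python
from typing import Sequence, Tuple


def floor_ceiling(scores: Sequence[int], lowest: int, highest: int) -> Tuple[float, float]:
    """Return (% of respondents at the lowest score, % at the highest score)."""
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == lowest) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == highest) / n
    return floor_pct, ceiling_pct


def acceptable(pct: float, threshold: float = 20.0) -> bool:
    """Treat a floor or ceiling effect below the threshold (default 20%) as acceptable."""
    return pct < threshold


grades = [1, 1, 2, 3, 3, 4, 4, 5, 5, 6]  # invented example, coded 1-6
floor_pct, ceiling_pct = floor_ceiling(grades, lowest=1, highest=6)
print(floor_pct, ceiling_pct, acceptable(floor_pct), acceptable(ceiling_pct))
```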

Table 9 Distribution, internal consistency, reliability, and validity of CAT-LS.

Comparison of the question burden and time consumption between CAT-LS and BI

The CAT-LS required fewer questions than the complete 10-item BI: four for the bedridden group (one routing question plus three items) and five each for the domestic and community groups (two routing questions plus three items), representing 60% and 50% reductions in question burden, respectively. Administration time was significantly shorter for the CAT-LS than for the BI, with median differences ranging from 9.6 to 23.7 s (Table 10).

Table 10 Time consumption comparison of CAT-LS and BI.

Discussion

Our findings demonstrate that the CAT-LS has robust concurrent validity and interrater reliability. The CAT-LS requires only four or five questions, compared with the 10 items of the BI, reducing the number of questions by 50–60%. In addition, the CAT-LS takes significantly less time to administer, with an average duration of 19.6–25.1 s, roughly half the time required for the BI. These results support the reliability, validity, and efficiency of the CAT-LS in hospitalized stroke survivors, and healthcare practitioners can use it to reduce the assessment burden on both patients and assessors.

The CAT-LS categorized the 7151 patients into three groups (bedridden, domestic, and community) according to their ADL level, and BI total scores differed significantly among these groups, consistent with our previous findings using the LS in neurological diseases14. Thus, the CAT-LS decision-tree scoring method matched the ADL levels of stroke survivors. A large proportion (56.2%) of stroke survivors were classified into the bedridden group, likely because the participants were hospitalized and therefore had poorer ADL function than community-dwelling stroke survivors.

Significant differences were observed in the median scores of each BI item across the three CAT-LS groups. Specifically, the bedridden group scored lower than the domestic group on several items, including bathing, grooming, dressing, toileting, bed/chair transfer, walking, and climbing stairs. Notably, in the bedridden group, the scores for bathing and climbing stairs were consistently close to 0. In contrast, the community group achieved near-maximum scores, particularly for feeding, grooming, dressing, bowel management, bladder management, toileting, and bed/chair transfer. These findings suggest that, without directly asking patients whether they can bathe, walk, climb stairs, or use the toilet, we can reasonably infer that stroke survivors in the bedridden group are likely to require the most assistance in these activities. Conversely, assessing feeding, bladder management, and bowel management appeared unnecessary for stroke survivors in the community group, who were largely self-sufficient in these areas. Overall, the differences in BI item scores among the three CAT-LS groups indicate that evaluating every BI item is unnecessary, because group membership already indicates the approximate level of assistance each patient needs for each BI item.

The CAT-LS decision-tree scoring model classified individuals into three groups based on their transfer ability and range of mobility. Once a participant was classified into a group, only the items in that group were evaluated. A decision tree is a versatile predictive model that learns from observations and logical rules34; it represents and classifies events using a rule-based forecasting system34. A CAT algorithm selects the most appropriate next question using information from questions already answered18,35; in the CAT-LS, the subscale items evaluated were determined by the group classification.

Our results indicate that the CAT-LS has acceptable psychometric properties for evaluating ADL in stroke survivors. Concurrent validity was assessed by calculating Spearman's correlation coefficients between the CAT-LS and the BI, a commonly used ADL assessment tool27. The correlation between CAT-LS grades and the total BI score was 0.964 in hospitalized stroke survivors, indicating good concurrent validity, similar to that of the LS in other diseases11,26. The large sample size is a key strength that supports these analyses. Linear regression of BI scores on CAT-LS results yielded a coefficient of determination (R²) of 0.874, indicating that the CAT-LS results explained most of the variance in the total BI score of stroke survivors. These findings indicate that CAT-LS scores are strongly correlated with BI scores. If so, the scores of the CAT-LS items on BADL and those of the BI are comparable and could be approximately converted into each other through a linear transformation.

The floor and ceiling effects of the CAT-LS grades were 19.2% and 11.7%, respectively, both below the recommended level of 20%36,37, indicating that the tool covered the range of ADL ability without substantial floor or ceiling effects. The internal consistency of the three CAT-LS groups, measured using Cronbach's α, was 0.857, 0.803, and 0.912, respectively. Cronbach's α between 0.70 and 0.95 indicates good internal consistency38; therefore, the internal consistency of all three groups was deemed acceptable and comparable to that of the traditional version of the LS11. Interrater reliability was high, as evidenced by ICC(2,1) values ranging from 0.928 to 0.979 and kappa values ranging from 0.898 to 0.927, 0.837 to 0.877, and 0.841 to 0.866 for the bedridden, domestic, and community groups, respectively. These results suggest that the CAT-LS is consistent with the BI in classifying ADL groups and has high interrater reliability in stroke survivors.

With approximately 2.5 million new stroke cases each year in China, the number of patients requiring ADL assessment and assistance is expected to rise substantially39. The built-in branching logic of the CAT-LS makes the evaluation easy to implement and reduces the administrative burden on both clinicians and patients. The decision tree reduces the number of questions required by 50% or 60% compared with the BI, improving measurement efficiency40. The CAT-LS therefore has great potential for use in time-pressed clinical settings to help clinicians manage stroke survivors effectively.

Study limitations

Although our findings are positive, this study had some limitations. First, because this was a cross-sectional study, we did not test the responsiveness of the CAT-LS in a clinical setting; its sensitivity to changes in ADL over time should be explored in longitudinal studies. Second, the questionnaire has so far been used only in a Chinese population, which may limit the generalizability of the findings. We plan to validate the CAT-LS in different populations, languages, and settings to establish its applicability. Third, the data were collected from hospitalized stroke survivors; participants in the acute stage of stroke are unlikely to perform IADL (e.g., washing clothes, housework, or shopping), which could have introduced information bias and may further limit generalizability. Finally, regarding methodology, we did not use structural equation modeling (SEM). Partial least squares SEM (PLS-SEM)41, a nonparametric approach, would be suitable for analyzing relationships between latent variables such as those measured by the CAT-LS and BI, especially when the sample is small or not randomly selected, the data are non-normally distributed, or the relationships between variables are complex.

Conclusions

The CAT-LS performed well as a tool for evaluating the ADL of stroke survivors, with high concurrent validity and interrater reliability. Notably, by using a decision-tree method to reduce the number of items, the CAT-LS can alleviate the assessment burden on patients and examiners. The assessment results could be incorporated into electronic medical records to further improve the efficiency of patient management in clinical settings.