Introduction

In wheelchair users with spinal cord injury (SCI), cardiorespiratory fitness is generally reduced [1]. Low cardiorespiratory fitness and low levels of physical activity have been shown to be associated with a high prevalence of cardiometabolic disease, which is the leading cause of mortality in this population [2, 3]. To increase cardiorespiratory fitness, exercise interventions such as handcycling may be introduced during or after rehabilitation [4,5,6]. To promote handcycling in the Netherlands and to increase cardiorespiratory fitness after rehabilitation, an annual handcycle race, the HandbikeBattle [7, 8], has been held in Austria since 2013. To train optimally for such events, and during or after rehabilitation in general, individualized training schemes are required.

Individualized training schemes can be based on the results of a graded exercise test (GXT). Training prescriptions based on maximum values, such as a percentage of maximum heart rate (HR) or power output (PO), are common, as are prescriptions based on a percentage of HR reserve or on ventilatory thresholds (VTs) [9]. In able-bodied individuals, training intensity prescription based on relative percentages has been shown to have downsides: it does not seem to take the individual's metabolic response to exercise into account, and it has yielded smaller improvements in maximum oxygen uptake (VO2 max) after training than training intensity based on VTs [9, 10]. Prescribing training intensities based on VTs may therefore achieve fitness gains more reliably. The first ventilatory threshold (VT1) is the point during exercise at which a nonlinear increase in carbon dioxide (CO2) production occurs, coinciding with the first increase in lactate production [11]. The second ventilatory threshold (VT2) represents the onset of exercise-induced hyperventilation relative to VCO2 in response to metabolic acidosis, and coincides with the maximal lactate steady state [11, 12]. These VTs provide boundaries from which individualized training zones can be set: zone 1 at low intensity (below VT1), zone 2 at moderate intensity (between VT1 and VT2), and zone 3 at high intensity (above VT2) [12, 13]. This training principle was developed in studies of lower-body exercise in able-bodied participants and athletes, and little or no research has addressed the reliability of VT determination during upper-body GXTs in individuals with SCI. The question therefore arises whether both VTs can be determined reliably enough to set training schemes for individuals with SCI.
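To make the three-zone model concrete, the minimal sketch below assigns an exercise intensity to zone 1, 2, or 3 from the values at VT1 and VT2. It is an illustration under stated assumptions, not the study's procedure: `Thresholds` and `training_zone` are hypothetical names, HR (bpm) is assumed as the anchoring variable (PO or VO2 at each VT would work the same way), the threshold values are invented, and values exactly at VT1 or VT2 are assumed to fall in zone 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container: HR (bpm) at VT1 and VT2 for one individual."""
    vt1: float
    vt2: float


def training_zone(value: float, t: Thresholds) -> int:
    """Return zone 1 (below VT1), 2 (VT1 to VT2, inclusive) or 3 (above VT2)."""
    if t.vt1 >= t.vt2:
        raise ValueError("VT1 must lie below VT2")
    if value < t.vt1:
        return 1
    if value <= t.vt2:  # assumption: boundary values belong to zone 2
        return 2
    return 3


thresholds = Thresholds(vt1=120.0, vt2=150.0)  # invented example values
print([training_zone(hr, thresholds) for hr in (110, 120, 135, 150, 165)])
# [1, 2, 2, 2, 3]
```

The same function can be reused with PO or VO2 at the thresholds, provided both thresholds were actually determined for that individual.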

For able-bodied leg exercise, VT1 normally occurs at 50–60% of peak oxygen uptake (VO2 peak) and VT2 at 70–80% VO2 peak [14]. These values depend, however, on cardiorespiratory fitness: in elite endurance athletes, VT1 and VT2 can rise to 75 and 90% VO2 peak, respectively [12]. Studies in able-bodied cycling showed that experienced raters are able to identify VT1 in 90–94% of participants [15, 16]. Intrarater reliability of VT1 determination was high (intraclass correlation coefficient (ICC) 0.97) in one study [17], whereas interrater reliability varied (ICC 0.21–0.98) within and between studies [16, 17]. The identification rate and reliability of VT2 determination are largely unknown; only one study has reported on this, with an intrarater reliability (ICC) of 0.94–0.96 and an interrater reliability of 0.81–0.91 [18].

Few studies have reported on VTs during upper-body exercise in individuals with SCI. In two studies, 89–96% of VT1s and 74% of VT2s could be determined in wheelchair athletes with SCI [19, 20]. In both studies, almost all undetermined VTs involved athletes with tetraplegia. Leicht et al. [19] suggested that the percentage of identifiable VT2s might be lower in athletes with tetraplegia than in able-bodied athletes because their absolute ventilatory responses are generally low, resulting in a narrower range of ventilatory values. A very recent study supports these findings: VT1 was identified in only 68% of untrained individuals with tetraplegia [21]. For VT1, Coutts et al. reported a Pearson correlation of 0.95 between two raters for athletes with paraplegia and tetraplegia [20], and Bhambani et al. reported a Pearson correlation of 0.90 between two raters for trained and untrained individuals with tetraplegia [22]. However, ICCs, which are more appropriate than Pearson correlations for assessing intrarater and interrater reliability, were not reported.

Unfortunately, no studies have reported on the reliability of determining both thresholds in a nonathlete population with SCI. It therefore remains unclear whether VTs can be used to set individualized training schemes in this less fit population. The aims of the present study were therefore:

  1. To examine whether it is possible to detect both VTs in recreationally active individuals with tetraplegia or paraplegia.
  2. To examine the interrater and intrarater reliability of VT determination.

Methods

The present study was retrospective: the data from the GXTs with 1-min increments of a previous study by Maher et al. [23] were reanalyzed to answer the research questions. Two sports physicians independently evaluated all tests in two separate sessions.

Participants

Thirty-three recreationally active individuals with SCI were recruited: 19 with paraplegia and 14 with tetraplegia (28 men; age 38 ± 10 years; time since injury (TSI) 12 ± 9 years; body mass 76 ± 19 kg; height 1.75 ± 0.08 m). They were recruited through the Miami Project to Cure Paralysis database and trained voluntarily at the Miami Project gym at least once a week. Inclusion criteria were age ≥ 18 years, non-progressive SCI, TSI of at least 6 months, and self-reported inability to use lower-extremity contractions to assist in transfers. Exclusion criteria were angina or myocardial infarction within the last month, or pain in the upper extremities [23]. Informed consent was obtained from all participants included in the study. The study was approved by the University of Miami Institutional Review Board, Miami, United States of America.

Test procedure

All GXTs were performed on an (asynchronous) arm crank ergometer (Lode Angio, Groningen, the Netherlands). Participants performed the tests in their own wheelchair, positioned with the arms slightly flexed in the furthest horizontal position; participants with tetraplegia used hand wraps to ensure a tight grip on the cranks, and wedges were used to minimize movement of the wheelchair. As individualized protocols are preferred for individuals with SCI [24, 25], the starting workload and step size were individualized for every participant based on questions regarding activity level, current fitness program, and the ability to perform a floor-to-chair transfer [23]. The aim was an individualized 1-min stepwise protocol lasting 8–12 min [26]. This resulted in a starting workload of 5–90 W and a step size of 10 W for participants with paraplegia, and a starting workload of 5–30 W and a step size of 3–10 W for participants with tetraplegia. The prescribed cadence was 60–65 rpm. Criteria to stop the test were volitional exhaustion or failure to maintain a constant cadence above 55 rpm. During the test, PO (W) was measured continuously. Gas exchange was measured breath-by-breath (Vmax Encore metabolic cart, Carefusion, Vyaire Medical, Mettawa, IL, USA), and HR was measured by standard 12-lead electrocardiography. The metabolic cart was calibrated before each test. All raw data except PO were processed using a moving average over a 15-breath window [27]. VO2 peak and HR peak were defined as the highest 15-breath average of VO2 and HR, respectively. PO peak and the PO at each VT were defined as the PO of the last completed work rate step plus half the work rate increment for any completed 30-s block in the uncompleted step [28].
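The processing rules above can be sketched as follows. This is a minimal sketch under assumptions not stated in the source: the 15-breath average is taken as an unweighted rolling mean reported only where a full window exists, and for PO the first 1-min step is assumed to start at t = 0 at the starting workload (no separate warm-up). The function names and example numbers are hypothetical.

```python
import numpy as np


def rolling_mean(breaths, window=15):
    """Unweighted moving average over `window` consecutive breaths.

    Only full windows are returned (len(breaths) - window + 1 values).
    """
    x = np.asarray(breaths, dtype=float)
    if x.size < window:
        raise ValueError("need at least `window` breaths")
    return np.convolve(x, np.ones(window) / window, mode="valid")


def peak_value(breaths, window=15):
    """Highest 15-breath average, as used for VO2 peak and HR peak."""
    return float(rolling_mean(breaths, window).max())


def power_at_time(start_w, step_w, elapsed_s):
    """PO credited at `elapsed_s` seconds into a 1-min stepwise protocol.

    Assumption: step 1 starts at t = 0 at start_w. Returns the PO of the
    last completed step plus half the increment if a 30-s block of the
    unfinished step was completed.
    """
    completed_steps = int(elapsed_s // 60)
    if completed_steps < 1:
        raise ValueError("no completed step")
    last_completed_w = start_w + (completed_steps - 1) * step_w
    # With 1-min steps, the unfinished step holds 0 or 1 completed 30-s blocks.
    completed_half_blocks = int((elapsed_s - completed_steps * 60) // 30)
    return last_completed_w + 0.5 * step_w * completed_half_blocks


# Hypothetical test: start 20 W, 10-W steps, stopped at 7 min 40 s.
print(power_at_time(20, 10, 460))  # 85.0
```

In this example seven steps were completed (the last at 80 W), and the 40 s performed in the eighth step contain one full 30-s block, which adds half of the 10-W increment.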

Determination of ventilatory thresholds

All GXT data were presented in plots as described by Wasserman et al. [29], generated with a custom-made Matlab script tailored to the preferences of both raters (Matlab R2012b, Mathworks Inc., Natick, MA, USA). Three plots were presented to the raters: (1) VCO2 versus VO2, (2) the ventilatory equivalents of oxygen (Ve/VO2) and carbon dioxide (Ve/VCO2) versus time, and (3) respiratory exchange ratio (RER) versus time. VT1 was defined as the point at which the slope increased to more than 1 in the first plot (V-slope method) [12, 15, 19, 30], and as the first sustained rise in Ve/VO2 without a concomitant increase in Ve/VCO2 in the second plot (ventilatory equivalents method) [15, 19, 30, 31]. The RER plot was used as an additional reference [12, 30, 31]. VT2 was defined as the first sustained increase in Ve/VCO2 in the plot of Ve/VO2 and Ve/VCO2 versus time (ventilatory equivalents method) [12, 14, 31], and as the second increase in slope in the plot of VCO2 versus VO2 [12, 18]. Again, the RER plot was used as an additional reference, for example to confirm that RER at VT2 was higher than RER at VT1 [12, 30, 31]. The raters assessed all three plots for each VT and based their final decision on either the V-slope or the ventilatory equivalents method, whichever plot showed that particular VT most clearly.
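In the study, the raters determined the VTs visually. Purely to illustrate the idea behind the V-slope criterion, the sketch below locates a single breakpoint in the VCO2 versus VO2 relationship by fitting two straight lines and choosing the split that minimizes the total squared error. This automated two-segment regression is an assumed, simplified stand-in, not the raters' procedure: it ignores the ventilatory-equivalent and RER plots, finds only one breakpoint, and its `min_points` parameter and example data are hypothetical.

```python
import numpy as np


def v_slope_breakpoint(vo2, vco2, min_points=5):
    """Two-segment least-squares fit of VCO2 against VO2.

    Returns (VO2 at the first point of the upper segment, lower slope,
    upper slope) for the split with the smallest total squared error.
    """
    x = np.asarray(vo2, dtype=float)
    y = np.asarray(vco2, dtype=float)
    order = np.argsort(x)  # the V-slope plot has VO2 on the x-axis
    x, y = x[order], y[order]
    n = x.size
    if n < 2 * min_points:
        raise ValueError("too few points for two segments")
    best = None
    for k in range(min_points, n - min_points + 1):
        sse, slopes = 0.0, []
        for xs, ys in ((x[:k], y[:k]), (x[k:], y[k:])):
            slope, intercept = np.polyfit(xs, ys, 1)
            resid = ys - (slope * xs + intercept)
            sse += float(resid @ resid)
            slopes.append(float(slope))
        if best is None or sse < best[0]:
            best = (sse, float(x[k]), slopes[0], slopes[1])
    _, breakpoint_vo2, lower_slope, upper_slope = best
    return breakpoint_vo2, lower_slope, upper_slope


# Synthetic, noise-free data with a kink at VO2 = 1.0 L/min (hypothetical).
vo2 = np.linspace(0.5, 1.5, 41)
vco2 = np.where(vo2 < 1.0, 0.9 * vo2, 0.9 + 1.4 * (vo2 - 1.0))
bp, s1, s2 = v_slope_breakpoint(vo2, vco2)
print(round(bp, 3), round(s1, 2), round(s2, 2))
```

On these data the breakpoint should land at or just above 1.0 L/min, with slopes close to 0.9 and 1.4. Real breath-by-breath data are noisy, which is one reason the study relied on expert visual assessment of several plots.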

Two experienced sports physicians independently assessed the sets of graphs in random order. Both had at least 4 years of experience with VT determination in able-bodied athletes and during upper-body exercise in individuals with a disability. They were blinded to participant ID and injury level. For each determined VT, the Matlab script calculated the corresponding PO, HR, VO2, and RER. When a rater judged a VT to be indeterminate, the test data for that VT were rejected. To assess intrarater reliability, both raters assessed all tests twice (in a different random order), at least 1 week apart.

Statistical analysis

Statistical analyses were performed using SPSS (IBM SPSS Statistics 20, SPSS, Inc, Chicago, IL, USA). Normality was tested using Kolmogorov–Smirnov tests with Lilliefors significance correction and Shapiro–Wilk tests; in addition, z-scores for skewness and kurtosis were calculated. To assess intrarater reliability for each VT, the PO, HR, and VO2 at that VT were compared between the first and second sessions. To assess interrater reliability for each VT, the PO, HR, and VO2 at that VT were compared between rater 1 and rater 2 for the first session. Systematic differences were investigated with paired-samples t-tests for the total group, and with Wilcoxon signed-rank tests and Mann–Whitney tests within subgroups (tetraplegia and paraplegia), as data within subgroups were not normally distributed. ICCs with 95% confidence intervals (CI) (two-way random, absolute agreement, single measures) were used to quantify relative agreement at group level. For clinical and training purposes, Bland–Altman plots with 95% limits of agreement (LoA) were used to quantify absolute agreement at the individual level [32]. ICCs were interpreted as follows: 0.00–0.25, little to no correlation; 0.26–0.49, low correlation; 0.50–0.69, moderate correlation; 0.70–0.89, high correlation; and 0.90–1.00, very high correlation [33]. Values of p < 0.05 were considered significant.
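Both agreement measures can be reproduced outside SPSS. The sketch below computes the two-way random, absolute-agreement, single-measures ICC (often written ICC(2,1) or ICC(A,1)) from the usual two-way ANOVA mean squares, together with the Bland–Altman bias and limits of agreement. It is a minimal sketch with hypothetical function names and invented example values; it omits the confidence intervals that SPSS reports, and the LoA multiplier is a parameter because the figure captions in this article use ±2 SD rather than ±1.96 SD.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC, two-way random effects, absolute agreement, single measures.

    `ratings` has shape (n_subjects, k_sessions_or_raters), no missing values.
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 sessions/raters")
    grand = y.mean()
    ss_rows = k * ((y.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_cols = n * ((y.mean(axis=0) - grand) ** 2).sum()  # between sessions/raters
    ss_err = ((y - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def bland_altman(a, b, multiplier=1.96):
    """Return (mean bias, lower LoA, upper LoA) for paired measurements."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = d.mean()
    sd = d.std(ddof=1)  # sample SD of the differences
    return bias, bias - multiplier * sd, bias + multiplier * sd


# Hypothetical PO at VT1 (W): a constant 2-W offset between sessions.
ratings = [[10, 12], [20, 22], [30, 32]]
print(round(icc_2_1(ratings), 3))  # 0.98
```

Because the absolute-agreement form includes the between-session variance, the constant offset in this example lowers the ICC below 1 even though the two sessions rank the subjects identically; a consistency-type ICC would ignore that offset.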

Results

Three tests were excluded because of technical problems and brief interruptions during testing, leaving 30 tests to be assessed (tetraplegia N = 11, paraplegia N = 19). Each of these 30 tests, with two possible VTs, was assessed in two sessions by two independent raters, yielding a total of 240 VTs to be analyzed (30 tests × 2 VTs × 2 sessions × 2 raters). For two tests, HR data were excluded due to problems with the HR monitoring system. The test peak values are shown in Table 1.

Table 1 Arm crank test peak values (N = 30)

Determination of ventilatory thresholds

Of the 240 VTs to be analyzed, 217 (90%) could be determined. Of the 23 undetermined VTs, 2 (9%) were VT1 and 21 (91%) were VT2; 7 (30%) related to tests in individuals with paraplegia and 16 (70%) to tests in individuals with tetraplegia (Fig. 1). In 18 of the 30 tests (60%), both VTs could be determined in both sessions by both raters; 14 of these tests were in individuals with paraplegia. Among individuals with paraplegia, peak physiological values did not differ between tests in which all VTs could (N = 14) and could not (N = 5) be determined (median (Mdn) ± standard error (SE): VO2 peak 1.50 ± 0.17 L/min vs 1.11 ± 0.25 L/min, p = 0.19; PO peak 98 ± 10 W vs 70 ± 15 W, p = 0.11; HR peak 161 ± 6.8 bpm vs 156 ± 5 bpm, p = 0.20; RER peak 1.29 ± 0.02 vs 1.43 ± 0.08, p = 0.39). However, test duration was significantly shorter in tests in which one or more VTs could not be determined (Mdn 5.1 ± 0.6 min) than in tests in which all VTs could be determined (Mdn 7.6 ± 0.5 min; U = 11, z = −2.22, p = 0.026). Four of the five individuals with paraplegia for whom one or both VTs could not be determined by one or both raters had a high paraplegia (thoracic level 1–5).

Fig. 1

Flowchart of the thresholds that could be determined by two experienced raters in 30 individuals with spinal cord injury during arm crank ergometry. TP = group with tetraplegia (N = 11), PP = group with paraplegia (N = 19), VT = ventilatory threshold

Among individuals with tetraplegia, peak physiological values and test duration did not differ between tests in which all VTs could (N = 4) and could not (N = 7) be determined (VO2 peak 0.79 ± 0.09 L/min vs 0.77 ± 0.15 L/min, p = 0.79; PO peak 44 ± 11 W vs 35 ± 8 W, p = 0.65; HR peak 118 ± 14 bpm vs 113 ± 3 bpm, p = 0.79; RER peak 1.30 ± 0.04 vs 1.22 ± 0.06, p = 0.53; test duration 5.6 ± 1.4 min vs 4.8 ± 0.6 min, p = 0.53).

Intrarater reliability

For the total group and the injury subgroups, no systematic differences were found between sessions 1 and 2, except for VO2 at VT2 in the group with paraplegia for rater 1 (∆ Median: 0.00 L/min, ∆ Mean: 0.06 L/min, T 7.0, SE 12.7, p = 0.01). Tables 2–4 show the intrarater reliability for the total group and the subgroups. The relative agreement between rating sessions was very high for both raters. In the subgroups, relative agreement ranged from high to very high for both raters, although the small sample sizes and unidentifiable VTs reduced the statistical power. This is particularly evident in Table 4, where the 95% CIs were wide despite high to very high ICCs. Bland–Altman plots showed small systematic error, reflected by small mean differences. Random error ranged from small to large, as reflected by the narrow to wide 95% LoA in Figs. 2 and 3. Figs. 2a, b, d, e and 3a, b, d, e show the absolute agreement within raters for PO and HR, respectively.

Table 2 Threshold characteristics rater 1 and rater 2 for the total group of participants during arm crank testing
Table 3 Threshold characteristics rater 1 and rater 2 for the group with paraplegia during arm crank testing
Table 4 Threshold characteristics rater 1 and rater 2 for the group with tetraplegia during arm crank testing
Fig. 2

Bland–Altman plot representing the absolute agreement of the power output (PO) within raters and between raters. Solid line represents the mean bias (systematic error), dotted lines represent mean ± 2 SD (95% LoA; random error). Circles and squares represent individuals with paraplegia and tetraplegia, respectively. a Intrarater reliability rater one at VT1. b Intrarater reliability rater two at VT1. c Interrater reliability at VT1. d Intrarater reliability rater one at VT2. e Intrarater reliability rater two at VT2. f Interrater reliability at VT2

Fig. 3

Bland–Altman plot representing the absolute agreement of the heart rate (HR) within raters and between raters. Solid line represents the mean bias (systematic error), dotted lines represent mean ± 2 SD (95% LoA; random error). Circles and squares represent individuals with paraplegia and tetraplegia, respectively. a Intrarater reliability rater one at VT1. b Intrarater reliability rater two at VT1. c Interrater reliability at VT1. d Intrarater reliability rater one at VT2. e Intrarater reliability rater two at VT2. f Interrater reliability at VT2

Interrater reliability

There were no systematic differences between rater 1 and rater 2. Relative agreement between the raters was high to very high for the total group and for the subgroups (Tables 2, 3, 4). Again, because of the small sample sizes and the exclusion of undetermined VTs, the number of tests in the subgroups was small. Bland–Altman plots showed small systematic error, reflected by small mean differences. Random error was generally large, as reflected by wide 95% LoA in Figs. 2 and 3. Figs. 2c, f and 3c, f show the absolute agreement between raters for PO and HR, respectively.

Discussion

Of all VTs to be analyzed, 90% could be determined. Most of the undetermined VTs were VT2s and related to individuals with tetraplegia. In 60% of the tests, both thresholds could be determined in both sessions by both raters. For the successfully determined VTs, the relative intrarater reliability was very high, whereas random error ranged from small to large within raters and across outcome measures. The relative interrater reliability was high to very high, but absolute agreement was low owing to large random error.

The participants in the present study were recreationally active individuals with SCI. In terms of physical fitness, the participants with paraplegia scored "good" for VO2 peak relative to the average untrained population with paraplegia, based on the study by Simmons et al. [34], and the participants with tetraplegia scored "average" relative to the average untrained population with tetraplegia [34]. This suggests that the physical fitness of the present population was somewhat above average compared with the untrained population with SCI.

For the individuals with paraplegia, VO2 at VT1 and VT2 was on average 53 and 76% of VO2 peak, respectively. In previous literature across all test modes and fitness levels, VT1 has been reported to occur at 56–77% of VO2 peak in individuals with paraplegia [19,20,21, 35, 36], whereas VT2 has been reported at 78% of VO2 peak [19]. The small differences between the present study and previous literature might be explained by the mode of exercise and the training status of the participants: the physical fitness of the populations studied previously was generally higher than in the present study (VO2 peak on average 1.9 L/min in previous literature vs 1.5 L/min in the present study) [19, 35, 36].

For individuals with tetraplegia, VO2 at VT1 and VT2 was on average 68 and 81% of VO2 peak, respectively. In previous literature across all test modes and fitness levels, VT1 has been reported to occur at 63–87% of VO2 peak in individuals with tetraplegia [19,20,21,22], whereas VT2 has been reported at 75% of VO2 peak [19]. Overall, the VTs of this subgroup are comparable with those reported in the literature.

Ninety percent of VTs could be determined in the present study, which is comparable with the literature on able-bodied participants [15, 16] and athletes with SCI [19, 20]. Most of the VTs that could not be determined were VT2s and related to tests in individuals with tetraplegia. Leicht et al. [19] found comparable results: 2 of 19 VT1s (11%) could not be determined, both in athletes with tetraplegia, and 5 of 19 VT2s (26%) could not be determined, 3 of which belonged to athletes with tetraplegia. Leicht et al. [19] explained their findings by the lower absolute ventilatory responses in individuals with SCI, and in those with tetraplegia specifically, resulting in a narrower range of ventilatory values compared with able-bodied athletes. In the present study, VO2 peak did not differ significantly between individuals with tetraplegia whose VTs could and could not be determined. However, in both individuals with tetraplegia and individuals with paraplegia, PO peak and test duration were generally lower in tests in which one or both VTs could not be determined; apart from test duration in the paraplegia subgroup, these differences were not significant, potentially because of the small sample sizes. This might also explain why the proportion of undetermined VTs was higher in individuals with tetraplegia than in individuals with paraplegia. This is supported by a recent study in which VT1 could not be determined in 32% of tests in individuals with tetraplegia [21]; the authors attributed this to lower peak cardiorespiratory responses and shorter test duration in those individuals compared with tests in which VT1 could be determined. Another reason why VTs cannot be determined in untrained individuals with SCI, especially at higher intensity (VT2), might be premature termination of the test due to peripheral fatigue. In the present study, three of the twelve individuals in whom one or both VTs could not be determined stated that arm fatigue was the reason for stopping the test.

For the VTs that could be determined, relative agreement within and between raters for the total group was high to very high. The subgroup results should be interpreted with caution, as these groups were small. The results are comparable with previous literature on able-bodied participants and wheelchair athletes, which reported an intrarater reliability of 0.94–0.97 [17, 18] and an interrater reliability of 0.81–0.95 [18, 20, 22]. Absolute agreement varied between outcome measures. For some measures, such as HR at VT2 between raters, random error was large, as depicted in Fig. 3f. This figure also shows a degree of heteroscedasticity: random error appears to be larger for individuals with a higher HR at VT2, i.e., those with paraplegia.
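One simple way to check the heteroscedasticity visible in Fig. 3f is to correlate the absolute between-rater differences with the pairwise means; a clearly positive correlation suggests that random error grows with the magnitude of the measurement. The sketch below is an assumed diagnostic that was not performed in the study; the function name and HR values are hypothetical, and a Pearson correlation is used for simplicity.

```python
import numpy as np


def heteroscedasticity_r(a, b):
    """Pearson r between pairwise means and absolute differences.

    Values near +1 suggest that disagreement grows with magnitude.
    Returns nan (with a NumPy warning) if all differences are identical.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    means = (a + b) / 2.0
    abs_diff = np.abs(a - b)
    return float(np.corrcoef(means, abs_diff)[0, 1])


# Hypothetical HR at VT2 (bpm) from two raters; disagreement grows with HR.
rater1 = [100, 120, 140, 160]
rater2 = [101, 122, 143, 164]
print(round(heteroscedasticity_r(rater1, rater2), 3))  # 1.0
```

With real data, a positive correlation would argue for reporting LoA on a log scale or as percentages rather than as a single band in beats per minute.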

At group level, agreement is high to very high, but at the individual level there may be large differences between rating sessions or raters, which has important implications for correctly prescribing exercise intensity for that individual. This suggests that the relative agreement of VT determination should be interpreted with caution, not only in the present study but also in previous literature, in which absolute agreement was unfortunately often not reported.

Practical applications

At group level, the results of the present study are positive. For the majority of tests, the VTs could be determined, and relative agreement within and between raters was high to very high. Nevertheless, for 7 of 11 tests in individuals with tetraplegia, one or both raters could not determine one or both VTs. This seemed to coincide with short test duration. Despite the testers' extensive experience with testing individuals with SCI, it was difficult to select a protocol that resulted in a test duration of 8–12 min. Individualized protocol selection is nevertheless important for individuals with SCI. However, optimal protocol selection is complex, as cardiorespiratory fitness in individuals with SCI depends on many factors, such as lesion level, sex, BMI, and training status [24, 25, 34]. Tests shorter than 8 min are therefore common in clinical practice and are not specific to the present study [21].
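The duration target can be related to the protocol settings with simple arithmetic. Assuming 1-min steps that begin at the starting workload and a test that ends at an anticipated PO peak, the expected duration is roughly (PO peak − start)/step + 1 min, so the step size for a target duration follows by rearrangement. The sketch below is a hypothetical helper, not the procedure used by Maher et al. [23]; its usefulness depends entirely on how well PO peak can be anticipated, which is precisely what proved difficult here.

```python
def expected_duration_min(start_w: float, peak_w: float, step_w: float) -> float:
    """Approximate duration (min) of a 1-min stepwise test ending at peak_w.

    Assumption: step m (1-based) is performed at start_w + (m - 1) * step_w.
    """
    if step_w <= 0 or peak_w < start_w:
        raise ValueError("need step_w > 0 and peak_w >= start_w")
    return (peak_w - start_w) / step_w + 1.0


def step_for_duration(start_w: float, peak_w: float, target_min: float = 10.0) -> float:
    """Step size (W) expected to end the test at peak_w after target_min minutes."""
    if target_min <= 1.0 or peak_w < start_w:
        raise ValueError("need target_min > 1 and peak_w >= start_w")
    return (peak_w - start_w) / (target_min - 1.0)


# Hypothetical participant: start 20 W, anticipated PO peak 98 W.
print(round(expected_duration_min(20, 98, 10), 1))  # 8.8
print(round(step_for_duration(20, 98, 10.0), 1))     # 8.7
```

An anticipated PO peak that is too high lengthens the test, and one that is too low shortens it, so errors in the estimate translate directly into tests outside the 8–12 min window.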

It is known that training intensity based on HR peak or HR reserve might not be applicable to individuals with a lesion level above the sixth thoracic segment (T6) because of the altered sympathetic response to exercise [37]; this was also seen in the present study, as HR peak was low in individuals with tetraplegia (Table 1). The present study shows that VTs sometimes cannot be determined in this group, which makes training based on training zones challenging as well. Other methods of prescribing exercise intensity might be more precise, such as training based on ratings of perceived exertion (RPE) and/or %PO peak [38]. The present study did not investigate whether exercise intensity prescription based on VTs is superior to prescription based on RPE, percentage of HR reserve (%HRR), or %PO peak in terms of improvements in cardiorespiratory fitness; this should be investigated in future research. Moreover, as the large random error within and between raters suggests, training schemes based on VTs should be clinically evaluated at the individual level. For example, a talk test may be used to evaluate whether the intensity is too high or too low [39]. If so, VT determination should be critically re-evaluated by one or more experts to prevent over- or undertraining in that individual. In addition, the low absolute agreement between raters suggests that, during longitudinal follow-up with several GXTs in one individual, the VTs should preferably be identified by the same rater.

Study limitations

Although the sample size of the present study was equal to or larger than that of comparable studies [17, 18, 20, 22], the subgroups, especially the group with tetraplegia, were small. Statistical power was therefore reduced, which makes the subgroup ICCs less reliable to interpret. It must be noted, however, that large samples are difficult to obtain in rehabilitation populations. Another aspect not investigated in the present study is the test–retest reliability of the GXT itself across days. Future studies could investigate the reliability of VTs across repeated GXTs, as the within-individual variability of VTs between tests is unknown for this population.

Conclusions

Ninety percent of VTs could be determined. Most of the VTs that could not be determined were VT2s and related to tests in individuals with tetraplegia. For the VTs that could be determined, the relative intrarater reliability was very high, with small to large random error. The relative interrater reliability was high to very high, with large random error. Although these results are positive at group level and show that determination of VTs might be a promising method to define training intensity for the majority of the tested recreationally active individuals with SCI, critical evaluation of the VTs is necessary, and other exercise intensity prescription methods should be considered when one or both VTs cannot be determined.

Data archiving

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.