We thank Sabour and Ghassemi1 for their comments on our article2 and for their contribution to improving the quality of reliability studies through the choice of appropriate statistical measures.

The first issue raised1 is the use of the Kappa statistic as a measure of reliability (agreement, precision) and of the weighted Kappa. The weighted Kappa requires the assessed variables to be ordinal in nature. However, the outcome variables in our study were mainly dichotomous, and thus a weighting scheme was not possible.3 Only the last variable in the International Spinal Cord Injury Musculoskeletal Basic Data Set (ISCIMSBDS),4 with the question 'Do any of the above musculoskeletal challenges interfere with your activities of daily living (transfers, walking, dressing, showers, and so on)?' and the options 'not at all', 'yes a little' and 'yes a lot', is ordinal. One could perhaps argue that the variables concerning 'Fractures', 'Heterotopic ossifications', 'Contractures' and 'Degenerative Changes/Overuse', seen in the table of the ISCIMSBDS, can be considered ordinal with respect to the locations of these conditions. In that case, two raters choosing adjacent locations would be considered to agree more closely than raters choosing locations physically farther apart. For example, if the first rater chose the location 'Elbow', a second rater choosing 'Shoulder/Humerus' would reflect better agreement than a second rater choosing 'Foot'. This could be a reasonable argument, but from a clinical perspective we found it more appropriate and relevant to consider only exact agreement on location as the measure of reliability.
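As an illustration only, the following is a minimal sketch, assuming synthetic ordinal ratings and helper names chosen for this reply (not data or code from the study), of how an unweighted Kappa credits only exact agreement, whereas a linearly weighted Kappa gives partial credit for near-agreement on ordinal categories such as 'not at all', 'yes a little' and 'yes a lot':

```python
import numpy as np

def joint_proportions(r1, r2, k):
    """k x k table of joint proportions for two raters' category indices."""
    m = np.zeros((k, k))
    for i, j in zip(r1, r2):
        m[i, j] += 1
    return m / m.sum()

def kappa(p, weights=None):
    """Cohen's Kappa from joint proportions p.
    weights=None gives the unweighted Kappa; otherwise a k x k matrix of
    disagreement weights (zero on the diagonal) gives a weighted Kappa."""
    k = p.shape[0]
    if weights is None:
        weights = 1 - np.eye(k)                          # only exact agreement counts
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))    # chance agreement from the marginals
    return 1 - (weights * p).sum() / (weights * expected).sum()

# 0 = 'not at all', 1 = 'yes a little', 2 = 'yes a lot' (synthetic ratings)
rater1 = [0, 0, 1, 1, 2, 2, 2, 0, 1, 2]
rater2 = [0, 1, 1, 2, 2, 2, 1, 0, 1, 2]
p = joint_proportions(rater1, rater2, 3)

linear = np.abs(np.subtract.outer(np.arange(3), np.arange(3))) / 2  # linear disagreement weights
print("unweighted Kappa:       ", round(kappa(p), 3))
print("linearly weighted Kappa:", round(kappa(p, linear), 3))
```

With only two (dichotomous) categories, the linear weight matrix coincides with the one used for the unweighted Kappa, which is why a weighting scheme adds nothing in the dichotomous case.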

Two weaknesses of Kappa were mentioned. First, Kappa is affected by prevalence, as exemplified in Figure 1 of Sabour and Ghassemi.1 A skewed distribution between the two concordant cells results in a lower Kappa value (Figure 1a), despite the percentage (crude) agreement of the concordant and discordant pairs being the same. We agree with this concern and also mentioned it in the statistics section of our article.2 The fact that Kappa is sensitive to prevalence can therefore be seen as a limitation. It could, however, also be argued to be an advantage. Kappa is chance-corrected agreement, or agreement beyond chance. In a population with the prevalence seen in Figure 1a, raters would have an increased probability of agreeing by chance, and Kappa adjusts for this. Sim and Wright5 use the Prevalence Index, |a − d|/n, to describe this effect (see Table 1). A high Prevalence Index is accompanied by a low Kappa, and vice versa. We argue that this can be a desirable property of Kappa, but that the prevalence should be taken into account when interpreting the Kappa value. This is why, in our article, we reported the prevalence of the symptoms as well as the percentage agreement.
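To make the effect concrete, here is a minimal numeric sketch with illustrative cell counts chosen for this reply (not the counts from Figure 1): two 2×2 tables with the same crude agreement of 90% yield very different Kappa values once the prevalence is skewed, and the Prevalence Index |a − d|/n tracks this.

```python
def two_by_two_stats(a, b, c, d):
    """Crude agreement, Cohen's Kappa and Prevalence Index for a 2x2 table
    with concordant cells a, d and discordant cells b, c (see Table 1)."""
    n = a + b + c + d
    po = (a + d) / n                                      # observed (crude) agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # chance agreement from the marginals
    kappa = (po - pe) / (1 - pe)
    prevalence_index = abs(a - d) / n                     # Sim and Wright's Prevalence Index
    return po, kappa, prevalence_index

# Balanced prevalence: 90% crude agreement, Kappa = 0.80, Prevalence Index = 0.0
print(two_by_two_stats(45, 5, 5, 45))
# Skewed prevalence:   90% crude agreement, Kappa ~ 0.44, Prevalence Index = 0.8
print(two_by_two_stats(85, 5, 5, 5))
```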

Table 1 A 2×2 contingency table of agreement between two observers with concordant pairs (a, d) and discordant pairs (b, c)

The second limitation mentioned is that Kappa depends on the number of categories: the more categories, the lower the Kappa value. This is true when using the weighted Kappa but, as mentioned earlier, it does not apply to the present study.

Finally, Sabour and Ghassemi address the importance of an individual-based approach rather than a group-based one. We can confirm that, for both intra- and inter-rater evaluations, all comparisons were performed pair-wise between the two relevant ratings, performed either by the same rater twice (intra-rater) or by two different raters (inter-rater), and then summed in a 2×2 contingency table used for both the percentage (crude) agreement and the Kappa calculations.
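As a minimal sketch of this procedure, assuming hypothetical dichotomous ratings rather than data from the study, each pair of ratings is tabulated into a single 2×2 contingency table, from which the percentage (crude) agreement and Kappa are then computed:

```python
# Hypothetical dichotomous ratings (1 = symptom present, 0 = absent) for the same
# individuals, rated twice by one rater (intra-rater) or once each by two raters (inter-rater)
rating_1 = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
rating_2 = [1, 0, 1, 1, 1, 0, 0, 0, 0, 1]

pairs = list(zip(rating_1, rating_2))
a = sum(1 for x, y in pairs if x == 1 and y == 1)   # concordant: both 'present'
d = sum(1 for x, y in pairs if x == 0 and y == 0)   # concordant: both 'absent'
b = sum(1 for x, y in pairs if x == 1 and y == 0)   # discordant
c = sum(1 for x, y in pairs if x == 0 and y == 1)   # discordant

n = a + b + c + d
po = (a + d) / n                                     # percentage (crude) agreement
pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2  # chance agreement
print("crude agreement:", po, "Kappa:", round((po - pe) / (1 - pe), 3))
```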