Introduction

Multiple myeloma (MM) is a haematological cancer that affects multiple organs and is associated with complex symptoms [1]. Owing to advances in treatment options, MM survival rates have improved significantly over the past 25 years [2,3,4]. Despite this constantly evolving treatment landscape, MM remains an incurable and progressive disease that requires either continuous or intermittent therapy to maintain disease stability and prolong survival [5].

Disease symptoms, together with the side effects of multiple lines of therapy, can severely impact patients’ wider health-related quality of life (HRQoL). For example, fatigue and pain are physical symptoms commonly reported by patients with myeloma, and both significantly impair HRQoL [6, 7]. Beyond extended survival, it is important to understand how new and combination treatments may affect patients’ lives. It is therefore recognised that patient-reported outcome (PRO) measures are vital in clinical trials and in the management of MM [8].

The European Organisation for Research and Treatment of Cancer Quality of Life Multiple Myeloma Questionnaire (EORTC QLQ-MY20), developed in 1999, is an MM-specific PRO measure consisting of 20 items within four domains (disease symptoms [DS], side effects of treatment [SE], future perspectives [FP] and body image [BI]) [9]. The original module, the QLQ-MY24, released in 1996, included four additional items in a Social Support (SS) domain that was subsequently removed because of observed ceiling effects [10]. The QLQ-MY20 module is used in conjunction with the EORTC Core Quality of Life Questionnaire (QLQ-C30), which is designed for use in oncology patients more generally. The MM module has been translated into over 70 language versions [11], is the most widely used MM-specific measure globally and is one of the most extensively validated instruments for use in MM clinical research [10, 12].

Since the module’s development, the treatment of MM has changed [13]. The original validation of the QLQ-MY20 was conducted largely in newly diagnosed patients, and the module focused on the expected side effects of conventional chemotherapy and steroids [9]. Conventional chemotherapy in 1999 was mainly melphalan, cyclophosphamide, vincristine and doxorubicin. Although it was recognized that patients with myeloma can be treated with a variety of chemotherapy drugs and regimens, it was felt that the side effects of conventional chemotherapy and steroids may more adversely affect patients’ HRQoL for a longer period of time. Since 1999, however, non-chemotherapy treatments (proteasome inhibitors, immunomodulatory drugs, monoclonal antibodies and other novel agents) have been introduced. The increase in survival rates, coupled with the rapid expansion of therapeutic options for patients with myeloma, has implications for HRQoL outcomes and side effects in this population. In 2012, Osborne et al. published a review identifying issues important to patients and examining whether existing instruments comprehensively cover the current treatment landscape and patient experience [12]. While the QLQ-C30 and QLQ-MY20 were acknowledged as the instruments with good conceptual coverage and the most extensive validation in patients with myeloma, no instrument was identified as covering all issues relevant to patients. This signifies the need for an update of the MM module that represents HRQoL in light of current therapies and the HRQoL concerns of patients today.

The EORTC guidelines provide a four-phase framework for updating existing modules [14]. As part of Phase I (generation of QoL issues), a literature review assessing the use of the QLQ-MY20 and any reported methodological issues was performed. This article details that literature review, which aimed to explore:

  1. In which types of clinical studies the module has been used

  2. To what extent the module has been used in both newly diagnosed and relapsed patients

  3. The types of treatments/therapies the module has been used to assess

  4. How and where the module-related endpoint is positioned within randomised controlled trials (RCTs)

  5. How the module results are reported, and the prominence given to these results

  6. The statistical results from QLQ-MY20 subscales in RCTs

  7. PRO limitations identified from interventional studies and validity/reliability issues raised in psychometric validation studies

Methods

Literature search, eligibility criteria and screening

The primary search was conducted using the Ovid SP platform, accessing the electronic bibliographic databases Medline, EMBASE and PsycINFO. Searches combined a keyword search (i.e., a reference is identified if it includes the specified term within its bibliographic record) and a subject heading search. Subject headings are a controlled set of terms used in bibliographic databases to index articles by topic. A supplementary search in Google Scholar was also performed, and references that had not been previously identified were reviewed for inclusion. Only papers published between 1996 and 2020 were sought, as 1996 is when the MM module (MY24) was first released. The searches sought publications referencing ‘Multiple Myeloma’ in addition to ‘MY20’, ‘MY24’ or ‘EORTC’, or containing reference to the QLQ-MY20 domains.

See supplementary materials [1] for search strings.
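As a rough illustration of the search logic described above, the keyword rule can be sketched in Python. This is not the Ovid search strategy itself (the actual search strings are in the supplementary materials): the record structure, the field names and the helper `matches_search` are hypothetical, and the branch covering references to the QLQ-MY20 domain names is omitted for brevity.

```python
# Hypothetical sketch of the keyword logic; NOT the authors' Ovid search strings.
DISEASE_TERMS = ("multiple myeloma",)
INSTRUMENT_TERMS = ("my20", "my24", "eortc")
YEAR_RANGE = (1996, 2020)  # inclusive publication window stated in the text


def matches_search(record: dict) -> bool:
    """Return True if a bibliographic record satisfies the simplified rule:
    a disease term AND any instrument term AND a publication year in range."""
    # 'title', 'abstract' and 'year' are assumed field names for illustration.
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    has_disease = any(term in text for term in DISEASE_TERMS)
    has_instrument = any(term in text for term in INSTRUMENT_TERMS)
    year = record.get("year")
    in_range = year is not None and YEAR_RANGE[0] <= year <= YEAR_RANGE[1]
    return has_disease and has_instrument and in_range
```

A real database search would additionally use subject headings, which index articles by topic rather than by the words that happen to appear in the record.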

Abstracts were included if they reported a clinical study of any design that generated data using the QLQ-MY20/24, or a study evaluating the QLQ-MY20/24, including assessment of the psychometric properties of the module (validation study). Only abstracts reporting original research were included; thus reviews, conference proceedings and book chapters were excluded. Full-text publications were sought for all references meeting these criteria. When a single study was reported across multiple references, only the most comprehensive or relevant publication (e.g., HRQoL focused) was retained. Clinical studies were categorized as interventional (i.e., RCTs, clinical trial – single-arm, clinical trial – cross over) or observational (i.e., cross-sectional and longitudinal/cohort) study designs.

Data extraction

General information (e.g., author, title, year and location of the study) was collected for all studies. For all clinical studies, information about the disease stage (i.e., newly diagnosed/relapsed) and the other clinical outcome assessments (COAs, including patient-reported outcomes) used was extracted. For trials (RCTs, single-arm and cross-over trials), further information about the study design, reporting and presentation of results was extracted. Further in-depth extraction was performed for RCTs, including the type of statistical analysis applied to QLQ-MY20 data and comparisons between groups. For validation studies, data on the instrument structure and data distribution, reliability, validity and ability to detect change/interpretation of change scores were extracted.

Interrater agreement

Data extraction was initially performed by one reviewer, whose extractions were subsequently checked by a second reviewer. Any cases of disagreement or uncertainty were then discussed, and consensus was established in all instances by the study team based on the inclusion criteria. All extracted statistical data were checked by a statistician to ensure accuracy.

Results

The search yielded 502 unique records (Fig. 1), of which 74 publications were taken forward for review (33 full-text articles and 41 conference abstracts).

Fig. 1 Flow diagram of the abstract screening process.

Study designs where QLQ-MY20 is used

Table 1 provides an overview of the study designs in which the QLQ-MY20 was used and the countries with which the author teams were affiliated. The studies had a wide international spread, and in recent years there has been a growth in scientific publications on the use of the QLQ-MY20 in both clinical and instrument validation studies. The use of the QLQ-MY20 in RCTs, single-arm clinical trials and cross-sectional observational studies has increased over time.

Table 1 Study design and country.

QLQ-MY20 instrument use in observational and interventional studies

When the population was stated, interventional and observational studies included exclusively relapsed patients (n = 24/43, 55.8%, 14 of which were interventional), exclusively newly diagnosed patients (n = 10/43, 23.3%, seven of which were interventional), or a mix of newly diagnosed and relapsed patients (n = 9/43, 20.9%, none of which were interventional). Over time, both observational studies and clinical trials increasingly utilized the QLQ-MY20 with samples of relapsed patients and mixed samples of newly diagnosed and relapsed patients.
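The percentages above follow directly from the reported counts. A minimal Python check, using only the counts given in the text (the variable names are illustrative), is:

```python
# Counts from the text: clinical studies that stated the patient population.
counts = {"relapsed": 24, "newly diagnosed": 10, "mixed": 9}
total = sum(counts.values())  # denominator used in the text (43)

# Percentage of studies in each population group, rounded to one decimal place.
percentages = {group: round(100 * n / total, 1) for group, n in counts.items()}

for group, pct in percentages.items():
    print(f"{group}: {counts[group]}/{total} = {pct}%")
```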

Of the 65 clinical studies, all but two observational studies [15, 16] used the QLQ-MY20 in conjunction with the EORTC QLQ-C30, as required by the EORTC modular measurement approach. The COAs used in conjunction with the EORTC QLQ-C30 [17] and the QLQ-MY20 module [10] largely assessed peripheral neuropathy (e.g., FACT-GOG-Ntx) [18], HRQoL (e.g., EQ-5D-5L) [19], emotional wellbeing, particularly anxiety and depression (e.g., HADS) [20], fatigue (e.g., FACIT) [21], sleep quality (e.g., PSQI) [22] and functional impairment (e.g., KPS) [23]. A complete list of COAs used in conjunction with the QLQ-MY20 can be found in the supplementary material (Supplementary Table 1).

QLQ-MY20 instrument use in observational studies

See Supplementary Table 2 for a summary of observational studies that included the QLQ-MY20.

QLQ-MY20 instrument use in interventional trials

Table 2 provides a summary of the characteristics of the interventional studies that used the QLQ-MY20 (n = 21). QLQ-MY20 subscales were most commonly defined in studies as secondary (n = 11/21, 52%) or exploratory (n = 6/21, 29%) endpoints.

Table 2 Summary of the n = 21 interventional trials (randomized controlled trial or clinical trial-single arm/cross-over) identified by the literature review.

Trends over time (for the reporting periods 2006–2010, 2011–2015 and 2016–2020) were assessed across interventional studies, and four notable trends were observed. First, the proportion of RCTs, relative to single-arm and cross-over trials, increased from n = 0 between 2006 and 2010 to n = 5/7 between 2011 and 2015 and n = 10/13 between 2016 and 2020. Second, the number and proportion of trials in patients who had experienced their first or a subsequent relapse, relative to newly diagnosed patients, increased from n = 1/2 between 2006 and 2010 to n = 4/7 between 2011 and 2015 and n = 9/13 between 2016 and 2020. Third, the average QoL sample size increased from 144 between 2006 and 2010 to 479 between 2011 and 2015 and 465 between 2016 and 2020. Fourth, more questionnaires have been used in conjunction with the QLQ-MY20 in recent years: only two additional questionnaires were used alongside the QLQ-MY20 between 2006 and 2010, whereas five were used between 2016 and 2020. No differences over time were observed in the types of treatments/therapies the QLQ-MY20 was used to assess, the endpoint hierarchy position for which the QLQ-MY20 was selected, the study phase in which it was used or the presentation of QoL results in tables, figures and/or text.

The review of interventional study papers highlighted the main limitations of the PRO instruments or of the analysis/results, as reported by authors (Table 2). Some issues affect PROs generally rather than the QLQ-MY20 specifically: differential dropout or poor completion rates potentially biasing the analysis, low baseline levels of symptoms limiting the opportunity to show improvement, single-arm designs, short-term PRO data collection, and a lack of standardization in the collection and analysis of PROs across trials, which limits comparison of results across studies. Issues that may be more specific to the QLQ-MY20 were the need for thresholds for meaningful change at the individual patient level, the need for consistent definitions of meaningful change across studies, a discrepancy between patient-reported ‘tingling hands and feet’ and clinician-reported peripheral neuropathy events, a higher incidence of AEs or more severe AEs not translating into an impact on PRO scores, and a potential lack of sensitivity of the current questions to variations in HRQoL depending on the treatment administered. Another paper suggested that elements such as dosing convenience were not adequately measured by the available PROs.

Role of QLQ-MY20 alongside clinical endpoints in RCTs

Table 3 summarises the results from the 15 RCTs with respect to comparisons of QLQ-MY20 scores between treatment groups. The statistical significance of any mean difference comparisons between groups and any time to deterioration (TTD) comparisons between groups is reported.

Table 3 Summary of QLQ-MY20 results from 15 RCTs.

Most trials evaluated the meaning of the PRO results in the context of the clinical results. Five of the 15 trials compared triplet versus doublet combination therapies. In these studies it was common for no statistically significant differences between treatment groups to be observed and for authors to interpret this as a positive result, demonstrating that the addition of an agent to the combination did not impact HRQoL. Four studies reported statistically significant differences between groups for the SE subscale: lenalidomide (Revlimid), dexamethasone [Rd] vs melphalan, prednisone, thalidomide [MPT]; carfilzomib, dexamethasone [Kd] vs bortezomib, dexamethasone [Vd]; melphalan, prednisone, thalidomide followed by thalidomide maintenance [MPT-T] vs melphalan, prednisone, lenalidomide followed by lenalidomide maintenance [MPR-R]; and salvage autologous stem-cell transplantation [sASCT] vs nontransplantation consolidation [NTC]. One study reported longer time to deterioration for one arm for the DS subscale (once weekly vs twice weekly). One study reported longer time to deterioration for the SE subscale (Kd vs Vd). Another study reported differences between arms with respect to FP at later timepoints (IRd vs Rd). One small study [24] noted clinically relevant differences between cyclophosphamide-bortezomib-dexamethasone (VCD) plus placebo and VCD plus clarithromycin for DS and SE, and statistically significant differences with respect to BI.

In addition to these formal comparisons between treatment groups, the RCTs also reported the proportion of patients with improved/stable/worsened QLQ-MY20 scores, the association of clinical endpoints (response, time to progression and toxicity) with the QLQ-MY20 scales, and the effect of age on HRQoL benefit.

QLQ-MY20 validation studies

Nine validation studies were identified in the review [10, 17, 25,26,27,28,29,30,31]. Four validation studies highlighted potential ceiling effects for the BI subscale. No issues with reliability (Cronbach’s alpha) were identified for the multi-item scales [10, 17, 25,26,27, 29, 30]. Test-retest reliability was assessed in one article [25]; all four QLQ-MY20 subscales had high test-retest reliability (ICC ≥ 0.85). Two articles reporting factor analyses gave inconsistent results, with one showing acceptable fit [25] and one suggesting item reduction in the SE subscale [27].
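For readers less familiar with the reliability statistic cited above, Cronbach’s alpha for a multi-item scale can be computed from item-level scores using the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below is illustrative only: the function name and data layout are assumptions, and it does not reproduce any analysis from the reviewed validation studies.

```python
from statistics import variance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha for one multi-item scale.

    item_scores[i][j] is respondent i's score on item j (complete cases only).
    """
    k = len(item_scores[0])  # number of items in the scale
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each respondent's summed scale score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Higher values indicate that the items in a scale vary together; the threshold regarded as acceptable is a convention chosen by each study and is not specified here.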

External convergent/discriminant validity was reported in two full-text articles. Kontodimopoulos et al. (2012) demonstrated correlations between SF-36 domains and QLQ-MY20 domains, and Graca Pereira et al. (2019) found correlations between QLQ-MY20 domains and the QLQ-C30 total score, the Satisfaction with Social Support Scale (SSSS) and the HADS.

Eight articles reported known-groups validity [10, 17, 25,26,27,28, 30, 31] across a range of groups (albumin, haemoglobin, beta 2 microglobulin, performance status, gender, age and presence of fractures). Three articles demonstrated ability to detect change [10, 25, 26].

Discussion

The objective of this literature review was to examine the use of the QLQ-MY20 since its first release 25 years ago as the first validated module for patients with myeloma designed to be used with the EORTC QLQ-C30. This MM-specific PRO measure consists of 20 items across four domains (refined from the original 24-item module [MY24] following early phase research). The review covered the period from the module’s publication in 1996 through to 2020.

There were several drivers for this review. At the time of the original validation study, the majority of clinical trials were in newly diagnosed patients and there were limited data for validation of the QLQ-MY20 in relapsed/refractory patients. Since the original publication of the QLQ-MY20, the treatment landscape has changed dramatically, and patients with myeloma now undergo multiple lines of treatment and multiple relapses. We wanted to use this review to see whether the use of the questionnaire in relapsed patients has increased accordingly. The review aimed to summarise the range of studies in which the questionnaire has been reported, how the data from the QLQ-MY20 were reported and how the results contributed to the evaluation of the treatments alongside clinical endpoints. We also wanted to collate any further psychometric evaluations of the QLQ-MY20 to see whether any issues have emerged as the use of the questionnaire has changed.

Seventy-four studies that used the QLQ-MY20 were reviewed following screening: 15 RCTs, six single-arm or cross-over trials, 44 observational studies and nine instrument validation studies. This indicates diverse and extensive use of the QLQ-MY20 across several different clinical settings and investigations. The review of the published literature did not highlight any specific problems with the QLQ-MY20; however, qualitative interviews are ongoing to further explore the patient experience of symptoms and side effects of novel treatments. A revised version of the QLQ-MY20 is therefore warranted to ensure all concepts of interest are captured; concepts assessed by the additional COAs reported should be explored further in Phases I and II (generation of QoL issues and construction of the item list) of module development and considered for inclusion in the updated version of the QLQ-MY20.

The RCTs highlighted that often no differences between treatments were observed with respect to the QLQ-MY20 subscales, and that authors frequently concluded this was a desirable outcome, especially for the SE subscale (e.g., demonstrating that adding a further agent to a combination regimen does not have a detrimental impact on QoL). As new treatment regimens and new combination therapies continue to be developed, this should be a key consideration at the design stage of an RCT. Where applicable, QoL comparisons should be framed as non-inferiority rather than superiority comparisons, with a sample size sufficient to declare non-inferiority. It is also important for robust meaningful change thresholds to be determined so that non-inferiority margins can be defined. To date there has been one study on deriving meaningful change [31], but further development of these thresholds may be required. The RCT data also showed that the QLQ-MY20 subscales were related to clinical outcomes, supporting and supplementing the conclusions drawn from the clinical endpoints. A number of studies investigated the relationship of the QLQ-MY20 scales with clinical outcomes such as time to progression and response.
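To make the non-inferiority framing concrete, the sketch below shows one simple way such a comparison could be set up for a subscale score, using a normal approximation for the difference in means. It is an illustration under stated assumptions, not a method taken from any of the reviewed trials: the function name, the assumption that higher scores are worse, and the 5-point margin in the usage example are all hypothetical, and a real margin would need to be justified by meaningful change thresholds.

```python
from math import sqrt
from statistics import NormalDist


def non_inferior(mean_new, sd_new, n_new, mean_ctrl, sd_ctrl, n_ctrl,
                 margin, alpha=0.025):
    """Normal-approximation non-inferiority check for a two-arm comparison.

    Assumes a scale on which HIGHER scores are WORSE (an assumption of this
    sketch). Non-inferiority is declared when the upper bound of the one-sided
    (1 - alpha) confidence interval for (new - control) lies below `margin`.
    Returns (is_non_inferior, upper_bound).
    """
    diff = mean_new - mean_ctrl
    # Standard error of the difference between two independent means.
    se = sqrt(sd_new ** 2 / n_new + sd_ctrl ** 2 / n_ctrl)
    upper = diff + NormalDist().inv_cdf(1 - alpha) * se
    return upper < margin, upper


# Hypothetical numbers: a 1-point worse mean score with a 5-point margin.
ok_large, upper_large = non_inferior(20, 15, 200, 19, 15, 200, margin=5)
ok_small, upper_small = non_inferior(20, 15, 20, 19, 15, 20, margin=5)
```

With the same observed difference, the larger hypothetical trial meets the margin and the smaller one does not, which is why sample size needs to be planned around the non-inferiority question.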

Indicative of the expansion of the treatment portfolio and the changing prognosis for patients, the proportion of RCTs using the QLQ-MY20 increased over time from n = 0 in the first 5 years to n = 10/13 in the last 5 years. The proportion of trials in patients after their first or a subsequent relapse, relative to newly diagnosed patients, increased from n = 1 in the first 5 years to n = 9/13 in the last 5 years. Over these periods there was no observed trend for QoL endpoints to move up the endpoint hierarchy; however, this could be due to the inevitable time lag between research and publication of findings. Similarly, there were no trends or improvements in the reporting of QLQ-MY20 results in tables/figures rather than text alone; the reporting of the QLQ-MY20 generally included tables and/or figures throughout the period.

There were a few instances where limitations of the QLQ-MY20 were highlighted by individual papers. One issue was the need for work on meaningful change thresholds for the QLQ-MY20. Although this has since been addressed by Sully et al. [31], more studies in this area would be beneficial. Some studies used an additional peripheral neuropathy questionnaire alongside the QLQ-MY20, and one noted a discrepancy between the QLQ-MY20 item ‘tingling hands and feet’ and clinician-reported peripheral neuropathy, which could indicate the need for more detailed items on this side effect in the QLQ-MY20. In the psychometric studies, the instrument performed consistently well. One potential issue found in some studies was a ceiling effect for the BI subscale; this may be specific to certain populations and warrants further investigation.

Potential limitations of our study include the comprehensiveness with which usage of the QLQ-MY20 was captured. Our search will have identified studies reporting results from the QLQ-MY20, but we acknowledge that it will have missed any studies that used the instrument without publishing results from it. Key multiple myeloma trials will also be absent from this review because they used only the QLQ-C30 or a different PRO. Nevertheless, across a broad range of studies in which the QLQ-MY20 has been used, we have shown some of the trends over time in patient populations and study designs.

In conclusion, the QLQ-MY20 has been shown to perform well psychometrically since its initial validation. The QLQ-MY20 scales have been supportive of clinical endpoints in RCTs and have been used to understand patients’ QoL alongside improved response and time to progression outcomes. To maintain content validity in today’s MM treatment landscape (i.e., to ensure the instrument is relevant to MM patients and captures their symptoms and the side effects of novel treatments and later lines of therapy), qualitative interviews with patients and health care professionals, and an update to the QLQ-MY20 to incorporate the findings, are underway.