Introduction

Biological disease-modifying antirheumatic drugs (DMARDs) and targeted therapies are relatively new drugs in the history of rheumatology. They have been used to treat various rheumatic diseases, such as rheumatoid arthritis (RA) and ankylosing spondylitis (AS), and exert their therapeutic effects by antagonizing the actions of various molecules or immune cells1. However, they can lead to unexpected adverse effects because of the intricate and systemic nature of the immune system. Therefore, pharmacovigilance for unknown adverse drug reactions (ADRs) is essential2. In contrast to ADR data from clinical trials conducted in highly controlled environments, ADRs have been identified more comprehensively from real-world data, which encompass a wide array of adverse events occurring in diverse patient populations and scenarios; such data are therefore notably important. Real-world data sources include post-marketing surveillance (PMS), spontaneous reporting systems (SRS), electronic health records (EHRs), claims data, and drug registries, which cover larger numbers of patients of various ages, with various comorbidities and concomitant medications.

Pharmaceutical companies, physicians, or patients report possible ADRs to SRSs; representative SRSs include the World Health Organization's ‘International Pharmacovigilance Program,’ ‘FAERS’, ‘Spontaneous Adverse Drug Reaction Reporting System,’ and ‘Prescription Event Monitoring (PEM)’. A signal, expressed as a “drug-ADR pair”, indicates the statistical possibility of an association between an adverse effect and a suspected drug when the causal relationship is unknown or the evidence is not yet sufficient3. Government pharmacovigilance organizations and pharmaceutical companies generally detect signals through big data mining using measures of disproportionality, such as the proportional reporting ratio (PRR), reporting odds ratio (ROR), and Bayesian confidence propagation neural network (BCPNN)4,5,6. The signals are then validated, and some of them are prioritized and assessed to establish policies for safe drug use.

In contrast, adverse events in EHRs are observed during patient management and therefore have clearer causal relationships. Common data models (CDMs) are well-known standardized EHR data structures, and CDMs such as the Sentinel CDM and the Observational Medical Outcomes Partnership (OMOP)-CDM were originally developed for the pharmacovigilance of newly marketed drugs7,8. To date, studies on drug safety using EHR or CDM data have been actively conducted9,10,11. Some authors of this study developed an OMOP-CDM-based pharmacovigilance data-processing pipeline for the active surveillance of laboratory ADR signals and extracted six significant ADR signals causing ear disorders using CDM data12.

Other, newer sources of signals could be drug or disease registries run by countries or research networks13, in which participating researchers at medical institutes register patients and treatment-related items with higher accuracy and quality than in SRSs. In registries, cohorts are established and tracked, which enables inference of the temporal relationship between an agent and an ADR.

In summary, registry data, like EHR data, are significant as real-world data and benefit from healthcare professionals' direct reporting of ADRs. Many countries have actively established registries and collected data from them. Given that underreporting is a limitation common to all data sources, exploring registry data, which have not been widely used to identify signals, holds significant value. Thus, this study aimed to discover signals using registry data from Korea. The KOrean College of Rheumatology BIOlogics & Targeted Therapy Registry (KOBIO) is a nationwide web-based drug registry founded in 2012 under the Korean College of Rheumatology (KCR)14. By October 2021, it had registered 2624 patients with RA and 2240 patients with AS, with 8777 and 8198 follow-ups, respectively. To our knowledge, signals have not previously been extracted from registry data in rheumatology. In this study, we searched for novel signals of biological DMARDs or targeted therapies in patients with RA or AS using KOBIO data.

Methods

Definition of novel signals

The WHO definition of a signal is ‘reported information on a possible causal relationship between an adverse event and a drug’. In this study, ‘novel signals’ were defined as statistically significant ADRs of target drugs that were determined from the source data and were not reported in the latest versions of the FDA labels, either from clinical trials or PMS. We used KOBIO data as the source data and FAERS and KYUH-CDM laboratory data for the external validation of the novel findings. The data collection and overall analysis flow are shown in Fig. 1.

Figure 1

Data collection and overall analysis flow. KOBIO, KOrean College of Rheumatology BIOlogics & Targeted Therapy registry; RA, rheumatoid arthritis; AS, ankylosing spondylitis; ADR, adverse drug reaction; Drug*, drugs by brand name; RR, relative risk; CTZ, certolizumab; BAR, baricitinib; IXK, ixekizumab; FDA, Food and Drug Administration; FAERS, Food and Drug Administration Adverse Event Reporting System; CDM, common data model.

KOBIO data collection and preparation for analysis

We retrieved data on patients with RA or AS from the KOBIO database from December 2012 to June 2020. KOBIO enrolls patients with RA or AS separately when they start biological DMARDs or targeted therapies or switch from one agent to another. It then collects follow-up data yearly if the patients remain on the same agents, or whenever switching or discontinuation occurs. After patients provide informed consent, healthcare professionals typically conduct face-to-face interviews with them to collect comprehensive medical information, including relevant details regarding medications and ADRs. They also review medical charts and laboratory test results. The dataset is primarily segregated by disease (RA or AS) and by phase (initial or follow-up), and incorporates patients' demographic information, risk factors, comorbidities, relevant medication details, laboratory data, and, for the follow-up phase, reports of ADRs that occurred during the interim periods. Because KOBIO records prescribed biologics or targeted therapies by brand name, we converted originators and biosimilars into generic names for analysis (see Supplementary Table 1, which lists the drugs and disease codes).

Within the KOBIO registry, adverse drug reactions (ADRs) are systematically categorized according to organ systems, providing the option to choose predefined ADR categories or input additional details in a free-text format. The reporting system obliges the submission of the association between ADRs and the drugs (classified as related, unrelated, or indeterminate), ADR severity (categorized as mild/Grade I, moderate/Grade II, or severe/Grade III), and ADR outcome (indicating whether the ADR did not resolve, resolved, resolved with sequelae, had an unknown outcome, or resulted in death). In light of our study's primary focus on the identification of ADR signals, we specifically selected ADR associations classified as 'related' and 'indeterminate' while excluding ADR severity and outcome data, with our main emphasis placed on the presence or absence of ADR reports.

Furthermore, the KOBIO registry, although primarily serving as a drug registry, catalogues adverse reaction data separately by disease. During the data collection phase of our study, the medications used in RA and AS did not overlap, except for TNF inhibitors. The structural differences between the databases of the two diseases and the potential disease-specific variations in signals make it challenging to merge and analyze the data collectively. We therefore analyzed the RA and AS databases separately. After obtaining significant drug-ADR pairs for both diseases, we extracted those that overlapped with FAERS, merged them, and submitted them as candidate signals.

Patients who withdrew consent after registration were excluded from the analysis and each follow-up event was considered an independent case. Only patients who continued treatment with the same agents at follow-up were selected for valid analysis of cause-and-effect relationships. To facilitate external validation of the results with disparate data sources, we converted the ADR terms reported by KOBIO into the Medical Dictionary for Regulatory Activities Preferred Terms (MedDRA PTs).
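The case-selection rules described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the registry's actual export schema: the field names (`consent_withdrawn`, `status`, `brand`, `adrs`, `relation`, `term`), the toy rows, and the two-entry brand map are hypothetical, and the mapping actually used in the study is the one in Supplementary Table 1.

```python
# Hypothetical subset of a brand-to-generic map (the real map is in Supplementary Table 1).
BRAND_TO_GENERIC = {"Humira": "adalimumab", "Enbrel": "etanercept"}

# Only ADRs judged 'related' or 'indeterminate' are kept, as described in the text.
KEPT_RELATIONS = {"related", "indeterminate"}


def prepare_cases(followups):
    """Turn follow-up rows into analysis cases (one case per follow-up event)."""
    cases = []
    for row in followups:
        # Drop patients who withdrew consent and follow-ups where the agent
        # was switched or discontinued.
        if row["consent_withdrawn"] or row["status"] != "continued":
            continue
        terms = [adr["term"] for adr in row["adrs"]
                 if adr["relation"] in KEPT_RELATIONS]
        # Fall back to the recorded name when no mapping is known.
        generic = BRAND_TO_GENERIC.get(row["brand"], row["brand"])
        cases.append({"drug": generic, "adrs": terms})
    return cases


# Toy rows, invented for illustration only.
toy_followups = [
    {"consent_withdrawn": False, "status": "continued", "brand": "Humira",
     "adrs": [{"term": "Anaemia", "relation": "related"},
              {"term": "Headache", "relation": "unrelated"}]},
    {"consent_withdrawn": False, "status": "switched", "brand": "Enbrel",
     "adrs": []},
    {"consent_withdrawn": True, "status": "continued", "brand": "Enbrel",
     "adrs": []},
]
cases = prepare_cases(toy_followups)
```

Treating each retained follow-up as one case mirrors the statement that each follow-up event was considered independent. Severity and outcome fields are not carried over because the analysis uses only the presence or absence of an ADR report.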

Total ADRs consisted of ADRs reported by researchers from participating hospitals and laboratory abnormalities collected during follow-up. Contingency tables were used to calculate the relative risk (RR) of each reported ADR. For this analysis, we designated the number of cases with the drug of interest that also had the ADR of interest as ‘a’, and the number of cases with the drug of interest but without the ADR of interest as ‘b’. Similarly, we labeled the number of cases with the ADR of interest among other drugs as ‘c’, and the number of cases without the ADR of interest among other drugs as ‘d’. These counts were tabulated separately for each disease. An RR > 1 was considered statistically significant15.

$$RR = \frac{{a/\left( {a + b} \right)}}{{c/\left( {c + d} \right)}}$$

                 | ADR of interest | No ADR of interest
Drug of interest | a               | b
Other drugs      | c               | d
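As a worked illustration of the RR formula and the contingency table above, the following Python sketch computes the RR together with a 95% confidence interval. The interval uses the standard log-scale (Katz) approximation, which is an assumption here because the text does not state how the reported intervals were derived. The function name and the counts are hypothetical and are not KOBIO results.

```python
import math


def relative_risk(a, b, c, d, z=1.96):
    """Return (RR, lower, upper) for a 2x2 table with cells a, b, c, d.

    a: drug of interest, ADR of interest    b: drug of interest, no ADR
    c: other drugs, ADR of interest         d: other drugs, no ADR
    The interval is the log-scale (Katz) approximation, an assumption.
    """
    if a == 0 or c == 0:
        raise ValueError("RR and its log-scale interval need a > 0 and c > 0")
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Invented counts: 12 of 400 cases on the drug of interest had the ADR,
# against 30 of 3572 cases on other drugs.
rr, lower, upper = relative_risk(12, 388, 30, 3542)
```

With these invented counts the point estimate is 0.03 / (30/3572) = 3.572. Checking that the lower confidence bound also exceeds 1 is a stricter criterion than RR > 1 alone.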

Exclusion process for selection of the novel findings

Drug-ADR pairs with significant RRs (> 1) in the KOBIO data were checked against the latest FDA labels of each originator drug (2019–2021). Pairs not listed in the labels were selected as signal candidates for further analysis.

External validation

We conducted external validation of our significant drug-ADR pairs using two different datasets to ensure the reliability and generalizability of the study results and to enhance the overall quality of the research.

FAERS database

To externally validate the drug-ADR signals from KOBIO, we used publicly available FAERS data (January 2012 to December 2018, http://www.fda.gov/). In the FAERS database, drug-ADR pairs have been reported with either brand or generic names. We extracted brand names (DrugBank, https://go.drugbank.com/), then changed them into the corresponding generic names (see Supplementary Table 1). For the novel drug-ADR pairs, we performed disproportionality analysis with FAERS data using the R package “PhViD” according to the authors’ protocol (I. Ahmed & A. Poncet, 2016. PhViD: an R package for PharmacoVigilance signal Detection). We obtained the PRR, ROR, and information component (IC) of BCPNN with FDRs for significant drug-ADR pairs (FDR < 0.05).

$$PRR = \frac{{a/\left( {a + b} \right)}}{{c/\left( {c + d} \right)}}\quad ROR = \frac{a/b}{{c/d}}\quad IC = \log_{2} \frac{{a \times \left( {a + b + c + d} \right)}}{{\left( {a + c} \right)\left( {a + b} \right)}}$$

                 | ADR of interest (“cases”) | Other ADRs (“non-cases”)
Drug of interest | a                         | b
Other drugs      | c                         | d
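The three measures above can be evaluated directly from the four cell counts, as in the following Python sketch. It computes only the point estimates given by the formulas. It does not reproduce the Bayesian shrinkage or the FDR estimation performed by the PhViD package used in the study. The function name and the counts are hypothetical.

```python
import math


def disproportionality(a, b, c, d):
    """Point estimates of PRR, ROR, and IC from a case/non-case 2x2 table.

    a: drug of interest, ADR of interest    b: drug of interest, other ADRs
    c: other drugs, ADR of interest         d: other drugs, other ADRs
    All four cells must be positive for the ratios to be defined.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a / b) / (c / d)
    # Observed count a divided by the count expected under independence.
    ic = math.log2(a * n / ((a + c) * (a + b)))
    return {"PRR": prr, "ROR": ror, "IC": ic}


# Invented report counts, for illustration only.
measures = disproportionality(20, 80, 100, 9800)
```

For these invented counts, PRR = 0.2 / (100/9900) = 19.8, ROR = 0.25 / (100/9800) = 24.5, and IC = log2(50/3). An IC above 0 means the pair is reported more often than expected if the drug and the ADR were independent.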

KYUH-CDM laboratory data

One of the authors has developed and documented an algorithm for pharmacovigilance within the CDM referred to as MetaLAB16. This was designed to identify abnormal test results in response to drug usage for 102 laboratory signals. These signals encompass 38 instances of values exceeding upper limits, 39 instances of values falling below lower limits, and 25 instances of values falling outside the normal range.

In this study, we applied the CDM-based MetaLAB algorithm to KYUH-CDM data (January 2012 to December 2019) to detect ADR signals, using laboratory test results as supplemental information for statistical ADR assessment. Patients were selected if they had one of the specified disease codes and one of the relevant drug codes. All drug codes used during the target period were included. Patients were then cross-classified by exposure to the drug code of interest and by the presence or absence of the laboratory signal of interest, giving the counts 'a', 'b', 'c', and 'd' of the same contingency table as used for the KOBIO data. The Korean Standard Classification of Diseases (KCD) codes for diseases and the Anatomical Therapeutic Chemical Classification System (ATC) codes for biological DMARDs are shown in Supplementary Table 1. RRs were calculated, and an RR > 1 was considered significant.
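The laboratory-signal step can be pictured with the short Python sketch below. It is not the MetaLAB implementation. It only illustrates the three signal types mentioned above (above the upper limit, below the lower limit, outside the normal range) and how flagged results could be tallied into the four cells of the contingency table. The record fields, drug labels, and reference limits are hypothetical.

```python
from collections import Counter


def is_signal(value, lower, upper, kind):
    """Check one laboratory value against one of three signal types."""
    if kind == "high":       # value exceeds the upper limit
        return value > upper
    if kind == "low":        # value falls below the lower limit
        return value < lower
    if kind == "abnormal":   # value lies outside the normal range
        return value < lower or value > upper
    raise ValueError(f"unknown signal kind: {kind}")


def tally_cells(records, drug, kind):
    """Count records into cells a, b, c, d of the 2x2 contingency table."""
    cell_name = {(True, True): "a", (True, False): "b",
                 (False, True): "c", (False, False): "d"}
    counts = Counter({"a": 0, "b": 0, "c": 0, "d": 0})
    for rec in records:
        exposed = rec["drug"] == drug
        flagged = is_signal(rec["value"], rec["lower"], rec["upper"], kind)
        counts[cell_name[(exposed, flagged)]] += 1
    return counts


# Toy records with an invented reference range of 150-450.
toy_records = [
    {"drug": "drug_X", "value": 520, "lower": 150, "upper": 450},
    {"drug": "drug_X", "value": 300, "lower": 150, "upper": 450},
    {"drug": "drug_Y", "value": 480, "lower": 150, "upper": 450},
    {"drug": "drug_Y", "value": 200, "lower": 150, "upper": 450},
    {"drug": "drug_Y", "value": 140, "lower": 150, "upper": 450},
]
cells = tally_cells(toy_records, "drug_X", "high")
```

In this toy example the value 140 is below the lower limit, so it does not count towards a "high" signal and lands in cell d. The resulting counts can be passed to the RR formula given for the KOBIO data.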

Final selection of candidate signals

To increase the validity of the signals by signal enhancement, the intersection of the KOBIO and FAERS signals was selected and suggested as the final novel signal candidate.
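The exclusion and intersection steps amount to two set operations on drug-ADR pairs, sketched below in Python. The function name and the placeholder pairs are hypothetical and do not correspond to actual study results.

```python
def select_candidates(kobio_pairs, labeled_pairs, faers_pairs):
    """Return (novel pairs, final candidates) as sets of (drug, ADR) tuples.

    Novel pairs are significant KOBIO pairs that are absent from the FDA
    labels. Final candidates are the novel pairs that are also significant
    in FAERS.
    """
    novel = set(kobio_pairs) - set(labeled_pairs)
    final = novel & set(faers_pairs)
    return novel, final


# Placeholder pairs, for illustration only.
kobio = {("drug_A", "adr_1"), ("drug_A", "adr_2"), ("drug_B", "adr_3")}
labeled = {("drug_A", "adr_1")}
faers = {("drug_A", "adr_2"), ("drug_C", "adr_9")}
novel, final = select_candidates(kobio, labeled, faers)
```

Matching pairs across sources in this way requires that drug names and ADR terms be harmonized first, which is why generic names and MedDRA PTs were used throughout.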

Statistical analysis

Numerical data are expressed as mean ± standard deviation (SD) or median and interquartile range (IQR), and as numbers (%) for categorical variables. For the KOBIO data and KYUH-CDM, RRs were calculated from contingency tables of “exposure to a drug of interest” and “ADR of interest,” and > 1 was considered significant. When using the FAERS database, we conducted a case/non-case analysis using the PhViD package to calculate PRR, ROR, BCPNN, and FDR. Data mining and statistical analyses were conducted using OracleSQL (for FAERS), PostgreSQL (for KYUH-CDM), and the R software (version 4.1.1).

Ethics statement

This study was approved by the Institutional Review Board of Konyang University Hospital (IRB No. KY 2020-07-005-003) and complied with the Declaration of Helsinki. The requirement for written informed consent was waived by the approving authority because the patients registered in KOBIO had consented to the use of their data for research, and the EHR-based CDM data and FAERS dataset consisted of de-identified secondary data.

Results

Patients' demographics for the KOBIO data

The total number of patients at initial registration and follow-up, excluding patients who withdrew consent (RA, n = 81; AS, n = 54), was 2279 and 6908 for RA and 1940 and 6583 for AS, respectively (Fig. 1). Among the eight options for the status of drug use at each follow-up, we selected only patients who continued the same medication during follow-up (RA, n = 3972; AS, n = 4969) to obtain data on temporally related drug ADRs, excluding the effect of other biological DMARDs. Patients' demographics at the initial registration are shown in Table 1, with the numbers from follow-up. The male to female ratios were 1:4.8 and 3.2:1, and the mean ages of patients were 55.0 ± 13.0 and 39.3 ± 13.2 years for RA and AS, respectively. The most commonly used biological DMARDs for RA and AS were tocilizumab and adalimumab, respectively. Ninety-two percent of patients with RA were prescribed concomitant DMARDs, with methotrexate being the most prescribed; 12% of patients with AS were prescribed DMARDs, with sulfasalazine and methotrexate being the most prescribed drugs.

Table 1 Demographics of patients from the KOBIO data.

Significant relative risks of drug-ADR pairs from the KOBIO data

We calculated the RRs for each drug (RA, n = 10, AS, n = 8) and ADR (approximately 600), and the signal candidates were presented as “drug-ADR” pairs. For RA, the number of significant drug-ADRs with RR > 1 was 51 from eight agents and 633 ADRs, including 34 laboratory abnormalities (certolizumab not prescribed, and baricitinib unassociated with any significant ADR). For AS, the number of significant ADRs was 36 from four agents and 549 ADRs, including 33 laboratory abnormalities (certolizumab, ixekizumab not prescribed).

Exclusion process for selection of the novel findings

All candidate ADRs were searched in the latest FDA labels of the originators for the exclusion process. Sixteen drug-ADR pairs for RA and 10 for AS were reported in the labels from the observations in the clinical trials and/or PMS (see Supplementary Table 2, which demonstrates excluded KOBIO drug-ADR pairs previously reported in the latest FDA labels). The final 35 drug-ADR pairs with significant relative risks and 95% confidence interval (CI) for RA and 26 for AS are listed in Tables 2 and 3, respectively.

Table 2 Drug-ADR pairs with the significant RRs for RA from the KOBIO data.
Table 3 Drug-ADR pairs with the significant RRs for AS from the KOBIO data.

External validation

FAERS database

For external validation, the final drug-ADR pairs remaining after the exclusion process were searched in the FAERS database by generic name, and disproportionality analyses were conducted. The significant pairs identified by PRR and by ROR were identical, and the significant pairs identified by BCPNN included all of them, for both RA and AS (Tables 2 and 3). For RA, 13 of 35 drug-ADR pairs were also significant in the FAERS dataset (by any of BCPNN, PRR, and ROR), and for AS, 6 of 26 pairs were common; these are marked in the tables.

KYUH-CDM data

Using the codes for diseases and drugs in Supplementary Table 1, we retrieved 1411 patients with RA and 656 patients with AS from the CDM data. When we independently applied the MetaLAB algorithm to the patients, the significant drug–laboratory ADR pairs (RR > 1, p < 0.05) were 14 pairs for RA, and 9 pairs for AS. The results consistent with the KOBIO results were “TCZ-Anemia” and “TOF-Thrombocytosis,” observed for RA (see Supplementary Table 2).

The final selection of candidate signals

The intersection of the significant signals from KOBIO and FAERS (BCPNN, PRR, and ROR) was selected as the final candidate (Tables 2 and 3), and the final 14 drug-ADR pairs are listed in Table 4.

Table 4 Final drug-ADR signals suggested from the KOBIO data.

Discussion

ADRs are a serious problem that incurs various costs and losses for both patients and society, and drugs should be properly monitored for them. For decades, quantitative approaches, such as the signal detection of ADRs, have been applied, particularly for newer drugs17,18. Several drugs have been withdrawn from the market following signal detection with SRSs19,20. Biological DMARDs, which are manufactured from biological sources using recombinant techniques, have been widely prescribed for treating various rheumatic diseases, and they also require comprehensive and thorough pharmacovigilance3. The main data source for pharmacovigilance is the SRS; other longitudinal real-world data sources are EHRs and claims data. Additionally, multimodal signal detection has been performed to obtain more valid results by combining the results from SRSs, claims data, and EHRs21,22,23,24. Registries are another data resource for ADRs, and examples of nationwide registries for biological drugs are BSRBR (British Society for Rheumatology Biologics Register), DANBIO (Danish Database for Biological Therapies in Rheumatology), and ARTIS (antirheumatic therapies register, the Swedish biologics register). They include many patients on biological DMARDs nationwide and could be valuable resources for signal detection. However, research on signal extraction from drug registry data has not yet been actively conducted.

This is the first report of signal detection for biological DMARDs from a nationwide drug registry, with external validation using two different datasets: FAERS and single-centre CDM laboratory data. Before detecting the signals of the drugs from the KOBIO data, we first assessed the performance of extracting known drug-ADRs. We found that several previously reported ADRs of several drugs were extracted from KOBIO; for example, acute nasopharyngitis was extracted for etanercept and golimumab, anemia for etanercept, golimumab, and tofacitinib, and hyperlipidemia for tocilizumab (see Supplementary Table 2).

The final 14 drug-ADR pairs were identified as signals mainly by calculation, using operational definitions with statistical significance. Therefore, by definition, there is no related literature, and we instead searched for mutations with traits similar to the relevant signal in the genes affected by each drug to seek possible associations. Tofacitinib-thrombocytosis was the only drug-ADR pair that was significant in all of the KOBIO, FAERS, and CDM results. Tofacitinib is a well-known drug that targets JAK1, JAK2, JAK3, and TYK2. Although tofacitinib mainly inhibits JAK3 and JAK1, it also inhibits JAK2 to a lesser extent; therefore, it decreases hematopoiesis via thrombopoietin, erythropoietin, and GM-CSF25. However, thrombocytosis was observed in this study. When we searched the GWAS catalog (https://www.ebi.ac.uk/gwas/), rs150221602-C, rs149757596-C, rs150221602-C, rs41215003-A, rs41316003-A, rs150221602-C, rs149757596-C, rs150221602-C, rs776830350-?, rs150221602-?, rs77375493-T, rs41316003-A, rs41316003-A, and rs41316003-? were associated with JAK2, and mutations associated with thrombocytosis were identified, leaving room for a potential relationship with the drug. As a subsequent step, it is crucial to determine the priorities of signals for evaluation, and the signals should undergo further external validation through reviews of SRSs, EHRs, literature, experimental trials, and the study of relevant mechanisms17,26,27,28.

This study shares limitations inherent to pharmacovigilance studies, including insufficient data sources, underreporting, small sample sizes, and varying report quality. In addition to these general limitations, several study-specific limitations should be noted. First, we were unable to integrate the datasets for TNF inhibitors in both diseases from the outset. Instead, the merging occurred at a later stage, when the results were combined with the FAERS signals. Second, our study included only cases in which the same medication was continued until the next follow-up, excluding instances of medication discontinuation or switching to another drug registered in KOBIO. This decision was based on the common practice of switching biological DMARDs or targeted therapies because of inefficacy without wash-out periods, which can complicate accurate ADR assessment. Third, for external validation, we used laboratory data from a single center's CDM with MetaLAB, because the conversion of EHR data into CDM was still in progress during the data acquisition period. To improve overall validity and quality, future analyses should incorporate CDM data covering the majority of EHR data from multiple hospitals across South Korea. Furthermore, for analysis purposes, we categorized drugs by their generic names. However, biosimilars may not necessarily share the same ADRs as their originator drugs3,29.

In conclusion, we identified 14 novel drug-ADR signals of biological DMARDs and targeted therapies in the rheumatology field from KOBIO data. Further evaluation and external validation using other databases and literature should be conducted to assess the conclusive causal relationships between these drug-ADR signals.