Introduction

Large earthquakes are one of the most devastating natural hazards, not only because they cause a significant number of casualties, but also because they lead to widespread infrastructure damage. Researchers from multi-disciplinary backgrounds have been endeavouring to understand their underlying mechanisms and forecast their occurrences. Elastic rebound theory1 forms the basis of the standard earthquake cycle of strain accumulation and release. This theoretical basis, combined with analysis of long-term earthquake records, supports the proposition that the recurrence times of large earthquakes along the same fault can be modelled with renewal processes2,3,4,5,6. A renewal process is a statistical model that describes event occurrences in time, treating each new occurrence as a renewal in which the system is reset. It assumes that the times between events (the inter-event times, such as the time between two consecutive earthquake occurrences) are independent and identically distributed. Probabilistic Seismic Hazard Analysis7 frequently uses the Poisson process, a special type of renewal process in which events effectively occur randomly in time. Recent analysis of global paleoseismic records nevertheless suggests that large earthquake recurrence on individual fault segments is typically more periodic than expected from a Poisson process8,9,10. Such periodic patterns can be captured by other renewal processes, such as the Gamma renewal process. An alternative to renewal processes for modelling paleoseismic records is the Long-Term Fault Memory Model11, which assumes that the timing of future events depends not only on the time elapsed since the most recent event (as in renewal models) but also on the previous inter-event times. However, when considering the global dataset and synthetic earthquake records, there appears to be no significant correlation or anti-correlation between successive inter-event times for the vast majority of earthquake records9,12. Therefore, in this study we focus only on renewal processes.

Previous earthquake forecasts using paleoseismic data were mainly based on a single best model, selected from a number of candidate models according to an information criterion such as the Akaike Information Criterion (AIC13). This approach ignores the fact that there can be model uncertainty, i.e. uncertainty as to which model is best. If the AIC values for several models are not very different, it may be better to use a combination of the forecasts from those models, rather than rely solely on the forecast from the model with the lowest AIC. In real-world applications such model uncertainty may have a dominant effect over uncertainty in the estimates of the parameters in each model14,15. This becomes a serious limitation when the sample size is relatively small (only 14 out of the 93 earthquake records considered in this study have more than 10 events). Failing to acknowledge model uncertainty by selecting a single best model may produce erroneous or unrealistically precise forecasts.

In this study, we use five candidate renewal processes to investigate the recurrence patterns of large earthquakes using paleoseismic records from 93 worldwide fault segments previously compiled in Griffin et al.9 and supplemented with additional records from other publications (see Data availability section). We use Bayesian model-averaging to carry out probabilistic forecasts of future large earthquakes to account for model uncertainty. We compare the forecasts from each single model with those from model averaging. We find that there is no single best model that universally describes the recurrence of large earthquakes for the 93 fault segments considered here, nor for fault segments with the same faulting styles, from the same tectonic region, or even within the same fault system. We provide the distribution of the probability that at least one large earthquake will occur in the next 50 years along each fault segment. We also carry out a leave-one-out test to compare the performance of the model-averaged forecast and the Poisson forecast, as the latter is frequently used in national hazard models7.

Results

No universal model for large earthquakes

Modelling of large-earthquake recurrence times from paleoseismic records typically either uses the Brownian passage-time (BPT) renewal process, because of its physical explanation of the earthquake process4,6,16, or selects the best model among a few candidate renewal processes based on an information criterion3,17. As discussed above, inference based on a single best model ignores potential model uncertainty.

In order to minimise the impact of model uncertainty, we use a model-averaging approach and consider geological and historical records of large-earthquake occurrence times from 93 worldwide fault segments (Table 1). Here, we consider a “fault segment” to be a section of a fault that is recognised as being geometrically distinct, and that is likely to define the boundaries of at least some earthquake ruptures on the fault. Some models, such as the third Uniform California Earthquake Rupture Forecast (UCERF316) and the 2022 New Zealand National Seismic Hazard Model18, relax the strict fault segmentation assumption in their earthquake probability model component by considering interactions between different fault subsections. Nevertheless, for each individual fault subsection the fundamental elastic-rebound theory part of the model uses a BPT renewal model. Here we focus on the elastic-rebound theory part of the earthquake probability model component, and thus do not consider interactions between fault segments. Once this layer of modelling is improved, interactions between subsections can be considered in order to forecast large earthquakes that may rupture multiple fault subsections and/or faults. Improved forecasts from this layer will strengthen a holistic hazard model that includes fault models, deformation models, earthquake rate models, and earthquake probability models.

Table 1 Probability of at least one event occurring in the next 50 years from the 93 fault segments

We select fault segments with records of at least five large earthquakes in order to be able to fit two-parameter models to the inter-event times. We only consider the occurrence times of earthquakes that left detectable geological evidence9. Earthquakes that leave a geological signature are assumed to be large and significant for seismic hazard, but we do not explicitly consider their magnitudes, or the magnitude distribution (i.e. characteristic vs Gutenberg-Richter), in our model. The number of large earthquakes in the paleoseismic record of each fault segment is small, with a maximum of 35 events for the Chile Megathrust19, only 3 fault segments having more than 20 events, and 14 having more than 10 events. The measurement errors associated with most dated earthquake ages are large, resulting in 1σ uncertainties of 25 ± 12% for the inter-event times considered here. To make the best use of all event ages and their measurement errors, we simulate 100 Monte Carlo (MC) samples from the empirical distribution of the occurrence times for each fault segment provided in the literature. If any earthquake has more than one dated age published in the literature, we take the average of the different age distributions and sample from this averaged distribution.
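As an illustration of this sampling step, the following is a minimal R sketch under assumed data structures: the object `age_draws` and the pooling of multiple published age distributions are hypothetical simplifications of the actual workflow.

```r
# Minimal sketch: draw one Monte Carlo sample of event ages for a fault segment.
# `age_draws` is assumed to be a list with one element per earthquake, each
# containing draws from the published age distribution(s) of that event;
# multiple published ages for an event are pooled, approximating the averaged
# distribution described above.
sample_occurrence_times <- function(age_draws) {
  ages <- vapply(age_draws, function(d) sample(unlist(d), 1), numeric(1))
  sort(ages)  # keep the sampled ages in chronological order
}

# 100 MC samples of the occurrence times (one column per sample)
mc_samples <- replicate(100, sample_occurrence_times(age_draws))
```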

Together, the 100 MC samples of occurrence times for each fault segment form the dataset for that segment. We fit five different renewal processes to each dataset: the Poisson process and the Gamma, Weibull, BPT and lognormal renewal processes (see Methods). In each model, we allow the parameters to vary between MC samples from the same fault in a way that captures the similarities between the 100 MC samples. We use a Markov Chain Monte Carlo (MCMC) algorithm to generate samples from the joint posterior distribution of all the model parameters for each model. These posterior samples of parameters are then used to simulate future earthquake occurrence times for each fault, thereby providing forecasts from each model.

For each model fitted to each fault segment, we calculate the Watanabe-Akaike Information Criterion (WAIC20), a measure of the predictive performance of a model in Bayesian analysis. For model-averaging, we calculate the WAIC weight15 of each model for all 93 fault segments (see Methods), as shown in Fig. 1a. Model-averaged forecasts of future earthquakes for each fault are obtained by combining the forecasts from the individual models using these WAIC weights (see Methods). For a fault segment where any single model has weight ≥0.95, the model-averaged forecast is very close to the forecast from that model (i.e., the single-best model), as we would expect. The Weibull model best fits 41 (44.1%) fault segments, the Gamma model 5 (5.4%), the BPT model 4 (4.3%), and the lognormal model 15 (16.1%). The Poisson model never has a weight greater than 0.7. The remaining 28 (30.1%) fault segments have weights < 0.95 for every single model, suggesting that model-averaging will be most beneficial for those fault segments.

Fig. 1: The Watanabe-Akaike Information Criterion (WAIC) weight for each model for all 93 fault segments.

a Model weights for each of the 93 fault segments. Frequency (b) and proportions (c) of preferred models against number of events recorded at a fault segment. BPT Brownian passage-time, MA model-averaging.

WAIC weights quantify how much better one model is expected to predict than another, as they show how much weight should be given to the prediction from each model when calculating a model-averaged prediction. Based on the WAIC weights in Fig. 1a, we can see that the predictive performance of the Weibull model is not uniformly better than that of the others. For 33 fault segments, the WAIC weight for the Weibull model is close to 0, which suggests that for these fault segments the estimated predictive accuracy of the other models is much better than that of the Weibull model. Among the 11 San Andreas Fault segments we find three different best-fit models. Although they all suggest that every segment exhibits quasi-periodic behaviour (see discussion later), the segments with larger sample sizes were best fit by a Weibull model. It is unclear if this is simply an outcome of sampling or due to real differences in recurrence behaviour between the different segments, including how neighbouring fault segments interact. Variability in observed earthquake recurrence behaviour at paleoseismic sites on the San Andreas Fault has been proposed to be (at least partially) due to overlap of ruptures occurring on neighbouring segments21, and this has been supported by studies using earthquake simulators22. In contrast, relatively strong quasi-periodic recurrence on the Alpine Fault has been attributed to its geometric simplicity and relative isolation from other faults23, although its fault geometry has also been invoked to explain the variability in earthquake inter-event times24. The persistence (or otherwise) of rupture barriers between segments25 may also be a significant factor controlling the distribution of inter-event times observed on a fault segment. Therefore, because of the limitations of the available data, even within a single well-studied fault system, we cannot use a universal single best model, and it is not clear that one exists.

Best-model frequencies in Fig. 1b, c indicate that the Weibull renewal process is more likely than the BPT renewal process to best reproduce recurrence times for fault segments with longer records. The fault segments for which the Weibull renewal process is the single-best model have on average about 1.59 (95% CI 1.10–2.47) times as many large earthquakes as those for which the BPT renewal process is the single-best model. The fault segments that have the BPT renewal process as the single-best model all have fewer than 10 events in their paleoseismic records. When we removed the last event in each record, in order to carry out retrospective forecasts (see section Assessment of Prediction Error), the fault segments with more than 15 events all had the Weibull renewal process as the single-best model; only one fault segment, with 12 events, had the BPT renewal process as the single-best model. It is therefore difficult to determine whether the dependence of the favoured model on sample size is due to real differences in the best model or simply an outcome of the small amount of data.

Previous studies26,27,28 have shown that the standard deviations of the scaled (divided by the mean value) inter-event times along several fault segments appear to be constant. For each of the 93 fault segments, we calculated the scaled inter-event times for each MC sample, and then reported the median and the 2.5% and 97.5% quantiles of the standard deviations of the scaled inter-event times across the 100 MC samples. Fault segments with higher rates of earthquake occurrence appear to have smaller standard deviations of the scaled inter-event times (Supplementary Fig. 1e). The median standard deviations of the scaled inter-event times for the 41 fault segments that were best fit by a Weibull model are all smaller than 0.8, except for one case. In contrast, more than half of the median standard deviations of the scaled inter-event times for the 15 fault segments that were best fit by a lognormal model are larger than 0.8 (Supplementary Fig. 1a, d). The large median standard deviations are associated with fault segments whose inter-event times have long-tailed distributions, which result in a large spread of data values. These long-tailed distributions are best fit by a lognormal renewal process.
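For reference, the quantity summarised here is simply the coefficient of variation of each MC sample's inter-event times; a minimal R sketch, assuming `mc_x` is a list holding the inter-event-time vector of each of the 100 MC samples for one fault, is:

```r
# Standard deviation of the scaled inter-event times (equivalently, the
# coefficient of variation) for a single MC sample
scaled_sd <- function(x) sd(x / mean(x))

# Median and 2.5%/97.5% quantiles across the 100 MC samples for one fault
quantile(sapply(mc_x, scaled_sd), probs = c(0.025, 0.5, 0.975))
```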

Probabilistic forecasting

The estimated probability that at least one large earthquake will occur in the next 50 years along each fault segment is shown in Fig. 2 and Table 1. The highest probabilities of a large earthquake in the next 50 years appear to be along the Carrizo and Wrightwood segments of the San Andreas Fault, and the Mentawai segment of the Sumatra megathrust, with median probabilities over 50%. The upper limit of the 95% credible interval (CI) of this probability exceeds 75% for the Wrightwood segment of the San Andreas Fault. These results are consistent with previous suggestions that these faults are in or approaching the beginning of a new seismic cycle29,30,31,32,33. Other sites with high probabilities of rupture (>20%) in the next 50 years include the Alpine Fault, the Chile Margin, the Hayward Fault, most other segments of the San Andreas Fault, the San Jacinto Fault, and certain segments of the Dead Sea Transform and North Anatolian Fault (Table 1). If one fault segment has a high probability of rupture, it may affect the probability of rupture at a neighbouring fault segment. This is not built into our method for the global data analysis, but it can be considered for a particular local fault system. The lowest probabilities of a large earthquake in the next 50 years are along the Kusaihu segment of the East Kunlun Fault, the Futugawa Fault, the Eastern Piedmont segment of the Helanshan Fault, and the Laohushan segment of the Qilianshan Fault, with the upper limit of the 95% credible interval of these probabilities all being lower than 0.001%. On average, faults along a plate boundary are about 32 (95% CI 9–125) times more likely to have a large earthquake in the next 50 years than intraplate faults.
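These probabilities are conditional on no event having occurred between the most recent recorded earthquake and 2022. As a hedged illustration of the calculation for a single posterior draw, using the Weibull parameterisation given in the Methods (the function and argument names are ours, not from the study's code):

```r
# Conditional probability of at least one event in the next `horizon` years,
# given no event between the last recorded earthquake (t_last) and `now`,
# for one posterior draw of a Weibull renewal model
prob_event_next_50yr <- function(t_last, alpha, lambda, Z, Y,
                                 now = 2022, horizon = 50) {
  shape <- Z * alpha
  scale <- 1 / (Y * lambda)            # base R uses a scale, not a rate, parameter
  elapsed <- now - t_last
  s_now   <- pweibull(elapsed, shape, scale, lower.tail = FALSE)
  s_later <- pweibull(elapsed + horizon, shape, scale, lower.tail = FALSE)
  1 - s_later / s_now
}
```

Applying this to every posterior draw, and combining models with the WAIC weights, gives the posterior distribution of the 50-year probability reported in Table 1.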

Fig. 2: Forecast probability of an event occurring in the next 50 years for the 93 fault segments.

The values are the medians of the posterior forecast probabilities. a World map; b San Andreas Fault segments and surroundings; c central China; d New Zealand. Scale bars in b–d are approximate. ArcGIS software by Esri was used to create the map. Basemap data sources: ETOPO elevation model (ETOPO 2022, 60 Arc-Second Resolution, Bedrock elevation geotiff)50, GNS Science, Natural Earth, USGS. The map projection is WGS 1984 Web Mercator (auxiliary sphere), WKID 3857.

Model-averaging and best-model approaches can give significantly different forecast probabilities for an earthquake occurring within the next 50 years. The maximum difference in the forecast probability between the two approaches is about 14% for the Thousand Palms segment of the San Andreas Fault, where the model-averaging approach gives a higher probability than the best-model approach (Table 1). Having said that, for about 90% of the fault segments, this difference is less than 0.1%. When considering the forecast occurrence times, for the 28 fault segments where the single-best model has a WAIC weight less than 0.95, the two approaches give quite different results. The maximum difference between the credible intervals obtained using the model-averaging and single-best model approaches is over 10,000 years, and the difference is over 50 years for 43% of the 28 fault segments that benefited from a model-averaging approach. Figure 3 shows the forecast occurrence time of the next large earthquake along each fault using the model-averaging approach, while Table 1 lists the probabilities of a large earthquake occurring within the next 50 years. A comparison of the forecasts from different models is in the supplement (Supplementary Figs. 2–4).

Fig. 3: Forecast occurrence times of the next large earthquake for the 93 fault segments.

The x-axis is in years CE.

A quantitative comparison of our probabilistic forecasts with those published in other studies is challenging (Supplementary Data 1). Reports of the mean recurrence interval estimates are equivalent to forecasting the next occurrence time using a Poisson process (this comparison is in Supplementary Figs. 2–4). A meaningful comparison is difficult when the forecast of the probability of the next earthquake occurrence is specified in terms of a start date and a fixed time period, as these vary between studies. For example, UCERF3 provides 30-year probabilities from 2014 CE aggregated by parent fault section16, whereas we provide 50-year probabilities from 2022 CE on each fault segment. The UCERF3 forecasts for the San Andreas fault segments all have similar mean values and overlapping ranges, whereas our forecast probabilities show greater variability between segments, with several non-overlapping credible intervals (Supplementary Fig. 5). We found previous forecasts for another eight fault segments in our study. All except two of our forecasts overlap within uncertainties with previous values, although our uncertainty bounds are typically narrower (Supplementary Fig. 5). Having said that, UCERF3 provides the ranges of forecast probabilities rather than 95% confidence intervals.

Clustering or periodic behaviour

For a Gamma or Weibull renewal process, if the estimated shape parameter α < 1 then the process tends to show clustering behaviour, while if α > 1 the process tends to show quasi-periodic behaviour, with larger α suggesting more periodic behaviour. For a BPT renewal process with probability density function as defined in Eq. (3), a smaller coefficient of variation β (for β < 1) gives a more symmetrical probability density function of inter-event times and hence more periodic behaviour. A lognormal renewal process describes quasi-periodic behaviour, with a smaller standard deviation σ corresponding to more periodic behaviour.
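One way to see the distinction is through the hazard (instantaneous event rate) function; a brief R illustration for the Weibull case follows, with arbitrary example parameter values:

```r
# Weibull hazard h(t) = alpha * lambda * (lambda * t)^(alpha - 1):
# increasing in t when alpha > 1 (quasi-periodic: an event becomes more likely
# as time since the last event grows), decreasing when alpha < 1 (clustering)
weibull_hazard <- function(t, alpha, lambda) {
  alpha * lambda * (lambda * t)^(alpha - 1)
}

curve(weibull_hazard(x, alpha = 2.0, lambda = 1 / 300), from = 1, to = 1500,
      xlab = "Years since last event", ylab = "Hazard")                       # quasi-periodic
curve(weibull_hazard(x, alpha = 0.5, lambda = 1 / 300), add = TRUE, lty = 2)  # clustering
```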

Based on the parameter estimates from the Gamma and Weibull renewal processes, five fault segments appear to show clustering behaviour, with the upper 95% credible limit of the shape parameter from each of these models being less than 1. These are Cadell, Dunstan, Lake Edgar, Solitario Canyon, and Waitangi, all of which have low earthquake occurrence rates. For Waitangi, the BPT renewal process has a WAIC weight over 0.95, and the estimates of β are over 2.5, confirming clustering behaviour. For Lake Edgar, the Gamma renewal process has a WAIC weight over 0.95. Six fault segments (Dead Sea Beteiha, Dead Sea Qatar, Dead Sea Taybeh, Langshan Piedmont Xibulong East, Reelfoot, and Wharekuri) appear to show near-Poisson behaviour, with the 95% credible interval of the shape parameter for both the Gamma and Weibull models containing 1. The remaining 82 fault segments show quasi-periodic behaviour. These results are consistent with a previous study9 that used different tests to check for quasi-periodic recurrence behaviour.

A regression model for the relationship between the shape parameter of the Weibull renewal process and the earthquake rate, tectonic setting, faulting type, and number of earthquakes of each of the 93 fault segments (Fig. 4) suggests that when the earthquake rate increases 10-fold (e.g., from 0.0001 to 0.001), the shape parameter of the Weibull renewal process increases by about 24% (95% CI: 5%–45%). This suggests that fault segments with higher earthquake rates tend to have more periodic behaviour. The shape parameter of the Weibull renewal process for fault segments located in a stable continental intraplate setting is about 87% (95% CI: 51%–148%) of that for fault segments located at or near a plate boundary. Although the 95% CI is wide and covers 100%, the posterior density plot (Supplementary Fig. 6) suggests that there is a high probability (0.7) that the latter are more periodic than the former. There are only five fault segments from a stable continental intraplate setting, so more data from intraplate fault segments are needed to reach a more robust conclusion. These findings are consistent with, and add nuance to, those from past studies9,10. The shape parameter of the Weibull renewal process for fault segments located in an active intraplate setting (predominantly faults in central China) is about 1.4 (95% CI: 1.1–1.9) times that for fault segments located at or near a plate boundary, and for fault segments located in a subduction region it is about 1.9 (95% CI: 1.1–3.3) times that for fault segments at or near a plate boundary, suggesting that both of these settings appear to be more periodic than plate-boundary faults. On average, the Weibull shape parameter for reverse faults is about 69% (95% CI: 46%–104%) of that for normal faults, suggesting that large earthquakes recur more periodically on normal faults than on reverse faults. Note that obtaining long earthquake records for reverse faults is often more difficult than for normal faults, due to progressive burial of the evidence for previous earthquakes. With more data becoming available in the future, one could investigate whether variations in strain rate or kinematics have a significant impact on record length and event preservation.
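The exact form of this regression is not spelled out here, so the following R sketch is only a hedged illustration of such a model, with a hypothetical data frame `fault_df` (one row per fault segment) and assumed column names; the published analysis may instead use a Bayesian regression on the posterior summaries of the log shape parameter.

```r
# Hypothetical illustration: regress the log Weibull shape parameter on fault
# characteristics (column names are assumptions, not from the study)
fit <- lm(log_weibull_shape ~ log10_rate + tectonic_setting + faulting_type + n_events,
          data = fault_df)
summary(fit)

# Exponentiating the coefficient on log10_rate gives the multiplicative change
# in the shape parameter per 10-fold increase in earthquake rate (reported
# above as roughly 1.24, i.e. a ~24% increase)
exp(coef(fit)["log10_rate"])
```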

Fig. 4: Relationship between the Weibull shape parameters and the fault characteristics.

The x-axis is the earthquake rate per year (on a log10 scale) estimated from a Poisson process. The y-axis is the mean of the log shape parameter of the Weibull renewal process fitted to each fault segment. The larger the shape parameter, the more periodic the behaviour. Note that some of the pattern observed will be influenced by the number of earthquakes along each fault segment, which is not plotted.

The likelihood of systematically missing very short inter-event times in the paleoseismic records and hence biasing our analysis is low. If fault re-rupture commonly occurred shortly after previous earthquakes, then we would expect to see this frequently in the historical record of surface rupturing earthquakes, which we largely do not34. While preservation of paleoseismic evidence is an important consideration in the interpretation of any paleoseismic record, the characteristics of the geomorphic and geological setting, and the relative rate of tectonic processes to non-tectonic geomorphic processes, will control whether evidence of past earthquakes is preserved35. Ideally, paleoseismic studies consider these factors in site selection and interpretation; while it is acknowledged that missed events are a possibility, it is not clear that this should lead to a systematic bias in our statistics towards or away from periodicity.

Assessment of prediction error

For each fault, we removed the last event in the record in order to carry out retrospective forecasts using both the model-averaging and single-best model approaches. The single-best model remained the same as that for the full dataset for 49% of the fault segments (46 out of 93). The single-best model appears more likely to change for fault segments with fewer recorded earthquakes: fault segments for which the single-best model changed have on average about 27% (95% CI: 11%–39%) fewer recorded earthquakes than those for which it remained the same. This demonstrates the large model uncertainty in paleoearthquake data, again suggesting that a model-averaging approach is preferable.

Figure 5 shows the 95% credible intervals of the forecast of the last earthquake occurrence time, with 0 representing the mean of the recorded last earthquake occurrence time. Out of the 93 fault segments, the model-averaged forecast successfully covered 79 of the mean true occurrence times. The forecasts from the Poisson process successfully covered 89 of the mean true occurrence times, which at first glance may suggest that it outperforms the model-averaged forecasts. However, it does this by having much wider credible intervals (on average twice as wide as the credible intervals from the model-averaged forecasts); i.e. it has much more uncertainty in the forecast. When examined in more detail, we see that model-averaged forecasts routinely outperformed Poisson process forecasts, provided that there were sufficient events left in the record. Specifically, about 83% (77 of the 93 fault segments) of the model-averaged forecasts have much smaller mean squared errors (MSEs) than the forecasts from the Poisson process (Supplementary Fig. 7). The MSE is the average squared difference between the forecast value and the true value, which is equal to the sum of the variance and the squared bias, and provides a measure of the trade-off between accuracy and precision. For about half of the fault segments (45 of the 93), the MSEs of the forecasts from the Poisson process are more than twice those of the model-averaged forecasts (Supplementary Fig. 8). The 14 fault segments for which the model-averaged forecast 95% credible interval did not cover the true mean were characterised by few events being left in the record: 6 had only 4 events left in the record and thus too few to fit models with more than two parameters, while a further 7 fault segments had fewer than 7 events. Even though in some situations with a small number of events in the record the less informative, more uncertain Poisson-based forecasts seem to cover the true value, the majority of fault segments with small numbers of events are still better represented by the model-averaging approach (e.g., 23 of the 30 fault segments with 4 events in the retrospective forecasts have model-averaged forecasts with much smaller MSEs than the Poisson forecasts). It is anticipated that for most hazard modelling purposes the smaller errors associated with the model-averaged forecasts favour their use. Having said this, the Poisson process may still be a valuable model when limited data are available, which is the case for many fault segments that are not included in this study because they have fewer than five events in the record.
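For clarity, the MSE used in this comparison can be computed directly from the posterior forecast sample; a minimal sketch follows, with object names of our own choosing:

```r
# Mean squared error of a retrospective forecast: `forecast_draws` is the
# posterior sample of the forecast occurrence time of the withheld event and
# `t_true` is the mean recorded occurrence time of that event.
# mean((draws - truth)^2) decomposes into forecast variance + squared bias.
forecast_mse <- function(forecast_draws, t_true) {
  mean((forecast_draws - t_true)^2)
}
```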

Fig. 5: Retrospective forecast of the occurrence time of the last earthquake.

Fault ID is numbered as per the list in Table 1. The forecasts from the model-averaging (MA) approach and the Poisson process are presented here. Markers show the medians and 95% credible intervals of the forecasts. We centred the estimated values by subtracting the mean occurrence time of the last earthquake in the paleoseismic records. 95% CI, 95% credible interval.

The MSEs of the retrospective forecasts from the single-best model approach are very close to those from the model-averaging approach for the majority of the fault segments (Supplementary Fig. 9), all within a factor of two. Model-averaging with WAIC weights is not specifically designed to achieve a smaller MSE than the single best-model approach. However, when there is some uncertainty as to the best model, model-averaging outperforms a single best model primarily by better representing all of the uncertainties.

The Bayesian model-averaging approach presented here explicitly considers model uncertainty based on the data and associated measurement errors, rather than relying on selection of a best model. Retrospective testing shows that model-averaging provides more informative and accurate forecasts compared with a single-best model approach or assuming a Poisson process (i.e., random earthquake recurrence). The earthquake probabilities presented in this study also provide a testable hypothesis of future earthquake occurrence that can be evaluated at a global scale.

Methods

Models

For each fault segment, we obtain 100 sequences of Monte Carlo (MC) samples for the ages of the sequence of large earthquakes in the paleoseismic record (see Data and Resources section). Each sequence of MC samples is then considered a realisation of the occurrence times of large earthquakes along that fault9,36. We denote them by \(t_{k0} < t_{k1} < \ldots < t_{kN_i} \le T\), where \(k = 1, 2, \ldots, 100\) denotes the \(k\)th MC sample, \(N_i\) denotes the number of earthquakes in the record for the \(i\)th fault segment with \(i = 1, 2, \ldots, 93\), and \(T\) denotes the censoring time, which we take to be the year 2022. The inter-event times for the \(k\)th MC sample of the \(i\)th fault are then \(x_{k1} = t_{k1} - t_{k0}, \ldots, x_{kN_i} = t_{kN_i} - t_{k(N_i-1)}\).

For each earthquake record, we fit the following five models. The first is a Poisson process with occurrence rate \(Z_k\lambda\) for the \(k\)th sequence of MC samples, where \(Z_k\) is a random variable (\(k = 1, \ldots, 100\)) capturing the similarities between the 100 MC samples for each fault. The second model is a Gamma renewal process, with the inter-event times for the \(k\)th sequence of MC samples following a Gamma distribution with probability density function

$$f(x;\alpha,\lambda,{Z}_{k},{Y}_{k})=\frac{1}{\Gamma ({Z}_{k}\alpha )}{({Y}_{k}\lambda )}^{{Z}_{k}\alpha }{x}^{{Z}_{k}\alpha -1}\exp \{-{Y}_{k}\lambda \, x\}$$
(1)

where \(Z_k\alpha\) and \(Y_k\lambda\) are the shape and rate parameters, with \(Z_k\) and \(Y_k\) being random variables (\(k = 1, \ldots, 100\)) capturing the similarities between the 100 MC samples for each fault. The third model is a Weibull renewal process, with the inter-event times for the \(k\)th sequence of MC samples following a Weibull distribution with probability density function

$$f(x; \alpha,\lambda,{Z}_{k},{Y_{k}})={Z}_{k}\alpha {({Y}_{k}\lambda )}^{{Z}_{k} \alpha }{x}^{{Z}_{k}\alpha -1} \exp \{-{({Y}_{k} \lambda x)}^{{Z}_{k} \alpha }\}$$
(2)

where \(Z_k\alpha\) and \(Y_k\lambda\) are the shape and rate parameters. The fourth model is a Brownian passage-time (BPT, also called inverse Gaussian) renewal process, with the inter-event times for the \(k\)th sequence of MC samples following a BPT distribution with probability density function

$$f(x;\mu,\beta,{Z}_{k},{Y}_{k})=\sqrt{\frac{{Z}_{k}\mu }{2\pi {({Y}_{k}\beta )}^{2}{x}^{3}}}\exp \left\{-\frac{{(x-{Z}_{k}\mu )}^{2}}{2{Z}_{k}\mu {({Y}_{k}\beta )}^{2}x}\right\}$$
(3)

where \(Z_k\mu\) and \(Y_k\beta\) are the mean and coefficient of variation of the distribution. The fifth model is a lognormal renewal process, with the inter-event times for the \(k\)th sequence of MC samples following a lognormal distribution with mean \(Z_k + \mu\) and standard deviation \(Y_k\sigma\), both on the log scale.
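As a cross-check of these parameterisations, the densities in Eqs. (1)–(3) and the lognormal case can be written with base-R distribution functions as in the following minimal sketch (note that dweibull() uses a scale parameter, i.e. the reciprocal of \(Y_k\lambda\)):

```r
# Inter-event-time densities for the five models, matching Eqs. (1)-(3) and the
# lognormal specification above (x > 0; all functions are vectorised in x)
dens_poisson <- function(x, lambda, Z)            dexp(x, rate = Z * lambda)
dens_gamma   <- function(x, alpha, lambda, Z, Y)  dgamma(x, shape = Z * alpha, rate = Y * lambda)
dens_weibull <- function(x, alpha, lambda, Z, Y)  dweibull(x, shape = Z * alpha, scale = 1 / (Y * lambda))
dens_lognorm <- function(x, mu, sigma, Z, Y)      dlnorm(x, meanlog = Z + mu, sdlog = Y * sigma)

# BPT (inverse Gaussian) density of Eq. (3), with mean Z*mu and coefficient of
# variation Y*beta, written out directly
dens_bpt <- function(x, mu, beta, Z, Y) {
  m <- Z * mu
  b <- Y * beta
  sqrt(m / (2 * pi * b^2 * x^3)) * exp(-(x - m)^2 / (2 * m * b^2 * x))
}
```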

Estimation

Given the \(k\)th MC sample of the earthquake occurrence times \(t_{k0}, t_{k1}, \ldots, t_{kN_i}\) along the \(i\)th fault, with final censoring time \(T\), the likelihood of the \(k\)th MC sample for each model is

$$L({\theta }_{k};{t}_{k1},\ldots,\, {t}_{k{N}_{i}},T)=(1-F(T-{t}_{k{N}_{i}}\!;{\theta }_{k}))\mathop{\prod }\nolimits_{j=1}^{{N}_{i}}f({t}_{kj}-{t}_{k(j-1)}\!;{\theta }_{k}),$$
(4)

where \(k = 1, \ldots, 100\), \(\theta_k = (\lambda, Z_k)\) for the Poisson process, \(\theta_k = (\alpha, \lambda, Z_k, Y_k)\) for the Gamma and Weibull renewal processes, \(\theta_k = (\mu, \beta, Z_k, Y_k)\) for the BPT renewal process, and \(\theta_k = (\mu, \sigma, Z_k, Y_k)\) for the lognormal renewal process.
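For one MC sample, Eq. (4) can be evaluated on the log scale as in the following sketch; the choice of the Weibull density/survival pair and the argument names are ours:

```r
# Log-likelihood of Eq. (4) for one MC sample: the sum of log inter-event-time
# densities plus the log survival term for the open interval since the last event.
# `times` is the vector (t_k0, ..., t_kN) and `T_end` the censoring year (2022).
loglik_weibull_renewal <- function(times, T_end, alpha, lambda, Z, Y) {
  x <- diff(times)                        # inter-event times x_k1, ..., x_kN
  open_interval <- T_end - times[length(times)]
  sum(dweibull(x, shape = Z * alpha, scale = 1 / (Y * lambda), log = TRUE)) +
    pweibull(open_interval, shape = Z * alpha, scale = 1 / (Y * lambda),
             lower.tail = FALSE, log.p = TRUE)   # log(1 - F(T - t_kN))
}
```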

The different MC samples from the same fault should have similar recurrence patterns. To reflect this, we assume that both \(Y_k\) and \(Z_k\) follow a distribution with mean 1, i.e.,

$${Y}_{k} \sim Gamma(1/{\sigma }_{Y}^{2},\, 1/{\sigma }_{Y}^{2}),\quad \quad {Z}_{k} \sim Gamma(1/{\sigma }_{Z}^{2},\, 1/{\sigma }_{Z}^{2}),$$
(5)

where \(\sigma_Y\) and \(\sigma_Z\) are the standard deviations of \(Y_k\) and \(Z_k\), respectively.

A Markov Chain Monte Carlo (MCMC) algorithm generates samples from the joint posterior distribution of \(\theta_k\), \(\sigma_Y\) and \(\sigma_Z\) given the occurrence times \(t_{k0}, t_{k1}, \ldots, t_{kN_i}\) from the \(k\)th MC sample and the censoring time \(T\), using the JAGS software and the R2jags package in R37. We use three chains, half-normal priors for \(\alpha\), \(\lambda\) and \(\mu\), and weakly informative half-t prior distributions for \(\beta\), \(\sigma\) and the variance parameters \(\sigma_Y\) and \(\sigma_Z\)38, i.e.

$$\begin{array}{rcl}\alpha \sim N(0,\, 100^{2})\,T(0,),\qquad &&\lambda \sim N(0,\, 100^{2})\,T(0,),\\ \mu \sim N(0,\, 100^{2})\,T(0,),\qquad &&\beta \sim dt(0,\, 0.04,\, 3)\,T(0,),\\ \sigma \sim dt(0,\, 0.04,\, 3)\,T(0,),\qquad &&\sigma_{Y} \sim dt(0,\, 0.04,\, 3)\,T(0,),\\ \sigma_{Z} \sim dt(0,\, 0.04,\, 3)\,T(0,).\qquad &&\end{array}$$

For the MCMC algorithm, we use three chains with 5,010,000 iterations, discarding the first 10,000 iterations as burn-in, and use a thinning rate of 1000. The scale reduction factors of the Gelman-Rubin convergence diagnostic are all less than 1.02, indicating convergence39,40.
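To make the hierarchical structure concrete, the following is a simplified JAGS sketch of the Weibull renewal model for a single fault (our own illustration, not the study's code): it assumes the data layout x[k, j] for the j-th inter-event time of the k-th MC sample and omits the censored open interval in Eq. (4), which would need to be handled separately (e.g. with JAGS's dinterval construct).

```r
library(R2jags)

# Simplified JAGS model: Weibull renewal process with MC-sample random effects
# Z[k], Y[k] and the priors listed above; the censoring term is omitted here.
weibull_model <- "
model {
  for (k in 1:K) {
    Z[k] ~ dgamma(1 / pow(sigmaZ, 2), 1 / pow(sigmaZ, 2))  # mean-1 random effects
    Y[k] ~ dgamma(1 / pow(sigmaY, 2), 1 / pow(sigmaY, 2))
    for (j in 1:N) {
      # JAGS dweib(v, l) has density v*l*x^(v-1)*exp(-l*x^v), so
      # v = Z[k]*alpha and l = (Y[k]*lambda)^(Z[k]*alpha) match Eq. (2)
      x[k, j] ~ dweib(Z[k] * alpha, pow(Y[k] * lambda, Z[k] * alpha))
    }
  }
  alpha  ~ dnorm(0, 0.0001) T(0,)   # half-normal priors
  lambda ~ dnorm(0, 0.0001) T(0,)
  sigmaY ~ dt(0, 0.04, 3) T(0,)     # weakly informative half-t priors
  sigmaZ ~ dt(0, 0.04, 3) T(0,)
}
"
writeLines(weibull_model, "weibull_model.txt")

fit <- jags(
  data = list(x = x, K = nrow(x), N = ncol(x)),   # x: K-by-N matrix of inter-event times
  parameters.to.save = c("alpha", "lambda", "sigmaY", "sigmaZ"),
  model.file = "weibull_model.txt",
  n.chains = 3, n.iter = 5010000, n.burnin = 10000, n.thin = 1000
)
```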

Model-averaged forecasts

To calculate Bayesian model-averaged forecasts, we combine the posterior distributions of the forecast under each model using model weights. We use prediction-based Bayesian model-averaging (PBMA)15 with the model weights calculated using the Watanabe-Akaike Information Criterion (WAIC20). This is far less sensitive to the priors for the parameters than classical Bayesian model-averaging (CBMA), which uses posterior model probabilities15. PBMA is sometimes referred to as Bayesian model combination41. Unlike CBMA, it does not involve the assumption that one of the models is true. The WAIC for model k is calculated as

$${{{\mbox{WAIC}}}}_{k}=-2\mathop{\sum }\limits_{i=1}^{n}\log (p({y}_{i}| y,\, k))+2{p}_{k}$$
(6)

where \(p(y_i \mid y, k)\) is the pointwise posterior predictive density from model \(k\), which can be estimated using the mean of the posterior MCMC sample of \(p(y_i \mid \theta_k, y, k)\) for model \(k\); and \(p_k\) is a correction for overfitting. A common choice for \(p_k\) is

$${p}_{k}=\mathop{\sum }\limits_{i=1}^{n}\,{{\mbox{var}}}\,\left\{\log p(\, {y}_{i}| {\theta }_{k},\, y,\, k)\right\},$$
(7)

where each term in the summation can be estimated by taking the variance of the posterior MCMC sample of \(\log p(\,{y}_{i}| {\theta }_{k},\, y,\, k)\) for model k. The WAIC weight for model k is given by

$$p(k| \,y)\propto \exp \left[-({{{\mbox{WAIC}}}}_{k}-\mathop{\min }\limits_{i}{{{\mbox{WAIC}}}}_{i})/2\right].$$
(8)

WAIC is a prediction-based criterion, analogous to AIC in the non-Bayesian setting, and use of Eq. (8) to define the model weights is motivated by the form of AIC weights42. Alternative approaches to prediction-based model averaging in seismology have been proposed43,44. We prefer to make use of WAIC weights for the following reasons. Model selection using WAIC has the desirable property, in large samples, of being equivalent to Bayesian leave-one-out cross-validation (B-LOO)45. When B-LOO is used in model averaging it is known as Bayesian stacking, and has the useful property that, for large samples, it leads to the best linear combination of the posterior distributions of the forecasts from each model, whereas CBMA will lead to use of the posterior distribution of the forecast from the single best model41,42,43 (which is why some authors refer to CBMA as a tool for model selection41). We would therefore expect WAIC weights to provide a close-to-optimal linear combination of the posterior distributions of the forecasts from each model, whilst being much less computationally intensive than Bayesian stacking.
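For reference, Eqs. (6)–(8) can be computed directly from the matrix of pointwise log predictive densities extracted from the MCMC output; a minimal R sketch, with assumed object names, is:

```r
# `loglik` is an S-by-n matrix of log p(y_i | theta_k^(s), y, k) values
# (S posterior draws, n observations) for one model
waic <- function(loglik) {
  lppd   <- sum(log(colMeans(exp(loglik))))   # first term of Eq. (6), before the -2
  p_waic <- sum(apply(loglik, 2, var))        # overfitting correction, Eq. (7)
  -2 * lppd + 2 * p_waic
}

# WAIC weights across the five candidate models, Eq. (8)
waic_weights <- function(waic_values) {
  w <- exp(-(waic_values - min(waic_values)) / 2)
  w / sum(w)
}
```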

For each model, we can obtain a posterior MCMC sample of the forecast quantity: either the forecast occurrence time of the next large earthquake or the forecast probability of at least one large earthquake occurring in the next 50 years along the specified fault segment. These forecasts are conditioned on the fact that there was no large earthquake between the last large earthquake occurrence time and the year 2022. The five posterior MCMC samples of the forecast quantity were combined into one model-averaged posterior sample by randomly taking the value from one of the five posterior samples at each iteration. The probability weights for the random sampling are the WAIC weights for the corresponding five models.
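In code, this final combination step amounts to sampling a model index for each MCMC iteration with the WAIC weights as probabilities; a minimal sketch, assuming `forecasts` is a list of five equal-length posterior samples of the forecast quantity and `w` the corresponding WAIC weights:

```r
# Combine the five posterior forecast samples into one model-averaged sample by
# randomly picking one model's value at each iteration, with probabilities `w`
model_average_forecast <- function(forecasts, w) {
  S <- length(forecasts[[1]])
  pick <- sample(seq_along(forecasts), size = S, replace = TRUE, prob = w)
  vapply(seq_len(S), function(s) forecasts[[pick[s]]][s], numeric(1))
}
```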