Forecasting seizures based on electroencephalogram data is a well-studied problem1. In practice, however, it is challenging for patients to wear electroencephalogram devices consistently in their daily lives. Therefore, the idea of utilizing peripheral wearable devices to predict the onset of seizures has emerged2,3.

In this context, a device is wrist-worn and equipped with sensors measuring various physiological signals such as heart rate, movements, temperature and skin conductance (Fig. 1). Although wearable sensors provide valuable indications of an impending seizure, interpreting these data is challenging due to the presence of background activity and noise unrelated to seizure potential. Additionally, seizures themselves can be rare events, resulting in an imbalanced dataset with <1% positively labeled samples. The “My Seizure Gauge” challenge was organized to address these challenges, to explore the ability of wearable devices to predict seizures and to advance the state of the art.

Fig. 1: Photo of the wearable device used in the Challenge and a visualization of the data processing pipeline.

The right portion of the figure was reproduced from ref. 2 under CC BY 4.0.

The challenge aimed to develop clinically relevant seizure forecasting solutions that could potentially guide targeted therapy and help patients better manage their activities to reduce the impact of seizures. Challenge participants had access to a publicly available dataset, split into 10-minute data segments leading up to a current time point (there were 130,000 samples in total). Evaluation took place on the Eval.ai web platform, which accepted each solution’s code and measured its performance on a private test dataset. The test dataset included six patients, and the submitted solutions were ranked according to their averaged area under the curve of the receiver operating characteristic (AUCROC) scores obtained from predictions indicating the probability of a seizure (AUCROC is a measure of binary classification performance that ranges from 0.5 for random performance to 1 for perfect prediction). The consecutive 10-minute segments spanned a 6–12-month period of patients living at home, and their ground truth was based on simultaneously recorded electroencephalograms.
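As a rough illustration of this ranking metric, the sketch below computes AUCROC from predicted seizure probabilities and averages it over patients. It assumes that "averaged" means one AUCROC per test patient followed by an unweighted mean, which the text does not spell out. The function names and the record layout are hypothetical, and this is not the organizers' evaluation code.

```python
from collections import defaultdict


def auc_roc(labels, scores):
    """AUCROC via the rank-sum (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive segment
    receives a higher score than a randomly chosen negative one.
    """
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        # Find the run of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUCROC needs at least one positive and one negative sample")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_patient_auc(records):
    """records: iterable of (patient_id, label, score) tuples (hypothetical layout)."""
    by_patient = defaultdict(lambda: ([], []))
    for patient_id, label, score in records:
        by_patient[patient_id][0].append(label)
        by_patient[patient_id][1].append(score)
    per_patient = {pid: auc_roc(ls, ss) for pid, (ls, ss) in by_patient.items()}
    return sum(per_patient.values()) / len(per_patient), per_patient


if __name__ == "__main__":
    toy = [("A", 0, 0.1), ("A", 1, 0.9), ("B", 1, 0.2), ("B", 0, 0.3)]
    print(mean_patient_auc(toy))
```

Averaging per patient gives each patient equal weight regardless of how many segments they contribute, which fits the patient-specific nature of the task.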

The challenge attracted great interest, receiving about 100 submissions from teams around the world. The top entries came from Chemnitz University of Technology (Germany) and associates, followed by independent data scientists and a team from Rice University (USA). Here, we briefly highlight the key insights and lessons from the organizers and the winning team.

From the organizers’ perspective, the results of the challenge were encouraging, and the challenge generated interest from participants with a wide variety of academic backgrounds. As in previous research4, patient-specific modeling and use of temporal context such as the time of day as predictors were necessary for high performance. However, many submissions used more complex models than had been previously explored by the organizers. Forecasting from non-cerebral signals is an emerging area, and this contest, as well as previous challenges (references on Epilepsy Ecosystem web page, https://www.epilepsyecosystem.org/), have given us great hope for improved forecasting solutions. We hope that future challenges can be held with larger numbers of patients to better validate the proposed solutions.

Switching to the winning team’s perspective, the My Seizure Gauge task was challenging for established time-series classification techniques such as hand-crafted feature engineering or models from the field of deep learning. The key ingredients were leveraging temporal knowledge3 and patient-specific learning. The winning solution leverages hyperdimensional computing5,6, a computational paradigm that uses high-dimensional vectors and algebraic operators for symbolic computations and allows explicit knowledge to be modeled in vector representations. In the winning solution, hyperdimensional computing combines spatiotemporal features extracted using the MiniROCKET7 technique with a similarity-preserving encoding (as in HDC-MiniROCKET8) reflecting the time of day. The resulting encoding, in combination with a one-layer regularized feedforward neural network, is simple but performs well. To avoid overfitting and compensate for the imbalance of the labels, which is caused by the rarity of seizures, we used data augmentation (additive Gaussian noise to the encodings), regularization strategies (L2 norm penalty) and oversampling (repeated use of the few positive samples with data augmentation) in training the network. Additionally, we observed a high variance between the scores obtained on validation splits of the training data and the scores on the private test set provided by the organizers.
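The following minimal sketch illustrates two of the ideas named above: a similarity-preserving time-of-day encoding bound to a feature vector by element-wise multiplication, and oversampling of positive samples with additive Gaussian noise. It is not the winning team's implementation. The sinusoidal encoding, the bipolar stand-in for MiniROCKET features, the dimensionality and all function names are assumptions chosen for illustration.

```python
import math
import random


def time_of_day_encoding(hour, dim=256):
    """Periodic vector for a time of day in hours; nearby times give similar vectors.

    Assumed construction: cos/sin pairs with small integer frequencies,
    so the encoding repeats exactly every 24 hours.
    """
    theta = 2 * math.pi * hour / 24.0
    vec = []
    for i in range(dim // 2):
        k = 1 + (i % 4)  # frequencies 1..4, cycled deterministically
        vec.append(math.cos(k * theta))
        vec.append(math.sin(k * theta))
    return vec


def bind(features, time_vec):
    """Bind two vectors by element-wise multiplication."""
    return [f * t for f, t in zip(features, time_vec)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def oversample_with_noise(X, y, sigma=0.1, seed=0):
    """Add noisy copies of positive samples until both classes are equally frequent."""
    rng = random.Random(seed)
    positives = [x for x, label in zip(X, y) if label == 1]
    n_neg = len(y) - len(positives)
    X_out, y_out = list(X), list(y)
    if not positives:
        return X_out, y_out
    for _ in range(max(0, n_neg - len(positives))):
        base = rng.choice(positives)
        # Additive Gaussian noise acts as a simple data augmentation.
        X_out.append([v + rng.gauss(0.0, sigma) for v in base])
        y_out.append(1)
    return X_out, y_out


if __name__ == "__main__":
    rng = random.Random(1)
    features = [rng.choice((-1, 1)) for _ in range(256)]  # stand-in for real features
    morning = bind(features, time_of_day_encoding(8.0))
    later = bind(features, time_of_day_encoding(9.0))
    print(cosine(morning, later))
```

Because the stand-in features are bipolar (each entry is +1 or -1), binding the same feature vector to two time encodings preserves the similarity between those encodings, so segments with identical features recorded at nearby times of day remain close in the combined representation.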

The AUCROC score of the winning submission on the test data was 0.78, indicating acceptable discrimination (according to ref. 9). The inclusion of the temporal knowledge, however, increased the risk of overfitting and subsequent false positives, where seizure predictions were primarily based on time of day rather than on physiological values. By design, the AUCROC metric is insensitive to the imbalanced outcome data, so it does not reflect the high false positive count arising from setting a high true positive rate. This observation might call for revisiting the performance metrics to be used for future challenges. Alternatives include the area under the precision–recall curve and the Matthews correlation coefficient10.
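To make the point about class imbalance concrete, the sketch below fixes one operating point (a true positive rate and a false positive rate, which together determine a point on the ROC curve) and compares precision and the Matthews correlation coefficient at two prevalences. All numbers are hypothetical and chosen for illustration only; they are not results from the challenge.

```python
import math


def confusion_at_operating_point(n_total, prevalence, tpr, fpr):
    """Return (tp, fp, tn, fn) for a given prevalence and ROC operating point."""
    n_pos = round(n_total * prevalence)
    n_neg = n_total - n_pos
    tp = round(tpr * n_pos)
    fp = round(fpr * n_neg)
    return tp, fp, n_neg - fp, n_pos - tp


def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Same ROC operating point, two hypothetical prevalences.
    for prevalence in (0.5, 0.01):
        tp, fp, tn, fn = confusion_at_operating_point(10_000, prevalence, tpr=0.9, fpr=0.2)
        print(prevalence, (tp, fp, tn, fn), precision(tp, fp), mcc(tp, fp, tn, fn))
```

The ROC operating point is identical in both cases, yet at 1% prevalence most alarms are false, which precision and the Matthews correlation coefficient reflect and the ROC curve does not.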

In addition to their winning submission, the team also explored alternative strategies, including transformer-like models with attention mechanisms and the incorporation of consecutive data samples to extend historical context. However, these alternatives performed worse than the winning submission and overfitted because of their high number of trainable parameters.

In summary, this challenge represents the first crowdsourced confirmation that forecasting seizures from non-cerebral signals captured by non-invasive wearable devices is feasible. It sets a baseline for future work by underscoring that a complex one-size-fits-all approach such as a deep learning model designed for classifying time series can be outperformed by simpler, custom-tailored approaches for specific tasks like seizure prediction. Furthermore, a well-balanced combination of machine learning and hand-crafted feature engineering such as incorporating domain knowledge (temporal context) can lead to powerful solutions.

For future work, the remaining challenges are reducing the false positives due to imbalanced data, and increasing the contribution from the physiological signals relative to the time of day. We believe that organizing future challenges will help address these issues and encourage more researchers to investigate this important problem.