Introduction

Hypertension is among the most prevalent lifestyle-related diseases and has long been recognized as a significant risk factor for cardiovascular events, ultimately contributing to premature mortality1,2. It typically presents without noticeable symptoms, so many patients are unaware of their high blood pressure (BP) and the disease may progress unnoticed. It is estimated that only one-fifth to one-third of all affected individuals successfully manage their BP3,4.

Undiagnosed hypertension is often attributable to a lack of awareness and insufficient screening. Additionally, poor BP management in hypertensive patients is largely due to poor adherence to treatment protocols and the high cost of BP monitoring devices. Continuous monitoring of BP is crucial for the effective management of hypertension. To date, BP measurement has relied on contact-based methodologies5, and cuff-based methods are subject to limitations, including issues related to patient tolerance and adherence.

To address problems of adherence, the prospect of BP measurement via smartphones has garnered attention, positioning them as devices capable of assessing BP ubiquitously6. Lawrence et al.7 highlighted photoplethysmography (PPG) as an economical vascular optical technique capable of detecting fluctuations in blood volume within the tissue's microvascular bed during each heartbeat. PPG is typically employed non-invasively on the skin's surface and can be adapted for non-contact measurements8. Notably, Bousefsaf et al. indicated that the synergy between PPG imaging and artificial intelligence (AI) achieved an estimation accuracy of 89.6% for systolic BP values, with a margin of error below 15 mmHg9. Studies on BP measurement using smartphones have reported both contact10 and non-contact11 estimation methods via a smartphone camera. Luo et al. and Jeong et al. have demonstrated the feasibility of non-contact, high-precision BP measurement through PPG of the skin on the finger; however, it is important to note that their studies were limited to adult subjects with normal BP, excluding hypertensive patients11,12. Consequently, the methodologies related to BP measurement via smartphones, which encompass data collection methods, reference measurements, and validation techniques, have yet to be established for broad clinical applicability8,9,13. Moreover, BP monitoring through PPG necessitates the acquisition of specialized equipment, such as smartphones. In low- and middle-income countries, merely one out of ten hypertensive individuals report having their BP adequately managed14,15. The financial burden associated with procuring such devices represents a significant obstacle.

In recent years, there has been increasing interest in the estimation of vital signs through non-contact methods using cameras. These methods aim not only to enhance adherence and reduce costs but also to decrease labor costs for medical personnel, minimize the risk of infection, and enable the rapid detection of changes in patients’ conditions6,11,16,17. Despite ongoing research into non-contact BP measurement via high-speed cameras and smartphones12, a reliable method has yet to be established18.

With the above in mind and the goal of establishing a non-contact BP measurement system, we conducted a prospective clinical study using a high-speed target spectral camera. Fan et al. used a camera operating at a frame rate of 30 frames per second (fps) to collect data on the pulse wave arrival time between the face and hands, thereby estimating BP through a non-contact methodology6. Following this method, we used a spectral camera operating at 150 fps. Our study was conducted in 127 patients with hypertension and 88 healthy individuals. BP was predicted based on the pulse transit time (PTT), defined as the time required for the pulse wave to propagate to a target site, such as the fingertip, following the ejection of arterial blood from the left ventricle.

Methods

Prospective study for the construction of a machine learning model

Two hundred and fifteen individuals aged 20 years or older were enrolled: patients who were either hospitalized or receiving care from the Department of Cardiology at the University of Tokyo Hospital, and individuals undergoing physical examinations at the Center for Preventive Medicine. All participants gave their written consent for the collection of spectral data from their faces and palms using a target spectral camera. The imaging was performed under ambient temperature and room lighting in the hospital. BP for calibration was measured using a CNAPmonitor500 (CNSystems, Austria; Safety Class II (IEC60601)), a finger-cuff device that calculates BP from the difference in pulse wave propagation time19.

CNAP calibration process

The CNAPmonitor500 was calibrated before each session to ensure accurate BP measurements. Calibration involved standard procedures recommended by the manufacturer. The time interval between the calibration of the CNAPmonitor500 and the BP estimation by the camera ranged from 1 to 5 min. This brief interval helped ensure that the calibration remained relevant during the camera-based measurements.

Target spectral camera

The target spectral camera used for data acquisition captured three wavelengths at 8-bit depth, with a resolution of 640 × 480 pixels and a frame rate of 150 fps. The three wavelengths were ones at which RGB can be captured, but the manufacturer, EBA Japan, has not disclosed the specific wavelengths. The RGB signal data captured by this camera were converted to wavelength data, and features were calculated from the peaks.

Data collection protocol

Participants' BP was continuously monitored using the CNAPmonitor500 while their faces and palms were recorded using a high-speed target spectral camera. This dual setup allowed us to compare the BP readings from the CNAPmonitor500 with the data extracted from the camera images.

Data collection from healthy individuals and patients with hypertension

While BP was measured for 30 s, as shown in Fig. 1a, the target spectral camera captured images of the subjects' faces and palms, and a machine learning model was used to identify hypertension from the BP and target spectral camera data. Of the 1-min data obtained from the target spectral camera, 30 s of data were imported into MATLAB (numerical computing software), and 30 ROIs were set, as shown in Fig. 1b. As shown in Fig. 2, the average value of the 3-wavelength signal was obtained from each of the 30 ROIs, and features were extracted and analyzed from the differences in pulse wave propagation time20 between the face and the palm of the hand.

Fig. 1

Data acquisition and spectral reflectance analysis. (A) Subjects’ palms and faces were imaged using a high-speed target spectral camera concurrently with the measurement of blood pressure (BP) using a continuous BP monitor (CNAP). The distance between the camera and the subject was maintained within the range of 50 to 60 cm. Illumination was provided solely by the standard lighting fixtures available in a hospital room. (B) The average spectral reflectance was obtained from 22 regions of interest (ROIs) on the face and 8 ROIs on the palm. ROIs were selected to exclude areas with hair, eyebrows, and shadows, to avoid the inclusion of unclear signals.

Fig. 2

(A) Signal peak detection. Representative signal peak detection from each region of interest (ROI). Peaks were identified from 5 or 30 s of video data for each ROI. (B) Pulse-Related Waveform Estimation. The average intensities in the green channel were used as the input signal to identify the pulse-related waves. The input signal underwent initial filtering to eliminate high-frequency noise and baseline wander. (C) Subsequently, it was decomposed into multiple intrinsic mode functions (IMFs) using Ensemble Empirical Mode Decomposition (EEMD). (D) The pulse-related waveform was then reconstructed from the selected IMFs. (E) Pulse Waveform Comparison. This figure presents paired pulse-related waveforms measured from the face (ROI 21) and the palm (ROI 30). Circles on the waveforms indicate detected peak locations. The time intervals between two detected peaks within the same cardiac pulse indicate the time delay for the pressure wave to travel between the face and palm, which can be used to estimate the beat-by-beat pulse transit time.

Classification

The 215 subjects were divided into two groups for analysis, as shown in Table 1. The hypertensive group comprised individuals with a systolic BP of 130 mmHg or higher or a diastolic BP of 80 mmHg or higher, corresponding to Stage 1 hypertension or higher under the AHA Hypertension Treatment Guidelines21. The remaining subjects formed the normotensive group.
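As a minimal illustration of the grouping rule described above, the following sketch labels a single BP reading. The function name `is_hypertensive` is hypothetical and is not taken from the study's analysis code.

```python
def is_hypertensive(systolic_mmhg: float, diastolic_mmhg: float) -> bool:
    """Return True when either threshold from the text is met.

    Thresholds: systolic BP >= 130 mmHg or diastolic BP >= 80 mmHg.
    """
    return systolic_mmhg >= 130 or diastolic_mmhg >= 80
```

Note that the two criteria are joined by "or", so an elevated diastolic value alone is sufficient for assignment to the hypertensive group.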

Table 1 Participants’ characteristics.

Propensity score matching

To address initial differences in patient backgrounds between the hypertensive and normotensive groups, propensity score matching (PSM) was performed using EZR (Ver. 1.68 June 30, 2024). The following steps were followed:

Calculation of propensity scores

Propensity scores were calculated for each subject based on key demographic and clinical characteristics.

Matching process

Subjects from the hypertensive and normotensive groups were matched using a 1:1 nearest-neighbor matching algorithm without replacement, ensuring balanced covariates between the groups. A caliper width of 0.2 standard deviations of the logit of the propensity score was used to improve matching quality.
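The matching rule described above can be sketched as follows. This is an illustrative greedy implementation, not the algorithm used internally by EZR; the function names, the input format (subject ID mapped to propensity score), the processing order, and the use of the standard deviation across all subjects for the caliper are assumptions.

```python
import math
import statistics


def logit(p):
    """Log-odds of a propensity score p in (0, 1)."""
    return math.log(p / (1.0 - p))


def match_nearest_neighbor(hypertensive, normotensive, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Both arguments map subject ID -> propensity score.
    Returns a list of (hypertensive_id, normotensive_id) pairs.
    """
    h_logit = {k: logit(v) for k, v in hypertensive.items()}
    n_logit = {k: logit(v) for k, v in normotensive.items()}
    # Assumption: the caliper is 0.2 SD of the logit over all subjects.
    all_logits = list(h_logit.values()) + list(n_logit.values())
    caliper = caliper_sd * statistics.stdev(all_logits)

    available = dict(n_logit)  # controls not yet matched
    pairs = []
    for h_id, h_val in h_logit.items():
        if not available:
            break
        best = min(available, key=lambda n_id: abs(available[n_id] - h_val))
        if abs(available[best] - h_val) <= caliper:
            pairs.append((h_id, best))
            del available[best]  # without replacement
    return pairs
```

Subjects with no partner inside the caliper remain unmatched, which is why the matched cohorts are smaller than the original groups.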

Matched cohorts

The matching process resulted in two matched cohorts, each consisting of 66 subjects, with comparable demographic and clinical characteristics.

Rhythm analysis

Using the 30-s data, peak detection was performed for each ROI to examine the heart rate (Fig. 2a). The most frequently occurring heart rate was adopted, and a rhythm analysis was conducted in the ROI where this heart rate was detected. The time between peaks was calculated, and the standard deviation over the 30-s period was examined. If this standard deviation exceeded 50 ms, it was inferred to be atrial fibrillation, and all heartbeats were used for analysis22. Otherwise, it was treated as non-atrial fibrillation. In cases of non-atrial fibrillation, if the time between peaks was less than 80% or more than 120% of the most frequent time between peaks, it was considered a premature contraction. For premature contractions, the heartbeat and the first heartbeat immediately following it were excluded from the analysis. If more than 50% of the heartbeats in the 30-s period were excluded, the measurement was deemed unfeasible.
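The decision rules above can be sketched as follows. This is a simplified illustration under stated assumptions: peak times are given in milliseconds, each beat is indexed by its peak-to-peak interval, the sample standard deviation is used, and the most frequent interval is found after rounding to 10-ms bins. The function name and parameters are hypothetical.

```python
import statistics
from collections import Counter


def analyze_rhythm(peak_times_ms, af_sd_ms=50.0, low=0.8, high=1.2, bin_ms=10):
    """Return (label, kept_beat_indices) from detected peak times."""
    intervals = [b - a for a, b in zip(peak_times_ms, peak_times_ms[1:])]
    if len(intervals) < 2:
        return "unfeasible", []

    # SD of peak-to-peak intervals above 50 ms: treat as atrial
    # fibrillation and keep every beat.
    if statistics.stdev(intervals) > af_sd_ms:
        return "atrial_fibrillation", list(range(len(intervals)))

    # Most frequent interval (assumption: rounded to bin_ms bins).
    binned = [round(iv / bin_ms) * bin_ms for iv in intervals]
    modal = Counter(binned).most_common(1)[0][0]

    # Premature contraction: interval < 80% or > 120% of the modal
    # interval; exclude that beat and the one immediately following.
    excluded = set()
    for i, iv in enumerate(intervals):
        if iv < low * modal or iv > high * modal:
            excluded.update({i, i + 1})
    excluded = {i for i in excluded if i < len(intervals)}

    if len(excluded) > 0.5 * len(intervals):
        return "unfeasible", []
    kept = [i for i in range(len(intervals)) if i not in excluded]
    return "non_atrial_fibrillation", kept
```

For a recording with one short interval among otherwise regular beats, the sketch drops two beats and keeps the rest.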

Beat-by-beat pulse transit time (PTT) estimation

The time course of the average green-channel intensity from the selected ROIs was used as the input signal to identify the pulse-related waves. We first removed the 60 Hz hum noise from the original input signal using a Butterworth notch filter. Then, a continuous wavelet transform was applied to remove the baseline wander. After that, the pulse-related waveform was extracted using the ensemble empirical mode decomposition (EEMD) algorithm23. EEMD, an adaptive, noise-assisted time series analysis method widely used in signal processing, is based on the EMD method, which decomposes the input signal into several intrinsic mode functions (IMFs) with different spectral scales in the time domain. Adding white noise and averaging over a sufficient number of trials allows signals of different scales to collate in the proper IMF. This provides better spectral separation of the IMFs without a priori subjective criterion selection. The baseline-wander-removed signal was decomposed into several IMFs. Then, the IMFs with average frequencies around 1 Hz (0.6–1.5 Hz), which correlate with pulse waves, were selected for signal reconstruction, as shown in Fig. 2b,c.
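The IMF selection and reconstruction step can be illustrated with a short sketch. It assumes the IMFs have already been produced by an EEMD implementation and estimates each IMF's average frequency from its zero-crossing rate; the text does not state how the average frequency was computed, so this estimator and all function names are assumptions.

```python
import math


def mean_frequency_hz(signal, fs_hz):
    """Estimate the average frequency from the zero-crossing count."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    duration_s = len(signal) / fs_hz
    return crossings / (2.0 * duration_s)  # two crossings per cycle


def select_pulse_imfs(imfs, fs_hz, band_hz=(0.6, 1.5)):
    """Indices of IMFs whose average frequency lies in the pulse band."""
    lo, hi = band_hz
    return [i for i, imf in enumerate(imfs)
            if lo <= mean_frequency_hz(imf, fs_hz) <= hi]


def reconstruct(imfs, selected):
    """Sum the selected IMFs sample by sample."""
    return [sum(imfs[i][t] for i in selected) for t in range(len(imfs[0]))]


# Synthetic stand-ins for IMFs at 150 fps: 5 Hz, 1 Hz, and 0.2 Hz sines.
fs = 150
n = 10 * fs
imfs = [[math.sin(2 * math.pi * f * i / fs + 0.3) for i in range(n)]
        for f in (5.0, 1.0, 0.2)]
selected = select_pulse_imfs(imfs, fs)
pulse_wave = reconstruct(imfs, selected)
```

In this synthetic case only the 1 Hz component falls inside the 0.6–1.5 Hz band, so the reconstructed waveform equals that component.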

PTT was estimated from pairs of pulse-related waveforms between the face and palm (Fig. 2e). A feature vector \({PTT}_{k}\) could then be obtained as

$${PTT}_{k}=\left[{PTT}_{{m}_{i}-{n}_{j}}\right],$$
$${m}_{i}\in \{{ROI}_{1},\dots ,{ROI}_{22}\},\quad {n}_{j}\in \{{ROI}_{23},\dots ,{ROI}_{30}\},$$

where the maximum number (k) of features in \({PTT}_{k}\) was 176 (22 face ROIs × 8 palm ROIs). According to previous empirical studies12,24, PTT between the face and palm ranged from 10 to 60 ms; thus, the estimated PTT should be within this range. We used these thresholds to filter out PTT values that were too short or too long because of misestimation, which was attributed to motion artifacts or weak signal intensities. We excluded PTT features with more than 30% missing observations, retaining the remaining 123 PTT features (k = 123) for training. For PTT features with less than 30% missing data, we imputed the missing data with nearest-neighbor values.
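The feature construction and cleaning steps above can be sketched as follows. Several details are assumptions made for illustration: PTT is taken as palm peak time minus face peak time, "nearest neighbor" is interpreted as the nearest non-missing value in beat order, and all function and variable names are hypothetical.

```python
from itertools import product

FACE_ROIS = range(1, 23)    # ROI1 to ROI22
PALM_ROIS = range(23, 31)   # ROI23 to ROI30
ROI_PAIRS = list(product(FACE_ROIS, PALM_ROIS))  # 22 x 8 = 176 pairs


def ptt_ms(face_peak_ms, palm_peak_ms, valid_range_ms=(10.0, 60.0)):
    """PTT for one beat and one ROI pair; None if outside 10-60 ms."""
    delay = palm_peak_ms - face_peak_ms
    lo, hi = valid_range_ms
    return delay if lo <= delay <= hi else None


def keep_features(columns, max_missing=0.30):
    """Drop ROI-pair features with more than 30% missing beats.

    columns maps an ROI pair to its per-beat PTT list (None = missing).
    """
    return {pair: vals for pair, vals in columns.items()
            if vals.count(None) / len(vals) <= max_missing}


def impute_nearest(values):
    """Fill each None with the nearest non-missing value in beat order.

    Ties are resolved in favor of the earlier beat.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    return [v if v is not None
            else values[min(known, key=lambda j: abs(j - i))]
            for i, v in enumerate(values)]
```

For example, a feature observed in three of four beats (25% missing) is retained and its gap is filled from the neighboring beat, whereas a feature observed in two of four beats is discarded.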

Machine learning model

We evaluated several classifiers for prediction modeling and found that ensemble models performed best. Ensemble-based techniques25 combine predictions from a set of classifiers to make more accurate predictions than any individual classifier. We applied a subspace ensemble of 30 k-nearest neighbors (kNN) classifiers for modeling. Each classifier in the ensemble was trained on a random subset of the features (a feature subspace) instead of the entire feature set. Given an input feature vector \({PTT}_{k}\) from one heartbeat, the trained model outputs a predicted BP class.
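A random-subspace kNN ensemble of the kind described above can be sketched in a few lines. This is not the authors' implementation: the class name, the default subspace size (half of the features), the choice of k = 1, and the Euclidean distance are assumptions made for illustration.

```python
import math
import random
from collections import Counter


class SubspaceKNNEnsemble:
    """Random-subspace ensemble of kNN classifiers (illustrative sketch)."""

    def __init__(self, n_learners=30, subspace_size=None, k=1, seed=0):
        self.n_learners = n_learners
        self.subspace_size = subspace_size  # assumption: half of features
        self.k = k
        self.seed = seed

    def fit(self, X, y):
        n_features = len(X[0])
        size = self.subspace_size or max(1, n_features // 2)
        rng = random.Random(self.seed)
        # Each learner sees its own random subset of feature indices.
        self.subspaces = [sorted(rng.sample(range(n_features), size))
                          for _ in range(self.n_learners)]
        self.X, self.y = X, y
        return self

    def _knn_label(self, x, dims):
        ranked = sorted(
            (math.dist([row[d] for d in dims], [x[d] for d in dims]), label)
            for row, label in zip(self.X, self.y)
        )
        nearest = [label for _, label in ranked[: self.k]]
        return Counter(nearest).most_common(1)[0][0]

    def predict_one(self, x):
        """Majority vote over the learners for one feature vector."""
        votes = Counter(self._knn_label(x, dims) for dims in self.subspaces)
        return votes.most_common(1)[0][0]
```

Because each learner uses a different subset of ROI-pair features, a noisy PTT value in one feature affects only some of the 30 votes.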

PTT feature vectors extracted from the full 5-s video data or from the rhythm-filtered 30-s video data were used for training. In classifier modeling, we employed fivefold cross-validation, in which all PTT feature vectors of the training data were randomly split into five stratified subsets. In each round, the classifier was trained using 80% of the training data and optimized using the remaining 20% as an internal validation set. This process was repeated five times until each subset had been used for validation. Finally, the performance of the model was evaluated using the 5-s and 30-s test data, respectively. The final predicted class of the input test data was determined by a majority vote over the predicted classes of all \({PTT}_{k}\).
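The stratified split and the final majority vote described above can be sketched as follows. The function names, the round-robin assignment within each class, and the tie-breaking behavior (the class encountered first wins) are assumptions; the text does not specify them.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds subsets with balanced classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)
    return folds


def majority_vote(beat_predictions):
    """Final class for a recording from its per-beat predicted classes."""
    return Counter(beat_predictions).most_common(1)[0][0]
```

In each cross-validation round, one fold serves as the internal validation set (20%) and the other four are used for training (80%). At test time, `majority_vote` collapses the per-beat predictions of a recording into a single class.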

Statistical considerations

In Tables 1 and 2, 11 items were analyzed to detail the characteristics of the examinees. The Mann-Whitney U test served as the method of analysis, with a P-value of 0.05 or lower being deemed indicative of a statistically significant difference.

Table 2 Propensity-matched participants’ characteristics.

Ethical approval

This study was approved by the University of Tokyo’s institutional ethics committee, the Research Ethics Committee of the Faculty of Medicine of the University of Tokyo (approval number 2021327NI). All study participants provided written informed consent after receiving a detailed explanation of the study, accompanied by written documentation. The study was conducted in accordance with the Declaration of Helsinki.

Results

A total of 215 participants, comprising patients with hypertension and healthy individuals, were enrolled (Table 1). Among them, 127 had a systolic BP ≥ 130 mmHg or diastolic BP ≥ 80 mmHg, meeting the AHA guideline criteria for Stage 1 or Stage 2 hypertension21 (Table 1). Significant differences were observed between the hypertensive and normotensive cohorts across several parameters, including age, sex distribution, body mass index, systolic BP, and diastolic BP. The study population included individuals diagnosed with atrial fibrillation and those who had undergone implantation of cardiac implantable electronic devices. Nevertheless, there was no significant difference in the distribution of patients with these medical histories between the two groups.

As shown in Fig. 1, subjects sat in a chair in a stable condition, and videos of their face and hands were recorded for 30 s using a target spectral camera. Simultaneously, a continuous blood pressure monitor, whose accuracy is guaranteed by the FDA, measured the BP for each heartbeat.

For the recorded videos, 30 regions of interest (ROIs) were set on the face and hands (Fig. 2). Pulse information was detected from the data within these ROIs, and the timing of pulse detection within each ROI was inferred. Variations in pulse detection timing across the ROIs were extracted as a characteristic of the individual. Based on the pulses detected in the ROIs, data acquisition was performed for each heartbeat. Initially, features were extracted from all heartbeats contained within the first 5 s of data. Subsequently, analyses were performed using the 30-s data. In this analysis, the standard deviation of the pulse intervals was first used to determine whether atrial fibrillation was present. If atrial fibrillation was detected, features were extracted from all heartbeats. If atrial fibrillation was not present and premature contractions were suspected based on the intervals between pulse peaks, each premature contraction and the first heartbeat immediately following it (a total of two heartbeats) were excluded from the feature extraction. If more than 50% of the heartbeats in the 30-s period were excluded, the measurement was deemed unfeasible.

To address initial differences in patient backgrounds between the hypertensive and normotensive groups, propensity score matching (PSM) was performed. This resulted in two matched groups of 66 subjects each (Table 2). These matched cohorts were used for subsequent analyses.

In the analyses based on 5-s data, the sensitivity for normotension was 98.3 ± 6.22%, the sensitivity for hypertension was 84.5 ± 7.26%, and the overall accuracy was 90.3 ± 4.31% (Table 3). For the analyses based on 30-s data, the sensitivity for normotension improved to 100.0 ± 0.00%, while the sensitivity for hypertension was 89.2 ± 2.30%, and the overall accuracy increased to 95.0 ± 1.67% (Table 3). No data were deemed unanalyzable in either dataset.

Table 3 Machine learning and evaluation.

Discussion

Our findings indicate that machine learning (ML) models have the potential to accurately identify hypertension using images captured by a target spectral camera within a brief duration of 5 s. In a prior study, Rong et al. attempted to measure BP non-invasively using PPG and an ML model in a prospective cohort of patients with hypertension. However, their analysis relied on data collected over 40 s26; to our knowledge, no study to date has successfully determined hypertension using only 5 s of data. Our study differs from previous research in that it combines a high-speed camera with an ML model11.

This study does have several limitations. The technology employed requires comprehensive validation across diverse racial backgrounds and skin pigmentation levels to affirm its reliability and generalizability27. Our research was conducted exclusively with Japanese individuals aged 20 years or older, which raises questions about the universality of our findings across different ethnic groups. Moreover, the study's sample size, which was limited to 215 participants, underscores the need for larger-scale validations to ascertain the system's efficacy in broader populations, including both healthy individuals and those with various health conditions. There were significant differences in patient backgrounds between the initial hypertensive and non-hypertensive groups. To address this, we performed PSM, resulting in two matched groups of 66 subjects each. Despite these adjustments, there remains a possibility that hypertension detection could be influenced by other characteristics in blood flow, potentially due to other diseases or factors.

Another limitation pertains to the data acquisition method used in our study, which involved the simultaneous collection and analysis of palm and facial data. Participants were required to remain motionless while positioning their face and palm in front of the camera for a minimum of 5 s. This procedure needs to be simplified for practical use; for instance, a hypertension diagnosis might be facilitated by analyzing data from either the face or palm alone, leveraging variations in PTT within such specific regions. Furthermore, the development of a non-contact, motion-tolerant system capable of hypertension assessment in less than 5 s would significantly enhance convenience for both patients and healthcare providers, making BP measurement less intrusive.

This study was based on BP and video recordings taken simultaneously at a single time point for each subject and did not assess BP dynamics in the same subjects over time. We have not yet tested the reproducibility of the results by measuring BP again in the same subjects or evaluated how the measurement results change when BP fluctuates significantly over time. These aspects remain to be investigated in future studies.

Additionally, environmental conditions during data capture, particularly lighting, play a crucial role in the system's versatility. Our investigation has not yet explored the system's performance in varying lighting conditions, such as outdoor environments or poorly lit spaces, since data collection was confined to controlled settings within a hospital (examination rooms, treatment rooms, and outpatient areas). Addressing these limitations is essential for devising a robust system capable of accurately diagnosing hypertension under any circumstances, thereby broadening its applicability and utility in real-world scenarios.

The results of this study reveal that it may be possible to determine high BP in a few seconds or a few heartbeats. The implementation of a non-contact, 5-s BP measurement technique would significantly enhance the efficiency of BP monitoring, encouraging a broader segment of the population, perhaps previously indifferent to BP tracking, to engage in regular monitoring. This could lead to the early identification of individuals at risk of hypertension. This monitoring system has the potential to streamline hypertension detection by deploying cameras in public spaces, thereby obviating the necessity for individuals to acquire specialized BP measurement devices personally. Moreover, the non-contact approach minimizes the risk of transmitting infectious diseases, including COVID-19, thereby safeguarding both patients and healthcare professionals.

Conclusions

To our knowledge, the present study is the first prospective study using a target spectral camera with hypertensive patients. Hypertension could be accurately determined using only 5 s of video data from the target spectral camera. The results indicate the possibility of implementing a simplified approach to BP monitoring on a broader population scale.