Introduction

The effects of acoustic reproduction in a higher frequency over audible range on the psychophysiology of humans and animals have been investigated. Recently, it was reported that high-frequency acoustic reproduction can contribute to life-span extension in mice1. The improvement of anhedonia symptoms using high-frequency sound reproduction in humans has also been pointed out2. Regarding their effects on humans, in 2000, Ohashi et al.3 reported that gamelan music from Bali, which has frequency components above 22 kHz outside the audible range, affects human brain activity. That study revealed that sound stimuli with components outside the audible range were significantly activated by alpha-band electroencephalography (EEG). The psychological effects were also mentioned, and the results of subjective experiments on sound impressions using adjectives showed significant differences depending on the presence or absence of frequency components above the audible range. Subsequently, this so-called “hypersonic effect” was verified by the same research group, which found that the activation of brain wave is influenced by factors other than air-conducted hearing4, and that frequency components above 32 kHz have the effect of increasing the alpha-band EEG power, whereas those below 32 kHz have the inverse effect of decreasing the alpha-band EEG power5. Another research group replicated this result using classical music (J. S. Bach), reporting that listening to music with abundant high-frequency components above the audible range increased the alpha-band EEG power6,7. However, no differences in subjective impressions were found between the cases with and without high-frequency components.

Each of these previous studies had different views on the effects of exposure to sounds with high-frequency components that are basically inaudible via the air-conducted hearing path of humans. Muraoka et al.8 and Plenge et al.9 concluded that there is no conscious recognition of differences in sound quality, even when sounds in the frequency range of 15 kHz or higher are included. This finding has been used as original data for determining standards regarding the sampling frequencies for the digital CD and DAT formats. Kuribayashi et al.6,7 also found no difference in subjective impressions depending on the presence or absence of high-frequency components. In contrast, Oohashi et al.3 pointed out that the presence or absence of high-frequency components influences subjective impressions, but this conclusion was limited to cases in which gamelan sounds were used as the stimuli. Thus, whether psychological effects can be confirmed for other sound types, such as natural sounds, remains unknown. A previous study investigated the effects of water sounds on reducing stress10. More specifically, natural sounds such as river sounds are simple and may promote relaxation in humans11. It is interesting to see how frequency components outside the audible frequency range affect the psychophysiological aspects of humans. However, to the best of our knowledge, no studies have investigated the effects of inaudible sound components.

Given this background, the present study aimed to investigate the effects of an ultrasonic component with abundant high-frequency components above 20 kHz on the subjective impressions caused by a waterfall and a stream. Herein, considering the physiological characteristics of EEG, we investigate the mechanisms underlying the psychophysiological effects of acoustic exposure to broadband sounds outside the audible range.

Results

Effect of ultrasonic sound on psychological response

The results of the subjective evaluation experiment using the semantic differential (SD) method are shown in Fig. 1 as a profile. The figure presents the average of the scores answered by each subject under three types of acoustic stimuli: WF_66_*, WF_76_*, and ST_66_*. Table 1 shows the results of a statistical test of the difference between the results with and without the reproduction of the ultrasonic component.

Figure 1
figure 1

Average profile ratings obtained using the SD method for the sound stimuli of (a) WF_66_20k and WF_66_96k, (b) WF_76_20k and WF_76_96k, and (c) ST_66_20k and ST_66_96k.

Table 1 Statistical results of the semantic differential scores for the conditions with and without ultrasonic components (†: p < 0.1, *: p < 0.05, **: p < 0.01). The effect size was calculated using Cohen’s d.

First, regarding the results for the waterfall sound reproduced at 66 dB (WF_66_*), differences in many adjectives were found between with and without the ultrasonic component (Fig. 1a). The adjective pairs that tended toward being different or significantly different were beautiful vs. dirty (p < 0.05, d = 0.89), clear vs. dull (p < 0.1, d = 0.70), calm vs. loud (p < 0.05, d = 0.40), and noisy vs. quiet (p < 0.1, d = 0.53). The waterfall sound with the ultrasonic component reproduced at 66 dB was more beautiful, clearer, calmer, and quieter than that without the ultrasonic component. As in the 66 dB case, the adjective pairs that showed significant differences for the waterfall sound played at 76 dB (WF_76_*) were beautiful vs. dirty (p < 0.005, d = 0.82), clear vs. dull (p < 0.05, d = 0.31), and calm vs. loud (p < 0.1, d = 0.31). In addition, a new trend toward a difference was found for comfortable vs. uncomfortable (Fig. 1b). Next, when the sound pressure levels (SPLs) were kept equal and the type of sound was a set to a stream, significant differences were found for moist vs. dry (p < 0.05, d = 0.51) and powerful vs. powerless (p < 0.05, d = 0.56), in addition to clear vs. dull (p < 0.1, d = 0.53), which was also observed under the other conditions (Fig. 1c). In most conditions, Cohen’s d ranged from medium to large.

Next, as shown in Table 1, differences were observed in perceptions related to the aesthetic aspects of sound, such as beauty, clarity, and comfort. Significant differences were also observed in adjectives related to the noisiness, loudness, and power of the sound. When the ultrasonic component was additionally reproduced, the sense of noisiness was reduced, whereas that of power was increased. Generally, the impression of the sound changed to more positive.

Effect of ultrasonic sound on physiological response

The results for the EEG change rate in each frequency domain are shown in Fig. 2. Figure 2a, c, e and Fig. 2b, d, f show the results acquired in the o1 and o2 channels, respectively. Figure 2a–f show the results for WF_66_20k and WF_66_96k, WF_76_20k and WF_76_96k, and ST_66_20k and ST_66_96k, respectively.

Figure 2
figure 2

Results of a comparison of measured α1-, α2-, α-, and β-EEGs obtained from the (a, c, e) o1 and (b, d, f) o2 channels. The panels show the results for the sound stimuli of (a, b) WF_66_20k and WF_66_96k, (c, d) WF_76_20k and WF_76_96k, and (e, f) ST_66_20k and ST_66_96k. The error bars represent ± SD of each data set. †: p < 0.1, *: p < 0.05, **: p < 0.01.

An example of the EEG time series data used to obtain the change rate is shown in Fig. 3. Herein the results of measured time transient characteristics of α-EEGs obtained from the o1 and o2 channels under the conditions of ST_66_20k and ST_66_96k are comparatively shown. In this experiment, EEG waveforms were epoched in every 30 s, and each band component in each 30-s waveform was analyzed to determine the change ratio of the average power over 8 min of stimuli (EEGStim) to the average power over 4 min of pre-rest (EEGBase) in each bands of α, α1, α2, and β, respectively. More detailed procedure is described in the following section of Physiological measurement. In the condition of o1 channel (Fig. 3a), α-EEG in the condition with the ultrasound component during stimulus playback is lower than that in the condition without the ultrasound component. On the other hand, in the o2 channel (Fig. 3b), there is no difference in the change rate during stimulation in both the condition with and without ultrasound. We integrated the power of the waveform for all time periods during the stimulation and obtained EEGStim to obtain a single index of the change rate. Then, as shown in Fig. 2e and f, the former figure shows a significant difference in the α-EEG band and the latter shows no significant difference. By applying this manipulation to the EEG results of the other conditions, the effects of EEG changes caused by each of the acoustic stimuli in each condition can be evaluated.

Figure 3
figure 3

Example of measured time transient characteristics of α-EEGs obtained from the (a) o1 and (b) o2 channels. The results under the conditions of ST_66_20k and ST_66_96k are comparatively shown. The error bars represent ± SD of each data set.

First, Fig. 2c, d shows that the α-EEG was increased by additionally reproducing the ultrasonic component in most bands and channels. In Fig. 2a, b, which was obtained by decreasing the SPL by 10 dB, a difference in the o1 channel is seen in the α1 and α bands, but not in the other bands. On the other hand, in Fig. 2e, f, which shows a condition where the sound type was changed to a stream sound, the result obtained for o1 shows differences in the α1, α2, and α bands, while that for o2 shows differences in the α2 band, although these differences only indicated a trend. In the right channel o2, a difference is observed in the o2 band, although this was only a trend toward a significant difference.

Thus, it can be confirmed that exposure to ultrasound mainly affects the α-EEG, which is comparable to the results of a previous study1. Furthermore, the results of this study suggest that the effect of ultrasound reproduction is greatly influenced by the SPL and type of sound. In contrast, we could not confirm any specific effect of the difference between the left and right o1 and o2 channels.

Effects of ultrasonic sound and the relationship between physiological and psychological responses

Table 2 shows the correlation coefficients between the SD scores and EEG indices obtained under the conditions with and without ultrasound components. The conditions with correlation coefficients > 0.4, which indicates a medium or higher correlation, are shown in red12. Comparing the results obtained for the o1 and o2 channels, the latter showed a relatively higher correlation between SD scores and the EEG change rate under both conditions (with and without the ultrasound component).

Table 2 Results of calculated correlation coefficients between the SD scores and EEG change rates for α1-, α2-, α-, and β-EEGs obtained at the o1 and o2 channels.

Next, the correlation coefficients between the α- and β-EEGs were inverted in most cases. For example, the SD scores for clear vs. dull were positively correlated with α-EEG, but negatively correlated with β-EEG. Conversely, in the case of noisy vs. quiet, SD scores were negatively correlated with α-EEG and positively correlated with β-EEG. More specifically, α-EEG and β-EEG seemed to be strongly associated with positive and negative emotions, respectively. For example, in a study of the built environment, α-EEG increased in a comfortable environment13, whereas β-EEG increased in a stressful environment14, suggesting that the correlation between α- and β-EEGs in the present study is based on the same mechanism.

Discussion

First, the results from Fig. 1 and Table 1 suggest that the additionally reproduced ultrasonic component affects subjective impressions. Specifically, an effect was observed for the evaluation terms related to aesthetic and power impressions. In a previous study by Kuribayashi5, the first 200-s portion of French Suite No. 5 by J. S. Bach (on cembalo, 24-bit quantization, 192 kHz A/D sampling) was selected as the sound stimuli, and was reproduced by removing higher components using a low-pass finite impulse response digital filter with a very steep slope. As detailed acoustic spectral information on this acoustic signal is not available, it is not possible to compare directly the acoustic energy of the included ultrasonic waves between the above experiment and the present study. However, it is true that the water sound used in the present study has a continuously abundant signal from 20 to 96 kHz, which may have strongly influenced the results. In contrast, Oohashi et al.1 conducted a subjective evaluation experiment using gamelan music from Bali, which has a continuously decaying abundant component similar to the water sound in the ultrasonic domain, and reported finding significant differences (p < 0.01) in the following scales: soft vs. hard, reverberant vs. percussive, comfortable to ears vs. uncomfortable to ears, and rich in nuance vs. lacking in nuance. In that study, the acoustic stimuli were adjusted to a comfortable SPL for each of the subjects, where the maximum SPL distributed approximately 80–90 dB at the listening position. Those SPLs are about 10–20 dB higher than the SPLs used in the present study. In any case, comparing these results, it can be said that the time and frequency characteristics of the sound source signal influenced whether the ultrasonic component had affected subjective impressions. On the other hand, in the present study, even at a relatively low SPL of 66 dB, the ultrasonic components could affect subjective impressions depending on the type of sound source. However, further clarification of how subjective impressions are affected when the SPL is changed requires more detailed subjective evaluation experiments controlled at more finely categorized SPLs.

On the contrary, the present results also confirmed that the additionally reproduced ultrasonic components affect the EEG, as shown in Fig. 2. In particular, α-EEG was increased by the influence of the water sound, as shown in a previous study. As mentioned above, increasing the SPL of the waterfall increased the frequency of significant increases in α-EEG. However, this was dependent on the EEG change rate; thus, a more detailed verification is needed to determine the nature of this relationship.

Finally, the results in Table 2 indicated a significant correlation between SD scores and the EEG change rate. However, no significant difference between correlations was observed when ultrasound was or was not included. In other words, the ultrasound component increased α-EEG and improved the impression, but this effect did not have a significant impact on the relationship between subjective impressions and EEG. However, as mentioned above, the situation was very different for the left and right channels of o1 and o2 placed in the occipital region, with an increase in the number of pairs showing a moderate or higher correlation at o2. This is interesting from the viewpoint of the relationship between subjective impressions and physiological responses to a general water sound that contains more than a certain wide frequency band component, although it occurred regardless of the presence or absence of ultrasound reproduction. In a previous study, music perception was shown to be lateralized to the right hemisphere of the brain to a considerable extent15. A study target Indian music16 found that the right hemisphere is more adapted to emotional categorization for happiness and sadness, whereas another study17 questioned this concept of the lateralization of music perception. As the asymmetry suggested in the correspondence between subjective impressions of water sounds and the EEG change rate in the present study is also consistent with previous findings, the respective contributions to sound perception of the right and left hemispheres of the brain suggest a functional asymmetry. Furthermore, the asymmetric distribution of the EEG even for the water sounds, which do not directly affect human emotions like music, is a point that has not been observed in previous studies. On the other hand, the data obtained in this study did not allow us to extract fully how the relationship between subjective evaluations and EEG changes depend on the presence or absence of ultrasonic components. This will require further verification through experiments under more detailed control.

Limitations

This study has following limitations. Fist, although the total number of the participants with 10 subjects was determined by following the previous study18, it is considered necessary to verify the findings of this study in an extended experiment with a larger sample size which can be determined based on the means and standard deviations obtained by this experiment. Second, the current study presented results focusing on the psychological and physiological effect of the ultrasonic components including quite broad frequency band from 20 to 96 kHz. Herein this study lacks controlled experimentation that isolates specific frequency bands in the higher frequency bands above 20 kHz. By adopting such a more detailed experimental design, more specific results can be obtained.

Methods

Experimental procedure

The overall scheme of the experiment is described as follows. To ensure a quiet environment, the experiment was conducted in a soundproof room with an SPL difference of approximately 35 dB. The acoustic reverberation was suppressed by coating the surface inside the soundproof room by porous absorbers. The environment inside the room was kept dark to avoid the influence of visual stimuli, including the light environment, on the measured EEG. Related to this, the subjects' eyes were kept closed during all the experiments except for the rest time between each condition. As shown in Fig. 4, the subject was placed at the center of the soundproof room. Four loudspeakers (D-D2E; ONKYO, Osaka, Japan) and one woofer (HS8S; Yamaha, Hamamatsu, Japan) were set at each of the corners and in front of the subject, respectively, and the target sounds were reproduced. The experimental procedure used in the present investigation is shown in Fig. 5. A 1-min rest period was included, followed by a 4-min pre-rest period, which was used as a baseline for the EEG measurements. After that, 8 min of auditory stimulation were reproduced, followed by a post-rest period. Overall, the EEG response to each of the auditory stimuli was measured for 16 min. After that, subjective evaluation of the given acoustic condition was performed by the subject. Three types of acoustic auditory stimuli were used, as shown in Table 3: the sound of waterfalls with equivalent continuous A-weighted SPLs (LAeq) of 66 dB (WF_66_20k and WF_66_96k) and 76 dB (WF_76_20k and WF_76_96k), and the sound of a river with LAeq of 66 dB (ST_66_20k and ST_66_96k). For these conditions, the sound files including the low-frequency components up to 20 kHz were also treated as stimuli, along with the original file with frequency components up to 96 kHz. Further details of the above conditions are described in section “Effects of ultrasonic sound and the relationship between physiological and psychological responses”.

Figure 4
figure 4

Spatial relationship between each of the speakers and the subject (+).

Figure 5
figure 5

Experimental flow for (a) each of the acoustic conditions, and (b) the overall flow of experiment, including the three acoustic conditions.

Table 3 Experimental conditions of the adopted acoustic stimuli.

Participants

In total, 10 subjects (5 males and 5 females, mean age ± standard deviation: 21.9 ± 1.1 years) participated in the experiment. The number of subjects was determined with reference to a previous experiment18 in which an equivalent number of subjects was used to examine the relationship between subjective impressions of sound and its effects on neurologic behavior. All participants were recruited by an e-mail sent to students at the author’s institution. In accordance with EN 50332-1 and -2 proposed by the European Committee for Electrotechnical Standardization19,20 as sound pressure regulations for portable audio players and the ethical guidelines of Tokyo University of Science, this experimental study was conducted to be noninvasive. Informed consent for the experiment was obtained from all participants after being briefed on the purpose of the study and experimental methods, as well as the anonymization and use of data. Prior to the study, the participants were asked about their hearing ability, which was tested using the hearWHO hearing test app21, and assured that all of their ears were normal (all participants had a score > 75, indicating good hearing).

Acoustic stimuli

In this study, naturally occurring sounds of streams and waterfalls were used as acoustic stimuli. It is significant to examine the psychophysiological effects of water sounds, which have a wide range of frequency components, because previous studies have examined the effects of water sounds22, mainly stream and waterfall sounds, to enhance the sound environment and to mask road traffic noise23.

The reproduced stream and waterfall sounds with frequency components up to 96 kHz were recorded at the Watarase River (Ashikaga city, Tochigi, Japan) and Otaki (Kanuma city, Tochigi, Japan), respectively, using two monaural ¼-inch microphones (378C01; PCB Piezotronics, Depew, NY, USA) and two monaural ½-inch microphones (378B02; PCB Piezotronics). As shown in Fig. 4, a five-channel speaker system including one channel with a woofer was used in this experiment. A two-way speaker with a crossover frequency of 2.5 kHz and a frequency response from 50 Hz up to 100 kHz (D-D2E; ONKYO, Osaka, Japan) was used as the four-channel loudspeaker; this speaker was capable of sufficiently driving the frequency range of the acoustic stimuli used in this study. Then, the sound recorded by the four microphones was reproduced through four speakers using two ¼-inch microphones with sensitivity up to 100 kHz and two ½-inch microphones with sensitivity up to 20 kHz. Although it is possible to use four ¼-inch microphones to record the sound for all channels, the small size of these microphones makes their self-noise more prominent than that of ½-inch microphones, and noise in the audible frequency range may cause problems in terms of subjective evaluations. Therefore, the sound recorded by the ¼-inch microphone was reproduced from the two front speakers to radiate the ultrasonic component to the subject efficiently, and the sound recorded by the ½-inch microphone was reproduced from the two rear speakers. The woofer in front of the subject reproduced the sound recorded by the ½-inch microphone. To reproduce the sounds, a PC was connected to an audio interface (UR44C; Steinberg, Hamburg, Germany), and the sound files were converted to analog signals and reproduced from the speakers. As the maximum sampling frequency of the audio interface is 192 kHz, the maximum upper limit frequency for recording and playback of the sound sources used in this study was set to 96 kHz.

The SPLs for each of the sound stimuli listed in Table 1 were set for the following reasons. First, the equivalent continuous A-weighted SPLs presented were 76 dB for the waterfall sound and 66 dB for the stream sound; these were the actually measured SPLs at the recording locations. To evaluate the differences in the presented SPLs, a waterfall sound stimulus with an equivalent continuous A-weighted SPL of 66 dB was additionally prepared by digitally attenuating the amplitude of the original sound wave file.

In this study, as already mentioned, two types of sounds were prepared for the various acoustic stimuli above, one with components up to 96 kHz as the original sound, and the other with components up to 20 kHz obtained by low-pass filtering. Then, the effects of the two kinds of stimuli on human psychophysiological states were compared and discussed. All of the above low-pass filtering and other types of digital processing were carried out using the Audition program (Adobe, San Jose, USA). The frequency characteristics of the various acoustic stimuli measured at the center of the subject’s head when each of the two sounds was reproduced are shown in Fig. 6. The three sounds shown in Fig. 6a contain an abundant frequency component > 20 kHz. By blocking these components, as shown in Fig. 6b, sound stimuli without ultrasonic components were generated and presented.

Figure 6
figure 6

Frequency characteristics of the adopted water sounds for waterfalls and streams under condition (a) with an ultrasonic component and condition (b) without an ultrasonic component.

Psychological measurements

The subjective impressions of the simulated acoustic stimuli were measured by a subjective evaluation test using the SD method24. In this evaluation, a seven-point Likert scale (from “Strongly disagree” to “Strongly agree”) was used (Fig. 7a). Then, the 15 adjectives shown in Fig. 7b were used.

Figure 7
figure 7

(a) The seven-point Likert scale used in the SD experiment and (b) the adjective pairs for positive (Adjective 2) and negative (Adjective 1) poles.

Physiological measurements

In this study, in addition to the psychological responses described in the previous section, EEG was observed to examine physiological responses in relation to psychological responses. EEG was measured using the Ultracortex Mark IV (OpenBCI, Brooklyn, NY, USA), as shown in Fig. 8a. The measured EEG waveforms were analyzed by using MATLAB (The Mathworks, Massachusetts, USA) with EEGLAB. The sampling rate of the measurements was 250 Hz. The measured waveform data were recorded via Bluetooth. As the main purpose of this study was to examine the subjective impressions caused by reproduced water sounds with ultrasound components in a relaxed situation, we decided to use as few channels as possible so as not to affect the natural impressions. Previous studies have shown that exposure to ultrasound components above the audible frequency range increases α-EEG1, and thus, changes in α-EEG were also expected in this study. The occipital significance for α-EEG25,26 have been reported, and changes in α-EEG at the occipital region have also been found in relation to comfort27. Therefore, the two measurement points shown in Fig. 8b (o1 and o2) were selected in this study. EEG recordings were derived using the monopolar derivation method with the earlobe as the reference electrode.

Figure 8
figure 8

(a) EEG adopted in this study. (b) The measured channels for EEG were o1 and o2 on the posterior part of head, and A1 and A2 for the ground electrodes on the earlobes.

The EEG analysis methods are described as follows. First, the EEG indices in the respective frequency domains were obtained as follows. The low-frequency components were filtered out using a low-pass filter with frequency components up to 30 Hz. The unnecessary pulsive waves of artifacts caused by eye movements were manually excluded. Next, the waveform was cut into multiple 10-s segments, and band-limited powers of PSD(f) from f1 Hz to f2 Hz in the α1 domain from 8 to 10 Hz, α2 domain from 10 to 13 Hz, overall α domain from 8 to 13 Hz, and β domain from 13 to 30 Hz were calculated as follows:

$$P_{{f_{1} - f_{2} }} = \mathop \smallint \limits_{{f_{1} }}^{{f_{2} }} PSD\left( f \right)df$$
(1)

where PSD(f) is the power spectrum density obtained from the fast Fourier transform treatment of each segment. Physiological indices may have different baseline physiological quantities depending on the physiological and psychological state at the time of their measurement. Therefore, it is difficult to compare absolute values of indices among subjects. So, as shown in Fig. 5a, a pre-rest period was set during each measurement, and the physiological parameters measured during that period was used as the baseline value (EEGBase). Then, the ratio of the physiological parameters when acoustic stimulation was applied (EEGStim) to this baseline value was calculated as the change rate and discussed. The EEG change rate in each frequency domain R was calculated as follows:

$$R = \frac{{EEG_{{{\text{stim}}}} - EEG_{{{\text{base}}}} }}{{EEG_{{{\text{base}}}} }} \cdot 100$$
(2)

Regarding the frequency domain segmentation of the EEG, the α band, which has been found to differ significantly in a previous study1, was mainly discussed. The β band, which is adjacent to the α band, was additionally treated. To discuss further details of the α band, the segmentation method of the α band into α1 and α2 was also adopted, in accordance with a previous study on the hypersonic effect5. Finally, the α-EEG, α1-EEG, α2-EEG, and β-EEG change rates were treated.

Statistical analysis

Statistical analysis was performed using BellCurve for Excel (Social Survey Research Information Co., Ltd., Tokyo, Japan). The difference between the with and without ultrasound component conditions was tested using a two-tailed Students t-test. For all analyses, p values < 0.05 were considered significant. To clarify the psychological mechanisms of highly variable events regarding the residents’ impressions of practical audiovisual stimuli, such as the urban environment, p values between 0.1 and 0.05 were discussed together as indicating a trend toward significance. The parameters of Cohen’s d were calculated to estimate effective sizes28, where the threshold of the extent of effectiveness is defined as follows: 0.2 < d < 0.5: small, 0.5 < d < 0.8: medium, and 0.8 < d < 1.0: large.