Abstract
Vocalizations including laughter, cries, moans, or screams constitute a potent source of information about the affective states of others. It is typically conjectured that the higher the intensity of the expressed emotion, the better the classification of affective information. However, attempts to map the relation between affective intensity and inferred meaning are controversial. Based on a newly developed stimulus database of carefully validated non-speech expressions ranging across the entire intensity spectrum from low to peak, we show that this intuition is false. In three experiments (N = 90), we demonstrate that intensity in fact plays a paradoxical role. Participants rated and classified the authenticity, intensity, and emotion, as well as the valence and arousal, of this wide range of vocalizations. Listeners are clearly able to infer expressed intensity and arousal; in contrast, and surprisingly, emotion category and valence have a perceptual sweet spot: moderate and strong emotions are clearly categorized, but peak emotions are maximally ambiguous. This finding, which converges with related observations from visual experiments, raises interesting theoretical challenges for the emotion communication literature.
Introduction
Whether conveyed by the face, body, or voice, expressions of emotion are ubiquitous. The inferred meaning of an expression is, generally speaking, substantially aligned with the affective content expressed, and it is intuitive to suggest that the stronger the expressed affective state, the more clear-cut the inferred emotional meaning. Indeed, a body of research suggests that high-intensity emotion expressions are better "recognized"1,2,3,4,5. Importantly, both discrete-emotion and dimensional theories predict this pattern of results, although by different mechanisms: either maximized distance to other emotions through the increasing recruitment of diagnostic content (e.g., facial muscle action6,7) or maximized distance in the affective space encompassed by the dimensions valence and arousal8 (note that alternative and higher-dimensional models arrive at similar predictions, e.g., Plutchik's circumplex but discrete-emotion view9). In other words, the prevailing approaches conjecture less confusion or ambiguity for the classification of highly intense expressions than for intermediate ones, as the distinctiveness of emotion expressions is predicted to increase with increasing intensity.
This generalization has been challenged by the discovery of perceptual ambiguity for facial10,11 and vocal12 expressions of peak emotional intensity. In the latter study, vocalizations of extreme positive valence could not be disambiguated from extreme negative valence. Moreover, these authors demonstrated a trend opposite the predicted relation for peak intense positive situations: the reactions of real-life lottery winners were rated more negatively as hedonic intensity (in this case cued by the prize sum) increased. They argue that peak emotion expression is inherently ambiguous and reliant on contextual information12,13,14.
The research on the ambiguity of intense expressions is intriguing, but key issues lack sufficient evidence to refine our theoretical understanding. The studies on peak emotion elegantly contrast positive and negative affect. As such, it is one aspect of affective experience (i.e., valence) that proves hard to differentiate. Valence, along with arousal, is thought to constitute an essential building block of core affect15,16. Hence, its compromised perceptual representation invites the speculation that peak intense vocalizations do not convey any affective meaning. But it is not known whether arousal, an equally fundamental property of affect, is similarly indistinct. Moreover, the data raise the question of whether individual emotions of the same or opposing valence can be differentiated, or whether only peak positive affect is unidentifiable.
These considerations are important for understanding the complex role of emotion intensity. From an analytic perspective, the two types of studies yielding the contradictory evidence are difficult to compare. The contrast differs between the groups of studies (emotion categories versus hedonic value, i.e., positive or negative). Additionally, it is unclear whether ambiguity is specific to peak emotion, or whether affective expressions are generally more ambiguous than previously thought12,13. One group of studies largely bases its interpretation on results obtained with moderately intense emotion expressions; peak-intensity emotional states were not examined. The other group's data, which challenge this interpretation, exclusively address peak emotional states. In summary, the data motivating these ideas are too sparse to adjudicate between the theoretical alternatives.
Various questions arise. First, what underlies the perceptual ambiguity, that is, which aspects of emotion lack a differentiable perceptual representation? Are valence, arousal, and emotion category equally affectedāand is ambiguity a general property of emotion communication? Second, how does affective information vary as a function of emotion intensity, if not linearly, as previously assumedāand what are the resulting theoretical implications? We illuminate the seemingly contradictory findings and provide insight into the processes of nonverbal emotion expression.
Nonverbal vocalizations reflect variable degrees of spontaneity, cognitive control, social learning, and culture17,18. They are largely shaped by physiological effects on the voice. Such effects, associated with sympathetic activation or arousal, can be perceived through characteristic changes in vocal cues19,20 and play a role especially in the communication of strong emotion21,22. Specifically, for nonverbal expressions arising from extreme situations, little voluntary regulation and socio-cultural dependency are expected23,24. Emotionally intense vocalizations, in negative as well as positive contexts, oftentimes encompass harsh-sounding call types such as screams, roars, and cries23,25,26,27. On a functional account, their characteristic acoustic structure (i.e., nonlinearities and spectro-temporal modulations) seems ideal for capturing listener attention25,26,27. Importantly, these acoustic signatures are linked to high attention and salience as well as to the perception of arousal across species and signal modalities26,28,29,30,31,32,33. Their biological relevance thus seems irrefutable.
Though valence and arousal are equally fundamental in theoretical frameworks of emotion, it is implausible to assume that the human voice does not signal physical activation or arousal in the most extreme instances of emotion. In fact, from an ethological perspective, a perceptual representation of arousal, as well as of the specific intensity of the emotional state, seems essential, even when overall valence and the specific type of emotion cannot be identified.
To address specifically the influence of emotional intensity on emotion perception, we use nonverbal vocalizations from a newly developed database, the Variably Intense Vocalizations of Affect and Emotion Corpus (VIVAE). The corpus, openly available (http://doi.org/10.5281/zenodo.4066235), encompasses a range of vocalizations and was carefully curated to comprise expressions of three positive (achievement/triumph, positive surprise, sexual pleasure) and three negative affective states (anger, fear, physical pain), ranging from low to peak emotion intensity. Perceptual evaluations were performed by N = 90 participants, who in three separate experiments classified emotion (Experiment 1, Fig. 1), rated emotion (Experiment 2; given the limitations of forced-choice response formats, discussed, e.g., in Refs.1,34), rated the affective dimensions valence and arousal (Experiment 3), and rated perceived authenticity (Experiments 1 and 3). We hypothesized that listeners would be able to classify emotional categories significantly above chance (Experiments 1 and 2) and to rate the affective properties of the stimuli congruently with the expressed affective states (Experiment 3). The critical hypothesis was as follows: All judgments were examined as a function of emotion intensity, which we expected to have a systematic effect on stimulus classification (Experiment 1) and on perceptual ratings (Experiments 2 and 3). Following the theoretical frameworks, we predicted that intensity and arousal would be classified clearly over the range of expressed intensities, while, in line with recent empirical data, the amplifying effect of emotional intensity on the classification of valence and emotion category would plateau at strong emotion. Peak emotion should be maximal in perceived intensity and arousal; however, valence and emotion category would be more ambiguous.
Together, we conjectured a paradoxical effect of the intensity of expressed emotion on perception, a finding not easily accommodated by current versions of categorical and dimensional theories of emotion.
Results
Emotions are accurately classified
In the emotion categorization task (Expt1, Figs. 1 and 2a, Supplementary Table S2), classification was significantly better than chance (16.67%) for each emotion (t(29) = 12.91 for achievement, 21.57 for anger, 13.85 for fear, 19.54 for pain, 18.02 for pleasure, 13.54 for surprise; Bonferroni-corrected ps < 0.001, ds > 2.36). Of the expressions with incongruent emotion classification, positive expressions were more likely to be misclassified as negative (t(29) = −5.36, p < 0.001, d = −0.98), whereas negative expressions were equally likely to be confused within as across valences (t(29) = 0.95, p = 0.35, d = 0.17).
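The chance-level comparison reported above can be sketched as a one-sample t-test of per-participant accuracies against the guessing rate of 1/6. The accuracy values below are simulated placeholders, not the study's data; only the test logic mirrors the analysis:

```python
import numpy as np
from scipy import stats

CHANCE = 1 / 6  # six emotion categories in the forced-choice task

# Hypothetical per-participant accuracies for one emotion (n = 30 listeners);
# the real values would come from the categorization task responses.
rng = np.random.default_rng(0)
accuracies = rng.normal(loc=0.50, scale=0.10, size=30)

t, p = stats.ttest_1samp(accuracies, popmean=CHANCE)
p_corrected = min(p * 6, 1.0)  # Bonferroni correction across six emotions
d = (accuracies.mean() - CHANCE) / accuracies.std(ddof=1)  # Cohen's d

print(f"t(29) = {t:.2f}, corrected p = {p_corrected:.3g}, d = {d:.2f}")
```

With real data, this test is repeated per emotion, which motivates the Bonferroni factor of six.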
Comparing participants' ratings on each of the six emotion scales in the emotion rating task (Expt2, Figs. 1, 2b), we found that the expressed emotions were rated higher on the matching scale than on the other scales (main effect of emotion scale, F(5, 145) = 215.55 for achievement, 178.15 for anger, 135.53 for fear, 173.63 for pain, 171.93 for pleasure, and 124.06 for surprise, ps < 0.001; all but one of the pairwise comparisons contrasting the matching-scale ratings with the other-scale ratings were significant at p < 0.001; p = 0.04 for achievement-surprise). Above-chance classification for each emotion is reported in Supplementary Table S2.
Intensity is faithfully tracked
Congruence between expressed and perceived intensity is reflected in the monotonic increases depicted in Fig. 3a (Expt1) and Fig. 3b (Expt2). We tested whether listeners could reliably identify the intensity of the expressions, and whether they could do so across tasks. Separate ANOVAs were performed to investigate how listeners' ratings vary as a function of expressed valence, emotion, and intensity.
For Experiment 1, the Emotion × Intensity rmANOVA revealed significant main effects of emotion (F(5, 145) = 15.10, p < 0.001, ηp² = 0.08) and intensity (F(3, 87) = 266.07, p < 0.001, ηp² = 0.68), and a significant interaction (F(15, 435) = 9.91, p < 0.001, ηp² = 0.03). Planned comparisons confirmed systematic differences in participants' ratings, with low (M = 3.09, 95% CI [2.91, 3.27]) < moderate (M = 3.82, [3.64, 4.00]) < strong (M = 4.72, [4.54, 4.90]) < peak emotion intensity ratings (M = 5.49, [5.31, 5.67]; all ps < 0.001). Post hoc comparisons of the interaction are reported in Supplementary Fig. S1c.
Results were replicated in the emotion rating task (Fig. 3b): We found significant main effects of emotion (F(5, 145) = 17.05, p < 0.001, ηp² = 0.05) and intensity (F(3, 87) = 204.00, p < 0.001, ηp² = 0.57), and a significant interaction (F(15, 435) = 10.85, p < 0.001, ηp² = 0.04). In line with the results from Experiment 1, planned comparisons confirmed an increase in participants' intensity ratings from low to peak emotion intensity (Ms = 3.41, [3.19, 3.63] < 4.03, [3.81, 4.25] < 4.85, [4.63, 5.07] < 5.54, [5.32, 5.76], ps < 0.001).
The effect of valence on intensity ratings was assessed in Valence × Intensity rmANOVAs. Here, results differed between the two experimental groups. In Experiment 1, intensity ratings did not differ significantly between negative (M = 4.30, [4.12, 4.48]) and positive expressions (M = 4.26, [4.08, 4.44]) (F(1, 29) = 0.42, p = 0.52). In Experiment 2, intensity ratings were higher for negative (M = 4.53, [4.31, 4.75]) than for positive expressions (M = 4.38, [4.16, 4.60]) (F(1, 29) = 16.43, p < 0.001, ηp² = 0.01). As expected, the main effect of intensity was significant for both groups (Expt1, p < 0.001, ηp² = 0.71; Expt2, p < 0.001, ηp² = 0.60; F-ratios for intensity are reported in the Emotion × Intensity ANOVAs). The interaction of valence and intensity was significant in both groups (Expt1, F(3, 87) = 4.48, p = 0.008, ηp² = 0.001; Expt2, p = 0.007, ηp² = 0.002). Post hoc comparisons revealed that in Experiment 2, differences between positive and negative valence were significant at higher intensities (Ms = 4.75, [4.50, 5.00] and 4.94, [4.70, 5.19], p = 0.04, for positive and negative strong intensity; Ms = 5.43, [5.18, 5.68] and 5.64, [5.40, 5.89], p = 0.02, for positive and negative peak intensity), but not at weaker intensities (low, p = 0.54; moderate, p = 0.11). Experiment 1 showed the same trend, which did not reach significance.
The effect of expressed intensity on perceived intensity persisted for trials in which emotion was not classified concordantly. Despite differences between congruently and incongruently classified trials (Expt1, F(1, 29) = 16.33, p < 0.001, ηp² = 0.02), perceived intensity increased significantly in line with intended intensity (Ms = 3.04, 3.65, 4.68, and 5.44, with low < moderate < strong < peak, ps < 0.001) in trials of incongruent emotion classification, and did so also in the case of incongruent valence classification (p < 0.001 for all pairwise comparisons). Cumulatively, the intensity ratings show the coherence between expressed and perceived intensity across all tested contrasts.
Paradoxical role of intensity reveals classification sweet spot
Separate ANOVAs were computed to assess the effects of valence, emotion, and emotion intensity on classification accuracy in Experiment 1 (Fig. 3c). Classification accuracy differed between intended emotions, F(5, 145) = 20.41, p < 0.001, ηp² = 0.26, and intensity levels, F(3, 87) = 81.66, p < 0.001, ηp² = 0.13. The Emotion × Intensity interaction was significant, F(15, 435) = 39.70, p < 0.001, ηp² = 0.31. Four (anger, pleasure, pain, surprise) of the six emotions featured lower classification accuracy for peak than for strong, moderate, and low intensity (anger, peak < low, p = 0.004; pleasure, peak < low, p = 0.02; p < 0.001 for all other peak < low, moderate, strong comparisons). The opposite pattern emerged for achievement (ps < 0.001), whereas accuracy for fear was uniform across intensity levels.
In a Valence × Intensity rmANOVA, no main effect of valence on classification accuracy was found, F(1, 29) = 2.95, p = 0.10. Again, the main effect of intensity was significant, p < 0.001, ηp² = 0.28. Planned comparisons confirmed the pattern shown in Fig. 3c: Accuracy was highest for strong intensity expressions, M = 57.78%, 95% CI [55.17, 60.45], which did not differ significantly from moderate intensity expressions, M = 54.17%, [51.50, 56.84] (p = 0.053). The decrease in accuracy from moderate to low intensity (M = 49.31%, [46.64, 51.98]) was significant (p = 0.004). Classification accuracy for peak intensity was lower than for strong, moderate, and low intensity (M = 43.11%, [40.44, 45.78], ps < 0.001). The interaction between valence and intensity (F(3, 87) = 5.21, p = 0.002, ηp² = 0.02) corresponded to a significant difference in accuracy between low and moderate intensity only for positive but not negative expressions, along with a significant drop in accuracy for peak compared to all other intensity levels for expressions of either valence (Fig. 3c).
In parallel to Experiment 1, separate ANOVAs were computed to assess the effects of valence, emotion, and emotion intensity on classification accuracy derived from the ratings in Experiment 2 (Fig. 3d). The Emotion × Intensity rmANOVA revealed significant differences between emotions, F(5, 145) = 21.29, p < 0.001, ηp² = 0.23, intensities, F(3, 87) = 75.28, p < 0.001, ηp² = 0.13, and a significant interaction, F(15, 435) = 32.83, p < 0.001, ηp² = 0.33. Post hoc comparisons of the interaction replicated the pattern obtained in Experiment 1.
For the Valence × Intensity rmANOVA, no main effect of valence on classification accuracy was found, F(1, 29) = 1.59, p = 0.22. The main effect of intensity was significant, p < 0.001, ηp² = 0.32, as was the interaction of valence and intensity, F(3, 87) = 3.45, p = 0.044, ηp² = 0.03. Accuracy was lower for low (M = 58.14%, 95% CI [55.81, 60.47]) than for moderate intensity (M = 62.22%, [59.89, 64.55], p = 0.007), and lower for peak (M = 51.08%, [48.75, 53.41]) than for strong intensity (M = 65.14%, [62.81, 67.47], p < 0.001), whereas no difference was found between moderate and strong intensity (p = 0.095). The significant interaction stems from higher derived accuracy for expressions of negative valence at the outer intensity levels (low: positive, M = 55.94%, [52.99, 58.90], negative, M = 60.33%, [57.38, 63.29], p = 0.014; peak: positive, M = 48.83%, [45.88, 51.79], negative, M = 53.33%, [50.38, 56.29], p = 0.012), but no significant differences at the central intensity levels.
A second comparison across the two tasks examined how well listeners could distinguish positive from negative expressions. Valence classification accuracy (derived from forced-choice judgements in Experiment 1), like emotion categorization, followed a paradoxical pattern. Accuracy was lower at peak (M = 68.47%) than at strong (M = 76.42%, p < 0.001) and moderate intensity (M = 73.78%, p = 0.02), yet peak and low (M = 72.61%) did not differ significantly (p = 0.11). The highest valence confusion occurred for peak intensity expressions of positive valence, whose classification accuracy of 53.67% was only marginally above chance (50%), t(29) = 1.86, p = 0.04, d = 0.34. In Experiment 2, correct valence classification dropped significantly for peak (M = 75.56%) compared to low (M = 82.19%, p = 0.001), moderate (M = 82.42%, p = 0.004), and strong intensity (M = 82.67%, p < 0.001). Again, congruency of expressed and perceived valence was lowest for positive peak emotional states (63.11%).
Valence and arousal ratings differ
Figure 4 depicts the two-dimensional space of mean valence and arousal ratings for each stimulus in the dimensional rating task (Fig. 1). The U-shaped distribution of affective valence and arousal can be described by a significant quadratic fit, y = 0.34x² − 2.83x + 10, R²adj = 0.23, F(2, 477) = 72.60, p < 0.001. This relationship is characterized by higher arousal ratings for sounds rated as either highly pleasant or highly unpleasant. In addition, the relationship in our sample is asymmetrical: Negatively rated stimuli received higher arousal ratings (M = 4.82) than positively rated stimuli (M = 4.56), confirmed by a significant Wilcoxon test (z = −2.69, p = 0.007).
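The shape of this quadratic fit can be reproduced in outline with a least-squares polynomial fit. The data below are synthetic U-shaped ratings on assumed 7-point scales, not the study's stimulus ratings; the asymmetry check with a rank-based test is likewise only analogous to the reported Wilcoxon contrast:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic mean ratings for 480 stimuli on 7-point scales (assumption).
valence = rng.uniform(1, 7, size=480)
# U-shape: arousal rises toward both valence extremes, plus rating noise.
arousal = 0.34 * valence**2 - 2.83 * valence + 10 + rng.normal(0, 0.3, 480)

# Quadratic fit; np.polyfit returns coefficients ordered [x^2, x, intercept].
b2, b1, b0 = np.polyfit(valence, arousal, deg=2)
assert b2 > 0  # positive curvature = U-shaped relationship

# Asymmetry: arousal for negatively vs. positively rated stimuli
# (scale midpoint 4), analogous in spirit to the reported rank test.
neg, pos = arousal[valence < 4], arousal[valence >= 4]
u, p = stats.mannwhitneyu(neg, pos)
```

Because the fitted minimum lies slightly above the scale midpoint, the negative half of this synthetic sample carries higher arousal on average, qualitatively matching the reported asymmetry.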
While arousal ratings increased from low to peak intensity (Ms = 3.63, 95% CI [3.41, 3.85] < 4.33, [4.11, 4.55] < 5.13, [4.92, 5.35] < 5.79, [5.57, 6.01], ps < 0.001), the pattern of valence ratings showed interesting confusion and variation in participants' agreement (Fig. 4). The number of expressions perceived as negative (299, average rating < 4) and positive (181, average rating ≥ 4) deviated significantly from the balanced number of stimuli per expressed valence (χ²(1, N = 480) = 183.91, p < 0.001). A factorial logistic regression quantified the effect of expressed valence and intensity on congruent or incongruent valence ratings. Positive expressions, especially those of strong and peak intensity, were more likely to be rated as negative in valence (strong, z = −2.08, p = 0.04; peak, z = −3.13, p = 0.002), accounting for the higher number of stimuli perceived as negative.
Discussion
Three experiments show that listeners are remarkably good at inferring meaning from variably intense nonverbal vocalizations. Yet their ability to do so is affected by the expressed emotional intensity. We demonstrate a complex relationship between intensity and inferred affective state. Whereas both intensity and arousal are perceived coherently over the range of expressed intensities, the facilitatory effect of increasing intensity on classifying valence and emotion category plateaus at strong emotions. Remarkably, peak emotions are the most ambiguous of all. We call this the "emotion intensity paradox". Our results suggest that value (i.e., valence and emotion category) cannot be retrieved easily from peak emotion expressions. However, arousal and emotion intensity are clearly perceivable in peak expressions.
In addition to the reported parabolic relationship between emotional intensity and classification accuracy, overall accuracy scores for individual emotions, although above chance, were far from perfect and in fact relatively low compared to previous research1,3,34. A direct comparison of accuracy scores across studies should be treated with caution, as, for example, the number of emotion categories varies across studies, and so does their intensity, shown here to systematically affect emotion classification. Furthermore, substantial differences exist in the tested stimulus sets themselves, that is, in stimulus production and selection procedures as well as stimulus sources (i.e., studio-produced or real-life). One speculative but interesting possibility is that the lower convergence observed in our data reflects the heterogeneity allowed for in the stimulus material.
The data are incompatible with the view of diagnostic emotion expression suggested by basic emotion theories36,37. Likewise, the data challenge the conception that valence and arousal are equivalent elements in the composition of core affect15,16. Future work will need to investigate whether valence and arousal really share the same level of representation. Information on arousal is already available at early processing stages38,39,40,41 and may serve as an attention-grabbing filter, ensuring the detection of biological relevance in the most extreme cases. Valuation likely constitutes a more complex process, perhaps secondary in peak emotion.
We exploited a new database of human vocal emotion expressions (http://doi.org/10.5281/zenodo.4066235), systematically manipulating emotion intensity. In line with previous research3,4, the data underscore that emotion intensity constitutes a prominent property of vocal emotion communication. In our population of listeners (the cultural relativity of vocal emotion perception is discussed, e.g., in Ref.18), we report compelling specific effects of intensity. Forced-choice judgements and emotion ratings both revealed an inverted-U pattern: The expressed emotion was most accurately classified for moderate and strong intensity expressions; low intensity expressions were frequently confused, and peak intensity expressions were classified least accurately. The higher ambiguity of peak states was reflected in both lower valence and lower emotion classification accuracy. At the most extreme instances of emotion, the evaluation of "affective semantics", i.e., valence and emotion type, is constrained by an ambiguous perceptual representation.
We find that peak emotion is not per se ambiguous. Arousal and intensity of emotion expressions are perceived clearly across the range of expressed intensities, including peak emotion (e.g., Fig. 4). Notably, we find that the intensity of the expressions is accurately perceived even if other affective features, such as valence and emotion category, prove ambiguous.
In other words, for a given expression, despite the unreliable identification of the affective semantics, the relevance of the signal is readily perceived through the unambiguous representation of arousal and intensity. Taken together, extremely intense expressions seem to convey less information about polarity (positive or negative, triumph or anger), though their indication of "relevance" remains unaltered. We speculate that this central representation of "alarmingness", i.e., the biological relevance of highly intense expressions, comes at the cost of other affective semantics, including valence and type of affective state. The latter might rely on contextual cues, underlining the role of top-down modulations and higher-order representations of emotional states12,42,43.
In nonverbal vocalizations, the effects of increased emotional intensity and arousal have been linked to acoustic characteristics that attract attention25,27,44. Screams, for example, have spectro-temporal features irrelevant for linguistic, prosodic, or speaker identity information, but dedicated to alarm signaling. The corresponding unpleasant acoustic percept, roughness, correlates with how alarming a scream is perceived to be and how efficiently it is appraised26. One hypothesis that arises is that information is prioritized differently as a function of emotion intensity. At peak intensity, the most vital job is to detect "big" events. A salient, high-arousal signal may serve as an attention-grabbing filter in a first step, and affective semantic evaluation may follow. In contrast, intermediate intensity signals do not necessarily elicit or require immediate action and can afford a more fine-grained analysis of affective meaning. A possible neurobiological implementation is that information is carried at different timescales and ultimately integrated in a neural network underlying affective sound processing39,41,45. Concurrent functional pathways allow a rapid evaluation of relevance for vocal emotions of any valence, occurring at early processing stages and via fast processing routes38,39,40,41,46,47. Though perceptually unavailable, it might well be that the information is objectively present in the signal, as has been shown for facial and body cues of extreme emotion11. The conjecture that a similar pattern might also exist in vocal peak emotion would resonate with our interpretation of the findings as temporally masked affective value and emotion information, through the central representation of salience via arousal and emotion intensity.
Materials and methods
Study design
Stimuli
The stimuli are 480 nonverbal vocalizations, representing the Core Set of a validated corpus48. The database comprises six affective states (three positive and three negative) at four intensity levels (low, moderate, strong, and peak emotion intensity; note that in this text, the term "intensity" refers exclusively to emotional intensity, i.e., the variation from a very mildly sensed affective state to an extremely intense affective state, and should not be confused with the auditory perception of signal intensity as loudness). The six affective states (achievement/triumph, anger, fear, pain, positive surprise, and sexual pleasure) represent a suitable, well-studied sample of affective states for which variations in emotion intensity have previously been described3,4,10.
Vocalizations were recorded at the Berklee College of Music (Boston, MA). Ten female speakers, all non-professional actors, were instructed to produce emotion expressions as spontaneously and genuinely as possible. No restrictions were imposed on the specific sounds speakers should produce, only that vocalizations should contain no verbal content such as words (e.g., "yes") or interjections (e.g., "ouch"). Following a technical validation, the Core Set was developed as a fully crossed stimulus sample based on authenticity ratings. Stimuli were recorded at a sampling rate of 44.1 kHz (16-bit resolution). Sound durations range from 400 to 2000 ms.
Participants
A total of ninety participants were recruited through the Max Planck Institute for Empirical Aesthetics (MPIEA), Frankfurt. Thirty participants were assigned to the emotion categorization task (M = 28.77 years old, SD = 9.46; 16 self-identified as women, 14 as men); thirty (M = 28.53 years old, SD = 8.62; 15 self-identified as women, 14 as men, 1 as nonbinary) to the emotion rating task; and thirty (M = 24.37 years old, SD = 4.80; 15 self-identified as women, 15 as men) to the dimensional rating task (Fig. 1). Our sample size was based on previous research3,23, and a power analysis in G*Power49 confirmed that our sample size (N = 30 each) would allow us to detect an effect as small as ηp² = 0.005 (Cohen's f = 0.06) with a power of 0.80. The experimental procedures were approved by the Ethics Council of the Max Planck Society. Experiments were performed in accordance with the relevant named guidelines and regulations. Participants provided informed consent before participating and received financial compensation. All participants were native speakers of German, reported normal hearing, and reported no history of psychiatric or neurological illness.
Procedure
The studies took place at the MPIEA. The 480 stimuli were presented using Presentation software (Version 20.0; www.neurobs.com) through Beyerdynamic DT 770 Pro headphones. Sound amplitude was calibrated to a maximum of 90.50 dB(A), resulting in 43 dB(A) for the peak amplitude of the quietest sound file. Each stimulus was presented once, in pseudorandomized order. No feedback regarding response accuracy was provided.
Emotion categorization task (Experiment 1)
Participants were asked to assign one of seven possible response options to each vocalization: the German emotion labels for anger (Ärger), fear (Angst), pain (Schmerz), achievement (Triumph), positive surprise (Positive Überraschung), and sexual pleasure (Sexuelle Lust), plus a "none of the specified emotions" option (Keines der Genannten). Next, participants were asked to indicate how intensely they believed the speaker had experienced the emotional state, ranging from minimally intense ("minimal") to maximally intense ("maximal"). Finally, participants indicated how authentic they perceived the expression to be, from not at all ("gar nicht") to fully ("vollkommen") authentic. The order of the 7AFC and intensity rating tasks was counterbalanced across participants. After the authenticity rating, the next stimulus was played automatically.
Emotion rating task (Experiment 2)
Participants completed ratings for each emotion. They were instructed to indicate how clearly they perceived the specified emotion in the expression. A judgement from not at all ("gar nicht") to completely ("völlig") was made on each of the simultaneously presented scales. Thus, anywhere from none to all of the emotions could be identified, to varying extents, in each vocalization. As in the categorization task, emotion intensity was rated. The order of emotion ratings and emotion intensity ratings was counterbalanced across participants.
Dimensional rating task (Experiment 3)
Participants were asked to judge each stimulus on the dimensions valence and arousal. Valence was rated from negative to positive and arousal from minimal to maximal. The scales were presented successively on individual screens, in counterbalanced order across participants. Authenticity judgements were performed in the same format as described for the categorization task.
Statistical analysis
All statistical analyses and data visualizations were performed in RStudio.
We refer to "classification accuracy" as the consistency between speaker intention and listener perceptual judgements. In Experiment 1, this corresponds to the percentage of correct classifications of emotions. A measure of accuracy was also obtained from the emotion ratings performed in Experiment 2 by defining the response as a match whenever the highest rating was provided on the intended emotion scale, and as a miss whenever the intended scale was rated lower than any other scale. As additional indices that take response biases into account, we report unbiased hit rates, differential accuracy, false alarm rates, and detailed confusion matrices of the response data in the Supplemental Information.
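The match/miss rule for deriving accuracy from the Experiment 2 ratings can be sketched as follows. The rating values are invented; and since the stated rule counts a miss only when the intended scale is rated strictly lower than some other scale, we assume here that a tie with the maximum still counts as a match:

```python
def is_match(ratings: dict[str, float], intended: str) -> bool:
    """Match if no other emotion scale was rated higher than the intended one."""
    return all(ratings[intended] >= r for e, r in ratings.items() if e != intended)

# Hypothetical trial: a fear vocalization rated on all six scales (1-7).
trial = {"achievement": 1, "anger": 3, "fear": 6, "pain": 4, "pleasure": 1, "surprise": 2}
assert is_match(trial, "fear")       # highest rating lies on the intended scale
assert not is_match(trial, "anger")  # pain outrates anger, so this is a miss

# Accuracy = percentage of matches across (ratings, intended emotion) trials.
trials = [(trial, "fear"), (trial, "pain")]
accuracy = 100 * sum(is_match(r, e) for r, e in trials) / len(trials)
```

The per-participant accuracies derived this way feed the same ANOVAs as the forced-choice accuracies from Experiment 1.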
Intensity ratings and classification accuracy were tested with repeated-measures analyses of variance (rmANOVA) to assess the effects of affective stimulus properties (i.e., valence, emotion category, and emotion intensity) and their interactions. Normality was screened; sphericity was assessed with Mauchly’s tests. When sphericity could not be assumed (p < 0.001), Greenhouse–Geisser corrections were applied. For readability, we report uncorrected degrees of freedom together with adjusted p values. Pairwise comparisons were adjusted using Tukey’s HSD correction, as implemented in the emmeans package50.
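The Greenhouse–Geisser correction multiplies the ANOVA degrees of freedom by an epsilon estimated from the covariance matrix of the repeated-measures conditions. As a minimal sketch of that estimator (illustrative only; the authors' pipeline used R, and the function name here is invented):

```python
import numpy as np

def gg_epsilon(S):
    """Greenhouse-Geisser epsilon for a k x k covariance matrix S of the
    k repeated-measures conditions. epsilon = 1 when sphericity holds;
    the lower bound is 1 / (k - 1). The F test's degrees of freedom are
    multiplied by epsilon before the p value is looked up."""
    k = S.shape[0]
    C = np.eye(k) - np.ones((k, k)) / k   # double-centering projector
    Sc = C @ S @ C
    return np.trace(Sc) ** 2 / ((k - 1) * np.trace(Sc @ Sc))

# A compound-symmetric covariance (equal variances, equal covariances)
# satisfies sphericity, so epsilon should come out at 1.
k, rho = 4, 0.3
S = (1 - rho) * np.eye(k) + rho * np.ones((k, k))
print(round(gg_epsilon(S), 6))  # 1.0
```

Conversely, when all variance loads on a single contrast between conditions, epsilon collapses to its floor of 1 / (k − 1), which is the situation the correction guards against.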
Authenticity ratings are reported and discussed in the Supplemental Materials. Perceptual judgements for each stimulus are available at https://osf.io/jmh5t/.
Data availability
The data are available via the Open Science Framework at https://osf.io/jmh5t/. All stimuli are openly available at http://doi.org/10.5281/zenodo.4066235.
References
BƤnziger, T., Mortillaro, M. & Scherer, K. R. Introducing the Geneva multimodal expression corpus for experimental research on emotion perception. Emotion 12, 1161ā1179. https://doi.org/10.1037/a0025827 (2012).
Hess, U., Blairy, S. & Kleck, R. E. The intensity of emotional facial expressions and decoding accuracy. J. Nonverbal Behav. 21, 241ā257. https://doi.org/10.1023/a:1024952730333 (1997).
Juslin, P. N. & Laukka, P. Impact of intended emotion intensity on cue utilization and decoding accuracy in vocal expression of emotion. Emotion 1, 381ā412. https://doi.org/10.1037/1528-3542.1.4.381 (2001).
Livingstone, S. R. & Russo, F. A. The Ryerson audio-visual database of emotional speech and song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13, e0196391. https://doi.org/10.1371/journal.pone.0196391 (2018).
Wingenbach, T. S., Ashwin, C. & Brosnan, M. Validation of the Amsterdam Dynamic Facial Expression Set-Bath Intensity Variations (ADFES-BIV): A set of videos expressing low, intermediate, and high intensity emotions. PLoS ONE 11, e0147112. https://doi.org/10.1371/journal.pone.0147112 (2016).
Ekman, P. Methods for measuring facial action. In Handbook of Methods in Nonverbal Behavior Research (eds. Scherer, K. R. & Ekman, P.) 45ā90 (Cambridge University Press, 1982).
Ekman, P. Expression and the nature of emotion. In Approaches to Emotion (eds. Scherer, K. R. & Ekman, P.) 319ā344 (Erlbaum Associates, 1984).
Russell, J. A. A circumplex model of affect. J. Pers. Soc. Psychol. 39, 1161ā1178. https://doi.org/10.1037/h0077714 (1980).
Plutchik, R. The Psychology and Biology of Emotion (HarperCollins College Publishers, 1994).
Aviezer, H., Trope, Y. & Todorov, A. Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science 338, 1225ā1229. https://doi.org/10.1126/science.1224313 (2012).
Aviezer, H. et al. Thrill of victory or agony of defeat? Perceivers fail to utilize information in facial movements. Emotion 15, 791ā797. https://doi.org/10.1037/emo0000073 (2015).
Atias, D. et al. Loud and unclear: Intense real-life vocalizations during affective situations are perceptually ambiguous and contextually malleable. J. Exp. Psychol. Gen. 148, 1842ā1848. https://doi.org/10.1037/xge0000535 (2019).
Aviezer, H., Ensenberg, N. & Hassin, R. R. The inherently contextualized nature of facial emotion perception. Curr. Opin. Psychol. 17, 47ā54. https://doi.org/10.1016/j.copsyc.2017.06.006 (2017).
Israelashvili, J., Hassin, R. R. & Aviezer, H. When emotions run high: A critical role for context in the unfolding of dynamic, real-life facial affect. Emotion 19, 558ā562. https://doi.org/10.1037/emo0000441 (2019).
Barrett, L. F. & Bliss-Moreau, E. Affect as a psychological primitive. Adv. Exp. Soc. Psychol. 41, 167ā218. https://doi.org/10.1016/S0065-2601(08)00404-8 (2009).
Russell, J. A. Core affect and the psychological construction of emotion. Psychol. Rev. 110, 145ā172. https://doi.org/10.1037/0033-295X.110.1.145 (2003).
Bryant, G. A. et al. The perception of spontaneous and volitional laughter across 21 societies. Psychol. Sci. 29, 1515ā1525. https://doi.org/10.1177/0956797618778235 (2018).
Gendron, M., Roberson, D., van der Vyver, J. M. & Barrett, L. F. Cultural relativity in perceiving emotion from vocalizations. Psychol. Sci. 25, 911ā920. https://doi.org/10.1177/0956797613517239 (2014).
Bachorowski, J. A. Vocal expression and perception of emotion. Curr. Dir. Psychol. Sci. 8, 53ā57. https://doi.org/10.1111/1467-8721.00013 (1999).
Patel, S., Scherer, K. R., Bjƶrkner, E. & Sundberg, J. Mapping emotions into acoustic space: the role of voice production. Biol. Psychol. 87, 93ā98. https://doi.org/10.1016/j.biopsycho.2011.02.010 (2011).
Gobl, C. & Chasaide, A. N. The role of voice quality in communicating emotion, mood and attitude. Speech Commun. 40, 189ā212. https://doi.org/10.1016/S0167-6393(02)00082-1 (2003).
Scherer, K. R. Vocal affect expression: A review and a model for future research. Psychol. Bull. 99, 143ā165. https://doi.org/10.1037/0033-2909.99.2.143 (1986).
Anikin, A. & Persson, T. Nonlinguistic vocalizations from online amateur videos for emotion research: A validated corpus. Behav. Res. Methods 49, 758ā771. https://doi.org/10.3758/s13428-016-0736-y (2017).
Juslin, P. N. Vocal affect expression: problems and promises. In Evolution of Emotional Communication (eds. AltenmĆ¼ller, E., Schmidt, S. & Zimmermann, E.) 252ā273 (Oxford University Press, 2013).
Anikin, A. The link between auditory salience and emotion intensity. Cogn. Emot. 34, 1246ā1259. https://doi.org/10.1080/02699931.2020.1736992 (2020).
Arnal, L. H., Flinker, A., Kleinschmidt, A., Giraud, A. L. & Poeppel, D. Human screams occupy a privileged niche in the communication soundscape. Curr. Biol. 25, 2051ā2056. https://doi.org/10.1016/j.cub.2015.06.043 (2015).
Raine, J., Pisanski, K., Simner, J. & Reby, D. Vocal communication of simulated pain. Bioacoustics 28, 404ā426. https://doi.org/10.1080/09524622.2018.1463295 (2019).
Belin, P. & Zatorre, R. J. Neurobiology: Sounding the alarm. Curr. Biol. 25, R805āR806. https://doi.org/10.1016/j.cub.2015.07.027 (2015).
Blumstein, D. T. & Recapet, C. The sound of arousal: The addition of novel non-linearities increases responsiveness in marmot alarm calls. Ethology 115, 1074ā1081. https://doi.org/10.1111/j.1439-0310.2009.01691.x (2009).
Charlton, B. D., Watchorn, D. J. & Whisson, D. A. Subharmonics increase the auditory impact of female koala rejection calls. Ethology 123, 571ā579. https://doi.org/10.1111/eth.12628 (2017).
HechavarrĆa, J. C., Beetz, M. J., GarcĆa-Rosales, F. & Kƶssl, M. Bats distress vocalizations carry fast amplitude modulations that could represent an acoustic correlate of roughness. Sci. Rep. 10, 1ā20. https://doi.org/10.1038/s41598-020-64323-7 (2020).
Reby, D. & Charlton, B. D. Attention grabbing in red deer sexual calls. Anim. Cogn. 15, 265ā270. https://doi.org/10.1007/s10071-011-0451-0 (2012).
Trevor, C., Arnal, L. H. & FrĆ¼hholz, S. Terrifying film music mimics alarming acoustic feature of human screams. J. Acoust. Soc. Am. 147, EL540āEL545. https://doi.org/10.1121/10.0001459 (2020).
Lima, C. F., Castro, S. L. & Scott, S. K. When voices get emotional: A corpus of nonverbal vocalizations for research on emotion processing. Behav. Res. Methods 45, 1234ā1245. https://doi.org/10.3758/s13428-013-0324-3 (2013).
R Core Team. A Language and Environment for Statistical Computing. (R Found. Stat. Comput., 2020).
Ekman, P. & Cordaro, D. What is meant by calling emotions basic. Emot. Rev. 3, 364ā370. https://doi.org/10.1177/1754073911410740 (2011).
Izard, C. E. Emotion theory and research: Highlights, unanswered questions, and emerging issues. Annu. Rev. Psychol. 60, 1ā25. https://doi.org/10.1146/annurev.psych.60.110707.163539 (2009).
FrĆ¼hholz, S., Trost, W. & Grandjean, D. The role of the medial temporal limbic system in processing emotions in voice and music. Prog. Neurobiol. 123, 1ā17. https://doi.org/10.1016/j.pneurobio.2014.09.003 (2014).
FrĆ¼hholz, S., Trost, W. & Kotz, S. A. The sound of emotionsāTowards a unifying neural network perspective of affective sound processing. Neurosci. Biobehav. Rev. 68, 96ā110. https://doi.org/10.1016/j.neubiorev.2016.05.002 (2016).
Sauter, D. A. & Eimer, M. Rapid detection of emotion from human vocalizations. J. Cogn. Neurosci. 22, 474ā481. https://doi.org/10.1162/jocn.2009.21215 (2010).
Schirmer, A. & Kotz, S. A. Beyond the right hemisphere: Brain mechanisms mediating vocal emotional processing. Trends Cogn. Sci. 10, 24ā30. https://doi.org/10.1016/j.tics.2005.11.009 (2006).
Barrett, L. F. The theory of constructed emotion: An active inference account of interoception and categorization. Soc. Cogn. Affect. Neurosci. 12, 1ā23. https://doi.org/10.1093/scan/nsw154 (2017).
LeDoux, J. E. & Brown, R. A higher-order theory of emotional consciousness. Proc. Natl. Acad. Sci. U. S. A. 114, E2016āE2025. https://doi.org/10.1073/pnas.1619316114 (2017).
Arnal, L. H., Kleinschmidt, A., Spinelli, L., Giraud, A. L. & MĆ©gevand, P. The rough sound of salience enhances aversion through neural synchronisation. Nat. Commun. 10, 3671. https://doi.org/10.1038/s41467-019-11626-7 (2019).
Boemio, A., Fromm, S., Braun, A. & Poeppel, D. Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nat. Neurosci. 8, 389ā395. https://doi.org/10.1038/nn1409 (2005).
Fecteau, S., Belin, P., Joanette, Y. & Armony, J. L. Amygdala responses to nonlinguistic emotional vocalizations. Neuroimage 36, 480ā487. https://doi.org/10.1016/j.neuroimage.2007.02.043 (2007).
Sander, D., Grafman, J. & Zalla, T. The human amygdala: An evolved system for relevance detection. Rev. Neurosci. 14, 303ā316. https://doi.org/10.1515/REVNEURO.2003.14.4.303 (2003).
Holz, N., Larrouy-Maestri, P. & Poeppel, D. The variably intense vocalizations of affect and emotion corpus (VIVAE). Zenodo. https://doi.org/10.5281/zenodo.4066234 (2020).
Faul, F., Erdfelder, E., Lang, A. G. & Buchner, A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191. https://doi.org/10.3758/BF03193146 (2007).
Lenth, R., Singmann, H., Love, J., Buerkner, P. & Herve, M. emmeans: Estimated marginal means, aka least-squares means. R package version 1.3.4 [software] (2018).
Acknowledgements
We thank R. Muralikrishnan, Sarah Brendecke, Dominik Thiele, and Cornelius Abel for assistance.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
All authors contributed equally to the study design. N.H. conducted the research and analyzed the data. N.H. drafted the manuscript, and P.L.-M. and D.P. provided critical revisions. All authors approved the final version of the manuscript for submission.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Holz, N., Larrouy-Maestri, P. & Poeppel, D. The paradoxical role of emotional intensity in the perception of vocal affect. Sci Rep 11, 9663 (2021). https://doi.org/10.1038/s41598-021-88431-0