Information Content of Prefrontal Cortex Activity Quantifies the Difficulty of Narrated Stories

Keshmiri, Soheil; Sumioka, Hidenobu; Yamazaki, Ryuji; Shiomi, Masahiro; Ishiguro, Hiroshi

doi:10.1038/s41598-019-54280-1


Article
Open access
Published: 29 November 2019

Scientific Reports volume 9, Article number: 17959 (2019)

Abstract

The ability to recognize individuals’ impressions during verbal communication allows social robots to significantly facilitate their social interactions in such areas as child education and elderly care. However, such impressions are highly subjective and internalized and therefore cannot be easily comprehended through behavioural observations. Although brain-machine interface research suggests the utility of brain information in human-robot interaction, previous studies did not consider its potential for estimating internal impressions during verbal communication. In this article, we introduce a novel approach to estimating individuals’ perceived difficulty of stories using the quantified information content of their prefrontal cortex activity. We demonstrate the robustness of our approach by showing its comparable performance in face-to-face, humanoid, speaker, and video-chat settings. Our results contribute to the field of socially assistive robotics by taking a step toward enabling robots to determine their human companions’ perceived difficulty of conversations, thereby enabling these media to sustain their communication with humans by adapting to individuals’ pace and interest in response to conversational nuances and complexity.

Introduction

Gone are the days when robots sat on factory floors to flawlessly perform tasks whose instructions were hardcoded in great detail. As the field of robotics matures, human society witnesses a growing integration of these media in individuals’ daily lives and activities. In fact, today’s robotics is less about assisting humans in performing their physical tasks and more about facilitating their social interaction¹. This observation is evident in the growing adoption of these media in such broad social domains as early child education^2,3 and elderly care^4,5. Pivotal to these applications is the ability of these agents to engage in social interaction⁶. Solutions to such hard problems as learning social norms and dynamics therefore form the foundation for enabling robots to understand the intentions of their human companions, thereby allowing them to achieve a sustainable long-term interaction.

However, acquiring these abilities by observing human behaviour alone is not sufficient, because behavioural cues can often be interpreted in different ways, and even more so during verbal communication. For example, a frowning face during a conversation can be construed as a sign of attentiveness, or it may signal the person’s difficulty in following the conversation’s content. Deciphering such cues becomes even harder once we take into account the ability of individuals to disguise their emotions (e.g., smiling in a stressful situation). In the same vein, the fact that such reactions are highly subjective (i.e., they differ from person to person) makes facial analysis approaches⁷ fall short in decoding the cues associated with verbal communication: shy students who find a lecture difficult can avoid asking clarifying questions by acting as if they were following the lecture.

On the other hand, the human brain as the source of these subjective and internalized states can provide more reliable information about them. The brain activity cannot be easily suppressed or manipulated and therefore the information that is reflected in such activities has the potential to quantify individuals’ feelings. However, despite substantial progress in utilization of the brain information in such applications as brain machine interface (BMI)⁸ and human-robot interaction (HRI)^9,10, there appears to be a paucity of research (to the best of our knowledge) on the use of brain information for estimating individuals’ assessment of the verbal communication (e.g., its difficulty) during HRI. Robots can particularly benefit from such an ability while interacting with overstressed persons and individuals with selective mutism disorders.

In this article, we propose to estimate the individuals’ perceived difficulty of a verbal communication by quantifying the information content of their prefrontal cortex (PFC) activity. We use the term “perceived difficulty” to refer to the cognitive load that a person’s PFC may endure during such tasks as language processing¹¹, social cognition¹², and story comprehension^13,14,15. In this respect, functional imaging techniques in conjunction with such tasks as mental arithmetic (MA)¹⁶ and n-back¹⁷ have enabled researchers to shed light on PFC functioning and change in its activity in response to varying cognitive load^16,17,18. Considering these findings, we expect that these tasks can also be useful in estimating the perceived difficulty of a more cognitively demanding task like verbal communication.

We also introduce a novel information-theoretic approach for quantification of such cognitive loads. The choice of information is motivated by the following three observations. First, information is an unbiased measure of association between interacting processes¹⁹ (e.g., change in the brain activity in response to a task’s varying difficulty) and hence an attractive choice for brain mapping²⁰ as well as the modeling of its inherent complexity^21,22,23,24. Second, information allows for a more robust handling of such confounders as residual brain activity prior to the start of the task period (also known as resting state²⁵). This ability plays a central role in preventing an overestimation of the cognitive load due to prior brain activity that is not induced by the task. Third, brain activation can take place in a shorter time span (i.e., faster) for easier tasks than for more difficult ones²⁶, and therefore its differential activities can simply be averaged out^27,28,29 if the incurred variability is not accounted for^30,31,32. This hinders the ability to differentiate the activation patterns that are induced by tasks with varying levels of difficulty. On the other hand, the information of a continuous random variable is a function of the variance rather than the mean [³³, p. 182] and therefore can preserve the variability of such time series as brain activity³⁴. This, in turn, makes information well-suited for scenarios in which higher variability in individuals’ brain responses is expected (e.g., verbal communication). Therefore, we expect an information-based quantification of individuals’ PFC activation to form a reliable biomarker for measuring the cognitive load that is associated with the difficulty of a verbal communication.

In our approach, we first determine a decision boundary that distinguishes between the incurred cognitive loads by one- and two-back auditory tasks on individuals’ PFC (acquired by near-infrared spectroscopy (NIRS)). In this task, the participants are instructed to respond (e.g., through mouse clicks) to the repeated patterns in numerical sequences (e.g., sequential i.e., n = 1 or one-back and every-other repetition i.e., n = 2 or two-back) that are presented auditorily. We use n-back in our study due to its demonstrated ability in inducing differential cognitive load on PFC³⁵ as well as its utility in quantifying the PFC activity in response to individuals’ change in mood^18,36. Then in a realtime storytelling scenario, we use this boundary to estimate the individuals’ perceived difficulty of narrated stories, thereby interpreting their perceived difficulty of stories based on induced cognitive load by n-back auditory task.

Our contributions are threefold. First, we introduce a novel information-theoretic approach to quantification of the induced cognitive load on PFC. We present the effectiveness of our approach in quantification of the PFC activity during a working memory (WM) task. Our results show a substantial improvement on the previous findings^17,37. Second, we demonstrate the utility of our approach in estimation of the individuals’ perceived difficulty of the verbally communicated content in a humanoid-mediated storytelling scenario. Third, we provide evidence for robustness of our approach through comparative analysis of its performance in face-to-face, humanoid, speaker, and video-chat system media settings.

In our view, the use of brain information can advance the HRI research on modeling of human behaviour by providing invaluable information about the mechanisms that underlie human behavioural responses. For instance, brain activity can be used as neurophysiological feedback about individuals’ mental states⁶ in multimodal modeling of human behaviour³⁸. This, in turn, can open a new avenue for formal analysis of a robotic theory of mind (ToM)³⁹ that (in addition to behavioural observations) builds upon critical implications of humans’ neurological responses during interaction with their synthetic companions.

Methods

Our approach comprises three steps: (A) information-theoretic formulation of the cognitive load (CL), (B) determination of a decision boundary that identifies the CL quantities that are uniquely associated with differing WM task levels, and (C) realtime estimation of the perceived difficulty of the verbally communicated content. In what follows, we explain each step in detail.

Information-theoretic formulation of cognitive load

This step consists of two components: the “quantification of the cognitive load” that estimates the induced cognitive load on the PFC in response to external stimuli per estimation step and the “constrained updating of the induced cognitive load” that dictates an update rule to reduce the effect of the PFC activity’s fluctuations on such a quantification.

Quantification of the cognitive load

Let X_τ represent the time series associated with the task period’s PFC activity at estimation step τ. Let B be the baseline (i.e., resting state) time series that represents the frontal brain activity prior to the start of the task. Furthermore, let H(X_τ) represent the entropy of X_τ (i.e., its average information content). Although H(X_τ) quantifies the PFC’s cognitive load (CL)⁴⁰ at τ, it is an overestimation of CL if the PFC’s residual activity that is carried over from the resting period is not attenuated. It is also crucial to observe that such a residual effect cannot be attenuated by mere subtraction of the expected resting state’s activity (i.e., μ_B) from X_τ, due to the invariance of information under translation [⁴¹, Theorem 8.6.3, p. 253]. Therefore, we quantify the PFC’s cognitive load at estimation step τ, i.e., CL(X_τ), through conditioning of the PFC activity at τ with respect to its activity prior to the start of the task:

$$CL({X}_{\tau })=H({X}_{\tau }|B)=H({X}_{\tau })-MI({X}_{\tau };B)$$

(1)

where MI(X_τ; B) represents the mutual information between PFC activity during the task period at estimation step τ and its activation pattern during the resting period.
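As an illustration of Eq. (1), the following minimal sketch computes CL(X_τ) = H(X_τ) − MI(X_τ; B) from two equal-length time series. The article does not state which entropy estimator was used, so the equal-width histogram discretization, the number of bins, and the function names (`discretize`, `entropy`, `mutual_information`, `cognitive_load`) are all assumptions made here for illustration only.

```python
import math
from collections import Counter


def discretize(series, bins=8):
    """Map a real-valued series onto integer bin labels 0..bins-1.

    Equal-width binning is an assumption of this sketch, not the
    estimator reported in the article.
    """
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), bins - 1) for v in series]


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(x, y):
    """MI(X; Y) = H(X) + H(Y) - H(X, Y) for aligned symbol sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def cognitive_load(x_task, b_rest, bins=8):
    """Eq. (1): CL(X) = H(X | B) = H(X) - MI(X; B)."""
    x, b = discretize(x_task, bins), discretize(b_rest, bins)
    return entropy(x) - mutual_information(x, b)
```

Under this sketch, a task segment that is fully predictable from the baseline yields a CL of zero, whereas a constant (uninformative) baseline leaves CL equal to H(X_τ).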

Constrained updating of the induced cognitive load

Neuroscientific findings imply that brain activity occurs in sparse transients⁴². In other words, observed brain activities are subject to fluctuation. Considering the direct correspondence between information and variation^33,40, such sparsity can directly affect the calculated CL as formulated in Eq. (1), which, in turn, can result in a false belief about the overall task-induced cognitive load on PFC due to the accumulation of such over/underestimations of CL. In other words, fluctuating patterns in PFC activity can result in rapid changes in signal variability whose discrimination from desirable task-induced changes in PFC activation might not be trivial if one relies only on the computed CL. For instance, an increase/decrease in CL at a given time might solely be explained by a short-lived fluctuation and not the effect of the task per se on PFC. In such a scenario, simply following the computed CL can lead to a false conclusion, since a small number of such rapid and short-lived incremental/decremental fluctuations can cancel out and average away the actual effect of the task on PFC activity.

The above observations identify the need for additional measures to validate potential differences between two consecutive CLs. More importantly, these measures must take into account the pattern of PFC activity associated with these consecutive CLs to verify whether their observed differences are in fact due to a substantial variation rather than a mere short-lived fluctuation. In other words, they must allow for constraining the observed differences between two consecutive CLs with the level of change in their respective PFC variability.

Interestingly, these fluctuations can conveniently be accounted for through the MI: a measure of the shared information among interacting processes [⁴¹, p. 19 and p. 251]. Specifically, rewriting Eq. (1) as MI(X_τ; B) = H(X_τ)−CL(X_τ), it becomes apparent that an increase/decrease in the cognitive load must, in principle, be accompanied by a corresponding decrease/increase in mutual information between X_τ and B. In fact, if the interacting processes belonged to a well-defined parametric distribution, it would have sufficed to solely check MI(X_τ−1; B) and MI(X_τ; B) to discriminate between potential fluctuations and the legitimate variations in the task-induced cognitive load. However, the extent of the brain dynamics, which borders on a chaotic system⁴³, in conjunction with the varying complexity of naturalistic tasks (e.g., conversational nuances and change in difficulty of their contents) does not warrant such a simplifying assumption as imposing a known parametric distribution on observed PFC activity during naturalistic scenarios.

Alternatively, we can verify whether above necessary condition in case of MI is also satisfied at the distribution level of these interacting processes, thereby bypassing any unwarranted assumption on their distributions. This can be achieved by utilization of the Kullback-Leibler divergence (D_KL) that reflects the distance between the distribution of interacting processes [⁴¹, p. 19 and p. 251]. The utility of D_KL is realized by observing that any increase/decrease in MI due to a reduced/increased CL in MI(X_τ; B) = H(X_τ)−CL(X_τ) indeed identifies an increase/decrease in resemblance between their distributions and therefore their reduced/increased divergence.

Therefore, we control for potential fluctuating patterns in PFC during the task performance by evaluating the difference between CL(X_τ−1) and CL(X_τ) through quantification of their respective MI and D_KL with respect to B, thereby constraining the updates of computed PFC’s cognitive load. Concretely, we directly use the result from Eq. (1) if the difference between CL(X_τ−1) and CL(X_τ) is warranted by their MI and D_KL with respect to B or, alternatively, we compensate for the potential fluctuation by averaging CL(X_τ−1) and CL(X_τ), weighted by their variation of information (VI)⁴⁴:

$$CL({X}_{\tau })=\begin{cases}H({X}_{\tau }|B) & \text{if } (MI({X}_{\tau };B)\le MI({X}_{\tau -1};B)\ \text{and}\ {D}_{KL}({X}_{\tau };B) > {D}_{KL}({X}_{\tau -1};B))\\ & \text{or } (MI({X}_{\tau };B) > MI({X}_{\tau -1};B)\ \text{and}\ {D}_{KL}({X}_{\tau };B)\le {D}_{KL}({X}_{\tau -1};B))\\ \alpha H({X}_{\tau -1}|B)+\beta H({X}_{\tau }|B) & \text{otherwise}\end{cases}$$

(2)

and:

$$\alpha =\frac{H({X}_{\tau -1}|{X}_{\tau })}{VI}$$

(3)

$$\beta =\frac{H({X}_{\tau }|{X}_{\tau -1})}{VI}$$

(4)

$$VI=H({X}_{\tau -1})+H({X}_{\tau })-2MI({X}_{\tau -1};{X}_{\tau })$$

(5)

$$=H({X}_{\tau -1}|{X}_{\tau })+H({X}_{\tau }|{X}_{\tau -1})$$

(6)
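The update rule in Eqs. (2) through (6) can be sketched as follows. The sketch assumes that X_τ−1, X_τ, and B have already been discretized into aligned sequences of integer symbols in 0..bins−1, and it estimates D_KL from smoothed symbol histograms. The discretization, the smoothing constant `eps`, the fallback when VI is zero, and the function names are assumptions of this illustration rather than details given in the article.

```python
import math
from collections import Counter


def H(*seqs):
    """Joint Shannon entropy (bits) of one or more aligned symbol sequences."""
    joint = list(zip(*seqs))
    n = len(joint)
    return -sum(c / n * math.log2(c / n) for c in Counter(joint).values())


def MI(x, y):
    """Mutual information MI(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return H(x) + H(y) - H(x, y)


def KL(x, y, bins, eps=1e-9):
    """D_KL between the symbol histograms of x and y.

    Additive smoothing (eps) avoids division by zero; it is an
    assumption of this sketch.
    """
    px, py = Counter(x), Counter(y)
    total = 0.0
    for k in range(bins):
        p = (px[k] + eps) / (len(x) + eps * bins)
        q = (py[k] + eps) / (len(y) + eps * bins)
        total += p * math.log2(p / q)
    return total


def constrained_cl(x_prev, x_cur, b, bins):
    """Eq. (2): keep H(X_tau | B) when MI and D_KL move in opposite
    directions, otherwise average the two CLs weighted by Eqs. (3)-(6)."""
    cl_prev = H(x_prev, b) - H(b)  # H(X_{tau-1} | B)
    cl_cur = H(x_cur, b) - H(b)    # H(X_tau | B)
    mi_prev, mi_cur = MI(x_prev, b), MI(x_cur, b)
    kl_prev, kl_cur = KL(x_prev, b, bins), KL(x_cur, b, bins)
    warranted = (mi_cur <= mi_prev and kl_cur > kl_prev) or (
        mi_cur > mi_prev and kl_cur <= kl_prev
    )
    if warranted:
        return cl_cur
    h_prev_given_cur = H(x_prev, x_cur) - H(x_cur)  # numerator of alpha
    h_cur_given_prev = H(x_prev, x_cur) - H(x_prev)  # numerator of beta
    vi = h_prev_given_cur + h_cur_given_prev          # Eq. (6)
    if vi <= 1e-12:  # identical segments: nothing to average (assumption)
        return cl_cur
    return (h_prev_given_cur / vi) * cl_prev + (h_cur_given_prev / vi) * cl_cur
```

Because α + β = 1 by Eq. (6), the averaged value always lies between the two consecutive conditional entropies.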

Decision boundary determination

Let S1 and S2 denote the CLs that correspond to cognitive loads induced by two-level WM tasks. Computing the decision boundary 𝔻 between S1 and S2 is analogous to determining the midpoint between the CL quantities that uniquely fall within the S1 or S2 intervals i.e., the elements that are not members of their overlapping subset. Algorithm 1 outlines this process. It first sorts S1 and S2 in their descending and ascending orders (steps 1 and 2). Next, it finds the smallest CL in S1 that lies within the S2 interval (step 3) and the largest CL in S2 that is within S1 interval (step 4), thereby marking their overlapping partition. Then, it determines the immediate largest CL in S1 and the immediate smallest CL in S2 that are smaller and larger than S1’s and S2’s respective CLs that mark this overlapping partition. Last, it returns the average of these immediate largest and immediate smallest CLs as the decision boundary 𝔻 that separates the CLs associated with the disjoint S1 and S2 sets.

In this article, we utilize n-back auditory task as the WM task. In this case, S1 and S2 correspond to cognitive loads induced by one- and two-back WM tasks, respectively.
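The boundary computation described above can be sketched as follows. Algorithm 1 itself is not reproduced in this text, so this is one reading of the prose description: it assumes that S1 (e.g., one-back) tends to hold the lower CLs and S2 (e.g., two-back) the higher ones, and it raises an error when one set has no CL outside the overlap, a case the description does not cover. The function name `decision_boundary` is hypothetical.

```python
def decision_boundary(s1, s2):
    """Midpoint between the non-overlapping parts of two CL sets.

    Assumes s1 holds the generally lower CLs and s2 the generally
    higher ones (an assumption of this sketch).
    """
    s1 = sorted(s1, reverse=True)  # descending, as in step 1
    s2 = sorted(s2)                # ascending, as in step 2
    overlap_low, overlap_high = s2[0], s1[0]  # [min(S2), max(S1)]
    # CLs of S1 strictly below the overlap, largest first.
    below = [cl for cl in s1 if cl < overlap_low]
    # CLs of S2 strictly above the overlap, smallest first.
    above = [cl for cl in s2 if cl > overlap_high]
    if not below or not above:
        raise ValueError("one set lies entirely inside the other's interval")
    return (below[0] + above[0]) / 2.0


# Illustrative values only (not data from the study).
one_back = [1.0, 1.5, 2.0, 2.4]
two_back = [2.2, 2.3, 2.8, 3.0]
boundary = decision_boundary(one_back, two_back)
```

In this toy example the overlap spans 2.2 to 2.4, so the boundary is the average of 2.0 (the largest one-back CL below the overlap) and 2.8 (the smallest two-back CL above it).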

Online estimation of the perceived difficulty of conversation

We utilize the calculated decision boundary 𝔻 for online estimation of the individuals’ perceived difficulty of the verbally communicated content. At every estimation step, our model calculates the CL of the current PFC activity time series. At the end of the verbal communication, it computes the median of these computed CLs and determines whether it is above or below the computed decision boundary 𝔻. Subsequently, it marks the individual’s perceived difficulty of the verbally communicated content as “difficult/easy” if this median is above/below 𝔻.
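A minimal sketch of this labeling step is given below. The function name `perceived_difficulty` is hypothetical, and treating a median exactly equal to 𝔻 as "easy" is an assumption, since the text only specifies the above and below cases.

```python
from statistics import median


def perceived_difficulty(cls, boundary):
    """Label a finished session from its per-step CLs and the boundary D.

    A median equal to the boundary is labeled "easy" (an assumption).
    """
    return "difficult" if median(cls) > boundary else "easy"


# Illustrative values only (not data from the study).
label = perceived_difficulty([2.6, 2.9, 2.5, 2.2, 2.7], boundary=2.4)
```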

Ethics statement

This study was carried out in accordance with the recommendations of the ethical committee of the Advanced Telecommunications Research Institute International (ATR) with written informed consent from all subjects in accordance with the Declaration of Helsinki. The protocol was approved by the ATR ethical committee (approval code:16-601-1).

Experiments

We conducted two experiments to evaluate the utility of our model. In the first experiment, we verified whether our proposed measure of cognitive load can distinguish the PFC activation in response to low vs. high cognitive loads in one- and two-back WM tasks. This allowed us to evaluate the ability of our approach for differential quantification of the induced PFC activation in response to these tasks. It also allowed us to determine the decision boundary 𝔻 between differential cognitive loads on the PFC activity, which we used in the second experiment.

The second experiment was for verification of the performance of our approach on estimation of the individuals’ perceived difficulty of the verbal communication in a naturalistic setting. For this purpose, we used storytelling as a first step toward decoding of the conversational communication since stories’ scripts can be kept intact and repeated to different individuals without any change in their contents, thereby allowing for the control of such confounders as subtle differences in conveyed information.

The “perceived difficulty” within the context of the first experimental paradigm then refers to the cognitive psychology notion of cognitive load: measurable change in WM capacity in processing information that is associated with controlled tasks that are specifically designed for WM excitation^17,40. In the context of second experiment, on the other hand, it reflects the change in WM capacity at more subjective level (e.g., increase in WM information processing with respect to the change in stories’ difficulty) that is quantified by such fine-grained and well-designed class of WM tasks as n-back.

Experiment 1: Discrimination of differential cognitive load in N-Back WM Task

Purpose

In this experiment, we validated the performance of our approach on quantification of the effect of the WM tasks on PFC activity. Among such tasks, we chose n-back WM task since it forms a better basis for quantification of the verbally communicated contents, considering its effect on PFC³⁵ and its ability in identifying the change in PFC activation in response to individuals’ emotions and change in mood¹⁸.

Participants

Thirty-three younger adults (fourteen males and nineteen females, M = 30.96 years, SD = 10.84) participated in this experiment. Data from one male and one female were not recorded properly and were discarded. All participants were free of neurological and psychiatric disorders and had no history of hearing impairment. All experiments were carried out with written informed consent from all subjects. We used a job-offering site for university students to recruit our participants.

Paradigm

The paradigm included seventy-second audio sequences (in Japanese) of numerical (1 through 9) one- and two-back WM tasks. Each session consisted of a one-back and a two-back WM task. We kept the order as well as the content of these WM tasks intact for all the participants. We used a speaker to play the audio sequences of numerical one- and two-back WM tasks to the participants. Every participant completed these two tasks. The participants responded to sequential (i.e., one-back) and every-other (i.e., two-back) occurrences of these numerical values through mouse clicks. We used PsychoPy for generating these audio one- and two-back WM tasks.

Procedure

Every participant was first seated in an armchair with proper head support in a sound-attenuated testing chamber and gave written informed consent in the experimental room. Then, a male experimenter explained the experiment’s full procedure to the participants. This included the total number of tasks in a session (i.e., a one-back followed by a two-back WM task), the duration of each task (i.e., seventy seconds per WM task), instructions on the WM task procedure (i.e., periodic sequential (one-back) or every-other (two-back) occurrences of some of the numerical values), instructions on how to respond if the participants detected such reoccurrences (i.e., through a mouse click at every detection), instructions about the one-minute rest period prior to the actual session (i.e., sitting still with eyes closed), and the content of the audio sequences (i.e., numerical values 1 through 9). The experimenter also asked the participants to stay focused on listening to the one- and two-back sequences, which were played back through a computer speaker, and then began the experimental session. Every one- and two-back WM task started with the recording of one minute of rest data, which was followed by its seventy-second WM task period. We recorded the participants’ frontal brain activity time series throughout these tasks’ periods (including their respective one-minute resting periods).

Once the participants were ready, the experimenter asked them to follow the instructions on the computer screen in front of them (Fig. 1(A)). The instructions on the display informed the participants that they would participate in a one-back and a two-back WM task, that each WM task was seventy seconds long, that the task period was preceded by a one-minute resting period during which they needed to close their eyes and relax as much as possible, and that once this resting period was over a voice (the recorded voice of the experimenter, played through the computer speaker) would announce the start of the WM task, after which the task would immediately begin. These instructions also provided the participants with an audio-visual example of the task that was about to begin. For instance, in the case of one-back, the display showed a short sequence of numbers that were highlighted one at a time by a square around the digit being read out, along with an explanation of how the value that was just read out was related to the value one-back before (Fig. 1(B)). The participants then started with the resting period of the one-back WM task and immediately engaged in this task once the end of the resting period and the start of the WM task was announced by the voice. For the one-back task, there were ten numerical values that were repeated in one-back fashion (Fig. 1(B)). Once one-back was over, the voice announced the end of the task and instructed the participants to follow the on-screen information for the two-back task (similar to the one-back, but this time the instruction was about the two-back rather than the one-back WM task). The participants then started the two-back WM task, which began with its one-minute resting period, at the end of which the voice asked the participants to open their eyes and announced that the task was about to start. The two-back WM task also (similar to one-back) included the repetition of ten numerical values in a two-back fashion (Fig. 1(C)). Once the two-back WM task was over, the experimenter removed the NIRS device from the forehead of the participants and guided them outside the experimental room.

Data acquisition

We used functional near-infrared spectroscopy (fNIRS) to collect the frontal brain activity of the participants and acquired their NIRS time series data using a wearable optical topography system called “HOT-1000,” developed by Hitachi High-Technologies Corp. (Fig. 2). Participants wore this device on their forehead to record their frontal brain activity through detection of the total blood flow by emitting near-infrared laser light (810 nm wavelength), sampled at 10.0 Hz. Data acquisition was carried out through four channels (L1, L3, R1, and R3, Fig. 2). The numerical suffixes assigned to these channels specify their respective source-detector distances. In other words, L1 and R1 have a 1.0 cm source-detector distance and L3 and R3 have a 3.0 cm source-detector distance. Note that whereas a short source-detector distance of 1.0 cm is inadequate for the data acquisition of cortical brain activity (e.g., 0.5 cm⁴⁵, 1.0 cm⁴⁶, 1.5 cm⁴⁷, and 2.0 cm⁴⁸), 3.0 cm is suitable^45,47. Therefore, we mainly report the result with the data from L3 in the present study, and the result with R3 is shown in Supplementary Materials (SM).

Data processing

We first attenuated the effect of systemic physiological artefacts⁴⁹ (e.g., cardiac pulsations, respiration, etc.) using a one-degree (i.e., first-order) Butterworth band-pass filter with 0.01 Hz and 0.6 Hz as its low and high cutoff frequencies, which was then followed by linear detrending of the data. We then attenuated the effect of the skin blood flow (SBF) using an eigen decomposition technique⁵⁰. This approach considers the first three principal components of all NIRS recorded channels of the participants’ frontal brain activity during the rest period to represent the SBF. Subsequently, it eliminates the SBF effect by removing these three components from participants’ NIRS time series in the task period. Although Sato et al.⁴⁷ suggested that the use of the first principal component rather than the first three components appeared to be sufficient for SBF attenuation, Keshmiri et al.⁵¹ demonstrated that the use of the first two principal components resulted in both significantly higher SBF attenuation and better preservation of the cortical activity’s information. Therefore, we followed Keshmiri et al.⁵¹ and removed the first two principal components of the respective resting period of the participants from the NIRS time series of their frontal brain activity that was recorded during the task period. It is worth emphasizing that we used the same measurement settings (i.e., same equipment, number of measurement channels, and their positions) as in Keshmiri et al.⁵¹ Similar to our NIRS recording, Zhang et al.⁵⁰ also used 3.0 cm source-detector distance channels. Cooper et al.⁵² showed that this filter also attenuates motion artefacts (e.g., head motion).
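The component-removal step can be illustrated with the following sketch, which estimates the principal directions across channels from the rest period and projects them out of the task-period data. It is one plausible reading of the eigen decomposition technique and is not taken from the cited implementations: the array layout (samples × channels), the use of a singular value decomposition on mean-centred rest data, and the function name `remove_rest_components` are assumptions. The band-pass filtering and detrending steps are not shown.

```python
import numpy as np


def remove_rest_components(rest, task, n_components=2):
    """Project the leading rest-period principal directions out of task data.

    rest, task: arrays of shape (samples, channels). The layout and the
    SVD-based estimate are assumptions of this sketch.
    """
    rest = np.asarray(rest, dtype=float)
    task = np.asarray(task, dtype=float)
    # Rows of vt are the principal directions in channel space.
    _, _, vt = np.linalg.svd(rest - rest.mean(axis=0), full_matrices=False)
    basis = vt[:n_components]  # shape (n_components, channels)
    # Subtract the part of the task data lying in the span of the basis.
    return task - task @ basis.T @ basis
```

With four recorded channels and two removed components, the cleaned task data spans at most two dimensions in channel space.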

While quantifying the PFC activation, we used twenty-second NIRS time series segments of participants’ PFC activation, with ten seconds of overlap between every two consecutive segments, to calculate the CLs at every ten-second estimation step. We used our mathematical model in the section “Information-theoretic formulation of cognitive load” for CL computation. For the first segment in the task period, we considered its overlap with the last ten seconds of the rest period. We used the last twenty seconds of the rest period’s NIRS time series for each participant as B in Eq. (1). The ten-second estimation step resulted in seven CLs for each of the one- and two-back WM tasks.
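The segmentation arithmetic can be checked with a short sketch. At the 10.0 Hz sampling rate, a twenty-second window holds 200 samples and a ten-second step holds 100 samples; prepending the last ten seconds of rest to the seventy-second task gives 800 samples and therefore seven windows. The function name `segment_task` and its parameter names are hypothetical.

```python
def segment_task(rest, task, fs=10.0, win_s=20, step_s=10):
    """Return the baseline B and the overlapping task segments.

    rest, task: sequences of samples recorded at fs Hz. The first
    segment overlaps the last step_s seconds of the rest period.
    """
    win, step = int(win_s * fs), int(step_s * fs)
    signal = list(rest[-step:]) + list(task)
    segments = [
        signal[start:start + win]
        for start in range(0, len(signal) - win + 1, step)
    ]
    baseline = list(rest[-win:])  # last twenty seconds of rest
    return baseline, segments


# One minute of rest and seventy seconds of task at 10 Hz (dummy samples).
rest = list(range(600))
task = list(range(600, 1300))
baseline, segments = segment_task(rest, task)
```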

Analysis

First, we computed the medians (per participant per task) of the CLs for the one- and the two-back WM tasks. Then, we applied the Wilcoxon rank sum test on these medians to determine the differential significance between these WM tasks’ CLs. Next, we used these medians to determine the discrimination accuracy of our model’s CLs in differentiating between one- and two-back WM tasks. We computed the accuracy of our model using the Mdn_two−back > Mdn_one−back (Mdn stands for median) criterion per participant. We also computed the Spearman correlation between these medians and the percentage of correct clicks by the participants. We scaled the participants’ number of clicks to the [0, 1] interval. Last, we computed the Spearman correlation between the one- and two-back WM tasks’ medians. In order to determine the utility of our model, we also applied these analyses on participants’ average PFC activation (SM Section 1 for left-hemispheric and SM 2.1.2 for right-hemispheric PFC).
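The per-participant accuracy criterion can be sketched as follows; the function name `discrimination_accuracy` and the toy values are illustrative assumptions, and the statistical tests are not shown.

```python
from statistics import median


def discrimination_accuracy(cl_one_back, cl_two_back):
    """Fraction of participants with Mdn_two-back > Mdn_one-back.

    cl_one_back, cl_two_back: one list of CLs per participant, in the
    same participant order.
    """
    hits = sum(
        median(two) > median(one)
        for one, two in zip(cl_one_back, cl_two_back)
    )
    return hits / len(cl_one_back)


# Two hypothetical participants: the criterion holds for the first only.
accuracy = discrimination_accuracy(
    [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]],
    [[2.0, 3.0, 4.0], [1.0, 1.0, 1.0]],
)
```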

To further examine whether the changes in participants’ CLs during the two-back WM task were significantly associated with the cognitive load induced by this WM task period rather than being the residual effect from their one-back WM task period, we performed a one-sample bootstrap test of significance (10,000 simulation runs) at 99.0% confidence interval (CI) on the difference between participants’ CLs during two-back and one-back WM tasks (i.e., CL_B2 − CL_B1). We then considered the null hypothesis H0: induced change in CLs by two-back WM task after the deduction of one-back’s CLs was non-significant and tested it against the alternative hypothesis H1: two-back WM task’s induced change in CLs after the deduction of one-back’s CLs was significant. Since we considered CL_B2 − CL_B1, H0 and H1 then represented the situations in which CL_B2 − CL_B1 ≈ 0 (i.e., zero fell within the 99.0% confidence interval) and CL_B2 − CL_B1 > 0 (i.e., their 99.0% confidence interval was significantly above zero), respectively. It is worth noting that H1: CL_B2 − CL_B1 > 0 is equivalent to H1': CL_B1 − CL_B2 < 0. We reported the mean, standard deviation, and 99.0% confidence interval for left PFC in the main manuscript (for results associated with right PFC, see SM, Section 2 and Fig. 3).
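A one-sample bootstrap of this kind can be sketched as below. The article does not specify the resampling scheme, so the percentile method on the resampled mean, the fixed seed, and the function name `bootstrap_mean_ci` are assumptions of this illustration.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_runs=10_000, level=0.99, seed=0):
    """Percentile bootstrap of the mean (an assumed scheme).

    Returns the mean and standard deviation of the bootstrap
    distribution and its (lower, upper) confidence bounds.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_runs)
    )
    lo_idx = round((1 - level) / 2 * n_runs)
    hi_idx = round((1 + level) / 2 * n_runs) - 1
    return (
        statistics.fmean(means),
        statistics.stdev(means),
        (means[lo_idx], means[hi_idx]),
    )


# Hypothetical CL_B2 - CL_B1 differences, for illustration only.
diffs = [0.5, 0.7, 0.6, 0.9, 0.4, 0.8, 0.3, 0.6, 0.7, 0.5]
mean, sd, (ci_low, ci_high) = bootstrap_mean_ci(diffs)
reject_h0 = ci_low > 0  # zero lies outside the interval
```

Under the reading used here, H0 is retained when zero falls inside the interval and rejected in favour of H1 when the whole interval lies above zero.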

Next, we computed the Spearman correlations between these CL_B2 − CL_B1 values and the participants’ correct clicks during the two-back WM task. We chose the participants’ correct clicks during the two-back rather than the one-back WM task since the CL_B2 − CL_B1 values reflected the quantitative changes in participants’ CLs associated with the two-back WM task after the deduction of participants’ CLs during the one-back WM task period. We followed this by computing their 95.0% bootstrap (10,000 simulation runs) confidence intervals. For the bootstrap test, we considered the null hypothesis H0: there was no correlation between CL_B2 − CL_B1 and participants’ correct clicks during two-back WM task and tested it against the alternative hypothesis H1: CL_B2 − CL_B1 significantly correlated with participants’ correct clicks during two-back WM task. We reported the mean, standard deviation, and 95.0% confidence interval for this test. We also computed the p-value of this test as the fraction of the distribution that was more extreme than the actually observed correlation values. For this purpose, we performed a two-tailed test in which we used the absolute values so that both the positive and the negative correlations were accounted for.

Last, to ensure that the observed changes in the participants’ CLs during the two-back WM task were due to the PFC activity during this WM task rather than to artefacts (e.g., noise or an affine transformation of the one-back CLs as a result of the underlying linear property of the hemodynamic responses^53,54), we applied a one-sample bootstrap test of significance (10,000 simulation runs) at the 99.0% confidence level on the Kullback-Leibler divergence D_KL of participants’ CLs during the two- and one-back WM tasks (i.e., D_KL(CL_B2, CL_B1)). We tested the null hypothesis H0: the difference between the distributions of CLs in the two- and one-back WM tasks was non-significant (hence the one-back CLs can be used to explain the observed CLs during the two-back WM task), against the alternative hypothesis H1: the distribution of CLs during the two-back WM task was significantly different from that of the one-back CLs. We reported the mean, standard deviation, and 99.0% confidence interval for this test. It is worth noting that whereas H0 in this test was satisfied if zero fell within the computed 99.0% confidence interval of D_KL, H1 was satisfied when the 99.0% confidence interval was significantly above zero (or equivalently significantly below zero in the case of D_KL(CL_B1, CL_B2)).
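The following sketch shows one way such a divergence test could be set up. The text does not say how the CL distributions were estimated, so the shared-bin histogram, the small smoothing constant `eps`, the bin count, and the percentile interval are all assumptions, and the names and data are hypothetical.

```python
import math
import random

def histogram(values, edges, eps=1e-6):
    """Smoothed relative frequencies of `values` over the bins in `edges`.

    Assumption: `eps` is added to every bin so that no bin has zero mass.
    """
    n_bins = len(edges) - 1
    counts = [eps] * n_bins
    for v in values:
        for b in range(n_bins):
            is_last = b == n_bins - 1
            if edges[b] <= v < edges[b + 1] or (is_last and v == edges[-1]):
                counts[b] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """D_KL(p || q) in nats for discrete distributions on the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bootstrap_kl_ci(a, b, n_bins=5, n_runs=10_000, confidence=0.99, seed=0):
    """Percentile CI of D_KL(a, b) from resampling participants."""
    rng = random.Random(seed)
    low, high = min(a + b), max(a + b)
    width = (high - low) / n_bins
    edges = [low + i * width for i in range(n_bins)] + [high]
    n = len(a)
    stats = []
    for _ in range(n_runs):
        idx = [rng.randrange(n) for _ in range(n)]
        p = histogram([a[i] for i in idx], edges)
        q = histogram([b[i] for i in idx], edges)
        stats.append(kl_divergence(p, q))
    stats.sort()
    alpha = 1.0 - confidence
    return stats[int(alpha / 2 * n_runs)], stats[min(n_runs - 1, int((1 - alpha / 2) * n_runs))]

# Hypothetical per-participant CLs (illustrative values, not the study's data).
cl_b1 = [1.2, 1.9, 2.1, 1.5, 1.8, 2.4, 1.1, 1.7, 2.0, 1.6]
cl_b2 = [2.3, 2.6, 2.9, 2.2, 2.8, 3.1, 1.9, 2.5, 2.7, 2.4]

kl_low, kl_high = bootstrap_kl_ci(cl_b2, cl_b1)
print(f"99% CI of D_KL(B2, B1) = [{kl_low:.2f} {kl_high:.2f}]")
```

Two properties of the measure are worth keeping in mind when reading the interval: the KL divergence between two probability distributions is non-negative, and it is asymmetric, so D_KL(B2, B1) and D_KL(B1, B2) generally differ.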

Earlier studies on n-back³⁵ WM tasks and language processing¹¹ reported higher activation in the left than in the right-hemispheric PFC. On the other hand, recent findings on the role of the PFC in n-back¹⁷ and story comprehension^13,14,15 indicate that such a distinction is not necessarily warranted. Therefore, we considered both the left and the right PFC in our study. However, we focused on the left PFC in the main manuscript, since activity in the left PFC formed the common theme among these previous findings, and provided the results pertinent to the right PFC in the SM.

Results

The Wilcoxon rank sum test (Fig. 3(A)) identified a significant difference between the participants’ CLs in the one- and two-back WM tasks (p < 0.001, W(60) = 3.59, r = 0.46, M_one−back = 1.83, SD_one−back = 0.59, M_two−back = 2.48, SD_two−back = 0.64). Our model achieved an 87.10% prediction accuracy for the classification of these tasks. Table 1 summarizes these results.
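For readers who want to see how a rank sum statistic and its effect size relate, here is a small sketch. It uses the normal approximation without tie correction and the common convention r = |z| / sqrt(N); the text does not state which formula was used. The sample values are hypothetical, and the pooled size of 62 observations is an assumption inferred from the reported W(60), not a figure stated in this section.

```python
import math

def pooled_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def rank_sum_z(x, y):
    """z statistic of the rank sum of `x` (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    r = pooled_ranks(list(x) + list(y))
    w = sum(r[:n1])                       # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    var_w = n1 * n2 * (n1 + n2 + 1) / 12
    return (w - mean_w) / math.sqrt(var_w)

def effect_size_r(z, n_total):
    """Effect size r = |z| / sqrt(N), with N the pooled number of observations."""
    return abs(z) / math.sqrt(n_total)

# Hypothetical CLs (illustrative values, not the study's data).
one_back = [1.2, 1.9, 2.1, 1.5, 1.8, 2.4, 1.1, 1.7]
two_back = [2.3, 2.6, 2.9, 2.2, 2.8, 3.1, 1.9, 2.5]
z = rank_sum_z(two_back, one_back)
print(f"z = {z:.2f}, r = {effect_size_r(z, len(one_back) + len(two_back)):.2f}")

# Consistency check on the reported values, assuming N = 62 pooled observations.
print(round(effect_size_r(3.59, 62), 2))  # 0.46
```

Under that assumed N, the reported statistic of 3.59 reproduces the reported effect size of 0.46, which suggests the effect size follows the |z| / sqrt(N) convention.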

Table 1 Wilcoxon rank sum along with the mean and standard deviation of the one- (M₁ and SD₁) and two-back (M₂ and SD₂) CLs.


We found a significant correlation between participants’ CLs and their number of correct clicks in the one-back WM task (Fig. 3(B), r = 0.52, p < 0.01, M_Clicks = 0.88, SD_Clicks = 0.05). Similarly, this correlation was significant in the two-back WM task (Fig. 3(C), r = 0.46, p < 0.01, M_Clicks = 0.69, SD_Clicks = 0.15). Last, we observed a significant correlation between participants’ CLs in the one- and two-back WM tasks (Fig. 3(D), r = 0.41, p < 0.03).

The one-sample bootstrap test of significance (10,000 simulation runs) at the 99.0% confidence level on the difference between participants’ CLs during the two-back and one-back WM tasks (i.e., CL_B2 − CL_B1) verified (Fig. 4(A)) that the changes in participants’ CLs during the two-back WM task reflected the cognitive load induced by this WM task period (i.e., CL_B2 − CL_B1 > 0.0) rather than a residual effect of their one-back WM task period (M_CLB2−CLB1 = 0.96, SD_CLB2−CLB1 = 0.71, CI_CLB2−CLB1 = [0.74 1.18], where M and SD refer to the mean and the standard deviation of the difference between the two compared states and CI is the 99.0% confidence interval of this difference).

We also observed a significant correlation between participants’ CL_B2 − CL_B1 values and their correct clicks during the two-back WM task period (Fig. 4(B), r = 0.39, p = 0.03), which was further supported by the corresponding bootstrap test (10,000 simulation runs) at the 95.0% confidence level (Fig. 4(C), CI_95.0% = [0.05 0.67]).

Finally, the bootstrap test of significance (10,000 simulation runs) at the 99.0% confidence level on the Kullback-Leibler divergence (Fig. 4(D)) between the distributions of participants’ CLs in the two- versus one-back WM tasks (i.e., D_KL(CL_B2, CL_B1)) identified a significant difference between the distribution of the participants’ CLs in the two-back (i.e., B2 in this subplot) and their corresponding CLs during the one-back (i.e., B1 in this subplot) WM tasks (M_{DKL(B2, B1)} = 3.44, SD_{DKL(B2, B1)} = 0.87, CI_{DKL(B2, B1)} = [1.48 5.93]). This test ruled out the possibility that the observed changes in the participants’ CLs during the two-back task were primarily due to the preceding one-back task (e.g., an effect of noise, linear scaling, or affine transformation).

Experiment 2: Estimation of the perceived difficulty during naturalistic storytelling

The first experiment showed that our proposed quantification of the cognitive load can significantly discriminate the differential load of the WM tasks on PFC activity. In the second experiment, we verified the ability of our approach to estimate the individuals’ perceived difficulty of verbally communicated content in naturalistic storytelling. We used our CL formulation and the decision boundary computed with the data from the first experiment. We also investigated the performance of our approach in four media settings: face-to-face, humanoid, speaker, and video-chat. These media settings allowed us to validate the robustness of our approach. Specifically, the face-to-face setting provided a reliable basis for verifying our model’s performance: throughout history, stories have been made and narrated by people for people. On the other hand, the speaker and the video-chat settings verified the utility of our approach in capturing the PFC activation in response to the content of the story rather than to such potential factors as embodiment and the novelty effect (in the case of the humanoid). Taken together, the comparable performance of our model across these media settings, in conjunction with its accuracy in the case of the humanoid, demonstrated its generalizability and therefore the effectiveness of the brain-based quantification of the perceived difficulty of the communicated content using the PFC pattern of activation.

The contents of Sections 3.2.2 through 3.2.4 also appeared in Keshmiri et al.⁵⁵. For the sake of clarity, we provide their outline in this article as well.