Using deep neural networks to disentangle visual and semantic information in human perception and memory

Abstract

Mental representations of familiar categories are composed of visual and semantic information. Disentangling the contributions of visual and semantic information in humans is challenging because they are intermixed in mental representations. Deep neural networks trained on images, on text or on paired images and text now enable us to disentangle human mental representations into their visual, visual–semantic and semantic components. Here we used these deep neural networks to uncover the content of human mental representations of familiar faces and objects when they are viewed or recalled from memory. The results show a larger visual than semantic contribution when images are viewed and a reversed pattern when they are recalled. We further reveal a previously unknown unique contribution of an integrated visual–semantic representation in both perception and memory. We propose a new framework in which visual and semantic information contribute independently and interactively to mental representations in perception and memory.
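As the figure and extended data captions below indicate, the comparison rests on representational similarity analysis: a representational dissimilarity matrix (RDM) is computed from each network's embeddings and correlated with human similarity judgements. The following Python sketch illustrates that logic; the arrays `visual_emb`, `semantic_emb` and `human_rdm` are hypothetical stand-ins rather than the study's actual embeddings or ratings.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(embeddings):
    """Condensed representational dissimilarity matrix (cosine distance)."""
    return pdist(embeddings, metric="cosine")

def model_human_correlation(embeddings, human_rdm):
    """Spearman correlation between a DNN-derived RDM and human dissimilarities."""
    rho, _ = spearmanr(rdm(embeddings), human_rdm)
    return rho

# Random stand-ins for 20 identities; replace with real DNN embeddings and
# averaged human dissimilarity ratings.
rng = np.random.default_rng(0)
visual_emb = rng.normal(size=(20, 512))      # e.g. face-trained VGG embeddings
semantic_emb = rng.normal(size=(20, 768))    # e.g. SGPT embeddings of Wikipedia text
human_rdm = pdist(rng.normal(size=(20, 5)))  # stand-in for human ratings (190 pairs)

print(model_human_correlation(visual_emb, human_rdm))
print(model_human_correlation(semantic_emb, human_rdm))
```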

Fig. 1: The representational geometry of familiar faces based on visual, visual–semantic and semantic DNNs.
Fig. 2: The contribution of visual, visual–semantic and semantic DNNs to human representations of familiar faces in perception and memory.
Fig. 3: Human similarity ratings of AI-generated faces.
Fig. 4: The contribution of visual and visual–semantic information to the perceptual representation of unfamiliar faces.
Fig. 5: The contribution of visual, visual–semantic and semantic DNNs to human semantic representations.
Fig. 6: The contribution of visual, visual–semantic and semantic DNNs to human representations of objects in perception and memory.

Data availability

Data were analysed using R (ref. 91). The datasets are available from the OSF website at https://osf.io/3hwmy/?view_only=584ab8985520411183321008a2fb1a60. The following datasets were used for DNN training: ImageNet (https://www.image-net.org/download.php); VGGFace2 (there is currently no official download link; please contact the dataset's original publishers); and CelebA-HQ (https://mmlab.ie.cuhk.edu.hk/projects/CelebA/CelebAMask_HQ.html). Source data are provided with this paper.

Code availability

The code for data analysis is available from the OSF website at https://osf.io/3hwmy/?view_only=584ab8985520411183321008a2fb1a60. The deep-learning algorithms are open source and can be obtained from the references cited in the text.

References

  1. Gibson, J. J. The Ecological Approach to Visual Perception: Classic Edition (Psychology Press, 2014).

  2. Sperry, R. W. Neurology and the mind–body problem. Am. Sci. 40, 291–312 (1952).

  3. Miller, G. A. The cognitive revolution: a historical perspective. Trends Cogn. Sci. 7, 141–144 (2003).

  4. Firestone, C. & Scholl, B. J. Cognition does not affect perception: evaluating the evidence for “top-down” effects. Behav. Brain Sci. 39, e229 (2016).

  5. Barsalou, L. W. Perceptual symbol systems. Behav. Brain Sci. 22, 577–609 (1999).

  6. Kosslyn, S. M. Image and Brain: The Resolution of the Imagery Debate (MIT Press, 2014).

  7. Tversky, A. Features of similarity. Psychol. Rev. 84, 327–352 (1977).

  8. Leshinskaya, A. & Caramazza, A. For a cognitive neuroscience of concepts: moving beyond the grounding issue. Psychon. Bull. Rev. 23, 991–1001 (2016).

  9. Pylyshyn, Z. W. Mental imagery: in search of a theory. Behav. Brain Sci. 25, 157–182 (2002).

  10. Clark, J. M. & Paivio, A. in Imagery and Related Mnemonic Processes (eds McDaniel, M. A. & Pressley, M.) 5–33 (Springer, 1987).

  11. Bankson, B. B., Hebart, M. N., Groen, I. I. A. & Baker, C. I. The temporal evolution of conceptual object representations revealed through models of behavior, semantics and deep neural networks. NeuroImage 178, 172–182 (2018).

  12. Bar, M. Visual objects in context. Nat. Rev. Neurosci. 5, 617–629 (2004).

  13. Barense, M. D., Henson, R. N. A. & Graham, K. S. Perception and conception: temporal lobe activity during complex discriminations of familiar and novel faces and objects. J. Cogn. Neurosci. 23, 3052–3067 (2011).

  14. Bonnen, T., Yamins, D. L. K. & Wagner, A. D. When the ventral visual stream is not enough: a deep learning account of medial temporal lobe involvement in perception. Neuron 109, 2755–2766.e6 (2021).

  15. Bracci, S. & Op de Beeck, H. Dissociations and associations between shape and category representations in the two visual pathways. J. Neurosci. 36, 432–444 (2016).

  16. Capitani, E., Caramazza, A. & Borgo, F. What are the facts of semantic category-specific deficits? Cogn. Neuropsychol. 20, 213–261 (2003).

  17. Clarke, A. & Tyler, L. K. Understanding what we see: how we derive meaning from vision. Trends Cogn. Sci. 19, 677–687 (2015).

  18. Visconti di Oleggio Castello, M., Haxby, J. V. & Gobbini, M. I. Shared neural codes for visual and semantic information about familiar faces in a common representational space. Proc. Natl Acad. Sci. USA 118, e2110474118 (2021).

  19. Hasantash, M. & Afraz, A. Richer color vocabulary is associated with better color memory but not color perception. Proc. Natl Acad. Sci. USA 117, 31046–31052 (2020).

  20. Inhoff, M. C. et al. Understanding perirhinal contributions to perception and memory: evidence through the lens of selective perirhinal damage. Neuropsychologia 124, 9–18 (2019).

  21. Linde-Domingo, J., Treder, M. S., Kerrén, C. & Wimber, M. Evidence that neural information flow is reversed between object perception and object reconstruction from memory. Nat. Commun. 10, 179 (2019).

  22. Martin, C. B., Douglas, D., Newsome, R. N., Man, L. L. Y. & Barense, M. D. Integrative and distinctive coding of visual and conceptual object features in the ventral visual stream. eLife 7, e31873 (2018).

  23. Kell, A. J. E., Yamins, D. L. K., Shook, E. N., Norman-Haignere, S. V. & McDermott, J. H. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron 98, 630–644.e16 (2018).

  24. Geirhos, R. et al. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In Proc. 7th International Conference on Learning Representations 1–22 (ICLR, 2019).

  25. Kriegeskorte, N. Deep neural networks: a new framework for modelling biological vision and brain information processing. Annu. Rev. Vis. Sci. 1, 417–446 (2015).

  26. Marcus, G. Deep learning: a critical appraisal. Preprint at http://export.arxiv.org/abs/1801.00631v1 (2018).

  27. Dobs, K., Martinez, J., Kell, A. J. E. & Kanwisher, N. Brain-like functional specialization emerges spontaneously in deep neural networks. Sci. Adv. 8, eabl8913 (2022).

  28. Grand, G., Blank, I. A., Pereira, F. & Fedorenko, E. Semantic projection recovers rich human knowledge of multiple object features from word embeddings. Nat. Hum. Behav. 6, 975–987 (2022).

  29. Groen, I. A. A. et al. Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior. eLife 7, e32962 (2018).

  30. Hasson, U., Nastase, S. A. & Goldstein, A. Direct fit to nature: an evolutionary perspective on biological and artificial neural networks. Neuron 105, 416–434 (2020).

  31. Abudarham, N., Grosbard, I. & Yovel, G. Face recognition depends on specialized mechanisms tuned to view-invariant facial features: insights from deep neural networks optimized for face or object recognition. Cogn. Sci. 45, e13031 (2021).

  32. Jacobs, R. A. & Bates, C. J. Comparing the visual representations and performance of humans and deep neural networks. Curr. Dir. Psychol. Sci. 28, 34–39 (2019).

  33. Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 8748–8763 (PMLR, 2021).

  34. Bruce, V. & Young, A. Understanding face recognition. Br. J. Psychol. 77, 305–327 (1986).

  35. Clarke, A., Taylor, K. I., Devereux, B., Randall, B. & Tyler, L. K. From perception to conception: how meaningful objects are processed over time. Cereb. Cortex 23, 187–197 (2013).

  36. Clarke, A. & Tyler, L. K. Object-specific semantic coding in human perirhinal cortex. J. Neurosci. 34, 4766–4775 (2014).

  37. Devereux, B. J., Clarke, A. & Tyler, L. K. Integrated deep visual and semantic attractor neural networks predict fMRI pattern-information along the ventral object processing pathway. Sci. Rep. 8, 10636 (2018).

  38. Gobbini, M. I. & Haxby, J. V. Neural systems for recognition of familiar faces. Neuropsychologia 45, 32–41 (2007).

  39. Clerkin, E. M., Hart, E., Rehg, J. M., Yu, C. & Smith, L. B. Real-world visual statistics and infants’ first-learned object names. Philos. Trans. R. Soc. B Biol. Sci. 372, 20160055 (2017).

  40. Hall, D. G., Corrigall, K., Rhemtulla, M., Donegan, E. & Xu, F. Infants’ use of lexical-category-to-meaning links in object individuation. Child Dev. 79, 1432–1443 (2008).

  41. Yee, M., Jones, S. S. & Smith, L. B. Changes in visual object recognition precede the shape bias in early noun learning. Front. Psychol. 3, 533 (2012).

  42. Carlin, J. D. & Kriegeskorte, N. Adjudicating between face-coding models with individual-face fMRI responses. PLoS Comput. Biol. 13, 1–28 (2017).

  43. Kubilius, J., Bracci, S. & Op de Beeck, H. P. Deep neural networks as a computational model for human shape sensitivity. PLoS Comput. Biol. 12, e1004896 (2016).

  44. O’Toole, A. J. & Castillo, C. D. Face recognition by humans and machines: three fundamental advances from deep learning. Annu. Rev. Vis. Sci. 7, 543–570 (2021).

  45. O’Toole, A. J., Castillo, C. D., Parde, C. J., Hill, M. Q. & Chellappa, R. Face space representations in deep convolutional neural networks. Trends Cogn. Sci. 22, 794–809 (2018).

  46. Schyns, P. G., Snoek, L. & Daube, C. Degrees of algorithmic equivalence between the brain and its DNN models. Trends Cogn. Sci. 26, 1090–1102 (2022).

  47. Tsantani, M., Kriegeskorte, N., McGettigan, C. & Garrido, L. Faces and voices in the brain: a modality-general person-identity representation in superior temporal sulcus. NeuroImage 201, 116004 (2019).

  48. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. Preprint at arXiv https://doi.org/10.48550/arXiv.1409.1556 (2014).

  49. Muennighoff, N. SGPT: GPT sentence embeddings for semantic search. Preprint at arXiv https://doi.org/10.48550/arXiv.2202.08904 (2022).

  50. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  51. Abudarham, N., Bate, S., Duchaine, B. & Yovel, G. Developmental prosopagnosics and super recognizers rely on the same facial features used by individuals with normal face recognition abilities for face identification. Neuropsychologia 160, 107963 (2021).

  52. Dobs, K., Kell, A. J., Martinez, J., Cohen, M. & Kanwisher, N. Using task-optimized neural networks to understand why brains have specialized processing for faces. J. Vis. 20, 660 (2020).

  53. Cavazos, J. G., Jeckeln, G., Hu, Y. & O’Toole, A. in Deep Learning-Based Face Analytics (eds Ratha, N. K. et al.) 361–379 (Springer, 2021).

  54. Jacob, G., Pramod, R. T., Katti, H. & Arun, S. P. Qualitative similarities and differences in visual object representations between brains and deep networks. Nat. Commun. 12, 1872 (2021).

  55. Jozwik, K. M. et al. Face dissimilarity judgments are predicted by representational distance in morphable and image-computable models. Proc. Natl Acad. Sci. USA 119, e2115047119 (2022).

  56. Song, Y., Qu, Y., Xu, S. & Liu, J. Implementation-independent representation for deep convolutional neural networks and humans in processing faces. Front. Comput. Neurosci. 14, 601314 (2021).

  57. Tian, F., Xie, H., Song, Y., Hu, S. & Liu, J. The face inversion effect in deep convolutional neural networks. Front. Comput. Neurosci. 16, 854218 (2022).

  58. Yildirim, I., Belledonne, M., Freiwald, W. & Tenenbaum, J. Efficient inverse graphics in biological face processing. Sci. Adv. 6, eaax5979 (2020).

  59. Kriegeskorte, N., Mur, M. & Bandettini, P. Representational similarity analysis—connecting the branches of systems neuroscience. Front. Syst. Neurosci. 2, 4 (2008).

  60. Karras, T., Laine, S. & Aila, T. A style-based generator architecture for generative adversarial networks. IEEE Trans. Pattern Anal. Mach. Intell. 43, 4217–4228 (2018).

  61. Karras, T. et al. Analyzing and improving the image quality of StyleGAN. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 8107–8116 (IEEE, 2019).

  62. Slone, L. K., Smith, L. B. & Yu, C. Self-generated variability in object images predicts vocabulary growth. Dev. Sci. 22, e12816 (2019).

  63. Young, A. W. & Bruce, V. Understanding person perception. Br. J. Psychol. 102, 959–974 (2011).

  64. Burton, A. M., Jenkins, R. & Schweinberger, S. R. Mental representations of familiar faces. Br. J. Psychol. 102, 943–958 (2011).

  65. Jenkins, R., White, D., Montfort, X. & Burton, A. M. Variability in photos of the same face. Cognition 121, 313–323 (2011).

  66. Kramer, R. S. S., Young, A. W. & Burton, A. M. Understanding face familiarity. Cognition 172, 46–58 (2018).

  67. Young, A. W. & Burton, A. M. Are we face experts? Trends Cogn. Sci. 22, 100–110 (2018).

  68. Burton, M. A. Why has research in face recognition progressed so slowly? The importance of variability. Q. J. Exp. Psychol. 66, 1467–1485 (2013).

  69. Ritchie, K. L. & Burton, A. M. Learning faces from variability. Q. J. Exp. Psychol. 70, 897–905 (2017).

  70. Golan, T., Raju, P. C. & Kriegeskorte, N. Controversial stimuli: pitting neural networks against each other as models of human cognition. Proc. Natl Acad. Sci. USA 117, 29330–29337 (2020).

  71. Yamins, D. L. K. & DiCarlo, J. J. Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. https://doi.org/10.1038/nn.4244 (2016).

  72. Kaniuth, P. & Hebart, M. N. Feature-reweighted representational similarity analysis: a method for improving the fit between computational models, brains, and behavior. NeuroImage 257, 119294 (2022).

  73. Khaligh-Razavi, S.-M. & Kriegeskorte, N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput. Biol. 6, e1003915 (2014).

  74. Schacter, D. L., Norman, K. A. & Koutstaal, W. The cognitive neuroscience of constructive memory. Annu. Rev. Psychol. 49, 289–318 (1998).

  75. Schacter, D. L. The seven sins of memory. Insights from psychology and cognitive neuroscience. Am. Psychol. 54, 182–203 (1999).

  76. Schacter, D. L., Guerin, S. A. & St. Jacques, P. L. Memory distortion: an adaptive perspective. Trends Cogn. Sci. 15, 467–474 (2011).

  77. Bower, G. H. & Karlin, M. B. Depth of processing pictures of faces and recognition memory. J. Exp. Psychol. 103, 751–757 (1974).

  78. Craik, F. I. M. & Lockhart, R. S. Levels of processing: a framework for memory research. J. Verbal Learn. Verbal Behav. 11, 671–684 (1972).

  79. Schwartz, L. & Yovel, G. Social judgements improve face recognition more than perceptual judgements. J. Vis. 17, 1001 (2017).

  80. Ganis, G., Thompson, W. L. & Kosslyn, S. M. Brain areas underlying visual mental imagery and visual perception: an fMRI study. Cogn. Brain Res. 20, 226–241 (2004).

  81. Gelbard-Sagiv, H., Mukamel, R., Harel, M., Malach, R. & Fried, I. Internally generated reactivation of single neurons in human hippocampus during free recall. Science 322, 96–101 (2008).

  82. O’Craven, K. M. & Kanwisher, N. G. Mental imagery of faces and places activates corresponding stimulus-specific brain regions. J. Cogn. Neurosci. 12, 1013–1023 (2000).

  83. Cao, Q., Shen, L., Xie, W., Parkhi, O. M. & Zisserman, A. VGGFace2: a dataset for recognising faces across pose and age. In Proc. 13th IEEE International Conference on Automatic Face and Gesture Recognition 67–74 (FG, 2018).

  84. Zhang, K., Zhang, Z., Li, Z. & Qiao, Y. Joint face detection and alignment using multi-task cascaded convolutional networks. IEEE Signal Process Lett. 23, 1499–1503 (2016).

  85. Parkhi, O. M., Vedaldi, A. & Zisserman, A. Deep face recognition. In Proc. British Machine Vision Conference (BMVC, 2015).

  86. Huang, G. B., Ramesh, M., Berg, T. & Learned-Miller, E. Labeled faces in the wild: a database for studying face recognition in unconstrained environments. In Workshop on faces in ‘Real-Life’ Images: detection, alignment, and recognition (2008).

  87. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. Preprint at https://doi.org/10.48550/arXiv.1412.6980 (2014).

  88. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019).

  89. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

  90. Ma, N., Baetens, K., Vandekerckhove, M., Van der Cruyssen, L. & Van Overwalle, F. Dissociation of a trait and a valence representation in the mPFC. Soc. Cogn. Affect. Neurosci. 9, 1506–1514 (2013).

  91. The R Development Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2019).

Acknowledgements

We would like to thank R. Malach, M. Gilead and I. Blank for discussions of this manuscript. We thank G. Levy and J. Hajajra for their help with stimuli collection and initial analysis of experiment 4. This work was funded by an Israeli Science Foundation grant (ISF 917/2021) to G.Y. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

A.S. and G.Y. conceptualized and designed the experiment. A.S. and I.G. wrote the code and performed the analysis of the deep-learning algorithms. A.S. collected data and performed analyses of human behavioural data. I.G., O.P. and D.C.-O. designed the experiment and wrote the code for experiment 1b. A.S. and G.Y. wrote the manuscript.

Corresponding authors

Correspondence to Adva Shoham or Galit Yovel.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Tomoyasu Horikawa and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 CLIP distance matrix of face images and their names.

A distance matrix of CLIP embeddings of face images and their names. Only identities that were familiar to CLIP were selected for the study. An identity is considered familiar if the distance between the embedding of its name (rows) and the embedding of its corresponding face image (columns) is smaller than the distances to the face images of all other identities. This is indicated by the dark diagonal of the matrix.
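A minimal sketch of this familiarity criterion, using the openly available CLIP package (https://github.com/openai/CLIP); the names and image paths below are hypothetical placeholders, not the identities used in the study.

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

names = ["Identity A", "Identity B"]                 # placeholder celebrity names
image_paths = ["identity_a.jpg", "identity_b.jpg"]   # placeholder face images

with torch.no_grad():
    text_emb = model.encode_text(clip.tokenize(names).to(device))
    img_emb = torch.stack([
        model.encode_image(preprocess(Image.open(p)).unsqueeze(0).to(device)).squeeze(0)
        for p in image_paths
    ])

# Cosine distance between every name (rows) and every face image (columns).
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
dist = 1 - text_emb @ img_emb.T

# An identity counts as familiar to CLIP when its own image is the nearest
# image to its name, i.e. the diagonal entry is the row minimum.
familiar = dist.argmin(dim=1) == torch.arange(len(names), device=device)
print(familiar)
```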

Source data

Extended Data Fig. 2 The contribution of gender, occupation and age to DNN representations of familiar faces.

The correlations of the RDMs for gender, occupation and age with the RDMs of the embeddings of the famous identities' faces by the visual (VGGft20) and visual-semantic (CLIP) DNNs, and of their Wikipedia descriptions by the semantic (SGPT) DNN (see Supplementary Table 1).
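For illustration, attribute RDMs of this kind can be built from simple pairwise distances. The sketch below uses hypothetical attribute values, a binary mismatch distance for the categorical attributes and an absolute difference for age; this coding is an assumption and may differ from the authors' exact procedure.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Placeholder attribute values for four identities.
gender = np.array([0, 1, 0, 1])            # categorical codes
occupation = np.array([2, 2, 0, 1])        # categorical codes
age = np.array([35.0, 52.0, 41.0, 67.0])   # years

# Binary mismatch distance for categorical attributes: 0 = same, 1 = different.
gender_rdm = pdist(gender[:, None], metric="hamming")
occupation_rdm = pdist(occupation[:, None], metric="hamming")
# Absolute age difference for the continuous attribute.
age_rdm = pdist(age[:, None], metric="cityblock")

print(gender_rdm, occupation_rdm, age_rdm, sep="\n")
```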

Source data

Extended Data Fig. 3 The correlations of human representations in perception and memory across all DNNs’ layers.

The mean correlations between RDMs of faces based on the embeddings of each layer of VGG-ft20 (left) and each layer of CLIP (right) with human visual similarity ratings in perception (left; N = 20) and memory (right; N = 19). Error bars indicate the standard error of the mean. Each dot indicates a participant.
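Layer-wise correlations of this kind can be computed by extracting embeddings from each layer and correlating each layer's RDM with the human ratings. A minimal sketch using PyTorch forward hooks follows; the off-the-shelf torchvision VGG16 (hooked at its pooling layers only, to keep the example light), the random images and the random human RDM are stand-ins, not the fine-tuned VGG-ft20 or the study's data.

```python
import torch
import torchvision.models as models
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

model = models.vgg16(weights=None).eval()   # stand-in network, untrained weights
activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        # Flatten each image's feature map into a single embedding vector.
        activations[name] = output.flatten(start_dim=1).detach()
    return hook

# Hook only the pooling layers to keep memory use modest in this sketch.
for i, layer in enumerate(model.features):
    if isinstance(layer, torch.nn.MaxPool2d):
        layer.register_forward_hook(save_activation(f"features.{i}"))

images = torch.randn(20, 3, 224, 224)          # placeholder batch of 20 faces
human_rdm = torch.rand(20 * 19 // 2).numpy()   # placeholder human dissimilarities

with torch.no_grad():
    model(images)

for name, act in activations.items():
    layer_rdm = pdist(act.numpy(), metric="cosine")
    rho, _ = spearmanr(layer_rdm, human_rdm)
    print(f"{name}: rho = {rho:.3f}")
```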

Source data

Extended Data Fig. 4 Visual and visual-semantic DNN representations account for human representations beyond gender, occupation and age.

A. The mean partial correlations of the RDMs of visual (VGG-ft20), visual-semantic (CLIP) and semantic (SGPT) DNNs with RDMs of human perception (N = 20) and memory (N = 19) of the same identities, when B. gender, C. gender and occupation and D. gender, occupation and age are held out. Error bars indicate the standard error of the mean. See statistical analysis in Extended Data Table 1.
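One way to compute such partial correlations is to residualize both the DNN RDM and the human RDM against the covariate RDMs before correlating them. The sketch below illustrates that approach with hypothetical condensed RDMs; it is a simplified stand-in, not necessarily the authors' exact procedure.

```python
import numpy as np
from scipy.stats import spearmanr

def residualize(y, covariates):
    """Residuals of y after linear regression on the covariate RDMs."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def partial_rsa(dnn_rdm, human_rdm, covariate_rdms):
    """Correlation between two RDMs after removing covariate RDMs from both."""
    cov = np.column_stack(covariate_rdms)
    rho, _ = spearmanr(residualize(dnn_rdm, cov), residualize(human_rdm, cov))
    return rho

# Placeholder condensed RDMs for 20 identities (190 pairs each).
rng = np.random.default_rng(1)
n_pairs = 20 * 19 // 2
dnn_rdm, human_rdm = rng.random(n_pairs), rng.random(n_pairs)
gender_rdm, occupation_rdm, age_rdm = (rng.random(n_pairs) for _ in range(3))

print(partial_rsa(dnn_rdm, human_rdm, [gender_rdm]))
print(partial_rsa(dnn_rdm, human_rdm, [gender_rdm, occupation_rdm]))
print(partial_rsa(dnn_rdm, human_rdm, [gender_rdm, occupation_rdm, age_rdm]))
```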

Extended Data Fig. 5 Human representations of AI-generated faces.

A. RDMs of human similarity ratings of VGG-generated (left) and CLIP-generated (right) images of the celebrity faces. B. A t-SNE visualization of human RDMs of VGG-generated (left) and CLIP-generated (right) faces. C. High correlations between human similarity ratings of VGG- and CLIP-generated faces and RDMs of VGG and CLIP embeddings of the original faces. D. The mean correlations between the RDMs of human similarity ratings of VGG-generated (N = 20) and CLIP-generated (N = 20) faces and gender, occupation and age. Error bars represent the standard error of mean correlations. VGG: VGG embeddings of original faces; VGG-gen: human similarity ratings of the VGG-generated faces. CLIP: CLIP embeddings of original faces; CLIP-gen: human similarity ratings of the CLIP-generated faces. The AI-generated faces cannot be copyrighted and are not shown in the figure. The images can be obtained by contacting the authors.

Source data

Extended Data Fig. 6 Images of the objects used in the study.

The 20 objects that were selected for Experiment 4. For display purposes, all images were replaced by licensed images with similar appearance from Freepik (https://www.freepick.com).

Extended Data Fig. 7 The representations of objects by visual, visual–semantic and semantic DNNs.

The RDMs, the correlations between them and a t-SNE visualization for objects, based on VGG (trained on ImageNet) and CLIP embeddings of the images and SGPT embeddings of their dictionary definitions (see Supplementary Table 3).
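A minimal sketch of the t-SNE visualization step (cf. ref. 50), using scikit-learn and a hypothetical embedding matrix in place of the actual VGG, CLIP or SGPT embeddings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(2)
embeddings = rng.normal(size=(20, 512))        # placeholder: one row per object
labels = [f"object {i}" for i in range(20)]    # placeholder object names

# Project the high-dimensional embeddings to 2D for visualization.
coords = TSNE(n_components=2, perplexity=5, init="pca",
              random_state=0).fit_transform(embeddings)

plt.figure(figsize=(5, 5))
plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), label in zip(coords, labels):
    plt.annotate(label, (x, y), fontsize=7)
plt.title("t-SNE of DNN embeddings (placeholder data)")
plt.tight_layout()
plt.show()
```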

Source data

Extended Data Table 1 Statistical analysis of results reported in Extended Data Fig. 4

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2, Supplementary Tables 1–3, supplementary methods and supplementary results.

Reporting Summary

Peer Review File

Source data

Source Data Fig. 1

Similarity score (cosine distance) for each pair of familiar faces based on image embeddings of VGGft and CLIP and text embeddings of SGPT.

Source Data Fig. 2

Similarity scores for each pair of familiar faces based on the DNNs (Source Data Fig. 1) and human visual similarity rating in perception and memory.

Source Data Fig. 3

Human visual similarity ratings of AI-generated faces, and DNN and human similarity scores of the original images of the same faces.

Source Data Fig. 4

Similarity scores for each pair of familiar and unfamiliar faces based on VGG, CLIP and human similarity rating.

Source Data Fig. 5

Similarity scores for each pair of familiar faces based on the DNNs and human semantic similarity rating for images and names.

Source Data Fig. 6

Similarity scores for each pair of objects based on the DNN image embeddings of VGGft and CLIP and text embeddings of SGPT and human visual similarity rating in perception and memory.

Source Data Extended Data Fig. 1

CLIP image–name cosine distance.

Source Data Extended Data Fig. 2

The distance scores between all pairs of faces based on age, sex, occupation, VGGft, CLIP and SGPT.

Source Data Extended Data Fig. 3

Distance scores between all pairs of faces in each layer of VGGft and CLIP and human visual similarity ratings in perception and memory.

Source Data Extended Data Fig. 4 and Table 1

The distance scores between all pairs of faces based on age, sex, occupation, VGGft, CLIP and SGPT and human visual similarity ratings in perception and memory.

Source Data Extended Data Fig. 5

The distance scores between all pairs of faces based on age, sex, occupation and human similarity ratings of VGG-generated and CLIP-generated faces.

Source Data Extended Data Fig. 7

Similarity scores for each pair of objects based on the DNNs and human semantic similarity rating for images and names.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shoham, A., Grosbard, I.D., Patashnik, O. et al. Using deep neural networks to disentangle visual and semantic information in human perception and memory. Nat Hum Behav 8, 702–717 (2024). https://doi.org/10.1038/s41562-024-01816-9
