Using deep neural networks to disentangle visual and semantic information in human perception and memory

Abstract

Mental representations of familiar categories are composed of visual and semantic information. Disentangling the contributions of visual and semantic information in humans is challenging because they are intermixed in mental representations. Deep neural networks trained on images, on text or on paired images and text now enable us to disentangle human mental representations into their visual, visual–semantic and semantic components. Here we used these deep neural networks to uncover the content of human mental representations of familiar faces and objects when they are viewed or recalled from memory. The results show a larger visual than semantic contribution when images are viewed and a reversed pattern when they are recalled. We further reveal a previously unknown unique contribution of an integrated visual–semantic representation in both perception and memory. We propose a new framework in which visual and semantic information contribute independently and interactively to mental representations in perception and memory.
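As the figure and extended data captions below indicate, the comparison rests on representational similarity analysis: a representational dissimilarity matrix (RDM) is computed from each network's embeddings and correlated with human similarity judgements. The following Python sketch illustrates that logic; the arrays `visual_emb`, `semantic_emb` and `human_rdm` are hypothetical stand-ins rather than the study's actual embeddings or ratings.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(embeddings):
    """Condensed representational dissimilarity matrix (cosine distance)."""
    return pdist(embeddings, metric="cosine")

def model_human_correlation(embeddings, human_rdm):
    """Spearman correlation between a DNN-derived RDM and human dissimilarities."""
    rho, _ = spearmanr(rdm(embeddings), human_rdm)
    return rho

# Random stand-ins for 20 identities; replace with real DNN embeddings and
# averaged human dissimilarity ratings.
rng = np.random.default_rng(0)
visual_emb = rng.normal(size=(20, 512))      # e.g. face-trained VGG embeddings
semantic_emb = rng.normal(size=(20, 768))    # e.g. SGPT embeddings of Wikipedia text
human_rdm = pdist(rng.normal(size=(20, 5)))  # stand-in for human ratings (190 pairs)

print(model_human_correlation(visual_emb, human_rdm))
print(model_human_correlation(semantic_emb, human_rdm))
```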

Fig. 1: The representational geometry of familiar faces based on visual, visual–semantic and semantic DNNs.
Fig. 2: The contribution of visual, visual–semantic and semantic DNNs to human representations of familiar faces in perception and memory.
Fig. 3: Human similarity ratings of AI-generated faces.
Fig. 4: The contribution of visual and visual–semantic information to the perceptual representation of unfamiliar faces.
Fig. 5: The contribution of visual, visual–semantic and semantic DNNs to human semantic representations.
Fig. 6: The contribution of visual, visual–semantic and semantic DNNs to human representations of objects in perception and memory.

Data availability

Data were analysed using R (ref. 91). The datasets are available from the OSF website at https://osf.io/3hwmy/?view_only=584ab8985520411183321008a2fb1a60. The following datasets were used for DNN training: ImageNet (https://www.image-net.org/download.php); VGGFace2 (there is currently no official download link; please contact the dataset's original publishers); and CelebA-HQ (https://mmlab.ie.cuhk.edu.hk/projects/CelebA/CelebAMask_HQ.html). Source data are provided with this paper.

Code availability

The code for data analysis is available from the OSF website at https://osf.io/3hwmy/?view_only=584ab8985520411183321008a2fb1a60. The deep-learning algorithms are open source and can be obtained from the references cited in the text.

References

  1. Gibson, J. J. The Ecological Approach to Visual Perception: Classic Edition (Psychology Press, 2014).

  2. Sperry, R. W. Neurology and the mind–body problem. Am. Sci. 40, 291–312 (1952).

  3. Miller, G. A. The cognitive revolution: a historical perspective. Trends Cogn. Sci. 7, 141–144 (2003).

  4. Firestone, C. & Scholl, B. J. Cognition does not affect perception: evaluating the evidence for “top-down” effects. Behav. Brain Sci. 39, e229 (2016).

  5. Barsalou, L. W. Perceptual symbol systems. Behav. Brain Sci. 22, 577–609 (1999).

  6. Kosslyn, S. M. Image and Brain: The Resolution of the Imagery Debate (MIT Press, 2014).

  7. Tversky, A. Features of similarity. Psychol. Rev. 84, 327–352 (1977).

  8. Leshinskaya, A. & Caramazza, A. For a cognitive neuroscience of concepts: moving beyond the grounding issue. Psychon. Bull. Rev. 23, 991–1001 (2016).

  9. Pylyshyn, Z. W. Mental imagery: in search of a theory. Behav. Brain Sci. 25, 157–182 (2002).

  10. Clark, J. M. & Paivio, A. in Imagery and Related Mnemonic Processes (eds McDaniel, M. A. & Pressley, M.) 5–33 (Springer, 1987).

  11. Bankson, B. B., Hebart, M. N., Groen, I. I. A. & Baker, C. I. The temporal evolution of conceptual object representations revealed through models of behavior, semantics and deep neural networks. NeuroImage 178, 172–182 (2018).

  12. Bar, M. Visual objects in context. Nat. Rev. Neurosci. 5, 617–629 (2004).

  13. Barense, M. D., Henson, R. N. A. & Graham, K. S. Perception and conception: temporal lobe activity during complex discriminations of familiar and novel faces and objects. J. Cogn. Neurosci. 23, 3052–3067 (2011).

  14. Bonnen, T., Yamins, D. L. K. & Wagner, A. D. When the ventral visual stream is not enough: a deep learning account of medial temporal lobe involvement in perception. Neuron 109, 2755–2766.e6 (2021).

  15. Bracci, S. & Op de Beeck, H. Dissociations and associations between shape and category representations in the two visual pathways. J. Neurosci. 36, 432–444 (2016).

  16. Capitani, E., Caramazza, A. & Borgo, F. What are the facts of semantic category-specific deficits? Cogn. Neuropsychol. 20, 213–261 (2003).

  17. Clarke, A. & Tyler, L. K. Understanding what we see: how we derive meaning from vision. Trends Cogn. Sci. 19, 677–687 (2015).

  18. Visconti di Oleggio Castello, M., Haxby, J. V. & Gobbini, M. I. Shared neural codes for visual and semantic information about familiar faces in a common representational space. Proc. Natl Acad. Sci. USA 118, e2110474118 (2021).

  19. Hasantash, M. & Afraz, A. Richer color vocabulary is associated with better color memory but not color perception. Proc. Natl Acad. Sci. USA 117, 31046–31052 (2020).

  20. Inhoff, M. C. et al. Understanding perirhinal contributions to perception and memory: evidence through the lens of selective perirhinal damage. Neuropsychologia 124, 9–18 (2019).

  21. Linde-Domingo, J., Treder, M. S., Kerrén, C. & Wimber, M. Evidence that neural information flow is reversed between object perception and object reconstruction from memory. Nat. Commun. 10, 179 (2019).

  22. Martin, C. B., Douglas, D., Newsome, R. N., Man, L. L. Y. & Barense, M. D. Integrative and distinctive coding of visual and conceptual object features in the ventral visual stream. eLife 7, e31873 (2018).

  23. Kell, A. J. E., Yamins, D. L. K., Shook, E. N., Norman-Haignere, S. V. & McDermott, J. H. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron 98, 630–644.e16 (2018).

  24. Geirhos, R. et al. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In Proc. 7th International Conference on Learning Representations 1–22 (ICLR, 2019).

  25. Kriegeskorte, N. Deep neural networks: a new framework for modelling biological vision and brain information processing. Annu. Rev. Vis. Sci. 1, 417–446 (2015).

  26. Marcus, G. Deep learning: a critical appraisal. Preprint at http://export.arxiv.org/abs/1801.00631v1 (2018).

  27. Dobs, K., Martinez, J., Kell, A. J. E. & Kanwisher, N. Brain-like functional specialization emerges spontaneously in deep neural networks. Sci. Adv. 8, eabl8913 (2022).

  28. Grand, G., Blank, I. A., Pereira, F. & Fedorenko, E. Semantic projection recovers rich human knowledge of multiple object features from word embeddings. Nat. Hum. Behav. 6, 975–987 (2022).

  29. Groen, I. A. A. et al. Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior. eLife 7, e32962 (2018).

  30. Hasson, U., Nastase, S. A. & Goldstein, A. Direct fit to nature: an evolutionary perspective on biological and artificial neural networks. Neuron 105, 416–434 (2020).

  31. Abudarham, N., Grosbard, I. & Yovel, G. Face recognition depends on specialized mechanisms tuned to view-invariant facial features: insights from deep neural networks optimized for face or object recognition. Cogn. Sci. 45, e13031 (2021).

  32. Jacobs, R. A. & Bates, C. J. Comparing the visual representations and performance of humans and deep neural networks. Curr. Dir. Psychol. Sci. 28, 34–39 (2019).

  33. Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 8748–8763 (PMLR, 2021).

  34. Bruce, V. & Young, A. Understanding face recognition. Br. J. Psychol. 77, 305–327 (1986).

  35. Clarke, A., Taylor, K. I., Devereux, B., Randall, B. & Tyler, L. K. From perception to conception: how meaningful objects are processed over time. Cereb. Cortex 23, 187–197 (2013).

  36. Clarke, A. & Tyler, L. K. Object-specific semantic coding in human perirhinal cortex. J. Neurosci. 34, 4766–4775 (2014).

  37. Devereux, B. J., Clarke, A. & Tyler, L. K. Integrated deep visual and semantic attractor neural networks predict fMRI pattern-information along the ventral object processing pathway. Sci. Rep. 8, 10636 (2018).

  38. Gobbini, M. I. & Haxby, J. V. Neural systems for recognition of familiar faces. Neuropsychologia 45, 32–41 (2007).

  39. Clerkin, E. M., Hart, E., Rehg, J. M., Yu, C. & Smith, L. B. Real-world visual statistics and infants’ first-learned object names. Philos. Trans. R. Soc. B Biol. Sci. 372, 20160055 (2017).

  40. Hall, D. G., Corrigall, K., Rhemtulla, M., Donegan, E. & Xu, F. Infants’ use of lexical-category-to-meaning links in object individuation. Child Dev. 79, 1432–1443 (2008).

  41. Yee, M., Jones, S. S. & Smith, L. B. Changes in visual object recognition precede the shape bias in early noun learning. Front. Psychol. 3, 533 (2012).

  42. Carlin, J. D. & Kriegeskorte, N. Adjudicating between face-coding models with individual-face fMRI responses. PLoS Comput. Biol. 13, 1–28 (2017).

  43. Kubilius, J., Bracci, S. & Op de Beeck, H. P. Deep neural networks as a computational model for human shape sensitivity. PLoS Comput. Biol. 12, e1004896 (2016).

  44. O’Toole, A. J. & Castillo, C. D. Face recognition by humans and machines: three fundamental advances from deep learning. Annu. Rev. Vis. Sci. 7, 543–570 (2021).

  45. O’Toole, A. J., Castillo, C. D., Parde, C. J., Hill, M. Q. & Chellappa, R. Face space representations in deep convolutional neural networks. Trends Cogn. Sci. 22, 794–809 (2018).

  46. Schyns, P. G., Snoek, L. & Daube, C. Degrees of algorithmic equivalence between the brain and its DNN models. Trends Cogn. Sci. 26, 1090–1102 (2022).

  47. Tsantani, M., Kriegeskorte, N., McGettigan, C. & Garrido, L. Faces and voices in the brain: a modality-general person-identity representation in superior temporal sulcus. NeuroImage 201, 116004 (2019).

  48. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. Preprint at arXiv https://doi.org/10.48550/arXiv.1409.1556 (2014).

  49. Muennighoff, N. SGPT: GPT sentence embeddings for semantic search. Preprint at arXiv https://doi.org/10.48550/arXiv.2202.08904 (2022).

  50. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  51. Abudarham, N., Bate, S., Duchaine, B. & Yovel, G. Developmental prosopagnosics and super recognizers rely on the same facial features used by individuals with normal face recognition abilities for face identification. Neuropsychologia 160, 107963 (2021).

  52. Dobs, K., Kell, A. J., Martinez, J., Cohen, M. & Kanwisher, N. Using task-optimized neural networks to understand why brains have specialized processing for faces. J. Vis. 20, 660 (2020).

  53. Cavazos, J. G., Jeckeln, G., Hu, Y. & O’Toole, A. in Deep Learning-Based Face Analytics (eds Ratha, N. K. et al.) 361–379 (Springer, 2021).

  54. Jacob, G., Pramod, R. T., Katti, H. & Arun, S. P. Qualitative similarities and differences in visual object representations between brains and deep networks. Nat. Commun. 12, 1872 (2021).

  55. Jozwik, K. M. et al. Face dissimilarity judgments are predicted by representational distance in morphable and image-computable models. Proc. Natl Acad. Sci. USA 119, e2115047119 (2022).

  56. Song, Y., Qu, Y., Xu, S. & Liu, J. Implementation-independent representation for deep convolutional neural networks and humans in processing faces. Front. Comput. Neurosci. 14, 601314 (2021).

  57. Tian, F., Xie, H., Song, Y., Hu, S. & Liu, J. The face inversion effect in deep convolutional neural networks. Front. Comput. Neurosci. 16, 854218 (2022).

  58. Yildirim, I., Belledonne, M., Freiwald, W. & Tenenbaum, J. Efficient inverse graphics in biological face processing. Sci. Adv. 6, eaax5979 (2020).

  59. Kriegeskorte, N., Mur, M. & Bandettini, P. Representational similarity analysis—connecting the branches of systems neuroscience. Front. Syst. Neurosci. 2, 4 (2008).

  60. Karras, T., Laine, S. & Aila, T. A style-based generator architecture for generative adversarial networks. IEEE Trans. Pattern Anal. Mach. Intell. 43, 4217–4228 (2018).

  61. Karras, T. et al. Analyzing and improving the image quality of StyleGAN. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 8107–8116 (IEEE, 2019).

  62. Slone, L. K., Smith, L. B. & Yu, C. Self-generated variability in object images predicts vocabulary growth. Dev. Sci. 22, e12816 (2019).

  63. Young, A. W. & Bruce, V. Understanding person perception. Br. J. Psychol. 102, 959–974 (2011).

  64. Burton, A. M., Jenkins, R. & Schweinberger, S. R. Mental representations of familiar faces. Br. J. Psychol. 102, 943–958 (2011).

  65. Jenkins, R., White, D., Montfort, X. & Burton, A. M. Variability in photos of the same face. Cognition 121, 313–323 (2011).

  66. Kramer, R. S. S., Young, A. W. & Burton, A. M. Understanding face familiarity. Cognition 172, 46–58 (2018).

  67. Young, A. W. & Burton, A. M. Are we face experts? Trends Cogn. Sci. 22, 100–110 (2018).

  68. Burton, M. A. Why has research in face recognition progressed so slowly? The importance of variability. Q. J. Exp. Psychol. 66, 1467–1485 (2013).

  69. Ritchie, K. L. & Burton, A. M. Learning faces from variability. Q. J. Exp. Psychol. 70, 897–905 (2017).

  70. Golan, T., Raju, P. C. & Kriegeskorte, N. Controversial stimuli: pitting neural networks against each other as models of human cognition. Proc. Natl Acad. Sci. USA 117, 29330–29337 (2020).

  71. Yamins, D. L. K. & DiCarlo, J. J. Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. https://doi.org/10.1038/nn.4244 (2016).

  72. Kaniuth, P. & Hebart, M. N. Feature-reweighted representational similarity analysis: a method for improving the fit between computational models, brains, and behavior. NeuroImage 257, 119294 (2022).

  73. Khaligh-Razavi, S.-M. & Kriegeskorte, N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput. Biol. 6, e1003915 (2014).

  74. Schacter, D. L., Norman, K. A. & Koutstaal, W. The cognitive neuroscience of constructive memory. Annu. Rev. Psychol. 49, 289–318 (1998).

  75. Schacter, D. L. The seven sins of memory. Insights from psychology and cognitive neuroscience. Am. Psychol. 54, 182–203 (1999).

  76. Schacter, D. L., Guerin, S. A. & St. Jacques, P. L. Memory distortion: an adaptive perspective. Trends Cogn. Sci. 15, 467–474 (2011).

  77. Bower, G. H. & Karlin, M. B. Depth of processing pictures of faces and recognition memory. J. Exp. Psychol. 103, 751–757 (1974).

  78. Craik, F. I. M. & Lockhart, R. S. Levels of processing: a framework for memory research. J. Verbal Learn. Verbal Behav. 11, 671–684 (1972).

  79. Schwartz, L. & Yovel, G. Social judgements improve face recognition more than perceptual judgements. J. Vis. 17, 1001 (2017).

  80. Ganis, G., Thompson, W. L. & Kosslyn, S. M. Brain areas underlying visual mental imagery and visual perception: an fMRI study. Cogn. Brain Res. 20, 226–241 (2004).

  81. Gelbard-Sagiv, H., Mukamel, R., Harel, M., Malach, R. & Fried, I. Internally generated reactivation of single neurons in human hippocampus during free recall. Science 322, 96–101 (2008).

  82. O’Craven, K. M. & Kanwisher, N. G. Mental imagery of faces and places activates corresponding stimulus-specific brain regions. J. Cogn. Neurosci. 12, 1013–1023 (2000).

  83. Cao, Q., Shen, L., Xie, W., Parkhi, O. M. & Zisserman, A. VGGFace2: a dataset for recognising faces across pose and age. In Proc. 13th IEEE International Conference on Automatic Face and Gesture Recognition 67–74 (FG, 2018).

  84. Zhang, K., Zhang, Z., Li, Z. & Qiao, Y. Joint face detection and alignment using multi-task cascaded convolutional networks. IEEE Signal Process Lett. 23, 1499–1503 (2016).

  85. Parkhi, O. M., Vedaldi, A. & Zisserman, A. Deep face recognition. In Proc. British Machine Vision Conference (BMVC, 2015).

  86. Huang, G. B., Ramesh, M., Berg, T. & Learned-Miller, E. Labeled faces in the wild: a database for studying face recognition in unconstrained environments. In Workshop on faces in ‘Real-Life’ Images: detection, alignment, and recognition (2008).

  87. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. Preprint at https://doi.org/10.48550/arXiv.1412.6980 (2014).

  88. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019).

  89. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

  90. Ma, N., Baetens, K., Vandekerckhove, M., Van der Cruyssen, L. & Van Overwalle, F. Dissociation of a trait and a valence representation in the mPFC. Soc. Cogn. Affect. Neurosci. 9, 1506–1514 (2013).

  91. The R Development Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2019).

Acknowledgements

We would like to thank R. Malach, M. Gilead and I. Blank for discussions of this manuscript. We thank G. Levy and J. Hajajra for their help with stimuli collection and initial analysis of experiment 4. This work was funded by an Israeli Science Foundation grant (ISF 917/2021) to G.Y. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

A.S. and G.Y. conceptualized and designed the experiment. A.S. and I.G. wrote the code and performed the analysis of the deep-learning algorithms. A.S. collected data and performed analyses of human behavioural data. I.G., O.P. and D.C.-O. designed the experiment and wrote the code for experiment 1b. A.S. and G.Y. wrote the manuscript.

Corresponding authors

Correspondence to Adva Shoham or Galit Yovel.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Tomoyasu Horikawa and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 CLIP distance matrix of face images and their names.

A distance matrix of CLIP embeddings of face images and their names. Only identities that were familiar to CLIP were selected for the study. An identity is considered familiar if the distance between the embedding of its name (rows) and the embedding of its corresponding face image (columns) is smaller than the distances to the face images of all other identities. This is indicated by the dark diagonal of the matrix.
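A minimal sketch of this familiarity criterion, using the openly available CLIP package (https://github.com/openai/CLIP); the names and image paths below are hypothetical placeholders, not the identities used in the study.

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

names = ["Identity A", "Identity B"]                 # placeholder celebrity names
image_paths = ["identity_a.jpg", "identity_b.jpg"]   # placeholder face images

with torch.no_grad():
    text_emb = model.encode_text(clip.tokenize(names).to(device))
    img_emb = torch.stack([
        model.encode_image(preprocess(Image.open(p)).unsqueeze(0).to(device)).squeeze(0)
        for p in image_paths
    ])

# Cosine distance between every name (rows) and every face image (columns).
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
dist = 1 - text_emb @ img_emb.T

# An identity counts as familiar to CLIP when its own image is the nearest
# image to its name, i.e. the diagonal entry is the row minimum.
familiar = dist.argmin(dim=1) == torch.arange(len(names), device=device)
print(familiar)
```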

Source data

Extended Data Fig. 2 The contribution of gender, occupation and age to DNN representations of familiar faces.

The correlations of the RDMs for gender, occupation and age with the RDMs of the embeddings of the famous identities' faces by the visual (VGGft20) and visual-semantic (CLIP) DNNs, and of their Wikipedia descriptions by the semantic (SGPT) DNN (see Supplementary Table 1).
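For illustration, attribute RDMs of this kind can be built from simple pairwise distances. The sketch below uses hypothetical attribute values, a binary mismatch distance for the categorical attributes and an absolute difference for age; this coding is an assumption and may differ from the authors' exact procedure.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Placeholder attribute values for four identities.
gender = np.array([0, 1, 0, 1])            # categorical codes
occupation = np.array([2, 2, 0, 1])        # categorical codes
age = np.array([35.0, 52.0, 41.0, 67.0])   # years

# Binary mismatch distance for categorical attributes: 0 = same, 1 = different.
gender_rdm = pdist(gender[:, None], metric="hamming")
occupation_rdm = pdist(occupation[:, None], metric="hamming")
# Absolute age difference for the continuous attribute.
age_rdm = pdist(age[:, None], metric="cityblock")

print(gender_rdm, occupation_rdm, age_rdm, sep="\n")
```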

Source data

Extended Data Fig. 3 The correlations of human representations in perception and memory across all DNNs’ layers.

The mean correlations between RDMs of faces based on the embeddings of each layer of VGG-ft20 (left) and each layer of CLIP (right) with human visual similarity ratings in perception (left; N = 20) and memory (right; N = 19). Error bars indicate the standard error of the mean. Each dot indicates a participant.
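Layer-wise correlations of this kind can be computed by extracting embeddings from each layer and correlating each layer's RDM with the human ratings. A minimal sketch using PyTorch forward hooks follows; the off-the-shelf torchvision VGG16 (hooked at its pooling layers only, to keep the example light), the random images and the random human RDM are stand-ins, not the fine-tuned VGG-ft20 or the study's data.

```python
import torch
import torchvision.models as models
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

model = models.vgg16(weights=None).eval()   # stand-in network, untrained weights
activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        # Flatten each image's feature map into a single embedding vector.
        activations[name] = output.flatten(start_dim=1).detach()
    return hook

# Hook only the pooling layers to keep memory use modest in this sketch.
for i, layer in enumerate(model.features):
    if isinstance(layer, torch.nn.MaxPool2d):
        layer.register_forward_hook(save_activation(f"features.{i}"))

images = torch.randn(20, 3, 224, 224)          # placeholder batch of 20 faces
human_rdm = torch.rand(20 * 19 // 2).numpy()   # placeholder human dissimilarities

with torch.no_grad():
    model(images)

for name, act in activations.items():
    layer_rdm = pdist(act.numpy(), metric="cosine")
    rho, _ = spearmanr(layer_rdm, human_rdm)
    print(f"{name}: rho = {rho:.3f}")
```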

Source data

Extended Data Fig. 4 Visual and visual-semantic DNN representations account for human representations beyond gender, occupation and age.

A. The mean partial correlations of the RDMs of visual (VGG-ft20), visual-semantic (CLIP) and semantic (SGPT) DNNs with RDMs of human perception (N = 20) and memory (N = 19) of the same identities, when B. gender, C. gender and occupation and D. gender, occupation and age are held out. Error bars indicate the standard error of the mean. See statistical analysis in Extended Data Table 1.
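One way to compute such partial correlations is to residualize both the DNN RDM and the human RDM against the covariate RDMs before correlating them. The sketch below illustrates that approach with hypothetical condensed RDMs; it is a simplified stand-in, not necessarily the authors' exact procedure.

```python
import numpy as np
from scipy.stats import spearmanr

def residualize(y, covariates):
    """Residuals of y after linear regression on the covariate RDMs."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def partial_rsa(dnn_rdm, human_rdm, covariate_rdms):
    """Correlation between two RDMs after removing covariate RDMs from both."""
    cov = np.column_stack(covariate_rdms)
    rho, _ = spearmanr(residualize(dnn_rdm, cov), residualize(human_rdm, cov))
    return rho

# Placeholder condensed RDMs for 20 identities (190 pairs each).
rng = np.random.default_rng(1)
n_pairs = 20 * 19 // 2
dnn_rdm, human_rdm = rng.random(n_pairs), rng.random(n_pairs)
gender_rdm, occupation_rdm, age_rdm = (rng.random(n_pairs) for _ in range(3))

print(partial_rsa(dnn_rdm, human_rdm, [gender_rdm]))
print(partial_rsa(dnn_rdm, human_rdm, [gender_rdm, occupation_rdm]))
print(partial_rsa(dnn_rdm, human_rdm, [gender_rdm, occupation_rdm, age_rdm]))
```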

Extended Data Fig. 5 Human representations of AI-generated faces.

A. RDMs of human similarity ratings of VGG-generated (left) and CLIP-generated (right) images of the celebrity faces. B. A t-SNE visualization of human RDMs of VGG-generated (left) and CLIP-generated (right) faces. C. High correlations between human similarity ratings of VGG- and CLIP-generated faces and RDMs of VGG and CLIP embeddings of the original faces. D. The mean correlations between the RDMs of human similarity ratings of VGG-generated (N = 20) and CLIP-generated (N = 20) faces and gender, occupation and age. Error bars represent the standard error of mean correlations. VGG: VGG embeddings of original faces; VGG-gen: human similarity ratings of the VGG-generated faces. CLIP: CLIP embeddings of original faces; CLIP-gen: human similarity ratings of the CLIP-generated faces. The AI-generated faces cannot be copyrighted and are not shown in the figure. The images can be obtained by contacting the authors.

Source data

Extended Data Fig. 6 Images of the objects used in the study.

The 20 objects that were selected for Experiment 4. For display purposes, all images were replaced by licensed images with similar appearance from Freepik (https://www.freepick.com).

Extended Data Fig. 7 The representations of objects by visual, visual–semantic and semantic DNNs.

The RDMs, the correlations between them and a t-SNE visualization for objects, based on VGG (trained on ImageNet) and CLIP embeddings of the images and SGPT embeddings of their dictionary definitions (see Supplementary Table 3).
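A minimal sketch of the t-SNE visualization step (cf. ref. 50), using scikit-learn and a hypothetical embedding matrix in place of the actual VGG, CLIP or SGPT embeddings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(2)
embeddings = rng.normal(size=(20, 512))        # placeholder: one row per object
labels = [f"object {i}" for i in range(20)]    # placeholder object names

# Project the high-dimensional embeddings to 2D for visualization.
coords = TSNE(n_components=2, perplexity=5, init="pca",
              random_state=0).fit_transform(embeddings)

plt.figure(figsize=(5, 5))
plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), label in zip(coords, labels):
    plt.annotate(label, (x, y), fontsize=7)
plt.title("t-SNE of DNN embeddings (placeholder data)")
plt.tight_layout()
plt.show()
```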

Source data

Extended Data Table 1 Statistical analysis of results reported in Extended Data Fig. 4

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2, Supplementary Tables 1–3, supplementary methods and supplementary results.

Reporting Summary

Peer Review File

Source data

Source Data Fig. 1

Similarity score (cosine distance) for each pair of familiar faces based on image embeddings of VGGft and CLIP and text embeddings of SGPT.

Source Data Fig. 2

Similarity scores for each pair of familiar faces based on the DNNs (Source Data Fig. 1) and human visual similarity rating in perception and memory.

Source Data Fig. 3

Human visual similarity ratings of AI-generated faces, and DNN and human similarity scores of the original images of the same faces.

Source Data Fig. 4

Similarity scores for each pair of familiar and unfamiliar faces based on VGG, CLIP and human similarity rating.

Source Data Fig. 5

Similarity scores for each pair of familiar faces based on the DNNs and human semantic similarity rating for images and names.

Source Data Fig. 6

Similarity scores for each pair of objects based on the DNN image embeddings of VGGft and CLIP and text embeddings of SGPT and human visual similarity rating in perception and memory.

Source Data Extended Data Fig. 1

CLIP image–name cosine distance.

Source Data Extended Data Fig. 2

The distance scores between all pairs of faces based on age, sex, occupation, VGGft, CLIP and SGPT.

Source Data Extended Data Fig. 3

Distance scores between all pairs of faces in each layer of VGGft and CLIP and human visual similarity ratings in perception and memory.

Source Data Extended Data Fig. 4 and Table 1

The distance scores between all pairs of faces based on age, sex, occupation, VGGft, CLIP and SGPT and human visual similarity ratings in perception and memory.

Source Data Extended Data Fig. 5

The distance scores between all pairs of faces based on age, sex, occupation and human similarity ratings of VGG-generated and CLIP-generated faces.

Source Data Extended Data Fig. 7

Similarity scores for each pair of objects based on the DNNs and human semantic similarity rating for images and names.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shoham, A., Grosbard, I.D., Patashnik, O. et al. Using deep neural networks to disentangle visual and semantic information in human perception and memory. Nat Hum Behav 8, 702–717 (2024). https://doi.org/10.1038/s41562-024-01816-9
