Abstract
Artificial intelligence (AI) models trained on published scientific findings have been used to invent valuable materials and targeted therapies, but they typically ignore the human scientists who continually alter the landscape of discovery. Here we show that incorporating the distribution of human expertise by training unsupervised models on simulated inferences that are cognitively accessible to experts dramatically improves (by up to 400%) AI prediction of future discoveries beyond models focused on research content alone, especially when relevant literature is sparse. These models succeed by predicting human predictions and the scientists who will make them. By tuning human-aware AI to avoid the crowd, we can generate scientifically promising ‘alien’ hypotheses unlikely to be imagined or pursued without intervention until the distant future, which hold promise to punctuate scientific advance beyond questions currently pursued. By accelerating human discovery or probing its blind spots, human-aware AI enables us to move towards and beyond the contemporary scientific frontier.
Data availability
The DOIs of papers used for the electrochemical properties together with the PubMed identifiers of the MEDLINE entries used in our experiments can be found in our GitHub repository: https://github.com/jsourati/accelerate-discoveries. The abstracts of papers for electrochemical properties could not be shared due to copyright issues, but MEDLINE abstracts are accessible through their identifiers from the PubMed website. Source data are provided with this paper.
Code availability
All code for our algorithms can be found in the following GitHub repository: https://github.com/jsourati/accelerate-discoveries.
References
Khadherbhi, S. R. & Babu, K. S. Big data search space reduction based on user perspective using map reduce. Int. J. Adv. Technol. Innov. Res. 7, 3642–3647 (2015).
Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).
Smalley, E. AI-powered drug discovery captures pharma interest. Nat. Biotechnol. 35, 604–605 (2017).
Teruya, E., Takeuchi, T., Morita, H., Hayashi, T. & Ono, K. ARTS: autonomous research topic selection system using word embeddings and network analysis. Mach. Learn. Sci. Technol. 3, 025005 (2022).
Shi, F., Foster, J. G. & Evans, J. A. Weaving the fabric of science: dynamic network models of science’s unfolding structure. Soc. Netw. 43, 73–85 (2015).
Singer, U., Radinsky, K. & Horvitz, E. On biases of attention in scientific discovery. Bioinformatics https://doi.org/10.1093/bioinformatics/btaa1036 (2020).
Tversky, A. & Kahneman, D. Availability: a heuristic for judging frequency and probability. Cogn. Psychol. 5, 207–232 (1973).
Evans, J. S. B. T. Bias in Human Reasoning: Causes and Consequences (Psychology Press, 1989).
Ehrlinger, J., Readinger, W. O. & Kim, B. in Encyclopedia of Mental Health 2nd edn (ed. Friedman, H. S.) 5–12 (Academic Press, 2016).
Chadwick, A. T. & Segall, M. D. Overcoming psychological barriers to good discovery decisions. Drug Discov. Today 15, 561–569 (2010).
Rzhetsky, A., Foster, J. G., Foster, I. T. & Evans, J. A. Choosing experiments to accelerate collective discovery. Proc. Natl Acad. Sci. USA 112, 14569–14574 (2015).
Mikolov, T., Yih, W.-T. & Zweig, G. Linguistic regularities in continuous space word representations. In Proc. 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Vanderwende, L. et al.) 746–751 (Association for Computational Linguistics, 2013).
Perozzi, B., Al-Rfou, R. & Skiena, S. DeepWalk: online learning of social representations. In Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Macskassy, S. et al.) 701–710 (Association for Computing Machinery, 2014).
Chitra, U. & Raphael, B. Random walks on hypergraphs with edge-dependent vertex weights. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 1172–1181 (PMLR, 2019).
Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019).
Burger, B. et al. A mobile robotic chemist. Nature 583, 237–241 (2020).
Swanson, D. R. Fish oil, Raynaud’s syndrome, and undiscovered public knowledge. Perspect. Biol. Med. 30, 7–18 (1986).
Swanson, D. R. Medical literature as a potential source of new knowledge. Bull. Med. Libr. Assoc. 78, 29–37 (1990).
Weeber, M., Klein, H., de Jong-van den Berg, L. T. W. & Vos, R. Using concepts in literature-based discovery: simulating Swanson’s Raynaud–fish oil and migraine–magnesium discoveries. J. Am. Soc. Inf. Sci. Technol. 52, 548–557 (2001).
Evans, J. & Rzhetsky, A. Machine science. Science 329, 399–400 (2010).
Digiacomo, R. A., Kremer, J. M. & Shah, D. M. Fish-oil dietary supplementation in patients with Raynaud’s phenomenon: a double-blind, controlled, prospective study. Am. J. Med. 86, 158–164 (1989).
Chiu, H.-Y., Yeh, T.-H., Huang, Y.-C. & Chen, P.-Y. Effects of intravenous and oral magnesium on reducing migraine: a meta-analysis of randomized controlled trials. Pain Physician 19, E97–E112 (2016).
Chu, J. S. G. & Evans, J. A. Slowed canonical progress in large fields of science. Proc. Natl Acad. Sci. USA 118, e2021636118 (2021).
Davis, A. P. et al. The Comparative Toxicogenomics Database: update 2019. Nucleic Acids Res. 47, D948–D954 (2019).
Morselli Gysi, D. et al. Network medicine framework for identifying drug-repurposing opportunities for COVID-19. Proc. Natl Acad. Sci. USA 118, e2025581118 (2021).
Ghandehari, S. et al. Progesterone in addition to standard of care versus standard of care alone in the treatment of men hospitalized with moderate to severe COVID-19: a randomized, controlled pilot trial. Chest https://doi.org/10.1016/j.chest.2021.02.024 (2021).
Estradiol and progesterone in hospitalized COVID-19 patients https://clinicaltrials.gov/ct2/show/NCT04865029 (2022).
Mehdizadeh Dehkordi, A., Zebarjadi, M., He, J. & Tritt, T. M. Thermoelectric power factor: enhancement mechanisms and strategies for higher performance thermoelectric materials. Mater. Sci. Eng. R. Rep. 97, 1–22 (2015).
Ricci, F. et al. An ab initio electronic transport database for inorganic materials. Sci. Data 4, 170085 (2017).
Smidt, T. E., Mack, S. A., Reyes-Lillo, S. E., Jain, A. & Neaton, J. B. An automatically curated first-principles database of ferroelectrics. Sci. Data 7, 72 (2020).
Belikov, A. V., Rzhetsky, A. & Evans, J. Prediction of robust scientific facts from literature. Nat. Mach. Intell. 4, 445–454 (2022).
Sourati, J. & Evans, J. Complementary artificial intelligence designed to augment human discovery. Preprint at arXiv https://doi.org/10.48550/arXiv.2207.00902 (2022).
Xu, J. et al. Building a PubMed knowledge graph. Sci. Data 7, 205 (2020).
Torvik, V. I. & Smalheiser, N. R. Author name disambiguation in MEDLINE. ACM Trans. Knowl. Discov. Data 3, 1–29 (2009).
Ammar, W. et al. Construction of the literature graph in Semantic Scholar. In Proc. 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 3 (Industry Papers) 84–91 (Association for Computational Linguistics, 2018).
Ong, S. P. et al. Python Materials Genomics (pymatgen): a robust, open-source Python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).
Sun, Y., Han, J., Yan, X., Yu, P. S. & Wu, T. PathSim: meta path-based top-K similarity search in heterogeneous information networks. Proc. VLDB Endow. 4, 992–1003 (2011).
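Grover, A. & Leskovec, J. node2vec: scalable feature learning for networks. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 855–864 (Association for Computing Machinery, 2016).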
Hamilton, W. L., Ying, R. & Leskovec, J. Inductive representation learning on large graphs. In Proc. 31st International Conference on Neural Information Processing Systems (eds Guyon, I. et al.) 1025–1035 (Curran Associates, 2017).
Kipf, T. N. & Welling, M. Variational graph auto-encoders. Preprint at arXiv https://doi.org/10.48550/arXiv.1611.07308 (2016).
Coakley, C. W. Practical nonparametric statistics. J. Am. Stat. Assoc. 95, 332–333 (2000).
Schaffer, R. Study examines progesterone to reduce inflammation in COVID-19. Healio—EndocrineToday https://www.healio.com/news/endocrinology/20200507/study-examines-progesterone-to-reduce-inflammation-in-covid19 (7 May 2020).
Acknowledgements
We thank our funders for their generous support: the National Science Foundation (grant no. 1829366), the Air Force Office of Scientific Research (grant nos. FA9550-19-1-0354 and FA9550-15-1-0162) and DARPA (grant no. HR00111820006). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank L. Barabasi and D. Morselli Gysi for helpful data related to their network-based forecast of COVID-19 drugs and vaccines with protein–protein interactions (ref. 25), and A. Jain, V. Tshitoyan and A. Dunn for sharing data and code to help replicate their work on unsupervised word embeddings and latent knowledge about materials science (ref. 15). We also thank the participants of the Santa Fe Institute workshop ‘Foundations of Intelligence in Natural and Artificial Systems’, the University of Wisconsin at Madison’s HAMLET workshop and colleagues at the Knowledge Lab for helpful comments.
Author information
Contributions
J.S.: conceptualization, methodology, software, validation, investigation, writing—original draft and visualization. J.A.E.: conceptualization, methodology, writing—original draft, visualization and funding acquisition.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Human Behaviour thanks Chao Min, Roger Guimerà and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Hypergraph-induced transition probabilities, and schematic of our experimental design.
(a,b) Sanity checks for our hypergraph-induced transition probability similarity metric. (a) Between author and conceptual nodes: Histogram of similarities between nodes of two sets of authors and the conceptual node “coronavirus”. The two sets of authors comprise the authors of 5,000 randomly selected papers from the journals Nature Medicine (dark purple) and Applied Optics (light purple) between 1990 and 2019. Similarities between the hypernodes are the logarithm of the average transition probabilities with one and two random walk steps. Histograms are plotted considering only non-zero transition probabilities: 92% of the selected Nature Medicine authors (28,396 in total) but only 51% of the selected Applied Optics authors (18,530 in total) had non-zero similarity values. The average non-zero similarity associated with Nature Medicine authors (red dashed line) is nearly 5 times larger than that of Applied Optics authors (blue dashed line), implying that, according to our hypergraph-induced similarity metric, authors publishing in Nature Medicine write papers much more relevant to coronavirus than those publishing in Applied Optics. (b) Between two conceptual nodes: Similarities between the conceptual keywords shown on the x-axis and “coronavirus”. Similarities between the hypernodes are computed as the average transition probabilities with one and two intermediate nodes. Terms and symptoms known to be more relevant to coronavirus have larger average transition probabilities. (c) Schematic of our experimental design: Starting and ending dates of experiments are shown. For energy-related functions and 100 human diseases, we used the beginning of 2001 as the prediction year and the end of 2018 as a single evaluation date (V1). For COVID-19, the prediction year is the beginning of 2020, and we cumulatively reported monthly precision values until July 2021 (V1 to V19).
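The similarity described in panel (a) can be illustrated with a minimal sketch. It assumes a uniform random walk on a toy hypergraph: from a node, pick one of the papers (hyperedges) containing it uniformly at random, then pick a node of that paper uniformly at random. The papers, node names and the uniform weighting are illustrative assumptions, not the weighting used in the authors' repository.

```python
import math

# Hypothetical toy hypergraph: each paper (hyperedge) is the set of its
# author and concept nodes.
papers = [
    {"author_A", "coronavirus"},
    {"author_A", "laser"},
    {"author_B", "laser"},
]
nodes = sorted(set().union(*papers))

def one_step(papers, nodes):
    """P[u][v]: pick a paper containing u uniformly, then one of its nodes uniformly.

    Assumption: the walker may land back on u (self-transitions are allowed).
    """
    P = {u: {v: 0.0 for v in nodes} for u in nodes}
    for u in nodes:
        containing = [p for p in papers if u in p]
        for p in containing:
            for v in p:
                P[u][v] += 1.0 / (len(containing) * len(p))
    return P

def similarity(P, u, v):
    """Log of the average of the one-step and two-step transition probabilities."""
    two_step = sum(P[u][w] * P[w][v] for w in P)
    avg = 0.5 * (P[u][v] + two_step)
    return math.log(avg) if avg > 0 else float("-inf")

P = one_step(papers, nodes)
sim_A = similarity(P, "author_A", "coronavirus")
sim_B = similarity(P, "author_B", "coronavirus")
```

In this toy example author_B cannot reach “coronavirus” within two steps, so the similarity is minus infinity; this corresponds to the zero transition probabilities that the histograms exclude.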
Extended Data Fig. 2 Precision-Recall (PR) curves for human-accessible predictions.
Precision-Recall (PR) curves and area under the curves (AUCs) for various human-accessible predictions: energy-related materials science properties, that is, thermoelectrics (a), ferroelectrics (b) and photovoltaics (c), therapies and vaccines for COVID-19 (d), and generic drug repurposing (e). Except for COVID-19, we display only the PR-AUC values for the selected prediction years and omit the PR curves themselves. Note that for receiver operating characteristic (ROC) curves random predictions always result in an AUC of 0.5, but the PR-AUC of the random baseline depends on the ratio of positive samples in the data.
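The dependence of the random PR baseline on the positive ratio can be checked with a small sketch. It enumerates every possible ordering of a toy label list and averages precision at each cutoff; the labels are hypothetical and the exhaustive enumeration is only practical for very short lists.

```python
from itertools import permutations

def precision_at_k(ranked_labels, k):
    """Fraction of positives among the top-k ranked candidates."""
    return sum(ranked_labels[:k]) / k

def expected_random_precision(labels, k):
    """Average precision@k over all orderings, i.e. over a uniformly random ranking."""
    orders = list(permutations(labels))
    return sum(precision_at_k(order, k) for order in orders) / len(orders)

# Hypothetical data: 1 true discovery among 5 candidates.
labels = [1, 0, 0, 0, 0]
positive_ratio = sum(labels) / len(labels)
baseline = [expected_random_precision(labels, k) for k in range(1, len(labels) + 1)]
```

At every cutoff the expected precision of a random ranking equals the positive ratio, so the random PR curve is flat at that level rather than fixed at 0.5.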
Extended Data Fig. 3 Expert density calculation.
Calculation of expert density between a property (node P) and each material (node M). Density is defined as the Jaccard index between the set of authors who have published on the property (denoted by A_P) and the set of authors who have mentioned the material in their publications (denoted by A_M). The Jaccard formulation takes the ratio of the size of the intersection (that is, the number of overlapping authors), denoted by A∩, to the size of the union of the two sets (that is, the total number of authors), denoted by A_P∪A_M.
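The Jaccard calculation above can be sketched in a few lines. The author identifiers are hypothetical, and returning 0.0 for two empty sets is an assumption made here to avoid division by zero.

```python
def expert_density(authors_property: set, authors_material: set) -> float:
    """Jaccard index between two author sets (0.0 if both sets are empty)."""
    union = authors_property | authors_material
    if not union:
        return 0.0  # assumption: no authors at all means zero density
    return len(authors_property & authors_material) / len(union)

# Hypothetical author identifiers, for illustration only.
A_P = {"a1", "a2", "a3", "a4"}   # authors who published on the property
A_M = {"a3", "a4", "a5"}         # authors who mentioned the material
density = expert_density(A_P, A_M)  # 2 overlapping authors / 5 authors in total
```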
Extended Data Fig. 4 Correlations between expert density and time to discovery.
Spearman correlation coefficients between the human expert density (Jaccard index) linking properties with materials and their date of discovery, if discovered. Negative correlations imply that materials with higher expert densities are likely to be discovered earlier than others. These results were obtained with the prediction year set to 2001 for energy-related properties and drug repurposing applications, and set to the beginning of 2020 for COVID-19. Turquoise and red bars represent negative and positive correlations, respectively. For seven diseases in the CTD (shown at the bottom of the figure), all discoveries were established in a single year and therefore no correlation coefficients could be obtained. This is because our database did not record the month or day of discoveries accurately. Results indicate that energy-related properties and COVID-19 all show strong negative correlations. For the CTD database, 67 out of 100 diseases (that is, properties) showed statistically significant correlations, among which only one disease had a positive coefficient. The mean correlation coefficient across these 67 diseases was −0.18.
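A minimal sketch of the Spearman computation is given below, written in plain Python with average ranks for tied years. The densities and discovery years are hypothetical, and the significance test reported in the caption is not reproduced here.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman coefficient = Pearson coefficient of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical materials: expert density and year of discovery.
densities = [0.30, 0.12, 0.05, 0.01]
years = [2002, 2005, 2005, 2014]
rho = spearman(densities, years)
```

When every discovery falls in the same year the rank variance is zero and the division fails, which is why no coefficient can be obtained for such diseases.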
Extended Data Fig. 5 Distribution of human expert densities.
Distribution of human expert densities between discovery predictions and properties: (a) drug repurposing application (considering only the 67 diseases with statistically significant Spearman correlation coefficients, see Extended Data Fig. 4); (b-d) energy-related materials science properties, that is, thermoelectricity, ferroelectricity and photovoltaic capacity, respectively; and (e) therapies and vaccines for COVID-19. Curves measure normalized histograms over the logarithm of human expert densities, plotted by fitting a Beta distribution over expert densities for predictions. Solid and dashed vertical lines represent mean values for corresponding densities. The distributions of human expert densities for hypergraph-induced metrics (transition probability and deepwalk-based similarity) are clearly concentrated around larger Jaccard index values than those of word embedding models tracing content alone. In content models, all estimated densities peak at zero (0 < a < 1 < b, where a and b are the shape parameters of the Beta distributions). CTD diseases are sorted by average expert similarity between them and the complete pool of drugs.
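The statement that a Beta density with 0 < a < 1 < b peaks at zero can be verified numerically. The sketch below evaluates the Beta density on a small grid; the shape parameters are hypothetical and are not the fitted values from the source data.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in (0, 1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

# Hypothetical shape parameters satisfying 0 < a < 1 < b.
a, b = 0.5, 3.0
grid = [0.001, 0.01, 0.1, 0.5, 0.9]
values = [beta_pdf(x, a, b) for x in grid]
```

With a < 1 the factor x^(a-1) grows without bound as x approaches zero, and with b > 1 the factor (1-x)^(b-1) vanishes at one, so the density decreases monotonically and its mass concentrates near zero expert density.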
Extended Data Fig. 6 Precision-Recall area under the curve for predicting human discoverers.
Precision-Recall Area Under the Curve (PR-AUC) for predicting the human experts who will discover (discoverers of) materials possessing the following specific properties: (a) thermoelectrics, (b) ferroelectrics, and (c) photovoltaics. Materials selected were among True Positive discovery predictions of our deepwalk-based predictor (α=1). Our evaluation compares scores assigned to candidates and actual discovering experts who ultimately discovered and published the property associated with True Positives. We developed a deepwalk-based scoring function for this purpose. Expert candidates we considered here are those sampled at least once in deepwalk trajectories, produced over our five-year hypergraph. For a discovered material, scores were computed based on the proximity of experts to both property and material. An expert is a good candidate discoverer if she is close (in cosine similarity) to both property and material nodes in the embedding space. Discovered associations whose discoverers were not present in sampled deepwalk trajectories were ignored. In order to summarize the two similarities and generate a single set of human expert predictions, we ranked experts based on their proximity to the property and the material and combined the two rankings using average aggregation. This ranking was used as the final expert score in our PR-AUC computations. We compared the log-PR-AUC of this algorithm with a random selection of experts and also with a curve simulating an imaginary method whose log-PR-AUC is five times higher than the random baseline. Results reveal that predictions were notably superior to random expert selection for all electrochemical properties.
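The rank aggregation described above can be sketched as follows: rank experts by cosine similarity to the property, rank them again by similarity to the material, then average the two ranks. The embedding vectors and expert names are hypothetical; the actual embeddings come from deepwalk trajectories that are not reproduced here.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks_by_similarity(experts, target):
    """Rank 1 is the expert whose vector is closest to the target."""
    ordered = sorted(experts, key=lambda name: cosine(experts[name], target), reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

def rank_discoverers(experts, property_vec, material_vec):
    """Order experts by the average of their property rank and material rank."""
    r_prop = ranks_by_similarity(experts, property_vec)
    r_mat = ranks_by_similarity(experts, material_vec)
    avg = {name: (r_prop[name] + r_mat[name]) / 2 for name in experts}
    return sorted(experts, key=lambda name: avg[name])

# Hypothetical three-dimensional embeddings, for illustration only.
experts = {
    "expert_1": [1.0, 1.0, 0.0],    # fairly close to both targets
    "expert_2": [1.0, -0.5, 0.0],   # close to the property only
    "expert_3": [-0.5, 1.0, 0.0],   # close to the material only
    "expert_4": [0.3, 0.3, 1.0],    # weakly related to both
}
property_vec = [1.0, 0.0, 0.0]
material_vec = [0.0, 1.0, 0.0]
ranking = rank_discoverers(experts, property_vec, material_vec)
```

Here expert_1 is second on both lists and ends up first overall, which illustrates why averaging favors experts close to both nodes over experts close to only one of them.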
Extended Data Fig. 7 Decaying discoverability in complementary predictions.
Illustration of decaying discoverability for predictions as β, the parameter for human expert avoidance, increases. Discoverability of predictions is measured through computing the precision metric, that is, their overlapping percentage with respect to actual discoveries made after prediction year. Decreasing precision curves and their highly negative Pearson correlation coefficients are shown for (a) thermoelectricity, (b) ferroelectricity, (c) photovoltaics and (d) COVID-19. We also visualize these statistics for the remaining human diseases with a scatterplot of their Pearson correlation coefficients (e).
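The precision metric used here, the overlap between predictions and later discoveries, can be sketched as below. The β values, candidate names and prediction lists are hypothetical; how predictions are generated for each β is not shown in this section and is not reproduced.

```python
def precision(predictions, discoveries):
    """Share of predicted candidates that were discovered after the prediction year."""
    if not predictions:
        return 0.0
    return len(set(predictions) & set(discoveries)) / len(predictions)

# Hypothetical discoveries and per-beta prediction lists (candidates assumed unique).
discovered_later = {"m1", "m2", "m3", "m4"}
predictions_by_beta = {
    0.0: ["m1", "m2", "m3", "x1"],
    0.2: ["m1", "m2", "x1", "x2"],
    0.4: ["m1", "x1", "x2", "x3"],
}
precision_by_beta = {
    beta: precision(preds, discovered_later)
    for beta, preds in predictions_by_beta.items()
}
```

A Pearson coefficient computed over such (β, precision) pairs would be strongly negative whenever precision falls steadily as β grows, which is the pattern the figure reports.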
Extended Data Fig. 8 Discoverability and scientific merit among drug repurposing predictions.
Discoverability and scientific merit for predictions made with varying β values, our parameter for human expert avoidance, in research that repurposes drugs to treat human disease. (a) Precision values for predictions generated with eight levels of β and computed for all 400 human diseases we considered (except COVID-19). Diseases are sorted in terms of the number of relevant drugs. (b) Average theoretical scores measured through protein-protein similarity between diseases and candidate drugs for predictions generated with the same β values. We compute protein-based theoretical scores for 176 diseases out of 400 total cases (44%). In both subfigures, horizontal lines show average values across all diseases.
Supplementary information
Supplementary Information
Supplementary Discussion, Figs. 1–5 and Table 1.
Source data
Source Data Fig. 2
Precision values of predictions.
Source Data Fig. 3
Rank ratio of true positive predictions made by our deepwalk algorithm and not by the baseline.
Source Data Fig. 4
Precision shifts in discovery predictions due to adding authors; precision of predicting discoverers of materials possessing a certain property.
Source Data Fig. 6
Average discovery wait times.
Source Data Fig. 7
Overlapping percentages (precision) and average theoretical scores for predictions generated with different β values.
Source Data Fig. 8
Expectation gaps; joint probability of undiscoverability and plausibility.
Source Data Extended Data Fig. 1
Sanity checks on our hypergraph-induced transition probability similarity metric: between authors and a conceptual node, and between two conceptual nodes.
Source Data Extended Data Fig. 2
Precision–recall curves and area under the curves for predictions made for different properties.
Source Data Extended Data Fig. 4
Spearman correlation coefficients between expert density of properties and materials and their date of discovery.
Source Data Extended Data Fig. 5
Parameters of beta distributions fitted to expert densities of different properties.
Source Data Extended Data Fig. 6
Precision–recall area under the curve for predicting discoverers of a property in a particular material.
Source Data Extended Data Fig. 7
Discoverability (precision) for predictions for different β values.
Source Data Extended Data Fig. 8
Discoverability and scientific merit (plausibility) for predictions made with different β values.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sourati, J., Evans, J.A. Accelerating science with human-aware artificial intelligence. Nat Hum Behav 7, 1682–1696 (2023). https://doi.org/10.1038/s41562-023-01648-z
DOI: https://doi.org/10.1038/s41562-023-01648-z