Quantifying social organization and political polarization in online platforms

Waller, Isaac; Anderson, Ashton

doi:10.1038/s41586-021-04167-x

Article
Published: 01 December 2021

Nature volume 600, pages 264–268 (2021)

Abstract

Mass selection into groups of like-minded individuals may be fragmenting and polarizing online society, particularly with respect to partisan differences^1,2,3,4. However, our ability to measure the social makeup of online communities and in turn, to understand the social organization of online platforms, is limited by the pseudonymous, unstructured and large-scale nature of digital discussion. Here we develop a neural-embedding methodology to quantify the positioning of online communities along social dimensions by leveraging large-scale patterns of aggregate behaviour. Applying our methodology to 5.1 billion comments made in 10,000 communities over 14 years on Reddit, we measure how the macroscale community structure is organized with respect to age, gender and US political partisanship. Examining political content, we find that Reddit underwent a significant polarization event around the 2016 US presidential election. Contrary to conventional wisdom, however, individual-level polarization is rare; the system-level shift in 2016 was disproportionately driven by the arrival of new users. Political polarization on Reddit is unrelated to previous activity on the platform and is instead temporally aligned with external events. We also observe a stark ideological asymmetry, with the sharp increase in polarization in 2016 being entirely attributable to changes in right-wing activity. This methodology is broadly applicable to the study of online interaction, and our findings have implications for the design of online platforms, understanding the social contexts of online behaviour, and quantifying the dynamics and mechanisms of online polarization.

**Fig. 1: Quantifying social dimensions on Reddit.**

**Fig. 2: Macroscale social organization of Reddit communities.**

**Fig. 3: Distribution of political activity on Reddit.**

**Fig. 4: Political polarization of new and existing users.**

**Fig. 5: Ideological asymmetry in online polarization.**

Data availability

All data are available from the pushshift.io Reddit archive²⁸ at http://files.pushshift.io/reddit/. Source data are provided with this paper. Reddit community embedding, social dimension vectors and community scores are available at https://github.com/CSSLab/social-dimensions.

Code availability

All code is available at https://github.com/CSSLab/social-dimensions. Analyses were performed with Python v3.7, pandas v1.3.3 and Spark v3.0.

References

Sunstein, C. #Republic: Divided Democracy in the Age of Social Media (Princeton Univ. Press, 2018).
Iyengar, S. & Hahn, K. S. Red media, blue media: evidence of ideological selectivity in media use. J. Commun. 59, 19–39 (2009).
van Alstyne, M. & Brynjolfsson, E. Electronic communities: global villages or cyberbalkanization? In Proc. International Conference on Information Systems 5 https://aisel.aisnet.org/icis1996/5 (1996).
van Dijck, J. The Culture of Connectivity: A Critical History of Social Media (Oxford Univ. Press, 2013).
McLuhan, M. The Gutenberg Galaxy: The Making of Typographic Man (Univ. of Toronto Press, 1962).
Farrell, H. The consequences of the internet for politics. Ann. Rev. Pol. Sci. 15, 35–52 (2012).
Conover, M. D. et al. Political polarization on Twitter. Proc. Intl AAAI Conf. Web Soc. Media 133, 89–96 (2011).
Bail, C. A. et al. Exposure to opposing views on social media can increase political polarization. Proc. Natl Acad. Sci. USA 115, 9216–9221 (2018).
Martin, T. community2vec: vector representations of online communities encode semantic relationships. In Proc. 2nd Workshop on NLP and Computational Social Science 27–31 (2017).
Garg, N., Schiebinger, L., Jurafsky, D. & Zou, J. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl Acad. Sci. USA 115, E3635–E3644 (2018).
Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V. & Kalai, A. T. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Adv. Neural Inf. Process. Syst. 29, 4349–4357 (2016).
Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017).
Kozlowski, A. C., Taddy, M. & Evans, J. A. The geometry of culture: analyzing the meanings of class through word embeddings. Am. Soc. Rev. 84, 905–949 (2019).
Shi, F., Shi, Y., Dokshin, F. A., Evans, J. A. & Macy, M. W. Millions of online book co-purchases reveal partisan differences in the consumption of science. Nat. Hum. Behav. 1, 0079 (2017).
Del Vicario, M. et al. Echo chambers: emotional contagion and group polarization on Facebook. Sci. Rep. 6, 37825 (2016).
Pariser, E. The Filter Bubble: What the Internet is Hiding from You (Penguin, 2011).
Flaxman, S., Goel, S. & Rao, J. M. Filter bubbles, echo chambers, and online news consumption. Public Opin. Q. 80, 298–320 (2016).
Bakshy, E., Messing, S. & Adamic, L. A. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 1130–1132 (2015).
DiMaggio, P., Evans, J. & Bryson, B. Have Americans’ social attitudes become more polarized? Am. J. Sociol. 102, 690–755 (1996).
Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A. & Bonneau, R. Tweeting from left to right: is online political communication more than an echo chamber? Psychol. Sci. 26, 1531–1542 (2015).
Adamic, L. A. & Glance, N. The political blogosphere and the 2004 US election: divided they blog. In Proc. 3rd International Workshop on Link Discovery 36–43 (2005).
An Examination of the 2016 Electorate, Based on Validated Voters https://www.pewresearch.org/politics/2018/08/09/an-examination-of-the-2016-electorate-based-on-validated-voters/ (Pew Research Center, 2018).
Hawley, G. Making Sense of the Alt-Right (Columbia Univ. Press, 2017).
Simmel, G. Conflict and the Web of Group Affiliations (Free Press, 1955).
Breiger, R. L. The duality of persons and groups. Social Forces 53, 181–190 (1974).
Bourdieu, P. Distinction: A Social Critique of the Judgement of Taste (Routledge, 1984).
Crenshaw, K. W. On Intersectionality: Essential Writings (The New Press, 2017).
Baumgartner, J., Zannettou, S., Keegan, B., Squire, M. & Blackburn, J. The Pushshift Reddit dataset. In Proc. International AAAI Conference on Web and Social Media 14, 830–839 (2020).
Reddit privacy policy Reddit https://www.redditinc.com/policies/privacy-policy (2021).
Kumar, S., Hamilton, W. L., Leskovec, J. & Jurafsky, D. Community interaction and conflict on the web. In Proc. 2018 World Wide Web Conference 933–943 (2018).
Waller, I. & Anderson, A. Generalists and specialists: using community embeddings to quantify activity diversity in online platforms. In Proc. 2019 World Wide Web Conference 1954–1964 (2019).
Levy, O. & Goldberg, Y. Dependency-based word embeddings. In Proc. 52nd Annual Meeting of the Association for Computational Linguistics 2, 302–308 (2014).
Levy, O. & Goldberg, Y. Neural word embedding as implicit matrix factorization. Adv. Neural Inf. Process. Syst. 27, 2177–2185 (2014).
Schlechtweg, D., Oguz, C. & im Walde, S. S. Second-order co-occurrence sensitivity of skip-gram with negative sampling. Preprint at https://arxiv.org/abs/1906.02479 (2019).

Acknowledgements

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Foundation for Innovation (CFI) and the Ontario Research Fund (ORF).

Author information

Authors and Affiliations

Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
Isaac Waller & Ashton Anderson

Contributions

I.W. performed the computational analysis. A.A. and I.W. designed the research, analysed the results and wrote the paper.

Corresponding author

Correspondence to Ashton Anderson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Kenneth Benoit, Kate Starbird and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Distribution of community scores.

Left: distributions of communities on the age, gender, partisan, and affluence dimensions. Right: the most extreme communities and words on those dimensions. Word scores are calculated by averaging community scores weighted by the number of occurrences of the word in the community in 2017. Community descriptions can be found in the glossary (Supplementary Table 1).
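
The word-score calculation described in this caption can be illustrated with a minimal sketch. The function name `word_score` and the dictionary inputs are hypothetical and are not taken from the authors' code; the sketch only assumes that each community has a score on a dimension and that per-community occurrence counts of the word are available.

```python
def word_score(community_scores, word_counts):
    """Weighted average of community scores for one word.

    community_scores: dict mapping community name -> score on a dimension.
    word_counts: dict mapping community name -> occurrences of the word
                 in that community (hypothetical input format).
    """
    # Only communities that have both a score and a count contribute.
    shared = [c for c in word_counts if c in community_scores]
    total = sum(word_counts[c] for c in shared)
    if total == 0:
        raise ValueError("word does not occur in any scored community")
    weighted = sum(community_scores[c] * word_counts[c] for c in shared)
    return weighted / total


# Toy example: the word appears three times in "a" and once in "b".
example = word_score({"a": 1.0, "b": -1.0}, {"a": 3, "b": 1})
```

A word used mostly in communities at one end of a dimension is pulled towards that end, which is why such scores can surface the most extreme words.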

Extended Data Fig. 2 External validations of social dimensions.

Scatter plots of the external validations of the gender, partisan, and affluence axes. The gender scores for occupational communities are plotted against the percentage of women in that occupation from the 2018 American Community Survey. The partisan scores for city communities are plotted against the Republican vote differential for that metropolitan area in the 2016 presidential election. The affluence scores of city communities are plotted against the median household income for that metropolitan area from the 2016 US Census. The blue line is the best-fit linear regression for the data; the shaded area represents a 95% confidence interval for the regression estimated using a bootstrap. \(p\)-values for correlation coefficients computed using two-sided test of Pearson correlation assuming joint normality.
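
As a rough illustration of the statistics named in this caption, the sketch below computes a Pearson correlation and a percentile bootstrap interval. It is a simplification under stated assumptions: it bootstraps only the regression slope rather than the full regression band shown in the figure, and the function names, the number of resamples and the seed are hypothetical choices, not the authors' settings.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def bootstrap_slope_ci(xs, ys, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the regression slope."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip degenerate resamples with no spread in x
        by = [ys[i] for i in idx]
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```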

Extended Data Fig. 3 Further validations of social dimensions.

Clockwise from left: The gap between university and city communities on the age dimension. The distribution of university and city communities on the age dimension; age is strongly related to label (\(r=0.91\), two-sided \(p < {10}^{-58}\), \(n=150\), Cohen’s \(d=4.37\)). The distribution of left- and right-wing labelled communities on the partisan dimension; partisan is strongly related to label (\(r=0.92\), two-sided \(p < {10}^{-21}\), \(n=50\), Cohen’s \(d=4.89\)). The distribution of explicitly labelled left- and right-wing communities on the partisan-ness axis as compared to the general distribution; there is a large difference in their means (Cohen’s \(d=3.27\)). For violin plots, white dot represents median; box represents 25th to 75th percentile; whiskers represent 1.5 times the inter-quartile range; and density estimate (‘violin’) extends to the minima and maxima of the data. \(p\)-values for correlation coefficients computed using two-sided test of Pearson correlation assuming joint normality.
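
The effect sizes quoted above are Cohen’s \(d\) values. A minimal sketch of one common form of that statistic follows; the pooled-standard-deviation variant is an assumption here, as the caption does not say which variant was used, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant).

    Positive values mean group_b has the larger mean.
    """
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_var)


# Toy example: two groups with unit variance whose means differ by 2.
d = cohens_d([1, 2, 3], [3, 4, 5])
```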

Extended Data Fig. 4 Distributions of age, gender and partisan scores by cluster.

Distributions of raw age, gender and partisan scores, separated by cluster. Outlier communities that lie more than two standard deviations from the mean are annotated. Dashed lines represent the global mean on each dimension. Community descriptions can be found in the glossary (Supplementary Table 1).

Extended Data Fig. 5 Distributions of affluence, time, sociality and edgy scores by cluster.

Outlier communities that lie more than two standard deviations from the mean are annotated. Dashed lines represent the global mean on each dimension. Community descriptions can be found in the glossary (Supplementary Table 1).

Extended Data Fig. 6 Relationships between online social dimensions.

The relationships between the partisan dimension and (a) gender, (b) age, (c) partisan-ness. Every bar represents a bin of communities with partisan scores a given number of standard deviations from the mean, and the distribution illustrates the scores on the secondary dimension (e.g. gender in (a)). From left to right, the bars represent highly left-wing, leaning left-wing, center, leaning right-wing, highly right-wing communities. The leftmost and rightmost bars are annotated with the number of communities, and examples of the largest communities, in each group. The hex-plot in (c) illustrates the joint distribution of partisan and partisan-ness scores. Labels correspond to the categorizations used in the polarization analysis.

Extended Data Fig. 7 Polarization robustness checks.

(a) The partisan distribution of deleted and non-deleted comments in political communities. (b) The proportion of activity that took place in very left-wing (\(z < -3\)) and very right-wing (\(z > 3\)) communities over time. (c) Alternate version of Fig. 3a generated using a dataset in which the authorship of all comments was randomly shuffled. Each individual bin distribution is extremely similar to the overall activity distribution, showing that the overall activity distribution is a useful reference point for what bin distributions would look like if there were no tendency for users to comment in ideologically homogeneous communities. (d) Average distributions of political activity for authors of comments in the 25 largest political communities on Reddit (by number of comments). (e) Correlation of users’ average partisan scores over time. Each \(\left(x,y\right)\) cell represents the correlation between scores of a user in month \({t}_{x}\) and that same user in month \({t}_{y}\), for all users active in both time periods. A user is only considered active if they make at least \(10\) comments in a month. (f) The relationship between the proportion of users who polarize and the polarization threshold. The polarization threshold is the number of standard deviations a user must increase in polarization to be considered polarized. Three lines are plotted corresponding to three pairs of months: the pairs of months with the minimum (blue), maximum (orange), and median (green) proportion of users polarized when using a threshold of \(1\). A threshold of \(1\) is used in all other calculations. (g) The relationship between the proportion of users who polarize and the comment threshold. The comment threshold is the value used to filter inactive users from the calculation. Users must have at least \(x\) comments in each of the two months to be included in the calculation of the proportion of users who polarize. The same three month pairs are plotted as in part (f). There are minimal differences between different thresholds. A threshold of \(10\) is used in all other calculations.
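
The two thresholds described in panels (f) and (g) can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes a user's polarization in a month is the absolute value of their mean partisan \(z\)-score, and the function name and input format are hypothetical.

```python
def proportion_polarized(month_a, month_b, polarization_threshold=1.0, comment_threshold=10):
    """Share of eligible users whose polarization rose by at least the threshold.

    month_a, month_b: dicts mapping user -> list of partisan z-scores, one per
    comment, for the earlier and later month (hypothetical input format).
    """
    def polarization(scores):
        # Assumption: polarization is the absolute mean partisan z-score.
        return abs(sum(scores) / len(scores))

    # Users need enough comments in BOTH months to be included.
    eligible = [
        user for user in month_a
        if user in month_b
        and len(month_a[user]) >= comment_threshold
        and len(month_b[user]) >= comment_threshold
    ]
    if not eligible:
        return 0.0
    polarized = sum(
        1 for user in eligible
        if polarization(month_b[user]) - polarization(month_a[user]) >= polarization_threshold
    )
    return polarized / len(eligible)
```

Sweeping `polarization_threshold` or `comment_threshold` over a range of values and plotting the result would give curves of the kind described in panels (f) and (g).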

Extended Data Fig. 8 Distribution of political activity by user group.

The distribution of political activity on Reddit over time by partisan score. Each bar represents one month of comment activity in political communities on Reddit, and is coloured according to the distribution of partisan scores of comments posted during the month (the partisan score of a comment is simply the partisan score of the community in which it was posted). The top plot includes all activity as in Fig. 3b, while the four following plots decompose this into the subsets of activity authored by particular groups of users. Users are classified, based on the average partisan score of their activity in the month 12 months prior, as left-wing (having a score at least one standard deviation to the left), right-wing (at least one standard deviation to the right), or center. Users with no political activity in the month 12 months prior use the label of the most recent month more than 12 months prior in which they had political activity; if they have never had political activity before, they fall into the new / newly political category (bottom).
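
The classification rule in this caption, including its fallback for users with no activity in the reference month, can be sketched as below. The function name, the integer month index and the label strings are hypothetical; scores are assumed to be \(z\)-scores so that one standard deviation corresponds to \(\pm 1\).

```python
def classify_user(history, month, lag=12):
    """Label a user for a given month from their earlier political activity.

    history: dict mapping month index -> the user's average partisan z-score
             in that month; months without political activity are absent.
    month:   index of the month being plotted.
    """
    reference = month - lag
    if reference in history:
        score = history[reference]
    else:
        # Fall back to the most recent month more than `lag` months prior.
        earlier = [m for m in history if m < reference]
        if not earlier:
            return "new"  # no prior political activity
        score = history[max(earlier)]
    if score <= -1.0:
        return "left"
    if score >= 1.0:
        return "right"
    return "center"
```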

Extended Data Fig. 9 Additional measures of ideological asymmetry.

(a) Average polarization (absolute \(z\)-score) of activity in different ideological categories over time. (b) Volume of activity (number of comments) in different ideological categories over time. (c, d) Annual change in polarization in the two partisan activity categories, decomposed into the change attributable to new (\(\varDelta n\)) and existing (\(\varDelta e\)) users as done in Fig. 4.

Extended Data Fig. 10 Implicit polarization.

The relationship between explicitly partisan and implicitly partisan activity (left: left-wing activity; right: right-wing activity). Of users who were first active in an explicitly partisan community at time \({m}_{E}\), the proportion of them who were first active in an implicitly partisan community at time \({m}_{I}\) is denoted by the colour in cell \(\left({m}_{E},{m}_{I}\right)\). The line graphs at the top show the total proportion of users who were active in implicitly partisan communities before they were active in an explicitly partisan community (i.e. the sum of each column below the diagonal back to 2005, or the total proportion of users for whom \({m}_{I} < {m}_{E}\)).
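
The quantity shown in the line graphs, the share of users with \({m}_{I} < {m}_{E}\) for each \({m}_{E}\), can be sketched as follows. The function name and the per-user dictionaries of first-activity months are hypothetical inputs chosen for illustration.

```python
from collections import defaultdict


def implicit_before_explicit(first_explicit, first_implicit):
    """For each month m_E, the share of users with m_I < m_E.

    first_explicit: dict user -> month index of first explicitly partisan activity.
    first_implicit: dict user -> month index of first implicitly partisan activity;
                    users with no implicit activity are absent.
    """
    totals = defaultdict(int)
    earlier = defaultdict(int)
    for user, m_e in first_explicit.items():
        totals[m_e] += 1
        m_i = first_implicit.get(user)
        if m_i is not None and m_i < m_e:
            earlier[m_e] += 1
    return {m_e: earlier[m_e] / totals[m_e] for m_e in totals}


# Toy example: of two users first explicit in month 5, one was implicit earlier.
shares = implicit_before_explicit({"a": 5, "b": 5, "c": 7}, {"a": 3, "b": 6})
```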

Extended Data Table 1 Social dimension seeds

Supplementary information

Supplementary Information

This file contains Supplementary Tables 1 and 2.

Reporting Summary

Peer Review File

Source data

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Source Data Fig. 4

Source Data Fig. 5

About this article

Cite this article

Waller, I., Anderson, A. Quantifying social organization and political polarization in online platforms. Nature 600, 264–268 (2021). https://doi.org/10.1038/s41586-021-04167-x

Received: 30 September 2020
Accepted: 19 October 2021
Published: 01 December 2021
Issue Date: 09 December 2021
DOI: https://doi.org/10.1038/s41586-021-04167-x
