Introduction

STEM higher education suffers from issues of attrition and academic disparities that threaten to undermine the adequate supply of skilled workers to keep up with societal demand1. Nearly half of bachelor’s degree-seeking students who enter STEM fields leave them2, and those who persist and perform best disproportionately come from advantaged backgrounds3,4. Finding ways to help all students thrive in STEM environments is a major goal of the science community5,6.

One proposed avenue for promoting student persistence and academic success is the use of Learning Communities (LCs)7,8,9,10. In 2019, an estimated 13% of first-year students and 22% of seniors reported having participated in some form of LC11. In general, institutions construct LCs by grouping students through some combination of shared courses, a residential living component, active learning strategies, and informal activity with the goal of promoting student academic and social integration12,13,14. Without the aid of structured interventions such as LCs15, students must make connections and find support to navigate the new college and STEM environment largely on their own. Yet, while LCs have been shown to promote performance and persistence in STEM, their direct impact on social integration in terms of student friendships remains unclear8,10,16,17,18.

Indeed, despite intuition regarding how LCs may guide the development of friendships, there is surprisingly little evidence establishing a causal relationship. Correlational studies have linked LC participation to positive relational outcomes, such as increased socializing19, heightened exposure and network formation20, and social support21. However, because universities often make LC participation voluntary, differences between the students who opt into LCs and those who do not are inevitably confounded with student outcomes, which precludes causal inference8. Thus, the effect of LCs on friendship has yet to receive the rigorous causal evaluation needed to demonstrate their promise.

In this study, we extend prior work on college LCs by exploring the causal impact of LCs on friendship networks in a unique setting where students are assigned to participate in an LC using a strict SAT math score cutoff. This enables us to use a regression discontinuity design (RDD) that can credibly support causal inferences because assignment to treatment creates a scenario that is “as good as randomized” for individuals proximate to the threshold22,23. Accordingly, results from this study advance our understanding of the link between organizational practices and relational outcomes generally and, specifically, of the importance of such practices for helping students build meaningful connections with peers in STEM.

Background

While social integration is important throughout college, it is particularly critical during the freshman year when students transition into postsecondary schooling12,24. Advocates of LCs point out that social integration can improve motivation25,26 and provide access to resources and information needed to succeed academically27,28. Although social integration has multiple dimensions29, encompassing faculty, staff, and peer interactions12,30, we focus on the friendship networks that first-year students develop within their major. Friendships are crucial in college31,32 as they represent some of the strongest influences on students’ attitudes, values, and behaviors33. Within the STEM context, friendships are a key factor promoting persistence34,35 and academic success28,36.

Evidence from the K-12 context provides reason to suspect that LCs promote friendship by acting as “foci” to structure student interaction patterns37. According to focus theory, foci are any “social, psychological, legal, or physical entity around which joint activities are organized” and which “actively bring people together or passively constrain them to interact” (Ref.38, pp. 1016, 1018). By enhancing proximity and promoting regular interaction opportunities, foci are a powerful force behind the development of positive sentiments, relationships, and their change throughout the life course39,40,41. Within secondary schools, research has demonstrated the importance of tracking42 and clustered sets of courses43 for understanding processes of friendship formation among students. Likewise, some work within higher education has also highlighted the association between shared classes44 and majors45 and the relationships that arise. Thus, the active manipulation of which students attend class together—at the core of the LC model—will likely shape which friendships emerge.

Based upon insight from focus theory, we expect the LC to concentrate friendships among students assigned to the same courses. We also expect the community cultivated by the LC to lead to more friendships and friendships that are stronger and more group-based than outside the LC. However, an often-overlooked side effect of these processes is that some potential friendships will be inadvertently discouraged. Students placed in the same classroom are primed for friendship while those placed in different classrooms face a structural barrier42,46,47. Hence, the LC may create divisions within the student body, which can exacerbate inequality48,49. Our analysis considers multiple friendship network outcomes—size, strength, structure, and composition (see “Methods” section for details)—as a way to evaluate the intended goal of social integration, while being cognizant of such unintended consequences19.

Our results indicate that participating in the LC led to an additional friend in the major, although this effect was only marginally significant. In addition, LC participation led to a 54 percentage-point increase in students’ share of friends in the LC. We did not find evidence that participating in the LC altered the strength or structure of students’ friendship networks. Follow-up mediation analyses substantiate the theoretical expectation that increased opportunity for interaction brought about through the LC’s block-registration into classes is the main mechanism responsible for the observed differences in friendship network outcomes.

Methods

Data and setting

Data come from two sequential cohorts of first-time entering biological sciences freshmen at a large, selective, public R1 university in the Western United States. The case under study represents a diverse environment in terms of race/ethnicity, socioeconomic background, and gender. Namely, the major cohort across years is predominantly female, with around half of students considered first-generation, and about 30–40% classified as underrepresented minorities (URM) in terms of racial/ethnic status. During the final week of Fall term, electronic surveys were sent to the entire freshman cohort (LC participants and non-participants) to collect data on friendship ties within the major and various aspects of student background to serve as controls (> 93% response rate). Information on LC participation and additional student demographic data were provided by the university. The study design and procedures were reviewed and approved by the Institutional Review Board of the University of California, Irvine.

Learning community design

For each cohort, the department implemented a simple LC program by block-registering participating students into the same introductory biology and chemistry courses (see SI Appendix Sect. S1). Eligibility for placement into the program was determined using a strict SAT math cutoff score because prior institutional research had identified this metric as a strong predictor of performance and persistence in the major. Each year the freshman cohort consisted of around 1000 students, and approximately 300 students (or roughly one-third) below the cutoff were assigned to participate in the LC. Along with being placed into the same biology and chemistry courses, all participating students took an additional seminar together that met weekly for one hour. Students were split into groups of about 30 students for these weekly meetings, which were designed to promote study skills, provide career advice, and help students navigate the academic environment. This LC design is relatively easy to implement and low in cost, and it is the predominant model on large campuses, compared to more extensive LCs utilized in smaller settings7.

Friendship network measures

We draw upon four basic egocentric measures of students’ friendship networks50. Network size refers to the number of friends with whom a focal student is connected and is measured using total degree, where we do not differentiate who named whom as a friend (the focal student or the peer)51. Tie strength reflects the idea that relationships vary along dimensions such as closeness, intensity, and meaningfulness, and is measured through the count of mutual ties, in which both students acknowledge the relationship by naming each other as friends27,52,53. Network structure recognizes that students not only have friends, but that those friends may be connected to one another. We use density as our measure of network structure, calculated as the number of observed ties among a focal student’s friends divided by the number of potential ties51. Finally, network composition refers to the characteristics of people in one’s network (e.g., how homogeneous one’s friends are). We use the proportion of friends in the learning community as our measure of network composition since, given the design of the LC, we expect opportunities and subsequent friendships with LC participants to vary greatly depending upon whether a student belongs to the LC.
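To make the four measures concrete, the following is a minimal sketch in Python (standard library only) that computes them for one focal student from a set of directed friend nominations. It is an illustration under stated assumptions, not the authors' code: the function name `ego_measures` and the input layout are hypothetical; a reciprocated nomination is assumed to count as one friend toward size; ties among friends are assumed to be undirected for density (present if either friend named the other); and density and composition are returned as NaN when they are undefined, since the text does not say how such cases were handled.

```python
from itertools import combinations

def ego_measures(ego, nominations, in_lc):
    """Egocentric friendship measures for one focal student (hypothetical helper).

    nominations: set of (sender, receiver) pairs, one per friend nomination.
    in_lc: dict mapping each student id to True/False for LC participation.
    """
    # Size: distinct friends linked to ego in either direction. Direction is
    # ignored, so a reciprocated pair counts as one friend (our assumption).
    alters = {r for s, r in nominations if s == ego} | {s for s, r in nominations if r == ego}
    alters.discard(ego)
    size = len(alters)

    # Strength: mutual ties, where ego and the friend name each other.
    mutual = sum(1 for j in alters if (ego, j) in nominations and (j, ego) in nominations)

    # Structure: density among ego's friends; a friend pair counts as tied if
    # either one named the other (undirected treatment, an assumption).
    pairs = list(combinations(sorted(alters), 2))
    observed = sum(1 for a, b in pairs if (a, b) in nominations or (b, a) in nominations)
    density = observed / len(pairs) if pairs else float("nan")

    # Composition: proportion of ego's friends who are LC participants.
    share_lc = sum(in_lc[j] for j in alters) / size if size else float("nan")

    return {"size": size, "mutual": mutual, "density": density, "share_lc": share_lc}

# Toy example with made-up students A-F.
noms = {("A", "B"), ("B", "A"), ("A", "C"), ("D", "A"), ("B", "C"), ("E", "F")}
lc = {"A": True, "B": True, "C": False, "D": True, "E": False, "F": False}
print(ego_measures("A", noms, lc))
# {'size': 3, 'mutual': 1, 'density': 0.3333333333333333, 'share_lc': 0.6666666666666666}
```

In the toy network, student A is linked to B, C, and D; only A and B name each other; one of the three possible friend pairs (B and C) is tied; and two of A's three friends are in the LC.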

Analytic strategy

Using survey and administrative data from two consecutive first-year cohorts, we tested the LC effect on friendship in two steps. First, we estimated the causal effect of LC participation through an RDD that effectively compares friendship outcomes among students whose SAT math scores placed them just above versus just below the LC threshold. Second, we estimated a series of social network models that test whether the impact of LC participation on friendship was mediated by LC organizational factors, versus alternative mechanisms that may have coincided with the assignment of students to courses and sections (i.e., potential confounds). This second step replicates the findings of the RD analysis and offers insight into how the LC had its observed effects.

Regression discontinuity approach

The regression discontinuity (RD) approach has been widely used in social science as a compelling quasi-experimental design to estimate program impacts when eligibility for a treatment is determined by whether an individual’s score exceeds a designated threshold or cut-point23. This creates a situation that approximates a “local randomization”22, where the major premise is that within a specified bandwidth around the cutoff, individuals would not be expected to differ significantly from one another, other than in their eligibility for program participation. In the case of this study, RD is warranted because the program uses a specific cutoff score to determine each student’s eligibility to participate in the LC; freshmen with SAT math scores below a cutoff of 600 were assigned to participate in the LC. If we assume the underlying relationship between SAT math score and friendship network measures follows a continuous relationship, and nothing other than the LC participation varies discontinuously at the cutoff, then we may attribute any observed discontinuity in friendship network outcomes at the cutoff to LC participation.
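When compliance with the cutoff is imperfect, the jump in outcomes at the cutoff has to be rescaled by the jump in actual participation. As a sketch in our own notation (not the authors'), let $x_i$ be the SAT math score, $c = 600$ the cutoff, and $D_i$ actual LC enrollment; because eligibility applies below the cutoff, each difference is taken as the limit from the left minus the limit from the right:

$$\tau_{\text{FRD}} = \frac{\lim_{x \uparrow c} \mathbb{E}[Y_i \mid x_i = x] - \lim_{x \downarrow c} \mathbb{E}[Y_i \mid x_i = x]}{\lim_{x \uparrow c} \mathbb{E}[D_i \mid x_i = x] - \lim_{x \downarrow c} \mathbb{E}[D_i \mid x_i = x]}$$

With a single excluded instrument ($\text{Below}_i$) and the same controls in both stages, the model is just identified, so the two-stage least squares coefficient equals the reduced-form coefficient on $\text{Below}_i$ divided by the first-stage coefficient on $\text{Below}_i$. As a purely hypothetical illustration: if 90% of eligible students near the cutoff enrolled and none above it did, a 0.9-friend jump in the reduced form would correspond to a local average treatment effect of $0.9 / 0.9 = 1$ friend. Under the usual monotonicity assumption, this ratio is interpreted as the effect for compliers near the threshold.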

To deal with issues of noncompliance where a small proportion of students below the cutoff were exempted from participating in the LC (see SI Appendix Sect. S3), we use a fuzzy RD design. Specifically, we use learning community eligibility as an instrumental variable for actual participation in the first-year program with a two-stage least squares strategy54. We derive estimates of the “local average treatment effects”55 (or LATE) through a pooled local polynomial regression within a bandwidth of ± 70 points. For all models measuring the causal impact of the intervention on the four network outcomes, we draw upon the following equations:

$$\begin{aligned} \text{Enroll}_i &= \gamma_0 + \gamma_1\,\text{Below}_i + \gamma_2\,\text{SAT Math Distance}_i + \gamma_3\,(\text{SAT Math Distance}_i \times \text{SAT Math Distance}_i) \\ &\quad + \gamma_4\,(\text{Below}_i \times \text{SAT Math Distance}_i) + X_i'\boldsymbol{\gamma}_5 + \mu_i \end{aligned} \tag{1}$$

$$\begin{aligned} Y_i &= \delta_0 + \delta_1\,\widehat{\text{Enroll}}_i + \delta_2\,\text{SAT Math Distance}_i + \delta_3\,(\text{SAT Math Distance}_i \times \text{SAT Math Distance}_i) \\ &\quad + \delta_4\,(\text{Below}_i \times \text{SAT Math Distance}_i) + X_i'\boldsymbol{\delta}_5 + \varepsilon_i \end{aligned} \tag{2}$$

Equation (1) represents the first stage of the regression, where we predict LC enrollment as a function of eligibility for placement. $\text{Below}_i$ is a binary variable indicating whether the student was assigned to the LC based on SAT math score eligibility; $\text{SAT Math Distance}_i$ is the difference between the student’s math SAT score and the cutoff threshold (i.e., 600); $\text{SAT Math Distance}_i \times \text{SAT Math Distance}_i$ is a quadratic term that allows for nonlinear relationships between the running variable and the outcome; $\text{Below}_i \times \text{SAT Math Distance}_i$ is an interaction term that allows different slopes above and below the threshold; $X_i$ is a vector of individual-level covariates as outlined above. Equation (2) represents the second stage of the regression, where we use the predicted probability of enrollment to estimate the local average treatment effect, given by the coefficient $\delta_1$. We estimate the impact of the LC on each network outcome separately using the ivregress command in STATA version 16.1 (https://www.stata.com).
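The two stages in Eqs. (1) and (2) can be illustrated with a manual two-stage least squares sketch in plain Python. This is not the authors' STATA ivregress workflow, and it yields only the point estimate of $\delta_1$: standard errors computed naively from a hand-run second stage are incorrect (ivregress adjusts them), and mass-point or clustering corrections are ignored. All data are simulated and hypothetical (scores in steps of 10, roughly 10% of eligible students exempted), the helper names `ols`, `fuzzy_rd_2sls`, and `simulate` are ours, the ± 70 bandwidth is assumed to be inclusive, and the distance is rescaled by 100 for numerical stability, which changes $\delta_2$ through $\delta_4$ but not $\delta_1$.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def fuzzy_rd_2sls(sat, enroll, y, cutoff=600, bandwidth=70, covariates=None):
    """Point estimate of delta_1 in Eq. (2): manual 2SLS with Below_i as the
    instrument for Enroll_i, using students within +/- bandwidth of the cutoff."""
    keep = [i for i, s in enumerate(sat) if abs(s - cutoff) <= bandwidth]
    below, controls = [], []
    for i in keep:
        b = 1.0 if sat[i] < cutoff else 0.0            # eligibility (Below_i)
        d = (sat[i] - cutoff) / 100.0                  # SAT Math Distance_i, rescaled by 100
        x = list(covariates[i]) if covariates else []  # hypothetical X_i covariates
        below.append(b)
        # Exogenous terms shared by both stages: constant, d, d^2, Below*d, X_i.
        controls.append([1.0, d, d * d, b * d] + x)
    # First stage (Eq. 1): regress Enroll_i on Below_i plus the controls.
    Z = [[b] + w for b, w in zip(below, controls)]
    gamma = ols(Z, [enroll[i] for i in keep])
    enroll_hat = [sum(z_k * g_k for z_k, g_k in zip(z, gamma)) for z in Z]
    # Second stage (Eq. 2): regress Y_i on fitted enrollment plus the same controls.
    X2 = [[e] + w for e, w in zip(enroll_hat, controls)]
    delta = ols(X2, [y[i] for i in keep])
    return delta[0]

def simulate(n=2000, tau=1.0, noise=1.0, seed=1):
    """Hypothetical data: SAT scores in steps of 10, about 10% of eligible
    students exempted from the LC, and a true LC effect of tau friends."""
    rng = random.Random(seed)
    sat, enroll, y = [], [], []
    for _ in range(n):
        s = rng.choice(range(500, 710, 10))
        took_part = s < 600 and rng.random() > 0.1
        sat.append(s)
        enroll.append(1.0 if took_part else 0.0)
        y.append(3.0 + 0.5 * (s - 600) / 100 + tau * enroll[-1] + rng.gauss(0.0, noise))
    return sat, enroll, y

sat, enroll, y = simulate(noise=0.0)
print(round(fuzzy_rd_2sls(sat, enroll, y), 6))  # prints 1.0: exact recovery without noise
```

With `noise=0.0` the simulated outcome lies exactly in the span of the model, so the sketch recovers the true effect; with the default noise it returns an estimate scattered around `tau`, which is the situation the real analysis faces.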

Social network analysis

We used an exponential random graph model (ERGM)56 to estimate the factors that promoted friendships between students at the end of their first quarter on campus. The ERGM considers all possible directed dyads among the sample of students, where an i → j friendship was modeled separately from a j → i friendship. The model estimates the probability of observing a given network conditional on the set of effects present in the model. We use two types of effects: nodal covariates represent student characteristics (e.g., LC participation, gender) and dyadic covariates represent similarity (i.e., homophily) or co-presence of students (e.g., in the LC, classes). Specific effects are listed in SI Appendix Sect. S4. Each estimated coefficient gives the change in the log-odds of observing a friendship in a given dyad, conditional on the rest of the network, associated with a one-unit change in the corresponding statistic. Exponentiating a coefficient therefore gives the multiplicative change in the odds of a tie, assuming all other model effects remain constant. We estimated a separate ERGM for each first-year student cohort using the statnet package in R version 4.1.0 (https://www.r-project.org)57.
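The coefficient interpretation above follows from the ERGM's conditional logit form: the log-odds that the tie i → j is present, given the rest of the network, equals the sum over model terms of each coefficient times that term's change statistic (the change in the statistic when the tie is toggled from absent to present). The Python sketch below illustrates this with hypothetical coefficients; the term names `edges` and `same_class` and their values are made up for illustration and are not estimates from this study or statnet output.

```python
import math

def conditional_log_odds(theta, change_stats):
    """Conditional log-odds of the tie i -> j given the rest of the network:
    sum over model terms of theta_k * delta_k, where delta_k is the change in
    statistic k when the tie is toggled from absent to present."""
    return sum(coef * change_stats.get(term, 0.0) for term, coef in theta.items())

def tie_probability(theta, change_stats):
    """Inverse logit of the conditional log-odds."""
    return 1.0 / (1.0 + math.exp(-conditional_log_odds(theta, change_stats)))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients for illustration only (not estimates from this study).
theta = {"edges": -4.0, "same_class": 1.0}

p_same = tie_probability(theta, {"edges": 1, "same_class": 1})  # dyad sharing a class
p_diff = tie_probability(theta, {"edges": 1, "same_class": 0})  # dyad not sharing a class
print(round(odds(p_same) / odds(p_diff), 4), round(math.exp(theta["same_class"]), 4))
# 2.7183 2.7183
```

The odds ratio between the two dyads equals the exponentiated `same_class` coefficient, which is the sense in which an exponentiated coefficient is read as a multiplicative change in the odds of a tie. For terms whose change statistics depend on other ties (e.g., triangle-based terms), the same calculation holds only conditional on the rest of the network.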

Research ethics

The study design and procedures were reviewed and approved by the Institutional Review Board. All research was performed in accordance with relevant guidelines and regulations. The need for informed consent was waived by the Institutional Review Board of the University of California, Irvine, due to registration under exemption category 1.

Results

Descriptive evidence

Figure 1 presents the friendship networks and distributions of network outcomes for the full set of first-year students (see “Methods” section for details). Descriptively, we find that LC students were more socially integrated, with significantly more friends and a greater share of friends in the LC compared to non-participants across years (panels c,d,i,j), but they had more mutual ties (panels e,f) and denser networks (panels g,h) in only one of the years (see SI Appendix Fig. S1). Additionally, LC participants had 50% lower odds than non-participants of having no friends in the major (i.e., of being an “isolate”) (p < 0.01; SI Appendix Fig. S2). The sociograms in panels a,b make clear the network segmentation based on LC status, which is stronger in Year 2 (see SI Appendix Sect. S1).

Figure 1

The biological sciences friendship networks and distributions of network outcomes by year. (a,b) Omit isolates and highlight segmentation of LC from non-LC students, which is stronger in Year 2, likely due to programmatic changes (see SI Appendix Sect. S1). The Year 2 network is also more densely connected, with an average outdegree of 2.93, vs 1.72 the prior year, which we attribute to differences in the survey instrument (see SI Appendix Sect. S1). (c–j) Reflect the distribution of each of our network outcomes in each year. Node, line, and density plot colors indicate LC students (red) and non-LC students (black) in each panel.

Impact of LC on friendship network outcomes

While the descriptive statistics presented above provide support for the positive association between LC participation and friendship development, it is unclear whether observed differences are due to the impact of the LC or baseline differences between LC participants and non-participants. Figure 2 visualizes the discontinuity of the four network outcomes at the SAT cutoff, where quadratic prediction lines are fitted within a bandwidth of ± 70 points around the threshold. Overall, we find visual evidence for a discontinuity in network size and network composition at the cutoff, but no discontinuity in tie strength or network structure. These patterns are supported by statistical estimates of the local average treatment effect (LATE) based on pooled local polynomial regressions (see SI Appendix Fig. S3): LC participation led to an additional friend in the major (p < 0.10) as well as a 54 percentage-point increase, on average, in the share of friends in the first-year program (p < 0.001). No significant effects were observed for the count of mutual ties (p > 0.10) or network density (p > 0.10).

Figure 2

RD plots of friendship network outcomes at SAT math score cutoff. In (a–d), RD plots are generated on the pooled sample across years, using the rdplot command in STATA with bin size selected using the default esmv method (see Ref.58 for details). While average outcomes within each bin are plotted for the entire sample, predicted quadratic lines are only fitted to those within ± 70 points around the eligibility threshold. Fitted lines adjust for mass points in the data and control for the following: race/ethnicity, gender, first-generation student status, low-income status, high school GPA, Fall term cumulative GPA, survey completion status, and year.

To aid interpretation of our RD results, we use the LATE estimates to calculate predicted network outcomes for LC participants compared to non-LC participants. As shown in SI Appendix Fig. S4, LC participants are expected to average 4.25 fellow first-year majors as friends compared to 3.25 for their non-LC counterparts. In addition, the LC affected whom students befriend: LC participants are predicted to have almost 70% of their friends in the LC, whereas their similar non-LC peers are predicted to have less than 20% of their friends in the LC program. Together, these results demonstrate that the LC had friendship network size and segmentation effects for students around the cutoff.

Given that we found a marginally significant effect of LC participation on network size, in the next section we specifically test the hypothesized mechanism through which the LC shaped friendship volume: namely, heightened opportunity to interact brought about through block-registration. Because the RD approach assumes that no other meaningful differences exist that could explain the gap at the cutoff, the follow-up network analyses provide additional validation by explicitly modeling alternative explanations that could plausibly lead to the differences we observe.

Network mediation analysis

Having demonstrated the effect of the LC on student friendships, we turn to testing the proposed mechanism by which the LC operated. This mediation analysis uses the full network of students each year and an exponential random-graph model, or ERGM56. Parameter estimates reflect the likelihood that a friendship will be present, versus absent, in a given dyad based on a given effect. Marginal effects are used to test for mediation59.
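As a rough sketch of the general logic of an AME-based mediation test (our notation; not necessarily the exact procedure of Ref. 59), the average marginal effect of LC participation in model $m$ averages, over the $N$ directed dyads, the discrete change in each dyad's conditional tie probability when the LC indicator is switched on versus off, holding the rest of the network $Y^{c}_{ij}$ and the model's estimates $\hat{\theta}^{(m)}$ fixed:

$$\text{AME}^{(m)}_{\text{LC}} = \frac{1}{N}\sum_{i \neq j} \Big[ \Pr\big(Y_{ij}=1 \mid \text{LC}=1,\, Y^{c}_{ij};\, \hat{\theta}^{(m)}\big) - \Pr\big(Y_{ij}=1 \mid \text{LC}=0,\, Y^{c}_{ij};\, \hat{\theta}^{(m)}\big) \Big]$$

$$\text{Proportion mediated} = \frac{\text{AME}^{(5)}_{\text{LC}} - \text{AME}^{(6)}_{\text{LC}}}{\text{AME}^{(5)}_{\text{LC}}}$$

Here Model 5 contains all alternative explanations and Model 6 adds the foci terms (Fig. 3), so a value at or near 1 would indicate that the foci account for essentially the whole LC effect. AMEs are compared instead of raw coefficients because log-odds coefficients from nested logit-type models are rescaled when terms are added and therefore cannot be compared directly across models.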

ERGM findings mirror the causal analysis. LC participants had significantly more friends overall, and significantly more friends in the LC than non-participants. As shown for the Base model in Fig. 3 (M1), the odds of a friendship were 1.1–1.3 times greater for LC participants vs. non-participants (panel a) and LC participants were 8–11 times more likely than non-participants to be friends with LC students (panel b).

Figure 3

ERGM estimates testing alternative explanations and mediation of LC on network size and segmentation. Model 1 is the base model and demonstrates the main LC effect. Models 2–4 introduce measures of sociality, homophily, and same high school separately. Model 5 includes all alternative explanations simultaneously. Model 6 introduces the foci variables while controlling for all alternative mechanisms. (a,b) Provide the coefficients from the specified ERGMs for network size and segmentation, respectively (see SI Appendix Sect. S4 for details). Plotted distance from zero corresponds to the direct magnitude of the LC effect on each network outcome across models. Point estimates and 95% confidence intervals correspond to Year 1 (light red) and Year 2 (dark red) in both panels. Full model results in SI Appendix Figs. S16 and S17.

While we hypothesize that these effects are driven by the LC serving as a focus for friendship activity, other possibilities exist. (1) The LC may have concentrated more sociable groups of students who would otherwise make more friends even in the absence of the LC program28. (2) The LC may have drawn students who shared greater similarities than students outside the LC and hence would be more likely to become friends even in the absence of the LC program. Coupled with the power of homophily in driving friendship60, this could have created a more fertile friendship environment within the LC. (3) LC participants may have been more likely than non-participants to know one another before entering the LC program.

Models 2–5 test for these possibilities and show that they largely affected friendship in the expected manner (see SI Appendix Sect. S4). In at least one of the years, first-generation, low-income, and commuter students named fewer friends, while higher GPA and female students named more friends, net of LC participation. Students were also likely to befriend peers who were similar in race/ethnicity, first-generation status, high-school GPA, and gender, as well as retain friends from high school. However, none of these alternative explanations accounted for the observed effect of LC participation on friendship (AME results described in SI Appendix Sect. S4 support this inference).

Model 6 introduces effects to account for the assignment of LC students to specific classes and sections. These are powerful forces driving friend selection: students were approximately 1.5–4 times more likely to befriend a classmate, and LC students were 8.5–12 times more likely to befriend someone in their same section (see SI Appendix Sect. S4). As shown in Fig. 3, with the introduction of foci effects in M6, the positive estimates of LC participation on network size (panel a) and segmentation (panel b) disappeared. The marginal effect estimates support this inference and indicate that foci mediated all of the effect of the LC on both network size and segmentation, revealing a suppression effect. Combined, these findings support our theoretical argument that the LC served as a focus that provided opportunities and support for friendship development.

Discussion

In this study, we combined a quasi-experimental approach with social network analysis to understand how learning communities shape friendships within a STEM major. Based on the regression discontinuity design, our analysis offers evidence that the LC led to an additional friend in the major, although the effect was only marginally significant. In addition, we found that LC participation significantly and substantially increased the segmentation of student friendship networks. By contrast, we found no evidence that LC participation changed the strength (mutual ties) or structure (density) of students’ friendship networks. This null result may be a product of the first-year environment, where it is paramount for students to rebuild their networks and develop sources of companionship and support. Such a drive may be so fundamental61 that it can be met regardless of the added interaction opportunities provided by the LC.

These findings have important policy implications for efforts to facilitate connections among college students31,32, specifically within STEM majors28,35. Network science demonstrates how the interplay of friend selection62 and influence processes36,63 contributes to academic performance differentials in schools28, potentially exacerbating gaps among students who enter college at varying achievement levels48. College administrators should be attentive to these dynamics when designing curricular interventions64,65. Namely, the size and segmentation effects found here could have both intended and unintended academic consequences19.

On one hand, students are likely to benefit from the additional friend in the major27 that the simple LC helped to promote. Friends provide important help and support with the adjustment during the transition to college12,13,31. Particularly in competitive STEM majors, friends can be valuable sources of social capital by improving access to academic resources27,36 and fostering a heightened sense of belonging66. As a result, LCs may improve persistence and success in STEM through their impact on the social integration of entering students10,28,35.

On the other hand, by using SAT math score as the eligibility criterion, the LC promoted some friendships at the expense of others. Namely, through block-scheduling of coursework, the LC effectively sorted friendships by prior performance, making relationships between higher- and lower-performing students less likely42,46. Thus, findings from this study have significance for discussions around curricular interventions that actively sort individuals into peer groups, such as remedial education, “ability” grouping, or tracking, by revealing how such interventions may affect patterns of relational ties. Because friendships represent a unique influence on the lives of students27,31,33—distinct from roommates67 and larger classroom or peer groups68,69,70—the structuring of friendships could magnify STEM academic disparities71 by inhibiting diverse networks inside and outside the classroom72,73,74.

Our analysis is not without limitations. Although our quasi-experimental approach represents a more rigorous investigation relative to past work, the estimated effect is local and only applicable to individuals around the threshold. Future work may wish to extend our analysis by conducting randomized controlled trials, which would make it possible to estimate the average treatment effects of the LC. In a similar vein, our in-depth analysis relies upon data from a single STEM major at a single institution. Future research would benefit from focusing on a broader set of majors and across institutional types. Finally, whereas we utilized mutuality as one measure of tie strength, we acknowledge that there may be other ways to capture this construct that future work could explore75,76.

Notwithstanding these limitations, the present study makes important theoretical and empirical contributions regarding the role of foci in structuring networks. First, our results reveal that by constructing foci, network interventions may have both intended and unintended consequences for group outcomes65,77. Thus, for higher education administrators, manipulating coursework is a powerful form of network engineering that requires attentiveness to potential social as well as academic consequences. Second, we show that even foci with relatively low levels of constraint on interpersonal interaction can shape social relationships in significant ways38. While the current LC design did not impact mutuality or friendship density, future interventions may be able to impact these outcomes by focusing interactions more intensely. Such efforts could include placing students into study partnerships or groups that are even smaller than the classrooms and ~ 30-person study sections in the observed LC. Such actions could be especially fruitful for fostering network connections and supporting social integration for students from diverse backgrounds and other groups historically at greater risk of STEM attrition.