Introduction

Social interactions among individuals are common in both wild and domestic populations, and in animals, plants and microorganisms (Frank, 2007). With social interactions, the trait value of an individual may be affected by genes in other individuals, a phenomenon known as indirect genetic effects (IGEs; Griffing, 1967, 1976; Moore et al., 1997; Wolf et al., 1998). An IGE is a heritable effect of one individual on the trait value of another individual (reviewed in Wolf et al., 1998; Bijma, 2011a). A well-known example is the maternal genetic effect of a mother on preweaning growth rate of her offspring (Willham, 1963; Falconer, 1965; Kirkpatrick and Lande, 1989).

IGEs may have significant effects on the rate and direction of response to selection, and can substantially increase or decrease heritable variation in a trait (Griffing, 1967; Moore et al., 1997; Bijma and Wade, 2008; McGlothlin and Brodie III, 2009; Wilson et al., 2011; Bijma, 2011b). Thus, knowledge of IGEs is essential for understanding response to selection in socially affected traits. The magnitude of IGEs can be estimated using linear mixed models that include a direct genetic effect for the individual producing the record, and an IGE for each of its social partners (Arango et al., 2005; Muir, 2005; Bijma et al., 2007). This approach has been used both in agricultural populations of animals and plants (see, for example, Muir, 2005; Costa e Silva et al., 2013), and in natural populations (see, for example, Wilson et al., 2011). Bijma (2010a) showed that estimation of genetic parameters for IGEs in group-structured populations can be optimized by placing two families in each group. Such schemes are an attractive breeding design because they also yield a relatively high response to selection (Odegard and Olesen, 2011).

In the linear mixed model commonly used to estimate IGEs (Muir, 2005), it is assumed that an individual expresses the same IGE on each of its social partners, irrespective of whether a partner is its family member or an unrelated individual. Kin selection theory, however, predicts that individuals behave more cooperatively towards their relatives, because this increases their inclusive fitness (Hamilton, 1964). Hence, IGEs expressed on kin may differ systematically from those expressed on strangers; they may differ not only in average level, but also show incomplete correlation. Empirical evidence indeed suggests that kin recognition and preferential behaviour towards kin are widespread in both animals and plants (see, for example, Holmes and Sherman, 1982; Hepper, 1986; Olsen, 1989; Dudley and File, 2007; Biedrzycki and Bais, 2010), and at least four mechanisms for kin recognition have been described (Tang-Martinez, 2001; Mateo and Holmes, 2004; Mateo, 2004; Coffin et al., 2011).

When individuals express a different IGE on kin versus strangers, estimated breeding values for direct and indirect effects from the common linear mixed model are incorrect, and selection based on those estimates will yield suboptimal response. Moreover, when IGEs are estimated from groups composed of strangers (see, for example, Ellen et al., 2008), the resulting estimates may not accurately reflect the IGEs that occur in the relevant natural or domestic populations, which may consist of kin groups. In natural populations, limited dispersal often leads to interactions among relatives (Hamilton, 1964), whereas in livestock populations such as domestic pigs, groups often contain a number of family members (Chen et al., 2008). Thus, a potential difference between IGEs on kin vs strangers is relevant for both livestock and natural populations. The current statistical methods for estimating IGEs, however, ignore the dependency of IGEs on relatedness.

Here we propose a model for traits affected by IGEs that differ between kin and strangers, investigate whether the genetic parameters of that model are statistically identifiable and develop statistical models to estimate those parameters. First, we show that the full set of genetic parameters is not identifiable when IGEs differ between kin and strangers. Subsequently, we develop a reduced model and show that it can estimate meaningful linear combinations of the genetic parameters. In the Discussion, we consider population structures that may allow estimating the full set of genetic parameters.

Quantitative genetic model

Trait model

This section introduces the trait model when IGEs differ between kin and strangers. We consider here a population stratified into groups of n members each, where interactions occur within groups. We consider the scheme that is optimal for the estimation of IGEs in the absence of kin recognition (Bijma, 2010a). In this scheme, each group is composed of members of two families, each family contributing n/2 individuals. Generalization of results to other group structures is addressed in the Discussion.

In traditional quantitative genetics, the phenotypic value of individual i is the sum of a heritable component, $A_i$, known as the breeding value, and a nonheritable residual, $E_i$ (Falconer and Mackay, 1996; see Table 1 for a notation key),

$$P_i = A_i + E_i. \tag{1}$$

Table 1 Notation key

With IGEs that do not depend on relatedness, the phenotype of an individual stems from two components: a direct effect originating from the individual itself, and the sum of indirect effects originating from each of its n−1 group mates (Griffing, 1967),

$$P_i = A_{D,i} + E_{D,i} + \sum_{j \neq i}^{n} \left(A_{S,j} + E_{S,j}\right), \tag{2}$$

where i denotes the focal individual, j a group mate, $A_{D,i}$ the direct genetic effect (DGE) of i, $E_{D,i}$ the corresponding non-heritable direct effect, $A_{S,j}$ the IGE of group mate j and $E_{S,j}$ the corresponding non-heritable indirect effect (subscript S, suggesting ‘social’, is used to denote indirect effects instead of a subscript I, to avoid confusion of i with I). Equation 2 is known as a variance component model of IGEs, as opposed to a trait-based model (see McGlothlin and Brodie III, 2009, for a comparison of models). Equation 2 contains two kinds of genetic effects: direct effects, $A_D$, and indirect effects, $A_S$. Hence, fitting Equation 2 involves the estimation of three genetic variance components: $\sigma^2_{A_D}$, $\sigma_{A_{DS}}$ and $\sigma^2_{A_S}$ (throughout, $\sigma^2$ denotes a variance and $\sigma$ a covariance).

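With different interactions among kin versus strangers, two types of IGEs may be distinguished: IGEs on kin versus IGEs on strangers. In our population structure, where n/2 members of each family make up a group, the trait model becomes:

$$P_i = A_{D,i} + E_{D,i} + \sum_{j=1}^{n/2-1} \left(A_{Sf,j} + E_{Sf,j}\right) + \sum_{k=1}^{n/2} \left(A_{Su,k} + E_{Su,k}\right), \tag{3}$$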

where j denotes a family member of i, k a member of the other family in the group, $n/2-1$ the number of group mates of i from its own family, $n/2$ the number of group mates of i from the other family, subscript ‘Sf’ denotes IGEs on family members and subscript ‘Su’ denotes IGEs on members of the other, unrelated, family (u indicating ‘unrelated’). Equation 3 contains three genetic effects: direct effects, $A_D$, IGEs on family members, $A_{Sf}$, and IGEs on strangers, $A_{Su}$. Hence, fitting Equation 3 involves the estimation of six genetic variance components: three variances, $\sigma^2_{A_D}$, $\sigma^2_{A_{Sf}}$ and $\sigma^2_{A_{Su}}$, and three covariances, $\sigma_{A_{DSf}}$, $\sigma_{A_{DSu}}$ and $\sigma_{A_{SfSu}}$. The genetic correlation between an individual’s IGE on kin and its IGE on strangers, $r_{A_{SfSu}}$, reflects the difference between IGEs on kin and strangers. Equation 3 does not explicitly include a potential difference in the mean value of the IGE on kin vs strangers, because this has little consequence for the estimation of genetic parameters. Nevertheless, such a difference is relevant in statistical data analysis, and can be accommodated easily in the fixed effects part of the model (see Discussion).
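The trait model in Equation 3 can be illustrated with a minimal Python sketch that computes the phenotypes of one group from the effects of its members. The function name `group_phenotypes`, the helper `member` and all numeric values are hypothetical and chosen only for illustration; non-heritable effects are set to zero to keep the arithmetic easy to follow.

```python
def member(family, AD, ASf, ASu, ED=0.0, ESf=0.0, ESu=0.0):
    """Bundle the effects of one individual (hypothetical helper)."""
    return {"family": family, "AD": AD, "ED": ED,
            "ASf": ASf, "ESf": ESf, "ASu": ASu, "ESu": ESu}


def group_phenotypes(members):
    """Phenotype of each group member under the kin-dependent trait model.

    Each focal individual receives its own direct effect, the kin-directed
    indirect effect (Sf) of every group mate from its own family, and the
    stranger-directed indirect effect (Su) of every group mate from the
    other family.
    """
    phenotypes = []
    for i, focal in enumerate(members):
        p = focal["AD"] + focal["ED"]
        for j, mate in enumerate(members):
            if j == i:
                continue
            if mate["family"] == focal["family"]:
                p += mate["ASf"] + mate["ESf"]
            else:
                p += mate["ASu"] + mate["ESu"]
        phenotypes.append(p)
    return phenotypes


# A group of n = 4: two members from each of two families (illustrative values).
group = [
    member(family=1, AD=1.0, ASf=0.5, ASu=-0.25),
    member(family=1, AD=0.5, ASf=0.25, ASu=-0.5),
    member(family=2, AD=-0.5, ASf=0.75, ASu=0.0),
    member(family=2, AD=0.0, ASf=-0.25, ASu=1.0),
]
phenotypes = group_phenotypes(group)
print(phenotypes)
```

Each individual carries both an IGE on kin and an IGE on strangers, but which of the two is expressed on a given group mate depends on whether that mate belongs to the same family.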

Total breeding value and heritable variation

This section presents the heritable variation available for response to selection in a trait when IGEs differ between kin and strangers.

Irrespective of the trait model, response to selection in any trait can be expressed as

$$R = \Delta\bar{A}_T = \iota\,\rho\,\sigma_{A_T}, \tag{4}$$

where R is the genetic change in mean trait level from one generation to the next because of selection, $\Delta\bar{A}_T$ the change in mean total breeding value ($A_T$) of the population, ι the intensity of selection, ρ the accuracy of selection and $\sigma_{A_T}$ the s.d. in total breeding value (Bijma, 2011a; an equivalent expression in terms of a selection gradient can also be found there, and may be more appropriate for natural populations). In the context of Equation 4, the accuracy of selection is the correlation between an individual’s value for the selection criterion and its total breeding value (this definition applies to any selection criterion; see Falconer and Mackay, 1996 for further explanation of the ‘accuracy of selection’). The total breeding value represents the average impact of an individual’s genes on the mean trait value of the population, and is a generalization of the traditional breeding value to account for IGEs and to allow modelling of so-called emergent traits (Bijma, 2011b). Thus, analogous to the classical breeding value, the total breeding value represents an individual’s value for response to selection. As illustrated in Equation 4, in which ι and ρ are standardized parameters, the s.d. in total breeding value represents the intrinsic potential of a population to respond to selection.
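As a small numerical illustration of Equation 4, the sketch below computes the expected response from the selection intensity, the accuracy and the variance in total breeding value. The function name and the input values are hypothetical; they are not taken from the study.

```python
import math


def response_to_selection(intensity, accuracy, var_total_bv):
    """Expected response R = intensity * accuracy * s.d. of total breeding value."""
    return intensity * accuracy * math.sqrt(var_total_bv)


# Illustrative inputs: intensity 1.4, accuracy 0.5, variance in total breeding value 4.0
R_example = response_to_selection(intensity=1.4, accuracy=0.5, var_total_bv=4.0)
print(R_example)
```

Because the intensity and accuracy are standardized, doubling the s.d. in total breeding value doubles the expected response for the same selection scheme.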

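For any trait model, the total breeding values follow from the genetic mean of the population (Bijma, 2011b). From Equation 3, the genetic mean of the trait value for our population structure equals

$$\bar{P} = \bar{A}_D + \left(\tfrac{1}{2}n - 1\right)\bar{A}_{Sf} + \tfrac{1}{2}n\,\bar{A}_{Su}.$$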

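Therefore, following Bijma (2011b), an individual’s total breeding value is the sum of its DGE, $\left(\tfrac{1}{2}n-1\right)$ times its IGE on family members, and $\tfrac{1}{2}n$ times its IGE on strangers,

$$A_{T,i} = A_{D,i} + \left(\tfrac{1}{2}n - 1\right)A_{Sf,i} + \tfrac{1}{2}n\,A_{Su,i}.$$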

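Taking the variance of the total breeding value yields an expression for the heritable variation available for response to selection,

$$\sigma^2_{A_T} = \sigma^2_{A_D} + \left(\tfrac{1}{2}n - 1\right)^2\sigma^2_{A_{Sf}} + \left(\tfrac{1}{2}n\right)^2\sigma^2_{A_{Su}} + 2\left(\tfrac{1}{2}n - 1\right)\sigma_{A_{DSf}} + n\,\sigma_{A_{DSu}} + n\left(\tfrac{1}{2}n - 1\right)\sigma_{A_{SfSu}}.$$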

Note that $\sigma^2_{A_T}$ does not reflect the additive genetic component of phenotypic variance, but the heritable variation that determines the potential of a population to respond to selection (see Equation 4 and Bijma, 2011b).
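As a worked illustration, consider groups of four (n = 4), so that each individual has one group mate from its own family and two from the other family. Writing $\sigma_{A_{DSf}}$, $\sigma_{A_{DSu}}$ and $\sigma_{A_{SfSu}}$ for the three genetic covariances between direct effects, IGEs on kin and IGEs on strangers, the definitions above give:

```latex
\begin{align*}
A_{T,i} &= A_{D,i} + A_{Sf,i} + 2A_{Su,i}, \\
\sigma^2_{A_T} &= \sigma^2_{A_D} + \sigma^2_{A_{Sf}} + 4\sigma^2_{A_{Su}}
  + 2\sigma_{A_{DSf}} + 4\sigma_{A_{DSu}} + 4\sigma_{A_{SfSu}}.
\end{align*}
```

In this case, IGEs on strangers carry twice the weight of IGEs on kin in the total breeding value, simply because each individual has twice as many unrelated group mates as related ones.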

An individual’s total breeding value can be partitioned into a family component, $A_{TF,i}$, which summarizes all its heritable effects on family members (including the direct effect on itself) and is considered the family breeding value here, and a non-family component, $A_{TU,i}$, the non-family breeding value. This partitioning will be used below, where the family components of the total breeding value will be grouped for reasons of statistical identifiability. With each family contributing $\tfrac{1}{2}n$ group members,

$$A_{T,i} = A_{TF,i} + A_{TU,i}, \qquad A_{TF,i} = A_{D,i} + \left(\tfrac{1}{2}n - 1\right)A_{Sf,i}, \qquad A_{TU,i} = \tfrac{1}{2}n\,A_{Su,i}. \tag{7}$$

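Taking the variances of Equation 7 yields

$$\sigma^2_{A_{TF}} = \sigma^2_{A_D} + 2\left(\tfrac{1}{2}n - 1\right)\sigma_{A_{DSf}} + \left(\tfrac{1}{2}n - 1\right)^2\sigma^2_{A_{Sf}}, \qquad \sigma^2_{A_{TU}} = \left(\tfrac{1}{2}n\right)^2\sigma^2_{A_{Su}}. \tag{8}$$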
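The partitioning into a family and a non-family breeding value can be checked numerically. The sketch below computes the variance of the total breeding value directly from the six genetic parameters, and again from the family variance, the non-family variance and the covariance between the two components. Function names, argument names and parameter values are hypothetical and serve only as an illustration.

```python
def tbv_variance(n, var_d, var_sf, var_su, cov_d_sf, cov_d_su, cov_sf_su):
    """Variance of A_T = A_D + (n/2 - 1) * A_Sf + (n/2) * A_Su."""
    kin = n / 2 - 1       # number of group mates from the own family
    strangers = n / 2     # number of group mates from the other family
    return (var_d + kin ** 2 * var_sf + strangers ** 2 * var_su
            + 2 * kin * cov_d_sf + 2 * strangers * cov_d_su
            + 2 * kin * strangers * cov_sf_su)


def family_partition(n, var_d, var_sf, var_su, cov_d_sf, cov_d_su, cov_sf_su):
    """Variances of, and covariance between, family and non-family breeding values."""
    kin = n / 2 - 1
    strangers = n / 2
    var_family = var_d + 2 * kin * cov_d_sf + kin ** 2 * var_sf
    var_nonfamily = strangers ** 2 * var_su
    cov_family_nonfamily = strangers * (cov_d_su + kin * cov_sf_su)
    return var_family, var_nonfamily, cov_family_nonfamily


# Illustrative parameter values for groups of four
params = dict(n=4, var_d=1.0, var_sf=0.25, var_su=0.25,
              cov_d_sf=0.125, cov_d_su=-0.125, cov_sf_su=0.0625)
total = tbv_variance(**params)
var_f, var_u, cov_fu = family_partition(**params)
print(total, var_f + 2 * cov_fu + var_u)
```

The two routes give the same total, which is why grouping the family components loses no information about the heritable variation available for response to selection.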

Variance component estimation

Genetic parameters can be estimated using a linear mixed model including correlated random genetic effects, the so-called animal model (Henderson, 1953, 1975; Lynch and Walsh, 1998). The classical animal model includes DGEs only, but can be extended with IGEs (Muir, 2005).

Full model

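The full model includes DGEs, IGEs on family members and IGEs on strangers,

$$\mathbf{y} = \mathbf{Xb} + \mathbf{Z}_D\mathbf{a}_D + \mathbf{Z}_{Sf}\mathbf{a}_{Sf} + \mathbf{Z}_{Su}\mathbf{a}_{Su} + \mathbf{Wg} + \mathbf{e}, \tag{9}$$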

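where $\mathbf{y}$ is the vector of observations, $\mathbf{b}$ is a vector of fixed effects with incidence matrix $\mathbf{X}$, $\mathbf{a}_D$ is a vector of DGEs with incidence matrix $\mathbf{Z}_D$ linking observations on individuals to their own DGE, $\mathbf{a}_{Sf}$ is a vector of IGEs on family members with incidence matrix $\mathbf{Z}_{Sf}$ linking observations on individuals to the IGEs of their group mates belonging to the same family, $\mathbf{a}_{Su}$ is a vector of IGEs on strangers with incidence matrix $\mathbf{Z}_{Su}$ linking observations on individuals to the IGEs of their group mates belonging to the other family, $\mathbf{g}$ is a vector of random group effects, with $\mathrm{var}(\mathbf{g}) = \mathbf{I}\sigma^2_g$ and incidence matrix $\mathbf{W}$ linking records to groups, and $\mathbf{e}$ is a vector of residuals with $\mathrm{var}(\mathbf{e}) = \mathbf{I}\sigma^2_e$, where $\mathbf{I}$ is an identity matrix. The covariance structure of the genetic terms is

$$\mathrm{var}\begin{bmatrix}\mathbf{a}_D \\ \mathbf{a}_{Sf} \\ \mathbf{a}_{Su}\end{bmatrix} = \mathbf{C} \otimes \mathbf{A},$$

where

$$\mathbf{C} = \begin{bmatrix}\sigma^2_{A_D} & \sigma_{A_{DSf}} & \sigma_{A_{DSu}} \\ \sigma_{A_{DSf}} & \sigma^2_{A_{Sf}} & \sigma_{A_{SfSu}} \\ \sigma_{A_{DSu}} & \sigma_{A_{SfSu}} & \sigma^2_{A_{Su}}\end{bmatrix},$$

$\otimes$ indicates the Kronecker product of matrices, and $\mathbf{A}$ is a matrix of additive genetic relationships between individuals, the so-called numerator relationship matrix (Henderson, 1985).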
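To make the Kronecker structure of the genetic covariance matrix concrete, the following sketch builds it for two full sibs in plain Python. The 3 × 3 matrix `C` holds the genetic (co)variances of direct effects, IGEs on kin and IGEs on strangers, and `A` holds the additive genetic relationships. The `kron` helper and all numeric values are hypothetical and only illustrate the layout of the resulting matrix.

```python
def kron(c, a):
    """Kronecker product of two matrices given as lists of lists."""
    rows_a, cols_a = len(a), len(a[0])
    rows, cols = len(c) * rows_a, len(c[0]) * cols_a
    return [[c[i // rows_a][j // cols_a] * a[i % rows_a][j % cols_a]
             for j in range(cols)]
            for i in range(rows)]


# Genetic (co)variances among (direct, IGE on kin, IGE on strangers); illustrative values
C = [[1.0, 0.125, -0.125],
     [0.125, 0.25, 0.0625],
     [-0.125, 0.0625, 0.25]]

# Additive genetic relationships between two non-inbred full sibs
A = [[1.0, 0.5],
     [0.5, 1.0]]

# Rows and columns are ordered as (direct effects of sibs 1 and 2,
# IGEs on kin of sibs 1 and 2, IGEs on strangers of sibs 1 and 2).
G = kron(C, A)
for row in G:
    print(row)
```

Each 2 × 2 block of `G` is the relationship matrix scaled by one element of `C`, so the covariance between any two effects of any two individuals is the product of a genetic (co)variance and a relationship coefficient.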

When fitting the full model, the results showed that there are multiple parameter combinations that give the same likelihood. Hence, using this model, the genetic parameters are statistically nonidentifiable. In particular, results showed that the variance of IGEs on strangers, $\sigma^2_{A_{Su}}$, is identifiable, but that the variance components referring to interactions between family members are fully confounded. We investigated why this occurs and found that there are only five informative genetic covariances in the data, but six genetic parameters to estimate (see Appendix A). Thus, when IGEs differ between kin and strangers, it is not possible to estimate all six genetic parameters from group-structured data. This is not a problem of the estimation method, but a property of the data structure, and occurs when group composition with respect to family is the same for all groups (see Discussion and Appendix A). Thus, the data structure that is optimal for estimating the variance of IGEs that do not depend on kin renders the estimation of kin-dependent IGEs impossible. In the Discussion, we consider alternative schemes that may allow estimating all parameters of the full model. Note that the variance structure given above for the residual of Equation 9 ignores the distinction between indirect effects on kin vs strangers. However, as the full model is nonidentifiable, we did not further investigate this issue.

Reduced model

Because the full model was not identifiable, we investigated a reduced model, aiming to estimate a subset of the genetic parameters or meaningful linear combinations of them. As the full model indicated that the effects due to the focal family were fully confounded, we fitted only a single term for the family of the focal individual. Therefore, the reduced model was

$$\mathbf{y} = \mathbf{Xb} + \mathbf{Z}_D\mathbf{a}_F + \mathbf{Z}_{Su}\mathbf{a}_{Su} + \mathbf{Wg} + \mathbf{e}, \tag{10}$$

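where $\mathbf{a}_F$ is a vector of genetic effects due to the family of the focal individual and $\mathbf{Z}_D$ is the incidence matrix for direct genetic effects as in the full model (Equation 9). Hence, with respect to the genetic terms, the only difference between the full and reduced model is that the term $\mathbf{Z}_{Sf}\mathbf{a}_{Sf}$ is omitted in Equation 10; the other genetic terms are the same. However, as omitting $\mathbf{Z}_{Sf}\mathbf{a}_{Sf}$ will change both the estimates and the interpretation of the ‘direct’ genetic effects, we write $\mathbf{Z}_D\mathbf{a}_F$ in Equation 10, where subscript F suggests ‘family’, rather than $\mathbf{Z}_D\mathbf{a}_D$. The covariance structure of the genetic terms in Equation 10 is

$$\mathrm{var}\begin{bmatrix}\mathbf{a}_F \\ \mathbf{a}_{Su}\end{bmatrix} = \mathbf{C}_R \otimes \mathbf{A},$$

where

$$\mathbf{C}_R = \begin{bmatrix}\sigma^2_{A_F} & \sigma_{A_{F,Su}} \\ \sigma_{A_{F,Su}} & \sigma^2_{A_{Su}}\end{bmatrix}.$$

The $\mathbf{Wg}$ term is as in Equation 9. The covariance structure for the residual term is

$$\mathrm{var}(\mathbf{e}) = \mathbf{R}\sigma^2_e, \tag{11}$$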

where $R_{ii}=1$, $R_{ij}=\rho$ when i and j are group mates from the same family, and $R_{ij}=0$ otherwise. Hence, this structure allows for a covariance between residuals of group mates belonging to the same family. Thus, when individuals are ordered by group and by family within group, $\mathbf{R}$ is block-diagonal, with blocks of size n/2, diagonal elements equal to 1, off-diagonals within blocks equal to ρ, all other off-diagonals equal to zero, and two blocks per group, one for each family. Appendix B shows that this residual variance structure together with the random group effect corresponds to the nongenetic variance structure generated by the assumed true model (Equation 3). Thus, the $\mathbf{Wg}+\mathbf{e}$ in Equation 10 accounts for the variance structure generated by the non-heritable terms, $E_{D,i} + \sum_{j}E_{Sf,j} + \sum_{k}E_{Su,k}$, in Equation 3.
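The residual correlation structure described above can be sketched as follows. The function builds the matrix from the group and family label of each record; its name, the labels and the value of ρ are hypothetical and serve only as an illustration.

```python
def residual_correlation(group_ids, family_ids, rho):
    """Residual correlation matrix: 1 on the diagonal, rho between group mates
    of the same family, and 0 for all other pairs of records."""
    n_records = len(group_ids)
    R = [[0.0] * n_records for _ in range(n_records)]
    for i in range(n_records):
        for j in range(n_records):
            if i == j:
                R[i][j] = 1.0
            elif group_ids[i] == group_ids[j] and family_ids[i] == family_ids[j]:
                R[i][j] = rho
    return R


# Two groups of n = 4, records ordered by group and by family within group.
# Family 1 occurs in both groups; its members in different groups are uncorrelated.
group_ids = [0, 0, 0, 0, 1, 1, 1, 1]
family_ids = [1, 1, 2, 2, 1, 1, 3, 3]
R = residual_correlation(group_ids, family_ids, rho=0.3)
for row in R:
    print(row)
```

With this ordering the matrix is block-diagonal with 2 × 2 blocks, two per group, as described in the text.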

Investigation of Equation 10 showed that there are five informative genetic covariances in the data to estimate three genetic parameters, indicating that the model in Equation 10 is identifiable. To investigate the interpretation of the genetic estimates from the reduced model, we derived their expectation, assuming that the data are generated by the model given in Equation 3 (Appendix A). With

and

it follows that

Equations 12c–e sum up all the variance components considered in the true model. Equation 12e shows that the reduced model yields an estimate of the variance of IGEs on strangers. Moreover, combining Equations 12c–e with the decomposition of the total breeding value into a family and a non-family component given in Equations 7 and 8 above shows that the reduced model yields estimates of the family and non-family genetic parameters,

Thus, the variance of the total breeding value can be obtained from the reduced model as

Thus, the reduced model allows the estimation of the total heritable variation, even though not all the underlying parameters are identifiable.

Equation 13b refers to the covariance between the family breeding value and the non-family breeding value. This is a meaningful linear combination, as it expresses the covariance between genetic effects on kin (including self) and those on strangers. If this covariance is positive, members from different families are cooperative, whereas a negative value indicates competition between families.

Appendix B shows that the expectations of the nongenetic variance components in Equation 10 are given by

Equations 14a–c show that the underlying nongenetic parameters are not uniquely identifiable, because there are only three estimable parameters ($\sigma^2_g$, $\sigma^2_e$ and ρ) that are a function of six unknowns. This was expected, as it is also the case for models not distinguishing between IGEs on kin vs strangers (Bijma et al., 2007).

Consequences of ignoring kin-dependent IGEs

This section investigates the bias in the estimated genetic parameters when IGEs differ between kin and strangers, although this is ignored in the statistical analysis. Thus, it is assumed that the true model generating the trait values is given by Equation 3 above, which distinguishes between IGEs on kin and strangers, whereas the statistical model used to estimate genetic parameters is the traditional direct–indirect mixed linear model (Muir, 2005),

$$\mathbf{y} = \mathbf{Xb} + \mathbf{Z}_D\mathbf{a}_D + \mathbf{Z}_S\mathbf{a}_S + \mathbf{Wg} + \mathbf{e}, \tag{15}$$

where $\mathbf{a}_S$ is a vector of IGEs on group mates, not distinguishing between kin and strangers, and $\mathbf{Z}_S$ is an incidence matrix linking observations on individuals to the IGEs of all their group mates. The $\mathbf{Z}_D\mathbf{a}_D$ and $\mathbf{Wg}$ terms are as in the full model (Equation 9), whereas the residual variance structure is as in the reduced model (Equation 11; see discussion below). Note that Equation 15 differs from the reduced model (Equation 10) because the term $\mathbf{Z}_S\mathbf{a}_S$ includes the IGEs of all n–1 group mates, and not only those belonging to the other family making up the group.
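The difference between the incidence matrices of the full, reduced and traditional models can be illustrated with a short sketch. It assumes one record per individual and builds the matrices from group and family labels; the function name and the labels are hypothetical.

```python
def incidence_matrices(group_ids, family_ids):
    """Build Z_D, Z_Sf, Z_Su and Z_S for one record per individual.

    Row i refers to the record of individual i; column j refers to the
    genetic effect of individual j.
    """
    n = len(group_ids)
    Z_D = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    Z_Sf = [[0] * n for _ in range(n)]
    Z_Su = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or group_ids[i] != group_ids[j]:
                continue
            if family_ids[i] == family_ids[j]:
                Z_Sf[i][j] = 1   # group mate from the same family
            else:
                Z_Su[i][j] = 1   # group mate from the other family
    # The traditional model links each record to all n - 1 group mates.
    Z_S = [[Z_Sf[i][j] + Z_Su[i][j] for j in range(n)] for i in range(n)]
    return Z_D, Z_Sf, Z_Su, Z_S


# One group of n = 4 with two members from each of two families
Z_D, Z_Sf, Z_Su, Z_S = incidence_matrices([0, 0, 0, 0], [1, 1, 2, 2])
for name, Z in (("Z_Sf", Z_Sf), ("Z_Su", Z_Su), ("Z_S", Z_S)):
    print(name, Z)
```

By construction, the incidence matrix of the traditional model is the sum of the kin and stranger incidence matrices of the full model, whereas the reduced model keeps only the stranger part alongside the direct term.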

To investigate the bias resulting from fitting a conventional IGE model (Equation 15) to data in which IGEs differ between kin and strangers, we derived the expectations of the estimated breeding values and variance components produced by Equation 15 when data are generated by Equation 3. Those expectations follow from the informative covariances in the data that Equation 15 utilizes to estimate the genetic parameters, and can be obtained using the method of Bijma (2010a; see Appendix C). Results showed that

These results show that the direct genetic variance and the direct–indirect genetic covariance estimated with the conventional linear model for IGEs (Equation 15) are biased when IGEs depend on relatedness. In other words, the estimate of $\sigma^2_{A_D}$ is biased because the right-hand side of Equation 16a differs from $\sigma^2_{A_D}$. Similarly, the right-hand side of Equation 16b shows that the estimated direct–indirect genetic covariance is biased. Moreover, the estimated indirect genetic variance from Equation 15 refers to the magnitude of IGEs expressed on strangers (Equation 16c). Surprisingly, despite the incorrect model assumptions, the traditional direct–indirect model yields an unbiased estimate of the total heritable variance (Equation 16d). Note that the results in Equations 16a–d are correct only if the residual covariance structure accounts for differences between indirect effects on kin and strangers, as given by Equation 11. Therefore, when the aim is to estimate total breeding values (TBVs) using the traditional direct–indirect mixed model of Muir (2005), this model should be implemented including a random group effect and the residual variance structure given in Equation 11 above.

We did not attempt to derive the expectations of estimated genetic parameters from the traditional direct–indirect model (Equation 15) when the residual variance structure is incorrect (that is, different from that given in Equation 11). The reason is that those expectations will depend not only on the assumed true genetic model (Equation 3), but also on the data structure. For example, in data consisting of many groups, the covariance between relatives in different groups will dominate the estimates, and incorrect covariances within groups may have little effect. In that case, estimates may be close to values given in Equation 16. On the other hand, when groups are fewer, information from the within-group (co)variances will become more important and results may deviate more from Equation 16.

Simulation

Methods

We used Monte Carlo simulation to validate the theoretical relationships between the true model, the reduced model and the traditional model presented above (Equations 12). Data were generated under the model in Equation 3, and analysed using either the reduced model in Equation 10 or the traditional model in Equation 15, using the ASReml software (Gilmour et al., 2006). A population of two discrete generations was simulated using R (R Development Core Team, 2011). No fixed effects were simulated. The base generation consisted of 100 sires and 1000 dams, which were unrelated. To produce the second generation, sires and dams of the first generation were mated at random, each sire being mated to 10 dams, and each dam producing 10 full-sib offspring. Individuals of the second generation were kept in 2500 groups of 4 individuals each, and each group consisted of two full-sib families, each family contributing two individuals. Table 2 shows the range of genetic parameters simulated. For each set of genetic parameters, estimates were averaged over 100 replicates. Details of the simulation are given in Appendix D.
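The mating and grouping design described above can be sketched in a few lines. This is only one possible way to allocate offspring to groups and is not taken from the study (whose details are in Appendix D): families are paired at random, and each pair of families fills five groups. The function name, the seed and the pairing rule are assumptions made for illustration.

```python
import random


def build_design(n_sires=100, dams_per_sire=10, offspring_per_dam=10,
                 per_family_in_group=2, seed=1):
    """Return full-sib families and groups made of two families each.

    Assumption: dams are assigned to sires in order, which is equivalent to
    random mating here because base animals are unrelated. The sketch does
    not prevent two paternal half-sib families from sharing a group.
    """
    rng = random.Random(seed)
    families = []
    next_id = 0
    for sire in range(n_sires):
        for d in range(dams_per_sire):
            members = list(range(next_id, next_id + offspring_per_dam))
            next_id += offspring_per_dam
            families.append({"sire": sire, "dam": sire * dams_per_sire + d,
                             "members": members})
    rng.shuffle(families)  # random pairing of families
    groups = []
    for fam_a, fam_b in zip(families[0::2], families[1::2]):
        for start in range(0, offspring_per_dam, per_family_in_group):
            stop = start + per_family_in_group
            groups.append(fam_a["members"][start:stop] + fam_b["members"][start:stop])
    return families, groups


families, groups = build_design()
print(len(families), "families,", len(groups), "groups of", len(groups[0]))
```

With the default arguments this yields 1000 full-sib families and 2500 groups of four, each containing two members from each of two families, matching the dimensions given in the text.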

Table 2 Parameter values used for validation of reduced and traditional models

Results

Table 3 shows a comparison between simulated and estimated values for and from the reduced model for different magnitudes of IGEs. We used of either 50 or 25% of to represent high or low indirect effects, and , of either 10, 12.5, 20 and 25% of ,, to represent high or low heritability of IGE, and a range of genetic correlations between direct effects, indirect effects on kin and indirect effects on strangers (Table 2). Results show close agreement between simulated and estimated values as proven by the relative error that is 5% in all cases (those small errors originate from stochasticity among replicates, and do not indicate systematic bias). These results confirm the theoretical relationships between the full and reduced models presented in Equations 12 and 13. Thus, the reduced model yields unbiased genetic parameters of the family and non-family breeding values, and of the total breeding value. We also compared the estimated nongenetic components with their expectations given in Equation 14, showing close agreement (results not shown).

Table 3 Errors in estimates for the reduced model

Table 4 shows a comparison between the theoretically expected values of the estimated variance components from the traditional model (Equation 16) and the empirical values estimated from the simulated data using the traditional model (Equation 15). Results confirm the theoretical expectation that the traditional direct–indirect model yields biased estimates of the direct genetic variance and the direct–indirect genetic covariance, but unbiased estimates of the genetic variance of IGEs on strangers and of the total genetic variance.

Table 4 Comparison of the expected (Equation 16) and empirical estimates for the traditional model

Discussion

We have proposed a quantitative genetic model and investigated methodology to estimate the genetic parameters of traits affected by IGEs when those IGEs differ systematically between kin and strangers. Results show that the full set of genetic parameters for the full model is not statistically identifiable. We also presented a reduced model that yields unbiased estimates of meaningful linear combinations of genetic parameters: the variance of the family breeding value, the covariance between family breeding value and IGEs on strangers and the variance of IGEs on strangers. The reduced model also provides estimates of the variance in total breeding value, and predictions of the total breeding values of individuals.

An interesting question is whether experimental designs exist that allow estimating all six genetic parameters of the full model (Equation 3). Our results show that this is not possible when pairs of individuals can be categorized into either kin or unrelated, each category shows a different IGE and group composition is the same for all groups. As long as group composition with respect to family is the same for all groups, this situation results in full confounding of the direct effect and the IGE on kin, irrespective of the composition of the groups (that is, 50/50, 25/75 and so on; Appendix A).

When differences in IGE originate from factors that usually go together with relatedness such as familiarity, rather than from relatedness per se, experimental designs that disconnect relatedness from those factors may allow estimation of the full set of genetic parameters. For example, when individuals recognize each other because of prior association (see Introduction), relatives who grow up together will recognize each other and adjust their behaviour, whereas relatives who grow up separately will interact similarly to unrelated individuals. This may, for example, occur in mammals such as grey mouse lemur (Kessler et al., 2012) or rats (Hepper, 1983, 1986), where full siblings often grow up in the same litter, whereas paternal half siblings grow up in different environments. Our preliminary investigations show that all six genetic parameters are statistically identifiable in this situation when groups consist of a mix of full sibs, half sibs and unrelated individuals. A statistically more powerful approach may come from cross-fostering designs, where full siblings that grow up in different litters may interact as if they were unrelated. When cross-fostering is impossible and a mix of full and half siblings is unavailable, a solution may come from utilizing the variation in relatedness among pairs of full siblings, estimated using genome-wide genetic markers (Hill, 1993; Visscher et al., 2006). However, as variation in relatedness among full siblings is limited, this approach will require large sample sizes.

When relatedness itself (as opposed to, for example, familiarity) is the causal factor underlying a difference in IGE, it would seem unlikely that the full set of genetic parameters can be identified. When individuals adjust their behaviour according to their relatedness to the recipient of the behaviour, as predicted by kin selection theory (Hamilton, 1964), any covariance between trait values of individuals is a function of relatedness and of the genetic parameters of interest, which themselves depend on this relatedness. This would seem to suggest full confounding.

However, variation in group composition seems to offer a solution. For example, having three different group compositions in a population may allow estimating all six genetic parameters. The first composition may have unrelated individuals only, the second may have two family members supplemented with unrelated individuals and the third may have three family members supplemented with unrelated individuals. From the first composition, the direct genetic variance, the covariance between direct effects and IGEs on strangers, and the variance of IGEs on strangers can be estimated using the traditional direct–indirect mixed model (Muir, 2005). Then, using the reduced model, the variance of the family genetic effect can be estimated from the second composition and again from the third composition, as well as its covariance with IGEs on strangers and, again, the variance of IGEs on strangers. Then, as the direct genetic variance is known from the first composition, this yields two equations with two unknowns, which can be solved to yield estimates of the variance of IGEs on kin and of the covariance between direct effects and IGEs on kin. Moreover, the estimated covariance between the family genetic effect and IGEs on strangers from either the second or third composition can be used to obtain the covariance between IGEs on kin and IGEs on strangers, because the covariance between direct effects and IGEs on strangers is known from the first composition (see Equation 12b). Then all six genetic parameters are estimated. Thus, variation in group composition with respect to family seems to allow estimating all six genetic parameters. Statistical power, however, may be very limited, and further complications may arise when IGEs depend on group size (Hadfield and Wilson, 2007; Bijma, 2010b), which we did not investigate here.
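The step of solving two equations with two unknowns can be sketched as follows. The sketch rests on an assumption that is not stated in this form in the text: that the family variance estimated from a composition in which each focal individual has m family group mates equals var_d + 2·m·cov_d_sf + m²·var_sf, by analogy with the variance of the family breeding value. Under that assumption, the second composition has m = 1 and the third has m = 2. All names and numbers are hypothetical.

```python
def solve_kin_parameters(var_d, var_family_m1, var_family_m2):
    """Solve for cov(direct, IGE on kin) and var(IGE on kin).

    Assumed relationships (illustrative, see lead-in):
        var_family_m1 = var_d + 2 * cov_d_sf + 1 * var_sf   (one family mate)
        var_family_m2 = var_d + 4 * cov_d_sf + 4 * var_sf   (two family mates)
    """
    a = var_family_m1 - var_d          # equals 2 * cov_d_sf + var_sf
    b = var_family_m2 - var_d          # equals 4 * cov_d_sf + 4 * var_sf
    var_sf = (b - 2 * a) / 2
    cov_d_sf = (a - var_sf) / 2
    return cov_d_sf, var_sf


# Round-trip check with illustrative true values
true_var_d, true_cov_d_sf, true_var_sf = 1.0, 0.125, 0.25
v1 = true_var_d + 2 * true_cov_d_sf + true_var_sf
v2 = true_var_d + 4 * true_cov_d_sf + 4 * true_var_sf
cov_d_sf, var_sf = solve_kin_parameters(true_var_d, v1, v2)
print(cov_d_sf, var_sf)
```

In practice each input would be an estimate with sampling error, so the solved values could be imprecise, in line with the limited statistical power noted in the text.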

When IGEs depend on relatedness, the traditional direct–indirect mixed model that ignores this dependency yields biased estimates of the direct genetic variance and the direct–indirect genetic covariance, but an unbiased estimate of the variance in total breeding value. Thus, even though the full set of genetic parameters is not statistically identifiable, the total heritable variance and total breeding values can be estimated, either using the reduced model or the traditional model. This is an important result, because kin-dependent IGEs appear to be widespread in natural and domestic populations of both animals and plants (see Introduction).

The reduced model and traditional model are statistically equivalent, that is, yield the same maximum likelihood, but represent different linear combinations of the underlying parameters. The main difference is that the estimates of the reduced model are biologically meaningful in the context of kin-selection theory (Hamilton, 1964), as they separate the effects on kin (the family breeding value) from those on unrelated individuals. The correlation between the family breeding value and IGE on strangers, for example, measures the degree of competition or cooperation between families. With the exception of the IGEs on strangers and the total breeding value, the estimates of the traditional model do not seem to have a clear biological meaning (Equation 16). Thus, the reduced model is preferable in terms of interpretation.

In this study, we have considered only the random effects; consequences of kin-dependent IGEs for the fixed effects to be included in the Xb term of the models have been ignored. When IGEs depend on relatedness, IGEs on kin vs strangers probably not only show incomplete correlation, but also differ systematically in level. In other words, individuals interacting primarily with kin probably receive more favourable IGEs than those interacting primarily with strangers, which creates a systematic difference in trait level between individuals interacting with different numbers of kin. This is not accounted for by the random effects in the model, because those are zero on average by construction. Hence, a fixed effect for the number of relatives an individual interacts with should be included in the model. This is similar to the inclusion of a fixed effect for the number of group mates when group size varies. Because estimation of a fixed effect with a few degrees of freedom is straightforward, we did not investigate this in detail. In our simulations, there was no need to account for such a fixed effect, because all individuals had the same number of kin and strangers among their group mates.

In animal and plant breeding, the focus is on improving the mean trait value of the population in the next generations. Theoretical studies have shown that group and kin selection methods utilize the total heritable variation for response to selection (Muir, 2005; Bijma et al., 2007; Ellen et al., 2008; McGlothlin et al., 2010). This theoretical expectation is supported by results from selection experiments that have used group and/or kin selection without explicit reference to the total breeding value (Wade, 1976, 1977; Goodnight, 1985; Muir, 1996). Whether or not this result extends to the situation where IGEs differ between kin and strangers is interesting, but has not been investigated to our knowledge.

To optimize selection for traits affected by interactions among individuals, the ideal selection criterion is the TBV of selection candidates estimated using all available information. This is because response to selection equals the change in mean TBV from one generation to the next, so that maximizing the accuracy of estimated TBVs also maximizes response to selection. Because Equation 4 is generally valid, this result holds irrespective of whether or not IGEs depend on relatedness (Bijma, 2011b). Hence, the availability of kin and group selection methods does not make estimated TBVs superfluous. Moreover, knowledge of the total heritable variance quantifies the intrinsic potential of a population to respond to selection, and therefore provides a measure of efficiency for breeding schemes (Bijma, 2011b). The variance in TBV, therefore, is an important parameter for both optimizing individual selection decisions and evaluating breeding schemes. This work has shown how the definition and estimation of the variance in TBV can be extended to schemes where IGEs differ between kin and strangers, which may contribute to breeding plan design and application.