Influence maximization in complex networks through optimal percolation

Morone, Flaviano; Makse, Hernán A.

doi:10.1038/nature14604

Letter
Published: 01 July 2015

Flaviano Morone¹ &
Hernán A. Makse¹

Nature volume 524, pages 65–68 (2015)

Subjects

Complex networks

A Corrigendum to this article was published on 28 October 2015

Abstract

The whole frame of interconnections in complex networks hinges on a specific set of structural nodes, much smaller than the total size, which, if activated, would cause the spread of information to the whole network¹, or, if immunized, would prevent the diffusion of a large scale epidemic^2,3. Localizing this optimal, that is, minimal, set of structural nodes, called influencers, is one of the most important problems in network science^4,5. Despite the vast use of heuristic strategies to identify influential spreaders^{6,7,8,9,10,11,12,13,14}, the problem remains unsolved. Here we map the problem onto optimal percolation in random networks to identify the minimal set of influencers, which arises by minimizing the energy of a many-body system, where the form of the interactions is fixed by the non-backtracking matrix¹⁵ of the network. Big data analyses reveal that the set of optimal influencers is much smaller than the one predicted by previous heuristic centralities. Remarkably, a large number of previously neglected weakly connected nodes emerges among the optimal influencers. These are topologically tagged as low-degree nodes surrounded by hierarchical coronas of hubs, and are uncovered only through the optimal collective interplay of all the influencers in the network. The present theoretical framework may hold a larger degree of universality, being applicable to other hard optimization problems exhibiting a continuous transition from a known phase¹⁶.

Main

The optimal influence problem was initially introduced in the context of viral marketing¹, and its solution was shown to be NP-hard⁴ for a generic class of linear threshold models of information spreading^17,18. Indeed, finding the optimal set of influencers is a many-body problem in which the topological interactions between them play a crucial role^13,14. On the other hand, there has been an abundant production of heuristic rankings to identify influential nodes and ‘superspreaders’ in networks^{6,7,8,9,10,11,12,19}. The main problem is that heuristic methods do not optimize a global function of influence. As a consequence, there is no guarantee of their performance.

Here we address the problem of quantifying nodes’ influence by finding the optimal (that is, minimal) set of structural influencers. After defining a unified mathematical framework for both immunization and spreading, we provide its optimal solution in random networks by mapping the problem onto optimal percolation. In addition, we present CI (Collective Influence), a scalable algorithm to solve the optimization problem in large-scale real data sets. The thorough comparison with competing methods (Supplementary Information section I²⁰) ultimately establishes the better performance of our algorithm. By taking into account collective influence effects, our optimization theory identifies a new class of strategic influencers, called ‘weak nodes’, which outrank the hubs in the network. Thus, the top influencers are highly counterintuitive: low-degree nodes play a major broker role in the network, and despite being weakly connected, can be powerful influencers.

The problem of finding the minimal set of activated nodes^17,18 to spread information to the whole network⁴ or to optimally immunize a network against epidemics¹¹ can be exactly mapped onto optimal percolation (see Supplementary Information section IIB). This mapping provides the mathematical support to the intuitive relation between influence and the concept of cohesion of a network: the most influential nodes are the ones forming the minimal set that guarantees a global connection of the network^5,9,10. We call this minimal set the ‘optimal influencers’ of the network. At a general level, the optimal influence problem can be stated as follows: find the minimal set of nodes which, if removed, would break down the network into many disconnected pieces. The natural measure of influence is, therefore, the size of the largest (giant) connected component as the influencers are removed from the network.

We consider a network of N nodes connected by M links, with an arbitrary degree distribution. Suppose we remove a fraction q of the nodes. It is well known from percolation theory²¹ that, if we choose these nodes randomly, the network undergoes a structural collapse at a certain critical fraction where the probability of existence of the giant connected component vanishes, G = 0. The optimal influence problem corresponds to finding the minimum fraction q_c of influencers needed to fragment the network: q_c = min{q ∈ [0, 1]: G(q) = 0}.
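As an illustration of the quantity G(q), the following minimal Python sketch measures the giant component of a small Erdős–Rényi graph while nodes are removed at random, that is, the non-optimized baseline. The function names, the graph size and the values of q are assumptions chosen for the example; in a finite graph G never vanishes exactly, so the collapse appears as G dropping to a value of order 1/N.

```python
import random
from collections import deque

def giant_component_fraction(adj, removed):
    """Fraction G of the N original nodes that lie in the largest connected
    component once the nodes in `removed` are deleted."""
    n_total = len(adj)
    seen = set(removed)
    largest = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:                      # breadth-first search of one component
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        largest = max(largest, size)
    return largest / n_total

def erdos_renyi(n, mean_degree, rng):
    """G(n, p) random graph with p = mean_degree / (n - 1), as adjacency sets."""
    p = mean_degree / (n - 1)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

rng = random.Random(0)
adj = erdos_renyi(500, 3.5, rng)
order = list(adj)
rng.shuffle(order)                        # random removal order
for q in (0.0, 0.3, 0.6, 0.9):
    removed = order[: int(q * len(order))]
    print(f"q = {q:.1f}  G = {giant_component_fraction(adj, removed):.3f}")
```

Replacing the shuffled order with a ranking of nodes turns the same loop into a test of any removal strategy.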

Let the vector n = (n₁,…, n_N) represent which node is removed (n_i = 0, influencer) or left (n_i = 1, the rest) in the network ($q = 1 - \frac{1}{N}\sum_i n_i$), and consider a link from i to j (i → j). The order parameter of the influence problem is the probability that i belongs to the giant component in a modified network where j is absent, $\nu_{i\to j}$ (refs 22, 23). Clearly, in the absence of a giant component we find $\{\nu_{i\to j} = 0\}$ for all i → j. The stability of the solution $\{\nu_{i\to j} = 0\}$ is controlled by the largest eigenvalue λ(n; q) of the linear operator $\hat{\mathcal{M}}$ defined on the 2M × 2M directed edges as $\mathcal{M}_{k\to\ell,\,i\to j} \equiv \partial\nu_{i\to j}/\partial\nu_{k\to\ell}\big|_{\{\nu_{i\to j}=0\}}$. We find for locally tree-like random graphs (see Fig. 1a and Supplementary Information section II):

$$\mathcal{M}_{k\to\ell,\,i\to j} = n_i\,\mathcal{B}_{k\to\ell,\,i\to j}, \qquad (1)$$

where $\hat{\mathcal{B}}$ is the non-backtracking matrix of the network^15,24. The matrix $\mathcal{B}_{k\to\ell,\,i\to j}$ has non-zero entries only when (k → ℓ, i → j) form a pair of consecutive non-backtracking directed edges, that is, (k → ℓ, ℓ → j) with k ≠ j. In this case $\mathcal{B}_{k\to\ell,\,\ell\to j} = 1$ (equation (13) in Supplementary Information). Powers of the matrix $\hat{\mathcal{B}}$ count the number of non-backtracking walks of a given length in the network (Fig. 1b)²⁴, much in the same way as powers of the adjacency matrix count the number of paths⁵. The operator $\hat{\mathcal{B}}$ has recently received a lot of attention thanks to its high performance in the problem of community detection^25,26. We show its topological power in the problem of optimal percolation.
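The definition of the non-backtracking matrix can be made concrete with a small Python sketch. It builds the matrix for an undirected edge list, with entry 1 when directed edge k → ℓ is followed by ℓ → j and j ≠ k, and uses matrix powers to count non-backtracking walks. The helper names and the two toy graphs are assumptions made for illustration, and the dense list-of-lists representation is only suitable for very small graphs.

```python
def non_backtracking_matrix(edges):
    """Return (directed_edges, B) for an undirected edge list.
    B[a][b] = 1 when directed edge a = (k -> l) is followed by
    b = (l -> j) with j != k, and 0 otherwise."""
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    size = len(directed)
    B = [[0] * size for _ in range(size)]
    for a, (k, l) in enumerate(directed):
        for b, (i, j) in enumerate(directed):
            if i == l and j != k:
                B[a][b] = 1
    return directed, B

def mat_mul(X, Y):
    """Plain dense matrix product for square lists of lists."""
    size = len(X)
    return [[sum(X[r][m] * Y[m][c] for m in range(size)) for c in range(size)]
            for r in range(size)]

def count_nb_walks(edges, steps):
    """Total number of non-backtracking walks made of `steps` edge-to-edge hops,
    obtained as the sum of all entries of B raised to the power `steps`."""
    _, B = non_backtracking_matrix(edges)
    P = B
    for _ in range(steps - 1):
        P = mat_mul(P, B)
    return sum(map(sum, P))

triangle = [(0, 1), (1, 2), (2, 0)]
path = [(0, 1), (1, 2)]
print(count_nb_walks(triangle, 5))  # 6: each directed edge keeps circling the loop
print(count_nb_walks(path, 2))      # 0: on a tree, walks die at the leaves
```

The contrast between the two graphs shows why loops, not leaves, sustain long non-backtracking walks.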

**Figure 1: The non-backtracking (NB) matrix and weak nodes.**

Stability of the solution $\{\nu_{i\to j} = 0\}$ requires λ(n; q) ≤ 1. The optimal influence problem for a given q (≥ q_c) can be rephrased as finding the optimal configuration n that minimizes the largest eigenvalue λ(n; q) (Fig. 1c). The optimal set n* of Nq_c influencers is obtained when the minimum of the largest eigenvalue reaches the critical threshold:

$$\lambda(\mathbf{n}^*; q_c) = 1. \qquad (2)$$

The formal mathematical mapping of the optimal influence problem to the minimization of the largest eigenvalue of the modified non-backtracking matrix for random networks, equation (2), represents our first main result.

An example of a non-optimized solution corresponds to choosing n_i at random and decoupled from the non-backtracking matrix^23,27 (random percolation²¹, Supplementary Information section IID). In the optimized case, we seek to derandomize the selection of the set n_i = 0 and optimally choose them to find the best configuration n* with the lowest q_c according to equation (2). The eigenvalue λ(n) (from now on we omit q in λ(n; q) ≡ λ(n), which is always kept fixed) determines the growth rate of an arbitrary vector w₀ with 2M entries after ℓ iterations of the matrix $\hat{\mathcal{M}}$: $|\mathbf{w}_\ell(\mathbf{n})| = |\hat{\mathcal{M}}^{\ell}\mathbf{w}_0|$. The largest eigenvalue is then calculated by the power method:

$$\lambda(\mathbf{n}) = \lim_{\ell\to\infty}\left[\frac{|\mathbf{w}_\ell(\mathbf{n})|}{|\mathbf{w}_0|}\right]^{1/\ell}. \qquad (3)$$
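The power-method estimate can be sketched in Python without building the 2M × 2M matrix explicitly. The sketch below assumes the modified operator acts on a vector over directed edges as w_{i→j} ← n_i Σ_{k∈∂i\j} w_{k→i} (iterating this update or its transpose gives the same largest eigenvalue), starts from a uniform positive vector, and reports the growth rate after a finite number of steps. The function name, the `steps` parameter and the toy graph are assumptions of the example, and convergence in ℓ can be slow.

```python
import math

def largest_eigenvalue(adj, n, steps=200):
    """Finite-l estimate of lambda(n) from the growth of |w_l| / |w_0|.
    adj: node -> set of neighbours; n: node -> 1 (kept) or 0 (removed)."""
    edges = [(i, j) for i in adj for j in adj[i]]
    if not edges:
        return 0.0
    w = {e: 1.0 / math.sqrt(len(edges)) for e in edges}    # |w_0| = 1
    log_growth = 0.0
    for _ in range(steps):
        into = {i: 0.0 for i in adj}       # total weight arriving at each node
        for (k, i), value in w.items():
            into[i] += value
        # w_{i->j} <- n_i * sum over neighbours k of i, k != j, of w_{k->i}
        new_w = {(i, j): n[i] * (into[i] - w[(j, i)]) for (i, j) in edges}
        norm = math.sqrt(sum(v * v for v in new_w.values()))
        if norm == 0.0:                    # every walk has died out
            return 0.0
        log_growth += math.log(norm)       # accumulate log |w_l| step by step
        w = {e: v / norm for e, v in new_w.items()}
    return math.exp(log_growth / steps)

k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
keep_all = {i: 1 for i in k4}
drop_one = {**keep_all, 0: 0}
print(round(largest_eigenvalue(k4, keep_all), 6))  # 2.0: every edge has k - 1 = 2 continuations
print(largest_eigenvalue(k4, drop_one))            # slightly above 1: a triangle remains
```

Normalizing at every step and summing logarithms avoids overflow while computing the same ratio that appears in the limit.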

Equation (3) is the starting point of an (infinite) perturbation series that provides the exact solution to the many-body influence problem in random networks and therefore contains all physical effects, including the collective influence. In practice, we minimize the cost energy function of influence $|\mathbf{w}_\ell(\mathbf{n})|$ in equation (3) for a finite ℓ. The solution rapidly converges to the exact value as ℓ → ∞, the faster the larger the spectral gap. We find for ℓ ≥ 1, to leading order in 1/N (Supplementary Information section IIE):

$$|\mathbf{w}_\ell(\mathbf{n})|^2 = \sum_{i=1}^{N} (k_i - 1) \sum_{j \in \partial\mathrm{Ball}(i,\,2\ell-1)} \left( \prod_{k \in \mathcal{P}_{2\ell-1}(i,j)} n_k \right) (k_j - 1), \qquad (4)$$

where Ball(i, ℓ) is the set of nodes inside a ball of radius ℓ (defined as the shortest path) around node i, ∂Ball(i, ℓ) is the frontier of the ball, $\mathcal{P}_\ell(i,j)$ is the shortest path of length ℓ connecting i and j (Fig. 1d), and k_i is the degree of node i.

The first collective optimization in equation (4) is ℓ = 1. We find $|\mathbf{w}_1(\mathbf{n})|^2 = \sum_{i,j=1}^{N} A_{ij}\,(k_i - 1)(k_j - 1)\,n_i n_j$, where A_ij is the adjacency matrix (equation (39) in Supplementary Information). This term is interpreted as the energy of an antiferromagnetic Ising model with random bonds in a random external field at fixed magnetization, which is an example of a pair-wise NP-complete spin-glass whose solution is found in Supplementary Information section III with the cavity method²⁸ (Extended Data Fig. 2).
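For a concrete feel of the ℓ = 1 term, the short Python sketch below evaluates the pairwise energy Σ_{i,j} A_ij (k_i − 1)(k_j − 1) n_i n_j on a complete graph of four nodes. It assumes that k_i is the degree in the original network and that the sum runs over ordered pairs; the function name and the example graph are illustrative choices.

```python
def pair_energy(adj, n):
    """Pairwise (l = 1) energy: sum over ordered neighbour pairs (i, j) of
    (k_i - 1)(k_j - 1) n_i n_j, with k_i the degree in the original network."""
    k = {u: len(adj[u]) for u in adj}
    return sum((k[i] - 1) * (k[j] - 1) * n[i] * n[j]
               for i in adj for j in adj[i])

k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
everyone = {i: 1 for i in k4}
without_0 = {**everyone, 0: 0}
print(pair_energy(k4, everyone))   # 48 = 12 ordered pairs x 2 x 2
print(pair_energy(k4, without_0))  # 24 = 6 ordered pairs x 2 x 2
```

Setting n_i = 0 switches off every pair that contains node i, which is how removing a node lowers the energy.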

For ℓ ≥ 2, the problem can be mapped exactly to a statistical mechanical system with many-body interactions which can be recast in terms of a diagrammatic expansion, equations (41)–(49) in Supplementary Information. For example, $|\mathbf{w}_2(\mathbf{n})|^2$ leads to 4-body interactions (equation (45) in Supplementary Information), and, in general, the energy cost $|\mathbf{w}_\ell(\mathbf{n})|^2$ contains 2ℓ-body interactions. As soon as ℓ ≥ 2, the cavity method becomes much more complicated to implement and we use another suitable method, called extremal optimization (EO)²⁹ (Supplementary Information section IV). This method estimates the true optimal value of the threshold by finite-size scaling following extrapolation to ℓ → ∞ (Extended Data Figs 3, 4). However, EO is not scalable to find the optimal configuration in large networks. Therefore, we develop an adaptive method, which performs excellently in practice, preserves the features of EO, and is highly scalable to present-day big data.

The idea is to remove the nodes causing the biggest drop in the energy function, equation (4). First, we define a ball of radius ℓ around every node (Fig. 1d). Then, we consider the nodes belonging to the frontier ∂Ball(i, ℓ) and assign to node i the collective influence (CI) strength at level ℓ following equation (4):

$$\mathrm{CI}_\ell(i) = (k_i - 1) \sum_{j \in \partial\mathrm{Ball}(i,\ell)} (k_j - 1). \qquad (5)$$

We notice that, while equation (4) is valid only for odd radii of the ball, CI_ℓ(i) is defined also for even radii. This generalization is possible by considering an energy function for even radii analogous to equation (4), as explained in Supplementary Information section IIG. The case of one-body interaction with zero radius ℓ = 0 (equation (59) in Supplementary Information) leads to the high-degree (HD) ranking (equation (62) in Supplementary Information)¹⁰.
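A minimal Python sketch of the CI score follows, taking CI_ℓ(i) to be (k_i − 1) times the sum of (k_j − 1) over the nodes j at shortest-path distance exactly ℓ from i. The function names and the toy graph (a degree-2 node joining two hubs, each with five leaves) are assumptions made for illustration. The sketch also adopts the convention that the frontier at ℓ = 0 is the node itself, which gives (k_i − 1)² and therefore the same ordering as the degree.

```python
def ball_frontier(adj, i, ell):
    """Nodes at shortest-path distance exactly `ell` from node i."""
    frontier = {i}
    visited = {i}
    for _ in range(ell):
        nxt = set()
        for u in frontier:
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    nxt.add(v)
        frontier = nxt
    return frontier

def collective_influence(adj, i, ell):
    """CI_l(i) = (k_i - 1) * sum of (k_j - 1) over the frontier of Ball(i, l)."""
    degree = lambda u: len(adj[u])
    return (degree(i) - 1) * sum(degree(j) - 1 for j in ball_frontier(adj, i, ell))

def two_hub_graph():
    """Node 0 (degree 2) bridges hubs 1 and 2; each hub also has five leaves."""
    adj = {u: set() for u in range(13)}
    links = [(0, 1), (0, 2)]
    links += [(1, leaf) for leaf in range(3, 8)]
    links += [(2, leaf) for leaf in range(8, 13)]
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    return adj

adj = two_hub_graph()
print(collective_influence(adj, 0, 1))  # 10: the low-degree bridge
print(collective_influence(adj, 1, 1))  # 5: a hub of degree 6
print(collective_influence(adj, 1, 0))  # 25: at radius 0 the hub ranks first
```

In this toy graph the bridge node outranks both hubs at ℓ = 1 even though its degree is three times smaller, while the ℓ = 0 score reverses the order.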

The collective influence, equation (5), is our second and most important result since it is the basis for the highly scalable and optimized CI algorithm which follows. In the beginning, all the nodes are present: n_i = 1 for all i. Then, we remove node i* with highest CI_ℓ and set n_{i*} = 0. The degree of each neighbour of i* is decreased by one, and the procedure is repeated to find the new top CI node to remove. The algorithm is terminated when the giant component is zero (see Supplementary Information section V for implementation, and Supplementary Information section VA for minimizing G(q) ≠ 0). By increasing the radius ℓ of the ball we obtain better and better approximations of the optimal exact solution as ℓ → ∞ (for finite networks, ℓ does not exceed the network diameter).
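The adaptive removal loop described above can be sketched in Python as follows. This is a naive version that recomputes every CI score at each step, so it does not have the scaling of the authors' implementation. Two details are assumptions of the sketch rather than part of the text: the loop stops once the largest component holds at most `stop_fraction` of the original nodes (a finite-size stand-in for a vanishing giant component), and ties in CI are broken by the current degree.

```python
def largest_component(adj, alive):
    """Size of the largest connected component among the nodes in `alive`."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def ci_value(adj, alive, degree, i, ell):
    """CI_l(i) computed on the current network (removed nodes are ignored)."""
    frontier, visited = {i}, {i}
    for _ in range(ell):
        frontier = {v for u in frontier for v in adj[u]
                    if v in alive and v not in visited}
        visited |= frontier
    return (degree[i] - 1) * sum(degree[j] - 1 for j in frontier)

def ci_removal_order(adj, ell=2, stop_fraction=0.01):
    """Remove the top-CI node, update its neighbours' degrees, and repeat."""
    alive = set(adj)
    degree = {u: len(adj[u]) for u in adj}
    removed = []
    while alive and largest_component(adj, alive) > stop_fraction * len(adj):
        # assumption: ties in CI are broken by current degree
        best = max(sorted(alive),
                   key=lambda u: (ci_value(adj, alive, degree, u, ell), degree[u]))
        alive.remove(best)
        for v in adj[best]:
            if v in alive:
                degree[v] -= 1
        removed.append(best)
    return removed

# Toy graph: node 0 (degree 2) bridges hubs 1 and 2, each with five leaves.
adj = {u: set() for u in range(13)}
for u, v in [(0, 1), (0, 2)] + [(1, x) for x in range(3, 8)] + [(2, x) for x in range(8, 13)]:
    adj[u].add(v)
    adj[v].add(u)
print(ci_removal_order(adj, ell=1, stop_fraction=0.1))  # [0, 1, 2]
```

On this toy graph the low-degree bridge is removed first, and the hubs follow only through the degree tie-break because their CI drops to zero once they are surrounded by leaves alone.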

The collective influence CI_ℓ for ℓ ≥ 1 has a rich topological content, and consequently tells us more about the role played by nodes in the network than the non-interacting high-degree hub-removal strategy at ℓ = 0, CI₀. The augmented information comes from the sum in the right hand side of equation (5), which is absent in the naive high-degree rank. This sum contains the contribution of the nodes living on the surface of the ball surrounding the central vertex i, each node weighted by the factor k_j − 1. This means that a node placed at the centre of a corona irradiating many links—the structure hierarchically emerging at different ℓ levels as seen in Fig. 1e—can have a very large collective influence, even if it has a moderate or low degree. Such ‘weak nodes’ can outrank nodes with larger degree that occupy mediocre peripheral locations in the network. The commonly used word ‘weak’ in this context sounds particularly paradoxical. It is, indeed, usually used as a synonym for a low-degree node with an additional bridging property, which has resisted a quantitative formulation. We provide this definition through equation (5), according to which weak nodes are, de facto, quite strong. Paraphrasing Granovetter’s conundrum³⁰, equation (5) quantifies the “strength of weak nodes”.

The CI algorithm scales as O(N log N) by removing a finite fraction of nodes at each step (Supplementary Information section VB). This high scalability allows us to find top influencers in current big-data social media and the minimal set of people to immunize in large-scale populations at the country level. The applications are investigated next.

Figure 2a shows the optimal threshold q_c for a random Erdös–Rényi (ER) network⁵ (marked by the vertical line) obtained by extrapolating the EO solution to N → ∞ and ℓ → ∞ (Supplementary Information section IV). In the same figure we compare the optimal threshold against the heuristic centrality measures: high-degree (HD)⁹, high-degree adaptive (HDA), PageRank (PR)⁷, closeness centrality (CC)⁶, eigenvector centrality (EC)⁶, and k-core¹² (see Supplementary Information section I for definitions). Supplementary Information sections VI and VII show the comparison with the remaining heuristics^6,11 and the Belief Propagation method of ref. 14, respectively, which have worse computational complexity (and optimality), and cannot be applied to the network sizes used here. Remarkably, at the optimal value q_c predicted by our theory, the best among the heuristic methods (HDA, PR and HD) still predict a giant component ∼50–60% of the whole original network. Furthermore, the influencer threshold predicted by CI approximates very well the optimal one, and, notably, CI outperforms the other strategies. Figure 2b compares CI in scale-free (SF) networks⁵ against the best heuristic methods, that is, HDA and HD. In all cases, CI produces a smaller threshold and a smaller giant component (Fig. 2c).

**Figure 2: Exact optimal solution and performance of CI in synthetic networks.**

As an example of an information spreading network, we consider the web of Twitter users (Supplementary Information section VIII¹⁹). Figure 3a shows the giant component of Twitter when a fraction q of its influencers is removed following CI. It is surprising that a lot of Twitter users with a large number of contacts have a mild influence on the network. This is witnessed by the fact that, when CI (at ℓ = 5) predicts a zero giant component (and so it exhausts the number of optimal influencers), the scalable heuristic ranks (HD, HDA, PR and k-core) still give a substantial giant component of the order of 30–70% of the entire network. These heuristics also, inevitably, find a remarkably large number of (fake) influencers, which is at least 50% larger than that predicted by CI (Fig. 3b and Supplementary Information section VIII). One cause for the poor performance of the high-degree-based ranks is that most of the hubs are clustered, which gives a mediocre importance to their contacts. As a consequence, hubs are outranked by nodes with lower degree surrounded by coronas of hubs (shown in detail in Fig. 3c), that is, the weak nodes predicted by the theory (Fig. 1e).

**Figure 3: Performance of CI in large-scale real social networks.**

Finally, we simulate an immunization scheme on a personal contact network built from the phone calls performed by 14 million people in Mexico (Supplementary Information section IX). Figure 3d shows that our method saves a large number of vaccines or, equivalently, finds the smallest possible set of people to quarantine; our method therefore also outranks the scalable heuristics in large real networks. Thus, while the mapping of the influencer identification problem onto optimal percolation is strictly valid for locally tree-like random networks, our results may apply also to real loopy networks, provided the density of loops is not excessively large.

Our solution to the optimal influence problem shows its importance in that it helps to unveil hitherto hidden relations between people, as witnessed by the weak-node effect. This, in turn, is the by-product of a broader notion of influence, lifted from the individual non-interacting point of view^{6,7,8,9,10,11,12,19,20} to the collective sphere: influence is an emergent property of collectivity, and top influencers arise from the optimization of the complex interactions they stipulate.

References

1. Domingos, P. & Richardson, M. Mining knowledge-sharing sites for viral marketing. In Proc. 8th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 61–70 (ACM, 2002); http://dx.doi.org/10.1145/775047.775057
2. Pastor-Satorras, R. & Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 3200–3203 (2001)
3. Newman, M. E. J. Spread of epidemic disease on networks. Phys. Rev. E 66, 016128 (2002)
4. Kempe, D., Kleinberg, J. & Tardos, E. Maximizing the spread of influence through a social network. In Proc. 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 137–143 (ACM, 2003); http://dx.doi.org/10.1145/956750.956769
5. Newman, M. E. J. Networks: An Introduction (Oxford Univ. Press, 2010)
6. Freeman, L. C. Centrality in social networks: conceptual clarification. Soc. Networks 1, 215–239 (1978)
7. Brin, S. & Page, L. The anatomy of a large-scale hypertextual web search engine. Comput. Networks ISDN Systems 30, 107–117 (1998)
8. Kleinberg, J. Authoritative sources in a hyperlinked environment. In Proc. 9th ACM-SIAM Symp. on Discrete Algorithms (1998); J. Assoc. Comput. Machinery 46, 604–632 (1999)
9. Albert, R., Jeong, H. & Barabási, A.-L. Error and attack tolerance of complex networks. Nature 406, 378–382 (2000)
10. Cohen, R., Erez, K., ben-Avraham, D. & Havlin, S. Breakdown of the Internet under intentional attack. Phys. Rev. Lett. 86, 3682–3685 (2001)
11. Chen, Y., Paul, G., Havlin, S., Liljeros, F. & Stanley, H. E. Finding a better immunization strategy. Phys. Rev. Lett. 101, 058701 (2008)
12. Kitsak, M. et al. Identification of influential spreaders in complex networks. Nature Phys. 6, 888–893 (2010)
13. Altarelli, F., Braunstein, A., Dall’Asta, L. & Zecchina, R. Optimizing spread dynamics on graphs by message passing. J. Stat. Mech. P09011 (2013)
14. Altarelli, F., Braunstein, A., Dall’Asta, L., Wakeling, J. R. & Zecchina, R. Containing epidemic outbreaks by message-passing techniques. Phys. Rev. X 4, 021024 (2014)
15. Hashimoto, K. Zeta functions of finite graphs and representations of p-adic groups. Adv. Stud. Pure Math. 15, 211–280 (1989)
16. Coja-Oghlan, A., Mossel, E. & Vilenchik, D. A spectral approach to analyzing belief propagation for 3-coloring. Combin. Probab. Comput. 18, 881–912 (2009)
17. Granovetter, M. Threshold models of collective behavior. Am. J. Sociol. 83, 1420–1443 (1978)
18. Watts, D. J. A simple model of global cascades on random networks. Proc. Natl Acad. Sci. USA 99, 5766–5771 (2002)
19. Pei, S., Muchnik, L., Andrade, J. S., Jr, Zheng, Z. & Makse, H. A. Searching for superspreaders of information in real-world social media. Sci. Rep. 4, 5547 (2014)
20. Pei, S. & Makse, H. A. Spreading dynamics in complex networks. J. Stat. Mech. P12002 (2013)
21. Bollobás, B. & Riordan, O. Percolation (Cambridge Univ. Press, 2006)
22. Bianconi, G. & Dorogovtsev, S. N. Multiple percolation transitions in a configuration model of network of networks. Phys. Rev. E 89, 062814 (2014)
23. Karrer, B., Newman, M. E. J. & Zdeborová, L. Percolation on sparse networks. Phys. Rev. Lett. 113, 208702 (2014)
24. Angel, O., Friedman, J. & Hoory, S. The non-backtracking spectrum of the universal cover of a graph. Trans. Am. Math. Soc. 367, 4287–4318 (2015)
25. Krzakala, F. et al. Spectral redemption in clustering sparse networks. Proc. Natl Acad. Sci. USA 110, 20935–20940 (2013)
26. Newman, M. E. J. Spectral methods for community detection and graph partitioning. Phys. Rev. E 88, 042822 (2013)
27. Radicchi, F. Predicting percolation thresholds in networks. Phys. Rev. E 91, 010801(R) (2015)
28. Mézard, M. & Parisi, G. The cavity method at zero temperature. J. Stat. Phys. 111, 1–34 (2003)
29. Boettcher, S. & Percus, A. G. Optimization with extremal dynamics. Phys. Rev. Lett. 86, 5211–5214 (2001)
30. Granovetter, M. The strength of weak ties. Am. J. Sociol. 78, 1360–1380 (1973)

Acknowledgements

This work was funded by NIH-NIGMS 1R21GM107641 and NSF-PoLS PHY-1305476. Additional support was provided by Army Research Laboratory Cooperative Agreement Number W911NF-09-2-0053 (the ARL Network Science CTA). We thank L. Bo, S. Havlin and R. Mari for discussions and Grandata for providing the data on mobile phone calls.

Author information

Authors and Affiliations

Levich Institute and Physics Department, City College of New York, New York, 10031, New York, USA
Flaviano Morone & Hernán A. Makse

Contributions

Both authors contributed equally to the work presented in this paper.

Corresponding author

Correspondence to Hernán A. Makse.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 High-degree (HD) threshold.

a, HD influence threshold q_c as a function of the degree distribution exponent γ of scale-free networks in the ensemble with k_max = mN^{1/(γ−1)} and N → ∞. The curves refer to different values of the minimum degree m: 1 (red), 2 (blue), 3 (black). The fragility of SF networks (small q_c) is notable for m = 1 (the case calculated in ref. 10). In this case (m = 1), the network contains many leaves, and reduces to a star at γ = 2, which is trivially destroyed by removing its single hub, explaining the general fragility in this case. Furthermore, in this same case, the network becomes a collection of dimers with k = 1 when γ → ∞, which is still trivially fragile. This also explains why q_c → 0 for γ ≥ 4. Therefore, the fragility in the case m = 1 has its roots in these two limiting trivial cases. Removing the leaves (m = 2) results in a 2-core, which is already more robust. For the 3-core m = 3, q_c ≈ 0.4–0.5 provides a quite robust network, and has the expected asymptotic limit to a non-zero q_c of a random regular graph with k = 3 as γ → ∞, q_c → (k − 2)/(k − 1) = 0.5. Thus, SF networks become robust in these more realistic cases, and the search for other attack strategies becomes even more important. b, HD influence threshold q_c as a function of the degree distribution exponent of scale-free networks with minimum degree m = 2 in the ensemble where k_max is fixed and does not scale with N. The curves refer to different values of the cut-off k_max: 10² (red), 10³ (green), 10⁵ (blue), 10⁸ (magenta), and k_max = ∞ (black), and show that for a typical k_max degree of 10³, for instance in social networks, the network is fairly robust with q_c ≈ 0.2 for all γ. The curve with m = 2 and k_max = 10³ is replotted in the inset of Fig. 2b.

Extended Data Figure 2 Replica Symmetry (RS) estimation of the maximum eigenvalue.

Main panel, the eigenvalue (equation (92) in Supplementary Information) for the two-body interaction ℓ = 1, obtained by minimizing the energy function with the RS cavity method. The curve was computed on an ER graph of N = 10,000 nodes and average degree 〈k〉 = 3.5 and then averaged over 40 realizations of the network (error bars are s.e.m.). Inset, comparison between the RS cavity method and EO (extremal optimization) for an ER graph of 〈k〉 = 3.5 and N = 128. The curves are averaged over 200 realizations (error bars are s.e.m.).

Extended Data Figure 3 EO estimation of the maximum eigenvalue.

Eigenvalue λ(q) obtained by minimizing the energy function of the configuration n with τEO (τ-extremal optimization), plotted as a function of the fraction of removed nodes q. The panels are for different orders of the interactions. The curves in each panel refer to different sizes of ER networks with average connectivity 〈k〉 = 3.5. Each curve is an average over 200 instances (error bars are s.e.m.). The value q_c where λ(q_c) = 1 is the threshold for a particular N and many-body interaction.

Extended Data Figure 4 Estimation of optimal threshold with EO.

a, Critical threshold q_c as a function of the system size N, obtained with EO from Extended Data Fig. 3, of ER networks with 〈k〉 = 3.5 and varying size. The curves refer to different orders of the many-body interactions. The data show a linear behaviour as a function of N^{−2/3}, typical of spin glasses, for each many-body interaction ρ. The extrapolated value is obtained at the y intercept. b, Thermodynamic critical threshold as a function of the order of the interactions ρ from a. The data scale linearly with 1/ρ. From the y intercept of the linear fit we obtain the thermodynamic limit of the infinite-body optimal value.

Extended Data Figure 5 Comparison of the CI algorithm for different radii ℓ of the Ball(ℓ).

We use ℓ = 1, 2, 3, 4, 5, on an ER graph with average degree 〈k〉 = 3.5 and N = 10⁵ (the average is taken over 20 realizations of the network, error bars are s.e.m.). For ℓ = 3 the performance is already practically indistinguishable from ℓ = 4, 5. The stability analysis we developed to minimize q_c is strictly valid only when G = 0, since the largest eigenvalue of the modified NB matrix controls the stability of the solution G = 0, and not the stability of the solution G > 0. In the region where G > 0 we use a simple and fast procedure to minimize G explained in Supplementary Information section VA. This explains why there is a small dependence on having a slightly larger G for larger ℓ, when G > 0 in the region q ≈ 0.15.

Extended Data Figure 6 Illustration of the algorithm used to minimize G(q) for q < q_c.

Starting from the completely fragmented network at q = q_c, the Nq_c influencers are reinserted with their original degree and connected to their original neighbours with the following criterion: each node is assigned an index c(i) given by the number of clusters it would join if it were reinserted in the network. For example, the red node has c(red) = 2, while the blue one has c(blue) = 3. The node with the smallest c(i) is reinserted in the network: in this case the red node. Then the c(i)s are recalculated and the new node with the smallest c(i) is found and reinserted. These steps are repeated until all the removed nodes are reinserted in the network.
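The reinsertion criterion can be sketched in Python with a union-find structure that tracks the clusters of the nodes currently present. The function name, the tie-breaking rule (smallest node label first) and the toy path graph are assumptions of this example, and c(i) is recomputed from scratch at every step for clarity rather than speed.

```python
def reinsertion_order(adj, removed):
    """Reinsert removed nodes one at a time, always choosing the node that
    would join the fewest existing clusters, c(i)."""
    parent = {u: u for u in adj}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    present = set(adj) - set(removed)
    for u in present:                       # clusters of the fragmented network
        for v in adj[u]:
            if v in present:
                parent[find(u)] = find(v)

    def clusters_joined(i):
        return len({find(v) for v in adj[i] if v in present})

    pending = sorted(removed)               # assumption: ties go to the smallest label
    order = []
    while pending:
        best = min(pending, key=clusters_joined)
        pending.remove(best)
        present.add(best)
        for v in adj[best]:                 # reconnect to original neighbours
            if v in present:
                parent[find(best)] = find(v)
        order.append(best)
    return order

# Path 0-1-2-3-4-5-6 with nodes 1, 3 and 4 removed.
path = {i: set() for i in range(7)}
for i in range(6):
    path[i].add(i + 1)
    path[i + 1].add(i)
print(reinsertion_order(path, [1, 3, 4]))  # [3, 1, 4]
```

Node 3 goes back first because it touches a single cluster, whereas node 1 would merge two; reading the resulting order backwards gives a removal sequence for q < q_c.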

Extended Data Figure 7 Test of the decimation fraction.

Giant component G as a function of the fraction of removed nodes q using CI, for an ER network of N = 10⁵ nodes and average degree 〈k〉 = 3.5. The profiles of the curves are drawn for different percentages of nodes fixed at each step of the decimation algorithm.

Extended Data Figure 8 Comparison of the performance of CI, BC and EGP in destroying G.

We also include HD, HDA, EC, CC, k-core and PR. We use a scale-free (SF) network with degree exponent γ = 2.5, average degree 〈k〉 = 4.68, and N = 10⁴. We use the same parameters as in ref. 11.

Extended Data Figure 9 Comparison with BP for a network immunization.

a, Fraction of infected nodes f as a function of the fraction of immunized nodes q in the susceptible-infected-removed (SIR) model from the BP solution. We use an ER random graph of N = 200 nodes and average degree 〈k〉 = 3.5. The fraction of initially infected nodes is p = 0.1 and the inverse temperature β = 3.0. The profiles are drawn for different values of the transmission probability w: 0.4 (red curve), 0.5 (green), 0.6 (blue), 0.7 (magenta). Also shown are the results of the fixed density BP algorithm (open circles). b, Chemical potential μ as a function of the immunized nodes q from BP. We use an ER random graph of N = 200 nodes and average degree 〈k〉 = 3.5. The fraction of the initially infected nodes is p = 0.1 and the inverse temperature β = 3.0. The profiles are drawn for different values of the transmission probability w: 0.4 (red curve), 0.5 (green), 0.6 (blue), 0.7 (magenta). Also shown are the results of the fixed density BP algorithm (open circles) for the region where the chemical potential is non-convex. c, Comparison between the giant components obtained with CI, HDA, HD and BP. We use an ER network of N = 10³ and 〈k〉 = 3.5. We also show the solution of CI from Fig. 2a for N = 10⁵. We find in order of performance: CI, HDA, BP and HD. (The average is taken over 20 realizations of the network, error bars are s.e.m.) d, Comparison between the giant components obtained with CI, HDA, HD and BPD. We use an SF network with degree exponent γ = 3.0, minimum degree k_min = 2, and N = 10⁴ nodes.

Extended Data Figure 10 Fraction of infected nodes f(q) as a function of the fraction of immunized nodes q in SIR from BP.

We use the following parameters: initial fraction of infected people p = 0.1, and transmission probability w = 0.5. We use an ER network of N = 10³ nodes and 〈k〉 = 3.5. We compare CI, HDA and BP. All strategies give similar performance, owing to the large value of the initial infection p, which washes out the optimization performed by any sensible strategy, in agreement with the results shown in figure 12a of ref. 14.

Supplementary information

Supplementary Information

This file contains Supplementary Text and Data and Supplementary References. (PDF 1656 kb)

About this article

Cite this article

Morone, F., Makse, H. Influence maximization in complex networks through optimal percolation. Nature 524, 65–68 (2015). https://doi.org/10.1038/nature14604

Received: 19 February 2015
Accepted: 20 May 2015
Published: 01 July 2015
Issue Date: 06 August 2015
DOI: https://doi.org/10.1038/nature14604
