Random Graphs with Arbitrary Degree Distributions and their Applications
M. E. J. Newman, S. H. Strogatz, and D. J. Watts
 Key Ideas from the Paper
 Random Graphs with Poisson Distributed Vertex Degree
 Introducing Generating Functions
 Degree of a Vertex arrived at by following a random Edge
 Distribution of Number of Neighbors $m$ Steps Away
 Distribution of Component Sizes
 Mean Component Size
 References
Paper
Key Ideas from theCertain real world networks (e.g. a community of people) could be abstracted by a Random Graph where vertex degree (no. of edges incident on the vertex) follows some distribution, e.g. flip a coin to decide if two nodes should be connected. Such abstractions could be used to study phenomenon such as passage of disease through a community. Studies of Random Graphs so far assumed vertex degree to have the Poisson distribution, but this paper studies graphs with arbitrary degree distributions.
Random Graphs with Poisson Distributed Vertex Degree
Studies so far assumed that presence of an edge between any two nodes is determined by a coin flip with probability of $1$ being $p$. Further, presence of an edge betwen two nodes is assumed independent of presence of other edges in the network. These assumptions imply that vertex degree has a Poisson distribution (as we will see).
The authors state:
If there are $N$ vertices in the graph, and each is connected to an average of $z$ edges, then it is trivial to show that $p=z/(N1)$.
Here is my short proof for this statement.
Let $E$ be the set of edges. We can say:$$\mathbb{E}[E] = \frac{zN}{2} \\ \implies \frac{1}{2}\sum_{i=1}^Np(N1) = \frac{zN}{2} \\ \implies p = \frac{z}{N1} \\ \implies p \sim \frac{z}{N} \tag{for large $N$ (Eq. 1)} $$
Given the independence assumption, the distribution of degree of a vertex, $p_k$, is thus Binomial, which for large $N$ can be approximated by a Poisson distribution.
$$ p_k = {N\choose k}p^k(1p)^{Nk} \sim \frac{z^ke^{z}}{k!} \tag{Eq. 2} $$Introducing Generating Functions
The authors express the probability distribution of a discrete random variables (e.g. degree of a vertex) as what is called a Generating Function^{2}. They then exploit properties of Generating Functions to derive some interesting insights, e.g. finding the distribution of sums of random variables.
A sequence of numbers can be encapsulated into a function (the Generating Function), with the property that when that function is expressed as the sum of a power series ($a_0 + a_1x + a_2x^2 + \cdots$), the coefficients of this series ($a_0,a_1,a_2,\cdots$) give us the original number sequence^{2}. E.g. For a graph with Poisson distributed vertex degree, $p_k$ (the probability a vertex has degree $k$) is a sequence whose Generating Function can be written as: $$ G_0(x) = \sum_{k=0}^N{N\choose k}p^k(1p)^{Nk}x^k \\ = (1p+px)^N \\ \sim e^{z(x1)} \tag{for $N\to\infty$ (Eq. 3)} $$ Several properties of the Generating Function $G_0(x) = \sum_{k=0}^Np_kx^k$ come handy.

Moments: The average degree can be written as: $$ z = \sum_kkp_k \\ = G'_0(1) \tag{First Moment (Eq. 4)} $$ For the case of Poisson distributed vertex degree, we have $G'_0(1) = z$.

Powers: The authors note (with my [annotations]):
If the distribution of a property $k$ [such as vertex degree] of an object [vertex of a graph] is generated by a given Generating Function [$G_0(x)$], then the distribution of the total $k$ summed over $m$ independent realizations of the object [sum of degrees of $m$ randomly chosen vertices] is generated by the $m^{th}$ power of that Generating Function [$G_0(x)^m$].
So the coefficient of $x^k$ in $G_0(x)^m$ is the probability that the sum of degrees of $m$ randomly chosen vertices is $k$. With this we can start talking about distribution of sums of degrees of vertices.
Degree of a Vertex arrived at by following a random Edge
Intuitively, the distribution of degree of such a vertex will be different from that of a randomly chosen vertex ($p_k$) because by following a random edge, we are more likely to end up on a vertex with a high degree than at one with a relatively lower degree. And so the probability distribution of degree of a vertex chosen as such, would be skewed towards higher degree values. Formally, let $k_i$ be the degree of vertex $V_i$, then the probability of arriving at $V_i$ will be $\frac{k_i}{\sum_{j=0}^Nk_j} \propto k_i$. And so the probability that a vertex chosen by following a random edge has degree $k$ is proportional to the probability that this vertex is chosen (which is $\propto k$) $\times$ probability this vertex has degree $k$ (which is $p_k$).
And so the authors note:
... the vertex therefore has a probability distribution of degree proportional to $kp_k$.
And so for coefficientes ($p'_k \propto kp_k$) of the Generating Function to sum to 1, we need to normalize:$\frac{\sum_kkp_kx^k}{\sum_kkp_k} = x\frac{G'_0(x)}{G'_0(1)}$.
Distribution of Number of Neighbors $m$ Steps Away
Let's first look at neighbors 1 steps away. We choose a random vertex and follow its edges to its immediate neighbors, and consider the distribution of degree of one such neighbor. The authors indicate that this distribution will be the same as that when we arrive at this vertex by following a random edge. This wasn't clear to me at first because in this case we are randomly choosing a vertex not an edge. But then it struck that vertices with high degrees are connected to more vertices than those with a low degree, and so are more likely to be arrived at when we pick a vertex randomly and visit its neighbors. The author's note for neighbors 1 step away:
$$ G_1(x) = \frac{G'_0(x)}{G'_0(1)} = \frac{1}{z}G'_0(x) \tag{Eq. 5} $$... the vertices arrived at each have the distribution of remaining outgoing edges generated by this function, less one power of $x$, to allow for the edge that we arrived along.
Now let's consider the distribution of secondneighbors of a vertex $v$, i.e. the sum of number of neighbors of immediate neighbors of $v$. (Note that this sum excludes the number of immediate neighbors). Let's first assume it is given that $v$ has $k$ immediate neighbors. Then the probability that it has $j$ secondneighbors, $p^{second}_{jk}$ is the coefficient of $x^j$ in the Generating Function $G_1(x)^k$ (from the "Powers" property of Generating Functions). So we can get $p^{second}_{j}$ by simply marginalizing over the number of immediate neighbors of $v$. And thust the authors note:> ... the probability distribution of second neighbors of the original vertex can be written as: $$ \sum_{k}p_k[G_1(x)]^k = G_0(G_1(x)) \tag{Eq. 6} $$
Distribution of Component Sizes
Armed with the background knowledge above, the authors discuss distributions of interesting properties of random graphs, such as the size of connected components.
The authors assert the following and I'm unable to prove it thus far (but let's take it for granted for now):
... the chances of a connected component containing a close loop of edges goes as $N^{1}$ which is negligible in the limit of large $N$.
The authors denote by $H_1(x)$:> ... the generating function for the distribution of the sizes of components which are reached by choosing a random edge and following it to one of its ends. Now if $q_k$ be the probability that the vertex $v$ we arrive at (by following a random edge) has $k$ other incident edges. Then the probability that this component (the one containing $v$) has size $j$, is given by:
$$ p^\text{component size}_j = q_0 \cdots \tag{Eq. 7}\\ + q_1\times\text{(coefficient of }x^{j1}\text{ in }H_1(x) \cdots \\ + q_2\times\text{(coefficient of }x^{j1}\text{ in }[H_1(x)]^2 \cdots \\ + \text{and so on} $$And thus the authors assert:
$$ H_1(x) = xq_0 + xq_1H_1(x) + xq_2[H_1(x)]^2 + \cdots \tag{Eq. 8} $$$H_1(x)$ must satisfy a selfconsistency condition of the form
We know from results given before (see Eq. 5) that $q_k$ is just the coefficient of $x^k$ in $G_1(x)$, and so the selfconsistency condition can also be written as:$$H_1(x) = G_1(H_1(x)) \tag{Eq. 9} $$
Following a similar line of reasoning as I've shared in Eq. 7, if $H_0(x)$ is the Generating Function for the distribution of size of the component which contains a randomly chosen vertex, then we can show: $$ H_0(x) = G_0(H_1(x)) \tag{Eq. 10} $$
Mean Component Size
The authors note that in practice Eq. 9 is complicated and rarely has a closed form solution for $H_1$ (given $G_1$). That said, we can still compute average component size using Eqs. 9 & 10. Using the Moments property of Generating Functions (see Eq. 4), let $\langle s\rangle$ denote the average size of the component to which a randomly chosen vertex belongs, then the authors note:
$$ \langle s\rangle = H'_0(1) \\ = G_0(H_1(1)) + G'_0(H_1(1))H'_1(1) \\ = 1 + G'_0(1)H'_1(1) \tag{Eq. 11} $$The last step follows from the fact that $G_0(1) = H_1(1) = 1$ (this is required to normalize the distributions representated by $G_0$ and $H_1$  so that the coefficients in each of those series sum to 1).
From Eq. 9, the authors note: $$ H'_1(1) = 1 + G'_1(1)H'_1(1) \tag{Eq. 12} $$
And from Eqs. 11 & 12, after eliminating $H'_1(1)$, we get (here $z_1$ is the average degree of a vertex and $z_2$ is the average number of second neighbors): $$ \langle s \rangle = 1 + \frac{G'_0(1)}{1  G'_1(1)} \\ = 1 + \frac{z_1^2}{z_1z_2} \tag{Eq. 13} $$
Let's look at that how $\langle s\rangle$ behavse versus $z_1$ and $z_2$. We can see from the plot below that for each value of $z_1$, as the value of $z_2$ starts to exceed $z_1$, the mean component size explodes. This phenomenon is called a "Phase Transition", at which point a "Giant Component" (a component of size $\Omega(N)$ ^{3}) appears in the graph. Phase Transitions (see Chapter 4, Foundations of Data Science by Profs. Blum, Hopcraft and Kannan ^{3}) are characterized by abrupt transitions in a graph (such as appearance of cycles) once the probability $p$ (of there being an edge between two random vertices, crosses a certain threshold).
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
z1 = np.arange(2, 100)
z2 = np.arange(2, 100)
xx, yy = np.meshgrid(z1, z2)
s = 1 + xx**2/(xx  yy)
fig, ax = plt.subplots()
for z1 in np.linspace(0.1,1.0, 10):
z2 = np.linspace(0.1, z1,10)
s = 1 + z1**2/(z1  z2)
_ = ax.plot(z2, s)
ax.annotate(f"z1={z1:.2f}", (z1*0.9, s[2]*1.05))
ax.set_xlabel("z2 (Mean No. of Second Neighbors)")
ax.set_ylabel(r"$\langle s \rangle$ (Mean Component Size)")
ax.set_title(r"Plot of $\langle s \rangle$ versus $z_1$, $z_2$", fontsize=14)
fig.set_size_inches(10,6)
References
3. Chapter 4, Foundations of Data Science, Avrim Blum, John Hopcroft, and Ravindran Kannan. PDF from authors website.↩