Introduction

This tutorial provides a mathematical and theoretical background for stochastic network models, with instructions on how to run the built-in models designed for learning in EpiModel. For information on how to extend these models with new mathematical frameworks, see the related tutorial New Network Models with EpiModel. If you are new to epidemic modeling, we suggest that you start with our tutorials Basic DCMs with EpiModel and Basic ICMs with EpiModel to get a background in deterministic and stochastic modeling. The material below assumes familiarity with that material.

Network models explicitly represent contact phenomena within and across dyads (pairs of individuals who remain in contact) over time. This enables partnerships to have duration in time, allowing for repeated acts with the same person, specification of partnership formation and dissolution rates, control over the temporal sequencing of multiple partnerships, and specification of network-level features. As one dyad may now be connected to other dyads, a network is formed.

Model Framework

EpiModel uses separable-temporal exponential-family random graph models (STERGMs) to estimate and simulate complete networks based on individual-level, dyad-level, and network-level patterns of density, degree, assortivity, and other features influencing edge formation and dissolution. Building and simulating a network-based epidemic models in EpiModel is a multi-step process, starting with estimation of a temporal ERGM and continuing with simulation of a dynamic network and epidemic processes on top of that network.

Dynamic network models may be estimated from several different types of empirical data, including panel data on a complete network over time. For a description of these options, please consult the help documentation and vignettes for the tergm and networkDynamic packages. EpiModel currently supports estimation of network models with summary target statistics for formation that may be estimated using egocentric network samples: a random sample of the population is drawn, and those individuals are asked about a complete or limited set of their partnerships within an interval. Network models may be parameterized with egocentric network data by inputting summary target statistics for network structures derived from this data, including the distribution of partner numbers at one point in time, assortivity in partner traits, and other dyadic and network-level features. On top of this, mean statistics summarizing the average duration of partnerships is estimated from empirical data and used to govern partnership dissolution rates.

Model Processes

EpiModel can simulate disease epidemics over these partnership networks by integrating this statistical framework for networks with stochastic transmission processes similar to those featured in icm class disease models. Similar to ICMs, network models require specification of epidemic parameters that govern the transmission, recovery, and other other transition processes of individual persons in discrete time. The three model types currently supported are the same: SI, SIR, and SIS disease types.

In contrast to DCMs and ICMs, which simulate the epidemic system with one function, network models require multiple steps:

  • An empty network structure is initialized;
  • A network model is fit and diagnosed; and
  • The epidemic processes are simulated on top of a dynamic simulated network consistent with the network structure and model fit.

Independent versus Dependent Models

A key distinction for network models concerns whether two dynamic processes, the network dynamics and the disease transmission dynamics, are treated as independent or dependent. Independent network models assume no influence of the disease simulation on the structure of the temporal network, although the structure of the temporal network certainly impacts disease. Dependent network models allow for the epidemiological and demographic processes to influence the network structure. Examples include serosorting – where disease status influences partner selection – and demographic processes (births, deaths, and migration) – where the contact network process must adapt to changing population size and composition.

Model Functions

Simulating network models in EpiModel involves three main functions for both independent and dependent models:

  1. netest estimates the generative model for the dynamic partnership networks. This function is a wrapper around the ergm and stergm functions in the ergm and tergm packages.
  2. netdx simulates time-series of the dynamic network consistent with the model fit from netest to check for model fit to the egocentric target statistics.
  3. netsim then runs the stochastic epidemic models with a network model fit from netest. For independent models, the full dynamic network is simulated at the start of each epidemic simulation. For dependent models, the network is re-simulated at each time step as a function of varying population size and changing nodal attributes.

We explain these processes in further detail with the two network modeling tutorials below, one for an independent SIS model with one mode and one for a dependent SI model with two modes.

Independent One-Mode SIS Model

In this section, we work through a model of a Susceptible-Infected-Susceptible (SIS) epidemic in a closed population. An example of an SIS disease would be a bacterial sexually transmitted infection such as Gonorrhea, in which persons may acquire infection from sexual contact with an infected partner, and then recover from infection either through natural clearance or through antibiotic treatment. We will use a simplifying assumption of a closed population, in which there are no entries or exits from the network; this may be justified by the short time span over which the epidemic will be simulated.

Network Estimation and Diagnostics

The first step in any network model is to specify a network structure, including features like size and compositional traits. Here, we construct an empty network of 1000 nodes with two races of equal node size. We use small networks that are simulated over a short number of time steps for computational efficiency in this tutorial; research-level models may require larger-scale simulations. The network.initialize function creates an empty network object with 1000 nodes but no edges. The next line creates a vertex attribute called race, which will have categories 0 and 1: there are 500 nodes of Race 0 and 500 nodes of Race 1.

nw <- network.initialize(n = 1000, directed = FALSE)
nw <- set.vertex.attribute(nw, "race", rep(0:1, each = 500))
Model Parameters

Next, we specify the partnership formation formula for the network model estimation. The formation object is a right-hand side formula, and the target.stats are the summary statistics based on egocentric network sampling that summarize the mean values in the formula. The edges term is the expected number of edges each time step. The nodefactor term specifies the number of nodes of Race 1 that in an edge; using a transformation below, we can use it to express the race-specific mean degree. The term is specified for one less than the number of groups in the attribute because twice the sum of terms for all groups is equal to the number of edges. The nodematch is the number of edges between nodes of the same race. The concurrent term is the number of nodes that have two or more edges at each time step (i.e., momentary degree of at least 2).

formation <- ~edges + nodefactor("race") + nodematch("race") + concurrent

To calculate the target statistics, we need to translate from empirical egocentric data to the statistical forms required by ERGMs. Nodes will have a mean degree of 0.5, equivalent to 250 expected edges given the network size (n/2 times mean degree). The mean degree will vary by race: the mean degree of Race 1 will be 0.75, whereas it will be 0.25 for Race 0. The nodefactor term requires only the statistic for Race 1; it is the product of the size of Race 1 and their mean degree (500 * 0.75 = 375). For racial mixing, 90% of edges are same-race. The expected number of same-race edges is the product of the number of edges and the probability of a same-race edge (250 * 0.90 = 225). The number of nodes with a concurrent degree is 100, equivalent to 10% of total nodes.

target.stats <- c(250, 375, 225, 100)

The dissolution model is parameterized from an average edge duration estimated from cross-sectional egocentric data. The dissolution model is parameterized as an offset because the dissolution coefficient is not estimated; it is instead fixed at the value implied by the average edge duration. The dissolution models may be as arbitrarily complex as any ERGM formation models, but given the analytic calculation of coefficients and other complexities, only a limited set of dissolution models are supported in EpiModel. Our dissolution model will be an “edges” model in which the probability of edge dissolution is homogeneous conditional on edge existence. The average partnership duration is 25 time steps (e.g., months or some other arbitrary unit). The dissolution_coefs function takes as input the dissolution formula and average duration and transforms this into a logit coefficient for use in estimation. See the help page of this function to see other supported dissolution models.

coef.diss <- dissolution_coefs(dissolution = ~offset(edges), duration = 25)
coef.diss
Dissolution Coefficients
=======================
Dissolution Model: ~offset(edges)
Target Statistics: 25
Crude Coefficient: 3.178054
Mortality/Exit Rate: 0
Adjusted Coefficient: 3.178054

The output from this function indicates both an adjusted and crude coefficient, which are equivalent in this case. In the next example, they will differ.

Model Fit

In EpiModel, network model estimation is performed with the netest function, which is a wrapper around the estimation functions in the ergm and tergm packages. The function arguments are as follows:

function (nw, formation, target.stats, coef.diss, constraints, 
    coef.form = NULL, edapprox = TRUE, output = "fit", set.control.ergm, 
    set.control.stergm, verbose = FALSE) 
NULL

The four arguments that must be specified with each function call are:

  • nw: an initialized empty network.
  • formation: a RHS formation formula.
  • target.stats: target statistics for the formation model.
  • coef.diss: output object from dissolution_coefs, containing the dissolution coefficients.

Other arguments that may be helpful to understand when getting started are:

  • constraints: sets the model constraints, passed to ergm and stergm (see help("ergm")).
  • coef.form: sets the coefficient values of any offset terms in the formation model.
  • edapprox: if TRUE, uses a static ERGM with coefficient adjustment (details below) rather than a full STERGM fit for the sake of computational efficiency.
  • nonconv.error: if TRUE, will error if the model has not converged after the specified number of MCMC iterations (the default of the ergm function is to allow non-converged output to be returned).

In the netest function, the network model may be estimated using one of two methods:

  1. Direct method: uses the functionality of the tergm package to estimate the separable formation and dissolution models for the network.
  2. Approximation method: uses ergm estimation for a cross-sectional network (the prevalence of edges) with a manual adjustment of the edges coefficient to account for dissolution (i.e., transformation from prevalence to incidence). This approximation method may introduce bias into estimation in certain cases (high density and short durations) but these are typically not a concern for the low density cases in epidemiologically relevant networks. As a general rule, the approximation bias is minimal when the edge duration greater than 20 time units and the mean degree is less than 1. The benefit of using the approximation is computational efficiency, since direct STERGMs on low density/short duration networks are difficult and time-consuming to fit.

Within netest, the direct method is specified by edapprox=FALSE and the indirect by edapprox=TRUE (the default). Because we have a dyadic dependent model, MCMC will be used to estimate the coefficients of the model given the target statistics.

est1 <- netest(nw, formation, target.stats, coef.diss, edapprox = TRUE)
Model Diagnostics

There are two forms of model diagnostics for a dynamic ERGM fit with netest: static and dynamic diagnostics. When the approximation method has been used, static diagnostics check the fit of the cross-sectional model to target statistics. Dynamic diagnostics check the fit of the model adjusted to account for edge dissolution. When running a dynamic network simulation as we do with EpiModel, it is good to start with the dynamic diagnostics, and if there are fit problems, work back to the static diagnostics to determine if the problem is due to the cross-sectional fit itself or with the dynamic adjustment (i.e., the approximation method). A proper fitting ERGM using the approximation method does not guarantee well-performing dynamic simulations.

For this tutorial, we will examine dynamic diagnostics only. These are run with the netdx function, which simulates from the model fit object returned by netest. One must specify the number of simulations from the dynamic model and the number of time steps per simulation. Here, we simulate the model 5 times over 500 time steps. Choice of both simulation parameters depends on the stochasticity in the model, which is a function of network size, model complexity, and other factors.

By default, the network statistics to be diagnosed are those in the formation formula of the network model, although the nwstats.formula may be used to monitor any set of statistics. In this example, we set a formula for diagnostic statistics that varies slightly from the formation formula. The difference is the specification of base=0 in the nodefactor term. If we were to specify this in the formation formula itself, there would be collinearity with the edges term. But for diagnostics, this is a useful tool to quickly see the race-specific mean degrees.

dx <- netdx(est1, nsims = 5, nsteps = 500,
            nwstats.formula = ~edges + nodefactor("race", base = 0) +
                               nodematch("race") + concurrent)

Printing an object output from netdx will show summary tables of the simulated network statistics against the target statistics. The mean and sd values for the formation diagnostics are the mean and standard deviations of those statistics across all simulations and time steps. The simulated edges is slightly higher than targeted, but within a standard deviation of that target. There is no target statistic for Race 0 in the nodefactor term because it was not in the formation formula.

dx
EpiModel Network Diagnostics
=======================
Diagnostic Method: Dynamic
Simulations: 5
Time Steps per Sim: 500

Formation Diagnostics
----------------------- 
                  Target Sim Mean Pct Diff Sim SD
edges                250  260.275    0.041 15.352
nodefactor.race.0     NA  130.062       NA 13.955
nodefactor.race.1    375  390.487    0.041 26.116
nodematch.race       225  234.185    0.041 14.015
concurrent           100  106.132    0.061 11.435

Dissolution Diagnostics
----------------------- 
               Target Sim Mean Pct Diff Sim SD
Edge Duration   25.00   23.738   -0.050 22.831
Pct Edges Diss   0.04    0.040   -0.002  0.012

There are two forms of dissolution diagnostics. The edge duration row shows the mean duration of partnerships across the simulations; it tends to be lower than the target unless the diagnostic simulation interval is very long since its average includes a burn-in period where all edges start at a duration of zero (illustrated below in the plot). The next row shows the percent of current edges dissolving at each time step, and is not subject to bias related to burn-in. The percentage of edges dissolution is the inverse of the expected duration: if the duration is 25 time steps, then we expect that 1/25 or 4% dissolve each day.

Plotting the diagnostics object will show the time series of the target statistics against any targets. The other options used here specify to smooth the mean lines, give them a thicker line width, and plot each statistic in a separate panel. The black dashed lines show the value of the target statistics for any terms in the model. Similar to the numeric summaries, the plots show a good fit over the time series.

par(mar = c(3,3,1,1), mgp = c(2,1,0))
plot(dx)