This tutorial documents how to use EpiModel to solve new types of stochastic network models. New stochastic models require specifying new module functions. These functions may either replace the existing modules routinely called in netsim, or they may supplement them by adding new processes into the chain of operations at each time step in the simulation.

The Modular Framework

Modules for network model simulations are specified in the control.net helper function. Within that function, modules are identified by the .FUN parameters, the input of which is a standardized R function described below. Users can replace existing modules, for example births.FUN by writing a new function for births and passing it into control.net.

function (type, nsteps, start = 1, nsims = 1, ncores = 1, depend, 
    rec.rand = TRUE, b.rand = TRUE, d.rand = TRUE, tea.status = TRUE, 
    attr.rules, epi.by, use.pids = TRUE, pid.prefix, initialize.FUN = initialize.net, 
    deaths.FUN = deaths.net, births.FUN = births.net, recovery.FUN = recovery.net, 
    edges_correct.FUN = edges_correct, resim_nets.FUN = resim_nets, 
    infection.FUN = infection.net, get_prev.FUN = get_prev.net, 
    verbose.FUN = verbose.net, module.order = NULL, set.control.stergm, 
    save.nwstats = TRUE, nwstats.formula = "formation", delete.nodes = FALSE, 
    save.transmat = TRUE, save.network = TRUE, save.other, verbose = TRUE, 
    verbose.int = 1, skip.check = FALSE, ...) 
NULL

Each of the arguments listed with a .FUN suffix is a pointer to tell netsim which function to use to simulate those processes. Each call to netsim begins by initializing the simulation with the steps specified in initialize.FUN, the function for which is initialize.net.

Additional modules may be added into this mix by setting arguments in control.net with names ending in .FUN but not among those listed. These new functions are passed through into the control.net through the ... argument. We provide examples of building new modules below.

Module Structure

To develop new module functions, one must understand the structure of the objects created internally during a call to netsim. Some of the information in these data is returned at the end of the call as part of the output (within the returned object of class netsim), while some other is used only internally and discarded. Some of the features that are helpful to know include:

  1. The main data structure storing the information passed among the modules is dat. Every module has only two input arguments: dat and at, the current time step. All the inputs, such as parameters, and the outputs, such as summary epidemiological statistics, are contained on dat. Each module reads dat, updates internal data within dat, and then outputs dat for the next module to work on.

  2. Attributes of individuals in the module are stored in a sublist to dat called attr. This is a named list, with each element of the list as a separate attribute in a vector. All vectors are exactly the same length and correspond to one attribute per individual in the population. netsim always creates five attributes by default: active, status, and infTime, entrTime, and exitTime. The active attribute keeps track of whether an individual is currently alive in the the population or has left it through some process such as death or out-migration. The status attribute indicates current infection status as a character (“s”, “i”, “r”, for susceptible, infected, and recovered); infTime records the time at which each individual was infected (NA for susceptible nodes). The entrTime and exitTime contain the time steps at which the person enters or leaves the population.

  3. Summary model statistics are contained in a sublist to dat called epi. This list stores information on the sizes of the population with each disease status at each time point, as well as the size of flows among those attribute states. The default approach for built-in models is to calculate summary prevalence statistics, such as the size or frequency of infection in the population, in the get_prev.FUN module as the last step in each time step; summary incidence statistics are typically calculated within the modules in which the event, such as births or infections, has occurred. But for simplicity, it is possible to embed all summary statistics within an action module.

  4. In EpiModel time loops, the initial network is coded as time 1. The only module that is run at this time step is the initialize.FUN module for initialization of things like network structure or disease status. All other functions will start running at time 2 and again for the number of time steps specified in control.net. In the the module examples below, you will notice that some data are created at time step 2 and then later continually updated for the following time steps: the coding is of the form if (at == 2) {create something for time steps 1 and 2} else {edit that thing for time steps 3+}. That is a shorthand way of creating new data structures without having the edit the initialization module, which is a little unwieldy (it handles a lot of tasks). It is your choice where and when to create and update data structures within dat.

  5. netsim handles attributes differently depending on whether they are embedded within the formation formula for the ERGM that was estimated in netest. By default, network attributes that are contained within the formula will automatically be transferred to dat$attr, but left at their fixed state for the duration of the simulation; the except is the attribute status, which of course can vary over time as a function of disease transmission. Passing an attribute on the initial network as a vertex attribute but not using within the ERGM will result in that attribute being removed from the network object. Therefore, attributes that are necessary for the epidemic mechanics but not the ERGM model (for example, age in Example 1 below) must be initialized on the dat network structure independent of the network object passed in through the output of netest in the est argument to netsim.

The examples below provide practice with this functionality. You can also explore these features and more both by reading through the code for netsim and for the other functions that it calls, or by stepping through a call to netsim using a browser tool like debug. Examining the code through the debugger works beautifully and seamlessly in an IDE like Rstudio.

Example 1: A Different Demographic Approach

In this example, we show how to build in an aging process, age-specific mortality, and a constant growth model for births. This will require one new module function and two replacement module functions, along with the needed parameters.

Aging Module

To introduce aging into a model, we need to write a new module function, here called aging, that performs the necessary processes. Writing this illustrates some key requirements of any module added to netsim. First, to describe what the function actually does: at the initial time step in the loop, which is 2 for the reasons noted above, persons in the population are randomly assigned an age between 18 and 49 years. At the subsequent time steps, their age is incremented by one month, as our time unit for the simulation is a month.

aging <- function(dat, at) {
  
  ## Attributes
  if (at == 2) {
    n <- sum(dat$attr$active == 1)
    dat$attr$age <- sample(18:49, n, replace = TRUE)
  } else {
    dat$attr$age <- dat$attr$age + 1/12
  }
  
  ## Summary statistics
  if (at == 2) {
    dat$epi$meanAge <- rep(mean(dat$attr$age, na.rm = TRUE), 2)
  } else {
    dat$epi$meanAge[at] <- mean(dat$attr$age, na.rm = TRUE)
  }
  
  return(dat)
}

As described above, all of the modules in netsim use the same two functional arguments, dat, and at. The former is the master data object that carries around individual attributes, summary outcomes, and other data to be output at the end of the simulation. All individual-level attributes are saved in the dat object in a sublist called attr. The active vector of attributes is a binary attribute indicating whether each person is active or not. Therefore, n in time step 2 queries the size of the active population so the sample function knows how many ages to produce. Everyone’s age is stored on that attr list in a new vector called age. That vector is modified for everyone at each subsequent time step. We can add summary statistics to any module by saving them onto dat$epi. Here, we create a vector called meanAge at time step 2 that contains the mean of everyone’s age in the population; at time steps 3 and onward, a new summary statistic and appended on that vector.

Death Module

Whereas the aging module defined above is an original process added to netsim that did not exist in built-in models, death already has a module. In this case, we will replace the existing death module for susceptibles with our new one. In the existing module, the probability of death is based on a fixed risk that may vary by disease status; the parameters that control this, ds.rate and di.rate and so on, are input in the parameters set through param.net. Here, we will replace approach with a simple death module that takes advantage of our new age attribute: we will model the probability of death as a nonlinear function of increasing age, like this:

ages <- 18:49
death.rates <- 1/(70*12 - ages*12)
par(mar = c(3.2, 3.2, 1, 1), mgp = c(2, 1, 0))
plot(ages, death.rates, pch = 20, xlab = "age", ylab = "Death Risk")