This tutorial documents how to use EpiModel to solve new types of stochastic network models. New stochastic models require specifying new module functions. These functions may either replace the existing modules routinely called in `netsim`

, or they may supplement them by adding new processes into the chain of operations at each time step in the simulation. Note that for this tutorial, the terms birth and death are used for the more general terms of arrivals and departures found in the EpiModel documentation.

Modules for network model simulations are specified in the `control.net`

helper function. Within that function, modules are identified by the `.FUN`

parameters, the input of which is a standardized R function described below. Users can replace existing modules, for example `births.FUN`

by writing a new function for births and passing it into `control.net`

.

```
## function (type, nsteps, start = 1, nsims = 1, ncores = 1, depend,
## rec.rand = TRUE, a.rand = TRUE, d.rand = TRUE, tea.status = TRUE,
## attr.rules, epi.by, use.pids = TRUE, pid.prefix, initialize.FUN = initialize.net,
## departures.FUN = departures.net, arrivals.FUN = arrivals.net,
## recovery.FUN = recovery.net, edges_correct.FUN = edges_correct,
## resim_nets.FUN = resim_nets, infection.FUN = infection.net,
## get_prev.FUN = get_prev.net, verbose.FUN = verbose.net, module.order = NULL,
## set.control.ergm, set.control.stergm, save.nwstats = TRUE,
## nwstats.formula = "formation", delete.nodes = FALSE, save.transmat = TRUE,
## save.network = TRUE, save.other, verbose = TRUE, verbose.int = 1,
## skip.check = FALSE, ...)
## NULL
```

Each of the arguments listed with a `.FUN`

suffix is a pointer to tell `netsim`

which function to use to simulate those processes. Each call to `netsim`

begins by initializing the simulation with the steps specified in `initialize.FUN`

, the function for which is `initialize.net`

.

Additional modules may be added into this mix by setting arguments in `control.net`

with names ending in `.FUN`

but not among those listed. These new functions are passed through into the `control.net`

through the `...`

argument. We provide examples of building new modules below.

To develop new module functions, one must understand the structure of the objects created internally during a call to `netsim`

. Some of the information in these data is returned at the end of the call as part of the output (within the returned object of class `netsim`

), while some other is used only internally and discarded. Some of the features that are helpful to know include:

The main data structure storing the information passed among the modules is

`dat`

. Every module has only two input arguments:`dat`

and`at`

, the current time step. All the inputs, such as parameters, and the outputs, such as summary epidemiological statistics, are contained on`dat`

. Each module reads`dat`

, updates internal data within`dat`

, and then outputs`dat`

for the next module to work on.Attributes of individuals in the module are stored in a sublist to

`dat`

called`attr`

. This is a named list, with each element of the list as a separate attribute in a vector. All vectors are exactly the same length and correspond to one attribute per individual in the population.`netsim`

always creates five attributes by default:`active`

,`status`

, and`infTime`

,`entrTime`

, and`exitTime`

. The`active`

attribute keeps track of whether an individual is currently alive in the the population or has left it through some process such as death or out-migration. The`status`

attribute indicates current infection status as a character (“s”, “i”, “r”, for susceptible, infected, and recovered);`infTime`

records the time at which each individual was infected (`NA`

for susceptible nodes). The`entrTime`

and`exitTime`

contain the time steps at which the person enters or leaves the population.Summary model statistics are contained in a sublist to

`dat`

called`epi`

. This list stores information on the sizes of the population with each disease status at each time point, as well as the size of flows among those attribute states. The default approach for built-in models is to calculate summary*prevalence*statistics, such as the size or frequency of infection in the population, in the`get_prev.FUN`

module as the last step in each time step; summary*incidence*statistics are typically calculated within the modules in which the event, such as births or infections, has occurred. But for simplicity, it is possible to embed all summary statistics within an action module.In EpiModel time loops, the initial network is coded as time 1. The only module that is run at this time step is the

`initialize.FUN`

module for initialization of things like network structure or disease status. All other functions will start running at time 2 and again for the number of time steps specified in`control.net`

. In the the module examples below, you will notice that some data are created at time step 2 and then later continually updated for the following time steps: the coding is of the form`if (at == 2) {create something for time steps 1 and 2} else {edit that thing for time steps 3+}`

. That is a shorthand way of creating new data structures without having the edit the initialization module, which is a little unwieldy (it handles a lot of tasks). It is your choice where and when to create and update data structures within`dat`

.`netsim`

handles attributes differently depending on whether they are embedded within the formation formula for the ERGM that was estimated in`netest`

. By default, network attributes that are contained within the formula will automatically be transferred to`dat$attr`

, but left at their fixed state for the duration of the simulation; the except is the attribute`status`

, which of course can vary over time as a function of disease transmission. Passing an attribute on the initial network as a vertex attribute but not using within the ERGM will result in that attribute being removed from the network object. Therefore, attributes that are necessary for the epidemic mechanics but not the ERGM model (for example, age in Example 1 below) must be initialized on the`dat`

network structure independent of the network object passed in through the output of`netest`

in the`est`

argument to`netsim`

.

The examples below provide practice with this functionality. You can also explore these features and more both by reading through the code for `netsim`

and for the other functions that it calls, or by stepping through a call to `netsim`

using a browser tool like `debug`

. Examining the code through the debugger works beautifully and seamlessly in an IDE like Rstudio.

In this example, we show how to build in an aging process, age-specific mortality, and a constant growth model for births. This will require one *new* module function and two *replacement* module functions, along with the needed parameters.

To introduce aging into a model, we need to write a new module function, here called `aging`

, that performs the necessary processes. Writing this illustrates some key requirements of any module added to `netsim`

. First, to describe what the function actually does: at the initial time step in the loop, which is 2 for the reasons noted above, persons in the population are randomly assigned an age between 18 and 49 years. At the subsequent time steps, their age is incremented by one month, as our time unit for the simulation is a month.

```
aging <- function(dat, at) {
## Attributes
if (at == 2) {
n <- sum(dat$attr$active == 1)
dat$attr$age <- sample(18:49, n, replace = TRUE)
} else {
dat$attr$age <- dat$attr$age + 1/12
}
## Summary statistics
if (at == 2) {
dat$epi$meanAge <- rep(mean(dat$attr$age, na.rm = TRUE), 2)
} else {
dat$epi$meanAge[at] <- mean(dat$attr$age, na.rm = TRUE)
}
return(dat)
}
```

As described above, all of the modules in `netsim`

use the same two functional arguments, `dat`

, and `at`

. The former is the master data object that carries around individual attributes, summary outcomes, and other data to be output at the end of the simulation. All individual-level attributes are saved in the `dat`

object in a sublist called `attr`

. The `active`

vector of attributes is a binary attribute indicating whether each person is active or not. Therefore, `n`

in time step 2 queries the size of the active population so the `sample`

function knows how many ages to produce. Everyone’s age is stored on that `attr`

list in a new vector called `age`

. That vector is modified for everyone at each subsequent time step. We can add summary statistics to any module by saving them onto `dat$epi`

. Here, we create a vector called `meanAge`

at time step 2 that contains the mean of everyone’s age in the population; at time steps 3 and onward, a new summary statistic and appended on that vector.

Whereas the aging module defined above is an original process added to `netsim`

that did not exist in built-in models, death already has a module. In this case, we will replace the existing death module for susceptibles with our new one. In the existing module, the probability of death is based on a fixed risk that may vary by disease status; the parameters that control this, `ds.rate`

and `di.rate`

and so on, are input in the parameters set through `param.net`

. Here, we will replace approach with a simple death module that takes advantage of our new age attribute: we will model the probability of death as a nonlinear function of increasing age, like this:

```
ages <- 18:49
death.rates <- 1/(70*12 - ages*12)
par(mar = c(3.2, 3.2, 1, 1), mgp = c(2, 1, 0))
plot(ages, death.rates, pch = 20, xlab = "age", ylab = "Death Risk")
```

Since we are using monthly time units, the death rates need to be specified with that same scale. Our life expectancy will be 70 years: each month closer to that age one gets, the risk of death increases. This is not necessarily meant to fit a real-world demographic life table but the age-specific mortality rates generally *do* follow a similar form.

In our death module, life expectancy will be a variable parameter, with the death rate calculation to be set in the module function. Who dies at any time step will be based on random draws from binomial distributions which probabilities equal to their age-specific risks.

```
dfunc <- function(dat, at) {
# Parameters
idsElig <- which(dat$attr$active == 1)
nElig <- length(idsElig)
nDeaths <- 0
# Processes
if (nElig > 0) {
ages <- dat$attr$age[idsElig]
life.expt <- dat$param$life.expt
death.rates <- pmin(1, 1/(life.expt*12 - ages*12))
vecDeaths <- which(rbinom(nElig, 1, death.rates) == 1)
idsDeaths <- idsElig[vecDeaths]
nDeaths <- length(idsDeaths)
# Update nodal attributes on attr and networkDynamic object
if (nDeaths > 0) {
dat$attr$active[idsDeaths] <- 0
dat$attr$exitTime[idsDeaths] <- at
dat$nw <- deactivate.vertices(dat$nw, onset = at, terminus = Inf,
v = idsDeaths, deactivate.edges = TRUE)
}
}
# Summary statistics
if (at == 2) {
dat$epi$d.flow <- c(0, nDeaths)
} else {
dat$epi$d.flow[at] <- nDeaths
}
return(dat)
}
```

The IDs of those eligible for the transition are determined by their current value of the active variable, again held in the `attr`

list on `dat`

. If there are any persons eligible, then the process will proceed. The ages of only those eligible are abstracted from the `attr`

list. The life expectancy parameter is queried from the parameter list, `param`

, containing the parameters passed into the model from `param.net`

as described below. An individual death rate is calculated for each person based on their age and that life expectancy. With those rates, a Bernoulli coin flip is performed for each death-eligible person. If there are any deaths, the active attribute must be toggled from 1 to 0 for those people.

Within the network object, located at `dat$nw`

, each of the dying nodes are “deactivated” from the network. This process removes their existing edges, if they are currently within a partnership, and disallows them from having future partnerships. Similar to our aging module, we count the number of deaths that have occurred within the time step and save that as a flow size in the `d.flow`

vector within the `epi`

sublist. At time step 2, that vector is created; at time steps 3 and onward, the summary statistic is appended onto that vector.

The built-in birth module for `netsim`

simulates the number of new births/arrivals at each time step as a function of a fixed rate, contained in the `b.rate`

parameter specified in `param.net`

, and the current population size. In the case of bipartite models, it is possible to specify that the size of the first group only should be considered (e.g., in case that mode 1 represents women in the population). In this example, we will use a different approach by using a constant growth model: the size of the population is expected to grow linearly at each time step based on a fixed growth rate at the outset of the simulation. Therefore, the number of new births that must be generated at each time step is the difference between the expected population size and the current population size.

```
bfunc <- function(dat, at) {
# Variables
growth.rate <- dat$param$growth.rate
exptPopSize <- dat$epi$num[1] * (1 + growth.rate * at)
n <- network.size(dat$nw)
tea.status <- dat$control$tea.status
numNeeded <- exptPopSize - sum(dat$attr$active == 1)
if (numNeeded > 0) {
nBirths <- rpois(1, numNeeded)
} else {
nBirths <- 0
}
if (nBirths > 0) {
dat$nw <- add.vertices(dat$nw, nv = nBirths)
newNodes <- (n + 1):(n + nBirths)
dat$nw <- activate.vertices(dat$nw, onset = at, terminus = Inf, v = newNodes)
}
# Update attributes
if (nBirths > 0) {
dat$attr$active <- c(dat$attr$active, rep(1, nBirths))
dat$attr$status <- c(dat$attr$status, rep("s", nBirths))
dat$attr$infTime <- c(dat$attr$infTime, rep(NA, nBirths))
dat$attr$entrTime <- c(dat$attr$entrTime, rep(at, nBirths))
dat$attr$exitTime <- c(dat$attr$exitTime, rep(NA, nBirths))
dat$attr$age <- c(dat$attr$age, rep(18, nBirths))
if (tea.status == TRUE) {
dat$nw <- activate.vertex.attribute(dat$nw, prefix = "testatus",
value = 0, onset = at,
terminus = Inf, v = newNodes)
}
}
# Summary statistics
if (at == 2) {
dat$epi$b.flow <- c(0, nBirths)
} else {
dat$epi$b.flow[at] <- nBirths
}
return(dat)
}
```

As with the life expectancy parameter, the growth rate will be a parameter that we pass into the module function through `param.net`

and which will be stored on `dat$param`

. The expected population size uses that the starting population size, stored in the summary statistics list `epi`

, where the number needed is the difference between the expected and current population size, calculated again with the active attribute on `dat`

.

Any birth function must do two things: add new nodes onto the network object and then update all of their attributes with their initial values. In our function, the `add.vertices`

function will add new nodes onto the network, and the `activate.vertices`

function will allow them to actively form edges within the model. We will need to set the 5 standard attributes for these new incoming births, as well as our new age attribute, on the `attr`

list. The data for the new births will positionally be located at the end of the vector, so we append the values specified above using the `c`

function. So new births will be active, disease susceptible, have no infection time, have an entry time equal to `at`

, have no exit time, and enter the population at age 18. We additionally use a time-varying attribute on the network object for disease status, `testatus`

for temporally-extended status, to track the changes in disease status for each node at each time point. This attribute value will automatically be changed to 1 if the person becomes infected.

For these examples, we will simplify things by fitting a basic random-mixing model for our ERGM. For the dissolution coefficient adjustment we take the mean of the death rates as a basic approximation to the overall mortality schedule for the population.

```
nw <- network.initialize(500, directed = FALSE)
est <- netest(nw, formation = ~edges, target.stats = 150,
coef.diss = dissolution_coefs(~offset(edges), 60, mean(death.rates)))
```

`## Starting maximum pseudolikelihood estimation (MPLE):`

`## Evaluating the predictor and response matrix.`

`## Maximizing the pseudolikelihood.`

`## Finished MPLE.`

`## Stopping at the initial estimate.`

To parameterize the epidemic model, it is necessary to collect all the new elements that we created. For the model parameters, the transmission modules are unchanged and use the same infection probability and act rate as the built-in models. Our new birth module requires a growth rate parameter. Since our time units are months, we specify a 1% yearly growth rate in terms of monthly growth. The death module requires an input of a life expectancy. The module expects that in terms of years. Initial conditions are specified like a built-in SI model.

```
param <- param.net(inf.prob = 0.15, growth.rate = 0.00083, life.expt = 70)
init <- init.net(i.num = 50)
```

For controls settings, we specify the model type, number of simulations and time steps per simulation, similar to any built-in model. To replace the default death module, it is necessary to input the new `dfunc`

function name as the `deaths.FUN`

parameter. A similar approach is used for the birth module. Both of these functions must be sourced into memory before running `control.net`

: the control function will try to look up this data and save it within the object output. To add a new module into `netsim`

, give the module a name not among those modules available for replacement; but the module name must end with `.FUN`

to register as a module. One important note on ordering: within a time step, replacement modules will be run in the order in which they are listed in the `control.net`

help documentation; additional modules like `aging.FUN`

will be run in the order in which you list them in `control.net`

. Additional modules will run first and existing/replacement modules will run second. This ordering of all the modules may be changed and explicitly specified using the `module.order`

argument in `control.net`

.

```
control <- control.net(type = "SI", nsims = 4, ncores = 4, nsteps = 250,
departures.FUN = dfunc, arrivals.FUN = bfunc, aging.FUN = aging,
depend = TRUE, save.network = FALSE)
```

Finally, the input for that module argument is the function itself. Because we have not input any parameters that automatically trigger a dependent network model (the vital dynamics parameters), it is necessary to explicitly set that `depend=TRUE`

.

The model is simulated using the standard method of inputting the fitted network model object, the parameters, initial conditions, and control settings.

`mod <- netsim(est, param, init, control)`

Printing the model object shows that we now have non-standard birth and death output for flows. Additionally, we have the `meanAge`

data saved as other output (summary statistics saved in `epi`

named ending with `.num`

will automatically be classified as compartments and `.flow`

as flows).

`mod`

```
## EpiModel Simulation
## =======================
## Model class: netsim
##
## Simulation Summary
## -----------------------
## Model type: SI
## No. simulations: 4
## No. time steps: 250
## No. NW modes: 1
##
## Model Parameters
## -----------------------
## inf.prob = 0.15
## growth.rate = 0.00083
## life.expt = 70
## act.rate = 1
##
## Model Output
## -----------------------
## Variables: s.num i.num num meanAge d.flow b.flow si.flow
## Transsmissions: sim1 ... sim4
```

Basic model plots show the simulation results for the prevalence and absolute state sizes over time. The same sort of graphical plotting options available for any built-in model are also available for these new expansion models.

```
par(mfrow = c(1,2))
plot(mod, main = "State Prevalences")
plot(mod, popfrac = FALSE, main = "State Sizes", sim.lines = TRUE,
qnts = FALSE, mean.smooth = FALSE)
```