This tutorial documents how to use EpiModel to solve new deterministic compartmental models (DCMs). New model types incorporate model compartments, parameters, and structures different from the built-in SI/SIR/SIS model types in EpiModel. This extension tutorial assumes a solid familiarity with both R programming and epidemic model parameterization the other tutorials. If you are not familiar with DCMs or running this model class in EpiModel, consult the Basic DCMs with EpiModel tutorial.

The EpiModel DCM Framework

First load the EpiModel package:

library(EpiModel)

To start let’s examine the mathematical syntax for our basic SI model in a closed population (no births or deaths). We first run this model in EpiModel to see the normal syntax that avoids any specific reference to the model function specifying the mathematical structure. As detailed in the basic tutorial, EpiModel automatically chooses the correct model function as a result of the input parameters, initial conditions, and control settings.

param <- param.dcm(inf.prob = 0.5, act.rate = 0.25)
init <- init.dcm(s.num = 500, i.num = 1)
control <- control.dcm(type = "SI", nsteps = 100)
mod <- dcm(param, init, control)
plot(mod)

Model Functions

Within EpiModel, DCMs are solved using the deSolve package, using this framework to define a model as a function that includes calculations for dynamic rates and the system of differential equations for each compartment in the model. To see what one of these looks like, we can use the control.dcm argument print.mod = TRUE. When the dcm function is run with this, only the model structure is printed to the console.

control <- control.dcm(type = "SI", nsteps = 100, print.mod = TRUE)
mod <- dcm(param, init, control)
function(t, t0, parms) {
  with(as.list(c(t0, parms)), {

    # Derivations
    num <- s.num + i.num

    # Parameters
    lambda <- inf.prob * act.rate * i.num / num
    if (!is.null(parms$inter.eff) && t >= inter.start) {
      lambda <- lambda * (1 - inter.eff)
    }

    # Flows
    si.flow <- lambda * s.num

    # ODEs
    dS <- -si.flow
    dI <- si.flow

    # Output
    list(c(dS, dI,
           si.flow),
         num = num)
  })
}
<bytecode: 0x7f922a25d658>
<environment: namespace:EpiModel>

The standard specification for model functions is with the top two lines to wrap the input and output into a list structure. The mathematical calculation of main derivatives, dS and dI, are an arithmetic function of the force of infection, lambda and the size of the susceptible population, s.num. The base model functions include a modifier on the lambda to simulate an intervention if those parameters are specified in the model.

An alternative method for accessing the built-in model functions is to consult the internal help page for those functions.

?dcm.mods

There are currently a total of 12 model types: three disease types (SI, SIR, and SIS), two group number specifications (one versus two groups), and two demographic settings (open versus closed populations). An equivalent method to access the same model function as above is by printing the internal model function.

print(mod_SI_1g_cl)
function(t, t0, parms) {
  with(as.list(c(t0, parms)), {

    # Derivations
    num <- s.num + i.num

    # Parameters
    lambda <- inf.prob * act.rate * i.num / num
    if (!is.null(parms$inter.eff) && t >= inter.start) {
      lambda <- lambda * (1 - inter.eff)
    }

    # Flows
    si.flow <- lambda * s.num

    # ODEs
    dS <- -si.flow
    dI <- si.flow

    # Output
    list(c(dS, dI,
           si.flow),
         num = num)
  })
}
<bytecode: 0x7f922a25d658>
<environment: namespace:EpiModel>

Steps to Writing a Model Function

Each mathematical model solved using EpiModel must have its own R function. The core function structure should be the the same for each model:

  1. It must include the overall function structure, including the two lines at the top and two lines at the bottom. These lines should not be changed at all. They pass in the initial conditions and fixed parameters to be evaluated over time.
  2. Next input are all dynamic calculations, including varying parameters like lambda (the force of infection), which changes as a function of infected population size. Here, we include the calculation for num, the total population size. This is not dynamic in this particular model because of the closed population, but it allows for a shorthand formula for lambda. The values for s.num and i.num, the number susceptible and infected, are passed into this function via the initial conditions we set in init.dcm. Dynamic calculations also include composite statistics that you would like to output with the model (e.g., prevalence <- i.num/num).
  3. The differential equations should next be specified very similarly to how they are written in Stella or Madonna software. The names of the differential equations are arbitrary (e.g., dS below) but must be consistent with the output list. The derivatives for the model not only include the compartment sizes for the disease states, but also any flows in the model (si.flow in this model); that is because the flow sizes are calculated using the same method of numerical integration as the compartment sizes.
  4. The output list is what deSolve outputs in solving the model. This list should include, at the least, all the differential equations that you have defined in the function, including the flows. This list first specifies the differential equation in a combined vector: c(dS, dI, si.flow). Always list the derivative objects first, in this combined vector format, and in the same order that they are entered in the initial conditions list. After writing these equations, one can also output dynamic statistics from the model. These should be named in the manner below to ensure that the output object has appropriate column names.
  5. Finally, the function should be named something unique and relevant, and then saved to memory. We will show the details of that below.

Parameters, Initial Conditions, and Controls

To define the parameters, initial state sizes, and control settings for the model specifications use the the param.dcm, init.dcm, and control.dcm helper functions.

  1. Parameters are entered in the same manner they are with built-in models, but there must be consistency between the parameter names entered in param.dcm and the variables called within your new model function.
  2. The initial conditions are also entered similarly, but it is vital to remember that states must be in exacty the same order as the differential equations in the function output. If you are experiencing unexpected model results, this is the first thing to check. Also, remember to enter all flow sizes as both initial conditions here. The variable names for flows show end in .flow: this tells EpiModel to calculate the flow sizes at each time step as a lagged difference rather than a cumulative size. If the flows are not named to end with .flow, errors will be the result.
  3. Finally controls are also entered similarly, but one no longer uses the type parameter, and instead uses the new.mod parameter to specify the model function.

Details of these steps will be further described in the two examples below.

Example 1: SEIR Model

EpiModel includes a built-in SIR model, but here we show how to model an SEIR disease like Ebola. The E compartment in this disease is an exposed state in which the person is not infectious to others. Following some basic parameters for Ebola in the popular science to date, we model this disease using parameters for \(R_0\), the average durations spent in the exposed and infected phases, and the case fatality rate. This model will use a simplifying assumption that the only deaths in the population are due to Ebola.

Persons infected then either die from infection or recovery at an equal rate, defined by the average time spent in the infected state. Other simplifying assumptions here are that dead persons do not transmit disease (many infections have occurred this way) and that the rate of contacts remains the same over the infected phase (it probably declines, but there is not good data on this).

Model Function

We use four parameters in the model: R0 is the initial reproductive number, e.dur is the duration of the exposed state, i.dur is the duration of the infectious state, and cfr is the case fatality rate expressed as a proportion of those who will die among those infected. Given estimates on \(R_0\) and the duration of infection, we infer a simplistic lambda by mathematical rearrangement.

The four differential equations are defined as a function of state sizes and parameters. For the infected state, in-flows are from the exposed stage and out-flows are either to the recovered state (1 - the CFR will recover) or death (CFR will die).

SEIR <- function(t, t0, parms) {
  with(as.list(c(t0, parms)), {
    
    # Population size
    num <- s.num + e.num + i.num + r.num
    
    # Effective contact rate and FOI from a rearrangement of Beta * c * D
    ce <- R0 / i.dur
    lambda <- ce * i.num/num
  
    dS <- -lambda*s.num
    dE <- lambda*s.num - (1/e.dur)*e.num
    dI <- (1/e.dur)*e.num - (1 - cfr)*(1/i.dur)*i.num - cfr*(1/i.dur)*i.num
    dR <- (1 - cfr)*(1/i.dur)*i.num
    
    # Compartments and flows are part of the derivative vector
    # Other calculations to be output are outside the vector, but within the containing list
    list(c(dS, dE, dI, dR, 
           se.flow = lambda * s.num,
           ei.flow = (1/e.dur) * e.num,
           ir.flow = (1 - cfr)*(1/i.dur) * i.num,
           d.flow = cfr*(1/i.dur)*i.num),
         num = num,
         i.prev = i.num / num,
         ei.prev = (e.num + i.num)/num)
  })
}

In the model output, we include the four derivatives, but also several other calculated time-series to analyze later. These are all included as named elements of the list, with compartments and flows specified as derivatives in the named vector and other summary statistics included in the output list.

Model Parameters

For model parameters, we specify the following values. Our sensitivity analysis will vary the CFR, given the potential for variability of this across health systems. The starting population is roughly 1 million persons in which 10 have been exposed at the outset. Note that we specify the initial value for all the flows named in the model, with a starting value of 0 (i.e., there are no transitions at the outset of the model simulation). We simulate this model over 500 days, specifying that our model function is SEIR as defined above.

param <- param.dcm(R0 = 1.9, e.dur = 10, i.dur = 14, cfr = c(0.5, 0.7, 0.9))
init <- init.dcm(s.num = 1e6, e.num = 10, i.num = 0, r.num = 0,
                 se.flow = 0, ei.flow = 0, ir.flow = 0, d.flow = 0)
control <- control.dcm(nsteps = 500, dt = 1, new.mod = SEIR)
mod <- dcm(param, init, control)

The model shows that three models were run, where the CFR was varied. All the output in the model matches our model function specs.

mod
EpiModel Simulation
=======================
Model class: dcm

Simulation Summary
-----------------------
No. runs: 3
No. time steps: 500

Model Parameters
-----------------------
R0 = 1.9
e.dur = 10
i.dur = 14
cfr = 0.5 0.7 0.9

Model Output
-----------------------
Variables: s.num e.num i.num r.num se.flow ei.flow ir.flow 
d.flow num i.prev ei.prev

Model Results

Let’s first ignore the sensitivity analysis and examine the model with the middle CFR of 70%. Here is what the prevalence and incidence are from the model. In the initial epidemic, the prevalence curve is approximated by an exponential growth model. The peak incidence occurs roughly one year into the epidemic.

par(mfrow = c(1, 2))
plot(mod, y = "i.num", run = 2, main = "Prevalence")
plot(mod, y = "se.flow", run = 2, main = "Incidence")