EpiModel provides a variety of utility functions to help in estimating, simulating, and analyzing the output of a stochastic network epidemic model. In this short vignette, we provide a basic overview of some of these core functions and how they are used in network modeling.

# Estimation Utilities

These utility functions are used in the estimation phase of stochastic network modeling. These build on the multitude of functions already available for dynamic exponential-family random graph models in the network, networkDynamic, and tergm packages.

## Calculating Dissolution Coefficients with dissolution_coefs

The dissolution_coefs function calculates dissolution coefficients from a vector of partnership durations, and then applies an adjustment to those coefficients to account for a competing risk of dissolution due to death of one or more members of that partnership.

First, we discuss the transformation from duration to coefficients. Typically in models based on empirical data, one would start with an estimate of the mean duration of partnerships, with appropriate consideration for both right-censoring for ongoing partnerships and left truncation for partnerships not queried before a time point. With the resulting means, duration information comes into a dynamic network model estimation through closed-form calculation of the fixed coefficient values that the model should stochastically target. Some of the theory and mathematics for this are further developed in the vignette for the networkDynamic package. The first goal of the dissolution_coefs function in EpiModel is to make these calculations behind-the-scenes.

Second, the duration estimate from the empirical data may have not accounted for the impact of deaths on partnership dissolution. The overall rate of partnership dissolution is therefore comprised of an endogenous probability of partnerships dissolved based on an assumed static population in which there is no death, and also the exogenous probability of partnership dissolution due to death. The dissolution_coefs function applies an adjustment to the transformed duration estimate by accounting for the competing risks. The formula for the death correction is:

$logit \left( 1 -\frac{P(E_t) - P(N_t)}{P(\bar{N_t})} \right)$

where $$P(E_t)$$ is the overall probability of a partnership dissolving at time $$t$$, $$P(N_t)$$ is the probability of either node dying at time $$t$$, and thus $$P(\bar{N_t})$$ is the probability of both surviving at time $$t$$.

In the first example, we specify the dissolution model as a simple edges term. The mean duration of all partnerships in the network in the absense of the competing risk of death is 25 (months, although the time unit is arbitrary). We obtain the coefficient from the dissolution_coefs function by specifying these two arguments to the function.

dissolution <- ~offset(edges)
duration <- 25
coefs <- dissolution_coefs(dissolution, duration)
coefs
Dissolution Coefficients
=======================
Dissolution Model: ~offset(edges)
Target Statistics: 25
Crude Coefficient: 3.178054
Mortality/Exit Rate: 0
Adjusted Coefficient: 3.178054

Four pieces of information are stored in the coefs object, as shown in the printed output. Note that the adjusted and crude coefficient are the same in this model because there has been no adjustment for exogenous deaths. The default death rate that is an exongenous influence on partnership dissolution is zero. The dissolution_coefs class object is passed directly to the netest function for estimation.

The next example shows that adjusting for that influence of death with a rate of 1/1000: this implies under a exponential or geometically distributed time-to-death, the average lifespan is 1000 (for example, months or 83.3 years). The adjustment raises the crude coefficient for dissolution increases from 3.18 to 3.23: existing relationships now persist longer with respect to their propensity for dissolution from “natural causes”.

dissolution_coefs(dissolution, duration, d.rate = 0.001)
Dissolution Coefficients
=======================
Dissolution Model: ~offset(edges)
Target Statistics: 25
Crude Coefficient: 3.178054
Mortality/Exit Rate: 0.001
Adjusted Coefficient: 3.229321

Currently, dissolution modeling in EpiModel is currently limited to ~edges only models: one specifies a single coefficient that is a function of the mean duration of all partnerships, and in the network simulation, the duration of partnerships is exponentially distributed around that mean.

## Checking Balance of Degree Distributions with check_bip_degdist

Bipartite (two-mode) network models that incorporate mode-specific degree distributions must be balanced. That is, the number of partnerships implied by a population size in the first mode with a given its degree distribution must equal the number of partnerships implied by a population size in the second mode with a given degree distribution.

The check_bip_degdist function helps ensure this balance given the population sizes in each of the modes and the fractional degree distributions in each. The population sizes are simply specified as the number in each mode. The degree distributions are the fractions of persons in each mode with a degree of 0, a degree of 1, a degree of 2 and so on. This information is routinely estimated from empirical datasets to parameterize models.

First, an imbalanced distribution is shown below. The first and third columns show the input distributions, the second and fourth columns show the number of persons implied by that distribution and the mode size. The total is actually different: it is the number of partnerships overall and is obtained by summing the products of degree value and node count. Here, the function returns an imbalanced distribution message that the total partnerships between the modes do not match.

# An imbalanced distribution
check_bip_degdist(num.m1 = 500, num.m2 = 500,
deg.dist.m1 = c(0.40, 0.55, 0.03, 0.02),
deg.dist.m2 = c(0.48, 0.41, 0.08, 0.03))
Bipartite Degree Distribution Check
=============================================
m1.dist   m1.cnt   m2.dist   m2.cnt
Deg0       0.40      200      0.48      240
Deg1       0.55      275      0.41      205
Deg2       0.03       15      0.08       40
Deg3       0.02       10      0.03       15
Edges      1.00      335      1.00      330
=============================================
** Mode 1 Edges > Mode 2 Edges: 0.015 Rel Diff 

So we alter the distribution by moving 5 people in mode 1 who had a degree of 3 to a degree of 2. Another possibility would be to increase the size of the mode 2 population slightly, so that count products would grow. In either case, it is necessary to think about and provide scientific justification for these types of changes if working from empirical data.

# A balanced distribution
targets <- check_bip_degdist(num.m1 = 500, num.m2 = 500,
deg.dist.m1 = c(0.40, 0.55, 0.04, 0.01),
deg.dist.m2 = c(0.48, 0.41, 0.08, 0.03))
Bipartite Degree Distribution Check
=============================================
m1.dist   m1.cnt   m2.dist   m2.cnt
Deg0       0.40      200      0.48      240
Deg1       0.55      275      0.41      205
Deg2       0.04       20      0.08       40
Deg3       0.01        5      0.03       15
Edges      1.00      330      1.00      330
=============================================
** Edges balanced ** 
targets
[1] 330 200 275  20   5 240 205  40  15

Since these numbers typically enter into the target statistics in a network model, one may save the function output to an object, which contains the total edges and then nodal counts for mode 1 followed by mode 2.

Last updated: 2017-06-01 with EpiModel v1.5.0