vignettes/L1_FalseDiscoveryRate.Rmd
This vignette provides an overview of the primary function of the
linkage scenario portion of the phylosamp package: how to estimate the
false discovery rate given a sample size. In the examples provided, we
use the default `assumption` argument (multiple transmissions
and multiple links, `mtml`), though alternative assumptions
can also be specified.
The most basic function of the package is `translink_tdr()`, which
calculates the probability that an identified link represents a true
transmission event (the true discovery rate; the false discovery rate is
one minus this value). This calculation relies on the following parameters:
Param | Variable Name | Description |
---|---|---|
\(\eta\) | sensitivity | the sensitivity of the linkage criteria for identifying transmission links |
\(\chi\) | specificity | the specificity of the linkage criteria for identifying transmission links |
\(\rho\) | rho | the proportion of infections sampled |
\(M\) | M | the number of infections sampled |
\(R\) | R | the average reproductive number (also denoted \(R_\text{pop}\), see below) |
translink_tdr(sensitivity=0.99, specificity=0.95, rho=0.75, M=100, R=1)
## Calculating true discovery rate assuming multiple-transmission and multiple-linkage
## [1] 0.2334906
In other words, given a sample of 100 infections (representing 75% of the total population) and linkage criteria with a sensitivity of 99% and a specificity of 95% for identifying infections linked by transmission, fewer than 25% of identified pairs will represent true transmission events. Increasing the specificity to 99.5% substantially improves our ability to distinguish linked from unlinked pairs, raising the true discovery rate to roughly 75%:
translink_tdr(sensitivity=0.99, specificity=0.995, rho=0.75, M=100, R=1)
## Calculating true discovery rate assuming multiple-transmission and multiple-linkage
## [1] 0.7528517
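
The sensitivity of the true discovery rate to specificity can be explored over a grid of values. The sketch below does not call phylosamp. It uses a hypothetical helper, `tdr_mtml_sketch()`, built on a closed-form expression that we assume corresponds to the default `mtml` calculation because it reproduces the two values printed above; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of phylosamp): an assumed closed form for the
# true discovery rate under the multiple-transmission, multiple-linkage case.
tdr_mtml_sketch <- function(sensitivity, specificity, rho, M, R) {
  # Term proportional to the expected number of true links that are identified
  true_term <- sensitivity * rho * (R + 1)
  # Term proportional to the expected number of unlinked pairs flagged in error
  false_term <- (1 - specificity) * (M - 1 - rho * (R + 1))
  true_term / (true_term + false_term)
}

# Illustrative grid of specificities; all other parameters match the examples above
specificity_grid <- c(0.95, 0.99, 0.995, 0.999)
tdr_grid <- tdr_mtml_sketch(sensitivity = 0.99, specificity = specificity_grid,
                            rho = 0.75, M = 100, R = 1)

print(data.frame(specificity = specificity_grid, tdr = round(tdr_grid, 4)))
```

Because the number of unlinked pairs grows much faster with `M` than the number of true links does, even small changes in specificity shift the result considerably.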
The other core functions are designed to calculate the expected
number of true transmission pairs identified in the sample
(`translink_expected_links_true()`) and the total number of
linkages one can expect to identify given the sensitivity and
specificity of the linkage criteria and a particular sample size and
proportion (`translink_expected_links_obs()`).
translink_expected_links_true(sensitivity=0.99, rho=0.75, M=100, R=1)
## Calculating expected number of links assuming multiple-transmission and multiple-linkage
## [1] 74.25
translink_expected_links_obs(sensitivity=0.99, specificity=0.95,
rho=0.75, M=100, R=1)
## Calculating expected number of links assuming multiple-transmission and multiple-linkage
## [1] 318
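
These two quantities tie back to the true discovery rate reported earlier. As a quick arithmetic check, using only the values printed above (the variable names are illustrative), the ratio of expected true links to expected observed links recovers the first `translink_tdr()` result, and the difference gives the expected number of falsely identified links:

```r
# Values copied from the printed outputs above
expected_true <- 74.25  # translink_expected_links_true()
expected_obs <- 318     # translink_expected_links_obs()

# Observed links that do not correspond to a true transmission pair
expected_false <- expected_obs - expected_true

# Share of observed links that are true; compare with translink_tdr() above
tdr_check <- expected_true / expected_obs

print(expected_false)
print(tdr_check)
```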
It is important to recognize that \(R\) in these functions represents the average \(R\) in the sampled population (alternatively denoted \(R_\text{pop}\)). Because any sampling frame contains a finite number of cases, there will always be more cases than infection events (at minimum, all infectees in a transmission chain plus a single index case), so \(R_\text{pop}\leq1\). For outbreaks with a single introduction, \(R_\text{pop}\) is approximately equal to 1; sampling frames containing cases from separate introduction events will have lower values of \(R_\text{pop}\).
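
To make the bound concrete, consider a simplified accounting that is an assumption of this sketch rather than a formula taken from the package. Let the sampling frame contain \(N\) cases arising from \(k\) separate introductions, with every non-index case infected by another case in the frame. There are then \(N - k\) infection events, so

\[
R_\text{pop} \approx \frac{N - k}{N} = 1 - \frac{k}{N}.
\]

With \(N = 100\), a single introduction (\(k = 1\)) gives \(R_\text{pop} \approx 0.99\), while ten introductions (\(k = 10\)) give \(R_\text{pop} \approx 0.90\).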