The nCoV Sandbox is a running analytic blog we are “writing” as we try to apply some methods we had in the very early stages of development, and some old friends, to the 2019 nCoV outbreak. It is also our attempt to run some analyses to get our own handle on, and keep up to date on, the epidemiology of the emerging epidemic.
This is a bit of an exercise in radical transparency, and things are going to start out very messy… but will hopefully get cleaner and more meaningful as things go. The old stuff will (for the moment) remain at the bottom for posterity.
One of the big questions about the nCoV-2019 epidemic is how we reconcile the apparent \(R_0\) of the large epidemic in Wuhan with the apparent lack of onward transmission elsewhere. Jess Metcalf and I do a rough analysis of the overdispersion or control that would be needed to reconcile these observations here.
The attempts at line list data from public sources are not really keeping up with the pace of the epidemic as detailed media reports, etc. become less common, so I have been generally ignoring them. But perhaps it is time to check how recent updates to the “Kudos” line list have changed the picture, if at all.
The source of line list data is starting to shift as countries get their first few cases, which still make it to media reports, etc. Almost all early cases in the line list are from China, while the later ones are from elsewhere.
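To make the overdispersion idea concrete, here is a minimal sketch, not the analysis referred to above. It assumes secondary cases follow a negative binomial distribution with mean \(R_0\) and dispersion parameter \(k\), and asks how likely it is that an introduced case infects nobody. The values of `R0`, `k_values` and `n_intro` are illustrative assumptions, not estimates.

```r
# Probability that one introduced case causes zero onward infections when
# secondary cases are negative binomial with mean R0 and dispersion k.
# All parameter values below are assumptions chosen for illustration.
p_no_onward <- function(R0, k) {
  dnbinom(0, size = k, mu = R0)  # equals (1 + R0 / k)^(-k)
}

R0 <- 2.5                        # assumed reproductive number
k_values <- c(0.1, 0.5, 1, 10)   # smaller k means more overdispersion
p_zero <- sapply(k_values, p_no_onward, R0 = R0)
data.frame(k = k_values, p_zero = round(p_zero, 2))

# Chance that n_intro independent introductions all fail to transmit onward
n_intro <- 20                    # hypothetical number of exported cases
data.frame(k = k_values, p_all_fail = p_zero^n_intro)
```

With strong overdispersion (small `k`) most introductions fizzle out even when \(R_0\) is well above 1, which is one way a large epidemic in one place and little onward transmission elsewhere could coexist.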
ggplot(kudos, aes(x=symptom_onset, fill=as.factor(country))) +
geom_bar()
Not much more data on cases, but worth a brief reprise on the age distribution vs. MERS. Automating this in functions as well.
Figure: Odds ratio of death versus 50-59 year olds by age group for MERS-CoV and nCoV-2019. Log-scale.
Table: Odds ratio of death by age group for MERS-CoV and nCoV-2019
age_cat | nCoV | MERS |
---|---|---|
0-9 | - | 0.41 (0.11, 1.26) |
10-19 | - | 0.19 (0.05, 0.52) |
20-29 | - | 0.22 (0.12, 0.41) |
30-39 | 0.45 (0.06, 2.36) | 0.20 (0.11, 0.35) |
40-49 | 0.47 (0.06, 2.50) | 0.52 (0.31, 0.87) |
50-59 | 1 | 1 |
60-69 | 6.69 (2.24, 24.82) | 2.86 (1.59, 5.26) |
70+ | 19.80 (6.36, 76.63) | 4.92 (2.79, 8.95) |
The few new cases have not changed the picture much. How does the age distribution of reported cases in China compare with the age distribution of the national population? Age distribution data from https://www.populationpyramid.net/china/2019/.
Goal is to do a better job of recreating the daily case counts in each area so we have implied epidemic curves to work with for some of the more sophisticated stuff (hopefully) to come.
First let’s load in the data. Currently using only confirmed cases (driven a bit by data source), but unclear how long this will be viable.
Things look a little funny prior to the first, but this does seem like it should give a rough pseudo epidemic curve for the purpose of analysis.
First goal for the day: dig in deeper on the age-specific data and compare with the MERS-CoV data in a bit more detail.
First, as always, load and summarize the most recent Kudos line list (https://docs.google.com/spreadsheets/d/1jS24DjSPVWa4iuxuD4OAXrE3QeI8c9BC1hSlqr-NMiU/edit#gid=1187587451)
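Recreating daily counts from cumulative reports mostly comes down to differencing the cumulative series. A minimal sketch of that step follows. The function name `cum_to_daily` and the numbers in `example` are hypothetical and for illustration only, not the data used here.

```r
# Turn a cumulative confirmed-case series into implied daily counts.
cum_to_daily <- function(cumulative) {
  # Reported cumulative totals sometimes dip after corrections; forcing the
  # series to be non-decreasing avoids negative daily counts.
  monotone <- cummax(cumulative)
  # The first value keeps all cases reported before the series starts.
  diff(c(0, monotone))
}

# Made-up cumulative counts, including one downward revision on day 3
example <- data.frame(
  date = as.Date("2020-01-20") + 0:4,
  confirmed = c(4, 9, 8, 15, 27)
)
example$daily <- cum_to_daily(example$confirmed)
example
```

Because the first entry lumps together everything reported before the series begins, the earliest part of such a pseudo epidemic curve should be read with caution.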
## Warning: The following named parsers don't match the column names: date
## Warning: Removed 31 rows containing non-finite values (stat_count).
Note that we don’t have any line list information on the deaths that occurred before around 1/15 in this line list. Moving forward with this data, comparing with MERS-CoV data from Saudi Arabia through summer 2014.
Figure: Odds ratio of death by age group for MERS-CoV and nCoV-2019. Log-scale.
Table: Odds ratio of death by age group for MERS-CoV and nCoV-2019
age_cat | nCoV | MERS |
---|---|---|
0-9 | - | 0.41 (0.11, 1.26) |
10-19 | - | 0.19 (0.05, 0.52) |
20-29 | - | 0.22 (0.12, 0.41) |
30-39 | 0.14 (0.01, 1.02) | 0.20 (0.11, 0.35) |
40-49 | 0.16 (0.01, 1.19) | 0.52 (0.31, 0.87) |
50-59 | 1 | 1 |
60-69 | 5.88 (1.80, 23.39) | 2.86 (1.59, 5.26) |
70+ | 17.71 (4.74, 82.15) | 4.92 (2.79, 8.95) |
Takeaways from the OR of death comparison
What if nCoV symptomatic and death rates were identical to those of MERS-CoV? How many cases would the current line list represent? How about the full data if they follow a similar age distribution?
Using mortality and infection rates from this paper in AJE on MERS-CoV symptomatic ratios and IFRs (10.1093/aje/kwv452), and a lot of assumptions:
Table: Implied number of cases and needed ratio of IFR in nCoV and MERS-CoV to reconcile deaths and implied cases.
Age | pr alive | pr dead | est. cases | est. dead | MERS symptomatic ratio | MERS IFR | Implied Infections by SR | Implied Infections by IFR | IFR Ratio to Reconcile |
---|---|---|---|---|---|---|---|---|---|
0-9 | 0.02 | 0.00 | 89.48 | 0.00 | 0.11 | 0.10 | 813.45 | 0.00 | 0.00 |
10-19 | 0.04 | 0.00 | 178.96 | 0.00 | 0.11 | 0.05 | 1626.91 | 0.00 | 0.00 |
20-29 | 0.10 | 0.00 | 425.03 | 0.00 | 0.14 | 0.05 | 3035.93 | 0.00 | 0.00 |
30-39 | 0.22 | 0.03 | 1006.65 | 2.74 | 0.23 | 0.08 | 4376.74 | 34.29 | 0.01 |
40-49 | 0.20 | 0.03 | 872.43 | 2.74 | 0.39 | 0.17 | 2237.00 | 16.14 | 0.01 |
50-59 | 0.14 | 0.10 | 648.73 | 10.97 | 0.60 | 0.38 | 1081.22 | 28.88 | 0.03 |
60-69 | 0.16 | 0.41 | 738.21 | 43.90 | 0.78 | 0.63 | 946.42 | 69.68 | 0.07 |
70+ | 0.12 | 0.44 | 514.51 | 46.64 | 0.88 | 0.79 | 584.67 | 59.04 | 0.10 |
Overall | 1.00 | 1.00 | 4474.00 | 107.00 | 0.46 | 0.31 | 14702.34 | 208.03 | 0.02 |
So, if the symptomatic ratio for nCoV 2019 is similar to what was implied by the confirmed cases of MERS-CoV (and other assumptions hold), the following things are true:
Note that this is interesting, but it is the result of a thought experiment only!
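The arithmetic behind the table can be sketched as follows. This is a reading of how the columns relate to each other, inferred from the printed numbers rather than taken from the original code: implied infections by SR is estimated cases divided by the MERS symptomatic ratio, implied infections by IFR is estimated deaths divided by the MERS IFR, and the reconciling ratio is the IFR implied by the first of these divided by the MERS IFR. Inputs are the already rounded table values, so results can differ slightly in the last digit.

```r
# Values copied (already rounded) from the table above
tab <- data.frame(
  age       = c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"),
  est_cases = c(89.48, 178.96, 425.03, 1006.65, 872.43, 648.73, 738.21, 514.51),
  est_dead  = c(0, 0, 0, 2.74, 2.74, 10.97, 43.90, 46.64),
  mers_sr   = c(0.11, 0.11, 0.14, 0.23, 0.39, 0.60, 0.78, 0.88),
  mers_ifr  = c(0.10, 0.05, 0.05, 0.08, 0.17, 0.38, 0.63, 0.79)
)

# Assumed relationships between columns (inferred, not from the source code)
tab$inf_by_sr  <- tab$est_cases / tab$mers_sr              # infections if SR matched MERS
tab$inf_by_ifr <- tab$est_dead / tab$mers_ifr              # infections if IFR matched MERS
tab$ifr_ratio  <- (tab$est_dead / tab$inf_by_sr) / tab$mers_ifr

print(cbind(tab["age"], round(tab[c("inf_by_sr", "inf_by_ifr", "ifr_ratio")], 2)))
sum(tab$inf_by_sr)  # total implied infections under the MERS symptomatic ratio
```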
Three goals for today:
Age distribution and epicurve for cases where we have individual line list information.
## Warning: Removed 18 rows containing non-finite values (stat_count).
Now let’s look at some basic information on survival by age group and gender.
## Waiting for profiling to be done...
age_cat | alive | dead | OR | CI |
---|---|---|---|---|
(0,10] | 2 | 0 | 0.0000000 | NA,2.4442929329126e+305 |
(10,20] | 4 | 0 | 0.0000000 | NA,1.54362475336342e+123 |
(20,30] | 16 | 0 | 0.0000000 | NA,5.68642814649887e+36 |
(30,40] | 25 | 1 | 0.1900000 | 0.01,1.41 |
(40,50] | 26 | 1 | 0.1826923 | 0.01,1.36 |
(50,60] | 19 | 4 | 1.0000000 | - |
(60,70] | 10 | 16 | 7.6000000 | 2.15,32.49 |
(70,80] | 1 | 7 | 33.2500000 | 4.34,723.13 |
(80,90] | 1 | 10 | 47.5000000 | 6.56,1014.13 |
## Waiting for profiling to be done...
gender | alive | dead | OR | CI |
---|---|---|---|---|
male | 72 | 27 | 1.0000000 | - |
female | 35 | 12 | 0.9142857 | 0.58,1.99 |
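For reference, odds ratios like those in the age table can be reproduced with a logistic regression that uses (50,60] as the reference group. The sketch below is a base R approximation, not the functions used for this post. It drops the age groups with zero deaths, whose odds ratios are not estimable, and it uses Wald intervals, so the intervals will not match the profile likelihood intervals shown in the tables.

```r
# Counts copied from the age table above (groups with zero deaths omitted)
counts <- data.frame(
  age_cat = c("(30,40]", "(40,50]", "(50,60]", "(60,70]", "(70,80]", "(80,90]"),
  alive   = c(25, 26, 19, 10, 1, 1),
  dead    = c(1, 1, 4, 16, 7, 10)
)
# Make (50,60] the reference level so every OR is relative to that group
counts$age_cat <- relevel(factor(counts$age_cat), ref = "(50,60]")

fit <- glm(cbind(dead, alive) ~ age_cat, family = binomial, data = counts)
est <- coef(summary(fit))

or <- exp(est[-1, "Estimate"])  # drop the intercept row
# Wald 95% intervals: an assumption of this sketch, not the method in the tables
lower <- exp(est[-1, "Estimate"] - 1.96 * est[-1, "Std. Error"])
upper <- exp(est[-1, "Estimate"] + 1.96 * est[-1, "Std. Error"])
round(cbind(OR = or, lower = lower, upper = upper), 2)
```

With one parameter per age group, the fitted odds ratios are just the crude ratios of odds, for example (16/10)/(4/19) = 7.6 for (60,70].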
Takeaways from the line list data:
Now let’s start to look at the aggregate cumulative case data, as that is going to be the most widely available and complete, and the basis for most of our predictive style analyses.
First we will focus on Mainland China, Hong Kong and Macau.
jhucsse <- read_JHUCSSE_cases("2020-01-25 23:59", append_wiki = TRUE)
## Warning: All formats failed to parse. No formats found.
##Filter to China:
jhucsse_china <- jhucsse %>%
filter(Country_Region%in%c("Mainland China", "Macau", "Hong Kong"))
jhucsse_china %>% drop_na(Confirmed) %>%
filter(Update>"2020-01-14") %>%
ggplot(aes(x=Update, y=Confirmed, col=Province_State)) +
geom_line() + scale_y_log10()
That was looking at all provinces, so let’s narrow it to places that at some point experienced at least 25 confirmed cases and look vs. a straight log-linear line.
Note that this is not quite right for real exponential growth since we are looking at the cumulative report rather than the
tmp <- jhucsse_china%>%filter(Confirmed>=25)
tmp <- unique(tmp$Province_State)
## Look at consistency in exponential growth by area.
analyze <- jhucsse_china %>% drop_na(Confirmed) %>%
filter(Update>"2020-01-14") %>%
filter(Province_State%in%tmp)
#Get the slopes for each province.
slopes <- analyze %>% nest(-Province_State) %>%
mutate(slope=map_dbl(data, ~lm(log10(.$Confirmed)~as.Date(.$Update))$coef[2])) %>%
select(-data) %>% mutate(exp_scale=10^(slope))
## Warning: All elements of `...` must be named.
## Did you want `data = c(Country_Region, Update, Confirmed, Deaths, Recovered, Suspected)`?
kable(slopes, digits=2)
Province_State | slope | exp_scale |
---|---|---|
Hubei | 0.14 | 1.37 |
Zhejiang | 0.29 | 1.96 |
Guangdong | 0.24 | 1.75 |
Henan | 0.61 | 4.07 |
Chongqing | 0.46 | 2.89 |
Hunan | 0.44 | 2.77 |
Anhui | 0.41 | 2.58 |
Beijing | 0.18 | 1.52 |
Sichuan | 0.37 | 2.35 |
Shanghai | 0.20 | 1.58 |
Shandong | 0.41 | 2.55 |
Jiangxi | 0.36 | 2.27 |
Guangxi | 0.41 | 2.57 |
Jiangsu | 0.40 | 2.49 |
#ggplot(slopes, aes(x=Province_State, y=slope)) +
# geom_bar(stat="identity") + coord_flip()
##Plot the exponential growth rate in each against a linear rate.
jhucsse_china %>% drop_na(Confirmed) %>%
filter(Update>"2020-01-14") %>%
filter(Province_State%in%tmp)%>%
ggplot(aes(x=Update, y=Confirmed, col=Province_State)) +
geom_point() + scale_y_log10() + stat_smooth(method="lm", se=FALSE)
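The log10 slopes in the table above can also be read as doubling times. A rough sketch follows, assuming the slope is per day (which follows from regressing on `as.Date(Update)`) and using the rounded slopes for a few provinces copied from the table. The `doubling_days` column is an addition for illustration, and because the slopes come from cumulative counts these are only crude indications.

```r
# Rounded slopes copied from the table above (log10 cases per day)
slopes_tab <- data.frame(
  Province_State = c("Hubei", "Zhejiang", "Guangdong", "Henan"),
  slope = c(0.14, 0.29, 0.24, 0.61)
)

# Daily multiplicative growth factor, as in the exp_scale column
slopes_tab$exp_scale <- 10^slopes_tab$slope
# Days needed for the fitted line to double: 10^(slope * t) = 2
slopes_tab$doubling_days <- log10(2) / slopes_tab$slope

print(slopes_tab, digits = 3)
```

Small differences from the printed `exp_scale` values are expected because the slopes here have already been rounded to two digits.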
Leaving it there for the moment due to lack of aggregate data.
Cumulative analysis preliminary so too early to say much but:
Simple snapshot as of 2020-01-24, based on a snapshot of line list data derived from public sources: https://docs.google.com/spreadsheets/d/1jS24DjSPVWa4iuxuD4OAXrE3QeI8c9BC1hSlqr-NMiU/edit#gid=1449891965
(AKA the Kudos list).
These are some very basic epi snapshots that should be improved in the coming days.
First just take a rough look at the age distribution of cases. Ten year increments.
source("R/DataLoadUtils.r")
kudos <- readKudos2("data/Kudos Line List-1-24-2020.csv") %>%
mutate(age_cat = cut(age, seq(0,100,10)))
#Age distribution of cases.
require(ggplot2)
ggplot(drop_na(kudos, age_cat),
aes(x=age_cat, fill=as.factor(death))) +
geom_bar( color="grey") + coord_flip() + xlab("Age Category")
Next, are we seeing any obvious differences in mortality by gender or age?
## Waiting for profiling to be done...
gender | alive | dead | OR | CI |
---|---|---|---|---|
male | 54 | 15 | 1.00 | - |
female | 23 | 8 | 1.25 | 0.48,3.31 |
## Waiting for profiling to be done...
age_cat | alive | dead | OR | CI |
---|---|---|---|---|
(10,20] | 2 | 0 | 0.00 | 0,2.393735337629e+91 |
(20,30] | 8 | 0 | 0.00 | NA,8.12332729629327e+59 |
(30,40] | 20 | 1 | 0.70 | 0.03,18.71 |
(40,50] | 21 | 1 | 0.67 | 0.02,17.8 |
(50,60] | 14 | 1 | 1.00 | - |
(60,70] | 8 | 9 | 15.75 | 2.34,319.1 |
(70,80] | 1 | 3 | 42.00 | 2.83,1775.73 |
(80,90] | 1 | 8 | 112.00 | 9.58,4388.42 |
Even as sparse as this data is, it is showing some clear evidence of an age relationship.
Epidemic curve of line list cases. Not super informative at this point.
ggplot(kudos, aes(x=symptom_onset, fill=as.factor(death))) +
geom_bar()
## Warning: Removed 11 rows containing non-finite values (stat_count).
A touch interesting that all deaths are early on. This suggests either (A) surveillance was really biased towards deaths in the early days, or (B) a lot of the later reports have not had time to die.
[Note that there was previously a 1-23-2020 summary, but that was too preliminary even for this]