The Diamond Princess cruise ship was placed under quarantine on Feb 5 after a former passenger (who was on board from Jan 20 to 25) tested positive for COVID-19 in Hong Kong on Feb 1. During the quarantine, which ended on Feb 19, a subset of passengers was tested for COVID-19 each day. As case counts rose, many people questioned whether the quarantine was working. On Feb 20, the Japanese National Institute of Infectious Diseases (NIID) reported that most transmission had occurred before the quarantine began.

One overlooked factor in the outbreak is that the surveillance protocol changed after the quarantine began. At first, only symptomatic cases were tested and confirmed; by the end, every passenger was being tested regardless of symptoms. Without accounting for this change, it may look as though there was more transmission after the quarantine started than before it. In this blog post, we investigate how our understanding of the Diamond Princess outbreak changes when we account for the change in surveillance.
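
To see why a change in surveillance can masquerade as transmission, here is a toy simulation with made-up numbers (not the ship's data). True daily infections stay flat, so the true reproduction number is 1 throughout, but the fraction of infections that gets detected rises from 20% to 100%. The `ascertainment` schedule, the serial-interval weights, and the simple ratio estimator are all illustrative assumptions, not the method used in this post.

```python
# Toy illustration (made-up numbers): true incidence is flat, but the
# fraction of infections that gets detected rises over time.

SERIAL_INTERVAL = [0.0, 0.25, 0.5, 0.25]  # hypothetical weights w_s for s = 0..3


def ascertainment(t: int) -> float:
    """Hypothetical detection fraction: 20% before day 5, rising to 100% by day 15."""
    return min(1.0, 0.2 + 0.08 * max(0, t - 5))


def naive_r(observed: list[float], w: list[float]) -> dict[int, float]:
    """Ratio estimate R_t = I_t / sum_s w_s * I_{t-s}, ignoring any change in detection."""
    estimates = {}
    for t in range(len(w) - 1, len(observed)):
        pressure = sum(w[s] * observed[t - s] for s in range(1, len(w)))
        estimates[t] = observed[t] / pressure
    return estimates


true_cases = [100.0] * 20  # constant incidence: the true R is 1 throughout
observed = [n * ascertainment(t) for t, n in enumerate(true_cases)]
r_hat = naive_r(observed, SERIAL_INTERVAL)

for t in (4, 10, 19):
    print(f"day {t:2d}: naive R = {r_hat[t]:.2f}")
```

While detection is ramping up, the naive estimate climbs above 1 even though transmission has not changed; once detection levels off, it returns to 1.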

Based on the publicly reported surveillance data, we estimated the daily reproduction number on the ship, i.e. the average number of people each infected person would go on to infect if conditions stayed as they were on that day. A reproduction number above 1 indicates that the virus is spreading, while a value below 1 indicates that it is dying out. We compare the estimates obtained from the raw data with those obtained after accounting for the change in surveillance.
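
As a minimal sketch of why 1 is the threshold, the snippet below projects expected daily cases forward with the renewal equation, in which each day's new cases equal R times a serial-interval-weighted sum of earlier cases. The weights `w`, the seed of 10 cases, and the function name `simulate` are hypothetical choices for illustration.

```python
def simulate(r: float, w: list[float], seed_cases: float, days: int) -> list[float]:
    """Expected daily incidence under I_t = R * sum_{s>=1} w_s * I_{t-s}."""
    incidence = [float(seed_cases)]
    for t in range(1, days):
        # Only look back as far as both the data and the weights allow.
        pressure = sum(w[s] * incidence[t - s] for s in range(1, min(t, len(w) - 1) + 1))
        incidence.append(r * pressure)
    return incidence


w = [0.0, 0.2, 0.5, 0.3]  # hypothetical serial-interval weights, summing to 1
growing = simulate(1.5, w, seed_cases=10, days=30)
shrinking = simulate(0.7, w, seed_cases=10, days=30)
print(growing[-1] > growing[0], shrinking[-1] < shrinking[0])  # True True
```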

Data overview

The Diamond Princess data consist of dates along with the cumulative number of tests, positive tests, and “asymptomatic” cases on each date. However, an “asymptomatic” case may simply be one that has not yet developed symptoms, so the term should not be taken literally. Asymptomatic cases were not formally counted until Feb 15, when it was noted that 35 asymptomatic cases had been found previously; we assigned these to Feb 13. The data therefore contain more symptomatic cases at the beginning and more asymptomatic cases at the end.
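
A minimal sketch of turning the cumulative columns into daily counts, assuming the data arrive as parallel lists of cumulative positives and cumulative “asymptomatic” cases, with `None` for days before asymptomatic cases were counted. The layout and the function names are hypothetical. Under this layout, assigning the 35 earlier asymptomatic cases to Feb 13 amounts to entering 35 as the cumulative asymptomatic count from Feb 13 until the first formal report.

```python
from typing import Optional


def daily_from_cumulative(cumulative: list[Optional[int]]) -> list[int]:
    """Difference a cumulative series. Assumes None only appears before counting began."""
    filled = [c if c is not None else 0 for c in cumulative]
    return [filled[0]] + [b - a for a, b in zip(filled, filled[1:])]


def split_cases(cum_positive: list[int], cum_asymptomatic: list[Optional[int]]):
    """Return (daily positive, daily asymptomatic, daily symptomatic) counts."""
    positive = daily_from_cumulative(cum_positive)
    asymptomatic = daily_from_cumulative(cum_asymptomatic)
    # Symptomatic cases are whatever positives were not flagged as asymptomatic.
    symptomatic = [p - a for p, a in zip(positive, asymptomatic)]
    return positive, asymptomatic, symptomatic
```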

Here is a summary of the testing protocol outlined in two NIID field documents:

Using all cases

We use the EpiEstim package in R for our analysis, which requires dates, the number of new cases on each date, and an estimate of the serial interval. The serial interval is the time between successive cases in a transmission chain: if I infect you, it is the time between the onset of my symptoms and the onset of yours. (The time between the two infections themselves is the generation interval, which is rarely observed directly.) For this analysis, we use two serial interval estimates:

  1. The estimate from SARS (mean: 8.4 days, sd: 3.8 days)
  2. An early estimate for COVID-19 (mean: 4.7 days, sd: 2.9 days)
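
EpiEstim turns a serial interval mean and standard deviation into daily weights using its own discretization (its `discr_si` function), so the sketch below will not match its internals exactly. As a simpler stand-in, it fits a gamma distribution by the method of moments and takes weights proportional to the gamma density at each whole day, with no weight on day 0. The function names and the 30-day cutoff are our own assumptions.

```python
import math


def gamma_shape_scale(mean: float, sd: float) -> tuple[float, float]:
    """Method-of-moments gamma parameters from a mean and standard deviation."""
    return (mean / sd) ** 2, sd ** 2 / mean


def discretize_si(mean: float, sd: float, max_days: int = 30) -> list[float]:
    """Weights w_0..w_max_days: w_0 = 0, the rest proportional to the gamma density."""
    shape, scale = gamma_shape_scale(mean, sd)
    log_norm = math.lgamma(shape) + shape * math.log(scale)
    density = [0.0] + [
        math.exp((shape - 1) * math.log(s) - s / scale - log_norm)
        for s in range(1, max_days + 1)
    ]
    total = sum(density)
    return [d / total for d in density]


sars_si = discretize_si(8.4, 3.8)
covid_si = discretize_si(4.7, 2.9)
print(sum(s * w for s, w in enumerate(covid_si)))  # discrete mean, close to 4.7
```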

We’ll start out by fitting a model to all of the raw data as reported:
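
EpiEstim implements the Cori et al. estimator: with daily incidence I_t and serial interval weights w_s, it computes the total infectiousness Λ_t = Σ_s w_s I_(t-s), and, with a gamma prior of shape a and scale b, the posterior for R over a window of days is gamma with shape a + ΣI_t and scale 1/(1/b + ΣΛ_t). The sketch below re-implements only the posterior mean and sd in Python, assuming EpiEstim's default prior (mean 5, sd 5). EpiEstim defaults to weekly sliding windows; which window this post used is not stated, so `window` is a parameter.

```python
import math


def cori_posterior(incidence, w, window=7, prior_mean=5.0, prior_sd=5.0):
    """Posterior (mean, sd) of R for each sliding window ending on day t_end."""
    prior_shape = (prior_mean / prior_sd) ** 2
    prior_scale = prior_sd ** 2 / prior_mean

    # Total infectiousness Lambda_t = sum_{s>=1} w_s * I_{t-s}.
    lam = [
        sum(w[s] * incidence[t - s] for s in range(1, min(t, len(w) - 1) + 1))
        for t in range(len(incidence))
    ]

    results = {}
    # Start at t_end = window so that no window includes day 0 (no earlier cases).
    for t_end in range(window, len(incidence)):
        days = range(t_end - window + 1, t_end + 1)
        shape = prior_shape + sum(incidence[t] for t in days)
        scale = 1.0 / (1.0 / prior_scale + sum(lam[t] for t in days))
        results[t_end] = (shape * scale, math.sqrt(shape) * scale)
    return results


toy = [10] * 10  # made-up constant incidence, for illustration only
est = cori_posterior(toy, [0.0, 0.5, 0.5])
print(est[9])  # posterior mean slightly above 1, nudged up by the prior
```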

When using the SARS serial interval estimate, there is a large spike in the reproduction number right after the quarantine is implemented. A reproduction number of 64 would certainly qualify as a super-spreading event, but there are reasons to be skeptical of the SARS serial interval. The serial interval depends on the contact rate: if a person interacts with others infrequently, the time between their symptom onset and that of the people they infect will likely be longer. On a cruise ship with swimming pools, buffets, and a casino, contact rates may be higher and the serial interval correspondingly shorter. We’ll treat the SARS estimate as a worst-case scenario.
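
Why does the longer SARS interval act as a worst case? A quick check uses the Wallinga-Lipsitch relation, which is not what EpiEstim computes: during exponential growth at rate r, a gamma-distributed interval with shape k and scale θ implies R = (1 + rθ)^k. This sketch assumes the serial interval can stand in for the generation interval, and the growth rates are arbitrary illustrative values, not estimates from the ship.

```python
def r_from_growth_rate(growth_rate: float, mean: float, sd: float) -> float:
    """R implied by an exponential growth rate for a gamma-distributed interval."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return (1.0 + growth_rate * scale) ** shape


for r in (0.0, 0.1, 0.2):
    sars = r_from_growth_rate(r, 8.4, 3.8)
    covid = r_from_growth_rate(r, 4.7, 2.9)
    print(f"growth {r:.1f}/day: R = {sars:.2f} (SARS SI) vs {covid:.2f} (COVID-19 SI)")
```

For the same observed growth, the longer SARS interval implies a larger R, because each generation of cases takes longer to appear and so must be larger to produce the same daily growth.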

With the COVID-19 estimate, the reproduction number falls below 1 on Feb 8, three days after the quarantine was implemented; however, this may simply reflect the very small number of tests performed on Feb 8-9. The reproduction number then rebounded to 6 on Feb 10 before settling between 2 and 3, and it fell below 2 on Feb 18.

However, this analysis probably biases the reproduction number upward. The serial interval is often estimated as the time between the symptom onset of a primary case and the symptom onset of a secondary case. Since these “asymptomatic” cases may be symptomatic cases whose symptoms have not yet appeared, counting them on their confirmation dates violates that definition. Including these extra cases also inflates the apparent amount of transmission, which biases the reproduction number upward.

This also doesn’t account for the delays between symptom onset and confirmation, which changed along with the protocol. All of the data appear to be confirmation dates rather than symptom onset dates. From what we can gather from the NIID reports and the surveillance data, cases reported before Feb 11 had a lag of 1-3 days. After that, the protocol switched to testing by age group, starting with the oldest and febrile passengers. As a result, a younger passenger could have been symptomatic for a week before being tested.

Adjusting for changes in surveillance

In this analysis, we remove the asymptomatic cases and adjust the dates of the symptomatic cases. We assume that the first case was reported on its symptom onset date. Cases confirmed before Feb 11 are randomly assigned a symptom onset date 1-3 days before confirmation. Cases confirmed after Feb 11 are randomly assigned a symptom onset date between Feb 10 and the day before confirmation.
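
A minimal sketch of this back-dating step, under assumptions the post does not spell out: draws are uniform, cases confirmed on Feb 11 itself follow the later rule, and dates are in 2020. The names `assign_onset` and `onset_counts`, and the fixed random seed, are hypothetical.

```python
import random
from collections import Counter
from datetime import date, timedelta

PROTOCOL_CHANGE = date(2020, 2, 11)   # assumed cut-off between the two regimes
LATE_ONSET_FLOOR = date(2020, 2, 10)  # earliest onset allowed under the later protocol


def assign_onset(confirmed: date, rng: random.Random) -> date:
    """Draw a symptom-onset date for one confirmed case (uniform draws assumed)."""
    if confirmed < PROTOCOL_CHANGE:
        # Early regime: onset 1-3 days before confirmation.
        return confirmed - timedelta(days=rng.randint(1, 3))
    # Later regime (assumed to include Feb 11): Feb 10 up to the day before confirmation.
    span = (confirmed - LATE_ONSET_FLOOR).days
    return LATE_ONSET_FLOOR + timedelta(days=rng.randrange(span))


def onset_counts(confirmed_counts: dict[date, int], seed: int = 0) -> Counter:
    """Turn daily confirmed counts into daily onset counts."""
    rng = random.Random(seed)
    onsets: Counter = Counter()
    first = True
    for day in sorted(confirmed_counts):
        for _ in range(confirmed_counts[day]):
            if first:
                onsets[day] += 1  # first case: onset taken to be its confirmation date
                first = False
            else:
                onsets[assign_onset(day, rng)] += 1
    return onsets
```

Because the assignment is random, different seeds give different onset curves, so the resulting estimates depend somewhat on the draw.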

The faded lines and bars show where there is missing asymptomatic data, which would affect those estimates.

Under these assumptions, the peak reproduction number (and thus the peak in transmission) may have occurred before the quarantine, and the quarantine may have helped reduce the spread of COVID-19. However, even without the asymptomatic cases, the reproduction number does not fall below 1 until Feb 12 at the earliest.

As more information becomes available, we will update this analysis.