2 min read

Cholera in Victorian England

Reading History of Data Visualization following finding some wonderful analysis of Snow’s Cholera map in {Cholera} and {HistData} I saw Farr’s timeseries of Cholera deaths in England, 1849.

The blue and yellow timeseries will be familiar to anyone, anywhere from covid.

Cholera 1849 has special significance - it is likely to be one of few modern pandemics that was completely unmitigated.

data <- read_csv("https://raw.githubusercontent.com/jimr1603/1849-cholera/main/1849%20cholera%20deaths%20England.csv", na = c("-", "")) %>%
  rename("month" = 1) %>%
  rename("cause_of_death" = 3) %>%
  select(month, cause_of_death, `1`:`31`) %>%
  filter(!is.na(cause_of_death)) %>%
  mutate(month = if_else(is.na(month), lag(month), month)) %>%
  pivot_longer(`1`:`31`) %>%
  rename(day_of_month = name) %>%
  mutate(date = str_glue("1849-{month}-{day_of_month}")) %>%
  mutate(date = ymd(date)) %>%
  filter(!is.na(date)) %>%
  rename(deaths = value) %>%
  mutate(day_of_week = wday(date, label = TRUE, week_start = 1))

Straight away, we can replicate the graph:

data %>%
  ggplot(aes(x = date, y = deaths, colour = cause_of_death)) +
  geom_line() +
  ggthemes::scale_colour_few()

Idea: adjust Cholera deaths to account for Diarrh. deaths. Something like Jan-Apr represents a constant number of daily deaths, and daily deaths above that might be mis-labelled Cholera deaths.

I see the sharp peaks and troughs in the Cholera data, and wonder if there is a day-of-week bias in the data.

data %>%
  mutate(week = week(date)) %>%
  group_by(week, cause_of_death) %>%
  mutate(avg = mean(deaths)) %>%
  ungroup() %>%
  mutate(variance = deaths - avg) %>%
  group_by(day_of_week, cause_of_death) %>%
  summarise(variance = mean(variance)) %>%
  ggplot(aes(x = day_of_week, fill = cause_of_death, y = variance)) +
  geom_col(position = "dodge") +
  ggthemes::scale_fill_few()

There is, but it appears to be anti-Friday. I would have guessed anti-Sunday.

Finally, a rolling average reveals something closer to the differential SIR model:

data %>%
  arrange(date) %>%
  group_by(cause_of_death) %>%
  mutate(smoothed = zoo::rollmean(deaths, k = 20, fill = NA)) %>%
  ungroup() %>%
  ggplot(aes(x = date, y = smoothed, colour = cause_of_death, align = "center")) +
  geom_line() +
  ggthemes::scale_colour_few()

Something like this could be used to fit a SIR model to the data.