about summary refs log tree commit diff
path: root/users/tazjin/covid
diff options
context:
space:
mode:
authorVincent Ambo <mail@tazj.in>2020-11-27T02·21+0100
committertazjin <mail@tazj.in>2020-11-27T02·24+0000
commitde44fdf92a65c25f2aba6a445287a4572a8837cc (patch)
tree7d4878789f17f3093ef8a14e451a10ee2ef6d006 /users/tazjin/covid
parentf2f3c4228d027a8ac47f9a70b3b99f57e4225c3e (diff)
feat(tazjin/covid): Add jq script for wrangling CDC data r/1923
Change-Id: Ia23f528d28126d6a2d0dd2d7327ec686c7e90df4
Reviewed-on: https://cl.tvl.fyi/c/depot/+/2158
Reviewed-by: tazjin <mail@tazj.in>
Tested-by: BuildkiteCI
Diffstat (limited to 'users/tazjin/covid')
-rw-r--r--users/tazjin/covid/us_mortality.jq36
1 files changed, 36 insertions, 0 deletions
diff --git a/users/tazjin/covid/us_mortality.jq b/users/tazjin/covid/us_mortality.jq
new file mode 100644
index 000000000000..584be3ef9afe
--- /dev/null
+++ b/users/tazjin/covid/us_mortality.jq
@@ -0,0 +1,36 @@
+# This turns the CDC mortality data[0] into a format useful for my
+# excess mortality spreadsheet. The US format is by far the worst one
+# I have dealt with, as expected.
+#
+# This requires miller for transforming the CSV appropriately.
+#
+# Params:
+#  state: abbreviation of the state to extract ('US' for whole country)
+#  period: time period (either "2020" for current data, or anything else
+#          for historical averages)
+#
+# Call as:
+#  mlr --icsv --ojson cat weekly.csv | \
+#    jq -rsf us_mortality.jq --arg state US --arg period 2020
+#
+# [0]: https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm
+
+def filter_period(period):
+  if period == "2020"
+  then . | map(select(.["Time Period"] == 2020))
+  else . | map(select(.["Time Period"] == "2015-2019"))
+  end;
+
+def collate_weeks(period):
+  (. | map(.["Number of Deaths"]) | add) as $count
+  | {
+    count: (if period == "2020" then $count else $count / 5 end),
+    week: .[0].Week,
+  };
+
+. | map(select(.Type == "Predicted (weighted)"))
+  | map(select(.["State Abbreviation"] == $state))
+  | filter_period($period)
+  | group_by(.Week)
+  | map(collate_weeks($period))
+  | .[] | "week \(.week): \(.count)"