Lags in Data Reporting

November 2, 2020 | Commentary

I enjoy data analysis, but I am not an expert in Excel or the statistics package R, so I do a lot of my work on my phone with raw calculations, and sometimes it takes me longer than I would like to finish analyses I think might be interesting. While the data, to me, is the most useful aspect of the epidemic to examine, the research and the public response are also important, so I end up spending a lot of time reading abstracts across many websites and then reading the full papers I cull from those abstracts. Anyway, I hope this week to catch up on several data projects I have had underway, and this is the first of them. It took several hours to track all this data through the daily table changes.

I have mentioned several times my concern that Minnesota's data is often reported with a lag, and that the public won't understand that. I have been tracking changes in the reported data on a daily basis for four important categories: tests, cases, hospitalizations and deaths.

Testing is reported by the date the tests were actually done. There is a lag, because labs report at different speeds, some are quite late, and we get occasional dumps of very large numbers of tests. As with all these categories, I looked at the testing table every day for an extended period, in this case starting September 30. (To be consistent, I am only looking at PCR testing, as antigen testing was only recently added.) As with most of the categories, the number of tests said to have been done on a given day changes with every new table. There are a lot of tests being done. Sometime in September there appears to have been a relatively large revision to tests from much earlier, in July or August, because the cumulative test number changed. (By the way, well over a third of all Minnesotans have now been tested at least once.) Fortunately, the changes are not huge on a percentage basis, though the numbers are not negligible. As for how long it takes for a day's tests to be relatively complete, looking just at the pulls from the week ending October 31, for most days around 80% or more of the tests appear to be reported within one day, and very close to 100% within two to three days. Every day has a few tests added or subtracted in subsequent days' tables, but for the most part these are really minor changes. I suspect that most labs report electronically to the state, so the lag is low, and that there are periodic small revisions, with occasional large ones due to a lab screwup.

Cases are reported both as the number reported on a day and in a table giving results by specimen collection date. I looked at changes to the cases-by-specimen-date table every day starting in early August, but I am just going to focus on the last month of tables and on dates going back to September 1. As with tests, the numbers for any given date change regularly in succeeding days' tables. For example, if I take the October 2 table, compare it to the October 1 one, and examine cases with specimen collection dates of July 1 or later, the earliest date with any change is July 18, and that is just one case. The next changes are on August 26 (two cases added), August 27 (one case added) and September 3 (one case). Beginning with September 14, almost every day's total changes, but the changes are generally less than 1% until about a week before the date of the table. In any given table, the biggest additions fall on the three to five days before the table is published. If I jump ahead to my last pull, the October 31 table, it is interesting that 1, 2 or 3 cases are being added even for many days in early August, but nothing significant. In mid to late September a few more cases are being added, but still not a significant number. If I look at the last ten days to two weeks of tables, so pretty current data, what I observe is that cases for specimens collected on a given date are generally fairly complete within a week. For example, in the October 19 table, October 18, the last day for which cases are reported, has 10 cases; by the October 20 table there were 122, by the 21st 305, by the 22nd 456, by the 23rd 590, by the 24th 653, by the 25th 663, by the 26th 663, by the 27th 666, and in the October 31 table, 674. That is a pretty typical progression.
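For anyone who wants to turn a progression like the October 18 one into completeness percentages, here is a minimal Python sketch. It uses only the figures quoted above and treats the October 31 count as the final count, which is an assumption, since later tables may still add a few cases. The function names are illustrative, not anything the state publishes.

```python
from __future__ import annotations

from datetime import date


def completeness_by_lag(snapshots: dict[date, int], event_date: date) -> list[tuple[int, float]]:
    """For one event date, express each table's count as a fraction of the latest
    table's count, keyed by the number of days between the event date and the table date.

    Assumption: the latest table's count stands in for the final count.
    """
    latest = snapshots[max(snapshots)]
    return [((table - event_date).days, count / latest)
            for table, count in sorted(snapshots.items())]


def first_lag_reaching(profile: list[tuple[int, float]], threshold: float) -> int | None:
    """Smallest lag (in days) at which the count reaches `threshold` of the latest count."""
    for lag, fraction in profile:
        if fraction >= threshold:
            return lag
    return None


# Cases with specimen date October 18, as shown in successive daily tables
# (figures quoted in the text; the October 28-30 tables were not cited).
OCT_18_CASES = {
    date(2020, 10, 19): 10,
    date(2020, 10, 20): 122,
    date(2020, 10, 21): 305,
    date(2020, 10, 22): 456,
    date(2020, 10, 23): 590,
    date(2020, 10, 24): 653,
    date(2020, 10, 25): 663,
    date(2020, 10, 26): 663,
    date(2020, 10, 27): 666,
    date(2020, 10, 31): 674,
}

profile = completeness_by_lag(OCT_18_CASES, date(2020, 10, 18))
for lag, fraction in profile:
    print(f"{lag:2d} days after specimen date: {fraction:6.1%}")
print("first lag at 95% or more:", first_lag_reaching(profile, 0.95))  # 6
```

Run as is, this shows the October 18 count at about 45% of its October 31 level three days out and about 97% six days out, in line with the "fairly complete within a week" observation. A real completion factor would average profiles like this across many specimen dates; a single date is only an illustration.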

Hospitalization reporting was changed in recent weeks to show admissions on a day rather than the daily census. The state has a table of hospitalizations by date of admission, and I looked at that table every day since September 29, covering admission dates from September 1 on. There is a lag in reporting hospital admissions; it is usually short, but unfortunately it can be pretty lengthy and odd. For example, comparing the October 1 table with the September 30 table, 55 new admissions were added. About 80% fell in the prior seven days, but one admission each was added for September 6, September 12 and September 17. In the next day's table, October 2, 36 admissions were added, all within the prior seven days, with the earliest on September 25. The next day's table, October 3, actually shows a reduction of one admission on each of September 2 and September 12, and one addition on each of September 14 and 17. The net change was 55 admissions, with over 90% going to days in the prior week. Then on October 4, the prior day's addition to September 14 was reversed. The October 4 table had 40 net new admissions, all of which, other than the reversal, were in the prior five days. The same pattern persists, with odd additions or removals for dates many days or weeks earlier, some six weeks old or more. I have no idea why admissions get removed. And when you look at the net change for a date from one table to the next, you realize it could reflect some additions and some deletions. Anyway, if you roll forward through each day's updated table, you see a similar pattern. There is also some lumpiness in the number of admissions added to the table from day to day, with an apparent weekend effect. Without doing a sophisticated “completion factor” analysis, it is apparent that the number of admissions on a specific date is pretty complete within a week, but there will be weird changes up and down for some time.
And as with cases announced on a day, new admissions announced on a day are actually spread over several preceding days, and the eventual daily count of actual admissions looks smoother than the counts announced each day.
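Comparing two days' admissions-by-date tables comes down to a per-date subtraction. The minimal Python sketch below does that and splits the net change into recent dates (within a week of the table date) and older ones. The two tables are invented numbers for illustration only, and the function names and the seven-day window are assumptions, not part of the state's reporting.

```python
from __future__ import annotations

from datetime import date


def diff_tables(old: dict[date, int], new: dict[date, int]) -> dict[date, int]:
    """Net change per admission date between two daily tables.

    A date absent from one table counts as zero there; dates with no
    change are dropped from the result.
    """
    changes = {}
    for day in sorted(old.keys() | new.keys()):
        delta = new.get(day, 0) - old.get(day, 0)
        if delta:
            changes[day] = delta
    return changes


def summarize(changes: dict[date, int], table_date: date, window_days: int = 7) -> dict[str, int]:
    """Split the net change into dates within `window_days` of the table date
    versus older dates, and count how many dates were revised up or down."""
    recent = sum(v for d, v in changes.items() if (table_date - d).days <= window_days)
    older = sum(v for d, v in changes.items() if (table_date - d).days > window_days)
    return {
        "net": recent + older,
        "recent": recent,
        "older": older,
        "dates_up": sum(1 for v in changes.values() if v > 0),
        "dates_down": sum(1 for v in changes.values() if v < 0),
    }


# Hypothetical admissions-by-date tables (invented numbers, for illustration only).
table_oct2 = {date(2020, 9, 2): 20, date(2020, 9, 12): 25, date(2020, 9, 14): 30,
              date(2020, 9, 17): 28, date(2020, 9, 30): 40, date(2020, 10, 1): 15}
table_oct3 = {date(2020, 9, 2): 19, date(2020, 9, 12): 24, date(2020, 9, 14): 31,
              date(2020, 9, 17): 29, date(2020, 9, 30): 52, date(2020, 10, 1): 38,
              date(2020, 10, 2): 5}

changes = diff_tables(table_oct2, table_oct3)
summary = summarize(changes, date(2020, 10, 3))
print(summary)  # {'net': 40, 'recent': 40, 'older': 0, 'dates_up': 5, 'dates_down': 2}
```

In this made-up example the older dates net to zero even though four of them changed, which is the point about net changes hiding additions and deletions. Aggregate tables only reveal the net change per date, so an addition and a removal on the same date cancel out and cannot be told apart without record-level data.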

Minnesota reports deaths in a fairly straightforward and very misleading manner. It gives us only the date of report, with a table of deaths by that date of report. I looked at the numbers for every report date since September 1 in each table posted from October 13 through October 31. There are no changes, as there shouldn't be if you are using date of report. I did find one change to reported deaths in an earlier week in the summer, when I was comparing the Minnesota and CDC reports. While the data doesn't change, it is misleading, because most people think the date is when the death happened, when in fact the deaths reported on a day actually occurred anywhere from several days to weeks or months earlier.
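Because Minnesota publishes only the report-date table, a table by date of death can't be checked directly. The toy Python sketch below uses invented records to show the mechanics: counts by report date never change when a new day's reports arrive, while counts by date of death keep filling in backward. The records and the function name are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from datetime import date

# Hypothetical death records: (date reported, date the death occurred).
RECORDS = [
    (date(2020, 10, 13), date(2020, 10, 10)),
    (date(2020, 10, 13), date(2020, 10, 2)),
    (date(2020, 10, 14), date(2020, 10, 10)),
    (date(2020, 10, 14), date(2020, 9, 20)),
    (date(2020, 10, 15), date(2020, 10, 10)),
]


def tables_as_of(records, as_of: date) -> tuple[Counter, Counter]:
    """Counts by report date and by death date, using only records reported on or before `as_of`."""
    seen = [(reported, died) for reported, died in records if reported <= as_of]
    by_report = Counter(reported for reported, _ in seen)
    by_death = Counter(died for _, died in seen)
    return by_report, by_death


report_14, death_14 = tables_as_of(RECORDS, date(2020, 10, 14))
report_15, death_15 = tables_as_of(RECORDS, date(2020, 10, 15))

# Counts for earlier report dates are unchanged after a new day is added.
print(all(report_15[d] == n for d, n in report_14.items()))  # True
# A death-date count for an earlier day, however, goes up.
print(death_14[date(2020, 10, 10)], death_15[date(2020, 10, 10)])  # 2 3
```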

So to summarize: deaths reported on a day aren't going to change; for hospital admissions, you can be pretty comfortable that the count for a given admission date is quite close to the ultimate number within a week; for cases, you can be fairly confident in the final number within a week to ten days; and testing, leaving aside big retroactive dumps of test results, is very complete within two to three days. Hope all this helps you interpret the data that gets dumped every day and understand which days that data may relate to.
