Use Case 10: Historical archive of disease outbreaks
From Demand-Driven Open Data for HHS
Use case summary
- Title: Historical archive of disease outbreaks
- Work item: https://github.com/demand-driven-open-data/ddod-intake/issues/10
- Status: Closed. The CDC provides historical data only back to 2014, but data going back to 1888 is available from Project Tycho (https://www.tycho.pitt.edu/). See the Solution section for more information.
- Each week, the CDC publishes data on communicable diseases in the Morbidity and Mortality Weekly Report (MMWR) (for example, http://www.cdc.gov/mmwr/pdf/wk/mm6410md.pdf). The data is also available on data.gov (for example, https://data.cdc.gov/NNDSS/NNDSS-Table-II-Mumps-to-Rabies-animal/d69q-iyrb). However, apart from a couple of recent years, no historical data is provided. It would be useful to create a historical archive for each disease, perhaps as a separate table, updated weekly or monthly.
- Value to industry/public: A historical archive of information related to outbreaks of communicable diseases would be a rich data source for those tracking outbreaks of whooping cough, syphilis, West Nile, or other communicable diseases.
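The per-disease archive requested above could be built by merging each weekly release into a running table. The following is only a minimal sketch: the row fields (`disease`, `area`, `year`, `week`, `cases`) and the function name `append_weekly_snapshot` are hypothetical, and the numbers are made-up sample values, not real MMWR figures.

```python
from collections import defaultdict


def append_weekly_snapshot(archive, snapshot_rows):
    """Merge one weekly release into a per-disease archive.

    archive maps disease -> {(area, year, week): cases}.
    Keying on (area, year, week) means that loading the same week
    twice overwrites the earlier entry instead of duplicating it.
    """
    for row in snapshot_rows:
        key = (row["area"], row["year"], row["week"])
        archive[row["disease"]][key] = row["cases"]
    return archive


# Made-up sample rows standing in for two weekly releases.
week10 = [
    {"disease": "Mumps", "area": "Ohio", "year": 2015, "week": 10, "cases": 3},
    {"disease": "Pertussis", "area": "Ohio", "year": 2015, "week": 10, "cases": 12},
]
week11 = [
    {"disease": "Mumps", "area": "Ohio", "year": 2015, "week": 11, "cases": 5},
]

archive = defaultdict(dict)
append_weekly_snapshot(archive, week10)
append_weekly_snapshot(archive, week11)
append_weekly_snapshot(archive, week10)  # re-loading a week is harmless
```

Keeping one table per disease matches the "separate table" idea in the request, and the same merge step works whether the archive is refreshed weekly or monthly.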
Current data and limitation
- Data source: CDC Data Catalog, MMWR tables
- Limitations: History goes back only to 2014
Short term workaround
- Project Tycho at the University of Pittsburgh has digitized the notifiable disease data from the weekly MMWR going back to 1888 and made it publicly available
- The data is freely available, but account registration is required
- The full methodology is described in this article: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4175560/?report=classic. In summary:
- Digitized all data available in tabular format that listed etiologically defined cases or deaths by week for locations in the US
- Extracted and sanitized all data of reported counts (weekly tallies) of cases or deaths and the reporting locations, periods, and diseases
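The extraction and sanitizing step summarized above can be illustrated with a small sketch. This is not Project Tycho's actual pipeline: the function names, the output fields, and the treatment of non-numeric cells are assumptions made for illustration, and the input values are made up.

```python
def parse_count(raw):
    """Convert a raw table cell to an integer count, or None.

    Assumption: cells hold plain digits, optionally with thousands
    separators; anything else (blank, "-", "N", "U") is treated as
    "no usable count".
    """
    text = raw.strip().replace(",", "")
    if text.isdecimal():
        return int(text)
    return None


def tidy_rows(disease, week_ending, raw_by_location):
    """Turn one disease column of a weekly table into tidy records.

    Each record carries the disease, reporting location, reporting
    period, and count. Cells without a usable count are skipped.
    """
    rows = []
    for location, raw in raw_by_location.items():
        count = parse_count(raw)
        if count is None:
            continue
        rows.append({
            "disease": disease,
            "location": location.strip(),
            "week_ending": week_ending,
            "count": count,
        })
    return rows


# Made-up sample cells, not real MMWR values.
rows = tidy_rows(
    "Pertussis",
    "2015-03-14",
    {" Ohio ": "1,204", "Texas": "-", "Utah": "7"},
)
```

Here `rows` contains one record for Ohio and one for Utah, while the Texas cell is dropped because it holds no numeric count.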