Use Case 31: Explaining utilization with demographic data

From Demand-Driven Open Data for HHS

Use Case

Use case summary


Warning: This topic is broader than a typical use case and may need to be broken into multiple use cases.
  • The goal is to see how much health care utilization can be explained by demography. CMS has done a very good job preparing spreadsheets listing a range of quality indicators broken down by year and hospital referral region (tertiary hospital catchment area). New insights could be gained by joining this data set to American Community Survey (Census) data for the same areas. It’s thought, for example, that people who are illiterate or whose primary language isn’t English are less likely to take their medications as prescribed, which in turn leads to worse outcomes. Census tracks those characteristics, so a regression of lower extremity amputation rates (a proxy for serious unmanaged diabetes) on primary language, ethnicity, time in country, etc. might explain some of the differences between regions.
  • There's value in creating a dashboard based solely on the CMS Medicare under 65 geographic public use files. These are people who are medically indigent although not necessarily poor or on Medicaid. CMS has real datasets based on AHRQ-approved quality indicators. Use shapefiles and crosswalks from Dartmouth Atlas to produce choropleth maps showing utilization by indicator by county or HRR.


  • Value to customer: Social determinants of health
  • Value to industry/public: Understanding social determinants of health has a high potential return on investment and could help avoid unnecessary medical costs.


  • Optum

Current data sources and limitations

CMS Geographic Variation PUF

  • Data source: Geographic Variation Public Use File
  • How it's used: has quality indicators broken down by year and hospital referral region
  • Limitations:
    • Provide file versions with short field names without spaces that could easily be imported into analytics tools
    • Don't break data into separate Excel sheets for each year. (Although PowerPivot can read this format.)
    • Identifying dual eligibles: There’s just a field "Percent Eligible for Medicaid" saying what percentage of the people on the row are dual-eligible. It would be nice if there were two rows, one for Medicare only and one for both. Dual eligibles are an important topic, yet it's hard to find clear datasets covering them.
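As a sketch of the first two requests above (short field names without spaces, and one table instead of a sheet per year), the snippet below normalizes headers and stacks per-year rows into a single list with a year field. It assumes the sheets have already been read into plain dictionaries by some spreadsheet reader; the function names to_short_name and stack_years and all sample values are hypothetical. Only the header "Percent Eligible for Medicaid" comes from the file itself.

```python
import re


def to_short_name(header: str) -> str:
    """Turn a spreadsheet header into a lowercase, underscore-separated name."""
    name = header.strip().lower()
    name = name.replace("%", " pct ")
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name)
    return name.strip("_")


def stack_years(sheets):
    """Combine {year: [row_dict, ...]} into one list of rows with a 'year' field."""
    stacked = []
    for year, sheet_rows in sorted(sheets.items()):
        for row in sheet_rows:
            short = {to_short_name(key): value for key, value in row.items()}
            short["year"] = year
            stacked.append(short)
    return stacked


# Hypothetical input: one list of rows per yearly sheet (invented values).
sheets = {
    2012: [{"HRR": 101, "Percent Eligible for Medicaid": 0.21}],
    2011: [{"HRR": 101, "Percent Eligible for Medicaid": 0.20}],
}
rows = stack_years(sheets)
print(rows[0])
```

The resulting long format, one row per region per year, is what most analytics tools expect when importing the data.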

Census ACS & Dartmouth Atlas

  • Data source:
  • How it's used: Joining CMS to Census data
  • Limitations:
    • The CMS data is listed by Hospital Referral Region (HRR), but Census data is by zip code or census tract number. The Dartmouth Atlas project has another spreadsheet that crosswalks zip codes to hospital referral regions. Because there are many zip codes in an HRR, Census data for all the zip codes in a region must be combined to produce HRR-level figures. It would be a lot simpler if CMS reported the data by zip code in the first place. That would also make the analysis more accurate, because it would enable drill-down to a lower level: a single hospital referral region has the population of a small city, while the average zip code has about 8,000 people.
    • Adding shapefiles: In order to map the hospital referral regions, we need to have the geometries of those regions. Dartmouth makes those freely available.
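The zip-to-HRR roll-up described above can be sketched as a population-weighted average. This is an assumption about how the combination might be done, not a documented CMS or Dartmouth method; the crosswalk entries, zip codes, HRR numbers, field names, and values below are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical crosswalk: zip code -> HRR number (invented values).
zip_to_hrr = {"00001": 101, "00002": 101, "00003": 102}

# Hypothetical Census rows per zip code (invented values).
census_by_zip = {
    "00001": {"population": 8000, "pct_non_english": 0.10},
    "00002": {"population": 2000, "pct_non_english": 0.30},
    "00003": {"population": 5000, "pct_non_english": 0.05},
    "00004": {"population": 1000, "pct_non_english": 0.50},  # not in crosswalk
}


def aggregate_to_hrr(census, crosswalk, field):
    """Roll zip-level rows up to HRRs, weighting `field` by population."""
    population = defaultdict(int)
    weighted_sum = defaultdict(float)
    for zip_code, row in census.items():
        hrr = crosswalk.get(zip_code)
        if hrr is None:
            continue  # skip zip codes the crosswalk does not cover
        population[hrr] += row["population"]
        weighted_sum[hrr] += row["population"] * row[field]
    return {
        hrr: {"population": pop, field: weighted_sum[hrr] / pop}
        for hrr, pop in population.items()
        if pop > 0
    }


by_hrr = aggregate_to_hrr(census_by_zip, zip_to_hrr, "pct_non_english")
for hrr, values in sorted(by_hrr.items()):
    print(hrr, values)
```

Weighting by population keeps a small zip code from counting as much as a large one. The HRR-level rows can then be joined to the CMS file on the HRR number.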

AHRQ Quality Indicators


  • Fields:
    • Field names are listed in the "documentation" tab of the Geographic Variation Public Use File Excel files
  • Update frequency: _______
  • Joins between datasets: _______
  • Lag time: _______
  • History: _______
  • Delivery mechanism: _______


Short term workaround

  • ___

Long term solution

  • ___