Use Case 24: Mapping of NPI number to PubMed ID

From Demand-Driven Open Data for HHS


Use case summary


  • To relate the academic publications of a physician or other healthcare provider to the other metadata available through HHS offerings, there needs to be a convenient way to map the PubMed ID (PMID) of an article, or an NLM author ID, back to the physician's National Provider Identifier (NPI). This mapping would give broader insight into the publications and subject areas that a doctor or healthcare provider is involved in.


  • Value to customer: W2O Group (along with just about anyone else interested in ranking doctors) would combine this dataset with existing metadata on each physician. The combined data would show research thought leadership and identify which other physicians may be research collaborators.
  • Value to industry/public: Both industry and public ventures would benefit from greater transparency in three areas: which research papers each physician publishes, whom they collaborate with, and where their research funding comes from (visible in the article linked to each PMID). Most importantly, the many people who help the public understand who the thought leaders are would be able to use PubMed as a resource. In effect, this dataset could be the basis for multiple "second opinion engines" that would provide substantial value to the public.


  • W2O Group: Communications consulting to healthcare (including hospital systems, insurers, pharma/biotech, medical device, health IT, consumer health)

Current data and limitations

  • Data source: PubMed
    • PubMed indexes citations for biomedical literature from MEDLINE, life sciences journals, and online books
    • Authors can be searched by name. However, an author's name may vary considerably depending on how the author was listed on a particular publication.
    • Publications indexed in PubMed are all assigned a PubMed ID (PMID).


Develop a database that lists a physician or healthcare provider's NPI and correlates it with a comma-delimited list of PubMed IDs. This list should be generated dynamically from PubMed so that it is as up to date as possible.

  • Fields:
    • PubMed ID
    • NLM author ID
    • NPI
    • Date generated
    • Semantic categorization of subject area
  • Update frequency: Monthly
  • Format: CSV download
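The requested CSV layout can be sketched with Python's standard `csv` module. This is a minimal illustration with one row per NPI, not a specification. The column names and their order are hypothetical, and joining subject categories with ";" is an assumption. The PMID list is packed into a single comma-delimited field, which the `csv` writer quotes automatically. The NLM author ID defaults to empty because PubMed itself does not assign author IDs.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import date

# Hypothetical column names: the use case lists the fields but not their
# exact spelling or order.
COLUMNS = ["npi", "pmids", "nlm_author_id", "date_generated", "subject_categories"]


@dataclass
class ProviderPublications:
    npi: str
    pmids: list[str]
    nlm_author_id: str = ""  # often empty: PubMed does not assign author IDs
    subject_categories: list[str] = field(default_factory=list)


def to_csv(records: list[ProviderPublications], generated: date) -> str:
    """Serialize records to CSV text with a header row and one row per NPI."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    for rec in records:
        writer.writerow([
            rec.npi,
            ",".join(rec.pmids),  # comma-delimited PMIDs; csv quotes this field
            rec.nlm_author_id,
            generated.isoformat(),
            ";".join(rec.subject_categories),  # assumed separator for categories
        ])
    return buf.getvalue()
```

For example, `to_csv([ProviderPublications("1234567893", ["111", "222"])], date(2015, 1, 1))` produces a header line followed by `1234567893,"111,222",,2015-01-01,`. A standard CSV reader will split that row back into the original five fields.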


Short term workaround

  • Note that PubMed does not assign author IDs. The National Library of Medicine (NLM) had historically been interested in developing author IDs. It abandoned the idea once third-party movements to establish author identifiers (also commonly called researcher IDs) took off. Instead, NLM accepts author identifiers from these third-party sources when publishers supply them with the citation data. Name disambiguation is a well-known issue in bibliometrics; PubMed disambiguates authors using the Computed Author display sort.
  • Author identifiers (aka researcher IDs): there are two main efforts to create researcher IDs:
    • ORCID: ORCID provides a persistent digital identifier that distinguishes a researcher throughout the grant submission and publication process. Registration is free for researchers, and many journals are starting to require authors to submit their ORCID with paper submissions. NIH is linking ORCID with the new SciENcv, but it does not yet require grantees to have an ORCID.
    • ResearcherID (Thomson Reuters): ResearcherID also assigns a unique identifier to each researcher. Researchers use it to manage their publication list and track other bibliometrics such as citation counts and h-index. ResearcherID integrates with Web of Science and is ORCID-compliant.
  • There is no existing database that links NPI to PMID. The strategy to obtain a list of publications for a physician would be as follows:
    • Pull the NPI of the physician of interest
    • Run a PubMed author search for the physician of interest and retrieve the PMIDs of the matching citations. The result list can be exported as a PMID list (text file) by choosing "Send to" in the upper right corner and selecting "File" -> "Format: PMID List".
  • The primary caveat to this strategy is name disambiguation, which affects both the NPI lookup and the PubMed search. On the publication side, the movement to establish author identifiers is the best way to address it. The strongest incentive for researchers to register for an author identifier will likely come from journal publishers. Publishers can require authors to provide an ORCID or equivalent author identifier in order to publish.
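The manual "Send to" export step could be scripted against NCBI's E-utilities ESearch service, which returns the PMIDs matching a PubMed query. This is a minimal Python sketch that assumes the public `esearch.fcgi` endpoint with `db=pubmed` and `retmode=json`. The `author_term` helper is hypothetical. It ORs together a few "Last Initials" name forms to widen recall, but it does nothing to resolve the name-disambiguation caveat. ESearch returns 20 IDs by default, so `retmax` is raised here. NCBI's usage guidelines also limit request rates for clients without an API key.

```python
import json
import urllib.parse
import urllib.request

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def author_term(last: str, first: str, middle: str = "") -> str:
    """Hypothetical helper: OR together common PubMed author-name forms.

    Widens recall only; it cannot tell apart different people who share
    the same name and initials.
    """
    variants = {f"{last} {first[0]}"}
    if middle:
        variants.add(f"{last} {first[0]}{middle[0]}")
    # List each form explicitly, quoted and tagged as an author search.
    return " OR ".join(f'"{v}"[au]' for v in sorted(variants))


def build_esearch_url(term: str, retmax: int = 500) -> str:
    """Build an ESearch query URL for PubMed that asks for a JSON response."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_pmids(response_text: str) -> list[str]:
    """Extract the list of PMIDs from an ESearch JSON response body."""
    return json.loads(response_text)["esearchresult"]["idlist"]


def fetch_pmids(term: str) -> list[str]:
    """Perform the network request; callers should respect NCBI rate limits."""
    with urllib.request.urlopen(build_esearch_url(term), timeout=30) as resp:
        return parse_pmids(resp.read().decode("utf-8"))
```

For instance, `author_term("Smith", "John", "Andrew")` yields `"Smith J"[au] OR "Smith JA"[au]`, which `fetch_pmids` would send to ESearch. The returned PMIDs are candidates only. They would still need review before being attached to an NPI.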

Long term solution

  • Ideally, HHS would establish a database as described, linking NPIs to ORCIDs or other author identifiers. However, it is unclear which HHS agency would have the incentive to take this on. PubMed falls under the purview of NIH, but physicians are only a subset of the authors indexed in PubMed. A database linking NPI and ORCID is therefore likely to be of limited interest to NIH.
  • This effort would likely be best pursued by third parties, particularly since all the data it requires is publicly available. However, both the NPI and PubMed databases have known name-disambiguation issues, as described above.
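Whoever builds it, a table linking NPIs to ORCIDs would likely want to reject malformed identifiers before joining records. Both identifiers carry a check digit. The NPI uses the Luhn formula computed with the prefix 80840, and ORCID uses ISO 7064 MOD 11-2. This Python sketch checks both. It catches transcription errors only and does nothing about name disambiguation.

```python
DIGITS = set("0123456789")


def npi_is_valid(npi: str) -> bool:
    """Luhn check over the 10-digit NPI with the 80840 prefix prepended."""
    if len(npi) != 10 or not set(npi) <= DIGITS:
        return False
    total = 0
    # Walk right to left; the rightmost digit is the NPI's check digit.
    for i, ch in enumerate(reversed("80840" + npi)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def orcid_is_valid(orcid: str) -> bool:
    """ISO 7064 MOD 11-2 check on an iD such as 0000-0002-1825-0097."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not set(chars[:15]) <= DIGITS:
        return False
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)  # 10 is written as X
    return chars[15].upper() == expected
```

For example, `npi_is_valid("1234567893")` and `orcid_is_valid("0000-0002-1825-0097")` both return `True`. Changing the last character of either identifier makes the check fail.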