From time to time we’ll highlight a data set on karmadata. Today I’ll provide a quick look at the NIH RePORTER grants database.
The RePORTER database (which replaced the old CRISP database) “provides access to reports, data, and analyses of NIH research activities, including information on NIH expenditures and the results of NIH supported research.” In other words, we get to see our tax dollars at work.
When looking at these data sets, I’ll try to point out what is great about the source data and website (I can’t be complaining all the time) and then show the value we’re able to add.
The data itself (provided in both CSV and XML) contains the funding agency (NIH, NCI, etc.), the organization receiving the grant, its location, the principal investigators running the project, a list of terms associated with the project, and the amount funded. The RePORTER website has some pretty nice functionality for aggregating and ranking by those different entities. You can play around with that tool here. You can even map the data and drill down to view grants awarded in each state. Neat. The biggest limitation is probably that you can only analyze one fiscal year at a time, but overall it’s a pretty nice presentation of the data.
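Because the website only works one fiscal year at a time, one way around that limitation is to download several yearly CSV exports and aggregate them yourself. The sketch below is a minimal illustration of that idea, not how karmadata does it. It assumes ExPORTER-style column names `FY`, `ORG_NAME`, and `TOTAL_COST`; check those against the data dictionary for the files you actually download. The organization names and dollar amounts in the usage lines are made up.

```python
import csv
import io
from collections import defaultdict
from typing import Iterable, TextIO


def read_projects(files: Iterable[TextIO]) -> list[dict[str, str]]:
    """Concatenate the rows of several fiscal-year CSV exports into one list."""
    rows: list[dict[str, str]] = []
    for f in files:
        rows.extend(csv.DictReader(f))
    return rows


def rank_orgs(rows: list[dict[str, str]], first_fy: int, last_fy: int) -> list[tuple[str, float]]:
    """Sum TOTAL_COST per ORG_NAME over a range of fiscal years, largest total first."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        fy = int(row["FY"])
        cost = row["TOTAL_COST"].strip()
        # Skip rows outside the requested years or with no reported cost.
        if first_fy <= fy <= last_fy and cost:
            totals[row["ORG_NAME"]] += float(cost)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Toy stand-ins for two yearly export files (invented organizations and amounts).
fy2009 = io.StringIO("FY,ORG_NAME,TOTAL_COST\n2009,ORG A,300\n2009,ORG B,250\n")
fy2010 = io.StringIO("FY,ORG_NAME,TOTAL_COST\n2010,ORG A,200\n2010,ORG B,100\n")
ranking = rank_orgs(read_projects([fy2009, fy2010]), 2009, 2010)
print(ranking)  # [('ORG A', 500.0), ('ORG B', 350.0)]
```

Keeping the year filter inside `rank_orgs` means the same combined row list can be re-ranked for any window of fiscal years without reloading the files.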
The first thing I look for when I get my hands on a new dataset is the set of entities we can standardize to. This was a fun one because of how many entities can be teased out. In addition to the ones mentioned above, we were able to match the terms list to drugs and diseases. RePORTER also provides an ID for each principal investigator, but unfortunately, much like the reviewer ID from BMIS, it is not unique, so we consolidate those entries. We likewise consolidate variant organization names so that each resolves to a unique ID, and then we are ready to go: city, state, country, organization, principal investigator, drug, disease, and time. The result is a robust database for both building our entity profiles and creating cool visualizations.
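To make the standardization step a bit more concrete, here is a deliberately simple sketch of two of the ideas above: collapsing variant organization names to one ID, and matching a project's terms against a drug/disease dictionary. It is an illustration under stated assumptions, not karmadata's actual matching logic. The abbreviation and suffix rules, the `OrgRegistry` class, the toy lexicon, and the assumption that the terms field is semicolon-separated are all hypothetical; real name resolution needs far richer rules and manual review.

```python
import re
from itertools import count

# Hypothetical normalization rules; a production list would be much larger.
ABBREVIATIONS = {"univ": "university", "hosp": "hospital", "ctr": "center"}
DROP_TOKENS = {"the", "inc", "llc", "corp", "co"}


def normalize_org(name: str) -> str:
    """Lowercase, strip punctuation, expand abbreviations, drop legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    return " ".join(t for t in tokens if t not in DROP_TOKENS)


class OrgRegistry:
    """Hypothetical helper: assigns one stable integer ID per normalized name."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._next = count(1)

    def resolve(self, raw_name: str) -> int:
        key = normalize_org(raw_name)
        if key not in self._ids:
            self._ids[key] = next(self._next)
        return self._ids[key]


def match_terms(terms_field: str, lexicon: dict[str, str]) -> set[tuple[str, str]]:
    """Return (entity_type, term) pairs for terms found in the lexicon.

    Assumes terms are separated by semicolons; check the actual export format.
    """
    found: set[tuple[str, str]] = set()
    for term in terms_field.split(";"):
        key = term.strip().lower()
        if key in lexicon:
            found.add((lexicon[key], key))
    return found


registry = OrgRegistry()
a = registry.resolve("Johns Hopkins Univ.")
b = registry.resolve("JOHNS HOPKINS UNIVERSITY")
print(a == b)  # True

lexicon = {"tamoxifen": "drug", "breast cancer": "disease"}  # toy dictionary
print(sorted(match_terms("Breast Cancer; Tamoxifen; cell line", lexicon)))
# [('disease', 'breast cancer'), ('drug', 'tamoxifen')]
```

Exact-match lookup on normalized keys is the simplest possible choice; fuzzy matching or synonym tables would catch more variants at the cost of more false merges.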
Some facts we have gleaned from the database:
- Johns Hopkins has led the way in NIH funding since FY2000 (with more than $7.5 billion)
- NIH funding increased steadily from 2000 until peaking in 2010 at $38 billion
- Boston has received the most funding of any city over that period (score one for Boston in the Boston-New York rivalry)
- NIH funding was not limited to the United States: $5.4 billion has gone to recipients outside the US since 2000, with South Africa leading the way
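Facts like the peak funding year or the leading non-US country boil down to grouped sums over the project rows. The sketch below shows one way to compute them; it is an illustration, not the queries behind the datacards. It assumes ExPORTER-style columns `FY`, `ORG_COUNTRY`, and `TOTAL_COST`, and assumes US organizations carry the country string `UNITED STATES` (blank countries are also treated as domestic here). The rows and amounts are invented.

```python
from collections import defaultdict


def totals_by(rows: list[dict[str, str]], key: str) -> dict[str, float]:
    """Sum TOTAL_COST grouped by one column, skipping rows with a blank cost."""
    out: dict[str, float] = defaultdict(float)
    for row in rows:
        cost = row["TOTAL_COST"].strip()
        if cost:
            out[row[key]] += float(cost)
    return dict(out)


def peak_year(rows: list[dict[str, str]]) -> tuple[str, float]:
    """Return the fiscal year with the largest total funding and that total."""
    return max(totals_by(rows, "FY").items(), key=lambda kv: kv[1])


def foreign_funding(rows: list[dict[str, str]], home: str = "UNITED STATES") -> dict[str, float]:
    """Total funding per country, excluding the home country and blank countries."""
    foreign = [r for r in rows if r["ORG_COUNTRY"].strip() not in ("", home)]
    return totals_by(foreign, "ORG_COUNTRY")


# Invented rows for illustration only.
rows = [
    {"FY": "2009", "ORG_COUNTRY": "UNITED STATES", "TOTAL_COST": "100"},
    {"FY": "2010", "ORG_COUNTRY": "UNITED STATES", "TOTAL_COST": "150"},
    {"FY": "2010", "ORG_COUNTRY": "SOUTH AFRICA", "TOTAL_COST": "20"},
    {"FY": "2010", "ORG_COUNTRY": "KENYA", "TOTAL_COST": "5"},
]
print(peak_year(rows))  # ('2010', 175.0)
print(sorted(foreign_funding(rows).items(), key=lambda kv: kv[1], reverse=True))
# [('SOUTH AFRICA', 20.0), ('KENYA', 5.0)]
```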
That should give you a flavor for what you can do with the dataset. Try copying one of my datacards and discovering your own insights.