The Reproducibility Challenge With Economic Data

One basic standard of economic research is surely that someone else should be able to reproduce what you have done. They don't have to agree with what you've done. They may think your data is terrible and your methodology is worse. But as a minimal standard, they should be able to reproduce your result, so that follow-up research can then consider what might have been done differently or better. This standard may seem obvious, but during the last 30 years or so, the methods for reproducibility have been transformed.

Lars Vilhuber describes the shift in "Reproducibility and Replicability in Economics" in the Harvard Data Science Review (Fall 2020 issue, published December 21, 2020). Vilhuber is the Data Editor for the journals published by the American Economic Association (including the Journal of Economic Perspectives, where I work as Managing Editor). Thus, he heads the group that oversees the posting of data and code for new empirical results in AEA journals, which includes making sure that an outsider can use the data and code to reproduce the actual results reported in the paper.

To jump to the bottom line, Vilhuber writes: "Still, after 30 years, the results of reproducibility studies consistently show problems with about a third of reproduction attempts, and the increasing share of restricted-access data in economic research requires new tools, procedures, and methods to enable greater visibility into the reproducibility of such studies."

It's worth noting that reproducibility has come a long way. Back in the 1980s and earlier, researchers who had completed a published empirical research paper but then moved on to other topics often did not keep their data or code. Even when they did keep them, the data and code were often full of idiosyncratic formats and labelling that worked fine for the original researcher (or perhaps for the research assistants who did a lot of the spadework), but could be impenetrable to a would-be outside replicator. By contrast, a fair share of modern economics research can post the actual data, computer code, documentation of what was done, and so on. In this situation, you may disagree with how the researcher chose to proceed, but you can at least reproduce their result easily.

However, here I want to emphasize that many of the difficulties with reproducibility arise because finding the actual data used in an economic study is not as easy as one might think. Non-economists often think of economic data in terms of publicly available data series like GDP, inflation, or unemployment, which anyone can look up on the internet. But economic research often goes well beyond these extremely well-known data sources. One big shift has been to the use of "administrative" data, a catch-all term for data that was not collected for research purposes but instead developed for administrative reasons. Examples include tax data from the Internal Revenue Service, earnings data from the Social Security Administration, detailed health care spending data from Medicare and Medicaid, and education data on teachers and students collected by school districts. There is also private-sector administrative data on topics ranging from financial markets to cell-phone data, credit card data, and the "scanner" data generated by cash registers when you, say, buy groceries.
