Interdependencies In Federal Statistics
Yes, the headline above this post would have to be in the running for “least likely to attract readers.” And yes, I still mourn the decision, made for budgetary reasons 15 years ago, for the federal government to stop publishing an annual Statistical Abstract of the United States, which had been around for 130 years. But my point is that when talking about federal statistics, it’s easy to focus on specific big-picture numbers like unemployment rates and inflation. It’s conversely easy to undervalue the extent to which the value of federal statistical agencies is created by their ability to reach out to a wide array of data sources in a systematic way, so that it is possible to compare the combined results over time.
The American Statistical Association has published “The Nation’s Data at Risk: 2025 Report” (December 2025). It’s full of facts about the federal statistical agencies, including their modest budgets that have been getting cut in real terms for more than a decade, and the value of their output. Here, I’ll just focus on their role in pulling together data from a variety of sources.
For a basic example, consider a report called Science and Engineering Indicators, which is published every two years. It’s pulled together by a branch of the National Science Foundation called the National Center for Science and Engineering Statistics. For example, the 2024 report is available here. As the ASA report notes: “S&E Indicators is widely used and cited across the public and private sectors and viewed as an important input to the measure of U.S. economic competitiveness.” If you care about these issues, it’s a basic resource.
But of course, the underlying data on science and engineering doesn’t just grow on trees, waiting to be picked. Instead, the underlying data comes from an array of government and private sources, which need to be culled and compiled. The ASA report notes:
For the 2024 cycle of S&E Indicators, there were seven indicator areas: K-12 education; higher education; science, technology, engineering, and mathematics (STEM) labor force; research and development (R&D); industry activities; innovation; and public attitudes towards S&E. Figure 2.2 illustrates the dependence of each indicator area, on the right side, on various data providers, on the left side: specific statistical agencies and the broader categories of international data providers, other (non-statistical agency) federal data providers, and private-sector data providers. In this Sankey diagram, the widths of flows linking providers and indicators are proportional to the number of times a provider’s datasets are used, which is the first number in parentheses in the figure’s labels for both providers and indicators. The second number in parentheses is the number of unique datasets for each provider and indicator. For example, 9 NCES datasets are used a total of 65 times in the 2024 cycle. For the K-12 indicator area, three NCES datasets are used 22 times. The [federal] statistical agency nodes and flows are in blue.
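For readers who like to see the bookkeeping, here is a minimal sketch of how the two numbers in each label, and the flow widths, could be tallied from a list of dataset uses. This is my own illustration, not the method used in the ASA report: the provider names, dataset names, and records in `USES` are invented placeholders, and the helper functions `label_counts` and `flow_widths` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical usage records: (provider, dataset, indicator area).
# These names and counts are invented for illustration only;
# they are NOT taken from the ASA report or from S&E Indicators.
USES = [
    ("Provider A", "dataset_1", "K-12 education"),
    ("Provider A", "dataset_1", "K-12 education"),
    ("Provider A", "dataset_2", "K-12 education"),
    ("Provider A", "dataset_2", "Higher education"),
    ("Provider B", "dataset_3", "Higher education"),
]


def label_counts(uses, key_index):
    """Return {node: (times_used, unique_datasets)}.

    key_index=0 groups by provider, key_index=2 groups by indicator area.
    """
    times = Counter(rec[key_index] for rec in uses)
    datasets = defaultdict(set)
    for rec in uses:
        # A dataset is identified by its (provider, dataset name) pair.
        datasets[rec[key_index]].add((rec[0], rec[1]))
    return {node: (times[node], len(datasets[node])) for node in times}


def flow_widths(uses):
    """Width of each provider -> indicator flow is the number of uses."""
    return Counter((provider, indicator) for provider, _, indicator in uses)


providers = label_counts(USES, 0)
indicators = label_counts(USES, 2)
flows = flow_widths(USES)

for node, (n_uses, n_datasets) in {**providers, **indicators}.items():
    print(f"{node} ({n_uses}, {n_datasets})")
```

In this toy version, a label like "(65, 9)" for a provider would mean 65 rows in the list that draw on 9 distinct datasets, which matches how the quoted passage describes the NCES label.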

Here’s another example closer to my core interests in economics: the data on “personal income,” which is a key part (about three-quarters) of measuring gross domestic product. In the diagram, the list of categories down the right-hand side shows the components of personal income. The list of sources on the left-hand side shows where the data comes from. The top five (in blue) are federal statistical agencies, but obviously, much of the data is generated from other parts of the government and from nongovernment sources as well.

Looking ahead to the future of federal statistics, this capability to reach out to a wide array of data sources is only going to become more important. For many decades, a number of key government statistics have relied on results from household surveys, but the accuracy of these surveys was always disputable and response rates have been dropping. The statistical agencies (and economic researchers) have responded by trying to shift toward “administrative” data, that is, data already generated for other purposes. For example, firms already need to submit wage data to states for the administration of unemployment insurance programs, and that data on wages is surely more accurate and complete than self-reported survey data on what people earn. As another example, it may be possible to scrape websites for data on prices in a way that allows calculations of inflation to be made more rapidly and accurately. Research on these alternative forms of measurement is ongoing.
Ultimately, the importance of federal statistics comes down to whether you want to be able to evaluate past, present, and future policies based on consistent and regularly collected data, or whether you prefer government to be based on whatever charismatic anecdotes bubble to the top of social media.