Data-Driven Journalism Delivers … Mostly

April 25, 2017


Data journalism holds out the seductive promise of making journalism more evidence-based and scientific, writes our author. Sometimes the goal is achieved, sometimes not. Photo: Mike Tigas, Flickr Creative Commons.


By Joseph A. Davis

Data-driven journalism has been a buzzword and a craze for the past five years or so, although it goes back much further. While it is not a silver bullet, journalists can expand their skills by learning more about it.

Data journalism is a very broad term for a set of tools (or a set of toolboxes). It is just one among many toolkits that can help you as a journalist. But it is not for every journalist and not for every story.

Even as much of today’s news media trends away from fact and evidence toward opinion and feeling, data journalism holds out the seductive promise of making journalism more evidence-based and scientific. Sometimes the promise is illusory. Often it delivers.

The environmental beat is the Saudi Arabia of data. The rich and large pool of data resources that federal and state agencies collect and make available could be exploited almost endlessly. Many of these can help environmental reporters do better stories. But poorly conceived data journalism stories can also be disasters.

The bad news is that many data-driven stories can be hard and take specialized skills. The good news is that new tools and more abundant data are making data journalism stories easier and faster all the time.

The job of working with public records, such as these police files, can be tedious and time-consuming. Computers offer many tools for making it easier. Photo: Mike Tigas, Flickr Creative Commons.

Old-school editors and producers may know that data journalism stories take time and resources, money and teamwork, things they may not have much of as they grind out daily news. Sometimes the great stories that result are worth it.

PCs, net put data journalism on new level

Data journalism is not new, actually. Think of Woodward and Bernstein poring over Library of Congress circulation cards in “All the President’s Men.”

But it got better when the advent of the PC put computing power in the hands of ordinary people in the late 1970s and early 1980s. And it got even more powerful with the widespread adoption of the internet and the world wide web in the 1990s.

Historically, one of the foundational connections between public data and public-interest reporting (at least on the environmental beat) was the establishment of the Toxics Release Inventory in the late 1980s (the era of 1200 bps telephone modems).

At first, it was called “computer-assisted reporting,” or CAR. It was the province of some geeky individuals who communicated with each other on “listservs.” They still do (see NICAR-L). The National Institute for Computer-Assisted Reporting had found a home at Investigative Reporters and Editors (IRE) by the 1990s.

The field expanded a lot in subsequent decades. The rise of “geographical information systems,” or GIS, and affordable/accessible mapping software and formats amounted to a further revolution. Environmental information wants to be shown on maps.

As software and web publishing developed, “data visualization” became an important branch of data journalism — one depending as much on the arts of design and communication as on the science of data.

In recent decades, to some extent, federal and state governments have begun putting more of their data in accessible formats and putting it online. It has been a revolution of sorts, heralded perhaps by enactment of the “E-FOIA” amendments to the Freedom of Information Act in 1996, which require that records kept in electronic format be given to requesters in that format.

There followed a so-called “open data” movement during the Obama administration, urging government to put even more data online more transparently. This is still going on, but it has shifted to a more defensive stance under the Trump administration, with academics campaigning to preserve data before it can be removed from the web.

Don’t oversell your data-driven reporting

Much of the environmental data available goes largely unused by journalists. This is both bad news and good. Good news because unrealized opportunities abound. Bad news because unused databases tend to get defunded, shut down or taken offline.


Data journalism will only get you a few steps closer to the truth, but not all the way.


Although many amateurs think data offers cyborg-like strength and infallibility, the truth is that it doesn’t. Data are collected from humans by humans and can be full of human mistakes.

One of the worst things a journalist can do is oversell the reliability and significance of data. People who dislike your story will immediately claim the methodology is bad and the facts are wrong. Your editor will act astonished, believe them and blame you.

Data journalism will only get you a few steps closer to the truth, but not all the way. Often those steps are crucial. You will need most of the other tools in a journalist’s arsenal to go the full distance — curiosity, skepticism, critical thinking, scientific reasoning, fact-checking, ground-truthing, interviews, telephones, reference libraries, public records, onsite visits, long meetings, expert opinions, clear writing and more.

But data-driven journalism is very good for certain things. Actually, there are several different kinds of data journalism, each good for a different kind of thing. But let’s list some main uses of data-driven journalism:

  • Finding a needle in a haystack. That is, searching: using computer power intelligently to conquer the vastness of the data.
  • Data “dating service.” Matching things that become news when matched. For example, political contributions and government action.
  • Finding patterns and associations. Do all the Congress members voting for coal live in the Appalachians and the Rockies? Do African-American kids have more lead poisoning?
  • Finding significance. Association does not prove causality. But data, combined with the discipline of statistics, can help us understand whether and when it does.
  • Finding trends. In some subject areas, you can draw a line through data points and get a trend. Temperatures are an example.
  • Polling public opinion. News organizations do a lot of public opinion polling (possibly too much?). Beneath it all is data, and a statistical understanding of how likely a sample is to represent the whole population.
  • Predicting. Prediction is a hazardous enterprise, yet news media love it. Data is the raw material from which many predictions are made. It gets very technical.
  • Automation. The job of working with public records and documents can be tedious and time-consuming. Computers offer many tools for making it easier.
  • Display and mapping. Today, much of what we see in the news arena is likely to be on a screen. Data is the raw material of many good maps and graphics.

Another way to look at data-driven journalism is as a specialized set of skills. You may not need to have all of them, but you may need to find and work with people who do. These include fluency with database and spreadsheet software, the mathematical discipline of statistics, ability to use several computer programming languages, fluency with GIS software and more.
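As a small illustration of the database fluency mentioned above, here is a sketch of the “data dating service” idea using Python’s built-in sqlite3 module. The tables, names, bills and dollar amounts are entirely hypothetical; the point is that a single join query is often where the story emerges.

```python
import sqlite3

# Two hypothetical datasets: campaign contributions and floor votes.
# All names and values are invented for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE contributions (member TEXT, donor TEXT, amount INTEGER);
    CREATE TABLE votes (member TEXT, bill TEXT, vote TEXT);
    INSERT INTO contributions VALUES
        ('Rep. Smith', 'Acme Coal PAC', 5000),
        ('Rep. Jones', 'Clean Air Fund', 2500);
    INSERT INTO votes VALUES
        ('Rep. Smith', 'Coal Subsidy Act', 'yes'),
        ('Rep. Jones', 'Coal Subsidy Act', 'no');
""")

# The join matches the two datasets: who took money, and how did they vote?
rows = con.execute("""
    SELECT c.member, c.donor, c.amount, v.vote
    FROM contributions c JOIN votes v ON c.member = v.member
    WHERE v.bill = 'Coal Subsidy Act'
""").fetchall()

for row in rows:
    print(row)
```

Real projects match far messier records (names spelled differently across datasets, for instance), which is why the cleanup step usually takes longer than the query itself.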

Environmental data journalism, in practice

It’s easier to see the power of data in environmental journalism if you look at some great examples from recent years. Here are some of the best:

  • Hazardous industrial facilities. A team at the Houston Chronicle did a high-impact series on hazardous chemical facilities in the area (one of the densest concentrations of petrochemical plants in the country). The “Chemical Breakdown” series by Mark Collette and Matt Dempsey, which ran in mid-2016, used an obscure but powerful database about legally required “Risk Management Plans.” The results offered no reassurance that a catastrophe like the West, Texas, ammonium nitrate explosion in 2013 would not happen again.
  • Lead poisoning of children. The lead-tainted drinking water failures at Flint, Mich., which became a national story in 2015 and 2016, turn out to be typical of many other cities with aging water systems across the country. While some who follow drinking water knew this, Reuters used data journalism to spell it out in a series that ran in late 2016. Drinking water is not the only source of lead that poisons children. The paint in aging buildings is another. Reuters used neighborhood-level data on the prevalence of childhood lead poisoning in test results. Reuters reporters M.B. Pell and Joshua Schneyer painstakingly collected the data from state health departments and the U.S. Centers for Disease Control, often with public records requests.
  • Oil train hazards to communities. The fracking boom and tar sands produced more crude oil than existing pipelines could carry to market. So huge tanker trains began carrying more and more of it — despite the 2013 Lac-Mégantic rail disaster in Canada that killed more than 40 people. There was inadequate safety and transparency on oil trains until former President Barack Obama’s Transportation Department pushed the railroads. But because of railroad reluctance, many communities remained ignorant of hazardous trains rolling through their midst. Investigative work in 2014 by the Associated Press and ProPublica pried much of the data loose and mapped it so that people could see the threats they faced.
  • Inspections of oil/gas wells on federal land. Oil and gas drilling has occurred on federal lands for a long time. Even before Obama issued fracking rules in March 2015, federal land managers were supposed to inspect these wells to prevent pollution and other risks. Following up on a Government Accountability Office report in 2014, the Associated Press got data about federal well inspections, and filed a story noting that four out of ten high-risk wells on federal land were not being inspected.
  • Fracking chemical secrecy. Drillers inject a brew of chemicals into oil and gas wells for hydraulic fracturing of shale formations — and some of those chemicals can potentially contaminate drinking water aquifers. But the “Halliburton loophole” Congress passed in 2005 exempts frackers from Safe Drinking Water Act requirements that they disclose what they are pumping underground. Industry-sponsored databases leave out “trade secrets” in reporting toxic chemicals. EnergyWire’s Mike Soraghan crunched the data in a 2012 investigation that showed at least two-thirds of fracking fluid disclosures made by drilling companies omitted at least one ingredient, claiming trade secrecy. A Bloomberg team did a similar project the same year.
  • EPA “poisoned places” watch list. Sometimes, just having the list is enough. For years, U.S. Environmental Protection Agency maintained a secret “watch list” of the worst Clean Air Act violators. In 2014 a crack investigative team at the Center for Public Integrity got the list by means of a FOIA request. Collaborating with NPR journalists, Jim Morris, Chris Hamby and Liz Lucas combed through the list, comparing it with EPA’s more public list of “high priority violators.” They did not do fancy data manipulations — they just used the list to walk through the many case studies of hard-core polluters and explain the many reasons why EPA and the states were doing a poor job of enforcement. It was part of their ongoing “Poisoned Places” project. An embarrassed EPA ended use of the watch list that year. But the lists (for water and waste as well as air) are still archived on the EPA site.
  • Contaminants in drinking water. The 47,500 drinking water systems in the U.S., overseen by state agencies and the EPA, have been trying to improve the safety of public drinking water since before the Safe Drinking Water Act of 1974 was enacted. Yet for many years, news media reported on the issue only rarely. In 2009-10, Charles Duhigg of the New York Times did a sweeping series on “Toxic Waters,” which included drinking water. Duhigg collected contaminant records for all 47,500 water systems — and made them available to readers in an interactive format allowing them to look up their own water.
  • Air pollution near schools. In December 2008, at the very end of the Bush years, USA TODAY did a very ambitious nationwide project based on the geographic connections between toxic air pollution and the nation’s schools. It was called “The Smokestack Effect.” It used a database that geolocated almost all of the nation’s schools (127,800 of them), along with another tool based on EPA’s Toxics Release Inventory. The so-called Risk-Screening Environmental Indicators model geographically integrates the releases of many chemicals, their environmental fate and potential human exposure. USA TODAY’s team, which included dozens of people, ground-truthed the information by doing its own measurements near schools. And then the news organization put it in an interactive form that allowed readers all over the United States to look up their own schools.
  • Computer models predicting risk. Three years before Hurricane Katrina hit, New Orleans Times-Picayune reporters Mark Schleifstein and John McQuaid did a prophetic 2002 series, “Washing Away,” which anticipated what could happen if a category 5 hurricane hit the city dead-on. Computer reporting played only a background role in this journalistic tour-de-force. The reporters used a computer model that simulated the sloshing of coastal waters in a storm surge as they might interact with the city’s complex system of levees. They also used statistical concepts to explain the likelihood of a “big one.” Their discussion of vulnerable populations prompted municipal authorities to do evacuation and emergency planning which probably saved tens of thousands of lives.
  • Natural gas pipeline hazards to communities. A terrible gas pipeline rupture and explosion in New Mexico in 2000 killed 10 campers. Then in July 2001, an investigative series in the Austin American-Statesman by Ralph Haurwitz and Jeff Nesmith put that event in a national context of pipeline company neglect and lax federal regulation. They used data from the Office of Pipeline Safety. Following the terrorist attacks of 9/11 that same year, the government shut down public access to pipeline routing and safety information. But their work arguably prompted Congress to tighten pipeline safety law in 2002. Without much data, there was very little journalism watchdogging pipelines for almost a decade — until the San Bruno, Calif., pipeline explosion in 2010 that killed eight.

Some resources

SEJ itself hosts many data resources. Here’s a list of recent writeups from SEJournal’s WatchDog, TipSheet and Toolbox columns that offer guides to data resources.

Reading and other resources

Some useful groups and sites

  • National Institute for Computer-Assisted Reporting (NICAR). NICAR is a branch of Investigative Reporters and Editors. It has a data library that makes important databases available conveniently at low cost. NICAR’s training programs are great.
  • Data.gov. Whatever its failings in openness, the Obama administration made strides in making government data more accessible online. The Data.gov website is a hub for finding online data from almost all agencies.
  • Data Driven Journalism. DDJ is too good and important a thing to keep just for the United States. The site/organization named “Data Driven Journalism” is an initiative of the European Journalism Centre. They have loads of training and networking resources. Plus, check out the hashtag #DDJ.
  • Tow Center for Digital Journalism. Another way computers have changed journalism is by revolutionizing publishing platforms. This is not pure data journalism per se, but the connections are manifold and important. The Tow Center at Columbia Journalism School is a good place to get an overview.

Joseph A. Davis is editor of SEJournal’s WatchDog Tipsheet, and writer of the weekly TipSheet and monthly Issue Backgrounders.

* From the weekly news magazine SEJournal Online, Vol. 2, No. 17. Content from each new issue of SEJournal Online is available to the public via the SEJournal Online main page. Subscribe to the e-newsletter here. And see past issues of the SEJournal archived here.

