Data journalism holds out the seductive promise of making journalism more evidence-based and scientific, writes our author. Sometimes the goal is achieved, sometimes not. Photo: Mike Tigas, Flickr Creative Commons.
ToolBox: Data-Driven Journalism Delivers … Mostly
By Joseph A. Davis
Data-driven journalism has been a buzzword and a craze for the past five years or so, although it goes back much further. While it is not a silver bullet, journalists can expand their skills by learning more about it.
Data journalism is a very broad term for a set of tools (or a set of toolboxes). It is just one among many toolkits that can help you as a journalist. But it is not for every journalist and not for every story.
Even while much of today’s news media trend away from fact and evidence toward opinion and feeling, data journalism holds out the seductive promise of making journalism more evidence-based and scientific. Sometimes the promise is illusory. Often it delivers.
The environmental beat is the Saudi Arabia of data. The rich and large pool of data resources that federal and state agencies collect and make available could be exploited almost endlessly. Many of these can help environmental reporters do better stories. But poorly conceived data journalism stories can also be disasters.
The bad news is that many data-driven stories can be hard and take specialized skills. The good news is that new tools and more abundant data are making data journalism stories easier and faster all the time.
Working with public records, such as these police files, can be tedious and time-consuming. Computers offer many tools for making it easier. Photo: Mike Tigas, Flickr Creative Commons.
Old-school editors and producers may know that data journalism stories take time and resources, money and teamwork, things they may not have much of as they grind out daily news. Sometimes the great stories that result are worth it.
PCs, net put data journalism on new level
Data journalism is not new, actually. Think of Woodward and Bernstein poring over Library of Congress circulation cards in “All the President’s Men.”
But it got better when the advent of the PC put computing power in the hands of ordinary people in the late 1970s and early 1980s. And it got even more powerful with the widespread adoption of the internet and the world wide web in the 1990s.
Historically, one of the foundational connections between public data and public-interest reporting (at least on the environmental beat) was the establishment of the Toxics Release Inventory in the late 1980s (the era of 1200 bps telephone modems).
At first, it was called “computer-assisted reporting,” or CAR. It was the province of some geeky individuals who communicated with each other on “listservs.” They still do (see NICAR-L). The National Institute for Computer-Assisted Reporting had found a home at Investigative Reporters and Editors (IRE) by the 1990s.
The field expanded a lot in subsequent decades. The rise of “geographical information systems,” or GIS, and affordable/accessible mapping software and formats amounted to a further revolution. Environmental information wants to be shown on maps.
As software and web publishing developed, “data visualization” became an important branch of data journalism — one depending as much on the arts of design and communication as on the science of data.
In recent decades, to some extent, federal and state governments have begun putting more of their data in accessible formats and putting it online. It has been a revolution of sorts, heralded perhaps by enactment of the “E-FOIA” amendments to the Freedom of Information Act in 1996, which require agencies to give requesters records in electronic format when the records exist that way.
There followed a so-called “open data” movement during the Obama administration, urging government to put even more data online, more transparently. The movement continues, but it has shifted to a more defensive stance under the Trump administration, with academics campaigning to preserve data before President Donald Trump removes it from the web.
Don’t oversell your data-driven reporting
Much of the environmental data available goes largely unused by journalists. This is both bad news and good. Good news because unrealized opportunities abound. Bad news because unused databases tend to get defunded, shut down or taken offline.
Although many amateurs think data offers cyborg-like strength and infallibility, the truth is that it doesn’t. Data are collected from humans by humans and can be full of human mistakes.
One of the worst things a journalist can do is oversell the reliability and significance of data. People who dislike your story will immediately claim the methodology is bad and the facts are wrong. Your editor will act astonished, believe them and blame you.
Data journalism will only get you a few steps closer to the truth, but not all the way. Often those steps are crucial. You will need most of the other tools in a journalist’s arsenal to go the full distance — curiosity, skepticism, critical thinking, scientific reasoning, fact-checking, ground-truthing, interviews, telephones, reference libraries, public records, onsite visits, long meetings, expert opinions, clear writing and more.
But data-driven journalism is very good for certain things. Actually, there are several different kinds of data journalism, each suited to a different kind of task. Let’s list some main uses of data-driven journalism:
- Finding a needle in a haystack. That is, searching: applying computer power intelligently to conquer the vastness of the data.
- Data “dating service.” Matching things that become news when matched. For example, political contributions and government action.
- Finding patterns and associations. Do all the Congress members voting for coal live in the Appalachians and the Rockies? Do African-American kids have more lead poisoning?
- Finding significance. Association does not prove causality. But data, combined with the discipline of statistics, can help us understand whether and when it does.
- Finding trends. In some subject areas, you can draw a line through data points and get a trend. Temperatures are an example.
- Polling public opinion. News organizations do a lot of public opinion polling (possibly too much?). Beneath it all is data, and a statistical understanding of how likely a sample is to represent the whole population.
- Predicting. Prediction is a hazardous enterprise, yet news media love it. Data is the raw material from which many predictions are made. It gets very technical.
- Automation. The job of working with public records and documents can be tedious and time-consuming. Computers offer many tools for making it easier.
- Display and mapping. Today, much of what we see in the news arena is likely to be on a screen. Data is the raw material of many good maps and graphics.
Another way to look at data-driven journalism is as a specialized set of skills. You may not need to have all of them, but you may need to find and work with people who do. These include fluency with database and spreadsheet software, the mathematical discipline of statistics, ability to use several computer programming languages, fluency with GIS software and more.
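The statistics skill can start small. As one illustration (using the textbook formula for a poll’s margin of error, with invented poll numbers):

```python
# A basic statistical skill for journalists: the margin of error of a poll.
# For an observed proportion p from a simple random sample of size n, the
# approximate 95% margin of error is 1.96 * sqrt(p * (1 - p) / n).
from math import sqrt

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sampled proportion."""
    return z * sqrt(p * (1 - p) / n)

# A hypothetical poll of 1,000 people finding 52% support:
moe = margin_of_error(0.52, 1000)
print(f"52% +/- {moe * 100:.1f} points")  # about +/- 3.1 points
```

A result like 52 percent support with a 3.1-point margin of error means the race is a statistical toss-up, a point reporters who cover polling need to be able to explain.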
Environmental data journalism, in practice
It’s easier to see the power of data in environmental journalism if you look at some great examples from recent years. Here are some of the best:
- Hazardous industrial facilities. A team at the Houston Chronicle did a high-impact series on hazardous chemical facilities in the area (one of the densest concentrations of petrochemical plants in the country). The “Chemical Breakdown” series by Mark Collette and Matt Dempsey, which ran in mid-2016, used an obscure but powerful database about legally required “Risk Management Plans.” The results offered no reassurance that a catastrophe like the West, Texas, ammonium nitrate explosion in 2013 would not happen again.
- Lead poisoning of children. The lead-tainted drinking water failures at Flint, Mich., which became a national story in 2015 and 2016, turn out to be typical of many other cities with aging water systems across the country. While some who follow drinking water knew this, Reuters used data journalism to spell it out in a series that ran in late 2016. Drinking water is not the only source of lead that poisons children. The paint in aging buildings is another. Reuters used neighborhood-level data on the prevalence of childhood lead poisoning in test results. Reuters reporters M.B. Pell and Joshua Schneyer painstakingly collected the data from state health departments and the U.S. Centers for Disease Control, often with public records requests.
- Oil train hazards to communities. The fracking boom and tar sands produced more crude oil than existing pipelines could carry to market. So huge tanker trains began carrying more and more of it — despite the 2013 Lac-Mégantic rail disaster in Canada that killed more than 40 people. There was inadequate safety and transparency on oil trains until former President Barack Obama’s Transportation Department pushed the railroads. But because of railroad reluctance, many communities remained ignorant of hazardous trains rolling through their midst. Investigative work in 2014 by the Associated Press and ProPublica pried much of the data loose and mapped it so that people could see the threats they faced.
- Inspections of oil/gas wells on federal land. Oil and gas drilling has occurred on federal lands for a long time. Even before Obama issued fracking rules in March 2015, federal land managers were supposed to inspect these wells to prevent pollution and other risks. Following up on a Government Accountability Office report in 2014, the Associated Press got data about federal well inspections, and filed a story noting that four out of ten high-risk wells on federal land were not being inspected.
- Fracking chemical secrecy. Drillers inject a brew of chemicals into oil and gas wells for hydraulic fracturing of shale formations — and some of those chemicals can potentially contaminate drinking water aquifers. But the “Halliburton loophole” Congress passed in 2005 exempts frackers from Safe Drinking Water Act requirements that they disclose what they are pumping underground. Industry-sponsored databases leave out “trade secrets” in reporting toxic chemicals. EnergyWire’s Mike Soraghan crunched the data in a 2012 investigation that showed at least two-thirds of fracking fluid disclosures made by drilling companies omitted at least one ingredient, claiming trade secrecy. A Bloomberg team did a similar project the same year.
- EPA “poisoned places” watch list. Sometimes, just having the list is enough. For years, the U.S. Environmental Protection Agency maintained a secret “watch list” of the worst Clean Air Act violators. In 2014 a crack investigative team at the Center for Public Integrity got the list by means of a FOIA request. Collaborating with NPR journalists, Jim Morris, Chris Hamby and Liz Lucas combed through the list, comparing it with EPA’s more public list of “high priority violators.” They did not do fancy data manipulations — they just used the list to walk through the many case studies of hard-core polluters and explain the many reasons why EPA and the states were doing a poor job of enforcement. It was part of their ongoing “Poisoned Places” project. An embarrassed EPA ended use of the watch list that year. But the lists (for water and waste as well as air) are still archived on the EPA site.
- Contaminants in drinking water. The 47,500 drinking water systems in the U.S., overseen by state agencies and the EPA, have been trying to improve the safety of public drinking water since before the Safe Drinking Water Act of 1974 was enacted. Yet for many years, news media reported on the issue only rarely. In 2009-10, Charles Duhigg of the New York Times did a sweeping series on “Toxic Waters,” which included drinking water. Duhigg collected contaminant records for all 47,500 water systems — and made them available to readers in an interactive format allowing them to look up their own water.
- Air pollution near schools. In December 2008, at the very end of the Bush years, USA TODAY did a very ambitious nationwide project based on the geographic connections between toxic air pollution and the nation’s schools. It was called “The Smokestack Effect.” It used a database that geolocated almost all of the nation’s schools (127,800 of them), along with another tool based on EPA’s Toxics Release Inventory. The so-called Risk-Screening Environmental Indicators model geographically integrates the releases of many chemicals, their environmental fate and potential human exposure. USA TODAY’s team, which included dozens of people, ground-truthed the information by doing its own measurements near schools. And then the news organization put it in an interactive form that allowed readers all over the United States to look up their own schools.
- Computer models predicting risk. Three years before Hurricane Katrina hit, New Orleans Times-Picayune reporters Mark Schleifstein and John McQuaid did a prophetic 2002 series, “Washing Away,” which anticipated what could happen if a category 5 hurricane hit the city dead-on. Computer reporting played only a background role in this journalistic tour-de-force. The reporters used a computer model that simulated the sloshing of coastal waters in a storm surge as they might interact with the city’s complex system of levees. They also used statistical concepts to explain the likelihood of a “big one.” Their discussion of vulnerable populations prompted municipal authorities to do evacuation and emergency planning which probably saved tens of thousands of lives.
- Natural gas pipeline hazards to communities. A terrible gas pipeline rupture and explosion in New Mexico in 2000 killed 12 campers. Then in July 2001, an investigative series in the Austin American-Statesman by Ralph Haurwitz and Jeff Nesmith put that event in a national context of pipeline company neglect and lax federal regulation. They used data from the Office of Pipeline Safety. Following the terrorist attacks of 9/11 that same year, the government shut down public access to pipeline routing and safety information. But their work arguably prompted Congress to tighten pipeline safety law in 2002. Without much data, there was very little journalism watchdogging pipelines for almost a decade — until the San Bruno, Calif., pipeline explosion in 2010 that killed eight.
Some SEJ.org resources
SEJ itself hosts many data resources. Here’s a list of recent writeups from SEJournal’s WatchDog, TipSheet and Toolbox columns that offer guides to data resources.
- Environmental Sleuthing Toolbox (WatchDog, March 16, 2016). A review of some of the more vital and productive databases for environmental journalism.
- Toolbox: Data Resources for Dams, Impoundments, and Levees (SEJ, October 5, 2013). A collection of data resources about dams, levees and other water control structures.
- EPA’s ECHO Enforcement Database: Under Pruitt, Watch Enforcement Data (TipSheet, February 28, 2017). Some updated tips for tracking EPA enforcement under the Pruitt administration.
- EPA Beefs Up ECHO Database for Drinking Water Enforcement (WatchDog, October 26, 2016). A quick look at a major upgrade of EPA’s enforcement database.
- Abandoned Mine Data Now a Bit More Available to Public (WatchDog, October 28, 2015). There are almost 50,000 abandoned mine sites like the Gold King Mine, where a 2015 blowout caused havoc. Now you can track them.
- EPA Intends To Add Natural Gas Processing Plants to Toxics Inventory (WatchDog, October 28, 2015). Addition of natural gas processing to the TRI was a major expansion.
- Chemical Plant Data Can Reveal Local Stories (WatchDog, November 11, 2015). A watchdog report uses data to show chemical safety enforcement is lacking — and points the way to many local stories.
- New Power Plant Database Offers Local Stories for New Energy Era (WatchDog, December 16, 2015). Improvements in an Energy Information Administration database on electric power plants make it easier to find and report local stories.
- National Atlas Is a Trove for Environmental Journalism Projects (WatchDog, January 6, 2016). The U.S. Geological Survey’s National Atlas is a powerful basic tool to help visualize all kinds of environmental data.
- Colorado River Portal Offers Data Tool for Environmental Journalists (WatchDog, January 20, 2016). The Interior Department offers a data tool for exploring the coming crunch on water from the Colorado River, a water source for 40 million Americans in the Rockies and Southwest.
- Toxics Database a Key Tool for Environmental Journalists (Reporter’s Toolbox, February 21, 2017). The latest annual iteration of the Toxics Release Inventory, a foundational tool for journalism about toxic chemicals in the environment.
- EPA Releases Latest Toxics Release Inventory — a Key Tool for Journos (WatchDog, February 3, 2016). An earlier year’s edition of the Toxics Release Inventory.
- Group Issues Data Map Showing Poor, Minorities Face More Toxic Risks (WatchDog, February 3, 2016). The watchdog nonprofit Center for Effective Government shows how EPA databases can document the greater environmental health risks faced by the poor and minorities.
- EPA Grants and Contracts Databases Offer Gumshoes Eye on Agency (WatchDog, February 17, 2016). Databases on grants and contracts at EPA offer a basic tool for investigative journalism on all kinds of subjects. That might include corruption.
- Data Journalism Tools and Tips from MIT (WatchDog, March 2, 2016). The Knight Science Journalism program at MIT partakes of the institution’s historic hacking legacy. It put together a how-to toolkit on data journalism for a 2016 conference.
- Reporter’s Toolbox: Data Mining Made Easy — A Primer (Reporter's Toolbox, Nov. 22, 2016). A four-step primer designed to help you use readily available tools to collect, clean up and analyze data, then use it to tell your stories.
- Finding Pipelines Near You with Databases (WatchDog, April 13, 2016). Follow this link and links to earlier stories for data on pipelines — a sure-fire news nexus.
- High-Hazard Chem Plants — Can Secrecy Substitute for Safety? (WatchDog, April 13, 2016). A quick discussion of how secrecy regarding data on hazardous chemical plants can kill innocent people.
- Hazardous Sites Near You? There’s a Database for That! (TipSheet, April 4, 2017). The good news is that the Houston Chronicle has preserved a key database on the nation’s most hazardous chemical sites. The bad news is that there are more that are not in it.
- Ag Department To Release Food Safety Inspection Data (WatchDog, July 20, 2016). The Obama administration set out to release facility-specific food safety inspection data. The future of this initiative under Trump may be unclear.
- Impaired Waters Lists: A Tool for Water Pollution Reporting (WatchDog, September 28, 2016). EPA’s lists of waters that do not meet Clean Water Act standards are hard to get at (go to the states). But they are essential for reporting on water pollution.
- TipSheet: Coal-Ash Issue Not So Easy To Dispose Of (TipSheet, December 13, 2016). Coal ash is an issue that will not go away because there is so much of it threatening to pollute people’s water with toxic heavy metals. There is data.
- Got Coal Ash? Southeast Database Helps Watchdog Power Plants (WatchDog, October 26, 2016). This database was constructed by a regional clean water advocacy group.
- TOOLBOX: EWG Ag Subsidy Database a Tall Silo of Environmental Stories (WatchDog, June 27, 2012). This longstanding database on farm subsidies (and that includes conservation) retains its relevance as Congress starts ramping up to the next Farm Bill.
- TOOLBOX: Contractor Misconduct Database Offers Gumshoes Leads (WatchDog, November 20, 2008). There is actually a database on misconduct by government contractors, maintained by the nonprofit Project on Government Oversight. Can you find any contractors for environmental agencies?
- Voluntary Fracking Disclosure Database Gets 'F' from Harvard Study (WatchDog, April 24, 2013). A reminder, perhaps, that data can deceive as well as disclose. Still, it’s a starting point for investigating the toxic chemicals injected into oil and gas wells that may affect people’s drinking water.
- Corps Puts Searchable National Levee Database Online (WatchDog, November 2, 2011). Defective levees (or monster floods) can kill people. The Army Corps of Engineers does a good job of disclosing data on levees within just its own program (which includes relatively safe ones).
- EWG Database Helps Public, Journos Find Drinking Water Threats (WatchDog, February 5, 2014). The nonprofit Environmental Working Group has compiled a National Drinking Water Database that in some ways goes well beyond the EPA data it is based on. The full story of health risks from drinking water remains untold. Data helps.
Reading and other resources
- “The New Precision Journalism,” by Philip Meyer (Midland, 1991). This is an update of the original book, published in 1973, that set the standards for “computer assisted reporting.” You can read it online here.
- “A Guide to Computer Assisted Reporting,” by Pat Stith (Poynter, 2005). This Pulitzer-winning investigative reporter helped found the National Institute for Computer-Assisted Reporting.
- Digging for Truth with Data, by Brant Houston (Global Investigative Journalism Network, 1995/2015). The author was director of Investigative Reporters and Editors.
- CAR Hits the Mainstream, by Susan McGregor (Columbia Journalism Review, March 18, 2013). Chronicles CAR’s journey beyond the geek fringe.
- The Benefits of Computer-Assisted Reporting, by Jason Method (NiemanReports, Fall 2008). This issue of NiemanReports focuses on investigative journalism.
- "Data Journalism or Computer Assisted Reporting," by Elena Egawhary and Cynthia O'Murchu (Centre for Investigative Journalism, 2012).
- "Global Database Investigations: The Role of the Computer-Assisted Reporter," by Alexandre Léchenet (Reuters Institute for the Study of Journalism, 2014).
- The Golden Age of Computer-Assisted Reporting Is at Hand, by Mathew Ingram (NiemanLab, May 20, 2009).
- A Brief History of Computer-Assisted Reporting, by Susan E. McGregor (Tow Center for Digital Journalism, March 18, 2013).
Some useful groups and sites
- National Institute for Computer-Assisted Reporting (NICAR). NICAR is a branch of Investigative Reporters and Editors. It has a data library which makes important databases available conveniently at low cost. NICAR’s training programs are great.
- Data.gov. Whatever its failings in openness, the Obama administration made strides in making government data more accessible online. The Data.gov website is a hub for finding online data from almost all agencies.
- Data Driven Journalism. DDJ is too good and important a thing to keep just for the United States. The site/organization named “Data Driven Journalism” is an initiative of the European Journalism Centre. They have loads of training and networking resources. Plus, check out the hashtag #DDJ.
- Tow Center for Digital Journalism. Another way computers have changed journalism is by revolutionizing publishing platforms. This is not pure data journalism per se, but the connections are manifold and important. The Tow Center at Columbia Journalism School is a good place to get an overview.
Joseph A. Davis is editor of SEJournal’s WatchDog Tipsheet, and writer of the weekly TipSheet and monthly Issue Backgrounders.
* From the weekly news magazine SEJournal Online, Vol. 2, No. 17. Content from each new issue of SEJournal Online is available to the public via the SEJournal Online main page. Subscribe to the e-newsletter here. And see past issues of the SEJournal archived here.