Spreadsheets Can Find Patterns in Words, Not Just Numbers

May 1, 2009

By DAVID POULSON

Reporters traditionally use spreadsheets to analyze numbers and to run quick calculations across thousands of records.

But increasingly they also use them to analyze words and to find patterns in their notes. You don't need to know a mean from a median to use a spreadsheet to uncover a hot story angle, rule out dead ends or stay organized during a complex investigation.

Susanne Rust used a Microsoft Excel spreadsheet to analyze more than 250 peer-reviewed studies about the health effects of the chemical bisphenol A. The Milwaukee Journal Sentinel reporter started with five columns — date, author, journal, whether the study found an effect and who funded it.

As she worked, she added new categories. Her spreadsheet, populated largely with words rather than numbers, grew to more than 50 columns. They included the number of animals studied, how the animals were exposed, their bedding type and the health endpoint examined.

Rust found that 168 studies looked at low-dose effects. Her spreadsheet analysis also revealed the basis of a powerful story: 132 of those studies found health problems, including hyperactivity, diabetes and genital deformities.

Perhaps equally interesting is that all but one of the studies indicating health problems were conducted by non-industry scientists. But nearly three-fourths of those that found the chemical harmless were funded by industry.

Rust also ruled out a tip that some studies might be skewed because a particular strain of rat could be insensitive to the chemical. Her analysis turned up no such pattern.

"It's funny, you know, although I used Excel all the time as an anthropology grad student, when I became a journalist, it never occurred to me to use it — until I got on this story," said Rust, a member of SEJ.

"Now, once again, it's become indispensable in my work."

Former SEJ president Mike Mansur, an investigative reporter for The Kansas City Star, also found a spreadsheet valuable for more than crunching numbers. He built one to investigate complaints against the Kansas City Police Department.

Not sure what he would find, Mansur created a column for each piece of data on the form used to investigate the complaints. These included the location of the complaint, the dates it was filed and resolved, its outcome, and the gender and race of both the complainant and the officer.

"I learned how long each complaint took to wind through the system, how often video cameras were not on or failed, how often the officer was exonerated," Mansur said. "It also gave me an easy way to search my paper files without ever touching them. If I wanted the complaint at 67th and Swope, I searched for those terms and the number popped up."

Indeed, organization and quick access to information are why many reporters use spreadsheets.

ProPublica reporter Joaquin Sapien used Excel to build a chronology of how the Federal Emergency Management Agency defused concern over formaldehyde levels inside trailers provided to hurricane victims. The story drew on two years of email exchanges and formal correspondence between federal officials.

Sapien, also an SEJ member, created fields for the date, the agency, a description of each communication and related newspaper clips.

"Each time that a relevant email was exchanged between agency officials, I plugged a description of the email into a cell that corresponded with the appropriate date," Sapien said.

He linked descriptions to the supporting documents saved on his hard drive. A pattern emerged showing that the objections of many low-level officials were ignored while senior officials insisted that the formaldehyde did not pose a health hazard.

"The process made several hundred pages of documents far easier to process, and gave me a quick accessible format that was extremely helpful in writing the story," Sapien said.

What's more, the spreadsheet became the basis of an online timeline providing a visual aid for readers.

Excel can even help sort through multiple interviews to get a sense of where the story lies. Marcy Burstiner, a former reporter who teaches investigative reporting at Humboldt State University, has developed a system for entering interview questions and answers into a spreadsheet, along with the major points they elicit and the quotes relevant to those points.

Sorting by the major points elicited in the interviews groups together all the information that produced them, Burstiner said. Points without much supporting information may be irrelevant or may require more investigation. But similar points, gathered from perhaps dozens of interviews, may become the structure of the story, and sorting on them brings together all the quotes, paraphrases and other information that support those points.

"In other words, Excel outlines your story for you," Burstiner said. "Then you can relabel cells according to major points and then shuffle those around if you want to see if there is a better order to those points."

Adding columns for "people to interview" or "data to get" quickly creates a list of holes to fill, she said.

"By putting your interviews into Excel you treat the anecdotes, statements, opinions, etc. as pieces of data and you can work with them as you would any other data."
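Burstiner works in Excel, but the same sort-and-group logic can be sketched in a few lines of Python for reporters who keep their notes in a script or a CSV export. This is an illustrative sketch only: the column names (`source`, `point`, `quote`, `to_interview`), the sample rows and the two-entry threshold for a "thin" point are all hypothetical assumptions, not part of Burstiner's system.

```python
from collections import defaultdict

# Hypothetical interview notes; each dict stands in for one spreadsheet row.
# Column names and contents are illustrative assumptions, not real reporting.
rows = [
    {"source": "Interview A", "point": "Warnings went unheeded",
     "quote": "Nobody called us back.", "to_interview": "Regional director"},
    {"source": "Interview B", "point": "Costs rose",
     "quote": "The bill doubled.", "to_interview": ""},
    {"source": "Interview C", "point": "Warnings went unheeded",
     "quote": "We raised it twice.", "to_interview": ""},
]

def group_by_point(rows):
    """Mimic sorting on the 'point' column: collect each row under its point."""
    groups = defaultdict(list)
    # sorted() is stable, so rows sharing a point keep their original order.
    for row in sorted(rows, key=lambda r: r["point"]):
        groups[row["point"]].append(row)
    return dict(groups)

def thin_points(groups, minimum=2):
    """Points backed by fewer than `minimum` rows: possibly irrelevant, or needing more reporting."""
    return sorted(p for p, items in groups.items() if len(items) < minimum)

def holes(rows):
    """Turn the non-empty 'to_interview' cells into a to-do list."""
    return sorted({r["to_interview"] for r in rows if r["to_interview"]})

groups = group_by_point(rows)
# Order points by how much material supports them: a rough outline of the story.
outline = sorted(groups, key=lambda p: len(groups[p]), reverse=True)

for point in outline:
    print(f"{point} ({len(groups[point])} entries)")
    for row in groups[point]:
        print(f'  {row["source"]}: "{row["quote"]}"')
print("Thin points:", thin_points(groups))
print("Still to interview:", holes(rows))
```

Ranking points by the number of supporting entries is one simple way to turn the grouped notes into an outline; as Burstiner suggests, the order can then be shuffled by hand.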

Besides helping to organize your notes, your story information and even the story itself, these techniques can pay off in other ways: Susanne Rust and her reporting partner Meg Kissinger have already won a George Polk Award, the Edward J. Meeman Award sponsored by the Scripps Howard Foundation and the 2008 John B. Oakes Award for Environmental Reporting for their series on common household chemicals, including bisphenol A.

David Poulson is the associate director of the Knight Center for Environmental Journalism, where he teaches environmental, investigative and computer-assisted reporting.

From SEJ's quarterly newsletter, SEJournal, Spring 2009 issue.
