A Medicare database, second-hand stats and social-science findings shed light on how to handle data with care.
By Stephen Rynkiewicz
Journalists look for reassurance in data, as a way to validate what their sources tell them. Scientists aren’t so sure – they joke about “data” being the plural of “anecdote.” Two sources are better than one, except when they’re both wrong.
“You can really jump to the wrong conclusions if you don’t have an understanding of the background of the data,” says Matthew Roberts, informatics manager for Chicago’s health department.
Researchers make a familiar complaint about their data: getting quoted out of context. The issue is playing out in a Medicare database of payments doctors took from medical suppliers. Both doctors and suppliers found errors in the data, and payments tied to research still in progress were withheld.
“They just made some wrong guesses about what the data meant,” he said. The final Medicare database gives companies a chance to comment — to give details that could suggest a productive partnership rather than a conflict of interest.
With MacArthur Foundation funding, a Chicago conference connected data analysts with nonprofit workers to sort out public-access problems.
Some issues will be familiar to investigative reporters, such as “dirty data.” Nonprofits have limited resources to scrub their spreadsheets for inconsistencies or coding mistakes.
“Now I’m in wet-blanket mode,” Roberts said. “A lot of our data is not very clean to begin with.”
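Scrubbing a spreadsheet for the inconsistencies Roberts describes often comes down to a few lines of script. As an illustration only (the records, field names and rules here are made up, not from the conference), a minimal Python sketch that normalizes inconsistent category codes and flags a malformed ZIP code:

```python
# Hypothetical "dirty data": the same category coded three different ways,
# plus one truncated ZIP code.
records = [
    {"name": "Acme Clinic", "zip": "60601", "type": "Non-Profit"},
    {"name": "Budget Labs", "zip": "60601", "type": "nonprofit "},
    {"name": "City Health", "zip": "6061",  "type": "NON PROFIT"},
]

def clean_type(value):
    """Normalize a category code: lowercase, strip spaces and punctuation."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def bad_zip(value):
    """A five-digit U.S. ZIP code check; anything else gets flagged."""
    return not (len(value) == 5 and value.isdigit())

cleaned = [dict(r, type=clean_type(r["type"])) for r in records]
flagged = [r["name"] for r in records if bad_zip(r["zip"])]

print({r["type"] for r in cleaned})  # one consistent code: {'nonprofit'}
print(flagged)                       # ['City Health']
```

Even a toy script like this shows why "not very clean to begin with" matters: three spellings of one category would otherwise count as three categories.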
Online databases can reveal a number quickly, but not necessarily the definitions and methods behind it. “It can be immediately, incorrectly interpreted by anybody, right?” Roberts said.
Health care analysts are also wary of crossing the line on patient privacy. Roberts notes that Illinois inadvertently named West Nile virus victims simply by reporting the gender and age of a county’s victims.
“If you take a look at the obituaries in a small county,” he said, “for any of those given days where the date of death was mentioned, you could pretty quickly figure out who was the 84-year-old male who had died from disease x.”
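The re-identification risk Roberts describes is easy to demonstrate. A sketch with invented records (not Illinois data): when county, age and gender combine to match exactly one row, that "de-identified" row identifies a person.

```python
from collections import Counter

# Hypothetical de-identified disease reports: (county, age, gender).
# In a small county these quasi-identifiers can single out one person.
reports = [
    ("Pope County", 84, "M"),
    ("Pope County", 31, "F"),
    ("Cook County", 84, "M"),
    ("Cook County", 84, "M"),
]

counts = Counter(reports)
# Records matching exactly one person (k-anonymity with k == 1) are the
# ones an obituary reader could pin to a name.
unique = [rec for rec, n in counts.items() if n == 1]

print(unique)
```

The two Pope County rows each match a single person, so cross-referencing a local obituary would unmask them; the two identical Cook County rows shield each other.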
The Society of Professional Journalists ethics code calls on journalists to take responsibility for their work’s accuracy. That includes stats no less than anecdotes.
Pulitzer Prize-winner John Ullmann tells reporters to “interview their data,” to get familiar with the numbers and what’s behind them. Here are some questions to ask:
Is it accurate? Secondhand statistics should be handled with care. Often they’re stripped from their original context.
Recently I tried to verify a marketing manager’s figures on how many customers companies lose every year. He attributed them to “a recent case study conducted by Bain Consulting.” His numbers were the same ones Fred Reichheld cited in his 1990s book “The Loyalty Effect.” Reichheld, a Bain & Co. director, was merely speculating about the payback from controlling turnover. Somehow his hypothetical figures had morphed into hard numbers.
Is it timely? Even the latest data may not reflect fast-moving events. The Census Bureau updates its 10-year count, but warns users about changes in questions and sample size.
Is it relevant? Watch for “survey bias,” signs that the methods might be dictating the outcome. Customer responses or social media polls may not say much about the general population.
The BBC reports that 16 percent of 18- to 25-year-olds “may be suffering from internet addiction disorder.” But it didn’t question whether the signs of net addiction it cited, such as being “irritable when interrupted,” reflected addiction or even web-related behavior.
Not till the last sentence was it noted that online addiction isn’t a recognized psychiatric disorder. But one caution flag fluttered high in the story: The survey was attributed to a marketing agency.
Is it complete? Directory listings often reflect who’s paying for them, or who’s placed source material online.
“The datasets we do get, we know that they’re not accurate,” Samia Makik of the Chatham Business Association told the Chicago conference. Her program and others put businesses online to keep shoppers in the community.
“The reason they’re going elsewhere is that’s what they find when they’re searching,” she said. “‘Well, this person has a deal, I’m going to go out there.'”
Is it trustworthy? Novel conclusions attract all kinds of editors, including those at medical journals. Peer review standards vary widely.
Seemingly conclusive findings can prove on further review to be random noise. That doesn’t stop scientists. They just figure the odds that they’re right, and try to improve them. That’s how they’re comfortable with their conclusions on, say, global warming.
Journalists can feel secure too, as long as they’re just as critical with stats as with sources.