The perils of dirty data and “overpriced” ham

So, we paid a government contractor $1.1 million for 2 pounds of sliced ham? That seemed to be the story as the Drudge Report started linking items from the federal government’s Recovery.gov site this morning.


Not so fast, said the Agriculture Department, as it swatted down Drudge’s reports with a rare rebuttal.

Now, all this back-and-forth might seem a bit excessive for a few pounds of sliced ham, but it illustrates one of the perils of transparency without context. Our government spends billions each year to collect, maintain and analyze all kinds of data. But it’s collected by humans, who make mistakes.

Take the ham fiasco. Everything appears above board in the original description on Recovery.gov. But because of the “Description of Work/Service Performed” field, it looks like we paid a bunch of money for some pork. That’s not necessarily an error, but it’s definitely not clear to most readers. (I probably would have drawn the same conclusion, although I would check it out first before writing a story.)

The people who deal with government data on a regular basis know all too well the problems associated with collecting and disseminating data. In the field of computer-assisted reporting, we call it “dirty data,” and we’re on guard for it all the time. (A chunk of my time as Database Editor is spent cleaning up data we get from various local, state and federal sources.)

Here’s how the folks at the Institute for Analytic Journalism put it in 2006:

An uncountable number of public agency databases have been created in the past 30 years. More and more, public and private decision-makers draw on this collected, digital data to make decisions about everything from disciplining doctors to zoning decisions to law enforcement to deciding who gets to vote. The often-unquestioned assumption is that the data, as found, analyzed and presented by a government or quasi-government agency, is valid. Increasingly, anecdotal evidence indicates that data is riddled with serious errors. Often, if initial investigations indicate the data is too suspect — and the cost to clean the data by hand or automatically too high — then good and important analysis and investigations are put aside.

The Government Accountability Office recently put out its own report on the subject of government data. The report is mainly a guide for government auditors, but it recognizes the problems posed by all these disparate sources of data, and the public’s appetite for putting it all online.

While this guide focuses only on the reliability of data in terms of completeness and accuracy, other data quality considerations are just as important. In particular, consider validity. Validity (as used here) refers to whether the data actually represent what you think is being measured. For example, if we are interested in analyzing job performance and a field in the database is labeled “annual evaluation score,” we need to know whether that field seems like a reasonable way to gain information on a person’s job performance or whether it represents another kind of evaluation score.

In journalism, we try to follow the age-old advice: “If your mother says she loves you, check it out.” Maybe Drudge should do the same thing?

–Paul

Written by Paul Monies





