The problem with 'more data'

By Chip Oglesby

Recently, SCPC wrote about the problem with online check registers in county school districts. As more and more data is placed online, we need a way to standardize it so that it has context and isn’t just sitting there. Data that just sits there is what we would call ‘naked transparency.’

The naked transparency movement marries the power of network technology to the radical decline in the cost of collecting, storing, and distributing data. Its aim is to liberate that data, especially government data, so as to enable the public to process it and understand it better, or at least differently.

Before we rally the troops, we have to realize that getting more data, data that we own, from government officials at all levels doesn’t equal more transparency or accountability.

Data is only part of the Transparency Cycle

In a blog post, the Sunlight Foundation published a very interesting graphic that shows how the ‘Transparency Cycle’ works. The cycle has no beginning or end because it’s an ongoing process. Government Agencies (the State Ethics Board, for example) are responsible for organizing data and giving Web Developers APIs. Developers work with Graphic Designers, who give data context by visualizing it. Designers work with Journalists, who build public awareness by putting the data in context and reporting anomalies. Engaged Citizens work with Advocacy Groups, who organize and take action to hold the public and lawmakers accountable for what’s going on in government.

Tim Berners-Lee, the inventor of the World Wide Web, has envisioned a new type of web, one of linked data, where the dots can be connected. Berners-Lee gives five points of open linked data.

  1. make your stuff available on the web (whatever format)

  2. make it available as structured data (e.g. excel instead of image scan of a table)

  3. non-proprietary format (e.g. csv instead of excel)

  4. use URLs to identify things, so that people can point at your stuff

  5. link your data to other people’s data to provide context

State Comptroller Richard Eckstrom’s state government spending transparency site accomplishes four of the five goals, a great accomplishment in my opinion. Our school websites, on the other hand, meet only one of the five requirements. PDFs with no structure give engaged citizens no way to ingest and analyze more than one month’s worth of data.

I was able to scrape a PDF from Berkeley County’s transparency website and run the information through Many Eyes to get the chart featured below. Ideally, there should be a simpler way for a developer or designer to visualize this information through APIs.

Eckstrom’s website has the same type of problem. It focuses on month-to-month expenditures, so if I want to build a database, I have to download 12 separate .csv files and load them into another database before I can visualize a full year.
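To make that chore concrete, here is a minimal sketch of stitching a folder of monthly .csv downloads into one queryable table. The column names (agency, vendor, amount) and the function names are assumptions for illustration; the real files may be laid out differently.

```python
import csv
import sqlite3
from pathlib import Path


def load_monthly_csvs(csv_dir, db_path=":memory:"):
    """Load every CSV in csv_dir into a single 'spending' table.

    Assumes (hypothetically) that each file has a header row with
    'agency', 'vendor', and 'amount' columns.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spending ("
        "source_file TEXT, agency TEXT, vendor TEXT, amount REAL)"
    )
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as handle:
            rows = [
                (path.name, row["agency"], row["vendor"], float(row["amount"]))
                for row in csv.DictReader(handle)
            ]
        conn.executemany("INSERT INTO spending VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def total_by_vendor(conn):
    """Return (vendor, total) pairs across all loaded months, largest first."""
    query = (
        "SELECT vendor, SUM(amount) AS total FROM spending "
        "GROUP BY vendor ORDER BY total DESC"
    )
    return conn.execute(query).fetchall()
```

Keeping the source file name on every row means a suspicious total can always be traced back to the month it came from.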

All data is dirty

Once we’re able to actually collect data through publicly accessible APIs, does that necessarily mean the info is clean? Not really.

Since data input still relies on human beings, it is prone to mistakes. Remember the disaster of recovery.gov? There was a huge scandal because of all of the ‘ghost’ districts where money was reported as being spent. The two main views here are simple: “It happened on purpose, Democrats are trying to steal/take our money” or “It was just a simple mistake, a slip of the finger, or some congressional page didn’t know what district they were in.”

Also, if you browse the Sunlight Foundation’s transparency data portal and look for campaign contributions, names can be misspelled, and instead of naming a specific business under occupation, such as “Owner: Fast Bucks,” a donor may simply list “store owner.”

This can lead to errors. It makes it hard to track who’s actually giving, because a researcher has to double-check which company the donor works for to connect the dots.
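Part of that double-checking can be automated. The sketch below groups donor names that are probably the same person spelled differently, using Python’s standard difflib module. The function names and the 0.85 similarity cutoff are assumptions chosen for illustration, not a tested standard.

```python
import difflib
import re


def normalize(name):
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def group_similar(names, cutoff=0.85):
    """Group names whose normalized forms are near-identical.

    Returns a dict mapping the first-seen spelling to all of its variants.
    """
    groups = {}      # first-seen spelling -> list of variants
    canonical = {}   # normalized form -> first-seen spelling
    for name in names:
        key = normalize(name)
        match = difflib.get_close_matches(key, list(canonical), n=1, cutoff=cutoff)
        if match:
            groups[canonical[match[0]]].append(name)
        else:
            canonical[key] = name
            groups[name] = [name]
    return groups
```

A match here is only a hint for the researcher. Two different donors can have nearly identical names, so a person still has to confirm each group.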

Data needs context

Once the data is published, it still needs context. PDFs are good for looking at a small record, but what if we want to compare values over a given year, or the past six years? How do we know when a company that a lobbyist represents gives a lawmaker money for his PAC so that he may be influenced to vote a certain way?

Spending all day poring over massive amounts of information can be tedious and lead to the wrong conclusions. Instead, there should be automated processes in place that alert people via email, text, or tweet when anomalies arise. Like the internet, they should work quietly in the background but always be on.
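As a rough sketch of what such a process might check, the function below flags payments that sit far from the typical amount, using the median absolute deviation. The function name, the (label, amount) input shape, and the 3.5 threshold are assumptions for illustration; delivering the alert by email, text, or tweet is left out.

```python
import statistics


def find_anomalies(payments, threshold=3.5):
    """Return (label, amount) pairs whose amount is far from the median.

    Uses the modified z-score based on the median absolute deviation (MAD),
    which is less distorted by the outliers it is trying to find than a
    mean-and-standard-deviation test.
    """
    amounts = [amount for _, amount in payments]
    median = statistics.median(amounts)
    mad = statistics.median(abs(a - median) for a in amounts)
    if mad == 0:
        # At least half the amounts equal the median, so there is no
        # usable spread to score against.
        return []
    return [
        (label, amount)
        for label, amount in payments
        if 0.6745 * abs(amount - median) / mad > threshold
    ]
```

A flagged payment is not proof of wrongdoing. It is a prompt for a person to look closer, which is where designers and reporters come in.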

Designers and reporters also play an important role in this because they can help clarify misunderstandings someone may have.

Data doesn’t equal transparency

Once we have the data, it’s been checked for accuracy, and it’s been given context, the transparency cycle still isn’t complete. Government could publish every single bit of data it has, including recorded votes, transit information, and GIS maps, but what good will it do if it just sits there?

It’s up to Engaged Citizens and Advocacy Groups to take the information from Designers, Developers, Journalists and Bloggers and form grassroots movements to hold government responsible. Data without action is all for naught.

Once Citizens and Groups organize and take action, they, along with others, can work with Lawmakers to actually make a change.

Transparency alone will not lead to more accountability in government. Data.gov and recovery.gov are great examples: the federal government has given citizens monitoring tools, but the tools only matter if citizens use them.

In South Carolina, we face battles of our own. The South Carolina Senate, composed of only 46 people, cannot simply decide to vote on the record, because its members say it’s unconstitutional. They’ve also argued that verbal roll-call voting takes too long, and I agree, it does. But there are solutions out there. Open-source software can be written so that bills, amendments and earmarks are posted online 72 hours early for the public to inspect. House and Senate members could then vote on the bills on the record, so that we can connect the dots to see where change and influence are happening.
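The core of such software is not complicated. As a minimal sketch, the two functions below check the 72-hour posting rule and tally a recorded vote. The function names, the member-to-vote mapping, and the vote labels are hypothetical and not taken from any existing legislative system.

```python
from collections import Counter
from datetime import datetime, timedelta

# The 72-hour public notice window described above.
NOTICE_PERIOD = timedelta(hours=72)


def posted_in_time(posted_at, vote_at, notice=NOTICE_PERIOD):
    """True if the bill text was online at least `notice` before the vote."""
    return vote_at - posted_at >= notice


def tally(roll_call):
    """Count votes from a mapping of member name -> 'yea', 'nay', or 'absent'."""
    return Counter(roll_call.values())
```

Because each vote stays attached to a member’s name, the same record can later be lined up against contribution data to connect the dots.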

The question that South Carolina faces is: Who’s going to be first in the Transparency Cycle?