Recently at work, I needed to import a substantial quantity of data from
some spreadsheets into an SQL database. Due to multiple maintainers and
changing needs, the spreadsheets were a mess of special cases, shorthand
notation, and minor errors. I gave up on importing them as-is, and asked
for the relevant data to be extracted into a format I defined. Writing a
converter for the latter was much easier (although errors kept cropping
up even in the new spreadsheet).
Today I read about Google Refine, a
program dedicated to dealing with messy data. The video demonstrations
on the project page show off some useful capabilities, such as grouping
together and canonicalising values from a column in a few steps, scaling
numeric values, and rearranging the data in various ways.
Refine is a Java program that you install and interact with using
a web browser. I've never used such a program, and I suspect it may be
painful to install, but I'm going to try it if any more spreadsheets
appear on the horizon.
Google Refine was renamed
again and is now known as