A tale of four Rewired States. Making quick APIs. / Aug 9th 2010

As I was reflecting on the awesomeness of Young Rewired State in particular, and of Rewired State UK Online, I started to see a pattern emerging in my involvement in these fantastic events. Unsurprisingly, it involves APIs. The four Rewired States I’ve been thinking about in relation to APIs are the first National Hack the Government Day, Rewired Culture, Young Rewired State 2010 and the Rewired State Get Online day, which happened on Saturday.

For each of these events I built a custom API. APIs are useful, especially in the time-constrained environment of a hackday, because they let you build many things quickly. At Rewired State UK Online at the weekend, a very small number of developers built a lot of things very quickly on top of the APIs (here and here) that Sym Roe and I built beforehand. Looking back over the four Rewired States that I’ve made an API for, I’ve realised I built them using some quite different methods.

Two of them imported the provided data: a CSV file of schools for National Hack the Government Day, and a rather difficult-to-work-with SQL dump of the Government Art Collection for Rewired Culture.

The other two used the data just in time, with YQL as a clean way to create a data source from other data sources. I’ve been playing with this pattern since Bonnier Hack Day. I’ll talk about it a bit in a post that’s in the works about Childs I Klimb, a way of telling the story of people climbing Kilimanjaro for charity.

Interestingly, each approach has its long-term pitfalls, but all are very useful for moving things along at hackdays. The import mechanism clearly has the potential problem of the data going out of date and needing a tricky reimport. That is, of course, if you can find out that it has changed. We’re getting quite good at publishing public data; we just need to come up with ways of versioning it and telling people when it has changed. Then there’s the other hazard of importing: the form of the new data may not match the old, which breaks the importer.
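One cheap way of finding out that a published file has changed, when the publisher offers no versioning, is to keep a fingerprint of the last copy imported and compare it with the current one. The sketch below is only an illustration of that idea, not something built at any of these events; the function names are made up, and it assumes the file's bytes have already been downloaded.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies this exact version of the file."""
    return hashlib.sha256(data).hexdigest()


def needs_reimport(data: bytes, last_fingerprint):
    """Compare the current file against the fingerprint stored at the last import.

    Returns (changed, current_fingerprint). Pass None as last_fingerprint
    when nothing has been imported yet.
    """
    current = fingerprint(data)
    return current != last_fingerprint, current


# Hypothetical data, for illustration only
first_version = b"name,visits\nLibrary,12\n"
second_version = b"name,region,visits\nLibrary,North,12\n"

changed, stored = needs_reimport(first_version, None)       # first import
unchanged, _ = needs_reimport(first_version, stored)        # same bytes again
changed_again, _ = needs_reimport(second_version, stored)   # publisher edited the file
```

This only says that something changed, not what. It cannot tell a corrected typo from a restructured table, so the importer still needs to check the shape of the new data.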

This second problem is the one which can bite the just-in-time API idea too. For scraped webpages, as in the UK Online fake API I made, a change in the page structure can lead to the API failing. The same is true of CSV data, which I hadn’t considered before but had to deal with in real time at Young Rewired State. Changing the order of the columns can be quite disastrous. It’s blindingly obvious now. Part way through the day an extra column was added to the lovely spreadsheet of scraped data I was using as a data source for YQL. This broke some casting, which in turn broke the data the API returned.
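To make the failure concrete: code that reads cells by position breaks as soon as a column is inserted, while code that looks cells up by header name does not. The sketch below shows the second style in Python rather than YQL; the column names and data are invented for illustration, and it assumes the first row of the CSV is a header.

```python
import csv
import io


def read_rows_by_name(csv_text, casts):
    """Parse CSV text, looking columns up by header name rather than position.

    `casts` maps a column name to a function that converts its string value.
    Columns not listed in `casts` are ignored, so an extra column is harmless.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(casts) - set(reader.fieldnames or [])
    if missing:
        # Fail loudly if a column we depend on has been renamed or removed.
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [{name: cast(row[name]) for name, cast in casts.items()} for row in reader]


# Hypothetical columns, for illustration only
CASTS = {"name": str, "visits": int}

original = "name,visits\nLibrary,12\n"
with_extra_column = "name,region,visits\nLibrary,North,12\n"

before = read_rows_by_name(original, CASTS)
after = read_rows_by_name(with_extra_column, CASTS)
```

Here `before` and `after` hold the same rows even though a `region` column was inserted in the middle. A positional reader casting the second cell to an integer would have tried to convert "North" and failed.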

I’ve always agreed with the publish early, publish raw idea for public data. CSVs are an ideal data transfer format: lightweight, easy to parse, easy to work with. My only caveats have always been that there should be unique URLs for the latest and archived versions of the data, and that the data should contain the correct descriptors, preferably URLs for the things they’re describing. I now have another caveat. It relates to the structure of the data and of the table in the CSV, but more than that it relates to how these things are created and curated. CSVs often come out of products such as Microsoft Excel. They’re created by humans for humans, often for making graphs. What we need is a way of describing templates and then validating the spreadsheets against those templates, almost as a form of schema validation, before they are published, with the template/schema published alongside the data.
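As a rough sketch of what validating against a template could look like, the Python below treats the template as an ordered list of column names with a type for each, and reports every place a CSV departs from it. The template format, the column names and the `validate_csv` function are all assumptions made for illustration; a real scheme would need a proper published schema language.

```python
import csv
import io

# Hypothetical template: expected column names, in order, with a type for each.
TEMPLATE = [("school_id", int), ("name", str), ("pupils", int)]


def validate_csv(csv_text, template):
    """Return a list of human-readable problems; an empty list means the CSV matches."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    expected = [name for name, _ in template]
    header = rows[0]
    if header != expected:
        # A changed header means every later check would be meaningless, so stop here.
        return [f"header is {header}, expected {expected}"]
    problems = []
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(template):
            problems.append(f"row {row_no}: {len(row)} cells, expected {len(template)}")
            continue
        for (name, cast), cell in zip(template, row):
            try:
                cast(cell)
            except ValueError:
                problems.append(f"row {row_no}: {name}={cell!r} is not a valid {cast.__name__}")
    return problems


good = "school_id,name,pupils\n1,Hilltop,240\n"
extra_column = "school_id,region,name,pupils\n1,North,Hilltop,240\n"
bad_cell = "school_id,name,pupils\n1,Hilltop,lots\n"
```

Run before publishing, a check like this would catch both the inserted column and a cell that cannot be cast, and the publisher could put the template next to the file so that consumers can run the same check.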

