Three libraries, a framework and an API - how ContentTagger got built in 7 hours / Mar 12th 2009

"You can build prototypes in the time it takes to have a meeting"

Simon Willison - Open Platform launch

This is a phrase that sticks in my mind from Tuesday, and I make no apologies for including it in 2 blog posts in a day. It has to be said that I never want to be in a 7-hour meeting (that was roughly how long ContentTagger took to build). However, it's still quite a quick turnaround, and it got some extra niceness from a couple of "watercooler" moments (although one of them may have added on a couple of hours).

I've been interested in tagging, and more specifically "controlled vocabulary tagging", since I was a scientist. We do it all the time internally. We have definitions for things and we associate them with other instances of those things or similar things. When we explain new concepts to people we often use points of reference which are shared. This is the idea behind ContentTagger, which is a sort of semi-semantic companion piece to Stamen's lovely ApiMaps. It looks at what our editors have tagged, and what people have already tagged on Delicious - if it's been bookmarked - and then gives you the opportunity to start wiring the item of content into the world of Freebase, saying whether something is about a person/place/thing or just mentions it.

This is it. Please have a play.


So, down to how it was made. The framework, and the hosting, is Google AppEngine. It's built using the basic AppEngine webapp framework, Django templates and the Python client library for OpenPlatform. I want at some point to move to using the Django stack on AppEngine, but a last-minute hack (started the night before the night before the launch) isn't the time to get all experimental. It obviously uses the Guardian OpenPlatform Content API. The search goes off to the Content API and pulls back 10 items matching the terms, or just the latest 10 if you add no terms. I'll add paging at some point soon, and a more explicit "I feel lucky" button. There's some caching in there too, but always within the bounds of the HTTP time-to-live headers from the API.
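The caching rule above - never hold a response longer than the API's own headers allow - can be sketched in plain Python. This is a minimal stand-in, not ContentTagger's actual code: the names and the dict-based store are mine, and in the real app this would sit behind AppEngine's memcache.

```python
import re
import time

def parse_max_age(cache_control):
    """Extract max-age (in seconds) from a Cache-Control header, or None."""
    m = re.search(r"max-age=(\d+)", cache_control or "")
    return int(m.group(1)) if m else None

class TTLCache:
    """Cache API responses only as long as the API's headers permit."""
    def __init__(self, clock=time.time):
        self._store = {}
        self._clock = clock  # injectable clock, handy for testing

    def put(self, key, value, cache_control):
        ttl = parse_max_age(cache_control)
        if ttl:  # only cache when the API says we may
            self._store[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > self._clock():
            return entry[0]
        self._store.pop(key, None)  # expired or absent
        return None
```

The point of the injectable clock is just testability; the behaviour that matters is that an entry silently disappears once the header-derived TTL runs out.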

Getting an item is easy. The Guardian provides back both an ID number, which is a useful foreign key for Guardian content, and the URL of the item. It's useful to have both for this hack, as we'll use the URL to hunt for stuff in Delicious. When I'm storing the tags created by users I've got fields for both of these, so when I generate the API I can return tags based either on Guardian ID or URL.
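The dual-key idea can be sketched like this - an in-memory stand-in for the datastore model, with every stored tag carrying both the Guardian ID and the URL so lookups work from either. The field and method names are illustrative, not the real model.

```python
class TagStore:
    """Toy tag store: each record is keyed by Guardian ID *and* URL."""
    def __init__(self):
        self._tags = []

    def add(self, content_id, url, tag):
        # Store both foreign keys with every tag.
        self._tags.append({"content_id": content_id, "url": url, "tag": tag})

    def by_id(self, content_id):
        return [t["tag"] for t in self._tags if t["content_id"] == content_id]

    def by_url(self, url):
        return [t["tag"] for t in self._tags if t["url"] == url]
```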

Getting the controlled vocabulary is always the hardest part of anything where you're trying to tag with context. This bit is nicely solved (along with wiring the Guardian to Freebase) by a lovely Freebase Suggest JavaScript library. This was the first "watercooler" moment: Simon and I were talking about the idea en route to lunch one day and he mentioned seeing this library. I can now associate items with Freebase tags (I'll be microformatting these very soon) and can also, in the future, return Guardian items with specific Freebase tags, or send back a list of tags for a specific item ID or Guardian URL (wiring back and forth). The API for ContentTagger is about half built at the moment; finishing it would have needed more coffee and less sleep. So at the end of evening/late-night 1 we had something that could go talk to the Content API and could allow controlled vocabulary tagging.
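The about/mentions distinction is the semantic core of the hack, and is easy to sketch. The dict shape and names below are my guesses at what a tag record might look like, not ContentTagger's actual storage format.

```python
# A tag relates a content item to a Freebase topic in one of two ways.
RELATIONS = ("about", "mentions")

def make_tag(topic_id, topic_name, relation):
    """Build a tag linking an item to a Freebase topic (hypothetical shape)."""
    if relation not in RELATIONS:
        raise ValueError("relation must be 'about' or 'mentions'")
    return {"topic_id": topic_id, "name": topic_name, "relation": relation}
```

Validating the relation up front keeps the vocabulary controlled in both senses: the topics come from Freebase, and the relationships come from a fixed list.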

Armed with a half finished ugly prototype and my trusty N95 I went back to Kings Place for the busy day before launch. I saw a photo opportunity with the lovely letters on poles in The Guardian lobby and decided that would make a good background and that I should really aim for something which looked a bit nicer (especially as Tom Carden had done such a lovely job on ApiMaps).


Then the second watercooler moment happened late in the day before the launch. I was chatting to Matt McAlister about progress, and he suggested that it would be great to somehow bring in Delicious tags too. Enter library number 3: Michael Noll's Delicious Python library. A couple of quick tweaks to the imports and one method made it play nicely with AppEngine's stack, and pulling in a couple of dependencies got it bringing in some data. Evening number two was spent cleaning up the UI and hooking it all together so that it looked and behaved coherently. I decided to hint at the ContentTagger API by making it pull in the list of assigned tags asynchronously, which was done very easily with jQuery.
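The server side of that asynchronous fetch can be sketched as a small handler: accept either a Guardian ID or a URL (echoing the dual-key storage described earlier) and hand back JSON for jQuery to render. The parameter names and response shape are assumptions, not the real half-built API.

```python
import json

def tags_endpoint(params, lookup_by_id, lookup_by_url):
    """Return a JSON body of tags for an item, keyed by ID or URL.

    `params` stands in for the request's query parameters; the two
    lookup callables stand in for datastore queries.
    """
    if "id" in params:
        tags = lookup_by_id(params["id"])
    elif "url" in params:
        tags = lookup_by_url(params["url"])
    else:
        return json.dumps({"error": "need id or url"})
    return json.dumps({"tags": tags})
```

On the page, a jQuery `$.getJSON` call against this endpoint would fill in the tag list after the main page has loaded, which is all the "hint at the API" amounts to.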

So at 2am (we have a lovely traffic graph to prove how late we were working on demos), after about 7 hours of work at most, ContentTagger was ready for its morning press call (during which, just to prove it was a live demo, it threw an unhandled 500 error - error handling; that's on the to-do list).

I'm going to clean up the code and UI and then maybe make it a bit more social: give some hints about how many tags have been created, and add a bit of a browse/search tags function. However, it proved a point nicely: with modern frameworks it is possible to create something of meaning at the prototype stage which weaves together the fabric of the internet, linking up content from disparate sources around a few specific foreign keys.

I believe we're just entering a golden age of the digital artisan, where makers of things (I really like Tom Armitage's description that he's a maker) can easily build elegant solutions to problems in a reasonable amount of time (and this is important in this climate, as it relates to money) which would, one or two years ago, have been hard to do and cost more. The more APIs and libraries there are, the more elaborate the things we can make. Tom Coates' vision of the Age of Point-at-Things is fast becoming the age of pointing at resources and linking them all together.

Today, David Cushman talked about the revolution which will come when we’re all coders. We’re not there yet, but it will be sooner than many think.

Powered by Tumblr