Searching for data is always frustrating. While updating my personal database on CPI inflation (or rather, extending it back into the past), I had to make my way through a jungle of different formats, time spans, frequencies and so on, all coming from different statistical bodies (by the way, of all the statistical agencies I checked out these days, the UK's Office for National Statistics is the weirdest! Just my humble opinion, of course).

When there were just a couple of gaps left, I turned to They've got a nice website and, more importantly, they've got the data I needed. Unfortunately, there was no easy way to export the data: one has to select part of the page, paste it somewhere and clean out rubbish like text and percent signs. It would take ages to rip all those scattered figures by hand … but wait, I've got the perfect toolkit for this kind of job. After five minutes of "googling" and fine-tuning the tools, my one-line monster script was ready. Here's what it looks like:

for i in {2006..2011}; do curl -s${i}.aspx | grep "nbsp;%" | cut -d"<" -f2 | cut -d">" -f2 | cut -d"&" -f1 | awk 'NR % 2 == 0' | sed 's/$/+100/' | bc -l; done
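Since the page URL isn't reproduced above, here's a sketch that runs the same grep/cut/awk chain on a small, made-up HTML fragment instead of the live pages, so each stage can be checked offline. The sample markup (one `<td>` cell per line, each figure followed by `&nbsp;%`, two cells per row) is an assumption about what the real pages look like, and the final `sed 's/$/+100/' | bc -l` step is swapped for an equivalent `awk` sum so the sketch needs nothing beyond POSIX tools. The function names `sample_html` and `extract_rates` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample: two percentage cells per table row.
# The real pages may be laid out differently.
sample_html() {
  cat <<'EOF'
<tr>
  <td>0.4&nbsp;%</td>
  <td>2.5&nbsp;%</td>
</tr>
<tr>
  <td>-0.1&nbsp;%</td>
  <td>-0.3&nbsp;%</td>
</tr>
<tr>
  <td>0.2&nbsp;%</td>
  <td>1.2&nbsp;%</td>
</tr>
EOF
}

# Same filtering as the one-liner, minus the curl download.
extract_rates() {
  grep "nbsp;%" |           # keep only lines holding a percentage figure
    cut -d"<" -f2 |         # "  <td>2.5&nbsp;%</td>" -> "td>2.5&nbsp;%"
    cut -d">" -f2 |         # -> "2.5&nbsp;%"
    cut -d"&" -f1 |         # -> "2.5"
    awk 'NR % 2 == 0' |     # keep every second match (here: the second cell of each row)
    awk '{ print $1 + 100 }'  # 2.5 -> 102.5; stand-in for sed 's/$/+100/' | bc -l
}

sample_html | extract_rates
```

Running it prints 102.5, 99.7 and 101.2, one per line: every second figure, shifted up by 100.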

This script loops over the pages for 2006 to 2011, picks out every second percentage figure on each page, strips the HTML around it, adds 100 to it (so 2.5 becomes 102.5) and prints the result. Of course, it could be improved further, say, by adding a date column or writing the data to a file, but it has already done its job and I probably won't need it again any time soon. I just couldn't resist bragging a bit about my script (and praising the Unix way, of course!).
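For the improvements mentioned above, here's one possible sketch: the same filter chain wrapped in a function that writes the year in front of each figure and collects everything in a CSV file. `fetch_page`, `page_to_csv` and `cpi.csv` are made-up names, and `fetch_page` only prints a fixed sample fragment (same assumed markup as the one-liner expects) because the page URL isn't reproduced here; you'd swap in the real curl call to use it for real.

```shell
#!/bin/sh
# Hypothetical stand-in for the download step: prints the same made-up
# fragment for any year. Replace its body with the curl call from the
# one-liner (using "$1" as the year) to fetch the real pages.
fetch_page() {
  printf '  <td>0.4&nbsp;%%</td>\n  <td>2.5&nbsp;%%</td>\n'
}

# Same filters as the one-liner, but the final awk also puts the year
# (passed in as $1) in front of each figure, giving "year,index" rows.
page_to_csv() {
  grep "nbsp;%" |
    cut -d"<" -f2 | cut -d">" -f2 | cut -d"&" -f1 |
    awk -v year="$1" 'NR % 2 == 0 { print year "," ($1 + 100) }'
}

# POSIX sh has no {2006..2011} brace expansion, so the years are listed.
for year in 2006 2007 2008 2009 2010 2011; do
  fetch_page "$year" | page_to_csv "$year"
done > cpi.csv   # hypothetical output file

cat cpi.csv
```

Passing the year into awk with `-v` avoids a second pass through `paste` or `sed`. A finer date column (a month, say) would depend on how the real pages lay out their rows.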
