How BBC iPlayer broke its URLs

I've become a big user of the BBC iPlayer service, even though I already have a pretty sophisticated PVR hard disk recorder sitting under my telly. The crucial difference between the two services is that with the BBC iPlayer, you don't have to remember to record a programme in advance. It also helps that I can (cough) download the iPhone versions to my iPod Touch to watch on the way to work (this isn't strictly supported as it circumvents the 7 day catchup window, even though I'm nearly always watching the programmes within a day or two of broadcast anyway).

One of the nice things about the BBC iPlayer service, aside from the ability to watch telly you've missed, is that each TV programme now has a nice URL that you can e-mail to friends (or bookmark on delicious if you're a geek). It might sound trivial, but before iPlayer, TV programmes were represented haphazardly on the BBC website, either getting a full-blown Flash microsite which quickly dated, or simply an entry in the 7-day listings which soon disappeared from the web, possibly re-appearing if the programme was repeated, and then disappearing again.

Some smart people at the BBC realised that providing a URL for each TV and radio programme was the first thing any broadcaster website should be built upon. Tom Coates wrote about it back in 2004 in The Age of Point-at-Things. This thinking was then first adopted in the Radio 3 website, which had URLs like http://www.bbc.co.uk/radio3/worldroutes/pip/9z4hw/, written about by Tom in Developing a URL structure for broadcast. The alphanumeric code for BBC programmes was invented, and christened the 'pip' (programme information page).

Step forward a few years, Tom leaves the BBC, and the BBC Programmes site (often called simply '/programmes') is launched, with a URL for every TV and radio programme broadcast (from launch onwards), in the form of http://www.bbc.co.uk/programmes/b00f2dfv. The alphanumeric ids are still there, only they're a few digits longer, and seem to have been internally renamed 'pids' (programme ids), for unknown reasons. Job done, right?

Not quite. Shortly after the /programmes site launches (in beta), the iPlayer service launches. Now, the streamable videos could have been located at the existing /programmes/pid URLs, but for reasons of branding or infrastructure or whatever, they end up at a new URL, in the form of http://www.bbc.co.uk/iplayer/episode/b00f2dfv. Note though that the same 'pid' is used. Having the two URLs perhaps isn't the end of the world: there's a page with the programme information (description, credits), and a page with the actual video. The distinction between the two pages has started to blur though, with description info available on the iPlayer page, and the video available on the /programmes page - so maybe it wasn't such a good idea after all.
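Since the pid is the one thing the two URL forms share, it's the only part worth treating as the identifier. Here's a rough Python sketch of pulling it out of either form. The eight-character pattern is my assumption from the example URLs in this post, not a documented BBC format, and extract_pid is just a name I've made up for illustration.

```python
import re

# Matches the /programmes and /iplayer/episode URL forms described above.
# The 8-character lowercase alphanumeric pid is an assumption based on the
# examples in this post, not a documented BBC format.
PID_URL = re.compile(
    r"^https?://www\.bbc\.co\.uk/(?:programmes|iplayer/episode)/([a-z0-9]{8})(?:/.*)?$"
)


def extract_pid(url):
    """Return the pid from a /programmes or /iplayer/episode URL, else None."""
    match = PID_URL.match(url)
    return match.group(1) if match else None
```

Both http://www.bbc.co.uk/programmes/b00f2dfv and http://www.bbc.co.uk/iplayer/episode/b00f2dfv give back the same pid, which is rather the point: two addresses, one programme.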

What's really annoyed me though, and the reason for this post, is that the iPlayer URLs themselves have now changed again. The Stephen Fry programme linked to above, for example, is actually now linked to at http://www.bbc.co.uk/iplayer/episode/b00f2dfv/Stephen_Fry_in_America_New_World/. Yup, it's gained an extra string of the programme name, with spaces replaced by underscores. Let's be clear: there is no good reason to do this at all.

Stuffing keywords into URLs seems to be fashionable at the moment (it's the default behaviour of software like WordPress), but the rationale is pretty flimsy. The reasons given are firstly that it improves search engine ranking and secondly that it makes users more likely to click on your URL when they see it in the search results. The first reason might be marginally true in some edge cases, but search engines have more than enough information to go on already - they can look at the HTML <title>, the <h1> tag and the anchor text of pages and sites linking to that page. In the case of a TV programme, these will nearly always contain the programme title anyway, so adding it to the URL in the hope that it'll boost your search ranking is just wishful thinking. The second reason has more justification, but again is pretty marginal, especially as users seeing the bbc.co.uk domain should already feel more than confident enough to click the link, without having to see the programme title in the URL.

It's bad enough changing your URLs arbitrarily with no good reason (see Cool URIs don't change), but it's even worse to not redirect (with a '301 Moved Permanently' status code) from your old URLs to your new ones, which is how the iPlayer site is behaving at the moment.

Even even worse is to set up your server to simply ignore the string in the last part of the URL, and to return the page that you would have got had it not been present. This means that I can craft fictitious URLs like http://www.bbc.co.uk/iplayer/episode/b00f2dfv/Stephen_Fry_is_our_saviour/, or http://www.bbc.co.uk/iplayer/episode/b00f085h/James_May_is_a_big_old_fool/ and have them still work, returning '200 OK' status codes and BBC content. I can put them in my blog, save them to delicious, and they may even show up in search engines. And the age of being able to point-at-things at a permanent URI is broken.
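For what it's worth, the fix isn't complicated. Here's a minimal Python sketch of how a server could decide what to return, assuming it can look up a single canonical title string for each pid. CANONICAL_SLUGS and resolve are hypothetical names of mine standing in for whatever the iPlayer database and routing really look like; I have no idea how the BBC's code is actually structured.

```python
# Hypothetical lookup table standing in for the iPlayer database:
# pid -> the one title string the site treats as canonical.
CANONICAL_SLUGS = {
    "b00f2dfv": "Stephen_Fry_in_America_New_World",
}


def resolve(path):
    """Return (status, location) for a request path under /iplayer/episode/."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 3 or parts[:2] != ["iplayer", "episode"]:
        return 404, None

    pid = parts[2]
    slug = CANONICAL_SLUGS.get(pid)
    if slug is None:
        # Unknown pid: there is nothing to point at.
        return 404, None

    canonical = f"/iplayer/episode/{pid}/{slug}/"
    if path == canonical:
        return 200, None

    # Old title-less URLs and made-up titles alike are sent, permanently,
    # to the single canonical address instead of being served as-is.
    return 301, canonical
```

With this, /iplayer/episode/b00f2dfv and /iplayer/episode/b00f2dfv/Stephen_Fry_is_our_saviour/ both get a 301 to the real URL, and only the real URL returns 200. The same logic works just as well if the canonical form is the title-less one, which is what I'd prefer.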

C'mon BBC, sort it out.

Update: as a prime example of why putting titles in URLs is such a bad idea, the URL for the most recent edition of Top Gear is http://www.bbc.co.uk/iplayer/episode/b00fm0xc/Top_Gear_Series_12_Episode_2_(new_series)/. Yup, someone at the BBC decided it'd be a good idea to add '(new series)' to the programme title in the iPlayer database, presumably because old series get repeated so often that it's hard for users to tell what's new or not (hint: it should be possible for the website to work this out from the data programmatically, and then display a 'NEW' icon). So now the URL (which is meant to be 'permanent' & 'stable') has 'new' in it. Which will be accurate for, oh, about 6 months?
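To spell out the hint: if the database records when an episode was first broadcast, 'new' can be computed at display time rather than typed into the title. A rough Python sketch, where the first_broadcast field, the is_new name and the 7-day window are all my assumptions rather than anything I know about the BBC's data:

```python
from datetime import date, timedelta


def is_new(first_broadcast, today, window_days=7):
    """True if the episode's first broadcast falls within the last window_days.

    first_broadcast is a hypothetical field; the 7-day default simply mirrors
    the catch-up window and is an assumption, not a BBC rule.
    """
    age = today - first_broadcast
    return timedelta(0) <= age <= timedelta(days=window_days)
```

The page template can then show the 'NEW' icon whenever is_new(...) is true, and the icon quietly goes away when it stops being true, without the title or the URL ever changing.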