I’ve become a big user of the BBC iPlayer service, even though I already have a pretty sophisticated PVR hard disk recorder sitting under my telly. The crucial difference between the two services is that with the BBC iPlayer, you don’t have to remember to record a programme in advance. It also helps that I can (cough) download the iPhone versions to my iPod Touch to watch on the way to work (this isn’t strictly supported as it circumvents the 7 day catchup window, even though I’m nearly always watching the programmes within a day or two of broadcast anyway).
One of the nice things about the BBC iPlayer service, aside from the ability to watch telly you’ve missed, is that each TV programme now has a nice URL that you can e-mail to friends (or bookmark on delicious if you’re a geek). It might sound trivial, but before iPlayer, TV programmes were represented haphazardly on the BBC website, either getting a full-blown Flash microsite which quickly dated, or simply an entry in the 7 day listings which soon then disappeared from the web, possibly re-appearing if the programme was repeated, and then disappearing again.
Some smart people at the BBC realised that providing a URL for each TV and radio programme was the first thing any broadcaster website should be built upon. Tom Coates wrote about it back in 2004 in The Age of Point-at-Things. This thinking was then first adopted in the Radio 3 website, which had URLs like http://www.bbc.co.uk/radio3/worldroutes/pip/9z4hw/, written about by Tom in Developing a URL structure for broadcast. The alphanumeric code for BBC programmes was invented, and christened the ‘pip’ (programme information page).
Step forward a few years, Tom leaves the BBC, and the BBC Programmes site (often called simply ‘/programmes’) is launched, with a URL for every TV and radio programme broadcast (from launch onwards), in the form of http://www.bbc.co.uk/programmes/b00f2dfv. The alphanumeric ids are still there, only they’re a few digits longer, and seem to have been internally renamed ‘pids’ (programme ids), for unknown reasons. Job done, right?
Not quite. Shortly after the /programmes site launches (in beta), the iPlayer service launches. Now, the streamable videos could have been located at the existing /programmes/pid URLs, but for reasons of branding or infrastructure or whatever, they end up at a new URL, in the form of http://www.bbc.co.uk/iplayer/episode/b00f2dfv. Note though that the same ‘pid’ is used. Having the two URLs perhaps isn’t the end of the world: there’s a page with the programme information (description, credits), and a page with the actual video. The distinction between the two pages has started to blur though, with description info available on the iPlayer page, and the video available on the /programmes page – so maybe it wasn’t such a good idea after all.
What’s really annoyed me though, and the reason for this post, is that the iPlayer URLs themselves have now changed again. The Stephen Fry programme linked to above, for example, is actually now linked to at http://www.bbc.co.uk/iplayer/episode/b00f2dfv/Stephen_Fry_in_America_New_World/. Yup, it’s gained an extra string of the programme name, with spaces replaced by underscores. Let’s be clear: there is no good reason to do this at all.
Stuffing keywords into URLs seems to be fashionable at the moment (it’s the default behaviour of software like WordPress), but the rationale is pretty flimsy. The reasons given are firstly that it improves search engine ranking and secondly that it makes users more likely to click on your URL when they see it in the search results. The first reason might be marginally true in some edge cases, but search engines have more than enough information to go on already – they can look at the HTML <title>, the <h1> tag and the anchor text of pages and sites linking to that page. In the case of a TV programme, these will nearly always contain the programme title anyway, so adding it to the URL in the hope that it’ll boost your search ranking is just wishful thinking. The second reason has more justification, but again is pretty marginal, especially as users seeing the bbc.co.uk domain should already feel more than confident enough to click the link, without having to see the programme title in the URL.
It’s bad enough changing your URLs arbitrarily with no good reason (see Cool URIs don’t change), but it’s even worse to not redirect (with a ‘301 Moved Permanently’ status code) from your old URLs to your new ones, which is how the iPlayer site is behaving at the moment.
Even even worse is to set up your server to simply ignore the string in the last part of the URL, and to return the page that you would have got had it not been present. This means that I can craft fictitious URLs like http://www.bbc.co.uk/iplayer/episode/b00f2dfv/Stephen_Fry_is_our_saviour/, or http://www.bbc.co.uk/iplayer/episode/b00f085h/James_May_is_a_big_old_fool/ and have them still work, returning ‘200 OK’ status codes and BBC content. I can put them in my blog, save them to delicious, and they may even show up in search engines. And the age of being able to point-at-things at a permanent URI is broken.
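To make the alternative concrete, here is a minimal sketch in Python of how a server could treat the title as decoration while still having exactly one URL per episode. It is not how the iPlayer site works: the `TITLES` table, the `resolve` function and the choice to redirect (rather than 404) on a wrong title are all assumptions made for illustration.

```python
import re

# Hypothetical lookup table standing in for the programme database.
TITLES = {
    "b00f2dfv": "Stephen_Fry_in_America_New_World",
}

# /iplayer/episode/<pid>, optionally followed by /<title> and a trailing slash.
EPISODE_PATH = re.compile(
    r"^/iplayer/episode/(?P<pid>[a-z0-9]+)(?:/(?P<slug>[^/]*))?/?$"
)


def resolve(path, titles=TITLES):
    """Return (status, location) for an episode path.

    location is the canonical path when the status is 301, otherwise None.
    """
    match = EPISODE_PATH.match(path)
    if match is None or match.group("pid") not in titles:
        return 404, None
    pid = match.group("pid")
    canonical = f"/iplayer/episode/{pid}/{titles[pid]}/"
    if path == canonical:
        return 200, None
    # Missing or made-up title: redirect instead of serving a duplicate page.
    return 301, canonical
```

With this, `resolve("/iplayer/episode/b00f2dfv/Stephen_Fry_is_our_saviour/")` gives a 301 pointing at the real title, and only the canonical path ever returns 200. Returning 404 for a wrong title would be just as defensible; the point is that a made-up URL should never come back as ‘200 OK’.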
C’mon BBC, sort it out.
Update: as a prime example of why putting titles in URLs is such a bad idea, the URL for the most recent edition of Top Gear is http://www.bbc.co.uk/iplayer/episode/b00fm0xc/Top_Gear_Series_12_Episode_2_(new_series)/. Yup, someone at the BBC decided it’d be a good idea to add ‘(new series)’ to the programme title in the iPlayer database, presumably because old series get repeated so often that it’s hard for users to tell what’s new or not (hint: it should be possible for the website to work this out from the data programmatically, and then display a ‘NEW’ icon). So now the URL (which is meant to be ‘permanent’ & ‘stable’) has ‘new’ in it. Which will be accurate for, oh, about 6 months?
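As a rough illustration of that hint, the ‘NEW’ flag could be derived at display time from the broadcast date rather than stored in the title. This is only a sketch: the `is_new` function, the idea that a first-broadcast date is available, and the seven-day window are assumptions, not anything taken from the BBC’s data model.

```python
from datetime import date, timedelta


def is_new(first_broadcast, today, window=timedelta(days=7)):
    """True while an episode is within `window` of its first broadcast."""
    return timedelta(0) <= today - first_broadcast <= window


# Hypothetical dates, purely for illustration.
print(is_new(date(2008, 11, 9), date(2008, 11, 12)))  # within the window
print(is_new(date(2008, 11, 9), date(2009, 5, 9)))    # six months later
```

The title and the URL stay the same forever, and the icon simply stops appearing once the window has passed.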
Jo Saull said:
All good points. I like the use of the double ‘even’ expression. Are you going to create some more silly URLs just to prove the point?
PS – my dwell time on the blog has increased by 50% since the ‘down with brown’ policy.
Jason Cartwright said:
Oh dear. Well, I’m a fan of strings in URLs like this, but this “go on, just tack the title on the end, no need to validate it” approach is asking for trouble.
Also, they should have used hyphens instead of underscores – http://www.mattcutts.com/blog/dashes-vs-underscores/
Frankie Roberto said:
@Jason Actually, it seems that Google understands underscores as word separators now: http://news.cnet.com/8301-10784_3-9748779-7.html
Jason Cartwright said:
Not quite what he said apparently…
http://www.mattcutts.com/blog/whitehat-seo-tips-for-bloggers/
“If you read Stephan Spencer’s write-up, he says some people thought that underscores are the same as dashes to Google now, and I didn’t quite say that in the talk. I said that we had someone looking at that now. So I wouldn’t consider it a completely done deal at this point. But note that I also said if you’d already made your site with underscores, it probably wasn’t worth trying to migrate all your urls over to dashes. If you’re starting fresh, I’d still pick dashes.”
Hard to keep up with all of this :-)
Frankie Roberto said:
Good point. The reason I prefer underscores to dashes as a replacement for spaces in URLs (which otherwise have to be encoded as %20) is that hyphens are a commonly-used character within written English, which can convey meaning, whereas underscores generally aren’t.
For example, if you have the title “I’ve just seen a man-eating shark!”, converting spaces to underscores preserves the meaning (ive_just_seen_a_man-eating_shark) but converting spaces to hyphens renders it ambiguous (ive-just-seen-a-man-eating-shark).
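The conversion described above can be sketched in a few lines of Python. The `slugify` function here is hypothetical (it isn’t taken from WordPress or the BBC), and it only handles plain ASCII letters and digits:

```python
import re


def slugify(title, separator="_"):
    """Lower-case a title and replace spaces with `separator`."""
    # Drop apostrophes (straight and curly) so "I've" becomes "ive".
    text = re.sub(r"['’]", "", title.lower())
    # Remove anything that is not a letter, digit, space or hyphen.
    text = re.sub(r"[^a-z0-9 -]", "", text)
    # Collapse runs of spaces into the chosen separator.
    return re.sub(r" +", separator, text.strip())


title = "I've just seen a man-eating shark!"
print(slugify(title))        # ive_just_seen_a_man-eating_shark
print(slugify(title, "-"))   # ive-just-seen-a-man-eating-shark
```

In the second output there’s no way to tell which hyphens were originally spaces, which is the ambiguity I mean.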
Freddy said:
It looks like they’ve fixed the redirect:
GET -Sd http://www.bbc.co.uk/iplayer/episode/b00fky7g/
GET http://www.bbc.co.uk/iplayer/episode/b00fky7g/ --> 301 Moved Permanently
GET http://www.bbc.co.uk/iplayer/episode/b00fky7g/The_Barristers_Episode_2/ --> 200 OK
But not the actual content of the title …
Frankie Roberto said:
Yep, they’re 301 redirecting from the old URLs to the new ones with the title appended now.
They’re still not actually checking that the appended title is the right one though. http://www.bbc.co.uk/iplayer/episode/b00fky7g/The_Barristers_are_idiots/ returns 200 OK…
Dan said:
Hi,
Even more interesting is the way that the iPlayer is providing direct links for downloading their content for portable devices. For instance:-
http://directdl.iplayer.bbc.co.uk/windowsmedia/Survivors_Episode1_200811232100_mobile.wmv
http://directdl.iplayer.bbc.co.uk/windowsmedia/Survivors_Episode2_200811252100_mobile.wmv
Generally the format is:-
Programme – Episode – Year – Month – Day – Time
Although I have noticed that this does have very slight variations sometimes.
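For what it’s worth, that naming pattern can be pulled apart with a regular expression. This is a sketch based only on the two example filenames above; the `parse_filename` function and the exact pattern are my assumptions, and names with the variations I mentioned will simply fail to match.

```python
import re
from datetime import datetime

# Programme_Episode_YYYYMMDDHHMM_mobile.wmv, inferred from the two examples.
FILENAME = re.compile(
    r"^(?P<programme>.+)_(?P<episode>[^_]+)_(?P<stamp>\d{12})_mobile\.wmv$"
)


def parse_filename(name):
    """Return (programme, episode, broadcast time), or None if no match."""
    match = FILENAME.match(name)
    if match is None:
        return None
    stamp = datetime.strptime(match.group("stamp"), "%Y%m%d%H%M")
    return match.group("programme"), match.group("episode"), stamp


print(parse_filename("Survivors_Episode1_200811232100_mobile.wmv"))
```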
Even more interesting is that these links for direct downloads don’t have any GeoIP restrictions. Similar to accessing content through the Download Manager – once you’ve spoofed the GeoIP restrictions by accessing the main programme page on the iPlayer website – you aren’t stopped from downloading from another country.
I wonder how long it will take them to find a way to stop this.
Out of interest, do you know how they are generating those codes within the URL, e.g. b00ftcqn, b00drmbs, b0079t3d? They mostly seem to be prefixed with ‘b00’.
Frankie Roberto said:
Hi Dan. I don’t know about the GeoIP filtering, as I’m in the UK and can access the site legitimately. The wmv files are DRM encrypted though aren’t they? I tried one out on my new Nokia N96, and it played fine, but the quality wasn’t nearly as good as the iPhone/iPod Touch version.
As for the PID (Programme ID codes), I think they are going through them fairly sequentially, but they use a combination of numbers and consonants (no vowels, to avoid accidentally spelling out rude words, although b00bxxxx got through!).
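If you wanted to check whether a string looks like a PID, the ‘digits and consonants’ rule could be sketched like this. The alphabet and the eight-character length are guesses based on the examples in this post (with ‘y’ counted as a consonant, since b00fky7g exists), not on any BBC documentation, and `looks_like_pid` is a made-up name.

```python
import re

# Assumed alphabet: digits plus lower-case consonants, no a/e/i/o/u.
# The length of 8 is inferred from the examples here, not from a spec.
PID_PATTERN = re.compile(r"[0-9b-df-hj-np-tv-z]{8}")


def looks_like_pid(candidate):
    """True if `candidate` matches the assumed PID shape."""
    return PID_PATTERN.fullmatch(candidate) is not None


for pid in ["b00f2dfv", "b00fky7g", "b0079t3d", "baadf00d"]:
    print(pid, looks_like_pid(pid))
```

The last one fails because of the vowels. It only checks the shape, of course; it can’t tell you whether a programme with that PID actually exists.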
Dan said:
Hi Frankie,
Who is to say I’m not accessing them legitimately!?! :-) Although I would admit that onion-routing is a fantastic technology!
Yes, I’m very unhappy about the DRM. Very silly. And means that simple things like fast forward, and rewind don’t work. There are some very good methods for removing the DRM if you look around. And personally I’m not interested in distributing the content, that would be naughty! But I don’t like Windoze Media Player, it has restrictions on the length of time you can keep a file for (what say if I forget to watch it!), you can’t fast forward/rewind (have we slipped into the dark ages?), and it won’t work on a Linux box, which I’m using currently. So for the moment, I’ll stick to doing everything I can to break GeoIP and DRM. I wish the BBC would see what power they could seize in using open source software, instead of pitting themselves against something which was designed to break conventions…the internet.
*rant over*
D
Anonymous said:
http://www.google.com/search?q=james+may+is+a+big+old+fool
Frankie Roberto said:
@Anonymous Heh.
TheKnight said:
I know this may be old hat now but has anyone tried taking the “_mobile” bit out of the previous
http://directdl.iplayer.bbc.co.uk/windowsmedia/Survivors_Episode1_200811232100_mobile.wmv
?
Cause wonderful things may happen…