41 mph? The Evidence Against the Sunday Times Article

Yesterday The Sunday Times (UK) ran an article on page 5 of the main section, entitled “40mph city cyclists defy speed limits” in print and “City cyclists turn roads into racetracks” on the website, written by Nicholas Hellen and Georgia Graham. It repeatedly refers to a segment in London where the average speed of the fastest riders is 41 mph. Georgia contacted me last Thursday, and I spoke at length to Nicholas about Strava, how it works, and in particular why you can’t trust the timings (and hence speeds) on short segments. But from the beginning of the conversation it was very clear what their angle was going to be: they basically wanted me (or someone) on record saying that Strava encourages me to break the law in built-up areas by speeding (which, as I pointed out to him, isn’t actually breaking the law) and jumping red lights. It doesn’t and I don’t. In this post I’ll do my best to explain why that 41 mph should actually be more like 31 mph.

The Sunday Times Article

 

The key parts of the article (after the headline) are as follows:

Cyclists are racing around inner-city streets at speeds of up to 41mph

WHO does that cyclist hurtling past at 41mph think he is? Bradley Wiggins?

The record speed for one crowded section of London’s South Circular Road, which mostly has a 30mph speed limit, is 41mph. On a nearby road, nicknamed “Gunning it! On Armoury Way” by competitors, a rider has clocked a time of 33.3mph.

Two riders, identified as Tris M and George B, are recorded as averaging 41mph on a short section of the South Circular near Barnes. The only way of displacing them is by again breaking the speed limit.

The Sunday Times tested three routes in central London, each of them ridden more than 20,000 times by Strava users, to establish whether it was possible to match cyclists’ times without running red lights or breaching the Highway Code.

In each case, a motorbike, travelling at the 30mph speed limit, clocked slower times than those recorded by the cycling kings and queens, as well as cyclists much further down the leaderboard on each route.

Now that I’ve had the opportunity to read the article (unfortunately it is behind a paywall, but I’ve posted the free section of it on my Facebook page), it is clear how this story came into being. The first quarter of the article and the shock-tactic headline focus on a segment in London whose KOM has an average speed of 41 mph, which supposedly encourages every other Strava user to break the speed limit and jump red lights in order to challenge that time. Two others and I are quoted saying how useful we find Strava for training, enjoyment and motivation, which goes some way to redress the balance, but by then the damage has been done and readers have already been misled.

To put things into perspective, Chris Hoy can reach a top speed of 48.5 mph (78 km/h), and that is with thighs the size of most people’s waists, in a climate-controlled velodrome built to maximise speed (kept hot and humid to reduce air resistance). The idea that a commuter can average anywhere close to 40 mph over a few hundred metres should be enough to make people question the numbers.

The Strava segment in question is called “upper rich road” and its current leaderboard is:

[Image: upper rich road Strava leaderboard]

Up until a few moments ago there was another rider in 1st position with a time of 6 seconds and an average speed of 108.4 mph (174.4 km/h). We can forget about this rider’s data because it is so clearly wrong that Nicholas actually asked me how a rider could reach such a high speed. I suppose we should be pleased that he didn’t use this speed in the article’s opening paragraph, but he does appear to have assumed that all the other data must be correct, even having just seen for himself how wildly inaccurate it can be.

Timings (and therefore average speeds) on segments can be inaccurate for three main reasons:

  1. GPS devices playing catch-up – the most common cause of the fast times on this segment. Whether because of riding between tall buildings or trees, or just poor-quality recording equipment, the device will often lose track of where it is and then find it again, causing the recorded track to jump unfeasibly quickly to the new location.
  2. Ride not actually covering the segment – this is most apparent with segments that form a loop. Say a segment covers 30 km but its start and finish points are only a few metres apart: in the past, riders crossing those two points within just a few seconds were awarded the segment, often with ludicrously fast average speeds. Strava have rectified this and now ensure that the ride covers at least 75% of the segment’s route. It isn’t foolproof, but they have to allow for genuine GPS drift so that people genuinely riding the segment are awarded it. Unfortunately I believe the historical data still stands on segment leaderboards. I would imagine this could be rectified by flagging and recreating the segment concerned.
  3. The start and finish points of a rider’s effort on a segment are taken from points recorded by their GPS device rather than from the start and finish points of the segment itself. As a result, different riders can record different times and speeds for the same segment even if they rode next to each other throughout. I have previously written an in-depth explanation of this which you can read at your leisure, but basically the shorter (< 1 km) and straighter the segment, the more this can distort the leaderboard.
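Point 3 is easy to reproduce with a toy model. The Python sketch below assumes a dead straight segment, two riders holding exactly 30 mph side by side, a device that logs a fix every 5 seconds, and a matching rule that snaps each segment endpoint to the nearest recorded fix. That rule is an assumption for illustration rather than Strava’s published algorithm, and `simulated_fixes()` and `segment_speed_mph()` are hypothetical helpers. Running it prints `rider A: 26.0 mph, rider B: 32.5 mph`: identical riding, a 6.5 mph spread on the leaderboard.

```python
MPH_PER_MPS = 2.2369362920544  # 1 m/s expressed in mph
SEGMENT_M = 290.68              # length of the "upper rich road" segment


def simulated_fixes(speed_mps, interval_s, phase_s, lead_in_m=100.0, duration_s=60.0):
    """Hypothetical GPS log for a rider on a straight road at constant speed.

    Returns (time_s, position_m) pairs; position 0 is the segment start and the
    rider sets off lead_in_m before it. The device logs a fix every interval_s
    seconds, the first one phase_s seconds after the rider sets off.
    """
    count = int((duration_s - phase_s) // interval_s) + 1
    times = [phase_s + k * interval_s for k in range(count)]
    return [(t, -lead_in_m + speed_mps * t) for t in times]


def nearest_fix(track, position_m):
    """Assumed matching rule: the recorded fix closest to a segment endpoint."""
    return min(track, key=lambda fix: abs(fix[1] - position_m))


def segment_speed_mph(track, segment_m=SEGMENT_M):
    """Segment length divided by the time between the two snapped fixes."""
    t_start, _ = nearest_fix(track, 0.0)
    t_end, _ = nearest_fix(track, segment_m)
    return segment_m / (t_end - t_start) * MPH_PER_MPS


if __name__ == "__main__":
    speed = 30 / MPH_PER_MPS  # both riders hold exactly 30 mph
    rider_a = segment_speed_mph(simulated_fixes(speed, interval_s=5.0, phase_s=0.0))
    rider_b = segment_speed_mph(simulated_fixes(speed, interval_s=5.0, phase_s=2.5))
    print(f"rider A: {rider_a:.1f} mph, rider B: {rider_b:.1f} mph")
```

Rider A’s snapped points sit just outside the segment, so the clock runs long; rider B’s sit just inside, so it runs short. The shorter the segment, the bigger that fraction of the total time.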

So here is the map of the segment:

[Image: upper rich road Strava map]

Seeing as this is a very short segment at only 290.68 m and is dead straight, let’s plug the segment ID into my Alternative Leaderboard and see what comes out:

[Image: upper rich road VeloViewer Alternative Leaderboard]

First off, the rider 2nd in the list, Tris M, is showing up as NaN. This is because, as far as he is concerned, this segment is not “popular”: maybe he has hidden it from his list himself, or Strava have hidden it for other reasons. Chances are he doesn’t know it exists and so would never have knowingly tried to get a time on it.

The table shows the following details:

  • Time Pos – this is the position shown in Strava’s own leaderboard. It is purely based on the time taken between the matched start and end points of the rider’s GPS trace.
  • Time – the number of seconds taken.
  • Speed Pos – the position based on the Actual Speed of the rider calculated using the Actual Distance they covered.
  • Actual Speed – the average speed the rider travelled over the Actual Distance they covered.
  • Seg Speed – this is the average speed shown by Strava. It is calculated using the Time and the distance of the original segment.
  • Actual Distance – because of the way Strava match a rider’s start and finish positions for a segment, this can differ greatly from the length of the segment. The positions are restricted to the points recorded by the rider’s GPS device; at present they are not interpolated to the points where the rider actually crossed the segment’s start and end.

It is clear from the list that for those top 10 riders the actual distance travelled varies massively (from 183.6 m right up to 293.7 m), with only a single rider having covered the entire 290 m of the segment.
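The two speed columns in the table differ only in which distance is divided by the same time. A minimal Python sketch of both calculations, following the column definitions above; the function names are hypothetical, not VeloViewer’s code. The 6-second effort reproduces the 108.4 mph (174.4 km/h) quoted earlier, while the 16 s / 200 m effort is made up purely for illustration and prints `Seg Speed 40.6 mph, Actual Speed 28.0 mph`.

```python
MPH_PER_MPS = 2.2369362920544  # 1 m/s in mph
KMH_PER_MPS = 3.6              # 1 m/s in km/h
SEGMENT_M = 290.68             # the segment's own (nominal) length


def seg_speed_mps(time_s, segment_m=SEGMENT_M):
    """'Seg Speed': the segment's nominal length over the matched time."""
    return segment_m / time_s


def actual_speed_mps(time_s, actual_distance_m):
    """'Actual Speed': distance really covered between the matched points, same time."""
    return actual_distance_m / time_s


# The 6-second effort: the full 290.68 m credited in 6 s.
print(f"{seg_speed_mps(6) * MPH_PER_MPS:.1f} mph ({seg_speed_mps(6) * KMH_PER_MPS:.1f} km/h)")

# A hypothetical effort: 16 s between matched points only 200 m apart.
print(f"Seg Speed {seg_speed_mps(16) * MPH_PER_MPS:.1f} mph, "
      f"Actual Speed {actual_speed_mps(16, 200) * MPH_PER_MPS:.1f} mph")
```

Whenever the matched points fall inside the segment, Seg Speed credits the rider with distance they were never timed over, which is exactly how short Actual Distances turn into inflated leaderboard speeds.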

Numbers can be tricky for lots of people to visualise, so here are some pictures that illustrate it pretty well. First up, a rider whose data has been matched pretty well (the red line is the route of their ride and the blue line is the section of their ride matched to the segment; this rider passed along this stretch of road in both directions on this ride):

[Image: upper rich road rider map, well-matched effort]

The start and end points match up almost exactly with that of the segment and his distance is recorded at 308.5 m, just a little over the segment itself.

Now, a rider whose data matching isn’t quite so great:

[Image: upper rich road rider map, poorly matched effort]

This rider’s distance is considerably shorter. It looks like the GPS device is struggling and only locks onto a position midway along the segment.

And finally one that hasn’t really worked at all (but is still matched):

[Image: upper rich road rider map, failed match (George B)]

A very confused GPS device. This GPS trace is actually from our KOM, George B. No wonder he got such a fast time, although I can’t explain how he covered 275.6 m in the process!

Ordering the Alternative Leaderboard by Speed Pos (so we no longer care how far they travelled, just how fast they were going), we get the following:

[Image: upper rich road VeloViewer Alternative Leaderboard ordered by Actual Speed]

 

We know we can strike out George B due to his very poor GPS trace, but we can also strike out James S and Mark E for similar reasons. This leaves us with a new King of the Mountain: James B. Well done James!

The actual KOM’s average speed? 31.9 mph (51.33 km/h)

Not such a shock headline now is it?

Just to see how much better Strava’s segment matching is now compared to the past, I created a near duplicate of this segment. Although this cleared a number of the spurious rides from the list, George B still sits in the KOM position. Putting it into the Alternative Leaderboard only requires you to ignore George, and you once again get the true KOM: James B.

Don’t blame Strava

This post might well seem to be pointing the finger at Strava, but it certainly isn’t. Strava can only do so much with the data it is given, and as you have seen from the images above, that data can often be terrible, yet people still want all their segments matched! Strava introduced the 75% matching rule, so hopefully that will remove a large number of the spurious rides from the leaderboards, but in my opinion they also need to interpolate those start and finish points and retrospectively apply that to all their data, or more articles of this type will inevitably appear. It would be a huge data-processing task, and lots of people would lose their KOMs in the process, but I feel it is a necessary pain.
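Interpolating the start and finish points means estimating when the rider actually crossed each line from the recorded fixes either side of it. A minimal Python sketch, assuming a straight segment with the track already projected onto it (positions in metres along the segment) and plain linear interpolation between fixes; `crossing_time()`, `interpolated_speed_mph()` and `fixes_every()` are hypothetical names. With one fix every 5 seconds, a rider holding a constant 30 mph comes out at 30.0 mph whatever moment the device happened to log.

```python
MPH_PER_MPS = 2.2369362920544  # 1 m/s in mph


def crossing_time(track, position_m):
    """Linearly interpolate the moment the rider passed position_m.

    track: (time_s, position_m) fixes, already projected onto a straight
    segment so that position increases as the rider moves along it.
    """
    for (t0, x0), (t1, x1) in zip(track, track[1:]):
        if x0 <= position_m <= x1 and x1 > x0:
            return t0 + (t1 - t0) * (position_m - x0) / (x1 - x0)
    raise ValueError(f"track never passes {position_m} m")


def interpolated_speed_mph(track, segment_m):
    """Segment length divided by the interpolated start-to-finish time."""
    elapsed = crossing_time(track, segment_m) - crossing_time(track, 0.0)
    return segment_m / elapsed * MPH_PER_MPS


def fixes_every(interval_s, phase_s, speed_mps, lead_in_m=100.0, duration_s=60.0):
    """Hypothetical log: a rider at constant speed, one fix every interval_s seconds."""
    count = int((duration_s - phase_s) // interval_s) + 1
    times = [phase_s + k * interval_s for k in range(count)]
    return [(t, -lead_in_m + speed_mps * t) for t in times]


if __name__ == "__main__":
    speed = 30 / MPH_PER_MPS
    for phase in (0.0, 2.5):
        track = fixes_every(5.0, phase, speed)
        print(f"phase {phase} s: {interpolated_speed_mph(track, 290.68):.1f} mph")
```

Linear interpolation is only exact for a rider at constant speed; a rider accelerating between two sparse fixes will still be slightly misjudged, but the error no longer depends on where the device happened to drop its points.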

As for whether certain segments should exist in the first place I’ll leave that for another day. If people think they are dangerous then flag them, that is what that button is for.

Update: An interesting point raised with me by Mr Hellen on 12 February (after the article was published) is whether Strava needs to make it more obvious to users that the times and speeds on their leaderboards (and hence the placings) can be subject to the errors I’ve described in this article. A new user coming to the system might well take these speeds at face value.

Conclusion

If you’ve made it this far, then well done for enduring my logic- and evidence-based rant. Ever since discovering Ben Goldacre’s “Bad Science” column in the Guardian a few years back, I can’t bring myself to read or believe much “news” any more without the niggling, or sometimes blindingly obvious, doubt that the journalists involved either haven’t done their research properly or are presenting the statistics to shock rather than educate. This article in The Sunday Times is no exception. If I hadn’t been interviewed for it, or if it had been hidden away in a supplement somewhere, I might have let it lie, but seeing as Nicholas specifically chose to ignore a number of my points about the reliability of the data and go for the shock headline instead, I’ve had to make a point of putting this together.

I don’t doubt that some people attempt to improve their Strava placings on their commutes to and from work, and some of them probably jump red lights in doing so, but I’d hazard a guess that they would jump those red lights even if they weren’t recording their ride. If you stopped every rider who jumped a red light and asked whether they were recording their ride on Strava and had jumped the light specifically to improve their time on a segment, then (even if none of them lied) your “yes” percentage would be near zero.

This particular segment is nothing remarkable, and regular Strava users will probably know that without their GPS devices recording incorrectly in their favour they will never make it into the top 10 of the leaderboard. That Strava can’t show a completely accurate leaderboard is down to the bad data it is given, but it has a damn good try. This only became a problem when The Sunday Times (a rather large and influential newspaper, and home to the one and only David Walsh) decided to take the data at face value as the basis for a headline and article clearly designed to further aggravate the relationship between motorists and cyclists.

 

Thoughts on “41 mph? The Evidence Against the Sunday Times Article”

    • Ha, editing a GPX file is easily done, for good or bad, and I wondered if such a site existed! We need some kind of GPX anti-doping testing going on. I reckon that could be automated if you really wanted to, as well. SADA (Strava Anti-Doping Agency) might be appropriate?

  • Great read for a data junkie like myself. And another example of how you can never draw any conclusions from a data set without also examining how that data was acquired. That goes for any data, not just cycling-nerd data.

  • A well considered piece. The press aren’t really into the truth, they’re solely concerned with shifting units. A sad state of affairs.

    Just to throw it into the ring, I have also heard of cyclists who are alleged to have ‘cheated’ (count me out of this one – I don’t use Strava) by recording sections of climbs on hills in South Wales whilst driving a car. How petty is that?

    • Those people who choose to drive up those hills in South Wales share the same mentality as those who choose to dope in pro cycling. PERIOD!

  • Personally I would automatically discount pretty much any story in the first 16 pages of the Sunday Times, particularly anything with Nicholas Hellen’s name on it.

    So many of these stories are based on a nugget of truth that is then spun for the sake of creating a furore.

    I used to work on the news section of the Sunday Telegraph and we would always laugh about how few of the Sunday Times stories we could follow up because they were so full of holes.

  • It will be obvious to any cyclist that the claim is bonkers. Sadly, the target audience are not only not cyclists, they are actively looking for ways not to become cyclists, to avoid addressing insecurities about fitness and health, and environmental impact of driving.

    I did once exceed 50mph on a bike: a recumbent, going down a long hill with a good surface on a very clear day. Unusual conditions. 40mph on a bike is terrifying with current road surface conditions, and pretty much unachievable anywhere in London anyway due to traffic, road layout and pedestrians.

  • 41mph? Hmmm, I’d like to think I could do it…. but (insert clip of the train driver discussion in BTTF3) – I’d need to get some straight and level track…. (etc)

  • It strikes me they’re missing out on a whole shock article about sat navs causing drivers to speed in order to beat the journey times =D

    Not sure why Strava has leaderboards if the data is so bad; it looks like it’s only really applicable to yourself and your own equipment.

    Once again I’m reminded why I ignore almost everything in the mainstream media these days. Haven’t bought a newspaper in years.

    • There’s just a margin of error based on the 3 factors mentioned in the article but as the distance of a segment increases the percentage effect of this error drops rapidly. Still potentially a few seconds out either way but if that is over 9 minutes then that is unlikely to change the leaderboard significantly. Personally I ignore any segment under 1km, unless I’m the KOM of course ;-).
      Also, if a segment has any real meaning, say the Holme Moss hill climb, and someone gets a new KOM, chances are the previous KOM will make damn sure that the new KOM hasn’t achieved it accidentally (by a GPS error out of their control, or by leaving the GPS recording as they drove home after a ride). If they have, then either leave a friendly comment on the ride asking them to correct or remove it, or just flag it.
      It seems fairly obvious that nobody actually cares at all about the segment chosen in the article, and so the Strava community hasn’t bothered to clean up the data.
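To put rough numbers on how quickly the percentage effect falls away with distance: assume, hypothetically, that each matched endpoint can land up to 2.5 seconds inside the segment (half of a 5-second recording interval) for a rider at 30 mph. The sketch below computes the resulting worst-case overstatement of average speed; `worst_case_overstatement_pct()` is a hypothetical helper, not anything Strava exposes. It prints roughly 30.0% for a 290.68 m segment, 7.2% for 1 km and 1.4% for 5 km.

```python
MPS_PER_MPH = 0.44704  # exact by definition


def worst_case_overstatement_pct(segment_m, speed_mph, endpoint_error_s):
    """How much an average speed is inflated if each matched endpoint lands
    endpoint_error_s seconds inside the segment, so the clock runs 2x that short."""
    true_time = segment_m / (speed_mph * MPS_PER_MPH)
    recorded_time = true_time - 2 * endpoint_error_s
    if recorded_time <= 0:
        raise ValueError("endpoint error swallows the whole segment")
    return (true_time / recorded_time - 1) * 100


if __name__ == "__main__":
    for length in (290.68, 1000, 5000):
        pct = worst_case_overstatement_pct(length, speed_mph=30, endpoint_error_s=2.5)
        print(f"{length:>7} m: speed overstated by up to {pct:.1f}%")
```

The absolute timing error stays the same whatever the length, so it is the segment’s total time that dilutes it; that is why short segments are where the leaderboard goes haywire.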

  • Ben, although you say we can’t blame Strava for cyclists breaking speed limits, we can blame them for displaying obviously faulty GPS data, no? George B’s sliced segment of GPS data does not even meet Strava’s 75% coverage rule; it appears to cover only about 10%. In addition, George B’s ending GPS point is nowhere near the end point boundary. Excellent explanation, by the way!

    • Thanks for your comment salad. I added an update earlier today saying that perhaps Strava should have a disclaimer somewhere on their site stating that their leaderboard times and speeds are subject to error (although I would think that potential error is virtually impossible to quantify).

      I think it boils down to what GPS data you consider to be “faulty”; no GPS data is completely fault-free. If you look at this rider’s effort (on the duplicate segment I created and mentioned above) http://app.strava.com/activities/12901854#669372738 it looks like it is matched quite well but still clocks an average speed of 142.6 km/h. George B’s effort, on the other hand, is still matched even though from the map it looks like nothing has been matched at all, yet according to the data he covered 265.52 m on that segment. So from a distance perspective he actually covered the majority of the segment; I have no idea why it doesn’t show on the map. Software will always have bugs in it!

      If you look at the information about segment matching on Strava’s website (https://strava.zendesk.com/entries/20950148-Segment-Matching-Issues) they say “we make sure that at least 75% of the data in between matches the segment data”. So for these very short segments you could potentially get just 2 data points falling within the segment, and the algorithm then thinks it’s fine, as that is 100% matching. If the GPS device happened to be catching up with itself at the time, that could quite easily result in such a match. That, combined with Smart Recording, leaves these really short segments open to these kinds of results. Interpolating the start/end positions will go some way to resolving this, but I think there will still be enough anomalies from these GPS catch-ups to produce incorrect leaderboards.

      My lack of blame for Strava is more down to how, from a raw data perspective, their algorithms will never be able to tell how inaccurate the data they are given actually is. A track might keep speeding up and slowing down, is that due to a poor gps signal or was the rider doing intervals? They can’t automatically attempt to tidy up data by matching the tracks to a mapping system either as very few of the mountain biking or off-road running tracks will match.

      I’m really trying to focus on the data side of things with all of this rather than philosophical or legal blame as I’m not educated to talk from those perspectives.
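The two-point loophole described in the reply above can be sketched directly. This assumes the check is simply “what fraction of the fixes between the matched endpoints lie within some tolerance of the segment”; the 25 m tolerance, the local x/y metre coordinates and the function names are all hypothetical, not Strava’s actual implementation. Two sparse fixes that both sit on the road score 100%, and the check says nothing about timing, so a catch-up jump passes just as easily.

```python
import math


def distance_to_segment_m(p, a, b):
    """Shortest distance from point p to the straight segment a-b (local x/y metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Fractional position of the closest point along a-b, clamped to the ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def matched_fraction(fixes, seg_start, seg_end, tolerance_m=25.0):
    """Share of the fixes between the matched endpoints that lie near the segment."""
    near = sum(distance_to_segment_m(p, seg_start, seg_end) <= tolerance_m for p in fixes)
    return near / len(fixes)


if __name__ == "__main__":
    start, end = (0.0, 0.0), (290.68, 0.0)
    sparse = [(60.0, 3.0), (230.0, -5.0)]  # only two fixes inside the segment
    wandering = [(10.0, 0.0), (50.0, 120.0), (100.0, 130.0), (200.0, 0.0)]
    print(matched_fraction(sparse, start, end))     # 1.0, passes a 75% test
    print(matched_fraction(wandering, start, end))  # 0.5, fails it
```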

  • “The speed limit” only applies to motorised vehicles, so cyclists cannot break it. Specific bylaws can impose a speed limit on particular roads, but these are non-standard and would need to be clearly posted.

  • Great article.

    The search for evidence got me thinking about data, and how much power KOM Tris M would need to produce over 16 seconds to generate an average speed of 66 kph. So I had a little play with Bike calculator [http://bikecalculator.com/index.html]. I used data from the Strava segment page to estimate temperature, gradient etc. and applied my weight and my winter training bike’s weight to the calculator.

    Assuming Tris M is a similar average weight to me, he’d be maintaining 1,338 watts for the duration of the 16 seconds he took to complete the segment, or 16.7 watts per kilogram.

    According to Andrew Coggan’s Power Profiles [http://home.trainingpeaks.com/articles/cycling/power-profiling.aspx] a world-class international cyclist can produce between 21 and 24 watts per kilo for 5 seconds, and between 10.7 and 11.5 watts per kilo for 1 minute.

    16.7 watts per kilo for 16 seconds sounds pretty darn world class to me.

    Another comparison: the fastest man on the planet in the last 200 meters of a race produces about 1,600 watts. That’s recent world champion Mark Cavendish. And this segment is nearly 300 meters. Chapeau Tris M.

    Sunday Times – do your research!
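For anyone without an online calculator to hand, the usual steady-state model is aerodynamic drag plus rolling resistance, multiplied by speed. The inputs below (90 kg rider plus bike, CdA 0.32 m², Crr 0.005, sea-level air density, 97% drivetrain efficiency, flat road, no wind) are assumptions, not the figures used in the comment above, and the model ignores the effort of accelerating onto the segment. It lands at roughly 1,330 W, about 16.6 W/kg for an 80 kg rider, the same ballpark as the calculator.

```python
G = 9.81  # gravitational acceleration, m/s^2


def flat_road_power_w(speed_kmh, total_mass_kg, cda_m2=0.32, crr=0.005,
                      air_density=1.225, drivetrain_efficiency=0.97):
    """Steady-state power needed to hold speed_kmh on a flat road in still air.

    Aerodynamic drag (0.5 * rho * CdA * v^2) plus rolling resistance (Crr * m * g),
    times speed, divided by drivetrain efficiency. Ignores acceleration, wind and gradient.
    """
    v = speed_kmh / 3.6
    drag_n = 0.5 * air_density * cda_m2 * v ** 2
    rolling_n = crr * total_mass_kg * G
    return (drag_n + rolling_n) * v / drivetrain_efficiency


if __name__ == "__main__":
    watts = flat_road_power_w(66, total_mass_kg=90)
    print(f"{watts:.0f} W, {watts / 80:.1f} W/kg for an 80 kg rider")
```

Because drag grows with the square of speed, the power term grows with its cube: holding 66 km/h takes well over twice the power of 50 km/h.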

  • One other issue that is probably worth mentioning is that GPS devices are not made equal. Mobile phones tend to produce pretty bad data compared to dedicated units like Garmin Edge. On the segment in question, 7 out of 10 “top finishers” in the official leaderboard used their mobile phones to record tracks. Your “true KOM” James B. likewise used a mobile phone. Close-up inspection of his track indicates a highly implausible acceleration from 5 mph to 34 mph in about 6 seconds. I think we can exclude him as a victim of a GPS glitch as well.

    All of the remaining three have good GPS position/speed data, their actual speeds were in all three cases below 30 mph (Jack C. managed about 27.3), and their speeds were inflated by the distance-traveled effect you describe.

    As to the question who’s the real KOM on this segment, filtering out all cell phone athletes and inspecting the remaining rides by hand, it seems to me that it’s a three way tie between Jack C., Mo B. and Keyser S., all of whom averaged just over 27 mph, and, out of 3, only Mo B. actually went above 30 mph at any point.

    • I stand corrected. Liam H. (#5 on the alternative leaderboard in the alternative segment) managed to travel the entire distance of the segment without ever slowing below 31 mph. He is a pro cyclist, the ride in question is 87 miles at the average speed of 26 mph without a single stop, and, according to Google, the date of the ride corresponds to something called “London–Surrey Cycle Classic”. For some unclear reason, he does not even appear in the leaderboard for the original segment, but it’s almost certain that either he or some other racer from the London-Surrey peloton should hold the KOM there.

      • Thanks for doing the further investigations. I ran out of time to go any further down the list but had done enough to prove my point but I’m sure you’re right about other riders data being less than reliable.
        I think some phones are actually pretty good at recording (newer iPhones and the more expensive Androids) but there are plenty of older/cheaper devices that don’t really cut the mustard. I wonder how much of it also comes down to where the phone is stored as well? If it is tucked safely away in a rucksack then is the recording worse?
        I’ve also seen some pretty poor recording by Garmin 800s, so I don’t think it is as simple as saying phone data is bad and Garmin data is good.
        Considering the varying quality of the data I’m always amazed at how well Strava manage to match up segments in such a fast time. Maybe there should be a minimum allowed length of segment in order to ensure a higher, minimum level of accuracy (although I’m not sure if that minimum length should be in distance or time, both have their +’s and -‘s)?

        • I have an expensive Android (Motorola Razr i) and yes, its GPS is decent, I might even say comparable to Edge. I used to have an older iPhone and its GPS was quite horrible. Garmins are generally good, but they have very little internal memory and they will sometimes go into memory-saving mode with 5-10 s or even more between data points. Look at Gus P. (Garmin Edge 705, #3 on the official leaderboard). He goes through the segment area with data points spaced 10-15 s apart (at one point, even up to 17 s.) But individual points are pretty good, the only thing wrong with them is that they show him on the wrong side of the road a few times.

          As to the minimum allowed length/time – here’s an even shorter segment near where I live, but with a much cleaner leaderboard:

          http://veloviewer.com/NewLeaderboard.php?segmentId=2583427

          I suppose it would get messy too if it were ridden by 4000 people with cell phones, but for now, with 12 people on the list and 11 out of 12 using Garmins, it seems okay.

  • Nice bit of stats wrangling there. I was surprised to see myself near the top of that segment list particularly as I enter it from a right turn and exit on a left, usually expecting the lights to change. Once your engine got hold of me I was drastically demoted. Why? Anomalous 67kph reading from the Garmin. I think 29.1 to 67 in 3 seconds is a little beyond me.
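The glitches described in the comments above (5 to 34 mph in about 6 seconds, and 29.1 to 67 km/h in 3 seconds) are the kind of thing a simple automated check could catch. A minimal Python sketch, assuming speed samples in m/s and a hypothetical ceiling of 1.5 m/s² for acceleration; a real filter would want a speed-dependent limit, since accelerating is far easier at low speed. `implausible_accelerations()` is a hypothetical name. Both glitches are flagged (about 2.2 and 3.5 m/s²), while steady riding and hard braking are not.

```python
MPS_PER_MPH = 0.44704
MPS_PER_KMH = 1 / 3.6


def implausible_accelerations(samples, max_accel_mps2=1.5):
    """Flag consecutive speed samples implying more acceleration than a rider can manage.

    samples: (time_s, speed_mps) pairs in time order. Only speeding up is checked,
    since hard braking can legitimately produce larger (negative) values.
    Returns (t_start, t_end, accel_mps2) for each suspicious pair.
    """
    flagged = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamps
        accel = (v1 - v0) / dt
        if accel > max_accel_mps2:
            flagged.append((t0, t1, accel))
    return flagged


if __name__ == "__main__":
    phone = [(0, 5 * MPS_PER_MPH), (6, 34 * MPS_PER_MPH)]      # 5 -> 34 mph in ~6 s
    garmin = [(0, 29.1 * MPS_PER_KMH), (3, 67 * MPS_PER_KMH)]  # 29.1 -> 67 km/h in 3 s
    steady = [(0, 8.0), (1, 8.4), (2, 8.9), (3, 9.1)]
    for name, track in (("phone", phone), ("garmin", garmin), ("steady", steady)):
        print(name, implausible_accelerations(track))
```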

  • You should sell your alternative leaderboard to Strava so they don’t have to do the coding to make their leaderboards actually accurate. Not only would it be a service to riders, it wouldn’t make cyclists look like such jerks. The newspaper is taking an inaccurate data presentation and running with it. Selling the hype…

    • Thanks for the comment Scott. My alternative leaderboard only works better than Strava’s own leaderboard for a tiny proportion of segments as it just uses the distance the GPS unit thinks it covered between the matched start and end points of a segment. Should the points recorded BETWEEN those two end points be subject to error (i.e. zig-zagging around due to a bad signal or just GPS drift) then the distance value recorded will be higher than that which the rider actually travelled. That is why only short, straight segments really work with my alternative leaderboard as these erratic recordings are less likely (although the segment in question has lots of shocking data for whatever reasons). The more accurate thing to do is to just work out the extra distance covered (or not covered) by a rider just at those two end points and recalculate the average speeds based on those (completely ignoring the distance the GPS thinks it travelled) but the amount of processing required for each rider to determine that is too much for me to contemplate doing.
      Hype sells!
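The endpoint-only correction described in the reply above can be sketched for a straight segment by projecting the two matched fixes onto the segment’s line and taking the difference, ignoring every fix in between. The local-metre coordinates, the example fixes and the function names are hypothetical; a curved segment would need projection onto its full polyline instead. The example effort (matched fixes 30 m after the start and 40 m short of the end, 16 s apart) comes out at about 30.9 mph, against 40.6 mph if the full 290.68 m is divided by the same 16 s.

```python
import math

MPH_PER_MPS = 2.2369362920544  # 1 m/s in mph


def along_track_m(p, a, b):
    """Signed distance of point p along the straight line a->b, measured from a.

    Negative means p lies before the segment start; more than |ab| means past the end.
    Points are (x, y) in local metres; sideways GPS error does not affect the result.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    return ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / math.hypot(dx, dy)


def endpoint_corrected_speed_mph(start_fix, end_fix, elapsed_s, a, b):
    """Average speed over the along-segment distance between the matched endpoints,
    ignoring whatever the GPS logged in between."""
    covered_m = along_track_m(end_fix, a, b) - along_track_m(start_fix, a, b)
    return covered_m / elapsed_s * MPH_PER_MPS


if __name__ == "__main__":
    a, b = (0.0, 0.0), (290.68, 0.0)
    # Hypothetical effort: matched fixes 30 m in and 40 m short of the end, 16 s apart.
    print(f"{endpoint_corrected_speed_mph((30.0, 4.0), (250.68, -3.0), 16, a, b):.1f} mph")
```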

  • I’m using Strava on an HTC Sensation. I also have a Cycle GPS recorder. Some discrepancies between the two devices have been:

    1. 1 Ride – Max speed HTC 43.6mph, GPS Recorder 33.6mph.

    2. 1 Ride – HTC Distance/time = 4.6 miles 15.09. GPS = 3.4 miles 14.26.
