Wednesday, March 22, 2017

Wrangling F1 Data With R - Print Copy

By the by, I popped a printable copy of the latest version of Wrangling F1 Data With R onto Lulu, so if you fancy over 400 pages of R, you can find a copy here...

Tuesday, March 14, 2017

Reading WRC Stage Charts

Over the last few weeks, I've started dabbling with the automated generation of rally reports at a stage level using WRC rally results data. (See examples here.)

Part of the plan is to explore a method for producing simple text-based reports built from observations made about various charts of stage split times. So what sorts of chart are in the mix? The following charts are all based on the same stage, but offer different perspectives over it.

The first chart type displays stage split time rank positions. For each driver, record the rank position at each split and at the end of the stage. The drivers are arranged on the horizontal x-axis in order of start position.
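As a rough illustration of the ranking step described above, here is a minimal sketch in plain Python. The driver names and times are made up for the example (they are not real WRC results), `rank_at_each_split` is a hypothetical helper, and ties are not given any special treatment.

```python
# Hypothetical accumulated split times in seconds, keyed by driver and
# listed in start order. Made-up values, not real WRC results.
split_times = {
    "driver_a": [120.0, 245.5, 371.0],
    "driver_b": [118.5, 243.0, 372.5],
    "driver_c": [121.0, 244.0, 370.0],
}


def rank_at_each_split(times):
    """Return {driver: [rank at split 1, rank at split 2, ...]}; 1 is fastest.

    Ranks are only meaningful once every driver has a time at every split.
    Ties are broken by dictionary order (an assumption of this sketch).
    """
    drivers = list(times)
    n_splits = len(next(iter(times.values())))
    ranks = {driver: [] for driver in drivers}
    for i in range(n_splits):
        # Order the drivers by their accumulated time at split i.
        ordered = sorted(drivers, key=lambda d: times[d][i])
        for position, driver in enumerate(ordered, start=1):
            ranks[driver].append(position)
    return ranks


split_ranks = rank_at_each_split(split_times)
```

Plotting each driver's list of ranks left to right, with drivers arranged in start order, would give the sort of trace described above.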

In this first chart, we see how Sordo was ranked first overall at each split point when all drivers' times are considered. (So for example, even though Evans would have ranked first when he passed through each split point, we actually record his ranked positions when the stage is complete and all times are available for ranking purposes.) The line markers represent the position at each split point, with the first split leftmost and the final end-of-stage position rightmost. 

Where the lines for each driver are relatively flat, it shows the driver held position across each split. Where the lines trend down left to right, it shows the driver improved their rank position across the stage. Where the lines trend up and to the right, it shows they lost position over the course of the stage.

The second chart type shows the stage split delta time (in seconds) to the first driver on the stage. For each split, calculate the running stage split time relative to the time recorded by the first driver on the stage. (This means that the first driver will have a stage split delta of 0s at each split and at the end of the stage.)
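The delta calculation might be sketched along the following lines, again in plain Python with made-up numbers. The assumption here is that the rows are already in start order, so the first row is the first driver on the stage; `delta_to_first_car` is a hypothetical helper name.

```python
# Hypothetical accumulated split times in seconds, in start order.
# The first row is taken to be the first car on the stage.
split_times = [
    ("driver_a", [120.0, 245.5, 371.0]),
    ("driver_b", [118.5, 243.0, 372.5]),
    ("driver_c", [121.0, 244.0, 370.0]),
]


def delta_to_first_car(rows):
    """Return {driver: [delta at each split]} relative to the first row.

    Negative values mean the driver was ahead of the first car on
    accumulated time at that split; positive values mean they were behind.
    """
    _, base = rows[0]
    return {
        driver: [t - b for t, b in zip(times, base)]
        for driver, times in rows
    }


deltas = delta_to_first_car(split_times)
```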

The leader within the stage at each split point is highlighted with the red palette. The labels show the overall rank position at each split.

In this case, horizontal lines show that the driver kept pace with the first car (for example, Hanninen), whether running faster, with a negative delta (on the vertical y-axis, below the first car, such as Sordo), or slower, with a positive delta (above the first car, such as Bertelli). Traces moving up and to the right show the driver lost time relative to the first car on the stage. Traces moving down to the right show the driver gained time compared to the first car on stage as they progressed through the stage.

The third chart type shows the stage sector delta time (in seconds) relative to the fastest stage sector time. That is, calculate the "sector time" of each driver (the time between each split point along the stage), find the best (fastest) sector time at each split, and then for each driver find their sector delta from this fastest time.

Using this chart, we can get information about how much time each driver gained or lost compared to the driver who got between two split points in the shortest time. The labels show the split sector time rank. In this case, we see that despite being ranked first in terms of overall (accumulated) stage time at each split (the first chart), Sordo does not record the fastest time between splits 1 and 2 (that honour goes to Meeke), or splits 2 and 3 (where Ogier was fastest), although he does record the fastest time going between the start and split 1, and between split 3 and the end of the stage.

The final chart type shows the stage split delta time (in seconds) relative to the overall leader at the start of the stage. The rally leader is identified as the driver whose split deltas are all 0s (so Meeke was the rally leader prior to this stage). In the chart below, the driver who was ranked best overall (i.e. in terms of accumulated stage time) at each split is highlighted, although it should be possible instead to highlight the driver who was fastest in each sector, or was in the lead of the rally overall prior to the start of the stage.

In this case, we see that Sordo made only a slight gain on rally leader Meeke in terms of time, whereas Hanninen lost about 6 seconds to him between each and every split point.

Sunday, January 29, 2017

"Sector Times Deltas" from WRC Rally Stage Split Times

For the longer stages on the WRC rally, one or more split times track the progress of each car through the rally stage. The results record the accumulated stage time of the first car as it passed each timing control (I'm not sure what is reported if the first car doesn't make it all the way through the stage?), with an offset reported for the cars that follow:

From this information, we can find the accumulated stage time of each car - simply add the offset to the corresponding split time of the first car.
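By way of illustration, that addition might look something like this in plain Python. The data layout (a list of first-car split times plus a dictionary of offsets per following car) is an assumption made for the sketch, not the actual format of the WRC results, and the numbers are invented.

```python
# Hypothetical data: accumulated times of the first car at each timing
# control, and the reported offsets (in seconds) for the cars that follow.
first_car_splits = [120.0, 245.5, 371.0]
offsets = {
    "driver_b": [-1.5, -2.5, 1.5],
    "driver_c": [1.0, -1.5, -1.0],
}


def accumulated_times(base, offsets_by_driver):
    """Add each car's offset to the first car's time at the same control."""
    return {
        driver: [b + o for b, o in zip(base, offs)]
        for driver, offs in offsets_by_driver.items()
    }


accumulated = accumulated_times(first_car_splits, offsets)
```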

Given the accumulated stage time of a car as it passes each timing control, we can also calculate the time the car spent going between those controls: for two consecutive split times, simply subtract the first split-time from the second. In circuit racing, this would be called a sector time (I'm not sure if rallyspeak uses the same terminology?). Adding up all the sector times recorded by a car should give its stage time.
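A minimal sketch of that subtraction, assuming the list of accumulated times ends with the final stage time and that the clock starts at zero at the stage start (`sector_times` is a hypothetical helper and the values are made up):

```python
def sector_times(accumulated):
    """Turn accumulated split times into per-sector times.

    The first sector runs from the stage start (time 0) to the first
    split; each later sector is the difference of consecutive splits.
    """
    previous = [0.0] + accumulated[:-1]
    return [t - p for t, p in zip(accumulated, previous)]


# Hypothetical accumulated times: two splits plus the stage finish.
accumulated = [120.0, 245.5, 371.0]
sectors = sector_times(accumulated)

# Sanity check: the sector times should add back up to the stage time.
stage_time = sum(sectors)
```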

If for each sector we find the minimum sector time, we can subtract this from each car's sector time to find the sector time "delta" for that car - the time over the fastest time to complete that sector. (An ultimate stage time is then given as the sum of the minimum sector times for the stage. A driver could then be said to drive an ultimate stage if his sector times are the fastest in each sector for that stage.)
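The sector delta and ultimate stage time calculations could be sketched as follows. The sector times are invented for the example, and the variable names are just illustrative.

```python
# Hypothetical sector times in seconds for three cars over three sectors.
sectors = {
    "driver_a": [120.0, 125.5, 125.5],
    "driver_b": [118.5, 124.5, 129.5],
    "driver_c": [121.0, 123.0, 126.0],
}

# Fastest time recorded in each sector, across all cars.
best_sector = [min(column) for column in zip(*sectors.values())]

# Each car's delta to the fastest time in each sector (never negative).
sector_deltas = {
    driver: [t - best for t, best in zip(times, best_sector)]
    for driver, times in sectors.items()
}

# The ultimate stage time is the sum of the fastest sector times.
ultimate_stage_time = sum(best_sector)

# Drivers who were fastest in every sector, i.e. drove an ultimate stage.
drove_ultimate = [
    driver for driver, deltas in sector_deltas.items()
    if all(delta == 0 for delta in deltas)
]
```

With these made-up numbers no single driver is fastest in every sector, so the ultimate stage time is quicker than anyone's actual stage time.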

Just as we could chart split time deltas at stage splits (the times reported in the WRC results shown above), as well as rebased deltas relative to an arbitrary driver, we can also chart the deltas to the minimum sector time for each driver.

The following chart from Stage 3 of the 2017 Monte Carlo rally shows how Ogier was 40s behind the minimum sector time for the first sector, lost a few seconds in the second, but was then on the ball in the latter half of the stage. Evans lost 40s or so in each of the first 3 sectors, and dropped 20s in the last sector. Hanninen was competitive for the first three sectors, but lost time somewhere in the last sector.

(The number labels represent the overall rally ranking (rather than the RC1 relative class ranking shown in the charts in the previous post) based on sector time for that sector. A driver driving an ultimate stage would have position 1 for each sector.)

As well as grouping by driver, we can group by sector. Trends within a sector may show how conditions improved or worsened over the course of the runners. The grouping also highlights drivers who lost time in a particular sector - the number labels in this case represent the car number.

So for example, we see how cars 1, 3 and 20 lost significant amounts of time in sector 1, cars 3 and 30 in sectors 2 and 3, and car 20 in sector 4. Indeed, the last 4 runners in sector 4 all lost more than 15 seconds.

Notes on how to generate these charts will be published at some point...

Saturday, January 28, 2017

Rebasing WRC Stage Split Times - How Elfyn Evans Lost Time on Stage 3 of the Monte Carlo 2017 WRC Rally

As well as visualising split times within a stage relative to the first car passing through the stage (for example, WRC Rally Stage Reports - Split Times (stage 3, Monte Carlo 2017)), we can also rebase the split times relative to the times recorded through the stage by any particular driver.

For example, rebasing split times for stage 3 of the Monte Carlo 2017 rally relative to Elfyn Evans' times shows how he lost time relative to everyone except Ogier and Serderidis on the first split, but then continued to lose time at a more or less constant rate over the remaining splits to everyone except Serderidis, who had a spectacularly miserable time.
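Rebasing amounts to subtracting the chosen driver's accumulated time from everyone's time at each split. A minimal sketch, using made-up times and a hypothetical `rebase` helper, with the sign convention (other driver minus reference driver) being an assumption of the example:

```python
# Hypothetical accumulated split times in seconds (not real results).
split_times = {
    "driver_a": [120.0, 245.5, 371.0],
    "driver_b": [118.5, 243.0, 372.5],
    "driver_c": [121.0, 244.0, 370.0],
}


def rebase(times, reference):
    """Express every driver's split times relative to the reference driver.

    A negative value means that driver was ahead of the reference driver
    at that split; the reference driver's own deltas are all zero.
    """
    base = times[reference]
    return {
        driver: [t - b for t, b in zip(driver_times, base)]
        for driver, driver_times in times.items()
    }


rebased = rebase(split_times, "driver_c")
```

Passing a different driver as `reference` gives the view from that driver's seat without touching the underlying times.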

Note also how this chart is further annotated using colour to highlight the times of a specified driver (which need not be the driver relative to whose times the other drivers' deltas are calculated).

Friday, January 27, 2017

WRC Rally Stage Reports - Split Times (stage 3, Monte Carlo 2017)

I had another tinker with the WRC results/timing data last night, and sketched a chart inspired by seasonal subseries charts that shows the split times and final stage time for each driver in a particular stage.

Drivers are ordered according to the order in which they started the stage. The points within each driver series correspond to the separate split times, in order. The y-axis is the time delta in seconds to the first driver onto the stage, which is why Ogier's times are flat with a delta of 0s at each timing point. The number labels are the in-class rank of the driver at each timing point.

So what can we learn, just from the chart? Firstly, Ogier seemed to lose it at the start. From the Red Bull TV (WRC Rally) coverage (and WRC/FIA footage on YouTube), it seems Ogier went for a little excursion at the start of the stage (stage 3), which accounts for his poor performance there, losing about 40s compared to the other runners. The trends on Neuville's and Meeke's times are more or less flat, which shows they were approximately keeping time with each other, and with Ogier, across the other checkpoints. Latvala and Breen lost small amounts of time at each split (the slight upward trend), Lefebvre more so, and Evans and Serderidis big time. Hanninen looked to be making time over the first part of the stage (the downward trend) before losing it on the last part.

Here's another view, rendered using seaborn.

I'll post a code recipe at some point. Also, I think I'm pretty much committed to trying a Jupyter/python workflow for another data junkie data wrangling book on Leanpub, based around the WRC data...

Wednesday, January 25, 2017

Playing with Python / Pandas - WRC Stage Chart, Monte Carlo 2017

Whilst many of the data sketches I've produced for the F1DataJunkie blog have been generated using R/ggplot2, I spend more of my time now using the Python programming language, and in particular the pandas tabular data wrangling package.

This is partly why my F1 data wrangling tailed off last year. But new year, and all that, so I thought I'd start playing with a new data set, and a new set of charts - around WRC (World Rally Championship) data.

Here's the first sketch I came up with, described (with code) here - a stages chart for the RC1 cars in the 2017 Monte Carlo WRC rally:

This was generated using matplotlib, and shows the overall classification (across all classes) of the RC1 cars at the end of each stage. The boxed numbers show the ranking of drivers outside the top 12 (there were 12 RC1 cars registered at the start of the rally).

Sunday, May 29, 2016

F1 2016 Monaco Race Battlemaps

The following reading is based solely on the charts, in the absence of having seen the race or read any race reports.

A quick sketch of the race track chart shows how the 2016 Monaco F1 Grand Prix evolved.

The cars were bunched for the first few laps, presumably behind a safety car? Thereafter, RIC's early dominance was cut short as HAM made ground and took the lead, then it was a close fought battle at the front for the middle third of the race, with HAM drawing away at the end.

The second half of the race also saw a close fought battle for second, with a close group of several cars in the middle order.

So how do the battle maps read?

Winner HAM started the race stuck behind teammate ROS and slipping away from RIC up front, before taking the position from ROS and halting RIC's advance as ROS fell away at about 2s per lap. Presumably following a pit stop by RIC(?), which granted HAM the lead, RIC made a quick return to the battle for first, tussling closely with HAM for the rest of the race before falling back a second a lap at the end of the race.

From RIC's perspective, once the race was properly underway he pulled away from ROS, until HAM got past ROS and managed to hold the gap, before taking the lead as RIC pitted(?). RIC pulled HAM back over the course of 5 laps, but thereafter spent the race nipping at HAM's heels. Behind, PER ebbed and flowed, losing time over several laps, before pulling it back, then losing out again several times.

PER seems to have had an interesting time. Trailing SAI for the first quarter of the race, with ALO close behind, PER spent laps 21 to 28 sandwiched between ROS and VET, presumably before pitting? For the last half of the race, PER managed the gap to RIC 10s ahead, chipping into the gap whenever VET fell off a second or two behind, before losing it again as the battle to fend off VET reasserted itself.

VET's race began with him sandwiched between HAM ahead and HUL behind, until all change (a pit stop?) and VET found himself behind MAS, then ROS briefly, before settling into the forty plus laps of sitting on PER's rear. Behind, it was pretty much clear air once ALO fell away.

So it looks like ALO was fifth? After his early battle with PER ahead and BUT behind, before BUT fell away, ALO was chasing HUL and then SAI for a while as GUT made ground from behind, fell away, recovered, and fell away again. When SAI was replaced by VET ahead, VET's advantage was evident as he sped off into the distance, while ALO gamely held off ROS for over forty laps before HUL seems to have snatched sixth on the final lap?

So how did HUL see the race? Caught between VET and SAI from the off, HUL joined battle with ROS around lap 32/33(ish?!) and then pestered him till the end before snatching the place on the final lap. Behind, SAI kept trying, but he too fell off at the end.

So ROS is presumably not the happiest of bunnies tonight? Once the race started, RIC sped off into the distance ahead, whilst HAM presumably fumed behind? Once HAM passed, SAI, then PER gave grief from behind before pit stops (presumably) established a new order of ROS stuck between ALO in front and HUL behind. As the race entered the last few laps, ALO started to draw away but HUL was tenacious behind, stealing the place on the last lap.

I'm guessing SAI was classified next? He was giving grief to HUL throughout the race, with PER and ALO behind for the first half of the race, then BUT behind for the second half, though far enough back that there was no threat from a DRS assist.

So JB / BUT had his own race, it seems? A scrappy first quarter of the race chasing GUT then WEH, with an attack from GRO behind, before WEH was passed and left, only for VES to come on a charge and pass at a great rate of knots. SAI was just too far ahead to provide much close racing there, and no threat from behind in the middle third as GUT fell off. The last third saw MAS chasing BUT down, but the race was three or four laps too short for the threat to require any close defense from the attack behind.

MAS' early race was spent defending against GRO, and then VET, the latter battle allowing PER to pull away ahead. As the race entered the second quarter WEH fell off behind, then GUT ahead was passed and easily pulled away from. MAS made in-roads into the gap ahead, pulling BUT back at an impressive rate, but the road ran out before a full passing attack on BUT could be seriously considered.

BOT doesn't seem to have had much fun. With ALO pulling away at the start, BOT was stuck between WEH and ERI. Mid-way through the race, BOT managed to take WEH and pull away rapidly, only to be lapped by the cars fighting for first and second, then third and fourth. Being able to lap HAR probably didn't offer much solace, but having started to manage the gap to GUT ahead in the final quarter, the last few laps saw BOT take the gap down and pass GUT on the final lap.

So GUT then seems to have had an eventful time. His early tussle with BOT ahead and RAI behind saw him cut down a gap to ALO before having to defend against VES. ALO pulled away ahead, then BUT, while MAS made the pass from behind and then he too drove away. As GUT was lapped by the leaders, he managed to lap HAR in turn, and then poor old GRO. Behind, BOT looked to be safely back in the mirrors, but after taking a chunk out of the gap in the penultimate lap, GUT lost out to BOT on the final lap.

WEH seems to have spent the race watching BUT then BOT in his mirrors, though he did get to see MAS then BOT speed off ahead, only to have GRO then sit on his tail as he started to get lapped by all and sundry. It looks as if GRO fell off right at the end?

GRO had a slightly more interesting time perhaps? He seems to have taken HAR and then NAS ahead before hooking up with WEH? NAS looks like he may have threatened mid-way through the race, but then disappeared?

Of the rest, what happened to VES? It looks as if he was stuck behind NAS and MAG for a bit, but then he seems to have made a break, accelerating away from MAS as he chased down BUT and then GUT, though BUT charged back, albeit without being able to pass. When VES's race came to an early end, he was nipping at SAI's rear end and holding off BUT behind.

So, what actually happened? Time to read some race reports, I think, and try to catch the race highlights...