## Thursday, January 29, 2015

### Calculating Track Position from Laptime Data

When all the cars in a race are on the same lap, the track position of each car and their race positions are all in sequence. However, as cars start to get lapped, the order in which the cars cross the start/finish line (the track position) may bear little, if any, resemblance to their race positions.

So how can we capture the track position of each car - that is, the order in which they cross the start/finish?

The timing sheets published via the FIA website include a Race History Chart that tabulates the order in which cars pass the start/finish line relative to the laps completed by the current leader of the race. As the example below shows, if the leader laps a car on any given lead lap, the passed car does not have a time recorded for the previous leader lap because it did not complete that lap.

Unfortunately, the FIA don't release the timing sheets as data, preferring instead to use immutable PDF documents. (That doesn't mean we can't scrape the data of course...)

So how might we generate the track position given data we do have ready access to? The ergast database, for example, publishes lap time information - so can we use that to recreate track positions? Indeed we can...

We can make two observations. The first is that a race track is a closed circuit; the second is that the accumulated race time to date is measured from the same instant for each driver, given that they all start the race at the same time. (The race clock is not started as each driver passes the start/finish line - the race clock starts when the lights go green. To this extent, drivers lower placed on the grid serve a positional time penalty compared to cars further up the grid. This effective time penalty corresponds to the time it takes a lower placed car to physically get as far up the track as the cars in the higher placed grid positions.)

If we get hold of all of the lap time data for a particular race, with lap times described in a milliseconds column, we can find the track position of a car in the following way.

First, identify which leader’s lap each driver is on and then use this as the basis for deciding whether a car is on the same lap - or a different one - compared with any car immediately ahead or behind on track. One way of doing this is on the basis of accumulated race time. If we order the rows by accumulated race time, and flag whether or not a particular driver is the leader on a particular lap, we can count the accumulated number of “lap leader” flags to give us the current lead lap count irrespective of how many laps a given driver has completed.

library(plyr)

#For each driver, calculate their accumulated race time at the end of each lap
lapTimes=ddply(lapTimes, .(driverId), transform,
acctime=cumsum(milliseconds))

#Order the rows by accumulated lap time
lapTimes=arrange(lapTimes,acctime)
#This ordering need not necessarily respect the ordering by lap.

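#Flag the leader of a given lap - this will be the first row in each new leader lap block
#(Assumes lapTimes has a position column giving the race position at the end of each lap)
lapTimes$leadlap=(lapTimes$position==1)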

This gives a result of the form:

## 1              button    TRUE
## 2            hamilton   FALSE
## 3  michael_schumacher   FALSE
## 22             button    TRUE
## 23           hamilton   FALSE
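As a quick sanity check of the steps so far, here's a minimal sketch on made-up data. The two drivers, their lap times and the `leader` column name are all invented for illustration, and base R's `ave()` stands in for the `ddply()` call so that the snippet runs without plyr.

```r
# Toy lap times in milliseconds for two hypothetical drivers, "a" and "b"
lapTimes <- data.frame(
  driverId     = c("a", "b", "a", "b", "a", "b"),
  lap          = c(1, 1, 2, 2, 3, 3),
  position     = c(1, 2, 1, 2, 1, 2),
  milliseconds = c(100000, 101000, 95000, 96000, 95000, 97000)
)

# Accumulated race time per driver (base R stand-in for the ddply/cumsum step)
lapTimes$acctime <- ave(lapTimes$milliseconds, lapTimes$driverId, FUN = cumsum)

# Order the rows by accumulated race time, i.e. the order cars cross the line
lapTimes <- lapTimes[order(lapTimes$acctime), ]

# Flag the rows where the race leader completes a lap
# (assumption: the leader is the car with position 1 at the end of its lap)
lapTimes$leader <- lapTimes$position == 1

print(lapTimes[, c("driverId", "lap", "acctime", "leader")])
```

Each TRUE in the `leader` column marks the start of a new block of rows belonging to the same lead lap.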

In R, a Boolean TRUE has the numeric value 1 and a Boolean FALSE the numeric value 0, so summing the flags counts the TRUE values.

#Calculate a rolling count of leader lap flags.
#Recall that the cars are ordered by accumulated race time.
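#The accumulated count of leader flags is the lead lap number each driver is on.
#(Assumes the race leader is the car with position 1 at the end of its lap)
lapTimes$leadlap=cumsum(lapTimes$position==1)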

So when we count the flags, we get something like this:

## 1              button       1
## 2            hamilton       1
## 3  michael_schumacher       1
## 22             button       2
## 23           hamilton       2
## 24 michael_schumacher       2
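To see the rolling count in isolation, here's a small sketch using a hand-written vector of flags (the values are made up rather than taken from the race above).

```r
# Leader flags for rows already sorted by accumulated race time (made-up values)
flags <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Logical values are coerced to 1/0, so the running sum steps up by one
# each time the leader starts a new lap block
leadlap <- cumsum(flags)
print(leadlap)
# [1] 1 1 1 2 2 3 3 3
```

Every row between one leader flag and the next inherits the same lead lap number, however many laps that particular car has actually completed.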

Let’s now calculate the track position for a given lead lap, where the leader in a given lap is in both race position and track position 1, the second car through the start/finish line is in track position 2 (irrespective of their race position), and so on. (In your mind’s eye, you might imagine the cars passing the finish line to complete each lap: first the race leader, then either the car in second or a lapped backmarker, and so on.) Specifically, we group by leadlap and then accumulated race time within that lap, and assign track positions in incremental order.

lapTimes=ddply(arrange(lapTimes,leadlap,acctime), .(leadlap), transform,
               trackpos=1:length(position))

We now have track - as well as race - positions:

##     code lap position acctime leadlap trackpos
## 616  BUT  33        1 3100735      33        1
## 617  HAM  33        2 3111538      33        2
## 618  VET  33        3 3113745      33        3
## 619  SEN  32       16 3115035      33        4
## 620  RIC  32       17 3115829      33        5
## 621  ALO  33        4 3125951      33        6
## 622  WEB  33        5 3131009      33        7
## 623  MAL  33        6 3133006      33        8
## 624  RAI  33        7 3141269      33        9
## 625  KOB  33        8 3147051      33       10
## 626  GLO  32       18 3150703      33       11
## 627  PER  33        9 3153818      33       12
## 628  ROS  33       10 3159053      33       13
## 629  VER  33       11 3162088      33       14
## 630  DIR  33       12 3172712      33       15
## 631  MAS  33       13 3177681      33       16
## 632  PET  33       14 3184974      33       17
## 633  PIC  32       19 3186685      33       18
## 634  KOV  33       15 3188375      33       19

In this example, we see Timo Glock (GLO) has only completed 32 laps compared to 33 for the race leader and the majority of the field. On track, he is placed between Kobayashi (KOB) and Perez (PER).
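The whole recipe can be sketched end to end on a toy race. Everything here is hypothetical: the three drivers, their constant lap times, and the assumption that the leader is the car with `position == 1`. Base R is used in place of plyr so the snippet is self-contained. A car is picked out as lapped where `lap < leadlap`, which is how SEN, RIC, GLO and PIC show up in the table above.

```r
# Made-up lap times (ms) for three hypothetical drivers; "c" is slow enough to be lapped
lapTimes <- data.frame(
  driverId     = c(rep("a", 4), rep("b", 4), rep("c", 3)),
  lap          = c(1:4, 1:4, 1:3),
  position     = c(rep(1, 4), rep(2, 4), rep(3, 3)),
  milliseconds = c(rep(90000, 4), rep(100000, 4), rep(125000, 3))
)

# Accumulated race time per driver, then order the rows by it
lapTimes$acctime <- ave(lapTimes$milliseconds, lapTimes$driverId, FUN = cumsum)
lapTimes <- lapTimes[order(lapTimes$acctime), ]

# Lead lap: running count of rows where the race leader crosses the line
lapTimes$leadlap <- cumsum(lapTimes$position == 1)

# Track position: running index within each lead lap (rows are already time-ordered)
lapTimes$trackpos <- ave(lapTimes$acctime, lapTimes$leadlap, FUN = seq_along)

# Lapped cars have completed fewer laps than the current lead lap count
lapped <- lapTimes[lapTimes$lap < lapTimes$leadlap, ]
print(lapped[, c("driverId", "lap", "position", "leadlap", "trackpos")])
```

Here the slow car `c` crosses the line between `a` and `b` on lead lap 4, so it is in track position 2 despite being third in the race.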

This code will form part of the Wrangling F1 Data With R book, initially in a forthcoming chapter that revisits an old idea: battle charts.