Tuesday, December 16, 2014

Spotting Contested Positions in F1 Races Using Graph Theory

In the latest update to the Wrangling F1 Data With R book, I posted a recipe describing how to automatically identify the positions being contested in a race (which could equally be the championship race) by virtue of positions that had changed hands lap on lap.

The method comes via an answer to a question posted on Stack Overflow about how to spot disjoint sets of grouped items in a list. The trick is to construct a graph in which edges are placed between the elements in each subset, and then to identify the clusters (the connected groups of nodes) in the graph built from the whole set of items.

So for example, in this fragment of a lap chart showing race positions going from one lap to another, we see several position changes:


Many of the drivers do not change position at all, but there are position changes between four distinct groups of drivers: those in 1st and 2nd; those in 4th, 5th and 6th; those in 9th and 10th; and those in 17th, 18th and 19th.

If we connect nodes in a graph for each driver going from the position they held in the previous lap to the position they hold in the current lap (and ignore drivers that didn't change position), we get the following groupings:


Notice how the nodes - representing positions - are connected to each other by arrows, showing how a car placed in one position moved to another position. So for example, we see that the cars in positions 9 and 10 changed place with each other, as did those in positions 1 and 2. The car in 19th went to 18th, the one in 18th to 17th, and the one in 17th fell back to 19th. And so on.
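As a minimal sketch of the idea, and not the recipe from the book, the following base R fragment finds the connected groups of positions directly, without using a graph library. The moves data frame and the contested_groups function are names made up for this illustration, and the individual moves within the 4th/5th/6th group are invented, since the lap chart fragment only tells us that those positions changed hands.

```r
# Position held on the previous lap (from) and on the current lap (to),
# for cars that changed position only.
# The 4/5/6 moves are hypothetical; the others follow the description above.
moves <- data.frame(
  from = c(1, 2, 9, 10, 19, 18, 17, 4, 5, 6),
  to   = c(2, 1, 10, 9, 18, 17, 19, 5, 6, 4)
)

# Give every position its own label, then keep merging the labels at
# either end of each edge until nothing changes any more.
contested_groups <- function(moves) {
  nodes <- sort(unique(c(moves$from, moves$to)))
  label <- setNames(nodes, nodes)
  repeat {
    changed <- FALSE
    for (i in seq_len(nrow(moves))) {
      a <- as.character(moves$from[i])
      b <- as.character(moves$to[i])
      m <- min(label[[a]], label[[b]])
      if (label[[a]] != m || label[[b]] != m) {
        label[[a]] <- m
        label[[b]] <- m
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  # Positions sharing a label form one contested group
  unname(split(nodes, label))
}

groups <- contested_groups(moves)
print(groups)
```

Each element of groups is one set of contested positions, so this example gives four groups: 1 and 2; 4, 5 and 6; 9 and 10; and 17, 18 and 19.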

The chapter containing the code for constructing the graph and partitioning it into separate clusters can currently be found as part of the preview for the Wrangling F1 Data With R book... but I'm not sure how long it will remain so...

See also: OUseful.info - Identifying Position Change Groupings in Rank Ordered Lists

Saturday, December 13, 2014

Career Comparison - Championship Position vs Age - Jenson Button and Fernando Alonso

This week finally saw the announcement of Alonso's move to McLaren and the retention of Jenson Button, so with the driver line up sorted there, how do these drivers compare in terms of their F1 careers?

The following diagrams plot each driver's season standings for each year they've spent in F1 up to the end of the 2013 season against age (Button in blue, Alonso in red), along with the team they were driving for at the time.

The best fit lines represent linear, quadratic and cubic performance models respectively, of the form pos ~ I(age-30) +I( (age-30)^2 ) + I( (age-30)^3 ).
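For anyone wanting to try this sort of model, here is a rough sketch of how the three fits might be set up with lm(). The career data frame is entirely made up for illustration (it is not Button's or Alonso's record), and the variable names are my own assumptions rather than the ones used in the book.

```r
# Invented career data: one championship position per age (NOT real results)
set.seed(1)
career <- data.frame(age = 20:33)
career$pos <- round(8 + 0.05 * (career$age - 30)^2 + rnorm(nrow(career)))

# Linear, quadratic and cubic models, with age centred on 30
linear    <- lm(pos ~ I(age - 30), data = career)
quadratic <- lm(pos ~ I(age - 30) + I((age - 30)^2), data = career)
cubic     <- lm(pos ~ I(age - 30) + I((age - 30)^2) + I((age - 30)^3), data = career)

# Fitted value and confidence limits for the cubic model at age 30
at30 <- predict(cubic, newdata = data.frame(age = 30), interval = "confidence")
print(at30)
```

Centring age on 30 means the intercept of each model can be read as the modelled championship position at age 30.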



The confidence limits around each line show how variable Button's career has been compared to Alonso's more consistent career trajectory.

These charts were generated using code described in the "Career Trajectory" chapter of Wrangling F1 Data With R.

Tuesday, December 2, 2014

Position Change Charts

Inspired by an old Joe Saward post on lap charts I had a quick doodle around the notion of position change charts that plot the names of drivers against laps on just the laps where their position at the end of the lap was different to the position on the previous lap.
This chart shows just the position changes for each driver over the course of the race; the leftmost labels correspond to grid positions. The trick to reading this chart is to look left from a driver label to the previous occurrence of the same label: this position gives the position from which the change took place. The intervening gap is the length of time that driver held the position to the left. Emphasising pit stop laps through the use of italics, for example, would add further richness to this chart.
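The filtering step at the heart of the chart can be sketched in a few lines of base R. This is only an illustration under my own assumptions: the laps data frame is invented, the column names are borrowed from the lap time data used elsewhere on this blog, and the first lap is used as the baseline rather than the grid position.

```r
# Invented lap data for two drivers over four laps
laps <- data.frame(
  driverRef = rep(c("a", "b"), each = 4),
  lap       = rep(1:4, times = 2),
  position  = c(1, 1, 2, 2,   2, 2, 1, 1)
)

# Make sure the laps are in order within each driver
laps <- laps[order(laps$driverRef, laps$lap), ]

# Position at the end of the previous lap, per driver (NA on each driver's first lap)
prevPosition <- ave(laps$position, laps$driverRef,
                    FUN = function(p) c(NA, head(p, -1)))

# Keep only the laps where the position differs from the previous lap
changes <- laps[!is.na(prevPosition) & prevPosition != laps$position, ]
print(changes)
```

Plotting driverRef as a text label at each (lap, position) pair in changes would then give the bare bones of the chart.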

For a complete description of how to generate this chart using data from the ergast API, see the Wrangling F1 Data With R book.

Sunday, November 23, 2014

Maximising Team Points Hauls

With the final race of the 2014 season run, the Mercedes drivers' battle for the Drivers' Championship over, and the future of McLaren's drivers still uncertain, now may be a good time to ask how well the drivers supported each other in terms of maximising team points haul.

Let's start with the Mercedes. The following charts show how the drivers fared in terms of ranked position in each race, and points taken. The coloured drop line identifies which driver had the upper hand and also clearly indicates how far apart the drivers were.





In terms of points, the team's points haul across the rounds of the 2014 championship can be summarised using the following chart (final race points have been halved for the purposes of this chart):


The horizontal x-axis shows the number of points taken in a particular race by the highest placed driver in the team. The vertical y-axis shows the number of points taken in the corresponding race by the lower placed team-mate. The red line is the points maximisation line - points on the line show that the team maximised points in a race given the position of the highest placed driver in the team.

The numbers represent a count of races where a particular points combination occurred. The circle size is proportional to this value.
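One way of thinking about the points maximisation line is as a lookup: given the points taken by the higher placed driver, the most the team-mate can add is the points for the very next position. The sketch below assumes the 2014 scheme of 25, 18, 15, 12, 10, 8, 6, 4, 2, 1 for the top ten (with the final race halved, as in the chart); the function name is made up for this example.

```r
# Points for positions 1 to 10 under the 2014 scheme
pointsScheme <- c(25, 18, 15, 12, 10, 8, 6, 4, 2, 1)

# Best possible points for the lower placed team-mate,
# given the points taken by the higher placed driver
maxTeammatePoints <- function(p) {
  pos <- match(p, pointsScheme)  # finishing position; NA if no points scored
  ifelse(is.na(pos) | pos == 10, 0, pointsScheme[pmin(pos + 1, 10)])
}

best <- maxTeammatePoints(c(25, 18, 1, 0))
print(best)
```

So a win for one driver can be backed up by at most 18 points from the other, while a tenth place (or no points at all) leaves nothing for the team-mate to add.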

If we split the drivers out and generate co-ordinate points based on the points taken across the driver pairing for each race, we get the following style of chart.


This time, we have two guides representing the points support each driver offers the other. Marks away from the dotted line show how far away a driver was from maximising the team points haul based on the points taken by the higher placed driver in the team. If there are lots of marks in the lower right half of the chart, the driver on the vertical y-axis is the underperformer. If the marks appear in the top left half, the driver identified on the horizontal x-axis is the underperformer. Marks on the red dotted line show the x-axis driver was better placed, but team points were maximised. Conversely, marks on the blue dotted line show the y-axis driver was higher placed, but again, given that position, team points were maximised. If the team always maximised points, the magenta best fit line would be within the two dotted lines.

Here are the corresponding charts for McLaren.









These charts are working sketches and are likely to appear in some form in the Wrangling F1 Data With R book. Data used to generate the charts was obtained from the ergast API.


Friday, November 21, 2014

Lap Position Count Charts

Whilst putting together a simple routine to calculate the number of laps led by each driver from the ergast data, it struck me that we could count - and chart - the number of racing laps held in a particular position by each driver in a particular race.

The following chart summarises the 2012 Australian Grand Prix in this way.

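#Load the packages used below: plyr for ddply(), ggplot2 for the chart
library(plyr)
library(ggplot2)

#Count the number of laps each driver held each position for
posCounts=ddply(lapTimes,.(driverRef,position),summarise,poscount=length(lap))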

#Set the transparency relative to the proportion of the race in each position
alpha=function(x) 100*x/max(lapTimes$lap)
#Rotate the x-tick labels
xRotn=function(s=7) theme(axis.text.x=element_text(angle=-90,size=s))

g=ggplot(posCounts)
#For each driver, plot the number of laps in each race position
g=g+geom_text(aes(x=driverRef,y=position,label=poscount,alpha=alpha(poscount)),size=4)
g+theme_bw()+xRotn()+xlab(NULL)+ylab(NULL)

Drivers are aligned along the bottom according to rank position at the end of the race. (Drivers who were unclassified are ranked according to how far into the race they got and, where two or more unclassified drivers went out on the same lap, by the position they were in when they retired.)

The number shows the number of laps completed in each race position; the transparency level is also indicative of this value. The pink circle shows the position the driver was in on their last lap, its size proportional to the total number of laps the driver completed in the race. The empty grey circles show the drivers' grid positions.

Where the pink circle is off the diagonal, it shows that a driver was in a higher position at the point they exited the race than they were finally classified at. The larger the pink circle, the closer they were to the end of the race at the point they left it. So for example in this case, we see that Maldonado appears to have been in 6th position quite deep into the race, despite being ranked 13th in the end.

The large lap counts shared by Vettel and Hamilton for second and third position don't tell us how these were distributed - was Hamilton in second during a large part of the race, for example, then ceding to Vettel for the latter half of the race, or were they continually changing positions in a hard fought fight? To distinguish that, we would need to look to the actual lapchart, or another metric.
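One candidate for such a metric, offered here as a sketch rather than anything from the book, is to count the separate stints a driver spent in each position using base R's rle() (run length encoding). The positionStints function and the two position sequences are invented for the example.

```r
# Summarise a lap-by-lap position sequence as a series of stints
positionStints <- function(positions) {
  runs <- rle(positions)
  data.frame(position = runs$values, laps = runs$lengths)
}

# Two drivers who each spend three laps in 2nd and three laps in 3rd
settled <- positionStints(c(2, 2, 2, 3, 3, 3))  # one change of position
scrappy <- positionStints(c(2, 3, 2, 3, 2, 3))  # swapping every lap

print(settled)
print(scrappy)
```

Both sequences give the same lap counts per position, but the number of rows (two stints against six) separates the settled race from the scrappy one.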

A couple of other summary details that are missing from the chart include a total lap count for each driver, and an indication of the actual final classification of each driver (eg to distinguish those drivers that were unclassified).

The full recipe for creating this chart from data obtained from the ergast database can be found in the Wrangling F1 Data With R book.

Monday, November 17, 2014

F1 Drivers' Championship Showdown, 2014

I tweaked the code for my F1 Drivers' Championship winning combinations explorer to show how many points each driver could win - or lose - by in Abu Dhabi, assuming I've got my sums right.

So if Rosberg wins, he loses the championship by 3 points if Hamilton comes second, but takes it by 3 if Hamilton comes in third.

If Hamilton fails to finish, and Rosberg comes in 6th, Rosberg loses it by a single point. If Hamilton is 10th, and Rosberg 5th, or if Hamilton is 7th and Rosberg is third, Rosberg wins by a single point. And so on.
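The sums behind these scenarios are easy enough to sketch. The fragment below is not the code behind the explorer; it assumes the double points awarded at the 2014 finale and a 17 point lead for Hamilton going into the race, which is the gap implied by the first scenario above. The function names are made up for the illustration.

```r
# Double points for positions 1 to 10 at the final round
doublePoints <- 2 * c(25, 18, 15, 12, 10, 8, 6, 4, 2, 1)

# Points for a finishing position; NA (did not finish) or outside the top ten scores nothing
finishPoints <- function(pos) {
  if (is.na(pos) || pos > 10) 0 else doublePoints[pos]
}

# Hamilton's lead over Rosberg before the race (assumed, see above)
hamiltonLead <- 17

# Rosberg's final margin over Hamilton: positive means Rosberg takes the title
rosbergMargin <- function(rosbergPos, hamiltonPos) {
  finishPoints(rosbergPos) - finishPoints(hamiltonPos) - hamiltonLead
}

rosbergMargin(1, 2)   # Rosberg wins the race, Hamilton second
rosbergMargin(1, 3)   # Rosberg wins the race, Hamilton third
rosbergMargin(6, NA)  # Rosberg sixth, Hamilton fails to finish
```

A negative margin means Hamilton takes the championship by that many points.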

See the interactive version here.

Elements of this recipe may form part of a forthcoming chapter in the Wrangling F1 Data With R book.

Saturday, November 8, 2014

F1 2014 Championship Race - Round 17 Results

In the previous post I demonstrated a couple of charts that showed the evolution of the drivers' championship race up to round 17. In this post, I increase the information density of the lapchart styled display in two different ways through the use of text annotations.

As before, the data comes from the ergast API:

#Load in the core utility functions to access ergast API
source('ergastR-core.R')

#Get the standings after each of the first 17 rounds of the 2014 season
df=data.frame()
for (j in seq(1,17)){
  dft=seasonStandings(2014,j)
  dft$round=j
  df=rbind(df,dft)
}
#Data is now in: df

The returned data contains the championship standing, and points to date, for each driver at the end of each round. We can derive further data elements from it:

#plyr provides arrange(), desc() and ddply()
library(plyr)

#Sort the data by ascending round and position
df=arrange(df,round,pos)
#Find how many points behind the driver ahead each driver is (zero or negative values)
df=ddply(df,.(round),transform,diffbehind=diff(c(points[[1]],points)))

#Sort by ascending round and descending position
df=arrange(df,round,desc(pos))
#Find how many points ahead of the driver behind each driver is
df=ddply(df,.(round),transform,diff=diff(c(points[[1]],points)))
#Derive how many points each driver scored in each race
df=ddply(df,.(driverId,year),transform,racepoints=diff(c(0,points)))
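If the diff() idiom used here looks a little opaque, a toy example may help. The numbers below are invented and simply show what the idiom returns for a single round, and for a single driver's running points total.

```r
# Points totals for three drivers in one round, championship leader first (made up)
points <- c(100, 80, 75)

# Padding the vector with its own first value makes the first difference zero;
# each later value is the gap to the previous row
gapToRowAbove <- diff(c(points[[1]], points))
print(gapToRowAbove)

# Reversing the order flips the sense of the comparison
gapReversed <- diff(c(rev(points)[[1]], rev(points)))
print(gapReversed)

# Running points total for one driver over three rounds (made up);
# padding with zero recovers the points scored in each race
cumulative <- c(25, 43, 43)
racepoints <- diff(c(0, cumulative))
print(racepoints)
```

With the leader first, the gaps come out as 0, -20 and -5; in reverse order they are 0, 5 and 20; and the running total of 25, 43, 43 unpacks to race scores of 25, 18 and 0.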

As before, we can generate a base chart:

library(ggplot2)
library(directlabels)

#The base chart
g=ggplot(df,aes(x=round,y=pos,group=driverId))

charter=function(g) {
  g=g+geom_line()
  #Remove axis labels and colour legend
  g=g+ylab(NULL)+xlab(NULL)+guides(color="none")
  #Add a title
  g=g+ggtitle("F1 Drivers' Championship Race, 2014")
  #Add the line labels, resized (cex), and with an x-value offset
  g=g+geom_dl(aes(label=driverId),method=list("last.points",cex=0.7,dl.trans(x=x+0.2)))
  #Add right hand side padding to the chart so the labels don't overflow
  g=g+scale_x_continuous(limits=c(1,20))
  g
}

g=charter(g)

Let's annotate the chart - firstly with data showing the number of points gained at each race. As previously, crossed lines show changes in championship standing between consecutive rounds:

g+geom_text(data=df,aes(label=racepoints),vjust=-0.4,size=3)


That's okay, insofar as it goes, but we could perhaps add in colour relative to the number of points scored in each race to highlight the higher values a little more clearly.

g+geom_text(data=df,aes(label=racepoints,col=racepoints),vjust=-0.4,size=3)


The default colour scheme scales from black to light blue. The higher values look a little washed out to me, making me think it might be worth exploring other colour mappings to highlight the higher values more clearly.

Annotating the chart with points scored per race helps us see how well each driver fared in a particular race, but the chart does not give us a sense of how many points separate drivers in the championship standings at the end of each round. We can address this by using the total number of championship points scored to date as the text label, preserving an indication of the number of points awarded for each race by using the colour dimension.

g+geom_text(data=df,aes(label=points,col=racepoints),vjust=-0.4,size=3)+scale_color_continuous(high='red')


Looking down a column, we can compare the number of points separating drivers in the drivers' championship at the end of each round. From the colour field we can see how drivers placed next to each other compared in terms of points awarded in each round. Looking along a line, we can (if necessary) calculate the number of points obtained in a particular round as a simple subtraction.

Elements of this recipe may form part of a forthcoming chapter in the Wrangling F1 Data With R book.