Saturday, November 1, 2014

Hamilton Chases Record Streak of Consecutive F1 Wins in the Same Season By a British Driver

So it seems that in the run-up to the United States Grand Prix at the Circuit of the Americas, Lewis Hamilton "is attempting to become the first British driver since Nigel Mansell in 1992 to win five races in a row" (Guardian, "Futures of Force India and Sauber become subject of speculation").

One of the new chapters I pushed yesterday to Wrangling F1 Data With R covers "streakiness", so I thought I'd use the routines described there to review in-season winning streaks of five or more races from previous seasons, using data from the ergast database.

As a first (optimisation) pass, I thought I'd identify British drivers who have won 5 or more races in a season; this could then be followed by looking for streaks of 5 or more wins by those drivers within their multiple-win seasons.

Firstly, we can get the drivers of a particular nationality with multiple wins within a season by querying the ergast database using a query along the lines of the following (britishWinners is just a placeholder name for the data frame of results):

britishWinners = dbGetQuery(ergastdb,
 'SELECT driverRef, d.driverId, nationality, MAX(wins), year
 FROM driverStandings ds JOIN races r JOIN drivers d
 WHERE ds.raceId=r.raceId AND ds.driverId=d.driverId
 AND ds.driverId IN (SELECT DISTINCT driverId FROM drivers WHERE nationality="British")
 GROUP BY year,d.driverId
 HAVING MAX(wins)>=5')

This gives us a set of results of the form:

    driverRef driverId nationality MAX(wins) year
1       clark      373     British         7 1963
2       clark      373     British         6 1965
3     stewart      328     British         6 1969
4     stewart      328     British         6 1971
5     stewart      328     British         5 1973
6        hunt      231     British         6 1976
7     mansell       95     British         5 1986
8     mansell       95     British         6 1987
9     mansell       95     British         5 1991
10    mansell       95     British         9 1992
11 damon_hill       71     British         6 1994
12 damon_hill       71     British         8 1996
13   hamilton        1     British         5 2008
14     button       18     British         6 2009
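
As a rough illustration of what the GROUP BY / HAVING part of that query is doing, here is a minimal base R sketch of the same group-and-filter step. It runs on a small made-up data frame rather than the ergast database; the driver names, the values and the standings, seasonWins and fivePlus names are all assumptions for the sake of the example.

```r
# Toy stand-in for the driverStandings table: the wins column holds the
# cumulative number of wins after each race (made-up values).
standings <- data.frame(
  driverRef = c("driver_a", "driver_a", "driver_a",
                "driver_b", "driver_b", "driver_b"),
  year      = c(2001, 2001, 2001, 2001, 2001, 2001),
  wins      = c(3, 5, 6, 1, 2, 2)
)

# Rough equivalent of GROUP BY year, driverId with MAX(wins):
# the season win total is the largest cumulative count seen that year.
seasonWins <- aggregate(wins ~ driverRef + year, data = standings, FUN = max)

# Rough equivalent of HAVING MAX(wins) >= 5.
fivePlus <- seasonWins[seasonWins$wins >= 5, ]
print(fivePlus)
```

Because the standings table records a running total, taking the maximum per driver and year recovers the number of wins for the season.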

We can then generate streak reports for each of those drivers in each of those years using the streakReview() function, identifying the following in-season streaks of five or more wins by a British driver (britishWinners is a placeholder name for the data frame of query results):

library(plyr)  # provides ddply() and .()
ddply(britishWinners, .(driverRef,year), function(x) streakReview(x$driverRef,length=5,topN=1,years=x$year,typ=1))

  driverRef year start end l                startc                          endc starty
1     clark 1965     1   6 6 Prince George Circuit                   Nürburgring   1965
2   mansell 1992     1   5 5               Kyalami Autodromo Enzo e Dino Ferrari   1992
  endy brokenbyy                    brokenbyc
1 1965      1965 Autodromo Nazionale di Monza
2 1992      1992            Circuit de Monaco

  • In 1965, Jim Clark won the first six races he entered that season (he did not contest round two at Monaco), starting with a win at Prince George Circuit, with the last win in the streak at the Nürburgring.
  • In 1992, Nigel Mansell won the first five rounds of the season, starting at Kyalami, with the final win of the streak at Autodromo Enzo e Dino Ferrari.
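
The streakReview() function itself comes from the book and its implementation isn't shown here. As a rough sketch of the underlying idea only, and not the book's code, runs of consecutive wins can be found with base R's run-length encoding function, rle(). The winStreaks() helper and the sequence of finishing positions below are made up for illustration.

```r
# Hypothetical helper: find every run of consecutive wins that is at
# least minLength races long.
# positions: finishing positions in race order (1 = win).
winStreaks <- function(positions, minLength = 5) {
  isWin  <- !is.na(positions) & positions == 1  # treat missing results as non-wins
  runs   <- rle(isWin)                          # run-length encode win / non-win
  ends   <- cumsum(runs$lengths)                # index of the last race in each run
  starts <- ends - runs$lengths + 1             # index of the first race in each run
  keep   <- runs$values & runs$lengths >= minLength
  data.frame(start = starts[keep], end = ends[keep], l = runs$lengths[keep])
}

# Made-up sequence: six wins in a row, a fourth place, two more wins, a third.
positions <- c(1, 1, 1, 1, 1, 1, 4, 1, 1, 3)
streaks <- winStreaks(positions, minLength = 5)
print(streaks)
```

Here start and end index into the driver's own sequence of races, so a race the driver did not enter does not break a streak.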

For more detailed code examples on wrangling Formula One data with R, see the Wrangling F1 Data With R book.
