Monday, October 14, 2013

QBs Don't Have (Passing) Rhythm

While I have generally been able to maintain a semi-weekly update schedule, the next few months are going to be quite busy (that thesis isn't going to write itself). I will try to keep updating regularly, but I'm not making any promises. Feel free to check the blog regularly to see if there are updates, but if you'd rather use a more efficient method to see when I update put your email address into the box on the right (under 'Get Email Updates') and you'll automagically get an email each time I make a new post. If you're more into twitter, following me @PhDfootball will also let you know when I post (and has the added bonus of giving you direct access to all of my delightfully informative thoughts and comments). Regular updates will hopefully resume after the new year, when the title of this blog will be even more accurate!
'Rhythm' is an often-used buzzword in football circles, especially pertaining to a quarterback who is known for being inconsistent. To take a quantitative look at this concept I break down each pass as a function of the one thrown before it, looking for evidence that completing a pass can jump-start a passer into completing more. While this analysis is admittedly superficial, it's a good starting point to tackling this subject. Ultimately, there is no evidence that one pass completion begets another, an argument against the idea that QBs can get into a rhythm.

Last season the Jets tried to run an offense involving two quarterbacks, with Mark Sanchez running the regular offense and Tim Tebow coming in to run wildcat-style plays. This was an unarguable failure.

A common reason given by announcers and sportswriters for this unconventional scheme's lack of success was that it never allowed one quarterback to "get into rhythm." That certainly seemed true enough; several times Sanchez would complete a couple of nice passes, then Tebow would come in and run for a few yards, then the drive would stall out once Sanchez came back in.

This is, of course, the same reason given for the failure of Tom Landry's plan to let Craig Morton and Roger Staubach alternate snaps for an entire game (a loss) during the 1971 season.

As usual, there's never any attempt by the announcers to explain what 'rhythm' is or how to tell if a quarterback is in it; this wishy-washy term is generally used as a catch-all to explain why a signal-caller is (or isn't) playing well.

But maybe there is some truth to the idea. There is plenty of anecdotal (and some scientific!) evidence for players getting into 'the zone' during a game, which certainly sounds similar to the concept of 'rhythm'. And football commentators have been using the term for as long as I can remember without any pushback or criticism.

Let's take a look at the concept of QB rhythm (I'll drop the pretentious quoting from this point onward), first attempting to define it in a quantitative manner and then looking at data to determine its validity.

For this experiment I need play-by-play data, which (as usual) comes from Armchair Analysis. Next we need to decide which statistics could be used to quantify how much a QB is in rhythm. What would be an observable signature of a quarterback in rhythm?

The obvious choice is completions. Generally a QB who is in rhythm should be completing several passes in a row, while you would expect a passer who is out of rhythm to be very scattershot. It's difficult to look at completion streaks, as drives can be of variable lengths and we could accidentally bias ourselves towards looking only at very good quarterbacks, who are more likely to have long completion streaks in the first place.

Therefore we'll look at the effect a completion has on just the next pass. While not perfect, this will at least minimize the risk of bias. Additionally, to avoid including situations where one team is being blown out and throwing every play, we'll only include data from the first three quarters of games.

First of all, over the entire sample the completion percentage is a healthy but unspectacular 56.8%. If a quarterback can get into rhythm by completing passes, we'd expect the overall completion percentage on passes attempted after a completed pass to be higher than this overall figure.

Interestingly, it turns out that the opposite is true. If you only look at pass plays coming directly after a completion, NFL QBs have a completion percentage of 56.2%. If you loosen the restriction and check the completion rate on the next pass specifically (even if there were several runs in between), the completion percentage is 56.3%.
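To make the conditioning concrete, here's a minimal sketch of how the 'next pass after a completion' rate can be computed from play-by-play data. The schema (drive id, play type, completion flag) and the toy numbers are illustrative, not the actual Armchair Analysis tables.

```python
import pandas as pd

# Toy play-by-play sample (not the real Armchair Analysis schema):
# one row per play, in order, with drive id, play type, and completion flag.
plays = pd.DataFrame({
    "drive": [1, 1, 1, 1, 2, 2, 2],
    "type":  ["PASS", "PASS", "RUSH", "PASS", "PASS", "PASS", "PASS"],
    "comp":  [1, 1, 0, 0, 0, 1, 0],
})

passes = plays[plays["type"] == "PASS"].copy()

# Completion rate over all passes
overall = passes["comp"].mean()

# Completion flag of the previous pass on the same drive (runs may intervene);
# the shift never crosses drive boundaries, so each drive's first pass drops out.
passes["prev_comp"] = passes.groupby("drive")["comp"].shift(1)
after_completion = passes.loc[passes["prev_comp"] == 1, "comp"].mean()
```

The same `groupby`/`shift` pattern works unchanged on a full season of data.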

Now, it might be that our data are somewhat biased toward lower completion percentages, because conditioning on a prior completion throws out the first pass of each drive. Therefore we might expect a slightly lower completion percentage than the total 56.8% figure.

To check this possibility I did 1000 random resamplings of the data, keeping the drive structure constant but shuffling the type of each play (and its result). For both scenarios this test produced completion percentages of 56.8±0.2%, exactly the same as the overall completion percentage. So if anything, completing their previous pass seems to make quarterbacks slightly more likely to misfire on their next.
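The resampling test can be sketched as follows: hold each drive's length and play mix fixed, but shuffle the ordering, then recompute the conditional completion rate each time. The drive data here are made up for illustration.

```python
import random

# Sketch of the shuffle test. Each drive is a list of (is_pass, completed)
# tuples -- illustrative data, not the real dataset.
random.seed(0)
drives = [
    [(1, 1), (0, 0), (1, 0), (1, 1)],
    [(1, 0), (1, 1), (1, 1), (0, 0)],
]

def after_completion_rate(drive_list):
    """Completion rate on passes whose previous pass (same drive) was completed."""
    made, attempts = 0, 0
    for drive in drive_list:
        prev_pass_comp = None
        for is_pass, completed in drive:
            if is_pass:
                if prev_pass_comp == 1:
                    attempts += 1
                    made += completed
                prev_pass_comp = completed
    return made / attempts if attempts else float("nan")

# 1000 resamplings: shuffle each drive, recompute the conditional rate
rates = [after_completion_rate([random.sample(d, len(d)) for d in drives])
         for _ in range(1000)]
mean_rate = sum(rates) / len(rates)
```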

Discussion and Conclusions
So what gives? While I'll be the first to admit that this analysis is by no means perfect, it seems pretty clear that this line of inquiry shows no evidence of QBs getting into a rhythm. At the very least we can now say that a QB who has completed a couple of passes in a row is no more likely than usual to keep the streak going.

One important caveat, especially for the Tebow-Sanchez and Morton-Staubach situations, is that this analysis covers drives where, the vast majority of the time, the QB stayed on the field for every play. Even for wildcat plays the quarterback usually lines up at wide receiver rather than going to the bench - in this way the surprise of the playcall is preserved until the offense breaks their huddle.

With the data currently at my disposal, I can't distinguish between plays where the QB is on the field and those where he is not. Even with that information, there are so few instances where the QB does leave the field during a drive that finding any signal amongst the noise would likely be impossible.

Despite these (very reasonable) concerns, the case against QB rhythm seems fairly strong. While I could believe that quarterbacks get into zones over the course of a season, it doesn't appear to happen on a drive-by-drive basis.

Monday, September 30, 2013

Not All Fumbles Are Created Equal

A fumble can be a key play in a football game, where a single turnover can be the difference between a win and a loss. Recovering a fumble is therefore a critical act. While the recovery itself is a mostly random event, the location of the fumble can significantly alter the odds that the defense will recover it. Fumbles behind the line of scrimmage are more likely to be recovered by the offense, while fumbles after a successful rush or pass are more likely to get scooped up by the defense.

Nothing in football can change the momentum of a game faster than a turnover. A positive turnover differential is highly correlated with winning, so it's no wonder that teams are constantly talking about committing fewer of them. While interceptions are generally caused directly by poor decision-making from the quarterback, the apparently random nature of fumbles makes them that much more exciting (and vexing, when your team is the one doing the fumbling).

Of course, fumbles aren't really random. Usually a player doesn't just accidentally drop the football, and defensive players are taught to hold offensive players up while their teammates attack the ball. However, the act of recovering a fumble is generally considered to be a random event, one that is entirely based on luck. (I'm not quite as convinced of this assertion as the sites I just linked; I've seen too many players try to pick the ball up when they should have fallen on it, or fall on it only to have the ball squirt away. But testing this is not the focus of this post, so I'll leave it be for now.)

It's important to recognize that this does not mean that all fumbles have the same probability of being recovered by a certain team—you wouldn't want to use fumble recoveries as a random number generator, for instance. The more players on the defense near the fumble, the more likely one will make the recovery. Conversely, if only the fumbling player is aware that he's fumbled (such as on the quarterback-running back exchange), the offense will be more likely to recover. By this logic, a team's chance of recovering a fumble should be strongly dependent on where the fumble occurs relative to the line of scrimmage.

Data come from the Armchair Analysis database, which I queried for all plays which resulted in fumbles, as well as all subsequent plays (to determine whether the fumbling team maintained possession). To avoid potential errors in this method of determining the recovering team, I excluded fumbles occurring on fourth down. To avoid biases from teams altering their strategy at the end of a half, I only used data from the first and third quarters. As usual all errors are bootstrapped.

First I selected only fumbles made by offensive players—specifically QBs, RBs, and WRs (I lumped TEs in with the wide receivers). From here, computing the fraction of fumbles recovered by the defense is relatively simple, and it turns out that overall the defense recovers 54.8±1.0% of all offensive fumbles—slightly (but statistically significantly) more than half. This is not hugely surprising, given that the defense is much more focused on whoever has the ball than offensive players are.
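The bootstrapped uncertainty on the recovery fraction takes only a few lines; the 0/1 recovery list below is fabricated to mimic a ~54.8% rate, not the real fumble sample.

```python
import random

# Bootstrap error on the defensive recovery fraction. The 0/1 list
# (1 = defense recovered) is fabricated for illustration.
random.seed(0)
recovered_by_defense = [1] * 548 + [0] * 452

def bootstrap_error(data, n_resamples=1000):
    """Standard deviation of the mean across bootstrap resamples."""
    n = len(data)
    means = []
    for _ in range(n_resamples):
        resample = [data[random.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    mu = sum(means) / n_resamples
    return (sum((m - mu) ** 2 for m in means) / n_resamples) ** 0.5

rate = sum(recovered_by_defense) / len(recovered_by_defense)
err = bootstrap_error(recovered_by_defense)
```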
Figure 1: Breakdown of fumble recovery probability as a function of position relative to the line of scrimmage. The horizontal red bar shows the overall defensive fumble recovery rate, while the bins are shaded proportionally to which offensive positions are responsible for the fumbles.

Figure 1 shows the defense's fumble recovery rate as a function of field position, split up into bins with roughly even numbers of fumbles per bin to maintain a constant signal-to-noise ratio. The histogram has also been split up into positions, showing who is responsible for the lost fumbles.
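The equal-count binning described above can be done with pandas' quantile binning; the fumble records below are invented for illustration.

```python
import pandas as pd

# Equal-count bins of fumbles by yards relative to the line of scrimmage,
# so every bin carries a similar statistical weight. Invented data.
fumbles = pd.DataFrame({
    "yards":        [-12, -3, -1, 0, 1, 2, 3, 4, 5, 6, 7, 9, 12, 18, 25, 40],
    "def_recovery": [  1,  0,  0, 0, 1, 1, 0, 1, 1, 1, 0, 1,  1,  1,  0,  1],
})

# Five bins with roughly equal membership; duplicates='drop' guards
# against repeated bin edges in small samples.
fumbles["bin"] = pd.qcut(fumbles["yards"], q=5, duplicates="drop")

# Defensive recovery rate and sample size per bin
per_bin = fumbles.groupby("bin", observed=True)["def_recovery"].agg(["mean", "size"])
```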

The most striking feature in Figure 1 is the clear dichotomy between fumbles that occur just behind the line of scrimmage and the ones that occur after positive yards have been gained. This makes intuitive sense: most of the fumbles behind the line of scrimmage are likely occurring in the center-quarterback or quarterback-running back exchange, which happen before the defense has had a chance to get into the backfield. (The uptick in defensive recoveries more than ~10 yards behind the line of scrimmage is almost certainly due to strip-sacks.) Once the offense gets beyond the line of scrimmage, however, most fumbles are going to be directly caused by the defense, in a region of the field where defensive players greatly outnumber the offense.

Interestingly, a fumble on a very successful play, one that gains more than 20 yards, isn't more likely to be recovered by the defense than a fumble on an average play. I'm not sure exactly why that is, but it may be that a larger proportion of long plays end up near a sideline, and therefore any fumbles have a higher likelihood of going out of bounds. Since in this analysis a fumble out of bounds counts as an offensive recovery, this could artificially depress the defensive recovery rate.

Discussion and Conclusions
It's clear that the location of a fumble is of significant importance, as there is a ~20% swing in a defense's chance of recovery with just a few yards' change in position. A quarterback that drops the snap from the center will generally only be responsible for a wasted down, but a receiver who catches a 5-yard quick slant and can't hold on is likely to be the direct cause of a turnover. It's no wonder that running backs who fumble rarely last very long in the NFL; most of their runs will end up right in the range where the D is most likely to come up with a recovery.

Monday, September 16, 2013

Penalties II: Crowd Noise

Crowd noise is generally considered to be a contributing factor in causing false-start penalties on visiting teams. Some stadiums are well-known for focusing deafening amounts of noise on the field, usually near the endzones. To determine if crowd noise really does cause false starts, I compared the discrepancy between false starts called on the home and away teams as a function of distance from an endzone. While the away team's excess false-start rate appears weakly correlated with proximity to an endzone, the correlation is not statistically significant.

In Part I of my investigation into penalties, I found that there was a statistically significant discrepancy in the number of false-start penalties between the home and away teams. At the time I chalked this result up to crowd noise and focused my analysis on other types of penalties, but I later realized that, while frequently quoted as fact, I've never seen any hard evidence on the subject.

If crowd noise does affect a visiting offense's snap count, it's logical to expect that the effect will be largest when fans surround the field on three sides—near the endzones. This provides a way to isolate crowd noise from other variables surrounding false-starts, e.g. the possibility that traveling to away games makes teams more prone to jumping before the snap, or that referees are somehow biased even for such apparently cut-and-dried calls.

As usual the data come from Armchair Analysis. For this project I created a new table containing information on the field location of all plays as well as whether a penalty was called—because of the huge number of plays in the database this query took ~8 hours and therefore was not feasible to do on the fly during the analysis.

The percentage of plays that result in false-start penalties as a function of field position is shown in Figure 1. I'm not sure what's going on when the offense is backed up by their own endzone; I've checked the data and didn't find anything out of the ordinary. It's possible this is due to the relatively small number of samples that close to the goal line—regardless of the cause, it doesn't appear to affect the rest of the data, so I will simply exclude this bin from the remainder of the analysis.
Figure 1: Percentage of plays which result in false-starts as a function of distance from the offense's goal line. Black points are for the home team, while red points show penalties committed by the away team. X error bars show the range of yards included in each bin, while y error bars are the bootstrapped uncertainties.
At just about every point on the field the away team commits more false-starts, which is unsurprising given what we already knew. If you squint just right, however, it does appear the away team gains 'penalty parity' around midfield.

As always, trusting your eyes is a poor way to do statistics. Therefore I ran a correlation analysis, folding the data at the 50 to test raw distance away from the nearest endzone rather than position on the field. The result is a fairly weak correlation (Spearman ρ of 0.28) that is far from statistically significant (p-value of 0.24).
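The fold-and-correlate step looks roughly like this; the per-bin numbers are made up, and `scipy` provides the Spearman test.

```python
from scipy.stats import spearmanr

# 'Fold' the field at the 50: only distance to the nearest endzone matters.
# Per-bin yard lines (offense's goal line = 0) and the away team's excess
# false-start rate are made up for illustration.
yardline = [10, 20, 30, 40, 50, 60, 70, 80, 90]
away_excess = [0.45, 0.35, 0.30, 0.20, 0.10, 0.15, 0.25, 0.30, 0.40]

dist_to_endzone = [min(y, 100 - y) for y in yardline]
rho, p = spearmanr(dist_to_endzone, away_excess)
```

With these toy numbers the excess shrinks as you move away from the endzones, so the correlation comes out negative; the real data were far noisier.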

Discussion and Conclusions
Our eyes do lie, apparently, at least about the statistical significance of the correlation. It's difficult to figure out what to make of this (non) result—the original hypothesis certainly seemed reasonable, and I don't believe that crowd noise has zero effect on false-starts. You can certainly make a strong argument for a correlation based on watching some of these false starts happen.

So why no significant correlation? Well, it certainly is possible that crowd noise actually isn't playing a huge role after all, although I don't have another explanation for why referees would be more likely to call a false-start on the visiting team. It's also possible that my original assumption, that crowd noise is amplified near endzones, is incorrect. Another option is to look at Figure 1 and note that the correlation appears to be more significant in the offense's own end of the field: maybe fans are more vocal when the away team is starting a drive, but as the offense moves down the field they get quieter.

Ultimately, the only way to know for sure just how crowd noise affects the game would be to attach sound meters to the players. The NFL may already do this, for all I know; they already mic the players and coaches, so at the very least crowd noise information should be (in principle) recoverable from these recordings. If I ever get my hands on this sort of data I will certainly try this sort of analysis again, but for now the stats I have on hand are just too rough to make any definitive conclusions.

Monday, September 2, 2013

Quarterback Rating II: Let the Rookie Sit

Many franchises use high draft picks on quarterbacks, rightly understanding their importance to a well-functioning team. There is enormous pressure to start these players right away, but is that a good idea? Based on players' peak quarterback ratings, the answer appears to be no. Whether this is because the pressure of the job breaks fragile young QBs or because the teams most likely to start a rookie passer are also the most likely to have other problems is unclear, but either way the results suggest that teams should be wary about throwing young quarterbacks right into the fire.

As mentioned in my last post, the quarterback is the focal point of every NFL offense; all offensive plays run through his hands. Unsurprisingly, teams place heavy emphasis on the selection and training of talent at the position. Promising (and even some not-so-promising) quarterbacks are hot commodities—since 1990 nearly 60% of first overall picks in the NFL draft have been QBs.

The pressure on these passers is intense, especially from teams which expect immediate production from their rookie signal-caller. Even teams who intend to let the new QB learn from the bench for his first year frequently find their plans changed by injuries or pressure from fans.

But is it good for young quarterbacks to get rushed into starting roles like this? There are certainly QBs that find success after starting as rookies - Peyton Manning comes to mind, and Russell Wilson and Robert Griffin III certainly seem to be in good shape. But for each success story there are plenty of high-profile failures.

Certainly for at least some of those quarterbacks there are other good reasons why they didn't live up to expectations, but considering how important the position is, the ratio of successes to failures seems frighteningly low. Of course, on this blog gut feeling isn't good enough; let's see what we can prove.

Data once again come from Armchair Analysis, using the same queries as in the last post to compute seasonal QB ratings for every passer in the database.

It is important to note that while a truly heroic feat of data collection and organization, the Armchair Analysis database is not perfect. While inconsistencies appear to be minor (at worst) for most of the statistics, there seem to be somewhat larger issues with data on when players were drafted.

For instance, Carson Palmer is listed as being a rookie in 2004, but was actually drafted in 2003. The database also doesn't handle players taken in the supplemental draft very well. Overall, however, the data quality seems to be very good, and I'm confident that the results are not significantly biased by any typographical mistakes.

In order to produce as unbiased a sample as possible, I restricted the investigation to quarterbacks who have at least four seasons, including their rookie season, in the database. Additionally, the quarterback must have thrown more than 150 passes in at least one of those seasons.

Determining a reliable measure for quarterback skill is a non-trivial task; ESPN's Total Quarterback Rating involves "several thousand lines of code" and the website I just linked implies that advanced computational techniques, such as machine vision, are involved. I (sadly) don't have the amount of time necessary to do something this complex, so I'll be sticking to the regular old-fashioned QB rating.

An additional roadblock comes from grading quarterbacks over their careers. As it turns out, a QB's passer rating in one season is a surprisingly poor predictor of their rating in the following season.

While it's clear that QB rating is an imperfect measure of a passer's skill, it still works reasonably well as an overall gauge of competence at the position. To try to avoid the year-to-year issues I'll only look at a quarterback's peak QB rating—their absolute best season.

Alright, that was a lot of explanation, so now let's get to the good stuff. Figure 1 shows a histogram of the maximum QB ratings of the entire sample (in gray). The majority of signal-callers in the sample have peak QB ratings between ~75 and ~90, with an average peak QB rating of 83. But there's a fair amount of variance in the sample, from 55 (Mike McMahon, who managed to start seven games for the ill-fated 2005 Eagles) to 118 (2011 Aaron Rodgers, who actually posted a 123 rating in the regular season; the data here include his poor performance in the Packers' playoff loss).

Figure 1: Peak QB ratings of the sample.
I've broken this sample down in two ways. First, I've selected all quarterbacks who threw 150+ passes in their rookie year (purple histogram). I next select QBs who have met the passing criterion for at least 4 seasons (gold histogram). While not perfect, passers who stick around in the NFL for several seasons are going to be the best and most reliable quarterbacks, so length of tenure is a good proxy for the most skilled players.

The results are notable—the 4+ year starters have a uniformly higher peak QB rating than the group who saw significant action as rookies. In fact, the veteran QBs are responsible for all seasons with a QB rating above ~90, while no passer is given 4+ years in the league as a starter without at least one rating above 70.

Not only do these two distributions look different, they are statistically distinguishable. The standard test of whether two samples of data come from the same underlying distribution is the Kolmogorov-Smirnov test (usually abbreviated as the KS test). This test says that the two sub-samples are distinct with only an 8% chance of error. However, we know that the rookie-starter and long-tenured QB distributions are both drawn from the same parent sample, and there is clearly overlap between the two sub-distributions.

The net result of these facts is that there is an even smaller chance of error than the KS test would indicate. A full Monte Carlo simulation indicates that there is actually a 99.3% certainty that these two distributions are distinct.
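The KS test plus the Monte Carlo check can be sketched as follows, using fabricated peak ratings: pool the two groups, repeatedly re-split them at random into groups of the original sizes, and count how often the KS statistic is at least as extreme as observed.

```python
import random
from scipy.stats import ks_2samp

# Illustrative peak ratings -- not the real sub-samples.
random.seed(1)
rookie_starters = [62, 66, 71, 74, 76, 79, 81, 84, 88]
long_tenured = [72, 78, 82, 85, 88, 91, 95, 99, 104, 110]

# Standard two-sample KS test
stat, p_ks = ks_2samp(rookie_starters, long_tenured)

# Monte Carlo: re-split the pooled sample at random, same group sizes
pooled = rookie_starters + long_tenured
n1 = len(rookie_starters)
trials, extreme = 2000, 0
for _ in range(trials):
    random.shuffle(pooled)
    s, _ = ks_2samp(pooled[:n1], pooled[n1:])
    if s >= stat:
        extreme += 1
p_mc = extreme / trials  # Monte Carlo p-value under the null
```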

Discussion and Conclusions
This result, in its barest form, means that the conditions that result in a quarterback starting as a rookie are different from those that lead to success in the NFL. Note that it does not necessarily mean that a rookie quarterback will not have a long, productive career; there is some overlap between the two sub-distributions. Nor does it mean that the solution to the problem is to make sure all rookie QBs stay far away from the field; the teams that are in a position to redshirt their first-year passers are also the ones most likely to already be in better position to protect their investments when they do make it to the field—for instance the Packers with Aaron Rodgers or the Patriots with Ryan Mallett.

But despite these caveats, these findings are still quite interesting, and indicate that quarterbacks who see significant rookie action tend to have lower ceilings than signal-callers who are rested at the start of their careers. This result indicates that teams who are looking to use a high draft pick on a (hopefully) franchise quarterback should resist the urge to play him right away, and maybe consider upgrading their other positions of need first before drafting their QB of the future. So the next time your favorite team passes on a flashy gun-slinger in order to draft a boring left tackle, don't judge them too harshly.

Monday, August 19, 2013

Quarterback Rating I: Year-to-Year Progression

Using quarterback ratings I've charted out a QB's average improvement from his first season as the starter. On average a QB sees only a minor ~10-point rating boost in his second year, with his rating remaining flat (or lower) for the rest of his career. Additionally, very few (~20%) players will ever have a season with a QB rating more than 20 points higher than their first year. These results indicate that a quarterback's first season is a reliable indicator of their future success, and that passers who struggle in the early stages of their career are unlikely to show significant long-term improvement.

As the guy responsible for handling the ball on every single offensive play, the quarterback is unambiguously the most important player on a team. So when a team drafts a new quarterback the pressure is extremely high - both on the player to perform to expectations and on the management to ensure they're getting a good return on their (significant!) investment.

In recent years QBs have been asked to step in and start as rookies with increasing frequency. Last year saw a record 5 rookie signal-callers taking the majority of their team's snaps. While this year's draft appears to have a definite lack of QBs ready to start immediately, it's a virtual certainty that a few desperate teams will roll the dice on their shiny new gunslingers.

With the importance of the quarterback position and the proliferation of young, untested starters, it's critical for teams to accurately evaluate QBs, not only as college prospects but even while they're playing in the NFL. While there is clear value in exploring how quarterback talent is evaluated for the NFL draft, the sheer number of college teams, and the limited opportunities for the best players to play against each other, make it very difficult to perform such an analysis without more advanced tools.

Fortunately, charting the progression of quarterbacks once they enter the NFL is also interesting, and somewhat easier due to the small number of teams and the high level of competition. A good manager is always watching how their players progress, and it's highly relevant to know whether a struggling QB is merely inexperienced or a hopeless cause. There are several ways to dig into this topic: for now I'll focus on computing the average year-to-year progression of NFL quarterbacks as a general barometer for how a QB should be expected to develop.

As usual the data come from the Armchair Analysis database. I first queried the database for the identifying information for all QBs, then fed that into a query which returned all game stats for each QB. From there season totals were computed.

Finally, the seasonal QB rating for each quarterback was determined. Because the QB rating can be highly biased if a passer only has a small number of attempts in a given year, I only took ratings from seasons in which the QB threw at least 150 passes.
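For reference, the traditional NFL passer rating is a fixed formula of season totals: four per-attempt components, each clamped to the range [0, 2.375], averaged and scaled by 100.

```python
def passer_rating(attempts, completions, yards, tds, ints):
    """Traditional NFL passer rating from season (or game) totals."""
    def clamp(x):
        return max(0.0, min(x, 2.375))
    a = clamp(((completions / attempts) - 0.3) * 5)   # completion percentage
    b = clamp(((yards / attempts) - 3) * 0.25)        # yards per attempt
    c = clamp((tds / attempts) * 20)                  # touchdown rate
    d = clamp(2.375 - (ints / attempts) * 25)         # interception rate
    return (a + b + c + d) / 6 * 100

# Aaron Rodgers' 2011 regular season: about 122.5
rodgers_2011 = passer_rating(502, 343, 4643, 45, 6)
```

The clamps are why the rating tops out at 158.3: a perfect game maxes out all four components.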

A (relatively) simple way to track a signal-caller's improvement over time is to compare their QB rating from a given season to earlier seasons. An aggregate plot comparing a passer's QB rating from later seasons to their first 'full' season (full being a season where the QB attempted at least 150 passes) is shown in Figure 1. The data are shown as black points, while the averages (and standard errors) are shown in red.
Figure 1: QB rating improvement from first season as a function of years in the league. Red points show average improvements.

While there is significant scatter, it is clear that on average a QB only shows improvement between their first and second full seasons. After that, performance stabilizes until around the 7th season, when it begins to decrease (although the data appear to show that the few QBs who make it to their 10th season are able to maintain their improved performance).

This performance boost, at only 5-10 rating points, is moderate at best, and indicates that a quarterback's first full season is a strong indicator of their future success. Of course, this is only an average and as such somewhat of an abstraction - clearly not all QBs will follow exactly this trend.

To gain more insight into the maximum potential improvement over a quarterback's career I've plotted a histogram of peak QB rating improvement (or minimum reduction, the sad reality for some passers) in Figure 2. It's clear from this figure that the majority of signal-callers never progress beyond a 20-point improvement in QB rating, even during their best seasons, with only 20% of all passers in the sample beating this threshold1.
Figure 2: Histogram of peak QB performance compared to a player's first starting season.

Discussion and Conclusions
Even at their very best, this analysis shows that most quarterbacks shouldn't be expected to show dramatic improvement at any point during their careers, and only moderate improvement from their initial starting season. This analysis indicates that even a rookie QB's ceiling can be estimated with reasonable certainty, and has clear ramifications for evaluating quarterbacks. For instance, this is bad news for Andrew Luck (first year QB rating of 76.5), Ryan Tannehill (76.1), Jake Locker (74.0), and Brandon Weeden (72.6), who are all unlikely to ever see a triple-digit rating but are tabbed as the starters heading into 2013.

These results also lend credence to the arguments of impatient fans, who expect to see immediate results from new QBs and have no patience for any 'adjustment period', 'learning curve', or any other excuse offered by a team for a young passer's poor play. I had always assumed these fans were merely short-sighted, unwilling to wait and see how a player would develop. But now it's much more difficult to dismiss their concerns so easily.

1: The two players in the sample with a 40+ point QB rating improvement? Alex Smith and Eli Manning. 

Monday, August 5, 2013

Penalties I: Referee Bias

In addition to making the occasional blown call, multiple sources have noted that referees appear to have a subtle, pervasive, likely subconscious, home-team bias. Here I attempt to quantify that bias, using different categories of penalties to highlight any discrepancy between penalties that require no interpretation (and should not be subject to this sort of bias) and penalties that involve the judgement of the referees (and therefore would be prone to bias). I find that there is a small but statistically significant discrepancy between judgement-call penalties on the home and away teams, with the visitors getting flagged an average of ~0.1 more times per game. What is most striking about this result is not its statistical significance but how small it is, a testament to the (often overlooked) fact that NFL referees are generally quite good at their jobs.

If you watch football for long enough, eventually you'll see a play that makes you uncontrollably angry—specifically, angry at the refs. How could they have blown that call so badly? Were they even watching the play?

This outrage, however, usually fades fairly fast—you have some reluctant understanding that what's obvious to you from the super-slo-mo replay is not as crystal clear when seen at full speed, and most individual calls/non-calls have a small impact on the final score. (Of course, there are some notable exceptions).

Individual plays such as these are so infrequent that they are not well-suited to statistical analysis. However, it is also possible that referees can be biased by the location of the game, either because the refs are from the area or are subconsciously influenced by the cheering home crowd. The NFL mitigates the former issue by rotating crews between stadiums, but what about the latter?

Unfortunately, while some work has already been done on this very issue, actual numbers on any bias appear to be thin on the (internet) ground. General assertions from non-open-access sources1 abound, as do people using studies of soccer(!) officiating to back up their claims about the NFL. I did run across an interesting article that attempted to quantify home/away bias in individual officiating crews, but it unfortunately suffers from a small sample size (13 weeks) and a lack of error estimates — is calling an average of 1.5 extra penalties on the away team a significant effect, or have they just shown how noisy their data is? (The fact that the sum of each crew's 'bias' is close to zero is circumstantial evidence for the latter.)

Once again my data come from the thorough folks at Armchair Analysis. In addition to providing data on individual penalties, they also aggregate the calls into one of several helpful categories. Using their categories as jumping-off points, I lumped almost all penalties in the entire data set into one of four categories:
  • Judgement: Penalties like holding, pass interference, and illegal use of hands, for both offense and defense.
  • Timing: False starts, offsides, encroachment, and neutral zone infractions.
  • Positioning: All kinds of illegal blocking penalties (e.g. blocks in the back, crackback blocks, tripping).
  • Dumb: Taunting, roughing the passer, giving him the business, etc.

I split up the penalty data into home and away bins, then computed the average number of penalties per game in each category. To get a sense of the uncertainties, I bootstrapped the data. These averages are shown in Figure 1.
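For the curious, the bootstrap procedure can be sketched in a few lines of Python. The per-game counts below are made-up stand-ins for illustration, not the actual Armchair Analysis data:

```python
import random

def bootstrap_mean(samples, n_boot=1000, seed=0):
    """Estimate a mean and its uncertainty by resampling with replacement."""
    rng = random.Random(seed)
    n = len(samples)
    means = []
    for _ in range(n_boot):
        # Draw n samples with replacement and record the resampled mean.
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    mean = sum(means) / n_boot
    # The spread of the bootstrap means approximates the standard error.
    err = (sum((m - mean) ** 2 for m in means) / n_boot) ** 0.5
    return mean, err

# Hypothetical judgement-call penalties per game for the away team.
away_judgement = [2, 3, 1, 4, 2, 3, 2, 5, 1, 3]
mean, err = bootstrap_mean(away_judgement)
```

Resampling the data many times and taking the spread of the resampled means gives an error estimate without assuming anything about the underlying distribution.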
Figure 1: Average penalties per game in each of the four categories discussed in the Data section.

For both penalties relating to positioning (the illegal blocks) and dumb penalties there is statistically zero referee bias; both the home and away teams get flagged at the same rate (within the errors). This is not surprising, as these calls are fairly cut-and-dried, with little room for interpretation. Also not surprising is that the away team suffers more timing penalties (~0.2 more per game) — despite also being generally black and white, things like false starts and offsides are the fouls most likely to be affected by crowd noise.

For judgement call penalties like holding or pass interference, however, there is a small but statistically significant excess of penalties for the away team, with the visitor receiving an average of 2.70±0.03 penalties while the home team only gets called 2.59±0.03 times per game. These fouls should not be significantly affected by crowd noise, and thus indicate that referees do indeed hold a slight bias in favor of the home team.

Discussion and Conclusions
So it seems that NFL refs are indeed biased. But honestly, one tenth of a penalty per game is a pretty small bias. Since teams only play 8 away games during the regular season, this works out to less than one extra penalty per year, and since each team also plays 8 home games, over time things should average out. Even in the playoffs, where a #6 seed would have to play 3 away games to make it to the Super Bowl, this bias shouldn't play a large role. The real story here is how fair NFL officials are, even when calling fouls in front of 80,000 rabid, screaming, angry fans.

1: In an interview with Wired, one of this book's authors cites this sort of referee bias as the reason why the Seahawks lost Super Bowl XL. I find this frightening, as anyone who writes an entire book about statistics should know that you can't apply statistical trends to individual events. I assume (hope?) that he was just speaking off the cuff and was therefore not very thorough with his answer.

Shout out to Sonographer's Cup winner Andrew "Lulu" Schaffrinna, without whom this post (and indeed, any future studies of penalties) would almost certainly never have happened.

Monday, July 22, 2013

Are Underdogs Winning the Super Bowl More Often than they Should?

In the 2004 Super Bowl the first-seeded New England Patriots beat the third-seeded Carolina Panthers by three points to win their second NFL championship. 2010 featured the top-ranked New Orleans Saints earning their franchise's first Super Bowl win.

In the 10 seasons since the NFL's last re-alignment (before the 2002 season) these are the only two times a #1 seed has won the big game. It seems pretty odd that the top seeds, teams which only have to win two home games to make it to the big game, are only batting .200.

There are obviously a lot of potential reasons for this discrepancy. One that tends to get mentioned frequently is the first-round bye given to the top two seeds. The logic goes that the week off, rather than helping a team rest up and prepare for the Divisional round, somehow hurts them, possibly by disrupting the natural rhythm of the week.

I've shown that on average the home team wins about 57% of the time during meaningful regular-season games. If the bye week is the cause of this Super Bowl drought then it seems reasonable that we should find the first and second seeds winning their home playoff games at a lower frequency than expected.

A list of the seeding of the teams in the last 10 Super Bowls is all that's necessary for this experiment, so I simply made the list by hand from Wikipedia, which has fairly comprehensive coverage of each year's playoffs.

Wikipedia also has a comprehensive page on the Monte Carlo method, but in short it works by repeatedly generating random realizations of the problem at hand and comparing the results of the randomized trials to the real data. Given enough runs, the Monte Carlo method should converge to a stable result, allowing us to see if the assumptions that went into the Monte Carlo simulation are valid statistical representations of reality.

The Monte Carlo algorithm was set up to predict the expected number of Super Bowl appearances for each seed, under the assumptions that home field advantage was a flat 57% and that the different rankings of the teams had no bearing on game outcomes. No additional advantage from the bye week was programmed into the model.
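A minimal sketch of this kind of simulation, assuming the six-seed playoff format and a flat 57% home win probability. The bracket logic and trial count here are my own illustration, not the exact code behind Table 1:

```python
import random

HOME_WIN_PROB = 0.57  # regular-season home win rate from the earlier post

def play(home, away, rng):
    """The home team wins with a flat 57% probability; seeding has no other effect."""
    return home if rng.random() < HOME_WIN_PROB else away

def conference_champion(rng):
    """Simulate one conference bracket under the six-seed format."""
    # Wild-card round: seeds 1 and 2 have byes; 3 hosts 6, 4 hosts 5.
    w1 = play(3, 6, rng)
    w2 = play(4, 5, rng)
    # Divisional round: the #1 seed hosts the worst (highest-numbered) survivor.
    low, high = max(w1, w2), min(w1, w2)
    d1 = play(1, low, rng)
    d2 = play(2, high, rng)
    # Conference championship: the better (lower-numbered) seed hosts.
    return play(min(d1, d2), max(d1, d2), rng)

def appearance_counts(n_seasons=100_000, seed=0):
    """Count simulated Super Bowl appearances per seed (both conferences)."""
    rng = random.Random(seed)
    counts = {s: 0 for s in range(1, 7)}
    for _ in range(n_seasons):
        for _conf in range(2):  # AFC and NFC
            counts[conference_champion(rng)] += 1
    return counts

counts = appearance_counts()
```

Even with nothing but home-field advantage in the model, the #1 seed reaches the Super Bowl noticeably more often than the lower seeds, which is the baseline the real appearance counts are compared against.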

The number of Super Bowl appearances for each seed (AFC and NFC seeds combined) is shown in Table 1. Note that even a 7% home-field advantage results in ~1.5 more Super Bowl appearances per decade for the #1 seeds than if there were no home field advantage (with no home field advantage, both the #1 and #2 seeds would each be expected to make it to 5 out of 10 Super Bowls).
Table 1: Playoff Model Predictions
Seed | Predicted # of Appearances | Actual # of Appearances | Predicted # of Wins | Actual # of Wins

The standard deviations of all the results are listed next to each predicted value; the relatively small sample of Super Bowls results in fairly large margins of error in the simulation.

Regardless, it's pretty clear that the extra bye week isn't hampering the first two seeds from getting to the championship. The #2 seed has made almost as many appearances as predicted, while the #1 seed is, if anything, reaching the Super Bowl more often than they should be.

Because I was interested, I also computed the number of times each seed wins the Super Bowl. For this calculation I made the additional assumption that there is no home field advantage in the Super Bowl, which seemed reasonable given that the game is held on neutral ground. Those results are also presented in Table 1.

Discussion and Conclusions
The errors are fairly large, but the overall match between the model and the data indicates that there is neither an extra advantage nor a disadvantage to having the bye (although there is tantalizing — but not quite significant — evidence that #1 seeds aren't winning as many Super Bowls as they should). Without a larger sample size, however, any firm conclusions would be premature.

Unfortunately, when it comes to the Super Bowl you only get one new data point a year, so it's going to be quite a while before the signal may stand out from the noise. One interesting note to mull over while waiting for more data: in the first five post-realignment playoffs, a #1 seed reached the Super Bowl all five years. Since then, only three top seeds have made it to the big game, while the last three Super Bowl winners were all seeded 4th or lower.

Monday, July 8, 2013

Home Field Advantage II: The Cold Weather Edge

To investigate the effect that weather has on home field advantage, I've compared the average temperature difference between home and visiting teams over more than a decade's worth of games. I find that when the temperature differential is larger than 20° F, the team from the colder city holds an edge over the warmer-weather franchise relative to the overall home-team win percentage, even when the cold-weather team is the visitor. This result persists even after the data are corrected for teams which have played each other multiple times, and indicates that there may be some persistent advantage gained by teams acclimatized to poor playing conditions, although why this should be is unclear.

A few posts ago I investigated the effect that distance has on home field advantage, and found that teams traveling East had a much more difficult time playing on the road than visiting franchises coming from the West (or traveling North/South). However, as I noted in that post, distance is but one of many possible components of home field advantage.

Because NFL teams are scattered all over the country, many games (especially toward the end of the season) happen between teams used to dramatically different climates. Along with distance, the temperature differential is extensively discussed in the lead-up to a big game. The most notable example of this trend is the coverage of the Tampa Bay Buccaneers' longstanding cold-weather futility. (This coverage, interestingly, largely ceased after the Bucs beat the Eagles in Philadelphia in the NFC championship game — only their second-ever win in temperatures below 40° Fahrenheit — en route to winning Super Bowl XXXVII.)

Of course, just because pundits and announcers like to talk about the weather doesn't mean it actually has any impact on the outcome. And the Buccaneers were a historically bad franchise for over a decade before their Super Bowl win. Let's dig in and find out exactly what (if any) impact the weather really has.

While my other home-field advantage study used game results I downloaded myself, that data did not include any temperature information. The Armchair Analysis database, however, has plenty of information on game conditions. From this database I obtained game results as well as weather information for every regular season game between 2000 and 2011.

Before digging into the temperature data I first computed the home team win percentage for the entire Armchair Analysis sample. Overall, the home team wins 56.9% of the time — only 1.1% less than in my data. This consistency is very encouraging, and indicates that results obtained with one data set can be accurately compared with the other.

To integrate the temperature data into the win-loss results I first computed the average temperature for every stadium in the league for each week of the regular season (Figure 1). Because the sample size for a given week is fairly small (roughly 5 games per week per field) I included the temperatures for the weeks immediately before and after as well, which helped to smooth out the 'wrinkles' and should provide more accurate averages.
Figure 1: Average home-field temperature for each team over the course of the regular season. Hotter temperatures are red, while colder temperatures are blue.
For teams playing in a dome I set the temperature at 72° F. For stadiums with a retractable roof I used the ambient temperature when the roof was open and 72° when the roof was closed.
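The smoothing step can be sketched like so (the temperatures below are invented for illustration):

```python
def smoothed_weekly_average(temps_by_week):
    """Average each week's temperatures together with the adjacent weeks'.

    temps_by_week: dict mapping week number -> list of game-time temperatures.
    Weeks at either end of the season simply use whichever neighbors exist.
    """
    smoothed = {}
    for week in temps_by_week:
        pooled = []
        for w in (week - 1, week, week + 1):
            pooled.extend(temps_by_week.get(w, []))
        smoothed[week] = sum(pooled) / len(pooled)
    return smoothed

# Hypothetical December temperatures (°F) for a cold-weather stadium.
temps = {14: [30, 28], 15: [22, 25], 16: [15, 18]}
avg = smoothed_weekly_average(temps)
```

Pooling each week with its neighbors triples the effective sample size for the average, which is what irons out the 'wrinkles' in Figure 1.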

Most of Figure 1 makes sense — Green Bay gets frighteningly cold in December and January, while all three Floridian teams play in fairly warm conditions. Kansas City is colder than I would have thought, however, and Pittsburgh has comparable weather to icy Buffalo. But overall it seems as though there is enough data in the sample to produce reasonable weekly averages.

With temperature averages established, I next determined the expected temperature differential between the home and away teams for every game in my sample. Note that in the following analysis I am not using the actual temperatures for each game but rather the averages for that week for the two teams. While somewhat more abstracted than using the real temperatures, sticking with the averages significantly simplifies things — if I used the specific game time temperatures for each game I would have to compare the expected temperatures for both the home and away teams to the actual conditions. Seeing how really extreme weather affects teams would be interesting, but that analysis is for another post.

Figure 2 shows the home team's win percentage as a function of the average temperature differential. The overall home team winning percentage is also shown, as are 1-sigma bootstrapped error bars.
Figure 2: Home team win percentage broken up by average temperature differential. The red line shows the home team's win percentage for the entire sample.
When the visiting team comes from a city with a temperature less than 20° different from the home city there is essentially no change in home field advantage (although there is weak evidence that the away team does better when visiting cities with similar weather). However, there is a dramatic shift for temperature differentials larger than ±20° — when a warm-weather team travels to the frozen North, they are almost 10% less likely to win than average, while the situation reverses completely when a team used to duking it out in the cold road-trips to more tropical climes.

Discussion and Conclusions
Before digging in to the provocative results in Figure 2, some caution is advised. While each bin has several hundred individual games, it is possible that a few specific matchups between divisional rivals could be biasing the results. For instance, the NFC North has two teams with some of the coldest weather in the league (Green Bay and Chicago) as well as two teams which play in domes (Detroit and Minnesota). Depending on how the League draws up the schedule this division could contribute up to four games a year in the most extreme temperature differential bins — exactly the ones which show a significant change in home field advantage.

So how do we know that the apparent trend with temperature isn't merely the result of the Packers and Bears beating up on the Lions and Vikings over the past decade (or Patriots-Dolphins, or Chiefs-Chargers, etc.)? Controlling for this potential source of bias is actually fairly simple — just give every distinct matchup the same weight in the computation.

Basically, for every combination of home and away teams present in a bin, I've computed the total home team winning percentage instead of treating each game as a separate event. These matchup winning percentages are added together in the same way that the original games were in Figure 2 to produce a corrected temperature differential histogram — Figure 3.
Figure 3: Same as Figure 2, but with every distinct home/away matchup given equal weight.
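The matchup-weighting correction can be sketched as follows, with a made-up bin of games showing how a lopsided rivalry gets down-weighted:

```python
from collections import defaultdict

def matchup_weighted_win_pct(games):
    """Home win percentage with each distinct (home, away) pairing weighted equally.

    games: list of (home_team, away_team, home_won) tuples for one
    temperature-differential bin, with home_won as 1 or 0.
    """
    by_matchup = defaultdict(list)
    for home, away, home_won in games:
        by_matchup[(home, away)].append(home_won)
    # Average within each matchup first, then average the matchup percentages,
    # so a rivalry played ten times counts no more than one played once.
    pcts = [sum(results) / len(results) for results in by_matchup.values()]
    return sum(pcts) / len(pcts)

# Hypothetical bin: GB hosts DET three times (wins all), MIA hosts NYJ once (loses).
games = [("GB", "DET", 1), ("GB", "DET", 1), ("GB", "DET", 1), ("MIA", "NYJ", 0)]
pct = matchup_weighted_win_pct(games)
```

In this toy bin the naive per-game average would be 75%, but with each pairing weighted equally it drops to 50% — exactly the kind of correction that separates a real temperature trend from a few dominant rivalries.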
Despite all of my concern, Figure 3 shows only a slight reduction in the trends when teams with multiple matchups are taken into account. Now it's possible to evaluate these results with at least some confidence that they aren't being dominated by just a few teams.

And the results are certainly interesting — especially if you root for a cold weather team! Not only does playing up North make it tough on visitors, with the home team winning ~7% more games than for teams in moderate climate and nearly 65% of the time overall, but the advantages provided by a frigid home environment appear to persist even when traveling.

It's not too surprising to find that rough weather can be difficult for a team which isn't used to it, but I wouldn't have predicted that fair-weather franchises would have just as much trouble when hosting teams used to the cold. I couldn't say why — perhaps teams used to playing in unpleasant conditions simply become extra excited about games where they know they won't need to worry about wearing gloves and sleeves!

Monday, June 24, 2013

Field Position and Scoring Probabilities: Half of the Red Zone is a Dead Zone (for Touchdowns)

Any drive's scoring chances increase as the offense moves down the field, but exactly what impact an additional X yards gained provides is not generally known (or at least not commonly discussed). In this post I've charted out a team's scoring chances for a first-down situation at any point on the field. In addition to a dramatic increase in touchdown percentage for all drives that have a first down within 10 yards of the end zone, there is a leveling off in the fraction of drives ending in touchdowns just outside of this zone. While the root causes of these features are not made clear by this analysis, they may be due to the necessity for different offensive and defensive tactics near the end zone.

As a team drives down the field, excitement naturally builds. Each first down brings them closer to the end zone and a touchdown. At least, it should. How much does each first down improve your chances of scoring, and are there any parts of the field where having a first down closer to the goal line doesn't help matters?

To obtain the necessary data I queried my copy of the Armchair Analysis database for all plays in the first three quarters. I ignored the final period so as not to bias the results with desperation drives from teams attempting a late rally. I then used a python script to find all first-down plays and the end result of the drive they occurred on.

This resulted in 63182 first downs over 17164 scoring drives. Roughly 60% of these plays were on touchdown drives, while the rest were on series that resulted in field goals (I completely ignored safeties, for the record). This uneven distribution is unsurprising, given that TD drives generally cover more of the field (and thus generate more first downs) than FG drives.

A plot of how likely a drive is to end in points as a function of field position is shown in Figure 1. It shows the fraction of scoring drives that result from a first down at a given yard line, with the opponent's end zone denoted by zero. Errors were determined via bootstrapping, and due to the sheer number of samples in this data set they are small. 
Figure 1: On any given drive, having a first down at a given point on the field is plotted against the probability of the drive ending with a score.
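The aggregation behind Figure 1 can be sketched like this (the handful of first downs below are invented; 'TD', 'FG', and 'none' stand in for the drive results):

```python
from collections import defaultdict

def scoring_prob_by_yardline(first_downs):
    """Fraction of drives ending in any score, by first-down field position.

    first_downs: list of (yards_to_opponent_goal, drive_result) pairs,
    where drive_result is 'TD', 'FG', or 'none'.
    """
    totals = defaultdict(int)
    scores = defaultdict(int)
    for yards, result in first_downs:
        totals[yards] += 1
        if result in ("TD", "FG"):
            scores[yards] += 1
    return {y: scores[y] / totals[y] for y in totals}

# Hypothetical sample: four first downs at the 1, four at midfield.
sample = [(1, "TD"), (1, "TD"), (1, "FG"), (1, "none"),
          (50, "TD"), (50, "none"), (50, "none"), (50, "FG")]
probs = scoring_prob_by_yardline(sample)
```

Splitting the numerator by 'TD' versus 'FG' instead of lumping them together gives the red and blue curves in Figure 1.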

As expected, the likelihood of scoring any points increases monotonically (aside from a couple of bumps and wiggles most likely due to statistical fluctuations) from the offense's end zone to the other team's goal line. On a team's own side of the field the relationship is linear, with a field position boost of ten yards resulting in roughly a 10% increase in scoring probability.

Once you cross midfield, however, the odds of scoring take a distinct upturn. Looking at the data split into the different types of scores (red and blue points in Figure 1) shows that this uptick is the result of field goals, which makes some sense given that a team starting at the 50 only needs a couple of first downs in order to be in field goal range.

Inside the opponent's 30, the percentage of drives ending in field goals levels off because the offense is already within field goal range — getting additional yardage doesn't make you more able to attempt a field goal. The likelihood of ending the drive with a touchdown, however, continues to increase.

After a leveling off between 10-20 yards away from the opposing team's goal, the TD percentage rockets upwards for first-and-goal situations at the expense of field goals. Ultimately, a first-and-goal at the 1-yard line gives the offense an 85% chance of scoring a touchdown and an almost 95% chance of getting any points.

Discussion and Conclusions
It's somewhat surprising to see the dramatic increase in TD% when the offense is inside the opponent's 10-yard line. This implies that there's something different about those last 10 yards — either it becomes significantly easier to score a touchdown (doubtful; I think the opposite is probably true), or teams are more likely to go for it on all four downs when they're so close to scoring. It's also possible that there's a psychological shift, providing a boost of adrenaline to the offense. A full investigation of these possible explanations is beyond the scope of this post, but might be worth revisiting in the future.

Of further note is the lack of improvement in a team's touchdown chances inside the red zone but outside the 10-yard line. This is in stark contrast to the dramatic ramp-up of TD% once a team reaches a first-and-goal scenario. While the TD% in this region stagnates, however, FG% increases correspondingly, leaving a smooth increase in the total scoring probability.

On its own, the leveling off of the touchdown percentage wouldn't be inconsistent with random statistical fluctuations, such as the apparent increased scatter in the total scoring percentage around the 50-yard line. But the consistency of the feature around the opponent's 10-yard line, along with the corresponding increase in the frequency of field goals, indicates that this phenomenon is real.

So it seems there is indeed a bottleneck effect when a team gets ~15 yards away from a touchdown, likely due to the difficulty of getting a first down very close to the goal line. This bottleneck disappears once a team reaches a first-and-goal situation, possibly the result of a team's increased willingness to go for it on fourth and goal. So the next time your team has to settle for a field goal after a first-and-10 from the 12, take small comfort in knowing that they weren't in quite as good a spot as it seemed.

--A huge shout out to Kenny Rudinger for noticing that my preliminary results for this post were obviously in error, allowing me to sort out the bugs in my analysis code *before* subjecting my boneheaded mistakes to public scrutiny.

Monday, June 10, 2013

Quantity over Quality in the NFL Draft

NFL teams live and die by the draft. A franchise which drafts well consistently can look forward to years of sustained success, but just a single year of bad evaluations can cripple a team for several seasons. Drafting will never be an exact science, and it's not obvious why some teams appear to be better at it. In this post I investigate what might give these teams their advantage, and find that while drafting better players is correlated with winning more games, simply acquiring more draft picks has a stronger effect on a team's success. This result indicates that teams should focus on obtaining as many selections as possible rather than staking their fortunes to a few highly rated prospects.

One of the great things about the NFL is the level of parity. No matter how bad the previous year was, your favorite team is always 'just one year' away from turning it all around. Every year there seem to be one or two teams who dramatically improve their fortunes — look no further than the 2008 Miami Dolphins or 2012 Indianapolis Colts for examples.

Of course, these teams usually crash right back down to Earth (c.f. the 2009 Dolphins). But some teams seem to be near the top of the pack year after year.

Quantitative evidence for the above statement comes from the postseason. While 28 different teams have made the playoffs at least once since 2006 (sorry Buffalo, Cleveland, Oakland, and St. Louis fans!), only 10 have made the playoffs in more than half of those seasons. If you want teams that have made 5+ out of 7, your sample drops to five — the Colts, Patriots, Steelers, Ravens, and Giants. So why are these five teams so consistently successful, while most of the rest of the NFL is so streaky?

One possibility is that these teams win so much because they draft better than the rest of the NFL. Teams constantly have to refresh their talent pool as players age; a team which is able to more accurately evaluate college talent should have a huge advantage over teams which can't.

But is that true? Is it even possible to draft well? Some evidence would argue that it is not — there have been plenty of high-profile draft busts in recent memory (e.g. Vernon Gholston),  and undrafted stars like Arian Foster immediately tell you that good players are still falling through the cracks.

So the question becomes how to quantify drafting savvy, which is clearly a difficult thing to do. (If it wasn't, teams would have already figured out how to draft better!)

I downloaded a comprehensive list of draft results between 1990 and 2011 from Pro Football Reference, which (among other things) lists year, round, team, and when the player left the league. This data isn't perfect, as there are players who spend time outside of football before re-entering the league, but those players are outliers who shouldn't affect the results very much.

Coupled with the draft data I have team win-loss records for each of the aforementioned seasons, compiled from individual game scores. To make things a little easier I will strip individual teams out of the equation and aggregate all franchises together, then look only at how prior drafts affect current win-loss records.

If teams are truly bad at picking talent, then every pick would be essentially equivalent to rolling the dice. Now, we know that this isn't quite true, as otherwise you'd have many more first round busts and late-round diamonds. But what if it was?

If you assume that teams are totally incapable of evaluating talent, then the optimum strategy to build a winning team becomes clear: stockpile draft picks. In this scenario if you are drafting more warm bodies than other teams, by the laws of probability you will also acquire more talented players. Assuming you can separate the wheat from the chaff in training camp (a dubious assertion, I know, but let's not go down this rabbit hole now), you'll come out ahead in the long run.

Even if you make a weaker assumption about a front office's ability to diagnose talent in the draft — maybe that coaches and GMs lack the ability to discriminate between talent levels within a single round of the draft — the logic of grabbing as many picks as possible still holds. This is especially true given the low value teams seem to place on draft picks in future years. If you can give up your first round draft pick this year in exchange for a team's first round draft pick next year plus a second rounder, you essentially get an extra chance at winning the second round 'lottery' by agreeing to wait one more year before trying to get a good first-rounder.

It's simple enough to compute how additional draft picks impact win percentage. Figure 1 shows the Spearman correlation coefficient between the number of draft picks above the NFL average and win percentage. Just looking at the current year's draft isn't enough, so Figure 1 shows several ranges of years — each point covers the drafts from Y years ago back through X years ago. For example, the point at (6,2) shows that the players drafted between 2 and 6 years ago have a (relatively) strong effect on how well a team is currently doing. It's not the most obvious plot to look at, but it conveys a lot of information in a compact way.
Figure 1: Correlation between number of draft picks in prior seasons with win percentage. The X axis shows how far back in time we count draft picks, while the Y axis shows the minimum number of years before the current season a pick must be made to be counted. A higher Spearman coefficient indicates that surplus of draft picks in that range of years is more strongly correlated with win percentage.
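For reference, the Spearman coefficient is just the Pearson correlation computed on ranks instead of raw values. A self-contained sketch (the pick surpluses and win percentages below are hypothetical, not the real league data):

```python
def rank(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical: draft picks above league average vs. win percentage.
extra_picks = [-3, -1, 0, 2, 4, 6]
win_pct = [0.30, 0.45, 0.40, 0.55, 0.50, 0.65]
rho = spearman(extra_picks, win_pct)
```

Because it only uses ranks, Spearman picks up any monotonic relationship between picks and wins, not just a linear one — a sensible choice when there's no reason to expect each extra pick to be worth a fixed number of wins.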

Only correlations with greater than 95% significance are plotted. The strongest correlation is only 0.123, for players drafted between 2 and 10 years ago. I can further break down the data with the strongest correlation by round — Figure 2.
Figure 2: Round-by-round analysis of the strongest correlation in Figure 1. Axes are the same as Figure 1, but with draft round numbers instead of years.

First off, if you only look at the first 1-3 rounds, there isn't any significant correlation. This is probably a function of teams' general reluctance to deal draft picks in the early rounds, which leads to a smaller sample size and therefore a weaker confidence level. The next interesting thing is the sudden pickup in significance when Round 4 is included in the calculation. Going later than Round 4, however, doesn't help your win percentage. So having extra picks in Round 4 (and likely earlier) does much more for you than having extra picks in the last few rounds.

Alright, so now we know that additional draft picks can boost your win percentage by a small amount, but only if you look over several years and focus on earlier draft picks. Now we need to test how this compares to a measure of drafting skill.

Estimating how good a team is at drafting is much more difficult than just comparing win totals to the number of draft picks. While certainly not perfect, a decent proxy is the length of a player's tenure in the NFL; if, say, the Giants have drafted the same number of total players as the Cardinals over the last five years but have twice as many which are still in the league, logically it would seem that the Giants are doing a better job identifying talent.

In Figures 3 and 4 I've plotted the same metrics as in Figures 1 and 2, but looking at the number of drafted players still on the team instead of the raw draft numbers.

Figure 3: Same as Figure 1, but computing the correlations between the number of players still in the NFL and win percentage.
Figure 4: Same as Figure 2, but for the strongest correlation in Figure 3.

Of course, the number of players still in the league is also dependent on the total number of draft picks a team has. So we expect a correlation at least as strong as for the raw draft picks.

The first thing to notice is that many more ranges of years produce statistically significant correlations. Many of these correlations are larger than the strongest correlation from Figure 1, although the peak correlation is still in roughly the same location. Looking at this peak correlation by round, however, the largest correlations are not much larger than when looking at raw draft picks.

Discussion and Conclusions 
Before really jumping into the detailed analysis, it's important to note that none of these correlations are very large — that is to say that at best historical drafting ability plays only a small role in determining how well a team will do in a given year. This is perhaps not hugely surprising, given that there are many other variables (injuries, suspensions, contract holdouts, varying strength-of-schedule) which affect a team's fortunes but have nothing to do with drafting. 

Despite this, however, many of the correlations are statistically significant, which means they are very likely to be real. It's always important when looking at correlations to remember that even significant correlations do not necessarily imply causation. But in this case, when it's fairly clear that a team's ability to draft well should directly impact their on-field success, it seems reasonable to assume a causal link. 

Let's first discuss the intriguing results in the round-by-round breakdown. It's clear that considering later rounds in the analysis doesn't significantly improve the correlation. The broadness of this result implies that it is not a statistical aberration, which in turn means that adding extra late-round picks doesn't significantly help your team — the logical conclusion here is that it makes the most sense to package up your 5th, 6th, and 7th round picks and grab extra 4th-round and earlier selections.

The main conclusion, however, has to be that drafting players who survive in the league isn't much better than simply drafting extra players. It's possible (probable?) that the number of second- and third-string players who stick around the league for a long time (so-called 'career backups') is biasing the results. The best way to test this hypothesis would be to construct some way of comparing player skill and add it into the analysis, but of course such a statistic (which would have to accurately compare positions as disparate as quarterback and defensive tackle) would not be simple to create.

Looking at the data on the raw draft picks indicates that there is indeed some advantage to be gained just by stockpiling draft choices. Given how teams appear to undervalue their draft picks in future years, a forward thinking team should be able to trade away picks in a current draft in exchange for extra picks in the next year. Repeating this strategy over several years would (in theory) lead to a large surplus of picks.

The correlations are small, but every little advantage in the NFL matters. As long as teams are willing to give up many future-year and/or late-round picks in order to move up just a few spots in the first couple of rounds, there will always be opportunities for a patient team to gobble up the extra selections. Bill Belichick's Patriots — 10 playoff appearances in 13 years — are well-known for doing just that.

Wednesday, May 29, 2013

Home Field Advantage: Distance Doesn't Matter, But Time Zones Do

The existence of home-field advantage is well-known and not in dispute, but the magnitude of the home team's advantage is not often discussed. Additionally, it's reasonable to assume that while some of this effect is due to crowd noise or unconscious referee bias there may also be a component that results from the distance the away team needs to travel. It turns out that while the absolute distance traveled doesn't affect home-field advantage, teams that have to cross time zones — specifically traveling East — do almost 10% worse than on average.
Whenever there's a big game, TV commentators like to spend some of their pregame chitchat on the topic of home field advantage. When the visiting team has a false start or delay of game, the announcers are quick to blame crowd noise. Even after a team has locked up a playoff spot, they still play hard until they've secured (or have no chance at) home field for the playoffs. The topic comes up often enough that it's been the subject of at least one hour-long NFL Network production. One of the reasons the Super Bowl is held at a neutral location is to remove this advantage (and it's also why baseball and basketball organize their playoff series the way they do).

But what exactly is home field advantage? Is it some constant that applies for all visiting teams? Is it only applicable for those teams who play in loud stadiums (Seattle, Kansas City) or locations with hostile climates (Green Bay, Buffalo)? How strong is the effect?

Anyone who remembers the last several Super Bowls has good reason to be skeptical of the power of playing at home: 5 out of the last 8 Super Bowls have been won by the 4th seed or worse. In the same time period, the number one seed — the team with home field advantage — from either conference has only won once.

Obviously just looking at Super Bowl winners doesn't provide a very large sample size, and there can be other complicating factors — for instance in 2010 the Jets secured a wild-card spot with an 11-5 record, but had to go to Indianapolis because the 10-6 Colts won their division. So for this experiment we'll look at nearly two decades of regular season games, and probe the role that distance plays in the effect.

I downloaded game information (teams and scores) for all regular season games between the 1995 and 2011 seasons using a Python web scraping script. Going further back in time might be useful, but it runs the risk of biasing the data if teams have changed the way they travel.

Determining the home time zone for each team is generally straightforward, except for the Cardinals. Since most of Arizona doesn't observe daylight saving time, during the summer months and part of the fall the Cardinals are in the Pacific time zone. But in the winter, when the rest of the country falls back, Arizona goes to Mountain time. For simplicity's sake I used Week 8 of the regular season as the cutoff for determining which time zone the Cardinals were in, as that's usually around when the switch happens.
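In code, that Week-8 rule reduces to a one-line lookup. A minimal sketch (the cutoff week is the approximation described above, not an exact DST calendar):

```python
def cardinals_time_zone(week, cutoff_week=8):
    """Which time zone the Cardinals effectively share in a given week.

    Arizona skips daylight saving time, so its clocks match Pacific
    time while the rest of the country is on DST, and Mountain time
    after everyone else falls back (approximated here as Week 8).
    """
    return "Pacific" if week <= cutoff_week else "Mountain"
```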

The first question to answer is whether home field advantage exists at all. Counting up the entire data set, the home team wins 58.0% of the time, with a 1-sigma bootstrapped error of 0.76%. That's pretty significant: if a team played all 16 of their games at home, on average they'd win about one extra game per season!
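For anyone curious how the bootstrapped error works: resample the games with replacement many times, recompute the win percentage for each resample, and take the standard deviation of those values. A minimal NumPy sketch (run on synthetic data here, not my actual analysis script):

```python
import numpy as np

rng = np.random.default_rng(0)

def home_win_pct(home_wins, n_boot=10_000):
    """Return the home-team win fraction and its 1-sigma bootstrap error.

    home_wins: array of 1s (home team won) and 0s (home team lost).
    """
    home_wins = np.asarray(home_wins)
    # Resample the full set of games with replacement, n_boot times
    samples = rng.choice(home_wins, size=(n_boot, home_wins.size))
    return home_wins.mean(), samples.mean(axis=1).std()

# Toy example: 580 home wins in 1000 games
pct, err = home_win_pct([1] * 580 + [0] * 420)
```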

So home field advantage is clearly a real effect. But what controls the advantage? This result is league-wide and over many years, so we can discount stadium-specific effects.

A natural next step is to investigate the effect distance has on home-field advantage. Toward this end I computed the stadium-to-stadium distance between the two teams for every game. A plot of the win percentage of the home team as a function of the distance the away team had to go is shown in Figure 1. The error bars are 1-sigma bootstrapping errors, and the red bar shows the average value of the whole dataset.
Figure 1
Figure 1: Home team win % as a function of how far the away team had to travel. The red line shows the average home-field advantage.
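The stadium-to-stadium distances are just great-circle distances between the two stadiums' coordinates. A sketch using the haversine formula (the coordinates below are approximate and only for illustration):

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    R = 3958.8  # Earth's mean radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

# Roughly the Jets/Giants stadium to Seattle's stadium: about 2400 miles
d = haversine_miles(40.81, -74.07, 47.60, -122.33)
```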

The widths of the bins were chosen to keep the number of samples roughly constant in each bin, which is why the bins at small distances are narrower than the bins at large distances. While there is a general trend towards a larger advantage when the away team has to travel a long distance, it's very weak. Only the smallest-distance bin is inconsistent with the global average, and it's not off by very much.
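The equal-count bins themselves are easy to build: sort the games by travel distance and split the sorted list into chunks of (nearly) equal size. A sketch with NumPy, run here on synthetic games rather than the real data set:

```python
import numpy as np

def equal_count_bins(distances, outcomes, n_bins=5):
    """Bin games so each bin holds roughly the same number of games.

    Returns a list of (min_distance, max_distance, home_win_fraction)
    tuples, one per bin.
    """
    distances = np.asarray(distances)
    outcomes = np.asarray(outcomes)
    order = np.argsort(distances)  # indices sorted by travel distance
    return [(distances[idx].min(), distances[idx].max(), outcomes[idx].mean())
            for idx in np.array_split(order, n_bins)]

# Synthetic example: 1000 games with random distances and outcomes
rng = np.random.default_rng(1)
distances = rng.uniform(0, 2800, size=1000)
outcomes = rng.integers(0, 2, size=1000)  # 1 = home win
bins = equal_count_bins(distances, outcomes)
```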

Pure distance isn't the only deleterious effect of traveling, however. Crossing time zones can mess with circadian rhythms, and an East coast team has three time zones to travel to play one of the California or Washington teams.

Figure 2 shows the relationship between the number of time zones traveled and the home team's win percentage. First, when the home and away teams are in the same time zone, the home-field advantage is lower than average: when the Dolphins travel up to Buffalo, the Bills don't get the full 58% advantage.
Figure 2
Figure 2: Win % as a function of the number of time zones crossed by the away team. Lines are the same as in Figure 1.

From East to West the time zones get increasingly negative — New York's time zone is -5 hours, while San Francisco's time zone is -8 — so an East coast team traveling to a West coast team would be on the left side of the plot.
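With each team assigned a UTC offset, the signed number of time zones crossed is just the difference of the two offsets. A sketch with a few hypothetical city entries (the sign convention matches the plot: negative means the visitors traveled West):

```python
# Hypothetical sample of home-city UTC offsets (standard time)
UTC_OFFSET = {
    "New York": -5,
    "Chicago": -6,
    "Denver": -7,
    "San Francisco": -8,
}

def zones_crossed(away_city, home_city):
    """Signed number of time zones crossed by the away team.

    Negative: the visitors traveled West; positive: they traveled East.
    """
    return UTC_OFFSET[home_city] - UTC_OFFSET[away_city]
```

So a New York team visiting San Francisco sits at -3 (the left side of the plot), while the reverse trip sits at +3.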

When a team travels one or more time zones West, the effect of home-field advantage is roughly the same as the overall average of 58%. However, when teams go East the benefits to the home team are well above the average, hovering closer to 65% and significantly larger than their errors.

Discussion and Conclusions
It seems pretty clear that while home field advantage is very real, it's more dependent on time zones than pure distance. What's really interesting is that the home team only gains an advantage when the visitors are traveling East; there is essentially no additional benefit (over the average home field advantage) to hosting a team which has come to the Pacific Coast from somewhere else in the country.

This finding has consequences for NFL scheduling. The league goes out of its way to ensure that when the teams from one division play another, each team plays the same number of home and away games. This consideration is admirable, but it falls somewhat short of ideal given these results, because both the AFC and NFC West have a team in the Central time zone (Kansas City and St. Louis). So when an East Coast team hosts the Chiefs instead of another AFC West franchise, they lose out on the added home field advantage.

This discovery has serious ramifications for deciding where to place franchises as well. An NFL owner's job is to do everything in their power to win a Super Bowl. If you know that placing a team on the West Coast will put you at a disadvantage, isn't it your duty as an owner to make sure your team plays in the East? This is especially relevant given the current attempts to put a team in Los Angeles: any owner looking to buy into a new L.A. team is automatically going to be at a disadvantage.
