Monday, September 30, 2013

Not All Fumbles Are Created Equal

A fumble can be a key play in a football game, where a single turnover can be the difference between a win and a loss. Recovering a fumble is therefore critically important. While the recovery itself appears to be a mostly random event, the location of the fumble can significantly alter the odds that the defense will recover it. Fumbles behind the line of scrimmage are more likely to be recovered by the offense, while fumbles after a successful rush or pass are more likely to get scooped up by the defense.

Nothing in football can change the momentum of a game faster than a turnover. A positive turnover differential is highly correlated with winning, so it's no wonder that teams are constantly talking about making fewer of them. While interceptions are generally directly caused by poor decision-making by the quarterback, the apparent random nature of fumbles makes them so much more exciting (and vexing, when your team is the one doing the fumbling).

Of course, fumbles aren't really random. Usually a player doesn't just accidentally drop the football, and defensive players are taught to hold offensive players up while their teammates attack the ball. However, the act of recovering a fumble is generally considered to be a random event, one that is entirely based on luck. (I'm not quite as convinced of this assertion as the sites I just linked; I've seen too many players try to pick the ball up when they should have fallen on it, or fall on it only to have the ball squirt away. But testing this is not the focus of this post so I'll leave it be for now.)

It's important to recognize that this does not mean that all fumbles have the same probability of being recovered by a certain team—you wouldn't want to use fumble recoveries as a random number generator, for instance. The more players on the defense near the fumble, the more likely one will make the recovery. Conversely, if only the fumbling player is aware that he's fumbled (such as on the quarterback-running back exchange), the offense will be more likely to recover. By this logic, a team's chance of recovering a fumble should be strongly dependent on where the fumble occurs relative to the line of scrimmage.

Data come from the Armchair Analysis database, which I queried for all plays which resulted in fumbles, as well as all subsequent plays (to determine whether the fumbling team maintained possession). To avoid potential errors in this method of determining the recovering team, I excluded fumbles occurring on fourth down. To avoid biases from teams altering their strategy at the end of a half, I only used data from the first and third quarters. As usual all errors are bootstrapped.
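The bootstrapping mentioned above can be sketched in a few lines. This is a generic resampling routine run on made-up recovery data, not the actual query results:

```python
import random

def bootstrap_se(sample, stat, n_boot=2000, seed=0):
    """Estimate the standard error of `stat` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_boot):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(stat(resample))
    mean = sum(estimates) / n_boot
    var = sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
    return var ** 0.5

# Toy example: 1 = defensive recovery, 0 = offensive recovery
# (hypothetical data mimicking a ~55% defensive recovery rate).
recoveries = [1] * 548 + [0] * 452
frac = lambda xs: sum(xs) / len(xs)
se = bootstrap_se(recoveries, frac)
```

For a simple proportion like this, the bootstrap standard error should come out close to the analytic value sqrt(p(1-p)/n); its advantage is that the same machinery works for any statistic, not just proportions.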

First I selected only fumbles made by offensive players—specifically QBs, RBs, and WRs (I lumped TEs in with the wide receivers). From here, computing the fraction of fumbles recovered by the defense is relatively simple, and it turns out that overall the defense recovers 54.8±1.0% of all offensive fumbles—slightly (but statistically significantly) more than half. This is not hugely surprising, given that the defense is much more focused on whoever has the ball than offensive players are.
Figure 1: Breakdown of fumble recovery probability as a function of position relative to the line of scrimmage. The horizontal red bar shows the overall defensive fumble recovery rate, while the bins are shaded proportionally to which offensive positions are responsible for the fumbles.

Figure 1 shows the defense's fumble recovery rate as a function of field position, split up into bins with roughly even numbers of fumbles per bin to maintain a constant signal-to-noise ratio. The histogram has also been split up into positions, showing who is responsible for the lost fumbles.
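Binning the data so each bin holds roughly the same number of fumbles is a simple quantile-style split. A minimal sketch, with hypothetical fumble yardlines (relative to the line of scrimmage) standing in for the real data:

```python
def equal_count_bins(values, n_bins):
    """Split values into n_bins bins with (nearly) equal counts,
    keeping the signal-to-noise ratio roughly constant across bins."""
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    start = 0
    for i in range(n_bins):
        stop = round((i + 1) * n / n_bins)  # cumulative cut points
        bins.append(ordered[start:stop])
        start = stop
    return bins

# Hypothetical yardlines relative to the line of scrimmage:
yards = [-12, -8, -3, -1, 0, 0, 1, 2, 3, 4, 5, 7, 9, 12, 18, 25]
bins = equal_count_bins(yards, 4)
print([len(b) for b in bins])  # → [4, 4, 4, 4]
```

Within each bin, the defensive recovery rate is then just the fraction of that bin's fumbles recovered by the defense.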

The most striking feature in Figure 1 is the clear dichotomy between fumbles that occur just behind the line of scrimmage and the ones that occur after positive yards have been gained. This makes intuitive sense: most of the fumbles behind the line of scrimmage are likely occurring in the center-quarterback or quarterback-running back exchange, which happen before the defense has had a chance to get into the backfield. (The uptick in defensive recoveries more than ~10 yards behind the line of scrimmage is almost certainly due to strip-sacks.) Once the offense gets beyond the line of scrimmage, however, most fumbles are going to be directly caused by the defense, in a region of the field where defensive players greatly outnumber the offense.

Interestingly, a fumble on a very successful play, one that gains more than 20 yards, isn't more likely to be recovered by the defense than one on an average play. I'm not sure exactly why that is, but it may be that a larger proportion of long plays end up near a sideline, and therefore any fumbles have a higher likelihood of going out of bounds. Since in this analysis a fumble out of bounds counts as an offensive recovery, it could be artificially depressing the defensive recovery rate.

Discussion and Conclusions
It's clear that the location of a fumble is of significant importance, as there is a ~20% swing in a defense's chance of recovery with just a few yards' change in position. A quarterback that drops the snap from the center will generally only be responsible for a wasted down, but a receiver who catches a 5-yard quick slant and can't hold on is likely to be the direct cause of a turnover. It's no wonder that running backs who fumble rarely last very long in the NFL; most of their runs will end up right in the range where the D is most likely to come up with a recovery.

Monday, September 16, 2013

Penalties II: Crowd Noise

Crowd noise is generally considered to be a contributing factor in causing false-start penalties on visiting teams. Some stadiums are well-known for focusing deafening amounts of noise on the field, usually near the endzones. To determine if crowd noise really does cause false-starts, I compared the discrepancy between false-starts called on the home and away team as a function of distance from an endzone. While distance from the nearest endzone is weakly correlated with the rate of visiting-team false-start penalties, the correlation is not statistically significant.

In Part I of my investigation into penalties, I found that there was a statistically significant discrepancy in the number of false-start penalties between the home and away teams. At the time I chalked this result up to crowd noise and focused my analysis on other types of penalties, but I later realized that, while frequently quoted as fact, I've never seen any hard evidence on the subject.

If crowd noise does affect a visiting offense's snap count, it's logical to expect that the effect will be largest when fans surround the field on three sides—near the endzones. This provides a way to isolate crowd noise from other variables surrounding false-starts, e.g. the possibility that traveling to away games makes teams more prone to jumping before the snap, or that referees are somehow biased even for such apparently cut-and-dried calls.

As usual the data come from Armchair Analysis. For this project I created a new table containing information on the field location of all plays as well as whether a penalty was called—because of the huge number of plays in the database this query took ~8 hours and therefore was not feasible to do on the fly during the analysis.

The percentage of plays that result in false-start penalties as a function of field position is shown in Figure 1. I'm not sure what's going on when the offense is backed up by their own endzone; I've checked the data and didn't find anything out of the ordinary. It's possible this is due to the relatively small number of samples that close to the goal line—regardless of the cause, it doesn't appear to affect the rest of the data, so I will simply exclude this bin from the remainder of the analysis.
Figure 1: Percentage of plays which result in false-starts as a function of distance from the offense's goal line. Black points are for the home team, while red points show penalties committed by the away team. X error bars show the range of yards included in each bin, while y error bars are the bootstrapped uncertainties.
At just about every point on the field the away team commits more false-starts, which is unsurprising given what we already knew. If you squint just right, however, it does appear the away team gains 'penalty parity' around midfield.

As always, trusting your eyes is a poor way to do statistics. Therefore I ran a correlation analysis, folding the data at the 50 to test raw distance away from the nearest endzone rather than position on the field. The result is a fairly weak correlation (Spearman ρ of 0.28) that is far from statistically significant (p-value of 0.24).
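The fold-and-correlate step looks something like the sketch below. The yardlines and home/away gaps here are invented for illustration; a real run would use the binned values behind Figure 1:

```python
def ranks(xs):
    """Average ranks (1-based), handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank over the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Fold field position at the 50 into distance from the nearest endzone,
# then correlate with the away-minus-home false-start gap (hypothetical numbers).
yardline = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
gap = [0.4, 0.5, 0.3, 0.2, 0.1, 0.0, 0.2, 0.3, 0.4, 0.5]
dist = [min(y, 100 - y) for y in yardline]
rho = spearman(dist, gap)
```

The folding is the key move: without it, a crowd-noise effect near both endzones would partially cancel out in a straight position-versus-gap correlation.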

Discussion and Conclusions
Our eyes do lie, apparently, at least about the statistical significance of the correlation. It's difficult to figure out what to make of this (non) result—the original hypothesis certainly seemed reasonable, and I don't believe that crowd noise has zero effect on false-starts. You can certainly make a strong argument for a correlation based on watching some of these false starts happen.

So why no significant correlation? Well, it certainly is possible that crowd noise actually isn't playing a huge role after all, although I don't have another explanation for why referees would be more likely to call a false-start on the visiting team. It's also possible that my original assumption, that crowd noise is amplified near endzones, is incorrect. Another option is to look at Figure 1 and note that the correlation appears to be more significant in the offense's own end of the field: maybe fans are more vocal when the away team is starting a drive, but as the offense moves down the field they get quieter.

Ultimately, the only way to know for sure just how crowd noise affects the game would be to attach sound meters to the players. The NFL may already do this, for all I know; they already mic the players and coaches, so at the very least crowd noise information should be (in principle) recoverable from these recordings. If I ever get my hands on this sort of data I will certainly try this sort of analysis again, but for now the stats I have on hand are just too rough to support any definitive conclusions.

Monday, September 2, 2013

Quarterback Rating II: Let the Rookie Sit

Many franchises use high draft picks on quarterbacks, rightly understanding their importance to a well-functioning team. There is enormous pressure to start these players right away, but is that a good idea? Based on a player's peak quarterback rating, the answer appears to be no. Whether this is due to the pressure of the job breaking fragile young QBs or because the teams most likely to start a rookie passer are also most likely to have other problems is unclear, but either way the result indicates that teams should be wary about throwing young quarterbacks right into the fire.

As mentioned in my last post, the quarterback is the focal point of every NFL offense; all offensive plays run through his hands. Unsurprisingly, teams place heavy emphasis on the selection and training of talent at the position. Promising (and even some not-so-promising) quarterbacks are hot commodities—since 1990 nearly 60% of first overall picks in the NFL draft have been QBs.

The pressure on these passers is intense, especially from teams which expect immediate production from their rookie signal-caller. Even teams who intend to let the new QB learn from the bench for his first year frequently find their plans changed by injuries or pressure from fans.

But is it good for young quarterbacks to get rushed into starting roles like this? There are certainly QBs who find success after starting as rookies: Peyton Manning comes to mind, and Russell Wilson and Robert Griffin III certainly seem to be in good shape. But for each success story there are plenty of high-profile failures.

Certainly for at least some of those quarterbacks there are other good reasons for why they didn't live up to expectations, but considering how important the position is the ratio of successes to failures seems frighteningly low. Of course, on this blog gut feeling isn't good enough; let's see what we can prove.

Data once again come from Armchair Analysis, using the same queries as in the last post to compute seasonal QB ratings for every passer in the database.

It is important to note that while a truly heroic feat of data collection and organization, the Armchair Analysis database is not perfect. While inconsistencies appear to be minor (at worst) for most of the statistics, there seem to be somewhat larger issues with data on when players were drafted.

For instance, Carson Palmer is listed as being a rookie in 2004, but was actually drafted in 2003. The database also doesn't handle players taken in the supplemental draft very well. Overall, however, the data quality seems to be very good, and I'm confident that the results are not significantly biased by any typographical mistakes.

In order to produce as unbiased a sample as possible, I restricted the investigation to quarterbacks who have at least four seasons, including their rookie season, in the database. Additionally, the quarterback must have thrown more than 150 passes in at least one of those seasons.

Determining a reliable measure for quarterback skill is a non-trivial task; ESPN's Total Quarterback Rating involves "several thousand lines of code" and the website I just linked implies that advanced computational techniques, such as machine vision, are involved. I (sadly) don't have the amount of time necessary to do something this complex, so I'll be sticking to the regular old-fashioned QB rating.
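For reference, the old-fashioned passer rating is simple enough to compute directly: four per-attempt components, each clipped to the range [0, 2.375], averaged and scaled so that a perfect game scores 158.3:

```python
def passer_rating(comp, att, yds, td, ints):
    """Traditional NFL passer rating (maximum 158.3)."""
    clip = lambda x: max(0.0, min(x, 2.375))
    a = clip(((comp / att) - 0.3) * 5)    # completion percentage
    b = clip(((yds / att) - 3) * 0.25)    # yards per attempt
    c = clip((td / att) * 20)             # touchdown rate
    d = clip(2.375 - (ints / att) * 25)   # interception rate
    return (a + b + c + d) / 6 * 100

# A 'perfect' game maxes out every component:
print(round(passer_rating(31, 40, 500, 5, 0), 1))  # → 158.3
```

The clipping is why the formula is insensitive to extremes: once a QB completes 77.5% of passes or averages 12.5 yards per attempt, doing even better earns no additional credit.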

An additional roadblock comes from grading quarterbacks over their careers. As it turns out, a QB's passer rating in one season is a surprisingly poor predictor of their rating in the following season.

While it's clear that QB rating is an imperfect measure of a passer's skill, it still works reasonably well as an overall gauge of competence at the position. To try to avoid the year-to-year issues I'll only look at a quarterback's peak QB rating—their absolute best season.

Alright, that was a lot of explanation, now let's get to the good stuff. Figure 1 shows a histogram of the maximum QB ratings of the entire sample (in gray). The majority of signal-callers in the sample have peak QB ratings between ~75 and ~90, with an average peak QB rating of 83. But there's a fair amount of variance in the sample, from 55 (Mike McMahon, who managed to start seven games for the ill-fated 2005 Eagles) to 118 (2011 Aaron Rodgers, who actually got a rating of 123 in the regular season but this data includes his poor performance in the Packers' playoff loss).

Figure 1: Peak QB ratings of the sample.
I've broken this sample down in two ways. First, I selected all quarterbacks who threw 150+ passes in their rookie year (purple histogram). Next, I selected QBs who met the passing criterion for at least 4 seasons (gold histogram). While not perfect, passers who stick around in the NFL for several seasons are going to be the best and most reliable quarterbacks, so length-of-tenure is a good proxy for the most skilled players.

The results are notable—the 4+ year starters have a uniformly higher peak QB rating than the group who saw significant action as rookies. In fact, the veteran QBs are responsible for all seasons with a QB rating above ~90, while no passer is given 4+ years in the league as a starter without at least one rating above 70.

These two distributions aren't just notably different, they're statistically distinguishable. The standard test for whether two samples of data come from the same underlying distribution is the Kolmogorov-Smirnov test (usually abbreviated as the KS test), which here says the two sub-samples are distinct with only an 8% chance of error. However, we know that the rookie-starter and long-tenured sub-samples are both drawn from the same parent distribution (the full sample), and there is clearly overlap between the two sub-samples, both of which make the standard KS test conservative.

The net result of these facts is that there is an even smaller chance of error than the KS test would indicate. A full Monte Carlo simulation indicates that there is actually a 99.3% certainty that these two distributions are distinct.
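A generic version of this kind of check can be sketched as a KS statistic plus a label-shuffling Monte Carlo. To be clear, this is a plain permutation test on invented peak ratings, not the overlap-aware simulation described above:

```python
import random

def ks_stat(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max distance between ECDFs."""
    points = sorted(set(a) | set(b))
    d = 0.0
    for x in points:
        fa = sum(v <= x for v in a) / len(a)
        fb = sum(v <= x for v in b) / len(b)
        d = max(d, abs(fa - fb))
    return d

def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Fraction of random label shufflings whose KS statistic is at least
    as large as the observed one."""
    rng = random.Random(seed)
    observed = ks_stat(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_stat(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical peak QB ratings for the two sub-samples:
rookie_starters = [58, 62, 66, 71, 74, 78, 80, 84]
veterans = [72, 78, 83, 86, 90, 95, 101, 110]
p = permutation_pvalue(rookie_starters, veterans)
```

An overlap-aware simulation would instead resample quarterbacks from the full parent sample, preserving the fact that some players land in both groups, and compare the observed statistic against that null distribution.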

Discussion and Conclusions
This result, in its barest form, means that the conditions that result in a quarterback starting as a rookie are different from those that lead to success in the NFL. Note that it does not necessarily mean that a rookie quarterback will not have a long, productive career; there is some overlap between the two sub-distributions. Nor does it mean that the solution to the problem is to make sure all rookie QBs stay far away from the field; the teams that are in a position to redshirt their first-year passers are also the ones most likely to already be in better position to protect their investments when they do make it to the field—for instance the Packers with Aaron Rodgers or the Patriots with Ryan Mallett.

But despite these caveats, these findings are still quite interesting, and indicate that quarterbacks who see significant rookie action tend to have lower ceilings than signal-callers who are rested at the start of their careers. This result indicates that teams who are looking to use a high draft pick on a (hopefully) franchise quarterback should resist the urge to play him right away, and maybe consider upgrading their other positions of need first before drafting their QB of the future. So the next time your favorite team passes on a flashy gun-slinger in order to draft a boring left tackle, don't judge them too harshly.
