PhD Football: March 2014

Sunday, March 23, 2014

Classifying WRs, TEs, and RBs by Where They Catch the Ball

Abstract
Principal Component Analysis (PCA) is a useful tool to simplify complex datasets. The results of the PCA can be then used either to reconstruct the original data or to classify it into different groups. In this post I apply PCA to reception data for a sample of 150+ NFL receivers. I find that PCA generally does a good job of discriminating between wide receivers, tight ends, and running backs. A few tight ends, however – generally ones known more for their use as receivers than blockers – have significant overlap with the wide receivers. This result indicates that PCA may be useful for determining how to designate players, Jimmy Graham for example, for the franchise tag.

Introduction

Alright, so I lied. Well, partially – I am very busy with job applications, but I've also been teaching myself some new machine learning techniques (mostly from this excellent textbook) and they're just so damn cool that it's been hard not to think of ways to apply them to NFL data.

One of these methods is called Principal Component Analysis (PCA for short), and it's designed to reduce a large, complex dataset down into its most important pieces. These pieces (the 'component' part of PCA) can be used as basis functions to reconstruct the original data with minimal information loss, providing a form of data compression. Or, the coefficients for a given component can be compared between all the observations in a dataset, and trends in these coefficients may be used to classify the data into groups.

One of the great things about PCA is that it relies on no assumptions about how the data are distributed. This means that PCA can be used on just about anything. Something that is especially appropriate for a PCA is the distribution of yardage gained by a player every time they touch the ball. Credit where credit is due: the analysis in this post is partly inspired by Brian Burke of Advanced NFL Stats, who looked at the distribution of yards gained (or lost) on rush attempts in an effort to distinguish between power running backs and smaller, faster RBs. Burke (largely visually) compared the raw yardage histograms, and found that there were only small differences between each type of back. Burke suggests using a gamma distribution to parameterize these gains, although given the distinct rush distribution for each player he shows it seems unlikely that every running back will be well-represented by such a (relatively) simple model (to his credit Burke himself is quite upfront about this). PCA allows us to produce accurate representations of such data without choosing a distribution a priori, which means we don't have to worry about limiting or biasing ourselves by such a decision.

For this post I'll apply PCA to reception statistics rather than rush attempts. One reason for my choice is that Burke's analysis (despite the limitations I mentioned earlier) is pretty thorough, and I prefer to break new ground when I can. The other (more interesting) reason is that while most rush attempts come from a single position group (RBs), the target for a pass attempt can be a WR, TE, or RB. So in addition to looking for differences between possession receivers and home run threats it's also possible to see how the different positions are utilized on passing plays.

Data and Model

I queried my copy of the Armchair Analysis database (which spans the 2000-2011 seasons) and grabbed the yardage gained from every reception for each player with 200+ catches in the database. (I impose this reception threshold to ensure that statistical noise doesn't dominate the data.) The final sample consists of 114 wide receivers, 37 tight ends, and 33 running backs. The reception distribution of the total dataset is shown in Figure 1.

Figure 1: Distribution of all receptions in the sample. It has a strong peak at a gain of around 7-10 yards, with a long tail showing big passing plays.

I next computed the reception distribution for each player, then ran the PCA. The details of exactly how PCA works are beyond the scope of this blog, but I'll give a brief overview of the method here so that at least the general concept is (hopefully) clear.

First off, each player's reception distribution is normalized so receivers with more catches don't bias the analysis, and the mean yardage distribution for the whole dataset is subtracted. From this point the algorithm gets to work, computing a function which minimizes the variation in the residual data. This process is repeated, and each successive iteration accounts for more and more of the fine details of the dataset. Eventually (when the number of iterations approaches the number of players in the sample) the PCA will perfectly reproduce the original data. Of course, that sort of exact duplication isn't the point of PCA; rather since the most variation is explained by the first components, the goal is to truncate the algorithm after only N iterations, where N is much smaller than the number of players in the dataset.

The script I wrote to do this analysis can be found here. It's a fairly long program, but a large chunk of it is just to make the diagnostic plots to show how well the PCA worked – the meat of the PCA happens between lines 107-115.

Results

Figure 2: The first four components of the PCA as a function of reception yards. The first component is the average of all the players, while subsequent components have been computed by the PCA algorithm. Components beyond the second and third are very jagged, signs that they are fitting individual player variation rather than useful information.

I ran the PCA on the reception data to N = 15, but a look at the first four components (Figure 2) indicates that after the first couple of iterations the PCA is mostly fitting differences in the reception distributions for individual players. I can prove (hopefully) this to you via Figure 3.

Figure 3: Sample PCA reconstruction for Anquan Boldin, showing both the original reception distribution (in black) and a reconstruction using the first three PCA components (red). The reconstruction generally does a good job of mimicking the data even with only three components.

In addition to providing the maximal reduction in variance, the PCA also provides a list of coefficients for each component. These coefficients can be used with the components to produce a reconstruction of the original data – in the case of Figure 3, for Anquan Boldin. You can see that just the first three PCA components are required to recover a fairly good representation of Boldin's catch distribution – consistent with what the shape of the components indicated in Figure 2.

Now that we have verified that the PCA is working as intended, we can get to the good stuff – using the PCA to differentiate between players. As I mentioned earlier the data contain WRs, TEs, and RBs. A plot of the coefficients of the first PCA component (PCA1), color-coded by player position, is shown in Figure 4.

Figure 4: The distribution of the first PCA coefficient. Note how WRs are cleanly separated from RBs, while TEs partially overlap with WRs.

This figure is quite striking – running backs all cluster with (relatively) large coefficients, while nearly every wide receiver has negative values for PCA1. Tight ends tend to fall in the middle, although there is substantial overlap with the wideouts. What this means is that there is something inherently different about where each position grouping tends to catch the ball (and by extension, what routes they run). This is not inherently surprising, given it's fairly easy to see this just by watching how players at the different positions move during a game.

Discussion and Conclusions

What is interesting, however, is the fact that tight ends and wide receivers aren't as cleanly separated from each other as they are from running backs. In fact, while TEs and WRs are clearly not drawn from the same distribution, there is definitely some overlap. This implies that some TEs are being used more like wideouts. Additional evidence for this hypothesis comes from looking at which tight ends are most and least 'wide receiver-like'. Table 1 lists the top and bottom five TEs, sorted by PCA1.

Table 1: Tight Ends with Extreme PCA1 Values
Most WR-like		Least WR-like
Name	PCA1	Name	PCA1
Owen Daniels	-3.8x10^-2	Steve Heiden	6.8x10^-2
Antonio Gates	-2.9x10^-2	Donald Lee	5.2x10^-2
Tony Gonzalez	-2.4x10^-2	Bubba Franks	4.6x10^-2
Marcedes Lewis	-1.4x10^-2	Eric Johnson	3.9x10^-2
Tony Scheffler	-7.8x10^-3	Freddie Jones	3.7x10^-2

The left side of Table 1 generally contains TEs, most notably Antonio Gates and Tony Gonzalez, who are able pass-catchers. The right-hand side, however, consists of players generally not known for their receiving ability. It seems prudent to reiterate here that I'm not claiming that PCA1 is a predictor of skill in any way; rather it merely indicates that some tight ends are being used more like wide receivers than others.

Aside from being a cool result on its own, it also provides a way to classify players based on a statistic that's directly comparable between positions. This is especially relevant right now, as New Orleans Saints TE Jimmy Graham attempts to be treated as a wideout for the purposes of contract negotiation. You can read up on the details for yourself, but the upshot is that if Graham can get himself classified as a WR he can earn himself an extra $5 million over what he would get as a TE. A lot of the discussion has centered around statistics, such as where Graham lines up before the snap or how many receptions he had last year, that aren't directly comparable between wideouts and tight ends.

Unfortunately my data isn't current enough to actually include Graham in this sample (ditto Rob Gronkowski, just FYI), but I would bet that he winds up in the same regime as Gates and Gonzalez. Regardless of whether my intuition is correct, however, PCA provides a way to directly compare players at different positions based only on very basic data, and therefore it could be a very useful tool for position disputes like these.

Monday, March 17, 2014

Hiatus

Hi everyone!

As some of you may know, I'm on the hunt for a new job – outside of academia. Unfortunately, sending out applications takes a lot of time, and despite my best efforts I can't keep doing football analytics and give the job search the attention it needs. So I'm putting PhD Football on hiatus while I figure out what I'm doing next. Once I get my work situation in order I'll get back to the blog – hopefully soon! (And if you happen to know of anyone hiring science PhDs I'd love to hear about it.)

Monday, March 3, 2014

First Down Probability

Abstract
In this post I compute the First Down Probability metric, which predicts how likely a drive will produce at least one more first down for a given down and distance. I find similar overall first down conversion rates to prior studies in the literature, including that third down rushing plays are significantly underutilized. Unlike previous studies, however, I break down these rushing plays by the position of the ballcarrier, and find that a significant portion of this discrepancy comes from rushes by the quarterback, likely from scrambles on broken passing plays. More puzzling is the fact that QB runs on first and second downs don't show this trend, a result that is difficult to convincingly explain.

Introduction

During the course of a football game a fan gets a lot of statistical information. These numbers – QB rating, a running back's average yards per carry, time of possession, etc – generally lack any kind of contextual information about how the game is actually going. At best these statistics are incomplete (showing a WR's average yards per catch after a 99-yard completion, for instance); at worst, they're downright misleading (That QB just had 5 completions in a row...but they were all screens for minimal yardage).

A better statistic is one that takes the game situation into account. For instance, a 5-yard completion should count for more on third and 4 than on third and 16. There are several such statistics already in existence, such as Football Outsider's DVOA metric or Expected Points. These sorts of metrics generally depend on using historical play-by-play data to compute average outcomes for plays at any given down and distance. This approach is (unsurprisingly) more computationally complex, and often can appear opaque to the casual fan. Some of these stats, such as the DVOA, are intricate enough that their creators have decided to keep the full details of their computation private.

A direct and (relatively) simple context-sensitive statistic is Brian Burke's First Down Probability, which I will abbreviate as FDP. That link has more details, but the core insight of this metric is that the average odds of converting the next first down in a series can be estimated for any given down and distance. With this information in hand, it's possible to evaluate the result of a play based on whether it improves or harms the offense's chance of eventually getting a first down.

In this post I'm going to compute the FDP for the plays in the Armchair Analysis database. One may ask why I would recompute this quantity when Burke has already done quite a good job of it. One reason is to ensure the reproducibility of results – while I trust Burke's analysis, everyone makes mistakes. A more basic reason is that while Burke produces a nice visualization of his computed FDP he doesn't provide his data in a tabular form, which makes using his FDP values difficult (at best). I can also extend the FDP calculation to all four downs (Burke only considers second and third downs in his post). Finally, I can (spoiler alert) start using FDP to generate new insights about how teams approach different down-and-distance situations.

Data

As I mentioned before, I'm using the Armchair Analysis database, which covers the 2000-2011 NFL seasons. I grabbed the play-by-play data for all regular season and playoff games, then filtered out plays for several reasons. Plays inside the two minute warnings were discarded because teams play differently in those situations; I removed plays when the game wasn't close (defined as one team being up by more than 16 points) for the same reason. I cut out all punts and field goals as well as penalties (although I keep the results of the penalties in the data: if a team runs for -5 yards on second down but then is the beneficiary of a 15-yard roughing the passer call on third down, the second down play would be considered as ultimately resulting in a first down for the purposes of this analysis). Finally, to avoid biasing the data based on field position I only include plays between the offense's own 10-yard line and the redzone.

Ultimately this results in a dataset of 262,601 plays, split 56%-44% in favor of passes over runs. I bin these plays as a function of current down and yards to go, eliminating bins with fewer than 200 plays in my dataset. This cut ensures that there are no bins with conversion rates dominated by sampling error. The Python script I used to do this data querying and processing (as well as produce the plots in later sections) can be found here.

Results

Figure 1: FDP as a function of down and distance. The colors denote different downs, while the line styles break down success if the next play in the drive is a run or a pass. In some cases the data for the individual types of plays does not cover the same range of yards to gain. This is due to the minimum play cutoff detailed in the Data section.

Figure 1 shows the raw results, split by down and distance. For the benefit of anyone looking to check my results or to build on them I have also tabulated these results in text files, which can be obtained from my GitHub repository. Feel free to use them as long as you explain where you got them from (and a link back here would be nice as well!).

Anyway, the first thing to do is to check my results with what Burke obtained. It's a bit difficult to compare directly since I can only eyeball our plots, but in general my results seem to be fairly copacetic with his. The data between downs look fairly similar, with a ~15% shift each down as you go from first and N to second and N, increasing to ~20% from second to third down. There's not much data on fourth down, but I see no reason why it wouldn't resemble the other downs for conversion attempts beyond 2 yards.

More interesting is what happens when you break the conversion percentages down by type of play. Note that when comparing the FDP of runs versus passes at a given down and distance, a higher conversion rate for e.g. a pass doesn't necessarily mean you should always throw the ball in that situation; rather, it implies that currently NFL teams are not playing at the Nash equilibrium. This means that NFL teams should call more passing plays in that situation than they currently do; as defenses adjust to this new reality, there should be more opportunities for successful rushing plays, and eventually the FDP of both types of plays will equalize. Burke has some more detailed discussion of this in his breakdown of first down probability for runs and passes (although he restricts his analysis to third downs).

So again we are treading on old ground, and again it makes sense to compare results. Here we find a bit of a discrepancy, with Burke's rushing FDP on third and short ~5% lower than mine. It's not clear why this would be, although it might be due to the fact that Burke's data only goes through the 2007 season or how he considers sacks (the Armchair Analysis database considers sacks to just be really crappy passes). Regardless, things appear similar enough to proceed.

It's clear that teams aren't passing enough on first and second downs with more than 5 yards to go. Considering teams are already passing a lot in those situations, especially in second and 10+, this would imply that even the occasional rush in such circumstances is too much.

In short yardage, however, things are reversed. On second and 3 or less teams are running less often than they 'should', although the difference is only at about 7% or so. Third down is even more striking: whenever there are fewer than 9 yards to go the data indicate that teams should be running more. This is an even larger discrepancy than Burke finds, and is downright shocking given how unusual 7+ yard runs are under normal circumstances.

But there are two kinds of runs – designed runs and aborted passing plays. Burke considers the latter category to be rare enough to be inconsequential, but I wasn't so certain. So I modified my program to separate out rushes by the position of the ballcarrier – it can't tell if a QB rush was designed that way or if it was improvised, but it's better than nothing.

Figure 2: FDP, corrected for the influence of QB runs (the uncorrected rushing percentages are shown in gray to facilitate direct comparison).

Figure 2 shows the result, and it turns out that without the QB involved a third down rush becomes a much worse proposition. Indeed, now teams should only be running more on third and 3 or less, consistent with what the data show for second down.

While teams are generally doing better at finding the equilibrium between passes and rushes with RBs, these results indicate that teams are letting their signal-callers run the ball far too infrequently. If you look at the conversion rates just for QB scrambles it's generally 10% or more higher than a rush from a running back in the same situation! Even more interesting is that this offset only applies on third down. On first and second down a QB scramble appears to have similar conversion rates as a regular rush.

Discussion and Conclusions

First of all, the fact that QB rushes are so underused compared to other types of plays is quite interesting. Given the fact that teams generally do not want their prize passers taking hits down the field, most of these successful conversions are likely due to scrambles on passing attempts. But given how high the conversion rate is perhaps coaches should consider running a few more QB draw plays, especially with all the mobile passers entering the league.

But what's really weird is that QB's rushes aren't more successful than the regular variety on earlier downs. A possible explanation is that defenses are more keyed toward stopping shorter-yardage plays on second down, whereas on third down they sit back and follow the WRs down the field. But in that case you would expect third down rushes to be equally successful, regardless of the runner. I think it's more likely that on second down a QB under pressure isn't concerned with making the sticks, but rather simply looks to get out of trouble. On third down, however, the consequences of playing it safe are much more clear, which encourages passers to scramble for every last yard.

Of course, I'll be the first to admit that this is just speculation. A definitive analysis of this phenomenon would probably require deep analysis of individual quarterback scrambles, which is way beyond the scope of this work. But it is a cool result from a (relatively) simple metric, and illustrates how deep insights can be gleaned from just a little bit of intelligent digging.

PhD Football

Pages