PhD Football: 2016

tl;dr: I made a Python package to compute NFL Win Probability - given a specific game state, what are the odds the offensive team will go on to win the game? Code on GitHub, documentation on Read the Docs, or just 'pip install nflwin'.

One of the most common advanced statistics used by NFL analysts is Win Probability. Put simply, Win Probability (WP for short) is an estimate of the likelihood that, given a specific game state, one team will go on to win the game. For example, at the very start of a game between evenly matched opponents each team's WP will be very close to 50%, while a team up by 20 points with a minute left to go will have a WP of essentially 100%. Down, distance, field position, and other variables can also be added to the model in order to produce an extremely granular WP estimate.

While WP alone is a useful tool for condensing the myriad variables surrounding the game state into a single, easily interpretable number, it becomes even more useful when compared across plays. The difference in WP between two plays (also known as Win Probability Added, or WPA) provides a way of measuring how effective a given play was at helping your team win. Instead of grading a running back's performance based on rushing yards or yards-per-attempt, for instance, summing the WPA from each rushing attempt automatically produces a statistic which gives more importance to a 2-yard rush on a critical fourth-and-one than to a 7-yard draw play on third-and-18.
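To make the WPA idea concrete, here is a minimal sketch of summing WPA per player. The play log, player labels, and WP values are all made up for illustration; they are not output from any real model.

```python
from collections import defaultdict

# Hypothetical play log: (rusher, offense WP before the play, offense WP after).
# The numbers are invented for illustration, not produced by a WP model.
plays = [
    ("RB A", 0.55, 0.63),  # e.g. a 2-yard conversion on fourth-and-one
    ("RB B", 0.40, 0.39),  # e.g. a 7-yard draw on third-and-18
    ("RB A", 0.63, 0.61),
]


def total_wpa(plays):
    """Sum Win Probability Added (WP after minus WP before) for each player."""
    totals = defaultdict(float)
    for player, wp_before, wp_after in plays:
        totals[player] += wp_after - wp_before
    return dict(totals)


wpa = total_wpa(plays)
for player, value in sorted(wpa.items()):
    print(f"{player}: {value:+.2f}")
```

In this toy log the short fourth-down conversion is worth far more than the longer gain that failed to move the chains, which is exactly the weighting that raw rushing yards cannot capture.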

Despite its easy interpretability, which is relatively rare in the world of advanced statistics, WP is not a straightforward calculation like yards-per-rush or even QB rating. WP isn't based on a simple formula; rather it requires one to build a detailed model based on historical data. This model can be quite complex, both in terms of the specific data used to construct it and in the choice of model itself. As a result computing WP from scratch is not feasible for a large number of would-be analysts. That's why I built NFLWin.
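As a rough illustration of what "a model based on historical data" means at its very simplest, the sketch below estimates WP as the observed win fraction among past plays with the same quarter and score differential. This is a deliberately naive stand-in, not the model NFLWin uses, and the history list is invented; a real model would use many more features and some form of smoothing.

```python
from collections import defaultdict

# Hypothetical historical plays: (quarter, offense score differential, offense_won).
history = [
    (4, 7, True), (4, 7, True), (4, 7, False),
    (4, -7, False), (4, -7, False), (4, -7, True),
    (1, 0, True), (1, 0, False),
]


def fit_empirical_wp(history):
    """Map each (quarter, score differential) state to its observed win fraction."""
    wins = defaultdict(int)
    counts = defaultdict(int)
    for quarter, score_diff, offense_won in history:
        counts[(quarter, score_diff)] += 1
        wins[(quarter, score_diff)] += int(offense_won)
    return {state: wins[state] / counts[state] for state in counts}


wp_table = fit_empirical_wp(history)
print(wp_table[(4, 7)])  # win fraction when leading by 7 in the 4th quarter
```

Even this toy version hints at the difficulty: any state that never appears in the history has no estimate at all, which is why real models generalize across states instead of looking them up.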

NFLWin is a Python package designed to make estimating WP robust yet simple. It provides a simple interface for pipelining raw data through all the steps necessary to compute WP along with great documentation that covers installation and use. The code is fully open-source so anyone can inspect its guts or modify it to suit their purposes, and while it includes a WP model to make it easy for anyone to get going right away, NFLWin also includes utilities and instructions for rolling your own model if you so choose.

NFLWin is far from the first effort to compute Win Probabilities for NFL plays. Brian Burke at Advanced NFL Analytics was one of the first to popularize WP in recent years, writing about the theory behind it as well as providing real-time WP charts for games. Others have picked up on this technique: Pro Football Reference (PFR) has their own model as well as an interactive WP calculator, and the technique is offered by multiple analytics startups.

So why create NFLWin? Well, to put it bluntly, while there are many other analysts using WP, they're not publishing their methodologies and algorithms or quantifying the quality of their results. This information is critical in order to allow others both to use WP themselves and to validate the correctness of the models. Brian Burke has never discussed any of the details of his WP model in any depth (and now that he's at ESPN, that situation is unlikely to improve any time soon), and analytics startups are (unsurprisingly) treating their models as trade secrets. PFR goes into more detail about their model, but it relies on an Estimated Points model that is not explained in sufficient detail to reproduce it.

Possibly the best description of a WP model comes from Dennis Lock and Dan Nettleton, who wrote an academic paper outlining their approach and results. Lock and Nettleton's paper provides information regarding the data source used to train the model, the type of model used, the software used to build the model, and some statistics indicating the quality of the model. It even includes a qualitative comparison with Brian Burke's WP estimates. This is far and away the most complete, transparent accounting of the guts of a WP model and is laudable. However, as often happens in academia, none of the code used to build and test their WP model is available for others to use; while in principle it would be possible for anyone to recreate their model to build on or validate their work, this would require building their entire pipeline from scratch based off of dense academic prose.

"But Andrew", you may say, "What about the PFR online WP calculator you mentioned only two paragraphs ago? Surely we can just use that instead of having to create our own." Well, unfortunately there are two main problems with that approach:

1. If you ever want to programmatically compute WP you'll need to write a web scraper to do so. The end result will require the user to be online and, like most web scraping, will be fairly brittle - if PFR changes their website your scraper has a good chance of breaking. Not optimal.
2. There is something obviously wrong with the PFR calculator. Go to the calculator page and ask it to tell you the WP for a tie game with zero point spread, with 5:01 to go in the 4th quarter and the offense at first-and-goal from the 5. You'll see that their model gives the offense a 50% chance of winning the game. Now compute the WP for the same exact situation but with 5 minutes left to play - one less second than before. Suddenly the WP prediction has jumped to 76.69%, an increase of over 25 percentage points just from having one fewer second on the clock!

While the first issue is unpleasant, the second is a huge problem. I don't know whether it's a buggy implementation or a bad underlying model, but this discontinuity makes no sense. If PFR posted its algorithms publicly it would be possible to diagnose the problem. If their code were on GitHub I could even patch it and contribute the fix back.
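With access to a model's code, this kind of discontinuity is easy to hunt for automatically. The sketch below scans the clock one second at a time and flags any large jump in WP. The `toy_wp` function is a hypothetical stand-in that simply mimics the two numbers quoted above; it is not PFR's model, and the 0.10 threshold is an arbitrary choice.

```python
def find_jumps(wp_func, start_seconds, end_seconds, threshold=0.10):
    """Return (seconds_left, wp_before, wp_after) wherever WP changes by more
    than `threshold` when a single second comes off the clock."""
    jumps = []
    previous = wp_func(start_seconds)
    for seconds_left in range(start_seconds - 1, end_seconds - 1, -1):
        current = wp_func(seconds_left)
        if abs(current - previous) > threshold:
            jumps.append((seconds_left, previous, current))
        previous = current
    return jumps


def toy_wp(seconds_left):
    """Hypothetical stand-in: 50% with more than 5:00 left, 76.69% otherwise."""
    return 0.50 if seconds_left > 300 else 0.7669


jumps = find_jumps(toy_wp, start_seconds=310, end_seconds=290)
print(jumps)
```

Running the same scan over every down, distance, and score combination would show whether a jump like this is an isolated glitch or a systematic problem.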

This lack of transparency is endemic in the field of sports analytics. By not publishing their methodologies and the code behind them, analysts are failing the reproducibility test as well as their readers, who trust them to provide honest and unbiased stats. I get that controlling access to these algorithms can represent a competitive advantage, but frankly it's impossible to trust any analysis when there's no way to assess its accuracy or even verify that it's not flat-out wrong. How correct is Brian Burke's model for a given game state? Is the PFR model buggy just in this one case or is it pathologically incorrect? There's no way to tell.

NFLWin doesn't have that problem. Anyone can inspect the code to look for bugs, and accuracy measurements are built into the model. To be completely honest the default model in this initial release isn't particularly good - plotting the expected WP based on an aggregated validation set against that predicted by the model shows clear deviations from perfection (see below) - but if you want to use it you can see exactly how much you should trust the model, and it's now possible to quantify improvements made as time goes on and the model is iterated upon.

The default model in Version 1.0.0. Note the deviations from perfect predicted WP.
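For readers unfamiliar with this kind of validation plot, the sketch below shows one common way to build the underlying numbers: bin plays by predicted WP, then compare the average prediction in each bin with the fraction of those plays where the offense actually won. This is an illustrative assumption about the general technique, not necessarily how NFLWin computes its own plot, and the sample data is invented.

```python
def calibration_table(predicted, outcomes, n_bins=10):
    """Bin plays by predicted WP and return, for each non-empty bin,
    (mean predicted WP, observed win fraction, number of plays)."""
    bins = [[] for _ in range(n_bins)]
    for wp, won in zip(predicted, outcomes):
        index = min(int(wp * n_bins), n_bins - 1)  # wp == 1.0 goes in the top bin
        bins[index].append((wp, won))
    table = []
    for rows in bins:
        if not rows:
            continue
        mean_predicted = sum(wp for wp, _ in rows) / len(rows)
        observed = sum(1 for _, won in rows if won) / len(rows)
        table.append((mean_predicted, observed, len(rows)))
    return table


# Hypothetical validation data: predicted WP and whether the offense won.
predicted = [0.12, 0.18, 0.55, 0.52, 0.91, 0.95]
outcomes = [False, False, True, False, True, True]
table = calibration_table(predicted, outcomes, n_bins=5)
for mean_predicted, observed, count in table:
    print(f"predicted {mean_predicted:.2f}  observed {observed:.2f}  n={count}")
```

A perfectly calibrated model would have the predicted and observed columns match in every bin; the gaps between them are the "deviations from perfect" visible in the figure.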

The OSS community has shown time and time again the value to be gained from open development - not only is there direct benefit to the public but having more eyes on the project leads to better code. By creating NFLWin I hope to not only empower others to produce robust, reliable WP estimates but also to use the knowledge of others to build a better tool than I could construct on my own.

So check NFLWin out. Read through the documentation. Install it and play around. Post an issue if something is missing or wrong. And, of course, contributions are welcome :).



Thursday, September 1, 2016

Introducing NFLWin: An Open Source Implementation of NFL Win Probability
