August 16, 2013 by Sean Childers in Analysis
The one statistic that teams ask me to help them with the most is plus-minus. Lots of very smart people in the Ultimate community talk about plus-minus, though some do it a bit more quietly than others.
Leaguevine made the bold move to equate player “Offensive Efficiency” and “Defensive Efficiency” ratings with the general concept of plus-minus – looking at the percent of points you won when starting on O-Line and D-Line, respectively. And one of the smartest Ultimate captains I know spent years tracking only one player-related number: plus-minus.
But Ultiworld hasn’t written about plus-minus yet, and that’s for a reason: I think the appeal of plus-minus is misplaced, and the metric is especially poorly suited for analyzing elite Ultimate teams as they are currently constructed.
What Is Plus-Minus And Why Is It So Appealing?
Plus-minus is very simple, and you could calculate it yourself. It compares how well your team plays with you on the field to how it plays without you.[1] Mechanically, every player first starts with a plus-minus of zero. Second, someone needs to record which players are playing each point. Third, someone needs to record the result of the point. If your team scored, all of the players who were playing get +1. If the opposition scored, all of the players who were playing get -1. Rinse and repeat for an entire game, tournament, or season, and you get a final plus-minus score.
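The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the function name `plus_minus`, the input format (a list of lineup and result pairs), and the player names are all assumptions made for the example, not part of any tracking system mentioned in this article.

```python
from collections import defaultdict


def plus_minus(points):
    """Tally raw plus-minus from (players_on_field, we_scored) records."""
    totals = defaultdict(int)  # every player starts at zero
    for players, we_scored in points:
        delta = 1 if we_scored else -1  # +1 if we scored, -1 if they did
        for player in players:
            totals[player] += delta
    return dict(totals)


# Hypothetical three-point sample with placeholder names and short lineups.
sample_points = [
    (["Ann", "Bo", "Cy"], True),
    (["Ann", "Bo", "Di"], False),
    (["Ann", "Cy", "Di"], True),
]
scores = plus_minus(sample_points)
print(scores)
```

Note that a player only collects a score for points they actually play, so nothing here adjusts for teammates, opponents, or whether the line started on offense or defense.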
The beauty and the appeal of plus-minus lie in its simplicity. What’s important in Ultimate? Winning games. What helps you win games? Scoring points. Over-analyzing past “winning points” inevitably results in focusing too much on some measures and too little on others – maybe you’re focusing too much on which players scored goals, which players got the positive yardage, or who has the highest Ultiworld D Rating.
If you have a defender that contributes value to your team by playing an amazing mark, then that player’s contribution should eventually be picked up in the plus-minus data – your team will be more likely to win points when that player plays because of his tough mark. An offensive cutter that does an outstanding job of clearing space for other cutters? Also not tracked in our UltiApps tracking system, but should eventually be reflected in higher plus-minus scores. In fact, plus-minus’s primary appeal is that it can track “off-disc” (or off-ball) involvement that is important, allowing certain players who may not blow up the stat sheet in traditional ways to nonetheless shine if they dominate on the little things (check out an old New York Times profile of Shane Battier for a great example of this).
So What’s the Primary Problem?
The most important problem with plus-minus is that pesky word “eventually,” which modifies all of the examples above. All statistics are subject to sample size issues. But statisticians analyzing plus-minus data in other sports have noted that it takes much longer to obtain consistent scores than it does with more traditional stats. For his first two seasons in the NBA, Kevin Durant “had some of the worst plus/minus numbers in the NBA”; one year later, as Grantland columnist Zach Lowe noted, he had one of the best.
The high variability in plus-minus scores has been, at times, borderline schismatic in the basketball analytical community, with ESPN mathematician Kevin Pelton listing the divide between plus-minus metrics and box-score metrics as the greatest unsolved question in the field.[2]
Intelligent people are able to use plus-minus metrics to analyze players, but they often do so by utilizing multiple seasons of data, performing “adjustments” and “regularizations” (which often require more than one NBA season [82 games!] worth of data), and using other numbers and subjective (video) analysis to confirm any plus-minus story.
We’re years away from being able to do that in Ultimate. Teams that use the UltiApps system may track ten or fifteen games at most. It would take about five Ultimate seasons to match one basketball season, and, even with a sample that size, we’d still have highly variable plus-minus results.
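To put rough numbers on the sample-size problem, here is a back-of-the-envelope sketch. It assumes each point is an independent coin flip with a fixed win probability, which real Ultimate points certainly are not; the function name and the point counts are illustrative choices, not measurements.

```python
import math


def plus_minus_std(n_points, win_prob):
    """Standard deviation of raw plus-minus over n independent points.

    Plus-minus is wins minus losses, i.e. 2 * wins - n_points, and under
    the (assumed) coin-flip model wins follows Binomial(n_points, win_prob).
    """
    return 2 * math.sqrt(n_points * win_prob * (1 - win_prob))


NBA_GAMES = 82               # one NBA season, from the text
ULTIMATE_GAMES_TRACKED = 15  # upper end of "ten or fifteen" tracked games
seasons_needed = NBA_GAMES / ULTIMATE_GAMES_TRACKED  # roughly 5.5

# Noise for a perfectly average player (wins half of their points).
noise_100 = plus_minus_std(100, 0.5)  # spread after 100 points played
noise_400 = plus_minus_std(400, 0.5)  # spread after 400 points played
print(seasons_needed, noise_100, noise_400)
```

Under these assumptions, a player who is exactly average would still land around +10 or -10 fairly often after 100 points, and quadrupling the sample only halves the noise per point played.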
What Makes Ultimate Even More Problematic for Plus-Minus?
Another issue is that, in its basic form, plus-minus can’t take your own team’s skill or the strength of your opponent into account. Anyone that reads Ultiworld could pick up with Team USA, stay out of the way, and watch as the other six studs on the field worked the disc to the endzone. At the end of your tournament with them, your plus-minus will probably look similar to Alex Snyder’s. But you’re not as good as her.
On the flip side, even Beau Kittredge will have a negative plus-minus score if he plays a tournament with an inexperienced team that loses all of its games. With the vast disparity in opponent and team quality in club Ultimate today, plus-minus is poorly suited for comparing players on different teams.
But even comparing two players on the same team has serious statistical issues. High-level Ultimate teams tend to play strict lines with tight rotations. This creates tons of player-to-player covariation, a term that basically means it will be hard to separate one player’s worth from another’s if they play a lot together. Imagine that Snyder and Kittredge play every single Team USA point together: They end the tournament with an identical plus-minus score of +10. This means they scored on 10 more points than they were scored on. What portion of that +10 is attributable to Snyder? To Kittredge? To a synergy of playing together? To one of the other five players that usually played with them? As you can see, plus-minus will be a bad proxy for explaining “who” caused any plus or minus.
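The Snyder and Kittredge hypothetical can be made concrete with a toy model. Assume, purely for illustration, that a lineup's strength is the sum of its players' individual values; the lineups, the extra players "A" and "B", and every number below are made up for the example.

```python
def lineup_value(lineup, player_values):
    """Toy additive model: lineup strength is the sum of player values."""
    return sum(player_values[player] for player in lineup)


# Hypothetical lineups in which Snyder and Kittredge always play together.
lineups = [
    ("Snyder", "Kittredge", "A"),
    ("Snyder", "Kittredge", "B"),
    ("Snyder", "Kittredge", "A", "B"),
]

# Two very different stories about who deserves the credit.
snyder_carries = {"Snyder": 10, "Kittredge": 0, "A": 1, "B": 2}
kittredge_carries = {"Snyder": 0, "Kittredge": 10, "A": 1, "B": 2}

values_1 = [lineup_value(lineup, snyder_carries) for lineup in lineups]
values_2 = [lineup_value(lineup, kittredge_carries) for lineup in lineups]
print(values_1, values_2)
```

Both stories predict exactly the same strength for every lineup that was actually observed, so no amount of data from these lineups can tell them apart. Only points where one of the two plays without the other would carry that information.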
Of course, a club team isn’t as rigid over the course of a season as that hypothetical. But it is about that rigid in close and competitive games. The more important takeaway is that a sample size of ten games – which was already probably too small for the reasons in the Durant anecdote discussed above – is much too small when your playing time patterns are non-random. Even comparing 7-man lineups (trying to figure out which D-Lines are most effective) is perilous without tons of data. There are some statistical techniques that can help with all of these problems, but I don’t believe the problems are solvable without vastly more games than a team is going to play in any one season.
Don’t Other Stats On Ultiworld Suffer From The Same Issues?
Yes, but probably to a much lesser extent. One of the projects we are working on is testing which stats are most valid to teams, coaches, and readers in situations when you only have a small sample. As part of that project, we looked at some Leaguevine stats, which include an adjusted offensive and defensive plus-minus measure. Those two metrics are two of the least stable numbers that Leaguevine tracks.
That article, one of our first steps in critically analyzing our own work, should present a more complete picture of what is good in a small sample and what should be viewed more skeptically. We have also been proactive in our past presentations about highlighting and warning about certain numbers, like some of our defensive, throwing, or yardage metrics that we think are especially volatile.
Our current approach is that, using the UltiApps tracking system, we track and calculate plus-minus – but we don’t even bother to graph it for teams. While the theory of plus-minus sounds very appealing at first, my opinion is that almost any level of faith in the metric is misguided.
[1] A good explanation, which I also read and borrowed many ideas from, is available here.
[2] “When it comes to player evaluation, there are two schools. There are stats derived from the box score — such as player efficiency rating, my WARP, Basketball-Reference.com’s win shares and Wages of Wins’ wins produced — and plus-minus statistics. The problem with plus-minus is that it tends to be highly variable from season to season, but the advanced techniques and multiple seasons utilized by regularized adjusted plus-minus have allowed it to predict future team performance nearly as well as the best box score stats.”