It wouldn’t be outrageous to claim that my series pitting scouting and advanced stats on opposite ends of the player eval spectrum is just a giant straw man. The reality is that things aren’t that simple and scouting methods might have a dusting of advanced stats, while advanced stats would certainly benefit from the granularity that is being observed during all of the endless tape grinding.
So if I don’t actually believe that things are as divided as I might make them seem, let me propose a way to move the whole mess forward. Some of the tape grinders are already assigning scores to players on non-box score measures. For instance, after watching a good amount of tape, a scout may assign a player a rating on Balance, or Lateral Agility, or Vision, or [insert another attribute not included in the box score]. If that same scout has a number of years of ratings, guess what we can do with the ratings? We can backtest them against how the players actually turned out and assign coefficients. It’s possible that upon doing this, the scout might find out that 60% of the predictiveness of his ratings is tied up in his “Burst” score (just for example). At that point, the scout could begin weighting his Burst score to reflect the outsized role it had played in the predictiveness of his overall scoring. It’s also possible that a scout may find that he had previously been assigning importance to a measure that had zero predictive ability, so at that point he could either adjust the way that he recorded that measure, or he could save himself a bunch of time and stop measuring it.
Once a single scout’s ratings have been backtested and coefficients have been assigned to the individual ratings (of what we might otherwise consider to be skills that are difficult to measure), we can also compile the ratings of a number of scouts and then backtest each scout as to their individual predictive ability. Then it would be possible to come up with a composite player ranking with coefficients for each player evaluator. The final formula would look something like this:
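To make the backtesting idea concrete, here is a minimal sketch of how coefficients could be fit to one scout's historical ratings. Everything in it is an assumption for illustration: the ratings are made up, the "outcome" is synthetic (built from known weights so you can see them being recovered), and plain least squares with no intercept is just one simple way to do it. A real backtest would use actual player outcomes, many more players, and an out-of-sample check.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Zero out this column in every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_coefficients(ratings, outcomes):
    """Least squares without an intercept: solve (X^T X) w = X^T y."""
    k = len(ratings[0])
    xtx = [[sum(row[i] * row[j] for row in ratings) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(ratings, outcomes))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical historical ratings: (burst, vision, balance) per player.
ratings = [
    [8, 5, 6], [6, 7, 4], [9, 3, 7],
    [5, 8, 5], [7, 6, 9], [4, 9, 2],
]
# Synthetic outcome: 60% burst, 40% vision, balance contributes nothing.
outcomes = [0.6 * burst + 0.4 * vision for burst, vision, _ in ratings]

coefficients = fit_coefficients(ratings, outcomes)
for name, weight in zip(["burst", "vision", "balance"], coefficients):
    print(f"{name}: {weight:.2f}")
```

In this toy data the fit hands back roughly 0.6 for burst, 0.4 for vision, and zero for balance, which is the "stop measuring it" scenario described above.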
SCOUT_#1_RATING = BURST*.55 + LATERAL_AGILITY*.25 + VISION*.1 + BALL_SECURITY*.1
COMPOSITE_RATING = SCOUT_#1_RATING*.25 + SCOUT_#2_RATING*.40 + SCOUT_#3_RATING*.35
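As a quick sketch of how that two-level weighting would be computed, the snippet below uses the example weights from the formula for a three-scout composite. The attribute scores and the other two scouts' overall ratings are invented numbers, and the dictionary and function names are hypothetical.

```python
# Weights taken from the example formula; all scores below are made up.
SCOUT_1_WEIGHTS = {
    "burst": 0.55,
    "lateral_agility": 0.25,
    "vision": 0.10,
    "ball_security": 0.10,
}
COMPOSITE_WEIGHTS = {"scout_1": 0.25, "scout_2": 0.40, "scout_3": 0.35}


def weighted_rating(scores, weights):
    """Return the weighted sum of scores, one weight per key."""
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical attribute scores one scout gave a single prospect.
scout_1_scores = {
    "burst": 8,
    "lateral_agility": 6,
    "vision": 7,
    "ball_security": 5,
}
scout_1_rating = weighted_rating(scout_1_scores, SCOUT_1_WEIGHTS)

# Hypothetical overall ratings from the other two scouts.
scout_ratings = {"scout_1": scout_1_rating, "scout_2": 6.5, "scout_3": 7.0}
composite_rating = weighted_rating(scout_ratings, COMPOSITE_WEIGHTS)

print(f"Scout 1 rating: {scout_1_rating:.2f}")
print(f"Composite rating: {composite_rating:.2f}")
```

Both sets of weights sum to 1, so the composite stays on the same scale as the individual attribute scores.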
A reasonable question to ask at this point is whether it’s realistic to expect this process to get every prospect right. That’s actually not a reasonable expectation. But the good news is that the bar for this new process to get over is actually set so low that it could be stepped over. The NFL draft is already a pretty inefficient market, which means that there are a lot of improvements that can be made to the way that teams currently pick players.
The goal isn’t to get every prospect right, or to avoid missing out on Aaron Rodgers for instance. After all, Rodgers’ college stats wouldn’t jump off the page in an algorithmic selection scheme, and he obviously didn’t wow the scouting community given his late first round selection and the fact that when he was named the starter in Green Bay (after sitting on the bench for a few years) there were still questions about his arm strength. Part of the cost of doing business when you embark on a process to improve your results is that you are going to have to make peace with getting some things wrong, but you’ll do it because over the long run you’ll get more stuff right. Think of it like the data-driven serenity prayer. So you don’t throw out the whole process because the scheme you’re implementing would have missed the current best QB in the NFL (it could be argued that scouting failed on all three of the current best QBs in the NFL if Brady, Brees and Rodgers comprise that list).
The goal of implementing the scheme I describe above is to reduce errors in the aggregate. You’re trying to go from a process where the relationship between predicted and actual results looks like this:
to one in which the relationship between predicted and actual results looks like this:
It might actually be tough to see the difference, but the r-squared for the second graph is significantly higher than that of the first graph. The first graph’s r-squared of .15 is about what you can expect out of NFL teams when they draft WRs. The second graph has an r-squared of .60, which I think could be possible if you combined box score stats and the derivative scouting metric that I describe above.
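For anyone who wants to check a process of their own, r-squared here can be read as the squared correlation between predicted and actual results. The sketch below assumes that reading, and the numbers in it are invented purely to show the calculation. They are not the data behind the graphs.

```python
def r_squared(predicted, actual):
    """Squared Pearson correlation between predicted and actual values."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    covariance = sum((p - mean_p) * (a - mean_a)
                     for p, a in zip(predicted, actual))
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    spread_a = sum((a - mean_a) ** 2 for a in actual)
    return covariance ** 2 / (spread_p * spread_a)


# Made-up example: four prospects, predicted rank score vs. actual result.
predicted = [1, 2, 3, 4]
noisy_actual = [2, 1, 4, 3]      # predictions only loosely track results
perfect_actual = [2, 4, 6, 8]    # predictions track results exactly

print(f"noisy: {r_squared(predicted, noisy_actual):.2f}")
print(f"perfect: {r_squared(predicted, perfect_actual):.2f}")
```

Running the same function on a team's historical draft grades and on a backtested composite rating would give a like-for-like comparison of the two processes.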
If you’ve been reading this thinking that I’m basically talking about FiveThirtyEight for scouting, you’re right, and in fact that’s what I’ve been calling it in emails with the other RotoViz writers where we’ve been discussing it as a possibility. Stay tuned.