If you must forecast, then forecast often — and be the first one to prove yourself wrong.
— Paul Saffo
Things are constantly changing in the NFL and the fantasy football landscape. As a site, we’ve changed our minds on many things and have come to discover that the ways we’ve understood certain elements in the past no longer hold, for whatever reason. FantasyDouche once explained the proper attitude of the fantasy analyst using Paul Saffo’s language to describe good forecasting: “strong opinions, weakly held.”
In Saffo’s original formulation, the reason strong opinions should be held only weakly is not simply that everything is subject to change and you can always be proven wrong. It’s good advice specifically because it’s our strongest opinions that are most likely to be the result of confirmation bias:
One of the biggest mistakes a forecaster — or a decision maker — can make is to overrely on one piece of seemingly strong information because it happens to reinforce the conclusion he or she has already reached. . . . Lots of interlocking weak information is vastly more trustworthy than a point or two of strong information.
I present this as a disclaimer because I have at least one strong opinion which I continue to hold strongly. My strong opinion is that most of the production metrics we find valuable in wide receiver evaluation are best thought of as thresholds or rules of thumb, rather than continuous variables. They are most useful as heuristics in a “checking all the boxes” exercise — like the one that helped us hit on Justin Jefferson when he was a late first-round pick in dynasty rookie drafts. But in case this opinion is the result of confirmation bias — hitting on Jefferson is precisely the sort of event that would lead to such bias — then we should probably try to test it.
My other strong opinion — and this is one I share with most in the fantasy community — is that yards per route run is an important metric for WR forecasting. Does YPRR display this sort of threshold behavior that I’ve described? And if so, where should we draw the line? And perhaps most importantly, does YPRR give us additional information even after we know a player’s draft position?
Heuristics and the Challenge of Sophisticated Data Models
In the past we’ve used sophisticated models to find which thresholds matter and where we should draw the line. However, there’s value in taking a simpler approach, both because it’s easier to interpret the results and because it illuminates how each metric impacts a player’s NFL outlook. A simpler approach will help us see just which metrics actually follow this threshold behavior and which might be more profitably regarded as continuous.
Moreover, the sophisticated models exhibit an inherent fragility, largely due to the small sample of NFL WRs under investigation. To be sure, that underlying limitation is difficult to counteract. But models already prone to overfitting handle it poorly, even when using methods designed to make the results more robust.
To give a concrete example, some random forest models we’ve used in the past indicate that touchdown scoring doesn’t really matter. At the same time, past work on the importance of touchdown scoring individually has found it to be a strong indicator, especially of otherwise unforeseen downside. Max Mulitz’s work on using touchdowns per game as a threshold continues to inform my process, and even years after it was published would have helped us to avoid busts that we were otherwise high on like Jalen Reagor, Quentin Johnston, Laviska Shenault, and Lynn Bowden Jr. (I’m sure it’s a coincidence that Reagor and Johnston went to the same college. On the other hand Josh Doctson was also a first-round bust out of TCU and he easily passed the TD-per-game threshold. Jack Bech is a fade for other reasons, but the history of overdrafted TCU receivers can’t be much comfort to his supporters.)
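The touchdown-rate screen described above amounts to a simple pass/fail check. A minimal sketch of that kind of box-checking heuristic follows; note that Mulitz's actual cutoff isn't stated here, so the 0.5 TDs-per-game value is a placeholder, and the player names and stat lines are illustrative rather than real prospect data.

```python
# Hypothetical screen: flag prospects below a college TDs-per-game cutoff.
# The 0.5 cutoff is a placeholder; Mulitz's actual threshold isn't stated here.
TD_PER_GAME_CUTOFF = 0.5

def td_flag(touchdowns, games, cutoff=TD_PER_GAME_CUTOFF):
    """Return True if the prospect clears the touchdown-rate threshold."""
    return games > 0 and touchdowns / games >= cutoff

# Illustrative stat lines: (touchdowns, games), not real players.
prospects = {
    "Player A": (12, 13),
    "Player B": (4, 12),
}
passes = {name: td_flag(td, g) for name, (td, g) in prospects.items()}
```

Used this way, the metric never enters a regression as a continuous input; it simply tells you whether a box is checked.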
How to Simplify the Process Without Dumbing It Down
Whether other metrics may follow this same pattern or provide similar sensitivity to busts is still unclear, but we can gain insight by looking at some of our favorite metrics in greater depth. We can also somewhat overcome the problems inherent in a limited sample size by borrowing a validation method from the more sophisticated models.
One of the most straightforward and effective methods for addressing the problem of overfitting, and one of the most widely used in machine learning applications, is known as cross-validation. The basic idea is to randomly split your sample into subsets, train your model on one portion, and test it on the held-out portion. The key step is to repeat this process multiple times across different random splits. By testing across multiple folds (different random subsets of the data), we can guard against overfitting without requiring a large sample.
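The split-train-test-repeat loop described above can be sketched in a few lines. This is a generic k-fold scaffold, not the article's actual model: the toy "model" below is just a training-set mean of YPRR, scored by absolute error on the held-out fold, and the sample values are made up for illustration.

```python
import random

def cross_validate(data, fit, score, n_folds=5, seed=0):
    """Shuffle the sample, split it into n_folds subsets, and for each
    fold train on the remaining folds and test on the held-out one."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    scores = []
    for i in range(n_folds):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = fit(train)
        scores.append(score(model, test))
    return scores

# Toy example: the "model" is the training-set mean YPRR, scored by
# mean absolute error on the held-out fold.  Values are illustrative.
sample = [1.2, 1.8, 2.4, 0.9, 2.1, 1.5, 2.8, 1.1, 1.9, 2.2]
fit = lambda train: sum(train) / len(train)
score = lambda model, test: sum(abs(x - model) for x in test) / len(test)
fold_errors = cross_validate(sample, fit, score)
```

A stable conclusion is one that survives every fold, not just the full-sample fit; that is the property we lean on below.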
Is Yards Per Route Run a Box Worth Checking?
In all the charts below, I’ve transformed YPRR into a percentile for ease of comparison across metrics. Then I’ve aggregated the NFL output of every player who meets or exceeds each percentile threshold. In other words, the 1st percentile includes every single player in the (in-fold) sample. The 90th percentile includes only the top 10% of players by YPRR. At each higher percentile, we remove more low performers from the sample to see precisely where a metric begins to make a difference in outcomes.
While this method doesn’t allow for direct comparison of one percentile group against another (because the lower percentiles encompass all players at the higher percentiles), it does enable us to identify a percentile threshold at which the hits start to outweigh the misses. If these metrics do function as thresholds rather than continuous variables, then we’d expect to see a clear inflection point. What we’re looking for is the area on the chart where the line stops being flat and begins to curve upward.
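The cumulative-group construction just described can be made concrete with a short sketch. This is an assumed re-implementation of the charting method, not the article's code: the sample is random YPRR and points-per-game values standing in for real player data, which isn't reproduced here.

```python
import random

def threshold_curve(players):
    """Average the NFL output (here, fantasy points per game) of every
    player at or above each YPRR percentile threshold.  Groups are
    cumulative: the 0 threshold keeps the whole sample, while the 90th
    keeps roughly the top 10% of players by YPRR."""
    ranked = sorted(players, key=lambda p: p["yprr"])
    n = len(ranked)
    for i, p in enumerate(ranked):
        p["pctile"] = 100.0 * i / (n - 1)   # percentile rank by YPRR
    curve = {}
    for pct in range(0, 100, 5):
        group = [p for p in ranked if p["pctile"] >= pct]
        curve[pct] = sum(p["ppg"] for p in group) / len(group)
    return curve

# Hypothetical sample: random values stand in for real prospect data.
rng = random.Random(1)
sample = [{"yprr": rng.uniform(0.8, 3.2), "ppg": rng.uniform(4.0, 18.0)}
          for _ in range(200)]
curve = threshold_curve(sample)
```

Plotting `curve` against its percentile keys is exactly the exercise in the charts below: with real data, a flat stretch followed by an upward bend marks the threshold; with the random placeholder data here, the line should stay roughly flat.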
Looking for this inflection point across multiple folds gives us a more robust sense of whether such a threshold really exists and where the most useful cutoff lies. That said, it's important to remember that these are rules of thumb, and drawing the line too precisely can often lead to mistakes.
Final YPRR and Fantasy Scoring
To see what I mean, consider the charts below.