Recently I wrote about my ongoing attempt to find an accurate way to age-adjust players once they’re in the NFL. This work is far from definitive, but I like to write about it while it’s still in progress because the act of writing helps me clarify the questions I want to ask next. One thing I’m starting to see is that age adjustments have the most utility early in a player’s career, matter less during a plateau of a few years, and then become important again late in the career.
There are several moving parts at play, such as production, age, experience, and draft position, and some of them might not relate linearly to fantasy points. So I thought it might be interesting to run a regression restricted to rookie receivers and see which variables have been significant in the past at explaining year two PPR scoring. After looking at a number of combinations of variables (mostly swapping the production variables in and out), this is the best fit I found for the rookies who played in the 2000-2014 time period.
DEXP (draft expectation) is a transformed version of draft position that accounts for draft position having a nonlinear effect on fantasy scoring. DEXP in this case is also specific to players who have played their rookie season but no other seasons. YDS is the player’s receiving yards per game in the rookie season, GMS is the total games played in the rookie season, and AGE is simply the season minus the player’s year of birth. Ideally a regression would use an age variable that is observed more precisely than that, but using year of birth saves me a lot of research on actual dates of birth.[1]
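As a rough illustration of how the four inputs described above could be assembled for one rookie, here is a minimal Python sketch. The `RookieSeason` record, its field names, and the sample numbers are all hypothetical. DEXP is treated as a value computed elsewhere, because the transformation behind it isn't spelled out in this post.

```python
from dataclasses import dataclass


@dataclass
class RookieSeason:
    name: str        # placeholder label, not a real player
    season: int      # year of the rookie season
    birth_year: int  # year of birth only, not the full date
    rec_yards: int   # total receiving yards in the rookie season
    games: int       # games played in the rookie season
    dexp: float      # transformed draft position, computed elsewhere (assumed)


def features(r: RookieSeason) -> dict:
    """Return the four predictors for one rookie season."""
    if r.games <= 0:
        raise ValueError("a rookie season needs at least one game played")
    return {
        "DEXP": r.dexp,
        "YDS": r.rec_yards / r.games,     # receiving yards per game
        "GMS": r.games,
        "AGE": r.season - r.birth_year,   # year-level age, can be off by up to a year
    }


# Made-up numbers purely to show the shape of the output.
example = RookieSeason("Hypothetical WR", 2014, 1992, 800, 16, 55.0)
row = features(example)
```

Because AGE is built from year of birth alone, two players with the same AGE value can differ in actual age by almost a full year, which is the imprecision noted above.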
I still need to redo this process in a way that will let me validate the model against a test set, but for now I just wanted to offer a look at how these variables would impact valuation by showing the year two fantasy points the model would predict for the 2015 rookie class. Again, just to be clear, this isn’t an actual prediction because I haven’t validated the model against a test set.
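One way the validation step could look is to hold out the most recent rookie classes, fit on the earlier ones, and measure error only on the held-out seasons. The sketch below is a hedged illustration of that idea, not the method used for the numbers in this post. The data is a noise-free toy set with arbitrary coefficients (not estimates from real players), and the helper names `fit_ols`, `rmse`, and `split_by_season` are made up for the example.

```python
import math


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y.
    Each row of X must already start with 1.0 for the intercept."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def rmse(coefs, X, y):
    """Root mean squared prediction error on the given rows."""
    errs = [sum(c * v for c, v in zip(coefs, row)) - t for row, t in zip(X, y)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def split_by_season(rows, first_test_season):
    """Earlier seasons train the model, later seasons test it."""
    train = [r for r in rows if r["season"] < first_test_season]
    test = [r for r in rows if r["season"] >= first_test_season]
    return train, test


def design(rows):
    X = [[1.0, r["yds"], r["age"]] for r in rows]
    y = [r["pts"] for r in rows]
    return X, y


# Toy data: arbitrary, noise-free relationship so the result is easy to check.
rows = []
for i in range(45):
    yds = 20.0 + (i * 7) % 60
    age = 21.0 + i % 4
    rows.append({"season": 2000 + i % 15, "yds": yds, "age": age,
                 "pts": 30.0 + 2.0 * yds - 5.0 * age})

train, test = split_by_season(rows, first_test_season=2012)
coefs = fit_ols(*design(train))
test_error = rmse(coefs, *design(test))
```

With real data the held-out error would not be near zero as it is for this toy set. The point of splitting by season rather than at random is that it mimics the real task of projecting a rookie class the model has never seen.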
There is one major takeaway I have after this exercise: even when you control for draft position and rookie production, age has still been a significant variable in the historical data. The other takeaway, just from looking at this table, is how big an impact opportunity has on generating the independent variables in the model. If Rishard Matthews doesn’t get injured, then the production numbers helping DeVante Parker are going to look a lot worse. Doing our own offensive projections in a few months (after rookies and free agents are known commodities) will still be very important to arriving at a good estimate for 2016 production. Usage is the conditional probability that looms large between any understanding we might have of age, or draft position, and actual fantasy scoring, which is all we care about.
I’ll be continuing this work in the coming weeks because there’s still a lot of wood to chop. For instance, the way the transformed draft variable is constructed can change the results, and that’s an interesting thing to look at. In addition, I don’t show weight in this model, but it was pretty close to being significant. Recall that I said a more closely observed age variable would be ideal; the same thing applies to weight. How many times during a player’s career do we get a good, accurate weight measurement? Not many, and yet lots of players add and lose weight over their careers. So weight is already pretty close to being significant even though it is barely observed at all. If I were a betting man, I would say that a more closely observed weight is more likely to end up more significant in the model than less significant.
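The intuition about loosely observed variables matches a standard textbook result, sketched here for the simplest case only. It assumes a single predictor and classical measurement error (noise that is independent of the true value and of the regression error), which may not describe how weight is actually recorded. The symbols are introduced just for this illustration.

```latex
% True relationship, with the predictor x observed only as a noisy x^*:
y = \alpha + \beta x + \varepsilon, \qquad x^{*} = x + u
% Regressing y on x^* gives a slope that is shrunk toward zero:
\operatorname{plim} \hat{\beta}
  = \beta \cdot \frac{\sigma_x^{2}}{\sigma_x^{2} + \sigma_u^{2}}
```

The noisier the measurement (larger $\sigma_u^2$), the more the estimated effect is pulled toward zero, so a cleaner measurement would be expected to show a stronger effect. With several predictors in the model the picture is more complicated, so this is a direction of reasoning rather than a guarantee.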
1. Year of birth is already in my dataset.