RotoDoc holds a Ph.D. in mathematics and tackles sports data to give you a competitive edge in your fantasy games.
The regular season is over. For many, that means a serious case of fantasy football withdrawal. But for certain die-hard types like me, there are still three more weekends of daily fantasy left before we turn our attention toward the 2015 season. With a full season of data at my disposal, I took the time to build a data-driven model that projects fantasy production to help me create my lineups. Projections are often overlooked by novice players, but they are extremely important in daily fantasy: normalizing expected scoring by a player’s salary helps us find cheap value plays and distinguish a single player from a group of similarly priced players.
The model I created uses a machine learning algorithm called a random forest, which is trained on the regular-season data and then applied to this weekend’s wild card games. Once we have those projections, we can use them alongside our favorite daily fantasy site’s player salaries to find the top value plays for this weekend’s games.¹
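I haven’t published my actual model, so here is only a minimal, plain-Python sketch of what a random forest regressor does, for readers who want to see the mechanics. Each tree is grown on a bootstrap resample of the training rows, only a random subset of features is tried at each split, and the forest’s projection is the average of the trees’ predictions. The function names, hyperparameters, and toy features below are hypothetical; in practice you would reach for a library implementation such as scikit-learn’s `RandomForestRegressor`.

```python
import random
from statistics import mean


def _sse(values):
    """Sum of squared errors around the mean of `values`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, rng, max_depth=4, min_leaf=2, n_features=None):
    """Grow a regression tree; each leaf stores the mean target of its rows."""
    n_features = n_features or len(X[0])
    if max_depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None  # (split error, feature index, threshold)
    # The "random" part: only a random subset of features is tried at each split.
    for f in rng.sample(range(len(X[0])), n_features):
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            err = _sse(left) + _sse(right)
            if best is None or err < best[0]:
                best = (err, f, thr)
    if best is None:
        return mean(y)
    _, f, thr = best
    kids = []
    for goes_left in (True, False):
        idx = [i for i, row in enumerate(X) if (row[f] <= thr) == goes_left]
        kids.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng,
                               max_depth - 1, min_leaf, n_features))
    return (f, thr, kids[0], kids[1])


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=50, seed=0, **tree_kwargs):
    """Bagging: every tree is trained on a bootstrap resample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, **tree_kwargs))
    return forest


def predict_forest(forest, row):
    """The forest's projection is the average of the individual tree predictions."""
    return mean(predict_tree(tree, row) for tree in forest)


# Hypothetical toy data: [last-6 average, season average] -> actual points
X = [[5, 6], [8, 7], [12, 11], [15, 14], [20, 18], [3, 4], [10, 9], [18, 17]]
y = [6, 7, 13, 14, 21, 3, 10, 17]
forest = fit_forest(X, y, n_trees=25, max_depth=3, min_leaf=1, n_features=1)
print(round(predict_forest(forest, [16, 15]), 1))
```

Averaging many decorrelated trees is what keeps a random forest from overfitting as badly as a single deep tree would.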
The model takes into account all sorts of factors, the most important being recent fantasy production. To find the optimal number of games for the recent-history data point, I took each player’s average fantasy production over his last X games and plotted it against his actual fantasy points. The table below shows that the average over a player’s previous six games was the best predictor of actual fantasy production, as measured by the R-squared value of a simple linear regression.
| Last 1 | Last 2 | Last 3 | Last 4 | Last 5 | Last 6 | Last 7 | Last 8 |
|---|---|---|---|---|---|---|---|
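The window comparison described above can be sketched as follows, assuming each player’s season is a list of weekly DraftKings scores in chronological order. For each window length k, it pairs every game’s actual score with the player’s trailing k-game average and computes the R-squared of a simple linear regression, which equals the squared Pearson correlation. The helper names and the game logs are hypothetical, not my production code.

```python
from statistics import mean


def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def window_pairs(game_logs, k, start):
    """(trailing k-game average, actual points) for every game from index `start` on."""
    xs, ys = [], []
    for points in game_logs.values():
        for i in range(start, len(points)):
            xs.append(mean(points[i - k:i]))
            ys.append(points[i])
    return xs, ys


def best_window(game_logs, max_k=8):
    """Score windows 1..max_k on the same target games; return the best k and all scores."""
    scores = {}
    for k in range(1, max_k + 1):
        xs, ys = window_pairs(game_logs, k, start=max_k)
        scores[k] = r_squared(xs, ys)
    return max(scores, key=scores.get), scores


# Hypothetical weekly DraftKings scores for three players
logs = {
    "player_a": [3, 5, 4, 6, 8, 7, 9, 10, 12, 11, 13, 12],
    "player_b": [15, 18, 14, 20, 17, 19, 16, 21, 18, 22, 20, 19],
    "player_c": [1, 0, 2, 4, 3, 2, 5, 4, 6, 5, 4, 7],
}
k, scores = best_window(logs)
print(k, {w: round(s, 3) for w, s in scores.items()})
```

One design choice worth noting: every window is scored on the same target games (those after week `max_k`), so a longer window isn’t penalized or rewarded simply for being evaluated on fewer early-season games.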
This result is actually quite intuitive. The most recent game or two is usually not enough to accurately determine future production, but as we climb back farther into a player’s history, the predictive power increases. However, when we go back too far, older results become less predictive, presumably for reasons such as personnel and scheme changes.
That said, I don’t throw out the rest of that useful data. The model also uses production from the most recent weeks to capture possible changes in usage, as well as each player’s average production over the whole season. Combined with other metrics, including match-up quality and Vegas lines, these inputs give us a highly predictive model that we can use for fantasy purposes.
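To make the list of inputs concrete, here is a hypothetical sketch of assembling one player’s feature row. The two-game "recent weeks" window, the field names, and the idea of encoding Vegas lines as an implied team total are all assumptions for illustration; the implied-total formula itself is the standard one, where `spread` is the team’s own line and is negative when it is favored.

```python
from dataclasses import dataclass
from statistics import mean


def implied_team_total(game_total, spread):
    """Vegas implied points for one team; spread is that team's line (-7 = 7-point favorite)."""
    return (game_total - spread) / 2


@dataclass
class Features:
    last6_avg: float    # the six-game window that tested best
    last2_avg: float    # "more recent weeks"; the 2-game length is an assumption
    season_avg: float   # average over every game this season
    matchup_adj: float  # points the opponent allows above a position's season average
    team_total: float   # implied team total from the game total and spread


def build_features(points, matchup_adj, game_total, spread):
    """Build one hypothetical feature row from a chronological list of weekly scores."""
    return Features(
        last6_avg=mean(points[-6:]),
        last2_avg=mean(points[-2:]),
        season_avg=mean(points),
        matchup_adj=matchup_adj,
        team_total=implied_team_total(game_total, spread),
    )


f = build_features([10, 12, 8, 14, 9, 11, 13, 15], matchup_adj=1.5, game_total=48.0, spread=-7.0)
print(f)
```

For example, a 7-point home favorite in a game with a 48-point total has an implied team total of (48 + 7) / 2 = 27.5 points.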
This model isn’t an end-all, be-all projection; rather, it’s one more data point for you to use when building your rosters. I still rely on the various GLSP projection apps as another data point, especially because they’re great at filtering out irrelevant games, something I’m working on improving in my model.
Still, I’m pleased with how well the model fits the season-long data it was trained on. Note that the distribution of fantasy points is highly skewed, so I use a Box-Cox transformation to normalize the data, as displayed in the graph below, and then transform the predictions back to the original point scale for the final projections. For the whole 2014 season, the model explains around 60 percent of the variation in fantasy points across all players (an R-squared of roughly 0.6), despite injuries and unexpected depth chart replacements undoubtedly throwing off some of the data points. It’s less predictive on new data, explaining about 30 percent of the variation, but it’s a work in progress.
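The Box-Cox step mentioned above can be sketched in a few lines. Box-Cox only accepts positive values, and fantasy scores can be zero or negative, so this sketch adds a hypothetical `SHIFT` before transforming and subtracts it after inverting; that offset, the lambda grid, and the sample scores are assumptions rather than my actual settings. Lambda is chosen by maximizing the standard Box-Cox profile log-likelihood (SciPy’s `scipy.stats.boxcox` does the same job in a library).

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of a positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive inputs")
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam


def inv_boxcox(z, lam):
    """Map a transformed value back to the original scale."""
    return math.exp(z) if lam == 0 else (lam * z + 1) ** (1 / lam)


def boxcox_llf(ys, lam):
    """Profile log-likelihood of lambda (population variance of the transformed data)."""
    n = len(ys)
    zs = [boxcox(y, lam) for y in ys]
    mz = sum(zs) / n
    var = sum((z - mz) ** 2 for z in zs) / n
    return (lam - 1) * sum(math.log(y) for y in ys) - n / 2 * math.log(var)


def fit_lambda(ys, grid=None):
    """Pick lambda from a grid (default -2.0 to 2.0 in steps of 0.05)."""
    grid = grid or [i / 20 for i in range(-40, 41)]
    return max(grid, key=lambda lam: boxcox_llf(ys, lam))


# Hypothetical right-skewed DraftKings scores: many small games, a few big ones
scores = [0.0, 1.5, 2.0, 3.2, 4.0, 5.5, 6.1, 8.0, 9.4, 12.0, 15.3, 22.8, 31.0]
SHIFT = 1.0  # hypothetical offset so zero-point games stay positive
lam = fit_lambda([s + SHIFT for s in scores])
z = [boxcox(s + SHIFT, lam) for s in scores]        # train the model on these
back = [inv_boxcox(v, lam) - SHIFT for v in z]      # projections go back through here
print(lam)
```

The model is trained on the transformed targets, and its predictions are passed through `inv_boxcox` (then un-shifted) to get back to DraftKings points.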
The model has a few other limitations, all of which I’m working to fix for next year to provide even greater predictive value. I need to incorporate injury status updates and depth chart replacements, as well as a data-driven filter, much like the one the GLSP projection apps use, to better screen out irrelevant games. Additionally, the model is currently tuned for DraftKings scoring, since that’s the site I mainly use; I’ll soon implement a more flexible version that accommodates any scoring system.
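One way to make scoring flexible is to keep the rules in data rather than code, so switching sites means swapping a dictionary. The weights below are assumed DraftKings-style NFL values for illustration only (two-point conversions, return touchdowns, and other edge cases are omitted); check them against the site’s current rules before relying on them. The `fantasy_points` helper is hypothetical.

```python
# Assumed DraftKings-style NFL weights (illustrative; verify against the site's rules)
DK_WEIGHTS = {
    "pass_yds": 0.04, "pass_td": 4, "int": -1,
    "rush_yds": 0.1, "rush_td": 6,
    "rec": 1, "rec_yds": 0.1, "rec_td": 6,
    "fumble_lost": -1,
}
# (stat, threshold, bonus points) yardage bonuses, also assumed
DK_BONUSES = [("pass_yds", 300, 3), ("rush_yds", 100, 3), ("rec_yds", 100, 3)]


def fantasy_points(stats, weights, bonuses=()):
    """Score one stat line under an arbitrary set of per-stat weights and bonuses."""
    pts = sum(weights.get(stat, 0) * value for stat, value in stats.items())
    pts += sum(bonus for stat, threshold, bonus in bonuses
               if stats.get(stat, 0) >= threshold)
    return pts


line = {"rec": 8, "rec_yds": 105, "rec_td": 1}
half_ppr = {**DK_WEIGHTS, "rec": 0.5}  # a different site's rules is just another dict
print(fantasy_points(line, DK_WEIGHTS, DK_BONUSES))
print(fantasy_points(line, half_ppr, DK_BONUSES))
```

Because the model’s training targets would be recomputed from raw stats with whichever weights you pass in, the same pipeline could then produce projections for any site.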
So how does the model view this weekend’s games? For one, Cole Beasley looks like a great value pick if you’re playing in a GPP. He’s averaged nearly 12 DraftKings points per contest over his last six games, has an average match-up, plays for a home favorite with a fairly high implied team total of 27.75 points, and costs a mere $3,700. Antonio Brown also looks like a must-play in both GPP and cash-game formats against a Baltimore secondary that, on average, allows wide receivers to score 4.5 more points than their season average.
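The Baltimore figure above is a match-up metric: how many points a defense allows relative to what opposing players normally score. Here is a minimal sketch, assuming a game log of (player, position, opponent, points) rows plus a dictionary of season averages; the helper and every name and number in the example are hypothetical, not real 2014 data.

```python
from statistics import mean


def matchup_adjustment(games, season_avgs, defense, position):
    """Average points a defense allows above each opposing player's season average.

    games: iterable of (player, position, opponent, points) tuples
    season_avgs: dict mapping player -> season-average points
    Simplification: the season average here includes the game being scored.
    """
    diffs = [pts - season_avgs[player]
             for player, pos, opp, pts in games
             if opp == defense and pos == position]
    return mean(diffs) if diffs else 0.0  # no games against this defense -> neutral


# Hypothetical game log and season averages
games = [
    ("WR1", "WR", "BAL", 20.0),
    ("WR2", "WR", "BAL", 14.0),
    ("WR3", "WR", "PIT", 9.0),
    ("TE1", "TE", "BAL", 8.0),
]
season_avgs = {"WR1": 15.0, "WR2": 10.0, "WR3": 12.0, "TE1": 7.0}
print(matchup_adjustment(games, season_avgs, "BAL", "WR"))
```

A more careful version would exclude the game itself from each player’s season average (a leave-one-out average) so that a single huge game doesn’t dampen its own contribution.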
Where does the model fall short? I believe it underrates Greg Olsen, punishing him for lackluster production over the past two weeks as well as a low implied team total of 21.75 points. Arizona is absolutely awful against the tight end position, and I expect Cam Newton to lean on Olsen in the passing game. With that said, the model still projects him as the highest-scoring TE in the wild card round, just not the best in points per thousand dollars (P/$). Since we can find cheap value elsewhere to offset his salary, I’ll probably have significant exposure to him in all formats. The model is also projecting Le’Veon Bell as if he were fully healthy. We know that’s not the case, and once we get some clarity on his status, I’ll apply an adjustment to his projection.
| Name | Position | Team | Opp | DK Salary | Proj. Pts | P/$ (pts per $1K) |
|---|---|---|---|---|---|---|
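The P/$ column is projected points divided by salary in thousands of dollars. This short sketch, using hypothetical players, shows why the top projected scorer at a position (as with Olsen) need not top the P/$ ranking.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    salary: int        # DraftKings salary in dollars
    projection: float  # projected DraftKings points


def points_per_k(p):
    """Projected points per $1,000 of salary (the P/$ column)."""
    return p.projection / (p.salary / 1000)


def rank(players, key):
    """Player names sorted from best to worst by `key`."""
    return [p.name for p in sorted(players, key=key, reverse=True)]


# Hypothetical tight ends
tes = [
    Player("TE A", 6000, 15.0),  # highest projection, 2.5 P/$
    Player("TE B", 3500, 10.5),  # lower projection, 3.0 P/$
    Player("TE C", 4000, 9.0),   # 2.25 P/$
]
print(rank(tes, key=lambda p: p.projection))
print(rank(tes, key=points_per_k))
```

For scale, Beasley’s six-game average of nearly 12 points at $3,700 works out to about 12 / 3.7 ≈ 3.2 points per $1,000, though that uses his recent average rather than the model’s projection.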