Thanks to the hard work of Mike Beers, we’ll be releasing an updated version of the Running Back Prospect Lab this week. In a nutshell, the Lab allows users to choose from a number of variables, create a model from these variables, and then use this model to project an RB’s fantasy output at the professional level.
You can find the full-fledged version of the tool here as well as a streamlined and mobile-friendly version of it here. Note that the mobile version includes a predefined model and does not require you to build a model of your own.
To show how to use the tool properly, let’s run through the entire process and get a sense of what the start of D’Andre Swift’s career may look like.
Step 1: Choose Target Variable
Before we can even begin to consider results, we need to determine a “target” — what it is that we’re looking to predict. The Lab allows users to select one of three targets.
- Total points scored in first three seasons
- Points per game in first three seasons
- Total points scored in the best of a player’s first three seasons
For the purposes of this post, we’ll target total points scored in Swift’s first three seasons.
Step 2: Load Player Stats
Click into the box under Step 2, backspace until you’ve deleted the name of the default player, and type until you see the name of the player you’re interested in. This will automatically load his stats into the fields in Step 3. If you can’t find the player you’re looking for, don’t worry. You can manually enter his stats into the fields in Step 3. To do this, select “none” in Step 2 and then click into the available boxes under Step 3 and enter the corresponding stats.
Step 3: Select the Inputs to Be Included in Model
Step 3 is where the heavy lifting occurs. In this step, we’ll be selecting up to nine variables to utilize in our model. You may be inclined to assume that more is better and want to include every variable, but often this is not a sound approach. Including too many variables can detract from the impact of those that are most predictive while also introducing unnecessary noise into the calculations. Though the adjusted R-squared of the model, which we’ll talk about later, will often increase as you add variables, this does not mean that you’re necessarily improving the predictive value of the model. You may have heard the term “overfitting” before; that’s what can happen if you keep layering on variables. For this reason, we’ve instructed the tool to allow a maximum of nine variables. Don’t feel pressured to use nine in every model. Experiment with various configurations by toggling the “on/off” boxes. After making tweaks, click on the “update” button. This tells the Lab that changes have been made, and it will update the model.
As the NFL Draft and combine have yet to occur, I’ll have to make a couple of guesses and manually input them into the “player attributes” section of Step 3. Swift turned 21 in January so I’ll use that for his “final college age” and set his “weight” to 216. “Draft position” plays a major role in NFL success for RBs. For a back like Swift, who could be drafted in the late first or early second, draft position is a necessary variable.1 In a number of the mocks that I’ve reviewed, Swift has been selected in the mid-40s, so I’ll set his draft position to 46. I really don’t know what time he’ll record in the forty-yard dash at the combine but given some research of his high-school days, a 4.48 seems reasonable.2
The Lab allows users to select stats from a player’s entire career or final collegiate season. I prefer to look at a player’s entire body of work, but if I were inclined to do so, I could include a mixture of career and final-season stats. I’ll definitely be spending more time toggling variables in the coming weeks, but below is a simple model that uses four variables that we can use as an example. It turns out that I didn’t use age, weight, or forty time, as none of them proved to mesh well with this model.
Remember that you should toggle between “yes” and “no” for the “Power 5 Conference” input if you are manually entering a player.
Step 4: Evaluate Model
As I played around with the variables, I made sure to check in on Step 4 after each update. If you’re not sure what the information in the two tables below the graph is telling you, here’s the very compressed summary. “Coef” is the coefficient: a number that describes the mathematical relationship between the variable and the thing we are trying to predict (the dependent variable), in this case, fantasy points. The coefficient signifies how much the mean of the dependent variable changes given a one-unit change in the independent variable (while holding other variables in the model constant).
The model that we are looking at uses 295 fantasy points as a baseline, adds a bonus of 24 points for playing in a Power 5 conference, subtracts 0.88 points for every increase of one draft spot, adds 1.56 points for every rushing yard per game, and subtracts 4.87 points for every rushing attempt per game.
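The arithmetic above is just a linear equation: intercept plus coefficient times input for each variable. Here is a minimal sketch of that calculation, using the example model’s coefficients; the rushing-stat inputs passed in at the bottom are hypothetical placeholders, not Swift’s actual numbers.

```python
# Coefficients from the example model described in the article.
INTERCEPT = 295.0        # baseline fantasy points
POWER5_BONUS = 24.0      # added if the player was in a Power 5 conference
DRAFT_COEF = -0.88       # per draft spot (later picks project lower)
RUSH_YPG_COEF = 1.56     # per rushing yard per game
RUSH_ATT_COEF = -4.87    # per rushing attempt per game

def project_points(power5, draft_position, rush_ypg, rush_att_pg):
    """Linear-model projection: intercept plus coefficient * input
    for each included variable."""
    return (INTERCEPT
            + POWER5_BONUS * power5
            + DRAFT_COEF * draft_position
            + RUSH_YPG_COEF * rush_ypg
            + RUSH_ATT_COEF * rush_att_pg)

# Hypothetical inputs: Power 5 back, pick 46, 70 rushing yards
# and 10.2 attempts per game.
print(project_points(1, 46, 70, 10.2))
```

Note how bumping attempts per game from 10.2 to 11.2 while holding everything else fixed drops the projection by exactly the 4.87-point coefficient, which is what “holding other variables constant” means in practice.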
Wait, what? Why would it subtract points for more rushing attempts? Don’t we want our RBs to get more carries — the better the back the more carries he gets? While that certainly makes intuitive sense, what the model might be picking up on is efficiency. A back who produces 100 yards on 15 rushing attempts is likely preferable to one who compiles 100 yards on 25 attempts. By rewarding yards gained and penalizing attempts, the regression is rewarding more efficient players.
While playing with the Lab, you might see a really high coefficient for something like receiving yardage market share. Keep in mind that this variable runs from 0 to 1. So the actual change in the projection from bumping up the variable by 0.01 (i.e., a change of one percentage point, from, say, a 10% market share of receiving yards to an 11% share) is only one-hundredth of the reported coefficient. A helpful thing to do is to increase or decrease an input and see how it impacts the projection. In the model we’re using for this article, if I increase Swift’s attempts per game to 11.2, his projected points fall by 4.87.
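The scaling point is easy to verify with quick arithmetic. The coefficient value below is purely hypothetical, chosen just to show why a big number attached to a 0-to-1 input is less dramatic than it looks.

```python
# Hypothetical coefficient for receiving-yard market share, a 0-to-1 input.
REC_SHARE_COEF = 2400.0  # looks enormous at first glance

# A one-percentage-point bump (a 10% share -> an 11% share) is a change of
# 0.01 in the input, so the projection moves by only 1/100th of the coefficient.
delta = round(REC_SHARE_COEF * (0.11 - 0.10), 6)
print(delta)  # 24.0 points, not 2400
```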
The more important measure to focus on is “Pr(>|t|)” or the p-value. The p-value helps us determine whether a variable is statistically significant, that is, whether it’s a reliable predictor. Basically, it helps us judge whether an effect we’re seeing in the model reflects an actual relationship or just random noise. The closer the value is to zero, the better. In general, try to keep the p-value of your selected variables under 0.05. I wouldn’t sweat it if one or two of my variables slightly encroach beyond this point, but your focus should be to limit these values. For example, a p-value of 0.35 means that if the variable truly had no relationship with fantasy points, you’d still see an effect at least this large about 35% of the time purely by chance. That’s far too plausible to trust, so remove that variable from your model.
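The “how often would chance alone produce this?” idea behind a p-value can be illustrated with a simple shuffle test, sketched below using only the standard library. This is not how the Lab computes “Pr(>|t|)” (which comes from a t-distribution); it’s just a hands-on way to see the same concept.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Estimate a p-value by shuffling: the fraction of random re-pairings
    whose |correlation| is at least as large as the one we observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    ys = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(ys)
        if abs(pearson_r(xs, ys)) >= observed:
            hits += 1
    return hits / n_shuffles
```

A predictor with a genuine relationship to the target yields a p-value near zero here, while a pure-noise predictor will usually land well above 0.05, which is exactly why we drop it.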
You’ll also want to focus on the adjusted R-squared. In simple terms, adjusted R-squared allows for comparing the goodness of fit between models that contain different numbers of variables. The higher the adjusted R-squared, the stronger the relationship between the variables included and the thing we are trying to predict. Ideally, we want the points on the graph to form a line, though we shouldn’t expect to come close to doing so. What we should do is review the change in the adjusted R-squared of our model after each change that we make. Remember, however, that we want to be careful about building in too many variables just to raise the number. Often, a five-variable model is more sound than a nine-variable one.
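The reason adjusted R-squared can drop when a weak variable is added is that it charges a penalty per variable. A sketch of the standard formula makes this concrete; the R-squared values and sample size below are made up for illustration.

```python
def adjusted_r_squared(r2, n_obs, n_vars):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    where n is the number of observations and k the number of variables.
    Each extra variable shrinks the denominator, penalizing the score."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_vars - 1)

# With 100 players, a sixth variable that nudges plain R-squared from
# 0.600 to 0.602 still LOWERS the adjusted value: the tiny bump doesn't
# pay for the extra degree of freedom.
five_var = adjusted_r_squared(0.600, 100, 5)
six_var = adjusted_r_squared(0.602, 100, 6)
print(five_var, six_var)
```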
By toggling a variable on or off and reviewing its impact on adjusted R-squared we can get a better idea of whether it’s a meaningful input. I normally do this by first limiting to just one or two of the variables I know I want to include and experimenting from there. As I include an increasing number of variables, I’ll monitor the changes in adjusted R-squared. Pretty quickly, things will reach a point where toggling variables on or off has little to no impact. This is why I like to review variables on their own first. Try this out yourself and notice the major difference between including weight, rushing yards per game, and rushing touchdowns per game versus weight, rushing yards per game, and draft position.
One thing that I’d recommend you limit is variables that can “overlap.” For example, I wouldn’t advocate selecting rushing touchdowns per game, receiving touchdowns per game, and all-purpose touchdowns per game. The first two inputs are encapsulated in the third and are therefore redundant. Ideally, we want our included variables to be independent of each other.
This naturally leads to the question: “so how do I know which model is better or which to use if I can create four or five with a similar adjusted R-squared while limiting their p-values?” In cases like this, I like to review the results across models and see if there is significant variation. Often, players will fall in a similar range in all the models created, especially if you include draft position.
While the example model is simple, it pushes the adjusted R-squared to one of the higher levels achievable without including unnecessary or counterproductive variables.
Given the variables I included, Swift looks like a solid prospect who can be expected to score somewhere around 330 points in his first three NFL seasons. The list of players with similar projections under this model is really encouraging, and a number of them recorded multiple fantasy-relevant seasons.
The “Subject Player” table displays the variables we included, Swift’s numbers for these inputs, a projection of the total points he will produce in his first three seasons, and a projected percentile. His projected percentile of 80 means that only 20% of players are projected to score more points in their first three seasons according to the model we built.
The “Players With the Closest Projections” table lists the 12 players with the closest scores given the model. These players aren’t “comps,” but players that the model projects to have about the same number of points in their first three seasons. As you can see, several players, notably Ray Rice, beat their expectations. That’s not to say that Swift is guaranteed to have success. The model projects Bishop Sankey favorably, and we all know how his career panned out. Nonetheless, this is still an encouraging projection for Swift as it places him in a range with some solid company.
Play around with the lab, workshop different configurations, and have fun reviewing the results!