One thing we can do with the college football database is visualize relationships between college stats and pro stats. I did a little of that this morning by downloading some CSV files from the CFB quarterback app and running correlation matrices against pro measures. I did enough to report back that some relationships exist, and also that coming up with conclusive projections is going to be tough sledding.
To illustrate, here’s a correlation plot with some pro measures across the top and some college measures down the side. The numbers in the squares are correlations expressed as whole numbers from –100 (perfect negative correlation) to 100 (perfect positive correlation). You can see that CFB INTRT has a negative correlation with a number of pro measures, which we would expect. CFB yards also has a positive correlation with a number of pro measures. The problem is that both sets of correlations are pretty weak. I’m toying around with the data to see whether various splits yield more predictive measures, but again, I expect this to be tough sledding. Some of the correlations do increase or decrease depending on how I adjust the defensive filters and the down-and-distance filters, so there’s at least some reason for optimism that there are answers in the database.
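For anyone who wants to try something similar, here is a minimal Python sketch of how a college-by-pro correlation table like the one described above could be built. It is not necessarily how the plot was produced. The column names (`cfb_int_rate`, `cfb_yards`, `nfl_rating`) are hypothetical, and the toy rows are made up purely to show the mechanics; they are not real player data. It assumes each row holds one quarterback's college and pro numbers, in the shape `csv.DictReader` would give you.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a column never varies
    return cov / math.sqrt(var_x * var_y)


def cross_correlation_table(rows, college_cols, pro_cols):
    """Return {college_col: {pro_col: correlation scaled to -100..100}}.

    A cell is None when there are too few usable rows or no variation.
    """
    table = {}
    for c in college_cols:
        table[c] = {}
        for p in pro_cols:
            # Use only rows where both measures are present.
            pairs = [
                (float(r[c]), float(r[p]))
                for r in rows
                if r.get(c) not in (None, "") and r.get(p) not in (None, "")
            ]
            if len(pairs) < 2:
                table[c][p] = None
                continue
            xs, ys = zip(*pairs)
            r_value = pearson(xs, ys)
            table[c][p] = None if math.isnan(r_value) else round(100 * r_value)
    return table


# Made-up toy rows (NOT real data), chosen so the relationships are perfectly linear.
rows = [
    {"cfb_int_rate": "1.0", "cfb_yards": "4000", "nfl_rating": "90"},
    {"cfb_int_rate": "2.0", "cfb_yards": "3000", "nfl_rating": "80"},
    {"cfb_int_rate": "3.0", "cfb_yards": "2000", "nfl_rating": "70"},
    {"cfb_int_rate": "4.0", "cfb_yards": "1000", "nfl_rating": "60"},
]

table = cross_correlation_table(rows, ["cfb_int_rate", "cfb_yards"], ["nfl_rating"])
print(table)
```

With real data the cells would land well inside the –100 to 100 range rather than at the extremes. To explore splits, you could filter `rows` (for example by down and distance) before calling `cross_correlation_table` and compare the resulting tables.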