Baseball Batting Hall of Fame Predictor


With the start of May and unfortunately no baseball, I decided to pass the time by seeing how well an ML model could predict who will enter the Hall of Fame.

Important Notes:

  • Criteria are offense-based only (no defense or pitching yet)
  • The training cutoff is 1995
  • The model takes into account the number of years played, so someone like Mike Trout, who has only 9 years of service, might be penalized, since 9 years would be unlikely to get you into the HoF
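As a rough illustration of the notes above, here is a minimal sketch of how a 1995 cutoff and a years-played feature could be set up. It is not the code from the notebook: the `BatterCareer` class, the `split_by_cutoff` and `feature_row` helpers, the choice of hits and home runs as offensive stats, and the reading of "cutoff" as "final season in 1995 or earlier" are all assumptions, and the two players are made-up placeholders.

```python
from dataclasses import dataclass

TRAINING_CUTOFF = 1995  # cutoff year from the notes; how it is applied here is an assumption


@dataclass
class BatterCareer:
    # Hypothetical record layout: offense-only career totals.
    name: str
    final_season: int
    years_played: int
    hits: int
    home_runs: int


def split_by_cutoff(careers, cutoff=TRAINING_CUTOFF):
    """Careers ending at or before the cutoff train the model; later ones get predicted."""
    train = [c for c in careers if c.final_season <= cutoff]
    to_predict = [c for c in careers if c.final_season > cutoff]
    return train, to_predict


def feature_row(career):
    """Years played is a feature alongside the offensive totals."""
    return [career.years_played, career.hits, career.home_runs]


# Placeholder data, not real player statistics.
careers = [
    BatterCareer("Player A", 1990, 18, 2800, 350),
    BatterCareer("Player B", 2012, 9, 1500, 280),
]
train, to_predict = split_by_cutoff(careers)
```

Because years played sits in the feature row next to the counting stats, a short career lowers both the years value and the totals, which is one way a 9-year player could end up penalized.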

Predictions

First Last Final Season HoF
Jeff Bagwell 2005 Y
Albert Belle 2000 N
Barry Bonds* 2007 N
Ken Griffey 2010 Y
Vladimir Guerrero 2011 Y
Derek Jeter 2014 Y
Chipper Jones 2012 Y
Edgar Martinez 2004 Y
Manny Ramirez* 2011 N
Alex Rodriguez* 2016 N
Gary Sheffield 2009 N
Sammy Sosa* 2007 N
Frank Thomas 2008 Y
Jim Thome 2012 Y
Mo Vaughn 2003 N
Todd Helton 2013 N
David Ortiz* 2016 N
Adrian Beltre 2018 N
Carlos Beltran 2017 N
Albert Pujols Current N
Ichiro Suzuki 2019 N
Miguel Cabrera Current N
Juan Soto Current N

* Named in the Mitchell Report, suspected of steroid use, or confirmed to have used steroids

As a baseball fan who grew up in the mid-2000s, I did not find many of these names surprising. Some of these players are “tainted” according to the BBWAA and will not actually be inducted (though maybe eventually through a veterans vote). David Ortiz will be the player to watch to see what becomes of the other “tainted” players, as Ortiz is rumored to be on better terms with the BBWAA.

It was surprising to see that the model did not predict Mike Trout making it until I extended the data to give him a longer career, while Juan Soto was predicted to make it after playing only 2 years (this is not meant to discount how good Juan Soto is). I wonder how extending the careers of some other younger players would change the predictions.
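One simple way to "extend" a short career is to pad it with extra seasons at the player's own per-season average. The sketch below shows that idea only; how the data was actually extended for this post is not described here, so the `extend_career` helper and the averaging approach are assumptions, and the numbers are placeholders.

```python
def extend_career(season_totals, target_years):
    """Pad a career to target_years by repeating the per-season average.

    season_totals holds one stat (e.g. home runs) per season played.
    Careers already at or beyond target_years are returned unchanged.
    """
    years = len(season_totals)
    if years == 0 or target_years <= years:
        return list(season_totals)
    average = sum(season_totals) / years
    return list(season_totals) + [average] * (target_years - years)


# Placeholder seasons, not real statistics.
extended = extend_career([30, 40], 4)
career_total = sum(extended)
```

Repeating the average ignores aging and injuries, so it probably flatters younger players; a decline curve would be a more cautious choice.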

The Jupyter Notebook can be found on Google Colab if you would like to check the source code or edit the notebook. I might upload a pitcher prediction soon and work in defensive statistics as well.
