For as long as I can remember, I have loved March Madness. Selection Sunday would arrive and I watched anxiously, as if my own season were riding on the committee's announcement. As the teams were unveiled, I would comb through analyst articles, RPI rankings, and almanacs. To me the seedings were irrelevant; I dove into as many objective factors as I could think of to pick the outcomes: record against similar opponents, win-loss record over the last ten games. From there I would get a sense of which teams had the strongest chance of playing deep into the Final Four. I also tried to get a sense of each team's style of play. Were they a sharp-shooting team that played small, like Duke with J.J. Redick? Did they have an overpowering inside presence built around a dominant big man, like LSU with Shaquille O'Neal? Or were they a little-known squad of also-rans carried by a superstar point guard, like Davidson with Steph Curry? The final X factor for me was coaching: history shows that teams with tenured coaches who had led tournament teams before seemed to have an edge over their peers. At the time, I thought I was a budding Dick Vitale. Absent a few buzzer-beaters (and Villanova in 2016, and Butler in 2010), I felt pretty good about my tournament prognostication skills.
This year I decided to take my strategy to the next level and work to understand how machine learning could aid in the prognostication. It turns out that two Ph.D. students from Ohio State, Matthew Osborne and Kevin Nowland, had beaten me to the punch: they wrote a machine learning program built around predicting more upsets than its human counterparts. Their approach uses classification algorithms such as logistic regression, random forests, and k-nearest neighbors, each of which tries to predict upsets in its own way by analyzing the same data set of 2001-2017 first-round games. Like humans, the machines were not infallible, but they produced some noteworthy results: the combined predictions of the models picked the correct outcome 75% of the time. That is not something that sends people running to their nearest Best Buy to set up an NCAA bracketology mining rig, but you are likely to see this kind of machine learning used to tweak the selection committee's methodology going forward.
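To make the idea concrete, here is a minimal sketch of the kind of ensemble the paragraph describes: three classifiers trained on the same game features, with their predictions combined by majority vote. This is not Osborne and Nowland's actual code or data; the features and labels below are synthetic stand-ins for the 2001-2017 first-round games.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Hypothetical per-game features, e.g. seed difference, win-pct
# difference, scoring-margin difference (synthetic, not real data).
X = rng.normal(size=(n, 3))
# Label 1 = upset; a noisy synthetic rule so the task is learnable.
y = ((X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logreg": LogisticRegression(),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=15),
}
preds = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    preds[name] = model.predict(X_te)

# Combined prediction: call an upset when at least 2 of 3 models agree.
combined = (sum(preds.values()) >= 2).astype(int)
accuracy = (combined == y_te).mean()
print(f"ensemble accuracy: {accuracy:.2f}")
```

Majority voting is only one way to combine models; averaging predicted probabilities is another common choice when the individual classifiers are well calibrated.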
More and more, data scientists are competing to improve machine learning's capabilities. In fact, since 2014, a throng of basketball enthusiasts have competed in Machine Learning Madness. This year 955 competitors are vying for a total of $25,000 in prize money, awarded to the five most accurate brackets. Brackets are rated not only on game outcomes but also on their degree of certainty: each correctly picked game earns more points the higher its confidence score. Doubly sure that Loyola of Chicago will beat Illinois (don't feel bad about that one; I don't think IBM Watson or Stephen Hawking would have called it)? Put down a 1. Not feeling so lucky? Put down a 0. But beware the leverage that confidence provides, because it cuts both ways on a wrong pick. To date, the random forest algorithm, which runs its simulations using a decision-tree-based learning method, has been the most widely used.
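The "leverage" on confidence is easiest to see with a scoring rule. The exact formula Machine Learning Madness uses isn't given here, but log loss is a common one for this kind of contest, and it shows the asymmetry: a confident correct pick costs almost nothing, while a confident wrong pick is punished severely. A minimal sketch:

```python
import math

def game_penalty(confidence, outcome, eps=1e-15):
    """Log-loss penalty for one game.

    confidence: your stated probability that team A wins (0 to 1).
    outcome: 1 if team A actually won, 0 otherwise.
    Lower penalty is better; eps guards against log(0).
    """
    p = min(max(confidence, eps), 1 - eps)
    return -(outcome * math.log(p) + (1 - outcome) * math.log(1 - p))

print(game_penalty(0.95, 1))  # confident and right: small penalty
print(game_penalty(0.95, 0))  # confident and wrong: large penalty
print(game_penalty(0.50, 1))  # hedged pick: same penalty either way
```

Under this rule, putting a 1 (certainty) on a game you lose yields an effectively unbounded penalty, which is exactly why the leverage warrants caution.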
With the popularity of machine learning rising, the NCAA has decided to remove RPI (Ratings Percentage Index) from its selection criteria in favor of NET. NET incorporates a broader set of variables into its calculation of a team's rating: strength of schedule, game location, scoring margin, and net offensive and defensive efficiency, along with performance in late-season games, including tournament games. It will be interesting to see whether these criteria help create a more equitable field and fewer upsets. Only time and cheesy AT&T commercials will tell. If this year's games are any indication, the algorithms have a way to go. It will be an exciting development to watch, and it will likely be something my Jay Bilas-like reflexes incorporate into my yearly March Madness efforts.
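The NCAA has not published NET's actual formula or weights, but the factors listed above suggest the general shape: several normalized components blended into one rating. The sketch below is a purely hypothetical toy, with made-up weights, just to illustrate how such components might be combined.

```python
def toy_team_rating(strength_of_sched, capped_margin, net_efficiency,
                    road_performance, weights=(0.35, 0.25, 0.25, 0.15)):
    """Toy NET-style rating: a weighted blend of components.

    Each component is assumed pre-normalized to [0, 1]; the weights are
    invented for illustration and are NOT the NCAA's actual NET weights.
    """
    parts = (strength_of_sched, capped_margin, net_efficiency, road_performance)
    return sum(w * p for w, p in zip(weights, parts))

# Hypothetical team: strong schedule, solid margin and efficiency,
# middling road results.
rating = toy_team_rating(0.8, 0.7, 0.75, 0.6)
print(f"toy rating: {rating:.3f}")
```

One real detail worth noting in any such blend: capping the scoring-margin component (NET reportedly caps margin per game) keeps running up the score from inflating a team's rating.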