Big Data, Analytics, and the Future of Baseball


Following our class last week on data and analytics, I wanted to take a look at how analytics and other innovations are changing the game of baseball. Major League Baseball has always been data-driven: its statistics are easier to record than those of other sports, and its long seasons create large datasets ripe for analysis. Data has become even more essential to team decision-making today, and coaches and players need to buy into it. Most teams have full-time analytics departments that try to find variables that lead to success but are undervalued by other teams, allowing them to pay less for more. Each team has its own top-secret algorithms to measure the value of players, which it can use to find talent that fits its organizational approach and to forecast player performance.


The teams that rely on numbers the most include the Houston Astros, Pittsburgh Pirates, and Tampa Bay Rays. The Astros are the most analytics-driven team in baseball, with an analytics staff that includes Sig Mejdal, Director of Decision Sciences, who is an engineer and former rocket scientist at NASA, as well as a medical risk manager and mathematical modeler. They have a database called Ground Control, detailed in a Sports Illustrated cover story, that attempts to synthesize quantitative and qualitative information about a player. Quantitative data primarily consists of players' on-field performance. Qualitative information includes scouting reports, biological factors and family history, psychological tests, work ethic, personality, durability and health. They use this to see how past prospects with attributes similar to the current prospect's turned out, and from that arrive at a projection of the player's future value. In the minor leagues, they assign grades to starting pitchers for each game, which are combined into a GPA at the end of the season. They also have spray charts for every situation for each batter and pitcher to help determine where best to position fielders. Their system is distinctive enough that the St. Louis Cardinals were reported this past summer to have hacked it.
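
The comparison to past prospects described above can be illustrated with a simple nearest-neighbor sketch. This is not the Astros' method, which is not public. The function name `project_value`, the attribute names, the 0-to-1 scaling, the outcome numbers, and the choice of averaging the closest matches are all assumptions made up for illustration.

```python
import math


def project_value(prospect, history, k=3):
    """Average the outcomes of the k past prospects most similar to `prospect`.

    prospect: dict of attribute name -> number (hypothetical, scaled to 0-1)
    history:  list of (attributes_dict, outcome) pairs for past prospects
    """
    if not history:
        raise ValueError("history must not be empty")

    def distance(a, b):
        # Euclidean distance over the attributes of the new prospect
        return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))

    nearest = sorted(history, key=lambda item: distance(prospect, item[0]))[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)


# Made-up data: attributes scaled to 0-1, outcome = some measure of career value
past_prospects = [
    ({"contact": 0.9, "power": 0.4, "durability": 0.8}, 12.0),
    ({"contact": 0.8, "power": 0.5, "durability": 0.7}, 10.0),
    ({"contact": 0.2, "power": 0.9, "durability": 0.3}, 2.0),
    ({"contact": 0.3, "power": 0.8, "durability": 0.4}, 4.0),
]
new_prospect = {"contact": 0.85, "power": 0.45, "durability": 0.75}

# The two closest past prospects are the first two, so this averages 12 and 10
projection = project_value(new_prospect, past_prospects, k=2)
```

A real system would presumably weight attributes differently and fold in the qualitative information as well; the sketch only shows the basic idea of "find similar players, see how they turned out."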


Blue Jays catcher Russell Martin, when he played on the Pirates, wearing the Zephyr BioHarness tracker.

Several teams have invested substantially in data-based injury prevention initiatives, including the Pittsburgh Pirates. The Pirates use wearable sensors and special player tracking technology installed in their stadium to measure energy levels and signs of fatigue. They found that resting players more often resulted in fewer injuries and better performance. They are also one of many teams that value pitch framing, which is the ability of a catcher to turn borderline calls into strikes, since data shows that the best catchers can save a team up to 50 runs in a season, equivalent to five additional wins. This could be one reason why strikeouts have increased across baseball.
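
The "50 runs is about five wins" figure above implies a conversion of roughly 10 runs per win, which is a common rule of thumb. The arithmetic can be sketched as follows. The function names, the number of extra strikes, and the run value per strike are illustrative assumptions, not measured values from any team.

```python
RUNS_PER_WIN = 10.0  # implied by the post's "50 runs ~ five wins"


def framing_runs(extra_strikes, run_value_per_strike):
    """Runs saved by turning borderline pitches into called strikes."""
    return extra_strikes * run_value_per_strike


def runs_to_wins(runs_saved, runs_per_win=RUNS_PER_WIN):
    """Convert runs saved into approximate additional wins."""
    return runs_saved / runs_per_win


# Hypothetical season: 400 extra called strikes, each assumed worth 0.125 runs
runs_saved = framing_runs(extra_strikes=400, run_value_per_strike=0.125)
extra_wins = runs_to_wins(runs_saved)
```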

Because of their financial constraints, the Rays try to find any advantage they can over the competition, an approach detailed in The Extra 2% by Jonah Keri. Led by former Goldman Sachs executives, they applied the same approach that worked for investments on Wall Street to gain an edge, focusing particularly on defensive shifts, where fielders change their positioning for every pitch based on where data indicates the batter is likely to hit the ball. Teams such as the Kansas City Royals and Chicago Cubs use machine learning techniques and predictive modeling to inform their decisions. The New York Mets use models to determine the optimal lineup to maximize runs scored for each game. All of the teams detailed above have experienced recent success, with 11 playoff appearances and 3 World Series appearances between them since 2010.
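
The defensive shift idea described above, positioning fielders where the data says a batter tends to hit the ball, can be sketched in a few lines. This is only a toy illustration: the three-zone split, the angle convention (-45 for the left-field line, +45 for the right-field line), the 60% threshold, and the names `zone_for_angle` and `recommend_alignment` are all assumptions, not any team's actual model.

```python
from collections import Counter


def zone_for_angle(angle_deg):
    """Map a spray angle to a field zone (assumed: -45 = left line, +45 = right line)."""
    if angle_deg < -15:
        return "left"
    if angle_deg > 15:
        return "right"
    return "center"


def recommend_alignment(spray_angles, shift_threshold=0.6):
    """Suggest a shift if one side of the field gets most of a batter's balls in play."""
    if not spray_angles:
        return "standard"
    counts = Counter(zone_for_angle(a) for a in spray_angles)
    zone, hits = counts.most_common(1)[0]
    if zone != "center" and hits / len(spray_angles) >= shift_threshold:
        return f"shift {zone}"
    return "standard"


# Made-up spray angles for two hypothetical batters
right_side_hitter = [30, 25, 40, 10, 35, 28, -5, 33, 38, 20]
spray_hitter = [-30, 0, 30, -20, 10, 25, -40, 5, 35, -10]

print(recommend_alignment(right_side_hitter))  # shift right
print(recommend_alignment(spray_hitter))       # standard
```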


There are also tools available to every team, including a sensor created by Zepp Labs that attaches to the end of a bat and analyzes over one thousand data points for every swing, including bat speed, hand speed, and time to impact. It is used by professional players such as Mike Trout, David Ortiz, and Giancarlo Stanton, but anyone can use it to improve their skills and compare their swing to the pros. The sensor also helps scouts get consistent data on players, and it is used to evaluate players at all events put on by Perfect Game, the largest baseball scouting organization in the world.

Additionally, technology including cameras and radar is installed in every Major League stadium, capturing real-time data that supports the PITCHf/x and Statcast systems. PITCHf/x provides detailed pitch-by-pitch data including speed, location, trajectory and movement, which is made available to the public. It also measures biomechanical information on pitchers (location of foot, shoulder, elbow, and hand at release of pitch) and batters (location of hands, back, front feet, tip of bat at point of contact), which can be used to detect injuries. This can be especially useful for pitchers due to the rise in Tommy John surgeries.

Statcast tracks the location and movement of the ball and every player on the field during the entire game. It is creating new ways to quantify player talent and performance. For example, it can be used to find out how hard the ball is hit off the bat, how fast a fielder got to the ball when making an amazing catch, or the speed of a baserunner when stealing a base.
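
To make the player-tracking idea concrete, here is a minimal sketch of how a runner's or fielder's top speed could be derived from tracked positions. It does not use Statcast's actual data format, which is not described here; the `(time, x, y)` sample layout in seconds and feet, the function names, and the sample track are all assumptions for illustration.

```python
import math


def top_speed(samples):
    """Return the highest speed (ft/s) between consecutive position samples.

    samples: list of (time_s, x_ft, y_ft) tuples, sorted by time (assumed format)
    """
    best = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        best = max(best, speed)
    return best


def ft_per_s_to_mph(speed_ft_s):
    """Convert feet per second to miles per hour (5280 ft per mile, 3600 s per hour)."""
    return speed_ft_s * 3600 / 5280


# Made-up track of a runner accelerating over three seconds
track = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 15.0, 20.0), (3.0, 30.0, 40.0)]
peak = top_speed(track)          # fastest one-second segment, in ft/s
peak_mph = ft_per_s_to_mph(peak)
```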


The ubiquity of data and technology brings up a debate about whether it is better than relying on humans, including coaches, players, scouts and umpires. Presenting this data to coaches and players can be overwhelming and result in overthinking, so it is important to achieve a balance, where both numbers and human judgment are used to make decisions.


  1. Nice post. It’s been interesting to see how baseball is making increased use of data for decision making. One of our retired professors actually worked with the Red Sox on one such program (for pitch count and location, I think). He ended up with some GREAT season tickets.

  2. One of my favorite movies, Moneyball, talks extensively about how data changed the way players were valued, with player data worth more than other traditional measures, and how the A’s were able to field a competitive team at a fraction of the amount the Yankees and Red Sox organizations shell out. It always amazes me what stats the commentators bring up during a game, such as ERA of a batter against left-hand pitchers…in night games…on the road…in the month of August. Really? In the end, big data and types of data will keep evolving in baseball, other sports, and all businesses. Perfect topic following the big data discussion last week. Thanks for sharing!

  3. I definitely think that technology and data have opened up a lot of avenues for sports that previously never existed. Instant replay is one huge one that I can think of that has truly revolutionized many sports, including baseball. I do however feel like it’s important to maintain the “human” element of sports and not get too caught up in data and numbers. Sometimes the best part of sports and baseball is the randomness of it all. Numbers can certainly help give a competitive advantage, but some things need to be left alone I think.

  4. I completely agree with your last paragraph, and was thinking about how they can achieve a balance between human intuition and data. I think it is an interesting issue especially when it comes to recruiting. Just because a person with similar characteristics did not succeed, does not mean the recruit will not succeed. There are so many factors including the other people on the team, the other people on the other team, the weather, etc. that may have accounted for the previous person’s either success or failure. Relying on data can be dangerous for recruiting in my opinion because it assumes that all other conditions are also similar in order to effectively compare two players. I imagine that they account for external issues in some ways, and know that baseball teams have data relating to weather, but will it ever really be possible to aggregate all of this data to accurately analyze a situation without human intuition? My guess is that it won’t ever be possible… Great research! I’m so compelled by the topic of big data and how different industries use the data, so it was interesting to learn about baseball.

  5. rebeccajin06

    I think that the business of sports and analytics is really fascinating and an industry that is growing quickly. BC even offers a class in sports analytics now (if I recall correctly), which shows how data is changing even an industry such as sports. In some ways sports have been behind in technology with their very late adoption of replays, reviews, etc., but data mining is a very advanced topic that not many industries have successfully done. It’s also very interesting how baseball specifically has picked up more data tools. To me, it makes a lot of sense because batting average is a numerical metric that has always been very important. On the other hand, in a sport like football, touchdowns might be the main measure for a wide receiver, which is just a cumulative amount. Thanks for sharing your post!

  6. I agree with all the comments here about finding the balance between human intuition and data analytics. I’m a very loyal fan of the game, but would really like to see less time dedicated to the intricacies of spray charts and advanced metrics and more time spent making it more popular with the next generation of fans. I won’t go into an old-man-in-my-day rant about the game.

    I’m glad you brought up injury prevention. I think this is critical for players and especially teams when they are investing millions in players. It would be a huge advantage to any team if they can find a way to predict when a player is more likely to suffer an injury and then take steps to avoid, or at least lessen the impact of, an injury.

  7. Interesting post. As Henry mentioned, the first thing that popped into my head when I started reading this was the movie Moneyball. It is truly amazing how many different statistics can be dug up in baseball these days. Always funny when TV commentators utilize the most random stats that are out there. As you mentioned, the 162-game season has made these stats even more relevant due to the large sample size. I think we will see more and more teams begin to embrace this shift in emphasizing data analytics too. But overall, I agree that all the data and stats can be overwhelming to coaches and players who have much more to focus on. Data teams need to ensure plenty of balance to present the data in a simplified manner that will ultimately help their team win games.

  8. As a huge baseball fan for as long as I can remember, this was a fun post to read. It is really amazing how data in baseball has advanced over the last couple years. Moneyball was just the beginning of data in baseball. All of the examples that you pointed out go way beyond what was in Moneyball. The PITCHf/x system seems really interesting to me because it is breaking things down to a biomechanical level. This is important because it can help explain a statistical trend about a player, such as why they can’t hit a curveball. It goes beyond merely saying that the trend exists; it can help you fix the problem. The Astros system also seems pretty interesting given how many variables it is taking into account. Only the Astros will know if their system works though because they will never reveal the formulas behind it to the public. It’s also pretty cool to see hacking and rocket scientists and Wall Street people getting into baseball. As you say in your last paragraph there will always be a human element of baseball. There’s still something to be said about a scout seeing a player in real life and judging what they can do.

  9. This is amazing. I love how data can help people in business but more importantly in sports. I personally own a Zepp for my golf game. I have to say it has done wonders in telling exactly how my swing plane has evolved and has helped me with my swing speed. What I really like about it is that it shows you what you are doing right or wrong and then analyzes this, and sends you reports every Monday with videos on how to improve the particular area where you are not excelling. In terms of baseball, I loved the Brad Pitt movie and how that team was built on statistical knowledge to go big. I do have to say that it is important not to create an overreliance on stats though.

  10. Others have already beaten me to this point, but I agree that scouting in baseball should be a nice balance of data/analytics and those human gut feelings. To relate it to your Mets, I read an article at some point during the postseason where the guy who scouted Daniel Murphy remembered that there was just something about his swing that was so promising, regardless of what the numbers said. I think Murphy is also a good case study into just how much the psychological component of baseball can outweigh whatever the data and analytics say. His .400+ average and 7-homer outburst in the NLDS and NLCS was so unheard of that no data/analytics could have predicted it, and then his atrocious performance in the WS was far below not just his playoff outburst, but even his regular season standards. I think Murphy’s performance in the playoffs was nothing more than getting psyched up and catching fire during the first 2 series, then collapsing under the pressure and performing way below his standards during the World Series. It’s because of this psychological component that even the best-looking teams from an analytical standpoint don’t always deliver in the clutch, and so I don’t think the human element can be completely removed from scouting. Nonetheless, things like pitch framing can definitely have a huge impact over time, and I love seeing Statcast reports on how fast/long a home run ball was hit, so I feel that analytics have contributed value for both the teams themselves and for the fans.

  11. […] were given the freedom to explore areas that interested us, which for me included baseball and analytics. For example, I looked at how MLB’s innovative digital strategy is helping them engage with a […]
