Which pitching stat correlates best with WAR?
- Ayush Arora
- Jan 28, 2019
This weekend, I decided to play around with baseball data in Tableau, a powerful data visualization tool, and thought I'd post my analysis here.
The question I set out to answer was:
In 2018, which FanGraphs pitching statistic correlated best with WAR (wins above replacement)?
I downloaded the 2018 team pitching stats (standard, advanced, batted ball, pitch type, etc.) for all 30 MLB teams from fangraphs.com, converted the CSV file to an Excel workbook using Microsoft Excel, and then imported that workbook into Tableau.
I created a scatterplot of each of the 315 pitching stats against WAR and ran a trend-line analysis on each graph to get its R² value (all scatterplots are at the bottom of the article).
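Statistics Lesson: In this case, R² (the coefficient of determination) is the percentage of the variation in WAR that is explained by a straight-line fit on a given pitching statistic. For a one-stat trend line like these, it equals the square of the correlation coefficient r.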
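For reference, this is the standard definition of R², where y_i is the plotted value for team i, \hat{y}_i is the trend line's prediction for that team, \bar{y} is the mean, and n = 30 teams here. Because R² equals r² for a one-variable line, it comes out the same whichever of the two stats is put on the y-axis.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```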
After reading the "Describe Trend Model" page Tableau generated for each scatterplot, I found that FIP- had the highest R² value of them all: 0.98.
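The same screening can be sketched outside Tableau. The Python below is a minimal sketch, not the workflow used in this post: it assumes the FanGraphs export has been read into a list of dicts (one per team) with a `WAR` column, computes R² as Pearson's r squared for every other numeric column, and sorts the stats best-first. The helper names are hypothetical, and columns whose values contain a percent sign fail to parse and are simply skipped, so they would need cleaning first.

```python
import csv


def load_rows(path):
    """Read a FanGraphs-style CSV export into a list of dicts, one per team."""
    with open(path, newline="", encoding="utf-8-sig") as f:  # utf-8-sig drops a BOM if present
        return list(csv.DictReader(f))


def r_squared(xs, ys):
    """R² of a one-variable least-squares line, computed as Pearson's r squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no linear relationship to report
    return sxy * sxy / (sxx * syy)


def rank_stats_by_r2(rows, target="WAR"):
    """Return (stat, R²) pairs for every numeric column except target, best first."""
    ys = [float(row[target]) for row in rows]
    ranking = []
    for stat in rows[0]:
        if stat == target:
            continue
        try:
            xs = [float(row[stat]) for row in rows]
        except (ValueError, TypeError):
            continue  # team names, "%" strings, or missing cells are skipped
        ranking.append((stat, r_squared(xs, ys)))
    ranking.sort(key=lambda pair: pair[1], reverse=True)
    return ranking
```

For example, `rank_stats_by_r2(load_rows("team_pitching_2018.csv"))[:5]` would list the five best-correlated stats; the file name there is a placeholder.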

You might ask, what the heck is FIP-?
FIP stands for Fielding Independent Pitching. FIP is basically a better version of ERA: it strips pitcher performance down to the events the pitcher controls on his own (walks, strikeouts, home runs, and hit-by-pitches).
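As a sketch, the usual FIP formula, (13·HR + 3·(BB + HBP) − 2·K) / IP plus a constant, can be written in Python as below. The season-specific FIP constant (chosen so that league-average FIP equals league-average ERA) is left as a parameter rather than guessed, and the function names are hypothetical. Box scores write innings as 180.1 or 180.2 for 180 1/3 and 180 2/3, so the first helper converts them.

```python
def innings_from_box_score(ip):
    """Convert box-score innings (180.1 = 180 1/3, 180.2 = 180 2/3) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # the digit after the point counts outs, not tenths
    return whole + outs / 3


def fip(hr, bb, hbp, k, ip, fip_constant):
    """Fielding Independent Pitching on the ERA scale.

    fip_constant is season-specific (it makes league FIP equal league ERA),
    so it must be supplied rather than hard-coded.
    """
    innings = innings_from_box_score(ip)
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + fip_constant
```

Home runs carry the heaviest weight, and strikeouts are the only event that lowers FIP.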
FIP- is a park- and league-adjusted version of FIP, scaled so that a FIP- of 100 is league average and lower is better (a FIP- of 90 is 10% better than average):

For a more detailed explanation, check out FanGraphs' page on FIP.
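The league and park adjustment can be sketched roughly as follows. This is a simplified illustration, not FanGraphs' exact formula: it assumes a park factor on a scale where 100 is neutral, divides it out of the team's FIP, and expresses the result as a percentage of league-average FIP. The function name and the park-factor convention are assumptions.

```python
def fip_minus(team_fip, league_fip, park_factor=100.0):
    """Simplified FIP-: park-neutral FIP as a percentage of league FIP (100 = average)."""
    # Assumption: park_factor uses 100 as neutral; above 100 means a hitter-friendly
    # park, so dividing it out lowers (improves) the FIP of that park's home staff.
    park_neutral_fip = team_fip / (park_factor / 100.0)
    return 100.0 * park_neutral_fip / league_fip
```

Read this way, the Astros' FIP- of 78 means their park-adjusted FIP was about 22% below the league average.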
Alright, let's take a look at the scatterplot of FIP- vs. WAR:

Graph Analysis
There are two lines on this graph that I should explain before getting into my analysis.
The first is a horizontal line at the average FIP- of 100. Since a lower FIP- is better, teams below this line had an above-average pitching staff and teams above it had a below-average one.
16 teams had below-average pitching, with the Miami Marlins posting the worst FIP- of them all at 115.
14 teams had above-average pitching, with the Houston Astros posting the best at 78.
The second line is a line of best fit: the straight line that best represents the data on the scatterplot, i.e. the one that minimizes the sum of the squared vertical distances from the points to the line.
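As a rough sketch of how such a line is computed, assuming Tableau's linear trend line is an ordinary least-squares fit (the usual meaning of "line of best fit"), the slope and intercept come from closed-form sums. Here x is WAR and y is FIP-, matching the orientation of the article's trend-line equation; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # covariance of x and y over the variance of x
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return slope, intercept
```

Called with the 30 teams' WAR values as `xs` and their FIP- values as `ys`, this should give a slope and intercept close to the ones Tableau reported.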
The equation of this line is FIP- = -1.408*WAR + 120.208.
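The expected-WAR figures in the team comparisons come from solving this equation for WAR: WAR = (120.208 - FIP-) / 1.408. A minimal Python version, using only the slope and intercept reported above (the function names are hypothetical), is below. Strictly, inverting a fit of FIP- on WAR is not the same as regressing WAR on FIP-: the inverted slope is steeper by a factor of 1/R², but with R² = 0.98 that is only about 2%.

```python
SLOPE = -1.408       # from the trend line FIP- = -1.408*WAR + 120.208
INTERCEPT = 120.208


def expected_fip_minus(war):
    """FIP- predicted by the trend line for a given team WAR."""
    return SLOPE * war + INTERCEPT


def expected_war(fip_minus):
    """WAR implied by a FIP-, found by solving the trend-line equation for WAR."""
    return (fip_minus - INTERCEPT) / SLOPE


for team, fip_minus in [("Rays", 93), ("Mets", 99), ("Dodgers", 90)]:
    print(f"{team}: FIP- {fip_minus} -> expected WAR {expected_war(fip_minus):.2f}")
```

This prints 19.32, 15.06 and 21.45 for the Rays, Mets and Dodgers, in line with the figures used below.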
Looking more closely at the graph, I noticed that the Tampa Bay Rays are the farthest data point from the line of best fit. Solving the equation for WAR and plugging in their FIP- of 93, the Rays should've had a WAR of 19.32 instead of their actual 16.40. That's a difference of nearly 3 wins; if those wins had shown up in the standings, the Rays would have finished 93-69 rather than 90-72. Granted, those 3 wins wouldn't have changed much at the end of the season, given the Boston Red Sox's (108-54) and New York Yankees' (100-62) outstanding 100+ win seasons.
The New York Mets seem to be the next farthest from the line of best fit, at 99 FIP- and 16.90 WAR. Using the equation, the Mets should have finished with about 15.06 WAR, so based on this graph they overachieved by nearly 2 wins. The Washington Nationals also overachieved by nearly 2 wins, with a WAR of 14.60 instead of the expected 12.90. The Braves won the NL East quite comfortably in the end (by 8 games), and their data point sits smack dab on the line of best fit.
Now, the most interesting case: the Los Angeles Dodgers. With a FIP- of 90, the Dodgers should've had a WAR of 21.5 rather than 20.5. At the end of the 2018 regular season, the Dodgers and Rockies finished with identical 91-71 records and were forced to play a one-game tiebreaker for the NL West. While the Dodgers did win that tiebreaker and eventually reached the World Series (but lost haha!), one extra regular-season win would have given them the division outright, changing the whole playoff picture for the team and possibly the result of the 2018 postseason.
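To check which teams sit farthest from the trend line, the gaps discussed above can be computed directly. This sketch uses only the FIP- and WAR values quoted in the text (the Nationals are left out because their FIP- isn't given) and measures distance in WAR. Because the vertical (FIP-) distance is just that gap times 1.408, ranking by either gives the same order. The names are hypothetical.

```python
SLOPE = -1.408       # trend line: FIP- = -1.408*WAR + 120.208
INTERCEPT = 120.208

# FIP- and actual WAR as quoted in the text.
TEAMS = {
    "Tampa Bay Rays": (93, 16.40),
    "New York Mets": (99, 16.90),
    "Los Angeles Dodgers": (90, 20.50),
}


def war_residuals(teams):
    """Return (team, actual WAR minus trend-line WAR) pairs, largest gap first."""
    rows = []
    for team, (fip_minus, actual_war) in teams.items():
        implied_war = (fip_minus - INTERCEPT) / SLOPE  # solve the trend line for WAR
        rows.append((team, actual_war - implied_war))
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows


for team, gap in war_residuals(TEAMS):
    print(f"{team}: {gap:+.2f} WAR vs. the trend line")
```

A negative gap means the team finished below the WAR its FIP- implies (the Rays and Dodgers); a positive one means it finished above (the Mets).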
Tableau is a great tool for data analysis, and I'll keep using it from now on.
I plan on writing a similar post on which FanGraphs batting statistic correlates best with WAR, so stay tuned!
