Keep It in the Ballpark, and Don’t Beat Yourself

In baseball circles today, FIP (Fielding Independent Pitching) is a ubiquitous statistic, and for good reason. A single number captures a pitcher’s ability to control his own destiny, because FIP sets aside the factors that no individual pitcher can control.

FIP is a fitting statistic for this era. It emphasizes the Three True Outcomes, not from a batter’s perspective but from a pitcher’s.

Fewer balls are put in play today, and the three true outcomes are correspondingly on the rise (a trend that has been discussed here before). It’s fitting that a primary tool for performance measurement should feature those outcomes so explicitly.

FIP and its cousin xFIP, which normalizes home runs to a league-average rate per fly ball, are appealing because they are so simple and parsimonious. FIP uses only five inputs (HR, BB, HBP, K, IP). That said, there is a bit more to FIP than generally meets the eye. FanGraphs has a great summary.

In short, as FanGraphs covers, those five inputs must undergo some manipulation so that FIP is easy to digest and can be read alongside the even more familiar ERA. Coefficients weight each component according to its impact, and a constant recalculated for each season is added so that any given pitcher’s FIP sits on the same scale as his ERA.
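For readers who want to see that manipulation spelled out, here is a minimal sketch of the calculation. The weights (13, 3, 2) follow the commonly published FanGraphs form of the formula and are not stated in this post; the default constant of 3.10 is only a typical ballpark value, since the real constant changes every season. The function name `fip` and the stat line are hypothetical, invented purely for illustration.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from the five inputs named above.

    Weights follow the commonly published form of the formula.
    `constant` is an assumed typical value; look up the real one
    for the season in question.
    `ip` must be decimal innings (200 and 1/3 innings is 200.333...,
    not the box-score shorthand 200.1).
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical stat line, not a real pitcher.
print(fip(hr=20, bb=50, hbp=5, k=200, ip=200.0))
```

Strikeouts carry a negative weight, which is why a high-strikeout pitcher can post a FIP well below what his walks and home runs alone would suggest.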

As simple as FIP is, there is always reason to believe that a simpler explanation is out there for understanding what leads to a pitcher giving up runs. Occam’s Razor, you know? One such possibility came to me when looking over C.C. Sabathia’s career statistics. This is what stuck out:

Career Earned Runs: 1442

Career Home Runs plus Walks: 1443

Without incorporating coefficients or a constant or even strikeouts, these numbers were remarkably close.  The logical next step was to discover whether this phenomenon was unique to Sabathia or not.

Visualizations are fun and thus a good place to start. Below is 2018 data for all qualifying pitchers (162+ IP), plotting home runs plus walks against earned runs, with a linear regression line superimposed.

The highest and right-most point represents Lucas Giolito

As you can see, the plot above shows a weak positive relationship. The simple linear regression’s adjusted R-squared, a metric that captures “goodness of fit”, suggests that simply summing walks and home runs “explains” only about 18% of the variation in earned runs within a single season. Although the sum of home runs and walks is a statistically significant predictor, it does not do a particularly good job of telling the whole story of how runs are scored off pitchers in a single season.
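For anyone who wants to try this at home, the regression can be sketched in a few lines of plain Python. This is not the code behind the charts in this post, just an ordinary least-squares fit with one predictor under the usual textbook definitions. The function name `simple_ols` and the sample numbers are hypothetical; substitute real HR + BB and earned run totals.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared, adjusted_r_squared).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Share of the variation in y that the fitted line accounts for.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y values must not all be identical")
    r_squared = 1 - ss_res / ss_tot

    # Adjustment for one predictor: scale unexplained share by (n - 1) / (n - 2).
    adjusted = 1 - (1 - r_squared) * (n - 1) / (n - 2)
    return slope, intercept, r_squared, adjusted


# Invented numbers for illustration only, not real pitcher data.
hr_plus_bb = [60, 72, 85, 90, 104]
earned_runs = [70, 66, 88, 79, 95]
print(simple_ols(hr_plus_bb, earned_runs))
```

The slope is the estimated change in earned runs for each additional home run or walk, and the adjusted R-squared is the goodness-of-fit figure quoted in the text.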

A positive relationship between pitchers’ home runs allowed and earned runs? Shocking.  While the results of that first scatterplot/regression weren’t surprising or frankly even compelling, it seemed worth investigating whether a larger sample illustrated a more concrete relationship.  

To that end, this next scatterplot features data dating back to the year 2000, and includes all pitchers who have thrown 500+ innings in that span. The 500-inning cutoff is only sort of arbitrary here. While it is an appealingly round number, it also sits slightly above 486, the number of innings necessary for three qualifying seasons of 162 innings pitched, thus roughly tripling the threshold from the prior scatterplot.

The highest and right-most point represents none other than C.C. Sabathia

Here the results seem immediately more insightful.  The data are less noisy and more tightly packed.  The regression line ascends at a near-45 degree angle emphasizing the 1:1 (ish) relationship between earned runs allowed and home runs/walks permitted.  

A simple linear regression yields particularly interesting results, though. For one, the adjusted R-squared for this regression is 0.884, meaning that the “walks plus home runs” metric does in fact explain a lot of the variation in this data and fits the data well. More striking, though, is the coefficient on the “home runs + walks” explanatory variable: 1.005.

That number indicates that, based on this simple linear regression, each additional home run or walk raises a pitcher’s estimated earned run total by 1.005 runs. Obviously, there are solo home runs, where one home run produces exactly one earned run. The nuance is that when home runs of all varieties and every walk are taken into account over whole careers, the ratio winds up so close to 1:1.

This insight does not necessarily offer a key takeaway; it cannot be applied like FIP can and does not make predictive assertions.  But it is an example of how delicately calibrated baseball is.

Without taking into account strikeouts or using coefficients or applying context-dependent constants, the runs that a pitcher may give up over his career (if he’s skilled and fortunate enough to pitch so long) can fairly accurately be accounted for based on the sum of the batters he walks and home runs he allows along the way.  Keep the ball in the ballpark, and don’t beat yourself.
