One Way to Quantify Pitch Mix Variation

What follows is a general attempt to create a super simple, catch-all metric that aims to define how varied any given pitcher’s pitch mix is, relative to the rest of the league. My goal for what I am presently referring to as “pitch mix variation” is to quantify how diverse a pitcher’s offerings are and differentiate between those who are relatively more or less reliant on their primary pitch.

First, a clear explanation of what pitch mix variation (PMV, from here) is not. PMV is not predictive since, among other things, pitchers can adjust their offerings however they please. Nor is it informative when perceived through the lens of productivity: there is no discernible relationship between PMV and starting pitcher performance (there is only a slight positive correlation between PMV and FIP+, for instance), though admittedly I have not searched very hard for one either. Finally, PMV, as it is currently put together, is not context-specific. Pitchers choose pitches for a plethora of reasons, from batter handedness to various base-out situations and more; this number only aggregates across all batters faced in every situation.

PMV, as I’ve cobbled it together, aims simply to be descriptive. In essence, I ask: how many pitches does this pitcher have and how evenly is he choosing amongst them? On one end of the spectrum, a pitcher might have just two pitches and defer to a fastball 90% of the time, while employing a breaking pitch primarily to keep batters honest. This pitcher does not have a high PMV. On the other end, a pitcher might have five pitches and not deploy any one of them more than 35% of the time. This pitcher does have a high PMV.

A quick note on the data. Pitch classification and usage data were pulled from FanGraphs’ leaderboards for qualifying starters from 2010-2019. In all, 770 qualifying pitcher-seasons across the 2010s were part of this exercise. First, I compared only starting pitchers in order to avoid what I anticipated would be a ton of two-pitch relievers. Second, I removed any pitch that a pitcher threw less than 3% of the time in a given player-season. This was a completely arbitrary cutoff, motivated only by “if he doesn’t throw it 3 times in a 100-pitch start, on average, it will only skew results to classify/count that pitch” logic. Finally, a serious consideration is that no distinction is made in these data between two- and four-seam fastballs. Pitch classification is generally far from perfect, particularly dating back a full decade, but that is simply an unavoidable reality.
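The 3% cutoff described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the article: the function name `drop_rare_pitches`, the dictionary layout, and the optional `renormalize` flag are all assumptions, and whether the remaining usage rates were rescaled to sum to 100% after the cut is not stated here.

```python
def drop_rare_pitches(usage, cutoff=0.03, renormalize=False):
    """Drop pitches thrown less than `cutoff` of the time in one player-season.

    `usage` maps a pitch label to its usage share on a 0-1 scale.
    `renormalize` is a hypothetical option: the article does not say
    whether the surviving shares were rescaled to sum to 1.
    """
    kept = {pitch: share for pitch, share in usage.items() if share >= cutoff}
    if renormalize and kept:
        total = sum(kept.values())
        kept = {pitch: share / total for pitch, share in kept.items()}
    return kept


# Made-up player-season: the 2% changeup falls below the cutoff.
season = {"FB": 0.60, "SL": 0.38, "CH": 0.02}
filtered = drop_rare_pitches(season)
```

Here `filtered` keeps only the fastball and slider, so this pitcher would be counted as a 2-pitch pitcher.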

Simply put, I defined pitch mix variation as this:

My reasons for including these components are pretty straightforward. Here is a brief outline:

  1. S.D.(Pitch Percentages) – First, the standard deviation literally acts as a measure of variation across individual pitch usage. Should Pitcher A throw two pitches 75% and 25% of the time, respectively, his pitches’ usage percentages would have a higher standard deviation than those of Pitcher B, who throws two pitches 60% and 40% of the time, respectively.
  2. max(Pitch Percentages) – I included this in order to account for the case of a single pitch being relied on an outsized amount of the time. While the sum of these percentages will always be 100%, including this further solidifies a 40%/30%/30% pitcher, whose max pitch percentage is 40%, having a higher PMV relative to a 55%/45% pitcher (55% max pitch percentage).
  3. Number of Distinct Pitches – While standard deviation, by definition, accounts for the number of pitches a pitcher has, as it measures the dispersion of their usage rates, it did not alone do enough to sort 2-, 3-, or 4-pitch pitchers. Given that, I have discounted the max(Pitch Percentages) by the number of pitches each pitcher throws. This additionally dilutes the influence of the max(Pitch Percentages), which was otherwise too significant a factor. Together, the right-hand side of this formula also serves as a tiebreaker for the theoretical instance where a 50%/50% pitcher (0 standard deviation of pitches) is juxtaposed with a 25%/25%/25%/25% pitcher (again, 0 standard deviation of pitches).
  4. Subtracting from 1 – By subtracting this formula from 1, higher final values correspond to more varied pitch mixes, while lower final values correspond to less varied ones.
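Read together, the four components suggest a formula along the lines of PMV = 1 - [S.D.(Pitch Percentages) + max(Pitch Percentages) / Number of Distinct Pitches]. The sketch below is one reading of that description, not a confirmed reproduction: the additive form is inferred from the tiebreaker remark in item 3, and the function name `raw_pmv` and the use of the population (rather than sample) standard deviation are assumptions.

```python
from statistics import pstdev


def raw_pmv(shares):
    """Un-normalized PMV for one pitcher-season, under the assumed additive form.

    `shares` holds each pitch's usage rate on a 0-1 scale.
    """
    shares = list(shares)
    if not shares:
        raise ValueError("at least one pitch is required")
    # Component 1: spread of the usage rates (population SD is an assumption).
    spread = pstdev(shares)
    # Components 2 and 3: the most-used pitch, discounted by arsenal size.
    discounted_max = max(shares) / len(shares)
    # Component 4: subtract from 1 so that more variation scores higher.
    return 1 - (spread + discounted_max)


# The two zero-standard-deviation cases from item 3.
two_pitch_even = raw_pmv([0.50, 0.50])
four_pitch_even = raw_pmv([0.25, 0.25, 0.25, 0.25])
```

Under this reading, a 50%/50% pitcher scores 0.75 and a 25%/25%/25%/25% pitcher scores 0.9375, so the tie at zero standard deviation is broken in favor of the deeper arsenal. The same function also ranks the 40%/30%/30% pitcher above the 55%/45% pitcher, and the 60%/40% pitcher above the 75%/25% pitcher, in line with items 1 and 2.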

This formula adheres to the following logic:

As a final step, the PMV values have been mean-normalized around 100. A PMV higher than 100 thus corresponds to a pitcher whose pitch mix variation is above the population mean (as calculated this way). A PMV lower than 100 in turn corresponds to a pitcher whose pitch mix variation is below the population mean.

Below is an example table that illustrates the PMV of several hypothetical pitcher arsenals.

Pitcher A illustrates that even a 2-pitch pitcher can have an above average PMV here, should they mix the two pitches roughly evenly.

Given this, we can tell the most average pitch mix variations come from, first, a 5-pitch pitcher who relies heavily on just 1 of those pitches (Pitcher J, 101.7 PMV) and, second, a 3-pitch pitcher who distributes his usage more evenly (Pitcher E, 107.6 PMV).

To visualize how pitchers in the 2010s ranked in practice, below is a histogram of PMV for all pitchers in the dataset.

As mentioned, I mean-normalized PMV around 100 but did no additional data manipulation. Thus, while the mean of PMV is in fact 100, the distribution around that point is not quite normal. It is somewhat left-skewed with a considerable left-side tail, making the median or middle-most PMV slightly greater than 100. Nor does a PMV of 105 indicate “5% above average PMV” as it might for OPS+ or the like. That said, the distribution without further adjustment is normal-ish (another scientific term). That left-hand tail is in large part a response to knuckleballer, and surely well-meaning antihero to this exercise, R.A. Dickey, who was in the 2010s essentially a 1-pitch pitcher.

Next, I have included three charts that cover the top, average, and bottom ranked player-seasons in PMV from 2010-2019 alongside pitch mix usages.

All the leaders in PMV throw either 4 or 5 distinct pitches.
Those players most middling in PMV generally throw 4 distinct pitches, but their offerings are skewed heavily in favor of their fastballs.
As a note, R.A. Dickey has been removed from this list as 7 player-seasons otherwise included were his.

So there is a pretty clear progression across those three charts above. For one, fastball usage and PMV are negatively correlated (-0.55). Pitchers like Bartolo Colon and Lance Lynn, who use their fastballs a considerable amount of the time, are then considered not to mix their pitches very much. Of course, these high fastball usage pitchers might rank differently with data that differentiates two- and four-seam fastballs.
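For readers who want to check a figure like the -0.55 above on their own data, a correlation of this kind can be computed as below. This sketch assumes the figure is a Pearson correlation, which is not stated; the function name `pearson_r` and the toy inputs are hypothetical and are not drawn from the FanGraphs data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up, perfectly linear toy values (not real pitcher-seasons):
# PMV falls as fastball share rises, so r is -1 up to rounding.
fastball_share = [0.35, 0.45, 0.55, 0.65]
pmv = [112.0, 104.0, 96.0, 88.0]
r = pearson_r(fastball_share, pmv)
```

With real pitcher-seasons, `fastball_share` and `pmv` would each hold one value per qualifying season, in the same order.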

Additionally, pitchers in the top 15 of PMV also appear to use their offspeed/breaking offerings pretty consistently/frequently, as one might generally expect. Marco Gonzales, who led the 2010s in PMV here, threw all three of his secondary pitches within 1% of the same frequency and didn’t employ his fastball more than a third of the time either. On the other end of the spectrum, all of the lowest PMV pitchers relied on either 2 or 3 pitches and were extremely fastball-reliant.

One of the more interesting uses for a metric like PMV is tracking its progression over time. By this measure at least, PMV can summarize how mixed any given pitcher’s arsenal was from season to season. To illustrate this point, below are three pitchers’ seasonal PMV from 2010-2019, for seasons in which they qualified.

Max Scherzer’s rise in PMV was motivated primarily by the decreased usage of his fastball.
Clayton Kershaw’s slider and curveball usage roughly doubled from 2010 to 2019; that consistent rise across secondary pitches has pushed his PMV north of 100 in the last 3 years.

For pitchers who pitched at least 5 qualifying seasons from 2010-19, a rising PMV across those seasons was pretty regular. Max Scherzer, Clayton Kershaw and Zack Greinke are all examples of this. However, PMV across the entire group of pitchers only very weakly correlates with age.

There is a significant game theory component to every batter-pitcher faceoff. Logically, it behooves a pitcher to keep a batter guessing as much as possible. That said, there is a tradeoff to prioritizing breadth over depth, as throwing an inferior pitch only to give a hitter a different look might not always pay off. As mentioned above, there is no immediately evident relationship between PMV and success; last season, in fact, Dinelson Lamet had the lowest PMV among qualifying starters and had a tremendous season essentially throwing just two pitches.

In a lot of ways this was a fairly subjective exercise. What keeps it from being more rigorous, in part, is the lack of an absolute definition of diversity: are more pitches indicative of a greater pitch mix variation, or is a more even usage across perhaps fewer offerings a better indicator? You cannot exactly run regressions to refine these numbers like you might for a statistic with a clearer, empirical result. Still, this was an interesting exercise and one worth further consideration. Someone might suggest incorporating the variation across pitch velocity/locations, arm angle, or spin rate/axis as part of this discussion, for instance. Should anyone have suggestions or critiques, I am always interested in feedback; feel free to comment below!
