Pythagoras’ Rabbit Hole

This post grew out of perusing 2019’s Pythagorean records, and it is admittedly a bit meandering. To pin down where the meandering goes, below are the three primary questions I aim to address:

1. In which seasons over the past decade did Pythagorean records correlate most, and least, with teams’ actual records?

2. Were prior-year win totals or prior-year Pythagorean win totals more strongly correlated, on average, with the following season’s win totals? (And, for that matter, do they correlate at all?)

3. Which teams’ “prior year” records (Pythagorean or actual) correlate most and least with their following season’s records?

A brief and obligatory note on Pythagorean records in baseball: originally developed by Bill James, the Pythagorean record is so named because its formula resembles the Pythagorean theorem. Pythagorean records correlate strongly, on average, with actual team records, and they are computed solely from the total runs a team scores and allows. Bill James’ initial iteration of Pythagorean expectation is this:

$$\text{Win\%} = \frac{RS^2}{RS^2 + RA^2}$$

where RS is runs scored and RA is runs allowed.

This formula did have some error baked in, though, and the more widely cited Pythagorean records have since been refined to cut down on that error. One primary method for doing so (discovered by people smarter than me) is to lower the formula’s exponent from 2 to 1.83. The formula above with a 1.83 exponent is the iteration Baseball Reference uses, and the one I apply here.
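As a minimal sketch of the arithmetic, here is the formula in Python with a configurable exponent. The helper names (`pythagorean_win_pct`, `pythagorean_wins`), the 162-game scaling, and the 750/700 run totals are illustrative assumptions, not figures from any real team.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 1.83) -> float:
    """Expected winning percentage from runs scored (RS) and runs allowed (RA).

    exponent=2 reproduces Bill James' original formula; 1.83 is the
    refined exponent. (Hypothetical helper name.)
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_wins(runs_scored: float, runs_allowed: float,
                     games: int = 162, exponent: float = 1.83) -> float:
    """Scale the expected winning percentage to a full schedule (hypothetical helper)."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical run totals, for illustration only.
print("Exponent 2:   ", round(pythagorean_wins(750, 700, exponent=2), 1))
print("Exponent 1.83:", round(pythagorean_wins(750, 700), 1))
```

With the same run totals, the 1.83 exponent yields a slightly less extreme expected record than the original exponent of 2.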

Very simply, team win totals, Pythagorean win expectations, and the discrepancy between the two were pulled for each of the past 10 seasons. The chart below shows the dispersion of those discrepancies from season to season.
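The season-by-season dispersion can be sketched in a few lines of Python. This assumes the pulled data sits in a hypothetical list of `(season, team, actual_wins, pythag_wins)` tuples; the rows below are placeholder values, not real records, and using the population standard deviation (`statistics.pstdev`) is an assumption, since the post does not specify which variant was used.

```python
from collections import defaultdict
from statistics import pstdev

# Hypothetical rows: (season, team, actual_wins, pythagorean_wins).
rows = [
    (2018, "Team A", 90, 86),
    (2018, "Team B", 70, 74),
    (2018, "Team C", 81, 81),
    (2019, "Team A", 95, 85),
    (2019, "Team B", 60, 68),
    (2019, "Team C", 85, 87),
]


def dispersion_by_season(rows):
    """Population standard deviation of (actual - Pythagorean) wins per season."""
    diffs = defaultdict(list)
    for season, _team, actual, pythag in rows:
        diffs[season].append(actual - pythag)
    return {season: pstdev(values) for season, values in sorted(diffs.items())}


for season, sd in dispersion_by_season(rows).items():
    print(season, round(sd, 2))
```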

Dispersion between expected and actual win totals, as measured by standard deviation, was at its lowest in 2010. Most seasons in the 2010s featured at least one team that slipped, to a statistically significant degree, outside its expected win range. 2019 also hewed closer to expectation than the four-season stretch that preceded it.

Pythagorean wins are meaningful because they offer a kind of litmus test for how a team “should” have performed. The Cincinnati Reds won 75 games in 2019, but their Pythagorean expectation was 80 wins. In an NL Central where their primary competition has been largely stagnant, the additions of Mike Moustakas, Wade Miley, Nick Castellanos, and Shogo Akiyama (not to mention Trevor Bauer) feel like reinforcements for an already average team with upside, not for a 75-win team navigating a multi-year rebuild.

Altogether, the distribution of record discrepancies has been fairly normal over the past decade; however, it appears more consistently bell-shaped on the side of teams underperforming their runs scored/allowed expectations. Below is a histogram of those discrepancies across all 10 seasons combined.

The 2016 Rangers were the decade’s “luckiest team” (+13 wins, far right), and the 2014 Athletics were its least lucky (-11 wins).
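The histogram and its two extremes can be approximated in plain Python. This is a sketch under stated assumptions: the data is a hypothetical list of `(season, team, actual_wins, pythag_wins)` tuples with placeholder values (not the real Rangers or Athletics figures), and the 3-win bin width is an arbitrary choice.

```python
from collections import Counter

# Hypothetical rows: (season, team, actual_wins, pythagorean_wins).
rows = [
    (2011, "Team A", 90, 80),
    (2012, "Team B", 70, 79),
    (2013, "Team C", 84, 82),
    (2014, "Team D", 77, 79),
    (2015, "Team E", 81, 81),
]

BIN_WIDTH = 3  # wins per histogram bucket (arbitrary assumption)


def luck(row):
    """Actual wins minus Pythagorean wins; positive means 'lucky'."""
    _season, _team, actual, pythag = row
    return actual - pythag


# Key each discrepancy by its bin's lower edge; floor division handles negatives.
counts = Counter((luck(r) // BIN_WIDTH) * BIN_WIDTH for r in rows)
for lower in sorted(counts):
    print(f"{lower:+4d} to {lower + BIN_WIDTH - 1:+4d}: {'#' * counts[lower]}")

luckiest = max(rows, key=luck)
unluckiest = min(rows, key=luck)
print("Luckiest:", luckiest[:2], luck(luckiest))
print("Least lucky:", unluckiest[:2], luck(unluckiest))
```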

Having found at least some spread between those two figures in recent years, I next wondered which of them correlated better with records in the following season. Of course, such an exercise is purely conjectural and in all likelihood offers no real predictive value; teams can make trades, sign free agents, see the strength of their division fluctuate, and generally be subject to any number of extraneous occurrences that correlate more directly with their subsequent performance.

Still, such an exercise probably sheds light on which teams’ decades were relatively turbulent from season to season and which were more consistent, for better or worse. In fact, the table below shows that for many teams there is essentially no correlation between one season’s win total and the next, and in several cases the relationship is negative.

Correlations, predictably, are not strong on average. Prior-year Pythagorean records outperform prior-year actual records in terms of correlation with the following season’s wins (.334 to .311) on average, but not by much.

Houston sits atop this chart, which is ranked in descending order of correlation strength between season and prior-season win totals. Such a strong correlation underscores the identity of the Houston Astros in the 2010s: a team that was consistently terrible through the decade’s first half and then, relatively abruptly, consistently superb. The Cubs also fit this description, albeit less drastically. The Yankees, meanwhile, were consistently strong all decade long, bottoming out with two 84-win seasons.

On the other end of the spectrum, the Diamondbacks’, Twins’, and Red Sox’ records bounced around so much that each showed a decade-long negative relationship between one season’s win total and the next. The Diamondbacks, the 2010s’ most turbulent franchise by this measure, twice jumped from a 60-something-win season to a 90-plus-win season. Only twice from 2010 to 2019 did the Boston Red Sox post win totals beginning with the same digit in consecutive seasons (71 and 78 in 2014 and 2015, then 93 in both 2016 and 2017).

In sum, Pythagorean expectations clearly perform well overall, despite heightened dispersion in the latter half of the 2010s. That dispersion could in part result from a higher run-scoring environment in which lopsided contests can skew Pythagorean expectations, but that is simply whimsical conjecture. Next, prior-year actual and Pythagorean win totals both, perhaps predictably, relate only loosely to the following season’s record, and even then only on average. What those correlations do highlight, though, is the year-to-year consistency (e.g., Astros, Cubs, Yankees), or lack thereof (e.g., Twins, Red Sox, Diamondbacks), of various franchises.
