Sunday, January 20, 2008

Units and Comparisons

In writing some other stuff, I’ve noticed that I’ve been referencing this topic, and so I figured I should just write about it in isolation so that I don’t have to go off on a tangent elsewhere.

This kind of goes back to Bill James’ article in the 1987 Baseball Abstract about “meaningful and meaningless statistics”. When looking at a baseball statistic (which I’m using to mean a category like “Triples”, not a single statistic like “Curtis Granderson had 23 triples in 2007”), I ask myself a few questions. These questions don’t tell you how worthwhile a statistic is to know, but they are very useful in evaluating derived statistics:

1) What are the units of the statistic? Are they actual units or estimated units?

Consider a fairly mundane counting statistic, the balk. The unit of a balk is clearly “balks”, and since the category is simply a count, they are actual units.

Batting average is measured in units of “hits per time at bat”. Again, they are actual units, although the at bat itself is a weird, kind of artificial subcategory of plate appearances.

OPS is measured in units of “total bases per at bat plus times on base per plate appearance”. Since we have two different denominators, there’s not a clear unit here as there is in the two components taken separately. The unit total bases per at bat plus times on base per plate appearance has no clear meaning; people use the stat as an approximation of overall offensive ability, but overall offensive ability is not a unit either. OPS has no units.
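The two-denominator problem shows up as soon as OPS is computed from its parts. Here is a minimal sketch using a hypothetical stat line (the numbers are illustrative, not any real player’s). OBP uses the standard (H + BB + HBP)/(AB + BB + HBP + SF) definition, so its denominator is close to, but not exactly, plate appearances.

```python
# Hypothetical season line (illustrative numbers only, not a real player).
ab, h, tb, bb, hbp, sf = 500, 150, 250, 60, 5, 5

# On base average: times on base per (AB + BB + HBP + SF), roughly per PA.
obp = (h + bb + hbp) / (ab + bb + hbp + sf)
# Slugging average: total bases per at bat.
slg = tb / ab

# OPS adds two rates with different denominators, so the sum is not
# "some count per some opportunity" for any single count and opportunity.
ops = obp + slg
print(f"OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops:.3f}")
```

Each component has a clear unit on its own; the sum does not.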

Runs created is measured in units of “estimated runs”. Both RC and OPS are estimates, but there is an important distinction between them, which I’ll explain below.

Questions two and three apply primarily to derived statistics, not to simple counts of events.

2) If the derived statistic is measured in an estimated unit, is it one that is fundamental to our understanding of baseball?

By fundamental, I mean units of things that really matter in terms of winning baseball games. If the stat is expressed in estimated runs or wins, then it is fundamental, although runs and wins are not the only units I would consider fundamental.

For example, On Base Average is a very fundamental thing to know: the rate of reaching base. Of course, OBA is not really an estimated unit, but an actual count of things.

I also consider any sort of event frequency with a sensible denominator to be fundamental...walks/PA, homers/PA, hits/balls in play, etc. Of course, some of these are more telling than others (catcher’s interference/PA is not particularly important to know), but they all are very straightforward, basic pieces of information.

Slugging Average is a more interesting case; clearly “bases gained by the batter on hits” is a factual count. However, bases gained by the batter on hits is not critical to understanding baseball in the way the rate of reaching base is. If it were “bases gained on hits by batter and baserunners”, then it would be a bit more telling, either on the team level or as an estimated unit on the individual level. So I’ll leave that one as undecided. It’s sort of like taking (balks + wild pitches)/inning.

OPS fails this test, since it’s not measured in terms of anything. That does not mean that OPS cannot be transformed by mathematical operation into a fundamental estimated unit, like runs, but on its own, the units are meaningless.

Win Shares is measured in wins, except the wins are multiplied by three. I consider that to be fundamental. If you want to be a stickler and insist that Win Shares be divided by three before you consider it a fundamental estimated unit, that’s okay too. The reason I don’t draw a distinction is that the transformation is a simple scalar one.

3) Can two players (or teams, etc.) be compared using this statistic by the difference between their figures? By the ratio? Both? Neither?

What I am getting at here is whether the difference or the ratio has meaning beyond just telling us which is better. For instance, an OPS of 1000 can be compared to an OPS of 800, and seeing that the one player is +200 points or has a 1.25 ratio, we can see that the 1000 OPS is superior. But the +200 points or the 1.25 ratio don’t have any meaning other than facilitating the comparison.

For an example of a statistic that can be compared by difference and ratio, take runs created per game. 5 RG is two more runs per game than 3 RG, or it’s 67% more runs. Either way, the numeric result of the comparison is expressed in a meaningful unit. There are many stats that would fall into this category: Winning Percentage, On Base Average, Wins, Losses, …, Balks.
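As a quick check of the runs-per-game example, both comparisons come out in terms of runs: the difference in runs per game, the ratio as a proportion of runs. The variable names here are just for illustration.

```python
# Runs created per game for the two hypothetical players in the example.
rg_a, rg_b = 5.0, 3.0

difference = rg_a - rg_b   # extra runs per game
ratio = rg_a / rg_b        # proportional runs per game

print(f"{difference:.0f} more runs per game, {ratio - 1:.0%} more runs")
```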

However, there are some derived statistics for which only one of the operations produces a meaningful result. Take Runs Above Average, for example. If I tell you that Player A is +10 RAA and Player B is +1, then we have a ratio of 10 and a difference of 9. The difference of nine tells us that Player A contributed nine more runs beyond an average player than Player B did, which is valuable. But the ratio of 10 just obfuscates things, unless you take the position that RAA measures value, and thus Player A is ten times more valuable. Even if one takes that view, the ratio gives a much cloudier picture of the disparity in value than does the difference.
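A small sketch of the RAA comparison, with a hypothetical helper `compare` that returns both the difference and the ratio. The second call, a +1 player against a -1 player, is an added illustration of how the ratio breaks down around zero while the difference still reads naturally.

```python
def compare(raa_a: float, raa_b: float) -> tuple[float, float]:
    """Hypothetical helper: (difference, ratio) of two runs-above-average figures."""
    return raa_a - raa_b, raa_a / raa_b

# The example from the text: +10 RAA vs +1 RAA.
diff, ratio = compare(10, 1)        # 9 runs, ratio 10.0

# A +1 hitter vs a -1 hitter: a 2-run gap, but a ratio of -1.
diff_2, ratio_2 = compare(1, -1)
print(diff, ratio, diff_2, ratio_2)
```

The difference is in runs in both cases; the ratio has no natural reading in either.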

Statistics with meaningful ratios but meaningless differences are harder to come by, but one example is ERA+. ERA+ inverts the usual format of a relative statistic (X/LgX) to LgX/X, in order to make a figure above 100 desirable as it is for OPS+, or Relative Batting Average, or any number of other such stats.

This seems innocuous enough at first glance, but it causes some problems, and you need to be careful when averaging it, as Tango Tiger has shown. Suppose that you have a pitcher who works exactly 200 innings in consecutive seasons in leagues with an ERA of 4.50. In the first season, our pitcher’s ERA is 3.00, and thus his ERA+ is 1.50. In the second season, his ERA jumps to 4.00, and his ERA+ comes in at 1.125. Since he worked the same number of innings each year, we can just average his ERAs together and find that he has compiled a 3.50, which is a 1.286 ERA+. However, if we average his ERA+s, we get 1.313.
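The averaging trap can be reproduced directly with the figures from the example. ERA+ is written here as a ratio (1.50 rather than 150); the constant and function names are illustrative choices, not anyone’s published code.

```python
LG_ERA = 4.50
eras = [3.00, 4.00]   # two seasons, exactly 200 innings each

def era_plus(era: float, lg_era: float = LG_ERA) -> float:
    """ERA+ as a ratio: league ERA over pitcher ERA."""
    return lg_era / era

# Correct career figure: equal innings, so average the ERAs, then convert.
career_era = sum(eras) / len(eras)        # 3.50
from_career_era = era_plus(career_era)    # about 1.2857

# Tempting but wrong: average the seasonal ERA+ values directly.
naive_avg = sum(era_plus(e) for e in eras) / len(eras)   # 1.3125

print(f"{from_career_era:.4f} vs {naive_avg:.4f}")
```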

The reason this ends up happening is that when you invert the calculation, earned runs, rather than innings, become the denominator quantity. So in order to average the two seasons, you must weight by earned runs (and indeed (4*1.125 + 3*1.5)/(4+3) = 1.286).
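A sketch of the earned-run weighting, assuming a common league ERA in both seasons. Earned runs are recovered from ERA and innings as ERA × IP / 9; because both seasons have 200 innings, these weights are proportional to the 3 and 4 used in the text.

```python
LG_ERA = 4.50
seasons = [(3.00, 200.0), (4.00, 200.0)]   # (ERA, innings pitched)

def earned_runs(era: float, ip: float) -> float:
    return era * ip / 9

def era_plus(era: float) -> float:
    return LG_ERA / era

# ERA+ = LgERA / ERA puts earned runs in the denominator, so seasonal
# ERA+ values must be weighted by earned runs, not innings.
weights = [earned_runs(era, ip) for era, ip in seasons]
weighted = sum(w * era_plus(era) for w, (era, _) in zip(weights, seasons)) / sum(weights)

# Cross-check: ERA+ of the pooled career ERA.
total_ip = sum(ip for _, ip in seasons)
career = era_plus(9 * sum(weights) / total_ip)

print(f"{weighted:.4f} {career:.4f}")
```

With a single league ERA the two numbers agree exactly (up to floating-point error); with varying league ERAs the weighted figure instead compares an innings-weighted league ERA to the career ERA.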

For the same reason, the difference between the two means nothing. 1.5-1.125 = .375. What does .375 represent? If we take it times the league ERA of 4.5, we would expect to get the difference between the two ERAs (1), but instead you get 1.69.
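The 1.69 figure is not a coincidence. Since LgERA/ERA1 - LgERA/ERA2 = LgERA × (ERA2 - ERA1)/(ERA1 × ERA2), scaling the ERA+ gap by league ERA gives LgERA² × (ERA2 - ERA1)/(ERA1 × ERA2), which depends on both pitchers’ ERAs rather than on the run gap alone. A short check with the example’s numbers:

```python
LG_ERA = 4.50
era_1, era_2 = 3.00, 4.00
plus_1, plus_2 = LG_ERA / era_1, LG_ERA / era_2   # 1.50 and 1.125

scaled_gap = (plus_1 - plus_2) * LG_ERA   # 1.6875, not a run difference
actual_gap = era_2 - era_1                # 1.0 run per nine innings

# The scaled ERA+ gap is the ERA gap distorted by LgERA**2 / (era_1 * era_2).
distortion = LG_ERA**2 / (era_1 * era_2)
print(scaled_gap, actual_gap, distortion)
```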

Suppose that we instead consider ERA/LgERA. The seasonal figures are now .667 and .889, and we know the total to be 3.5/4.5 = .778, and the average of .667 and .889 is indeed .778. Now the difference between the two, multiplied by the common league ERA is (.889-.667)*4.5 = 1.
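The same seasons expressed as ERA/LgERA behave well under both averaging and subtraction. A minimal check, again assuming equal innings in each season:

```python
LG_ERA = 4.50
eras = [3.00, 4.00]   # equal innings in both seasons

rel = [era / LG_ERA for era in eras]   # about 0.667 and 0.889

# A plain average of the relative ERAs matches the career figure...
avg_rel = sum(rel) / len(rel)
career_rel = (sum(eras) / len(eras)) / LG_ERA   # 3.50 / 4.50, about 0.778

# ...and scaling the difference by league ERA gives back the ERA gap.
gap = (rel[1] - rel[0]) * LG_ERA
print(f"{avg_rel:.3f} {career_rel:.3f} {gap:.3f}")
```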

The ratio between the two is .889/.667 = 1.333; the ratio between the ERA+s is 1.50/1.125 = 1.333. You can see that the ERA+ ratio is meaningful, but the difference is not, and that’s something you should always keep in mind when working with ERA+ over the course of a pitcher’s career. So while we may have become accustomed to ERA+, its reciprocal is easier to work with and is meaningful in ratio and differential comparisons.
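The ratios agree because league ERA cancels in both forms, leaving ERA2/ERA1 (or its reciprocal, depending on which way up the ratio is taken). A quick check:

```python
LG_ERA = 4.50
era_1, era_2 = 3.00, 4.00

rel_ratio = (era_2 / LG_ERA) / (era_1 / LG_ERA)    # ERA/LgERA form
plus_ratio = (LG_ERA / era_1) / (LG_ERA / era_2)   # ERA+ form

# Both reduce to era_2 / era_1 = 4/3.
print(f"{rel_ratio:.3f} {plus_ratio:.3f}")
```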

Then there are the statistics for which neither the difference nor the ratio has any intrinsic meaning. Generally speaking, these are the derived stats that I cannot stand, and I wish their inventors and compilers would convert them to a different format. Examples include OPS (since it is unitless to begin with), EQA, and Offensive Winning Percentage. I’ll examine the case of OW% here. OW% starts with a statistic, Adjusted RG, which is meaningful by both difference and ratio, and converts it into a format where it is meaningless by both, at least when applied to individual players.

Assuming a pythagorean exponent of 2, OW% = RG^2/(RG^2 + Lg(R/G)^2), or ARG^2/(ARG^2 + 1) with ARG expressed as a ratio to the league average. Suppose we have a hitter with an ARG of 110, and another with an ARG of 120. Player A has created 110% of the league average runs per out; Player B 120%. The ratio 120/110 is meaningful; it tells us that Player B created 9% more runs per out than did Player A. The difference 1.2 - 1.1 is meaningful as well. If we multiply it by the league average runs per out (say .18), we find that Player B created .018 more runs per out than Player A.
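A sketch of the two forms of the OW% formula and the ARG comparisons. The league runs-per-game value of 4.50 is an assumption added only to check that the raw-RG and ARG forms agree; the .18 runs per out comes from the example.

```python
LG_RG = 4.50            # assumed league runs per game, for the equivalence check only
LG_RUNS_PER_OUT = 0.18  # league runs per out, as in the example

def ow_pct_from_rg(rg: float, lg_rg: float) -> float:
    """OW% with a Pythagorean exponent of 2, from raw runs created per game."""
    return rg**2 / (rg**2 + lg_rg**2)

def ow_pct_from_arg(arg: float) -> float:
    """Same quantity from ARG written as a ratio (1.10 rather than 110)."""
    return arg**2 / (arg**2 + 1)

arg_a, arg_b = 1.10, 1.20

# Dividing numerator and denominator by lg_rg**2 shows the two forms agree.
assert abs(ow_pct_from_rg(arg_a * LG_RG, LG_RG) - ow_pct_from_arg(arg_a)) < 1e-12

# The ARG comparisons keep their meaning.
ratio = arg_b / arg_a                                    # about 1.091
extra_runs_per_out = (arg_b - arg_a) * LG_RUNS_PER_OUT   # about 0.018
```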

When we convert to OW%, Player A is now at .548 and Player B is at .590. The ratio between them, computed from the unrounded values, is 1.078. In what sense was Player B 7.8% more productive, more valuable, more whatever than Player A? In no sense that reflects on their actual status as individual members of a ballclub. It is true that a whole team that hit like Player B would be expected to win 7.8% more games than a whole team that hit like Player A. However, this does not translate directly into any statement of their actual value as individual members of a team.
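Converting both hitters to OW% shrinks the 9% ARG edge to about 7.8%. A minimal sketch, with exponent 2 and ARG as a ratio as above:

```python
def ow_pct(arg: float) -> float:
    """OW% with exponent 2, ARG as a ratio to the league average."""
    return arg**2 / (arg**2 + 1)

ow_a, ow_b = ow_pct(1.10), ow_pct(1.20)   # about .548 and .590

ow_ratio = ow_b / ow_a      # about 1.078
arg_ratio = 1.20 / 1.10     # about 1.091

print(f"OW%: {ow_a:.3f} {ow_b:.3f} ratio {ow_ratio:.3f}; ARG ratio {arg_ratio:.3f}")
```

The gap between the two ratios comes entirely from the non-linear squaring step, not from anything about the players.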

The difference is just as unintelligible. OW% is okay for the thought exercise of “how good would a whole team of this guy be”, but when it comes to actually measuring the value of a player to his team in a meaningful way, it fails. None of this is to say that non-linear relationships have no place in evaluating players, but if you’re going to use them, you need to be sure that you model reality and not an unrealistic scenario. For example, you could ask “What would the team’s W% be if it were made up of eight average players and Player X?” and use the pythagorean formula to make an estimate. The relationship between two players’ OW%s figured that way would not be the same as the relationship between their ARGs, but one could argue that it would be a truer reflection of their value. OW% cannot make that claim, and thus the non-linearity just serves to obfuscate relationships between players.
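One rough way to sketch the “eight average players and Player X” question, under loudly stated assumptions: the hypothetical function below treats team offense as the simple mean of nine hitters’ ARGs (ignoring lineup interaction and playing time), holds runs allowed at the league average, and uses a Pythagorean exponent of 2. It is an illustration, not an established method.

```python
def team_w_pct_with_player(arg: float, exponent: float = 2.0) -> float:
    """Hypothetical: W% of eight league-average hitters plus one hitter with
    the given ARG, allowing league-average runs.

    Crude assumption: team runs scored relative to league is the plain mean
    of the nine hitters' ARGs.
    """
    team_arg = (8 + arg) / 9
    return team_arg**exponent / (team_arg**exponent + 1)

w_a = team_w_pct_with_player(1.10)
w_b = team_w_pct_with_player(1.20)

# Wins above .500 contributed in each case.
gap_a, gap_b = w_a - 0.5, w_b - 0.5
print(f"{w_a:.4f} {w_b:.4f} gap ratio {gap_b / gap_a:.3f}")
```

Under these assumptions, Player B’s edge above .500 comes out just under twice Player A’s, close to the 20% vs 10% relationship in their ARGs above average, because one hitter moves a team’s run total only a little.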

To wrap this rambling atrocity up, the most useful statistics tend to have positive answers to all three questions: they are denominated in some sort of unit, that unit is fundamentally important, and the ratio or difference between players or teams expresses the relationship between them in a meaningful way.

1 comment:

  1. It all went downhill with algebra. Before algebra, there were no meaningless quantities.

    Seriously though, I agree. Counting statistics have more meaning to most people because it is easy to see what is being counted. Rate statistics, like OBP, are easily comprehensible when they are per event or per nine innings.

    But other metrics (OPS especially) have always bothered me because, like you said, either they have no meaningful unit, their ratios are meaningless, their differences are meaningless, or both.

    This really is the danger of using unknown quantities. Not being a numbers person myself, I easily get frustrated with metrics that seem void of meaning because they don't have an obvious relation to discernible baseball events.

    Call it Euclidean Sabermetrics.

