Tuesday, December 30, 2008


* Advice to aspiring sabermetric bloggers who’d like to increase comments on their site: blog about Jim Rice. It doesn’t have to be about Jim Rice’s Hall of Fame candidacy, it just has to be timed around the Hall of Fame season (unfortunately for you, this is his last year of eligibility, so chop chop). As a matter of fact, you can even explicitly say that your point is not to discuss Rice’s candidacy as a whole, but rather one of the many arguments that have been offered on his behalf. It doesn’t matter.

Then comes the hard part. You have to figure out a way to get your post linked by a much more prominent blog, like the BTF Newsblog or Rob Neyer. I don’t really have any advice on this count, since I don’t go around seeking to have my posts linked anywhere, but I’m sure you can figure something out. Send them nice emails or something.

And now you’re in business. Now you will get a steady stream of comments from Rice fans and supporters, and this post of yours will get more comments than the last two dozen boring posts you wrote combined. The drawback is that these new posters consider you a big meanie for daring to say something less than complimentary of Rice, have no interest in sabermetrics, and will never read your blog again. But hey, at least you got comments!

Seriously, though, this illustrates why topics like the Hall of Fame debate du jour will always be plastered all over the net: they sell. They get attention and comments and page views, and for anyone with any dreams of making money off his blog, it’s tough to pass up. I am always amused by which posts of mine get linked in other places most often, because the ones that get the most exposure (which admittedly is still not much at all) are the ones I tend to think are the weakest and/or most blah. Posts about Jim Rice, OPS, the MVP award, and rankings of players are popular. Posts about Base Runs and the nineteenth century National League, not so much.

* I recently went digging through a box of Baseball Weekly back issues, looking for one with roughly the same date as now--the end of December. In short order I found the “on sale through December 26, 1995” issue with Ozzie Smith on the cover. I was hoping to find some stuff that read very funny in retrospect--not in an attempt to mock the writer, but just as an illustration of how funny some things that seemed perfectly reasonable at the time would look thirteen years later.

Unfortunately, I didn’t find too much material along those lines. One of the problems was that it is one of those skinny, wintertime, forty page editions they used to have. There aren’t too many opinion columns, just a lot of notes on which marginal free agents are signing where. I should try it again over the spring with the 1996 preview issue, which should be a much more fertile ground for the kind of stuff I was looking for, and will juxtapose well with my annual “don’t take predictions too seriously” disclaimer.

All that being said, there was one gem to be found, a letter to the editor from Dave Roman of Brooklyn. This time, I will be poking a little fun at the author, as this is not just a bad prediction, it’s a bad prediction combined with a large presumption of knowledge and a dash of haughtiness thrown in--the kind of stuff I would make fun of anywhere, anytime:

Your piece (Mets, Marlins rebuilding, Dec. 6-12) should serve as an excellent reference for the Yankees and their owner, who should take notes on how to stop insulting their fans with lousy management tactics and overall lack of planning. There are no “yes men” or lackeys at Shea.

As a longtime Mets fan, I just sit here and laugh quietly knowing that class organizations like my Mets, the Marlins, AND Buck Showalter’s Diamondbacks will win championships before the Yankees.

In fairness, the Marlins have won a couple of championships, although in the process they completely destroyed any perception of being a class organization. And the Diamondbacks did win a World Series, albeit after Showalter was fired. But the Mets…well, at least they managed to win a pennant. And lose the subsequent World Series. To the Yankees. Good call, Dave.

* Within the last month I received my annual (actually, now bi-annual thanks to a title shakeup) copy of the SABR Baseball Research Journal. I have been a member of SABR since 1999 and have read all of the BRJs since then, as well as all the back issues dating back to 1996 and probably about a dozen from before then.

The difference in quality between the early journals (I’m using “early” here to mean “early in my time as a member”, not to refer to the BRJs of the 70s and early 80s) and the most recent editions is night and day. The older ones had many more articles, generally much shorter in length, and many about trivial topics (which is some people’s cup of tea, but not mine). The sabermetric pieces in them were generally trash; freak show rankings of players, endless rediscoveries of bases/something, and the like. There were always some very good non-statistical articles to be found, but the consistency of quality left something to be desired.

It is therefore a great endorsement of both the former SABR publications director Jim Charlton and his successor, Nick Frankovich, to say that the recent editions have been light years better. From the sabermetric perspective, the BRJ remains non-essential, and probably always will be, as sabermetricians were early adopters of the internet and most of the best research will always wind up there. Additionally, the internet provides a great “peer review” outlet for sabermetric research, and allows for frequent publication.

However, the sabermetric pieces that are in the BRJ now are of a much higher quality than those of a decade ago. While the 2008 edition features a dialogue between Bill James and Phil Birnbaum lifted from the pages of By the Numbers (the newsletter of SABR’s Statistical Analysis committee), they are good articles as is a lot of what is in BTN. The old state of affairs was that the sabermetric articles in BTN, the quarterly newsletter of a single committee, were vastly superior to those in SABR’s flagship annual publication, which was pretty sad.

The historical articles are of greater quality and greater detail than those of the past, generally speaking. A couple that I really enjoyed in this one were Daniel Levitt, Mark Armour, and Matthew Levitt’s piece on Harry Frazee and Jerry Kuntz’s piece on George Lawson, baseball rabble-rouser who tried to organize a couple of “major” leagues. If you were unfamiliar with Mr. Lawson, don’t feel bad, as I was too. But the man was an unbelievable character as Mr. Kuntz’s piece demonstrates. Apparently he is working on a book about the Lawson brothers, which if this piece is any indication, will be a very entertaining read for a biography, even if much of the story is non-baseball.

Like any other journal covering a broad spectrum, there are articles that you will not find up your alley, and there are certainly a few pieces that didn’t do anything for me. Regardless, the recent BRJs are a huge step forward from those of a decade ago, and are a great example of the benefits of SABR membership and the wide-ranging interests and expertise of its members.

* I am an unabashed fan of the World Baseball Classic and am very much looking forward to the second edition. I am also an American and an unabashed supporter of the US team. That being said, though, I have no issues whatsoever with Alex Rodriguez choosing to play for the Dominican Republic.

While I am fairly patriotic (although that’s not why I go by the handle “Patriot”--that's another story, and not a particularly interesting one), I am an individualist first, and so I support any individual’s right to represent whichever nation they’d like in something as innocuous as a baseball tournament (if ARod had signed up for the Iranian army, that might be another story). While the various means of determining citizenship for international competition are a little silly, as long as a player is eligible under the rules, I see no problem with him choosing to represent either of his possible countries.

From a value perspective, the loss of ARod is not really much of a blow to the US team, if his replacements at third are indeed David Wright and Chipper Jones. While Adrian Beltre is a fine player (assuming he is playing in the WBC), it’s safe to say that Rodriguez is more valuable to the Dominicans relative to his replacement.

While I will be rooting for the US, my number one wish for the tournament is that Cuba be kept from winning, and strengthening the Dominican side without doing much damage to the American cause is a winning move on this front. My antipathy towards the Cubans is purely political and not directed at their baseball people; I feel bad for their players, and I also feel bad for the Cubans like Jose Contreras, Kendry Morales, and Orlando Hernandez who may well want to represent their country but cannot because they fled the Castros’ regime. Unfortunately the players are pawns in a political game, and unlike baseball games which are ultimately for fun, that one is for keeps.

I can also justify pulling for the US and the DR on the basis that I generally prefer to see the “better” team win a small sample size tournament. All things being equal, I will root for the team with a better regular season record in a playoff matchup. While both national teams may have failed to reach the semifinals in 2006, it would be difficult to argue that the US and the DR don’t produce the most plentiful talent, as evidenced by major league performance, with apologies to Venezuela.

Finally, I can’t help but be amused at the stories that said something to the effect of “Howard declines invite to Team USA”. I don’t doubt the veracity of this, but in a sane world, this would read like me announcing that I am declining to toss my hat into the ring for the Browns’ coaching job.

Tuesday, December 16, 2008

A Jim Rice Post (sorry)

I really don’t want this to be about Jim Rice or the Hall of Fame, but you may not believe me if you keep reading. What I really want to do here is make a point, and Rice just happens to serve as the example. However, the Hall of Fame debates are near the forefront of the baseball scene right now, and one writer’s argument about Rice has inspired me to write this piece.

Another disclaimer: this is not a good post. It just isn’t. Don’t say you weren’t warned.

I also don’t want to make this about the writer in question, Peter Abraham. I cannot claim to be familiar with his work, but my first impression is that he seems to be reasonably thoughtful and intelligent. However, he happened to raise a particular point, which I have seen broached elsewhere in different forms, explicitly and openly, and thus it is easy to respond to. It is a lot more difficult to respond to “Some people say…” as it seems as if you are setting up a strawman piñata to bash with a Louisville Slugger.

This is what Mr. Abraham wrote in his blog at LoHud.com:

But here’s the problem, the Hall of full of players who were elected based on those standards. So should Jim Rice suffer or Bert Blyleven be elevated because smart people came up with better, more revealing statistics?

Nobody cared about on-base percentage in the 70s and 80s. Rice’s job was to swing for the fences. But now we know OBP matters. But Jim Rice can’t get in the DeLorean and take more pitches because it would make the Baseball Prospectus guys respect him more.

I have four points I would like to make in response, and I will take the lazy route and make a list:

1. Jim Rice had a low walk rate for a big slugger, even for a benighted era. I looked at the relative walk rates (W/(AB + W) relative to the league average of the same) for all of the 350 HR men in MLB history who started their careers in 1974 or earlier; Rice started his career in 1974, so these are all players who presumably would not have been affected by external pressure to take more pitches:

As you can see, most of these hitters drew a lot of walks relative to their peers, regardless of whether it was “their job” or not. Rice, walking at just 79% of the league average, is 38th of 41 on this list.

I could see this argument if Rice was not unique, and all of the big sluggers of the benighted era were not drawing walks. That’s just not the case.

In fairness, you can pick apart this list in a couple of ways. For one, the 350 homer minimum puts Rice, with 382, near the bottom. He’s being compared to a bunch of players who are better than him. A true group of his comparables would set the line lower, so that Rice was near the middle of the group. On the other hand, since the Hall of Fame is the overarching subject of this whole discussion, these are the guys he should be compared to. Also, eyeballing similarity, I would say the most similar players (in terms of career length, era, and production) on this list would be Dwight Evans, Dick Allen, Norm Cash, Rocky Colavito, Frank Howard, Graig Nettles, Billy Williams, Tony Perez, Dave Kingman, Orlando Cepeda, and Lee May--all of whom walked more than he did except the last two. You don’t need to compare him to Ruth, Foxx, and Mantle to see that he didn’t draw a lot of walks for a slugger.

Another complaint is that by requiring the players to have embarked on their careers in 1974 and earlier, I am cutting out players who are essentially contemporaries of Rice while including many others who are clearly not his contemporaries (Ruth, Gehrig, Ott, etc.) Rice belongs in a group with Andre Dawson, who’s excluded here, much more than he does with them.

Caveats aside, I think the claim that sluggers of Rice’s era didn’t walk (or weren’t expected to walk) and thus it doesn’t matter should have to establish that contention. It flies against common sense and it flies against a cursory look at the evidence, and I am trying to be generous.

2. I let it go in my first point, but I don’t believe there was a benighted era (Abraham does not explicitly state that there was, but it is hinted at). Sure, baseball people (and writers, and fans, etc.) generally have a better or more complete understanding of the value of OBA and walks now than they did in 1980 or 1920. But I think it’s just a little bit condescending to pretend as if it was a revelation to them. Perhaps to some; but there have always been Earl Weavers and Branch Rickeys out there who understand the importance of getting on base/avoiding outs as well as anybody in any time.

There are certainly people throughout baseball, past and present, who didn’t properly appreciate the value of getting on base. And there are many more who, while recognizing the theoretical value of getting on base, have ignored or undervalued it when using statistics to judge a player. That’s a far cry from believing that this belief was so overwhelming that it prevented Rice from displaying patience, and the walk rates of other big home run hitters don't support such a position.

3. Even if someone threatened to fine Rice for every pitch he took, I don’t care. In the end, I am only interested in assessing value. If Rice’s lack of selectivity was caused by the conditions of his time, and made him a less valuable player in his own time and place than he might have been had he played in 2005, I don’t care.

Generally speaking, the same things that won baseball games in 2005 won baseball games in 1975. Evolving metrics just allow us to better quantify reality--they don’t change it.

This is where the discussion crosses into opinion territory of course--"value rules all" does not have to be the underlying philosophy that you use when evaluating players. I happen to believe that the only truly fair way to evaluate a player is to estimate how many wins he contributed to his team in the unique context in which he actually performed. Anything else is judging him on what he might have been or could have been and ultimately, what the individual thinks he could have been, rather than what he was. From reading the arguments of other writers over the years, it is my observation that straying too far off the value reservation often results in a mess of countering what-ifs and contradictions. It is much easier to just judge the player’s achievements as they relate to winning baseball games in the environment in which he actually performed.

Perhaps Rice would have taken more pitches if he had played today…but perhaps his aggressiveness enabled him to hit some of his home runs. Perhaps he would have gotten frustrated as a rookie when his hitting coach tried to impose this upon him, pressed, and gotten labeled as a AAAA player. Perhaps he would have been Manny Ramirez instead. The point is, you cannot possibly know what would have happened with any degree of certainty. Sabermetric estimates of Rice’s value in his own time and place are certainly not without flaw, weakness, and oversight, but they also are grounded in the principle of assessing what Rice actually did, and estimating its value.

I would even go so far as to say that Abraham’s argument, taken to its extreme, glorifies statistics above winning. One of the common complaints about sabermetricians is that all we care about is the numbers. Of course, the reason we look at certain statistics and interpret them as we do is because they correlate with wins. If you throw up your hands and say “In Rice’s time, people valued BA, HR, and RBI” and look at these despite agreeing that they are less telling than other metrics, aren’t you in fact saying that putting up statistics (specifically, statistics that are in vogue in a given time and place) is what matters?

4. Even if one accepts Abraham’s premise, I can’t imagine using it as an argument against a player. He cites Rice and Blyleven as two sides of the coin; one whose standing has been hurt by the proliferation of sabermetric ideas and one who has benefited from it.

Abraham suggests that perhaps Rice would have played the game differently if he played today. As I have spent the rest of this post explaining, I don’t buy it and even if I did, it wouldn’t change my opinion of him. But if you do, you may accept Abraham’s argument and view Rice’s relatively low OBAs (for a Hall of Fame corner outfielder) as a product of environment.

I have to ask, though, how would Blyleven’s performance be affected? Blyleven looks better when you evaluate him by runs allowed instead of by win-loss record. Had the observers of the time thought more about his ERA instead of his W%, he would have been better regarded in his own time. But how would it have affected what he did on the field? Blyleven was trying to prevent runs and win games. Are we to believe that he would have been able to allow fewer runs (or that he would have allowed more) had his contemporaries not paid attention to the “W” and “L” columns? Since I have to assume that just about everyone would answer “no”, then what good does it do to evaluate Blyleven in an outdated light? At least in the case of Rice, Abraham has offered a possible cause and effect relationship between contemporary views of statistics and performance. I don’t see any offered for the case of ERA v. W-L.

I have tried to avoid mentioning the Hall of Fame, because I don’t want to get into that debate (“that debate” being the one about which specific players should be in/out, not the election process itself or how a theoretical Hall might look). The same issues that come up in relation to the Hall are relevant for the general discussion of player value down through the years. My opinion of the Hall will be the same after Rice’s induction as it was before Rice’s induction, and while you can probably tell what I think of Jim Rice as a player, my intent really was just to use him as a vehicle to touch on the larger issues.

Tuesday, December 09, 2008

Demystifying Fibonacci Win Points

In his seminal 1994 book The Politics of Glory, Bill James introduced a simple method for evaluating pitcher win-loss records by combining the two components into one number. He called this method Fibonacci Win Points, and found that it was a fairly good predictor of which starting pitchers would be selected for the Hall of Fame.

You still see Win Points brought up from time to time for that kind of lightweight analysis, but I get the impression that most of the users don’t really know what the results are telling them (outside of the general idea that high numbers are good, it’s a combination of wins and losses, etc.). Since no one is taking the results too seriously, it’s not a big deal. But I always like to look underneath the math hood and see how things work.

So warning: this is a math post, and doesn’t really have much to do with baseball. The formula for Win Points (which I will abbreviate “FIB”) is wins times winning percentage, plus wins, minus losses. Writing it as a formula:

FIB = W/(W + L)*W + W - L

We can also write it as W^2/(W + L) + W - L, and make it abundantly clear that this is a unitless measure. The results bear a resemblance to win figures, but they are no longer a meaningful baseball unit.
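A minimal sketch of the formula as a Python function. The name `fib_win_points` and the guard for a pitcher with no decisions are assumptions made for illustration; neither comes from James.

```python
def fib_win_points(wins, losses):
    """Fibonacci Win Points: W * W% + W - L."""
    decisions = wins + losses
    if decisions == 0:
        # Assumption: the formula is undefined with no decisions, so return zero.
        return 0.0
    win_pct = wins / decisions
    return wins * win_pct + wins - losses


# A 10-10 pitcher: 10 * .500 + 10 - 10
print(fib_win_points(10, 10))
```

Written this way it is easy to confirm that the two forms of the formula agree, since `wins * win_pct` is the same quantity as W^2/(W + L).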

To understand how FIB works a little better, let’s express all of the terms as per-decision rates. W% is still winning percentage, but wins are now equal to W% and losses are equal to 1-W%. The formula for Win Point rate (FIBr) becomes:

FIBr = W%*W% + W% - (1-W%) = (W%)^2 + 2(W%) - 1

To convert back to Win Points, we simply multiply FIBr by the number of decisions the pitcher was credited with.

FIBr is a useful way to explore the relationship between Win Points and W%. We can see that a .500 pitcher will have a FIBr of (.5)^2 + 2(.5) - 1 = .25. So, a 10-10 pitcher will get .25 Win Points/decision, or 5 points. What does a pitcher have to do to earn .5 win points/decision?
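The rate form and the conversion back to Win Points can be sketched as follows. The function names are hypothetical, chosen only for this illustration.

```python
def fibr(win_pct):
    """Win Points per decision: W%^2 + 2*W% - 1."""
    return win_pct ** 2 + 2 * win_pct - 1


def win_points_from_rate(win_pct, decisions):
    """Scale the per-decision rate back up to Win Points."""
    return fibr(win_pct) * decisions


print(fibr(0.5))                      # rate for a .500 pitcher
print(win_points_from_rate(0.5, 20))  # a 10-10 record
```

The two printed values correspond to the .25 rate and the 5 points worked out for the 10-10 pitcher.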

We can find that by setting FIBr equal to .5, and solving for W%. It is a simple quadratic equation, which becomes very easy to see when we use “x” in place of W%:

FIBr = x^2 + 2x - 1

Setting FIBr equal to .5 and solving for x gives sqrt(10)/2 - 1, or .581. A pitcher with a W% of .581 will get a win point for every two decisions, whereas one with a .500 W% gets a win point every four decisions. It’s a very steep function (more on this later).

What is the point at which a pitcher gets zero win points? Solve the equation for FIBr = 0, and you get sqrt(2) - 1 = .414. While the value is superficially similar to a replacement level baseline, win points do not serve as a WAR method. Still, .414 is obviously the baseline, below which negative values are returned.

Finally, what is the point at which the pitcher gets as many win points per decision as he does wins? Set FIBr equal to x (so that x^2 + 2x - 1 = x, or x^2 + x - 1 = 0) and solve, and you get (sqrt(5) - 1)/2 = .618. This consequence of James’ formula is why he called them “Fibonacci” win points, as (sqrt(5) - 1)/2 is the reciprocal of Fibonacci’s number, the golden ratio.
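The three break points can be checked numerically. This is a sketch that uses the closed-form positive root of each quadratic; the function names are made up for the example.

```python
import math


def win_pct_for_rate(target):
    """Positive root of x^2 + 2x - 1 = target, which is sqrt(2 + target) - 1."""
    return math.sqrt(2 + target) - 1


def win_pct_where_rate_equals_win_pct():
    """Positive root of x^2 + 2x - 1 = x, i.e. x^2 + x - 1 = 0."""
    return (math.sqrt(5) - 1) / 2


print(round(win_pct_for_rate(0.5), 3))   # half a win point per decision
print(round(win_pct_for_rate(0.0), 3))   # zero win points
print(round(win_pct_where_rate_equals_win_pct(), 3))
```

Note that sqrt(2 + .5) is the same number as sqrt(10)/2, so the first root agrees with the expression given in the text.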

Allow me to go completely down the math digression path for a moment, with a disclaimer that I am not by any means a math professor and that James covered this ground in The Politics of Glory. The Fibonacci sequence starts with 0 and 1; at each step, you add the last two numbers together to yield the next term. So 0 + 1 = 1, 1 + 1 = 2, 1 + 2 = 3, 2 + 3 = 5, 3 + 5 = 8, 5 + 8 = 13, 8 + 13 = 21, 13 + 21 = 34, 21 + 34 = 55, and so on. As this sequence approaches infinity, the ratio between the previous term and the next term approaches (sqrt(5) - 1)/2. You can see this even in the early iterations; 8/13 = .61538, while 34/55 = .61818181… .

This will happen regardless of which two numbers you use to start out with. If we start out with 1 and 155, 1 + 155 = 156, 155 + 156 = 311, 156 + 311 = 467, 311 + 467 = 778, 467 + 778 = 1245, 778 + 1245 = 2023…1245/2023 = .6154, and you can see where this is going.
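A short sketch of that convergence, using the two pairs of starting numbers from the text. The helper name `ratio_after` is hypothetical.

```python
def ratio_after(first, second, steps):
    """Run a Fibonacci-style sequence for `steps` additions, then return previous/next."""
    prev, curr = first, second
    for _ in range(steps):
        prev, curr = curr, prev + curr
    return prev / curr


golden_conjugate = (5 ** 0.5 - 1) / 2

for seeds in [(0, 1), (1, 155)]:
    # Six additions reach 8/13 and 1245/2023; forty additions are close to the limit.
    print(seeds, ratio_after(*seeds, 6), ratio_after(*seeds, 40), golden_conjugate)
```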

Writing it out to ten decimal places, (sqrt(5) - 1)/2 = .6180339887. The reciprocal of .6180339887 is 1.6180339888…the only reason why it is not exactly equal to 1 + itself is because I rounded.

The square of the number is .381966011, which has a reciprocal of 2.618033989. So the reciprocal of its square is 2 + itself. I wish I could tell you that the reciprocal of its cube is 3 + itself, but alas… However, the square plus Fibonacci’s number is one; thus it is both the square and the complement. The Wikipedia link above gives some more details on the Fibonacci sequence and the golden ratio and their appearances in art and nature as well as a much more detailed discussion of their mathematical properties.
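These identities are easy to confirm without the rounding that comes from working to ten decimal places. A minimal check, with `g` as an arbitrary variable name:

```python
import math

g = (math.sqrt(5) - 1) / 2    # .6180339887...

print(1 / g, 1 + g)           # reciprocal vs. 1 + itself
print(1 / g ** 2, 2 + g)      # reciprocal of the square vs. 2 + itself
print(1 / g ** 3, 3 + g)      # the pattern breaks at the cube
print(g ** 2 + g)             # square plus the number itself
```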

Getting back to Win Points, we can differentiate FIBr with respect to W% to see just how steep the function is:

dFIBr/dW% = 2(W%) + 2

If we look at this specifically at the mean W% (.5 of course), we get a slope of 3. We can write the tangent line at that point in point-slope form as:

y - y1 = m(x - x1) where y = FIBr, y1 = FIBr(.5) = .25, m = the slope given by the derivative (3 at this point), x = W%, and x1 = .5: FIBr - .25 = 3(W% - .5), which can be simplified to:

FIBr ~= 3(W%) - 1.25

Which can also be written as:

FIBr ~= 3(W% - 5/12) ~= 3(W% - .417)

Again, this is just a linearization of the FIBr function for a .500 W%; it is precise at that point, but doesn’t hold over the entire range of W%. However, it does give us some insight into how Win Points work. The baseline is a W% of .417 (.414 is the actual zero point of FIBr, but the tangent line drawn at .500 does not reach zero until W% is .417, so that is the baseline the approximation implies), and each point of W% above .417 is multiplied by three.

Comparing this to a standard WAR formula, the WAR baseline will be in the same ballpark as .414 (I use .390), but each point of W% above the baseline is equally valuable. A .500 pitcher and a .495 pitcher will have the same WAR gap, given equal decisions, as a .600 and a .595 pitcher. That will not be true for Win Points, which rewards higher W%s more. This may be why James found them useful for predicting the Hall of Fame’s treatment of pitchers, as Win Points put a premium on excellence, beyond its real world win value.
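The contrast can be sketched with per-decision rates. The `war_rate` helper is a hypothetical simplification (W% minus a .390 baseline), used only to show that its gaps are constant while the FIBr gaps grow.

```python
def fibr(win_pct):
    """Win Points per decision: W%^2 + 2*W% - 1."""
    return win_pct ** 2 + 2 * win_pct - 1


def war_rate(win_pct, baseline=0.390):
    """Wins above the baseline per decision (a linear function of W%)."""
    return win_pct - baseline


for high, low in [(0.500, 0.495), (0.600, 0.595)]:
    war_gap = war_rate(high) - war_rate(low)
    fibr_gap = fibr(high) - fibr(low)
    print(high, low, round(war_gap, 6), round(fibr_gap, 6))
```

The same five points of W% are worth more in Win Points at .600 than at .500, which is the premium on excellence described above.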

Just to give you an idea of how well the approximation works on the career level, I figured actual Fibonacci Win Points and the linear knockoff for all post-1900 pitchers with 150 or more wins as of 2007. There are 198 pitchers, for whom the average absolute difference between the two is 2.3 win points. The differences are proportional to win points; the biggest differences are for those with the most win points (the largest single difference is 15 between Christy Mathewson’s 433 actual and 418 approximate). I think that this supports my claim that the linear approximation is a decent tool by which to understand how Win Points work.
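As a sketch of that comparison for a single pitcher, the code below computes both figures for Christy Mathewson. His 373-188 career record is supplied from outside this post, and the function names are hypothetical.

```python
def fib_win_points(wins, losses):
    """Actual Fibonacci Win Points: W^2/(W + L) + W - L."""
    decisions = wins + losses
    return wins * wins / decisions + wins - losses


def linear_win_points(wins, losses):
    """Tangent-line approximation at .500: 3 * (W% - 5/12) * decisions."""
    decisions = wins + losses
    return 3 * (wins / decisions - 5 / 12) * decisions


# Assumption: Mathewson's career record, not stated in the post itself.
wins, losses = 373, 188
print(round(fib_win_points(wins, losses)), round(linear_win_points(wins, losses)))
```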

It should go without saying that the approximation works best for those pitchers with W%s near .500, as that is the point at which I found the tangent line. If you were to find the tangent line at .600, you would get more accurate results for pitchers with W%s near .600, naturally.
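The tangent line can be found at any W%, not just .500. A minimal sketch, with `tangent_at` as a made-up helper name:

```python
def fibr(win_pct):
    """Win Points per decision: W%^2 + 2*W% - 1."""
    return win_pct ** 2 + 2 * win_pct - 1


def tangent_at(w0):
    """Return (slope, intercept) of the tangent line to FIBr at W% = w0."""
    slope = 2 * w0 + 2
    intercept = fibr(w0) - slope * w0
    return slope, intercept


for w0 in (0.5, 0.6):
    slope, intercept = tangent_at(w0)
    # The implied baseline is where the tangent line crosses zero.
    print(w0, round(slope, 3), round(intercept, 3), round(-intercept / slope, 3))
```

Moving the tangent point up to .600 raises both the multiplier and the implied baseline of the linear version.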

Now I will give a freak show stat of my own, which I would never really encourage anyone to use. I include it here as a way to use the career value metric I prefer (given the constraint of working with the actual W-L record) with a career wins pseudo-scale. It is just a quick z-score conversion of pitcher WAR to a pseudo-Wins unit. For the 150 win pitchers, the mean number of wins was 210 with a standard deviation of 55. I figured WAR crudely as (W% - .39)*(W + L); crude because it is using actual wins and losses as the inputs, not because the concept is flawed. This metric has a mean of 65 and a standard deviation of 25 for this group. Setting the z-scores equal and rearranging to isolate wins gives this conversion:

pseudo-Wins = 2.2*WAR + 67
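The conversion follows from setting (Wins - 210)/55 equal to (WAR - 65)/25 and solving for Wins. A sketch of that step, with hypothetical function names; Nolan Ryan's 324-292 career record is supplied from outside this post as a sample input.

```python
def zscore_conversion(mean_from, sd_from, mean_to, sd_to):
    """Return (slope, intercept) that maps one scale onto another by matching z-scores."""
    slope = sd_to / sd_from
    intercept = mean_to - slope * mean_from
    return slope, intercept


def crude_war(wins, losses, baseline=0.39):
    """Crude WAR from the actual record: (W% - .39) * (W + L)."""
    decisions = wins + losses
    return (wins / decisions - baseline) * decisions


slope, intercept = zscore_conversion(mean_from=65, sd_from=25, mean_to=210, sd_to=55)
print(round(slope, 1), round(intercept))

# Assumption: Ryan's career record, not stated in the post itself.
print(round(slope * crude_war(324, 292) + intercept))
```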

Obviously this is only intended for use with career WAR…saying that CC was worth 7 WAR last year and that is equivalent to about 82 wins would make no sense. Of course it would make no sense for a pitcher with 7 career WAR either; this was based on pitchers with 150 wins and shouldn’t be used outside the range of long, reasonably productive careers. Well, it really shouldn’t be used at all, but I needed some original content, even of questionable quality, to make me feel better about a post rehashing James’ method.

The lowest-ranking 300 win pitcher under this formula (and thus, by definition, WAR as defined above as well) is Nolan Ryan (251). There is only one pitcher with 300 pseudo-Wins without 300 actual wins, and that may change this season--Randy Johnson at 319. The other non-300 game winners with 290 or more pseudo wins are Jim Palmer (296) and Whitey Ford (293). In case it is not clear, I have been using actual wins and losses, not “neutral wins and losses” as in the previous post and other posts through the years.

In summation, Win Points may well have value as a gauge of Hall of Fame chances or as a reasonable way to combine wins and losses into one number. However, they are unitless, and they place a very high premium on excellent performance, much more than the actual, tangible win impact of said performance. Despite their baseline of .414, which is a reasonable “replacement” level, the aforementioned premium should dispel any notion that they are a WAR knockoff. And of course, they just so happen to tie in with a fascinating mathematical sequence and ratio, which gives an otherwise forgettable “freak show stat” (I apply that label in the kindest sense of the term, and it is one which Bill James often applied to his own inventions) a little more pizzazz.

Tuesday, December 02, 2008

W-L Records of Mussina and Contemporaries

The retirement of Mike Mussina and the imminent departure of the rest of a roughly contemporary group of great pitchers that many lump together (Clemens, Maddux, Johnson, Smoltz, Glavine, Schilling, Martinez, Brown) has led to a number of discussions about how they stack up, which ones are worthy of the Hall of Fame, and the like. I don’t wish to enter the Hall of Fame debate, but I do want to provide a little bit of information and use this as an opportunity to re-make a larger point.

In the course of these discussions, sometimes pitcher W-L records are brought up. I have no particular desire to promote their inclusion in these discussions, but to the extent that it is inevitable that they will be used, I do wish that people would take the time to really think about how to evaluate them, and the assumptions that their approach entails.

Everyone who is reading this blog knows about the deficiencies of pitcher W-L record as a serious analytical tool. At the risk of being patronizing, I will list some of the biggies:

1) they are heavily affected by the offensive performance of the pitcher’s teammates
2) the accounting rules used to assign them are outdated and often result in questionable (to put it generously) results
3) they are heavily affected, in the modern era at least, by a pitcher’s bullpen support
4) like most other basic pitching measures, they do not isolate the pitcher’s efforts from those of the fielders behind him

On the other hand, there are a few good things to be said about them:

1) they are inherently (at least as a W% or a W/L ratio) park and era adjusted, as the mean is .500 always and forever
2) if you subscribe to the notion that a player really only adds value in games his team ends up winning (I don’t), at least a pitcher’s win is always a team win
3) when analyzed, they often lead to similar conclusions as other measures of pitching effectiveness; they are positively and fairly strongly correlated with ERA and similar metrics

Please do not get me wrong--I don’t believe that the positives outweigh the negatives. However, they are not going anywhere, and some people will continue to use them to evaluate pitchers. With that being the case, the question that I am addressing is “How can W-L records be interpreted so as to make the best estimate of a pitcher’s true value?”

Ideally, run support data can be used (either average or discrete figures from each game) to provide context for the W-L record. However, this introduces the issue of park effects, which we previously could ignore, one of the positive attributes of W-L. There is a bit more math involved as well, which may not deter me or you but will lose many of the fans who continue to rely on W-L. Additionally, there is the problem of past seasons not covered by the efforts of Retrosheet and others. Pointing out these drawbacks is not an attempt to shun this approach in favor of what is to follow, which admittedly is a more rudimentary and less optimal approach.

A very common approach that even casual, non-sabermetric fans seem to gravitate towards is comparing a pitcher’s W% to that of his teammates. This approach dates to at least 1944 and Ted Oliver’s “Kings of the Mound”. It seems like a common sense way to account and adjust for the quality of a pitcher’s team, it is easy to do computationally, and it involves data (team W-L record) that is readily available. So what’s not to like?

Notice that I slipped “adjust for the quality of a pitcher’s team” in there. That’s exactly what a direct comparison of pitcher W-L to teammates’ W-L record does. But why would one want to adjust for the quality of the team? The team’s record includes the contributions of the team’s hitters, fielders, and relievers, all of which influence the W-L records of starting pitchers. But it also includes the contributions of the team’s other starting pitchers, which are irrelevant to any individual starter. If Stephen Drew plays well, he helps to increase Brandon Webb’s “teammate W%”. And if Dan Haren pitches well, he also winds up increasing Brandon Webb’s “teammate W%”. The difference is that while Drew’s actions serve to increase Webb’s chances of earning a win (or avoiding a loss), Haren’s do no such thing. They are confined to a completely different set of games, games in which Webb does not pitch.

Therefore, assuming that the goal of any method of comparing pitcher W% to team W% is to estimate what his W% would be on an average team, the simple differential between W% and teammates’ W% (which I will call Mate for the sake of brevity) is flawed. This is because it implicitly assumes that all of the team’s deviation from .500 is the product of offense, fielding, and relief support, ignoring the contributions of the other starting pitchers.

In order to come up with a simple model, let’s make the following assumptions:

1) 50% of an average team’s deviation from .500 is due to offense; 50% is due to defense
2) Pitching is 100% of defense (this is obviously a faulty assumption, unlike the first one, which is reasonable)
3) The starting pitcher, in one of his starts, is the entirety of the pitching; his relievers will not affect the outcome (again, faulty, although closer to reality than #2)
4) Team W% can be modeled linearly (faulty, but reasonable, as a linear model works fine for normal teams)

Given these assumptions, a pitcher should be compared not to Mate directly, but to the average of Mate and .500. In doing so, the assumption is that half of the deviation from .500 was due to the offense, and has changed the W% of a hypothetical average pitcher on this team from .500 to (Mate + .5)/2.

Continuing to apply the linear assumption, a pitcher’s Neutral W% (hypothetical on a .500 team) can be figured as:

NW% = W% - (Mate + .5)/2 + .5 = W% - Mate/2 + .25

Under the traditional approach, a .600 pitcher on a .600 Mate team would have a NW% of .500. Under this approach, his NW% will be .6 - .6/2 + .25 = .550. Compared to a simple differential, this approach is kinder to pitchers on good teams and less generous to those on bad teams.

One can argue about the assumptions above; you can use more sophisticated assumptions about fielding and bullpen support, use a Pythagorean model instead of a linear one, and the like. I think those refinements are overkill, since any analysis of W-L records is going to be inherently fraught with imprecision, but if you want to go further down that path, I won’t try to stop you.

Another, simpler option is to alter the weights on Mate and .500. I have weighted them 50/50; perhaps 40% of Mate and 60% of .500 would better account for some of the factors our assumptions brushed aside (I picked those specific numbers as an example rather than for any justifiable reason).

In any individual case, the 50/50 assumption may wind up being “worse” than the standard 100/0 assumption, or the 0/100 assumption (which would just set NW% = W%, assuming that the pitcher was solely responsible for deviation from .500). The average team may have a perfect offense/defense value split, but very few teams actually do. An example that you will see in the data presented later is the Braves teams of the 1990s. Their defenses were better relative to the league than their offenses, and thus even after neutralizing the W% of a Maddux or a Smoltz in the manner prescribed here, they are being shortchanged. However, for more cases than not, 50/50 is going to match reality better than 100/0 or 0/100, and thus is better suited for general application.
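
For anyone who wants to play with the weights, here is a minimal Python sketch of the neutralization described above. The function name and the mate_weight parameter are my own labels for illustration, not anything from Wood's article; the only thing it assumes is the linear model already laid out.

```python
def neutral_wpct(w_pct, mate, mate_weight=0.5):
    """Estimate a pitcher's W% on a .500 team under the linear model.

    mate_weight is the share of the teammates' deviation from .500 that is
    assumed to carry over into the pitcher's own record:
      0.5 -> the 50/50 split, i.e. W% - Mate/2 + .25
      1.0 -> the traditional straight comparison to Mate
      0.0 -> no adjustment at all (NW% = W%)
    """
    # Expected W% of an average pitcher dropped onto this team.
    baseline = mate_weight * mate + (1 - mate_weight) * 0.5
    return w_pct - baseline + 0.5


# The .600 pitcher on a .600 Mate team, under several weightings.
for weight in (1.0, 0.5, 0.4, 0.0):
    nw = neutral_wpct(0.600, 0.600, weight)
    print(f"Mate weight {weight:.1f}: NW% = {nw:.3f}")
```

With the default weight this reproduces the .550 figure from the example, and a weight of 1.0 gives the traditional .500.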

Regardless of the assumptions made in figuring NW%, once we have NW% by any method, we can extend it to value measures. The most common is “Wins Above Team”, first figured by Oliver and carried on by countless analysts since. It is figured as (NW% - .5)*(W + L), and is the number of wins beyond those expected of an average pitcher in the same number of decisions.

We can also compare the pitcher to some replacement level; I use a .390 W% as my replacement level for starting pitchers, and thus what I call WCR (Wins Compared to Replacement, as I don’t want to overuse the common WAR acronym) is simply (NW% - .39)*(W + L).

Both formulas assume that the pitcher’s decisions will remain constant; you could use estimated decisions (e.g., IP/9) in place, as the number of decisions itself can be affected by external factors. However, I am most comfortable assuming decisions are a constant. After sticking with actual decisions, we can figure a new W-L record, with NW = NW%*(W + L) and NL = (1 - NW%)*(W + L).
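
The value formulas above can be sketched in a few lines of Python. The function name and the 20-10 pitcher with a .600 Mate are made up for illustration; the .390 replacement level is the one I stated above, and decisions are held constant.

```python
def neutral_record(wins, losses, nw_pct, replacement=0.390):
    """Turn a Neutral W% into value measures, holding decisions constant."""
    decisions = wins + losses
    return {
        "WAT": (nw_pct - 0.5) * decisions,          # wins above an average pitcher
        "WCR": (nw_pct - replacement) * decisions,  # wins compared to replacement
        "NW": nw_pct * decisions,                   # neutral wins
        "NL": (1 - nw_pct) * decisions,             # neutral losses
    }


# Hypothetical pitcher: 20-10 with a .600 Mate, so NW% = 20/30 - .600/2 + .25
example = neutral_record(20, 10, 20 / 30 - 0.600 / 2 + 0.25)
print({name: round(value, 1) for name, value in example.items()})
```

NW and NL always sum to the pitcher's actual decisions, which is the point of treating decisions as a constant.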

If by any chance this sounds familiar, I have written about all of this before. My previous posts on this matter were by no means original; the idea of the 50/50 split was explained and implemented by Rob Wood in his August 1999 By The Numbers article, "Evaluating Pitchers' Winning Percentages: A Mathematical Modeling Approach" (pdf link). I have also published some results for great pitchers on this blog; here I am going to supplement that with updated (through 2008) results for the pitchers generally considered to be Mussina’s contemporaries (Brown, Smoltz, Schilling, Martinez, Glavine, Johnson, Maddux, Clemens).

Here is the career data for those pitchers with the list sorted by NW:

Career Mate is weighted by decisions in each season, the reasoning behind which should be obvious. A few observations about the results:

* For pitchers with 150 or more Neutral Wins, Lefty Grove has the highest career NW%, at .650. When I last figured Clemens, he was at .654, but his performance in 2007 dropped him to .6497, now behind Grove’s .6502. Randy Johnson still leads Grove, but has slipped from .661 to .653 and may not hang on. Pedro Martinez has also slipped, from .680 to .671. I would wager that one of them manages to hang on, but it is within the realm of possibility that Grove will retain the career lead.

* Maddux may have slipped ahead of Clemens in wins, but the Rocket still has a seven win edge in NW, and with Maddux’ retirement seeming quite possible, Warren Spahn will remain the post-war leader at 355.

* Glavine may wind up outside the 300 NW club, but Randy Johnson is closer in NW than in actual wins thanks to pitching on slightly below average teams over the course of his career; Schilling is the only other member of this group with sub-.500 teammates.

I have posted the complete career data for these guys so that you can look at individual seasons (although an imprecise metric like this is best used when aggregated over a long period of time).
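
For the record, the decision-weighted career Mate mentioned above can be sketched like this. The function name and the two-season input are hypothetical, not data for any real pitcher.

```python
def career_mate(seasons):
    """Career Mate, weighting each season's Mate by the pitcher's decisions.

    seasons: iterable of (wins, losses, mate) tuples, one per season.
    """
    seasons = list(seasons)  # allow generators, since we iterate twice
    total_decisions = sum(w + l for w, l, _ in seasons)
    weighted_sum = sum((w + l) * mate for w, l, mate in seasons)
    return weighted_sum / total_decisions


# Hypothetical two-season career: 10-10 with a .500 Mate, 20-10 with a .600 Mate.
print(round(career_mate([(10, 10, 0.500), (20, 10, 0.600)]), 3))
```

Weighting by decisions means a season in which the pitcher barely pitched does not drag his career Mate around as much as a full season does.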

Finally, allow me to briefly comment on the Hall of Fame as it relates to Mussina. I have written before that I don’t really care who goes into Cooperstown, because I think that their process is broken beyond drastic repair, and has been for many years. However, I don’t waive the right to comment generally on the issue; my policy is simply not to advocate or give a yes/no answer for or against any particular player. (I strive for neutrality, but sometimes I can’t help myself, so if you want to accuse me of hypocrisy, have at it.)

There are 49 post-1900 starters in the Hall (depending on who you consider to be post-1900; I did not count Kid Nichols but did count Cy Young, if that helps). Fourteen of them (29%) won 300 games, so any notion that 300 wins is a time-established standard for induction is off-base. Thirty-six pitchers have been selected by the BBWAA, including all 14 of the 300 win group (39%). So even if you limit it to the writers, 61% of starting pitchers inducted did NOT win 300 games.

I have Mussina’s neutral W-L record as 262-161 (.619). Eyeballing similarity, that’s in the same area code as Carl Hubbell (244-163, .600), Joe McGinnity (234-154, .603), Bob Feller (257-171, .601), Bob Gibson (248-177, .583), Juan Marichal (236-149, .614), and Jim Palmer (251-169, .599). All of those guys are in the Hall of Fame, and seem to be regarded as fine choices.

I certainly am not saying Mussina must be elected because of his neutral W-L record alone; there are certainly better metrics by which to evaluate pitchers, other factors that you may want to consider beyond career value, and there's nothing stopping you from having your own standards for what is a Hall of Famer. However, the notion that Mussina’s career W-L record is in and of itself a liability, absent mitigating circumstances or a divergent opinion about what the standard should be, seems misguided. Put another way, Mussina’s career W-L record is one that typically would be associated with a Hall of Fame pitcher.

One of the observations that has been put out there by a number of writers (I think I remember Tom Verducci in particular mentioning this) is that Mussina spun a career of near-misses--he wasn’t that far from 300, he always just missed 20 wins (until 2008 of course), he just missed winning the Series as a Yank, he just missed a couple of no-hitters/perfect games.

Personally, I was always a fan of Mussina as a result of two things, one of those a near-miss: his almost perfect game against the Indians in 1997. The other was that in my childhood/adolescence I played Front Page Sports: Baseball incessantly. I recall that the pitcher on the box of the first edition (’94) bore a resemblance to Mussina. If you actually dredge this up, you may well find that I am way off and it embarrassingly looks nothing like him, or that it actually is him. Regardless, I pretended that it was him. He was also money in the ’96 version; in my seasons, he seemed to be the most consistently effective pitcher, better than Johnson or Cone or Clemens or even Maddux. The Yankees are now short a starter, and my old FPS Cleveland Indians are missing one too. So long, Moose.

Tuesday, November 25, 2008

Hitting by Position, 2008

Here is another mail-it-in annual post; this time I will look at offensive production by position, based on the data from baseball-reference.com. This is actually one of my favorite areas of inquiry, although the one-year data shouldn’t be overanalyzed.

First, here are the positional totals for 2008. “MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the position (non-pitcher) average. “LPADJ” is the long-term positional adjustment that I used, based on 1992-2001 data. The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively:

Again, I don’t want to draw any conclusions from one year of data, so I’ll let those figures speak for themselves.

Now, let’s take a look at the exciting and pivotal spectacle of pitcher batting. Here are the basic stats for the pitchers of each NL team. RAA is runs above the average pitcher, a RG of .35 according to the first chart. It should be noted that sacrifices, a major part of a pitcher’s batting responsibility, are not included in these figures in any way:

The Cubs’ pitchers were clearly the standouts, as they were the only ones that managed to crack the Mendoza line, led in OBA, led by 60 points in SLG, and were 8 RAA ahead of their closest challenger, the Cardinals. Nonetheless, they still only created runs at 40% of the overall league average.

The Rockies bring up the rear at -8, which is even worse considering I did not apply a park adjustment to the pitcher figures.

Last year, Toronto pitchers turned in an excellent 5.1 RG in their 21 PA. This year, in 16 PA, Blue Jay hurlers failed to reach base. The Twins were the most productive AL pitching staff, compiling a .316/.316/.368 line in 19 PA that compares favorably to that of their center fielder, Carlos Gomez (.258/.289/.360).

Now, let’s take a look at the worst hitting positions in the majors, as measured by RAA (compared to the overall MLB average for 2008 at the position, with left and right fields considered together, and with park adjustment). It’s more interesting to look at the worst than the best, as the best are easy to figure out--they are generally teams with a star player who plays all the time. It’s no surprise that St. Louis first basemen or Minnesota catchers hit well. So first, a simple list of the best positions, and then a table for the worst:


The Astros have the unfortunate distinction of two sinkholes, which is a good news, bad news situation. The bad news is obvious; the good news is that it shouldn’t be that hard to improve at those positions. You can see why Mariner fans were fed up with Jose Vidro and the other players in their DH; wonder why Washington has horrid production out of left despite Jim Bowden’s love of collecting candidates for the outfield corners (Dukes, Kearns, Pena); and see that one really bad position can be overcome (Angels).

In the past I have concluded this piece by discussing those teams which had unusually strong, weak, and negative correlations between expected and actual production at the position. This time, I am going to instead present a series of charts showing the RAA at each position for each team, organized by division. Below average performances are in red; outstanding performances (arbitrarily defined as +20 RAA or more) are in bold; and each table is sorted by “SUM”, which is the sum of the RAA figures for the positions (no pitchers or pinch hitters). These ARE park adjusted:

Did you ever expect to see a team with below average production at every position except for the one largely manned by Cristian Guzman? The Mets are also interesting--three positions were at +20 or more, and the primary performer at each of those positions was in the top ten on my IBA ballot for NL MVP. The other five positions are a composite -16. While the Mets still led the division in RAA, it illustrates my contention in defense of my ballot that the Mets’ stars can hardly be blamed for the team’s failure to make the playoffs. Florida led the majors in combined middle infield RAA (+77) on the backs of Dan Uggla and Hanley Ramirez. Atlanta had the lowest combined outfield RAA in the majors (-62); they balanced this out by having the best infield RAA in the majors (+75) and solid catcher production (+19).

Pittsburgh had the worst combined middle infield RAA (-46). St. Louis had the top outfield RAA (+50); they also had the top combined corner RAA (+99 for 1B, 3B, LF, and RF). Eyeballing it, the Cubs may have gotten the most balanced contributions relative to positional norms for a team with a good offense. Cincinnati had only three positive positions, two of which were manned by favored whipping boys of what I consider the “Pete Rose idolizing” segment of their fan base (Encarnacion and Dunn).

San Diego got its best production out of center field, and the fourth highest RAA at that position in the majors. Extra credit to anyone who thought before the campaign that a Jody Gerut/Scott Hairston combination would pull that off. San Francisco had the lowest infield RAA (-87) in the majors, “besting” their neighbors, the A’s (-81). No other team was worse than -70.

Baltimore shortstops post-Tejada were certainly problematic, although excellent performance from Brian Roberts and Nick Markakis managed to keep the Orioles’ attack respectable. As I pointed out in a post a few weeks ago, Toronto was a team stocked with guys who hit like middling middle infielders. Only their center fielders managed to match the league average.

Is there any primary position holder for a +20 position who gets more grief from his hometown fans than Jhonny Peralta? (There may well be, but Peralta is a favorite whipping boy on Cleveland sports talk. While I freely admit that his fielding is subpar, he’s still at worst an average player. Yet there are a number of Indians fans who can’t wait to ditch him). The Twins offense defied all run estimators this year, finishing second in the AL with 5.1 R/G but only eighth in RC/G at 4.8. Only the Mauer and Morneau-manned positions managed above average performances.

Finally, sneaking in on the last table, is a team (Oakland) below average at every position. They were last in the majors in composite corner position RAA (-73).

You will note that the AL teams have a negative total; this is because I used the overall MLB average. Believe it or not, the AL with pitchers removed hit .268/.332/.421, while the NL with pitchers removed hit .267/.336/.426.

Here is a link to a Google spreadsheet with the positional data for each team.

Tuesday, November 11, 2008

Why I Don’t Care About the BBWAA Awards

Because it’s pretty apparent that the BBWAA doesn’t care. Why should I?

This post is gratuitous piling-on, no doubt, as the votes for NL Rookie of the Year received by Edinson Volquez have already become and will continue to be easy message board/blog fodder.

Seriously, though, why should anyone care about an award if three of the thirty-two voters can’t even correctly identify who is eligible for it? You trust people drawn from this same pool to fill out a ten-deep MVP ballot intelligently?

This is not a sabermetrician’s rant against the stupid old sportswriters looking at RBI and “chemistry” or win-loss record or what have you. This is much more elementary.

In the original Historical Baseball Abstract, Bill James compared and contrasted the voting system for MVP (and by extension Cy Young and ROY, since they are similar) and the Hall of Fame. Both are chosen by members of the same group (although the HOF electorate is much larger and does not require the close attention to current baseball that one must engage in to be granted an MVP ballot), but James argued that one system is intelligently designed and the other was haphazardly designed and is inherently flawed.

I agree with James’ points in that article. However, any system, no matter how well-designed, is going to produce bad results if you have unqualified or unserious people as the voters. And it is clear that there are at least three people with a say in the matter who are one or both of the above. Even worse is the fact that the BBWAA apparently did not notice this when they tabulated the vote!

If you are getting a tone of outrage from this, then I have failed. I’m not outraged; I’m actually more amused and bemused. I like the Hot Stove fodder that the MVP and similar awards provide, and naming the best player, pitcher, and rookie in each league is a perfectly worthwhile activity. But each individual can have that discussion with their friends and internet associates, post their own ballot on a blog or message board, participate in a broad-based amalgamation like the IBA, and so on without caring what the BBWAA decides, except as a passing curiosity. And hey, at least the IBA restricts the ballot to eligible candidates.

Tuesday, November 04, 2008

Leadoff Hitters, 2008

Once again, here is a look at the composite performances of the players who batted in the leadoff spot for each team. The data is from baseball-reference.com and again, it includes ALL of the PA out of the leadoff spot. In parentheses I list the players who appeared in twenty or more games in the #1 slot (which is not the same as starting twenty games; they could have been pinch runners, defensive replacements, etc.), but that does not in any way mean that they are the only contributor to the team total.

I always feel obliged to point out that as a sabermetrician, I feel that the importance of the batting order is often overstated, and that the best leadoff hitters would generally be the best cleanup hitters, the best #9 hitters, etc. However, since the leadoff spot gets a lot of attention, it is instructive to look at how each team fared there.

The conventional wisdom would say that the most important function of the leadoff hitter is to get on base and score runs. So a good place to start is looking at runs scored per 25.5 outs (AB - H + CS):

1. FLA (Ramirez), 7.3

2. DET (Granderson), 6.4

3. TEX (Kinsler/Arias), 6.4

Leadoff Average, 5.1

MLB Average, 4.8

28. TOR (Inglett/Eckstein/Scutaro/Rios), 4.4

29. WAS (Lopez/Guzman/Harris/Bonifacio), 4.3

30. OAK (Ellis/Suzuki/Davis/Buck/Hannahan/Sweeney), 3.7

For clarification, “Leadoff Average” is the average for leadoff hitters, and “MLB Average” is the average for all hitters, regardless of lineup slot.
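
The runs-per-25.5-outs rate is simple enough, but here is a small Python sketch for anyone following along in code. The function name and the stat line are hypothetical; the outs definition (AB - H + CS) is the one given above.

```python
def runs_per_game(r, ab, h, cs, outs_per_game=25.5):
    """Runs scored per 25.5 outs, with outs figured as AB - H + CS."""
    outs = ab - h + cs
    return r * outs_per_game / outs


# Hypothetical leadoff spot: 100 R, 600 AB, 180 H, 5 CS (425 outs).
print(round(runs_per_game(100, 600, 180, 5), 1))
```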

I am not going to insult your intelligence by extensively lecturing about the drawbacks of using actual runs scored figures or any of the other metrics presented here.

Another very basic measure by which to gauge a leadoff hitter is On Base Average. I did not include hit batters or sacrifices, so this is just (H + W)/(AB + W):

1. FLA (Ramirez), .385

2. BAL (Roberts), .373

3. CLE (Sizemore), .364

Leadoff Average, .341

MLB Average, .329

28. COL (Taveras/Barmes/Podsednik), .308

29. HOU (Bourn/Matsui/Erstad), .290

30. OAK (Ellis/Suzuki/Davis/Buck/Hannahan/Sweeney), .277

Last year, leadoff hitters had the same .341 OBA, but the league average was .331, so there was a tiny relative improvement for the leadoff spot this year.

I had an online discussion with an Indians fan some time in May about who, to that point, was the team’s most productive offensive player. He argued for Victor Martinez on the basis of his Batting Average, which was sad and predictable. What was odd about it was that when I pointed out Sizemore’s far-superior OBA, he scoffed that it wasn’t in the top ten in the league and thus was inadequate for a leadoff hitter.

I don’t know if he is representative of the larger group of casual fans or not, but in his case at least, there is a misguided belief that leadoff hitters have superior OBAs. As a group, they don’t, at least not to an extent where a .360 OBA would be subpar.

I am amused by the third and second to last finishes of the Rockies, spearheaded by Willy Taveras, and the Astros, led by Michael Bourn. Houston parted with Taveras in the Jason Jennings trade, then decided they couldn’t live without a speedy center fielder who can’t hit, so they accepted Bourn as the key piece for Brad Lidge.

A slightly modified OBA is what I like to call Runners On Base Average. It is the A component of Base Runs per PA, and it simply removes home runs and caught stealing from the numerator of OBA. Thus, it leaves only times in which the hitter was actually on base, waiting to be driven in by the subsequent batters.

1. BAL (Roberts), .348

2. SEA (Suzuki), .345

3. LAA (Figgins), .338

Leadoff Average, .309

MLB Average, .297

28. MIN (Gomez/Span), .276

29. HOU (Bourn/Matsui/Erstad), .258

30. OAK (Ellis/Suzuki/Davis/Buck/Hannahan/Sweeney), .258

Florida falls to tenth (.324) and Cleveland to seventeenth (.313), mostly because they tied for the ML lead with 34 homers by leadoff hitters.
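
To make the difference between the two on-base measures concrete, here is a rough Python sketch of OBA as used in this post (no hit batters or sacrifices) and of ROBA. The function names and the stat line are hypothetical.

```python
def oba(h, w, ab):
    """On Base Average as figured here: (H + W)/(AB + W)."""
    return (h + w) / (ab + w)


def roba(h, w, ab, hr, cs):
    """Runners On Base Average: OBA with HR and CS taken out of the numerator."""
    return (h + w - hr - cs) / (ab + w)


# Hypothetical leadoff spot: 180 H, 60 W, 600 AB, 20 HR, 5 CS.
print(f"OBA  = {oba(180, 60, 600):.3f}")
print(f"ROBA = {roba(180, 60, 600, 20, 5):.3f}")
```

The gap between the two grows with home runs, which is why a homer-heavy leadoff spot slides down the ROBA list.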

Now I will look at two statistics which describe the shape of performance, not the quality (ROBA is sort of in this class--a high ROBA is good, but so are home runs which don’t help you out there). The first is simply the ratio of runs scored to RBI. Leadoff hitters as a group score many more runs than they drive in, partly due to their skills and partly due to lineup dynamics. Those with low ratios don’t fit the traditional leadoff profile as well as those with high ratios:

1. LAA (Figgins), 2.8

2. SEA (Suzuki), 2.5

3. BOS (Ellsbury), 2.2

Leadoff Average, 1.6

28. CIN (Hairston/Patterson/Dickerson/Bruce/Freel), 1.2

29. CHN (Soriano), 1.2

30. CLE (Sizemore), 1.1

MLB Average, 1.0

A similar idea posited by Bill James is the Run Element Ratio, which James intended to balance skills more helpful in setting up an inning (walks and steals) against those more helpful in driving runners in (power, measured by extra bases). RER is simply the ratio (SB + W)/(TB - H):

1. LAA (Figgins), 3.0

2. COL (Taveras/Barmes/Podsednik), 1.8

3. BOS (Ellsbury), 1.8

Leadoff Average, 1.0

MLB Average, .8

28. CHN (Soriano), .6

29. ARI (Drew/Young), .6

30. SD (Gerut/Giles/Hairston), .5
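
Both shape measures are one-liners; a quick Python sketch, with function names and inputs that are hypothetical:

```python
def run_rbi_ratio(r, rbi):
    """Ratio of runs scored to runs batted in."""
    return r / rbi


def run_element_ratio(sb, w, tb, h):
    """Run Element Ratio: (SB + W)/(TB - H), setup skills over extra bases."""
    return (sb + w) / (tb - h)


# Hypothetical leadoff spot: 100 R, 60 RBI, 30 SB, 60 W, 270 TB, 180 H.
print(round(run_rbi_ratio(100, 60), 1))
print(round(run_element_ratio(30, 60, 270, 180), 1))
```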

Returning to measures which attempt to measure quality, Bill James used an estimated runs scored to rate leadoff hitters. He assumed that if a leadoff hitter reached first (S + W - SB - CS), he would score 35% of the time; 55% from second (D + SB); 80% from third (T), and of course once for each home run. Expressed per 25.5 outs, I’ll call this Leadoff Efficiency:

1. FLA (Ramirez), 7.3

2. CLE (Sizemore), 7.1

3. BAL (Roberts), 6.6

Leadoff Average, 5.7

MLB Average, 5.5

28. MIN (Gomez/Span), 4.8

29. HOU (Bourn/Matsui/Erstad), 4.4

30. OAK (Ellis/Suzuki/Davis/Buck/Hannahan/Sweeney), 4.3

As Tango Tiger pointed out in the comments last year, James’ weights aren’t optimal. You can see this in the fact that he expects leadoff hitters to score 5.72 runs/“individual game”, whereas they actually average 5.36. Tango suggested alternate scoring percentages of 30/50/65. I stuck with James’ here, but please heed the warnings about them.
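
Here is a hedged Python sketch of Leadoff Efficiency with the scoring percentages exposed as a parameter, so that James' 35/55/80 and Tango's 30/50/65 can be compared. The function name, argument order, and stat line are my own for illustration; outs are figured as AB - H + CS, the same as earlier in the post.

```python
JAMES_WEIGHTS = (0.35, 0.55, 0.80)
TANGO_WEIGHTS = (0.30, 0.50, 0.65)


def leadoff_efficiency(s, d, t, hr, w, sb, cs, ab, weights=JAMES_WEIGHTS):
    """Estimated runs scored per 25.5 outs from where the hitter ends up."""
    p_first, p_second, p_third = weights
    on_first = s + w - sb - cs   # singles and walks, less steal attempts
    on_second = d + sb           # doubles plus successful steals
    on_third = t                 # triples
    est_runs = p_first * on_first + p_second * on_second + p_third * on_third + hr
    outs = ab - (s + d + t + hr) + cs
    return est_runs * 25.5 / outs


# Hypothetical leadoff spot: 120 S, 35 D, 5 T, 20 HR, 60 W, 30 SB, 5 CS, 600 AB.
line = dict(s=120, d=35, t=5, hr=20, w=60, sb=30, cs=5, ab=600)
print(f"James: {leadoff_efficiency(**line):.2f}")
print(f"Tango: {leadoff_efficiency(**line, weights=TANGO_WEIGHTS):.2f}")
```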

When I first did a review of leadoff hitters in this vein, David Smyth suggested that I include 2*OBA + SLG. Since the optimal weight for OBA in a x*OBA + SLG construction is somewhere in the vicinity of 1.7, using “2OPS” is closer to the mark than regular OPS, while also providing an extra boost in value for OBA. So here is that list (the actual figure displayed here is .7*(2*OBA + SLG), to bring it in line with the regular OPS scale. OPS and 2OPS are both unitless, so I may as well express 2OPS on the more familiar regular OPS scale):

1. FLA (Ramirez), 906

2. CLE (Sizemore), 860

3. SD (Gerut/Giles/Hairston), 851

Leadoff Average, 768

MLB Average, 753

28. COL (Taveras/Barmes/Podsednik), 675

29. HOU (Bourn/Matsui/Erstad), 640

30. OAK (Ellis/Suzuki/Davis/Buck/Hannahan/Sweeney), 617
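
The rescaled 2OPS figure can be sketched as follows. The function name and the .350/.450 inputs are hypothetical; the lists above show the result multiplied by 1000.

```python
def two_ops(oba, slg, scale=0.7):
    """2*OBA + SLG, multiplied by .7 to land on the familiar OPS scale."""
    return scale * (2 * oba + slg)


# Hypothetical hitter with a .350 OBA and .450 SLG.
oba, slg = 0.350, 0.450
print(f"OPS  = {oba + slg:.3f}")
print(f"2OPS = {two_ops(oba, slg):.3f}")
```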

Finally, we can evaluate the leadoff men in exactly the same way as I would evaluate anyone else--their RG, based on ERP:

1. FLA (Ramirez), 7.1

2. CLE (Sizemore), 6.7

3. BAL (Roberts), 6.1

Leadoff Average, 5.1

MLB Average, 4.8

28. MIN (Gomez/Span), 4.0

29. HOU (Bourn/Matsui/Erstad), 3.4

30. OAK (Ellis/Suzuki/Davis/Buck/Hannahan/Sweeney), 3.1

Here is a link to the spreadsheet if you want to examine this yourself.

Tuesday, October 28, 2008

IBA Ballot: MVP

Presented below is my ballot (and some justification) for one of the categories in the Internet Baseball Awards hosted at Baseball Prospectus. I’m just one person, and the whole point of having a vote like the IBA is to get a wide variety of (intelligent) perspectives, and so I will not feel in the least bit slighted if you don’t give a flip about this.

In the American League, it was a pretty underwhelming year for position players, at least as far as MVP candidacies go. This results in a large number of candidates but no real standouts. Here is a chart-form look at some of the top candidates. RAA and RAR are against an average hitter at the position, so the column “Def” is an estimate of runs saved above average at the position, based on Justin’s stats:

NAME          RAA    RAR    Def
Rodriguez      40     59      7
Sizemore       34     58      9
Pedroia        34     58      9
Mauer          38     54    N/A
Hamilton       31     53      0
Roberts        31     51      6
Kinsler        34     50     -6
Markakis       26     50     -4
Youkilis       21     44      4
Granderson     22     43      3
Morneau        15     41     -9

One thing to note is that in choosing my order, I will not treat the fielding stats as 100% equally valuable to hitting; they are pretty clearly less reliable. Also, the approach of comparing to the average hitter can be argued to favor second basemen and shortchange center fielders, which is something to keep in mind. Also, ARod’s “clutch” numbers are dreadful--while I don’t place much weight on this, it’s something that can swing my opinion in the case of a virtual tie.

As a result of all that, I would go with Joe Mauer (who does well fielding, +7 according to Chone’s estimates, although they don’t account for the quality of the pitching staff) in a close race over Sizemore and Pedroia. However, the most generous possible final RAR for Sizemore (giving him the 58 runs for hitting, 9 for fielding, and 5 more to correct for undervaluing center fielders) leaves him at +73; for Mauer, it is +54 RAR, +7 fielding, and an indeterminate amount for being a catcher. That leaves both behind the pitching duo of Cliff Lee and Roy Halladay.

I usually try to avoid giving my MVP support to a pitcher if there is a very small margin. This is not out of any bias against pitchers being the MVP, but because I am more confident in the sabermetric evaluation of hitters. No fielding or bullpen support to worry about, fewer nagging questions about “hit luck” and peripherals, etc. But here there is a clear demarcation between the two pitchers and everyone else, and I have to respect that. So this is how I see it:

1) SP Cliff Lee, CLE

2) SP Roy Halladay, TOR

3) C Joe Mauer, MIN

4) CF Grady Sizemore, CLE

5) 2B Dustin Pedroia, BOS

6) 3B Alex Rodriguez, NYA

7) SP Jon Lester, BOS

8) 2B Brian Roberts, BAL

9) CF Josh Hamilton, TEX

10) RF Nick Markakis, BAL

If any of the nimrod crowd at BTF (note that I consider this a subset of the BTF commentariat, not the whole) ever see this, I’m sure they will complain about how there are few guys from the left side of the defensive spectrum and attribute this to VORP and its treatment of DHs. The fact of the matter is, the left side of the defensive spectrum players in the AL just aren’t that good. Here are the leaders in Hitting RAR (not accounting for position) in order by position, without their identities:

5, 8, 9, D, 8, D, 3

I listed seven because I gave three spots on the ballot to pitchers (merit-based; I don’t have a three pitcher quota or anything)--thus there are seven position player spots up for grabs. It should stand to reason that if the best a first baseman can do is seventh, without considering defensive value at all, they are not going to fare very well when you do consider it. The third baseman, the two center fielders, and the right fielder all make my ballot. The two DHs are a guy who played for a bad team (Aubrey Huff, BAL) and a guy who played only 126 games, but was extremely productive when in the lineup (Milton Bradley). In fairness to them, they each played a fair amount in the field, but are considered 100% DH because of the single position adjustment used here (Huff played 24 games at first and 33 at third, Bradley 20 in the outfield corners).

So one could certainly argue that one or both are worthy of a ballot spot--but who are you going to replace? Are you going to argue that Huff was more valuable than his two Oriole teammates who played the field all the time, and also hit well? Are you going to argue that Bradley is more valuable than Hamilton, who could have very easily been reversed in roles with Bradley had Texas felt that would make them a better team? You can, but I’d have a hard time buying it.

In the National League, there is a runaway winner. Albert Pujols led players with 300 or more PA in SLG, RC, RG, and all four of the “above baseline” categories I track. He was second in BA, OBA, and secondary average. He did all this while fielding well (albeit at first base) and helping his team stay in contention all year when many (including myself) thought they’d be bad. And while it was arguably his best season (I think I’d hold out for 2003), he didn’t do anything that was way out of line with his track record.

I realize that you, as an intelligent baseball analyst, realized all that and didn’t need a lecture. But as an intelligent baseball analyst, you probably don’t care much about my MVP preferences in any case.

For the rest of the ballot, Hanley Ramirez’ seemingly improved fielding makes him a clear #2--even if you don’t believe he’s +7 out there as Jin’s numbers do, there would have to be around a dozen run error in that estimate to make me place David Wright or Chipper Jones ahead of him (that’s not to say that there couldn’t be a dozen run error, but I’ll bet against it).

Wright and Jones were #1/#2 on my 2007 ballot, but they will be #4/#3 this year. Wright’s RAR edge over Jones is razor-thin despite having nearly 200 more PA, and the zone data has Jones ahead by ten runs in the field. I find that hard to believe, but I was leaning towards Jones anyway. Again, you can’t go wrong with either of them.

Behind them, I slip the top two pitchers in, then go with Berkman on the basis of trusting batting stats, although Beltran and Utley, on the strength of +10 performances in the field, could very well be ahead of him. Jose Reyes rounds out my ballot; Justin has him at -6 in the field, and there are a number of guys for whom you could also make a case (Giles, Holliday, Ludwick, and McCann among them).

1) 1B Albert Pujols, STL

2) SS Hanley Ramirez, FLA

3) 3B Chipper Jones, ATL

4) 3B David Wright, NYN

5) SP Tim Lincecum, SF

6) SP Johan Santana, NYN

7) 1B Lance Berkman, HOU

8) CF Carlos Beltran, NYN

9) 2B Chase Utley, PHI

10) SS Jose Reyes, NYN

Four Mets in the top ten will rub some folks the wrong way, but I’m hardly the first to observe that New York is a team with several stars and a lot of mediocre filler around them. It is a testament to Wright, Santana, Beltran, and Reyes that they came as close as they did.

Finally, I apologize again for the terrible formatting. Blogger has made it damned near impossible to copy and paste from Word while maintaining a readable output. The "Meanderings" post looked awful and this one may be worse.

Tuesday, October 21, 2008


Here are some disjointed observations and digressions largely inspired by my annual look at the final stats. I have to apologize that they are kind of Indians-centric; I strive to be non-partisan here, but I can’t help that they are the team to which I pay the most attention:

* I want to mention this before the Rays have a chance to ruin it, but if you look at the expansions in groups of two teams, one of the teams has won the World Series and the other has not. This is true for all of the expansions except the 1969 NL expansion, in which neither team has won:

1961: Angels, Senators
1962: Mets, Colt .45s
1969N: Padres, Expos (the exception)
1969A: Royals, Pilots
1977: Blue Jays, Mariners
1993: Marlins, Rockies
1998: Diamondbacks, Devil Rays

Please note that I’m just pointing this out as a coincidence, not any kind of profound insight.

* The AL hit .267/.332/.420, while the NL hit .260/.327/.413. The AL walk/at bat ratio was .096 (.090 with intentional walks removed), while the NL’s was .100 (.091). The AL and NL both had an isolated power of .152. So the biggest real difference in offense between the leagues was seven points of batting average.

Despite this, the AL managed to score .188 runs per (AB - H + CS) while the NL scored just .178. In terms of Base Runs per out, I have the AL at .189 versus the NL’s .183. The apparent difference from the components is not as large as the actual difference. The extra intentional walks could be a factor, but it could be a number of other things and the discrepancy is not particularly noteworthy.

BTW, all of those stats are for the AL and NL offenses. Interleague play makes the issue of league totals a mess as of course there are both offensive and defensive totals, and they no longer are equal on the league level.

* I list three winning percentage categories in my team spreadsheet. The first is regular W%; the second is EW%, which is Pythagenpat; and the third is PW%, which is Pythagenpat based on Base Runs. Teams for which all three figures are close include (these are displayed W%, EW%, PW%) the Cubs (.602, .614, .604), A’s (.466, .470, .470), White Sox (.546, .551, .548), Yankees (.549, .539, .545), and Cardinals (.531, .534, .529). Teams for which there are big differences include the Angels (.617, .544, .519), Braves (.444, .484, .504), and Padres (.389, .416, .453).

Last year there was much discussion about the Diamondbacks, who outplayed their pythagorean expectation to an extreme extent (they won 90 games despite being outscored). This year they had a .506 W% with an EW% of .509.
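For anyone who wants to reproduce an EW%-style figure, here is a minimal sketch of a Pythagenpat calculation. The exponent constant of 0.29 is an assumption (values around .28 to .29 are commonly quoted), the function name is made up for illustration, and the team totals are hypothetical rather than taken from the spreadsheet. A PW%-style figure would come from passing Base Runs estimates of runs scored and allowed into the same function.

```python
def pythagenpat(runs, runs_allowed, games, exponent=0.29):
    """Estimate winning percentage from runs scored and allowed.

    The Pythagorean exponent floats with the run environment:
    x = (total runs per game) ** exponent. The 0.29 default is an
    assumed constant, not a figure stated in the post.
    """
    rpg = (runs + runs_allowed) / games  # runs per game, both teams combined
    x = rpg ** exponent
    return runs ** x / (runs ** x + runs_allowed ** x)


# Hypothetical team: 800 runs scored, 700 allowed, 162 games
print(round(pythagenpat(800, 700, 162), 3))
```

Because the exponent depends on scoring level, the same run ratio translates to a slightly different W% in a high-scoring context than in a low-scoring one.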

* The Indians struggled offensively early in the season, and were getting very good starting pitching. Thus the narrative that has been written for the season by the general fan base is that the offense was inadequate (this is not to say that the pitching is being praised; everyone agrees at the very least that the bullpen was dreadful) and the main cause of the team’s .500 season. However, if you look at the season as a whole, the Indians were +34 runs versus the league average (park-adjusted) offensively, and +10 defensively. If you look at Runs Created instead of actual runs, then it is +8/+6. The story may have been written in the early part of the season when the Tribe fell out of the race and started selling, but in the end, the run scoring and run prevention were pretty close.

* The Rangers and their opponents easily had the highest scoring level of any team. The RPG in Texas games was 11.53, while the overall MLB average was 9.30. The second-highest was Detroit at 10.36, over a run per game less.

Adjusting for park, the Rangers still lead the way at 11.20, with Detroit still second at 10.36. Toronto ended up with the lowest scoring context either way (8.17 raw, 8.01 adjusted).

* Speaking of Texas, have you noticed how dreadful Luis Mendoza’s season was? I had no idea until I looked at the stats. Mendoza pitched 63 1/3 innings and allowed 61 earned runs for an 8.67 ERA. It’s worse than that, though, as he also was tagged for 13 unearned runs, raising his RA to 10.52. He also inherited 13 runners and allowed 7 to score, so that would be roughly another three runs surrendered beyond what would be expected.
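The ERA and RA figures for Mendoza are simple per-nine-inning rates, and the arithmetic can be checked with a short sketch. The helper name is made up for illustration; the inputs are the innings, earned runs, and unearned runs quoted above.

```python
from fractions import Fraction


def per_nine(runs, innings):
    """Runs allowed per nine innings."""
    return 9 * runs / innings


innings = 63 + Fraction(1, 3)   # 63 1/3 innings, kept exact
earned, unearned = 61, 13

era = per_nine(earned, innings)             # earned runs only
ra = per_nine(earned + unearned, innings)   # all runs allowed

print(f"ERA {float(era):.2f}, RA {float(ra):.2f}")  # ERA 8.67, RA 10.52
```

Using a fraction for the innings avoids the common slip of treating "63.1" as a decimal.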

Park factors help him, a little bit; his adjusted RA is 10.21. His eRA is 7.76, but his dRA is a much more reasonable 4.96. Opponents hit .384 against him when they put the ball in play.

The last pitcher with an ERA greater than 8.00 allowed to pitch more than 60 innings (in fairness, note that Mendoza is just above that cutoff) was Kyle Davies with Atlanta in 2006 (8.38 in the same 63 1/3 IP). The last pitcher to accomplish this with at least half of his appearances as a reliever was Russ Ortiz in 2006 (8.14 in 63 innings, with 26 appearances and 11 starts; Mendoza had 25 appearances, 11 starts). Beyond them, you have Miguel Batista in 2000 (8.54 in 65 1/3) and Benj Sampson in 1999 (8.11 in 71).

All of this added up to -38 RAR for Mendoza, making him the least valuable player in baseball among those who qualified for my spreadsheets. His RAR is overstated a bit by the fact that I lump pitchers into a binary class of starter or reliever with no gray area. Mendoza pitched 45 innings as a starter and 18.3 as a reliever. Thus, weighting the replacement levels by inning, he comes in at -34 RAR, which is still last in the majors by a considerable margin.

* Aquilino Lopez of the Tigers worked in 48 games, all in relief. He inherited 57 runners and allowed 29 of them to score. His 1.19 inherited runners per game led all major league relievers, and his -12 runs saved on inherited runners was the worst figure (tops on the trailers list), acknowledging that this is a crude approach that does not consider where the runners are or the number of outs.
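A crude inherited-runner calculation of this kind can be sketched as expected runners scoring minus actual runners scoring. The 0.30 expected scoring rate below is a hypothetical value chosen because it is roughly consistent with the numbers quoted for Lopez; it is not a figure given in the post, and the function name is made up for illustration.

```python
def inherited_runs_saved(inherited, scored, expected_rate):
    """Expected inherited runners scoring minus actual.

    Negative values mean the reliever let in more inherited runners
    than expected. Ignores base/out state, as the crude approach does.
    """
    return inherited * expected_rate - scored


games, inherited, scored = 48, 57, 29
ASSUMED_RATE = 0.30  # hypothetical share of inherited runners who score

per_game = inherited / games
saved = inherited_runs_saved(inherited, scored, ASSUMED_RATE)

print(round(per_game, 2), round(saved))
```

A more careful version would weight each runner by the run expectancy of the base/out state he was inherited in, which is exactly the refinement this simple rate skips.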

* Craig Breslow had a nice season, albeit over just 47 innings, as a lefty reliever for the Indians and Twins. Cleveland claimed him on waivers from Boston near the end of spring training, but he pitched just 8 innings before he was let go again. He serves as an illustration of my biggest frustration with Eric Wedge as a manager.

I will tread lightly here, as this criticism is intended more as a fan than an analyst. However, Breslow was allowed to languish in the bullpen for weeks, never entrusted with any high-leverage situation whatsoever. Then, when he did get to pitch, he was not particularly sharp (surprise, surprise). Wedge picks his horses in the bullpen, and then he rides them hard. He doesn’t seem to be able to develop a bullpen in which five or six guys have valuable roles.

In fairness to him, he didn’t have a lot of material to work with this year.

* Most people are aware of the great performance Oakland got out of Brad Ziegler. What I didn’t notice until I looked at the stats was how well Joey Devine pitched for them this year. I would guess I’m not alone in saying that the main thing I remembered about Devine’s short stint in Atlanta was his propensity to allow grand slams. While Devine only pitched 46 innings this year, he was brilliant by any measure (1.41 RA, .60 ERA, 1.16 eRA, 2.39 dRA) and is still only 25. He’s one to keep an eye on for the future.

* About a month ago I wrote about Cliff Lee and his remarkable season in terms of W-L record compared to that of his team. At the time Lee was 21-2 and Cleveland was 71-73. The final tallies were 22-3 for Lee and 81-81 for the team, so his final NW% dipped to .915, still better than Randy Johnson’s .906 in 1995. That Big Unit season was the best that I could find for ten or more wins in my data for Hall of Fame pitchers.

* A hat tip to R.J. Anderson at Beyond the Box Score is warranted here, as he pointed it out a while back, but I thought it to be curious enough to mention again. The perennially disappointing Daniel Cabrera saw his strikeout rate drop to 4.8, which is woeful for a pitcher with his stuff (he didn’t pitch well in 2007 but was still fanning 7.3 per nine innings). I’m not a scout or a PitchF/x-er so I don’t have anything to add beyond that, but maybe there is something not evident in the traditional stats that explains why Cabrera’s career is floundering so.

* Remember when ARod, Jeter, Garciaparra, and Tejada were all AL shortstops? It seems like a long time ago when you look at the sorry crop of 2008. Only three AL shortstops with 300 or more PA were above average hitters: Jhonny Peralta, Derek Jeter, and Mike Aviles. Peralta, though much-maligned by Indians fans, was arguably the AL’s top shortstop in context-neutral terms. That still does not make him a great player (+22 RAA and +41 RAR before taking off up to ten runs for fielding), but you would think that he was below-average and a millstone listening to the talk shows here.

* Your job is to tell me who these players are:

.294 .346 .403
.268 .355 .415
.263 .327 .346
.260 .334 .427
.263 .317 .361
.282 .316 .439
.243 .326 .357
.240 .325 .303
.237 .339 .359
What is the common thread here? They are all Toronto Blue Jays with 100 or more PA (the stats for those with more than 300 PA are park-adjusted, while the others are not; that’s lazy and sloppy on my part, but irrelevant to the point). For some of the season the Blue Jays had Joe Inglett, John McDonald, David Eckstein, and Marco Scutaro on the roster simultaneously. Any one or two of those guys may be able to help your team, but what on earth do you need four of them for?

* It’s hard to find a better offensive value match than Jimmy Rollins and JJ Hardy. Rollins had 614 PA, Hardy 621. Rollins made 405 outs, Hardy 409. Each created 91 runs, so Rollins’ RG was 5.71 and Hardy’s was 5.67. Rollins was +29 RAA, Hardy +28. Both were +45 RAR.
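The RG figures follow from runs created and outs. Here is a minimal sketch, assuming RG is runs created per 25.5 outs; the 25.5 constant is an assumption (it is a common outs-per-game convention, not something stated above), and the function name is made up for illustration. Because the 91 RC figure is rounded, the results land within a couple of hundredths of the quoted 5.71 and 5.67 rather than matching exactly.

```python
OUTS_PER_GAME = 25.5  # assumed outs per game; not stated in the post


def runs_per_game(runs_created, outs, outs_per_game=OUTS_PER_GAME):
    """Runs created per game's worth of outs."""
    return runs_created * outs_per_game / outs


# (name, runs created, outs) as quoted in the comparison above
for name, rc, outs in [("Rollins", 91, 405), ("Hardy", 91, 409)]:
    print(name, round(runs_per_game(rc, outs), 2))
```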

* One of these players is considered an MVP candidate, and one was until his team went in the toilet. The other two are well-known, but are often derided for their fielding, which, while not great, is not significantly worse than that of the other two:
PA  Outs RC  RG
638 402  101 6.42
670 437  110 6.43
672 428  108 6.43
691 458  111 6.17
They are Pat Burrell, Carlos Delgado, Ryan Howard, and Prince Fielder.

* Here are three AL players:
.223 .326 .393
.225 .319 .400
.275 .326 .400
Some people still believe that if two players have equal OPS but one has a higher BA, the one with the higher BA is more valuable. They believe this despite the fact that more sophisticated run estimators show them to be of nearly identical value, with an edge for the lower BA if anything (with the caveat that we are considering a normal environment in the modern major leagues). This is illustrated by these players’ RGs, which are 4.45, 4.51, and 4.43 respectively. Not that I intend this to prove anything, but the players' (R + RBI)/Out are .32, .33, and .31 respectively. (R + RBI - HR)/Out are .29, .28, .27.

You should always remember that if you have identical OPS but varying BA, the player with the lower BA has a better combination of secondary skills. Incidentally, the players are Brandon Boggs, Gary Sheffield, and Billy Butler.
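The secondary-skills point can be seen by splitting each line into its pieces. This is a minimal sketch using the three BA/OBA/SLG lines quoted above; the dictionary layout and function name are made up for illustration, and "OBA minus BA" is used here only as a rough stand-in for walk contribution.

```python
# BA, OBA, SLG for the three AL players listed above
players = {
    "Boggs": (0.223, 0.326, 0.393),
    "Sheffield": (0.225, 0.319, 0.400),
    "Butler": (0.275, 0.326, 0.400),
}


def secondary_breakdown(ba, oba, slg):
    """Split a batting line into OPS, isolated power, and OBA-minus-BA."""
    return {
        "OPS": round(oba + slg, 3),
        "ISO": round(slg - ba, 3),      # extra bases per at bat
        "OBA-BA": round(oba - ba, 3),   # rough proxy for walks
    }


for name, line in players.items():
    print(name, secondary_breakdown(*line))
```

The three OPS figures sit within a few points of each other, but the two low-BA hitters get there with noticeably more isolated power and a larger gap between OBA and BA.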

* I have a junkish-stat abbreviated “SU” for Speed Unit. I do not claim it to be better than Speed Score; as a matter of fact, it’s worse. It is based on triples/ball in play, runs/time on base, stolen base percentage, and stolen base attempt frequency. One of the big problems is that I did not cap each component; Curtis Granderson got a 121 last year (it’s supposed to be a 0-100 scale) because he hit a remarkable number of triples. Anyway, take this for what it’s worth. These are the highest and lowest SU by each position in the majors last year:

Pos  Highest         Lowest
C    Rodriguez (56)  Varitek/YMolina (25)
1B   Berkman (61)    Sexson/Aurilia (30)
2B   Weeks (79)      Kent (30)
3B   Figgins (67)    Glaus (29)
SS   Reyes (92)      Eckstein (36)
LF   Crawford (85)   Cust/Gonzalez (30)
CF   Taveras (92)    Rowand (32)
RF   Span (79)       Ordonez/Jenkins (31)
DH   Huff (47)       Butler (28)

* Finally, the answers to “name the Blue Jay”. In order, they are Joe Inglett, Lyle Overbay, David Eckstein, Scott Rolen, Aaron Hill, Adam Lind, Kevin Mench, Shannon Stewart, and Gregg Zaun.