Tuesday, October 30, 2012

A Brief, Incomplete History of Replacement Level

Baseball Prospectus was kind enough to run a meta-sabermetric piece I wrote on the history of replacement level in sabermetric analysis. As you can imagine, it's not a topic that has generated a lot of responses, but the one person who has commented on the article so far seems to like it, and perhaps you will too.

Wednesday, October 10, 2012

End of Season Statistics 2012

The spreadsheets are published as Google Spreadsheets, which you can download in Excel format by changing the extension in the address from "=html" to "=xls". That way you can download them and manipulate things however you see fit. The player spreadsheets are not ready yet, but I want to get the team stuff posted.

The data comes from a number of different sources. Most of the basic data comes from Doug's Stats, which is a very handy site, or Baseball-Reference. KJOK's park database provided some of the data used in the park factors, but for recent seasons park data comes from B-R. Data on pitchers' batted ball types allowed, doubles/triples allowed, and inherited/bequeathed runners comes from Baseball Prospectus.

The basic philosophy behind these stats is to use the simplest methods that have acceptable accuracy. Of course, "acceptable" is in the eye of the beholder, namely me. I use Pythagenpat not because other run/win converters, like a constant RPW or a fixed exponent, are not accurate enough for this purpose, but because it's mine and it would be kind of odd if I didn't use it.

If I seem to be a stickler for purity in my critiques of others' methods, I'd contend it is usually in a theoretical sense, not an input sense. So when I exclude hit batters, I'm not saying that hit batters are worthless or that they *should* be ignored; it's just easier not to mess with them and not that much less accurate.

I also don't really have a problem with people using sub-standard methods (say, Basic RC) as long as they acknowledge that they are sub-standard. If someone pretends that Basic RC doesn't undervalue walks or cause problems when applied to extreme individuals, I'll call them on it; if they explain its shortcomings but use it regardless, I accept that. Take these last three paragraphs as my acknowledgment that some of the statistics displayed here have shortcomings as well.

The League spreadsheet is pretty straightforward--it includes league totals and averages for a number of categories, most or all of which are explained at appropriate junctures throughout this piece. The advent of interleague play has created two different sets of league totals--one for the offense of league teams and one for the defense of league teams. Before interleague play, these two were identical. I do not present both sets of totals (you can figure the defensive ones yourself from the team spreadsheet, if you desire), just those for the offenses. The exception is for the defense-specific statistics, like innings pitched and quality starts. The figures for those categories in the league report are for the defenses of the league's teams. However, I do include each league's breakdown of basic pitching stats between starters and relievers (denoted by "s" or "r" prefixes), and so summing those will yield the totals from the pitching side. The one abbreviation you might not recognize is "N"--this is the league average of runs/game for one team, and it will pop up again.

The Team spreadsheet focuses on overall team performance--wins, losses, runs scored, runs allowed. The columns included are: Park Factor (PF), Home Run Park Factor (PFhr), Winning Percentage (W%), Expected W% (EW%), Predicted W% (PW%), wins, losses, runs, runs allowed, Runs Created (RC), Runs Created Allowed (RCA), Home Winning Percentage (HW%), Road Winning Percentage (RW%) [exactly what they sound like--W% at home and on the road], Runs/Game (R/G), Runs Allowed/Game (RA/G), Runs Created/Game (RCG), Runs Created Allowed/Game (RCAG), and Runs Per Game (the average number of runs scored and allowed per game). Ideally, I would use outs as the denominator, but for teams, outs and games are so closely related that I don’t think it’s worth the extra effort.

The runs and Runs Created figures are unadjusted, but the per-game averages are park-adjusted, except for RPG which is also raw. Runs Created and Runs Created Allowed are both based on a simple Base Runs formula. The formula is:

A = H + W - HR - CS
B = (2TB - H - 4HR + .05W + 1.5SB)*.76
C = AB - H
D = HR
Naturally, A*B/(B + C) + D.
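For anyone who would rather read this as code, here is a minimal Python sketch of the Base Runs formula above. The function name, the argument names, and the sample team line are invented for illustration; only the coefficients come from the formula as listed.

```python
def base_runs(ab, h, tb, hr, w, sb, cs):
    """Team Base Runs estimate built from the A, B, C, D factors listed above."""
    a = h + w - hr - cs                                      # A: baserunners
    b = (2 * tb - h - 4 * hr + 0.05 * w + 1.5 * sb) * 0.76   # B: advancement
    c = ab - h                                               # C: outs
    d = hr                                                   # D: home runs score automatically
    return a * b / (b + c) + d


# Hypothetical team totals, made up purely to exercise the function.
team_rc = base_runs(ab=5500, h=1400, tb=2200, hr=150, w=500, sb=100, cs=40)
print(round(team_rc, 1))
```

Runs Created Allowed would use the same function with the totals a team allowed rather than the totals it produced.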

I have explained the methodology used to figure the PFs before, but the Cliff’s Notes version is that they are based on five years of data when applicable, include both runs scored and allowed, and they are regressed towards average (PF = 1), with the amount of regression varying based on the number of years of data used. There are factors for both runs and home runs. The initial PF (not shown) is:

iPF = (H*T/(R*(T - 1) + H) + 1)/2
where H = RPG in home games, R = RPG in road games, T = # teams in league (14 for AL and 16 for NL). Then the iPF is converted to the PF by taking x*iPF + (1-x), where x = .6 if one year of data is used, .7 for 2, .8 for 3, and .9 for 4+.
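The two steps (initial factor, then regression toward 1) can be sketched in Python as follows. The function names and the sample RPG figures are hypothetical; the weights are the x values given above.

```python
# Regression weights by years of data; four or more years all use .9.
REGRESSION_WEIGHT = {1: 0.6, 2: 0.7, 3: 0.8}


def initial_pf(home_rpg, road_rpg, teams):
    """iPF = (H*T/(R*(T - 1) + H) + 1)/2, with H and R as RPG at home and on the road."""
    return (home_rpg * teams / (road_rpg * (teams - 1) + home_rpg) + 1) / 2


def park_factor(home_rpg, road_rpg, teams, years):
    """Regress the initial factor toward 1 according to the years of data used."""
    if years < 1:
        raise ValueError("need at least one year of data")
    x = REGRESSION_WEIGHT.get(years, 0.9)
    return x * initial_pf(home_rpg, road_rpg, teams) + (1 - x)


# Made-up example: a 14-team league, 10 RPG at home, 9 RPG on the road, 5 years of data.
print(round(park_factor(10.0, 9.0, teams=14, years=5), 3))
```

A park with identical home and road RPG comes out at exactly 1 no matter how many years are used, which is a quick sanity check on the regression step.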

It is important to note, since there always seems to be confusion about this, that these park factors already incorporate the fact that the average player plays 50% on the road and 50% at home. That is what the adding one and dividing by 2 in the iPF is all about. So if I list Fenway Park with a 1.02 PF, that means that it actually increases RPG by 4%.

In the calculation of the PFs, I did not get picky and take out “home” games that were actually at neutral sites, like the Astros/Cubs series that was moved to Milwaukee in 2008. I have also reset the NYN park factor due to park modifications; only 2012 data for the Mets is being considered.

There are also Team Offense and Defense spreadsheets. These include the following categories:

Team offense: Plate Appearances, Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Walks per At Bat (WAB), Isolated Power (ISO), R/G at home (hR/G), and R/G on the road (rR/G). BA, OBA, SLG, WAB, and ISO are park-adjusted by dividing by the square root of park factor (or the equivalent; WAB = (OBA - BA)/(1 - OBA) and ISO = SLG - BA).

Team defense: Innings Pitched, BA, OBA, SLG, Innings per Start (IP/S), Starter's eRA (seRA), Reliever's eRA (reRA), Quality Start Percentage (QS%), RA/G at home (hRA/G), RA/G on the road (rRA/G), Battery Mishap Rate (BMR), Modified Fielding Average (mFA), and Defensive Efficiency Record (DER). BA, OBA, and SLG are park-adjusted by dividing by the square root of PF; seRA and reRA are divided by PF.

I've limited the fielding metrics included here to ones that a) I can calculate myself and b) are based on the basic available data, not specialized PBP data. The three metrics are explained in this post, but here are quick descriptions of each:

1) BMR--wild pitches and passed balls per 100 baserunners = (WP + PB)/(H + W - HR)*100

2) mFA--fielding average removing strikeouts and assists = (PO - K)/(PO - K + E)

3) DER--the Bill James classic, using only the PA-based estimate of plays made. Based on a suggestion by Terpsfan101, I've tweaked the error coefficient. Plays Made = PA - K - H - W - HR - HB - .64E and DER = PM/(PM + H - HR + .64E)
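As a quick illustration, the first two metrics are simple enough to write as one-liners. This is only a sketch: the function names and the sample team totals are invented, and DER is left out here because it depends on how PA is defined in the underlying data.

```python
def battery_mishap_rate(wp, pb, h, w, hr):
    """BMR: wild pitches and passed balls per 100 baserunners."""
    return (wp + pb) / (h + w - hr) * 100


def modified_fielding_average(po, k, e):
    """mFA: fielding average with strikeout putouts removed and assists ignored."""
    return (po - k) / (po - k + e)


# Hypothetical team totals, for illustration only.
print(round(battery_mishap_rate(wp=50, pb=10, h=1400, w=500, hr=150), 2))
print(round(modified_fielding_average(po=4300, k=1200, e=100), 3))
```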

Next are the individual player reports. I defined a starting pitcher as one with 15 or more starts. All other pitchers are eligible to be included as a reliever. If a pitcher has 40 appearances, then they are included. Additionally, if a pitcher has 50 innings and less than 50% of his appearances are starts, he is also included as a reliever (this allows some swingmen type pitchers who wouldn’t meet either the minimum start or appearance standards to get in).

For all of the player reports, ages are based on simply subtracting their year of birth from 2012. I realize that this is not compatible with how ages are usually listed and so “Age 27” doesn’t necessarily correspond to age 27 as I list it, but it makes everything a heckuva lot easier, and I am more interested in comparing the ages of the players to their contemporaries, for which case it makes very little difference. The "R" category records rookie status with an "R" for rookies and a blank for everyone else; I've trusted Baseball Prospectus on this. Also, all players are counted as being on the team with whom they played/pitched (IP or PA as appropriate) the most.

For relievers, the categories listed are: Games, Innings Pitched, estimated Plate Appearances (PA), Run Average (RA), Relief Run Average (RRA), Earned Run Average (ERA), Estimated Run Average (eRA), DIPS Run Average (dRA), Strikeouts per Game (KG), Walks per Game (WG), Guess-Future (G-F), Inherited Runners per Game (IR/G), Batting Average on Balls in Play (%H), Runs Above Average (RAA), and Runs Above Replacement (RAR).

IR/G is per relief appearance (G - GS); it is an interesting thing to look at, I think, in lieu of actual leverage data. You can see which closers come in with runners on base, and which are used nearly exclusively to start innings. Of course, you can’t infer too much; there are bad relievers who come in with a lot of people on base, not because they are being used in high leverage situations, but because they are long men being used in low-leverage situations already out of hand.

For starting pitchers, the columns are: Wins, Losses, Innings Pitched, Estimated Plate Appearances (PA), RA, RRA, ERA, eRA, dRA, KG, WG, G-F, %H, Pitches/Start (P/S), Quality Start Percentage (QS%), RAA, and RAR. RA and ERA you know--R*9/IP or ER*9/IP, park-adjusted by dividing by PF. The formulas for eRA and dRA are based on the same Base Runs equation and they estimate RA, not ERA.

* eRA is based on the actual results allowed by the pitcher (hits, doubles, home runs, walks, strikeouts, etc.). It is park-adjusted by dividing by PF.

* dRA is the classic DIPS-style RA, assuming that the pitcher allows a league average %H, and that his hits in play have a league-average S/D/T split. It is park-adjusted by dividing by PF.

The formula for eRA is:

A = H + W - HR
B = (2*TB - H - 4*HR + .05*W)*.78
C = AB - H = K + (3*IP - K)*x = PA - H - W (where PA and x are figured as described below for PA estimation; x is typically around .93)
eRA = (A*B/(B + C) + HR)*9/IP
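Here is a small Python sketch of the eRA calculation. The function name, argument names, and the sample pitcher line are hypothetical; x is passed in as an argument (about .93, per the note on C), and IP is assumed to be a true decimal, so 200 1/3 innings would be 200.333 rather than 200.1.

```python
def e_ra(ip, h, tb, hr, w, k, x, pf=1.0):
    """Estimated RA from a pitcher's actual results allowed, divided by PF."""
    a = h + w - hr                                 # A: baserunners
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.78    # B: advancement
    c = k + (3 * ip - k) * x                       # C: estimated AB - H
    runs = a * b / (b + c) + hr
    return runs * 9 / ip / pf


# Made-up pitcher line in a neutral park.
print(round(e_ra(ip=200, h=180, tb=290, hr=20, w=60, k=170, x=0.93), 2))
```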

To figure dRA, you first need the estimate of PA described below. Then you calculate W, K, and HR per PA (call these %W, %K, and %HR). Percentage of balls in play (BIP%) = 1 - %W - %K - %HR. This is used to calculate the DIPS-friendly estimate of %H (H per PA) as e%H = Lg%H*BIP%.

Now everything has a common denominator of PA, so we can plug into Base Runs:

A = e%H + %W
B = (2*(z*e%H + 4*%HR) - e%H - 5*%HR + .05*%W)*.78
C = 1 - e%H - %W - %HR
dRA = (A*B/(B + C) + %HR)/C*a

z is the league average of total bases per non-HR hit (TB - 4*HR)/(H - HR), and a is the league average of (AB - H) per game.
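The per-PA version of Base Runs can be sketched in Python like this. The names are invented, and the league inputs in the example (a league %H of .300, z of 1.27, and 25.5 for a) are placeholder values chosen for illustration, not actual 2012 figures.

```python
def d_ra(pa, w, k, hr, lg_pct_h, z, lg_outs_per_game, pf=1.0):
    """DIPS-style RA: league-average %H and hit-type split, pitcher's own W, K, HR."""
    pct_w, pct_k, pct_hr = w / pa, k / pa, hr / pa
    bip = 1 - pct_w - pct_k - pct_hr               # share of PA ending in a ball in play
    e_h = lg_pct_h * bip                           # expected non-HR hits per PA
    a = e_h + pct_w
    b = (2 * (z * e_h + 4 * pct_hr) - e_h - 5 * pct_hr + 0.05 * pct_w) * 0.78
    c = 1 - e_h - pct_w - pct_hr                   # outs per PA
    return (a * b / (b + c) + pct_hr) / c * lg_outs_per_game / pf


# Made-up pitcher with placeholder league values.
print(round(d_ra(pa=830, w=60, k=170, hr=20, lg_pct_h=0.30, z=1.27,
                 lg_outs_per_game=25.5), 2))
```

Because every input is a rate per PA, doubling all of a pitcher's counting totals leaves the result unchanged.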

In the past couple years I’ve presented a couple of batted ball RA estimates. I’ve removed these this year, not just because batted ball data exhibits questionable reliability but because these metrics were complicated to figure, required me to collate the batted ball data, and were not personally useful to me. I figure these stats for my own enjoyment and have in some form or another going back to 1997. I share them here only because I would do it anyway, so if I’m not interested in certain categories, there’s no reason to keep presenting them.

Instead, I’m showing strikeout and walk rate, both expressed as per game. By game I mean not 9 innings but rather the league average of PA/G. I have always been a proponent of using PA and not IP as the denominator for non-run pitching rates, and now the use of per PA rates is widespread. Usually these are expressed as K/PA and W/PA, or equivalently, percentage of PA with a strikeout or walk. I don’t believe that any site publishes these as K and W per equivalent game as I am here. This is not better than K%--it’s simply applying a scalar multiplier. I like it because it generally follows the same scale as the familiar K/9.

To facilitate this, I’ve finally corrected a flaw in the formula I use to estimate plate appearances for pitchers. Previously, I’ve done it the lazy way by not splitting strikeouts out from other outs. I am now using this formula to estimate PA (where PA = AB + W):

PA = K + (3*IP - K)*x + H + W
Where x = league average of (AB - H - K)/(3*IP - K)

Then KG = K/PA*Lg(PA/G) and WG = W/PA*Lg(PA/G).
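A short Python sketch of the PA estimate and the per-game rates follows. The function names are hypothetical, the league PA/G of 38 is a placeholder, and the rate is figured as events per estimated PA scaled by league PA/G, which is how a rate "per equivalent game" has to work if it is just K% times a scalar.

```python
def league_x(lg_ab, lg_h, lg_k, lg_ip):
    """x = league average of (AB - H - K)/(3*IP - K)."""
    return (lg_ab - lg_h - lg_k) / (3 * lg_ip - lg_k)


def estimate_pa(ip, h, w, k, x):
    """Estimated PA (AB + W) for a pitcher, with strikeouts split out from other outs."""
    return k + (3 * ip - k) * x + h + w


def per_game_rate(events, pa, lg_pa_per_game):
    """Events per PA, scaled to the league-average number of PA in a game."""
    return events / pa * lg_pa_per_game


# Made-up pitcher line; x = .93 and 38 PA/G are placeholder league values.
pa = estimate_pa(ip=200, h=180, w=60, k=170, x=0.93)
print(round(per_game_rate(170, pa, 38), 2), round(per_game_rate(60, pa, 38), 2))
```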

G-F is a junk stat, included here out of habit because I've been including it for years. It was intended to give a quick read of a pitcher's expected performance in the next season, based on eRA and strikeout rate. Although the numbers vaguely resemble RAs, it's actually unitless. As a rule of thumb, anything under four is pretty good for a starter. G-F = 4.46 + .095(eRA) - .113(K*9/IP). It is a junk stat. JUNK STAT JUNK STAT JUNK STAT. Got it?

%H is BABIP, more or less--%H = (H - HR)/(PA - HR - K - W), where PA was estimated above. Pitches/Start includes all appearances, so I've counted relief appearances as one-half of a start (P/S = Pitches/(.5*G + .5*GS)). QS% is just QS/GS; I don't think it's particularly useful, but Doug's Stats include QS so I include it.

I've used a stat called Relief Run Average (RRA) in the past, based on Sky Andrecheck's article in the August 1999 By the Numbers; that one only used inherited runners, but I've revised it to include bequeathed runners as well, making it equally applicable to starters and relievers. I am using RRA as the building block for baselined value estimates for all pitchers this year. I explained RRA in this article, but the bottom line formulas are:

BRSV = BRS - BR*i*sqrt(PF)
IRSV = IR*i*sqrt(PF) - IRS
RRA = ((R - (BRSV + IRSV))*9/IP)/PF
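The three formulas translate directly into Python. One loud assumption: this post does not define i, so the sketch treats it as the league-average share of inherited (or bequeathed) runners who score. The function and argument names are made up as well.

```python
from math import sqrt


def rra(r, ip, br, brs, ir, irs, i, pf):
    """Relief Run Average from the BRSV/IRSV formulas above.

    br/brs: bequeathed runners and those who scored
    ir/irs: inherited runners and those who scored
    i:      ASSUMED to be the league-average rate at which such runners score
    """
    brsv = brs - br * i * sqrt(pf)    # bequeathed runners scored beyond expectation
    irsv = ir * i * sqrt(pf) - irs    # inherited runners stranded beyond expectation
    return ((r - (brsv + irsv)) * 9 / ip) / pf


# Made-up reliever in a neutral park who strands two more inherited runners than expected.
print(round(rra(r=30, ip=90, br=10, brs=3, ir=20, irs=4, i=0.3, pf=1.0), 2))
```

A pitcher with no inherited or bequeathed runners simply gets his park-adjusted RA back.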

The two baselined stats are Runs Above Average (RAA) and Runs Above Replacement (RAR). RAA uses the league average runs/game (N) for both starters and relievers, while RAR uses separate replacement levels for starters and relievers. Thus, RAA and RAR will be pretty close for relievers:

RAA = (N - RRA)*IP/9
RAR (relievers) = (1.11*N - RRA)*IP/9
RAR (starters) = (1.28*N - RRA)*IP/9
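In code, the only difference between the baselines is the multiplier on N. A minimal sketch, with hypothetical function names and a made-up pitcher:

```python
def pitcher_raa(rra, ip, n):
    """Runs above a league-average pitcher, using RRA as the run average."""
    return (n - rra) * ip / 9


def pitcher_rar(rra, ip, n, starter):
    """Runs above replacement: 128% of N for starters, 111% of N for relievers."""
    multiplier = 1.28 if starter else 1.11
    return (multiplier * n - rra) * ip / 9


# Made-up starter: 3.50 RRA over 180 innings in a league where N = 4.5.
print(round(pitcher_raa(3.5, 180, 4.5), 1), round(pitcher_rar(3.5, 180, 4.5, starter=True), 1))
```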

All players with 300 or more plate appearances are included in the Hitters spreadsheets (along with some players close to the cutoff point who I was interested in). Each is assigned one position, the one at which they appeared in the most games. The statistics presented are: Games played (G), Plate Appearances (PA), Outs (O), Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Runs Created (RC), Runs Created per Game (RG), Speed Score (SS), Hitting Runs Above Average (HRAA), Runs Above Average (RAA), Hitting Runs Above Replacement (HRAR), and Runs Above Replacement (RAR).

I do not bother to include hit batters, so take note of that for players who do get plunked a lot. Therefore, PA are simply AB + W. Outs are AB - H + CS. BA and SLG you know, but remember that without HB and SF, OBA is just (H + W)/(AB + W). Secondary Average = (TB - H + W)/AB = SLG - BA + (OBA - BA)/(1 - OBA). I have not included net steals as many people (and Bill James himself) do--it is solely hitting events.

BA, OBA, and SLG are park-adjusted by dividing by the square root of PF. This is an approximation, of course, but I'm satisfied that it works well. The goal here is to adjust for the win value of offensive events, not to quantify the exact park effect on the given rate. I use the BA/OBA/SLG-based formula to figure SEC, so it is park-adjusted as well.

Runs Created is actually Paul Johnson's ERP, more or less. Ideally, I would use a custom linear weights formula for the given league, but ERP is just so darn simple and close to the mark that it’s hard to pass up. I still use the term “RC” partially as a homage to Bill James (seriously, I really like and respect him even if I’ve said negative things about RC and Win Shares), and also because it is just a good term. I like the thought put in your head when you hear “creating” a run better than “producing”, “manufacturing”, “generating”, etc. to say nothing of names like “equivalent” or “extrapolated” runs. None of that is said to put down the creators of those methods--there just aren’t a lot of good, unique names available. Anyway, RC = (TB + .8H + W + .7SB - CS - .3AB)*.322.

RC is park adjusted by dividing by PF, making all of the value stats that follow park adjusted as well. RG, the rate, is RC/O*25.5. I do not believe that outs are the proper denominator for an individual rate stat, but I also do not believe that the distortions caused are that bad. (I still intend to finish my rate stat series and discuss all of the options in excruciating detail, but alas you’ll have to take my word for it now).
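Put together, the RC and RG calculations look something like the following Python sketch. The function names and the sample batting line are invented; outs are AB - H + CS as defined earlier.

```python
def runs_created(ab, h, tb, w, sb, cs, pf=1.0):
    """ERP-style Runs Created, park-adjusted by dividing by PF."""
    rc = (tb + 0.8 * h + w + 0.7 * sb - cs - 0.3 * ab) * 0.322
    return rc / pf


def runs_per_game(rc, ab, h, cs):
    """RG: Runs Created per 25.5 outs, with outs figured as AB - H + CS."""
    outs = ab - h + cs
    return rc / outs * 25.5


# Made-up hitter in a neutral park.
rc = runs_created(ab=550, h=165, tb=280, w=60, sb=10, cs=5)
print(round(rc, 1), round(runs_per_game(rc, ab=550, h=165, cs=5), 2))
```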

I have decided to switch to a watered-down version of Bill James' Speed Score this year; I only use four of his categories. Previously I used my own knockoff version called Speed Unit, but trying to keep it from breaking down every few years was a wasted effort.

Speed Score is the average of four components, which I'll call a, b, c, and d:

a = ((SB + 3)/(SB + CS + 7) - .4)*20
b = sqrt((SB + CS)/(S + W))*14.3
c = ((R - HR)/(H + W - HR) - .1)*25
d = T/(AB - HR - K)*450

James actually uses a sliding scale for the triples component, but it strikes me as needlessly complex and so I've streamlined it. I also changed some of his division to mathematically equivalent multiplications.
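For reference, here are the four components as a Python sketch. I am assuming S is singles, T is triples, and R is runs scored, which the component formulas imply but do not state; the function name and the sample line are hypothetical.

```python
from math import sqrt


def speed_score(sb, cs, s, w, r, hr, h, t, ab, k):
    """Average of the four streamlined Speed Score components."""
    a = ((sb + 3) / (sb + cs + 7) - 0.4) * 20      # stolen base percentage
    b = sqrt((sb + cs) / (s + w)) * 14.3           # stolen base attempt frequency
    c = ((r - hr) / (h + w - hr) - 0.1) * 25       # runs scored per time on base
    d = t / (ab - hr - k) * 450                    # triples per ball in play
    return (a + b + c + d) / 4


# Made-up player line.
print(round(speed_score(sb=20, cs=5, s=100, w=60, r=90, hr=20,
                        h=160, t=5, ab=550, k=100), 2))
```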

There are a whopping four categories that compare to a baseline; two for average, two for replacement. Hitting RAA compares to a league average hitter; it is in the vein of Pete Palmer’s Batting Runs. RAA compares to an average hitter at the player’s primary position. Hitting RAR compares to a “replacement level” hitter; RAR compares to a replacement level hitter at the player’s primary position. The formulas are:

HRAA = (RG - N)*O/25.5
RAA = (RG - N*PADJ)*O/25.5
HRAR = (RG - .73*N)*O/25.5
RAR = (RG - .73*N*PADJ)*O/25.5

PADJ is the position adjustment, and it is based on 2002-2011 offensive data. For catchers it is .89; for 1B/DH, 1.17; for 2B, .97; for 3B, 1.03; for SS, .93; for LF/RF, 1.13; and for CF, 1.02. I had been using the 1992-2001 data as a basis for the last ten years, but finally have done an update. I’m a little hesitant about this update, as the middle infield positions are the biggest movers (higher positional adjustments, meaning less positional credit). I have no qualms for second base, but the shortstop PADJ is out of line with the other position adjustments widely in use and feels a bit high to me. But there are some decent points to be made in favor of offensive adjustments, and I’ll have a bit more on this topic in general below.
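The four baselined figures differ only in which multipliers are applied to N, which a short Python sketch makes plain. The PADJ table uses the values just listed; the function name, the position labels, and the sample player are my own.

```python
# Position adjustments as listed above (based on 2002-2011 offensive data).
PADJ = {"C": 0.89, "1B": 1.17, "DH": 1.17, "2B": 0.97, "3B": 1.03,
        "SS": 0.93, "LF": 1.13, "RF": 1.13, "CF": 1.02}


def hitter_baselines(rg, outs, n, position):
    """Return HRAA, RAA, HRAR, and RAR for a hitter with the given RG and outs."""
    padj = PADJ[position]
    games = outs / 25.5                            # outs expressed in game units
    return {
        "HRAA": (rg - n) * games,
        "RAA": (rg - n * padj) * games,
        "HRAR": (rg - 0.73 * n) * games,
        "RAR": (rg - 0.73 * n * padj) * games,
    }


# Made-up shortstop: 6.0 RG over 382.5 outs (15 games' worth) with N = 4.5.
for name, value in hitter_baselines(6.0, 382.5, 4.5, "SS").items():
    print(name, round(value, 1))
```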

That was the mechanics of the calculations; now I'll twist myself into knots trying to justify them. If you only care about the how and not the why, stop reading now.

The first thing that should be covered is the philosophical position behind the statistics posted here. They fall on the continuum of ability and value in what I have called "performance". Performance is a technical-sounding way of saying "Whatever arbitrary combination of ability and value I prefer".

With respect to park adjustments, I am not interested in how any particular player is affected, so there is no separate adjustment for lefties and righties for instance. The park factor is an attempt to determine how the park affects run scoring rates, and thus the win value of runs.

I apply the park factor directly to the player's statistics, but it could also be applied to the league context. The advantage to doing it my way is that it allows you to compare the component statistics (like Runs Created or OBA) on a park-adjusted basis. The drawback is that it creates a new theoretical universe, one in which all parks are equal, rather than leaving the player grounded in the actual context in which he played and evaluating how that context (and not the player's statistics) was altered by the park.

The good news is that the two approaches are essentially equivalent; in fact, they are equivalent if you assume that the Runs Per Win factor is equal to the RPG. Suppose that we have a player in an extreme park (PF = 1.15, approximately like Coors Field pre-humidor) who has an 8 RG before adjusting for park, while making 350 outs in a 4.5 N league. The first method of park adjustment, the one I use, converts his value into a neutral park, so his RG is now 8/1.15 = 6.957. We can now compare him directly to the league average:

RAA = (6.957 - 4.5)*350/25.5 = +33.72

The second method would be to adjust the league context. If N = 4.5, then the average player in this park will create 4.5*1.15 = 5.175 runs. Now, to figure RAA, we can use the unadjusted RG of 8:

RAA = (8 - 5.175)*350/25.5 = +38.77

These are not the same, as you can obviously see. The reason for this is that they take place in two different contexts. The first figure is in a 9 RPG (2*4.5) context; the second figure is in a 10.35 RPG (2*4.5*1.15) context. Runs have different values in different contexts; that is why we have RPW converters in the first place. If we convert to WAA (using RPW = RPG, which is only an approximation, so it's usually not as tidy as it appears below), then we have:

WAA = 33.72/9 = +3.75
WAA = 38.77/10.35 = +3.75
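The same example can be checked in a few lines of Python. The function names are hypothetical, and RPW is set equal to RPG as in the text, which is itself only an approximation.

```python
def adjust_player(rg, outs, n, pf):
    """Method 1: park-adjust the player's RG, compare to N, convert at RPW = 2*N."""
    raa = (rg / pf - n) * outs / 25.5
    return raa, raa / (2 * n)


def adjust_league(rg, outs, n, pf):
    """Method 2: park-adjust the league context, convert at RPW = 2*N*PF."""
    raa = (rg - n * pf) * outs / 25.5
    return raa, raa / (2 * n * pf)


raa_1, waa_1 = adjust_player(rg=8, outs=350, n=4.5, pf=1.15)
raa_2, waa_2 = adjust_league(rg=8, outs=350, n=4.5, pf=1.15)
print(round(raa_1, 2), round(raa_2, 2))   # the run totals differ
print(round(waa_1, 2), round(waa_2, 2))   # the win totals agree
```

The agreement in wins is exact rather than a coincidence of rounding: dividing the second RAA by 2*N*PF algebraically reduces it to the first RAA divided by 2*N.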

Once you convert to wins, the two approaches are equivalent. The other nice thing about the first approach is that once you park-adjust, everyone in the league is in the same context, and you can dispense with the need for converting to wins at all. You still might want to convert to wins, and you'll need to do so if you are comparing the 2010 players to players from other league-seasons (including between the AL and NL in the same year), but if you are only looking to compare Jose Bautista to Miguel Cabrera, it's not necessary. WAR is somewhat ubiquitous now, but personally I prefer runs when possible--why mess with decimal points if you don't have to?

The park factors used to adjust player stats here are run-based. Thus, they make no effort to project what a player "would have done" in a neutral park, or account for the different effects parks have on specific events (walks, home runs, BA) or types of players. They simply account for the difference in run environment that is caused by the park (as best I can measure it). As such, they don't evaluate a player within the actual run context of his team's games; they attempt to restate the player's performance as an equivalent performance in a neutral park.

I suppose I should also justify the use of sqrt(PF) for adjusting component statistics. The classic defense given for this approach relies on basic Runs Created--runs are proportional to OBA*SLG, and OBA*SLG/PF = OBA/sqrt(PF)*SLG/sqrt(PF). While RC may be an antiquated tool, you will find that the square root adjustment is fairly compatible with linear weights or Base Runs as well. I am not going to take the space to demonstrate this claim here, but I will some time in the future.

Many value figures published around the sabersphere adjust for the difference in quality level between the AL and NL. I don't, but this is a thorny area where there is no right or wrong answer as far as I'm concerned. I also do not make an adjustment in the league averages for the fact that the overall NL averages include pitcher batting and the AL does not (not quite true in the era of interleague play, but you get my drift).

The difference between the leagues may not be precisely calculable, and it certainly is not constant, but it is real. If the average player in the AL is better than the average player in the NL, it is perfectly reasonable to expect the average AL player to have more RAR than the average NL player, and that will not happen without some type of adjustment. On the other hand, if you are only interested in evaluating a player relative to his own league, such an adjustment is not necessarily welcome.

The league argument only applies cleanly to metrics baselined to average. Since replacement level compares the given player to a theoretical player that can be acquired on the cheap, the same pool of potential replacement players should by definition be available to the teams of each league. One could argue that if the two leagues don't have equal talent at the major league level, they might not have equal access to replacement level talent--except such an argument is at odds with the notion that replacement level represents talent that is truly "freely available".

So it's hard to justify the approach I take, which is to set replacement level relative to the average runs scored in each league, with no adjustment for the difference in the leagues. The best justification is that it's simple and it treats each league as its own universe, even if in reality they are connected.

The replacement levels I have used here are very much in line with the values used by other sabermetricians. This is based on my own "research", my interpretation of other people's research, and a desire to not stray from consensus and make the values unhelpful to the majority of people who may encounter them.

Replacement level is certainly not settled science. There is always going to be room to disagree on what the baseline should be. Even if you agree it should be "replacement level", any estimate of where it should be set is just that--an estimate. Average is clean and fairly straightforward, even if its utility is questionable; replacement level is inherently messy. So I offer the average baseline as well.

For position players, replacement level is set at 73% of the positional average RG (since there's a history of discussing replacement level in terms of winning percentages, this is roughly equivalent to .350). For starting pitchers, it is set at 128% of the league average RA (.380), and for relievers it is set at 111% (.450).

I am still using an analytical structure that makes the comparison to replacement level for a position player by applying it to his hitting statistics. This is the approach taken by Keith Woolner in VORP (and some other earlier replacement level implementations), but the newer metrics (among them Rally and Fangraphs' WAR) handle replacement level by subtracting a set number of runs from the player's total runs above average in a number of different areas (batting, fielding, baserunning, positional value, etc.), which for lack of a better term I will call the subtraction approach.

The offensive positional adjustment makes the inherent assumption that the average player at each position is equally valuable. I think that this is close to being true, but it is not quite true. The ideal approach would be to use a defensive positional adjustment, since the real difference between a first baseman and a shortstop is their defensive value. When you bat, all runs count the same, whether you create them as a first baseman or as a shortstop.

That being said, using "replacement hitter at position" does not cause too many distortions. It is not theoretically correct, but it is practically powerful. For one thing, most players, even those at key defensive positions, are chosen first and foremost for their offense. Empirical work by Keith Woolner has shown that the replacement level hitting performance is about the same for every position, relative to the positional average.

Figuring what the defensive positional adjustment should be, though, is easier said than done. Therefore, I use the offensive positional adjustment. So if you want to criticize that choice, or criticize the numbers that result, be my guest. But do not claim that I am holding this up as the correct analytical structure. I am holding it up as the most simple and straightforward structure that conforms to reality reasonably well, and because while the numbers may be flawed, they are at least based on an objective formula that I can figure myself. If you feel comfortable with some other assumptions, please feel free to ignore mine.

That still does not justify the use of HRAR--hitting runs above replacement--which compares each hitter, regardless of position, to 73% of the league average. Basically, this is just a way to give an overall measure of offensive production without regard for position with a low baseline. It doesn't have any real baseball meaning.

A player who creates runs at 90% of the league average could be above-average (if he's a shortstop or catcher, or a great fielder at a less important fielding position), or sub-replacement level (DHs that create 4 runs a game are not valuable properties). Every player is chosen because his total value, both hitting and fielding, is sufficient to justify his inclusion on the team. HRAR fails even if you try to justify it with a thought experiment about a world in which defense doesn't matter, because in that case the absolute replacement level (in terms of RG, without accounting for the league average) would be much higher than it is currently.

The specific positional adjustments I use are based on 1992-2001 data. There's no particular reason for not updating them; at the time I started using them, they represented the ten most recent years. I have stuck with them because I have not seen compelling evidence of a change in the degree of difficulty or scarcity between the positions between now and then, and because I think they are fairly reasonable. The positions for which they diverge the most from the defensive position adjustments in common use are 2B, 3B, and CF. Second base is considered a premium position by the offensive PADJ (.94), while third base and center field are both neutral (1.01 and 1.02).

Another flaw is that the PADJ is applied to the overall league average RG, which is artificially low for the NL because of pitchers' batting. When using the actual league average runs/game, it's tough to just remove pitchers--any adjustment would be an estimate. If you use the league total of runs created instead, it is a much easier fix.

One other note on this topic is that since the offensive PADJ is a proxy for average defensive value by position, ideally it would be applied by tying it to defensive playing time. I have done it by outs, though.

The reason I have taken this flawed path is that 1) it ties the position adjustment directly into the RAR formula rather than leaving it as something to subtract on the outside and more importantly 2) there’s no straightforward way to do it. The best would be to use defensive innings--set the full-time player to X defensive innings, figure how Derek Jeter’s innings compared to X, and adjust his PADJ accordingly. Games in the field or games played are dicey because they can cause distortion for defensive replacements. Plate Appearances avoid the problem that outs have of being highly related to player quality, but they still carry the illogic of basing it on offensive playing time. And of course the differences here are going to be fairly small (a few runs). That is not to say that this way is preferable, but it’s not horrible either, at least as far as I can tell.

To compare this approach to the subtraction approach, start by assuming a shortstop PADJ of .86 for the sake of the example, so that a replacement level shortstop would create .86*.73*4.5 = 2.825 RG (or would perform at an overall level of equivalent value to being an average fielder at shortstop while creating 2.825 runs per game). Suppose that we are comparing two shortstops, each of whom compiled 600 PA and played an equal number of defensive games and innings (and thus would have the same positional adjustment using the subtraction approach). Alpha made 380 outs and Bravo made 410 outs, and each ranked as dead-on average in the field.

The difference in overall RAR between the two using the subtraction approach would be equal to the difference between their offensive RAA compared to the league average. Assuming the league average is 4.5 runs, and that both Alpha and Bravo created 75 runs, their offensive RAAs are:

Alpha = (75*25.5/380 - 4.5)*380/25.5 = +7.94

Similarly, Bravo is at +2.65, and so the difference between them will be 5.29 RAR.

Using the flawed approach, Alpha's RAR will be:

(75*25.5/380 - 4.5*.73*.86)*380/25.5 = +32.90

Bravo's RAR will be +29.58, a difference of 3.32 RAR, which is two runs off of the difference using the subtraction approach.
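The arithmetic for Alpha and Bravo can be reproduced with a short sketch. The function and variable names here are made up for illustration; the constants (25.5 outs per game, a 4.5 RG league, the .86 shortstop PADJ and the .73 replacement multiplier) are the ones used in the example above.

```python
OUTS_PER_GAME = 25.5   # outs per game used in the RG conversion above
LEAGUE_RG = 4.5        # assumed league average runs per game
PADJ_SS = 0.86         # shortstop position adjustment from the example
REPLACEMENT = 0.73     # replacement multiplier from the example


def runs_above(runs_created, outs, baseline_rg):
    """Runs above a per-game baseline, scaled by the player's own outs."""
    rg = runs_created * OUTS_PER_GAME / outs
    return (rg - baseline_rg) * outs / OUTS_PER_GAME


outs_made = {"Alpha": 380, "Bravo": 410}

# Subtraction approach: the positional piece is identical for both players,
# so only the offensive RAA differs between them.
raa = {name: runs_above(75, outs, LEAGUE_RG) for name, outs in outs_made.items()}

# Approach used here: the position adjustment is folded into the baseline.
rar = {
    name: runs_above(75, outs, LEAGUE_RG * REPLACEMENT * PADJ_SS)
    for name, outs in outs_made.items()
}

for name in outs_made:
    print(name, round(raa[name], 2), round(rar[name], 2))
print("RAA gap:", round(raa["Alpha"] - raa["Bravo"], 2))
print("RAR gap:", round(rar["Alpha"] - rar["Bravo"], 2))
```

The gap shrinks under the second approach because the lower baseline is multiplied by each player's outs, so the 30 extra outs Bravo made cost him less.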

The downside to using PA is that you really need to consider park effects if you do, whereas outs allow you to sidestep park effects. Outs are constant; plate appearances are linked to OBA. Thus, they not only depend on the offensive context (including park factor), but also on the quality of one's team. Of course, attempting to adjust for team PA differences opens a huge can of worms which is not really relevant; for now, the point is that using outs for individual players causes distortions, sometimes trivial and sometimes bothersome, but almost always makes one's life easier.

I do not include fielding (or baserunning outside of steals, although that is a trivial consideration in comparison) in the RAR figures--they cover offense and positional value only. This in no way means that I do not believe that fielding is an important consideration in player valuation. However, two of the key principles of these stat reports are 1) not incorporating any data that is not readily available and 2) not simply including other people's results (of course I borrow heavily from other people's methods, but only adapting methodology that I can apply myself).

Any fielding metric worth its salt will fail to meet either criterion--they use zone data or play-by-play data which I do not have easy access to. I do not have a fielding metric that I have stapled together myself, and so I would have to simply lift other analysts' figures.

Setting the practical reason for not including fielding aside, I do have some reservations about lumping fielding and hitting value together in one number because of the obvious differences in reliability between offensive and fielding metrics. In theory, they absolutely should be put together. But in practice, I believe it would be better to regress the fielding metric to a point at which it would be roughly equivalent in reliability to the offensive metric.

Offensive metrics have error bars associated with them, too, of course, and in evaluating a single season's value, I don't care about the vagaries that we often lump together as "luck". Still, there are errors in our assessment of linear weight values and players that collect an unusual proportion of infield hits or hits to the left side, errors in estimation of park factor, and any number of other factors that make their events more or less valuable than an average event of that type.

Fielding metrics offer up all of that and more, as we cannot be nearly as certain of true successes and failures as we are when analyzing offense. Recent investigations, particularly by Colin Wyers, have raised even more questions about the level of uncertainty. So, even if I was including a fielding value, my approach would be to assume that the offensive value was 100% reliable (which it isn't), and regress the fielding metric relative to that (so if the offensive metric was actually 70% reliable, and the fielding metric 40% reliable, I'd treat the fielding metric as .4/.7 = 57% reliable when tacking it on, to illustrate with a simplified and completely made up example presuming that one could have a precise estimate of nebulous "reliability").
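A minimal sketch of that made-up example follows. The 70% and 40% reliabilities are the invented figures from the paragraph above, and shrinking the fielding number toward zero (average) by multiplying it by the relative reliability is an assumption about how one would "tack it on", not something done in these reports.

```python
def relative_reliability(fielding_rel, offense_rel):
    """Fielding reliability expressed relative to the offensive metric."""
    return fielding_rel / offense_rel


def regressed_fielding(fielding_runs, fielding_rel, offense_rel):
    """Shrink a fielding runs figure toward zero (average).

    Assumption: regression is a simple multiplication by the
    relative reliability.
    """
    return fielding_runs * relative_reliability(fielding_rel, offense_rel)


weight = relative_reliability(0.40, 0.70)
print(round(weight, 2))                                # relative reliability
print(round(regressed_fielding(10, 0.40, 0.70), 1))    # a hypothetical +10 fielder
```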

Given the inherent assumption of the offensive PADJ that all positions are equally valuable, once RAR has been figured for a player, fielding value can be accounted for by adding on his runs above average relative to a player at his own position. If there is a shortstop that is -2 runs defensively versus an average shortstop, he is without a doubt a plus defensive player, and a more valuable defensive player than a first baseman who was +1 run better than an average first baseman. Regardless, since it was implicitly assumed that they are both average defensively for their position when RAR was calculated, the shortstop will see his value docked two runs. This DOES NOT MEAN that the shortstop has been penalized for his defense. The whole process of accounting for positional differences, going from hitting RAR to positional RAR, has benefited him.

I've found that there is often confusion about the treatment of first basemen and designated hitters in my PADJ methodology, since I consider DHs as in the same pool as first basemen. The fact of the matter is that first basemen outhit DHs. There is any number of potential explanations for this; DHs are often old or injured, players hit worse when DHing than they do when playing the field, etc. This actually helps first basemen, since the DHs drag the average production of the pool down, thus resulting in a lower replacement level than I would get if I considered first basemen alone.

However, this method does assume that a 1B and a DH have equal defensive value. Obviously, a DH has no defensive value. What I advocate to correct this is to treat a DH as a bad defensive first baseman, and thus knock another five or ten runs off of his RAR for a full-time player. I do not incorporate this into the published numbers, but you should keep it in mind. However, there is no need to adjust the figures for first basemen upwards--the only necessary adjustment is to take the DHs down a notch.

Finally, I consider each player at his primary defensive position (defined as where he appears in the most games), and do not weight the PADJ by playing time. This does shortchange a player like Ben Zobrist (who saw significant time at a tougher position than his primary position), and unduly boost a player like Joe Mauer (who logged a lot of games at a much easier position than his primary position). For most players, though, it doesn't matter much. I find it preferable to make manual adjustments for the unusual cases rather than add another layer of complexity to the whole endeavor.

2012 Leagues

2012 Park Factors

2012 Teams

2012 Team Offense

2012 Team Defense

2012 AL Relievers

2012 NL Relievers

2012 AL Starters

2012 NL Starters

2012 AL Hitters

2012 NL Hitters

Thursday, October 04, 2012

Playoff Meanderings

* I always like to write a little something about playoff odds. The playoff odds that I publish are not intended to be the most accurate. They incorporate only full seasonal team data, haphazardly thrown into a weighted average. They assume that win probabilities are constant from game to game. There are other nits I could pick, but those are the huge ones.

So why bother? The main goal is to once again make the point that attempting to predict playoff outcomes is largely a fool’s errand. This is probably an obvious point to anyone who reads this blog, but it is one that I feel compelled to come back to each October regardless.

The methodology here was to use my crude team ratings--based on estimated win ratio and adjusted for strength of schedule. I figured three sets of these--based on actual wins and losses, based on runs scored and allowed, and based on runs created and allowed. Then I combined them and built in some regression to the mean, with no real rhyme or reason to the weighting (the win ratios fed into the rating are based 30% on actual record, 30% on R/RA, 20% on RC/RCA, and 20% on .500).
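As a rough sketch of that weighting: the function names are hypothetical, a .500 team is taken to have a win ratio of 1.0, and the strength of schedule adjustment is left out entirely. How each of the three win ratios is estimated in the first place is not shown here.

```python
def blended_win_ratio(wr_actual, wr_runs, wr_rc):
    """Weighted win ratio: 30% actual, 30% R/RA based, 20% RC/RCA based, 20% .500.

    Assumption: the '.500' component enters as a win ratio of 1.0.
    """
    return 0.30 * wr_actual + 0.30 * wr_runs + 0.20 * wr_rc + 0.20 * 1.0


def win_pct(win_ratio):
    """Convert a win ratio (W/L) to a winning percentage."""
    return win_ratio / (1 + win_ratio)


# Hypothetical team: 1.50 actual, 1.30 from runs, 1.20 from runs created.
wr = blended_win_ratio(1.50, 1.30, 1.20)
print(round(wr, 2), round(win_pct(wr), 3))
```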

These then go into this spreadsheet, which calculates the probability of a team winning a playoff series using Log5 and an assumed home field winning percentage of .545 for a .500 team. First, here are the ratings for each of the playoff teams:



As you can see, the rating system still believes that the AL is superior to the NL (the overall AL rating is 106 versus 95 for the NL). Baltimore comes out better than I expected, and it is worth remembering that playing in the AL East means that their schedule was very tough (after the season, I’ll have a post with full rankings). You may be surprised to see Cincinnati so low, but they exceeded their expected W%s by a fair amount and along with St. Louis were the strongest teams in the weakest division, meaning they faced the easiest schedules in MLB.
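For anyone curious what the series calculation might look like, here is a minimal sketch. Log5 and the .545 home figure come from the description above; the way home field is layered on (a second Log5 step) and every function name are assumptions for illustration, not a transcription of the spreadsheet.

```python
from itertools import product

HOME_EDGE = 0.545  # home W% for a .500 team hosting a .500 team


def log5(a, b):
    """Log5: probability that a team with W% a beats a team with W% b."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))


def game_prob(a, b, a_is_home):
    """Single-game probability for team A, including home field.

    Assumption: home field is applied as a second Log5 step on top of
    the neutral-site matchup probability.
    """
    neutral = log5(a, b)
    edge = HOME_EDGE if a_is_home else 1 - HOME_EDGE
    return log5(neutral, 1 - edge)


def series_prob(a, b, home_pattern):
    """P(team A wins the series); home_pattern[i] is True when A hosts game i.

    Every game is treated as if it were played, which gives the same
    answer as stopping once the series is clinched.
    """
    need = len(home_pattern) // 2 + 1
    total = 0.0
    for outcome in product([True, False], repeat=len(home_pattern)):
        if sum(outcome) < need:
            continue
        p = 1.0
        for a_wins, a_home in zip(outcome, home_pattern):
            g = game_prob(a, b, a_home)
            p *= g if a_wins else 1 - g
        total += p
    return total


# Hypothetical .580 team with home field against a .550 team, best of five.
print(round(series_prob(0.580, 0.550, [True, True, False, False, True]), 3))
```

Win probabilities are held constant from game to game here, which is the same simplification noted at the top of this post.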

Feeding those rankings through the playoff odds spreadsheet, here are the crude playoff probabilities:



I would suggest that Oakland and Washington are the teams that benefit the most from the crude nature of this approach. No team has much better than a 1 in 3 chance of winning the pennant, which is typical but completely out of line with the mainstream notions of anointing favorites. It should come as no surprise that the four wildcards rate as having the lowest odds, but it’s worth noting that the wildcards combined have a higher estimated probability of winning the World Series than the #3 seeds in each league.

Whichever team comes out of the wildcard game will be in decent shape, although, as the media will remind you many times, it will be a bit disadvantaged in its starting pitching options for the Division Series. Here are each of the wildcards’ probabilities assuming they win the game (again, with no penalty for fatigue):



Collapse aside, Texas remains one of the strongest teams on paper. Atlanta compares favorably to Cincinnati or San Francisco, and both Baltimore and St. Louis have respectable chances (in fact, the Cards' 8% is better than their 7% from a similar methodology last year).

* The 2-3 Division Series format is already being set up as a ready-made excuse for any of the higher seeded teams that lose. I’m not saying it’s the optimal format, but the importance of having the first two games at home can be overstated. Of course, the model I’m using here can’t account for any psychological effects, but there is no difference in the expected outcome of the series as long as there are three home games scheduled.

Theoretically, assuming evenly matched teams in each game of a series, 37.5% of five-game series should go the distance and only 25% should be sweeps. But empirically, 41 five-game series since 1969 have been sweeps, while only 27 have gone the distance. Of course, those empirical results include teams that benefitted from jumping up 2-0 at home.
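The theoretical split is easy to verify. This sketch assumes a constant single-game win probability p for one team and independent games; the function name is made up.

```python
from math import comb


def length_distribution(p, best_of=5):
    """P(series ends in exactly n games) for each possible n.

    Assumes one team wins every game with constant probability p.
    """
    need = best_of // 2 + 1
    dist = {}
    for n in range(need, best_of + 1):
        # The series winner takes game n plus need-1 of the first n-1 games.
        ways = comb(n - 1, need - 1)
        first_team = p**need * (1 - p) ** (n - need)
        second_team = (1 - p) ** need * p ** (n - need)
        dist[n] = ways * (first_team + second_team)
    return dist


dist = length_distribution(0.5)
print(dist)  # {3: 0.25, 4: 0.375, 5: 0.375}
```

Moving p away from .5 pushes probability toward sweeps, which is one reason the empirical record need not match the evenly matched case.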

While the 2-3 format is not ideal, and offers the possibility of what I call a reverse home field advantage (the lower seeded team actually playing more home games), I don’t see any reason to accept it as an excuse. A couple of the higher seeded teams will probably lose, but that happens in typical seasons as well.

It’s also worth noting that a 2-3 format has been used before, most recently in 1995-96. In those two seasons, the cross-divisional matchups were pre-determined. In 1995 for example, the 100-44 Indians opened against the 86-58 Red Sox with two games at Jacobs Field and the final three games scheduled for Fenway Park (Cleveland swept). Meanwhile, the 79-66 Mariners played the 79-65 Yankees. The 2-3 of 2012 may be a mild annoyance, but playoff formats have previously reached colossal levels of stupidity to which it can only aspire.

* For the last few years I have also expressed my playoff rooting preferences. There’s no reason why you should care, but it’s good to get it off my chest.

Would be happy if they win: Yankees, A’s, Reds, Nationals, Braves
Would be indifferent if they win: Rangers, Cardinals
Would be annoyed if they win: Orioles, Tigers, Giants

So expect a BAL/SF World Series (1.8% chance based on the crude odds).

Tuesday, October 02, 2012

Cleveland Manager Rant

I have considered myself a baseball fan first and a fan of any particular team second for over a decade. I can’t pinpoint the exact moment at which this happened, but it’s been a while. I never regret this; college sports offer me plenty of opportunity for simple good v. evil, one team only fanaticism. I consider professional baseball too entertaining of a sport, one too amenable to rational analysis, to tie up much of my interest in a partisan stupor.

Still, I am a fan of the Indians, and I assume I will be until they move to Albuquerque in 2037. And so on Thursday evening, as I learned about Manny Acta’s firing, I vented a little bit on Twitter. The last time I had such a visceral reaction to a piece of Indians news, it was upon learning about the Ubaldo Jimenez trade.

The problem with being a fan first is that it makes one prone to that sort of off-the-cuff emotional reaction, whereas I’d much prefer to think for a while and then react. This is the more refined (albeit still tinted by the irrationality of fandom, poorly written and disjointed) version of that initial screed.

I always liked Manny Acta as the Indians manager. I supported his hiring, and I generally thought that he did a good job as manager from what I could tell. Of course, some of the most important duties of the manager are the things that, as an outsider, I cannot quantify and really can’t even get a good feel for--how he relates to players, how well he works with the front office and how he interacts with them on roster decisions, and the like. It’s certainly possible that Manny Acta is bad at these aspects of the job.

However, I reject the notion pushed by a contingent of Cleveland fans that Acta is a poor tactical manager from a sabermetric perspective. Again, this is an area that’s next to impossible to quantify--it's easy to pick some key categories on which managers have influence (like intentional walks, sacrifice hits, stolen base attempts, pitching changes, lineup construction) and mentally assign the manager a score based on rudimentary criteria (“intentional walks = bad”, “games led off by sub-.330 OBA hitters = bad”, etc.), but it’s difficult to develop a comprehensive evaluation even on these limited criteria. This is all complicated by the fact that some of our sabermetric tools have a margin of error comparable to the theoretical payoffs of alternative strategies, and that the manager always is working with more information than we have regarding the factors that could cause players’ abilities to deviate from our estimate of their true talent.

However, based on my general notion of baseball strategy and ability to process my observations, I have no overarching issues with the tactics employed by Manny Acta. Quite the opposite, in fact--I had fewer moments of confusion when watching Acta manage than I did with Mike Hargrove, Charlie Manuel, or Eric Wedge. Acta spoke intelligently about strategy in his media appearances and stayed true to his word as much as can be reasonably hoped for from a manager.

Of course, any manager is going to make isolated decisions that are puzzling. If cherry-picking just a few of these instances is enough to call for the skipper’s head, then I can guarantee you that it won’t take much more than a week into his replacement’s regime for a similar emotion to emerge. If you have to point to one specific choice in reliever usage, or one marginal young player that didn’t play enough for your taste, then I humbly suggest you don’t have much of a case. No, it doesn’t make sense to me either that Acta chose to use Vinny Rottino as a leadoff hitter (in one game!), but how many managers would have used Shin-Soo Choo as their leadoff hitter in over half of the team’s games? I’d suggest the latter is a much bigger deviation from the normal practice of managers, and one more in keeping with sabermetric orthodoxy than the former is a departure from it.

In any event, Acta is gone now, making the more important question for Indians fans the matter of what this tells us about the people who run the organization. I don’t think it’s pretty. First, a quote from owner Larry Dolan:

“I fully support Chris' decision to make this change and am confident that he will lead a tireless search to find the right individual to lead the club to our ultimate goal of winning the World Series.”

Of course, this is typical owner-speak and reading into it is pointless. Still, the quote strongly implies that Dolan believes the single individual most responsible for winning the World Series is the manager. If the Indians could just find the right manager, they’d be fine.

Team president (and former GM, for the majority of Dolan’s ownership) Mark Shapiro tweeted:

“One of only levers u can pull w potential for broader change is the manager. Not easy but decision should indicate our desire to improve”

Left unsaid is what those other levers are. And those levers have never been pulled. Since Dolan bought the team, the Indians have hired and fired three full-time managers: Charlie Manuel, Eric Wedge, and Manny Acta. They have fired zero general managers: Shapiro was promoted to president and his lieutenant Chris Antonetti took over after the 2010 season.

Manuel’s firing was a little different than those of Wedge and Acta--it came mid-season in the Indians’ first transition year between perennial contender and rebuilding. Manuel was not seen as a fit for the new paradigm, and so his attempt to force the issue by requesting an extension led to his dismissal.

Shapiro displayed a tremendous amount of loyalty to Wedge. It would have been easy to fire him after the failed attempt at contention in 2006, or the letdown of 2008 on the heels of 2007’s near pennant. But Shapiro stood by Wedge until after 2009, when a team that fancied itself a contender crashed and burned to 65-97.

The Indians’ fundamental problems, however, remain the same in 2012 when Acta was canned as they were in 2009 when Wedge was canned. The Indians possess a number of solid hitters at tough positions (Carlos Santana at catcher, Jason Kipnis at second, Asdrubal Cabrera at short), but gaping holes at the easiest positions (only Shin-Soo Choo was a good producer in the corners, and Travis Hafner’s perennial injuries have also held back the DHs). This is not a temporary problem--the Indians’ farm system has not produced a major league caliber 1B/LF since--Luke Scott? Sean Casey?

The Indians of 2009 and 2012 were also both woefully short on starting pitching. In 2009, the team had just traded CC Sabathia and Cliff Lee, so it was somewhat understandable. In 2012, though, those departures could not be blamed. The Indians ventured into 2012 with a rotation consisting of one pitcher who’d both pitched well and had good peripherals in the prior season (Justin Masterson). They had an enigmatic pitcher acquired at the cost of the organization’s top two pitching prospects (Ubaldo Jimenez); a veteran coming off a lousy season in the NL (Derek Lowe); a finesse righty who was below average in 2011 despite a league-leading 1.1 W/9 (Josh Tomlin); and a sinkerballer with an unremarkable minor league track record (Jeanmar Gomez).

The 2011 Indians started the season 30-15, which was a lot of fun at the time, even for those of us who suspected it was but a mirage. But those 45 games have ultimately proved to be a disaster for the franchise. They transformed what was supposed to be a rebuilding season into an increasingly desperate attempt to cling to the lead in the AL Central. They goaded the front office into trading its two best pitching prospects for Ubaldo Jimenez. And even after the team stumbled to 80-82, those 45 games influenced the team’s expectations heading into 2012: they were contenders.

Regardless of intention, the Indians were either unwilling or unable to acquire additional talent to fill out the roster, and insisted that there was sufficient talent to contend, leaving Acta as the fall guy if the purported contender failed to contend.

The Shapiro regime has controlled the Indians for eleven seasons, and in that time they have managed to make the playoffs just once while playing in one of MLB’s weaker divisions. Assuming that they should have a 20% chance of winning and seasons are independent, there’s a 26% chance that could happen by chance, so it’s not inherently damning.
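The binomial calculation behind that kind of statement can be sketched as follows, assuming eleven independent seasons with a 20% chance of a playoff berth in each. Whether one counts exactly one appearance or one or fewer changes the answer, so both are shown.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


SEASONS = 11       # seasons under the Shapiro regime
P_PLAYOFFS = 0.20  # assumed chance of making the playoffs in any season

exactly_one = binom_pmf(1, SEASONS, P_PLAYOFFS)
at_most_one = binom_pmf(0, SEASONS, P_PLAYOFFS) + exactly_one

print(round(exactly_one, 3))  # exactly one playoff appearance
print(round(at_most_one, 3))  # one or fewer
```

With these particular inputs the sketch gives roughly 24% for exactly one appearance and 32% for one or fewer; either way, the conclusion that the record alone is not damning holds.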

When I tweeted something to that effect (minus the binomial probability), I got a reply that simply said “Process != Results”. I was unfamiliar with the tweeter, so I’m not sure if it was serious or facetious. I’m inclined to think it’s the latter, as it sounds very much like the kind of sentiment that is often offered by what could be called the Cameron school of sabermetrics.

There is of course a great deal of truth in the statement; a process can be valid and yet produce poor results through decisions made on the basis of incomplete information, unforeseen events, chance, and other factors. But that doesn’t mean that actual results can be ignored, particularly as the sample becomes larger.

Of course, it’s easier to rationalize poor results when the process is in line with one’s ideological leanings (this is true for me as well, of course). It wasn’t long ago that Chris Antonetti was the darling of the organizational rankings crowd.

Some people believe that they possess enough insight about front offices to make ordinal rankings of their quality. I am not one of them--all I can do is lay out the facts as I see them:

* The Indians can generally be classified in the upper tier of organizations that are publicly open to sabermetrics, which is certainly a plus from where I sit.

* Shapiro’s Indians have drafted poorly. The most recent Indians first rounder to establish a solid major league career is Jeremy Guthrie (2002). The most recent to have one with the Indians is CC Sabathia (1998). The jury is still out on several recent picks, although if Alex White or Drew Pomeranz is productive, it will be with another organization.

* The Indians have done a great job of trading for players either in the minors or very early in their major league careers. Cliff Lee, Grady Sizemore, Asdrubal Cabrera, Travis Hafner, Carlos Santana, Shin-Soo Choo, Michael Brantley, Chris Perez, Coco Crisp, and Justin Masterson are examples. But it’s much harder to find contributors drafted or signed by the Indians--Jhonny Peralta, Fausto Carmona, Jason Kipnis, Rafael Betancourt, Rafael Perez, Vinnie Pestano? (Neither of these lists is comprehensive by any means, but I think they are representative of the whole.)

I don’t think that questioning the efficacy of the current organization at developing talent based on an eleven-year fallow period is excessively “results-oriented”.

With respect to the next managerial hire, I tend to think it won’t matter much. The organization will not win until it can develop more players, regardless of who is managing them. I’m hoping that Terry Francona’s interest is real and not simply a courtesy to Shapiro, but I doubt that is the case. While I don’t think Francona would be a silver bullet, his tenure in Boston doesn’t raise any obvious red flags. But Francona figures to be the default #1 candidate for any openings, and it’s difficult for me to believe that he would choose Cleveland over other options.

Sandy Alomar appears to have the inside track otherwise, and there’s very little evidence as to what type of manager he would be. There is plenty of evidence that Indians fans will welcome him as 90s nostalgia grows more powerful, and while that may be a plus from a PR perspective, it can be obnoxious for someone who was never a particular fan of Alomar the player. And heaven forbid the fans start talking about Omar Vizquel.