Tuesday, October 03, 2017

End of Season Statistics, 2017

The spreadsheets are published as Google Spreadsheets, which you can download in Excel format by changing the extension in the address from "=html" to "=xls". That way you can download them and manipulate things however you see fit.

The data comes from a number of different sources. Most of the data comes from Baseball-Reference. KJOK's park database is extremely helpful in determining when park factors should reset. Data on bequeathed runners for relievers comes from Baseball Prospectus.

The basic philosophy behind these stats is to use the simplest methods that have acceptable accuracy. Of course, "acceptable" is in the eye of the beholder, namely me. I use Pythagenpat not because other run/win converters, like a constant RPW or a fixed exponent, are not accurate enough for this purpose, but because it's mine and it would be kind of odd if I didn't use it.

If I seem to be a stickler for purity in my critiques of others' methods, I'd contend it is usually in a theoretical sense, not an input sense. So when I exclude hit batters, I'm not saying that hit batters are worthless or that they *should* be ignored; it's just easier not to mess with them and not that much less accurate.

I also don't really have a problem with people using sub-standard methods (say, Basic RC) as long as they acknowledge that they are sub-standard. If someone pretends that Basic RC doesn't undervalue walks or cause problems when applied to extreme individuals, I'll call them on it; if they explain its shortcomings but use it regardless, I accept that. Take these last three paragraphs as my acknowledgment that some of the statistics displayed here have shortcomings as well, and I've at least attempted to describe some of them in the discussion below.

The League spreadsheet is pretty straightforward--it includes league totals and averages for a number of categories, most or all of which are explained at appropriate junctures throughout this piece. The advent of interleague play has created two different sets of league totals--one for the offense of league teams and one for the defense of league teams. Before interleague play, these two were identical. I do not present both sets of totals (you can figure the defensive ones yourself from the team spreadsheet, if you desire), just those for the offenses. The exception is for the defense-specific statistics, like innings pitched and quality starts. The figures for those categories in the league report are for the defenses of the league's teams. However, I do include each league's breakdown of basic pitching stats between starters and relievers (denoted by "s" or "r" prefixes), and so summing those will yield the totals from the pitching side. The one abbreviation you might not recognize is "N"--this is the league average of runs/game for one team, and it will pop up again.

The Team spreadsheet focuses on overall team performance--wins, losses, runs scored, runs allowed. The columns included are: Park Factor (PF), Home Run Park Factor (PFhr), Winning Percentage (W%), Expected W% (EW%), Predicted W% (PW%), wins, losses, runs, runs allowed, Runs Created (RC), Runs Created Allowed (RCA), Home Winning Percentage (HW%), Road Winning Percentage (RW%) [exactly what they sound like--W% at home and on the road], Runs/Game (R/G), Runs Allowed/Game (RA/G), Runs Created/Game (RCG), Runs Created Allowed/Game (RCAG), and Runs Per Game (RPG, the average number of runs scored and allowed per game). Ideally, I would use outs as the denominator, but for teams, outs and games are so closely related that I don’t think it’s worth the extra effort.

The runs and Runs Created figures are unadjusted, but the per-game averages are park-adjusted, except for RPG which is also raw. Runs Created and Runs Created Allowed are both based on a simple Base Runs formula. The formula is:

A = H + W - HR - CS
B = (2TB - H - 4HR + .05W + 1.5SB)*.76
C = AB - H
D = HR
Naturally, RC = A*B/(B + C) + D.
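A minimal Python sketch of the team Runs Created formula above. The function name is hypothetical, and the team totals in the usage line are made-up round numbers, not a real team.

```python
def team_runs_created(h, w, hr, cs, tb, sb, ab):
    """Team Runs Created from the simple Base Runs formula in this post."""
    a = h + w - hr - cs                                      # baserunners, net of caught stealing
    b = (2 * tb - h - 4 * hr + 0.05 * w + 1.5 * sb) * 0.76   # advancement factor
    c = ab - h                                               # batting outs
    d = hr                                                   # home runs score automatically
    return a * b / (b + c) + d


# Hypothetical season totals, chosen only to show the call.
print(round(team_runs_created(h=1400, w=500, hr=200, cs=30, tb=2300, sb=80, ab=5500), 1))
```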

I have explained the methodology used to figure the PFs before, but the CliffsNotes version is that they are based on five years of data when applicable, include both runs scored and allowed, and they are regressed towards average (PF = 1), with the amount of regression varying based on the number of years of data used. There are factors for both runs and home runs. The initial PF (not shown) is:

iPF = (H*T/(R*(T - 1) + H) + 1)/2
where H = RPG in home games, R = RPG in road games, T = # teams in league (15 for both the AL and the NL in 2017). Then the iPF is converted to the PF by taking x*iPF + (1-x), where x = .6 if one year of data is used, .7 for 2, .8 for 3, and .9 for 4+.
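The two-step park factor calculation can be sketched in Python as follows. The function names are hypothetical, and expressing the regression weight as x = .5 + .1*min(years, 4) is just a compact way of writing the .6/.7/.8/.9 schedule above.

```python
def initial_park_factor(home_rpg, road_rpg, teams):
    """iPF = (H*T/(R*(T - 1) + H) + 1)/2; the +1 and /2 reflect half of games at home."""
    return (home_rpg * teams / (road_rpg * (teams - 1) + home_rpg) + 1) / 2


def regressed_park_factor(ipf, years):
    """Regress iPF toward 1 with weight x = .6, .7, .8, .9 for 1, 2, 3, 4+ years of data."""
    if years < 1:
        raise ValueError("need at least one year of data")
    x = 0.5 + 0.1 * min(years, 4)
    return x * ipf + (1 - x)
```

When home and road RPG are equal, iPF comes out to exactly 1, and the regression step leaves it there.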

It is important to note, since there always seems to be confusion about this, that these park factors already incorporate the fact that the average player plays 50% on the road and 50% at home. That is what the adding one and dividing by 2 in the iPF is all about. So if I list Fenway Park with a 1.02 PF, that means that it actually increases RPG by 4%.

In the calculation of the PFs, I did not take out “home” games that were actually at neutral sites (of which there were a rash this year).

There are also Team Offense and Defense spreadsheets. These include the following categories:

Team offense: Plate Appearances, Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Walks and Hit Batters per At Bat (WAB), Isolated Power (ISO = SLG - BA), R/G at home (hR/G), and R/G on the road (rR/G). BA, OBA, SLG, WAB, and ISO are park-adjusted by dividing by the square root of park factor (or the equivalent; WAB = (OBA - BA)/(1 - OBA), ISO = SLG - BA, and SEC = WAB + ISO).
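A minimal sketch of the team offense adjustment, assuming that WAB, ISO, and SEC are derived from the already-adjusted BA, OBA, and SLG (which is how I read "or the equivalent"). The function name and dictionary keys are hypothetical.

```python
import math


def park_adjusted_team_offense(ba, oba, slg, pf):
    """Divide BA, OBA, SLG by sqrt(PF), then derive WAB, ISO and SEC from the adjusted rates."""
    root = math.sqrt(pf)
    ba, oba, slg = ba / root, oba / root, slg / root
    wab = (oba - ba) / (1 - oba)   # walks and hit batters per at bat
    iso = slg - ba                 # isolated power
    return {"BA": ba, "OBA": oba, "SLG": slg, "WAB": wab, "ISO": iso, "SEC": wab + iso}
```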

Team defense: Innings Pitched, BA, OBA, SLG, Innings per Start (IP/S), Starter's eRA (seRA), Reliever's eRA (reRA), Quality Start Percentage (QS%), RA/G at home (hRA/G), RA/G on the road (rRA/G), Battery Mishap Rate (BMR), Modified Fielding Average (mFA), and Defensive Efficiency Record (DER). BA, OBA, and SLG are park-adjusted by dividing by the square root of PF; seRA and reRA are divided by PF.

I've limited the fielding metrics to those that a) I can calculate myself and b) are based on the basic available data, not specialized PBP data. The three metrics are explained in this post, but here are quick descriptions of each:

1) BMR--wild pitches and passed balls per 100 baserunners = (WP + PB)/(H + W - HR)*100

2) mFA--fielding average removing strikeouts and assists = (PO - K)/(PO - K + E)

3) DER--the Bill James classic, using only the PA-based estimate of plays made. Based on a suggestion by Terpsfan101, I've tweaked the error coefficient. Plays Made = PA - K - H - W - HB - .64E and DER = PM/(PM + H - HR + .64E)
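The three fielding metrics can be sketched in Python as below. The function names are hypothetical. Plays made is computed as PA - K - H - W - HB - .64E, without a separate HR term because H already includes home runs. With that definition, PM + H - HR + .64E works out to the non-HR balls in play (PA - K - W - HB - HR), which is the denominator DER is meant to have.

```python
def battery_mishap_rate(wp, pb, h, w, hr):
    """BMR: wild pitches plus passed balls per 100 baserunners (H + W - HR)."""
    return (wp + pb) / (h + w - hr) * 100


def modified_fielding_average(po, k, e):
    """mFA: fielding average with strikeout putouts and all assists removed."""
    return (po - k) / (po - k + e)


def defensive_efficiency(pa, k, h, w, hr, hb, e):
    """DER from the PA-based plays-made estimate; H includes home runs."""
    plays_made = pa - k - h - w - hb - 0.64 * e
    return plays_made / (plays_made + h - hr + 0.64 * e)
```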

Next are the individual player reports. I defined a starting pitcher as one with 15 or more starts. All other pitchers are eligible to be included as a reliever. If a pitcher has 40 appearances, then they are included. Additionally, if a pitcher has 50 innings and less than 50% of his appearances are starts, he is also included as a reliever (this allows some swingmen type pitchers who wouldn’t meet either the minimum start or appearance standards to get in).
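A minimal sketch of the pitcher eligibility rules just described. The function name is hypothetical, and treating each threshold as inclusive ("15 or more starts", "40 or more appearances", "50 or more innings") is my reading of the text.

```python
def pitcher_role(gs, g, ip):
    """Return 'starter', 'reliever', or None under the report's eligibility rules."""
    if gs >= 15:
        return "starter"
    if g >= 40:
        return "reliever"
    # Swingman rule: 50+ innings with fewer than half of appearances as starts.
    if ip >= 50 and gs < 0.5 * g:
        return "reliever"
    return None
```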

For all of the player reports, ages are based on simply subtracting their year of birth from 2017. I realize that this is not compatible with how ages are usually listed and so “Age 27” doesn’t necessarily correspond to age 27 as I list it, but it makes everything a heckuva lot easier, and I am more interested in comparing the ages of the players to their contemporaries than fitting them into historical studies, and for the former application it makes very little difference. The "R" category records rookie status with a "R" for rookies and a blank for everyone else; I've trusted Baseball Prospectus on this. Also, all players are counted as being on the team with whom they played/pitched (IP or PA as appropriate) the most.

For relievers, the categories listed are: Games, Innings Pitched, estimated Plate Appearances (PA), Run Average (RA), Relief Run Average (RRA), Earned Run Average (ERA), Estimated Run Average (eRA), DIPS Run Average (dRA), Strikeouts per Game (KG), Walks per Game (WG), Guess-Future (G-F), Inherited Runners per Game (IR/G), Batting Average on Balls in Play (%H), Runs Above Average (RAA), and Runs Above Replacement (RAR).

IR/G is per relief appearance (G - GS); it is an interesting thing to look at, I think, in lieu of actual leverage data. You can see which closers come in with runners on base, and which are used nearly exclusively to start innings. Of course, you can’t infer too much; there are bad relievers who come in with a lot of people on base, not because they are being used in high leverage situations, but because they are long men being used in low-leverage situations already out of hand.

For starting pitchers, the columns are: Wins, Losses, Innings Pitched, Estimated Plate Appearances (PA), RA, RRA, ERA, eRA, dRA, KG, WG, G-F, %H, Pitches/Start (P/S), Quality Start Percentage (QS%), RAA, and RAR. RA and ERA you know--R*9/IP or ER*9/IP, park-adjusted by dividing by PF. The formulas for eRA and dRA are based on the same Base Runs equation and they estimate RA, not ERA.

* eRA is based on the actual results allowed by the pitcher (hits, doubles, home runs, walks, strikeouts, etc.). It is park-adjusted by dividing by PF.

* dRA is the classic DIPS-style RA, assuming that the pitcher allows a league average %H, and that his hits in play have a league-average S/D/T split. It is park-adjusted by dividing by PF.

The formula for eRA is:

A = H + W - HR
B = (2*TB - H - 4*HR + .05*W)*.78
C = AB - H = K + (3*IP - K)*x (where x is figured as described below for PA estimation and is typically around .93) = PA (from below) - H - W
eRA = (A*B/(B + C) + HR)*9/IP
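The eRA formula can be sketched in Python as follows, using the K + (3*IP - K)*x form of C. The function name is hypothetical. It assumes IP is passed as true decimal innings (200.333, not the box-score 200.1), and it applies the park adjustment by dividing by PF as described above.

```python
def estimated_ra(h, w, hr, tb, k, ip, x, pf=1.0):
    """eRA from a pitcher's actual components; IP must be true decimal innings."""
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.78
    c = k + (3 * ip - k) * x          # estimate of AB - H
    return (a * b / (b + c) + hr) * 9 / ip / pf
```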

To figure dRA, you first need the estimate of PA described below. Then you calculate W, K, and HR per PA (call these %W, %K, and %HR). Percentage of balls in play (BIP%) = 1 - %W - %K - %HR. This is used to calculate the DIPS-friendly estimate of %H (H per PA) as e%H = Lg%H*BIP%.

Now everything has a common denominator of PA, so we can plug into Base Runs:

A = e%H + %W
B = (2*(z*e%H + 4*%HR) - e%H - 5*%HR + .05*%W)*.78
C = 1 - e%H - %W - %HR
dRA = (A*B/(B + C) + %HR)/C*a

z is the league average of total bases per non-HR hit (TB - 4*HR)/(H - HR), and a is the league average of (AB - H) per game.
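Putting the dRA steps together gives the Python sketch below. The parameter names are hypothetical, and the league inputs used in the test (%H of .300, z of 1.25, 25.5 outs per game) are placeholder values, not actual 2017 league figures. Because A, B, and C are all per-PA rates, (A*B/(B + C) + %HR)/C is runs per out, and multiplying by a converts that to runs per game.

```python
def dips_ra(w, k, hr, pa, lg_pct_h, z, lg_outs_per_game, pf=1.0):
    """DIPS-style RA: league-average %H and hit mix on balls in play."""
    pct_w, pct_k, pct_hr = w / pa, k / pa, hr / pa
    bip = 1 - pct_w - pct_k - pct_hr        # share of PA that are balls in play
    e_h = lg_pct_h * bip                    # expected non-HR hits per PA
    a = e_h + pct_w
    b = (2 * (z * e_h + 4 * pct_hr) - e_h - 5 * pct_hr + 0.05 * pct_w) * 0.78
    c = 1 - e_h - pct_w - pct_hr            # outs (AB - H) per PA
    return (a * b / (b + c) + pct_hr) / c * lg_outs_per_game / pf
```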

In the past I presented a couple of batted ball RA estimates. I’ve removed these, not just because batted ball data exhibits questionable reliability but because these metrics were complicated to figure, required me to collate the batted ball data, and were not personally useful to me. I figure these stats for my own enjoyment and have in some form or another going back to 1997. I share them here only because I would do it anyway, so if I’m not interested in certain categories, there’s no reason to keep presenting them.

Instead, I’m showing strikeout and walk rate, both expressed as per game. By game I mean not nine innings but rather the league average of PA/G. I have always been a proponent of using PA and not IP as the denominator for non-run pitching rates, and now the use of per PA rates is widespread. Usually these are expressed as K/PA and W/PA, or equivalently, percentage of PA with a strikeout or walk. I don’t believe that any site publishes these as K and W per equivalent game as I am here. This is not better than K%--it’s simply applying a scalar multiplier. I like it because it generally follows the same scale as the familiar K/9.

To facilitate this, I’ve finally corrected a flaw in the formula I use to estimate plate appearances for pitchers. Previously, I’ve done it the lazy way by not splitting strikeouts out from other outs. I am now using this formula to estimate PA (where PA = AB + W):

PA = K + (3*IP - K)*x + H + W
Where x = league average of (AB - H - K)/(3*IP - K)

Then KG = K/PA*Lg(PA/G) and WG = W/PA*Lg(PA/G).
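A minimal Python sketch of the PA estimate and the per-game strikeout and walk rates. The function names are hypothetical. The per-game rates scale each pitcher's K/PA and W/PA by the league's PA per game, so they are K% and W% multiplied by a constant.

```python
def league_out_share(ab, h, k, ip):
    """x = league (AB - H - K)/(3*IP - K): share of non-strikeout outs that are at-bat outs."""
    return (ab - h - k) / (3 * ip - k)


def estimated_pa(h, w, k, ip, x):
    """PA (= AB + W) estimated as K + (3*IP - K)*x + H + W."""
    return k + (3 * ip - k) * x + h + w


def per_game_rates(k, w, pa, lg_pa_per_game):
    """Scale K/PA and W/PA to a 'game' of league-average PA."""
    return k / pa * lg_pa_per_game, w / pa * lg_pa_per_game
```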

G-F is a junk stat, included here out of habit because I've been including it for years. It was intended to give a quick read of a pitcher's expected performance in the next season, based on eRA and strikeout rate. Although the numbers vaguely resemble RAs, it's actually unitless. As a rule of thumb, anything under four is pretty good for a starter. G-F = 4.46 + .095(eRA) - .113(K*9/IP). It is a junk stat. JUNK STAT JUNK STAT JUNK STAT. Got it?

%H is BABIP, more or less--%H = (H - HR)/(PA - HR - K - W), where PA was estimated above. Pitches/Start includes all appearances, so I've counted relief appearances as one-half of a start (P/S = Pitches/(.5*G + .5*GS)). QS% is just QS/GS; I don't think it's particularly useful, but Doug's Stats include QS so I include it.
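The %H and Pitches/Start calculations can be sketched as follows. The function names are hypothetical, and PA is the estimated PA from the formula above. Counting each relief appearance (G - GS) as half a start gives GS + .5*(G - GS), which simplifies to .5*G + .5*GS.

```python
def pct_h(h, hr, pa, k, w):
    """%H: non-HR hits over estimated balls in play (PA - HR - K - W)."""
    return (h - hr) / (pa - hr - k - w)


def pitches_per_start(pitches, g, gs):
    """Count each relief appearance as half a start: divide by .5*G + .5*GS."""
    return pitches / (0.5 * g + 0.5 * gs)
```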

I've used a stat called Relief Run Average (RRA) in the past, based on Sky Andrecheck's article in the August 1999 By the Numbers; that one only used inherited runners, but I've revised it to include bequeathed runners as well, making it equally applicable to starters and relievers. I use RRA as the building block for baselined value estimates for all pitchers. I explained RRA in this article, but the bottom line formulas are:

BRSV = BRS - BR*i*sqrt(PF)
IRSV = IR*i*sqrt(PF) - IRS
RRA = ((R - (BRSV + IRSV))*9/IP)/PF
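A minimal Python sketch of RRA. The rate i is not defined in this section. The sketch assumes it is a league-wide rate at which inherited and bequeathed runners score, supplied by the caller, and the linked article should be treated as the authority on how it is actually figured. The function and parameter names are hypothetical.

```python
import math


def relief_run_average(r, ip, brs, br, irs, ir, i, pf):
    """RRA: runs adjusted for bequeathed/inherited runner outcomes, per 9 IP, park-adjusted.

    i is assumed to be a league scoring rate for inherited runners (not defined in this post).
    """
    root = math.sqrt(pf)
    brsv = brs - br * i * root      # bequeathed runners who scored beyond expectation (bullpen's fault)
    irsv = ir * i * root - irs      # inherited runners stranded beyond expectation (pitcher's credit)
    return ((r - (brsv + irsv)) * 9 / ip) / pf
```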

The two baselined stats are Runs Above Average (RAA) and Runs Above Replacement (RAR). Starting in 2015 I revised RAA to use a slightly different baseline for starters and relievers as described here. The adjustment is based on patterns from the last several seasons of league average starter and reliever eRA. Thus it does not adjust for any advantages relief pitchers enjoy that are not reflected in their component statistics. This could include runs allowed scoring rules that benefit relievers (although the use of RRA should help even the scales in this regard, at least compared to raw RA) and the talent advantage of starting pitchers. The RAR baselines do attempt to take the latter into account, and so the difference in starter and reliever RAR will be more stark than the difference in RAA.

RAA (relievers) = (.951*LgRA - RRA)*IP/9
RAA (starters) = (1.025*LgRA - RRA)*IP/9
RAR (relievers) = (1.11*LgRA - RRA)*IP/9
RAR (starters) = (1.28*LgRA - RRA)*IP/9
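The four pitcher baselines differ only in the multiplier on league RA, so a lookup table keeps the sketch compact. The table and function names are hypothetical.

```python
# Multipliers on league RA from this post: (role, baseline) -> factor.
BASELINE_FACTOR = {
    ("reliever", "RAA"): 0.951,
    ("starter", "RAA"): 1.025,
    ("reliever", "RAR"): 1.11,
    ("starter", "RAR"): 1.28,
}


def pitcher_runs_above(role, baseline, lg_ra, rra, ip):
    """(factor*LgRA - RRA)*IP/9 for the chosen role and baseline."""
    return (BASELINE_FACTOR[(role, baseline)] * lg_ra - rra) * ip / 9
```

For example, a starter with a league-average RRA of 4.5 over 180 innings comes out at +25.2 RAR and +2.25 RAA.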

All players with 250 or more plate appearances (official, total plate appearances) are included in the Hitters spreadsheets (along with some players close to the cutoff point who I was interested in). Each is assigned one position, the one at which they appeared in the most games. The statistics presented are: Games played (G), Plate Appearances (PA), Outs (O), Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Runs Created (RC), Runs Created per Game (RG), Speed Score (SS), Hitting Runs Above Average (HRAA), Runs Above Average (RAA), Hitting Runs Above Replacement (HRAR), and Runs Above Replacement (RAR).

Starting in 2015, I'm including hit batters in all related categories for hitters, so PA is now equal to AB + W + HB. Outs are AB - H + CS. BA and SLG you know, but remember that without SF, OBA is just (H + W + HB)/(AB + W + HB). Secondary Average = (TB - H + W + HB)/AB = SLG - BA + (OBA - BA)/(1 - OBA). I have not included net steals as many people (and Bill James himself) do, but I have included HB which some do not.
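The two forms of Secondary Average agree because, with PA = AB + W + HB and no sacrifice flies, (OBA - BA)/(1 - OBA) reduces exactly to (W + HB)/AB. A short Python check, where the function name and the sample line are hypothetical:

```python
def hitter_rate_stats(ab, h, tb, w, hb):
    """BA, OBA, SLG and SEC (two ways) with PA = AB + W + HB and no sacrifice flies."""
    pa = ab + w + hb
    ba = h / ab
    oba = (h + w + hb) / pa
    slg = tb / ab
    sec_from_counts = (tb - h + w + hb) / ab
    sec_from_rates = slg - ba + (oba - ba) / (1 - oba)
    return ba, oba, slg, sec_from_counts, sec_from_rates


print(hitter_rate_stats(ab=500, h=140, tb=230, w=60, hb=5))
```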

BA, OBA, and SLG are park-adjusted by dividing by the square root of PF. This is an approximation, of course, but I'm satisfied that it works well (I plan to post a couple of articles on this some time during the offseason). The goal here is to adjust for the win value of offensive events, not to quantify the exact park effect on the given rate. I use the BA/OBA/SLG-based formula to figure SEC, so it is park-adjusted as well.

Runs Created is actually Paul Johnson's ERP, more or less. Ideally, I would use a custom linear weights formula for the given league, but ERP is just so darn simple and close to the mark that it’s hard to pass up. I still use the term “RC” partially as a homage to Bill James (seriously, I really like and respect him even if I’ve said negative things about RC and Win Shares), and also because it is just a good term. I like the thought put in your head when you hear “creating” a run better than “producing”, “manufacturing”, “generating”, etc. to say nothing of names like “equivalent” or “extrapolated” runs. None of that is said to put down the creators of those methods--there just aren’t a lot of good, unique names available.

For 2015, I refined the formula a little bit to:

1. include hit batters at a value equal to that of a walk
2. value intentional walks at just half the value of a regular walk
3. recalibrate the multiplier based on the last ten major league seasons (2005-2014)

This revised RC = (TB + .8H + W + HB - .5IW + .7SB - CS - .3AB)*.310

RC is park adjusted by dividing by PF, making all of the value stats that follow park adjusted as well. RG, the Runs Created per Game rate, is RC/O*25.5. I do not believe that outs are the proper denominator for an individual rate stat, but I also do not believe that the distortions caused are that bad. (I still intend to finish my rate stat series and discuss all of the options in excruciating detail, but alas you’ll have to take my word for it now).
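A minimal Python sketch of the 2015-revised RC formula, its park adjustment, and RG. The function names and the sample batting line in the test are hypothetical.

```python
def runs_created(tb, h, w, hb, iw, sb, cs, ab, pf=1.0):
    """ERP-style RC: (TB + .8H + W + HB - .5IW + .7SB - CS - .3AB)*.310, divided by PF."""
    raw = (tb + 0.8 * h + w + hb - 0.5 * iw + 0.7 * sb - cs - 0.3 * ab) * 0.310
    return raw / pf


def hitter_outs(ab, h, cs):
    """Outs = AB - H + CS."""
    return ab - h + cs


def runs_created_per_game(rc, outs):
    """RG = RC/O*25.5."""
    return rc / outs * 25.5
```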

Several years ago I switched from using my own "Speed Unit" to a version of Bill James' Speed Score; of course, Speed Unit was inspired by Speed Score. I only use four of James' categories in figuring Speed Score. I actually like the construct of Speed Unit better as it was based on z-scores in the various categories (and amazingly a couple other sabermetricians did as well), but trying to keep the estimates of standard deviation for each of the categories appropriate was more trouble than it was worth.

Speed Score is the average of four components, which I'll call a, b, c, and d (S is singles and T is triples):

a = ((SB + 3)/(SB + CS + 7) - .4)*20
b = sqrt((SB + CS)/(S + W))*14.3
c = ((R - HR)/(H + W - HR) - .1)*25
d = T/(AB - HR - K)*450

James actually uses a sliding scale for the triples component, but it strikes me as needlessly complex and so I've streamlined it. He looks at two years of data, which makes sense for a gauge that is attempting to capture talent and not performance, but using multiple years of data would be contradictory to the guiding principles behind this set of reports (namely, simplicity. Or laziness. Your pick.) I also changed some of his division to mathematically equivalent multiplications.
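The four-component Speed Score can be sketched in Python as follows. The function name is hypothetical. The sketch does not cap or floor the components, because the formulas above do not mention doing so.

```python
import math


def speed_score(sb, cs, singles, w, r, hr, h, t, ab, k):
    """Average of the four simplified Speed Score components a, b, c, d."""
    a = ((sb + 3) / (sb + cs + 7) - 0.4) * 20              # stolen base success
    b = math.sqrt((sb + cs) / (singles + w)) * 14.3        # attempt frequency
    c = ((r - hr) / (h + w - hr) - 0.1) * 25               # runs scored per time on base
    d = t / (ab - hr - k) * 450                            # triples per ball in play
    return (a + b + c + d) / 4
```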

There are a whopping four categories that compare to a baseline; two for average, two for replacement. Hitting RAA compares to a league average hitter; it is in the vein of Pete Palmer’s Batting Runs. RAA compares to an average hitter at the player’s primary position. Hitting RAR compares to a “replacement level” hitter; RAR compares to a replacement level hitter at the player’s primary position. The formulas are:

HRAA = (RG - N)*O/25.5
RAA = (RG - N*PADJ)*O/25.5
HRAR = (RG - .73*N)*O/25.5
RAR = (RG - .73*N*PADJ)*O/25.5

PADJ is the position adjustment, and it is based on 2002-2011 offensive data. For catchers it is .89; for 1B/DH, 1.17; for 2B, .97; for 3B, 1.03; for SS, .93; for LF/RF, 1.13; and for CF, 1.02. I had been using the 1992-2001 data as a basis for some time, but finally updated for 2012. I’m a little hesitant about this update, as the middle infield positions are the biggest movers (higher positional adjustments, meaning less positional credit). I have no qualms for second base, but the shortstop PADJ is out of line with the other position adjustments widely in use and feels a bit high to me. But there are some decent points to be made in favor of offensive adjustments, and I’ll have a bit more on this topic in general below.
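The four hitter baselines can be sketched with the PADJ values listed above stored in a dictionary. The dictionary keys and function name are hypothetical.

```python
# Offensive position adjustments from this post (2002-2011 data).
PADJ = {"C": 0.89, "1B": 1.17, "DH": 1.17, "2B": 0.97, "3B": 1.03,
        "SS": 0.93, "LF": 1.13, "RF": 1.13, "CF": 1.02}


def hitter_values(rg, outs, n, pos):
    """HRAA, RAA, HRAR, RAR from RG, outs, league N, and primary position."""
    scale = outs / 25.5
    padj = PADJ[pos]
    return {
        "HRAA": (rg - n) * scale,
        "RAA": (rg - n * padj) * scale,
        "HRAR": (rg - 0.73 * n) * scale,
        "RAR": (rg - 0.73 * n * padj) * scale,
    }
```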

That was the mechanics of the calculations; now I'll twist myself into knots trying to justify them. If you only care about the how and not the why, stop reading now.

The first thing that should be covered is the philosophical position behind the statistics posted here. They fall on the continuum of ability and value in what I have called "performance". Performance is a technical-sounding way of saying "Whatever arbitrary combination of ability and value I prefer".

With respect to park adjustments, I am not interested in how any particular player is affected, so there is no separate adjustment for lefties and righties for instance. The park factor is an attempt to determine how the park affects run scoring rates, and thus the win value of runs.

I apply the park factor directly to the player's statistics, but it could also be applied to the league context. The advantage to doing it my way is that it allows you to compare the component statistics (like Runs Created or OBA) on a park-adjusted basis. The drawback is that it creates a new theoretical universe, one in which all parks are equal, rather than leaving the player grounded in the actual context in which he played and evaluating how that context (and not the player's statistics) was altered by the park.

The good news is that the two approaches are essentially equivalent; in fact, they are precisely equivalent if you assume that the Runs Per Win factor is equal to the RPG. Suppose that we have a player in an extreme park (PF = 1.15, approximately like Coors Field pre-humidor) who has an 8 RG before adjusting for park, while making 350 outs in a 4.5 N league. The first method of park adjustment, the one I use, converts his value into a neutral park, so his RG is now 8/1.15 = 6.957. We can now compare him directly to the league average:

RAA = (6.957 - 4.5)*350/25.5 = +33.72

The second method would be to adjust the league context. If N = 4.5, then the average player in this park will create 4.5*1.15 = 5.175 runs. Now, to figure RAA, we can use the unadjusted RG of 8:

RAA = (8 - 5.175)*350/25.5 = +38.77

These are not the same, as you can obviously see. The reason for this is that they take place in two different contexts. The first figure is in a 9 RPG (2*4.5) context; the second figure is in a 10.35 RPG (2*4.5*1.15) context. Runs have different values in different contexts; that is why we have RPW converters in the first place. If we convert to WAA (using RPW = RPG, which is only an approximation, so it's usually not as tidy as it appears below), then we have:

WAA = 33.72/9 = +3.75
WAA = 38.77/10.35 = +3.75
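The two park-adjustment routes in this example can be checked numerically. In the sketch below, the function names are hypothetical, and RPW is set equal to RPG, just as in the example. Algebraically, both WAA figures reduce to (RG/PF - N)*O/(25.5*2N), so they agree exactly.

```python
def raa_park_neutral(rg_raw, pf, n, outs):
    """Method 1: deflate the player's RG by PF, compare to the league N."""
    return (rg_raw / pf - n) * outs / 25.5


def raa_park_context(rg_raw, pf, n, outs):
    """Method 2: leave RG raw, inflate the league baseline to N*PF."""
    return (rg_raw - n * pf) * outs / 25.5


rg, pf, n, outs = 8.0, 1.15, 4.5, 350
raa1 = raa_park_neutral(rg, pf, n, outs)
raa2 = raa_park_context(rg, pf, n, outs)
waa1 = raa1 / (2 * n)        # RPW = RPG of the neutral context (9)
waa2 = raa2 / (2 * n * pf)   # RPW = RPG of the park context (10.35)
print(round(raa1, 2), round(raa2, 2), round(waa1, 2), round(waa2, 2))  # prints 33.72 38.77 3.75 3.75
```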

Once you convert to wins, the two approaches are equivalent. The other nice thing about the first approach is that once you park-adjust, everyone in the league is in the same context, and you can dispense with the need for converting to wins at all. You still might want to convert to wins, and you'll need to do so if you are comparing the 2017 players to players from other league-seasons (including between the AL and NL in the same year), but if you are only looking to compare Jose Bautista to Miguel Cabrera, it's not necessary. WAR is somewhat ubiquitous now, but personally I prefer runs when possible--why mess with decimal points if you don't have to?

The park factors used to adjust player stats here are run-based. Thus, they make no effort to project what a player "would have done" in a neutral park, or account for the different effects parks have on specific events (walks, home runs, BA) or types of players. They simply account for the difference in run environment that is caused by the park (as best I can measure it). As such, they don't evaluate a player within the actual run context of his team's games; they attempt to restate the player's performance as an equivalent performance in a neutral park.

I suppose I should also justify the use of sqrt(PF) for adjusting component statistics. The classic defense given for this approach relies on basic Runs Created--runs are proportional to OBA*SLG, and OBA*SLG/PF = OBA/sqrt(PF)*SLG/sqrt(PF). While RC may be an antiquated tool, you will find that the square root adjustment is fairly compatible with linear weights or Base Runs as well. I am not going to take the space to demonstrate this claim here, but I will some time in the future.
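The basic Runs Created version of this argument can be made concrete. Basic RC, (H + W)*TB/(AB + W), is algebraically OBA*SLG*AB (taking OBA as (H + W)/(AB + W)). So dividing both OBA and SLG by sqrt(PF) divides RC by exactly PF when AB is held fixed. The function names and sample line below are hypothetical.

```python
import math


def basic_rc(h, w, tb, ab):
    """Basic Runs Created: (H + W) * TB / (AB + W)."""
    return (h + w) * tb / (ab + w)


def basic_rc_from_rates(oba, slg, ab):
    """Same quantity written as OBA * SLG * AB, with OBA = (H + W)/(AB + W)."""
    return oba * slg * ab


h, w, tb, ab, pf = 150, 60, 250, 550, 1.10
oba, slg = (h + w) / (ab + w), tb / ab
root = math.sqrt(pf)
print(basic_rc(h, w, tb, ab) / pf, basic_rc_from_rates(oba / root, slg / root, ab))
```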

Many value figures published around the sabersphere adjust for the difference in quality level between the AL and NL. I don't, but this is a thorny area where there is no right or wrong answer as far as I'm concerned. I also do not make an adjustment in the league averages for the fact that the overall NL averages include pitcher batting and the AL does not (not quite true in the era of interleague play, but you get my drift).

The difference between the leagues may not be precisely calculable, and it certainly is not constant, but it is real. If the average player in the AL is better than the average player in the NL, it is perfectly reasonable to expect the average AL player to have more RAR than the average NL player, and that will not happen without some type of adjustment. On the other hand, if you are only interested in evaluating a player relative to his own league, such an adjustment is not necessarily welcome.

The league argument only applies cleanly to metrics baselined to average. Since replacement level compares the given player to a theoretical player that can be acquired on the cheap, the same pool of potential replacement players should by definition be available to the teams of each league. One could argue that if the two leagues don't have equal talent at the major league level, they might not have equal access to replacement level talent--except such an argument is at odds with the notion that replacement level represents talent that is truly "freely available".

So it's hard to justify the approach I take, which is to set replacement level relative to the average runs scored in each league, with no adjustment for the difference in the leagues. The best justification is that it's simple and it treats each league as its own universe, even if in reality they are connected.

The replacement levels I have used here are very much in line with the values used by other sabermetricians. This is based on my own "research", my interpretation of other people's research, and a desire to not stray from consensus and make the values unhelpful to the majority of people who may encounter them.

Replacement level is certainly not settled science. There is always going to be room to disagree on what the baseline should be. Even if you agree it should be "replacement level", any estimate of where it should be set is just that--an estimate. Average is clean and fairly straightforward, even if its utility is questionable; replacement level is inherently messy. So I offer the average baseline as well.

For position players, replacement level is set at 73% of the positional average RG (since there's a history of discussing replacement level in terms of winning percentages, this is roughly equivalent to .350). For starting pitchers, it is set at 128% of the league average RA (.380), and for relievers it is set at 111% (.450).
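One way to see where the rough winning percentage equivalents come from is sketched below. It assumes a fixed Pythagorean exponent of 2 rather than Pythagenpat, and it treats the replacement player as a team whose runs scored (for hitters) or runs allowed (for pitchers) are scaled by the given factor against an otherwise average opponent. That framing is my assumption, not necessarily how the figures above were derived.

```python
def pythagorean_wpct(run_ratio, exponent=2.0):
    """W% for a team whose runs scored / runs allowed equals run_ratio."""
    return run_ratio ** exponent / (run_ratio ** exponent + 1)


# Hitter at 73% of average RG; pitchers allowing 128% / 111% of average RA.
for label, ratio in [("hitters", 0.73), ("starters", 1 / 1.28), ("relievers", 1 / 1.11)]:
    print(label, round(pythagorean_wpct(ratio), 3))
```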

I am still using an analytical structure that makes the comparison to replacement level for a position player by applying it to his hitting statistics. This is the approach taken by Keith Woolner in VORP (and some other earlier replacement level implementations), but the newer metrics (among them Rally and Fangraphs' WAR) handle replacement level by subtracting a set number of runs from the player's total runs above average in a number of different areas (batting, fielding, baserunning, positional value, etc.), which for lack of a better term I will call the subtraction approach.

The offensive positional adjustment makes the inherent assumption that the average player at each position is equally valuable. I think that this is close to being true, but it is not quite true. The ideal approach would be to use a defensive positional adjustment, since the real difference between a first baseman and a shortstop is their defensive value. When you bat, all runs count the same, whether you create them as a first baseman or as a shortstop.

That being said, using "replacement hitter at position" does not cause too many distortions. It is not theoretically correct, but it is practically powerful. For one thing, most players, even those at key defensive positions, are chosen first and foremost for their offense. Empirical research by Keith Woolner has shown that the replacement level hitting performance is about the same for every position, relative to the positional average.

Figuring what the defensive positional adjustment should be, though, is easier said than done. Therefore, I use the offensive positional adjustment. So if you want to criticize that choice, or criticize the numbers that result, be my guest. But do not claim that I am holding this up as the correct analytical structure. I am holding it up as the simplest, most straightforward structure that conforms to reality reasonably well, and I use it because, while the numbers may be flawed, they are at least based on an objective formula that I can figure myself. If you feel comfortable with some other assumptions, please feel free to ignore mine.

That still does not justify the use of HRAR--hitting runs above replacement--which compares each hitter, regardless of position, to 73% of the league average. Basically, this is just a way to give an overall measure of offensive production without regard for position with a low baseline. It doesn't have any real baseball meaning.

A player who creates runs at 90% of the league average could be above-average (if he's a shortstop or catcher, or a great fielder at a less important fielding position), or sub-replacement level (DHs that create 3.5 runs per game are not valuable properties). Every player is chosen because his total value, both hitting and fielding, is sufficient to justify his inclusion on the team. HRAR fails even if you try to justify it with a thought experiment about a world in which defense doesn't matter, because in that case the absolute replacement level (in terms of RG, without accounting for the league average) would be much higher than it is currently.

The specific positional adjustments I use are based on 2002-2011 data. I stick with them because I have not seen compelling evidence of a change in the degree of difficulty or scarcity among the positions since then, and because I think they are fairly reasonable. The positions for which they diverge the most from the defensive position adjustments in common use are 2B, 3B, and CF. Second base is considered a premium position by the offensive PADJ (.97), while third base and center field have similar adjustments in the opposite direction (1.03 and 1.02).

Another flaw is that the PADJ is applied to the overall league average RG, which is artificially low for the NL because of pitcher's batting. When using the actual league average runs/game, it's tough to just remove pitchers--any adjustment would be an estimate. If you use the league total of runs created instead, it is a much easier fix.

One other note on this topic is that since the offensive PADJ is a stand-in for average defensive value by position, ideally it would be applied by tying it to defensive playing time. I have done it by outs, though.

The reason I have taken this flawed path is because 1) it ties the position adjustment directly into the RAR formula rather than leaving it as something to subtract on the outside and more importantly 2) there’s no straightforward way to do it. The best would be to use defensive innings--set the full-time player to X defensive innings, figure how Derek Jeter’s innings compared to X, and adjust his PADJ accordingly. Games in the field or games played are dicey because they can cause distortion for defensive replacements. Plate Appearances avoid the problem that outs have of being highly related to player quality, but they still carry the illogic of basing it on offensive playing time. And of course the differences here are going to be fairly small (a few runs). That is not to say that this way is preferable, but it’s not horrible either, at least as far as I can tell.

To compare this approach to the subtraction approach, start by assuming that a replacement level shortstop would create .86*.73*4.5 = 2.825 RG (or would perform at an overall level of equivalent value to being an average fielder at shortstop while creating 2.825 runs per game). Suppose that we are comparing two shortstops, each of whom compiled 600 PA and played an equal number of defensive games and innings (and thus would have the same positional adjustment using the subtraction approach). Alpha made 380 outs and Bravo made 410 outs, and each ranked as dead-on average in the field.

The difference in overall RAR between the two using the subtraction approach would be equal to the difference between their offensive RAA compared to the league average. Assuming the league average is 4.5 runs, and that both Alpha and Bravo created 75 runs, their offensive RAAs are:

Alpha = (75*25.5/380 - 4.5)*380/25.5 = +7.94

Similarly, Bravo is at +2.65, and so the difference between them will be 5.29 RAR.

Using the flawed approach, Alpha's RAR will be:

(75*25.5/380 - 4.5*.73*.86)*380/25.5 = +32.90

Bravo's RAR will be +29.58, a difference of 3.32 RAR, which is two runs off of the difference using the subtraction approach.
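A minimal Python sketch that reproduces the Alpha/Bravo arithmetic above. The 25.5 outs per game, the 4.5 league RG, and the .86 and .73 factors all come from the example; reading .86 as the shortstop PADJ and .73 as the replacement-level fraction is my interpretation, and the helper name `runs_above` is hypothetical.

```python
OUTS_PER_GAME = 25.5   # outs per game used in the example
LEAGUE_RG = 4.5        # league average runs per game
SS_PADJ = 0.86         # assumption: the .86 factor read as the shortstop PADJ
REPL_PCT = 0.73        # assumption: the .73 factor read as the replacement-level fraction


def runs_above(runs_created, outs, baseline_rg):
    """Runs created minus what a baseline_rg hitter would produce in the same outs."""
    player_rg = runs_created * OUTS_PER_GAME / outs
    return (player_rg - baseline_rg) * outs / OUTS_PER_GAME


outs = {"Alpha": 380, "Bravo": 410}  # both players created 75 runs
raa = {name: runs_above(75, o, LEAGUE_RG) for name, o in outs.items()}
rar = {name: runs_above(75, o, LEAGUE_RG * SS_PADJ * REPL_PCT) for name, o in outs.items()}

print(f"RAA gap: {raa['Alpha'] - raa['Bravo']:.2f}")  # subtraction approach, prints 5.29
print(f"RAR gap: {rar['Alpha'] - rar['Bravo']:.2f}")  # outs-based PADJ approach, prints 3.32
```

Since both players created the same 75 runs, each gap is just the baseline RG times Bravo's 30 extra outs divided by 25.5; the outs-based approach uses the lower replacement-level baseline, so those extra outs cost Bravo fewer runs.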

The downside to using PA is that you really need to consider park effects if you do, whereas outs allow you to sidestep park effects. Outs per game are constant; plate appearances are linked to OBA. Thus, they not only depend on the offensive context (including park factor), but also on the quality of one's team. Of course, attempting to adjust for team PA differences opens a huge can of worms which is not really relevant; for now, the point is that using outs for individual players causes distortions, sometimes trivial and sometimes bothersome, but almost always makes one's life easier.

I do not include fielding (or baserunning outside of steals, although that is a trivial consideration in comparison) in the RAR figures; they cover offense and positional value only. This in no way means that I do not believe that fielding is an important consideration in player evaluation. However, two of the key principles of these stat reports are 1) not incorporating any data that is not readily available and 2) not simply including other people's results (of course I borrow heavily from other people's methods, but only adapting methodology that I can apply myself).

Any fielding metric worth its salt will fail to meet either criterion--they use zone data or play-by-play data which I do not have easy access to. I do not have a fielding metric that I have stapled together myself, and so I would have to simply lift other analysts' figures.

Setting the practical reason for not including fielding aside, I do have some reservations about lumping fielding and hitting value together in one number because of the obvious differences in reliability between offensive and fielding metrics. In theory, they absolutely should be put together. But in practice, I believe it would be better to regress the fielding metric to a point at which it would be roughly equivalent in reliability to the offensive metric.

Offensive metrics have error bars associated with them, too, of course, and in evaluating a single season's value, I don't care about the vagaries that we often lump together as "luck". Still, there are errors in our assessment of linear weight values, errors for players who collect an unusual proportion of infield hits or hits to the left side, errors in estimation of park factor, and any number of other factors that make their events more or less valuable than an average event of that type.

Fielding metrics offer up all of that and more, as we cannot be nearly as certain of true successes and failures as we are when analyzing offense. Recent investigations, particularly by Colin Wyers, have raised even more questions about the level of uncertainty. So, even if I were including a fielding value, my approach would be to assume that the offensive value was 100% reliable (which it isn't), and regress the fielding metric relative to that (so if the offensive metric was actually 70% reliable, and the fielding metric 40% reliable, I'd treat the fielding metric as .4/.7 = 57% reliable when tacking it on, to illustrate with a simplified and completely made up example presuming that one could have a precise estimate of nebulous "reliability").

Given the inherent assumption of the offensive PADJ that all positions are equally valuable, once RAR has been figured for a player, fielding value can be accounted for by adding on his runs above average relative to a player at his own position. If there is a shortstop that is -2 runs defensively versus an average shortstop, he is without a doubt a plus defensive player, and a more valuable defensive player than a first baseman who was +1 run better than an average first baseman. Regardless, since it was implicitly assumed that they are both average defensively for their position when RAR was calculated, the shortstop will see his value docked two runs. This DOES NOT MEAN that the shortstop has been penalized for his defense. The whole process of accounting for positional differences, going from hitting RAR to positional RAR, has benefited him.

I've found that there is often confusion about the treatment of first basemen and designated hitters in my PADJ methodology, since I consider DHs to be in the same pool as first basemen. The fact of the matter is that first basemen outhit DHs. There are any number of potential explanations for this; DHs are often old or injured, players hit worse when DHing than they do when playing the field, etc. This actually helps first basemen, since the DHs drag the average production of the pool down, thus resulting in a lower replacement level than I would get if I considered first basemen alone.

However, this method does assume that a 1B and a DH have equal defensive value. Obviously, a DH has no defensive value. What I advocate to correct this is to treat a DH as a bad defensive first baseman, and thus knock another five or so runs off of his RAR for a full-time player. I do not incorporate this into the published numbers, but you should keep it in mind. However, there is no need to adjust the figures for first basemen upwards--the only necessary adjustment is to take the DHs down a notch.

Finally, I consider each player at his primary defensive position (defined as where he appears in the most games), and do not weight the PADJ by playing time. This does shortchange a player like Ben Zobrist (who saw significant time at a tougher position than his primary position), and unduly boost a player like Buster Posey (who logged a lot of games at a much easier position than his primary position). For most players, though, it doesn't matter much. I find it preferable to make manual adjustments for the unusual cases rather than add another layer of complexity to the whole endeavor.

2017 League

2017 Park Factors

2017 Teams

2017 Team Offense

2017 Team Defense

2017 AL Relievers

2017 NL Relievers

2017 AL Starters

2017 NL Starters

2017 AL Hitters

2017 NL Hitters

Monday, October 02, 2017

Crude Playoff Odds--2017

These are very simple playoff odds, based on my crude rating system for teams using an equal mix of W%, EW% (based on R/RA), PW% (based on RC/RCA), and 69 games of .500. They account for home field advantage by assuming a .500 team wins 54.2% of home games (major league average 2006-2015). They assume that a team's inherent strength is constant from game-to-game. They do not generally account for any number of factors that you would actually want to account for if you were serious about this, including but not limited to injuries, the current construction of the team rather than the aggregate seasonal performance, pitching rotations, estimated true talent of the players, etc.
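The post does not spell out how the CTRs are turned into single-game and series probabilities, so the following is only a sketch under stated assumptions: each rating is treated as a win ratio (odds), the head-to-head odds are the ratio of the two ratings, and the 54.2% home-field figure is applied by multiplying the home team's odds by .542/.458. Series odds then follow from a recursion over the remaining games, with the home/road schedule supplied by the caller. The function names `game_prob` and `series_prob` are hypothetical.

```python
from functools import lru_cache

HOME_WPCT = 0.542  # a .500 team's home W% (MLB average 2006-2015, per the text)


def game_prob(ratio_a, ratio_b, a_is_home):
    """P(team A wins one game). Assumption: ratings behave like win ratios (odds)."""
    home_odds = HOME_WPCT / (1 - HOME_WPCT)
    odds = ratio_a / ratio_b
    odds *= home_odds if a_is_home else 1 / home_odds
    return odds / (1 + odds)


def series_prob(ratio_a, ratio_b, a_home_games):
    """P(team A wins a best-of-n series).

    a_home_games holds one boolean per possible game, True when A is at home,
    e.g. (True, True, False, False, False, True, True) for a 2-3-2 format.
    """
    need = len(a_home_games) // 2 + 1  # wins needed to clinch

    @lru_cache(maxsize=None)
    def win_from(a_wins, b_wins):
        if a_wins == need:
            return 1.0
        if b_wins == need:
            return 0.0
        p = game_prob(ratio_a, ratio_b, a_home_games[a_wins + b_wins])
        return p * win_from(a_wins + 1, b_wins) + (1 - p) * win_from(a_wins, b_wins + 1)

    return win_from(0, 0)
```

A one-game wildcard is just `series_prob(ctr_a, ctr_b, (True,))`, with `ctr_a` and `ctr_b` standing in for the two teams' ratings. Like the odds described above, this treats team strength as constant from game to game and ignores pitching matchups.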

The CTRs that are fed in are:

Notable here is that three AL teams rank ahead of the Dodgers, a group that includes New York but not Boston. NYA’s raw EW% and PW% are very close to LA’s, and LA played the second-weakest schedule in MLB while the Red Sox and Yankees played the toughest schedules of any playoff teams.

Wildcard game odds (the least useful since the pitching matchups aren’t taken into account, and that matters most when there is just one game):


I think most people would pick WAS/CHN as the most compelling on paper, which is backed up by the odds. Unfortunately for me, CLE/NYA would be a sneaky-good series.


World Series:

Because I set this spreadsheet up when home field advantage went to a particular league (as was the case for the entire history of the World Series prior to this year), all of the NL teams are listed as the home team. But the probabilities all consider which team would actually have the home field advantage in each matchup.

Put it all together:

This one should make it clear why I don’t have much to say this year.

Tuesday, August 22, 2017

Enby Distribution, pt. 4: Revisiting W%

In my first series about runs per game distributions, I wrote about how to use estimates of the probability of scoring k runs (however these probabilities were estimated, Enby distribution or an alternative approach) to estimate a team’s winning percentage. I’m going to circle back to that here, and most of the content is a repeat of the earlier post.

However, I think this is an important enough topic to rehash. In fact, a winning percentage estimator strikes me as the most logical application for a runs per game distribution, albeit one that is not particularly helpful to everyday sabermetric practice. After all, multiple formulas to estimate W% as a function of runs scored and runs allowed have been developed, and most of them work quite well when working with normal major league teams--well enough to make it difficult to imagine that there is any appreciable gain in accuracy to be had. Better yet, these W% estimators are fairly simple--even the most complex versions in common use, Pythagenport/pat, can be quickly tapped out on a calculator in about thirty seconds.

Given that there are powerful, relatively simple W% models already in use, why even bother to examine a model based on the estimated scoring distribution? There are three obvious reasons that come to my mind. The first is that such a model serves as a check on the others. Depending on how much confidence one has in the underlying run distribution model, it is possible that the resulting W% estimator will produce a better estimate, at least at the extremes. We know of course that some of the easier models don’t hold up well in extreme situations--linear estimators will return negative or greater than one figures at some point, and fixed Pythagorean exponents will fray at some point. While we know that Pythagenpat works at the known point of 1 RPG and appears to work well at other extreme values, it doesn’t hurt to have another way of estimating W% in those extremes to see if Pythagenpat is corroborated, or whether the models disagree. This can also serve as a check on Enby--if the results vary too much from what we expect, it may imply that Enby does not hold up well at extremes itself.

A second reason is that it’s plain fun if you like esoteric sabermetrics (and if you’re reading this blog, it’s a good bet that you do). I’ve never needed an excuse to mess around with alternative methods, particularly when it comes to W% estimators, which along with run estimators are my own personal favorite sabermetric tools.

But the third reason is the one that I want to focus on here, which is that a W% estimator based on an underlying estimate of the run distribution is from one perspective the simplest possible estimator. This may seem to be an absurd statement given all of the steps that are necessary to compute Enby estimates, let alone plugging these into a W% formula. But from a first principles standpoint, the distribution-based W% estimator is the simplest to explain, because it is defined by the laws of the game itself.

If you score no runs, you don’t win. If you score one run, you win if you allow zero runs. If you score two runs, you win if you allow either zero or one run, and on it goes ad infinitum. If at the end of nine innings you have scored and allowed an equal number of runs, you play on until there is an inning in which an unequal, greater than zero number of runs are scored. This fundamental identity is what all of the other W% estimators attempt to approximate; they sweep its mechanics under the rug by taking shortcuts. The distribution-based approach is computationally dense but conceptually easy (and correct). Of course, to bring points one and three together, the definition may be correct, but the resulting estimates are useless if the underlying model (Enby in this case) does not work.
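To make the definition concrete, here is a minimal Python sketch of the distribution-based estimator, assuming each team's probabilities of scoring k runs in nine innings are already in hand (from Enby or any other model). Team A wins outright whenever it outscores B; games tied after nine innings are resolved with a flat probability, which is an assumption of this sketch and not a model of extra innings. The function name is hypothetical.

```python
def win_pct_from_distributions(p_a, p_b, tie_win_prob=0.5):
    """Estimate Team A's W% from nine-inning run distributions.

    p_a[k] and p_b[k] are the probabilities that A and B score exactly k runs.
    Ties after nine innings are credited to A with probability tie_win_prob
    (a simplifying assumption; a fuller version would model extra innings).
    """
    win = 0.0
    tie = 0.0
    b_fewer = 0.0  # running P(B scores fewer than k runs)
    for k in range(max(len(p_a), len(p_b))):
        pa = p_a[k] if k < len(p_a) else 0.0
        pb = p_b[k] if k < len(p_b) else 0.0
        win += pa * b_fewer  # A scores k, B scores fewer than k
        tie += pa * pb       # both score exactly k
        b_fewer += pb
    return win + tie * tie_win_prob
```

With identical distributions for both teams the estimate is exactly .500, as it should be, and whatever the distributions, the result only depends on how often A's runs exceed B's plus the share of ties A is credited with.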

In order to produce our W% estimate, we first need to use Enby to estimate the scoring distribution for the two teams. This is not as simple as using the Enby parameters we have already developed based on the Tango Distribution with c = .767. Tango has found that his method produces more accurate results for two teams when c is set equal to .852 instead.

In the previous post, I walked through the computations for the Enby distribution with any c value, so this is an easy substitution to make. But why is it necessary? I don’t have a truly satisfactory answer to that question--it's trite to just assert that it works better for head-to-head matchups because of the covariance between runs scored and runs allowed, even if that is in fact the right answer.

How will modifying the control value alter the Enby distribution? All of the parameters will be affected, because all depend on the control value in one way or another. First, B and r (the latter as it is initially figured before zero modification):

VAR = RG^2/9 + (2/c - 1)*RG
r = RG^2/(VAR - RG)
B = VAR/RG - 1

When c is larger, the variance of runs scored will be smaller. We can see this by examining the equations for variance with c = .767 and .852:

VAR (.767) = RG^2/9 + 1.608*RG
VAR (.852) = RG^2/9 + 1.347*RG

This results in a larger value for r and a smaller value for B, but these parameters don’t have an intuitive baseball explanation, unlike variance. It’s difficult to explain (for me at least) why variance of a single team’s runs scored should be lower when considering a head-to-head matchup, but that’s the way it works out.

It should be noted that if the sole purpose of this exercise is to estimate W%, we don’t have to care whether the actual probability of each team scoring k runs is correct. All we need to do is have an accurate estimate of how often Team A’s runs scored are greater than Team B’s.

By increasing c, we also reduce the probability of a shutout, as can be seen from the formula for z:

z =(RI/(RI + c*RI^2))^9
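A short Python sketch that evaluates the formulas above for both control values. Algebraically, B = VAR/RG - 1 simplifies to RG/9 + 2/c - 2, so changing c shifts B by the constant 2/.767 - 2/.852 at every RG, whereas r = RG/(RG/9 + 2/c - 2) does not shift by a constant. The function name is hypothetical, and the r it returns is the initial value before the zero modification.

```python
def enby_inputs(rg, c):
    """VAR, initial r and B (before zero modification), and z for given RG and c."""
    var = rg**2 / 9 + (2 / c - 1) * rg
    r = rg**2 / (var - rg)
    b = var / rg - 1
    ri = rg / 9  # runs per inning
    z = (ri / (ri + c * ri**2)) ** 9
    return var, r, b, z


for rg in (2.25, 4.5, 9.0):
    for c in (0.767, 0.852):
        var, r, b, z = enby_inputs(rg, c)
        print(f"RG={rg:<5} c={c}: VAR={var:.3f}  r={r:.3f}  B={b:.4f}  z={z:.4f}")
```

At any RG, the larger c yields a smaller variance, a larger r, a smaller B, and a smaller shutout probability z.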

Originally, I had intended to display some graphs showing the behavior of the three parameters by RG with each choice of c, but these turned out to be not of any particular interest. I ran similar graphs earlier in the series with parameters based on the earlier variance model, and the shape of the resulting functions is quite similar. The only real visual difference when c varies is what appear to be linear shifts for r and B (the B shift is linear, the r not quite).

What might be more interesting is looking at how c shapes the estimated run distribution for a team with a given RG. I’ll look at three teams--one average (4.5 RG), one extremely low-scoring (2.25 RG), and one extremely high-scoring (9 RG). First, the 4.5 RG team:

As you may recall from earlier, Enby consistently overestimates the frequency with which a normal major league team will score 2-4 runs. Using the .852 c value exacerbates this issue; in fact, the main thing to take away from this set of graphs is that the higher c value clusters more probability around the mean, while the lower c value leaves more probability for the tails.

The 2.25 RG team:

And the 9 RG team:

Thursday, August 10, 2017

Bottoming Out

On June 5, OSU Athletic Director Gene Smith unceremoniously fired Thad Matta, the winningest men’s basketball coach in the history of the school. He did so months after the normal time to fire coaches had passed, and he did so in a way that ensured that the end of Matta’s tenure would be the dominant story in college basketball over the next week. Matta won four regular season Big Ten championships, went to two Final Fours, and was as close to universally respected and beloved by his former players as you will ever find in college basketball. He did all of this while dealing with a debilitating condition that made routine tasks like walking and taking off his shoes a major challenge; it was a side effect of a surgery performed at the university’s own hospital. OSU was coming off a pair of seasons without making the NCAA Tournament, but basketball is a sport in which a roster can get turned around in a hurry, and this author feels that Matta had more than earned another year or two in which to have the opportunity to do just that. Gene Smith felt otherwise.

On May 20, the OSU baseball team lost to Indiana 4-3 at home. This brought an end to a season in which they went 22-34, the school’s worst record since going 6-12 in 1974. They went 8-16 in the Big Ten, the worst showing since going 4-12 in 1987. The season brought Greg Beals’ seven-year record at OSU to 225-167 (.574) and his Big Ten record to 85-83 (.506). Setting aside 2008-2014, a seven-year stretch in which OSU had a .564 W% (since four of the seasons were coached by Beals), the seven-year record is OSU’s worst since 1986-1992. The seven-year stretch in the Big Ten is the worst since 1984-1990 (.486). The Buckeyes finished eleventh in the Big Ten, which in fairness wasn’t possible until the addition of Nebraska, but since the Big Ten eliminated divisions in 1988, the lowest previous conference standing had been seventh (out of 10 in 2010, out of 11 in 2014, out of 13 in 2015).

The OSU season is hardly worth recapping in detail, except to point out that baseball is such that Oregon State could go 56-6 on the year yet have one of those losses come to the Buckeyes (February 24, 6-1; the Beavers won a rematch 5-1 two days later). The other noteworthy statistical oddity is that in eight Big Ten series, Ohio won just one (2-1 at Penn State). They were swept once (home against Minnesota) and the other six were all 2-1 in favor of the opposition. The top eight teams in the conference qualify for the tournament; OSU finished four games out of the running, eliminated even before the final weekend.

The Buckeyes’ .393 overall W% and .412 EW% were both eleventh of thirteen Big Ten teams (the forces of darkness led at .724 and .748 respectively), and their .463 PW% was eighth (again, the forces of darkness led with .699). OSU was twelfth with 5.07 R/G and tenth with 6.05 RA/G, although Bill Davis Stadium is a pitcher’s park and those are unadjusted figures. OSU’s .659 DER was last in the conference.

None of this was surprising; OSU lost a tremendous amount of production from 2016, which was Beals’ most successful team, notching his only championship (Big Ten Tournament) and NCAA appearance. With individual exceptions, outside of the 2016 draft class, Beals has failed to recruit and develop talent, often patching his roster with copious amounts of JUCO transfers rather than underclassmen developed in the program. Never was this more acute than in 2017. None of this is meant to be an indictment of the players, who did the best they could to represent their school. It is not their fault that the coach put them in situations that they couldn’t handle or weren’t ready for.

Sophomore catcher Jacob Barnwell had a solid season, hitting .254/.367/.343 for only -1 RAA; his classmate and backup Andrew Fishel only got 50 PA but posted a .400 OBA. First base/DH was a real problem position, as senior Zach Ratcliff was -8 RAA and JUCO transfer junior Bo Coolen chipped in -6; both had secondary averages well below the team average. Noah McGowan, another JUCO transfer, started at second (and got time in left as well), with -3 RAA in 162 PA before getting injured. True freshman Noah West followed him into the lineup, but a lack of offense (.213/.278/.303 in 105 PA) gave classmate Connor Pohl a shot. Pohl is 6’5” and his future likely lies at third, but his bat gave a boost to the struggling offense (.325/.386/.450 in 89 PA).

Senior Jalen Washington manned shortstop and acquitted himself fine defensively and at the plate (.266/.309/.468), and was selected by San Diego in the 28th round. Sophomore third baseman Brady Cherry did not build on the power potential his freshman year seemed to show: he hit four homers, matching his 2016 total despite 82 more PA. His overall performance (.260/.333/.410) was about average (-2 RAA).

Outfield was definitely the bright spot for the offense, despite getting little production out of JUCO transfer Tyler Cowles (.190/.309/.314 in 129 PA). Senior Shea Murray emerged from a pitching career marred by injuries to provide adequate production and earn the left field job (.252/.331/.449, 0 RAA) and was drafted in the 18th round by Pittsburgh, albeit as a pitcher. Junior center fielder Tre’ Gantt was the team MVP, hitting .314/.426/.426, leading the team with 18 RAA, and was drafted in the 29th round by Cleveland. True freshman right fielder Dominic Canzone was also a key contributor, challenging for the Big Ten batting average lead (.343/.398/.458 for 8 RAA).

On the mound, OSU never even came close to establishing a starting rotation due to injuries and ineffectiveness. Nine pitchers started a game, and only one of them had greater than 50% of his appearances as a starter. That was senior Jake Post, who went 1-7 over 13 starts with a 6.41 eRA. Sophomore lefty Connor Curlis was most effective, starting eight times for +3 RAA with 8.3/2.7 K/W. He tied for the team lead in innings with classmate Ryan Feltner, who was -13 RAA with a 6.71 eRA. Junior Yianni Pavloupous, the closer a year ago, was -10 RAA over 40 innings between both roles. Junior Adam Niemeyer missed time with injuries, appearing in just ten games (five starts) for -3 RAA over 34 innings. Freshman Jake Vance was rushed into action and allowed 20 runs and walks in 26 innings (-4 RAA). And JUCO transfer Reece Calvert gave up a shocking 39 runs in 39 innings.

I thought the bullpen would be the strength of the team before the season. In the case of Seth Kinker, I was right. The junior slinger was terrific, pitching 58 innings (21 relief appearances, 3 starts) and leading the team by a huge margin with 13 RAA (8.4/2.0 K/W). But the rest of the bullpen was less effective. Junior Kyle Michalik missed much of the season with injuries and wasn’t that effective when on the mound (6.85 RA and just 4.8 K/9 over 22 innings). Senior Joe Stoll did fine in the LOOGY role, something Beals has brought to OSU, with 3 RAA in 23 innings over 25 appearances. Junior Austin Woodby had a 6.00 RA over 33 innings but deserved better with a 4.79 eRA and 5.5/1.8 K/W. The only other reliever to work more than ten innings was freshman sidearmer Thomas Waning (3 runs, 11 K, 4 W over 12 innings). Again, it’s hard to describe the roles because almost everyone was forced to both start and relieve.

It’s too early to hazard a prognosis for 2018, but given the lack of promising performances from young players, it’s hard to be optimistic. What remains to be seen is whether Smith’s ruthlessness can be transferred from coaches who do not deserve it to those who have earned it in spades. No, baseball is not a revenue sport, and no, baseball is not bringing the athletic department broad media exposure. But when properly curated, the OSU baseball program is a top-tier Big Ten program, with the potential to make runs in the NCAA Tournament, and bring in more revenue than most of the “other” 34 programs that are not football or men’s basketball. Neglected in the hands of a failed coach, it is capable of putting up a .333 W% in conference play. Smith, not Beals, is the man who will most directly impact the future success of the program.

Wednesday, July 12, 2017

Enby Distribution, pt. 3: Enby Distribution Calculator

At this point, I want to re-explain how to use the Enby distribution, step-by-step. While I already did this in part 6 of the original series, I now have the new variance estimator as found by Alan Jordan to plug in, and so to avoid any confusion and to make this easy if anyone ever wants to implement it themselves, I will recount it all in one location. I will also re-introduce a spreadsheet that you can use to estimate the probability of scoring X runs based on the Enby distribution.

Step 1: Estimate the variance of runs scored per game (VAR) as a function of mean runs/game (RG):

VAR = RG^2/9 + (2/c - 1)*RG
where c is the control value from the Tango Distribution. For normal applications, we’ll assume that c = .767.

Step 2: Use the mean and variance to estimate the parameters (r and B) of the negative binomial distribution:

r = RG^2/(VAR - RG)
B = VAR/RG - 1

B will be retained as a parameter for the Enby distribution.

Step 3: Find the probability of zero runs scored as estimated by the negative binomial distribution (we’ll call this value a):

a = (1 + B)^(-r)

Step 4: Use the Tango Distribution to estimate the probability of being shutout. This will become the Enby distribution parameter z:

z =(RI/(RI + c*RI^2))^9
where RI is runs/inning, which we’ll estimate as RG/9.

Step 5: Use trial and error to estimate a new value of r given the modified value at zero. B and z will stay constant, but r must be chosen so as to ensure that the correct mean RG is returned by the Enby distribution. Use the following formula to estimate the probability of k runs scored per game using the non-modified negative binomial distribution:

q(0) = a
q(k) = (r)(r + 1)(r + 2)(r + 3)…(r + k - 1)*B^k/(k!*(1 + B)^(r + k)) for k >=1

Then modify by taking:

p(0) = z
p(k) = (1 - z)*q(k)/(1 - a) for k >= 1

The mean is calculated as:

mean = sum (from k = 1 to infinity) of (k*p(k)) = p(1) + 2*p(2) + 3*p(3) + ...

Now you have the parameters r, B, and z and the probability of scoring k runs in a game.
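The five steps can be sketched in Python as follows. This is an illustration under assumptions, not the spreadsheet's implementation: the trial and error of Step 5 is done by bisection on r, and the mean is evaluated with the closed form (1 - z)*r*B/(1 - a), which follows from the sum above because the unmodified negative binomial has mean r*B. q(k) is built recursively from q(k - 1), which is equivalent to the product formula in Step 5. The function name `enby` and the truncation point `kmax` are my own choices.

```python
def enby(rg, c=0.767, kmax=30):
    """Sketch of the Enby steps: returns r, B, z and [P(0), ..., P(kmax)]."""
    # Step 1: variance of runs per game from the Tango-based estimator
    var = rg**2 / 9 + (2 / c - 1) * rg
    # Step 2: B from the plain negative binomial fit (kept as an Enby parameter)
    b = var / rg - 1
    # Step 4: shutout probability z from the Tango Distribution, with RI = RG/9
    ri = rg / 9
    z = (ri / (ri + c * ri**2)) ** 9

    def enby_mean(r):
        a = (1 + b) ** (-r)  # Step 3: unmodified P(0)
        # sum of k*q(k) for k >= 1 is the negative binomial mean r*B
        return (1 - z) * r * b / (1 - a)

    # Step 5: search r by bisection until the Enby mean equals RG
    lo, hi = 1e-9, 1.0
    while enby_mean(hi) < rg:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if enby_mean(mid) < rg:
            lo = mid
        else:
            hi = mid
    r = (lo + hi) / 2

    # q(k) = q(k-1) * (r + k - 1)/k * B/(1 + B), then apply the zero modification
    a = (1 + b) ** (-r)
    q = a
    probs = [z]
    for k in range(1, kmax + 1):
        q *= (r + k - 1) / k * b / (1 + b)
        probs.append((1 - z) * q / (1 - a))
    return r, b, z, probs


# Usage: P(x), P(<= x) and P(> x) for a 4.5 RG team
r, b, z, probs = enby(4.5)
cumulative = 0.0
for x, p in enumerate(probs[:11]):
    cumulative += p
    print(f"{x:2d}  P(x)={p:.4f}  P(<=x)={cumulative:.4f}  P(>x)={1 - cumulative:.4f}")
```

Bisection works here because the Enby mean rises steadily with r while B and z are held fixed, so any bracket that straddles the target RG converges to the single r that matches it.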

I previously published a spreadsheet that provided the approximate Enby distribution parameters at each .05 increment of RG between 3 and 7. The link below will take you to an updated version of this calculator. It is updated in two ways: first, the Tango Distribution estimate of variance developed by Alan Jordan is used as in the example above. Secondly, I have added lines for RG levels between 0-3 and 7-15 RG (at intervals of .25). Previously, you could enter in any value between 3-7 RG and the calculator would round it to the nearest .05; now I’m going to make you enter a legitimate value yourself or accept whatever vlookup() gives you.

P(x) is the probability of scoring x runs in a game, P(<= x) is the probability of scoring that many or fewer, and P(> x) is the probability of scoring more than x runs.

Enby Calculator

Tuesday, June 20, 2017

Enby Distribution, pt. 2: Revamping the Variance Estimate

All models are approximations of reality, but some are more useful than others. The notion of being able to estimate the runs per game distribution cleanly in one algorithm (rather than patching together runs per inning distributions or using simulators) is one that can be quite useful in estimating winning percentage or trying to distinguish between the effectiveness of team offense beyond simply noting their runs scored total. I’d argue that a runs per game distribution is a fundamentally useful tool in classical sabermetrics.

However, while such a model would be useful, Enby as currently constructed falls well short of being an ideal tool. There are a few major issues:

1) It is not mathematically feasible to solve directly for the parameters of a zero-modified negative binomial distribution, which forces me to use trial and error to estimate Enby coefficients. In doing so, the distribution is no longer able to exactly match the expected mean and variance--instead, I have chosen to match the mean precisely, and hope that the variance is not too badly distorted.

2) The variance that we should expect for runs per game at any given level of average R/G is itself unknown. I developed a simple formula to estimate variance based on some actual team data, but that formula is far from perfect and there’s no particular reason to expect it to perform well outside of the R/G range represented by the data from which it was developed.

3) An issue with run distribution models found by Tom Tango in the course of his research on runs per inning distribution is that the optimal fit for a single team’s distribution may not return optimal results in situations in which two teams are examined simultaneously (such as using the distribution to model winning percentage). One explanation for this phenomenon is the covariance between runs scored and runs allowed in a given game, due to either environmental or strategic causes.

I have recently attempted to improve the Enby distribution by focusing on these obvious flaws. Unfortunately, my findings were not as useful as I had hoped they would be, but I would argue (hope?) that they represent at least small progress in this endeavor.

During the course of writing the original series on this topic, I was made aware of work being done by Alan Jordan, who was developing a spreadsheet that used the Tango Distribution to estimate scoring distributions and winning percentage. One of the underpinnings was that he found (or found work by Darren Glass and Phillip Lowry that demonstrated) that the variance of runs scored per inning as predicted by the Tango Distribution could be calculated as follows (where RI = runs per inning and c is the Tango Distribution constant):

Variance (inning) = RI*(2/c + RI - 1) = RI^2 + (2/c - 1)*RI

Assuming independence of runs per inning (this is a necessary assumption to use the Tango Distribution to estimate runs per game), the variance of runs per game will simply be nine times the variance of runs per inning (assuming of course that there are precisely nine innings per game, as I did in estimating the z parameter of Enby from the Tango Distribution). If we attempt to simplify this further by assuming that RI = RG/9, where RG = runs per game:

Variance (game) = 9*(RI^2 + (2/c - 1)*RI) = 9*((RG/9)^2 + (2/c - 1)*RG/9) = RG^2/9 + (2/c - 1)*RG

The traditional value of c used to estimate runs per inning for one team is .767, so if we substitute that for c, we wind up with:

Variance (game) =1.608*RG + .111*RG^2

When I worked on this problem previously, I did not have any theoretical basis for an estimator of variance as a function of RG, so I experimented with a few possibilities and found what appeared to be a workable correlation between mean RG and the ratio of variance to mean. I used linear regression on a set of actual team data (1981-1996) and wound up with an equation that could be written as:

Variance (game) = 1.43*RG + .1345*RG^2

Note the similarities between this equation and the equation based on the Tango Distribution - they both take the form of a quadratic equation less the constant (I purposefully avoided constants in developing my variance estimator so as to avoid unreasonable results at zero and near-zero RG). The coefficients are somewhat different, but the form of the equation is identical.

On one hand, this is wonderful for me, because it vindicates my intuition that this was a reasonable way to estimate variance. On the other hand, this is very disappointing, because I had hoped that Jordan’s insight would allow me to significantly improve the variance estimate. Instead, any gains to be had here are limited to improving the equation by using a more theoretical basis to estimate its coefficients, but there is no change in the form of this equation.

In fact, any revision to the estimator will reduce accuracy over the 1981-96 sample that I am using, since the linear regression already found optimal coefficients for this particular dataset. This by no means should be taken as a claim on my part that the regression-based equation should be used rather than the more theoretically-grounded Tango Distribution estimate, simply an observation that any improvement will not show up given the confines of the data I have at hand.
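The regression can be sketched as an ordinary least-squares fit of the variance/mean ratio on RG; multiplying the fitted line by RG yields a variance estimator with no constant term. This assumes the ratio was regressed on RG with an intercept, which is how I read the description above. The 1981-96 team data are not reproduced here, so the check below uses synthetic, noise-free values generated from the published equation, purely to show that the fit recovers its coefficients.

```python
def fit_ratio_line(rg_values, variances):
    """OLS of (variance / mean) on mean RG: ratio = a + b*RG.

    Multiplying through by RG gives Variance = a*RG + b*RG^2 (no constant term).
    """
    ratios = [v / m for m, v in zip(rg_values, variances)]
    n = len(rg_values)
    mean_x = sum(rg_values) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in rg_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rg_values, ratios))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic check only (NOT the 1981-96 data): variances built from the published
# equation, so a correct fit should return its coefficients.
rg = [3.5, 4.0, 4.5, 5.0, 5.5]
var = [1.43 * x + 0.1345 * x ** 2 for x in rg]
a, b = fit_ratio_line(rg, var)
print(round(a, 4), round(b, 4))  # 1.43 0.1345
```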

What about data from outside that sample? I have easy access to the four seasons from 2009-2012. In these seasons, major league teams averaged 4.401 runs per game and the variance of runs scored per game was 9.373. My equation estimates the variance should be 8.90, while the Tango-based formula estimates 9.23. In this case, we could get a near-precise match by using c = .757.
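Both estimators, and the value of c that would reproduce the observed 2009-2012 variance exactly, are easy to compute. A minimal sketch; the function names are illustrative, and the c solver simply inverts Variance = (2/c - 1)*RG + RG^2/9 for c.

```python
def empirical_variance(rg):
    """Original estimator from the 1981-96 regression."""
    return 1.43 * rg + 0.1345 * rg ** 2

def tango_variance(rg, c=0.767):
    """Tango-Distribution-based estimator, nine independent innings."""
    return (2 / c - 1) * rg + rg ** 2 / 9

def c_matching_variance(rg, observed_var):
    """Solve observed_var = (2/c - 1)*rg + rg^2/9 for c."""
    return 2 / (observed_var / rg - rg / 9 + 1)

rg, observed = 4.401, 9.373  # 2009-2012 MLB figures quoted above
print(f"{empirical_variance(rg):.2f} {tango_variance(rg):.2f} "
      f"{c_matching_variance(rg, observed):.3f}")  # 8.90 9.23 0.757
```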

While we know how accurate each estimator is with respect to variance for this case, what happens when we put Enby to use to estimate the run distribution? The Enby parameters for 4.40 RG using my original equation are (B = 1.0218, r = 4.353, z = .0569). If we instead use the Tango estimated variance of 9.23, the parameters become (B = 1.0970, r = 4.041, z = .0569). With that, we can calculate the estimated frequencies of X runs scored using each estimator and compare to the empirical frequencies from 2009-2012:

Eyeballing this, the Tango-based formula is closer for one run, but exacerbates the recurring issue of over-estimating the likelihood of two or three runs. It makes up for this by providing a better estimate at four and five runs, but a worse estimate at six. After that the two are similar, although the Tango estimate provides for more probability in the tail of the distribution, which in this case is consistent with empirical results.
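Estimated frequencies like those discussed above can be generated from the (B, r, z) parameters with a short script. This post does not spell out the zero modification, so the form below is an assumption: P(0) = z, and the plain negative binomial probabilities for one or more runs are rescaled to share the remaining 1 - z. I chose it because it reproduces a mean of about 4.40 R/G for both parameter sets quoted above. z is taken as given here rather than derived from the Tango Distribution.

```python
import math

def nb_pmf(k, r, B):
    """Negative binomial P(k) in the (r, B) form used in this series (mean r*B)."""
    log_p = (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
             + k * math.log(B) - (r + k) * math.log(1 + B))
    return math.exp(log_p)

def enby_pmf(k, r, B, z):
    """ASSUMED zero modification: P(0) = z, and the plain NB probabilities for
    k >= 1 are rescaled so that together they sum to 1 - z."""
    if k == 0:
        return z
    return (1 - z) * nb_pmf(k, r, B) / (1 - nb_pmf(0, r, B))

def enby_mean(r, B, z, max_runs=60):
    """Mean runs per game, ignoring the negligible tail beyond max_runs."""
    return sum(k * enby_pmf(k, r, B, z) for k in range(max_runs + 1))

# Parameter sets quoted above for 4.40 R/G
original = dict(r=4.353, B=1.0218, z=0.0569)  # variance from 1.43*RG + .1345*RG^2
tango = dict(r=4.041, B=1.0970, z=0.0569)     # variance from the Tango-based formula

print(" k  original   tango")
for k in range(11):
    print(f"{k:2d}  {enby_pmf(k, **original):.4f}   {enby_pmf(k, **tango):.4f}")
print(f"means: {enby_mean(**original):.2f}, {enby_mean(**tango):.2f}")  # 4.40, 4.40
```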

For now, I will move on to another topic, but I will eventually be coming back to this form of the Tango-based variance estimate, re-estimating the parameters for 3-7 RG, and providing an updated Enby calculator, as I do feel that there are distinct advantages to using the theoretical coefficients of the variance estimator rather than my empirical coefficients.

Tuesday, May 09, 2017

Enby Distribution, pt. 1: Pioneers

A few years ago, I attempted to demonstrate that one could do a decent job of estimating the distribution of runs scored per game by using the negative binomial distribution, particularly a zero-modified version given the propensity of an unadulterated negative binomial distribution to underestimate the probability of a shutout. I dubbed this modified distribution Enby.

I’m going to be re-introducing this distribution and adopting a modification to the key formula in this series, but I wanted to start by acknowledging that I am not the first sabermetrician to apply the negative binomial distribution to the distribution of runs per game. To my knowledge, a zero-modified negative binomial distribution had not been implemented prior to Enby, and while the zero-modification is a significant improvement to the model, it would be disingenuous not to acknowledge and provide an overview of the two previous efforts using the negative binomial distribution of which I am aware.

I acknowledged one of these in the original iteration of this series, but inadvertently overlooked the first. In the early issues of Bill James’ Baseball Analyst newsletter, Dallas Adams published a series of articles on run distributions, ultimately developing an unwieldy formula I discussed in the linked post. What I overlooked was an article in the August 1983 edition in which the author noted that while the Poisson distribution worked for hockey, it would not work for baseball because the variance of runs per game is not equal to the mean, but rather is twice the mean. But a "modified Poisson" distribution provided a solution.

The author of the piece? Pete Palmer. Palmer is often overlooked to an undue extent when sabermetric history is recounted. While one could never omit Palmer from such a discussion, his importance is often downplayed. But the sheer volume of methods that he developed or refined is such that I have no qualms about naming him the most important technical sabermetrician by a wide margin. Park factors, run to win converters, linear weights, relative statistics, OPS (for better or worse), the construct of an overall metric by adding together runs above average in various discrete components of the game...these were all either pioneered or greatly improved by Palmer. And while it is not nearly as widespread in use as his other innovations, you can add applying the negative binomial distribution to runs per game to the list.

Palmer says that he learned about this “modified Poisson” in a book called Facts From Figures by Moroney. The relevant formulas were:

Mean (u) = p/c
Variance (v) = u + u/c
p(0) = (c/(1 + c))^p
p(1) = p(0)*p/(1 + c)
p(2) = p(1)*(p + 1)/(2*(1 + c))
p(3) = p(2)*(p + 2)/(3*(1 + c))
p(n) = p(0)*p*(p + 1)*(p + 2)*...*(p + n - 1)/(n!*(1 + c)^n)
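Palmer's formulas are naturally recursive: each p(n) is the previous term times (p + n - 1)/(n*(1 + c)). A minimal sketch of that recursion follows. The values p = 4.5 and c = 1 are illustrative choices, not taken from Palmer's article; with c = 1 the variance u + u/c is exactly twice the mean, matching the rule of thumb Palmer cited.

```python
def palmer_modified_poisson(p, c, max_n=20):
    """Run probabilities from the "modified Poisson" formulas quoted above.

    p(0) = (c/(1 + c))^p, then each term follows from the previous one:
    p(n) = p(n-1) * (p + n - 1) / (n * (1 + c)).
    """
    probs = [(c / (1 + c)) ** p]
    for n in range(1, max_n + 1):
        probs.append(probs[-1] * (p + n - 1) / (n * (1 + c)))
    return probs

# Illustrative values: mean u = p/c = 4.5, variance u + u/c = 9
p, c = 4.5, 1.0
probs = palmer_modified_poisson(p, c, max_n=100)
mean = sum(n * q for n, q in enumerate(probs))
variance = sum(n * n * q for n, q in enumerate(probs)) - mean ** 2
print(f"p(0) = {probs[0]:.4f}, mean = {mean:.3f}, variance = {variance:.3f}")
```

This prints p(0) = 0.0442, mean = 4.500, variance = 9.000, consistent with the mean and variance formulas in the list.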

The text that I used renders the negative binomial distribution as:

p(k) = (1 + B)^(-r) for k = 0
p(k) = (r)(r + 1)(r + 2)(r + 3)…(r + k - 1)*B^k/(k!*(1 + B)^(r + k)) for k >=1
mean (u) = r*B
variance(v) = r*B*(1 + B)

You may be forgiven for not immediately recognizing these two as equivalent; I did not at first glance. But if you recognize that r = p and B = 1/c, then you will find that the mean and variance equations are equivalent and that the formulas for each n or k depending on the nomenclature used are equivalent as well.
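The equivalence is easy to confirm numerically: compute Palmer's closed-form p(n) and the negative binomial p(k) term by term with r = p and B = 1/c and compare them. A minimal sketch; p = 4.5 and c = 1.25 are arbitrary illustrative values, and math.prod requires Python 3.8 or later.

```python
import math

def palmer_pmf(n, p, c):
    """Palmer's p(n) = p(0) * p(p+1)...(p+n-1) / (n! (1+c)^n), p(0) = (c/(1+c))^p."""
    rising = math.prod(p + i for i in range(n))  # empty product is 1 when n = 0
    return (c / (1 + c)) ** p * rising / (math.factorial(n) * (1 + c) ** n)

def nb_pmf(k, r, B):
    """Negative binomial: r(r+1)...(r+k-1) B^k / (k! (1+B)^(r+k))."""
    rising = math.prod(r + i for i in range(k))
    return rising * B ** k / (math.factorial(k) * (1 + B) ** (r + k))

p, c = 4.5, 1.25  # illustrative values
r, B = p, 1 / c   # the mapping r = p, B = 1/c

for k in range(10):
    assert math.isclose(palmer_pmf(k, p, c), nb_pmf(k, r, B))
assert math.isclose(p / c, r * B)                          # means agree
assert math.isclose(p / c + (p / c) / c, r * B * (1 + B))  # variances agree
print("Palmer's formulas match the negative binomial for k = 0..9")
```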

So Palmer was positing the negative binomial distribution to model runs scored. He noted that the variance of runs per game is about two times the mean, which is true. In my original Enby implementation, I estimated variance as 1.430*mean + .1345*mean^2, which for the typical mean value of around 4.5 R/G works out to an estimated variance of 9.159, which is 2.04 times the mean. Of course, the model can be made more accurate by allowing the ratio of variance to mean to deviate from two.
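Under the original Enby estimator, the variance/mean ratio is simply 1.430 + .1345*RG, so it drifts with the run environment rather than sitting at two. A small sketch; the sample RG values are illustrative.

```python
def original_enby_variance(rg):
    """Original Enby variance estimator: 1.430*RG + .1345*RG^2."""
    return 1.430 * rg + 0.1345 * rg ** 2

for rg in (3.5, 4.5, 5.5):
    var = original_enby_variance(rg)
    print(f"{rg:.1f} R/G: variance {var:.3f}, ratio {var / rg:.2f}")
```

At 4.5 R/G this prints a variance of 9.159 and a ratio of 2.04, the figures quoted above; the ratio is about 1.90 at 3.5 R/G and 2.17 at 5.5 R/G.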

The second use of the negative binomial distribution to model runs per game of which I am aware was implemented by Phil Melita. Mr. Melita used it to estimate winning percentage and sent me a copy of his paper (over a decade ago, which is profoundly disturbing in the existential sense). Unfortunately, I am not aware of the paper ever being published so I hesitate to share too much from the copy in my possession.

Melita’s focus was on estimating W%, but he did use the negative binomial distribution to look at the run distribution in isolation as well. Unfortunately, I had forgotten his paper when I started messing around with various distributions that could be used to model runs per game; when I tried the negative binomial and got promising results, I realized that I had seen it before.

So as I begin this update of what I call Enby, I want to be very clear that I am not claiming to have “discovered” the application of the negative binomial distribution in this context. To my knowledge, zero-modification is a new (to sabermetrics) twist on the negative binomial, but it is obviously a relatively minor one compared to the more important task of finding a suitable distribution to use. So if you find that my work in this series has any value at all, remember that Pete Palmer and Phil Melita deserve much of the credit for first applying the negative binomial distribution to runs scored per game.