Walk Like a Sabermetrician: November 2013

Thursday, November 21, 2013

Statistical Meanderings 2013

Below are my annual observations from perusing the end-of-season stats I post on this blog. They are generally nuggets that I find interesting or amusing rather than an attempt at serious analysis, and they should be taken in that light. You’ll notice a bit of an Indians bias in terms of what I found interesting:

* Only one team in MLB finished with between 79 and 84 wins, which seems rather remarkable: the 81-81 Diamondbacks. With the range widened to an even three wins on either side of .500 (78-84, or more appropriately a W% between .481 and .519), there were two teams in it (the Angels were 78-84). The last time there were two or fewer teams in this range was 1994, which of course was a strike-shortened season. Prior to that, one must go back to 1978, 1969, and 1967 (with 26, 24, and 20 teams in the majors, respectively). The last time only one major league team fell in that range was 1965, when the Cardinals were 80-81 while the Phillies were 85-76 and the Yankees were 77-85. The last season with no teams in this range was 1937. There were a whopping ten teams in this range in 1991.

Obviously the range I’ve chosen has no special significance, and there are more rigorous ways one could measure the lack of centrality in 2013 team records.
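The band test above can be sketched in a few lines. This is a minimal illustration, assuming the band is defined in W% as .500 plus or minus 3/162 (the .481 to .519 window), so that seasons of other lengths, like the 161-game 1965 Cardinals, are judged on the same scale. `in_band` is a hypothetical helper written for illustration.

```python
from fractions import Fraction

def in_band(wins: int, losses: int, half_width_wins: int = 3, season_games: int = 162) -> bool:
    """True if a team's W% is within half_width_wins / season_games of .500 (inclusive)."""
    # Fractions keep the boundary exact: 78-84 sits precisely 3/162 below .500.
    wpct = Fraction(wins, wins + losses)
    return abs(wpct - Fraction(1, 2)) <= Fraction(half_width_wins, season_games)

# Records quoted in the text
print(in_band(81, 81), in_band(78, 84))                    # True True  (2013 Diamondbacks, Angels)
print(in_band(80, 81), in_band(85, 76), in_band(77, 85))   # True False False  (1965 Cardinals, Phillies, Yankees)
```

Using exact fractions rather than floats matters only at the edges, but a 78-84 team sits exactly on the boundary, so it is worth getting right.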

* No sub-.500 team had an EW% (based on runs scored and allowed) or PW% (based on runs created and runs created allowed) above .500. The Angels were the closest in both (.481 W% with .497 EW% and .497 PW%). Only the Yankees managed a winning record with an EW% or PW% below .500 (.525 W%, .485 EW%, .446 PW%). The RMSE of EW% (Pythagenpat) as a predictor of W% was 3.66, which is definitely lower than the long-term average, although I’ve never looked at the annual breakouts closely enough to tell you whether it’s unusually low.
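For readers who want to reproduce EW% and an RMSE figure, here is a sketch of Pythagenpat. It assumes the common form in which the exponent is x = RPG^z, with RPG = (R + RA)/G and z = 0.29; the exact z behind the numbers above isn't stated here, and the function names and the `teams` tuple layout are hypothetical. The RMSE is reported in wins (W% error times games), one common convention.

```python
import math

def pythagenpat_wpct(runs: float, runs_allowed: float, games: int, z: float = 0.29) -> float:
    """Expected W% via Pythagenpat: exponent x = RPG ** z, where RPG = (R + RA) / G."""
    rpg = (runs + runs_allowed) / games
    x = rpg ** z
    return runs ** x / (runs ** x + runs_allowed ** x)

def rmse_wins(teams) -> float:
    """Root mean squared error, in wins, of Pythagenpat EW% as a predictor of actual wins.

    teams: iterable of (wins, losses, runs_scored, runs_allowed) tuples.
    """
    squared_errors = []
    for wins, losses, runs, runs_allowed in teams:
        games = wins + losses
        expected_wins = pythagenpat_wpct(runs, runs_allowed, games) * games
        squared_errors.append((wins - expected_wins) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```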

* Atlanta had a 56-25 record at home (thanks largely to just 2.96 RA/G at home), which makes one wonder why they’d want to tear Turner Field down; their .690 mark has been matched or exceeded in the last five years only by the 2009 Yankees and Red Sox, 2010 Braves, and 2011 Brewers. On the other hand, Houston was 24-57 at home, tied for fifth-worst since 1961.

The flip side is that Atlanta was the only playoff team (for the sake of this post, I’m counting the two wildcard losers as playoff teams, which I know sets some people off) with a losing road record (Tampa Bay and Cincinnati were both a win better at 41-41). The Mets’ W% was .099 better on the road than at home (they actually had a winning road record at 41-40 but were just 33-48 at home). That was the biggest discrepancy in favor of the road since the 2011 Mets, and in the non-Mets category since the 2002 Red Sox.
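The home/road gap quoted for the Mets is just a difference of two winning percentages. A quick check using only the records given above (the helper names are illustrative):

```python
def wpct(wins: int, losses: int) -> float:
    """Winning percentage."""
    return wins / (wins + losses)

def road_minus_home(home: tuple, road: tuple) -> float:
    """Road W% minus home W%; positive means the team played better on the road."""
    return wpct(*road) - wpct(*home)

mets_gap = road_minus_home(home=(33, 48), road=(41, 40))
print(round(mets_gap, 3))  # 0.099
```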

* I always like to look at the playoff teams by runs above average on offense and defense (park-adjusted and just based on runs per game to keep it simple). This often gives me an opportunity to snark about the usual nonsense about pitching being paramount, and this year is no exception.

Note that I’m not making the opposite argument.
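The offense/defense split can be sketched as park-adjusted runs per game compared with the league, scaled back up to a season. This is an assumption about the mechanics: it treats the park factor as a multiplier on runs (above 1.00 for hitter-friendly parks) and divides a team's R/G and RA/G by it before comparing to the league average; the exact adjustment used in the original comparison isn't spelled out here, and the function names are hypothetical.

```python
def offense_raa(runs: float, games: int, park_factor: float, lg_rpg: float) -> float:
    """Offensive runs above average: (park-adjusted R/G - league R/G) * G."""
    return (runs / games / park_factor - lg_rpg) * games

def defense_raa(runs_allowed: float, games: int, park_factor: float, lg_rpg: float) -> float:
    """Defensive runs above average: (league R/G - park-adjusted RA/G) * G.

    Positive values mean the team allowed fewer runs than average.
    """
    return (lg_rpg - runs_allowed / games / park_factor) * games
```

For example, a hypothetical team scoring 5.00 and allowing 4.00 runs per game in a neutral park, in a 4.33 R/G league, comes out about +109 on offense and +53 on defense.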

* It was probably never a great idea to lump teams into sabermetric and non-sabermetric front office buckets, or assume that the sabermetric front offices would surely produce teams with higher secondary averages, and it’s even sillier to attempt that now. Still, I find it satisfying on some level that the top four teams in secondary average were Oakland, Tampa Bay, Boston, and Cleveland.
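Secondary average comes up repeatedly in these posts (including the ISO plus walk-rate breakdown for Votto and Goldschmidt). Below is a sketch using Bill James' commonly published definition; whether the figures here count hit batsmen or treat intentional walks differently isn't stated, so treat it as an approximation of the version in these stats.

```python
def secondary_average(ab: int, h: int, tb: int, bb: int, sb: int = 0, cs: int = 0) -> float:
    """Bill James' secondary average: (TB - H + BB + SB - CS) / AB."""
    return (tb - h + bb + sb - cs) / ab

def isolated_power(ab: int, h: int, tb: int) -> float:
    """Extra bases per at bat, i.e. SLG - BA."""
    return (tb - h) / ab
```

Without steals, SEC is simply ISO plus walks per at bat, which is the decomposition used when comparing Votto and Goldschmidt.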

* Drew Smyly ranked tenth among AL relievers in RAR, with excellent peripherals to back it up. I didn’t realize this, and based on how he deployed Smyly in the playoffs, neither did Jim Leyland.

* Relievers are a little hard to keep track of due to their somewhat fungible nature and the bloated size of modern bullpens; at any given moment there are roughly 210 full-time relievers in the majors. I watch enough MLB Network and enough games of teams around the league, and read enough box scores, to be reasonably familiar with all major league players, but every year without fail there are a couple of relievers on the list about whom I have no useful knowledge. The highest-ranking such reliever in RAR this year was Seattle rookie Yoervis Medina, who was 23rd in the AL with 14.

* Brandon McCarthy had something of a disappointing season after signing with Arizona, pitching 135 innings with a 4.68 RRA for 7 RAR. I was amused last offseason, though, that McCarthy signed for 2/$15.5MM while another free agent Brandon signed with the Dodgers for 3/$22.5MM. To the surprise of just about no one, McCarthy was still a much better value than League, who ranked dead last among NL relievers in RAR and strikeout rate (-16 RAR thanks to a 7.11 RRA with a 4.4 KG).

* In the celebration of Ben Cherington’s makeover of the Red Sox that followed their World Series triumph, one move was conveniently glossed over. In pointing this out, I don’t mean to suggest that Cherington was not worthy of praise or that perfection is a reasonable goal. But the Joel Hanrahan trade made little sense to me when it was made, and since Mark Melancon, who went to Pittsburgh in the deal, was one of the NL’s best relievers, it looks much worse in retrospect.

* It’s once again time to play: Which Yankee Reliever Whose Name Begins With R Is It?

* Jose Mijares had one of the largest gaps between his eRA (estimated RA based on opponents’ runs created) and dRA (DIPS-style estimated RA) that you’ll ever see: 6.53 and 3.57, respectively. This was driven by an eye-popping .428 %H. Granted, he faced only around 230 hitters, but that jumps off the page.

* Q: What do Aroldis Chapman, Craig Kimbrel, Kenley Jansen, Jason Grilli, Trevor Rosenthal, Kevin Siegrist, Jim Henderson, Francisco Rodriguez, Manny Parra, Blake Parker, David Carpenter, Jordan Walden, Paco Rodriguez, Pedro Strop, Nick Vincent, Tyler Clippard, Rex Brothers, Steve Cishek, Carlos Marmol, Antonio Bastardo, Sam LeCure, Mike Gonzalez, Mike Dunn, David Hernandez, AJ Ramos, Heath Bell, Mark Melancon, Jake Diekman, JJ Hoover, Tony Sipp, Luke Gregerson, Will Harris, Sergio Romo, Jean Machi, Dale Thayer, Javier Lopez, Adam Ottavino, Jose Mijares, Alex Wood, Tom Gorzelanny, Logan Ondrusek, and Craig Stammen have in common?

A: They were all NL relievers with higher strikeout rates than Jonathan Papelbon. That Papelbon’s KG was 8.6 speaks volumes about the current environment.

* Paul Clemens appeared in 35 games for Houston with a 5.13 RRA over 73 innings for -3 RAR. His peripherals were worse (6.12 eRA, 6.27 dRA, 5.8 KG, 3.1 WG). My honest question: could Roger Clemens have done better?

Speaking more generally of Houston’s bullpen, it had an eRA of 5.76, 1.04 runs higher than the second-worst bullpen (PHI) and 40% higher than the AL average of 4.11. For comparison, the dreadful Arizona pen of 2010 had a 5.54 eRA in a league with an average of 4.35, only 27% higher than average.

I was overly optimistic about the Astros’ outlook this year, but this is an area that an intelligent organization should be able to improve, should they deign to devote any resources to it at all.
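The bullpen comparisons above are ratios to the league average rather than raw differences, which is what lets a 2013 AL pen and a 2010 NL pen be compared across run environments. A minimal check of that arithmetic with the figures quoted (the helper name is illustrative):

```python
def pct_above_league(ra: float, lg_ra: float) -> float:
    """How far a run average sits above the league average, in percent."""
    return (ra / lg_ra - 1) * 100

print(round(pct_above_league(5.76, 4.11)))  # 40  (2013 Houston bullpen eRA vs. the AL)
print(round(pct_above_league(5.54, 4.35)))  # 27  (2010 Arizona bullpen eRA vs. its league)
```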

* Travis Wood was among the better starters in the NL this year, at least from a non-DIPS perspective, pitching 200 innings with a 3.10 RRA and 3.36 eRA. Even if you start from his 4.15 dRA, he was at worst an average starter pitching a lot of innings. Sean Marshall, on the other hand, pitched just 16 innings for the Reds and made about $4 million more. I wouldn’t advise trading a potential starter for a reliever, even a good one like Marshall, particularly when you intend to use that reliever as a LOOGY and when the starter you’re trading could probably fill the reliever’s potential role nearly as well anyway.

* Last year I made a big point of comparing the aggregate performance of Drew Pomeranz and Alex White (not good) to Ubaldo Jimenez (just as bad and a lot more expensive). To be fair, this year I will point out that Ubaldo wiped the floor with them and was a key contributor to the Indians’ wildcard berth. Jimenez chipped in 35 RAR, good for 24th among AL starters. As you probably know, he was his old (Cleveland-style) self in the first half but much better in the second half. This lack of consistency is captured crudely by his QS%--just fifty percent, tied for 46th among AL starters and a tick below the league average of 51%. Among AL starters with a below-average QS%, Jimenez led in RAR and strikeout rate, and was second in innings pitched (behind AJ Griffin) and RAA (behind Alexi Ogando). Only seven AL starters had an RRA better than the league average with a subpar QS%, and three of them pitched for Cleveland (Jimenez, Corey Kluber, and Scott Kazmir).

* The Indians’ starting pitching was easily the worst of any playoff team. Cleveland’s starters had an eRA of 4.55, just ahead of the AL average of 4.60 and 21st in MLB. The next poorest playoff team was Tampa Bay (4.36, 14th in MLB), with the other eight playoff teams ranking in the top ten (only the Nationals and the Cubs missed the playoffs among the top ten). Cleveland starters averaged 5.7 innings/start compared to the league average of 5.9, and only Pittsburgh was similarly poor among playoff teams (also 5.7). Seven of the playoff teams were in the top ten in this category. The Indians’ QS% of 45% was fourth-worst in MLB; Tampa Bay was next worst among playoff teams (49%, 23rd) and six of the playoff teams finished in the top ten.
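QS% and innings per start drive most of the starting-pitching comparisons above. Here is a sketch of both, assuming the standard quality start definition (at least six innings and no more than three earned runs) and a hypothetical per-start record; the starts below are invented for illustration, not real game logs.

```python
from dataclasses import dataclass

@dataclass
class Start:
    outs: int          # outs recorded; avoids the "5.2 innings means 5 2/3" notation trap
    earned_runs: int

def is_quality_start(start: Start) -> bool:
    """Standard quality start: at least 6 innings (18 outs) and no more than 3 earned runs."""
    return start.outs >= 18 and start.earned_runs <= 3

def starter_summary(starts: list) -> tuple:
    """Return (QS%, innings per start) for a list of starts."""
    n = len(starts)
    qs_pct = 100 * sum(is_quality_start(s) for s in starts) / n
    ip_per_start = sum(s.outs for s in starts) / 3 / n
    return qs_pct, ip_per_start

# Four made-up starts: one QS, one short scoreless outing, one long outing with 4 ER, one short start
starts = [Start(18, 3), Start(17, 0), Start(21, 4), Start(15, 2)]
print(starter_summary(starts))
```

Counting outs rather than decimal innings keeps the IP/start average honest, since baseball's "5.2 IP" notation is not a true decimal.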

* How quickly the mighty can fall when they are built on elbows and shoulders: San Francisco had one starting pitcher with positive RAA (Madison Bumgarner), along with the second- and third-to-last NL starters in RAR. Barry Zito ranking down there was no surprise, but Ryan Vogelsong’s magic ride came to a halt with a line that pretty much made him Zito’s right-handed twin.

* Minnesota’s starting pitching was terrible once again; in 2012, they were last in starters’ eRA and second-to-last in innings/start and QS%. In 2013, they completed the triple crown--last in IP/S (5.38), QS% (38), and eRA (5.76). No team was even close to being as hapless in this department as Minnesota--Colorado starters worked 5.43 IP/S and had a 40% QS rate, while Houston and Toronto had the next-highest eRA (5.24). Rockies starters were actually respectable, with a 4.43 eRA versus an NL average of 4.24.

* I came of age as a baseball fan during the mid-90s, so the recent dip in runs scored is difficult for me to process when I peruse the stats. From an analytical perspective I understand the context issue, but there’s something jarring to me about looking at a list of hitters for a league and seeing only seven players with 100 Runs Created, as was the case for the NL in 2013 (there were ten in the AL). 2003 is the earliest year for which I have my end-of-season stats readily at hand, and in that season 24 NL and 21 AL players created 100 runs.

Another way to express this is to look at the batting lines of NL hitters with 0 HRAA (that is, average batters, albeit compared to a league average that includes pitchers). They include Luis Valbuena (.213/.319/.370), Eric Young (.254/.314/.343), Marcell Ozuna (.264/.297/.387), Brandon Crawford (.256/.316/.374), and Jesus Guzman (.232/.300/.388).

The AL average runs scored per game was 4.33, while the NL was at 4.00. For both leagues, it was the lowest-scoring season since 1992 (4.32 in the AL, 3.88 in the NL).

* A quick way to see which players had seasons that most surprised me is to look down the list sorted by RAR and find the first name that makes me do a double take. In the AL, that player is definitely Jason Castro. Castro hit .277/.352/.488 over 485 PA for 39 RAR and was arguably the best catcher in the AL as the only two ahead of him on the RAR list spent a significant amount of time at other positions (Carlos Santana and Joe Mauer). Castro was an All-Star, which should have caused me to look at him more closely in-season, but then again I probably just figured that they had to pick someone from Houston.

* Texas’ once-vaunted offense was below average in 2013, scoring 17 fewer runs than an average AL team when adjusted for park. A look down the list of individuals is jarring; only Adrian Beltre, Ian Kinsler, and Nelson Cruz ranked as above average. There’s an interesting case to be made for the Rangers as a cautionary tale for something (anointing a team as the best since the 1998 Yankees in June, maybe? Obviously that was in 2012, not 2013), but I’m not quite sure what it is.

* I list a variant of Bill James’ Speed Score in my stats (I switched from my own knockoff Speed Unit a few years ago because it’s easier to disclaim the results when you just use someone else’s method), but it really serves very little purpose--it's purposefully not expressed in a meaningful unit, it’s a skill measure rather than a value measure and therefore really should consider more data than one season, and the results usually aren’t surprising. One name that popped out at me, though, was Matt Dominguez, who has a Speed Score of 1.1. The AL players with lower Speed Scores are all catchers, first basemen, or DHs, except for fellow third baseman Alberto Callaspo.

I saw five or so Astros games on TV this year but don’t recall Dominguez’ speed, or lack thereof, standing out, and my impression was that defense at third was his calling card. (Not that speed is a key factor for third base defense, but my mental picture of a good third baseman is a big but athletic guy; he wouldn’t have a high Speed Score, but neither would he be sandwiched between Joe Mauer and Justin Morneau on the list.)

But by the components that go into Speed Score, he’s really slow. He’s attempted only one stolen base in 200 major league games (and he was caught). He has two triples, but neither came in 2013. And he’s scored only 25% of the time when reaching base, which of course is somewhat attributable to playing for Houston.

* You may have noticed in reading through that I am easily amused by comparisons of players who are otherwise connected, whether traded for each other or one replacing the other. My very favorite combination this year is the AL and NL trailers in RAR, who were once swapped as counterweights in the Zack Greinke deal. Alcides Escobar was 12 runs below replacement considering only offense and position, hitting .232/.255/.297 for 2.5 RG over 626 PA. Yuniesky Betancourt was -9 RAR, hitting .211/.238/.354 for 2.7 RG over 405 PA. And I for one am shocked that “Yuniesky Betancourt, first baseman” was a resounding failure.

Friday, November 08, 2013

IBA Ballot: MVP

I think we can all just dust off what we wrote last year, change the numbers a little bit, and save a bunch of time, because the essence of the AL MVP race is once again Cabrera v. Trout. The circumstances have changed a little, though. For one, both had better seasons with the bat in 2013 than they did in 2012 (which serves to illustrate the silliness of positing that leading the league in three particular categories makes a season inherently more valuable than another). Cabrera went from hitting .326/.390/.600 for 8.1 RG in 2012 to .344/.434/.630 for 9.6 RG in 2013. Trout had a less dramatic uptick, from .332/.406/.575 for 8.7 RG to .329/.438/.568 for 9.1 RG. These productivity increases were even more valuable than those figures suggest as the AL’s run/game average dipped from 4.45 to 4.33.
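To see why the drop in league scoring makes the 2013 seasons more valuable than the raw RG suggests, one simple lens is RG relative to league runs per game. This ratio is only an illustration, not how RAR is computed, and the names are hypothetical:

```python
def relative_rg(rg: float, lg_rpg: float) -> float:
    """Runs created per game as a multiple of the league's runs per game."""
    return rg / lg_rpg

seasons = {"Cabrera": (8.1, 9.6), "Trout": (8.7, 9.1)}   # RG in 2012, 2013
LG_RPG = (4.45, 4.33)                                     # AL runs per game, 2012 and 2013

for player, (rg_2012, rg_2013) in seasons.items():
    print(player, round(relative_rg(rg_2012, LG_RPG[0]), 2), round(relative_rg(rg_2013, LG_RPG[1]), 2))
```

On this scale Cabrera goes from about 1.82 times the league rate to about 2.22, and Trout from about 1.96 to about 2.10.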

Put it all together (including position, which isn’t a huge difference when comparing a centerfielder and a third baseman using my position adjustments), and the RAR gap between the two is unchanged from 2012--three runs in favor of Trout (81 to 78 in 2012, 93 to 90 in 2013). Fielding and baserunning are still in Trout’s favor, regardless of his slippage in the fielding metrics--Cabrera also saw his fielding metrics take a plunge, and Trout’s -2 FRAA, +4 UZR, and -9 DRS aren’t enough to flip this race. Cabrera’s fielding, if given 100% credibility, might be enough to allow the rest of the field to challenge for the second spot (-13, -17, -18 in those three metrics).

This will mark the fourth consecutive year in which I have placed Cabrera second in the AL MVP race, which has to be some kind of “record” (scare quotes since my opinions on awards are not of sufficient heft to constitute a record).

The rest of the ballot is not that interesting to discuss. The top four pitchers are sprinkled in along with Chris Davis, Robinson Cano, Josh Donaldson, and Evan Longoria. I saw no reason to deviate from RAR ordering with those guys except for Longoria, who was slightly behind Carlos Santana and David Ortiz but has a pretty clear fielding advantage over that pair:

1. CF Mike Trout, LAA
2. 3B Miguel Cabrera, DET
3. 1B Chris Davis, BAL
4. SP Max Scherzer, DET
5. SP Yu Darvish, TEX
6. 2B Robinson Cano, NYA
7. 3B Josh Donaldson, OAK
8. SP Hisashi Iwakuma, SEA
9. SP James Shields, KC
10. 3B Evan Longoria, TB

The National League race is actually more interesting, as there are five players whom I believe to be well separated from the rest of the field, any one of whom would make a completely justifiable MVP selection. And since one of the five is a pitcher, there are a number of ancillary issues that come into play.

I’ll set Clayton Kershaw aside for a moment and first discuss the four position player candidates. Two make for an easy comparison since they play the same position. Joey Votto and Paul Goldschmidt had very similar seasons in terms of overall offensive performance, and very similar numbers in two key broad “shape” categories, yet they achieved those in different ways. Votto had a .303 BA to Goldschmidt’s .296 and a .415 secondary average versus .404 for Goldschmidt. Votto’s SEC was balanced between a .187 walk/at bat ratio (second among all qualified major leaguers behind Mike Trout) and a .185 isolated power (38th in the NL among those with 300 PA). Goldschmidt’s W/AB was .137 (8th in the NL), but his .244 ISO was third.

I estimate that each created about 124 runs, with Votto using 20 fewer outs to do so, and so he ends up 3 RAR ahead. In the field, Goldschmidt’s metrics come out a little ahead of Votto’s, but not by a large enough margin to tip the comparison. Where Goldschmidt does have a clear edge is in context-dependent metrics like RE24 and WPA; generally I don’t put much weight on these, but Goldschmidt’s advantage is enough to push him just ahead of Votto on my ballot.

Matt Carpenter is also a legitimate candidate, with 66 RAR. Carpenter is a recent convert to second base and his metrics suggest he’s average, which may be a kinder assessment than the eighteen times Mike Matheny inserted him at third base mid-game. However, Carpenter is ranked by Baseball Prospectus as the top baserunner in the game (excluding stolen base attempts which are already considered in my RAR estimates) with an estimated 9 run contribution. Giving full weight to baserunning could move Carpenter to the head of the position player pack.

Andrew McCutchen is the fourth, and he leads the position pack with 71 RAR. His 7.39 RG is an exact match for Goldschmidt; Goldschmidt’s 40 extra PA prevent that comparison from being a runaway. While fielding metrics aren’t and haven’t been universally enthusiastic about McCutchen (-7 FRAA, 7 UZR, 7 DRS in 2013), I don’t think that’s enough to push Goldschmidt/Votto ahead.

So that leaves Kershaw v. McCutchen for NL MVP. Kershaw starts with a 77 to 71 advantage in RAR, but that is based on his actual runs allowed total. Kershaw’s RAR based on his eRA would be 72, and based on dRA it would be just 53. Using either of those figures, there’s no statistical edge for Kershaw; maybe one can create a little space by considering Kershaw’s own hitting, which was pretty good for a pitcher (.187/.238/.266 over 82, probably about 4 runs beyond an average pitcher).

If I’m going to choose a pitcher over a hitter for MVP, I’d prefer that he at least have the edge when using eRA, since the use of a component RA is conceptually the same methodology that is being used to estimate the batter’s contribution through a runs created analysis. That is, both approaches take the components of performance (hits, walks, outs, etc.) and estimate run contributions rather than look at an actual count of runs contributed/allowed.

Of course, a pitcher’s runs allowed are more attributable to him than runs scored or batted in are to a batter; while a pitcher’s runs allowed are influenced strongly by his fielding support, and less so by his bullpen support, the pitcher at least bears some responsibility for the situations in which he finds himself (base/out situations). The batter is presented with these situations independently of his own actions. Sequencing does matter, and pitchers have control over it--but so many other factors are in play that I consider it worthwhile to look at methods that attempt to control for these other factors, be it sequencing (as done in the case of eRA) or fielding (as done bluntly in the case of dRA and other DIPS approaches such as FIP, and attempted more carefully in the case of some other measures like bWAR).

So my natural inclination would be to side with McCutchen, ever so slightly, but in a case like this I think it is useful to bring in the perspective of other methods. (In many other cases, looking at different methods is not particularly helpful, either because the differences stem from methodological choices one may be more or less comfortable with, or because the methodologies are quite similar and so the differences are minimal.) The two most-used methods are Baseball-Reference’s and Fangraphs’ versions of WAR. I find the latter unhelpful in a case such as this due to its complete reliance on FIP to value pitching; the former estimates that McCutchen was worth 8.2 WAR and Kershaw 7.9, a difference of about three runs.

This race is extremely close, closer still when you consider the narrow margin by which I chose McCutchen over Goldschmidt, Votto, and Carpenter. And in a complete hand-waving of reason, that is what I will use to tilt the scale: Kershaw was so much better than any other pitcher, while no one hitter could pull away from the pack. Arbitrary and capricious? Yes. Any sillier than any other rationale for separating the two? That’s for you to judge.

The toughest decision for the rest of the ballot is what to do with two players for whom fielding is such an important consideration. Yadier Molina and Carlos Gomez each have 47 RAR, tied for thirteenth in the NL, but Molina’s defense behind the plate is universally lauded and Gomez was rated highly by all the metrics (11, 24, 38). Molina’s fielding value is harder to quantify, and its impact on his overall value is muted by his poor baserunning (a very believable -5 according to BP). I give them enough of a boost to climb over all but one of the other position players ahead of them by six or fewer RAR (Freddie Freeman, Jayson Werth, Hanley Ramirez, Buster Posey, Hunter Pence, Matt Holliday) and the non-Kershaw pitchers, but not above Shin-Soo Choo (59 RAR, poor defensive metrics, lots of hit-by-pitches) and David Wright (52 RAR with well-regarded fielding and baserunning). I feel bad about leaving Ramirez off the ballot since his 9.4 RG was the highest in MLB among those with 300 PA except for Miguel Cabrera, and 53 RAR in 331 PA is eye-popping, but sketchy fielding makes it a little easier to swallow. My ballot:

1. SP Clayton Kershaw, LA
2. CF Andrew McCutchen, PIT
3. 1B Paul Goldschmidt, ARI
4. 1B Joey Votto, CIN
5. 2B Matt Carpenter, STL
6. CF Shin-Soo Choo, CIN
7. 3B David Wright, NYN
8. C Yadier Molina, STL
9. CF Carlos Gomez, MIL
10. SP Matt Harvey, NYN

Finally, a brief missive on a topic I wrote about in my MVP post last year but thought worth revisiting: the margin of error for advanced metrics (I’ll use RAR, but it applies equally to WAR) and the use of that uncertainty in award discussions. It is good to acknowledge that the metrics we use have an associated level of uncertainty. It is good to recognize that other people’s award picks may be perfectly justifiable, even by your preferred method, due to the uncertainty. It is good to recognize that certain components of an uberstat may be less reliable than other components (fielding v. batting is the most obvious case and the one with the most impact), and adjust one’s rough estimate of uncertainty in the metric accordingly (or regress the components in question prior to aggregation).

But the margin of error should not be used as a backdoor credit for one’s preferred candidate. If the metric you are using can’t distinguish between Paul Goldschmidt and Joey Votto, and you’d like to use your judgment or some non-quantifiable factor to pick Goldschmidt, that’s great. Just don’t try to tell others that they are obligated to do the same. You might think that I am arguing against a strawman here; please don’t make me search a few message boards to find those making arguments along these lines in last year’s AL MVP debate.

My philosophy is typically to use a metric and follow the results fairly closely in filling out a ballot. I am not saying that this is the only justifiable way to fill out an IBA ballot, but that’s how I choose to do it. Some might dismiss such an approach as an unthinking reliance on a metric, but that ignores all of the thought that has gone into selecting the metric to be used (and more importantly, if you can get away with claiming some credit, the thought that went into developing the metric). If just picking the player with the higher RAR appears to be ducking the question of which player was more valuable by falling back on an easy answer, realize that it’s not--I've already put time into thinking generally about the questions of how to measure value and have a set (but not inflexible) manner of applying that to particular cases.

Additionally, I will tend to defer to differences in the metric, even those that are clearly not meaningful, unless I can be convinced of a good reason to deviate. This does not mean that I think the difference between 65 RAR and 64 RAR is meaningful; if the choice is essentially a coin flip, then I may as well use the metric as the coin. It’s also worth remembering that, in probabilistic terms, the player who is 65 RAR +/- 10 RAR is more likely to have a higher true RAR than the player who is 64 RAR +/- 10 RAR (this matters more when the difference is larger, say five or ten runs).
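The probability point can be made concrete. This is a sketch that assumes "+/- 10" means each player's true RAR is normally distributed around his estimate with a standard deviation of 10 runs, and that the two errors are independent; real estimation errors, especially in the fielding components, may be neither normal nor independent. The function name is hypothetical.

```python
import math

def prob_a_truly_better(est_a: float, est_b: float, sd_a: float = 10.0, sd_b: float = 10.0) -> float:
    """P(true A > true B), treating each true value as normal around its estimate
    with the given standard deviation, and the two errors as independent."""
    sd_diff = math.hypot(sd_a, sd_b)                 # SD of the difference A - B
    z = (est_a - est_b) / sd_diff
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))    # standard normal CDF at z

for gap in (1, 5, 10):
    print(gap, round(prob_a_truly_better(64 + gap, 64), 2))
```

Under these assumptions a one-run edge is only about a 53% proposition, rising to roughly 64% at five runs and 76% at ten.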

Wednesday, November 06, 2013

IBA Ballot: Cy Young

There were four AL pitchers whom I estimate to have been worth 60 or more RAR in 2013, then a pack of four pitchers with 53 or 54 RAR. These two groups make a natural candidate set for the Cy Young ballot. The less interesting question is which of the four lower-RAR pitchers gets the #5 position. Bartolo Colon, Chris Sale, Anibal Sanchez, and Felix Hernandez can’t be distinguished based on their RAR, so I go to peripherals to give Sanchez the nod--he has the best eRA of the bunch (just edging out Sale 3.17 to 3.19) and the best dRA (my Base Runs DIPS-style run average); in fact, Sanchez’ 2.88 dRA led all AL pitchers.

The four pitchers vying for the top spot are Hisashi Iwakuma, Yu Darvish, Max Scherzer, and James Shields. Iwakuma actually leads in RAR at 67, but has two major drawbacks. The first has nothing to do with his pitching but rather with that calculation: it assumes that Safeco was a neutral park in 2013 based on one year of data. While my standard procedure is to reset the park factor after a change in a park’s dimensions, it’s truly not correct to treat it as a completely new park. If we assume that Seattle’s PF is a more pitcher-friendly .96, his RAR lead over Darvish dissipates. Iwakuma also benefited from a very low BABIP (.259), although that is not unique to him, as Darvish (.267) and Scherzer (.263) also had low figures. However, Iwakuma’s dRA is the worst among the pack at 4.02 (Darvish 3.50, Scherzer 3.19) and his eRA trails as well (3.35, 3.05, 2.81).

Ultimately, it’s Scherzer’s superiority in both peripheral run averages that compels me to place him first. My philosophy has always been that, when assessing value, one should start with the actual runs allowed by the pitcher, but that in cases where two pitchers are very close, peripherals act as a good tiebreaker. The difference of three RAR between Iwakuma/Darvish and Scherzer is minuscule, but Scherzer’s advantages in the peripherals are more significant. When it comes to pitchers, actual runs allowed is very meaningful, and yet still leaves things like bullpen and defensive support completely unaccounted for. Using RAR based on eRA (eRAR) and dRA (dRAR):

Iwakuma (using 1.00 PF): 67 RAR, 54 eRAR, 36 dRAR
Darvish: 64, 58, 47
Scherzer: 61, 65, 66
Shields: 60, 47, 43

Thus, I would fill out my ballot as follows:

1. Max Scherzer, DET
2. Yu Darvish, TEX
3. Hisashi Iwakuma, SEA
4. James Shields, KC
5. Anibal Sanchez, DET

In the NL, there’s no competition at all for the top spot. Clayton Kershaw had 77 RAR, 22 more than his closest competitor. Both Jose Fernandez and Matt Harvey posted similar RRA, eRA, and dRA to Kershaw, but Kershaw pitched 63 more innings than Fernandez and 58 more innings than Harvey, making it no competition from a value perspective. Of the pitchers that could compete on bulk, none come close on quality. Kershaw’s 236 innings trailed only Adam Wainwright (242), and Cliff Lee was next at 223. The only question regarding Kershaw is whether the Cy Young is enough, or whether he was the NL MVP as well.

The spots behind Kershaw on the ballot come down to choosing between two young pitchers in Harvey and Fernandez who pitched brilliantly but didn’t turn in full seasons, and two veteran workhorses in Wainwright and Lee. These four are very closely bunched in terms of RAR, but the youngsters had much lower RRAs and eRAs. In terms of dRA, it’s much closer, but Harvey and Fernandez were both still lower than the vets.

I really don’t see any reason to deviate from the RAR rankings, and filled out my ballot accordingly. That doesn’t mean I’m claiming that the differences in RAR are meaningful; I think they indicate that these four pitchers are indistinguishable in value as measured by RAR. If you believe that the replacement level is set too high, that would be reason to push Lee and Wainwright ahead; I don’t, obviously--my starting pitcher baseline is 128% of the league average runs allowed, which in W% terms is roughly .380. If I felt it was too high (quality-wise rather than RA--you can see why people like to use ERA+ even if the scale distorts), I’d lower it; if anything, my inclination would be to raise the replacement level, which would benefit Harvey and Fernandez.
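The replacement baseline described above translates into RAR with simple arithmetic. This is a sketch, assuming RAR = (replacement RA - pitcher RA) times IP / 9, with replacement at 128% of the league's runs per game, and assuming park adjustment by dividing the pitcher's run average by PF; the actual spreadsheet may handle league averages and parks differently, and the function names are hypothetical. Swapping RRA for eRA or dRA gives the eRAR and dRAR variants. The replacement W% uses a fixed Pythagorean exponent of 2 rather than Pythagenpat, which is why it lands near, not exactly on, .380.

```python
def pitcher_rar(run_avg: float, innings: float, lg_ra: float,
                repl_mult: float = 1.28, park_factor: float = 1.0) -> float:
    """Runs above replacement from any run average (RRA, eRA, or dRA).

    Replacement is repl_mult times the league RA per nine; the pitcher's run
    average is divided by the park factor (PF > 1.00 means hitter-friendly).
    """
    replacement_ra = repl_mult * lg_ra
    return (replacement_ra - run_avg / park_factor) * innings / 9

def replacement_wpct(repl_mult: float = 1.28, exponent: float = 2.0) -> float:
    """W% of a team that scores an average number of runs but allows repl_mult
    times as many, using a fixed Pythagorean exponent."""
    return 1 / (1 + repl_mult ** exponent)

print(round(pitcher_rar(4.68, 135, 4.00), 1))  # 6.6
print(round(replacement_wpct(), 3))            # 0.379
```

Plugging in Brandon McCarthy's 135 innings and 4.68 RRA from the meanderings post, with the NL's 4.00 runs per game, gives about 6.6 RAR, close to the 7 quoted there; the small gap is presumably park and league-average details this sketch ignores.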

In the end, though, quibbling about spots 2-5 is irrelevant; this is not a year in which down ballot votes should have any impact on the outcome, which should be Clayton Kershaw, unanimous Cy Young winner:

1. Clayton Kershaw, LA
2. Matt Harvey, NYN
3. Jose Fernandez, MIA
4. Cliff Lee, PHI
5. Adam Wainwright, STL

Monday, November 04, 2013

IBA Ballot: Rookie of the Year

In the spirit of full disclosure of the type that no one would even care about, I did not cast a ballot in the IBAs this year; I was busy with some other stuff and forgot about the deadline. But this is how I would have voted if I did. I’m sure the voting went just fine even without my input.

It was not a particularly strong year for American League rookies. JB Shuck led AL rookies with 464 plate appearances, so there weren’t any full-time, full-season position players in the crop. Incidentally, I would love to be able to justify a vote for Shuck given his alma mater, but it wouldn’t be intellectually honest. He demonstrated the ability to be a fifth outfielder but little else, hitting .299 but with a secondary average of just .138 thanks to a .075 isolated power, which ranked last among AL corner outfielders (only Ichiro at .080, Melky Cabrera at .081, and the cratering Nick Markakis at .084 failed to crack .100).

The position player who made the strongest case was Wil Myers, whose callup was delayed, holding him to 88 games and 368 PA. In that time, though, he easily led all AL rookies with 25 RAR and was one of the most productive hitters in the league, ranking twelfth with 6.2 RG. The best of the rest among the position players were middle infielders: Seattle’s pair of Brad Miller and Nick Franklin, and Boston/Detroit’s Jose Iglesias. However, Iglesias’ fielding metrics did not match his defensive reputation; his BABIP-driven 12 RAR was very close to that of Miller (14) and Franklin (11). Both Miller and Franklin displayed impressive power for middle infielders (.154 and .157 ISO respectively); Franklin had 80 more PA but a BA forty points lower. Defensive metrics did not like Miller (-5, -2, -3 in FRAA, UZR, DRS) but had a mixed take on Franklin (15, -6, 3).

Myers’ best competition for the award came from his teammate Chris Archer, who led AL starters with 27 RAR. Archer’s peripherals were good as well (3.79 eRA), but his .258 BABIP results in just enough of a ding to edge Myers ahead for me. Dan Straily (21 RAR) and Martin Perez (20 RAR) similarly performed less well in DIPS metrics than in actual runs allowed/peripherals. Another Ray, reliever Alex Torres, was good enough to slip into the final spot on my ballot; a 2.02 RRA over 58 innings made him as valuable as Mariano Rivera, leverage and cheap rhetorical tricks aside (18 RAR).

1. RF Wil Myers, TB
2. SP Chris Archer, TB
3. SP Dan Straily, OAK
4. SP Martin Perez, TEX
5. RP Alex Torres, TB

If Wil Myers had played in the NL, he would be on the bubble for a spot on the bottom of the ballot. The top of the ballot belongs to Jose Fernandez, a legitimate candidate for the non-Kershaw division of the Cy Young discussion, who was simply superb with a 2.33 RRA over 173 innings. Say what you will about the way the Marlins organization is managed and the financial consequences of the decision, they were absolutely right that Fernandez was ready for the majors.

Behind him, the next two spots belong to the Dodgers’ key rookies Hyun-jin Ryu and Yasiel Puig. I don’t dock either of them for international experience. Puig only had 418 PA, but when you hit .328/.388/.548 that doesn’t really matter; a full season of that production would have made Puig v. Fernandez a very interesting case. I have Ryu just ahead of Puig in RAR (43 to 41), but Ryu’s dRA wasn’t quite as good as his actual runs allowed, which is enough to scoot Puig ahead. Puig fared decently in defensive and baserunning metrics despite the well-publicized questionable decisions, leaving offense-only RAR as a decent gauge of his value.

Three other starters were in the mix, with Julio Teheran, Shelby Miller, and Gerrit Cole all topping 25 RAR, and there are also two more +20 RAR batters in Jedd Gyorko and Matt Adams. Throw in Nolan Arenado, who didn’t hit much (3.6 RG for 8 RAR) but won a Gold Glove and fared well on fielding metrics, and the NL crop puts the AL to shame. Even giving Arenado full credit for his fielding metrics (17 FRAA, 21 UZR, 30 DRS) is only enough to put him just ahead of Gyorko, so I’ll side with the hitting (Gyorko created 4.7 runs/game to Arenado’s 3.6):

1. SP Jose Fernandez, MIA
2. RF Yasiel Puig, LA
3. SP Hyun-jin Ryu, LA
4. SP Julio Teheran, ATL
5. 2B Jedd Gyorko, SD

Sunday, November 03, 2013

End of Season Statistics 2013

The spreadsheets are published as Google Spreadsheets, which you can download in Excel format by changing the extension in the address from "=html" to "=xls". That way you can download them and manipulate things however you see fit. The player spreadsheets are not ready yet, but I want to get the team stuff posted.

The data comes from a number of different sources. Most of the basic data comes from Doug's Stats, which is a very handy site, or Baseball-Reference. KJOK's park database provided some of the data used in the park factors, but for recent seasons park data comes from B-R. Data on pitchers' batted ball types allowed, doubles/triples allowed, and inherited/bequeathed runners comes from Baseball Prospectus.

The basic philosophy behind these stats is to use the simplest methods that have acceptable accuracy. Of course, "acceptable" is in the eye of the beholder, namely me. I use Pythagenpat not because other run/win converters, like a constant RPW or a fixed exponent, are not accurate enough for this purpose, but because it's mine and it would be kind of odd if I didn't use it.
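For anyone who wants to reproduce EW%, here is a minimal Python sketch of Pythagenpat. The post does not spell out the exponent parameter, so z = .29 below is an assumption (a commonly used value), and the function name is just illustrative.

```python
def pythagenpat_wpct(rs_per_game: float, ra_per_game: float, z: float = 0.29) -> float:
    """Estimate W% from runs scored and allowed per game using Pythagenpat.

    The exponent scales with the run environment: x = RPG ** z, where RPG is
    runs scored plus runs allowed per game. z = 0.29 is an assumed value.
    """
    rpg = rs_per_game + ra_per_game
    x = rpg ** z
    rs_term = rs_per_game ** x
    ra_term = ra_per_game ** x
    return rs_term / (rs_term + ra_term)


# Hypothetical team scoring 5.0 and allowing 4.0 runs per game
print(round(pythagenpat_wpct(5.0, 4.0), 3))
```

Because the exponent grows with RPG, the same run differential is worth a little less W% in a high-scoring environment than a fixed exponent of 2 would suggest.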

If I seem to be a stickler for purity in my critiques of others' methods, I'd contend it is usually in a theoretical sense, not an input sense. So when I exclude hit batters, I'm not saying that hit batters are worthless or that they *should* be ignored; it's just easier not to mess with them and not that much less accurate.

I also don't really have a problem with people using sub-standard methods (say, Basic RC) as long as they acknowledge that they are sub-standard. If someone pretends that Basic RC doesn't undervalue walks or cause problems when applied to extreme individuals, I'll call them on it; if they explain its shortcomings but use it regardless, I accept that. Take these last three paragraphs as my acknowledgment that some of the statistics displayed here have shortcomings as well.

The League spreadsheet is pretty straightforward--it includes league totals and averages for a number of categories, most or all of which are explained at appropriate junctures throughout this piece. The advent of interleague play has created two different sets of league totals--one for the offense of league teams and one for the defense of league teams. Before interleague play, these two were identical. I do not present both sets of totals (you can figure the defensive ones yourself from the team spreadsheet, if you desire), just those for the offenses. The exception is for the defense-specific statistics, like innings pitched and quality starts. The figures for those categories in the league report are for the defenses of the league's teams. However, I do include each league's breakdown of basic pitching stats between starters and relievers (denoted by "s" or "r" prefixes), and so summing those will yield the totals from the pitching side. The one abbreviation you might not recognize is "N"--this is the league average of runs/game for one team, and it will pop up again.

The Team spreadsheet focuses on overall team performance--wins, losses, runs scored, runs allowed. The columns included are: Park Factor (PF), Home Run Park Factor (PFhr), Winning Percentage (W%), Expected W% (EW%), Predicted W% (PW%), wins, losses, runs, runs allowed, Runs Created (RC), Runs Created Allowed (RCA), Home Winning Percentage (HW%), Road Winning Percentage (RW%) [exactly what they sound like--W% at home and on the road], Runs/Game (R/G), Runs Allowed/Game (RA/G), Runs Created/Game (RCG), Runs Created Allowed/Game (RCAG), and Runs Per Game (RPG, the average number of runs scored and allowed per game). Ideally, I would use outs as the denominator, but for teams, outs and games are so closely related that I don’t think it’s worth the extra effort.

The runs and Runs Created figures are unadjusted, but the per-game averages are park-adjusted, except for RPG which is also raw. Runs Created and Runs Created Allowed are both based on a simple Base Runs formula. The formula is:

A = H + W - HR - CS
B = (2TB - H - 4HR + .05W + 1.5SB)*.76
C = AB - H
D = HR
Naturally, RC = A*B/(B + C) + D.
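The team Base Runs formula transcribes directly into Python. This is a sketch taking season totals as inputs; the function name is illustrative.

```python
def team_base_runs(h, w, hr, cs, tb, sb, ab):
    """Team Runs Created via the simple Base Runs formula above."""
    a = h + w - hr - cs                                      # baserunners, net of HR and caught stealing
    b = (2 * tb - h - 4 * hr + 0.05 * w + 1.5 * sb) * 0.76  # advancement factor
    c = ab - h                                               # outs
    d = hr                                                   # runs that score automatically
    return a * b / (b + c) + d


# Hypothetical small sample of team totals
print(round(team_base_runs(h=10, w=5, hr=2, cs=1, tb=16, sb=2, ab=40), 2))
```

A useful sanity check on the structure: a team whose only baserunners are home run hitters gets A = 0, so its RC is exactly its home run total.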

I have explained the methodology used to figure the PFs before, but the CliffsNotes version is that they are based on five years of data when applicable, include both runs scored and allowed, and they are regressed towards average (PF = 1), with the amount of regression varying based on the number of years of data used. There are factors for both runs and home runs. The initial PF (not shown) is:

iPF = (H*T/(R*(T - 1) + H) + 1)/2
where H = RPG in home games, R = RPG in road games, T = # teams in league (15 in each league as of 2013). Then the iPF is converted to the PF by taking x*iPF + (1-x), where x = .6 if one year of data is used, .7 for 2, .8 for 3, and .9 for 4+.
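A sketch of the iPF and regression steps in Python. The function names are hypothetical; the inputs are home and road RPG averages, T is the number of teams in the league, and years is how many seasons of data went into the factor.

```python
def initial_park_factor(home_rpg: float, road_rpg: float, teams: int) -> float:
    """iPF, already halved (add one, divide by two) for a 50/50 home/road split."""
    return (home_rpg * teams / (road_rpg * (teams - 1) + home_rpg) + 1) / 2


def regression_weight(years: int) -> float:
    """Weight x on iPF: .6, .7, .8 for 1-3 years of data, .9 for 4 or more."""
    if years < 1:
        raise ValueError("need at least one year of data")
    return {1: 0.6, 2: 0.7, 3: 0.8}.get(years, 0.9)


def park_factor(home_rpg: float, road_rpg: float, teams: int, years: int) -> float:
    """Regress iPF toward 1: PF = x*iPF + (1 - x)."""
    x = regression_weight(years)
    return x * initial_park_factor(home_rpg, road_rpg, teams) + (1 - x)


def full_home_effect(pf: float) -> float:
    """Undo the halving, as in the Fenway example: PF 1.02 means home RPG about 4% higher."""
    return 2 * pf - 1


# Hypothetical park: 10 RPG at home, 8 RPG on the road, 15-team league, 5 years of data
print(round(park_factor(10.0, 8.0, 15, 5), 3))
```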

It is important to note, since there always seems to be confusion about this, that these park factors already incorporate the fact that the average player plays 50% on the road and 50% at home. That is what the adding one and dividing by 2 in the iPF is all about. So if I list Fenway Park with a 1.02 PF, that means that it actually increases RPG by 4%.

In the calculation of the PFs, I did not get picky and take out “home” games that were actually at neutral sites.

There are also Team Offense and Defense spreadsheets. These include the following categories:

Team offense: Plate Appearances, Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Walks per At Bat (WAB), Isolated Power (SLG - BA), R/G at home (hR/G), and R/G on the road (rR/G). BA, OBA, SLG, WAB, and ISO are park-adjusted by dividing by the square root of park factor (or the equivalent; WAB = (OBA - BA)/(1 - OBA) and ISO = SLG - BA).
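Here is a small Python sketch of the square-root park adjustment, with WAB, ISO, and SEC derived from the adjusted BA/OBA/SLG (SEC = ISO + WAB, the identity given in the hitter section). The function name and dict keys are illustrative. It also makes the Runs Created rationale for the square root concrete: the adjusted OBA times the adjusted SLG equals OBA*SLG/PF.

```python
import math


def park_adjusted_rates(ba: float, oba: float, slg: float, pf: float) -> dict:
    """Divide BA/OBA/SLG by sqrt(PF), then derive WAB, ISO and SEC from the adjusted values."""
    root = math.sqrt(pf)
    ba_adj, oba_adj, slg_adj = ba / root, oba / root, slg / root
    wab = (oba_adj - ba_adj) / (1 - oba_adj)  # walks per at bat
    iso = slg_adj - ba_adj                    # isolated power
    return {
        "BA": ba_adj,
        "OBA": oba_adj,
        "SLG": slg_adj,
        "WAB": wab,
        "ISO": iso,
        "SEC": iso + wab,                     # SEC = SLG - BA + (OBA - BA)/(1 - OBA)
    }


# Hypothetical team line in a park with PF = 1.05
rates = park_adjusted_rates(0.260, 0.330, 0.420, 1.05)
print({key: round(value, 3) for key, value in rates.items()})
```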

Team defense: Innings Pitched, BA, OBA, SLG, Innings per Start (IP/S), Starter's eRA (seRA), Reliever's eRA (reRA), Quality Start Percentage (QS%), RA/G at home (hRA/G), RA/G on the road (rRA/G), Battery Mishap Rate (BMR), Modified Fielding Average (mFA), and Defensive Efficiency Record (DER). BA, OBA, and SLG are park-adjusted by dividing by the square root of PF; seRA and reRA are divided by PF.

The three fielding metrics I've included are limited to metrics that a) I can calculate myself and b) are based on the basic available data, not specialized PBP data. The three metrics are explained in this post, but here are quick descriptions of each:

1) BMR--wild pitches and passed balls per 100 baserunners = (WP + PB)/(H + W - HR)*100

2) mFA--fielding average removing strikeouts and assists = (PO - K)/(PO - K + E)

3) DER--the Bill James classic, using only the PA-based estimate of plays made. Based on a suggestion by Terpsfan101, I've tweaked the error coefficient. Plays Made = PA - K - H - W - HR - HB - .64E and DER = PM/(PM + H - HR + .64E)
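The three team fielding metrics in Python, with each formula transcribed exactly as written in the list. The function names and the sample totals are hypothetical.

```python
def battery_mishap_rate(wp, pb, h, w, hr):
    """BMR: wild pitches plus passed balls per 100 baserunners."""
    return (wp + pb) / (h + w - hr) * 100


def modified_fielding_average(po, k, e):
    """mFA: fielding average with strikeouts and assists removed."""
    return (po - k) / (po - k + e)


def defensive_efficiency(pa, k, h, w, hr, hb, e):
    """DER using the PA-based estimate of plays made, as given in the text."""
    plays_made = pa - k - h - w - hr - hb - 0.64 * e
    return plays_made / (plays_made + h - hr + 0.64 * e)


# Hypothetical team totals
print(round(battery_mishap_rate(50, 10, 1400, 500, 150), 2))
print(round(modified_fielding_average(4300, 1200, 100), 4))
```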

Next are the individual player reports. I defined a starting pitcher as one with 15 or more starts. All other pitchers are eligible to be included as a reliever. If a pitcher has 40 appearances, then they are included. Additionally, if a pitcher has 50 innings and less than 50% of his appearances are starts, he is also included as a reliever (this allows some swingmen type pitchers who wouldn’t meet either the minimum start or appearance standards to get in).
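The inclusion rules amount to a short decision procedure. Here is a sketch in Python; the function name and return labels are hypothetical.

```python
from typing import Optional


def classify_pitcher(g: int, gs: int, ip: float) -> Optional[str]:
    """Apply the report's inclusion rules; returns None if the pitcher misses every cutoff."""
    if gs >= 15:
        return "starter"
    if g >= 40:
        return "reliever"
    if g > 0 and ip >= 50 and gs / g < 0.5:  # swingman rule
        return "reliever"
    return None


print(classify_pitcher(g=30, gs=10, ip=80.0))
```

The starter test comes first, so a pitcher with 15 or more starts is never listed as a reliever, no matter how many relief appearances he also made.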

For all of the player reports, ages are based on simply subtracting their year of birth from 2013. I realize that this is not compatible with how ages are usually listed and so “Age 27” doesn’t necessarily correspond to age 27 as I list it, but it makes everything a heckuva lot easier, and I am more interested in comparing the ages of the players to their contemporaries, for which it makes very little difference. The "R" category records rookie status with an "R" for rookies and a blank for everyone else; I've trusted Baseball Prospectus on this. Also, all players are counted as being on the team with whom they played/pitched (IP or PA as appropriate) the most.

For relievers, the categories listed are: Games, Innings Pitched, estimated Plate Appearances (PA), Run Average (RA), Relief Run Average (RRA), Earned Run Average (ERA), Estimated Run Average (eRA), DIPS Run Average (dRA), Strikeouts per Game (KG), Walks per Game (WG), Guess-Future (G-F), Inherited Runners per Game (IR/G), Batting Average on Balls in Play (%H), Runs Above Average (RAA), and Runs Above Replacement (RAR).

IR/G is per relief appearance (G - GS); it is an interesting thing to look at, I think, in lieu of actual leverage data. You can see which closers come in with runners on base, and which are used nearly exclusively to start innings. Of course, you can’t infer too much; there are bad relievers who come in with a lot of people on base, not because they are being used in high leverage situations, but because they are long men being used in low-leverage situations already out of hand.

For starting pitchers, the columns are: Wins, Losses, Innings Pitched, Estimated Plate Appearances (PA), RA, RRA, ERA, eRA, dRA, KG, WG, G-F, %H, Pitches/Start (P/S), Quality Start Percentage (QS%), RAA, and RAR. RA and ERA you know--R*9/IP or ER*9/IP, park-adjusted by dividing by PF. The formulas for eRA and dRA are based on the same Base Runs equation and they estimate RA, not ERA.

* eRA is based on the actual results allowed by the pitcher (hits, doubles, home runs, walks, strikeouts, etc.). It is park-adjusted by dividing by PF.

* dRA is the classic DIPS-style RA, assuming that the pitcher allows a league average %H, and that his hits in play have a league-average S/D/T split. It is park-adjusted by dividing by PF.

The formula for eRA is:

A = H + W - HR
B = (2*TB - H - 4*HR + .05*W)*.78
C = AB - H = K + (3*IP - K)*x (where x is figured as described below for PA estimation and is typically around .93) = PA (from below) - H - W
eRA = (A*B/(B + C) + HR)*9/IP
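A Python sketch of eRA, including the PA estimate it relies on. The names are hypothetical, x is the league factor (typically around .93), and IP must be true innings (200 1/3, not the box-score shorthand 200.1).

```python
def estimated_pa(ip: float, k: int, h: int, w: int, x: float) -> float:
    """PA = K + (3*IP - K)*x + H + W, with x the league (AB - H - K)/(3*IP - K)."""
    return k + (3 * ip - k) * x + h + w


def estimated_run_average(ip, h, w, hr, tb, k, x, pf=1.0):
    """eRA: Base Runs on the pitcher's actual components, per 9 IP, divided by PF."""
    a = h + w - hr
    b = (2 * tb - h - 4 * hr + 0.05 * w) * 0.78
    c = estimated_pa(ip, k, h, w, x) - h - w  # equals K + (3*IP - K)*x
    runs = a * b / (b + c) + hr
    return runs * 9 / ip / pf


# Hypothetical starter: 200 IP, 180 H, 60 W, 20 HR, 280 TB allowed, 180 K
print(round(estimated_run_average(200, 180, 60, 20, 280, 180, 0.93), 2))
```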

To figure dRA, you first need the estimate of PA described below. Then you calculate W, K, and HR per PA (call these %W, %K, and %HR). Percentage of balls in play (BIP%) = 1 - %W - %K - %HR. This is used to calculate the DIPS-friendly estimate of %H (H per PA) as e%H = Lg%H*BIP%.

Now everything has a common denominator of PA, so we can plug into Base Runs:

A = e%H + %W
B = (2*(z*e%H + 4*%HR) - e%H - 5*%HR + .05*%W)*.78
C = 1 - e%H - %W - %HR
dRA = (A*B/(B + C) + %HR)/C*a

z is the league average of total bases per non-HR hit (TB - 4*HR)/(H - HR), and a is the league average of (AB - H) per game.
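The dRA steps as a Python sketch. All parameter names are hypothetical: pa is the estimated PA, lg_pct_h is the league %H, z is the league TB per non-HR hit, and lg_outs_per_game stands in for a, the league (AB - H) per game. Everything is computed per PA, so the Base Runs result is runs per PA, which dividing by C turns into runs per out before scaling to a game.

```python
def dips_run_average(pa, w, k, hr, lg_pct_h, z, lg_outs_per_game, pf=1.0):
    """dRA: Base Runs per PA assuming league-average %H and hit-type mix, scaled to a game."""
    pct_w, pct_k, pct_hr = w / pa, k / pa, hr / pa
    bip = 1 - pct_w - pct_k - pct_hr   # balls in play per PA
    e_h = lg_pct_h * bip               # expected non-HR hits per PA
    a = e_h + pct_w
    b = (2 * (z * e_h + 4 * pct_hr) - e_h - 5 * pct_hr + 0.05 * pct_w) * 0.78
    c = 1 - e_h - pct_w - pct_hr       # outs (AB - H) per PA
    runs_per_pa = a * b / (b + c) + pct_hr
    return runs_per_pa / c * lg_outs_per_game / pf


# Hypothetical pitcher and league values
print(round(dips_run_average(800, 60, 180, 20, 0.295, 1.3, 25.5), 2))
```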

In the past couple years I’ve presented a couple of batted ball RA estimates. I’ve removed these this year, not just because batted ball data exhibits questionable reliability but because these metrics were complicated to figure, required me to collate the batted ball data, and were not personally useful to me. I figure these stats for my own enjoyment and have in some form or another going back to 1997. I share them here only because I would do it anyway, so if I’m not interested in certain categories, there’s no reason to keep presenting them.

Instead, I’m showing strikeout and walk rate, both expressed as per game. By game I mean not 9 innings but rather the league average of PA/G. I have always been a proponent of using PA and not IP as the denominator for non-run pitching rates, and now the use of per PA rates is widespread. Usually these are expressed as K/PA and W/PA, or equivalently, percentage of PA with a strikeout or walk. I don’t believe that any site publishes these as K and W per equivalent game as I am here. This is not better than K%--it’s simply applying a scalar multiplier. I like it because it generally follows the same scale as the familiar K/9.

To facilitate this, I’ve finally corrected a flaw in the formula I use to estimate plate appearances for pitchers. Previously, I’ve done it the lazy way by not splitting strikeouts out from other outs. I am now using this formula to estimate PA (where PA = AB + W):

PA = K + (3*IP - K)*x + H + W
Where x = league average of (AB - H - K)/(3*IP - K)

Then KG = K/PA*Lg(PA/G) and WG = W/PA*Lg(PA/G).
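A sketch of the PA estimate and the per-game rates in Python. The function names are hypothetical, and the rates are read as strikeouts or walks per PA multiplied by the league PA per game, which is what "per equivalent game" implies.

```python
def league_x(lg_ab, lg_h, lg_k, lg_ip):
    """x = league (AB - H - K)/(3*IP - K): roughly the share of non-strikeout outs made in at bats."""
    return (lg_ab - lg_h - lg_k) / (3 * lg_ip - lg_k)


def estimated_pa(ip, k, h, w, x):
    """PA = K + (3*IP - K)*x + H + W (PA here means AB + W)."""
    return k + (3 * ip - k) * x + h + w


def per_game_rates(k, w, pa, lg_pa_per_game):
    """KG and WG: strikeouts and walks per PA, scaled to the league PA per game."""
    return k / pa * lg_pa_per_game, w / pa * lg_pa_per_game


# Hypothetical pitcher with a 38 PA/G league
pa = estimated_pa(ip=200, k=180, h=180, w=60, x=0.93)
kg, wg = per_game_rates(180, 60, pa, 38.0)
print(round(pa, 1), round(kg, 2), round(wg, 2))
```

Since Lg(PA/G) is a league constant, KG is just K% times a scalar, which is why it lands on roughly the same scale as K/9.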

G-F is a junk stat, included here out of habit because I've been including it for years. It was intended to give a quick read of a pitcher's expected performance in the next season, based on eRA and strikeout rate. Although the numbers vaguely resemble RAs, it's actually unitless. As a rule of thumb, anything under four is pretty good for a starter. G-F = 4.46 + .095(eRA) - .113(K*9/IP). It is a junk stat. JUNK STAT JUNK STAT JUNK STAT. Got it?

%H is BABIP, more or less--%H = (H - HR)/(PA - HR - K - W), where PA was estimated above. Pitches/Start includes all appearances, so I've counted relief appearances as one-half of a start (P/S = Pitches/(.5*G + .5*GS)). QS% is just QS/GS; I don't think it's particularly useful, but Doug's Stats include QS so I include it.
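G-F, %H, and Pitches/Start are simple enough to transcribe directly into Python. The function names are hypothetical, and pa is the estimated PA from the formula above.

```python
def guess_future(era_est, k, ip):
    """G-F (self-described junk stat): 4.46 + .095*eRA - .113*(K*9/IP)."""
    return 4.46 + 0.095 * era_est - 0.113 * (k * 9 / ip)


def pct_h(h, hr, pa, k, w):
    """%H: non-HR hits per ball in play, using the estimated PA."""
    return (h - hr) / (pa - hr - k - w)


def pitches_per_start(pitches, g, gs):
    """P/S with each relief appearance counted as half a start."""
    return pitches / (0.5 * g + 0.5 * gs)


# Hypothetical pitcher: 32 starts plus 2 relief outings, 3300 pitches
print(pitches_per_start(3300, 34, 32))
```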

I've used a stat called Relief Run Average (RRA) in the past, based on Sky Andrecheck's article in the August 1999 By the Numbers; that one only used inherited runners, but I've revised it to include bequeathed runners as well, making it equally applicable to starters and relievers. I am using RRA as the building block for baselined value estimates for all pitchers this year. I explained RRA in this article, but the bottom line formulas are:

BRSV = BRS - BR*i*sqrt(PF)
IRSV = IR*i*sqrt(PF) - IRS
RRA = ((R - (BRSV + IRSV))*9/IP)/PF
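A Python sketch of RRA. The symbols are not all defined in this post, so two readings here are assumptions: BR/BRS and IR/IRS are taken as bequeathed and inherited runners and how many of each scored, and i is taken as the league average rate at which inherited runners score.

```python
import math


def relief_run_average(r, ip, pf, br, brs, ir, irs, i):
    """RRA: runs allowed corrected for bequeathed and inherited runners, per 9 IP, over PF.

    br/brs: bequeathed runners and how many of them scored (assumed reading)
    ir/irs: inherited runners and how many of them scored (assumed reading)
    i: assumed league rate at which inherited runners score
    """
    root = math.sqrt(pf)
    brsv = brs - br * i * root  # bequeathed runners scoring above expectation
    irsv = ir * i * root - irs  # inherited runners stranded beyond expectation
    return ((r - (brsv + irsv)) * 9 / ip) / pf


# Hypothetical reliever: 20 R in 60 IP, stranded more inherited runners than expected
print(round(relief_run_average(20, 60, 1.0, 0, 0, 30, 4, 0.3), 2))
```

The signs work in the same direction for both terms: a pitcher gets credit when his bequeathed runners score more often than expected (the bullpen's fault) or when he strands more inherited runners than expected.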

The two baselined stats are Runs Above Average (RAA) and Runs Above Replacement (RAR). RAA uses the league average runs/game (N) for both starters and relievers, while RAR uses separate replacement levels for starters and relievers. Thus, RAA and RAR will be pretty close for relievers:

RAA = (N - RRA)*IP/9
RAR (relievers) = (1.11*N - RRA)*IP/9
RAR (starters) = (1.28*N - RRA)*IP/9
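The pitcher baselines in Python, a direct transcription with hypothetical function names. N is the league runs per game, and the replacement multipliers are 1.28 for starters and 1.11 for relievers.

```python
def pitcher_raa(n, rra, ip):
    """Runs above average: (N - RRA)*IP/9."""
    return (n - rra) * ip / 9


def pitcher_rar(n, rra, ip, starter):
    """Runs above replacement, with separate baselines for starters and relievers."""
    level = 1.28 if starter else 1.11
    return (level * n - rra) * ip / 9


# Hypothetical starter: 3.00 RRA over 180 IP in a 4.20 N league
print(round(pitcher_raa(4.2, 3.0, 180), 2), round(pitcher_rar(4.2, 3.0, 180, starter=True), 2))
```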

All players with 300 or more plate appearances are included in the Hitters spreadsheets (along with some players close to the cutoff point who I was interested in). Each is assigned one position, the one at which they appeared in the most games. The statistics presented are: Games played (G), Plate Appearances (PA), Outs (O), Batting Average (BA), On Base Average (OBA), Slugging Average (SLG), Secondary Average (SEC), Runs Created (RC), Runs Created per Game (RG), Speed Score (SS), Hitting Runs Above Average (HRAA), Runs Above Average (RAA), Hitting Runs Above Replacement (HRAR), and Runs Above Replacement (RAR).

I do not bother to include hit batters, so take note of that for players who do get plunked a lot. Therefore, PA are simply AB + W. Outs are AB - H + CS. BA and SLG you know, but remember that without HB and SF, OBA is just (H + W)/(AB + W). Secondary Average = (TB - H + W)/AB = SLG - BA + (OBA - BA)/(1 - OBA). I have not included net steals as many people (and Bill James himself) do--it is solely hitting events.
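The hitter basics in Python, under the same no-HB, no-SF conventions. The function name and dict keys are hypothetical. Both forms of Secondary Average agree because (OBA - BA)/(1 - OBA) simplifies to W/AB.

```python
def hitter_basics(ab, h, w, tb, cs):
    """PA, outs, and rate stats without hit batters or sacrifice flies."""
    pa = ab + w
    outs = ab - h + cs
    ba = h / ab
    oba = (h + w) / pa
    slg = tb / ab
    sec = (tb - h + w) / ab  # = SLG - BA + (OBA - BA)/(1 - OBA)
    return {"PA": pa, "O": outs, "BA": ba, "OBA": oba, "SLG": slg, "SEC": sec}


# Hypothetical hitter
print(hitter_basics(ab=500, h=150, w=60, tb=250, cs=5))
```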

BA, OBA, and SLG are park-adjusted by dividing by the square root of PF. This is an approximation, of course, but I'm satisfied that it works well. The goal here is to adjust for the win value of offensive events, not to quantify the exact park effect on the given rate. I use the BA/OBA/SLG-based formula to figure SEC, so it is park-adjusted as well.

Runs Created is actually Paul Johnson's ERP, more or less. Ideally, I would use a custom linear weights formula for the given league, but ERP is just so darn simple and close to the mark that it’s hard to pass up. I still use the term “RC” partially as a homage to Bill James (seriously, I really like and respect him even if I’ve said negative things about RC and Win Shares), and also because it is just a good term. I like the thought put in your head when you hear “creating” a run better than “producing”, “manufacturing”, “generating”, etc. to say nothing of names like “equivalent” or “extrapolated” runs. None of that is said to put down the creators of those methods--there just aren’t a lot of good, unique names available. Anyway, RC = (TB + .8H + W + .7SB - CS - .3AB)*.322.

RC is park adjusted by dividing by PF, making all of the value stats that follow park adjusted as well. RG, the rate, is RC/O*25.5. I do not believe that outs are the proper denominator for an individual rate stat, but I also do not believe that the distortions caused are that bad. (I still intend to finish my rate stat series and discuss all of the options in excruciating detail, but alas you’ll have to take my word for it now).
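The RC (ERP) formula and RG in Python. The function names are hypothetical, and pf defaults to 1 for an unadjusted figure.

```python
def runs_created(tb, h, w, sb, cs, ab, pf=1.0):
    """RC, essentially Paul Johnson's ERP, park-adjusted by dividing by PF."""
    return (tb + 0.8 * h + w + 0.7 * sb - cs - 0.3 * ab) * 0.322 / pf


def runs_created_per_game(rc, outs):
    """RG = RC/O*25.5"""
    return rc / outs * 25.5


# Hypothetical hitter: 250 TB, 150 H, 60 W, 10 SB, 5 CS, 500 AB (355 outs)
rc = runs_created(250, 150, 60, 10, 5, 500)
print(round(rc, 1), round(runs_created_per_game(rc, 355), 2))
```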

I have decided to switch to a watered-down version of Bill James' Speed Score this year; I only use four of his categories. Previously I used my own knockoff version called Speed Unit, but trying to keep it from breaking down every few years was a wasted effort.

Speed Score is the average of four components, which I'll call a, b, c, and d:

a = ((SB + 3)/(SB + CS + 7) - .4)*20
b = sqrt((SB + CS)/(S + W))*14.3
c = ((R - HR)/(H + W - HR) - .1)*25
d = T/(AB - HR - K)*450

James actually uses a sliding scale for the triples component, but it strikes me as needlessly complex and so I've streamlined it. I also changed some of his division to mathematically equivalent multiplications.
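A Python sketch of this four-component Speed Score. Two assumptions are worth flagging: S in component b is read as singles, and because the text does not mention bounding the individual components, none is applied here.

```python
import math


def speed_score(sb, cs, singles, w, r, hr, h, triples, ab, k):
    """Average of four Speed Score components (no component capping applied)."""
    a = ((sb + 3) / (sb + cs + 7) - 0.4) * 20        # stolen base success
    b = math.sqrt((sb + cs) / (singles + w)) * 14.3  # steal attempt frequency
    c = ((r - hr) / (h + w - hr) - 0.1) * 25         # runs scored per time on base
    d = triples / (ab - hr - k) * 450                # triples per ball in play
    return (a + b + c + d) / 4


# Hypothetical hitter
print(round(speed_score(20, 5, 100, 60, 80, 15, 150, 5, 550, 100), 2))
```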

There are a whopping four categories that compare to a baseline; two for average, two for replacement. Hitting RAA compares to a league average hitter; it is in the vein of Pete Palmer’s Batting Runs. RAA compares to an average hitter at the player’s primary position. Hitting RAR compares to a “replacement level” hitter; RAR compares to a replacement level hitter at the player’s primary position. The formulas are:

HRAA = (RG - N)*O/25.5
RAA = (RG - N*PADJ)*O/25.5
HRAR = (RG - .73*N)*O/25.5
RAR = (RG - .73*N*PADJ)*O/25.5

PADJ is the position adjustment, and it is based on 2002-2011 offensive data. For catchers it is .89; for 1B/DH, 1.17; for 2B, .97; for 3B, 1.03; for SS, .93; for LF/RF, 1.13; and for CF, 1.02. I had been using the 1992-2001 data as a basis for the last ten years, but finally have done an update. I’m a little hesitant about this update, as the middle infield positions are the biggest movers (higher positional adjustments, meaning less positional credit). I have no qualms about second base, but the shortstop PADJ is out of line with the other position adjustments widely in use and feels a bit high to me. But there are some decent points to be made in favor of offensive adjustments, and I’ll have a bit more on this topic in general below.
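The four hitter baselines together with the 2002-2011 PADJ values in Python. The dictionary keys and function name are hypothetical.

```python
POSITION_ADJUSTMENT = {
    "C": 0.89, "1B": 1.17, "DH": 1.17, "2B": 0.97, "3B": 1.03,
    "SS": 0.93, "LF": 1.13, "RF": 1.13, "CF": 1.02,
}


def hitter_baselines(rg, outs, n, position):
    """HRAA, RAA, HRAR and RAR from RG, outs, league N and primary position."""
    padj = POSITION_ADJUSTMENT[position]
    scale = outs / 25.5
    return {
        "HRAA": (rg - n) * scale,              # vs. league average hitter
        "RAA": (rg - n * padj) * scale,        # vs. average hitter at the position
        "HRAR": (rg - 0.73 * n) * scale,       # vs. replacement hitter, any position
        "RAR": (rg - 0.73 * n * padj) * scale, # vs. replacement hitter at the position
    }


# Hypothetical shortstop: 6.0 RG over 400 outs in a 4.2 N league
print({key: round(value, 1) for key, value in hitter_baselines(6.0, 400, 4.2, "SS").items()})
```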

That was the mechanics of the calculations; now I'll twist myself into knots trying to justify them. If you only care about the how and not the why, stop reading now.

The first thing that should be covered is the philosophical position behind the statistics posted here. They fall on the continuum of ability and value in what I have called "performance". Performance is a technical-sounding way of saying "Whatever arbitrary combination of ability and value I prefer".

With respect to park adjustments, I am not interested in how any particular player is affected, so there is no separate adjustment for lefties and righties for instance. The park factor is an attempt to determine how the park affects run scoring rates, and thus the win value of runs.

I apply the park factor directly to the player's statistics, but it could also be applied to the league context. The advantage to doing it my way is that it allows you to compare the component statistics (like Runs Created or OBA) on a park-adjusted basis. The drawback is that it creates a new theoretical universe, one in which all parks are equal, rather than leaving the player grounded in the actual context in which he played and evaluating how that context (and not the player's statistics) was altered by the park.

The good news is that the two approaches are essentially equivalent; in fact, they are equivalent if you assume that the Runs Per Win factor is equal to the RPG. Suppose that we have a player in an extreme park (PF = 1.15, approximately like Coors Field pre-humidor) who has an 8 RG before adjusting for park, while making 350 outs in a 4.5 N league. The first method of park adjustment, the one I use, converts his value into a neutral park, so his RG is now 8/1.15 = 6.957. We can now compare him directly to the league average:

RAA = (6.957 - 4.5)*350/25.5 = +33.72

The second method would be to adjust the league context. If N = 4.5, then the average player in this park will create 4.5*1.15 = 5.175 runs. Now, to figure RAA, we can use the unadjusted RG of 8:

RAA = (8 - 5.175)*350/25.5 = +38.77

These are not the same, as you can obviously see. The reason for this is that they take place in two different contexts. The first figure is in a 9 RPG (2*4.5) context; the second figure is in a 10.35 RPG (2*4.5*1.15) context. Runs have different values in different contexts; that is why we have RPW converters in the first place. If we convert to WAA (using RPW = RPG, which is only an approximation, so it's usually not as tidy as it appears below), then we have:

WAA = 33.72/9 = +3.75
WAA = 38.77/10.35 = +3.75

Once you convert to wins, the two approaches are equivalent. The other nice thing about the first approach is that once you park-adjust, everyone in the league is in the same context, and you can dispense with the need for converting to wins at all. You still might want to convert to wins, and you'll need to do so if you are comparing the 2013 players to players from other league-seasons (including between the AL and NL in the same year), but if you are only looking to compare Jose Bautista to Miguel Cabrera, it's not necessary. WAR is somewhat ubiquitous now, but personally I prefer runs when possible--why mess with decimal points if you don't have to?
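The worked example can be checked in a few lines of Python. The variable names are illustrative. Algebraically, dividing the second RAA by 2*N*PF gives (RG/PF - N)*O/25.5/(2*N), which is exactly the first WAA.

```python
pf, rg_raw, outs, n = 1.15, 8.0, 350, 4.5

# Approach 1: park-adjust the player, compare to the league (RPG = 2*N)
raa_player_adjusted = (rg_raw / pf - n) * outs / 25.5
waa_player_adjusted = raa_player_adjusted / (2 * n)

# Approach 2: leave the player raw, inflate the league context (RPG = 2*N*PF)
raa_context_adjusted = (rg_raw - n * pf) * outs / 25.5
waa_context_adjusted = raa_context_adjusted / (2 * n * pf)

print(round(raa_player_adjusted, 2), round(raa_context_adjusted, 2))  # 33.72 38.77
print(round(waa_player_adjusted, 2), round(waa_context_adjusted, 2))  # 3.75 3.75
```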

The park factors used to adjust player stats here are run-based. Thus, they make no effort to project what a player "would have done" in a neutral park, or account for the different effects parks have on specific events (walks, home runs, BA) or types of players. They simply account for the difference in run environment that is caused by the park (as best I can measure it). As such, they don't evaluate a player within the actual run context of his team's games; they attempt to restate the player's performance as an equivalent performance in a neutral park.

I suppose I should also justify the use of sqrt(PF) for adjusting component statistics. The classic defense given for this approach relies on basic Runs Created--runs are proportional to OBA*SLG, and OBA*SLG/PF = OBA/sqrt(PF)*SLG/sqrt(PF). While RC may be an antiquated tool, you will find that the square root adjustment is fairly compatible with linear weights or Base Runs as well. I am not going to take the space to demonstrate this claim here, but I will some time in the future.

Many value figures published around the sabersphere adjust for the difference in quality level between the AL and NL. I don't, but this is a thorny area where there is no right or wrong answer as far as I'm concerned. I also do not make an adjustment in the league averages for the fact that the overall NL averages include pitcher batting and the AL does not (not quite true in the era of interleague play, but you get my drift).

The difference between the leagues may not be precisely calculable, and it certainly is not constant, but it is real. If the average player in the AL is better than the average player in the NL, it is perfectly reasonable to expect the average AL player to have more RAR than the average NL player, and that will not happen without some type of adjustment. On the other hand, if you are only interested in evaluating a player relative to his own league, such an adjustment is not necessarily welcome.

The league argument only applies cleanly to metrics baselined to average. Since replacement level compares the given player to a theoretical player that can be acquired on the cheap, the same pool of potential replacement players should by definition be available to the teams of each league. One could argue that if the two leagues don't have equal talent at the major league level, they might not have equal access to replacement level talent--except such an argument is at odds with the notion that replacement level represents talent that is truly "freely available".

So it's hard to justify the approach I take, which is to set replacement level relative to the average runs scored in each league, with no adjustment for the difference in the leagues. The best justification is that it's simple and it treats each league as its own universe, even if in reality they are connected.

The replacement levels I have used here are very much in line with the values used by other sabermetricians. This is based on my own "research", my interpretation of other people's research, and a desire to not stray from consensus and make the values unhelpful to the majority of people who may encounter them.

Replacement level is certainly not settled science. There is always going to be room to disagree on what the baseline should be. Even if you agree it should be "replacement level", any estimate of where it should be set is just that--an estimate. Average is clean and fairly straightforward, even if its utility is questionable; replacement level is inherently messy. So I offer the average baseline as well.

For position players, replacement level is set at 73% of the positional average RG (since there's a history of discussing replacement level in terms of winning percentages, this is roughly equivalent to .350). For starting pitchers, it is set at 128% of the league average RA (.380), and for relievers it is set at 111% (.450).
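Here is a quick check of the W% equivalents in Python. It assumes a plain Pythagorean exponent of 2, which is an assumption: Pythagenpat would shift the figures slightly depending on the run environment. A replacement offense pairs 73% of average runs scored with average runs allowed, while a replacement pitcher pairs average runs scored with 128% or 111% of average runs allowed.

```python
def ratio_to_wpct(ratio: float, exponent: float = 2.0) -> float:
    """Pythagorean W% for a ratio of runs scored to runs allowed (exponent 2 assumed)."""
    return ratio ** exponent / (ratio ** exponent + 1)


hitters = ratio_to_wpct(0.73)        # replacement offense, average defense
starters = ratio_to_wpct(1 / 1.28)   # average offense, replacement starter
relievers = ratio_to_wpct(1 / 1.11)  # average offense, replacement reliever
print(round(hitters, 3), round(starters, 3), round(relievers, 3))
```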

I am still using an analytical structure that makes the comparison to replacement level for a position player by applying it to his hitting statistics. This is the approach taken by Keith Woolner in VORP (and some other earlier replacement level implementations), but the newer metrics (among them Rally and Fangraphs' WAR) handle replacement level by subtracting a set number of runs from the player's total runs above average in a number of different areas (batting, fielding, baserunning, positional value, etc.), which for lack of a better term I will call the subtraction approach.

The offensive positional adjustment makes the inherent assumption that the average player at each position is equally valuable. I think that this is close to being true, but it is not quite true. The ideal approach would be to use a defensive positional adjustment, since the real difference between a first baseman and a shortstop is their defensive value. When you bat, all runs count the same, whether you create them as a first baseman or as a shortstop.

That being said, using "replacement hitter at position" does not cause too many distortions. It is not theoretically correct, but it is practically powerful. For one thing, most players, even those at key defensive positions, are chosen first and foremost for their offense. Empirical work by Keith Woolner has shown that the replacement level hitting performance is about the same for every position, relative to the positional average.

Figuring what the defensive positional adjustment should be, though, is easier said than done. Therefore, I use the offensive positional adjustment. So if you want to criticize that choice, or criticize the numbers that result, be my guest. But do not claim that I am holding this up as the correct analytical structure. I am holding it up because it is the simplest and most straightforward structure that conforms to reality reasonably well, and because, while the numbers may be flawed, they are at least based on an objective formula that I can figure myself. If you feel comfortable with some other assumptions, please feel free to ignore mine.

That still does not justify the use of HRAR--hitting runs above replacement--which compares each hitter, regardless of position, to 73% of the league average. Basically, this is just a way to give an overall measure of offensive production without regard for position with a low baseline. It doesn't have any real baseball meaning.

A player who creates runs at 90% of the league average could be above-average (if he's a shortstop or catcher, or a great fielder at a less important fielding position), or sub-replacement level (DHs that create 4 runs a game are not valuable properties). Every player is chosen because his total value, both hitting and fielding, is sufficient to justify his inclusion on the team. HRAR fails even if you try to justify it with a thought experiment about a world in which defense doesn't matter, because in that case the absolute replacement level (in terms of RG, without accounting for the league average) would be much higher than it is currently.

The specific positional adjustments I use are based on 1992-2001 data. There's no particular reason for not updating them; at the time I started using them, they represented the ten most recent years. I have stuck with them because I have not seen compelling evidence of a change in the relative difficulty or scarcity of the positions since then, and because I think they are fairly reasonable. The positions for which they diverge the most from the defensive position adjustments in common use are 2B, 3B, and CF. Second base is considered a premium position by the offensive PADJ (.94), while third base and center field are both neutral (1.01 and 1.02).
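The PADJ values quoted here, plus the .86 shortstop figure used in the Alpha/Bravo example later in this post, translate directly into a replacement-level RG for each position. This is a sketch only; positions whose PADJ is not quoted in this section are left out rather than guessed.

```python
# PADJ values quoted in this post; positions not quoted here are deliberately omitted.
PADJ = {"2B": 0.94, "SS": 0.86, "3B": 1.01, "CF": 1.02}
REPLACEMENT_FRACTION = 0.73  # replacement level relative to the positional average


def replacement_rg(position: str, lg_rg: float) -> float:
    """Replacement-level runs per game at a position: PADJ * .73 * league RG."""
    return PADJ[position] * REPLACEMENT_FRACTION * lg_rg


# Lower PADJ means a lower replacement bar, i.e. a more "premium" position.
for pos in sorted(PADJ, key=PADJ.get):
    print(f"{pos}: {replacement_rg(pos, 4.5):.3f} RG")
```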

Another flaw is that the PADJ is applied to the overall league average RG, which is artificially low for the NL because of pitchers' batting. When using the actual league average runs per game, it's tough to just remove pitchers--any adjustment would be an estimate. If you use the league total of runs created instead, the fix is much easier, since the pitchers' runs created and outs can simply be left out of the totals.
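A minimal sketch of that easier fix, assuming per-player runs created and outs are available and that pitchers can be identified by a position code of "P" (the data layout and labels are hypothetical): sum runs created and outs over non-pitchers only, then convert to runs per game.

```python
from dataclasses import dataclass

OUTS_PER_GAME = 25.5


@dataclass
class BattingLine:
    position: str  # primary position; "P" marks a pitcher (hypothetical convention)
    rc: float      # runs created
    outs: float    # outs made


def league_rg_excluding_pitchers(lines: list[BattingLine]) -> float:
    """League RG from summed runs created and outs, leaving out pitchers' batting."""
    hitters = [b for b in lines if b.position != "P"]
    total_rc = sum(b.rc for b in hitters)
    total_outs = sum(b.outs for b in hitters)
    return total_rc * OUTS_PER_GAME / total_outs


# Hypothetical three-player "league"
sample = [BattingLine("SS", 75, 380), BattingLine("P", 2, 60), BattingLine("CF", 85, 400)]
print(round(league_rg_excluding_pitchers(sample), 2))
```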

One other note on this topic is that since the offensive PADJ is a proxy for average defensive value by position, ideally it would be applied by tying it to defensive playing time. I have done it by outs, though.

The reason I have taken this flawed path is that 1) it ties the position adjustment directly into the RAR formula rather than leaving it as something to subtract on the outside and, more importantly, 2) there’s no straightforward way to do it properly. The best would be to use defensive innings--set the full-time player to X defensive innings, figure how Derek Jeter’s innings compared to X, and adjust his PADJ accordingly. Games in the field or games played are dicey because they can cause distortion for defensive replacements. Plate appearances avoid the problem that outs have of being highly related to player quality, but they still carry the illogic of basing it on offensive playing time. And of course the differences here are going to be fairly small (a few runs). That is not to say that this way is preferable, but it’s not horrible either, at least as far as I can tell.

To compare this approach to the subtraction approach, start by assuming that a replacement level shortstop would create .86*.73*4.5 = 2.825 RG (or would perform at an overall level of equivalent value to being an average fielder at shortstop while creating 2.825 runs per game). Suppose that we are comparing two shortstops, each of whom compiled 600 PA and played an equal number of defensive games and innings (and thus would have the same positional adjustment using the subtraction approach). Alpha made 380 outs and Bravo made 410 outs, and each ranked as dead-on average in the field.

The difference in overall RAR between the two using the subtraction approach would be equal to the difference between their offensive RAA compared to the league average. Assuming the league average is 4.5 runs, and that both Alpha and Bravo created 75 runs, their offensive RAAs are:

Alpha = (75*25.5/380 - 4.5)*380/25.5 = +7.94

Similarly, Bravo is at +2.65, and so the difference between them will be 5.29 RAR.
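The subtraction-approach arithmetic for Alpha and Bravo can be checked with a few lines of Python. This sketch simply restates the formula above, with 25.5 outs per game as in the example; the function name is illustrative.

```python
OUTS_PER_GAME = 25.5


def offensive_raa(rc: float, outs: float, lg_rg: float) -> float:
    """Runs above average: (RG - league RG) * games, where games = outs / 25.5."""
    rg = rc * OUTS_PER_GAME / outs
    return (rg - lg_rg) * outs / OUTS_PER_GAME


alpha = offensive_raa(75, 380, 4.5)
bravo = offensive_raa(75, 410, 4.5)
print(f"Alpha {alpha:+.2f}, Bravo {bravo:+.2f}, difference {alpha - bravo:.2f}")
```

Running it reproduces the +7.94, +2.65, and 5.29 figures.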

Using the flawed approach, Alpha's RAR will be:

(75*25.5/380 - 4.5*.73*.86)*380/25.5 = +32.90

Bravo's RAR will be +29.58, a difference of 3.32 RAR, which is two runs off of the difference using the subtraction approach.
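The same check for the out-based positional RAR, using the .86 shortstop PADJ and the 73% replacement fraction from the example (function and constant names are illustrative):

```python
OUTS_PER_GAME = 25.5
REPLACEMENT_FRACTION = 0.73
SS_PADJ = 0.86


def positional_rar(rc: float, outs: float, lg_rg: float, padj: float) -> float:
    """RAR vs. a replacement at the position, whose RG is padj * .73 * league RG."""
    rg = rc * OUTS_PER_GAME / outs
    replacement = padj * REPLACEMENT_FRACTION * lg_rg
    return (rg - replacement) * outs / OUTS_PER_GAME


alpha = positional_rar(75, 380, 4.5, SS_PADJ)
bravo = positional_rar(75, 410, 4.5, SS_PADJ)
print(f"Alpha {alpha:+.2f}, Bravo {bravo:+.2f}, difference {alpha - bravo:.2f}")
```

Because both players created 75 runs, the gap between them is just the baseline RG times their 30-out difference divided by 25.5: 4.5*30/25.5 = 5.29 against the league average, but only 2.825*30/25.5 = 3.32 against the shortstop replacement level. A lower baseline charges less for each extra out, which is where the roughly two-run discrepancy comes from.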

The downside to using PA is that you really need to consider park effects if you do, whereas outs allow you to sidestep park effects. Outs are constant; plate appearances are linked to OBA. Thus, they depend not only on the offensive context (including park factor), but also on the quality of one's team. Of course, attempting to adjust for team PA differences opens a huge can of worms which is not really relevant; for now, the point is that using outs for individual players causes distortions, sometimes trivial and sometimes bothersome, but almost always makes one's life easier.

I do not include fielding (or baserunning outside of steals, although that is a trivial consideration in comparison) in the RAR figures--they cover offense and positional value only. This in no way means that I do not believe that fielding is an important consideration in player valuation. However, two of the key principles of these stat reports are 1) not incorporating any data that is not readily available and 2) not simply including other people's results (of course I borrow heavily from other people's methods, but only methodology that I can apply myself).

Any fielding metric worth its salt will fail to meet either criterion--they use zone data or play-by-play data which I do not have easy access to. I do not have a fielding metric that I have stapled together myself, and so I would have to simply lift other analysts' figures.

Setting the practical reason for not including fielding aside, I do have some reservations about lumping fielding and hitting value together in one number because of the obvious differences in reliability between offensive and fielding metrics. In theory, they absolutely should be put together. But in practice, I believe it would be better to regress the fielding metric to a point at which it would be roughly equivalent in reliability to the offensive metric.

Offensive metrics have error bars associated with them, too, of course, and in evaluating a single season's value, I don't care about the vagaries that we often lump together as "luck". Still, there are errors in our assessment of linear weight values; there are players who collect an unusual proportion of infield hits or hits to the left side; there are errors in the estimation of park factors; and there are any number of other factors that make a player's events more or less valuable than an average event of that type.

Fielding metrics offer up all of that and more, as we cannot be nearly as certain of true successes and failures as we are when analyzing offense. Recent investigations, particularly by Colin Wyers, have raised even more questions about the level of uncertainty. So, even if I were including a fielding value, my approach would be to assume that the offensive value was 100% reliable (which it isn't), and regress the fielding metric relative to that (so if the offensive metric was actually 70% reliable, and the fielding metric 40% reliable, I'd treat the fielding metric as .4/.7 = 57% reliable when tacking it on, to illustrate with a simplified and completely made-up example presuming that one could have a precise estimate of nebulous "reliability").

Given the inherent assumption of the offensive PADJ that all positions are equally valuable, once RAR has been figured for a player, fielding value can be accounted for by adding on his runs above average relative to a player at his own position. If there is a shortstop who is -2 runs defensively versus an average shortstop, he is without a doubt a plus defensive player, and a more valuable defensive player than a first baseman who was 1 run better than an average first baseman. Regardless, since it was implicitly assumed that they are both average defensively for their position when RAR was calculated, the shortstop will see his value docked two runs. This DOES NOT MEAN that the shortstop has been penalized for his defense. The whole process of accounting for positional differences, going from hitting RAR to positional RAR, has benefited him.
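A sketch combining the reliability idea with the add-on step, under loudly stated assumptions: fielding runs are measured against an average fielder at the player's own position, "regressing" is taken to mean scaling them toward zero by the fielding/offense reliability ratio, and that ratio is capped at 1. The .4 and .7 reliabilities are the made-up figures from the example, and the 30-run RAR starting points are hypothetical.

```python
def fielding_weight(field_reliability: float, offense_reliability: float) -> float:
    """Fielding reliability relative to offense (assumption: capped at 1)."""
    return min(1.0, field_reliability / offense_reliability)


def total_value(positional_rar: float, fielding_runs_vs_pos: float,
                field_reliability: float = 1.0, offense_reliability: float = 1.0) -> float:
    """Positional RAR plus (optionally regressed) fielding runs vs. the player's own position."""
    weight = fielding_weight(field_reliability, offense_reliability)
    return positional_rar + weight * fielding_runs_vs_pos


# Unregressed: the shortstop 2 runs below an average SS is docked 2 runs,
# while the first baseman 1 run above an average 1B gains 1 run.
ss = total_value(30.0, -2.0)
first_base = total_value(30.0, 1.0)

# Regressed with the made-up reliabilities: weight = .4/.7, about .57
ss_regressed = total_value(30.0, -2.0, 0.4, 0.7)
```

The equal 30-run starting points are only for illustration; in practice a shortstop who hit exactly like a first baseman would already start with more RAR because of his lower PADJ, which is the benefit the paragraph above describes.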

I've found that there is often confusion about the treatment of first basemen and designated hitters in my PADJ methodology, since I consider DHs to be in the same pool as first basemen. The fact of the matter is that first basemen outhit DHs. There are any number of potential explanations for this; DHs are often old or injured, players hit worse when DHing than they do when playing the field, etc. This actually helps first basemen, since the DHs drag the average production of the pool down, thus resulting in a lower replacement level than I would get if I considered first basemen alone.

However, this method does assume that a 1B and a DH have equal defensive value. Obviously, a DH has no defensive value. What I advocate to correct this is to treat a DH as a bad defensive first baseman, and thus knock another five or ten runs off of his RAR for a full-time player. I do not incorporate this into the published numbers, but you should keep it in mind. However, there is no need to adjust the figures for first basemen upward--the only necessary adjustment is to take the DHs down a notch.
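A minimal sketch of the suggested DH correction, which the published numbers do not include. The 7.5-run default is simply the midpoint of the five-to-ten-run range, and prorating by outs against a 400-out "full-time" season is a hypothetical assumption, not something specified here.

```python
FULL_TIME_OUTS = 400.0  # hypothetical full-time playing-time baseline (assumption)


def dh_adjusted_rar(rar: float, position: str, outs: float,
                    full_time_penalty: float = 7.5) -> float:
    """Treat a DH as a bad defensive first baseman; first basemen are left unchanged."""
    if position != "DH":
        return rar
    share = outs / FULL_TIME_OUTS  # prorate the full-time penalty by playing time (assumption)
    return rar - full_time_penalty * share
```

Only DHs are touched, matching the point that first basemen need no upward adjustment.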

Finally, I consider each player at his primary defensive position (defined as where he appears in the most games), and do not weight the PADJ by playing time. This does shortchange a player like Ben Zobrist (who saw significant time at a tougher position than his primary position), and unduly boost a player like Joe Mauer (who logged a lot of games at a much easier position than his primary position). For most players, though, it doesn't matter much. I find it preferable to make manual adjustments for the unusual cases rather than add another layer of complexity to the whole endeavor.
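The primary-position rule is easy to express in code. This sketch uses a hypothetical utility player rather than real data; the games-weighted PADJ is included only to show the alternative that is not used here, and ties in games are broken by whichever position is listed first (an assumption).

```python
PADJ = {"2B": 0.94, "SS": 0.86}  # values quoted in this post


def primary_position(games_by_pos: dict[str, int]) -> str:
    """Position with the most games; ties go to the first one listed (assumption)."""
    return max(games_by_pos, key=games_by_pos.get)


def weighted_padj(games_by_pos: dict[str, int]) -> float:
    """Games-weighted PADJ: the alternative that is NOT used for the published figures."""
    total = sum(games_by_pos.values())
    return sum(PADJ[pos] * g for pos, g in games_by_pos.items()) / total


utility = {"2B": 90, "SS": 60}  # hypothetical player
print(primary_position(utility), PADJ[primary_position(utility)], round(weighted_padj(utility), 3))
```

Like the Zobrist case, this player gets the 2B adjustment of .94 and no credit for his 60 games at shortstop, where a games-weighted figure would be about .908.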

2013 Leagues

2013 Teams

2013 Team Offense

2013 Team Defense

2013 AL Relievers

2013 NL Relievers

2013 AL Starters

2013 NL Starters

2013 AL Hitters

2013 NL Hitters
