Tuesday, July 26, 2016

The Willie Davis Method and OPS+ Park Factors

This post is going to lay out the math necessary to apply the so-called "Willie Davis method" of Bill James to his Runs Created and to Base Runs. The last three posts have explained how you can use it with Linear Weights. This is offered in the interest of comprehensiveness, should you decide that you’d like to fiddle with this stuff yourself.

The Willie Davis method as explained by Bill James is based on the most basic RC formula, (H + W)*TB/(AB + W). You could use one of the technical RC versions too, of course, but then you would introduce the problem of what to do with all of the ancillary categories that are included in those versions. A minor modification that would help matters is to give a weight to walks in the B factor (which is simply TB in the basic version), but James has never done that as it would complicate the basic version and mess up all of the neat little RC properties like OBA*SLG*AB = runs.

While I tried to emphasize that I wouldn’t take any of the results from the linear weight translations too seriously, the output of the Willie Davis method is actually used by Sean Forman to calculate OPS+ at baseball-reference.com. So while James used it in the vein that I advocate, Forman uses it to park-adjust the most-looked-at total offensive statistic at his site. For this reason, later in the post I’ll compare park-adjusted OPS+ figured by his method to what I would do.

To apply the Willie Davis method to RC, first define a = 1 + W/H, b = TB/H, and outs as AB-H. You also need to calculate New RC, which I will abbreviate as N. That is just regular RC times the adjustment factor you are using (in a park case, if the PF is 1.05 then N is RC*1.05). Then this relationship holds:

N = (a*H)*(b*H)/(a*H + Outs)

This can be manipulated into a quadratic equation:

abH^2 - NaH - N*Outs = 0

And then we can use the quadratic formula to solve for H, which we’ll call H’:

H' = (Na + sqrt((Na)^2 + 4ab(N*Outs)))/(2ab)

The adjustment factor for all of the basic components (S, D, T, HR, W with outs staying fixed) is simply H'/H. So we multiply the positive events by H'/H and the result is a translated batting line.
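For anyone who would rather not do the algebra by hand, here is a minimal Python sketch of the RC version described above. The function names and the sample batting line are made up for illustration, and `factor` is whatever multiple of the original RC you want the translated line to produce.

```python
from math import sqrt


def basic_rc(s, d, t, hr, w, ab):
    """Basic Runs Created: (H + W) * TB / (AB + W)."""
    hits = s + d + t + hr
    tb = s + 2 * d + 3 * t + 4 * hr
    return (hits + w) * tb / (ab + w)


def willie_davis_rc(s, d, t, hr, w, ab, factor):
    """Scale the positive events so Basic RC becomes RC * factor, outs fixed."""
    hits = s + d + t + hr
    tb = s + 2 * d + 3 * t + 4 * hr
    outs = ab - hits
    a = 1 + w / hits          # (H + W) per hit
    b = tb / hits             # total bases per hit
    n = basic_rc(s, d, t, hr, w, ab) * factor  # New RC

    # Positive root of ab*H^2 - N*a*H - N*Outs = 0
    new_hits = (n * a + sqrt((n * a) ** 2 + 4 * a * b * n * outs)) / (2 * a * b)
    k = new_hits / hits       # multiplier for S, D, T, HR, W

    return {
        "S": k * s, "D": k * d, "T": k * t, "HR": k * hr, "W": k * w,
        "AB": new_hits + outs,  # outs are held constant
        "multiplier": k,
    }


# Hypothetical batting line, translated to 90% of its original RC
line = willie_davis_rc(100, 30, 5, 25, 60, 550, 0.9)
print(round(line["multiplier"], 4))
```

Feeding the translated line back into `basic_rc` should return the original RC times `factor`, which is a handy check that the root was picked correctly.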

Since we have applied this type of approach to RC and LW, we might as well do it for Base Runs as well. Allow me to start with this particular BsR equation, published some time ago by David Smyth:

A = S + D + T + W
B = .78S + 2.34D + 3.9T + 2.34HR + .039W
C = AB - H = outs
D = HR

BsR is of course A*B/(B + C) + D, and New BsR (N) is BsR*adjustment factor. To write everything in terms of singles, let’s define a, b, and c (of course, I didn’t realize until after I wrote this that a, b, and c are terrible abbreviations in this case, but I already had them in my spreadsheet and it would have been a real pain to change everything):

a = (S + D + T + W)/S

b = (.78S + 2.34D + 3.9T + 2.34HR + .039W)/S

c = HR/S

Then we need to solve for S' (the new number of singles) in this equation:

aS'*bS'/(bS' + Outs) + cS' = N

This results in a quadratic equation just as the RC approach does, and it can be solved:

S' = (Nb - cOuts + sqrt((cOuts - Nb)^2 + 4(NOuts)(ab + bc)))/(2*(ab + bc))

S'/S is then the multiplier for all of the positive events.
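The same idea can be sketched in Python for the Base Runs version, using the Smyth coefficients listed above. As before, the function names and the sample line are hypothetical, and `factor` is the multiple of the original BsR you want the new line to be worth.

```python
from math import sqrt


def base_runs(s, d, t, hr, w, outs):
    """BsR = A*B/(B + C) + D with the coefficients quoted in the post."""
    a = s + d + t + w
    b = 0.78 * s + 2.34 * d + 3.9 * t + 2.34 * hr + 0.039 * w
    return a * b / (b + outs) + hr


def willie_davis_bsr(s, d, t, hr, w, outs, factor):
    """Scale the positive events so BsR becomes BsR * factor, outs fixed."""
    n = base_runs(s, d, t, hr, w, outs) * factor  # New BsR

    # Everything expressed per single
    a = (s + d + t + w) / s
    b = (0.78 * s + 2.34 * d + 3.9 * t + 2.34 * hr + 0.039 * w) / s
    c = hr / s

    # (ab + bc)*S'^2 + (c*Outs - N*b)*S' - N*Outs = 0, positive root
    quad = a * b + b * c
    lin = c * outs - n * b
    new_s = (-lin + sqrt(lin ** 2 + 4 * quad * n * outs)) / (2 * quad)
    k = new_s / s             # multiplier for S, D, T, HR, W

    return {"S": k * s, "D": k * d, "T": k * t, "HR": k * hr, "W": k * w,
            "multiplier": k}


# Hypothetical batting line with 390 outs, translated to 1/1.23 of its BsR
line = willie_davis_bsr(100, 30, 5, 25, 60, 390, 1 / 1.23)
print(round(line["multiplier"], 4))
```

Using 1/1.23 as the factor here is my assumption about how one would take a hitter out of a 1.23 park; the function itself does not care which direction you translate.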

So we have three different approaches based on three different run estimators to accomplish the same task. Which one should be used? Unfortunately, there’s no good empirical way to test these approaches; the entire point of having them is to make estimates of equivalent value under different conditions…i.e. conditions that did not occur in reality.

However, I think it should be self-evident that the quality of the model from which the estimate is derived says a lot about its value. I don’t need to beat that horse again, but it is well-known that Basic RC is not a very good estimator when applied to individuals, which is exactly what we are doing here.

It would also follow that the Linear Weights-based approach should be marginally better than the Base Runs-based approach since BsR should not be applied directly to individuals. Since BsR is better constructed than RC, though, the discrepancies shouldn’t be as bothersome.

I am going to use the three approaches to derive park-adjusted BA, OBA, and SLG for the 1995 Rockies. In all of the calculations, I am using a 1.23 park factor for Coors Field. The unshaded columns are the players’ raw, unadjusted numbers; the pink columns are adjusted by the RC approach, orange by the ERP approach, and yellow by the BsR approach:

From eyeballing the numbers, I’d say that there is a strong degree of agreement between the ERP and BsR estimates, with the RC estimates as the outliers. As mentioned above, this is along the lines of what I would have expected to see, as both ERP and BsR are better models of the run scoring process than RC. That the ERP and BsR results are close should not come as a surprise, as both estimators give similar weight to each event.

Using RC results in a less severe park adjustment for most players. Why is this? My guess is that it is because RC, with its well-known flaw of overvaluing high-end performance, naturally needs to draw down the player’s OBA and SLG less than ERP or BsR to still maintain a high performance. In other words, RC overestimates Larry Walker’s run contribution to begin with, and since the problem only gets worse as OBA and SLG increase, it doesn’t take that big of a change in OBA or SLG to reduce run value by X%.

As I mentioned earlier, I think it is worth looking at the Willie Davis method closely since some sources (particularly Baseball-Reference) use it for serious things like park-adjusting OPS+. This is in contrast to the position of its creator, Bill James, who presented it more as a toy that yields a rough estimate of what an equal value performance would look like in a different environment.

So, here are the OPS+ figures for the 1995 Rockies figured seven different ways. Let me note off the bat that I am using OBA = (H + W)/(AB + W); for this reason and the fact that I am using my own Coors PF, we should not anticipate exact agreement between these OPS+ results and the ones on Baseball-Reference. The league OBA in the 1995 NL was .328, and SLG was .408, so the basic formula for OPS+ is:

OPS+ = 100*(OBA/.328 + SLG/.408 - 1)

The first column of the table, "unadj", uses the player’s raw stats with no park adjustment. The second column, "trad", reflects the traditional method of figuring OPS+ used by Pete Palmer in The Hidden Game of Baseball, Total Baseball, and the ESPN Baseball Encyclopedia: simply divide OPS+ by the runs park factor (1.23 in this case).

The third column, "sqrt", adjusts OBA and SLG separately by dividing each by the square root of park factor, and uses these adjusted figures in the OPS+ formula above (*). The fourth column, "reg", uses the runs factor to estimate an OPS+ park factor based on a regression equation that relates OPS+ to adjusted runs/out (this is covered in the digression as well).

Finally there are three shaded columns, which use the translated OBA and SLG results from RC, ERP, and BsR respectively as the inputs into the OPS+ equations:

What can we see from this? The traditional approach is more severe than any of the Willie Davis approaches, while the square root approach is a pretty good match for the Willie Davis approaches. Thus, I suggest that the best combination of ease and accuracy in calculating OPS+ is to divide OBA and SLG by the square root of park factor, then plug the adjusted OBA and SLG into the OPS+ equation.
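As a rough sketch of the three simplest columns (unadjusted, traditional, and square root), the calculations look something like this in Python. The league figures are the 1995 NL values quoted above; the function names and the sample hitter are invented for illustration.

```python
from math import sqrt

LG_OBA = 0.328  # 1995 NL, from the post
LG_SLG = 0.408


def ops_plus(oba, slg, lg_oba=LG_OBA, lg_slg=LG_SLG):
    """Unadjusted OPS+: 100 * (OBA/LgOBA + SLG/LgSLG - 1)."""
    return 100 * (oba / lg_oba + slg / lg_slg - 1)


def ops_plus_trad(oba, slg, pf):
    """Traditional approach: divide the finished OPS+ by the runs park factor."""
    return ops_plus(oba, slg) / pf


def ops_plus_sqrt(oba, slg, pf):
    """Square root approach: divide OBA and SLG each by sqrt(PF) first."""
    root = sqrt(pf)
    return ops_plus(oba / root, slg / root)


# Hypothetical hitter in a 1.23 park
oba, slg, pf = 0.400, 0.600, 1.23
print(round(ops_plus(oba, slg)), round(ops_plus_trad(oba, slg, pf)),
      round(ops_plus_sqrt(oba, slg, pf)))
```

Note that the two adjustments are not the same function of PF: the traditional version scales the whole of (relative OBA + relative SLG - 1), while the square root version scales only the two ratios before the 1 is subtracted, which is why they diverge more for better hitters.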

Of course, I should point out that 1995 Coors Field and its 1.23 park factor is one of the most extreme cases in the history of the game. For run of the mill environments, we should expect to see little difference regardless of how the park adjustments are applied, and so I am NOT saying that you should disregard the OPS+ figures on Baseball-Reference (although I do wish that OPS+ would be pushed aside in favor of better comprehensive rate stats). On the other hand, though, I see no reason to use a complicated park adjustment method like the Willie Davis approach when there are much easier approaches which we have some reason to believe better reflect true value.

(*) I shunted some topics down here into a digression because it covers a lot of ground that I’ve covered before and is even drier than what is above. And a lot of sabermetricians are sick and tired of talking about OPS, and I don’t blame them, so just skip this part if you don’t want to rehash it.

As I’ve explained before, OPS+ can be thought of as a quick approximation of runs/out. Some novice sabermetricians are surprised when they discover that OPS+ is adjusted OBA plus adjusted SLG minus one rather than OPS divided by league OPS. And it’s true that the name OPS+ can be misleading, but it is also true that it is a much better metric. One reason is that OPS/LgOPS does not have a 1:1 relationship with runs/out; it has a 2:1 relationship. If a team is 5% above the league average in OPS, your best guess is that they will score 10% more runs. So the OPS/LgOPS ratio has no inherent meaning; to convert it to an estimated unit, you would have to multiply by two and subtract one.
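A quick numerical illustration of the 2:1 point, in Python. The hypothetical team here is exactly 5% above league average in both OBA and SLG, which is the simplest case; the league figures are the 1995 NL values used elsewhere in this post.

```python
def ops_plus(oba, slg, lg_oba, lg_slg):
    """OPS+ as adjusted OBA plus adjusted SLG minus one, times 100."""
    return 100 * (oba / lg_oba + slg / lg_slg - 1)


lg_oba, lg_slg = 0.328, 0.408
oba, slg = 1.05 * lg_oba, 1.05 * lg_slg   # hypothetical team, 5% above average

ops_ratio = (oba + slg) / (lg_oba + lg_slg)   # OPS / LgOPS
converted = 2 * ops_ratio - 1                 # "multiply by two and subtract one"

print(round(ops_ratio, 3), round(converted, 3),
      round(ops_plus(oba, slg, lg_oba, lg_slg), 1))
```

When OBA and SLG are above average by the same percentage, the converted ratio and OPS+/100 coincide; when they differ, the two will drift apart slightly because OPS+ weights the OBA component more heavily.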

The other reason why OPS+ is superior is that it gives a higher weight to OBA. It doesn’t go far enough--the OBA weight should be something like 1.7 (assuming SLG is weighted at 1), while OPS+ only brings it up to around 1.2--insufficient, but still better than nothing.

Anyway, if you run a regression to estimate adjusted runs/out from OPS+, you find that it’s pretty close to a 1:1 relationship, particularly if you include HB in your OBA. I haven’t though, and so the relationship is something like 1.06(OPS+) - .06 = adjusted runs/out (again, it should be very close to 1:1 if you calculate OPS+ like a non-lazy person). The "reg" park adjustment, then, is to substitute the park factor for adjusted runs/out and solve for OPS+, giving an OPS+ park factor:

OPS+ park factor = (runs park factor + .06)/1.06

The slope of the line relating OPS+ to runs/out is not particularly steep, and so this is an almost negligible adjustment--for Coors Field and its 1.23 run park factor, we get a 1.217 OPS+ park factor.
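The "reg" conversion is a one-liner, sketched here in Python with the 1.06 slope and -.06 intercept from the regression above as defaults. The function name is hypothetical.

```python
def ops_plus_park_factor(runs_pf, slope=1.06, intercept=-0.06):
    """Invert slope*(OPS+) + intercept = adjusted runs/out at runs/out = PF."""
    return (runs_pf - intercept) / slope


# Coors Field, 1995
print(round(ops_plus_park_factor(1.23), 3))
```

A neutral park (runs PF of 1.00) maps to an OPS+ park factor of 1.00 under this formula, as it should.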

Now a word about the traditional runs factor v. the individual square root adjustments. Since OPS+ is being used as a stand-in for run creation relative to the league average, I would assume that the goal in choosing a park adjustment approach is to provide the best match between adjusted OPS+ and adjusted runs/out. It turns out that if you figure relative ERP/Out for the ’95 Rockies players, the results are fairly consistent with the ERP/BsR translated OPS+. Thus, I am going to assume that those are the “best” adjusted OPS+ results, and that any simple park adjustment approach should hope to approximate them.

As a consequence, the square root adjustments to OBA and SLG look the best. Why is this? I’m not exactly sure; one might think that since OPS+ is a stand-in for relative runs/out, we should expect that the best adjustment approach once we already have unadjusted OPS+ is to divide by park factor. Yet we can get better results by adjusting each component individually by the square root of PF. OPS+ is far from a perfect approximation of relative runs/out, though, so it may not be that surprising that applying OPS+ logic to park factors is not quite optimal either.

Interestingly, the justification for the square root adjustment can be seen by looking at Runs Created in its OBA*SLG form. While OBA*SLG gives you an estimate of runs/at bat, not runs/out, it is of course related. If you take OBA/sqrt(PF)*SLG/sqrt(PF) you get OBA*SLG/(sqrt(PF)*sqrt(PF)) = OBA*SLG/PF.

It is quite possible that there is a different power you could raise PF to that would provide a better match for our ERP-based OPS+ estimates, but getting any more in-depth would defeat the purpose of having a crude tool. In fact, I think that adjusting OPS+ by the Willie Davis method goes too far as well. Regardless, I would be remiss if I didn’t again emphasize that the 1995 Rockies are an extreme case, and so while the differences between the approaches may appear to be significant, they really aren’t 99% of the time.

Monday, July 11, 2016

The Good News Is...

The good news is that OSU baseball had its most successful season under coach Greg Beals. The Buckeyes qualified for the NCAA Tournament and won a Big Ten title of any kind (tournament or regular season) for the first time since 2009 and won the Big Ten Tournament for the first time since 2007. Six Buckeyes were drafted, the most since 1998.

That’s the good news, and it is legitimately good news. The bad news is they finished third in their NCAA regional, knocked out by an in-state foe with no particular history of being a quality baseball program (Wright State); the six drafted players are part of a mass exodus of talent that will leave the 2017 team with little returning production; and the relative success of the season should solidify Beals’ position.

I am not the type of fan that roots for my team to lose because I think that they would be better off if the coach was fired, or other similar reasoning (given the perverse incentives created by pro sports drafts, it may sometimes be in the team’s best interest to lose, but that never applies to college sports). So the last bit of bad news is worth it given a new trophy for the case. Nevertheless, it remains a cost associated with the season.

At first, it did not appear as if this would be the best season of Beals’ tenure. While OSU embarked on Big Ten play with a 12-6-1 record, it was not particularly impressive given the quality of competition. OSU took two out of three at home from Northwestern in the first Big Ten series, needing a come-from-behind walkoff homer to pull the series out on Sunday (Northwestern would finish twelfth in the Big Ten at 7-17). The Bucks were then swept at Maryland, won a home series against Rutgers and lost a series at Illinois to sit at 5-7 halfway through the Big Ten schedule.

Both road series featured memorable losses. In the second game against Maryland, John Havird took a no-hitter into the ninth, but was pulled with a 1-0 lead and 98 pitches after plunking the leadoff hitter. Maryland pushed across the run without the benefit of a hit, giving OSU a combined nine no-hit innings, but two hits led to a Terrapin run in the tenth and a 2-1 loss. The opener against Illinois wound up as a 1-0 loss in fifteen innings on a wild pitch, in a game in which Illini starter (and future Baltimore first rounder) Cody Sedlock was allowed to make 132 pitches in 10 2/3 innings (one of Beals’ redeeming qualities as a coach is that he rarely pushes the envelope with his pitchers--Tanner Tully was done after 9 and 106 pitches).

The Buckeyes opened the second half with a sweep of hapless (2-22) Purdue, then won a home series from Iowa and swept Michigan at home to get to 13-8, technically still alive in the conference race heading into the finale at Minnesota (first place at 15-5). The Buckeyes won two out of three, finishing 1.5 games behind the Gophers and 1 behind Nebraska, tied for third with Indiana. At this point it could be noted that OSU had a favorable Big Ten schedule; only three of the thirteen teams finished under .500 in conference play, and OSU played all of them (Purdue, Northwestern, Rutgers), while not playing Nebraska or Indiana.

The Buckeyes opened the tournament with a win over Michigan, but melted down in the ninth inning against Iowa by coughing up a 4-0 lead before losing in ten. Their next contest with Michigan was delayed by weather, forcing the Buckeyes to beat both Michigan and Michigan State on Saturday to stay alive. On Sunday morning, the Buckeyes beat MSU again to claw back through the loser’s bracket, into the winner-take-all final against Iowa that afternoon. The Bucks led 7-4 going into the eighth, but Iowa scored three to tie it. This time, OSU would answer with a run in the ninth and hold on for an 8-7 win to cap a remarkable two days of baseball with a long overdue championship.

OSU was made a #2 seed of the NCAA Tournament, but was deemed one of the weaker #2s and placed in a bracket with #2 national seed Louisville, #3 seed Wright State, and #4 seed Western Michigan. In a recurring tournament theme of losing leads, the Buckeyes opened up 5-0 on the Raiders through three before allowing two in the fourth and four in the fifth. WSU’s 6-5 lead held until the Buckeyes rallied for two in the ninth and a 7-6 win. Alas, Louisville’s offense proved too potent, burying the Bucks 15-3, and WSU exacted revenge with a 7-3 win that ended OSU’s season at 44-20-1.

OSU led the Big Ten (all games) in W% at .688, was a close third in EW% at .676 (Minnesota led with .685), and was second in PW% (.679 to Minnesota’s .728). OSU had a balanced team, finishing third with 5.83 R/G and fourth with 3.98 RA/G (although it should be noted that Bill Davis Stadium appears to be a solid pitcher’s park). OSU was not a strong fielding team, with a .951 mFA compared to a conference average of .946, and more importantly a below-average DER (.675 to .681 average).

OSU’s offense was power-driven, leading the conference with 57 homers and a .151 ISO and a slightly above-average .348 OBA. Beals rode junior catcher Jalen Washington hard; Washington appeared in 63 of 65 games and created 4.3 RG. Junior Jacob Bosiokovic bounced back from injuries as the primary first baseman, creating 5.9 RG on the strength of power (11 homers, .213 ISO) rather than on base skills (.335 OBA). He was drafted by Colorado in the 19th round. Troy Kuhn settled back in at second base, but had his worst offensive season of three as a starter as a senior (just 4.3 RG). L Grant Davis struggled mightily at the plate (.194/.257/.256 in 151 PA), requiring OSU to sacrifice defense for offense at the keystone.

Nick Sergakis was third on OSU with 13 homers and was second on the team with 8.8 RG and +26 RAA with a .332/.420/.542 line, enough to be a 23rd-round selection by the Mets (Sergakis played three years at OSU after transferring from this year’s national champions, Coastal Carolina). Beals’ refusal to pinch-hit for shortstop Craig Nennig was a running frustration of mine, but Nennig made that less of an issue by turning in the best offensive season of his career, although .256/.313/.366 on the strength of five longballs hardly erased any concerns about production.

The outfield was led by junior Ronnie Dawson, who while productive in his first two years really played up to his potential with a team-high 13 homer, 9.3 RG, +32 RAA, .331/.415/.611 season that netted him several All-America honors and a second-round selection by Houston. Classmate Troy Montgomery had a fine season in center, hitting .297/.420/.466 for 7.7 RG and +20 RAA and was drafted in the 8th round by the Angels. Right field was filled by a combination of Bosiokovic and sophomore Tre’ Gantt, who couldn’t match the production from his freshman season (.255/.311/.314 for 3.6 RG in 158 PA).

Freshman Brady Cherry started the year on fire at DH with five homers, but settled into a .218/.307/.411 performance over 143 PA. Senior Ryan Leffel (.205/.283/.301 in 96 PA) took some of his playing time as the year went on. Zach Ratcliff hit better than either of them early (.268/.362/.341 in 52 PA), but an injury ultimately resulted in a medical redshirt.

The pitching staff was anchored by junior Tanner Tully, who continued to be the perfect image of a finesse lefty with a 3.18 RA, 3.31 eRA, 1.9 W/9, 6.5 K/9, and +19 RAA over a 108-inning season. Tully was picked by Cleveland in the 26th round. Senior John Havird made it two lefties at the top of the rotation, pitching solidly (4.50 RA and +2 RAA, but 3.91 eRA and 5.8 K/1.5 W in 94 innings). Sophomore Adam Niemeyer was the #3, with similar results to Havird (4.56 RA for +1 RAA, 4.37 eRA, 8.9 K/1.5 W in 71 innings). As you can see, the strength of OSU’s starters was avoiding free passes. Freshman Ryan Feltner was the most frequent mid-week starter (11 starts out of 20 appearances) and wasn’t bad for a freshman (4.72 RA for 0 RAA, 5.67 eRA, 8.0 K/3.9 W in 69 innings). Senior transfer lefty Dalton Mosbarger got a few mid-week, limited inning starts (5 of 13 appearances) and was effective (3.31 RA for +5 RAA, 4.11 eRA, 8.3 K/4.1 W in 33 innings, in addition to his 35 PA as a reserve outfielder). Sophomore transfer Austin Woody rounded out the starters (3 of 19 appearances, 6.92 RA for -10 RAA, 7.75 eRA, 6.7 K/4.2 W in 39 innings).

OSU’s bullpen was terrific, with four key relievers chipping in. Sophomore Yianni Pavlopoulos was closer out of the gate despite just nine career innings and a medical redshirt in 2015, and recorded 14 saves and solid performance numbers (3.03 RA, 3.72 eRA, 10.3 K/3.3 W in 30 innings). Sidearm sophomore Seth Kinker was the workhorse, making 38 appearances and pitching great (1.98 RA for +17 RAA, 3.27 eRA, 7.4 K/1.6 W over 55 innings). Senior Michael Horejsei’s LOOGY role expanded a little, but not too extensively, as he worked 31 innings in 34 appearances (3.19 RA, with an excellent 1.93 eRA and 11.3 K/2.6 W). His selection in the 21st round by the White Sox as a situational reliever capped a remarkable career arc for a pitcher who began his college career at a regional campus (OSU-Mansfield) and had just 26 career innings entering 2016. Finally, sophomore Kyle Michalik chipped in +9 RAA with a 2.25 RA, 2.29 eRA, 5.1 K/1.4 W performance over 32 innings, but was definitely fourth on Beals’ bullpen pecking order. Other pitchers who saw action include junior Joe Stoll and freshman Conor Curlis, who will presumably compete for the LOOGY role next year.

While 2016 was a measured success, the outlook for 2017 is murkier. OSU had a total of 84 offensive RAA contributed by individuals--83 of those contributed by non-returning players (1B Bosiokovic, 3B Sergakis, LF Dawson, CF Montgomery). Additionally, OSU must replace Kuhn at second, Nennig at short (who was a very good fielder, at least to this observer’s eyes), and Leffel at utility/DH. It will essentially be a new offense and it will lack any proven plus performers. The rotation will lose two of its top three, but profiles as a more likely strength with Niemeyer and Feltner as two obvious members. To the extent that college bullpens can be predicted, OSU is well-positioned for 2017.

However, my concerns regarding player development were only slightly allayed, and Beals’ tactics may have been toned down a little bit (even he could see that he had power and should probably give away fewer outs on the bases), but one season does not a program make. It’s a positive step, at least, which is more than can be said for most of the previous seasons of Beals’ tenure. And that’s the good news.

Thursday, July 07, 2016

Great Moments in Yahoo! Box Scores

Mike Trout, the center field player, made a point in the match by "stealing" 4th base.

Monday, June 20, 2016


I have a love-hate relationship with Cleveland.

I was born and raised in the exurbs. I left for college but eventually came back to the suburbs but only because I had to, not because that’s where I wanted to live. When it comes to sports, I always rooted for the Cleveland teams, but in all honesty have never been a diehard Cavs fan, that description being much more apt for how I follow the Indians and the Browns. Even in those cases it might not fully apply, since I pride myself on striving to be too rational about baseball to fall into the sheer emotion of fandom (post-childhood), and the Browns have been too bad for too long to not laugh at rather than lament the losses. My sheer sports fan emotions, assigning good v. evil to every game and opponent, living and dying with the team was transferred to my future alma mater around adolescence and will never be directed elsewhere.

Yet there's no question that my baseball team is the Indians, my football team is the Browns, and my basketball team is the Cavaliers. At times this has been embarrassing. Not due to the failure to win championships, but more due to the Cleveland fan culture. Cleveland fans have taken pride in their victimhood, with the heartbreaks (sometimes more real than imagined) a perverse source of pride. Where else does a 35 year distant divisional playoff game (i.e. two games removed from the championship) have a name that every young Browns fan learns ("Red Right 88")? Cleveland had some bad breaks, but more often than not they just had bad teams. Bad management, a little bad luck, and on the rare occasions when the teams had a chance to win it all, the dice rolls were not kind. But the only way one can reasonably expect to win championships is to put multiple championship-caliber teams on the field and let the chips fall where they may. Cleveland's three largely failed in that regard.

Of course, franchise ineptitude is largely not the fault of the fanbase, but there's an important distinction to be made between losing because you're bad and losing because fate didn't look kindly upon you on a given day. Cleveland fans too often conflated the two, resulting in a fatalistic feedback loop that took the former as evidence of the latter.

The other maddening element of the sports culture is the unique grip that the Browns have on the city. For all of the elation that the Cavs victory has brought, it will pale in comparison to the day the Browns win or even make a Super Bowl. The Browns still rule the landscape, and benefit from a remarkable double standard. The Indians, a franchise that has achieved more in any one of nine seasons in the last twenty-five than the Browns have in any, are struggling with attendance. There is an overwhelming cynicism towards the Indians, rooted in a lack of understanding of the economics and nature of baseball. Every trade of a free-agent-to-be furthers the downward spiral of the relationship between the city and the team, even as those trades bring back the future objects of lament (e.g. as Bartolo Colon becomes Cliff Lee becomes Carlos Carrasco). This is not to absolve the Indians of their very real failures in drafting/international signings that only now appear to be reversing, but the Indians have run rings around the Browns and yet it is the former that I will be pleasantly surprised to see take the field in Cleveland rather than Montreal or Portland or San Antonio in, say, 2030.

Cleveland fans have also had the opportunity to root for a winner, but many have passed it up, and I cannot feel too sorry for them. In the last twenty years, OSU has won two national titles in football and been to three Final Fours in basketball. A college team might always belong to the students and alumni most dearly, but surely the flagship state university is as much an available rooting interest as a private entity that can be moved to Baltimore at the owner's whim.

Sports irrationality aside, one thing I will say for Cleveland and Northeast Ohio is that there is a real pride in their hometown among people here. I'm not well-traveled enough to declare that this feeling is unique, but I can contrast it to my other hometown, Columbus. People in Columbus don't generally exhibit the same pride in their city that Clevelanders do in theirs. Columbus residents might be proud of OSU or proud of Ohio, but they aren't as proud of Columbus per se. Some of this may be due to sports teams; minus the recent (and so far unsuccessful) addition of the Blue Jackets, Columbus' sporting identity ties to OSU and thus more to the state than the city.

This is why it is so appropriate that Cleveland's title drought was ended about as single-handedly as one could be by one man, LeBron James. LeBron was a native son, and that meant something here. Everyone probably feels some sort of connection to LeBron, however tenuous or forced. Mine is that LeBron and I are the same age. LeBron was a first-name basis celebrity by the time we were freshmen in high school. The night of the 2003 NBA Draft Lottery, I was in a cabin at a state park in Southeast Ohio on our mandatory "senior trip" watching as the Cavs came up with the #1 pick (and a parent chaperone insisted that they had to draft Carmelo Anthony).

LeBron was asked to shoulder the burden of the city himself, an unfair ask for anyone but especially for a rookie. And when he dragged a ragtag bunch to the team's first ever Finals appearance, he simultaneously hurt both his "legacy" in the ridiculous media environment when the Cavs were swept and obliterated any remaining thought of patiently building a worthy team around him. For the next three seasons the Cavs chased in vain, leaving him with the impossible choice of staying to try to drag this motley crew to the promised land, or going to chase titles with a group of stars.

Of course, the way “the Decision” went down was an extra gut punch, but while many Cleveland fans condemned LeBron, I’d like to think that I was fairly level-headed (this comment on The Book Blog is as intemperate towards LeBron as I got, and I still didn't lose sight of who the real villain in sports is):

The real villain in this whole thing, IMO, is ESPN. They cannot try to pass themselves off as a news outlet of any sort when they are willing to whore themselves out for an hour as a player’s personal press corps.

It has been apparent for years that ESPN wants to be part of the stories it reports on, but it has never been more plain to see than it was last night.

On another note, I do not support the childishness of the Cleveland fans, but it is worth noting where they are coming from. Cleveland has not won a major sports title since 1964 despite fielding three teams (at least since 1970)--and that one isn’t even celebrated by anyone outside of Cleveland because of the NFL’s whitewash of its pre-Super Bowl history.

Yet here, nearly miraculously, was a local player who just so happened to be the best basketball prospect in anyone’s memory. He was not called the Chosen One for no reason. He was the one who was destined to finally break through the wall, to give Cleveland its championship. Twice the team has looked like the NBA’s best team in the regular season, only to fold in the playoffs.

I’m not saying that those expectations and hopes were fair, that they all should have been thrown on LeBron’s shoulder. They plainly weren’t. Still, I may be a little biased, but I think that all things considered, this has to be one of the biggest kicks in the gut that an athlete leaving a team as a free agent has ever delivered. That doesn’t excuse Dan Gilbert or Cavs fans, but this is not just ARod leaving the Mariners.

The truth of the matter is that I was still a LeBron fan. The most endearing quality of LeBron in a sports sense to me was his support of OSU. He was on that bandwagon prior to the 2002 national championship (an article on his senior season of high school included an account of his enthusiastic celebration of this victory with his teammates), and he remained a friend of the program even after going to Miami, which he didn’t have to do. Sure, OSU is an important college asset of Nike, but Nike has contracts with a million other colleges, there was no need to keep up appearances.

It seemed like a crazy pipe dream in the spring of 2014, but was realistic by summer, and then remarkably came true. LeBron was coming back, to try to lead a new supporting cast, rebuilt largely through the fruits of the lottery picks that never would have come had he stayed (Kyrie Irving, Tristan Thompson, Dion Waiters, Kevin Love via trade). For all of the fury that surrounded The Decision, had LeBron’s goal all along been to win a title in Cleveland, he couldn’t have done any better.

However, nothing is assured, and being Cleveland the natural insecurities were ratcheted up a few levels. Had the moment for LeBron passed, was he just far enough past his prime that he could not deliver? Was the supporting cast good enough (or in the case of Irving, healthy enough) to provide him support? Unexpectedly a new question emerged--would the suddenly dominant Warriors serve as an insurmountable foil?

Hopefully the answers to those questions will reduce some of the irrationality of Cleveland sports observers (of course, the probability of the Cavs winning when down 3-1 was greater than the probability of a collective of sports fans becoming more rational). Whatever small change to the city’s sports mindset might result, Cleveland’s overall inferiority complex is not going to change. On the very morning after the Cavs won the world championship, a new banner appeared on the side of a building bragging that the first traffic light was installed in Cleveland in 1914. This will certainly make the political hacks in town next month to do political hack-y things real impressed. Cleveland is a weird place.

And yet I can watch a ridiculously hokey, wildly overproduced local news commercial from 1995 and find it tugging at my inner childhood Indians partisan in a manner I can’t rationally describe. "Give me a reason for believing in Cleveland." Cleveland can surprise you.

Wednesday, May 25, 2016

Great Moments in Yahoo! PBP

Today Yahoo! unveiled a completely new design for their MLB page. Alas, the PBP now reads backwards and is as ill-equipped to deal with unusual plays as ever:

Wednesday, May 18, 2016

The Only Rule Is It Has to Work

Note: The following is a rare (for this blog) timely book review.

The premise of The Only Rule Is It Has to Work is that respected sabermetrically-inclined authors and podcasters Ben Lindbergh (Baseball Prospectus, Grantland, FiveThirtyEight) and Sam Miller (Orange County Register, Baseball Prospectus) were given the opportunity to act as the baseball operations department throughout the 2015 campaign of the Sonoma Stompers, a member of the four-team Pacific Association, a low-level indy circuit in northern California. Lindbergh and Miller were granted wide berth to put their mark on player acquisition, roster construction, and in-game strategy, and also attempt to bring modern data collection tools (PITCHf/x, video scouting, etc.) to the bush leagues.

Lindbergh and Miller are embedded deep within the team--in the front office, the clubhouse, the dugout, and even (for a moment at least) kangaroo court. Thus it serves as one of the most revealing examinations of daily life in baseball from an outsiders' perspective. Most books that have provided similar access to the inner workings of a team have been written by insiders, even if they might not fully fit into the world in which they have spent many years (think your Jim Boutons). While the life of an indy-league player is certainly less lavish than that of a big leaguer and perhaps less structured than that of an affiliated minor leaguer, it's hard to imagine that the basic human impulses of (largely) twentysomething, athletically-gifted ballplayers vary much between Sonoma and San Jose, San Jose and San Francisco. The authors are able to observe the scene with some combination of bemusement, paternal-ish concern, and comradery to give the audience a different perspective on the people who play the game. Certainly the majority of the audience members can better relate to the authors' stations in life and can now imagine how they might fit in (or not) if thrust into the life of a ballclub.

While it should hardly be necessary at this point for sabermetricians to defend themselves against scurrilous charges of not watching the games, one thing that the authors don’t reflect too closely upon but that is obvious to the reader is just how much low-level baseball they watch over the course of the summer, and just how devoted to their cause they are. Granted, Lindbergh and Miller are aided by a small network of volunteer scouts that earns the derisive nickname "The Corduroy Crew" around the league, but one or the other personally does advance scouting of nearly every game the Stompers' opponents play. This in addition to the hours spent researching potential players with their proverbial noses buried in a spreadsheet. While it would be wrong to hold Lindbergh and Miller's labors (which of course were performed with at least the secondary intent of providing fodder for a book) up as a pure representation of baseball love to be extrapolated to all of their sabermetric compatriots, it would be less wrong to do so than to brandish the common stereotype.

One of the disappointments of the project is that many of the radical ideas the authors dreamed about being able to test are never put into play. While shifts and flexible usage of the relief ace take hold in the second half of Sonoma's season, batting orders largely remain tethered to convention, starting pitchers still generally work in rotation, and the manager holds on to ultimate in-game command. While this may be disappointing to the reader longing for sabermetric red meat, the implications raise questions worth considering. Is it necessary for change in baseball tactics to come one easily digestible piece at a time? Why can a grizzled bench veteran and former pennant-winning manager of a major league team (Clint Hurdle) pivot to the approach his superiors desire with more aplomb than a 37-year-old pot-smoking player-manager who goes by Feh and dabbles in 9/11 conspiracy theories? Do the high stakes of the majors actually make them a more suitable laboratory for experimentation, as players and managers can count on their million dollar checks regardless of whether they may look unconventional on the field? While these questions can't be answered by the book, it provides some entertaining anecdotal evidence to consider.

Along the way, the Stompers inadvertently break ground in the social realm of baseball as well, as one of the authors' hand-picked college signees, relief ace Sean Conroy, comes out as the first openly gay player in professional baseball. The authors do an excellent job of relating this part of the story without falling into self-congratulations or allowing it to swamp the baseball portion of the narrative. Lesser authors with a less interesting baseball story to tell (and perhaps less respect for their subject) could have easily allowed Conroy's story (which includes being one of the Pacific Association's most valuable pitchers) to crowd out other aspects of the Stompers' season in the narrative, and could hardly have been blamed for it.

The authors alternate chapters, and if you are a regular listener (as I am) of their Effectively Wild podcast, you will likely be able to pick out which voice you are reading after a couple of pages even if you forget for a moment whether it is an odd or even chapter. Lindbergh's earnest verbosity and Miller's cheerful nihilism carry through to the written page in book format yet complement each other well, imbuing a diversity of style to the writing while still making you feel as if you are reading the same book.

As luck (or the residue of design) might have it, the story has a dramatic conclusion that I will not spoil here, except to say that I'm very glad the majors have resisted the allure of the half-season format, except for every ninety years when unusual circumstances take hold (if I live to see baseball in 2071 I promise to be grateful and not complain about it too much). Were it ever turned into a movie, the scriptwriter would even have something of a "pick your own adventure" opportunity to affect the outcome with only the proverbial flap of a butterfly's wing.

And maybe that's one of the lasting lessons to take away from The Only Rule Is It Has to Work. That despite the careful planning, the on-the-fly adjustments due to injuries or player poaching (at this level), the dedication of the players and support staff, the superstitious rituals, and the motivational speeches that are poured into baseball clubs, not to mention the attempts to drag baseball kicking and screaming into the sabermetric age, we will never be able to escape what seem from our imperfect perspective to be random rolls of the die.

Tuesday, May 10, 2016

LWR Component Deflators and Replacement Hitters’ Batting Lines

Last time, I explained how we could use Linear Weight Ratio, an offensive metric developed by Tango Tiger, as a shortcut in finding a variable which I call the “component deflator” and symbolize as “a”.

Last time I focused on its application to park adjustments, but that’s not necessary. In fact, the component deflator can be applied generally to any situation in which you’d like to know what across-the-board percentage change in mutually exclusive offensive events you would have to see in order to alter run scoring by some scalar (assuming you are willing to accept that the linear weight values stay constant, which is certainly an assumption that must be applied with care).

So there are any number of questions that this kind of approach can address. One that I will consider in this piece is “What would a replacement-level hitter’s batting line look like?” First a few caveats, though. As Tango has pointed out, there really is no such thing as a replacement-level hitter. A replacement player is a replacement player because his overall contribution, offense and defense, is at a level so as to have no marginal value. Thinking about it in terms of a replacement level hitter only confuses the issue.

However, the analytical structure of assuming that replacement level players are average in the field, and thus calculating their value as their offensive contribution above a “replacement” performer specific to their position plus their defensive contribution above an average performer at their position can be a useful approximation. It is the structure used by a number of approaches, including Pete Palmer’s TPR (which is above average, but the same principle holds), Keith Woolner’s VORP, and the RAR figures I post here at the end of each season. I am not claiming that this approach is optimal or superior to the others, only that if applied with caution it can be a useful model of player value.

For the sake of this post, let’s just assume that we are going to use a model where a replacement level player hits at some percentage of the league average. Then, if we’d like to know what his batting line might look like, we can use the component deflator approach. The good thing is that we don’t have to worry so much about the fact that we have static linear weights, since we are now applying the process to individuals for whom we’d like to hold the weights constant (ignoring the Theoretical Team arguments). So that caveat is loosened in this application.

Of course, this approach carries some of its own caveats with it: one is that we are again developing a model in which all events are equally deflated. It might actually be that replacement level hitters tend to not be as deficient in BA as one might expect. Or maybe teams are willing to trade BA for power in a replacement level hitter. This is a specific model with specific assumptions, and it is not necessarily reality.

Anyway, if we define R as the percentage of league average (or positional average or anything else if you’d like), then we can just plug it into one of the formulas from last time, and carry out the rest of the calculations as explained in that post:

New LWR = ((LWR/s' + x)*R - x)*s'
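For anyone who would rather fiddle with this in code, here is a minimal sketch of the formula above. It assumes s' and x are the quantities defined in the previous post and simply takes them as inputs. The function name and the numbers in the usage lines are made up for illustration and are not real league values.

```python
def new_lwr(lwr, s_prime, x, r):
    """Rescale a Linear Weight Ratio so that run output changes by the scalar r.

    lwr     -- the starting Linear Weight Ratio
    s_prime -- the s' term from the previous post (taken as given here)
    x       -- the x term from the previous post (taken as given here)
    r       -- target run output as a fraction of the starting level
    """
    return ((lwr / s_prime + x) * r - x) * s_prime


# Hypothetical inputs, chosen only to exercise the function.
unchanged = new_lwr(1.0, 0.5, 0.1, 1.0)   # r = 1 should hand back the original LWR
deflated = new_lwr(1.0, 0.5, 0.1, 0.73)   # r < 1 should lower it
print(unchanged, deflated)
```

A quick sanity check on the algebra is that R = 1 returns the LWR you started with, since the x and s' terms cancel.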

In my RAR estimates, I assume that a replacement player’s R/O is 73% of the positional average, where the positional average is figured by taking the overall league average times a long-term offensive positional adjustment. The positional adjustments I use are (note: you can tell how long ago I wrote this by the use of 2008 league totals):

C = .89, 1B/DH = 1.19, 2B = .93, 3B = 1.01, SS = .86, LF/RF = 1.12, CF = 1.02
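As a rough sketch of how these adjustments turn into the R that gets plugged into the formula, the snippet below multiplies the 73% replacement assumption by each positional adjustment, giving replacement R/O as a fraction of the overall league average. The dictionary and function names are my own for illustration.

```python
REPLACEMENT_PCT = 0.73  # replacement R/O as a share of the positional average

# Long-term offensive positional adjustments, relative to the overall league average.
POSITION_ADJ = {
    "C": 0.89, "1B/DH": 1.19, "2B": 0.93, "3B": 1.01,
    "SS": 0.86, "LF/RF": 1.12, "CF": 1.02,
}


def replacement_share_of_league(position):
    """Replacement-level R/O at a position as a fraction of league-average R/O."""
    return REPLACEMENT_PCT * POSITION_ADJ[position]


for pos in POSITION_ADJ:
    print(f"{pos:6s} {replacement_share_of_league(pos):.3f}")
```

So a replacement catcher, for example, is modeled at roughly 65% of the league R/O, while a replacement first baseman is at roughly 87%.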

Combining these adjustments, the LWR component deflator procedure, and the overall 2008 MLB offensive averages, here is the offensive output expected from a replacement player at each position:

How do these numbers look to you? My impression is that the batting averages are too low; teams may resort to replacement level players at 73% of league R/O, but they may be those that trade secondary skills for BA points of equivalent value (assuming that players of this profile even exist in reality).

Anyway, you don’t have to take any of this too seriously, and I’ve already stated the assumptions and admitted they may not model reality, so I’m not going to spend too much time justifying the results. Instead, I have another potentially amusing if not completely realistic application.

Namely, it is to take the initial statistics of a real hitter and, maintaining the proportional relationships between his positive events, project what his line would look like at a different level of productivity. For example, what would a replacement level hitter with Barry Bonds’ bizarre 2004 proportional relationships look like?

In this case, I’ll assume that a replacement player would have a 3.50 RG. Bonds’ 2004 line comes out to an 18.26 RG, so our “R” will be 3.5/18.26 = 19.2%. This line results:

The bizarro Bonds would hit just .140, but would still manage to put up a .416 secondary average. Of course, such a player would never really exist, but if he did, his offensive value would be about the same as the other replacement level guys above.
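The scalar used for Bonds is nothing more than the ratio of the target RG to his actual RG. A tiny sketch, with a helper name of my own choosing:

```python
def productivity_scalar(target_rg, actual_rg):
    """Return R, the target run rate as a fraction of the hitter's actual run rate."""
    return target_rg / actual_rg


# Bonds 2004: an 18.26 RG translated down to an assumed 3.50 replacement RG.
r_bonds = productivity_scalar(3.50, 18.26)
print(f"{r_bonds:.1%}")  # 19.2%
```

The same helper works in the other direction; a target RG above the hitter's actual RG gives an R greater than 1.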

Let’s look at Tony Gwynn, 1994 to see what this would look like for a very good singles-type hitter:

And we could try going the other way. What would Mario Mendoza, superstar look like? Here’s the transformation from Mendoza’s career line to an 8 RG:

In order to turn Mendoza’s no-secondary skills profile into an all-time upper echelon great, you have to allow him to hit .400, and increase all of his positive rates by 81%.

This translation approach falls squarely under the category of "toy"; please don’t get the impression that I’m elevating it to any greater pedestal.