Walk Like a Sabermetrician: January 2010

Wednesday, January 27, 2010

Run Distribution and W%, 2009

There are a lot of ways to look at team performance based on margin of victory, runs scored in a game, runs allowed in a game, and the like. In this post I look at a few. It is by no means intended to be a comprehensive examination. The data is from Baseball Prospectus, where you can download it very easily and import it to a spreadsheet.

First, let's look at team performance broken down by run differential for the game. I break games into three categories: one-run games, blowout games (margin of five or more), and other games, which for lack of a better term I'll call middle games. You can certainly quibble with the definition of a blowout, but I like the cutoff at five runs because it results in the frequency of one-run games and blowouts being approximately equal.

It is necessary to note (and, embarrassingly, I didn't last year) that any run distribution based on games is subject to the bottom of the ninth/extra inning problem--home team scoring is capped in those innings, or home teams don't bat in those innings at all. The analysis that follows assumes that this impacts all teams equally, but that is not necessarily the case. So keep that thought in the back of your mind if you choose to read what follows.

First up are one-run games. There were 656 one-run games in the majors in 2009, out of 2430 total games (27%). Seattle had the highest frequency (34%) and Pittsburgh the lowest (21%). The table below is sorted by the difference between W% in one-run games and W% in all other games:

The eight playoff participants (who I'll use as a stand-in for quality teams in this piece, although it is somewhat circular) had a combined .556 one-run record and a .590 record in other games. These teams played in one-run games 27% of the time, the same as the league average. Six of the eight playoff teams had lower W%s in one-run games than other games (the Angels and Twins were the exceptions, and .016 was the largest difference). It is of course well-known that one-run records are pulled towards .500 for teams at all observed W% levels.

Again, I've defined blowout games as those determined by five or more runs. There were 686 such games (28%). Kansas City played in the most (36%) and the Mets the least (23%):

Once again, playoff teams played in blowouts with about the same frequency as other clubs (29%). Five of eight had better records in blowouts than other games, with the Yankees (-.016) displaying the largest dropoff. The playoff teams had a composite .619 record in blowouts versus .565 in other games.

You will notice that the Padres had the largest absolute difference by far, going just 10-32 in blowouts but winning a very respectable 54.2% of non-blowouts.

That leaves the "middle" games--and if you have a catchy idea about what to call them, I'd love to hear it--games determined by 2-4 runs. There were 1088 such games, representing 45% of all big league games. Pittsburgh played in the most (51%) and Kansas City the least (38%):

44% of playoff participants' games were middle games, and they recorded a .570 W% compared to .589 in other games. Two of eight (Yankees and Phillies) had better records in middle games, while Colorado was just about even.
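The three-way split used above is easy to reproduce from a game log. What follows is only a minimal sketch: the function name and the sample scores are made up for illustration, while the category counts are the ones quoted in this post.

```python
def classify_margin(runs_for, runs_against):
    """Label a game by its absolute run margin, using the post's cutoffs."""
    margin = abs(runs_for - runs_against)
    if margin == 0:
        raise ValueError("a completed game cannot end in a tie")
    if margin == 1:
        return "one-run"
    if margin >= 5:
        return "blowout"
    return "middle"  # margins of two to four runs

# Hypothetical scores (runs for, runs against), for illustration only
sample_games = [(5, 4), (10, 2), (3, 6), (1, 0), (7, 2)]
labels = [classify_margin(rf, ra) for rf, ra in sample_games]

# 2009 game counts quoted in the post
counts = {"one-run": 656, "blowout": 686, "middle": 1088}
total = sum(counts.values())
shares = {k: round(100 * v / total) for k, v in counts.items()}

print(labels)
print(total, shares)
```

The three counts sum to the 2430 games played, and the rounded shares come out to the 27%, 28%, and 45% figures given above.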

Switching gears, here are the frequencies with which teams scored X runs in a game, along with their W% in those games. I'd run a runs allowed frequency list, but of course it will be the same with the exception of complementary W%s:

The mode is three runs (14.2% of games). The "marg" column shows the marginal increase in observed W% for each additional run scored (I cut it off at ten as the frequencies are not very high). You can see that the most valuable run was the fourth, with a jump from a .337 W% to .523. When teams scored three or fewer runs (which occurred 41.9% of the time), they were 393-1644 (.193); when scoring four or more (58.1% frequency), they were 2037-786 (.722).
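For anyone who wants to rebuild the "marg" column or the split records, here is a rough sketch. It uses only the W% values quoted in this post for zero through four runs; the variable and function names are my own shorthand, not anything from the source data.

```python
# Observed 2009 W% by runs scored, limited to the values quoted in this post
wpct_by_runs = {0: 0.000, 1: 0.075, 2: 0.208, 3: 0.337, 4: 0.523}

# "marg" for X runs = W% when scoring X minus W% when scoring X-1
marg = {
    runs: round(wpct_by_runs[runs] - wpct_by_runs[runs - 1], 3)
    for runs in wpct_by_runs
    if runs - 1 in wpct_by_runs
}
most_valuable_run = max(marg, key=marg.get)

def wpct(wins, losses):
    """Winning percentage from a win-loss record."""
    return wins / (wins + losses)

low_scoring = wpct(393, 1644)   # three or fewer runs
high_scoring = wpct(2037, 786)  # four or more runs
low_share = (393 + 1644) / (393 + 1644 + 2037 + 786)

print(marg, most_valuable_run)
print(round(low_scoring, 3), round(high_scoring, 3), round(100 * low_share, 1))
```

Among the runs covered here, the fourth shows the largest jump, and the two records reduce to the .193 and .722 figures in the text.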

Last year, I applied a Jamesian concept of Offensive W% based on team run distribution, and will do so again. However, I will not walk through all of the math in detail, and instead will point you to last year's post if you are interested.

The proposed James approach, which I labeled gOW% for "game" Offensive W%, uses the actual W%s for each X runs scored. Of course, one could attempt to model a theoretical W% for each X rather than using the empirical 2009 data. A modeled W% would be preferable, but I don't intend this to be taken as a deadly serious exercise so I will stick with the empirical 2009 data, which is subject to the whims of sample size (we are also ignoring the difference in scoring level between the two leagues). I have also lumped all games with ten or more runs scored together at a W% of .955.

Just to make this clear, if a team was shut out 20% of the time, scored one run 30% of the time, and scored two runs 50% of the time (obviously a ridiculous scenario), their gOW% for 2009 would be:

.20(0) + .30(.075) + .50(.208) = .127
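The same weighted sum can be written as a short function. This is just a sketch of the arithmetic above; the function name is hypothetical, and the W% table holds only the three values used in the example.

```python
def game_ow_pct(run_freq, wpct_by_runs):
    """Weight the W% at each run total by how often the team scored it."""
    if abs(sum(run_freq.values()) - 1.0) > 1e-9:
        raise ValueError("run frequencies must sum to 1")
    return sum(freq * wpct_by_runs[runs] for runs, freq in run_freq.items())

# W% when scoring 0, 1, and 2 runs, as quoted in the example above
wpct_by_runs = {0: 0.000, 1: 0.075, 2: 0.208}

# The deliberately ridiculous team from the example
example_freq = {0: 0.20, 1: 0.30, 2: 0.50}

gow = game_ow_pct(example_freq, wpct_by_runs)
print(gow)
```

With the three-decimal W%s as inputs this comes out at about .1265, in line with the .127 above; the small gap is presumably down to rounding in the quoted W%s.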

I have also figured a standard OW%, holding the defense's RA/G at the league average of 4.61 R/G and using Pythagenpat (there are no park factors applied to any of the results in this post). The traditional OW% and gOW% are usually quite close, as we should expect; when they differ, that means that the run distribution of the team diverged from the expected distribution. Positive differences of gOW% minus OW% indicate teams that bunched their scoring more efficiently than Pyth expected; negative differences indicate an inefficient distribution (keeping the caveat about the ninth inning from the beginning of the post in mind). So, here is a list of teams whose gOW% and OW% differ by more than two games (prorated to 162):

Positive difference: SEA, NYN, SD, CIN
Negative difference: TB

The four teams with higher than expected gOW%s were all bad offenses; none had a gOW% higher than .467. Only one team had a two-game negative difference, but the Rays still had a good offense (.521 gOW%), and the next seven teams on the list were all above .500 in that category as well.
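For reference, the standard OW% and the "games per 162" comparison can be sketched as follows. The Pythagenpat form (exponent equal to runs per game for both teams raised to a small power) is the usual one, but the constant 0.29 is an assumption on my part, since this post does not state which value was used; the function names and the 5.00 R/G team are likewise hypothetical.

```python
LEAGUE_RA_PER_GAME = 4.61  # league average quoted in the post
PYTHAGENPAT_Z = 0.29       # ASSUMED constant; the post does not state one

def pythagenpat_wpct(r_per_game, ra_per_game, z=PYTHAGENPAT_Z):
    """Expected W% with a run-environment-dependent exponent."""
    x = (r_per_game + ra_per_game) ** z
    return r_per_game ** x / (r_per_game ** x + ra_per_game ** x)

def offensive_wpct(r_per_game):
    """OW%: the team's offense paired with a league-average defense."""
    return pythagenpat_wpct(r_per_game, LEAGUE_RA_PER_GAME)

def games_per_162(gow_pct, ow_pct):
    """Difference between the two estimates, prorated to a 162-game season."""
    return (gow_pct - ow_pct) * 162

average_team = offensive_wpct(LEAGUE_RA_PER_GAME)  # .500 by construction
hypothetical_team = offensive_wpct(5.00)           # made-up 5.00 R/G offense
print(average_team, hypothetical_team)
```

A team would make the lists above when the absolute value of `games_per_162` exceeds two.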

We can of course turn this procedure around and use it to calculate gDW%, and standard DW%. The teams with differences of more than two games/162 between the two were:

Positive difference: NYA, KC
Negative difference: LA, PHI, TOR

Finally, we can combine gOW% and gDW% to get what I call gEW%, and compare that to either actual W% or EW% figured from Pythagenpat. The details of the gEW% calculation are given in last year's linked post.

It is not particularly interesting to compare gEW% to actual W%, as most of the biggest differences will occur when a team's wins were out of line with expectations based on runs and runs allowed--whether you consider runs in the aggregate (standard EW%) or on the game level (gEW%). Instead, I will list the teams with differences of two games or more per 162 games between gEW% and EW%:

Positive difference: NYA, KC, CIN, HOU, NYN, SEA, SD
Negative difference: LA, PHI, TB, TOR, ATL

Here are the six discussed W% estimates for all teams, sorted by gEW%:

Monday, January 11, 2010

Hitting by Position, 2009

Offensive performance by position (and the closely related topic of positional adjustments) has always interested me, and so each year I like to examine the most recent season's totals. I believe that offensive positional averages can be an important tool for approximating the defensive value of each position, but they are certainly not a magic bullet and need to include more than one year of data if they are to be utilized in that capacity. So the discussion that follows is not rigorous and focuses on 2009 only.

The first obvious thing to look at is the positional totals for 2009, with the data coming from Baseball-Reference.com. "MLB" is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. "POS" is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. "PADJ" is the position adjustment, which is the position RG divided by the POS (non-pitcher) average. "LPADJ" is the long-term positional adjustment that I used, based on 1992-2001 data. The rows "79" and "3D" are the combined corner outfield and 1B/DH totals, respectively:

I will refrain from commenting, since drawing any conclusions based on one year of data would be inappropriate.

Next, let's look at NL pitchers' hitting by team. I'm not even going to snark about this, at least not this year. The teams are ranked by RAA above an average pitching staff, which as you can see in the chart above created .28 runs per game. RG is based on just the basic offensive stats plus SB and CS, so sacrifices, which I don't have to tell you are quite frequent for pitchers, play no part in the estimate:

For the second consecutive season, Cubs pitchers were the most productive, but they did drop from +18 to +9. St. Louis, second last year at +8, dropped to +5 but still finished a respectable third. As a group, major league pitchers created runs at just 6% of the overall average, which is about as low as it ever has been.

For the second consecutive season, Toronto pitchers failed to reach base (by hit or walk) in 20 PA, bringing their two-year total to 36 PA without reaching. Assuming that they had a true OBA talent of just .150, the probability of this is 1 in 347. Cleveland, Texas, and Minnesota pitchers all failed to collect a hit in their limited chances, but Toronto stood alone in failing to reach base.
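The 1 in 347 figure is just the chance of 36 straight failures at the assumed OBA. A quick check, treating each plate appearance as an independent trial (a simplifying assumption); the variable names are mine:

```python
assumed_oba = 0.150       # true-talent OBA assumed in the text
pa_without_reaching = 36  # two-year total quoted above

# Probability of not reaching base in every one of the 36 PA
p_streak = (1 - assumed_oba) ** pa_without_reaching
one_in = 1 / p_streak

print(round(p_streak, 5), round(one_in))
```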

Now, let's take a look at the teams with the highest and lowest RAA at each position. RAA is figured using the 2009 positional averages only, without distinguishing between leagues, which means that AL and NL teams will not sum to zero. This is a little distracting but doesn't do much to change the rank order as the opportunities for each team position are relatively equal. Left and right fielders are considered together to figure their baseline (but not first basemen and DHs). The figures have been park-adjusted.

I will not bother running a chart for highest RAA, as it is not exactly surprising that Minnesota catchers or St. Louis first basemen were standouts. It is the trailing teams that are more interesting to investigate. A simple list of the leading teams will suffice:

C--MIN, 1B--STL, 2B--PHI, 3B--TB, SS--FLA, LF--LA, CF--NYN, RF--PHI, DH--NYA

It is interesting that Dodger left fielders and Met center fielders managed to lead the way despite missing their flagship players for significant portions of the season. The Dodgers had Manny (944 OPS in 418 PA) and Juan Pierre (778 in 308) in left almost exclusively (only 4 PA went to other hitters), and it is obvious that their first place showing was mostly Manny's doing. Carlos Beltran was the main force behind the Met performance (930 OPS in 342 PA), but Angel Pagan (818 in 264) helped the cause as well, and 109 PA came from four other players.

Now, the worst performances, along with the player who led the team in games played at the position:

AL teams do not fare well, but it's more of a coincidence than a systematic bias--AL position players (not including pinch-hitters) combined to hit .268/.333/.430 while their NL counterparts hit .267/.336/.425, essentially the same level of production. It just so happened that Kansas City and Seattle punted on getting offense out of two positions each.

Let me conclude by looking at the RAA for each position, with negative performances in red and those +/- 20 runs bolded (an arbitrary cutoff, to be sure). The teams are grouped by division and sorted by the sum of RAA across all listed positions:

The Phillies led the NL with six above average positions. Strangely, first base, manned by perennial MVP candidate Ryan Howard, was only their fourth-most productive spot. At least the Mets can't blame their star players for team shortcomings this year, although I'm sure someone will try. As you already knew, Washington's offense wasn't really that bad, but in this division that leaves them at the bottom.

Milwaukee picked up the Mets' banner for a star-driven offense this year, leading the NL with three +20 RAA positions, although second base has to be considered a surprise. They led the NL in infield and corner (1B, 3B, LF, and RF) RAA. On the other hand, the Cards were operating on the one star plan. The Pirates had just one above average position, tied for the fewest in the NL, and were last in the NL in infield and corner RAA, but still beat out the Reds, the only NL team with four -20 positions and the owners of the worst outfield production in the circuit. It's a good thing they got rid of Adam Dunn (*) and signed Willy Taveras, obviously.

(*) I'm not making any statement on Dunn's fielding or his contract, just the amazing ability of some Reds fans to scapegoat him for everything, including his offense.

Los Angeles boasted baseball's most productive outfield, but was just +2 total at the other positions. For the second straight season, the Padres got surprising production out of center field; last year it was Jody Gerut and Scott Hairston who manned the position. This year, Hairston was great in 146 PA and 378 PA from Tony Gwynn Jr. were good enough. Arizona and San Francisco both have offenses filled with black holes, but the DBacks had five above average performers while only the Giants third basemen (read: Sandoval) were above average.

Another entry in the obvious department: the Yankee offense was good. They led the AL in above average positions (8), infield RAA, and corner RAA. All four infield positions were +20 or better, and their center fielders just missed breaking even. Boston had the AL's top outfield RAA, with solid performances everywhere except shortstop.

Minnesota did their best to balance out Joe Mauer's amazing season with a disastrous second base performance, while Cleveland joined the Angels and Pirates as the only teams in baseball with no positions +/- 20 runs. The Pirates were bad overall, the Indians average, and the Angels good. Detroit and Kansas City were the only teams in the AL with just two positive positions, but Chicago wasn't far behind as none of their four above average positions were better than +4. The Royals had four -20 positions and the worst outfield RAA in the majors.

I didn't make any attempt to measure it formally, but the Angels probably had the most balanced positional offense in baseball, with all positions in the -6 to +19 range. Ranger first basemen had the lowest RAA of any position. Oakland had the majors' lowest infield and corner RAA, largely driven by their first basemen, who were almost as bad as those of the Rangers. Seattle was worse overall, despite a good performance from Ichiro in right, as they had three black holes.

Yuniesky Betancourt deserves a special mention, as he helped lead the Royals to the lowest RAA at short in the majors and the Mariners to second worst. He didn't do it all alone, however; shockingly, in both stints he had a higher OPS at short than his teammates did. Taking 39% of the Mariners' SS PA, he had a 609 OPS vs. 587 for his teammates, and in 43% of the Royals' SS PA he OPSed 642 to his teammates' 517. And that's just sad.

Here is a link to a spreadsheet with each team's performance by position.

Monday, January 04, 2010

Hitting by Lineup Slot, 2009

This piece has next to no analysis--it is mostly a presentation of data that you could easily get elsewhere. But since I devote an entire post each year to the most productive leadoff slots in the majors, I've decided to also make one post dealing with the other eight lineup positions. You wouldn't want the #7 hitters to feel neglected, now would you?

First, let's take a look at the average production out of each lineup slot, broken down by league. All data has been culled from Baseball-Reference. An important technical note upfront: I have chosen to use the standard ERP formula to estimate runs created for all lineup positions. In reality, of course, the appropriate linear weight values vary by context, one of which is most certainly batting order position. Folks like Tangotiger have done a great deal of research on LW by batting order, and while the differences in event coefficients are not monumental, it would be more accurate and more interesting to use them. For this post, though, the weights are the same for each position:

NL #3 hitters blow every other position out of the water, boasting stars like Pujols, Utley, Ramirez, Braun, Gonzalez, etc. amongst their ranks. Not surprising is the fact that NL #9 hitters are the least productive, and that amongst spots manned (nearly, thanks to Tony LaRussa) exclusively by real hitters, the #8 batters from both leagues bring up the rear.

Making conclusions from one year of data of this type is dangerous, but I found it interesting that #2 hitters in both leagues out-produced the league average. Historically, #2 hitters have often been below average in OPS+, although I should point out that the figures here do include runs produced through stolen base attempts. The top producing AL lineup slot was the cleanup hitters, although the #3 and #5 spots were essentially as productive.

Let's next take a look at the top ranking team (in RG) for each lineup slot. The player listed is the most common batter in that spot, by games started:

First, I should acknowledge that the most common batter can be misleading, as in the case of Tampa's #6 hitters. Pat Burrell did appear in the most games in that spot (47), but his OPS was just 764. The team was actually propelled to a strong showing by the performances of Ben Zobrist (26 G, 1147 OPS), Willy Aybar (28, 999) and Gabe Gross (15, 919), as well as productive eight-game stints by Evan Longoria, Carlos Pena, and Jason Bartlett.

The Yankees were often cited as having a "deep" lineup, and being the most productive at three spots backs up those claims.

And the trailers:

How bad were the Kansas City cleanup hitters? So bad that only two slots (excluding NL #9s) produced lower RGs: Seattle #7 and Detroit #9. They were last in BA, second last in OBA, and only Seattle #7, San Diego #8, and their teammates who batted ninth compiled a lower SLG.

The Seattle #7s were an embarrassment in their own right--they were a point behind Colorado's #9 hitters in OBA (although park adjustments would mercifully rescue them if applied).

I also figured runs above average versus the league average for the lineup slot (AL and NL separate, no park adjustments). These figures are available in the spreadsheet at the end of the post. I need to emphasize that they compare to the league average for 2009 only, and so they are subject to the players actually batting in particular spots. That's a clumsy way of saying don't take them too seriously. AL #3 and #4 hitters each created 5.5 runs per game, but the NL breakdown was 6.4/5.9. If a given NL team's third hitters and cleanup hitters created 6.2 runs, then the #3s would be ranked below average and the #4s above average. But does the fact that Pujols, Ramirez, Utley, and others bat third rather than fourth really change the value of another team's #3 hitters? No.
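To put numbers on that example, here is a tiny sketch comparing the hypothetical 6.2 RG team to the NL slot averages quoted above. It stops at the per-game difference and makes no attempt to convert to seasonal RAA, which would also require each slot's outs; the names are illustrative only.

```python
# NL league-average RG for the #3 and #4 slots, as quoted above
nl_slot_avg_rg = {3: 6.4, 4: 5.9}

# Hypothetical NL team from the example: 6.2 RG from both slots
team_rg = {3: 6.2, 4: 6.2}

# RG above (+) or below (-) the league average for the same slot
rg_vs_slot = {
    slot: round(team_rg[slot] - nl_slot_avg_rg[slot], 1) for slot in team_rg
}
print(rg_vs_slot)
```

Identical production grades out at -0.2 in one slot and +0.3 in the other, which is exactly why these figures should not be taken too seriously.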

Anyway, caveats aside, here were the ten highest RAA figures from individual slots:

Don't worry: David Wright was more responsible for the Mets' #5 hitters than Jeff Francoeur, but Francoeur appeared in more games.

The bottom ten:

Boston and Colorado had the most above-average lineup slots, with eight (Red Sox leadoff and Rockies #7 were the below-average performers). Detroit (#4 the exception), Oakland (#9), Pittsburgh (#1), and San Diego (#9) all had eight below-average slots.

There are a lot of other ways you could look at this data; I'll leave you to it if you want, as I've run out of interesting things to say:

http://spreadsheets.google.com/pub?key=tUplx96l-xDtgKWCJmdXXqw&output=html