Tuesday, May 06, 2014

1882-1883 Introduction

In my long-neglected effort to apply sabermetric measures to the nineteenth-century game while also providing a perfunctory history, I have chosen to treat the seasons of 1882 and 1883 together. The statistical methods I have used here are, for the most part, the same as those I used for 1876-1881; it was not until 1884 that I felt the need to delineate a new era, so to speak (a judgment that is, of course, subject to disagreement).

However, I split these two seasons off on the historical level because, for the first time, the National League has a competitor engaging it on similar terms. There had previously been independent professional clubs, some of them similar in quality to some of the NL teams. But until the American Association began play in 1882, no other league had behaved the way we would expect a major league to behave today: setting a schedule centrally (or at least prescribing the number of games to be played between each pair of clubs), setting uniform playing rules, restricting membership, trying to improve the position of the league as a whole rather than worrying solely about the affairs of one club, etc.

In 1884, the Union Association will emerge as another challenger (at least in the dreams of its backers), and I think that is another good place to draw a line. I also found that run estimation in the mid-to-late 1880s is trickier than for 1876-1883. In most cases the estimates are not any less accurate, but getting there requires applying different formulas to smaller groups of seasons.

So, the methodology for the 1882 and 1883 recaps will be much the same as for 1876-1881; see the links on the right side of the page for a refresher. The Base Runs method being used is:

A = H + W - HR + E + .08SH
B = (.726S + 1.948D + 3.134T + 1.694HR + .052W + .799E + .727SH + 1.165WP + 1.174PB - .05(AB - H - E))*1.087
C = AB - H - E + .92SH
D = HR
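The post lists the four Base Runs components but not the step that combines them. Below is a minimal Python sketch, assuming the standard Base Runs combination BsR = A*B/(B + C) + D. The names `BattingLine` and `base_runs` are hypothetical helpers for illustration, not the author's own code. E, SH, WP, and PB are passed in as already-estimated values.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season counting stats. E, SH, WP and PB are estimates for this era."""
    ab: float
    h: float
    d: float          # doubles
    t: float          # triples
    hr: float
    w: float          # walks
    e: float          # estimated times reached on error
    sh: float = 0.0   # estimated sacrifice hits
    wp: float = 0.0   # estimated wild pitches
    pb: float = 0.0   # estimated passed balls

    @property
    def singles(self) -> float:
        return self.h - self.d - self.t - self.hr


def base_runs(x: BattingLine) -> float:
    """A, B, C, D as given in the post, combined as A * B / (B + C) + D
    (the standard Base Runs form; assumed, not shown in the post)."""
    outs = x.ab - x.h - x.e                              # AB - H - E
    a = x.h + x.w - x.hr + x.e + 0.08 * x.sh             # baserunners
    b = (0.726 * x.singles + 1.948 * x.d + 3.134 * x.t + 1.694 * x.hr
         + 0.052 * x.w + 0.799 * x.e + 0.727 * x.sh
         + 1.165 * x.wp + 1.174 * x.pb - 0.05 * outs) * 1.087  # advancement
    c = outs + 0.92 * x.sh                               # outs
    d = x.hr                                             # automatic runs
    return a * b / (b + c) + d
```

Home runs enter D directly and drop out of A, so a line consisting only of home runs returns exactly its home run total.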

As I mentioned in the earlier piece, the inclusion of SH was an embarrassing mistake on my part, but it all washes out, since the SH are just an estimate based on singles, walks, and estimated errors. It just shifts the values of those events around a bit, in a way that made the formula slightly more accurate. But there really were no sacrifice hits at this stage of the game's development, and I knew that, and I flat out forgot it when working on the run estimator. Embarrassing.

I would also be remiss if I did not point out that the formula does not work as well for the AA teams as it does for the NL teams, so all of the results for the AA in 1882-83 should be taken with an extra grain of salt. Again, I cannot stress enough that all of the metrics should be looked at with a much more jaundiced eye than similar figures for recent times.

The formulas used for the NL are the same as in the 1876-1881 recaps, except that I used a custom error estimate for each year: E = x(AB - H - K), where x is .1277 in 1882 and .1409 in 1883.
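The error estimate is a single multiplier on balls that were neither hits nor strikeouts. A small sketch follows. The function name `estimate_nl_errors` and the dictionary are hypothetical; only the two multipliers come from the post.

```python
# Multiplier x in E = x(AB - H - K), per season, from the post.
NL_ERROR_RATE = {1882: 0.1277, 1883: 0.1409}


def estimate_nl_errors(year: int, ab: float, h: float, k: float) -> float:
    """Estimated times reached on error: a fixed share of AB - H - K."""
    return NL_ERROR_RATE[year] * (ab - h - k)
```

For example, a hypothetical 1882 line with 1000 AB, 250 H, and 50 K gives .1277 * 700, or about 89.4 estimated errors.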

The BsR formula generates the following linear weights. For each season, the first line lists S, D, T, HR, W, E, AB-H-E, SH, PB, WP. The second line is the set actually applied to players: the coefficients for the categories we don’t have (SH, PB, and WP; we don’t have errors either, of course, but we have estimated those) are “folded” into the weights for singles, walks, outs, etc. Those weights are therefore displayed as S, D, T, HR, W, E, AB-H-E:

1882 (full): .549, .836, 1.114, 1.397, .391, .566, -.143, .080, .275, .273
1882 (folded): .583, .867, 1.145, 1.397, .425, .600, -.143
1883 (full): .561, .845, 1.120, 1.393, .405, .578, -.150, .073, .273, .270
1883 (folded): .594, .876, 1.151, 1.393, .438, .611, -.150
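Applying the folded weights to a player line is a single weighted sum. A sketch, assuming the player's errors have already been estimated; `lw_runs` and `NL_FOLDED_WEIGHTS` are hypothetical names, while the coefficients are copied from the folded NL lines.

```python
# Folded NL weights, in the order S, D, T, HR, W, E, AB-H-E.
NL_FOLDED_WEIGHTS = {
    1882: (0.583, 0.867, 1.145, 1.397, 0.425, 0.600, -0.143),
    1883: (0.594, 0.876, 1.151, 1.393, 0.438, 0.611, -0.150),
}


def lw_runs(year: int, s: float, d: float, t: float, hr: float,
            w: float, e: float, ab: float) -> float:
    """Linear weights runs for a player line; e is estimated errors."""
    h = s + d + t + hr
    outs = ab - h - e  # the AB-H-E category
    counts = (s, d, t, hr, w, e, outs)
    return sum(coef * n for coef, n in zip(NL_FOLDED_WEIGHTS[year], counts))
```

For instance, a hypothetical 1882 NL line of 100 singles, 20 doubles, 10 triples, 2 home runs, 10 walks, and 30 estimated errors in 500 at bats works out to about 63.8 runs.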

For the American Association, I have used different estimates for WP and PB (but the same estimate for SH):
WP = .0449*(H + W - HR + E)
PB = .0836*(H + W - HR + E)

The AA did not track batter strikeouts, so errors must be estimated as a proportion of (AB-H). This proportion is .1323 for 1882 and .1269 for 1883.
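The AA estimates above can be sketched the same way. The function names here are hypothetical. The rates are the ones given in the post, and the errors estimate feeds into the WP/PB base term.

```python
# Multiplier x in E = x(AB - H) for the AA, per season, from the post.
AA_ERROR_RATE = {1882: 0.1323, 1883: 0.1269}


def estimate_aa_errors(year: int, ab: float, h: float) -> float:
    """No batter strikeouts in the AA, so errors are a share of AB - H."""
    return AA_ERROR_RATE[year] * (ab - h)


def estimate_aa_wp_pb(h: float, w: float, hr: float, e: float) -> tuple[float, float]:
    """Wild pitches and passed balls as fixed rates per (H + W - HR + E)."""
    runners = h + w - hr + e
    return 0.0449 * runners, 0.0836 * runners
```

Because both WP and PB scale with H + W - HR + E, every hit (other than a home run), walk, or error carries a small share of estimated wild pitches and passed balls.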

And the linear weights for the AA:
1882 (full): .549, .846, 1.134, 1.411, .386, .587, -.145, .084, .285, .283
1882 (folded): .587, .881, 1.169, 1.411, .423, .605, -.145
1883 (full): .558, .851, 1.136, 1.407, .396, .575, -.149, .079, .282, .280
1883 (folded): .595, .886, 1.171, 1.407, .433, .612, -.149
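One way to see what “folding” does is to check the 1882 AA double. The post gives explicit WP and PB rates for the AA and says the SH estimate uses only singles, walks, and errors, so a double should pick up only the WP and PB pieces. This is a sketch of my reading of the idea (full weight plus rate times run value for each missing category), not a documented step. `fold` is a hypothetical helper.

```python
def fold(full_weight: float, missing: list[tuple[float, float]]) -> float:
    """Fold missing categories into an event's weight.

    `missing` holds (rate, run_value) pairs: each occurrence of the event is
    assumed to add `rate` estimated occurrences of a missing category worth
    `run_value` runs apiece.
    """
    return full_weight + sum(rate * value for rate, value in missing)


# 1882 AA double: it adds 1 to (H + W - HR + E), so it carries .0449
# estimated WP (worth .283 each) and .0836 estimated PB (worth .285 each).
# Doubles are not part of the SH estimate, so SH is left out here.
double_folded = fold(0.846, [(0.0449, 0.283), (0.0836, 0.285)])
print(round(double_folded, 3))  # prints 0.883
```

This lands at about .883, close to the .881 listed for the folded 1882 AA double. The small gap may come from rounding in the published weights or from details of the procedure that aren't spelled out here.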

For teams, I have decided to stop using PW% and to stop producing an estimate of Runs Created Allowed. Those estimates weren’t particularly accurate anyway, and it’s a hassle to come up with different formulas each season for estimating all of the missing data from the defensive perspective (opponent at bats, doubles, etc.). If I felt that those categories were adding something substantial, I would go through the effort.

I have also marked "rookies" in pink, using 100 PA or 50 IP in the majors (the NA, NL, and AA for this period) as the cutoffs.

In the next installment I will begin the yearly review with the 1882 NL. The details of the NL/AA relationship will mostly be saved for the AA portion, but they may be hinted at in the NL installment.

2 comments:

  1. With the caveats noted, how do your weights compare to here:

    http://tangotiger.net/markov.html

    And maybe what "advancement" numbers would make the most sense to make sure it all adds up?

  2. Using the 1882 NL with your baserunning defaults, the calculator has:

    .425, .728, 1.031, 1.428, .312, -.084 (-.196 for the average out value)

    I didn't mess around a lot with the advancement (when I go back I lose my previous inputs). I'd be very surprised if a triple was "actually" worth 1.15 runs in this environment; obviously the approach I took here is not theoretically pure by any means.

