Wednesday, July 12, 2017

Enby Distribution, pt. 3: Enby Distribution Calculator

At this point, I want to re-explain how to use the Enby distribution, step-by-step. While I already did this in part 6 of the original series, I now have the new variance estimator as found by Alan Jordan to plug in, and so to avoid any confusion and to make this is easy if anyone ever wants to implement it themselves, I will recount it all in one location. I will also re-introduce a spreadsheet that you can use to estimate the probability of scoring X runs based on the Enby distribution.

Step 1: Estimate the variance of runs scored per game (VAR) as a function of mean runs/game (RG):

VAR = RG^2/9 + (2/c - 1)*RG
where c is the control value from the Tango Distribution. For normal applications, we’ll assume that c = .767.

Step 2: Use the mean and variance to estimate the parameters (r and B) of the negative binomial distribution:

r = RG^2/(VAR - RG)
B = VAR/RG - 1

B will be retained as a parameter for the Enby distribution.

Step 3: Find the probability of zero runs scored as estimated by the negative binomial distribution (we’ll call this value a):

a = (1 + B)^(-r)

Step 4: Use the Tango Distribution to estimate the probability of being shutout. This will become the Enby distribution parameter z:

z =(RI/(RI + c*RI^2))^9
where RI is runs/inning, which we’ll estimate as RG/9.

Step 5: Use trial and error to estimate a new value of r given the modified value at zero. B and z will stay constant, but r must be chosen so as to ensure that the correct mean RG is returned by the Enby distribution. Use the following formula to estimate the probability of k runs scored per game using the non-modified negative binomial distribution:

q(0) = a
q(k) = (r)(r + 1)(r + 2)(r + 3)…(r + k - 1)*B^k/(k!*(1 + B)^(r + k)) for k >=1

Then modify by taking:

p(0) = z
p(k) = (1 - z)*q(k)/(1 - a)for k >=1

The mean is calculated as:

mean = sum (from k = 1 to infinity) of (k*p(k)) = p(1) + 2*p(2) + 3*p(3) + ...

Now you have the parameters r, B, and z and the probability of scoring k runs in a game.

I previously published a spreadsheet that provided the approximate Enby distribution parameters at each .05 increment of RG between 3 and 7. The link below will take you to an updated version of this calculator. It is updated in two ways: first, the Tango Distribution estimate of variance developed by Alan Jordan is used as in the example above. Secondly, I have added lines for RG levels between 0-3 and 7-15 RG (at intervals of .25). Previously, you could enter in any value between 3-7 RG and the calculator would round it to nearest .05; now I’m going to make you enter a legitimate value yourself or accept whatever vlookup() gives you.

P(x) is the probability of scoring x runs in a game, P(<= x) is the probability of scoring that many or fewer, and P(> x) is the probability of scoring more than x runs.

Enby Calculator