Negative Binomial Distributions

The random variable $X$ can also represent the number of trials needed to obtain $k$ successes, where $k$ is typically greater than one. (If we had $k=1$, we could use the geometric distribution.) In this situation, the number of trials is (once again) not fixed. But if the trials are still independent, only two outcomes are available for each trial, and the probability of a success is still constant, then the random variable will have a negative binomial distribution.
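The process is easy to simulate. Run independent trials, each a success with probability $p$, and count how many trials it takes to reach $k$ successes. The sketch below is a minimal Python version that uses the standard `random` module. The function name `trials_until_k_successes` and the seed are our own choices. Averaging many simulated values should give a number close to the expected value $k/p$, which is 18 for $k=3$ and $p=1/6$.

```python
import random

def trials_until_k_successes(k, p, rng):
    """Run independent trials (success probability p) until k successes
    have occurred, and return how many trials were needed."""
    successes = 0
    trials = 0
    while successes < k:
        trials += 1
        if rng.random() < p:  # this trial is a success with probability p
            successes += 1
    return trials

rng = random.Random(2024)  # fixed seed so the run is reproducible
samples = [trials_until_k_successes(3, 1/6, rng) for _ in range(20_000)]
print(sum(samples) / len(samples))  # should be near k/p = 18
```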

The Formulas

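In a negative binomial distribution, if $p$ is the probability of a success and $x$ is the number of trials needed to obtain $k$ successes, then the following formulas apply. The probability $P(x)$ is defined for $x = k, k+1, k+2, \ldots$, and the moment generating function $M(t)$ exists for $t < -\ln(1-p)$.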

\begin{align} P(x) &= \left( {}_{x-1} C_{k-1} \right) p^k (1-p)^{x-k} \\ M(t) &= p^k \left[ e^{-t} - 1 + p \right]^{-k} \\ E(X) &= \dfrac{k}{p} \\ Var(X) &= \dfrac{k(1-p)}{p^2} \end{align}
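The four formulas translate directly into code. The sketch below uses Python's standard `math` module. The function names `nb_pmf`, `nb_mean`, `nb_var` and `nb_mgf` are our own labels and do not come from any library. Some libraries parametrize the distribution differently. SciPy's `scipy.stats.nbinom`, for example, counts the failures before the $k$-th success, which corresponds to $X - k$ here, so its formulas look different.

```python
from math import comb, exp

def nb_pmf(x, k, p):
    """P(X = x): probability that the k-th success occurs on trial x."""
    if x < k:
        return 0.0  # fewer than k trials cannot produce k successes
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

def nb_mean(k, p):
    """E(X) = k / p."""
    return k / p

def nb_var(k, p):
    """Var(X) = k(1 - p) / p^2."""
    return k * (1 - p) / p**2

def nb_mgf(t, k, p):
    """M(t) = p^k (e^{-t} - 1 + p)^(-k), which exists only for t < -ln(1 - p)."""
    base = exp(-t) - 1 + p
    if base <= 0:
        raise ValueError("M(t) is undefined for t >= -ln(1 - p)")
    return p**k * base ** (-k)

print(nb_pmf(10, 3, 1/6), nb_mean(3, 1/6), nb_var(3, 1/6))
```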

Repeatedly Rolling a Die

Suppose a fair die is rolled repeatedly. What is the probability that the third four appears on the tenth roll? How many rolls should we expect to need to obtain three fours, and what is the standard deviation of the number of rolls?

Since we are interested in "fours", a success is rolling a four. Each roll has two outcomes, namely "four" and "not four". The probability of a success is $p=\dfrac16$, and it is constant. The trials are independent, and we are interested in the number of rolls needed to obtain $k=3$ successes. Therefore all of the conditions for using the negative binomial distribution have been met.

To determine the probability that ten rolls will be needed to obtain three fours, we use $x=10$. This gives $P(X=10) = \left( {}_9 C_2 \right) \left(\dfrac16\right)^3 \left(\dfrac56\right)^7 = \dfrac{2812500}{60466176} \approx 0.0465$. The expected value is $E(X) = \dfrac{3}{1/6} = 18$ rolls, and the standard deviation is $\sigma = \sqrt{ \dfrac {3(5/6)}{(1/6)^2}} = \sqrt{90} \approx 9.49$ rolls.
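The die calculation can be checked with exact rational arithmetic. This minimal sketch uses Python's `fractions.Fraction`, so the probability comes out as an exact fraction rather than a rounded decimal. Python reduces the fraction to lowest terms, so it prints in a reduced form of $\dfrac{2812500}{60466176}$.

```python
from fractions import Fraction
from math import comb, sqrt

k, x = 3, 10
p = Fraction(1, 6)  # probability of rolling a four on a fair die

prob = comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)
mean = k / p                # E(X) = k/p
var = k * (1 - p) / p**2    # Var(X) = k(1-p)/p^2

print(prob, float(prob))    # exact fraction and its decimal value
print(mean, var, sqrt(float(var)))
```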

Sampling from a Very Large Population

Approximately 44% of all Americans have blood type O. What is the probability that exactly 12 people need to be sampled in order to find four who have blood type O? How many people should we expect to need to sample to find four having that blood type, and what is the standard deviation?

In this problem, a success is an individual with blood type O, and the other outcome is an individual who does not have blood type O. The probability of a success is $p=0.44$. We will actually be sampling without replacement. However, the population of the USA is millions of times larger than the sample, so we can treat the probability of a success as essentially constant. As long as we sample randomly from the entire population, and not from a small group in which we would be likely to choose relatives, we can assume the trials are independent. Finally, we are interested in the number of people sampled to obtain four successes. Therefore, the conditions for using the negative binomial distribution are approximately met, because the population is so large compared to the sample.

If the fourth person with blood type O is the 12th person sampled, we want $x=12$ and $k=4$. We then have $P(X=12) = \left({}_{11} C_3 \right) (0.44)^4 (0.56)^8 \approx 0.0598$. The expected value is $E(X) = \dfrac{4}{0.44} \approx 9.09$ people sampled to find four with blood type O. The variance is $Var(X) = \dfrac{4(0.56)}{0.44^2} \approx 11.57$, so the standard deviation is $\sigma = \sqrt{ \dfrac{4(0.56)}{0.44^2}} \approx 3.40$ people.
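Here is the same arithmetic for the blood-type example, in floating point. The variance and the standard deviation are kept as separate variables. This makes it clear that $11.57$ is the variance and that the standard deviation is its square root.

```python
from math import comb, sqrt

# k successes (type O), x people sampled, p = P(type O) ≈ 0.44
k, x, p = 4, 12, 0.44

prob = comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)
mean = k / p
var = k * (1 - p) / p**2
sd = sqrt(var)  # standard deviation is the square root of the variance

print(f"P(X=12) = {prob:.4f}")
print(f"E(X) = {mean:.2f}, Var(X) = {var:.2f}, sd = {sd:.2f}")
```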

Derivation of the Formulas

In the following derivation, we will make use of the Taylor series of a function from calculus, $f(x) = \sum\limits_{j=0}^{\infty} \dfrac{f^{(j)}(a)}{j!} (x-a)^j$, expanded about the value $x=a$. The summation index is $j$ here because $k$ already denotes the number of successes. For the function $f(x)=(1-x)^{-n}$, the $j$-th derivative at $x=0$ is $f^{(j)}(0) = n(n+1)\cdots(n+j-1)$. The Taylor series expansion about $x=0$, which converges for $|x|<1$, is therefore

\begin{equation} (1-x)^{-n} = 1 + nx + \dfrac{n(n+1)}{2!} x^2 + \dfrac{n(n+1)(n+2)}{3!} x^3 + \cdots \end{equation}
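As a numerical sanity check of this expansion, the sketch below sums the first few dozen terms of the series and compares the result with $(1-x)^{-n}$. For a positive integer $n$, the coefficient $\dfrac{n(n+1)\cdots(n+j-1)}{j!}$ equals the binomial coefficient ${}_{n+j-1} C_j$, and the sketch relies on that identity. The values $x=0.3$ and $n=4$ are arbitrary choices with $|x|<1$, and the helper name is our own.

```python
from math import comb

def series_partial_sum(x, n, terms):
    """Sum the first `terms` terms of 1 + n x + n(n+1)/2! x^2 + ...,
    using n(n+1)...(n+j-1)/j! = C(n+j-1, j) for a positive integer n."""
    return sum(comb(n + j - 1, j) * x**j for j in range(terms))

x, n = 0.3, 4
print(series_partial_sum(x, n, 60), (1 - x) ** (-n))
```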

Now imagine a scenario in which exactly $x$ trials are needed to obtain $k$ successes. We denote successes by S and failures by F. The last of the $x$ trials must be a success. Otherwise the $k$-th success would have occurred earlier, and fewer than $x$ trials would have been needed. One possible arrangement is therefore (SS...SFF...F)S, where the $(k-1)$ successes and $(x-k)$ failures inside the parentheses can appear in any order. There are ${}_{x-1} C_{k-1}$ ways to choose the positions of those $k-1$ successes among the first $x-1$ trials. Each of the $k$ successes has probability $p$, so together they contribute $p^k$. Each of the $(x-k)$ failures has probability $(1-p)$, so together they contribute $(1-p)^{x-k}$. Since the trials are independent, the probability that $x$ trials are needed for $k$ successes is the product of these factors: $P(x)= \left( {}_{x-1} C_{k-1}\right) p^k (1-p)^{x-k}$.
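The counting argument can be verified by brute force for small $x$. List every S/F sequence of length $x$, keep the sequences whose $k$-th success falls exactly on trial $x$, and add up their probabilities. There are $2^x$ sequences, so this enumeration is only practical for small $x$. The function names are illustrative.

```python
from itertools import product
from math import comb

def pmf_by_enumeration(x, k, p):
    """Add the probabilities of all S/F sequences of length x whose
    k-th success lands exactly on trial x (brute force, small x only)."""
    total = 0.0
    for seq in product("SF", repeat=x):
        # last trial is a success, and the first x-1 trials hold k-1 successes
        if seq[-1] == "S" and seq[:-1].count("S") == k - 1:
            s = seq.count("S")
            total += p**s * (1 - p) ** (x - s)
    return total

def pmf_formula(x, k, p):
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

for x in range(3, 11):
    print(x, pmf_by_enumeration(x, 3, 1/6), pmf_formula(x, 3, 1/6))
```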

The formula for the moment generating function $M(t)$ arises from the evaluation of $E(e^{tX})$.

\begin{align} M(t) &= E(e^{tX}) = \sum\limits_{x=k}^{\infty} e^{tx} \left( {}_{x-1} C_{k-1} \right) p^k (1-p)^{x-k} \\ &= \left(\dfrac{p}{1-p} \right)^k \sum\limits_{x=k}^{\infty} \left( {}_{x-1} C_{k-1} \right) \left(e^t (1-p) \right)^x \\ &= \left(\dfrac{p}{1-p} \right)^k \left[ \left(e^t (1-p) \right)^k + k \left(e^t (1-p) \right)^{k+1} + \dfrac{(k+1)k}{2} \left(e^t (1-p) \right)^{k+2} + \cdots \right] \\ &= \left( pe^t \right)^k \left[ 1 + k \left(e^t (1-p) \right)^1 + \dfrac{(k+1)k}{2} \left(e^t (1-p) \right)^2 + \cdots \right] \end{align}

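Now we apply the Taylor series expansion of $(1-x)^{-n}$ developed above, using $x = e^t (1-p)$ and $n = k$. The series converges only when $e^t(1-p) < 1$, that is, for $t < -\ln(1-p)$. This interval contains $t=0$, which is all we need for computing moments. We obtain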

\begin{equation} M(t) = \left( pe^t \right)^k \left[ 1 - (1-p)e^t \right]^{-k} = p^k \left[ e^{-t} - 1 + p \right]^{-k} \end{equation}
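The closed form can also be checked numerically by truncating the series $\sum e^{tx} P(x)$. The sketch below uses the die parameters $k=3$ and $p=1/6$, together with a few values of $t$ below $-\ln(1-p) \approx 0.182$, where the series converges. The cutoff `max_x=2000` is an arbitrary choice that is large enough for these values.

```python
from math import comb, exp, log

def mgf_by_sum(t, k, p, max_x=2000):
    """Approximate E(e^{tX}) by truncating the series at x = max_x."""
    return sum(exp(t * x) * comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)
               for x in range(k, max_x + 1))

def mgf_closed_form(t, k, p):
    return p**k * (exp(-t) - 1 + p) ** (-k)

k, p = 3, 1/6
t_max = -log(1 - p)  # the series converges only for t < t_max
for t in (-0.5, 0.0, 0.1):
    print(t, mgf_by_sum(t, k, p), mgf_closed_form(t, k, p))
```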

We can obtain the expected value from the first derivative of $M(t)$.

\begin{align} M'(t) &= p^k (-k) \left[ e^{-t} - 1 + p \right]^{-k-1} (-e^{-t}) \\ &= k p^k \left[ e^{-t} - 1 + p \right]^{-k-1} e^{-t} \\ E(X) &= M'(0) = k p^k p^{-k-1} = \dfrac{k}{p} \end{align}
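A central finite difference gives a quick numerical check that $M'(0) = k/p$. This sketch uses the die example ($k=3$, $p=1/6$). The step size $h=10^{-5}$ is our own choice and does not come from the text.

```python
from math import exp

def mgf(t, k, p):
    """Closed-form M(t) = p^k (e^{-t} - 1 + p)^(-k)."""
    return p**k * (exp(-t) - 1 + p) ** (-k)

k, p, h = 3, 1/6, 1e-5
# central difference approximation to M'(0)
m1 = (mgf(h, k, p) - mgf(-h, k, p)) / (2 * h)
print(m1, k / p)
```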

The formulas for $E(X^2)$ and $Var(X)$ follow from the second derivative of $M(t)$.

\begin{align} M''(t) &= k p^k (-k-1) \left[ e^{-t} - 1 + p \right]^{-k-2} (-e^{-t}) e^{-t} + k p^k \left[ e^{-t} - 1 + p \right]^{-k-1} (-e^{-t}) \\ &= k(k+1) p^k \left[ e^{-t} - 1 + p \right]^{-k-2} e^{-2t} - k p^k \left[ e^{-t} - 1 + p \right]^{-k-1} e^{-t} \\ E(X^2) &= M''(0) = k(k+1) p^k p^{-k-2} - k p^k p^{-k-1} = \dfrac{k(k+1)}{p^2} - \dfrac{k}{p} \\ Var(X) &= E(X^2) - (E(X))^2 = \dfrac{k(k+1)}{p^2} - \dfrac{k}{p} - \left( \dfrac{k}{p} \right)^2 = \dfrac{k}{p^2} - \dfrac{k}{p} = \dfrac{k(1-p)}{p^2} \end{align}
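As an independent check on $E(X^2)$ and $Var(X)$, the sketch below computes the moments directly from the probability function instead of differentiating $M(t)$. It truncates the infinite sum at $x=3000$. That cutoff is an arbitrary choice, but it lies far into the tail for $p=1/6$.

```python
from math import comb

k, p = 3, 1/6
xs = range(k, 3001)  # truncate the infinite sum; the tail beyond 3000 is negligible here
pmf = [comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k) for x in xs]

ex = sum(x * q for x, q in zip(xs, pmf))       # E(X)
ex2 = sum(x * x * q for x, q in zip(xs, pmf))  # E(X^2)
var = ex2 - ex**2                              # Var(X) = E(X^2) - (E(X))^2

print(ex2, k * (k + 1) / p**2 - k / p)
print(var, k * (1 - p) / p**2)
```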

This result implies that the standard deviation of a negative binomial distribution is $\sigma = \dfrac{ \sqrt{k(1-p)}}{p}$.