
Normal Distributions

Normal distributions are the most frequently encountered continuous distributions in basic statistics. Their ubiquity is due to their connection to the sampling process and to the distributions of sample means and sample proportions. However, even absent a sampling context, many quantities in the natural world are approximately normally distributed. Normal distributions are always defined over the interval   $(-\infty, \infty)$.

The Formulas

If $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, then the PDF and the moment generating function of the distribution are given by the following formulas.

\begin{align} f(x) &= \dfrac{1}{\sigma \sqrt{2\pi}} e^{-\dfrac{(x-\mu)^2}{2 \sigma^2}} \\ M(t) &= e^{\mu t + \frac12 \sigma^2 t^2} \end{align}
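As a quick sketch (in Python, using only the standard library; the function names `normal_pdf` and `normal_mgf` are illustrative, not from the source), these formulas can be coded directly, and we can check numerically that the PDF integrates to one:

```python
import math

def normal_pdf(x, mu, sigma):
    """PDF of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_mgf(t, mu, sigma):
    """Moment generating function M(t) = exp(mu*t + sigma^2 * t^2 / 2)."""
    return math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)

# Sanity check: the PDF should integrate to (almost exactly) 1.
# Midpoint rule over mu +/- 7 sigma; the area outside is negligible.
mu, sigma = 6.32, 0.47   # arbitrary choice; these reuse the battery numbers below
n = 200_000
lo, hi = mu - 7 * sigma, mu + 7 * sigma
h = (hi - lo) / n
total = sum(normal_pdf(lo + (i + 0.5) * h, mu, sigma) for i in range(n)) * h
print(round(total, 6))   # ≈ 1.0
```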

A standard normal distribution is a normal distribution with mean   $\mu = 0$   and standard deviation   $\sigma = 1$.   These values simplify the PDF and the moment generating function.

\begin{align} f(x) &= \dfrac{1}{\sqrt{2\pi}} e^{-\frac12 x^2} \\ M(t) &= e^{\frac12 t^2} \end{align}

Of course, the CDF of either of these distributions would be a definite integral of their PDFs. However, the antiderivative of the function   $f(x)=e^{-x^2}$   cannot be written using only elementary functions. In other words, there is no integration technique (substitution, parts, etc.) that will produce the antiderivative. There are many possible resolutions to this conundrum.

When using the normal distribution in practice, we use either tables of values of the standard normal CDF (or values related to the CDF), or available technology. Since $-\infty$ cannot be entered into the TI calculator, we typically substitute a really large number, like $1 \times 10^{99}$. However, since the area in the tails beyond $z=\pm 7$ of the standard normal distribution is less than $10^{-10}$, it is sufficient to use $\pm 7$ as our really large number.
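The "really large number" trick can be imitated in a short sketch (Python, standard library only; `phi` and `phi_numeric` are illustrative names). One resolution of the no-elementary-antiderivative problem is the error function, in terms of which $\Phi(z) = \frac12\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$; integrating the PDF numerically from $-7$ instead of $-\infty$ agrees with it to many decimal places:

```python
import math

def phi(z):
    """Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt 2))/2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_numeric(z, lower=-7.0, n=100_000):
    """Approximate Phi(z) by midpoint-rule integration of the standard normal PDF
    from 'lower' to z, mimicking the calculator trick of replacing -infinity
    with a large negative bound."""
    pdf = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    h = (z - lower) / n
    return sum(pdf(lower + (i + 0.5) * h) for i in range(n)) * h

# Truncating at -7 loses less than 1e-10 of the total area.
print(abs(phi_numeric(1.0) - phi(1.0)))
```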

Computing with the Standard Normal Distribution

Recall that a standard score (typically called a z-score) is defined by   $z = \dfrac{x - \mu}{\sigma}$,   or its counterpart   $z = \dfrac{x - \bar{x}}{s}$. Due to this definition, z-scores will always have a mean of $0$ and a standard deviation of $1$. (To verify this, compute the z-scores of   $x = \mu$   and $x = \mu + \sigma$.)   Therefore, we always refer to the domain values of the standard normal distribution as z-scores.
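The parenthetical verification can be carried out in a couple of lines (Python sketch; `z_score` is an illustrative name, and the values of `mu` and `sigma` are arbitrary):

```python
def z_score(x, mu, sigma):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 6.32, 0.47
print(z_score(mu, mu, sigma))           # 0.0: the mean always standardizes to 0
print(z_score(mu + sigma, mu, sigma))   # ≈ 1.0 (exactly 1 up to roundoff)
```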

The notation $z_{\alpha}$ is used to identify the z-score for which $\alpha$ is the area under the standard normal curve to the right of $z_{\alpha}$. Note that $z_{\alpha}$ and the CDF use different areas, one to the right and one to the left.
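Since $\Phi$ is increasing, $z_{\alpha}$ can be found numerically by inverting the CDF, for instance with bisection. A minimal sketch (Python, standard library only; `phi` and `z_alpha` are illustrative names):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_alpha(alpha, lo=-10.0, hi=10.0):
    """Solve 1 - Phi(z) = alpha by bisection: z_alpha has area alpha to its RIGHT."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if 1.0 - phi(mid) > alpha:   # too much area to the right -> move up
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(z_alpha(0.025), 4))   # ≈ 1.96, the familiar 95% critical value
```

Note the conversion in `z_alpha`: the bisection solves for a right-tail area even though `phi` reports a left-tail area, mirroring the distinction made above.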

Let us consider a few examples. In our solutions, we will use a combination of statistical and calculator notation. Remember that probabilities are areas under a PDF curve.

The Empirical Rule

If a data set is approximately normally distributed (bell-shaped), then

approximately 68% of the data values lie within one standard deviation of the mean,
approximately 95% of the data values lie within two standard deviations of the mean, and
approximately 99.7% of the data values lie within three standard deviations of the mean.

These results are obtained from the standard normal distribution.
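As a check, the three percentages fall out of the standard normal CDF directly, since the area within $k$ standard deviations is $\Phi(k) - \Phi(-k)$ (Python sketch, standard library only):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area within k standard deviations of the mean: Phi(k) - Phi(-k)
for k in (1, 2, 3):
    area = phi(k) - phi(-k)
    print(f"within {k} standard deviation(s): {area:.4f}")
# prints 0.6827, 0.9545, 0.9973 -- the Empirical Rule values
```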

Computing with Non-Standard Normal Distributions

In practical situations, data will not have a mean of $0$ and a standard deviation of $1$. When such data is (at least approximately) normally distributed, we first convert the relevant data values to z-scores, and then carry out the computation with the standard normal distribution.

Suppose the mean lifetime of a particular brand of alkaline battery is 6.32 hours, with a standard deviation of 0.47 hours.
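The source states the setup but not the questions asked about it. As a purely hypothetical illustration (not from the source), suppose we ask for the probability that a battery lasts more than 7 hours; the sketch below converts to a z-score and uses the standard normal CDF:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 6.32, 0.47                 # battery lifetimes, in hours
# Hypothetical question: P(X > 7)
z = (7.0 - mu) / sigma                 # convert 7 hours to a z-score first
p = 1.0 - phi(z)                       # area to the RIGHT of that z-score
print(round(z, 4), round(p, 4))        # z ≈ 1.4468, p ≈ 0.074
```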

Derivation of the Moment Generating Function

The moment generating function is obtained by evaluating $E(e^{tX})$. After completing the square, the last integral below is recognized as the integral of the PDF of a normal distribution with mean $\mu + \sigma^2 t$ and standard deviation $\sigma$, and is therefore equal to one.

\begin{align} M(t) &= \int_{-\infty}^{\infty} e^{tx} \dfrac{1}{\sigma \sqrt{2\pi}} e^{-\dfrac{(x-\mu)^2}{2 \sigma^2}} \, \mathrm{d}x \\ &= \int_{-\infty}^{\infty} \dfrac{1}{\sigma \sqrt{2\pi}} e^{-\dfrac{1}{2 \sigma^2} [x^2 - 2(\mu + \sigma^2 t)x + \mu^2]} \, \mathrm{d}x \\ &= \int_{-\infty}^{\infty} \dfrac{1}{\sigma \sqrt{2\pi}} e^{-\dfrac{1}{2 \sigma^2} [(x - (\mu + \sigma^2 t))^2 - 2\mu \sigma^2 t - \sigma^4 t^2 ]} \, \mathrm{d}x \\ &= e^{\dfrac{1}{2 \sigma^2}[2\mu \sigma^2 t + \sigma^4 t^2 ]} \int_{-\infty}^{\infty} \dfrac{1}{\sigma \sqrt{2\pi}} e^{-\dfrac12 \left[ \dfrac{x - (\mu + \sigma^2 t)}{\sigma} \right]^2} \, \mathrm{d}x \\ &= e^{\dfrac{1}{2 \sigma^2}[2\mu \sigma^2 t + \sigma^4 t^2 ]} \\ &= e^{\mu t + \frac12 \sigma^2 t^2} \end{align}

From the moment generating function, it is easily verified that the distribution has mean $\mu$ and variance $\sigma^2$.

\begin{align} M'(t) &= e^{\mu t + \frac12 \sigma^2 t^2} (\mu + \sigma^2 t) \\ E(X) &= M'(0) = e^0 (\mu + 0) = \mu \\ M''(t) &= e^{\mu t + \frac12 \sigma^2 t^2} (\mu + \sigma^2 t)^2 + e^{\mu t + \frac12 \sigma^2 t^2} \sigma^2 \\ E(X^2) &= M''(0) = e^0 \mu^2 + e^0 \sigma^2 = \mu^2 + \sigma^2 \\ Var(X) &= E(X^2) - (E(X))^2 = \mu^2 + \sigma^2 - \mu^2 = \sigma^2 \end{align}
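These derivative calculations can also be double-checked numerically with finite differences, since $M'(0) = E(X)$ and $M''(0) = E(X^2)$ (Python sketch; `M` is an illustrative name, and the values of $\mu$ and $\sigma$ are arbitrary):

```python
import math

MU, SIGMA = 6.32, 0.47

def M(t):
    """MGF of a normal distribution: exp(mu*t + sigma^2 * t^2 / 2)."""
    return math.exp(MU * t + 0.5 * SIGMA ** 2 * t ** 2)

h = 1e-5
m1 = (M(h) - M(-h)) / (2 * h)               # central difference for M'(0) = E(X)
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2   # second difference for M''(0) = E(X^2)
print(round(m1, 4))                         # ≈ MU = 6.32
print(round(m2 - m1 ** 2, 4))               # variance ≈ SIGMA^2 = 0.2209
```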