The **expected value**, $E(X)$, is defined for discrete
and for continuous random variables $X$ as follows:

Discrete: $E(X) = \sum\limits_{\text{all }x} x P(X=x) = \sum x P(x)$

Continuous: $E(X) = \int\limits_{\text{all }x} x f(x) \, \mathrm{d}x$

The expected value of a discrete random variable is equivalent to a weighted mean, as can be seen in the following derivation.

$E(X) = \dfrac{\sum x P(x)}{1} = \dfrac{\sum x P(x)}{\sum P(x)} = \dfrac{\sum w x}{\sum w} = \mu$
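
The equivalence can be checked numerically. The following is a minimal sketch, assuming a fair six-sided die as the distribution; the helper names `expected_value` and `weighted_mean` are illustrative and do not come from the text. Exact fractions are used so that the two results can be compared without rounding error.

```python
from fractions import Fraction


def expected_value(pmf):
    """Return E(X) = sum of x * P(x) for a pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for matching lists of values and weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Assumed example distribution: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mu = expected_value(die)
wm = weighted_mean(list(die.keys()), list(die.values()))
print(mu, wm)  # 7/2 7/2
```

Because the probabilities sum to 1, dividing by the total weight changes nothing, which is the middle step of the derivation above.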

The continuous formula arises from the discrete one by letting the number of rectangles in the probability histogram approach $\infty$, which changes the sum into an integral. Since both expected value formulas are therefore weighted means, we can conclude that the expected value describes the long-run average that repeated trials of the statistical experiment can be expected to produce.
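
The limiting process can be illustrated with a short numerical sketch. It assumes, purely as an example, the density $f(x) = 2x$ on $[0, 1]$, for which integration gives $E(X) = \int_0^1 x \cdot 2x \, \mathrm{d}x = \frac{2}{3}$. The function name `riemann_expected_value` and the choice of midpoint rectangles are illustrative assumptions.

```python
def riemann_expected_value(f, a, b, n):
    """Approximate E(X) on [a, b] with n midpoint rectangles.

    Each rectangle has midpoint m and approximate probability f(m) * dx,
    so the sum mirrors the discrete formula sum of x * P(x).
    """
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        m = a + (i + 0.5) * dx
        total += m * f(m) * dx
    return total


def density(x):
    """Assumed example density: f(x) = 2x on [0, 1]."""
    return 2 * x


# The approximations move toward 2/3 as the number of rectangles grows.
for n in (10, 100, 1000):
    print(n, riemann_expected_value(density, 0.0, 1.0, n))
```

Each term `m * f(m) * dx` plays the role of $x P(x)$ in the discrete sum, with the rectangle's area standing in for the probability.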

The **variance**, $Var(X)$, of the random variable $X$ is defined as follows:

Discrete: $Var(X) = \sum\limits_{\text{all }x} (x-\mu)^2 P(X=x) = \sum (x-\mu)^2 P(x)$

Continuous: $Var(X) = \int\limits_{\text{all }x} (x-\mu)^2 f(x) \, \mathrm{d}x$

The discrete formula is equivalent to the weighted population variance formula, as can be seen in the following derivation.

$Var(X) = \dfrac{\sum (x-\mu)^2 P(x)}{1} = \dfrac{\sum (x-\mu)^2 P(x)}{\sum P(x)} = \dfrac{\sum w (x-\mu)^2}{\sum w} = \sigma^2$

The continuous form of the variance formula is, once again, what arises as the number of rectangles in the discrete histogram approaches $\infty$.

A more common formula for the variance does not require deviations from the mean. This formula is:

$Var(X) = E(X^2) - (E(X))^2$

The derivation of this formula, for both the discrete and continuous cases, proceeds as follows:

Discrete case:

\begin{align} Var(X) &= \sum (x-\mu)^2 P(x) \\ &= \sum (x^2 - 2\mu x + {\mu}^2) P(x) \\ &= \sum x^2 P(x) - 2\mu \sum x P(x) + {\mu}^2 \sum P(x) \\ &= E(X^2) - 2\mu \mu + {\mu}^2 (1) \\ &= E(X^2) - {\mu}^2 \\ &= E(X^2) - (E(X))^2 \end{align}

Continuous case:

\begin{align} Var(X) &= \int (x-\mu)^2 f(x) \, \mathrm{d}x \\ &= \int (x^2 - 2\mu x + {\mu}^2) f(x) \, \mathrm{d}x \\ &= \int x^2 f(x) \, \mathrm{d}x - 2\mu \int x f(x) \, \mathrm{d}x + {\mu}^2 \int f(x) \, \mathrm{d}x \\ &= E(X^2) - 2\mu \mu + {\mu}^2 (1) \\ &= E(X^2) - {\mu}^2 \\ &= E(X^2) - (E(X))^2 \end{align}
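
As a quick check that the two variance formulas agree, the sketch below evaluates both for a fair six-sided die. The distribution and the function names `variance_by_definition` and `variance_by_shortcut` are assumptions chosen for illustration.

```python
from fractions import Fraction


def variance_by_definition(pmf):
    """Return sum of (x - mu)^2 * P(x), the definition of Var(X)."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def variance_by_shortcut(pmf):
    """Return E(X^2) - (E(X))^2, the formula without deviations."""
    mean = sum(x * p for x, p in pmf.items())
    mean_of_squares = sum(x ** 2 * p for x, p in pmf.items())
    return mean_of_squares - mean ** 2


# Assumed example distribution: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(variance_by_definition(die), variance_by_shortcut(die))  # 35/12 35/12
```

Here $E(X^2) = \frac{91}{6}$ and $(E(X))^2 = \frac{49}{4}$, so both routes give $\frac{35}{12}$.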

We first encountered Chebyshev's Theorem in the section on Measures of Dispersion. Now, we can prove the statement.

**Chebyshev's Theorem:** For any distribution of data that has a finite mean $\mu$ and
finite variance $\sigma^2$, and for any $z \ge 1$, at least
$1-\dfrac{1}{z^2}$ of the data will fall within $z$ standard deviations of the mean.

**Proof:** Let $X$ be a discrete random variable for the distribution of data, and
let $A$ be the set of data values that fall at or beyond $z$ standard deviations from the mean.
That is, for any $x \in A$ we have the inequality
$\vert x - \mu \vert \ge z \sigma$, and therefore $(z \sigma)^2 \le (x - \mu)^2$. Then:

\begin{align} P( \vert X - \mu \vert \ge z \sigma) &= P(X \in A) \\ &= \sum \limits_{x \in A} P(X = x) \\ &= \dfrac{1}{z^2 \sigma^2} \sum \limits_{x \in A} (z \sigma)^2 P(X = x) \\ &\le \dfrac{1}{z^2 \sigma^2} \sum \limits_{x \in A} (x - \mu)^2 P(X = x) \\ &\le \dfrac{1}{z^2 \sigma^2} \sum \limits_{\text{all } x} (x - \mu)^2 P(X = x) \\ &= \dfrac{\sigma^2}{z^2 \sigma^2} \\ &= \dfrac{1}{z^2} \end{align}

This leads to the following succession of inequalities:

\begin{align} P( \vert X - \mu \vert \ge z \sigma) &\le \dfrac{1}{z^2} \\ 1 - P( \vert X - \mu \vert \ge z \sigma) &\ge 1 - \dfrac{1}{z^2} \\ P( \vert X - \mu \vert < z \sigma) &\ge 1 - \dfrac{1}{z^2} \end{align}

The last inequality is the statement of Chebyshev's Theorem. The proof of the theorem for continuous distributions uses integrals, but is otherwise similar.
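
The bound can also be checked on a concrete distribution. The sketch below assumes a small, deliberately skewed discrete distribution (the dictionary `skewed`) and a hypothetical helper `chebyshev_check`. It compares squared distances, so no square root is needed and the arithmetic stays exact.

```python
from fractions import Fraction


def chebyshev_check(pmf, z):
    """Return (P(|X - mu| < z*sigma), 1 - 1/z^2) for a discrete pmf."""
    z = Fraction(z)
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    # |x - mu| < z*sigma is equivalent to (x - mu)^2 < z^2 * sigma^2.
    within = sum(p for x, p in pmf.items() if (x - mu) ** 2 < z ** 2 * var)
    bound = 1 - 1 / z ** 2
    return within, bound


# Assumed example: a skewed distribution with one outlying value.
skewed = {
    0: Fraction(1, 2),
    1: Fraction(1, 4),
    2: Fraction(1, 8),
    10: Fraction(1, 8),
}

for z in (1, Fraction(3, 2), 2, 3):
    within, bound = chebyshev_check(skewed, z)
    print(z, within, bound, within >= bound)
```

For this distribution the actual proportion within $z$ standard deviations is well above the guaranteed minimum, which is typical: the theorem gives a lower bound that holds for every distribution, not an estimate for any particular one.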