Continuous Probability Distributions

When moving from discrete to continuous distributions, the random variable will no longer be restricted to integer values, but will now be able to take on any value in some interval of real numbers. Graphically, we will be moving from the discrete bars of a histogram to the curve of a (possibly piecewise) continuous function.

Comparison of Discrete and Continuous Graphs

In the discrete case, probabilities were given by a probability distribution function $P(X=x)$, and graphically displayed by using its value as the height of each bar. We might also observe that each of the bars had width $1$, and therefore the height of each bar was equal to its area. In the continuous case, the function $f(x)$ is called the probability density function, and probabilities are determined by the areas under the curve $f(x)$. So as we move from the discrete to the continuous case, we need to modify how we interpret the graph, so that we see probabilities as areas. And yet, the mathematics has not changed at all, since probabilities are areas in both cases.

Characteristics of a Continuous Probability Density Function

In the discrete case of a probability distribution function, there were two requirements. Each probability had to be between 0 and 1, and the sum of all probabilities was equal to 1. As we move to the continuous case, the same requirements are essentially maintained, except that the function $f(x)$ does not measure probabilities, and is therefore not restricted to be less than 1. We also need to modify the sums so as to handle a continuous random variable, so each sum becomes an integral. Therefore, the requirements for a function $f(x)$ to be a probability density function are twofold.

$f(x) \ge 0$

$\int_{-\infty}^\infty f(x) \, \mathrm{d}x = 1$
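As a minimal numerical sketch of these two requirements, the Python snippet below checks a hypothetical density $f(x) = 2x$ on $[0,1]$ (zero elsewhere), which is an illustration and not a function taken from this text. The helper names `midpoint_integral` and `looks_like_pdf` are likewise illustrative, and the check is only approximate, since it samples the function on a grid rather than proving anything.

```python
def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def looks_like_pdf(f, a, b, tol=1e-6, n=100_000):
    """Numerically check f >= 0 on a grid and total area close to 1.

    Assumes f is zero outside [a, b], so the integral over the whole
    real line reduces to an integral over [a, b].
    """
    h = (b - a) / n
    nonnegative = all(f(a + i * h) >= 0 for i in range(n + 1))
    area = midpoint_integral(f, a, b, n)
    return nonnegative and abs(area - 1.0) < tol


def f(x):
    # Hypothetical example density: 2x on [0, 1], zero elsewhere.
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


print(looks_like_pdf(f, 0.0, 1.0))
print(f(1.0))  # a density value above 1 is allowed
```

The second printed value illustrates that a density may exceed 1 at a point, as long as the total area is still 1.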

In the discrete case, the cumulative distribution function was given by $\sum\limits_{t \le x} P(t)$. Since the move to the continuous case involves changing the sum to an integral, the cumulative distribution function $F_X(x)$ is given by the following formula.

$F_X (x) = \int_{-\infty}^x f(t) \, \mathrm{d}t$

Formulas for expected value and variance are also easily transformed by changing the sums to integrals.

$E(X) = \int_{-\infty}^{\infty} x f(x) \, \mathrm{d}x$

$Var(X) = \int_{-\infty}^{\infty} x^2 f(x) \, \mathrm{d}x - (E(X))^2$

And as before, the standard deviation is the square root of the variance. That is, $\sigma = \sqrt{ Var(X)}$.

The median of a continuous PDF will be located so that half of the area is to the left of the median, and half is to the right. In other words, the equation $\int_{-\infty}^x f(t) \, \mathrm{d}t = \dfrac12$ must be solved for $x$ to find the median. Since the integral is the cdf, this is equivalent to solving $F(x) = \dfrac12$ for $x$.

The mode of a continuous PDF is the value of $x$ at which the function $f(x)$ attains its maximum. Therefore, examining the critical values of the function, along with any endpoints of its domain, will be necessary to determine if a mode exists.

A Polynomial as a Continuous PDF

Suppose $f(x)=\dfrac{4x^3+8x}{117}$ on the interval $[0,3]$, and is equal to zero elsewhere. We shall investigate several probability questions related to this function.

First, let us verify that $f(x)$ meets the requirements to be a PDF. We observe that for values of $x$ in the interval $[0,3]$, the function is in fact nonnegative. Computing the integral, we obtain

$\int_{-\infty}^\infty f(x) \, \mathrm{d}x = \int_0^3 \left(\dfrac{4}{117} x^3 + \dfrac{8}{117}x \right) \, \mathrm{d}x = \left. \dfrac{1}{117} x^4 + \dfrac{4}{117} x^2 \right|_0^3 = \dfrac{81}{117} + \dfrac{36}{117} = 1$
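The same verification can be sketched with exact rational arithmetic. The snippet below is one possible check, not a prescribed method: the coefficient list `F_COEFFS` and the helper `integrate_poly` are names introduced here for illustration, and the code assumes the density is the polynomial above on $[0,3]$ and zero elsewhere.

```python
from fractions import Fraction

# Coefficients of f(x) = (4x^3 + 8x)/117 on [0, 3], lowest power first.
F_COEFFS = [Fraction(0), Fraction(8, 117), Fraction(0), Fraction(4, 117)]


def integrate_poly(coeffs, a, b):
    """Exactly integrate sum(c_k * x**k) over [a, b] using antiderivatives."""
    a, b = Fraction(a), Fraction(b)
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(coeffs))


total_area = integrate_poly(F_COEFFS, 0, 3)
print(total_area)  # 1
```

Nonnegativity needs no computation here, since every coefficient is nonnegative and $x \ge 0$ on the interval.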

Since $f(x)$ is a polynomial, the cumulative distribution function is easily computed.

$F(x)= \int_{-\infty}^x f(t) \, \mathrm{d}t = \int_0^x \left(\dfrac{4}{117} t^3 + \dfrac{8}{117}t \right) \, \mathrm{d}t = \left. \dfrac{1}{117} t^4 + \dfrac{4}{117} t^2 \right|_0^x = \dfrac{x^4 + 4x^2}{117}$

Usually, probabilities are most easily computed from the cdf. Here are some examples.

\begin{align} P(x \le 2) &= F(2) = \dfrac{2^4 + 4(2^2)}{117} = \dfrac{32}{117} \\ P(x \ge 1) &= 1 - F(1) = 1 - \dfrac{1^4 + 4(1^2)}{117} = \dfrac{112}{117} \\ P(1 \le x \le 2) &= F(2) - F(1) = \dfrac{32}{117} - \dfrac{5}{117} = \dfrac{27}{117} \\ P(x \le 2 | x \ge 1) &= \dfrac{P(x \le 2 \cap x \ge 1)}{P(x\ge 1)} = \dfrac{ P(1 \le x \le 2)}{P(x\ge 1)} = \dfrac{ \frac{27}{117}}{ \frac{112}{117}} = \dfrac{27}{112} \\ P(x = 2) &= \int_2^2 f(x) \, \mathrm{d}x = 0 \end{align}
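These probabilities can be reproduced with a short script. The following is a sketch that uses exact fractions and assumes the CDF is 0 to the left of the interval $[0,3]$ and 1 to the right of it. The function name `cdf` and the variable names are illustrative only.

```python
from fractions import Fraction


def cdf(x):
    """CDF of the example: F(x) = (x^4 + 4x^2)/117 on [0, 3]."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 3:
        return Fraction(1)
    return (x ** 4 + 4 * x ** 2) / 117


p_le_2 = cdf(2)                    # P(X <= 2)
p_ge_1 = 1 - cdf(1)                # P(X >= 1)
p_1_to_2 = cdf(2) - cdf(1)         # P(1 <= X <= 2)
p_cond = p_1_to_2 / p_ge_1         # P(X <= 2 | X >= 1)

print(p_le_2, p_ge_1, p_1_to_2, p_cond)  # 32/117 112/117 3/13 27/112
```

Note that `Fraction` reduces automatically, so $\dfrac{27}{117}$ is printed as $\dfrac{3}{13}$.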

Notice that for a continuous PDF, the probability that $x$ takes on any single value will always be zero. This is equivalent to saying that the vertical line at the value $x$ always has area zero. Another consequence of this fact is that $P(x \le a) = P(x \lt a)$ for any continuous PDF and any value $a$.

The mean, variance, and standard deviation are also easily found.

\begin{align} E(X) &= \int_{-\infty}^\infty x f(x) \, \mathrm{d}x = \int_0^3 \left(\dfrac{4}{117} x^4 + \dfrac{8}{117}x^2 \right) \, \mathrm{d}x = \left. \dfrac{4}{585} x^5 + \dfrac{8}{351} x^3 \right|_0^3 = \dfrac{108}{65} + \dfrac{8}{13} = \dfrac{148}{65} \approx 2.2769 \\ Var(X) &= \int_{-\infty}^{\infty} x^2 f(x) \, \mathrm{d}x - (E(X))^2 = \int_0^3 \left(\dfrac{4}{117} x^5 + \dfrac{8}{117}x^3 \right) \, \mathrm{d}x - (E(X))^2 \\ &= \left. \dfrac{2}{351} x^6 + \dfrac{2}{117} x^4 \right|_0^3 - \left( \dfrac{148}{65} \right)^2 = \dfrac{162}{39} + \dfrac{18}{13} - \dfrac{21904}{4225} = \dfrac{1496}{4225} \approx 0.3541 \\ \sigma &= \sqrt{Var(X)} = \sqrt{ \dfrac{1496}{4225}} = \dfrac{2}{65} \sqrt{374} \approx 0.5950 \end{align}
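As a cross-check on the arithmetic, the moments can be computed exactly for a polynomial density. The snippet below is a sketch under the assumption that the density is $\dfrac{4x^3+8x}{117}$ on $[0,3]$ and zero elsewhere. The names `F_COEFFS` and `moment` are introduced here for illustration.

```python
from fractions import Fraction
import math

# f(x) = (4x^3 + 8x)/117 on [0, 3], coefficients listed lowest power first.
F_COEFFS = [Fraction(0), Fraction(8, 117), Fraction(0), Fraction(4, 117)]


def moment(coeffs, power, a, b):
    """Exactly integrate x**power * f(x) over [a, b] for polynomial f."""
    a, b = Fraction(a), Fraction(b)
    total = Fraction(0)
    for k, c in enumerate(coeffs):
        n = k + power + 1  # exponent after integrating x**(k + power)
        total += c * (b ** n - a ** n) / n
    return total


mean = moment(F_COEFFS, 1, 0, 3)
variance = moment(F_COEFFS, 2, 0, 3) - mean ** 2
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 4))  # 148/65 1496/4225 0.595
```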

To find the median, we set the cdf equal to $\dfrac12$, and solve the resulting equation. This leads to the equation $x^4 + 4x^2 - \dfrac{117}{2} = 0$, which is quadratic in $x^2$. Using the quadratic formula, we obtain $x^2 = -2+\dfrac52 \sqrt{10}$, and therefore the median occurs at $x = \sqrt{ -2+\dfrac52 \sqrt{10}} \approx 2.4302$.
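When the equation $F(x) = \dfrac12$ cannot be solved by hand, a numerical root finder does the same job. The sketch below uses bisection, which is one reasonable choice and relies on $F$ being increasing on $[0,3]$. It then compares the result with the closed form found above. The function names are illustrative.

```python
import math


def cdf(x):
    """CDF of the example on [0, 3]: F(x) = (x^4 + 4x^2)/117."""
    return (x ** 4 + 4 * x ** 2) / 117


def median_by_bisection(lo=0.0, hi=3.0, tol=1e-10):
    """Solve F(x) = 1/2 on [lo, hi]; works because F is increasing there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


numeric = median_by_bisection()
closed_form = math.sqrt(-2 + 2.5 * math.sqrt(10))
print(round(numeric, 4), round(closed_form, 4))  # 2.4302 2.4302
```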

The mode occurs where $f(x)$ attains its maximum value. Since our function is zero everywhere except the interval $[0,3]$, where it is nonnegative, it is sufficient for us to consider the values of the function on that closed interval. To find the critical values, we take the derivative of $f(x)$ to obtain $f'(x) = \dfrac{12x^2+8}{117}$. Since $12x^2+8$ is always positive, the derivative is never zero, so $f(x)$ has no critical values, and the maximum value must occur at an endpoint of the interval. Evaluating, we find $f(0) = 0$ and $f(3) = \dfrac{132}{117}$. Therefore, the mode is 3.
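A grid search gives a quick sanity check of this endpoint argument. The snippet below is only a sketch: it samples the density at 1001 evenly spaced points in $[0,3]$ (an arbitrary choice), so it can suggest where the maximum lies but does not replace the calculus. The names `pdf`, `pdf_derivative`, and `grid` are illustrative.

```python
def pdf(x):
    """Example density: (4x^3 + 8x)/117 on [0, 3], zero elsewhere."""
    return (4 * x ** 3 + 8 * x) / 117 if 0 <= x <= 3 else 0.0


def pdf_derivative(x):
    """Derivative of the density on the open interval (0, 3)."""
    return (12 * x ** 2 + 8) / 117


# Sample the closed interval [0, 3] on an evenly spaced grid.
grid = [3 * i / 1000 for i in range(1001)]

always_increasing = all(pdf_derivative(x) > 0 for x in grid)
mode = max(grid, key=pdf)

print(always_increasing, mode, pdf(mode))
```

The density value at the mode is greater than 1, which is permitted because $f(x)$ is a density and not a probability.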

Finding the PDF from the CDF

Suppose a continuous CDF is given by $F(x) = \dfrac{3x^2+5x-22}{78}$ on the interval $[2,5]$, with $F(x)=0$ for $x \lt 2$ and $F(x)=1$ for $x \gt 5$. What is the PDF?

Since we can integrate a PDF to obtain a CDF, we can invert the process by finding a derivative. Therefore, on the interval $[2,5]$, $f(x) = \dfrac{d}{dx} \dfrac{3x^2+5x-22}{78} = \dfrac{6x+5}{78}$, and $f(x)$ is zero elsewhere. This result is easily shown to meet the requirements of being a PDF. It is positive on $[2,5]$, and since we already have its integral, the total area is $F(5)-F(2)=1-0=1$.
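The relationship between the two functions can also be checked numerically. The sketch below compares a central difference quotient of the CDF with the derivative $\dfrac{6x+5}{78}$ at a few sample points inside $[2,5]$. The step size `h` and the sample points are arbitrary choices, and the function names are illustrative.

```python
def cdf(x):
    """Given CDF on [2, 5]: F(x) = (3x^2 + 5x - 22)/78."""
    return (3 * x ** 2 + 5 * x - 22) / 78


def pdf(x):
    """Derivative of the CDF on [2, 5]: f(x) = (6x + 5)/78."""
    return (6 * x + 5) / 78


def central_difference(g, x, h=1e-5):
    """Approximate g'(x) with a symmetric difference quotient."""
    return (g(x + h) - g(x - h)) / (2 * h)


max_error = max(abs(central_difference(cdf, x) - pdf(x))
                for x in [2.5, 3.0, 3.5, 4.0, 4.5])

print(cdf(2), cdf(5), max_error < 1e-8)  # 0.0 1.0 True
```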

A PDF from an Arbitrary Function

Suppose we want a PDF whose shape resembles a cosine function. Now $\cos x$ will not satisfy either of the two requirements to be a PDF. To make the function nonnegative, we can add a suitable constant, so we will use the function $(1+\cos x)$.

To be a PDF, our function must have area 1. If we allow $x$ to range over all real numbers, this would not be possible, since any positive area repeated infinitely many times will be infinite. Therefore, this function needs to be restricted to a smaller domain (or equivalently, the values of the function on large portions of the domain need to be redefined to be zero). So let us define the function $(1+\cos x)$ on the interval $[0, 2\pi]$, and zero elsewhere. This function still does not have area 1, but it does have finite area, which we now compute.

$\int_0^{2 \pi} (1 + \cos x) \, \mathrm{d}x = \left. (x + \sin x) \right|_0^{2\pi} = 2\pi$

Now we can transform our function into a PDF by dividing the function by this area. Therefore, the function $f(x) = \dfrac{1}{2\pi} (1+\cos x)$ on the interval $[0, 2\pi]$ is a PDF.
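The same normalization recipe applies when the area has no convenient closed form. As a hedged sketch, the snippet below approximates the area of $(1+\cos x)$ on $[0,2\pi]$ with a midpoint rule and divides by it. The helper `midpoint_integral` and the number of subintervals are assumptions made for illustration.

```python
import math


def midpoint_integral(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def raw_shape(x):
    """Unnormalized shape 1 + cos(x), used only on [0, 2*pi]."""
    return 1.0 + math.cos(x)


area = midpoint_integral(raw_shape, 0.0, 2 * math.pi)


def pdf(x):
    """Normalized density: the raw shape divided by its total area."""
    return raw_shape(x) / area if 0.0 <= x <= 2 * math.pi else 0.0


print(area, midpoint_integral(pdf, 0.0, 2 * math.pi))
```

The first printed value should be close to $2\pi$, and the second close to 1.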

We should note that each choice we made had an effect on the probabilities that will be given by our result. If we increase the size of the constant that was added, the normalized density becomes flatter: probabilities of intervals near the peaks of the cosine decrease, probabilities of intervals near the trough increase, and eventually intervals of equal length have approximately equal probabilities. We could have opted to not add any constant, and just redefined those negative values to be zero instead. And if we chose a different interval besides $[0,2\pi]$, we could have seen more of the periodicity of the cosine shape, but at the same time we would have changed the probability of any given interval.
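The effect of the added constant can be explored numerically. The sketch below assumes the family $f_c(x) = \dfrac{c+\cos x}{2\pi c}$ on $[0,2\pi]$ with $c \ge 1$, a parameterization introduced here for illustration, whose CDF is $\dfrac{cx+\sin x}{2\pi c}$. It compares two intervals of equal length, one next to a peak of the cosine and one leading into the trough.

```python
import math


def cdf(x, c):
    """CDF of f(x) = (c + cos x) / (2*pi*c) on [0, 2*pi], for c >= 1."""
    return (c * x + math.sin(x)) / (2 * math.pi * c)


def interval_prob(a, b, c):
    """P(a <= X <= b) for 0 <= a <= b <= 2*pi."""
    return cdf(b, c) - cdf(a, c)


for c in (1, 2, 10, 100):
    near_peak = interval_prob(0.0, math.pi / 2, c)        # next to x = 0
    toward_trough = interval_prob(math.pi / 2, math.pi, c)  # ends at x = pi
    print(c, round(near_peak, 4), round(toward_trough, 4))
```

As $c$ grows, the two probabilities approach the common value $\dfrac14$ from opposite sides, which is the value a uniform density on $[0,2\pi]$ would assign to any interval of length $\dfrac{\pi}{2}$.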