F Distributions

The F-distribution (the F honors the statistician Ronald Fisher) is the distribution of a ratio of two scaled sample variances, and it is used to compare the variances of two populations. It is heavily used in the analysis of variance (ANOVA). For samples drawn from normal populations, each sample variance is proportional to a chi-square random variable, so the F-distribution arises from the ratio of two chi-square random variables. Since each chi-square distribution has its degrees of freedom as a parameter, the F-distribution has two parameters.

The Formulas

If the random variable $X$ has an F-distribution over the interval $[0, \infty)$, with $m$ numerator and $n$ denominator degrees of freedom, then the PDF of the distribution for $x > 0$ is given by the following formula.

$f_X(x) = \dfrac{ \Gamma\left(\dfrac{m+n}{2}\right) \left(\dfrac{m}{n}\right)^{m/2} x^{m/2-1}}{ \Gamma\left(\dfrac{m}{2}\right) \Gamma\left(\dfrac{n}{2}\right) \left(1 + \dfrac{m}{n} x\right)^{(m+n)/2}}$

Note that the two degrees-of-freedom parameters are not interchangeable. That is, the F-distribution with 3 and 5 degrees of freedom is different from the F-distribution with 5 and 3 degrees of freedom.
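As a rough illustration, the PDF above can be evaluated directly with the standard library. The function name `f_pdf` is a hypothetical helper chosen for this sketch, and the density is simply taken to be 0 at $x \le 0$, which is only accurate at the boundary when $m > 2$. Evaluating at the same point with the parameters swapped shows that the order matters.

```python
import math

def f_pdf(x, m, n):
    """PDF of the F-distribution with m and n degrees of freedom (hypothetical helper)."""
    if x <= 0:
        # Simplifying assumption: treat the density as 0 at and below the boundary.
        return 0.0
    # Use log-gamma so that large degrees of freedom do not overflow.
    log_const = (math.lgamma((m + n) / 2) - math.lgamma(m / 2) - math.lgamma(n / 2)
                 + (m / 2) * math.log(m / n))
    log_kernel = (m / 2 - 1) * math.log(x) - ((m + n) / 2) * math.log1p(m * x / n)
    return math.exp(log_const + log_kernel)

# Same point, parameters swapped: the two densities differ.
print(f_pdf(3.0, 3, 5))
print(f_pdf(3.0, 5, 3))
```

A convenient sanity check is the case $m = n = 2$, where the formula reduces to $1/(1+x)^2$.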

In practice, probabilities are found either from tables of the CDF of the F-distribution or with available technology.

Computing with the F-Distribution

Probabilities under the F-distribution curve depend not only on the endpoints of the interval, but also on the two degrees of freedom parameters. The Texas Instruments calculator syntax for the CDF is $\operatorname{Fcdf}(x_1,x_2,m,n)$, where $x_1$ and $x_2$ are the lower and upper endpoints of the interval.

For the F-distribution with 7 and 12 degrees of freedom, the area under the curve between 2.5 and 7.3 is $P(2.5 < X < 7.3) = \operatorname{Fcdf}(2.5,7.3,7,12) = 0.0767$.
For the F-distribution with 9 and 5 degrees of freedom, the area under the curve to the left of 6 is $P(X < 6) = \operatorname{Fcdf}(0,6, 9,5) = 0.9687$.
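Without a calculator, the same areas can be approximated by integrating the PDF numerically. The sketch below uses composite Simpson's rule, which is a choice made here for simplicity and not necessarily what the calculator does internally. The names `f_pdf` and `f_cdf` are hypothetical helpers, with `f_cdf` taking its arguments in the same order as the calculator syntax.

```python
import math

def f_pdf(x, m, n):
    """PDF of the F-distribution with m and n degrees of freedom (hypothetical helper)."""
    if x <= 0:
        # Simplifying assumption: density taken as 0 at the boundary (fine for m > 2).
        return 0.0
    log_const = (math.lgamma((m + n) / 2) - math.lgamma(m / 2) - math.lgamma(n / 2)
                 + (m / 2) * math.log(m / n))
    log_kernel = (m / 2 - 1) * math.log(x) - ((m + n) / 2) * math.log1p(m * x / n)
    return math.exp(log_const + log_kernel)

def f_cdf(lower, upper, m, n, steps=2000):
    """Approximate P(lower < X < upper) by composite Simpson's rule (steps must be even)."""
    h = (upper - lower) / steps
    total = f_pdf(lower, m, n) + f_pdf(upper, m, n)
    for i in range(1, steps):
        # Simpson weights alternate 4, 2, 4, ... on the interior points.
        weight = 4 if i % 2 else 2
        total += weight * f_pdf(lower + i * h, m, n)
    return total * h / 3

# These should land close to the calculator values quoted in the text.
print(round(f_cdf(2.5, 7.3, 7, 12), 4))
print(round(f_cdf(0, 6, 9, 5), 4))
```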

Quotient Distributions

Suppose that $X$ and $Y$ are independent random variables, with PDFs $f_X(x)$ and $f_Y(y)$. Suppose also that the random variable $X$ is positive with probability 1, so that the quotient is well defined. We shall consider the random variable defined by the quotient $Z = \dfrac{Y}{X}$. Since $Z$ depends on both $X$ and $Y$, we must also consider the joint distribution of $X$ and $Y$, whose PDF will be represented by $f_{XY}(x,y)$.

The event $Z \le z$ corresponds to a region of the $xy$-plane. Note that $x \ge 0$, so only the right half-plane is required.

\begin{equation} P(Z \le z) = P\left( \dfrac{Y}{X} \le z\right) = P(Y \le zX) \end{equation}

This probability is the CDF of $Z$, obtained by integrating the joint PDF over the region described by the limits of integration.

\begin{equation} F_Z(z) = \int_0^\infty \int_{-\infty}^{zx} f_{XY}(x,y) \,\mathrm{d}y \,\mathrm{d}x \end{equation}

Using the substitution $y = xv$ (so that $\mathrm{d}y = x \,\mathrm{d}v$), the inner limits no longer depend on $x$, and we can change the order of integration.

\begin{equation} F_Z(z) = \int_0^\infty \int_{-\infty}^z x f_{XY}(x,xv) \,\mathrm{d}v \,\mathrm{d}x = \int_{-\infty}^z \int_0^\infty x f_{XY}(x,xv) \,\mathrm{d}x \,\mathrm{d}v \end{equation}

Therefore, the derivative of this quantity with respect to $z$ produces the PDF of the quotient.

\begin{equation} f_Z(z) = \int_0^\infty x f_{XY}(x, xz) \,\mathrm{d}x \end{equation}

Furthermore, with the assumption of independence of the variables $X$ and $Y$, we can rewrite the joint PDF as a product.

\begin{equation} f_Z(z) = \int_0^\infty x f_X(x) f_Y(xz) \,\mathrm{d}x \end{equation}
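As a quick check of this formula, consider an illustrative case that is not part of the derivation itself: suppose $X$ and $Y$ are independent exponential random variables with $f_X(x) = e^{-x}$ and $f_Y(y) = e^{-y}$ for $x, y \ge 0$. Then for $z \ge 0$ the PDF of the quotient $Y/X$ at $z$ is

\begin{equation} \int_0^\infty x f_X(x) f_Y(xz) \,\mathrm{d}x = \int_0^\infty x e^{-x(1+z)} \,\mathrm{d}x = \dfrac{1}{(1+z)^2} \end{equation}

which is nonnegative and integrates to 1 over $[0, \infty)$, as a PDF must.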

Derivation of the F Distribution

Let $X$ and $Y$ be independent chi-square random variables, with degrees of freedom $n$ and $m$ respectively. Define the quotient $Z = \dfrac{Y}{X}$. Then the PDF of $Z$ can be determined from the formula for the PDF of a quotient of independent random variables.

\begin{align} f_Z(z) &= \int_0^\infty x \dfrac{x^{n/2-1} e^{-x/2}}{\Gamma\left(\dfrac{n}{2}\right) 2^{n/2}} \dfrac{(xz)^{m/2-1} e^{-xz/2}}{\Gamma\left(\dfrac{m}{2}\right) 2^{m/2}} \,\mathrm{d}x \\ &= \dfrac{z^{m/2-1}}{\Gamma\left(\dfrac{m}{2}\right) \Gamma\left(\dfrac{n}{2}\right) 2^{(m+n)/2}} \int_0^\infty x^{(m+n)/2 - 1} e^{-x(z+1)/2} \,\mathrm{d}x \end{align}

Now the gamma function is defined by $\Gamma(s) = \int_0^\infty t^{s-1} e^{-t} \,\mathrm{d}t$. After using the substitution $t = x \left(\dfrac{z+1}{2} \right)$, we recognize the integral above as the value of the gamma function at $s = \dfrac{m+n}{2}$.

\begin{align} f_Z(z) &= \dfrac{z^{m/2-1}}{\Gamma\left(\dfrac{m}{2}\right) \Gamma\left(\dfrac{n}{2}\right) 2^{(m+n)/2}} \dfrac{1}{\left(\dfrac{z+1}{2}\right)^{(m+n)/2}} \int_0^\infty t^{(m+n)/2-1} e^{-t} \,\mathrm{d}t \\ &= \dfrac{ \Gamma\left(\dfrac{m+n}{2}\right) z^{m/2-1}}{ \Gamma\left(\dfrac{m}{2}\right) \Gamma\left(\dfrac{n}{2}\right) (z+1)^{(m+n)/2}} \end{align}

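The F-distribution is actually not the distribution of this quotient, but of the quotient $F = \dfrac{Y/m}{X/n} = \dfrac{n}{m} Z$. Since $Z = \dfrac{m}{n} F$, a change of variables gives the PDF of $F$. Therefore, if $X$ now denotes a random variable with an F-distribution (reusing the symbol, not the chi-square variable above), its PDF is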

\begin{align} f_X(x) &= \dfrac{m}{n} f_Z \left( \dfrac{m}{n} x\right) \\ &= \dfrac{ \dfrac{m}{n} \Gamma\left(\dfrac{m+n}{2}\right) \left( \dfrac{m}{n} x\right)^{m/2-1}}{ \Gamma\left(\dfrac{m}{2}\right) \Gamma\left(\dfrac{n}{2}\right) \left( \dfrac{m}{n} x + 1\right)^{(m+n)/2}} \\ &= \dfrac{ \Gamma\left(\dfrac{m+n}{2}\right) \left(\dfrac{m}{n}\right)^{m/2} x^{m/2-1}}{ \Gamma\left(\dfrac{m}{2}\right) \Gamma\left(\dfrac{n}{2}\right) \left(1 + \dfrac{m}{n} x\right)^{(m+n)/2}} \end{align}
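The construction used in this derivation can also be checked by simulation. The sketch below is a Monte Carlo estimate with a fixed seed, not an exact computation. It draws two independent chi-square variables, forms the scaled quotient, and estimates $P(X < 6)$ for 9 and 5 degrees of freedom. The helper `sample_f` is a hypothetical name, and the chi-square draws rely on the fact that a chi-square variable with $k$ degrees of freedom is a gamma variable with shape $k/2$ and scale 2.

```python
import random

def sample_f(m, n, rng):
    """Draw one F(m, n) value as a scaled quotient of chi-square draws (hypothetical helper)."""
    y = rng.gammavariate(m / 2, 2.0)  # chi-square with m degrees of freedom
    x = rng.gammavariate(n / 2, 2.0)  # chi-square with n degrees of freedom
    return (y / m) / (x / n)

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 200_000
hits = sum(1 for _ in range(trials) if sample_f(9, 5, rng) < 6)
estimate = hits / trials

# Should be close to the value 0.9687 quoted earlier for Fcdf(0, 6, 9, 5).
print(estimate)
```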