
Joint Continuous Probability Distributions

The joint continuous distribution is the continuous analogue of a joint discrete distribution. For that reason, the underlying concepts carry over directly, and the formulas are the continuous counterparts of the discrete formulas, with sums replaced by integrals.

Most often, the PDF of a joint distribution of two continuous random variables is given as a function of two variables, one for each random variable.

Formulas

Suppose the PDF of a joint distribution of the random variables $X$ and $Y$ is given by $f_{XY}(x,y)$. As with all continuous distributions, two requirements must hold: the density must be nonnegative at each ordered pair   $(x,y)$   in the domain of $f_{XY}$, and it must integrate to 1 over that entire domain.

$f_{XY}(x,y) \ge 0$
$\int\limits_x \int\limits_y f_{XY}(x,y) \, \mathrm{d}y \, \mathrm{d}x = 1$

Then the marginal PDFs $f_X (x)$ and $f_Y(y)$, the expected values $E(X)$ and $E(Y)$, and the variances $Var(X)$ and $Var(Y)$ can be found by the following formulas.

\begin{align} f_X (x) &= \int\limits_y f_{XY}(x,y) \, \mathrm{d}y \\ f_Y (y) &= \int\limits_x f_{XY}(x,y) \, \mathrm{d}x \\ E(X) &= \int\limits_x x f_X (x) \, \mathrm{d}x \\ E(Y) &= \int\limits_y y f_Y (y) \, \mathrm{d}y \\ Var(X) &= \int\limits_x x^2 f_X (x) \, \mathrm{d}x - (E(X))^2 \\ Var(Y) &= \int\limits_y y^2 f_Y (y) \, \mathrm{d}y - (E(Y))^2 \end{align}

As always, the standard deviations $\sigma_X$ and $\sigma_Y$ are the square roots of their respective variances.

To measure any relationship between two random variables, we use the covariance, defined by the following formula.

$Cov(X,Y) = \int\limits_x \int\limits_y xy f_{XY} (x,y) \, \mathrm{d}y \, \mathrm{d}x - E(X)E(Y)$

The correlation has the same definition,   $\rho_{XY} = \dfrac{Cov(X,Y)}{\sigma_X \sigma_Y}$,   and the same interpretation as for joint discrete distributions.

An Example

A college professor wants to learn if there is a relationship between the time taken to turn in homework and the portion of the homework that is completed. Using $X$ as the number of weeks after being distributed that an assignment is turned in, and $Y$ as the fraction of the assignment that is completed (so that   $Y = 0.65$   means 65% complete), he finds that the joint PDF is   $f_{XY}(x,y) = \dfrac{9}{10} xy^2 + \dfrac15$,   when   $0 \le x \le 2$   and   $0 \le y \le 1$.

First, we shall verify that this function meets the requirements to be a continuous PDF. For nonnegative values of $x$ and $y$, the function will satisfy   $f_{XY}(x,y) \ge 0$. As for the integral, we have:

\begin{align} \int_x \int_y f_{XY}(x,y) \, \mathrm{d}y \, \mathrm{d}x &= \int_0^2 \int_0^1 \left( \dfrac{9}{10} xy^2 + \dfrac15 \right) \, \mathrm{d}y \, \mathrm{d}x \\ &= \int_0^2 \left[ \dfrac{3}{10}xy^3 + \dfrac15 y \right]_0^1 \, \mathrm{d}x \\ &= \int_0^2 \left( \dfrac{3}{10} x + \dfrac15 \right) \, \mathrm{d}x \\ &= \left[ \dfrac{3}{20} x^2 + \dfrac15 x \right]_0^2 \\ &= \dfrac{12}{20} + \dfrac25 - 0 - 0 \\ &= 1 \end{align}
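As a numerical cross-check of this integral, the following minimal Python sketch approximates the double integral with a midpoint rule. The helper names `f_xy` and `midpoint_2d` and the grid size `n=200` are illustrative assumptions, not part of the original example, and the result is only an approximation of the exact value.

```python
def f_xy(x, y):
    """Joint PDF from the example, valid for 0 <= x <= 2 and 0 <= y <= 1."""
    return 0.9 * x * y**2 + 0.2


def midpoint_2d(g, x_lo, x_hi, y_lo, y_hi, n=200):
    """Approximate the double integral of g over a rectangle (midpoint rule)."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx      # midpoint of the i-th x-strip
        for j in range(n):
            y = y_lo + (j + 0.5) * hy  # midpoint of the j-th y-strip
            total += g(x, y)
    return total * hx * hy


# Integrate the density over its whole domain; the result should be close to 1.
total_probability = midpoint_2d(f_xy, 0.0, 2.0, 0.0, 1.0)
print(total_probability)
```

A finer grid (larger `n`) brings the approximation closer to 1, at the cost of more function evaluations.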

The marginal density functions (or marginal PDFs) are found by integrating over the variable to be removed from consideration.

\begin{align} f_X (x) &= \int_0^1 \left( \dfrac{9}{10} xy^2 + \dfrac15 \right) \, \mathrm{d}y \\ &= \left[ \dfrac{3}{10} xy^3 + \dfrac15 y \right]_0^1 \\ &= \dfrac{3}{10} x + \dfrac15 \end{align} \begin{align} f_Y (y) &= \int_0^2 \left( \dfrac{9}{10} xy^2 + \dfrac15 \right) \, \mathrm{d}x \\ &= \left[ \dfrac{9}{20} x^2 y^2 + \dfrac15 x \right]_0^2 \\ &= \dfrac95 y^2 + \dfrac25 \end{align}
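The idea of integrating out the unwanted variable can also be sketched numerically. The Python snippet below is a rough check under stated assumptions: the helper names (`midpoint_1d`, `marginal_x`, `marginal_y`) and the test points `x0` and `y0` are chosen purely for illustration. It compares the numerically integrated marginals with the closed forms derived above.

```python
def f_xy(x, y):
    """Joint PDF from the example, valid for 0 <= x <= 2 and 0 <= y <= 1."""
    return 0.9 * x * y**2 + 0.2


def midpoint_1d(g, lo, hi, n=1000):
    """Approximate the integral of g from lo to hi with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def marginal_x(x):
    """f_X(x): integrate the joint PDF over all y in [0, 1]."""
    return midpoint_1d(lambda y: f_xy(x, y), 0.0, 1.0)


def marginal_y(y):
    """f_Y(y): integrate the joint PDF over all x in [0, 2]."""
    return midpoint_1d(lambda x: f_xy(x, y), 0.0, 2.0)


x0, y0 = 1.5, 0.5  # arbitrary illustrative points inside the domain
print(marginal_x(x0), 0.3 * x0 + 0.2)      # numeric vs. closed-form f_X
print(marginal_y(y0), 1.8 * y0**2 + 0.4)   # numeric vs. closed-form f_Y
```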

With these formulas, we can obtain probabilities. The probability that a student will turn in the assignment less than half of a week after it is assigned is given by

\begin{align} P(X < 0.5) &= \int_0^{0.5} f_X(x) \, \mathrm{d}x \\ &= \int_0^{0.5} \left( \dfrac{3}{10} x + \dfrac15 \right) \, \mathrm{d}x \\ &= \left[ \dfrac{3}{20}x^2 + \dfrac15 x \right]_0^{0.5} \\ &= 0.0375 + 0.1 \\ &= 0.1375 \end{align}

The probability that an assignment will be less than 40% completed when it is turned in is given by

\begin{align} P(Y < 0.4) &= \int_0^{0.4} f_Y(y) \, \mathrm{d}y \\ &= \int_0^{0.4} \left( \dfrac95 y^2 + \dfrac25 \right) \, \mathrm{d}y \\ &= \left[ \dfrac35 y^3 + \dfrac25 y \right]_0^{0.4} \\ &= 0.0384 + 0.16 \\ &= 0.1984 \end{align}

The probability that a randomly selected student will turn in an assignment in less than one week with more than half of the assignment completed is given by

\begin{align} P(X < 1, Y > 0.5) &= \int_0^1 \int_{0.5}^1 f_{XY}(x,y) \, \mathrm{d}y \, \mathrm{d}x \\ &= \int_0^1 \int_{0.5}^1 \left( \dfrac{9}{10} xy^2 + \dfrac15 \right) \, \mathrm{d}y \, \mathrm{d}x \\ &= \int_0^1 \left[ \dfrac{3}{10} xy^3 + \dfrac15 y \right]_{0.5}^1 \, \mathrm{d}x \\ &= \int_0^1 \left( \dfrac{21}{80} x + \dfrac{1}{10} \right) \, \mathrm{d}x \\ &= \left[ \dfrac{21}{160} x^2 + \dfrac{1}{10} x \right]_0^1 \\ &= 0.13125 + 0.1 \\ &= 0.23125 \end{align}
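Each of the three probabilities above is the integral of the joint PDF over a rectangle, so they can all be approximated with one routine. The following is a minimal sketch, assuming a simple midpoint rule; the function and variable names (`midpoint_2d`, `p_early`, `p_incomplete`, `p_both`) are hypothetical and chosen only for this illustration.

```python
def f_xy(x, y):
    """Joint PDF from the example, valid for 0 <= x <= 2 and 0 <= y <= 1."""
    return 0.9 * x * y**2 + 0.2


def midpoint_2d(g, x_lo, x_hi, y_lo, y_hi, n=200):
    """Approximate the double integral of g over a rectangle (midpoint rule)."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy


# A marginal event such as X < 0.5 places no restriction on Y,
# so Y ranges over its whole interval [0, 1] (and likewise for X).
p_early = midpoint_2d(f_xy, 0.0, 0.5, 0.0, 1.0)       # P(X < 0.5)
p_incomplete = midpoint_2d(f_xy, 0.0, 2.0, 0.0, 0.4)  # P(Y < 0.4)
p_both = midpoint_2d(f_xy, 0.0, 1.0, 0.5, 1.0)        # P(X < 1, Y > 0.5)
print(p_early, p_incomplete, p_both)
```

The printed values should agree with 0.1375, 0.1984, and 0.23125 to several decimal places.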

The expected value (or mean) of each random variable can be found by use of the formulas.

\begin{align} E(X) &= \int_x x f_X (x) \, \mathrm{d}x \\ &= \int_0^2 x \left( \dfrac{3}{10} x + \dfrac15 \right) \, \mathrm{d}x \\ &= \int_0^2 \left( \dfrac{3}{10} x^2 + \dfrac15 x \right) \, \mathrm{d}x \\ &= \left[ \dfrac{1}{10} x^3 + \dfrac{1}{10} x^2 \right]_0^2 \\ &= \dfrac{8}{10} + \dfrac{4}{10} - 0 - 0 \\ &= \dfrac65 = 1.2 \end{align} \begin{align} E(Y) &= \int_y y f_Y (y) \, \mathrm{d}y \\ &= \int_0^1 y \left( \dfrac95 y^2 + \dfrac25 \right) \, \mathrm{d}y \\ &= \int_0^1 \left( \dfrac95 y^3 + \dfrac25 y \right) \, \mathrm{d}y \\ &= \left[ \dfrac{9}{20} y^4 + \dfrac15 y^2 \right]_0^1 \\ &= \dfrac{9}{20} + \dfrac{1}{5} \\ &= \dfrac{13}{20} = 0.65 \end{align}

Therefore, students are turning in the assignment after 1.2 weeks on average, and the assignments are 65% complete on average. Or in other words, if a student is randomly selected, we could expect them to turn in a paper after 1.2 weeks, and that paper would be 65% complete.
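Because the marginal PDFs here are polynomials, the expected values can be reproduced exactly with rational arithmetic. The sketch below is one possible way to do so, assuming each polynomial is stored as a list of coefficients with the constant term first; the helpers `integrate_poly` and `times_power` are illustrative names, not an established API.

```python
from fractions import Fraction as F


def integrate_poly(coeffs, lo, hi):
    """Exact integral of sum(coeffs[k] * t**k) from lo to hi."""
    return sum(
        c * (F(hi) ** (k + 1) - F(lo) ** (k + 1)) / (k + 1)
        for k, c in enumerate(coeffs)
    )


def times_power(coeffs, p):
    """Multiply a polynomial by t**p by shifting its coefficients."""
    return [F(0)] * p + list(coeffs)


f_x = [F(1, 5), F(3, 10)]        # f_X(x) = 1/5 + (3/10) x   on [0, 2]
f_y = [F(2, 5), F(0), F(9, 5)]   # f_Y(y) = 2/5 + (9/5) y^2  on [0, 1]

e_x = integrate_poly(times_power(f_x, 1), 0, 2)  # E(X) = integral of x f_X(x)
e_y = integrate_poly(times_power(f_y, 1), 0, 1)  # E(Y) = integral of y f_Y(y)
print(e_x, e_y)  # 6/5 13/20
```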

We can also use the formulas to compute the variance and standard deviation of each random variable.

\begin{align} Var(X) &= \int_x x^2 f_X (x) \, \mathrm{d}x - (E(X))^2 \\ &= \int_0^2 x^2 \left( \dfrac{3}{10} x + \dfrac15 \right) \, \mathrm{d}x - \left( \dfrac65 \right)^2 \\ &= \int_0^2 \left( \dfrac{3}{10} x^3 + \dfrac15 x^2 \right) \, \mathrm{d}x - \dfrac{36}{25} \\ &= \left[ \dfrac{3}{40} x^4 + \dfrac{1}{15} x^3 \right]_0^2 - \dfrac{36}{25} \\ &= \dfrac65 + \dfrac{8}{15} - \dfrac{36}{25} \\ &= \dfrac{22}{75} \approx 0.2933 \\ \sigma_X &= \sqrt{ \dfrac{22}{75} } \approx 0.5416 \end{align} \begin{align} Var(Y) &= \int_y y^2 f_Y(y) \, \mathrm{d}y - (E(Y))^2 \\ &= \int_0^1 y^2 \left( \dfrac95 y^2 + \dfrac25 \right) \, \mathrm{d}y - \left( \dfrac{13}{20} \right)^2 \\ &= \int_0^1 \left( \dfrac95 y^4 + \dfrac25 y^2 \right) \, \mathrm{d}y - \dfrac{169}{400} \\ &= \left[ \dfrac{9}{25} y^5 + \dfrac{2}{15} y^3 \right]_0^1 - \dfrac{169}{400} \\ &= \dfrac{9}{25} + \dfrac{2}{15} - \dfrac{169}{400} \\ &= \dfrac{17}{240} \approx 0.0708 \\ \sigma_Y &= \sqrt{ \dfrac{17}{240}} \approx 0.2661 \end{align}

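Interpreting these results, we find variances of 0.2933 squared weeks for $X$ and 0.0708 (in squared units of completion) for $Y$. The standard deviations are easier to interpret, since they are in the original units: 0.5416 weeks and 26.61 percentage points of completion. Each standard deviation measures the typical distance (more precisely, the root-mean-square distance) of an observation from the corresponding mean computed earlier.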
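The variance shortcut, $E(X^2) - (E(X))^2$, can be checked exactly in the same spirit. The following Python sketch assumes the marginal PDFs are stored as coefficient lists (constant term first); the helper `moment` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction as F
from math import sqrt


def integrate_poly(coeffs, lo, hi):
    """Exact integral of sum(coeffs[k] * t**k) from lo to hi."""
    return sum(
        c * (F(hi) ** (k + 1) - F(lo) ** (k + 1)) / (k + 1)
        for k, c in enumerate(coeffs)
    )


def moment(coeffs, p, lo, hi):
    """E(T**p) for a polynomial density with the given coefficients on [lo, hi]."""
    shifted = [F(0)] * p + list(coeffs)  # multiply the density by t**p
    return integrate_poly(shifted, lo, hi)


f_x = [F(1, 5), F(3, 10)]        # f_X(x) = 1/5 + (3/10) x   on [0, 2]
f_y = [F(2, 5), F(0), F(9, 5)]   # f_Y(y) = 2/5 + (9/5) y^2  on [0, 1]

# Var(T) = E(T^2) - (E(T))^2
var_x = moment(f_x, 2, 0, 2) - moment(f_x, 1, 0, 2) ** 2
var_y = moment(f_y, 2, 0, 1) - moment(f_y, 1, 0, 1) ** 2
sd_x, sd_y = sqrt(var_x), sqrt(var_y)

print(var_x, var_y)  # 22/75 17/240
print(sd_x, sd_y)    # floating-point standard deviations
```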

To obtain the strength of any relationship between these variables, we can compute the covariance and the correlation.

\begin{align} Cov(X,Y) &= \int_x \int_y xy f_{XY}(x,y) \, \mathrm{d}y \, \mathrm{d}x - E(X)E(Y) \\ &= \int_0^2 \int_0^1 xy \left( \dfrac{9}{10} xy^2 + \dfrac15 \right) \, \mathrm{d}y \, \mathrm{d}x - \left( \dfrac65 \right) \left( \dfrac{13}{20} \right) \\ &= \int_0^2 \int_0^1 \left( \dfrac{9}{10} x^2y^3 + \dfrac15 xy \right) \, \mathrm{d}y \, \mathrm{d}x - \dfrac{39}{50} \\ &= \int_0^2 \left[ \dfrac{9}{40} x^2 y^4 + \dfrac{1}{10} xy^2 \right]_0^1 \, \mathrm{d}x - \dfrac{39}{50} \\ &= \int_0^2 \left( \dfrac{9}{40} x^2 + \dfrac{1}{10} x \right) \, \mathrm{d}x - \dfrac{39}{50} \\ &= \left[ \dfrac{3}{40} x^3 + \dfrac{1}{20} x^2 \right]_0^2 - \dfrac{39}{50} \\ &= \dfrac35 + \dfrac15 - \dfrac{39}{50} \\ &= \dfrac{1}{50} = 0.02 \\ \rho_{XY} &= \dfrac{Cov(X,Y)}{\sigma_X \sigma_Y} = \dfrac{0.02}{\sqrt{22/75} \, \sqrt{17/240}} \approx 0.1387 \end{align}

The correlation between these variables is slightly positive, indicating that papers tend to be somewhat more complete the later they are turned in. However, it is a rather weak linear relationship, because the value of $\rho_{XY}$ is quite close to zero.
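The covariance and correlation can also be computed directly from the joint PDF without first deriving the marginals. The sketch below is one way to do this exactly, under the assumption that the joint PDF is a sum of monomials $c \, x^p y^q$ on a rectangle, so each double integral factors into two one-dimensional integrals. The names `TERMS`, `power_integral`, and `expect` are hypothetical.

```python
from fractions import Fraction as F
from math import sqrt

# Joint PDF f(x, y) = (9/10) x y^2 + 1/5 as (coefficient, power of x, power of y).
TERMS = [(F(9, 10), 1, 2), (F(1, 5), 0, 0)]
X_LO, X_HI, Y_LO, Y_HI = 0, 2, 0, 1


def power_integral(p, lo, hi):
    """Exact integral of t**p from lo to hi."""
    return (F(hi) ** (p + 1) - F(lo) ** (p + 1)) / (p + 1)


def expect(a, b):
    """Exact E(X**a * Y**b) under the joint PDF described by TERMS."""
    return sum(
        c * power_integral(px + a, X_LO, X_HI) * power_integral(py + b, Y_LO, Y_HI)
        for c, px, py in TERMS
    )


cov = expect(1, 1) - expect(1, 0) * expect(0, 1)   # E(XY) - E(X)E(Y)
var_x = expect(2, 0) - expect(1, 0) ** 2
var_y = expect(0, 2) - expect(0, 1) ** 2
rho = float(cov) / sqrt(var_x * var_y)

print(cov)  # 1/50
print(rho)
```

With no intermediate rounding, the correlation comes out to about 0.1387; rounding the standard deviations to four decimal places before dividing can shift the last digit.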