
Frequency Distributions

A frequency is a count of the number of times some value of a characteristic or variable was observed or measured. Frequencies are generally obtained by first counting or measuring some characteristic in a cross-sectional study. The list of all of the results of the count or measurement is called the raw data. The raw data is then summarized in a table called a frequency distribution. Classes (sometimes called bins) describe the possible values of the quantity being studied, and a frequency is provided for each class. Whether the data was obtained from a population or a sample does not affect the construction of a frequency distribution.

A Distribution of Qualitative Data

Suppose, for example, that 150 shoppers at the Crossroads Mall were asked to consider the statement "Taxes should be increased," and to choose the response that best describes their reaction. The raw data was collected, and was summarized in the following frequency distribution.

Degree of Agreement Number of Responses
Strongly Agree 18
Somewhat Agree 22
Not Sure 30
Somewhat Disagree 35
Strongly Disagree 45

The degree of agreement is a qualitative (ordinal) variable, and the classes in this frequency distribution are the possible values of the degree of agreement. Each frequency is the number of responses that fell in that class.
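As a minimal sketch of how raw data becomes a frequency distribution, the Python below tallies a list of responses by class. The individual responses are not listed here, so the list `raw_responses` is rebuilt from the published counts purely for illustration, and all variable names are assumptions.

```python
from collections import Counter

# Classes in their natural (ordinal) order.
classes = ["Strongly Agree", "Somewhat Agree", "Not Sure",
           "Somewhat Disagree", "Strongly Disagree"]

# Hypothetical raw data: rebuilt from the published counts, because the
# individual responses are not given in the text.
published_counts = [18, 22, 30, 35, 45]
raw_responses = [c for c, n in zip(classes, published_counts) for _ in range(n)]

# Tally the raw data, then report the classes in ordinal order.
tally = Counter(raw_responses)
frequencies = {c: tally[c] for c in classes}

for c, f in frequencies.items():
    print(f"{c:<18} {f}")
print(f"{'Total':<18} {sum(frequencies.values())}")
```

Listing the classes explicitly, rather than relying on the order in which responses happen to appear, keeps the ordinal order of the variable in the finished table.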

The data can be further summarized by computing relative frequencies or percents. A relative frequency is the fraction of the whole that falls in a class, written as a decimal, so it is just the decimal form of the percent. For example, 18 of the 150 responses were "Strongly Agree," which gives a relative frequency of 18/150 = 0.12, or 12%. The relative frequency distribution is shown first, followed by the percent distribution.

Degree of Agreement Relative Frequency of Responses
Strongly Agree 0.12
Somewhat Agree 0.1467
Not Sure 0.2
Somewhat Disagree 0.2333
Strongly Disagree 0.3

Degree of Agreement Percentage of Responses
Strongly Agree 12%
Somewhat Agree 14.67%
Not Sure 20%
Somewhat Disagree 23.33%
Strongly Disagree 30%
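The arithmetic behind these two tables can be sketched in a few lines of Python. This is only an illustration: the frequencies are the survey counts given earlier, and the variable names are assumptions.

```python
# Frequencies from the shopper survey (150 responses in total).
frequencies = {
    "Strongly Agree": 18,
    "Somewhat Agree": 22,
    "Not Sure": 30,
    "Somewhat Disagree": 35,
    "Strongly Disagree": 45,
}

total = sum(frequencies.values())

# Relative frequency = class frequency / total; percent = 100 * relative frequency.
relative = {c: f / total for c, f in frequencies.items()}
percent = {c: 100 * r for c, r in relative.items()}

for c in frequencies:
    print(f"{c:<18} {relative[c]:.4f} {percent[c]:.2f}%")
```

Apart from rounding, the relative frequencies always sum to 1 and the percents to 100, which makes a quick check on the computation.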

A Distribution of Quantitative Data

Suppose the heights of 169 freshmen at Western High School were measured, and the results were summarized in the following table.

Height Number of Students
135-149 cm 23
150-164 cm 36
165-179 cm 29
180-194 cm 64
195-209 cm 17

In this frequency distribution, each class covers a range of heights. The values at each end of a class are called the class limits, so in this case the first class has lower class limit 135 cm and upper class limit 149 cm. Height is a continuous quantity, yet the upper limit of one class is not identical to the lower limit of the next class, so we infer that the data was rounded to the nearest centimeter when recorded. Two consecutive classes should not share a class limit, because a shared limit would make it ambiguous which class that value belongs to. The first class is therefore 135-149, not 135-150, which makes it clear that 150 cm belongs in the second class, not the first.

The class boundary is the value halfway between the upper class limit of one class and the lower class limit of the next class. In the example above, the class boundary between the first and second classes is 149.5 cm, which is also the value that separates the heights rounded down into the lower class from the heights rounded up into the upper class.

The class midpoint is the value halfway between the two class boundaries of a class, and the class width is the distance between those boundaries. It is good practice for all class widths in a frequency distribution to be identical. In this example, the first class has boundaries 134.5 cm and 149.5 cm, so its class midpoint is 142 cm, and every class has a width of 15 cm.
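The definitions above can be checked with a short Python sketch. It assumes that the outermost boundaries (below the first class and above the last) extend by the same half-gap as the interior ones, which is consistent with the 15 cm width and 142 cm midpoint stated for the first class. The variable names are assumptions.

```python
# Class limits (in cm) from the height table, recorded to the nearest centimeter.
limits = [(135, 149), (150, 164), (165, 179), (180, 194), (195, 209)]

# Gap between one class's upper limit and the next class's lower limit (1 cm here).
gap = limits[1][0] - limits[0][1]

rows = []
for lower, upper in limits:
    # Boundaries sit halfway across the gap on each side of the class.
    # Assumption: the first and last classes use the same half-gap.
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    width = upper_boundary - lower_boundary
    midpoint = (lower_boundary + upper_boundary) / 2
    rows.append((lower_boundary, upper_boundary, width, midpoint))
    print(f"{lower}-{upper} cm: boundaries {lower_boundary} to {upper_boundary}, "
          f"width {width}, midpoint {midpoint}")
```

Note that the width comes from the boundaries, not the limits: subtracting the limits of the first class gives 149 - 135 = 14, which understates the width by the 1 cm gap.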

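Sometimes the frequencies are accumulated as running totals and presented in a cumulative frequency distribution. Each cumulative frequency is the sum of the frequencies of that class and all of the classes before it, so the second entry is 23 + 36 = 59. The table below shows the cumulative frequency distribution for the example above. Note that each class is now open-ended at its lower extreme.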

Height Cumulative Number of Students
At most 149 cm 23
At most 164 cm 59
At most 179 cm 88
At most 194 cm 152
At most 209 cm 169
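The running totals in this table can be reproduced with `itertools.accumulate` from the Python standard library. The sketch below uses the class frequencies from the height table; the variable names are assumptions.

```python
from itertools import accumulate

# Upper class limits (cm) and class frequencies from the height table.
upper_limits = [149, 164, 179, 194, 209]
frequencies = [23, 36, 29, 64, 17]

# Each cumulative frequency is the running total up to and including that class.
cumulative = list(accumulate(frequencies))

for upper, count in zip(upper_limits, cumulative):
    print(f"At most {upper} cm  {count}")
```

The last cumulative frequency must equal the total number of observations, 169 here.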

A Distribution of Bivariate Data

In bivariate data, each observation is described by two values, one for each of two variables. The resulting frequency distribution is called a contingency table, with the classes of one variable as its rows and the classes of the other as its columns. The column at the right and the row at the bottom are referred to as the margins of the contingency table. The margins contain the totals for each category.

Suppose, for example, that 287 subway riders were surveyed as to the distance they traveled each way to work, and their gender was also recorded, with the following results.

Distance Men Women Totals
0-4.9 miles 26 18 44
5.0-9.9 miles 33 21 54
10.0-14.9 miles 35 31 66
15.0-19.9 miles 42 45 87
20.0-24.9 miles 17 19 36
Totals 153 134 287
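As an illustration of how the margins are obtained, the Python sketch below recomputes the row and column totals from the interior cells of the table. The dictionary layout and the variable names are assumptions made for this example.

```python
# Interior cells of the contingency table: distance class -> (men, women).
counts = {
    "0-4.9 miles": (26, 18),
    "5.0-9.9 miles": (33, 21),
    "10.0-14.9 miles": (35, 31),
    "15.0-19.9 miles": (42, 45),
    "20.0-24.9 miles": (17, 19),
}

# Right-hand margin: one total per distance class.
row_totals = {d: men + women for d, (men, women) in counts.items()}

# Bottom margin: one total per gender, plus the grand total.
men_total = sum(men for men, _ in counts.values())
women_total = sum(women for _, women in counts.values())
grand_total = men_total + women_total

for d, (men, women) in counts.items():
    print(f"{d:<16} {men:>4} {women:>4} {row_totals[d]:>4}")
print(f"{'Totals':<16} {men_total:>4} {women_total:>4} {grand_total:>4}")
```

Both margins must add up to the same grand total, so comparing the sum of the row totals with the sum of the column totals is a simple consistency check on the table.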