Frequency Distributions

Frequencies are counts of the number of times some characteristic or variable was observed or measured. They are usually obtained by counting or measuring some characteristic, for example in a cross-sectional study. The list of all the individual results of the count or measurement is called the raw data. The raw data is then summarized in a table called a frequency distribution, which groups the possible values of the quantity being studied into classes (sometimes called bins) and gives a frequency for each class. Whether the data was obtained from a population or from a sample does not affect how a frequency distribution is constructed.

A Distribution of Qualitative Data

Suppose, for example, that 150 shoppers at the Crossroads Mall were asked to consider the statement "Taxes should be increased," and to choose the response that best describes their reaction. The raw data was collected, and was summarized in the following frequency distribution.

Degree of Agreement	Number of Responses
Strongly Agree	18
Somewhat Agree	22
Not Sure	30
Somewhat Disagree	35
Strongly Disagree	45

The degree of agreement is a qualitative (ordinal) variable, and the classes in this frequency distribution are the values for the degree of agreement. The frequencies are the number of responses for each class.
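As a minimal sketch of how a frequency distribution is built from raw data, the Python below tallies a list of responses into classes. The shoppers' individual responses are not given, so `raw_responses` is hypothetical, rebuilt from the counts in the table purely for illustration; `CLASSES` and `frequency_distribution` are likewise illustrative names.

```python
from collections import Counter

# Classes in their natural (ordinal) order; a Counter alone would not preserve this order.
CLASSES = ["Strongly Agree", "Somewhat Agree", "Not Sure",
           "Somewhat Disagree", "Strongly Disagree"]

# Hypothetical raw data: one entry per shopper, reconstructed from the table's counts.
raw_responses = (["Strongly Agree"] * 18 + ["Somewhat Agree"] * 22 +
                 ["Not Sure"] * 30 + ["Somewhat Disagree"] * 35 +
                 ["Strongly Disagree"] * 45)

def frequency_distribution(raw, classes):
    """Count how many observations fall in each class, listed in the given class order."""
    counts = Counter(raw)
    # Counter returns 0 for a class that never occurs, so empty classes still appear.
    return [(c, counts[c]) for c in classes]

for degree, n in frequency_distribution(raw_responses, CLASSES):
    print(f"{degree}\t{n}")
```

Passing the class order explicitly matters for ordinal data: the table should list the degrees of agreement from strongest to weakest, not in whatever order they first appear in the raw data.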

The data can be further summarized by computing relative frequencies or percents. The relative frequency of a class is its frequency divided by the total number of observations, written as a decimal; the percent is simply the relative frequency multiplied by 100. Below, a relative frequency distribution is shown first, followed by a percent distribution.

Degree of Agreement	Relative Frequency of Responses
Strongly Agree	0.1200
Somewhat Agree	0.1467
Not Sure	0.2000
Somewhat Disagree	0.2333
Strongly Disagree	0.3000

Degree of Agreement	Percentage of Responses
Strongly Agree	12%
Somewhat Agree	14.67%
Not Sure	20%
Somewhat Disagree	23.33%
Strongly Disagree	30%
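The relative frequencies and percents above come from dividing each count by the total of 150. A minimal Python sketch of that arithmetic follows; the function names are illustrative, and rounding to four decimal places (two for percents) is a display choice made to match the tables.

```python
# Frequencies from the Crossroads Mall table, in class order.
frequencies = {
    "Strongly Agree": 18,
    "Somewhat Agree": 22,
    "Not Sure": 30,
    "Somewhat Disagree": 35,
    "Strongly Disagree": 45,
}

def relative_frequencies(freqs, places=4):
    """Divide each class frequency by the total number of observations, rounded for display."""
    total = sum(freqs.values())
    return {k: round(v / total, places) for k, v in freqs.items()}

def percents(freqs, places=2):
    """Relative frequency times 100, rounded for display."""
    total = sum(freqs.values())
    return {k: round(100 * v / total, places) for k, v in freqs.items()}

rel = relative_frequencies(frequencies)
pct = percents(frequencies)
for k in frequencies:
    print(f"{k}\t{rel[k]}\t{pct[k]}%")
```

For example, Somewhat Agree is 22/150 = 0.14666..., which rounds to 0.1467, or 14.67%.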

A Distribution of Quantitative Data

Suppose the heights of 169 freshmen at Western High School were found, and the results were provided in the following table.

Height	Number of Students
135-149 cm	23
150-164 cm	36
165-179 cm	29
180-194 cm	64
195-209 cm	17

In this frequency distribution, each class covers a range of heights. The values at each end of a class are called the class limits, so the first class has a lower class limit of 135 cm and an upper class limit of 149 cm. Height is a continuous quantity, yet the upper limit of one class (149 cm) is not identical to the lower limit of the next class (150 cm); from this we infer that the heights were rounded to the nearest centimeter when recorded. Two consecutive classes should not share a class limit, because a value equal to that limit could then be assigned to either class. That is why the first class is 135-149 rather than 135-150: it is then clear that a recorded height of 150 cm belongs in the second class, not the first.

The class boundary is the value halfway between the upper class limit of one class and the lower class limit of the next class. In the example above, the class boundary between the first and second classes is 149.5 cm, which is also the value that separates heights rounded down into the lower class from heights rounded up into the upper class. The class midpoint is the value halfway between the two class boundaries of a class, and the class width is the distance between those boundaries. It is good practice for all classes in a frequency distribution to have the same width. In this example, every class width is 15 cm; the first class runs from boundary 134.5 cm to boundary 149.5 cm, so its midpoint is 142 cm.
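The boundary, midpoint, and width arithmetic can be sketched in Python as follows. This assumes, as inferred from the class limits, that heights were recorded to the nearest centimeter, so each boundary lies half a centimeter beyond a class limit. `LIMITS`, `UNIT`, `class_summary`, and `assign_class` are hypothetical names used only for illustration.

```python
# Class limits from the Western High School table, in cm.
LIMITS = [(135, 149), (150, 164), (165, 179), (180, 194), (195, 209)]
UNIT = 1  # assumed recording precision: heights rounded to whole centimeters

def class_summary(limits, unit=UNIT):
    """Return (lower boundary, upper boundary, midpoint, width) for each class.

    Each boundary sits half a recording unit outside a class limit. Between two
    adjacent classes this equals the value halfway between the upper limit of one
    class and the lower limit of the next (e.g. (149 + 150) / 2 = 149.5).
    """
    rows = []
    for lower, upper in limits:
        lower_b = lower - unit / 2
        upper_b = upper + unit / 2
        rows.append((lower_b, upper_b, (lower_b + upper_b) / 2, upper_b - lower_b))
    return rows

def assign_class(value, limits):
    """Return the index of the class whose limits contain a recorded (already rounded) value."""
    for i, (lower, upper) in enumerate(limits):
        if lower <= value <= upper:
            return i
    raise ValueError(f"{value} is outside every class")

for row in class_summary(LIMITS):
    print(row)
```

Because the limits never overlap, every recorded height maps to exactly one class. A measured height such as 149.7 cm would first be rounded to 150 cm, which `assign_class(round(149.7), LIMITS)` places in the second class (index 1).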

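Sometimes frequencies are subtotaled and presented in a cumulative frequency distribution, in which each class counts all observations up to and including that class. The table below shows the cumulative frequency distribution for the example above. Note that the classes are now open-ended at the lower extreme: "At most 164 cm" includes every student in the first two classes.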

Height	Cumulative Number of Students
At most 149 cm	23
At most 164 cm	59
At most 179 cm	88
At most 194 cm	152
At most 209 cm	169
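Cumulative frequencies are running totals of the class frequencies. A minimal Python sketch using `itertools.accumulate` from the standard library, with the counts copied from the height table:

```python
from itertools import accumulate

upper_limits = [149, 164, 179, 194, 209]  # upper class limits, in cm
frequencies = [23, 36, 29, 64, 17]        # number of students in each class

# Running totals: each entry adds the current class to all classes below it.
cumulative = list(accumulate(frequencies))

for limit, total in zip(upper_limits, cumulative):
    print(f"At most {limit} cm\t{total}")
```

The last cumulative frequency always equals the total number of observations (here 169), which is a quick check on the arithmetic.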

A Distribution of Bivariate Data

In bivariate data, each observation is classified by two variables at once. A frequency distribution of bivariate data is called a contingency table (or two-way table). The column at the right and the row at the bottom are called the margins of the contingency table; they contain the totals for each row and each column, and both margins add up to the same grand total.

Suppose, for example, that 287 subway riders were asked how far they traveled each way to work, and that each rider's gender was also recorded, with the following results.

Distance (each way)	Men	Women	Totals
0-4.9 miles	26	18	44
5.0-9.9 miles	33	21	54
10.0-14.9 miles	35	31	66
15.0-19.9 miles	42	45	87
20.0-24.9 miles	17	19	36
Totals	153	134	287
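The margins of a contingency table are just row and column sums. The Python sketch below, with the cell counts copied from the subway table, recomputes both margins and checks that they agree on the grand total; the variable names are illustrative.

```python
# Cell counts: one row per distance class, columns are (Men, Women).
DISTANCES = ["0-4.9 miles", "5.0-9.9 miles", "10.0-14.9 miles",
             "15.0-19.9 miles", "20.0-24.9 miles"]
counts = [
    [26, 18],
    [33, 21],
    [35, 31],
    [42, 45],
    [17, 19],
]

row_totals = [sum(row) for row in counts]        # right-hand margin
col_totals = [sum(col) for col in zip(*counts)]  # bottom margin
grand_total = sum(row_totals)

# Both margins must add up to the same grand total.
assert grand_total == sum(col_totals)

for label, row, total in zip(DISTANCES, counts, row_totals):
    print(label, *row, total, sep="\t")
print("Totals", *col_totals, grand_total, sep="\t")
```

`zip(*counts)` transposes the rows into columns, so the same `sum` expression produces the bottom margin.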