Frequencies are counts of the number of times some characteristic or variable was observed or measured. Frequencies are generally obtained by first counting or measuring some characteristic in a cross-sectional study. The list of all of the results of the count or measurement is raw data. This is then summarized in a table called a frequency distribution. Classes (sometimes called bins) are used to describe the quantity being studied, and a frequency is provided for each class. Whether data was obtained from a population or a sample does not affect the construction of a frequency distribution.
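As a rough illustration of turning raw data into a frequency distribution, the sketch below tallies a short list of responses. The list `raw_data` and its values are hypothetical, made up only to show the counting step.

```python
from collections import Counter

# Hypothetical raw data: one recorded value per observation.
raw_data = ["Agree", "Disagree", "Agree", "Neutral", "Agree", "Disagree"]

# Counting each distinct value gives the frequency for each class.
frequency_distribution = Counter(raw_data)

for response, frequency in frequency_distribution.items():
    print(f"{response}: {frequency}")
```

The same counting applies whether the values came from a population or a sample.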
Suppose, for example, that 150 shoppers at the Crossroads Mall were asked to consider the statement "Taxes should be increased," and to choose the response that best describes their reaction. The raw data was collected, and was summarized in the following frequency distribution.
|Degree of Agreement||Number of Responses|
The degree of agreement is a qualitative (ordinal) variable, and the classes in this frequency distribution are the values for the degree of agreement. The frequencies are the number of responses for each class.
The data can be further summarized by computing relative frequencies or percents. A relative frequency is the fraction of the whole that falls in a class, written as a decimal; it is just the decimal form of the percent. We show below a relative frequency distribution on the left, and a percent distribution on the right.
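The conversion from frequencies to relative frequencies and percents can be sketched as follows. The class labels and counts here are hypothetical placeholders chosen to total 150; they are not the survey's actual results.

```python
# Hypothetical frequencies for illustration only (they total 150).
frequencies = {
    "Strongly agree": 15,
    "Agree": 30,
    "Neutral": 45,
    "Disagree": 36,
    "Strongly disagree": 24,
}

total = sum(frequencies.values())

# Relative frequency: the class frequency divided by the total.
relative_frequencies = {c: f / total for c, f in frequencies.items()}

# Percent: the relative frequency multiplied by 100.
percents = {c: 100 * r for c, r in relative_frequencies.items()}

for c in frequencies:
    print(f"{c}: {relative_frequencies[c]:.2f} ({percents[c]:.0f}%)")
```

The relative frequencies always sum to 1, and the percents to 100, apart from rounding.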
Suppose the heights of 169 freshmen at Western High School were found, and the results were provided in the following table.
|Height||Number of Students|
In this frequency distribution, each class involves a range of heights. The values provided at each end of a class are called the class limits, so in this case the first class has lower class limit 135 cm and upper class limit 149 cm. Since height is a continuous quantity, but the upper limit of one class is not identical to the lower limit of the next class, we infer that the data was rounded to the nearest centimeter when recorded. Two consecutive classes should not use the same number for a class limit, so that there is no ambiguity about which class a particular value belongs to. The first class is therefore 135-149, not 135-150, which makes it clear that 150 cm belongs in the second class, not the first.
The class boundary is the value halfway between the upper class limit of one class and the lower class limit of the next class. In the example above, the class boundary between the first and second classes is 149.5 cm, which is also the boundary that separates the values rounded down into the lower class from the values rounded up into the upper class. The class midpoint is the value halfway between the class boundaries of a class, and the class width is the distance between class boundaries. It is good practice for all class widths in a frequency distribution to be identical. In this example, all class widths are 15 cm, and the class midpoint of the first class is 142 cm.
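The arithmetic behind these definitions can be checked with a short sketch. It uses the class limits 135-149 and 150-164 from the text, and assumes the same 1 cm gap lies below the first class when computing its lower boundary.

```python
# Class limits taken from the text (in cm).
lower_limit, upper_limit = 135, 149
next_lower_limit = 150

# Gap between consecutive classes, caused by rounding to the nearest cm.
gap = next_lower_limit - upper_limit

# Boundaries lie halfway across the gap on each side of the class.
# Assumption: the gap below the first class is the same size.
lower_boundary = lower_limit - gap / 2
upper_boundary = upper_limit + gap / 2

class_width = upper_boundary - lower_boundary
class_midpoint = (lower_boundary + upper_boundary) / 2

print(lower_boundary, upper_boundary)  # 134.5 149.5
print(class_width)                     # 15.0
print(class_midpoint)                  # 142.0
```

Note that the class width is 15 cm, not the 14 cm obtained by subtracting the class limits, because the width is measured between boundaries.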
Sometimes frequencies are subtotaled, and presented in a cumulative frequency distribution. The table below shows the cumulative frequency distribution for the example above. Note how the classes are now open-ended at one extreme.
|Height||Number of Students|
|At most 149 cm||23|
|At most 164 cm||59|
|At most 179 cm||88|
|At most 194 cm||152|
|At most 209 cm||169|
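The subtotaling can be sketched as a running sum. The class frequencies below are not quoted from a table; they are inferred by differencing the cumulative frequencies above (for example, 59 - 23 = 36), so treat them as a reconstruction.

```python
from itertools import accumulate

# Upper class limits (in cm) from the cumulative table.
upper_limits = [149, 164, 179, 194, 209]

# Class frequencies inferred by differencing the cumulative frequencies.
class_frequencies = [23, 36, 29, 64, 17]

# Each cumulative frequency is the sum of all frequencies up to that class.
cumulative_frequencies = list(accumulate(class_frequencies))

for limit, cumulative in zip(upper_limits, cumulative_frequencies):
    print(f"At most {limit} cm: {cumulative}")
```

The last cumulative frequency equals the total number of students, 169.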
In bivariate data, each observation is categorized with two values, or in two dimensions. The resulting distribution is called a contingency table. The column at the right and the row at the bottom are referred to as the margins of the contingency table. The margins contain the totals for each category.
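As a minimal sketch of how the margins are obtained, the code below sums a small two-way table by row and by column. The category names and counts are hypothetical and serve only to show the computation.

```python
# Hypothetical two-way counts: outer keys are rows, inner keys are columns.
table = {
    "Group A": {"Short": 10, "Long": 20},
    "Group B": {"Short": 15, "Long": 5},
}

# Right-hand margin: the total for each row.
row_totals = {row: sum(cols.values()) for row, cols in table.items()}

# Bottom margin: the total for each column.
column_totals = {}
for cols in table.values():
    for col, count in cols.items():
        column_totals[col] = column_totals.get(col, 0) + count

# The corner cell: both margins sum to the same grand total.
grand_total = sum(row_totals.values())

print(row_totals)     # {'Group A': 30, 'Group B': 20}
print(column_totals)  # {'Short': 25, 'Long': 25}
print(grand_total)    # 50
```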
Suppose, for example, that 287 subway riders were surveyed as to the distance they traveled each way to work, and their gender was also recorded, with the following results.