Comparing Data Sets

Getting Started

When comparing data sets, draw on the full range of statistical measures available: the mean, median, mode, range, quartiles, and standard deviation.
Besides comparing these measures directly, also consider how they relate to each other within each data set. For example, a mean well above the median suggests that a few unusually high values are pulling the mean upward.

Mean, Median, and Mode

Compare the means of different data sets. If one set has a much higher or lower mean, its values are larger or smaller on average, although a few extreme values can produce the same effect.
Evaluate the medians of different data sets. The median is far less affected by extreme values than the mean, so it can highlight differences that the mean might obscure.
Consider the modes of different data sets. If data sets have different modes, their most common values differ, which could suggest differing trends or typical results.
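
As a rough illustration of the three comparisons above, the sketch below summarizes two small sets with Python's standard statistics module. The scores and the names class_a, class_b, and summarize are invented for this example and are not taken from any real data.

```python
from statistics import mean, median, multimode

# Hypothetical test scores for two classes (illustrative data only).
class_a = [62, 65, 65, 70, 71, 74, 97]
class_b = [60, 66, 68, 68, 72, 75, 81]

def summarize(scores):
    """Return the mean, median, and list of modes for one data set."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "modes": multimode(scores),  # a list, since a set can have several modes
    }

summary_a = summarize(class_a)
summary_b = summarize(class_b)

for name, summary in (("Class A", summary_a), ("Class B", summary_b)):
    print(name, summary)
```

With these made-up scores, both the mean and the median of Class A are 2 points higher than those of Class B, but part of Class A's mean comes from the single score of 97, which is why the median is worth checking alongside it.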

Range and Quartiles

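Look at the ranges of different data sets. A wider range could indicate a more variable set of data, but because the range depends only on the largest and smallest values, a single extreme value can inflate it.
Compare the quartiles. If one data set's quartiles are all higher or lower than another's, its values are generally higher or lower. If the gap between its lower and upper quartiles (the interquartile range) is wider, the middle half of its data is more spread out.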
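
The range and quartile comparison can be sketched as follows. The data and the helper name spread_summary are hypothetical. Note that there are several conventions for calculating quartiles; this sketch assumes the "inclusive" method of statistics.quantiles, so a textbook or calculator using another convention may give slightly different quartile values.

```python
from statistics import quantiles

# Hypothetical test scores for two classes (illustrative data only).
class_a = [62, 65, 65, 70, 71, 74, 97]
class_b = [60, 66, 68, 68, 72, 75, 81]

def spread_summary(scores):
    """Return the range, quartiles, and interquartile range of one data set."""
    # Assumption: the "inclusive" quartile convention; other methods differ slightly.
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "range": max(scores) - min(scores),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": q3 - q1,
    }

spread_a = spread_summary(class_a)
spread_b = spread_summary(class_b)

print("Class A:", spread_a)
print("Class B:", spread_b)
```

For these invented scores, Class A's range (35) is much wider than Class B's (21), yet the interquartile ranges are close (7.5 and 6.5). Most of the difference in range comes from one high score rather than from the bulk of the data.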

Standard Deviation

Compare the standard deviations of the data sets. A higher standard deviation suggests greater variability within a set of data.
Be aware that standard deviation is influenced by outliers. If a set of data has more outliers, it may result in a larger standard deviation.
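
A minimal sketch of this comparison, again with invented scores. It assumes each set is a sample and so uses statistics.stdev; if each set were a complete population, statistics.pstdev would be the appropriate call.

```python
from statistics import stdev

# Hypothetical test scores for two classes (illustrative data only).
class_a = [62, 65, 65, 70, 71, 74, 97]
class_b = [60, 66, 68, 68, 72, 75, 81]

# Assumption: the scores are samples, so the sample standard deviation is used.
sd_a = stdev(class_a)
sd_b = stdev(class_b)

print(f"Class A standard deviation: {sd_a:.2f}")
print(f"Class B standard deviation: {sd_b:.2f}")

more_variable = "Class A" if sd_a > sd_b else "Class B"
print("More variable set:", more_variable)
```

Here Class A has the larger standard deviation, largely because the score of 97 lies far from the rest of its values.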

Analyzing Graphs

Draw a box-and-whisker plot for each data set, using the same scale for all of them. These plots show the median, quartiles, and potential outliers in a data set and can be used to visually compare distinct sets of data.
Compare histograms or frequency polygons for different data sets. Note the areas where one curve is higher or wider than another, suggesting different distributions and frequencies of data.
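
Before drawing a box-and-whisker plot, it can help to work out the values it displays. The sketch below computes them for two invented data sets. It assumes two common conventions that a given textbook may not share: the "inclusive" quartile method, and the rule that a value is a potential outlier if it lies more than 1.5 times the interquartile range beyond a quartile. The name box_plot_summary is hypothetical.

```python
from statistics import quantiles

# Hypothetical test scores for two classes (illustrative data only).
class_a = [62, 65, 65, 70, 71, 74, 97]
class_b = [60, 66, 68, 68, 72, 75, 81]

def box_plot_summary(data):
    """Return the values a box-and-whisker plot would show for one data set."""
    ordered = sorted(data)
    # Assumption: "inclusive" quartiles and the 1.5 * IQR outlier rule.
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "whisker_low": inside[0],    # smallest value that is not an outlier
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_high": inside[-1],  # largest value that is not an outlier
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

box_a = box_plot_summary(class_a)
box_b = box_plot_summary(class_b)

print("Class A:", box_a)
print("Class B:", box_b)
```

Under these assumptions, Class A's plot would mark 97 as a potential outlier with its upper whisker ending at 74, while Class B's plot would have no outliers and whiskers running from 60 to 81.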

Trends and Outliers

Keep an eye out for any visible trends in the data. Do the values tend to increase or decrease, or does a pattern emerge?
Look for significant outliers in each set of data. Outliers pull the mean toward them and inflate the standard deviation, potentially skewing a direct comparison of these measures.
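
To see how much a single outlier can matter, the sketch below recalculates the mean, median, and standard deviation of one invented data set with and without its extreme value. The data and variable names are hypothetical, and removing a value here is only a way of measuring its influence, not a recommendation to discard outliers.

```python
from statistics import mean, median, stdev

# Hypothetical scores containing one extreme value (illustrative data only).
scores = [62, 65, 65, 70, 71, 74, 97]
without_outlier = [x for x in scores if x != 97]

# How far each measure moves when the extreme value is left out.
mean_shift = mean(scores) - mean(without_outlier)
median_shift = median(scores) - median(without_outlier)
sd_with = stdev(scores)
sd_without = stdev(without_outlier)

print(f"Mean shifts by {mean_shift:.2f}")
print(f"Median shifts by {median_shift:.2f}")
print(f"Standard deviation: {sd_with:.2f} with, {sd_without:.2f} without")
```

In this example the mean moves further than the median when the extreme value is left out, and the standard deviation drops sharply, which matches the point that the mean and standard deviation are the measures most sensitive to outliers.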

Context

Finally, always consider the context as you compare data sets. For example, if you’re comparing test scores between two classes, factors such as different teachers or textbooks could be influencing the results. Always interpret statistical comparisons in light of the wider circumstances.