Further Summary Statistics

These are additional statistical measures used to describe and make inferences about collected data.

Range

  • The range is the difference between the maximum and minimum values in a data set.
  • It provides a measure of spread, but it is strongly affected by outliers.
  • Although easy to compute, the range is not always descriptive of the main body of data.
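
The effect of an outlier on the range can be seen in a minimal sketch. The sample values and variable names below are hypothetical, chosen only for illustration.

```python
# Hypothetical sample values, chosen only for illustration.
scores = [4, 7, 9, 10, 12, 15]

# Range = maximum - minimum.
score_range = max(scores) - min(scores)

# Adding a single extreme value changes the range dramatically,
# even though the other six values are unchanged.
scores_with_outlier = scores + [95]
range_with_outlier = max(scores_with_outlier) - min(scores_with_outlier)

print(score_range)         # 11
print(range_with_outlier)  # 91
```

One extra value moves the range from 11 to 91, which says little about where most of the data lies.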

Quartiles

  • Quartiles divide a rank-ordered data set into four equal parts.
  • The values that separate the parts are called the first quartile (Q1), the median or second quartile (Q2), and the third quartile (Q3).
  • The interquartile range (IQR) is the difference between the third and first quartiles (Q3 - Q1). It describes the spread of the middle 50% of the data, so it is less affected by extreme values than the range, and it is often used to identify outliers.
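
As a rough sketch, the quartiles and IQR can be computed with Python's standard `statistics` module. Two assumptions are made here that the notes do not specify: the "inclusive" quartile convention (other conventions can give slightly different quartiles for the same data), and the common rule of thumb that flags values more than 1.5 × IQR below Q1 or above Q3 as outliers. The sample values are hypothetical.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 4, 4, 5, 7, 8, 9, 10, 30]

# n=4 asks for the three cut points that split the data into quarters.
# method="inclusive" is one of several quartile conventions (an assumption).
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Assumed rule of thumb: flag values more than 1.5 * IQR outside Q1..Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(q1, q2, q3, iqr)
print(outliers)
```

With this data the value 30 lies above the upper fence, so it is the only value flagged.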

Variance

  • The variance is a measure of statistical dispersion, indicating how far data points spread out from the mean.
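  • Variance is calculated by taking the average of the squared differences from the mean. For a sample, rather than a whole population, the sum of squared differences is usually divided by n - 1 instead of n.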
  • Variance helps quantify the uncertainty or variability of data points in a sample or population.
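
The calculation can be sketched step by step and checked against the standard `statistics` module, where `pvariance` divides by n (population) and `variance` divides by n - 1 (sample). The data values are hypothetical.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean.
mean = sum(values) / len(values)

# Step 2: squared difference of each value from the mean.
squared_diffs = [(v - mean) ** 2 for v in values]

# Step 3: average the squared differences.
population_variance = sum(squared_diffs) / len(values)    # divide by n
sample_variance = sum(squared_diffs) / (len(values) - 1)  # divide by n - 1

print(population_variance)  # 4.0
print(sample_variance)

# The library functions should agree with the manual calculation.
print(statistics.pvariance(values), statistics.variance(values))
```

The sample variance is always a little larger than the population variance for the same data, because the same sum is divided by a smaller number.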

Standard Deviation

  • The standard deviation is the square root of the variance.
  • It is a measure of dispersion expressed in the same units as the data. It can be calculated for any numerical data set, but it is most informative when the data is roughly symmetric, for example normally distributed.
  • It indicates the typical amount by which a data point deviates from the mean. When the standard deviation is small, data points tend to be close to the mean; a large standard deviation indicates that data points are spread out over a wider range.
  • Standard deviation cannot be negative.
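
To illustrate, the sketch below compares two hypothetical data sets that share the same mean but differ in spread. It uses the population form (`statistics.pstdev`); for a sample, `statistics.stdev` would be the counterpart.

```python
import math
import statistics

# Two hypothetical data sets with the same mean (10) but different spread.
tight = [9, 10, 10, 11]
wide = [1, 5, 15, 19]

tight_sd = statistics.pstdev(tight)
wide_sd = statistics.pstdev(wide)

# The standard deviation is the square root of the variance.
tight_sd_from_variance = math.sqrt(statistics.pvariance(tight))

print(statistics.mean(tight), statistics.mean(wide))
print(tight_sd, wide_sd)
print(math.isclose(tight_sd, tight_sd_from_variance))
```

The mean alone cannot tell the two sets apart, while the standard deviation of the second set is roughly ten times that of the first.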

Box-and-Whisker Plot

  • A box-and-whisker plot is a graph that presents information from a five-number summary, which includes the minimum, Q1, median (Q2), Q3, and maximum.
  • This plot does not show a distribution in as much detail as a histogram or kernel density plot, but it gives a quick visual interpretation of the five-number summary and is useful for comparing distributions side by side.
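  • Outliers can be represented by individual points or asterisks. When they are drawn this way, the whiskers usually extend only to the most extreme values that are not outliers.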
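
The numbers behind such a plot can be sketched without any plotting library. The helper `box_plot_numbers` below is hypothetical, not part of any library, and it assumes the "inclusive" quartile convention and the 1.5 × IQR rule for outliers; plotting tools may use different defaults.

```python
import statistics

def box_plot_numbers(values):
    """Return the values a box-and-whisker plot would draw (hypothetical helper)."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")

    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]

    return {
        "five_number_summary": (ordered[0], q1, median, q3, ordered[-1]),
        # Whiskers stop at the most extreme values that are not outliers.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in ordered if v < lower_fence or v > upper_fence],
    }

# Hypothetical sample values, chosen only for illustration.
times = [12, 15, 17, 18, 20, 21, 23, 24, 48]
result = box_plot_numbers(times)
for name, numbers in result.items():
    print(name, numbers)
```

Here the maximum (48) is part of the five-number summary but would be drawn as a separate point, with the upper whisker ending at 24.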

Summary

  • Choosing the best method to present data depends upon the nature of the data set and the specific information you want to highlight.
  • It’s essential to understand the strengths and limitations of each method, and always check whether your calculations or visualisations make sense in the context of the original data.