Further Summary Statistics
These are additional statistical measures used to describe and make inferences about collected data.
Range
- The range is the difference between the maximum and minimum values in a data set.
- It provides a measure of spread, but it is strongly affected by outliers.
- Although easy to compute, the range uses only the two most extreme values, so it says nothing about how the main body of data is distributed.
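As a small illustration of how one extreme value changes the range, here is a minimal Python sketch. The helper name `data_range` and the data values are made up for this example.

```python
def data_range(values):
    """Return the maximum minus the minimum of a non-empty sequence."""
    return max(values) - min(values)


scores = [4, 7, 8, 9, 10, 12]   # made-up data for illustration
with_outlier = scores + [45]    # the same data plus one extreme value

range_plain = data_range(scores)          # 12 - 4
range_outlier = data_range(with_outlier)  # 45 - 4

print(range_plain, range_outlier)  # prints: 8 41
```

A single added value moves the range from 8 to 41, even though the other six values are unchanged.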
Quartiles
- Quartiles divide a rank-ordered data set into four equal parts.
- The three cut points are called the first quartile (Q1), the median or second quartile (Q2), and the third quartile (Q3). Several conventions exist for calculating quartiles, so different textbooks and software can give slightly different values for the same data.
- The interquartile range (IQR) is the difference between the third and first quartiles (Q3 - Q1). It covers the middle 50% of the data, so it is often used to describe the spread of the main body of data and to identify outliers.
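The quartiles and IQR described above can be sketched with Python's standard `statistics` module. The data are made up, and two choices here are assumptions rather than anything the notes prescribe: the `"inclusive"` quartile method, and the common convention of flagging values more than 1.5 × IQR beyond the quartiles as outliers.

```python
import statistics

data = [2, 4, 5, 7, 8, 10, 12, 14, 40]  # made-up values for illustration

# method="inclusive" assumes the data contain the lowest and highest
# possible values; other quartile conventions can give different cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: flag values more than 1.5 * IQR outside Q1 or Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr, outliers)  # prints: 5.0 8.0 12.0 7.0 [40]
```

Note that the value 40 inflates the range (38) far more than it affects the IQR (7), which depends only on the middle of the data.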
Variance
- The variance is a measure of statistical dispersion, indicating how far data points spread out from the mean.
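- Variance is calculated by averaging the squared differences from the mean. For a whole population, divide the sum of squared differences by the number of values n; for a sample used to estimate the population variance, divide by n - 1 instead.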
- Variance helps quantify the uncertainty or variability of data points in a sample or population.
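The calculation can be sketched in Python as follows. The function names and data are made up for this example; the results are checked against `statistics.pvariance` and `statistics.variance` from the standard library.

```python
import statistics


def population_variance(values):
    """Mean of squared differences from the mean (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values; the mean is 5

pop_var = population_variance(data)   # 32 / 8
samp_var = sample_variance(data)      # 32 / 7

print(pop_var, round(samp_var, 3))    # prints: 4.0 4.571
print(statistics.pvariance(data), statistics.variance(data))
```

The sample variance is always slightly larger than the population variance for the same data, and the gap shrinks as n grows.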
Standard Deviation
- The standard deviation is the square root of the variance.
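- It is expressed in the same units as the data, which makes it easier to interpret than the variance. It can be calculated for any numerical data set, but it is most informative when the distribution is roughly symmetric (for example, approximately normal), and like the mean it is sensitive to outliers.
- It indicates the typical amount by which a data point deviates from the mean. A small standard deviation means the data points tend to be close to the mean; a large standard deviation means they are spread over a wider range.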
- Standard deviation cannot be negative, and it is zero only when every value in the data set is the same.
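To make the link between spread and standard deviation concrete, this short sketch compares two made-up data sets that share the same mean. It uses `statistics.pstdev` (the population standard deviation) from the standard library; use `statistics.stdev` instead when the data are a sample.

```python
import math
import statistics

tight = [9, 10, 10, 11]   # made-up values clustered around the mean of 10
wide = [2, 6, 14, 18]     # made-up values with the same mean, more spread

tight_sd = statistics.pstdev(tight)
wide_sd = statistics.pstdev(wide)

# The standard deviation is the square root of the variance.
tight_sd_check = math.sqrt(statistics.pvariance(tight))

print(round(tight_sd, 3), round(wide_sd, 3))  # prints: 0.707 6.325
```

Both data sets have a mean of 10, so the mean alone cannot tell them apart; the standard deviation does.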
Box-and-Whisker Plot
- A box-and-whisker plot is a graph that presents information from a five-number summary, which includes the minimum, Q1, median (Q2), Q3, and maximum.
- The box spans Q1 to Q3 with a line at the median, and the whiskers extend towards the minimum and maximum. The plot does not show a distribution in as much detail as a histogram or kernel density plot, but it gives a quick visual reading of the five-number summary and is useful for comparing distributions side by side.
- Outliers can be shown as individual points or asterisks; in that case the whiskers stop at the most extreme values that are not outliers.
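The five-number summary behind a box-and-whisker plot can be computed without any plotting library. The sketch below is illustrative only: the function name `five_number_summary` and both data sets are made up, and the `"inclusive"` quartile method is an assumed convention (plotting software may use a different one).

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Made-up test scores for two groups.
group_a = [52, 55, 58, 60, 61, 63, 67, 70, 74]
group_b = [40, 48, 55, 62, 66, 71, 79, 85, 93]

summary_a = five_number_summary(group_a)
summary_b = five_number_summary(group_b)

print("A:", summary_a)  # prints: A: (52, 58.0, 61.0, 67.0, 74)
print("B:", summary_b)  # prints: B: (40, 55.0, 66.0, 79.0, 93)
```

Drawn as box plots, group B would have a higher median line (66 against 61) and a much longer box (IQR of 24 against 9), which is exactly the kind of comparison the plot makes easy to see.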
Summary
- Choosing the best method to present data depends upon the nature of the data set and the specific information you want to highlight.
- It’s essential to understand the strengths and limitations of each method, and always check whether your calculations or visualisations make sense in the context of the original data.