Averages and range
Measures of Central Tendency
-
The mean is the sum of all the data values divided by the number of data values. It takes every data point into account, which also makes it sensitive to outliers.
-
The median is the middle value when the data is arranged in numerical order. If there’s an even number of data values, the median is the mean of the two middle numbers.
-
Mode refers to the most frequently occurring value in the data set. A data set may have more than one mode (bimodal, trimodal, etc.) or none at all (no mode).
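The three averages above can be checked with a short Python sketch. It uses the standard-library `statistics` module; the sample values are made up purely for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample values

# Mean: sum of the values divided by how many there are (30 / 6).
mean = statistics.mean(data)

# Median: middle value of the sorted data; with an even count it is
# the mean of the two middle values (here 3 and 5).
median = statistics.median(data)

# Mode(s): multimode returns a list of every most-frequent value,
# so a bimodal data set gives two entries.
modes = statistics.multimode(data)
bimodal_modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean, median, modes, bimodal_modes)
```

Note that `multimode` returns every value when all values occur equally often, which corresponds to the "no mode" case in the notes above.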
Measures of Spread
-
Range is a measure of spread calculated as the difference between the highest and lowest values in the data set. It can be influenced significantly by outliers.
-
Interquartile range (IQR) is the range of the middle half of a set of data, calculated as the difference between the upper quartile (Q3) and the lower quartile (Q1). This measure is less affected by outliers.
-
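Variance is a measure of how much the values in the data set diverge from the mean. It is calculated as the average of the squared differences between each value and the mean, and is therefore equal to the square of the standard deviation.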
-
Standard deviation is a measure of how spread out the numbers in the data set are. It is the square root of the variance. A low standard deviation indicates that the data points tend to be close to the mean, while a high standard deviation indicates that they are spread over a wider range.
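As a rough illustration of the four measures of spread, the sketch below uses Python's standard-library `statistics` module on made-up values. It assumes the population forms of variance and standard deviation (dividing by the number of values); the quartiles come from `statistics.quantiles` with its default method, which may differ slightly from the quartile convention used in a particular course or textbook.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample values

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Quartiles: n=4 splits the data into four parts and returns the
# three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # spread of the middle half of the data

# Population variance: mean of the squared deviations from the mean.
variance = statistics.pvariance(data)

# Population standard deviation: square root of the variance.
std_dev = statistics.pstdev(data)

print(data_range, iqr, variance, std_dev)
```

For these values the mean is 5, the squared deviations sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.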
Identifying Skewness
-
If the mean is larger than the median, the data is usually positively skewed. This indicates a distribution with a long tail on the right.
-
If the median is larger than the mean, the data is usually negatively skewed. This indicates a distribution with a long tail on the left.
-
In a symmetric distribution with a single peak, the mean, median and mode are all equal and the distribution is said to have no skewness.
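The mean-versus-median comparison above can be sketched as a small Python helper. The function name `skew_direction` and the three data sets are hypothetical, chosen only to show each case; this is a rule of thumb, not a formal skewness statistic.

```python
import statistics

def skew_direction(values):
    """Label skew by comparing the mean with the median (rule of thumb)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"  # long tail on the right
    if mean < median:
        return "negative"  # long tail on the left
    return "none"          # mean and median agree

right_tailed = [1, 2, 2, 3, 3, 4, 20]       # mean 5, median 3
left_tailed = [1, 17, 18, 18, 19, 19, 20]   # mean 16, median 18
symmetric = [1, 2, 3, 4, 5]                 # mean 3, median 3

for sample in (right_tailed, left_tailed, symmetric):
    print(skew_direction(sample))
```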
Outliers
-
An outlier is a value that lies an abnormal distance from other values in a random sample from a population. Detecting outliers is important as they can significantly impact measures such as the mean and range.
-
Outliers can be detected using the 1.5 × IQR rule: any data point more than 1.5 interquartile ranges below the first quartile (less than Q1 − 1.5 × IQR) or above the third quartile (greater than Q3 + 1.5 × IQR) is deemed an outlier.
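A minimal sketch of the 1.5 × IQR rule in Python is shown below. The helper name `iqr_outliers` and the data are hypothetical, and the quartiles come from the default method of `statistics.quantiles`, so the exact boundaries may differ slightly from those found with another quartile convention.

```python
import statistics

def iqr_outliers(values):
    """Return the values lying outside the 1.5 x IQR boundaries."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # anything below this is an outlier
    upper = q3 + 1.5 * iqr  # anything above this is an outlier
    return [v for v in values if v < lower or v > upper]

data = [10, 12, 12, 13, 14, 15, 16, 50]  # hypothetical values; 50 looks suspicious

print(iqr_outliers(data))
print(iqr_outliers([1, 2, 3, 4, 5]))  # evenly spaced data, nothing flagged
```

In the first data set only 50 falls outside the boundaries, which matches the intuition that it lies far from the other values.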
Importance of Appropriate Use
-
Always choose the right measure depending on the nature of your data. For example, the mean might not be a good measure for data with outliers, while the mode might not make sense for continuous data.
-
Speak accurately about averages. Don’t call the mean the ‘average’ without clarifying, as median and mode are also types of average.