Measures of Central Tendency and Dispersion
Measures of Central Tendency
- Mean: Refers to the average of a data set. It is calculated by summing all values in the data set and dividing by the number of values.
- Median: The middle value in a data set when the values are arranged in ascending or descending order. If the data set has an even number of values, the median is the average of the two middle values.
- Mode: This is the most frequently occurring value in a data set. A data set may have more than one mode if several values occur with equal highest frequency.
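The three measures above can be sketched with Python's standard `statistics` module. The values in `data` and `bimodal` are hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
import statistics

# Hypothetical data set, used only for illustration.
data = [2, 3, 3, 5, 7, 10]

# Mean: sum of all values divided by the number of values (30 / 6).
mean = statistics.mean(data)

# Median: the count is even, so average the two middle values (3 and 5).
median = statistics.median(data)

# Mode: multimode returns every value tied for the highest frequency.
modes = statistics.multimode(data)

# A second hypothetical data set in which two values tie for most frequent.
bimodal = [1, 1, 2, 2, 3]
bimodal_modes = statistics.multimode(bimodal)

print(mean, median, modes, bimodal_modes)
```

`statistics.multimode` is used rather than `statistics.mode` because it reports all tied values, which matches the note that a data set may have more than one mode.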
Measures of Dispersion
- Range: The difference between the highest and lowest value in a data set.
- Interquartile Range (IQR): The range of the middle 50% of values in a data set. It is calculated by subtracting the lower quartile (Q1) from the upper quartile (Q3).
- Variance: A measure of how spread out the values in a data set are around the mean. For a full population, it is calculated as the average of the squared differences from the mean.
- Standard Deviation: It is the square root of variance. It is a more intuitive measure of dispersion as it is expressed in the same units as the original data.
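As a rough sketch, the dispersion measures above can be computed with the standard `statistics` module. The data set is hypothetical. Quartile conventions differ between textbooks and tools, so the `method="inclusive"` choice below is an assumption, and other conventions can give a slightly different IQR for the same data.

```python
import math
import statistics

# Hypothetical data set, treated here as a complete population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Quartiles: n=4 splits the data into four parts and returns Q1, Q2, Q3.
# The "inclusive" method is one of several conventions (an assumption here).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Population variance: average of the squared differences from the mean.
pop_variance = statistics.pvariance(data)

# Standard deviation: square root of the variance, in the data's own units.
pop_std = math.sqrt(pop_variance)

print(data_range, iqr, pop_variance, pop_std)
```

`pvariance` divides by the number of values, which matches the "average of the squared differences" definition. The sample version, which divides by n-1, is `statistics.variance`.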
Estimating Population Parameters from Sample Data
- A sample is a subset of a population.
- Sample Mean (X̄) and Sample Variance (s²) can be used to estimate the population mean (μ) and population variance (σ²) respectively.
- When calculating the sample variance, the denominator is (n-1), not n, where n is the number of items in the sample. This adjustment (Bessel's correction) makes s² an unbiased estimate of the population variance.
- Confidence Interval (CI): An interval around the sample mean that serves as an interval estimate of the population mean. The confidence level (for example, 95%) is the proportion of such intervals, constructed from repeated samples, that would contain the population mean.
- Standard Error: A measure of the precision of an estimate. For the sample mean, it is the standard deviation of the sampling distribution of the mean, and it is estimated as s/√n.
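The estimation steps above can be sketched as follows. The sample values are hypothetical. The interval uses a normal (z) critical value, which is an assumption that suits large samples. For a small sample like this one, a critical value from the t distribution with n-1 degrees of freedom would usually be preferred and would give a wider interval.

```python
import math
import statistics

# Hypothetical sample drawn from a larger population.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

# Sample mean estimates the population mean.
sample_mean = statistics.mean(sample)

# Sample variance divides by (n - 1) rather than n.
sample_var = statistics.variance(sample)
sample_std = math.sqrt(sample_var)

# Standard error of the mean: s / sqrt(n).
std_error = sample_std / math.sqrt(n)

# Approximate 95% confidence interval using the normal distribution.
confidence = 0.95
z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
ci_lower = sample_mean - z * std_error
ci_upper = sample_mean + z * std_error

print(sample_mean, sample_var, std_error, (ci_lower, ci_upper))
```

With the same eight values, dividing by n-1 gives 32/7 rather than the 32/8 obtained when dividing by n, which shows how the correction enlarges the variance estimate for small samples.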