Descriptive Statistics

Basics of Descriptive Statistics

  • Descriptive statistics summarizes and describes the data collected from an experiment or study.
  • It is divided into two broad categories: measures of central tendency and measures of dispersion (variability).

Measures of Central Tendency

  • Mean: The mean is the average of the data set, calculated by adding all data points and dividing by the number of data points.
  • Median: The median is the middle value that separates the higher half from the lower half of a data set. If the data set has an even number of observations, the median is the average of the two middle numbers.
  • Mode: The mode is the value that appears most frequently in a data set.
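
The three measures above can be computed with Python's standard statistics module. This is a minimal sketch; the sample values are made up for illustration and are not taken from any real study.

```python
import statistics

# Hypothetical sample with an even number of observations.
data = [2, 3, 3, 5, 7, 10]

# Mean: sum of all values divided by how many there are (30 / 6).
mean = statistics.mean(data)

# Median: with six values, the average of the two middle ones, (3 + 5) / 2.
median = statistics.median(data)

# Mode: the most frequent value (3 appears twice).
mode = statistics.mode(data)

print(mean, median, mode)
```

If a data set has several equally frequent values, statistics.multimode returns all of them as a list.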

Measures of Dispersion or Variability

  • Range: The range is the difference between the highest and lowest values in a data set, which makes it the simplest measure of spread.
  • Variance: Variance is the average of the squared differences between each value and the mean (or expected value), so it measures how widely the values are spread around the mean.
  • Standard deviation: The standard deviation is the square root of the variance. It measures the amount of variation or dispersion in a set of values, expressed in the same units as the data.
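
A short sketch of these measures using the standard statistics module. The data values are hypothetical. The notes above do not say whether the variance divides by n (population) or by n - 1 (sample), so both are shown and the distinction is an assumption added here.

```python
import statistics

# Hypothetical data set; its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Population variance: sum of squared deviations from the mean, divided by n.
pop_var = statistics.pvariance(data)

# Population standard deviation: square root of the population variance.
pop_sd = statistics.pstdev(data)

# Sample variance: same sum of squares, divided by n - 1 instead of n.
sample_var = statistics.variance(data)

print(data_range, pop_var, pop_sd, sample_var)
```

For this data the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7.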

Visual Tools for Descriptive Statistics

  • Histograms and bar graphs: Histograms show the distribution of numerical data divided into bins or classes, while bar graphs compare counts or values across separate categories.
  • Box-and-whisker plots: These plots visually show the median, lower and upper quartiles, and any possible outliers in the data.
  • Scatter plots: These plots display pairs of values for two variables, making possible correlations between them visible.
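
Before drawing a histogram or a box-and-whisker plot, the numbers behind it have to be computed. The sketch below does this with the standard library only. The scores, the bin width of 10, and the choice of the "inclusive" quartile method are all assumptions made for illustration; plotting libraries may use a different quartile convention.

```python
import statistics
from collections import Counter

# Hypothetical scores.
scores = [3, 7, 12, 15, 18, 21, 24, 27, 33, 38, 45]

# Histogram: assign each value to a bin of width 10, labelled by its lower edge.
bin_width = 10
counts = Counter((value // bin_width) * bin_width for value in scores)
for lower_edge in sorted(counts):
    # Text version of a histogram bar: one '#' per observation in the bin.
    print(f"{lower_edge:2d}-{lower_edge + bin_width - 1:2d} | {'#' * counts[lower_edge]}")

# Box plot: the box spans the lower to upper quartile, with a line at the median.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
print("lower quartile:", q1, "median:", q2, "upper quartile:", q3)
```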

Rules & Principles

  • Use descriptive statistics to summarize and interpret the data you have. On their own, they do not allow you to draw conclusions about the population that the data are taken from.
  • Outliers can significantly affect the mean and the standard deviation, but have less impact on the median or mode.
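
The effect of an outlier can be checked directly. In this sketch the values are invented, and the single extreme value of 100 is added only to show how each measure reacts.

```python
import statistics

# Hypothetical data without and with one extreme value.
without_outlier = [10, 12, 12, 13, 14]
with_outlier = without_outlier + [100]

for label, values in [("without", without_outlier), ("with", with_outlier)]:
    print(
        label,
        "mean:", round(statistics.mean(values), 2),
        "sd:", round(statistics.pstdev(values), 2),
        "median:", statistics.median(values),
        "mode:", statistics.mode(values),
    )
```

The mean moves from 12.2 to roughly 26.8 and the standard deviation grows many times over, while the median only shifts from 12 to 12.5 and the mode stays at 12.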

Importance & Relevance

  • Descriptive statistics bring clarity to large amounts of data by reducing them to a few simple summary values.
  • These statistics help us understand and describe the features of a specific data set, giving an informative summary of what was measured.