Fitting a theoretical distribution to given data

Basic Concepts

  • Theoretical distributions are mathematical functions that describe the probabilities of different outcomes.
  • Fitting a distribution to data involves finding the parameters of the distribution that best explain the data.

Choosing an Appropriate Distribution

  • The choice of a suitable theoretical distribution depends on the nature of the data: whether it is discrete or continuous, the range of values it can take, and its overall shape. Understanding the characteristics and patterns in the data is crucial.
  • Histograms and stem-and-leaf plots are useful tools for visualising the shape of the data and guiding the choice of distribution.
  • If the data is roughly symmetrical and bell-shaped, a normal distribution might be an appropriate fit. For positively skewed, non-negative data (such as waiting times), an exponential distribution could be an option.
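
The shape checks described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the data set `waiting_times` and the helper names `sample_skewness` and `text_histogram` are made up for this example. A skewness near zero points towards a symmetrical shape, while a clearly positive value points towards a long right-hand tail.

```python
def sample_skewness(data):
    """Return the moment-based skewness m3 / m2**1.5 of the data."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


def text_histogram(data, bins=5):
    """Return (lower_edge, upper_edge, count) tuples for equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The largest value would fall just past the last bin, so clamp it
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Hypothetical waiting times in minutes (illustrative data only)
waiting_times = [0.2, 0.5, 0.7, 1.1, 1.3, 1.8, 2.4, 3.0, 4.6, 7.9]

for lower, upper, count in text_histogram(waiting_times):
    print(f"{lower:5.2f} to {upper:5.2f} | {'*' * count}")
print("skewness:", round(sample_skewness(waiting_times), 2))
```

With data like this, most values fall in the lowest bin and the skewness is positive, which would make an exponential distribution a more natural first candidate than a normal one.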

Fitting the Distribution

  • One way to fit a distribution is the method of least squares, minimising the sum of the squares of the differences between the observed and theoretical values.
  • Another method is maximum likelihood estimation (MLE), which identifies the parameter values that make the observed data most probable.
  • Both of these methods rely on calculus to find the best parameter values. In simple cases the resulting equations can be solved exactly; otherwise they are solved numerically.
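
As a minimal sketch of maximum likelihood estimation, the example below uses the well-known closed-form results for two distributions: for a normal distribution the estimates are the sample mean and the standard deviation computed with divisor n, and for an exponential distribution the estimated rate is 1 divided by the sample mean. The function names and the data are assumptions made for illustration, and least squares is not shown here.

```python
import math


def fit_normal_mle(data):
    """Return the maximum likelihood estimates (mu, sigma) for a normal model."""
    n = len(data)
    mu = sum(data) / n
    # The MLE of the variance divides by n, not n - 1
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma


def fit_exponential_mle(data):
    """Return the maximum likelihood estimate of the exponential rate."""
    return len(data) / sum(data)  # equivalent to 1 / sample mean


def exponential_log_likelihood(rate, data):
    """Log-likelihood of the data under an exponential model with this rate."""
    return len(data) * math.log(rate) - rate * sum(data)


# Hypothetical waiting times in minutes (illustrative data only)
waiting_times = [0.2, 0.5, 0.7, 1.1, 1.3, 1.8, 2.4, 3.0, 4.6, 7.9]

rate_hat = fit_exponential_mle(waiting_times)

# Rates on either side of the estimate should give a lower log-likelihood
for rate in (0.8 * rate_hat, rate_hat, 1.2 * rate_hat):
    print(round(rate, 3), round(exponential_log_likelihood(rate, waiting_times), 3))
```

The loop at the end is a quick numerical check of the idea behind MLE: the fitted rate makes the observed data more probable than nearby alternative rates do.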

Assessing the Fit

  • A theoretical distribution that has been fitted to data can be assessed by comparing observed values with the expected values from the distribution.
  • Chi-square tests are commonly used for this purpose, comparing observed and expected frequencies in different categories or bins. A chi-square value that is small relative to the critical value for the appropriate degrees of freedom is consistent with a good fit.
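  • A P-value gives the probability of obtaining differences between observed and expected values at least as large as those seen, assuming the data really do follow the theoretical distribution. A low P-value (typically <0.05) is evidence that the distribution is a poor fit; a higher P-value means there is no significant evidence against the fit.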
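
The comparison of observed and expected frequencies can be sketched as follows. The observed counts, the bin edges and the fitted rate of 0.5 are hypothetical values chosen for illustration. Rather than computing a P-value, the sketch compares the statistic with the tabulated 5% critical value for 2 degrees of freedom (5.991), with the degrees of freedom taken as the number of bins, minus 1, minus the number of parameters estimated from the data.

```python
import math


def exponential_cdf(x, rate):
    """P(X <= x) for an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x)


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of 100 waiting times (minutes) in the bins
# [0, 1), [1, 2), [2, 4) and [4, infinity)
edges = [0.0, 1.0, 2.0, 4.0, math.inf]
observed = [40, 22, 25, 13]
n = sum(observed)
rate = 0.5  # assumed to have been estimated from the raw data

# Expected frequency in each bin = n * probability of landing in that bin
expected = [n * (exponential_cdf(b, rate) - exponential_cdf(a, rate))
            for a, b in zip(edges, edges[1:])]

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1 - 1  # bins - 1 - one estimated parameter
critical_value = 5.991      # tabulated 5% critical value for df = 2

print("chi-square:", round(statistic, 3), "df:", df)
if statistic > critical_value:
    print("Evidence at the 5% level that the distribution is a poor fit")
else:
    print("No significant evidence against the fit at the 5% level")
```

Every expected frequency here is above 5, which is the usual rule of thumb for the chi-square approximation to be reasonable; bins with smaller expected counts are normally merged before the test is carried out.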

Other Important Points

  • Just as correlation does not imply causality, a theoretical distribution that fits the data well does not necessarily indicate a cause-effect relationship, nor does it prove that the underlying process actually follows that distribution.
  • The process of fitting a theoretical distribution should involve iteration: the results of an initial fit are used to inform a second (or further) fit that improves the match to the data.
  • Remember to consider whether the model and its assumptions are reasonable and consistent with what is known about the process that generated the data.