Goodness of fit test

Goodness of Fit Test Fundamentals

  • The goodness of fit test is a statistical hypothesis test that assesses how well observed data fit a theoretical model.
  • The test investigates whether the observed frequency distribution differs significantly from a predicted distribution.
  • The version described here is a form of chi-squared test, and it is widely used wherever a model predicts a particular distribution of counts across categories.
  • The Null Hypothesis (H0) in a goodness of fit test states that there is no significant difference between the observed and expected data - that is, the data follow the specified distribution.
  • The Alternative Hypothesis (H1) is that there is a significant difference between the observed and expected data - that is, the data do not follow the specified distribution.

Calculating the Goodness of Fit

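  • Begin by recording the observed frequency for each category or class of the data.
  • Determine the expected frequency for each category from the theoretical distribution under the null hypothesis.
  • Calculate the test statistic: for each category, square the difference between the observed and expected frequency and divide by the expected frequency, then sum these values over all categories.
  • Under the null hypothesis, this test statistic approximately follows a chi-square distribution. The approximation is generally considered reliable when the expected frequencies are not too small (a common rule of thumb is at least 5 per category).
  • The number of degrees of freedom is usually the number of categories minus one. If any parameters of the theoretical distribution were estimated from the data, subtract one further degree of freedom for each estimated parameter.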
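
The calculation steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the die-roll counts are made-up example data, and the function names chi_square_statistic and degrees_of_freedom are hypothetical helpers introduced only for this example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_categories, n_estimated_params=0):
    """Categories minus one, minus any parameters estimated from the data."""
    return n_categories - 1 - n_estimated_params


# Made-up data: 60 rolls of a six-sided die, counts for faces 1 to 6.
observed = [8, 12, 9, 11, 10, 10]

# Under H0 (a fair die) every face is expected equally often.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)
df = degrees_of_freedom(len(observed))

print(f"chi-square statistic: {statistic:.3f}, degrees of freedom: {df}")
```

For these counts the statistic comes out at approximately 1.0 with 5 degrees of freedom, since no parameters were estimated from the data.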

Interpreting the Goodness of Fit Test Results

  • If the calculated test statistic (the chi-square value) is less than the critical value of the chi-square distribution at the chosen significance level and degrees of freedom, there is not enough evidence to reject the null hypothesis.
  • If the test statistic is greater than the critical value, the null hypothesis is rejected, indicating a significant difference between the observed and expected data.
  • Rejection of the null hypothesis suggests that the data do not follow the theoretical distribution.
  • Bear in mind that this test only indicates whether there is a significant difference; it does not show which categories deviate from the expected values, or in which direction.
  • Always interpret the results in the context of the specific problem or dataset you are working on. A chi-square value that is small relative to the critical value is consistent with a good fit of the data to the theoretical model, whereas a large value suggests a poor fit.
  • Beware of over-reliance on tests, including the goodness of fit test. Just as correlation does not imply causation, failing to reject the null hypothesis does not prove it. A good model fit does not confirm the hypotheses involved; it only suggests that the model is plausible under the specified assumptions.
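
The decision rule described above can be sketched as a comparison against a critical value. This is a minimal example under stated assumptions: the significance level is fixed at 0.05, the lookup table holds the standard chi-square critical values for 1 to 5 degrees of freedom only, and the function name decide and the two test statistics passed to it are hypothetical values chosen for illustration.

```python
# Standard chi-square critical values at the 0.05 significance level,
# keyed by degrees of freedom (only df 1 to 5 are included here).
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def decide(statistic, df, table=CRITICAL_VALUES_05):
    """Compare the test statistic with the critical value for the given df."""
    if df not in table:
        raise ValueError(f"no critical value tabulated for df={df}")
    critical = table[df]
    if statistic > critical:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical statistics, both with 5 degrees of freedom.
print(decide(1.0, 5))   # well below 11.070
print(decide(15.2, 5))  # above 11.070
```

Note that "fail to reject H0" is deliberately not worded as "accept H0": a statistic below the critical value means only that the data are compatible with the specified distribution.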