# Goodness of fit test

## Goodness of Fit Test Fundamentals

• A goodness of fit test is a statistical hypothesis test that assesses how well observed data fit a theoretical model.
• The test investigates whether the observed frequency distribution differs significantly from the expected distribution.
• It is a form of chi-squared test and is widely used wherever a model predicts a particular distribution.
• The Null Hypothesis (H0) in a goodness of fit test states that there is no significant difference between the observed and expected data - that is, the data follow the specified distribution.
• The Alternative Hypothesis (H1) is that there is a significant difference between the observed and expected data - that is, the data do not follow the specified distribution.

## Calculating the Goodness of Fit

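• Start by recording the observed frequency for each category or class of the data.
• Determine the expected frequency for each category from the theoretical distribution assumed under the null hypothesis.
• Calculate the test statistic: for each category, square the difference between the observed and expected frequencies and divide by the expected frequency, then sum these values over all categories, i.e. χ² = Σ (O − E)² / E.
• Under the null hypothesis, this test statistic approximately follows a chi-squared distribution, provided the expected frequencies are not too small (a common rule of thumb is at least 5 per category).
• The number of degrees of freedom is usually the number of categories minus one; subtract one more for each parameter of the distribution that was estimated from the data.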
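
The calculation steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the die-roll counts are made-up data, and the function name `chi_square_statistic` is a hypothetical helper introduced only for this example. The null hypothesis assumed here is that the die is fair, so each face has probability 1/6.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: face counts from 120 rolls of a six-sided die.
observed = [22, 17, 20, 26, 22, 13]

# Expected frequencies under H0 (fair die): total count times 1/6 per face.
probabilities = [1 / 6] * 6
total = sum(observed)
expected = [total * p for p in probabilities]

statistic = chi_square_statistic(observed, expected)

# No parameters were estimated from the data, so df = categories - 1.
degrees_of_freedom = len(observed) - 1

print(f"chi-square statistic: {statistic:.2f}")
print(f"degrees of freedom: {degrees_of_freedom}")
```

For these counts the statistic works out to about 5.1 with 5 degrees of freedom.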

## Interpreting the Goodness of Fit Test Results

• If the calculated test statistic or Chi-Square value is less than the critical value from the Chi-Square distribution, there is not enough evidence to reject the null hypothesis.
• When the test statistic is greater than the critical value, the null hypothesis is rejected, indicating a significant difference between observed and expected data.
• Rejection of the null hypothesis suggests the data do not follow the theoretical distribution.
• Bear in mind, this test only indicates if there’s a significant difference; it doesn’t necessarily specify what the difference is.
• Always interpret the results in the context of the specific problem or dataset you’re working on. A chi-square value that is low relative to the critical value indicates a good fit of the data to the theoretical model, whereas a high value suggests a poor fit.
• Beware of overreliance on any single test, including the goodness of fit test. Failing to reject the null hypothesis does not prove that the model is correct; it only suggests the model is plausible under the specified assumptions.
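
The decision rule described above can be sketched as a comparison against a tabulated critical value. This is an illustrative example only: the 5% significance level, the test statistic of 5.1 with 5 degrees of freedom, and the helper name `decide` are all assumptions chosen for the sketch. The critical values are the standard upper-tail values of the chi-squared distribution at a 5% significance level.

```python
# Upper-tail critical values of the chi-squared distribution at the 5%
# significance level, keyed by degrees of freedom (standard table values).
CRITICAL_VALUES_5_PERCENT = {
    1: 3.841,
    2: 5.991,
    3: 7.815,
    4: 9.488,
    5: 11.070,
}


def decide(statistic, df, table=CRITICAL_VALUES_5_PERCENT):
    """Compare the test statistic with the critical value for the given df."""
    critical_value = table[df]
    if statistic > critical_value:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical result: a statistic of 5.1 from a test with 6 categories.
statistic = 5.1
df = 5

decision = decide(statistic, df)
print(f"critical value: {CRITICAL_VALUES_5_PERCENT[df]}")
print(f"decision: {decision}")
```

Because 5.1 is below the critical value of 11.070, this hypothetical test does not reject the null hypothesis. That outcome means the data are consistent with the assumed distribution, not that the distribution has been proven correct.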