# Goodness of fit test

## Goodness of Fit Test Fundamentals

- The **goodness of fit** test is a statistical hypothesis test of how well observed data fit a theoretical model.
- The test investigates whether the observed frequency distribution differs significantly from a predicted distribution.
- It is a form of **Chi-Squared Test** and is heavily used in models where you expect a particular distribution.
- The **Null Hypothesis (H0)** in a goodness of fit test states that there is no significant difference between the observed and expected data; that is, the data follow the specified distribution.
- The **Alternative Hypothesis (H1)** is that there is a significant difference between the observed and expected data; that is, the data do not follow the specified distribution.

## Calculating the Goodness of Fit

- Begin by recording the observed frequencies for each category or class of the data.
- Determine the expected frequencies for each category following the theoretical distribution under the null hypothesis.
- Calculate the **test statistic**: for each category, square the difference between the observed and expected frequencies and divide by the expected frequency, then sum these values over all categories.
- Under the null hypothesis, this test statistic approximately follows a **Chi-Square distribution**.
- The number of **degrees of freedom** is usually calculated as the number of categories minus one; each parameter of the distribution that is estimated from the data reduces it by a further one.
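
The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `chi_square_statistic` and the die-roll counts are made up for the example, and the expected frequencies assume a fair six-sided die rolled 60 times.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: counts of each face in 60 rolls of a die.
observed = [8, 9, 12, 11, 6, 14]

# Under H0 (a fair die) every face is expected equally often.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1  # categories minus one; no parameters were estimated

print(f"chi-square = {statistic:.2f}, df = {df}")  # chi-square = 4.20, df = 5
```

Note that the observed and expected frequencies must sum to the same total, which is why the expected counts here are derived from the observed total.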

## Interpreting the Goodness of Fit Test Results

- If the calculated test statistic (the Chi-Square value) is less than the critical value from the Chi-Square distribution, there is not enough evidence to reject the null hypothesis.
- When the test statistic is greater than the critical value, the null hypothesis is **rejected**, indicating a significant difference between observed and expected data.
- Rejection of the null hypothesis suggests the data do not follow the theoretical distribution.
- Bear in mind that this test only indicates whether there’s a significant difference; it doesn’t specify what the difference is.
- Always interpret the results in the context of the specific problem or dataset you’re working on. A **low Chi-square value** indicates a good fit of your data to the theoretical model, whereas a **high value** suggests a poor fit.
- Beware of overreliance on tests, including the goodness of fit test. Remember the principle that **correlation does not imply causation**. A good model fit doesn’t confirm the hypotheses involved; it only suggests the model is plausible under the specified assumptions.
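
The comparison against the critical value can be sketched as follows. This is an illustrative sketch only: the helper name `decide` is hypothetical, a 5% significance level is assumed, and the critical values are copied from a standard Chi-Square table for 1 to 5 degrees of freedom.

```python
# Upper-tail critical values of the Chi-Square distribution at the 5% level,
# taken from a standard table; keys are degrees of freedom.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def decide(statistic, df, table=CRITICAL_5_PERCENT):
    """Compare a test statistic with the tabulated critical value."""
    critical = table[df]  # raises KeyError if df is not in the table
    if statistic > critical:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical test statistics, both with 5 degrees of freedom.
print(decide(4.2, 5))   # fail to reject H0
print(decide(13.5, 5))  # reject H0
```

A result of "fail to reject H0" means only that the data are consistent with the specified distribution at this significance level, not that the distribution has been confirmed.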