Formulation of a hypothesis

Formulating a Hypothesis

  • A hypothesis is a prediction or claim about a population parameter which is to be tested. Formulating hypotheses is a fundamental step before conducting statistical hypothesis testing.

  • Two complementary hypotheses are formulated: the null hypothesis (H0) and the alternative hypothesis (H1 or Ha).

Null Hypothesis (H0)

  • The null hypothesis is an assumption of no effect or no difference in the population. It’s a statement of status quo.

  • It’s usually formulated with an equality sign, e.g., H0: µ = µ0 or H0: µ1 = µ2, where µ is the population parameter (often the population mean) and µ0 is the hypothesized value.

Alternative Hypothesis (H1 or Ha)

  • The alternative hypothesis asserts that there is an effect or difference in the population contradicting the null hypothesis.

  • It can be formulated as one-sided (e.g., Ha: µ > µ0 or Ha: µ1 < µ2) or two-sided (e.g., Ha: µ ≠ µ0 or Ha: µ1 ≠ µ2).
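
The choice between a one-sided and a two-sided alternative changes how a p-value is read off the same test statistic. The sketch below illustrates this for a z statistic, assuming the statistic is standard normal under H0. The function name p_value_from_z and the labels "greater", "less" and "two-sided" are illustrative choices, not part of these notes.

```python
from statistics import NormalDist


def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """Return the p-value of a z statistic for the chosen alternative hypothesis.

    Assumes the statistic follows a standard normal distribution under H0.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":    # Ha: mu > mu0, upper tail only
        return 1.0 - std_normal.cdf(z)
    if alternative == "less":       # Ha: mu < mu0, lower tail only
        return std_normal.cdf(z)
    if alternative == "two-sided":  # Ha: mu != mu0, both tails
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


if __name__ == "__main__":
    for alt in ("greater", "less", "two-sided"):
        print(alt, round(p_value_from_z(1.5, alt), 4))
```

For a positive statistic, the two-sided p-value is twice the upper-tail one, which is why a one-sided test rejects more readily when the effect lies in the predicted direction.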

Significance Level

  • The significance level (alpha), usually 0.05 or 0.01, is set before testing. It is the threshold for deciding when to reject the null hypothesis.

  • Lower alpha values demand stronger evidence before the null hypothesis is rejected.

Hypothesis Test Decision

  • A test statistic is computed from the sample data. Either the statistic is compared with a critical value determined by the significance level, or its p-value is compared with the significance level directly, to decide whether to reject the null hypothesis.

  • If the test statistic is more extreme than what is likely under the null hypothesis (i.e., p-value <= alpha), reject the null hypothesis in favor of the alternative. Otherwise, fail to reject the null hypothesis.
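
The decision rule can be sketched in a few lines. The example below uses a two-sided one-sample z-test, which assumes the population standard deviation sigma is known. That assumption is made only to keep the code to the standard library. The function name z_test_decision and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_decision(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test; returns (z, p_value, decision).

    Assumes a known population standard deviation `sigma`.
    """
    n = len(sample)
    # Test statistic: distance of the sample mean from mu0 in standard errors.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: probability of a statistic at least this extreme under H0.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return z, p_value, decision


if __name__ == "__main__":
    sample = [5.1, 4.9, 5.6, 5.8, 5.4, 5.7, 5.3, 5.5, 5.9]  # illustrative data
    z, p, decision = z_test_decision(sample, mu0=5.0, sigma=0.5)
    print(f"z = {z:.3f}, p = {p:.4f}: {decision}")
```

Note that "fail to reject H0" is not the same as accepting H0: the data were simply not extreme enough at the chosen alpha.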

Considerations

  • Type I Error: Rejecting a true null hypothesis. Its probability equals the significance level.

  • Type II Error: Failing to reject a false null hypothesis. Its probability (beta) depends on the true effect size, the sample size and alpha, so it should also be considered.

  • Power: The capacity of a test to find an effect when one actually exists. It’s calculated as 1 - beta. A more powerful test can detect smaller effects.
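
The relationship between alpha, beta and power can be made concrete with a small calculation. The sketch below assumes an upper-tailed one-sample z-test with known sigma, where the true mean exceeds µ0 by an amount delta. The function name z_test_power and the parameter values are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample z-test when the true mean is mu0 + delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)  # rejection threshold under H0
    shift = delta * sqrt(n) / sigma            # mean of the z statistic under Ha
    beta = std_normal.cdf(z_crit - shift)      # P(fail to reject H0 | Ha is true)
    return 1.0 - beta


if __name__ == "__main__":
    # Power grows with the sample size for a fixed effect size and alpha.
    for n in (10, 30, 100):
        print(n, round(z_test_power(delta=0.5, sigma=1.0, n=n), 3))
```

With delta = 0 the null hypothesis is true and the function returns alpha, which matches the statement that the Type I error probability equals the significance level.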

Hypothesis Testing Using the t-Distribution

  • With small sample sizes and/or unknown population variance, hypothesis testing often involves the t-distribution. The t-test statistic follows a t-distribution under the null hypothesis.

  • The degrees of freedom for the t-test are determined by the sample size. For a simple one-sample or paired t-test, df = n - 1, where n is the sample size (the number of pairs, for a paired test). For a two-sample t-test that assumes equal variances, df = n1 + n2 - 2, where n1 and n2 are the sizes of the two samples. Without that assumption, df is given by the Welch-Satterthwaite approximation; a conservative shortcut for hand calculation is the smaller of n1 - 1 and n2 - 1.
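
The t statistic and the degrees-of-freedom rules can be sketched with the standard library alone. The helper names below (one_sample_t, welch_df, conservative_df) and the data are illustrative. The Welch-Satterthwaite formula is included as a commonly used alternative to the conservative "smaller of n1 - 1 and n2 - 1" shortcut. The critical value 2.262 is the standard t-table entry for df = 9 at a two-sided alpha of 0.05.

```python
from math import sqrt
from statistics import mean, stdev, variance


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    # The sample standard deviation replaces the unknown population sigma.
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1


def welch_df(x, y):
    """Welch-Satterthwaite degrees of freedom for two samples with unequal variances."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    return (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))


def conservative_df(x, y):
    """Conservative shortcut: the smaller of n1 - 1 and n2 - 1."""
    return min(len(x), len(y)) - 1


if __name__ == "__main__":
    sample = [12.1, 11.8, 12.4, 12.9, 12.3, 12.7, 12.0, 12.6, 12.5, 12.2]  # illustrative
    t, df = one_sample_t(sample, mu0=12.0)
    t_crit = 2.262  # table value for df = 9, two-sided alpha = 0.05
    verdict = "reject H0" if abs(t) > t_crit else "fail to reject H0"
    print(f"t = {t:.3f}, df = {df}: {verdict}")
```

The Welch value always lies between the conservative shortcut and n1 + n2 - 2, so using the shortcut can only make the test harder to reject.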