Informal Hypothesis Testing for Correlation/Association

Understanding Correlation

  • Correlation measures the strength and direction of a linear relationship between two variables.
  • It’s measured on a scale from -1 to 1, with -1 indicating a perfect negative correlation, 1 indicating a perfect positive correlation, and 0 indicating no linear correlation (a non-linear relationship may still exist).

Sample Correlation Coefficient (r)

  • The Sample Correlation Coefficient (r) assesses the degree of linear association between two variables, X and Y, in a sample.
  • The value of r ranges from -1 to 1: -1 indicates perfect negative linear correlation, 0 indicates no linear correlation, and 1 indicates perfect positive linear correlation.
  • Values of r close to 1 or -1 suggest a strong linear association; values close to 0 suggest a weak one.
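
As a minimal sketch of how r is computed, the Python function below applies the usual definition: the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the hours/score figures are illustrative assumptions, not real data.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours of revision and test score for five students
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = pearson_r(hours, score)
print(f"r = {r:.3f}")  # close to 1: strong positive linear association
```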

Null and Alternative Hypotheses for Correlation

  • The Null Hypothesis (H0) in testing correlation assumes that there is no association between the two variables, i.e., the population correlation coefficient (ρ) is 0.
  • The Alternative Hypothesis (H1) suggests that there is an association; i.e., the population correlation coefficient is not 0. This is a non-directional (two-sided) test.
  • When using a directional (one-sided) test, the alternative hypothesis states either that the correlation is less than 0 (negative correlation) or that it is greater than 0 (positive correlation).
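
In symbols, with ρ denoting the population correlation coefficient, the hypotheses described above can be written as:

```latex
\begin{align*}
H_0 &: \rho = 0 \\
H_1 &: \rho \neq 0 \quad \text{(two-sided)} \\
H_1 &: \rho > 0 \quad \text{or} \quad H_1 : \rho < 0 \quad \text{(one-sided)}
\end{align*}
```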

Significance Level and Decision Rule

  • The Significance Level (commonly 0.05) is the probability of rejecting the null hypothesis when it is in fact true (a Type I error).
  • The decision to reject or not reject the null hypothesis is made by comparing the p-value of the observed r with the significance level.
  • If the p-value is less than or equal to the significance level, the null hypothesis is rejected in favour of the alternative hypothesis, suggesting a statistically significant correlation.
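
The decision rule can be sketched in Python as follows. Note the swapped-in technique: instead of reading a p-value from tables or software output, this sketch uses an exact permutation test, which asks how often a rearrangement of the y-values gives a correlation at least as extreme as the one observed. The function names, the `alternative` parameter, and the data are assumptions made for illustration, and the approach is only practical for small samples because it enumerates every ordering.

```python
import math
from itertools import permutations

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys, alternative="two-sided"):
    """Exact permutation p-value for H0: no association between x and y."""
    r_obs = pearson_r(xs, ys)
    tol = 1e-12  # guards against floating-point ties
    extreme = 0
    total = 0
    for perm in permutations(ys):
        r_perm = pearson_r(xs, perm)
        if alternative == "two-sided":
            hit = abs(r_perm) >= abs(r_obs) - tol
        elif alternative == "greater":
            hit = r_perm >= r_obs - tol
        elif alternative == "less":
            hit = r_perm <= r_obs + tol
        else:
            raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
        extreme += hit
        total += 1
    return r_obs, extreme / total

# Hypothetical paired sample (n = 7, so 5040 orderings are checked)
x = [1, 2, 3, 4, 5, 6, 7]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2]

alpha = 0.05  # significance level
r_obs, p_value = permutation_p_value(x, y)
reject_h0 = p_value <= alpha
print(f"r = {r_obs:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Passing alternative="greater" or alternative="less" gives the one-sided versions of the test.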

Association in Scatter Plots

  • Evidence of a potential association between two variables can initially be checked using a scatter plot.
  • Points scattered with no visible pattern suggest little or no correlation. A clear upward trend indicates positive correlation, while a clear downward trend indicates negative correlation.
  • However, note that correlation does not imply causation and that scatter plots do not prove a cause-and-effect relationship.

Testing for Correlation with Large Sample Sizes

  • With large sample sizes, a weak or modest correlation coefficient can still achieve statistical significance.
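  • For this reason, it’s important to also examine a confidence interval for the population correlation coefficient, which gives a range of plausible values and shows how strong the correlation might actually be.
  • A confidence interval that includes 0 indicates that the null hypothesis of no correlation cannot be rejected in a two-sided test at the matching significance level (for example, a 95% interval corresponds to a significance level of 0.05).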
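
To illustrate the effect of sample size, the sketch below computes an approximate confidence interval for ρ using the Fisher z-transformation. This is one common approximation, chosen here as an assumption; it relies on the data being roughly bivariate normal. The function name fisher_ci and the values r = 0.1, n = 30 and n = 1000 are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for rho via the Fisher z-transformation."""
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r)            # transform r to the z scale
    se = 1 / math.sqrt(n - 3)    # approximate standard error on the z scale
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Build the interval on the z scale, then transform back to the r scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# The same weak sample correlation, observed in a small and a large sample
small = fisher_ci(0.1, 30)
large = fisher_ci(0.1, 1000)
print("n = 30:   (%.3f, %.3f)" % small)
print("n = 1000: (%.3f, %.3f)" % large)
```

With n = 30 the interval contains 0, so the null hypothesis is not rejected. With n = 1000 the interval excludes 0, so the result is statistically significant, yet the whole interval stays below 0.2: the correlation is detectable but weak.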

Power and Sample Size in Association Tests

  • A larger sample size increases the power of the correlation test, i.e., the probability of correctly rejecting a false null hypothesis. This is because a larger sample reduces the standard error of the estimate, so a given true correlation tends to produce a larger test statistic.
  • Conversely, a small sample size may fail to detect a true correlation due to low power (higher chance of a Type II error).
  • As with any statistical test, both the statistical significance and the practical importance of the result should be considered. A statistically significant correlation may not necessarily be practically important, particularly if the correlation is weak.
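
The link between sample size and power can be explored with a small simulation. The sketch below is illustrative only: it assumes bivariate normal data with a true correlation of 0.3, uses a two-sided test based on the Fisher z statistic (an approximation), and fixes a random seed so the run is repeatable. The function names, the number of trials, and the sample sizes are all assumptions.

```python
import math
import random
from statistics import NormalDist

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def estimated_power(rho, n, alpha=0.05, trials=500, seed=0):
    """Monte Carlo estimate of power for a two-sided test of H0: rho = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        # y has variance 1 and population correlation rho with x
        ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
        r = pearson_r(xs, ys)
        # Fisher z statistic: approximately standard normal when rho = 0
        z_stat = math.atanh(r) * math.sqrt(n - 3)
        if abs(z_stat) > z_crit:
            rejections += 1
    # Proportion of simulated samples in which H0 was (correctly) rejected
    return rejections / trials

power_small = estimated_power(rho=0.3, n=20)
power_large = estimated_power(rho=0.3, n=200)
print(f"estimated power, n = 20:  {power_small:.2f}")
print(f"estimated power, n = 200: {power_large:.2f}")
```

Under these assumptions the small sample misses the true correlation in most trials (a Type II error), while the large sample detects it almost every time.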