Informal Hypothesis Testing for Correlation/Association
Understanding Correlation
- Correlation measures the strength and direction of a linear relationship between two variables.
- It is measured on a scale from -1 to 1: -1 indicates a perfect negative linear correlation, 1 a perfect positive linear correlation, and 0 no linear correlation.
Sample Correlation Coefficient (r)
- The Sample Correlation Coefficient (r) assesses the degree of linear association between two variables, X and Y, in a sample.
- The value of r ranges from -1 to 1: -1 shows perfect negative correlation, 0 shows no linear correlation, and 1 shows perfect positive correlation.
- Values of r close to 1 or -1 suggest a strong linear association; values close to 0 suggest a weak one.
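To make the definition concrete, here is a minimal sketch of how r can be computed from paired data using the usual sums-of-squares formula. The function name `pearson_r` and the data (hours studied against a test score) are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 3))  # 6 / sqrt(60), about 0.775
```

Here r is about 0.775, a fairly strong positive linear association in this small made-up sample.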
Null and Alternative Hypotheses for Correlation
- The Null Hypothesis (H0) in testing correlation assumes that there is no association between the two variables, i.e., the population correlation coefficient (ρ) is 0.
- The Alternative Hypothesis (H1) suggests that there is an association; i.e., the population correlation coefficient is not 0. This is a non-directional (two-sided) test.
- When using a directional (one-sided) test, the alternative hypothesis states either that the correlation is less than 0 (negative correlation, ρ < 0) or that it is greater than 0 (positive correlation, ρ > 0).
Significance Level and Decision Rule
- The Significance Level (commonly 0.05) is the probability of rejecting the null hypothesis when it is in fact true (a Type I error).
- The decision to reject or not reject the null hypothesis is based on the observed value of r and its corresponding p-value compared to the significance level.
- If the p-value is less than or equal to the significance level, the null hypothesis is rejected in favour of the alternative hypothesis, suggesting a statistically significant correlation.
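The decision rule above can be sketched in code. These notes do not say how the p-value is obtained (in practice it usually comes from critical-value tables or statistical software), so the sketch below swaps in a permutation test: it shuffles one variable many times and counts how often the shuffled |r| is at least as large as the observed one. The function names, the data, the number of shuffles and the seed are all illustrative assumptions.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient r (assumes neither variable is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Two-sided p-value for H0: no association, estimated by shuffling ys."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical data with a strong upward trend
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]

alpha = 0.05  # significance level
p_value = permutation_p_value(x, y)
reject_h0 = p_value <= alpha  # decision rule: reject H0 when p <= alpha
print(p_value, reject_h0)
```

For a one-sided alternative, the same idea applies with the signed value of r compared in one direction only instead of |r|.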
Association in Scatter Plots
- Evidence of a potential association between two variables can initially be checked using a scatter plot.
- Points scattered with no visible pattern suggest little or no correlation. A clear upward trend indicates positive correlation, while a clear downward trend indicates negative correlation.
- However, note that correlation does not imply causation and that scatter plots do not prove a cause-and-effect relationship.
Testing for Correlation with Large Sample Sizes
- With large sample sizes, a weak or modest correlation coefficient can still achieve statistical significance.
- For this reason, it’s important to also examine a confidence interval for the population correlation coefficient, which gives a range of plausible values for ρ.
- A confidence interval that includes 0 indicates that the null hypothesis of no correlation cannot be rejected at the corresponding significance level (for example, a 95% interval corresponds to a two-sided test at the 0.05 level).
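The notes do not specify how the interval is constructed. One common approximation is the Fisher z-transformation, sketched below under the assumption that the data are roughly bivariate normal. The function name `fisher_ci` and the values r = 0.10, n = 50 and n = 2000 are illustrative only.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for rho via the Fisher z-transformation."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("the approximation needs n > 3")
    z = math.atanh(r)            # transform r to the z scale
    se = 1 / math.sqrt(n - 3)    # approximate standard error on the z scale
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Build the interval on the z scale, then transform back to the r scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# The same weak sample correlation at two hypothetical sample sizes
small = fisher_ci(0.10, 50)
large = fisher_ci(0.10, 2000)
print(small)
print(large)
```

With n = 50 the interval spans 0, so the null hypothesis is not rejected. With n = 2000 the interval excludes 0 but stays close to it: the correlation is statistically significant yet still weak.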
Power and Sample Size in Association Tests
- A larger sample size increases the power of the correlation test, i.e., the probability of correctly rejecting a false null hypothesis. This is because a larger sample reduces the standard error of the estimate, so a true non-zero correlation tends to produce a test statistic that is larger in magnitude.
- Conversely, a small sample size may fail to detect a true correlation due to low power (higher chance of a Type II error).
- As with any statistical test, both the statistical significance and the practical importance of the result should be considered. A statistically significant correlation may not necessarily be practically important, particularly if the correlation is weak.
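The link between sample size and power can be illustrated with a rough calculation. The sketch below uses a normal approximation based on the Fisher z-transformation, which is an assumption of this example and not a method stated in these notes. The function name `approx_power` and the chosen values (ρ = 0.3, n = 20 and n = 100) are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(rho, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0 (Fisher z approximation)."""
    if n <= 3:
        raise ValueError("the approximation needs n > 3")
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic: grows with sqrt(n - 3)
    shift = abs(math.atanh(rho)) * math.sqrt(n - 3)
    # Probability that the statistic lands in either rejection region
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

# Same true correlation, two hypothetical sample sizes
power_small = approx_power(0.3, 20)
power_large = approx_power(0.3, 100)
print(round(power_small, 2), round(power_large, 2))
```

Under these assumptions, a true correlation of 0.3 is detected only about a quarter of the time with n = 20, but more than 80% of the time with n = 100. When ρ = 0 the function returns α, the Type I error rate.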