Hypothesis tests using Pearson's product-moment correlation coefficient
Hypothesis Tests using Pearson’s Product-Moment Correlation Coefficient: Understanding the Basics
- Pearson’s Product-Moment Correlation Coefficient quantifies the strength and direction of the linear relationship between two variables, usually denoted X and Y. The population value is written ρ, and the value calculated from a sample is written r.
- The Pearson’s correlation coefficient value ranges from -1 (indicating a perfect negative linear relationship) through 0 (no linear relationship) to +1 (a perfect positive linear relationship).
- A Hypothesis Test using Pearson’s correlation coefficient is used to determine whether there is evidence of a significant linear relationship between two variables.
- The Null Hypothesis (H_0) in this context is usually that there is no correlation in the population, that is, ρ = 0.
- The Alternative Hypothesis (H_1) is typically that there is a correlation, that is, ρ ≠ 0 (a two-tailed test). It could instead state that the correlation is less than zero (ρ < 0) or greater than zero (ρ > 0), giving a one-tailed test, depending on the nature of the relationship being studied.
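To make the range of values concrete, here is a minimal Python sketch that computes r for three small made-up data sets. The data and the helper name `pearson_r` are illustrative assumptions, not part of the notes above. It uses the equivalent sum-of-products form r = S_xy / √(S_xx · S_yy).

```python
import math

def pearson_r(xs, ys):
    """Sample correlation via r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical illustration data
x = [-2, -1, 0, 1, 2]
r_positive = pearson_r(x, [1, 3, 5, 7, 9])  # y = 2x + 5, rising straight line
r_negative = pearson_r(x, [9, 7, 5, 3, 1])  # y = -2x + 5, falling straight line
r_curved = pearson_r(x, [4, 1, 0, 1, 4])    # y = x^2, symmetric curve
print(r_positive, r_negative, r_curved)
```

The third data set gives r = 0 even though y is completely determined by x, because the relationship is not linear.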
Running Pearson’s Correlation Hypothesis Tests
- The Test Statistic for Pearson’s Correlation Coefficient is calculated by dividing the sample correlation coefficient (r) by the standard error of the correlation coefficient, √((1 - r²) / (n - 2)). This gives
t = r√(n - 2) / √(1 - r²)
The sample correlation coefficient itself is calculated as
r = ∑[((x_i - x̅) / s_x) * ((y_i - ȳ) / s_y)] / (n - 1)
where (x_i, y_i) are the data points, (x̅, ȳ) are their means, s_x and s_y are their sample standard deviations, and n is the number of pairs.
- The calculated Test Statistic value is then compared to a critical value from a t-distribution table with n-2 degrees of freedom at the chosen significance level (for example, 5%), where n is the number of pairs of data points.
- Degrees of Freedom (df) for a Pearson’s correlation hypothesis test is the number of pairs of data points (n) minus 2, that is, df = n-2.
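The calculation can be sketched in Python as follows. This is a minimal illustration using only the standard library: r is computed from standardised values, and the Test Statistic is then obtained with the standard formula t = r√(n - 2) / √(1 - r²). The function names and the five data pairs are hypothetical.

```python
import math
import statistics

def pearson_r_from_z_scores(xs, ys):
    """r = sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 pairs of equal-length data")
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # n - 1 divisor
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

def t_statistic(r, n):
    """Return (t, df) for testing H_0: rho = 0, with df = n - 2."""
    df = n - 2
    denom = 1 - r * r
    if denom <= 0:
        # |r| = 1 (perfect correlation): the statistic is unbounded
        return math.copysign(math.inf, r), df
    return r * math.sqrt(df) / math.sqrt(denom), df

# Hypothetical illustration data
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r_from_z_scores(xs, ys)
t, df = t_statistic(r, len(xs))
print(f"r = {r:.3f}, t = {t:.3f}, df = {df}")
```

For these five pairs, r is 0.8 and there are 3 degrees of freedom, so the Test Statistic is about 2.31.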
Interpreting the Results of Pearson’s Correlation Hypothesis Tests
- For a two-tailed test, if the absolute value of the Test Statistic is greater than the critical value, the Null Hypothesis (H_0) is rejected in favour of the Alternative Hypothesis (H_1). This would suggest that there is a significant linear correlation between the two variables.
- Conversely, if the absolute value of the Test Statistic is less than or equal to the critical value, there is insufficient evidence to reject the Null Hypothesis (H_0). This means the data do not show a significant linear correlation between the two variables; it does not prove that no correlation exists.
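The decision rule can be written as a short function. This is a sketch under stated assumptions: the function name `reject_null` and its `alternative` argument are hypothetical, and the critical values are read from a standard t-distribution table at the 5% level (two-tailed) rather than computed.

```python
def reject_null(t_stat, critical_value, alternative="two-sided"):
    """Return True if H_0 (rho = 0) is rejected.

    critical_value is the positive table value for the chosen
    significance level and n - 2 degrees of freedom.
    """
    if critical_value <= 0:
        raise ValueError("critical_value must be positive")
    if alternative == "two-sided":   # H_1: rho != 0
        return abs(t_stat) > critical_value
    if alternative == "greater":     # H_1: rho > 0
        return t_stat > critical_value
    if alternative == "less":        # H_1: rho < 0
        return t_stat < -critical_value
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

# Hypothetical case A: t = 2.309 with df = 3; table value 3.182
decision_a = reject_null(2.309, 3.182)
# Hypothetical case B: t = 2.887 with df = 25; table value 2.060
decision_b = reject_null(2.887, 2.060)
print(decision_a, decision_b)
```

In case A the Test Statistic falls short of the critical value, so H_0 is not rejected; in case B it exceeds the critical value, so H_0 is rejected.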
Assumptions and Limitations of Pearson’s Correlation
- Pearson’s correlation assumes that both variables are measured on a continuous scale, and the hypothesis test additionally assumes that they follow a bivariate normal distribution. It is not appropriate for ordinal data or where the relationship is not linear.
- The correlation coefficient only quantifies the strength and direction of a linear relationship. It does not imply causation or depict any complex non-linear relationships between variables.
- Outliers can severely impact the correlation coefficient, often making it seem stronger or weaker than it really is. Therefore, review scatter plots visually to confirm the appropriateness of using this coefficient.
- Finally, the hypothesis test is sensitive to sample size: with a large sample, even a very weak correlation can be statistically significant. Significance should therefore not be mistaken for strength; judge the strength of the relationship from the size of r itself.
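The effect of sample size can be seen by holding r fixed and varying n. The sketch below uses a hypothetical weak correlation of r = 0.2 and the standard formula t = r√(n - 2) / √(1 - r²); the helper name `t_from_r` is an assumption for illustration.

```python
import math

def t_from_r(r, n):
    """Test statistic for H_0: rho = 0, given sample r and n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r = 0.2  # hypothetical weak correlation
t_values = {n: t_from_r(r, n) for n in (10, 100, 1000)}
for n, t in t_values.items():
    print(f"n = {n:4d}: t = {t:.2f}, r^2 = {r * r:.2f}")
```

With n = 10 the Test Statistic is well below the 5% two-tailed critical value of 2.306 for 8 degrees of freedom, while with n = 1000 it far exceeds the large-sample critical value of roughly 1.96. In both cases r² is only 0.04, so the linear relationship is equally weak.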