Hypothesis tests using Pearson's product-moment correlation coefficient – A Level Further Mathematics OCR Revision

Hypothesis Tests using Pearson’s Product-Moment Correlation Coefficient: Understanding the Basics

Pearson’s Product-Moment Correlation Coefficient (PMCC) measures the strength and direction of the linear relationship between two variables, usually denoted X and Y. The population value is written ρ and the value calculated from a sample is written r.
The value of Pearson’s correlation coefficient ranges from -1 (a perfect negative linear relationship) through 0 (no linear relationship) to +1 (a perfect positive linear relationship).
A Hypothesis Test using Pearson’s correlation coefficient uses the sample value r to decide whether there is evidence of a linear relationship between the two variables in the population, that is, whether ρ differs from 0.
The Null Hypothesis (H_0) is usually that there is no correlation in the population: H_0: ρ = 0.
The Alternative Hypothesis (H_1) for a two-tailed test is that there is a correlation: H_1: ρ ≠ 0. For a one-tailed test it is H_1: ρ > 0 or H_1: ρ < 0, depending on the direction of the relationship being investigated.

The Test Statistic for Pearson’s correlation coefficient is calculated by dividing the sample correlation coefficient (r) by its standard error, √((1 - r²) / (n - 2)).

The sample correlation coefficient is r = ∑[((x_i - x̅) / s_x) × ((y_i - ȳ) / s_y)] / (n - 1)

where (x_i, y_i) are the data points, (x̅, ȳ) are their means, s_x and s_y are their sample standard deviations (calculated with divisor n - 1), and n is the number of pairs. Dividing r by its standard error gives the Test Statistic t = r√(n - 2) / √(1 - r²).
The calculated Test Statistic is then compared with a critical value from the t-distribution with n - 2 degrees of freedom (df = n - 2), where n is the number of pairs of data points.
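The calculation can be sketched in Python using only the standard library. `pearson_r` follows the z-score formula for r with sample standard deviations from `statistics.stdev`, and `t_statistic` divides r by its standard error. The function names and the five data pairs are hypothetical, chosen only for illustration.

```python
import math
import statistics

def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample PMCC: sum of products of standardised values, divided by (n - 1)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # sample SDs, divisor n - 1
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

def t_statistic(r: float, n: int) -> tuple[float, int]:
    """Return (t, df), where t = r / SE(r) and SE(r) = sqrt((1 - r^2) / (n - 2))."""
    df = n - 2
    se = math.sqrt((1 - r * r) / df)  # zero when |r| = 1, so t is undefined there
    return r / se, df

# Hypothetical sample of five pairs.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t, df = t_statistic(r, len(x))
print(f"r = {r:.4f}, t = {t:.4f}, df = {df}")
```

For this data the sketch gives r ≈ 0.7746, t ≈ 2.1213 and df = 3. When |r| = 1 the standard error is 0 and the division raises an error, which matches the statistic being undefined in that case.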

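For a two-tailed test at significance level α (for example 5%), if the absolute value of the Test Statistic is greater than the critical value (the upper α/2 point of the t-distribution), the Null Hypothesis (H_0) is rejected in favour of the Alternative Hypothesis (H_1). This provides evidence of a linear correlation between the two variables in the population. For a one-tailed test, use the upper α point instead, and reject H_0 only if the Test Statistic lies beyond it in the direction stated in H_1.
Conversely, if the Test Statistic does not fall in the critical region, there is insufficient evidence to reject the Null Hypothesis (H_0). This does not prove that ρ = 0; it only means the sample does not provide significant evidence of a linear correlation.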
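Python's standard library has no t-distribution, so the sketch below approximates the critical value itself: it integrates the t density with Simpson's rule and finds the required point by bisection. This is a numerical stand-in for reading the value from tables (or calling a statistics library), not a prescribed method, and `t_critical`, `pmcc_test` and its `tail` argument are hypothetical names. It applies the decision rule for the two-tailed test and for both one-tailed alternatives.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule for the area from 0 to |x|, then symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(p: float, df: int) -> float:
    """Value c with P(T <= c) = p, for 0.5 < p < 1, found by bisection on [0, 1000]."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pmcc_test(r: float, n: int, alpha: float = 0.05, tail: str = "two") -> bool:
    """Return True if H_0: rho = 0 is rejected at significance level alpha."""
    df = n - 2
    if abs(r) >= 1:
        t = math.copysign(math.inf, r)  # |r| = 1: the statistic is unbounded
    else:
        t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    if tail == "two":      # H_1: rho != 0, compare |t| with the upper alpha/2 point
        return abs(t) > t_critical(1 - alpha / 2, df)
    if tail == "greater":  # H_1: rho > 0, compare t with the upper alpha point
        return t > t_critical(1 - alpha, df)
    if tail == "less":     # H_1: rho < 0, compare t with the lower alpha point
        return t < -t_critical(1 - alpha, df)
    raise ValueError("tail must be 'two', 'greater' or 'less'")

print(round(t_critical(0.975, 10), 3))  # two-tailed 5% critical value for df = 10
print(pmcc_test(0.9, 10))               # strong correlation with n = 10
print(pmcc_test(0.7746, 5))             # t is about 2.12, below the df = 3 critical value
```

The first line should print 2.228, matching standard t tables. With r = 0.7746 and n = 5, t ≈ 2.12 falls short of the two-tailed 5% critical value of about 3.18, so H_0 is not rejected even though r looks fairly large.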

Pearson’s correlation assumes that both variables are measured on a continuous scale, and the hypothesis test additionally assumes that the data come from a bivariate normal distribution. It is not appropriate for ordinal (ranked) data or where the relationship is not linear.
The correlation coefficient only quantifies the strength and direction of a linear relationship. It does not imply causation, and it cannot detect non-linear relationships: two variables can be closely related along a curve and still have r close to 0.
Outliers can severely distort the correlation coefficient, making the relationship appear stronger or weaker than it really is. Therefore, always inspect a scatter diagram to confirm that Pearson’s coefficient is appropriate.
Finally, significance depends on sample size: with a large sample, even a weak correlation can be statistically significant. A significant result is evidence that ρ ≠ 0, not evidence that the relationship is strong, so the size of r should be interpreted separately from the outcome of the test.
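The outlier and sample-size caveats can both be checked numerically. The sketch below uses made-up data: five points with r = 0 appear strongly correlated once a single extreme point is added. It then rearranges t = r√(n - 2) / √(1 - r²) into r = t / √(n - 2 + t²) to find the smallest significant |r|. The two-tailed 5% critical t values (2.306 for df = 8 and about 1.9845 for df = 98) are typed in from standard t tables as assumed inputs rather than computed, and the helper names are hypothetical. The form Sxy / √(Sxx × Syy) used here is algebraically equivalent to the z-score formula.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample PMCC from centred sums: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def critical_r(t_crit: float, df: int) -> float:
    """Smallest |r| that is significant, from t = r*sqrt(df)/sqrt(1 - r^2) rearranged."""
    return t_crit / math.sqrt(df + t_crit ** 2)

# Outliers: five hypothetical points with no linear trend, then one extreme point added.
x, y = [1, 2, 3, 4, 5], [4, 1, 3, 5, 2]
r_clean = pearson_r(x, y)
r_outlier = pearson_r(x + [15], y + [15])
print(f"without outlier r = {r_clean:.3f}, with outlier r = {r_outlier:.3f}")

# Sample size: two-tailed 5% critical t values taken from standard tables (assumed inputs).
r_crit_n10 = critical_r(2.306, 8)     # n = 10, df = 8
r_crit_n100 = critical_r(1.9845, 98)  # n = 100, df = 98
print(f"n = 10 needs |r| > {r_crit_n10:.3f}; n = 100 needs |r| > {r_crit_n100:.3f}")
```

One added point moves r from 0 to about 0.923. With n = 10, |r| must exceed about 0.632 to be significant at the 5% level, but with n = 100 about 0.197 is enough, even though a correlation of that size is weak.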