Pearson's product-moment correlation coefficient – A Level Further Mathematics OCR Revision

Pearson’s product-moment correlation coefficient

Pearson’s product-moment correlation coefficient (often just referred to as Pearson’s correlation coefficient or simply Pearson’s r) is a statistic that measures the strength and direction of the linear relationship between two quantitative variables.
This coefficient, denoted by r, can take any value from -1 to 1 inclusive. The closer r is to -1 or 1, the stronger the linear relationship between the variables.
A value of r = +1 indicates a perfect positive linear relationship, r = -1 indicates a perfect negative linear relationship, and r = 0 indicates no linear relationship.

Pearson’s correlation coefficient is calculated as r = Σ[(xi - x̄)(yi - ȳ)] / [(n - 1) sx sy], where xi and yi are the individual data values indexed by i, x̄ and ȳ are the sample means, sx and sy are the sample standard deviations, and n is the number of paired observations.
This formula normalizes the covariance: it divides the sample covariance of the two variables by the product of their sample standard deviations, so r has no units.
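
As a minimal sketch of the formula above, the following Python function computes r directly from the definition. The function name pearson_r and the five data pairs are made up for illustration; they are not taken from any exam question.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the formula r = Σ(xi - x̄)(yi - ȳ) / ((n - 1) sx sy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sample standard deviations (divisor n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return s_xy / ((n - 1) * s_x * s_y)

# Illustrative data only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746, a fairly strong positive linear relationship
```

For this data the sum of products of deviations is 6, and the sums of squared deviations are 10 and 6, so r = 6 / √60 ≈ 0.7746.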

Pearson’s correlation coefficient is symmetric: correlating X with Y is the same as correlating Y with X.
The value of Pearson’s correlation is unaffected by adding a constant to a variable or multiplying it by a positive constant. Multiplying one variable by a negative constant reverses the sign of r but leaves its magnitude unchanged.
However, Pearson’s correlation requires that both variables are measured on at least an interval scale, and it also assumes linearity and homoscedasticity of the variables.
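
The symmetry and transformation properties described above can be checked numerically. This is a small sketch with made-up data and a hypothetical helper pearson_r; it uses the algebraically equivalent form r = Sxy / √(Sxx Syy), where Sxy, Sxx and Syy are sums of products and squares of deviations from the means.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Illustrative data only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_swapped = pearson_r(ys, xs)                        # roles of X and Y exchanged
r_coded = pearson_r([10 * x + 3 for x in xs], ys)    # positive scale factor plus shift
r_negated = pearson_r([-2 * x + 7 for x in xs], ys)  # negative scale factor plus shift

print(r, r_swapped, r_coded, r_negated)
```

The first three values agree up to floating-point rounding, while the last has the same magnitude with the opposite sign.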

The sign of r indicates the direction of the correlation: positive values indicate a positive correlation, negative values a negative correlation.
The magnitude indicates the strength of the relationship: the closer the coefficient is to ±1, the stronger the correlation.
Pearson’s coefficient is widely used in scientific research to quantify the degree of linear association between variables.
Even though correlation does not imply causation, this statistic is nonetheless valuable as it can highlight trends in data and suggest potential hypotheses to test.

Pearson’s r only measures linear relationships. Therefore, variables that have a nonlinear relationship may have a correlation coefficient close to zero – even if there is a clear and predictable pattern between them.
Another limitation of Pearson’s r is that it is sensitive to outliers. An unusual observation may greatly increase or decrease the correlation coefficient.
Pearson’s r also gives no information about the slope of the line or the shape of the relationship, only the strength and direction of the linear association. It is therefore good practice to examine a scatter plot alongside r to understand the nature of the association.
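
Both limitations can be illustrated with a short sketch. The data sets and the helper pearson_r below are invented for the purpose: the first is an exact quadratic relationship on x-values symmetric about zero, and the second is a perfectly linear data set to which one extreme point is added.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Limitation 1: a perfectly predictable but nonlinear pattern, y = x^2
xs_curve = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = pearson_r(xs_curve, ys_curve)

# Limitation 2: one outlier added to points lying exactly on y = x
xs_line = [1, 2, 3, 4, 5]
ys_line = [1, 2, 3, 4, 5]
r_line = pearson_r(xs_line, ys_line)
r_outlier = pearson_r(xs_line + [6], ys_line + [-10])

print(r_curve)    # zero: the positive and negative products of deviations cancel
print(r_line)     # perfect positive correlation before the outlier is added
print(r_outlier)  # negative: a single point has reversed the sign of r
```

In the quadratic case r is zero even though y is completely determined by x, and in the second case a single unusual point changes r from +1 to a negative value. A scatter plot would reveal both situations immediately.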