Bivariate Data, Association and Correlation
Defining Bivariate Data, Association and Correlation
- Bivariate data consists of two variables measured on the same individual or item, and can be displayed on a scatterplot.
- An association exists in bivariate data when changes in one variable tend to be accompanied by changes in the other variable.
- Correlation is a statistical measure of the strength and direction of the relationship between two variables.
Types of Association
- Positive Association: As the value of one variable increases, the value of the other variable also increases and vice versa.
- Negative Association: When the value of one variable increases, the value of the other variable decreases, or vice versa.
- No Association: There is no apparent relationship between the two variables.
Correlation Coefficient
- The correlation coefficient, denoted by ‘r’, measures the strength and direction of a linear relationship between two variables on a scatterplot.
- It ranges from -1 to +1, where -1 signifies a perfect negative linear correlation, +1 a perfect positive linear correlation, and 0 no linear correlation.
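As a rough illustration of how r can be computed, here is a minimal Python sketch using the standard product-moment formula (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the `hours`/`scores` data are made up for this example and are not part of these notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours of revision and test score for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
print(round(r, 3))  # close to +1: strong positive linear correlation
```

Points lying exactly on a rising straight line give r = +1, and points lying exactly on a falling straight line give r = -1.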
Interpreting the Correlation Coefficient
- A correlation coefficient close to 1 indicates a strong positive linear relationship, while a value close to -1 indicates a strong negative linear relationship.
- A correlation coefficient close to 0 suggests a weak or non-existent linear relationship.
- Note that correlation does not imply causation; even though two variables may be strongly correlated, it does not mean that changes in one variable cause changes in the other.
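A value of r near 0 only rules out a linear relationship, not every kind of relationship. The short Python sketch below illustrates this with made-up data in which y is exactly the square of x; the helper `pearson_r` is written for this example only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y depends on x exactly, but along a curve, not a line
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(r_curve)  # 0.0: no linear correlation, despite the clear pattern
```

This is one reason to look at the scatterplot before relying on the value of r.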
Spearman’s Rank Correlation Coefficient
- Spearman’s Rank correlation coefficient is calculated from the ranks of the data rather than the raw values. It is a commonly used alternative to the correlation coefficient r when the data is not normally distributed or the relationship is not linear.
- It can detect any monotonic relationship (increasing or decreasing) as opposed to just linear.
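One common way to compute Spearman’s Rank is to rank each variable and then apply the ordinary correlation formula to the ranks. The Python sketch below takes that approach, giving tied values the average of their positions; the function names and the data are assumptions made for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's Rank: the correlation coefficient of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y = x cubed is always increasing but not linear
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

rho = spearman_rho(xs, ys)  # 1.0, because the ranks agree perfectly
r = pearson_r(xs, ys)       # less than 1, because the points are not on a line
print(rho, round(r, 3))
```

Because the ranks agree exactly, Spearman’s Rank is 1 even though r is below 1.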
Calculating and Interpreting Bivariate Data
- Bivariate data is typically presented in a table, scatterplot, or correlation matrix.
- Before calculating a correlation, check that the measure suits the data: the correlation coefficient r is appropriate for linear relationships, and Spearman’s Rank for monotonic relationships.
- Always consider outliers, as these can have a significant impact on the correlation.
- A line of best fit, such as the median-median line or the least squares line, gives a visual summary of the relationship in the data.
- Remember to interpret the relationship within the context of the data, considering external influences or explanations.
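To see how much a single outlier can matter, the Python sketch below fits a least squares line (gradient = Sxy / Sxx, intercept = mean of y minus gradient times mean of x) and computes r, first for made-up data lying exactly on y = 2x and then with one outlying point added. The function name `fit_and_correlate` and all data values are hypothetical.

```python
from math import sqrt

def fit_and_correlate(xs, ys):
    """Return (gradient, intercept, r) for the least squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x  # the line passes through the mean point
    r = sxy / sqrt(sxx * syy)
    return gradient, intercept, r

# Hypothetical data lying exactly on the line y = 2x
clean_x = [1, 2, 3, 4, 5]
clean_y = [2, 4, 6, 8, 10]

# The same data with one outlier added at (6, 0)
outlier_x = clean_x + [6]
outlier_y = clean_y + [0]

clean_fit = fit_and_correlate(clean_x, clean_y)
outlier_fit = fit_and_correlate(outlier_x, outlier_y)

print(clean_fit)    # gradient 2, intercept 0, r = 1
print(outlier_fit)  # much flatter line, r falls to about 0.14
```

A single point is enough here to turn a perfect correlation into a very weak one, which is why outliers should be identified and considered in context before interpreting r.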