Bivariate Data, Association and Correlation

Defining Bivariate Data, Association and Correlation

  • Bivariate data consists of two different variables measured on the same individual or item, and can be represented graphically in a scatterplot.
  • Association in bivariate data is evident when changes in one variable tend to be accompanied by changes in the other variable.
  • Correlation is a statistical measure that describes the strength and direction of the relationship between two variables.

Types of Association

  • Positive Association: As the value of one variable increases, the value of the other variable also increases and vice versa.
  • Negative Association: When the value of one variable increases, the value of the other variable decreases, or vice versa.
  • No Association: There is no apparent relationship between the two variables.

Correlation Coefficient

  • The correlation coefficient, denoted by ‘r’, measures the strength and direction of a linear relationship between two variables on a scatterplot.
  • It ranges from -1 to +1, where -1 signifies a perfect negative linear correlation, +1 a perfect positive linear correlation, and 0 no linear correlation.
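
As a minimal sketch of how r can be computed, the Python below uses the standard formula r = Sxy / sqrt(Sxx × Syy), where Sxy, Sxx and Syy are sums of products of deviations from the means. The function name pearson_r and the hours/scores data are made up for illustration and are not part of the original notes.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r for paired data (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours revised and test score for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, scores), 3))
```

For this made-up data the result is close to +1, which matches the clear upward trend in the scores.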

Interpreting the Correlation Coefficient

  • A correlation coefficient close to +1 indicates a strong positive linear relationship, while a value close to -1 indicates a strong negative linear relationship.
  • A correlation coefficient close to 0 suggests a weak or non-existent linear relationship.
  • Note that correlation does not imply causation; even though two variables may be strongly correlated, it does not mean that changes in one variable cause changes in the other.

Spearman’s Rank Correlation Coefficient

  • Spearman’s rank correlation coefficient is calculated from the ranks of the data rather than the raw values, and is a commonly used alternative to the correlation coefficient r when the data do not follow a normal distribution.
  • It can detect any monotonic relationship (increasing or decreasing) as opposed to just linear.
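
The sketch below shows one way to compute Spearman’s rank correlation: rank each variable, then apply the usual correlation formula to the ranks. The helper names average_ranks and spearman_rs and the sample data are assumptions for illustration. Tied values are given the mean of the ranks they share, which is a common convention.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rs(xs, ys):
    """Spearman's rank correlation: the correlation coefficient of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x = sum(rx) / n
    mean_y = sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: y = x**3 is increasing but not linear
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rs(x, y))
```

Because y always increases as x increases, the ranks agree exactly and the coefficient is +1 even though the relationship is curved. When there are no tied ranks, the shortcut 1 - 6Σd² / (n(n² - 1)), with d the difference between paired ranks, gives the same value.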

Calculating and Interpreting Bivariate Data

  • Bivariate data is typically presented in a table, scatterplot, or correlation matrix.
  • Before calculating a correlation, check that the measure suits the data: the correlation coefficient r is appropriate for linear relationships, and Spearman’s rank for monotonic relationships.
  • Always consider outliers, as these can have a significant impact on the correlation.
  • A line of best fit, such as the least squares line or the median-median line, is used to give a visual summary of the relationship in the data.
  • Remember to interpret the relationship within the context of the data, considering external influences or explanations.
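
To illustrate how much a single outlier can matter, the sketch below computes r and the least squares line for a small made-up data set, first without and then with one outlying point. The function name summarise and all data values are hypothetical. The slope is taken as Sxy / Sxx and the intercept as mean of y minus slope times mean of x, which is the standard least squares result.

```python
import math

def summarise(xs, ys):
    """Return (r, slope, intercept) for paired data using least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx                      # gradient of the least squares line
    intercept = mean_y - slope * mean_x    # line passes through the mean point
    return r, slope, intercept

# Made-up data: five points lying exactly on y = 2x
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
clean = summarise(x, y)

# The same data plus one hypothetical outlier at (6, 0)
with_outlier = summarise(x + [6], y + [0])

print("without outlier: r=%.3f, y = %.2fx + %.2f" % clean)
print("with outlier:    r=%.3f, y = %.2fx + %.2f" % with_outlier)
```

With these numbers one extra point pulls r from 1 down to about 0.14 and flattens the gradient from 2 to about 0.29, so it is worth checking the scatterplot before trusting either figure.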