# Bivariate Data, Association and Correlation

**Defining Bivariate Data, Association and Correlation**

- Bivariate data involves **two different variables** gathered from the same entity, and can be represented graphically in a scatterplot.
- **Association** in bivariate data is evident when changes in one variable tend to accompany changes in the other variable.
- **Correlation** is the statistical measure that describes the strength and direction of the relationship between two variables.

**Types of Association**

- **Positive Association**: As the value of one variable increases, the value of the other variable also tends to increase, and vice versa.
- **Negative Association**: As the value of one variable increases, the value of the other variable tends to decrease, and vice versa.
- **No Association**: No relationship is apparent between the two variables.

**Correlation Coefficient**

- The **correlation coefficient**, denoted by ‘r’, measures the strength and direction of a linear relationship between two variables on a scatterplot.
- It ranges from -1 to +1, where -1 signifies a perfect negative linear correlation, +1 a perfect positive linear correlation, and 0 no linear correlation.
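
As a minimal sketch of how r can be computed, the function below divides the sum of products of deviations by the square root of the product of the sums of squared deviations. The function name `pearson_r` and the data are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the correlation coefficient r for two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / sqrt(s_xx * s_yy)


# Illustrative (invented) data: hours studied and test score.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 2))  # 0.77, a fairly strong positive linear relationship
```

Data lying exactly on a rising straight line, such as `[1, 2, 3]` against `[2, 4, 6]`, gives r = 1 with this function.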

**Interpreting the Correlation Coefficient**

- A correlation coefficient close to +1 indicates a strong positive linear relationship; a value close to -1 indicates a strong negative linear relationship.
- A correlation coefficient close to 0 suggests a weak or non-existent linear relationship.
- Note that **correlation does not imply causation**; even though two variables may be strongly correlated, it does not mean that changes in one variable cause changes in the other.

**Spearman’s Rank Correlation Coefficient**

- **Spearman’s Rank correlation coefficient** works on ranked data and is a commonly used alternative to the correlation coefficient r when the data do not follow a normal distribution.
- It can detect any monotonic relationship (consistently increasing or consistently decreasing), not just a linear one.
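
The sketch below illustrates the ranking idea, assuming there are no tied values so that the common formula 1 - 6Σd² / (n(n² - 1)) applies, where d is the difference between the two ranks of each data point. The names `ranks` and `spearman_rs` and the data are illustrative only.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(xs, ys):
    """Spearman's rank correlation via the d-squared formula (no ties)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    # d is the difference between the x-rank and y-rank of each point.
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Illustrative data: y = x cubed is increasing but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rs(x, y))        # 1.0, a perfect monotonic increasing relationship
print(spearman_rs(x, y[::-1]))  # -1.0, a perfect monotonic decreasing relationship
```

For the same cubed data, the correlation coefficient r is about 0.94 rather than 1, because the points do not lie on a straight line. Data with tied values need average ranks, which this sketch does not handle.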

**Calculating and Interpreting Bivariate Data**

- Bivariate data is typically presented in a table, scatter plot, or correlation matrix.
- Before calculating a correlation, check that the measure suits the data: use the correlation coefficient r for linear relationships and Spearman’s Rank for monotonic relationships.
- Always consider **outliers**, as these can have a significant impact on the correlation.
- A line of best fit, such as the least squares line or the median-median line, gives a visual interpretation of the relationship in the data.
- Remember to interpret the relationship within the context of the data, considering external influences or explanations.
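
To illustrate how strongly a single outlier can affect both the correlation and the least squares line, here is a small sketch using invented data. The helper name `fit_line` is hypothetical; it returns the slope and intercept of the least squares line of y on x together with r.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least squares line of y on x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("undefined when one variable is constant")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x  # the line passes through the mean point
    r = s_xy / sqrt(s_xx * s_yy)
    return slope, intercept, r


# Five invented points lying exactly on the line y = 2x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
clean = fit_line(x, y)
print(clean)  # slope 2, intercept 0, r = 1

# The same points plus one outlier at (6, 0).
with_outlier = fit_line(x + [6], y + [0])
print(with_outlier)  # slope about 0.29, intercept about 4, r about 0.14
```

One extra point moves r from 1 to roughly 0.14 here, which is why a scatterplot should be checked before relying on the coefficient alone.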