Scatter Diagrams and Correlation
Basics of Scatter Diagrams
- Scatter diagrams, or scatter plots, help represent the relationship between two numerical variables.
- Each axis represents one of the two measurements, and the pattern formed by the data points suggests whether the variables are related (correlated).
- If one variable tends to increase as the other increases, it’s a positive correlation. If one variable tends to decrease as the other increases, it’s a negative correlation.
- Absence of any discernible pattern indicates no correlation.
Making Scatter Diagrams
- The horizontal axis (x-axis) represents the predictor, or independent variable.
- The vertical axis (y-axis) represents the outcome, or dependent variable.
- Each point on the plot represents a single observation.
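As a rough illustration of the axis conventions above, here is a minimal sketch that draws a text-only scatter diagram using nothing beyond the Python standard library. The function name `text_scatter`, the grid size, and the hours/scores data are all hypothetical choices made for this example; in practice a plotting library would normally be used instead.

```python
def text_scatter(xs, ys, width=20, height=8):
    """Return a rough text scatter plot: x runs left to right, y bottom to top."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a span of 1 when every value on an axis is identical.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Each (x, y) pair is one observation and becomes one point.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

# Hypothetical data: hours studied (independent, x) and test score (dependent, y).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 75]

plot = text_scatter(hours, scores)
print(plot)
```

The points rise from the bottom left to the top right, which is the pattern of a positive correlation.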
Understanding Correlation
- Correlation measures the strength and direction of the relationship between two numerical variables.
- A positive correlation coefficient indicates that the variables increase or decrease together, whereas a negative correlation coefficient indicates that as one variable increases, the other decreases.
The Correlation Coefficient
- The correlation coefficient is a value between -1 and +1 that indicates the strength and direction of the linear correlation.
- Values close to +1 indicate a strong positive correlation, and values close to -1 indicate a strong negative correlation. A value close to 0, on the other hand, indicates a weak or no linear correlation.
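These notes do not name a specific coefficient. The sketch below assumes the commonly used Pearson coefficient r, computed as the sum of products of deviations from the means divided by the square root of the product of the sums of squared deviations. The function name `pearson_r` and the data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data, not taken from the notes.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]      # tends to rise with x
y_down = [5, 4, 5, 4, 2]    # tends to fall as x rises

r_up = pearson_r(x, y_up)      # about 0.77
r_down = pearson_r(x, y_down)  # about -0.77
print(round(r_up, 2), round(r_down, 2))
```

Both values have the same size but opposite signs: the sign gives the direction of the relationship and the distance from 0 gives its strength.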
Limitations and Considerations
- Scatter diagrams and correlation coefficients can show that a relationship exists, but they do not prove that one variable causes the other to change. Correlation does not imply causation.
- Outliers can significantly affect the correlation coefficient, so a value computed with them may not reflect the typical relationship in the data.
- In addition to checking the correlation coefficient, it’s important to visually examine the scatter diagram, because patterns such as curves, clusters, or outliers are not captured by a single number.
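Two of the points above can be checked numerically. The sketch below assumes the Pearson coefficient and uses made-up data: it shows how a single outlier can pull a perfect correlation down sharply, and how a perfectly curved relationship can still give a coefficient of 0. The helper name `pearson_r` is a hypothetical choice for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: five points that lie exactly on a straight line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = pearson_r(x, y)                # 1.0, a perfect positive linear relationship

# The same data with one hypothetical outlier added at (6, 0).
r_outlier = pearson_r(x + [6], y + [0])  # about 0.14

# A perfect but curved relationship, y = x squared, with x symmetric about 0.
x_sym = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v * v for v in x_sym]
r_curve = pearson_r(x_sym, y_curve)      # 0.0, although y depends entirely on x

print(round(r_clean, 2), round(r_outlier, 2), round(r_curve, 2))
```

In both the outlier case and the curved case the coefficient alone would be misleading, while the scatter diagram would show immediately what is going on.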
A thorough understanding of scatter diagrams and correlation is fundamental in data interpretation and statistical analyses. Remember to investigate the cause of outliers and to consider the context in which data are collected and used.