Scatter diagrams and correlation (Higher Tier)

Scatter Diagrams

  • A scatter diagram (or scatter plot) is a type of graphical representation that uses coordinates to display values from two variables.

  • Each point on the scatter diagram represents a single data pair from the data set.

  • Scatter diagrams are used to identify what type of relationship, if any, exists between two variables. The relationship could be a positive correlation, a negative correlation, or no correlation.

  • In scatter diagrams, the horizontal axis represents one variable (usually the independent variable), while the vertical axis represents the second variable (usually the dependent variable).

  • If the plotted points create an upward trend from lower left to upper right, it’s said to depict a positive correlation.

  • If the plotted points create a downward trend from upper left to lower right, it’s a negative correlation.

  • If the points are scattered randomly with no discernible pattern, this indicates no correlation.

Correlation

  • Correlation is a measure of the strength and direction of the relationship between two variables.

  • Correlation can be positive (as one variable increases, so does the other), negative (as one variable increases, the other decreases), or zero (no linear relationship between the variables).

  • The value of the correlation coefficient, usually denoted by ‘r’ for a sample (or by the Greek letter rho, ρ, for a population), ranges from -1 to +1. A value close to +1 indicates a strong positive correlation, a value close to -1 indicates a strong negative correlation, and a value close to 0 indicates weak or no linear correlation.

  • It’s important to note that correlation does not imply causation. Just because two variables are correlated doesn’t mean one variable causes the other to change.
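
As an illustration of how the correlation coefficient behaves, here is a minimal Python sketch that calculates r from paired data using the standard product-moment formula. The function name pearson_r and the data values are made up for this example and are not part of the notes above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / sqrt(sxx * syy)


# Invented data: hours of revision against test mark
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, marks), 3))  # close to +1: strong positive correlation
```

A result near +1 or -1 only describes how closely the points follow a straight line. It says nothing about whether one variable causes the other.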

Line of Best Fit

  • Based on the scatter diagram, a line of best fit (or trend line) can be added to the chart to model the correlation between the variables.

  • This line can be drawn by eye, following the trend of the points and leaving roughly equal numbers of points above and below the line, or calculated using a statistical method such as the method of least squares.

  • The line of best fit can then be used to make predictions, but keep in mind that predictions outside the range of the gathered data (extrapolation) can be unreliable.
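
The least squares approach and the warning about extrapolation can be sketched in Python as follows. This is a rough illustration rather than an exam method: the function names least_squares_line and predict, and the data, are assumptions made up for this example.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least squares line y = gradient*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    # The least squares line always passes through the mean point (mean_x, mean_y)
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


def predict(x, xs, gradient, intercept):
    """Return (predicted y, True if x lies inside the range of the data)."""
    inside_range = min(xs) <= x <= max(xs)
    return gradient * x + intercept, inside_range


# Invented data lying exactly on the line y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
m, c = least_squares_line(xs, ys)

print(predict(2.5, xs, m, c))  # inside the data range: interpolation
print(predict(10, xs, m, c))   # outside the data range: extrapolation, treat with caution
```

The second value returned by predict flags whether the prediction is an interpolation (True) or an extrapolation (False), which is where the line of best fit is least trustworthy.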

Importance of Interpretation

  • Remember, a scatter diagram only gives an overview of the possible relationship between two variables. It can’t prove that one variable is the cause of changes in the other.

  • Even if there’s a strong correlation, other factors that aren’t represented in the scatter plot may be influencing both variables.

  • Be mindful of outliers, which are extreme data points that can significantly influence the line of best fit and overall correlation.
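
To see how much a single outlier can matter, the following Python sketch compares the correlation coefficient for a small data set with and without one extreme point. The data and the helper name pearson_r are invented for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data: six points following a clear upward trend
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]
r_without = pearson_r(xs, ys)

# The same data with one extreme point (7, 0) added
r_with = pearson_r(xs + [7], ys + [0])

print(round(r_without, 2), round(r_with, 2))
```

With these made-up values, one extra point is enough to turn a very strong positive correlation into a weak one, which is why outliers should be identified and considered before drawing conclusions.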