Scatter Diagrams and Correlation

Scatter Diagrams

  • A scatter diagram, or scatter plot, is a type of diagram that uses coordinates to display values for two different variables from a dataset.
  • The points on the scatter diagram are marked with dots, each of which represents an individual observation.
  • Each dot’s horizontal position gives its value for one variable, and its vertical position gives its value for the other.
  • Scatter diagrams can show a wide range of data on one plot and are particularly helpful in displaying large sets of data.
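  • Scatter diagrams are used with paired numerical data, where each observation has a value for both variables, to reveal any relationship or trend between them.
  • The overall shape of the cloud of dots indicates the type of correlation: dots clustered along a sloping straight line suggest a strong correlation, while a roughly circular cloud with no clear direction suggests little or none.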
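
As a rough illustration of how paired values become dots at coordinates, here is a minimal sketch that draws a plain-text scatter diagram. The function name text_scatter, the grid size, and the hours/scores data are all hypothetical and chosen only for this example; a plotting library would normally be used instead.

```python
def text_scatter(xs, ys, width=20, height=10):
    """Return a plain-text scatter diagram of paired observations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when a variable is constant.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (horizontal axis) and a row (vertical axis).
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # The first row is printed at the top, so flip to put large y values there.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# Hypothetical data: hours revised (horizontal) and test score (vertical).
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
print(text_scatter(hours, scores))
```

With this made-up data the stars climb from the bottom-left to the top-right of the grid, which is the pattern a positive correlation produces.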

Correlation

  • Correlation is a statistical measure of how strongly, and in which direction, two variables are related.
  • A positive correlation means that as one variable increases, the other tends to increase too. On a scatter diagram, the points cluster along an upward-sloping line.
  • A negative correlation means that as one variable increases, the other tends to decrease. On a scatter diagram, the points cluster along a downward-sloping line.
  • Zero correlation (no correlation) means there is no apparent relationship between the variables. On a scatter diagram, the points show no discernible upward or downward trend.
  • Correlation does not imply causation. Just because two variables correlate does not mean that changes in one cause changes in the other.

Understanding Correlation

  • The correlation coefficient, often denoted by ‘r’, measures the strength and direction of the correlation.
  • Its value always lies between -1 and +1 inclusive. The closer it is to -1 or +1, the stronger the linear correlation.
  • A correlation coefficient of +1 indicates a perfect positive correlation: all the points lie exactly on an upward-sloping straight line.
  • A correlation coefficient of -1 indicates a perfect negative correlation: all the points lie exactly on a downward-sloping straight line.
  • A correlation coefficient of 0 indicates that there is no linear correlation between the variables.
  • When analysing correlation, it’s important to remember that ‘r’ only measures linear (straight-line) relationships. Even if ‘r’ is zero, there could be a non-linear relationship.
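
The values described above can be checked with a short calculation. The sketch below assumes that ‘r’ is the Pearson product-moment correlation coefficient, which the notes do not name explicitly. The function name pearson_r and the small datasets are hypothetical, chosen only to illustrate the three cases.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))  # points on an upward-sloping line
print(pearson_r(x, [-3 * v for v in x]))     # points on a downward-sloping line
print(pearson_r(x, [v * v for v in x]))      # points on a symmetric curve
```

The first two datasets lie exactly on straight lines, so r comes out as 1 and -1. The third follows y = x² exactly, a perfect non-linear relationship, yet r is 0 because the curve has no overall upward or downward linear trend. In Python 3.10 and later, statistics.correlation in the standard library performs the same calculation.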