Scatter Graphs
Basics of Scatter Graphs
- A scatter graph or scatter plot is a graph in which data points are plotted on a Cartesian coordinate system.
- Scatter graphs represent bivariate data, i.e., data with two variables, with each corresponding pair of values constituting an (x,y) coordinate on the graph.
- Data points on a scatter graph are not connected with lines; instead, they are plotted as individual dots.
- Scatter graphs are useful for observing and showing relationships between two numeric variables.
Positive Correlation, Negative Correlation, and No Correlation
- If the data points on a scatter graph generally run from the lower left to the upper right, we say the variables exhibit a positive correlation. This indicates that as one variable increases, the other tends to increase as well.
- If the data points generally run from the upper left to the lower right, the variables exhibit a negative correlation. This indicates that as one variable increases, the other tends to decrease.
- If there is no discernible pattern, and the points are randomly distributed around the graph, we say there is no correlation between the variables.
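The three cases above are judged by eye on the graph. One common way to put a number on them, which these notes do not otherwise cover, is Pearson's correlation coefficient r, which ranges from -1 (perfect negative) through 0 (no linear correlation) to +1 (perfect positive). The following is a minimal sketch; the function names, the 0.3 cutoff, and the sample data are assumptions chosen for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe_correlation(r, cutoff=0.3):
    """Label r as positive, negative, or none (cutoff is an arbitrary assumption)."""
    if r >= cutoff:
        return "positive correlation"
    if r <= -cutoff:
        return "negative correlation"
    return "no correlation"


# Made-up example data: hours revised vs. test score
hours = [1, 2, 3, 4, 5, 6]
scores = [35, 42, 50, 55, 68, 71]
print(describe_correlation(pearson_r(hours, scores)))
```

The cutoff only separates "some trend" from "no clear trend"; it is not a standard threshold, and the graph itself should still be inspected.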
Line of Best Fit
- A line of best fit (also known as a trend line) can be drawn on a scatter graph to show the direction of the correlation and to estimate values that were not measured.
- The line of best fit does not need to pass through any of the points, but it should lie as close as possible to the points overall.
- It provides a visual way to identify trends in the data and can be used to make predictions.
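On paper, a line of best fit is usually drawn by eye. When the line is calculated instead, one standard method is least squares, which picks the line y = mx + c that minimises the total squared vertical distance from the points. The sketch below assumes that method; the function name `best_fit_line` and the data are illustrative.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the mean point
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = best_fit_line([0, 1, 2], [1, 3, 5])
print(slope, intercept)  # 2.0 1.0
```

Because the calculated line passes through the mean point (mean of x, mean of y), plotting that point first is also a helpful guide when drawing the line by hand.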
Interpolation and Extrapolation
- Interpolation is the process of estimating unknown values that fall within the range of known values on the scatter graph.
- Extrapolation is the process of predicting values outside of the range of known values. It should be used cautiously as it assumes that the trend observed within the known values will continue beyond this range.
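The difference between the two can be made explicit in code: the estimate is read off the same line either way, and what changes is whether the x-value lies inside the range of the known data. This is a minimal sketch; the function `predict` and the line y = 2x + 1 are assumptions for illustration.

```python
def predict(x, slope, intercept, known_xs):
    """Estimate y from the line and report whether x is inside the known range."""
    if min(known_xs) <= x <= max(known_xs):
        kind = "interpolation"
    else:
        kind = "extrapolation"  # less reliable: the trend may not continue
    return slope * x + intercept, kind


known_xs = [1, 2, 3, 4]
print(predict(2.5, 2.0, 1.0, known_xs))  # (6.0, 'interpolation')
print(predict(10, 2.0, 1.0, known_xs))   # (21.0, 'extrapolation')
```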
Outliers
- An outlier is a data point that diverges significantly from the general pattern of a dataset.
- Outliers can greatly affect the line of best fit and consequently the predictions, so it is important to identify and investigate any potential outliers.
- An outlier may indicate an error in data collection or a genuinely unusual event worth investigating.
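One possible way to screen for outliers is to measure each point's vertical distance (its residual) from a fitted line and flag points whose residual is unusually large. The sketch below assumes a least-squares line and a cutoff of two standard deviations (`k=2.0`); both choices, along with the function names and the data, are illustrative rather than a fixed rule.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept (assumed fitting method)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(xs, ys, k=2.0):
    """Return points whose residual exceeds k standard deviations of the residuals."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = statistics.pstdev(residuals)
    if spread == 0:
        return []  # every point lies exactly on the line
    return [(x, y) for x, y, r in zip(xs, ys, residuals) if abs(r) > k * spread]


# Made-up data: seven points on y = 2x, plus one suspicious value
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 8, 10, 12, 14, 40]
print(flag_outliers(xs, ys))  # [(8, 40)]
```

Note that the outlier itself drags the fitted line towards it (here the slope becomes 4 rather than 2), which is exactly why flagged points should be investigated before the line is used for predictions.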