Line of Best Fit

Definition of Line of Best Fit

  • The line of best fit, also known as a trend line or regression line, is a straight line that best represents the data on a scatter plot.
  • It gives a general direction, or trend, of the relationship between two variables on the plot.
  • The line does not need to pass through every point; it should lie as close as possible to the points overall.
  • The equation of the line of best fit can be used to predict the value of one variable from the other, provided the same trend holds.

Sketching the Line of Best Fit

  • There is no single correct way to draw a line of best fit by eye; it is an estimate.
  • The line should have roughly the same number of points above it as below it.
  • Try to make the total distance of the points above the line roughly equal to the total distance of the points below it.
  • Avoid connecting the dots from one to the next. The line of best fit does not need to touch any of the points.
  • Do not extend the line of best fit beyond the scope of the given data, as this may lead to inaccurate predictions.

Calculating the Line of Best Fit

  • You can use the method of least squares to calculate the line of best fit.
  • This involves finding the line that minimizes the sum of the squares of the vertical distances of the points from the line.
  • Two key values you’ll need to know for constructing this line are the slope and the y-intercept.
  • The slope of the line measures the rate of change: how much the dependent variable changes for each one-unit increase in the independent variable.
  • The y-intercept indicates the value of the dependent variable when the independent variable is zero.
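
As an illustration of the least squares method described above, here is a minimal sketch in plain Python. It uses the standard closed-form result: the slope is the sum of the products of the x and y deviations from their means divided by the sum of the squared x deviations, and the intercept follows from the fact that the line passes through the point of means. The function name `least_squares_line` and the data are assumptions chosen for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # prints "y = 0.60x + 2.20"
```

Here the slope of 0.6 means y rises by about 0.6 for each one-unit increase in x, and the intercept of 2.2 is the predicted value of y when x is zero.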

Using the Line of Best Fit

  • The line of best fit can be used to make predictions about one variable based on the known value of the other variable.
  • Predictions are most reliable when made within the range of the data used to find the line (interpolation).
  • Predictions beyond the existing range (extrapolation) can be inaccurate, because the trend may not continue outside the range of available data.
  • Be aware of outliers. These are data points that do not fit the general trend. They can affect the line of best fit significantly.
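
The difference between predicting inside and outside the data range can be sketched as follows. This is an illustrative example only: the function name `predict`, the line y = 0.6x + 2.2, and the x range of 1 to 5 are assumptions, not values from any real data set.

```python
def predict(x, slope, intercept, x_min, x_max):
    """Return the predicted y and whether x lies inside the observed x range."""
    within_range = x_min <= x <= x_max
    return slope * x + intercept, within_range


# Hypothetical line y = 0.6x + 2.2, fitted to data with x between 1 and 5
y_inside, inside_ok = predict(3, 0.6, 2.2, x_min=1, x_max=5)
y_outside, outside_ok = predict(10, 0.6, 2.2, x_min=1, x_max=5)

print(round(y_inside, 2), inside_ok)    # interpolation: x = 3 is within the data range
print(round(y_outside, 2), outside_ok)  # extrapolation: x = 10 is outside it, treat with caution
```

The calculation is the same in both cases; only the flag shows that the second prediction relies on the trend continuing beyond the observed data.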

Evaluation

  • Evaluate the goodness of the fit by calculating the correlation coefficient, denoted as r.
  • The value of r ranges between -1 and 1.
  • A perfect positive linear relationship (all points lie exactly on an upward-sloping line) gives r = 1.
  • A perfect negative linear relationship (all points lie exactly on a downward-sloping line) gives r = -1.
  • If there is no linear relationship, r is approximately 0.
  • A higher absolute value of r indicates a stronger linear relationship between the variables.
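
As a sketch of how r might be computed by hand, the following uses the standard Pearson formula: the sum of the products of the x and y deviations, divided by the square root of the product of the summed squared deviations. The function name `correlation_coefficient` and all data values are made up for illustration.

```python
import math


def correlation_coefficient(xs, ys):
    """Return the Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up data sets illustrating the cases listed above
r_perfect_pos = correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])
r_perfect_neg = correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2])
r_scattered = correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

print(round(r_perfect_pos, 3), round(r_perfect_neg, 3), round(r_scattered, 3))
```

The first two data sets lie exactly on a line, so r is 1 and -1 respectively. The third is more scattered and gives r of about 0.77, a fairly strong but imperfect positive linear relationship.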