Regression lines

Definition

  • Regression lines, also known as lines of best fit, are straight lines that best represent the relationship between two variables plotted on a scatter plot.
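  • The two types usually encountered are the ‘line of y on x’ (used to predict y from x) and the ‘line of x on y’ (used to predict x from y).
  • These lines are found using the principle of Least Squares, which minimises the sum of the squared residuals between the line and the data points. For y on x the residuals are measured vertically (in the y-direction); for x on y they are measured horizontally (in the x-direction).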
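
To make the least-squares idea concrete, the sketch below compares the sum of squared vertical residuals for the fitted y on x line with that of a slightly steeper line. The data set and the helper names (`fit_y_on_x`, `sse_vertical`) are invented for illustration, and the slope is computed with the standard closed form Sxy/Sxx.

```python
# Hypothetical data, chosen only to illustrate the idea.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def fit_y_on_x(xs, ys):
    """Least-squares intercept and slope for the line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse_vertical(xs, ys, b0, b1):
    """Sum of squared vertical residuals y - (b0 + b1*x)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


b0, b1 = fit_y_on_x(xs, ys)
best = sse_vertical(xs, ys, b0, b1)
nudged = sse_vertical(xs, ys, b0, b1 + 0.1)  # same intercept, steeper slope

print(best < nudged)  # True: the fitted line has the smaller sum of squares
```

Changing the intercept instead of the slope gives the same outcome: the fitted line always has the smallest total.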

Calculation Method

  • The regression line of y on x has the equation: ŷ = b0 + b1x
  • The regression line of x on y has the equation: x̂ = a0 + a1y
  • In these equations:
    • b0 and a0 are the intercepts
    • b1 and a1 are the slopes
    • ‘ŷ’ and ‘x̂’ are predicted values for y and x respectively based on the regression line.
  • The slopes are given by b1 = r(sy/sx) for y on x and a1 = r(sx/sy) for x on y, where r is the correlation coefficient, and sx and sy are the standard deviations of x and y respectively.
  • The intercepts follow from the mean values of x and y and the slope: b0 = ȳ − b1x̄ for y on x, and a0 = x̄ − a1ȳ for x on y.
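
The calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the data set and the function name `regression_lines` are made up, sample standard deviations (dividing by n − 1) are assumed, and the intercepts use the fact that each line passes through the point of means, so b0 = ȳ − b1x̄ and a0 = x̄ − a1ȳ.

```python
from math import sqrt

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def regression_lines(xs, ys):
    """Return (b0, b1) for y on x and (a0, a1) for x on y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    s_x = sqrt(s_xx / (n - 1))      # sample standard deviation of x
    s_y = sqrt(s_yy / (n - 1))      # sample standard deviation of y
    r = s_xy / sqrt(s_xx * s_yy)    # correlation coefficient

    b1 = r * s_y / s_x              # slope of y on x
    b0 = y_bar - b1 * x_bar         # intercept of y on x
    a1 = r * s_x / s_y              # slope of x on y
    a0 = x_bar - a1 * y_bar         # intercept of x on y
    return (b0, b1), (a0, a1)


(b0, b1), (a0, a1) = regression_lines(xs, ys)
print(f"y on x: y = {b0:.2f} + {b1:.2f}x")  # y on x: y = 2.20 + 0.60x
print(f"x on y: x = {a0:.2f} + {a1:.2f}y")  # x on y: x = -1.00 + 1.00y
```

Only the ratio of the standard deviations enters the slopes, so dividing by n instead of n − 1 would give the same lines.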

Characteristics

  • A positive slope indicates that as x increases, y tends to increase (and as x decreases, y tends to decrease).
  • A negative slope indicates that as x increases, y tends to decrease.
  • Both regression lines pass through the point (x̄, ȳ), the mean values of x and y in the data set, so this is where they intersect. When r = ±1 the two lines coincide.
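
These properties can be checked numerically. The sketch below uses an invented data set and the closed forms Sxy/Sxx and Sxy/Syy for the slopes, which are equivalent to r(sy/sx) and r(sx/sy). It confirms that each line returns one mean when given the other, and that both slopes share the same sign.

```python
# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = s_xy / s_xx            # slope of y on x
b0 = y_bar - b1 * x_bar     # intercept of y on x
a1 = s_xy / s_yy            # slope of x on y
a0 = x_bar - a1 * y_bar     # intercept of x on y

# Substituting the mean of one variable returns the mean of the other.
y_at_x_bar = b0 + b1 * x_bar
x_at_y_bar = a0 + a1 * y_bar
print(abs(y_at_x_bar - y_bar) < 1e-9)  # True
print(abs(x_at_y_bar - x_bar) < 1e-9)  # True

# Both slopes take the sign of s_xy, and hence the sign of r.
print(b1 > 0 and a1 > 0)  # True
```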

Practical Application

  • Regression lines are used to predict the value of one variable (the dependent variable) from the value of another (the independent variable).
  • The accuracy of the prediction improves as the correlation between the two variables strengthens, that is, as r gets closer to +1 or −1.
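
One way to see the link between correlation and accuracy is through the residuals: for the y on x line, the sum of squared residuals equals Syy(1 − r²), so it shrinks towards zero as r approaches ±1. The sketch below checks this on two invented data sets; `fit_summary` is a hypothetical helper name.

```python
from math import sqrt


def fit_summary(xs, ys):
    """Return r, the sum of squared residuals for y on x, and Syy."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = s_xy / sqrt(s_xx * s_yy)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return r, sse, s_yy


# Hypothetical data: one scattered set and one perfectly linear set.
xs = [1, 2, 3, 4, 5]
r_weak, sse_weak, syy_weak = fit_summary(xs, [2, 4, 5, 4, 5])
r_strong, sse_strong, syy_strong = fit_summary(xs, [1, 3, 5, 7, 9])

print(round(r_weak, 3), round(sse_weak, 3))      # weaker correlation, larger residuals
print(round(r_strong, 3), round(sse_strong, 3))  # perfect correlation, zero residuals
```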

Limitations

  • Predictions based on regression lines are more reliable for values of x that lie within the range of x-values used to calculate the regression line. Predicting within this range is referred to as interpolation.
  • Extrapolation – predicting for x-values outside the range of the data set – can lead to misleading results, as it assumes that the relationship found within the data set will continue indefinitely beyond it.
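
A simple safeguard is to label each prediction according to whether the x-value lies inside the range of the original data. The sketch below assumes a y on x line that has already been fitted; the function name `predict_y`, the coefficients and the x-values are all hypothetical.

```python
def predict_y(x, b0, b1, xs):
    """Predict y from the y on x line and label the prediction."""
    y_hat = b0 + b1 * x
    inside = min(xs) <= x <= max(xs)
    return y_hat, "interpolation" if inside else "extrapolation"


# Hypothetical fitted line and the x-values it was fitted on.
xs = [1, 2, 3, 4, 5]
b0, b1 = 2.2, 0.6

print(predict_y(3.5, b0, b1, xs)[1])  # interpolation
print(predict_y(12, b0, b1, xs)[1])   # extrapolation
```

The arithmetic is identical in both cases; the label only signals how much trust the prediction deserves.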

Remember to revise the formulas for both types of regression line and practise calculations using different data sets. Seeing how changes in the data affect the resulting lines will also deepen your understanding.