Calculation of the equation of the regression line

Simple Linear Regression

  • Simple linear regression is a statistical method that allows us to summarise and study the relationships between two continuous variables.
  • One variable, denoted Y, is regarded as the dependent or response variable. The other, denoted X, is regarded as the independent or explanatory variable.
  • The equation of a simple linear regression line is written Y = a + bX, where a is the y-intercept and b is the slope (gradient) of the line.

The Regression Coefficients (a and b)

  • The slope (b) of the regression line is the amount by which Y is predicted to change when X increases by one unit. It is calculated using the formula: b = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)², where xᵢ and yᵢ are the individual sample points, and x̄ and ȳ are the sample means of X and Y respectively.
  • The intercept (a) is the predicted value of Y when X = 0. It is calculated using the formula: a = ȳ - b*x̄.
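
The two formulas above can be applied directly in a few lines of code. The following is a minimal sketch in Python; the function name fit_line and the data set are made up for illustration and are not part of these notes.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line Y = a + bX using the slope and intercept formulas."""
    n = len(xs)
    x_bar = sum(xs) / n  # mean of X
    y_bar = sum(ys) / n  # mean of Y

    # Numerator: sum of products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of X from its mean
    sxx = sum((x - x_bar) ** 2 for x in xs)

    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical sample data, chosen only to keep the arithmetic simple
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")  # prints "a = 2.20, b = 0.60"
```

For this data x̄ = 3 and ȳ = 4, so b = 6/10 = 0.6 and a = 4 - 0.6 × 3 = 2.2, giving the line Y = 2.2 + 0.6X.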

Assumptions of the Simple Linear Regression Model

  • Linearity: There is a linear relationship between X and Y.
  • Independence: Observations are independent of each other.
  • Homoscedasticity: The variance of the errors is constant across all levels of X.
  • Normality: The errors (the deviations of Y from the regression line) follow a normal distribution.

Least Squares Method

  • The method of least squares is used in simple linear regression to calculate the best-fitting line through the observed data.
  • The best-fit line is found by minimising the sum of the squares of the “errors”. The “errors” are the vertical distances from the observed data points to the line.
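
One way to see what "minimising the sum of squares" means is to compare the fitted line with a few other candidate lines. The sketch below assumes a small made-up data set; the helper name sse and the alternative lines are hypothetical and chosen only for illustration.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical errors for the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares coefficients from the slope and intercept formulas
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

best = sse(a, b, xs, ys)

# A few arbitrary alternative lines (intercept, slope) for comparison
alternatives = [(2.0, 0.6), (2.2, 0.7), (2.5, 0.5), (0.0, 1.0)]
for alt_a, alt_b in alternatives:
    print(f"a={alt_a}, b={alt_b}: SSE = {sse(alt_a, alt_b, xs, ys):.2f}")

print(f"least-squares line: SSE = {best:.2f}")  # prints 2.40 for this data
```

Every alternative line gives a larger sum of squared errors than the least-squares line, which is what "best-fitting" means under this criterion.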

Residuals

  • The residuals are used to check the validity of the model. Each residual is the difference between an observed y-value (yᵢ) and the corresponding predicted y-value (ŷᵢ).
  • Residuals are calculated by the formula: eᵢ = yᵢ - ŷᵢ, where ŷᵢ = a + bxᵢ is the value of Y predicted by the line of best fit at xᵢ. A positive residual means the point lies above the line; a negative residual means it lies below.
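
As a rough illustration, the residuals can be computed point by point once a and b are known. The data and the helper name residuals below are hypothetical; the coefficients are obtained from the usual slope and intercept formulas.

```python
def residuals(a, b, xs, ys):
    """Return the list of residuals e_i = y_i - (a + b * x_i)."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares slope and intercept for this data
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

es = residuals(a, b, xs, ys)
for x, y, e in zip(xs, ys, es):
    print(f"x={x}, y={y}, predicted={a + b * x:.1f}, residual={e:+.1f}")

# For a least-squares line with an intercept, the residuals sum to zero
# (up to floating-point rounding).
total = sum(es)
```

A useful check is that the residuals of a least-squares line with an intercept always sum to zero; a clear pattern in the residuals when plotted against X would suggest that the linearity or constant-variance assumption is doubtful.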

Using the Regression Line for Prediction

  • Once the equation of the line is calculated, it can be used to predict the value of Y for any given value of X.
  • It’s important to use caution when making predictions outside the range of the original data (extrapolation), as the relationship may not hold outside this range.
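
The sketch below shows one possible way to make a prediction while flagging extrapolation. The function name predict, the coefficients a = 2.2 and b = 0.6, and the observed X values are all illustrative assumptions, not values from these notes.

```python
def predict(x, a, b, observed_xs):
    """Return (y_hat, extrapolating) for the line Y = a + bX.

    extrapolating is True when x lies outside the range of the observed X values.
    """
    y_hat = a + b * x
    extrapolating = x < min(observed_xs) or x > max(observed_xs)
    return y_hat, extrapolating


# Hypothetical fitted line and the X values it was fitted on
a, b = 2.2, 0.6
observed_xs = [1, 2, 3, 4, 5]

for x in (3, 10):
    y_hat, extrapolating = predict(x, a, b, observed_xs)
    note = " (extrapolation: treat with caution)" if extrapolating else ""
    print(f"x = {x}: predicted Y = {y_hat:.1f}{note}")
```

Here x = 3 lies inside the observed range (1 to 5), so the prediction of 4.0 is an interpolation, while x = 10 lies outside it and the prediction of 8.2 is flagged as an extrapolation.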

Review Questions

After familiarising yourself with these concepts, work through regression problem sets. Start by calculating the slope and intercept of regression equations, then move on to interpretation and prediction. Be cautious about making predictions outside the range of the observed data, and remember to check the assumptions of the linear regression model. Online resources and relevant textbooks are also useful.