4. Multiple Linear Regression

Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. Its goal is to model the linear relationship between the explanatory (independent) variables and the response (dependent) variable.

Multiple regression is the extension of ordinary least-squares (OLS) regression that involves more than one explanatory variable.

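(1)\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i\]


  • \(y_i\) = dependent variable for observation \(i\)
  • \(x_{ij}\) = value of the \(j\)-th explanatory variable for observation \(i\)
  • \(\beta_0\) = intercept
  • \(\beta_j\) = slope coefficient of the \(j\)-th explanatory variable
  • \(\epsilon_i\) = model’s error term (its sample counterpart is the residual)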

A simple linear regression is a function that makes it possible to predict one variable from what is known about another variable. It applies when there are two continuous variables: an independent variable and a dependent variable. The independent variable is the input used to calculate the dependent variable, or outcome. A multiple regression model extends this to several explanatory variables.

The multiple regression model is based on the following assumptions:

  • There is a linear relationship between the dependent variable and the independent variables.
  • The independent variables are not too highly correlated with each other.
  • \(y_i\) observations are selected independently and randomly from the population.
  • Residuals should be normally distributed with a mean of \(0\) and constant variance \(\sigma^2\).

4.1. Least Squared Residual

4.1.1. Method

A general multiple-regression model can be written as

(2)\[y_i = \beta_0*1 + \beta_1*x_{i1} + \beta_2*x_{i2} + \cdots + \beta_k*x_{ik} + \epsilon_i\]

In matrix form, we can rewrite this model as :

(3)\[\begin{split}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}&= \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ & & \vdots & & \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix} &* \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}&+ \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots\\ \epsilon_n \end{bmatrix} \\\end{split}\]
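The matrix form in (3) can be sketched in NumPy as follows. This is only an illustration: the sizes `n` and `k`, the coefficient values in `beta`, and the simulated data are all assumptions made for the example, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2                      # n observations, k explanatory variables (illustrative)
features = rng.normal(size=(n, k))

# Prepend a column of ones so that beta_0 plays the role of the intercept.
X = np.column_stack([np.ones(n), features])   # shape (n, k + 1)

beta = np.array([1.0, 2.0, -0.5])             # hypothetical (beta_0, beta_1, beta_2)
eps = rng.normal(scale=0.1, size=n)           # simulated error terms

# Equation (3): y = X beta + eps
y = X @ beta + eps
```

Each row of `X` holds one observation, so `X @ beta` evaluates the right-hand side of (2) for all observations at once.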

The strategy in the least squared residual approach (ordinary least squares) is the same as in the bivariate linear regression model. The idea of the ordinary least squares (OLS) estimator is to choose \(\beta\) so that the sum of squared residuals (i.e. \(\sum_{i=1}^{N} \epsilon_{i}^{2}\)) in the sample is as small as possible. Mathematically, this means that in order to estimate \(\beta\) we have to minimize \(\sum_{i=1}^{N} \epsilon_{i}^{2}\), which in matrix notation is nothing else than \(e'e\).

\[\begin{split}e'e = \begin{bmatrix} e_{1} & e_{2} & \cdots & e_{N} \\ \end{bmatrix} \begin{bmatrix} e_{1} \\ e_{2} \\ \vdots \\ e_{N} \end{bmatrix} = \sum_{i=1}^{N}e_{i}^{2}\end{split}\]

Consequently we can write \(e'e\) as \((Y-X\beta)'(Y-X\beta)\) by simply plugging in the expression \(e = Y - X\beta\) into \(e'e.\) This leaves us with the following minimization problem:

(4)\[ \begin{align}\begin{aligned}\min_{\beta} e'e &= (Y-X\beta)'(Y-X\beta)\\&= (Y'-\beta'X')(Y-X\beta)\\&= Y'Y - \beta'X'Y - Y'X\beta + \beta'X'X\beta\\&= Y'Y - 2\beta'X'Y + \beta'X'X\beta\end{aligned}\end{align} \]


It is important to understand that \(\beta'X'Y=(\beta'X'Y)'=Y'X\beta\): both terms are scalars, meaning of dimension 1×1, so each equals its own transpose.

In order to minimize the expression in (4), we have to differentiate it with respect to \(\beta\) and set the derivative equal to zero.

(5)\[ \begin{align}\begin{aligned}\frac{\partial(e'e)}{\partial \beta} &= -2X'Y + 2X'X\beta\\-2X'Y + 2X'X\beta &\stackrel{!}{=} 0\\X'X\beta &= X'Y\\\beta &= (X'X)^{-1}X'Y\end{aligned}\end{align} \]
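A minimal sketch of the closed-form solution in (5), assuming NumPy and simulated data. The helper name `ols_normal_equations`, the sample size, and the coefficients in `true_beta` are hypothetical choices for illustration. The sketch solves the linear system \(X'X\beta = X'Y\) directly rather than forming the inverse explicitly, which is generally the numerically safer route.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Solve the normal equations X'X beta = X'y (X needs full column rank)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(42)
n = 200
features = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), features])    # design matrix with intercept column
true_beta = np.array([1.0, 2.0, -3.0])         # hypothetical coefficients
y = X @ true_beta + rng.normal(scale=0.1, size=n)

beta_hat = ols_normal_equations(X, y)

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimum, the first-order condition X'(y - X beta) = 0 holds.
residuals = y - X @ beta_hat
```

Here `X.T @ residuals` is zero up to floating-point error, which is the first-order condition from (5) restated in terms of the fitted residuals.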


The second-order condition for a minimum requires that the matrix \(X'X\) be positive definite. This requirement is fulfilled when \(X\) has full column rank.

Intercept

You can obtain the solution for the intercept by setting the partial derivative of the squared loss with respect to the intercept \(\beta_0\) to zero. Let \(\beta_0 \in \mathbb{R}\) denote the intercept, \(\beta \in \mathbb{R}^d\) the coefficients of the features, and \(x_i \in \mathbb{R}^d\) the feature vector of the \(i\)-th sample. All we do is solve for \(\beta_0\):

\[\begin{split}\sum_{i=1}^n \beta_0 &= \sum_{i=1}^n (y_i - x_i^\top \beta) \\ \beta_0 &= \frac{1}{n} \sum_{i=1}^n (y_i - x_i^\top \beta)\end{split}\]

Usually, we assume that all features are centered, i.e.,

\[\frac{1}{n} \sum_{i=1}^n x_{ij} = 0 \qquad \forall j \in \{1,\ldots,d\}\]

This simplifies the solution for \(\beta_0\) to the average response:

(6)\[\begin{split}\beta_0 &= \frac{1}{n} \sum_{i=1}^n y_i - \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^d x_{ij} \beta_j \\ &= \frac{1}{n} \sum_{i=1}^n y_i - \sum_{j=1}^d \beta_j \frac{1}{n} \sum_{i=1}^n x_{ij} \\ &= \frac{1}{n} \sum_{i=1}^n y_i\end{split}\]

If in addition, we also assume that the response \(y\) is centered, i.e., \(\frac{1}{n} \sum_{i=1}^n y_i = 0\), the intercept is zero and thus eliminated.
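The effect of centering on the intercept can be checked numerically. The sketch below assumes NumPy and simulated data; the sample size, the number of features, and the coefficient values are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 3
x = rng.normal(loc=5.0, size=(n, d))          # features with non-zero means
y = 4.0 + x @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.2, size=n)

# Center every feature column so that each has mean zero.
x_centered = x - x.mean(axis=0)
X = np.column_stack([np.ones(n), x_centered])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept = beta_hat[0]                        # matches y.mean() up to rounding

# Centering the response as well drives the fitted intercept to zero,
# while the slope coefficients stay the same.
y_centered = y - y.mean()
beta_c, *_ = np.linalg.lstsq(X, y_centered, rcond=None)
```

With centered features the fitted intercept coincides with the mean of `y`, and with a centered response it vanishes, so the column of ones could be dropped in that case.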

4.1.2. Evaluation

The coefficient of determination (R-squared) is a statistical metric that measures how much of the variation in the outcome can be explained by the variation in the independent variables. \(R^2\) never decreases as more predictors are added to the MLR model, even when the added predictors are unrelated to the outcome variable.

\(R^2\) by itself therefore cannot be used to identify which predictors should be included in a model and which should be excluded. For an OLS model with an intercept, \(R^2\) lies between \(0\) and \(1\), where \(0\) indicates that the independent variables explain none of the variation in the outcome and \(1\) indicates that the outcome can be predicted without error from the independent variables.
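How \(R^2\) reacts to an extra predictor can be illustrated with a small simulation. This is a sketch under stated assumptions: it uses NumPy, the usual definition \(R^2 = 1 - SS_{res}/SS_{tot}\), a hypothetical helper named `r_squared`, and simulated data in which `noise_predictor` is unrelated to the outcome by construction.

```python
import numpy as np

def r_squared(X, y):
    """R^2 = 1 - SS_res / SS_tot for an OLS fit of y on X (X includes a ones column)."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta_hat
    ss_res = residuals @ residuals
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(7)
n = 50
x1 = rng.normal(size=n)
y = 2.0 + 3.0 * x1 + rng.normal(size=n)
noise_predictor = rng.normal(size=n)           # generated independently of y

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, noise_predictor])

r2_small = r_squared(X_small, y)
r2_big = r_squared(X_big, y)                   # not smaller than r2_small
```

Because the larger model nests the smaller one, its residual sum of squares cannot be larger, so `r2_big` is at least `r2_small` even though the added column carries no information about `y`.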

When interpreting the results of a multiple regression, each beta coefficient describes the expected change in the outcome for a one-unit change in its predictor while holding all other variables constant (all else equal). The output from a multiple regression can be displayed horizontally as an equation, or vertically in table form.



