Ordinary linear regression: warmup

39.3. Ordinary linear regression: warmup

To warm up and get acquainted with the notation and formalism, let us work out a small example. Assume that we have collected two datapoints \(\data = [y_1,y_2]^T = [-3,3]^T\) for the predictor values \([x_1,x_2]^T = [-2,1]^T\).

This data could have come from any process, even a non-linear one. In fact, it is artificial data that I generated by evaluating the function \(y = 1 + 2x\) at \(x=x_1=-2\) and \(x=x_2=1\). Clearly, the data-generating mechanism is very simple and corresponds to a linear model \(y = \theta_0 + \theta_1 x\) with \([\theta_0,\theta_1] = [1,2]\). This is the kind of information we never have in reality. Indeed, we are always uncertain about the process that maps input to output, and as such our model \(M\) will always be wrong. We are also uncertain about the parameters \(\pars\) of our model. These are some of the fundamental reasons why it can be useful to operate with a Bayesian approach where we can assign probabilities to any quantity and statement. In this example, however, we will continue with the standard (frequentist) approach based on finding the parameters that minimize the sum of squared errors (i.e., the squared norm of the residual vector).
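The data generation described above can be reproduced with a few lines of NumPy. This is only a minimal sketch: the function name `true_process` and the variable names `x` and `y` are chosen here for illustration and do not appear in the text.

```python
import numpy as np


def true_process(x):
    """Hypothetical helper: the data-generating function y = 1 + 2x."""
    return 1.0 + 2.0 * x


# Predictor values [x_1, x_2] from the example
x = np.array([-2.0, 1.0])

# Evaluate the data-generating function to obtain the data vector
y = true_process(x)

print(y)
```

The printed data vector should agree with \(\data = [-3,3]^T\) as stated in the example.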

We will now assume a linear model with polynomial basis up to order one to model the data, i.e.,

\[ M(\pars;\inputt) = \para_0 + \para_1 \inputt, \]

which we can express in terms of a design matrix \(\dmat\) and (unknown) parameter vector \(\pars\) as \(M = \dmat \pars\).
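As a hedged illustration of the relation \(M = \dmat \pars\), the sketch below builds the design matrix for a polynomial basis up to order one and evaluates the model at the parameter values used to generate the data. It assumes NumPy; the variable names `X`, `theta_true` and `model_output` are illustrative choices, and `np.vander` is just one convenient way to build the matrix.

```python
import numpy as np

# Predictor values [x_1, x_2] from the example
x = np.array([-2.0, 1.0])

# Design matrix for the basis {1, x}: one row per datapoint,
# first column x^0 (all ones), second column x^1
X = np.vander(x, N=2, increasing=True)

# Parameters [theta_0, theta_1] that were used to generate the data
theta_true = np.array([1.0, 2.0])

# Model output as a matrix-vector product, M = X theta
model_output = X @ theta_true

print(X)
print(model_output)
```

With these parameter values the model output coincides with the data vector, since the data was generated without noise from the same linear model.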

In the present case the two unknowns \(\pars = [\para_0,\para_1]^T\) can be fit to the two datapoints \(\data = [-3,3]^T\) using pen and paper.

Exercise 39.1

In the example above you have two data points and two unknowns, which means you can easily solve for the model parameters using a conventional matrix inverse. Do the numerical calculation to make sure you have set up the problem correctly.

Exercise 39.2

Evaluate the normal equations for the design matrix \(\dmat\) and data vector \(\data\) in the example above.

Exercise 39.3

Evaluate the sample variance \(s^2\) for the example above. Do you think the result makes sense?