                     Linear regression model




     When setting the parameters of a linear regression model, the
effective model complexity, governed by the number of basis
functions, needs to be controlled according to the size of the
data set.

What is the effective model complexity? Answer: the flexibility
the model can actually express, determined by the number and form
of the basis functions and by the magnitude of the weight vector.

The simplest linear model for regression is one that  involves  a
linear combination of the input variables:

     y(x, w) = w0 + w1 x1 + ... + wD xD

This is often simply known as linear regression.

It is a linear function of the parameters w0, w1, ..., wD, where
D is the dimensionality of the input, and it is also a linear
function of the input variables; the latter imposes significant
limitations on the model.
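
As a concrete illustration, here is a minimal NumPy sketch of
this model; the numbers are placeholders, not from the text:

    import numpy as np

    def linear_model(x, w):
        # y(x, w) = w0 + w1*x1 + ... + wD*xD for a length-D input x
        return w[0] + np.dot(w[1:], x)

    x = np.array([0.5, -1.2, 3.0])        # one input with D = 3
    w = np.array([0.1, 2.0, -0.5, 1.5])   # bias w0, then w1..wD
    print(linear_model(x, w))             # 0.1 + 1.0 + 0.6 + 4.5 = 6.2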

We extend this by considering linear combinations of fixed
nonlinear functions of the input variables:

     y(x, w) = w0 + Σ_{j=1}^{M-1} wj φj(x)

The parameter w0 is called a bias parameter; it allows for a
fixed offset in the data.

If a dummy basis function φ0(x) = 1 is defined, the model takes
the compact form

     y(x, w) = Σ_{j=0}^{M-1} wj φj(x) = w^T φ(x)

where w = (w0, ..., w_{M-1})^T and φ = (φ0, ..., φ_{M-1})^T.

This linearity in the parameters greatly simplifies the analysis
of this class of models. However, it also leads to some
significant limitations.
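
To make the compact form concrete, here is a minimal sketch,
assuming a simple two-function basis (the basis choice and data
are illustrative only):

    import numpy as np

    def design_matrix(X, basis_funcs):
        # Phi[n, j] = phi_j(x_n); the first function is the dummy phi_0
        return np.column_stack([f(X) for f in basis_funcs])

    basis_funcs = [lambda x: np.ones_like(x),  # phi_0(x) = 1 (dummy)
                   lambda x: x]                # phi_1(x) = x
    X = np.linspace(0.0, 1.0, 5)               # five 1-D inputs
    w = np.array([0.5, -2.0])                  # w = (w0, w1)^T
    y = design_matrix(X, basis_funcs) @ w      # y = w^T phi(x) per row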

Polynomial regression is a particular example of this model, in
which there is a single input variable x and the basis functions
take the form of powers of x, φj(x) = x^j.
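
A sketch of such a polynomial basis (the helper name is mine, and
the degree is an arbitrary choice):

    def polynomial_basis(M):
        # phi_j(x) = x**j for j = 0, ..., M-1
        return [lambda x, j=j: x**j for j in range(M)]

    phis = polynomial_basis(4)     # 1, x, x^2, x^3
    print(phis[2](3.0))            # phi_2(3.0) = 9.0

The default argument j=j is needed so that each lambda captures
its own power rather than the final loop value.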

There are many other possible choices for the basis functions,
for example Gaussian basis functions of the form
φj(x) = exp(-(x - μj)^2 / (2 s^2)), where μj governs the location
and s the spatial scale. Note that they are not required to have
a probabilistic interpretation; in particular, the normalization
coefficient is unimportant because these basis functions will be
multiplied by adaptive parameters wj.
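
A corresponding sketch for Gaussian basis functions; the centres
μj and scale s are illustrative choices, typically spread over
the input range:

    import numpy as np

    def gaussian_basis(mus, s):
        # phi_j(x) = exp(-(x - mu_j)**2 / (2*s**2)); normalization
        # omitted, since each phi_j is multiplied by a weight w_j anyway
        return [lambda x, mu=mu: np.exp(-(x - mu)**2 / (2 * s**2))
                for mu in mus]

    phis = gaussian_basis(mus=np.linspace(0.0, 1.0, 9), s=0.1)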

Another possibility is the sigmoidal basis function of the form:

     φj(x) = σ((x - μj)/s),  where σ(a) = 1/(1 + exp(-a))

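A matching sketch for the sigmoidal basis (again with
illustrative centres and scale):

    import numpy as np

    def sigmoid(a):
        # logistic sigmoid sigma(a) = 1 / (1 + exp(-a))
        return 1.0 / (1.0 + np.exp(-a))

    def sigmoidal_basis(mus, s):
        # phi_j(x) = sigma((x - mu_j) / s)
        return [lambda x, mu=mu: sigmoid((x - mu) / s) for mu in mus]

    phis = sigmoidal_basis(mus=np.linspace(0.0, 1.0, 9), s=0.1)
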
Adding a regularization term to the log likelihood function means
the effective model complexity can then be controlled by the
value of the regularization coefficient, although the choice of
the number and form of the basis functions is of course still
important in determining the overall behaviour of the model.

Why does adding a regularization term work? Answer: the penalty
discourages the large weights characteristic of over-fitted
solutions, so fit to the data is traded off against model
complexity through the coefficient λ.

This leaves the issue of deciding the appropriate model
complexity for the particular problem, which cannot be determined
simply by maximizing the likelihood function, because this always
leads to excessively complex models and over-fitting.



Regularized least squares


     A regularization term is added to the error function in
order to control over-fitting, so that the total error function
to be minimized takes the form

     E(w) = E_D(w) + λ E_W(w)

where λ is the regularization coefficient that controls the
relative importance of the data-dependent error E_D(w) and the
regularization term E_W(w).

One of the simplest forms of regularizer is given by the
sum-of-squares of the weight vector elements,

     E_W(w) = (1/2) w^T w
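
Putting the pieces together, here is a minimal sketch of
regularized least squares with this quadratic regularizer (ridge
regression). It uses the standard closed-form solution
w = (λI + Φ^T Φ)^{-1} Φ^T t, which minimizes the total error
above when E_D(w) is the sum-of-squares data error; the data and
basis here are synthetic placeholders:

    import numpy as np

    def fit_ridge(Phi, t, lam):
        # minimize E_D(w) + lam * E_W(w) with E_W(w) = 0.5 * w^T w;
        # closed form: w = (lam*I + Phi^T Phi)^{-1} Phi^T t
        M = Phi.shape[1]
        return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi,
                               Phi.T @ t)

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 20)
    t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
    Phi = np.column_stack([x**j for j in range(10)])  # degree-9 basis
    w = fit_ridge(Phi, t, lam=1e-3)

With λ = 0 this reduces to ordinary least squares; increasing λ
shrinks the weights and hence the effective model complexity, as
described above.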