Saturday, February 23, 2019

Linear regression: Univariate and Multivariate model


Here is a simple univariate linear equation:
$$y=ax+b$$
The goal is to find the coefficients $a$ and $b$ that best fit the linear model to the data. As an example, I use the iris data set, keeping only sepal length and petal length, and predict petal length from sepal length.

Let sepal length be the input and petal length the variable to predict:
$$\begin{array}{cc}\text{Sepal length (cm)} & \text{Petal length (cm)}\\ \hline 5.1 & 1.4\\ 5.9 & 1.4\\ 4.7 & 1.3\\ \vdots & \vdots\end{array}$$

The iris data set contains 150 samples in total.
The figure below shows the relationship between sepal and petal length:


Rewrite in matrix form:
$$\begin{bmatrix}y_{0}\\ \vdots\\y_{n}\end{bmatrix}=\begin{bmatrix}x_{0}& 1\\\vdots & \vdots\\ x_{n}& 1\end{bmatrix}\times \begin{bmatrix}a\\ b \end{bmatrix},\quad\text{i.e.}\quad y=X\omega,\quad\text{where}\quad \omega = \begin{bmatrix}a\\ b \end{bmatrix}$$
The least-squares fit minimizes the squared error:
$$\min\limits_{\omega}||y-X\omega||^{2}$$
Take the matrix derivative with respect to $\omega$ and set it to zero:
$$\frac{\partial ||y-X\omega||^{2}}{\partial \omega}=\frac{\partial (y-X\omega)^{T}(y-X\omega)}{\partial \omega}=0$$
$$\Rightarrow \frac{\partial \left[y^{T}-(X\omega)^{T}\right](y-X\omega)}{\partial \omega}=\frac{\partial \left[y^{T}y-2y^{T}(X\omega)+(X\omega)^{T}(X\omega)\right]}{\partial \omega}=0$$
Since
$$\frac{\partial a^{T}x}{\partial x} = a\quad\text{and}\quad\frac{\partial x^{T}Bx}{\partial x}=(B+B^{T})x,$$
this gives
$$-2X^{T}y+2X^{T}X\omega=0$$
$$\Rightarrow X^{T}y=X^{T}X\omega\:\: \Rightarrow \omega = (X^{T}X)^{-1}X^{T}y$$
RSS (residual sum of squares):
$$RSS=\sum_{i=0}^{n}(y_{i}-y'_{i})^{2}$$
where $y'_{i}$ is the prediction of the estimated model $\omega$ for sample $i$.
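For the two-column $X$ used here, the matrix solution can also be written out in scalar form, which makes it easier to check by hand. This is just the standard expansion of $(X^{T}X)^{-1}X^{T}y$; the symbols $N=n+1$ (number of samples), $\bar{x}$ and $\bar{y}$ (sample means) are my own shorthand and are not used elsewhere in this post:
$$X^{T}X=\begin{bmatrix}\sum x_{i}^{2} & \sum x_{i}\\ \sum x_{i} & N\end{bmatrix},\quad X^{T}y=\begin{bmatrix}\sum x_{i}y_{i}\\ \sum y_{i}\end{bmatrix}$$
$$\Rightarrow a=\frac{N\sum x_{i}y_{i}-\sum x_{i}\sum y_{i}}{N\sum x_{i}^{2}-\left(\sum x_{i}\right)^{2}},\quad b=\bar{y}-a\bar{x}$$
The second equation says the fitted line always passes through the mean point $(\bar{x},\bar{y})$.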
Result for the iris data: $\omega=\begin{bmatrix}1.8575\\-7.0954\end{bmatrix}$
$RSS=111.348$
Simple and cool, isn't it?
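To see the formula in action without the iris file, here is a minimal sketch on toy data. The four points are made up for illustration (they lie exactly on $y=2x+1$), so the fit should recover $a=2$, $b=1$ with an RSS of essentially zero. I use the backslash operator instead of an explicit inverse, and `omega_ls` is just a cross-check name I introduced.

```matlab
% Toy data (hypothetical, not the iris set): points exactly on y = 2x + 1.
x = [1; 2; 3; 4];
y = 2*x + 1;

% Design matrix: one column for x, one column of ones for the intercept b.
X = [x, ones(size(x))];

% Normal-equation solution omega = (X'X)^(-1) X'y,
% solved with backslash rather than forming the inverse.
omega = (X'*X) \ (X'*y);

% Residual sum of squares of the fitted line.
pred = X*omega;
RSS = sum((y - pred).^2);

% Cross-check: backslash on X itself also returns the least-squares solution.
omega_ls = X \ y;
```

Both `omega` and `omega_ls` should agree up to rounding error; on real, noisy data the RSS will of course be larger than zero.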
The basis function here is only first order, so the model has high bias and low variance and cannot fit the data perfectly. Applying higher-order basis functions may reduce the prediction error; I will discuss this in the next few posts.

The multivariate model uses the same formula. The only difference is that $X$ gains one column per additional input variable and $\omega$ gains one coefficient:
$$\begin{bmatrix}y_{0}\\ \vdots\\y_{n}\end{bmatrix}=\begin{bmatrix}x_{00}& x_{10}& 1\\\vdots& \vdots & \vdots\\ x_{0n}&x_{1n}& 1\end{bmatrix}\times \begin{bmatrix}a\\ b\\ c \end{bmatrix}$$
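A small sketch of the two-input case, again on made-up data rather than the iris set. The values of `x0` and `x1` are hypothetical, and the target is constructed as $y=3x_{0}-2x_{1}+5$ so that the expected coefficients are known in advance:

```matlab
% Toy data (hypothetical): two inputs and a target built as y = 3*x0 - 2*x1 + 5.
x0 = [1; 2; 3; 4; 5];
x1 = [2; 1; 4; 3; 6];
y  = 3*x0 - 2*x1 + 5;

% Design matrix: one column per input plus a column of ones for the constant c.
X = [x0, x1, ones(size(x0))];

% Same normal-equation formula as in the univariate case; omega = [a; b; c].
omega = (X'*X) \ (X'*y);

% Residual sum of squares.
RSS = sum((y - X*omega).^2);
```

Nothing changes in the solver line; only the number of columns in `X` does.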

MATLAB code (iris data from Wikipedia):

clear;
close all;

% Load the iris data: column 1 is read as sepal length, column 3 as petal length.
m = readtable('iris.txt');
s_len = table2array(m(:,1));
p_len = table2array(m(:,3));

% Design matrix [x, 1] and target vector.
X = [s_len, ones(size(s_len))];
Y = p_len;

% Solve the normal equations X'*X*omega = X'*Y (backslash instead of inv).
omega = (X'*X) \ (X'*Y);

% Scatter plot of the data.
hold on;
scatter(s_len, p_len, 'bx');
xlabel('Sepal length (cm)');
ylabel('Petal length (cm)');

% Fitted line and residual sum of squares.
pred = X*omega;
plot(s_len, pred, 'r');
RSS = sum((pred - Y).^2);
title(['RSS = ' num2str(RSS)]);
