Hypothesis
First, assume a linear hypothesis: \(h_\omega(x)=\omega x+b\)
- When $\omega$ and $x$ are scalars, this is univariate linear regression
- When $\omega$ and $x$ are vectors, this is multivariate linear regression
Cost (loss; the squared loss function is used)
Loss function: \(L(f)=L(h_\omega(x))=\sum_{i=1}^{n}{(\omega x_i+b - y_i)^2}\)
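As a quick numeric check of this loss (a minimal sketch; the data values below are made up for illustration):

```python
import numpy as np

# Three toy points roughly on the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.1, 4.9, 7.2])
w, b = 2.0, 1.0

# L(f) = sum_i (w*x_i + b - y_i)^2
loss = np.sum((w * x + b - y) ** 2)
print(loss)  # 0.01 + 0.01 + 0.04 ≈ 0.06
```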
Optimization
The problem is thus to find the value of $\omega$ that minimizes the cost.
Gradient Descent is used here to solve $\text{minimize}(loss)$.
In fact, the linear regression problem also admits an analytical (closed-form) solution via ordinary least squares; see the sketch after the derivation below.
Let:
$W = (\omega_1, \omega_2, \ldots, \omega_d, b)^T$
$X$ be the $n \times (d+1)$ design matrix whose $i$-th row is the augmented sample $(x_{i1}, x_{i2}, \ldots, x_{id}, 1)$
$Y = (y_1, y_2, \ldots, y_n)^T$
Then: \(J(W) = \|XW-Y\|^2\), and the goal is \(W^* = \arg\min_W J(W)\)
To find the minimum, take the partial derivatives:
\(\frac{\partial J(W)}{\partial \omega_j} = \sum_{i=1}^{n} 2\,(h_\omega(x_i)-y_i)\,x_{ij}\)
The gradient points in the direction of steepest ascent, so stepping against it (gradient descent) gives the update rule:
\(\omega_j \leftarrow \omega_j - \alpha \frac{\partial J(W)}{\partial \omega_j}\)
Repeatedly updating $\omega$ in this way is the training phase.
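As noted above, minimizing $J(W)$ also has a closed-form least-squares solution: setting the gradient to zero yields the normal equation $X^TXW = X^TY$, hence \(W^* = (X^TX)^{-1}X^TY\). Below is a minimal numpy sketch (the function name `normal_equation` is illustrative; `np.linalg.lstsq` is numerically safer than forming the explicit inverse):

```python
import numpy as np

def normal_equation(X, y):
    '''Closed-form least squares: W* = (X^T X)^{-1} X^T y.'''
    # Append the bias column of ones, matching W = (w_1, ..., w_d, b)^T
    Xb = np.c_[X, np.ones(X.shape[0])]
    # lstsq solves min ||Xb @ W - y||^2 without explicitly inverting X^T X
    W, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return W
```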
Regularization
With many parameters the model easily overfits. Regularization adds a penalty on all parameters $\omega$, encouraging each to stay small.
L1 (Lasso Regression): \(minimize(loss) = \arg\min \|XW-Y\|^2 + \lambda\|\omega\|_1\)
L2 (Ridge Regression): \(minimize(loss) = \arg\min \|XW-Y\|^2 + \lambda\|\omega\|_2^2\)
L1 & L2 (Elastic Net Regression): \(minimize(loss) = \arg\min \|XW-Y\|^2 + \lambda\rho\|\omega\|_1 + \frac{\lambda(1-\rho)}{2}\|\omega\|_2^2\)
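These three penalties map directly onto scikit-learn's `Lasso`, `Ridge`, and `ElasticNet` estimators. A minimal sketch (the `alpha` value is illustrative; `l1_ratio` plays the role of $\rho$):

```python
from sklearn.linear_model import Lasso, Ridge, ElasticNet

lasso = Lasso(alpha=0.1)                    # L1 penalty
ridge = Ridge(alpha=0.1)                    # L2 penalty
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)  # mix of L1 and L2
# All three share the usual estimator API:
# model.fit(X_train, Y_train); model.predict(X_test)
```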
Code Implementation
```python
# coding=UTF-8
'''
@Author: Jove Wei
@Date: 2019-08-02 15:45:54
@LastEditors: Jove Wei
@LastEditTime: 2019-08-02 18:01:23
@Description: Handwritten Linear Regression
'''
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt


class LinearRegression():
    def __init__(self, X_train, Y_train, learning_rate=0.1, epochs=10000, threshold=0.0001):
        self.X_train = X_train
        self.Y_train = Y_train
        # theta holds (bias, w_1, ..., w_d); the bias column of ones comes first
        self.theta = np.random.rand(1, self.X_train.shape[1] + 1)
        self.grad = []
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.threshold = threshold

    def linear_hypo(self, x):
        '''
        Compute the hypothesis h(x) = theta . (1, x1, ..., xd)
        '''
        return np.dot(np.c_[np.ones(x.shape[0]), x], self.theta.T)

    def linear_grad(self):
        '''
        Compute the gradient, averaged over the training samples
        '''
        self.grad = np.dot((self.linear_hypo(self.X_train) - self.Y_train).T,
                           np.c_[np.ones(self.X_train.shape[0]), self.X_train])
        self.grad = self.grad / self.X_train.shape[0]

    def linear_cost(self, x, y):
        '''
        Compute the mean squared loss
        '''
        self.cost = 0.5 * np.mean((self.linear_hypo(x) - y) ** 2)

    def linear_grad_desc(self):
        '''
        Optimize theta with gradient descent; stop when the cost change
        drops below the threshold or the epoch budget is exhausted
        '''
        self.linear_cost(self.X_train, self.Y_train)
        cost_change = 1
        i = 1
        while cost_change > self.threshold and i < self.epochs:
            pre_cost = self.cost
            # update theta along the negative gradient direction
            self.linear_grad()
            self.theta = self.theta - self.learning_rate * self.grad
            self.linear_cost(self.X_train, self.Y_train)
            cost_change = abs(pre_cost - self.cost)
            i = i + 1

    def fit(self):
        '''
        Train the model (the counterpart of sklearn's fit)
        '''
        self.linear_grad_desc()

    def predict(self, x):
        '''
        Predict targets for new inputs
        '''
        y_pre = self.linear_hypo(x)
        return y_pre


def plot_line(x, y, y_hat, line_color="blue"):
    '''
    Plot the data points and the fitted line
    '''
    plt.scatter(x, y, color='black')
    plt.plot(x, y_hat, color=line_color, linewidth=3)
    plt.xticks(())
    plt.yticks(())
    plt.show()


def main():
    # Load the diabetes dataset bundled with sklearn
    dataset = datasets.load_diabetes()
    # Keep a single feature dimension
    X = dataset.data[:, 2]
    Y = dataset.target
    # Split into training and test sets
    X_train = X[:-50, None]
    Y_train = Y[:-50, None]
    X_test = X[-50:, None]
    Y_test = Y[-50:, None]
    lr = LinearRegression(X_train, Y_train)
    lr.fit()
    y_pre = lr.predict(X_test)
    plot_line(X_test, Y_test, y_pre)


if __name__ == '__main__':
    main()
```
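To sanity-check the handwritten class, its output can be compared against scikit-learn's own estimator on the same split (a minimal sketch; assumes the `X_train`/`Y_train`/`X_test` arrays from `main()` are in scope):

```python
from sklearn.linear_model import LinearRegression as SkLinearRegression

sk_lr = SkLinearRegression()
sk_lr.fit(X_train, Y_train)
# Coefficients should be close to the gradient-descent theta,
# and predictions close to y_pre from the handwritten model
print(sk_lr.coef_, sk_lr.intercept_)
print(sk_lr.predict(X_test)[:5])
```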