Least Squares and the Normal Equation in Machine Learning
Each sample has an error

$$e_i = \hat{y}_i - y_i$$

If the errors were summed directly, positive and negative values would cancel each other out, so each error is squared first. The loss is

$$J(\theta) = \sum_i (\hat{y}_i - y_i)^2$$

In machine learning, suppose there are $m$ samples. All inputs are written as a matrix:

$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}$$

Parameters:

$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}$$

True values:

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$

Predicted values: for a single sample, $\hat{y}_i = \theta_0 + \theta_1 x_i$; for all samples at once,

$$\hat{y} = X\theta$$

Error, in vector form:

$$e = \hat{y} - y = X\theta - y$$

Loss:

$$J(\theta) = (X\theta - y)^\intercal (X\theta - y) = \theta^\intercal X^\intercal X \theta - 2 y^\intercal X \theta + y^\intercal y$$

The two cross terms $\theta^\intercal X^\intercal y$ and $y^\intercal X \theta$ are scalars and transposes of each other, so they combine into $-2 y^\intercal X \theta$. The term $y^\intercal y$ does not depend on $\theta$ and vanishes under differentiation.

Differentiating with respect to $\theta$ and setting the result to zero:

$$\frac{\partial J}{\partial \theta} = 2 X^\intercal X \theta - 2 X^\intercal y = 0$$

This gives the normal equation:

$$X^\intercal X \theta = X^\intercal y$$

Provided $X^\intercal X$ is invertible, the solution is

$$\theta = (X^\intercal X)^{-1} X^\intercal y$$
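As a rough illustration of the normal equation, here is a minimal NumPy sketch for the single-feature case. The function name `fit_normal_equation` and the sample data are made up for this example and are not from the original. The sketch solves the linear system $X^\intercal X \theta = X^\intercal y$ with `np.linalg.solve` instead of computing the inverse explicitly. This gives the same $\theta$ when $X^\intercal X$ is invertible and is usually better behaved numerically.

```python
import numpy as np


def fit_normal_equation(x, y):
    """Fit y ~ theta_0 + theta_1 * x by solving X^T X theta = X^T y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones (for theta_0) next to the inputs.
    X = np.column_stack([np.ones_like(x), x])
    # Solve the normal equation as a linear system; this assumes
    # X^T X is invertible (here: the x values are not all identical).
    theta = np.linalg.solve(X.T @ X, X.T @ y)
    return X, theta


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

X, theta = fit_normal_equation(x, y)
print("theta:", theta)  # should be close to [1, 2]

# At the solution the gradient 2 X^T X theta - 2 X^T y should be ~0.
gradient = 2 * X.T @ X @ theta - 2 * X.T @ y
print("gradient:", gradient)
```

If $X^\intercal X$ is singular or badly conditioned, `np.linalg.solve` may fail or lose accuracy. In that case a least-squares routine such as `np.linalg.lstsq(X, y, rcond=None)` is a common alternative.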