[Deep Learning] Key Techniques


Optimization Algorithms: Details and Code Examples

Optimization algorithms are a key component of deep learning: they adjust a neural network's weights and biases to minimize the value of the loss function. The most common algorithms are introduced below, each with an explanation and a code example.

1. Gradient Descent

Principle:

Compute the gradient of the loss function with respect to the parameters and update the parameters in the direction of steepest descent.

Update rule:

θ ← θ − η · ∇θ J(θ)

η: the learning rate, controlling the step size. ∇θ J(θ): the gradient of the loss with respect to the parameters.

Variants:

Batch Gradient Descent:
uses the entire training set to compute each gradient. Pros: stable convergence. Cons: computationally expensive, especially on large datasets.

Stochastic Gradient Descent (SGD):
uses a single sample per update. Pros: fast, well suited to large-scale data. Cons: noisy updates that tend to oscillate.

Mini-Batch Gradient Descent:
uses a small batch of samples per update. Pros: balances computational efficiency against convergence stability.

Code example:

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

# Gradient descent
def gradient_descent(initial_theta, learning_rate, epochs):
    theta = initial_theta
    for epoch in range(epochs):
        grad = gradient(theta)
        theta = theta - learning_rate * grad
        print(f"Epoch {epoch + 1}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

gradient_descent(initial_theta=10, learning_rate=0.1, epochs=20)

Output:

Epoch 1, Theta: 8.0, Loss: 64.0

Epoch 2, Theta: 6.4, Loss: 40.96000000000001

Epoch 3, Theta: 5.12, Loss: 26.2144

Epoch 4, Theta: 4.096, Loss: 16.777216

Epoch 5, Theta: 3.2768, Loss: 10.73741824

Epoch 6, Theta: 2.62144, Loss: 6.871947673600001

Epoch 7, Theta: 2.0971520000000003, Loss: 4.398046511104002

Epoch 8, Theta: 1.6777216000000004, Loss: 2.8147497671065613

Epoch 9, Theta: 1.3421772800000003, Loss: 1.801439850948199

Epoch 10, Theta: 1.0737418240000003, Loss: 1.1529215046068475

Epoch 11, Theta: 0.8589934592000003, Loss: 0.7378697629483825

Epoch 12, Theta: 0.6871947673600002, Loss: 0.47223664828696477

Epoch 13, Theta: 0.5497558138880001, Loss: 0.3022314549036574

Epoch 14, Theta: 0.43980465111040007, Loss: 0.19342813113834073

Epoch 15, Theta: 0.35184372088832006, Loss: 0.12379400392853807

Epoch 16, Theta: 0.281474976710656, Loss: 0.07922816251426434

Epoch 17, Theta: 0.22517998136852482, Loss: 0.050706024009129186

Epoch 18, Theta: 0.18014398509481985, Loss: 0.03245185536584268

Epoch 19, Theta: 0.14411518807585588, Loss: 0.020769187434139313

Epoch 20, Theta: 0.11529215046068471, Loss: 0.013292279957849162
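The three variants differ only in how many samples feed each gradient estimate. A minimal mini-batch sketch on a hypothetical one-weight regression problem (the data, learning rate, and batch size are all illustrative, not from the original):

```python
import numpy as np

# Hypothetical data: y = 3x + noise; we fit a single weight w.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = 3 * X + rng.normal(0, 0.1, size=200)

def minibatch_gd(X, y, lr=0.1, batch_size=20, epochs=50):
    w = 0.0
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)               # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # gradient of the mean squared error (1/2)(w*x - y)^2 w.r.t. w
            grad = np.mean((w * X[batch] - y[batch]) * X[batch])
            w -= lr * grad
    return w

w = minibatch_gd(X, y)
print(w)  # should land close to the true slope 3
```

Setting batch_size equal to the dataset size recovers batch gradient descent, and batch_size=1 recovers SGD.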

2. Momentum

Principle:

Adds a momentum term to gradient descent, simulating physical inertia, which helps the iterates avoid getting stuck prematurely in local minima.

Update rule:

v ← γ · v + η · ∇θ J(θ)
θ ← θ − v

γ: the momentum factor, typically 0.9.

Code example:

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

# Gradient descent with momentum
def gradient_descent_with_momentum(initial_theta, learning_rate, gamma, epochs):
    theta = initial_theta
    velocity = 0
    for epoch in range(epochs):
        grad = gradient(theta)
        velocity = gamma * velocity + learning_rate * grad
        theta = theta - velocity
        print(f"Epoch {epoch + 1}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

gradient_descent_with_momentum(initial_theta=10, learning_rate=0.1, gamma=0.9, epochs=20)

Output:

Epoch 1, Theta: 8.0, Loss: 64.0

Epoch 2, Theta: 4.6, Loss: 21.159999999999997

Epoch 3, Theta: 0.6199999999999992, Loss: 0.384399999999999

Epoch 4, Theta: -3.0860000000000007, Loss: 9.523396000000005

Epoch 5, Theta: -5.8042, Loss: 33.68873764

Epoch 6, Theta: -7.089739999999999, Loss: 50.264413267599984

Epoch 7, Theta: -6.828777999999999, Loss: 46.63220897328399

Epoch 8, Theta: -5.228156599999998, Loss: 27.333621434123543

Epoch 9, Theta: -2.7419660199999982, Loss: 7.518377654834631

Epoch 10, Theta: 0.04399870600000133, Loss: 0.0019358861296745532

Epoch 11, Theta: 2.5425672182000008, Loss: 6.46464805906529

Epoch 12, Theta: 4.28276543554, Loss: 18.342079775856124

Epoch 13, Theta: 4.9923907440379995, Loss: 24.92396534115629

Epoch 14, Theta: 4.632575372878599, Loss: 21.460754585401293

Epoch 15, Theta: 3.382226464259419, Loss: 11.439455855536771

Epoch 16, Theta: 1.580467153650273, Loss: 2.4978764237673956

Epoch 17, Theta: -0.3572096566280132, Loss: 0.12759873878830308

Epoch 18, Theta: -2.029676854552868, Loss: 4.119588133907623

Epoch 19, Theta: -3.128961961774664, Loss: 9.790402958232752

Epoch 20, Theta: -3.4925261659193474, Loss: 12.197739019631296
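The oscillation in the output above comes from momentum amplifying the effective step by roughly 1/(1 − γ) = 10× at steady state. A quick sketch on the same quadratic (the smaller learning rate is an illustrative choice, not one of the original settings):

```python
# Same quadratic J(theta) = theta^2; learning rate reduced from 0.1 to 0.01
# so that the momentum-amplified step no longer overshoots the minimum.
theta, velocity = 10.0, 0.0
for _ in range(100):
    grad = 2 * theta
    velocity = 0.9 * velocity + 0.01 * grad
    theta -= velocity
print(theta ** 2)  # loss should now be near zero
```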

3. Adagrad

Principle:

Adaptively scales the learning rate using the history of past gradients, so that the update magnitude depends on how large the gradients have been.

Update rule:

G ← G + g²    (g = ∇θ J(θ))
θ ← θ − η / (√G + ε) · g

G: the accumulated sum of squared gradients. ε: a small constant that prevents division by zero.

Pros and cons:

Pros: well suited to sparse-data problems. Cons: the effective learning rate only ever shrinks, so convergence slows in later stages.

Code example:

import numpy as np

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

def adagrad(initial_theta, learning_rate, epsilon, epochs):
    theta = initial_theta
    g_square_sum = 0
    for epoch in range(epochs):
        grad = gradient(theta)
        g_square_sum += grad ** 2
        adjusted_lr = learning_rate / (np.sqrt(g_square_sum) + epsilon)
        theta = theta - adjusted_lr * grad
        print(f"Epoch {epoch + 1}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

adagrad(initial_theta=10, learning_rate=0.1, epsilon=1e-8, epochs=20)

Output:

Epoch 1, Theta: 9.90000000005, Loss: 98.01000000098999

Epoch 2, Theta: 9.829645540282808, Loss: 96.6219314476017

Epoch 3, Theta: 9.77237939498734, Loss: 95.49939903957312

Epoch 4, Theta: 9.722903358081876, Loss: 94.53484971059981

Epoch 5, Theta: 9.678738726594363, Loss: 93.67798333767746

Epoch 6, Theta: 9.638492461105155, Loss: 92.90053692278092

Epoch 7, Theta: 9.60129025649987, Loss: 92.18477458955935

Epoch 8, Theta: 9.566541030371654, Loss: 91.51870728578436

Epoch 9, Theta: 9.533823158916471, Loss: 90.89378402549204

Epoch 10, Theta: 9.50282343669911, Loss: 90.30365326907788

Epoch 11, Theta: 9.473301675536542, Loss: 89.74344463572345

Epoch 12, Theta: 9.44506890053656, Loss: 89.20932653588291

Epoch 13, Theta: 9.417973260913987, Loss: 88.69822034329084

Epoch 14, Theta: 9.391890561942256, Loss: 88.20760832750003

Epoch 15, Theta: 9.366717691104768, Loss: 87.73540030485503

Epoch 16, Theta: 9.342367925786823, Loss: 87.27983846077038

Epoch 17, Theta: 9.318767503595812, Loss: 86.83942778607332

Epoch 18, Theta: 9.295853063444168, Loss: 86.41288417714433

Epoch 19, Theta: 9.273569701595868, Loss: 85.99909501035687

Epoch 20, Theta: 9.251869471188973, Loss: 85.59708871191853
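The shrinking step size called out above can be seen directly by logging the adjusted learning rate over a longer run on the same quadratic (the epoch count is illustrative):

```python
import numpy as np

theta, g_square_sum = 10.0, 0.0
lrs = []
for _ in range(1000):
    grad = 2 * theta
    g_square_sum += grad ** 2                      # only ever grows
    adjusted_lr = 0.1 / (np.sqrt(g_square_sum) + 1e-8)
    theta -= adjusted_lr * grad
    lrs.append(adjusted_lr)
print(lrs[0], lrs[99], lrs[999])  # the effective step keeps shrinking
```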

4. RMSprop

Principle:

RMSprop is a refinement of Adagrad: it replaces the cumulative sum of squared gradients with an exponentially weighted moving average, which stops the learning rate from shrinking toward zero.

Update rule:

E[g²] ← γ · E[g²] + (1 − γ) · g²
θ ← θ − η / (√E[g²] + ε) · g

Code example:

import numpy as np

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

def rmsprop(initial_theta, learning_rate, gamma, epsilon, epochs):
    theta = initial_theta
    g_square_ema = 0
    for epoch in range(epochs):
        grad = gradient(theta)
        g_square_ema = gamma * g_square_ema + (1 - gamma) * grad ** 2
        adjusted_lr = learning_rate / (np.sqrt(g_square_ema) + epsilon)
        theta = theta - adjusted_lr * grad
        print(f"Epoch {epoch + 1}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

rmsprop(initial_theta=10, learning_rate=0.1, gamma=0.9, epsilon=1e-8, epochs=20)

Output:

Epoch 1, Theta: 9.683772234483161, Loss: 93.775444689347

Epoch 2, Theta: 9.457880248061212, Loss: 89.4514987866664

Epoch 3, Theta: 9.270530978786274, Loss: 85.94274462863599

Epoch 4, Theta: 9.105434556281987, Loss: 82.90893845873414

Epoch 5, Theta: 8.955067099353235, Loss: 80.19322675391875

Epoch 6, Theta: 8.81524826858932, Loss: 77.708602036867

Epoch 7, Theta: 8.68338298015491, Loss: 75.40113998004396

Epoch 8, Theta: 8.557735821002467, Loss: 73.23484238206876

Epoch 9, Theta: 8.437082563261683, Loss: 71.18436217929433

Epoch 10, Theta: 8.3205241519636, Loss: 69.23112216340958

Epoch 11, Theta: 8.207379341266703, Loss: 67.36107565145147

Epoch 12, Theta: 8.09711886476205, Loss: 65.56333391008548

Epoch 13, Theta: 7.989323078410318, Loss: 63.82928325121972

Epoch 14, Theta: 7.883653610798953, Loss: 62.15199425506338

Epoch 15, Theta: 7.779833754629418, Loss: 60.52581324967126

Epoch 16, Theta: 7.677634521316577, Loss: 58.94607184291202

Epoch 17, Theta: 7.5768644832322165, Loss: 57.4088753972658

Epoch 18, Theta: 7.4773622196930445, Loss: 55.910945764492894

Epoch 19, Theta: 7.378990596134174, Loss: 54.449502217836574

Epoch 20, Theta: 7.281632361364819, Loss: 53.02216984607539
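The contrast with Adagrad shows up on the same quadratic: with identical hyperparameters, Adagrad's cumulative sum keeps suppressing the step while RMSprop's moving average does not (the epoch count below is illustrative):

```python
import numpy as np

def adagrad_loss(epochs=200):
    theta, s = 10.0, 0.0
    for _ in range(epochs):
        g = 2 * theta
        s += g ** 2                          # cumulative sum: only grows
        theta -= 0.1 / (np.sqrt(s) + 1e-8) * g
    return theta ** 2

def rmsprop_loss(epochs=200):
    theta, s = 10.0, 0.0
    for _ in range(epochs):
        g = 2 * theta
        s = 0.9 * s + 0.1 * g ** 2           # EMA: forgets old gradients
        theta -= 0.1 / (np.sqrt(s) + 1e-8) * g
    return theta ** 2

print(adagrad_loss(), rmsprop_loss())  # RMSprop ends far lower
```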

5. Adam (Adaptive Moment Estimation)

Principle:

Combines the ideas of momentum and RMSprop, estimating both the first moment (mean) and the second moment (uncentered variance) of the gradients.

Update rule:

m ← β₁ · m + (1 − β₁) · g
v ← β₂ · v + (1 − β₂) · g²
m̂ = m / (1 − β₁ᵗ),   v̂ = v / (1 − β₂ᵗ)
θ ← θ − η · m̂ / (√v̂ + ε)

Code example:

import numpy as np

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

def adam(initial_theta, learning_rate, beta1, beta2, epsilon, epochs):
    theta = initial_theta
    m, v = 0, 0
    for epoch in range(1, epochs + 1):
        grad = gradient(theta)
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** epoch)
        v_hat = v / (1 - beta2 ** epoch)
        theta = theta - (learning_rate / (np.sqrt(v_hat) + epsilon)) * m_hat
        print(f"Epoch {epoch}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

adam(initial_theta=10, learning_rate=0.1, beta1=0.9, beta2=0.999, epsilon=1e-8, epochs=20)

Output:

Epoch 1, Theta: 9.90000000005, Loss: 98.01000000098999

Epoch 2, Theta: 9.800027459059471, Loss: 96.04053819831964

Epoch 3, Theta: 9.70010099242815, Loss: 94.09195926330557

Epoch 4, Theta: 9.600239395419266, Loss: 92.16459644936006

Epoch 5, Theta: 9.500461600614251, Loss: 90.2587706247459

Epoch 6, Theta: 9.40078663510384, Loss: 88.37478935874698

Epoch 7, Theta: 9.30123357774574, Loss: 86.51294606778484

Epoch 8, Theta: 9.201821516812585, Loss: 84.67351922727505

Epoch 9, Theta: 9.102569508342574, Loss: 82.85677165420798

Epoch 10, Theta: 9.003496535489624, Loss: 81.06294986457367

Epoch 11, Theta: 8.904621469150118, Loss: 79.29228350884921

Epoch 12, Theta: 8.80596303012035, Loss: 77.5449848878464

Epoch 13, Theta: 8.70753975301269, Loss: 75.82124855029629

Epoch 14, Theta: 8.60936995213032, Loss: 74.12125097264443

Epoch 15, Theta: 8.511471689470543, Loss: 72.44515032065854

Epoch 16, Theta: 8.41386274499579, Loss: 70.79308629162809

Epoch 17, Theta: 8.31656058928038, Loss: 69.16518003517162

Epoch 18, Theta: 8.219582358610113, Loss: 67.56153414997459

Epoch 19, Theta: 8.122944832581695, Loss: 65.98223275316566

Epoch 20, Theta: 8.026664414220157, Loss: 64.42734161850822
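The bias-correction terms m_hat and v_hat matter most in the first few steps, while m and v are still close to their zero initialization. A worked first step with the same hyperparameters:

```python
beta1 = 0.9
grad = 20.0                              # first gradient, at theta = 10
m = beta1 * 0.0 + (1 - beta1) * grad     # ~2.0: far smaller than the true gradient
m_hat = m / (1 - beta1 ** 1)             # ~20.0: corrected back to the right scale
print(m, m_hat)
```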

Comparison Summary

Algorithm        | Adaptive LR | Uses momentum | Suits sparse data | Convergence     | Typical use
Gradient Descent | No          | No            | No                | Relatively slow | Baseline optimizer
Momentum         | No          | Yes           | No                | Relatively fast | Escaping local minima
Adagrad          | Yes         | No            | Yes               | Relatively fast | Sparse-feature data
RMSprop          | Yes         | No            | Yes               | Relatively fast | Deep learning, especially RNNs
Adam             | Yes         | Yes           | Yes               | Relatively fast | Default optimizer in deep learning

Choosing among these algorithms according to the task and the model's needs can significantly improve training efficiency and performance.
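As a closing sanity check (not part of the original comparison), the first and last of these optimizers can be run head-to-head on the toy quadratic used throughout. Note that on such an easy, well-conditioned problem a plainly tuned gradient descent is hard to beat; the adaptive methods pay off mainly on ill-conditioned or sparse problems:

```python
import numpy as np

# Head-to-head on J(theta) = theta^2 from theta = 10 (epoch count illustrative).
def gd_loss(epochs=200, lr=0.1):
    theta = 10.0
    for _ in range(epochs):
        theta -= lr * 2 * theta
    return theta ** 2

def adam_loss(epochs=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    theta, m, v = 10.0, 0.0, 0.0
    for k in range(1, epochs + 1):
        g = 2 * theta
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat, v_hat = m / (1 - b1 ** k), v / (1 - b2 ** k)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta ** 2

print(gd_loss(), adam_loss())  # exact numbers depend on the run length
```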
