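This post is a study log from the 365天深度学习训练营 (365-Day Deep Learning Training Camp). Original author: K同学啊.

Contents

1. Introduction
2. Environment
3. Implementation
   3.1 Preparation
       3.1.1 Set up the GPU and import libraries
       3.1.2 Load the data
   3.2 Building and training the model
       3.2.1 Define the DenseNet model
       3.2.2 Model overview
       3.2.3 Define the training and test functions
       3.2.4 Train the model (training configuration, training strategy)
4. Model evaluation
   4.1 Visualize the training process
   4.2 Load the best model and evaluate
5. Summary

1. Introduction

| Item | Details |
|---|---|
| Model | DenseNet-121 (growth_rate=32, block_config=(6, 12, 24, 16)) |
| Task | Three-class image classification (Normal / Mild / Severe) |
| Dataset | 1661 images, 80/20 split |
| Best performance | Test accuracy 94.3%, test loss 0.162 |

2. Environment

- Language: Python 3.14.6
- Editor: Jupyter Notebook
- Deep learning framework: PyTorch (torch 2.12.1, torchvision 0.27.1)

3. Implementation

3.1 Preparation

3.1.1 Set up the GPU and import libraries

Import PyTorch, torchvision and the other libraries, configure a Chinese font for matplotlib, and select the GPU automatically when one is available (CPU otherwise).

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import transforms, datasets
import warnings
import copy
import matplotlib.pyplot as plt
from datetime import datetime
from torchsummary import summary

warnings.filterwarnings("ignore")
plt.rcParams["figure.dpi"] = 100
plt.rcParams["font.sans-serif"] = ["SimHei"]   # font with Chinese glyphs
plt.rcParams["axes.unicode_minus"] = False     # render minus signs correctly

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
```

3.1.2 Load the data

```python
data_dir = "./Data/data/"

# Resize to 224x224 and normalize with the ImageNet mean and std
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

total_data = datasets.ImageFolder(data_dir, transform=train_transforms)
print(f"Classes: {total_data.class_to_idx}")
print(f"Total samples: {len(total_data)}")

# 80/20 random split into training and test sets
train_size = int(0.8 * len(total_data))
test_size = len(total_data) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(
    total_data, [train_size, test_size])

batch_size = 16
train_dl = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_dl = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size)

for X, y in test_dl:
    print(f"Batch shape: {X.shape}, Labels: {y.shape}")
    break
```

3.2 Building and training the model

3.2.1 Define the DenseNet model

This section implements the classic DenseNet (Densely Connected Convolutional Network) architecture. Its core idea is dense connectivity: every layer receives the feature maps of all preceding layers as input.

The four core components:

| Component | Role |
|---|---|
| Bottleneck | BN → ReLU → 1×1 Conv (maps the channels to 4×growth_rate) → BN → ReLU → 3×3 Conv. The output is concatenated with the input along the channel dimension (torch.cat), which is what makes the connection dense. |
| DenseBlock | A stack of Bottleneck layers. The input channel count grows linearly with the layer index i: in_channels + i × growth_rate. |
| Transition | Sits between two DenseBlocks. A 1×1 convolution halves the channel count, then 2×2 average pooling shrinks the feature map. |
| DenseNet (full network) | 7×7 Conv → BN → ReLU → MaxPool → 4 DenseBlocks (with Transitions in between) → BN → ReLU → GlobalAvgPool → FC |

In this experiment num_classes=3, matching the three-class task (Normal / Mild / Severe).

```python
class Bottleneck(nn.Module):
    def __init__(self, in_channels, growth_rate):
        super(Bottleneck, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, 4 * growth_rate, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(4 * growth_rate)
        self.conv2 = nn.Conv2d(4 * growth_rate, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        out = torch.cat([out, x], 1)   # dense connection: append new features to the input
        return out


class DenseBlock(nn.Module):
    def __init__(self, in_channels, num_Layers, growth_rate):
        super(DenseBlock, self).__init__()
        # layer i sees the block input plus i * growth_rate channels from earlier layers
        self.layers = nn.ModuleList([
            Bottleneck(in_channels + i * growth_rate, growth_rate)
            for i in range(num_Layers)
        ])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


class Transition(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Transition, self).__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        out = self.conv(F.relu(self.bn(x)))
        out = self.avg_pool(out)
        return out


class DenseNet(nn.Module):
    def __init__(self, growth_rate=32, block_config=(6, 12, 24, 16), num_classes=1000):
        super(DenseNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.max_pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        num_features = 64
        self.blocks = nn.ModuleList([])
        for i, num_Layers in enumerate(block_config):
            block = DenseBlock(num_features, num_Layers, growth_rate)
            self.blocks.append(block)
            num_features = num_features + num_Layers * growth_rate
            if i != len(block_config) - 1:   # no Transition after the last block
                trans = Transition(num_features, num_features // 2)
                self.blocks.append(trans)
                num_features = num_features // 2

        self.bn_final = nn.BatchNorm2d(num_features)
        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(num_features, num_classes)

    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu(out)
        out = self.max_pool(out)
        for block in self.blocks:
            out = block(out)
        out = self.bn_final(out)
        out = F.relu(out)
        out = self.avg_pool(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DenseNet(num_classes=3).to(device)
summary(model, input_size=(3, 32, 32))
```

3.2.2 Model overview

A random input is used to check that the forward pass works:

- Input: (1, 3, 224, 224), one 224×224 RGB image
- Output: (1, 3), the prediction scores (logits) for the 3 classes

Parameter count: the model has 6,956,931 parameters, all of them trainable (no pretrained weights are loaded and nothing is frozen). Compared with ResNet-50 (about 25 million parameters), DenseNet-121 is more parameter-efficient, thanks to the channel compression of the 1×1 convolutions and to feature reuse.

```python
# Test the forward pass
x = torch.randn(1, 3, 224, 224).to(device)
out = model(x)
print(f"Input shape: {x.shape}")
print(f"Output shape: {out.shape}")

# Parameter count
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total params: {total_params:,}")
print(f"Trainable params: {trainable_params:,}")
```

3.2.3 Define the training and test functions

train() function (training phase):

- For each mini-batch, run the standard cycle: forward pass → compute loss → backward pass → parameter update
- Accumulate the loss and the number of correct predictions for each batch
- Return the average loss and the overall accuracy (total correct / total samples)

test() function (test/validation phase):

- Use torch.no_grad() to disable gradient tracking, which reduces memory use and speeds up inference
- Make no parameter updates; only evaluate how well the model generalizes to the test set
- Return the average loss and the overall accuracy, as train() does

```python
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    train_loss, train_acc = 0, 0

    for X, y in dataloader:
        X, y = X.to(device), y.to(device)

        pred = model(X)
        loss = loss_fn(pred, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()

    return train_loss / num_batches, train_acc / size


def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, test_acc = 0, 0

    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)

            target_pred = model(imgs)
            loss = loss_fn(target_pred, target)

            test_acc += (target_pred.argmax(1) == target).type(torch.float).sum().item()
            test_loss += loss.item()

    return test_loss / num_batches, test_acc / size
```

3.2.4 Train the model

Training configuration:

- Optimizer: AdamW (an adaptive learning-rate optimizer with momentum and weight-decay regularization), learning rate lr=1e-4
- Loss function: CrossEntropyLoss (cross-entropy loss, suited to multi-class classification)
- Epochs: 10

Training strategy:

- After each epoch, evaluate on the test set and record the training and test loss and accuracy
- Best-model saving: whenever the test accuracy exceeds the best value so far, deep-copy the current model
- After training, save the best model to ./best_resnet50v2.pth (the file name is kept from the original code; the weights are those of the DenseNet model)

The best model appears at epoch 9, with a test accuracy of 94.3%.

```python
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

epochs = 10
train_loss, train_acc = [], []
test_loss, test_acc = [], []
best_acc = 0

for epoch in range(epochs):
    model.train()
    train_epoch_loss, train_epoch_acc = train(train_dl, model, loss_fn, optimizer)

    model.eval()
    epoch_test_loss, epoch_test_acc = test(test_dl, model, loss_fn)

    # keep a copy of the model with the best test accuracy so far
    if epoch_test_acc > best_acc:
        best_acc = epoch_test_acc
        best_model_wts = copy.deepcopy(model)

    train_acc.append(train_epoch_acc)
    train_loss.append(train_epoch_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)

    lr = optimizer.param_groups[0]["lr"]
    print(f"Epoch:{epoch + 1:2d}, Train_acc:{train_epoch_acc * 100:.1f}%, Train_loss:{train_epoch_loss:.3f}, "
          f"Test_acc:{epoch_test_acc * 100:.1f}%, Test_loss:{epoch_test_loss:.3f}, Lr:{lr:.2E}")

PATH = "./best_resnet50v2.pth"   # file name kept from the original code
torch.save(best_model_wts.state_dict(), PATH)
print("Done.")
```

```
Epoch: 1, Train_acc:71.5%, Train_loss:0.752, Test_acc:74.2%, Test_loss:0.649, Lr:1.00E-04
Epoch: 2, Train_acc:78.7%, Train_loss:0.568, Test_acc:83.8%, Test_loss:0.425, Lr:1.00E-04
Epoch: 3, Train_acc:81.6%, Train_loss:0.482, Test_acc:76.3%, Test_loss:0.546, Lr:1.00E-04
Epoch: 4, Train_acc:83.3%, Train_loss:0.432, Test_acc:79.3%, Test_loss:0.596, Lr:1.00E-04
Epoch: 5, Train_acc:85.5%, Train_loss:0.399, Test_acc:80.2%, Test_loss:0.438, Lr:1.00E-04
Epoch: 6, Train_acc:87.3%, Train_loss:0.348, Test_acc:83.8%, Test_loss:0.375, Lr:1.00E-04
Epoch: 7, Train_acc:89.6%, Train_loss:0.284, Test_acc:86.5%, Test_loss:0.312, Lr:1.00E-04
Epoch: 8, Train_acc:91.4%, Train_loss:0.255, Test_acc:85.9%, Test_loss:0.389, Lr:1.00E-04
Epoch: 9, Train_acc:92.2%, Train_loss:0.221, Test_acc:94.3%, Test_loss:0.162, Lr:1.00E-04
Epoch:10, Train_acc:91.7%, Train_loss:0.225, Test_acc:93.1%, Test_loss:0.203, Lr:1.00E-04
Done.
```

4. Model evaluation

4.1 Visualize the training process

Matplotlib is used to plot the training and test accuracy and loss per epoch as two line charts.

```python
current_time = datetime.now()

epochs_range = range(epochs)

plt.figure(figsize=(12, 3))

plt.subplot(1, 2, 1)
plt.plot(epochs_range, train_acc, label="Training Accuracy")
plt.plot(epochs_range, test_acc, label="Test Accuracy")
plt.legend(loc="lower right")
plt.title("Training and Validation Accuracy")
plt.xlabel(current_time)   # timestamp used as the x-axis label

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label="Training Loss")
plt.plot(epochs_range, test_loss, label="Test Loss")
plt.legend(loc="upper right")
plt.title("Training and Validation Loss")

plt.show()
```

4.2 Load the best model and evaluate

Load the best weights saved during training, run a final evaluation on the test set, and print the test accuracy and loss.

```python
best_model_wts.load_state_dict(torch.load(PATH, map_location=device))
test_epoch_loss, test_epoch_acc = test(test_dl, best_model_wts, loss_fn)
print(f"Best model - Test Acc: {test_epoch_acc * 100:.1f}%, Test Loss: {test_epoch_loss:.3f}")
```

```
Best model - Test Acc: 94.3%, Test Loss: 0.162
```

5. Summary

- DenseNet performs well: after only 10 epochs of training the test accuracy reaches 94.3%, which suggests that dense connectivity extracts image features effectively and can give good results even on a fairly small dataset.
- High parameter efficiency: the model has only about 6.95 million parameters, far fewer than the roughly 25 million of ResNet-50, yet reaches a high classification accuracy. This reflects how DenseNet cuts redundant parameters through feature reuse.
- Fast convergence: test accuracy climbs quickly from 74.2% at epoch 1 and peaks at epoch 9, which suggests that AdamW with a reasonable learning rate (1e-4) speeds up convergence.
- Signs of mild overfitting: the training loss falls in almost every epoch while the test loss fluctuates (it drops from 0.389 to 0.162 between epochs 8 and 9, then rises again to 0.203 at epoch 10). Possible improvements:
  - Add data augmentation (random flips, rotations, color jitter, and so on)
  - Introduce a learning-rate schedule (for example Cosine Annealing)
  - Add Dropout or increase weight decay
  - Use cross-validation instead of a single random split
- Effect of dataset size: with 1661 images the dataset is relatively small and the test set has only about 333 images, so the swings in test accuracy may be due to the small sample. More data, or fine-tuning from pretrained weights, could improve performance further.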
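
The parameter count quoted in the summary can be checked by hand, without PyTorch. The sketch below is not part of the original notebook. It walks the same layer layout as the DenseNet class in section 3.2.1 and tallies the weights, assuming bias-free convolutions and two learnable vectors (weight and bias) per BatchNorm layer, as in that code. The function name `densenet_param_count` and its `init_features` and `bn_size` arguments are hypothetical names introduced for illustration.

```python
def densenet_param_count(growth_rate=32, block_config=(6, 12, 24, 16),
                         num_classes=3, init_features=64, bn_size=4):
    """Return (final_channels, total_params) for the DenseNet layout of section 3.2.1."""

    def bn(channels):
        # BatchNorm2d learns one weight and one bias per channel
        return 2 * channels

    def conv(c_in, c_out, k):
        # Conv2d with bias=False: c_in * c_out * k * k weights
        return c_in * c_out * k * k

    # Stem: 7x7 convolution on 3 input channels, followed by BatchNorm
    total = conv(3, init_features, 7) + bn(init_features)
    channels = init_features

    for i, num_layers in enumerate(block_config):
        for _ in range(num_layers):
            mid = bn_size * growth_rate          # 4 * growth_rate in the post
            # Bottleneck: BN, 1x1 conv, BN, 3x3 conv
            total += bn(channels) + conv(channels, mid, 1)
            total += bn(mid) + conv(mid, growth_rate, 3)
            channels += growth_rate              # concatenation adds growth_rate channels
        if i != len(block_config) - 1:
            # Transition: BN, then 1x1 conv that halves the channel count
            total += bn(channels) + conv(channels, channels // 2, 1)
            channels //= 2

    total += bn(channels)                            # final BatchNorm
    total += channels * num_classes + num_classes    # fully connected layer with bias
    return channels, total


final_channels, n_params = densenet_param_count()
print(final_channels)     # 1024
print(f"{n_params:,}")    # 6,956,931
```

The channel count evolves as 64 → 256 → 128 → 512 → 256 → 1024 → 512 → 1024, so the classifier sees 1024 features, and the total agrees with the 6,956,931 parameters reported in section 3.2.2.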
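
One of the suggestions in the summary, a cosine-annealing learning-rate schedule, could be wired into the training loop roughly as follows. This is a minimal sketch under stated assumptions, not the author's code: a tiny `nn.Linear` stands in for the DenseNet, a dummy loss replaces the calls to `train()` and `test()`, and `weight_decay=1e-2` is an assumed value (it is also the AdamW default). Only the scheduler wiring is the point.

```python
import torch
import torch.nn as nn

# Stand-in model (assumption); in the post this would be DenseNet(num_classes=3)
model = nn.Linear(4, 3)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
epochs = 10
# Anneal the learning rate from 1e-4 towards 0 along a half cosine over `epochs` steps
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

lrs = []
for epoch in range(epochs):
    lrs.append(optimizer.param_groups[0]["lr"])   # learning rate used in this epoch

    # Placeholder for train(train_dl, model, loss_fn, optimizer) from section 3.2.3
    loss = model(torch.randn(8, 4)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    scheduler.step()   # update the learning rate once per epoch, after the optimizer steps

print([f"{lr:.2e}" for lr in lrs])
```

With this wiring the `Lr` column printed in the training log would shrink every epoch instead of staying at 1.00E-04. Whether that reduces the test-loss fluctuation on this dataset would have to be checked experimentally.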