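Before we start, a quick word about my column: it covers classification, detection, segmentation, tracking, and keypoint detection, and is currently on a limited-time discount. It is updated with 3-5 new mechanisms every week, and subscribers get all of my modified files plus access to a discussion group.

1. Introduction

This post presents a targeted improvement: add a P2 layer for small-object detection, add a P6 layer for large-object detection, and use DynamicHead as the detection head (a one-to-one reproduction of the original version, which to my knowledge is the only one available, unlike the modified variants circulating online). The added P2 layer has a higher resolution, which lets the model capture the details of small objects better. The added P6 layer has a lower resolution but a larger receptive field, so for large objects the model can capture overall structure more effectively. On top of these, DynamicHead lets the model adapt its detection strategy to objects of different sizes, which can further improve accuracy. The topic of this post was requested by a column subscriber, so once you subscribe you are welcome to request any mechanism you are interested in.

You are welcome to subscribe to my column and learn YOLO together.

Column link: YOLOv26有效涨点专栏 (covers Conv, attention mechanisms, backbone, loss functions, optimizers, post-processing, and other improvements)

Contents
1. Introduction
2. Benefits of Adding the P2 and P6 Layers
3. Core Code of DynamicHead
4. Adding the DynamicHead Detection Head Step by Step
    4.1 Modification 1
    4.2 Modification 2
    4.3 Modification 3
    4.4 Modification 4
    4.5 Modification 5
    4.6 Modification 6
    4.7 Modification 7
    4.8 Modification 8
5. YAML Files for the DynamicHead Detection Head
    5.1 DynamicHead + P2 YAML file
    5.2 DynamicHead + P6 YAML file
6. Successful Run Record
7. Summary

2. Benefits of Adding the P2 and P6 Layers

We add the P2 and P6 layers to improve the detector, in particular its ability to handle objects of different sizes.

(1) Benefits of adding the P2 layer
Better small-object detection: the P2 layer has a higher resolution (stride 4), so its feature map carries more spatial information and preserves the details of small objects that coarser levels lose.
Finer features: because P2 sits in a shallower part of the network, it captures more fine-grained features, which matter for the shape and texture of small objects.

(2) Benefits of adding the P6 layer
Better large-object detection: the P6 layer has a lower resolution (stride 64) but a larger receptive field, so the model can capture the overall structure of large objects more effectively.
Lower computational cost: a large object covers far fewer cells on a low-resolution feature map, so handling it there requires less computation.

(3) Better adaptability
DynamicHead lets the model adjust its detection strategy to objects of different sizes, which improves generalization and adaptability and can therefore further improve accuracy.

Summary: adding the P2 and P6 layers makes the model more efficient and accurate across object sizes. The strategy is especially suited to datasets that contain objects of many sizes at once, such as street-scene analysis and UAV visual surveillance.

3. Core Code of DynamicHead

See Section 4 for how to use the code.

from __future__ import annotations  # keeps the `X | Y` type hints below valid on Python < 3.10

import torch.nn as nn
import torch
import math
import copy
from ultralytics.utils.torch_utils import TORCH_1_11
import torch.nn.functional as F
from mmcv.ops import ModulatedDeformConv2d  # requires mmcv to be installed


def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.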
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class h_swish(nn.Module):
    def __init__(self, inplace=False):
        super(h_swish, self).__init__()
        self.inplace = inplace

    def forward(self, x):
        return x * F.relu6(x + 3.0, inplace=self.inplace) / 6.0


class h_sigmoid(nn.Module):
    def __init__(self, inplace=True, h_max=1):
        super(h_sigmoid, self).__init__()
        self.relu = nn.ReLU6(inplace=inplace)
        self.h_max = h_max

    def forward(self, x):
        return self.relu(x + 3) * self.h_max / 6


class DYReLU(nn.Module):
    def __init__(self, inp, oup, reduction=4, lambda_a=1.0, K2=True, use_bias=True, use_spatial=False,
                 init_a=[1.0, 0.0], init_b=[0.0, 0.0]):
        super(DYReLU, self).__init__()
        self.oup = oup
        self.lambda_a = lambda_a * 2
        self.K2 = K2
        self.avg_pool = nn.AdaptiveAvgPool2d(1)

        self.use_bias = use_bias
        if K2:
            self.exp = 4 if use_bias else 2
        else:
            self.exp = 2 if use_bias else 1
        self.init_a = init_a
        self.init_b = init_b

        # determine squeeze
        if reduction == 4:
            squeeze = inp // reduction
        else:
            squeeze = _make_divisible(inp // reduction, 4)
        # print('reduction: {}, squeeze: {}/{}'.format(reduction, inp, squeeze))
        # print('init_a: {}, init_b: {}'.format(self.init_a, self.init_b))

        self.fc = nn.Sequential(
            nn.Linear(inp, squeeze),
            nn.ReLU(inplace=True),
            nn.Linear(squeeze, oup * self.exp),
            h_sigmoid()
        )
        if use_spatial:
            self.spa = nn.Sequential(
                nn.Conv2d(inp, 1, kernel_size=1),
                nn.BatchNorm2d(1),
            )
        else:
            self.spa = None

    def forward(self, x):
        if isinstance(x, list):
            x_in = x[0]
            x_out = x[1]
        else:
            x_in = x
            x_out = x
        b, c, h, w = x_in.size()
        y = self.avg_pool(x_in).view(b, c)
        y = self.fc(y).view(b, self.oup * self.exp, 1, 1)
        if self.exp == 4:
            a1, b1, a2, b2 = torch.split(y, self.oup, dim=1)
            a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0
            a2 = (a2 - 0.5) * self.lambda_a + self.init_a[1]

            b1 = b1 - 0.5 + self.init_b[0]
            b2 = b2 - 0.5 + self.init_b[1]
            out = torch.max(x_out * a1 + b1, x_out * a2 + b2)
        elif self.exp == 2:
            if self.use_bias:  # bias but not PL
                a1, b1 = torch.split(y, self.oup, dim=1)
                a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0
                b1 = b1 - 0.5 + self.init_b[0]
                out = x_out * a1 + b1

            else:
                a1, a2 = torch.split(y, self.oup, dim=1)
                a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0
                a2 = (a2 - 0.5) * self.lambda_a + self.init_a[1]
                out = torch.max(x_out * a1, x_out * a2)

        elif self.exp == 1:
            a1 = y
            a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0
            out = x_out * a1

        if self.spa:
            ys = self.spa(x_in).view(b, -1)
            ys = F.softmax(ys, dim=1).view(b, 1, h, w) * h * w
            ys = F.hardtanh(ys, 0, 3, inplace=True) / 3
            out = out * ys

        return out


class Conv3x3Norm(torch.nn.Module):
    def __init__(self, in_channels, out_channels, stride):
        super(Conv3x3Norm, self).__init__()

        self.conv = ModulatedDeformConv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn = nn.GroupNorm(num_groups=16, num_channels=out_channels)

    def forward(self, input, **kwargs):
        x = self.conv(input.contiguous(), **kwargs)
        x = self.bn(x)
        return x


class DyConv(nn.Module):
    def __init__(self, in_channels=256, out_channels=256, conv_func=Conv3x3Norm):
        super(DyConv, self).__init__()

        self.DyConv = nn.ModuleList()
        self.DyConv.append(conv_func(in_channels, out_channels, 1))
        self.DyConv.append(conv_func(in_channels, out_channels, 1))
        self.DyConv.append(conv_func(in_channels, out_channels, 2))

        self.AttnConv = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, 1, kernel_size=1),
            nn.ReLU(inplace=True))

        self.h_sigmoid = h_sigmoid()
        self.relu = DYReLU(in_channels, out_channels)
        self.offset = nn.Conv2d(in_channels, 27, kernel_size=3, stride=1, padding=1)
        self.init_weights()

    def init_weights(self):
        for m in self.DyConv.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight.data, 0, 0.01)
                if m.bias is not None:
                    m.bias.data.zero_()
        for m in self.AttnConv.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight.data, 0, 0.01)
                if m.bias is not None:
                    m.bias.data.zero_()

    def forward(self, x):
        next_x = {}
        feature_names = list(x.keys())
        for level, name in enumerate(feature_names):

            feature = x[name]

            # 27 channels: the first 18 are sampling offsets, the last 9 are modulation masks
            offset_mask = self.offset(feature)
            offset = offset_mask[:, :18, :, :]
            mask = offset_mask[:, 18:, :, :].sigmoid()
            conv_args = dict(offset=offset, mask=mask)

            temp_fea = [self.DyConv[1](feature, **conv_args)]

            if level > 0:
                temp_fea.append(self.DyConv[2](x[feature_names[level - 1]], **conv_args))
            if level < len(x) - 1:
                input = x[feature_names[level + 1]]
                temp_fea.append(F.interpolate(self.DyConv[0](input, **conv_args),
                                              size=[feature.size(2), feature.size(3)]))

            attn_fea = []
            res_fea = []
            for fea in temp_fea:
                res_fea.append(fea)
                attn_fea.append(self.AttnConv(fea))

            res_fea = torch.stack(res_fea)
            spa_pyr_attn = self.h_sigmoid(torch.stack(attn_fea))

            mean_fea = torch.mean(res_fea * spa_pyr_attn, dim=0, keepdim=False)
            next_x[name] = self.relu(mean_fea)

        return next_x


def make_anchors(feats, strides, grid_cell_offset=0.5):
    """Generate anchors from features."""
    anchor_points, stride_tensor = [], []
    assert feats is not None
    dtype, device = feats[0].dtype, feats[0].device
    for i in range(len(feats)):  # use len(feats) to avoid TracerWarning from iterating over strides tensor
        stride = strides[i]
        h, w = feats[i].shape[2:] if isinstance(feats, list) else (int(feats[i][0]), int(feats[i][1]))
        sx = torch.arange(end=w, device=device, dtype=dtype) + grid_cell_offset  # shift x
        sy = torch.arange(end=h, device=device, dtype=dtype) + grid_cell_offset  # shift y
        sy, sx = torch.meshgrid(sy, sx, indexing="ij") if TORCH_1_11 else torch.meshgrid(sy, sx)
        anchor_points.append(torch.stack((sx, sy), -1).view(-1, 2))
        stride_tensor.append(torch.full((h * w, 1), stride, dtype=dtype, device=device))
    return torch.cat(anchor_points), torch.cat(stride_tensor)


def dist2bbox(distance, anchor_points, xywh=True, dim=-1):
    """Transform distance(ltrb) to box(xywh or xyxy)."""
    lt, rb = distance.chunk(2, dim)
    x1y1 = anchor_points - lt
    x2y2 = anchor_points + rb
    if xywh:
        c_xy = (x1y1 + x2y2) / 2
        wh = x2y2 - x1y1
        return torch.cat([c_xy, wh], dim)  # xywh bbox
    return torch.cat((x1y1, x2y2), dim)  # xyxy bbox


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """
    Standard convolution module with batch normalization and activation.

    Attributes:
        conv (nn.Conv2d): Convolutional layer.
        bn (nn.BatchNorm2d): Batch normalization layer.
        act (nn.Module): Activation function layer.
        default_act (nn.Module): Default activation function (SiLU).
    """

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """
        Initialize Conv layer with given parameters.

        Args:
            c1 (int): Number of input channels.
            c2 (int): Number of output channels.
            k (int): Kernel size.
            s (int): Stride.
            p (int, optional): Padding.
            g (int): Groups.
            d (int): Dilation.
            act (bool | nn.Module): Activation function.
        """
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """
        Apply convolution, batch normalization and activation to input tensor.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor.
        """
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """
        Apply convolution and activation without batch normalization.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor.
        """
        return self.act(self.conv(x))


class DWConv(Conv):
    """Depth-wise convolution module."""

    def __init__(self, c1, c2, k=1, s=1, d=1, act=True):
        """
        Initialize depth-wise convolution with given parameters.

        Args:
            c1 (int): Number of input channels.
            c2 (int): Number of output channels.
            k (int): Kernel size.
            s (int): Stride.
            d (int): Dilation.
            act (bool | nn.Module): Activation function.
        """
        super().__init__(c1, c2, k, s, g=math.gcd(c1, c2), d=d, act=act)


class DFL(nn.Module):
    """
    Integral module of Distribution Focal Loss (DFL).

    Proposed in Generalized Focal Loss https://ieeexplore.ieee.org/document/9792391
    """

    def __init__(self, c1: int = 16):
        """
        Initialize a convolutional layer with a given number of input channels.

        Args:
            c1 (int): Number of input channels.
        """
        super().__init__()
        self.conv = nn.Conv2d(c1, 1, 1, bias=False).requires_grad_(False)
        x = torch.arange(c1, dtype=torch.float)
        self.conv.weight.data[:] = nn.Parameter(x.view(1, c1, 1, 1))
        self.c1 = c1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Apply the DFL module to input tensor and return transformed output."""
        b, _, a = x.shape  # batch, channels, anchors
        return self.conv(x.view(b, 4, self.c1, a).transpose(2, 1).softmax(1)).view(b, 4, a)
        # return self.conv(x.view(b, self.c1, 4, a).softmax(1)).view(b, 4, a)


class DyHeadDetect(nn.Module):
    """
    YOLO Detect head for object detection models.

    This class implements the detection head used in YOLO models for predicting bounding boxes and class probabilities.
    It supports both training and inference modes, with optional end-to-end detection capabilities.

    Attributes:
        dynamic (bool): Force grid reconstruction.
        export (bool): Export mode flag.
        format (str): Export format.
        end2end (bool): End-to-end detection mode.
        max_det (int): Maximum detections per image.
        shape (tuple): Input shape.
        anchors (torch.Tensor): Anchor points.
        strides (torch.Tensor): Feature map strides.
        legacy (bool): Backward compatibility for v3/v5/v8/v9/v11 models.
        xyxy (bool): Output format, xyxy or xywh.
        nc (int): Number of classes.
        nl (int): Number of detection layers.
        reg_max (int): DFL channels.
        no (int): Number of outputs per anchor.
        stride (torch.Tensor): Strides computed during build.
        cv2 (nn.ModuleList): Convolution layers for box regression.
        cv3 (nn.ModuleList): Convolution layers for classification.
        dfl (nn.Module): Distribution Focal Loss layer.
        one2one_cv2 (nn.ModuleList): One-to-one convolution layers for box regression.
        one2one_cv3 (nn.ModuleList): One-to-one convolution layers for classification.

    Methods:
        forward: Perform forward pass and return predictions.
        bias_init: Initialize detection head biases.
        decode_bboxes: Decode bounding boxes from predictions.
        postprocess: Post-process model predictions.

    Examples:
        Create a detection head for 80 classes
        >>> detect = DyHeadDetect(nc=80, ch=(256, 512, 1024))
        >>> x = [torch.randn(1, 256, 80, 80), torch.randn(1, 512, 40, 40), torch.randn(1, 1024, 20, 20)]
        >>> outputs = detect(x)
    """

    dynamic = False  # force grid reconstruction
    export = False  # export mode
    format = None  # export format
    max_det = 300  # max_det
    agnostic_nms = False
    shape = None
    anchors = torch.empty(0)  # init
    strides = torch.empty(0)  # init
    legacy = False  # backward compatibility for v3/v5/v8/v9 models
    xyxy = False  # xyxy or xywh output

    def __init__(self, nc: int = 80, reg_max=16, end2end=False, ch: tuple = ()):
        """
        Initialize the YOLO detection layer with specified number of classes and channels.

        Args:
            nc (int): Number of classes.
            reg_max (int): Maximum number of DFL channels.
            end2end (bool): Whether to use end-to-end NMS-free detection.
            ch (tuple): Tuple of channel sizes from backbone feature maps.
        """
        super().__init__()
        self.nc = nc  # number of classes
        self.nl = len(ch)  # number of detection layers
        self.reg_max = reg_max  # DFL channels (ch[0] // 16 to scale 4/8/12/16/20 for n/s/m/l/x)
        self.no = nc + self.reg_max * 4  # number of outputs per anchor
        self.stride = torch.zeros(self.nl)  # strides computed during build
        c2, c3 = max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], min(self.nc, 100))  # channels
        self.cv2 = nn.ModuleList(
            nn.Sequential(Conv(x, c2, 3), Conv(c2, c2, 3), nn.Conv2d(c2, 4 * self.reg_max, 1)) for x in ch
        )
        self.cv3 = (
            nn.ModuleList(nn.Sequential(Conv(x, c3, 3), Conv(c3, c3, 3), nn.Conv2d(c3, self.nc, 1)) for x in ch)
            if self.legacy
            else nn.ModuleList(
                nn.Sequential(
                    nn.Sequential(DWConv(x, x, 3), Conv(x, c3, 1)),
                    nn.Sequential(DWConv(c3, c3, 3), Conv(c3, c3, 1)),
                    nn.Conv2d(c3, self.nc, 1),
                )
                for x in ch
            )
        )
        self.dfl = DFL(self.reg_max) if self.reg_max > 1 else nn.Identity()
        # Build one DyConv block per detection level.
        # Note: in the code as listed in this post, the tower is registered as a submodule,
        # but none of the forward methods below call it.
        dyhead_tower = []
        for i in range(self.nl):
            channel = ch[i]
            dyhead_tower.append(
                DyConv(
                    channel,
                    channel,
                    conv_func=Conv3x3Norm,
                )
            )
        self.add_module("dyhead_tower", nn.Sequential(*dyhead_tower))
        if end2end:
            self.one2one_cv2 = copy.deepcopy(self.cv2)
            self.one2one_cv3 = copy.deepcopy(self.cv3)

    @property
    def one2many(self):
        """Returns the one-to-many head components, here for v3/v5/v8/v9/v11 backward compatibility."""
        return dict(box_head=self.cv2, cls_head=self.cv3)

    @property
    def one2one(self):
        """Returns the one-to-one head components."""
        return dict(box_head=self.one2one_cv2, cls_head=self.one2one_cv3)

    @property
    def end2end(self):
        """Checks if the model has one2one for v3/v5/v8/v9/v11 backward compatibility."""
        return getattr(self, "_end2end", True) and hasattr(self, "one2one")

    @end2end.setter
    def end2end(self, value):
        """Override the end-to-end detection mode."""
        self._end2end = value

    def forward_head(
        self, x: list[torch.Tensor], box_head: torch.nn.Module = None, cls_head: torch.nn.Module = None
    ) -> dict[str, torch.Tensor]:
        """Concatenates and returns predicted bounding boxes and class probabilities."""
        if box_head is None or cls_head is None:  # for fused inference
            return dict()
        bs = x[0].shape[0]  # batch size
        boxes = torch.cat([box_head[i](x[i]).view(bs, 4 * self.reg_max, -1) for i in range(self.nl)], dim=-1)
        scores = torch.cat([cls_head[i](x[i]).view(bs, self.nc, -1) for i in range(self.nl)], dim=-1)
        return dict(boxes=boxes, scores=scores, feats=x)

    def forward(
        self, x: list[torch.Tensor]
    ) -> dict[str, torch.Tensor] | torch.Tensor | tuple[torch.Tensor, dict[str, torch.Tensor]]:
        """Concatenates and returns predicted bounding boxes and class probabilities."""
        preds = self.forward_head(x, **self.one2many)
        if self.end2end:
            x_detach = [xi.detach() for xi in x]
            one2one = self.forward_head(x_detach, **self.one2one)
            preds = {"one2many": preds, "one2one": one2one}
        if self.training:
            return preds
        y = self._inference(preds["one2one"] if self.end2end else preds)
        if self.end2end:
            y = self.postprocess(y.permute(0, 2, 1))
        return y if self.export else (y, preds)

    def _inference(self, x: dict[str, torch.Tensor]) -> torch.Tensor:
        """
        Decode predicted bounding boxes and class probabilities based on multiple-level feature maps.

        Args:
            x (dict[str, torch.Tensor]): Dictionary of predictions from detection layers.

        Returns:
            (torch.Tensor): Concatenated tensor of decoded bounding boxes and class probabilities.
        """
        # Inference path
        dbox = self._get_decode_boxes(x)
        return torch.cat((dbox, x["scores"].sigmoid()), 1)

    def _get_decode_boxes(self, x: dict[str, torch.Tensor]) -> torch.Tensor:
        """Get decoded boxes based on anchors and strides."""
        shape = x["feats"][0].shape  # BCHW
        if self.dynamic or self.shape != shape:
            self.anchors, self.strides = (a.transpose(0, 1) for a in make_anchors(x["feats"], self.stride, 0.5))
            self.shape = shape

        dbox = self.decode_bboxes(self.dfl(x["boxes"]), self.anchors.unsqueeze(0)) * self.strides
        return dbox

    def bias_init(self):
        """Initialize Detect() biases, WARNING: requires stride availability."""
        for i, (a, b) in enumerate(zip(self.one2many["box_head"], self.one2many["cls_head"])):  # from
            a[-1].bias.data[:] = 2.0  # box
            b[-1].bias.data[: self.nc] = math.log(
                5 / self.nc / (640 / self.stride[i]) ** 2
            )  # cls (.01 objects, 80 classes, 640 img)
        if self.end2end:
            for i, (a, b) in enumerate(zip(self.one2one["box_head"], self.one2one["cls_head"])):  # from
                a[-1].bias.data[:] = 2.0  # box
                b[-1].bias.data[: self.nc] = math.log(
                    5 / self.nc / (640 / self.stride[i]) ** 2
                )  # cls (.01 objects, 80 classes, 640 img)

    def decode_bboxes(self, bboxes: torch.Tensor, anchors: torch.Tensor, xywh: bool = True) -> torch.Tensor:
        """Decode bounding boxes from predictions."""
        return dist2bbox(
            bboxes,
            anchors,
            xywh=xywh and not self.end2end and not self.xyxy,
            dim=1,
        )

    def postprocess(self, preds: torch.Tensor) -> torch.Tensor:
        """
        Post-processes YOLO model predictions.

        Args:
            preds (torch.Tensor): Raw predictions with shape (batch_size, num_anchors, 4 + nc) with last dimension
                format [x1, y1, x2, y2, class_probs].

        Returns:
            (torch.Tensor): Processed predictions with shape (batch_size, min(max_det, num_anchors), 6) and last
                dimension format [x1, y1, x2, y2, max_class_prob, class_index].
        """
        boxes, scores = preds.split([4, self.nc], dim=-1)
        scores, conf, idx = self.get_topk_index(scores, self.max_det)
        boxes = boxes.gather(dim=1, index=idx.repeat(1, 1, 4))
        return torch.cat([boxes, scores, conf], dim=-1)

    def get_topk_index(self, scores: torch.Tensor, max_det: int) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """
        Get top-k indices from scores.

        Args:
            scores (torch.Tensor): Scores tensor with shape (batch_size, num_anchors, num_classes).
            max_det (int): Maximum detections per image.

        Returns:
            (torch.Tensor, torch.Tensor, torch.Tensor): Top scores, class indices, and filtered indices.
        """
        batch_size, anchors, nc = scores.shape  # i.e. shape(16,8400,84)
        # Use max_det directly during export for TensorRT compatibility (requires k to be constant),
        # otherwise use min(max_det, anchors) for safety with small inputs during Python inference
        k = max_det if self.export else min(max_det, anchors)
        if self.agnostic_nms:
            scores, labels = scores.max(dim=-1, keepdim=True)
            scores, indices = scores.topk(k, dim=1)
            labels = labels.gather(1, indices)
            return scores, labels, indices
        ori_index = scores.max(dim=-1)[0].topk(k)[1].unsqueeze(-1)
        scores = scores.gather(dim=1, index=ori_index.repeat(1, 1, nc))
        scores, index = scores.flatten(1).topk(k)
        idx = ori_index[torch.arange(batch_size)[..., None], index // nc]  # original index
        return scores[..., None], (index % nc)[..., None].float(), idx

    def fuse(self) -> None:
        """Remove the one2many head for inference optimization."""
        self.cv2 = self.cv3 = None

4. Adding the DynamicHead Detection Head Step by Step

4.1 Modification 1
First, create a new .py file under the ultralytics/nn directory and paste the code above into it. The file name is up to you; I named mine DynamicHead.py.

4.2 Modification 2
Second, create a new file named __init__.py in the same directory (if you use the files from the group, it already exists and you do not need to create it), then import the detection head inside it, as shown in the figure below.

4.3 Modification 3
Third, open ultralytics/nn/tasks.py, then import and register the module (if you use the files from the group, this is already done and you can go straight to step four).

4.4 Modification 4
Fourth, in ultralytics/nn/tasks.py, find the code shown in the figure and add the detection head to it. A quick way to locate it is to press Ctrl+F and search for Detect.

4.5 Modification 5
Same as above.

4.6 Modification 6
Same as above.

4.7 Modification 7
This step is slightly different: we need to add one line of code.

else: return "detect"

Why is it different? Because at this point the value of m is automatically converted to lowercase during execution, so falling through to else is more convenient. If tutorials for segmentation or other tasks come up later, I will provide separate modification instructions for them.

4.8 Modification 8
Same as above.

That completes the modifications. You can now copy the YAML files below and run them. Note that if one of the steps above is done incorrectly, the model may still run but show an accuracy of 0 or fail to converge.

5. YAML Files for the DynamicHead Detection Head

5.1 DynamicHead + P2 YAML file

Training info for this version: YOLO26-p2-DyHead summary: 394 layers, 5,099,768 parameters, 5,099,768 gradients, 7.5 GFLOPs

# Ultralytics AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P2/4 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n-p2.yaml' will call yolo26-p2.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 329 layers, 2,662,400 parameters, 2,662,400 gradients, 9.5 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 329 layers, 9,765,856 parameters, 9,765,856 gradients, 27.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 349 layers, 21,144,288 parameters, 21,144,288 gradients, 91.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 489 layers, 25,815,520 parameters, 25,815,520 gradients, 115.3 GFLOPs
  x: [1.00, 1.50, 512] # summary: 489 layers, 57,935,232 parameters, 57,935,232 gradients, 256.9 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, True]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]] # cat backbone P2
  - [-1, 2, C3k2, [128, True]] # 19 (P2/4-xsmall)

  - [-1, 1, Conv, [128, 3, 2]]
  - [[-1, 16], 1, Concat, [1]] # cat head P3
  - [-1, 2, C3k2, [256, True]] # 22 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, True]] # 25 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2, [1024, True, 0.5, True]] # 28 (P5/32-large)

  - [[19, 22, 25, 28], 1, DyHeadDetect, [nc]] # Detect(P2, P3, P4, P5)

5.2 DynamicHead + P6 YAML file

Training info for this version: YOLO26-p6-DyHead summary: 414 layers, 7,610,432 parameters, 7,610,432 gradients, 5.7 GFLOPs

# Ultralytics AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P6/64 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n-p6.yaml' will call yolo26-p6.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 349 layers, 4,063,872 parameters, 4,063,872 gradients, 6.0 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 349 layers, 15,876,448 parameters, 15,876,448 gradients, 22.3 GFLOPs
  m: [0.50, 1.00, 512] # summary: 369 layers, 32,400,096 parameters, 32,400,096 gradients, 77.3 GFLOPs
  l: [1.00, 1.00, 512] # summary: 523 layers, 39,365,600 parameters, 39,365,600 gradients, 97.0 GFLOPs
  x: [1.00, 1.50, 512] # summary: 523 layers, 88,330,368 parameters, 88,330,368 gradients, 216.6 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [768, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [768, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 9-P6/64
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 11
  - [-1, 2, C2PSA, [1024]] # 12

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 8], 1, Concat, [1]] # cat backbone P5
  - [-1, 2, C3k2, [768, True]] # 15

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, True]] # 18

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, True]] # 21 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 18], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, True]] # 24 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 15], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [768, True]] # 27 (P5/32-large)

  - [-1, 1, Conv, [768, 3, 2]]
  - [[-1, 12], 1, Concat, [1]] # cat head P6
  - [-1, 1, C3k2, [1024, True, 0.5, True]] # 30 (P6/64-large)

  - [[21, 24, 27, 30], 1, DyHeadDetect, [nc]] # Detect(P3, P4, P5, P6)

6. Successful Run Record

Finally, here is a screenshot of a successful run.

[Figure: screenshot of the training run]

7. Summary

That concludes this post. I recommend my YOLOv26 column (YOLOv26有效涨点专栏). It is newly opened and has an average quality score of 98. Going forward I will reproduce papers from the latest top conferences and also fill in some of the older improvement mechanisms. If this post helped you, subscribe to the column and follow along for more updates.
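
As a supplementary illustration of the resolution trade-off described in Section 2, here is a minimal sketch that computes the feature-map size and number of grid cells per pyramid level for a 640x640 input. It assumes the usual convention that level Pk has stride 2^k, which matches the P2/4 ... P6/64 comments in the YAML files above. The helper names `pyramid_levels` and `total_cells` are made up for this example and are not part of Ultralytics.

```python
def pyramid_levels(img_size: int, levels: list[int]) -> dict[str, dict[str, int]]:
    """Return stride, feature-map side length and cell count for each pyramid level."""
    info = {}
    for k in levels:
        stride = 2 ** k              # assumption: level Pk downsamples the input by 2**k
        side = img_size // stride    # side length of the (square) feature map
        info[f"P{k}"] = {"stride": stride, "side": side, "cells": side * side}
    return info


def total_cells(img_size: int, levels: list[int]) -> int:
    """Total number of grid cells (one prediction per cell) over all levels."""
    return sum(level["cells"] for level in pyramid_levels(img_size, levels).values())


for name, levels in (("P3-P5", [3, 4, 5]), ("P2-P5", [2, 3, 4, 5]), ("P3-P6", [3, 4, 5, 6])):
    print(name, pyramid_levels(640, levels), "total cells:", total_cells(640, levels))
# P3-P5 -> 8400 cells, P2-P5 -> 34000 cells, P3-P6 -> 8500 cells
```

The P2 level alone contributes 160 x 160 = 25,600 cells, which is why a P2 head costs noticeably more than a P6 head: the extra P6 level adds only 10 x 10 = 100 cells.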
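
One detail in the core code that is easy to miss is why `DyConv` predicts 27 channels with `self.offset` and then slices them at index 18. The sketch below spells out the arithmetic for a modulated deformable convolution with a k x k kernel: each of the k*k sampling locations gets a 2-D offset and one modulation scalar. The function name `deform_conv_channels` is hypothetical and only serves this illustration.

```python
def deform_conv_channels(kernel_size: int = 3) -> tuple[int, int, int]:
    """Return (offset_channels, mask_channels, total) for one modulated deformable conv."""
    taps = kernel_size * kernel_size   # sampling locations per output position
    offset_channels = 2 * taps         # one 2-D offset per sampling location
    mask_channels = taps               # one modulation scalar per sampling location
    return offset_channels, mask_channels, offset_channels + mask_channels


offset_c, mask_c, total_c = deform_conv_channels(3)
print(offset_c, mask_c, total_c)  # 18 9 27
```

For the 3x3 kernels used in `Conv3x3Norm` this gives 18 offset channels and 9 mask channels, matching the `[:, :18]` and `[:, 18:]` slices in `DyConv.forward`.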