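The core code can be divided into the following 3 modules:

- encoder: each modality is encoded separately (camera, radar)
- fuser: cross-modal fusion
- decoder (backbone + neck): further processes the fused result into the final features passed to the head

encoder: each modality is encoded separately (camera, radar)

camera encoding

radar encoding

fuser: cross-modal fusion

Multi-modal fusion: `x = self.fuser(features)` fuses the camera BEV features and the radar BEV features in the `features` list into a single, unified BEV feature map.

class ConvFuser(nn.Sequential):

Input:

- features: a list of length 2
- features[0]: camera BEV, shape [B, 64, H, W]
- features[1]: radar BEV, shape [B, 64, H, W]

Output:

- x: shape [B, 64, H, W]

Implementation:

- step 1: concatenate along the channel dimension, z = cat(features, dim=1), giving shape [B, 128, H, W]
- step 2: apply a 3x3 convolution (128 -> 64 channels), followed by BN and ReLU, as shown below

```python
from typing import List

import torch
from torch import nn


class ConvFuser(nn.Sequential):
    def __init__(self, in_channels: List[int], out_channels: int) -> None:
        # in_channels holds one channel count per modality, e.g. [64, 64]
        self.in_channels = in_channels
        self.out_channels = out_channels
        super().__init__(
            # 3x3 conv with padding=1 keeps H and W unchanged
            nn.Conv2d(sum(in_channels), out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(True),
        )

    def forward(self, inputs: List[torch.Tensor]) -> torch.Tensor:
        # Concatenate the per-modality BEV maps along channels, then run conv + BN + ReLU
        return super().forward(torch.cat(inputs, dim=1))
```

decoder (backbone + neck): further processes the fused result into the final features passed to the head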
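To make the shape bookkeeping of the fuser described above concrete, here is a small, library-free sketch that only tracks shapes. It is not part of the original code: the helper names `concat_channels`, `conv2d_shape`, and `fused_shape` are hypothetical, and the batch size and BEV grid size are made-up values. It assumes stride 1 and dilation 1, which matches the `nn.Conv2d(..., 3, padding=1)` call shown in `ConvFuser`.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, int, int, int]  # (B, C, H, W)


def concat_channels(shapes: Sequence[Shape]) -> Shape:
    """Shape after concatenating BEV feature maps along the channel axis (dim=1)."""
    b, _, h, w = shapes[0]
    for s in shapes:
        # Every map must share batch size and spatial size; only C may differ.
        if (s[0], s[2], s[3]) != (b, h, w):
            raise ValueError(f"incompatible BEV shapes: {shapes[0]} vs {s}")
    return (b, sum(s[1] for s in shapes), h, w)


def conv2d_shape(shape: Shape, out_channels: int, kernel: int = 3,
                 padding: int = 1, stride: int = 1) -> Shape:
    """Shape after a 2D convolution (dilation 1), using the standard size formula."""
    b, _, h, w = shape
    out_h = (h + 2 * padding - kernel) // stride + 1
    out_w = (w + 2 * padding - kernel) // stride + 1
    return (b, out_channels, out_h, out_w)


def fused_shape(shapes: Sequence[Shape], out_channels: int) -> Shape:
    # BatchNorm and ReLU act element-wise, so they leave the shape unchanged.
    return conv2d_shape(concat_channels(shapes), out_channels)


# Hypothetical sizes: batch 2, a 128 x 128 BEV grid.
camera_bev = (2, 64, 128, 128)
radar_bev = (2, 64, 128, 128)
print(concat_channels([camera_bev, radar_bev]))  # (2, 128, 128, 128)
print(fused_shape([camera_bev, radar_bev], 64))  # (2, 64, 128, 128)
```

The point of the sketch is that a 3x3 kernel with padding 1 leaves H and W untouched, so the fused map stays aligned with the BEV grid of both inputs, and only the channel count changes (64 + 64 = 128, then back to 64).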