I. Introduction to SPD-Conv

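Original SPD-Conv paper:
A New CNN Building Block for Low-Resolution Images and Small Objects
In short, SPD-Conv is a CNN building block aimed at low-resolution images and small objects.
Note: this experiment is traffic-light detection with YOLOv8. The target boxes are all very small, and some of the footage comes from a dashcam shooting through the windshield, which lowers the effective resolution considerably. That is why I picked this module, to see whether it could improve accuracy.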

SPD-Conv consists of a space-to-depth (SPD) layer and a non-strided convolution (Conv) layer.
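To make the SPD step concrete, here is a minimal sketch on plain Python lists, so it runs without PyTorch. The helper name space_to_depth_2d and the 4x4 sample grid are my own illustration, not code from the paper or from ultralytics. The slicing order mirrors the torch.cat call used later in this post.

```python
def space_to_depth_2d(grid):
    """Split one H x W channel into four (H/2) x (W/2) channels (H and W even)."""
    def sub(row_start, col_start):
        # Keep every second row and every second column, starting at the given offsets.
        return [row[col_start::2] for row in grid[row_start::2]]

    # Order: (even rows, even cols), (odd rows, even cols),
    #        (even rows, odd cols),  (odd rows, odd cols)
    return [sub(0, 0), sub(1, 0), sub(0, 1), sub(1, 1)]


grid = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
channels = space_to_depth_2d(grid)
for channel in channels:
    print(channel)
```

Every input value ends up in exactly one of the four output channels, so the resolution is halved without throwing pixels away, which is the point of using SPD instead of a strided convolution or pooling.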

II. Modifying the YOLOv8 source to add the SPD-Conv component

1. Modify block.py

Locate the folder where the ultralytics package is installed, then open ultralytics/nn/modules/block.py
Add the following code:

############## SPD-Conv ##############
class space_to_depth(nn.Module):
    """Space-to-depth: move each 2x2 spatial block into the channel dimension."""

    def __init__(self, dimension=1):
        super().__init__()
        self.d = dimension  # dimension to concatenate along (1 = channels)

    def forward(self, x):
        # (B, C, H, W) -> (B, 4C, H/2, W/2); H and W must be even
        return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], self.d)
############## SPD-Conv ##############

2. Modify __init__.py

Open ultralytics/nn/modules/__init__.py and add space_to_depth to both the import and __all__:

from .block import (C1, C2, C3, C3TR, DFL, SPP, SPPF, Bottleneck, BottleneckCSP, C2f, C3Ghost, C3x, GhostBottleneck,
                    HGBlock, HGStem, Proto, RepC3, space_to_depth)  # add space_to_depth to the import

# ...
__all__ = ('...', 'space_to_depth')  # add space_to_depth to __all__

3. Modify tasks.py

Open ultralytics/nn/tasks.py
In the import section, find the from ultralytics.nn.modules import (...) statement and add space_to_depth:

from ultralytics.nn.modules import (AIFI, C1, C2, C3, C3TR, SPP, SPPF, Bottleneck, BottleneckCSP, C2f, C3Ghost, C3x,
                                    Classify, Concat, Conv, Conv2, ConvTranspose, Detect, DWConv, DWConvTranspose2d,
                                    Focus, GhostBottleneck, GhostConv, HGBlock, HGStem, Pose, RepC3, RepConv,
                                    RTDETRDecoder, Segment, space_to_depth)  # add space_to_depth

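To make the module callable from a model yaml, find the model-parsing function

def parse_model

and add one more branch to its chain of elif checks on m: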

# --- SPDConv
        elif m is space_to_depth:
            c2 = 4 * ch[f]  # each 2x2 block is stacked into channels: 4x the input channels
# --- SPDConv
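The effect of that branch on the channel bookkeeping can be sketched in a few lines. This is a simplified stand-in, not the real parse_model: the function trace_channels is hypothetical, it assumes a purely sequential stack (every layer reads from the previous one), and the layer widths are the scale-s values that appear in the model summary printed in the test section.

```python
def trace_channels(layers, in_channels=3):
    """Return the output channel count of each layer in a simple sequential stack."""
    ch = []
    prev = in_channels
    for module, out_channels in layers:
        if module == "space_to_depth":
            c2 = 4 * prev  # same rule as the parse_model branch
        else:
            c2 = out_channels  # Conv / C2f output whatever the config asks for
        ch.append(c2)
        prev = c2
    return ch


# First seven backbone layers at scale s (width multiplier 0.50).
layers = [
    ("Conv", 32), ("Conv", 64), ("space_to_depth", None), ("C2f", 64),
    ("Conv", 128), ("space_to_depth", None), ("C2f", 128),
]
channels = trace_channels(layers)
print(channels)
```

The 256 and 512 entries are the input widths that the following C2f layers receive, which matches the first argument shown for those C2f layers in the summary.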

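4. Modify the model config file yolov8.yaml

Here nc is the number of classes. Under scales I commented out everything except s. In the backbone, a space_to_depth layer is added after every Conv except the first one (layer 0), and the two Conv layers in the head get the same treatment.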

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 4  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  # n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  #m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  #l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  #x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 1]]  # 1 (stride 1)
  - [-1, 1, space_to_depth, [1]]  # 2-P2/4
  - [-1, 3, C2f, [128, True]]  # 3
  - [-1, 1, Conv, [256, 3, 1]]  # 4 (stride 1)
  - [-1, 1, space_to_depth, [1]]  # 5-P3/8
  - [-1, 6, C2f, [256, True]]  # 6
  - [-1, 1, Conv, [512, 3, 1]]  # 7 (stride 1)
  - [-1, 1, space_to_depth, [1]]  # 8-P4/16
  - [-1, 6, C2f, [512, True]]  # 9
  - [-1, 1, Conv, [1024, 3, 1]]  # 10 (stride 1)
  - [-1, 1, space_to_depth, [1]]  # 11-P5/32
  - [-1, 3, C2f, [1024, True]]  # 12
  - [-1, 1, SPPF, [1024, 5]]  # 13

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 9], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 16

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 19 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 1]]
  - [-1, 1, space_to_depth, [1]]
  - [[-1, 16], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 23 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 1]]
  - [-1, 1, space_to_depth, [1]]
  - [[-1, 13], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [1024]]  # 27 (P5/32-large)

  - [[19, 23, 27], 1, Detect, [nc]]  # Detect(P3, P4, P5)

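Note: I fell into a trap here. I had not read the paper and simply bolted the SPD component on the way I assumed it should work, so at first the model would not run at all. Only after looking at someone else's modification did I realize that every Conv followed by space_to_depth must have its stride set to 1. The space_to_depth layer already halves the resolution, so a stride-2 Conv in front of it would downsample twice per stage.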
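A quick way to see why stride 1 is needed is to track the cumulative downsampling factor at each pyramid level. The sketch below is my own arithmetic, not ultralytics code: pyramid_strides is a hypothetical helper, and it assumes a stride-2 stem followed by four Conv + space_to_depth stages, as in the yaml above. I cannot say for certain which error the original run hit, but a mismatch of this kind is one plausible cause.

```python
def pyramid_strides(conv_stride, stages=4, stem_stride=2):
    """Cumulative downsampling factor after each Conv + space_to_depth stage."""
    total = stem_stride
    strides = []
    for _ in range(stages):
        # The Conv contributes its own stride; space_to_depth always contributes 2.
        total *= conv_stride * 2
        strides.append(total)
    return strides


print(pyramid_strides(conv_stride=1))  # levels P2..P5 with stride-1 Conv
print(pyramid_strides(conv_stride=2))  # the same stages with stride-2 Conv
```

With stride 1 the levels come out at 4, 8, 16, 32, which is what the head expects: upsampling P5 by 2 gives the size of P4, so Concat works. With stride 2 each level is 4 times coarser than the previous one, so a 2x Upsample can no longer line up with the backbone feature it is concatenated with.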

III. Testing

At this point the SPD component has been added. Create a new train.py file:

from ultralytics import YOLO


if __name__ == '__main__':
    model = YOLO('./yolov8.yaml')

Run it! Building the model prints the following summary:

                   from  n    params  module                                       arguments                     
  0                  -1  1       928  ultralytics.nn.modules.conv.Conv             [3, 32, 3, 2]                 
  1                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 1]                
  2                  -1  1         0  ultralytics.nn.modules.block.space_to_depth  [1]                           
  3                  -1  1     41344  ultralytics.nn.modules.block.C2f             [256, 64, 1, True]            
  4                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 1]               
  5                  -1  1         0  ultralytics.nn.modules.block.space_to_depth  [1]                           
  6                  -1  2    246784  ultralytics.nn.modules.block.C2f             [512, 128, 2, True]           
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 1]              
  8                  -1  1         0  ultralytics.nn.modules.block.space_to_depth  [1]                           
  9                  -1  2    985088  ultralytics.nn.modules.block.C2f             [1024, 256, 2, True]          
 10                  -1  1   1180672  ultralytics.nn.modules.conv.Conv             [256, 512, 3, 1]              
 11                  -1  1         0  ultralytics.nn.modules.block.space_to_depth  [1]                           
 12                  -1  1   2624512  ultralytics.nn.modules.block.C2f             [2048, 512, 1, True]          
 13                  -1  1    656896  ultralytics.nn.modules.block.SPPF            [512, 512, 5]                 
 14                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 15             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 16                  -1  1    591360  ultralytics.nn.modules.block.C2f             [768, 256, 1]                 
 17                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 18             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 19                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]                 
 20                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 1]              
 21                  -1  1         0  ultralytics.nn.modules.block.space_to_depth  [1]                           
 22            [-1, 16]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 23                  -1  1    591360  ultralytics.nn.modules.block.C2f             [768, 256, 1]                 
 24                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 1]              
 25                  -1  1         0  ultralytics.nn.modules.block.space_to_depth  [1]                           
 26            [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 27                  -1  1   2362368  ultralytics.nn.modules.block.C2f             [1536, 512, 1]                
 28        [19, 23, 27]  1   2117596  ultralytics.nn.modules.head.Detect           [4, [128, 256, 512]]          
YOLOv8-SPDconv summary: 231 layers, 12673148 parameters, 12673132 gradients, 46.0 GFLOPs

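As you can see, the model gets noticeably heavier: 12,673,148 parameters and 46.0 GFLOPs, against the 11,166,560 parameters and 28.8 GFLOPs quoted for stock YOLOv8s in the yaml comments (those figures are for the default config, so the comparison is only approximate). The space_to_depth layers themselves have 0 parameters. The increase comes from the layers after them, which now receive four times as many input channels.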
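The per-layer numbers in the summary can be reproduced by hand, which shows where the extra parameters come from. The two formulas below are my own reading of how the Conv and C2f blocks are built (a bias-free convolution plus BatchNorm, and a C2f with hidden width c2 // 2). Treat them as an assumption that I have checked only against the counts in the log above.

```python
def conv_params(c1, c2, k):
    """Conv block: bias-free k x k convolution plus BatchNorm weight and bias."""
    return c1 * c2 * k * k + 2 * c2


def c2f_params(c1, c2, n):
    """C2f block with n bottlenecks, assuming hidden width c = c2 // 2."""
    c = c2 // 2
    cv1 = conv_params(c1, 2 * c, 1)               # 1x1 conv that sees the input channels
    cv2 = conv_params((2 + n) * c, c2, 1)         # 1x1 conv over the concatenated branches
    bottlenecks = n * 2 * conv_params(c, c, 3)    # two 3x3 convs per bottleneck
    return cv1 + cv2 + bottlenecks


layer0 = conv_params(3, 32, 3)        # log, layer 0: Conv [3, 32, 3, 2]
layer1 = conv_params(32, 64, 3)       # log, layer 1: Conv [32, 64, 3, 1]
with_spd = c2f_params(256, 64, 1)     # log, layer 3: C2f fed 4 * 64 channels
without_spd = c2f_params(64, 64, 1)   # the same block fed 64 channels directly
print(layer0, layer1, with_spd, without_spd)
```

Only the first 1x1 convolution inside C2f depends on the input width, so quadrupling the input channels adds 41344 - 29056 = 12288 parameters to this one block. The same effect repeats, with larger widths, at every C2f that follows a space_to_depth layer.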