Code for AAAI 2021 paper. "SCNet: Training Inference Sample Consistency for Instance Segmentation" (#4356)

* add scnet * add more configs and minors * fix unittest for scnet * update docstring * add forward test for htc and scnet * fix build on pytorch 1.3 for empty tensor * update doc string and minor refactor * update unit-test and use inheritance for SCNet heads * support scnet tta * minor * update readme * minor fixes
4 years ago · 40f168937d
parent 31809ece87
commit 40f168937d
24 changed files with 1294 additions and 38 deletions
--- a/README.md
+++ b/README.md
@ -108,6 +108,7 @@ Supported methods:
 - [x] [VFNet](configs/vfnet/README.md)
 - [x] [DETR](configs/detr/README.md)
 - [x] [CascadeRPN](configs/cascade_rpn/README.md)
 - [x] [SCNet](configs/scnet/README.md)
 Some other methods are also supported in [projects using MMDetection](./docs/projects.md).
--- a/configs/scnet/README.md
+++ b/configs/scnet/README.md
@ -0,0 +1,51 @@
 # SCNet
 ## Introduction
 [ALGORITHM]
 We provide the code for reproducing experiment results of [SCNet](https://arxiv.org/abs/2012.10150).
 ```
@inproceedings{vu2019cascade,
  title={SCNet: Training Inference Sample Consistency for Instance Segmentation},
  author={Vu, Thang and Haeyong, Kang and Yoo, Chang D},
  booktitle={AAAI},
  year={2021}
 }
 ```
 ## Dataset
 SCNet requires COCO and COCO-stuff dataset for training. You need to download and extract it in the COCO dataset path.
 The directory should be like this.
 ```none
 mmdetection
 ├── mmdet
 ├── tools
 ├── configs
 ├── data
 │   ├── coco
 │   │   ├── annotations
 │   │   ├── train2017
 │   │   ├── val2017
 │   │   ├── test2017
 |   |   ├── stuffthingmaps
 ```
 ## Results and Models
 The results on COCO 2017val are shown in the below table. (results on test-dev are usually slightly higher than val)
 |     Backbone    |  Style  | Lr schd | Mem (GB) | Inf speed (fps) | box AP | mask AP | TTA box AP | TTA mask AP | Config |   Download   |
 |:---------------:|:-------:|:-------:|:--------:|:---------------:|:------:|:-------:|:----------:|:-----------:|:------:|:------------:|
 |     R-50-FPN    | pytorch |    1x   |    7.0   |       6.2       |  43.5  |   39.2  |    44.8    |     40.9    | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/scnet/scnet_r50_fpn_1x_coco.py) | [model](https://drive.google.com/file/d/179pcG-sNVDglJoZcQLsM8GWZnJhZtgWx/view?usp=sharing) \| [log](https://drive.google.com/file/d/1ZFS6QhFfxlOnDYPiGpSDP_Fzgb7iDGN3/view?usp=sharing) |
 |     R-50-FPN    | pytorch |   20e   |    7.0   |       6.2       |  44.5  |   40.0  |    45.8    |     41.5    | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/scnet/scnet_r50_fpn_20e_coco.py) | [model](https://drive.google.com/file/d/1hCH4raiUXsnYQYmKc4ADeL5pNc2X72fJ/view?usp=sharing) \| [log](https://drive.google.com/file/d/1-LnkOXN8n5ojQW34H0qZ625cgrnWpqSX/view?usp=sharing) |
 |    R-101-FPN    | pytorch |   20e   |    8.9   |       5.8       |  45.8  |   40.9  |    47.3    |     42.7    | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/scnet/scnet_r101_fpn_20e_coco.py) | [model](https://drive.google.com/file/d/1LcnegOX9YYZ7G8hqUa1F_Mp6PeT6_3jA/view?usp=sharing) \| [log](https://drive.google.com/file/d/1iRx-9GRgTaIDsz-we3DGwFVH22nbvCLa/view?usp=sharing) |
 | X-101-64x4d-FPN | pytorch |   20e   |   13.2   |       4.9       |  47.5  |   42.3  |    48.9    |     44.0    | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/scnet/scnet_x101_64x4d_fpn_20e_coco.py) | [model](https://drive.google.com/file/d/1MDIoOBKwfXUdtEduz2BQm3MSgC05IlqZ/view?usp=sharing) \| [log](https://drive.google.com/file/d/1OsfQJ8gwtqIQ61k358yxY21sCvbUcRjs/view?usp=sharing) |
 ### Notes
 - Training hyper-parameters are identical to those of [HTC](https://github.com/open-mmlab/mmdetection/tree/master/configs/htc).
 - TTA means Test Time Augmentation, which applies horizonal flip and multi-scale testing. Refer to [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/scnet/scnet_r50_fpn_1x_coco.py).
--- a/configs/scnet/scnet_r101_fpn_20e_coco.py
+++ b/configs/scnet/scnet_r101_fpn_20e_coco.py
@ -0,0 +1,2 @@
 _base_ = './scnet_r50_fpn_20e_coco.py'
 model = dict(pretrained='torchvision://resnet101', backbone=dict(depth=101))
--- a/configs/scnet/scnet_r50_fpn_1x_coco.py
+++ b/configs/scnet/scnet_r50_fpn_1x_coco.py
@ -0,0 +1,136 @@
 _base_ = '../htc/htc_r50_fpn_1x_coco.py'
 # model settings
 model = dict(
    type='SCNet',
    roi_head=dict(
        _delete_=True,
        type='SCNetRoIHead',
        num_stages=3,
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
                type='SCNetBBoxHead',
                num_shared_fcs=2,
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
                               loss_weight=1.0)),
            dict(
                type='SCNetBBoxHead',
                num_shared_fcs=2,
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
                               loss_weight=1.0)),
            dict(
                type='SCNetBBoxHead',
                num_shared_fcs=2,
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
        ],
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        mask_head=dict(
            type='SCNetMaskHead',
            num_convs=12,
            in_channels=256,
            conv_out_channels=256,
            num_classes=80,
            conv_to_res=True,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0)),
        semantic_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[8]),
        semantic_head=dict(
            type='SCNetSemanticHead',
            num_ins=5,
            fusion_level=1,
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=183,
            ignore_label=255,
            loss_weight=0.2,
            conv_to_res=True),
        glbctx_head=dict(
            type='GlobalContextHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=81,
            loss_weight=3.0,
            conv_to_res=True),
        feat_relay_head=dict(
            type='FeatureRelayHead',
            in_channels=1024,
            out_conv_channels=256,
            roi_feat_size=7,
            scale_factor=2)))
 # uncomment below code to enable test time augmentations
 # img_norm_cfg = dict(
 #     mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
 # test_pipeline = [
 #     dict(type='LoadImageFromFile'),
 #     dict(
 #         type='MultiScaleFlipAug',
 #         img_scale=[(600, 900), (800, 1200), (1000, 1500), (1200, 1800),
 #                    (1400, 2100)],
 #         flip=True,
 #         transforms=[
 #             dict(type='Resize', keep_ratio=True),
 #             dict(type='RandomFlip', flip_ratio=0.5),
 #             dict(type='Normalize', **img_norm_cfg),
 #             dict(type='Pad', size_divisor=32),
 #             dict(type='ImageToTensor', keys=['img']),
 #             dict(type='Collect', keys=['img']),
 #         ])
 # ]
 # data = dict(
 #     val=dict(pipeline=test_pipeline),
 #     test=dict(pipeline=test_pipeline))
--- a/configs/scnet/scnet_r50_fpn_20e_coco.py
+++ b/configs/scnet/scnet_r50_fpn_20e_coco.py
@ -0,0 +1,4 @@
 _base_ = './scnet_r50_fpn_1x_coco.py'
 # learning policy
 lr_config = dict(step=[16, 19])
 total_epochs = 20
--- a/configs/scnet/scnet_x101_64x4d_fpn_20e_coco.py
+++ b/configs/scnet/scnet_x101_64x4d_fpn_20e_coco.py
@ -0,0 +1,14 @@
 _base_ = './scnet_r50_fpn_20e_coco.py'
 model = dict(
    pretrained='open-mmlab://resnext101_64x4d',
    backbone=dict(
        type='ResNeXt',
        depth=101,
        groups=64,
        base_width=4,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'))
--- a/configs/scnet/scnet_x101_64x4d_fpn_8x1_20e_coco.py
+++ b/configs/scnet/scnet_x101_64x4d_fpn_8x1_20e_coco.py
@ -0,0 +1,3 @@
 _base_ = './scnet_x101_64x4d_fpn_20e_coco.py'
 data = dict(samples_per_gpu=1, workers_per_gpu=1)
 optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
--- a/mmdet/models/detectors/init.py
+++ b/mmdet/models/detectors/init.py
@ -19,6 +19,7 @@ from .point_rend import PointRend
 from .reppoints_detector import RepPointsDetector
 from .retinanet import RetinaNet
 from .rpn import RPN
 from .scnet import SCNet
 from .single_stage import SingleStageDetector
 from .sparse_rcnn import SparseRCNN
 from .trident_faster_rcnn import TridentFasterRCNN
@ -32,5 +33,6 @@ __all__ = [
    'FastRCNN', 'FasterRCNN', 'MaskRCNN', 'CascadeRCNN', 'HybridTaskCascade',
    'RetinaNet', 'FCOS', 'GridRCNN', 'MaskScoringRCNN', 'RepPointsDetector',
    'FOVEA', 'FSAF', 'NASFCOS', 'PointRend', 'GFL', 'CornerNet', 'PAA',
-    'YOLOV3', 'YOLACT', 'VFNet', 'DETR', 'TridentFasterRCNN', 'SparseRCNN'
+    'YOLOV3', 'YOLACT', 'VFNet', 'DETR', 'TridentFasterRCNN', 'SparseRCNN',
    'SCNet'
 ]
--- a/mmdet/models/detectors/scnet.py
+++ b/mmdet/models/detectors/scnet.py
@ -0,0 +1,10 @@
 from ..builder import DETECTORS
 from .cascade_rcnn import CascadeRCNN
@DETECTORS.register_module()
 class SCNet(CascadeRCNN):
    """Implementation of `SCNet <https://arxiv.org/abs/2012.10150>`_"""
    def __init__(self, **kwargs):
        super(SCNet, self).__init__(**kwargs)
--- a/mmdet/models/roi_heads/init.py
+++ b/mmdet/models/roi_heads/init.py
@ -1,17 +1,21 @@
 from .base_roi_head import BaseRoIHead
 from .bbox_heads import (BBoxHead, ConvFCBBoxHead, DoubleConvFCBBoxHead,
-                         Shared2FCBBoxHead, Shared4Conv1FCBBoxHead)
+                         SCNetBBoxHead, Shared2FCBBoxHead,
                         Shared4Conv1FCBBoxHead)
 from .cascade_roi_head import CascadeRoIHead
 from .double_roi_head import DoubleHeadRoIHead
 from .dynamic_roi_head import DynamicRoIHead
 from .grid_roi_head import GridRoIHead
 from .htc_roi_head import HybridTaskCascadeRoIHead
-from .mask_heads import (CoarseMaskHead, FCNMaskHead, FusedSemanticHead,
+from .mask_heads import (CoarseMaskHead, FCNMaskHead, FeatureRelayHead,
-                         GridHead, HTCMaskHead, MaskIoUHead, MaskPointHead)
+                         FusedSemanticHead, GlobalContextHead, GridHead,
                         HTCMaskHead, MaskIoUHead, MaskPointHead,
                         SCNetMaskHead, SCNetSemanticHead)
 from .mask_scoring_roi_head import MaskScoringRoIHead
 from .pisa_roi_head import PISARoIHead
 from .point_rend_roi_head import PointRendRoIHead
 from .roi_extractors import SingleRoIExtractor
 from .scnet_roi_head import SCNetRoIHead
 from .shared_heads import ResLayer
 from .sparse_roi_head import SparseRoIHead
 from .standard_roi_head import StandardRoIHead
@ -24,5 +28,7 @@ __all__ = [
    'Shared4Conv1FCBBoxHead', 'DoubleConvFCBBoxHead', 'FCNMaskHead',
    'HTCMaskHead', 'FusedSemanticHead', 'GridHead', 'MaskIoUHead',
    'SingleRoIExtractor', 'PISARoIHead', 'PointRendRoIHead', 'MaskPointHead',
-    'CoarseMaskHead', 'DynamicRoIHead', 'SparseRoIHead', 'TridentRoIHead'
+    'CoarseMaskHead', 'DynamicRoIHead', 'SparseRoIHead', 'TridentRoIHead',
    'SCNetRoIHead', 'SCNetMaskHead', 'SCNetSemanticHead', 'SCNetBBoxHead',
    'FeatureRelayHead', 'GlobalContextHead'
 ]
--- a/mmdet/models/roi_heads/bbox_heads/init.py
+++ b/mmdet/models/roi_heads/bbox_heads/init.py
@ -4,8 +4,10 @@ from .convfc_bbox_head import (ConvFCBBoxHead, Shared2FCBBoxHead,
 from .dii_head import DIIHead
 from .double_bbox_head import DoubleConvFCBBoxHead
 from .sabl_head import SABLHead
 from .scnet_bbox_head import SCNetBBoxHead
 __all__ = [
    'BBoxHead', 'ConvFCBBoxHead', 'Shared2FCBBoxHead',
-    'Shared4Conv1FCBBoxHead', 'DoubleConvFCBBoxHead', 'SABLHead', 'DIIHead'
+    'Shared4Conv1FCBBoxHead', 'DoubleConvFCBBoxHead', 'SABLHead', 'DIIHead',
    'SCNetBBoxHead'
 ]
--- a/mmdet/models/roi_heads/bbox_heads/scnet_bbox_head.py
+++ b/mmdet/models/roi_heads/bbox_heads/scnet_bbox_head.py
@ -0,0 +1,76 @@
 from mmdet.models.builder import HEADS
 from .convfc_bbox_head import ConvFCBBoxHead
@HEADS.register_module()
 class SCNetBBoxHead(ConvFCBBoxHead):
    """BBox head for `SCNet <https://arxiv.org/abs/2012.10150>`_.
    This inherits ``ConvFCBBoxHead`` with modified forward() function, allow us
    to get intermediate shared feature.
    """
    def _forward_shared(self, x):
        """Forward function for shared part."""
        if self.num_shared_convs > 0:
            for conv in self.shared_convs:
                x = conv(x)
        if self.num_shared_fcs > 0:
            if self.with_avg_pool:
                x = self.avg_pool(x)
            x = x.flatten(1)
            for fc in self.shared_fcs:
                x = self.relu(fc(x))
        return x
    def _forward_cls_reg(self, x):
        """Forward function for classification and regression parts."""
        x_cls = x
        x_reg = x
        for conv in self.cls_convs:
            x_cls = conv(x_cls)
        if x_cls.dim() > 2:
            if self.with_avg_pool:
                x_cls = self.avg_pool(x_cls)
            x_cls = x_cls.flatten(1)
        for fc in self.cls_fcs:
            x_cls = self.relu(fc(x_cls))
        for conv in self.reg_convs:
            x_reg = conv(x_reg)
        if x_reg.dim() > 2:
            if self.with_avg_pool:
                x_reg = self.avg_pool(x_reg)
            x_reg = x_reg.flatten(1)
        for fc in self.reg_fcs:
            x_reg = self.relu(fc(x_reg))
        cls_score = self.fc_cls(x_cls) if self.with_cls else None
        bbox_pred = self.fc_reg(x_reg) if self.with_reg else None
        return cls_score, bbox_pred
    def forward(self, x, return_shared_feat=False):
        """Forward function.
        Args:
            x (Tensor): input features
            return_shared_feat (bool): If True, return cls-reg-shared feature.
        Return:
            out (tuple[Tensor]): contain ``cls_score`` and ``bbox_pred``,
                if  ``return_shared_feat`` is True, append ``x_shared`` to the
                returned tuple.
        """
        x_shared = self._forward_shared(x)
        out = self._forward_cls_reg(x_shared)
        if return_shared_feat:
            out += (x_shared, )
        return out
--- a/mmdet/models/roi_heads/mask_heads/init.py
+++ b/mmdet/models/roi_heads/mask_heads/init.py
@ -1,12 +1,17 @@
 from .coarse_mask_head import CoarseMaskHead
 from .fcn_mask_head import FCNMaskHead
 from .feature_relay_head import FeatureRelayHead
 from .fused_semantic_head import FusedSemanticHead
 from .global_context_head import GlobalContextHead
 from .grid_head import GridHead
 from .htc_mask_head import HTCMaskHead
 from .mask_point_head import MaskPointHead
 from .maskiou_head import MaskIoUHead
 from .scnet_mask_head import SCNetMaskHead
 from .scnet_semantic_head import SCNetSemanticHead
 __all__ = [
    'FCNMaskHead', 'HTCMaskHead', 'FusedSemanticHead', 'GridHead',
-    'MaskIoUHead', 'CoarseMaskHead', 'MaskPointHead'
+    'MaskIoUHead', 'CoarseMaskHead', 'MaskPointHead', 'SCNetMaskHead',
    'SCNetSemanticHead', 'GlobalContextHead', 'FeatureRelayHead'
 ]
--- a/mmdet/models/roi_heads/mask_heads/feature_relay_head.py
+++ b/mmdet/models/roi_heads/mask_heads/feature_relay_head.py
@ -0,0 +1,55 @@
 import torch.nn as nn
 from mmcv.cnn import kaiming_init
 from mmcv.runner import auto_fp16
 from mmdet.models.builder import HEADS
@HEADS.register_module()
 class FeatureRelayHead(nn.Module):
    """Feature Relay Head used in SCNet https://arxiv.org/abs/2012.10150.
    Args:
        in_channels (int, optional): number of input channels. Default: 256.
        conv_out_channels (int, optional): number of output channels before
            classification layer. Default: 256.
        roi_feat_size (int, optional): roi feat size at box head. Default: 7.
        scale_factor (int, optional): scale factor to match roi feat size
            at mask head. Default: 2.
    """
    def __init__(self,
                 in_channels=1024,
                 out_conv_channels=256,
                 roi_feat_size=7,
                 scale_factor=2):
        super(FeatureRelayHead, self).__init__()
        assert isinstance(roi_feat_size, int)
        self.in_channels = in_channels
        self.out_conv_channels = out_conv_channels
        self.roi_feat_size = roi_feat_size
        self.out_channels = (roi_feat_size**2) * out_conv_channels
        self.scale_factor = scale_factor
        self.fp16_enabled = False
        self.fc = nn.Linear(self.in_channels, self.out_channels)
        self.upsample = nn.Upsample(
            scale_factor=scale_factor, mode='bilinear', align_corners=True)
    def init_weights(self):
        """Init weights for the head."""
        kaiming_init(self.fc)
    @auto_fp16()
    def forward(self, x):
        """Forward function."""
        N, in_C = x.shape
        if N > 0:
            out_C = self.out_conv_channels
            out_HW = self.roi_feat_size
            x = self.fc(x)
            x = x.reshape(N, out_C, out_HW, out_HW)
            x = self.upsample(x)
            return x
        return None
--- a/mmdet/models/roi_heads/mask_heads/global_context_head.py
+++ b/mmdet/models/roi_heads/mask_heads/global_context_head.py
@ -0,0 +1,101 @@
 import torch.nn as nn
 from mmcv.cnn import ConvModule
 from mmcv.runner import auto_fp16, force_fp32
 from mmdet.models.builder import HEADS
 from mmdet.models.utils import ResLayer, SimplifiedBasicBlock
@HEADS.register_module()
 class GlobalContextHead(nn.Module):
    """Global context head used in SCNet https://arxiv.org/abs/2012.10150.
    Args:
        num_convs (int, optional): number of convolutional layer in GlbCtxHead.
            Default: 4.
        in_channels (int, optional): number of input channels. Default: 256.
        conv_out_channels (int, optional): number of output channels before
            classification layer. Default: 256.
        loss_weight (float, optional): global context loss weight. Default: 1.
        conv_cfg (dict, optional): config to init conv layer. Default: None.
        norm_cfg (dict, optional): config to init norm layer. Default: None.
        conv_to_res (bool, optional): if True, 2 convs will be grouped into
            1 `SimplifiedBasicBlock` using a skip connection. Default: False.
    """
    def __init__(self,
                 num_convs=4,
                 in_channels=256,
                 conv_out_channels=256,
                 num_classes=81,
                 loss_weight=1.0,
                 conv_cfg=None,
                 norm_cfg=None,
                 conv_to_res=False):
        super(GlobalContextHead, self).__init__()
        self.num_convs = num_convs
        self.in_channels = in_channels
        self.conv_out_channels = conv_out_channels
        self.num_classes = num_classes
        self.loss_weight = loss_weight
        self.conv_cfg = conv_cfg
        self.norm_cfg = norm_cfg
        self.conv_to_res = conv_to_res
        self.fp16_enabled = False
        if self.conv_to_res:
            num_res_blocks = num_convs // 2
            self.convs = ResLayer(
                SimplifiedBasicBlock,
                in_channels,
                self.conv_out_channels,
                num_res_blocks,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg)
            self.num_convs = num_res_blocks
        else:
            self.convs = nn.ModuleList()
            for i in range(self.num_convs):
                in_channels = self.in_channels if i == 0 else conv_out_channels
                self.convs.append(
                    ConvModule(
                        in_channels,
                        conv_out_channels,
                        3,
                        padding=1,
                        conv_cfg=self.conv_cfg,
                        norm_cfg=self.norm_cfg))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(conv_out_channels, num_classes)
        self.criterion = nn.BCEWithLogitsLoss()
    def init_weights(self):
        """Init weights for the head."""
        nn.init.normal_(self.fc.weight, 0, 0.01)
        nn.init.constant_(self.fc.bias, 0)
    @auto_fp16()
    def forward(self, feats):
        """Forward function."""
        x = feats[-1]
        for i in range(self.num_convs):
            x = self.convs[i](x)
        x = self.pool(x)
        # multi-class prediction
        mc_pred = x.reshape(x.size(0), -1)
        mc_pred = self.fc(mc_pred)
        return mc_pred, x
    @force_fp32(apply_to=('pred', ))
    def loss(self, pred, labels):
        """Loss function."""
        labels = [lbl.unique() for lbl in labels]
        targets = pred.new_zeros(pred.size())
        for i, label in enumerate(labels):
            targets[i, label] = 1.0
        loss = self.loss_weight * self.criterion(pred, targets)
        return loss
--- a/mmdet/models/roi_heads/mask_heads/scnet_mask_head.py
+++ b/mmdet/models/roi_heads/mask_heads/scnet_mask_head.py
@ -0,0 +1,27 @@
 from mmdet.models.builder import HEADS
 from mmdet.models.utils import ResLayer, SimplifiedBasicBlock
 from .fcn_mask_head import FCNMaskHead
@HEADS.register_module()
 class SCNetMaskHead(FCNMaskHead):
    """Mask head for `SCNet <https://arxiv.org/abs/2012.10150>`_.
    Args:
        conv_to_res (bool, optional): if True, change the conv layers to
            ``SimplifiedBasicBlock``.
    """
    def __init__(self, conv_to_res=True, **kwargs):
        super(SCNetMaskHead, self).__init__(**kwargs)
        self.conv_to_res = conv_to_res
        if conv_to_res:
            assert self.conv_kernel_size == 3
            self.num_res_blocks = self.num_convs // 2
            self.convs = ResLayer(
                SimplifiedBasicBlock,
                self.in_channels,
                self.conv_out_channels,
                self.num_res_blocks,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg)
--- a/mmdet/models/roi_heads/mask_heads/scnet_semantic_head.py
+++ b/mmdet/models/roi_heads/mask_heads/scnet_semantic_head.py
@ -0,0 +1,27 @@
 from mmdet.models.builder import HEADS
 from mmdet.models.utils import ResLayer, SimplifiedBasicBlock
 from .fused_semantic_head import FusedSemanticHead
@HEADS.register_module()
 class SCNetSemanticHead(FusedSemanticHead):
    """Mask head for `SCNet <https://arxiv.org/abs/2012.10150>`_.
    Args:
        conv_to_res (bool, optional): if True, change the conv layers to
            ``SimplifiedBasicBlock``.
    """
    def __init__(self, conv_to_res=True, **kwargs):
        super(SCNetSemanticHead, self).__init__(**kwargs)
        self.conv_to_res = conv_to_res
        if self.conv_to_res:
            num_res_blocks = self.num_convs // 2
            self.convs = ResLayer(
                SimplifiedBasicBlock,
                self.in_channels,
                self.conv_out_channels,
                num_res_blocks,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg)
            self.num_convs = num_res_blocks
--- a/mmdet/models/roi_heads/scnet_roi_head.py
+++ b/mmdet/models/roi_heads/scnet_roi_head.py
@ -0,0 +1,582 @@
 import torch
 import torch.nn.functional as F
 from mmdet.core import (bbox2result, bbox2roi, bbox_mapping, merge_aug_bboxes,
                        merge_aug_masks, multiclass_nms)
 from ..builder import HEADS, build_head, build_roi_extractor
 from .cascade_roi_head import CascadeRoIHead
@HEADS.register_module()
 class SCNetRoIHead(CascadeRoIHead):
    """RoIHead for `SCNet <https://arxiv.org/abs/2012.10150>`_.
    Args:
        num_stages (int): number of cascade stages.
        stage_loss_weights (list): loss weight of cascade stages.
        semantic_roi_extractor (dict): config to init semantic roi extractor.
        semantic_head (dict): config to init semantic head.
        feat_relay_head (dict): config to init feature_relay_head.
        glbctx_head (dict): config to init global context head.
    """
    def __init__(self,
                 num_stages,
                 stage_loss_weights,
                 semantic_roi_extractor=None,
                 semantic_head=None,
                 feat_relay_head=None,
                 glbctx_head=None,
                 **kwargs):
        super(SCNetRoIHead, self).__init__(num_stages, stage_loss_weights,
                                           **kwargs)
        assert self.with_bbox and self.with_mask
        assert not self.with_shared_head  # shared head is not supported
        if semantic_head is not None:
            self.semantic_roi_extractor = build_roi_extractor(
                semantic_roi_extractor)
            self.semantic_head = build_head(semantic_head)
        if feat_relay_head is not None:
            self.feat_relay_head = build_head(feat_relay_head)
        if glbctx_head is not None:
            self.glbctx_head = build_head(glbctx_head)
    def init_mask_head(self, mask_roi_extractor, mask_head):
        """Initialize ``mask_head``"""
        if mask_roi_extractor is not None:
            self.mask_roi_extractor = build_roi_extractor(mask_roi_extractor)
            self.mask_head = build_head(mask_head)
    def init_weights(self, pretrained):
        """Initialize the weights in head.
        Args:
            pretrained (str, optional): Path to pre-trained weights.
                Defaults to None.
        """
        for i in range(self.num_stages):
            if self.with_bbox:
                self.bbox_roi_extractor[i].init_weights()
                self.bbox_head[i].init_weights()
        if self.with_mask:
            self.mask_roi_extractor.init_weights()
            self.mask_head.init_weights()
        if self.with_semantic:
            self.semantic_head.init_weights()
        if self.with_glbctx:
            self.glbctx_head.init_weights()
        if self.with_feat_relay:
            self.feat_relay_head.init_weights()
    @property
    def with_semantic(self):
        """bool: whether the head has semantic head"""
        return hasattr(self,
                       'semantic_head') and self.semantic_head is not None
    @property
    def with_feat_relay(self):
        """bool: whether the head has feature relay head"""
        return (hasattr(self, 'feat_relay_head')
                and self.feat_relay_head is not None)
    @property
    def with_glbctx(self):
        """bool: whether the head has global context head"""
        return hasattr(self, 'glbctx_head') and self.glbctx_head is not None
    def _fuse_glbctx(self, roi_feats, glbctx_feat, rois):
        """Fuse global context feats with roi feats."""
        assert roi_feats.size(0) == rois.size(0)
        img_inds = torch.unique(rois[:, 0].cpu(), sorted=True).long()
        fused_feats = torch.zeros_like(roi_feats)
        for img_id in img_inds:
            inds = (rois[:, 0] == img_id.item())
            fused_feats[inds] = roi_feats[inds] + glbctx_feat[img_id]
        return fused_feats
    def _slice_pos_feats(self, feats, sampling_results):
        """Get features from pos rois."""
        num_rois = [res.bboxes.size(0) for res in sampling_results]
        num_pos_rois = [res.pos_bboxes.size(0) for res in sampling_results]
        inds = torch.zeros(sum(num_rois), dtype=torch.bool)
        start = 0
        for i in range(len(num_rois)):
            start = 0 if i == 0 else start + num_rois[i - 1]
            stop = start + num_pos_rois[i]
            inds[start:stop] = 1
        sliced_feats = feats[inds]
        return sliced_feats
    def _bbox_forward(self,
                      stage,
                      x,
                      rois,
                      semantic_feat=None,
                      glbctx_feat=None):
        """Box head forward function used in both training and testing."""
        bbox_roi_extractor = self.bbox_roi_extractor[stage]
        bbox_head = self.bbox_head[stage]
        bbox_feats = bbox_roi_extractor(
            x[:len(bbox_roi_extractor.featmap_strides)], rois)
        if self.with_semantic and semantic_feat is not None:
            bbox_semantic_feat = self.semantic_roi_extractor([semantic_feat],
                                                             rois)
            if bbox_semantic_feat.shape[-2:] != bbox_feats.shape[-2:]:
                bbox_semantic_feat = F.adaptive_avg_pool2d(
                    bbox_semantic_feat, bbox_feats.shape[-2:])
            bbox_feats += bbox_semantic_feat
        if self.with_glbctx and glbctx_feat is not None:
            bbox_feats = self._fuse_glbctx(bbox_feats, glbctx_feat, rois)
        cls_score, bbox_pred, relayed_feat = bbox_head(
            bbox_feats, return_shared_feat=True)
        bbox_results = dict(
            cls_score=cls_score,
            bbox_pred=bbox_pred,
            relayed_feat=relayed_feat)
        return bbox_results
    def _mask_forward(self,
                      x,
                      rois,
                      semantic_feat=None,
                      glbctx_feat=None,
                      relayed_feat=None):
        """Mask head forward function used in both training and testing."""
        mask_feats = self.mask_roi_extractor(
            x[:self.mask_roi_extractor.num_inputs], rois)
        if self.with_semantic and semantic_feat is not None:
            mask_semantic_feat = self.semantic_roi_extractor([semantic_feat],
                                                             rois)
            if mask_semantic_feat.shape[-2:] != mask_feats.shape[-2:]:
                mask_semantic_feat = F.adaptive_avg_pool2d(
                    mask_semantic_feat, mask_feats.shape[-2:])
            mask_feats += mask_semantic_feat
        if self.with_glbctx and glbctx_feat is not None:
            mask_feats = self._fuse_glbctx(mask_feats, glbctx_feat, rois)
        if self.with_feat_relay and relayed_feat is not None:
            mask_feats = mask_feats + relayed_feat
        mask_pred = self.mask_head(mask_feats)
        mask_results = dict(mask_pred=mask_pred)
        return mask_results
    def _bbox_forward_train(self,
                            stage,
                            x,
                            sampling_results,
                            gt_bboxes,
                            gt_labels,
                            rcnn_train_cfg,
                            semantic_feat=None,
                            glbctx_feat=None):
        """Run forward function and calculate loss for box head in training."""
        bbox_head = self.bbox_head[stage]
        rois = bbox2roi([res.bboxes for res in sampling_results])
        bbox_results = self._bbox_forward(
            stage,
            x,
            rois,
            semantic_feat=semantic_feat,
            glbctx_feat=glbctx_feat)
        bbox_targets = bbox_head.get_targets(sampling_results, gt_bboxes,
                                             gt_labels, rcnn_train_cfg)
        loss_bbox = bbox_head.loss(bbox_results['cls_score'],
                                   bbox_results['bbox_pred'], rois,
                                   *bbox_targets)
        bbox_results.update(
            loss_bbox=loss_bbox, rois=rois, bbox_targets=bbox_targets)
        return bbox_results
    def _mask_forward_train(self,
                            x,
                            sampling_results,
                            gt_masks,
                            rcnn_train_cfg,
                            semantic_feat=None,
                            glbctx_feat=None,
                            relayed_feat=None):
        """Run forward function and calculate loss for mask head in
        training."""
        pos_rois = bbox2roi([res.pos_bboxes for res in sampling_results])
        mask_results = self._mask_forward(
            x,
            pos_rois,
            semantic_feat=semantic_feat,
            glbctx_feat=glbctx_feat,
            relayed_feat=relayed_feat)
        mask_targets = self.mask_head.get_targets(sampling_results, gt_masks,
                                                  rcnn_train_cfg)
        pos_labels = torch.cat([res.pos_gt_labels for res in sampling_results])
        loss_mask = self.mask_head.loss(mask_results['mask_pred'],
                                        mask_targets, pos_labels)
        mask_results = loss_mask
        return mask_results
    def forward_train(self,
                      x,
                      img_metas,
                      proposal_list,
                      gt_bboxes,
                      gt_labels,
                      gt_bboxes_ignore=None,
                      gt_masks=None,
                      gt_semantic_seg=None):
        """
        Args:
            x (list[Tensor]): list of multi-level img features.
            img_metas (list[dict]): list of image info dict where each dict
                has: 'img_shape', 'scale_factor', 'flip', and may also contain
                'filename', 'ori_shape', 'pad_shape', and 'img_norm_cfg'.
                For details on the values of these keys see
                `mmdet/datasets/pipelines/formatting.py:Collect`.
            proposal_list (list[Tensors]): list of region proposals.
            gt_bboxes (list[Tensor]): Ground truth bboxes for each image with
                shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
            gt_labels (list[Tensor]): class indices corresponding to each box
            gt_bboxes_ignore (None, list[Tensor]): specify which bounding
                boxes can be ignored when computing the loss.
            gt_masks (None, Tensor) : true segmentation masks for each box
                used if the architecture supports a segmentation task.
            gt_semantic_seg (None, list[Tensor]): semantic segmentation masks
                used if the architecture supports semantic segmentation task.
        Returns:
            dict[str, Tensor]: a dictionary of loss components
        """
        losses = dict()
        # semantic segmentation branch
        if self.with_semantic:
            semantic_pred, semantic_feat = self.semantic_head(x)
            loss_seg = self.semantic_head.loss(semantic_pred, gt_semantic_seg)
            losses['loss_semantic_seg'] = loss_seg
        else:
            semantic_feat = None
        # global context branch
        if self.with_glbctx:
            mc_pred, glbctx_feat = self.glbctx_head(x)
            loss_glbctx = self.glbctx_head.loss(mc_pred, gt_labels)
            losses['loss_glbctx'] = loss_glbctx
        else:
            glbctx_feat = None
        for i in range(self.num_stages):
            self.current_stage = i
            rcnn_train_cfg = self.train_cfg[i]
            lw = self.stage_loss_weights[i]
            # assign gts and sample proposals
            sampling_results = []
            bbox_assigner = self.bbox_assigner[i]
            bbox_sampler = self.bbox_sampler[i]
            num_imgs = len(img_metas)
            if gt_bboxes_ignore is None:
                gt_bboxes_ignore = [None for _ in range(num_imgs)]
            for j in range(num_imgs):
                assign_result = bbox_assigner.assign(proposal_list[j],
                                                     gt_bboxes[j],
                                                     gt_bboxes_ignore[j],
                                                     gt_labels[j])
                sampling_result = bbox_sampler.sample(
                    assign_result,
                    proposal_list[j],
                    gt_bboxes[j],
                    gt_labels[j],
                    feats=[lvl_feat[j][None] for lvl_feat in x])
                sampling_results.append(sampling_result)
            bbox_results = \
                self._bbox_forward_train(
                    i, x, sampling_results, gt_bboxes, gt_labels,
                    rcnn_train_cfg, semantic_feat, glbctx_feat)
            roi_labels = bbox_results['bbox_targets'][0]
            for name, value in bbox_results['loss_bbox'].items():
                losses[f's{i}.{name}'] = (
                    value * lw if 'loss' in name else value)
            # refine boxes
            if i < self.num_stages - 1:
                pos_is_gts = [res.pos_is_gt for res in sampling_results]
                with torch.no_grad():
                    proposal_list = self.bbox_head[i].refine_bboxes(
                        bbox_results['rois'], roi_labels,
                        bbox_results['bbox_pred'], pos_is_gts, img_metas)
        if self.with_feat_relay:
            relayed_feat = self._slice_pos_feats(bbox_results['relayed_feat'],
                                                 sampling_results)
            relayed_feat = self.feat_relay_head(relayed_feat)
        else:
            relayed_feat = None
        mask_results = self._mask_forward_train(x, sampling_results, gt_masks,
                                                rcnn_train_cfg, semantic_feat,
                                                glbctx_feat, relayed_feat)
        mask_lw = sum(self.stage_loss_weights)
        losses['loss_mask'] = mask_lw * mask_results['loss_mask']
        return losses
    def simple_test(self, x, proposal_list, img_metas, rescale=False):
        """Test without augmentation."""
        if self.with_semantic:
            _, semantic_feat = self.semantic_head(x)
        else:
            semantic_feat = None
        if self.with_glbctx:
            mc_pred, glbctx_feat = self.glbctx_head(x)
        else:
            glbctx_feat = None
        num_imgs = len(proposal_list)
        img_shapes = tuple(meta['img_shape'] for meta in img_metas)
        ori_shapes = tuple(meta['ori_shape'] for meta in img_metas)
        scale_factors = tuple(meta['scale_factor'] for meta in img_metas)
        # "ms" in variable names means multi-stage
        ms_scores = []
        rcnn_test_cfg = self.test_cfg
        rois = bbox2roi(proposal_list)
        for i in range(self.num_stages):
            bbox_head = self.bbox_head[i]
            bbox_results = self._bbox_forward(
                i,
                x,
                rois,
                semantic_feat=semantic_feat,
                glbctx_feat=glbctx_feat)
            # split batch bbox prediction back to each image
            cls_score = bbox_results['cls_score']
            bbox_pred = bbox_results['bbox_pred']
            num_proposals_per_img = tuple(len(p) for p in proposal_list)
            rois = rois.split(num_proposals_per_img, 0)
            cls_score = cls_score.split(num_proposals_per_img, 0)
            bbox_pred = bbox_pred.split(num_proposals_per_img, 0)
            ms_scores.append(cls_score)
            if i < self.num_stages - 1:
                bbox_label = [s[:, :-1].argmax(dim=1) for s in cls_score]
                rois = torch.cat([
                    bbox_head.regress_by_class(rois[i], bbox_label[i],
                                               bbox_pred[i], img_metas[i])
                    for i in range(num_imgs)
                ])
        # average scores of each image by stages
        cls_score = [
            sum([score[i] for score in ms_scores]) / float(len(ms_scores))
            for i in range(num_imgs)
        ]
        # apply bbox post-processing to each image individually
        det_bboxes = []
        det_labels = []
        for i in range(num_imgs):
            det_bbox, det_label = self.bbox_head[-1].get_bboxes(
                rois[i],
                cls_score[i],
                bbox_pred[i],
                img_shapes[i],
                scale_factors[i],
                rescale=rescale,
                cfg=rcnn_test_cfg)
            det_bboxes.append(det_bbox)
            det_labels.append(det_label)
        det_bbox_results = [
            bbox2result(det_bboxes[i], det_labels[i],
                        self.bbox_head[-1].num_classes)
            for i in range(num_imgs)
        ]
        if self.with_mask:
            if all(det_bbox.shape[0] == 0 for det_bbox in det_bboxes):
                mask_classes = self.mask_head.num_classes
                det_segm_results = [[[] for _ in range(mask_classes)]
                                    for _ in range(num_imgs)]
            else:
                if rescale and not isinstance(scale_factors[0], float):
                    scale_factors = [
                        torch.from_numpy(scale_factor).to(det_bboxes[0].device)
                        for scale_factor in scale_factors
                    ]
                _bboxes = [
                    det_bboxes[i][:, :4] *
                    scale_factors[i] if rescale else det_bboxes[i]
                    for i in range(num_imgs)
                ]
                mask_rois = bbox2roi(_bboxes)
                # get relay feature on mask_rois
                bbox_results = self._bbox_forward(
                    -1,
                    x,
                    mask_rois,
                    semantic_feat=semantic_feat,
                    glbctx_feat=glbctx_feat)
                relayed_feat = bbox_results['relayed_feat']
                relayed_feat = self.feat_relay_head(relayed_feat)
                mask_results = self._mask_forward(
                    x,
                    mask_rois,
                    semantic_feat=semantic_feat,
                    glbctx_feat=glbctx_feat,
                    relayed_feat=relayed_feat)
                mask_pred = mask_results['mask_pred']
                # split batch mask prediction back to each image
                num_bbox_per_img = tuple(len(_bbox) for _bbox in _bboxes)
                mask_preds = mask_pred.split(num_bbox_per_img, 0)
                # apply mask post-processing to each image individually
                det_segm_results = []
                for i in range(num_imgs):
                    if det_bboxes[i].shape[0] == 0:
                        det_segm_results.append(
                            [[] for _ in range(self.mask_head.num_classes)])
                    else:
                        segm_result = self.mask_head.get_seg_masks(
                            mask_preds[i], _bboxes[i], det_labels[i],
                            self.test_cfg, ori_shapes[i], scale_factors[i],
                            rescale)
                        det_segm_results.append(segm_result)
        # return results
        if self.with_mask:
            return list(zip(det_bbox_results, det_segm_results))
        else:
            return det_bbox_results
    def aug_test(self, img_feats, proposal_list, img_metas, rescale=False):
        if self.with_semantic:
            semantic_feats = [
                self.semantic_head(feat)[1] for feat in img_feats
            ]
        else:
            semantic_feats = [None] * len(img_metas)
        if self.with_glbctx:
            glbctx_feats = [self.glbctx_head(feat)[1] for feat in img_feats]
        else:
            glbctx_feats = [None] * len(img_metas)
        rcnn_test_cfg = self.test_cfg
        aug_bboxes = []
        aug_scores = []
        for x, img_meta, semantic_feat, glbctx_feat in zip(
                img_feats, img_metas, semantic_feats, glbctx_feats):
            # only one image in the batch
            img_shape = img_meta[0]['img_shape']
            scale_factor = img_meta[0]['scale_factor']
            flip = img_meta[0]['flip']
            proposals = bbox_mapping(proposal_list[0][:, :4], img_shape,
                                     scale_factor, flip)
            # "ms" in variable names means multi-stage
            ms_scores = []
            rois = bbox2roi([proposals])
            for i in range(self.num_stages):
                bbox_head = self.bbox_head[i]
                bbox_results = self._bbox_forward(
                    i,
                    x,
                    rois,
                    semantic_feat=semantic_feat,
                    glbctx_feat=glbctx_feat)
                ms_scores.append(bbox_results['cls_score'])
                if i < self.num_stages - 1:
                    bbox_label = bbox_results['cls_score'].argmax(dim=1)
                    rois = bbox_head.regress_by_class(
                        rois, bbox_label, bbox_results['bbox_pred'],
                        img_meta[0])
            cls_score = sum(ms_scores) / float(len(ms_scores))
            bboxes, scores = self.bbox_head[-1].get_bboxes(
                rois,
                cls_score,
                bbox_results['bbox_pred'],
                img_shape,
                scale_factor,
                rescale=False,
                cfg=None)
            aug_bboxes.append(bboxes)
            aug_scores.append(scores)
        # after merging, bboxes will be rescaled to the original image size
        merged_bboxes, merged_scores = merge_aug_bboxes(
            aug_bboxes, aug_scores, img_metas, rcnn_test_cfg)
        det_bboxes, det_labels = multiclass_nms(merged_bboxes, merged_scores,
                                                rcnn_test_cfg.score_thr,
                                                rcnn_test_cfg.nms,
                                                rcnn_test_cfg.max_per_img)
        det_bbox_results = bbox2result(det_bboxes, det_labels,
                                       self.bbox_head[-1].num_classes)
        if self.with_mask:
            if det_bboxes.shape[0] == 0:
                det_segm_results = [[]
                                    for _ in range(self.mask_head.num_classes)]
            else:
                aug_masks = []
                for x, img_meta, semantic_feat, glbctx_feat in zip(
                        img_feats, img_metas, semantic_feats, glbctx_feats):
                    img_shape = img_meta[0]['img_shape']
                    scale_factor = img_meta[0]['scale_factor']
                    flip = img_meta[0]['flip']
                    _bboxes = bbox_mapping(det_bboxes[:, :4], img_shape,
                                           scale_factor, flip)
                    mask_rois = bbox2roi([_bboxes])
                    # get relay feature on mask_rois
                    bbox_results = self._bbox_forward(
                        -1,
                        x,
                        mask_rois,
                        semantic_feat=semantic_feat,
                        glbctx_feat=glbctx_feat)
                    relayed_feat = bbox_results['relayed_feat']
                    relayed_feat = self.feat_relay_head(relayed_feat)
                    mask_results = self._mask_forward(
                        x,
                        mask_rois,
                        semantic_feat=semantic_feat,
                        glbctx_feat=glbctx_feat,
                        relayed_feat=relayed_feat)
                    mask_pred = mask_results['mask_pred']
                    aug_masks.append(mask_pred.sigmoid().cpu().numpy())
                merged_masks = merge_aug_masks(aug_masks, img_metas,
                                               self.test_cfg)
                ori_shape = img_metas[0][0]['ori_shape']
                det_segm_results = self.mask_head.get_seg_masks(
                    merged_masks,
                    det_bboxes,
                    det_labels,
                    rcnn_test_cfg,
                    ori_shape,
                    scale_factor=1.0,
                    rescale=False)
            return [(det_bbox_results, det_segm_results)]
        else:
            return [det_bbox_results]
--- a/mmdet/models/utils/init.py
+++ b/mmdet/models/utils/init.py
@ -2,7 +2,7 @@ from .builder import build_positional_encoding, build_transformer
 from .gaussian_target import gaussian_radius, gen_gaussian_target
 from .positional_encoding import (LearnedPositionalEncoding,
                                  SinePositionalEncoding)
-from .res_layer import ResLayer
+from .res_layer import ResLayer, SimplifiedBasicBlock
 from .transformer import (FFN, DynamicConv, MultiheadAttention, Transformer,
                          TransformerDecoder, TransformerDecoderLayer,
                          TransformerEncoder, TransformerEncoderLayer)
@ -12,5 +12,5 @@ __all__ = [
    'FFN', 'TransformerEncoderLayer', 'TransformerEncoder',
    'TransformerDecoderLayer', 'TransformerDecoder', 'Transformer',
    'build_transformer', 'build_positional_encoding', 'SinePositionalEncoding',
-    'LearnedPositionalEncoding', 'DynamicConv'
+    'LearnedPositionalEncoding', 'DynamicConv', 'SimplifiedBasicBlock'
 ]
--- a/mmdet/models/utils/res_layer.py
+++ b/mmdet/models/utils/res_layer.py
@ -100,3 +100,88 @@ class ResLayer(nn.Sequential):
                    norm_cfg=norm_cfg,
                    **kwargs))
        super(ResLayer, self).__init__(*layers)
 class SimplifiedBasicBlock(nn.Module):
    """Simplified version of original basic residual block. This is used in
    `SCNet <https://arxiv.org/abs/2012.10150>`_.
    - Norm layer is now optional
    - Last ReLU in forward function is removed
    """
    expansion = 1
    def __init__(self,
                 inplanes,
                 planes,
                 stride=1,
                 dilation=1,
                 downsample=None,
                 style='pytorch',
                 with_cp=False,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN'),
                 dcn=None,
                 plugins=None):
        super(SimplifiedBasicBlock, self).__init__()
        assert dcn is None, 'Not implemented yet.'
        assert plugins is None, 'Not implemented yet.'
        assert not with_cp, 'Not implemented yet.'
        self.with_norm = norm_cfg is not None
        with_bias = True if norm_cfg is None else False
        self.conv1 = build_conv_layer(
            conv_cfg,
            inplanes,
            planes,
            3,
            stride=stride,
            padding=dilation,
            dilation=dilation,
            bias=with_bias)
        if self.with_norm:
            self.norm1_name, norm1 = build_norm_layer(
                norm_cfg, planes, postfix=1)
            self.add_module(self.norm1_name, norm1)
        self.conv2 = build_conv_layer(
            conv_cfg, planes, planes, 3, padding=1, bias=with_bias)
        if self.with_norm:
            self.norm2_name, norm2 = build_norm_layer(
                norm_cfg, planes, postfix=2)
            self.add_module(self.norm2_name, norm2)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride
        self.dilation = dilation
        self.with_cp = with_cp
    @property
    def norm1(self):
        """nn.Module: normalization layer after the first convolution layer"""
        return getattr(self, self.norm1_name) if self.with_norm else None
    @property
    def norm2(self):
        """nn.Module: normalization layer after the second convolution layer"""
        return getattr(self, self.norm2_name) if self.with_norm else None
    def forward(self, x):
        """Forward function."""
        identity = x
        out = self.conv1(x)
        if self.with_norm:
            out = self.norm1(out)
        out = self.relu(out)
        out = self.conv2(out)
        if self.with_norm:
            out = self.norm2(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        return out
--- a/tests/test_config.py
+++ b/tests/test_config.py
@ -167,10 +167,11 @@ def _check_roi_head(config, head):
 def _check_roi_extractor(config, roi_extractor, prev_roi_extractor=None):
    import torch.nn as nn
    # Separate roi_extractor and prev_roi_extractor checks for flexibility
    if isinstance(roi_extractor, nn.ModuleList):
        if prev_roi_extractor:
            prev_roi_extractor = prev_roi_extractor[0]
        roi_extractor = roi_extractor[0]
    if prev_roi_extractor and isinstance(prev_roi_extractor, nn.ModuleList):
        prev_roi_extractor = prev_roi_extractor[0]
    assert (len(config.featmap_strides) == len(roi_extractor.roi_layers))
    assert (config.out_channels == roi_extractor.out_channels)
--- a/tests/test_data/test_models_aug_test.py
+++ b/tests/test_data/test_models_aug_test.py
@ -87,6 +87,14 @@ def test_htc_aug_test():
    assert len(aug_result[0][1]) == 80
 def test_scnet_aug_test():
    aug_result = model_aug_test_template(
        'configs/scnet/scnet_r50_fpn_1x_coco.py')
    assert len(aug_result[0]) == 2
    assert len(aug_result[0][0]) == 80
    assert len(aug_result[0][1]) == 80
 def test_cornernet_aug_test():
    # get config
    cfg = mmcv.Config.fromfile(
--- a/tests/test_models/test_backbones.py
+++ b/tests/test_models/test_backbones.py
@ -12,12 +12,13 @@ from mmdet.models.backbones.resnest import Bottleneck as BottleneckS
 from mmdet.models.backbones.resnet import BasicBlock, Bottleneck
 from mmdet.models.backbones.resnext import Bottleneck as BottleneckX
 from mmdet.models.backbones.trident_resnet import TridentBottleneck
-from mmdet.models.utils import ResLayer
+from mmdet.models.utils import ResLayer, SimplifiedBasicBlock
 def is_block(modules):
    """Check if is ResNet building block."""
-    if isinstance(modules, (BasicBlock, Bottleneck, BottleneckX, Bottle2neck)):
+    if isinstance(modules, (BasicBlock, Bottleneck, BottleneckX, Bottle2neck,
                            SimplifiedBasicBlock)):
        return True
    return False
@ -381,6 +382,59 @@ def test_trident_resnet_bottleneck():
    assert x_out.shape == torch.Size([block.num_branch, 64, 56, 56])
 def test_simplied_basic_block():
    with pytest.raises(AssertionError):
        # Not implemented yet.
        dcn = dict(type='DCN', deform_groups=1, fallback_on_stride=False)
        SimplifiedBasicBlock(64, 64, dcn=dcn)
    with pytest.raises(AssertionError):
        # Not implemented yet.
        plugins = [
            dict(
                cfg=dict(type='ContextBlock', ratio=1. / 16),
                position='after_conv3')
        ]
        SimplifiedBasicBlock(64, 64, plugins=plugins)
    with pytest.raises(AssertionError):
        # Not implemented yet
        plugins = [
            dict(
                cfg=dict(
                    type='GeneralizedAttention',
                    spatial_range=-1,
                    num_heads=8,
                    attention_type='0010',
                    kv_stride=2),
                position='after_conv2')
        ]
        SimplifiedBasicBlock(64, 64, plugins=plugins)
    with pytest.raises(AssertionError):
        # Not implemented yet
        SimplifiedBasicBlock(64, 64, with_cp=True)
    # test SimplifiedBasicBlock structure and forward
    block = SimplifiedBasicBlock(64, 64)
    assert block.conv1.in_channels == 64
    assert block.conv1.out_channels == 64
    assert block.conv1.kernel_size == (3, 3)
    assert block.conv2.in_channels == 64
    assert block.conv2.out_channels == 64
    assert block.conv2.kernel_size == (3, 3)
    x = torch.randn(1, 64, 56, 56)
    x_out = block(x)
    assert x_out.shape == torch.Size([1, 64, 56, 56])
    # test SimplifiedBasicBlock without norm
    block = SimplifiedBasicBlock(64, 64, norm_cfg=None)
    assert block.norm1 is None
    assert block.norm2 is None
    x_out = block(x)
    assert x_out.shape == torch.Size([1, 64, 56, 56])
 def test_trident_resnet_backbone():
    tridentresnet_config = dict(
        num_branch=3,
--- a/tests/test_models/test_forward.py
+++ b/tests/test_models/test_forward.py
@ -227,14 +227,24 @@ def test_faster_rcnn_ohem_forward():
    assert float(loss.item()) > 0
 # HTC is not ready yet
@pytest.mark.parametrize('cfg_file', [
    'cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.py',
    'mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py',
    'grid_rcnn/grid_rcnn_r50_fpn_gn-head_2x_coco.py',
-    'ms_rcnn/ms_rcnn_r50_fpn_1x_coco.py'
+    'ms_rcnn/ms_rcnn_r50_fpn_1x_coco.py',
    'htc/htc_r50_fpn_1x_coco.py',
    'scnet/scnet_r50_fpn_20e_coco.py',
 ])
 def test_two_stage_forward(cfg_file):
    models_with_semantic = [
        'htc/htc_r50_fpn_1x_coco.py',
        'scnet/scnet_r50_fpn_20e_coco.py',
    ]
    if cfg_file in models_with_semantic:
        with_semantic = True
    else:
        with_semantic = False
    model = _get_detector_cfg(cfg_file)
    model['pretrained'] = None
@ -244,19 +254,11 @@ def test_two_stage_forward(cfg_file):
    input_shape = (1, 3, 256, 256)
    # Test forward train with a non-empty truth batch
-    mm_inputs = _demo_mm_inputs(input_shape, num_items=[10])
+    mm_inputs = _demo_mm_inputs(
        input_shape, num_items=[10], with_semantic=with_semantic)
    imgs = mm_inputs.pop('imgs')
    img_metas = mm_inputs.pop('img_metas')
-    gt_bboxes = mm_inputs['gt_bboxes']
+    losses = detector.forward(imgs, img_metas, return_loss=True, **mm_inputs)
    gt_labels = mm_inputs['gt_labels']
    gt_masks = mm_inputs['gt_masks']
    losses = detector.forward(
        imgs,
        img_metas,
        gt_bboxes=gt_bboxes,
        gt_labels=gt_labels,
        gt_masks=gt_masks,
        return_loss=True)
    assert isinstance(losses, dict)
    loss, _ = detector._parse_losses(losses)
    loss.requires_grad_(True)
@ -264,19 +266,11 @@ def test_two_stage_forward(cfg_file):
    loss.backward()
    # Test forward train with an empty truth batch
-    mm_inputs = _demo_mm_inputs(input_shape, num_items=[0])
+    mm_inputs = _demo_mm_inputs(
        input_shape, num_items=[0], with_semantic=with_semantic)
    imgs = mm_inputs.pop('imgs')
    img_metas = mm_inputs.pop('img_metas')
-    gt_bboxes = mm_inputs['gt_bboxes']
+    losses = detector.forward(imgs, img_metas, return_loss=True, **mm_inputs)
    gt_labels = mm_inputs['gt_labels']
    gt_masks = mm_inputs['gt_masks']
    losses = detector.forward(
        imgs,
        img_metas,
        gt_bboxes=gt_bboxes,
        gt_labels=gt_labels,
        gt_masks=gt_masks,
        return_loss=True)
    assert isinstance(losses, dict)
    loss, _ = detector._parse_losses(losses)
    loss.requires_grad_(True)
@ -330,7 +324,8 @@ def test_single_stage_forward_cpu(cfg_file):
 def _demo_mm_inputs(input_shape=(1, 3, 300, 300),
-                    num_items=None, num_classes=10):  # yapf: disable
+                    num_items=None, num_classes=10,
                    with_semantic=False):  # yapf: disable
    """Create a superset of inputs needed to run test or train batches.
    Args:
@ -358,6 +353,7 @@ def _demo_mm_inputs(input_shape=(1, 3, 300, 300),
        'filename': '<demo>.png',
        'scale_factor': 1.0,
        'flip': False,
        'flip_direction': None,
    } for _ in range(N)]
    gt_bboxes = []
@ -394,6 +390,14 @@ def _demo_mm_inputs(input_shape=(1, 3, 300, 300),
        'gt_bboxes_ignore': None,
        'gt_masks': gt_masks,
    }
    if with_semantic:
        # assume gt_semantic_seg using scale 1/8 of the img
        gt_semantic_seg = np.random.randint(
            0, num_classes, (1, 1, H // 8, W // 8), dtype=np.uint8)
        mm_inputs.update(
            {'gt_semantic_seg': torch.ByteTensor(gt_semantic_seg)})
    return mm_inputs
		`@ -0,0 +1,2 @@`
							`_base_ = './scnet_r50_fpn_20e_coco.py'`
							`model = dict(pretrained='torchvision://resnet101', backbone=dict(depth=101))`