
Pyramid vision transformer: A versatile backbone for dense prediction without convolutions

Introduction

@article{wang2021pyramid,
  title={Pyramid vision transformer: A versatile backbone for dense prediction without convolutions},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
  journal={arXiv preprint arXiv:2102.12122},
  year={2021}
}
@article{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
  journal={arXiv preprint arXiv:2106.13797},
  year={2021}
}

Results and Models

RetinaNet (PVTv1)

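| Backbone   | Lr schd | Mem (GB) | box AP | Config | Download     |
|------------|---------|----------|--------|--------|--------------|
| PVT-Tiny   | 12e     | 8.5      | 36.6   | config | model \| log |
| PVT-Small  | 12e     | 14.5     | 40.4   | config | model \| log |
| PVT-Medium | 12e     | 20.9     | 41.7   | config | model \| log |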

RetinaNet (PVTv2)

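| Backbone | Lr schd | Mem (GB) | box AP | Config | Download     |
|----------|---------|----------|--------|--------|--------------|
| PVTv2-B0 | 12e     | 7.4      | 37.1   | config | model \| log |
| PVTv2-B1 | 12e     | 9.5      | 41.2   | config | model \| log |
| PVTv2-B2 | 12e     | 16.2     | 44.6   | config | model \| log |
| PVTv2-B3 | 12e     | 23.0     | 46.0   | config | model \| log |
| PVTv2-B4 | 12e     | 17.0     | 46.3   | config | model \| log |
| PVTv2-B5 | 12e     | 18.7     | 46.1   | config | model \| log |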
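
One way to read the RetinaNet (PVTv2) table above is to pick the backbone with the highest box AP that fits a given training-memory budget. The sketch below is illustrative only and is not part of MMDetection: the helper name `best_backbone` and the budget values are hypothetical, and it assumes the reported memory figures are comparable across rows, which the table does not confirm.

```python
# Rows copied from the RetinaNet (PVTv2) table: (backbone, memory in GB, box AP).
PVTV2_RESULTS = [
    ("PVTv2-B0", 7.4, 37.1),
    ("PVTv2-B1", 9.5, 41.2),
    ("PVTv2-B2", 16.2, 44.6),
    ("PVTv2-B3", 23.0, 46.0),
    ("PVTv2-B4", 17.0, 46.3),
    ("PVTv2-B5", 18.7, 46.1),
]


def best_backbone(results, mem_budget_gb):
    """Hypothetical helper: return the row with the highest box AP whose
    reported memory is within the budget, or None if no row fits."""
    candidates = [row for row in results if row[1] <= mem_budget_gb]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[2])


if __name__ == "__main__":
    for budget in (8.0, 16.5, 24.0):
        print(budget, best_backbone(PVTV2_RESULTS, budget))
```

Because PVTv2-B4 and PVTv2-B5 report less memory than PVTv2-B3, a simple "largest model that fits" rule would not give the same answer as this AP-based selection.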