OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io/
Pyramid vision transformer: A versatile backbone for dense prediction without convolutions
Introduction
@article{wang2021pyramid,
title={Pyramid vision transformer: A versatile backbone for dense prediction without convolutions},
author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
journal={arXiv preprint arXiv:2102.12122},
year={2021}
}
@article{wang2021pvtv2,
title={PVTv2: Improved Baselines with Pyramid Vision Transformer},
author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
journal={arXiv preprint arXiv:2106.13797},
year={2021}
}
Results and Models
RetinaNet (PVTv1)
Backbone | Lr schd | Mem (GB) | box AP | Config | Download |
---|---|---|---|---|---|
PVT-Tiny | 12e | 8.5 | 36.6 | config | model / log |
PVT-Small | 12e | 14.5 | 40.4 | config | model / log |
PVT-Medium | 12e | 20.9 | 41.7 | config | model / log |
RetinaNet (PVTv2)
Backbone | Lr schd | Mem (GB) | box AP | Config | Download |
---|---|---|---|---|---|
PVTv2-B0 | 12e | 7.4 | 37.1 | config | model / log |
PVTv2-B1 | 12e | 9.5 | 41.2 | config | model / log |
PVTv2-B2 | 12e | 16.2 | 44.6 | config | model / log |
PVTv2-B3 | 12e | 23.0 | 46.0 | config | model / log |
PVTv2-B4 | 12e | 17.0 | 46.3 | config | model / log |
PVTv2-B5 | 12e | 18.7 | 46.1 | config | model / log |
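
The tables above can be read as a memory versus accuracy trade-off. As a rough illustration only, the sketch below copies the reported numbers into a list and picks the highest box AP that fits a given memory budget. The helper name `best_under_budget` and the budget values are hypothetical and not part of MMDetection, and the memory column is taken at face value (actual usage may differ with batch size, image size, and hardware).

```python
# Results copied from the RetinaNet tables above: (backbone, memory in GB, box AP).
RESULTS = [
    ("PVT-Tiny", 8.5, 36.6),
    ("PVT-Small", 14.5, 40.4),
    ("PVT-Medium", 20.9, 41.7),
    ("PVTv2-B0", 7.4, 37.1),
    ("PVTv2-B1", 9.5, 41.2),
    ("PVTv2-B2", 16.2, 44.6),
    ("PVTv2-B3", 23.0, 46.0),
    ("PVTv2-B4", 17.0, 46.3),
    ("PVTv2-B5", 18.7, 46.1),
]


def best_under_budget(results, mem_budget_gb):
    """Hypothetical helper: return the highest-AP entry whose reported
    memory is at most mem_budget_gb, or None if nothing fits."""
    candidates = [r for r in results if r[1] <= mem_budget_gb]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[2])


if __name__ == "__main__":
    # Illustrative budgets in GB; these values are assumptions, not recommendations.
    for budget in (8.0, 12.0, 24.0):
        print(budget, best_under_budget(RESULTS, budget))
```

With the numbers as reported, a larger backbone is not always the better choice: PVTv2-B4 lists both lower memory and higher box AP than PVTv2-B3.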