Preparation for ImageNet-1k pretraining
See /INSTALL.md to prepare pip dependencies and the ImageNet dataset.
Note: for network definitions, we directly use timm.models.ResNet and the official ConvNeXt implementation.
Tutorial for pretraining your own CNN model
See /pretrain/models/custom.py. The steps are:
- implement the member function get_downsample_ratio in /pretrain/models/custom.py line 20;
- implement the member function get_feature_map_channels in /pretrain/models/custom.py line 29;
- implement the member function forward in /pretrain/models/custom.py line 38;
- define your_convnet(...) with @register_model in /pretrain/models/custom.py line 54;
- add the default kwargs of your_convnet(...) in /pretrain/models/__init__.py line 34.
Then you can use --model=your_convnet in the pretraining script.
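As a rough illustration of the three member functions listed above, here is a minimal, hypothetical four-stage CNN. The names TinyConvNet and tiny_convnet and the channel widths are made up for this sketch. It assumes that forward takes a hierarchical flag and returns the list of per-stage feature maps (largest to smallest) when it is True and classification logits otherwise; treat the template in /pretrain/models/custom.py as authoritative for the exact signatures. The @register_model decorator is left out so the sketch runs without timm.

```python
from typing import List

import torch
import torch.nn as nn


class TinyConvNet(nn.Module):
    """Hypothetical 4-stage CNN whose total downsample ratio is 32 (like ResNet-50)."""

    def __init__(self, num_classes: int = 1000, widths=(32, 64, 128, 256)):
        super().__init__()
        self.widths = list(widths)
        # Stem: a 4x4 conv with stride 4 downsamples by 4.
        self.stem = nn.Sequential(
            nn.Conv2d(3, widths[0], kernel_size=4, stride=4),
            nn.BatchNorm2d(widths[0]),
            nn.ReLU(inplace=True),
        )
        # Stage 0 keeps the resolution; stages 1-3 each halve it (4 * 2 * 2 * 2 = 32).
        stages, in_c = [], widths[0]
        for i, out_c in enumerate(widths):
            stages.append(nn.Sequential(
                nn.Conv2d(in_c, out_c, kernel_size=3, stride=1 if i == 0 else 2, padding=1),
                nn.BatchNorm2d(out_c),
                nn.ReLU(inplace=True),
            ))
            in_c = out_c
        self.stages = nn.ModuleList(stages)
        self.head = nn.Linear(widths[-1], num_classes)

    def get_downsample_ratio(self) -> int:
        # Total stride from the input image to the smallest feature map.
        return 32

    def get_feature_map_channels(self) -> List[int]:
        # Channel count of each feature map returned by forward(..., hierarchical=True).
        return list(self.widths)

    def forward(self, inp_bchw: torch.Tensor, hierarchical: bool = False):
        x = self.stem(inp_bchw)
        feature_maps = []
        for stage in self.stages:
            x = stage(x)
            feature_maps.append(x)
        if hierarchical:
            # Largest to smallest resolution: 56, 28, 14, 7 for a 224x224 input.
            return feature_maps
        # Global average pooling followed by a linear classifier.
        return self.head(x.mean(dim=(2, 3)))


def tiny_convnet(**kwargs) -> TinyConvNet:
    # In the repo, a factory like this would carry @register_model (omitted here).
    return TinyConvNet(**kwargs)


model = tiny_convnet()
feats = model(torch.randn(2, 3, 224, 224), hierarchical=True)
print([tuple(f.shape) for f in feats])
# [(2, 32, 56, 56), (2, 64, 28, 28), (2, 128, 14, 14), (2, 256, 7, 7)]
```

The smallest feature map is 224 / 32 = 7 pixels wide, which is what ties get_downsample_ratio to the mask patch size discussed later in this README.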
Tutorial for pretraining your own dataset
Replace the function build_dataset_to_pretrain in lines 54-75 of /pretrain/utils/imagenet.py with your own.
This function should return a Dataset object. You may use args such as args.data_path and args.input_size to help build your dataset; when running the experiment, specify them with --data_path=... --input_size=....
Note that the batch size --bs is the total batch size across all GPUs and may also need to be tuned.
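A minimal sketch of a replacement build_dataset_to_pretrain, assuming it receives the data path and input size (mirroring args.data_path and args.input_size) and that the training loop accepts (image, label) pairs, as torchvision's ImageFolder yields. FlatImageDataset, the extension list, and the dummy label 0 are hypothetical choices for this sketch, and a plain resize stands in for whatever augmentation the original function applies; for real pretraining you would probably keep those augmentations.

```python
import os
import tempfile
from typing import List, Tuple

import torch
from PIL import Image
from torch.utils.data import Dataset

IMG_EXTS = (".jpg", ".jpeg", ".png", ".bmp", ".webp")


class FlatImageDataset(Dataset):
    """Hypothetical unlabeled dataset: every image file found under `root`, recursively."""

    def __init__(self, root: str, input_size: int):
        self.input_size = input_size
        self.paths: List[str] = sorted(
            os.path.join(d, f)
            for d, _, files in os.walk(root)
            for f in files
            if f.lower().endswith(IMG_EXTS)
        )

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, i: int) -> Tuple[torch.Tensor, int]:
        with Image.open(self.paths[i]) as im:
            img = im.convert("RGB").resize((self.input_size, self.input_size))
        # HWC uint8 bytes -> CHW float tensor in [0, 1] (no mean/std normalization here).
        x = torch.frombuffer(bytearray(img.tobytes()), dtype=torch.uint8)
        x = x.view(self.input_size, self.input_size, 3).permute(2, 0, 1).float() / 255
        return x, 0  # dummy label: pretraining does not use labels


def build_dataset_to_pretrain(dataset_path: str, input_size: int) -> Dataset:
    # Assumed signature; adapt it to the one in /pretrain/utils/imagenet.py.
    return FlatImageDataset(dataset_path, input_size)


# Usage: two synthetic images in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for k in range(2):
        Image.new("RGB", (64, 48), (255, 0, 0)).save(os.path.join(tmp, f"{k}.png"))
    ds = build_dataset_to_pretrain(tmp, input_size=32)
    n = len(ds)
    sample, label = ds[0]
print(n, tuple(sample.shape), label)  # 2 (3, 32, 32) 0
```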
Debug on 1 GPU (without DistributedDataParallel)
Use a small batch size (--bs=32) to avoid OOM.
python3 main.py --exp_name=debug --data_path=/path/to/imagenet --model=resnet50 --bs=32
Pretraining Any Model on ImageNet-1k (224x224)
For pretraining, run /pretrain/main.py with torchrun.
You must specify the ImageNet data folder (--data_path), your experiment name and log dir (--exp_name and --exp_dir, created automatically if they do not exist), and the model name (--model; valid choices are the keys of 'pretrain_default_model_kwargs' in /pretrain/models/__init__.py line 34).
We use the same pretraining configurations (lr, batch size, etc.) for all models (ResNets and ConvNeXts) in 224 pretraining.
Their names and default values are in /pretrain/utils/arg_util.py lines 23-44.
These defaults (e.g., batch size 4096) are used unless you override them, e.g., with --bs=512.
Note: the batch size --bs is the total batch size across all GPUs, and --base_lr is the base learning rate. The actual lr is base_lr * bs / 256, as in /pretrain/utils/arg_util.py line 131. So do not use --lr to specify the learning rate (it will be ignored).
Here is an example of pretraining a ResNet50 on a single 8-GPU machine (we use DistributedDataParallel), overriding the default batch size with 512:
$ cd /path/to/SparK/pretrain
$ torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=localhost --master_port=<some_port> main.py \
--data_path=/path/to/imagenet --exp_name=<your_exp_name> --exp_dir=/path/to/logdir \
--model=resnet50 --bs=512
For multiple machines, change --nnodes and --master_addr to your configuration, and give each machine its own --node_rank (starting from 0). E.g.:
$ torchrun --nproc_per_node=8 --nnodes=<your_nnodes> --node_rank=<rank_starts_from_0> --master_addr=<some_address> --master_port=<some_port> main.py \
...
Pretraining ConvNeXt-Large on ImageNet-1k (384x384)
For 384 pretraining we use a larger mask ratio (0.75), half the batch size (2048), and twice the base learning rate (4e-4):
$ cd /path/to/SparK/pretrain
$ torchrun --nproc_per_node=8 --nnodes=<your_nnodes> --node_rank=<rank_starts_from_0> --master_addr=<some_address> --master_port=<some_port> main.py \
--data_path=/path/to/imagenet --exp_name=<your_exp_name> --exp_dir=/path/to/logdir \
--model=convnext_large --input_size=384 --mask=0.75 --bs=2048 --base_lr=4e-4
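To make the scaling rule concrete (actual lr = base_lr * bs / 256, where --bs is the total over all GPUs), the arithmetic for the configurations in this README is sketched below. The 224 default base_lr of 2e-4 is only inferred from "twice the base learning rate (4e-4)" above; confirm it in /pretrain/utils/arg_util.py. per_gpu_bs assumes the total batch is split evenly across processes; how main.py handles a --bs that does not divide evenly is not covered here, so the ValueError is just this sketch's choice.

```python
def actual_lr(base_lr: float, total_bs: int) -> float:
    """Linear scaling rule from this README: lr = base_lr * bs / 256."""
    return base_lr * total_bs / 256


def per_gpu_bs(total_bs: int, nproc_per_node: int, nnodes: int = 1) -> int:
    """Share of the total batch (--bs) handled by each process, assuming an even split."""
    world_size = nproc_per_node * nnodes
    if total_bs % world_size:
        raise ValueError(f"--bs={total_bs} is not divisible by {world_size} processes")
    return total_bs // world_size


# 224 default: bs 4096 with the inferred base_lr 2e-4.
print(actual_lr(2e-4, 4096))  # ~0.0032
# The single-machine ResNet50 example: --bs=512 on 8 GPUs.
print(actual_lr(2e-4, 512), per_gpu_bs(512, nproc_per_node=8))  # ~0.0004 64
# 384 ConvNeXt-Large: bs 2048 with base_lr 4e-4.
print(actual_lr(4e-4, 2048))  # ~0.0032
```

Halving the batch size while doubling the base lr means the 384 run ends up with the same actual lr as the 224 default.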
Logging
See the files under --exp_dir to track your experiment:
- <model>_still_pretraining.pth: saves model and optimizer states, current epoch, current reconstruction loss, etc.; can be used to resume pretraining
- <model>_1kpretrained.pth: can be used for downstream finetuning
- pretrain_log.txt: records some important information such as:
  - git_commit_id: git version
  - cmd: all arguments passed to the script
  It also reports the loss and remaining pretraining time at each epoch.
- tensorboard_log/: saves many tensorboard logs; you can visualize loss values, learning rates, gradient norms and more via tensorboard --logdir /path/to/this/tensorboard_log/ --port 23333
- stdout_backup.txt and stderr_backup.txt: save all output to stdout/stderr
Resuming
Add the arg --resume_from=path/to/<model>_still_pretraining.pth to resume pretraining.
Regarding sparse convolution
We do not use sparse convolutions in this PyTorch implementation, because they are poorly optimized on modern hardware. As can be found in /pretrain/encoder.py, we use masked dense convolution to simulate submanifold sparse convolution. We also define some sparse pooling and normalization layers in /pretrain/encoder.py. All these "sparse" layers are implemented with PyTorch built-in operators.
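The masked-dense-convolution idea can be sketched in a few lines: run an ordinary dense convolution, then zero every output position that belongs to a masked patch, so masked regions never pass features on to the next layer. Provided the input is already zeroed at masked positions, the visible outputs only see visible inputs (masked neighbours contribute zeros), which is the submanifold behaviour being simulated. This is a minimal illustration, not the code in /pretrain/encoder.py: MaskedConv2d and upsample_mask are hypothetical names, and the low-resolution mask is passed explicitly as an argument to keep the example self-contained.

```python
import torch
import torch.nn as nn


def upsample_mask(active_b1hw: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Expand a [B, 1, fh, fw] bool mask to [B, 1, h, w] by repeating each cell."""
    rh, rw = h // active_b1hw.shape[2], w // active_b1hw.shape[3]
    return active_b1hw.repeat_interleave(rh, dim=2).repeat_interleave(rw, dim=3)


class MaskedConv2d(nn.Conv2d):
    """Dense conv whose output is zeroed at masked (inactive) positions."""

    def forward(self, x: torch.Tensor, active_b1hw: torch.Tensor) -> torch.Tensor:
        y = super().forward(x)
        # Broadcasting the [B, 1, H, W] mask over [B, C, H, W] keeps only active positions.
        return y * upsample_mask(active_b1hw, y.shape[2], y.shape[3])


torch.manual_seed(0)
active = torch.tensor([[[[True, False], [False, True]]]])  # [1, 1, 2, 2] mask at the coarsest scale
x = torch.randn(1, 3, 8, 8) * upsample_mask(active, 8, 8)  # input zeroed at masked patches
conv = MaskedConv2d(3, 4, kernel_size=3, padding=1)
y = conv(x, active)
print(y[0, :, :4, 4:].abs().sum().item())  # top-right 4x4 patch is masked -> 0.0
```

Because everything is dense tensor math, this runs on standard GPU kernels; the cost is computing convolutions at masked positions that are then thrown away.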
Some details: how we mask images and how to set the patch size
In SparK, the mask patch size equals the downsample ratio of the CNN model (so there is no configuration like --patch_size=32).
Here is why. When masking, we:
- first generate the binary mask for the smallest-resolution feature map, i.e., _cur_active or active_b1ff in /pretrain/spark.py lines 86-87, a torch.BoolTensor shaped [B, 1, fmap_h, fmap_w] that is used to mask the smallest feature map;
- then progressively upsample it (i.e., expand its height and width dimensions, dims 2 and 3, by calling repeat_interleave(..., 2) and repeat_interleave(..., 3) in /pretrain/encoder.py line 16) to mask the feature maps with larger resolutions (x in line 21).
So if you want a patch size of 16 or 8, you should define a new CNN model with a downsample ratio of 16 or 8. See "Tutorial for pretraining your own CNN model" above.
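The two masking steps above can be sketched as follows for a 224x224 input and a CNN with downsample ratio 32, so the smallest feature map is 7x7. The function names are hypothetical, and keeping a fixed number of random positions per image, round(h * w * (1 - mask_ratio)), is an assumption of this sketch; the actual logic lives in /pretrain/spark.py and /pretrain/encoder.py. The 0.75 mask ratio is the 384 setting from this README, reused here only as an example value. Upsampling by the full downsample ratio shows why each mask cell covers a 32x32 pixel patch of the input, i.e., why the patch size equals the downsample ratio.

```python
from typing import Optional

import torch


def generate_mask(batch: int, fmap_h: int, fmap_w: int, mask_ratio: float,
                  generator: Optional[torch.Generator] = None) -> torch.Tensor:
    """Random bool mask [B, 1, fmap_h, fmap_w]; True marks a visible (active) position."""
    num_pos = fmap_h * fmap_w
    num_keep = round(num_pos * (1 - mask_ratio))
    # A random permutation per sample; its first num_keep indices stay visible.
    keep_idx = torch.rand(batch, num_pos, generator=generator).argsort(dim=1)[:, :num_keep]
    active = torch.zeros(batch, num_pos, dtype=torch.bool)
    active.scatter_(dim=1, index=keep_idx, value=True)
    return active.view(batch, 1, fmap_h, fmap_w)


def upsample(active_b1hw: torch.Tensor, factor: int) -> torch.Tensor:
    """Repeat every cell `factor` times along H (dim 2) and W (dim 3)."""
    return active_b1hw.repeat_interleave(factor, dim=2).repeat_interleave(factor, dim=3)


input_size, downsample_ratio = 224, 32
fmap = input_size // downsample_ratio  # 7
g = torch.Generator().manual_seed(0)
active = generate_mask(2, fmap, fmap, mask_ratio=0.75, generator=g)
print(active.shape, active.sum(dim=(1, 2, 3)).tolist())  # torch.Size([2, 1, 7, 7]) [12, 12]

# Masks for the larger feature maps (14, 28, 56) and for the input image itself (224).
for factor in (2, 4, 8, downsample_ratio):
    print(tuple(upsample(active, factor).shape))
```

With a downsample ratio of 16 instead, the same 224 input gives a 14x14 mask and each cell covers a 16x16 patch, which is the sense in which changing the CNN changes the patch size.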