## Preparation for ImageNet-1k pre-training

See /INSTALL.md to prepare pip dependencies and the ImageNet dataset.

Note: for network definitions, we directly use `timm.models.ResNet` and official ConvNeXt.
## Tutorial for customizing your own CNN model

See /pretrain/models/custom.py. The things you need to do are:

- implement the member function `get_downsample_ratio` in /pretrain/models/custom.py;
- implement the member function `get_feature_map_channels` in /pretrain/models/custom.py;
- implement the member function `forward` in /pretrain/models/custom.py;
- define `your_convnet(...)` with `@register_model` in /pretrain/models/custom.py;
- add the default kwargs of `your_convnet(...)` in /pretrain/models/__init__.py.

Then you can use `--model=your_convnet` in the pre-training script. A minimal sketch of these steps is given below.
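To make the five steps concrete, here is a minimal illustrative sketch. The class `YourConvNet` and its stage layout are our assumptions for illustration only; the authoritative template and exact member signatures are in /pretrain/models/custom.py:

```python
import torch
import torch.nn as nn
from timm.models.registry import register_model


class YourConvNet(nn.Module):
    """Hypothetical 4-stage CNN; the three members below are what SparK requires."""

    def __init__(self, channels=(96, 192, 384, 768)):
        super().__init__()
        self.channels = list(channels)
        stages, in_c = [], 3
        for i, c in enumerate(channels):
            stride = 4 if i == 0 else 2  # stem downsamples 4x, later stages 2x
            stages.append(nn.Sequential(
                nn.Conv2d(in_c, c, kernel_size=stride, stride=stride),
                nn.BatchNorm2d(c),
                nn.ReLU(inplace=True),
            ))
            in_c = c
        self.stages = nn.ModuleList(stages)

    def get_downsample_ratio(self) -> int:
        return 32  # total stride from the input image to the smallest feature map

    def get_feature_map_channels(self):
        return self.channels  # channel count of the feature map after each stage

    def forward(self, x: torch.Tensor, hierarchical=False):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        # when hierarchical, return all intermediate feature maps (high to low resolution)
        return feats if hierarchical else x


@register_model
def your_convnet(pretrained=False, **kwargs):
    return YourConvNet(**kwargs)
```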
## Pre-training Any Model on ImageNet-1k (224x224)
For pre-training, run /pretrain/main.sh with bash.
It is required to specify the ImageNet data folder (`--data_path`), the model name (`--model`), and your experiment name (the first argument of main.sh) when running the script.

We use the same pre-training configurations (lr, batch size, etc.) for all models (ResNets and ConvNeXts).
Their names and default values can be found in /pretrain/utils/arg_util.py (lines 24-47).
These defaults (like batch size 4096) are used unless you override them, e.g., with `--bs=512`.
Here is an example command for pre-training a ResNet50 on a single machine with 8 GPUs:

```shell
$ cd /path/to/SparK/pretrain
$ bash ./main.sh <experiment_name> \
  --num_nodes=1 --ngpu_per_node=8 \
  --data_path=/path/to/imagenet \
  --model=resnet50 --bs=512
```
For multiple machines, change `--num_nodes` to your machine count and add these arguments:

`--node_rank=<rank_starts_from_0> --master_address=<some_address> --master_port=<some_port>`

Note that `<experiment_name>` is the name of your experiment; it is used to create an output directory named `output_<experiment_name>`.
## Pre-training ConvNeXt-Large on ImageNet-1k (384x384)
For pre-training with resolution 384, we use a larger mask ratio (0.75), a smaller batch size (2048), and a larger learning rate (4e-4):
```shell
$ cd /path/to/SparK/pretrain
$ bash ./main.sh <experiment_name> \
  --num_nodes=8 --ngpu_per_node=8 --node_rank=... --master_address=... --master_port=... \
  --data_path=/path/to/imagenet \
  --model=convnext_large --input_size=384 --mask=0.75 \
  --bs=2048 --base_lr=4e-4
```
## Logging

Once an experiment starts running, the following files will be automatically created and updated in `output_<experiment_name>`:

- `<model>_still_pretraining.pth`: saves model and optimizer states, the current epoch, the current reconstruction loss, etc.; can be used to resume pre-training
- `<model>_1kpretrained.pth`: can be used for downstream fine-tuning
- `pretrain_log.txt`: records some important information such as `git_commit_id` (the git version) and `cmd` (all arguments passed to the script); it also reports the loss and the remaining pre-training time at each epoch
- `stdout_backup.txt` and `stderr_backup.txt`: save all output to stdout/stderr

These files can help you trace the experiment well.
## Resuming

Add `--resume_from=path/to/<model>_still_pretraining.pth` to resume from a saved checkpoint.
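For example, a hypothetical command that resumes the single-machine ResNet50 run above, with the checkpoint path following the output layout described in Logging:

```shell
$ cd /path/to/SparK/pretrain
$ bash ./main.sh <experiment_name> \
  --num_nodes=1 --ngpu_per_node=8 \
  --data_path=/path/to/imagenet \
  --model=resnet50 --bs=512 \
  --resume_from=output_<experiment_name>/resnet50_still_pretraining.pth
```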
## Regarding sparse convolution

We do not use sparse convolutions in this PyTorch implementation, due to their limited optimization on modern hardware. As can be seen in /pretrain/encoder.py, we use masked dense convolution to simulate submanifold sparse convolution. We also define some sparse pooling and normalization layers in /pretrain/encoder.py. All these "sparse" layers are implemented with PyTorch built-in operators.
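As a simplified illustration of this idea (not the repo's exact code; in /pretrain/encoder.py the active mask is shared across layers and more layer types are patched), a masked dense convolution can be sketched as follows, with `MaskedDenseConv2d` and `active_b1hw` being our hypothetical names:

```python
import torch
import torch.nn as nn


class MaskedDenseConv2d(nn.Conv2d):
    """Simulates a submanifold sparse conv with built-in ops: run a dense
    convolution, then zero every masked output position so that masked
    pixels stay empty as they flow through the network."""

    def forward(self, x: torch.Tensor, active_b1hw: torch.Tensor) -> torch.Tensor:
        x = super().forward(x)              # ordinary dense convolution
        return x * active_b1hw.to(x.dtype)  # broadcast [B, 1, H, W] mask over channels


# usage with hypothetical shapes
conv = MaskedDenseConv2d(3, 16, kernel_size=3, padding=1)
x = torch.randn(2, 3, 28, 28)
active = torch.rand(2, 1, 28, 28) > 0.6  # True = unmasked (active) position
out = conv(x, active)                    # masked positions of `out` are all zero
```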
## Some details: how we mask images and how to set the patch size

In SparK, the mask patch size equals the downsample ratio of the CNN model (so there is no configuration like `--patch_size=32`).
Here is the reason. When we mask an image, we:

- first generate the binary mask for the smallest-resolution feature map, i.e., the `_cur_active` or `active_b1ff` in /pretrain/spark.py (lines 86-87), which is a `torch.BoolTensor` shaped as `[B, 1, fmap_size, fmap_size]` and is used to mask the smallest feature map;
- then progressively upsample it (i.e., expand its 2nd and 3rd dimensions by calling `repeat_interleave(..., 2)` and `repeat_interleave(..., 3)` in /pretrain/encoder.py line 16) to mask the feature maps with larger resolutions (`x` in line 21).

Both steps are sketched below.
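A runnable sketch of both steps under hypothetical sizes (batch size 2, input 224, downsample ratio 32, mask ratio 0.6; the exact mask sampling in /pretrain/spark.py may differ):

```python
import torch

B, input_size, downsample_ratio, mask_ratio = 2, 224, 32, 0.6  # hypothetical values
fmap_size = input_size // downsample_ratio                     # 7: smallest feature map side

# step 1: binary mask for the smallest feature map (True = kept/active patch)
idx = torch.rand(B, fmap_size * fmap_size).argsort(dim=1)
active_b1ff = (idx >= int(mask_ratio * fmap_size * fmap_size))
active_b1ff = active_b1ff.view(B, 1, fmap_size, fmap_size)     # [B, 1, 7, 7]

# step 2: progressively upsample it to mask a higher-resolution feature map,
# e.g., the stride-8 stage whose side is 224 // 8 = 28
r = (input_size // 8) // fmap_size                             # expand ratio: 4
active_b1hw = active_b1ff.repeat_interleave(r, 2).repeat_interleave(r, 3)
print(active_b1hw.shape)                                       # torch.Size([2, 1, 28, 28])
```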
So if you want a patch size of 16 or 8, you should actually define a new CNN model with a downsample ratio of 16 or 8.
See *Tutorial for customizing your own CNN model* above.