[upd] add more explanations of pretraining

2 years ago · 93cb3e4a92
parent f6bef8b555
commit 93cb3e4a92
1 changed files with 23 additions and 8 deletions
--- a/pretrain/README.md
+++ b/pretrain/README.md
@ -7,11 +7,12 @@ See [INSTALL.md](https://github.com/keyu-tian/SparK/blob/main/INSTALL.md) to pre

 ## Pre-training on ImageNet-1k from scratch

-Run [main.sh](https://github.com/keyu-tian/SparK/blob/main/pretrain/main.sh).
+For pre-training, run [main.sh](https://github.com/keyu-tian/SparK/blob/main/pretrain/main.sh) with bash.

-It is **required** to specify ImageNet data folder and model name to run pre-training.
-Besides, you can pass arbitrary key-word arguments (like `--ep=400 --bs=2048`) to `main.sh` to specify some pre-training hyperparameters (see [utils/arg_utils.py](https://github.com/keyu-tian/SparK/blob/main/pretrain/utils/arg_utils.py) for all hyperparameters and their default values).
+Note that it is **required** to specify the ImageNet data folder (`--data_path`) and model name (`--model`) to run pre-training.

+For **all** other configurations/hyperparameters, their names and **default values** can be found in [utils/arg_util.py line24-47](https://github.com/keyu-tian/SparK/blob/main/pretrain/utils/arg_util.py#L24).
+If you do not specify them like `--ep=800`, those default configurations would be used.

 Here is an example command pre-training a ResNet50 on single machine with 8 GPUs:
 ```shell script
@ -19,7 +20,7 @@ $ cd /path/to/SparK/pretrain
 $ bash ./main.sh <experiment_name> \
  --num_nodes=1 --ngpu_per_node=8 \
  --data_path=/path/to/imagenet \
-  --model=resnet50 --ep=1600 --bs=4096
+  --model=resnet50
 ```

 For multiple machines, change the `num_nodes` to your count and plus these args:
@ -54,8 +55,22 @@ Add `--resume_from=path/to/<model>still_pretraining.pth` to resume from a saved

 ## Regarding sparse convolution

-For generality, we use the masked convolution implemented in [encoder.py](https://github.com/keyu-tian/SparK/blob/main/pretrain/encoder.py) to simulate submanifold sparse convolution by default.
+We do not use sparse convolutions in this pytorch implementation, due to their limited optimization on modern hardwares.
+As can be found in [encoder.py](https://github.com/keyu-tian/SparK/blob/main/pretrain/encoder.py), we use masked dense convolution to simulate submanifold sparse convolution.
+We also define some sparse pooling or normalization layers in [encoder.py](https://github.com/keyu-tian/SparK/blob/main/pretrain/encoder.py).
+All these "sparse" layers are implemented through pytorch built-in operators.

-**For anyone who might want to run SparK on another architectures**:
-we recommend you to use the default masked convolution, 
-considering the limited optimization of sparse convolution on hardwares, and in particular the lack of efficient implementation of many modern operators like grouped conv and dilated conv.
+
+## Some details: how we mask images and how to set the patch size
+
+In SparK, the mask patch size **equals to** the downsample ratio of the CNN model (so there is no configuration like `--patch_size=32`).
+
+Here is the reason: when we do mask, we:
+
+1. first generate the binary mask for the **smallest** resolution feature map, i.e., generate the `_cur_active` or `active_b1ff` in [line86-87](https://github.com/keyu-tian/SparK/blob/main/pretrain/spark.py#L86), which is a `torch.BoolTensor` shaped as `[B, 1, fmap_size, fmap_size]`, and would be used to mask the smallest feature map.
+3. then progressively upsample it (i.e., expand its 2nd and 3rd dimensions by calling `repeat_interleave(..., 2)` and `repeat_interleave(..., 3)` in [line16](https://github.com/keyu-tian/SparK/blob/main/pretrain/encoder.py#L16)), to mask those feature maps ([`x` in line21](https://github.com/keyu-tian/SparK/blob/main/pretrain/encoder.py#L21)) with larger resolutions .
+
+So if you want a patch size of 16 or 8, you should actually define a new CNN model with a downsample ratio of 16 or 8.
+Note that the `forward` function of this CNN should have an arg named `hierarchy`. You can look at https://github.com/keyu-tian/SparK/blob/main/pretrain/models/convnext.py#L78 to see what `hierarchy` means and how to handle it.
+
+After that, you can simply run `main.sh` with `--hierarchy=3` and see if it works.