<summary>catalog</summary>

- [x] Pretraining code
- [x] Pretraining tutorial for customized CNN model ([Tutorial for pretraining your own CNN model](https://github.com/keyu-tian/SparK/tree/main/pretrain/#tutorial-for-pretraining-your-own-cnn-model))
- [x] Pretraining tutorial for customized dataset ([Tutorial for pretraining your own dataset](https://github.com/keyu-tian/SparK/tree/main/pretrain/#tutorial-for-pretraining-your-own-dataset))
See [/INSTALL.md](/INSTALL.md) to prepare `pip` dependencies and the ImageNet dataset.

**Note: for network definitions, we directly use `timm.models.ResNet` and [official ConvNeXt](https://github.com/facebookresearch/ConvNeXt/blob/048efcea897d999aed302f2639b6270aedf8d4c8/models/convnext.py).**

## Tutorial for pretraining your own CNN model
See [/pretrain/models/custom.py](/pretrain/models/custom.py). The things you need to do are:
- define `your_convnet(...)` with `@register_model` in [/pretrain/models/custom.py line54](/pretrain/models/custom.py#L53-L54).
- add default kwargs of `your_convnet(...)` in [/pretrain/models/\_\_init\_\_.py line34](/pretrain/models/__init__.py#L34).
Then you can use `--model=your_convnet` in the pretraining script.
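For orientation, here is a minimal sketch of what such a registration can look like. The class `YourConvNet` and its body are illustrative placeholders only; the interface SparK actually expects from the backbone is documented in [/pretrain/models/custom.py](/pretrain/models/custom.py) itself:

```python
# Illustrative sketch, not the actual content of /pretrain/models/custom.py.
import torch.nn as nn
from timm.models.registry import register_model


class YourConvNet(nn.Module):
    """A placeholder hierarchical CNN backbone (implement custom.py's required interface here)."""
    def __init__(self, width=64):
        super().__init__()
        self.stem = nn.Conv2d(3, width, kernel_size=4, stride=4)

    def forward(self, x):
        return self.stem(x)


@register_model
def your_convnet(pretrained=False, **kwargs):
    # registering under this name is what makes `--model=your_convnet` resolvable
    return YourConvNet(**kwargs)
```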
## Tutorial for pretraining your own dataset
Replace the function `build_dataset_to_pretrain` in [line54-75 of /pretrain/utils/imagenet.py](/pretrain/utils/imagenet.py#L54-L75) with your own.
This function should return a `Dataset` object. You may use args like `args.data_path` and `args.input_size` to help build your dataset, and when running an experiment with `main.sh` you can pass `--data_path=... --input_size=...` to specify them.
Note the batch size `--bs` is the total batch size across all GPUs, which may also need to be tuned.
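A minimal sketch of such a replacement, assuming an `ImageFolder`-style directory layout (the transform choices are illustrative; check the real function's signature at the linked lines before copying this):

```python
# Illustrative replacement for build_dataset_to_pretrain; the actual signature
# lives at /pretrain/utils/imagenet.py#L54-L75.
from torchvision import transforms
from torchvision.datasets import ImageFolder


def build_dataset_to_pretrain(dataset_path, input_size):
    trans = transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    # any torch.utils.data.Dataset works; labels are ignored during pretraining
    return ImageFolder(dataset_path, transform=trans)
```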
## Debug on 1 GPU (without DistributedDataParallel)
Use a small batch size `--bs=32` to avoid OOM.
## Pretraining Any Model on ImageNet-1k (224x224)

For pretraining, run [/pretrain/main.sh](/pretrain/main.sh) with bash.
It is **required** to specify the ImageNet data folder (`--data_path`), the model name (`--model`), and your experiment name (the first argument of `main.sh`) when running the script.
We use the **same** pretraining configurations (lr, batch size, etc.) for all models (ResNets and ConvNeXts).
Their names and **default values** can be found in [/pretrain/utils/arg_util.py line23-44](/pretrain/utils/arg_util.py#L23-L44).
These default configurations (like batch size 4096) are used unless you override them, e.g. with `--bs=512`.
**Note: the batch size `--bs` is the total batch size across all GPUs, and the learning rate `--lr` is the base learning rate. The actual learning rate is `lr * bs / 256`, as in [/pretrain/utils/arg_util.py line131](/pretrain/utils/arg_util.py#L131). For example, `--bs=2048 --lr=4e-4` gives an actual learning rate of 4e-4 * 2048 / 256 = 3.2e-3.**
Here is an example command for pretraining a ResNet50 on a single machine with 8 GPUs (we use DistributedDataParallel):
```shell script
$ cd /path/to/SparK/pretrain
$ bash ./main.sh <experiment_name> \
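  --num_nodes=1 \
  --data_path=/path/to/imagenet \
  --model=resnet50 --bs=512
  # ^ these continuation lines are a sketch: --data_path and --model are the
  #   required arguments described above; --bs and --num_nodes are overrides
```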
For multiple machines, change the `--num_nodes` to your count, and add the extra distributed-launch arguments the script requires.
Note the `<experiment_name>` is the name of your experiment; it will be used to create an output directory named `output_<experiment_name>`.
## Pretraining ConvNeXt-Large on ImageNet-1k (384x384)
For pretraining with resolution 384, we use a larger mask ratio (0.75), a smaller batch size (2048), and a larger learning rate (4e-4).
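As a sketch, the command mirrors the 224 example above with these values passed explicitly. The `--mask` flag name and the `convnext_large` model name are assumptions here; check [/pretrain/utils/arg_util.py line23-44](/pretrain/utils/arg_util.py#L23-L44) and the model registry for the exact names:

```shell script
$ cd /path/to/SparK/pretrain
$ bash ./main.sh <experiment_name> \
  --data_path=/path/to/imagenet \
  --model=convnext_large --input_size=384 \
  --mask=0.75 --bs=2048 --lr=4e-4
  # ^ --mask and the model name are assumed; the other flags appear earlier in this README
```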
Once an experiment starts running, the following files will be automatically created and updated in `output_<experiment_name>`:

- `<model>_still_pretraining.pth`: saves model and optimizer states, current epoch, current reconstruction loss, etc.; can be used to resume pretraining
- `<model>_1kpretrained.pth`: can be used for downstream finetuning
- `pretrain_log.txt`: records some important information such as:
    - `git_commit_id`: git version
    - `cmd`: all arguments passed to the script

    It also reports the loss and remaining pretraining time at each epoch.
- `stdout_backup.txt` and `stderr_backup.txt`: save all the output printed to stdout/stderr
3. then progressively upsample it (i.e., expand its 2nd and 3rd dimensions by calling `repeat_interleave(..., 2)` and `repeat_interleave(..., 3)` in [/pretrain/encoder.py line16](/pretrain/encoder.py#L16)) to mask the feature maps with larger resolutions ([`x` in line21](/pretrain/encoder.py#L21)).
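To make this step concrete, here is a small self-contained sketch of that upsampling (the shapes and mask ratio are illustrative, not the repo's exact code):

```python
import torch

B, h, w = 2, 7, 7                      # mask at the smallest feature-map resolution
active = torch.rand(B, 1, h, w) > 0.6  # True = visible patch, False = masked

# expand the 2nd and 3rd dimensions by 2x, as in /pretrain/encoder.py line16
active_2x = active.repeat_interleave(2, dim=2).repeat_interleave(2, dim=3)  # [B, 1, 14, 14]

x = torch.randn(B, 64, 14, 14)         # a feature map with larger resolution
x = x * active_2x                      # zero out the masked positions
```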
So if you want a patch size of 16 or 8, you should actually define a new CNN model with a downsample ratio of 16 or 8.
See [Tutorial for pretraining your own CNN model (above)](https://github.com/keyu-tian/SparK/tree/main/pretrain/#tutorial-for-pretraining-your-own-cnn-model).