## Preparation for ImageNet-1k fine-tuning
See [INSTALL.md](https://github.com/keyu-tian/SparK/blob/main/INSTALL.md) to prepare the dependencies and the ImageNet dataset.
**Note: for network definitions, we directly use `timm.models.ResNet` and [official ConvNeXt](https://github.com/facebookresearch/ConvNeXt/blob/048efcea897d999aed302f2639b6270aedf8d4c8/models/convnext.py).**
## Pre-training on ImageNet-1k from scratch
Run [main.sh](https://github.com/keyu-tian/SparK/blob/main/main.sh).
It is **required** to specify the ImageNet data folder and the model name to run pre-training.
Besides, you can pass arbitrary keyword arguments (like `--ep=400 --bs=2048`) to `main.sh` to specify pre-training hyperparameters (see [utils/arg_utils.py](https://github.com/keyu-tian/SparK/blob/main/utils/arg_utils.py) for all hyperparameters and their default values).
Here is an example command for pre-training a ResNet-50 on a single machine with 8 GPUs:
```shell script
$ cd /path/to/SparK
$ bash ./main.sh <experiment_name> \
  --num_nodes=1 --ngpu_per_node=8 \
  --data_path=/path/to/imagenet \
  --model=resnet50 --ep=1600 --bs=4096
```
For multiple machines, change `--num_nodes` to the number of machines you use and add these arguments:
```shell script
--node_rank=<rank_starts_from_0> --master_address=<some_address> --master_port=<some_port>
```
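For example, launching on 2 machines might look like the sketch below (the address, port, and paths are placeholders, not values prescribed by SparK):
```shell script
$ # run on machine 0
$ bash ./main.sh <experiment_name> \
  --num_nodes=2 --ngpu_per_node=8 \
  --node_rank=0 --master_address=192.168.0.1 --master_port=30000 \
  --data_path=/path/to/imagenet \
  --model=resnet50 --ep=1600 --bs=4096

$ # run on machine 1 (same command, only the rank changes)
$ bash ./main.sh <experiment_name> \
  --num_nodes=2 --ngpu_per_node=8 \
  --node_rank=1 --master_address=192.168.0.1 --master_port=30000 \
  --data_path=/path/to/imagenet \
  --model=resnet50 --ep=1600 --bs=4096
```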
Note that the first argument `<experiment_name>` is the name of your experiment; it will be used to create an output directory named `output_<experiment_name>`.
## Logging
Once an experiment starts running, the following files will be automatically created and updated in `output_<experiment_name>`:
- `<model>_still_pretraining.pth`: saves the model and optimizer states, the current epoch, the current reconstruction loss, etc.; it can be used to resume pre-training
- `<model>_1kpretrained.pth`: can be used for downstream fine-tuning
- `pretrain_log.txt`: records some important information such as:
    - `git_commit_id`: git version
    - `cmd`: all arguments passed to the script

    It also reports the loss and the remaining pre-training time at each epoch.
- `stdout_backup.txt` and `stderr_backup.txt`: save a copy of everything written to stdout/stderr
These files make it easy to monitor and trace an experiment.
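For instance, after launching the single-machine ResNet-50 run above, the output directory should look roughly like this (the exact file prefix follows the `--model` you passed; the listing is illustrative):
```shell script
$ ls output_<experiment_name>
pretrain_log.txt  resnet50_1kpretrained.pth  resnet50_still_pretraining.pth
stderr_backup.txt  stdout_backup.txt
```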
## Resuming
Add `--resume_from=path/to/<model>_still_pretraining.pth` to resume pre-training from a saved checkpoint.
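As a concrete sketch (reusing the single-machine arguments from the example above; the checkpoint path is whatever your previous run produced):
```shell script
$ bash ./main.sh <experiment_name> \
  --num_nodes=1 --ngpu_per_node=8 \
  --data_path=/path/to/imagenet \
  --model=resnet50 --ep=1600 --bs=4096 \
  --resume_from=output_<experiment_name>/resnet50_still_pretraining.pth
```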
## Regarding sparse convolution
For generality, we use the masked convolution implemented in [encoder.py](https://github.com/keyu-tian/SparK/blob/main/encoder.py) to simulate submanifold sparse convolution by default.
<!--If `--sparse_conv=1` is not specified, this masked convolution would be used in pre-training.-->
**For anyone who might want to run SparK on other architectures**:
we recommend using the default masked convolution,
considering the limited optimization of sparse convolution on hardware, and in particular the lack of efficient implementations of many modern operators like grouped convolution and dilated convolution.