## About code isolation
This `downstream_imagenet` directory is isolated from the pre-training code; you can treat it as an independent codebase 🛠.
## Preparation for ImageNet-1k fine-tuning
See [INSTALL.md](https://github.com/keyu-tian/SparK/blob/main/INSTALL.md) to prepare `pip` dependencies and the ImageNet dataset.
**Note: for network definitions, we directly use `timm.models.ResNet` and [official ConvNeXt](https://github.com/facebookresearch/ConvNeXt/blob/048efcea897d999aed302f2639b6270aedf8d4c8/models/convnext.py).**
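For reference, the data folder is assumed to follow the standard ImageNet-1k layout (class-named subfolders under `train/` and `val/`); this is an assumption based on common `timm`-style pipelines, so follow INSTALL.md if your setup differs:
```shell script
$ ls /path/to/imagenet
train  val
$ ls /path/to/imagenet/train | head -3
n01440764
n01443537
n01484850
```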
## Fine-tuning on ImageNet-1k from pre-trained weights
Run [/downstream_imagenet/main.py](/downstream_imagenet/main.py) via `torchrun`.
**You are required to specify:** the ImageNet data folder (`--data_path`); your experiment name and log directory (`--exp_name` and `--exp_dir`, created automatically if they do not exist); the model name (`--model`; for valid choices see the keys of `HP_DEFAULT_VALUES` in [/downstream_imagenet/arg.py line 14](/downstream_imagenet/arg.py#L14)); and the pre-trained weight file (`--resume_from`).
All other configurations have default values, listed in [/downstream_imagenet/arg.py#L13](/downstream_imagenet/arg.py#L13).
You can override any of them on the command line, e.g. `--bs=1024` (a worked override example is given below).
Here is an example of fine-tuning a ConvNeXt-Small on a single machine with 8 GPUs:
```shell script
$ cd /path/to/SparK/downstream_imagenet
$ torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=localhost --master_port=<some_port> main.py \
--data_path=/path/to/imagenet --exp_name=<your_exp_name> --exp_dir=/path/to/logdir \
--model=convnext_small --resume_from=/some/path/to/convnextS_1kpretrained_official_style.pth
```
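To override a default, append the flag to the same command. For example, using the `--bs` override mentioned above (all other flags unchanged):
```shell script
$ torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=localhost --master_port=<some_port> main.py \
--data_path=/path/to/imagenet --exp_name=<your_exp_name> --exp_dir=/path/to/logdir \
--model=convnext_small --resume_from=/some/path/to/convnextS_1kpretrained_official_style.pth \
--bs=1024
```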
For multiple machines, change `--nnodes`, `--node_rank`, and `--master_addr` to match your cluster configuration, e.g.:
```shell script
$ torchrun --nproc_per_node=8 --nnodes=<your_nnodes> --node_rank=<rank_starts_from_0> --master_addr=<some_address> --master_port=<some_port> main.py \
...
```
## Logging
See the files under `--exp_dir` to track your experiment:
- `<model>_1kfinetuned_last.pth`: the latest model weights
- `<model>_1kfinetuned_best.pth`: the model weights with the highest evaluation accuracy
- `<model>_1kfinetuned_best_ema.pth`: the EMA model weights with the highest evaluation accuracy
- `finetune_log.txt`: records important information such as:
  - `git_commit_id`: the git version
  - `cmd`: all arguments passed to the script
  - the training loss/accuracy, the best evaluation accuracy, and the remaining time, reported at each epoch
- `tensorboard_log/`: tensorboard logs; you can visualize accuracies, loss values, learning rates, gradient norms, and more via `tensorboard --logdir /path/to/this/tensorboard_log/ --port 23333`.
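As a quick sanity check, here is a hypothetical listing of `--exp_dir` after fine-tuning with `--model=convnext_small` (file names follow the patterns above):
```shell script
$ ls /path/to/logdir
convnext_small_1kfinetuned_best.pth
convnext_small_1kfinetuned_best_ema.pth
convnext_small_1kfinetuned_last.pth
finetune_log.txt
tensorboard_log
```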
## Resuming
If fine-tuning is interrupted, pass `--resume_from` again with the latest checkpoint, e.g. `--resume_from=path/to/<model>_1kfinetuned_last.pth`, to continue training.
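For example, a full resume command (a sketch reusing the placeholders from the single-machine example above) could look like:
```shell script
$ torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=localhost --master_port=<some_port> main.py \
--data_path=/path/to/imagenet --exp_name=<your_exp_name> --exp_dir=/path/to/logdir \
--model=convnext_small --resume_from=/path/to/logdir/convnext_small_1kfinetuned_last.pth
```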