About code isolation

The downstream_imagenet directory is isolated from the pre-training code and can be treated as an independent codebase 🛠.

Preparation for ImageNet-1k fine-tuning

See to prepare pip dependencies and the ImageNet dataset.

Note: for network definitions, we directly use timm.models.ResNet and the official ConvNeXt.

Fine-tuning on ImageNet-1k from pre-trained weights

Run /downstream_imagenet/ via torchrun. To run fine-tuning you must specify the ImageNet data folder (--data_path), your experiment name and log directory (--exp_name and --exp_dir, created automatically if they do not exist), the model name (--model; for valid choices, see the keys of 'HP_DEFAULT_VALUES' in /downstream_imagenet/ line 14), and the pre-trained weight file (--resume_from).

All other configurations have default values, listed in /downstream_imagenet/. You can override any default by passing a flag such as --bs=1024.
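As a rough sketch of how the required flags and optional overrides fit together, the hypothetical shell helper below prints the script arguments one per line, with overrides such as --bs=1024 placed after the required flags. The name build_finetune_args, the experiment name and all paths are placeholders for illustration and are not part of the codebase.

```shell
# Hypothetical helper, not part of downstream_imagenet: prints the script
# arguments one per line so they can be inspected before launching.
build_finetune_args() {
  # $1 data folder, $2 experiment name, $3 log dir, $4 model name,
  # $5 pre-trained weight file; any remaining arguments are overrides.
  data_path="$1"; exp_name="$2"; exp_dir="$3"; model="$4"; weights="$5"
  shift 5
  printf '%s\n' \
    "--data_path=$data_path" "--exp_name=$exp_name" "--exp_dir=$exp_dir" \
    "--model=$model" "--resume_from=$weights" "$@"
}

# Example: the required flags plus a batch-size override (placeholder paths).
build_finetune_args /path/to/imagenet my_exp /path/to/logdir \
  convnext_small /some/path/to/weights.pth --bs=1024
```

Keeping the overrides as trailing arguments makes it easy to see at a glance which defaults a given run changed.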

Here is an example that fine-tunes a ConvNeXt-Small on a single machine with 8 GPUs:

$ cd /path/to/SparK/downstream_imagenet
$ torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=localhost --master_port=<some_port> \
  --data_path=/path/to/imagenet --exp_name=<your_exp_name> --exp_dir=/path/to/logdir \
  --model=convnext_small --resume_from=/some/path/to/convnextS_1kpretrained_official_style.pth

For multiple machines, change --nnodes, --node_rank and --master_addr to match your setup, e.g.:

$ torchrun --nproc_per_node=8 --nnodes=<your_nnodes> --node_rank=<rank_starts_from_0> --master_addr=<some_address> --master_port=<some_port> \


See files under --exp_dir to track your experiment:

  • <model>_1kfinetuned_last.pth: the latest model weights

  • <model>_1kfinetuned_best.pth: model weights with the highest accuracy

  • <model>_1kfinetuned_best_ema.pth: EMA weights with the highest accuracy

  • finetune_log.txt: records some important information such as:

    • git_commit_id: git version
    • cmd: all arguments passed to the script

    It also reports the training loss/accuracy, the best evaluation accuracy, and the remaining time at each epoch.

  • tensorboard_log/: TensorBoard logs. You can visualize accuracies, loss values, learning rates, gradient norms and more via tensorboard --logdir /path/to/this/tensorboard_log/ --port 23333.
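To check quickly which of the files listed above a run has produced so far, something like the following sketch may help. The helper name check_exp_dir is hypothetical, and it assumes that <model> in the file names is the value passed to --model and that the files sit directly under --exp_dir.

```shell
# Hypothetical helper: report which of the expected outputs exist.
# Assumes <model> in the file names is the value passed to --model.
check_exp_dir() {
  exp_dir="$1"; model="$2"
  for name in \
    "${model}_1kfinetuned_last.pth" \
    "${model}_1kfinetuned_best.pth" \
    "${model}_1kfinetuned_best_ema.pth" \
    "finetune_log.txt"; do
    if [ -f "$exp_dir/$name" ]; then
      echo "found:   $name"
    else
      echo "missing: $name"
    fi
  done
  # tensorboard_log/ is a directory, so it is tested separately.
  if [ -d "$exp_dir/tensorboard_log" ]; then
    echo "found:   tensorboard_log/"
  else
    echo "missing: tensorboard_log/"
  fi
}

# Placeholder defaults; set EXP_DIR and MODEL to match your own run.
check_exp_dir "${EXP_DIR:-/path/to/logdir}" "${MODEL:-convnext_small}"
```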


To resume an interrupted fine-tuning run, use --resume_from again and point it at the latest checkpoint, like --resume_from=path/to/<model>_1kfinetuned_last.pth.
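One way to script this choice, as a minimal sketch: the hypothetical helper pick_resume_from below returns the latest checkpoint if an earlier run left one, and otherwise falls back to the pre-trained weights. It assumes the checkpoint sits directly under --exp_dir and that <model> in its name is the value passed to --model. All paths are placeholders.

```shell
# Hypothetical helper: choose the value for --resume_from.
# Assumes the latest checkpoint sits directly under --exp_dir and that
# <model> in its name is the value passed to --model.
pick_resume_from() {
  exp_dir="$1"; model="$2"; pretrained="$3"
  last="$exp_dir/${model}_1kfinetuned_last.pth"
  if [ -f "$last" ]; then
    echo "$last"        # an earlier run left a checkpoint: continue from it
  else
    echo "$pretrained"  # no checkpoint yet: start from the pre-trained weights
  fi
}

# Example with placeholder paths; prints the flag to pass to the script.
echo "--resume_from=$(pick_resume_from /path/to/logdir convnext_small /some/path/to/pretrained.pth)"
```

With this, the same launch command can be reused for both the first run and any restart.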