About code isolation

The downstream_imagenet folder is isolated from the pre-training code. You can treat it as an independent codebase 🛠.

Preparation for ImageNet-1k fine-tuning

See INSTALL.md to prepare the dependencies and the ImageNet dataset.

Note: for network definitions, we directly use timm.models.ResNet and the official ConvNeXt implementation.
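
As a quick sanity check (a minimal sketch; it only assumes timm is installed), you can verify that the ResNet-50 definition builds and print its parameter count:

$ python -c "import timm; m = timm.create_model('resnet50'); print(sum(p.numel() for p in m.parameters()))"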

Fine-tuning on ImageNet-1k from pre-trained weights

Run downstream_imagenet/main.sh.

It is required to specify the ImageNet data folder, the model name, and the checkpoint file path to run fine-tuning. All other arguments have default values, listed in downstream_imagenet/arg.py#L13. You can override any default by passing keyword arguments (like --bs=2048) to main.sh.

Here is an example command that fine-tunes a ResNet-50 on a single machine with 8 GPUs:

$ cd /path/to/SparK/downstream_imagenet
$ bash ./main.sh <experiment_name> \
  --num_nodes=1 --ngpu_per_node=8 \
  --data_path=/path/to/imagenet \
  --model=resnet50 --resume_from=/some/path/to/timm_resnet50_1kpretrained.pth
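
To override a default, simply append the keyword argument; for example, this is the same command with the batch size set to 2048:

$ bash ./main.sh <experiment_name> \
  --num_nodes=1 --ngpu_per_node=8 \
  --data_path=/path/to/imagenet \
  --model=resnet50 --resume_from=/some/path/to/timm_resnet50_1kpretrained.pth \
  --bs=2048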

For multiple machines, set --num_nodes to the number of machines and add these arguments:

--node_rank=<rank_starts_from_0> --master_address=<some_address> --master_port=<some_port>
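
For example, here is a sketch of a two-machine launch; the address 10.0.0.1 and port 29500 are placeholders for your master node's reachable address and a free port:

$ # on machine 0:
$ bash ./main.sh <experiment_name> \
  --num_nodes=2 --ngpu_per_node=8 \
  --node_rank=0 --master_address=10.0.0.1 --master_port=29500 \
  --data_path=/path/to/imagenet \
  --model=resnet50 --resume_from=/some/path/to/timm_resnet50_1kpretrained.pth
$ # on machine 1: run the same command with --node_rank=1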

Note that the first argument <experiment_name> is the name of your experiment; it will be used to create an output directory named output_<experiment_name>.

Logging

Once an experiment starts running, the following files will be automatically created and updated in output_<experiment_name>:

  • <model>_1kfinetuned_last.pth: the latest model weights

  • <model>_1kfinetuned_best.pth: the model weights with the highest evaluation accuracy

  • <model>_1kfinetuned_best_ema.pth: the EMA model weights with the highest evaluation accuracy

  • finetune_log.txt: records important information such as:

    • git_commit_id: the git commit id of the code
    • cmd: all arguments passed to the script

    It also reports the training loss and accuracy, the best evaluation accuracy, and the estimated remaining time at each epoch.

These files make it easy to track the progress of an experiment.
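
For example, to watch the training progress in real time:

$ tail -f output_<experiment_name>/finetune_log.txt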

Resuming

Add --resume_from=path/to/<model>_1kfinetuned_last.pth to resume from the latest saved checkpoint.
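
For example, to resume a ResNet-50 run from its own output directory (assuming the single-machine setup from the example above):

$ bash ./main.sh <experiment_name> \
  --num_nodes=1 --ngpu_per_node=8 \
  --data_path=/path/to/imagenet \
  --model=resnet50 \
  --resume_from=output_<experiment_name>/resnet50_1kfinetuned_last.pth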