- download the dataset to a folder `/path/to/imagenet`
- the file structure should look like:
3. Prepare the [ImageNet-1k](http://image-net.org/) dataset
- assume the dataset is in `/path/to/imagenet`
- check the file structure; it should look like this:
```
/path/to/imagenet/:
train/:
@ -29,55 +28,85 @@ It is highly recommended to follow these instructions to ensure a consistent env
class2:
a_lot_images.jpeg
val/:
class3:
class1:
a_lot_images.jpeg
class4:
class2:
a_lot_images.jpeg
```
- the argument `--data_path=/path/to/imagenet` should be passed to the training script introduced later
4. (optional) if you want to use sparse convolution rather than masked convolution, install this [library](https://github.com/facebookresearch/SparseConvNet) and set `--sparse_conv=1` later
> `Tips:` In our default implementation, masked convolution (defined in [encoder.py](https://github.com/keyu-tian/SparK/blob/main/encoder.py)) is used to simulate the submanifold sparse convolution for speed.
Its results are equivalent to those of sparse convolution.
If you would like to use the *true* sparse convolution installed above, please pass `--sparse_conv=1` to the training script.
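
The directory layout described in step 3 can be checked before launching a long run. Below is a minimal sketch, assuming only the `train/` and `val/` folders with one sub-folder per class shown above; the function name `check_imagenet_layout` is hypothetical and is not part of this repository.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Return a list of problems found under `root` (empty list: layout looks right)."""
    root = Path(root)
    problems = []
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        # each class is expected to be a sub-folder of the split folder
        class_dirs = [d for d in sorted(split_dir.iterdir()) if d.is_dir()]
        if not class_dirs:
            problems.append(f"no class folders in: {split_dir}")
        for class_dir in class_dirs:
            # each class folder should hold at least one file (the images)
            if not any(p.is_file() for p in class_dir.iterdir()):
                problems.append(f"empty class folder: {class_dir}")
    return problems
```

Calling `check_imagenet_layout("/path/to/imagenet")` returns an empty list when both splits exist and every class folder contains files. It does not validate the image files themselves.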
## Pre-training from scratch
1. since `torch.nn.parallel.DistributedDataParallel` is used for distributed training, you are expected to specify some distributed arguments on each node, including:
- `--num_nodes`
- `--ngpu_per_node`
- `--node_rank`
- `--master_address`
- `--master_port`
The script for pre-training is [scripts/pt.sh](https://github.com/keyu-tian/SparK/blob/main/scripts/pt.sh).
Since `torch.nn.parallel.DistributedDataParallel` is used for distributed training, you are expected to specify some distributed arguments on each node, including:
- `--num_nodes=<INTEGER>`
- `--ngpu_per_node=<INTEGER>`
- `--node_rank=<INTEGER>`
- `--master_address=<ADDRESS>`
- `--master_port=<INTEGER>`
Set `--num_nodes=0` if your task is running on a single GPU.
2. you also need to specify the experiment name and the ImageNet path as the first two arguments; you may add arbitrary hyperparameter keywords (like `--ep=400 --bs=2048`) for other configurations. The final command should look like this:
You can add arbitrary keyword arguments (like `--ep=400 --bs=2048`) to set pre-training hyperparameters (see [utils/meta.py](https://github.com/keyu-tian/SparK/blob/main/utils/meta.py) for the full list).
Note that the first argument is the name of the experiment.
It will be used to create the output directory named `output_<experiment_name>`.
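
The arguments above can be assembled programmatically, which helps when the same run is launched on several nodes. The following is only a sketch: the helper `build_pretrain_command` is hypothetical, and launching the script with `bash scripts/pt.sh` from the repository root is an assumption. The flag names are the ones listed in this section.

```python
def build_pretrain_command(exp_name, data_path, num_nodes, ngpu_per_node,
                           node_rank, master_address, master_port, **hparams):
    """Assemble the argument list for one node of a pre-training run."""
    cmd = [
        "bash", "scripts/pt.sh",  # assumption: run with bash from the repository root
        exp_name,                 # first argument: the experiment name
        f"--data_path={data_path}",
        f"--num_nodes={num_nodes}",
        f"--ngpu_per_node={ngpu_per_node}",
        f"--node_rank={node_rank}",
        f"--master_address={master_address}",
        f"--master_port={master_port}",
    ]
    # extra keyword arguments become `--key=value` hyperparameter flags
    cmd += [f"--{key}={value}" for key, value in hparams.items()]
    return cmd


def output_dir_name(exp_name):
    """Name of the output directory created for an experiment."""
    return f"output_{exp_name}"
```

For example, `build_pretrain_command("my_exp", "/path/to/imagenet", 2, 8, 0, "127.0.0.1", 30000, ep=400, bs=2048)` yields a list that can be passed to `subprocess.run(cmd, check=True)`. All values in that call are placeholders.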
## Logging
Once an experiment starts running, the following files are automatically created and updated in `SparK/output_<experiment_name>`:
- `ckpt-last.pth`: includes model states, optimizer states, current epoch, current reconstruction loss, etc.
- `log.txt`: records important meta information such as:
- the git version (commit_id) at the start of the experiment
- all arguments passed to the script
It also reports the loss and remaining training time at each epoch.
- `stdout_backup.txt` and `stderr_backup.txt`: save all output written to stdout/stderr
## Resume
We believe these files can help trace the experiment well.
When an experiment starts running, the folder `SparK/<experiment_name>` is created to store per-epoch checkpoints (e.g., `ckpt-last.pth`) and log files (`log.txt`).
To resume from a checkpoint, specify `--resume=/path/to/checkpoint.pth`.
## Resuming
To resume from a saved checkpoint, run `pt.sh` with `--resume=/path/to/checkpoint.pth`.
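
When restarting an interrupted job, it can be convenient to add the `--resume` flag only if a checkpoint already exists. Here is a small sketch under that assumption; the helper `resume_argument` is hypothetical, and the default file name `ckpt-last.pth` is taken from the checkpoint name mentioned in this document.

```python
from pathlib import Path


def resume_argument(output_dir, ckpt_name="ckpt-last.pth"):
    """Return a `--resume=...` flag if the checkpoint exists, otherwise None."""
    ckpt = Path(output_dir) / ckpt_name
    return f"--resume={ckpt}" if ckpt.is_file() else None
```

The returned string can be appended to the arguments passed to `pt.sh`. A return value of `None` means no checkpoint was found, so the run starts from scratch.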
## Read logs
The `stdout` and `stderr` are also saved in `SparK/<experiment_name>/stdout.txt` and `SparK/<experiment_name>/stderr.txt`.
## Regarding sparse convolution
Note that `SparK/<experiment_name>/log.txt` records the most important information, such as the current loss values and the remaining time.
For speed, we use the masked convolution implemented in [encoder.py](https://github.com/keyu-tian/SparK/blob/main/encoder.py) to simulate submanifold sparse convolution by default.
If `--sparse_conv=1` is not specified, this masked convolution is used in pre-training.
**For anyone who might want to run SparK on other architectures**:
we still recommend using the default masked convolution,
given the limited hardware optimization of sparse convolution, and in particular the lack of efficient implementations of many modern operators such as grouped and dilated convolution.
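
To illustrate why masked convolution can stand in for submanifold sparse convolution, here is a toy 1D sketch in pure Python. It is not the code in `encoder.py`. It assumes the usual definition of submanifold sparse convolution (outputs are computed only at active positions and only active inputs contribute), and all function names are hypothetical.

```python
def dense_conv1d(x, kernel):
    """Zero-padded 'same' 1D cross-correlation (odd kernel length)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def masked_conv1d(x, mask, kernel):
    """Zero the inactive inputs, run a dense convolution, then zero the inactive outputs."""
    x_masked = [v * m for v, m in zip(x, mask)]
    y = dense_conv1d(x_masked, kernel)
    return [v * m for v, m in zip(y, mask)]


def submanifold_sparse_conv1d(x, mask, kernel):
    """Compute outputs only at active sites, reading only active neighbours."""
    half = len(kernel) // 2
    active = {i for i, m in enumerate(mask) if m}
    out = [0.0] * len(x)
    for i in active:
        for k, w in enumerate(kernel):
            j = i + k - half
            if j in active:
                out[i] += w * x[j]
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
mask = [1, 0, 1, 1, 0]          # 1 = visible (active), 0 = masked out
kernel = [0.5, 1.0, -1.0]
same = masked_conv1d(x, mask, kernel) == submanifold_sparse_conv1d(x, mask, kernel)
```

Both functions produce the same values at every position, so `same` is `True` for this input. The masked version simply spends extra dense computation on positions that are zeroed afterwards, which is the trade-off the default implementation accepts in exchange for well-optimized dense operators.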
## The Official PyTorch Implementation of SparK🔥 (Sparse and Hierarchical Masked Modeling) [![arXiv](https://img.shields.io/badge/arXiv-2301.03580-b31b1b.svg)](https://arxiv.org/abs/2301.03580)
## SparK🔥: "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling" [![arXiv](https://img.shields.io/badge/arXiv-2301.03580-b31b1b.svg)](https://arxiv.org/abs/2301.03580)
This is an official implementation of the paper: "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling".
We'll be updating frequently these days, so you might consider starring ⭐ or watching 👓 this repository to get the latest information.
We'll be updating frequently these days, so you might consider starring ⭐ or watching 👓 this repository to get the latest information!
Updates, including downstream implementations, a Colab tutorial, and inference and visualization code, will come soon.
In this work we designed a BERT-style pre-training framework (a.k.a. masked image modeling) for any hierarchical (multi-scale) convnet.
As shown above, it gathers all unmasked patches to form a sparse image and uses sparse convolution for encoding.
@ -30,19 +31,19 @@ This method is general and powerful: it can be used directly on any convolutiona
See our [paper](https://www.researchgate.net/profile/Keyu-Tian-2/publication/366984303_Designing_BERT_for_Convolutional_Networks_Sparse_and_Hierarchical_Masked_Modeling/links/63bcf24bc3c99660ebe253c5/Designing-BERT-for-Convolutional-Networks-Sparse-and-Hierarchical-Masked-Modeling.pdf) for more analysis, discussion, and evaluation.
## Pre-train
## Pre-training
See [PRETRAIN.md](PRETRAIN.md) for preparation and pre-training.
## Fine-tune on ImageNet
## ImageNet Fine-tuning
After finishing the preparation in [PRETRAIN.md](PRETRAIN.md), see [downstream_imagenet](downstream_imagenet) for subsequent instructions.
After finishing the preparation in [PRETRAIN.md](PRETRAIN.md), check [downstream_imagenet](downstream_imagenet) for subsequent instructions.
## Fine-tune ResNets on COCO
## Fine-tuning ResNets on COCO
Install `Detectron2` and see [downstream_d2](downstream_d2) for more details.
Install `detectron2` and see [downstream_d2](downstream_d2) for more details.
## Fine-tune ConvNeXts on COCO
## Fine-tuning ConvNeXts on COCO
Install `mmcv` and `mmdetection`, then see [downstream_mmdet](downstream_mmdet) for more details.