> `Tips:` In our default implementation, masked convolution (defined in [encoder.py](https://github.com/keyu-tian/SparK/blob/main/encoder.py)) is used to simulate submanifold sparse convolution for speed.
> It produces results numerically equivalent to sparse convolution.
> If you would like to use the *true* sparse convolution installed above, please pass `--sparse_conv=1` to the training script.
The script for pre-training is [scripts/pt.sh](https://github.com/keyu-tian/SparK/blob/main/scripts/pt.sh).
Since `torch.nn.parallel.DistributedDataParallel` is used for distributed training, you are expected to specify some distributed arguments on each node, including:
- `--num_nodes=<INTEGER>`
- `--ngpu_per_node=<INTEGER>`
- `--node_rank=<INTEGER>`
- `--master_address=<ADDRESS>`
- `--master_port=<INTEGER>`
Set `--num_nodes=0` if you are running on a single GPU.
You can also add arbitrary keyword arguments (like `--ep=400 --bs=2048`) to specify pre-training hyperparameters (see [utils/meta.py](https://github.com/keyu-tian/SparK/blob/main/utils/meta.py) for the full list).
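Putting these together, a launch for node 0 of a two-node, 8-GPU-per-node job might look like the sketch below. The flag names come from the list above, but all values (address, port, epochs, batch size) are placeholders, and `pt.sh` may expect additional positional arguments (e.g. an experiment name), so please check the script before running:

```shell
# Hypothetical multi-node launch (run from the repository root); all values are placeholders.
bash ./scripts/pt.sh \
  --num_nodes=2 --ngpu_per_node=8 --node_rank=0 \
  --master_address=10.0.0.1 --master_port=29500 \
  --ep=400 --bs=2048
# On the second node, repeat the same command with --node_rank=1.
# Append --sparse_conv=1 to pre-train with true sparse convolution instead of masked convolution.
```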
For speed, we use the masked convolution implemented in [encoder.py](https://github.com/keyu-tian/SparK/blob/main/encoder.py) to simulate submanifold sparse convolution by default.
If `--sparse_conv=1` is not specified, this masked convolution is used in pre-training.
**For anyone who might want to run SparK on other architectures**:
we still recommend using the default masked convolution,
given the limited hardware optimization of sparse convolution and, in particular, the lack of efficient implementations of many modern operators such as grouped convolution and dilated convolution.
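To illustrate the idea behind the default masked convolution (a hedged sketch only, not the actual [encoder.py](https://github.com/keyu-tian/SparK/blob/main/encoder.py) code): run a dense convolution on a feature map that is zeroed at masked positions, then re-apply the binary mask to the output, so the visible positions receive the same values a submanifold sparse convolution would compute. All class and variable names below are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Minimal sketch of a masked convolution (hypothetical, not the encoder.py implementation).

    A dense Conv2d is applied to features that are zero at masked positions, and the output is
    multiplied by the (resized) binary mask so masked positions stay zero. Since zeroed inputs
    contribute nothing to the visible sites, the visible-site outputs match those of a
    submanifold sparse convolution that skips the masked inputs.
    """

    def forward(self, x: torch.Tensor, active_mask: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) features, already zeroed at masked positions
        # active_mask: (B, 1, H, W) binary mask, 1 = visible, 0 = masked
        out = super().forward(x)
        # Resize the mask if the convolution changes the spatial resolution (e.g. stride > 1).
        if out.shape[-2:] != active_mask.shape[-2:]:
            active_mask = F.interpolate(active_mask, size=out.shape[-2:], mode="nearest")
        return out * active_mask  # keep masked positions at zero for the next layer


# Usage sketch: a 3x3 masked convolution on a partially masked feature map.
conv = MaskedConv2d(64, 128, kernel_size=3, padding=1, bias=False)
feats = torch.randn(2, 64, 56, 56)
mask = (torch.rand(2, 1, 56, 56) > 0.6).float()   # 1 = visible position
out = conv(feats * mask, mask)                    # masked positions remain zero
```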