diff --git a/pretrain/README.md b/pretrain/README.md
index 9903944..e7cff66 100644
--- a/pretrain/README.md
+++ b/pretrain/README.md
@@ -20,7 +20,7 @@ Then you can use `--model=your_convnet` in the pretraining script.
 
 ## Tutorial for pretraining your own dataset
 
-See the comment of function `build_dataset_to_pretrain` in [line55 of /pretrain/utils/imagenet.py](/pretrain/utils/imagenet.py#L55) to yours.
+See the comment of function `build_dataset_to_pretrain` in [line55 of /pretrain/utils/imagenet.py](/pretrain/utils/imagenet.py#L55).
 Define a subclass of `torch.utils.data.Dataset` for your own unlabeled dataset, to replace our `ImageNetDataset`.
 Use `args.data_path` and `args.input_size` to help build your dataset, with `--data_path=... --input_size=...` to specify them.
 Note the batch size `--bs` is the total batch size of all GPU, which may need to be adjusted based on your dataset size. For instance, we use `--bs=4096` for ImageNet, which contains 1.28 million images.
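
The note on `--bs` can be made concrete with a little arithmetic. The sketch below is not part of the repository: `batch_plan` and its parameter names are hypothetical, the GPU count of 8 is an assumed example, and whether the training script drops or keeps the final partial batch is not stated in the README, so both counts are shown. It only uses the two figures the README gives (`--bs=4096` and 1.28 million images).

```python
import math


def batch_plan(total_bs: int, num_gpus: int, dataset_size: int) -> dict:
    """Hypothetical helper: split a total batch size across GPUs
    and count how many batches one pass over the dataset yields."""
    if total_bs % num_gpus != 0:
        # Assumption: the total batch is divided evenly among GPUs.
        raise ValueError("total batch size should be divisible by the number of GPUs")
    return {
        # Images each GPU processes per step.
        "per_gpu_bs": total_bs // num_gpus,
        # Batches per epoch if the last partial batch is dropped.
        "steps_per_epoch_drop_last": dataset_size // total_bs,
        # Batches per epoch if the last partial batch is kept.
        "steps_per_epoch_keep_last": math.ceil(dataset_size / total_bs),
    }


# README figures: --bs=4096 on ImageNet (1.28 million images); 8 GPUs is assumed.
plan = batch_plan(total_bs=4096, num_gpus=8, dataset_size=1_280_000)
print(plan)
```

With a much smaller dataset, the same total batch size gives far fewer steps per epoch, which is one reason `--bs` may need to be lowered.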