--- comments: true description: Explore the Caltech-256 dataset, featuring 30,000 images across 257 categories, ideal for training and testing object recognition algorithms. keywords: Caltech-256 dataset, object classification, image dataset, machine learning, computer vision, deep learning, YOLO, training dataset --- # Caltech-256 Dataset The [Caltech-256](https://data.caltech.edu/records/nyy15-4j048) dataset is an extensive collection of images used for object classification tasks. It contains around 30,000 images divided into 257 categories (256 object categories and 1 background category). The images are carefully curated and annotated to provide a challenging and diverse benchmark for object recognition algorithms.



Watch: How to Train Image Classification Model using Caltech-256 Dataset with Ultralytics HUB

## Key Features - The Caltech-256 dataset comprises around 30,000 color images divided into 257 categories. - Each category contains a minimum of 80 images. - The categories encompass a wide variety of real-world objects, including animals, vehicles, household items, and people. - Images are of variable sizes and resolutions. - Caltech-256 is widely used for training and testing in the field of machine learning, particularly for object recognition tasks. ## Dataset Structure Like Caltech-101, the Caltech-256 dataset does not have a formal split between training and testing sets. Users typically create their own splits according to their specific needs. A common practice is to use a random subset of images for training and the remaining images for testing. ## Applications The Caltech-256 dataset is extensively used for training and evaluating [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl) models in object recognition tasks, such as [Convolutional Neural Networks](https://www.ultralytics.com/glossary/convolutional-neural-network-cnn) (CNNs), Support Vector Machines (SVMs), and various other machine learning algorithms. Its diverse set of categories and high-quality images make it an invaluable dataset for research and development in the field of machine learning and [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv). ## Usage To train a YOLO model on the Caltech-256 dataset for 100 epochs, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page. !!! example "Train Example" === "Python" ```python from ultralytics import YOLO # Load a model model = YOLO("yolo11n-cls.pt") # load a pretrained model (recommended for training) # Train the model results = model.train(data="caltech256", epochs=100, imgsz=416) ``` === "CLI" ```bash # Start training from a pretrained *.pt model yolo classify train data=caltech256 model=yolo11n-cls.pt epochs=100 imgsz=416 ``` ## Sample Images and Annotations The Caltech-256 dataset contains high-quality color images of various objects, providing a comprehensive dataset for object recognition tasks. Here are some examples of images from the dataset ([credit](https://ml4a.github.io/demos/tsne_viewer.html)): ![Dataset sample image](https://github.com/ultralytics/docs/releases/download/0/caltech256-sample-image.avif) The example showcases the diversity and complexity of the objects in the Caltech-256 dataset, emphasizing the importance of a varied dataset for training robust object recognition models. ## Citations and Acknowledgments If you use the Caltech-256 dataset in your research or development work, please cite the following paper: !!! quote "" === "BibTeX" ```bibtex @article{griffin2007caltech, title={Caltech-256 object category dataset}, author={Griffin, Gregory and Holub, Alex and Perona, Pietro}, year={2007} } ``` We would like to acknowledge Gregory Griffin, Alex Holub, and Pietro Perona for creating and maintaining the Caltech-256 dataset as a valuable resource for the [machine learning](https://www.ultralytics.com/glossary/machine-learning-ml) and computer vision research community. For more information about the Caltech-256 dataset and its creators, visit the [Caltech-256 dataset website](https://data.caltech.edu/records/nyy15-4j048). ## FAQ ### What is the Caltech-256 dataset and why is it important for machine learning? The [Caltech-256](https://data.caltech.edu/records/nyy15-4j048) dataset is a large image dataset used primarily for object classification tasks in machine learning and computer vision. It consists of around 30,000 color images divided into 257 categories, covering a wide range of real-world objects. The dataset's diverse and high-quality images make it an excellent benchmark for evaluating object recognition algorithms, which is crucial for developing robust machine learning models. ### How can I train a YOLO model on the Caltech-256 dataset using Python or CLI? To train a YOLO model on the Caltech-256 dataset for 100 [epochs](https://www.ultralytics.com/glossary/epoch), you can use the following code snippets. Refer to the model [Training](../../modes/train.md) page for additional options. !!! example "Train Example" === "Python" ```python from ultralytics import YOLO # Load a model model = YOLO("yolo11n-cls.pt") # load a pretrained model # Train the model results = model.train(data="caltech256", epochs=100, imgsz=416) ``` === "CLI" ```bash # Start training from a pretrained *.pt model yolo classify train data=caltech256 model=yolo11n-cls.pt epochs=100 imgsz=416 ``` ### What are the most common use cases for the Caltech-256 dataset? The Caltech-256 dataset is widely used for various object recognition tasks such as: - Training Convolutional [Neural Networks](https://www.ultralytics.com/glossary/neural-network-nn) (CNNs) - Evaluating the performance of [Support Vector Machines](https://www.ultralytics.com/glossary/support-vector-machine-svm) (SVMs) - Benchmarking new deep learning algorithms - Developing [object detection](https://www.ultralytics.com/glossary/object-detection) models using frameworks like Ultralytics YOLO Its diversity and comprehensive annotations make it ideal for research and development in machine learning and computer vision. ### How is the Caltech-256 dataset structured and split for training and testing? The Caltech-256 dataset does not come with a predefined split for training and testing. Users typically create their own splits according to their specific needs. A common approach is to randomly select a subset of images for training and use the remaining images for testing. This flexibility allows users to tailor the dataset to their specific project requirements and experimental setups. ### Why should I use Ultralytics YOLO for training models on the Caltech-256 dataset? Ultralytics YOLO models offer several advantages for training on the Caltech-256 dataset: - **High Accuracy**: YOLO models are known for their state-of-the-art performance in object detection tasks. - **Speed**: They provide real-time inference capabilities, making them suitable for applications requiring quick predictions. - **Ease of Use**: With Ultralytics HUB, users can train, validate, and deploy models without extensive coding. - **Pretrained Models**: Starting from pretrained models, like `yolo11n-cls.pt`, can significantly reduce training time and improve model [accuracy](https://www.ultralytics.com/glossary/accuracy). For more details, explore our [comprehensive training guide](../../modes/train.md).