- `--video-cls-overlap-ratio`: Overlap ratio between video sequences (default: 0.25)
- `--fp16`: Use FP16 for inference (only for Hugging Face models)
- `--video-classifier-model`: Video classifier model name or path (default: "microsoft/xclip-base-patch32")
- `--labels`: Labels for zero-shot video classification (default: ["walking", "running", "brushing teeth", "looking into phone", "weight lifting", "cooking", "sitting"])
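
As a sketch, these options can be combined in a single run. The video path below is a placeholder, and the `--source` argument for the input video is assumed from the script's usage:

```bash
# Illustrative invocation: FP16 inference with a 0.5 sequence overlap
# and the default Hugging Face model (adjust values to your setup)
python action_recognition.py \
    --source path/to/video.mp4 \
    --video-classifier-model "microsoft/xclip-base-patch32" \
    --fp16 \
    --video-cls-overlap-ratio 0.5
```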
## FAQ
**2. Can I Use Custom Action Labels?**
Yes, custom action labels are supported by the action recognition system. The `action_recognition.py` script allows users to specify their own custom labels for zero-shot video classification. This can be done using the `--labels` argument when running the script. For example (the video path below is a placeholder):
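
```bash
# Illustrative only: substitute your own video source and action labels
python action_recognition.py --source path/to/video.mp4 --labels "dancing" "singing a song" "jumping"
```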
You can adjust these labels to match the specific actions you want to recognize in your video. The system will then attempt to classify the detected actions based on these custom labels.
Additionally, you can choose between different video classification models:
1. For Hugging Face models, you can use any compatible video classification model. The default is set to:
- "microsoft/xclip-base-patch32"
- "microsoft/xclip-base-patch32"
2. For TorchVision models (which do not support zero-shot labels), you can select from the following options (see the example after this list):
- "s3d"
- "r3d_18"
- "swin3d_t"
- "swin3d_b"
- "mvit_v1_b"
- "mvit_v2_s"
- "s3d"
- "r3d_18"
- "swin3d_t"
- "swin3d_b"
- "mvit_v1_b"
- "mvit_v2_s"
**3. Why Combine Action Recognition with YOLOv8?**