Real-time webcam / video / RTSP object detection with Ultralytics YOLO11 and OpenCV. Auto-selects the best available device (CUDA → Apple Silicon MPS → CPU), draws boxes + labels + FPS overlay, optionally records the annotated stream, and writes detections to a rotating log.
Demo GIF coming soon. Record one in under a minute with the recipe below — the workflow is `python detect.py --duration 12 --save assets/demo.mp4`, then a single `ffmpeg` step.
- 80-class COCO detection with the latest YOLO11 family — swap any variant via a flag (`yolo11n` / `yolo11s` / `yolo11m` / `yolo11l` / `yolo11x`).
- Auto device selection — picks `cuda` if you have an NVIDIA GPU, else Apple Silicon `mps`, else `cpu`. No code change needed across machines.
- Multiple sources — webcam (`--source 0`), local video files, or RTSP / HTTP streams.
- Rotating log — detection events go to `detection_log.txt` with size-based rotation (no infinite growth).
- Optional annotated recording — `--save out.mp4` writes the live overlay to disk for replay or sharing.
- HUD overlay — current time, rolling FPS, active device.
- Class filter — restrict detection to specific COCO classes (e.g. `--classes 0` for people only).
- Headless mode — `--no-display` runs without a preview window, ideal for servers or batch processing.
- Graceful shutdown on `Ctrl+C` and `SIGTERM`.
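The device auto-selection above can be sketched as a small helper. This is a hypothetical `pick_device`, not necessarily the exact code in `detect.py`; it falls back gracefully when PyTorch or a backend is missing:

```python
def pick_device(requested: str = "auto") -> str:
    """Resolve 'auto' to the best available backend: cuda -> mps -> cpu."""
    if requested != "auto":
        return requested  # the user forced a device; trust it
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so CPU is the only option
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists on recent PyTorch builds
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The `try`/`except` import keeps the helper usable on machines without PyTorch, which matches the "no code change needed across machines" goal.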
```bash
git clone https://github.com/kairwang01/Computer-Vision-python.git
cd Computer-Vision-python

# Recommended: virtual environment
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Run with the default webcam, default model (YOLO11n auto-downloads on first run)
python detect.py

# Press 'q' in the preview window to quit
```

| Flag | Default | Description |
|---|---|---|
| `--source` | `0` | Camera index, video path, or RTSP / HTTP URL |
| `--model` | `yolo11n.pt` | YOLO weights — auto-downloaded by Ultralytics |
| `--conf` | `0.4` | Confidence threshold for displaying / logging |
| `--iou` | `0.5` | IoU threshold for non-maximum suppression |
| `--device` | `auto` | `auto` / `cpu` / `cuda` / `mps` |
| `--imgsz` | `640` | Inference image size (square) |
| `--classes` | (all) | COCO class indices to keep, e.g. `--classes 0 2 7` (person, car, truck) |
| `--save` | (off) | Path to save annotated MP4, e.g. `--save out.mp4` |
| `--log-file` | `detection_log.txt` | Rotating log path |
| `--no-display` | `false` | Headless mode |
| `--max-fps` | `0` (uncapped) | Soft FPS cap |
| `--duration` | `0` (no limit) | Auto-stop after N seconds (handy for demos) |
```bash
# Webcam, larger model, lower threshold (catch more)
python detect.py --model yolo11s.pt --conf 0.25

# Process a video file and save the annotated output
python detect.py --source clip.mp4 --save out.mp4

# RTSP camera, headless, log to a custom file, only people
python detect.py --source rtsp://cam.local/stream --no-display \
    --classes 0 --log-file logs/people.txt

# Force CPU even if CUDA / MPS is available (e.g. for benchmarking)
python detect.py --device cpu
```

```
┌────────────────┐      ┌──────────────────────┐      ┌────────────────┐
│     Source     │ ──▶  │   YOLO11 inference   │ ──▶  │  OpenCV render │
│ webcam / file  │      │ (auto cuda/mps/cpu)  │      │  + HUD overlay │
│  RTSP / HTTP   │      │ conf + iou + classes │      │                │
└────────────────┘      └──────────────────────┘      └────────────────┘
                                   │
               ┌───────────────────┴──────────────────┐
               ▼                                      ▼
    ┌────────────────────┐               ┌────────────────────┐
    │  Rotating log file │               │ Optional MP4 writer│
    │   (auto-rotated)   │               │  (--save out.mp4)  │
    └────────────────────┘               └────────────────────┘
```
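The pipeline in the diagram reduces to a short capture → inference → render loop. A minimal sketch using the public Ultralytics and OpenCV APIs (assumes `ultralytics` and `opencv-python` are installed; the real `detect.py` adds the HUD, logging, and MP4 writer on top of this):

```python
def run(source=0, model_path="yolo11n.pt", conf=0.4, iou=0.5):
    """Capture -> YOLO11 inference -> annotated preview, until 'q' or end of stream."""
    import cv2
    from ultralytics import YOLO

    model = YOLO(model_path)          # weights auto-download on first use
    cap = cv2.VideoCapture(source)    # camera index, file path, or stream URL
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of file / dropped stream
            results = model(frame, conf=conf, iou=iou, verbose=False)
            annotated = results[0].plot()  # boxes + labels drawn on a copy
            cv2.imshow("YOLO11", annotated)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()

# run()  # uncomment to start with the default webcam
```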
The `--duration` flag exits cleanly after N seconds, which makes it trivial to capture a short clip and convert it to a GIF for the README. Requires `ffmpeg` (`brew install ffmpeg` / `apt install ffmpeg`).
```bash
mkdir -p assets

# 1. Record a 12-second annotated MP4 from the webcam (no preview window)
python detect.py --duration 12 --max-fps 15 --no-display --save assets/demo.mp4

# 2. Convert MP4 → optimized GIF (~720px wide, 15 fps)
ffmpeg -i assets/demo.mp4 \
  -vf "fps=15,scale=720:-1:flags=lanczos,split[a][b];[a]palettegen[p];[b][p]paletteuse" \
  -loop 0 assets/demo.gif

# 3. Wire it into the README and commit
sed -i '' "s|_Demo GIF coming soon.*||" README.md
git add assets/demo.gif README.md && git commit -m "docs: add demo GIF"
```

| Layer | Choice |
|---|---|
| Language | Python 3.10+ |
| Detection model | Ultralytics YOLO11 |
| Inference backend | PyTorch (auto-selected: CUDA / MPS / CPU) |
| Video I/O + drawing | OpenCV ≥ 4.10 |
| Logging | `logging.handlers.RotatingFileHandler` (stdlib) |
| CLI | `argparse` (stdlib) |
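The rotating log is plain stdlib. A sketch with assumed limits (1 MB per file, 5 backups; the actual values in `detect.py` may differ):

```python
import logging
from logging.handlers import RotatingFileHandler

def make_logger(path: str = "detection_log.txt") -> logging.Logger:
    """Size-rotated detection log: oldest file is dropped automatically."""
    logger = logging.getLogger("detections")
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(path, maxBytes=1_000_000, backupCount=5)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Once the active file exceeds `maxBytes`, it is renamed `detection_log.txt.1` and a fresh file is started, so the log can never grow unbounded.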
.
├── detect.py Main entry — CLI, capture loop, inference, render, log
├── requirements.txt Pinned major-version constraints
├── assets/ Demo GIF goes here (recording recipe above)
├── .gitignore Excludes weights, logs, caches, output videos
├── LICENSE MIT
└── README.md
Numbers are rough indicators on common hardware with `yolo11n.pt` at `--imgsz 640`. Your mileage will vary with frame size and object density in the scene.
| Hardware | Device | FPS (typical) |
|---|---|---|
| Apple Silicon M2 / M3 | `mps` | ~30–60 |
| NVIDIA RTX 3060+ | `cuda` | ~60–120 |
| Modern CPU only | `cpu` | ~10–20 |
For tighter latency targets, use `yolo11n` with a smaller `--imgsz`. For higher accuracy, switch to `yolo11s` / `yolo11m` and accept lower FPS.
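The "FPS (typical)" figures are the same kind of rolling average the HUD displays. One way to compute it over a sliding window of frame timestamps (hypothetical helper, not the exact code in `detect.py`):

```python
import time
from collections import deque

class RollingFPS:
    """FPS averaged over the timestamps of the last `window` frames."""
    def __init__(self, window: int = 30):
        self.times = deque(maxlen=window)  # old timestamps fall off automatically

    def tick(self) -> float:
        """Call once per frame; returns the current rolling FPS."""
        self.times.append(time.perf_counter())
        if len(self.times) < 2:
            return 0.0  # not enough samples yet
        span = self.times[-1] - self.times[0]
        return (len(self.times) - 1) / span if span > 0 else 0.0
```

A windowed average smooths out per-frame jitter (e.g. a single slow inference) that would make an instantaneous 1/dt readout flicker.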
Kair Wang (@kairwang01)