Real-time webcam / video / RTSP object detection with Ultralytics YOLO11 and OpenCV. Auto-selects the best available device (CUDA → Apple Silicon MPS → CPU), draws boxes + labels + FPS overlay, optionally records the annotated stream, and writes detections to a rotating log.
Demo GIF coming soon. Record one in under a minute with the recipe below — the workflow is `python detect.py --duration 12 --save assets/demo.mp4`, then a single `ffmpeg` step.
- 80-class COCO detection with the latest YOLO11 family — swap any variant via a flag (`yolo11n` / `yolo11s` / `yolo11m` / `yolo11l` / `yolo11x`).
- Auto device selection — picks `cuda` if you have an NVIDIA GPU, else Apple Silicon `mps`, else `cpu`. No code change needed across machines.
- Multiple sources — webcam (`--source 0`), local video files, or RTSP / HTTP streams.
- Rotating log — detection events go to `detection_log.txt` with size-based rotation (no infinite growth).
- Optional annotated recording — `--save out.mp4` writes the live overlay to disk for replay or sharing.
- HUD overlay — current time, rolling FPS, active device.
- Class filter — restrict detection to specific COCO classes (e.g. `--classes 0` for people only).
- Headless mode — `--no-display` runs without a preview window, ideal for servers or batch processing.
- Graceful shutdown on `Ctrl+C` and `SIGTERM`.
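The device auto-selection above can be sketched as a small helper. This is a hypothetical `pick_device`, not necessarily the exact code in `detect.py`; it falls back gracefully when PyTorch or a backend is missing:

```python
def pick_device(requested: str = "auto") -> str:
    """Resolve 'auto' to the best available backend: cuda -> mps -> cpu."""
    if requested != "auto":
        return requested  # the user forced a device; trust it
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so CPU is the only option
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists on recent PyTorch builds
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The `try`/`except` import keeps the helper usable on machines without PyTorch, which matches the "no code change needed across machines" goal.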
```bash
git clone https://github.com/kairwang01/Computer-Vision-python.git
cd Computer-Vision-python

# Recommended: virtual environment
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Run with the default webcam, default model (YOLO11n auto-downloads on first run)
python detect.py

# Press 'q' in the preview window to quit
```

| Flag | Default | Description |
|---|---|---|
| `--source` | `0` | Camera index, video path, or RTSP / HTTP URL |
| `--model` | `yolo11n.pt` | YOLO weights — auto-downloaded by Ultralytics |
| `--conf` | `0.4` | Confidence threshold for displaying / logging |
| `--iou` | `0.5` | IoU threshold for non-maximum suppression |
| `--device` | `auto` | `auto` / `cpu` / `cuda` / `mps` |
| `--imgsz` | `640` | Inference image size (square) |
| `--classes` | (all) | COCO class indices to keep, e.g. `--classes 0 2 7` (person, car, truck) |
| `--save` | (off) | Path to save annotated MP4, e.g. `--save out.mp4` |
| `--log-file` | `detection_log.txt` | Rotating log path |
| `--no-display` | `false` | Headless mode |
| `--max-fps` | `0` (uncapped) | Soft FPS cap |
| `--duration` | `0` (no limit) | Auto-stop after N seconds (handy for demos) |
```bash
# Webcam, larger model, lower threshold (catch more)
python detect.py --model yolo11s.pt --conf 0.25

# Process a video file and save the annotated output
python detect.py --source clip.mp4 --save out.mp4

# RTSP camera, headless, log to a custom file, only people
python detect.py --source rtsp://cam.local/stream --no-display \
    --classes 0 --log-file logs/people.txt

# Force CPU even if CUDA / MPS is available (e.g. for benchmarking)
python detect.py --device cpu
```

```
┌────────────────┐      ┌──────────────────────┐      ┌────────────────┐
│     Source     │ ──▶  │   YOLO11 inference   │ ──▶  │  OpenCV render │
│ webcam / file  │      │ (auto cuda/mps/cpu)  │      │  + HUD overlay │
│  RTSP / HTTP   │      │ conf + iou + classes │      │                │
└────────────────┘      └──────────────────────┘      └────────────────┘
                                   │
               ┌───────────────────┴──────────────────┐
               ▼                                      ▼
    ┌────────────────────┐               ┌────────────────────┐
    │  Rotating log file │               │ Optional MP4 writer│
    │   (auto-rotated)   │               │  (--save out.mp4)  │
    └────────────────────┘               └────────────────────┘
```
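The pipeline in the diagram reduces to a short capture → inference → render loop. A minimal sketch using the public Ultralytics and OpenCV APIs (assumes `ultralytics` and `opencv-python` are installed; the real `detect.py` adds the HUD, logging, and MP4 writer on top of this):

```python
def run(source=0, model_path="yolo11n.pt", conf=0.4, iou=0.5):
    """Capture -> YOLO11 inference -> annotated preview, until 'q' or end of stream."""
    import cv2
    from ultralytics import YOLO

    model = YOLO(model_path)          # weights auto-download on first use
    cap = cv2.VideoCapture(source)    # camera index, file path, or stream URL
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of file / dropped stream
            results = model(frame, conf=conf, iou=iou, verbose=False)
            annotated = results[0].plot()  # boxes + labels drawn on a copy
            cv2.imshow("YOLO11", annotated)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()

# run()  # uncomment to start with the default webcam
```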
The `--duration` flag exits cleanly after N seconds, which makes it trivial to capture a short clip and convert it to a GIF for the README. Requires `ffmpeg` (`brew install ffmpeg` / `apt install ffmpeg`).
```bash
mkdir -p assets

# 1. Record a 12-second annotated MP4 from the webcam (no preview window)
python detect.py --duration 12 --max-fps 15 --no-display --save assets/demo.mp4

# 2. Convert MP4 → optimized GIF (~720px wide, 15 fps)
ffmpeg -i assets/demo.mp4 \
  -vf "fps=15,scale=720:-1:flags=lanczos,split[a][b];[a]palettegen[p];[b][p]paletteuse" \
  -loop 0 assets/demo.gif

# 3. Wire it into the README and commit
sed -i '' "s|_Demo GIF coming soon.*||" README.md
git add assets/demo.gif README.md && git commit -m "docs: add demo GIF"
```

| Layer | Choice |
|---|---|
| Language | Python 3.10+ |
| Detection model | Ultralytics YOLO11 |
| Inference backend | PyTorch (auto-selected: CUDA / MPS / CPU) |
| Video I/O + drawing | OpenCV ≥ 4.10 |
| Logging | `logging.handlers.RotatingFileHandler` (stdlib) |
| CLI | `argparse` (stdlib) |
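The rotating log is plain stdlib. A sketch with assumed limits (1 MB per file, 5 backups; the actual values in `detect.py` may differ):

```python
import logging
from logging.handlers import RotatingFileHandler

def make_logger(path: str = "detection_log.txt") -> logging.Logger:
    """Size-rotated detection log: oldest file is dropped automatically."""
    logger = logging.getLogger("detections")
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(path, maxBytes=1_000_000, backupCount=5)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Once the active file exceeds `maxBytes`, it is renamed `detection_log.txt.1` and a fresh file is started, so the log can never grow unbounded.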
.
├── detect.py Main entry — CLI, capture loop, inference, render, log
├── requirements.txt Pinned major-version constraints
├── assets/ Demo GIF goes here (recording recipe above)
├── .gitignore Excludes weights, logs, caches, output videos
├── LICENSE MIT
└── README.md
Numbers are rough indicators on common hardware with `yolo11n.pt` at `--imgsz 640`. Your mileage will vary with frame size and object density in the scene.
| Hardware | Device | FPS (typical) |
|---|---|---|
| Apple Silicon M2 / M3 | `mps` | ~30–60 |
| NVIDIA RTX 3060+ | `cuda` | ~60–120 |
| Modern CPU only | `cpu` | ~10–20 |
For tighter latency targets, use `yolo11n` with a smaller `--imgsz`. For higher accuracy, switch to `yolo11s` / `yolo11m` and accept lower FPS.
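The "FPS (typical)" figures are the same kind of rolling average the HUD displays. One way to compute it over a sliding window of frame timestamps (hypothetical helper, not the exact code in `detect.py`):

```python
import time
from collections import deque

class RollingFPS:
    """FPS averaged over the timestamps of the last `window` frames."""
    def __init__(self, window: int = 30):
        self.times = deque(maxlen=window)  # old timestamps fall off automatically

    def tick(self) -> float:
        """Call once per frame; returns the current rolling FPS."""
        self.times.append(time.perf_counter())
        if len(self.times) < 2:
            return 0.0  # not enough samples yet
        span = self.times[-1] - self.times[0]
        return (len(self.times) - 1) / span if span > 0 else 0.0
```

A windowed average smooths out per-frame jitter (e.g. a single slow inference) that would make an instantaneous 1/dt readout flicker.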
Kair Wang (@kairwang01)