A deep learning system for real-time fall detection from video using the R(2+1)D-18 spatiotemporal convolutional neural network. Achieves 98.71% F1 score on a custom dataset of ~7,000 video clips.
| Metric | Score |
|---|---|
| F1 Score | 98.71% |
| Accuracy | 98.71% |
| Precision (Fall) | 99% |
| Recall (Fall) | 98% |
| Inference Time | <1 sec/video |
The system uses R(2+1)D-18, a factored 3D CNN that decomposes spatiotemporal convolutions into separate spatial (2D) and temporal (1D) components. This architecture:
- Captures body posture (spatial) and motion dynamics (temporal) simultaneously
- Uses transfer learning from Kinetics-400 (pretrained on 400 action classes)
- Processes 16 frames per clip at 112×112 resolution
- Outputs a binary classification: Fall or No Fall
Video Input → Frame Extraction (16 frames) → Resize (112×112) → R(2+1)D-18 → Softmax → Fall / No Fall
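The resize/normalize stage of this pipeline can be sketched as follows. This is a minimal illustration assuming torchvision's standard Kinetics-400 normalization statistics; `clip_to_tensor` is a hypothetical helper, not code from this repository:

```python
import numpy as np

# torchvision's Kinetics-400 normalization stats (per RGB channel)
MEAN = np.array([0.43216, 0.394666, 0.37645], dtype=np.float32)
STD = np.array([0.22803, 0.22145, 0.216989], dtype=np.float32)

def clip_to_tensor(frames: np.ndarray) -> np.ndarray:
    """Convert 16 RGB frames, shape (16, 112, 112, 3) uint8,
    into the (1, 3, 16, 112, 112) float32 layout R(2+1)D-18 expects."""
    clip = frames.astype(np.float32) / 255.0   # scale pixels to [0, 1]
    clip = (clip - MEAN) / STD                 # per-channel normalization
    clip = clip.transpose(3, 0, 1, 2)          # (T, H, W, C) -> (C, T, H, W)
    return clip[np.newaxis]                    # add batch dimension
```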
- Fall videos: Frames sampled from the latter half (falls typically occur at the end)
- No-Fall videos: Frames sampled uniformly across the full duration
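The two sampling rules above can be sketched as below; `sample_indices` is a hypothetical helper for illustration, not the repository's actual function:

```python
import numpy as np

def sample_indices(num_frames: int, clip_len: int = 16, is_fall: bool = False) -> np.ndarray:
    """Choose clip_len frame indices from a video of num_frames frames.

    Fall clips: sample from the latter half, where the fall usually occurs.
    No-fall clips: sample uniformly across the full duration.
    """
    start = num_frames // 2 if is_fall else 0
    # Evenly spaced indices over the chosen window (repeats indices if the window is short)
    return np.linspace(start, num_frames - 1, clip_len).astype(int)

print(sample_indices(100, is_fall=True))   # all indices fall in [50, 99]
print(sample_indices(100, is_fall=False))  # indices span [0, 99]
```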
```text
Folio_Finder_AI/
├── train_fall_final.py              # Training pipeline
├── predict_fall.py                  # Inference / prediction script
├── r2plus1d_fall_v3.pth             # Best model weights
├── r2plus1d_fall_checkpoint.pth     # Training checkpoint
├── videos_info.csv                  # Full dataset catalog
├── train.csv                        # Training split
├── test.csv                         # Test split
├── confusion_matrix_v3.png          # Confusion matrix visualization
├── training_metrics_v3.png          # Training curves
├── requirements.txt                 # Python dependencies
└── falldataset/
    ├── Fall/
    │   └── Raw_Video/               # Fall event clips
    └── Video/
        └── Raw_Video/               # No-fall activity clips
```
- Python 3.8+
- NVIDIA GPU with CUDA support
- ~10 GB disk space for dataset
```bash
# Clone the repository
git clone https://github.com/[your-username]/Folio_Finder_AI.git
cd Folio_Finder_AI

# Create virtual environment
python -m venv venv
source venv/bin/activate   # Linux/Mac
# venv\Scripts\activate    # Windows

# Install dependencies
pip install -r requirements.txt
```

Contents of `requirements.txt`:

```text
torch>=2.0.0
torchvision>=0.15.0
opencv-python>=4.8.0
pandas>=2.0.0
scikit-learn>=1.3.0
matplotlib>=3.7.0
tqdm>=4.65.0
numpy>=1.24.0
```
```bash
python train_fall_final.py
```

| Parameter | Value |
|---|---|
| Optimizer | Adam |
| Learning Rate | 0.0001 |
| Batch Size | 16 |
| Clip Length | 16 frames |
| Input Resolution | 112 × 112 |
| Max Epochs | 12 |
| Early Stopping | Patience 4 (F1-based) |
| Mixed Precision | Enabled (AMP) |
| Class Weights | No_Fall: 0.899, Fall: 3.304 |
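The class weights in the table come from the inverse-frequency computation mentioned in the training pipeline. One common form is sketched below; the class counts are illustrative, and the exact normalization in `train_fall_final.py` may differ:

```python
import numpy as np

def inverse_frequency_weights(labels: np.ndarray, num_classes: int = 2) -> np.ndarray:
    """w_c = N / (num_classes * count_c): rarer classes get proportionally larger weights."""
    counts = np.bincount(labels, minlength=num_classes)
    return len(labels) / (num_classes * counts)

# Illustrative counts (not the dataset's exact numbers): 1,384 Fall vs. 4,200 No_Fall
labels = np.concatenate([np.zeros(1384, dtype=int), np.ones(4200, dtype=int)])
w = inverse_frequency_weights(labels)
print(w)  # the minority Fall class (label 0) receives the larger weight
```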
- Loads `train.csv`/`test.csv` splits (or generates them from `videos_info.csv`)
- Computes inverse-frequency class weights to handle class imbalance
- Initializes R(2+1)D-18 with Kinetics-400 pretrained weights
- Trains with weighted cross-entropy loss + mixed precision
- Evaluates on test set after each epoch
- Saves best model (by F1) and latest checkpoint
- Generates confusion matrix and training curves
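One step of the weighted-loss, mixed-precision loop described above might look like this. It is a simplified sketch with a stand-in model, not the repository's actual loop:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"   # AMP only pays off on GPU

model = torch.nn.Linear(16, 2).to(device)   # stand-in for R(2+1)D-18
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
class_weights = torch.tensor([3.304, 0.899], device=device)   # [Fall, No_Fall]
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)

def train_step(inputs: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = criterion(model(inputs), labels)
    scaler.scale(loss).backward()   # scale gradients to avoid fp16 underflow
    scaler.step(optimizer)          # unscales, skips the step on inf/NaN
    scaler.update()
    return loss.item()

loss = train_step(torch.randn(8, 16, device=device),
                  torch.randint(0, 2, (8,), device=device))
```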
| File | Description |
|---|---|
| `r2plus1d_fall_v3.pth` | Best model weights |
| `r2plus1d_fall_checkpoint.pth` | Latest checkpoint (resumable) |
| `confusion_matrix_v3.png` | Test set confusion matrix |
| `training_metrics_v3.png` | Loss / Accuracy / F1 curves |
```bash
python predict_fall.py "path/to/video.mp4"
```

Example output:

```text
Loading model...
Processing video: test_fall.mp4
Reading frames: 16 frames extracted
Prediction: Fall (confidence: 98.72%)
```
```python
import torch
from torchvision.models.video import r2plus1d_18
import cv2
import numpy as np

# Load model
model = r2plus1d_18(weights=None)
model.fc = torch.nn.Linear(512, 2)
model.load_state_dict(torch.load("r2plus1d_fall_v3.pth", map_location="cpu"))
model.eval()

# Process video into a (1, 3, 16, 112, 112) RGB tensor
# ... frame extraction logic ...

with torch.no_grad():
    output = model(video_tensor)              # logits, shape (1, 2)
    probs = torch.softmax(output, dim=1)[0]   # [P(Fall), P(No_Fall)]
label = "Fall" if probs[0] > probs[1] else "No Fall"   # label 0 = Fall
confidence = probs.max().item() * 100
print(f"{label} ({confidence:.2f}%)")
```

| Property | Value |
|---|---|
| Total clips | ~6,982 |
| Train set | ~5,584 (80%) |
| Test set | 1,398 (20%) |
| Classes | 2 (Fall, No_Fall) |
| Avg duration | 1–8 seconds |
| Frame rates | 15–120 FPS |
| Resolutions | 480p to 4K (normalized) |
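An 80/20 split like the one above is typically regenerated with a stratified split so both sets keep the class ratio. A sketch with a toy catalog (the repository's actual split logic may differ):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy catalog standing in for videos_info.csv (label 0 = Fall, 1 = No_Fall)
df = pd.DataFrame({
    "filename": [f"clip_{i}.mp4" for i in range(100)],
    "label": [0] * 30 + [1] * 70,
})
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)
print(len(train_df), len(test_df))   # 80 20
print(test_df["label"].value_counts().to_dict())   # class ratio preserved: 6 Fall, 14 No_Fall
```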
- Public Kaggle datasets (Fall Detection Dataset, Fall Video Dataset)
- Original recordings (smartphone, 1080p, 30fps — Sept 2024)
- Research benchmarks (SisFall-derived, multi-camera setups)
Each video is cataloged in `videos_info.csv`:

```text
filename,path,num_frames,fps,width,height,duration_sec,label
example_fall.mp4,falldataset/Fall/Raw_Video/example_fall.mp4,57,30.0,1920,1080,1.9,0
example_nofall.mp4,falldataset/Video/Raw_Video/example_nofall.mp4,91,30.0,1100,1080,3.0,1
```

> Note: Label `0` = Fall, Label `1` = No_Fall
| Method | Type | Reported Score | Hardware |
|---|---|---|---|
| R(2+1)D-18 (Ours) | Video | 98.71% | RTX 3070 |
| YOLOv8 + Transformer | Video | mAP 99.55% | High-end GPU |
| 4S-3DCNN | Video | 99.03% | Multi-GPU |
| CNN-LSTM | Video + Sensor | 96.4% | GPU |
| DSCS | Sensor only | 99.32% | CPU |
| Random Forest | Sensor only | 97.47% | CPU |
| LSTM | Sensor only | 80.0% | CPU |
- Deep Learning: PyTorch, torchvision
- Video Processing: OpenCV
- Data Management: pandas, NumPy
- Evaluation: scikit-learn
- Visualization: matplotlib
- Training Optimization: CUDA AMP (mixed precision), DataLoader with pin_memory
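The `pin_memory` optimization mentioned above is a `DataLoader` flag. Below is a sketch with a dummy dataset; the real pipeline loads decoded video clips rather than random tensors:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in: 32 "clips" with the model's input shape (3, 16, 112, 112)
dataset = TensorDataset(torch.randn(32, 3, 16, 112, 112),
                        torch.randint(0, 2, (32,)))
loader = DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=2,     # decode/augment clips in parallel worker processes
    pin_memory=True,   # page-locked host memory speeds up async CPU->GPU copies
)
clips, labels = next(iter(loader))
print(clips.shape)  # torch.Size([16, 3, 16, 112, 112])
```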
<details>
<summary>Click to expand full training history</summary>

```text
Epoch  1/12 | Train Loss: 0.3154 | Train Acc: 84.28% | Test Acc: 92.27% | F1: 92.29% ★ New Best
Epoch  2/12 | Train Loss: 0.1993 | Train Acc: 90.69% | Test Acc: 87.84% | F1: 87.83%
Epoch  3/12 | Train Loss: 0.1522 | Train Acc: 93.66% | Test Acc: 93.56% | F1: 93.58% ★ New Best
Epoch  4/12 | Train Loss: 0.1195 | Train Acc: 94.77% | Test Acc: 97.28% | F1: 97.28% ★ New Best
Epoch  5/12 | Train Loss: 0.0848 | Train Acc: 96.26% | Test Acc: 97.21% | F1: 97.21%
Epoch  6/12 | Train Loss: 0.0686 | Train Acc: 97.47% | Test Acc: 97.71% | F1: 97.71% ★ New Best
Epoch  7/12 | Train Loss: 0.0627 | Train Acc: 97.53% | Test Acc: 96.85% | F1: 96.84%
Epoch  8/12 | Train Loss: 0.0660 | Train Acc: 97.71% | Test Acc: 97.50% | F1: 97.50%
Epoch  9/12 | Train Loss: 0.0424 | Train Acc: 98.55% | Test Acc: 98.71% | F1: 98.71% ★ New Best
Epoch 10/12 | Train Loss: 0.0466 | Train Acc: 98.28% | Test Acc: 98.21% | F1: 98.21%
Epoch 11/12 | Train Loss: 0.0370 | Train Acc: 98.39% | Test Acc: 96.35% | F1: 96.34%
Epoch 12/12 | Train Loss: 0.0375 | Train Acc: 98.71% | Test Acc: 97.07% | F1: 97.07%
```

</details>
- Fork the repository
- Create a feature branch (`git checkout -b feature/improvement`)
- Commit changes (`git commit -am 'Add new feature'`)
- Push to branch (`git push origin feature/improvement`)
- Open a Pull Request
This project is licensed under the MIT License — see the LICENSE file for details.
- Ali Abroudoust
- Morteza Mohasebati
- R(2+1)D paper by Tran et al. (CVPR 2018)
- Kinetics-400 by DeepMind
- PyTorch team for pretrained video models
- Kaggle community for public fall detection datasets
Built with ❤️ and PyTorch