Q-EmotionVA is a deep learning framework for real-time facial emotion recognition with valence-arousal estimation. Trained and evaluated on the AffectNet dataset, it reaches 55.87% accuracy on 8-class emotion classification with its MixedFeatureNet backbone.
| Feature | Description |
|---|---|
| Emotion Classification | Recognize 8 basic emotions (Neutral, Happiness, Sadness, Surprise, Fear, Disgust, Anger, Contempt) |
| Valence-Arousal Regression | Estimate continuous emotional dimensions |
| Real-time Inference | Process video streams at high frame rates |
| Multi-backbone Support | Choose from MixedFeatureNet, MobileNetV2, or ShuffleNetV2 |
| Edge Deployment | Export models to ONNX format for edge devices |
Q-EmotionVA/
├── models/                        # Neural network architectures
│   ├── MixedFeatureNet.py         # Custom feature extraction backbone
│   ├── DDAM.py                    # Attention-enhanced emotion model
│   ├── DDAM-mbnet.py              # MobileNetV2 variant
│   └── DDAM-shufflenet.py         # ShuffleNetV2 variant
├── tools/                         # Utility scripts
│   ├── affectnet_train.py         # Training pipeline
│   ├── affectnet_test.py          # Evaluation with confusion matrix
│   ├── video-test-mediapipe.py    # Real-time webcam demo
│   ├── video-test-onnx.py         # ONNX-based real-time demo
│   ├── pth2onnx.py                # Model conversion tool
│   └── data-handler.py            # Dataset preprocessing
├── checkpoints/                   # Trained model weights
└── pretrained/                    # Pre-trained backbones
- Python 3.8+
- PyTorch 1.9+
- CUDA 11.0+ (for GPU acceleration)
pip install torch torchvision numpy pandas opencv-python mediapipe onnxruntime-gpu matplotlib scikit-learn tqdm Pillow

Place these files in the pretrained/ directory:
- MobileNetV2
- ShuffleNetV2

- Download from AffectNet Official Website
- Organize as follows:
AffectNetDataset/
├── Manually_Annotated/
│   ├── Manually_Annotated_Images/   # Raw images
│   ├── training.csv                 # Training annotations
│   └── validation.csv               # Validation annotations
python tools/data-handler.py

Output:
- Cropped faces: tiny_facedetect_filter_annotated_images/
- Annotation JSON: tiny_facedetect_train_filter.json
python tools/affectnet_train.py \
--aff_path /path/to/affectnet \
--batch_size 10 \
--lr 0.0001 \
--epochs 40 \
--num_head 2 \
--num_class 8

| Parameter | Description | Default |
|---|---|---|
| --aff_path | Dataset root path | /data/affectnet/ |
| --batch_size | Batch size | 10 |
| --lr | Learning rate | 0.0001 |
| --epochs | Training epochs | 40 |
| --num_head | Attention heads | 2 |
| --num_class | Emotion classes | 8 |
| --workers | Data loading threads | 0 |
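
The defaults in the table can be mirrored in an argument parser. The following is a minimal sketch, assuming the training script is built on argparse; the function name `build_parser`, the argument types, and the help strings are illustrative assumptions, and the actual parser in tools/affectnet_train.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the parameter table; not copied from affectnet_train.py.
    parser = argparse.ArgumentParser(description="Train on AffectNet (sketch)")
    parser.add_argument("--aff_path", type=str, default="/data/affectnet/", help="Dataset root path")
    parser.add_argument("--batch_size", type=int, default=10, help="Batch size")
    parser.add_argument("--lr", type=float, default=0.0001, help="Learning rate")
    parser.add_argument("--epochs", type=int, default=40, help="Training epochs")
    parser.add_argument("--num_head", type=int, default=2, help="Attention heads")
    parser.add_argument("--num_class", type=int, default=8, help="Emotion classes")
    parser.add_argument("--workers", type=int, default=0, help="Data loading threads")
    return parser


# Parse an explicit list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--aff_path", "/path/to/affectnet", "--epochs", "5"])
print(args.aff_path, args.batch_size, args.epochs)
```

Any flag that is omitted on the command line falls back to the default listed in the table.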
Models are saved in checkpoints/ using the naming pattern:
affecnet8_epoch{epoch}_acc{accuracy}.pth
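
Because the epoch and accuracy are encoded in the filename, the best checkpoint can be selected without loading any weights. This is a minimal sketch; the helper names are hypothetical, and the four-decimal accuracy format is an assumption inferred from the example filename in the evaluation command.

```python
import re


def checkpoint_name(epoch: int, accuracy: float) -> str:
    # Four decimal places is an assumption based on the example filename.
    return f"affecnet8_epoch{epoch}_acc{accuracy:.4f}.pth"


_PATTERN = re.compile(r"affecnet8_epoch(\d+)_acc(\d+\.\d+)\.pth$")


def parse_checkpoint_name(name: str):
    """Return (epoch, accuracy), or None if the name does not match the pattern."""
    match = _PATTERN.search(name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def best_checkpoint(names):
    """Return the filename with the highest recorded accuracy, or None."""
    scored = []
    for name in names:
        parsed = parse_checkpoint_name(name)
        if parsed is not None:
            scored.append((parsed[1], name))
    return max(scored)[1] if scored else None


print(checkpoint_name(15, 0.5587))  # affecnet8_epoch15_acc0.5587.pth
```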
python tools/affectnet_test.py \
--aff_path /path/to/affectnet \
--model_path checkpoints/affecnet8_epoch15_acc0.5587.pth \
--num_head 2 \
--num_class 8

Output:
- Validation accuracy
- Confusion matrix visualization (checkpoints/*.png)
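
To make the two evaluation outputs concrete, the sketch below builds a confusion matrix and derives accuracy from it using only the standard library. It is an illustration, not the code in tools/affectnet_test.py; in practice `sklearn.metrics.confusion_matrix` from scikit-learn (already in the dependency list) does the same counting. The labels are toy values, not AffectNet results.

```python
def confusion_matrix(y_true, y_pred, num_classes=8):
    """Count samples per (true, predicted) pair; rows are true classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(f"accuracy = {accuracy(cm):.4f}")
```

Off-diagonal entries show which emotions are mistaken for which, something the single accuracy figure hides.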
python tools/video-test-mediapipe.py
python tools/video-test-onnx.py

| Feature | Description |
|---|---|
| Real-time face detection | Powered by MediaPipe |
| Emotion probability bars | Visualize confidence scores |
| Valence-Arousal indicators | Real-time emotional state tracking |
| Exit | Press q to quit |
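
The probability bars are driven by a per-class confidence score. As a rough sketch of that step, the code below converts raw classifier outputs to probabilities with a softmax and renders them as text bars. The label order, the helper names, and the sample logits are assumptions for illustration; the demos draw their bars on the video frame rather than printing text.

```python
import math

# Label order is an assumption (it follows the order listed in the feature table);
# check it against the training labels before relying on it.
EMOTIONS = ["Neutral", "Happiness", "Sadness", "Surprise",
            "Fear", "Disgust", "Anger", "Contempt"]


def softmax(logits):
    # Subtract the maximum for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_emotion(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


def text_bar(prob, width=20):
    """Render a probability in [0, 1] as a fixed-width text bar."""
    filled = round(prob * width)
    return "#" * filled + "." * (width - filled)


logits = [0.2, 2.5, -1.0, 0.3, -0.5, -1.2, 0.0, -0.8]  # made-up model output
label, confidence = top_emotion(logits)
print(label)  # Happiness
for name, p in zip(EMOTIONS, softmax(logits)):
    print(f"{name:<10} {text_bar(p)} {p:.2f}")
```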
python tools/pth2onnx.py

Output: checkpoints/mp_MFN_epochXX.onnx
Use Case: Edge deployment, TensorRT optimization, cross-platform inference
Input (112x112x3)
↓
Backbone (MixedFeatureNet)
↓
Feature Maps (7x7x512)
↓
Coordinate Attention Heads
↓
Feature Fusion
↓
Classification Head → Emotion Probabilities (8 classes)
↓
Regression Head → Valence, Arousal
The attention module captures spatial information through:
- Horizontal Pooling: capture height-wise patterns
- Vertical Pooling: capture width-wise patterns
- Channel Interaction: fuse spatial information
- Adaptive Weighting: apply learned attention
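
The four steps above can be sketched for a single feature map in NumPy. This is a simplified illustration of coordinate attention, not the implementation in models/DDAM.py: the 1x1 convolutions are written as matrix products over the channel axis, batch normalization is omitted, ReLU stands in for whatever activation the model uses, and the weight names (`w_reduce`, `w_h`, `w_w`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, w_reduce, w_h, w_w):
    """x: (C, H, W) feature map. w_reduce: (M, C). w_h, w_w: (C, M)."""
    _, h, _ = x.shape
    # Horizontal pooling: average along the width, one descriptor per row -> (C, H).
    pooled_h = x.mean(axis=2)
    # Vertical pooling: average along the height, one descriptor per column -> (C, W).
    pooled_w = x.mean(axis=1)
    # Channel interaction: a shared channel-mixing step on the concatenated descriptors.
    joint = np.concatenate([pooled_h, pooled_w], axis=1)  # (C, H + W)
    mixed = np.maximum(w_reduce @ joint, 0.0)             # (M, H + W)
    mixed_h, mixed_w = mixed[:, :h], mixed[:, h:]
    # Adaptive weighting: per-row and per-column gates in (0, 1), applied to the input.
    att_h = sigmoid(w_h @ mixed_h)  # (C, H)
    att_w = sigmoid(w_w @ mixed_w)  # (C, W)
    return x * att_h[:, :, None] * att_w[:, None, :]


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 7, 7))       # small stand-in for the 7x7 feature maps
w_reduce = rng.standard_normal((4, 8))   # random weights, for shape checking only
w_h = rng.standard_normal((8, 4))
w_w = rng.standard_normal((8, 4))
out = coordinate_attention(x, w_reduce, w_h, w_w)
print(out.shape)  # (8, 7, 7)
```

The output has the same shape as the input, so several such heads can run on the same feature maps before fusion.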
The multi-task objective is a weighted sum of four loss terms:
| Loss Component | Purpose | Weight |
|---|---|---|
| Cross-entropy | Emotion classification | 1.0 |
| Attention Diversity | Encourage diverse feature learning | 0.1 |
| CCC Loss | Valence regression | 2.5 |
| CCC Loss | Arousal regression | 2.5 |
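
As a sketch of how these weights combine, the code below computes the concordance correlation coefficient (CCC) and a weighted total. The weights come from the table; treating each regression loss as 1 - CCC is a common convention and an assumption here, and the function names are hypothetical. The cross-entropy and attention diversity terms are passed in as already-computed numbers.

```python
def ccc(pred, target):
    """Concordance correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    denom = var_p + var_t + (mean_p - mean_t) ** 2
    return 2.0 * cov / denom if denom > 0 else 0.0


def total_loss(ce, att_div, ccc_valence, ccc_arousal):
    # Weights are taken from the table; 1 - CCC as the regression loss is an assumption.
    return (1.0 * ce + 0.1 * att_div
            + 2.5 * (1.0 - ccc_valence) + 2.5 * (1.0 - ccc_arousal))


valence_true = [-0.5, 0.0, 0.4, 0.8]   # made-up annotations
valence_pred = [-0.4, 0.1, 0.3, 0.9]   # made-up predictions
print(f"valence CCC = {ccc(valence_pred, valence_true):.3f}")
```

CCC is 1 only when predictions match the targets in both correlation and scale, which is why it is preferred over plain correlation for valence-arousal regression.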
| Backbone | Accuracy | Valence CCC | Arousal CCC |
|---|---|---|---|
| MixedFeatureNet | 55.87% | 0.68 | 0.65 |
| MobileNetV2 | 54.23% | 0.66 | 0.63 |
| ShuffleNetV2 | 53.89% | 0.65 | 0.62 |
If you use this work in your research, please cite:
@article{Q-EmotionVA,
title={Q-EmotionVA: Facial Emotion Recognition with Valence-Arousal Estimation},
author={Your Name},
journal={arXiv preprint arXiv:XXXX.XXXXX},
year={2024}
}

This project is licensed under the MIT License - see LICENSE for details.
- AffectNet Dataset
- MediaPipe
- PyTorch
- MobileNetV2
- ShuffleNetV2
Built with ❤️ for emotion AI research