A real-time web application that enables seamless communication between hearing people and people who use sign language through AI-powered speech-to-sign and sign-to-voice translation.
- 🌟 Features
- 🏗️ System Architecture
- 💻 Tech Stack
- 🚀 Quick Start
- 📦 Installation
- 🎮 Usage
- 🤖 AI Model Training
- 🐳 Docker Deployment
- 📚 API Documentation
- 🔧 Configuration
- 📖 Project Structure
- 🤝 Contributing
- 📄 License
- Real-time Speech Recognition: Uses Web Speech API for accurate speech-to-text
- Instant Translation: Converts spoken words to sign language animations/videos
- Interactive Display: Shows sign videos with descriptions and pronunciation guides
- Multi-language Support: Extensible to additional languages
- Hand Tracking: Real-time hand landmark detection using MediaPipe
- Gesture Recognition: AI-powered gesture classification using LSTM neural networks
- Voice Output: Converts recognized gestures to speech using Web Speech API
- Visual Feedback: Shows hand landmarks and confidence scores during recognition
- ✅ Responsive Web Design (works on desktop and tablets)
- ✅ Real-time Camera & Microphone Access
- ✅ GPU-accelerated Hand Tracking
- ✅ Scalable REST API Backend
- ✅ Easy Model Training Pipeline
- ✅ Comprehensive Error Handling
- ✅ Logging and Monitoring
┌─────────────────────────────────────────────────────────────┐
│ User Browser │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ React Frontend │ │
│ │ ├── Speech Recognition Component │ │
│ │ ├── Camera/Hand Tracking Component │ │
│ │ ├── Sign Display Component │ │
│ │ └── Voice Output Component │ │
│ └────────────────────────────────────────────────────────┘ │
│ ↓ │
│ REST API (CORS) │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Python FastAPI Backend │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ API Routes │ │
│ │ ├── /api/voice-to-sign (POST) │ │
│ │ ├── /api/classify-gesture (POST) │ │
│ │ └── /api/signs, /api/gestures (GET) │ │
│ └────────────────────────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ AI/ML Pipeline │ │
│ │ ├── Gesture Classifier (LSTM Model) │ │
│ │ ├── Hand Tracker (MediaPipe) │ │
│ │ └── Sign Mapping Database │ │
│ └────────────────────────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Services │ │
│ │ ├── Data Preprocessing │ │
│ │ ├── Model Management │ │
│ │ └── Static File Serving │ │
│ └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
- React 18.2 - UI Framework
- TailwindCSS 3.3 - Styling
- Axios - HTTP Client
- MediaPipe Hands - Hand Detection (Browser)
- Web Speech API - Speech Recognition & Text-to-Speech
- react-webcam - Camera Access
- FastAPI 0.104 - Web Framework
- Python 3.9+ - Language
- TensorFlow 2.13 - Deep Learning
- MediaPipe 0.10 - Hand Tracking
- OpenCV 4.8 - Image Processing
- NumPy & SciPy - Numerical Computing
- Docker - Containerization
- Docker Compose - Multi-container Orchestration
- Node.js 16+ and npm
- Python 3.9+
- Git
# Clone repository
git clone https://github.com/yourusername/sign-language-translator.git
cd sign-language-translator
# Setup Backend
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
# Setup Frontend
cd ../frontend
npm install

# Backend
cat > backend/.env << EOF
BACKEND_HOST=0.0.0.0
BACKEND_PORT=8000
BACKEND_RELOAD=true
FRONTEND_URL=http://localhost:3000
EOF
# Frontend
cat > frontend/.env << EOF
REACT_APP_API_URL=http://localhost:8000/api
REACT_APP_ENV=development
EOF

# Terminal 1: Start Backend
cd backend
python run.py
# Backend will be available at http://localhost:8000
# Terminal 2: Start Frontend
cd frontend
npm start
# Frontend will be available at http://localhost:3000

Open your browser and go to: http://localhost:3000
cd backend
# Create virtual environment
python -m venv venv
# Activate virtual environment
# Linux/Mac:
source venv/bin/activate
# Windows:
venv\Scripts\activate
# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# Verify installation
python -c "import tensorflow; print('TensorFlow OK')"
python -c "import mediapipe; print('MediaPipe OK')"

cd frontend
# Install dependencies
npm install
# Install additional packages if needed
npm install axios react-webcam
# Verify installation
npm list react react-dom axios

- Click "Start Speaking" button
- Speak clearly into your microphone
- See the translation: the recognized text appears on screen
- Watch the sign: the matching sign video/animation plays
- Use video controls to replay or slow down
- Allow camera access when prompted
- Click "Start Recording" to begin
- Perform sign gestures in front of camera
- Hold gesture for 1-2 seconds
- Click "Stop & Classify" to process
- Listen to the voice output - Recognition speaks the result
cd backend
# Create synthetic training data
python ai_model/train.py --synthetic --epochs 50

# Prepare data structure
mkdir -p data/processed/hello
mkdir -p data/processed/goodbye
mkdir -p data/processed/thank_you
# ... add more classes as needed
# Each .npy file should contain shape (30, 126) - 30 frames of hand landmarks
# Train model
python ai_model/train.py --data-dir ./data/processed --epochs 100 --batch-size 32

Model saved to: backend/ai_model/gesture_classifier.h5
Training complete!
Final accuracy: 0.95
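The data layout above expects one `.npy` file per sample, each of shape `(30, 126)`. A processed sample can be generated like this (a sketch using synthetic random landmarks written to a temporary directory; real samples would come from the MediaPipe hand tracker):

```python
import tempfile
from pathlib import Path

import numpy as np

# One training sample = 30 frames x 126 features
# (2 hands x 21 landmarks x 3 coords = 126), matching MODEL_INPUT_SHAPE.
FRAMES, FEATURES = 30, 126

def make_synthetic_sample(rng: np.random.Generator) -> np.ndarray:
    # Random landmark values in [0, 1); real data comes from MediaPipe.
    return rng.random((FRAMES, FEATURES)).astype(np.float32)

def save_sample(class_dir: Path, index: int, sample: np.ndarray) -> Path:
    class_dir.mkdir(parents=True, exist_ok=True)
    path = class_dir / f"sample_{index:04d}.npy"
    np.save(path, sample)
    return path

rng = np.random.default_rng(0)
out_dir = Path(tempfile.mkdtemp()) / "processed" / "hello"
path = save_sample(out_dir, 0, make_synthetic_sample(rng))
print(np.load(path).shape)  # (30, 126)
```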
# Build images
docker-compose build
# Start services
docker-compose up
# Stop services
docker-compose down
# View logs
docker-compose logs -f backend
docker-compose logs -f frontend

- Backend: http://localhost:8000
- Frontend: http://localhost:3000
- API Docs: http://localhost:8000/docs
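A minimal `docker-compose.yml` consistent with the files listed in the project structure might look like this (a sketch only; the service names, ports, and environment variables are assumptions based on the defaults above):

```yaml
version: "3.8"
services:
  backend:
    build:
      context: .
      dockerfile: Dockerfile.backend
    ports:
      - "8000:8000"
    environment:
      - FRONTEND_URL=http://localhost:3000
  frontend:
    build:
      context: .
      dockerfile: Dockerfile.frontend
    ports:
      - "3000:3000"
    environment:
      - REACT_APP_API_URL=http://localhost:8000/api
    depends_on:
      - backend
```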
GET /
GET /health
Response:
{
"status": "healthy",
"message": "Sign Language Translator API is running",
"version": "1.0.0"
}

POST /api/voice-to-sign
Content-Type: application/json
{
"text": "Hello, how are you?"
}
Response:
{
"sign": "hello",
"media_url": "/static/signs/hello.mp4",
"message": null
}

POST /api/classify-gesture
Content-Type: application/json
{
"landmarks": [
[[0.5, 0.5, 0.0], [0.6, 0.4, 0.1], ...], // Frame 1
[[0.5, 0.5, 0.0], [0.6, 0.4, 0.1], ...], // Frame 2
...
]
}
Response:
{
"gesture": "hello",
"confidence": 0.95,
"all_predictions": {
"hello": 0.95,
"goodbye": 0.03,
"thank_you": 0.02
}
}

GET /api/signs
Response:
{
"signs": ["hello", "goodbye", "thank_you", ...],
"total": 50,
"data": {...}
}

GET /api/gestures
Response:
{
"gestures": ["hello", "goodbye", "thank_you", ...],
"total": 10,
"model_loaded": true
}

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
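For reference, the `/api/classify-gesture` request body can be assembled like this (a client-side sketch; `build_classify_payload` is a hypothetical helper, not part of the project API):

```python
import json

COORDS = 3  # each landmark is an [x, y, z] triple from MediaPipe

def build_classify_payload(frames):
    """Serialize per-frame landmark lists into the /api/classify-gesture
    request body shown above."""
    for i, frame in enumerate(frames):
        for lm in frame:
            if len(lm) != COORDS:
                raise ValueError(f"frame {i}: each landmark must be [x, y, z]")
    return json.dumps({"landmarks": frames})

# Two dummy frames with two landmarks each, just to show the shape;
# a real capture sends 21 landmarks per detected hand per frame.
frames = [
    [[0.5, 0.5, 0.0], [0.6, 0.4, 0.1]],
    [[0.5, 0.5, 0.0], [0.6, 0.4, 0.1]],
]
body = build_classify_payload(frames)
```

The resulting `body` string can then be POSTed to the endpoint with any HTTP client, using `Content-Type: application/json`.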
# Server
BACKEND_HOST = "0.0.0.0"
BACKEND_PORT = 8000
BACKEND_RELOAD = True
# CORS
ALLOWED_ORIGINS = [
"http://localhost:3000",
"http://127.0.0.1:3000"
]
# Model
MODEL_INPUT_SHAPE = (30, 126)
MODEL_CONFIDENCE_THRESHOLD = 0.7
MAX_NUM_HANDS = 2
MIN_DETECTION_CONFIDENCE = 0.5

REACT_APP_API_URL=http://localhost:8000/api
REACT_APP_ENV=development
REACT_APP_API_TIMEOUT=30000
REACT_APP_LOG_LEVEL=debug

sign-language-translator/
│
├── backend/
│ ├── app/
│ │ ├── __init__.py
│ │ ├── main.py # FastAPI app
│ │ ├── models/
│ │ │ ├── schemas.py # Pydantic models
│ │ │ ├── sign_mapping.py # Sign database
│ │ │ └── __init__.py
│ │ ├── routes/
│ │ │ ├── voice_to_sign.py # Speech→Sign endpoints
│ │ │ ├── gesture_classify.py # Gesture endpoints
│ │ │ └── __init__.py
│ │ ├── services/
│ │ │ ├── preprocessing.py # Data processing
│ │ │ └── __init__.py
│ │ ├── utils/
│ │ │ ├── hand_tracking.py # MediaPipe wrapper
│ │ │ └── __init__.py
│ │ └── static/
│ │ └── signs/ # Sign videos/images
│ │
│ ├── ai_model/
│ │ ├── gesture_model.py # LSTM model
│ │ ├── train.py # Training script
│ │ └── __init__.py
│ │
│ ├── data/
│ │ ├── raw/ # Raw training data
│ │ └── processed/ # Processed data
│ │
│ ├── logs/ # Application logs
│ ├── config.py # Configuration
│ ├── run.py # Entry point
│ ├── requirements.txt # Dependencies
│ ├── .env # Environment variables
│ └── .gitignore
│
├── frontend/
│ ├── public/
│ │ ├── index.html # Entry HTML
│ │ └── favicon.ico
│ │
│ ├── src/
│ │ ├── components/
│ │ │ ├── Camera.jsx # Hand tracking
│ │ │ ├── SpeechInput.jsx # Speech recognition
│ │ │ ├── SignDisplay.jsx # Sign display
│ │ │ └── VoiceOutput.jsx # Text-to-speech
│ │ │
│ │ ├── pages/
│ │ │ └── Home.jsx # Main page
│ │ │
│ │ ├── services/
│ │ │ └── api.js # API client
│ │ │
│ │ ├── utils/
│ │ │ └── handTracking.js # Hand tracking utility
│ │ │
│ │ ├── App.js # Root component
│ │ ├── index.js # React mount
│ │ └── index.css # Global styles
│ │
│ ├── .env # Environment variables
│ ├── .gitignore
│ ├── package.json # Dependencies
│ ├── tailwind.config.js # Tailwind config
│ └── postcss.config.js # PostCSS config
│
├── docker-compose.yml # Docker setup
├── Dockerfile.backend # Backend container
├── Dockerfile.frontend # Frontend container
├── README.md # This file
└── .gitignore
- Check browser permissions
- Ensure HTTPS in production (browsers require a secure context for camera/microphone access)
- Check device permissions in OS settings
# Train a new model
cd backend
python ai_model/train.py --synthetic
# Or use mock predictions (built-in fallback)

Update ALLOWED_ORIGINS in backend/config.py:
ALLOWED_ORIGINS = [
"http://localhost:3000",
"http://yourdomain.com"
]

# Kill process on port 8000
lsof -ti:8000 | xargs kill -9
# Or use different port
BACKEND_PORT=8001 python run.py

# Install Heroku CLI
heroku login
# Create app
heroku create your-app-name
# Set environment
heroku config:set BACKEND_HOST=0.0.0.0
# Deploy
git push heroku main

- Set BACKEND_RELOAD=false
- Restrict CORS origins to your production domain
- Enable HTTPS
- Set up logging/monitoring
- Configure database (optional)
- Train model on real data
- Test all endpoints
- Set up CI/CD pipeline
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see LICENSE file for details.
- MediaPipe for hand detection
- FastAPI for the backend framework
- React team for the frontend library
- TensorFlow for deep learning
For issues or questions:
- Open an Issue
- Email: support@example.com
- Documentation: Wiki
Made with ❤️ for accessibility and inclusion