Binary sentiment classifier (Positive/Negative) built by fine-tuning DistilBERT on the Stanford Sentiment Treebank (SST-2) dataset. Served via a FastAPI REST API with single and batch prediction endpoints.
| Metric | Value |
|---|---|
| Model | distilbert-base-uncased |
| Dataset | SST-2 (GLUE benchmark) |
| Validation Accuracy | 90.71% |
| Weighted F1 | 90.70% |
| F1 — Negative class | 90.32% |
| F1 — Positive class | 91.07% |
| Training time | ~15 min (RTX 3050 Laptop 4GB) |
| Inference (single) | <10ms on GPU |
| Published DistilBERT SST-2 accuracy | 91.3% (ours is within 0.6%) |
| | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | 378 (TN) | 50 (FP) |
| Actual Positive | 31 (FN) | 413 (TP) |
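
The headline metrics in the table above follow directly from these counts; a quick check:

```python
# Deriving the reported metrics from the confusion matrix counts
tn, fp, fn, tp = 378, 50, 31, 413

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 0.9071 -> 90.71%
f1_pos = 2 * tp / (2 * tp + fp + fn)                # 0.9107 -> 91.07%
f1_neg = 2 * tn / (2 * tn + fn + fp)                # 0.9032 -> 90.32%
f1_weighted = (428 * f1_neg + 444 * f1_pos) / 872   # 0.9070 -> 90.70% (class support: 428 neg, 444 pos)
```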
```
sentiment-classifier/
├── data_exploration.py     # dataset analysis and length stats
├── train.py                # fine-tuning with HuggingFace Trainer API
├── evaluate_model.py       # classification report + confusion matrix
├── app.py                  # FastAPI REST API
├── requirements.txt
├── README.md
├── sentiment-model/
│   └── best/               # saved model weights + tokenizer
└── assets/
    └── confusion_matrix.png
```
```bash
git clone https://github.com/yourusername/sentiment-classifier
cd sentiment-classifier
pip install -r requirements.txt
```

```bash
python train.py
```

Trains for 3 epochs on SST-2 (67,349 samples). Checkpoints are saved after each epoch, and the best model is selected by validation accuracy.
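
For reference, a minimal sketch of loading the dataset the way the training script does (the field names are the standard ones for GLUE SST-2 in `datasets`; the exact code in `train.py` may differ):

```python
from datasets import load_dataset

# SST-2 from the GLUE benchmark: 67,349 training and 872 validation sentences
dataset = load_dataset("glue", "sst2")
print(dataset["train"].num_rows, dataset["validation"].num_rows)

# The kind of length statistics data_exploration.py reports
lengths = [len(s.split()) for s in dataset["train"]["sentence"]]
print(f"average sentence length: {sum(lengths) / len(lengths):.1f} words")
```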
```bash
python evaluate_model.py
```

Generates a classification report and saves the confusion matrix to `assets/confusion_matrix.png`.
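
A rough sketch of what this step looks like, assuming the fine-tuned checkpoint lives in `sentiment-model/best/` as in the layout above; the actual `evaluate_model.py` may batch and organise things differently:

```python
import matplotlib.pyplot as plt
import torch
from datasets import load_dataset
from sklearn.metrics import ConfusionMatrixDisplay, classification_report
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_dir = "sentiment-model/best"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir).eval()

# Score the 872-sentence SST-2 validation split
val = load_dataset("glue", "sst2", split="validation")
enc = tokenizer(val["sentence"], truncation=True, max_length=128, padding=True, return_tensors="pt")
with torch.no_grad():
    preds = model(**enc).logits.argmax(dim=-1).tolist()

print(classification_report(val["label"], preds, target_names=["Negative", "Positive"]))
ConfusionMatrixDisplay.from_predictions(val["label"], preds, display_labels=["Negative", "Positive"])
plt.savefig("assets/confusion_matrix.png")
```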
```bash
uvicorn app:app --reload --host 0.0.0.0 --port 8000
```

Interactive docs are available at http://127.0.0.1:8000/docs.
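
For orientation, a stripped-down sketch of what the single-prediction route in `app.py` does; the route path `/predict` and the internals here are assumptions, and only the request/response shapes shown below are taken from the project:

```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_DIR = "sentiment-model/best"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR).to(device).eval()

app = FastAPI(title="Sentiment Classifier")

class TextIn(BaseModel):
    text: str

@app.post("/predict")  # route name assumed; check /docs for the real one
def predict(req: TextIn):
    enc = tokenizer(req.text, truncation=True, max_length=128, return_tensors="pt").to(device)
    with torch.no_grad():
        probs = model(**enc).logits.softmax(dim=-1)[0]
    label_id = int(probs.argmax())
    return {
        "text": req.text,
        "label": "Positive" if label_id == 1 else "Negative",  # SST-2 convention: 1 = positive
        "confidence": round(float(probs[label_id]), 4),
    }
```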
Single prediction:

```json
// Request
{"text": "This movie was absolutely fantastic!"}

// Response
{
  "text": "This movie was absolutely fantastic!",
  "label": "Positive",
  "confidence": 0.9999
}
```

Batch prediction:

```json
// Request
{"texts": ["Brilliant film.", "Waste of two hours."]}

// Response
[
  {"text": "Brilliant film.", "label": "Positive", "confidence": 0.9987},
  {"text": "Waste of two hours.", "label": "Negative", "confidence": 0.9971}
]
```

Health check:

```json
{"status": "ok", "device": "cuda", "model": "distilbert-base-uncased fine-tuned SST-2"}
```

**Dynamic padding** — `DataCollatorWithPadding` pads each batch to its longest sequence rather than padding everything to 512 tokens. With SST-2's average sentence length of 9.4 words (~10 tokens), this cuts memory usage per batch by more than 4×.
**`max_length=128`** — the 99th-percentile sentence length in SST-2 is 35 words (~40 tokens after WordPiece). Using 128 instead of the default 512 gives comfortable headroom while keeping batches lean.
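
A sketch of how these two preprocessing choices fit together (the function below is illustrative, not necessarily the exact code in `train.py`):

```python
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Truncate at 128 tokens (plenty of headroom over the ~40-token 99th percentile),
    # and deliberately skip padding here.
    return tokenizer(batch["sentence"], truncation=True, max_length=128)

dataset = load_dataset("glue", "sst2").map(tokenize, batched=True)

# Padding is applied per batch, to that batch's longest sequence (typically 10-40
# tokens for SST-2) instead of a fixed 512; this is where the memory savings come from.
collator = DataCollatorWithPadding(tokenizer=tokenizer)
```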
**Warmup ratio** — the learning rate ramps linearly from 0 to 2e-5 over the first 10% of training steps. This protects the pretrained weights from large gradient updates early in fine-tuning, which could otherwise cause catastrophic forgetting.
**`load_best_model_at_end=True`** — reloads the checkpoint with the best validation accuracy rather than keeping the final epoch's weights. Epoch 3 eval loss is typically slightly higher than epoch 2 due to minor overfitting, so this ensures the deployed model is always the best checkpoint.
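
Putting the training-side choices together, a sketch of the `Trainer` setup (continuing the preprocessing sketch above; values not discussed in this README are assumptions, and the real `train.py` may differ):

```python
import evaluate
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return accuracy.compute(predictions=logits.argmax(axis=-1), references=labels)

args = TrainingArguments(
    output_dir="sentiment-model",
    num_train_epochs=3,
    learning_rate=2e-5,
    warmup_ratio=0.1,                  # linear ramp from 0 to 2e-5 over the first 10% of steps
    eval_strategy="epoch",             # called evaluation_strategy in older transformers releases
    save_strategy="epoch",             # one checkpoint per epoch
    load_best_model_at_end=True,       # reload the best checkpoint, not the last one
    metric_for_best_model="accuracy",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],        # tokenized dataset from the sketch above
    eval_dataset=dataset["validation"],
    data_collator=collator,
    compute_metrics=compute_metrics,
)
trainer.train()
```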
- `transformers` 4.40+ — model, tokenizer, `Trainer` API
- `datasets` — SST-2 loading and preprocessing
- `evaluate` — accuracy and F1 metrics
- `PyTorch` — training backend
- `FastAPI` + `uvicorn` — REST API
- `scikit-learn` — confusion matrix and classification report
- `matplotlib` — confusion matrix plot
