A character-level Decoder-only Transformer (GPT architecture) from scratch. This project follows the spirit of Andrej Karpathy's nanoGPT tutorial, adapted for Turkish language modeling.
GPT/
├── config/
│ └── config.py # Hyperparameters and configuration
├── data/
│ └── data.txt # Training data
├── notebooks/
│ ├── GPT_DEV.ipynb # Development notebook
│ └── initial.ipynb # Initial exploration
├── outputs/
│ └── test_model.pth # Saved model checkpoints
├── src/
│ ├── __init__.py
│ ├── main.py # Main entry point
│ ├── model.py # GPT transformer architecture
│ ├── data.py # DataProcessor class
│ └── train.py # Trainer class
├── pyproject.toml # Project configuration
└── README.md
Status: Modular architecture implemented! Code is organized into reusable modules with clear separation of concerns.
git clone https://github.com/myz21/GPT.git
cd GPT
# Option 1: Using pip (standard)
pip install -e .
# Option 2: Using uv (faster)
uv venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
uv syncpython src/main.py| Module | Purpose |
|---|---|
config/config.py |
Hyperparameters (batch_size, block_size, learning_rate, etc.) |
src/data.py |
DataProcessor - loads text, tokenizes, creates batches |
src/model.py |
GPT architecture - Head, MultiHeadAttention, Block, GPTLanguageModel |
src/train.py |
Trainer class - handles training loop and loss estimation |
src/main.py |
Main entry point - orchestrates data loading, model training, generation |
- Upload
nutuk.txtto your Google Drive - Open
notebooks/GPT_DEV.ipynbin Colab - Adjust file paths and run cells
Production Configuration:
- Context Window: 256 characters
- Embedding Dimension: 256
- Attention Heads: 6
- Transformer Blocks: 6
- Batch Size: 64
- Parameters: ~4.8M
- Dropout: 0.2
Test/Development Configuration:
- Context Window: 32 characters
- Embedding Dimension: 64
- Attention Heads: 2
- Transformer Blocks: 2
- Batch Size: 4
- Parameters: ~107K
- Device: CPU (for testing without GPU)
- Basic transformer architecture (self-attention, FFN, residual connections)
- Character-level tokenization for Turkish
- Training loop with AdamW optimizer
- Text generation with temperature sampling
- Modularize codebase ✨
-
src/model.py- Transformer architecture -
src/data.py- Dataset and batching -
src/train.py- Training class -
src/main.py- Main entry point -
config/config.py- Centralized configuration -
pyproject.toml- Project metadata
-
- Hyperparameter tuning for better convergence
- Add validation metrics and monitoring
- Implement learning rate scheduler
- Implement learning rate scheduler (cosine decay with warmup)
- Add gradient clipping and norm monitoring
- Visualize attention patterns
- Experiment with different positional encodings
- Try weight tying (embedding ↔ output projection)
- Compare character-level vs. BPE tokenization
- Test Flash Attention for efficiency
- Implement KV caching for faster inference
- Scale up to larger Turkish corpora
Hardware:
- Works on free Colab GPUs (T4)
- Training time: ~30-60 minutes for decent results
- Can be trained on CPU (much slower)
Hyperparameters to experiment with:
batch_size: 32-64 (depends on GPU memory)learning_rate: 1e-3 to 3e-4block_size: 128-512 (longer = better context, more memory)n_layer: 4-8 (deeper = more capacity, slower training)
This project is heavily inspired by:
MIT License - feel free to use for learning and experimentation!
Note: This is a work-in-progress learning project. Contributions and suggestions are welcome, especially from those also learning transformers! 🚀