GPT-from-Scratch: Learning Transformers with Turkish Text

A character-level Decoder-only Transformer (GPT architecture) from scratch. This project follows the spirit of Andrej Karpathy's nanoGPT tutorial, adapted for Turkish language modeling.

Project Structure

GPT/
├── config/
│   └── config.py          # Hyperparameters and configuration
├── data/
│   └── data.txt           # Training data
├── notebooks/
│   ├── GPT_DEV.ipynb      # Development notebook
│   └── initial.ipynb      # Initial exploration
├── outputs/
│   └── test_model.pth     # Saved model checkpoints
├── src/
│   ├── __init__.py
│   ├── main.py            # Main entry point
│   ├── model.py           # GPT transformer architecture
│   ├── data.py            # DataProcessor class
│   └── train.py           # Trainer class
├── pyproject.toml         # Project configuration
└── README.md

Status: Modular architecture implemented! Code is organized into reusable modules with clear separation of concerns.

🛠️ Setup & Usage

Quick Start

git clone https://github.com/myz21/GPT.git
cd GPT

# Option 1: Using pip (standard)
pip install -e .

# Option 2: Using uv (faster)
uv venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
uv sync

Run Training

python src/main.py

Module Overview

Module	Purpose
`config/config.py`	Hyperparameters (batch_size, block_size, learning_rate, etc.)
`src/data.py`	`DataProcessor` - loads text, tokenizes, creates batches
`src/model.py`	GPT architecture - `Head`, `MultiHeadAttention`, `Block`, `GPTLanguageModel`
`src/train.py`	`Trainer` class - handles training loop and loss estimation
`src/main.py`	Main entry point - orchestrates data loading, model training, generation

Google Colab (Original Method)

Upload nutuk.txt to your Google Drive
Open notebooks/GPT_DEV.ipynb in Colab
Adjust file paths and run cells

📊 Model Specifications

Production Configuration:

Context Window: 256 characters
Embedding Dimension: 256
Attention Heads: 6
Transformer Blocks: 6
Batch Size: 64
Parameters: ~4.8M
Dropout: 0.2

Test/Development Configuration:

Context Window: 32 characters
Embedding Dimension: 64
Attention Heads: 2
Transformer Blocks: 2
Batch Size: 4
Parameters: ~107K
Device: CPU (for testing without GPU)

🚀 Roadmap

✅ Completed

🔨 In Progress

Hyperparameter tuning for better convergence
Add validation metrics and monitoring
Implement learning rate scheduler

🎓 Learning Goals

Implement learning rate scheduler (cosine decay with warmup)
Add gradient clipping and norm monitoring
Visualize attention patterns
Experiment with different positional encodings
Try weight tying (embedding ↔ output projection)

🔬 Advanced Experiments (Future)

Compare character-level vs. BPE tokenization
Test Flash Attention for efficiency
Implement KV caching for faster inference
Scale up to larger Turkish corpora

📈 Training Tips

Hardware:

Works on free Colab GPUs (T4)
Training time: ~30-60 minutes for decent results
Can be trained on CPU (much slower)

Hyperparameters to experiment with:

batch_size: 32-64 (depends on GPU memory)
learning_rate: 1e-3 to 3e-4
block_size: 128-512 (longer = better context, more memory)
n_layer: 4-8 (deeper = more capacity, slower training)

🤝 Acknowledgments

This project is heavily inspired by:

📝 License

MIT License - feel free to use for learning and experimentation!

Note: This is a work-in-progress learning project. Contributions and suggestions are welcome, especially from those also learning transformers! 🚀

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPT-from-Scratch: Learning Transformers with Turkish Text

Project Structure

🛠️ Setup & Usage

Quick Start

Run Training

Module Overview

Google Colab (Original Method)

📊 Model Specifications

🚀 Roadmap

✅ Completed

🔨 In Progress

🎓 Learning Goals

🔬 Advanced Experiments (Future)

📈 Training Tips

🤝 Acknowledgments

📝 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
config		config
data		data
notebooks		notebooks
outputs		outputs
src		src
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

GPT-from-Scratch: Learning Transformers with Turkish Text

Project Structure

🛠️ Setup & Usage

Quick Start

Run Training

Module Overview

Google Colab (Original Method)

📊 Model Specifications

🚀 Roadmap

✅ Completed

🔨 In Progress

🎓 Learning Goals

🔬 Advanced Experiments (Future)

📈 Training Tips

🤝 Acknowledgments

📝 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages