This guide explains how to use the TinyLlama GUI for chatting with local language models.
After installation, simply run:
```bash
./tinyllama.sh
```

This will start the interactive GUI.
On launch, a model picker lets you choose which installed model to run:
╔════════════════════════════════════════════════════════════════╗
║ Model Picker ║
╠════════════════════════════════════════════════════════════════╣
║ Choose an installed model to launch ║
║ Each folder under ./models with a config.json is listed below. ║
║ Use [A] for Auto mode - selects based on your query. ║
║ ║
║ # Installed Model Path ║
║ ───────────────────────────────────────────────────────────── ║
║ A Auto Smart selection based on task ║
║ 1 TinyLlama-1.1B-Chat-v1.0 ./models/TinyLlama-1.1B... ║
║ 2 NVIDIA-Nemotron-3-Nano ./models/NVIDIA-Nemotron... ║
╚════════════════════════════════════════════════════════════════╝
Press A to enable Auto mode. The CLI will automatically select the best model based on your query:
- Factual questions → smaller, factual models
- Code generation → models good at code
- Creative tasks → models with better creativity
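Under the hood, this kind of routing can be as simple as keyword heuristics. The sketch below is illustrative only; the hint lists and the category-to-model mapping are assumptions, not the actual ai_cli.py logic:

```python
# Illustrative sketch of Auto-mode model selection (NOT the actual
# ai_cli.py implementation): classify the query with keyword
# heuristics, then map each category to an installed model.
CODE_HINTS = ("def ", "class ", "function", "bug", "refactor", "write code")
CREATIVE_HINTS = ("story", "poem", "haiku", "imagine", "creative")

# Assumed mapping; which model is best per category depends on your setup.
MODEL_FOR_CATEGORY = {
    "code": "NVIDIA-Nemotron-3-Nano",
    "creative": "NVIDIA-Nemotron-3-Nano",
    "factual": "TinyLlama-1.1B-Chat-v1.0",  # small, fast factual answers
}

def classify(query: str) -> str:
    """Bucket a query into a coarse task category."""
    q = query.lower()
    if any(h in q for h in CODE_HINTS):
        return "code"
    if any(h in q for h in CREATIVE_HINTS):
        return "creative"
    return "factual"

def pick_model(query: str) -> str:
    """Return the model name Auto mode would launch for this query."""
    return MODEL_FOR_CATEGORY[classify(query)]
```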
```bash
# Select a specific model when starting
python ai_cli.py --model tinyllama
python ai_cli.py --model NVIDIA-Nemotron-3-Nano-4B-GGUF
python ai_cli.py --model auto  # Smart selection
```

The GUI provides buttons for all interactions:
- Mode Button: Click to select AI mode (questions, code, academic, math)
- Settings Button: Opens settings panel for generation parameters
- Like/Dislike Buttons: Appear after AI responses to rate them
- Send Button: Send your message to the AI
- Clear Chat: Reset the conversation
- Save Chat: Export conversation to JSON
╔════════════════════════════════════════════════════════════════╗
║ TinyLlama CLI ║
║ Local Chat ║
╠════════════════════════════════════════════════════════════════╣
║ Try /help /settings /save /exit ║
║ ║
║ Model: TinyLlama-1.1B-Chat-v1.0 (GPU) ║
╚════════════════════════════════════════════════════════════════╝
Simply type your message and press Enter. The model will respond with generated text.
Example:
```
You: What is Python?
TinyLlama: Python is a high-level, interpreted programming language...
```
The CLI supports markdown rendering for model responses:
- **Bold** text renders as bold
- *Italic* text renders as italic
- `code` renders in monospace
- Lists render properly
- Code blocks are highlighted
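Terminal markdown rendering of this kind can be sketched with a few ANSI escape codes. A real CLI would more likely use a library such as rich; the `render` helper below is purely illustrative:

```python
import re

# Minimal sketch of markdown-to-terminal rendering: map a few
# markdown spans to ANSI escape codes. Illustrative only.
BOLD, ITALIC, REVERSE, RESET = "\x1b[1m", "\x1b[3m", "\x1b[7m", "\x1b[0m"

def render(md: str) -> str:
    """Convert **bold**, *italic*, and `code` spans to ANSI styling."""
    out = re.sub(r"\*\*(.+?)\*\*", BOLD + r"\1" + RESET, md)
    out = re.sub(r"(?<!\*)\*([^*]+)\*", ITALIC + r"\1" + RESET, out)
    out = re.sub(r"`([^`]+)`", REVERSE + r"\1" + RESET, out)
    return out
```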
The CLI provides several built-in commands:
`/help`
Shows all available commands and their descriptions.

`/settings`
Display and modify generation parameters. Shows a panel with the current settings:
| Setting | Value |
|---|---|
| temperature | 0.65 |
| top_p | 0.9 |
| top_k | 40 |
| repetition_penalty | 1.1 |
| max_new_tokens | 256 |
| do_sample | True |
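Modifying a setting amounts to coercing the user's string input to the field's type and assigning it. A minimal sketch, assuming a `/settings key value` syntax (the command syntax and helper are assumptions) and a `GenerationConfig` dataclass like the one shown in the configuration section:

```python
from dataclasses import dataclass, fields

@dataclass
class GenerationConfig:
    temperature: float = 0.65
    top_p: float = 0.9
    top_k: int = 40
    repetition_penalty: float = 1.1
    max_new_tokens: int = 256
    do_sample: bool = True

def apply_setting(cfg: GenerationConfig, key: str, raw: str) -> None:
    """Hypothetical handler for '/settings key value': update one
    field, coercing the string to the field's current type."""
    for f in fields(cfg):
        if f.name == key:
            current = getattr(cfg, key)
            if isinstance(current, bool):  # check bool before int
                value = raw.lower() in ("1", "true", "yes")
            else:
                value = type(current)(raw)
            setattr(cfg, key, value)
            return
    raise KeyError(f"unknown setting: {key}")
```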
`/clear`
Clears the chat history (starts a fresh conversation).

`/save`
Saves the current transcript to transcripts/ and exports training data.

`/exit`
Exits the CLI and saves the transcript automatically.
The CLI automatically tunes generation settings based on your prompts:
| Prompt Type | Temperature | Top-P | Max Tokens |
|---|---|---|---|
| Factual | 0.45 | 0.82 | 220 |
| Code | 0.40 | 0.85 | 280 |
| Creative | 0.88 | 0.95 | 320 |
| Math | 0.00 | 1.00 | 96 |
| Long Context | 0.55 | 0.86 | 192 |
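A sketch of how these presets could be applied; the keyword heuristics and the long-context length threshold are assumptions for illustration, only the preset values come from the table above:

```python
# Presets matching the prompt-tuning table; selection heuristics
# below are illustrative assumptions, not the actual ai_cli.py code.
PRESETS = {
    "factual":      {"temperature": 0.45, "top_p": 0.82, "max_new_tokens": 220},
    "code":         {"temperature": 0.40, "top_p": 0.85, "max_new_tokens": 280},
    "creative":     {"temperature": 0.88, "top_p": 0.95, "max_new_tokens": 320},
    "math":         {"temperature": 0.00, "top_p": 1.00, "max_new_tokens": 96},
    "long_context": {"temperature": 0.55, "top_p": 0.86, "max_new_tokens": 192},
}

def tune(prompt: str) -> dict:
    """Pick a sampling preset for a prompt via simple heuristics."""
    p = prompt.lower()
    if len(prompt) > 2000:  # assumed threshold for "long context"
        return PRESETS["long_context"]
    if any(w in p for w in ("solve", "calculate", "integral", "equation")):
        return PRESETS["math"]
    if any(w in p for w in ("code", "function", "implement", "debug")):
        return PRESETS["code"]
    if any(w in p for w in ("story", "poem", "imagine")):
        return PRESETS["creative"]
    return PRESETS["factual"]
```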
You can modify the defaults by editing `GenerationConfig` in ai_cli.py:

```python
@dataclass
class GenerationConfig:
    temperature: float = 0.65
    top_p: float = 0.9
    top_k: int = 40
    repetition_penalty: float = 1.1
    max_new_tokens: int = 256
    do_sample: bool = True
```

Keyboard shortcuts:

| Key | Action |
|---|---|
| Enter | Send message |
| Ctrl+C | Interrupt generation |
| Ctrl+L | Clear screen |
| Ctrl+D | Exit (same as /exit) |
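The Ctrl+C behavior is typically implemented by catching `KeyboardInterrupt` around the token loop so the CLI keeps running. A minimal sketch, where `generate_stream` is a hypothetical token iterator, not an actual ai_cli.py function:

```python
def chat_once(prompt: str, generate_stream) -> str:
    """Print tokens as they arrive; Ctrl+C stops generation but
    keeps the partial response instead of exiting the CLI."""
    pieces = []
    try:
        for token in generate_stream(prompt):
            pieces.append(token)
            print(token, end="", flush=True)
    except KeyboardInterrupt:
        print("\n[generation interrupted]")
    return "".join(pieces)
```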
Chat transcripts are saved to:
transcripts/
├── 2024-01-15_143022.json
├── 2024-01-16_091545.json
└── ...
Each transcript contains:
- Full chat history
- Model used
- Timestamp
- Generation settings
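A transcript writer matching the layout above might look like the sketch below; the exact filename format and JSON schema are assumptions based on the listed fields:

```python
import json
import time
from pathlib import Path

def save_transcript(messages, model_name, settings, out_dir="transcripts"):
    """Write the chat to transcripts/YYYY-MM-DD_HHMMSS.json.
    Schema is an assumption inferred from the documented contents."""
    Path(out_dir).mkdir(exist_ok=True)
    stamp = time.strftime("%Y-%m-%d_%H%M%S")
    path = Path(out_dir) / f"{stamp}.json"
    record = {
        "model": model_name,        # model used
        "timestamp": stamp,         # when the chat was saved
        "settings": settings,       # generation settings
        "messages": messages,       # full chat history
    }
    path.write_text(json.dumps(record, indent=2))
    return path
```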
Every time you use /save or /exit, the CLI exports training data:
training_data/tinyllama_sft.jsonl
Each line is a JSON object with:
- `id`: Unique identifier
- `source_transcript`: Transcript file name
- `created_at`: ISO timestamp
- `messages`: Chat messages array
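Building one such JSONL line can be sketched as follows; only the four listed keys come from the description above, the field contents are illustrative assumptions:

```python
import json
import uuid
from datetime import datetime, timezone

def export_sft_line(messages, source_transcript):
    """Serialize one training example as a single JSONL line with the
    documented keys (id, source_transcript, created_at, messages)."""
    record = {
        "id": str(uuid.uuid4()),
        "source_transcript": source_transcript,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "messages": messages,
    }
    return json.dumps(record)
```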
- **Be specific**: "Explain how Python decorators work with examples" works better than "Explain decorators"
- **Use prefixes**: The model responds better to clear role definitions:
  - "As a Python expert, explain..."
  - "Write a haiku about..."
- **Break down complex requests**: Split into multiple messages for better results
- **Check settings**: Use `/settings` to verify generation parameters
- Model Download - Download more models
- Configuration - Customize your setup
- Advanced Features - Learn about prompt tuning
- Troubleshooting - Common issues and solutions