AI-powered walking tour content generator. Create customized tour guide transcripts and audio for any city's points of interest using multiple AI providers.
- 🤖 Multiple AI Providers: Choose between OpenAI (GPT-4), Anthropic (Claude), or Google (Gemini) for content generation
- 🎙️ Multiple TTS Options: Generate audio using OpenAI TTS, Google Cloud TTS, or free Edge TTS
- 📁 Organized Structure: Automatic directory organization by city and POI
- 🌍 Multilingual Support: Generate content and audio in multiple languages
- 💾 Multiple Formats: Saves plain text, SSML, and MP3 audio files
- 🎨 Beautiful CLI: Rich terminal interface with colors and interactive prompts
- 🔖 Version Tracking: Automatic versioning with complete audit trail for all transcripts
- 🔍 Research Mode: Recursive research agent that discovers dramatic stories and physical details
cd pocket-guide
# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Copy example config
cp config.example.yaml config.yaml
# Edit config.yaml and add your API keys
nano config.yaml # or use your preferred editor
API keys needed:
- OpenAI: Get from https://platform.openai.com/api-keys
- Anthropic: Get from https://console.anthropic.com/
- Google AI: Get from https://makersuite.google.com/app/apikey
- Google Cloud TTS: (Optional) Set up at https://console.cloud.google.com/
- Edge TTS: No API key needed - it's free!
The typical workflow is:
- Generate content (text) for a POI
- Generate audio (TTS) from the content
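If you prefer to drive the two steps from Python rather than the shell, here is a minimal sketch. It assumes it runs from the repository root (so src/cli.py resolves) and uses only the --city, --poi and --provider flags documented in this README. build_commands and run_workflow are hypothetical helper names, not part of the tool.

```python
from __future__ import annotations

import shlex
import subprocess
import sys


def build_commands(city: str, poi: str,
                   content_provider: str = "openai",
                   tts_provider: str = "edge") -> list[list[str]]:
    """Hypothetical helper: the generate and tts commands for one POI, in workflow order."""
    # Use the current interpreter so the commands run inside the active venv
    base = [sys.executable, "src/cli.py"]
    return [
        base + ["generate", "--city", city, "--poi", poi,
                "--provider", content_provider],
        base + ["tts", "--city", city, "--poi", poi,
                "--provider", tts_provider],
    ]


def run_workflow(city: str, poi: str,
                 content_provider: str = "openai",
                 tts_provider: str = "edge",
                 dry_run: bool = False) -> None:
    """Hypothetical helper: run content generation, then TTS, for one POI."""
    for cmd in build_commands(city, poi, content_provider, tts_provider):
        if dry_run:
            # Print a shell-quoted version of the command instead of running it
            print(shlex.join(cmd))
            continue
        # check=True raises CalledProcessError if a step exits non-zero,
        # so TTS is never attempted after a failed generate step
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_workflow("Paris", "Eiffel Tower", dry_run=True)
```

Dropping dry_run=True runs both steps for real. Stopping on the first failure is a design choice here; a batch job might prefer to catch CalledProcessError and move on to the next POI.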
Generate content for a POI:
python src/cli.py generate --city "Paris" --poi "Eiffel Tower"
Interactive mode (will prompt for details):
python src/cli.py generate
With options:
python src/cli.py generate \
--city "Paris" \
--poi "Eiffel Tower" \
--provider openai \
--description "Iconic iron lattice tower built in 1889" \
--interests "history,architecture" \
--language "English"
Available content providers: openai, anthropic, google
Generate audio for a POI:
python src/cli.py tts --city "Paris" --poi "Eiffel Tower"
With options:
python src/cli.py tts \
--city "Paris" \
--poi "Eiffel Tower" \
--provider edge \
--language "en-US"
Available TTS providers: openai, google, edge
Other commands:
List cities:
python src/cli.py cities
List POIs for a city:
python src/cli.py pois Paris
Show details for a POI:
python src/cli.py info --city "Paris" --poi "Eiffel Tower"
List available Edge TTS voices:
python src/cli.py voices
After generating content, your files will be organized like this:
content/
├── paris/
│ ├── eiffel-tower/
│ │ ├── metadata.json # POI metadata, version history
│ │ ├── transcript.txt # Latest transcript (backward compatible)
│ │ ├── transcript.ssml # Latest SSML (backward compatible)
│ │ ├── transcript_v1_2025-11-25.txt # Version 1 transcript
│ │ ├── transcript_v1_2025-11-25.ssml # Version 1 SSML
│ │ ├── transcript_v2_2025-11-26.txt # Version 2 transcript
│ │ ├── transcript_v2_2025-11-26.ssml # Version 2 SSML
│ │ ├── generation_record_v1_2025-11-25.json # Version 1 audit trail
│ │ ├── generation_record_v2_2025-11-26.json # Version 2 audit trail
│ │ └── audio.mp3 # Generated audio
│ └── louvre/
│ └── ...
└── tokyo/
└── senso-ji/
└── ...
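The directory names above look like lowercased, hyphenated versions of the city and POI names. The exact rule the tool uses isn't documented here (the tree shows senso-ji, while the examples pass "Senso-ji Temple"), so the sketch below is only an assumption for locating files from your own scripts. slugify and poi_dir are hypothetical helpers.

```python
import re
import unicodedata
from pathlib import Path


def slugify(name: str) -> str:
    """Hypothetical naming rule: lowercase, strip accents, join words with hyphens."""
    # Decompose accented characters (é -> e + combining accent), then drop non-ASCII
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def poi_dir(city: str, poi: str, root: str = "content") -> Path:
    """Hypothetical helper: where content for a city/POI pair would live."""
    return Path(root) / slugify(city) / slugify(poi)


print(poi_dir("Paris", "Eiffel Tower"))
```

If the tool's real directory names differ, trust what cities and pois report over this guess.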
Version Tracking:
- Each generation creates a new version with the format v{N}_{YYYY-MM-DD}
- All version files are preserved
- transcript.txt always points to the latest version for backward compatibility
- Generation records track all parameters, research sources, and node usage
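To pick the newest versioned transcript from a POI directory, parse N out of the v{N}_{YYYY-MM-DD} pattern and compare it numerically. This sketch assumes versioned files follow the naming shown in the tree. latest_version and latest_transcript are hypothetical helpers. If you only need the current text, reading transcript.txt directly is simpler.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Matches versioned transcripts such as transcript_v2_2025-11-26.txt (assumed naming)
VERSIONED = re.compile(r"^transcript_v(\d+)_(\d{4}-\d{2}-\d{2})\.txt$")


def latest_version(filenames: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: the versioned transcript with the highest N, or None."""
    best: Optional[str] = None
    best_n = -1
    for name in filenames:
        match = VERSIONED.match(name)
        # Compare numerically so v10 ranks above v9 (a plain string sort would not)
        if match and int(match.group(1)) > best_n:
            best, best_n = name, int(match.group(1))
    return best


def latest_transcript(directory: Path) -> Optional[Path]:
    """Hypothetical helper: full path of the newest versioned transcript in a POI directory."""
    name = latest_version(p.name for p in directory.iterdir())
    return directory / name if name else None
```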
# Generate content using OpenAI (requires API key)
python src/cli.py generate \
--city "Tokyo" \
--poi "Senso-ji Temple" \
--provider openai \
--description "Ancient Buddhist temple in Asakusa"
# Generate audio using free Edge TTS
python src/cli.py tts \
--city "Tokyo" \
--poi "Senso-ji Temple" \
--provider edge \
--language "en-US"

# Generate content with Claude (best for conversational tone)
python src/cli.py generate \
--city "Barcelona" \
--poi "Sagrada Familia" \
--provider anthropic \
--interests "architecture,history,art"
# Generate audio with OpenAI TTS
python src/cli.py tts \
--city "Barcelona" \
--poi "Sagrada Familia" \
--provider openai \
--voice "nova"

# Generate Spanish content
python src/cli.py generate \
--city "Madrid" \
--poi "Prado Museum" \
--provider google \
--language "Spanish"
# Generate Spanish audio
python src/cli.py tts \
--city "Madrid" \
--poi "Prado Museum" \
--provider edge \
--language "es-ES"

# Generate a kid-friendly script with a custom prompt
python src/cli.py generate \
--city "New York" \
--poi "Statue of Liberty" \
--custom-prompt "Create a fun, kid-friendly tour guide script about the Statue of Liberty. Include interesting facts that children would enjoy, keep it under 200 words, and use simple language."

Approximate costs:

| Service | Cost | Quality | Notes |
|---|---|---|---|
| Content Generation | | | |
| OpenAI GPT-4 | ~$0.03 per POI | Excellent | Best balance |
| Claude Sonnet | ~$0.015 per POI | Excellent | Great for nuanced content |
| Gemini Pro | ~$0.002 per POI | Good | Most cost-effective |
| Text-to-Speech | | | |
| Edge TTS | FREE | Good | No API key needed |
| OpenAI TTS | ~$0.0075 per POI | Very Good | Simple to use |
| Google Cloud TTS | ~$0.008 per POI | Excellent | Best multilingual |

Estimates are based on ~500 words per POI.
Example combinations:

Budget:
- Content: Gemini Pro (cheap)
- TTS: Edge TTS (free)
- Total: ~$0.002 per POI

Balanced:
- Content: Claude Sonnet
- TTS: OpenAI TTS
- Total: ~$0.023 per POI

Premium:
- Content: GPT-4
- TTS: Google Cloud TTS
- Total: ~$0.038 per POI
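Each total above is the content cost plus the TTS cost for one POI; a whole tour multiplies that by the number of POIs. Here is a small sketch of that arithmetic using the table's approximate figures. The content-provider keys are made-up labels rather than CLI provider names, and real prices change over time.

```python
# Approximate per-POI costs in USD, copied from the table above (~500 words per POI).
# Keys are illustrative labels, not values the CLI accepts.
CONTENT_COST = {"gpt-4": 0.03, "claude-sonnet": 0.015, "gemini-pro": 0.002}
TTS_COST = {"edge": 0.0, "openai": 0.0075, "google": 0.008}


def estimate_cost(content: str, tts: str, num_pois: int = 1) -> float:
    """Rough USD estimate for generating text and audio for num_pois POIs."""
    per_poi = CONTENT_COST[content] + TTS_COST[tts]
    # Round to hide floating-point noise; these are estimates, not billing figures
    return round(per_poi * num_pois, 4)


for combo in [("gemini-pro", "edge"), ("claude-sonnet", "openai"), ("gpt-4", "google")]:
    print(combo, estimate_cost(*combo, num_pois=20))
```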
metadata.json contains POI information and generation settings:
{
  "city": "Paris",
  "poi": "Eiffel Tower",
  "provider": "openai",
  "language": "English",
  "description": "...",
  "interests": ["history", "architecture"]
}
transcript.txt contains the plain text transcript, suitable for reading:
Welcome to the Eiffel Tower, one of the most iconic structures in the world...
transcript.ssml contains the SSML-formatted transcript for advanced TTS control:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <prosody rate="medium" pitch="medium">
    Welcome to the Eiffel Tower, one of the most iconic structures...
  </prosody>
</speak>
audio.mp3 is the generated audio file, ready for playback.
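If you want to produce the same SSML wrapper yourself (for example, to re-voice an edited transcript.txt), the main pitfall is that characters like & and < in the text must be escaped. Here is a minimal sketch that assumes the fixed template shown above. to_ssml is a hypothetical helper, not the tool's own SSML generator, and lang, rate and pitch are assumed to be trusted values.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Same structure as the transcript.ssml example above
SSML_TEMPLATE = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{lang}">\n'
    '  <prosody rate="{rate}" pitch="{pitch}">\n'
    "    {text}\n"
    "  </prosody>\n"
    "</speak>"
)


def to_ssml(text: str, lang: str = "en-US", rate: str = "medium", pitch: str = "medium") -> str:
    """Hypothetical helper: wrap plain transcript text in the SSML template."""
    # Escape &, < and > so transcript text cannot break the XML
    ssml = SSML_TEMPLATE.format(lang=lang, rate=rate, pitch=pitch, text=escape(text))
    # Parse once as a sanity check: raises ParseError if the result is not well-formed
    ET.fromstring(ssml)
    return ssml


print(to_ssml("Welcome to the Eiffel Tower & the Champ de Mars."))
```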
If Python reports missing modules, make sure you've installed the dependencies:
pip install -r requirements.txt
If API calls fail, check that config.yaml has the correct API keys and that they haven't expired.
For Google Cloud TTS, set up credentials:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
Or specify the path in config.yaml:
tts_providers:
  google:
    credentials_file: "/path/to/service-account-key.json"
Edge TTS requires an internet connection. If it fails, make sure you're online.
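Google Cloud credentials can come from either the environment variable or config.yaml. A quick check that the configured path actually points at a file can help narrow down authentication failures. This sketch assumes the environment variable takes precedence, which the tool itself may not do. find_google_credentials is a hypothetical helper, and config_value stands for whatever credentials_file holds in config.yaml (reading the YAML is left out).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_google_credentials(config_value: Optional[str] = None,
                            env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Hypothetical helper: first existing credentials file, env var checked before config."""
    # Assumed precedence: GOOGLE_APPLICATION_CREDENTIALS, then config.yaml's credentials_file
    for raw in (env.get("GOOGLE_APPLICATION_CREDENTIALS"), config_value):
        if raw:
            path = Path(raw).expanduser()
            if path.is_file():
                return path
    # Neither location points at an existing file
    return None


if __name__ == "__main__":
    found = find_google_credentials()
    print(f"Using credentials: {found}" if found else "No Google Cloud credentials file found")
```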
Create a script to generate content and audio for multiple POIs:
#!/bin/bash
CITY="Paris"
POIS=("Eiffel Tower" "Louvre Museum" "Notre-Dame" "Arc de Triomphe")
for POI in "${POIS[@]}"; do
  echo "Processing: $POI"
  # Only generate audio if content generation succeeded
  if python src/cli.py generate --city "$CITY" --poi "$POI" --provider openai; then
    python src/cli.py tts --city "$CITY" --poi "$POI" --provider edge
  else
    echo "Content generation failed for $POI; skipping audio" >&2
  fi
done
List available Edge TTS voices:
python src/cli.py voices
Use a specific voice:
python src/cli.py tts \
--city "Paris" \
--poi "Eiffel Tower" \
--provider edge \
--voice "en-GB-SoniaNeural" # British accent

See PRD.md for the full product roadmap. Phase 1 (this CLI) focuses on:
- ✅ Content generation with multiple AI providers
- ✅ TTS generation with multiple services
- ✅ Organized file structure
- ✅ Interactive CLI interface
Future phases will include:
- Progressive Web App (PWA)
- User preference customization
- Interactive maps
- Real-time tour guidance
- Offline mode
Contributions welcome! This is the foundation for a larger walking tour guide platform.
MIT License - see LICENSE file for details
For issues and questions:
- Open an issue on GitHub
- Check PRD.md for project context