Pocket Guide CLI

AI-powered walking tour content generator. Create customized tour guide transcripts and audio for any city's points of interest using multiple AI providers.

Features

🤖 Multiple AI Providers: Choose between OpenAI (GPT-4), Anthropic (Claude), or Google (Gemini) for content generation
🎙️ Multiple TTS Options: Generate audio using OpenAI TTS, Google Cloud TTS, or free Edge TTS
📁 Organized Structure: Automatic directory organization by city and POI
🌍 Multilingual Support: Generate content and audio in multiple languages
💾 Multiple Formats: Saves plain text, SSML, and MP3 audio files
🎨 Beautiful CLI: Rich terminal interface with colors and interactive prompts
🔖 Version Tracking: Automatic versioning with complete audit trail for all transcripts
🔍 Research Mode: Recursive research agent that discovers dramatic stories and physical details

Installation

1. Clone the repository

git clone <repository-url>
cd pocket-guide

2. Set up Python environment

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

3. Configure API keys

# Copy example config
cp config.example.yaml config.yaml

# Edit config.yaml and add your API keys
nano config.yaml  # or use your preferred editor

API Keys needed:

OpenAI: Get from https://platform.openai.com/api-keys
Anthropic: Get from https://console.anthropic.com/
Google AI: Get from https://makersuite.google.com/app/apikey
Google Cloud TTS: (Optional) Set up at https://console.cloud.google.com/
Edge TTS: No API key needed - it's free!

Usage

Basic Workflow

The typical workflow is:

Generate content (text) for a POI
Generate audio (TTS) from the content

Commands

Generate Content for a POI

python src/cli.py generate --city "Paris" --poi "Eiffel Tower"

Interactive mode (will prompt for details):

python src/cli.py generate

With options:

python src/cli.py generate \
  --city "Paris" \
  --poi "Eiffel Tower" \
  --provider openai \
  --description "Iconic iron lattice tower built in 1889" \
  --interests "history,architecture" \
  --language "English"

Available providers: openai, anthropic, google

Generate Audio (TTS)

python src/cli.py tts --city "Paris" --poi "Eiffel Tower"

With options:

python src/cli.py tts \
  --city "Paris" \
  --poi "Eiffel Tower" \
  --provider edge \
  --language "en-US"

Available TTS providers: openai, google, edge

List Cities

python src/cli.py cities

List POIs in a City

python src/cli.py pois Paris

Show POI Information

python src/cli.py info --city "Paris" --poi "Eiffel Tower"

List Available Voices (Edge TTS)

python src/cli.py voices

Directory Structure

After generating content, your files will be organized like this:

content/
├── paris/
│   ├── eiffel-tower/
│   │   ├── metadata.json                      # POI metadata, version history
│   │   ├── transcript.txt                     # Latest transcript (backward compatible)
│   │   ├── transcript.ssml                    # Latest SSML (backward compatible)
│   │   ├── transcript_v1_2025-11-25.txt       # Version 1 transcript
│   │   ├── transcript_v1_2025-11-25.ssml      # Version 1 SSML
│   │   ├── transcript_v2_2025-11-26.txt       # Version 2 transcript
│   │   ├── transcript_v2_2025-11-26.ssml      # Version 2 SSML
│   │   ├── generation_record_v1_2025-11-25.json  # Version 1 audit trail
│   │   ├── generation_record_v2_2025-11-26.json  # Version 2 audit trail
│   │   └── audio.mp3                          # Generated audio
│   └── louvre/
│       └── ...
└── tokyo/
    └── senso-ji/
        └── ...
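
The layout above implies a mapping from city and POI names to lower-case, hyphenated directory names (for example, "Eiffel Tower" becomes eiffel-tower). Here is a minimal sketch of such a mapping. `slugify` and `poi_dir` are hypothetical helpers, and the exact normalization rules the CLI applies are an assumption.

```python
import re
import unicodedata
from pathlib import Path


def slugify(name: str) -> str:
    """Lower-case a name and join its alphanumeric runs with hyphens (assumed rule)."""
    # Drop accents so that accented and unaccented spellings share one directory.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def poi_dir(city: str, poi: str, root: str = "content") -> Path:
    """Return the directory that would hold the files for one POI (hypothetical helper)."""
    return Path(root) / slugify(city) / slugify(poi)


print(poi_dir("Paris", "Eiffel Tower").as_posix())
```

If the real CLI normalizes names differently, only `slugify` would need to change.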

Version Tracking:

Each generation creates a new version with format v{N}_{YYYY-MM-DD}
All version files are preserved
transcript.txt always points to the latest version for backward compatibility
Generation records track all parameters, research sources, and node usage
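
The v{N}_{YYYY-MM-DD} naming scheme can be parsed and extended with a few lines of Python. This is a sketch based only on the file names shown in the directory listing. `latest_version` and `next_transcript_name` are hypothetical helpers, not part of the CLI.

```python
import re
from datetime import date

# Matches versioned transcripts such as transcript_v2_2025-11-26.txt
VERSION_RE = re.compile(r"^transcript_v(\d+)_(\d{4}-\d{2}-\d{2})\.txt$")


def latest_version(filenames: list[str]) -> int:
    """Return the highest version number among versioned transcripts, or 0 if there are none."""
    versions = [int(m.group(1)) for name in filenames if (m := VERSION_RE.match(name))]
    return max(versions, default=0)


def next_transcript_name(filenames: list[str], today: date) -> str:
    """Build the file name the next generation would get under the v{N}_{YYYY-MM-DD} scheme."""
    return f"transcript_v{latest_version(filenames) + 1}_{today.isoformat()}.txt"


existing = ["transcript.txt", "transcript_v1_2025-11-25.txt", "transcript_v2_2025-11-26.txt"]
print(next_transcript_name(existing, date(2025, 11, 27)))
```

The unversioned transcript.txt is skipped by the pattern, which matches its role as a copy of the latest version.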

Examples

Example 1: Quick Start with Free TTS

# Generate content using OpenAI (requires API key)
python src/cli.py generate \
  --city "Tokyo" \
  --poi "Senso-ji Temple" \
  --provider openai \
  --description "Ancient Buddhist temple in Asakusa"

# Generate audio using free Edge TTS
python src/cli.py tts \
  --city "Tokyo" \
  --poi "Senso-ji Temple" \
  --provider edge \
  --language "en-US"

Example 2: High Quality with Claude + OpenAI TTS

# Generate content with Claude (best for conversational tone)
python src/cli.py generate \
  --city "Barcelona" \
  --poi "Sagrada Familia" \
  --provider anthropic \
  --interests "architecture,history,art"

# Generate audio with OpenAI TTS
python src/cli.py tts \
  --city "Barcelona" \
  --poi "Sagrada Familia" \
  --provider openai \
  --voice "nova"

Example 3: Multilingual Content

# Generate Spanish content
python src/cli.py generate \
  --city "Madrid" \
  --poi "Prado Museum" \
  --provider google \
  --language "Spanish"

# Generate Spanish audio
python src/cli.py tts \
  --city "Madrid" \
  --poi "Prado Museum" \
  --provider edge \
  --language "es-ES"

Example 4: Custom Prompt

python src/cli.py generate \
  --city "New York" \
  --poi "Statue of Liberty" \
  --custom-prompt "Create a fun, kid-friendly tour guide script about the Statue of Liberty. Include interesting facts that children would enjoy, keep it under 200 words, and use simple language."

Cost Comparison

| Service | Cost | Quality | Notes |
|---|---|---|---|
| Content Generation | | | |
| OpenAI GPT-4 | ~$0.03 per POI | Excellent | Best balance |
| Claude Sonnet | ~$0.015 per POI | Excellent | Great for nuanced content |
| Gemini Pro | ~$0.002 per POI | Good | Most cost-effective |
| Text-to-Speech | | | |
| Edge TTS | FREE | Good | No API key needed |
| OpenAI TTS | ~$0.0075 per POI | Very Good | Simple to use |
| Google Cloud TTS | ~$0.008 per POI | Excellent | Best multilingual |

Estimates based on ~500 words per POI

Recommended Combos

For Development/Testing

Content: Gemini Pro (cheap)
TTS: Edge TTS (free)
Total: ~$0.002 per POI

For Production (Budget)

Content: Claude Sonnet
TTS: OpenAI TTS
Total: ~$0.023 per POI

For Production (Premium)

Content: GPT-4
TTS: Google Cloud TTS
Total: ~$0.038 per POI
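
The combo totals are the sum of one content price and one TTS price from the cost table. The sketch below reproduces that arithmetic and scales it to a batch. The dictionary keys are illustrative labels rather than CLI provider names, and the prices are the README's rough estimates, not quoted rates.

```python
from decimal import Decimal

# Per-POI estimates copied from the cost table (about 500 words per POI).
CONTENT_COST = {
    "gpt-4": Decimal("0.03"),
    "claude-sonnet": Decimal("0.015"),
    "gemini-pro": Decimal("0.002"),
}
TTS_COST = {
    "edge": Decimal("0"),
    "openai": Decimal("0.0075"),
    "google-cloud": Decimal("0.008"),
}


def combo_cost(content: str, tts: str, poi_count: int = 1) -> Decimal:
    """Estimated cost in USD of generating text and audio for poi_count POIs."""
    return (CONTENT_COST[content] + TTS_COST[tts]) * poi_count


print("Budget, 1 POI:", combo_cost("claude-sonnet", "openai"))
print("Premium, 100 POIs:", combo_cost("gpt-4", "google-cloud", 100))
```

Decimal is used instead of float so that sums of small amounts stay exact. The budget total of $0.0225 is shown rounded to ~$0.023 in the list above.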

File Formats

metadata.json

Contains POI information and generation settings:

{
  "city": "Paris",
  "poi": "Eiffel Tower",
  "provider": "openai",
  "language": "English",
  "description": "...",
  "interests": ["history", "architecture"]
}
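
Other scripts can read this file with the standard json module. The following sketch models only the keys shown above. `PoiMetadata` and `parse_metadata` are hypothetical names. Any other keys the real file contains (such as version history) are ignored here.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class PoiMetadata:
    city: str
    poi: str
    provider: str
    language: str
    description: str = ""
    interests: list[str] = field(default_factory=list)


def parse_metadata(text: str) -> PoiMetadata:
    """Parse metadata.json content, keeping only the keys this sketch models."""
    raw = json.loads(text)
    known = {f.name for f in fields(PoiMetadata)}
    return PoiMetadata(**{key: value for key, value in raw.items() if key in known})


sample = """{
  "city": "Paris",
  "poi": "Eiffel Tower",
  "provider": "openai",
  "language": "English",
  "description": "...",
  "interests": ["history", "architecture"]
}"""
meta = parse_metadata(sample)
print(meta.city, meta.poi, meta.interests)
```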

transcript.txt

Plain text transcript suitable for reading:

Welcome to the Eiffel Tower, one of the most iconic structures in the world...

transcript.ssml

SSML formatted transcript for advanced TTS control:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <prosody rate="medium" pitch="medium">
        Welcome to the Eiffel Tower, one of the most iconic structures...
    </prosody>
</speak>
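
When producing SSML by hand, the transcript text must be XML-escaped, because a stray & or < makes the document invalid. Below is a minimal sketch that wraps plain text in the same speak/prosody envelope as the sample above. `to_ssml` is a hypothetical helper, not the CLI's own generator.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text: str, lang: str = "en-US", rate: str = "medium", pitch: str = "medium") -> str:
    """Wrap plain text in a <speak>/<prosody> envelope, escaping XML special characters."""
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang={quoteattr(lang)}>\n'
        f"    <prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>\n"
        f"        {escape(text)}\n"
        f"    </prosody>\n"
        f"</speak>"
    )


print(to_ssml("Shops & cafes line the Champ de Mars."))
```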

audio.mp3

Generated audio file ready for playback.

Troubleshooting

ImportError: No module named 'openai'

Make sure you've installed dependencies:

pip install -r requirements.txt

API Key Error

Check that your config.yaml has the correct API keys and they're not expired.

Google Cloud TTS: Authentication Error

Set up Google Cloud credentials:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

Or specify the path in config.yaml:

tts_providers:
  google:
    credentials_file: "/path/to/service-account-key.json"
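
If you use the environment variable, a quick check that it is set and points at an existing file can rule out the most common cause. This is a sketch only. `credentials_problem` is a hypothetical helper; it does not read config.yaml and does not validate the key itself.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def credentials_problem(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Describe what is wrong with GOOGLE_APPLICATION_CREDENTIALS, or return None if it looks usable."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not Path(path).is_file():
        return f"credentials file not found: {path}"
    return None


print(credentials_problem() or "credentials file found")
```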

Edge TTS: No voices available

Edge TTS requires an internet connection. Make sure you're online.

Advanced Usage

Batch Processing

Create a script to generate multiple POIs:

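#!/bin/bash
# Generate a transcript and audio for each POI in one city.

CITY="Paris"
POIS=("Eiffel Tower" "Louvre Museum" "Notre-Dame" "Arc de Triomphe")

for POI in "${POIS[@]}"; do
  echo "Processing: $POI"
  # Run TTS only if content generation succeeded (assumes the CLI exits non-zero on failure).
  if python src/cli.py generate --city "$CITY" --poi "$POI" --provider openai; then
    python src/cli.py tts --city "$CITY" --poi "$POI" --provider edge
  else
    echo "Generation failed for: $POI" >&2
  fi
done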
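
The same batch can be driven from Python, which makes it easier to collect failures across a long list. This sketch uses only the generate and tts flags documented above. `build_commands` and `run_batch` are hypothetical helpers, and it assumes the CLI exits with a non-zero status on failure.

```python
import subprocess
import sys

# Invoke the CLI with the same interpreter that runs this script.
CLI = [sys.executable, "src/cli.py"]


def build_commands(city: str, poi: str, provider: str = "openai", tts_provider: str = "edge") -> list[list[str]]:
    """Return the generate and tts argument lists for one POI, in run order."""
    return [
        CLI + ["generate", "--city", city, "--poi", poi, "--provider", provider],
        CLI + ["tts", "--city", city, "--poi", poi, "--provider", tts_provider],
    ]


def run_batch(city: str, pois: list[str]) -> list[str]:
    """Run both steps for each POI and return the POIs for which a step failed."""
    failed = []
    for poi in pois:
        for cmd in build_commands(city, poi):
            if subprocess.run(cmd).returncode != 0:
                failed.append(poi)
                break  # do not attempt tts when generate fails
    return failed


# Show what would be executed without running anything.
for cmd in build_commands("Paris", "Eiffel Tower"):
    print(" ".join(cmd[1:]))
```

Calling run_batch("Paris", ["Eiffel Tower", "Louvre Museum"]) from the repository root would then run the commands and return the list of POIs that need a retry.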

Using Different Voices

List available Edge TTS voices:

python src/cli.py voices

Use a specific voice:

python src/cli.py tts \
  --city "Paris" \
  --poi "Eiffel Tower" \
  --provider edge \
  --voice "en-GB-SoniaNeural"  # British accent
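
Edge TTS voice names begin with a locale tag, as in en-GB-SoniaNeural, so a list of names can be narrowed by prefix. This is a small sketch with a hard-coded sample list. `voices_for_locale` is a hypothetical helper, and the output format of the voices command is not assumed here.

```python
def voices_for_locale(voices: list[str], locale: str) -> list[str]:
    """Keep the voice names that start with the given locale tag, e.g. "en-GB"."""
    prefix = locale.lower() + "-"
    return [voice for voice in voices if voice.lower().startswith(prefix)]


sample = ["en-GB-SoniaNeural", "en-US-AriaNeural", "es-ES-ElviraNeural"]
print(voices_for_locale(sample, "en-GB"))
```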

Roadmap

See PRD.md for the full product roadmap. Phase 1 (this CLI) focuses on:

✅ Content generation with multiple AI providers
✅ TTS generation with multiple services
✅ Organized file structure
✅ Interactive CLI interface

Future phases will include:

Progressive Web App (PWA)
User preference customization
Interactive maps
Real-time tour guidance
Offline mode

Contributing

Contributions welcome! This is the foundation for a larger walking tour guide platform.

License

MIT License - see LICENSE file for details

Support

For issues and questions:

Open an issue on GitHub
Check PRD.md for project context
