Encode arbitrary files into Commodore Datasette tape audio (WAV files) and decode Datasette recordings back into the original data. Uses pulse-width modulation (PWM) encoding modeled on the scheme used by the Commodore 64, VIC-20, and PET computers.
Lite Version (Recommended for most uses):
```bash
# Encode
python3 scripts/datasette_encode_lite.py myfile.bin tape.wav
# Decode
python3 scripts/datasette_decode_lite.py tape.wav recovered.bin
# Run tests
python3 scripts/test_roundtrip_lite.py
```
Full Version (With Commodore tape format structure):
```bash
# Encode
python3 scripts/datasette_encode.py myfile.bin tape.wav --machine c64
# Decode
python3 scripts/datasette_decode.py tape.wav recovered.bin
# Run tests
python3 scripts/test_roundtrip.py
```
The Commodore Datasette was Commodore's cassette-based storage system, first introduced in 1977 alongside the PET 2001. It recorded onto ordinary audio cassettes, but instead of storing music or speech it encoded binary data with a pulse-width scheme. Read circuitry in the Datasette itself converted the playback signal into a digital pulse train, and the computer recovered the data by timing those pulses through its dedicated cassette port.
The format was designed for robustness on low-fidelity consumer cassette decks:
- Uses three distinct pulse widths that remain easy to tell apart even when tape speed varies
- Records every block twice (original + repeated copy) so bytes damaged in one copy can be recovered from the other
- Adds a parity bit to every byte to detect single-bit errors
- Starts with a lead-in pilot tone that gives the loader time to synchronize before data arrives
Millions of Commodore programs were distributed on cassette before the 5.25" floppy drive became common. This codec lets you digitally preserve and restore those recordings.
Instead of frequency-shift keying (like the Kansas City Standard used by other 1970s computers), Commodore used pulse-width encoding:
Logic 0 (short pulse): ~352 microseconds
Logic 1 (medium pulse): ~512 microseconds
Sync marker (long pulse): ~672 microseconds
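Each pulse is drawn below as one full wave cycle. On original hardware the cycle period equals the pulse width, which works out to roughly 2840 Hz (short), 1953 Hz (medium), and 1488 Hz (long); this codec instead renders each pulse with a ~1000 Hz sine carrier:
```text
0us         352us
|           |
▀▄▄▄▄▄▄▄▄▄▄▄ ▀▄▄▄▄▄▄▄
▄▄▄▄▄▄▄▄▄▄▄▀ ▄▄▀
|           |
0-bit (short, ~0.35ms)

0us              512us
|                |
▀▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄ ▀▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▀ ▄▄▀
|                |
1-bit (medium, ~0.51ms)
```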
- Timing tolerance: Pulse widths are more robust to cassette speed variations (±15% tolerance built-in)
- No filtering required: Simple threshold detection on zero crossings works well
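- Low bandwidth: All pulse widths fall in a narrow audio band that consumer cassette decks reproduce reliably; the trade-off is speed (a typical data block takes a minute or two to load)
- Easy hardware demodulation: The Datasette's own read circuitry shapes the tape signal into clean digital pulses, so the computer only has to time the intervals between them; no complex circuitry is needed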
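The ±15% tolerance can be illustrated with a small classifier. This is a minimal sketch, assuming the nominal widths from the parameter table in this README; the name `classify_pulse` and the nearest-match rule are illustrative choices, not taken from the codec's scripts. At ±15% the medium (435-589 μs) and long (571-773 μs) windows overlap, so the sketch picks the pulse type with the smallest relative error instead of the first window that matches.

```python
from typing import Optional

# Nominal widths (microseconds) from the parameter table in this README.
NOMINAL_US = {"short": 352.0, "medium": 512.0, "long": 672.0}
TOLERANCE = 0.15  # +/-15% classification window


def classify_pulse(width_us: float) -> Optional[str]:
    """Return the closest pulse type, or None if even the closest is out of tolerance."""
    best_name, best_err = None, float("inf")
    for name, nominal in NOMINAL_US.items():
        err = abs(width_us - nominal) / nominal  # relative error against this type
        if err < best_err:
            best_name, best_err = name, err
    return best_name if best_err <= TOLERANCE else None
```

For example, a 580 μs pulse lies inside both the medium and long windows; its relative error is about 13% against medium and 14% against long, so it is classified as medium.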
A Commodore tape contains multiple blocks in this sequence:

Pilot tone:
- Duration: ~20 seconds (this codec uses 3 seconds for practical use)
- Pattern: Repeated short pulses (~352μs each)
- Purpose: Calibrates the tape deck; gives the user time to start playback; helps the loader synchronize

Countdown:
- Pattern: Countdown from 9 to 1 using repeated sync markers (long-short-medium each)
- Purpose: Distinctive pattern that the loader recognizes; counts down before data arrival
Header block:

| Offset | Size | Content |
|---|---|---|
| 0 | 2 | Load address (little-endian, typically 0x0001) |
| 2 | 16 | Filename (ASCII, space-padded) |
| 18 | 2 | Program start address (little-endian) |
| 20 | 2 | Program end address (little-endian) |
| 22 | 10 | Reserved (zeros) |
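The 32-byte header layout maps directly onto a `struct` format string. This is a sketch that assumes the field order and sizes in the header table exactly; `pack_header` and `unpack_header` are hypothetical helper names, and the codec's own scripts may build the block differently.

```python
import struct

# Field order from the header table: load address, filename, start, end, reserved.
# "<" = little-endian with no padding; H = uint16; 16s/10s = fixed-size byte strings.
HEADER_FORMAT = "<H16sHH10s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 2 + 16 + 2 + 2 + 10 = 32


def pack_header(filename: str, start: int, end: int, load_address: int = 0x0001) -> bytes:
    name = filename.encode("ascii")[:16].ljust(16, b" ")  # ASCII, space-padded to 16 bytes
    return struct.pack(HEADER_FORMAT, load_address, name, start, end, bytes(10))


def unpack_header(raw: bytes) -> dict:
    load_address, name, start, end, _reserved = struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE])
    return {
        "load_address": load_address,
        "filename": name.decode("ascii").rstrip(" "),
        "start": start,
        "end": end,
    }
```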
Data block:
- Content: Raw binary program data
- Encoding: Each byte preceded by a sync marker, LSB-first with parity bit

Repeated copy:
- Purpose: Error correction; the loader compares the original against the copy
- Pattern: Identical to the data block

End of tape:
- Pattern: Silence (signals tape end)
Each data byte is transmitted as 9 pulses (8 data bits + 1 parity bit); in the full version, a sync marker precedes each byte:
```text
Byte value: 0xA5 (10100101 binary; LSB first = 10100101, identical because the pattern is symmetric)
Sync:      long short medium (672μs, 352μs, 512μs)
Bit 0 (1): medium (512μs = logic 1)
Bit 1 (0): short (352μs = logic 0)
Bit 2 (1): medium (512μs = logic 1)
Bit 3 (0): short (352μs = logic 0)
Bit 4 (0): short (352μs = logic 0)
Bit 5 (1): medium (512μs = logic 1)
Bit 6 (0): short (352μs = logic 0)
Bit 7 (1): medium (512μs = logic 1)
Parity:    short (0xA5 has four 1-bits, already even, so the parity bit is 0)
```
Parity calculation (even parity): If the data byte has an odd number of 1-bits, the parity bit is set to 1 (medium); otherwise 0 (short).
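The LSB-first ordering and the even-parity rule can be checked with a short sketch. `byte_to_pulses` and `pulses_to_byte` are hypothetical helpers that cover only the 9 data and parity pulses (the full version's sync marker is left out), using the README's mapping of short = 0 and medium = 1.

```python
from typing import List, Tuple


def byte_to_pulses(value: int) -> List[str]:
    """8 data pulses (LSB first) plus one even-parity pulse; short = 0, medium = 1."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    bits = [(value >> i) & 1 for i in range(8)]  # bit 0 first
    bits.append(sum(bits) % 2)  # parity is 1 only when the data has an odd number of 1-bits
    return ["medium" if bit else "short" for bit in bits]


def pulses_to_byte(pulses: List[str]) -> Tuple[int, bool]:
    """Rebuild the byte and report whether even parity holds across all 9 pulses."""
    bits = [1 if p == "medium" else 0 for p in pulses[:9]]
    value = sum(bit << i for i, bit in enumerate(bits[:8]))
    return value, sum(bits) % 2 == 0
```

Calling `byte_to_pulses(0xA5)` yields medium, short, medium, short, short, medium, short, medium, followed by a short parity pulse, matching the worked example.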
| Parameter | Value | Notes |
|---|---|---|
| Carrier frequency | ~1000 Hz | Sine wave for each pulse |
| Short pulse | 352 μs | TAP format: 0x30 (48) |
| Medium pulse | 512 μs | TAP format: 0x42 (66) |
| Long pulse (sync) | 672 μs | TAP format: 0x56 (86) |
| Sample rate | 44100 Hz | Standard CD-quality audio |
| Tolerance | ±15% | Pulse width classification tolerance |
| Pilot duration | 3 seconds | Short pulses at start |
| Countdown | 9 to 1 | Sync markers before each block |
| Data encoding | LSB-first | Least significant bit transmitted first |
| Error correction | 2x redundancy | Data block repeated for comparison |
| Parity | Even | 1 parity bit per byte |
Input:
- Any arbitrary binary file (program code, data, etc.)
- No size limit in theory; the practical limit is tape speed (~2-5 KB/minute)
- No format required; raw bytes are encoded as-is
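Output (WAV):
- 16-bit PCM, mono
- 44100 Hz sample rate
- Contains the complete Commodore tape structure (pilot, header, data) when produced by the full version
- Playable on any standard audio player; the full version targets real Commodore Datasette decks (the lite output is not compatible with them)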
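TAP format:
- Binary pulse-stream format that originated in the C64 emulation community (supported by VICE), not a format defined by Commodore
- Header: "C64-TAPE-RAW" (12 bytes) + version (1 byte) + platform (1 byte) + video standard (1 byte) + reserved (1 byte) + data size (4 bytes LE), 20 bytes in total
- Pulse data: Each byte is the pulse length in CPU clock cycles divided by 8; a zero byte marks a pulse too long to fit (in version 1 it is followed by a 3-byte little-endian cycle count)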
- Not used by this codec, but documented for completeness
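For inspecting TAP images, a reader can be sketched in a few lines. It assumes a 20-byte header (12-byte "C64-TAPE-RAW" signature, a version byte, three platform/video/reserved bytes, then a 4-byte little-endian data size), pulse bytes equal to CPU cycles divided by 8, and a PAL C64 clock of 985248 Hz. `parse_tap` is a hypothetical helper and is not part of this codec.

```python
import struct
from typing import List

TAP_SIGNATURE = b"C64-TAPE-RAW"
PAL_CLOCK_HZ = 985248  # assumption: PAL C64 CPU clock; NTSC machines run at ~1022727 Hz


def parse_tap(data: bytes) -> List[float]:
    """Return pulse lengths in microseconds from a TAP image (sketch)."""
    if data[:12] != TAP_SIGNATURE:
        raise ValueError("not a TAP file")
    version = data[12]
    (size,) = struct.unpack_from("<I", data, 16)  # data size at offset 16
    body = data[20:20 + size]
    pulses = []
    i = 0
    while i < len(body):
        value = body[i]
        i += 1
        if value != 0:
            cycles = value * 8  # normal pulse: stored as cycles / 8
        elif version >= 1:
            # Version 1+: zero is followed by a 24-bit little-endian cycle count.
            cycles = int.from_bytes(body[i:i + 3], "little")
            i += 3
        else:
            # Version 0: zero only says "longer than 255 * 8 cycles"; use a lower bound.
            cycles = 256 * 8
        pulses.append(cycles * 1e6 / PAL_CLOCK_HZ)
    return pulses
```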
This codec provides two versions:

Lite version (datasette_encode_lite.py / datasette_decode_lite.py):
Best for: Testing, simple roundtrip encoding/decoding, learning about PWM.
Features:
- Pure pulse-width modulation encoding/decoding
- No Commodore tape format structure (no headers, sync markers, or pilot tones)
- 9x faster encoding than full version
- Perfect roundtrip fidelity on synthetic WAV files
- Simpler, easier to understand and modify
Limitations:
- Not compatible with real Commodore datasettes
- No pilot tone or tape format headers
- Larger WAV files, because the ~100 samples of silence between pulses are several times longer than the pulses themselves (about 15-23 samples)
Test Results:
- 9/9 tests passed (100% success rate)
- Handles all byte values (0x00-0xFF) correctly
- Zero parity errors on roundtrip tests
Full version (datasette_encode.py / datasette_decode.py):
Best for: Compatibility with Commodore hardware, preservation, reverse-engineering.
Features:
- Complete Commodore tape format with pilot tone, headers, sync markers
- Support for multiple machine types (C64, VIC-20, C16)
- Redundant data blocks for error correction
- Authentic tape structure matching original Commodore load sequence
- TAP file format compatibility layer
Status:
- Encoder fully functional
- Decoder partially implemented (pulse detection working, format parsing in progress)
- Not yet suitable for end-to-end roundtrip testing
Lite version:
```bash
# Encode a file to tape
python3 scripts/datasette_encode_lite.py mydata.bin output.wav
# Decode tape back to file
python3 scripts/datasette_decode_lite.py output.wav recovered.bin
# Test
python3 scripts/test_roundtrip_lite.py
```
Full version:
```bash
# Encode a program
python3 scripts/datasette_encode.py myprogram.bin tape.wav --machine c64 --filename MYPROG
# Short form (auto-detect filename)
python3 scripts/datasette_encode.py data.bin output.wav
# Decode a tape recording
python3 scripts/datasette_decode.py tape.wav recovered.bin
```
Encoding Options:
- --machine c64 or --machine vic20: Target machine (default: C64)
- --filename NAME: Tape filename (16 characters max; generated from the input filename if omitted)
Lite version:
```bash
# Create test data
echo "Hello, Commodore!" > test.txt
# Encode to tape
python3 scripts/datasette_encode_lite.py test.txt test.wav
# Decode back
python3 scripts/datasette_decode_lite.py test.wav test_recovered.txt
# Verify (will match perfectly)
diff test.txt test_recovered.txt && echo "Perfect match!"
# Byte-for-byte check
hexdump -C test.txt > orig.hex
hexdump -C test_recovered.txt > recovered.hex
diff orig.hex recovered.hex
```
Full version:
```bash
# Encode with full format
python3 scripts/datasette_encode.py test.txt test_full.wav --machine c64
# Decode (format structure in progress)
python3 scripts/datasette_decode.py test_full.wav test_decoded.txt
```
Shared library providing:
- Pulse classification and generation (short/medium/long)
- Byte encoding/decoding with even parity
- Sine wave pulse generation (1000 Hz carrier)
- Block structure creation (header, sync, countdown)
- TAP file format support
Simple encoder:
- Reads input file as binary
- Encodes each byte as 9 pulses (8 data + 1 parity bit)
- Adds small silence gaps between pulses for decoder separation
- Writes 16-bit mono WAV at 44100 Hz
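The encoder steps above can be sketched with only the standard library. This is an illustration under stated assumptions, not the actual datasette_encode_lite.py: each pulse is rendered as a 1000 Hz sine burst lasting the pulse width (following this README's carrier description), followed by 100 samples of silence, and each byte uses 8 LSB-first data bits plus an even-parity bit. The names `render` and `write_wav` and the amplitude value are hypothetical.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100
CARRIER_HZ = 1000.0
GAP_SAMPLES = 100              # silence after each pulse
PULSE_US = (352.0, 512.0)      # index 0 -> short (logic 0), index 1 -> medium (logic 1)
AMPLITUDE = 26000              # assumption: headroom below the 16-bit limit of 32767


def byte_bits(value: int) -> list:
    bits = [(value >> i) & 1 for i in range(8)]  # LSB first
    bits.append(sum(bits) % 2)                   # even parity bit
    return bits


def render(data: bytes) -> list:
    samples = []
    for value in data:
        for bit in byte_bits(value):
            n = round(PULSE_US[bit] * SAMPLE_RATE / 1e6)  # 16 or 23 samples
            samples.extend(
                int(AMPLITUDE * math.sin(2 * math.pi * CARRIER_HZ * k / SAMPLE_RATE))
                for k in range(n)
            )
            samples.extend([0] * GAP_SAMPLES)
    return samples


def write_wav(path: str, data: bytes) -> int:
    """Write a 16-bit mono 44100 Hz WAV; return the number of frames written."""
    samples = render(data)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return len(samples)
```

For the single byte 0xA5 this writes four 23-sample medium pulses, five 16-sample short pulses, and nine 100-sample gaps, 1072 frames in total.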
Simple decoder:
- Loads WAV file
- Detects pulse boundaries using envelope analysis
- Classifies pulse widths as short (352μs) or medium (512μs)
- Extracts bytes and verifies parity
- Handles parity errors gracefully
Advanced encoder:
- Reads input file
- Generates 3-second pilot tone (repeated short pulses)
- Creates header block with filename and addresses
- Encodes data block with sync markers
- Includes repeated copy for error correction
- Writes structured WAV file
Advanced decoder:
- Loads WAV file
- Detects and skips pilot tone
- Measures pulse widths with ±15% tolerance
- Extracts bytes from pulse sequences
- Verifies parity bits
- Compares original vs. repeated copy
- Reports comprehensive statistics
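Comparing the original against the repeated copy can be sketched as a byte-by-byte merge. This assumes the decoder yields (value, parity_ok) pairs for each copy; the policy of preferring whichever copy passed parity, and the name `merge_copies`, are illustrative and not necessarily how datasette_decode.py compares the two copies.

```python
from typing import List, Tuple

Decoded = Tuple[int, bool]  # (byte value, parity_ok) for one decoded byte


def merge_copies(first: List[Decoded], second: List[Decoded]) -> Tuple[bytes, List[int]]:
    """Merge two decoded copies; return the data and offsets neither copy got right."""
    merged = bytearray()
    unrecoverable = []
    for offset in range(max(len(first), len(second))):
        a = first[offset] if offset < len(first) else None
        b = second[offset] if offset < len(second) else None
        if a is not None and a[1]:
            merged.append(a[0])           # original copy passed parity
        elif b is not None and b[1]:
            merged.append(b[0])           # fall back to the repeated copy
        else:
            unrecoverable.append(offset)  # neither copy passed parity
            merged.append((a or b)[0])    # keep a best guess so offsets stay aligned
    return bytes(merged), unrecoverable
```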
Lite tests (test_roundtrip_lite.py):
- 9 tests covering single bytes, patterns, and random data
- 100% pass rate on lite codec
- Verifies byte-for-byte fidelity

Full tests (test_roundtrip.py):
- 10 comprehensive tests
- Tests full format with headers and sync markers
- Work in progress for complete format parsing
- Python 3.7+ (uses the standard-library modules wave, struct, argparse, and pathlib)
- numpy: Numerical array operations and audio generation (pip install numpy)
- No external audio libraries required: Uses only numpy and Python's wave module
Minimal install:
```bash
pip install numpy
```
The codec uses three pulse types to encode data:
- Short (352 μs, ~15.5 samples): Logic 0
- Medium (512 μs, ~23 samples): Logic 1
- Long (672 μs, ~30 samples): Sync/format marker (full version only)
At 44100 Hz, each pulse covers only 15-30 samples, requiring careful signal processing.
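The sample counts follow from width × sample rate, and the same arithmetic gives the frequency a pulse would have if it were one full wave cycle, as on original hardware. A minimal sketch; `pulse_budget` is a hypothetical name.

```python
def pulse_budget(width_us: float, sample_rate: int = 44100):
    """Return (samples per pulse, frequency of one full cycle of that width)."""
    samples = width_us * sample_rate / 1_000_000
    cycle_hz = 1_000_000 / width_us
    return samples, cycle_hz


for name, width in (("short", 352), ("medium", 512), ("long", 672)):
    samples, hz = pulse_budget(width)
    print(f"{name}: {samples:.1f} samples, {hz:.0f} Hz as one full cycle")
```

This prints about 15.5, 22.6, and 29.6 samples, corresponding to roughly 2841 Hz, 1953 Hz, and 1488 Hz.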
Each byte is encoded as 9 pulses:
- 8 data pulses (LSB first): short=0, medium=1
- 1 parity pulse: even parity
Example: Byte 0xA5 (10100101 binary)
```text
LSB first: 1 0 1 0 0 1 0 1
Pulses:    M S M S S M S M  + parity S (four 1-bits, already even)
```
- Compute signal envelope using absolute value
- Smooth with moving average filter
- Find regions where signal exceeds 15% of peak
- Measure duration of each region
- Classify each width as short or medium by comparing it with the nominal 352 μs and 512 μs widths
- Group 9 pulses into bytes, verify parity
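The envelope steps above can be sketched in plain Python. The specific parameters are assumptions, not taken from datasette_decode_lite.py: a 5-sample centered moving average, a threshold of 15% of the smoothed peak, and a 432 μs cut-off halfway between the 352 μs and 512 μs nominal widths. `decode_samples` is a hypothetical name; it expects float samples with silence between pulses, as the lite encoder produces.

```python
from typing import List, Tuple


def decode_samples(samples: List[float], sample_rate: int = 44100) -> List[Tuple[int, bool]]:
    """Return (byte value, parity_ok) for every complete group of 9 pulses."""
    env = [abs(x) for x in samples]                   # 1. envelope via absolute value
    half = 2                                          # 2. 5-sample centered moving average
    smooth = [sum(env[max(0, j - half): j + half + 1]) / (2 * half + 1) for j in range(len(env))]
    peak = max(smooth) if smooth else 0.0
    if peak <= 0:
        return []
    threshold = 0.15 * peak                           # 3. 15% of the smoothed peak
    widths_us = []
    run = 0
    for v in smooth + [0.0]:                          # trailing 0 closes a final run
        if v > threshold:
            run += 1
        elif run:
            widths_us.append(run * 1e6 / sample_rate)  # 4. region duration in microseconds
            run = 0
    bits = [1 if w > 432.0 else 0 for w in widths_us]  # 5. medium = 1, short = 0
    decoded = []
    for i in range(0, len(bits) - 8, 9):              # 6. groups of 9: 8 data bits + parity
        group = bits[i:i + 9]
        value = sum(bit << n for n, bit in enumerate(group[:8]))  # LSB first
        decoded.append((value, sum(group) % 2 == 0))  # even parity over all 9 bits
    return decoded
```

Smoothing stretches each measured region slightly, and the start and end of a partial carrier cycle fall below the threshold, so measured widths drift away from nominal; a midpoint cut-off tolerates that drift better than exact matching would.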
The lite encoder adds ~100 samples of silence between pulses so the decoder can distinguish individual pulses. This increases WAV file size but improves robustness.
Both versions use a 1000 Hz sine wave carrier for each pulse. This frequency is:
- Low enough to be recorded well on consumer tape decks
- High enough to be clearly distinguished from low-frequency noise
- In the same range as the original hardware, whose pulses (one wave cycle each) correspond to roughly 1.5-2.8 kHz
- BASIC Anywhere Machine: Web-based C64 emulator with datasette support
- VICE Emulator: Full-featured C64/VIC-20/PET emulator with cassette recording/playback
- Commodore Datasette Wikipedia: Historical information
- KC85 Tape Format: Similar pulse-width approach used by East German KC85 computers
- TurboTape: High-speed cassette protocol for Commodore (faster but incompatible)
MIT License. See LICENSE file for details.
Datasette Codec | Commodore C64/VIC-20/PET Tape Audio Encoder/Decoder