jbirby/Commodore-Datasette-Codec

Commodore Datasette Codec

Encode arbitrary files into Commodore Datasette tape audio WAV files and decode Datasette recordings back into the original data. Uses authentic pulse-width modulation (PWM) encoding as implemented in the Commodore 64, VIC-20, and PET computers.

Quick Start

Lite Version (Recommended for most uses):

# Encode
python3 scripts/datasette_encode_lite.py myfile.bin tape.wav

# Decode
python3 scripts/datasette_decode_lite.py tape.wav recovered.bin

# Run tests
python3 scripts/test_roundtrip_lite.py

Full Version (With Commodore tape format structure):

# Encode
python3 scripts/datasette_encode.py myfile.bin tape.wav --machine c64

# Decode
python3 scripts/datasette_decode.py tape.wav recovered.bin

# Run tests
python3 scripts/test_roundtrip.py

History

The Commodore Datasette was Commodore's cassette-based storage system, first introduced in 1977 alongside the PET-2001. Unlike audio cassettes that stored analog sound, the Datasette encoded binary data using a clever pulse-width modulation scheme that could be read back by a simple resistor-based demodulator connected to the C64/VIC-20's I/O port.

The format was designed for robustness on low-fidelity consumer cassette decks:

  • Uses distinctive pulse widths that are easy to distinguish even with tape speed variation
  • Transmits each data block twice (original + repeated copy) for error correction
  • Includes even-parity bits on every byte to catch single-bit errors
  • Has lead-in pilot tones to help calibrate the tape deck before data loads

Millions of Commodore programs were distributed on cassette before the 5.25" floppy drive became common. This codec lets you digitally preserve and restore those recordings.

Pulse-Width Modulation Encoding

Instead of frequency-shift keying (like the Kansas City Standard used by other 1970s computers), Commodore used pulse-width encoding:

Logic 0 (short pulse):   ~352 microseconds
Logic 1 (medium pulse):  ~512 microseconds
Sync marker (long pulse): ~672 microseconds

Each pulse is a complete sine wave cycle at ~1000 Hz:

      0us        352us
       |          |
       ▀▄▄▄▄▄▄▄▄▄▄▄    ▀▄▄▄▄▄▄▄
        ▄▄▄▄▄▄▄▄▄▄▄▀ ▄▄▀
       |            |
      0-bit (short, ~0.35ms)

      0us        512us
       |           |
       ▀▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄    ▀▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
        ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▀ ▄▄▀
       |                |
      1-bit (medium, ~0.51ms)

Advantages of PWM over FSK

  • Timing tolerance: Pulse widths are more robust to cassette speed variations (±15% tolerance built-in)
  • No filtering required: Simple threshold detection on zero crossings works well
  • Lower bandwidth: Entire data block can be transmitted in 1-2 minutes on a slow tape deck
  • Easy hardware demodulation: The C64's tape input connects directly to a Schmitt trigger; no complex circuitry needed

Block Structure

A Commodore tape contains multiple blocks in this sequence:

1. Pilot Tone (Leader)

  • Duration: ~20 seconds in the original format; this codec uses 3 seconds for practicality
  • Pattern: Repeated short pulses (~352μs each)
  • Purpose: Calibrates the tape deck; gives the user time to start playback; helps the loader synchronize

2. Sync/Countdown Sequence

  • Pattern: Sync markers counting down from 9 to 1; each marker is a long-short-medium pulse triplet
  • Purpose: Distinctive pattern that the loader recognizes; counts down before data arrival

3. Header Block (32 bytes)

Offset  Size  Content
------  ----  -------
0       2     Load address (little-endian, typically 0x0001)
2       16    Filename (ASCII, space-padded)
18      2     Program start address (little-endian)
20      2     Program end address (little-endian)
22      10    Reserved (zeros)
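
The layout above maps directly onto a `struct` format string. The following is a minimal sketch, not the repository's actual code: `build_header` is a hypothetical helper name, the filename is assumed to be plain ASCII, and the address values in the usage line are purely illustrative.

```python
import struct

# Little-endian: load address, 16-byte filename, start address, end address, 10 reserved bytes
HEADER_FORMAT = "<H16sHH10s"

def build_header(load_addr, filename, start_addr, end_addr):
    """Pack the 32-byte header laid out in the table above (hypothetical helper)."""
    # Truncate to 16 characters, then pad with spaces as the table specifies
    name = filename.encode("ascii")[:16].ljust(16, b" ")
    return struct.pack(HEADER_FORMAT, load_addr, name, start_addr, end_addr, b"\x00" * 10)

# Illustrative values only
header = build_header(0x0001, "MYPROG", 0x0801, 0x0900)
print(len(header))  # 32
```

Using the `<` prefix disables alignment padding, so the packed size is exactly the sum of the field sizes.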

4. Data Block

  • Content: Raw binary program data
  • Encoding: Each byte preceded by a sync marker, LSB-first with parity bit

5. Repeated Copy

  • Purpose: Error correction; loader compares original vs. copy
  • Pattern: Identical to Data Block
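
One way the two copies could be compared is sketched below. This is an assumption about a reasonable strategy, not a description of what `datasette_decode.py` does: `reconcile_copies` is a hypothetical name, and each copy is assumed to arrive as a list of `(byte, parity_ok)` tuples.

```python
def reconcile_copies(first, second):
    """Merge two decoded copies of a block (hypothetical strategy).

    Each copy is a list of (byte, parity_ok) tuples. Returns the merged bytes
    and the indices where the copies disagreed and parity could not decide.
    """
    merged, unresolved = [], []
    for index, ((a, a_ok), (b, b_ok)) in enumerate(zip(first, second)):
        if a == b:
            merged.append(a)          # copies agree
        elif a_ok and not b_ok:
            merged.append(a)          # only the first copy passed parity
        elif b_ok and not a_ok:
            merged.append(b)          # only the second copy passed parity
        else:
            merged.append(a)          # ambiguous: keep the first copy, flag it
            unresolved.append(index)
    return bytes(merged), unresolved

first = [(0x41, True), (0x42, False), (0x43, True)]
second = [(0x41, True), (0x62, True), (0x53, True)]
data, unresolved = reconcile_copies(first, second)
print(data, unresolved)  # b'AbC' [2]
```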

6. End Tone

  • Pattern: Silence (signals tape end)

Byte Encoding

Each data byte is transmitted as 9 pulses (8 data bits + 1 parity bit):

Byte value: 0xA5 (10100101 binary, LSB first = 10100101)

Sync:     long  short  medium     (pattern: 672μs, 352μs, 512μs)
Bit 0 (1): medium  (512μs = logic 1)
Bit 1 (0): short   (352μs = logic 0)
Bit 2 (1): medium  (512μs = logic 1)
Bit 3 (0): short   (352μs = logic 0)
Bit 4 (0): short   (352μs = logic 0)
Bit 5 (1): medium  (512μs = logic 1)
Bit 6 (0): short   (352μs = logic 0)
Bit 7 (1): medium  (512μs = logic 1)
Parity:   short    (even parity: 4 ones → parity bit = 0 → short)

Parity calculation (even parity): If the data byte has an odd number of 1-bits, the parity bit is set to 1 (medium); otherwise 0 (short).
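
The rule above can be sketched in a few lines. The function names here are hypothetical, and the sketch covers only the 8 data pulses plus the parity pulse, not the sync marker that precedes each byte in the full format.

```python
def even_parity_bit(byte):
    """Return 1 if the byte has an odd number of 1-bits, otherwise 0."""
    return bin(byte & 0xFF).count("1") % 2

def byte_to_pulses(byte):
    """Return 9 pulse labels: 8 data bits LSB-first, then the parity bit.

    'S' is a short pulse (logic 0), 'M' is a medium pulse (logic 1).
    """
    bits = [(byte >> i) & 1 for i in range(8)]
    bits.append(even_parity_bit(byte))
    return ["M" if bit else "S" for bit in bits]

print(" ".join(byte_to_pulses(0xA5)))  # M S M S S M S M S
```

With the parity bit included, every 9-pulse group carries an even number of medium pulses, which is what the decoder checks.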

Parameters Reference

Parameter           Value        Notes
------------------  -----------  --------------------------------------
Carrier frequency   ~1000 Hz     Sine wave for each pulse
Short pulse         352 μs       TAP format: 0x30 (48)
Medium pulse        512 μs       TAP format: 0x42 (66)
Long pulse (sync)   672 μs       TAP format: 0x56 (86)
Sample rate         44100 Hz     Standard CD-quality audio
Tolerance           ±15%         Pulse width classification tolerance
Pilot duration      3 seconds    Short pulses at start
Countdown           9 to 1       Sync markers before each block
Data encoding       LSB-first    Least significant bit transmitted first
Error correction    2x redundancy  Data block repeated for comparison
Parity              Even         1 parity bit per byte
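
A pulse classifier built from these parameters might look like the sketch below. `classify_pulse` is a hypothetical name. The ±15% windows for medium (up to 588.8 μs) and long (from 571.2 μs) overlap slightly, so this sketch picks the nominal width with the smallest relative error and only then applies the tolerance. That tie-breaking rule is an assumption, not something the codec is documented to do.

```python
NOMINAL_US = {"S": 352.0, "M": 512.0, "L": 672.0}
TOLERANCE = 0.15  # ±15% from the parameters table

def classify_pulse(duration_us):
    """Return 'S', 'M', 'L', or None if no nominal width is within tolerance."""
    # Pick the nominal width with the smallest relative deviation
    label, nominal = min(
        NOMINAL_US.items(),
        key=lambda item: abs(duration_us - item[1]) / item[1],
    )
    if abs(duration_us - nominal) / nominal <= TOLERANCE:
        return label
    return None

for width in (352, 500, 700, 420, 900):
    print(width, classify_pulse(width))
```

Durations such as 420 μs fall between the short and medium windows and are rejected rather than guessed.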

File Formats

Input File

  • Any arbitrary binary file (program code, data, etc.)
  • No size limit in theory; practical limit is tape speed (~2-5 KB/minute)
  • No format required; raw bytes are encoded as-is

Output WAV File

  • 16-bit PCM, mono
  • 44100 Hz sample rate
  • Contains complete Commodore tape structure (pilot, header, data)
  • Playable on any standard audio player; compatible with real Commodore datasette decks
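
A WAV file with these properties can be written using only Python's `wave` and `struct` modules. The sketch below assumes samples are floats in the range -1.0 to 1.0; `write_wav` is a hypothetical helper, and an in-memory buffer stands in for a file on disk.

```python
import io
import struct
import wave

SAMPLE_RATE = 44100

def write_wav(target, samples):
    """Write floats in [-1.0, 1.0] as 16-bit mono PCM at 44100 Hz."""
    frames = b"".join(
        # Clamp each sample, scale to the signed 16-bit range, pack little-endian
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16 bits
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)

buffer = io.BytesIO()
write_wav(buffer, [0.0, 0.5, -0.5, 1.0])

# Read the header back to confirm the format
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    params = wav.getparams()
print(params.nchannels, params.sampwidth, params.framerate, params.nframes)  # 1 2 44100 4
```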

TAP File Format (Reference)

  • Binary format matching the original Commodore TAP specification
  • Header: "C64-TAPE-RAW" (12 bytes) + version + machine type + reserved + data size (4 bytes LE)
  • Pulse data: Each byte represents one pulse; value = (duration_us / 8) - 1
  • Not used by this codec, but documented for completeness

Implementations

This codec provides two versions:

Lite Version (datasette_*_lite.py)

Best for: Testing, simple roundtrip encoding/decoding, learning about PWM.

Features:

  • Pure pulse-width modulation encoding/decoding
  • No Commodore tape format structure (no headers, sync markers, or pilot tones)
  • 9x faster encoding than full version
  • Perfect roundtrip fidelity on synthetic WAV files
  • Simpler, easier to understand and modify

Limitations:

  • Not compatible with real Commodore datasettes
  • No pilot tone or tape format headers
  • Slightly larger WAV files due to inter-pulse gaps

Test Results:

  • 9/9 tests passed (100% success rate)
  • Handles all byte values (0x00-0xFF) correctly
  • Zero parity errors on roundtrip tests

Full Version (datasette_*.py)

Best for: Compatibility with Commodore hardware, preservation, reverse-engineering.

Features:

  • Complete Commodore tape format with pilot tone, headers, sync markers
  • Support for multiple machine types (C64, VIC-20, C16)
  • Redundant data blocks for error correction
  • Authentic tape structure matching original Commodore load sequence
  • TAP file format compatibility layer

Status:

  • Encoder fully functional
  • Decoder partially implemented (pulse detection working, format parsing in progress)
  • Not yet suitable for end-to-end roundtrip testing

Usage

Lite Version (Recommended)

# Encode a file to tape
python3 scripts/datasette_encode_lite.py mydata.bin output.wav

# Decode tape back to file
python3 scripts/datasette_decode_lite.py output.wav recovered.bin

# Test
python3 scripts/test_roundtrip_lite.py

Full Version

# Encode a program
python3 scripts/datasette_encode.py myprogram.bin tape.wav --machine c64 --filename MYPROG

# Short form (auto-detect filename)
python3 scripts/datasette_encode.py data.bin output.wav

# Decode a tape recording
python3 scripts/datasette_decode.py tape.wav recovered.bin

Encoding Options:

  • --machine c64 or --machine vic20: Target machine (default: C64)
  • --filename NAME: Tape filename (16 chars max, auto-generated from input filename if omitted)

Example Roundtrip Tests

Lite Version (Perfect Fidelity)

# Create test data
echo "Hello, Commodore!" > test.txt

# Encode to tape
python3 scripts/datasette_encode_lite.py test.txt test.wav

# Decode back
python3 scripts/datasette_decode_lite.py test.wav test_recovered.txt

# Verify (will match perfectly)
diff test.txt test_recovered.txt && echo "Perfect match!"

# Byte-for-byte check
hexdump -C test.txt > orig.hex
hexdump -C test_recovered.txt > recovered.hex
diff orig.hex recovered.hex

Full Version (Tape Format)

# Encode with full format
python3 scripts/datasette_encode.py test.txt test_full.wav --machine c64

# Decode (format structure in progress)
python3 scripts/datasette_decode.py test_full.wav test_decoded.txt

Architecture

datasette_common.py

Shared library providing:

  • Pulse classification and generation (short/medium/long)
  • Byte encoding/decoding with even parity
  • Sine wave pulse generation (1000 Hz carrier)
  • Block structure creation (header, sync, countdown)
  • TAP file format support

Lite Implementations

datasette_encode_lite.py

Simple encoder:

  • Reads input file as binary
  • Encodes each byte as 9 pulses (8 data + 1 parity bit)
  • Adds small silence gaps between pulses for decoder separation
  • Writes 16-bit mono WAV at 44100 Hz

datasette_decode_lite.py

Simple decoder:

  • Loads WAV file
  • Detects pulse boundaries using envelope analysis
  • Classifies pulse widths as short (352μs) or medium (512μs)
  • Extracts bytes and verifies parity
  • Handles parity errors gracefully

Full Implementations

datasette_encode.py

Advanced encoder:

  • Reads input file
  • Generates 3-second pilot tone (repeated short pulses)
  • Creates header block with filename and addresses
  • Encodes data block with sync markers
  • Includes repeated copy for error correction
  • Writes structured WAV file

datasette_decode.py

Advanced decoder:

  • Loads WAV file
  • Detects and skips pilot tone
  • Measures pulse widths with ±15% tolerance
  • Extracts bytes from pulse sequences
  • Verifies parity bits
  • Compares original vs. repeated copy
  • Reports comprehensive statistics

Test Suites

test_roundtrip_lite.py

  • 9 tests covering single bytes, patterns, and random data
  • 100% pass rate on lite codec
  • Verifies byte-for-byte fidelity

test_roundtrip.py

  • 10 comprehensive tests
  • Tests full format with headers and sync markers
  • Work in progress for complete format parsing

Dependencies

  • Python 3.7+: Uses the standard-library modules wave, struct, argparse, and pathlib
  • numpy: Numerical array operations and audio generation (pip install numpy)
  • No external audio libraries required: Uses only numpy and Python's wave module

Minimal install:

pip install numpy

Technical Notes

Pulse Timing

The codec uses three pulse types to encode data:

  • Short (352 μs, ~15 samples): Logic 0
  • Medium (512 μs, ~23 samples): Logic 1
  • Long (672 μs, ~30 samples): Sync/format marker (full version only)

At 44100 Hz, each pulse covers only 15-30 samples, requiring careful signal processing.
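
The sample counts follow from multiplying each pulse width by the sample rate. A short calculation, with `samples_for` as a hypothetical helper name:

```python
SAMPLE_RATE = 44100
PULSE_US = {"short": 352, "medium": 512, "long": 672}

def samples_for(duration_us, sample_rate=SAMPLE_RATE):
    """Exact (fractional) number of samples spanned by a pulse."""
    return duration_us * sample_rate / 1_000_000

for name, width in PULSE_US.items():
    print(f"{name}: {samples_for(width):.2f} samples")
# short: 15.52 samples
# medium: 22.58 samples
# long: 29.64 samples
```

Adjacent pulse types differ by only about 7 samples, which is why the decoder needs a tolerance window rather than an exact match.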

Parity Implementation

Each byte is encoded as 9 pulses:

  • 8 data pulses (LSB first): short=0, medium=1
  • 1 parity pulse: even parity

Example: Byte 0xA5 (10100101 binary)

LSB first: 1 0 1 0 0 1 0 1
Pulses:    M S M S S M S M  (+ parity bit)
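
Decoding reverses the process: rebuild the byte from the first 8 pulses and check the ninth against the computed parity. A minimal sketch, assuming pulses have already been classified into `'S'`/`'M'` labels (`pulses_to_byte` is a hypothetical name):

```python
def pulses_to_byte(pulses):
    """Decode 9 pulse labels ('S'/'M') into (byte_value, parity_ok)."""
    if len(pulses) != 9:
        raise ValueError("expected 8 data pulses plus 1 parity pulse")
    bits = [1 if p == "M" else 0 for p in pulses]
    # Data bits arrive LSB first, so bit i has weight 2**i
    value = sum(bit << i for i, bit in enumerate(bits[:8]))
    # Even parity: the parity bit is 1 only when the data has an odd number of 1-bits
    parity_ok = (sum(bits[:8]) % 2) == bits[8]
    return value, parity_ok

print(pulses_to_byte(list("MSMSSMSMS")))  # (165, True)
print(pulses_to_byte(list("MSMSSMSMM")))  # (165, False)
```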

Lite Decoder Algorithm

  1. Compute signal envelope using absolute value
  2. Smooth with moving average filter
  3. Find regions where signal exceeds 15% of peak
  4. Measure duration of each region
  5. Classify as short or medium by comparing the duration against the nominal 352 μs and 512 μs widths
  6. Group 9 pulses into bytes, verify parity
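
Steps 1 to 4 can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the code in `datasette_decode_lite.py`: the function name and the 5-sample window are hypothetical, and the usage example feeds in rectangular bursts rather than real sine pulses to keep the arithmetic easy to follow.

```python
def find_pulse_regions(samples, window=5, threshold_ratio=0.15):
    """Return (start_index, length) for each region above 15% of the envelope peak."""
    # Step 1: envelope via absolute value
    envelope = [abs(s) for s in samples]
    # Step 2: centered moving average (window shrinks at the edges)
    half = window // 2
    smoothed = []
    for i in range(len(envelope)):
        lo = max(0, i - half)
        hi = min(len(envelope), i + half + 1)
        smoothed.append(sum(envelope[lo:hi]) / (hi - lo))
    peak = max(smoothed, default=0.0)
    if peak == 0:
        return []
    # Step 3: find runs above the threshold; step 4: record their durations
    threshold = threshold_ratio * peak
    regions, start = [], None
    for i, value in enumerate(smoothed):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            regions.append((start, i - start))
            start = None
    if start is not None:
        regions.append((start, len(smoothed) - start))
    return regions

# Two rectangular bursts (16 and 23 samples) separated by 100 samples of silence
signal = [0.0] * 50 + [1.0] * 16 + [0.0] * 100 + [1.0] * 23 + [0.0] * 50
print(find_pulse_regions(signal))  # [(48, 20), (164, 27)]
```

Note that smoothing widens each detected region by `window - 1` samples here (20 instead of 16, 27 instead of 23), so a classifier built on these lengths would need to compensate for that bias.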

Inter-Pulse Gaps

The lite encoder adds ~100 samples of silence between pulses so that the decoder can distinguish individual pulses. This increases WAV file size but improves robustness.

Pitch (Carrier Frequency)

Both versions use a 1000 Hz sine wave carrier for each pulse. This frequency is:

  • Low enough to be recorded well on consumer tape decks
  • High enough to be clearly distinguished from low-frequency noise
  • Original Commodore used similar frequencies

License

MIT License. See LICENSE file for details.


Datasette Codec | Commodore C64/VIC-20/PET Tape Audio Encoder/Decoder
