A SwiftUI waveform visualization framework for audio apps. Handles decoding, caching, FFT analysis, async loading lifecycle, and rendering in one package — with a realtime-safe audio pipeline and zero external dependencies.
Existing options force a choice: a static-image generator with no interaction (DSWaveformImage), a full audio engine dependency (AudioKit), or a UIKit view from 2015 (FDWaveformView). WaveformKit is a single SwiftUI-native package that covers the complete path from audio file to interactive waveform without any of those tradeoffs.
Key design decisions that differentiate it:
- Zero-allocation audio callbacks — FFT processing uses pre-allocated scratch buffers and vectorised vDSP operations. No heap allocations on the audio render thread.
- Async loading lifecycle —
WaveformLoaderdrivesWaveformState(.idle → .loading(progress) → .loaded / .failed) so loading, progress, and error states are first-class, not afterthoughts. - Extensible renderer protocol —
WaveformRendererlets you supply a custom drawing implementation without forking. Built-in styles are backed by the same protocol surface. - Viewport foundation —
WaveformViewportmodels the visible time range for zoom/pan; the coordinate math is complete and tested, ready for gesture wiring in the next release. - Complete, not minimal — decoding, disk caching, FFT spectrum, live mic, seek gestures, markers, accessibility, and snapshot export are all included.
- Six built-in waveform styles: bars, mirrored bars, dancing bars, line, dots, circular
- Four movement modes: progress-fill, reactive (FFT-driven), combined, idle shimmer
WaveformLoaderwithWaveformStateasync lifecycle — progress, error, and retry built inWaveformRendererprotocol for fully custom styles without forkingWaveformViewportdata model for zoom/pan (gesture wiring in Phase 3)- Seek gestures on all styles — linear drag on bar/line/dot styles, angular drag on circular
- Markers and region overlays with tap callbacks, VoiceOver children, and circular-style support
- Live microphone capture (
MicrophoneRecorder) with bounded memory and interruption handling - Three player paths:
AVPlayer(streaming/local),AVAudioPlayer(local),AVAudioEnginePlayer(local + FFT) - Real FFT spectrum via
MTAudioProcessingTapon the audio render thread (Hann-windowed, 1024-point, log-spaced bands) - Resample cache — amplitude arrays are computed once per summary; 30–60 Hz re-renders hit a dictionary lookup
- Disk waveform cache keyed by file identity
- VoiceOver: adjustable element with
X:XX of Y:YYvalue; per-marker accessibility children WaveformView.snapshot(...)→CGImagefor thumbnails and share sheets- Zero external dependencies — AVFoundation, MediaToolbox, Accelerate only
- iOS 17.0+ / macOS 14.0+
- Swift 5.9+
- Xcode 15+
Xcode: File → Add Package Dependencies → enter the repository URL.
Package.swift:
dependencies: [
.package(url: "https://github.com/GRimAce11/WaveformKit.git", from: "0.5.0")
]A runnable showcase app lives in Demo/. Open Demo/Test Waveform.xcodeproj — it references WaveformKit via local path so it builds immediately without an SPM fetch.
| Screen | What it shows |
|---|---|
| Style Gallery | All 6 styles with live movement, colour, and progress controls |
| Playback | WaveformLoader + AVPlayer + markers + seek scrubbing |
| Async Loading | WaveformState lifecycle — progress bar, cancel, retry, error |
| Microphone | Live FFT recording + interruption handling + captured-file playback |
| Custom Renderer | Three WaveformRenderer implementations with annotated source |
| Viewport | Programmatic WaveformViewport zoom and pan |
The demo generates a test tone on-device at first launch — no bundled audio files, no network required.
The recommended entry point is WaveformLoader + WaveformView(loader:). It handles the loading lifecycle, shows a skeleton shimmer while decoding, and transitions cleanly to the real waveform.
import SwiftUI
import WaveformKit
struct PlayerView: View {
let url: URL
@State private var loader = WaveformLoader()
@State private var adapter = AVPlayerAdapter(player: AVPlayer())
@State private var tap: AVPlayerAmplitudeTap?
var body: some View {
WaveformView(
loader: loader,
currentTime: adapter.currentTime,
amplitude: tap?.currentAmplitude ?? 0,
bands: tap?.bands ?? [],
style: .dancingBars(count: 32),
movement: .reactive(boost: 1.4),
colors: WaveformColors(played: .accentColor, unplayed: .secondary.opacity(0.25)),
onSeek: { adapter.seek(to: $0) }
)
.waveformStateOverlay(loader.state)
.frame(height: 80)
.task {
let player = AVPlayer(url: url)
adapter = AVPlayerAdapter(player: player)
tap = AVPlayerAmplitudeTap(player: player, bandCount: 32)
loader.load(url: url)
player.play()
}
}
}WaveformView(loader:) renders an .idle shimmer while the summary decodes, then transitions to the real waveform once loader.state == .loaded. .waveformStateOverlay adds a progress bar during loading and an error view if decoding fails.
WaveformLoader is an @Observable @MainActor class that manages the full decode cycle. It exposes a single state: WaveformState property that drives your UI reactively.
public enum WaveformState {
case idle
case loading(progress: Double) // 0.0 ... 1.0
case loaded(WaveformSummary)
case failed(Error)
}@State private var loader = WaveformLoader()
var body: some View {
switch loader.state {
case .idle:
Text("No file selected")
case .loading(let p):
ProgressView(value: p).padding()
case .loaded(let summary):
WaveformView(summary: summary, currentTime: 0)
case .failed(let error):
Label(error.localizedDescription, systemImage: "xmark.circle")
}
}// Start a decode (cancels any in-progress decode first)
loader.load(url: fileURL, targetBars: 200, useCache: true)
// Cancel a running decode — state → .idle
loader.cancel()
// Retry after a failure
loader.retry()// Useful for AudioSource.precomputed paths or unit tests
loader.set(WaveformSummary.demo(duration: 30))The original one-shot static method is preserved for source compatibility:
let summary = try await WaveformLoader.load(url: url, targetBars: 200)| Style | Appearance | Typical Use |
|---|---|---|
.bars |
Vertical bars rising from the bottom | Podcast seeker, SoundCloud |
.mirroredBars |
Bars centered on the midline | WhatsApp / iMessage voice notes |
.dancingBars |
Bouncing equalizer bars | "Now Playing" widgets, live audio |
.line |
Smooth filled mirrored curve | Minimal / editorial |
.dots |
Capsules along the midline | Voice note minimal |
.circular |
Radial bars around a centre | Album art overlay, AirPods UI |
WaveformView(summary: s, currentTime: t, style: .bars(count: 120))
WaveformView(summary: s, currentTime: t, style: .mirroredBars())
WaveformView(summary: s, currentTime: t, style: .line(thickness: 1.5))
WaveformView(summary: s, currentTime: t, style: .dots(count: 60))
WaveformView(summary: s, currentTime: t, style: .circular(count: 64))
.aspectRatio(1, contentMode: .fit)
// Live spectrum analyzer
WaveformView(summary: s, currentTime: t,
amplitude: tap.currentAmplitude, bands: tap.bands,
style: .dancingBars(count: 32), movement: .reactive())| Mode | Behaviour |
|---|---|
.progress |
Static waveform; played/unplayed colour split |
.reactive(boost:) |
Bar height scales with live amplitude; no progress fill |
.combined(boost:) |
Progress fill AND reactive amplitude on the played portion |
.idle |
Ping-pong shimmer — loading skeleton or paused state |
Implement WaveformRenderer to draw anything without modifying the library.
struct OscilloscopeRenderer: WaveformRenderer {
func draw(
context: inout GraphicsContext,
size: CGSize,
amplitudes: [Float],
progress: Double,
amplitudeScale: CGFloat,
showsProgress: Bool,
colors: WaveformColors
) {
guard amplitudes.count > 1 else { return }
let midY = size.height / 2
var path = Path()
path.move(to: CGPoint(x: 0, y: midY))
for (i, amp) in amplitudes.enumerated() {
let x = size.width * CGFloat(i) / CGFloat(amplitudes.count - 1)
let y = midY - CGFloat(amp) * amplitudeScale * midY
path.addLine(to: CGPoint(x: x, y: y))
}
context.stroke(path, with: .color(colors.played), lineWidth: 1.5)
}
}
WaveformView(
summary: summary,
currentTime: player.currentTime,
style: .custom(renderer: OscilloscopeRenderer(), barCount: 200)
)WaveformRenderer conformances must be Sendable. Stateless value types are the simplest approach; class-based renderers with mutable state need their own synchronisation (@unchecked Sendable + a lock or actor).
WaveformViewport models the currently-visible time window. The data model and coordinate arithmetic are complete; pinch-to-zoom and pan gestures are the Phase 3 deliverable.
You can drive the viewport programmatically today — useful for views that set the visible range externally:
@State private var viewport = WaveformViewport(duration: summary.duration)
WaveformView(
summary: summary,
currentTime: player.currentTime,
viewport: $viewport,
onSeek: { player.seek(to: $0) }
)
// Jump to a specific region
Button("Show bridge") {
viewport.visibleRange = 95...125 // seconds
}
// Zoom 4× centred on the current playhead
Button("Zoom in") {
let anchor = player.currentTime / summary.duration
viewport.zoom(to: 4, anchor: anchor)
}
// Reset
viewport.resetZoom()When viewport is nil (the default) or zoomFactor == 1.0, WaveformView behaves identically to previous versions — no breaking change.
For local files with live spectrum bands, AVAudioEnginePlayer conforms to both WaveformPlayerAdapter and AmplitudeTap:
let player = try AVAudioEnginePlayer(url: url, bandCount: 32)
player.play()
WaveformView(
summary: summary,
currentTime: player.currentTime,
amplitude: player.currentAmplitude,
bands: player.bands,
style: .dancingBars(count: 32),
movement: .reactive(),
onSeek: { player.seek(to: $0) }
)let player = AVPlayer(url: url)
let adapter = AVPlayerAdapter(player: player)
let tap = AVPlayerAmplitudeTap(player: player, bandCount: 32)
WaveformView(
summary: summary,
currentTime: adapter.currentTime,
amplitude: tap.currentAmplitude,
bands: tap.bands,
onSeek: { adapter.seek(to: $0) }
)MTAudioProcessingTap runs on the audio render thread. The main thread polls at 30 Hz with attack/decay envelope smoothing.
let player = try AVAudioPlayer(contentsOf: url)
let adapter = AVAudioPlayerAdapter(player: player)
let tap = AVAudioPlayerAmplitudeTap(player: player)
// tap.bands is always empty — AVAudioPlayer has no PCM accesslet markers: [WaveformMarker] = [
WaveformMarker(time: 12, color: .yellow, label: "Intro"),
WaveformMarker(time: 48, duration: 22, color: .orange, label: "Verse 1"),
WaveformMarker(time: 95, color: .pink, label: "Drop"),
]
WaveformView(
summary: summary,
currentTime: t,
style: .mirroredBars(count: 120),
markers: markers,
onSeek: { player.seek(to: $0) },
onMarkerTap: { marker in player.seek(to: marker.time) }
)- Point markers (
duration: 0): vertical line + dot. Tap →onMarkerTap. - Region markers (
duration > 0): translucent band + edge stripe. Tap inside or near an edge →onMarkerTap. - A drag always fires
onSeek;onMarkerTaponly fires on a tap (no drag). - All six styles support markers.
.circularrenders radial ticks and arc regions with arc-length hit-testing.
@State private var recorder = MicrophoneRecorder(
bandCount: 32,
binsPerSecond: 20,
maximumDuration: 60,
outputURL: FileManager.default.temporaryDirectory
.appendingPathComponent("memo.caf")
)
var body: some View {
WaveformView(
summary: recorder.summary,
currentTime: recorder.currentTime,
amplitude: recorder.currentAmplitude,
bands: recorder.bands,
style: .mirroredBars(count: 80),
movement: .reactive(boost: 1.4)
)
.frame(height: 60)
.task { try? await recorder.start() }
}Requires NSMicrophoneUsageDescription in Info.plist. Interruptions and route changes are handled automatically. Set autoResumeAfterInterruption: false to stay paused after a phone call.
Memory is bounded: when the amplitude array exceeds maxBins (default 4000), adjacent pairs are averaged in-place. A 24-hour recording stays under 16 KB.
AVPlayerAmplitudeTap and AVAudioEnginePlayer run a 1024-point Hann-windowed FFT on the audio render thread. Bands are logarithmically spaced from 40 Hz to 16 kHz.
let tap = AVPlayerAmplitudeTap(player: player, bandCount: 32)
// tap.bands [Float] — bandCount values in [0, 1]
// tap.currentAmplitude Float — smoothed RMS across channelsWhen tap.bands.count >= count, .dancingBars drives each bar from its own frequency range. Otherwise it falls back to amplitude-driven wobble — still visually convincing for voice/podcast content.
// Automatic via WaveformLoader:
loader.load(url: url) // useCache: true by default
// Manual:
let summary = try await WaveformLoader.load(url: url, targetBars: 200)
// Invalidation:
WaveformCache.remove(url: url, targetBars: 200)
WaveformCache.clear()Cache key: filename + file size + mtime + bar count + format version. Stored in ~/Library/Caches/WaveformKit/. No automatic eviction — see Known Limitations.
if let cg = WaveformView.snapshot(
summary: summary,
size: CGSize(width: 300, height: 60),
style: .mirroredBars(count: 80),
colors: WaveformColors(played: .accentColor)
) {
let image = UIImage(cgImage: cg) // iOS
}Use this for List / LazyVStack cells instead of a live Canvas per row.
WaveformView is a single adjustable VoiceOver element. Value: "0:42 of 3:14". Swipe up/down seeks by 5 % of duration and routes through onSeek. Marker count is appended to the label.
Each WaveformMarker is exposed as its own focusable child. Phrasing: "Intro, at 0:12" (point), "Verse, 0:48 to 1:10" (region). Reuse labels in custom wrappers: WaveformView.markerAccessibilityLabel(for:).
┌───────────────────────────────────────────────────────────────────┐
│ Decoding + Caching │
│ │
│ AudioDecoder (AVAssetReader + vDSP_rmsqv per-bar RMS) │
│ └──► WaveformSummary (amplitudes, duration, sampleRate, id) │
│ └──► WaveformCache (disk, file-identity key) │
│ │ │
│ WaveformLoader (@Observable, async, cancellable)│
│ └──► WaveformState │
└─────────────────────────────────┬─────────────────────────────────┘
│
┌─────────────────────────────────▼─────────────────────────────────┐
│ Rendering │
│ │
│ WaveformView │
│ ◄── WaveformSummary │
│ ◄── PlayerAdapter.currentTime (30 Hz, @Observable) │
│ ◄── AmplitudeTap.currentAmplitude + .bands │
│ ◄── WaveformViewport? (visible time range) │
│ │
│ ResampleCache (keyed by summary.id + barCount + slice) │
│ WaveformRenderer protocol → 6 built-in + .custom(any Renderer) │
└─────────────────────────────────┬─────────────────────────────────┘
│
┌─────────────────────────────────▼─────────────────────────────────┐
│ Realtime Audio Pipeline │
│ │
│ MTAudioProcessingTap / AVAudioEngine installTap (render thread) │
│ └──► FFTAnalyzer (1024-pt vDSP, ring buffer, zero allocs) │
│ └──► AmplitudeTapStorage │
│ ├── bandScratch (audio thread only) │
│ └── os_unfair_lock → bands (main thread) │
└───────────────────────────────────────────────────────────────────┘
Decoding — AVAssetReader reads the file once, computing RMS per bar via vDSP_rmsqv. The result is serialised to disk. Future opens skip decoding entirely.
Rendering — WaveformView body runs on the main thread. resampleAmplitudes runs once per unique (summary.id, barCount, visibleSlice) and the result is cached in ResampleCache. Under reactive/dancing-bars movement (30–60 Hz body evaluations), re-renders hit the cache — no [Float] allocation per frame.
Realtime pipeline — All FFT work runs on the audio render thread using pre-allocated buffers. The only synchronisation is a single os_unfair_lock held for O(bandCount) scalar stores (~20 ns for 32 bands). No Objective-C, no Swift runtime overhead, no heap allocations in the process callback.
Measured on Apple Silicon (macOS 14, Release, 10 000 iterations):
| Operation | Measured | Budget |
|---|---|---|
FFTAnalyzer.computeBands (1024-pt, 32 bands) |
6.4 µs/call | < 200 µs |
FFTAnalyzer.push (512 frames) |
0.20 µs/call | < 20 µs |
At 44.1 kHz with 1024-frame buffers the audio callback fires ~43 times per second (23 ms period). The FFT consumes under 0.03 % of the available render-thread budget. On A12 Bionic the same operations take roughly 2–4× longer but remain well within the budget.
These numbers are captured by the test suite and will fail CI if they regress beyond the stated budgets.
- No heap allocations — all buffers allocated once in
AmplitudeTapStorage.init. - No Swift runtime overhead — hot-path copies use
UnsafeMutablePointer.update(from:count:)(compiles tomemmove); windowing usesvDSP_vmul. - Short lock window —
os_unfair_lockheld only for the scalar copy of band values to shared storage. - No Objective-C in the process callback.
WaveformColors(
played: .pink,
unplayed: .gray.opacity(0.25),
playedGradient: Gradient(colors: [.pink, .purple]),
unplayedGradient: nil // optional
)playedGradient overrides played when set. Gradients run horizontally on linear styles, vertically on circular.
#Preview {
WaveformView(
summary: .demo(duration: 30, bars: 120),
currentTime: 12,
style: .bars(count: 120),
colors: WaveformColors(played: .accentColor)
)
.frame(height: 80)
.padding()
}WaveformSummary.demo(duration:bars:seed:) generates a deterministic envelope-shaped waveform without a real audio file.
AVAudioPlayerhas no FFT —AVAudioPlayerAmplitudeTap.bandsis always empty. UseAVAudioEnginePlayerfor the local-file + spectrum combination.- Exotic PCM formats — the audio tap handles
Float32andInt16.Int24,Int32, and big-endian variants are skipped (amplitude and bands read 0). - iOS 17 / macOS 14 floor —
@Observablerequires iOS 17+. An iOS 16 backport is on the roadmap. - Long recordings —
MicrophoneRecorderhalves the amplitude array when it exceedsmaxBins(default 4000). Temporal resolution on the oldest portions degrades after each halving cycle. - No zoom gestures yet —
WaveformViewportis complete butMagnificationGesturewiring ships in Phase 3. - No automatic disk-cache eviction —
WaveformCachegrows until cleared. LRU eviction ships in Phase 3.
Phase 3 — Zoom, Pan, Cache Eviction
MagnificationGesture+DragGesturewired toWaveformViewportScrollView-aware gesture passthroughWaveformSummaryPyramid— multi-resolution amplitude arrays for efficient high-zoom rendering- Disk cache LRU eviction with configurable size budget
Phase 4 — Rendering Evolution
- Optional Metal-backed renderer path for spectrograms and large bar counts
- ProMotion 120 Hz
TimelineViewfor.dancingBars
Phase 5 — Editor-Grade Tooling
- Region selection gesture
- RTL layout support
- Explicit watchOS / tvOS targets
- iOS 16 backport (
ObservableObject)
Phases 3–5 are sequenced by stability: each phase is tested and considered stable before the next begins.