author skal <pascal.massimino@gmail.com> 2026-02-17 16:12:21 +0100
committer skal <pascal.massimino@gmail.com> 2026-02-17 16:12:21 +0100
commit 03579c4a33ab3955ff9924a6dcd882fe91dd9aaa
tree be458d2ac4bc0d7160be8a18526b4e9157af33a5 /doc
parent e3f0b002c0998c8553e782273b254869107ffc0f
feat(mq_editor): Phase 1 - MQ extraction and visualization (SPECTRAL_BRUSH_2)

Implement McAulay-Quatieri sinusoidal analysis tool for audio compression.

New files:
- doc/SPECTRAL_BRUSH_2.md: Complete design doc (MQ algorithm, data format, synthesis, roadmap)
- tools/mq_editor/index.html: Web UI (file loader, params, canvas)
- tools/mq_editor/fft.js: Radix-2 Cooley-Tukey FFT (from spectral_editor)
- tools/mq_editor/mq_extract.js: MQ algorithm (peak detection, tracking, bezier fitting)
- tools/mq_editor/viewer.js: Visualization (spectrogram, partials, zoom, axes)
- tools/mq_editor/README.md: Usage and implementation status

Features:
- Load WAV → extract sinusoidal partials → fit cubic bezier curves
- Time-frequency spectrogram with hot colormap (0-16 kHz)
- Horizontal zoom (mousewheel) around mouse position
- Axis ticks with labels (time: seconds, freq: Hz/kHz)
- Mouse tooltip showing time/frequency coordinates
- Real-time adjustable MQ parameters (FFT size, hop, threshold)

Algorithm:
- STFT with Hann windows (2048 FFT, 512 hop)
- Peak detection with parabolic interpolation
- Birth/death/continuation tracking (50 Hz tolerance)
- Cubic bezier fitting (4 control points per trajectory)

Next: Phase 2 (JS synthesizer for audio preview)

handoff(Claude): MQ editor Phase 1 complete. Ready for synthesis implementation.
Diffstat (limited to 'doc')
-rw-r--r-- doc/SPECTRAL_BRUSH_2.md 523
1 files changed, 523 insertions, 0 deletions
diff --git a/doc/SPECTRAL_BRUSH_2.md b/doc/SPECTRAL_BRUSH_2.md
new file mode 100644
index 0000000..76e49db
--- /dev/null
+++ b/doc/SPECTRAL_BRUSH_2.md
@@ -0,0 +1,523 @@
+# Spectral Brush Editor v2: MQ-Based Sinusoidal Synthesis
+
+**Status:** Design Phase
+**Target:** Procedural audio compression for short samples (drums, piano, impacts)
+**Replaces:** Spectrogram-based synthesis (poor audio quality)
+
+---
+
+## Overview
+
+McAulay-Quatieri (MQ) sinusoidal modeling for audio compression. Extract frequency/amplitude trajectories as bezier curves, apply "style" via replicas (harmonics, spread, jitter), synthesize to baked PCM buffers.
+
+**Key Features:**
+- **50-100× compression:** WAV → bezier curves + replica params → C++ structs
+- **Web-based editor:** Real-time MQ extraction, curve editing, synthesis preview
+- **Procedural synthesis:** Bandwidth-enhanced oscillators with phase jitter and frequency spread
+- **Tracker integration:** MQ samples triggered as assets, future pitch/amp modulation
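As a rough sanity check on the compression figure, the sketch below compares raw PCM size with the size of the curve parameters. It assumes 16-bit mono PCM as the baseline and 4-byte floats for every stored value, following the `MQPartial` layout in this document (two 8-float curves plus five replica parameters). The three replica ratios per partial and the helper names are illustrative choices, not part of the design.

```python
def pcm_bytes(sample_rate: int, duration: float, bytes_per_sample: int = 2) -> int:
    """Size of the raw mono PCM baseline (16-bit samples assumed)."""
    return int(sample_rate * duration) * bytes_per_sample


def mq_bytes(num_partials: int, replicas_per_partial: int = 3) -> int:
    """Size of the MQ parameters with every value stored as a 4-byte float."""
    # freq curve (8) + amp curve (8) + replica params (5) + replica ratios
    floats_per_partial = 8 + 8 + 5 + replicas_per_partial
    return num_partials * floats_per_partial * 4


def compression_ratio(sample_rate: int, duration: float, num_partials: int) -> float:
    """Raw PCM bytes divided by MQ parameter bytes."""
    return pcm_bytes(sample_rate, duration) / mq_bytes(num_partials)


if __name__ == "__main__":
    for n in (2, 10, 20):
        print(n, "partials:", round(compression_ratio(32000, 1.5, n), 1))
```

Under these assumptions, a 1.5 s sample at 32 kHz lands in the 50-100× range when it needs roughly 10 to 20 partials; a two-partial sample compresses far more.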
+
+---
+
+## Architecture
+
+### Data Flow
+
+```
+┌─────────────────────────────────────────────────────┐
+│ Web Editor (tools/mq_editor/)                       │
+├─────────────────────────────────────────────────────┤
+│ Input: WAV or saved .txt params                     │
+│        ↓                                            │
+│ MQ Extraction: FFT → Peak Tracking → Bezier Fitting │
+│        ↓                                            │
+│ Editing: Drag control points, adjust replicas       │
+│        ↓                                            │
+│ JS Synthesizer: Preview original vs. synthesized    │
+│        ↓                                            │
+│ Export: .txt params + generated .cc code            │
+└─────────────────────────────────────────────────────┘
+         ↓
+┌─────────────────────────────────────────────────────┐
+│ C++ Demo (src/audio/)                               │
+├─────────────────────────────────────────────────────┤
+│ Build: .txt → generated .cc (MQSample structs)      │
+│        ↓                                            │
+│ Synthesis: Bake PCM at init (CPU, future GPU)       │
+│        ↓                                            │
+│ AudioEngine: Register as sample asset               │
+│        ↓                                            │
+│ Tracker: Trigger via patterns (future modulation)   │
+└─────────────────────────────────────────────────────┘
+```
+
+---
+
+## Data Model
+
+### Per-Partial Representation
+
+Each sinusoidal partial stores:
+
+```
+Partial {
+    freq_curve: CubicBezier     // Frequency trajectory (Hz vs. seconds)
+    amp_curve:  CubicBezier     // Amplitude envelope (0-1 vs. seconds)
+    replicas:   ReplicaConfig   // Harmonic/inharmonic copies
+}
+
+CubicBezier {
+    (t0, v0), (t1, v1), (t2, v2), (t3, v3)   // 4 control points
+}
+
+ReplicaConfig {
+    offsets:      [ratio1, ratio2, ...]   // Frequency ratios (1.0, 2.01, 0.5, ...)
+    decay_alpha:  float                   // Amplitude decay: exp(-α·|f-f₀|), f in Hz
+    jitter:       float [0-1]             // Phase randomization amount (1 = full 2π range)
+    spread_above: float [0-1]             // Upward frequency spread, fraction of f₀·ratio (0.02 = +2%)
+    spread_below: float [0-1]             // Downward frequency spread, fraction of f₀·ratio
+    bandwidth:    float [0-1]             // Noise bandwidth, ± fraction of f
+}
+```
+
+### Text Format (.txt)
+
+Stored in `workspaces/main/mq_samples/`:
+
+```
+# MQ Sample: drum_kick.txt
+sample_rate 32000
+duration 1.5
+
+# Global defaults (optional, can override per partial)
+replica_defaults
+ decay_alpha 0.1
+ jitter 0.05
+ spread_above 0.02
+ spread_below 0.02
+ bandwidth 0.01
+end
+
+# Partial 0: fundamental
+partial
+ # Frequency bezier (seconds, Hz): t0 f0 t1 f1 t2 f2 t3 f3
+ freq_curve 0.0 60.0 0.2 58.0 0.8 55.0 1.5 50.0
+
+ # Amplitude bezier (seconds, 0-1): t0 a0 t1 a1 t2 a2 t3 a3
+ amp_curve 0.0 0.0 0.05 1.0 0.5 0.3 1.5 0.0
+
+ # Replica frequency ratios
+ replicas 1.0 2.01 3.03
+
+ # Override defaults (optional)
+ decay_alpha 0.15
+ jitter 0.08
+ spread_above 0.03
+ spread_below 0.01
+ bandwidth 0.02
+end
+
+# Partial 1: overtone
+partial
+ freq_curve 0.0 180.0 0.2 178.0 0.8 175.0 1.5 170.0
+ amp_curve 0.0 0.0 0.05 0.6 0.5 0.2 1.5 0.0
+ replicas 1.0 1.99
+end
+```
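A minimal sketch of how this text layout could be read, for example as the front half of the planned code generator. The function name `parse_mq_text` and the dict-based result are assumptions made for illustration; the sketch does no validation and expects `replica_defaults` to appear before the first `partial`.

```python
def parse_mq_text(text: str) -> dict:
    """Parse the .txt layout into plain dicts (hypothetical helper, no validation)."""
    sample = {"defaults": {}, "partials": []}
    block = None  # dict currently being filled, or None at top level
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        key, *values = line.split()
        if key == "replica_defaults":
            block = sample["defaults"]
        elif key == "partial":
            block = dict(sample["defaults"])  # each partial starts from the global defaults
            sample["partials"].append(block)
        elif key == "end":
            block = None
        elif key in ("freq_curve", "amp_curve", "replicas"):
            block[key] = [float(v) for v in values]
        elif block is not None:
            block[key] = float(values[0])  # decay_alpha, jitter, spread_*, bandwidth
        elif key == "sample_rate":
            sample[key] = int(values[0])
        else:
            sample[key] = float(values[0])  # duration
    return sample
```

Copying the defaults into each partial before reading its own keys gives the "override per partial" behaviour without a separate merge step.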
+
+### Generated C++ Code
+
+Stored in `src/generated/mq_<name>.cc`:
+
+```cpp
+// Auto-generated from mq_samples/drum_kick.txt
+// DO NOT EDIT
+
+struct MQBezier {
+    float t0, v0, t1, v1, t2, v2, t3, v3;
+};
+
+struct MQPartial {
+    MQBezier freq;
+    MQBezier amp;
+    const float* replicas;
+    int num_replicas;
+    float decay_alpha;
+    float jitter;
+    float spread_above;
+    float spread_below;
+    float bandwidth;
+};
+
+static const float drum_kick_replicas_0[] = {1.0f, 2.01f, 3.03f};
+static const float drum_kick_replicas_1[] = {1.0f, 1.99f};
+
+static const MQPartial drum_kick_partials[] = {
+    {
+        {0.0f, 60.0f, 0.2f, 58.0f, 0.8f, 55.0f, 1.5f, 50.0f},
+        {0.0f, 0.0f, 0.05f, 1.0f, 0.5f, 0.3f, 1.5f, 0.0f},
+        drum_kick_replicas_0, 3,
+        0.15f, 0.08f, 0.03f, 0.01f, 0.02f
+    },
+    {
+        {0.0f, 180.0f, 0.2f, 178.0f, 0.8f, 175.0f, 1.5f, 170.0f},
+        {0.0f, 0.0f, 0.05f, 0.6f, 0.5f, 0.2f, 1.5f, 0.0f},
+        drum_kick_replicas_1, 2,
+        0.1f, 0.05f, 0.02f, 0.02f, 0.01f
+    }
+};
+
+struct MQSample {
+    int sample_rate;
+    float duration;
+    const MQPartial* partials;
+    int num_partials;
+};
+
+// `extern` gives the const object external linkage, so other
+// translation units (e.g. audio_init) can reference it.
+extern const MQSample ASSET_MQ_DRUM_KICK = {
+    32000, 1.5f, drum_kick_partials, 2
+};
+```
+
+---
+
+## McAulay-Quatieri Algorithm
+
+### Phase 1: Peak Detection
+
+STFT with overlapping windows:
+
+```
+For each frame (hop = 512 samples):
+  1. Apply Hann window, then FFT (size = 2048)
+  2. Magnitude spectrum |X[k]|
+  3. Detect peaks: local maxima above threshold
+  4. Extract (frequency, amplitude, phase) via parabolic interpolation
+```
+
+**Parameters:**
+- `fft_size`: 2048 (adjustable 1024-4096)
+- `hop_size`: 512 (75% overlap)
+- `peak_threshold`: -60 dB (adjustable)
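The parabolic interpolation step can be sketched as follows. This assumes the common variant that fits a parabola through the dB magnitudes of the peak bin and its two neighbours; the function name `refine_peak` and its signature are hypothetical.

```python
def refine_peak(mag_db, k, sample_rate, fft_size):
    """Refine the peak at bin k with a parabola through bins k-1, k, k+1.

    mag_db is the magnitude spectrum in dB; returns (frequency_hz, peak_db).
    """
    a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    denom = a - 2.0 * b + c
    # Vertex offset in bins; stays within [-0.5, 0.5] when bin k is a true local maximum
    offset = 0.0 if denom == 0.0 else 0.5 * (a - c) / denom
    freq_hz = (k + offset) * sample_rate / fft_size
    peak_db = b - 0.25 * (a - c) * offset
    return freq_hz, peak_db
```

With a 2048-point FFT at 32 kHz the bin spacing is 15.625 Hz, so the sub-bin offset is what lets the tracker resolve the small frequency drifts seen in the example curves.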
+
+### Phase 2: Trajectory Tracking
+
+Link peaks across frames into continuous partials:
+
+```
+Birth/Death/Continuation model:
+  - Continuation: match peak to existing partial if |f_new - f_old| < threshold
+  - Birth: start a new partial if an unmatched peak persists 2+ frames
+  - Death: end a partial if it has no match for 2+ frames
+```
+
+**Tracking threshold:** 50 Hz (adjustable)
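A simplified sketch of the birth/death/continuation model, using greedy nearest-frequency matching. The function name, the dict-based track records, and the way the "persists 2+ frames" rule is approximated (dropping tracks shorter than two points at the end) are assumptions of this sketch rather than the editor's actual logic.

```python
def track_partials(frames, max_jump_hz=50.0, max_gap=2):
    """Greedy nearest-frequency tracking.

    frames: list of per-frame peak frequency lists (Hz).
    Returns a list of tracks, each a list of (frame_index, frequency).
    """
    active, finished = [], []  # active entries: {"points": [...], "missed": int}
    for i, peaks in enumerate(frames):
        unmatched = list(peaks)
        for track in active:
            last_freq = track["points"][-1][1]
            best = min(unmatched, key=lambda f: abs(f - last_freq), default=None)
            if best is not None and abs(best - last_freq) < max_jump_hz:
                track["points"].append((i, best))  # continuation
                track["missed"] = 0
                unmatched.remove(best)
            else:
                track["missed"] += 1
        # Death: retire tracks with no match for max_gap consecutive frames
        finished += [t["points"] for t in active if t["missed"] >= max_gap]
        active = [t for t in active if t["missed"] < max_gap]
        # Birth: every unmatched peak starts a candidate track
        active += [{"points": [(i, f)], "missed": 0} for f in unmatched]
    finished += [t["points"] for t in active]
    # Keep only candidates that persisted for at least two frames
    return [points for points in finished if len(points) >= 2]
```

Greedy matching is order-dependent when two partials cross within the tolerance; a global assignment would be more robust, at the cost of more code.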
+
+### Phase 3: Bezier Curve Fitting
+
+Fit cubic bezier to each partial's trajectory:
+
+```
+Input:  [(t_1, f_1), (t_2, f_2), ..., (t_N, f_N)]   // tracked samples
+Output: 4 control points P0..P3 minimizing least-squares error
+
+Algorithm:
+  1. Fix endpoints: P0 = first sample, P3 = last sample
+  2. Solve for P1, P2 via linear least squares
+  3. Repeat for amplitude trajectory
+```
+
+**Error threshold:** Auto-fit to minimize control points (future: user-adjustable simplification)
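One way the fixed-endpoint fit could be carried out is sketched below. It assumes the curve parameter is normalized time, which makes the problem linear in the two inner control values and reduces it to a 2×2 system. Placing the inner control points at 1/3 and 2/3 of the time span, and the name `fit_cubic_bezier`, are assumptions of this sketch.

```python
def fit_cubic_bezier(times, values):
    """Least-squares cubic Bezier through a trajectory, endpoints fixed.

    Returns [(t0, v0), (t1, v1), (t2, v2), (t3, v3)] with t1 and t2 placed
    at 1/3 and 2/3 of the time span (an assumption of this sketch).
    """
    t_first, t_last = times[0], times[-1]
    span = t_last - t_first
    v0, v3 = values[0], values[-1]
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t, v in zip(times, values):
        u = (t - t_first) / span
        b0, b1 = (1 - u) ** 3, 3 * (1 - u) ** 2 * u
        b2, b3 = 3 * (1 - u) * u ** 2, u ** 3
        resid = v - b0 * v0 - b3 * v3  # part the inner control values must explain
        s11 += b1 * b1
        s12 += b1 * b2
        s22 += b2 * b2
        r1 += b1 * resid
        r2 += b2 * resid
    det = s11 * s22 - s12 * s12
    if len(times) < 4 or abs(det) < 1e-12:
        # Too few interior samples: fall back to a straight line
        v1, v2 = v0 + (v3 - v0) / 3, v0 + 2 * (v3 - v0) / 3
    else:
        # Solve the 2x2 normal equations by Cramer's rule
        v1 = (r1 * s22 - r2 * s12) / det
        v2 = (r2 * s11 - r1 * s12) / det
    return [(t_first, v0), (t_first + span / 3, v1),
            (t_first + 2 * span / 3, v2), (t_last, v3)]
```

Calling it once with the frequency samples and once with the amplitude samples yields the two curves stored per partial.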
+
+---
+
+## Synthesis Model
+
+### Replica Oscillator Bank
+
+For each partial at time `t`:
+
+```python
+# Evaluate bezier curves
+f0 = eval_bezier(partial.freq_curve, t)
+A0 = eval_bezier(partial.amp_curve, t)
+
+# For each replica offset ratio (r indexes the replica's running phase)
+for r, ratio in enumerate(partial.replicas):
+    # Frequency spread (asymmetric randomization)
+    spread = random.uniform(-partial.spread_below, +partial.spread_above)
+    f = f0 * ratio * (1.0 + spread)
+
+    # Amplitude decay
+    A = A0 * exp(-partial.decay_alpha * abs(f - f0))
+
+    # Phase: accumulate 2*pi*f per sample, since f varies over time
+    # (2*pi*f*t is only correct for constant f). Jitter is pseudo-random;
+    # the baked C++ version seeds it per sample.
+    phase_acc[r] += 2*pi*f / sample_rate
+    phase = phase_acc[r] + partial.jitter * random.uniform(0, 2*pi)
+
+    # Base sinusoid
+    sample += A * sin(phase)
+
+    # Bandwidth-enhanced noise (optional)
+    if partial.bandwidth > 0:
+        noise_bw = f * partial.bandwidth
+        sample += A * bandlimited_noise(f - noise_bw, f + noise_bw)
+```
+
+### Bezier Evaluation (Cubic)
+
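+Direct evaluation of the cubic in Bernstein form (the same value De Casteljau's algorithm produces):
+
+```cpp
+float eval_bezier(const MQBezier& b, float t) {
+    // Normalize t to [0, 1]; a zero-length curve evaluates to its first value
+    float span = b.t3 - b.t0;
+    if (span <= 0.0f) return b.v0;
+    float u = (t - b.t0) / span;
+    u = clamp(u, 0.0f, 1.0f);
+
+    // Cubic Bernstein polynomial in u. Note: u is normalized time, so t1 and t2
+    // are not used; this matches the parametric curve exactly only when the
+    // control points are evenly spaced in time.
+    float u1 = 1.0f - u;
+    return u1*u1*u1 * b.v0 +
+           3*u1*u1*u * b.v1 +
+           3*u1*u*u * b.v2 +
+           u*u*u * b.v3;
+}
+```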
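For comparison, here is a small sketch of De Casteljau's repeated linear interpolation next to the closed-form cubic that `eval_bezier` uses. Both take the four control values and a parameter `u` in [0, 1]; the function names are illustrative.

```python
def de_casteljau(values, u):
    """Evaluate a Bezier curve of any degree by repeated linear interpolation."""
    pts = list(values)
    while len(pts) > 1:
        pts = [(1 - u) * a + u * b for a, b in zip(pts, pts[1:])]
    return pts[0]


def bernstein_cubic(v0, v1, v2, v3, u):
    """Closed-form cubic, the same expression eval_bezier evaluates."""
    w = 1 - u
    return w**3 * v0 + 3 * w**2 * u * v1 + 3 * w * u**2 * v2 + u**3 * v3
```

For the kick's frequency values (60, 58, 55, 50) both give 56.125 at `u = 0.5`. The closed form costs fewer operations per sample; De Casteljau generalizes to other degrees without changing the code.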
+
+### Baking Process (C++)
+
+```cpp
+// At audio_init() time
+void synth_bake_mq(const MQSample& sample, std::vector<float>& pcm_out) {
+    const int num_samples = (int)(sample.sample_rate * sample.duration);
+    pcm_out.assign(num_samples, 0.0f);
+    const float two_pi = 2.0f * (float)M_PI;
+
+    for (int p = 0; p < sample.num_partials; ++p) {
+        const MQPartial& partial = sample.partials[p];
+
+        for (int r = 0; r < partial.num_replicas; ++r) {
+            const float ratio = partial.replicas[r];
+            // Running phase: the integral of instantaneous frequency.
+            // (2*pi*f*t is only correct when f is constant over time.)
+            float phase = 0.0f;
+
+            for (int i = 0; i < num_samples; ++i) {
+                const float t = (float)i / sample.sample_rate;
+                const float f0 = eval_bezier(partial.freq, t);
+                const float A0 = eval_bezier(partial.amp, t);
+
+                // Frequency spread (unsigned math avoids signed overflow on long samples)
+                const uint32_t seed = (uint32_t)i * 12345u + (uint32_t)p * 67890u + (uint32_t)r;
+                const float spread = rand_float(seed, -partial.spread_below, partial.spread_above);
+                const float f = f0 * ratio * (1.0f + spread);
+
+                // Amplitude decay
+                const float A = A0 * expf(-partial.decay_alpha * fabsf(f - f0));
+
+                // Phase jitter
+                const float jitter = rand_float(seed + 1, 0.0f, 1.0f) * partial.jitter;
+
+                pcm_out[i] += A * sinf(phase + jitter * two_pi);
+
+                // Advance by one sample; wrap to keep float precision
+                phase += two_pi * f / sample.sample_rate;
+                if (phase >= two_pi) phase -= two_pi;
+
+                // TODO: bandwidth-enhanced noise
+            }
+        }
+    }
+}
+```
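`rand_float` is used in the baking code but not defined in this document. One possible stand-in is a stateless integer hash mapped to a float range, so the same seed always yields the same value and baking is reproducible. The multiply-xorshift mixing function and both names below are assumptions, sketched in Python for brevity.

```python
def hash_u32(seed: int) -> int:
    """Mix a 32-bit integer with xorshift and odd-multiply steps."""
    x = seed & 0xFFFFFFFF
    x ^= x >> 16
    x = (x * 0x7FEB352D) & 0xFFFFFFFF
    x ^= x >> 15
    x = (x * 0x846CA68B) & 0xFFFFFFFF
    x ^= x >> 16
    return x


def rand_float(seed: int, lo: float, hi: float) -> float:
    """Deterministic value in [lo, hi] derived from the seed alone."""
    return lo + (hi - lo) * (hash_u32(seed) / 2**32)
```

Because each step is invertible on 32-bit integers, distinct seeds map to distinct hash values, which keeps neighbouring samples from sharing a spread or jitter value by construction.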
+
+---
+
+## Web Editor
+
+### UI Layout
+
+```
+┌─────────────────────────────────────────────────────┐
+│ [Load WAV]  [Load .txt]  [Save .txt]  [Export C++]  │
+├─────────────────────────────────────────────────────┤
+│ MQ Extraction Params:                               │
+│   FFT Size: [2048▼]  Hop: [512]  Threshold: [-60dB] │
+│   [Extract Partials]  [Re-extract]                  │
+├─────────────────────────────────────────────────────┤
+│ ┌─────────────────────────────────────────────────┐ │
+│ │                                                 │ │
+│ │              Time-Frequency Canvas              │ │
+│ │  - Spectrogram background                       │ │
+│ │  - Bezier curves (colored per partial)          │ │
+│ │  - Draggable control points (circles)           │ │
+│ │                                                 │ │
+│ └─────────────────────────────────────────────────┘ │
+├─────────────────────────────────────────────────────┤
+│ Selected Partial: [0▼]  [Add Point]  [Remove Point] │
+│   Replicas: [1.0, 2.01, 3.03]  [Edit]               │
+│   Decay α: [0.15]  Jitter: [0.08]                   │
+│   Spread+: [3%]  Spread-: [1%]  Bandwidth: [2%]     │
+├─────────────────────────────────────────────────────┤
+│ Playback: [▶ Original]  [▶ Synthesized]  [▶ Both]   │
+│ Time: [━━━━━━━━━━━━━━━━━━━━━━━]  0.0s / 1.5s        │
+└─────────────────────────────────────────────────────┘
+```
+
+### Features
+
+**Phase 1 (Extraction):**
+- Load WAV, run MQ algorithm, visualize partials
+- Real-time parameter adjustment (FFT size, threshold, tracking)
+
+**Phase 2 (Synthesis Preview):**
+- JS implementation of full synthesis pipeline
+- Playback original vs. synthesized audio (Web Audio API)
+
+**Phase 3 (Editing):**
+- Drag control points to adjust curves
+- Add/remove control points (future: auto-simplification)
+- Per-partial replica configuration
+
+**Phase 4 (Export):**
+- Save `.txt` format (human-readable)
+- Generate C++ code (copy-paste or auto-commit)
+
+---
+
+## C++ Integration
+
+### File Organization
+
+```
+workspaces/main/
+ mq_samples/
+ drum_kick.txt
+ piano_c4.txt
+ synth_pad.txt
+
+src/generated/
+ mq_drum_kick.cc # Auto-generated
+ mq_piano_c4.cc
+ mq_synth_pad.cc
+
+src/audio/
+ mq_synth.h # Bezier eval, baking API
+ mq_synth.cc
+```
+
+### Asset Registration
+
+Add to `workspaces/main/assets.txt`:
+
+```
+MQ_DRUM_KICK, NONE, mq_samples/drum_kick.txt, "MQ kick drum"
+```
+
+Build system:
+1. Detect `.txt` changes → trigger code generator
+2. Compile generated `.cc` → link into demo
+3. `ASSET_MQ_DRUM_KICK` available in code
+
+### Tracker Integration
+
+```cpp
+// Register MQ samples at init
+void audio_init() {
+ synth_register_mq_sample(SAMPLE_ID_KICK, &ASSET_MQ_DRUM_KICK);
+ synth_register_mq_sample(SAMPLE_ID_PIANO, &ASSET_MQ_PIANO_C4);
+}
+
+// Trigger from pattern
+void pattern_callback(int sample_id, float volume) {
+ synth_trigger_mq(sample_id, volume);
+ // Future: pitch modulation, time stretch
+}
+```
+
+---
+
+## Implementation Roadmap
+
+### Phase 1: MQ Extraction (Web)
+**Goal:** Load WAV → Extract partials → Visualize trajectories
+**Deliverables:**
+- `tools/mq_editor/index.html` (basic UI)
+- `tools/mq_editor/mq_extract.js` (FFT + peak tracking + bezier fitting)
+- `tools/mq_editor/viewer.js` (canvas visualization)
+
+**Timeline:** 1-2 weeks
+
+### Phase 2: JS Synthesizer
+**Goal:** Preview synthesized audio in browser
+**Deliverables:**
+- `tools/mq_editor/mq_synth.js` (replica oscillator bank)
+- Web Audio API integration (playback comparison)
+
+**Timeline:** 1 week
+
+### Phase 3: Web Editor UI
+**Goal:** Full editing workflow
+**Deliverables:**
+- Draggable control points (canvas interaction)
+- Per-partial replica sliders
+- Save/load `.txt` format
+
+**Timeline:** 1-2 weeks
+
+### Phase 4: C++ Code Generator
+**Goal:** `.txt` → generated `.cc` code
+**Deliverables:**
+- `tools/mq_codegen.py` (parser + C++ emitter)
+- Build system integration (CMake hook)
+
+**Timeline:** 3-5 days
+
+### Phase 5: C++ Synthesis
+**Goal:** Bake PCM at demo init
+**Deliverables:**
+- `src/audio/mq_synth.{h,cc}` (bezier eval, oscillator bank)
+- Integration with AudioEngine/tracker
+
+**Timeline:** 1 week
+
+### Phase 6: Optimization
+**Goal:** GPU baking, quantization, size reduction
+**Deliverables:**
+- Compute shader for parallel synthesis
+- Quantized bezier control points (f16 or i16)
+- Curve simplification algorithm
+
+**Timeline:** 2-3 weeks (future work)
+
+---
+
+## Future Enhancements
+
+### Short-Term (Post-MVP)
+- **Pitch modulation:** `synth_trigger_mq(sample_id, volume, pitch_ratio)`
+- **Time stretch:** Adjust bezier time domain dynamically
+- **Amplitude modulation:** LFO/envelope override
+
+### Medium-Term
+- **GPU synthesis:** Compute shader for baked PCM (parallel oscillators)
+- **Curve simplification:** Iterative control point reduction (error tolerance)
+- **Quantization:** f32 → f16/i16 control points (~50% size reduction)
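The size effect of half-precision control points can be checked with Python's `struct` module, whose `e` format code is IEEE 754 half precision. The sketch below packs the kick's frequency curve both ways; the helper names are illustrative.

```python
import struct

CURVE = [0.0, 60.0, 0.2, 58.0, 0.8, 55.0, 1.5, 50.0]  # t0 f0 t1 f1 t2 f2 t3 f3


def pack_f32(values):
    """Pack values as little-endian 32-bit floats."""
    return struct.pack(f"<{len(values)}f", *values)


def pack_f16(values):
    """Pack values as little-endian 16-bit (half precision) floats."""
    return struct.pack(f"<{len(values)}e", *values)


def unpack_f16(blob):
    """Decode a blob written by pack_f16 back into Python floats."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))
```

One curve shrinks from 32 to 16 bytes. Half precision has an 11-bit significand, so values between 8192 and 16384 are spaced 8 apart; high-frequency partials may need i16 with a scale factor, or a log-frequency encoding, rather than plain f16.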
+
+### Long-Term
+- **Hybrid synthesis:** MQ partials + noise residual (stochastic component)
+- **Real-time synthesis:** Per-chunk fillBuffer() instead of baked PCM
+- **Segmented beziers:** Multi-segment curves for complex trajectories
+
+---
+
+## References
+
+- McAulay, R. J., & Quatieri, T. F. (1986). "Speech analysis/synthesis based on a sinusoidal representation." IEEE Transactions on Acoustics, Speech, and Signal Processing.
+- Serra, X., & Smith, J. O. (1990). "Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition." Computer Music Journal.
+- De Casteljau's algorithm: https://en.wikipedia.org/wiki/De_Casteljau%27s_algorithm
+
+---
+
+## Status
+
+- [x] Design document
+- [ ] Phase 1: MQ extraction (Web)
+- [ ] Phase 2: JS synthesizer
+- [ ] Phase 3: Web editor UI
+- [ ] Phase 4: C++ code generator
+- [ ] Phase 5: C++ synthesis + integration
+- [ ] Phase 6: GPU optimization