| author | skal <pascal.massimino@gmail.com> | 2026-02-17 16:12:21 +0100 |
|---|---|---|
| committer | skal <pascal.massimino@gmail.com> | 2026-02-17 16:12:21 +0100 |
| commit | 03579c4a33ab3955ff9924a6dcd882fe91dd9aaa (patch) | |
| tree | be458d2ac4bc0d7160be8a18526b4e9157af33a5 /doc | |
| parent | e3f0b002c0998c8553e782273b254869107ffc0f (diff) | |
feat(mq_editor): Phase 1 - MQ extraction and visualization (SPECTRAL_BRUSH_2)
Implement McAulay-Quatieri sinusoidal analysis tool for audio compression.
New files:
- doc/SPECTRAL_BRUSH_2.md: Complete design doc (MQ algorithm, data format, synthesis, roadmap)
- tools/mq_editor/index.html: Web UI (file loader, params, canvas)
- tools/mq_editor/fft.js: Radix-2 Cooley-Tukey FFT (from spectral_editor)
- tools/mq_editor/mq_extract.js: MQ algorithm (peak detection, tracking, bezier fitting)
- tools/mq_editor/viewer.js: Visualization (spectrogram, partials, zoom, axes)
- tools/mq_editor/README.md: Usage and implementation status
Features:
- Load WAV → extract sinusoidal partials → fit cubic bezier curves
- Time-frequency spectrogram with hot colormap (0-16 kHz)
- Horizontal zoom (mousewheel) around mouse position
- Axis ticks with labels (time: seconds, freq: Hz/kHz)
- Mouse tooltip showing time/frequency coordinates
- Real-time adjustable MQ parameters (FFT size, hop, threshold)
Algorithm:
- STFT with Hann windows (2048 FFT, 512 hop)
- Peak detection with parabolic interpolation
- Birth/death/continuation tracking (50 Hz tolerance)
- Cubic bezier fitting (4 control points per trajectory)
Next: Phase 2 (JS synthesizer for audio preview)
handoff(Claude): MQ editor Phase 1 complete. Ready for synthesis implementation.
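For readers unfamiliar with the "peak detection with parabolic interpolation" step listed under Algorithm, the following is a minimal sketch of the idea, not the code in `tools/mq_editor/mq_extract.js`. The function name `findPeaks`, its argument order, and the use of a dB magnitude spectrum as input are assumptions made for illustration. It fits a parabola through each local maximum and its two neighbours to estimate a sub-bin frequency and a refined peak level.

```javascript
// Hypothetical helper (not from mq_extract.js): refine spectral peaks
// with three-point parabolic interpolation.
// magsDb: magnitude spectrum in dB, one value per FFT bin.
function findPeaks(magsDb, sampleRate, fftSize, thresholdDb) {
  const peaks = [];
  for (let k = 1; k < magsDb.length - 1; ++k) {
    const a = magsDb[k - 1];
    const b = magsDb[k];
    const c = magsDb[k + 1];
    // Keep only local maxima above the threshold.
    if (b < thresholdDb || b <= a || b < c) continue;
    // Vertex of the parabola through (k-1, a), (k, b), (k+1, c).
    // denom is strictly negative here because b > a and b >= c.
    const denom = a - 2 * b + c;
    const offset = 0.5 * (a - c) / denom; // fractional bin, in [-0.5, 0.5]
    peaks.push({
      freq: (k + offset) * sampleRate / fftSize, // Hz
      ampDb: b - 0.25 * (a - c) * offset,        // interpolated peak level
    });
  }
  return peaks;
}
```

With the defaults named in this commit (2048-point FFT) and the 32000 Hz sample rate used in the design doc's example, one bin spans 15.625 Hz, so the fractional offset is what makes a 50 Hz tracking tolerance meaningful at low frequencies.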
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/SPECTRAL_BRUSH_2.md | 523 |
1 files changed, 523 insertions, 0 deletions
diff --git a/doc/SPECTRAL_BRUSH_2.md b/doc/SPECTRAL_BRUSH_2.md new file mode 100644 index 0000000..76e49db --- /dev/null +++ b/doc/SPECTRAL_BRUSH_2.md @@ -0,0 +1,523 @@ +# Spectral Brush Editor v2: MQ-Based Sinusoidal Synthesis + +**Status:** Design Phase +**Target:** Procedural audio compression for short samples (drums, piano, impacts) +**Replaces:** Spectrogram-based synthesis (poor audio quality) + +--- + +## Overview + +McAulay-Quatieri (MQ) sinusoidal modeling for audio compression. Extract frequency/amplitude trajectories as bezier curves, apply "style" via replicas (harmonics, spread, jitter), synthesize to baked PCM buffers. + +**Key Features:** +- **50-100× compression:** WAV → bezier curves + replica params → C++ structs +- **Web-based editor:** Real-time MQ extraction, curve editing, synthesis preview +- **Procedural synthesis:** Bandwidth-enhanced oscillators with phase jitter and frequency spread +- **Tracker integration:** MQ samples triggered as assets, future pitch/amp modulation + +--- + +## Architecture + +### Data Flow + +``` +┌─────────────────────────────────────────────────────┐ +│ Web Editor (tools/mq_editor/) │ +├─────────────────────────────────────────────────────┤ +│ Input: WAV or saved .txt params │ +│ ↓ │ +│ MQ Extraction: FFT → Peak Tracking → Bezier Fitting │ +│ ↓ │ +│ Editing: Drag control points, adjust replicas │ +│ ↓ │ +│ JS Synthesizer: Preview original vs. 
synthesized │ +│ ↓ │ +│ Export: .txt params + generated .cc code │ +└─────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────┐ +│ C++ Demo (src/audio/) │ +├─────────────────────────────────────────────────────┤ +│ Build: .txt → generated .cc (MQSample structs) │ +│ ↓ │ +│ Synthesis: Bake PCM at init (CPU, future GPU) │ +│ ↓ │ +│ AudioEngine: Register as sample asset │ +│ ↓ │ +│ Tracker: Trigger via patterns (future modulation) │ +└─────────────────────────────────────────────────────┘ +``` + +--- + +## Data Model + +### Per-Partial Representation + +Each sinusoidal partial stores: + +``` +Partial { + freq_curve: CubicBezier // Frequency trajectory (Hz vs. seconds) + amp_curve: CubicBezier // Amplitude envelope (0-1 vs. seconds) + replicas: ReplicaConfig // Harmonic/inharmonic copies +} + +CubicBezier { + (t0, v0), (t1, v1), (t2, v2), (t3, v3) // 4 control points +} + +ReplicaConfig { + offsets: [ratio1, ratio2, ...] // Frequency ratios (1.0, 2.01, 0.5, ...) 
+ decay_alpha: float // Amplitude decay: exp(-α·|f-f₀|) + jitter: float [0-1] // Phase randomization amount + spread_above: float [0-1] // Frequency spread +% of f₀ + spread_below: float [0-1] // Frequency spread -% of f₀ + bandwidth: float [0-1] // Noise bandwidth ±% of f +} +``` + +### Text Format (.txt) + +Stored in `workspaces/main/mq_samples/`: + +``` +# MQ Sample: drum_kick.txt +sample_rate 32000 +duration 1.5 + +# Global defaults (optional, can override per partial) +replica_defaults + decay_alpha 0.1 + jitter 0.05 + spread_above 0.02 + spread_below 0.02 + bandwidth 0.01 +end + +# Partial 0: fundamental +partial + # Frequency bezier (seconds, Hz): t0 f0 t1 f1 t2 f2 t3 f3 + freq_curve 0.0 60.0 0.2 58.0 0.8 55.0 1.5 50.0 + + # Amplitude bezier (seconds, 0-1): t0 a0 t1 a1 t2 a2 t3 a3 + amp_curve 0.0 0.0 0.05 1.0 0.5 0.3 1.5 0.0 + + # Replica frequency ratios + replicas 1.0 2.01 3.03 + + # Override defaults (optional) + decay_alpha 0.15 + jitter 0.08 + spread_above 0.03 + spread_below 0.01 + bandwidth 0.02 +end + +# Partial 1: overtone +partial + freq_curve 0.0 180.0 0.2 178.0 0.8 175.0 1.5 170.0 + amp_curve 0.0 0.0 0.05 0.6 0.5 0.2 1.5 0.0 + replicas 1.0 1.99 +end +``` + +### Generated C++ Code + +Stored in `src/generated/mq_<name>.cc`: + +```cpp +// Auto-generated from mq_samples/drum_kick.txt +// DO NOT EDIT + +struct MQBezier { + float t0, v0, t1, v1, t2, v2, t3, v3; +}; + +struct MQPartial { + MQBezier freq; + MQBezier amp; + const float* replicas; + int num_replicas; + float decay_alpha; + float jitter; + float spread_above; + float spread_below; + float bandwidth; +}; + +static const float drum_kick_replicas_0[] = {1.0f, 2.01f, 3.03f}; +static const float drum_kick_replicas_1[] = {1.0f, 1.99f}; + +static const MQPartial drum_kick_partials[] = { + { + {0.0f, 60.0f, 0.2f, 58.0f, 0.8f, 55.0f, 1.5f, 50.0f}, + {0.0f, 0.0f, 0.05f, 1.0f, 0.5f, 0.3f, 1.5f, 0.0f}, + drum_kick_replicas_0, 3, + 0.15f, 0.08f, 0.03f, 0.01f, 0.02f + }, + { + {0.0f, 180.0f, 0.2f, 
178.0f, 0.8f, 175.0f, 1.5f, 170.0f}, + {0.0f, 0.0f, 0.05f, 0.6f, 0.5f, 0.2f, 1.5f, 0.0f}, + drum_kick_replicas_1, 2, + 0.1f, 0.05f, 0.02f, 0.02f, 0.01f + } +}; + +struct MQSample { + int sample_rate; + float duration; + const MQPartial* partials; + int num_partials; +}; + +const MQSample ASSET_MQ_DRUM_KICK = { + 32000, 1.5f, drum_kick_partials, 2 +}; +``` + +--- + +## McAulay-Quatieri Algorithm + +### Phase 1: Peak Detection + +STFT with overlapping windows: + +``` +For each frame (hop = 512 samples): + 1. FFT (size = 2048) + 2. Magnitude spectrum |X[k]| + 3. Detect peaks: local maxima above threshold + 4. Extract (frequency, amplitude, phase) via parabolic interpolation +``` + +**Parameters:** +- `fft_size`: 2048 (adjustable 1024-4096) +- `hop_size`: 512 (75% overlap) +- `peak_threshold`: -60 dB (adjustable) + +### Phase 2: Trajectory Tracking + +Link peaks across frames into continuous partials: + +``` +Birth/Death/Continuation model: + - Match peak to existing partial if |f_new - f_old| < threshold + - Birth new partial if unmatched peak persists 2+ frames + - Death partial if no match for 2+ frames +``` + +**Tracking threshold:** 50 Hz (adjustable) + +### Phase 3: Bezier Curve Fitting + +Fit cubic bezier to each partial's trajectory: + +``` +Input: [(t1, f1), (t2, f2), ..., (tN, fN)] +Output: 4 control points minimizing least-squares error + +Algorithm: + 1. Fix endpoints: (t0, f0) = first, (t3, f3) = last + 2. Solve for (t1, f1), (t2, f2) via linear regression + 3. 
Repeat for amplitude trajectory +``` + +**Error threshold:** Auto-fit to minimize control points (future: user-adjustable simplification) + +--- + +## Synthesis Model + +### Replica Oscillator Bank + +For each partial at time `t`: + +```python +# Evaluate bezier curves +f0 = eval_bezier(partial.freq_curve, t) +A0 = eval_bezier(partial.amp_curve, t) + +# For each replica offset ratio +for ratio in partial.replicas: + # Frequency spread (asymmetric randomization) + spread = random.uniform(-partial.spread_below, +partial.spread_above) + f = f0 * ratio * (1.0 + spread) + + # Amplitude decay + A = A0 * exp(-partial.decay_alpha * abs(f - f0)) + + # Phase (non-deterministic, seeded by frame counter) + phase = 2*pi*f*t + partial.jitter * random.uniform(0, 2*pi) + + # Base sinusoid + sample += A * sin(phase) + + # Bandwidth-enhanced noise (optional) + if partial.bandwidth > 0: + noise_bw = f * partial.bandwidth + sample += A * bandlimited_noise(f - noise_bw, f + noise_bw) +``` + +### Bezier Evaluation (Cubic) + +De Casteljau's algorithm: + +```cpp +float eval_bezier(const MQBezier& b, float t) { + // Normalize t to [0, 1] + float u = (t - b.t0) / (b.t3 - b.t0); + u = clamp(u, 0.0f, 1.0f); + + // Cubic interpolation + float u1 = 1.0f - u; + return u1*u1*u1 * b.v0 + + 3*u1*u1*u * b.v1 + + 3*u1*u*u * b.v2 + + u*u*u * b.v3; +} +``` + +### Baking Process (C++) + +```cpp +// At audio_init() time +void synth_bake_mq(const MQSample& sample, std::vector<float>& pcm_out) { + int num_samples = sample.sample_rate * sample.duration; + pcm_out.resize(num_samples); + + for (int i = 0; i < num_samples; ++i) { + float t = (float)i / sample.sample_rate; + float sample_val = 0.0f; + + for (int p = 0; p < sample.num_partials; ++p) { + const MQPartial& partial = sample.partials[p]; + float f0 = eval_bezier(partial.freq, t); + float A0 = eval_bezier(partial.amp, t); + + for (int r = 0; r < partial.num_replicas; ++r) { + float ratio = partial.replicas[r]; + + // Frequency spread + uint32_t seed = 
i * 12345 + p * 67890 + r; + float spread = rand_float(seed, -partial.spread_below, partial.spread_above); + float f = f0 * ratio * (1.0f + spread); + + // Amplitude decay + float A = A0 * expf(-partial.decay_alpha * fabsf(f - f0)); + + // Phase jitter + float jitter = rand_float(seed + 1, 0.0f, 1.0f) * partial.jitter; + float phase = 2.0f * M_PI * f * t + jitter * 2.0f * M_PI; + + sample_val += A * sinf(phase); + + // TODO: bandwidth-enhanced noise + } + } + + pcm_out[i] = sample_val; + } +} +``` + +--- + +## Web Editor + +### UI Layout + +``` +┌─────────────────────────────────────────────────────┐ +│ [Load WAV] [Load .txt] [Save .txt] [Export C++] │ +├─────────────────────────────────────────────────────┤ +│ MQ Extraction Params: │ +│ FFT Size: [2048▼] Hop: [512] Threshold: [-60dB]│ +│ [Extract Partials] [Re-extract] │ +├─────────────────────────────────────────────────────┤ +│ ┌─────────────────────────────────────────────────┐ │ +│ │ │ │ +│ │ Time-Frequency Canvas │ │ +│ │ - Spectrogram background │ │ +│ │ - Bezier curves (colored per partial) │ │ +│ │ - Draggable control points (circles) │ │ +│ │ │ │ +│ └─────────────────────────────────────────────────┘ │ +├─────────────────────────────────────────────────────┤ +│ Selected Partial: [0▼] [Add Point] [Remove Point] │ +│ Replicas: [1.0, 2.01, 3.03] [Edit] │ +│ Decay α: [0.15] Jitter: [0.08] │ +│ Spread+: [3%] Spread-: [1%] Bandwidth: [2%] │ +├─────────────────────────────────────────────────────┤ +│ Playback: [▶ Original] [▶ Synthesized] [▶ Both] │ +│ Time: [━━━━━━━━━━━━━━━━━━━━━━━] 0.0s / 1.5s │ +└─────────────────────────────────────────────────────┘ +``` + +### Features + +**Phase 1 (Extraction):** +- Load WAV, run MQ algorithm, visualize partials +- Real-time parameter adjustment (FFT size, threshold, tracking) + +**Phase 2 (Synthesis Preview):** +- JS implementation of full synthesis pipeline +- Playback original vs. 
synthesized audio (Web Audio API) + +**Phase 3 (Editing):** +- Drag control points to adjust curves +- Add/remove control points (future: auto-simplification) +- Per-partial replica configuration + +**Phase 4 (Export):** +- Save `.txt` format (human-readable) +- Generate C++ code (copy-paste or auto-commit) + +--- + +## C++ Integration + +### File Organization + +``` +workspaces/main/ + mq_samples/ + drum_kick.txt + piano_c4.txt + synth_pad.txt + +src/generated/ + mq_drum_kick.cc # Auto-generated + mq_piano_c4.cc + mq_synth_pad.cc + +src/audio/ + mq_synth.h # Bezier eval, baking API + mq_synth.cc +``` + +### Asset Registration + +Add to `workspaces/main/assets.txt`: + +``` +MQ_DRUM_KICK, NONE, mq_samples/drum_kick.txt, "MQ kick drum" +``` + +Build system: +1. Detect `.txt` changes → trigger code generator +2. Compile generated `.cc` → link into demo +3. `ASSET_MQ_DRUM_KICK` available in code + +### Tracker Integration + +```cpp +// Register MQ samples at init +void audio_init() { + synth_register_mq_sample(SAMPLE_ID_KICK, &ASSET_MQ_DRUM_KICK); + synth_register_mq_sample(SAMPLE_ID_PIANO, &ASSET_MQ_PIANO_C4); +} + +// Trigger from pattern +void pattern_callback(int sample_id, float volume) { + synth_trigger_mq(sample_id, volume); + // Future: pitch modulation, time stretch +} +``` + +--- + +## Implementation Roadmap + +### Phase 1: MQ Extraction (Web) +**Goal:** Load WAV → Extract partials → Visualize trajectories +**Deliverables:** +- `tools/mq_editor/index.html` (basic UI) +- `tools/mq_editor/mq_extract.js` (FFT + peak tracking + bezier fitting) +- `tools/mq_editor/render.js` (canvas visualization) + +**Timeline:** 1-2 weeks + +### Phase 2: JS Synthesizer +**Goal:** Preview synthesized audio in browser +**Deliverables:** +- `tools/mq_editor/mq_synth.js` (replica oscillator bank) +- Web Audio API integration (playback comparison) + +**Timeline:** 1 week + +### Phase 3: Web Editor UI +**Goal:** Full editing workflow +**Deliverables:** +- Draggable control points 
(canvas interaction) +- Per-partial replica sliders +- Save/load `.txt` format + +**Timeline:** 1-2 weeks + +### Phase 4: C++ Code Generator +**Goal:** `.txt` → generated `.cc` code +**Deliverables:** +- `tools/mq_codegen.py` (parser + C++ emitter) +- Build system integration (CMake hook) + +**Timeline:** 3-5 days + +### Phase 5: C++ Synthesis +**Goal:** Bake PCM at demo init +**Deliverables:** +- `src/audio/mq_synth.{h,cc}` (bezier eval, oscillator bank) +- Integration with AudioEngine/tracker + +**Timeline:** 1 week + +### Phase 6: Optimization +**Goal:** GPU baking, quantization, size reduction +**Deliverables:** +- Compute shader for parallel synthesis +- Quantized bezier control points (f16 or i16) +- Curve simplification algorithm + +**Timeline:** 2-3 weeks (future work) + +--- + +## Future Enhancements + +### Short-Term (Post-MVP) +- **Pitch modulation:** `synth_trigger_mq(sample_id, volume, pitch_ratio)` +- **Time stretch:** Adjust bezier time domain dynamically +- **Amplitude modulation:** LFO/envelope override + +### Medium-Term +- **GPU synthesis:** Compute shader for baked PCM (parallel oscillators) +- **Curve simplification:** Iterative control point reduction (error tolerance) +- **Quantization:** f32 → f16/i16 control points (~50% size reduction) + +### Long-Term +- **Hybrid synthesis:** MQ partials + noise residual (stochastic component) +- **Real-time synthesis:** Per-chunk fillBuffer() instead of baked PCM +- **Segmented beziers:** Multi-segment curves for complex trajectories + +--- + +## References + +- McAulay, R. J., & Quatieri, T. F. (1986). "Speech analysis/synthesis based on a sinusoidal representation." IEEE TASSP. +- Serra, X., & Smith, J. O. (1990). "Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition." Computer Music Journal. 
+- De Casteljau's algorithm: https://en.wikipedia.org/wiki/De_Casteljau%27s_algorithm + +--- + +## Status + +- [x] Design document +- [ ] Phase 1: MQ extraction (Web) +- [ ] Phase 2: JS synthesizer +- [ ] Phase 3: Web editor UI +- [ ] Phase 4: C++ code generator +- [ ] Phase 5: C++ synthesis + integration +- [ ] Phase 6: GPU optimization
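The design doc's fitting step (fix both endpoints, then solve for the two inner control points by least squares) could look roughly like the sketch below. This is an illustration under stated assumptions, not the committed implementation: the names `fitBezier` and `evalBezier`, the `{t, v}` point shape, and the choice to place the inner control times at one third and two thirds of the time span are all hypothetical. It follows the doc's `eval_bezier` convention, in which time is normalized to `u` in [0, 1] and only the value coordinates enter the cubic.

```javascript
// Hypothetical sketch: least-squares cubic bezier fit with fixed endpoints.
// points: [{t, v}, ...] sorted by time, at least two entries.
function fitBezier(points) {
  const n = points.length;
  const t0 = points[0].t, v0 = points[0].v;
  const t3 = points[n - 1].t, v3 = points[n - 1].v;
  const span = t3 - t0;
  // Accumulate the 2x2 normal equations for the unknowns v1 and v2.
  let a11 = 0, a12 = 0, a22 = 0, c1 = 0, c2 = 0;
  for (const p of points) {
    const u = span > 0 ? (p.t - t0) / span : 0;
    const w = 1 - u;
    const b1 = 3 * w * w * u; // Bernstein weight of v1
    const b2 = 3 * w * u * u; // Bernstein weight of v2
    const r = p.v - w * w * w * v0 - u * u * u * v3; // endpoint terms removed
    a11 += b1 * b1; a12 += b1 * b2; a22 += b2 * b2;
    c1 += b1 * r;   c2 += b2 * r;
  }
  // Fallback (straight line) when the system is degenerate,
  // e.g. fewer than two interior points.
  let v1 = v0 + (v3 - v0) / 3;
  let v2 = v0 + 2 * (v3 - v0) / 3;
  const det = a11 * a22 - a12 * a12;
  if (Math.abs(det) > 1e-12) {
    v1 = (c1 * a22 - c2 * a12) / det;
    v2 = (a11 * c2 - a12 * c1) / det;
  }
  // Assumption: inner control times are spaced evenly over the span.
  return { t0, v0, t1: t0 + span / 3, v1, t2: t0 + 2 * span / 3, v2, t3, v3 };
}

// Evaluate in the same normalized-time form as the doc's eval_bezier.
function evalBezier(b, t) {
  const span = b.t3 - b.t0;
  const u = span > 0 ? Math.min(1, Math.max(0, (t - b.t0) / span)) : 0;
  const w = 1 - u;
  return w * w * w * b.v0 + 3 * w * w * u * b.v1 +
         3 * w * u * u * b.v2 + u * u * u * b.v3;
}
```

Running the same function once on the frequency samples and once on the amplitude samples of a partial yields the two curves stored per partial. A single cubic will not capture every trajectory, which is presumably why the doc lists segmented beziers as future work.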
