# Spectral Brush Editor v2: MQ-Based Sinusoidal Synthesis

**Status:** Design Phase
**Target:** Procedural audio compression for short samples (drums, piano, impacts)
**Replaces:** Spectrogram-based synthesis (poor audio quality)

---

## Overview

McAulay-Quatieri (MQ) sinusoidal modeling for audio compression. Extract frequency/amplitude trajectories as bezier curves, apply "style" via replicas (harmonics, spread, jitter), synthesize to baked PCM buffers.

**Key Features:**
- **50-100× compression:** WAV → bezier curves + replica params → C++ structs
- **Web-based editor:** Real-time MQ extraction, curve editing, synthesis preview
- **Procedural synthesis:** Bandwidth-enhanced oscillators with phase jitter and frequency spread
- **Tracker integration:** MQ samples triggered as assets, future pitch/amp modulation

---

## Architecture

### Data Flow

```
┌─────────────────────────────────────────────────────┐
│ Web Editor (tools/mq_editor/)                       │
├─────────────────────────────────────────────────────┤
│ Input: WAV or saved .txt params                     │
│         ↓                                           │
│ MQ Extraction: FFT → Peak Tracking → Bezier Fitting │
│         ↓                                           │
│ Editing: Drag control points, adjust replicas       │
│         ↓                                           │
│ JS Synthesizer: Preview original vs. synthesized    │
│         ↓                                           │
│ Export: .txt params + generated .cc code            │
└─────────────────────────────────────────────────────┘
          ↓
┌─────────────────────────────────────────────────────┐
│ C++ Demo (src/audio/)                               │
├─────────────────────────────────────────────────────┤
│ Build: .txt → generated .cc (MQSample structs)      │
│         ↓                                           │
│ Synthesis: Bake PCM at init (CPU, future GPU)       │
│         ↓                                           │
│ AudioEngine: Register as sample asset               │
│         ↓                                           │
│ Tracker: Trigger via patterns (future modulation)   │
└─────────────────────────────────────────────────────┘
```

---

## Data Model

### Per-Partial Representation

Each sinusoidal partial stores:

```
Partial {
  freq_curve: CubicBezier      // Frequency trajectory (Hz vs. seconds)
  amp_curve:  CubicBezier      // Amplitude envelope (0-1 vs. seconds)
  replicas:   ReplicaConfig    // Harmonic/inharmonic copies
}

CubicBezier {
  (t0, v0), (t1, v1), (t2, v2), (t3, v3)  // 4 control points
}

ReplicaConfig {
  offsets:      [ratio1, ratio2, ...]  // Frequency ratios (1.0, 2.01, 0.5, ...)
  decay_alpha:  float                  // Amplitude decay: exp(-α·|f-f₀|)
  jitter:       float [0-1]            // Phase randomization amount
  spread_above: float [0-1]            // Frequency spread +% of f₀
  spread_below: float [0-1]            // Frequency spread -% of f₀
  bandwidth:    float [0-1]            // Noise bandwidth ±% of f
}
```

### Text Format (.txt)

Stored in `workspaces/main/mq_samples/`:

```
# MQ Sample: drum_kick.txt
sample_rate 32000
duration 1.5

# Global defaults (optional, can override per partial)
replica_defaults
  decay_alpha 0.1
  jitter 0.05
  spread_above 0.02
  spread_below 0.02
  bandwidth 0.01
end

# Partial 0: fundamental
partial
  # Frequency bezier (seconds, Hz): t0 f0 t1 f1 t2 f2 t3 f3
  freq_curve 0.0 60.0 0.2 58.0 0.8 55.0 1.5 50.0
  # Amplitude bezier (seconds, 0-1): t0 a0 t1 a1 t2 a2 t3 a3
  amp_curve 0.0 0.0 0.05 1.0 0.5 0.3 1.5 0.0
  # Replica frequency ratios
  replicas 1.0 2.01 3.03
  # Override defaults (optional)
  decay_alpha 0.15
  jitter 0.08
  spread_above 0.03
  spread_below 0.01
  bandwidth 0.02
end

# Partial 1: overtone
partial
  freq_curve 0.0 180.0 0.2 178.0 0.8 175.0 1.5 170.0
  amp_curve 0.0 0.0 0.05 0.6 0.5 0.2 1.5 0.0
  replicas 1.0 1.99
end
```

### Generated C++ Code

Stored in `src/generated/mq_.cc`:

```cpp
// Auto-generated from mq_samples/drum_kick.txt
// DO NOT EDIT

struct MQBezier {
  float t0, v0, t1, v1, t2, v2, t3, v3;
};

struct MQPartial {
  MQBezier freq;
  MQBezier amp;
  const float* replicas;
  int num_replicas;
  float decay_alpha;
  float jitter;
  float spread_above;
  float spread_below;
  float bandwidth;
};

static const float drum_kick_replicas_0[] = {1.0f, 2.01f, 3.03f};
static const float drum_kick_replicas_1[] = {1.0f, 1.99f};

static const MQPartial drum_kick_partials[] = {
  {
    {0.0f, 60.0f, 0.2f, 58.0f, 0.8f, 55.0f, 1.5f, 50.0f},
    {0.0f, 0.0f, 0.05f, 1.0f, 0.5f, 0.3f, 1.5f, 0.0f},
    drum_kick_replicas_0, 3,
    0.15f, 0.08f, 0.03f, 0.01f, 0.02f
  },
  {
    {0.0f, 180.0f, 0.2f, 178.0f, 0.8f, 175.0f, 1.5f, 170.0f},
    {0.0f, 0.0f, 0.05f, 0.6f, 0.5f, 0.2f, 1.5f, 0.0f},
    drum_kick_replicas_1, 2,
    0.1f, 0.05f, 0.02f, 0.02f, 0.01f
  }
};

struct MQSample {
  int sample_rate;
  float duration;
  const MQPartial* partials;
  int num_partials;
};

const MQSample ASSET_MQ_DRUM_KICK = {
  32000,
  1.5f,
  drum_kick_partials,
  2
};
```

---

## McAulay-Quatieri Algorithm

### Phase 1: Peak Detection

STFT with overlapping windows:

```
For each frame (hop = 512 samples):
  1. FFT (size = 2048)
  2. Magnitude spectrum |X[k]|
  3. Detect peaks: local maxima above threshold
  4. Extract (frequency, amplitude, phase) via parabolic interpolation
```

**Parameters:**
- `fft_size`: 2048 (adjustable 1024-4096)
- `hop_size`: 512 (75% overlap at the default `fft_size`)
- `peak_threshold`: -60 dB (adjustable)

### Phase 2: Trajectory Tracking

Link peaks across frames into continuous partials:

```
Birth/Death/Continuation model:
  - Continuation: match a peak to an existing partial if |f_new - f_old| < threshold
  - Birth: start a new partial if an unmatched peak persists 2+ frames
  - Death: end a partial if it finds no match for 2+ frames
```

**Tracking threshold:** 50 Hz (adjustable)

### Phase 3: Bezier Curve Fitting

Fit cubic bezier to each partial's trajectory:

```
Input: [(t1, f1), (t2, f2), ..., (tN, fN)]
Output: 4 control points minimizing least-squares error

Algorithm:
  1. Fix endpoints: (t0, f0) = first, (t3, f3) = last
  2. Solve for (t1, f1), (t2, f2) via linear least squares
  3. Repeat for amplitude trajectory
```

**Error threshold:** Auto-fit to minimize control points (future: user-adjustable simplification)

---

## Synthesis Model

### Replica Oscillator Bank

For each partial at time `t`:

```python
# Pseudocode: eval_bezier, bandlimited_noise, partial and t are defined elsewhere
import random
from math import exp, pi, sin

sample = 0.0

# Evaluate bezier curves
f0 = eval_bezier(partial.freq_curve, t)
A0 = eval_bezier(partial.amp_curve, t)

# For each replica offset ratio
for ratio in partial.replicas:
    # Frequency spread (asymmetric randomization)
    spread = random.uniform(-partial.spread_below, +partial.spread_above)
    f = f0 * ratio * (1.0 + spread)

    # Amplitude decay
    A = A0 * exp(-partial.decay_alpha * abs(f - f0))

    # Phase (pseudo-random jitter, seeded by frame counter)
    phase = 2*pi*f*t + partial.jitter * random.uniform(0, 2*pi)

    # Base sinusoid
    sample += A * sin(phase)

    # Bandwidth-enhanced noise (optional)
    if partial.bandwidth > 0:
        noise_bw = f * partial.bandwidth
        sample += A * bandlimited_noise(f - noise_bw, f + noise_bw)
```

### Bezier Evaluation (Cubic)

Direct evaluation of the cubic in Bernstein form, which yields the same value as De Casteljau's algorithm. Only the values `v0..v3` are interpolated; time is mapped linearly from `[t0, t3]` to `u`, so `t1` and `t2` do not affect the result:

```cpp
#include <algorithm>  // std::clamp

// Assumes b.t3 > b.t0
float eval_bezier(const MQBezier& b, float t) {
  // Normalize t to [0, 1]
  float u = (t - b.t0) / (b.t3 - b.t0);
  u = std::clamp(u, 0.0f, 1.0f);

  // Cubic Bernstein polynomial
  float u1 = 1.0f - u;
  return u1*u1*u1 * b.v0 +
         3*u1*u1*u * b.v1 +
         3*u1*u*u * b.v2 +
         u*u*u * b.v3;
}
```

### Baking Process (C++)

```cpp
#include <cmath>    // expf, fabsf, sinf
#include <cstdint>  // uint32_t
#include <vector>

// rand_float(seed, lo, hi) is assumed to be a deterministic helper
// returning a value in [lo, hi] (not shown here).

// At audio_init() time
void synth_bake_mq(const MQSample& sample, std::vector<float>& pcm_out) {
  int num_samples = (int)(sample.sample_rate * sample.duration);
  pcm_out.resize(num_samples);

  for (int i = 0; i < num_samples; ++i) {
    float t = (float)i / sample.sample_rate;
    float sample_val = 0.0f;

    for (int p = 0; p < sample.num_partials; ++p) {
      const MQPartial& partial = sample.partials[p];
      float f0 = eval_bezier(partial.freq, t);
      float A0 = eval_bezier(partial.amp, t);

      for (int r = 0; r < partial.num_replicas; ++r) {
        float ratio = partial.replicas[r];

        // Frequency spread
        uint32_t seed = i * 12345 + p * 67890 + r;
        float spread = rand_float(seed, -partial.spread_below, partial.spread_above);
        float f = f0 * ratio * (1.0f +
                                 spread);

        // Amplitude decay
        float A = A0 * expf(-partial.decay_alpha * fabsf(f - f0));

        // Phase jitter
        float jitter = rand_float(seed + 1, 0.0f, 1.0f) * partial.jitter;
        // NOTE: 2*pi*f*t equals the true phase only while f is constant.
        // For time-varying f, a per-replica accumulator
        // (phase += 2*pi*f / sample_rate) would be needed.
        float phase = 2.0f * M_PI * f * t + jitter * 2.0f * M_PI;

        sample_val += A * sinf(phase);

        // TODO: bandwidth-enhanced noise
      }
    }

    pcm_out[i] = sample_val;
  }
}
```

---

## Web Editor

### UI Layout

```
┌─────────────────────────────────────────────────────┐
│ [Load WAV] [Load .txt] [Save .txt] [Export C++]     │
├─────────────────────────────────────────────────────┤
│ MQ Extraction Params:                               │
│   FFT Size: [2048▼]  Hop: [512]  Threshold: [-60dB] │
│   [Extract Partials] [Re-extract]                   │
├─────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────┐ │
│ │                                                 │ │
│ │              Time-Frequency Canvas              │ │
│ │  - Spectrogram background                       │ │
│ │  - Bezier curves (colored per partial)          │ │
│ │  - Draggable control points (circles)           │ │
│ │                                                 │ │
│ └─────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────┤
│ Selected Partial: [0▼] [Add Point] [Remove Point]   │
│   Replicas: [1.0, 2.01, 3.03] [Edit]                │
│   Decay α: [0.15] Jitter: [0.08]                    │
│   Spread+: [3%] Spread-: [1%] Bandwidth: [2%]       │
├─────────────────────────────────────────────────────┤
│ Playback: [▶ Original] [▶ Synthesized] [▶ Both]     │
│ Time: [━━━━━━━━━━━━━━━━━━━━━━━] 0.0s / 1.5s         │
└─────────────────────────────────────────────────────┘
```

### Features

**Phase 1 (Extraction):**
- Load WAV, run MQ algorithm, visualize partials
- Real-time parameter adjustment (FFT size, threshold, tracking)

**Phase 2 (Synthesis Preview):**
- JS implementation of full synthesis pipeline
- Playback original vs.
  synthesized audio (Web Audio API)

**Phase 3 (Editing):**
- Drag control points to adjust curves
- Add/remove control points (future: auto-simplification)
- Per-partial replica configuration

**Phase 4 (Export):**
- Save `.txt` format (human-readable)
- Generate C++ code (copy-paste or auto-commit)

---

## C++ Integration

### File Organization

```
workspaces/main/
  mq_samples/
    drum_kick.txt
    piano_c4.txt
    synth_pad.txt

src/generated/
  mq_drum_kick.cc      # Auto-generated
  mq_piano_c4.cc
  mq_synth_pad.cc

src/audio/
  mq_synth.h           # Bezier eval, baking API
  mq_synth.cc
```

### Asset Registration

Add to `workspaces/main/assets.txt`:

```
MQ_DRUM_KICK, NONE, mq_samples/drum_kick.txt, "MQ kick drum"
```

Build system:
1. Detect `.txt` changes → trigger code generator
2. Compile generated `.cc` → link into demo
3. `ASSET_MQ_DRUM_KICK` available in code

### Tracker Integration

```cpp
// Register MQ samples at init
void audio_init() {
  synth_register_mq_sample(SAMPLE_ID_KICK, &ASSET_MQ_DRUM_KICK);
  synth_register_mq_sample(SAMPLE_ID_PIANO, &ASSET_MQ_PIANO_C4);
}

// Trigger from pattern
void pattern_callback(int sample_id, float volume) {
  synth_trigger_mq(sample_id, volume);
  // Future: pitch modulation, time stretch
}
```

---

## Implementation Roadmap

### Phase 1: MQ Extraction (Web)

**Goal:** Load WAV → Extract partials → Visualize trajectories

**Deliverables:**
- `tools/mq_editor/index.html` (basic UI)
- `tools/mq_editor/mq_extract.js` (FFT + peak tracking + bezier fitting)
- `tools/mq_editor/render.js` (canvas visualization)

**Timeline:** 1-2 weeks

### Phase 2: JS Synthesizer

**Goal:** Preview synthesized audio in browser

**Deliverables:**
- `tools/mq_editor/mq_synth.js` (replica oscillator bank)
- Web Audio API integration (playback comparison)

**Timeline:** 1 week

### Phase 3: Web Editor UI

**Goal:** Full editing workflow

**Deliverables:**
- Draggable control points (canvas interaction)
- Per-partial replica sliders
- Save/load `.txt` format

**Timeline:** 1-2 weeks

### Phase 4: C++ Code Generator

**Goal:** `.txt` → generated `.cc` code

**Deliverables:**
- `tools/mq_codegen.py` (parser + C++ emitter)
- Build system integration (CMake hook)

**Timeline:** 3-5 days

### Phase 5: C++ Synthesis

**Goal:** Bake PCM at demo init

**Deliverables:**
- `src/audio/mq_synth.{h,cc}` (bezier eval, oscillator bank)
- Integration with AudioEngine/tracker

**Timeline:** 1 week

### Phase 6: Optimization

**Goal:** GPU baking, quantization, size reduction

**Deliverables:**
- Compute shader for parallel synthesis
- Quantized bezier control points (f16 or i16)
- Curve simplification algorithm

**Timeline:** 2-3 weeks (future work)

---

## Future Enhancements

### Short-Term (Post-MVP)

- **Pitch modulation:** `synth_trigger_mq(sample_id, volume, pitch_ratio)`
- **Time stretch:** Adjust bezier time domain dynamically
- **Amplitude modulation:** LFO/envelope override

### Medium-Term

- **GPU synthesis:** Compute shader for baked PCM (parallel oscillators)
- **Curve simplification:** Iterative control point reduction (error tolerance)
- **Quantization:** f32 → f16/i16 control points (~50% size reduction)

### Long-Term

- **Hybrid synthesis:** MQ partials + noise residual (stochastic component)
- **Real-time synthesis:** Per-chunk fillBuffer() instead of baked PCM
- **Segmented beziers:** Multi-segment curves for complex trajectories

---

## References

- McAulay, R. J., & Quatieri, T. F. (1986). "Speech analysis/synthesis based on a sinusoidal representation." IEEE TASSP.
- Serra, X., & Smith, J. O. (1990). "Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition." Computer Music Journal.
- De Casteljau's algorithm: https://en.wikipedia.org/wiki/De_Casteljau%27s_algorithm

---

## Status

- [x] Design document
- [x] Phase 1: MQ extraction (Web)
  - [x] FFT-based peak detection with parabolic interpolation
  - [x] Frequency-dependent trajectory tracking (5% tolerance, candidate system)
  - [x] Cubic bezier curve fitting for freq/amp trajectories
  - [x] Spectrogram visualization with zoom/scroll/playhead
  - [x] Original WAV playback
- [ ] Phase 2: JS synthesizer
- [ ] Phase 3: Web editor UI
- [ ] Phase 4: C++ code generator
- [ ] Phase 5: C++ synthesis + integration
- [ ] Phase 6: GPU optimization
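
---

## Appendix: Bezier Fitting Sketch

The Phase 3 fitting step (fix the endpoints, then solve for the two inner control points by least squares) could look roughly like the sketch below. It is not the code in `mq_extract.js`. The names `fit_cubic_bezier` and the Python `eval_bezier` are hypothetical. The sketch assumes the curve parameter is linear in time, as in the C++ `eval_bezier`, and it places the inner control-point times at thirds of the time span, which is also an assumption.

```python
def fit_cubic_bezier(points):
    """Least-squares fit of the inner control values with fixed endpoints.

    points: list of (t, value) samples sorted by t (at least 2 entries).
    Returns (t0, v0, t1, v1, t2, v2, t3, v3).
    Assumption: the curve parameter u is linear in time.
    """
    (t0, v0), (t3, v3) = points[0], points[-1]
    span = t3 - t0
    if span <= 0.0:
        raise ValueError("need increasing time stamps")
    # Assumption: inner control-point times sit at thirds of the span.
    # With a linear time mapping they do not change the evaluated curve.
    t1, t2 = t0 + span / 3.0, t0 + 2.0 * span / 3.0

    # Accumulate the 2x2 normal equations for the Bernstein weights b1, b2.
    a11 = a12 = a22 = r1 = r2 = 0.0
    for t, v in points:
        u = (t - t0) / span
        w = 1.0 - u
        b0, b1, b2, b3 = w * w * w, 3 * w * w * u, 3 * w * u * u, u * u * u
        resid = v - b0 * v0 - b3 * v3  # part not explained by the endpoints
        a11 += b1 * b1
        a12 += b1 * b2
        a22 += b2 * b2
        r1 += b1 * resid
        r2 += b2 * resid

    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        # Too few interior samples: fall back to a straight line.
        v1 = v0 + (v3 - v0) / 3.0
        v2 = v0 + 2.0 * (v3 - v0) / 3.0
    else:
        v1 = (r1 * a22 - r2 * a12) / det
        v2 = (a11 * r2 - a12 * r1) / det
    return (t0, v0, t1, v1, t2, v2, t3, v3)


def eval_bezier(curve, t):
    """Python counterpart of the C++ eval_bezier (Bernstein form)."""
    t0, v0, _, v1, _, v2, t3, v3 = curve
    u = min(max((t - t0) / (t3 - t0), 0.0), 1.0)
    w = 1.0 - u
    return w**3 * v0 + 3 * w * w * u * v1 + 3 * w * u * u * v2 + u**3 * v3
```

Calling `fit_cubic_bezier` once on the `(time, frequency)` samples of a tracked partial and once on its `(time, amplitude)` samples gives the two 8-number tuples that `freq_curve` and `amp_curve` store.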
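
---

## Appendix: Text Format Parsing Sketch

The parser half of the planned `tools/mq_codegen.py` could start from something like the following. This is only a sketch under stated assumptions, not the actual tool: `parse_mq_text` is a hypothetical name, the returned dict layout is invented for illustration, and the `.txt` file is assumed to carry one directive per line with `#` starting a comment.

```python
def parse_mq_text(text):
    """Parse the MQ .txt format into a plain dict (hypothetical helper)."""
    curve_keys = ("freq_curve", "amp_curve")
    replica_keys = ("decay_alpha", "jitter", "spread_above",
                    "spread_below", "bandwidth")
    sample = {"partials": []}
    defaults = {}
    block = None        # dict currently being filled, None at top level
    in_partial = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, *args = line.split()
        if key == "sample_rate":
            sample["sample_rate"] = int(args[0])
        elif key == "duration":
            sample["duration"] = float(args[0])
        elif key == "replica_defaults":
            block, in_partial = defaults, False
        elif key == "partial":
            # Each partial starts from a copy of the global defaults.
            block, in_partial = dict(defaults), True
        elif key == "end":
            if in_partial:
                sample["partials"].append(block)
            block, in_partial = None, False
        elif key in curve_keys and in_partial:
            if len(args) != 8:
                raise ValueError(f"{key} needs 8 numbers")
            block[key] = tuple(float(a) for a in args)
        elif key == "replicas" and in_partial:
            block["replicas"] = [float(a) for a in args]
        elif key in replica_keys and block is not None:
            block[key] = float(args[0])
        else:
            raise ValueError(f"unexpected directive: {key}")
    return sample
```

Copying the defaults into each partial at parse time means the C++ emitter never needs to know about `replica_defaults`; it can write every `MQPartial` field directly from the partial's dict.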