# Spectral Brush Editor v2: MQ-Based Sinusoidal Synthesis

**Status:** Design Phase
**Target:** Procedural audio compression for short samples (drums, piano, impacts)
**Replaces:** Spectrogram-based synthesis (poor audio quality)

---

## Overview

This design uses McAulay-Quatieri (MQ) sinusoidal modeling for audio compression: extract frequency and amplitude trajectories as cubic bezier curves, apply "style" via replicas (harmonics, spread, jitter), and synthesize to baked PCM buffers.

**Key Features:**
- **50-100× compression (target):** WAV → bezier curves + replica params → C++ structs
- **Web-based editor:** Real-time MQ extraction, curve editing, synthesis preview
- **Procedural synthesis:** Bandwidth-enhanced oscillators with phase jitter and frequency spread
- **Tracker integration:** MQ samples triggered as assets, future pitch/amp modulation
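
As a rough sanity check on the compression figure, the sketch below compares raw PCM size with the size of the stored parameters. It assumes 16-bit mono PCM, 4-byte floats, and a per-partial payload of two 8-float curves, five float parameters, and one float per replica ratio. Pointer and padding overhead are ignored, and the partial counts are illustrative rather than measured.

```python
def pcm_bytes(sample_rate, duration_s, bytes_per_sample=2):
    """Size of raw mono PCM (16-bit by default, which is an assumption)."""
    return int(sample_rate * duration_s) * bytes_per_sample

def mq_bytes(num_partials, replicas_per_partial, float_size=4):
    """Approximate parameter size: 2 curves x 8 floats + 5 replica params
    + one ratio per replica, all stored as floats."""
    floats_per_partial = 2 * 8 + 5 + replicas_per_partial
    return num_partials * floats_per_partial * float_size

wav = pcm_bytes(32000, 1.5)
for n in (2, 10, 20):
    size = mq_bytes(n, replicas_per_partial=3)
    print(n, "partials:", size, "bytes, ratio", round(wav / size, 1))
```

Under these assumptions, a 1.5 s sample at 32 kHz lands in the 50-100× range with roughly 10 to 20 partials.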

---

## Architecture

### Data Flow

```
┌─────────────────────────────────────────────────────┐
│ Web Editor (tools/mq_editor/)                       │
├─────────────────────────────────────────────────────┤
│ Input: WAV or saved .txt params                     │
│   ↓                                                  │
│ MQ Extraction: FFT → Peak Tracking → Bezier Fitting │
│   ↓                                                  │
│ Editing: Drag control points, adjust replicas       │
│   ↓                                                  │
│ JS Synthesizer: Preview original vs. synthesized    │
│   ↓                                                  │
│ Export: .txt params + generated .cc code            │
└─────────────────────────────────────────────────────┘
                         ↓
┌─────────────────────────────────────────────────────┐
│ C++ Demo (src/audio/)                               │
├─────────────────────────────────────────────────────┤
│ Build: .txt → generated .cc (MQSample structs)      │
│   ↓                                                  │
│ Synthesis: Bake PCM at init (CPU, future GPU)       │
│   ↓                                                  │
│ AudioEngine: Register as sample asset               │
│   ↓                                                  │
│ Tracker: Trigger via patterns (future modulation)   │
└─────────────────────────────────────────────────────┘
```

---

## Data Model

### Per-Partial Representation

Each sinusoidal partial stores:

```
Partial {
  freq_curve: CubicBezier    // Frequency trajectory (Hz vs. seconds)
  amp_curve: CubicBezier     // Amplitude envelope (0-1 vs. seconds)
  replicas: ReplicaConfig    // Harmonic/inharmonic copies
}

CubicBezier {
  (t0, v0), (t1, v1), (t2, v2), (t3, v3)  // 4 control points
}

ReplicaConfig {
  offsets: [ratio1, ratio2, ...]          // Frequency ratios relative to f₀ (1.0, 2.01, 0.5, ...)
  decay_alpha: float                      // Amplitude decay: exp(-α·|f-f₀|), with f in Hz
  jitter: float [0-1]                     // Phase randomization amount (fraction of 2π)
  spread_above: float [0-1]               // Upward frequency spread, fraction of the replica frequency (0.03 = +3%)
  spread_below: float [0-1]               // Downward frequency spread, fraction of the replica frequency (0.01 = -1%)
  bandwidth: float [0-1]                  // Noise bandwidth, ±fraction of f (0.02 = ±2%)
}
```
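
To get a feel for `decay_alpha`, the sketch below evaluates the decay term exp(-α·|f-f₀|) for a few replica ratios. The function name `replica_gain` is illustrative and not part of the design; frequency spread is left out so that only the decay is visible.

```python
import math

def replica_gain(f0, ratio, decay_alpha):
    """Gain of a replica at frequency f0*ratio relative to the base partial."""
    f = f0 * ratio
    return math.exp(-decay_alpha * abs(f - f0))

# A 60 Hz partial with alpha = 0.15: a near-unison and an octave-like replica
for ratio in (1.0, 1.05, 2.01):
    print(ratio, replica_gain(60.0, ratio, decay_alpha=0.15))
```

Because the exponent uses the distance in Hz, the same `decay_alpha` attenuates a replica of a high partial far more than the same ratio on a low partial. With these values the 2.01 replica of a 60 Hz partial is about 79 dB down.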

### Text Format (.txt)

Stored in `workspaces/main/mq_samples/`:

```
# MQ Sample: drum_kick.txt
sample_rate 32000
duration 1.5

# Global defaults (optional, can override per partial)
replica_defaults
  decay_alpha 0.1
  jitter 0.05
  spread_above 0.02
  spread_below 0.02
  bandwidth 0.01
end

# Partial 0: fundamental
partial
  # Frequency bezier (seconds, Hz): t0 f0 t1 f1 t2 f2 t3 f3
  freq_curve 0.0 60.0 0.2 58.0 0.8 55.0 1.5 50.0

  # Amplitude bezier (seconds, 0-1): t0 a0 t1 a1 t2 a2 t3 a3
  amp_curve 0.0 0.0 0.05 1.0 0.5 0.3 1.5 0.0

  # Replica frequency ratios
  replicas 1.0 2.01 3.03

  # Override defaults (optional)
  decay_alpha 0.15
  jitter 0.08
  spread_above 0.03
  spread_below 0.01
  bandwidth 0.02
end

# Partial 1: overtone
partial
  freq_curve 0.0 180.0 0.2 178.0 0.8 175.0 1.5 170.0
  amp_curve 0.0 0.0 0.05 0.6 0.5 0.2 1.5 0.0
  replicas 1.0 1.99
end
```
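
A minimal sketch of how this format could be read, assuming one keyword per line, `#` comments, and `end` closing both `replica_defaults` and `partial` blocks. The function name `parse_mq_text` and the returned dict layout are hypothetical; the real editor or code generator may structure this differently.

```python
def parse_mq_text(text):
    """Parse the .txt format into a dict; per-partial values override defaults."""
    sample = {"partials": []}
    defaults = {}
    block = None          # dict currently being filled, or None at top level
    in_defaults = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        key, *args = line.split()
        if key == "replica_defaults":
            block, in_defaults = defaults, True
        elif key == "partial":
            # Start from a copy of the defaults so overrides stay local
            block, in_defaults = dict(defaults), False
        elif key == "end":
            if block is not None and not in_defaults:
                sample["partials"].append(block)
            block, in_defaults = None, False
        elif key == "sample_rate":
            sample[key] = int(args[0])
        elif key == "duration":
            sample[key] = float(args[0])
        elif block is None:
            raise ValueError("unexpected top-level keyword: " + key)
        elif key in ("freq_curve", "amp_curve", "replicas"):
            block[key] = [float(a) for a in args]
        else:                                  # scalar replica parameter
            block[key] = float(args[0])
    return sample
```

Copying the defaults into each partial at parse time keeps the later stages simple, since every partial then carries a complete parameter set.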

### Generated C++ Code

Stored in `src/generated/mq_<name>.cc`:

```cpp
// Auto-generated from mq_samples/drum_kick.txt
// DO NOT EDIT

struct MQBezier {
  float t0, v0, t1, v1, t2, v2, t3, v3;
};

struct MQPartial {
  MQBezier freq;
  MQBezier amp;
  const float* replicas;
  int num_replicas;
  float decay_alpha;
  float jitter;
  float spread_above;
  float spread_below;
  float bandwidth;
};

static const float drum_kick_replicas_0[] = {1.0f, 2.01f, 3.03f};
static const float drum_kick_replicas_1[] = {1.0f, 1.99f};

static const MQPartial drum_kick_partials[] = {
  {
    {0.0f, 60.0f, 0.2f, 58.0f, 0.8f, 55.0f, 1.5f, 50.0f},
    {0.0f, 0.0f, 0.05f, 1.0f, 0.5f, 0.3f, 1.5f, 0.0f},
    drum_kick_replicas_0, 3,
    0.15f, 0.08f, 0.03f, 0.01f, 0.02f
  },
  {
    {0.0f, 180.0f, 0.2f, 178.0f, 0.8f, 175.0f, 1.5f, 170.0f},
    {0.0f, 0.0f, 0.05f, 0.6f, 0.5f, 0.2f, 1.5f, 0.0f},
    drum_kick_replicas_1, 2,
    0.1f, 0.05f, 0.02f, 0.02f, 0.01f
  }
};

struct MQSample {
  int sample_rate;
  float duration;
  const MQPartial* partials;
  int num_partials;
};

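// `extern` gives the const object external linkage so other translation
// units (e.g. audio_init) can reference it.
extern const MQSample ASSET_MQ_DRUM_KICK = {
  32000, 1.5f, drum_kick_partials, 2
};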
```
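
The emitter side of the code generator can be sketched in a few lines. The helper names `c_float`, `emit_bezier`, and `emit_replicas` are hypothetical; the sketch only shows how parsed numbers might be turned into the initializers seen above, not the full generator.

```python
def c_float(x):
    """Format a Python number as a C++ float literal, e.g. 0.2 -> '0.2f'."""
    return repr(float(x)) + "f"

def emit_bezier(values):
    """Emit an MQBezier initializer from 8 numbers (t0 v0 t1 v1 t2 v2 t3 v3)."""
    if len(values) != 8:
        raise ValueError("a cubic bezier needs 8 values")
    return "{" + ", ".join(c_float(v) for v in values) + "}"

def emit_replicas(name, index, ratios):
    """Emit the static replica-ratio array for one partial."""
    body = ", ".join(c_float(r) for r in ratios)
    return f"static const float {name}_replicas_{index}[] = {{{body}}};"

print(emit_replicas("drum_kick", 0, [1.0, 2.01, 3.03]))
print(emit_bezier([0.0, 60.0, 0.2, 58.0, 0.8, 55.0, 1.5, 50.0]))
```

Using `repr` keeps the shortest decimal string that round-trips the value, which keeps the generated source readable.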

---

## McAulay-Quatieri Algorithm

### Phase 1: Peak Detection

STFT with overlapping windows:

```
For each frame (hop = 512 samples):
  1. FFT (size = 2048)
  2. Magnitude spectrum |X[k]|
  3. Detect peaks: local maxima above threshold
  4. Extract (frequency, amplitude, phase) via parabolic interpolation
```

**Parameters:**
- `fft_size`: 2048 (adjustable 1024-4096)
- `hop_size`: 512 (75% overlap)
- `peak_threshold`: -60 dB (adjustable)
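
The peak-picking step can be sketched as follows. This is a simplified illustration, not the editor's implementation: it assumes a Hann window (the design does not name one), uses a naive DFT so that it runs without external libraries, and returns only frequency and amplitude, leaving phase out. The names `magnitude_db` and `detect_peaks` are illustrative.

```python
import cmath
import math

def magnitude_db(frame):
    """Hann-windowed DFT magnitudes in dB for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        mags.append(20 * math.log10(abs(acc) / n + 1e-12))
    return mags

def detect_peaks(mags_db, sample_rate, fft_size, threshold_db=-60.0):
    """Return (frequency_hz, amplitude_db) for local maxima above threshold."""
    peaks = []
    for k in range(1, len(mags_db) - 1):
        a, b, c = mags_db[k - 1], mags_db[k], mags_db[k + 1]
        if b > threshold_db and b > a and b >= c:
            # Parabola through the three points; offset is in bins, within ±0.5
            denom = a - 2 * b + c
            offset = 0.5 * (a - c) / denom if denom != 0 else 0.0
            freq = (k + offset) * sample_rate / fft_size
            amp = b - 0.25 * (a - c) * offset
            peaks.append((freq, amp))
    return peaks

# Usage: a 1030 Hz sine that falls between two DFT bins
sr, n = 8000, 64
frame = [math.sin(2 * math.pi * 1030.0 * i / sr) for i in range(n)]
strongest = max(detect_peaks(magnitude_db(frame), sr, n), key=lambda p: p[1])
print(strongest)
```

Interpolating on the dB magnitudes rather than the linear ones tends to suit windowed sinusoids, because the main lobe of common windows is close to a parabola on a log scale.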

### Phase 2: Trajectory Tracking

Link peaks across frames into continuous partials:

```
Birth/Death/Continuation model:
  - Continuation: match a peak to an existing partial if |f_new - f_old| < threshold
  - Birth: start a new partial when an unmatched peak persists for 2+ frames
  - Death: end a partial when it has no match for 2+ frames
```

**Tracking threshold:** 50 Hz (adjustable). The Phase 1 implementation listed under Status uses a frequency-dependent 5% tolerance instead.
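
A minimal greedy version of the birth/death/continuation model is sketched below. It assumes each frame is a plain list of peak frequencies and uses a fixed Hz threshold. The function name `track_partials` and its parameters are hypothetical, and a production tracker would likely resolve conflicts between competing partials more carefully than this first-come matching does.

```python
def track_partials(frames, max_jump_hz=50.0, max_gap=2):
    """Greedy frame-to-frame peak linking.

    frames: list of lists of peak frequencies (Hz), one list per STFT frame.
    Returns a list of tracks, each a list of (frame_index, frequency).
    """
    active = []      # each entry: {"points": [...], "missed": int}
    finished = []
    for idx, peaks in enumerate(frames):
        unused = list(peaks)
        for track in active:
            last = track["points"][-1][1]
            best = min(unused, key=lambda f: abs(f - last), default=None)
            if best is not None and abs(best - last) < max_jump_hz:
                track["points"].append((idx, best))   # continuation
                track["missed"] = 0
                unused.remove(best)
            else:
                track["missed"] += 1
        # Death: no match for max_gap consecutive frames
        finished += [t for t in active if t["missed"] >= max_gap]
        active = [t for t in active if t["missed"] < max_gap]
        # Birth: every unmatched peak starts a candidate track
        active += [{"points": [(idx, f)], "missed": 0} for f in unused]
    finished += active
    # Keep only candidates that persisted for at least 2 frames
    return [t["points"] for t in finished if len(t["points"]) >= 2]

frames = [[100.0, 500.0], [102.0, 498.0], [104.0], [106.0, 900.0]]
for track in track_partials(frames):
    print(track)
```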

### Phase 3: Bezier Curve Fitting

Fit cubic bezier to each partial's trajectory:

```
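Input: N samples [(τ1, y1), (τ2, y2), ..., (τN, yN)] of one trajectory
Output: 4 control points (t0, v0) ... (t3, v3) minimizing least-squares error

Algorithm:
  1. Fix endpoints: (t0, v0) = first sample, (t3, v3) = last sample
  2. Solve for the inner control points (t1, v1), (t2, v2) by linear least squares
  3. Run once for the frequency trajectory and once for the amplitude trajectory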
```

**Error threshold:** Auto-fit to minimize control points (future: user-adjustable simplification)
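
With the endpoints pinned, the fit reduces to a 2×2 linear system for the two inner control values. The sketch below assumes the curve parameter is linear in time, which matches how `eval_bezier` normalizes `t`, so the inner control times are placed at 1/3 and 2/3 of the span. The function name `fit_bezier_values` is hypothetical, and at least two samples with distinct times are required.

```python
def fit_bezier_values(times, values):
    """Least-squares fit of the two inner control values of a cubic bezier.

    Endpoints are pinned to the first and last samples.
    Returns [t0, v0, t1, v1, t2, v2, t3, v3].
    """
    t0, t3 = times[0], times[-1]
    v0, v3 = values[0], values[-1]
    span = t3 - t0
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t, v in zip(times, values):
        u = (t - t0) / span
        b0, b1 = (1 - u) ** 3, 3 * (1 - u) ** 2 * u
        b2, b3 = 3 * (1 - u) * u ** 2, u ** 3
        y = v - b0 * v0 - b3 * v3      # part the inner points must explain
        s11 += b1 * b1
        s12 += b1 * b2
        s22 += b2 * b2
        r1 += b1 * y
        r2 += b2 * y
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:               # too few samples: fall back to a line
        v1 = v0 + (v3 - v0) / 3
        v2 = v0 + 2 * (v3 - v0) / 3
    else:                              # Cramer's rule on the normal equations
        v1 = (r1 * s22 - r2 * s12) / det
        v2 = (r2 * s11 - r1 * s12) / det
    return [t0, v0, t0 + span / 3, v1, t0 + 2 * span / 3, v2, t3, v3]

# Usage: fit a short downward frequency glide (values in Hz)
print(fit_bezier_values([0.0, 0.5, 1.0, 1.5], [60.0, 57.5, 54.0, 50.0]))
```

Placing the inner times at thirds is what makes the time coordinate of a cubic bezier advance linearly with its parameter. Fitting freely placed control times would turn this into a nonlinear problem.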

---

## Synthesis Model

### Replica Oscillator Bank

For each partial at time `t`:

```python
# Evaluate bezier curves
f0 = eval_bezier(partial.freq_curve, t)
A0 = eval_bezier(partial.amp_curve, t)

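# For each replica offset ratio (phase_acc holds one running phase per
# replica, initialized to 0.0 before the first sample)
for r, ratio in enumerate(partial.replicas):
    # Frequency spread (asymmetric randomization)
    spread = random.uniform(-partial.spread_below, +partial.spread_above)
    f = f0 * ratio * (1.0 + spread)

    # Amplitude decay
    A = A0 * exp(-partial.decay_alpha * abs(f - f0))

    # Phase: integrate the time-varying frequency sample by sample
    # (2*pi*f*t is only correct when f is constant), then add
    # pseudo-random jitter
    phase_acc[r] += 2*pi*f / sample_rate
    phase = phase_acc[r] + partial.jitter * random.uniform(0, 2*pi)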

    # Base sinusoid
    sample += A * sin(phase)

    # Bandwidth-enhanced noise (optional)
    if partial.bandwidth > 0:
        noise_bw = f * partial.bandwidth
        sample += A * bandlimited_noise(f - noise_bw, f + noise_bw)
```

### Bezier Evaluation (Cubic)

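Bernstein polynomial form (gives the same value as De Casteljau's algorithm):

```cpp
// Requires <algorithm> for std::clamp (C++17)
float eval_bezier(const MQBezier& b, float t) {
  // Normalize t to [0, 1]; a zero-length curve evaluates to its first value
  float span = b.t3 - b.t0;
  if (span <= 0.0f) return b.v0;
  float u = std::clamp((t - b.t0) / span, 0.0f, 1.0f);

  // Cubic Bernstein blend of the four control values.
  // Note: u is linear in time, so b.t1 and b.t2 do not affect the result.
  float u1 = 1.0f - u;
  return u1*u1*u1 * b.v0 +
         3*u1*u1*u * b.v1 +
         3*u1*u*u * b.v2 +
         u*u*u * b.v3;
}
```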
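
The closed-form expression above is the Bernstein expansion of the cubic. De Casteljau's algorithm reaches the same value through repeated linear interpolation. A minimal sketch comparing the two (function names are illustrative):

```python
def lerp(a, b, u):
    """Linear interpolation between a and b."""
    return a + (b - a) * u

def de_casteljau(v0, v1, v2, v3, u):
    """Evaluate a cubic bezier by three rounds of linear interpolation."""
    a, b, c = lerp(v0, v1, u), lerp(v1, v2, u), lerp(v2, v3, u)
    d, e = lerp(a, b, u), lerp(b, c, u)
    return lerp(d, e, u)

def bernstein(v0, v1, v2, v3, u):
    """Same curve in the expanded polynomial form."""
    w = 1.0 - u
    return w*w*w*v0 + 3*w*w*u*v1 + 3*w*u*u*v2 + u*u*u*v3

# Compare both forms on the kick drum's frequency control values
for u in (0.0, 0.25, 0.5, 1.0):
    print(u, de_casteljau(60.0, 58.0, 55.0, 50.0, u),
          bernstein(60.0, 58.0, 55.0, 50.0, u))
```

The polynomial form needs fewer operations per sample. De Casteljau is usually preferred when a curve has to be subdivided, for example during curve simplification.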

### Baking Process (C++)

```cpp
// At audio_init() time
void synth_bake_mq(const MQSample& sample, std::vector<float>& pcm_out) {
  const int num_samples = (int)(sample.sample_rate * sample.duration);
  pcm_out.assign(num_samples, 0.0f);
  const float two_pi = 2.0f * (float)M_PI;

  for (int p = 0; p < sample.num_partials; ++p) {
    const MQPartial& partial = sample.partials[p];
    // One running phase per replica: integrating the frequency keeps the
    // pitch correct while f0 changes over time (2*pi*f*t would not).
    std::vector<float> phase(partial.num_replicas, 0.0f);

    for (int i = 0; i < num_samples; ++i) {
      float t = (float)i / sample.sample_rate;
      float f0 = eval_bezier(partial.freq, t);
      float A0 = eval_bezier(partial.amp, t);

      for (int r = 0; r < partial.num_replicas; ++r) {
        float ratio = partial.replicas[r];

        // Frequency spread (deterministic: seeded by sample, partial, replica)
        uint32_t seed = (uint32_t)i * 12345u + (uint32_t)p * 67890u + (uint32_t)r;
        float spread = rand_float(seed, -partial.spread_below, partial.spread_above);
        float f = f0 * ratio * (1.0f + spread);

        // Amplitude decay
        float A = A0 * expf(-partial.decay_alpha * fabsf(f - f0));

        // Phase jitter
        float jitter = rand_float(seed + 1, 0.0f, 1.0f) * partial.jitter;
        pcm_out[i] += A * sinf(phase[r] + jitter * two_pi);

        // Advance the phase and wrap it to keep float precision
        phase[r] = fmodf(phase[r] + two_pi * f / sample.sample_rate, two_pi);

        // TODO: bandwidth-enhanced noise
      }
    }
  }
}
```

---

## Web Editor

### UI Layout

```
┌─────────────────────────────────────────────────────┐
│ [Load WAV] [Load .txt] [Save .txt] [Export C++]    │
├─────────────────────────────────────────────────────┤
│ MQ Extraction Params:                               │
│   FFT Size: [2048▼]  Hop: [512]  Threshold: [-60dB]│
│   [Extract Partials] [Re-extract]                   │
├─────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────┐ │
│ │                                                 │ │
│ │  Time-Frequency Canvas                          │ │
│ │  - Spectrogram background                       │ │
│ │  - Bezier curves (colored per partial)          │ │
│ │  - Draggable control points (circles)           │ │
│ │                                                 │ │
│ └─────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────┤
│ Selected Partial: [0▼]  [Add Point] [Remove Point] │
│   Replicas: [1.0, 2.01, 3.03] [Edit]               │
│   Decay α: [0.15]  Jitter: [0.08]                  │
│   Spread+: [3%]  Spread-: [1%]  Bandwidth: [2%]    │
├─────────────────────────────────────────────────────┤
│ Playback: [▶ Original] [▶ Synthesized] [▶ Both]    │
│ Time: [━━━━━━━━━━━━━━━━━━━━━━━] 0.0s / 1.5s        │
└─────────────────────────────────────────────────────┘
```

### Features

**Phase 1 (Extraction):**
- Load WAV, run MQ algorithm, visualize partials
- Real-time parameter adjustment (FFT size, threshold, tracking)

**Phase 2 (Synthesis Preview):**
- JS implementation of full synthesis pipeline
- Playback original vs. synthesized audio (Web Audio API)

**Phase 3 (Editing):**
- Drag control points to adjust curves
- Add/remove control points (future: auto-simplification)
- Per-partial replica configuration

**Phase 4 (Export):**
- Save `.txt` format (human-readable)
- Generate C++ code (copy-paste or auto-commit)

---

## C++ Integration

### File Organization

```
workspaces/main/
  mq_samples/
    drum_kick.txt
    piano_c4.txt
    synth_pad.txt

src/generated/
  mq_drum_kick.cc    # Auto-generated
  mq_piano_c4.cc
  mq_synth_pad.cc

src/audio/
  mq_synth.h         # Bezier eval, baking API
  mq_synth.cc
```

### Asset Registration

Add to `workspaces/main/assets.txt`:

```
MQ_DRUM_KICK, NONE, mq_samples/drum_kick.txt, "MQ kick drum"
```

Build system:
1. Detect `.txt` changes → trigger code generator
2. Compile generated `.cc` → link into demo
3. `ASSET_MQ_DRUM_KICK` available in code

### Tracker Integration

```cpp
// Register MQ samples at init
void audio_init() {
  synth_register_mq_sample(SAMPLE_ID_KICK, &ASSET_MQ_DRUM_KICK);
  synth_register_mq_sample(SAMPLE_ID_PIANO, &ASSET_MQ_PIANO_C4);
}

// Trigger from pattern
void pattern_callback(int sample_id, float volume) {
  synth_trigger_mq(sample_id, volume);
  // Future: pitch modulation, time stretch
}
```

---

## Implementation Roadmap

### Phase 1: MQ Extraction (Web)
**Goal:** Load WAV → Extract partials → Visualize trajectories
**Deliverables:**
- `tools/mq_editor/index.html` (basic UI)
- `tools/mq_editor/mq_extract.js` (FFT + peak tracking + bezier fitting)
- `tools/mq_editor/render.js` (canvas visualization)

**Timeline:** 1-2 weeks

### Phase 2: JS Synthesizer
**Goal:** Preview synthesized audio in browser
**Deliverables:**
- `tools/mq_editor/mq_synth.js` (replica oscillator bank)
- Web Audio API integration (playback comparison)

**Timeline:** 1 week

### Phase 3: Web Editor UI
**Goal:** Full editing workflow
**Deliverables:**
- Draggable control points (canvas interaction)
- Per-partial replica sliders
- Save/load `.txt` format

**Timeline:** 1-2 weeks

### Phase 4: C++ Code Generator
**Goal:** `.txt` → generated `.cc` code
**Deliverables:**
- `tools/mq_codegen.py` (parser + C++ emitter)
- Build system integration (CMake hook)

**Timeline:** 3-5 days

### Phase 5: C++ Synthesis
**Goal:** Bake PCM at demo init
**Deliverables:**
- `src/audio/mq_synth.{h,cc}` (bezier eval, oscillator bank)
- Integration with AudioEngine/tracker

**Timeline:** 1 week

### Phase 6: Optimization
**Goal:** GPU baking, quantization, size reduction
**Deliverables:**
- Compute shader for parallel synthesis
- Quantized bezier control points (f16 or i16)
- Curve simplification algorithm

**Timeline:** 2-3 weeks (future work)

---

## Future Enhancements

### Short-Term (Post-MVP)
- **Pitch modulation:** `synth_trigger_mq(sample_id, volume, pitch_ratio)`
- **Time stretch:** Adjust bezier time domain dynamically
- **Amplitude modulation:** LFO/envelope override

### Medium-Term
- **GPU synthesis:** Compute shader for baked PCM (parallel oscillators)
- **Curve simplification:** Iterative control point reduction (error tolerance)
- **Quantization:** f32 → f16/i16 control points (~50% size reduction)
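
The size effect of f16 quantization can be checked with Python's `struct` module, whose `e` format is IEEE 754 half precision. This is only an illustration of the storage trade-off; the helper names are hypothetical and the C++ side would need its own half-float decoding.

```python
import struct

def pack_f32(values):
    """Pack values as little-endian 32-bit floats."""
    return struct.pack("<%df" % len(values), *values)

def pack_f16(values):
    """Pack values as little-endian half-precision floats (11-bit significand)."""
    return struct.pack("<%de" % len(values), *values)

def unpack_f16(blob):
    """Decode a blob produced by pack_f16 back into Python floats."""
    return list(struct.unpack("<%de" % (len(blob) // 2), blob))

curve = [0.0, 60.0, 0.2, 58.0, 0.8, 55.0, 1.5, 50.0]
full, half = pack_f32(curve), pack_f16(curve)
errors = [abs(a - b) for a, b in zip(curve, unpack_f16(half))]
print(len(full), "bytes as f32,", len(half), "bytes as f16")
print("max round-trip error:", max(errors))
```

Half precision has a step of 2 Hz between 2048 and 4096 Hz and coarser above that, so high-frequency control points may need a different encoding (for example i16 with a per-curve scale) if that step turns out to be audible.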

### Long-Term
- **Hybrid synthesis:** MQ partials + noise residual (stochastic component)
- **Real-time synthesis:** Per-chunk fillBuffer() instead of baked PCM
- **Segmented beziers:** Multi-segment curves for complex trajectories

---

## References

- McAulay, R. J., & Quatieri, T. F. (1986). "Speech analysis/synthesis based on a sinusoidal representation." IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(4), 744-754.
- Serra, X., & Smith, J. O. (1990). "Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition." Computer Music Journal, 14(4), 12-24.
- De Casteljau's algorithm: https://en.wikipedia.org/wiki/De_Casteljau%27s_algorithm

---

## Status

- [x] Design document
- [x] Phase 1: MQ extraction (Web)
  - [x] FFT-based peak detection with parabolic interpolation
  - [x] Frequency-dependent trajectory tracking (5% tolerance, candidate system)
  - [x] Cubic bezier curve fitting for freq/amp trajectories
  - [x] Spectrogram visualization with zoom/scroll/playhead
  - [x] Original WAV playback
- [ ] Phase 2: JS synthesizer
- [ ] Phase 3: Web editor UI
- [ ] Phase 4: C++ code generator
- [ ] Phase 5: C++ synthesis + integration
- [ ] Phase 6: GPU optimization