# To-Do List

This file tracks prioritized tasks with detailed attack plans.

**Note:** For a history of recently completed tasks, see `COMPLETED.md`.

## Priority 1: Audio Pipeline Simplification & Jitter Fix (Task #71) [COMPLETED]

**Goal**: Address audio jittering in the miniaudio backend and simplify the entire audio pipeline (Synth, Tracker, AudioEngine, AudioBackend) for better maintainability and performance.

**Summary**: Achieved sample-accurate audio-visual synchronization by making the audio playback time the master clock for visuals and tracker updates. Eliminated jitter by using a stable audio clock for scheduling. See `HANDOFF_2026-02-07_Final.md` for details.

### Phase 1: Jitter Analysis & Fix
- [x] **Investigate**: Deep dive into `miniaudio_backend.cc` to find the root cause of audio jitter. Analyze buffer sizes, callback timing, and thread synchronization.
- [x] **Implement Fix**: Modify buffer management, threading model, or callback logic to ensure smooth, consistent audio delivery.
- [x] **Verify**: Create a new, specific test case in `src/tests/test_audio_backend.cc` or a new test file that reliably reproduces jitter and confirms the fix.

### Phase 2: Code Simplification & Refactor
- [x] **Review Architecture**: Map out the current interactions between `Synth`, `Tracker`, `AudioEngine`, and `AudioBackend`.
- [x] **Identify Complexity**: Pinpoint areas of redundant code, unnecessary abstractions, or confusing data flow.
- [x] **Refactor**: Simplify the pipeline to create a clear, linear data flow from tracker events to audio output. Reduce dependencies and clarify ownership of resources.
- [x] **Update Documentation**: Modify `doc/HOWTO.md` and `doc/CONTRIBUTING.md` to reflect the new, simpler audio architecture.

---

## Priority 1: Spectral Brush Editor (Task #5) [IN PROGRESS]

**Goal:** Create a web-based tool for procedurally tracing audio spectrograms. Replaces large `.spec` binary assets with tiny procedural C++ code (50-100× compression).

**Design Document:** See `doc/SPECTRAL_BRUSH_EDITOR.md` for complete architecture.

**Core Concept: "Spectral Brush"**
- **Central Curve** (Bezier): Traces time-frequency path through spectrogram
- **Vertical Profile**: Shapes "brush stroke" around curve (Gaussian, Decaying Sinusoid, Noise)

**Workflow:**
```
.wav → Load in editor → Trace with Bezier curves → Export procedural_params.txt + C++ code
```

### Phase 1: C++ Runtime (Foundation)
- [ ] **Files:** `src/audio/spectral_brush.h`, `src/audio/spectral_brush.cc`
- [ ] Define API (`ProfileType`, `draw_bezier_curve()`, `evaluate_profile()`)
- [ ] Implement linear Bezier interpolation
- [ ] Implement Gaussian profile evaluation
- [ ] Implement home-brew deterministic RNG (for future noise support)
- [ ] Add unit tests (`src/tests/test_spectral_brush.cc`)
- [ ] **Deliverable:** Compiles, tests pass
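
The Phase 1 runtime could be sketched roughly as follows. Only the names `ProfileType`, `evaluate_profile()` and `draw_bezier_curve()` come from the list above; the signatures, the `ControlPoint` struct, the row-major spectrogram layout, and reading "linear Bezier" as straight segments between control points are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Only the names ProfileType, evaluate_profile() and draw_bezier_curve()
// come from the task list; every signature and struct here is an assumption.
enum class ProfileType { Gaussian };

struct ControlPoint {
  float frame;      // time position, in spectrogram frames
  float bin;        // frequency position, in bins
  float amplitude;  // peak amplitude of the stroke at this point
};

// Weight of a bin located `distance` bins away from the curve.
float evaluate_profile(ProfileType type, float distance, float sigma) {
  assert(sigma > 0.0f);
  (void)type;  // only the Gaussian profile exists in this sketch
  return std::exp(-(distance * distance) / (2.0f * sigma * sigma));
}

// Adds a brush stroke to `spec` (row-major, num_frames x num_bins).
// "Linear Bezier" is taken to mean straight segments between control points.
void draw_bezier_curve(std::vector<float>& spec, int num_frames, int num_bins,
                       const std::vector<ControlPoint>& pts,
                       ProfileType type, float sigma) {
  assert(spec.size() == static_cast<std::size_t>(num_frames) * num_bins);
  for (std::size_t i = 0; i + 1 < pts.size(); ++i) {
    const ControlPoint& a = pts[i];
    const ControlPoint& b = pts[i + 1];
    const int f0 = static_cast<int>(a.frame);
    const int f1 = static_cast<int>(b.frame);
    // Shared end points are drawn once: only the last segment includes f1.
    const int last = (i + 2 == pts.size()) ? f1 : f1 - 1;
    for (int f = f0; f <= last; ++f) {
      if (f < 0 || f >= num_frames) continue;
      const float t = (f1 > f0) ? float(f - f0) / float(f1 - f0) : 0.0f;
      const float center = a.bin + t * (b.bin - a.bin);
      const float amp = a.amplitude + t * (b.amplitude - a.amplitude);
      for (int k = 0; k < num_bins; ++k) {
        spec[static_cast<std::size_t>(f) * num_bins + k] +=
            amp * evaluate_profile(type, float(k) - center, sigma);
      }
    }
  }
}
```

Strokes are accumulated additively, so several curves can be layered into the same buffer. A production version would probably limit the inner loop to a few sigma around the curve instead of visiting every bin.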

### Phase 2: Editor Core
- [ ] **Files:** `tools/spectral_editor/index.html`, `script.js`, `style.css`, `dct.js` (reuse from old editor)
- [ ] HTML structure (canvas, controls, file input)
- [ ] Canvas rendering (dual-layer: reference + procedural)
- [ ] Bezier curve editor (click to place, drag to adjust, delete control points)
- [ ] Profile controls (Gaussian sigma slider)
- [ ] Real-time spectrogram rendering
- [ ] Audio playback (IDCT → Web Audio API)
- [ ] Undo/Redo system (action history with snapshots)
- [ ] **Keyboard shortcuts:**
  - Key '1': Play procedural sound
  - Key '2': Play original .wav
  - Space: Play/pause
  - Ctrl+Z: Undo
  - Ctrl+Shift+Z: Redo
  - Delete: Remove control point
- [ ] **Deliverable:** Interactive editor, can trace .wav files

### Phase 3: File I/O
- [ ] Load .wav (decode, FFT/STFT → spectrogram)
- [ ] Load .spec (binary format parser)
- [ ] Save procedural_params.txt (human-readable, re-editable)
- [ ] Generate C++ code (ready to compile)
- [ ] Load procedural_params.txt (re-editing workflow)
- [ ] **Deliverable:** Full save/load cycle works

### Phase 4: Future Extensions (Post-MVP)
- [ ] Cubic Bezier interpolation (smoother curves)
- [ ] Decaying sinusoid profile (metallic sounds)
- [ ] Noise profile (textured sounds)
- [ ] Composite profiles (add/subtract/multiply)
- [ ] Multi-dimensional Bezier ({freq, amplitude, decay, ...})
- [ ] Frequency snapping (snap to musical notes)
- [ ] Generic `gen_from_params()` code generation

**Design Decisions:**
- Linear Bezier interpolation (Phase 1), cubic later
- Soft parameter limits in UI (not enforced)
- Home-brew RNG (small, deterministic)
- Single function per sound (generic loader later)
- Start with Bezier + Gaussian only

**Size Impact:** 50-100× compression (5 KB .spec → ~100 bytes C++ code)

---

## Priority 2: Audio Pipeline Streamlining (Task #72) [COMPLETED - February 8, 2026]

**Goal**: Optimize the audio pipeline to reduce memory copies and simplify the data flow by using direct additive mixing and deferred clipping.

- [x] **Phase 1: Direct Additive Mixing**
  - Added `get_write_region()` / `commit_write()` API to ring buffer
  - Refactored `audio_render_ahead()` to write directly to ring buffer
  - Eliminated temporary buffer allocations (zero heap allocations per frame)
  - Removed one memory copy operation (temp → ring buffer)
- [x] **Phase 2: Float32 Internal Pipeline**
  - Verified entire pipeline maintains float32 precision (no changes needed)
- [x] **Phase 3: Final Clipping & Conversion**
  - Implemented in-place clipping in `audio_render_ahead()` (clamps to [-1.0, 1.0])
  - Applied to both primary and wrap-around render paths
- [x] **Phase 4: Verification**
  - All 31 tests pass ✅
  - WAV dump test confirms no clipping detected
  - Binary size: 5.0M stripped (expected -150 to -300 bytes from eliminating new/delete)
  - Zero audio quality regressions
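
For readers unfamiliar with the pattern, a two-phase write with direct additive mixing and deferred clipping might look like the sketch below. It is not the code in `src/audio/ring_buffer.cc`: only the names `get_write_region()` and `commit_write()` come from this task, while the signatures, the single-threaded `RingBuffer` class, and the `write_mixed()` helper are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-threaded ring buffer; signatures are assumptions.
class RingBuffer {
 public:
  explicit RingBuffer(std::size_t capacity) : data_(capacity, 0.0f) {
    assert(capacity > 0);
  }

  // Phase 1: expose the largest contiguous writable span (<= max_samples).
  float* get_write_region(std::size_t max_samples, std::size_t* out_count) {
    const std::size_t free_total = data_.size() - size_;
    const std::size_t until_wrap = data_.size() - write_pos_;
    std::size_t n = max_samples;
    if (n > free_total) n = free_total;
    if (n > until_wrap) n = until_wrap;
    *out_count = n;
    return data_.data() + write_pos_;
  }

  // Phase 2: publish `count` samples written into the region.
  void commit_write(std::size_t count) {
    assert(count <= data_.size() - size_);
    write_pos_ = (write_pos_ + count) % data_.size();
    size_ += count;
  }

  std::size_t read(float* dst, std::size_t count) {
    const std::size_t n = count < size_ ? count : size_;
    for (std::size_t i = 0; i < n; ++i)
      dst[i] = data_[(read_pos_ + i) % data_.size()];
    read_pos_ = (read_pos_ + n) % data_.size();
    size_ -= n;
    return n;
  }

  std::size_t available() const { return size_; }

 private:
  std::vector<float> data_;
  std::size_t write_pos_ = 0, read_pos_ = 0, size_ = 0;
};

// Mixes two constant "voices" straight into the ring buffer (no temporary
// buffer), then clamps in place. The loop covers the wrap-around region too.
std::size_t write_mixed(RingBuffer& rb, std::size_t samples, float a, float b) {
  std::size_t written = 0;
  while (written < samples) {
    std::size_t n = 0;
    float* region = rb.get_write_region(samples - written, &n);
    if (n == 0) break;  // buffer full
    for (std::size_t i = 0; i < n; ++i) {
      float s = 0.0f;
      s += a;  // additive mixing, voice by voice
      s += b;
      if (s > 1.0f) s = 1.0f;  // deferred clipping to [-1.0, 1.0]
      if (s < -1.0f) s = -1.0f;
      region[i] = s;
    }
    rb.commit_write(n);
    written += n;
  }
  return written;
}
```

Because the producer writes into the buffer's own storage, the only copy left is the one made by the consumer when it reads.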

**Files Modified:**
- `src/audio/ring_buffer.h` - Added two-phase write API
- `src/audio/ring_buffer.cc` - Implemented get_write_region() / commit_write()
- `src/audio/audio.cc` - Refactored audio_render_ahead() for direct writes + clipping

**See:** `/Users/skal/.claude/plans/fizzy-strolling-rossum.md` for detailed implementation plan

---

## Priority 2: 3D System Enhancements (Task #18)
**Goal:** Establish a pipeline for importing complex 3D scenes to replace hardcoded geometry.


## Priority 3: WGSL Modularization (Task #50) [RECURRENT]

**Goal**: Refactor `ShaderComposer` and WGSL assets to support granular, reusable snippets and `#include` directives. This is an ongoing task to maintain shader code hygiene as new features are added.


## Phase 2: Size Optimization (Final Goal)

- [ ] **Task #34: Full STL Removal**: Replace all remaining `std::vector`, `std::map`, and `std::string` usage with custom minimal containers or C-style arrays to allow for CRT replacement. (Minimal Priority - deferred to end).

- [ ] **Task #22: Windows Native Platform**: Replace GLFW with direct Win32 API calls for the final 64k push.

- [ ] **Task #28: Spectrogram Quantization**: Research optimal frequency bin distribution and implement quantization.

- [ ] **Task #35: CRT Replacement**: Investigate and implement a CRT-free entry point.

## Future Goals & Ideas (Untriaged)

### Audio Tools
- [ ] **Task #64: specplay Enhancements**: Extend audio analysis tool with new features
  - **Priority 1**: Spectral visualization (ASCII art), waveform display, frequency analysis, dynamic range
  - **Priority 2**: Diff mode (compare .wav vs .spec), batch mode (CSV report, find clipping)
  - **Priority 3**: WAV export (.spec → .wav), normalization
  - **Priority 4**: Spectral envelope, harmonic analysis, onset detection
  - **Priority 5**: Interactive mode (seek, loop, volume control)
  - See `tools/specplay_README.md` for detailed feature list

- [ ] **Task #65: Data-Driven Tempo Control**: Move tempo variation from code to data files
  - **Current**: `g_tempo_scale` is hardcoded in `main.cc` with manual animation curves
  - **Goal**: Define tempo curves in `.seq` or `.track` files for data-driven tempo control
  - **Approach A**: Add TEMPO directive to `.seq` format
    - Example: `TEMPO 0.0 1.0`, `TEMPO 10.0 2.0`, `TEMPO 20.0 1.0` (time, scale pairs)
    - seq_compiler generates tempo curve array in timeline.cc
  - **Approach B**: Add tempo column to music.track
    - Each pattern trigger can specify tempo_scale override
    - tracker_compiler generates tempo events in music_data.cc
  - **Benefits**: Non-programmers can edit tempo, easier iteration, version control friendly
  - **Priority**: Low (current hardcoded approach works, but less flexible)
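
As a rough illustration of Approach A, the generated tempo curve could be a sorted array of (time, scale) keys evaluated at runtime. The `TempoKey` struct, the `tempo_scale_at()` helper, and the choice of linear interpolation between keys are assumptions; the task only specifies the `TEMPO` pairs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical output of seq_compiler for:
//   TEMPO 0.0 1.0, TEMPO 10.0 2.0, TEMPO 20.0 1.0
struct TempoKey {
  float time;   // seconds
  float scale;  // tempo multiplier
};

static const TempoKey kTempoCurve[] = {
    {0.0f, 1.0f}, {10.0f, 2.0f}, {20.0f, 1.0f}};

// Returns the tempo scale at time t. Keys must be sorted by time.
// Linear interpolation between keys is an assumption of this sketch.
float tempo_scale_at(const TempoKey* keys, std::size_t count, float t) {
  assert(keys != nullptr && count > 0);
  if (t <= keys[0].time) return keys[0].scale;
  for (std::size_t i = 1; i < count; ++i) {
    if (t < keys[i].time) {
      const float span = keys[i].time - keys[i - 1].time;
      const float u = span > 0.0f ? (t - keys[i - 1].time) / span : 0.0f;
      return keys[i - 1].scale + u * (keys[i].scale - keys[i - 1].scale);
    }
  }
  return keys[count - 1].scale;  // hold the last value past the end
}
```

With a table like this, the main loop could replace the hand-written animation of `g_tempo_scale` with a single lookup per frame.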

- [ ] **Task #67: DCT/FFT Performance Benchmarking**: Add timing measurements to audio tests
  - **Goal**: Compare performance of different DCT/IDCT implementations
  - **Location**: Add timing code to `test_dct.cc` or `test_fft.cc`
  - **Measurements**:
    - Reference IDCT/FDCT (naive O(N²) implementation)
    - FFT-based DCT/IDCT (current O(N log N) implementation)
    - Future x86_64 SIMD-optimized versions (when implemented)
  - **Output Format**:
    - Average time per transform (microseconds)
    - Throughput (transforms per second)
    - Speedup factor vs reference implementation
  - **Test Sizes**: DCT_SIZE=512 (production), plus 128, 256, 1024 for scaling analysis
  - **Implementation**:
    - Use `std::chrono::high_resolution_clock` for timing
    - Run each test 1000+ iterations to reduce noise
    - Report min/avg/max times
    - Guard with `#if !defined(STRIP_ALL)` to avoid production overhead
  - **Benefits**: Quantify FFT speedup, validate SIMD optimizations, identify regressions
  - **Priority**: Very Low (nice-to-have for future optimization work)
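
A timing helper along the lines described above might look like this. `BenchResult`, `benchmark()` and `report()` are hypothetical names; the only parts taken from the task are `std::chrono::high_resolution_clock`, the min/avg/max report, and the `STRIP_ALL` guard.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <limits>

#if !defined(STRIP_ALL)
// Hypothetical helper types and names, for illustration only.
struct BenchResult {
  double min_us;
  double avg_us;
  double max_us;
  int iterations;
};

// Runs fn() `iterations` times and records per-call wall time in microseconds.
template <typename Fn>
BenchResult benchmark(Fn&& fn, int iterations) {
  assert(iterations > 0);
  using Clock = std::chrono::high_resolution_clock;
  BenchResult r = {std::numeric_limits<double>::max(), 0.0, 0.0, iterations};
  double total_us = 0.0;
  for (int i = 0; i < iterations; ++i) {
    const Clock::time_point t0 = Clock::now();
    fn();
    const Clock::time_point t1 = Clock::now();
    const double us =
        std::chrono::duration<double, std::micro>(t1 - t0).count();
    if (us < r.min_us) r.min_us = us;
    if (us > r.max_us) r.max_us = us;
    total_us += us;
  }
  r.avg_us = total_us / iterations;
  return r;
}

// Prints min/avg/max, throughput, and speedup against a reference average.
void report(const char* name, const BenchResult& r, double reference_avg_us) {
  const double per_sec = r.avg_us > 0.0 ? 1e6 / r.avg_us : 0.0;
  const double speedup = r.avg_us > 0.0 ? reference_avg_us / r.avg_us : 0.0;
  std::printf("%s: min %.3f us, avg %.3f us, max %.3f us, %.0f/s, x%.2f\n",
              name, r.min_us, r.avg_us, r.max_us, per_sec, speedup);
}
#endif  // !defined(STRIP_ALL)
```

Timing each call separately keeps min and max meaningful, at the cost of clock overhead per iteration. For very fast transforms it may be preferable to time a batch of calls and divide.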

- [ ] **Task #69: Convert Audio Pipeline to Clipped Int16**: Use clipped int16 for all audio processing
  - **Current**: Audio pipeline uses float32 throughout (generation, mixing, synthesis, output)
  - **Goal**: Convert to clipped int16 for faster/easier processing and reduced memory footprint
  - **Rationale**:
    - Simpler arithmetic (no float operations)
    - Smaller memory footprint (2 bytes vs 4 bytes per sample)
    - Hardware-native format (most audio devices use int16)
    - Eliminates float→int16 conversion at output stage
    - Fixed output range built into the format (note: plain integer overflow wraps rather than clips, so saturation must be applied explicitly)
  - **Scope**:
    - Output path: Definitely convert (backends, WAV dump)
    - Synthesis: Consider keeping float32 for quality (IDCT produces float)
    - Mixing: Could use int16 with proper overflow handling
    - Asset storage: Already int16 in .spec files
  - **Implementation Phases**:
    1. **Phase 1: Output Only** (Minimal change, ~50 lines)
       - Convert `synth_render()` output from float to int16
       - Update `MiniaudioBackend` and `WavDumpBackend` to accept int16
       - Keep all internal processing as float
       - **Benefit**: Eliminates final conversion step
    2. **Phase 2: Mixing Stage** (Moderate change, ~200 lines)
       - Convert voice mixing to int16 arithmetic
       - Add saturation/clipping logic
       - Keep IDCT output as float, convert after synthesis
       - **Benefit**: Faster mixing, reduced memory bandwidth
    3. **Phase 3: Full Pipeline** (Large change, ~500+ lines)
       - Convert spectrograms from float to int16 storage
       - Modify IDCT to output int16 directly
       - All synthesis in int16
       - **Benefit**: Maximum size reduction and performance
  - **Trade-offs**:
    - Quality loss: 16-bit resolution vs 32-bit float precision
    - Dynamic range: Limited to [-32768, 32767]
    - Clipping: Must handle overflow carefully in mixing stage
    - Code complexity: Saturation arithmetic more complex than float
  - **Testing Requirements**:
    - Verify no audible quality degradation
    - Ensure clipping behavior matches float version
    - Check mixing overflow doesn't cause artifacts
    - Validate that WAV dumps are bit-identical to hardware output
  - **Size Impact**:
    - Phase 1: Negligible (~50 bytes)
    - Phase 2: Small reduction (~100-200 bytes, faster code)
    - Phase 3: Large reduction (50% memory, ~1-2KB code savings)
  - **Priority**: Low (final optimization, after size budget is tight)
  - **Notes**:
    - This is a FINAL optimization task, only if 64k budget requires it
    - Quality must be validated - may not be worth the trade-off
    - Consider keeping float for procedural generation quality
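
To make the overflow concern concrete, here is a small sketch of the two operations the phases above depend on: an explicit float to int16 conversion with clamping, and a saturating int16 mix. The function names are hypothetical, and the scale factor of 32767 is one common convention, not necessarily the one this project uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Converts a float sample in [-1.0, 1.0] to int16, clamping out-of-range
// input first. The 32767 scale factor is an assumption of this sketch.
inline int16_t float_to_int16(float s) {
  if (s > 1.0f) s = 1.0f;
  if (s < -1.0f) s = -1.0f;
  return static_cast<int16_t>(s * 32767.0f);
}

// Saturating add: sum in 32 bits, then clamp. A plain int16 add would wrap
// around on overflow instead of clipping.
inline int16_t mix_saturate(int16_t a, int16_t b) {
  int32_t sum = static_cast<int32_t>(a) + static_cast<int32_t>(b);
  if (sum > INT16_MAX) sum = INT16_MAX;
  if (sum < INT16_MIN) sum = INT16_MIN;
  return static_cast<int16_t>(sum);
}

// Mixes `src` into `dst` in place with saturation (Phase 2 style mixing).
void mix_buffers(int16_t* dst, const int16_t* src, std::size_t n) {
  assert(dst != nullptr && src != nullptr);
  for (std::size_t i = 0; i < n; ++i) dst[i] = mix_saturate(dst[i], src[i]);
}
```

Clamping after every addition is not equivalent to the float pipeline's single clamp at the end: an intermediate sum that saturates and is then pulled back by a later voice gives a different result. That is one reason the "clipping behavior matches float version" check matters.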

### Developer Tools
- [ ] **Task #66: External Asset Loading for Debugging**: mmap() asset files instead of embedded data
  - **Current**: All assets embedded in `assets_data.cc` (regenerate on every asset change)
  - **Goal**: Load assets from external files in debug builds for faster iteration
  - **Scope**: macOS only, non-STRIP_ALL builds only
  - **Implementation**:
    - Add `DEMO_ENABLE_EXTERNAL_ASSETS` CMake option
    - Modify `GetAsset()` to check for external file first (e.g., `assets/final/<name>`)
    - Use `mmap()` to map file into memory (replaces `uint8_t asset[]` array)
    - Fallback to embedded data if file not found
  - **Benefits**: Edit shaders/assets without regenerating assets_data.cc (~10s rebuild)
  - **Trade-offs**: Adds runtime file I/O, only useful during development
  - **Priority**: Low (current workflow acceptable, but nice-to-have for rapid iteration)
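
The lookup-then-fallback logic could be sketched with POSIX calls as below. `AssetView`, `get_asset_with_fallback()` and `release_asset()` are hypothetical names and do not reflect the actual `GetAsset()` signature. In the real build this code would presumably sit behind a preprocessor guard derived from the `DEMO_ENABLE_EXTERNAL_ASSETS` option and be excluded from `STRIP_ALL` builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical view type: points either at a mapped file or at embedded data.
struct AssetView {
  const uint8_t* data;
  size_t size;
  bool mapped;  // true if `data` came from mmap() and must be unmapped
};

// Tries the external file first, falls back to the embedded array otherwise.
AssetView get_asset_with_fallback(const char* path, const uint8_t* embedded,
                                  size_t embedded_size) {
  assert(path != nullptr);
  const AssetView fallback = {embedded, embedded_size, false};
  const int fd = open(path, O_RDONLY);
  if (fd < 0) return fallback;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {
    close(fd);
    return fallback;
  }
  const size_t size = static_cast<size_t>(st.st_size);
  void* p = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping remains valid after the descriptor is closed
  if (p == MAP_FAILED) return fallback;
  const AssetView view = {static_cast<const uint8_t*>(p), size, true};
  return view;
}

// Unmaps file-backed assets; embedded assets need no cleanup.
void release_asset(const AssetView& v) {
  if (v.mapped) munmap(const_cast<uint8_t*>(v.data), v.size);
}
```

Note that a mapped file is not NUL-terminated, so any consumer that treats shader source as a C string would need the size passed along or a terminated copy.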

### Visual Effects
- [ ] **Task #52: Procedural SDF Font**: Minimal bezier/spline set for [A-Z, 0-9] and SDF rendering.
- [ ] **Task #55: SDF Random Planes Intersection**: Implement `sdPolyhedron` (crystal/gem shapes) via plane intersection.
- [ ] **Task #54: Tracy Integration**: Integrate the Tracy profiler for performance profiling.
- [ ] **Task #58: Advanced Shader Factorization**: Further factorize WGSL code into smaller, reusable snippets.
- [ ] **Task #59: Comprehensive RNG Library**: Add WGSL snippets for float/vec2/vec3 noise (Perlin, Gyroid, etc.) and random number generators.
- [ ] **Task #60: OOP Refactoring**: Investigate if more C++ code can be made object-oriented without size penalty (vs functional style).
- [ ] **Task #61: GPU Procedural Generation**: Implement system to generate procedural data (textures, geometry) on GPU and read back to CPU.
- [ ] **Task #62: Physics Engine Enhancements (PBD & Rotation)**:
    - [ ] **Task #62.1: Quaternion Rotation**: Implement quaternion-based rotation for `Object3D` and incorporate angular momentum into physics.
    - [ ] **Task #62.2: Position Based Dynamics (PBD)**: Refactor solver to re-evaluate velocity after resolving all collisions and constraints.
- [ ] **Task #63: Refactor large files**: Split `src/gpu/gpu.cc`, `src/3d/visual_debug.cc` and `src/gpu/effect.cc` into sub-functionalities. (`src/3d/renderer.cc` was also over 500 lines and has already been split.)

### Performance Optimization
- [ ] **Task #70: SIMD x86_64 Implementation**: Implement critical functions using intrinsics for x86_64 platforms.
  - **Goal**: Optimize hot paths for audio and procedural generation.
  - **Scope**:
    - IDCT/FDCT transforms
    - Audio mixing and voice synthesis
    - CPU-side procedural texture/geometry generation
  - **Constraint**: Non-critical; fallback to generic C++ must be maintained.
  - **Priority**: Very Low
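
The fallback constraint could be met with a compile-time switch, as in this sketch of an additive mixing loop. `mix_add()`, `mix_add_generic()` and the `MIX_USE_SSE` macro are hypothetical names; the intrinsics used (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`) are standard SSE.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define MIX_USE_SSE 1  // hypothetical switch, for this sketch only
#endif

// Generic C++ path: always available, also handles the SIMD tail.
void mix_add_generic(float* dst, const float* src, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) dst[i] += src[i];
}

// dst[i] += src[i], four floats at a time when SSE is available.
void mix_add(float* dst, const float* src, std::size_t n) {
  assert(dst != nullptr && src != nullptr);
#if defined(MIX_USE_SSE)
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    const __m128 d = _mm_loadu_ps(dst + i);  // unaligned loads: no
    const __m128 s = _mm_loadu_ps(src + i);  // alignment requirement
    _mm_storeu_ps(dst + i, _mm_add_ps(d, s));
  }
  mix_add_generic(dst + i, src + i, n - i);  // remaining 0..3 samples
#else
  mix_add_generic(dst, src, n);
#endif
}
```

Keeping the generic function as the reference also gives the tests something to compare the SIMD path against.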

---

## Future Goals