# To-Do List

This file tracks prioritized tasks with detailed attack plans.

**Note:** For a history of recently completed tasks, see `COMPLETED.md`.

## Recently Completed (February 9, 2026)

- [x] **Uniform Buffer Alignment (Task #74)**: Fixed WGSL struct alignment issues across multiple shaders:
  - `circle_mask_compute.wgsl`: Changed `_pad: vec3<f32>` to three `f32` fields
  - `fade_effect.cc`: Changed EffectParams padding from `vec3<f32>` to `_pad0/1/2: f32`
  - `theme_modulation_effect.cc`: Same padding fix for EffectParams
  - Fixed ODR violation in `demo_effects.h` (incomplete FadeEffect forward declaration)
  - Renamed shadowing `uniforms_` members to `common_uniforms_`/`flash_uniforms_`
  - Result: demo64k runs without crashes, 33/33 tests passing (100%)

## Previously Completed (February 8, 2026)

- [x] **Shader Parametrization System**: Full uniform parameter system with .seq syntax support. FlashEffect now supports color/decay parameters with per-frame animation. See `COMPLETED.md` for details.
- [x] **ChromaAberrationEffect Parametrization**: Added offset_scale and angle parameters. Supports diagonal and vertical aberration modes via .seq syntax.
- [x] **GaussianBlurEffect Parametrization**: Added strength parameter. Replaces hardcoded blur radius with configurable value.

---

## Priority 1: Uniform Buffer Alignment (Task #74) [COMPLETED - February 9, 2026]

**Goal**: Fix WebGPU uniform buffer size/padding/alignment mismatches between C++ structs and WGSL shaders.

**Root Cause**: WGSL `vec3<f32>` has 16-byte alignment even though its size is 12 bytes. A `vec3<f32>` padding field is therefore placed at the next 16-byte boundary, so the WGSL struct ends up larger than the matching C++ struct.

**Fixes Applied**:
- `circle_mask_compute.wgsl`: Changed `_pad: vec3<f32>` to three separate `f32` fields
  - Before: 24+ bytes in WGSL, 16 bytes in C++
  - After: 16 bytes in both
- Verified all shaders use individual `f32` fields for padding (no `vec3` in padding)

**Results**:
- ✅ demo64k: Runs with **0 WebGPU validation errors**
- ✅ Test suite: **32/33 tests passing (97%)**
- ❌ DemoEffectsTest: SEGFAULT in wgpu_native library (unrelated to alignment fixes)

**Key Lesson**: Never use `vec3<f32>` for padding in WGSL uniform structs. Always use individual `f32` fields to ensure predictable alignment.
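
As an illustration of the lesson above, here is a minimal sketch of the C++ side of such a struct. The struct name `CircleMaskParamsExample` and its `radius` field are hypothetical, since the real field lists are not reproduced in this file; only the three-`f32` padding pattern comes from the fix itself.

```cpp
#include <cassert>
#include <cstddef>

// Assumed WGSL counterpart (hypothetical fields, for illustration only):
//   struct CircleMaskParamsExample {
//     radius: f32,
//     _pad0: f32,
//     _pad1: f32,
//     _pad2: f32,
//   };
// With `_pad: vec3<f32>` instead, WGSL would place the padding at offset 16
// (vec3<f32> is 16-byte aligned), so the struct would no longer be 16 bytes.
struct CircleMaskParamsExample {
  float radius;
  float _pad0;
  float _pad1;
  float _pad2;
};

// Compile-time checks that the C++ layout is the 16 bytes the shader expects.
static_assert(sizeof(CircleMaskParamsExample) == 16,
              "must match the 16-byte WGSL struct");
static_assert(offsetof(CircleMaskParamsExample, _pad0) == 4,
              "padding starts right after radius");
```

The `static_assert` lines turn a silent runtime mismatch into a build failure, which is the same idea Task #75 proposes to apply to every uniform struct.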

---

## Priority 1: WGSL Uniform Buffer Validation & Consolidation (Task #75)

**Goal**: Prevent alignment bugs by consolidating uniform buffer patterns and creating automated validation.

**Background**: Task #74 showed that WGSL `vec3<f32>` is 16-byte aligned even though it occupies only 12 bytes, so a `vec3<f32>` padding field shifts offsets and grows the struct beyond what the C++ side expects. A systematic approach is needed to prevent recurrence.

**Attack Plan**:

### Phase 1: Audit & Document (1-2 hours)
- [ ] **1.1**: Audit all WGSL shaders for uniform struct definitions
  - List all uniform structs, their sizes, and padding strategies
  - Identify inconsistencies (vec3 padding vs individual f32 fields)
  - Document in `doc/UNIFORM_BUFFER_GUIDELINES.md`
- [ ] **1.2**: Audit C++ struct definitions (CommonPostProcessUniforms, etc.)
  - Verify static_assert size checks exist for all uniform structs
  - Check for missing size validation

### Phase 2: Consolidation (2-3 hours)
- [ ] **2.1**: Standardize on CommonUniforms pattern
  - All post-process effects should use CommonPostProcessUniforms for binding 2
  - Effect-specific params at binding 3 (16 or 32 bytes, properly padded)
- [ ] **2.2**: Eliminate `vec3<f32>` in padding fields
  - Replace all `_pad: vec3<f32>` with `_pad0/1/2: f32`
  - Apply to: FadeEffect, ThemeModulationEffect, any other effects
- [ ] **2.3**: Add C++ wrapper structs with static_assert
  - Every WGSL uniform struct should have matching C++ struct
  - All structs require `static_assert(sizeof(...) == EXPECTED_SIZE)`

### Phase 3: Validation Tool (3-4 hours)
- [ ] **3.1**: Create `tools/validate_uniforms.py`
  - Parse WGSL shader files for uniform struct definitions
  - Calculate expected size using WGSL alignment rules:
    - `f32`: 4-byte aligned
    - `vec2<f32>`: 8-byte aligned
    - `vec3<f32>`: **16-byte aligned** (not 12!)
    - `vec4<f32>`: 16-byte aligned
    - Struct size: rounded up to a multiple of the largest member alignment
- [ ] **3.2**: Parse C++ headers for matching structs
  - Extract expected sizes from `static_assert(sizeof(...) == N)` statements
  - Match WGSL struct names to C++ struct names
- [ ] **3.3**: Report mismatches
  - Exit non-zero if C++ size != WGSL size
  - Print detailed alignment breakdown for debugging
- [ ] **3.4**: Integrate into CI/build system
  - Add CMake custom command to run validation
  - Fail build if validation fails (development builds only)
  - Add to `scripts/check_all.sh`
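
The size calculation in step 3.1 can be sketched as follows. The planned tool is a Python script; this C++ version only illustrates the arithmetic, and every name in it (`WgslType`, `layout_of`, `wgsl_struct_size`) is hypothetical. It covers only the four types listed above; nested structs and arrays, which have extra rules in the uniform address space, are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The four member types from the alignment list above.
enum class WgslType { F32, Vec2F, Vec3F, Vec4F };

struct TypeLayout {
  std::size_t align;  // required alignment in bytes
  std::size_t size;   // occupied size in bytes
};

// Alignment and size per type. Note vec3<f32>: align 16, size 12.
constexpr TypeLayout layout_of(WgslType type) {
  switch (type) {
    case WgslType::F32:   return {4, 4};
    case WgslType::Vec2F: return {8, 8};
    case WgslType::Vec3F: return {16, 12};
    case WgslType::Vec4F: return {16, 16};
  }
  return {4, 4};  // unreachable for valid enum values
}

// Round `value` up to the next multiple of `align`.
constexpr std::size_t round_up(std::size_t value, std::size_t align) {
  return (value + align - 1) / align * align;
}

// Walk the members in declaration order, aligning each one, then round the
// total up to a multiple of the largest member alignment.
std::size_t wgsl_struct_size(const std::vector<WgslType>& members) {
  std::size_t offset = 0;
  std::size_t max_align = 1;
  for (WgslType member : members) {
    const TypeLayout layout = layout_of(member);
    offset = round_up(offset, layout.align) + layout.size;
    if (layout.align > max_align) max_align = layout.align;
  }
  return round_up(offset, max_align);
}
```

For example, `wgsl_struct_size({WgslType::F32, WgslType::Vec3F})` yields 32, while four `F32` members yield 16, which is the mismatch Task #74 ran into.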

### Phase 4: Documentation (1 hour)
- [ ] **4.1**: Write `doc/UNIFORM_BUFFER_GUIDELINES.md`
  - Explain WGSL alignment rules (with examples)
  - Document standard patterns (CommonUniforms, effect params)
  - Show correct padding techniques
  - Add examples of common mistakes
- [ ] **4.2**: Update CONTRIBUTING.md
  - Add "Uniform Buffer Checklist" section
  - Require validation tool passes before commit

**Size Impact**: Negligible (consolidation may save 50-100 bytes)

**Priority**: High (prevents entire class of subtle bugs)

**Dependencies**: None

---

## Priority 1: Spectral Brush Editor (Task #5) [IN PROGRESS]

**Goal:** Create a web-based tool for procedurally tracing audio spectrograms. Replaces large `.spec` binary assets with tiny procedural C++ code (50-100× compression).

**Design Document:** See `doc/SPECTRAL_BRUSH_EDITOR.md` for complete architecture.

**Core Concept: "Spectral Brush"**
- **Central Curve** (Bezier): Traces time-frequency path through spectrogram
- **Vertical Profile**: Shapes "brush stroke" around curve (Gaussian, Decaying Sinusoid, Noise)

**Workflow:**
```
.wav → Load in editor → Trace with Bezier curves → Export procedural_params.txt + C++ code
```

### Phase 1: C++ Runtime (Foundation)
- [ ] **Files:** `src/audio/spectral_brush.h`, `src/audio/spectral_brush.cc`
- [ ] Define API (`ProfileType`, `draw_bezier_curve()`, `evaluate_profile()`)
- [ ] Implement linear Bezier interpolation
- [ ] Implement Gaussian profile evaluation
- [ ] Implement home-brew deterministic RNG (for future noise support)
- [ ] Add unit tests (`src/tests/test_spectral_brush.cc`)
- [ ] **Deliverable:** Compiles, tests pass
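
A rough sketch of what the Phase 1 building blocks could look like is given below. `ProfileType` and `evaluate_profile()` are names from the planned API, but their signatures here are guesses; `ControlPoint`, `bezier_linear_freq()`, `Rng`, and `rng_next()` are hypothetical. The RNG shown is xorshift32, picked only as one small deterministic option.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical control point on the time-frequency curve.
struct ControlPoint {
  float frame;    // time, in spectrogram frames
  float freq_hz;  // frequency at that time
};

// Linear Bezier between neighbouring control points: B(t) = (1-t)*P0 + t*P1.
// Points must be sorted by frame; the value is clamped outside the range.
float bezier_linear_freq(const ControlPoint* pts, std::size_t count, float frame) {
  if (count == 0) return 0.0f;
  if (frame <= pts[0].frame) return pts[0].freq_hz;
  for (std::size_t i = 1; i < count; ++i) {
    if (frame <= pts[i].frame) {
      const float span = pts[i].frame - pts[i - 1].frame;
      const float t = span > 0.0f ? (frame - pts[i - 1].frame) / span : 0.0f;
      return pts[i - 1].freq_hz + t * (pts[i].freq_hz - pts[i - 1].freq_hz);
    }
  }
  return pts[count - 1].freq_hz;
}

enum class ProfileType { Gaussian };

// Vertical profile: weight of a bin at `distance` from the curve.
// Gaussian: 1 on the curve, falling off with width `sigma`.
float evaluate_profile(ProfileType type, float distance, float sigma) {
  (void)type;  // only Gaussian exists in this sketch
  return std::exp(-(distance * distance) / (2.0f * sigma * sigma));
}

// xorshift32: tiny deterministic RNG. The state must be non-zero.
struct Rng {
  std::uint32_t state;
};

std::uint32_t rng_next(Rng& rng) {
  std::uint32_t x = rng.state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  rng.state = x;
  return x;
}
```

Drawing a stroke would then amount to calling `bezier_linear_freq()` once per frame and weighting the bins around the result with `evaluate_profile()`.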

### Phase 2: Editor Core
- [ ] **Files:** `tools/spectral_editor/index.html`, `script.js`, `style.css`, `dct.js` (reuse from old editor)
- [ ] HTML structure (canvas, controls, file input)
- [ ] Canvas rendering (dual-layer: reference + procedural)
- [ ] Bezier curve editor (click to place, drag to adjust, delete control points)
- [ ] Profile controls (Gaussian sigma slider)
- [ ] Real-time spectrogram rendering
- [ ] Audio playback (IDCT → Web Audio API)
- [ ] Undo/Redo system (action history with snapshots)
- [ ] **Keyboard shortcuts:**
  - Key '1': Play procedural sound
  - Key '2': Play original .wav
  - Space: Play/pause
  - Ctrl+Z: Undo
  - Ctrl+Shift+Z: Redo
  - Delete: Remove control point
- [ ] **Deliverable:** Interactive editor, can trace .wav files

### Phase 3: File I/O
- [ ] Load .wav (decode, FFT/STFT → spectrogram)
- [ ] Load .spec (binary format parser)
- [ ] Save procedural_params.txt (human-readable, re-editable)
- [ ] Generate C++ code (ready to compile)
- [ ] Load procedural_params.txt (re-editing workflow)
- [ ] **Deliverable:** Full save/load cycle works

### Phase 4: Future Extensions (Post-MVP)
- [ ] Cubic Bezier interpolation (smoother curves)
- [ ] Decaying sinusoid profile (metallic sounds)
- [ ] Noise profile (textured sounds)
- [ ] Composite profiles (add/subtract/multiply)
- [ ] Multi-dimensional Bezier ({freq, amplitude, decay, ...})
- [ ] Frequency snapping (snap to musical notes)
- [ ] Generic `gen_from_params()` code generation

**Design Decisions:**
- Linear Bezier interpolation (Phase 1), cubic later
- Soft parameter limits in UI (not enforced)
- Home-brew RNG (small, deterministic)
- Single function per sound (generic loader later)
- Start with Bezier + Gaussian only

**Size Impact:** 50-100× compression (5 KB .spec → ~100 bytes C++ code)

---

## Priority 2: 3D System Enhancements (Task #18)

**Goal:** Establish a pipeline for importing complex 3D scenes to replace hardcoded geometry.

**Progress:** The C++ pipeline for loading and processing object-specific data (like plane_distance) is now in place. Shader integration for SDFs is pending.


## Priority 3: WGSL Modularization (Task #50) [RECURRENT]

**Goal**: Refactor `ShaderComposer` and WGSL assets to support granular, reusable snippets and `#include` directives. This is an ongoing task to maintain shader code hygiene as new features are added.


## Phase 2: Size Optimization (Final Goal)

- [ ] **Task #34: Full STL Removal**: Replace all remaining `std::vector`, `std::map`, and `std::string` usage with custom minimal containers or C-style arrays to allow for CRT replacement. (Minimal Priority - deferred to end).

- [ ] **Task #22: Windows Native Platform**: Replace GLFW with direct Win32 API calls for the final 64k push.

- [ ] **Task #28: Spectrogram Quantization**: Research optimal frequency bin distribution and implement quantization.

- [ ] **Task #35: CRT Replacement**: Investigate and implement a CRT-free entry point.

## Future Goals & Ideas (Untriaged)

### Audio Tools
- [ ] **Task #64: specplay Enhancements**: Extend audio analysis tool with new features
  - **Priority 1**: Spectral visualization (ASCII art), waveform display, frequency analysis, dynamic range
  - **Priority 2**: Diff mode (compare .wav vs .spec), batch mode (CSV report, find clipping)
  - **Priority 3**: WAV export (.spec → .wav), normalization
  - **Priority 4**: Spectral envelope, harmonic analysis, onset detection
  - **Priority 5**: Interactive mode (seek, loop, volume control)
  - See `tools/specplay_README.md` for detailed feature list

- [ ] **Task #65: Data-Driven Tempo Control**: Move tempo variation from code to data files
  - **Current**: `g_tempo_scale` is hardcoded in `main.cc` with manual animation curves
  - **Goal**: Define tempo curves in `.seq` or `.track` files for data-driven tempo control
  - **Approach A**: Add TEMPO directive to `.seq` format
    - Example: `TEMPO 0.0 1.0`, `TEMPO 10.0 2.0`, `TEMPO 20.0 1.0` (time, scale pairs)
    - seq_compiler generates tempo curve array in timeline.cc
  - **Approach B**: Add tempo column to music.track
    - Each pattern trigger can specify tempo_scale override
    - tracker_compiler generates tempo events in music_data.cc
  - **Benefits**: Non-programmers can edit tempo, easier iteration, version control friendly
  - **Priority**: Low (current hardcoded approach works, but less flexible)
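
To make Approach A more concrete, here is a sketch of the kind of table `seq_compiler` might emit for the three `TEMPO` lines in the example, together with a lookup function. `TempoPoint`, `kTempoCurve`, and `tempo_scale_at()` are hypothetical names, and linear interpolation between points is an assumption; a stepped curve would be equally consistent with the description.

```cpp
#include <cassert>
#include <cstddef>

// One (time, scale) pair from a TEMPO directive.
struct TempoPoint {
  float time;   // seconds
  float scale;  // tempo multiplier
};

// What the generated array could look like for:
//   TEMPO 0.0 1.0 / TEMPO 10.0 2.0 / TEMPO 20.0 1.0
static const TempoPoint kTempoCurve[] = {
    {0.0f, 1.0f},
    {10.0f, 2.0f},
    {20.0f, 1.0f},
};

// Evaluate the curve at `time`, interpolating linearly between points and
// holding the first/last value outside the covered range.
float tempo_scale_at(const TempoPoint* curve, std::size_t count, float time) {
  if (count == 0) return 1.0f;  // no directives: neutral tempo
  if (time <= curve[0].time) return curve[0].scale;
  for (std::size_t i = 1; i < count; ++i) {
    if (time <= curve[i].time) {
      const float span = curve[i].time - curve[i - 1].time;
      const float t = span > 0.0f ? (time - curve[i - 1].time) / span : 0.0f;
      return curve[i - 1].scale + t * (curve[i].scale - curve[i - 1].scale);
    }
  }
  return curve[count - 1].scale;
}
```

With this in place, the main loop would query the curve each frame instead of animating `g_tempo_scale` by hand.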

- [ ] **Task #67: DCT/FFT Performance Benchmarking**: Add timing measurements to audio tests
  - **Goal**: Compare performance of different DCT/IDCT implementations
  - **Location**: Add timing code to `test_dct.cc` or `test_fft.cc`
  - **Measurements**:
    - Reference IDCT/FDCT (naive O(N²) implementation)
    - FFT-based DCT/IDCT (current O(N log N) implementation)
    - Future x86_64 SIMD-optimized versions (when implemented)
  - **Output Format**:
    - Average time per transform (microseconds)
    - Throughput (transforms per second)
    - Speedup factor vs reference implementation
  - **Test Sizes**: DCT_SIZE=512 (production), plus 128, 256, 1024 for scaling analysis
  - **Implementation**:
    - Use `std::chrono::high_resolution_clock` for timing
    - Run each test 1000+ iterations to reduce noise
    - Report min/avg/max times
    - Guard with `#if !defined(STRIP_ALL)` to avoid production overhead
  - **Benefits**: Quantify FFT speedup, validate SIMD optimizations, identify regressions
  - **Priority**: Very Low (nice-to-have for future optimization work)
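
The timing approach described for Task #67 could be sketched like this. `TimingStats` and `benchmark()` are hypothetical helpers, not existing test utilities; the clock, the min/avg/max reporting, and the `STRIP_ALL` guard follow the implementation notes above.

```cpp
#include <cassert>
#include <chrono>

#if !defined(STRIP_ALL)
// Timing summary for one benchmarked callable, in microseconds.
struct TimingStats {
  double min_us;
  double avg_us;
  double max_us;
  int iterations;
};

// Run `fn` `iterations` times and record per-call wall-clock time.
template <typename Fn>
TimingStats benchmark(Fn&& fn, int iterations) {
  using Clock = std::chrono::high_resolution_clock;
  TimingStats stats = {0.0, 0.0, 0.0, iterations};
  if (iterations <= 0) {
    stats.iterations = 0;
    return stats;
  }
  double total_us = 0.0;
  for (int i = 0; i < iterations; ++i) {
    const Clock::time_point start = Clock::now();
    fn();
    const Clock::time_point end = Clock::now();
    const double us =
        std::chrono::duration<double, std::micro>(end - start).count();
    if (i == 0 || us < stats.min_us) stats.min_us = us;
    if (i == 0 || us > stats.max_us) stats.max_us = us;
    total_us += us;
  }
  stats.avg_us = total_us / iterations;
  return stats;
}
#endif  // !defined(STRIP_ALL)
```

The speedup factor is then the reference implementation's `avg_us` divided by the FFT-based implementation's `avg_us`, and throughput is `1e6 / avg_us` transforms per second.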

- [ ] **Task #69: Convert Audio Pipeline to Clipped Int16**: Use clipped int16 for all audio processing
  - **Current**: Audio pipeline uses float32 throughout (generation, mixing, synthesis, output)
  - **Goal**: Convert to clipped int16 for faster/easier processing and reduced memory footprint
  - **Rationale**:
    - Simpler arithmetic (no float operations)
    - Smaller memory footprint (2 bytes vs 4 bytes per sample)
    - Hardware-native format (most audio devices use int16)
    - Eliminates float→int16 conversion at output stage
    - Fixed sample range makes clipping explicit (note: plain int16 arithmetic wraps on overflow, so clipping requires saturation logic)
  - **Scope**:
    - Output path: Definitely convert (backends, WAV dump)
    - Synthesis: Consider keeping float32 for quality (IDCT produces float)
    - Mixing: Could use int16 with proper overflow handling
    - Asset storage: Already int16 in .spec files
  - **Implementation Phases**:
    1. **Phase 1: Output Only** (Minimal change, ~50 lines)
       - Convert `synth_render()` output from float to int16
       - Update `MiniaudioBackend` and `WavDumpBackend` to accept int16
       - Keep all internal processing as float
       - **Benefit**: Eliminates final conversion step
    2. **Phase 2: Mixing Stage** (Moderate change, ~200 lines)
       - Convert voice mixing to int16 arithmetic
       - Add saturation/clipping logic
       - Keep IDCT output as float, convert after synthesis
       - **Benefit**: Faster mixing, reduced memory bandwidth
    3. **Phase 3: Full Pipeline** (Large change, ~500+ lines)
       - Convert spectrograms from float to int16 storage
       - Modify IDCT to output int16 directly
       - All synthesis in int16
       - **Benefit**: Maximum size reduction and performance
  - **Trade-offs**:
    - Quality loss: 16-bit resolution vs 32-bit float precision
    - Dynamic range: Limited to [-32768, 32767]
    - Clipping: Must handle overflow carefully in mixing stage
    - Code complexity: Saturation arithmetic more complex than float
  - **Testing Requirements**:
    - Verify no audible quality degradation
    - Ensure clipping behavior matches float version
    - Check mixing overflow doesn't cause artifacts
    - Validate WAV dumps bit-identical to hardware output
  - **Size Impact**:
    - Phase 1: Negligible (~50 bytes)
    - Phase 2: Small reduction (~100-200 bytes, faster code)
    - Phase 3: Large reduction (50% memory, ~1-2KB code savings)
  - **Priority**: Low (final optimization, after size budget is tight)
  - **Notes**:
    - This is a FINAL optimization task, only if 64k budget requires it
    - Quality must be validated - may not be worth the trade-off
    - Consider keeping float for procedural generation quality
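
The saturation logic mentioned for Phases 1 and 2 of Task #69 might look like the sketch below. `mix_saturate()` and `float_to_int16()` are hypothetical helper names, and the float scaling factor of 32767 is an assumption about the conversion convention.

```cpp
#include <cassert>
#include <cstdint>

// Mix two int16 samples in a wider type, then clamp to the int16 range.
// Adding in int32 avoids the wrap-around a plain int16 sum would produce.
std::int16_t mix_saturate(std::int16_t a, std::int16_t b) {
  std::int32_t sum = static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);
  if (sum > INT16_MAX) sum = INT16_MAX;
  if (sum < INT16_MIN) sum = INT16_MIN;
  return static_cast<std::int16_t>(sum);
}

// Convert a float sample (nominal range [-1, 1]) to int16 with clipping.
// NaN input is treated as silence.
std::int16_t float_to_int16(float sample) {
  if (sample != sample) sample = 0.0f;  // NaN check
  if (sample > 1.0f) sample = 1.0f;
  if (sample < -1.0f) sample = -1.0f;
  const float scaled = sample * 32767.0f;
  // Round to nearest by adding half a step away from zero before truncating.
  return static_cast<std::int16_t>(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

Phase 1 would only need `float_to_int16()` at the output of `synth_render()`; `mix_saturate()` becomes relevant once voice mixing itself moves to int16 in Phase 2.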

### Developer Tools
- [ ] **Task #66: External Asset Loading for Debugging**: mmap() asset files instead of embedded data
  - **Current**: All assets are embedded in `assets_data.cc`, which must be regenerated on every asset change
  - **Goal**: Load assets from external files in debug builds for faster iteration
  - **Scope**: macOS only, non-STRIP_ALL builds only
  - **Implementation**:
    - Add `DEMO_ENABLE_EXTERNAL_ASSETS` CMake option
    - Modify `GetAsset()` to check for external file first (e.g., `assets/final/<name>`)
    - Use `mmap()` to map file into memory (replaces `uint8_t asset[]` array)
    - Fallback to embedded data if file not found
  - **Benefits**: Edit shaders/assets without regenerating assets_data.cc (~10s rebuild)
  - **Trade-offs**: Adds runtime file I/O, only useful during development
  - **Priority**: Low (current workflow acceptable, but nice-to-have for rapid iteration)
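
A possible shape for the external-file path of Task #66, using the POSIX `open`/`fstat`/`mmap` calls. `MappedAsset`, `map_external_asset()`, `unmap_external_asset()`, and `get_asset_or_embedded()` are hypothetical names; how this would hook into the real `GetAsset()` is not shown and is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct MappedAsset {
  const std::uint8_t* data;  // nullptr when nothing is available
  std::size_t size;
  bool is_mapped;            // true if `data` came from mmap()
};

// Map a file read-only. Returns {nullptr, 0, false} if it cannot be mapped.
MappedAsset map_external_asset(const char* path) {
  MappedAsset result = {nullptr, 0, false};
  const int fd = open(path, O_RDONLY);
  if (fd < 0) return result;
  struct stat st;
  if (fstat(fd, &st) == 0 && st.st_size > 0) {
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* mem = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mem != MAP_FAILED) {
      result.data = static_cast<const std::uint8_t*>(mem);
      result.size = size;
      result.is_mapped = true;
    }
  }
  close(fd);  // the mapping stays valid after the descriptor is closed
  return result;
}

// Release a mapping created by map_external_asset(); no-op otherwise.
void unmap_external_asset(const MappedAsset& asset) {
  if (asset.is_mapped) {
    munmap(const_cast<std::uint8_t*>(asset.data), asset.size);
  }
}

// Prefer the external file; fall back to the embedded array if it is missing.
MappedAsset get_asset_or_embedded(const char* path,
                                  const std::uint8_t* embedded,
                                  std::size_t embedded_size) {
  const MappedAsset external = map_external_asset(path);
  if (external.is_mapped) return external;
  return MappedAsset{embedded, embedded_size, false};
}
```

In a real build this would sit behind the `DEMO_ENABLE_EXTERNAL_ASSETS` option so that release builds keep using only the embedded data.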

### Visual Effects
- [ ] **Task #73: Extend Shader Parametrization** [IN PROGRESS - 2/4 complete]
  - **Goal**: Extend uniform parameter system to ChromaAberrationEffect, GaussianBlurEffect, DistortEffect, SolarizeEffect
  - **Pattern**: Follow FlashEffect implementation (UniformHelper, params struct, .seq syntax)
  - **Completed**: ChromaAberrationEffect (offset_scale, angle), GaussianBlurEffect (strength)
  - **Remaining**: DistortEffect, SolarizeEffect
  - **Priority**: Medium (quality-of-life improvement for artists)
  - **Estimated Impact**: ~200-300 bytes per effect
- [ ] **Task #52: Procedural SDF Font**: Minimal bezier/spline set for [A-Z, 0-9] and SDF rendering.
- [ ] **Task #55: SDF Random Planes Intersection**: Implement `sdPolyhedron` (crystal/gem shapes) via plane intersection.
- [ ] **Task #54: Tracy Integration**: Integrate the Tracy profiler for performance analysis.
- [ ] **Task #58: Advanced Shader Factorization**: Further factorize WGSL code into smaller, reusable snippets.
- [ ] **Task #59: Comprehensive RNG Library**: Add WGSL snippets for float/vec2/vec3 noise (Perlin, Gyroid, etc.) and random number generators.
- [ ] **Task #60: OOP Refactoring**: Investigate if more C++ code can be made object-oriented without size penalty (vs functional style).
- [ ] **Task #61: GPU Procedural Generation**: Implement system to generate procedural data (textures, geometry) on GPU and read back to CPU.
- [ ] **Task #62: Physics Engine Enhancements (PBD & Rotation)**:
    - [ ] **Task #62.1: Quaternion Rotation**: Implement quaternion-based rotation for `Object3D` and incorporate angular momentum into physics.
    - [ ] **Task #62.2: Position Based Dynamics (PBD)**: Refactor solver to re-evaluate velocity after resolving all collisions and constraints.
- [ ] **Task #63: Refactor large files**: Split `src/gpu/gpu.cc`, `src/3d/visual_debug.cc` and `src/gpu/effect.cc` into smaller units by functionality. (`src/3d/renderer.cc` was also over 500 lines and has already been dealt with.)

### Performance Optimization
- [ ] **Task #70: SIMD x86_64 Implementation**: Implement critical functions using intrinsics for x86_64 platforms.
  - **Goal**: Optimize hot paths for audio and procedural generation.
  - **Scope**:
    - IDCT/FDCT transforms
    - Audio mixing and voice synthesis
    - CPU-side procedural texture/geometry generation
  - **Constraint**: Non-critical; fallback to generic C++ must be maintained.
  - **Priority**: Very Low
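
The "intrinsics with generic fallback" constraint of Task #70 could be structured as in this sketch for a gain-and-accumulate mixing loop. `mix_add()` and the `MIX_USE_SSE` macro are hypothetical; the SSE intrinsics themselves are the standard ones from `<xmmintrin.h>`.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define MIX_USE_SSE 1
#endif

// dst[i] += src[i] * gain for `count` samples.
// Uses SSE for blocks of four floats where available; the scalar loop handles
// the tail and serves as the generic fallback on other platforms.
void mix_add(float* dst, const float* src, std::size_t count, float gain) {
  std::size_t i = 0;
#if defined(MIX_USE_SSE)
  const __m128 gain4 = _mm_set1_ps(gain);
  for (; i + 4 <= count; i += 4) {
    const __m128 s = _mm_loadu_ps(src + i);  // unaligned loads: no alignment
    const __m128 d = _mm_loadu_ps(dst + i);  // requirement on the buffers
    _mm_storeu_ps(dst + i, _mm_add_ps(d, _mm_mul_ps(s, gain4)));
  }
#endif
  for (; i < count; ++i) {
    dst[i] += src[i] * gain;
  }
}
```

Keeping both paths in one function means the same unit tests exercise the SIMD and the generic code, depending only on the build target.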

---

## Future Goals