# To-Do List

This file tracks prioritized tasks with detailed attack plans.

**Note:** For a history of recently completed tasks, see `COMPLETED.md`.

## Recently Completed (February 9, 2026)

- [x] **WGSL Uniform Buffer Validation & Consolidation (Task #75)**:
  - **Standardization**: Refactored `DistortEffect` and others to use `CommonPostProcessUniforms` (binding 2) + `EffectParams` (binding 3).
  - **Validation Tool**: Created `tools/validate_uniforms.py` to parse C++ and WGSL (including embedded strings) and verify size/alignment.
  - **Integration**: Added validation step to CMake build system.
  - **Cleanup**: Renamed generic `EffectParams` to specific names (`FadeParams`, `CircleMaskParams`, etc.) in WGSL and C++.
  - **Documentation**: Added `doc/UNIFORM_BUFFER_GUIDELINES.md` and updated `CONTRIBUTING.md`.

- [x] **Uniform Buffer Alignment (Task #74)**: Fixed WGSL struct alignment issues across multiple shaders:
  - `circle_mask_compute.wgsl`: Changed `_pad: vec3<f32>` to three `f32` fields
  - `fade_effect.cc`: Changed EffectParams padding from `vec3<f32>` to `_pad0/1/2: f32`
  - `theme_modulation_effect.cc`: Same padding fix for EffectParams
  - Fixed ODR violation in `demo_effects.h` (incomplete FadeEffect forward declaration)
  - Renamed shadowing `uniforms_` members to `common_uniforms_`/`flash_uniforms_`
  - Result: demo64k runs without crashes, 33/33 tests passing (100%)

## Previously Completed (February 8, 2026)

- [x] **Shader Parametrization System**: Full uniform parameter system with .seq syntax support. FlashEffect now supports color/decay parameters with per-frame animation. See `COMPLETED.md` for details.
- [x] **ChromaAberrationEffect Parametrization**: Added offset_scale and angle parameters. Supports diagonal and vertical aberration modes via .seq syntax.
- [x] **GaussianBlurEffect Parametrization**: Added strength parameter. Replaces hardcoded blur radius with configurable value.

---

## Priority 1: Uniform Buffer Alignment (Task #74) [COMPLETED - February 9, 2026]

**Goal**: Fix WebGPU uniform buffer size/padding/alignment mismatches between C++ structs and WGSL shaders.

**Root Cause**: WGSL `vec3<f32>` has 16-byte alignment (not 12), causing struct padding mismatches. Using `vec3<f32>` for padding fields created unpredictable struct sizes.

**Fixes Applied**:
- `circle_mask_compute.wgsl`: Changed `_pad: vec3<f32>` to three separate `f32` fields
  - Before: 24+ bytes in WGSL, 16 bytes in C++
  - After: 16 bytes in both
- Verified all shaders use individual `f32` fields for padding (no `vec3` in padding)

**Results**:
- ✅ demo64k: Runs with **0 WebGPU validation errors**
- ✅ Test suite: **32/33 tests passing (97%)**
- ❌ DemoEffectsTest: SEGFAULT in wgpu_native library (unrelated to alignment fixes)

**Key Lesson**: Never use `vec3<f32>` for padding in WGSL uniform structs. Always use individual `f32` fields to ensure predictable alignment.
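
The padding rule can be illustrated with a small C++ mirror struct. This is only a sketch: `CircleMaskParams` is the name used after the Task #75 rename, but the `radius` field and the `check_uniform_buffer_size()` helper are hypothetical and not taken from the project's sources. Under WGSL rules, a struct holding one `f32` followed by `_pad: vec3<f32>` places the `vec3` at offset 16 and rounds the struct up to 32 bytes, whereas four scalar `f32` fields occupy exactly 16 bytes on both sides.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C++ mirror of a WGSL uniform struct (field names are
// illustrative only).
//
// WGSL side, sketched:
//   struct CircleMaskParams {
//     radius: f32,
//     _pad0: f32,
//     _pad1: f32,
//     _pad2: f32,
//   }
struct CircleMaskParams {
  float radius;
  float _pad0;  // three scalar pads instead of one vec3<f32>
  float _pad1;
  float _pad2;
};

// Compile-time checks: the C++ layout must match the 16 bytes WGSL computes.
static_assert(sizeof(CircleMaskParams) == 16,
              "CircleMaskParams must be 16 bytes to match WGSL");
static_assert(offsetof(CircleMaskParams, _pad0) == 4,
              "padding must start right after the first f32");

// Debug-only guard for call sites that size the GPU buffer by hand.
inline void check_uniform_buffer_size(std::size_t buffer_size) {
  assert(buffer_size == sizeof(CircleMaskParams));
  (void)buffer_size;  // silence unused warning when NDEBUG is defined
}
```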

---

## Priority 1: WGSL Uniform Buffer Validation & Consolidation (Task #75) [COMPLETED - February 9, 2026]

**Goal**: Prevent alignment bugs by consolidating uniform buffer patterns and creating automated validation.

**Background**: Task #74 revealed that WGSL `vec3<f32>` is 16-byte aligned even though it occupies only 12 bytes, so shader structs picked up padding that the C++ side did not expect. A systematic approach is needed to prevent recurrence.

**Attack Plan**:

### Phase 1: Audit & Document (1-2 hours)
- [ ] **1.1**: Audit all WGSL shaders for uniform struct definitions
  - List all uniform structs, their sizes, and padding strategies
  - Identify inconsistencies (vec3 padding vs individual f32 fields)
  - Document in `doc/UNIFORM_BUFFER_GUIDELINES.md`
- [ ] **1.2**: Audit C++ struct definitions (CommonPostProcessUniforms, etc.)
  - Verify static_assert size checks exist for all uniform structs
  - Check for missing size validation

### Phase 2: Consolidation (2-3 hours)
- [ ] **2.1**: Standardize on CommonUniforms pattern
  - All post-process effects should use CommonPostProcessUniforms for binding 2
  - Effect-specific params at binding 3 (16 or 32 bytes, properly padded)
- [ ] **2.2**: Eliminate `vec3<f32>` in padding fields
  - Replace all `_pad: vec3<f32>` with `_pad0/1/2: f32`
  - Apply to: FadeEffect, ThemeModulationEffect, any other effects
- [ ] **2.3**: Add C++ wrapper structs with static_assert
  - Every WGSL uniform struct should have matching C++ struct
  - All structs require `static_assert(sizeof(...) == EXPECTED_SIZE)`

### Phase 3: Validation Tool (3-4 hours)
- [ ] **3.1**: Create `tools/validate_uniforms.py`
  - Parse WGSL shader files for uniform struct definitions
  - Calculate expected size using WGSL alignment rules:
    - `f32`: 4-byte aligned, 4 bytes
    - `vec2<f32>`: 8-byte aligned, 8 bytes
    - `vec3<f32>`: **16-byte aligned** (not 12!), 12 bytes
    - `vec4<f32>`: 16-byte aligned, 16 bytes
    - Struct size: rounded up to a multiple of the largest member alignment
- [ ] **3.2**: Parse C++ headers for matching structs
  - Extract `sizeof()` from static_assert statements
  - Match WGSL struct names to C++ struct names
- [ ] **3.3**: Report mismatches
  - Exit non-zero if C++ size != WGSL size
  - Print detailed alignment breakdown for debugging
- [ ] **3.4**: Integrate into CI/build system
  - Add CMake custom command to run validation
  - Fail build if validation fails (development builds only)
  - Add to `scripts/check_all.sh`
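
The size calculation in step 3.1 can be sketched as follows. The validator itself is planned as a Python script; this C++ version only illustrates the arithmetic, and every name in it (`WgslType`, `layout_of`, `wgsl_struct_size`) is hypothetical. It covers only the four scalar and vector types listed above, not nested structs or arrays, which carry additional rules in uniform buffers.

```cpp
#include <cassert>
#include <cstddef>

// The member types covered by the alignment rules listed above.
enum class WgslType { F32, Vec2F, Vec3F, Vec4F };

struct Layout {
  std::size_t align;  // required alignment in bytes
  std::size_t size;   // occupied size in bytes
};

constexpr Layout layout_of(WgslType t) {
  switch (t) {
    case WgslType::F32:   return {4, 4};
    case WgslType::Vec2F: return {8, 8};
    case WgslType::Vec3F: return {16, 12};  // 16-byte aligned, 12 bytes wide
    case WgslType::Vec4F: return {16, 16};
  }
  return {0, 0};  // unreachable for valid enum values
}

constexpr std::size_t round_up(std::size_t value, std::size_t align) {
  return (value + align - 1) / align * align;
}

// Walks the members in declaration order: each member starts at the next
// offset that satisfies its alignment, and the final size is rounded up to
// the largest member alignment.
std::size_t wgsl_struct_size(const WgslType* members, std::size_t count) {
  assert(members != nullptr && count > 0);
  std::size_t offset = 0;
  std::size_t max_align = 1;
  for (std::size_t i = 0; i < count; ++i) {
    const Layout l = layout_of(members[i]);
    offset = round_up(offset, l.align) + l.size;
    if (l.align > max_align) max_align = l.align;
  }
  return round_up(offset, max_align);
}
```

For example, the members `{F32, Vec3F}` yield 32 bytes, while four `F32` members yield 16, which is the kind of mismatch the tool is meant to report.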

### Phase 4: Documentation (1 hour)
- [ ] **4.1**: Write `doc/UNIFORM_BUFFER_GUIDELINES.md`
  - Explain WGSL alignment rules (with examples)
  - Document standard patterns (CommonUniforms, effect params)
  - Show correct padding techniques
  - Add examples of common mistakes
- [ ] **4.2**: Update CONTRIBUTING.md
  - Add "Uniform Buffer Checklist" section
  - Require validation tool passes before commit

**Size Impact**: Negligible (consolidation may save 50-100 bytes)

**Priority**: High (prevents entire class of subtle bugs)

**Dependencies**: None

---

## Priority 1: Spectral Brush Editor (Task #5) [IN PROGRESS]

**Goal:** Create a web-based tool for procedurally tracing audio spectrograms. Replaces large `.spec` binary assets with tiny procedural C++ code (50-100× compression).

**Design Document:** See `doc/SPECTRAL_BRUSH_EDITOR.md` for complete architecture.

**Core Concept: "Spectral Brush"**
- **Central Curve** (Bezier): Traces time-frequency path through spectrogram
- **Vertical Profile**: Shapes "brush stroke" around curve (Gaussian, Decaying Sinusoid, Noise)

**Workflow:**
```
.wav → Load in editor → Trace with Bezier curves → Export procedural_params.txt + C++ code
```

### Phase 1: C++ Runtime (Foundation)
- [ ] **Files:** `src/audio/spectral_brush.h`, `src/audio/spectral_brush.cc`
- [ ] Define API (`ProfileType`, `draw_bezier_curve()`, `evaluate_profile()`)
- [ ] Implement linear Bezier interpolation
- [ ] Implement Gaussian profile evaluation
- [ ] Implement home-brew deterministic RNG (for future noise support)
- [ ] Add unit tests (`src/tests/test_spectral_brush.cc`)
- [ ] **Deliverable:** Compiles, tests pass
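
A minimal sketch of what the Phase 1 runtime could look like is given below. The names `ProfileType`, `draw_bezier_curve()` and `evaluate_profile()` come from the list above, but their signatures, the `ControlPoint` layout, the `sample_curve()` helper and the row-major spectrogram layout are all assumptions made for illustration; `doc/SPECTRAL_BRUSH_EDITOR.md` remains the reference for the real design.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

enum class ProfileType { Gaussian };  // other profiles are post-MVP

// Hypothetical control point: position in time (frames) and frequency (bins).
struct ControlPoint { float frame; float freq_bin; float amplitude; };
struct CurveSample { float freq_bin; float amplitude; };

// Weight of the brush at a vertical distance (in bins) from the curve.
float evaluate_profile(ProfileType type, float distance, float sigma) {
  assert(sigma > 0.0f);
  switch (type) {
    case ProfileType::Gaussian:
      return std::exp(-(distance * distance) / (2.0f * sigma * sigma));
  }
  return 0.0f;
}

// Linear interpolation between control points sorted by frame; values are
// clamped to the first/last point outside the covered range.
CurveSample sample_curve(const ControlPoint* pts, std::size_t count, float frame) {
  assert(pts != nullptr && count > 0);
  if (frame <= pts[0].frame) return {pts[0].freq_bin, pts[0].amplitude};
  for (std::size_t i = 1; i < count; ++i) {
    if (frame <= pts[i].frame) {
      const ControlPoint& a = pts[i - 1];
      const ControlPoint& b = pts[i];
      const float span = b.frame - a.frame;
      const float t = span > 0.0f ? (frame - a.frame) / span : 0.0f;
      return {a.freq_bin + t * (b.freq_bin - a.freq_bin),
              a.amplitude + t * (b.amplitude - a.amplitude)};
    }
  }
  return {pts[count - 1].freq_bin, pts[count - 1].amplitude};
}

// Adds one brush stroke to a row-major spectrogram (num_frames x num_bins).
// Frames outside the curve's time range are left untouched.
void draw_bezier_curve(float* spectrogram, int num_frames, int num_bins,
                       const ControlPoint* pts, std::size_t count,
                       ProfileType type, float sigma) {
  assert(spectrogram != nullptr && pts != nullptr && count > 0);
  for (int f = 0; f < num_frames; ++f) {
    const float frame = static_cast<float>(f);
    if (frame < pts[0].frame || frame > pts[count - 1].frame) continue;
    const CurveSample s = sample_curve(pts, count, frame);
    for (int b = 0; b < num_bins; ++b) {
      const float d = static_cast<float>(b) - s.freq_bin;
      spectrogram[f * num_bins + b] += s.amplitude * evaluate_profile(type, d, sigma);
    }
  }
}
```

Keeping the profile evaluation separate from the curve sampling means that later profiles (decaying sinusoid, noise) would only need a new `switch` case.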

### Phase 2: Editor Core
- [ ] **Files:** `tools/spectral_editor/index.html`, `script.js`, `style.css`, `dct.js` (reuse from old editor)
- [ ] HTML structure (canvas, controls, file input)
- [ ] Canvas rendering (dual-layer: reference + procedural)
- [ ] Bezier curve editor (click to place, drag to adjust, delete control points)
- [ ] Profile controls (Gaussian sigma slider)
- [ ] Real-time spectrogram rendering
- [ ] Audio playback (IDCT → Web Audio API)
- [ ] Undo/Redo system (action history with snapshots)
- [ ] **Keyboard shortcuts:**
  - Key '1': Play procedural sound
  - Key '2': Play original .wav
  - Space: Play/pause
  - Ctrl+Z: Undo
  - Ctrl+Shift+Z: Redo
  - Delete: Remove control point
- [ ] **Deliverable:** Interactive editor, can trace .wav files

### Phase 3: File I/O
- [ ] Load .wav (decode, FFT/STFT → spectrogram)
- [ ] Load .spec (binary format parser)
- [ ] Save procedural_params.txt (human-readable, re-editable)
- [ ] Generate C++ code (ready to compile)
- [ ] Load procedural_params.txt (re-editing workflow)
- [ ] **Deliverable:** Full save/load cycle works

### Phase 4: Future Extensions (Post-MVP)
- [ ] Cubic Bezier interpolation (smoother curves)
- [ ] Decaying sinusoid profile (metallic sounds)
- [ ] Noise profile (textured sounds)
- [ ] Composite profiles (add/subtract/multiply)
- [ ] Multi-dimensional Bezier ({freq, amplitude, decay, ...})
- [ ] Frequency snapping (snap to musical notes)
- [ ] Generic `gen_from_params()` code generation

**Design Decisions:**
- Linear Bezier interpolation (Phase 1), cubic later
- Soft parameter limits in UI (not enforced)
- Home-brew RNG (small, deterministic)
- Single function per sound (generic loader later)
- Start with Bezier + Gaussian only

**Size Impact:** 50-100× compression (5 KB .spec → ~100 bytes C++ code)

---

## Priority 2: 3D System Enhancements (Task #18)

**Goal:** Establish a pipeline for importing complex 3D scenes to replace hardcoded geometry.

**Progress:** C++ pipeline for loading and processing object-specific data (like plane_distance) is now in place. Shader integration for SDFs is pending.


## Priority 3: WGSL Modularization (Task #50) [RECURRENT]

**Goal**: Refactor `ShaderComposer` and WGSL assets to support granular, reusable snippets and `#include` directives. This is an ongoing task to maintain shader code hygiene as new features are added.


## Phase 2: Size Optimization (Final Goal)

- [ ] **Task #34: Full STL Removal**: Replace all remaining `std::vector`, `std::map`, and `std::string` usage with custom minimal containers or C-style arrays to allow for CRT replacement. (Minimal Priority - deferred to end).

- [ ] **Task #22: Windows Native Platform**: Replace GLFW with direct Win32 API calls for the final 64k push.

- [ ] **Task #28: Spectrogram Quantization**: Research optimal frequency bin distribution and implement quantization.

- [ ] **Task #35: CRT Replacement**: Investigate and implement a CRT-free entry point.

## Future Goals & Ideas (Untriaged)

### Audio Tools
- [ ] **Task #64: specplay Enhancements**: Extend audio analysis tool with new features
  - **Priority 1**: Spectral visualization (ASCII art), waveform display, frequency analysis, dynamic range
  - **Priority 2**: Diff mode (compare .wav vs .spec), batch mode (CSV report, find clipping)
  - **Priority 3**: WAV export (.spec → .wav), normalization
  - **Priority 4**: Spectral envelope, harmonic analysis, onset detection
  - **Priority 5**: Interactive mode (seek, loop, volume control)
  - See `tools/specplay_README.md` for detailed feature list

- [ ] **Task #65: Data-Driven Tempo Control**: Move tempo variation from code to data files
  - **Current**: `g_tempo_scale` is hardcoded in `main.cc` with manual animation curves
  - **Goal**: Define tempo curves in `.seq` or `.track` files for data-driven tempo control
  - **Approach A**: Add TEMPO directive to `.seq` format
    - Example: `TEMPO 0.0 1.0`, `TEMPO 10.0 2.0`, `TEMPO 20.0 1.0` (time, scale pairs)
    - seq_compiler generates tempo curve array in timeline.cc
  - **Approach B**: Add tempo column to music.track
    - Each pattern trigger can specify tempo_scale override
    - tracker_compiler generates tempo events in music_data.cc
  - **Benefits**: Non-programmers can edit tempo, easier iteration, version control friendly
  - **Priority**: Low (current hardcoded approach works, but less flexible)
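
As a rough illustration of Approach A, the generated tempo curve could be a plain array of (time, scale) keys plus a lookup function. The `TempoKey` struct, the `kTempoCurve` array name and the choice of linear interpolation between keys are assumptions; the task description does not specify how the curve is evaluated between `TEMPO` entries.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout of one TEMPO entry: (time in seconds, tempo scale).
struct TempoKey { float time; float scale; };

// What seq_compiler might emit for the three TEMPO directives shown above.
static const TempoKey kTempoCurve[] = {
    {0.0f, 1.0f}, {10.0f, 2.0f}, {20.0f, 1.0f}};

// Returns the tempo scale at `time`, interpolating linearly between keys
// (an assumption) and holding the first/last value outside the key range.
float tempo_scale_at(const TempoKey* keys, std::size_t count, float time) {
  assert(keys != nullptr && count > 0);
  if (time <= keys[0].time) return keys[0].scale;
  for (std::size_t i = 1; i < count; ++i) {
    if (time <= keys[i].time) {
      const TempoKey& a = keys[i - 1];
      const TempoKey& b = keys[i];
      const float span = b.time - a.time;
      const float t = span > 0.0f ? (time - a.time) / span : 1.0f;
      return a.scale + t * (b.scale - a.scale);
    }
  }
  return keys[count - 1].scale;
}
```

With such a table, the per-frame update would reduce to something like `g_tempo_scale = tempo_scale_at(kTempoCurve, 3, current_time)`.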

- [ ] **Task #67: DCT/FFT Performance Benchmarking**: Add timing measurements to audio tests
  - **Goal**: Compare performance of different DCT/IDCT implementations
  - **Location**: Add timing code to `test_dct.cc` or `test_fft.cc`
  - **Measurements**:
    - Reference IDCT/FDCT (naive O(N²) implementation)
    - FFT-based DCT/IDCT (current O(N log N) implementation)
    - Future x86_64 SIMD-optimized versions (when implemented)
  - **Output Format**:
    - Average time per transform (microseconds)
    - Throughput (transforms per second)
    - Speedup factor vs reference implementation
  - **Test Sizes**: DCT_SIZE=512 (production), plus 128, 256, 1024 for scaling analysis
  - **Implementation**:
    - Use `std::chrono::high_resolution_clock` for timing
    - Run each test 1000+ iterations to reduce noise
    - Report min/avg/max times
    - Guard with `#if !defined(STRIP_ALL)` to avoid production overhead
  - **Benefits**: Quantify FFT speedup, validate SIMD optimizations, identify regressions
  - **Priority**: Very Low (nice-to-have for future optimization work)
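
The measurement loop described above might be sketched like this. `BenchResult` and `run_benchmark()` are hypothetical names; the callable passed in stands for whichever DCT/IDCT variant is under test.

```cpp
#include <cassert>
#include <chrono>

#if !defined(STRIP_ALL)  // keep timing code out of production builds

struct BenchResult {
  double min_us;              // fastest single run, in microseconds
  double avg_us;              // mean time per run
  double max_us;              // slowest single run
  double transforms_per_sec;  // throughput derived from avg_us
};

// Times `fn` once per iteration and aggregates min/avg/max.
template <typename Fn>
BenchResult run_benchmark(Fn&& fn, int iterations) {
  assert(iterations > 0);
  using Clock = std::chrono::high_resolution_clock;
  BenchResult r{0.0, 0.0, 0.0, 0.0};
  double total_us = 0.0;
  for (int i = 0; i < iterations; ++i) {
    const auto start = Clock::now();
    fn();
    const auto stop = Clock::now();
    const double us =
        std::chrono::duration<double, std::micro>(stop - start).count();
    if (i == 0 || us < r.min_us) r.min_us = us;
    if (i == 0 || us > r.max_us) r.max_us = us;
    total_us += us;
  }
  r.avg_us = total_us / iterations;
  r.transforms_per_sec = r.avg_us > 0.0 ? 1.0e6 / r.avg_us : 0.0;
  return r;
}

#endif  // !defined(STRIP_ALL)
```

The speedup factor is then the ratio of two results, for instance `reference.avg_us / fft_based.avg_us`. Writing each transform's output to a buffer that is read afterwards helps keep the compiler from optimizing the measured call away.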

- [ ] **Task #69: Convert Audio Pipeline to Clipped Int16**: Use clipped int16 for all audio processing
  - **Current**: Audio pipeline uses float32 throughout (generation, mixing, synthesis, output)
  - **Goal**: Convert to clipped int16 for faster/easier processing and reduced memory footprint
  - **Rationale**:
    - Simpler arithmetic (no float operations)
    - Smaller memory footprint (2 bytes vs 4 bytes per sample)
    - Hardware-native format (most audio devices use int16)
    - Eliminates float→int16 conversion at output stage
    - Clipping to the int16 range is a cheap integer saturation (plain integer overflow wraps rather than clips, so the saturation must be explicit)
  - **Scope**:
    - Output path: Definitely convert (backends, WAV dump)
    - Synthesis: Consider keeping float32 for quality (IDCT produces float)
    - Mixing: Could use int16 with proper overflow handling
    - Asset storage: Already int16 in .spec files
  - **Implementation Phases**:
    1. **Phase 1: Output Only** (Minimal change, ~50 lines)
       - Convert `synth_render()` output from float to int16
       - Update `MiniaudioBackend` and `WavDumpBackend` to accept int16
       - Keep all internal processing as float
       - **Benefit**: Eliminates final conversion step
    2. **Phase 2: Mixing Stage** (Moderate change, ~200 lines)
       - Convert voice mixing to int16 arithmetic
       - Add saturation/clipping logic
       - Keep IDCT output as float, convert after synthesis
       - **Benefit**: Faster mixing, reduced memory bandwidth
    3. **Phase 3: Full Pipeline** (Large change, ~500+ lines)
       - Convert spectrograms from float to int16 storage
       - Modify IDCT to output int16 directly
       - All synthesis in int16
       - **Benefit**: Maximum size reduction and performance
  - **Trade-offs**:
    - Quality loss: 16-bit resolution vs 32-bit float precision
    - Dynamic range: Limited to [-32768, 32767]
    - Clipping: Must handle overflow carefully in mixing stage
    - Code complexity: Saturation arithmetic more complex than float
  - **Testing Requirements**:
    - Verify no audible quality degradation
    - Ensure clipping behavior matches float version
    - Check mixing overflow doesn't cause artifacts
    - Validate that WAV dumps are bit-identical to hardware output
  - **Size Impact**:
    - Phase 1: Negligible (~50 bytes)
    - Phase 2: Small reduction (~100-200 bytes, faster code)
    - Phase 3: Large reduction (50% memory, ~1-2KB code savings)
  - **Priority**: Low (final optimization, after size budget is tight)
  - **Notes**:
    - This is a FINAL optimization task, only if 64k budget requires it
    - Quality must be validated - may not be worth the trade-off
    - Consider keeping float for procedural generation quality
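
The two building blocks that Phases 1 and 2 rely on, float-to-int16 conversion with clipping and saturating mixing, can be sketched as below. The function names are hypothetical, and the sketch assumes float samples are nominally in [-1, 1] and never NaN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Phase 1 building block: convert one float sample to clipped int16.
inline std::int16_t float_to_int16(float sample) {
  assert(sample == sample);  // NaN is not a valid input
  float scaled = sample * 32767.0f;
  if (scaled > 32767.0f) scaled = 32767.0f;
  if (scaled < -32768.0f) scaled = -32768.0f;
  // Round to nearest instead of truncating toward zero.
  return static_cast<std::int16_t>(scaled >= 0.0f ? scaled + 0.5f
                                                  : scaled - 0.5f);
}

// Phase 2 building block: add two samples in 32-bit and clamp the result,
// because a plain int16 addition would wrap around instead of clipping.
inline std::int16_t saturating_add(std::int16_t a, std::int16_t b) {
  const std::int32_t sum =
      static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return static_cast<std::int16_t>(sum);
}

// Mixes `src` into `dst` sample by sample with saturation.
void mix_into(std::int16_t* dst, const std::int16_t* src, std::size_t count) {
  assert(dst != nullptr && src != nullptr);
  for (std::size_t i = 0; i < count; ++i) {
    dst[i] = saturating_add(dst[i], src[i]);
  }
}
```

Mixing many voices this way clips after every addition; accumulating in a 32-bit buffer and clamping once at the end is an alternative worth comparing when checking the clipping behavior against the float version.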

### Developer Tools
- [ ] **Task #66: External Asset Loading for Debugging**: mmap() asset files instead of embedded data
  - **Current**: All assets embedded in `assets_data.cc` (regenerate on every asset change)
  - **Goal**: Load assets from external files in debug builds for faster iteration
  - **Scope**: macOS only, non-STRIP_ALL builds only
  - **Implementation**:
    - Add `DEMO_ENABLE_EXTERNAL_ASSETS` CMake option
    - Modify `GetAsset()` to check for external file first (e.g., `assets/final/<name>`)
    - Use `mmap()` to map file into memory (replaces `uint8_t asset[]` array)
    - Fallback to embedded data if file not found
  - **Benefits**: Edit shaders/assets without regenerating assets_data.cc (~10s rebuild)
  - **Trade-offs**: Adds runtime file I/O, only useful during development
  - **Priority**: Low (current workflow acceptable, but nice-to-have for rapid iteration)
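
A minimal sketch of the external-file lookup with fallback, using POSIX `open`/`fstat`/`mmap`, is shown below. `AssetView`, `map_external_asset()` and `get_asset_with_fallback()` are hypothetical names; the real `GetAsset()` signature is not shown in this file, and in the actual build this code would presumably be compiled only when `DEMO_ENABLE_EXTERNAL_ASSETS` is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only view of an asset, whether memory-mapped or embedded.
struct AssetView {
  const std::uint8_t* data;
  std::size_t size;
};

// Maps `path` read-only. Returns {nullptr, 0} if the file is missing, empty,
// or cannot be mapped. The mapping is intentionally never unmapped here; it
// lives until process exit, which is acceptable for a debug-only feature.
AssetView map_external_asset(const char* path) {
  assert(path != nullptr);
  const int fd = open(path, O_RDONLY);
  if (fd < 0) return {nullptr, 0};
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {
    close(fd);
    return {nullptr, 0};
  }
  const std::size_t size = static_cast<std::size_t>(st.st_size);
  void* mapped = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  if (mapped == MAP_FAILED) return {nullptr, 0};
  return {static_cast<const std::uint8_t*>(mapped), size};
}

// Prefers the external file and falls back to the embedded array.
AssetView get_asset_with_fallback(const char* external_path,
                                  const std::uint8_t* embedded,
                                  std::size_t embedded_size) {
  const AssetView external = map_external_asset(external_path);
  if (external.data != nullptr) return external;
  return {embedded, embedded_size};
}
```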

### Visual Effects
- [ ] **Task #73: Extend Shader Parametrization** [IN PROGRESS - 2/4 complete]
  - **Goal**: Extend uniform parameter system to ChromaAberrationEffect, GaussianBlurEffect, DistortEffect, SolarizeEffect
  - **Pattern**: Follow FlashEffect implementation (UniformHelper, params struct, .seq syntax)
  - **Completed**: ChromaAberrationEffect (offset_scale, angle), GaussianBlurEffect (strength)
  - **Remaining**: DistortEffect, SolarizeEffect
  - **Priority**: Medium (quality-of-life improvement for artists)
  - **Estimated Impact**: ~200-300 bytes per effect
- [ ] **Task #52: Procedural SDF Font**: Minimal bezier/spline set for [A-Z, 0-9] and SDF rendering.
- [ ] **Task #55: SDF Random Planes Intersection**: Implement `sdPolyhedron` (crystal/gem shapes) via plane intersection.
- [ ] **Task #54: Tracy Integration**: Integrate the Tracy profiler for performance profiling.
- [ ] **Task #58: Advanced Shader Factorization**: Further factorize WGSL code into smaller, reusable snippets.
- [ ] **Task #59: Comprehensive RNG Library**: Add WGSL snippets for float/vec2/vec3 noise (Perlin, Gyroid, etc.) and random number generators.
- [ ] **Task #60: OOP Refactoring**: Investigate if more C++ code can be made object-oriented without size penalty (vs functional style).
- [ ] **Task #61: GPU Procedural Generation**: Implement system to generate procedural data (textures, geometry) on GPU and read back to CPU.
- [ ] **Task #62: Physics Engine Enhancements (PBD & Rotation)**:
    - [ ] **Task #62.1: Quaternion Rotation**: Implement quaternion-based rotation for `Object3D` and incorporate angular momentum into physics.
    - [ ] **Task #62.2: Position Based Dynamics (PBD)**: Refactor solver to re-evaluate velocity after resolving all collisions and constraints.
- [ ] **Task #63: Refactor large files**: Split `src/gpu/gpu.cc`, `src/3d/visual_debug.cc` and `src/gpu/effect.cc` into sub-functionalities. (`src/3d/renderer.cc` was also over 500 lines and has already been refactored.)

### Performance Optimization
- [ ] **Task #70: SIMD x86_64 Implementation**: Implement critical functions using intrinsics for x86_64 platforms.
  - **Goal**: Optimize hot paths for audio and procedural generation.
  - **Scope**:
    - IDCT/FDCT transforms
    - Audio mixing and voice synthesis
    - CPU-side procedural texture/geometry generation
  - **Constraint**: Non-critical; fallback to generic C++ must be maintained.
  - **Priority**: Very Low

---

## Future Goals