8 files changed, 63 insertions, 89 deletions
diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md
index bffbeb8..d7fc771 100644
--- a/PROJECT_CONTEXT.md
+++ b/PROJECT_CONTEXT.md
@@ -34,21 +34,21 @@
 - **Timing System:** **Beat-based timelines** for musical synchronization. Sequences defined in beats, converted to seconds at runtime. Effects receive both physical time (constant) and beat time (musical). Variable tempo affects audio only. See `doc/BEAT_TIMING.md`.
 - **Workspace system:** Multi-workspace support. Easy switching with `-DDEMO_WORKSPACE=<name>`. Organized structure: `music/`, `weights/`, `obj/`, `shaders/`. Shared common shaders in `src/shaders/`. See `doc/WORKSPACE_SYSTEM.md`.
 - **Audio:** Sample-accurate sync. Zero heap allocations per frame. Variable tempo. OLA-IDCT synthesis (v2 .spec): Hann analysis window, rectangular synthesis, 50% overlap, click-free. V1 (raw DCT-512) preserved for generated notes. .spec files regenerated as v2.
-- **Shaders:** Parameterized effects (UniformHelper, .seq syntax). Beat-synchronized animation support (`beat_time`, `beat_phase`). Modular WGSL composition with ShaderComposer. 27 shared common shaders (math, render, compute). Reusable snippets: `render/scratch_lines`, `render/ntsc_common` (NTSC signal processing, RGB and YIQ input variants via `sample_ntsc_signal` hook), `math/color` (YIQ/NTSC), `math/color_c64` (C64 palette, Bayer dither, border animation).
+- **Shaders:** Parameterized effects (UniformHelper, .seq syntax). Beat-synchronized animation support (`beat_time`, `beat_phase`). Modular WGSL composition with ShaderComposer. 37 shared common shaders (math/, render/, compute/, debug/). Reusable snippets: `render/scratch_lines`, `render/ntsc_common` (NTSC signal processing, RGB and YIQ input variants via `sample_ntsc_signal` hook), `math/color` (YIQ/NTSC), `math/color_c64` (C64 palette, Bayer dither, border animation).
 - **3D:** Hybrid SDF/rasterization with BVH. Binary scene loader. Blender pipeline.
-- **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated. **CNN v3 Phases 1–9 complete** + runtime pipeline operational: `GBufferEffect` (MRT raster + sphere impostors + SDF shadow pass) → `GBufDeferredEffect` (albedo×diffuse debug view) wired in `cnn_v3_test` sequence. Two training bugs fixed: dec0 ReLU removed (full [0,1] output range), FiLM MLP loaded from `.bin` at runtime. Parity validated: max_err=4.88e-4. See `cnn_v3/docs/HOWTO.md`.
+- **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated. **CNN v3 Phases 1–9 complete** + runtime pipeline operational: `GBufferEffect` (MRT raster + sphere impostors + SDF shadow pass) → `GBufDeferredEffect` (albedo×diffuse) in `cnn_v3_test`; debug sequence adds `CNNv3Effect` → `GBufViewEffect`. Training bugs fixed: dec0 ReLU removed, FiLM MLP loaded from `.bin`. Parity validated: max_err=4.88e-4. See `cnn_v3/docs/HOWTO.md`.
 - **Tools:** CNN test tool operational. Texture readback utility functional. Timeline editor (web-based, beat-aligned, audio playback).
 - **Build:** Asset dependency tracking. Size measurement. Hot-reload (debug-only). WSL (Windows 10) supported: native Linux build and cross-compile to `.exe` via `mingw-w64`.
-- **Sequence:** DAG-based effect routing with explicit node system. Python compiler with topological sort and ping-pong optimization. 12 effects operational (Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch). Effect times are absolute (seq_compiler adds sequence start offset). See `doc/SEQUENCE.md`.
-- **Testing:** **36/36 passing**.
+- **Sequence:** DAG-based effect routing with explicit node system. Python compiler with topological sort and ping-pong optimization. 18 effects operational (Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch, Ntsc, NtscYiq, GBufferEffect, CNNv3Effect, GBufDeferredEffect, GBufViewEffect). Effect times are absolute (seq_compiler adds sequence start offset). See `doc/SEQUENCE.md`.
+- **Testing:** **38/38 passing**.
 
 ---
 
 ## Next Up
 
-**Active:** CNN v3 training bugs fixed ✅ — retrain from scratch with more data (≥50 samples). Spectral Brush Editor.
-**Ongoing:** Test infrastructure maintenance (38/38 passing)
-**Future:** CNN v3 training pass, size optimization (64k target)
+**Active:** Spectral Brush Editor (Task #5). CNN v3 data collection + retrain (≥50 samples needed, 11 collected).
+**Ongoing:** Test infrastructure (38/38 passing).
+**Future:** Size optimization (64k target), CNN v3 2D mode, CNN v2 8-bit quantization.
 
 See `TODO.md` for details.
 
diff --git a/TODO.md b/TODO.md
index ea48fc2..132be5d 100644
--- a/TODO.md
+++ b/TODO.md
@@ -12,32 +12,41 @@ Procedural spectrogram tool: 50-100× compression (5 KB .spec → ~100 bytes C++
 
 ---
 
-## Priority 2: Test Infrastructure Maintenance [ONGOING]
+## Priority 2: CNN v3 Training [IN PROGRESS]
 
-**Status:** 38/38 tests passing
+**Design:** `cnn_v3/docs/CNN_V3.md` | Phases 1–9 complete. Runtime pipeline operational.
+
+**Pipelines:**
+- `cnn_v3_test`: `GBufferEffect` → `GBufDeferredEffect`
+- `cnn_v3_debug`: `GBufferEffect` → `CNNv3Effect` → `GBufViewEffect`
+
+**Active:**
+- [ ] Restore full scene in `GBufferEffect::set_scene()` (20 cubes + 4 spheres, 2 lights)
+- [ ] Collect ≥50 training samples (currently 11) — see `cnn_v3/docs/HOWTO.md` §2
+- [ ] Retrain from scratch — see `cnn_v3/docs/HOWTO.md` §3
 
-### ✅ Fix FFT twiddle factor accumulation bug (`src/audio/fft.cc`) — DONE
+**Pending (lower priority):**
+- [ ] GBufferEffect: Pass 3 transparency (transp=0 placeholder)
+- [ ] GBufferEffect: `resize()` support
+- [ ] Web tool (`cnn_v3/tools/shaders.js`): `prev_cnn` hardcoded to 0 in both JS pack shaders
+  (`FULL_PACK_SHADER` line ~313, simple pack line ~39). Fix: add a `prev` texture binding to both and wire it up in `tester.js`.
 
-`fft_radix2` now computes `wr = cosf(angle*k); wi = sinf(angle*k);` directly per k.
-Tests A–E added to `test_fft.cc`. `arrays_match` default tolerance reverted to 5e-3.
+---
 
-## ✅ Audio Timing Drift — DONE
+## Priority 3: Test Infrastructure [ONGOING]
 
-Events triggered ~180ms early over 63 beats @ BPM=90. Root causes fixed:
-1. `chunk_frames` truncation accumulation replaced by accurate double-precision integration.
-2. `tracker` updated to double-precision time representations for exact sample-accurate scheduling.
+**Status:** 38/38 tests passing
 
-## ✅ Audio System Enhancements — DONE
+---
 
-1. **`synth.cc`: use `ola_decode()` from `src/audio/ola.h`** — `ola_decode_frame` extracted and used for per-frame OLA-IDCT synthesis, deduplicating the IDCT + overlap handling logic.
+## Priority 4: GPU-Accelerated PCM Synthesis
 
-2. **GPU-Accelerated PCM Synthesis:**
-   - Compute shader for direct PCM generation (bypass spectrogram)
-   - Write to compute buffer, readback to synth
+Compute shader for direct PCM generation (bypasses spectrogram decode).
+Write to a compute buffer, then read back into the synth. No design doc yet.
 
 ---
 
-## Priority 4: 3D System Enhancements (Task #18)
+## Priority 5: 3D System Enhancements (Task #18)
 
 Pipeline for importing complex 3D scenes to replace hardcoded geometry.
 
@@ -45,76 +54,34 @@ Pipeline for importing complex 3D scenes to replace hardcoded geometry.
 
 ---
 
-## Priority 4: WGSL Modularization (Task #50) [RECURRENT]
+## Priority 5: WGSL Modularization (Task #50) [RECURRENT]
 
 Ongoing shader code hygiene for granular, reusable snippets.
 
 ---
 
-## Priority 4: Wine/Windows Black Screen
+## Priority 5: Wine/Windows Black Screen
 
-`demo64k.exe` runs under Wine (wgpu-native v27, Vulkan/MoltenVK) but shows a black window — no visuals rendered. Audio and timeline progress correctly. GPU device/adapter init succeeds.
+`demo64k.exe` opens under Wine but shows a black window. Audio runs correctly.
 
-**Likely causes to investigate:**
+**Likely causes:**
 - Swapchain format mismatch (Wine Vulkan may prefer BGRA8 over RGBA8)
-- Surface present failing silently (check `WGPUSurfaceGetCurrentTexture` status)
-- Render pass output not reaching the surface (missing present call or wrong texture view)
+- Surface present failing silently (`WGPUSurfaceGetCurrentTexture` status)
+- Render pass output not reaching the surface
 
-**To reproduce:** `./scripts/run_win.sh` — window opens, stays black.
+**To reproduce:** `./scripts/run_win.sh`
 
 ---
 
-## CNN v3 — U-Net + FiLM [IN PROGRESS]
+## Future
 
-**Design:** `cnn_v3/docs/CNN_V3.md` | All phases 1–9 complete. Runtime pipeline operational.
-
-**Current pipeline:** `GBufferEffect` → `GBufDeferredEffect` → `GBufViewEffect` → sink
-
-**Training bugs fixed (2026-03-27):**
-- ✅ dec0 ReLU removed: output now spans full [0,1] range (was stuck ≥0.5)
-- ✅ FiLM MLP loaded from `cnn_v3_film_mlp.bin` at runtime (was hardcoded heuristics)
-
-**Active work:**
-- [ ] Restore full scene in `GBufferEffect::set_scene()` (20 cubes + 4 spheres, 2 lights)
-- [ ] Collect ≥50 training samples (currently 11) — see `cnn_v3/docs/HOWTO.md` §2
-- [ ] Retrain from scratch — see `cnn_v3/docs/HOWTO.md` §3
-
-**Pending (lower priority):**
-- [ ] GBufferEffect: Pass 3 transparency (transp=0 placeholder)
-- [ ] GBufferEffect: `resize()` support
-- [ ] Web tool (`cnn_v3/tools/shaders.js`): `prev_cnn` always zero in both pack shaders
-  (`FULL_PACK_SHADER` line ~313 and simple pack line ~39 hardcode `prev=0`).
-  C++ `gbuf_pack.wgsl` reads a real `prev_cnn` texture (binding 6).
-  Fix: add a `prev` texture binding to both JS pack shaders and wire it up in `tester.js`.
-
-## Future: CNN v3 "2D Mode" (G-buffer-free)
-
-Allow `CNNv3Effect` to run on a plain screen buffer / photo without a real G-buffer.
-Fake the missing feature vectors (normals, depth, material IDs, shadow, transp) from
-the RGB input alone:
-- normals: approximate from local luminance gradient (Sobel)
-- depth: constant (e.g. 0.5) or estimated from a simple heuristic
-- material IDs / shadow / transp: neutral defaults (e.g. 0)
-
-This would let the effect be applied to any rendered frame (post-NTSC, post-Scratch, etc.)
-without requiring a 3D G-buffer pass upstream, and enable training/inference on photos.
-
-Implementation sketch:
-- New `CNNv3Effect2D` subclass (or a mode flag) that synthesizes `feat_tex0`/`feat_tex1`
-  internally from a single `rgba8unorm` input, then runs the same 5-pass U-Net.
-- Separate `gbuf_pack_2d.wgsl` compute shader that fills feat0/feat1 from a photo buffer.
-
-## Future: CNN v2 8-bit Quantization
-
-Reduce weights from f16 (~3.2 KB) to i8 (~1.6 KB).
-
-**Requirements:** Quantization-aware training (QAT)
-**Design:** `cnn_v2/docs/CNN_V2.md`
-
----
+### CNN v3 "2D Mode" (G-buffer-free)
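+Run `CNNv3Effect` on a plain screen buffer or photo by synthesizing the missing G-buffer features from RGB: normals from a Sobel luminance gradient, constant depth, neutral defaults for material IDs, shadow, and transp. Needs a new `gbuf_pack_2d.wgsl` compute shader plus a `CNNv3Effect2D` subclass or mode flag.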
 
-## Future: Size Optimization (64k Target)
+### CNN v2 8-bit Quantization
+Reduce weights from f16 (~3.2 KB) to i8 (~1.6 KB). Requires quantization-aware training (QAT). See `cnn_v2/docs/CNN_V2.md`.
 
+### Size Optimization (64k Target)
 - Task #22: Windows Native Platform (Win32)
 - Task #28: Spectrogram Quantization
 - Task #34: Full STL Removal
diff --git a/cnn_v3/README.md b/cnn_v3/README.md
index a844b1b..bd54e50 100644
--- a/cnn_v3/README.md
+++ b/cnn_v3/README.md
@@ -31,7 +31,7 @@ Add images directly to these directories and commit them.
 
 ## Status
 
-**Phases 1–7 complete.** 36/36 tests pass.
+**Phases 1–9 complete.** 38/38 tests pass. Training bugs fixed (2026-03-27).
 
 | Phase | Status |
 |-------|--------|
@@ -42,6 +42,8 @@ Add images directly to these directories and commit them.
 | 5 — Parity validation | ✅ max_err=4.88e-4 |
 | 6 — Training script | ✅ train_cnn_v3.py |
 | 7 — Validation tools | ✅ GBufViewEffect + web sample loader |
+| 8 — Architecture upgrade [8,16] | ✅ enc_channels=[8,16], 16ch split into lo/hi pairs |
+| 9 — Training bug fixes | ✅ dec0 ReLU removed, FiLM MLP loaded from .bin |
 
 See `cnn_v3/docs/HOWTO.md` for the practical playbook (§9 covers validation tools).
 See `cnn_v3/docs/CNN_V3.md` for full design.
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 67f7931..e8fd0a5 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -235,7 +235,7 @@ channel-dropout training.
 python3 cnn_v3/training/pack_photo_sample.py \
     --photo  input/photo1.jpg \
     --target target/photo1_styled.png \
-    --output dataset/photos/sample_001/
+    --output dataset/simple/sample_001/
 ```
 
 `--target` is required and must be a stylized ground-truth image at the same
@@ -245,9 +245,9 @@ resolution as the photo. The script writes it as `target.png` in the sample dir.
 
 ```
 dataset/
-  blender/
+  full/       # Blender G-buffer samples (--input-mode full)
     sample_0001/  sample_0002/  ...
-  photos/
+  simple/     # Photo/stylized pairs (--input-mode simple)
     sample_001/   sample_002/   ...
 ```
 
@@ -399,14 +399,14 @@ Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch referen
 
 | Phase | Status | Notes |
 |-------|--------|-------|
-| 1 — G-buffer (raster + pack) | ✅ Done | Integrated, 36/36 tests pass |
+| 1 — G-buffer (raster + pack) | ✅ Done | Integrated, 38/38 tests pass |
 | 1 — G-buffer (SDF shadow pass) | ✅ Done | `gbuf_shadow.wgsl`, proxy-box SDF |
 | 2 — Training infrastructure | ✅ Done | blender_export.py, pack_*_sample.py |
 | 3 — WGSL U-Net shaders | ✅ Done | 5 compute shaders + cnn_v3/common snippet |
-| 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 36/36 tests pass |
+| 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 38/38 tests pass |
 | 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
 | 6 — FiLM MLP training | ✅ Done | train_cnn_v3.py + cnn_v3_utils.py written |
-| 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
+| 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 38/38 tests pass |
 | 8 — Architecture upgrade [8,16] | ✅ Done | enc_channels=[8,16], multi-scale loss, 16ch textures split into lo/hi pairs |
 | 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |
 | 9 — Training bug fixes | ✅ Done | dec0 ReLU removed (output unblocked); FiLM MLP loaded at runtime |
diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md
index 203c27a..233373e 100644
--- a/doc/COMPLETED.md
+++ b/doc/COMPLETED.md
@@ -34,6 +34,12 @@ Completed task archive. See `doc/archive/` for detailed historical documents.
 
 ---
 
+## March 2026 (continued)
+
+- [x] **FFT twiddle factor fix** — `fft_radix2` (`src/audio/fft.cc`) now computes `wr = cosf(angle*k)` and `wi = sinf(angle*k)` directly per k. Tests A–E added to `test_fft.cc`. `arrays_match` default tolerance reverted to 5e-3.
+- [x] **Audio timing drift fix** — Events were triggered ~180ms early over 63 beats at BPM=90. Fixed: `chunk_frames` truncation accumulation replaced by double-precision integration; `tracker` updated to double-precision time for sample-accurate scheduling.
+- [x] **OLA decode refactor** — `ola_decode_frame` extracted into `src/audio/ola.h` and used in `synth.cc`, deduplicating IDCT + overlap handling logic.
+
 ## March 2026
 
 - [x] **CNN v3 training bug fixes (2026-03-27)** — Two bugs blocking convergence:
diff --git a/doc/SEQUENCE.md b/doc/SEQUENCE.md
index 3d7a6ce..bb1e8e8 100644
--- a/doc/SEQUENCE.md
+++ b/doc/SEQUENCE.md
@@ -307,7 +307,7 @@ params.aspect_ratio;    // width/height
 - DAG validation, topological sort, ping-pong optimization
 - Multi-input/multi-output effects
 - Node aliasing (compile-time optimization)
-- 12 effects: Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch
+- 18 effects: Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch, Ntsc, NtscYiq, GBufferEffect, CNNv3Effect, GBufDeferredEffect, GBufViewEffect
 
 **Missing/Future:**
 - Flatten mode (`--flatten` generates same code as dev mode)
diff --git a/src/audio/audio_engine.cc b/src/audio/audio_engine.cc
index b4c4863..c184324 100644
--- a/src/audio/audio_engine.cc
+++ b/src/audio/audio_engine.cc
@@ -184,7 +184,7 @@ void AudioEngine::seek(float target_time) {
     tracker_update(t, 0.0f);
   }
 
-  // 6. Final update at exact target time
+  // 5. Final update at exact target time
   tracker_update(target_time, 0.0f);
   current_time_ = target_time;
 
diff --git a/src/audio/synth.cc b/src/audio/synth.cc
index 9b56069..0161385 100644
--- a/src/audio/synth.cc
+++ b/src/audio/synth.cc
@@ -28,7 +28,7 @@ struct Voice {
   float overlap_buf[OLA_OVERLAP]; // OLA tail from previous frame (v2 only)
   bool ola_mode;                  // True for SPEC_VERSION_V2_OLA
   int buffer_pos;
-  float fractional_pos; // Fractional sample position for tempo scaling
+  float fractional_pos; // Reserved
 
   int start_sample_offset; // Samples to wait before producing audio output
 
@@ -212,8 +212,7 @@ void synth_trigger_voice(int spectrogram_id, float volume, float pan,
           v.ola_mode ? OLA_HOP_SIZE : DCT_SIZE; // Force reload on first render
       if (v.ola_mode)
         memset(v.overlap_buf, 0, sizeof(v.overlap_buf));
-      v.fractional_pos =
-          0.0f; // Initialize fractional position for tempo scaling
+      v.fractional_pos = 0.0f;
       v.start_sample_offset = start_offset_samples;
       v.active_spectral_data =
           g_synth_data.active_spectrogram_data[spectrogram_id];