4 files changed, 31 insertions, 36 deletions
diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md
index 3ed265a..d211cea 100644
--- a/PROJECT_CONTEXT.md
+++ b/PROJECT_CONTEXT.md
@@ -36,7 +36,7 @@
 - **Audio:** Sample-accurate sync. Zero heap allocations per frame. Variable tempo. OLA-IDCT synthesis (v2 .spec): Hann analysis window, rectangular synthesis, 50% overlap, click-free. V1 (raw DCT-512) preserved for generated notes. .spec files regenerated as v2.
 - **Shaders:** Parameterized effects (UniformHelper, .seq syntax). Beat-synchronized animation support (`beat_time`, `beat_phase`). Modular WGSL composition with ShaderComposer. 27 shared common shaders (math, render, compute). Reusable snippets: `render/scratch_lines`, `render/ntsc_common` (NTSC signal processing, RGB and YIQ input variants via `sample_ntsc_signal` hook), `math/color` (YIQ/NTSC), `math/color_c64` (C64 palette, Bayer dither, border animation).
 - **3D:** Hybrid SDF/rasterization with BVH. Binary scene loader. Blender pipeline.
-- **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated. **CNN v3 Phases 1–7 complete:** `CNNv3Effect` C++ class (5 compute passes, FiLM uniform upload, identity γ/β defaults). Parity validated: max_err=4.88e-4 (≤1/255). Validation tools: `GBufViewEffect` (C++ 4×5 channel grid) + web "Load sample directory" (G-buffer pack → CNN inference → PSNR vs target.png). See `cnn_v3/docs/HOWTO.md` §9.
+- **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated. **CNN v3 Phases 1–7 complete** + runtime pipeline operational: `GBufferEffect` (MRT raster + sphere impostors + SDF shadow pass) → `GBufDeferredEffect` (albedo×diffuse debug view) wired in `cnn_v3_test` sequence. Shared snippets: `math/normal` (oct encode/decode), `ray_sphere`. Parity validated: max_err=4.88e-4. See `cnn_v3/docs/HOWTO.md`.
 - **Tools:** CNN test tool operational. Texture readback utility functional. Timeline editor (web-based, beat-aligned, audio playback).
 - **Build:** Asset dependency tracking. Size measurement. Hot-reload (debug-only). WSL (Windows 10) supported: native Linux build and cross-compile to `.exe` via `mingw-w64`.
 - **Sequence:** DAG-based effect routing with explicit node system. Python compiler with topological sort and ping-pong optimization. 12 effects operational (Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch). Effect times are absolute (seq_compiler adds sequence start offset). See `doc/SEQUENCE.md`.
@@ -46,9 +46,9 @@
 
 ## Next Up
 
-**Active:** CNN v3 training (`train_cnn_v3.py`), Spectral Brush Editor
-**Ongoing:** Test infrastructure maintenance (35/35 passing)
-**Future:** Size optimization (64k target), 3D enhancements
+**Active:** CNN v3 shadow pass debugging (`GBufDeferredEffect`), Spectral Brush Editor
+**Ongoing:** Test infrastructure maintenance (38/38 passing)
+**Future:** CNN v3 training pass, size optimization (64k target)
 
 See `TODO.md` for details.
 
diff --git a/TODO.md b/TODO.md
index 66cbe76..e855384 100644
--- a/TODO.md
+++ b/TODO.md
@@ -14,7 +14,7 @@ Procedural spectrogram tool: 50-100× compression (5 KB .spec → ~100 bytes C++
 
 ## Priority 2: Test Infrastructure Maintenance [ONGOING]
 
-**Status:** 35/35 tests passing
+**Status:** 38/38 tests passing
 
 **Outstanding TODOs:**
 
@@ -62,32 +62,18 @@ Ongoing shader code hygiene for granular, reusable snippets.
 
 ## CNN v3 — U-Net + FiLM [IN PROGRESS]
 
-U-Net architecture with FiLM conditioning. Runtime style control via beat/audio.
-Richer G-buffer input (normals, depth, material IDs). Per-pixel testability across
-PyTorch / HTML WebGPU / C++ WebGPU.
+**Design:** `cnn_v3/docs/CNN_V3.md` | All phases 1–7 complete. Runtime pipeline operational.
 
-**Design:** `cnn_v3/docs/CNN_V3.md`
+**Current pipeline:** `GBufferEffect` → `GBufDeferredEffect` → sink (debug view: albedo×diffuse)
 
-**Phases:**
-1. ✅ G-buffer: `GBufferEffect` integrated. SDF/shadow placeholder (shadow=1, transp=0).
-2. ✅ Training infrastructure: `blender_export.py`, `pack_blender_sample.py`, `pack_photo_sample.py`
-3. ✅ WGSL shaders: cnn_v3_common (snippet), enc0, enc1, bottleneck, dec1, dec0
-4. ✅ C++ `CNNv3Effect`: 5 compute passes, FiLM uniform upload, `set_film_params()` API
-   - Params alignment fix: WGSL `vec3u` align=16 → C++ structs 64/96 bytes
-   - Weight offsets as explicit formulas (e.g. `20*4*9+4`)
-   - FiLM γ/β: identity defaults; real values require trained MLP (see below)
-5. ✅ Parity validation: test vectors + `test_cnn_v3_parity.cc`. max_err=4.88e-4 (≤1/255).
-   - Key fix: intermediate nodes at fractional resolutions (W/2, W/4) via `NodeRegistry::default_width()/default_height()`
+**Active work:**
+- [ ] Fix/validate shadow pass (`gbuf_shadow.wgsl`) — currently disabled in deferred
+- [ ] Re-enable shadow in `GBufDeferredEffect` once validated
+- [ ] Run first real training pass — see `cnn_v3/docs/HOWTO.md` §3
 
-6. ✅ Training script: `train_cnn_v3.py` + `cnn_v3_utils.py` written
-   - ✅ `export_cnn_v3_weights.py` — convert trained `.pth` → `.bin` (f16)
-7. ✅ Validation tools:
-   - `GBufViewEffect` — C++ 4×5 channel grid (all 20 G-buffer channels)
-   - Web tool "Load sample directory" — G-buffer pack → CNN inference → PSNR
-   - See `cnn_v3/docs/HOWTO.md` §9
-
-**Next: run a real training pass**
-- See `cnn_v3/docs/HOWTO.md` §3 for training commands
+**Pending (lower priority):**
+- [ ] GBufferEffect: Pass 3 transparency (transp=0 placeholder)
+- [ ] GBufferEffect: `resize()` support
 
 ## Future: CNN v3 "2D Mode" (G-buffer-free)
 
diff --git a/cnn_v3/shaders/gbuf_deferred.wgsl b/cnn_v3/shaders/gbuf_deferred.wgsl
index dda4b27..2ed4ce3 100644
--- a/cnn_v3/shaders/gbuf_deferred.wgsl
+++ b/cnn_v3/shaders/gbuf_deferred.wgsl
@@ -5,6 +5,7 @@
 #include "math/normal"
 
 @group(0) @binding(0) var feat_tex0: texture_2d<u32>;
+@group(0) @binding(1) var feat_tex1: texture_2d<u32>;
 @group(0) @binding(2) var<uniform> uniforms: GBufDeferredUniforms;
 
 struct GBufDeferredUniforms {
@@ -39,5 +40,9 @@ fn fs_main(@builtin(position) pos: vec4f) -> @location(0) vec4f {
     let normal  = oct_decode(vec2f(bx.y, ny_d.x));
     let diffuse = max(0.0, dot(normal, KEY_LIGHT));
 
-    return vec4f(albedo * (AMBIENT + diffuse), 1.0);
+    // feat_tex1[2] = pack4x8unorm(mip2.g, mip2.b, shadow, transp)
+    let t1     = textureLoad(feat_tex1, coord, 0);
+    let shadow = unpack4x8unorm(t1.z).z;
+
+    return vec4f(albedo * (AMBIENT + diffuse * shadow), 1.0);
 }
diff --git a/cnn_v3/src/gbuf_deferred_effect.cc b/cnn_v3/src/gbuf_deferred_effect.cc
index 1adae5e..de6bd29 100644
--- a/cnn_v3/src/gbuf_deferred_effect.cc
+++ b/cnn_v3/src/gbuf_deferred_effect.cc
@@ -37,12 +37,13 @@ GBufDeferredEffect::GBufDeferredEffect(const GpuContext& ctx,
     : Effect(ctx, inputs, outputs, start_time, end_time) {
   HEADLESS_RETURN_IF_NULL(ctx_.device);
 
-  WGPUBindGroupLayoutEntry entries[2] = {
+  WGPUBindGroupLayoutEntry entries[3] = {
       bgl_uint_tex(0),
+      bgl_uint_tex(1),
       bgl_uniform(2, sizeof(GBufDeferredUniforms)),
   };
   WGPUBindGroupLayoutDescriptor bgl_desc = {};
-  bgl_desc.entryCount = 2;
+  bgl_desc.entryCount = 3;
   bgl_desc.entries    = entries;
   WGPUBindGroupLayout bgl = wgpuDeviceCreateBindGroupLayout(ctx_.device, &bgl_desc);
 
@@ -89,6 +90,7 @@ void GBufDeferredEffect::render(WGPUCommandEncoder encoder,
                                 const UniformsSequenceParams& params,
                                 NodeRegistry& nodes) {
   WGPUTextureView feat0_view  = nodes.get_view(input_nodes_[0]);
+  WGPUTextureView feat1_view  = nodes.get_view(input_nodes_[1]);
   WGPUTextureView output_view = nodes.get_view(output_nodes_[0]);
 
   // Upload resolution uniform into the base class uniforms buffer (first 8 bytes).
@@ -101,16 +103,18 @@ void GBufDeferredEffect::render(WGPUCommandEncoder encoder,
   WGPUBindGroupLayout bgl =
       wgpuRenderPipelineGetBindGroupLayout(pipeline_.get(), 0);
 
-  WGPUBindGroupEntry bg_entries[2] = {};
+  WGPUBindGroupEntry bg_entries[3] = {};
   bg_entries[0].binding     = 0;
   bg_entries[0].textureView = feat0_view;
-  bg_entries[1].binding     = 2;
-  bg_entries[1].buffer      = uniforms_buffer_.get().buffer;
-  bg_entries[1].size        = sizeof(GBufDeferredUniforms);
+  bg_entries[1].binding     = 1;
+  bg_entries[1].textureView = feat1_view;
+  bg_entries[2].binding     = 2;
+  bg_entries[2].buffer      = uniforms_buffer_.get().buffer;
+  bg_entries[2].size        = sizeof(GBufDeferredUniforms);
 
   WGPUBindGroupDescriptor bg_desc = {};
   bg_desc.layout     = bgl;
-  bg_desc.entryCount = 2;
+  bg_desc.entryCount = 3;
   bg_desc.entries    = bg_entries;
   bind_group_.replace(wgpuDeviceCreateBindGroup(ctx_.device, &bg_desc));
   wgpuBindGroupLayoutRelease(bgl);
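
Review note on the `gbuf_deferred.wgsl` hunk: the shader reads shadow as `unpack4x8unorm(t1.z).z`, which relies on the layout stated in its own comment (`pack4x8unorm(mip2.g, mip2.b, shadow, transp)`). A minimal CPU-side sketch of that byte layout may help when validating the shadow pass against texture readback. The function names below are hypothetical and not part of this repository; the rounding and byte order follow the WGSL definitions of the two builtins (component `i` in bits `8*i` to `8*i+7`).

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical CPU mirror of WGSL pack4x8unorm: each component is clamped to
// [0, 1], converted to floor(0.5 + 255 * v), and stored with x in the lowest byte.
inline uint32_t pack4x8unorm_cpu(float x, float y, float z, float w) {
  auto to_byte = [](float v) -> uint32_t {
    const float clamped = std::fmin(std::fmax(v, 0.0f), 1.0f);
    return static_cast<uint32_t>(std::floor(0.5f + 255.0f * clamped));
  };
  return to_byte(x) | (to_byte(y) << 8) | (to_byte(z) << 16) | (to_byte(w) << 24);
}

// Hypothetical CPU mirror of WGSL unpack4x8unorm: each byte divided by 255.
inline std::array<float, 4> unpack4x8unorm_cpu(uint32_t packed) {
  return {static_cast<float>(packed & 0xFFu) / 255.0f,
          static_cast<float>((packed >> 8) & 0xFFu) / 255.0f,
          static_cast<float>((packed >> 16) & 0xFFu) / 255.0f,
          static_cast<float>((packed >> 24) & 0xFFu) / 255.0f};
}

// Assumes the layout from the shader comment: the third texel component
// (t1.z) holds (mip2.g, mip2.b, shadow, transp), so shadow is index 2.
inline float shadow_from_feat1_z(uint32_t feat1_z) {
  return unpack4x8unorm_cpu(feat1_z)[2];
}
```

Feeding the third `u32` of a read-back `feat_tex1` texel to `shadow_from_feat1_z` should reproduce the value the fragment shader multiplies into `diffuse`, assuming the G-buffer pass really packs the channels in the commented order.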