Diffstat (limited to 'cnn_v3/docs')
-rw-r--r--  cnn_v3/docs/HOWTO.md       273
-rw-r--r--  cnn_v3/docs/HOW_TO_CNN.md   26
2 files changed, 258 insertions, 41 deletions
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 983e8b7..5c5cc2a 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -22,57 +22,141 @@ It rasterizes proxy geometry to MRT G-buffer textures and packs them into two
### Adding to a Sequence
-`GBufferEffect` does not exist in `seq_compiler.py` as a named effect yet
-(no `.seq` syntax integration for Phase 1). Wire it directly in C++ alongside
-your scene code, or add it to the timeline when the full CNNv3Effect is ready.
+Both `GBufferEffect` and `GBufViewEffect` are registered in `seq_compiler.py`
+(`CLASS_TO_HEADER`) and can be wired directly in `timeline.seq`.
-**C++ wiring example** (e.g. inside a Sequence or main.cc):
+**Debug view (G-buffer → sink)**:
+```seq
+SEQUENCE 12.00 0 "cnn_v3_test"
+ NODE gbuf_feat0 gbuf_rgba32uint
+ NODE gbuf_feat1 gbuf_rgba32uint
+ EFFECT + GBufferEffect source -> gbuf_feat0 gbuf_feat1 0.00 8.00
+ EFFECT + GBufViewEffect gbuf_feat0 gbuf_feat1 -> sink 0.00 8.00
+```
-```cpp
-#include "../../cnn_v3/src/gbuffer_effect.h"
+**Full CNN pipeline**:
+```seq
+SEQUENCE 12.00 0 "cnn_v3_test"
+ NODE gbuf_feat0 gbuf_rgba32uint
+ NODE gbuf_feat1 gbuf_rgba32uint
+ NODE cnn_v3_out gbuf_albedo
+ EFFECT + GBufferEffect source -> gbuf_feat0 gbuf_feat1 0.00 8.00
+ EFFECT + CNNv3Effect gbuf_feat0 gbuf_feat1 -> cnn_v3_out 0.00 8.00
+ EFFECT + Passthrough cnn_v3_out -> sink 0.00 8.00
+```
-// Allocate once alongside your scene
-auto gbuf = std::make_shared<GBufferEffect>(
- ctx, /*inputs=*/{"prev_cnn"}, // or any dummy node
- /*outputs=*/{"gbuf_feat0", "gbuf_feat1"},
- /*start=*/0.0f, /*end=*/60.0f);
+### Internal scene
-gbuf->set_scene(&my_scene, &my_camera);
+Call `set_scene()` once before the first render to populate the built-in demo
+scene. No external `Scene` or `Camera` pointer is required — the effect owns
+them.
-// In render loop, call before CNN pass:
-gbuf->render(encoder, params, nodes);
-```
+**What `set_scene()` creates:**
+- **20 small cubes** — random positions in [-2,2]×[-1.5,1.5]³, scale 0.1–0.25,
+ random colors. Each has a random rotation axis and speed; animated each frame
+ via `quat::from_axis(axis, time * speed)`.
+- **4 pumping spheres** — at fixed world positions, base radii 0.25–0.35.
+ Scale driven by `audio_intensity`: `r = base_r * (1 + audio_intensity * 0.8)`.
+- **Camera** — position (0, 2.5, 6), target (0, 0, 0), 45° FOV.
+ Aspect ratio updated each frame from `params.aspect_ratio`.
+- **Two directional lights** (uploaded to `lights_uniform_`, ready for shadow pass):
+ - Key: warm white (1.0, 0.92, 0.78), direction `normalize(1, 2, 1)` (upper-right-front)
+ - Fill: cool blue (0.4, 0.45, 0.8 × 0.4), direction `normalize(-1, 1, -1)` (upper-left-back)
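The animation math above is simple enough to sketch in plain Python (hypothetical helper names; the `quat::from_axis` axis-angle convention is assumed):

```python
import math

def quat_from_axis(axis, angle):
    """(x, y, z, w) unit quaternion rotating by `angle` radians about `axis`."""
    s = math.sin(angle * 0.5)
    n = math.sqrt(sum(a * a for a in axis))
    return (axis[0] / n * s, axis[1] / n * s, axis[2] / n * s,
            math.cos(angle * 0.5))

def cube_rotation(axis, speed, time):
    """Per-frame cube orientation: quat::from_axis(axis, time * speed)."""
    return quat_from_axis(axis, time * speed)

def sphere_radius(base_r, audio_intensity):
    """Pumping sphere: r = base_r * (1 + audio_intensity * 0.8)."""
    return base_r * (1.0 + audio_intensity * 0.8)

# A sphere with base radius 0.3 at audio_intensity 0.5 grows to 0.42.
print(sphere_radius(0.3, 0.5))
```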
### Internal passes
Each frame, `GBufferEffect::render()` executes:
-1. **Pass 1 — MRT rasterization** (`gbuf_raster.wgsl`)
+1. **Pass 1 — MRT rasterization** (`gbuf_raster.wgsl`) ✅
- Proxy box (36 verts) × N objects, instanced
- MRT outputs: `gbuf_albedo` (rgba16float), `gbuf_normal_mat` (rgba16float)
- Depth test + write into `gbuf_depth` (depth32float)
+ - `obj.type` written to `ObjectData.params.x` for future SDF branching
-2. **Pass 2/3 — SDF + Lighting** — TODO (placeholder: shadow=1, transp=0)
+2. **Pass 2 — SDF shadow raymarching** (`gbuf_shadow.wgsl`) ✅
+ - See implementation plan below.
-3. **Pass 4 — Pack compute** (`gbuf_pack.wgsl`)
+3. **Pass 3 — Transparency** — TODO (deferred; transp=0 for opaque scenes)
+
+4. **Pass 4 — Pack compute** (`gbuf_pack.wgsl`) ✅
- Reads all G-buffer textures + `prev_cnn` input
- Writes `feat_tex0` + `feat_tex1` (rgba32uint, 20 channels, 32 bytes/pixel)
+   - Transp node cleared to 0.0 via a zero-draw render pass until Pass 3
+     is implemented.
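For reference, the 32-bit packing used by the pack pass can be emulated in Python with `struct`'s half-float format — a sketch assuming the standard WGSL `pack2x16float` / `pack4x8unorm` semantics:

```python
import struct

def pack2x16float(a, b):
    """Two f32 -> f16 each, little-endian, packed into one u32 (a in low bits)."""
    return int.from_bytes(struct.pack('<2e', a, b), 'little')

def unpack2x16float(word):
    return struct.unpack('<2e', word.to_bytes(4, 'little'))

def pack4x8unorm(a, b, c, d):
    """Four [0,1] floats -> one byte each, packed into one u32 (a in low bits)."""
    def u8(x):
        return round(min(max(x, 0.0), 1.0) * 255)
    return u8(a) | (u8(b) << 8) | (u8(c) << 16) | (u8(d) << 24)

# Round-trip a value pair through one feat_tex0 texel component.
w = pack2x16float(0.25, -1.0)
print(unpack2x16float(w))  # (0.25, -1.0) — both exactly representable in f16
```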
### Output node names
-By default the outputs are named from the `outputs` vector passed to the
-constructor. Use these names when binding the CNN effect input:
+Outputs are named from the `outputs` vector passed to the constructor:
```
outputs[0] → feat_tex0 (rgba32uint: albedo.rgb, normal.xy, depth, depth_grad.xy)
outputs[1] → feat_tex1 (rgba32uint: mat_id, prev.rgb, mip1.rgb, mip2.rgb, shadow, transp)
```
-### Scene data
+---
+
+## 1b. GBufferEffect — Implementation Plan (Pass 2: SDF Shadow)
+
+### What remains
+
+| Item | Status | Notes |
+|------|--------|-------|
+| Pass 1: MRT raster | ✅ Done | proxy box, all object types |
+| Pass 4: Pack compute | ✅ Done | 20 channels packed |
+| Internal scene + animation | ✅ Done | cubes + spheres + 2 lights |
+| Pass 2: SDF shadow | ✅ Done | `gbuf_shadow.wgsl`, proxy-box SDF per object |
+| Pass 3: Transparency | ❌ TODO | low priority, opaque scenes only |
+| Phase 4: type-aware SDF | ✅ Done | switch on `obj.params.x` in `dfWithID` |
+
+### Pass 2: SDF shadow raymarching
-Call `set_scene(scene, camera)` before the first render. The effect uploads
-`GlobalUniforms` (view-proj, camera pos, resolution) and `ObjectData` (model
-matrix, color) to GPU storage buffers each frame.
+**New file: `cnn_v3/shaders/gbuf_shadow.wgsl`** — fullscreen render pass.
+
+Bind layout:
+
+| Binding | Type | Content |
+|---------|------|---------|
+| 0 | `uniform` | `GlobalUniforms` (`#include "common_uniforms"`) |
+| 1 | `storage read` | `ObjectsBuffer` |
+| 2 | `texture_depth_2d` | depth from Pass 1 |
+| 3 | `sampler` (non-filtering) | depth load |
+| 4 | `uniform` | `GBufLightsUniforms` (2 lights) |
+
+Algorithm per fragment:
+1. Reconstruct world position from NDC depth + `globals.inv_view_proj`
+2. For each object: `sdBox((inv_model * world_pos).xyz, vec3(1.0))` — proxy box in local space
+3. For each light: offset ray origin by `0.02 * surface_normal`; march shadow ray toward `light.direction`
+4. Soft shadow via `shadowWithStoredDistance()` from `render/raymarching_id`
+5. Combine lights: `shadow = min(shadow_light0, shadow_light1)`
+6. Discard fragments where depth == 1.0 (sky/background → shadow = 1.0)
+7. Output shadow factor to RGBA8Unorm render target (`.r` = shadow)
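Steps 2–5 can be sketched in Python against a single unit proxy box — a minimal model assuming the classic `k*d/t` penumbra estimator (the actual `shadowWithStoredDistance()` may differ in detail):

```python
import math

def sd_box(p, b):
    """Signed distance from point p to an axis-aligned box with half-extents b."""
    q = [abs(p[i]) - b[i] for i in range(3)]
    outside = math.sqrt(sum(max(c, 0.0) ** 2 for c in q))
    inside = min(max(q[0], max(q[1], q[2])), 0.0)
    return outside + inside

def soft_shadow(ro, rd, k=8.0, t_max=10.0):
    """March from ro toward the light along rd; 0 = fully shadowed, 1 = lit."""
    res, t = 1.0, 0.02            # 0.02 matches the normal-offset bias above
    while t < t_max:
        p = [ro[i] + rd[i] * t for i in range(3)]
        d = sd_box(p, (1.0, 1.0, 1.0))   # unit proxy box at the origin
        if d < 1e-4:
            return 0.0            # ray hit the occluder
        res = min(res, k * d / t) # penumbra estimate
        t += d
    return min(res, 1.0)

light_dir = (0.0, 1.0, 0.0)                       # toward an overhead light
print(soft_shadow((0.0, -2.0, 0.0), light_dir))   # directly below the box: 0.0
print(soft_shadow((5.0, 0.0, 0.0), light_dir))    # well to the side: 1.0
```

Combining two lights is then `shadow = min(shadow_light0, shadow_light1)`, as in step 5.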
+
+**C++ additions (`gbuffer_effect.h/.cc`):**
+```cpp
+RenderPipeline shadow_pipeline_;
+void create_shadow_pipeline();
+```
+In `render()` between Pass 1 and the shadow/transp node clears:
+- Build bind group (global_uniforms_buf_, objects_buf_, depth_view, sampler_, lights_uniform_)
+- Run fullscreen triangle → `node_shadow_` color attachment
+- Remove the `clear_node(node_shadow_, 1.0f)` placeholder once the pass is live
+
+**Register:**
+- `cnn_v3/shaders/gbuf_shadow.wgsl` → `SHADER_GBUF_SHADOW` in `assets.txt`
+- `extern const char* gbuf_shadow_wgsl;` in `gbuffer_effect.cc`
+
+### Phase 4: Object-type-aware SDF (optional)
+
+Branch on `obj.params.x` (populated since this commit) using `math/sdf_shapes`:
+
+| Type value | ObjectType | SDF |
+|------------|-----------|-----|
+| 0 | CUBE | `sdBox(local_p, vec3(1))` |
+| 1 | SPHERE | `sdSphere(local_p, 1.0)` |
+| 2 | PLANE | `sdPlane(local_p, vec3(0,1,0), obj.params.y)` |
+| 3 | TORUS | `sdTorus(local_p, vec2(0.8, 0.2))` |
+
+Only worth adding after Pass 2 is validated visually.
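The dispatch can be sketched in Python (standard Inigo Quilez-style SDF formulas assumed for `math/sdf_shapes`; the `sdPlane` sign convention is a guess):

```python
import math

def sd_box(p, b):
    q = [abs(p[i]) - b[i] for i in range(3)]
    return (math.sqrt(sum(max(c, 0.0) ** 2 for c in q))
            + min(max(q[0], max(q[1], q[2])), 0.0))

def sd_sphere(p, r):
    return math.sqrt(sum(c * c for c in p)) - r

def sd_plane(p, n, h):
    # dot(p, n) + h — assumes the usual half-space form
    return p[0] * n[0] + p[1] * n[1] + p[2] * n[2] + h

def sd_torus(p, t):
    qx = math.hypot(p[0], p[2]) - t[0]
    return math.hypot(qx, p[1]) - t[1]

def df(local_p, obj_type, param_y=0.0):
    """Switch on obj.params.x, mirroring the table above."""
    if obj_type == 0:
        return sd_box(local_p, (1.0, 1.0, 1.0))
    elif obj_type == 1:
        return sd_sphere(local_p, 1.0)
    elif obj_type == 2:
        return sd_plane(local_p, (0.0, 1.0, 0.0), param_y)
    else:
        return sd_torus(local_p, (0.8, 0.2))

print(df((2.0, 0.0, 0.0), 1))  # 1.0 — one unit outside the unit sphere
```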
---
@@ -253,12 +337,14 @@ Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch referen
| Phase | Status | Notes |
|-------|--------|-------|
| 1 — G-buffer (raster + pack) | ✅ Done | Integrated, 36/36 tests pass |
-| 1 — G-buffer (SDF + shadow passes) | TODO | Placeholder: shadow=1, transp=0 |
+| 1 — G-buffer (SDF shadow pass) | ✅ Done | `gbuf_shadow.wgsl`, proxy-box SDF |
| 2 — Training infrastructure | ✅ Done | blender_export.py, pack_*_sample.py |
| 3 — WGSL U-Net shaders | ✅ Done | 5 compute shaders + cnn_v3/common snippet |
| 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 36/36 tests pass |
| 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
| 6 — FiLM MLP training | ✅ Done | train_cnn_v3.py + cnn_v3_utils.py written |
+| 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
+| 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |
---
@@ -337,9 +423,142 @@ auto src = ShaderComposer::Get().Compose({"cnn_v3/common"}, raw_wgsl);
---
-## 9. See Also
+## 9. Validation Workflow
+
+Two complementary tools let you verify each stage of the pipeline before training
+or integrating into the demo.
+
+### 9a. C++ — GBufViewEffect (G-buffer channel grid)
+
+`GBufViewEffect` renders all 20 feature channels from `feat_tex0` / `feat_tex1`
+in a **4×5 tiled grid** so you can see the G-buffer at a glance.
+
+**Registration (already done)**
+
+| File | What changed |
+|------|-------------|
+| `cnn_v3/shaders/gbuf_view.wgsl` | New fragment shader |
+| `cnn_v3/src/gbuf_view_effect.h` | Effect class declaration |
+| `cnn_v3/src/gbuf_view_effect.cc` | Effect class implementation |
+| `workspaces/main/assets.txt` | `SHADER_GBUF_VIEW` asset |
+| `cmake/DemoSourceLists.cmake` | `gbuf_view_effect.cc` in COMMON_GPU_EFFECTS |
+| `src/gpu/demo_effects.h` | `#include "../../cnn_v3/src/gbuf_view_effect.h"` |
+| `src/effects/shaders.h/.cc` | `gbuf_view_wgsl` extern declaration + definition |
+| `src/tests/gpu/test_demo_effects.cc` | GBufViewEffect test |
+
+**Constructor signature**
+
+```cpp
+GBufViewEffect(const GpuContext& ctx,
+ const std::vector<std::string>& inputs, // {feat_tex0, feat_tex1}
+ const std::vector<std::string>& outputs, // {gbuf_view_out}
+ float start_time, float end_time)
+```
+
+**Wiring example** (alongside GBufferEffect):
+
+```cpp
+auto gbuf = std::make_shared<GBufferEffect>(ctx,
+ std::vector<std::string>{"prev_cnn"},
+ std::vector<std::string>{"gbuf_feat0", "gbuf_feat1"}, 0.0f, 60.0f);
+auto gview = std::make_shared<GBufViewEffect>(ctx,
+ std::vector<std::string>{"gbuf_feat0", "gbuf_feat1"},
+ std::vector<std::string>{"gbuf_view_out"}, 0.0f, 60.0f);
+```
+
+**Grid layout** (output resolution = input resolution, channel cells each 1/4 W × 1/5 H):
+
+| Row | Col 0 | Col 1 | Col 2 | Col 3 |
+|-----|-------|-------|-------|-------|
+| 0 | `alb.r` | `alb.g` | `alb.b` | `nrm.x` remap→[0,1] |
+| 1 | `nrm.y` remap→[0,1] | `depth` (inverted) | `dzdx` ×20+0.5 | `dzdy` ×20+0.5 |
+| 2 | `mat_id` | `prev.r` | `prev.g` | `prev.b` |
+| 3 | `mip1.r` | `mip1.g` | `mip1.b` | `mip2.r` |
+| 4 | `mip2.g` | `mip2.b` | `shadow` | `transp` |
+
+All channels displayed as grayscale. 1-pixel gray grid lines separate cells. Dark background for out-of-range cells.
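The cell lookup reduces to integer division of the output UV — a small Python sketch (channel order taken from the table above; exact tie-breaking at cell edges is an assumption):

```python
def cell_for_uv(u, v):
    """Map a [0,1)^2 output UV to (row, col, channel index) in the 4x5 grid."""
    col = min(int(u * 4), 3)
    row = min(int(v * 5), 4)
    return row, col, row * 4 + col

CHANNELS = [
    "alb.r", "alb.g", "alb.b", "nrm.x",
    "nrm.y", "depth", "dzdx", "dzdy",
    "mat_id", "prev.r", "prev.g", "prev.b",
    "mip1.r", "mip1.g", "mip1.b", "mip2.r",
    "mip2.g", "mip2.b", "shadow", "transp",
]

row, col, ch = cell_for_uv(0.95, 0.99)
print(CHANNELS[ch])  # transp — bottom-right cell
```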
+
+**Shader binding layout** (no sampler needed — integer texture):
+
+| Binding | Type | Content |
+|---------|------|---------|
+| 0 | `texture_2d<u32>` | `feat_tex0` (8 f16 channels via `pack2x16float`) |
+| 1 | `texture_2d<u32>` | `feat_tex1` (12 u8 channels via `pack4x8unorm`) |
+| 2 | `uniform` (8 B) | `GBufViewUniforms { resolution: vec2f }` |
+
+The BGL is built manually in the constructor (no sampler) — this is an exception to the
+standard post-process pattern because `rgba32uint` textures use `WGPUTextureSampleType_Uint`
+and cannot be sampled, only loaded via `textureLoad()`.
+
+**Implementation note — bind group recreation**
+
+`render()` calls `wgpuRenderPipelineGetBindGroupLayout(pipeline_, 0)` each frame to
+extract the BGL, creates a new `BindGroup`, then immediately releases the BGL handle.
+This avoids storing a raw BGL as a member (no RAII wrapper exists for it) while
+remaining correct across ping-pong buffer swaps.
+
+---
+
+### 9b. Web tool — "Load sample directory"
+
+`cnn_v3/tools/index.html` has a **"Load sample directory"** button that:
+1. Opens a `webkitdirectory` picker to select a sample folder
+2. Loads all G-buffer component PNGs as `rgba8unorm` GPU textures
+3. Runs the `FULL_PACK_SHADER` compute shader to assemble `feat_tex0` / `feat_tex1`
+4. Runs full CNN inference (enc0 → enc1 → bottleneck → dec1 → dec0)
+5. Displays the CNN output on the main canvas
+6. If `target.png` is present, shows it side-by-side and prints PSNR
+
+**File name matching** (case-insensitive, substring):
+
+| Channel | Matched patterns | Fallback |
+|---------|-----------------|---------|
+| Albedo (required) | `albedo`, `color` | — (error if missing) |
+| Normal | `normal`, `nrm` | `rgba(128,128,0,255)` — flat (0,0) oct-encoded |
+| Depth | `depth` | `0` — zero depth |
+| Mat ID | `matid`, `index`, `mat_id` | `0` — no material |
+| Shadow | `shadow` | `255` — fully lit |
+| Transparency | `transp`, `alpha` | `0` — fully opaque |
+| Target | `target`, `output`, `ground_truth` | not shown |
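The matching logic amounts to a first-hit substring scan — a Python sketch (the tool's actual priority when several patterns match is an assumption):

```python
PATTERNS = {
    "albedo": ["albedo", "color"],
    "normal": ["normal", "nrm"],
    "depth":  ["depth"],
    "matid":  ["matid", "index", "mat_id"],
    "shadow": ["shadow"],
    "transp": ["transp", "alpha"],
    "target": ["target", "output", "ground_truth"],
}

def classify(filename):
    """Return the first channel whose pattern is a case-insensitive substring."""
    name = filename.lower()
    for channel, pats in PATTERNS.items():
        if any(p in name for p in pats):
            return channel
    return None  # unrecognized file — fallback value used instead

print(classify("Scene_Albedo.png"))  # albedo
print(classify("nrm_map.png"))       # normal
print(classify("lightmap.png"))      # None
```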
+
+**`FULL_PACK_SHADER`** (defined in `cnn_v3/tools/shaders.js`)
+
+WebGPU compute shader (`@workgroup_size(8,8)`) with 9 bindings:
+
+| Binding | Resource | Format |
+|---------|----------|--------|
+| 0–5 | albedo, normal, depth, matid, shadow, transp | `texture_2d<f32>` (rgba8unorm, R channel for single-channel maps) |
+| 6 | feat_tex0 output | `texture_storage_2d<rgba32uint,write>` |
+| 7 | feat_tex1 output | `texture_storage_2d<rgba32uint,write>` |
+
+No sampler — all reads use `textureLoad()` (integer texel coordinates).
+
+Packs channels identically to `gbuf_pack.wgsl`:
+- `feat_tex0`: `pack2x16float(alb.rg)`, `pack2x16float(alb.b, nrm.x)`, `pack2x16float(nrm.y, depth)`, `pack2x16float(dzdx, dzdy)`
+- `feat_tex1`: `pack4x8unorm(matid,0,0,0)`, `pack4x8unorm(mip1.rgb, mip2.r)`, `pack4x8unorm(mip2.gb, shadow, transp)`
+- Depth gradients: central differences on depth R channel
+- Mip1 / Mip2: box2 (2×2) / box4 (4×4) average filter on albedo
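The gradient and mip arithmetic can be checked in Python — a sketch assuming clamp-to-edge borders and a top-left-anchored box window (both assumptions):

```python
def depth_gradients(depth, x, y):
    """Central differences on the depth channel; depth is a 2D row-major list."""
    h, w = len(depth), len(depth[0])
    xm, xp = max(x - 1, 0), min(x + 1, w - 1)
    ym, yp = max(y - 1, 0), min(y + 1, h - 1)
    dzdx = (depth[y][xp] - depth[y][xm]) * 0.5
    dzdy = (depth[yp][x] - depth[ym][x]) * 0.5
    return dzdx, dzdy

def box_average(img, x, y, size):
    """size x size box filter on one channel (mip1: size=2, mip2: size=4)."""
    h, w = len(img), len(img[0])
    total, n = 0.0, 0
    for dy in range(size):
        for dx in range(size):
            total += img[min(y + dy, h - 1)][min(x + dx, w - 1)]
            n += 1
    return total / n

ramp = [[float(x) for x in range(8)] for _ in range(8)]  # depth rises 1/pixel
print(depth_gradients(ramp, 4, 4))  # (1.0, 0.0)
```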
+
+**PSNR computation** (`computePSNR`)
+
+- CNN output (`rgba16float`) copied to CPU staging buffer via `copyTextureToBuffer`
+- f16→float32 decoded in JavaScript
+- Target drawn to offscreen `<canvas>` via `drawImage`, pixels read with `getImageData`
+- MSE and PSNR computed over all RGB pixels (alpha ignored)
+- Result displayed below target canvas as `MSE=X.XXXXX PSNR=XX.XXdB`
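The MSE/PSNR arithmetic, sketched in Python for a [0,1] signal range (alpha already stripped from the input lists, matching the description above):

```python
import math

def mse_psnr(out_rgb, target_rgb):
    """MSE over flattened RGB values, PSNR for a [0,1] peak signal."""
    assert len(out_rgb) == len(target_rgb)
    se = sum((a - b) ** 2 for a, b in zip(out_rgb, target_rgb))
    mse = se / len(out_rgb)
    psnr = float('inf') if mse == 0.0 else 10.0 * math.log10(1.0 / mse)
    return mse, psnr

mse, psnr = mse_psnr([0.5, 0.5, 0.5], [0.4, 0.5, 0.5])
print(f"MSE={mse:.5f} PSNR={psnr:.2f}dB")   # same format as the tool's readout
```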
+
+**`runFromFeat(f0, f1, w, h)`**
+
+Called by `loadSampleDir()` after packing, or can be called directly if feat textures
+are already available. Skips the photo-pack step, runs all 5 CNN passes, and displays
+the result. Intermediate textures are stored in `this.layerTextures` so the Layer
+Visualization panel still works.
+
+---
+
+## 10. See Also
- `cnn_v3/docs/CNN_V3.md` — Full architecture design (U-Net, FiLM, feature layout)
- `doc/EFFECT_WORKFLOW.md` — General effect integration guide
- `cnn_v2/docs/CNN_V2.md` — Reference implementation (simpler, operational)
-- `src/tests/gpu/test_demo_effects.cc` — GBufferEffect construction test
+- `src/tests/gpu/test_demo_effects.cc` — GBufferEffect + GBufViewEffect tests
diff --git a/cnn_v3/docs/HOW_TO_CNN.md b/cnn_v3/docs/HOW_TO_CNN.md
index 020f79c..458b68f 100644
--- a/cnn_v3/docs/HOW_TO_CNN.md
+++ b/cnn_v3/docs/HOW_TO_CNN.md
@@ -458,11 +458,14 @@ Converts a trained `.pth` checkpoint to two raw binary files for the C++ runtime
```bash
cd cnn_v3/training
-python3 export_cnn_v3_weights.py checkpoints/checkpoint_epoch_200.pth
-# writes to export/ by default
-
python3 export_cnn_v3_weights.py checkpoints/checkpoint_epoch_200.pth \
- --output /path/to/assets/
+ --output ../../workspaces/main/weights/
+```
+
+Output files are registered in `workspaces/main/assets.txt` as:
+```
+WEIGHTS_CNN_V3, BINARY, weights/cnn_v3_weights.bin, "CNN v3 conv weights (f16, 3928 bytes)"
+WEIGHTS_CNN_V3_FILM_MLP, BINARY, weights/cnn_v3_film_mlp.bin, "CNN v3 FiLM MLP weights (f32, 3104 bytes)"
```
### Output files
@@ -557,20 +560,15 @@ auto cnn = std::make_shared<CNNv3Effect>(
### Uploading weights
-Load `cnn_v3_weights.bin` once at startup, before the first `render()`:
+Load `cnn_v3_weights.bin` once at startup via the asset system, before the first `render()`:
```cpp
-// Read binary file
-std::vector<uint8_t> data;
-{
- std::ifstream f("cnn_v3_weights.bin", std::ios::binary | std::ios::ate);
- data.resize(f.tellg());
- f.seekg(0);
- f.read(reinterpret_cast<char*>(data.data()), data.size());
-}
+// Load via asset system
+const char* data = SafeGetAsset(AssetId::ASSET_WEIGHTS_CNN_V3);
+uint32_t size = GetAssetSize(AssetId::ASSET_WEIGHTS_CNN_V3);
// Upload to GPU
-cnn->upload_weights(ctx.queue, data.data(), (uint32_t)data.size());
+cnn->upload_weights(ctx.queue, reinterpret_cast<const uint8_t*>(data), size);
```
Before `upload_weights()`: all conv weights are zero, so output is `sigmoid(0) = 0.5` gray.