 PROJECT_CONTEXT.md                 |   2
 TODO.md                            |  11
 cmake/DemoSourceLists.cmake        |   1
 cnn_v3/README.md                   |  15
 cnn_v3/docs/HOWTO.md               | 138
 cnn_v3/shaders/gbuf_view.wgsl      | 102
 cnn_v3/src/gbuf_view_effect.cc     | 144
 cnn_v3/src/gbuf_view_effect.h      |  25
 cnn_v3/tools/index.html            |  17
 cnn_v3/tools/shaders.js            |  48
 cnn_v3/tools/tester.js             | 277
 doc/COMPLETED.md                   |   2
 src/effects/shaders.cc             |   1
 src/effects/shaders.h              |   1
 src/gpu/demo_effects.h             |   3
 src/tests/gpu/test_demo_effects.cc |   5
 workspaces/main/assets.txt         |   1
 17 files changed, 782 insertions(+), 11 deletions(-)
diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md
index 6219275..3ed265a 100644
--- a/PROJECT_CONTEXT.md
+++ b/PROJECT_CONTEXT.md
@@ -36,7 +36,7 @@
 - **Audio:** Sample-accurate sync. Zero heap allocations per frame. Variable tempo. OLA-IDCT synthesis (v2 .spec): Hann analysis window, rectangular synthesis, 50% overlap, click-free. V1 (raw DCT-512) preserved for generated notes. .spec files regenerated as v2.
 - **Shaders:** Parameterized effects (UniformHelper, .seq syntax). Beat-synchronized animation support (`beat_time`, `beat_phase`). Modular WGSL composition with ShaderComposer. 27 shared common shaders (math, render, compute). Reusable snippets: `render/scratch_lines`, `render/ntsc_common` (NTSC signal processing, RGB and YIQ input variants via `sample_ntsc_signal` hook), `math/color` (YIQ/NTSC), `math/color_c64` (C64 palette, Bayer dither, border animation).
 - **3D:** Hybrid SDF/rasterization with BVH. Binary scene loader. Blender pipeline.
-- **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated. **CNN v3 Phases 1–5 complete:** `CNNv3Effect` C++ class (5 compute passes, FiLM uniform upload, identity γ/β defaults). Parity validated: enc0 max_err=1.95e-3, dec1 max_err=1.95e-3, final max_err=4.88e-4 (all ≤1/255). Key fix: intermediate nodes declared at fractional resolutions (W/2, W/4) via `NodeRegistry::default_width()/default_height()`. See `cnn_v3/docs/HOWTO.md`.
+- **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated. **CNN v3 Phases 1–7 complete:** `CNNv3Effect` C++ class (5 compute passes, FiLM uniform upload, identity γ/β defaults). Parity validated: max_err=4.88e-4 (≤1/255). Validation tools: `GBufViewEffect` (C++ 4×5 channel grid) + web "Load sample directory" (G-buffer pack → CNN inference → PSNR vs target.png). See `cnn_v3/docs/HOWTO.md` §9.
 - **Tools:** CNN test tool operational. Texture readback utility functional. Timeline editor (web-based, beat-aligned, audio playback).
 - **Build:** Asset dependency tracking. Size measurement. Hot-reload (debug-only). WSL (Windows 10) supported: native Linux build and cross-compile to `.exe` via `mingw-w64`.
 - **Sequence:** DAG-based effect routing with explicit node system. Python compiler with topological sort and ping-pong optimization. 12 effects operational (Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch). Effect times are absolute (seq_compiler adds sequence start offset). See `doc/SEQUENCE.md`.
@@ -79,9 +79,14 @@ PyTorch / HTML WebGPU / C++ WebGPU.
 5. ✅ Parity validation: test vectors + `test_cnn_v3_parity.cc`. max_err=4.88e-4 (≤1/255).
    - Key fix: intermediate nodes at fractional resolutions (W/2, W/4) via `NodeRegistry::default_width()/default_height()`
-**Next: export + real training run**
-- `train_cnn_v3.py` + `cnn_v3_utils.py` written (Phase 6 training script done)
-- ✅ `export_cnn_v3_weights.py` — convert trained `.pth` → `.bin` (f16)
+6. ✅ Training script: `train_cnn_v3.py` + `cnn_v3_utils.py` written
+   - ✅ `export_cnn_v3_weights.py` — convert trained `.pth` → `.bin` (f16)
+7. ✅ Validation tools:
+   - `GBufViewEffect` — C++ 4×5 channel grid (all 20 G-buffer channels)
+   - Web tool "Load sample directory" — G-buffer pack → CNN inference → PSNR
+   - See `cnn_v3/docs/HOWTO.md` §9
+
+**Next: run a real training pass**
 - See `cnn_v3/docs/HOWTO.md` §3 for training commands
 
 ## Future: CNN v3 "2D Mode" (G-buffer-free)
diff --git a/cmake/DemoSourceLists.cmake b/cmake/DemoSourceLists.cmake
index 742057a..e5c7339 100644
--- a/cmake/DemoSourceLists.cmake
+++ b/cmake/DemoSourceLists.cmake
@@ -42,6 +42,7 @@ set(COMMON_GPU_EFFECTS
   src/effects/scene2_effect.cc
   cnn_v3/src/gbuffer_effect.cc
   cnn_v3/src/cnn_v3_effect.cc
+  cnn_v3/src/gbuf_view_effect.cc
   # TODO: Port CNN effects to v2 (complex v1 dependencies)
   # cnn_v1/src/cnn_v1_effect.cc
   # cnn_v2/src/cnn_v2_effect.cc
diff --git a/cnn_v3/README.md b/cnn_v3/README.md
index f161bf4..a844b1b 100644
--- a/cnn_v3/README.md
+++ b/cnn_v3/README.md
@@ -31,9 +31,18 @@ Add images directly to these directories and commit them.
 
 ## Status
 
-**Phase 1 complete.** G-buffer integrated (raster + pack), 35/35 tests pass.
-Training infrastructure ready. U-Net WGSL shaders are next.
+**Phases 1–7 complete.** 36/36 tests pass.
 
-See `cnn_v3/docs/HOWTO.md` for the practical playbook.
+| Phase | Status |
+|-------|--------|
+| 1 — G-buffer (raster + pack) | ✅ |
+| 2 — Training infrastructure | ✅ |
+| 3 — WGSL U-Net shaders | ✅ |
+| 4 — C++ CNNv3Effect + FiLM | ✅ |
+| 5 — Parity validation | ✅ max_err=4.88e-4 |
+| 6 — Training script | ✅ train_cnn_v3.py |
+| 7 — Validation tools | ✅ GBufViewEffect + web sample loader |
+
+See `cnn_v3/docs/HOWTO.md` for the practical playbook (§9 covers validation tools).
 See `cnn_v3/docs/CNN_V3.md` for full design.
 See `cnn_v2/` for reference implementation.
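A note on the PSNR check referenced above (web tool: G-buffer pack → CNN inference → PSNR vs target.png): the math is a plain MSE-to-PSNR conversion over normalized RGB values, preceded by an f16→float32 decode of the GPU output. Below is a minimal standalone sketch of that math, assuming both images are already plain arrays of [0,1] RGB floats; `halfToFloat` and `computePsnr` are illustrative names for this sketch, not the tool's exact code (the real tool also reads the rgba16float output back through a staging buffer first).

```javascript
// Decode one IEEE 754 binary16 value from its 16-bit integer representation.
function halfToFloat(h) {
  const s = (h & 0x8000) ? -1 : 1;
  const e = (h >> 10) & 0x1f;  // 5-bit exponent
  const m = h & 0x3ff;         // 10-bit mantissa
  if (e === 0) return s * Math.pow(2, -14) * (m / 1024);  // subnormal
  if (e === 31) return m ? NaN : s * Infinity;            // inf / NaN
  return s * Math.pow(2, e - 15) * (1 + m / 1024);
}

// MSE + PSNR over two equal-length arrays of normalized [0,1] RGB floats
// (alpha excluded by the caller). With values in [0,1], MAX^2 = 1, so
// PSNR = 10 * log10(1 / MSE).
function computePsnr(cnnRgb, targetRgb) {
  let se = 0;
  for (let i = 0; i < cnnRgb.length; i++) {
    const d = cnnRgb[i] - targetRgb[i];
    se += d * d;
  }
  const mse = se / cnnRgb.length;
  return { mse, psnr: mse > 0 ? 10 * Math.log10(1 / mse) : Infinity };
}
```

A uniform per-channel error of 0.1 gives MSE = 0.01 and PSNR = 20 dB, a useful sanity value when checking the tool's readout.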
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 983e8b7..c6f306b 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -259,6 +259,8 @@ Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch reference)
 | 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 36/36 tests pass |
 | 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
 | 6 — FiLM MLP training | ✅ Done | train_cnn_v3.py + cnn_v3_utils.py written |
+| 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
+| 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |
 
 ---
 
@@ -337,9 +339,141 @@ auto src = ShaderComposer::Get().Compose({"cnn_v3/common"}, raw_wgsl);
 
 ---
 
-## 9. See Also
+## 9. Validation Workflow
+
+Two complementary tools let you verify each stage of the pipeline before training
+or integrating into the demo.
+
+### 9a. C++ — GBufViewEffect (G-buffer channel grid)
+
+`GBufViewEffect` renders all 20 feature channels from `feat_tex0` / `feat_tex1`
+in a **4×5 tiled grid** so you can see the G-buffer at a glance.
+
+**Registration (already done)**
+
+| File | What changed |
+|------|--------------|
+| `cnn_v3/shaders/gbuf_view.wgsl` | New fragment shader |
+| `cnn_v3/src/gbuf_view_effect.h` | Effect class declaration |
+| `cnn_v3/src/gbuf_view_effect.cc` | Effect class implementation |
+| `workspaces/main/assets.txt` | `SHADER_GBUF_VIEW` asset |
+| `cmake/DemoSourceLists.cmake` | `gbuf_view_effect.cc` in COMMON_GPU_EFFECTS |
+| `src/gpu/demo_effects.h` | `#include "../../cnn_v3/src/gbuf_view_effect.h"` |
+| `src/effects/shaders.h/.cc` | `gbuf_view_wgsl` extern declaration + definition |
+| `src/tests/gpu/test_demo_effects.cc` | GBufViewEffect test |
+
+**Constructor signature**
+
+```cpp
+GBufViewEffect(const GpuContext& ctx,
+               const std::vector<std::string>& inputs,   // {feat_tex0, feat_tex1}
+               const std::vector<std::string>& outputs,  // {gbuf_view_out}
+               float start_time, float end_time)
+```
+
+**Wiring example** (alongside GBufferEffect):
+
+```cpp
+auto gbuf = std::make_shared<GBufferEffect>(ctx,
+    std::vector<std::string>{"prev_cnn"},
+    std::vector<std::string>{"gbuf_feat0", "gbuf_feat1"}, 0.0f, 60.0f);
+auto gview = std::make_shared<GBufViewEffect>(ctx,
+    std::vector<std::string>{"gbuf_feat0", "gbuf_feat1"},
+    std::vector<std::string>{"gbuf_view_out"}, 0.0f, 60.0f);
+```
+
+**Grid layout** (output resolution = input resolution, channel cells each 1/4 W × 1/5 H):
+
+| Row | Col 0 | Col 1 | Col 2 | Col 3 |
+|-----|-------|-------|-------|-------|
+| 0 | `alb.r` (red tint) | `alb.g` (green tint) | `alb.b` (blue tint) | `nrm.x` remap→[0,1] |
+| 1 | `nrm.y` remap→[0,1] | `depth` (inverted) | `dzdx` ×20+0.5 | `dzdy` ×20+0.5 |
+| 2 | `mat_id` | `prev.r` | `prev.g` | `prev.b` |
+| 3 | `mip1.r` | `mip1.g` | `mip1.b` | `mip2.r` |
+| 4 | `mip2.g` | `mip2.b` | `shadow` | `transp` |
+
+1-pixel gray grid lines separate cells. Dark background for out-of-range cells.
+
+**Shader binding layout** (no sampler needed — integer texture):
+
+| Binding | Type | Content |
+|---------|------|---------|
+| 0 | `texture_2d<u32>` | `feat_tex0` (8 f16 channels via `pack2x16float`) |
+| 1 | `texture_2d<u32>` | `feat_tex1` (12 u8 channels via `pack4x8unorm`) |
+| 2 | `uniform` (8 B) | `GBufViewUniforms { resolution: vec2f }` |
+
+The BGL is built manually in the constructor (no sampler) — this is an exception to the
+standard post-process pattern because `rgba32uint` textures use `WGPUTextureSampleType_Uint`
+and cannot be sampled, only loaded via `textureLoad()`.
+
+**Implementation note — bind group recreation**
+
+`render()` calls `wgpuRenderPipelineGetBindGroupLayout(pipeline_, 0)` each frame to
+extract the BGL, creates a new `BindGroup`, then immediately releases the BGL handle.
+This avoids storing a raw BGL as a member (no RAII wrapper exists for it) while
+remaining correct across ping-pong buffer swaps.
+
+---
+
+### 9b. Web tool — "Load sample directory"
+
+`cnn_v3/tools/index.html` has a **"Load sample directory"** button that:
+1. Opens a `webkitdirectory` picker to select a sample folder
+2. Loads all G-buffer component PNGs as `rgba8unorm` GPU textures
+3. Runs the `FULL_PACK_SHADER` compute shader to assemble `feat_tex0` / `feat_tex1`
+4. Runs full CNN inference (enc0 → enc1 → bottleneck → dec1 → dec0)
+5. Displays the CNN output on the main canvas
+6. If `target.png` is present, shows it side-by-side and prints PSNR
+
+**File name matching** (case-insensitive, substring):
+
+| Channel | Matched patterns | Fallback |
+|---------|------------------|----------|
+| Albedo (required) | `albedo`, `color` | — (error if missing) |
+| Normal | `normal`, `nrm` | `rgb(128,128,0,255)` — flat (0,0) oct-encoded |
+| Depth | `depth` | `0` — zero depth |
+| Mat ID | `matid`, `index`, `mat_id` | `0` — no material |
+| Shadow | `shadow` | `255` — fully lit |
+| Transparency | `transp`, `alpha` | `0` — fully opaque |
+| Target | `target`, `output`, `ground_truth` | not shown |
+
+**`FULL_PACK_SHADER`** (defined in `cnn_v3/tools/shaders.js`)
+
+WebGPU compute shader (`@workgroup_size(8,8)`) with 9 bindings:
+
+| Binding | Resource | Format |
+|---------|----------|--------|
+| 0–5 | albedo, normal, depth, matid, shadow, transp | `texture_2d<f32>` (rgba8unorm, R channel for single-channel maps) |
+| 6 | linear sampler | `sampler` |
+| 7 | feat_tex0 output | `texture_storage_2d<rgba32uint,write>` |
+| 8 | feat_tex1 output | `texture_storage_2d<rgba32uint,write>` |
+
+Packs channels identically to `gbuf_pack.wgsl`:
+- `feat_tex0`: `pack2x16float(alb.rg)`, `pack2x16float(alb.b, nrm.x)`, `pack2x16float(nrm.y, depth)`, `pack2x16float(dzdx, dzdy)`
+- `feat_tex1`: `pack4x8unorm(matid,0,0,0)`, `pack4x8unorm(mip1.rgb, mip2.r)`, `pack4x8unorm(mip2.gb, shadow, transp)`
+- Depth gradients: central differences on depth R channel
+- Mip1 / Mip2: box2 (2×2) / box4 (4×4) average filter on albedo
+
+**PSNR computation** (`computePSNR`)
+
+- CNN output (`rgba16float`) copied to CPU staging buffer via `copyTextureToBuffer`
+- f16→float32 decoded in JavaScript
+- Target drawn to offscreen `<canvas>` via `drawImage`, pixels read with `getImageData`
+- MSE and PSNR computed over all RGB pixels (alpha ignored)
+- Result displayed below target canvas as `MSE=X.XXXXX PSNR=XX.XXdB`
+
+**`runFromFeat(f0, f1, w, h)`**
+
+Called by `loadSampleDir()` after packing, or can be called directly if feat textures
+are already available. Skips the photo-pack step, runs all 5 CNN passes, and displays
+the result. Intermediate textures are stored in `this.layerTextures` so the Layer
+Visualization panel still works.
+
+---
+
+## 10. See Also
 
 - `cnn_v3/docs/CNN_V3.md` — Full architecture design (U-Net, FiLM, feature layout)
 - `doc/EFFECT_WORKFLOW.md` — General effect integration guide
 - `cnn_v2/docs/CNN_V2.md` — Reference implementation (simpler, operational)
-- `src/tests/gpu/test_demo_effects.cc` — GBufferEffect construction test
+- `src/tests/gpu/test_demo_effects.cc` — GBufferEffect + GBufViewEffect tests
diff --git a/cnn_v3/shaders/gbuf_view.wgsl b/cnn_v3/shaders/gbuf_view.wgsl
new file mode 100644
index 0000000..f2ae085
--- /dev/null
+++ b/cnn_v3/shaders/gbuf_view.wgsl
@@ -0,0 +1,102 @@
+// G-buffer channel visualization — 4×5 grid of 20 feature channels.
+// Takes feat_tex0 (rgba32uint, ch 0-7 f16) and feat_tex1 (rgba32uint, ch 8-19 unorm8).
+// Outputs tiled channel view to a standard rgba8unorm render target.
+//
+// Channel layout (row×col):
+//   Row 0: ch0(alb.r)  ch1(alb.g)  ch2(alb.b)  ch3(nrm.x)
+//   Row 1: ch4(nrm.y)  ch5(depth)  ch6(dzdx)   ch7(dzdy)
+//   Row 2: ch8(matid)  ch9(prv.r)  ch10(prv.g) ch11(prv.b)
+//   Row 3: ch12(m1.r)  ch13(m1.g)  ch14(m1.b)  ch15(m2.r)
+//   Row 4: ch16(m2.g)  ch17(m2.b)  ch18(shdw)  ch19(trns)
+
+struct GBufViewUniforms { resolution: vec2f }
+
+@group(0) @binding(0) var feat0: texture_2d<u32>;
+@group(0) @binding(1) var feat1: texture_2d<u32>;
+@group(0) @binding(2) var<uniform> u: GBufViewUniforms;
+
+@vertex
+fn vs_main(@builtin(vertex_index) vid: u32) -> @builtin(position) vec4f {
+  var corners = array<vec2f, 3>(
+      vec2f(-1.0, -1.0), vec2f(3.0, -1.0), vec2f(-1.0, 3.0));
+  return vec4f(corners[vid], 0.0, 1.0);
+}
+
+@fragment
+fn fs_main(@builtin(position) pos: vec4f) -> @location(0) vec4f {
+  let uv = pos.xy / u.resolution;
+
+  let COLS = 4.0;
+  let ROWS = 5.0;
+  let col = u32(uv.x * COLS);
+  let row = u32(uv.y * ROWS);
+  let ch = row * 4u + col;
+
+  if (col >= 4u || ch >= 20u) {
+    return vec4f(0.05, 0.05, 0.05, 1.0);
+  }
+
+  // 1-pixel grid lines (thin border per cell)
+  let lx = fract(uv.x * COLS);
+  let ly = fract(uv.y * ROWS);
+  if (lx < 0.005 || lx > 0.995 || ly < 0.005 || ly > 0.995) {
+    return vec4f(0.25, 0.25, 0.25, 1.0);
+  }
+
+  // Map local UV to texel coordinate
+  let dim = vec2i(textureDimensions(feat0));
+  let tc = clamp(vec2i(vec2f(lx, ly) * vec2f(dim)), vec2i(0), dim - vec2i(1));
+
+  var v: f32 = 0.0;
+
+  if (ch < 8u) {
+    // feat0: 4 × pack2x16float — each u32 component holds two f16 values
+    let t = textureLoad(feat0, tc, 0);
+    let pair_idx = ch >> 1u;
+    let sub = ch & 1u;
+    var p: vec2f;
+    if (pair_idx == 0u) { p = unpack2x16float(t.x); }
+    else if (pair_idx == 1u) { p = unpack2x16float(t.y); }
+    else if (pair_idx == 2u) { p = unpack2x16float(t.z); }
+    else { p = unpack2x16float(t.w); }
+    v = select(p.y, p.x, sub == 0u);
+  } else {
+    // feat1: 3 × pack4x8unorm — components .x/.y/.z hold 4 u8 values each
+    let t = textureLoad(feat1, tc, 0);
+    let ch1 = ch - 8u;
+    let comp_idx = ch1 / 4u;
+    let sub = ch1 % 4u;
+    var bytes: vec4f;
+    if (comp_idx == 0u) { bytes = unpack4x8unorm(t.x); }
+    else if (comp_idx == 1u) { bytes = unpack4x8unorm(t.y); }
+    else { bytes = unpack4x8unorm(t.z); }
+    var ba = array<f32, 4>(bytes.x, bytes.y, bytes.z, bytes.w);
+    v = ba[sub];
+  }
+
+  // Channel-specific normalization for display clarity
+  var disp: f32;
+  if (ch <= 2u) {
+    // Albedo: already [0,1]
+    disp = clamp(v, 0.0, 1.0);
+  } else if (ch == 3u || ch == 4u) {
+    // Normals oct-encoded in [-1,1] → remap to [0,1]
+    disp = clamp(v * 0.5 + 0.5, 0.0, 1.0);
+  } else if (ch == 5u) {
+    // Depth [0,1]: invert so near=white, far=dark
+    disp = clamp(1.0 - v, 0.0, 1.0);
+  } else if (ch == 6u || ch == 7u) {
+    // Depth gradients (signed, small values): amplify × 20 + 0.5 for visibility
+    disp = clamp(v * 20.0 + 0.5, 0.0, 1.0);
+  } else {
+    // Everything else: clamp to [0,1]
+    disp = clamp(v, 0.0, 1.0);
+  }
+
+  // Albedo channels: tint for identification (ch0=red, ch1=green, ch2=blue)
+  if (ch == 0u) { return vec4f(disp, 0.0, 0.0, 1.0); }
+  else if (ch == 1u) { return vec4f(0.0, disp, 0.0, 1.0); }
+  else if (ch == 2u) { return vec4f(0.0, 0.0, disp, 1.0); }
+
+  return vec4f(disp, disp, disp, 1.0);
+}
diff --git a/cnn_v3/src/gbuf_view_effect.cc b/cnn_v3/src/gbuf_view_effect.cc
new file mode 100644
index 0000000..180919d
--- /dev/null
+++ b/cnn_v3/src/gbuf_view_effect.cc
@@ -0,0 +1,144 @@
+// GBufViewEffect — G-buffer channel grid visualization
+// Renders 20 feature channels from feat_tex0/feat_tex1 in a 4×5 tiled layout.
+
+#include "gbuf_view_effect.h"
+
+#if defined(USE_TEST_ASSETS)
+#include "test_assets.h"
+#else
+#include "generated/assets.h"
+#endif
+
+#include "gpu/gpu.h"
+#include "util/asset_manager.h"
+#include "util/fatal_error.h"
+
+extern const char* gbuf_view_wgsl;
+
+// BGL entry: texture_2d<u32> read binding (fragment stage)
+static WGPUBindGroupLayoutEntry bgl_uint_tex_frag(uint32_t binding) {
+  WGPUBindGroupLayoutEntry e = {};
+  e.binding = binding;
+  e.visibility = WGPUShaderStage_Fragment;
+  e.texture.sampleType = WGPUTextureSampleType_Uint;
+  e.texture.viewDimension = WGPUTextureViewDimension_2D;
+  return e;
+}
+
+// BGL entry: uniform buffer (fragment stage)
+static WGPUBindGroupLayoutEntry bgl_uniform_frag(uint32_t binding,
+                                                 uint64_t min_size) {
+  WGPUBindGroupLayoutEntry e = {};
+  e.binding = binding;
+  e.visibility = WGPUShaderStage_Fragment;
+  e.buffer.type = WGPUBufferBindingType_Uniform;
+  e.buffer.minBindingSize = min_size;
+  return e;
+}
+
+GBufViewEffect::GBufViewEffect(const GpuContext& ctx,
+                               const std::vector<std::string>& inputs,
+                               const std::vector<std::string>& outputs,
+                               float start_time, float end_time)
+    : Effect(ctx, inputs, outputs, start_time, end_time) {
+  HEADLESS_RETURN_IF_NULL(ctx_.device);
+
+  // Build BGL: binding 0 = feat0 (u32 tex), 1 = feat1 (u32 tex), 2 = uniforms
+  WGPUBindGroupLayoutEntry entries[3] = {
+      bgl_uint_tex_frag(0),
+      bgl_uint_tex_frag(1),
+      bgl_uniform_frag(2, 8),  // only resolution (vec2f = 8 bytes) is read
+  };
+  WGPUBindGroupLayoutDescriptor bgl_desc = {};
+  bgl_desc.entryCount = 3;
+  bgl_desc.entries = entries;
+  WGPUBindGroupLayout bgl = wgpuDeviceCreateBindGroupLayout(ctx_.device, &bgl_desc);
+
+  // Pipeline layout
+  WGPUPipelineLayoutDescriptor pl_desc = {};
+  pl_desc.bindGroupLayoutCount = 1;
+  pl_desc.bindGroupLayouts = &bgl;
+  WGPUPipelineLayout pl = wgpuDeviceCreatePipelineLayout(ctx_.device, &pl_desc);
+
+  // Shader module
+  WGPUShaderSourceWGSL wgsl_src = {};
+  wgsl_src.chain.sType = WGPUSType_ShaderSourceWGSL;
+  wgsl_src.code = str_view(gbuf_view_wgsl);
+  WGPUShaderModuleDescriptor shader_desc = {};
+  shader_desc.nextInChain = &wgsl_src.chain;
+  WGPUShaderModule shader =
+      wgpuDeviceCreateShaderModule(ctx_.device, &shader_desc);
+
+  // Render pipeline
+  WGPUColorTargetState target = {};
+  target.format = WGPUTextureFormat_RGBA8Unorm;
+  target.writeMask = WGPUColorWriteMask_All;
+
+  WGPUFragmentState frag = {};
+  frag.module = shader;
+  frag.entryPoint = str_view("fs_main");
+  frag.targetCount = 1;
+  frag.targets = &target;
+
+  WGPURenderPipelineDescriptor pipe_desc = {};
+  pipe_desc.layout = pl;
+  pipe_desc.vertex.module = shader;
+  pipe_desc.vertex.entryPoint = str_view("vs_main");
+  pipe_desc.fragment = &frag;
+  pipe_desc.primitive.topology = WGPUPrimitiveTopology_TriangleList;
+  pipe_desc.multisample.count = 1;
+  pipe_desc.multisample.mask = UINT32_MAX;
+
+  pipeline_.set(wgpuDeviceCreateRenderPipeline(ctx_.device, &pipe_desc));
+
+  wgpuShaderModuleRelease(shader);
+  wgpuPipelineLayoutRelease(pl);
+  wgpuBindGroupLayoutRelease(bgl);
+}
+
+void GBufViewEffect::render(WGPUCommandEncoder encoder,
+                            const UniformsSequenceParams& params,
+                            NodeRegistry& nodes) {
+  WGPUTextureView feat0_view = nodes.get_view(input_nodes_[0]);
+  WGPUTextureView feat1_view = nodes.get_view(input_nodes_[1]);
+  WGPUTextureView output_view = nodes.get_view(output_nodes_[0]);
+
+  // Rebuild bind group (views may change with ping-pong or resize)
+  WGPUBindGroupLayout bgl =
+      wgpuRenderPipelineGetBindGroupLayout(pipeline_.get(), 0);
+
+  WGPUBindGroupEntry bg_entries[3] = {};
+  bg_entries[0].binding = 0;
+  bg_entries[0].textureView = feat0_view;
+  bg_entries[1].binding = 1;
+  bg_entries[1].textureView = feat1_view;
+  bg_entries[2].binding = 2;
+  bg_entries[2].buffer = uniforms_buffer_.get().buffer;
+  bg_entries[2].size = sizeof(UniformsSequenceParams);
+
+  WGPUBindGroupDescriptor bg_desc = {};
+  bg_desc.layout = bgl;
+  bg_desc.entryCount = 3;
+  bg_desc.entries = bg_entries;
+  bind_group_.replace(wgpuDeviceCreateBindGroup(ctx_.device, &bg_desc));
+  wgpuBindGroupLayoutRelease(bgl);
+
+  WGPURenderPassColorAttachment color_att = {};
+  color_att.view = output_view;
+  color_att.loadOp = WGPULoadOp_Clear;
+  color_att.storeOp = WGPUStoreOp_Store;
+  color_att.clearValue = {0.0f, 0.0f, 0.0f, 1.0f};
+  color_att.depthSlice = WGPU_DEPTH_SLICE_UNDEFINED;
+
+  WGPURenderPassDescriptor pass_desc = {};
+  pass_desc.colorAttachmentCount = 1;
+  pass_desc.colorAttachments = &color_att;
+
+  WGPURenderPassEncoder pass =
+      wgpuCommandEncoderBeginRenderPass(encoder, &pass_desc);
+  wgpuRenderPassEncoderSetPipeline(pass, pipeline_.get());
+  wgpuRenderPassEncoderSetBindGroup(pass, 0, bind_group_.get(), 0, nullptr);
+  wgpuRenderPassEncoderDraw(pass, 3, 1, 0, 0);
+  wgpuRenderPassEncoderEnd(pass);
+  wgpuRenderPassEncoderRelease(pass);
+}
diff --git a/cnn_v3/src/gbuf_view_effect.h b/cnn_v3/src/gbuf_view_effect.h
new file mode 100644
index 0000000..d4d8139
--- /dev/null
+++ b/cnn_v3/src/gbuf_view_effect.h
@@ -0,0 +1,25 @@
+// GBufViewEffect: Visualizes G-buffer feature textures as a 4×5 channel grid.
+// Inputs: feat_tex0 (rgba32uint, ch 0-7 f16), feat_tex1 (rgba32uint, ch 8-19 unorm8)
+// Output: rgba8unorm tiled channel visualization (downscaled 4× per channel)
+
+#pragma once
+
+#include "gpu/effect.h"
+#include "gpu/sequence.h"
+#include "gpu/wgpu_resource.h"
+
+class GBufViewEffect : public Effect {
+ public:
+  GBufViewEffect(const GpuContext& ctx,
+                 const std::vector<std::string>& inputs,
+                 const std::vector<std::string>& outputs,
+                 float start_time, float end_time);
+
+  void render(WGPUCommandEncoder encoder,
+              const UniformsSequenceParams& params,
+              NodeRegistry& nodes) override;
+
+ private:
+  RenderPipeline pipeline_;
+  BindGroup bind_group_;
+};
diff --git a/cnn_v3/tools/index.html b/cnn_v3/tools/index.html
index 8494fef..1398ca5 100644
--- a/cnn_v3/tools/index.html
+++ b/cnn_v3/tools/index.html
@@ -64,6 +64,7 @@ video{display:none}
 <div class="left">
 <input type="file" id="wFile" accept=".bin" style="display:none">
 <input type="file" id="fFile" accept=".bin" style="display:none">
+<input type="file" id="sFile" webkitdirectory style="display:none" onchange="tester.loadSampleDir(this.files)">
 <div class="dz" id="wDrop" onclick="document.getElementById('wFile').click()">Drop cnn_v3_weights.bin</div>
 <div class="dz" id="fDrop" onclick="document.getElementById('fFile').click()">Drop cnn_v3_film_mlp.bin (optional)</div>
@@ -79,6 +80,10 @@ video{display:none}
 <div id="fullHelp" style="display:none;margin-top:6px;font-size:9px;color:#555;line-height:1.6">
 Drop PNGs: *albedo*/color · *normal* · *depth* · *matid*/index · *shadow* · *transp*/alpha
 </div>
+<div style="margin-top:8px;border-top:1px solid #333;padding-top:8px">
+<button onclick="document.getElementById('sFile').click()" style="width:100%">↑ Load sample directory</button>
+<div id="sampleSt" style="font-size:9px;color:#555;margin-top:3px"></div>
+</div>
 </div>
 </div>
@@ -121,7 +126,17 @@ video{display:none}
 <div class="sep"></div>
 <button onclick="tester.savePNG()">Save PNG</button>
 </div>
-<canvas id="canvas"></canvas>
+<div style="display:flex;gap:12px;align-items:flex-start">
+<div style="display:flex;flex-direction:column;align-items:center;gap:3px">
+<canvas id="canvas"></canvas>
+<span id="cnnLabel" style="font-size:9px;color:#555"></span>
+</div>
+<div id="targetPane" style="display:none;flex-direction:column;align-items:center;gap:3px">
+<canvas id="targetCanvas" style="max-width:100%;max-height:100%;image-rendering:pixelated;box-shadow:0 4px 12px rgba(0,0,0,.5)"></canvas>
+<span style="font-size:9px;color:#555">target.png</span>
+<span id="psnrSt" style="font-size:9px;color:#4a9eff"></span>
+</div>
+</div>
 </div>
 
 <div class="right">
diff --git a/cnn_v3/tools/shaders.js b/cnn_v3/tools/shaders.js
index c3e994d..d5b1fb4 100644
--- a/cnn_v3/tools/shaders.js
+++ b/cnn_v3/tools/shaders.js
@@ -250,3 +250,51 @@ const VIZ_U32=`
 var v=array<f32,8>(a.x,a.y,b.x,b.y,c.x,c.y,d.x,d.y);
 let x=clamp(v[min(ch,7u)],0.,1.); return vec4f(x,x,x,1.); }`;
+
+// Full G-buffer pack: assembles feat_tex0/feat_tex1 from individual G-buffer images.
+// Bindings: albedo(0) normal(1) depth(2) matid(3) shadow(4) transp(5) sampler(6) f0(7) f1(8)
+// All source textures are rgba8unorm (browser-loaded images, R channel for depth/matid/shadow/transp).
+// Matches gbuf_pack.wgsl packing exactly so the CNN sees the same layout.
+const FULL_PACK_SHADER=`
+@group(0) @binding(0) var albedo: texture_2d<f32>;
+@group(0) @binding(1) var normal: texture_2d<f32>;
+@group(0) @binding(2) var depth: texture_2d<f32>;
+@group(0) @binding(3) var matid: texture_2d<f32>;
+@group(0) @binding(4) var shadow: texture_2d<f32>;
+@group(0) @binding(5) var transp: texture_2d<f32>;
+@group(0) @binding(6) var smp: sampler;
+@group(0) @binding(7) var f0: texture_storage_2d<rgba32uint,write>;
+@group(0) @binding(8) var f1: texture_storage_2d<rgba32uint,write>;
+fn ld(c:vec2i,d:vec2i)->f32{return textureLoad(depth,clamp(c,vec2i(0),d-vec2i(1)),0).r;}
+fn b2(tl:vec2i,d:vec2i)->vec3f{
+  var s=vec3f(0.);
+  for(var y:i32=0;y<2;y++){for(var x:i32=0;x<2;x++){s+=textureLoad(albedo,clamp(tl+vec2i(x,y),vec2i(0),d-vec2i(1)),0).rgb;}}
+  return s*.25;}
+fn b4(tl:vec2i,d:vec2i)->vec3f{
+  var s=vec3f(0.);
+  for(var y:i32=0;y<4;y++){for(var x:i32=0;x<4;x++){s+=textureLoad(albedo,clamp(tl+vec2i(x,y),vec2i(0),d-vec2i(1)),0).rgb;}}
+  return s*(1./16.);}
+@compute @workgroup_size(8,8)
+fn main(@builtin(global_invocation_id) id:vec3u){
+  let c=vec2i(id.xy); let d=vec2i(textureDimensions(albedo));
+  if(c.x>=d.x||c.y>=d.y){return;}
+  let alb=textureLoad(albedo,c,0).rgb;
+  let nrm=textureLoad(normal,c,0).rg;
+  let oct=nrm*2.-vec2f(1.); // [0,1] -> [-1,1]
+  let dv=ld(c,d);
+  let dzdx=(ld(c+vec2i(1,0),d)-ld(c-vec2i(1,0),d))*.5;
+  let dzdy=(ld(c+vec2i(0,1),d)-ld(c-vec2i(0,1),d))*.5;
+  textureStore(f0,c,vec4u(
+    pack2x16float(alb.rg),
+    pack2x16float(vec2f(alb.b,oct.x)),
+    pack2x16float(vec2f(oct.y,dv)),
+    pack2x16float(vec2f(dzdx,dzdy))));
+  let mid=textureLoad(matid,c,0).r;
+  let shd=textureLoad(shadow,c,0).r;
+  let trp=textureLoad(transp,c,0).r;
+  let m1=b2(c-vec2i(0),d); let m2=b4(c-vec2i(1),d);
+  textureStore(f1,c,vec4u(
+    pack4x8unorm(vec4f(mid,0.,0.,0.)),
+    pack4x8unorm(vec4f(m1.r,m1.g,m1.b,m2.r)),
+    pack4x8unorm(vec4f(m2.g,m2.b,shd,trp)),
+    0u));}`;
diff --git a/cnn_v3/tools/tester.js b/cnn_v3/tools/tester.js
index f056444..c1faec9 100644
--- a/cnn_v3/tools/tester.js
+++ b/cnn_v3/tools/tester.js
@@ -13,6 +13,7 @@ class CNNv3Tester {
   this.image   = null;
   this.isVideo = false;
   this.viewMode= 0; // 0=cnn 1=orig 2=diff
+  this.targetBitmap = null; // set when a sample dir with target.png is loaded
   this.blend   = 1.0;
   this.layerTextures = {};
   this.lastResult = null;
@@ -525,6 +526,282 @@ class CNNv3Tester {
     return(s?-1:1)*Math.pow(2,e-15)*(1+m/1024);};
     return [f(lo),f(hi)];
   }
+
+  // ── Full G-buffer pack pipeline ───────────────────────────────────────────
+
+  getFullPack() {
+    return this.pl('fullpack', () => this.computePL(FULL_PACK_SHADER, 'main'));
+  }
+
+  // Create a 1×1 rgba8unorm fallback texture with given RGBA bytes [0-255].
+  makeFallbackTex(r, g, b, a) {
+    const tex = this.device.createTexture({size:[1,1], format:'rgba8unorm',
+      usage: GPUTextureUsage.TEXTURE_BINDING|GPUTextureUsage.COPY_DST});
+    this.device.queue.writeTexture({texture:tex}, new Uint8Array([r,g,b,a]),
+      {bytesPerRow:4,rowsPerImage:1}, [1,1]);
+    return tex;
+  }
+
+  // Load an image File as a GPU rgba8unorm texture. Returns {tex, w, h}.
+  async loadGpuTex(file) {
+    const bmp = await createImageBitmap(file);
+    const w = bmp.width, h = bmp.height;
+    const tex = this.device.createTexture({size:[w,h], format:'rgba8unorm',
+      usage: GPUTextureUsage.TEXTURE_BINDING|GPUTextureUsage.COPY_DST|GPUTextureUsage.RENDER_ATTACHMENT});
+    this.device.queue.copyExternalImageToTexture({source:bmp}, {texture:tex}, [w,h]);
+    bmp.close();
+    return {tex, w, h};
+  }
+
+  // ── Load sample directory ─────────────────────────────────────────────────
+
+  async loadSampleDir(files) {
+    if (!files || files.length === 0) return;
+    if (!this.weightsU32) { this.setStatus('Load weights first', true); return; }
+
+    this.setMode('full');
+    const st = document.getElementById('sampleSt');
+    st.textContent = 'Loading…';
+
+    // Match files by name pattern
+    const match = (pat) => {
+      for (const f of files) {
+        const n = f.name.toLowerCase();
+        if (pat.some(p => n.includes(p))) return f;
+      }
+      return null;
+    };
+
+    const fAlbedo = match(['albedo', 'color']);
+    const fNormal = match(['normal', 'nrm']);
+    const fDepth  = match(['depth']);
+    const fMatid  = match(['matid', 'index', 'mat_id']);
+    const fShadow = match(['shadow']);
+    const fTransp = match(['transp', 'alpha']);
+    const fTarget = match(['target', 'output', 'ground_truth']);
+
+    if (!fAlbedo) {
+      st.textContent = '✗ No albedo.png found';
+      this.setStatus('No albedo.png in sample dir', true);
+      return;
+    }
+
+    try {
+      const t0 = performance.now();
+
+      // Load primary albedo to get dimensions
+      const {tex: albTex, w, h} = await this.loadGpuTex(fAlbedo);
+      this.canvas.width = w; this.canvas.height = h;
+      this.context.configure({device:this.device, format:this.format});
+
+      // Load optional channels — fall back to neutral 1×1 textures
+      const nrmTex = fNormal ? (await this.loadGpuTex(fNormal)).tex
+                             : this.makeFallbackTex(128, 128, 0, 255); // oct-encoded (0,0) normal
+      const dptTex = fDepth  ? (await this.loadGpuTex(fDepth)).tex
+                             : this.makeFallbackTex(0, 0, 0, 255);
+      const midTex = fMatid  ? (await this.loadGpuTex(fMatid)).tex
+                             : this.makeFallbackTex(0, 0, 0, 255);
+      const shdTex = fShadow ? (await this.loadGpuTex(fShadow)).tex
+                             : this.makeFallbackTex(255, 255, 255, 255); // fully lit
+      const trpTex = fTransp ? (await this.loadGpuTex(fTransp)).tex
+                             : this.makeFallbackTex(0, 0, 0, 255); // fully opaque
+
+      // Load target if present
+      if (this.targetBitmap) { this.targetBitmap.close(); this.targetBitmap = null; }
+      if (fTarget) {
+        this.targetBitmap = await createImageBitmap(fTarget);
+        this.showTarget();
+      } else {
+        document.getElementById('targetPane').style.display = 'none';
+      }
+
+      // Pack G-buffer into feat0/feat1
+      const mk = (fmt, tw, th) => this.device.createTexture({size:[tw,th], format:fmt,
+        usage:GPUTextureUsage.STORAGE_BINDING|GPUTextureUsage.TEXTURE_BINDING|GPUTextureUsage.COPY_SRC});
+      const f0 = mk('rgba32uint', w, h);
+      const f1 = mk('rgba32uint', w, h);
+
+      const ceil8 = (n) => Math.ceil(n/8);
+      const pl = this.getFullPack();
+      const bg = this.device.createBindGroup({layout: pl.getBindGroupLayout(0),
+        entries: [
+          {binding:0, resource: albTex.createView()},
+          {binding:1, resource: nrmTex.createView()},
+          {binding:2, resource: dptTex.createView()},
+          {binding:3, resource: midTex.createView()},
+          {binding:4, resource: shdTex.createView()},
+          {binding:5, resource: trpTex.createView()},
+          {binding:6, resource: this.linearSampler},
+          {binding:7, resource: f0.createView()},
+          {binding:8, resource: f1.createView()},
+        ]});
+
+      const enc = this.device.createCommandEncoder();
+      const cp = enc.beginComputePass();
+      cp.setPipeline(pl); cp.setBindGroup(0, bg);
+      cp.dispatchWorkgroups(ceil8(w), ceil8(h));
+      cp.end();
+      this.device.queue.submit([enc.finish()]);
+      await this.device.queue.onSubmittedWorkDone();
+
+      // Cleanup source textures
+      [albTex, nrmTex, dptTex, midTex, shdTex, trpTex].forEach(t => t.destroy());
+
+      const found = [fAlbedo, fNormal, fDepth, fMatid, fShadow, fTransp]
+        .filter(Boolean).map(f => f.name).join(', ');
+      st.textContent = `✓ ${found}`;
+      this.log(`Sample packed: ${w}×${h}, ${((performance.now()-t0)).toFixed(0)}ms`);
+
+      // Run inference from packed feat textures
+      await this.runFromFeat(f0, f1, w, h);
+      f0.destroy(); f1.destroy();
+
+    } catch(e) {
+      st.textContent = `✗ ${e.message}`;
+      this.setStatus(`Sample error: ${e.message}`, true);
+      this.log(`Sample error: ${e.message}`, 'err');
+    }
+  }
+
+  // Show target.png in the #targetPane alongside main canvas.
+  showTarget() {
+    if (!this.targetBitmap) return;
+    const tc = document.getElementById('targetCanvas');
+    tc.width = this.targetBitmap.width;
+    tc.height = this.targetBitmap.height;
+    const ctx2d = tc.getContext('2d');
+    ctx2d.drawImage(this.targetBitmap, 0, 0);
+    document.getElementById('targetPane').style.display = 'flex';
+  }
+
+  // Run CNN inference starting from pre-packed feat_tex0 / feat_tex1.
+  // Used by loadSampleDir() to skip the photo-pack step.
+  async runFromFeat(f0, f1, w, h) {
+    if (!this.weightsU32 || !this.device) return;
+    const t0 = performance.now();
+    const W2=w>>1, H2=h>>1, W4=W2>>1, H4=H2>>1;
+
+    this.context.configure({device:this.device, format:this.format});
+
+    // Create a neutral "original" texture so the display shader can still
+    // render Orig/Diff modes (just black for sample mode).
+    if (this.inputTex) this.inputTex.destroy();
+    this.inputTex = this.device.createTexture({size:[w,h], format:'rgba8unorm',
+      usage:GPUTextureUsage.TEXTURE_BINDING|GPUTextureUsage.COPY_DST|GPUTextureUsage.RENDER_ATTACHMENT});
+    // Leave it cleared to black — Diff mode against target would need more work
+
+    const mk = (fmt, tw, th) => this.device.createTexture({size:[tw,th], format:fmt,
+      usage:GPUTextureUsage.STORAGE_BINDING|GPUTextureUsage.TEXTURE_BINDING|GPUTextureUsage.COPY_SRC});
+    const e0=mk('rgba16float',w,h), e1=mk('rgba32uint',W2,H2);
+    const bn=mk('rgba32uint',W4,H4), d1=mk('rgba16float',W2,H2), ot=mk('rgba16float',w,h);
+
+    if (!this.weightsGPU) {
+      this.weightsGPU = this.device.createBuffer({size:this.weightsBuffer.byteLength,
+        usage:GPUBufferUsage.STORAGE|GPUBufferUsage.COPY_DST});
+      this.device.queue.writeBuffer(this.weightsGPU, 0, this.weightsBuffer);
+    }
+    const wg = this.weightsGPU;
+    const fp = this.filmParams();
+    const wu = (data) => {
+      const b = this.device.createBuffer({size:data.byteLength, usage:GPUBufferUsage.UNIFORM|GPUBufferUsage.COPY_DST});
+      this.device.queue.writeBuffer(b, 0, data); return b;
+    };
+    const uE0=wu(this.u4(ENC0_OFF,fp.ge0,fp.be0));
+    const uE1=wu(this.u8(ENC1_OFF,fp.ge1,fp.be1));
+    const uBN=wu(this.ubn(BN_OFF));
+    const uD1=wu(this.u4(DEC1_OFF,fp.gd1,fp.bd1));
+    const uD0=wu(this.u4(DEC0_OFF,fp.gd0,fp.bd0));
+    const dispData=new ArrayBuffer(16);
+    new DataView(dispData).setFloat32(4, this.blend, true);
+    const uDp=wu(dispData);
+
+    const enc = this.device.createCommandEncoder();
+    const bg = (pl,...entries) => this.device.createBindGroup({layout:pl.getBindGroupLayout(0),
+      entries:entries.map((r,i)=>({binding:i,resource:r}))});
+    const rv = (t) => t.createView();
+    const cp = (pl,bgr,wx,wy) => {const p=enc.beginComputePass();p.setPipeline(pl);p.setBindGroup(0,bgr);p.dispatchWorkgroups(wx,wy);p.end();};
+    const ceil8 = (n) => Math.ceil(n/8);
+
+    cp(this.getEnc0(), bg(this.getEnc0(), rv(f0),rv(f1),{buffer:wg},{buffer:uE0},rv(e0)), ceil8(w), ceil8(h));
+    cp(this.getEnc1(), bg(this.getEnc1(), rv(e0),{buffer:wg},{buffer:uE1},rv(e1)), ceil8(W2), ceil8(H2));
+    cp(this.getBN(),   bg(this.getBN(),   rv(e1),{buffer:wg},{buffer:uBN},rv(bn)), ceil8(W4), ceil8(H4));
+    cp(this.getDec1(), bg(this.getDec1(), rv(bn),rv(e1),{buffer:wg},{buffer:uD1},rv(d1)), ceil8(W2), ceil8(H2));
+    cp(this.getDec0(), bg(this.getDec0(), rv(d1),rv(e0),{buffer:wg},{buffer:uD0},rv(ot)), ceil8(w), ceil8(h));
+
+    const dbg = bg(this.getDisp(), rv(ot), rv(this.inputTex), {buffer:uDp});
+    const rp = enc.beginRenderPass({colorAttachments:[{
+      view:this.context.getCurrentTexture().createView(), loadOp:'clear', storeOp:'store'}]});
+    rp.setPipeline(this.getDisp()); rp.setBindGroup(0, dbg); rp.draw(6); rp.end();
+
+    this.device.queue.submit([enc.finish()]);
+    await this.device.queue.onSubmittedWorkDone();
+
+    [uE0,uE1,uBN,uD1,uD0].forEach(b => b.destroy());
+
+    // Compute PSNR against target if available
+    let psnrStr = '';
+    if (this.targetBitmap) {
+      this.showTarget();
+      try { psnrStr = await this.computePSNR(ot, w, h); } catch(_) {}
+    }
+
+    this.destroyLayerTex();
+    this.layerTextures = {feat0:f0, feat1:f1, enc0:e0, enc1:e1, bn, dec1:d1, output:ot};
+    this.lastResult = {ot, itex:this.inputTex, uDp, dispPL:this.getDisp(), w, h};
+    this.updateVizPanel();
+
+    const ms = (performance.now()-t0).toFixed(1);
+    document.getElementById('cnnLabel').textContent = `CNN output (${ms}ms)`;
+    if (psnrStr) document.getElementById('psnrSt').textContent = psnrStr;
+    this.setStatus(`Sample: ${ms}ms · ${w}×${h}`);
+    this.log(`runFromFeat: ${ms}ms`);
+  }
+
+  // Compute PSNR between CNN rgba16float output texture and target.png bitmap.
+  async computePSNR(outTex, w, h) {
+    const bpr = Math.ceil(w * 8 / 256) * 256;
+    const stg = this.device.createBuffer({size:bpr*h,
+      usage:GPUBufferUsage.COPY_DST|GPUBufferUsage.MAP_READ});
+    const enc = this.device.createCommandEncoder();
+    enc.copyTextureToBuffer({texture:outTex}, {buffer:stg, bytesPerRow:bpr, rowsPerImage:h}, [w,h]);
+    this.device.queue.submit([enc.finish()]);
+    await stg.mapAsync(GPUMapMode.READ);
+    const raw = new DataView(stg.getMappedRange());
+
+    // Decode output pixels from f16
+    const f16 = (bits) => {
+      const s=(bits>>15)&1, e=(bits>>10)&0x1F, m=bits&0x3FF;
+      if(e===0) return 0; if(e===31) return s?0:1;
+      return Math.max(0,Math.min(1,(s?-1:1)*Math.pow(2,e-15)*(1+m/1024)));
+    };
+    const cnnPx = new Float32Array(w*h*3);
+    for (let y=0;y<h;y++) for (let x=0;x<w;x++) {
+      const src=y*bpr+x*8, pi=(y*w+x)*3;
+      cnnPx[pi]  = f16(raw.getUint16(src,   true));
+      cnnPx[pi+1]= f16(raw.getUint16(src+2, true));
+      cnnPx[pi+2]= f16(raw.getUint16(src+4, true));
+    }
+    stg.unmap(); stg.destroy();
+
+    // Read target pixels via offscreen canvas
+    const oc = document.createElement('canvas');
+    oc.width = w; oc.height = h;
+    const ctx2d = oc.getContext('2d');
+    ctx2d.drawImage(this.targetBitmap, 0, 0, w, h);
+    const tgtData = ctx2d.getImageData(0, 0, w, h).data;
+
+    let mse = 0;
+    const n = w * h * 3;
+    for (let i=0; i<w*h; i++) {
+      const dr = cnnPx[i*3]   - tgtData[i*4]  /255;
+      const dg = cnnPx[i*3+1] - tgtData[i*4+1]/255;
+      const db = cnnPx[i*3+2] - tgtData[i*4+2]/255;
+      mse += dr*dr + dg*dg + db*db;
+    }
+    mse /= n;
+    const psnr = mse > 0 ? (10 * Math.log10(1 / mse)).toFixed(2) : '∞';
+    return `MSE=${mse.toFixed(5)} PSNR=${psnr}dB`;
+  }
 }
 
 // ── UI helpers ───────────────────────────────────────────────────────────────
diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md
index 7b925f0..072c92f 100644
--- a/doc/COMPLETED.md
+++ b/doc/COMPLETED.md
@@ -36,6 +36,8 @@ Completed task archive. See `doc/archive/` for detailed historical documents.
 ## March 2026
 
+- [x] **CNN v3 Phase 7: Validation tools** — `GBufViewEffect` (C++ 4×5 channel grid, `cnn_v3/shaders/gbuf_view.wgsl`, `cnn_v3/src/gbuf_view_effect.{h,cc}`): renders all 20 G-buffer feature channels tiled on screen; custom BGL with `WGPUTextureSampleType_Uint`, bind group rebuilt per frame via `wgpuRenderPipelineGetBindGroupLayout`. Web tool "Load sample directory" (`cnn_v3/tools/tester.js` + `shaders.js`): `webkitdirectory` picker, `FULL_PACK_SHADER` compute (matches `gbuf_pack.wgsl`), `runFromFeat()` inference, PSNR vs `target.png`. 36/36 tests.
+
 - [x] **CNN v3 Phase 5: Parity validation** — `test_cnn_v3_parity.cc` (2 tests: zero_weights, random_weights). Root cause: intermediate nodes declared at full res instead of W/2, W/4. Fix: `NodeRegistry::default_width()/default_height()` getters + fractional resolution in `declare_nodes()`. Final max_err=4.88e-4 ✓. 36/36 tests.
 - [x] **CNN v3 Phase 4: C++ CNNv3Effect + FiLM uniform upload** — `cnn_v3/src/cnn_v3_effect.{h,cc}`. 5 compute passes (enc0→enc1→bottleneck→dec1→dec0), shared f16 weights buffer, per-pass uniform buffers, `set_film_params()` API. Key fix: WGSL `vec3u` has align=16, so `CnnV3Params4ch`=64B and `CnnV3ParamsEnc1`=96B (not 48/80). Weight offsets as explicit formulas. FiLM γ/β identity defaults; real values await `train_cnn_v3.py`. 35/35 tests.
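The `computePSNR()` helper added to `tester.js` decodes the rgba16float readback by hand and reports `10·log10(1/MSE)` for unit-range pixels. A standalone sketch of those two pieces — the clamped half-float decoder (subnormals flatten to 0, exponent-31 values clamp into [0, 1], mirroring the decoder in the diff) and the PSNR formula — runnable outside the browser:

```javascript
// Clamped IEEE binary16 decode, as used for the staging-buffer readback:
// sign (1 bit), exponent (5 bits, bias 15), mantissa (10 bits).
function decodeF16(bits) {
  const s = (bits >> 15) & 1, e = (bits >> 10) & 0x1F, m = bits & 0x3FF;
  if (e === 0) return 0;            // zero / subnormal -> 0
  if (e === 31) return s ? 0 : 1;   // Inf / NaN -> clamp to display range
  return Math.max(0, Math.min(1, (s ? -1 : 1) * Math.pow(2, e - 15) * (1 + m / 1024)));
}

// PSNR for unit-range data: 10 * log10(MAX^2 / MSE) with MAX = 1.
function psnrFromMSE(mse) {
  return mse > 0 ? 10 * Math.log10(1 / mse) : Infinity;
}

console.log(decodeF16(0x3C00));            // 1   (exponent 15, mantissa 0)
console.log(decodeF16(0x3800));            // 0.5
console.log(psnrFromMSE(0.01).toFixed(2)); // "20.00"
```

This is a sketch of the math only; the real helper also handles the 256-byte-aligned row stride of the GPU staging buffer.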
diff --git a/src/effects/shaders.cc b/src/effects/shaders.cc index f64e135..37b7d09 100644 --- a/src/effects/shaders.cc +++ b/src/effects/shaders.cc @@ -117,6 +117,7 @@ const char* ntsc_rgb_shader_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_NTSC_RGB); const char* ntsc_yiq_shader_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_NTSC_YIQ); const char* gbuf_raster_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_GBUF_RASTER); const char* gbuf_pack_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_GBUF_PACK); +const char* gbuf_view_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_GBUF_VIEW); const char* cnn_v3_enc0_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_CNN_V3_ENC0); const char* cnn_v3_enc1_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_CNN_V3_ENC1); const char* cnn_v3_bottleneck_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_CNN_V3_BOTTLENECK); diff --git a/src/effects/shaders.h b/src/effects/shaders.h index 4a77597..de5dda4 100644 --- a/src/effects/shaders.h +++ b/src/effects/shaders.h @@ -23,6 +23,7 @@ extern const char* ntsc_yiq_shader_wgsl; // CNN v3 G-buffer shaders extern const char* gbuf_raster_wgsl; extern const char* gbuf_pack_wgsl; +extern const char* gbuf_view_wgsl; // CNN v3 inference shaders extern const char* cnn_v3_enc0_wgsl; diff --git a/src/gpu/demo_effects.h b/src/gpu/demo_effects.h index 66b920c..edb4a23 100644 --- a/src/gpu/demo_effects.h +++ b/src/gpu/demo_effects.h @@ -32,9 +32,10 @@ #include "effects/scratch_effect.h" #include "effects/ntsc_effect.h" -// CNN v3 G-buffer + inference +// CNN v3 G-buffer + inference + debug view #include "../../cnn_v3/src/gbuffer_effect.h" #include "../../cnn_v3/src/cnn_v3_effect.h" +#include "../../cnn_v3/src/gbuf_view_effect.h" // TODO: Port CNN effects // #include "../../cnn_v1/src/cnn_v1_effect.h" diff --git a/src/tests/gpu/test_demo_effects.cc b/src/tests/gpu/test_demo_effects.cc index f5af5a9..d6e9c8a 100644 --- a/src/tests/gpu/test_demo_effects.cc +++ b/src/tests/gpu/test_demo_effects.cc @@ -89,6 +89,11 @@ static void test_effects() { 
         fixture.ctx(),
         std::vector<std::string>{"gbuf_feat0", "gbuf_feat1"},
         std::vector<std::string>{"cnn_v3_output"}, 0.0f, 1000.0f)},
+    {"GBufViewEffect",
+     std::make_shared<GBufViewEffect>(
+         fixture.ctx(),
+         std::vector<std::string>{"gbuf_feat0", "gbuf_feat1"},
+         std::vector<std::string>{"gbuf_view_out"}, 0.0f, 1000.0f)},
   };
 
   int passed = 0;
diff --git a/workspaces/main/assets.txt b/workspaces/main/assets.txt
index 0af8b7b..f22552b 100644
--- a/workspaces/main/assets.txt
+++ b/workspaces/main/assets.txt
@@ -101,6 +101,7 @@ SHADER_RENDER_NTSC_COMMON, WGSL, ../../src/shaders/render/ntsc_common.wgsl, "NTS
 # --- CNN v3 G-Buffer ---
 SHADER_GBUF_RASTER, WGSL, ../../cnn_v3/shaders/gbuf_raster.wgsl, "CNN v3 G-buffer MRT rasterization shader"
 SHADER_GBUF_PACK, WGSL, ../../cnn_v3/shaders/gbuf_pack.wgsl, "CNN v3 G-buffer feature pack compute shader"
+SHADER_GBUF_VIEW, WGSL, ../../cnn_v3/shaders/gbuf_view.wgsl, "CNN v3 G-buffer channel visualization (4x5 grid)"
 
 # --- CNN v3 Inference ---
 SHADER_CNN_V3_COMMON, WGSL, ../../cnn_v3/shaders/cnn_v3_common.wgsl, "CNN v3 shared helpers snippet (get_w, unpack_8ch)"
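The staging-buffer copy in `computePSNR()` pads `bytesPerRow` to WebGPU's required 256-byte multiple (`Math.ceil(w * 8 / 256) * 256` for 8-byte rgba16float texels), which is why the pixel loop indexes rows by `y*bpr` rather than `y*w*8`. The padding rule in isolation:

```javascript
// WebGPU copyTextureToBuffer requires bytesPerRow to be a multiple of 256.
// Round the tight row size up to the next 256-byte boundary.
function alignedBytesPerRow(width, bytesPerTexel) {
  return Math.ceil(width * bytesPerTexel / 256) * 256;
}

console.log(alignedBytesPerRow(640, 8)); // 5120 (640*8 is already aligned)
console.log(alignedBytesPerRow(100, 8)); // 1024 (800 padded up to 4*256)
```

Forgetting this padding when reading back a mapped buffer produces sheared images, since each row of `w*8` tight bytes is followed by padding up to the aligned stride.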

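The directory loader in `tester.js` matches G-buffer channels by filename substring rather than exact name, so `Albedo.png` and `albedo_0001.png` both resolve to the albedo slot. A standalone sketch of that matcher (`matchFile` is a hypothetical free-function version of the inline `match` closure in `loadSampleDir()`):

```javascript
// First file whose lowercased name contains any of the given substrings wins;
// null means the channel is absent and a neutral fallback texture is used.
function matchFile(files, patterns) {
  for (const f of files) {
    const n = f.name.toLowerCase();
    if (patterns.some(p => n.includes(p))) return f;
  }
  return null;
}

const files = [{name: 'Albedo.png'}, {name: 'depth.png'}, {name: 'mat_id.png'}];
console.log(matchFile(files, ['albedo', 'color']).name);        // "Albedo.png"
console.log(matchFile(files, ['matid', 'index', 'mat_id']).name); // "mat_id.png"
console.log(matchFile(files, ['shadow']));                       // null
```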