# CNN v3 How-To

Practical playbook for the CNN v3 pipeline: G-buffer effect, training data,
training the U-Net+FiLM network, and wiring everything into the demo.

See `CNN_V3.md` for the full architecture design.

---

## 1. Using GBufferEffect in the Demo

`GBufferEffect` is a full-class effect (Path B in `doc/EFFECT_WORKFLOW.md`).
It rasterizes proxy geometry to MRT G-buffer textures and packs them into two
`rgba32uint` feature textures (`feat_tex0`, `feat_tex1`) consumed by the CNN.

### Registration (already done)

- Shaders in `assets.txt`: `SHADER_GBUF_RASTER`, `SHADER_GBUF_PACK`
- Source in `cmake/DemoSourceLists.cmake`: `cnn_v3/src/gbuffer_effect.cc`
- Header included in `src/gpu/demo_effects.h`
- Test in `src/tests/gpu/test_demo_effects.cc`

### Adding to a Sequence

Both `GBufferEffect` and `GBufViewEffect` are registered in `seq_compiler.py`
(`CLASS_TO_HEADER`) and can be wired directly in `timeline.seq`.

**Debug view (G-buffer → sink)**:
```seq
SEQUENCE 12.00 0 "cnn_v3_test"
  NODE gbuf_feat0 gbuf_rgba32uint
  NODE gbuf_feat1 gbuf_rgba32uint
  EFFECT + GBufferEffect source -> gbuf_feat0 gbuf_feat1 0.00 8.00
  EFFECT + GBufViewEffect gbuf_feat0 gbuf_feat1 -> sink 0.00 8.00
```

**Full CNN pipeline**:
```seq
SEQUENCE 12.00 0 "cnn_v3_test"
  NODE gbuf_feat0 gbuf_rgba32uint
  NODE gbuf_feat1 gbuf_rgba32uint
  NODE cnn_v3_out gbuf_albedo
  EFFECT + GBufferEffect source -> gbuf_feat0 gbuf_feat1 0.00 8.00
  EFFECT + CNNv3Effect gbuf_feat0 gbuf_feat1 -> cnn_v3_out 0.00 8.00
  EFFECT + Passthrough cnn_v3_out -> sink 0.00 8.00
```

### Internal scene

Call `set_scene()` once before the first render to populate the built-in demo
scene. No external `Scene` or `Camera` pointer is required — the effect owns
them.

**What `set_scene()` creates:**
- **20 small cubes** — random positions in [-2,2]×[-1.5,1.5]³, scale 0.1–0.25,
  random colors. Each has a random rotation axis and speed; animated each frame
  via `quat::from_axis(axis, time * speed)`.
- **4 pumping spheres** — at fixed world positions, base radii 0.25–0.35.
  Scale driven by `audio_intensity`: `r = base_r * (1 + audio_intensity * 0.8)`.
- **Camera** — position (0, 2.5, 6), target (0, 0, 0), 45° FOV.
  Aspect ratio updated each frame from `params.aspect_ratio`.
- **Two directional lights** (uploaded to `lights_uniform_`, ready for shadow pass):
  - Key: warm white (1.0, 0.92, 0.78), direction `normalize(1, 2, 1)` (upper-right-front)
  - Fill: cool blue (0.4, 0.45, 0.8) × 0.4, direction `normalize(-1, 1, -1)` (upper-left-back)
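
The two animation rules above can be sketched in a few lines of Python. This is an illustration only, not the effect's C++ code: `quat_from_axis` is a hypothetical stand-in that assumes `quat::from_axis` follows the usual axis-angle convention with (w, x, y, z) ordering, and `sphere_radius` restates the radius formula.

```python
import math

def quat_from_axis(axis, angle):
    """Axis-angle to unit quaternion (w, x, y, z); the axis is normalized here."""
    ax, ay, az = axis
    n = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle * 0.5) / n
    return (math.cos(angle * 0.5), ax * s, ay * s, az * s)

def sphere_radius(base_r, audio_intensity):
    """r = base_r * (1 + audio_intensity * 0.8), as stated above."""
    return base_r * (1.0 + audio_intensity * 0.8)

# A cube spinning about +Y at 0.5 rad/s, evaluated at time = 2 s.
q = quat_from_axis((0.0, 1.0, 0.0), 2.0 * 0.5)
print(q, sphere_radius(0.25, 1.0))
```

With `audio_intensity` in [0, 1], a sphere grows to at most 1.8× its base radius.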

### Internal passes

Each frame, `GBufferEffect::render()` executes:

1. **Pass 1 — MRT rasterization** (`gbuf_raster.wgsl`) ✅
   - Proxy box (36 verts) × N objects, instanced
   - MRT outputs: `gbuf_albedo` (rgba16float), `gbuf_normal_mat` (rgba16float)
   - Depth test + write into `gbuf_depth` (depth32float)
   - `obj.type` written to `ObjectData.params.x` for future SDF branching

2. **Pass 2 — SDF shadow raymarching** (`gbuf_shadow.wgsl`) ✅
   - See implementation plan below.

3. **Pass 3 — Transparency** — TODO (deferred; transp=0 for opaque scenes)

4. **Pass 4 — Pack compute** (`gbuf_pack.wgsl`) ✅
   - Reads all G-buffer textures + `prev_cnn` input
   - Writes `feat_tex0` + `feat_tex1` (rgba32uint, 20 channels, 32 bytes/pixel)
   - Transp node cleared to 0.0 via a zero-draw render pass until Pass 3 is
     implemented (the shadow node used the same placeholder, cleared to 1.0,
     before Pass 2 landed).

### Output node names

Outputs are named from the `outputs` vector passed to the constructor:

```
outputs[0]  → feat_tex0   (rgba32uint: albedo.rgb, normal.xy, depth, depth_grad.xy)
outputs[1]  → feat_tex1   (rgba32uint: mat_id, prev.rgb, mip1.rgb, mip2.rgb, shadow, transp)
```

---

## 1b. GBufferEffect — Implementation Plan (Pass 2: SDF Shadow)

### What remains

| Item | Status | Notes |
|------|--------|-------|
| Pass 1: MRT raster | ✅ Done | proxy box, all object types |
| Pass 4: Pack compute | ✅ Done | 20 channels packed |
| Internal scene + animation | ✅ Done | cubes + spheres + 2 lights |
| Pass 2: SDF shadow | ✅ Done | `gbuf_shadow.wgsl`, proxy-box SDF per object |
| Pass 3: Transparency | ❌ TODO | low priority, opaque scenes only |
| Phase 4: type-aware SDF | ✅ Done | switch on `obj.params.x` in `dfWithID` |

### Pass 2: SDF shadow raymarching

**New file: `cnn_v3/shaders/gbuf_shadow.wgsl`** — fullscreen render pass.

Bind layout:

| Binding | Type | Content |
|---------|------|---------|
| 0 | `uniform` | `GlobalUniforms` (`#include "common_uniforms"`) |
| 1 | `storage read` | `ObjectsBuffer` |
| 2 | `texture_depth_2d` | depth from Pass 1 |
| 3 | `sampler` (non-filtering) | depth load |
| 4 | `uniform` | `GBufLightsUniforms` (2 lights) |

Algorithm per fragment:
1. Reconstruct world position from NDC depth + `globals.inv_view_proj`
2. For each object: `sdBox((inv_model * world_pos).xyz, vec3(1.0))` — proxy box in local space
3. For each light: offset ray origin by `0.02 * surface_normal`; march shadow ray toward `light.direction`
4. Soft shadow via `shadowWithStoredDistance()` from `render/raymarching_id`
5. Combine lights: `shadow = min(shadow_light0, shadow_light1)`
6. Discard fragments where depth == 1.0 (sky/background → shadow = 1.0)
7. Output shadow factor to RGBA8Unorm render target (`.r` = shadow)
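
Steps 2 to 5 can be sketched on the CPU as follows. This is a minimal Python sketch under stated assumptions: `soft_shadow` is a generic penumbra-estimating sphere tracer, not the project's `shadowWithStoredDistance()`, and the step count, `k`, and march limits are illustrative values. The scene is a single unit box at the origin, so local space equals world space.

```python
import math

def sd_box(p, b):
    """Signed distance from point p to an axis-aligned box with half-extents b."""
    q = [abs(p[i]) - b[i] for i in range(3)]
    outside = math.sqrt(sum(max(c, 0.0) ** 2 for c in q))
    inside = min(max(q), 0.0)
    return outside + inside

def soft_shadow(origin, light_dir, scene_sdf, k=8.0, t_min=0.02, t_max=20.0, steps=64):
    """March from origin toward the light; 0.0 = fully shadowed, 1.0 = fully lit."""
    res, t = 1.0, t_min
    for _ in range(steps):
        p = [origin[i] + light_dir[i] * t for i in range(3)]
        d = scene_sdf(p)
        if d < 1e-4:                 # the ray hit an occluder
            return 0.0
        res = min(res, k * d / t)    # penumbra estimate from the closest miss
        t += d
        if t > t_max:
            break
    return max(0.0, min(1.0, res))

def shadow_factor(world_pos, normal, light_dirs, scene_sdf):
    """Offset the origin along the normal, march once per light, keep the minimum."""
    origin = [world_pos[i] + 0.02 * normal[i] for i in range(3)]
    return min(soft_shadow(origin, d, scene_sdf) for d in light_dirs)

unit_box = lambda p: sd_box(p, (1.0, 1.0, 1.0))
up = (0.0, 1.0, 0.0)
below = shadow_factor((0.0, -3.0, 0.0), up, [up], unit_box)  # box blocks the light
above = shadow_factor((0.0, 3.0, 0.0), up, [up], unit_box)   # nothing in the way
print(below, above)
```

The normal offset keeps the first march step from immediately re-hitting the surface the fragment lies on.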

**C++ additions (`gbuffer_effect.h/.cc`):**
```cpp
RenderPipeline shadow_pipeline_;
void create_shadow_pipeline();
```
In `render()` between Pass 1 and the shadow/transp node clears:
- Build bind group (global_uniforms_buf_, objects_buf_, depth_view, sampler_, lights_uniform_)
- Run fullscreen triangle → `node_shadow_` color attachment
- Remove the `clear_node(node_shadow_, 1.0f)` placeholder once the pass is live

**Register:**
- `cnn_v3/shaders/gbuf_shadow.wgsl` → `SHADER_GBUF_SHADOW` in `assets.txt`
- `extern const char* gbuf_shadow_wgsl;` in `gbuffer_effect.cc`

### Phase 4: Object-type-aware SDF (optional)

Branch on `obj.params.x` (populated by Pass 1 from `obj.type`) using `math/sdf_shapes`:

| Type value | ObjectType | SDF |
|------------|-----------|-----|
| 0 | CUBE | `sdBox(local_p, vec3(1))` |
| 1 | SPHERE | `sdSphere(local_p, 1.0)` |
| 2 | PLANE | `sdPlane(local_p, vec3(0,1,0), obj.params.y)` |
| 3 | TORUS | `sdTorus(local_p, vec2(0.8, 0.2))` |

Only worth adding after Pass 2 is validated visually.
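
The type dispatch in the table can be illustrated with a small Python sketch. The distance functions below are the common textbook forms and are assumed to match `math/sdf_shapes`. In particular, the `sd_plane` sign convention and the fallback to the box for unknown type values are assumptions, not confirmed behavior.

```python
import math

def sd_box(p, b):
    q = [abs(p[i]) - b[i] for i in range(3)]
    return math.sqrt(sum(max(c, 0.0) ** 2 for c in q)) + min(max(q), 0.0)

def sd_sphere(p, r):
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - r

def sd_plane(p, n, h):
    # Assumed convention: unit normal n, offset h, distance = dot(p, n) + h.
    return p[0] * n[0] + p[1] * n[1] + p[2] * n[2] + h

def sd_torus(p, t):
    # t = (major radius, minor radius); torus lies in the XZ plane.
    qx = math.hypot(p[0], p[2]) - t[0]
    return math.hypot(qx, p[1]) - t[1]

def sd_object(obj_type, local_p, plane_h=0.0):
    """Dispatch on the type value from the table (0=CUBE, 1=SPHERE, 2=PLANE, 3=TORUS)."""
    if obj_type == 1:
        return sd_sphere(local_p, 1.0)
    if obj_type == 2:
        return sd_plane(local_p, (0.0, 1.0, 0.0), plane_h)
    if obj_type == 3:
        return sd_torus(local_p, (0.8, 0.2))
    return sd_box(local_p, (1.0, 1.0, 1.0))  # 0 = CUBE; also the assumed fallback

print(sd_object(0, (2.0, 0.0, 0.0)), sd_object(3, (0.8, 0.0, 0.0)))
```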

---

## 2. Preparing Training Data

CNN v3 supports two data sources: Blender renders and real photos.

### 2a. From Blender Renders

Requires **Blender 4.5 LTS** — Blender 5.x compositor does not route per-pass
render data yet (only Combined is exported).

```bash
# macOS: add to ~/.zshrc or run once per shell session
alias blender4="/Applications/Blender_4.5.8_LTS.app/Contents/MacOS/Blender"

# 1. Export G-buffer passes to multilayer EXR
blender4 -b scene.blend -P cnn_v3/training/blender_export.py \
    -- --output /tmp/renders/

# 2. Pack each EXR into a sample directory
for exr in /tmp/renders/*.exr; do
    name=$(basename "${exr%.exr}")
    python3 cnn_v3/training/pack_blender_sample.py \
        --exr "$exr" --output "/tmp/renders/$name/"
done
```

Each sample directory contains:
```
sample_XXXX/
  albedo.png    — RGB uint8 (material color, pre-lighting)
  normal.png    — RG uint8 (oct-encoded XY, remap [0,1])
  depth.png     — R uint16 (1/z normalized, 16-bit)
  matid.png     — R uint8 (object index / 255)
  shadow.png    — R uint8 (0=dark, 255=lit)
  transp.png    — R uint8 (0=opaque, 255=transparent)
  target.png    — RGB/RGBA (stylized ground truth)
```
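
Before training, it can help to check that every sample directory actually contains the files listed above. The helper below is a hypothetical sketch, not part of the repository. It only reports what is missing and makes no claim about which files the dataloader strictly requires.

```python
from pathlib import Path

# File names taken from the listing above.
EXPECTED_FILES = ("albedo.png", "normal.png", "depth.png", "matid.png",
                  "shadow.png", "transp.png", "target.png")

def missing_files(sample_dir):
    """Return the expected file names that are absent from sample_dir."""
    root = Path(sample_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

def check_dataset(dataset_root):
    """Map each incomplete sample_* directory under dataset_root to its missing files."""
    report = {}
    for sample in sorted(Path(dataset_root).rglob("sample_*")):
        if sample.is_dir():
            gaps = missing_files(sample)
            if gaps:
                report[str(sample)] = gaps
    return report
```

For example, `check_dataset("/tmp/renders")` returns an empty dict when every sample is complete.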

### 2b. From Real Photos

Geometric channels are zeroed; the network degrades gracefully due to
channel-dropout training.

```bash
python3 cnn_v3/training/pack_photo_sample.py \
    --photo cnn_v3/training/input/photo1.jpg \
    --output dataset/photos/sample_001/
```

The output `target.png` defaults to the input photo (no style). Copy in
your stylized version as `target.png` before training.

### Dataset layout

```
dataset/
  blender/
    sample_0001/  sample_0002/  ...
  photos/
    sample_001/   sample_002/   ...
```

Mix freely; the dataloader treats all sample directories uniformly.

---

## 3. Training

Two source files:
- **`cnn_v3_utils.py`** — image I/O, feature assembly, channel dropout, salient-point
  detection, `CNNv3Dataset`
- **`train_cnn_v3.py`** — `CNNv3` model, training loop, CLI

### Quick start

```bash
cd cnn_v3/training

# Patch-based (default) — 64×64 patches around Harris corners
python3 train_cnn_v3.py \
    --input dataset/ \
    --input-mode simple \
    --epochs 200

# Full-image mode (resizes to 256×256)
python3 train_cnn_v3.py \
    --input dataset/ \
    --input-mode full \
    --full-image --image-size 256 \
    --epochs 500

# Quick smoke test: 1 epoch, small patches, random detector
python3 train_cnn_v3.py \
    --input dataset/ --epochs 1 \
    --patch-size 32 --detector random
```

### Key flags

| Flag | Default | Notes |
|------|---------|-------|
| `--input DIR` | `training/dataset` | Root with `full/` or `simple/` subdirs |
| `--input-mode` | `simple` | `simple`=photos, `full`=Blender G-buffer |
| `--patch-size N` | `64` | Patch crop size |
| `--patches-per-image N` | `256` | Patches extracted per image per epoch |
| `--detector` | `harris` | `harris` \| `shi-tomasi` \| `fast` \| `gradient` \| `random` |
| `--channel-dropout-p F` | `0.3` | Dropout prob for geometric channels |
| `--full-image` | off | Resize full image instead of cropping patches |
| `--enc-channels C` | `4,8` | Encoder channel counts, comma-separated |
| `--film-cond-dim N` | `5` | FiLM conditioning input size |
| `--epochs N` | `200` | Training epochs |
| `--batch-size N` | `16` | Batch size |
| `--lr F` | `1e-3` | Adam learning rate |
| `--checkpoint-dir DIR` | `checkpoints/` | Where to save `.pth` files |
| `--checkpoint-every N` | `50` | Epoch interval for checkpoints (0=disable) |

### FiLM conditioning during training

- Conditioning vector `[beat_phase, beat_time/8, audio_intensity, style_p0, style_p1]`
  is **randomised per sample** (uniform [0,1]) so the MLP trains jointly with the U-Net.
- At inference, real beat/audio values are fed from `CNNv3Effect::set_film_params()`.
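
The difference between the training-time and inference-time conditioning vector can be sketched as below. The function names are hypothetical and the real sampling lives in the training code. The sketch only shows the slot order and the uniform draw.

```python
import random

COND_NAMES = ("beat_phase", "beat_time/8", "audio_intensity", "style_p0", "style_p1")

def random_cond(rng):
    """Training: one independent uniform [0, 1) draw per conditioning slot."""
    return [rng.random() for _ in COND_NAMES]

def inference_cond(beat_phase, beat_time, audio_intensity, style_p0, style_p1):
    """Inference: same slot order, filled from real playback values."""
    return [beat_phase, beat_time / 8.0, audio_intensity, style_p0, style_p1]

rng = random.Random(0)
print(random_cond(rng))
print(inference_cond(0.25, 4.0, 0.6, 0.0, 1.0))
```

The vector length matches the `--film-cond-dim` default of 5.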

### Channel dropout

Applied per-sample in `cnn_v3_utils.apply_channel_dropout()`:
- Geometric channels (normal, depth, depth_grad): zeroed with `p=channel_dropout_p`
- Context channels (mat_id, shadow, transp): zeroed with `p≈0.2`
- Temporal channels (prev.rgb): zeroed with `p=0.5`

This ensures the network works for both full G-buffer and photo-only inputs.
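
A rough sketch of the idea, not the code in `cnn_v3_utils.py`: the channel indices follow the `feat_tex0` / `feat_tex1` order from section 1, and the sketch assumes one random draw per group, so that a group is dropped as a unit. The real function may draw per channel instead.

```python
import random

# Channel indices follow the feat_tex0/feat_tex1 order given in section 1.
GEOMETRIC = (3, 4, 5, 6, 7)   # normal.xy, depth, depth_grad.xy
CONTEXT = (8, 18, 19)         # mat_id, shadow, transp
TEMPORAL = (9, 10, 11)        # prev.rgb

def apply_channel_dropout_sketch(channels, p_geom=0.3, p_context=0.2,
                                 p_temporal=0.5, rng=random):
    """Zero whole channel groups at random; returns a new list of 20 values.

    Assumption: one draw per group, so each group is kept or dropped as a unit.
    """
    out = list(channels)
    for group, p in ((GEOMETRIC, p_geom), (CONTEXT, p_context), (TEMPORAL, p_temporal)):
        if rng.random() < p:
            for c in group:
                out[c] = 0.0
    return out

sample = [1.0] * 20
print(apply_channel_dropout_sketch(sample, rng=random.Random(0)))
```

Albedo and the mip channels are never dropped, which matches the photo-only case where only those channels carry data.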

---

## 4. Running the CNN v3 Effect

`CNNv3Effect` is implemented. Wire into a sequence:

```seq
# BPM 120
SEQUENCE 0 0 "Scene with CNN v3"
  EFFECT + GBufferEffect prev_cnn -> gbuf_feat0 gbuf_feat1  0 60
  EFFECT + CNNv3Effect   gbuf_feat0 gbuf_feat1 -> sink       0 60
```

FiLM parameters uploaded each frame:
```cpp
cnn_v3_effect->set_film_params(
    params.beat_phase, params.beat_time / 8.0f, params.audio_intensity,
    style_p0, style_p1);
```

FiLM γ/β default to identity (γ=1, β=0) until `train_cnn_v3.py` produces a trained MLP.

---

## 5. Per-Pixel Validation

C++ parity test passes: `src/tests/gpu/test_cnn_v3_parity.cc` (2 tests).

```bash
cmake -B build -DDEMO_BUILD_TESTS=ON && cmake --build build -j4
cd build && ./test_cnn_v3_parity
```

Results (8×8 test tensors, random weights):
- enc0 max_err = 1.95e-3 ✓
- dec1 max_err = 1.95e-3 ✓
- final max_err = 4.88e-4 ✓  (all ≤ 1/255 = 3.92e-3)

Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch reference).
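
The pass criterion can be restated as a small Python sketch. `max_abs_err` is a hypothetical helper, not the code in the C++ test, and the figures are the ones reported above.

```python
def max_abs_err(reference, actual):
    """Largest absolute difference between two equal-length flat sequences."""
    if len(reference) != len(actual):
        raise ValueError("tensor sizes differ")
    return max(abs(r - a) for r, a in zip(reference, actual))

TOLERANCE = 1.0 / 255.0  # one 8-bit display step, about 3.92e-3

# Reported errors from the list above.
reported = {"enc0": 1.95e-3, "dec1": 1.95e-3, "final": 4.88e-4}
for layer, err in reported.items():
    print(layer, "OK" if err <= TOLERANCE else "FAIL")
```

The reported values agree with 2^-9 and 2^-11 to three digits, which would be consistent with half-precision rounding steps rather than a logic mismatch.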

---

## 6. Phase Status

| Phase | Status | Notes |
|-------|--------|-------|
| 1 — G-buffer (raster + pack) | ✅ Done | Integrated, 36/36 tests pass |
| 1 — G-buffer (SDF shadow pass) | ✅ Done | `gbuf_shadow.wgsl`, proxy-box SDF |
| 2 — Training infrastructure | ✅ Done | blender_export.py, pack_*_sample.py |
| 3 — WGSL U-Net shaders | ✅ Done | 5 compute shaders + cnn_v3/common snippet |
| 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 36/36 tests pass |
| 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
| 6 — FiLM MLP training | ✅ Done | train_cnn_v3.py + cnn_v3_utils.py written |
| 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
| 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |

---

## 7. CNN v3 Inference Shaders (Phase 3)

Five compute passes, each a standalone WGSL shader using `#include "cnn_v3/common"`.
The common snippet provides `get_w()` and `unpack_8ch()`.

| Pass | Shader | Input(s) | Output | Dims |
|------|--------|----------|--------|------|
| enc0 | `cnn_v3_enc0.wgsl` | feat_tex0+feat_tex1 (20ch) | enc0_tex rgba16float (4ch) | full |
| enc1 | `cnn_v3_enc1.wgsl` | enc0_tex (AvgPool2×2 inline) | enc1_tex rgba32uint (8ch) | ½ |
| bottleneck | `cnn_v3_bottleneck.wgsl` | enc1_tex (AvgPool2×2 inline) | bottleneck_tex rgba32uint (8ch) | ¼ |
| dec1 | `cnn_v3_dec1.wgsl` | bottleneck_tex + enc1_tex (skip) | dec1_tex rgba16float (4ch) | ½ |
| dec0 | `cnn_v3_dec0.wgsl` | dec1_tex + enc0_tex (skip) | output_tex rgba16float (4ch) | full |

**Parity rules baked into the shaders:**
- Zero-padding (not clamp) at conv borders
- AvgPool 2×2 for downsampling (exact, deterministic)
- Nearest-neighbor for upsampling (integer `coord / 2`)
- Skip connections: channel concatenation (not add)
- FiLM applied after conv+bias, before ReLU: `max(0, γ·x + β)`
- No batch norm at inference
- Weight layout: OIHW (out × in × kH × kW), biases after conv weights
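
For reference, the resampling, padding, and activation rules above correspond to the following plain-Python operations on 2D lists. This is an illustrative sketch of the rules themselves, not a port of the shaders or of the PyTorch model.

```python
def avg_pool_2x2(img):
    """Exact 2x2 average pooling on a 2D list with even height and width."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]

def upsample_nearest_2x(img):
    """Nearest-neighbor 2x upsampling: output (x, y) reads input (x // 2, y // 2)."""
    h, w = len(img) * 2, len(img[0]) * 2
    return [[img[y // 2][x // 2] for x in range(w)] for y in range(h)]

def load_zero_padded(img, x, y):
    """Conv taps outside the image read 0.0 (zero padding, not clamp-to-edge)."""
    if 0 <= y < len(img) and 0 <= x < len(img[0]):
        return img[y][x]
    return 0.0

def film_relu(x, gamma, beta):
    """FiLM after conv+bias, then ReLU: max(0, gamma * x + beta)."""
    return max(0.0, gamma * x + beta)

print(avg_pool_2x2([[1, 2], [3, 4]]), upsample_nearest_2x([[1, 2]]))
```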

**Params uniform per shader** (`group 0, binding 3`):
```wgsl
struct Params {
    weight_offset: u32,  // f16 index into shared weights buffer
    _pad: vec3u,
    gamma: vec4f,        // FiLM γ  (enc1: gamma_lo+gamma_hi for 8ch)
    beta:  vec4f,        // FiLM β  (enc1: beta_lo+beta_hi for 8ch)
}
```
FiLM γ/β are computed CPU-side by the FiLM MLP (Phase 4) and uploaded each frame.

**Weight offsets** (f16 units, including bias):

| Layer | Weights | Bias | Total f16 |
|-------|---------|------|-----------|
| enc0  | 20×4×9=720 | +4 | 724 |
| enc1  | 4×8×9=288  | +8 | 296 |
| bottleneck | 8×8×1=64 | +8 | 72 |
| dec1  | 16×4×9=576 | +4 | 580 |
| dec0  | 8×4×9=288  | +4 | 292 |
| **Total** | | | **1964 f16 ≈ 3.8 KB** |
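
The table gives per-layer sizes. The matching `weight_offset` values can be derived with a running sum, as in the Python sketch below. It assumes the layers are stored back to back in table order with no alignment padding, which the table does not state explicitly.

```python
# (name, in_channels, out_channels, kernel_taps) as listed in the table.
LAYERS = [
    ("enc0", 20, 4, 9),
    ("enc1", 4, 8, 9),
    ("bottleneck", 8, 8, 1),
    ("dec1", 16, 4, 9),
    ("dec0", 8, 4, 9),
]

def weight_offsets(layers):
    """Return ({name: (offset, count)}, total), all in f16 units."""
    table, offset = {}, 0
    for name, cin, cout, taps in layers:
        count = cin * cout * taps + cout  # conv weights, then one bias per output
        table[name] = (offset, count)
        offset += count
    return table, offset

table, total = weight_offsets(LAYERS)
for name, (offset, count) in table.items():
    print(f"{name:<10} offset={offset:>5} count={count:>4}")
print("total f16:", total, "bytes:", total * 2)
```

Summing the per-layer totals this way gives 1964 f16 values, or 3928 bytes.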

**Asset IDs** (registered in `workspaces/main/assets.txt` + `src/effects/shaders.cc`):
`SHADER_CNN_V3_COMMON`, `SHADER_CNN_V3_ENC0`, `SHADER_CNN_V3_ENC1`,
`SHADER_CNN_V3_BOTTLENECK`, `SHADER_CNN_V3_DEC1`, `SHADER_CNN_V3_DEC0`

**C++ usage (Phase 4):**
```cpp
auto src = ShaderComposer::Get().Compose({"cnn_v3/common"}, raw_wgsl);
```

---

## 8. Quick Troubleshooting

**GBufferEffect renders nothing / albedo is black**
- Check `set_scene()` was called before `render()`
- Verify scene has at least one object
- Check camera matrix is not degenerate (near/far, aspect)

**Pack shader fails to compile**
- `gbuf_pack.wgsl` uses no `#include`s; ShaderComposer compose is a no-op
- Check `ASSET_SHADER_GBUF_PACK` resolves in assets.txt

**Raster shader fails with `#include "common_uniforms"` error**
- `ShaderComposer::Get().Compose({"common_uniforms"}, src)` must be called
  before passing to `wgpuDeviceCreateShaderModule`; this is already done in
  `gbuffer_effect.cc`

**G-buffer outputs wrong resolution**
- `resize()` is not yet implemented in GBufferEffect; textures are fixed
  at construction size. Will be added when resize support is needed.

---

## 9. Validation Workflow

Two complementary tools let you verify each stage of the pipeline before training
or integrating into the demo.

### 9a. C++ — GBufViewEffect (G-buffer channel grid)

`GBufViewEffect` renders all 20 feature channels from `feat_tex0` / `feat_tex1`
in a **4×5 tiled grid** so you can see the G-buffer at a glance.

**Registration (already done)**

| File | What changed |
|------|-------------|
| `cnn_v3/shaders/gbuf_view.wgsl` | New fragment shader |
| `cnn_v3/src/gbuf_view_effect.h` | Effect class declaration |
| `cnn_v3/src/gbuf_view_effect.cc` | Effect class implementation |
| `workspaces/main/assets.txt` | `SHADER_GBUF_VIEW` asset |
| `cmake/DemoSourceLists.cmake` | `gbuf_view_effect.cc` in COMMON_GPU_EFFECTS |
| `src/gpu/demo_effects.h` | `#include "../../cnn_v3/src/gbuf_view_effect.h"` |
| `src/effects/shaders.h/.cc` | `gbuf_view_wgsl` extern declaration + definition |
| `src/tests/gpu/test_demo_effects.cc` | GBufViewEffect test |

**Constructor signature**

```cpp
GBufViewEffect(const GpuContext& ctx,
               const std::vector<std::string>& inputs,   // {feat_tex0, feat_tex1}
               const std::vector<std::string>& outputs,  // {gbuf_view_out}
               float start_time, float end_time)
```

**Wiring example** (alongside GBufferEffect):

```cpp
auto gbuf  = std::make_shared<GBufferEffect>(ctx,
    std::vector<std::string>{"prev_cnn"},
    std::vector<std::string>{"gbuf_feat0", "gbuf_feat1"}, 0.0f, 60.0f);
auto gview = std::make_shared<GBufViewEffect>(ctx,
    std::vector<std::string>{"gbuf_feat0", "gbuf_feat1"},
    std::vector<std::string>{"gbuf_view_out"}, 0.0f, 60.0f);
```

**Grid layout** (output resolution = input resolution, channel cells each 1/4 W × 1/5 H):

| Row | Col 0 | Col 1 | Col 2 | Col 3 |
|-----|-------|-------|-------|-------|
| 0 | `alb.r` | `alb.g` | `alb.b` | `nrm.x` remap→[0,1] |
| 1 | `nrm.y` remap→[0,1] | `depth` (inverted) | `dzdx` ×20+0.5 | `dzdy` ×20+0.5 |
| 2 | `mat_id` | `prev.r` | `prev.g` | `prev.b` |
| 3 | `mip1.r` | `mip1.g` | `mip1.b` | `mip2.r` |
| 4 | `mip2.g` | `mip2.b` | `shadow` | `transp` |

All channels displayed as grayscale. 1-pixel gray grid lines separate cells. Dark background for out-of-range cells.
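
When reading the grid, it helps to map a pixel back to its channel. The Python sketch below does this using the table above. It assumes row 0 is at the top of the image, and the clamp in `display_depth_grad` is an assumption about how out-of-range values are displayed.

```python
CHANNEL_NAMES = [
    "alb.r", "alb.g", "alb.b", "nrm.x",
    "nrm.y", "depth", "dzdx", "dzdy",
    "mat_id", "prev.r", "prev.g", "prev.b",
    "mip1.r", "mip1.g", "mip1.b", "mip2.r",
    "mip2.g", "mip2.b", "shadow", "transp",
]
COLS, ROWS = 4, 5

def cell_at(px, py, width, height):
    """Return (channel index, name) for the grid cell containing pixel (px, py)."""
    col = min(px * COLS // width, COLS - 1)
    row = min(py * ROWS // height, ROWS - 1)
    index = row * COLS + col
    return index, CHANNEL_NAMES[index]

def display_depth_grad(g):
    """Gradient display mapping from the table: g * 20 + 0.5, clamped to [0, 1]."""
    return max(0.0, min(1.0, g * 20.0 + 0.5))

print(cell_at(150, 250, 400, 500))
```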

**Shader binding layout** (no sampler needed — integer texture):

| Binding | Type | Content |
|---------|------|---------|
| 0 | `texture_2d<u32>` | `feat_tex0` (8 f16 channels via `pack2x16float`) |
| 1 | `texture_2d<u32>` | `feat_tex1` (12 u8 channels via `pack4x8unorm`) |
| 2 | `uniform` (8 B) | `GBufViewUniforms { resolution: vec2f }` |

The BGL is built manually in the constructor (no sampler) — this is an exception to the
standard post-process pattern because `rgba32uint` textures use `WGPUTextureSampleType_Uint`
and cannot be sampled, only loaded via `textureLoad()`.

**Implementation note — bind group recreation**

`render()` calls `wgpuRenderPipelineGetBindGroupLayout(pipeline_, 0)` each frame to
extract the BGL, creates a new `BindGroup`, then immediately releases the BGL handle.
This avoids storing a raw BGL as a member (no RAII wrapper exists for it) while
remaining correct across ping-pong buffer swaps.

---

### 9b. Web tool — "Load sample directory"

`cnn_v3/tools/index.html` has a **"Load sample directory"** button that:
1. Opens a `webkitdirectory` picker to select a sample folder
2. Loads all G-buffer component PNGs as `rgba8unorm` GPU textures
3. Runs the `FULL_PACK_SHADER` compute shader to assemble `feat_tex0` / `feat_tex1`
4. Runs full CNN inference (enc0 → enc1 → bottleneck → dec1 → dec0)
5. Displays the CNN output on the main canvas
6. If `target.png` is present, shows it side-by-side and prints PSNR

**File name matching** (case-insensitive, substring):

| Channel | Matched patterns | Fallback |
|---------|-----------------|---------|
| Albedo (required) | `albedo`, `color` | — (error if missing) |
| Normal | `normal`, `nrm` | `rgba(128,128,0,255)`: flat (0,0) oct-encoded |
| Depth | `depth` | `0` — zero depth |
| Mat ID | `matid`, `index`, `mat_id` | `0` — no material |
| Shadow | `shadow` | `255` — fully lit |
| Transparency | `transp`, `alpha` | `0` — fully opaque |
| Target | `target`, `output`, `ground_truth` | not shown |
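
The matching rule in the table can be sketched in Python as follows. This is not the tool's JavaScript: the function names are hypothetical, and the precedence when a file name matches more than one row (first row wins, first file wins) is an assumption.

```python
# Patterns copied from the table; matching is case-insensitive substring.
PATTERNS = {
    "albedo": ("albedo", "color"),
    "normal": ("normal", "nrm"),
    "depth": ("depth",),
    "matid": ("matid", "index", "mat_id"),
    "shadow": ("shadow",),
    "transp": ("transp", "alpha"),
    "target": ("target", "output", "ground_truth"),
}

def classify(filename):
    """Return the first channel whose pattern occurs in filename, else None."""
    lower = filename.lower()
    for channel, needles in PATTERNS.items():
        if any(n in lower for n in needles):
            return channel
    return None

def match_sample(filenames):
    """Map channel -> file name; raises if the required albedo map is missing."""
    found = {}
    for name in filenames:
        channel = classify(name)
        if channel is not None and channel not in found:
            found[channel] = name
    if "albedo" not in found:
        raise FileNotFoundError("no albedo/color image in sample directory")
    return found

print(match_sample(["Albedo.PNG", "scene_nrm.png", "target.png"]))
```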

**`FULL_PACK_SHADER`** (defined in `cnn_v3/tools/shaders.js`)

WebGPU compute shader (`@workgroup_size(8,8)`) with 8 bindings:

| Binding | Resource | Format |
|---------|----------|--------|
| 0–5 | albedo, normal, depth, matid, shadow, transp | `texture_2d<f32>` (rgba8unorm, R channel for single-channel maps) |
| 6 | feat_tex0 output | `texture_storage_2d<rgba32uint,write>` |
| 7 | feat_tex1 output | `texture_storage_2d<rgba32uint,write>` |

No sampler — all reads use `textureLoad()` (integer texel coordinates).

Packs channels identically to `gbuf_pack.wgsl`:
- `feat_tex0`: `pack2x16float(alb.rg)`, `pack2x16float(alb.b, nrm.x)`, `pack2x16float(nrm.y, depth)`, `pack2x16float(dzdx, dzdy)`
- `feat_tex1`: `pack4x8unorm(matid,0,0,0)`, `pack4x8unorm(mip1.rgb, mip2.r)`, `pack4x8unorm(mip2.gb, shadow, transp)`
- Depth gradients: central differences on depth R channel
- Mip1 / Mip2: box2 (2×2) / box4 (4×4) average filter on albedo
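
The bit layout produced by the two WGSL packing builtins can be reproduced on the CPU, which is handy for checking a texel by hand. The Python sketch below mirrors `pack2x16float` and `pack4x8unorm`. `pack_pixel` is a hypothetical helper that follows the word order listed above, and the zero in the fourth `feat_tex1` word is an assumption.

```python
import struct

def pack2x16float(a, b):
    """Two floats -> one u32: a in the low 16 bits, b in the high 16 bits (f16 each)."""
    lo, = struct.unpack("<H", struct.pack("<e", a))
    hi, = struct.unpack("<H", struct.pack("<e", b))
    return lo | (hi << 16)

def pack4x8unorm(x, y, z, w):
    """Four [0, 1] floats -> one u32, 8 bits each, x in the lowest byte."""
    def q(v):
        return int(0.5 + 255.0 * min(1.0, max(0.0, v)))
    return q(x) | (q(y) << 8) | (q(z) << 16) | (q(w) << 24)

def pack_pixel(alb, nrm, depth, dzdx, dzdy, matid, mip1, mip2, shadow, transp):
    """Return (feat_tex0, feat_tex1) texels as 4-tuples of u32."""
    feat0 = (pack2x16float(alb[0], alb[1]),
             pack2x16float(alb[2], nrm[0]),
             pack2x16float(nrm[1], depth),
             pack2x16float(dzdx, dzdy))
    feat1 = (pack4x8unorm(matid, 0.0, 0.0, 0.0),  # prev.rgb left at zero
             pack4x8unorm(mip1[0], mip1[1], mip1[2], mip2[0]),
             pack4x8unorm(mip2[1], mip2[2], shadow, transp),
             0)                                    # fourth word unused (assumption)
    return feat0, feat1

f0, f1 = pack_pixel((1.0, 0.5, 0.0), (0.0, 0.0), 0.25, 0.0, 0.0,
                    0.0, (1.0, 0.5, 0.0), (1.0, 0.5, 0.0), 1.0, 0.0)
print([hex(v) for v in f0], [hex(v) for v in f1])
```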

**PSNR computation** (`computePSNR`)

- CNN output (`rgba16float`) copied to CPU staging buffer via `copyTextureToBuffer`
- f16→float32 decoded in JavaScript
- Target drawn to offscreen `<canvas>` via `drawImage`, pixels read with `getImageData`
- MSE and PSNR computed over all RGB pixels (alpha ignored)
- Result displayed below target canvas as `MSE=X.XXXXX  PSNR=XX.XXdB`
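
The metric itself is small enough to restate. The Python sketch below assumes both images are normalized to [0, 1] so that the peak value is 1.0. The tool's exact normalization is not stated here, and the function names are hypothetical.

```python
import math

def mse_psnr(output_rgb, target_rgb):
    """MSE and PSNR (dB) over flat RGB sequences with values in [0, 1]."""
    if len(output_rgb) != len(target_rgb) or not output_rgb:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((o - t) ** 2 for o, t in zip(output_rgb, target_rgb)) / len(output_rgb)
    psnr = math.inf if mse == 0.0 else 10.0 * math.log10(1.0 / mse)
    return mse, psnr

def rgba8_to_rgb01(pixels):
    """Drop alpha from a flat RGBA byte sequence and scale RGB to [0, 1]."""
    return [v / 255.0 for i, v in enumerate(pixels) if i % 4 != 3]

mse, psnr = mse_psnr([0.5, 0.5, 0.5], [0.6, 0.6, 0.6])
print(f"MSE={mse:.5f}  PSNR={psnr:.2f}dB")
```

A uniform error of 0.1 per channel gives an MSE of about 0.01, or roughly 20 dB.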

**`runFromFeat(f0, f1, w, h)`**

Called by `loadSampleDir()` after packing, or can be called directly if feat textures
are already available. Skips the photo-pack step, runs all 5 CNN passes, and displays
the result. Intermediate textures are stored in `this.layerTextures` so the Layer
Visualization panel still works.

---

## 10. See Also

- `cnn_v3/docs/CNN_V3.md` — Full architecture design (U-Net, FiLM, feature layout)
- `doc/EFFECT_WORKFLOW.md` — General effect integration guide
- `cnn_v2/docs/CNN_V2.md` — Reference implementation (simpler, operational)
- `src/tests/gpu/test_demo_effects.cc` — GBufferEffect + GBufViewEffect tests