// cnn_v3/docs/GBUF_DIF_MIGRATION.md
// Plan: replace G-buffer shadow channel with dif (diffuse × shadow)
// Status: IN PROGRESS. Steps 1–3 and 5 complete (Step 2 has one optional item open); Step 4 (test vectors) pending

# G-Buffer `shadow` → `dif` Migration Plan

## Motivation

The raw `shadow` channel (ch18) is less informative than `dif = max(0, dot(normal, light_dir)) * shadow`
because `shadow` alone ignores the diffuse Lambert term. The CNN is expected to learn better when it
receives the pre-multiplied, occluded diffuse signal directly. `albedo` is already in ch0–2, so the CNN
can reconstruct the full shaded color as `albedo * (ambient + dif)`.

## Design Decision

**Replace ch18 (`shadow`) with ch18 (`dif`) in-place. Channel count stays 20.**

- `dif` is a scalar: `max(0, dot(normal, KEY_LIGHT)) * shadow`
- KEY_LIGHT = normalize(1, 2, 1) = (0.408, 0.816, 0.408) — matches `gbuf_deferred.wgsl`
- Stored at the same position (t1.z byte 2) → no weight shape change
- `transp` stays at ch19 (t1.z byte 3)
- t1.w reverts to 0 (spare)
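
The in-place swap can be checked with a small Python emulation of WGSL's `pack4x8unorm` / `unpack4x8unorm`. This is a sketch rather than repository code: the helper names and sample values are hypothetical, and it assumes the WGSL convention that component `x` lands in the lowest byte, so "byte 2" is the `z` component.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

# KEY_LIGHT as stated in this plan: normalize(1, 2, 1).
KEY_LIGHT = normalize((1.0, 2.0, 1.0))

def pack4x8unorm(v):
    """Emulate WGSL pack4x8unorm: x -> bits 0..7, y -> 8..15, z -> 16..23, w -> 24..31."""
    word = 0
    for i, c in enumerate(v):
        c = min(max(c, 0.0), 1.0)  # clamp to [0, 1] before quantizing
        word |= int(math.floor(0.5 + 255.0 * c)) << (8 * i)
    return word

def unpack4x8unorm(word):
    """Inverse of pack4x8unorm: four bytes back to floats in [0, 1]."""
    return tuple(((word >> (8 * i)) & 0xFF) / 255.0 for i in range(4))

def dif_term(normal, shadow):
    """dif = max(0, dot(normal, KEY_LIGHT)) * shadow."""
    return max(0.0, sum(n * l for n, l in zip(normal, KEY_LIGHT))) * shadow

# Hypothetical sample values for one pixel.
mip2_g, mip2_b, transp = 0.25, 0.5, 0.0
dif = dif_term((0.0, 1.0, 0.0), shadow=0.5)          # about 0.408
t1_z = pack4x8unorm((mip2_g, mip2_b, dif, transp))   # dif at byte 2, transp at byte 3
t1_w = 0                                             # spare
```

Because `dif` is stored as a u8, reading it back with `unpack4x8unorm(t1_z)[2]` recovers the value only to within half a quantization step (0.5 / 255).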

### Feature layout (20 channels, unchanged count)

| ch | name     | type   | range    | source         |
|----|----------|--------|----------|----------------|
| 0  | alb.r    | f16    | [0,1]    | feat_tex0.x lo |
| 1  | alb.g    | f16    | [0,1]    | feat_tex0.x hi |
| 2  | alb.b    | f16    | [0,1]    | feat_tex0.y lo |
| 3  | nrm.x    | f16    | [-1,1]   | feat_tex0.y hi |
| 4  | nrm.y    | f16    | [-1,1]   | feat_tex0.z lo |
| 5  | depth    | f16    | [0,1]    | feat_tex0.z hi |
| 6  | dzdx     | f16    | (signed) | feat_tex0.w lo |
| 7  | dzdy     | f16    | (signed) | feat_tex0.w hi |
| 8  | mat_id   | u8     | [0,1]    | feat_tex1.x[0] |
| 9  | prev.r   | u8     | [0,1]    | feat_tex1.x[1] |
| 10 | prev.g   | u8     | [0,1]    | feat_tex1.x[2] |
| 11 | prev.b   | u8     | [0,1]    | feat_tex1.x[3] |
| 12 | mip1.r   | u8     | [0,1]    | feat_tex1.y[0] |
| 13 | mip1.g   | u8     | [0,1]    | feat_tex1.y[1] |
| 14 | mip1.b   | u8     | [0,1]    | feat_tex1.y[2] |
| 15 | mip2.r   | u8     | [0,1]    | feat_tex1.y[3] |
| 16 | mip2.g   | u8     | [0,1]    | feat_tex1.z[0] |
| 17 | mip2.b   | u8     | [0,1]    | feat_tex1.z[1] |
| 18 | **dif**  | u8     | [0,1]    | feat_tex1.z[2] ← was shadow |
| 19 | transp   | u8     | [0,1]    | feat_tex1.z[3] |
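
As a cross-check of the table, the sketch below decodes all 20 channels from two 4-word texels in Python. It is illustrative only: `decode_features` and `CHANNEL_NAMES` are hypothetical names, and it assumes "lo/hi" means the low and high 16 bits of a word and `[0]`..`[3]` means bytes from least to most significant, as in WGSL's `unpack2x16float` and `unpack4x8unorm`.

```python
import struct

CHANNEL_NAMES = [
    "alb.r", "alb.g", "alb.b", "nrm.x", "nrm.y", "depth", "dzdx", "dzdy",
    "mat_id", "prev.r", "prev.g", "prev.b", "mip1.r", "mip1.g", "mip1.b",
    "mip2.r", "mip2.g", "mip2.b", "dif", "transp",
]

def unpack2x16float(word):
    """Split a u32 into two IEEE half floats: low 16 bits first, then high 16 bits."""
    return struct.unpack("<2e", struct.pack("<I", word))

def unpack4x8unorm(word):
    """Split a u32 into four bytes (least significant first), scaled to [0, 1]."""
    return tuple(b / 255.0 for b in struct.pack("<I", word))

def decode_features(t0, t1):
    """Decode 20 channels from feat_tex0 (t0) and feat_tex1 (t1), each 4 u32 words."""
    feats = []
    for word in t0:        # ch0..7: eight f16 values
        feats.extend(unpack2x16float(word))
    for word in t1[:3]:    # ch8..19: twelve u8 values; t1[3] (t1.w) is spare
        feats.extend(unpack4x8unorm(word))
    return feats
```

With this layout, `decode_features(t0, t1)[18]` is `dif` and index 19 is `transp`; whatever is stored in `t1.w` never reaches the network input.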

---

## Current State (intermediate — needs fixing)

The commit tagged `wip(cnn_v3): shadow→dif intermediate` contains partial work.
Its WGSL changes are **incorrect**: `dif` is stored redundantly in t1.w (3×), and
`shadow` was dropped from t1.z without `dif` taking its place. Step 1 below corrects this.

### What is wrong

| File | Problem |
|---|---|
| `gbuf_pack.wgsl` | t1.z = `mip2.g\|mip2.b\|transp\|spare` (shadow removed, dif not put there); t1.w = `dif\|dif\|dif\|spare` (redundant) |
| `gbuf_deferred.wgsl` | reads `dif` from `t1.w.x` — should be `t1.z.z` |
| `gbuf_view.wgsl` | expanded to 4×6 grid with ch20–22 as dif.rgb — should stay 4×5, ch18=dif |

---

## Implementation Checklist

### Step 1 — Fix WGSL (correct the in-place swap) ✅

- [x] `cnn_v3/shaders/gbuf_pack.wgsl`
  - t1.z: `pack4x8unorm(vec4f(mip2.g, mip2.b, dif, transp))` ← dif at byte 2
  - t1.w: `0u` ← revert to spare
  - Remove comment line about t1.w dif

- [x] `cnn_v3/shaders/gbuf_deferred.wgsl`
  - Read: `let dif = unpack4x8unorm(t1.z).z;` ← from t1.z byte 2

- [x] `cnn_v3/shaders/gbuf_view.wgsl`
  - Revert to 4×5 grid (ROWS = 5.0)
  - Guard: `ch >= 20u`
  - ch18 label: `dif` (3 chars, NUL-padded to 4 bytes: 0x64696600)
  - ch19 label: `trns` (unchanged)
  - Remove row-5 cases (20u, 21u, default→dif.b)
  - Revert `else if (comp_idx == 2u)` → `else` (drop t1.w branch)
  - Update header comment

- [x] `cnn_v3/shaders/cnn_v3_enc0.wgsl`
  - Verify `load_feat()`: g = unpack4x8unorm(t1.z) → g.z = ch18 = dif ✓ (no change needed)
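
The label constant for ch18 can be reproduced with a few lines of Python. This is a sketch under the assumption that labels are packed with the first character in the most significant byte and NUL-padded to four bytes, which is consistent with `0x64696600` for `dif`; `label_u32` is a hypothetical helper, not a function from the shaders.

```python
def label_u32(text):
    """Pack up to 4 ASCII chars into a u32, first char in the top byte, NUL-padded."""
    raw = text.encode("ascii")
    if len(raw) > 4:
        raise ValueError("label must be at most 4 characters")
    return int.from_bytes(raw.ljust(4, b"\x00"), "big")

print(hex(label_u32("dif")))   # 0x64696600
print(hex(label_u32("trns")))  # 0x74726e73
```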

### Step 2 — Python training ✅

- [x] `cnn_v3/training/cnn_v3_utils.py`
  - Added `oct_decode()` helper and `_KEY_LIGHT` constant
  - `assemble_features()`: ch18 = `dif` computed on-the-fly
  - Replace `shadow[..., None]` with `dif[..., None]` at index 18
  - `CONTEXT_CHANNELS = [8, 18, 19]` — same indices, updated comment

- [ ] `cnn_v3/training/pack_blender_sample.py`
  - Optional: save `dif.png` (precomputed) alongside existing passes
  - Not strictly required if utils.py computes on-the-fly
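
A minimal scalar sketch of the on-the-fly computation follows. It assumes the common octahedral mapping for ch3–4 (inputs in [-1, 1], +Z as the unfolded hemisphere); the actual `oct_decode()` in `cnn_v3_utils.py` may use a different convention and presumably operates on whole arrays. `dif_channel` is a hypothetical name.

```python
import math

# KEY_LIGHT as given in this plan: normalize(1, 2, 1).
_KEY_LIGHT = tuple(c / math.sqrt(6.0) for c in (1.0, 2.0, 1.0))

def oct_decode(ex, ey):
    """Decode an octahedral-encoded normal (ex, ey in [-1, 1]) to a unit vector."""
    z = 1.0 - abs(ex) - abs(ey)
    if z < 0.0:
        # Lower hemisphere: fold the outer triangles back over the diagonals.
        ex, ey = ((1.0 - abs(ey)) * math.copysign(1.0, ex),
                  (1.0 - abs(ex)) * math.copysign(1.0, ey))
    length = math.sqrt(ex * ex + ey * ey + z * z)
    return (ex / length, ey / length, z / length)

def dif_channel(nrm_x, nrm_y, shadow):
    """ch18 value for one pixel: max(0, dot(normal, KEY_LIGHT)) * shadow."""
    n = oct_decode(nrm_x, nrm_y)
    lambert = max(0.0, sum(a * b for a, b in zip(n, _KEY_LIGHT)))
    return lambert * shadow
```

Computing `dif` from the stored normal and shadow passes is what makes a precomputed `dif.png` optional: the existing sample files already contain everything needed.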

### Step 3 — Web tool ✅

- [x] `cnn_v3/tools/shaders.js` (FULL_PACK_SHADER)
  - Add `oct_decode` inline (or inline the math)
  - Compute `let dif = max(0., dot(oct_decode(nrm), vec3f(0.408, 0.816, 0.408))) * shd`
  - Pack: t1.z = `pack4x8unorm(vec4f(m2.g, m2.b, dif, trp))`
  - t1.w = `0u`

### Step 4 — Test vectors

- [ ] Re-run `cnn_v3/training/gen_test_vectors.py` to regenerate `test_vectors.h`
  - ch18 value changes (dif ≠ shadow in general); old vectors are invalid
  - Parity threshold (4.88e-4) should be unchanged
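
For reference, a parity comparison against regenerated vectors might look like the sketch below. The metric (maximum absolute error) and the function names are assumptions; only the threshold value comes from this plan, and the real check lives in the test harness that consumes `test_vectors.h`.

```python
PARITY_THRESHOLD = 4.88e-4  # value from this plan; slightly below 2**-11

def max_abs_error(reference, actual):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(reference) != len(actual):
        raise ValueError("vector lengths differ")
    return max((abs(r - a) for r, a in zip(reference, actual)), default=0.0)

def check_parity(reference, actual, tol=PARITY_THRESHOLD):
    """True when every output matches the reference within the tolerance."""
    return max_abs_error(reference, actual) <= tol
```

The threshold itself does not need to move: the swap changes what ch18 contains, not the numeric precision of the pipeline.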

### Step 5 — Docs ✅

- [x] `cnn_v3/docs/CNN_V3.md` — feature table, pack pseudo-code, simple-mode defaults, CONTEXT_CHANNELS comment
- [x] `cnn_v3/docs/HOWTO.md` — outputs description, channel table, dropout comment, FULL_PACK_SHADER description
- [x] This file: checklist kept in sync (Step 4 still pending)

---

## Architecture Impact

| Dimension | Before | After |
|---|---|---|
| Channel count | 20 | 20 ✅ |
| Weight shapes | Conv(20→4, ...) | Conv(20→4, ...) ✅ |
| Total f16 weights | 1964 | 1964 ✅ |
| Training data regen | — | Not required (`dif` is computed on-the-fly) ✅ |
| Parity test vectors | Valid | Must regenerate ❌ |
| Existing trained weights | Valid | Invalidated (ch18 distribution changes) ❌ |

No real training pass has occurred yet, so weight invalidation is not a concern.