// cnn_v3/docs/GBUF_DIF_MIGRATION.md
// Plan: replace G-buffer shadow channel with dif (diffuse × shadow)
// Status: IN PROGRESS. Steps 1–3 and 5 complete; Step 4 (test vectors) pending
# G-Buffer `shadow` → `dif` Migration Plan
## Motivation
The raw `shadow` channel (ch18) is less informative than `dif = max(0, dot(normal, light_dir)) * shadow`
because `shadow` alone ignores the diffuse Lambert term. The CNN is expected to learn better when it
receives the pre-multiplied, occluded diffuse signal directly. `albedo` is already in ch0–2, so the CNN
can reconstruct the full shaded color as `albedo * (ambient + dif)`.
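
As a rough illustration of the two formulas above, the following pure-Python sketch computes `dif` for a single pixel and reconstructs a shaded color. The function names `compute_dif` and `shade`, and the `ambient` value used in the usage lines, are hypothetical and chosen only for this example; the light direction is the `normalize(1, 2, 1)` key light this plan uses.

```python
import math

# Key light direction normalize(1, 2, 1), as used in this plan.
KEY_LIGHT = tuple(c / math.sqrt(6.0) for c in (1.0, 2.0, 1.0))


def compute_dif(normal, shadow, light=KEY_LIGHT):
    """Occluded Lambert term: max(0, dot(normal, light)) * shadow."""
    n_dot_l = sum(n * l for n, l in zip(normal, light))
    return max(0.0, n_dot_l) * shadow


def shade(albedo, dif, ambient):
    """Reconstruct a shaded color as albedo * (ambient + dif), per channel."""
    return tuple(a * (ambient + dif) for a in albedo)


# Usage: an up-facing surface, half in shadow, with a hypothetical ambient of 0.25.
dif = compute_dif((0.0, 1.0, 0.0), shadow=0.5)
color = shade((1.0, 0.5, 0.0), dif, ambient=0.25)
```

A back-facing normal yields `dif = 0` regardless of `shadow`, which is the information the raw `shadow` channel alone cannot convey.
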
## Design Decision
**Replace ch18 (`shadow`) with ch18 (`dif`) in-place. Channel count stays 20.**
- `dif` is a scalar: `max(0, dot(normal, KEY_LIGHT)) * shadow`
- KEY_LIGHT = normalize(1, 2, 1) = (0.408, 0.816, 0.408) — matches `gbuf_deferred.wgsl`
- Stored at the same position (t1.z byte 2) → no weight shape change
- `transp` stays at ch19 (t1.z byte 3)
- t1.w reverts to 0 (spare)
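
The byte placement described above can be checked with a small Python emulation of the WGSL `pack4x8unorm` / `unpack4x8unorm` built-ins (component `i` occupies bits `8*i` to `8*i+7`). This is a sketch for reasoning about the layout, not the shader code itself; the sample values are made up.

```python
import math


def pack4x8unorm(v):
    """Emulate WGSL pack4x8unorm: clamp to [0, 1], quantize to 8 bits,
    and place component i in bits 8*i .. 8*i+7 of a 32-bit word."""
    word = 0
    for i, e in enumerate(v):
        q = int(math.floor(0.5 + 255.0 * min(1.0, max(0.0, e))))
        word |= q << (8 * i)
    return word


def unpack4x8unorm(word):
    """Emulate WGSL unpack4x8unorm: byte i becomes component i in [0, 1]."""
    return tuple(((word >> (8 * i)) & 0xFF) / 255.0 for i in range(4))


# Usage: t1.z carries (mip2.g, mip2.b, dif, transp); dif sits in byte 2 (.z).
t1_z = pack4x8unorm((0.25, 0.5, 0.75, 1.0))  # made-up sample values
dif_back = unpack4x8unorm(t1_z)[2]
t1_w = 0  # spare
```

Because `dif` is stored as 8-bit unorm, a round trip through `t1.z` can change it by up to half a quantization step (0.5/255).
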
### Feature layout (20 channels, unchanged count)
| ch | name | type | range | source |
|----|----------|--------|----------|----------------|
| 0 | alb.r | f16 | [0,1] | feat_tex0.x lo |
| 1 | alb.g | f16 | [0,1] | feat_tex0.x hi |
| 2 | alb.b | f16 | [0,1] | feat_tex0.y lo |
| 3 | nrm.x | f16 | [-1,1] | feat_tex0.y hi |
| 4 | nrm.y | f16 | [-1,1] | feat_tex0.z lo |
| 5 | depth | f16 | [0,1] | feat_tex0.z hi |
| 6 | dzdx | f16 | (signed) | feat_tex0.w lo |
| 7 | dzdy | f16 | (signed) | feat_tex0.w hi |
| 8 | mat_id | u8 | [0,1] | feat_tex1.x[0] |
| 9 | prev.r | u8 | [0,1] | feat_tex1.x[1] |
| 10 | prev.g | u8 | [0,1] | feat_tex1.x[2] |
| 11 | prev.b | u8 | [0,1] | feat_tex1.x[3] |
| 12 | mip1.r | u8 | [0,1] | feat_tex1.y[0] |
| 13 | mip1.g | u8 | [0,1] | feat_tex1.y[1] |
| 14 | mip1.b | u8 | [0,1] | feat_tex1.y[2] |
| 15 | mip2.r | u8 | [0,1] | feat_tex1.y[3] |
| 16 | mip2.g | u8 | [0,1] | feat_tex1.z[0] |
| 17 | mip2.b | u8 | [0,1] | feat_tex1.z[1] |
| 18 | **dif** | u8 | [0,1] | feat_tex1.z[2] ← was shadow |
| 19 | transp | u8 | [0,1] | feat_tex1.z[3] |
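
The table follows a regular pattern: ch0–7 are f16 pairs in `feat_tex0`, and ch8–19 are bytes in `feat_tex1`. The hypothetical helper below (`channel_source` is not part of the codebase) restates that pattern as a lookup, which can be handy when checking the table for off-by-one mistakes.

```python
def channel_source(ch):
    """Map a channel index (0..19) to (texture, component, slot) per the table.

    For feat_tex0 the slot is "lo" or "hi" (f16 half of the word);
    for feat_tex1 it is the byte index 0..3 within the word.
    """
    if not 0 <= ch < 20:
        raise ValueError(f"channel {ch} out of range 0..19")
    comps = "xyzw"
    if ch < 8:
        # Two f16 values per 32-bit component of feat_tex0.
        return ("feat_tex0", comps[ch // 2], "lo" if ch % 2 == 0 else "hi")
    # Four u8 values per 32-bit component of feat_tex1, starting at ch8.
    k = ch - 8
    return ("feat_tex1", comps[k // 4], k % 4)


# Usage: where does dif (ch18) live?
dif_location = channel_source(18)
```
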
---
## Current State (intermediate — needs fixing)
The commit tagged `wip(cnn_v3): shadow→dif intermediate` contains partial work.
Its WGSL changes are **incorrect**: `dif` is stored redundantly in t1.w (3×), and
`shadow` was dropped from t1.z without `dif` taking its place. Step 1 below corrects this.
### What is wrong
| File | Problem |
|---|---|
| `gbuf_pack.wgsl` | t1.z = `mip2.g\|mip2.b\|transp\|spare` (shadow removed, dif not put there); t1.w = `dif\|dif\|dif\|spare` (redundant) |
| `gbuf_deferred.wgsl` | reads `dif` from `t1.w.x` — should be `t1.z.z` |
| `gbuf_view.wgsl` | expanded to 4×6 grid with ch20–22 as dif.rgb — should stay 4×5, ch18=dif |
---
## Implementation Checklist
### Step 1 — Fix WGSL (correct the in-place swap) ✅
- [x] `cnn_v3/shaders/gbuf_pack.wgsl`
  - t1.z: `pack4x8unorm(vec4f(mip2.g, mip2.b, dif, transp))` ← dif at byte 2
  - t1.w: `0u` ← revert to spare
  - Remove comment line about t1.w dif
- [x] `cnn_v3/shaders/gbuf_deferred.wgsl`
  - Read: `let dif = unpack4x8unorm(t1.z).z;` ← from t1.z byte 2
- [x] `cnn_v3/shaders/gbuf_view.wgsl`
  - Revert to 4×5 grid (ROWS = 5.0)
  - Guard: `ch >= 20u`
  - ch18 label: `dif` (3 chars, NUL-padded to 4: 0x64696600)
  - ch19 label: `trns` (unchanged)
  - Remove row-5 cases (20u, 21u, default→dif.b)
  - Revert `else if (comp_idx == 2u)` → `else` (drop t1.w branch)
  - Update header comment
- [x] `cnn_v3/shaders/cnn_v3_enc0.wgsl`
  - Verify `load_feat()`: `g = unpack4x8unorm(t1.z)` → `g.z` = ch18 = dif ✓ (no change needed)
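
The label constant `0x64696600` can be sanity-checked with a few lines of Python. This sketch assumes the viewer packs up to four ASCII characters into a u32 with the first character in the high byte and NUL padding, which is what the constant implies; `label_u32` is a hypothetical helper, not a function from the shaders.

```python
def label_u32(text):
    """Pack up to 4 ASCII chars into a u32, first char in the high byte,
    padding short labels with NUL bytes on the right (assumed convention)."""
    raw = text.encode("ascii")
    if len(raw) > 4:
        raise ValueError("label must be at most 4 characters")
    return int.from_bytes(raw.ljust(4, b"\x00"), "big")


# Usage: the two labels touched by Step 1.
dif_label = label_u32("dif")
trns_label = label_u32("trns")
print(hex(dif_label), hex(trns_label))
```
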
### Step 2 — Python training ✅
- [x] `cnn_v3/training/cnn_v3_utils.py`
  - Added `oct_decode()` helper and `_KEY_LIGHT` constant
  - `assemble_features()`: ch18 = `dif` computed on-the-fly
  - Replace `shadow[..., None]` with `dif[..., None]` at index 18
  - `CONTEXT_CHANNELS = [8, 18, 19]` — same indices, updated comment
- [ ] `cnn_v3/training/pack_blender_sample.py`
  - Optional: save `dif.png` (precomputed) alongside existing passes
  - Not strictly required if utils.py computes on-the-fly
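
For orientation, here is a per-pixel sketch of what computing ch18 on the fly might look like. The real `oct_decode()` in `cnn_v3_utils.py` is not shown in this plan, so the version below uses the common octahedral decode and its sign conventions are an assumption; `dif_channel` is a hypothetical name, and the real code operates on arrays rather than scalars.

```python
import math

# Same direction as the plan's KEY_LIGHT: normalize(1, 2, 1).
_KEY_LIGHT = tuple(c / math.sqrt(6.0) for c in (1.0, 2.0, 1.0))


def oct_decode(ex, ey):
    """Common octahedral decode of (ex, ey) in [-1, 1] to a unit vector.
    Assumption: the project's encoding may use different conventions."""
    z = 1.0 - abs(ex) - abs(ey)
    t = max(-z, 0.0)  # fold the lower hemisphere back out
    x = ex + (-t if ex >= 0.0 else t)
    y = ey + (-t if ey >= 0.0 else t)
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)


def dif_channel(nrm_x, nrm_y, shadow):
    """ch18 value: max(0, dot(decoded normal, key light)) * shadow."""
    n = oct_decode(nrm_x, nrm_y)
    n_dot_l = sum(a * b for a, b in zip(n, _KEY_LIGHT))
    return max(0.0, n_dot_l) * shadow


# Usage: fill index 18 of a 20-channel feature vector for one pixel.
features = [0.0] * 20
features[18] = dif_channel(0.0, 1.0, shadow=1.0)
```

Computing `dif` from the stored normal and shadow passes is what makes the optional `dif.png` export unnecessary.
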
### Step 3 — Web tool ✅
- [x] `cnn_v3/tools/shaders.js` (FULL_PACK_SHADER)
  - Add `oct_decode` inline (or inline the math)
  - Compute `let dif = max(0., dot(oct_decode(nrm), vec3f(0.408, 0.816, 0.408))) * shd`
  - Pack: t1.z = `pack4x8unorm(vec4f(m2.g, m2.b, dif, trp))`
  - t1.w = `0u`
### Step 4 — Test vectors
- [ ] Re-run `cnn_v3/training/gen_test_vectors.py` to regenerate `test_vectors.h`
  - ch18 value changes (dif ≠ shadow in general); old vectors are invalid
  - Parity threshold (4.88e-4) should be unchanged
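
How the parity test compares outputs is not spelled out here. A minimal sketch, assuming the check is a maximum absolute error against the 4.88e-4 threshold, could look like the following; `max_abs_error` and `passes_parity` are hypothetical names. For reference, 2⁻¹¹ ≈ 4.8828e-4, which rounds to the stated threshold, though this plan does not say how the value was chosen.

```python
PARITY_THRESHOLD = 4.88e-4  # value quoted in this plan


def max_abs_error(reference, actual):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(reference) != len(actual):
        raise ValueError("length mismatch between reference and actual outputs")
    return max((abs(r - a) for r, a in zip(reference, actual)), default=0.0)


def passes_parity(reference, actual, threshold=PARITY_THRESHOLD):
    """True if every element agrees within the threshold (assumed criterion)."""
    return max_abs_error(reference, actual) <= threshold


# Usage with made-up numbers: one element is off by 1e-4, within tolerance.
ok = passes_parity([0.10, 0.20, 0.30], [0.10, 0.2001, 0.30])
```
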
### Step 5 — Docs ✅
- [x] `cnn_v3/docs/CNN_V3.md` — feature table, pack pseudo-code, simple-mode defaults, CONTEXT_CHANNELS comment
- [x] `cnn_v3/docs/HOWTO.md` — outputs description, channel table, dropout comment, FULL_PACK_SHADER description
- [x] This file: checklist updated (Step 4 remains open)
---
## Architecture Impact
| Dimension | Before | After |
|---|---|---|
| Channel count | 20 | 20 ✅ |
| Weight shapes | Conv(20→4, ...) | Conv(20→4, ...) ✅ |
| Total f16 weights | 1964 | 1964 ✅ |
| Training data regen | — | Not required ✅ |
| Parity test vectors | Valid | Must regenerate ❌ |
| Existing trained weights | Valid | Invalidated (ch18 distribution changes) ❌ |
No real training pass has occurred yet, so weight invalidation is not a concern.