// cnn_v3/docs/GBUF_DIF_MIGRATION.md
// Plan: replace G-buffer shadow channel with dif (diffuse × shadow)
// Status: IN PROGRESS — current commit is intermediate state, see §Current State
# G-Buffer `shadow` → `dif` Migration Plan
## Motivation
The raw `shadow` channel (ch18) is less informative than `dif = max(0, dot(normal, light_dir)) * shadow`
because `shadow` alone ignores the diffuse Lambert term. The CNN learns better when it receives
the pre-multiplied occluded diffuse signal directly. `albedo` is already in ch0–2, so the CNN
can reconstruct the full shaded color as `albedo * (ambient + dif)`.
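A one-line numeric check of that reconstruction identity (the `ambient` value here is an arbitrary placeholder, not a constant from the shaders):

```python
import numpy as np

albedo = np.array([0.8, 0.6, 0.4])   # chs 0-2
dif = 0.5                            # ch18 after the migration
ambient = 0.1                        # placeholder, NOT a value from gbuf_deferred.wgsl

# The CNN can recover the shaded color from its own inputs:
shaded = albedo * (ambient + dif)
```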
## Design Decision
**Replace `shadow` with `dif` at ch18, in-place. Channel count stays 20.**
- `dif` is a scalar: `max(0, dot(normal, KEY_LIGHT)) * shadow`
- KEY_LIGHT = normalize(1, 2, 1) = (0.408, 0.816, 0.408) — matches `gbuf_deferred.wgsl`
- Stored at the same position (t1.z byte 2) → no weight shape change
- `transp` stays at ch19 (t1.z byte 3)
- t1.w reverts to 0 (spare)
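The stated `KEY_LIGHT` constant and the `dif` formula can be verified with a standalone NumPy sketch (not repo code):

```python
import numpy as np

# normalize(1, 2, 1) from the bullet above
KEY_LIGHT = np.array([1.0, 2.0, 1.0])
KEY_LIGHT /= np.linalg.norm(KEY_LIGHT)   # [0.408, 0.816, 0.408] to 3 decimals

def dif_scalar(normal, shadow):
    """dif = max(0, dot(normal, KEY_LIGHT)) * shadow, per the design note."""
    return max(0.0, float(np.dot(normal, KEY_LIGHT))) * shadow
```

For example, a straight-up normal under full light gives `dif_scalar([0, 0, 1], 1.0) ≈ 0.408`, and a downward-facing normal clamps to 0.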
### Feature layout (20 channels, unchanged count)
| ch | name | type | range | source |
|----|----------|--------|----------|----------------|
| 0 | alb.r | f16 | [0,1] | feat_tex0.x lo |
| 1 | alb.g | f16 | [0,1] | feat_tex0.x hi |
| 2 | alb.b | f16 | [0,1] | feat_tex0.y lo |
| 3 | nrm.x | f16 | [-1,1] | feat_tex0.y hi |
| 4 | nrm.y | f16 | [-1,1] | feat_tex0.z lo |
| 5 | depth | f16 | [0,1] | feat_tex0.z hi |
| 6 | dzdx | f16 | (signed) | feat_tex0.w lo |
| 7 | dzdy | f16 | (signed) | feat_tex0.w hi |
| 8 | mat_id | u8 | [0,1] | feat_tex1.x[0] |
| 9 | prev.r | u8 | [0,1] | feat_tex1.x[1] |
| 10 | prev.g | u8 | [0,1] | feat_tex1.x[2] |
| 11 | prev.b | u8 | [0,1] | feat_tex1.x[3] |
| 12 | mip1.r | u8 | [0,1] | feat_tex1.y[0] |
| 13 | mip1.g | u8 | [0,1] | feat_tex1.y[1] |
| 14 | mip1.b | u8 | [0,1] | feat_tex1.y[2] |
| 15 | mip2.r | u8 | [0,1] | feat_tex1.y[3] |
| 16 | mip2.g | u8 | [0,1] | feat_tex1.z[0] |
| 17 | mip2.b | u8 | [0,1] | feat_tex1.z[1] |
| 18 | **dif** | u8 | [0,1] | feat_tex1.z[2] ← was shadow |
| 19 | transp | u8 | [0,1] | feat_tex1.z[3] |
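The `lo`/`hi` halves in the f16 rows can be modeled with CPU equivalents of WGSL's `pack2x16float`/`unpack2x16float` (a sketch for illustration; the actual packing lives in `gbuf_pack.wgsl`):

```python
import numpy as np

def pack2x16float(lo, hi):
    # lo occupies the low 16 bits, hi the high 16 bits (WGSL convention)
    bits = lambda x: int(np.float16(x).view(np.uint16))
    return bits(lo) | (bits(hi) << 16)

def unpack2x16float(u):
    f16 = lambda b: float(np.uint16(b).view(np.float16))
    return f16(u & 0xFFFF), f16((u >> 16) & 0xFFFF)

# e.g. feat_tex0.x carries alb.r (lo) and alb.g (hi)
word = pack2x16float(0.5, -0.25)
```

Both 0.5 and -0.25 are exactly representable in f16, so the round trip is lossless here.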
---
## Current State (intermediate — needs fixing)
The commit tagged `wip(cnn_v3): shadow→dif intermediate` contains partial work.
The WGSL changes are **incorrect** — `dif` is redundantly stored in t1.w (3×) and
`shadow` was dropped from t1.z without putting `dif` in its place.
### What is wrong
| File | Problem |
|---|---|
| `gbuf_pack.wgsl` | t1.z = `mip2.g\|mip2.b\|transp\|spare` (shadow removed, dif not put there); t1.w = `dif\|dif\|dif\|spare` (redundant) |
| `gbuf_deferred.wgsl` | reads `dif` from `t1.w.x` — should be `t1.z.z` |
| `gbuf_view.wgsl` | expanded to 4×6 grid with ch20–22 as dif.rgb — should stay 4×5, ch18=dif |
---
## Implementation Checklist
### Step 1 — Fix WGSL (correct the in-place swap)
- [ ] `cnn_v3/shaders/gbuf_pack.wgsl`
- t1.z: `pack4x8unorm(vec4f(mip2.g, mip2.b, dif, transp))` ← dif at byte 2
- t1.w: `0u` ← revert to spare
- Remove comment line about t1.w dif
- [ ] `cnn_v3/shaders/gbuf_deferred.wgsl`
- Read: `let dif = unpack4x8unorm(t1.z).z;` ← from t1.z byte 2
- [ ] `cnn_v3/shaders/gbuf_view.wgsl`
- Revert to 4×5 grid (ROWS = 5.0)
- Guard: `ch >= 20u`
- ch18 label: `dif` (3 chars + NUL pad: 0x64696600)
- ch19 label: `trns` (unchanged)
- Remove row-5 cases (20u, 21u, default→dif.b)
- Revert `else if (comp_idx == 2u)` → `else` (drop t1.w branch)
- Update header comment
- [ ] `cnn_v3/shaders/cnn_v3_enc0.wgsl`
- Verify `load_feat()`: `g = unpack4x8unorm(t1.z)` → `g.z` = ch18 = `dif` ✓ (no change needed)
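The byte-2 claim in Step 1 can be sanity-checked with a CPU model of `pack4x8unorm`/`unpack4x8unorm` (component `x` lands in the least-significant byte, per the WGSL spec; rounding here is an approximation of the GPU's):

```python
def pack4x8unorm(x, y, z, w):
    # each component clamped to [0,1] and scaled to u8; x -> byte 0
    to_u8 = lambda c: round(max(0.0, min(1.0, c)) * 255.0)
    return to_u8(x) | (to_u8(y) << 8) | (to_u8(z) << 16) | (to_u8(w) << 24)

def unpack4x8unorm(u):
    return tuple(((u >> (8 * i)) & 0xFF) / 255.0 for i in range(4))

# Step-1 packing: t1.z = pack4x8unorm(vec4f(mip2.g, mip2.b, dif, transp))
t1_z = pack4x8unorm(0.1, 0.2, 0.75, 1.0)
dif = unpack4x8unorm(t1_z)[2]   # .z component = byte 2 = ch18
```

Reading `.z` back recovers `dif` to within one u8 quantization step, matching the `let dif = unpack4x8unorm(t1.z).z;` read in `gbuf_deferred.wgsl`.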
### Step 2 — Python training
- [ ] `cnn_v3/training/cnn_v3_utils.py`
- `assemble_features()`: ch18 = `dif` computed on-the-fly:
```python
KEY_LIGHT = np.array([0.408, 0.816, 0.408])
nor3 = oct_decode(normal) # (H,W,2) → (H,W,3)
diffuse = np.maximum(0, (nor3 * KEY_LIGHT).sum(-1))
dif = diffuse * shadow # (H,W)
```
- Replace `shadow[..., None]` with `dif[..., None]` at index 18
- `CONTEXT_CHANNELS = [8, 18, 19]` — same indices, update comment
- [ ] `cnn_v3/training/pack_blender_sample.py`
- Optional: save `dif.png` (precomputed) alongside existing passes
- Not strictly required if utils.py computes on-the-fly
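For reference, a common octahedral decode matching the `(H,W,2) → (H,W,3)` signature used in the Step-2 snippet — the repo's actual `oct_decode` (sign/wrap convention) is the source of truth and should be checked against its encoder:

```python
import numpy as np

def oct_decode(e):
    """Decode oct-encoded normals e in [-1,1]^(...,2) to unit vectors (...,3).

    Standard octahedral mapping sketch; verify the fold convention against
    the repo's encoder before relying on it.
    """
    x, y = e[..., 0], e[..., 1]
    z = 1.0 - np.abs(x) - np.abs(y)
    t = np.clip(-z, 0.0, 1.0)                 # fold-back for the lower hemisphere
    x = x + np.where(x >= 0.0, -t, t)
    y = y + np.where(y >= 0.0, -t, t)
    v = np.stack([x, y, z], axis=-1)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)
```

With this decode, `oct_decode(np.zeros((1, 1, 2)))` yields the +Z normal, so `dif` there reduces to `0.408 * shadow` under the key light.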
### Step 3 — Web tool
- [ ] `cnn_v3/tools/shaders.js` (FULL_PACK_SHADER)
- Add `oct_decode` inline (or inline the math)
- Compute `let dif = max(0., dot(oct_decode(nrm), vec3f(0.408, 0.816, 0.408))) * shd`
- Pack: t1.z = `pack4x8unorm(vec4f(m2.g, m2.b, dif, trp))`
- t1.w = `0u`
### Step 4 — Test vectors
- [ ] Re-run `cnn_v3/training/gen_test_vectors.py` to regenerate `test_vectors.h`
- ch18 value changes (dif ≠ shadow in general); old vectors are invalid
- Parity threshold (4.88e-4) should be unchanged
### Step 5 — Docs
- [ ] `cnn_v3/docs/CNN_V3.md` — update feature table (ch18 shadow → dif)
- [ ] `cnn_v3/docs/HOWTO.md` — §7 channel table, §3 pass-2 note
- [ ] This file: mark steps complete as they land
---
## Architecture Impact
| Dimension | Before | After |
|---|---|---|
| Channel count | 20 | 20 ✅ |
| Weight shapes | Conv(20→4, ...) | Conv(20→4, ...) ✅ |
| Total f16 weights | 1964 | 1964 ✅ |
| Training data regen | — | Not required ✅ |
| Parity test vectors | Valid | Must regenerate ❌ |
| Existing trained weights | Valid | Invalidated (ch18 distribution changes) ❌ |
No real training pass has occurred yet, so weight invalidation is not a concern.