| author | skal <pascal.massimino@gmail.com> | 2026-03-25 10:05:42 +0100 |
|---|---|---|
| committer | skal <pascal.massimino@gmail.com> | 2026-03-25 10:05:42 +0100 |
| commit | ce6e5b99f26e4e7c69a3cacf360bd0d492de928c (patch) | |
| tree | a8d64b33a7ea1109b6b7e1043ced946cac416756 /cnn_v3/docs/HOW_TO_CNN.md | |
| parent | 8b4d7a49f038d7e849e6764dcc3abd1e1be01061 (diff) | |
feat(cnn_v3): 3×3 dilated bottleneck + Sobel loss + FiLM warmup + architecture PNG

- Replace 1×1 pointwise bottleneck with Conv(8→8, 3×3, dilation=2):
  effective RF grows from ~13px to ~29px at ¼res (~+1 KB weights)
- Add Sobel edge loss in training (--edge-loss-weight, default 0.1)
- Add FiLM 2-phase training: freeze MLP for warmup epochs then
  unfreeze at lr×0.1 (--film-warmup-epochs, default 50)
- Update weight layout: BN 72→584 f16, total 1964→2476 f16 (4952 B)
- Cascade offsets in C++ effect, JS tool, export/gen_test_vectors scripts
- Regenerate test_vectors.h (1238 u32); parity max_err=9.77e-04
- Generate dark-theme U-Net+FiLM architecture PNG (gen_architecture_png.py)
- Replace ASCII art in CNN_V3.md and HOW_TO_CNN.md with PNG embed

handoff(Gemini): bottleneck dilation + Sobel loss + FiLM warmup landed.
Next: run first real training pass (see cnn_v3/docs/HOWTO.md §3).
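
The two training changes named in the commit message (Sobel edge loss next to MSE, and the FiLM warmup phase) can be illustrated with a small pure-Python sketch. This is not the code from `train_cnn_v3.py`: the helper names `sobel_magnitude`, `mse`, `total_loss` and `film_phase` are hypothetical, the choice of a squared-error penalty on gradient magnitudes is an assumption, and the commit does not say whether the `lr×0.1` factor applies to the MLP only or to all parameters.

```python
# Illustrative sketch only; all names here are hypothetical, not the repo's API.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(img):
    """Gradient magnitude of a 2-D list of floats (valid region, no padding)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            row.append((gx * gx + gy * gy) ** 0.5)
        out.append(row)
    return out


def mse(a, b):
    """Mean squared difference between two equally shaped 2-D lists."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def total_loss(pred, target, edge_weight=0.1):
    """MSE plus a weighted edge term; edge_weight=0 falls back to MSE only."""
    loss = mse(pred, target)
    if edge_weight > 0:
        # Assumption: squared error on Sobel magnitudes; the real script may differ.
        loss += edge_weight * mse(sobel_magnitude(pred), sobel_magnitude(target))
    return loss


def film_phase(epoch, warmup_epochs=50):
    """Return (film_trainable, lr_scale) for a 0-based epoch index."""
    if warmup_epochs <= 0:
        return True, 1.0   # 0 = joint training from the first epoch
    if epoch < warmup_epochs:
        return False, 1.0  # phase 1: FiLM MLP frozen
    return True, 0.1       # phase 2: unfrozen, learning rate scaled by 0.1


# A vertical step edge versus a flat grey prediction.
target = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
flat = [[0.5] * 4 for _ in range(4)]
print(total_loss(flat, target, edge_weight=0.0))
print(total_loss(flat, target))
print([film_phase(e) for e in (0, 49, 50)])
```

On the flat prediction the edge term dominates the total, which is the intent of the extra loss: a blurry output that is close in MSE is still penalised for missing the edge.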
Diffstat (limited to 'cnn_v3/docs/HOW_TO_CNN.md')
| -rw-r--r-- | cnn_v3/docs/HOW_TO_CNN.md | 41 |
1 files changed, 17 insertions, 24 deletions
diff --git a/cnn_v3/docs/HOW_TO_CNN.md b/cnn_v3/docs/HOW_TO_CNN.md
index f5f1b1a..09db97c 100644
--- a/cnn_v3/docs/HOW_TO_CNN.md
+++ b/cnn_v3/docs/HOW_TO_CNN.md
@@ -28,26 +28,13 @@ CNN v3 is a 2-level U-Net with FiLM conditioning, designed to run in real-time a

 **Architecture:**

-```
-Input: 20-channel G-buffer feature textures (rgba32uint)
-  │
-  enc0 ──── Conv(20→4, 3×3) + FiLM + ReLU ┐ full res
-  │ ↘ skip │
-  enc1 ──── AvgPool2×2 + Conv(4→8, 3×3) + FiLM ┐ ½ res
-  │ ↘ skip │
-  bottleneck AvgPool2×2 + Conv(8→8, 1×1) + ReLU ¼ res (no FiLM)
-  │ │
-  dec1 ←── upsample×2 + cat(enc1 skip) + Conv(16→4, 3×3) + FiLM
-  │ │ ½ res
-  dec0 ←── upsample×2 + cat(enc0 skip) + Conv(8→4, 3×3) + FiLM + sigmoid
-  full res → RGBA output
-```
+[architecture PNG embed]

 **FiLM MLP:** `Linear(5→16) → ReLU → Linear(16→40)` trained jointly with U-Net.
 - Input: `[beat_phase, beat_norm, audio_intensity, style_p0, style_p1]`
 - Output: 40 γ/β values controlling style across all 4 FiLM layers

-**Weight budget:** ~3.9 KB f16 (fits ≤6 KB target)
+**Weight budget:** ~4.84 KB f16 conv (fits ≤6 KB target)

 **Two data paths:**
 - **Simple mode** — real photos with zeroed geometric channels (normal, depth, matid)
@@ -307,7 +294,9 @@ uv run train_cnn_v3.py --input dataset/ --epochs 1 \
 uv run train_cnn_v3.py \
     --input dataset/ \
     --input-mode simple \
-    --epochs 200
+    --epochs 200 \
+    --edge-loss-weight 0.1 \
+    --film-warmup-epochs 50
 ```

 **Blender G-buffer training:**
@@ -315,7 +304,9 @@ uv run train_cnn_v3.py \
 uv run train_cnn_v3.py \
     --input dataset/ \
     --input-mode full \
-    --epochs 200
+    --epochs 200 \
+    --edge-loss-weight 0.1 \
+    --film-warmup-epochs 50
 ```

 **Full-image mode (better global coherence, slower):**
@@ -360,12 +351,14 @@ uv run train_cnn_v3.py \
 | `--checkpoint-dir DIR` | `checkpoints/` | Set per-experiment |
 | `--checkpoint-every N` | 50 | 0 to disable intermediate checkpoints |
 | `--resume [CKPT]` | — | Resume from checkpoint path; if path missing, uses latest in `--checkpoint-dir` |
+| `--edge-loss-weight F` | 0.1 | Sobel gradient loss weight alongside MSE; improves style/edge capture; 0=MSE only |
+| `--film-warmup-epochs N` | 50 | Freeze FiLM MLP for first N epochs (phase-1), then unfreeze at lr×0.1; 0=joint training |

 ### Architecture at startup

 The model prints its parameter count:
 ```
-Model: enc=[4, 8] film_cond_dim=5 params=2740 (~5.4 KB f16)
+Model: enc=[4, 8] film_cond_dim=5 params=3252 (~6.4 KB f16)
 ```

 If `params` is much higher, `--enc-channels` was changed; update C++ constants accordingly.
@@ -489,7 +482,7 @@ Use `--html-output PATH` to write to a different `weights.js` location.

 Output files are registered in `workspaces/main/assets.txt` as:
 ```
-WEIGHTS_CNN_V3, BINARY, weights/cnn_v3_weights.bin, "CNN v3 conv weights (f16, 3928 bytes)"
+WEIGHTS_CNN_V3, BINARY, weights/cnn_v3_weights.bin, "CNN v3 conv weights (f16, 4952 bytes)"
 WEIGHTS_CNN_V3_FILM_MLP, BINARY, weights/cnn_v3_film_mlp.bin, "CNN v3 FiLM MLP weights (f32, 3104 bytes)"
 ```

@@ -501,10 +494,10 @@ WEIGHTS_CNN_V3_FILM_MLP, BINARY, weights/cnn_v3_film_mlp.bin, "CNN v3 FiLM MLP w
 |-------|-----------|-------|
 | enc0 Conv(20→4,3×3)+bias | 724 | — |
 | enc1 Conv(4→8,3×3)+bias | 296 | — |
-| bottleneck Conv(8→8,1×1)+bias | 72 | — |
+| bottleneck Conv(8→8,3×3,dil=2)+bias | 584 | — |
 | dec1 Conv(16→4,3×3)+bias | 580 | — |
 | dec0 Conv(8→4,3×3)+bias | 292 | — |
-| **Total** | **1964 f16** | **3928 bytes** |
+| **Total** | **2476 f16** | **4952 bytes** |

 **`cnn_v3_film_mlp.bin`** — FiLM MLP weights as raw f32, row-major:

@@ -534,8 +527,8 @@ Checkpoint: epoch=200 loss=0.012345 enc_channels=[4, 8] film_cond_dim=5

 cnn_v3_weights.bin
-  1964 f16 values → 982 u32 → 3928 bytes
-  Upload via CNNv3Effect::upload_weights(queue, data, 3928)
+  2476 f16 values → 1238 u32 → 4952 bytes
+  Upload via CNNv3Effect::upload_weights(queue, data, 4952)

 cnn_v3_film_mlp.bin
   L0: weight (16, 5) + bias (16,)
@@ -824,7 +817,7 @@ all geometric channels (normal, depth, depth_grad, mat_id, prev) = 0.
 ### Pitfalls

 - `rgba32uint` and `rgba16float` textures both need `STORAGE_BINDING | TEXTURE_BINDING` usage.
-- Weight offsets are **f16 indices** (enc0=0, enc1=724, bn=1020, dec1=1092, dec0=1672).
+- Weight offsets are **f16 indices** (enc0=0, enc1=724, bn=1020, dec1=1604, dec0=2184).
 - Uniform buffer layouts must match WGSL `Params` structs exactly (padding included).

 ---
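
The new f16 offsets in the Pitfalls line follow directly from the per-layer sizes in the weight table. The sketch below recomputes them, assuming each layer is stored back-to-back as `out×in×k×k` weights plus one bias per output channel, which is what the table's counts imply. The names `LAYERS`, `conv_f16_count` and `layout` are illustrative and do not come from the repository.

```python
# Conv layers in packing order: (name, in_channels, out_channels, kernel_size).
# Dilation changes the sampling pattern, not the number of parameters.
LAYERS = [
    ("enc0", 20, 4, 3),
    ("enc1", 4, 8, 3),
    ("bn", 8, 8, 3),    # 3x3 with dilation=2; the old 1x1 version was 72 f16
    ("dec1", 16, 4, 3),
    ("dec0", 8, 4, 3),
]


def conv_f16_count(in_ch, out_ch, k):
    """Weights (out * in * k * k) plus one bias per output channel."""
    return out_ch * in_ch * k * k + out_ch


def layout(layers):
    """Return ({name: f16 offset}, total f16 count) for back-to-back packing."""
    offsets, cursor = {}, 0
    for name, in_ch, out_ch, k in layers:
        offsets[name] = cursor
        cursor += conv_f16_count(in_ch, out_ch, k)
    return offsets, cursor


offsets, total_f16 = layout(LAYERS)
total_bytes = total_f16 * 2   # 2 bytes per f16
total_u32 = total_f16 // 2    # two f16 values packed per u32

print(offsets)
print(total_f16, total_u32, total_bytes)
```

Because every offset after `bn` depends on the bottleneck size, changing the kernel from 1×1 to 3×3 shifts `dec1` and `dec0` by 512 f16 each, which is why the commit has to cascade the offsets through the C++ effect, the JS tool and the export scripts.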
