path: root/cnn_v3/docs/CNN_V3.md
author: skal <pascal.massimino@gmail.com> 2026-03-27 07:59:00 +0100
committer: skal <pascal.massimino@gmail.com> 2026-03-27 07:59:00 +0100
commit: fb13e67acbc7d7dd2974a456fcb134966c47cee0 (patch)
tree: 8dd1c6df371b0ee046792680a14c8bcb3c36510b /cnn_v3/docs/CNN_V3.md
parent: 8c5e41724fdfc3be24e95f48ae4b2be616404074 (diff)
fix(cnn_v3): remove dec0 ReLU, load FiLM MLP at runtime
Two bugs blocking training convergence:

1. dec0 ReLU before sigmoid constrained output to [0.5, 1.0]; the network
   could never produce dark pixels. Removed F.relu in train_cnn_v3.py and
   max(0,…) in cnn_v3_dec0.wgsl. Test vectors regenerated.

2. set_film_params() used hardcoded heuristics instead of the trained MLP.
   Added CNNv3FilmMlp struct + load_film_mlp() to cnn_v3_effect.h/.cc. The
   MLP is auto-loaded from ASSET_WEIGHTS_CNN_V3_FILM_MLP at construction;
   Linear(5→16) → ReLU → Linear(16→72) runs CPU-side each frame.

36/36 tests pass. Parity max_err=4.88e-4 unchanged.

handoff(Gemini): retrain from scratch; needs ≥50 samples (currently 11).
See cnn_v3/docs/HOWTO.md §2-3.
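The two fixes can be sketched in a few lines of NumPy. The first part shows why ReLU before sigmoid clamps the output to [0.5, 1.0): ReLU makes the pre-activation non-negative, and sigmoid(0) = 0.5. The second part mirrors the FiLM MLP shape (Linear(5→16) → ReLU → Linear(16→72)) with random placeholder weights standing in for ASSET_WEIGHTS_CNN_V3_FILM_MLP; the weight values and the 5-element input are illustrative, not the trained ones.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# --- Bug 1: ReLU before sigmoid cannot produce dark pixels ---
x = np.linspace(-6.0, 6.0, 1001)          # pre-activation values
buggy = sigmoid(np.maximum(0.0, x))       # ReLU then sigmoid: min is 0.5
fixed = sigmoid(x)                        # sigmoid only: full (0, 1) range

print(buggy.min())  # 0.5 — no output below mid-gray
print(fixed.min())  # ~0.0025 — dark pixels reachable

# --- Bug 2 (shape only): the FiLM MLP run CPU-side each frame ---
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((16, 5)), np.zeros(16)   # Linear(5 -> 16)
W2, b2 = rng.standard_normal((72, 16)), np.zeros(72)  # Linear(16 -> 72)

def film_mlp(params):
    """5 control parameters -> 72 FiLM coefficients (gamma/beta)."""
    h = np.maximum(0.0, W1 @ params + b1)  # ReLU
    return W2 @ h + b2

coeffs = film_mlp(np.array([0.5, 0.1, 0.9, 0.3, 0.7]))
print(coeffs.shape)  # (72,)
```

How the 72 coefficients are split into per-level gamma/beta vectors is not shown here; see the FiLM section of CNN_V3.md in the diff below.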
Diffstat (limited to 'cnn_v3/docs/CNN_V3.md')
-rw-r--r--  cnn_v3/docs/CNN_V3.md | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/cnn_v3/docs/CNN_V3.md b/cnn_v3/docs/CNN_V3.md
index a197a1d..081adf8 100644
--- a/cnn_v3/docs/CNN_V3.md
+++ b/cnn_v3/docs/CNN_V3.md
@@ -19,7 +19,7 @@ CNN v3 is a next-generation post-processing effect using:
- Training from both Blender renders and real photos
- Strict test framework: per-pixel bit-exact validation across all implementations
-**Status:** Phases 1–7 complete. Architecture upgraded to enc_channels=[8,16] for improved capacity. Parity test and runtime updated. Next: training pass.
+**Status:** Phases 1–9 complete. Architecture upgraded to enc_channels=[8,16]. Two training bugs fixed (dec0 ReLU removed; FiLM MLP loaded at runtime). Parity validated. Next: retrain from scratch with more data.
---
@@ -34,9 +34,13 @@ FiLM is applied **inside each encoder/decoder block**, after each convolution.
### U-Net Block (per level)
```
-input → Conv 3×3 → BN (or none) → FiLM(γ,β) → ReLU → output
+enc0/enc1/dec1: input → Conv 3×3 → FiLM(γ,β) → ReLU → output
+dec0 (final): input → Conv 3×3 → FiLM(γ,β) → Sigmoid → output
```
+The final decoder layer uses sigmoid directly — **no ReLU** — so the network
+can output the full [0,1] range. ReLU before sigmoid would clamp to [0.5,1.0].
+
FiLM at level `l`:
```
FiLM(x, γ_l, β_l) = γ_l ⊙ x + β_l (per-channel affine)