From fb13e67acbc7d7dd2974a456fcb134966c47cee0 Mon Sep 17 00:00:00 2001
From: skal
Date: Fri, 27 Mar 2026 07:59:00 +0100
Subject: fix(cnn_v3): remove dec0 ReLU, load FiLM MLP at runtime
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two bugs blocking training convergence:

1. dec0 ReLU before sigmoid constrained output to [0.5,1.0] — network
   could never produce dark pixels. Removed F.relu in train_cnn_v3.py
   and max(0,…) in cnn_v3_dec0.wgsl. Test vectors regenerated.

2. set_film_params() used hardcoded heuristics instead of the trained
   MLP. Added CNNv3FilmMlp struct + load_film_mlp() to
   cnn_v3_effect.h/.cc. MLP auto-loaded from
   ASSET_WEIGHTS_CNN_V3_FILM_MLP at construction;
   Linear(5→16)→ReLU→Linear(16→72) runs CPU-side each frame.

36/36 tests pass. Parity max_err=4.88e-4 unchanged.

handoff(Gemini): retrain from scratch — needs ≥50 samples (currently 11).
See cnn_v3/docs/HOWTO.md §2-3.
---
 cnn_v3/docs/HOWTO.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

(limited to 'cnn_v3/docs/HOWTO.md')

diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index ff8793f..67f7931 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -371,7 +371,9 @@ cnn_v3_effect->set_film_params(
     style_p0, style_p1);
 ```
-FiLM γ/β default to identity (γ=1, β=0) until `train_cnn_v3.py` produces a trained MLP.
+FiLM MLP weights are auto-loaded from `ASSET_WEIGHTS_CNN_V3_FILM_MLP` at construction.
+The MLP forward pass (`Linear(5→16)→ReLU→Linear(16→72)`) runs CPU-side in `set_film_params()`.
+Falls back to identity (γ=1, β=0) if no `.bin` is present.
 ---
@@ -407,6 +409,7 @@ Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch referen
 | 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
 | 8 — Architecture upgrade [8,16] | ✅ Done | enc_channels=[8,16], multi-scale loss, 16ch textures split into lo/hi pairs |
 | 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |
+| 9 — Training bug fixes | ✅ Done | dec0 ReLU removed (output unblocked); FiLM MLP loaded at runtime |
 
 ---
 
@@ -428,7 +431,8 @@ The common snippet provides `get_w()` and `unpack_8ch()`.
 - AvgPool 2×2 for downsampling (exact, deterministic)
 - Nearest-neighbor for upsampling (integer `coord / 2`)
 - Skip connections: channel concatenation (not add)
-- FiLM applied after conv+bias, before ReLU: `max(0, γ·x + β)`
+- FiLM applied after conv+bias, before ReLU: `max(0, γ·x + β)` (enc0/enc1/dec1)
+- dec0 final layer: FiLM then sigmoid directly — **no ReLU** (`sigmoid(γ·x + β)`)
 - No batch norm at inference
 - Weight layout: OIHW (out × in × kH × kW), biases after conv weights
-- 
cgit v1.2.3
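
The two fixes above can be sketched numerically. This is an illustrative NumPy reproduction, not the project's code: the random weights and the `film_mlp` helper are invented for demonstration; only the layer shapes (`Linear(5→16)→ReLU→Linear(16→72)`) and the ReLU-before-sigmoid range argument come from the commit message.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Bug 1: a ReLU before sigmoid clamps the pre-activation to [0, inf),
# so sigmoid can only reach [0.5, 1.0) -- dark pixels are unreachable.
x = np.linspace(-6.0, 6.0, 1001)
buggy = sigmoid(np.maximum(0.0, x))   # old dec0 path: sigmoid(relu(x))
fixed = sigmoid(x)                    # dec0 after the fix: sigmoid(x)
assert buggy.min() >= 0.5             # output floor stuck at mid-gray
assert fixed.min() < 0.5              # full [0, 1] range restored

# Bug 2 sketch: the FiLM MLP forward pass, Linear(5->16)->ReLU->Linear(16->72),
# small enough to run CPU-side each frame. Weight values are random placeholders.
rng = np.random.default_rng(0)
W0, b0 = rng.standard_normal((16, 5)), np.zeros(16)
W1, b1 = rng.standard_normal((72, 16)), np.zeros(72)

def film_mlp(style_params):
    h = np.maximum(0.0, W0 @ style_params + b0)  # Linear(5->16) + ReLU
    return W1 @ h + b1                           # Linear(16->72): per-channel gamma/beta

out = film_mlp(np.array([0.5, 0.2, 0.1, 0.7, 0.3]))
assert out.shape == (72,)
```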