Diffstat (limited to 'cnn_v3/docs')

 -rw-r--r--  cnn_v3/docs/CNN_V3.md | 8
 -rw-r--r--  cnn_v3/docs/HOWTO.md  | 8

 2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/cnn_v3/docs/CNN_V3.md b/cnn_v3/docs/CNN_V3.md
index a197a1d..081adf8 100644
--- a/cnn_v3/docs/CNN_V3.md
+++ b/cnn_v3/docs/CNN_V3.md
@@ -19,7 +19,7 @@ CNN v3 is a next-generation post-processing effect using:
- Training from both Blender renders and real photos
- Strict test framework: per-pixel bit-exact validation across all implementations
-**Status:** Phases 1–7 complete. Architecture upgraded to enc_channels=[8,16] for improved capacity. Parity test and runtime updated. Next: training pass.
+**Status:** Phases 1–9 complete. Architecture upgraded to enc_channels=[8,16]. Two training bugs fixed (dec0 ReLU removed; FiLM MLP loaded at runtime). Parity validated. Next: retrain from scratch with more data.
---
@@ -34,9 +34,13 @@ FiLM is applied **inside each encoder/decoder block**, after each convolution.
### U-Net Block (per level)
```
-input → Conv 3×3 → BN (or none) → FiLM(γ,β) → ReLU → output
+enc0/enc1/dec1: input → Conv 3×3 → FiLM(γ,β) → ReLU → output
+dec0 (final): input → Conv 3×3 → FiLM(γ,β) → Sigmoid → output
```
+The final decoder layer uses sigmoid directly — **no ReLU** — so the network
+can output the full [0,1] range. ReLU before sigmoid would clamp to [0.5,1.0].
+
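The range claim in the added paragraph can be checked numerically. A minimal sketch in plain NumPy (illustrative only, not the project's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10.0, 10.0, 2001)  # pre-activation values

# dec0 with a stray ReLU: ReLU output is >= 0, so sigmoid(relu(x)) >= 0.5
wrong = sigmoid(np.maximum(x, 0.0))
# dec0 as fixed: sigmoid alone covers the full (0, 1) range
fixed = sigmoid(x)

print(wrong.min())  # 0.5 -- the darker half of the output range is unreachable
print(fixed.min())  # ~4.5e-05 -- near-black outputs are reachable again
```

With the ReLU in place, every pre-activation below zero is flattened to zero before the sigmoid, so the smallest producible output is sigmoid(0) = 0.5; removing it restores the full range.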
FiLM at level `l`:
```
FiLM(x, γ_l, β_l) = γ_l ⊙ x + β_l (per-channel affine)
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index ff8793f..67f7931 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -371,7 +371,9 @@ cnn_v3_effect->set_film_params(
style_p0, style_p1);
```
-FiLM γ/β default to identity (γ=1, β=0) until `train_cnn_v3.py` produces a trained MLP.
+FiLM MLP weights are auto-loaded from `ASSET_WEIGHTS_CNN_V3_FILM_MLP` at construction.
+The MLP forward pass (`Linear(5→16)→ReLU→Linear(16→72)`) runs CPU-side in `set_film_params()`.
+Falls back to identity (γ=1, β=0) if no `.bin` is present.
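The CPU-side MLP forward pass described above is two matrix multiplies and a ReLU. A NumPy sketch of the same computation, with the 5→16→72 dimensions taken from the text; the weight argument names and the assumption that the 36 γ values precede the 36 β values in the output are illustrative, not confirmed by the source:

```python
import numpy as np

def film_mlp_forward(style, W0, b0, W1, b1):
    """Linear(5->16) -> ReLU -> Linear(16->72), as described above.

    style: (5,) conditioning vector; W0: (16, 5); W1: (72, 16).
    The 72 outputs are the concatenated per-level gamma/beta values
    (the exact per-level split depends on enc_channels; not shown here).
    """
    h = np.maximum(W0 @ style + b0, 0.0)  # hidden layer with ReLU
    return W1 @ h + b1                    # raw gamma/beta, no activation

# The identity fallback (gamma=1, beta=0) can be encoded as zero weights
# plus a bias of ones-then-zeros -- ASSUMING gammas come before betas:
style = np.random.default_rng(0).standard_normal(5)
out = film_mlp_forward(style,
                       np.zeros((16, 5)), np.zeros(16),
                       np.zeros((72, 16)), np.r_[np.ones(36), np.zeros(36)])
```

With zero weights the style vector is ignored entirely, so every channel receives γ=1, β=0 regardless of input, matching the documented fallback behavior.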
---
@@ -407,6 +409,7 @@ Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch referen
| 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
| 8 — Architecture upgrade [8,16] | ✅ Done | enc_channels=[8,16], multi-scale loss, 16ch textures split into lo/hi pairs |
| 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |
+| 9 — Training bug fixes | ✅ Done | dec0 ReLU removed (output unblocked); FiLM MLP loaded at runtime |
---
@@ -428,7 +431,8 @@ The common snippet provides `get_w()` and `unpack_8ch()`.
- AvgPool 2×2 for downsampling (exact, deterministic)
- Nearest-neighbor for upsampling (integer `coord / 2`)
- Skip connections: channel concatenation (not add)
-- FiLM applied after conv+bias, before ReLU: `max(0, γ·x + β)`
+- FiLM applied after conv+bias, before ReLU: `max(0, γ·x + β)` (enc0/enc1/dec1)
+- dec0 final layer: FiLM then sigmoid directly — **no ReLU** (`sigmoid(γ·x + β)`)
- No batch norm at inference
- Weight layout: OIHW (out × in × kH × kW), biases after conv weights
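The resampling rules in the list above are simple enough to pin down in a few lines. A NumPy sketch (illustrative, not the shader code) of the deterministic 2×2 average pool and the `coord / 2` nearest-neighbor upsample:

```python
import numpy as np

def avgpool2x2(x):
    """Exact, deterministic 2x2 average pooling (H and W must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_nearest(x):
    """Nearest-neighbor upsampling: output pixel reads x[coord // 2]."""
    h, w = x.shape
    rows = np.arange(2 * h) // 2  # 0,0,1,1,2,2,...
    cols = np.arange(2 * w) // 2
    return x[np.ix_(rows, cols)]

x = np.arange(16.0).reshape(4, 4)
pooled = avgpool2x2(x)            # (2, 2); top-left = mean(0, 1, 4, 5) = 2.5
restored = upsample_nearest(pooled)  # (4, 4), each pooled value repeated 2x2
```

Because both operations are pure integer indexing plus an exact mean, they give bit-identical results across implementations, which is what the per-pixel parity tests rely on.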