path: root/cnn_v3/docs/CNN_V3.md
author    skal <pascal.massimino@gmail.com>  2026-03-25 10:05:42 +0100
committer skal <pascal.massimino@gmail.com>  2026-03-25 10:05:42 +0100
commit    ce6e5b99f26e4e7c69a3cacf360bd0d492de928c (patch)
tree      a8d64b33a7ea1109b6b7e1043ced946cac416756 /cnn_v3/docs/CNN_V3.md
parent    8b4d7a49f038d7e849e6764dcc3abd1e1be01061 (diff)
feat(cnn_v3): 3×3 dilated bottleneck + Sobel loss + FiLM warmup + architecture PNG
- Replace 1×1 pointwise bottleneck with Conv(8→8, 3×3, dilation=2): effective RF grows from ~13px to ~29px at ¼ res (~+1 KB weights)
- Add Sobel edge loss in training (--edge-loss-weight, default 0.1)
- Add FiLM 2-phase training: freeze MLP for warmup epochs, then unfreeze at lr×0.1 (--film-warmup-epochs, default 50)
- Update weight layout: BN 72→584 f16, total 1964→2476 f16 (4952 B)
- Cascade offsets in C++ effect, JS tool, export/gen_test_vectors scripts
- Regenerate test_vectors.h (1238 u32); parity max_err=9.77e-04
- Generate dark-theme U-Net+FiLM architecture PNG (gen_architecture_png.py)
- Replace ASCII art in CNN_V3.md and HOW_TO_CNN.md with PNG embed

handoff(Gemini): bottleneck dilation + Sobel loss + FiLM warmup landed. Next: run first real training pass (see cnn_v3/docs/HOWTO.md §3).
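The ~13px → ~29px receptive-field claim can be sanity-checked with simple arithmetic: a K×K conv with dilation d spans (K−1)·d extra pixels at its own resolution, and the bottleneck runs at ¼ res, so each of its pixels covers 4 full-res pixels. A minimal sketch (the accounting model is an assumption; the commit's exact ~13px baseline is taken as given):

```python
# Sketch: receptive-field growth from swapping the 1x1 bottleneck for a
# dilated 3x3. Assumed model: a KxK conv with dilation d adds (K-1)*d
# pixels of RF at its own resolution; the bottleneck sits at 1/4 res,
# so multiply by a stride product of 4 to get full-res pixels.
def rf_contribution(kernel, dilation, stride_product):
    """Full-res pixels a single conv adds to the receptive field."""
    return (kernel - 1) * dilation * stride_product

old = rf_contribution(1, 1, 4)  # 1x1 pointwise: adds no spatial extent
new = rf_contribution(3, 2, 4)  # 3x3, dilation=2, at 1/4 res
print(old, new)                 # 0 16
# Consistent with the commit message: ~13 px + 16 px = ~29 px.
assert 13 + (new - old) == 29
```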
Diffstat (limited to 'cnn_v3/docs/CNN_V3.md')
-rw-r--r--  cnn_v3/docs/CNN_V3.md | 38
1 file changed, 6 insertions, 32 deletions
diff --git a/cnn_v3/docs/CNN_V3.md b/cnn_v3/docs/CNN_V3.md
index 4d58811..d775e2b 100644
--- a/cnn_v3/docs/CNN_V3.md
+++ b/cnn_v3/docs/CNN_V3.md
@@ -27,33 +27,7 @@ CNN v3 is a next-generation post-processing effect using:
### Pipeline Overview
-```
-G-Buffer (albedo, normal, depth, matID, UV)
- │
- ▼
- FiLM Conditioning
- (beat_time, audio_intensity, style_params)
- │ → γ[], β[] per channel
- ▼
- U-Net
- ┌─────────────────────────────────────────┐
- │ Encoder │
- │ enc0 (H×W, 4ch) ────────────skip──────┤
- │ ↓ down (avg pool 2×2) │
- │ enc1 (H/2×W/2, 8ch) ────────skip──────┤
- │ ↓ down │
- │ bottleneck (H/4×W/4, 8ch) │
- │ │
- │ Decoder │
- │ ↑ up (nearest ×2) + skip enc1 │
- │ dec1 (H/2×W/2, 4ch) │
- │ ↑ up + skip enc0 │
- │ dec0 (H×W, 4ch) │
- └─────────────────────────────────────────┘
- │
- ▼
- output RGBA (H×W)
-```
+![CNN v3 U-Net + FiLM Architecture](cnn_v3_architecture.png)
FiLM is applied **inside each encoder/decoder block**, after each convolution.
@@ -352,11 +326,11 @@ All f16, little-endian, same packing as v2 (`pack2x16float`).
|-----------|---------|------|-----------|
| enc0: Conv(20→4, 3×3) | 20×4×9=720 | +4 | 724 |
| enc1: Conv(4→8, 3×3) | 4×8×9=288 | +8 | 296 |
-| bottleneck: Conv(8→8, 1×1) | 8×8×1=64 | +8 | 72 |
+| bottleneck: Conv(8→8, 3×3, dil=2) | 8×8×9=576 | +8 | 584 |
| dec1: Conv(16→4, 3×3) | 16×4×9=576 | +4 | 580 |
| dec0: Conv(8→4, 3×3) | 8×4×9=288 | +4 | 292 |
| FiLM MLP (5→16→40) | 5×16+16×40=720 | +16+40 | 776 |
-| **Total** | | | **~3.9 KB f16** |
+| **Total conv** | | | **~4.84 KB f16** |
Skip connections: dec1 input = 8ch (bottleneck) + 8ch (enc1 skip) = 16ch.
dec0 input = 4ch (dec1) + 4ch (enc0 skip) = 8ch.
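The table's per-layer f16 counts follow directly from Cin×Cout×K² weights plus Cout biases, and can be recomputed to confirm the new 2476-f16 / 4952 B conv total from the commit message. A quick cross-check sketch:

```python
# Sketch: recompute the weight-table rows (f16 count = Cin*Cout*K*K + Cout bias).
def conv_f16(cin, cout, k):
    return cin * cout * k * k + cout

layers = {
    "enc0": conv_f16(20, 4, 3),       # 724
    "enc1": conv_f16(4, 8, 3),        # 296
    "bottleneck": conv_f16(8, 8, 3),  # 584 (was 72 with the 1x1)
    "dec1": conv_f16(16, 4, 3),       # 580
    "dec0": conv_f16(8, 4, 3),        # 292
}
total_conv = sum(layers.values())
print(total_conv, total_conv * 2)  # 2476 f16, 4952 bytes (~4.84 KB)
```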
@@ -541,7 +515,7 @@ class CNNv3(nn.Module):
nn.Conv2d(enc_channels[0], enc_channels[1], 3, padding=1),
])
# Bottleneck
- self.bottleneck = nn.Conv2d(enc_channels[1], enc_channels[1], 1)
+ self.bottleneck = nn.Conv2d(enc_channels[1], enc_channels[1], 3, padding=2, dilation=2)
# Decoder (skip connections: concat → double channels)
self.dec = nn.ModuleList([
nn.Conv2d(enc_channels[1]*2, enc_channels[0], 3, padding=1),
@@ -709,7 +683,7 @@ Parity results:
Pass 0: pack_gbuffer.wgsl — assemble G-buffer channels into storage texture
Pass 1: cnn_v3_enc0.wgsl — encoder level 0 (20→4ch, 3×3)
Pass 2: cnn_v3_enc1.wgsl — encoder level 1 (4→8ch, 3×3) + downsample
-Pass 3: cnn_v3_bottleneck.wgsl — bottleneck (8→8, 1×1)
+Pass 3: cnn_v3_bottleneck.wgsl — bottleneck (8→8, 3×3, dilation=2)
Pass 4: cnn_v3_dec1.wgsl — decoder level 1: upsample + skip + (16→4, 3×3)
Pass 5: cnn_v3_dec0.wgsl — decoder level 0: upsample + skip + (8→4, 3×3)
Pass 6: cnn_v3_output.wgsl — sigmoid + composite to framebuffer
@@ -816,7 +790,7 @@ Status bar shows which channels are loaded.
| `PACK_SHADER` | `STATIC_SHADER` | 20ch into feat_tex0 + feat_tex1 (rgba32uint each) |
| `ENC0_SHADER` | part of `CNN_SHADER` | Conv(20→4, 3×3) + FiLM + ReLU; writes enc0_tex |
| `ENC1_SHADER` | | Conv(4→8, 3×3) + FiLM + ReLU + avg_pool2×2; writes enc1_tex (half-res) |
-| `BOTTLENECK_SHADER` | | Conv(8→8, 1×1) + FiLM + ReLU; writes bn_tex |
+| `BOTTLENECK_SHADER` | | Conv(8→8, 3×3, dilation=2) + ReLU; writes bn_tex |
| `DEC1_SHADER` | | nearest upsample×2 + concat(bn, enc1_skip) + Conv(16→4, 3×3) + FiLM + ReLU |
| `DEC0_SHADER` | | nearest upsample×2 + concat(dec1, enc0_skip) + Conv(8→4, 3×3) + FiLM + ReLU |
| `OUTPUT_SHADER` | | Conv(4→4, 1×1) + sigmoid → composites to canvas |
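On the shader side, a dilation-2 3×3 conv changes only which texels each output reads: the nine tap offsets are the usual 3×3 neighbourhood scaled by the dilation, spanning a 5×5 window at quarter res. A sketch of the tap enumeration a `BOTTLENECK_SHADER` loop would follow (illustrative; the actual WGSL loop structure is not shown in this diff):

```python
# Sketch: tap offsets a dilation-2 3x3 convolution reads per output texel,
# in quarter-res texel units. Nine taps, spanning a 5x5 window.
DILATION = 2
taps = [(dx * DILATION, dy * DILATION)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
print(taps)  # [(-2, -2), (0, -2), (2, -2), ..., (2, 2)]
assert len(taps) == 9
xs = [dx for dx, _ in taps]
assert max(xs) - min(xs) == 4  # 5-texel span per axis
```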