path: root/cnn_v3/docs/CNN_V3.md
author    skal <pascal.massimino@gmail.com>  2026-03-25 10:05:42 +0100
committer skal <pascal.massimino@gmail.com>  2026-03-25 10:05:42 +0100
commit    ce6e5b99f26e4e7c69a3cacf360bd0d492de928c (patch)
tree      a8d64b33a7ea1109b6b7e1043ced946cac416756 /cnn_v3/docs/CNN_V3.md
parent    8b4d7a49f038d7e849e6764dcc3abd1e1be01061 (diff)
feat(cnn_v3): 3×3 dilated bottleneck + Sobel loss + FiLM warmup + architecture PNG
- Replace 1×1 pointwise bottleneck with Conv(8→8, 3×3, dilation=2): effective RF grows from ~13px to ~29px at ¼ res (~+1 KB weights)
- Add Sobel edge loss in training (--edge-loss-weight, default 0.1)
- Add FiLM 2-phase training: freeze MLP for warmup epochs, then unfreeze at lr×0.1 (--film-warmup-epochs, default 50)
- Update weight layout: BN 72→584 f16, total 1964→2476 f16 (4952 B)
- Cascade offsets in C++ effect, JS tool, export/gen_test_vectors scripts
- Regenerate test_vectors.h (1238 u32); parity max_err=9.77e-04
- Generate dark-theme U-Net+FiLM architecture PNG (gen_architecture_png.py)
- Replace ASCII art in CNN_V3.md and HOW_TO_CNN.md with PNG embed

handoff(Gemini): bottleneck dilation + Sobel loss + FiLM warmup landed. Next: run first real training pass (see cnn_v3/docs/HOWTO.md §3).
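The ~13px → ~29px receptive-field claim can be sanity-checked with simple arithmetic: a K×K conv with dilation d spans (K−1)·d extra pixels at its own resolution, and the bottleneck runs at ¼ res, so each of its pixels covers 4 full-res pixels. A minimal sketch (the accounting model is an assumption; the commit's exact ~13px baseline is taken as given):

```python
# Sketch: receptive-field growth from swapping the 1x1 bottleneck for a
# dilated 3x3. Assumed model: a KxK conv with dilation d adds (K-1)*d
# pixels of RF at its own resolution; the bottleneck sits at 1/4 res,
# so multiply by a stride product of 4 to get full-res pixels.
def rf_contribution(kernel, dilation, stride_product):
    """Full-res pixels a single conv adds to the receptive field."""
    return (kernel - 1) * dilation * stride_product

old = rf_contribution(1, 1, 4)  # 1x1 pointwise: adds no spatial extent
new = rf_contribution(3, 2, 4)  # 3x3, dilation=2, at 1/4 res
print(old, new)                 # 0 16
# Consistent with the commit message: ~13 px + 16 px = ~29 px.
assert 13 + (new - old) == 29
```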
Diffstat (limited to 'cnn_v3/docs/CNN_V3.md')
-rw-r--r--  cnn_v3/docs/CNN_V3.md | 38
1 file changed, 6 insertions, 32 deletions
diff --git a/cnn_v3/docs/CNN_V3.md b/cnn_v3/docs/CNN_V3.md
index 4d58811..d775e2b 100644
--- a/cnn_v3/docs/CNN_V3.md
+++ b/cnn_v3/docs/CNN_V3.md
@@ -27,33 +27,7 @@ CNN v3 is a next-generation post-processing effect using:
### Pipeline Overview
-```
-G-Buffer (albedo, normal, depth, matID, UV)
- │
- ▼
- FiLM Conditioning
- (beat_time, audio_intensity, style_params)
- │ → γ[], β[] per channel
- ▼
- U-Net
- ┌─────────────────────────────────────────┐
- │ Encoder │
- │ enc0 (H×W, 4ch) ────────────skip──────┤
- │ ↓ down (avg pool 2×2) │
- │ enc1 (H/2×W/2, 8ch) ────────skip──────┤
- │ ↓ down │
- │ bottleneck (H/4×W/4, 8ch) │
- │ │
- │ Decoder │
- │ ↑ up (nearest ×2) + skip enc1 │
- │ dec1 (H/2×W/2, 4ch) │
- │ ↑ up + skip enc0 │
- │ dec0 (H×W, 4ch) │
- └─────────────────────────────────────────┘
- │
- ▼
- output RGBA (H×W)
-```
+![CNN v3 U-Net + FiLM Architecture](cnn_v3_architecture.png)
FiLM is applied **inside each encoder/decoder block**, after each convolution.
@@ -352,11 +326,11 @@ All f16, little-endian, same packing as v2 (`pack2x16float`).
|-----------|---------|------|-----------|
| enc0: Conv(20→4, 3×3) | 20×4×9=720 | +4 | 724 |
| enc1: Conv(4→8, 3×3) | 4×8×9=288 | +8 | 296 |
-| bottleneck: Conv(8→8, 1×1) | 8×8×1=64 | +8 | 72 |
+| bottleneck: Conv(8→8, 3×3, dil=2) | 8×8×9=576 | +8 | 584 |
| dec1: Conv(16→4, 3×3) | 16×4×9=576 | +4 | 580 |
| dec0: Conv(8→4, 3×3) | 8×4×9=288 | +4 | 292 |
| FiLM MLP (5→16→40) | 5×16+16×40=720 | +16+40 | 776 |
-| **Total** | | | **~3.9 KB f16** |
+| **Total conv** | | | **~4.84 KB f16** |
Skip connections: dec1 input = 8ch (bottleneck) + 8ch (enc1 skip) = 16ch.
dec0 input = 4ch (dec1) + 4ch (enc0 skip) = 8ch.
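The table's per-layer f16 counts follow directly from Cin×Cout×K² weights plus Cout biases, and can be recomputed to confirm the new 2476-f16 / 4952 B conv total from the commit message. A quick cross-check sketch:

```python
# Sketch: recompute the weight-table rows (f16 count = Cin*Cout*K*K + Cout bias).
def conv_f16(cin, cout, k):
    return cin * cout * k * k + cout

layers = {
    "enc0": conv_f16(20, 4, 3),       # 724
    "enc1": conv_f16(4, 8, 3),        # 296
    "bottleneck": conv_f16(8, 8, 3),  # 584 (was 72 with the 1x1)
    "dec1": conv_f16(16, 4, 3),       # 580
    "dec0": conv_f16(8, 4, 3),        # 292
}
total_conv = sum(layers.values())
print(total_conv, total_conv * 2)  # 2476 f16, 4952 bytes (~4.84 KB)
```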
@@ -541,7 +515,7 @@ class CNNv3(nn.Module):
nn.Conv2d(enc_channels[0], enc_channels[1], 3, padding=1),
])
# Bottleneck
- self.bottleneck = nn.Conv2d(enc_channels[1], enc_channels[1], 1)
+ self.bottleneck = nn.Conv2d(enc_channels[1], enc_channels[1], 3, padding=2, dilation=2)
# Decoder (skip connections: concat → double channels)
self.dec = nn.ModuleList([
nn.Conv2d(enc_channels[1]*2, enc_channels[0], 3, padding=1),
@@ -709,7 +683,7 @@ Parity results:
Pass 0: pack_gbuffer.wgsl — assemble G-buffer channels into storage texture
Pass 1: cnn_v3_enc0.wgsl — encoder level 0 (20→4ch, 3×3)
Pass 2: cnn_v3_enc1.wgsl — encoder level 1 (4→8ch, 3×3) + downsample
-Pass 3: cnn_v3_bottleneck.wgsl — bottleneck (8→8, 1×1)
+Pass 3: cnn_v3_bottleneck.wgsl — bottleneck (8→8, 3×3, dilation=2)
Pass 4: cnn_v3_dec1.wgsl — decoder level 1: upsample + skip + (16→4, 3×3)
Pass 5: cnn_v3_dec0.wgsl — decoder level 0: upsample + skip + (8→4, 3×3)
Pass 6: cnn_v3_output.wgsl — sigmoid + composite to framebuffer
@@ -816,7 +790,7 @@ Status bar shows which channels are loaded.
| `PACK_SHADER` | `STATIC_SHADER` | 20ch into feat_tex0 + feat_tex1 (rgba32uint each) |
| `ENC0_SHADER` | part of `CNN_SHADER` | Conv(20→4, 3×3) + FiLM + ReLU; writes enc0_tex |
| `ENC1_SHADER` | | Conv(4→8, 3×3) + FiLM + ReLU + avg_pool2×2; writes enc1_tex (half-res) |
-| `BOTTLENECK_SHADER` | | Conv(8→8, 1×1) + FiLM + ReLU; writes bn_tex |
+| `BOTTLENECK_SHADER` | | Conv(8→8, 3×3, dilation=2) + ReLU; writes bn_tex |
| `DEC1_SHADER` | | nearest upsample×2 + concat(bn, enc1_skip) + Conv(16→4, 3×3) + FiLM + ReLU |
| `DEC0_SHADER` | | nearest upsample×2 + concat(dec1, enc0_skip) + Conv(8→4, 3×3) + FiLM + ReLU |
| `OUTPUT_SHADER` | | Conv(4→4, 1×1) + sigmoid → composites to canvas |
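On the shader side, a dilation-2 3×3 conv changes only which texels each output reads: the nine tap offsets are the usual 3×3 neighbourhood scaled by the dilation, spanning a 5×5 window at quarter res. A sketch of the tap enumeration a `BOTTLENECK_SHADER` loop would follow (illustrative; the actual WGSL loop structure is not shown in this diff):

```python
# Sketch: tap offsets a dilation-2 3x3 convolution reads per output texel,
# in quarter-res texel units. Nine taps, spanning a 5x5 window.
DILATION = 2
taps = [(dx * DILATION, dy * DILATION)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
print(taps)  # [(-2, -2), (0, -2), (2, -2), ..., (2, 2)]
assert len(taps) == 9
xs = [dx for dx, _ in taps]
assert max(xs) - min(xs) == 4  # 5-texel span per axis
```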