Diffstat (limited to 'cnn_v3/docs/CNN_V3.md')
| -rw-r--r-- | cnn_v3/docs/CNN_V3.md | 38 |
1 file changed, 6 insertions, 32 deletions
diff --git a/cnn_v3/docs/CNN_V3.md b/cnn_v3/docs/CNN_V3.md
index 4d58811..d775e2b 100644
--- a/cnn_v3/docs/CNN_V3.md
+++ b/cnn_v3/docs/CNN_V3.md
@@ -27,33 +27,7 @@ CNN v3 is a next-generation post-processing effect using:
 ### Pipeline Overview
-```
-G-Buffer (albedo, normal, depth, matID, UV)
-        │
-        ▼
-   FiLM Conditioning
-   (beat_time, audio_intensity, style_params)
-        │ → γ[], β[] per channel
-        ▼
-   U-Net
-┌─────────────────────────────────────────┐
-│ Encoder                                 │
-│  enc0 (H×W, 4ch) ────────────skip──────┤
-│   ↓ down (avg pool 2×2)                 │
-│  enc1 (H/2×W/2, 8ch) ────────skip──────┤
-│   ↓ down                                │
-│  bottleneck (H/4×W/4, 8ch)              │
-│                                         │
-│ Decoder                                 │
-│   ↑ up (nearest ×2) + skip enc1         │
-│  dec1 (H/2×W/2, 4ch)                    │
-│   ↑ up + skip enc0                      │
-│  dec0 (H×W, 4ch)                        │
-└─────────────────────────────────────────┘
-        │
-        ▼
-  output RGBA (H×W)
-```
+FiLM is applied **inside each encoder/decoder block**, after each convolution.
@@ -352,11 +326,11 @@ All f16, little-endian, same packing as v2 (`pack2x16float`).
 |-----------|---------|------|-----------|
 | enc0: Conv(20→4, 3×3) | 20×4×9=720 | +4 | 724 |
 | enc1: Conv(4→8, 3×3) | 4×8×9=288 | +8 | 296 |
-| bottleneck: Conv(8→8, 1×1) | 8×8×1=64 | +8 | 72 |
+| bottleneck: Conv(8→8, 3×3, dil=2) | 8×8×9=576 | +8 | 584 |
 | dec1: Conv(16→4, 3×3) | 16×4×9=576 | +4 | 580 |
 | dec0: Conv(8→4, 3×3) | 8×4×9=288 | +4 | 292 |
 | FiLM MLP (5→16→40) | 5×16+16×40=720 | +16+40 | 776 |
-| **Total** | | | **~3.9 KB f16** |
+| **Total conv** | | | **~4.84 KB f16** |
 Skip connections: dec1 input = 8ch (bottleneck) + 8ch (enc1 skip) = 16ch. dec0 input = 4ch (dec1) + 4ch (enc0 skip) = 8ch.
@@ -541,7 +515,7 @@ class CNNv3(nn.Module):
             nn.Conv2d(enc_channels[0], enc_channels[1], 3, padding=1),
         ])
         # Bottleneck
-        self.bottleneck = nn.Conv2d(enc_channels[1], enc_channels[1], 1)
+        self.bottleneck = nn.Conv2d(enc_channels[1], enc_channels[1], 3, padding=2, dilation=2)
         # Decoder (skip connections: concat → double channels)
         self.dec = nn.ModuleList([
             nn.Conv2d(enc_channels[1]*2, enc_channels[0], 3, padding=1),
@@ -709,7 +683,7 @@ Parity results:
 Pass 0: pack_gbuffer.wgsl — assemble G-buffer channels into storage texture
 Pass 1: cnn_v3_enc0.wgsl — encoder level 0 (20→4ch, 3×3)
 Pass 2: cnn_v3_enc1.wgsl — encoder level 1 (4→8ch, 3×3) + downsample
-Pass 3: cnn_v3_bottleneck.wgsl — bottleneck (8→8, 1×1)
+Pass 3: cnn_v3_bottleneck.wgsl — bottleneck (8→8, 3×3, dilation=2)
 Pass 4: cnn_v3_dec1.wgsl — decoder level 1: upsample + skip + (16→4, 3×3)
 Pass 5: cnn_v3_dec0.wgsl — decoder level 0: upsample + skip + (8→4, 3×3)
 Pass 6: cnn_v3_output.wgsl — sigmoid + composite to framebuffer
@@ -816,7 +790,7 @@ Status bar shows which channels are loaded.
 | `PACK_SHADER` | `STATIC_SHADER` | 20ch into feat_tex0 + feat_tex1 (rgba32uint each) |
 | `ENC0_SHADER` | part of `CNN_SHADER` | Conv(20→4, 3×3) + FiLM + ReLU; writes enc0_tex |
 | `ENC1_SHADER` | | Conv(4→8, 3×3) + FiLM + ReLU + avg_pool2×2; writes enc1_tex (half-res) |
-| `BOTTLENECK_SHADER` | | Conv(8→8, 1×1) + FiLM + ReLU; writes bn_tex |
+| `BOTTLENECK_SHADER` | | Conv(8→8, 3×3, dilation=2) + ReLU; writes bn_tex |
 | `DEC1_SHADER` | | nearest upsample×2 + concat(bn, enc1_skip) + Conv(16→4, 3×3) + FiLM + ReLU |
 | `DEC0_SHADER` | | nearest upsample×2 + concat(dec1, enc0_skip) + Conv(8→4, 3×3) + FiLM + ReLU |
 | `OUTPUT_SHADER` | | Conv(4→4, 1×1) + sigmoid → composites to canvas |
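As a sanity check on the updated parameter table, the new totals can be recomputed directly (a quick sketch, assuming f16 = 2 bytes per parameter and the layer shapes exactly as listed in the diff):

```python
# Recompute the CNN v3 per-layer parameter counts from the table in the diff.
def conv_params(c_in, c_out, k):
    """Weights (c_in * c_out * k * k) plus one bias per output channel."""
    return c_in * c_out * k * k + c_out

layers = {
    "enc0":       conv_params(20, 4, 3),  # 724
    "enc1":       conv_params(4, 8, 3),   # 296
    "bottleneck": conv_params(8, 8, 3),   # 584 (dilation adds no parameters)
    "dec1":       conv_params(16, 4, 3),  # 580
    "dec0":       conv_params(8, 4, 3),   # 292
}
total_conv = sum(layers.values())         # 2476 params
film_mlp = 5 * 16 + 16 * 40 + 16 + 40    # 776 params

# f16 storage: 2 bytes per parameter.
print(total_conv, round(total_conv * 2 / 1024, 2))  # 2476, 4.84 KB
```

This also explains the renamed row: **Total conv** (2476 params ≈ 4.84 KB in f16) excludes the 776 FiLM MLP parameters, which the old **Total** label appeared to include but did not.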

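The motivation for swapping the 1×1 bottleneck for a 3×3 dilation-2 conv is receptive field: at H/4 resolution, a dilated 3×3 sees a 5×5 window at almost no extra cost. A rough sketch of the effect on the encoder path's receptive field, using the standard recurrence and assuming the layer order conv → 2×2 avg-pool → conv → 2×2 avg-pool → bottleneck (the exact ordering is inferred from the pass list, not stated in the diff):

```python
# Receptive field of one bottleneck output pixel, in input pixels.
# Recurrence: rf += (k_eff - 1) * jump; jump *= stride,
# where k_eff = dilation * (k - 1) + 1 is the effective kernel size.
def receptive_field(layers):
    rf, jump = 1, 1
    for k, stride, dilation in layers:
        rf += (dilation * (k - 1)) * jump
        jump *= stride
    return rf

# (kernel, stride, dilation) per stage, bottleneck last.
encoder = [(3, 1, 1), (2, 2, 1), (3, 1, 1), (2, 2, 1)]
rf_old = receptive_field(encoder + [(1, 1, 1)])  # 1x1 bottleneck -> 10 px
rf_new = receptive_field(encoder + [(3, 1, 2)])  # 3x3, dil=2     -> 26 px
```

Under these assumptions the change widens the encoder-side receptive field from 10 to 26 input pixels for 512 extra f16 parameters.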