Diffstat (limited to 'cnn_v3/docs/CNN_V3.md')
| -rw-r--r-- | cnn_v3/docs/CNN_V3.md | 38 |
1 file changed, 6 insertions, 32 deletions
diff --git a/cnn_v3/docs/CNN_V3.md b/cnn_v3/docs/CNN_V3.md
index 4d58811..d775e2b 100644
--- a/cnn_v3/docs/CNN_V3.md
+++ b/cnn_v3/docs/CNN_V3.md
@@ -27,33 +27,7 @@ CNN v3 is a next-generation post-processing effect using:
 ### Pipeline Overview
-```
-G-Buffer (albedo, normal, depth, matID, UV)
-        │
-        ▼
-   FiLM Conditioning
-   (beat_time, audio_intensity, style_params)
-        │ → γ[], β[] per channel
-        ▼
-   U-Net
-┌─────────────────────────────────────────┐
-│ Encoder                                 │
-│  enc0 (H×W, 4ch) ────────────skip──────┤
-│   ↓ down (avg pool 2×2)                 │
-│  enc1 (H/2×W/2, 8ch) ────────skip──────┤
-│   ↓ down                                │
-│  bottleneck (H/4×W/4, 8ch)              │
-│                                         │
-│ Decoder                                 │
-│   ↑ up (nearest ×2) + skip enc1         │
-│  dec1 (H/2×W/2, 4ch)                    │
-│   ↑ up + skip enc0                      │
-│  dec0 (H×W, 4ch)                        │
-└─────────────────────────────────────────┘
-        │
-        ▼
-  output RGBA (H×W)
-```
+FiLM is applied **inside each encoder/decoder block**, after each convolution.
@@ -352,11 +326,11 @@ All f16, little-endian, same packing as v2 (`pack2x16float`).
 |-----------|---------|------|-----------|
 | enc0: Conv(20→4, 3×3) | 20×4×9=720 | +4 | 724 |
 | enc1: Conv(4→8, 3×3) | 4×8×9=288 | +8 | 296 |
-| bottleneck: Conv(8→8, 1×1) | 8×8×1=64 | +8 | 72 |
+| bottleneck: Conv(8→8, 3×3, dil=2) | 8×8×9=576 | +8 | 584 |
 | dec1: Conv(16→4, 3×3) | 16×4×9=576 | +4 | 580 |
 | dec0: Conv(8→4, 3×3) | 8×4×9=288 | +4 | 292 |
 | FiLM MLP (5→16→40) | 5×16+16×40=720 | +16+40 | 776 |
-| **Total** | | | **~3.9 KB f16** |
+| **Total conv** | | | **~4.84 KB f16** |
 Skip connections: dec1 input = 8ch (bottleneck) + 8ch (enc1 skip) = 16ch. dec0 input = 4ch (dec1) + 4ch (enc0 skip) = 8ch.
@@ -541,7 +515,7 @@ class CNNv3(nn.Module):
             nn.Conv2d(enc_channels[0], enc_channels[1], 3, padding=1),
         ])
         # Bottleneck
-        self.bottleneck = nn.Conv2d(enc_channels[1], enc_channels[1], 1)
+        self.bottleneck = nn.Conv2d(enc_channels[1], enc_channels[1], 3, padding=2, dilation=2)
         # Decoder (skip connections: concat → double channels)
         self.dec = nn.ModuleList([
             nn.Conv2d(enc_channels[1]*2, enc_channels[0], 3, padding=1),
@@ -709,7 +683,7 @@ Parity results:
 Pass 0: pack_gbuffer.wgsl — assemble G-buffer channels into storage texture
 Pass 1: cnn_v3_enc0.wgsl — encoder level 0 (20→4ch, 3×3)
 Pass 2: cnn_v3_enc1.wgsl — encoder level 1 (4→8ch, 3×3) + downsample
-Pass 3: cnn_v3_bottleneck.wgsl — bottleneck (8→8, 1×1)
+Pass 3: cnn_v3_bottleneck.wgsl — bottleneck (8→8, 3×3, dilation=2)
 Pass 4: cnn_v3_dec1.wgsl — decoder level 1: upsample + skip + (16→4, 3×3)
 Pass 5: cnn_v3_dec0.wgsl — decoder level 0: upsample + skip + (8→4, 3×3)
 Pass 6: cnn_v3_output.wgsl — sigmoid + composite to framebuffer
@@ -816,7 +790,7 @@ Status bar shows which channels are loaded.
 | `PACK_SHADER` | `STATIC_SHADER` | 20ch into feat_tex0 + feat_tex1 (rgba32uint each) |
 | `ENC0_SHADER` | part of `CNN_SHADER` | Conv(20→4, 3×3) + FiLM + ReLU; writes enc0_tex |
 | `ENC1_SHADER` | | Conv(4→8, 3×3) + FiLM + ReLU + avg_pool2×2; writes enc1_tex (half-res) |
-| `BOTTLENECK_SHADER` | | Conv(8→8, 1×1) + FiLM + ReLU; writes bn_tex |
+| `BOTTLENECK_SHADER` | | Conv(8→8, 3×3, dilation=2) + ReLU; writes bn_tex |
 | `DEC1_SHADER` | | nearest upsample×2 + concat(bn, enc1_skip) + Conv(16→4, 3×3) + FiLM + ReLU |
 | `DEC0_SHADER` | | nearest upsample×2 + concat(dec1, enc0_skip) + Conv(8→4, 3×3) + FiLM + ReLU |
 | `OUTPUT_SHADER` | | Conv(4→4, 1×1) + sigmoid → composites to canvas |
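As a sanity check on the updated parameter table, the new totals can be recomputed directly (a quick sketch, assuming f16 = 2 bytes per parameter and the layer shapes exactly as listed in the diff):

```python
# Recompute the CNN v3 per-layer parameter counts from the table in the diff.
def conv_params(c_in, c_out, k):
    """Weights (c_in * c_out * k * k) plus one bias per output channel."""
    return c_in * c_out * k * k + c_out

layers = {
    "enc0":       conv_params(20, 4, 3),  # 724
    "enc1":       conv_params(4, 8, 3),   # 296
    "bottleneck": conv_params(8, 8, 3),   # 584 (dilation adds no parameters)
    "dec1":       conv_params(16, 4, 3),  # 580
    "dec0":       conv_params(8, 4, 3),   # 292
}
total_conv = sum(layers.values())         # 2476 params
film_mlp = 5 * 16 + 16 * 40 + 16 + 40    # 776 params

# f16 storage: 2 bytes per parameter.
print(total_conv, round(total_conv * 2 / 1024, 2))  # 2476, 4.84 KB
```

This also explains the renamed row: **Total conv** (2476 params ≈ 4.84 KB in f16) excludes the 776 FiLM MLP parameters, which the old **Total** label appeared to include but did not.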

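The motivation for swapping the 1×1 bottleneck for a 3×3 dilation-2 conv is receptive field: at H/4 resolution, a dilated 3×3 sees a 5×5 window at almost no extra cost. A rough sketch of the effect on the encoder path's receptive field, using the standard recurrence and assuming the layer order conv → 2×2 avg-pool → conv → 2×2 avg-pool → bottleneck (the exact ordering is inferred from the pass list, not stated in the diff):

```python
# Receptive field of one bottleneck output pixel, in input pixels.
# Recurrence: rf += (k_eff - 1) * jump; jump *= stride,
# where k_eff = dilation * (k - 1) + 1 is the effective kernel size.
def receptive_field(layers):
    rf, jump = 1, 1
    for k, stride, dilation in layers:
        rf += (dilation * (k - 1)) * jump
        jump *= stride
    return rf

# (kernel, stride, dilation) per stage, bottleneck last.
encoder = [(3, 1, 1), (2, 2, 1), (3, 1, 1), (2, 2, 1)]
rf_old = receptive_field(encoder + [(1, 1, 1)])  # 1x1 bottleneck -> 10 px
rf_new = receptive_field(encoder + [(3, 1, 2)])  # 3x3, dil=2     -> 26 px
```

Under these assumptions the change widens the encoder-side receptive field from 10 to 26 input pixels for 512 extra f16 parameters.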