Diffstat (limited to 'cnn_v3/docs')

 cnn_v3/docs/CNN_V3.md               | 87
 cnn_v3/docs/HOWTO.md                | 47
 cnn_v3/docs/HOW_TO_CNN.md           | 32
 cnn_v3/docs/cnn_v3_architecture.png | bin 254783 -> 256685 bytes
 cnn_v3/docs/gen_architecture_png.py | 18

 5 files changed, 98 insertions(+), 86 deletions(-)
diff --git a/cnn_v3/docs/CNN_V3.md b/cnn_v3/docs/CNN_V3.md
index d775e2b..a197a1d 100644
--- a/cnn_v3/docs/CNN_V3.md
+++ b/cnn_v3/docs/CNN_V3.md
@@ -19,7 +19,7 @@ CNN v3 is a next-generation post-processing effect using:
 - Training from both Blender renders and real photos
 - Strict test framework: per-pixel bit-exact validation across all implementations
 
-**Status:** Phases 1–5 complete. Parity validated (max_err=4.88e-4 ≤ 1/255). Next: `train_cnn_v3.py` for FiLM MLP training.
+**Status:** Phases 1–7 complete. Architecture upgraded to enc_channels=[8,16] for improved capacity. Parity test and runtime updated. Next: training pass.
 
 ---
 
@@ -52,14 +52,14 @@ A small MLP takes a conditioning vector `c` and outputs all γ/β:
 c = [beat_phase, beat_time/8, audio_intensity, style_p0, style_p1]  (5D)
   ↓ Linear(5 → 16) → ReLU
   ↓ Linear(16 → N_film_params)
-  → [γ_enc0(4ch), β_enc0(4ch), γ_enc1(8ch), β_enc1(8ch),
-     γ_dec1(4ch), β_dec1(4ch), γ_dec0(4ch), β_dec0(4ch)]
-  = 2 × (4+8+4+4) = 40 parameters output
+  → [γ_enc0(8ch), β_enc0(8ch), γ_enc1(16ch), β_enc1(16ch),
+     γ_dec1(8ch), β_dec1(8ch), γ_dec0(4ch), β_dec0(4ch)]
+  = 2 × (8+16+8+4) = 72 parameters output
 ```
 
 **Runtime cost:** trivial (one MLP forward pass per frame, CPU-side).
 **Training:** jointly trained with U-Net — backprop through FiLM to MLP.
-**Size:** MLP weights ~(5×16 + 16×40) × 2 bytes f16 ≈ 1.4 KB.
+**Size:** MLP weights ~(5×16 + 16×72) × 2 bytes f16 ≈ 2.5 KB.
 
 **Why FiLM instead of just uniform parameters?**
 - γ/β are per-channel, enabling fine-grained style control
@@ -318,22 +318,25 @@ All f16, little-endian, same packing as v2 (`pack2x16float`).
 ## Size Budget
 
-**CNN v3 target: ≤ 6 KB weights**
+**CNN v3 target: ≤ 6 KB weights (conv only); current arch prioritises quality**
 
-**Implemented architecture (fits ≤ 4 KB):**
+**Implemented architecture (enc_channels=[8,16] — ~15.3 KB conv f16):**
 
 | Component | Weights | Bias | Total f16 |
 |-----------|---------|------|-----------|
-| enc0: Conv(20→4, 3×3) | 20×4×9=720 | +4 | 724 |
-| enc1: Conv(4→8, 3×3) | 4×8×9=288 | +8 | 296 |
-| bottleneck: Conv(8→8, 3×3, dil=2) | 8×8×9=576 | +8 | 584 |
-| dec1: Conv(16→4, 3×3) | 16×4×9=576 | +4 | 580 |
-| dec0: Conv(8→4, 3×3) | 8×4×9=288 | +4 | 292 |
-| FiLM MLP (5→16→40) | 5×16+16×40=720 | +16+40 | 776 |
-| **Total conv** | | | **~4.84 KB f16** |
+| enc0: Conv(20→8, 3×3) | 20×8×9=1440 | +8 | 1448 |
+| enc1: Conv(8→16, 3×3) | 8×16×9=1152 | +16 | 1168 |
+| bottleneck: Conv(16→16, 3×3, dil=2) | 16×16×9=2304 | +16 | 2320 |
+| dec1: Conv(32→8, 3×3) | 32×8×9=2304 | +8 | 2312 |
+| dec0: Conv(16→4, 3×3) | 16×4×9=576 | +4 | 580 |
+| **Total conv** | | | **7828 f16 = ~15.3 KB** |
+| FiLM MLP (5→16→72) | 5×16+16×72=1232 | +16+72 | 1320 |
+| **Total incl. MLP** | | | **9148 f16 = ~17.9 KB** |
 
-Skip connections: dec1 input = 8ch (bottleneck) + 8ch (enc1 skip) = 16ch.
-dec0 input = 4ch (dec1) + 4ch (enc0 skip) = 8ch.
+Skip connections: dec1 input = 16ch (bottleneck up) + 16ch (enc1 skip) = 32ch.
+dec0 input = 8ch (dec1 up) + 8ch (enc0 skip) = 16ch.
+
+**Smaller variant (enc_channels=[4,8] — ~4.84 KB conv f16):** fits 6 KB target but has lower representational capacity. Train with `--enc-channels 4,8` if size-critical.
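The totals in the updated table can be re-derived from the conv shapes alone (each 3×3 conv contributes Cin×Cout×9 weights plus Cout biases; dilation does not change the count). A standalone sketch, using only numbers from the table above:

```python
# Recompute the f16 counts for enc_channels=[8,16] (shapes from the table above).
layers = {
    "enc0":       (20, 8, 9),   # Conv(20->8, 3x3)
    "enc1":       (8, 16, 9),   # Conv(8->16, 3x3)
    "bottleneck": (16, 16, 9),  # Conv(16->16, 3x3, dil=2): dilation doesn't add weights
    "dec1":       (32, 8, 9),   # Conv(32->8, 3x3); 32 = 16 bottleneck + 16 enc1 skip
    "dec0":       (16, 4, 9),   # Conv(16->4, 3x3); 16 = 8 dec1 + 8 enc0 skip
}
conv_f16 = sum(cin * cout * k + cout for (cin, cout, k) in layers.values())
mlp_f16 = (5 * 16 + 16) + (16 * 72 + 72)  # FiLM MLP 5->16->72, weights + biases
print(conv_f16, mlp_f16, conv_f16 + mlp_f16)       # 7828 1320 9148
print(f"{conv_f16 * 2 / 1024:.1f} KB conv f16")    # 15.3 KB conv f16
```

The same loop with `[4,8]` channel widths reproduces the ~4.84 KB figure for the smaller variant.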
 ---
 
@@ -507,7 +510,7 @@ All tests: max per-pixel per-channel absolute error ≤ 1/255 (PyTorch f32 vs We
 ```python
 class CNNv3(nn.Module):
-    def __init__(self, enc_channels=[4,8], film_cond_dim=5):
+    def __init__(self, enc_channels=[8,16], film_cond_dim=5):
         super().__init__()
         # Encoder
         self.enc = nn.ModuleList([
@@ -681,11 +684,11 @@ Parity results:
 ```
 Pass 0: pack_gbuffer.wgsl — assemble G-buffer channels into storage texture
-Pass 1: cnn_v3_enc0.wgsl — encoder level 0 (20→4ch, 3×3)
-Pass 2: cnn_v3_enc1.wgsl — encoder level 1 (4→8ch, 3×3) + downsample
-Pass 3: cnn_v3_bottleneck.wgsl — bottleneck (8→8, 3×3, dilation=2)
-Pass 4: cnn_v3_dec1.wgsl — decoder level 1: upsample + skip + (16→4, 3×3)
-Pass 5: cnn_v3_dec0.wgsl — decoder level 0: upsample + skip + (8→4, 3×3)
+Pass 1: cnn_v3_enc0.wgsl — encoder level 0 (20→8ch, 3×3)
+Pass 2: cnn_v3_enc1.wgsl — encoder level 1 (8→16ch, 3×3) + downsample
+Pass 3: cnn_v3_bottleneck.wgsl — bottleneck (16→16, 3×3, dilation=2)
+Pass 4: cnn_v3_dec1.wgsl — decoder level 1: upsample + skip + (32→8, 3×3)
+Pass 5: cnn_v3_dec0.wgsl — decoder level 0: upsample + skip + (16→4, 3×3)
 Pass 6: cnn_v3_output.wgsl — sigmoid + composite to framebuffer
 ```
 
@@ -788,11 +791,11 @@ Status bar shows which channels are loaded.
 | Shader | Replaces | Notes |
 |--------|----------|-------|
 | `PACK_SHADER` | `STATIC_SHADER` | 20ch into feat_tex0 + feat_tex1 (rgba32uint each) |
-| `ENC0_SHADER` | part of `CNN_SHADER` | Conv(20→4, 3×3) + FiLM + ReLU; writes enc0_tex |
-| `ENC1_SHADER` | | Conv(4→8, 3×3) + FiLM + ReLU + avg_pool2×2; writes enc1_tex (half-res) |
-| `BOTTLENECK_SHADER` | | Conv(8→8, 3×3, dilation=2) + ReLU; writes bn_tex |
-| `DEC1_SHADER` | | nearest upsample×2 + concat(bn, enc1_skip) + Conv(16→4, 3×3) + FiLM + ReLU |
-| `DEC0_SHADER` | | nearest upsample×2 + concat(dec1, enc0_skip) + Conv(8→4, 3×3) + FiLM + ReLU |
+| `ENC0_SHADER` | part of `CNN_SHADER` | Conv(20→8, 3×3) + FiLM + ReLU; writes enc0_tex (rgba32uint, 8ch) |
+| `ENC1_SHADER` | | Conv(8→16, 3×3) + FiLM + ReLU + avg_pool2×2; writes enc1_lo+enc1_hi (2× rgba32uint, 16ch split) |
+| `BOTTLENECK_SHADER` | | Conv(16→16, 3×3, dilation=2) + ReLU; writes bn_lo+bn_hi (2× rgba32uint, 16ch split) |
+| `DEC1_SHADER` | | nearest upsample×2 + concat(bn, enc1_skip) + Conv(32→8, 3×3) + FiLM + ReLU; writes dec1_tex (rgba32uint, 8ch) |
+| `DEC0_SHADER` | | nearest upsample×2 + concat(dec1, enc0_skip) + Conv(16→4, 3×3) + FiLM + ReLU; writes rgba16float |
 | `OUTPUT_SHADER` | | Conv(4→4, 1×1) + sigmoid → composites to canvas |
 
 FiLM γ/β computed JS-side from sliders (tiny MLP forward pass in JS), uploaded as uniform.
@@ -805,15 +808,15 @@ FiLM γ/β computed JS-side from sliders (tiny MLP forward pass in JS), uploaded
 |------|------|--------|----------|
 | `feat_tex0` | W×H | rgba32uint | feature buffer slots 0–7 (f16) |
 | `feat_tex1` | W×H | rgba32uint | feature buffer slots 8–19 (u8+spare) |
-| `enc0_tex` | W×H | rgba32uint | 4 channels f16 (enc0 output, skip) |
-| `enc1_tex` | W/2×H/2 | rgba32uint | 8 channels f16 (enc1 out, skip) — 2 texels per pixel |
-| `bn_tex` | W/2×H/2 | rgba32uint | 8 channels f16 (bottleneck output) |
-| `dec1_tex` | W×H | rgba32uint | 4 channels f16 (dec1 output) |
-| `dec0_tex` | W×H | rgba32uint | 4 channels f16 (dec0 output) |
+| `enc0_tex` | W×H | rgba32uint | 8 channels f16 (enc0 output, skip) |
+| `enc1_lo` + `enc1_hi` | W/2×H/2 each | rgba32uint | 16 channels f16 split (enc1 out, skip) |
+| `bn_lo` + `bn_hi` | W/4×H/4 each | rgba32uint | 16 channels f16 split (bottleneck output) |
+| `dec1_tex` | W/2×H/2 | rgba32uint | 8 channels f16 (dec1 output) |
+| `dec0_tex` | W×H | rgba16float | 4 channels f16 (final RGBA output) |
 | `prev_tex` | W×H | rgba16float | previous CNN output (temporal, `F16X8`) |
 
-Skip connections: enc0_tex and enc1_tex are **kept alive** across the full forward pass
-(not ping-ponged away). DEC1 and DEC0 read them directly.
+Skip connections: enc0_tex (8ch) and enc1_lo/enc1_hi (16ch split) are **kept alive** across the
+full forward pass (not ping-ponged away). DEC1 and DEC0 read them directly.
 
 ---
 
@@ -856,7 +859,7 @@ python3 -m http.server 8000
 
 Ordered for parallel execution where possible. Phases 1 and 2 are independent.
 
-**Architecture locked:** enc_channels = [4, 8]. See Size Budget for weight counts.
+**Architecture:** enc_channels = [8, 16]. See Size Budget for weight counts.
 
 ---
 
@@ -881,7 +884,7 @@ before the real G-buffer exists. Wire real G-buffer in Phase 5.
 
 **1a. PyTorch model**
 - [ ] `cnn_v3/training/train_cnn_v3.py`
-  - [ ] `CNNv3` class: U-Net [4,8], FiLM MLP (5→16→48), channel dropout
+  - [ ] `CNNv3` class: U-Net [8,16], FiLM MLP (5→16→72), channel dropout
   - [ ] `GBufferDataset`: loads 20-channel feature tensors from packed PNGs
   - [ ] Training loop, checkpointing, grayscale/RGBA loss option
 
@@ -919,11 +922,11 @@ no batch norm at inference, `#include` existing snippets where possible.
 - writes feat_tex0 (f16×8) + feat_tex1 (u8×12, spare)
 
 **2b. U-Net compute shaders**
-- [ ] `src/effects/cnn_v3_enc0.wgsl` — Conv(20→4, 3×3) + FiLM + ReLU
-- [ ] `src/effects/cnn_v3_enc1.wgsl` — Conv(4→8, 3×3) + FiLM + ReLU + avg_pool 2×2
-- [ ] `src/effects/cnn_v3_bottleneck.wgsl` — Conv(8→8, 1×1) + FiLM + ReLU
-- [ ] `src/effects/cnn_v3_dec1.wgsl` — nearest upsample×2 + concat enc1_skip + Conv(16→4, 3×3) + FiLM + ReLU
-- [ ] `src/effects/cnn_v3_dec0.wgsl` — nearest upsample×2 + concat enc0_skip + Conv(8→4, 3×3) + FiLM + ReLU
+- [ ] `src/effects/cnn_v3_enc0.wgsl` — Conv(20→8, 3×3) + FiLM + ReLU
+- [ ] `src/effects/cnn_v3_enc1.wgsl` — Conv(8→16, 3×3) + FiLM + ReLU + avg_pool 2×2
+- [ ] `src/effects/cnn_v3_bottleneck.wgsl` — Conv(16→16, 3×3, dilation=2) + ReLU
+- [ ] `src/effects/cnn_v3_dec1.wgsl` — nearest upsample×2 + concat enc1_skip + Conv(32→8, 3×3) + FiLM + ReLU
+- [ ] `src/effects/cnn_v3_dec0.wgsl` — nearest upsample×2 + concat enc0_skip + Conv(16→4, 3×3) + FiLM + ReLU
 - [ ] `src/effects/cnn_v3_output.wgsl` — Conv(4→4, 1×1) + sigmoid → composite to framebuffer
 
 Reuse from existing shaders:
@@ -941,7 +944,7 @@ Reuse from existing shaders:
 - [ ] `src/effects/cnn_v3_effect.h` — class declaration
   - textures: feat_tex0, feat_tex1, enc0_tex, enc1_tex (half-res), bn_tex (half-res), dec1_tex, dec0_tex
   - **`WGPUTexture prev_cnn_tex_`** — persistent RGBA8, owned by effect, initialized black
-  - `FilmParams` uniform buffer (γ/β for 4 levels = 48 floats = 192 bytes)
+  - `FilmParams` uniform buffer (γ/β for 4 levels = 72 floats = 288 bytes)
   - FiLM MLP weights (loaded from .bin, run CPU-side per frame)
 - [ ] `src/effects/cnn_v3_effect.cc` — implementation
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 9a3efdf..ff8793f 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -267,22 +267,30 @@ Two source files:
 ```bash
 cd cnn_v3/training
 
-# Patch-based (default) — 64×64 patches around Harris corners
-python3 train_cnn_v3.py \
+# Recommended: [8,16] channels + multi-scale loss (matches runtime)
+uv run python3 train_cnn_v3.py \
   --input dataset/ \
-  --input-mode simple \
-  --epochs 200
+  --enc-channels 8,16 \
+  --epochs 5000 \
+  --checkpoint-dir checkpoints_8_16
 
 # Full-image mode (resizes to 256×256)
-python3 train_cnn_v3.py \
+uv run python3 train_cnn_v3.py \
   --input dataset/ \
-  --input-mode full \
+  --enc-channels 8,16 \
   --full-image --image-size 256 \
-  --epochs 500
+  --epochs 5000
+
+# Size-budget variant [4,8] (fits 6 KB)
+uv run python3 train_cnn_v3.py \
+  --input dataset/ \
+  --enc-channels 4,8 \
+  --epochs 5000
 
 # Quick smoke test: 1 epoch, small patches, random detector
-python3 train_cnn_v3.py \
+uv run python3 train_cnn_v3.py \
   --input dataset/ --epochs 1 \
+  --enc-channels 8,16 \
   --patch-size 32 --detector random
 ```
 
@@ -318,7 +326,7 @@ All other flags (`--epochs`, `--lr`, `--checkpoint-dir`, `--enc-channels`, etc.)
 | `--detector` | `harris` | `harris` \| `shi-tomasi` \| `fast` \| `gradient` \| `random` |
 | `--channel-dropout-p F` | `0.3` | Dropout prob for geometric channels |
 | `--full-image` | off | Resize full image instead of cropping patches |
-| `--enc-channels C` | `4,8` | Encoder channel counts, comma-separated |
+| `--enc-channels C` | `4,8` | Encoder channel counts: `8,16` (current default runtime), `4,8` (size budget) |
 | `--film-cond-dim N` | `5` | FiLM conditioning input size |
 | `--epochs N` | `200` | Training epochs |
 | `--batch-size N` | `16` | Batch size |
@@ -397,6 +405,7 @@ Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch referen
 | 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
 | 6 — FiLM MLP training | ✅ Done | train_cnn_v3.py + cnn_v3_utils.py written |
 | 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
+| 8 — Architecture upgrade [8,16] | ✅ Done | enc_channels=[8,16], multi-scale loss, 16ch textures split into lo/hi pairs |
 | 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |
 
 ---
 
@@ -408,10 +417,10 @@ The common snippet provides `get_w()` and `unpack_8ch()`.
 | Pass | Shader | Input(s) | Output | Dims |
 |------|--------|----------|--------|------|
-| enc0 | `cnn_v3_enc0.wgsl` | feat_tex0+feat_tex1 (20ch) | enc0_tex rgba16float (4ch) | full |
-| enc1 | `cnn_v3_enc1.wgsl` | enc0_tex (AvgPool2×2 inline) | enc1_tex rgba32uint (8ch) | ½ |
-| bottleneck | `cnn_v3_bottleneck.wgsl` | enc1_tex (AvgPool2×2 inline) | bottleneck_tex rgba32uint (8ch) | ¼ |
-| dec1 | `cnn_v3_dec1.wgsl` | bottleneck_tex + enc1_tex (skip) | dec1_tex rgba16float (4ch) | ½ |
+| enc0 | `cnn_v3_enc0.wgsl` | feat_tex0+feat_tex1 (20ch) | enc0_tex rgba32uint (8ch) | full |
+| enc1 | `cnn_v3_enc1.wgsl` | enc0_tex (AvgPool2×2 inline) | enc1_lo+enc1_hi rgba32uint (16ch split) | ½ |
+| bottleneck | `cnn_v3_bottleneck.wgsl` | enc1_lo+enc1_hi (AvgPool2×2 inline) | bn_lo+bn_hi rgba32uint (16ch split) | ¼ |
+| dec1 | `cnn_v3_dec1.wgsl` | bn_lo+bn_hi + enc1_lo+enc1_hi (skip) | dec1_tex rgba32uint (8ch) | ½ |
 | dec0 | `cnn_v3_dec0.wgsl` | dec1_tex + enc0_tex (skip) | output_tex rgba16float (4ch) | full |
 
 **Parity rules baked into the shaders:**
@@ -437,12 +446,12 @@ FiLM γ/β are computed CPU-side by the FiLM MLP (Phase 4) and uploaded each fra
 **Weight offsets** (f16 units, including bias):
 
 | Layer | Weights | Bias | Total f16 |
 |-------|---------|------|-----------|
-| enc0 | 20×4×9=720 | +4 | 724 |
-| enc1 | 4×8×9=288 | +8 | 296 |
-| bottleneck | 8×8×9=576 | +8 | 584 |
-| dec1 | 16×4×9=576 | +4 | 580 |
-| dec0 | 8×4×9=288 | +4 | 292 |
-| **Total** | | | **2476 f16 = ~4.84 KB** |
+| enc0 | 20×8×9=1440 | +8 | 1448 |
+| enc1 | 8×16×9=1152 | +16 | 1168 |
+| bottleneck | 16×16×9=2304 | +16 | 2320 |
+| dec1 | 32×8×9=2304 | +8 | 2312 |
+| dec0 | 16×4×9=576 | +4 | 580 |
+| **Total** | | | **7828 f16 = ~15.3 KB** |
 
 **Asset IDs** (registered in `workspaces/main/assets.txt` + `src/effects/shaders.cc`):
 `SHADER_CNN_V3_COMMON`, `SHADER_CNN_V3_ENC0`, `SHADER_CNN_V3_ENC1`,
diff --git a/cnn_v3/docs/HOW_TO_CNN.md b/cnn_v3/docs/HOW_TO_CNN.md
index 09db97c..11ed260 100644
--- a/cnn_v3/docs/HOW_TO_CNN.md
+++ b/cnn_v3/docs/HOW_TO_CNN.md
@@ -358,7 +358,7 @@ uv run train_cnn_v3.py \
 The model prints its parameter count:
 
 ```
-Model: enc=[4, 8] film_cond_dim=5 params=3252 (~6.4 KB f16)
+Model: enc=[8, 16] film_cond_dim=5 params=9148 (~17.9 KB f16)
 ```
 
 If `params` is much higher, `--enc-channels` was changed; update C++ constants accordingly.
@@ -492,12 +492,12 @@ WEIGHTS_CNN_V3_FILM_MLP, BINARY, weights/cnn_v3_film_mlp.bin, "CNN v3 FiLM MLP w
 | Layer | f16 count | Bytes |
 |-------|-----------|-------|
-| enc0 Conv(20→4,3×3)+bias | 724 | — |
-| enc1 Conv(4→8,3×3)+bias | 296 | — |
-| bottleneck Conv(8→8,3×3,dil=2)+bias | 584 | — |
-| dec1 Conv(16→4,3×3)+bias | 580 | — |
-| dec0 Conv(8→4,3×3)+bias | 292 | — |
-| **Total** | **2476 f16** | **4952 bytes** |
+| enc0 Conv(20→8,3×3)+bias | 1448 | — |
+| enc1 Conv(8→16,3×3)+bias | 1168 | — |
+| bottleneck Conv(16→16,3×3,dil=2)+bias | 2320 | — |
+| dec1 Conv(32→8,3×3)+bias | 2312 | — |
+| dec0 Conv(16→4,3×3)+bias | 580 | — |
+| **Total** | **7828 f16** | **15656 bytes** |
 
 **`cnn_v3_film_mlp.bin`** — FiLM MLP weights as raw f32, row-major:
 
@@ -505,9 +505,9 @@ WEIGHTS_CNN_V3_FILM_MLP, BINARY, weights/cnn_v3_film_mlp.bin, "CNN v3 FiLM MLP w
 | Layer | Shape | f32 count |
 |-------|-------|-----------|
 | L0 weight | (16, 5) | 80 |
 | L0 bias | (16,) | 16 |
-| L1 weight | (40, 16) | 640 |
-| L1 bias | (40,) | 40 |
-| **Total** | | **776 f32 = 3104 bytes** |
+| L1 weight | (72, 16) | 1152 |
+| L1 bias | (72,) | 72 |
+| **Total** | | **1320 f32 = 5280 bytes** |
 
 The FiLM MLP is for CPU-side inference (future — see §4d). The U-Net weights in
 `cnn_v3_weights.bin` are what you need immediately.
 
@@ -524,16 +524,16 @@ The export script produces this layout: `u32 = u16[0::2] | (u16[1::2] << 16)`.
 ```
 Checkpoint: epoch=200 loss=0.012345
-  enc_channels=[4, 8] film_cond_dim=5
+  enc_channels=[8, 16] film_cond_dim=5
 
 cnn_v3_weights.bin
-  2476 f16 values → 1238 u32 → 4952 bytes
-  Upload via CNNv3Effect::upload_weights(queue, data, 4952)
+  7828 f16 values → 3914 u32 → 15656 bytes
+  Upload via CNNv3Effect::upload_weights(queue, data, 15656)
 
 cnn_v3_film_mlp.bin
   L0: weight (16, 5) + bias (16,)
-  L1: weight (40, 16) + bias (40,)
-  776 f32 values → 3104 bytes
+  L1: weight (72, 16) + bias (72,)
+  1320 f32 values → 5280 bytes
 ```
 
 ### Pitfalls
 
@@ -542,7 +542,7 @@ cnn_v3_film_mlp.bin
   assertion in the export script fires. The C++ weight-offset constants (`kEnc0Weights` etc.)
   in `cnn_v3_effect.cc` must also be updated to match.
 - **Old checkpoint missing `config`:** if `config` key is absent (checkpoint from a very early
-  version), the script defaults to `enc_channels=[4,8], film_cond_dim=5`.
+  version), the script defaults to `enc_channels=[8,16], film_cond_dim=5`.
 - **`weights_only=True`:** requires PyTorch ≥ 2.0. If you get a warning, upgrade torch.
 
 ---
diff --git a/cnn_v3/docs/cnn_v3_architecture.png b/cnn_v3/docs/cnn_v3_architecture.png
index 2116c2b..474f488 100644
Binary files a/cnn_v3/docs/cnn_v3_architecture.png and b/cnn_v3/docs/cnn_v3_architecture.png differ
diff --git a/cnn_v3/docs/gen_architecture_png.py b/cnn_v3/docs/gen_architecture_png.py
index bd60a97..1c2ff65 100644
--- a/cnn_v3/docs/gen_architecture_png.py
+++ b/cnn_v3/docs/gen_architecture_png.py
@@ -108,20 +108,20 @@ def dim_label(x, y, txt):
 box(EX, Y_IN, BW, BH_IO, C_IO, 'G-Buffer Features',
     '20 channels · full res')
 
-box(EX, Y_E0, BW, BH, C_ENC, 'enc0 Conv(20→4, 3×3) + FiLM + ReLU',
-    'full res · 4 ch')
+box(EX, Y_E0, BW, BH, C_ENC, 'enc0 Conv(20→8, 3×3) + FiLM + ReLU',
+    'full res · 8 ch')
 
-box(EX, Y_E1, BW, BH, C_ENC, 'enc1 Conv(4→8, 3×3) + FiLM + ReLU',
-    '½ res · 8 ch · (AvgPool↓ on input)')
+box(EX, Y_E1, BW, BH, C_ENC, 'enc1 Conv(8→16, 3×3) + FiLM + ReLU',
+    '½ res · 16 ch · (AvgPool↓ on input)')
 
 box(BX, Y_BN, BW_BN, BH_BN, C_BN,
-    'bottleneck Conv(8→8, 3×3, dilation=2) + ReLU',
-    '¼ res · 8 ch · no FiLM · effective RF ≈ 10 px @ ½res')
+    'bottleneck Conv(16→16, 3×3, dilation=2) + ReLU',
+    '¼ res · 16 ch · no FiLM · effective RF ≈ 10 px @ ½res')
 
-box(DX, Y_D1, BW, BH, C_DEC, 'dec1 Conv(16→4, 3×3) + FiLM + ReLU',
-    '½ res · 4 ch · (upsample↑ + cat enc1 skip)')
+box(DX, Y_D1, BW, BH, C_DEC, 'dec1 Conv(32→8, 3×3) + FiLM + ReLU',
+    '½ res · 8 ch · (upsample↑ + cat enc1 skip)')
 
-box(DX, Y_D0, BW, BH, C_DEC, 'dec0 Conv(8→4, 3×3) + FiLM + sigmoid',
+box(DX, Y_D0, BW, BH, C_DEC, 'dec0 Conv(16→4, 3×3) + FiLM + sigmoid',
     'full res · 4 ch · (upsample↑ + cat enc0 skip)')
 
 box(DX, Y_OUT, BW, BH_IO, C_IO, 'RGBA Output',
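The CPU-side FiLM MLP forward pass that these diffs reference (5 → 16 → 72, with the output sliced into per-level γ/β) can be sketched as follows. This is a hypothetical illustration: `W0`/`b0`/`W1`/`b1` are random stand-ins for weights loaded from `cnn_v3_film_mlp.bin`, and `film_params` is not a function from the codebase:

```python
import numpy as np

# Stand-in weights in the documented shapes: L0 (16,5)+(16,), L1 (72,16)+(72,).
rng = np.random.default_rng(0)
W0, b0 = rng.standard_normal((16, 5)).astype(np.float32), np.zeros(16, np.float32)
W1, b1 = rng.standard_normal((72, 16)).astype(np.float32), np.zeros(72, np.float32)

def film_params(cond):
    """cond: [beat_phase, beat_time/8, audio_intensity, style_p0, style_p1]."""
    h = np.maximum(W0 @ cond + b0, 0.0)   # Linear(5->16) + ReLU
    out = W1 @ h + b1                     # Linear(16->72)
    # Slice into per-level gamma/beta: 2 x (8+16+8+4) = 72
    sizes = [8, 8, 16, 16, 8, 8, 4, 4]
    names = ["g_enc0", "b_enc0", "g_enc1", "b_enc1",
             "g_dec1", "b_dec1", "g_dec0", "b_dec0"]
    return dict(zip(names, np.split(out, np.cumsum(sizes)[:-1])))

p = film_params(np.array([0.5, 0.25, 0.8, 0.0, 1.0], np.float32))
print(sum(v.size for v in p.values()))  # 72
```

The same slicing order would apply whether the forward pass runs in Python, JS (web tool), or C++ (runtime effect).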
