# CNN v3 — Complete Pipeline Playbook

U-Net + FiLM style-transfer pipeline: data collection → training → export → C++ integration → demo → parity test → HTML tool.

---

## Table of Contents

1. [Overview](#0-overview)
2. [Collecting Training Samples](#1-collecting-training-samples)
   - [1a. From Real Photos](#1a-from-real-photos)
   - [1b. From Blender (Full G-Buffer)](#1b-from-blender-full-g-buffer)
   - [1c. Dataset Layout](#1c-dataset-layout)
3. [Training the U-Net + FiLM](#2-training-the-u-net--film)
4. [Exporting Weights](#3-exporting-weights)
5. [Wiring into CNNv3Effect (C++)](#4-wiring-into-cnnv3effect-c)
6. [Running a Demo](#5-running-a-demo)
7. [Parity Testing](#6-parity-testing)
8. [HTML WebGPU Tool](#7-html-webgpu-tool)
9. [Appendix A — File Reference](#appendix-a--file-reference)
10. [Appendix B — 20-Channel Feature Layout](#appendix-b--20-channel-feature-layout)

---

## 0. Overview

CNN v3 is a 2-level U-Net with FiLM conditioning, designed to run in real time as a WebGPU compute effect inside the demo.

**Architecture:**

```
Input: 20-channel G-buffer feature textures (rgba32uint)
  │
enc0 ──── Conv(20→4, 3×3) + FiLM + ReLU ┐              full res
  │                                      ↘ skip
enc1 ──── AvgPool2×2 + Conv(4→8, 3×3) + FiLM ┐         ½ res
  │                                           ↘ skip
bottleneck AvgPool2×2 + Conv(8→8, 1×1) + ReLU          ¼ res (no FiLM)
  │
dec1 ←── upsample×2 + cat(enc1 skip) + Conv(16→4, 3×3) + FiLM             ½ res
  │
dec0 ←── upsample×2 + cat(enc0 skip) + Conv(8→4, 3×3) + FiLM + sigmoid    full res
  │
  → RGBA output
```

**FiLM MLP:** `Linear(5→16) → ReLU → Linear(16→40)`, trained jointly with the U-Net.
- Input: `[beat_phase, beat_norm, audio_intensity, style_p0, style_p1]`
- Output: 40 γ/β values controlling style across all 4 FiLM layers

**Weight budget:** ~3.9 KB f16 (fits the ≤6 KB target)

**Two data paths:**

- **Simple mode** — real photos with zeroed geometric channels (normal, depth, matid)
- **Full mode** — Blender G-buffer renders with all 20 channels populated

**Pipeline summary:**

```
photos/Blender → pack → dataset/ → train_cnn_v3.py → checkpoint.pth
                                                          │
                                            export_cnn_v3_weights.py
                                       ┌──────────────────┴─────────────────┐
                              cnn_v3_weights.bin                 cnn_v3_film_mlp.bin
                                       │
                       CNNv3Effect::upload_weights()
                                       │
                               demo / HTML tool
```

---

## 1. Collecting Training Samples

Each sample is a directory containing 7 PNG files. The dataloader discovers samples by scanning for directories that contain `albedo.png`.

### 1a. From Real Photos

**What it does:** Converts one photo into a sample with zeroed geometric channels. The network handles this correctly because channel-dropout training (§2e) teaches it to work with or without geometry data.

**Step 1 — Pack an input/target pair with `gen_sample`:**

```bash
cd cnn_v3/training
./gen_sample.sh /path/to/photo.png /path/to/stylized.png dataset/simple/sample_001/
```

`gen_sample.sh` is the recommended one-shot wrapper. It calls `pack_photo_sample.py` with both `--photo` and `--target` in a single step.

**What gets written:**

| File | Content | Notes |
|------|---------|-------|
| `albedo.png` | Photo RGB uint8 | Source image |
| `normal.png` | (128, 128, 0) uint8 | Neutral "no normal" → reconstructed (0,0,1) |
| `depth.png` | All zeros uint16 | No depth data |
| `matid.png` | All zeros uint8 | No material IDs |
| `shadow.png` | 255 everywhere uint8 | Assume fully lit |
| `transp.png` | 1 − alpha uint8 | 0 = opaque |
| `target.png` | Stylized target RGBA | Ground truth for training |

**Step 2 — Verify the target:**

The network learns the mapping `albedo → target`.
If you pass the same image as both input and target, the network learns the identity mapping (useful as a sanity check, not for real training). Confirm `target.png` looks correct before running training.

**Alternative — pack without a target yet:**

```bash
python3 pack_photo_sample.py \
    --photo /path/to/photo.png \
    --output dataset/simple/sample_001/
# target.png defaults to a copy of the input; replace it before training:
cp my_stylized_version.png dataset/simple/sample_001/target.png
```

**Batch packing:**

```bash
for f in photos/*.png; do
    name=$(basename "${f%.png}")
    ./gen_sample.sh "$f" "targets/${name}_styled.png" \
        dataset/simple/sample_${name}/
done
```

**Pitfalls:**

- Input must be RGB or RGBA; grayscale photos need `.convert('RGB')` first
- The `normal.png` B channel is always 0 (unused); only the R and G channels carry the oct-encoded XY
- `mip1`/`mip2` are computed on the fly by the dataloader — not stored

---

### 1b. From Blender (Full G-Buffer)

Produces all 20 feature channels, including normals, depth, material IDs, and shadow.

#### Blender requirements

- Blender 3.x–5.x, Cycles render engine (5.x API differences are handled automatically)
- Object indices set: *Properties → Object → Relations → Object Index* must be > 0 for objects you want tracked in `matid` (IndexOB pass)

#### Step 1 — Render EXRs

```bash
blender -b scene.blend -P cnn_v3/training/blender_export.py -- \
    --output /tmp/renders/frame_### \
    --width 640 --height 360 \
    --start-frame 1 --end-frame 200
```

The `--` separator is **required**; arguments after it are passed to the Python script, not to Blender. Each `#` in `--output` is replaced by a zero-padded frame digit.
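The `###` substitution behaves like the sketch below; `expand_frame_path` is a hypothetical helper for illustration, not a function from `blender_export.py`:

```python
import re

def expand_frame_path(template: str, frame: int) -> str:
    # Each run of '#' becomes the zero-padded frame number, one digit per '#'.
    # Illustrative sketch of the behaviour described above.
    return re.sub(r"#+", lambda m: str(frame).zfill(len(m.group())), template)

print(expand_frame_path("/tmp/renders/frame_###", 7))  # /tmp/renders/frame_007
```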
**Available flags:**

| Flag | Default | Notes |
|------|---------|-------|
| `--output PATH` | `//renders/frame_###` | `//` = blend file directory; `###` = frame padding |
| `--width N` | 640 | Render resolution |
| `--height N` | 360 | Render resolution |
| `--start-frame N` | scene start | First frame |
| `--end-frame N` | scene end | Last frame |
| `--view-layer NAME` | first layer | View layer name; pass `?` to list available layers |

**Render pass → CNN channel mapping:**

| Blender pass | EXR channels | CNN use |
|--------------|--------------|---------|
| Combined | `.R .G .B .A` | `target.png` (beauty, sRGB-converted) |
| DiffCol | `.R .G .B` | `albedo.png` (linear → sRGB gamma 2.2) |
| Normal | `.X .Y .Z` | `normal.png` (world-space, oct-encoded to RG) |
| Z | `.R` | `depth.png` (mapped as 1/(z+1) → uint16) |
| IndexOB | `.R` | `matid.png` (object index, clamped uint8) |
| Shadow | `.R` | `shadow.png` (255 = lit, 0 = shadowed) |
| Combined alpha | `.A` | `transp.png` (inverted: 0 = opaque) |

**Pitfall:** The Blender `Normal` pass uses `.X .Y .Z` channel names in the EXR, not `.R .G .B`. `pack_blender_sample.py` handles both naming conventions automatically.
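The "XYZ unit → octahedral RG" step can be sketched with the standard octahedral mapping; this is an assumption about the encoding convention, and the exact math in `pack_blender_sample.py` may differ:

```python
import numpy as np

def oct_encode(n: np.ndarray) -> np.ndarray:
    """Map a unit normal (x, y, z) to two [0,1] values.

    Standard octahedral encoding, shown for illustration; not the
    script's verbatim code."""
    n = n / np.abs(n).sum()                 # project onto the octahedron
    xy = n[:2]
    if n[2] < 0.0:                          # fold the lower hemisphere outward
        xy = (1.0 - np.abs(xy[::-1])) * np.sign(xy)
    return xy * 0.5 + 0.5                   # remap [-1,1] → [0,1]

# The +Z "no normal" direction lands at (0.5, 0.5), i.e. the neutral
# (128, 128) uint8 value used by the photo path.
print(oct_encode(np.array([0.0, 0.0, 1.0])))
```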
#### Step 2 — Pack EXRs into sample directories

```bash
python3 cnn_v3/training/pack_blender_sample.py \
    --exr /tmp/renders/frame_0001.exr \
    --output dataset/full/sample_0001/
```

**Dependencies:** `pip install openexr` (preferred) or `pip install imageio[freeimage]`

**Batch packing:**

```bash
for exr in /tmp/renders/frame_*.exr; do
    name=$(basename "${exr%.exr}")
    python3 pack_blender_sample.py --exr "$exr" \
        --output dataset/full/${name}/
done
```

**What gets written:**

| File | Source | Transform |
|------|--------|-----------|
| `albedo.png` | DiffCol pass | Linear → sRGB (γ=2.2), uint8 |
| `normal.png` | Normal pass | XYZ unit → octahedral RG, uint8 |
| `depth.png` | Z pass | 1/(z+1) normalized, uint16 |
| `matid.png` | IndexOB pass | Clamped [0,255], uint8 |
| `shadow.png` | Shadow pass | uint8 (255 = lit) |
| `transp.png` | Combined alpha | 1 − alpha, uint8 |
| `target.png` | Combined beauty | Linear → sRGB, RGBA uint8 |

**Note:** `depth_grad`, `mip1`, `mip2` are computed on the fly by the dataloader. `prev.rgb` is always zero during training (no temporal history for static frames).

**Pitfalls:**

- `DiffCol` pass not found → a warning is printed and the albedo is zeroed (not fatal; training continues)
- `IndexOB` is all zero if the Object Index is not set in the Blender object properties
- Alpha convention: Blender alpha = 1 means opaque; `transp.png` inverts this (transp = 0 opaque)
- The `Shadow` pass in Cycles must be explicitly enabled in Render Properties → Passes → Effects

---

### 1c. Dataset Layout

```
dataset/
  simple/              ← photo samples, use --input-mode simple
    sample_001/
      albedo.png normal.png depth.png matid.png
      shadow.png transp.png
      target.png       ← must be replaced with stylized target
    sample_002/
    ...
  full/                ← Blender samples, use --input-mode full
    sample_0001/
    sample_0002/
    ...
```

- If the `simple/` or `full/` subdir is absent, the dataloader scans the root directly
- Minimum viable dataset: 1 sample (smoke test only); practical minimum ~50+ for training
- You can mix Blender and photo samples in the same subdir; the dataloader treats them identically

---

## 2. Training the U-Net + FiLM

The U-Net conv weights and the FiLM MLP train **jointly** in a single run. No separate steps.

### Prerequisites

```bash
pip install torch torchvision pillow numpy opencv-python
cd cnn_v3/training
```

### Quick-start commands

**Smoke test — 1 epoch, validates end-to-end without GPU:**

```bash
python3 train_cnn_v3.py --input dataset/ --epochs 1 \
    --patch-size 32 --detector random
```

**Standard photo training (patch-based):**

```bash
python3 train_cnn_v3.py \
    --input dataset/ \
    --input-mode simple \
    --epochs 200
```

**Blender G-buffer training:**

```bash
python3 train_cnn_v3.py \
    --input dataset/ \
    --input-mode full \
    --epochs 200
```

**Full-image mode (better global coherence, slower):**

```bash
python3 train_cnn_v3.py \
    --input dataset/ \
    --input-mode full \
    --full-image --image-size 256 \
    --epochs 500
```

### Flag reference

| Flag | Default | Notes |
|------|---------|-------|
| `--input DIR` | `training/dataset` | Dataset root; always set explicitly |
| `--input-mode` | `simple` | `simple` = photos, `full` = Blender G-buffer |
| `--epochs N` | 200 | 500 recommended for full-image mode |
| `--batch-size N` | 16 | Reduce to 4–8 on GPU OOM |
| `--lr F` | 1e-3 | Reduce to 1e-4 if loss oscillates or goes NaN |
| `--patch-size N` | 64 | Smaller = faster epoch, less spatial context |
| `--patches-per-image N` | 256 | Reduce for small datasets |
| `--detector` | `harris` | `random` for smoke tests; `shi-tomasi` as alternative |
| `--channel-dropout-p F` | 0.3 | Lower if all samples have geometry (Blender only) |
| `--full-image` | off | Resize the full image instead of patch crops |
| `--image-size N` | 256 | Resize target; only used with `--full-image` |
| `--enc-channels` | `4,8` | Must match C++ constants if changed |
| `--film-cond-dim N` | 5 | Must match the `CNNv3FiLMParams` field count in C++ |
| `--checkpoint-dir DIR` | `checkpoints/` | Set per experiment |
| `--checkpoint-every N` | 50 | 0 to disable intermediate checkpoints |

### Architecture at startup

The model prints its parameter count:

```
Model: enc=[4, 8] film_cond_dim=5 params=2097 (~3.9 KB f16)
```

If `params` is much higher, `--enc-channels` was changed; update the C++ constants accordingly.

### FiLM joint training

The conditioning vector `cond` is **randomised per sample** during training:

```python
cond = np.random.rand(5).astype(np.float32)  # uniform [0,1]^5
```

This covers the full input space, so the MLP is well-conditioned for any beat/audio/style combination at inference time. At inference, real values are fed from `set_film_params()`.

### Channel dropout

Applied per sample to make the model robust to missing channels:

| Channel group | Channels | Drop probability |
|---------------|----------|------------------|
| Geometric | normal.xy, depth, depth_grad.xy [3,4,5,6,7] | `channel_dropout_p` (default 0.3) |
| Context | mat_id, shadow, transp [8,18,19] | `channel_dropout_p × 0.67` (~0.2) |
| Temporal | prev.rgb [9,10,11] | 0.5 (always) |

This is why a model trained on Blender data also works on photos (geometry zeroed). To disable dropout for a pure-Blender model: `--channel-dropout-p 0`.

### Checkpoints

Saved as `.pth` at `checkpoints/checkpoint_epoch_N.pth`. Contents of each checkpoint:

- `epoch` — epoch number
- `model_state_dict` — all weights (conv + FiLM MLP)
- `optimizer_state_dict` — Adam state (not needed for export)
- `loss` — final average batch loss
- `config` — `{enc_channels, film_cond_dim, input_mode}` — **required by the export script**

The final checkpoint is always written, even with `--checkpoint-every 0`.
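The FiLM path the MLP learns can be sketched in NumPy with random stand-in weights of the trained shapes; the per-layer split order shown (γ then β, in pass order) is an assumption for illustration, and the authoritative layout lives in the effect code:

```python
import numpy as np

# Random stand-in weights with the shapes from the export tables in §3;
# the real values come from cnn_v3_film_mlp.bin after training.
rng = np.random.default_rng(0)
L0_W = rng.standard_normal((16, 5)).astype(np.float32)
L0_b = np.zeros(16, np.float32)
L1_W = rng.standard_normal((40, 16)).astype(np.float32)
L1_b = np.zeros(40, np.float32)

cond = np.array([0.5, 0.25, 0.8, 0.0, 1.0], np.float32)  # beat_phase … style_p1
h = np.maximum(cond @ L0_W.T + L0_b, 0.0)   # Linear(5→16) + ReLU
out = h @ L1_W.T + L1_b                     # Linear(16→40)

# 40 outputs = γ+β per FiLM layer, sized by each layer's channel count:
# enc0 (4ch) 8 + enc1 (8ch) 16 + dec1 (4ch) 8 + dec0 (4ch) 8 = 40.
# Split order here is assumed, not taken from the shaders.
film, off = {}, 0
for name, ch in [("enc0", 4), ("enc1", 8), ("dec1", 4), ("dec0", 4)]:
    film[name] = (out[off:off + ch], out[off + ch:off + 2 * ch])  # (γ, β)
    off += 2 * ch
assert off == 40
```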
### Diagnosing training problems

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `RuntimeError: No samples found` | Wrong `--input` or missing `albedo.png` | Check the dataset path |
| Loss stuck at epoch 1 | Dataset too small | Add more samples |
| Loss NaN from epoch 1 | Learning rate too high | Use `--lr 1e-4` |
| CUDA OOM | Batch or patch too large | `--batch-size 4 --patch-size 32` |
| Loss oscillates | LR too high late in training | Use `--lr 1e-4` or a cosine schedule |
| Loss drops then plateaus | Too few samples | Add more or use `--full-image` |

---

## 3. Exporting Weights

Converts a trained `.pth` checkpoint into two raw binary files for the C++ runtime.

```bash
cd cnn_v3/training
python3 export_cnn_v3_weights.py checkpoints/checkpoint_epoch_200.pth
# writes to export/ by default

python3 export_cnn_v3_weights.py checkpoints/checkpoint_epoch_200.pth \
    --output /path/to/assets/
```

### Output files

**`cnn_v3_weights.bin`** — conv + bias weights for all 5 passes, packed as f16 pairs in u32:

| Layer | f16 count | Bytes |
|-------|-----------|-------|
| enc0 Conv(20→4,3×3)+bias | 724 | — |
| enc1 Conv(4→8,3×3)+bias | 296 | — |
| bottleneck Conv(8→8,1×1)+bias | 72 | — |
| dec1 Conv(16→4,3×3)+bias | 580 | — |
| dec0 Conv(8→4,3×3)+bias | 292 | — |
| **Total** | **1964 f16** | **3928 bytes** |

**`cnn_v3_film_mlp.bin`** — FiLM MLP weights as raw f32, row-major:

| Layer | Shape | f32 count |
|-------|-------|-----------|
| L0 weight | (16, 5) | 80 |
| L0 bias | (16,) | 16 |
| L1 weight | (40, 16) | 640 |
| L1 bias | (40,) | 40 |
| **Total** | | **776 f32 = 3104 bytes** |

The FiLM MLP is for CPU-side inference (future — see §4d). The U-Net weights in `cnn_v3_weights.bin` are what you need immediately.

### f16 packing format

WGSL `get_w(buf, base, idx)` reads `pair = buf[(base+idx)/2]`:

- Even index → low 16 bits of the u32
- Odd index → high 16 bits of the u32

The export script produces this layout: `u32 = u16[0::2] | (u16[1::2] << 16)`.
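The pairing rule above can be mirrored on the CPU; a small NumPy sketch of the pack step and a `get_w`-style read-back (stand-in data, not the export script's verbatim code):

```python
import numpy as np

# Stand-in for the 1964 trained f16 values.
f16 = np.arange(8, dtype=np.float16)
u16 = f16.view(np.uint16)

# Pack: even-indexed halves in the low 16 bits, odd-indexed in the high 16.
u32 = (u16[0::2].astype(np.uint32)
       | (u16[1::2].astype(np.uint32) << 16))   # u32 = u16[0::2] | (u16[1::2] << 16)

def get_w(buf, base, idx):
    """CPU mirror of the WGSL accessor: fetch the f16 at (base + idx)."""
    pair = buf[(base + idx) // 2]
    half = (pair >> 16) if (base + idx) % 2 else (pair & 0xFFFF)
    return np.uint16(half).view(np.float16)

assert all(float(get_w(u32, 0, i)) == float(f16[i]) for i in range(8))
```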
### Expected output

```
Checkpoint: epoch=200 loss=0.012345
  enc_channels=[4, 8] film_cond_dim=5

cnn_v3_weights.bin
  1964 f16 values → 982 u32 → 3928 bytes
  Upload via CNNv3Effect::upload_weights(queue, data, 3928)

cnn_v3_film_mlp.bin
  L0: weight (16, 5) + bias (16,)
  L1: weight (40, 16) + bias (40,)
  776 f32 values → 3104 bytes
```

### Pitfalls

- **`enc_channels` mismatch:** if you changed `--enc-channels` during training, the layer-size assertion in the export script fires. The C++ weight-offset constants (`kEnc0Weights` etc.) in `cnn_v3_effect.cc` must also be updated to match.
- **Old checkpoint missing `config`:** if the `config` key is absent (checkpoint from a very early version), the script defaults to `enc_channels=[4,8], film_cond_dim=5`.
- **`weights_only=True`:** requires PyTorch ≥ 2.0. If you get a warning, upgrade torch.

---

## 4. Wiring into CNNv3Effect (C++)

### Class overview

`CNNv3Effect` (in `cnn_v3/src/cnn_v3_effect.h/.cc`) implements the `Effect` base class. It owns:

- 5 compute pipelines (enc0, enc1, bottleneck, dec1, dec0)
- 5 params uniform buffers with per-pass `weight_offset` + FiLM γ/β
- 1 shared storage buffer `weights_buf_` (~4 KB, read-only across all shaders)

### Wiring in a `.seq` file

```
SEQUENCE 0 0 "Scene with CNN v3"
EFFECT + GBufferEffect prev_cnn -> gbuf_feat0 gbuf_feat1 0 60
EFFECT + CNNv3Effect gbuf_feat0 gbuf_feat1 -> sink 0 60
```

Or direct C++:

```cpp
#include "cnn_v3/src/cnn_v3_effect.h"

auto cnn = std::make_shared<CNNv3Effect>(
    ctx,
    /*inputs=*/ {"gbuf_feat0", "gbuf_feat1"},
    /*outputs=*/{"cnn_output"},
    /*start=*/0.0f, /*end=*/60.0f);
```

### Uploading weights

Load `cnn_v3_weights.bin` once at startup, before the first `render()`:

```cpp
// Read the binary file
std::vector<uint8_t> data;
{
    std::ifstream f("cnn_v3_weights.bin", std::ios::binary | std::ios::ate);
    data.resize(f.tellg());
    f.seekg(0);
    f.read(reinterpret_cast<char*>(data.data()), data.size());
}
// Upload to the GPU
cnn->upload_weights(ctx.queue, data.data(), (uint32_t)data.size());
```

Before `upload_weights()`, all conv weights are zero, so the output is `sigmoid(0) = 0.5` gray. After it, the output reflects the trained style.

### Setting FiLM parameters each frame

Call before `render()` each frame:

```cpp
CNNv3FiLMParams fp;
fp.beat_phase      = params.beat_phase;        // 0-1 within current beat
fp.beat_norm       = params.beat_time / 8.0f;  // normalized 8-beat cycle
fp.audio_intensity = params.audio_intensity;   // peak audio level [0,1]
fp.style_p0        = my_style_p0;              // user-defined style param
fp.style_p1        = my_style_p1;
cnn->set_film_params(fp);
cnn->render(encoder, params, nodes);
```

**Current `set_film_params` behaviour (placeholder):** applies a hardcoded linear mapping — audio modulates gamma, beat modulates beta. This is a heuristic until `cnn_v3_film_mlp.bin` is integrated as a CPU-side MLP.

**Future MLP inference** (when integrating `cnn_v3_film_mlp.bin`):

1. Load `cnn_v3_film_mlp.bin` → 4 matrices/biases in f32
2. Run the forward pass: `h = relu(cond @ L0_W.T + L0_b); out = h @ L1_W.T + L1_b`
3. Split `out[40]` into per-layer γ/β and write them into the Params structs directly

### Uniform struct layout (for debugging)

`CnnV3Params4ch` (enc0, dec1, dec0 — 64 bytes):

```
offset  0:    weight_offset u32
offset  4-31: padding (vec3u has align=16 in WGSL)
offset 32:    gamma[4] vec4f
offset 48:    beta[4]  vec4f
```

`CnnV3ParamsEnc1` (enc1 — 96 bytes): same header, then `gamma_lo/hi` at 32/48 and `beta_lo/hi` at 64/80.

Static asserts in `cnn_v3_effect.h` verify the exact sizes; a compile failure here means the WGSL layout diverged from the C++ struct.

### Intermediate node names

Internal textures are named `_enc0`, `_enc1`, `_bottleneck`, `_dec1`. They are declared in `declare_nodes()` at the correct fractional resolutions (W/2, W/4). Do not reference them from outside the effect unless debugging.

### Pitfalls

- **`upload_weights` size mismatch:** the call is a raw `wgpuQueueWriteBuffer`. If the `.bin` was generated with different `enc_channels`, inference silently corrupts. Always verify that the sizes match.
- **`set_film_params` must be called before `render()`** each frame; otherwise stale shadow copies from the previous frame persist.
- **GBufferEffect must precede CNNv3Effect** in the same command encoder.
- **Bind groups are rebuilt each `render()`** — node texture views may change on resize.

---

## 5. Running a Demo

### Build

```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
./build/demo
```

### Expected visual output

| Weights state | FiLM state | Expected output |
|---------------|------------|-----------------|
| Not uploaded (zero) | any | Uniform gray (all channels ≈ 0.5) |
| Uploaded | Identity (γ=1, β=0) | Stylization from conv weights only |
| Uploaded | Varying beat_phase | Per-channel gamma/beta shift visible |
| Uploaded | Full audio + beat | Full dynamic style modulation |

### Sanity checks

1. **Black output:** GBufferEffect likely didn't run. Confirm it precedes CNNv3Effect and that `set_scene()` was called.
2. **Uniform gray:** weights not uploaded. Check the file path and that `upload_weights` was called before the first `render()`.
3. **Correct but static style:** `set_film_params` may be called with constant zeros. Animate `beat_phase` 0→1 to verify the FiLM response.
4. **Resolution artefacts at enc1/bottleneck boundaries:** check that `W` and `H` are divisible by 4 (required by the 2-level pooling chain).

---

## 6. Parity Testing

The parity test validates that the WGSL shaders produce bit-accurate results vs. the Python/NumPy reference implementation in `gen_test_vectors.py`.

### Build and run

```bash
cmake -B build -DDEMO_BUILD_TESTS=ON
cmake --build build -j4
cd build && ./test_cnn_v3_parity
```

Two tests run:

1. **Zero-weight test:** all conv weights zero → output must equal `sigmoid(0) = 0.5` (deterministic, no reference vectors needed)
2. **Random-weight test:** random weights from fixed seed=42 applied to an 8×8 test tensor → WGSL output compared against Python-computed reference values

### Pass criteria

Tolerance: **max absolute error ≤ 1/255 = 3.92e-3** (one ULP in uint8 space)

Current results (8×8 tensors):

```
enc0  max_err = 1.95e-3 ✓
dec1  max_err = 1.95e-3 ✓
final max_err = 4.88e-4 ✓
```

### Regenerating test vectors

If you change `gen_test_vectors.py` or need to refresh the seed:

```bash
cd cnn_v3/training
python3 gen_test_vectors.py --header > ../test_vectors.h
```

Then recompile the parity test. The `--header` flag emits pure C to stdout; everything else (self-test results) goes to stderr.

### Parity rules baked into the shaders

If results drift after shader edits, verify that these invariants match the Python reference:

| Rule | WGSL | Python (`gen_test_vectors.py`) |
|------|------|--------------------------------|
| Border padding | zero-pad (not clamp) | `np.pad(..., mode='constant')` |
| Downsampling | AvgPool 2×2 exact | `0.25 * sum of 4 neighbours` |
| Upsampling | `coord / 2` integer | `min(y//2, qH-1)` nearest |
| Skip connections | channel concatenation | `np.concatenate([up, skip], axis=2)` |
| FiLM application | after conv+bias, before ReLU | `max(0, γ·x + β)` |
| Weight layout | OIHW, biases appended | `o * IN * K² + i * K² + ky*K + kx` |
| f16 quantisation | rgba16float / rgba32uint boundaries | `np.float16(out).astype(np.float32)` |

### Pitfalls

- **Test fails on a null/headless backend:** the test requires a real GPU (Dawn/wgpu) and errors out early if the WebGPU device cannot be created.
- **Consistent failure on the random-weight test only:** `test_vectors.h` is out of sync. Regenerate with `gen_test_vectors.py --header` and recompile.
- **Consistent failure on both tests:** the shader logic diverged from the parity rules above.

---

## 7. HTML WebGPU Tool

**Location:** `cnn_v3/tools/` — three files, no build step.
| File | Lines | Contents |
|------|-------|----------|
| `index.html` | 147 | HTML + CSS |
| `shaders.js` | 252 | WGSL shader constants, weight-offset constants |
| `tester.js` | 540 | `CNNv3Tester` class, event wiring |

### Usage

```bash
# Requires an HTTP server (WebGPU is blocked on file://)
cd /path/to/demo
python3 -m http.server 8080
# Open: http://localhost:8080/cnn_v3/tools/
```

Or on macOS with Chrome:

```bash
open -a "Google Chrome" --args --allow-file-access-from-files
open cnn_v3/tools/index.html
```

### Workflow

1. **Drop `cnn_v3_weights.bin`** onto the left "weights" drop zone.
2. **Drop a PNG or video** onto the centre canvas → the CNN runs immediately.
3. _(Optional)_ **Drop `cnn_v3_film_mlp.bin`** → the FiLM sliders become active.
4. Adjust the **beat_phase / beat_norm / audio_int / style_p0 / style_p1** sliders → reruns on change.
5. Click the layer buttons (**Feat · Enc0 · Enc1 · BN · Dec1 · Output**) in the right panel to inspect activations.
6. **Save PNG** to export the current output.

Keyboard: `[SPACE]` toggle original · `[D]` diff×10.

### Input files

| File | Format | Notes |
|------|--------|-------|
| `cnn_v3_weights.bin` | raw u32 (no header) | 982 u32 = 1964 f16 = ~3.9 KB |
| `cnn_v3_film_mlp.bin` | raw f32 | 776 f32 = 3.1 KB; optional — identity FiLM is used if absent |

Both are produced by `export_cnn_v3_weights.py` (§3).
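Since both files are headerless, their layout is fixed by the §3 tables; a sketch of slicing `cnn_v3_film_mlp.bin` into its four arrays (`parse_film_mlp` is illustrative, and `blob` stands in for the file contents):

```python
import numpy as np

def parse_film_mlp(blob: bytes):
    """Split the raw little-endian f32 blob into the §3 arrays (sketch)."""
    f = np.frombuffer(blob, dtype="<f4")
    assert f.size == 776, "expected 776 f32 values"
    L0_W = f[0:80].reshape(16, 5)     # L0 weight
    L0_b = f[80:96]                   # L0 bias
    L1_W = f[96:736].reshape(40, 16)  # L1 weight
    L1_b = f[736:776]                 # L1 bias
    return L0_W, L0_b, L1_W, L1_b

# Dummy stand-in for open("cnn_v3_film_mlp.bin", "rb").read()
blob = np.arange(776, dtype="<f4").tobytes()
L0_W, L0_b, L1_W, L1_b = parse_film_mlp(blob)
print(L0_W.shape, L1_W.shape)  # (16, 5) (40, 16)
```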
### Texture chain

| Texture | Format | Size |
|---------|--------|------|
| `feat_tex0` | rgba32uint | W × H (8 f16: albedo, normal, depth, depth_grad) |
| `feat_tex1` | rgba32uint | W × H (12 u8: mat_id, prev, mip1, mip2, shadow, transp) |
| `enc0_tex` | rgba16float | W × H |
| `enc1_tex` | rgba32uint | W/2 × H/2 (8 f16 packed) |
| `bn_tex` | rgba32uint | W/4 × H/4 |
| `dec1_tex` | rgba16float | W/2 × H/2 |
| `output_tex` | rgba16float | W × H → displayed on the canvas |

### Simple mode (photo input)

Albedo = image RGB, mip1/mip2 from GPU mipmaps, shadow = 1.0, transp = 1 − alpha, all geometric channels (normal, depth, depth_grad, mat_id, prev) = 0.

### Browser requirements

- Chrome 113+ / Edge 113+ (WebGPU on by default)
- Firefox Nightly with `dom.webgpu.enabled = true`

### Pitfalls

- `rgba32uint` and `rgba16float` textures both need `STORAGE_BINDING | TEXTURE_BINDING` usage.
- Weight offsets are **f16 indices** (enc0=0, enc1=724, bn=1020, dec1=1092, dec0=1672).
- Uniform buffer layouts must match the WGSL `Params` structs exactly (padding included).
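Those f16 offsets are just the running sum of each pass's f16 count (weights + biases) from the §3 export table, which makes them easy to recompute if `--enc-channels` ever changes:

```python
# Per-pass f16 counts from the §3 export table (weights + biases).
sizes = {"enc0": 724, "enc1": 296, "bn": 72, "dec1": 580, "dec0": 292}

# Each pass starts where the previous one ended.
offsets, off = {}, 0
for name, n in sizes.items():
    offsets[name] = off
    off += n

print(offsets)  # {'enc0': 0, 'enc1': 724, 'bn': 1020, 'dec1': 1092, 'dec0': 1672}
print(off)      # 1964 f16 total
```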
---

## Appendix A — File Reference

| File | Purpose |
|------|---------|
| `cnn_v3/training/gen_sample.sh` | One-shot wrapper: pack an input+target pair into a sample directory |
| `cnn_v3/training/blender_export.py` | Configure Blender Cycles passes, render multi-layer EXR (Blender 3.x–5.x compatible) |
| `cnn_v3/training/pack_blender_sample.py` | EXR → sample PNG directory (7 files) |
| `cnn_v3/training/pack_photo_sample.py` | Photo → zeroed-geometry sample directory |
| `cnn_v3/training/cnn_v3_utils.py` | Dataset class, feature assembly, channel dropout, salient-point detection |
| `cnn_v3/training/train_cnn_v3.py` | CNNv3 model definition, training loop, CLI |
| `cnn_v3/training/export_cnn_v3_weights.py` | Checkpoint → `cnn_v3_weights.bin` + `cnn_v3_film_mlp.bin` |
| `cnn_v3/training/gen_test_vectors.py` | NumPy reference forward pass + C header generator |
| `cnn_v3/test_vectors.h` | Compiled-in test vectors (auto-generated, do not edit) |
| `cnn_v3/src/cnn_v3_effect.h` | C++ class, Params structs, `CNNv3FiLMParams` API |
| `cnn_v3/src/cnn_v3_effect.cc` | Effect implementation: pipelines, render, weight upload |
| `cnn_v3/src/gbuffer_effect.h/.cc` | GBufferEffect: rasterise + pack G-buffer feature textures |
| `src/tests/gpu/test_cnn_v3_parity.cc` | Per-pixel parity test (WGSL vs. Python reference) |
| `cnn_v3/docs/CNN_V3.md` | Full architecture spec (U-Net, FiLM, WGSL uniform layouts) |
| `cnn_v3/tools/index.html` | HTML tool — UI shell + CSS |
| `cnn_v3/tools/shaders.js` | HTML tool — inline WGSL shaders + weight-offset constants |
| `cnn_v3/tools/tester.js` | HTML tool — CNNv3Tester class, inference pipeline, layer viz |
| `cnn_v2/tools/cnn_v2_test/index.html` | HTML tool reference pattern (v2) |

---

## Appendix B — 20-Channel Feature Layout

| Index | Channel | Source | Encoding |
|-------|---------|--------|----------|
| 0–2 | albedo.rgb | `albedo.png` | f32 [0,1] |
| 3–4 | normal.xy | `normal.png` RG | oct-encoded f32 [0,1] |
| 5 | depth | `depth.png` | f32 [0,1] (1/(z+1)) |
| 6–7 | depth_grad.xy | computed from depth | central diff, signed |
| 8 | mat_id | `matid.png` | f32 [0,1] |
| 9–11 | prev.rgb | previous frame output | zero during training |
| 12–14 | mip1.rgb | pyrdown(albedo) | f32 [0,1] |
| 15–17 | mip2.rgb | pyrdown(mip1) | f32 [0,1] |
| 18 | shadow | `shadow.png` | f32 [0,1] (1 = lit) |
| 19 | transp | `transp.png` | f32 [0,1] (0 = opaque) |

**Feature texture packing** (`feat_tex0` / `feat_tex1`, both `rgba32uint`):

```
feat_tex0 (4×u32 = 8 f16 channels via pack2x16float):
  .x = pack2x16float(albedo.r, albedo.g)
  .y = pack2x16float(albedo.b, normal.x)
  .z = pack2x16float(normal.y, depth)
  .w = pack2x16float(dgrad.x, dgrad.y)

feat_tex1 (4×u32 = 12 u8 channels + padding via pack4x8unorm):
  .x = pack4x8unorm(mat_id, prev.r, prev.g, prev.b)
  .y = pack4x8unorm(mip1.r, mip1.g, mip1.b, mip2.r)
  .z = pack4x8unorm(mip2.g, mip2.b, shadow, transp)
  .w = 0 (unused, 8 reserved channels)
```
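For host-side debugging of `feat_tex0` contents, WGSL's `pack2x16float` (first argument in the low 16 bits) can be mirrored in NumPy; a sketch, not code from the repo:

```python
import numpy as np

def pack2x16float(a: float, b: float) -> int:
    """Two f16 values in one u32: `a` in the low half, `b` in the high half."""
    lo = np.float16(a).view(np.uint16)
    hi = np.float16(b).view(np.uint16)
    return int(lo) | (int(hi) << 16)

def unpack2x16float(u: int):
    """Inverse of pack2x16float, for inspecting packed texel words."""
    return (np.uint16(u & 0xFFFF).view(np.float16),
            np.uint16(u >> 16).view(np.float16))

x = pack2x16float(0.5, 1.0)
print([float(v) for v in unpack2x16float(x)])  # [0.5, 1.0]
```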