# CNN v3 How-To

Practical playbook for the CNN v3 pipeline: G-buffer effect, training data, training the U-Net+FiLM network, and wiring everything into the demo. See `CNN_V3.md` for the full architecture design.

---

## 1. Using GBufferEffect in the Demo

`GBufferEffect` is a full-class effect (Path B in `doc/EFFECT_WORKFLOW.md`). It rasterizes proxy geometry to MRT G-buffer textures and packs them into two `rgba32uint` feature textures (`feat_tex0`, `feat_tex1`) consumed by the CNN.

### Registration (already done)

- Shaders in `assets.txt`: `SHADER_GBUF_RASTER`, `SHADER_GBUF_PACK`
- Source in `cmake/DemoSourceLists.cmake`: `cnn_v3/src/gbuffer_effect.cc`
- Header included in `src/gpu/demo_effects.h`
- Test in `src/tests/gpu/test_demo_effects.cc`

### Adding to a Sequence

`GBufferEffect` is not yet a named effect in `seq_compiler.py` (no `.seq` syntax integration in Phase 1). Wire it directly in C++ alongside your scene code, or add it to the timeline when the full CNNv3Effect is ready.

**C++ wiring example** (e.g. inside a Sequence or main.cc):

```cpp
#include "../../cnn_v3/src/gbuffer_effect.h"

// Allocate once alongside your scene
auto gbuf = std::make_shared<GBufferEffect>(
    ctx,
    /*inputs=*/{"prev_cnn"},  // or any dummy node
    /*outputs=*/{"gbuf_feat0", "gbuf_feat1"},
    /*start=*/0.0f, /*end=*/60.0f);
gbuf->set_scene(&my_scene, &my_camera);

// In the render loop, call before the CNN pass:
gbuf->render(encoder, params, nodes);
```

### Internal passes

Each frame, `GBufferEffect::render()` executes:

1. **Pass 1 — MRT rasterization** (`gbuf_raster.wgsl`)
   - Proxy box (36 verts) × N objects, instanced
   - MRT outputs: `gbuf_albedo` (rgba16float), `gbuf_normal_mat` (rgba16float)
   - Depth test + write into `gbuf_depth` (depth32float)
2. **Pass 2/3 — SDF + Lighting** — TODO (placeholder: shadow=1, transp=0)
3. **Pass 4 — Pack compute** (`gbuf_pack.wgsl`)
   - Reads all G-buffer textures + the `prev_cnn` input
   - Writes `feat_tex0` + `feat_tex1` (rgba32uint, 20 channels, 32 bytes/pixel)

### Output node names

The outputs are named from the `outputs` vector passed to the constructor. Use these names when binding the CNN effect input:

```
outputs[0] → feat_tex0 (rgba32uint: albedo.rgb, normal.xy, depth, depth_grad.xy)
outputs[1] → feat_tex1 (rgba32uint: mat_id, prev.rgb, mip1.rgb, mip2.rgb, shadow, transp)
```

### Scene data

Call `set_scene(scene, camera)` before the first render. The effect uploads `GlobalUniforms` (view-proj, camera pos, resolution) and `ObjectData` (model matrix, color) to GPU storage buffers each frame.

---

## 2. Preparing Training Data

CNN v3 supports two data sources: Blender renders and real photos.

### 2a. From Blender Renders

```bash
# 1. In Blender: run the export script (requires Blender 3.x+)
blender --background scene.blend --python cnn_v3/training/blender_export.py \
    -- --output /tmp/renders/ --frames 200

# 2. Pack into a sample directory
python3 cnn_v3/training/pack_blender_sample.py \
    --render-dir /tmp/renders/frame_0001/ \
    --output dataset/blender/sample_0001/
```

Each sample directory contains:

```
sample_XXXX/
  albedo.png  — RGB uint8  (material color, pre-lighting)
  normal.png  — RG uint8   (oct-encoded XY, remapped to [0,1])
  depth.png   — R uint16   (1/z normalized, 16-bit)
  matid.png   — R uint8    (object index / 255)
  shadow.png  — R uint8    (0=dark, 255=lit)
  transp.png  — R uint8    (0=opaque, 255=transparent)
  target.png  — RGB/RGBA   (stylized ground truth)
```

### 2b. From Real Photos

Geometric channels are zeroed; the network degrades gracefully thanks to channel-dropout training.

```bash
python3 cnn_v3/training/pack_photo_sample.py \
    --photo cnn_v3/training/input/photo1.jpg \
    --output dataset/photos/sample_001/
```

The output `target.png` defaults to the input photo (no style). Copy your stylized version in as `target.png` before training.
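For reference, the octahedral encoding behind `normal.png` (oct-encoded XY, remapped to [0,1]) can be sketched as below. This is the standard octahedral mapping; the exact variant used by `blender_export.py` may differ, and `oct_encode` is an illustrative name, not a function from the scripts:

```python
import numpy as np

def oct_encode(n: np.ndarray) -> np.ndarray:
    """Map a unit normal (3,) to oct-encoded XY in [0,1] for RG uint8 storage."""
    n = n / np.sum(np.abs(n))        # project onto the octahedron |x|+|y|+|z| = 1
    xy = n[:2].copy()
    if n[2] < 0.0:                   # fold the lower hemisphere outward
        sign = np.where(xy >= 0.0, 1.0, -1.0)
        xy = (1.0 - np.abs(xy[::-1])) * sign
    return xy * 0.5 + 0.5            # remap [-1,1] -> [0,1]

print(oct_encode(np.array([0.0, 0.0, 1.0])))   # -> [0.5 0.5]
```

Two channels suffice because the third component can be reconstructed from the octahedron constraint at unpack time.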
### Dataset layout

```
dataset/
  blender/
    sample_0001/
    sample_0002/
    ...
  photos/
    sample_001/
    sample_002/
    ...
```

Mix freely; the dataloader treats all sample directories uniformly.

---

## 3. Training

*(Script not yet written — see TODO.md. Architecture spec in `CNN_V3.md` §Training.)*

**Planned command:**

```bash
python3 cnn_v3/training/train_cnn_v3.py \
    --dataset dataset/ \
    --epochs 500 \
    --output cnn_v3/weights/cnn_v3_weights.bin
```

**FiLM conditioning** during training:

- Beat/audio inputs randomized per sample
- MLP: `Linear(5→16) → ReLU → Linear(16→40)` trained jointly with the U-Net
- Output: γ/β for enc0(4ch) + enc1(8ch) + dec1(4ch) + dec0(4ch) = 40 floats

---

## 4. Running the CNN v3 Effect

`CNNv3Effect` is implemented. Wire it into a sequence:

```seq
# BPM 120
SEQUENCE 0 0 "Scene with CNN v3"
EFFECT + GBufferEffect prev_cnn -> gbuf_feat0 gbuf_feat1 0 60
EFFECT + CNNv3Effect gbuf_feat0 gbuf_feat1 -> sink 0 60
```

FiLM parameters are uploaded each frame:

```cpp
cnn_v3_effect->set_film_params(
    params.beat_phase,
    params.beat_time / 8.0f,
    params.audio_intensity,
    style_p0, style_p1);
```

FiLM γ/β default to identity (γ=1, β=0) until `train_cnn_v3.py` produces a trained MLP.

---

## 5. Per-Pixel Validation

The C++ parity test passes: `src/tests/gpu/test_cnn_v3_parity.cc` (2 tests).

```bash
cmake -B build -DDEMO_BUILD_TESTS=ON && cmake --build build -j4
cd build && ./test_cnn_v3_parity
```

Results (8×8 test tensors, random weights):

- enc0 max_err = 1.95e-3 ✓
- dec1 max_err = 1.95e-3 ✓
- final max_err = 4.88e-4 ✓

(all ≤ 1/255 = 3.92e-3)

Test vectors are generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch reference).

---

## 6. Phase Status

| Phase | Status | Notes |
|-------|--------|-------|
| 1 — G-buffer (raster + pack) | ✅ Done | Integrated, 36/36 tests pass |
| 1 — G-buffer (SDF + shadow passes) | TODO | Placeholder: shadow=1, transp=0 |
| 2 — Training infrastructure | ✅ Done | blender_export.py, pack_*_sample.py |
| 3 — WGSL U-Net shaders | ✅ Done | 5 compute shaders + cnn_v3/common snippet |
| 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 36/36 tests pass |
| 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
| 6 — FiLM MLP training | TODO | train_cnn_v3.py not yet written |

---

## 7. CNN v3 Inference Shaders (Phase 3)

Five compute passes, each a standalone WGSL shader using `#include "cnn_v3/common"`. The common snippet provides `get_w()` and `unpack_8ch()`.

| Pass | Shader | Input(s) | Output | Dims |
|------|--------|----------|--------|------|
| enc0 | `cnn_v3_enc0.wgsl` | feat_tex0+feat_tex1 (20ch) | enc0_tex rgba16float (4ch) | full |
| enc1 | `cnn_v3_enc1.wgsl` | enc0_tex (AvgPool2×2 inline) | enc1_tex rgba32uint (8ch) | ½ |
| bottleneck | `cnn_v3_bottleneck.wgsl` | enc1_tex (AvgPool2×2 inline) | bottleneck_tex rgba32uint (8ch) | ¼ |
| dec1 | `cnn_v3_dec1.wgsl` | bottleneck_tex + enc1_tex (skip) | dec1_tex rgba16float (4ch) | ½ |
| dec0 | `cnn_v3_dec0.wgsl` | dec1_tex + enc0_tex (skip) | output_tex rgba16float (4ch) | full |

**Parity rules baked into the shaders:**

- Zero-padding (not clamp) at conv borders
- AvgPool 2×2 for downsampling (exact, deterministic)
- Nearest-neighbor for upsampling (integer `coord / 2`)
- Skip connections: channel concatenation (not add)
- FiLM applied after conv+bias, before ReLU: `max(0, γ·x + β)`
- No batch norm at inference
- Weight layout: OIHW (out × in × kH × kW), biases after conv weights

**Params uniform per shader** (`group 0, binding 3`):

```
struct Params {
  weight_offset: u32,  // f16 index into shared weights buffer
  _pad: vec3u,
  gamma: vec4f,        // FiLM γ (enc1: gamma_lo+gamma_hi for 8ch)
  beta: vec4f,         // FiLM β (enc1: beta_lo+beta_hi for 8ch)
}
```

FiLM γ/β are computed CPU-side by the FiLM MLP (Phase 4) and uploaded each frame.

**Weight offsets** (f16 units, including bias):

| Layer | Weights | Bias | Total f16 |
|-------|---------|------|-----------|
| enc0 | 20×4×9 = 720 | +4 | 724 |
| enc1 | 4×8×9 = 288 | +8 | 296 |
| bottleneck | 8×8×1 = 64 | +8 | 72 |
| dec1 | 16×4×9 = 576 | +4 | 580 |
| dec0 | 8×4×9 = 288 | +4 | 292 |
| **Total** | | | **1964 f16 ≈ 3.9 KB** |

**Asset IDs** (registered in `workspaces/main/assets.txt` + `src/effects/shaders.cc`):
`SHADER_CNN_V3_COMMON`, `SHADER_CNN_V3_ENC0`, `SHADER_CNN_V3_ENC1`, `SHADER_CNN_V3_BOTTLENECK`, `SHADER_CNN_V3_DEC1`, `SHADER_CNN_V3_DEC0`

**C++ usage (Phase 4):**

```cpp
auto src = ShaderComposer::Get().Compose({"cnn_v3/common"}, raw_wgsl);
```

---

## 8. Quick Troubleshooting

**GBufferEffect renders nothing / albedo is black**
- Check that `set_scene()` was called before `render()`
- Verify the scene has at least one object
- Check that the camera matrix is not degenerate (near/far, aspect)

**Pack shader fails to compile**
- `gbuf_pack.wgsl` uses no `#include`s; the ShaderComposer compose is a no-op
- Check that `ASSET_SHADER_GBUF_PACK` resolves in assets.txt

**Raster shader fails with `#include "common_uniforms"` error**
- `ShaderComposer::Get().Compose({"common_uniforms"}, src)` must be called before passing the source to `wgpuDeviceCreateShaderModule` — already done in effect.cc

**G-buffer outputs have the wrong resolution**
- `resize()` is not yet implemented in GBufferEffect; textures are fixed at construction size. It will be added when resize support is needed.

---

## 9. See Also

- `cnn_v3/docs/CNN_V3.md` — Full architecture design (U-Net, FiLM, feature layout)
- `doc/EFFECT_WORKFLOW.md` — General effect integration guide
- `cnn_v2/docs/CNN_V2.md` — Reference implementation (simpler, operational)
- `src/tests/gpu/test_demo_effects.cc` — GBufferEffect construction test
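The FiLM MLP from §3 (`Linear(5→16) → ReLU → Linear(16→40)`) is small enough to evaluate CPU-side each frame before uploading γ/β. A NumPy sketch with random stand-in weights — the 20/20 split of the 40 outputs into a γ half and a β half is an assumption; the export format from `train_cnn_v3.py` will define the real ordering:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in weights; in practice these come from the trained MLP
W1, b1 = rng.standard_normal((16, 5)), np.zeros(16)
W2, b2 = rng.standard_normal((40, 16)), np.zeros(40)

def film_mlp(cond: np.ndarray):
    """cond: 5 floats (beat_phase, beat_time/8, audio_intensity, style_p0, style_p1)."""
    h = np.maximum(0.0, W1 @ cond + b1)   # Linear(5->16) + ReLU
    gb = W2 @ h + b2                      # Linear(16->40)
    # Assumed layout: first 20 = gamma, last 20 = beta
    # (enc0:4 + enc1:8 + dec1:4 + dec0:4 channels each)
    return gb[:20], gb[20:]

gamma, beta = film_mlp(np.array([0.25, 0.5, 0.8, 0.0, 1.0]))
print(gamma.shape, beta.shape)   # (20,) (20,)
```

Each shader then receives its 4-channel (or, for enc1, 8-channel) slice of γ/β through the `Params` uniform and applies `max(0, γ·x + β)` after conv+bias.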
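As a sanity check on the weight-offset table in §7, the per-layer `weight_offset` values are just the running sum of `in×out×k×k` conv weights plus the bias count, in f16 units. A quick Python sketch, with layer shapes taken from the table:

```python
# (name, in_ch, out_ch, kernel) per the weight-offset table in §7
layers = [("enc0", 20, 4, 3), ("enc1", 4, 8, 3), ("bottleneck", 8, 8, 1),
          ("dec1", 16, 4, 3), ("dec0", 8, 4, 3)]

offsets, total = {}, 0
for name, cin, cout, k in layers:
    offsets[name] = total                 # value for this layer's Params.weight_offset
    total += cin * cout * k * k + cout    # OIHW conv weights, then bias, in f16 units

print(offsets)
print("total f16:", total, "=", total * 2, "bytes")   # 1964 f16 = 3928 bytes
```

This is how the single shared f16 weights buffer is indexed: each pass's `get_w()` reads relative to its `weight_offset`.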