# CNN Post-Processing Effect

Neural network-based stylization for rendered scenes.

---

## Overview

Trainable convolutional neural network layers for artistic stylization (painterly, sketch, cel-shaded effects) with minimal runtime overhead.

**Key Features:**

- Position-aware layer 0 (coordinate input for vignetting, edge effects)
- Multi-layer convolutions (3×3, 5×5, 7×7 kernels) with automatic chaining
- Original input available to all layers via framebuffer capture
- Configurable final blend with the original scene
- Modular WGSL shader architecture
- Hardcoded weights (trained offline via PyTorch)
- ~5-8 KB binary footprint

---

## Architecture

### RGBD → Grayscale Pipeline

**Input:** RGBD (RGB + inverse depth D=1/z)
**Output:** Grayscale (1 channel)
**Layer Input:** 7 channels = [RGBD, UV coords, grayscale], all normalized to [-1,1]

**Architecture:**

- **Inner layers (0..N-2):** Conv2d(7→4) - output RGBD
- **Final layer (N-1):** Conv2d(7→1) - output grayscale

```wgsl
// Inner layers: 7→4 (RGBD output)
fn cnn_conv3x3_7to4(
    tex: texture_2d<f32>,
    samp: sampler,
    uv: vec2<f32>,
    resolution: vec2<f32>,
    original: vec4<f32>,               // Original RGBD [-1,1]
    weights: array<array<f32, 8>, 36>  // 9 pos × 4 out × (7 weights + bias)
) -> vec4<f32>

// Final layer: 7→1 (grayscale output)
fn cnn_conv3x3_7to1(
    tex: texture_2d<f32>,
    samp: sampler,
    uv: vec2<f32>,
    resolution: vec2<f32>,
    original: vec4<f32>,
    weights: array<array<f32, 8>, 9>   // 9 pos × (7 weights + bias)
) -> f32
```

**Input normalization:**

- **fs_main** normalizes textures once: `(tex - 0.5) * 2` → [-1,1]
- **Conv functions** normalize UV coords: `(uv - 0.5) * 2` → [-1,1]
- **Grayscale** computed from normalized RGBD: `0.2126*R + 0.7152*G + 0.0722*B`
- **Inter-layer data** stays in [-1,1] (no denormalization)
- **Final output** denormalized for display: `(result + 1.0) * 0.5` → [0,1]

**Activation:** tanh for inner layers (output stays [-1,1]), none for the final layer

### Multi-Layer Architecture

CNNEffect supports multi-layer networks via automatic effect chaining:

1. **Timeline specifies total layers**: `CNNEffect layers=3 blend=0.7`
2. **Compiler expands to chain**: 3 separate CNNEffect instances (layer 0→1→2)
3. **Framebuffer capture**: Layer 0 captures the original input to `"captured_frame"`
4. **Original input binding**: All layers access the original via `@binding(4)`
5. **Final blend**: The last layer blends its result with the original: `mix(original, result, 0.7)`

**Framebuffer Capture API:**

- `Effect::needs_framebuffer_capture()` - effect requests pre-capture
- MainSequence automatically blits input → `"captured_frame"` auxiliary texture
- Generic mechanism usable by any effect

### File Structure

```
src/gpu/effects/
  cnn_effect.h/cc              # CNNEffect class + framebuffer capture

workspaces/main/shaders/cnn/
  cnn_activation.wgsl          # tanh, ReLU, sigmoid, leaky_relu
  cnn_conv3x3.wgsl             # 3×3 convolution (standard + coord-aware)
  cnn_conv5x5.wgsl             # 5×5 convolution (standard + coord-aware)
  cnn_conv7x7.wgsl             # 7×7 convolution (standard + coord-aware)
  cnn_weights_generated.wgsl   # Weight arrays (auto-generated by train_cnn.py)
  cnn_layer.wgsl               # Main shader with layer switches (auto-generated by train_cnn.py)
```

---

## Training Workflow

### 1. Prepare Training Data

Collect input/target image pairs:

- **Input:** RGBA (RGB + depth as alpha channel, D=1/z)
- **Target:** Grayscale stylized output

```bash
training/input/img_000.png    # RGBA render (RGB + depth)
training/output/img_000.png   # Grayscale target
```

**Note:** Input images must be RGBA where alpha = inverse depth (1/z).
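As a reference for how these conventions fit together, here is a minimal NumPy/Pillow sketch that assembles the 7-channel, [-1,1]-normalized layer input ([R, G, B, D, U, V, gray]) from one RGBA training image, following the normalization rules listed under Architecture. The helper name `build_layer_input` is hypothetical and is not part of `train_cnn.py`.

```python
# Minimal sketch (not part of train_cnn.py): build the 7-channel, [-1,1]
# normalized layer input [R, G, B, D, U, V, gray] from an RGBA training image.
import numpy as np
from PIL import Image

def build_layer_input(path: str) -> np.ndarray:
    """Return an array of shape (7, H, W) in [-1, 1]."""
    rgba = np.asarray(Image.open(path).convert("RGBA"), dtype=np.float32) / 255.0
    h, w, _ = rgba.shape

    # RGB + inverse depth (alpha channel), normalized to [-1, 1].
    rgbd = (rgba - 0.5) * 2.0

    # UV coordinates, normalized the same way as in the conv functions.
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    uv = np.stack([(u / (w - 1) - 0.5) * 2.0,
                   (v / (h - 1) - 0.5) * 2.0]).astype(np.float32)

    # Grayscale computed from the already-normalized RGB channels.
    gray = 0.2126 * rgbd[..., 0] + 0.7152 * rgbd[..., 1] + 0.0722 * rgbd[..., 2]

    return np.concatenate([rgbd.transpose(2, 0, 1), uv, gray[None]], axis=0)

# Example (hypothetical path):
#   x = build_layer_input("training/input/img_000.png")   # shape (7, H, W)
```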
### 2. Train Network

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --layers 1 \
    --kernel-sizes 3 \
    --epochs 500 \
    --checkpoint-every 50
```

**Multi-layer example (3 layers with varying kernel sizes):**

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --layers 3 \
    --kernel-sizes 3,5,3 \
    --epochs 1000 \
    --checkpoint-every 100
```

**Note:** The training script auto-generates:

- `cnn_weights_generated.wgsl` - weight arrays for all layers
- `cnn_layer.wgsl` - shader with layer switches and original input binding

**Resume from checkpoint:**

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --resume training/checkpoints/checkpoint_epoch_200.pth
```

**Export WGSL from checkpoint (no training):**

```bash
python3 training/train_cnn.py \
    --export-only training/checkpoints/checkpoint_epoch_200.pth \
    --output workspaces/main/shaders/cnn/cnn_weights_generated.wgsl
```

**Generate ground truth (for shader validation):**

```bash
python3 training/train_cnn.py \
    --infer training/input/img_000.png \
    --export-only training/checkpoints/checkpoint_epoch_200.pth \
    --output training/ground_truth.png
```

### 3. Rebuild Demo

The training script auto-generates both `cnn_weights_generated.wgsl` and `cnn_layer.wgsl`, so a rebuild picks them up directly:

```bash
cmake --build build -j4
./build/demo64k
```
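To make the exported weight layout concrete, the sketch below shows how a trained PyTorch `Conv2d(7, 4, kernel_size=3)` could be flattened into the `array<array<f32, 8>, 36>` ordering described in the Weight Storage section below (9 kernel positions × 4 output channels, each entry holding 7 input-channel weights plus the per-channel bias, repeated per position as in that example). The helper `conv7to4_to_wgsl` is illustrative only; the authoritative exporter is `train_cnn.py`.

```python
# Illustrative sketch -- the real exporter is train_cnn.py. Flattens a trained
# PyTorch Conv2d(7, 4, kernel_size=3) into the array<array<f32, 8>, 36> layout
# used by cnn_weights_generated.wgsl: 9 kernel positions x 4 output channels,
# each entry = 7 weights + bias (bias repeated per position, matching the
# Weight Storage example in this document).
import torch.nn as nn

def conv7to4_to_wgsl(conv: nn.Conv2d, name: str = "weights_layer0") -> str:
    w = conv.weight.detach().numpy()   # shape (4, 7, 3, 3): out, in, ky, kx
    b = conv.bias.detach().numpy()     # shape (4,)
    rows = []
    for ky in range(3):                # kernel positions, row-major
        for kx in range(3):
            for c in range(4):         # output channels
                vals = list(w[c, :, ky, kx]) + [b[c]]   # 7 weights + bias
                body = ", ".join(f"{v:.6f}" for v in vals)
                rows.append(f"    array({body})")
    return (f"const {name}: array<array<f32, 8>, 36> = array(\n"
            + ",\n".join(rows) + "\n);")

# Usage (hypothetical):
#   layer = nn.Conv2d(7, 4, kernel_size=3, padding=1)
#   print(conv7to4_to_wgsl(layer))
```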
---

## Usage

### C++ Integration

**Single layer (manual):**

```cpp
#include "gpu/effects/cnn_effect.h"

CNNEffectParams p;
p.layer_index = 0;
p.total_layers = 1;
p.blend_amount = 1.0f;
auto cnn = std::make_shared<CNNEffect>(ctx, p);
timeline.add_effect(cnn, start_time, end_time);
```

**Multi-layer (automatic via timeline compiler):**

Use the timeline syntax; `seq_compiler` expands it into multiple effect instances.

### Timeline Examples

**Single-layer CNN (full stylization):**

```
SEQUENCE 10.0 0
EFFECT + Hybrid3DEffect 0.00 5.00
EFFECT + CNNEffect 0.50 5.00 layers=1
```

**Multi-layer CNN with blend:**

```
SEQUENCE 10.0 0
EFFECT + Hybrid3DEffect 0.00 5.00
EFFECT + CNNEffect 0.50 5.00 layers=3 blend=0.7
```

Expands to:

```cpp
// Layer 0 (captures original, blend=1.0)
{
  CNNEffectParams p;
  p.layer_index = 0;
  p.total_layers = 3;
  p.blend_amount = 1.0f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 1);
}
// Layer 1 (blend=1.0)
{
  CNNEffectParams p;
  p.layer_index = 1;
  p.total_layers = 3;
  p.blend_amount = 1.0f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 2);
}
// Layer 2 (final blend=0.7)
{
  CNNEffectParams p;
  p.layer_index = 2;
  p.total_layers = 3;
  p.blend_amount = 0.7f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 3);
}
```

---

## Shader Structure

**Bindings:**

```wgsl
@group(0) @binding(0) var smplr: sampler;
@group(0) @binding(1) var txt: texture_2d<f32>;             // Current layer input
@group(0) @binding(2) var<uniform> uniforms: CommonUniforms;
@group(0) @binding(3) var<uniform> params: CNNLayerParams;
@group(0) @binding(4) var original_input: texture_2d<f32>;  // Layer 0 input (captured)
```

**Fragment shader logic:**

```wgsl
@fragment
fn fs_main(@builtin(position) p: vec4<f32>) -> @location(0) vec4<f32> {
    let uv = p.xy / uniforms.resolution;
    let input = textureSample(txt, smplr, uv);                // Layer N-1 output
    let original = textureSample(original_input, smplr, uv);  // Layer 0 input

    var result = vec4<f32>(0.0);
    if (params.layer_index == 0) {
        result = cnn_conv3x3_with_coord(txt, smplr, uv, uniforms.resolution,
                                        rgba_weights_layer0, coord_weights_layer0,
                                        bias_layer0);
        result = cnn_tanh(result);
    }
    // ... other layers

    // Blend with ORIGINAL input (not previous layer)
    return mix(original, result, params.blend_amount);
}
```

**Weight Storage:**

**Inner layers (7→4 RGBD output):**

```wgsl
// Structure: array<array<f32, 8>, 36>
// 9 positions × 4 output channels, each with 7 weights + bias
const weights_layer0: array<array<f32, 8>, 36> = array(
    array(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0),  // pos0_ch0
    array(w1_r, w1_g, w1_b, w1_d, w1_u, w1_v, w1_gray, bias1),  // pos0_ch1
    // ... 34 more entries
);
```

**Final layer (7→1 grayscale output):**

```wgsl
// Structure: array<array<f32, 8>, 9>
// 9 positions, each with 7 weights + bias
const weights_layerN: array<array<f32, 8>, 9> = array(
    array(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0),  // pos0
    // ... 8 more entries
);
```

---

## Size Budget

| Component | Size | Notes |
|-----------|------|-------|
| Activation functions | ~200 B | 4 functions |
| Conv3x3 (standard + coord) | ~500 B | Both variants |
| Conv5x5 (standard + coord) | ~700 B | Both variants |
| Conv7x7 (standard + coord) | ~900 B | Both variants |
| Main shader | ~800 B | Layer composition |
| C++ implementation | ~300 B | Effect class |
| **Coord weights** | **+32 B** | Per-layer overhead (layer 0 only) |
| **RGBA weights** | **2-6 KB** | Depends on depth/kernel sizes |
| **Total** | **5-9 KB** | Acceptable for 64k |

**Optimization strategies:**

- Quantize weights (float32 → int8)
- Prune near-zero weights
- Use separable convolutions

---

## Testing

```bash
./build/test_demo_effects   # CNN construction/shader tests
./build/demo64k             # Visual test
```

---

## Blend Parameter Behavior

**blend_amount** controls the final compositing with the original:

- `blend=0.0`: Pure original (no CNN effect)
- `blend=0.5`: 50% original + 50% CNN
- `blend=1.0`: Pure CNN output (full stylization)

**Important:** The blend uses the captured layer 0 input, not the previous layer's output.

**Example use cases:**

- `blend=1.0`: Full stylization (default)
- `blend=0.7`: Subtle effect preserving original details
- `blend=0.3`: Light artistic touch

---

## Troubleshooting

**Shader compilation fails:**

- Check `cnn_weights_generated.wgsl` syntax
- Verify snippets are registered in `shaders.cc::InitShaderComposer()`
- Ensure `cnn_layer.wgsl` has 5 bindings (including `original_input`)

**Black/corrupted output:**

- Weights untrained (identity placeholder)
- Check that the `captured_frame` auxiliary texture is registered
- Verify that layer priorities in the timeline are sequential

**Wrong blend result:**

- Ensure layer 0 has `needs_framebuffer_capture() == true`
- Check the MainSequence framebuffer capture logic
- Verify the `original_input` binding is populated

**Training loss not decreasing:**

- Lower the learning rate (`--learning-rate 0.0001`)
- Train for more epochs (`--epochs 1000`)
- Check input/target image alignment

---

## References

- **Training Script:** `training/train_cnn.py`
- **Shader Composition:** `doc/SEQUENCE.md`
- **Effect System:** `src/gpu/effect.h`