# CNN Post-Processing Effect

Neural network-based stylization for rendered scenes.

---

## Overview

Trainable convolutional neural network layers for artistic stylization (painterly, sketch, cel-shaded effects) with minimal runtime overhead.

**Key Features:**

- Position-aware layer 0 (coordinate input for vignetting, edge effects)
- Multi-layer convolutions (3×3, 5×5, 7×7 kernels) with automatic chaining
- Original input available to all layers via framebuffer capture
- Configurable final blend with the original scene
- Modular WGSL shader architecture
- Hardcoded weights (trained offline via PyTorch)
- ~5-8 KB binary footprint

---

## Architecture

### Coordinate-Aware Layer 0

Layer 0 accepts normalized (x,y) patch-center coordinates alongside the RGBA samples:

```wgsl
fn cnn_conv3x3_with_coord(
    tex: texture_2d<f32>,
    samp: sampler,
    uv: vec2<f32>,                        // Center position [0,1]
    resolution: vec2<f32>,
    rgba_weights: array<mat4x4<f32>, 9>,  // 9 samples × 4×4 matrix
    coord_weights: mat2x4<f32>,           // 2 coords → 4 outputs
    bias: vec4<f32>
) -> vec4<f32>
```

**Input structure:** 9 RGBA samples (36 values) + 1 xy coordinate (2 values) = 38 inputs → 4 outputs

**Size impact:** +32 B coord weights, kernel-agnostic

**Use cases:** Position-dependent stylization (vignettes, corner darkening, radial gradients)

### Multi-Layer Architecture

CNNEffect supports multi-layer networks via automatic effect chaining:

1. **Timeline specifies total layers**: `CNNEffect layers=3 blend=0.7`
2. **Compiler expands to chain**: 3 separate CNNEffect instances (layer 0→1→2)
3. **Framebuffer capture**: Layer 0 captures the original input to `"captured_frame"`
4. **Original input binding**: All layers access the original via `@binding(4)`
5. **Final blend**: The last layer blends its result with the original: `mix(original, result, 0.7)`

**Framebuffer Capture API:**

- `Effect::needs_framebuffer_capture()` - effect requests pre-capture
- MainSequence automatically blits input → `"captured_frame"` auxiliary texture
- Generic mechanism usable by any effect

### File Structure

```
src/gpu/effects/
  cnn_effect.h/cc              # CNNEffect class + framebuffer capture

workspaces/main/shaders/cnn/
  cnn_activation.wgsl          # tanh, ReLU, sigmoid, leaky_relu
  cnn_conv3x3.wgsl             # 3×3 convolution (standard + coord-aware)
  cnn_conv5x5.wgsl             # 5×5 convolution (standard + coord-aware)
  cnn_conv7x7.wgsl             # 7×7 convolution (standard + coord-aware)
  cnn_weights_generated.wgsl   # Weight arrays (auto-generated by train_cnn.py)
  cnn_layer.wgsl               # Main shader with layer switches (auto-generated by train_cnn.py)
```

---

## Training Workflow

### 1. Prepare Training Data

Collect input/target image pairs:

- **Input:** Raw 3D render
- **Target:** Artistic style (hand-painted, filtered, stylized)

```bash
training/input/img_000.png    # Raw render
training/output/img_000.png   # Stylized target
```

Use `image_style_processor.py` to generate targets:

```bash
python3 training/image_style_processor.py input/ output/ pencil_sketch
```
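For readers wiring up their own data pipeline, a minimal sketch of a paired-image dataset follows, assuming PyTorch and the folder layout above. The `PairedImageDataset` class name, the RGBA conversion, and the [0,1] normalization are illustrative assumptions, not necessarily how `train_cnn.py` loads data internally.

```python
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class PairedImageDataset(Dataset):
    """Hypothetical loader: pairs training/input/<name> with training/output/<name>."""

    def __init__(self, input_dir: str, target_dir: str):
        self.input_dir = input_dir
        self.target_dir = target_dir
        # Keep only files present in both folders, matched by filename.
        self.names = sorted(
            n for n in os.listdir(input_dir)
            if os.path.isfile(os.path.join(target_dir, n))
        )

    def __len__(self) -> int:
        return len(self.names)

    @staticmethod
    def _load(path: str) -> torch.Tensor:
        # 4 channels to mirror the RGBA pipeline; values scaled to [0, 1].
        img = np.asarray(Image.open(path).convert("RGBA"), dtype=np.float32) / 255.0
        return torch.from_numpy(img).permute(2, 0, 1)  # HWC -> CHW

    def __getitem__(self, i: int):
        name = self.names[i]
        return (
            self._load(os.path.join(self.input_dir, name)),
            self._load(os.path.join(self.target_dir, name)),
        )
```

A `DataLoader` over such a dataset then feeds the optimizer with (raw render, stylized target) batches.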
### 2. Train Network

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --layers 1 \
    --kernel-sizes 3 \
    --epochs 500 \
    --checkpoint-every 50
```

**Multi-layer example (3 layers with varying kernel sizes):**

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --layers 3 \
    --kernel-sizes 3,5,3 \
    --epochs 1000 \
    --checkpoint-every 100
```

**Note:** The training script auto-generates:

- `cnn_weights_generated.wgsl` - weight arrays for all layers
- `cnn_layer.wgsl` - shader with layer switches and original input binding

**Resume from checkpoint:**

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --resume training/checkpoints/checkpoint_epoch_200.pth
```

**Export WGSL from checkpoint (no training):**

```bash
python3 training/train_cnn.py \
    --export-only training/checkpoints/checkpoint_epoch_200.pth \
    --output workspaces/main/shaders/cnn/cnn_weights_generated.wgsl
```

### 3. Rebuild Demo

The training script auto-generates both `cnn_weights_generated.wgsl` and `cnn_layer.wgsl`, so a rebuild picks up the new weights directly:

```bash
cmake --build build -j4
./build/demo64k
```
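To make the contents of the generated file concrete, the sketch below shows how one trained 3×3 layer could be flattened into the `array<mat4x4<f32>, 9>` / `vec4<f32>` layout shown under Weight Storage below. The function name and the column/row convention are assumptions; the real exporter lives in `train_cnn.py` and also handles the layer-0 coordinate weights.

```python
import torch


def conv3x3_to_wgsl(layer: torch.nn.Conv2d, name: str) -> str:
    """Hypothetical exporter for a 4-in/4-out 3x3 convolution with bias.

    Emits one mat4x4 per spatial tap (9 taps) plus a bias vec4. The
    column-major ordering below is an assumption -- it must match how
    cnn_conv3x3.wgsl applies the weights.
    """
    w = layer.weight.detach()  # shape [4 out, 4 in, 3, 3]
    b = layer.bias.detach()    # shape [4]
    mats = []
    for ky in range(3):
        for kx in range(3):
            m = w[:, :, ky, kx]  # 4x4: rows = out channels, cols = in channels
            cols = ", ".join(
                "vec4<f32>({:.6f}, {:.6f}, {:.6f}, {:.6f})".format(*m[:, c].tolist())
                for c in range(4)  # one column per input channel (assumed)
            )
            mats.append(f"mat4x4<f32>({cols})")
    body = ",\n    ".join(mats)
    bias = "vec4<f32>({:.6f}, {:.6f}, {:.6f}, {:.6f})".format(*b.tolist())
    return (
        f"const rgba_weights_{name}: array<mat4x4<f32>, 9> = array(\n    {body}\n);\n"
        f"const bias_{name} = {bias};\n"
    )
```

Running this per layer and concatenating the results would approximate the shape of `cnn_weights_generated.wgsl`, though the exact formatting is the training script's responsibility.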
---

## Usage

### C++ Integration

**Single layer (manual):**

```cpp
#include "gpu/effects/cnn_effect.h"

CNNEffectParams p;
p.layer_index = 0;
p.total_layers = 1;
p.blend_amount = 1.0f;

auto cnn = std::make_shared<CNNEffect>(ctx, p);
timeline.add_effect(cnn, start_time, end_time);
```

**Multi-layer (automatic via timeline compiler):**

Use the timeline syntax - `seq_compiler` expands it to multiple instances.

### Timeline Examples

**Single-layer CNN (full stylization):**

```
SEQUENCE 10.0 0
EFFECT + Hybrid3DEffect 0.00 5.00
EFFECT + CNNEffect 0.50 5.00 layers=1
```

**Multi-layer CNN with blend:**

```
SEQUENCE 10.0 0
EFFECT + Hybrid3DEffect 0.00 5.00
EFFECT + CNNEffect 0.50 5.00 layers=3 blend=0.7
```

Expands to:

```cpp
// Layer 0 (captures original, blend=1.0)
{
  CNNEffectParams p;
  p.layer_index = 0;
  p.total_layers = 3;
  p.blend_amount = 1.0f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 1);
}
// Layer 1 (blend=1.0)
{
  CNNEffectParams p;
  p.layer_index = 1;
  p.total_layers = 3;
  p.blend_amount = 1.0f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 2);
}
// Layer 2 (final blend=0.7)
{
  CNNEffectParams p;
  p.layer_index = 2;
  p.total_layers = 3;
  p.blend_amount = 0.7f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 3);
}
```

---

## Shader Structure

**Bindings:**

```wgsl
@group(0) @binding(0) var smplr: sampler;
@group(0) @binding(1) var txt: texture_2d<f32>;             // Current layer input
@group(0) @binding(2) var<uniform> uniforms: CommonUniforms;
@group(0) @binding(3) var<uniform> params: CNNLayerParams;
@group(0) @binding(4) var original_input: texture_2d<f32>;  // Layer 0 input (captured)
```

**Fragment shader logic:**

```wgsl
@fragment
fn fs_main(@builtin(position) p: vec4<f32>) -> @location(0) vec4<f32> {
    let uv = p.xy / uniforms.resolution;
    let input = textureSample(txt, smplr, uv);                // Layer N-1 output
    let original = textureSample(original_input, smplr, uv);  // Layer 0 input

    var result = vec4(0.0);
    if (params.layer_index == 0) {
        result = cnn_conv3x3_with_coord(txt, smplr, uv, uniforms.resolution,
                                        rgba_weights_layer0, coord_weights_layer0,
                                        bias_layer0);
        result = cnn_tanh(result);
    }
    // ... other layers

    // Blend with ORIGINAL input (not previous layer)
    return mix(original, result, params.blend_amount);
}
```

**Weight Storage:**

**Layer 0 (coordinate-aware):**

```wgsl
const rgba_weights_layer0: array<mat4x4<f32>, 9> = array(...);
const coord_weights_layer0 = mat2x4<f32>(
     0.1, -0.2, 0.0, 0.0,   // x-coord weights
    -0.1,  0.0, 0.2, 0.0    // y-coord weights
);
const bias_layer0 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
```

**Layers 1+ (standard):**

```wgsl
const weights_layer1: array<mat4x4<f32>, 9> = array(...);
const bias_layer1 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
```

---

## Size Budget

| Component | Size | Notes |
|-----------|------|-------|
| Activation functions | ~200 B | 4 functions |
| Conv3x3 (standard + coord) | ~500 B | Both variants |
| Conv5x5 (standard + coord) | ~700 B | Both variants |
| Conv7x7 (standard + coord) | ~900 B | Both variants |
| Main shader | ~800 B | Layer composition |
| C++ implementation | ~300 B | Effect class |
| **Coord weights** | **+32 B** | Per-layer overhead (layer 0 only) |
| **RGBA weights** | **2-6 KB** | Depends on depth/kernel sizes |
| **Total** | **5-9 KB** | Acceptable for 64k |

**Optimization strategies:**

- Quantize weights (float32 → int8)
- Prune near-zero weights
- Use separable convolutions

---

## Testing

```bash
./build/test_demo_effects   # CNN construction/shader tests
./build/demo64k             # Visual test
```

---

## Blend Parameter Behavior

**blend_amount** controls the final compositing with the original:

- `blend=0.0`: Pure original (no CNN effect)
- `blend=0.5`: 50% original + 50% CNN
- `blend=1.0`: Pure CNN output (full stylization)

**Important:** The blend uses the captured layer 0 input, not the previous layer's output.

**Example use cases:**

- `blend=1.0`: Full stylization (default)
- `blend=0.7`: Subtle effect preserving original details
- `blend=0.3`: Light artistic touch

## Troubleshooting

**Shader compilation fails:**

- Check `cnn_weights_generated.wgsl` syntax
- Verify snippets are registered in `shaders.cc::InitShaderComposer()`
- Ensure `cnn_layer.wgsl` has 5 bindings (including `original_input`)

**Black/corrupted output:**

- Weights untrained (identity placeholder)
- Check that the `captured_frame` auxiliary texture is registered
- Verify layer priorities in the timeline are sequential

**Wrong blend result:**

- Ensure layer 0 has `needs_framebuffer_capture() == true`
- Check MainSequence framebuffer capture logic
- Verify the `original_input` binding is populated

**Training loss not decreasing:**

- Lower learning rate (`--learning-rate 0.0001`)
- More epochs (`--epochs 1000`)
- Check input/target image alignment

---

## References

- **Training Script:** `training/train_cnn.py`
- **Shader Composition:** `doc/SEQUENCE.md`
- **Effect System:** `src/gpu/effect.h`
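As a companion to the "Training loss not decreasing" checklist in Troubleshooting, here is a small hedged helper for its last item (input/target alignment). The paths follow the Training Workflow layout; the script itself is illustrative and not part of the repository.

```python
import os

from PIL import Image


def check_training_pairs(input_dir="training/input", target_dir="training/output"):
    """Report input images with a missing or size-mismatched stylized target."""
    problems = 0
    for name in sorted(os.listdir(input_dir)):
        target = os.path.join(target_dir, name)
        if not os.path.isfile(target):
            print(f"missing target: {name}")
            problems += 1
            continue
        in_size = Image.open(os.path.join(input_dir, name)).size
        out_size = Image.open(target).size
        if in_size != out_size:
            print(f"size mismatch: {name} {in_size} vs {out_size}")
            problems += 1
    print(f"{problems} problem(s) found")


if __name__ == "__main__":
    check_training_pairs()
```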