Neural network-based stylization for rendered scenes.

## Overview

Trainable convolutional neural network layers post-process the rendered scene for artistic stylization (painterly, sketch, cel-shaded effects) with minimal runtime overhead.

**Key Features:**
- Position-aware layer 0 (coordinate input for vignetting and edge effects)
- Multi-layer convolutions (3×3, 5×5, 7×7 kernels) with automatic chaining
- Original input available to all layers via framebuffer capture
- Configurable final blend with the original scene
- Modular WGSL shader architecture
- Hardcoded weights (trained offline via PyTorch)
- ~5-8 KB binary footprint

---

## Architecture

### RGBD → Grayscale Pipeline

- **Input:** RGBD (RGB + inverse depth, D = 1/z)
- **Output:** grayscale (1 channel)
- **Layer input:** 7 channels = [RGBD, UV coords, grayscale], all normalized to [-1, 1]

**Architecture:**
- **Inner layers (0..N-2):** Conv2d(7→4), outputting RGBD
- **Final layer (N-1):** Conv2d(7→1), outputting grayscale

```wgsl
// Inner layers: 7→4 (RGBD output)
fn cnn_conv3x3_7to4(
    tex: texture_2d<f32>,
    samp: sampler,
    uv: vec2<f32>,
    resolution: vec2<f32>,
    original: vec4<f32>,                // original RGBD in [0,1]
    weights: array<array<f32, 8>, 36>   // 9 pos × 4 out × (7 weights + bias)
) -> vec4<f32>

// Final layer: 7→1 (grayscale output)
fn cnn_conv3x3_7to1(
    tex: texture_2d<f32>,
    samp: sampler,
    uv: vec2<f32>,
    resolution: vec2<f32>,
    original: vec4<f32>,
    weights: array<array<f32, 8>, 9>    // 9 pos × (7 weights + bias)
) -> f32
```

**Input normalization (all to [-1,1]):**
- RGBD: `(rgbd - 0.5) * 2`
- UV coords: `(uv - 0.5) * 2`
- Grayscale: `(0.2126*R + 0.7152*G + 0.0722*B - 0.5) * 2`

**Activation:** tanh on inner layers, none on the final layer
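The same 7-channel assembly has to be mirrored on the training side. Below is a minimal PyTorch sketch, assuming a `(B, 4, H, W)` RGBD tensor in [0, 1]; the helper name `make_layer_input` is illustrative, not taken from `train_cnn.py`:

```python
import torch

def make_layer_input(rgbd: torch.Tensor) -> torch.Tensor:
    """Assemble the 7-channel layer input: [RGBD, U, V, grayscale] in [-1, 1].

    rgbd: (B, 4, H, W) in [0, 1], channel 3 holding inverse depth (1/z).
    """
    b, _, h, w = rgbd.shape
    rgbd_n = (rgbd - 0.5) * 2.0  # RGBD: (rgbd - 0.5) * 2
    # UV coordinate planes in [0, 1] (pixel-center offsets ignored for brevity).
    v, u = torch.meshgrid(torch.linspace(0.0, 1.0, h),
                          torch.linspace(0.0, 1.0, w), indexing="ij")
    uv_n = (torch.stack([u, v]).expand(b, 2, h, w) - 0.5) * 2.0
    # Rec. 709 luma, then the same [-1, 1] normalization.
    r, g, bl = rgbd[:, 0:1], rgbd[:, 1:2], rgbd[:, 2:3]
    gray_n = (0.2126 * r + 0.7152 * g + 0.0722 * bl - 0.5) * 2.0
    return torch.cat([rgbd_n, uv_n, gray_n], dim=1)  # (B, 7, H, W)
```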
### Multi-Layer Architecture

CNNEffect supports multi-layer networks via automatic effect chaining:

1. **Timeline specifies total layers:** `CNNEffect layers=3 blend=0.7`
2. **Compiler expands to a chain:** 3 separate CNNEffect instances (layer 0 → 1 → 2)
3. **Framebuffer capture:** layer 0 captures the original input to `"captured_frame"`
4. **Original input binding:** all layers access the original via `@binding(4)`
5. **Final blend:** the last layer blends its result with the original: `mix(original, result, 0.7)`

**Framebuffer Capture API:**
- `Effect::needs_framebuffer_capture()` - the effect requests a pre-capture
- MainSequence automatically blits the input into the `"captured_frame"` auxiliary texture
- Generic mechanism, usable by any effect

### File Structure

```
src/gpu/effects/
  cnn_effect.h/cc             # CNNEffect class + framebuffer capture

workspaces/main/shaders/cnn/
  cnn_activation.wgsl         # tanh, ReLU, sigmoid, leaky_relu
  cnn_conv3x3.wgsl            # 3×3 convolution (standard + coord-aware)
  cnn_conv5x5.wgsl            # 5×5 convolution (standard + coord-aware)
  cnn_conv7x7.wgsl            # 7×7 convolution (standard + coord-aware)
  cnn_weights_generated.wgsl  # weight arrays (auto-generated by train_cnn.py)
  cnn_layer.wgsl              # main shader with layer switches (auto-generated by train_cnn.py)
```

---

## Training Workflow

### 1. Prepare Training Data

Collect input/target image pairs:

- **Input:** RGBA (RGB + depth in the alpha channel, D = 1/z)
- **Target:** grayscale stylized output

```bash
training/input/img_000.png   # RGBA render (RGB + depth)
training/output/img_000.png  # grayscale target
```

**Note:** Input images must be RGBA with alpha = inverse depth (1/z).

### 2. Train Network

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --layers 1 \
    --kernel-sizes 3 \
    --epochs 500 \
    --checkpoint-every 50
```

**Multi-layer example (3 layers with varying kernel sizes):**
```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --layers 3 \
    --kernel-sizes 3,5,3 \
    --epochs 1000 \
    --checkpoint-every 100
```
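The network those flags describe maps directly onto the architecture above. As a mental model only (the actual definition lives in `train_cnn.py`), a PyTorch sketch reusing `make_layer_input` from the earlier snippet:

```python
import torch
import torch.nn as nn

class StylizeCNN(nn.Module):
    """Sketch of the RGBD→grayscale net for --layers 3 --kernel-sizes 3,5,3."""

    def __init__(self, kernel_sizes=(3, 5, 3)):
        super().__init__()
        # Inner layers: Conv2d(7→4) with tanh, same-size padding.
        self.inner = nn.ModuleList(
            nn.Conv2d(7, 4, k, padding=k // 2) for k in kernel_sizes[:-1])
        # Final layer: Conv2d(7→1), no activation.
        k_last = kernel_sizes[-1]
        self.final = nn.Conv2d(7, 1, k_last, padding=k_last // 2)

    def forward(self, rgbd: torch.Tensor) -> torch.Tensor:
        x = rgbd  # (B, 4, H, W) in [0, 1]
        for conv in self.inner:
            x = torch.tanh(conv(make_layer_input(x)))
            x = (x + 1.0) * 0.5  # back to [0, 1] RGBD (assumption)
        return self.final(make_layer_input(x))  # (B, 1, H, W) grayscale
```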
**Note:** The training script auto-generates:
- `cnn_weights_generated.wgsl` - weight arrays for all layers
- `cnn_layer.wgsl` - shader with layer switches and the original-input binding

**Resume from checkpoint:**
```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --resume training/checkpoints/checkpoint_epoch_200.pth
```

**Export WGSL from a checkpoint (no training):**
```bash
python3 training/train_cnn.py \
    --export-only training/checkpoints/checkpoint_epoch_200.pth \
    --output workspaces/main/shaders/cnn/cnn_weights_generated.wgsl
```

### 3. Rebuild Demo

Since the training script already generated both `cnn_weights_generated.wgsl` and `cnn_layer.wgsl`, a rebuild picks up the new weights:
```bash
cmake --build build -j4
./build/demo64k
```

---

## Usage

### C++ Integration

**Single layer (manual):**
```cpp
#include "gpu/effects/cnn_effect.h"

CNNEffectParams p;
p.layer_index = 0;
p.total_layers = 1;
p.blend_amount = 1.0f;
auto cnn = std::make_shared<CNNEffect>(ctx, p);
timeline.add_effect(cnn, start_time, end_time);
```

**Multi-layer (automatic via the timeline compiler):**

Use the timeline syntax below; `seq_compiler` expands it into multiple effect instances.

### Timeline Examples

**Single-layer CNN (full stylization):**
```
SEQUENCE 10.0 0
  EFFECT + Hybrid3DEffect 0.00 5.00
  EFFECT + CNNEffect 0.50 5.00 layers=1
```

**Multi-layer CNN with blend:**
```
SEQUENCE 10.0 0
  EFFECT + Hybrid3DEffect 0.00 5.00
  EFFECT + CNNEffect 0.50 5.00 layers=3 blend=0.7
```

The multi-layer form expands to:
```cpp
// Layer 0 (captures original, blend=1.0)
{
  CNNEffectParams p;
  p.layer_index = 0;
  p.total_layers = 3;
  p.blend_amount = 1.0f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 1);
}
// Layer 1 (blend=1.0)
{
  CNNEffectParams p;
  p.layer_index = 1;
  p.total_layers = 3;
  p.blend_amount = 1.0f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 2);
}
// Layer 2 (final blend=0.7)
{
  CNNEffectParams p;
  p.layer_index = 2;
  p.total_layers = 3;
  p.blend_amount = 0.7f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 3);
}
```

---

## Shader Structure

**Bindings:**
```wgsl
@group(0) @binding(0) var smplr: sampler;
@group(0) @binding(1) var txt: texture_2d<f32>;               // current layer input
@group(0) @binding(2) var<uniform> uniforms: CommonUniforms;
@group(0) @binding(3) var<uniform> params: CNNLayerParams;
@group(0) @binding(4) var original_input: texture_2d<f32>;    // layer 0 input (captured)
```

**Fragment shader logic:**
```wgsl
@fragment fn fs_main(@builtin(position) p: vec4<f32>) -> @location(0) vec4<f32> {
    let uv = p.xy / uniforms.resolution;
    let input = textureSample(txt, smplr, uv);                // layer N-1 output
    let original = textureSample(original_input, smplr, uv);  // layer 0 input

    var result = vec4<f32>(0.0);

    if (params.layer_index == 0) {
        result = cnn_conv3x3_with_coord(txt, smplr, uv, uniforms.resolution,
                                        rgba_weights_layer0, coord_weights_layer0,
                                        bias_layer0);
        result = cnn_tanh(result);
    }
    // ... other layers

    // Blend with the ORIGINAL input (not the previous layer).
    return mix(original, result, params.blend_amount);
}
```

**Weight Storage:**

Inner layers (7→4, RGBD output):
```wgsl
// Structure: array<array<f32, 8>, 36>
// 9 positions × 4 output channels, each entry holding 7 weights + bias.
const weights_layer0: array<array<f32, 8>, 36> = array(
    array<f32, 8>(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0),  // pos0, ch0
    array<f32, 8>(w1_r, w1_g, w1_b, w1_d, w1_u, w1_v, w1_gray, bias1),  // pos0, ch1
    // ... 34 more entries
);
```

Final layer (7→1, grayscale output):
```wgsl
// Structure: array<array<f32, 8>, 9>
// 9 positions, each entry holding 7 weights + bias.
const weights_layerN: array<array<f32, 8>, 9> = array(
    array<f32, 8>(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0),  // pos0
    // ... 8 more entries
);
```
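The flattening from a trained `nn.Conv2d(7, 4, kernel_size=3)` into the 36-entry layout above can be sketched as follows. The ordering (kernel position outer, output channel inner, bias repeated per position) is inferred from the comments above; the actual emitter is in `train_cnn.py`:

```python
import torch.nn as nn

def flatten_conv_7to4(conv: nn.Conv2d) -> list[list[float]]:
    """Flatten Conv2d(7, 4, 3) into 36 rows of [7 weights, bias] for WGSL."""
    w = conv.weight.detach()  # (out_ch=4, in_ch=7, ky=3, kx=3)
    b = conv.bias.detach()    # (4,)
    rows = []
    for ky in range(3):               # 9 kernel positions, row-major
        for kx in range(3):
            for out_ch in range(4):   # 4 output channels per position
                rows.append(w[out_ch, :, ky, kx].tolist() + [b[out_ch].item()])
    return rows  # 36 entries × 8 floats, emitted as array<f32, 8>(...) literals
```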
---

| Component | Size | Notes |
|-----------|------|-------|
| Activation functions | ~200 B | 4 functions |
| Conv3x3 (standard + coord) | ~500 B | Both variants |
| Conv5x5 (standard + coord) | ~700 B | Both variants |
| Conv7x7 (standard + coord) | ~900 B | Both variants |
| Main shader | ~800 B | Layer composition |
| C++ implementation | ~300 B | Effect class |
| **Coord weights** | **+32 B** | Per-layer overhead (layer 0 only) |
| **RGBA weights** | **2-6 KB** | Depends on depth/kernel sizes |
| **Total** | **5-9 KB** | Acceptable for a 64k demo |

**Optimization strategies:**
- Quantize weights (float32 → int8); see the sketch below
- Prune near-zero weights
- Use separable convolutions
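As an illustration of the first strategy, a simple symmetric per-tensor int8 scheme; whether the project packs weights this way is not specified, so treat this as a sketch:

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: w ≈ q * scale, q in [-127, 127]."""
    scale = max(float(np.abs(weights).max()) / 127.0, 1e-12)
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

# Dequantize when emitting WGSL (or at runtime): w_approx = q * scale.
```

Storing `q` plus one `scale` per tensor cuts the weight payload roughly 4× before compression.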
---

## Testing

```bash
./build/test_demo_effects  # CNN construction/shader tests
./build/demo64k            # visual test
```

---

## Blend Parameter Behavior

**blend_amount** controls the final compositing with the original:
- `blend=0.0`: Pure original (no CNN effect)
- `blend=0.5`: 50% original + 50% CNN
- `blend=1.0`: Pure CNN output (full stylization)

**Important:** The blend uses the captured layer 0 input, not the previous layer's output.

**Example use cases:**
- `blend=1.0`: Full stylization (default)
- `blend=0.7`: Subtle effect preserving original details
- `blend=0.3`: Light artistic touch
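Spelled out, the final composite is a single lerp against the captured frame (a sketch of the math, not project code):

```python
def final_blend(original, cnn_out, blend):
    """mix(original, result, blend): the reference is the layer 0 capture,
    so intermediate layer outputs never enter the composite directly."""
    return (1.0 - blend) * original + blend * cnn_out

# blend=0.7 keeps 30% of the captured original in every pixel.
```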
## Troubleshooting

**Shader compilation fails:**
- Check `cnn_weights_generated.wgsl` syntax
- Verify the snippets are registered in `shaders.cc::InitShaderComposer()`
- Ensure `cnn_layer.wgsl` declares 5 bindings (including `original_input`)

**Black/corrupted output:**
- Weights are untrained (identity placeholder)
- Check that the `captured_frame` auxiliary texture is registered
- Verify that layer priorities in the timeline are sequential

**Wrong blend result:**
- Ensure layer 0 has `needs_framebuffer_capture() == true`
- Check the MainSequence framebuffer capture logic
- Verify that the `original_input` binding is populated

**Training loss not decreasing:**
- Lower the learning rate (`--learning-rate 0.0001`)
- Train for more epochs (`--epochs 1000`)
- Check input/target image alignment

---

## References

- **Training Script:** `training/train_cnn.py`
- **Shader Composition:** `doc/SEQUENCE.md`
- **Effect System:** `src/gpu/effect.h`