Diffstat (limited to 'doc/CNN_EFFECT.md')
 doc/CNN_EFFECT.md | 376 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 255 insertions(+), 121 deletions(-)
diff --git a/doc/CNN_EFFECT.md b/doc/CNN_EFFECT.md
index 9045739..b7d157f 100644
--- a/doc/CNN_EFFECT.md
+++ b/doc/CNN_EFFECT.md
@@ -6,157 +6,279 @@ Neural network-based stylization for rendered scenes.
## Overview
-The CNN effect applies trainable convolutional neural network layers to post-process 3D rendered output, enabling artistic stylization (e.g., painterly, sketch, cel-shaded effects) with minimal runtime overhead.
+Trainable convolutional layers post-process the rendered scene to apply artistic stylization (painterly, sketch, cel-shaded) with minimal runtime overhead.
**Key Features:**
-- Multi-layer convolutions (3×3, 5×5, 7×7 kernels)
+- Position-aware layer 0 (coordinate input for vignetting, edge effects)
+- Multi-layer convolutions (3×3, 5×5, 7×7 kernels) with automatic chaining
+- Original input available to all layers via framebuffer capture
+- Configurable final blend with original scene
- Modular WGSL shader architecture
-- Hardcoded weights (trained offline)
-- Residual connections for stable learning
+- Hardcoded weights (trained offline via PyTorch)
- ~5-8 KB binary footprint
---
## Architecture
-### File Structure
-
-```
-src/gpu/effects/
- cnn_effect.h # CNNEffect class
- cnn_effect.cc # Implementation
+### RGBD → Grayscale Pipeline
-workspaces/main/shaders/cnn/
- cnn_activation.wgsl # Activation functions (tanh, ReLU, sigmoid, leaky_relu)
- cnn_conv3x3.wgsl # 3×3 convolution
- cnn_conv5x5.wgsl # 5×5 convolution
- cnn_conv7x7.wgsl # 7×7 convolution
- cnn_weights_generated.wgsl # Weight arrays (generated by training script)
- cnn_layer.wgsl # Main shader (composes above snippets)
-```
+**Input:** RGBD (RGB + inverse depth D=1/z)
+**Output:** Grayscale (1 channel)
+**Layer Input:** 7 channels = [RGBD, UV coords, grayscale] all normalized to [-1,1]
-### Shader Composition
+**Architecture:**
+- **Inner layers (0..N-2):** Conv2d(7→4), outputs RGBD
+- **Final layer (N-1):** Conv2d(7→1), outputs grayscale
-`cnn_layer.wgsl` uses `#include` directives (resolved by `ShaderComposer`):
```wgsl
-#include "common_uniforms"
-#include "cnn_activation"
-#include "cnn_conv3x3"
-#include "cnn_weights_generated"
+// Inner layers: 7→4 (RGBD output)
+fn cnn_conv3x3_7to4(
+ tex: texture_2d<f32>,
+ samp: sampler,
+ uv: vec2<f32>,
+ resolution: vec2<f32>,
+ original: vec4<f32>, // Original RGBD [0,1]
+ weights: array<array<f32, 8>, 36> // 9 pos × 4 out × (7 weights + bias)
+) -> vec4<f32>
+
+// Final layer: 7→1 (grayscale output)
+fn cnn_conv3x3_7to1(
+ tex: texture_2d<f32>,
+ samp: sampler,
+ uv: vec2<f32>,
+ resolution: vec2<f32>,
+ original: vec4<f32>,
+ weights: array<array<f32, 8>, 9> // 9 pos × (7 weights + bias)
+) -> f32
```
----
+**Input normalization (all to [-1,1]):**
+- RGBD: `(rgbd - 0.5) * 2`
+- UV coords: `(uv - 0.5) * 2`
+- Grayscale: `(0.2126*R + 0.7152*G + 0.0722*B - 0.5) * 2`
-## Usage
+**Activation:** tanh for inner layers, none for final layer
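+
+For training parity, the PyTorch side must see the same 7-channel, [-1,1]-normalized input that the shader builds. A minimal sketch of such a model (module and helper names here are illustrative assumptions, not necessarily what `train_cnn.py` defines):
+
+```python
+import torch
+import torch.nn as nn
+
+def make_layer_input(rgbd: torch.Tensor) -> torch.Tensor:
+    """Build the 7-channel input in [-1,1] from an RGBD batch (N,4,H,W) in [0,1]."""
+    n, _, h, w = rgbd.shape
+    gray = 0.2126 * rgbd[:, 0:1] + 0.7152 * rgbd[:, 1:2] + 0.0722 * rgbd[:, 2:3]
+    ys, xs = torch.meshgrid(torch.linspace(0.0, 1.0, h),
+                            torch.linspace(0.0, 1.0, w), indexing="ij")
+    uv = torch.stack([xs, ys]).to(rgbd).expand(n, 2, h, w)
+    feats = torch.cat([rgbd, uv, gray], dim=1)  # (N,7,H,W): RGBD, UV, gray
+    return (feats - 0.5) * 2.0                  # normalize everything to [-1,1]
+
+class StylizeCNN(nn.Module):
+    """Inner layers: Conv2d(7,4) + tanh. Final layer: Conv2d(7,1), no activation."""
+    def __init__(self, kernel_sizes=(3,)):
+        super().__init__()
+        ks = list(kernel_sizes)
+        self.inner = nn.ModuleList(nn.Conv2d(7, 4, k, padding=k // 2) for k in ks[:-1])
+        self.final = nn.Conv2d(7, 1, ks[-1], padding=ks[-1] // 2)
+
+    def forward(self, rgbd: torch.Tensor) -> torch.Tensor:  # rgbd in [0,1]
+        for conv in self.inner:
+            # Assumes tanh output is remapped to [0,1], mirroring how
+            # intermediate textures are stored between shader passes.
+            rgbd = torch.tanh(conv(make_layer_input(rgbd))) * 0.5 + 0.5
+        return self.final(make_layer_input(rgbd))           # raw grayscale
+```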
-### C++ Integration
+### Multi-Layer Architecture
-```cpp
-#include "gpu/effects/cnn_effect.h"
+CNNEffect supports multi-layer networks via automatic effect chaining:
-// Create effect (1 layer for now, expandable to 4)
-auto cnn = std::make_shared<CNNEffect>(ctx, /*num_layers=*/1);
+1. **Timeline specifies total layers**: `CNNEffect layers=3 blend=0.7`
+2. **Compiler expands to chain**: 3 separate CNNEffect instances (layer 0→1→2)
+3. **Framebuffer capture**: Layer 0 captures original input to `"captured_frame"`
+4. **Original input binding**: All layers access original via `@binding(4)`
+5. **Final blend**: Last layer blends result with original: `mix(original, result, 0.7)`
-// Add to timeline
-timeline.add_effect(cnn, start_time, end_time);
-```
+**Framebuffer Capture API:**
+- `Effect::needs_framebuffer_capture()` - effect requests pre-capture
+- MainSequence automatically blits input → `"captured_frame"` auxiliary texture
+- Generic mechanism usable by any effect
-### Timeline Example
+### File Structure
```
-SEQUENCE 10.0 0
- EFFECT CNNEffect 10.0 15.0 0 # Apply CNN stylization for 5 seconds
+src/gpu/effects/
+ cnn_effect.h/cc # CNNEffect class + framebuffer capture
+
+workspaces/main/shaders/cnn/
+ cnn_activation.wgsl # tanh, ReLU, sigmoid, leaky_relu
+ cnn_conv3x3.wgsl # 3×3 convolution (standard + coord-aware)
+ cnn_conv5x5.wgsl # 5×5 convolution (standard + coord-aware)
+ cnn_conv7x7.wgsl # 7×7 convolution (standard + coord-aware)
+ cnn_weights_generated.wgsl # Weight arrays (auto-generated by train_cnn.py)
+ cnn_layer.wgsl # Main shader with layer switches (auto-generated by train_cnn.py)
```
---
-## Training Workflow (Planned)
+## Training Workflow
+
+### 1. Prepare Training Data
+
+Collect input/target image pairs:
+- **Input:** RGBA (RGB + depth as alpha channel, D=1/z)
+- **Target:** Grayscale stylized output
-**Step 1: Prepare Training Data**
```bash
-# Collect before/after image pairs
-# - Before: Raw 3D render
-# - After: Target artistic style (hand-painted, filtered, etc.)
+training/input/img_000.png # RGBA render (RGB + depth)
+training/output/img_000.png # Grayscale target
```
-**Step 2: Train Network**
+**Note:** Input images must be RGBA, where alpha = inverse depth (1/z).
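+
+A sketch of packing such an input pair, assuming the renderer can dump linear depth (the paths and the max-based normalization are assumptions, not a documented convention):
+
+```python
+import numpy as np
+from PIL import Image
+
+def pack_input(rgb: np.ndarray, z: np.ndarray, path: str) -> None:
+    """Pack an RGB render (H,W,3 in [0,1]) + linear depth z into RGBA (alpha = 1/z)."""
+    inv_depth = 1.0 / np.maximum(z, 1e-6)   # D = 1/z
+    inv_depth /= inv_depth.max()            # scale into [0,1] (assumed convention)
+    rgba = np.dstack([rgb, inv_depth])
+    Image.fromarray((rgba * 255.0).astype(np.uint8)).save(path)
+
+# pack_input(rgb, z, "training/input/img_000.png")
+# The grayscale target goes to training/output/img_000.png.
+```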
+
+### 2. Train Network
+
```bash
-python scripts/train_cnn.py \
- --input rendered_scene.png \
- --target stylized_scene.png \
+python3 training/train_cnn.py \
+ --input training/input \
+ --target training/output \
+ --layers 1 \
+ --kernel-sizes 3 \
+ --epochs 500 \
+ --checkpoint-every 50
+```
+
+**Multi-layer example (3 layers with varying kernel sizes):**
+```bash
+python3 training/train_cnn.py \
+ --input training/input \
+ --target training/output \
--layers 3 \
- --kernel_sizes 3,5,3 \
- --epochs 100
+ --kernel-sizes 3,5,3 \
+ --epochs 1000 \
+ --checkpoint-every 100
+```
+
+**Note:** The training script auto-generates:
+- `cnn_weights_generated.wgsl` - weight arrays for all layers
+- `cnn_layer.wgsl` - shader with layer switches and original input binding
+
+**Resume from checkpoint:**
+```bash
+python3 training/train_cnn.py \
+ --input training/input \
+ --target training/output \
+ --resume training/checkpoints/checkpoint_epoch_200.pth
```
-**Step 3: Export Weights**
-```python
-# scripts/train_cnn.py automatically generates:
-# workspaces/main/shaders/cnn/cnn_weights_generated.wgsl
+**Export WGSL from checkpoint (no training):**
+```bash
+python3 training/train_cnn.py \
+ --export-only training/checkpoints/checkpoint_epoch_200.pth \
+ --output workspaces/main/shaders/cnn/cnn_weights_generated.wgsl
```
-**Step 4: Rebuild**
+### 3. Rebuild Demo
+
+The training script auto-generates both `cnn_weights_generated.wgsl` and `cnn_layer.wgsl`:
```bash
cmake --build build -j4
+./build/demo64k
```
---
-## Implementation Details
+## Usage
+
+### C++ Integration
-### Convolution Function Signature
+**Single layer (manual):**
+```cpp
+#include "gpu/effects/cnn_effect.h"
-```wgsl
-fn cnn_conv3x3(
- tex: texture_2d<f32>,
- samp: sampler,
- uv: vec2<f32>,
- resolution: vec2<f32>,
- weights: array<mat4x4<f32>, 9>, # 9 samples × 4×4 matrix
- bias: vec4<f32>
-) -> vec4<f32>
+CNNEffectParams p;
+p.layer_index = 0;
+p.total_layers = 1;
+p.blend_amount = 1.0f;
+auto cnn = std::make_shared<CNNEffect>(ctx, p);
+timeline.add_effect(cnn, start_time, end_time);
```
-- Samples 9 pixels (3×3 neighborhood)
-- Applies 4×4 weight matrix per sample (RGBA channels)
-- Returns weighted sum + bias (pre-activation)
+**Multi-layer (automatic via timeline compiler):**
-### Weight Storage
+Use the timeline syntax below; `seq_compiler` expands it into one CNNEffect instance per layer (see Timeline Examples).
-Weights are stored as WGSL constants:
-```wgsl
-const weights_layer0: array<mat4x4<f32>, 9> = array(
- mat4x4<f32>(1.0, 0.0, 0.0, 0.0, ...), # Center pixel
- mat4x4<f32>(0.0, 0.0, 0.0, 0.0, ...), # Neighbor 1
- // ... 7 more matrices
-);
-const bias_layer0 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
+### Timeline Examples
+
+**Single-layer CNN (full stylization):**
+```
+SEQUENCE 10.0 0
+ EFFECT + Hybrid3DEffect 0.00 5.00
+ EFFECT + CNNEffect 0.50 5.00 layers=1
```
-### Residual Connection
+**Multi-layer CNN with blend:**
+```
+SEQUENCE 10.0 0
+ EFFECT + Hybrid3DEffect 0.00 5.00
+ EFFECT + CNNEffect 0.50 5.00 layers=3 blend=0.7
+```
-Final layer adds original input:
-```wgsl
-if (params.use_residual != 0) {
- let input = textureSample(txt, smplr, uv);
- result = input + result * 0.3; # Blend 30% stylization
+Expands to:
+```cpp
+// Layer 0 (captures original, blend=1.0)
+{
+ CNNEffectParams p;
+ p.layer_index = 0;
+ p.total_layers = 3;
+ p.blend_amount = 1.0f;
+ seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 1);
+}
+// Layer 1 (blend=1.0)
+{
+ CNNEffectParams p;
+ p.layer_index = 1;
+ p.total_layers = 3;
+ p.blend_amount = 1.0f;
+ seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 2);
+}
+// Layer 2 (final blend=0.7)
+{
+ CNNEffectParams p;
+ p.layer_index = 2;
+ p.total_layers = 3;
+ p.blend_amount = 0.7f;
+ seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 3);
}
```
---
-## Multi-Layer Rendering (Future)
+## Shader Structure
-For N layers, use ping-pong textures:
+**Bindings:**
+```wgsl
+@group(0) @binding(0) var smplr: sampler;
+@group(0) @binding(1) var txt: texture_2d<f32>; // Current layer input
+@group(0) @binding(2) var<uniform> uniforms: CommonUniforms;
+@group(0) @binding(3) var<uniform> params: CNNLayerParams;
+@group(0) @binding(4) var original_input: texture_2d<f32>; // Layer 0 input (captured)
+```
+
+**Fragment shader logic:**
+```wgsl
+@fragment fn fs_main(@builtin(position) p: vec4<f32>) -> @location(0) vec4<f32> {
+ let uv = p.xy / uniforms.resolution;
+ // Conv helpers sample txt (the previous layer's output) internally.
+ let original = textureSample(original_input, smplr, uv); // Layer 0 input
+
+ var result = vec4<f32>(0.0);
+
+ if (params.layer_index == 0) {
+ result = cnn_conv3x3_7to4(txt, smplr, uv, uniforms.resolution,
+ original, weights_layer0);
+ result = cnn_tanh(result);
+ }
+ // ... other layers
+ // Blend with ORIGINAL input (not previous layer)
+ return mix(original, result, params.blend_amount);
+}
```
-Pass 0: input → temp_a (conv + activate)
-Pass 1: temp_a → temp_b (conv + activate)
-Pass 2: temp_b → temp_a (conv + activate)
-Pass 3: temp_a → screen (conv + activate + residual)
+
+**Weight Storage:**
+
+**Inner layers (7→4 RGBD output):**
+```wgsl
+// Structure: array<array<f32, 8>, 36>
+// 9 positions × 4 output channels, each with 7 weights + bias
+const weights_layer0: array<array<f32, 8>, 36> = array(
+ array<f32, 8>(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0), // pos0_ch0
+ array<f32, 8>(w1_r, w1_g, w1_b, w1_d, w1_u, w1_v, w1_gray, bias1), // pos0_ch1
+ // ... 34 more entries
+);
```
-**Current Status:** Single-layer implementation. Multi-pass infrastructure ready but not exposed.
+**Final layer (7→1 grayscale output):**
+```wgsl
+// Structure: array<array<f32, 8>, 9>
+// 9 positions, each with 7 weights + bias
+const weights_layerN: array<array<f32, 8>, 9> = array(
+ array<f32, 8>(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0), // pos0
+ // ... 8 more entries
+);
+```
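+
+For reference, the position-major row order (row = pos × output channels + channel) could be emitted from a trained `nn.Conv2d(7, 4, 3)` as below. This is a sketch of the layout only, not `train_cnn.py` itself; splitting the bias evenly across the 9 taps is an assumption about how the per-row bias entries recombine:
+
+```python
+def export_7to4_wgsl(conv, name: str) -> str:
+    """Emit array<array<f32, 8>, 36>: 9 taps (row-major) x 4 output channels."""
+    w = conv.weight.detach().numpy()     # shape (4 out, 7 in, 3, 3)
+    b = conv.bias.detach().numpy()       # shape (4,)
+    rows = []
+    for pos in range(9):                 # 3x3 neighborhood, row-major taps
+        ky, kx = divmod(pos, 3)
+        for ch in range(4):
+            vals = list(w[ch, :, ky, kx]) + [b[ch] / 9.0]  # bias per tap (assumption)
+            rows.append("  array<f32, 8>(" + ", ".join(f"{v:.6f}" for v in vals) + "),")
+    return f"const {name}: array<array<f32, 8>, 36> = array(\n" + "\n".join(rows) + "\n);"
+```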
---
@@ -164,60 +286,72 @@ Pass 3: temp_a → screen (conv + activate + residual)
| Component | Size | Notes |
|-----------|------|-------|
-| `cnn_activation.wgsl` | ~200 B | 4 activation functions |
-| `cnn_conv3x3.wgsl` | ~400 B | 3×3 convolution logic |
-| `cnn_conv5x5.wgsl` | ~600 B | 5×5 convolution logic |
-| `cnn_conv7x7.wgsl` | ~800 B | 7×7 convolution logic |
-| `cnn_layer.wgsl` | ~800 B | Main shader |
-| `cnn_effect.cc` | ~300 B | C++ implementation |
-| **Weights (variable)** | **2-6 KB** | Depends on network depth/width |
-| **Total** | **5-9 KB** | Acceptable for 64k demo |
+| Activation functions | ~200 B | 4 functions |
+| Conv3x3 (standard + coord) | ~500 B | Both variants |
+| Conv5x5 (standard + coord) | ~700 B | Both variants |
+| Conv7x7 (standard + coord) | ~900 B | Both variants |
+| Main shader | ~800 B | Layer composition |
+| C++ implementation | ~300 B | Effect class |
+| **Coord weights** | **+32 B** | Layer 0 only |
+| **RGBA weights** | **2-6 KB** | Depends on depth/kernel sizes |
+| **Total** | **5-9 KB** | Acceptable for 64k |
-**Optimization Strategies:**
+**Optimization strategies** (weight quantization sketched below):
- Quantize weights (float32 → int8)
- Prune near-zero weights
-- Share weights across layers
-- Use separable convolutions (not yet implemented)
+- Use separable convolutions
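+
+A sketch of the first strategy, symmetric per-layer int8 quantization (dequantized once when the WGSL constants are emitted; not an existing pipeline feature):
+
+```python
+import numpy as np
+
+def quantize_int8(weights: np.ndarray):
+    """float32 -> int8 plus a single per-layer scale (~4x smaller payload)."""
+    scale = float(np.abs(weights).max()) / 127.0
+    q = np.round(weights / scale).astype(np.int8)
+    return q, scale
+
+def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
+    return q.astype(np.float32) * scale  # run once when emitting WGSL constants
+```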
---
## Testing
```bash
-# Run effect test
-./build/test_demo_effects
-
-# Visual test in demo
-./build/demo64k # CNN appears in timeline if added
+./build/test_demo_effects # CNN construction/shader tests
+./build/demo64k # Visual test
```
-**Test Coverage:**
-- Construction/initialization
-- Shader compilation
-- Bind group creation
-- Render pass execution
-
---
+## Blend Parameter Behavior
+
+**blend_amount** controls final compositing with original:
+- `blend=0.0`: Pure original (no CNN effect)
+- `blend=0.5`: 50% original + 50% CNN
+- `blend=1.0`: Pure CNN output (full stylization)
+
+**Important:** The blend uses the captured layer 0 input, not the previous layer's output.
+
+**Example use cases:**
+- `blend=1.0`: Full stylization (default)
+- `blend=0.7`: Subtle effect preserving original details
+- `blend=0.3`: Light artistic touch
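+
+The chain-and-blend semantics in a few lines of pseudocode (`layer` stands for one CNNEffect pass):
+
+```python
+def apply_cnn_chain(original, layers, blend):
+    """Every layer sees the captured original; only the final result is blended."""
+    x = original                       # layer 0 input == captured frame
+    for layer in layers:
+        x = layer(x, original)         # original arrives via @binding(4)
+    return (1.0 - blend) * original + blend * x   # mix(original, result, blend)
+```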
+
## Troubleshooting
**Shader compilation fails:**
- Check `cnn_weights_generated.wgsl` syntax
-- Verify all snippets registered in `shaders.cc::InitShaderComposer()`
+- Verify snippets registered in `shaders.cc::InitShaderComposer()`
+- Ensure `cnn_layer.wgsl` has 5 bindings (including `original_input`)
**Black/corrupted output:**
-- Weights likely untrained (using placeholder identity)
-- Check residual blending factor (0.3 default)
+- Weights untrained (identity placeholder)
+- Check `captured_frame` auxiliary texture is registered
+- Verify layer priorities in timeline are sequential
+
+**Wrong blend result:**
+- Ensure layer 0 has `needs_framebuffer_capture() == true`
+- Check MainSequence framebuffer capture logic
+- Verify `original_input` binding is populated
-**Performance issues:**
-- Reduce kernel sizes (7×7 → 3×3)
-- Decrease layer count
-- Profile with `--hot-reload` to measure frame time
+**Training loss not decreasing:**
+- Lower learning rate (`--learning-rate 0.0001`)
+- More epochs (`--epochs 1000`)
+- Check input/target image alignment
---
## References
-- **Shader Composition:** `doc/SEQUENCE.md` (shader parameters)
-- **Effect System:** `src/gpu/effect.h` (Effect base class)
-- **Training (external):** TensorFlow/PyTorch CNN tutorials
+- **Training Script:** `training/train_cnn.py`
+- **Shader Composition:** `doc/SEQUENCE.md`
+- **Effect System:** `src/gpu/effect.h`