# CNN Post-Processing Effect

Neural network-based stylization for rendered scenes.

---

## Overview

Trainable convolutional neural network layers for artistic stylization (painterly, sketch, cel-shaded effects) with minimal runtime overhead.

**Key Features:**

- Position-aware layer 0 (coordinate input for vignetting, edge effects)
- Multi-layer convolutions (3×3, 5×5, 7×7 kernels) with automatic chaining
- Original input available to all layers via framebuffer capture
- Configurable final blend with the original scene
- Modular WGSL shader architecture
- Hardcoded weights (trained offline via PyTorch)
- ~5-8 KB binary footprint

---

## Architecture

### RGBD → Grayscale Pipeline

**Input:** RGBD (RGB + inverse depth D=1/z)
**Output:** Grayscale (1 channel)
**Layer Input:** 7 channels = [RGBD, UV coords, grayscale], all normalized to [-1,1]

**Architecture:**

- **Inner layers (0..N-2):** Conv2d(7→4) - output RGBD
- **Final layer (N-1):** Conv2d(7→1) - output grayscale

```wgsl
// Inner layers: 7→4 (RGBD output)
fn cnn_conv3x3_7to4(
    tex: texture_2d<f32>,
    samp: sampler,
    uv: vec2<f32>,
    resolution: vec2<f32>,
    original: vec4<f32>,               // Original RGBD [-1,1]
    weights: array<array<f32, 8>, 36>  // 9 pos × 4 out × (7 weights + bias)
) -> vec4<f32>

// Final layer: 7→1 (grayscale output)
fn cnn_conv3x3_7to1(
    tex: texture_2d<f32>,
    samp: sampler,
    uv: vec2<f32>,
    resolution: vec2<f32>,
    original: vec4<f32>,
    weights: array<array<f32, 8>, 9>   // 9 pos × (7 weights + bias)
) -> f32
```

**Input normalization:**

- **fs_main** normalizes textures once: `(tex - 0.5) * 2` → [-1,1]
- **Conv functions** normalize UV coords: `(uv - 0.5) * 2` → [-1,1]
- **Grayscale** computed from normalized RGBD: `0.2126*R + 0.7152*G + 0.0722*B`
- **Inter-layer data** stays in [-1,1] (no denormalization)
- **Final output** denormalized for display: `(result + 1.0) * 0.5` → [0,1]

**Activation:** tanh for inner layers (output stays [-1,1]), none for the final layer

### Multi-Layer Architecture

CNNEffect supports multi-layer networks via automatic effect chaining:

1. **Timeline specifies total layers**: `CNNEffect layers=3 blend=0.7`
2. **Compiler expands to chain**: 3 separate CNNEffect instances (layer 0→1→2)
3. **Framebuffer capture**: Layer 0 captures the original input to `"captured_frame"`
4. **Original input binding**: All layers access the original via `@binding(4)`
5. **Final blend**: The last layer blends its result with the original: `mix(original, result, 0.7)`

**Framebuffer Capture API:**

- `Effect::needs_framebuffer_capture()` - effect requests pre-capture
- MainSequence automatically blits input → `"captured_frame"` auxiliary texture
- Generic mechanism usable by any effect

### File Structure

```
src/gpu/effects/
  cnn_effect.h/cc              # CNNEffect class + framebuffer capture

workspaces/main/shaders/cnn/
  cnn_activation.wgsl          # tanh, ReLU, sigmoid, leaky_relu
  cnn_conv3x3.wgsl             # 3×3 convolution (standard + coord-aware)
  cnn_conv5x5.wgsl             # 5×5 convolution (standard + coord-aware)
  cnn_conv7x7.wgsl             # 7×7 convolution (standard + coord-aware)
  cnn_weights_generated.wgsl   # Weight arrays (auto-generated by train_cnn.py)
  cnn_layer.wgsl               # Main shader with layer switches (auto-generated by train_cnn.py)
```

---

## Training Workflow

### 1. Prepare Training Data

Collect input/target image pairs:

- **Input:** RGBA (RGB + depth as alpha channel, D=1/z)
- **Target:** Grayscale stylized output

```bash
training/input/img_000.png    # RGBA render (RGB + depth)
training/output/img_000.png   # Grayscale target
```

**Note:** Input images must be RGBA where alpha = inverse depth (1/z).
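As a reference for how these conventions fit together, here is a minimal NumPy/Pillow sketch that assembles the 7-channel, [-1,1]-normalized layer input ([R, G, B, D, U, V, gray]) from one RGBA training image, following the normalization rules listed under Architecture. The helper name `build_layer_input` is hypothetical and is not part of `train_cnn.py`.

```python
# Minimal sketch (not part of train_cnn.py): build the 7-channel, [-1,1]
# normalized layer input [R, G, B, D, U, V, gray] from an RGBA training image.
import numpy as np
from PIL import Image

def build_layer_input(path: str) -> np.ndarray:
    """Return an array of shape (7, H, W) in [-1, 1]."""
    rgba = np.asarray(Image.open(path).convert("RGBA"), dtype=np.float32) / 255.0
    h, w, _ = rgba.shape

    # RGB + inverse depth (alpha channel), normalized to [-1, 1].
    rgbd = (rgba - 0.5) * 2.0

    # UV coordinates, normalized the same way as in the conv functions.
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    uv = np.stack([(u / (w - 1) - 0.5) * 2.0,
                   (v / (h - 1) - 0.5) * 2.0]).astype(np.float32)

    # Grayscale computed from the already-normalized RGB channels.
    gray = 0.2126 * rgbd[..., 0] + 0.7152 * rgbd[..., 1] + 0.0722 * rgbd[..., 2]

    return np.concatenate([rgbd.transpose(2, 0, 1), uv, gray[None]], axis=0)

# Example (hypothetical path):
#   x = build_layer_input("training/input/img_000.png")   # shape (7, H, W)
```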
### 2. Train Network

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --layers 1 \
    --kernel-sizes 3 \
    --epochs 500 \
    --checkpoint-every 50
```

**Multi-layer example (3 layers with varying kernel sizes):**

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --layers 3 \
    --kernel-sizes 3,5,3 \
    --epochs 1000 \
    --checkpoint-every 100
```

**Note:** The training script auto-generates:

- `cnn_weights_generated.wgsl` - weight arrays for all layers
- `cnn_layer.wgsl` - shader with layer switches and original input binding

**Resume from checkpoint:**

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --resume training/checkpoints/checkpoint_epoch_200.pth
```

**Export WGSL from checkpoint (no training):**

```bash
python3 training/train_cnn.py \
    --export-only training/checkpoints/checkpoint_epoch_200.pth \
    --output workspaces/main/shaders/cnn/cnn_weights_generated.wgsl
```

**Generate ground truth (for shader validation):**

```bash
python3 training/train_cnn.py \
    --infer training/input/img_000.png \
    --export-only training/checkpoints/checkpoint_epoch_200.pth \
    --output training/ground_truth.png
```

### 3. Rebuild Demo

The training script auto-generates both `cnn_weights_generated.wgsl` and `cnn_layer.wgsl`, so a rebuild picks them up directly:

```bash
cmake --build build -j4
./build/demo64k
```
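To make the exported weight layout concrete, the sketch below shows how a trained PyTorch `Conv2d(7, 4, kernel_size=3)` could be flattened into the `array<array<f32, 8>, 36>` ordering described in the Weight Storage section below (9 kernel positions × 4 output channels, each entry holding 7 input-channel weights plus the per-channel bias, repeated per position as in that example). The helper `conv7to4_to_wgsl` is illustrative only; the authoritative exporter is `train_cnn.py`.

```python
# Illustrative sketch -- the real exporter is train_cnn.py. Flattens a trained
# PyTorch Conv2d(7, 4, kernel_size=3) into the array<array<f32, 8>, 36> layout
# used by cnn_weights_generated.wgsl: 9 kernel positions x 4 output channels,
# each entry = 7 weights + bias (bias repeated per position, matching the
# Weight Storage example in this document).
import torch.nn as nn

def conv7to4_to_wgsl(conv: nn.Conv2d, name: str = "weights_layer0") -> str:
    w = conv.weight.detach().numpy()   # shape (4, 7, 3, 3): out, in, ky, kx
    b = conv.bias.detach().numpy()     # shape (4,)
    rows = []
    for ky in range(3):                # kernel positions, row-major
        for kx in range(3):
            for c in range(4):         # output channels
                vals = list(w[c, :, ky, kx]) + [b[c]]   # 7 weights + bias
                body = ", ".join(f"{v:.6f}" for v in vals)
                rows.append(f"    array({body})")
    return (f"const {name}: array<array<f32, 8>, 36> = array(\n"
            + ",\n".join(rows) + "\n);")

# Usage (hypothetical):
#   layer = nn.Conv2d(7, 4, kernel_size=3, padding=1)
#   print(conv7to4_to_wgsl(layer))
```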
---

## Usage

### C++ Integration

**Single layer (manual):**

```cpp
#include "gpu/effects/cnn_effect.h"

CNNEffectParams p;
p.layer_index = 0;
p.total_layers = 1;
p.blend_amount = 1.0f;
auto cnn = std::make_shared<CNNEffect>(ctx, p);
timeline.add_effect(cnn, start_time, end_time);
```

**Multi-layer (automatic via timeline compiler):**

Use the timeline syntax; `seq_compiler` expands it into multiple effect instances.

### Timeline Examples

**Single-layer CNN (full stylization):**

```
SEQUENCE 10.0 0
EFFECT + Hybrid3DEffect 0.00 5.00
EFFECT + CNNEffect 0.50 5.00 layers=1
```

**Multi-layer CNN with blend:**

```
SEQUENCE 10.0 0
EFFECT + Hybrid3DEffect 0.00 5.00
EFFECT + CNNEffect 0.50 5.00 layers=3 blend=0.7
```

Expands to:

```cpp
// Layer 0 (captures original, blend=1.0)
{
  CNNEffectParams p;
  p.layer_index = 0;
  p.total_layers = 3;
  p.blend_amount = 1.0f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 1);
}
// Layer 1 (blend=1.0)
{
  CNNEffectParams p;
  p.layer_index = 1;
  p.total_layers = 3;
  p.blend_amount = 1.0f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 2);
}
// Layer 2 (final blend=0.7)
{
  CNNEffectParams p;
  p.layer_index = 2;
  p.total_layers = 3;
  p.blend_amount = 0.7f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 3);
}
```

---

## Shader Structure

**Bindings:**

```wgsl
@group(0) @binding(0) var smplr: sampler;
@group(0) @binding(1) var txt: texture_2d<f32>;             // Current layer input
@group(0) @binding(2) var<uniform> uniforms: CommonUniforms;
@group(0) @binding(3) var<uniform> params: CNNLayerParams;
@group(0) @binding(4) var original_input: texture_2d<f32>;  // Layer 0 input (captured)
```

**Fragment shader logic:**

```wgsl
@fragment
fn fs_main(@builtin(position) p: vec4<f32>) -> @location(0) vec4<f32> {
    let uv = p.xy / uniforms.resolution;
    let input = textureSample(txt, smplr, uv);                // Layer N-1 output
    let original = textureSample(original_input, smplr, uv);  // Layer 0 input

    var result = vec4<f32>(0.0);
    if (params.layer_index == 0) {
        result = cnn_conv3x3_with_coord(txt, smplr, uv, uniforms.resolution,
                                        rgba_weights_layer0, coord_weights_layer0,
                                        bias_layer0);
        result = cnn_tanh(result);
    }
    // ... other layers

    // Blend with ORIGINAL input (not previous layer)
    return mix(original, result, params.blend_amount);
}
```

**Weight Storage:**

**Inner layers (7→4 RGBD output):**

```wgsl
// Structure: array<array<f32, 8>, 36>
// 9 positions × 4 output channels, each with 7 weights + bias
const weights_layer0: array<array<f32, 8>, 36> = array(
    array(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0),  // pos0_ch0
    array(w1_r, w1_g, w1_b, w1_d, w1_u, w1_v, w1_gray, bias1),  // pos0_ch1
    // ... 34 more entries
);
```

**Final layer (7→1 grayscale output):**

```wgsl
// Structure: array<array<f32, 8>, 9>
// 9 positions, each with 7 weights + bias
const weights_layerN: array<array<f32, 8>, 9> = array(
    array(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0),  // pos0
    // ... 8 more entries
);
```

---

## Size Budget

| Component | Size | Notes |
|-----------|------|-------|
| Activation functions | ~200 B | 4 functions |
| Conv3x3 (standard + coord) | ~500 B | Both variants |
| Conv5x5 (standard + coord) | ~700 B | Both variants |
| Conv7x7 (standard + coord) | ~900 B | Both variants |
| Main shader | ~800 B | Layer composition |
| C++ implementation | ~300 B | Effect class |
| **Coord weights** | **+32 B** | Per-layer overhead (layer 0 only) |
| **RGBA weights** | **2-6 KB** | Depends on depth/kernel sizes |
| **Total** | **5-9 KB** | Acceptable for 64k |

**Optimization strategies:**

- Quantize weights (float32 → int8)
- Prune near-zero weights
- Use separable convolutions

---

## Testing

```bash
./build/test_demo_effects   # CNN construction/shader tests
./build/demo64k             # Visual test
```

---

## Blend Parameter Behavior

**blend_amount** controls the final compositing with the original:

- `blend=0.0`: Pure original (no CNN effect)
- `blend=0.5`: 50% original + 50% CNN
- `blend=1.0`: Pure CNN output (full stylization)

**Important:** The blend uses the captured layer 0 input, not the previous layer's output.

**Example use cases:**

- `blend=1.0`: Full stylization (default)
- `blend=0.7`: Subtle effect preserving original details
- `blend=0.3`: Light artistic touch

---

## Troubleshooting

**Shader compilation fails:**

- Check `cnn_weights_generated.wgsl` syntax
- Verify snippets are registered in `shaders.cc::InitShaderComposer()`
- Ensure `cnn_layer.wgsl` has 5 bindings (including `original_input`)

**Black/corrupted output:**

- Weights untrained (identity placeholder)
- Check that the `captured_frame` auxiliary texture is registered
- Verify that layer priorities in the timeline are sequential

**Wrong blend result:**

- Ensure layer 0 has `needs_framebuffer_capture() == true`
- Check the MainSequence framebuffer capture logic
- Verify the `original_input` binding is populated

**Training loss not decreasing:**

- Lower the learning rate (`--learning-rate 0.0001`)
- Train for more epochs (`--epochs 1000`)
- Check input/target image alignment

---

## References

- **Training Script:** `training/train_cnn.py`
- **Shader Composition:** `doc/SEQUENCE.md`
- **Effect System:** `src/gpu/effect.h`