# CNN Post-Processing Effect

Neural network-based stylization for rendered scenes.

---

## Overview

Trainable convolutional neural network layers for artistic stylization (painterly, sketch, cel-shaded effects) with minimal runtime overhead.

**Key Features:**

- Position-aware layer 0 (coordinate input for vignetting, edge effects)
- Multi-layer convolutions (3×3, 5×5, 7×7 kernels) with automatic chaining
- Original input available to all layers via framebuffer capture
- Configurable final blend with the original scene
- Modular WGSL shader architecture
- Hardcoded weights (trained offline via PyTorch)
- ~5-8 KB binary footprint

---

## Architecture

### Coordinate-Aware Layer 0

Layer 0 accepts normalized (x,y) patch-center coordinates alongside the RGBA samples:

```wgsl
fn cnn_conv3x3_with_coord(
    tex: texture_2d<f32>,
    samp: sampler,
    uv: vec2<f32>,                        // Center position [0,1]
    resolution: vec2<f32>,
    rgba_weights: array<mat4x4<f32>, 9>,  // 9 samples × 4×4 matrix
    coord_weights: mat2x4<f32>,           // 2 coords → 4 outputs
    bias: vec4<f32>
) -> vec4<f32>
```

**Input structure:** 9 RGBA samples (36 values) + 1 xy coordinate (2 values) = 38 inputs → 4 outputs

**Size impact:** +32 B coord weights, kernel-agnostic

**Use cases:** Position-dependent stylization (vignettes, corner darkening, radial gradients)

### Multi-Layer Architecture

CNNEffect supports multi-layer networks via automatic effect chaining:

1. **Timeline specifies total layers**: `CNNEffect layers=3 blend=0.7`
2. **Compiler expands to chain**: 3 separate CNNEffect instances (layer 0→1→2)
3. **Framebuffer capture**: Layer 0 captures the original input to `"captured_frame"`
4. **Original input binding**: All layers access the original via `@binding(4)`
5. **Final blend**: The last layer blends its result with the original: `mix(original, result, 0.7)`

**Framebuffer Capture API:**

- `Effect::needs_framebuffer_capture()` - effect requests pre-capture
- MainSequence automatically blits input → `"captured_frame"` auxiliary texture
- Generic mechanism usable by any effect

### File Structure

```
src/gpu/effects/
  cnn_effect.h/cc              # CNNEffect class + framebuffer capture

workspaces/main/shaders/cnn/
  cnn_activation.wgsl          # tanh, ReLU, sigmoid, leaky_relu
  cnn_conv3x3.wgsl             # 3×3 convolution (standard + coord-aware)
  cnn_conv5x5.wgsl             # 5×5 convolution (standard + coord-aware)
  cnn_conv7x7.wgsl             # 7×7 convolution (standard + coord-aware)
  cnn_weights_generated.wgsl   # Weight arrays (auto-generated by train_cnn.py)
  cnn_layer.wgsl               # Main shader with layer switches (auto-generated by train_cnn.py)
```

---

## Training Workflow

### 1. Prepare Training Data

Collect input/target image pairs:

- **Input:** Raw 3D render
- **Target:** Artistic style (hand-painted, filtered, stylized)

```bash
training/input/img_000.png    # Raw render
training/output/img_000.png   # Stylized target
```

Use `image_style_processor.py` to generate targets:

```bash
python3 training/image_style_processor.py input/ output/ pencil_sketch
```
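For readers wiring up their own data pipeline, a minimal sketch of a paired-image dataset follows, assuming PyTorch and the folder layout above. The `PairedImageDataset` class name, the RGBA conversion, and the [0,1] normalization are illustrative assumptions, not necessarily how `train_cnn.py` loads data internally.

```python
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class PairedImageDataset(Dataset):
    """Hypothetical loader: pairs training/input/<name> with training/output/<name>."""

    def __init__(self, input_dir: str, target_dir: str):
        self.input_dir = input_dir
        self.target_dir = target_dir
        # Keep only files present in both folders, matched by filename.
        self.names = sorted(
            n for n in os.listdir(input_dir)
            if os.path.isfile(os.path.join(target_dir, n))
        )

    def __len__(self) -> int:
        return len(self.names)

    @staticmethod
    def _load(path: str) -> torch.Tensor:
        # 4 channels to mirror the RGBA pipeline; values scaled to [0, 1].
        img = np.asarray(Image.open(path).convert("RGBA"), dtype=np.float32) / 255.0
        return torch.from_numpy(img).permute(2, 0, 1)  # HWC -> CHW

    def __getitem__(self, i: int):
        name = self.names[i]
        return (
            self._load(os.path.join(self.input_dir, name)),
            self._load(os.path.join(self.target_dir, name)),
        )
```

A `DataLoader` over such a dataset then feeds the optimizer with (raw render, stylized target) batches.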
### 2. Train Network

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --layers 1 \
    --kernel-sizes 3 \
    --epochs 500 \
    --checkpoint-every 50
```

**Multi-layer example (3 layers with varying kernel sizes):**

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --layers 3 \
    --kernel-sizes 3,5,3 \
    --epochs 1000 \
    --checkpoint-every 100
```

**Note:** The training script auto-generates:

- `cnn_weights_generated.wgsl` - weight arrays for all layers
- `cnn_layer.wgsl` - shader with layer switches and original input binding

**Resume from checkpoint:**

```bash
python3 training/train_cnn.py \
    --input training/input \
    --target training/output \
    --resume training/checkpoints/checkpoint_epoch_200.pth
```

**Export WGSL from checkpoint (no training):**

```bash
python3 training/train_cnn.py \
    --export-only training/checkpoints/checkpoint_epoch_200.pth \
    --output workspaces/main/shaders/cnn/cnn_weights_generated.wgsl
```

### 3. Rebuild Demo

The training script auto-generates both `cnn_weights_generated.wgsl` and `cnn_layer.wgsl`, so a rebuild picks up the new weights directly:

```bash
cmake --build build -j4
./build/demo64k
```
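To make the contents of the generated file concrete, the sketch below shows how one trained 3×3 layer could be flattened into the `array<mat4x4<f32>, 9>` / `vec4<f32>` layout shown under Weight Storage below. The function name and the column/row convention are assumptions; the real exporter lives in `train_cnn.py` and also handles the layer-0 coordinate weights.

```python
import torch


def conv3x3_to_wgsl(layer: torch.nn.Conv2d, name: str) -> str:
    """Hypothetical exporter for a 4-in/4-out 3x3 convolution with bias.

    Emits one mat4x4 per spatial tap (9 taps) plus a bias vec4. The
    column-major ordering below is an assumption -- it must match how
    cnn_conv3x3.wgsl applies the weights.
    """
    w = layer.weight.detach()  # shape [4 out, 4 in, 3, 3]
    b = layer.bias.detach()    # shape [4]
    mats = []
    for ky in range(3):
        for kx in range(3):
            m = w[:, :, ky, kx]  # 4x4: rows = out channels, cols = in channels
            cols = ", ".join(
                "vec4<f32>({:.6f}, {:.6f}, {:.6f}, {:.6f})".format(*m[:, c].tolist())
                for c in range(4)  # one column per input channel (assumed)
            )
            mats.append(f"mat4x4<f32>({cols})")
    body = ",\n    ".join(mats)
    bias = "vec4<f32>({:.6f}, {:.6f}, {:.6f}, {:.6f})".format(*b.tolist())
    return (
        f"const rgba_weights_{name}: array<mat4x4<f32>, 9> = array(\n    {body}\n);\n"
        f"const bias_{name} = {bias};\n"
    )
```

Running this per layer and concatenating the results would approximate the shape of `cnn_weights_generated.wgsl`, though the exact formatting is the training script's responsibility.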
---

## Usage

### C++ Integration

**Single layer (manual):**

```cpp
#include "gpu/effects/cnn_effect.h"

CNNEffectParams p;
p.layer_index = 0;
p.total_layers = 1;
p.blend_amount = 1.0f;

auto cnn = std::make_shared<CNNEffect>(ctx, p);
timeline.add_effect(cnn, start_time, end_time);
```

**Multi-layer (automatic via timeline compiler):**

Use the timeline syntax - `seq_compiler` expands it to multiple instances.

### Timeline Examples

**Single-layer CNN (full stylization):**

```
SEQUENCE 10.0 0
EFFECT + Hybrid3DEffect 0.00 5.00
EFFECT + CNNEffect 0.50 5.00 layers=1
```

**Multi-layer CNN with blend:**

```
SEQUENCE 10.0 0
EFFECT + Hybrid3DEffect 0.00 5.00
EFFECT + CNNEffect 0.50 5.00 layers=3 blend=0.7
```

Expands to:

```cpp
// Layer 0 (captures original, blend=1.0)
{
  CNNEffectParams p;
  p.layer_index = 0;
  p.total_layers = 3;
  p.blend_amount = 1.0f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 1);
}
// Layer 1 (blend=1.0)
{
  CNNEffectParams p;
  p.layer_index = 1;
  p.total_layers = 3;
  p.blend_amount = 1.0f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 2);
}
// Layer 2 (final blend=0.7)
{
  CNNEffectParams p;
  p.layer_index = 2;
  p.total_layers = 3;
  p.blend_amount = 0.7f;
  seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 3);
}
```

---

## Shader Structure

**Bindings:**

```wgsl
@group(0) @binding(0) var smplr: sampler;
@group(0) @binding(1) var txt: texture_2d<f32>;             // Current layer input
@group(0) @binding(2) var<uniform> uniforms: CommonUniforms;
@group(0) @binding(3) var<uniform> params: CNNLayerParams;
@group(0) @binding(4) var original_input: texture_2d<f32>;  // Layer 0 input (captured)
```

**Fragment shader logic:**

```wgsl
@fragment
fn fs_main(@builtin(position) p: vec4<f32>) -> @location(0) vec4<f32> {
    let uv = p.xy / uniforms.resolution;
    let input = textureSample(txt, smplr, uv);                // Layer N-1 output
    let original = textureSample(original_input, smplr, uv);  // Layer 0 input

    var result = vec4(0.0);
    if (params.layer_index == 0) {
        result = cnn_conv3x3_with_coord(txt, smplr, uv, uniforms.resolution,
                                        rgba_weights_layer0, coord_weights_layer0,
                                        bias_layer0);
        result = cnn_tanh(result);
    }
    // ... other layers

    // Blend with ORIGINAL input (not previous layer)
    return mix(original, result, params.blend_amount);
}
```

**Weight Storage:**

**Layer 0 (coordinate-aware):**

```wgsl
const rgba_weights_layer0: array<mat4x4<f32>, 9> = array(...);
const coord_weights_layer0 = mat2x4<f32>(
     0.1, -0.2, 0.0, 0.0,   // x-coord weights
    -0.1,  0.0, 0.2, 0.0    // y-coord weights
);
const bias_layer0 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
```

**Layers 1+ (standard):**

```wgsl
const weights_layer1: array<mat4x4<f32>, 9> = array(...);
const bias_layer1 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
```

---

## Size Budget

| Component | Size | Notes |
|-----------|------|-------|
| Activation functions | ~200 B | 4 functions |
| Conv3x3 (standard + coord) | ~500 B | Both variants |
| Conv5x5 (standard + coord) | ~700 B | Both variants |
| Conv7x7 (standard + coord) | ~900 B | Both variants |
| Main shader | ~800 B | Layer composition |
| C++ implementation | ~300 B | Effect class |
| **Coord weights** | **+32 B** | Per-layer overhead (layer 0 only) |
| **RGBA weights** | **2-6 KB** | Depends on depth/kernel sizes |
| **Total** | **5-9 KB** | Acceptable for 64k |

**Optimization strategies:**

- Quantize weights (float32 → int8)
- Prune near-zero weights
- Use separable convolutions

---

## Testing

```bash
./build/test_demo_effects   # CNN construction/shader tests
./build/demo64k             # Visual test
```

---

## Blend Parameter Behavior

**blend_amount** controls the final compositing with the original:

- `blend=0.0`: Pure original (no CNN effect)
- `blend=0.5`: 50% original + 50% CNN
- `blend=1.0`: Pure CNN output (full stylization)

**Important:** The blend uses the captured layer 0 input, not the previous layer's output.

**Example use cases:**

- `blend=1.0`: Full stylization (default)
- `blend=0.7`: Subtle effect preserving original details
- `blend=0.3`: Light artistic touch

## Troubleshooting

**Shader compilation fails:**

- Check `cnn_weights_generated.wgsl` syntax
- Verify snippets are registered in `shaders.cc::InitShaderComposer()`
- Ensure `cnn_layer.wgsl` has 5 bindings (including `original_input`)

**Black/corrupted output:**

- Weights untrained (identity placeholder)
- Check that the `captured_frame` auxiliary texture is registered
- Verify layer priorities in the timeline are sequential

**Wrong blend result:**

- Ensure layer 0 has `needs_framebuffer_capture() == true`
- Check MainSequence framebuffer capture logic
- Verify the `original_input` binding is populated

**Training loss not decreasing:**

- Lower learning rate (`--learning-rate 0.0001`)
- More epochs (`--epochs 1000`)
- Check input/target image alignment

---

## References

- **Training Script:** `training/train_cnn.py`
- **Shader Composition:** `doc/SEQUENCE.md`
- **Effect System:** `src/gpu/effect.h`
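As a companion to the "Training loss not decreasing" checklist in Troubleshooting, here is a small hedged helper for its last item (input/target alignment). The paths follow the Training Workflow layout; the script itself is illustrative and not part of the repository.

```python
import os

from PIL import Image


def check_training_pairs(input_dir="training/input", target_dir="training/output"):
    """Report input images with a missing or size-mismatched stylized target."""
    problems = 0
    for name in sorted(os.listdir(input_dir)):
        target = os.path.join(target_dir, name)
        if not os.path.isfile(target):
            print(f"missing target: {name}")
            problems += 1
            continue
        in_size = Image.open(os.path.join(input_dir, name)).size
        out_size = Image.open(target).size
        if in_size != out_size:
            print(f"size mismatch: {name} {in_size} vs {out_size}")
            problems += 1
    print(f"{problems} problem(s) found")


if __name__ == "__main__":
    check_training_pairs()
```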