# CNN Post-Processing Effect

Neural network-based stylization for rendered scenes.

---

## Overview

Trainable convolutional neural network layers for artistic stylization (painterly, sketch, cel-shaded effects) with minimal runtime overhead.

**Key Features:**

- Position-aware layer 0 (coordinate input for vignetting, edge effects)
- Multi-layer convolutions (3×3, 5×5, 7×7 kernels)
- Modular WGSL shader architecture
- Hardcoded weights (trained offline via PyTorch)
- Residual connections for stable learning
- ~5-9 KB binary footprint (see Size Budget)

---

## Architecture

### Coordinate-Aware Layer 0

Layer 0 accepts normalized (x, y) patch-center coordinates alongside RGBA samples:

```wgsl
fn cnn_conv3x3_with_coord(
    tex: texture_2d<f32>,
    samp: sampler,
    uv: vec2<f32>,                        // Center position [0,1]
    resolution: vec2<f32>,
    rgba_weights: array<mat4x4<f32>, 9>,  // 9 samples × 4×4 matrix
    coord_weights: mat2x4<f32>,           // 2 coords → 4 outputs
    bias: vec4<f32>
) -> vec4<f32>
```

**Input structure:** 9 RGBA samples (36 values) + 1 xy coordinate (2 values) = 38 inputs → 4 outputs

**Size impact:** +32 B coord weights, kernel-agnostic

**Use cases:** Position-dependent stylization (vignettes, corner darkening, radial gradients)

### File Structure

```
src/gpu/effects/
  cnn_effect.h/cc              # CNNEffect class

workspaces/main/shaders/cnn/
  cnn_activation.wgsl          # tanh, ReLU, sigmoid, leaky_relu
  cnn_conv3x3.wgsl             # 3×3 convolution (standard + coord-aware)
  cnn_conv5x5.wgsl             # 5×5 convolution (standard + coord-aware)
  cnn_conv7x7.wgsl             # 7×7 convolution (standard + coord-aware)
  cnn_weights_generated.wgsl   # Weight arrays (auto-generated)
  cnn_layer.wgsl               # Main shader (composes above snippets)
```

---

## Training Workflow

### 1. Prepare Training Data

Collect input/target image pairs:

- **Input:** Raw 3D render
- **Target:** Artistic style (hand-painted, filtered, stylized)

```bash
training/input/img_000.png    # Raw render
training/output/img_000.png   # Stylized target
```

Use `image_style_processor.py` to generate targets:

```bash
python3 training/image_style_processor.py input/ output/ pencil_sketch
```

### 2. Train Network

```bash
python3 training/train_cnn.py \
  --input training/input \
  --target training/output \
  --layers 1 \
  --kernel-sizes 3 \
  --epochs 500 \
  --checkpoint-every 50
```

**Multi-layer example:**

```bash
python3 training/train_cnn.py \
  --input training/input \
  --target training/output \
  --layers 3 \
  --kernel-sizes 3,5,3 \
  --epochs 1000 \
  --checkpoint-every 100
```

**Resume from checkpoint:**

```bash
python3 training/train_cnn.py \
  --input training/input \
  --target training/output \
  --resume training/checkpoints/checkpoint_epoch_200.pth
```

### 3. Rebuild Demo

The training script auto-generates `cnn_weights_generated.wgsl`:

```bash
cmake --build build -j4
./build/demo64k
```

---

## Usage

### C++ Integration

```cpp
#include "gpu/effects/cnn_effect.h"

auto cnn = std::make_shared<CNNEffect>(ctx, /*num_layers=*/1);
timeline.add_effect(cnn, start_time, end_time);
```

### Timeline Example

```
SEQUENCE 10.0 0
EFFECT CNNEffect 10.0 15.0 0
```

---

## Weight Storage

**Layer 0 (coordinate-aware):**

```wgsl
const rgba_weights_layer0: array<mat4x4<f32>, 9> = array(...);
const coord_weights_layer0 = mat2x4<f32>(
     0.1, -0.2, 0.0, 0.0,  // x-coord weights
    -0.1,  0.0, 0.2, 0.0   // y-coord weights
);
const bias_layer0 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
```

**Layers 1+ (standard):**

```wgsl
const weights_layer1: array<mat4x4<f32>, 9> = array(...);
const bias_layer1 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
```

---

## Size Budget

| Component | Size | Notes |
|-----------|------|-------|
| Activation functions | ~200 B | 4 functions |
| Conv3x3 (standard + coord) | ~500 B | Both variants |
| Conv5x5 (standard + coord) | ~700 B | Both variants |
| Conv7x7 (standard + coord) | ~900 B | Both variants |
| Main shader | ~800 B | Layer composition |
| C++ implementation | ~300 B | Effect class |
| **Coord weights** | **+32 B** | Per-layer overhead (layer 0 only) |
| **RGBA weights** | **2-6 KB** | Depends on depth/kernel sizes |
| **Total** | **5-9 KB** | Acceptable for 64k |

**Optimization strategies:**

- Quantize weights (float32 → int8)
- Prune near-zero weights
- Use separable convolutions

---

## Testing

```bash
./build/test_demo_effects   # CNN construction/shader tests
./build/demo64k             # Visual test
```

---

## Troubleshooting

**Shader compilation fails:**

- Check `cnn_weights_generated.wgsl` syntax
- Verify snippets are registered in `shaders.cc::InitShaderComposer()`

**Black/corrupted output:**

- Weights untrained (identity placeholder)
- Check residual blending (0.3 default)

**Training loss not decreasing:**

- Lower the learning rate (`--learning-rate 0.0001`)
- Train for more epochs (`--epochs 1000`)
- Check input/target image alignment

---

## References

- **Training Script:** `training/train_cnn.py`
- **Shader Composition:** `doc/SEQUENCE.md`
- **Effect System:** `src/gpu/effect.h`
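---

**Sketch: coordinate-aware convolution.** The 38-input → 4-output computation described for `cnn_conv3x3_with_coord` can be sketched numerically in NumPy (the real implementation runs as WGSL on the GPU; the function and variable names here are illustrative, not part of the codebase):

```python
import numpy as np

def conv3x3_with_coord(samples, uv, rgba_weights, coord_weights, bias):
    """9 RGBA samples (36 values) + 1 xy coordinate (2 values) -> 4 outputs.

    samples:       (9, 4)    RGBA values of the 3x3 neighborhood
    uv:            (2,)      normalized patch-center coordinate in [0, 1]
    rgba_weights:  (9, 4, 4) one 4x4 matrix per sample
    coord_weights: (4, 2)    maps the 2 coordinates to 4 outputs
    bias:          (4,)
    """
    acc = bias.astype(np.float64).copy()
    for i in range(9):
        acc += rgba_weights[i] @ samples[i]  # per-sample 4x4 matrix product
    acc += coord_weights @ uv                # position-dependent term
    return np.tanh(acc)                      # one of the listed activations

# Identity-like weights: pass the center sample (index 4 of the 3x3) through.
rgba_w = np.zeros((9, 4, 4))
rgba_w[4] = np.eye(4)
coord_w = np.zeros((4, 2))
bias = np.zeros(4)
samples = np.full((9, 4), 0.5)
out = conv3x3_with_coord(samples, np.array([0.5, 0.5]), rgba_w, coord_w, bias)
```

With nonzero `coord_w`, the same kernel produces position-dependent output, which is what enables vignettes and corner darkening without extra texture inputs.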
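---

**Sketch: weight quantization.** The first size-budget strategy, quantizing weights from float32 to int8, can be illustrated with symmetric per-tensor quantization (one shared scale per tensor). This is a generic sketch, not the scheme the training scripts necessarily use:

```python
import numpy as np

def quantize_int8(weights):
    """float32 -> int8 plus one float scale (4x smaller weight storage)."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float32 weights at load time."""
    return q.astype(np.float32) * scale

# Example: a layer's 9 x 4x4 RGBA weight matrices.
w = np.random.default_rng(0).normal(scale=0.2, size=(9, 4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w2 = dequantize(q, s)
```

Rounding error is bounded by half a quantization step, which is typically invisible after the tanh/ReLU activations and residual blend.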
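---

**Sketch: residual blending.** The troubleshooting section mentions residual blending with a 0.3 default; the usual form of such a blend is a linear mix of the original frame with the stylized output. A minimal sketch, assuming that standard formulation (only the 0.3 factor comes from this document):

```python
import numpy as np

def residual_blend(original, stylized, strength=0.3):
    """out = (1 - strength) * original + strength * stylized.

    strength=0.0 returns the original frame unchanged;
    strength=1.0 shows only the CNN output.
    """
    return (1.0 - strength) * original + strength * stylized

# A 2x2 RGBA frame blended with an all-black stylized result.
frame = np.full((2, 2, 4), 0.8)
cnn_out = np.zeros((2, 2, 4))
blended = residual_blend(frame, cnn_out)
```

This is also a useful debugging knob: if output is black or corrupted, lowering the strength toward 0 should recover the raw render.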
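---

**Sketch: weight generation.** Step 3 of the training workflow relies on the training script emitting `cnn_weights_generated.wgsl`. A hypothetical fragment of such a generator, formatting a trained bias tensor as a WGSL constant (the real script's output format may differ; `emit_vec4_const` is an illustrative name):

```python
import numpy as np

def emit_vec4_const(name, values):
    """Format a length-4 tensor as a WGSL const declaration."""
    body = ", ".join(f"{v:.6f}" for v in values)
    return f"const {name} = vec4<f32>({body});"

line = emit_vec4_const("bias_layer0", np.zeros(4))
```

Emitting weights as source text is what lets them compile into the shader as hardcoded constants, avoiding any runtime weight upload.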