# CNN Post-Processing Effect

Neural network-based stylization for rendered scenes.

---

## Overview

Trainable convolutional neural network layers for artistic stylization (painterly, sketch, cel-shaded effects) with minimal runtime overhead.

**Key Features:**
- Position-aware layer 0 (coordinate input for vignetting, edge effects)
- Multi-layer convolutions (3×3, 5×5, 7×7 kernels)
- Modular WGSL shader architecture
- Hardcoded weights (trained offline via PyTorch)
- Residual connections for stable learning
- ~5-9 KB binary footprint (see Size Budget below)

---

## Architecture

### Coordinate-Aware Layer 0

Layer 0 accepts normalized (x,y) patch center coordinates alongside RGBA samples:

```wgsl
fn cnn_conv3x3_with_coord(
  tex: texture_2d<f32>,
  samp: sampler,
  uv: vec2<f32>,                          // Center position [0,1]
  resolution: vec2<f32>,
  rgba_weights: array<mat4x4<f32>, 9>,    // 9 samples × 4×4 matrix
  coord_weights: mat2x4<f32>,             // 2 coords → 4 outputs
  bias: vec4<f32>
) -> vec4<f32>
```

**Input structure:** 9 RGBA samples (36 values) + 1 xy coordinate (2 values) = 38 inputs → 4 outputs

**Size impact:** +32 B of coordinate weights (layer 0 only), independent of kernel size

**Use cases:** Position-dependent stylization (vignettes, corner darkening, radial gradients)
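
The per-output math of the coord-aware tap, as a NumPy sketch (function and argument names here are illustrative, not from the codebase):

```python
import numpy as np

def conv3x3_with_coord(samples, xy, rgba_weights, coord_weights, bias):
    """9 RGBA samples (36 values) + (x, y) = 38 inputs -> 4 outputs.

    samples:       (9, 4)    RGBA of the 3x3 neighborhood
    xy:            (2,)      normalized patch-center coordinates in [0, 1]
    rgba_weights:  (9, 4, 4) one 4x4 matrix per sample position
    coord_weights: (4, 2)    maps (x, y) to the 4 output channels
    bias:          (4,)
    """
    out = bias.copy()
    for i in range(9):
        out += rgba_weights[i] @ samples[i]  # per-tap 4x4 matrix times RGBA
    out += coord_weights @ xy                # position-dependent term
    return out
```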

### File Structure

```
src/gpu/effects/
  cnn_effect.h/cc         # CNNEffect class

workspaces/main/shaders/cnn/
  cnn_activation.wgsl     # tanh, ReLU, sigmoid, leaky_relu
  cnn_conv3x3.wgsl        # 3×3 convolution (standard + coord-aware)
  cnn_conv5x5.wgsl        # 5×5 convolution (standard + coord-aware)
  cnn_conv7x7.wgsl        # 7×7 convolution (standard + coord-aware)
  cnn_weights_generated.wgsl  # Weight arrays (auto-generated)
  cnn_layer.wgsl          # Main shader (composes above snippets)
```

---

## Training Workflow

### 1. Prepare Training Data

Collect input/target image pairs:
- **Input:** Raw 3D render
- **Target:** Artistic style (hand-painted, filtered, stylized)

```bash
training/input/img_000.png   # Raw render
training/output/img_000.png  # Stylized target
```

Use `image_style_processor.py` to generate targets:
```bash
python3 training/image_style_processor.py input/ output/ pencil_sketch
```
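
The actual transform set lives in `training/image_style_processor.py`; purely as an illustration of what a `pencil_sketch` target might look like, here is the classic OpenCV dodge-blend recipe (an assumption, not the script's actual code):

```python
import cv2

def pencil_sketch(path_in, path_out):
    # Illustrative only: image_style_processor.py defines the real styles.
    img = cv2.imread(path_in)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(255 - gray, (21, 21), 0)
    sketch = cv2.divide(gray, 255 - blurred, scale=256)  # dodge blend
    cv2.imwrite(path_out, sketch)
```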

### 2. Train Network

```bash
python3 training/train_cnn.py \
  --input training/input \
  --target training/output \
  --layers 1 \
  --kernel-sizes 3 \
  --epochs 500 \
  --checkpoint-every 50
```

**Multi-layer example:**
```bash
python3 training/train_cnn.py \
  --input training/input \
  --target training/output \
  --layers 3 \
  --kernel-sizes 3,5,3 \
  --epochs 1000 \
  --checkpoint-every 100
```
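
With stride-1 convolutions (assumed here), stacking kernels of size 3, 5, 3 gives an effective receptive field of 1 + (3−1) + (5−1) + (3−1) = 9 pixels, so deeper stacks widen the area each output pixel can draw from without the cost of a single large kernel.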

**Resume from checkpoint:**
```bash
python3 training/train_cnn.py \
  --input training/input \
  --target training/output \
  --resume training/checkpoints/checkpoint_epoch_200.pth
```
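
The exact model is defined in `training/train_cnn.py`; the PyTorch sketch below only illustrates the shape this document implies (coordinate channels feeding layer 0, residual blend at the end) — class and argument names are illustrative:

```python
import torch
import torch.nn as nn

class StylizeCNN(nn.Module):
    """Sketch: RGBA in, RGBA out, coord-aware layer 0, residual blend."""

    def __init__(self, kernel_sizes=(3,), blend=0.3):
        super().__init__()
        layers = []
        in_ch = 4 + 2  # RGBA + (x, y) coordinate channels for layer 0
        for k in kernel_sizes:
            layers.append(nn.Conv2d(in_ch, 4, k, padding=k // 2))
            layers.append(nn.Tanh())
            in_ch = 4  # layers 1+ see RGBA only
        self.net = nn.Sequential(*layers)
        self.blend = blend

    def forward(self, rgba):  # rgba: (N, 4, H, W)
        n, _, h, w = rgba.shape
        ys, xs = torch.meshgrid(
            torch.linspace(0, 1, h, device=rgba.device),
            torch.linspace(0, 1, w, device=rgba.device),
            indexing="ij")
        coords = torch.stack([xs, ys]).expand(n, 2, h, w)
        out = self.net(torch.cat([rgba, coords], dim=1))
        return torch.lerp(rgba, out, self.blend)  # residual blend, 0.3 default
```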

### 3. Rebuild Demo

The training script auto-generates `cnn_weights_generated.wgsl`; rebuild to pick up the new weights:
```bash
cmake --build build -j4
./build/demo64k
```
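
`train_cnn.py` owns the actual export format; the tensor-to-WGSL translation looks roughly like this (function name illustrative):

```python
def emit_wgsl_weights(weights, bias, layer=1, path="cnn_weights_generated.wgsl"):
    # weights: nine 4x4 float matrices, bias: 4 floats (one standard layer).
    # Note: WGSL matrix constructors are column-major; transpose first if needed.
    mats = ",\n  ".join(
        "mat4x4<f32>(" + ", ".join(f"{v:.6f}" for row in m for v in row) + ")"
        for m in weights)
    with open(path, "w") as f:
        f.write(f"const weights_layer{layer}: array<mat4x4<f32>, 9> = array(\n  {mats});\n")
        f.write(f"const bias_layer{layer} = vec4<f32>("
                + ", ".join(f"{b:.6f}" for b in bias) + ");\n")
```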

---

## Usage

### C++ Integration

```cpp
#include "gpu/effects/cnn_effect.h"

auto cnn = std::make_shared<CNNEffect>(ctx, /*num_layers=*/1);
timeline.add_effect(cnn, start_time, end_time);
```

### Timeline Example

```
SEQUENCE 10.0 0
  EFFECT CNNEffect 10.0 15.0 0
```

---

## Weight Storage

**Layer 0 (coordinate-aware):**
```wgsl
const rgba_weights_layer0: array<mat4x4<f32>, 9> = array(...);
const coord_weights_layer0 = mat2x4<f32>(
  0.1, -0.2, 0.0, 0.0,  // x-coord weights (first column)
  -0.1, 0.0, 0.2, 0.0   // y-coord weights (second column)
);
const bias_layer0 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
```

**Layers 1+ (standard):**
```wgsl
const weights_layer1: array<mat4x4<f32>, 9> = array(...);
const bias_layer1 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
```

---

## Size Budget

| Component | Size | Notes |
|-----------|------|-------|
| Activation functions | ~200 B | 4 functions |
| Conv3x3 (standard + coord) | ~500 B | Both variants |
| Conv5x5 (standard + coord) | ~700 B | Both variants |
| Conv7x7 (standard + coord) | ~900 B | Both variants |
| Main shader | ~800 B | Layer composition |
| C++ implementation | ~300 B | Effect class |
| **Coord weights** | **+32 B** | Per-layer overhead (layer 0 only) |
| **RGBA weights** | **2-6 KB** | Depends on depth/kernel sizes |
| **Total** | **5-9 KB** | Acceptable for 64k |

**Optimization strategies:**
- Quantize weights (float32 → int8; see the sketch after this list)
- Prune near-zero weights
- Use separable convolutions
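
A minimal sketch of the first strategy, symmetric per-tensor int8 quantization (`train_cnn.py` may use a different scheme):

```python
import numpy as np

def quantize_int8(w):
    # float32 weights -> int8 plus one float scale per tensor.
    scale = max(np.abs(w).max() / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize at export: w ≈ q.astype(np.float32) * scale
```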

---

## Testing

```bash
./build/test_demo_effects  # CNN construction/shader tests
./build/demo64k            # Visual test
```

---

## Troubleshooting

**Shader compilation fails:**
- Check `cnn_weights_generated.wgsl` syntax
- Verify snippets registered in `shaders.cc::InitShaderComposer()`

**Black/corrupted output:**
- Weights may still be untrained (identity placeholder)
- Check the residual blend factor (default 0.3)

**Training loss not decreasing:**
- Lower learning rate (`--learning-rate 0.0001`)
- More epochs (`--epochs 1000`)
- Check input/target image alignment

---

## References

- **Training Script:** `training/train_cnn.py`
- **Shader Composition:** `doc/SEQUENCE.md`
- **Effect System:** `src/gpu/effect.h`