# Convolutional Neural Net Shader (CNN) post-processing
**Status:** ✅ Foundation implemented (single-layer, expandable to multi-pass)
## Idea
Process the rendered 3D scene with a small multi-layer CNN trained offline.
Input: a rendered scene.
Output: a 'stylized' scene produced by the CNN post-processing pass.
**See `doc/CNN_EFFECT.md` for implementation details, usage, and API reference.**
## Shader implementation
### input / output
One texture buffer is needed per CNN layer.
Input: (r, g, b, 1/z) for layer 0 (the rendered 3D scene), or the output of layer N-1 for layer N.
Output: (r, g, b, alpha). The 1/z channel is not needed in the output (it can be fetched from the layer-0 input).
### size of one layer
Notation:
S: the number of input samples taken from layer N-1.
Example: a 3×3 neighborhood -> S = 3 × 3 = 9.
Each of the S samples is 4 values (r, g, b, w = 1/z).
Each sample is processed by a mat4 matrix: 4 inputs -> 4 outputs.
Weight matrix = S × mat4.
Final bias: 4 values.
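Under these assumptions the per-layer parameter count is easy to estimate (a sketch; the 3-layer count and f32 storage are assumptions, but the result matches the ~2-4 KB weight budget below):

```python
# Parameter-count estimate for one layer (assumptions: 3x3 taps, f32 weights).
S = 3 * 3                       # input samples per output pixel
weight_floats = S * 16          # one mat4 (16 floats) per sample
bias_floats = 4                 # final bias vector
floats_per_layer = weight_floats + bias_floats
bytes_per_layer = floats_per_layer * 4    # f32 = 4 bytes

print(floats_per_layer)         # 148 floats per layer
print(3 * bytes_per_layer)      # 1776 bytes for a hypothetical 3-layer net
```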
WGSL code example: See file CNN.shader
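The per-pixel math above can also be sketched as a NumPy reference model (this mirrors the shader computation for checking weights on the CPU; it is not the WGSL itself, and the function name is illustrative):

```python
import numpy as np

def cnn_layer_pixel(patch, weights, bias):
    """Evaluate one output pixel of a CNN layer.

    patch:   (S, 4) array of (r, g, b, 1/z) samples from the 3x3 neighborhood.
    weights: (S, 4, 4) array, one mat4 per sample.
    bias:    (4,) bias vector.
    Returns a (4,) output (r, g, b, a).
    """
    acc = bias.copy()
    for s in range(patch.shape[0]):
        acc += weights[s] @ patch[s]   # mat4 * vec4, accumulated over samples
    return acc

# Identity check: center tap = identity matrix, all other taps zero,
# matching the placeholder identity weights mentioned below.
S = 9
w = np.zeros((S, 4, 4))
w[4] = np.eye(4)                       # center of the 3x3 patch
b = np.zeros(4)
patch = np.random.rand(S, 4)
out = cnn_layer_pixel(patch, w, b)
assert np.allclose(out, patch[4])      # identity weights pass the input through
```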
### Layers
Do we need 3 or 4 layers?
One shader per layer (each with its own weights).
Ping-pong the input/output texture buffers between layers?
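One possible ping-pong scheme, sketched in Python with plain lists standing in for the two texture buffers (the two-buffer swap is the point, not the API):

```python
def run_layers(input_values, layer_fns):
    """Two preallocated buffers; each layer reads one and writes the other."""
    bufs = [list(input_values), [0.0] * len(input_values)]
    src = 0
    for layer in layer_fns:
        dst = 1 - src
        for i, v in enumerate(bufs[src]):
            bufs[dst][i] = layer(v)     # per-texel write, like a fragment shader
        src = dst                       # ping-pong: output becomes next input
    return bufs[src]

# Toy per-texel "layers":
double = lambda v: 2 * v
inc = lambda v: v + 1
print(run_layers([1, 2, 3], [double, inc]))  # [3, 5, 7]
```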
## Implementation Status
**Completed:**
- ✅ Modular WGSL shader architecture (6 snippet files)
- ✅ CNNEffect C++ class (single-layer rendering)
- ✅ ShaderComposer integration (#include resolution)
- ✅ Asset registration (7 new shader assets)
- ✅ Test coverage (test_demo_effects.cc)
- ✅ Placeholder identity weights for testing
**Size:** ~3-4 KB shader code + ~2-4 KB weights = **5-8 KB total**
**Pending:**
- ⏳ Training script (`scripts/train_cnn.py`) to generate real weights
- ⏳ Multi-layer rendering with ping-pong textures
- ⏳ Weight quantization for size optimization
---
## Training (To Be Implemented)
The layer weight/bias data are hard-coded in the shaders.
Training workflow:
1. Prepare image pairs (before: raw render, after: target style)
2. Run `python scripts/train_cnn.py --input scene.png --target stylized.png`
3. Script generates `cnn_weights_generated.wgsl`
4. Rebuild: `cmake --build build -j4`
**Reference:** File `CNN.py` contains a training example (needs adaptation).
A repository of reference image pairs (before/after) is needed for training and validation.
Each input image is randomly sampled into 3×3 patches of (r, g, b, 1/z) input samples,
and the network is trained to match the corresponding (r, g, b, a) output.
Training generates the `.wgsl` code for the layers' shaders.
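The random patch sampling described above might look like this (a sketch; it assumes the images are NumPy arrays with the input's 4th channel already holding 1/z):

```python
import numpy as np

def sample_patches(input_img, target_img, n, rng, k=3):
    """Randomly sample n (k*k, 4) input patches and matching target pixels.

    input_img:  (H, W, 4) array of (r, g, b, 1/z) — the raw render.
    target_img: (H, W, 4) array of (r, g, b, a) — the stylized target.
    Returns (patches, targets): shapes (n, k*k, 4) and (n, 4).
    """
    h, w, _ = input_img.shape
    r = k // 2
    ys = rng.integers(r, h - r, size=n)    # patch centers, away from borders
    xs = rng.integers(r, w - r, size=n)
    patches = np.stack([
        input_img[y - r:y + r + 1, x - r:x + r + 1].reshape(k * k, 4)
        for y, x in zip(ys, xs)
    ])
    targets = target_img[ys, xs]           # target pixel at each patch center
    return patches, targets

rng = np.random.default_rng(0)
inp = rng.random((32, 32, 4))
tgt = rng.random((32, 32, 4))
p, t = sample_patches(inp, tgt, 128, rng)
assert p.shape == (128, 9, 4) and t.shape == (128, 4)
```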