# CNN Shader Testing Tool
Standalone tool for validating trained CNN shaders with GPU-to-CPU readback.
---
## Purpose
- Validate trained weights (`cnn_weights_generated.wgsl`) against ground truth
- Debug CNN layer behavior in isolation
- Generate test outputs for patch-based training workflow
- Match Python training script's inference mode (`train_cnn.py --infer`)
---
## Architecture
**Two-part implementation:**
1. **Core GPU utility:** `src/gpu/texture_readback.{h,cc}` (~150 lines)
- Synchronous texture-to-CPU readback
- Reusable for screenshots, validation, video export
- Protected with STRIP_ALL (0 bytes in release builds)
2. **Standalone tool:** `tools/cnn_test.cc` (~450 lines)
- Custom CNN inference pipeline
- No MainSequence dependency
- Asset-based shader loading with automatic include resolution
---
## Usage
```bash
cnn_test input.png output.png [OPTIONS]
OPTIONS:
--blend F Final blend amount (0.0-1.0, default: 1.0)
--format ppm|png Output format (default: png)
--help Show usage
```
**Examples:**
```bash
# Full CNN processing
./build/cnn_test input.png output.png
# 50% blend with original
./build/cnn_test input.png output.png --blend 0.5
# No CNN effect (original passthrough)
./build/cnn_test input.png output.png --blend 0.0
# PPM output format
./build/cnn_test input.png output.ppm --format ppm
```
---
## Implementation Details
### Core Readback Utility
**File:** `src/gpu/texture_readback.{h,cc}`
**Function:**
```cpp
std::vector<uint8_t> read_texture_pixels(
WGPUInstance instance,
WGPUDevice device,
WGPUTexture texture,
int width,
int height);
```
**Features:**
- Returns BGRA8 format (4 bytes per pixel)
- Synchronous blocking operation
- Cross-platform async callback handling (Win32 vs Native API)
- Automatic staging buffer creation and cleanup
**Refactored OffscreenRenderTarget:**
```cpp
std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
#if !defined(STRIP_ALL)
return read_texture_pixels(instance_, device_, texture_, width_, height_);
#else
return std::vector<uint8_t>();
#endif
}
```
### CNN Processing Pipeline
**Fixed 3-layer architecture** (matches trained CNN):
1. Layer 0: Initial convolution
2. Layer 1: Intermediate convolution
3. Layer 2: Final convolution + blend with original
**Ping-pong textures:**
- 2 intermediate render targets
- 1 original input reference (binding 4)
**Uniforms:**
- `CommonPostProcessUniforms` (binding 2): resolution, aspect_ratio, time, beat, audio_intensity
- `CNNLayerParams` (binding 3): layer_index, blend_amount
**Shader composition:**
- Uses `ShaderComposer::Get()` via `RenderPipelineBuilder`
- Automatically resolves `#include` directives
- Registers CNN snippets: activation, conv3×3, conv5×5, weights
---
## Build Integration
**CMakeLists.txt:**
1. Added `src/gpu/texture_readback.cc` to GPU_SOURCES (both sections)
2. Tool target:
```cmake
add_executable(cnn_test
tools/cnn_test.cc
src/tests/common/webgpu_test_fixture.cc
src/tests/common/offscreen_render_target.cc
${PLATFORM_SOURCES}
${GEN_DEMO_CC})
target_link_libraries(cnn_test PRIVATE
gpu util procedural ${DEMO_LIBS})
add_dependencies(cnn_test generate_demo_assets)
target_compile_definitions(cnn_test PRIVATE
STB_IMAGE_IMPLEMENTATION
STB_IMAGE_WRITE_IMPLEMENTATION)
```
**Build:**
```bash
cmake -S . -B build -DDEMO_BUILD_TOOLS=ON
cmake --build build -j4
```
---
## Validation Workflow
### 1. Ground Truth Generation
```bash
# Generate ground truth from Python
./training/train_cnn.py --infer test.png \
--export-only training/checkpoints/checkpoint_epoch_5000.pth \
--output ground_truth.png
```
### 2. Tool Inference
```bash
# Run tool (always 3 layers, matching trained CNN)
./build/cnn_test test.png tool_output.png --blend 1.0
```
### 3. Comparison
```bash
# Compare (MSE should be low)
python -c "
import numpy as np
from PIL import Image
gt = np.array(Image.open('ground_truth.png'))
out = np.array(Image.open('tool_output.png'))
mse = np.mean((gt.astype(float) - out.astype(float)) ** 2)
print(f'MSE: {mse:.4f}')
assert mse < 10.0, f'MSE too high: {mse}'
"
```
---
## Known Issues
**BUG: Black output (uninitialized input texture)**
- Tool produces all-black output (MSE 64860 vs ground truth)
- Root cause: First intermediate texture not initialized with input image
- Multi-layer processing starts with uninitialized data
- Fix required: Copy input_texture → intermediate_textures[0] before layer loop
---
## Limitations
- **Fixed layer count:** Cannot run partial networks (3 layers hardcoded)
- **Single image:** Batch processing requires shell loop
- **No real-time preview:** Offline processing only
- **Single-file raster input only:** decoded via stb_image (PNG/JPEG/BMP/TGA supported)
---
## Future Enhancements
- Batch processing (directory input)
- Interactive preview mode
- Per-layer weight inspection
- Checksum validation against training checkpoints
- CUDA/Metal direct backends (bypass WebGPU overhead)
---
## Technical Notes
**Number of layers is fixed by trained CNN architecture:**
- Defined in `cnn_weights_generated.wgsl`
- Cannot meaningfully run partial networks (layer outputs have different formats/ranges)
- Tool always processes full 3-layer stack
**Blend parameter:**
- Applied only to final layer (layer 2)
- Intermediate layers always use blend=1.0
- `mix(input, cnn_output, blend_amount)` in shader
**Cross-platform:**
- Tested on macOS (native WebGPU)
- Builds on Windows via mingw-w64 cross-compile
- Linux support via native WebGPU
**Size impact:**
- Debug/STRIP_ALL=OFF: ~150 lines compiled
- STRIP_ALL=ON: 0 bytes (entirely compiled out)
- FINAL_STRIP=ON: 0 bytes (tool not built)