path: root/doc/CNN_TEST_TOOL.md
Diffstat (limited to 'doc/CNN_TEST_TOOL.md')
-rw-r--r--  doc/CNN_TEST_TOOL.md  174
1 file changed, 75 insertions(+), 99 deletions(-)
diff --git a/doc/CNN_TEST_TOOL.md b/doc/CNN_TEST_TOOL.md
index e7d679e..ee0d9c5 100644
--- a/doc/CNN_TEST_TOOL.md
+++ b/doc/CNN_TEST_TOOL.md
@@ -1,31 +1,37 @@
# CNN Shader Testing Tool
-Standalone tool for validating trained CNN shaders with GPU-to-CPU readback.
+Standalone tool for validating trained CNN shaders with GPU-to-CPU readback. Supports both CNN v1 (render pipeline) and v2 (compute, storage buffer).
---
## Purpose
-- Validate trained weights (`cnn_weights_generated.wgsl`) against ground truth
+- Validate trained weights against ground truth
- Debug CNN layer behavior in isolation
-- Generate test outputs for patch-based training workflow
-- Match Python training script's inference mode (`train_cnn.py --infer`)
+- Generate test outputs for training workflow
+- Match Python training script's inference mode
---
## Architecture
-**Two-part implementation:**
+**Two implementations:**
-1. **Core GPU utility:** `src/gpu/texture_readback.{h,cc}` (~150 lines)
- - Synchronous texture-to-CPU readback
- - Reusable for screenshots, validation, video export
- - Protected with STRIP_ALL (0 bytes in release builds)
+1. **CNN v1** (render pipeline, texture atlas weights)
+ - 3 fixed layers
+ - RGBA16Float intermediates
+ - BGRA8Unorm final output
-2. **Standalone tool:** `tools/cnn_test.cc` (~450 lines)
- - Custom CNN inference pipeline
- - No MainSequence dependency
- - Asset-based shader loading with automatic include resolution
+2. **CNN v2** (compute shaders, storage buffer weights)
+ - Dynamic layer count from binary
+ - 7D static features (RGBD + UV + sin + bias)
+ - RGBA32Uint packed f16 intermediates
+ - Storage buffer: ~3-5 KB weights
+
+**Core GPU utility:** `src/gpu/texture_readback.{h,cc}`
+- Synchronous texture-to-CPU readback
+- Supports RGBA16Float, RGBA32Uint, BGRA8Unorm
+- Protected with STRIP_ALL (0 bytes in release)
---
@@ -35,24 +41,28 @@ Standalone tool for validating trained CNN shaders with GPU-to-CPU readback.
cnn_test input.png output.png [OPTIONS]
OPTIONS:
- --blend F Final blend amount (0.0-1.0, default: 1.0)
- --format ppm|png Output format (default: png)
- --help Show usage
+ --cnn-version N CNN version: 1 (default) or 2
+ --blend F Final blend amount (0.0-1.0, default: 1.0)
+ --format ppm|png Output format (default: png)
+ --layers N Number of CNN layers (1-10, v1 only, default: 3)
+ --save-intermediates DIR Save intermediate layers to directory
+ --debug-hex Print first 8 pixels as hex (debug)
+ --help Show usage
```
**Examples:**
```bash
-# Full CNN processing
-./build/cnn_test input.png output.png
+# CNN v1 (render pipeline, 3 layers)
+./build/cnn_test input.png output.png --cnn-version 1
-# 50% blend with original
-./build/cnn_test input.png output.png --blend 0.5
+# CNN v2 (compute, storage buffer, dynamic layers)
+./build/cnn_test input.png output.png --cnn-version 2
-# No CNN effect (original passthrough)
-./build/cnn_test input.png output.png --blend 0.0
+# 50% blend with original (v2)
+./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5
-# PPM output format
-./build/cnn_test input.png output.ppm --format ppm
+# Debug hex dump
+./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
```
---
@@ -90,25 +100,24 @@ std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
}
```
-### CNN Processing Pipeline
+### CNN v1 Pipeline (Render)
-**Fixed 3-layer architecture** (matches trained CNN):
-1. Layer 0: Initial convolution
-2. Layer 1: Intermediate convolution
-3. Layer 2: Final convolution + blend with original
+**Fixed 3-layer architecture:**
+- Ping-pong RGBA16Float textures
+- CNNLayerParams (binding 3): layer_index, blend_amount
+- Shader composer resolves #include directives
-**Ping-pong textures:**
-- 2 intermediate render targets
-- 1 original input reference (binding 4)
+### CNN v2 Pipeline (Compute)
-**Uniforms:**
-- `CommonPostProcessUniforms` (binding 2): resolution, aspect_ratio, time, beat, audio_intensity
-- `CNNLayerParams` (binding 3): layer_index, blend_amount
+**Dynamic layer architecture:**
+1. **Static features compute:** Generate 7D features (RGBD + UV + sin + bias)
+2. **Layer computes:** N layers from binary weights (3-5 typically)
+ - Storage buffer weights (read-only)
+ - RGBA32Uint packed f16 textures (ping-pong)
+ - CNNv2LayerParams: kernel_size, channels, weight_offset, blend
+3. **Readback:** RGBA32Uint → f16 decode → u8 clamp
-**Shader composition:**
-- Uses `ShaderComposer::Get()` via `RenderPipelineBuilder`
-- Automatically resolves `#include` directives
-- Registers CNN snippets: activation, conv3×3, conv5×5, weights
+**Binary format:** Header (20B) + layer info (20B×N) + f16 weights
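The binary layout above (20-byte header, 20 bytes of info per layer, then f16 weights) could be parsed with a sketch like the following. The individual field names (`magic`, `version`, and the per-layer fields mirroring `CNNv2LayerParams`) are assumptions for illustration; only the section sizes come from this doc:

```python
import struct

def parse_cnn_v2_weights(blob):
    """Parse the assumed layout: 20 B header + 20 B per layer + packed f16 weights.

    Field names are hypothetical; only the 20-byte sizes come from the doc.
    """
    # Header: assumed as five little-endian u32 fields (20 bytes total)
    magic, version, layer_count, total_weights, _reserved = struct.unpack_from("<5I", blob, 0)
    layers = []
    off = 20
    for _ in range(layer_count):
        # Per-layer info: assumed kernel_size/channels/weight_offset (u32),
        # blend (f32), plus one reserved u32 to fill 20 bytes
        kernel_size, channels, weight_offset, blend, _pad = struct.unpack_from("<3IfI", blob, off)
        layers.append({
            "kernel_size": kernel_size,
            "channels": channels,
            "weight_offset": weight_offset,
            "blend": blend,
        })
        off += 20
    # Everything after the tables is raw little-endian f16 data (2 bytes each)
    weight_bytes = blob[off:]
    return layers, weight_bytes

# Exercise the parser on a synthetic blob: 1 layer, 4 f16 weights
header = struct.pack("<5I", 0x32764E43, 2, 1, 4, 0)
layer = struct.pack("<3IfI", 3, 8, 0, 1.0, 0)
weights = struct.pack("<4H", 0x3C00, 0x0000, 0xC000, 0x3800)
layers, wb = parse_cnn_v2_weights(header + layer + weights)
print(len(layers), layers[0]["kernel_size"], len(wb))  # 1 3 8
```

A real loader would additionally validate the magic/version fields and bounds-check each `weight_offset` against the weight blob.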
---
@@ -144,51 +153,34 @@ cmake --build build -j4
---
-## Validation Workflow
+## Validation Workflow (CNN v2)
-### 1. Ground Truth Generation
+### 1. Train and Export
```bash
-# Generate ground truth from Python
-./training/train_cnn.py --infer test.png \
- --export-only training/checkpoints/checkpoint_epoch_5000.pth \
- --output ground_truth.png
+# Train and export weights
+./scripts/train_cnn_v2_full.sh --epochs 200 --batch-size 16
```
### 2. Tool Inference
```bash
-# Run tool (always 3 layers, matching trained CNN)
-./build/cnn_test test.png tool_output.png --blend 1.0
+# Run tool with v2
+./build/cnn_test training/input/img_000.png output.png --cnn-version 2
```
-### 3. Comparison
-```bash
-# Compare (MSE should be low)
-python -c "
-import numpy as np
-from PIL import Image
-gt = np.array(Image.open('ground_truth.png'))
-out = np.array(Image.open('tool_output.png'))
-mse = np.mean((gt.astype(float) - out.astype(float)) ** 2)
-print(f'MSE: {mse:.4f}')
-assert mse < 10.0, f'MSE too high: {mse}'
-"
-```
+### 3. Visual Comparison
+Compare `output.png` against the corresponding target, e.g. `training/target_X/img_000.png`.
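If a numeric check is wanted on top of eyeballing the two images, a small MSE comparison works. This is a sketch, not part of the tool; the file names and any acceptance threshold are illustrative:

```python
def mse(a, b):
    """Mean squared error over two equal-length flat pixel sequences."""
    assert len(a) == len(b), "images must have identical dimensions"
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Synthetic 4-pixel grayscale data to show the arithmetic
print(mse([0, 64, 128, 255], [0, 64, 128, 255]))  # 0.0
print(mse([10, 10], [13, 14]))                    # (9 + 16) / 2 = 12.5
```

In practice, load both PNGs (e.g. with Pillow), flatten each to a byte sequence, and compare `mse(tool_output, target)` against a threshold chosen for your training run.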
---
-## Known Issues
+## Status
-**BUG: CNN produces incorrect output (all white)**
-- Readback works correctly (see Technical Notes below)
-- Shader compiles and executes without errors
-- Output is all white (255) regardless of input or blend setting
-- **Likely causes:**
- - Uniform buffer layout mismatch between C++ and WGSL
- - Texture binding issue (input not sampled correctly)
- - Weight matrix initialization problem
-- CNNEffect works correctly in demo (visual validation confirms)
-- **Status:** Under investigation - rendering pipeline differs from demo's CNNEffect
-- **Workaround:** Use CNNEffect visual validation in demo until tool fixed
+**CNN v1:** Builds and runs, but produces incorrect output (all white). Use CNNEffect in the demo for visual validation.
+
+**CNN v2:** ✅ Fully functional. Tested and working.
+- Loads binary weights from `workspaces/main/weights/cnn_v2_weights.bin`
+- Matches CNNv2Effect architecture
+- Produces correct output
+- Recommended for validation
---
@@ -214,41 +206,25 @@ assert mse < 10.0, f'MSE too high: {mse}'
## Limitations
-- **Fixed layer count:** Cannot run partial networks (3 layers hardcoded)
+- **CNN v1:** Produces incorrect output, use for debugging only
- **Single image:** Batch processing requires shell loop
- **No real-time preview:** Offline processing only
-- **PNG input only:** Uses stb_image (JPEG/PNG/BMP/TGA supported)
-
----
-
-## Future Enhancements
-
-- Batch processing (directory input)
-- Interactive preview mode
-- Per-layer weight inspection
-- Checksum validation against training checkpoints
-- CUDA/Metal direct backends (bypass WebGPU overhead)
+- **Input formats:** decoded with stb_image (PNG/JPEG/BMP/TGA)
---
## Technical Notes
-**Number of layers is fixed by trained CNN architecture:**
-- Defined in `cnn_weights_generated.wgsl`
-- Cannot meaningfully run partial networks (layer outputs have different formats/ranges)
-- Tool always processes full 3-layer stack
-
-**Blend parameter:**
-- Applied only to final layer (layer 2)
-- Intermediate layers always use blend=1.0
-- `mix(input, cnn_output, blend_amount)` in shader
+**CNN v2 f16 decoding:**
+- RGBA32Uint texture stores 8×f16 as 4×u32
+- Custom decoder: extract u16, decode f16→f32, clamp [0,1]→u8
+- Handles denormals, infinity, NaN
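The decode path listed above can be sketched bit-for-bit in Python. This mirrors the described steps (extract each u16 half from a u32, decode IEEE 754 half to float, clamp [0,1] and quantize to u8) but is illustrative, not the tool's actual C++ decoder; the low-half-first packing order is an assumption:

```python
def f16_to_f32(bits):
    """Decode one IEEE 754 half-float from its 16-bit pattern."""
    sign = -1.0 if bits & 0x8000 else 1.0
    exp = (bits >> 10) & 0x1F
    mant = bits & 0x3FF
    if exp == 0:                      # zero and subnormals
        return sign * mant * 2.0 ** -24
    if exp == 0x1F:                   # infinity and NaN
        return float("nan") if mant else sign * float("inf")
    return sign * (1.0 + mant / 1024.0) * 2.0 ** (exp - 15)

def u32_to_two_f16(word):
    """One RGBA32Uint channel packs two f16 values (low half first, assumed)."""
    return f16_to_f32(word & 0xFFFF), f16_to_f32(word >> 16)

def to_u8(v):
    """Clamp to [0,1] and quantize to u8; NaN maps to 0."""
    if v != v:                        # NaN compares unequal to itself
        return 0
    return int(round(min(max(v, 0.0), 1.0) * 255.0))

lo, hi = u32_to_two_f16(0x38003C00)   # low half 0x3C00 = 1.0, high half 0x3800 = 0.5
print(to_u8(lo), to_u8(hi))           # 255 128
```

Each RGBA32Uint texel thus yields 8 f16 values (2 per channel × 4 channels), matching the 8-channel packed intermediates described above.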
**Cross-platform:**
-- Tested on macOS (native WebGPU)
-- Builds on Windows via mingw-w64 cross-compile
-- Linux support via native WebGPU
+- macOS, Linux (native WebGPU)
+- Windows (mingw-w64 cross-compile)
**Size impact:**
-- Debug/STRIP_ALL=OFF: ~150 lines compiled
-- STRIP_ALL=ON: 0 bytes (entirely compiled out)
-- FINAL_STRIP=ON: 0 bytes (tool not built)
+- Debug/STRIP_ALL=OFF: readback utility compiled in
+- STRIP_ALL=ON: 0 bytes (compiled out)
+- FINAL_STRIP=ON: tool not built