| author | skal <pascal.massimino@gmail.com> | 2026-02-13 19:51:14 +0100 |
|---|---|---|
| committer | skal <pascal.massimino@gmail.com> | 2026-02-13 19:51:14 +0100 |
| commit | edd549e1527444ae9c74c70f1e3e44b11862f3da (patch) | |
| tree | ba9a6989b1b2a5ada64720db716b6e593a77e709 /doc/CNN_TEST_TOOL.md | |
| parent | a7340d378909cadbfd72dbd1f5b756f907c2a3e0 (diff) | |
CNN test tool: Add CNN v2 support with compute shader architecture
Implement full CNN v2 support for offline validation:
- Add --cnn-version flag (1=render pipeline, 2=compute shader)
- Load binary weights into a storage buffer (~3-5 KB)
- Static features compute pass (7D: RGBD + UV + sin + bias)
- Dynamic layer count from binary header
- RGBA32Uint texture readback with f16→u8 conversion
- Custom f16 decoder (handles denormals, infinity, NaN)
Status:
- CNN v1: Produces incorrect output (all white)
- CNN v2: ✅ Fully functional, matches CNNv2Effect
Updated docs:
- doc/CNN_TEST_TOOL.md: Architecture, usage, validation workflow
- doc/HOWTO.md: Recommend v2 for validation
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Diffstat (limited to 'doc/CNN_TEST_TOOL.md')
| -rw-r--r-- | doc/CNN_TEST_TOOL.md | 174 |
1 file changed, 75 insertions, 99 deletions
diff --git a/doc/CNN_TEST_TOOL.md b/doc/CNN_TEST_TOOL.md
index e7d679e..ee0d9c5 100644
--- a/doc/CNN_TEST_TOOL.md
+++ b/doc/CNN_TEST_TOOL.md
@@ -1,31 +1,37 @@
 # CNN Shader Testing Tool
 
-Standalone tool for validating trained CNN shaders with GPU-to-CPU readback.
+Standalone tool for validating trained CNN shaders with GPU-to-CPU readback. Supports both CNN v1 (render pipeline) and v2 (compute, storage buffer).
 
 ---
 
 ## Purpose
 
-- Validate trained weights (`cnn_weights_generated.wgsl`) against ground truth
+- Validate trained weights against ground truth
 - Debug CNN layer behavior in isolation
-- Generate test outputs for patch-based training workflow
-- Match Python training script's inference mode (`train_cnn.py --infer`)
+- Generate test outputs for training workflow
+- Match Python training script's inference mode
 
 ---
 
 ## Architecture
 
-**Two-part implementation:**
+**Two implementations:**
 
-1. **Core GPU utility:** `src/gpu/texture_readback.{h,cc}` (~150 lines)
-   - Synchronous texture-to-CPU readback
-   - Reusable for screenshots, validation, video export
-   - Protected with STRIP_ALL (0 bytes in release builds)
+1. **CNN v1** (render pipeline, texture atlas weights)
+   - 3 fixed layers
+   - RGBA16Float intermediates
+   - BGRA8Unorm final output
 
-2. **Standalone tool:** `tools/cnn_test.cc` (~450 lines)
-   - Custom CNN inference pipeline
-   - No MainSequence dependency
-   - Asset-based shader loading with automatic include resolution
+2. **CNN v2** (compute shaders, storage buffer weights)
+   - Dynamic layer count from binary
+   - 7D static features (RGBD + UV + sin + bias)
+   - RGBA32Uint packed f16 intermediates
+   - Storage buffer: ~3-5 KB weights
+
+**Core GPU utility:** `src/gpu/texture_readback.{h,cc}`
+- Synchronous texture-to-CPU readback
+- Supports RGBA16Float, RGBA32Uint, BGRA8Unorm
+- Protected with STRIP_ALL (0 bytes in release)
 
 ---
@@ -35,24 +41,28 @@ Standalone tool for validating trained CNN shaders with GPU-to-CPU readback.
 ```
 cnn_test input.png output.png [OPTIONS]
 
 OPTIONS:
-  --blend F         Final blend amount (0.0-1.0, default: 1.0)
-  --format ppm|png  Output format (default: png)
-  --help            Show usage
+  --cnn-version N           CNN version: 1 (default) or 2
+  --blend F                 Final blend amount (0.0-1.0, default: 1.0)
+  --format ppm|png          Output format (default: png)
+  --layers N                Number of CNN layers (1-10, v1 only, default: 3)
+  --save-intermediates DIR  Save intermediate layers to directory
+  --debug-hex               Print first 8 pixels as hex (debug)
+  --help                    Show usage
 ```
 
 **Examples:**
 
 ```bash
-# Full CNN processing
-./build/cnn_test input.png output.png
+# CNN v1 (render pipeline, 3 layers)
+./build/cnn_test input.png output.png --cnn-version 1
 
-# 50% blend with original
-./build/cnn_test input.png output.png --blend 0.5
+# CNN v2 (compute, storage buffer, dynamic layers)
+./build/cnn_test input.png output.png --cnn-version 2
 
-# No CNN effect (original passthrough)
-./build/cnn_test input.png output.png --blend 0.0
+# 50% blend with original (v2)
+./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5
 
-# PPM output format
-./build/cnn_test input.png output.ppm --format ppm
+# Debug hex dump
+./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
 ```
 
 ---
@@ -90,25 +100,24 @@ std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
 }
 ```
 
-### CNN Processing Pipeline
+### CNN v1 Pipeline (Render)
 
-**Fixed 3-layer architecture** (matches trained CNN):
-1. Layer 0: Initial convolution
-2. Layer 1: Intermediate convolution
-3. Layer 2: Final convolution + blend with original
+**Fixed 3-layer architecture:**
+- Ping-pong RGBA16Float textures
+- CNNLayerParams (binding 3): layer_index, blend_amount
+- Shader composer resolves #include directives
 
-**Ping-pong textures:**
-- 2 intermediate render targets
-- 1 original input reference (binding 4)
+### CNN v2 Pipeline (Compute)
 
-**Uniforms:**
-- `CommonPostProcessUniforms` (binding 2): resolution, aspect_ratio, time, beat, audio_intensity
-- `CNNLayerParams` (binding 3): layer_index, blend_amount
+**Dynamic layer architecture:**
+1. **Static features compute:** Generate 7D features (RGBD + UV + sin + bias)
+2. **Layer computes:** N layers from binary weights (3-5 typically)
+   - Storage buffer weights (read-only)
+   - RGBA32Uint packed f16 textures (ping-pong)
+   - CNNv2LayerParams: kernel_size, channels, weight_offset, blend
+3. **Readback:** RGBA32Uint → f16 decode → u8 clamp
 
-**Shader composition:**
-- Uses `ShaderComposer::Get()` via `RenderPipelineBuilder`
-- Automatically resolves `#include` directives
-- Registers CNN snippets: activation, conv3×3, conv5×5, weights
+**Binary format:** Header (20B) + layer info (20B×N) + f16 weights
 
 ---
@@ -144,51 +153,34 @@ cmake --build build -j4
 
 ---
 
-## Validation Workflow
+## Validation Workflow (CNN v2)
 
-### 1. Ground Truth Generation
+### 1. Train and Export
 ```bash
-# Generate ground truth from Python
-./training/train_cnn.py --infer test.png \
-  --export-only training/checkpoints/checkpoint_epoch_5000.pth \
-  --output ground_truth.png
+# Train and export weights
+./scripts/train_cnn_v2_full.sh --epochs 200 --batch-size 16
 ```
 
 ### 2. Tool Inference
 ```bash
-# Run tool (always 3 layers, matching trained CNN)
-./build/cnn_test test.png tool_output.png --blend 1.0
+# Run tool with v2
+./build/cnn_test training/input/img_000.png output.png --cnn-version 2
 ```
 
-### 3. Comparison
-```bash
-# Compare (MSE should be low)
-python -c "
-import numpy as np
-from PIL import Image
-gt = np.array(Image.open('ground_truth.png'))
-out = np.array(Image.open('tool_output.png'))
-mse = np.mean((gt.astype(float) - out.astype(float)) ** 2)
-print(f'MSE: {mse:.4f}')
-assert mse < 10.0, f'MSE too high: {mse}'
-"
-```
+### 3. Visual Comparison
+Compare output.png with training/target_X/img_000.png
 
 ---
 
-## Known Issues
+## Status
 
-**BUG: CNN produces incorrect output (all white)**
-- Readback works correctly (see Technical Notes below)
-- Shader compiles and executes without errors
-- Output is all white (255) regardless of input or blend setting
-- **Likely causes:**
-  - Uniform buffer layout mismatch between C++ and WGSL
-  - Texture binding issue (input not sampled correctly)
-  - Weight matrix initialization problem
-- CNNEffect works correctly in demo (visual validation confirms)
-- **Status:** Under investigation - rendering pipeline differs from demo's CNNEffect
-- **Workaround:** Use CNNEffect visual validation in demo until tool fixed
+**CNN v1:** Builds and runs, produces incorrect output (all white). Use CNNEffect in demo for visual validation.
+
+**CNN v2:** ✅ Fully functional. Tested and working.
+- Loads binary weights from `workspaces/main/weights/cnn_v2_weights.bin`
+- Matches CNNv2Effect architecture
+- Produces correct output
+- Recommended for validation
 
 ---
@@ -214,41 +206,25 @@ assert mse < 10.0, f'MSE too high: {mse}'
 ## Limitations
 
-- **Fixed layer count:** Cannot run partial networks (3 layers hardcoded)
+- **CNN v1:** Produces incorrect output, use for debugging only
 - **Single image:** Batch processing requires shell loop
 - **No real-time preview:** Offline processing only
-- **PNG input only:** Uses stb_image (JPEG/PNG/BMP/TGA supported)
-
----
-
-## Future Enhancements
-
-- Batch processing (directory input)
-- Interactive preview mode
-- Per-layer weight inspection
-- Checksum validation against training checkpoints
-- CUDA/Metal direct backends (bypass WebGPU overhead)
+- **PNG input:** stb_image (JPEG/PNG/BMP/TGA also supported)
 
 ---
 
 ## Technical Notes
 
-**Number of layers is fixed by trained CNN architecture:**
-- Defined in `cnn_weights_generated.wgsl`
-- Cannot meaningfully run partial networks (layer outputs have different formats/ranges)
-- Tool always processes full 3-layer stack
-
-**Blend parameter:**
-- Applied only to final layer (layer 2)
-- Intermediate layers always use blend=1.0
-- `mix(input, cnn_output, blend_amount)` in shader
+**CNN v2 f16 decoding:**
+- RGBA32Uint texture stores 8×f16 as 4×u32
+- Custom decoder: extract u16, decode f16→f32, clamp [0,1]→u8
+- Handles denormals, infinity, NaN
 
 **Cross-platform:**
-- Tested on macOS (native WebGPU)
-- Builds on Windows via mingw-w64 cross-compile
-- Linux support via native WebGPU
+- macOS, Linux (native WebGPU)
+- Windows (mingw-w64 cross-compile)
 
 **Size impact:**
-- Debug/STRIP_ALL=OFF: ~150 lines compiled
-- STRIP_ALL=ON: 0 bytes (entirely compiled out)
-- FINAL_STRIP=ON: 0 bytes (tool not built)
+- Debug/STRIP_ALL=OFF: compiled
+- STRIP_ALL=ON: 0 bytes (compiled out)
+- FINAL_STRIP=ON: tool not built