Diffstat (limited to 'doc/CNN_TEST_TOOL.md')
| -rw-r--r-- | doc/CNN_TEST_TOOL.md | 188 |
1 file changed, 89 insertions, 99 deletions
diff --git a/doc/CNN_TEST_TOOL.md b/doc/CNN_TEST_TOOL.md
index e7d679e..4307894 100644
--- a/doc/CNN_TEST_TOOL.md
+++ b/doc/CNN_TEST_TOOL.md
@@ -1,31 +1,37 @@
 # CNN Shader Testing Tool
 
-Standalone tool for validating trained CNN shaders with GPU-to-CPU readback.
+Standalone tool for validating trained CNN shaders with GPU-to-CPU readback. Supports both CNN v1 (render pipeline) and v2 (compute, storage buffer).
 
 ---
 
 ## Purpose
 
-- Validate trained weights (`cnn_weights_generated.wgsl`) against ground truth
+- Validate trained weights against ground truth
 - Debug CNN layer behavior in isolation
-- Generate test outputs for patch-based training workflow
-- Match Python training script's inference mode (`train_cnn.py --infer`)
+- Generate test outputs for training workflow
+- Match Python training script's inference mode
 
 ---
 
 ## Architecture
 
-**Two-part implementation:**
+**Two implementations:**
 
-1. **Core GPU utility:** `src/gpu/texture_readback.{h,cc}` (~150 lines)
-   - Synchronous texture-to-CPU readback
-   - Reusable for screenshots, validation, video export
-   - Protected with STRIP_ALL (0 bytes in release builds)
+1. **CNN v1** (render pipeline, texture atlas weights)
+   - 3 fixed layers
+   - RGBA16Float intermediates
+   - BGRA8Unorm final output
 
-2. **Standalone tool:** `tools/cnn_test.cc` (~450 lines)
-   - Custom CNN inference pipeline
-   - No MainSequence dependency
-   - Asset-based shader loading with automatic include resolution
+2. **CNN v2** (compute shaders, storage buffer weights)
+   - Dynamic layer count from binary
+   - 7D static features (RGBD + UV + sin + bias)
+   - RGBA32Uint packed f16 intermediates
+   - Storage buffer: ~3-5 KB weights
+
+**Core GPU utility:** `src/gpu/texture_readback.{h,cc}`
+- Synchronous texture-to-CPU readback
+- Supports RGBA16Float, RGBA32Uint, BGRA8Unorm
+- Protected with STRIP_ALL (0 bytes in release)
 
 ---
 
@@ -35,26 +41,36 @@ Standalone tool for validating trained CNN shaders with GPU-to-CPU readback.
 cnn_test input.png output.png [OPTIONS]
 
 OPTIONS:
-  --blend F          Final blend amount (0.0-1.0, default: 1.0)
-  --format ppm|png   Output format (default: png)
-  --help             Show usage
+  --cnn-version N           CNN version: 1 (default) or 2 (ignored with --weights)
+  --weights PATH            Load weights from .bin (forces CNN v2, overrides layer config)
+  --blend F                 Final blend amount (0.0-1.0, default: 1.0)
+  --format ppm|png          Output format (default: png)
+  --layers N                Number of CNN layers (1-10, v1 only, default: 3, ignored with --weights)
+  --save-intermediates DIR  Save intermediate layers to directory
+  --debug-hex               Print first 8 pixels as hex (debug)
+  --help                    Show usage
 ```
 
 **Examples:**
 ```bash
-# Full CNN processing
-./build/cnn_test input.png output.png
+# CNN v1 (render pipeline, 3 layers)
+./build/cnn_test input.png output.png --cnn-version 1
+
+# CNN v2 (compute, storage buffer, uses asset system weights)
+./build/cnn_test input.png output.png --cnn-version 2
 
-# 50% blend with original
-./build/cnn_test input.png output.png --blend 0.5
+# CNN v2 with runtime weight loading (loads layer config from .bin)
+./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin
 
-# No CNN effect (original passthrough)
-./build/cnn_test input.png output.png --blend 0.0
+# 50% blend with original (v2)
+./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5
 
-# PPM output format
-./build/cnn_test input.png output.ppm --format ppm
+# Debug hex dump
+./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
 ```
 
+**Important:** When using `--weights`, the layer count and kernel sizes are read from the binary file header, overriding any `--layers` or `--cnn-version` arguments.
+
 ---
 
 ## Implementation Details
@@ -90,25 +106,31 @@ std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
 }
 ```
 
-### CNN Processing Pipeline
+### CNN v1 Pipeline (Render)
 
-**Fixed 3-layer architecture** (matches trained CNN):
-1. Layer 0: Initial convolution
-2. Layer 1: Intermediate convolution
-3. Layer 2: Final convolution + blend with original
+**Fixed 3-layer architecture:**
+- Ping-pong RGBA16Float textures
+- CNNLayerParams (binding 3): layer_index, blend_amount
+- Shader composer resolves #include directives
 
-**Ping-pong textures:**
-- 2 intermediate render targets
-- 1 original input reference (binding 4)
+### CNN v2 Pipeline (Compute)
 
-**Uniforms:**
-- `CommonPostProcessUniforms` (binding 2): resolution, aspect_ratio, time, beat, audio_intensity
-- `CNNLayerParams` (binding 3): layer_index, blend_amount
+**Dynamic layer architecture:**
+1. **Static features compute:** Generate 7D features (RGBD + UV + sin + bias)
+2. **Layer computes:** N layers from binary weights (3-5 typically)
+   - Storage buffer weights (read-only)
+   - RGBA32Uint packed f16 textures (ping-pong)
+   - CNNv2LayerParams: kernel_size, channels, weight_offset, blend
+3. **Readback:** RGBA32Uint → f16 decode → u8 clamp
 
-**Shader composition:**
-- Uses `ShaderComposer::Get()` via `RenderPipelineBuilder`
-- Automatically resolves `#include` directives
-- Registers CNN snippets: activation, conv3×3, conv5×5, weights
+**Binary format:** Header (20B) + layer info (20B×N) + f16 weights
+
+**Weight Loading:**
+- **Without `--weights`:** Loads from asset system (`ASSET_WEIGHTS_CNN_V2`)
+- **With `--weights PATH`:** Loads from external `.bin` file (e.g., checkpoint exports)
+  - Layer count and kernel sizes parsed from binary header
+  - Overrides any `--layers` or `--cnn-version` arguments
+  - Enables runtime testing of training checkpoints without rebuild
 
 ---
 
@@ -144,51 +166,35 @@ cmake --build build -j4
 
 ---
 
-## Validation Workflow
+## Validation Workflow (CNN v2)
 
-### 1. Ground Truth Generation
+### 1. Train and Export
 ```bash
-# Generate ground truth from Python
-./training/train_cnn.py --infer test.png \
-    --export-only training/checkpoints/checkpoint_epoch_5000.pth \
-    --output ground_truth.png
+# Train and export weights
+./scripts/train_cnn_v2_full.sh --epochs 200 --batch-size 16
 ```
 
 ### 2. Tool Inference
 ```bash
-# Run tool (always 3 layers, matching trained CNN)
-./build/cnn_test test.png tool_output.png --blend 1.0
+# Run tool with v2
+./build/cnn_test training/input/img_000.png output.png --cnn-version 2
 ```
 
-### 3. Comparison
-```bash
-# Compare (MSE should be low)
-python -c "
-import numpy as np
-from PIL import Image
-gt = np.array(Image.open('ground_truth.png'))
-out = np.array(Image.open('tool_output.png'))
-mse = np.mean((gt.astype(float) - out.astype(float)) ** 2)
-print(f'MSE: {mse:.4f}')
-assert mse < 10.0, f'MSE too high: {mse}'
-"
-```
+### 3. Visual Comparison
+Compare output.png with training/target_X/img_000.png
 
 ---
 
-## Known Issues
+## Status
 
-**BUG: CNN produces incorrect output (all white)**
-- Readback works correctly (see Technical Notes below)
-- Shader compiles and executes without errors
-- Output is all white (255) regardless of input or blend setting
-- **Likely causes:**
-  - Uniform buffer layout mismatch between C++ and WGSL
-  - Texture binding issue (input not sampled correctly)
-  - Weight matrix initialization problem
-- CNNEffect works correctly in demo (visual validation confirms)
-- **Status:** Under investigation - rendering pipeline differs from demo's CNNEffect
-- **Workaround:** Use CNNEffect visual validation in demo until tool fixed
+**CNN v1:** Builds and runs, produces incorrect output (all white). Use CNNEffect in demo for visual validation.
+
+**CNN v2:** ⚠️ Partially functional. Readback works but output differs from HTML validation tool.
+- Loads binary weights from `workspaces/main/weights/cnn_v2_weights.bin`
+- Matches CNNv2Effect architecture
+- **Known Issue:** Visual output differs from `tools/cnn_v2_test/index.html` despite matching shader code
+- Root cause under investigation (weight indexing? texture sampling? activation clamping?)
+- Use HTML tool (`tools/cnn_v2_test/index.html`) for accurate validation
 
 ---
 
@@ -214,41 +220,25 @@ assert mse < 10.0, f'MSE too high: {mse}'
 
 ## Limitations
 
-- **Fixed layer count:** Cannot run partial networks (3 layers hardcoded)
+- **CNN v1:** Produces incorrect output, use for debugging only
 - **Single image:** Batch processing requires shell loop
 - **No real-time preview:** Offline processing only
-- **PNG input only:** Uses stb_image (JPEG/PNG/BMP/TGA supported)
-
----
-
-## Future Enhancements
-
-- Batch processing (directory input)
-- Interactive preview mode
-- Per-layer weight inspection
-- Checksum validation against training checkpoints
-- CUDA/Metal direct backends (bypass WebGPU overhead)
+- **PNG input:** stb_image (JPEG/PNG/BMP/TGA also supported)
 
 ---
 
 ## Technical Notes
 
-**Number of layers is fixed by trained CNN architecture:**
-- Defined in `cnn_weights_generated.wgsl`
-- Cannot meaningfully run partial networks (layer outputs have different formats/ranges)
-- Tool always processes full 3-layer stack
-
-**Blend parameter:**
-- Applied only to final layer (layer 2)
-- Intermediate layers always use blend=1.0
-- `mix(input, cnn_output, blend_amount)` in shader
+**CNN v2 f16 decoding:**
+- RGBA32Uint texture stores 8×f16 as 4×u32
+- Custom decoder: extract u16, decode f16→f32, clamp [0,1]→u8
+- Handles denormals, infinity, NaN
 
 **Cross-platform:**
-- Tested on macOS (native WebGPU)
-- Builds on Windows via mingw-w64 cross-compile
-- Linux support via native WebGPU
+- macOS, Linux (native WebGPU)
+- Windows (mingw-w64 cross-compile)
 
 **Size impact:**
-- Debug/STRIP_ALL=OFF: ~150 lines compiled
-- STRIP_ALL=ON: 0 bytes (entirely compiled out)
-- FINAL_STRIP=ON: 0 bytes (tool not built)
+- Debug/STRIP_ALL=OFF: compiled
+- STRIP_ALL=ON: 0 bytes (compiled out)
+- FINAL_STRIP=ON: tool not built
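
A note on the "CNN v2 f16 decoding" bullets added under Technical Notes: the sketch below spells out that unpacking path in C++ (raw RGBA32Uint texels → eight f16 values per pixel → clamped u8 channels). It is a minimal illustration under those assumptions; the helper names are not taken from `tools/cnn_test.cc`, and the tool's actual decoder may be structured differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Decode one IEEE 754 half-precision value stored in the low 16 bits.
// Covers the normal, denormal, infinity and NaN cases called out above.
static float f16_to_f32(uint16_t h) {
  const uint32_t sign     = (h >> 15) & 0x1u;
  const uint32_t exponent = (h >> 10) & 0x1Fu;
  const uint32_t mantissa = h & 0x3FFu;
  float value;
  if (exponent == 0) {
    // Zero / denormal: mantissa * 2^-24
    value = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 0x1F) {
    // Infinity (mantissa == 0) or NaN (mantissa != 0)
    value = (mantissa != 0) ? std::nanf("") : INFINITY;
  } else {
    // Normal: (1 + mantissa/1024) * 2^(exponent-15) == (mantissa | 0x400) * 2^(exponent-25)
    value = std::ldexp(static_cast<float>(mantissa | 0x400u),
                       static_cast<int>(exponent) - 25);
  }
  return sign ? -value : value;
}

// Each u32 of an RGBA32Uint texel packs two f16 values (low half, high half),
// so one texel carries eight channels.
static void unpack_two(uint32_t packed, float& lo, float& hi) {
  lo = f16_to_f32(static_cast<uint16_t>(packed & 0xFFFFu));
  hi = f16_to_f32(static_cast<uint16_t>(packed >> 16));
}

// Clamp a decoded channel to [0,1] and quantize to 8 bits for the PNG/PPM writer.
static uint8_t to_u8(float v) {
  if (std::isnan(v)) return 0;     // map NaN to black rather than undefined behavior
  v = std::clamp(v, 0.0f, 1.0f);   // also collapses +/-infinity to the range ends
  return static_cast<uint8_t>(v * 255.0f + 0.5f);
}
```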
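
The new documentation describes the weight binary only by size ("Header (20B) + layer info (20B×N) + f16 weights"). The structs below restate those sizes as code to make the layout easier to picture; every field name here is hypothetical, since the actual field order inside `cnn_v2_weights.bin` is not specified in this document.

```cpp
#include <cstdint>

#pragma pack(push, 1)
struct CnnV2Header {       // 20 bytes total (5 x u32) -- field names are hypothetical
  uint32_t magic;          // file identifier
  uint32_t version;        // format revision
  uint32_t layer_count;    // drives the "dynamic layer count from binary"
  uint32_t total_weights;  // number of f16 values in the trailing blob
  uint32_t reserved;
};

struct CnnV2LayerInfo {    // 20 bytes per layer (5 x u32) -- field names are hypothetical
  uint32_t kernel_size;
  uint32_t in_channels;
  uint32_t out_channels;
  uint32_t weight_offset;  // offset of this layer's weights within the f16 blob
  uint32_t weight_count;
};
#pragma pack(pop)

static_assert(sizeof(CnnV2Header) == 20, "header documented as 20 bytes");
static_assert(sizeof(CnnV2LayerInfo) == 20, "layer info documented as 20 bytes");
// File layout: CnnV2Header, then layer_count * CnnV2LayerInfo, then packed f16 weights.
```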
