# CNN Shader Testing Tool

Standalone tool for validating trained CNN shaders with GPU-to-CPU readback.

---

## Purpose

- Validate trained weights (`cnn_weights_generated.wgsl`) against ground truth
- Debug CNN layer behavior in isolation
- Generate test outputs for the patch-based training workflow
- Match the Python training script's inference mode (`train_cnn.py --infer`)

---

## Architecture

**Two-part implementation:**

1. **Core GPU utility:** `src/gpu/texture_readback.{h,cc}` (~150 lines)
   - Synchronous texture-to-CPU readback
   - Reusable for screenshots, validation, video export
   - Protected with STRIP_ALL (0 bytes in release builds)
2. **Standalone tool:** `tools/cnn_test.cc` (~450 lines)
   - Custom CNN inference pipeline
   - No MainSequence dependency
   - Asset-based shader loading with automatic include resolution

---

## Usage

```bash
cnn_test input.png output.png [OPTIONS]

OPTIONS:
  --blend F          Final blend amount (0.0-1.0, default: 1.0)
  --format ppm|png   Output format (default: png)
  --help             Show usage
```

**Examples:**

```bash
# Full CNN processing
./build/cnn_test input.png output.png

# 50% blend with original
./build/cnn_test input.png output.png --blend 0.5

# No CNN effect (original passthrough)
./build/cnn_test input.png output.png --blend 0.0

# PPM output format
./build/cnn_test input.png output.ppm --format ppm
```

---

## Implementation Details

### Core Readback Utility

**File:** `src/gpu/texture_readback.{h,cc}`

**Function:**

```cpp
std::vector<uint8_t> read_texture_pixels(
    WGPUInstance instance,
    WGPUDevice device,
    WGPUTexture texture,
    int width,
    int height);
```

**Features:**

- Returns BGRA8 format (4 bytes per pixel)
- Synchronous blocking operation
- Cross-platform async callback handling (Win32 vs Native API)
- Automatic staging buffer creation and cleanup

**Refactored OffscreenRenderTarget:**

```cpp
std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
#if !defined(STRIP_ALL)
  return read_texture_pixels(instance_, device_, texture_, width_, height_);
#else
  return std::vector<uint8_t>();
#endif
}
```

### CNN Processing Pipeline

**Fixed 3-layer architecture** (matches the trained CNN):

1. Layer 0: Initial convolution
2. Layer 1: Intermediate convolution
3. Layer 2: Final convolution + blend with original

**Ping-pong textures:**

- 2 intermediate render targets
- 1 original input reference (binding 4)

**Uniforms:**

- `CommonPostProcessUniforms` (binding 2): resolution, aspect_ratio, time, beat, audio_intensity
- `CNNLayerParams` (binding 3): layer_index, blend_amount

**Shader composition:**

- Uses `ShaderComposer::Get()` via `RenderPipelineBuilder`
- Automatically resolves `#include` directives
- Registers CNN snippets: activation, conv3×3, conv5×5, weights

---

## Build Integration

**CMakeLists.txt:**

1. Added `src/gpu/texture_readback.cc` to GPU_SOURCES (both sections)
2. Tool target:

```cmake
add_executable(cnn_test
  tools/cnn_test.cc
  src/tests/common/webgpu_test_fixture.cc
  src/tests/common/offscreen_render_target.cc
  ${PLATFORM_SOURCES}
  ${GEN_DEMO_CC})
target_link_libraries(cnn_test PRIVATE gpu util procedural ${DEMO_LIBS})
add_dependencies(cnn_test generate_demo_assets)
target_compile_definitions(cnn_test PRIVATE
  STB_IMAGE_IMPLEMENTATION
  STB_IMAGE_WRITE_IMPLEMENTATION)
```

**Build:**

```bash
cmake -S . -B build -DDEMO_BUILD_TOOLS=ON
cmake --build build -j4
```

---

## Validation Workflow

### 1. Ground Truth Generation

```bash
# Generate ground truth from Python
./training/train_cnn.py --infer test.png \
  --export-only training/checkpoints/checkpoint_epoch_5000.pth \
  --output ground_truth.png
```

### 2. Tool Inference

```bash
# Run the tool (always 3 layers, matching the trained CNN)
./build/cnn_test test.png tool_output.png --blend 1.0
```

### 3. Comparison

```bash
# Compare (MSE should be low)
python -c "
import numpy as np
from PIL import Image
gt = np.array(Image.open('ground_truth.png'))
out = np.array(Image.open('tool_output.png'))
mse = np.mean((gt.astype(float) - out.astype(float)) ** 2)
print(f'MSE: {mse:.4f}')
assert mse < 10.0, f'MSE too high: {mse}'
"
```

---

## Known Issues

**BUG: Black output (uninitialized input texture)**

- Tool produces all-black output (MSE 64860 vs ground truth)
- Root cause: the first intermediate texture is not initialized with the input image
- Multi-layer processing therefore starts from uninitialized data
- Fix required: copy input_texture → intermediate_textures[0] before the layer loop

---

## Limitations

- **Fixed layer count:** Cannot run partial networks (3 layers hardcoded)
- **Single image:** Batch processing requires a shell loop
- **No real-time preview:** Offline processing only
- **stb_image inputs only:** JPEG/PNG/BMP/TGA supported via stb_image

---

## Future Enhancements

- Batch processing (directory input)
- Interactive preview mode
- Per-layer weight inspection
- Checksum validation against training checkpoints
- CUDA/Metal direct backends (bypass WebGPU overhead)

---

## Technical Notes

**The number of layers is fixed by the trained CNN architecture:**

- Defined in `cnn_weights_generated.wgsl`
- Cannot meaningfully run partial networks (layer outputs have different formats/ranges)
- The tool always processes the full 3-layer stack

**Blend parameter:**

- Applied only to the final layer (layer 2)
- Intermediate layers always use blend=1.0
- `mix(input, cnn_output, blend_amount)` in the shader

**Cross-platform:**

- Tested on macOS (native WebGPU)
- Builds on Windows via mingw-w64 cross-compile
- Linux support via native WebGPU

**Size impact:**

- Debug/STRIP_ALL=OFF: ~150 lines compiled
- STRIP_ALL=ON: 0 bytes (entirely compiled out)
- FINAL_STRIP=ON: 0 bytes (tool not built)
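**Appendix: BGRA readback to RGB.** `read_texture_pixels` returns BGRA8 bytes, so callers that want RGB (e.g. for comparison against PNG ground truth) need a channel swizzle. A minimal sketch in NumPy; the function name `bgra_bytes_to_rgb` is illustrative, not part of the tool:

```python
import numpy as np

def bgra_bytes_to_rgb(pixels: bytes, width: int, height: int) -> np.ndarray:
    """Convert a flat BGRA8 buffer (the layout returned by
    read_texture_pixels, 4 bytes per pixel) into an (H, W, 3)
    RGB array by dropping alpha and swapping the B and R channels."""
    bgra = np.frombuffer(pixels, dtype=np.uint8).reshape(height, width, 4)
    return bgra[..., [2, 1, 0]]  # B,G,R,A -> R,G,B

# One blue-ish pixel: B=255, G=128, R=64, A=255
rgb = bgra_bytes_to_rgb(bytes([255, 128, 64, 255]), 1, 1)
print(rgb[0, 0].tolist())  # [64, 128, 255]
```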
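**Appendix: blend semantics.** The `--blend` parameter maps to `mix(input, cnn_output, blend_amount)` on the final layer only. A CPU reference of that formula, useful for sanity-checking tool output at intermediate blend values (a sketch, not the shader itself):

```python
import numpy as np

def final_blend(original: np.ndarray, cnn_output: np.ndarray,
                blend: float) -> np.ndarray:
    """CPU reference for the final-layer blend:
    mix(input, cnn_output, blend_amount).
    blend=0.0 -> original passthrough; blend=1.0 -> full CNN output."""
    orig = original.astype(np.float32)
    out = cnn_output.astype(np.float32)
    return ((1.0 - blend) * orig + blend * out).round().astype(np.uint8)

original = np.full((2, 2, 3), 100, dtype=np.uint8)
cnn_out = np.full((2, 2, 3), 200, dtype=np.uint8)
blended = final_blend(original, cnn_out, 0.5)
print(blended[0, 0].tolist())  # [150, 150, 150]
```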
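**Appendix: batch processing.** Since the tool handles one image at a time, batch processing needs a loop around it. A hedged sketch that builds one `cnn_test` invocation per PNG in a directory; the helper name `batch_commands` is hypothetical, and the tool path is assumed to be `./build/cnn_test` as in the usage examples:

```python
from pathlib import Path

def batch_commands(input_dir: str, output_dir: str, blend: float = 1.0,
                   tool: str = "./build/cnn_test"):
    """Build one cnn_test command per .png in input_dir, writing each
    result to output_dir under the same filename."""
    out = Path(output_dir)
    return [
        [tool, str(p), str(out / p.name), "--blend", str(blend)]
        for p in sorted(Path(input_dir).glob("*.png"))
    ]
```

Each command list can then be executed with `subprocess.run(cmd, check=True)`.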