# CNN Shader Testing Tool

Standalone tool for validating trained CNN shaders with GPU-to-CPU readback. Supports both CNN v1 (render pipeline) and CNN v2 (compute pipeline, storage buffer weights).

---

## Purpose

- Validate trained weights against ground truth
- Debug CNN layer behavior in isolation
- Generate test outputs for the training workflow
- Match the inference mode of the Python training script

---

## Architecture

**Two implementations:**

1. **CNN v1** (render pipeline, texture atlas weights)
   - 3 fixed layers
   - RGBA16Float intermediates
   - BGRA8Unorm final output

2. **CNN v2** (compute shaders, storage buffer weights)
   - Dynamic layer count from binary
   - 7D static features (RGBD + UV + sin + bias)
   - RGBA32Uint packed f16 intermediates
   - Storage buffer: ~3-5 KB weights

**Core GPU utility:** `src/gpu/texture_readback.{h,cc}`
- Synchronous texture-to-CPU readback
- Supports RGBA16Float, RGBA32Uint, BGRA8Unorm
- Protected with STRIP_ALL (0 bytes in release)

---

## Usage

```bash
cnn_test input.png output.png [OPTIONS]

OPTIONS:
  --cnn-version N            CNN version: 1 (default) or 2
  --blend F                  Final blend amount (0.0-1.0, default: 1.0)
  --format ppm|png           Output format (default: png)
  --layers N                 Number of CNN layers (1-10, v1 only, default: 3)
  --save-intermediates DIR   Save intermediate layers to directory
  --debug-hex                Print first 8 pixels as hex (debug)
  --help                     Show usage
```

**Examples:**

```bash
# CNN v1 (render pipeline, 3 layers)
./build/cnn_test input.png output.png --cnn-version 1

# CNN v2 (compute, storage buffer, dynamic layers)
./build/cnn_test input.png output.png --cnn-version 2

# 50% blend with original (v2)
./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5

# Debug hex dump
./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
```

---

## Implementation Details

### Core Readback Utility

**File:** `src/gpu/texture_readback.{h,cc}`

**Function:**
```cpp
// Blocks until the texture contents are available on the CPU.
std::vector<uint8_t> read_texture_pixels(
    WGPUInstance instance,
    WGPUDevice device,
    WGPUTexture texture,
    int width,
    int height);
```
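One constraint any WebGPU texture-to-buffer readback has to respect is that `bytesPerRow` in the copy must be a multiple of 256, so the staging buffer is usually wider than the image and the padding has to be removed after mapping. The source does not show how `read_texture_pixels` handles this; the sketch below is only an illustration, and the helper names `padded_bytes_per_row` and `strip_row_padding` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// WebGPU requires bytesPerRow to be a multiple of 256 for buffer copies.
constexpr uint32_t kCopyRowAlignment = 256;

// Hypothetical helper: row pitch to use for the staging buffer.
uint32_t padded_bytes_per_row(uint32_t width, uint32_t bytes_per_pixel) {
  const uint32_t unpadded = width * bytes_per_pixel;
  // Round up to the next multiple of the alignment.
  return (unpadded + kCopyRowAlignment - 1) / kCopyRowAlignment *
         kCopyRowAlignment;
}

// Hypothetical helper: copy mapped staging data into a tightly packed
// vector, dropping the per-row padding.
std::vector<uint8_t> strip_row_padding(const uint8_t* mapped, uint32_t width,
                                       uint32_t height,
                                       uint32_t bytes_per_pixel) {
  const uint32_t unpadded = width * bytes_per_pixel;
  const uint32_t padded = padded_bytes_per_row(width, bytes_per_pixel);
  std::vector<uint8_t> pixels(static_cast<size_t>(unpadded) * height);
  for (uint32_t y = 0; y < height; ++y) {
    std::memcpy(pixels.data() + static_cast<size_t>(y) * unpadded,
                mapped + static_cast<size_t>(y) * padded, unpadded);
  }
  return pixels;
}
```

With 4 bytes per pixel, any width that is not a multiple of 64 produces padded rows, so a helper like this matters for most test images.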
**Features:**
- Returns BGRA8 format (4 bytes per pixel)
- Synchronous blocking operation
- Cross-platform async callback handling (Win32 vs Native API)
- Automatic staging buffer creation and cleanup

**Refactored OffscreenRenderTarget:**
```cpp
std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
#if !defined(STRIP_ALL)
  // Delegate to the shared readback utility.
  return read_texture_pixels(instance_, device_, texture_, width_, height_);
#else
  // Readback is compiled out in stripped builds.
  return std::vector<uint8_t>();
#endif
}
```

### CNN v1 Pipeline (Render)

**Fixed 3-layer architecture:**
- Ping-pong RGBA16Float textures
- CNNLayerParams (binding 3): layer_index, blend_amount
- Shader composer resolves `#include` directives

### CNN v2 Pipeline (Compute)

**Dynamic layer architecture:**

1. **Static features compute:** Generate 7D features (RGBD + UV + sin + bias)
2. **Layer computes:** N layers from binary weights (3-5 typically)
   - Storage buffer weights (read-only)
   - RGBA32Uint packed f16 textures (ping-pong)
   - CNNv2LayerParams: kernel_size, channels, weight_offset, blend
3. **Readback:** RGBA32Uint → f16 decode → u8 clamp

**Binary format:** Header (20B) + layer info (20B×N) + f16 weights

---

## Build Integration

**CMakeLists.txt:**

1. Added `src/gpu/texture_readback.cc` to GPU_SOURCES (both sections)
2. Tool target:
   ```cmake
   add_executable(cnn_test
       tools/cnn_test.cc
       src/tests/common/webgpu_test_fixture.cc
       src/tests/common/offscreen_render_target.cc
       ${PLATFORM_SOURCES}
       ${GEN_DEMO_CC})

   target_link_libraries(cnn_test PRIVATE
       gpu util procedural ${DEMO_LIBS})

   add_dependencies(cnn_test generate_demo_assets)

   target_compile_definitions(cnn_test PRIVATE
       STB_IMAGE_IMPLEMENTATION
       STB_IMAGE_WRITE_IMPLEMENTATION)
   ```

**Build:**
```bash
cmake -S . -B build -DDEMO_BUILD_TOOLS=ON
cmake --build build -j4
```

---

## Validation Workflow (CNN v2)

### 1. Train and Export

```bash
# Train and export weights
./scripts/train_cnn_v2_full.sh --epochs 200 --batch-size 16
```

### 2. Tool Inference

```bash
# Run tool with v2
./build/cnn_test training/input/img_000.png output.png --cnn-version 2
```

### 3. Visual Comparison

Compare `output.png` with `training/target_X/img_000.png`.

---

## Status

**CNN v1:** Builds and runs, but produces incorrect output (all white). Use CNNEffect in the demo for visual validation.

**CNN v2:** ✅ Fully functional and tested.
- Loads binary weights from `workspaces/main/weights/cnn_v2_weights.bin`
- Matches CNNv2Effect architecture
- Produces correct output
- Recommended for validation

---

## Technical Notes (Readback Fix)

**Original Bug:** Buffer mapping returned `WGPUMapAsyncStatus_Unknown` (status=5)

**Root Cause:** Callback mode mismatch
- Used `WGPUCallbackMode_WaitAnyOnly` (fires only during `wgpuInstanceWaitAny`)
- Called `wgpuInstanceProcessEvents` in wait loop (wrong API for this mode)
- Callback never fired → timeout → empty buffer

**Fix Applied:**
1. Changed callback mode to `WGPUCallbackMode_AllowProcessEvents`
2. Replaced `wgpuInstanceProcessEvents` with `wgpuDevicePoll(device, true, nullptr)`
3. Added pre-mapping device poll to ensure the copy completes

**Relevant Code:** `src/gpu/texture_readback.cc` lines 97-110

**Reference:** WebGPU spec - Asynchronous Operations, Callback Modes

---

## Limitations

- **CNN v1:** Produces incorrect output, use for debugging only
- **Single image:** Batch processing requires a shell loop
- **No real-time preview:** Offline processing only
- **Input formats:** Limited to what stb_image decodes (PNG, JPEG, BMP, TGA)

---

## Technical Notes

**CNN v2 f16 decoding:**
- RGBA32Uint texture stores 8×f16 as 4×u32
- Custom decoder: extract u16, decode f16→f32, clamp [0,1]→u8
- Handles denormals, infinity, NaN

**Cross-platform:**
- macOS, Linux (native WebGPU)
- Windows (mingw-w64 cross-compile)

**Size impact:**
- Debug/STRIP_ALL=OFF: compiled
- STRIP_ALL=ON: 0 bytes (compiled out)
- FINAL_STRIP=ON: tool not built
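
The CNN v2 f16 decoding steps listed above (extract u16, decode f16→f32, clamp [0,1]→u8) can be sketched as follows. This is a minimal illustration, not the tool's actual decoder: the function names are hypothetical, the low 16 bits of each u32 are assumed to hold the first channel (the convention of WGSL `pack2x16float`), and mapping NaN to 0 is an assumption.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode IEEE 754 binary16 bits to a 32-bit float (hypothetical helper).
float f16_to_f32(uint16_t h) {
  const uint32_t sign = (h >> 15) & 0x1u;
  const uint32_t exponent = (h >> 10) & 0x1Fu;
  const uint32_t mantissa = h & 0x3FFu;
  float value;
  if (exponent == 0) {
    // Zero or denormal: mantissa * 2^-24.
    value = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 31) {
    // Infinity (mantissa == 0) or NaN (mantissa != 0).
    value = mantissa ? std::numeric_limits<float>::quiet_NaN()
                     : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1024 + mantissa) * 2^(exponent - 25).
    value = std::ldexp(static_cast<float>(mantissa | 0x400u),
                       static_cast<int>(exponent) - 25);
  }
  return sign ? -value : value;
}

// Clamp to [0,1] and quantize to 8 bits. NaN maps to 0 (assumption).
uint8_t float_to_u8(float v) {
  if (!(v > 0.0f)) return 0;  // Also catches NaN and negatives.
  if (v >= 1.0f) return 255;
  return static_cast<uint8_t>(v * 255.0f + 0.5f);
}

// One RGBA32Uint texel (4 x u32) expands to 8 channel values.
// Assumes the low half of each u32 comes first.
void decode_texel(const uint32_t packed[4], uint8_t out[8]) {
  for (int i = 0; i < 4; ++i) {
    out[2 * i] =
        float_to_u8(f16_to_f32(static_cast<uint16_t>(packed[i] & 0xFFFFu)));
    out[2 * i + 1] =
        float_to_u8(f16_to_f32(static_cast<uint16_t>(packed[i] >> 16)));
  }
}
```

A readback loop would call `decode_texel` once per pixel on the mapped RGBA32Uint data and keep whichever of the 8 channels the output image needs.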