Diffstat (limited to 'doc/CNN_TEST_TOOL.md')
| -rw-r--r-- | doc/CNN_TEST_TOOL.md | 244 |
1 files changed, 0 insertions, 244 deletions
diff --git a/doc/CNN_TEST_TOOL.md b/doc/CNN_TEST_TOOL.md
deleted file mode 100644
index 4307894..0000000
--- a/doc/CNN_TEST_TOOL.md
+++ /dev/null
@@ -1,244 +0,0 @@
-# CNN Shader Testing Tool
-
-Standalone tool for validating trained CNN shaders with GPU-to-CPU readback. Supports both CNN v1 (render pipeline) and v2 (compute, storage buffer).
-
----
-
-## Purpose
-
-- Validate trained weights against ground truth
-- Debug CNN layer behavior in isolation
-- Generate test outputs for training workflow
-- Match Python training script's inference mode
-
----
-
-## Architecture
-
-**Two implementations:**
-
-1. **CNN v1** (render pipeline, texture atlas weights)
-   - 3 fixed layers
-   - RGBA16Float intermediates
-   - BGRA8Unorm final output
-
-2. **CNN v2** (compute shaders, storage buffer weights)
-   - Dynamic layer count from binary
-   - 7D static features (RGBD + UV + sin + bias)
-   - RGBA32Uint packed f16 intermediates
-   - Storage buffer: ~3-5 KB weights
-
-**Core GPU utility:** `src/gpu/texture_readback.{h,cc}`
-- Synchronous texture-to-CPU readback
-- Supports RGBA16Float, RGBA32Uint, BGRA8Unorm
-- Protected with STRIP_ALL (0 bytes in release)
-
----
-
-## Usage
-
-```bash
-cnn_test input.png output.png [OPTIONS]
-
-OPTIONS:
-  --cnn-version N          CNN version: 1 (default) or 2 (ignored with --weights)
-  --weights PATH           Load weights from .bin (forces CNN v2, overrides layer config)
-  --blend F                Final blend amount (0.0-1.0, default: 1.0)
-  --format ppm|png         Output format (default: png)
-  --layers N               Number of CNN layers (1-10, v1 only, default: 3, ignored with --weights)
-  --save-intermediates DIR Save intermediate layers to directory
-  --debug-hex              Print first 8 pixels as hex (debug)
-  --help                   Show usage
-```
-
-**Examples:**
-```bash
-# CNN v1 (render pipeline, 3 layers)
-./build/cnn_test input.png output.png --cnn-version 1
-
-# CNN v2 (compute, storage buffer, uses asset system weights)
-./build/cnn_test input.png output.png --cnn-version 2
-
-# CNN v2 with runtime weight loading (loads layer config from .bin)
-./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin
-
-# 50% blend with original (v2)
-./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5
-
-# Debug hex dump
-./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
-```
-
-**Important:** When using `--weights`, the layer count and kernel sizes are read from the binary file header, overriding any `--layers` or `--cnn-version` arguments.
-
----
-
-## Implementation Details
-
-### Core Readback Utility
-
-**File:** `src/gpu/texture_readback.{h,cc}`
-
-**Function:**
-```cpp
-std::vector<uint8_t> read_texture_pixels(
-    WGPUInstance instance,
-    WGPUDevice device,
-    WGPUTexture texture,
-    int width,
-    int height);
-```
-
-**Features:**
-- Returns BGRA8 format (4 bytes per pixel)
-- Synchronous blocking operation
-- Cross-platform async callback handling (Win32 vs Native API)
-- Automatic staging buffer creation and cleanup
-
-**Refactored OffscreenRenderTarget:**
-```cpp
-std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
-#if !defined(STRIP_ALL)
-  return read_texture_pixels(instance_, device_, texture_, width_, height_);
-#else
-  return std::vector<uint8_t>();
-#endif
-}
-```
-
-### CNN v1 Pipeline (Render)
-
-**Fixed 3-layer architecture:**
-- Ping-pong RGBA16Float textures
-- CNNLayerParams (binding 3): layer_index, blend_amount
-- Shader composer resolves #include directives
-
-### CNN v2 Pipeline (Compute)
-
-**Dynamic layer architecture:**
-1. **Static features compute:** Generate 7D features (RGBD + UV + sin + bias)
-2. **Layer computes:** N layers from binary weights (3-5 typically)
-   - Storage buffer weights (read-only)
-   - RGBA32Uint packed f16 textures (ping-pong)
-   - CNNv2LayerParams: kernel_size, channels, weight_offset, blend
-3. **Readback:** RGBA32Uint → f16 decode → u8 clamp
-
-**Binary format:** Header (20B) + layer info (20B×N) + f16 weights
-
-**Weight Loading:**
-- **Without `--weights`:** Loads from asset system (`ASSET_WEIGHTS_CNN_V2`)
-- **With `--weights PATH`:** Loads from external `.bin` file (e.g., checkpoint exports)
-  - Layer count and kernel sizes parsed from binary header
-  - Overrides any `--layers` or `--cnn-version` arguments
-  - Enables runtime testing of training checkpoints without rebuild
-
----
-
-## Build Integration
-
-**CMakeLists.txt:**
-
-1. Added `src/gpu/texture_readback.cc` to GPU_SOURCES (both sections)
-2. Tool target:
-```cmake
-add_executable(cnn_test
-  tools/cnn_test.cc
-  src/tests/common/webgpu_test_fixture.cc
-  src/tests/common/offscreen_render_target.cc
-  ${PLATFORM_SOURCES}
-  ${GEN_DEMO_CC})
-
-target_link_libraries(cnn_test PRIVATE
-  gpu util procedural ${DEMO_LIBS})
-
-add_dependencies(cnn_test generate_demo_assets)
-
-target_compile_definitions(cnn_test PRIVATE
-  STB_IMAGE_IMPLEMENTATION
-  STB_IMAGE_WRITE_IMPLEMENTATION)
-```
-
-**Build:**
-```bash
-cmake -S . -B build -DDEMO_BUILD_TOOLS=ON
-cmake --build build -j4
-```
-
----
-
-## Validation Workflow (CNN v2)
-
-### 1. Train and Export
-```bash
-# Train and export weights
-./scripts/train_cnn_v2_full.sh --epochs 200 --batch-size 16
-```
-
-### 2. Tool Inference
-```bash
-# Run tool with v2
-./build/cnn_test training/input/img_000.png output.png --cnn-version 2
-```
-
-### 3. Visual Comparison
-Compare output.png with training/target_X/img_000.png
-
----
-
-## Status
-
-**CNN v1:** Builds and runs, produces incorrect output (all white). Use CNNEffect in demo for visual validation.
-
-**CNN v2:** ⚠️ Partially functional. Readback works but output differs from HTML validation tool.
-- Loads binary weights from `workspaces/main/weights/cnn_v2_weights.bin`
-- Matches CNNv2Effect architecture
-- **Known Issue:** Visual output differs from `tools/cnn_v2_test/index.html` despite matching shader code
-- Root cause under investigation (weight indexing? texture sampling? activation clamping?)
-- Use HTML tool (`tools/cnn_v2_test/index.html`) for accurate validation
-
----
-
-## Technical Notes (Readback Fix)
-
-**Original Bug:** Buffer mapping returned `WGPUMapAsyncStatus_Unknown` (status=5)
-
-**Root Cause:** Callback mode mismatch
-- Used `WGPUCallbackMode_WaitAnyOnly` (fires only during `wgpuInstanceWaitAny`)
-- Called `wgpuInstanceProcessEvents` in wait loop (wrong API for this mode)
-- Callback never fired → timeout → empty buffer
-
-**Fix Applied:**
-1. Changed callback mode to `WGPUCallbackMode_AllowProcessEvents`
-2. Replaced `wgpuInstanceProcessEvents` with `wgpuDevicePoll(device, true, nullptr)`
-3. Added pre-mapping device poll to ensure copy completes
-
-**Relevant Code:** `src/gpu/texture_readback.cc` lines 97-110
-
-**Reference:** WebGPU spec - Asynchronous Operations, Callback Modes
-
----
-
-## Limitations
-
-- **CNN v1:** Produces incorrect output, use for debugging only
-- **Single image:** Batch processing requires shell loop
-- **No real-time preview:** Offline processing only
-- **PNG input:** stb_image (JPEG/PNG/BMP/TGA also supported)
-
----
-
-## Technical Notes
-
-**CNN v2 f16 decoding:**
-- RGBA32Uint texture stores 8×f16 as 4×u32
-- Custom decoder: extract u16, decode f16→f32, clamp [0,1]→u8
-- Handles denormals, infinity, NaN
-
-**Cross-platform:**
-- macOS, Linux (native WebGPU)
-- Windows (mingw-w64 cross-compile)
-
-**Size impact:**
-- Debug/STRIP_ALL=OFF: compiled
-- STRIP_ALL=ON: 0 bytes (compiled out)
-- FINAL_STRIP=ON: tool not built
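
Editorial note on the removed "CNN v2 f16 decoding" section: the deleted document describes the decoder only in outline (extract u16, decode f16→f32, clamp [0,1]→u8, with denormals, infinity and NaN handled). The following is a minimal sketch of what such a decoder could look like, not the code from `tools/cnn_test.cc`. The function names are hypothetical, the low-half-first packing order is an assumption (it matches WGSL `pack2x16float`, which the removed text does not confirm was used), and mapping NaN to 0 is likewise an assumption.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Decode one IEEE 754 binary16 value (1 sign, 5 exponent, 10 mantissa bits).
float f16_to_f32(uint16_t h) {
  const uint32_t sign = (h >> 15) & 0x1u;
  const uint32_t exp = (h >> 10) & 0x1Fu;
  const uint32_t mant = h & 0x3FFu;
  float value;
  if (exp == 0) {
    // Zero or subnormal: mant * 2^-24.
    value = std::ldexp(static_cast<float>(mant), -24);
  } else if (exp == 31) {
    // All-ones exponent: infinity (mant == 0) or NaN.
    value = mant ? std::numeric_limits<float>::quiet_NaN()
                 : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1024 + mant) * 2^(exp - 25) == (1 + mant/1024) * 2^(exp - 15).
    value = std::ldexp(static_cast<float>(mant | 0x400u),
                       static_cast<int>(exp) - 25);
  }
  return sign ? -value : value;
}

// Clamp to [0,1] and quantize to 8 bits. Assumption: NaN maps to 0.
uint8_t f32_to_u8(float v) {
  if (std::isnan(v)) return 0;
  v = std::fmin(std::fmax(v, 0.0f), 1.0f);
  return static_cast<uint8_t>(v * 255.0f + 0.5f);
}

// Unpack one RGBA32Uint texel (4 x u32) into 8 channel values.
// Assumption: the low 16 bits of each u32 hold the first channel of the pair.
std::array<float, 8> unpack_texel(const std::array<uint32_t, 4>& texel) {
  std::array<float, 8> out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[2 * i] = f16_to_f32(static_cast<uint16_t>(texel[i] & 0xFFFFu));
    out[2 * i + 1] = f16_to_f32(static_cast<uint16_t>(texel[i] >> 16));
  }
  return out;
}
```

If the real shaders pack the high half first, only the two index expressions in `unpack_texel` would need to swap. A channel-order mismatch of that kind is one possible source of a readback that "works" yet differs visually from a reference, although the removed document leaves its own root cause open.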
