Diffstat (limited to 'cnn_v1/docs/CNN_TEST_TOOL.md')
| -rw-r--r-- | cnn_v1/docs/CNN_TEST_TOOL.md | 244 |
1 files changed, 244 insertions, 0 deletions
diff --git a/cnn_v1/docs/CNN_TEST_TOOL.md b/cnn_v1/docs/CNN_TEST_TOOL.md
new file mode 100644
index 0000000..4307894
--- /dev/null
+++ b/cnn_v1/docs/CNN_TEST_TOOL.md
@@ -0,0 +1,244 @@
+# CNN Shader Testing Tool
+
+Standalone tool for validating trained CNN shaders with GPU-to-CPU readback. Supports both CNN v1 (render pipeline) and v2 (compute, storage buffer).
+
+---
+
+## Purpose
+
+- Validate trained weights against ground truth
+- Debug CNN layer behavior in isolation
+- Generate test outputs for training workflow
+- Match Python training script's inference mode
+
+---
+
+## Architecture
+
+**Two implementations:**
+
+1. **CNN v1** (render pipeline, texture atlas weights)
+   - 3 fixed layers
+   - RGBA16Float intermediates
+   - BGRA8Unorm final output
+
+2. **CNN v2** (compute shaders, storage buffer weights)
+   - Dynamic layer count from binary
+   - 7D static features (RGBD + UV + sin + bias)
+   - RGBA32Uint packed f16 intermediates
+   - Storage buffer: ~3-5 KB weights
+
+**Core GPU utility:** `src/gpu/texture_readback.{h,cc}`
+- Synchronous texture-to-CPU readback
+- Supports RGBA16Float, RGBA32Uint, BGRA8Unorm
+- Protected with STRIP_ALL (0 bytes in release)
+
+---
+
+## Usage
+
+```bash
+cnn_test input.png output.png [OPTIONS]
+
+OPTIONS:
+  --cnn-version N          CNN version: 1 (default) or 2 (ignored with --weights)
+  --weights PATH           Load weights from .bin (forces CNN v2, overrides layer config)
+  --blend F                Final blend amount (0.0-1.0, default: 1.0)
+  --format ppm|png         Output format (default: png)
+  --layers N               Number of CNN layers (1-10, v1 only, default: 3, ignored with --weights)
+  --save-intermediates DIR Save intermediate layers to directory
+  --debug-hex              Print first 8 pixels as hex (debug)
+  --help                   Show usage
+```
+
+**Examples:**
+```bash
+# CNN v1 (render pipeline, 3 layers)
+./build/cnn_test input.png output.png --cnn-version 1
+
+# CNN v2 (compute, storage buffer, uses asset system weights)
+./build/cnn_test input.png output.png --cnn-version 2
+
+# CNN v2 with runtime weight loading (loads layer config from .bin)
+./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin
+
+# 50% blend with original (v2)
+./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5
+
+# Debug hex dump
+./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
+```
+
+**Important:** When using `--weights`, the layer count and kernel sizes are read from the binary file header, overriding any `--layers` or `--cnn-version` arguments.
+
+---
+
+## Implementation Details
+
+### Core Readback Utility
+
+**File:** `src/gpu/texture_readback.{h,cc}`
+
+**Function:**
+```cpp
+std::vector<uint8_t> read_texture_pixels(
+    WGPUInstance instance,
+    WGPUDevice device,
+    WGPUTexture texture,
+    int width,
+    int height);
+```
+
+**Features:**
+- Returns BGRA8 format (4 bytes per pixel)
+- Synchronous blocking operation
+- Cross-platform async callback handling (Win32 vs Native API)
+- Automatic staging buffer creation and cleanup
+
+**Refactored OffscreenRenderTarget:**
+```cpp
+std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
+#if !defined(STRIP_ALL)
+  return read_texture_pixels(instance_, device_, texture_, width_, height_);
+#else
+  return std::vector<uint8_t>();
+#endif
+}
+```
+
+### CNN v1 Pipeline (Render)
+
+**Fixed 3-layer architecture:**
+- Ping-pong RGBA16Float textures
+- CNNLayerParams (binding 3): layer_index, blend_amount
+- Shader composer resolves #include directives
+
+### CNN v2 Pipeline (Compute)
+
+**Dynamic layer architecture:**
+1. **Static features compute:** Generate 7D features (RGBD + UV + sin + bias)
+2. **Layer computes:** N layers from binary weights (3-5 typically)
+   - Storage buffer weights (read-only)
+   - RGBA32Uint packed f16 textures (ping-pong)
+   - CNNv2LayerParams: kernel_size, channels, weight_offset, blend
+3. **Readback:** RGBA32Uint → f16 decode → u8 clamp
+
+**Binary format:** Header (20B) + layer info (20B×N) + f16 weights
+
+**Weight Loading:**
+- **Without `--weights`:** Loads from asset system (`ASSET_WEIGHTS_CNN_V2`)
+- **With `--weights PATH`:** Loads from external `.bin` file (e.g., checkpoint exports)
+  - Layer count and kernel sizes parsed from binary header
+  - Overrides any `--layers` or `--cnn-version` arguments
+  - Enables runtime testing of training checkpoints without rebuild
+
+---
+
+## Build Integration
+
+**CMakeLists.txt:**
+
+1. Added `src/gpu/texture_readback.cc` to GPU_SOURCES (both sections)
+2. Tool target:
+```cmake
+add_executable(cnn_test
+    tools/cnn_test.cc
+    src/tests/common/webgpu_test_fixture.cc
+    src/tests/common/offscreen_render_target.cc
+    ${PLATFORM_SOURCES}
+    ${GEN_DEMO_CC})
+
+target_link_libraries(cnn_test PRIVATE
+    gpu util procedural ${DEMO_LIBS})
+
+add_dependencies(cnn_test generate_demo_assets)
+
+target_compile_definitions(cnn_test PRIVATE
+    STB_IMAGE_IMPLEMENTATION
+    STB_IMAGE_WRITE_IMPLEMENTATION)
+```
+
+**Build:**
+```bash
+cmake -S . -B build -DDEMO_BUILD_TOOLS=ON
+cmake --build build -j4
+```
+
+---
+
+## Validation Workflow (CNN v2)
+
+### 1. Train and Export
+```bash
+# Train and export weights
+./scripts/train_cnn_v2_full.sh --epochs 200 --batch-size 16
+```
+
+### 2. Tool Inference
+```bash
+# Run tool with v2
+./build/cnn_test training/input/img_000.png output.png --cnn-version 2
+```
+
+### 3. Visual Comparison
+Compare output.png with training/target_X/img_000.png
+
+---
+
+## Status
+
+**CNN v1:** Builds and runs, produces incorrect output (all white). Use CNNEffect in demo for visual validation.
+
+**CNN v2:** ⚠️ Partially functional. Readback works but output differs from HTML validation tool.
+- Loads binary weights from `workspaces/main/weights/cnn_v2_weights.bin`
+- Matches CNNv2Effect architecture
+- **Known Issue:** Visual output differs from `tools/cnn_v2_test/index.html` despite matching shader code
+- Root cause under investigation (weight indexing? texture sampling? activation clamping?)
+- Use HTML tool (`tools/cnn_v2_test/index.html`) for accurate validation
+
+---
+
+## Technical Notes (Readback Fix)
+
+**Original Bug:** Buffer mapping returned `WGPUMapAsyncStatus_Unknown` (status=5)
+
+**Root Cause:** Callback mode mismatch
+- Used `WGPUCallbackMode_WaitAnyOnly` (fires only during `wgpuInstanceWaitAny`)
+- Called `wgpuInstanceProcessEvents` in wait loop (wrong API for this mode)
+- Callback never fired → timeout → empty buffer
+
+**Fix Applied:**
+1. Changed callback mode to `WGPUCallbackMode_AllowProcessEvents`
+2. Replaced `wgpuInstanceProcessEvents` with `wgpuDevicePoll(device, true, nullptr)`
+3. Added pre-mapping device poll to ensure copy completes
+
+**Relevant Code:** `src/gpu/texture_readback.cc` lines 97-110
+
+**Reference:** WebGPU spec - Asynchronous Operations, Callback Modes
+
+---
+
+## Limitations
+
+- **CNN v1:** Produces incorrect output, use for debugging only
+- **Single image:** Batch processing requires shell loop
+- **No real-time preview:** Offline processing only
+- **PNG input:** stb_image (JPEG/PNG/BMP/TGA also supported)
+
+---
+
+## Technical Notes
+
+**CNN v2 f16 decoding:**
+- RGBA32Uint texture stores 8×f16 as 4×u32
+- Custom decoder: extract u16, decode f16→f32, clamp [0,1]→u8
+- Handles denormals, infinity, NaN
+
+**Cross-platform:**
+- macOS, Linux (native WebGPU)
+- Windows (mingw-w64 cross-compile)
+
+**Size impact:**
+- Debug/STRIP_ALL=OFF: compiled
+- STRIP_ALL=ON: 0 bytes (compiled out)
+- FINAL_STRIP=ON: tool not built
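Reviewer note on the "CNN v2 f16 decoding" bullets: the readback path they describe (one RGBA32Uint texel, 8×f16 values, clamped u8 output) could look roughly like the sketch below. This is an illustration under stated assumptions, not the decoder in `tools/cnn_test.cc`. All three function names are hypothetical. The low-half-first packing order is assumed from the WGSL `pack2x16float` convention. Rounding to nearest and mapping NaN to 0 are also assumptions, since the document only says these cases are handled.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Decode one IEEE 754 binary16 bit pattern to f32 (hypothetical helper).
// Layout: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
float f16_to_f32(uint16_t h) {
  const bool negative = (h & 0x8000u) != 0;
  const int exponent = (h >> 10) & 0x1F;
  const int mantissa = h & 0x3FF;

  float value;
  if (exponent == 0) {
    // Zero and denormals: mantissa * 2^-24, no implicit leading 1.
    value = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 31) {
    // All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
    value = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                            : std::numeric_limits<float>::quiet_NaN();
  } else {
    // Normal numbers: (1024 + mantissa) * 2^(exponent - 15 - 10).
    value = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
  }
  return negative ? -value : value;
}

// Clamp to [0,1] and quantize to u8. NaN -> 0 and round-to-nearest are
// assumptions of this sketch, not documented behavior of the tool.
uint8_t f32_to_u8(float v) {
  if (std::isnan(v)) return 0;
  const float c = std::clamp(v, 0.0f, 1.0f);  // +inf -> 1, -inf -> 0
  assert(c >= 0.0f && c <= 1.0f);
  return static_cast<uint8_t>(std::lround(c * 255.0f));
}

// Unpack one RGBA32Uint texel (4 x u32) into 8 channel values.
// Assumed order: low 16 bits first, then high 16 bits, per u32.
std::array<float, 8> unpack_texel(const std::array<uint32_t, 4>& texel) {
  std::array<float, 8> out{};
  for (std::size_t i = 0; i < texel.size(); ++i) {
    out[2 * i] = f16_to_f32(static_cast<uint16_t>(texel[i] & 0xFFFFu));
    out[2 * i + 1] = f16_to_f32(static_cast<uint16_t>(texel[i] >> 16));
  }
  return out;
}
```

Under these assumptions, `f32_to_u8(f16_to_f32(0x3800))` yields 128, because `0x3800` encodes 0.5. If the real decoder truncates instead of rounding, the same input would give 127, which is one candidate to check when comparing against the HTML validation tool.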
