Diffstat (limited to 'cnn_v1/docs/CNN_TEST_TOOL.md')
-rw-r--r-- cnn_v1/docs/CNN_TEST_TOOL.md | 244
1 files changed, 244 insertions, 0 deletions
diff --git a/cnn_v1/docs/CNN_TEST_TOOL.md b/cnn_v1/docs/CNN_TEST_TOOL.md
new file mode 100644
index 0000000..4307894
--- /dev/null
+++ b/cnn_v1/docs/CNN_TEST_TOOL.md
@@ -0,0 +1,244 @@
+# CNN Shader Testing Tool
+
+Standalone tool for validating trained CNN shaders with GPU-to-CPU readback. Supports both CNN v1 (render pipeline) and v2 (compute, storage buffer).
+
+---
+
+## Purpose
+
+- Validate trained weights against ground truth
+- Debug CNN layer behavior in isolation
+- Generate test outputs for training workflow
+- Match Python training script's inference mode
+
+---
+
+## Architecture
+
+**Two implementations:**
+
+1. **CNN v1** (render pipeline, texture atlas weights)
+ - 3 fixed layers
+ - RGBA16Float intermediates
+ - BGRA8Unorm final output
+
+2. **CNN v2** (compute shaders, storage buffer weights)
+ - Dynamic layer count from binary
+ - 7D static features (RGBD + UV + sin + bias)
+ - RGBA32Uint packed f16 intermediates
+ - Storage buffer: ~3-5 KB weights
+
+**Core GPU utility:** `src/gpu/texture_readback.{h,cc}`
+- Synchronous texture-to-CPU readback
+- Supports RGBA16Float, RGBA32Uint, BGRA8Unorm
+- Compiled out when `STRIP_ALL` is defined (0 bytes in release builds)
+
+---
+
+## Usage
+
+```bash
+cnn_test input.png output.png [OPTIONS]
+
+OPTIONS:
+ --cnn-version N CNN version: 1 (default) or 2 (ignored with --weights)
+ --weights PATH Load weights from .bin (forces CNN v2, overrides layer config)
+ --blend F Final blend amount (0.0-1.0, default: 1.0)
+ --format ppm|png Output format (default: png)
+ --layers N Number of CNN layers (1-10, v1 only, default: 3, ignored with --weights)
+ --save-intermediates DIR Save intermediate layers to directory
+ --debug-hex Print first 8 pixels as hex (debug)
+ --help Show usage
+```
+
+**Examples:**
+```bash
+# CNN v1 (render pipeline, 3 layers)
+./build/cnn_test input.png output.png --cnn-version 1
+
+# CNN v2 (compute, storage buffer, uses asset system weights)
+./build/cnn_test input.png output.png --cnn-version 2
+
+# CNN v2 with runtime weight loading (loads layer config from .bin)
+./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin
+
+# 50% blend with original (v2)
+./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5
+
+# Debug hex dump
+./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
+```
+
+**Important:** When using `--weights`, the layer count and kernel sizes are read from the binary file header, overriding any `--layers` or `--cnn-version` arguments.
+
+---
+
+## Implementation Details
+
+### Core Readback Utility
+
+**File:** `src/gpu/texture_readback.{h,cc}`
+
+**Function:**
+```cpp
+std::vector<uint8_t> read_texture_pixels(
+ WGPUInstance instance,
+ WGPUDevice device,
+ WGPUTexture texture,
+ int width,
+ int height);
+```
+
+**Features:**
+- Returns BGRA8 format (4 bytes per pixel)
+- Synchronous blocking operation
+- Cross-platform async callback handling (Win32 vs Native API)
+- Automatic staging buffer creation and cleanup
+
+**Refactored OffscreenRenderTarget:**
+```cpp
+std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
+#if !defined(STRIP_ALL)
+ return read_texture_pixels(instance_, device_, texture_, width_, height_);
+#else
+ return std::vector<uint8_t>();
+#endif
+}
+```
+
+### CNN v1 Pipeline (Render)
+
+**Fixed 3-layer architecture:**
+- Ping-pong RGBA16Float textures
+- CNNLayerParams (binding 3): layer_index, blend_amount
+- Shader composer resolves `#include` directives
+
+### CNN v2 Pipeline (Compute)
+
+**Dynamic layer architecture:**
+1. **Static features compute:** Generate 7D features (RGBD + UV + sin + bias)
+2. **Layer computes:** N layers from binary weights (3-5 typically)
+ - Storage buffer weights (read-only)
+ - RGBA32Uint packed f16 textures (ping-pong)
+ - CNNv2LayerParams: kernel_size, channels, weight_offset, blend
+3. **Readback:** RGBA32Uint → f16 decode → u8 clamp
+
+**Binary format:** Header (20B) + layer info (20B×N) + f16 weights
+
+**Weight Loading:**
+- **Without `--weights`:** Loads from asset system (`ASSET_WEIGHTS_CNN_V2`)
+- **With `--weights PATH`:** Loads from external `.bin` file (e.g., checkpoint exports)
+ - Layer count and kernel sizes parsed from binary header
+ - Overrides any `--layers` or `--cnn-version` arguments
+ - Enables runtime testing of training checkpoints without rebuild
+
+---
+
+## Build Integration
+
+**CMakeLists.txt:**
+
+1. Added `src/gpu/texture_readback.cc` to GPU_SOURCES (both sections)
+2. Tool target:
+```cmake
+add_executable(cnn_test
+ tools/cnn_test.cc
+ src/tests/common/webgpu_test_fixture.cc
+ src/tests/common/offscreen_render_target.cc
+ ${PLATFORM_SOURCES}
+ ${GEN_DEMO_CC})
+
+target_link_libraries(cnn_test PRIVATE
+ gpu util procedural ${DEMO_LIBS})
+
+add_dependencies(cnn_test generate_demo_assets)
+
+target_compile_definitions(cnn_test PRIVATE
+ STB_IMAGE_IMPLEMENTATION
+ STB_IMAGE_WRITE_IMPLEMENTATION)
+```
+
+**Build:**
+```bash
+cmake -S . -B build -DDEMO_BUILD_TOOLS=ON
+cmake --build build -j4
+```
+
+---
+
+## Validation Workflow (CNN v2)
+
+### 1. Train and Export
+```bash
+# Train and export weights
+./scripts/train_cnn_v2_full.sh --epochs 200 --batch-size 16
+```
+
+### 2. Tool Inference
+```bash
+# Run tool with v2
+./build/cnn_test training/input/img_000.png output.png --cnn-version 2
+```
+
+### 3. Visual Comparison
+Compare `output.png` with `training/target_X/img_000.png`.
+
+---
+
+## Status
+
+**CNN v1:** Builds and runs, but produces incorrect output (all white). Use CNNEffect in the demo for visual validation.
+
+**CNN v2:** ⚠️ Partially functional. Readback works but output differs from HTML validation tool.
+- Loads binary weights from `workspaces/main/weights/cnn_v2_weights.bin`
+- Matches CNNv2Effect architecture
+- **Known Issue:** Visual output differs from `tools/cnn_v2_test/index.html` despite matching shader code
+- Root cause under investigation (weight indexing? texture sampling? activation clamping?)
+- Use HTML tool (`tools/cnn_v2_test/index.html`) for accurate validation
+
+---
+
+## Technical Notes (Readback Fix)
+
+**Original Bug:** Buffer mapping returned `WGPUMapAsyncStatus_Unknown` (status=5)
+
+**Root Cause:** Callback mode mismatch
+- Used `WGPUCallbackMode_WaitAnyOnly` (fires only during `wgpuInstanceWaitAny`)
+- Called `wgpuInstanceProcessEvents` in wait loop (wrong API for this mode)
+- Callback never fired → timeout → empty buffer
+
+**Fix Applied:**
+1. Changed callback mode to `WGPUCallbackMode_AllowProcessEvents`
+2. Replaced `wgpuInstanceProcessEvents` with `wgpuDevicePoll(device, true, nullptr)`
+3. Added pre-mapping device poll to ensure copy completes
+
+**Relevant Code:** `src/gpu/texture_readback.cc` lines 97-110
+
+**Reference:** WebGPU spec - Asynchronous Operations, Callback Modes
+
+---
+
+## Limitations
+
+- **CNN v1:** Produces incorrect output; use for debugging only
+- **Single image:** Batch processing requires shell loop
+- **No real-time preview:** Offline processing only
+- **Image input:** Decoded with stb_image (PNG, JPEG, BMP, and TGA supported)
+
+---
+
+## Technical Notes
+
+**CNN v2 f16 decoding:**
+- RGBA32Uint texture stores 8×f16 as 4×u32
+- Custom decoder: extract u16, decode f16→f32, clamp [0,1]→u8
+- Handles denormals, infinity, NaN
+
+**Cross-platform:**
+- macOS, Linux (native WebGPU)
+- Windows (mingw-w64 cross-compile)
+
+**Size impact:**
+- Debug/STRIP_ALL=OFF: readback code compiled in
+- STRIP_ALL=ON: 0 bytes (compiled out)
+- FINAL_STRIP=ON: tool not built