Diffstat (limited to 'doc/CNN_TEST_TOOL.md')
-rw-r--r-- doc/CNN_TEST_TOOL.md 244
1 files changed, 0 insertions, 244 deletions
diff --git a/doc/CNN_TEST_TOOL.md b/doc/CNN_TEST_TOOL.md
deleted file mode 100644
index 4307894..0000000
--- a/doc/CNN_TEST_TOOL.md
+++ /dev/null
@@ -1,244 +0,0 @@
-# CNN Shader Testing Tool
-
-Standalone tool for validating trained CNN shaders with GPU-to-CPU readback. Supports both CNN v1 (render pipeline) and v2 (compute, storage buffer).
-
----
-
-## Purpose
-
-- Validate trained weights against ground truth
-- Debug CNN layer behavior in isolation
-- Generate test outputs for training workflow
-- Match Python training script's inference mode
-
----
-
-## Architecture
-
-**Two implementations:**
-
-1. **CNN v1** (render pipeline, texture atlas weights)
- - 3 fixed layers
- - RGBA16Float intermediates
- - BGRA8Unorm final output
-
-2. **CNN v2** (compute shaders, storage buffer weights)
- - Dynamic layer count from binary
- - 7D static features (RGBD + UV + sin + bias)
- - RGBA32Uint packed f16 intermediates
- - Storage buffer: ~3-5 KB weights
-
-**Core GPU utility:** `src/gpu/texture_readback.{h,cc}`
-- Synchronous texture-to-CPU readback
-- Supports RGBA16Float, RGBA32Uint, BGRA8Unorm
-- Protected with STRIP_ALL (0 bytes in release)
-
----
-
-## Usage
-
-```bash
-cnn_test input.png output.png [OPTIONS]
-
-OPTIONS:
- --cnn-version N CNN version: 1 (default) or 2 (ignored with --weights)
- --weights PATH Load weights from .bin (forces CNN v2, overrides layer config)
- --blend F Final blend amount (0.0-1.0, default: 1.0)
- --format ppm|png Output format (default: png)
- --layers N Number of CNN layers (1-10, v1 only, default: 3, ignored with --weights)
- --save-intermediates DIR Save intermediate layers to directory
- --debug-hex Print first 8 pixels as hex (debug)
- --help Show usage
-```
-
-**Examples:**
-```bash
-# CNN v1 (render pipeline, 3 layers)
-./build/cnn_test input.png output.png --cnn-version 1
-
-# CNN v2 (compute, storage buffer, uses asset system weights)
-./build/cnn_test input.png output.png --cnn-version 2
-
-# CNN v2 with runtime weight loading (loads layer config from .bin)
-./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin
-
-# 50% blend with original (v2)
-./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5
-
-# Debug hex dump
-./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
-```
-
-**Important:** When using `--weights`, the layer count and kernel sizes are read from the binary file header, overriding any `--layers` or `--cnn-version` arguments.
-
----
-
-## Implementation Details
-
-### Core Readback Utility
-
-**File:** `src/gpu/texture_readback.{h,cc}`
-
-**Function:**
-```cpp
-std::vector<uint8_t> read_texture_pixels(
- WGPUInstance instance,
- WGPUDevice device,
- WGPUTexture texture,
- int width,
- int height);
-```
-
-**Features:**
-- Returns BGRA8 format (4 bytes per pixel)
-- Synchronous blocking operation
-- Cross-platform async callback handling (Win32 vs Native API)
-- Automatic staging buffer creation and cleanup
-
-**Refactored OffscreenRenderTarget:**
-```cpp
-std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
-#if !defined(STRIP_ALL)
- return read_texture_pixels(instance_, device_, texture_, width_, height_);
-#else
- return std::vector<uint8_t>();
-#endif
-}
-```
-
-### CNN v1 Pipeline (Render)
-
-**Fixed 3-layer architecture:**
-- Ping-pong RGBA16Float textures
-- CNNLayerParams (binding 3): layer_index, blend_amount
-- Shader composer resolves #include directives
-
-### CNN v2 Pipeline (Compute)
-
-**Dynamic layer architecture:**
-1. **Static features compute:** Generate 7D features (RGBD + UV + sin + bias)
-2. **Layer computes:** N layers from binary weights (3-5 typically)
- - Storage buffer weights (read-only)
- - RGBA32Uint packed f16 textures (ping-pong)
- - CNNv2LayerParams: kernel_size, channels, weight_offset, blend
-3. **Readback:** RGBA32Uint → f16 decode → u8 clamp
-
-**Binary format:** Header (20B) + layer info (20B×N) + f16 weights
-
-**Weight Loading:**
-- **Without `--weights`:** Loads from asset system (`ASSET_WEIGHTS_CNN_V2`)
-- **With `--weights PATH`:** Loads from external `.bin` file (e.g., checkpoint exports)
- - Layer count and kernel sizes parsed from binary header
- - Overrides any `--layers` or `--cnn-version` arguments
- - Enables runtime testing of training checkpoints without rebuild
-
----
-
-## Build Integration
-
-**CMakeLists.txt:**
-
-1. Added `src/gpu/texture_readback.cc` to GPU_SOURCES (both sections)
-2. Tool target:
-```cmake
-add_executable(cnn_test
- tools/cnn_test.cc
- src/tests/common/webgpu_test_fixture.cc
- src/tests/common/offscreen_render_target.cc
- ${PLATFORM_SOURCES}
- ${GEN_DEMO_CC})
-
-target_link_libraries(cnn_test PRIVATE
- gpu util procedural ${DEMO_LIBS})
-
-add_dependencies(cnn_test generate_demo_assets)
-
-target_compile_definitions(cnn_test PRIVATE
- STB_IMAGE_IMPLEMENTATION
- STB_IMAGE_WRITE_IMPLEMENTATION)
-```
-
-**Build:**
-```bash
-cmake -S . -B build -DDEMO_BUILD_TOOLS=ON
-cmake --build build -j4
-```
-
----
-
-## Validation Workflow (CNN v2)
-
-### 1. Train and Export
-```bash
-# Train and export weights
-./scripts/train_cnn_v2_full.sh --epochs 200 --batch-size 16
-```
-
-### 2. Tool Inference
-```bash
-# Run tool with v2
-./build/cnn_test training/input/img_000.png output.png --cnn-version 2
-```
-
-### 3. Visual Comparison
-Compare `output.png` with `training/target_X/img_000.png`.
-
----
-
-## Status
-
-**CNN v1:** Builds and runs but produces incorrect output (all white). Use CNNEffect in the demo for visual validation.
-
-**CNN v2:** ⚠️ Partially functional. Readback works but output differs from HTML validation tool.
-- Loads binary weights from `workspaces/main/weights/cnn_v2_weights.bin`
-- Matches CNNv2Effect architecture
-- **Known Issue:** Visual output differs from `tools/cnn_v2_test/index.html` despite matching shader code
-- Root cause under investigation (weight indexing? texture sampling? activation clamping?)
-- Use HTML tool (`tools/cnn_v2_test/index.html`) for accurate validation
-
----
-
-## Technical Notes (Readback Fix)
-
-**Original Bug:** Buffer mapping returned `WGPUMapAsyncStatus_Unknown` (status=5)
-
-**Root Cause:** Callback mode mismatch
-- Used `WGPUCallbackMode_WaitAnyOnly` (fires only during `wgpuInstanceWaitAny`)
-- Called `wgpuInstanceProcessEvents` in wait loop (wrong API for this mode)
-- Callback never fired → timeout → empty buffer
-
-**Fix Applied:**
-1. Changed callback mode to `WGPUCallbackMode_AllowProcessEvents`
-2. Replaced `wgpuInstanceProcessEvents` with `wgpuDevicePoll(device, true, nullptr)`
-3. Added pre-mapping device poll to ensure copy completes
-
-**Relevant Code:** `src/gpu/texture_readback.cc` lines 97-110
-
-**Reference:** WebGPU spec - Asynchronous Operations, Callback Modes
-
----
-
-## Limitations
-
-- **CNN v1:** Produces incorrect output; use for debugging only
-- **Single image:** Batch processing requires shell loop
-- **No real-time preview:** Offline processing only
-- **Input formats:** Limited to what stb_image decodes (PNG, JPEG, BMP, TGA)
-
----
-
-## Technical Notes
-
-**CNN v2 f16 decoding:**
-- RGBA32Uint texture stores 8×f16 as 4×u32
-- Custom decoder: extract u16, decode f16→f32, clamp [0,1]→u8
-- Handles denormals, infinity, NaN
-
-**Cross-platform:**
-- macOS, Linux (native WebGPU)
-- Windows (mingw-w64 cross-compile)
-
-**Size impact:**
-- Debug/STRIP_ALL=OFF: compiled
-- STRIP_ALL=ON: 0 bytes (compiled out)
-- FINAL_STRIP=ON: tool not built
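
The "CNN v2 f16 decoding" notes in the removed document describe extracting u16 halves from each u32, decoding f16 to f32, and clamping [0,1] to u8. The following is a minimal sketch of that idea, not the tool's actual decoder. The function names are hypothetical, the low-half-first ordering is an assumption (it matches WGSL `pack2x16float`), and mapping NaN to 0 is an assumed policy that the document does not specify.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Decode raw IEEE 754 binary16 bits to float (hypothetical helper name).
inline float f16_to_f32(uint16_t h) {
  const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
  const uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t mant = h & 0x3FFu;
  uint32_t bits = 0;
  if (exp == 0) {
    if (mant == 0) {
      bits = sign;  // signed zero
    } else {
      // Subnormal: shift until the implicit leading bit appears.
      int shift = 0;
      while ((mant & 0x400u) == 0) { mant <<= 1; ++shift; }
      assert(shift >= 1 && shift <= 10);
      // value = 1.xxx * 2^(-14 - shift); float bias is 127.
      bits = sign | (static_cast<uint32_t>(113 - shift) << 23) |
             ((mant & 0x3FFu) << 13);
    }
  } else if (exp == 31) {
    bits = sign | 0x7F800000u | (mant << 13);  // infinity or NaN
  } else {
    bits = sign | ((exp + 112u) << 23) | (mant << 13);  // rebias 15 -> 127
  }
  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Clamp to [0,1] and quantize to 8 bits. NaN -> 0 is an assumed policy.
inline uint8_t f16_to_u8(uint16_t h) {
  const float v = f16_to_f32(h);
  if (std::isnan(v)) return 0;
  const float c = std::clamp(v, 0.0f, 1.0f);  // +inf -> 1, -inf -> 0
  return static_cast<uint8_t>(std::lround(c * 255.0f));
}

// Unpack one RGBA32Uint texel (4 x u32) into 8 floats.
// Assumption: the low 16 bits of each u32 hold the first channel of the pair.
inline std::array<float, 8> unpack_texel(const std::array<uint32_t, 4>& t) {
  std::array<float, 8> out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[2 * i] = f16_to_f32(static_cast<uint16_t>(t[i] & 0xFFFFu));
    out[2 * i + 1] = f16_to_f32(static_cast<uint16_t>(t[i] >> 16));
  }
  return out;
}
```

A readback path built on this sketch would call `unpack_texel` once per pixel and pass the channels it wants through the same clamp-and-quantize step as `f16_to_u8`. Which of the eight channels map to the output image is not stated in the document.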