Diffstat (limited to 'doc')
-rw-r--r--  doc/CNN_TEST_TOOL.md      | 188
-rw-r--r--  doc/CNN_V2.md             |  22
-rw-r--r--  doc/CNN_V2_DEBUG_TOOLS.md | 136
-rw-r--r--  doc/CNN_V2_WEB_TOOL.md    |   4
-rw-r--r--  doc/HOWTO.md              |  74
5 files changed, 290 insertions, 134 deletions
diff --git a/doc/CNN_TEST_TOOL.md b/doc/CNN_TEST_TOOL.md
index e7d679e..4307894 100644
--- a/doc/CNN_TEST_TOOL.md
+++ b/doc/CNN_TEST_TOOL.md
@@ -1,31 +1,37 @@
 # CNN Shader Testing Tool
 
-Standalone tool for validating trained CNN shaders with GPU-to-CPU readback.
+Standalone tool for validating trained CNN shaders with GPU-to-CPU readback. Supports both CNN v1 (render pipeline) and v2 (compute, storage buffer).
 
 ---
 
 ## Purpose
 
-- Validate trained weights (`cnn_weights_generated.wgsl`) against ground truth
+- Validate trained weights against ground truth
 - Debug CNN layer behavior in isolation
-- Generate test outputs for patch-based training workflow
-- Match Python training script's inference mode (`train_cnn.py --infer`)
+- Generate test outputs for training workflow
+- Match Python training script's inference mode
 
 ---
 
 ## Architecture
 
-**Two-part implementation:**
+**Two implementations:**
 
-1. **Core GPU utility:** `src/gpu/texture_readback.{h,cc}` (~150 lines)
-   - Synchronous texture-to-CPU readback
-   - Reusable for screenshots, validation, video export
-   - Protected with STRIP_ALL (0 bytes in release builds)
+1. **CNN v1** (render pipeline, texture atlas weights)
+   - 3 fixed layers
+   - RGBA16Float intermediates
+   - BGRA8Unorm final output
 
-2. **Standalone tool:** `tools/cnn_test.cc` (~450 lines)
-   - Custom CNN inference pipeline
-   - No MainSequence dependency
-   - Asset-based shader loading with automatic include resolution
+2. **CNN v2** (compute shaders, storage buffer weights)
+   - Dynamic layer count from binary
+   - 7D static features (RGBD + UV + sin + bias)
+   - RGBA32Uint packed f16 intermediates
+   - Storage buffer: ~3-5 KB weights
+
+**Core GPU utility:** `src/gpu/texture_readback.{h,cc}`
+- Synchronous texture-to-CPU readback
+- Supports RGBA16Float, RGBA32Uint, BGRA8Unorm
+- Protected with STRIP_ALL (0 bytes in release)
 
 ---
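The static feature plane above (p0-p3, uv, sin10_x, bias) can be reproduced offline when debugging feature generation. A minimal numpy sketch, under two assumptions called out in the comments: p0-p3 are the RGBD channels resampled at the configured mip level, and sin10_x means sin(10 * uv.x); the authoritative definitions live in the training script.

```python
import numpy as np

def static_features(rgbd, mip_level=0):
    """Sketch of the static feature plane: p0-p3 (RGBD), uv.x, uv.y, sin10_x, bias.

    Assumptions: rgbd is HxWx4 float32 in [0,1]; mip resampling is nearest
    up/down; sin10_x = sin(10 * uv.x). See train_cnn_v2.py for the real code.
    """
    h, w, _ = rgbd.shape
    step = 1 << mip_level
    # p0-p3: RGBD at the chosen mip level, resampled back to full resolution
    p = rgbd[::step, ::step].repeat(step, axis=0).repeat(step, axis=1)[:h, :w]
    # uv in [0,1] at pixel centers
    ux = np.broadcast_to((np.arange(w) + 0.5) / w, (h, w))
    uy = np.broadcast_to(((np.arange(h) + 0.5) / h)[:, None], (h, w))
    return np.dstack([p, ux, uy, np.sin(10.0 * ux), np.ones((h, w))]).astype(np.float32)
```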
@@ -35,26 +41,36 @@ Standalone tool for validating trained CNN shaders with GPU-to-CPU readback.
 cnn_test input.png output.png [OPTIONS]
 
 OPTIONS:
-  --blend F                 Final blend amount (0.0-1.0, default: 1.0)
-  --format ppm|png          Output format (default: png)
-  --help                    Show usage
+  --cnn-version N           CNN version: 1 (default) or 2 (ignored with --weights)
+  --weights PATH            Load weights from .bin (forces CNN v2, overrides layer config)
+  --blend F                 Final blend amount (0.0-1.0, default: 1.0)
+  --format ppm|png          Output format (default: png)
+  --layers N                Number of CNN layers (1-10, v1 only, default: 3, ignored with --weights)
+  --save-intermediates DIR  Save intermediate layers to directory
+  --debug-hex               Print first 8 pixels as hex (debug)
+  --help                    Show usage
 ```
 
 **Examples:**
 ```bash
-# Full CNN processing
-./build/cnn_test input.png output.png
+# CNN v1 (render pipeline, 3 layers)
+./build/cnn_test input.png output.png --cnn-version 1
+
+# CNN v2 (compute, storage buffer, uses asset system weights)
+./build/cnn_test input.png output.png --cnn-version 2
 
-# 50% blend with original
-./build/cnn_test input.png output.png --blend 0.5
+# CNN v2 with runtime weight loading (loads layer config from .bin)
+./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin
 
-# No CNN effect (original passthrough)
-./build/cnn_test input.png output.png --blend 0.0
+# 50% blend with original (v2)
+./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5
 
-# PPM output format
-./build/cnn_test input.png output.ppm --format ppm
+# Debug hex dump
+./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
 ```
 
+**Important:** When using `--weights`, the layer count and kernel sizes are read from the binary file header, overriding any `--layers` or `--cnn-version` arguments.
+
 ---
 
 ## Implementation Details
 
@@ -90,25 +106,31 @@ std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
 }
 ```
 
-### CNN Processing Pipeline
+### CNN v1 Pipeline (Render)
 
-**Fixed 3-layer architecture** (matches trained CNN):
-1. Layer 0: Initial convolution
-2. Layer 1: Intermediate convolution
-3. Layer 2: Final convolution + blend with original
+**Fixed 3-layer architecture:**
+- Ping-pong RGBA16Float textures
+- CNNLayerParams (binding 3): layer_index, blend_amount
+- Shader composer resolves #include directives
 
-**Ping-pong textures:**
-- 2 intermediate render targets
-- 1 original input reference (binding 4)
+### CNN v2 Pipeline (Compute)
 
-**Uniforms:**
-- `CommonPostProcessUniforms` (binding 2): resolution, aspect_ratio, time, beat, audio_intensity
-- `CNNLayerParams` (binding 3): layer_index, blend_amount
+**Dynamic layer architecture:**
+1. **Static features compute:** Generate 7D features (RGBD + UV + sin + bias)
+2. **Layer computes:** N layers from binary weights (3-5 typically)
+   - Storage buffer weights (read-only)
+   - RGBA32Uint packed f16 textures (ping-pong)
+   - CNNv2LayerParams: kernel_size, channels, weight_offset, blend
+3. **Readback:** RGBA32Uint → f16 decode → u8 clamp
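The readback step in item 3 above can be mirrored on the CPU for spot checks. A minimal numpy sketch, assuming the low 16 bits of each u32 hold the first of its two f16 channels (the `unpack2x16float` convention) and a little-endian host:

```python
import numpy as np

def rgba32uint_to_u8(words, height, width):
    """Mirror 'RGBA32Uint -> f16 decode -> u8 clamp' on the CPU.

    words: flat uint32 array, 4 per pixel (8 packed f16 channels).
    Assumes a little-endian host so the low u16 half comes first.
    """
    u32 = np.asarray(words, dtype=np.uint32).reshape(height, width, 4)
    f = u32.view(np.uint16).view(np.float16).astype(np.float32)  # (h, w, 8)
    f = np.nan_to_num(f, nan=0.0, posinf=1.0, neginf=0.0)        # NaN/inf handling
    rgba = np.clip(f[..., :4], 0.0, 1.0)                         # first 4 channels = RGBA
    return (rgba * 255.0 + 0.5).astype(np.uint8)
```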
 
-**Shader composition:**
-- Uses `ShaderComposer::Get()` via `RenderPipelineBuilder`
-- Automatically resolves `#include` directives
-- Registers CNN snippets: activation, conv3×3, conv5×5, weights
+**Binary format:** Header (20B) + layer info (20B×N) + f16 weights
+
+**Weight Loading:**
+- **Without `--weights`:** Loads from asset system (`ASSET_WEIGHTS_CNN_V2`)
+- **With `--weights PATH`:** Loads from external `.bin` file (e.g., checkpoint exports)
+  - Layer count and kernel sizes parsed from binary header
+  - Overrides any `--layers` or `--cnn-version` arguments
+  - Enables runtime testing of training checkpoints without rebuild
 
 ---
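The documented layout (20-byte header, 20 bytes of info per layer, then f16 weights) can be walked with a few lines of Python. A hedged sketch: the block sizes come from this document, but the meaning of the individual words is not specified here, so the layer count is passed in rather than parsed; `training/export_cnn_v2_weights.py` is the authority.

```python
import struct
import numpy as np

def walk_cnn_v2_bin(path, num_layers):
    """Dump the raw words of a v2 weight file: 20B header + 20B per layer + f16 data.

    num_layers is supplied by the caller because the header field positions
    are not documented here; adjust against export_cnn_v2_weights.py.
    """
    data = open(path, "rb").read()
    print("header words:", struct.unpack("<5I", data[:20]))
    for i in range(num_layers):
        off = 20 + 20 * i
        print(f"layer {i} info words:", struct.unpack("<5I", data[off:off + 20]))
    w = np.frombuffer(data[20 + 20 * num_layers:], dtype="<f2")  # little-endian f16
    print(f"{w.size} f16 weights, min {w.min():.4f}, max {w.max():.4f}")
```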
@@ -144,51 +166,35 @@ cmake --build build -j4
 
 ---
 
-## Validation Workflow
+## Validation Workflow (CNN v2)
 
-### 1. Ground Truth Generation
+### 1. Train and Export
 ```bash
-# Generate ground truth from Python
-./training/train_cnn.py --infer test.png \
-  --export-only training/checkpoints/checkpoint_epoch_5000.pth \
-  --output ground_truth.png
+# Train and export weights
+./scripts/train_cnn_v2_full.sh --epochs 200 --batch-size 16
 ```
 
 ### 2. Tool Inference
 ```bash
-# Run tool (always 3 layers, matching trained CNN)
-./build/cnn_test test.png tool_output.png --blend 1.0
+# Run tool with v2
+./build/cnn_test training/input/img_000.png output.png --cnn-version 2
 ```
 
-### 3. Comparison
-```bash
-# Compare (MSE should be low)
-python -c "
-import numpy as np
-from PIL import Image
-gt = np.array(Image.open('ground_truth.png'))
-out = np.array(Image.open('tool_output.png'))
-mse = np.mean((gt.astype(float) - out.astype(float)) ** 2)
-print(f'MSE: {mse:.4f}')
-assert mse < 10.0, f'MSE too high: {mse}'
-"
-```
+### 3. Visual Comparison
+Compare `output.png` with `training/target_X/img_000.png`.
 
 ---
 
-## Known Issues
+## Status
 
-**BUG: CNN produces incorrect output (all white)**
-- Readback works correctly (see Technical Notes below)
-- Shader compiles and executes without errors
-- Output is all white (255) regardless of input or blend setting
-- **Likely causes:**
-  - Uniform buffer layout mismatch between C++ and WGSL
-  - Texture binding issue (input not sampled correctly)
-  - Weight matrix initialization problem
-- CNNEffect works correctly in demo (visual validation confirms)
-- **Status:** Under investigation - rendering pipeline differs from demo's CNNEffect
-- **Workaround:** Use CNNEffect visual validation in demo until tool fixed
+**CNN v1:** Builds and runs, but produces incorrect output (all white). Use CNNEffect in demo for visual validation.
+
+**CNN v2:** ⚠️ Partially functional. Readback works, but output differs from the HTML validation tool.
+- Loads binary weights from `workspaces/main/weights/cnn_v2_weights.bin`
+- Matches CNNv2Effect architecture
+- **Known Issue:** Visual output differs from `tools/cnn_v2_test/index.html` despite matching shader code
+- Root cause under investigation (weight indexing? texture sampling? activation clamping?)
+- Use HTML tool (`tools/cnn_v2_test/index.html`) for accurate validation
 
 ---
 
@@ -214,41 +220,25 @@ assert mse < 10.0, f'MSE too high: {mse}'
 
 ## Limitations
 
-- **Fixed layer count:** Cannot run partial networks (3 layers hardcoded)
+- **CNN v1:** Produces incorrect output, use for debugging only
 - **Single image:** Batch processing requires shell loop
 - **No real-time preview:** Offline processing only
-- **PNG input only:** Uses stb_image (JPEG/PNG/BMP/TGA supported)
-
----
-
-## Future Enhancements
-
-- Batch processing (directory input)
-- Interactive preview mode
-- Per-layer weight inspection
-- Checksum validation against training checkpoints
-- CUDA/Metal direct backends (bypass WebGPU overhead)
+- **PNG input:** stb_image (JPEG/PNG/BMP/TGA also supported)
 
 ---
 
 ## Technical Notes
 
-**Number of layers is fixed by trained CNN architecture:**
-- Defined in `cnn_weights_generated.wgsl`
-- Cannot meaningfully run partial networks (layer outputs have different formats/ranges)
-- Tool always processes full 3-layer stack
-
-**Blend parameter:**
-- Applied only to final layer (layer 2)
-- Intermediate layers always use blend=1.0
-- `mix(input, cnn_output, blend_amount)` in shader
+**CNN v2 f16 decoding:**
+- RGBA32Uint texture stores 8×f16 as 4×u32
+- Custom decoder: extract u16, decode f16→f32, clamp [0,1]→u8
+- Handles denormals, infinity, NaN
 
 **Cross-platform:**
-- Tested on macOS (native WebGPU)
-- Builds on Windows via mingw-w64 cross-compile
-- Linux support via native WebGPU
+- macOS, Linux (native WebGPU)
+- Windows (mingw-w64 cross-compile)
 
 **Size impact:**
-- Debug/STRIP_ALL=OFF: ~150 lines compiled
-- STRIP_ALL=ON: 0 bytes (entirely compiled out)
-- FINAL_STRIP=ON: 0 bytes (tool not built)
+- Debug/STRIP_ALL=OFF: compiled in
+- STRIP_ALL=ON: 0 bytes (compiled out)
+- FINAL_STRIP=ON: tool not built
diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
index 78854ce..577cf9e 100644
--- a/doc/CNN_V2.md
+++ b/doc/CNN_V2.md
@@ -20,7 +20,13 @@ CNN v2 extends the original CNN post-processing effect with parametric static fe
 - Binary weight format v2 for runtime loading
 
 **Status:** ✅ Complete. Training pipeline functional, validation tools ready, mip-level support integrated.
-**TODO:** 8-bit quantization with QAT for 2× size reduction (~1.6 KB)
+
+**Known Issues:**
+- ⚠️ **cnn_test output differs from HTML validation tool:** Visual discrepancy remains after fixing the uv_y inversion and Layer 0 activation. Root cause under investigation. Both tools should produce identical output given the same weights and input.
+
+**TODO:**
+- 8-bit quantization with QAT for 2× size reduction (~1.6 KB)
+- Debug cnn_test vs HTML tool output difference
 
 ---
 
@@ -326,12 +332,13 @@ class CNNv2(nn.Module):
 kernel_sizes = [3, 3, 3]  # Per-layer kernel sizes (e.g., [1,3,5])
 num_layers = 3            # Number of CNN layers
 mip_level = 0             # Mip level for p0-p3: 0=orig, 1=half, 2=quarter, 3=eighth
+grayscale_loss = False    # Compute loss on grayscale (Y) instead of RGBA
 learning_rate = 1e-3
 batch_size = 16
 epochs = 5000
 
 # Dataset: Input RGB, Target RGBA (preserves alpha channel from image)
-# Model outputs RGBA, loss compares all 4 channels
+# Model outputs RGBA, loss compares all 4 channels (or grayscale if --grayscale-loss)
 
 # Training loop (standard PyTorch f32)
 for epoch in range(epochs):
@@ -344,7 +351,15 @@ for epoch in range(epochs):
 
         # Forward pass
         output = model(input_rgbd, static_feat)
-        loss = criterion(output, target_batch)
+
+        # Loss computation (grayscale or RGBA)
+        if grayscale_loss:
+            # Convert RGBA to grayscale: Y = 0.299*R + 0.587*G + 0.114*B
+            output_gray = 0.299 * output[:, 0:1] + 0.587 * output[:, 1:2] + 0.114 * output[:, 2:3]
+            target_gray = 0.299 * target_batch[:, 0:1] + 0.587 * target_batch[:, 1:2] + 0.114 * target_batch[:, 2:3]
+            loss = criterion(output_gray, target_gray)
+        else:
+            loss = criterion(output, target_batch)
 
         # Backward pass
         optimizer.zero_grad()
@@ -361,6 +376,7 @@ torch.save({
         'kernel_sizes': [3, 3, 3],  # Per-layer kernel sizes
         'num_layers': 3,
         'mip_level': 0,  # Mip level used for p0-p3
+        'grayscale_loss': False,  # Whether grayscale loss was used
         'features': ['p0', 'p1', 'p2', 'p3', 'uv.x', 'uv.y', 'sin10_x', 'bias']
     },
     'epoch': epoch,
diff --git a/doc/CNN_V2_DEBUG_TOOLS.md b/doc/CNN_V2_DEBUG_TOOLS.md
new file mode 100644
index 0000000..b6dc65f
--- /dev/null
+++ b/doc/CNN_V2_DEBUG_TOOLS.md
@@ -0,0 +1,136 @@
+# CNN v2 Debugging Tools
+
+Tools for investigating the CNN v2 output mismatch between the HTML tool and cnn_test.
+
+---
+
+## Identity Weight Generator
+
+**Purpose:** Generate trivial .bin files with identity passthrough for debugging.
+
+**Script:** `training/gen_identity_weights.py`
+
+**Usage:**
+```bash
+# 1×1 identity (default)
+./training/gen_identity_weights.py workspaces/main/weights/cnn_v2_identity.bin
+
+# 3×3 identity
+./training/gen_identity_weights.py workspaces/main/weights/cnn_v2_identity_3x3.bin --kernel-size 3
+
+# Custom mip level
+./training/gen_identity_weights.py output.bin --kernel-size 1 --mip-level 2
+```
+
+**Output:**
+- Single layer, 12D→4D (4 input channels + 8 static features)
+- Identity matrix: Output Ch{0,1,2,3} = Input Ch{0,1,2,3}
+- Static features (Ch 4-11) are zeroed
+- Minimal file size (~136 bytes for 1×1, ~904 bytes for 3×3)
+
+**Validation:**
+Load in HTML tool or cnn_test; output should match input (RGB only, ignoring static features).
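The core of the generator is a 12→4 weight matrix with ones on the first four diagonal entries at the kernel center. A sketch of just that matrix construction; the (out, in, ky, kx) layout is an assumption for illustration, and serialization is left to the real script:

```python
import numpy as np

def identity_kernel(kernel_size=1, in_ch=12, out_ch=4):
    """Identity passthrough: output channel c copies input channel c (RGBD).

    Static feature channels 4-11 get zero weight, matching the doc above.
    Layout (out_ch, in_ch, ky, kx) is assumed, not documented here.
    """
    w = np.zeros((out_ch, in_ch, kernel_size, kernel_size), dtype=np.float16)
    c = kernel_size // 2  # only the center tap is nonzero
    for ch in range(out_ch):
        w[ch, ch, c, c] = 1.0
    return w
```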
+
+---
+
+## Composited Layer Visualization
+
+**Purpose:** Save the current layer view as a single composited image (4 channels side-by-side, grayscale).
+
+**Location:** HTML tool, "Layer Visualization" panel
+
+**Usage:**
+1. Load image + weights in the HTML tool
+2. Select the layer to visualize (Static 0-3, Static 4-7, Layer 0, Layer 1, etc.)
+3. Click the "Save Composited" button
+4. Downloads a PNG: `composited_layer{N}_{W}x{H}.png`
+
+**Output:**
+- 4 channels stacked horizontally
+- Grayscale representation
+- Useful for comparing layer activations across tools
+
+---
+
+## Debugging Strategy
+
+### Track a) Binary Conversion Chain
+
+**Hypothesis:** Conversion error in .bin ↔ base64 ↔ Float32Array
+
+**Test:**
+1. Generate identity weights:
+   ```bash
+   ./training/gen_identity_weights.py workspaces/main/weights/test_identity.bin
+   ```
+2. Load in the HTML tool; output should match input RGB
+3. If mismatch:
+   - Check Python export: f16 packing in `export_cnn_v2_weights.py` line 105
+   - Check HTML parsing: `unpackF16()` in `index.html` lines 805-815
+   - Check weight indexing: `get_weight()` shader function
+
+**Key locations:**
+- Python: `np.float16` → `view(np.uint32)` (line 105 of export script)
+- JS: `DataView` → `unpackF16()` → manual f16 decode (lines 773-803)
+- WGSL: `unpack2x16float()` built-in (line 492 of shader)
+
+### Track b) Layer Visualization
+
+**Purpose:** Confirm layer outputs match between HTML and C++
+
+**Method:**
+1. Run identical input through both tools
+2. Save composited layers from the HTML tool
+3. Compare with cnn_test output
+4. Use identity weights to isolate weight loading from computation
+
+### Track c) Trivial Test Case
+
+**Use identity weights to test:**
+- Weight loading (binary parsing)
+- Feature generation (static features)
+- Convolution (should be passthrough)
+- Output packing
+
+**Expected behavior:**
+- Input RGB → Output RGB (exact match)
+- Static features ignored (all zeros in identity matrix)
+
+---
+
+## Known Issues
+
+### ~~Layer 0 Visualization Scale~~ [FIXED]
+
+**Issue:** Layer 0 output displayed at 0.5× brightness (divided by 2).
+
+**Cause:** Line 1530 used `vizScale = 0.5` for all CNN layers, but Layer 0 is clamped to [0,1] and doesn't need dimming.
+
+**Fix:** Use scale 1.0 for Layer 0 output (layerIdx=1), 0.5 only for middle layers (ReLU, unbounded).
+
+### Remaining Mismatch
+
+**Current:** HTML tool and cnn_test produce different outputs for the same input/weights.
+
+**Suspects:**
+1. F16 unpacking difference (CPU vs GPU vs JS)
+2. Static feature generation (RGBD, UV, sin encoding)
+3. Convolution kernel iteration order
+4. Output packing/unpacking
+
+**Next steps:**
+1. Test with identity weights (eliminates weight loading)
+2. Compare composited layer outputs
+3. Add debug visualization for static features
+4. Hex dump comparison (first 8 pixels); use the `--debug-hex` flag in cnn_test
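Step 4's comparison is quickest with a short script over both tools' outputs; a sketch assuming 8-bit RGBA PNGs, with illustrative filenames:

```python
import numpy as np
from PIL import Image

def first_pixels_hex(path, n=8):
    """First n pixels as RGBA hex, comparable to cnn_test --debug-hex output."""
    px = np.array(Image.open(path).convert("RGBA")).reshape(-1, 4)[:n]
    return " ".join("".join(f"{v:02x}" for v in p) for p in px)

print("html tool:", first_pixels_hex("html_output.png"))  # filename illustrative
print("cnn_test :", first_pixels_hex("output.png"))
```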
+
+---
+
+## Related Documentation
+
+- `doc/CNN_V2.md` - CNN v2 architecture
+- `doc/CNN_V2_WEB_TOOL.md` - HTML tool documentation
+- `doc/CNN_TEST_TOOL.md` - cnn_test CLI tool
+- `training/export_cnn_v2_weights.py` - Binary export format
diff --git a/doc/CNN_V2_WEB_TOOL.md b/doc/CNN_V2_WEB_TOOL.md
index 25f4ec7..b6f5b0b 100644
--- a/doc/CNN_V2_WEB_TOOL.md
+++ b/doc/CNN_V2_WEB_TOOL.md
@@ -54,6 +54,10 @@ Browser-based WebGPU tool for validating CNN v2 inference with layer visualizati
 - Single-file HTML tool (~1100 lines)
 - Embedded shaders: STATIC_SHADER, CNN_SHADER, DISPLAY_SHADER, LAYER_VIZ_SHADER
 - Shared WGSL component: FULLSCREEN_QUAD_VS (reused across render pipelines)
+- **Embedded default weights:** DEFAULT_WEIGHTS_B64 (base64-encoded binary v2)
+  - Current: 4 layers (3×3, 5×5, 3×3, 3×3), 2496 f16 weights, mip_level=2
+  - Source: `workspaces/main/weights/cnn_v2_weights.bin`
+  - Updates: re-encode the binary with `base64 -i <file>` and update the constant
 - Pure WebGPU (no external dependencies)
 
 ### Code Organization
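The re-encode step above (`base64 -i <file>`) has a direct Python equivalent that also reports the size of the resulting constant:

```python
import base64

# Regenerate the DEFAULT_WEIGHTS_B64 constant from the current binary.
with open("workspaces/main/weights/cnn_v2_weights.bin", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")

print(f"{len(b64)} chars")
print(b64)  # paste into DEFAULT_WEIGHTS_B64 in tools/cnn_v2_test/index.html
```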
diff --git a/doc/HOWTO.md b/doc/HOWTO.md
index f89d375..3746d65 100644
--- a/doc/HOWTO.md
+++ b/doc/HOWTO.md
@@ -136,18 +136,33 @@ Enhanced CNN with parametric static features (7D input: RGBD + UV + sin encoding
 **Complete Pipeline** (recommended):
 ```bash
-# Train → Export → Build → Validate
+# Train → Export → Build → Validate (default config)
 ./scripts/train_cnn_v2_full.sh
 
-# With custom mip level for p0-p3 features
-./scripts/train_cnn_v2_full.sh --mip-level 1
+# Custom training parameters
+./scripts/train_cnn_v2_full.sh --epochs 500 --batch-size 32 --checkpoint-every 100
+
+# Custom architecture
+./scripts/train_cnn_v2_full.sh --kernel-sizes 3,5,3 --num-layers 3 --mip-level 1
+
+# Grayscale loss (compute loss on luminance instead of RGBA)
+./scripts/train_cnn_v2_full.sh --grayscale-loss
+
+# Custom directories
+./scripts/train_cnn_v2_full.sh --input training/input --target training/target_2
+
+# Full-image mode (instead of patch-based)
+./scripts/train_cnn_v2_full.sh --full-image --image-size 256
+
+# See all options
+./scripts/train_cnn_v2_full.sh --help
 ```
 
-Config: 100 epochs, 3×3 kernels, 8→4→4 channels, patch-based (harris detector).
+**Defaults:** 200 epochs, 3×3 kernels, 8→4→4 channels, batch-size 16, patch-based (8×8, harris detector).
 - Live progress with single-line update
 - Validates all input images on final epoch
 - Exports binary weights (storage buffer architecture)
-- Mip level: 0 (default, original resolution)
+- All parameters configurable via command line
 
 **Validation Only** (skip training):
 ```bash
@@ -176,6 +191,12 @@ Config: 100 epochs, 3×3 kernels, 8→4→4 channels, patch-based (harris detect
   --input training/input/ --target training/target_2/ \
   --mip-level 1 \
   --epochs 100 --batch-size 16
+
+# Grayscale loss (compute loss on luminance Y = 0.299*R + 0.587*G + 0.114*B)
+./training/train_cnn_v2.py \
+  --input training/input/ --target training/target_2/ \
+  --grayscale-loss \
+  --epochs 100 --batch-size 16
 ```
 
 **Export Binary Weights:**
@@ -243,40 +264,29 @@ See `doc/ASSET_SYSTEM.md` and `doc/WORKSPACE_SYSTEM.md`.
 ### Offline Shader Validation
 
-**Note:** Tool builds and runs but produces incorrect output. Use CNNEffect visual validation in demo. See `doc/CNN_TEST_TOOL.md`.
-
 ```bash
-# Test trained CNN on PNG input
-./build/cnn_test input.png output.png
-
-# Adjust blend amount (0.0 = original, 1.0 = full CNN)
-./build/cnn_test input.png output.png --blend 0.5
+# CNN v2 (recommended)
+./build/cnn_test input.png output.png --cnn-version 2
 
-# PPM output format
-./build/cnn_test input.png output.ppm --format ppm
-```
+# CNN v2 with runtime weight loading (loads layer config from .bin)
+./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin
 
-### Ground Truth Comparison
-```bash
-# Generate Python ground truth
-./training/train_cnn.py --infer input.png \
-  --export-only checkpoints/checkpoint_epoch_1000.pth \
-  --output ground_truth.png
+# CNN v1 (produces incorrect output, debug only)
+./build/cnn_test input.png output.png --cnn-version 1
 
-# Run tool
-./build/cnn_test input.png tool_output.png
+# Adjust blend (0.0 = original, 1.0 = full CNN)
+./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5
 
-# Compare (Python required)
-python3 -c "
-import numpy as np
-from PIL import Image
-gt = np.array(Image.open('ground_truth.png').convert('RGB'))
-out = np.array(Image.open('tool_output.png').convert('RGB'))
-mse = np.mean((gt.astype(float) - out.astype(float)) ** 2)
-print(f'MSE: {mse:.4f} (target: < 10.0)')
-"
+# Debug hex dump (first 8 pixels)
+./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
 ```
 
+**Status:**
+- **CNN v2:** ✅ Functional and matches CNNv2Effect; see `doc/CNN_TEST_TOOL.md` for a known mismatch against the HTML validation tool
+- **CNN v1:** ⚠️ Produces incorrect output; use CNNEffect in demo for validation
+
+**Note:** `--weights` loads layer count and kernel sizes from the binary file, overriding `--layers` and forcing CNN v2.
+
 See `doc/CNN_TEST_TOOL.md` for full documentation.
 
 ---
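cnn_test processes one image per invocation, and batch runs need an outer loop (noted under Limitations in `doc/CNN_TEST_TOOL.md`). A small Python driver over the documented CLI can stand in for that shell loop; a sketch with illustrative directory names:

```python
import pathlib
import subprocess

src = pathlib.Path("training/input")      # directories are illustrative
dst = pathlib.Path("training/output_v2")
dst.mkdir(parents=True, exist_ok=True)

for png in sorted(src.glob("*.png")):
    out = dst / png.name
    # Flags as documented in doc/CNN_TEST_TOOL.md
    subprocess.run(["./build/cnn_test", str(png), str(out), "--cnn-version", "2"],
                   check=True)
    print("wrote", out)
```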
