Diffstat (limited to 'doc')
-rw-r--r--  doc/CNN_TEST_TOOL.md       | 188
-rw-r--r--  doc/CNN_V2.md              |  22
-rw-r--r--  doc/CNN_V2_DEBUG_TOOLS.md  | 136
-rw-r--r--  doc/CNN_V2_WEB_TOOL.md     |   4
-rw-r--r--  doc/HOWTO.md               |  74
5 files changed, 290 insertions, 134 deletions
diff --git a/doc/CNN_TEST_TOOL.md b/doc/CNN_TEST_TOOL.md
index e7d679e..4307894 100644
--- a/doc/CNN_TEST_TOOL.md
+++ b/doc/CNN_TEST_TOOL.md
@@ -1,31 +1,37 @@
# CNN Shader Testing Tool
-Standalone tool for validating trained CNN shaders with GPU-to-CPU readback.
+Standalone tool for validating trained CNN shaders with GPU-to-CPU readback. Supports both CNN v1 (render pipeline) and v2 (compute, storage buffer).
---
## Purpose
-- Validate trained weights (`cnn_weights_generated.wgsl`) against ground truth
+- Validate trained weights against ground truth
- Debug CNN layer behavior in isolation
-- Generate test outputs for patch-based training workflow
-- Match Python training script's inference mode (`train_cnn.py --infer`)
+- Generate test outputs for training workflow
+- Match Python training script's inference mode
---
## Architecture
-**Two-part implementation:**
+**Two implementations:**
-1. **Core GPU utility:** `src/gpu/texture_readback.{h,cc}` (~150 lines)
- - Synchronous texture-to-CPU readback
- - Reusable for screenshots, validation, video export
- - Protected with STRIP_ALL (0 bytes in release builds)
+1. **CNN v1** (render pipeline, texture atlas weights)
+ - 3 fixed layers
+ - RGBA16Float intermediates
+ - BGRA8Unorm final output
-2. **Standalone tool:** `tools/cnn_test.cc` (~450 lines)
- - Custom CNN inference pipeline
- - No MainSequence dependency
- - Asset-based shader loading with automatic include resolution
+2. **CNN v2** (compute shaders, storage buffer weights)
+ - Dynamic layer count from binary
+   - 8D static features (RGBD + UV + sin + bias)
+ - RGBA32Uint packed f16 intermediates
+ - Storage buffer: ~3-5 KB weights
+
+**Core GPU utility:** `src/gpu/texture_readback.{h,cc}`
+- Synchronous texture-to-CPU readback
+- Supports RGBA16Float, RGBA32Uint, BGRA8Unorm
+- Protected with STRIP_ALL (0 bytes in release)
---
@@ -35,26 +41,36 @@ Standalone tool for validating trained CNN shaders with GPU-to-CPU readback.
cnn_test input.png output.png [OPTIONS]
OPTIONS:
- --blend F Final blend amount (0.0-1.0, default: 1.0)
- --format ppm|png Output format (default: png)
- --help Show usage
+ --cnn-version N CNN version: 1 (default) or 2 (ignored with --weights)
+ --weights PATH Load weights from .bin (forces CNN v2, overrides layer config)
+ --blend F Final blend amount (0.0-1.0, default: 1.0)
+ --format ppm|png Output format (default: png)
+ --layers N Number of CNN layers (1-10, v1 only, default: 3, ignored with --weights)
+ --save-intermediates DIR Save intermediate layers to directory
+ --debug-hex Print first 8 pixels as hex (debug)
+ --help Show usage
```
**Examples:**
```bash
-# Full CNN processing
-./build/cnn_test input.png output.png
+# CNN v1 (render pipeline, 3 layers)
+./build/cnn_test input.png output.png --cnn-version 1
+
+# CNN v2 (compute, storage buffer, uses asset system weights)
+./build/cnn_test input.png output.png --cnn-version 2
-# 50% blend with original
-./build/cnn_test input.png output.png --blend 0.5
+# CNN v2 with runtime weight loading (loads layer config from .bin)
+./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin
-# No CNN effect (original passthrough)
-./build/cnn_test input.png output.png --blend 0.0
+# 50% blend with original (v2)
+./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5
-# PPM output format
-./build/cnn_test input.png output.ppm --format ppm
+# Debug hex dump
+./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
```
+**Important:** When using `--weights`, the layer count and kernel sizes are read from the binary file header, overriding any `--layers` or `--cnn-version` arguments.
+
---
## Implementation Details
@@ -90,25 +106,31 @@ std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
}
```
-### CNN Processing Pipeline
+### CNN v1 Pipeline (Render)
-**Fixed 3-layer architecture** (matches trained CNN):
-1. Layer 0: Initial convolution
-2. Layer 1: Intermediate convolution
-3. Layer 2: Final convolution + blend with original
+**Fixed 3-layer architecture:**
+- Ping-pong RGBA16Float textures
+- CNNLayerParams (binding 3): layer_index, blend_amount
+- Shader composer resolves #include directives
-**Ping-pong textures:**
-- 2 intermediate render targets
-- 1 original input reference (binding 4)
+### CNN v2 Pipeline (Compute)
-**Uniforms:**
-- `CommonPostProcessUniforms` (binding 2): resolution, aspect_ratio, time, beat, audio_intensity
-- `CNNLayerParams` (binding 3): layer_index, blend_amount
+**Dynamic layer architecture:**
+1. **Static features compute:** Generate 8D features (RGBD + UV + sin + bias)
+2. **Layer computes:** N layers from binary weights (3-5 typically)
+ - Storage buffer weights (read-only)
+   - RGBA32Uint packed f16 textures (ping-pong; sketch below)
+ - CNNv2LayerParams: kernel_size, channels, weight_offset, blend
+3. **Readback:** RGBA32Uint → f16 decode → u8 clamp
-**Shader composition:**
-- Uses `ShaderComposer::Get()` via `RenderPipelineBuilder`
-- Automatically resolves `#include` directives
-- Registers CNN snippets: activation, conv3×3, conv5×5, weights
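+
+The layer dispatches reduce to a ping-pong loop over the two packed-f16 textures; a control-flow sketch (helper names hypothetical):
+
+```python
+def run_cnn_v2(layers, static_tex, tex_a, tex_b):
+    """Ping-pong N layer dispatches between two RGBA32Uint (packed f16) textures."""
+    src, dst = tex_a, tex_b
+    for layer in layers:                  # layer params come from the binary header
+        dispatch_layer(layer, static_tex, src, dst)
+        src, dst = dst, src               # swap ping-pong targets
+    return src                            # last texture written
+```
+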
+**Binary format:** Header (20B) + layer info (20B×N) + f16 weights (parser sketch below)
+
+**Weight Loading:**
+- **Without `--weights`:** Loads from asset system (`ASSET_WEIGHTS_CNN_V2`)
+- **With `--weights PATH`:** Loads from external `.bin` file (e.g., checkpoint exports)
+ - Layer count and kernel sizes parsed from binary header
+ - Overrides any `--layers` or `--cnn-version` arguments
+ - Enables runtime testing of training checkpoints without rebuild
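+
+A minimal Python sketch of this parsing step, assuming five u32 header fields and five u32 fields per layer record (field names are hypothetical; `training/export_cnn_v2_weights.py` defines the real layout):
+
+```python
+import struct
+import numpy as np
+
+def parse_cnn_v2_bin(path):
+    """Sketch: 20 B header + 20 B per-layer records + f16 weight blob."""
+    data = open(path, "rb").read()
+    # Hypothetical header fields: magic, version, num_layers, mip_level, weight_count
+    _magic, _version, num_layers, mip_level, weight_count = struct.unpack_from("<5I", data, 0)
+    # Hypothetical record fields: kernel_size, in_ch, out_ch, weight_offset, flags
+    layers = [struct.unpack_from("<5I", data, 20 + 20 * i) for i in range(num_layers)]
+    weights = np.frombuffer(data, dtype=np.float16,
+                            offset=20 + 20 * num_layers, count=weight_count)
+    return layers, weights
+```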
---
@@ -144,51 +166,35 @@ cmake --build build -j4
---
-## Validation Workflow
+## Validation Workflow (CNN v2)
-### 1. Ground Truth Generation
+### 1. Train and Export
```bash
-# Generate ground truth from Python
-./training/train_cnn.py --infer test.png \
- --export-only training/checkpoints/checkpoint_epoch_5000.pth \
- --output ground_truth.png
+# Train and export weights
+./scripts/train_cnn_v2_full.sh --epochs 200 --batch-size 16
```
### 2. Tool Inference
```bash
-# Run tool (always 3 layers, matching trained CNN)
-./build/cnn_test test.png tool_output.png --blend 1.0
+# Run tool with v2
+./build/cnn_test training/input/img_000.png output.png --cnn-version 2
```
-### 3. Comparison
-```bash
-# Compare (MSE should be low)
-python -c "
-import numpy as np
-from PIL import Image
-gt = np.array(Image.open('ground_truth.png'))
-out = np.array(Image.open('tool_output.png'))
-mse = np.mean((gt.astype(float) - out.astype(float)) ** 2)
-print(f'MSE: {mse:.4f}')
-assert mse < 10.0, f'MSE too high: {mse}'
-"
-```
+### 3. Visual Comparison
+Compare `output.png` against the corresponding target image, e.g. `training/target_X/img_000.png`.
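+
+For a quick quantitative check alongside the visual diff (paths follow the examples above; `target_2` is one possible target set):
+
+```python
+import numpy as np
+from PIL import Image
+
+out = np.array(Image.open("output.png").convert("RGB"), dtype=float)
+ref = np.array(Image.open("training/target_2/img_000.png").convert("RGB"), dtype=float)
+print(f"MSE: {np.mean((out - ref) ** 2):.4f}")  # lower is better
+```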
---
-## Known Issues
+## Status
-**BUG: CNN produces incorrect output (all white)**
-- Readback works correctly (see Technical Notes below)
-- Shader compiles and executes without errors
-- Output is all white (255) regardless of input or blend setting
-- **Likely causes:**
- - Uniform buffer layout mismatch between C++ and WGSL
- - Texture binding issue (input not sampled correctly)
- - Weight matrix initialization problem
-- CNNEffect works correctly in demo (visual validation confirms)
-- **Status:** Under investigation - rendering pipeline differs from demo's CNNEffect
-- **Workaround:** Use CNNEffect visual validation in demo until tool fixed
+**CNN v1:** Builds and runs but produces incorrect output (all white). Use CNNEffect in the demo for visual validation.
+
+**CNN v2:** ⚠️ Partially functional. Readback works but output differs from HTML validation tool.
+- Loads binary weights from `workspaces/main/weights/cnn_v2_weights.bin`
+- Matches CNNv2Effect architecture
+- **Known Issue:** Visual output differs from `tools/cnn_v2_test/index.html` despite matching shader code
+- Root cause under investigation (weight indexing? texture sampling? activation clamping?)
+- Use HTML tool (`tools/cnn_v2_test/index.html`) for accurate validation
---
@@ -214,41 +220,25 @@ assert mse < 10.0, f'MSE too high: {mse}'
## Limitations
-- **Fixed layer count:** Cannot run partial networks (3 layers hardcoded)
+- **CNN v1:** Produces incorrect output; use for debugging only
- **Single image:** Batch processing requires shell loop
- **No real-time preview:** Offline processing only
-- **PNG input only:** Uses stb_image (JPEG/PNG/BMP/TGA supported)
-
----
-
-## Future Enhancements
-
-- Batch processing (directory input)
-- Interactive preview mode
-- Per-layer weight inspection
-- Checksum validation against training checkpoints
-- CUDA/Metal direct backends (bypass WebGPU overhead)
+- **Input formats:** Limited to stb_image-supported formats (PNG/JPEG/BMP/TGA)
---
## Technical Notes
-**Number of layers is fixed by trained CNN architecture:**
-- Defined in `cnn_weights_generated.wgsl`
-- Cannot meaningfully run partial networks (layer outputs have different formats/ranges)
-- Tool always processes full 3-layer stack
-
-**Blend parameter:**
-- Applied only to final layer (layer 2)
-- Intermediate layers always use blend=1.0
-- `mix(input, cnn_output, blend_amount)` in shader
+**CNN v2 f16 decoding:**
+- RGBA32Uint texture stores 8×f16 as 4×u32
+- Custom decoder: extract u16, decode f16→f32, clamp [0,1]→u8
+- Handles denormals, infinity, NaN
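+
+A NumPy sketch of the same decode path (the IEEE f16 view handles denormals; NaN/inf are mapped before the clamp; function name illustrative):
+
+```python
+import numpy as np
+
+def decode_texel(texel_u32):
+    """Decode one RGBA32Uint texel (4 x u32) into 8 clamped u8 values."""
+    words = np.asarray(texel_u32, dtype=np.uint32)
+    halves = np.empty(8, dtype=np.uint16)
+    halves[0::2] = words & 0xFFFF            # low 16 bits of each u32
+    halves[1::2] = words >> 16               # high 16 bits
+    f = halves.view(np.float16).astype(np.float32)
+    f = np.nan_to_num(f, nan=0.0, posinf=1.0, neginf=0.0)
+    return (np.clip(f, 0.0, 1.0) * 255.0 + 0.5).astype(np.uint8)
+```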
**Cross-platform:**
-- Tested on macOS (native WebGPU)
-- Builds on Windows via mingw-w64 cross-compile
-- Linux support via native WebGPU
+- macOS, Linux (native WebGPU)
+- Windows (mingw-w64 cross-compile)
**Size impact:**
-- Debug/STRIP_ALL=OFF: ~150 lines compiled
-- STRIP_ALL=ON: 0 bytes (entirely compiled out)
-- FINAL_STRIP=ON: 0 bytes (tool not built)
+- Debug/STRIP_ALL=OFF: readback utility compiled in
+- STRIP_ALL=ON: 0 bytes (compiled out)
+- FINAL_STRIP=ON: tool not built
diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
index 78854ce..577cf9e 100644
--- a/doc/CNN_V2.md
+++ b/doc/CNN_V2.md
@@ -20,7 +20,13 @@ CNN v2 extends the original CNN post-processing effect with parametric static fe
- Binary weight format v2 for runtime loading
**Status:** ✅ Complete. Training pipeline functional, validation tools ready, mip-level support integrated.
-**TODO:** 8-bit quantization with QAT for 2× size reduction (~1.6 KB)
+
+**Known Issues:**
+- ⚠️ **cnn_test output differs from HTML validation tool** - Visual discrepancy remains after fixing uv_y inversion and Layer 0 activation. Root cause under investigation. Both tools should produce identical output given same weights/input.
+
+**TODO:**
+- 8-bit quantization with QAT for 2× size reduction (~1.6 KB)
+- Debug cnn_test vs HTML tool output difference
---
@@ -326,12 +332,13 @@ class CNNv2(nn.Module):
kernel_sizes = [3, 3, 3] # Per-layer kernel sizes (e.g., [1,3,5])
num_layers = 3 # Number of CNN layers
mip_level = 0 # Mip level for p0-p3: 0=orig, 1=half, 2=quarter, 3=eighth
+grayscale_loss = False # Compute loss on grayscale (Y) instead of RGBA
learning_rate = 1e-3
batch_size = 16
epochs = 5000
# Dataset: Input RGB, Target RGBA (preserves alpha channel from image)
-# Model outputs RGBA, loss compares all 4 channels
+# Model outputs RGBA, loss compares all 4 channels (or grayscale if --grayscale-loss)
# Training loop (standard PyTorch f32)
for epoch in range(epochs):
@@ -344,7 +351,15 @@ for epoch in range(epochs):
# Forward pass
output = model(input_rgbd, static_feat)
- loss = criterion(output, target_batch)
+
+ # Loss computation (grayscale or RGBA)
+ if grayscale_loss:
+ # Convert RGBA to grayscale: Y = 0.299*R + 0.587*G + 0.114*B
+ output_gray = 0.299 * output[:, 0:1] + 0.587 * output[:, 1:2] + 0.114 * output[:, 2:3]
+       target_gray = 0.299 * target_batch[:, 0:1] + 0.587 * target_batch[:, 1:2] + 0.114 * target_batch[:, 2:3]
+ loss = criterion(output_gray, target_gray)
+ else:
+ loss = criterion(output, target_batch)
# Backward pass
optimizer.zero_grad()
@@ -361,6 +376,7 @@ torch.save({
'kernel_sizes': [3, 3, 3], # Per-layer kernel sizes
'num_layers': 3,
'mip_level': 0, # Mip level used for p0-p3
+ 'grayscale_loss': False, # Whether grayscale loss was used
'features': ['p0', 'p1', 'p2', 'p3', 'uv.x', 'uv.y', 'sin10_x', 'bias']
},
'epoch': epoch,
diff --git a/doc/CNN_V2_DEBUG_TOOLS.md b/doc/CNN_V2_DEBUG_TOOLS.md
new file mode 100644
index 0000000..b6dc65f
--- /dev/null
+++ b/doc/CNN_V2_DEBUG_TOOLS.md
@@ -0,0 +1,136 @@
+# CNN v2 Debugging Tools
+
+Tools for investigating the CNN v2 output mismatch between the HTML tool and cnn_test.
+
+---
+
+## Identity Weight Generator
+
+**Purpose:** Generate trivial .bin files with identity passthrough for debugging.
+
+**Script:** `training/gen_identity_weights.py`
+
+**Usage:**
+```bash
+# 1×1 identity (default)
+./training/gen_identity_weights.py workspaces/main/weights/cnn_v2_identity.bin
+
+# 3×3 identity
+./training/gen_identity_weights.py workspaces/main/weights/cnn_v2_identity_3x3.bin --kernel-size 3
+
+# Custom mip level
+./training/gen_identity_weights.py output.bin --kernel-size 1 --mip-level 2
+```
+
+**Output:**
+- Single layer, 12D→4D (4 input channels + 8 static features)
+- Identity matrix: Output Ch{0,1,2,3} = Input Ch{0,1,2,3}
+- Static features (Ch 4-11) are zeroed
+- Minimal file size (~136 bytes for 1×1, ~904 bytes for 3×3)
+
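+A minimal NumPy sketch of that weight layout (the `(out_ch, in_ch, kh, kw)` ordering is an assumption; the script defines the real ordering):
+
+```python
+import numpy as np
+
+# 1x1 identity layer: 12 inputs (4 image channels + 8 static features) -> 4 outputs.
+w = np.zeros((4, 12, 1, 1), dtype=np.float16)
+for c in range(4):
+    w[c, c, 0, 0] = 1.0   # output channel c copies input channel c
+# Channels 4-11 (static features) keep zero weight, so they are ignored.
+```
+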
+**Validation:**
+Load in the HTML tool or cnn_test; output should match the input (RGB only, static features ignored).
+
+---
+
+## Composited Layer Visualization
+
+**Purpose:** Save current layer view as single composited image (4 channels side-by-side, grayscale).
+
+**Location:** HTML tool - "Layer Visualization" panel
+
+**Usage:**
+1. Load image + weights in HTML tool
+2. Select layer to visualize (Static 0-3, Static 4-7, Layer 0, Layer 1, etc.)
+3. Click "Save Composited" button
+4. Downloads PNG: `composited_layer{N}_{W}x{H}.png`
+
+**Output:**
+- 4 channels stacked horizontally
+- Grayscale representation
+- Useful for comparing layer activations across tools
+
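+The same composite can be reproduced offline from an RGBA layer dump (file names hypothetical):
+
+```python
+import numpy as np
+from PIL import Image
+
+rgba = np.array(Image.open("layer0_dump.png"))                   # H x W x 4
+strip = np.concatenate([rgba[..., c] for c in range(4)], axis=1)
+Image.fromarray(strip.astype(np.uint8), mode="L").save("composited.png")
+```
+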
+---
+
+## Debugging Strategy
+
+### Track a) Binary Conversion Chain
+
+**Hypothesis:** Conversion error in .bin ↔ base64 ↔ Float32Array
+
+**Test:**
+1. Generate identity weights:
+ ```bash
+ ./training/gen_identity_weights.py workspaces/main/weights/test_identity.bin
+ ```
+
+2. Load in HTML tool - output should match input RGB
+
+3. If mismatch:
+ - Check Python export: f16 packing in `export_cnn_v2_weights.py` line 105
+ - Check HTML parsing: `unpackF16()` in `index.html` line 805-815
+ - Check weight indexing: `get_weight()` shader function
+
+**Key locations:**
+- Python: `np.float16` → `view(np.uint32)` (line 105 of export script)
+- JS: `DataView` → `unpackF16()` → manual f16 decode (line 773-803)
+- WGSL: `unpack2x16float()` built-in (line 492 of shader)
+
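+A round-trip check of the packing convention (assuming little-endian, low-half-first packing, matching `unpack2x16float()`):
+
+```python
+import numpy as np
+
+vals = np.array([0.5, -1.25], dtype=np.float16)
+packed = vals.view(np.uint32)[0]                       # two f16 in one u32
+lo = np.array([packed & 0xFFFF], dtype=np.uint16).view(np.float16)[0]
+hi = np.array([packed >> 16], dtype=np.uint16).view(np.float16)[0]
+assert (lo, hi) == (vals[0], vals[1])                  # low half holds vals[0]
+```
+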
+### Track b) Layer Visualization
+
+**Purpose:** Confirm layer outputs match between HTML and C++
+
+**Method:**
+1. Run identical input through both tools
+2. Save composited layers from HTML tool
+3. Compare with cnn_test output
+4. Use identity weights to isolate weight loading from computation
+
+### Track c) Trivial Test Case
+
+**Use identity weights to test:**
+- Weight loading (binary parsing)
+- Feature generation (static features)
+- Convolution (should be passthrough)
+- Output packing
+
+**Expected behavior:**
+- Input RGB → Output RGB (exact match)
+- Static features ignored (all zeros in identity matrix)
+
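+A sketch of the passthrough check (f16 round-tripping may shift a channel by one step):
+
+```python
+import numpy as np
+from PIL import Image
+
+inp = np.array(Image.open("input.png").convert("RGB")).astype(int)
+out = np.array(Image.open("output.png").convert("RGB")).astype(int)
+print("max channel error:", np.abs(inp - out).max())  # expect 0, or 1 from rounding
+```
+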
+---
+
+## Known Issues
+
+### ~~Layer 0 Visualization Scale~~ [FIXED]
+
+**Issue:** Layer 0 output displayed at 0.5× brightness (divided by 2).
+
+**Cause:** Line 1530 used `vizScale = 0.5` for all CNN layers, but Layer 0 is clamped [0,1] and doesn't need dimming.
+
+**Fix:** Use scale 1.0 for Layer 0 output (layerIdx=1), 0.5 only for middle layers (ReLU, unbounded).
+
+### Remaining Mismatch
+
+**Current:** HTML tool and cnn_test produce different outputs for same input/weights.
+
+**Suspects:**
+1. F16 unpacking difference (CPU vs GPU vs JS)
+2. Static feature generation (RGBD, UV, sin encoding)
+3. Convolution kernel iteration order
+4. Output packing/unpacking
+
+**Next steps:**
+1. Test with identity weights (eliminates weight loading)
+2. Compare composited layer outputs
+3. Add debug visualization for static features
+4. Hex dump comparison (first 8 pixels) - use `--debug-hex` flag in cnn_test
+
+---
+
+## Related Documentation
+
+- `doc/CNN_V2.md` - CNN v2 architecture
+- `doc/CNN_V2_WEB_TOOL.md` - HTML tool documentation
+- `doc/CNN_TEST_TOOL.md` - cnn_test CLI tool
+- `training/export_cnn_v2_weights.py` - Binary export format
diff --git a/doc/CNN_V2_WEB_TOOL.md b/doc/CNN_V2_WEB_TOOL.md
index 25f4ec7..b6f5b0b 100644
--- a/doc/CNN_V2_WEB_TOOL.md
+++ b/doc/CNN_V2_WEB_TOOL.md
@@ -54,6 +54,10 @@ Browser-based WebGPU tool for validating CNN v2 inference with layer visualizati
- Single-file HTML tool (~1100 lines)
- Embedded shaders: STATIC_SHADER, CNN_SHADER, DISPLAY_SHADER, LAYER_VIZ_SHADER
- Shared WGSL component: FULLSCREEN_QUAD_VS (reused across render pipelines)
+- **Embedded default weights:** DEFAULT_WEIGHTS_B64 (base64-encoded binary v2)
+ - Current: 4 layers (3×3, 5×5, 3×3, 3×3), 2496 f16 weights, mip_level=2
+ - Source: `workspaces/main/weights/cnn_v2_weights.bin`
+  - Updates: re-encode the binary with `base64 -i <file>` and update the constant (sketch below)
- Pure WebGPU (no external dependencies)
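+
+A Python equivalent of that re-encode step (path from the bullet above):
+
+```python
+import base64
+
+with open("workspaces/main/weights/cnn_v2_weights.bin", "rb") as f:
+    print(base64.b64encode(f.read()).decode("ascii"))  # paste into DEFAULT_WEIGHTS_B64
+```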
### Code Organization
diff --git a/doc/HOWTO.md b/doc/HOWTO.md
index f89d375..3746d65 100644
--- a/doc/HOWTO.md
+++ b/doc/HOWTO.md
@@ -136,18 +136,33 @@ Enhanced CNN with parametric static features (7D input: RGBD + UV + sin encoding
**Complete Pipeline** (recommended):
```bash
-# Train → Export → Build → Validate
+# Train → Export → Build → Validate (default config)
./scripts/train_cnn_v2_full.sh
-# With custom mip level for p0-p3 features
-./scripts/train_cnn_v2_full.sh --mip-level 1
+# Custom training parameters
+./scripts/train_cnn_v2_full.sh --epochs 500 --batch-size 32 --checkpoint-every 100
+
+# Custom architecture
+./scripts/train_cnn_v2_full.sh --kernel-sizes 3,5,3 --num-layers 3 --mip-level 1
+
+# Grayscale loss (compute loss on luminance instead of RGBA)
+./scripts/train_cnn_v2_full.sh --grayscale-loss
+
+# Custom directories
+./scripts/train_cnn_v2_full.sh --input training/input --target training/target_2
+
+# Full-image mode (instead of patch-based)
+./scripts/train_cnn_v2_full.sh --full-image --image-size 256
+
+# See all options
+./scripts/train_cnn_v2_full.sh --help
```
-Config: 100 epochs, 3×3 kernels, 8→4→4 channels, patch-based (harris detector).
+**Defaults:** 200 epochs, 3×3 kernels, 8→4→4 channels, batch-size 16, patch-based (8×8, Harris detector).
- Live progress with single-line update
- Validates all input images on final epoch
- Exports binary weights (storage buffer architecture)
-- Mip level: 0 (default, original resolution)
+- All parameters configurable via command-line
**Validation Only** (skip training):
```bash
@@ -176,6 +191,12 @@ Config: 100 epochs, 3×3 kernels, 8→4→4 channels, patch-based (harris detect
--input training/input/ --target training/target_2/ \
--mip-level 1 \
--epochs 100 --batch-size 16
+
+# Grayscale loss (compute loss on luminance Y = 0.299*R + 0.587*G + 0.114*B)
+./training/train_cnn_v2.py \
+ --input training/input/ --target training/target_2/ \
+ --grayscale-loss \
+ --epochs 100 --batch-size 16
```
**Export Binary Weights:**
@@ -243,40 +264,29 @@ See `doc/ASSET_SYSTEM.md` and `doc/WORKSPACE_SYSTEM.md`.
### Offline Shader Validation
-**Note:** Tool builds and runs but produces incorrect output. Use CNNEffect visual validation in demo. See `doc/CNN_TEST_TOOL.md`.
-
```bash
-# Test trained CNN on PNG input
-./build/cnn_test input.png output.png
-
-# Adjust blend amount (0.0 = original, 1.0 = full CNN)
-./build/cnn_test input.png output.png --blend 0.5
+# CNN v2 (recommended)
+./build/cnn_test input.png output.png --cnn-version 2
-# PPM output format
-./build/cnn_test input.png output.ppm --format ppm
-```
+# CNN v2 with runtime weight loading (loads layer config from .bin)
+./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin
-### Ground Truth Comparison
-```bash
-# Generate Python ground truth
-./training/train_cnn.py --infer input.png \
- --export-only checkpoints/checkpoint_epoch_1000.pth \
- --output ground_truth.png
+# CNN v1 (produces incorrect output, debug only)
+./build/cnn_test input.png output.png --cnn-version 1
-# Run tool
-./build/cnn_test input.png tool_output.png
+# Adjust blend (0.0 = original, 1.0 = full CNN)
+./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5
-# Compare (Python required)
-python3 -c "
-import numpy as np
-from PIL import Image
-gt = np.array(Image.open('ground_truth.png').convert('RGB'))
-out = np.array(Image.open('tool_output.png').convert('RGB'))
-mse = np.mean((gt.astype(float) - out.astype(float)) ** 2)
-print(f'MSE: {mse:.4f} (target: < 10.0)')
-"
+# Debug hex dump (first 8 pixels)
+./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
```
+**Status:**
+- **CNN v2:** ✅ Functional, matches CNNv2Effect (known visual mismatch vs. the HTML validation tool; see `doc/CNN_TEST_TOOL.md`)
+- **CNN v1:** ⚠️ Produces incorrect output, use CNNEffect in demo for validation
+
+**Note:** `--weights` loads layer count and kernel sizes from the binary file, overriding `--layers` and forcing CNN v2.
+
See `doc/CNN_TEST_TOOL.md` for full documentation.
---