| author | skal <pascal.massimino@gmail.com> | 2026-02-12 11:13:50 +0100 |
| committer | skal <pascal.massimino@gmail.com> | 2026-02-12 11:13:50 +0100 |
| commit | 301db1f29137d3db7828e7a0103986cc845b7672 |
| tree | 501b6d4a1df51b4eba00c93d21194e2b86b3dfb8 |
| parent | 17676de7a233215548ff3da13962acc8cb0ed04d |
CNN v2: parametric static features - design doc
Design document for CNN v2 with enhanced feature inputs:
- 7D static features + bias: RGBD + UV + sin encoding + constant 1.0
- Per-layer configurable kernels (1×1, 3×3, 5×5)
- Float16 weight storage (~6.8 KB f16 vs 3.2 KB f32 in v1)
- Multi-pass architecture with static feature compute
Implementation plan:
1. Static features compute shader (RGBD + UV + sin + bias)
2. C++ effect class (CNNv2Effect)
3. Training pipeline (train_cnn_v2.py, export_cnn_v2_shader.py)
4. Validation tooling (validate_cnn_v2.sh)
Files:
- doc/CNN_V2.md: Complete technical design (architecture, training, export)
- scripts/validate_cnn_v2.sh: End-to-end validation script
- TODO.md: Add CNN v2 as Priority 2 task
- doc/HOWTO.md: Add CNN v2 validation usage
Target: <10 KB for 64k demo constraint
handoff(Claude): CNN v2 design ready for implementation
| -rw-r--r-- | TODO.md | 21 |
| -rw-r--r-- | doc/CNN_V2.md | 671 |
| -rw-r--r-- | doc/HOWTO.md | 16 |
| -rwxr-xr-x | scripts/validate_cnn_v2.sh | 198 |

4 files changed, 906 insertions, 0 deletions
@@ -24,6 +24,27 @@ Self-contained workspaces for parallel demo development.
 
 ---
 
+## Priority 2: CNN v2 - Parametric Static Features (Task #85) [PLANNING]
+
+Enhanced CNN post-processing with multi-dimensional feature inputs.
+
+**Design:** `doc/CNN_V2.md`
+
+**Implementation phases:**
+1. Static features compute shader (RGBD + UV + sin encoding + bias)
+2. C++ effect class (multi-pass layer execution)
+3. Training pipeline (PyTorch f32 → f16 export)
+4. Validation tooling (end-to-end checkpoint testing)
+
+**Key improvements over v1:**
+- 7D static feature input + bias (vs v1's 4D input)
+- Per-layer configurable kernels (1×1, 3×3, 5×5)
+- Float16 weight storage (~6.8 KB f16 vs 3.2 KB f32 in v1)
+
+**Target:** <10 KB for 64k demo constraint
+
+---
+
 ## Priority 3: 3D System Enhancements (Task #18)
 
 Pipeline for importing complex 3D scenes to replace hardcoded geometry.

diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
new file mode 100644
index 0000000..b3b6587
--- /dev/null
+++ b/doc/CNN_V2.md
@@ -0,0 +1,671 @@

# CNN v2: Parametric Static Features

**Technical Design Document**

---

## Overview

CNN v2 extends the original CNN post-processing effect with parametric static features, enabling richer spatial and frequency-domain inputs for improved visual quality.

**Key improvements over v1:**
- 7D static feature input + bias (vs v1's 4D input)
- Multi-frequency position encoding (NeRF-style)
- Per-layer configurable kernel sizes (1×1, 3×3, 5×5)
- Variable channel counts per layer
- Float16 weight storage (GPU-optimized)
- Bias integrated as a static feature dimension

**Status:** Design complete, ready for implementation

---

## Architecture

### Pipeline Overview

```
Input RGBD → Static Features Compute → CNN Layers → Output RGBA
             └─ computed once/frame ─┘ └─ multi-pass ─┘
```

**Static Features Texture:**
- Name: `static_features`
- Format: `texture_storage_2d<rgba32uint, write>` (4×u32)
- Data: 8 float16 values packed via `pack2x16float()`
- Computed once per frame, read by all CNN layers
- Lifetime: entire frame (all CNN layer passes)

**CNN Layers:**
- Input layer: 8D static features (7 features + bias) → C₀ channels
- Inner layers: (8D + Cᵢ₋₁) → Cᵢ channels
- Output layer: (8D + Cₙ) → 4D RGBA
- Storage: `texture_storage_2d<rgba32uint>` (8×f16 per texel recommended)

---

## Static Features (7D + 1 bias)

### Feature Layout

**8 float16 values per pixel:**

```wgsl
// Slots 0-3: RGBD (core pixel data)
let r = rgba.r; // Red channel
let g = rgba.g; // Green channel
let b = rgba.b; // Blue channel
let d = depth;  // Depth value

// Slots 4-5: UV coordinates (normalized screen space)
let uv_x = coord.x / resolution.x; // Horizontal position [0,1]
let uv_y = coord.y / resolution.y; // Vertical position [0,1]

// Slot 6: Periodic position encoding (first frequency of a NeRF-style scheme)
let sin10_x = sin(10.0 * uv_x); // Periodic feature (frequency = 10)

// Slot 7: Bias dimension (always 1.0)
let bias = 1.0; // Constant input; its weights act as per-channel biases

// Packed storage: [R, G, B, D, uv.x, uv.y, sin(10*uv.x), 1.0]
```

### Feature Rationale

| Feature | Dimension | Purpose | Priority |
|---------|-----------|---------|----------|
| RGBD | 4D | Core pixel information | Essential |
| UV coords | 2D | Spatial position awareness | Essential |
| sin(10\*uv.x) | 1D | Periodic position encoding | Medium |
| Bias | 1D | Learned bias (standard NN) | Essential |

**Why bias as a static feature:**
- Simpler shader code (single weight array)
- Standard NN formulation: y = Wx, where x includes the bias term
- Saves 56-112 bytes (no separate bias buffer)
- 7 learned features + bias fit exactly into the 8 packed f16 slots
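Folding the bias into the input is the classic augmented-input identity: with x̃ = [x, 1], y = W·x̃ reproduces y = Wx + b, the bias column being learned as ordinary weights. A minimal numpy check (illustrative only, not project code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(7)        # 7 learned static features
W = rng.standard_normal((16, 7))  # weights without bias
b = rng.standard_normal(16)       # separate per-channel bias vector

# Augmented formulation: append a constant-1.0 slot to the input and
# the bias vector as an extra weight column (the CNN v2 layout).
x_aug = np.concatenate([x, [1.0]])               # 8D input
W_aug = np.concatenate([W, b[:, None]], axis=1)  # 16x8 weight matrix

assert np.allclose(W_aug @ x_aug, W @ x + b)  # identical output
```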
### Future Feature Extensions

**Option: replace sin(10\*uv.x) with:**
- `sin(20*uv.x)` - higher-frequency encoding
- `gray_mip1` - multi-scale luminance
- `dx`, `dy` - Sobel gradients
- `variance` - local texture measure
- `laplacian` - edge detection

**Option: uint8 packing (16+ features):**
```wgsl
// texture_storage_2d<rgba32uint> stores 16 uint8 values
// (4 per u32, e.g. via pack4x8unorm) - trade precision for feature count
// [R, G, B, D, uv.x, uv.y, sin10.x, sin10.y,
//  sin20.x, sin20.y, dx, dy, gray_mip1, gray_mip2, var, bias]
```
Requires quantization-aware training.

---

## Layer Structure

### Example 3-Layer Network

```
Input:  8D static → 16 channels (1×1 kernel, pointwise)
Layer1: (8+16)D   → 8 channels  (3×3 kernel, spatial)
Layer2: (8+8)D    → 4 channels  (5×5 kernel, large receptive field)
```

(8D = 7 learned features + bias; every layer re-reads the static features.)

### Weight Calculations

**Per-layer weights** (8 static inputs, matching the shader's `IN_CHANNELS = 8`):
```
Input:  8 × 1 × 1 × 16     = 128 weights
Layer1: (8+16) × 3 × 3 × 8 = 1728 weights
Layer2: (8+8) × 5 × 5 × 4  = 1600 weights
Total:  3456 weights
```

**Storage sizes:**
- f32: 3456 × 4 = 13,824 bytes (~13.5 KB)
- f16: 3456 × 2 = 6,912 bytes (~6.8 KB) ✓ **recommended**

**Comparison to v1:**
- v1: ~800 weights (3.2 KB f32)
- v2: ~3456 weights (~6.8 KB f16)
- **Growth: ~2× size for parametric features**
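Because these budgets shift whenever kernels or channels change, a small helper keeps the arithmetic checkable (a sketch; the function name is ours):

```python
def cnn_v2_weight_count(kernels, channels, static_dim=8):
    """Count weights for a CNN v2 stack in which every layer re-reads the
    8D static features, concatenated with the previous layer's output."""
    total, prev = 0, 0  # the input layer sees static features only
    for k, c_out in zip(kernels, channels):
        c_in = static_dim + prev  # static (incl. bias) + previous channels
        total += c_in * k * k * c_out
        prev = c_out
    return total

n = cnn_v2_weight_count([1, 3, 5], [16, 8, 4])
print(n, n * 2)  # 3456 weights, 6912 bytes as f16
```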
### Kernel Size Guidelines

**1×1 kernel (pointwise):**
- No spatial context, channel mixing only
- Weights: `(8 + C_in) × C_out`
- Use for: input layer, bottleneck layers

**3×3 kernel (standard conv):**
- Local spatial context
- Weights: `(8 + C_in) × 9 × C_out`
- Use for: most inner layers

**5×5 kernel (large receptive field):**
- Wide spatial context
- Weights: `(8 + C_in) × 25 × C_out`
- Use for: output layer, detail enhancement

### Channel Storage (8×f16 per texel)

```wgsl
@group(0) @binding(1) var layer_input: texture_2d<u32>;

fn unpack_channels(coord: vec2<i32>) -> array<f32, 8> {
    let packed = textureLoad(layer_input, coord, 0);
    return array(
        unpack2x16float(packed.x).x, unpack2x16float(packed.x).y,
        unpack2x16float(packed.y).x, unpack2x16float(packed.y).y,
        unpack2x16float(packed.z).x, unpack2x16float(packed.z).y,
        unpack2x16float(packed.w).x, unpack2x16float(packed.w).y
    );
}

fn pack_channels(values: array<f32, 8>) -> vec4<u32> {
    return vec4(
        pack2x16float(vec2(values[0], values[1])),
        pack2x16float(vec2(values[2], values[3])),
        pack2x16float(vec2(values[4], values[5])),
        pack2x16float(vec2(values[6], values[7]))
    );
}
```
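The same packing is easy to prototype CPU-side; this numpy sketch mirrors the `pack2x16float`/`unpack2x16float` semantics (illustrative only, not project code):

```python
import numpy as np

def pack_f16x8(values):
    """8 floats -> 4 uint32, like four pack2x16float calls (x in low bits)."""
    h = np.asarray(values, dtype=np.float16).view(np.uint16)
    return h[0::2].astype(np.uint32) | (h[1::2].astype(np.uint32) << 16)

def unpack_f16x8(packed):
    """4 uint32 -> 8 floats, inverse of pack_f16x8."""
    p = np.asarray(packed, dtype=np.uint32)
    lo = (p & 0xFFFF).astype(np.uint16).view(np.float16)
    hi = (p >> 16).astype(np.uint16).view(np.float16)
    return np.stack([lo, hi], axis=1).reshape(-1).astype(np.float32)

feats = [0.5, 0.25, 1.0, 0.0, 0.125, 0.75, -0.5, 1.0]
assert np.allclose(unpack_f16x8(pack_f16x8(feats)), feats)  # exact in f16
```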
---

## Training Workflow

### Script: `training/train_cnn_v2.py`

**Static Feature Extraction:**

```python
def compute_static_features(rgb, depth):
    """Generate 7D static features + bias dimension."""
    h, w = rgb.shape[:2]

    # RGBD channels
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # UV coordinates (normalized)
    uv_x = np.linspace(0, 1, w)[None, :].repeat(h, axis=0)
    uv_y = np.linspace(0, 1, h)[:, None].repeat(w, axis=1)

    # Periodic position encoding
    sin10_x = np.sin(10.0 * uv_x)

    # Bias dimension (always 1.0)
    bias = np.ones_like(r)

    # Stack: [R, G, B, D, uv.x, uv.y, sin10_x, bias]
    return np.stack([r, g, b, depth, uv_x, uv_y, sin10_x, bias], axis=-1)
```

**Network Definition:**

```python
class CNNv2(nn.Module):
    def __init__(self, kernels=(1, 3, 5), channels=(16, 8, 4)):
        super().__init__()

        # Input layer: 8D (7 features + bias) → channels[0]
        self.layer0 = nn.Conv2d(8, channels[0], kernel_size=kernels[0],
                                padding=kernels[0] // 2, bias=False)

        # Inner layer: (7 features + bias + C_prev) → C_next
        in_ch_1 = 8 + channels[0]  # static + layer0 output
        self.layer1 = nn.Conv2d(in_ch_1, channels[1], kernel_size=kernels[1],
                                padding=kernels[1] // 2, bias=False)

        # Output layer: (7 features + bias + C_last) → 4 (RGBA)
        in_ch_2 = 8 + channels[1]
        self.layer2 = nn.Conv2d(in_ch_2, 4, kernel_size=kernels[2],
                                padding=kernels[2] // 2, bias=False)

    def forward(self, static_features):
        # Layer 0: use the full 8D static features (includes bias)
        x0 = self.layer0(static_features)
        x0 = F.relu(x0)

        # Layer 1: concatenate static + layer0 output
        x1_input = torch.cat([static_features, x0], dim=1)
        x1 = self.layer1(x1_input)
        x1 = F.relu(x1)

        # Layer 2: concatenate static + layer1 output
        x2_input = torch.cat([static_features, x1], dim=1)
        output = self.layer2(x2_input)

        return torch.sigmoid(output)  # RGBA output in [0,1]
```

**Training Configuration:**

```python
# Hyperparameters
kernels = [1, 3, 5]    # Per-layer kernel sizes
channels = [16, 8, 4]  # Per-layer output channels
learning_rate = 1e-3
batch_size = 16
epochs = 5000

# Setup (standard PyTorch f32; loss/optimizer choices illustrative)
model = CNNv2(kernels, channels)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Training loop
for epoch in range(epochs):
    for rgb_batch, depth_batch, target_batch in dataloader:
        # Compute static features (batched, channels-first analogue
        # of the numpy helper above)
        static_feat = compute_static_features(rgb_batch, depth_batch)

        # Forward pass
        output = model(static_feat)
        loss = criterion(output, target_batch)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

**Checkpoint Format:**

```python
torch.save({
    'state_dict': model.state_dict(),  # f32 weights
    'config': {
        'kernels': [1, 3, 5],
        'channels': [16, 8, 4],
        'features': ['R', 'G', 'B', 'D', 'uv.x', 'uv.y', 'sin10_x', 'bias']
    },
    'epoch': epoch,
    'loss': loss.item()
}, f'checkpoints/checkpoint_epoch_{epoch}.pth')
```
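The loop above assumes a patch-based dataloader (see the Phase 3 checklist). A minimal `Dataset` sketch under assumed data layout (HWC float32 triplets; `PatchDataset` and its fields are our names, layout/permutation details omitted):

```python
import torch
from torch.utils.data import Dataset

class PatchDataset(Dataset):
    """Random crops from (input RGB, depth, target RGB) image triplets."""
    def __init__(self, triplets, patch=64):
        self.triplets = triplets  # list of (rgb, depth, target) HxWxC arrays
        self.patch = patch

    def __len__(self):
        return len(self.triplets)

    def __getitem__(self, idx):
        rgb, depth, target = self.triplets[idx]
        h, w = rgb.shape[:2]
        p = self.patch
        y = torch.randint(0, h - p + 1, (1,)).item()
        x = torch.randint(0, w - p + 1, (1,)).item()
        crop = lambda a: torch.from_numpy(a[y:y + p, x:x + p]).float()
        return crop(rgb), crop(depth), crop(target)
```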
---

## Export Workflow

### Script: `training/export_cnn_v2_shader.py`

**Process:**
1. Load checkpoint (f32 PyTorch weights)
2. Extract layer configs (kernels, channels)
3. Quantize weights to float16: `weights_f16 = weights_f32.astype(np.float16)`
4. Generate a WGSL shader per layer
5. Write to `workspaces/<workspace>/shaders/cnn_v2_*.wgsl`

**Example Generated Shader:**

```wgsl
// cnn_v2_layer_0.wgsl - auto-generated from checkpoint_epoch_5000.pth

const KERNEL_SIZE: u32 = 1u;
const IN_CHANNELS: u32 = 8u;  // 7 features + bias
const OUT_CHANNELS: u32 = 16u;

// Weights quantized to float16 (stored as f32 literals in the shader)
const weights: array<f32, 128> = array(
    0.123047, -0.089844, 0.234375, 0.456055, ...
);

@group(0) @binding(0) var static_features: texture_2d<u32>;
@group(0) @binding(1) var output_texture: texture_storage_2d<rgba32uint, write>;

@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    // Load static features (8D)
    let static_feat = get_static_features(vec2<i32>(id.xy));

    // Convolution (1×1 kernel = pointwise)
    var output: array<f32, OUT_CHANNELS>;
    for (var c: u32 = 0u; c < OUT_CHANNELS; c++) {
        var sum: f32 = 0.0;
        for (var k: u32 = 0u; k < IN_CHANNELS; k++) {
            sum += weights[c * IN_CHANNELS + k] * static_feat[k];
        }
        output[c] = max(0.0, sum); // ReLU activation
    }

    // Pack and store (8×f16 per texel; with 16 output channels the
    // generated code spans two texels - the first 8 are shown here)
    textureStore(output_texture, vec2<i32>(id.xy), pack_f16x8(output));
}
```

**Float16 Quantization:**
- Training uses f32 throughout (PyTorch standard)
- Export converts to `np.float16`, then back to f32 for WGSL literals
- **Expected discrepancy:** <0.1% MSE (acceptable)
- Validation via `validate_cnn_v2.sh` compares outputs
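Steps 3-4 are mechanical; a condensed sketch of the f16 round-trip and literal emission (ours, not the actual `export_cnn_v2_shader.py`; it also prints a per-tensor relative MSE to sanity-check the <0.1% expectation):

```python
import numpy as np
import torch

ckpt = torch.load("checkpoints/checkpoint_epoch_5000.pth", map_location="cpu")

for name, w in ckpt["state_dict"].items():
    w32 = w.numpy().reshape(-1)
    # Quantize to f16, then back to f32 for the WGSL literals
    w16 = w32.astype(np.float16).astype(np.float32)
    rel_mse = np.mean((w16 - w32) ** 2) / max(np.mean(w32 ** 2), 1e-12)
    print(f"{name}: {w32.size} weights, relative MSE {rel_mse:.2e}")

    literals = ", ".join(f"{v:.6f}" for v in w16)
    wgsl = f"const weights: array<f32, {w16.size}> = array({literals});"
```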
---

## Validation Workflow

### Script: `scripts/validate_cnn_v2.sh`

**End-to-end pipeline:**
```bash
./scripts/validate_cnn_v2.sh checkpoints/checkpoint_epoch_5000.pth
```

**Steps automated:**
1. Export checkpoint → .wgsl shaders
2. Rebuild the `cnn_test` tool
3. Process test images with CNN v2
4. Display input/output results

**Usage:**
```bash
# Basic usage
./scripts/validate_cnn_v2.sh checkpoint.pth

# Custom paths
./scripts/validate_cnn_v2.sh checkpoint.pth \
    -i my_test_images/ \
    -o results/ \
    -b build_release

# Skip rebuild (iterate on checkpoint only)
./scripts/validate_cnn_v2.sh checkpoint.pth --skip-build

# Skip export (iterate on test images only)
./scripts/validate_cnn_v2.sh checkpoint.pth --skip-export

# Show help
./scripts/validate_cnn_v2.sh --help
```

**Options:**
- `-b, --build-dir DIR` - build directory (default: `build`)
- `-w, --workspace NAME` - workspace name (default: `main`)
- `-i, --images DIR` - test images directory (default: `training/validation`)
- `-o, --output DIR` - output directory (default: `validation_results`)
- `--skip-build` - use existing `cnn_test` binary
- `--skip-export` - use existing .wgsl shaders
- `-h, --help` - show full usage

**Output:**
- Input images: `<test_images_dir>/*.png`
- Output images: `<output_dir>/*_output.png`
- Opens the results directory in the system file browser

---

## Implementation Checklist

### Phase 1: Shaders (Core Infrastructure)

- [ ] `workspaces/main/shaders/cnn_v2_static.wgsl` - static features compute
  - [ ] RGBD sampling from framebuffer
  - [ ] UV coordinate calculation
  - [ ] sin(10\*uv.x) computation
  - [ ] Bias dimension (constant 1.0)
  - [ ] Float16 packing via `pack2x16float()`
  - [ ] Output to `texture_storage_2d<rgba32uint>`

- [ ] `workspaces/main/shaders/cnn_v2_layer_template.wgsl` - layer template
  - [ ] Static features unpacking
  - [ ] Previous-layer unpacking (8×f16)
  - [ ] Convolution implementation (1×1, 3×3, 5×5)
  - [ ] ReLU activation
  - [ ] Output packing (8×f16)
  - [ ] Proper padding handling

### Phase 2: C++ Effect Class

- [ ] `src/gpu/effects/cnn_v2_effect.h` - header
  - [ ] Class declaration inheriting from `PostProcessEffect`
  - [ ] Static features texture member
  - [ ] Layer textures vector
  - [ ] Pipeline and bind group members

- [ ] `src/gpu/effects/cnn_v2_effect.cc` - implementation
  - [ ] Constructor: load shaders, create textures
  - [ ] `init()`: create pipelines, bind groups
  - [ ] `render()`: multi-pass execution
    - [ ] Pass 0: compute static features
    - [ ] Passes 1-N: CNN layers
    - [ ] Final: composite to output
  - [ ] Proper resource cleanup

- [ ] Integration
  - [ ] Add to `src/gpu/demo_effects.h` includes
  - [ ] Add `cnn_v2_effect.cc` to `CMakeLists.txt` (headless + normal)
  - [ ] Add shaders to `workspaces/main/assets.txt`
  - [ ] Add to `src/tests/gpu/test_demo_effects.cc`

### Phase 3: Training Pipeline

- [ ] `training/train_cnn_v2.py` - training script
  - [ ] Static feature extraction function
  - [ ] CNNv2 PyTorch model class
  - [ ] Patch-based dataloader
  - [ ] Training loop with checkpointing
  - [ ] Command-line argument parsing
  - [ ] Inference mode (ground-truth generation)

- [ ] `training/export_cnn_v2_shader.py` - export script
  - [ ] Checkpoint loading
  - [ ] Weight extraction and f16 quantization
  - [ ] Per-layer WGSL generation
  - [ ] File output to workspace shaders/
  - [ ] Metadata preservation

### Phase 4: Tools & Validation

- [ ] `scripts/validate_cnn_v2.sh` - end-to-end validation
  - [ ] Command-line argument parsing
  - [ ] Shader export orchestration
  - [ ] Build orchestration
  - [ ] Batch image processing
  - [ ] Results display

- [ ] `src/tools/cnn_test_main.cc` - tool updates
  - [ ] Add `--cnn-version v2` flag
  - [ ] CNNv2Effect instantiation path
  - [ ] Static features pass execution
  - [ ] Multi-layer processing

### Phase 5: Documentation

- [ ] `doc/HOWTO.md` - usage guide
  - [ ] Training section (CNN v2)
  - [ ] Export section
  - [ ] Validation section
  - [ ] Examples

- [ ] `README.md` - project overview update
  - [ ] Mention CNN v2 capability

---

## File Structure

### New Files

```
# Shaders (generated by export script)
workspaces/main/shaders/cnn_v2_static.wgsl   # Static features compute
workspaces/main/shaders/cnn_v2_layer_0.wgsl  # Input layer (generated)
workspaces/main/shaders/cnn_v2_layer_1.wgsl  # Inner layer (generated)
workspaces/main/shaders/cnn_v2_layer_2.wgsl  # Output layer (generated)

# C++ implementation
src/gpu/effects/cnn_v2_effect.h   # Effect class header
src/gpu/effects/cnn_v2_effect.cc  # Effect implementation

# Python training/export
training/train_cnn_v2.py          # Training script
training/export_cnn_v2_shader.py  # Shader generator
training/validation/              # Test images directory

# Scripts
scripts/validate_cnn_v2.sh        # End-to-end validation

# Documentation
doc/CNN_V2.md                     # This file
```

### Modified Files

```
src/gpu/demo_effects.h              # Add CNNv2Effect include
CMakeLists.txt                      # Add cnn_v2_effect.cc
workspaces/main/assets.txt          # Add cnn_v2 shaders
workspaces/main/timeline.seq        # Optional: add CNNv2Effect
src/tests/gpu/test_demo_effects.cc  # Add CNNv2 test case
src/tools/cnn_test_main.cc          # Add --cnn-version v2
doc/HOWTO.md                        # Add CNN v2 sections
TODO.md                             # Add CNN v2 task
```

### Unchanged (v1 Preserved)

```
training/train_cnn.py               # Original training
src/gpu/effects/cnn_effect.*        # Original effect
workspaces/main/shaders/cnn_*.wgsl  # Original shaders
```

---

## Performance Characteristics

### Static Features Compute
- **Cost:** ~0.1 ms @ 1080p
- **Frequency:** once per frame
- **Operations:** sin(), texture sampling, packing

### CNN Layers (example 3-layer)
- **Layer0 (1×1, 8→16):** ~0.3 ms
- **Layer1 (3×3, 24→8):** ~0.8 ms
- **Layer2 (5×5, 16→4):** ~1.2 ms
- **Total:** ~2.4 ms @ 1080p

### Memory Usage
- Static features: 1920×1080×8×2 bytes ≈ 33 MB (f16)
- Layer buffers: 1920×1080×16×2 bytes ≈ 66 MB (max 16 channels)
- Weights: ~6.8 KB (f16, in shader code)
- **Total GPU memory:** ~100 MB

---

## Size Budget

### CNN v1 vs v2

| Metric | v1 | v2 | Delta |
|--------|----|----|-------|
| Weights (count) | 800 | 3456 | +2656 |
| Storage (f32) | 3.2 KB | 13.5 KB | +10.3 KB |
| Storage (f16) | N/A | 6.8 KB | +6.8 KB |
| Shader code | ~500 lines | ~800 lines | +300 lines |

### Mitigation Strategies

**Reduce channels** (percentages re-derived in the sketch after this list):
- [16,8,4] → [8,4,4] saves ~47% of weights
- [16,8,4] → [4,4,4] saves ~52% of weights

**Smaller kernels:**
- [1,3,5] → [1,3,3] saves ~30% of weights
- [1,3,5] → [1,1,3] saves ~74% of weights

**Quantization:**
- int8 weights: saves 75% vs f32 (requires quantization-aware training)
- 4-bit weights: saves 87.5% vs f32 (extreme, needs research)

**Target:** keep CNN v2 under 10 KB for the 64k demo constraint
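These savings can be re-derived with the `cnn_v2_weight_count` helper sketched in the Layer Structure section (our helper, assumed in scope):

```python
base = cnn_v2_weight_count([1, 3, 5], [16, 8, 4])  # 3456 weights
for kernels, channels in [([1, 3, 5], [8, 4, 4]),
                          ([1, 3, 5], [4, 4, 4]),
                          ([1, 3, 3], [16, 8, 4]),
                          ([1, 1, 3], [16, 8, 4])]:
    n = cnn_v2_weight_count(kernels, channels)
    print(kernels, channels, n, f"saves {1 - n / base:.0%}")
# -> 1840 (47%), 1664 (52%), 2432 (30%), 896 (74%)
```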
---

## Future Extensions

### More Features (uint8 Packing)

```wgsl
// 16 uint8 features per texel (texture_storage_2d<rgba32uint>,
// 4 per u32, e.g. via pack4x8unorm)
// [R, G, B, D, uv.x, uv.y, sin10.x, sin10.y,
//  sin20.x, sin20.y, dx, dy, gray_mip1, gray_mip2, variance, bias]
```
- Trades precision for quantity
- Requires quantization-aware training

### Temporal Features

- Previous-frame RGBA (motion awareness)
- Optical flow vectors
- Requires a multi-frame buffer

### Learned Position Encodings

- Replace hand-crafted sin(10\*uv) with learned embeddings
- Requires a separate embedding network
- Similar to NeRF position encoding

### Dynamic Architecture

- Runtime kernel-size selection based on scene
- Conditional layer execution (skip connections)
- Layer pruning for performance

---

## References

- **v1 Implementation:** `src/gpu/effects/cnn_effect.*`
- **Training Guide:** `doc/HOWTO.md` (CNN Training section)
- **Test Tool:** `doc/CNN_TEST_TOOL.md`
- **Shader System:** `doc/SEQUENCE.md`
- **Size Measurement:** `doc/SIZE_MEASUREMENT.md`

---

## Appendix: Design Decisions

### Why Bias as a Static Feature?

**Alternatives considered:**
1. Separate bias array per layer (Option B)
2. Bias as a static feature = 1.0 (Option A, chosen)

**Decision rationale:**
- Simpler shader code (fewer bindings)
- Standard NN formulation (augmented input)
- Saves 56-112 bytes per model
- 7 learned features sufficient for the initial v2 implementation
- Can extend to uint8 packing if more than 7 features are needed

### Why Float16 for Weights?

**Alternatives considered:**
1. Keep f32 (larger, more accurate)
2. Use f16 (smaller, GPU-native)
3. Use int8 (smallest, needs quantization-aware training)

**Decision rationale:**
- f16 saves 50% vs f32 (critical for the 64k target)
- GPU-native support (`pack2x16float` in WGSL)
- <0.1% accuracy loss (acceptable)
- Simpler than int8 quantization

### Why Multi-Frequency Position Encoding?

**Inspiration:** NeRF (Neural Radiance Fields)

**Benefits:**
- Helps the network learn high-frequency details
- Better than raw UV coordinates alone
- Small footprint (1D per frequency)

**Future:** add sin(20\*uv), sin(40\*uv) if more than 7 feature slots become available
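For concreteness, the multi-frequency scheme this points toward, sketched in numpy (frequencies beyond 10 are illustrative, not committed values):

```python
import numpy as np

def positional_encoding(u, freqs=(10.0, 20.0, 40.0)):
    """Map a normalized coordinate u in [0,1] to periodic features,
    one sin per frequency, NeRF-style."""
    u = np.asarray(u, dtype=np.float32)
    return np.stack([np.sin(f * u) for f in freqs], axis=-1)

# CNN v2 ships only the first term, sin(10*u), as static feature slot 6:
uv_x = np.linspace(0.0, 1.0, 1920, dtype=np.float32)
enc = positional_encoding(uv_x)  # shape (1920, 3)
```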
-d "$TEST_IMAGES_DIR" ]] && error "Test images directory not found: $TEST_IMAGES_DIR" + +SHADER_DIR="workspaces/$WORKSPACE/shaders" +CNN_TEST="$BUILD_DIR/cnn_test" + +log "Configuration:" +log " Checkpoint: $CHECKPOINT" +log " Build dir: $BUILD_DIR" +log " Workspace: $WORKSPACE" +log " Shader dir: $SHADER_DIR" +log " Test images: $TEST_IMAGES_DIR" +log " Output dir: $OUTPUT_DIR" +echo + +# Step 1: Export shaders +if [[ "$SKIP_EXPORT" = false ]]; then + log "Step 1/4: Exporting shaders from checkpoint..." + [[ ! -d "$SHADER_DIR" ]] && error "Shader directory not found: $SHADER_DIR" + + if [[ ! -f "training/export_cnn_v2_shader.py" ]]; then + error "Export script not found: training/export_cnn_v2_shader.py" + fi + + $PYTHON training/export_cnn_v2_shader.py "$CHECKPOINT" --output-dir "$SHADER_DIR" \ + || error "Shader export failed" + + log "✓ Shaders exported to $SHADER_DIR" +else + warn "Skipping shader export (using existing .wgsl files)" +fi + +# Step 2: Rebuild cnn_test +if [[ "$SKIP_BUILD" = false ]]; then + log "Step 2/4: Rebuilding cnn_test..." + + cmake --build "$BUILD_DIR" -j4 --target cnn_test \ + || error "Build failed" + + log "✓ Built $CNN_TEST" +else + warn "Skipping build (using existing binary)" +fi + +[[ ! -x "$CNN_TEST" ]] && error "cnn_test not found or not executable: $CNN_TEST" + +# Step 3: Process test images +log "Step 3/4: Processing test images..." +mkdir -p "$OUTPUT_DIR" + +# Find PNG images +mapfile -t IMAGES < <(find "$TEST_IMAGES_DIR" -maxdepth 1 -name "*.png" | sort) +[[ ${#IMAGES[@]} -eq 0 ]] && error "No PNG images found in $TEST_IMAGES_DIR" + +log "Found ${#IMAGES[@]} test image(s)" + +for img in "${IMAGES[@]}"; do + basename=$(basename "$img" .png) + output="$OUTPUT_DIR/${basename}_output.png" + + log " Processing $basename.png..." + "$CNN_TEST" "$img" "$output" --cnn-version v2 \ + || warn " Failed: $basename.png" +done + +log "✓ Processed ${#IMAGES[@]} image(s)" + +# Step 4: Display results +log "Step 4/4: Opening results..." + +case "$(uname -s)" in + Darwin*) + open "$OUTPUT_DIR" + ;; + Linux*) + if command -v xdg-open &> /dev/null; then + xdg-open "$OUTPUT_DIR" + else + log "Results saved to: $OUTPUT_DIR" + fi + ;; + MINGW*|MSYS*|CYGWIN*) + explorer "$OUTPUT_DIR" + ;; + *) + log "Results saved to: $OUTPUT_DIR" + ;; +esac + +log "✓ Validation complete!" +log "" +log "Results:" +log " Input: $TEST_IMAGES_DIR/*.png" +log " Output: $OUTPUT_DIR/*_output.png" |
