| author | skal <pascal.massimino@gmail.com> | 2026-02-12 11:13:50 +0100 |
| committer | skal <pascal.massimino@gmail.com> | 2026-02-12 11:13:50 +0100 |
| commit | 301db1f29137d3db7828e7a0103986cc845b7672 |
| tree | 501b6d4a1df51b4eba00c93d21194e2b86b3dfb8 |
| parent | 17676de7a233215548ff3da13962acc8cb0ed04d |
CNN v2: parametric static features - design doc
Design document for CNN v2 with enhanced feature inputs:
- 7D static features + bias: RGBD + UV + sin encoding + constant 1.0
- Per-layer configurable kernels (1×1, 3×3, 5×5)
- Float16 weight storage (~6.8 KB f16 vs 3.2 KB f32 in v1)
- Multi-pass architecture with static feature compute
Implementation plan:
1. Static features compute shader (RGBD + UV + sin + bias)
2. C++ effect class (CNNv2Effect)
3. Training pipeline (train_cnn_v2.py, export_cnn_v2_shader.py)
4. Validation tooling (validate_cnn_v2.sh)
Files:
- doc/CNN_V2.md: Complete technical design (architecture, training, export)
- scripts/validate_cnn_v2.sh: End-to-end validation script
- TODO.md: Add CNN v2 as Priority 2 task
- doc/HOWTO.md: Add CNN v2 validation usage
Target: <10 KB for 64k demo constraint
handoff(Claude): CNN v2 design ready for implementation
| -rw-r--r-- | TODO.md | 21 |
| -rw-r--r-- | doc/CNN_V2.md | 671 |
| -rw-r--r-- | doc/HOWTO.md | 16 |
| -rwxr-xr-x | scripts/validate_cnn_v2.sh | 198 |

4 files changed, 906 insertions, 0 deletions
@@ -24,6 +24,27 @@ Self-contained workspaces for parallel demo development.
 
 ---
 
+## Priority 2: CNN v2 - Parametric Static Features (Task #85) [PLANNING]
+
+Enhanced CNN post-processing with multi-dimensional feature inputs.
+
+**Design:** `doc/CNN_V2.md`
+
+**Implementation phases:**
+1. Static features compute shader (RGBD + UV + sin encoding + bias)
+2. C++ effect class (multi-pass layer execution)
+3. Training pipeline (PyTorch f32 → f16 export)
+4. Validation tooling (end-to-end checkpoint testing)
+
+**Key improvements over v1:**
+- 7D static feature input + bias (vs v1's 4D input)
+- Per-layer configurable kernels (1×1, 3×3, 5×5)
+- Float16 weight storage (~6.8 KB f16 vs 3.2 KB f32 in v1)
+
+**Target:** <10 KB for 64k demo constraint
+
+---
+
 ## Priority 3: 3D System Enhancements (Task #18)
 
 Pipeline for importing complex 3D scenes to replace hardcoded geometry.

diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
new file mode 100644
index 0000000..b3b6587
--- /dev/null
+++ b/doc/CNN_V2.md
@@ -0,0 +1,671 @@

# CNN v2: Parametric Static Features

**Technical Design Document**

---

## Overview

CNN v2 extends the original CNN post-processing effect with parametric static features, enabling richer spatial and frequency-domain inputs for improved visual quality.

**Key improvements over v1:**
- 7D static feature input + bias (vs v1's 4D input)
- Multi-frequency position encoding (NeRF-style)
- Per-layer configurable kernel sizes (1×1, 3×3, 5×5)
- Variable channel counts per layer
- Float16 weight storage (GPU-optimized)
- Bias integrated as a static feature dimension

**Status:** Design complete, ready for implementation

---

## Architecture

### Pipeline Overview

```
Input RGBD → Static Features Compute → CNN Layers → Output RGBA
             └─ computed once/frame ─┘ └─ multi-pass ─┘
```

**Static Features Texture:**
- Name: `static_features`
- Format: `texture_storage_2d<rgba32uint, write>` (4×u32)
- Data: 8 float16 values packed via `pack2x16float()`
- Computed once per frame, read by all CNN layers
- Lifetime: entire frame (all CNN layer passes)

**CNN Layers:**
- Input layer: 8D static features (7 features + bias) → C₀ channels
- Inner layers: (8D + Cᵢ₋₁) → Cᵢ channels
- Output layer: (8D + Cₙ) → 4D RGBA
- Storage: `texture_storage_2d<rgba32uint>` (8×f16 per texel recommended)

---

## Static Features (7D + 1 bias)

### Feature Layout

**8 float16 values per pixel:**

```wgsl
// Slots 0-3: RGBD (core pixel data)
let r = rgba.r; // Red channel
let g = rgba.g; // Green channel
let b = rgba.b; // Blue channel
let d = depth;  // Depth value

// Slots 4-5: UV coordinates (normalized screen space)
let uv_x = coord.x / resolution.x; // Horizontal position [0,1]
let uv_y = coord.y / resolution.y; // Vertical position [0,1]

// Slot 6: Periodic position encoding (first frequency of a NeRF-style scheme)
let sin10_x = sin(10.0 * uv_x); // Periodic feature (frequency = 10)

// Slot 7: Bias dimension (always 1.0)
let bias = 1.0; // Constant input; its weights act as per-channel biases

// Packed storage: [R, G, B, D, uv.x, uv.y, sin(10*uv.x), 1.0]
```

### Feature Rationale

| Feature | Dimension | Purpose | Priority |
|---------|-----------|---------|----------|
| RGBD | 4D | Core pixel information | Essential |
| UV coords | 2D | Spatial position awareness | Essential |
| sin(10\*uv.x) | 1D | Periodic position encoding | Medium |
| Bias | 1D | Learned bias (standard NN) | Essential |

**Why bias as a static feature:**
- Simpler shader code (single weight array)
- Standard NN formulation: y = Wx, where x includes the bias term
- Saves 56-112 bytes (no separate bias buffer)
- 7 learned features + bias fit exactly into the 8 packed f16 slots
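Folding the bias into the input is the classic augmented-input identity: with x̃ = [x, 1], y = W·x̃ reproduces y = Wx + b, the bias column being learned as ordinary weights. A minimal numpy check (illustrative only, not project code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(7)        # 7 learned static features
W = rng.standard_normal((16, 7))  # weights without bias
b = rng.standard_normal(16)       # separate per-channel bias vector

# Augmented formulation: append a constant-1.0 slot to the input and
# the bias vector as an extra weight column (the CNN v2 layout).
x_aug = np.concatenate([x, [1.0]])               # 8D input
W_aug = np.concatenate([W, b[:, None]], axis=1)  # 16x8 weight matrix

assert np.allclose(W_aug @ x_aug, W @ x + b)  # identical output
```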
### Future Feature Extensions

**Option: replace sin(10\*uv.x) with:**
- `sin(20*uv.x)` - higher-frequency encoding
- `gray_mip1` - multi-scale luminance
- `dx`, `dy` - Sobel gradients
- `variance` - local texture measure
- `laplacian` - edge detection

**Option: uint8 packing (16+ features):**
```wgsl
// texture_storage_2d<rgba32uint> stores 16 uint8 values
// (4 per u32, e.g. via pack4x8unorm) - trade precision for feature count
// [R, G, B, D, uv.x, uv.y, sin10.x, sin10.y,
//  sin20.x, sin20.y, dx, dy, gray_mip1, gray_mip2, var, bias]
```
Requires quantization-aware training.

---

## Layer Structure

### Example 3-Layer Network

```
Input:  8D static → 16 channels (1×1 kernel, pointwise)
Layer1: (8+16)D   → 8 channels  (3×3 kernel, spatial)
Layer2: (8+8)D    → 4 channels  (5×5 kernel, large receptive field)
```

(8D = 7 learned features + bias; every layer re-reads the static features.)

### Weight Calculations

**Per-layer weights** (8 static inputs, matching the shader's `IN_CHANNELS = 8`):
```
Input:  8 × 1 × 1 × 16     = 128 weights
Layer1: (8+16) × 3 × 3 × 8 = 1728 weights
Layer2: (8+8) × 5 × 5 × 4  = 1600 weights
Total:  3456 weights
```

**Storage sizes:**
- f32: 3456 × 4 = 13,824 bytes (~13.5 KB)
- f16: 3456 × 2 = 6,912 bytes (~6.8 KB) ✓ **recommended**

**Comparison to v1:**
- v1: ~800 weights (3.2 KB f32)
- v2: ~3456 weights (~6.8 KB f16)
- **Growth: ~2× size for parametric features**
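Because these budgets shift whenever kernels or channels change, a small helper keeps the arithmetic checkable (a sketch; the function name is ours):

```python
def cnn_v2_weight_count(kernels, channels, static_dim=8):
    """Count weights for a CNN v2 stack in which every layer re-reads the
    8D static features, concatenated with the previous layer's output."""
    total, prev = 0, 0  # the input layer sees static features only
    for k, c_out in zip(kernels, channels):
        c_in = static_dim + prev  # static (incl. bias) + previous channels
        total += c_in * k * k * c_out
        prev = c_out
    return total

n = cnn_v2_weight_count([1, 3, 5], [16, 8, 4])
print(n, n * 2)  # 3456 weights, 6912 bytes as f16
```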
### Kernel Size Guidelines

**1×1 kernel (pointwise):**
- No spatial context, channel mixing only
- Weights: `(8 + C_in) × C_out`
- Use for: input layer, bottleneck layers

**3×3 kernel (standard conv):**
- Local spatial context
- Weights: `(8 + C_in) × 9 × C_out`
- Use for: most inner layers

**5×5 kernel (large receptive field):**
- Wide spatial context
- Weights: `(8 + C_in) × 25 × C_out`
- Use for: output layer, detail enhancement

### Channel Storage (8×f16 per texel)

```wgsl
@group(0) @binding(1) var layer_input: texture_2d<u32>;

fn unpack_channels(coord: vec2<i32>) -> array<f32, 8> {
    let packed = textureLoad(layer_input, coord, 0);
    return array(
        unpack2x16float(packed.x).x, unpack2x16float(packed.x).y,
        unpack2x16float(packed.y).x, unpack2x16float(packed.y).y,
        unpack2x16float(packed.z).x, unpack2x16float(packed.z).y,
        unpack2x16float(packed.w).x, unpack2x16float(packed.w).y
    );
}

fn pack_channels(values: array<f32, 8>) -> vec4<u32> {
    return vec4(
        pack2x16float(vec2(values[0], values[1])),
        pack2x16float(vec2(values[2], values[3])),
        pack2x16float(vec2(values[4], values[5])),
        pack2x16float(vec2(values[6], values[7]))
    );
}
```
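The same packing is easy to prototype CPU-side; this numpy sketch mirrors the `pack2x16float`/`unpack2x16float` semantics (illustrative only, not project code):

```python
import numpy as np

def pack_f16x8(values):
    """8 floats -> 4 uint32, like four pack2x16float calls (x in low bits)."""
    h = np.asarray(values, dtype=np.float16).view(np.uint16)
    return h[0::2].astype(np.uint32) | (h[1::2].astype(np.uint32) << 16)

def unpack_f16x8(packed):
    """4 uint32 -> 8 floats, inverse of pack_f16x8."""
    p = np.asarray(packed, dtype=np.uint32)
    lo = (p & 0xFFFF).astype(np.uint16).view(np.float16)
    hi = (p >> 16).astype(np.uint16).view(np.float16)
    return np.stack([lo, hi], axis=1).reshape(-1).astype(np.float32)

feats = [0.5, 0.25, 1.0, 0.0, 0.125, 0.75, -0.5, 1.0]
assert np.allclose(unpack_f16x8(pack_f16x8(feats)), feats)  # exact in f16
```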
---

## Training Workflow

### Script: `training/train_cnn_v2.py`

**Static Feature Extraction:**

```python
def compute_static_features(rgb, depth):
    """Generate 7D static features + bias dimension."""
    h, w = rgb.shape[:2]

    # RGBD channels
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # UV coordinates (normalized)
    uv_x = np.linspace(0, 1, w)[None, :].repeat(h, axis=0)
    uv_y = np.linspace(0, 1, h)[:, None].repeat(w, axis=1)

    # Periodic position encoding
    sin10_x = np.sin(10.0 * uv_x)

    # Bias dimension (always 1.0)
    bias = np.ones_like(r)

    # Stack: [R, G, B, D, uv.x, uv.y, sin10_x, bias]
    return np.stack([r, g, b, depth, uv_x, uv_y, sin10_x, bias], axis=-1)
```

**Network Definition:**

```python
class CNNv2(nn.Module):
    def __init__(self, kernels=(1, 3, 5), channels=(16, 8, 4)):
        super().__init__()

        # Input layer: 8D (7 features + bias) → channels[0]
        self.layer0 = nn.Conv2d(8, channels[0], kernel_size=kernels[0],
                                padding=kernels[0] // 2, bias=False)

        # Inner layer: (7 features + bias + C_prev) → C_next
        in_ch_1 = 8 + channels[0]  # static + layer0 output
        self.layer1 = nn.Conv2d(in_ch_1, channels[1], kernel_size=kernels[1],
                                padding=kernels[1] // 2, bias=False)

        # Output layer: (7 features + bias + C_last) → 4 (RGBA)
        in_ch_2 = 8 + channels[1]
        self.layer2 = nn.Conv2d(in_ch_2, 4, kernel_size=kernels[2],
                                padding=kernels[2] // 2, bias=False)

    def forward(self, static_features):
        # Layer 0: use the full 8D static features (includes bias)
        x0 = self.layer0(static_features)
        x0 = F.relu(x0)

        # Layer 1: concatenate static + layer0 output
        x1_input = torch.cat([static_features, x0], dim=1)
        x1 = self.layer1(x1_input)
        x1 = F.relu(x1)

        # Layer 2: concatenate static + layer1 output
        x2_input = torch.cat([static_features, x1], dim=1)
        output = self.layer2(x2_input)

        return torch.sigmoid(output)  # RGBA output in [0,1]
```

**Training Configuration:**

```python
# Hyperparameters
kernels = [1, 3, 5]    # Per-layer kernel sizes
channels = [16, 8, 4]  # Per-layer output channels
learning_rate = 1e-3
batch_size = 16
epochs = 5000

# Setup (standard PyTorch f32; loss/optimizer choices illustrative)
model = CNNv2(kernels, channels)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Training loop
for epoch in range(epochs):
    for rgb_batch, depth_batch, target_batch in dataloader:
        # Compute static features (batched, channels-first analogue
        # of the numpy helper above)
        static_feat = compute_static_features(rgb_batch, depth_batch)

        # Forward pass
        output = model(static_feat)
        loss = criterion(output, target_batch)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

**Checkpoint Format:**

```python
torch.save({
    'state_dict': model.state_dict(),  # f32 weights
    'config': {
        'kernels': [1, 3, 5],
        'channels': [16, 8, 4],
        'features': ['R', 'G', 'B', 'D', 'uv.x', 'uv.y', 'sin10_x', 'bias']
    },
    'epoch': epoch,
    'loss': loss.item()
}, f'checkpoints/checkpoint_epoch_{epoch}.pth')
```
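The loop above assumes a patch-based dataloader (see the Phase 3 checklist). A minimal `Dataset` sketch under assumed data layout (HWC float32 triplets; `PatchDataset` and its fields are our names, layout/permutation details omitted):

```python
import torch
from torch.utils.data import Dataset

class PatchDataset(Dataset):
    """Random crops from (input RGB, depth, target RGB) image triplets."""
    def __init__(self, triplets, patch=64):
        self.triplets = triplets  # list of (rgb, depth, target) HxWxC arrays
        self.patch = patch

    def __len__(self):
        return len(self.triplets)

    def __getitem__(self, idx):
        rgb, depth, target = self.triplets[idx]
        h, w = rgb.shape[:2]
        p = self.patch
        y = torch.randint(0, h - p + 1, (1,)).item()
        x = torch.randint(0, w - p + 1, (1,)).item()
        crop = lambda a: torch.from_numpy(a[y:y + p, x:x + p]).float()
        return crop(rgb), crop(depth), crop(target)
```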
---

## Export Workflow

### Script: `training/export_cnn_v2_shader.py`

**Process:**
1. Load checkpoint (f32 PyTorch weights)
2. Extract layer configs (kernels, channels)
3. Quantize weights to float16: `weights_f16 = weights_f32.astype(np.float16)`
4. Generate a WGSL shader per layer
5. Write to `workspaces/<workspace>/shaders/cnn_v2_*.wgsl`

**Example Generated Shader:**

```wgsl
// cnn_v2_layer_0.wgsl - auto-generated from checkpoint_epoch_5000.pth

const KERNEL_SIZE: u32 = 1u;
const IN_CHANNELS: u32 = 8u;  // 7 features + bias
const OUT_CHANNELS: u32 = 16u;

// Weights quantized to float16 (stored as f32 literals in the shader)
const weights: array<f32, 128> = array(
    0.123047, -0.089844, 0.234375, 0.456055, ...
);

@group(0) @binding(0) var static_features: texture_2d<u32>;
@group(0) @binding(1) var output_texture: texture_storage_2d<rgba32uint, write>;

@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    // Load static features (8D)
    let static_feat = get_static_features(vec2<i32>(id.xy));

    // Convolution (1×1 kernel = pointwise)
    var output: array<f32, OUT_CHANNELS>;
    for (var c: u32 = 0u; c < OUT_CHANNELS; c++) {
        var sum: f32 = 0.0;
        for (var k: u32 = 0u; k < IN_CHANNELS; k++) {
            sum += weights[c * IN_CHANNELS + k] * static_feat[k];
        }
        output[c] = max(0.0, sum); // ReLU activation
    }

    // Pack and store (8×f16 per texel; with 16 output channels the
    // generated code spans two texels - the first 8 are shown here)
    textureStore(output_texture, vec2<i32>(id.xy), pack_f16x8(output));
}
```

**Float16 Quantization:**
- Training uses f32 throughout (PyTorch standard)
- Export converts to `np.float16`, then back to f32 for WGSL literals
- **Expected discrepancy:** <0.1% MSE (acceptable)
- Validation via `validate_cnn_v2.sh` compares outputs
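Steps 3-4 are mechanical; a condensed sketch of the f16 round-trip and literal emission (ours, not the actual `export_cnn_v2_shader.py`; it also prints a per-tensor relative MSE to sanity-check the <0.1% expectation):

```python
import numpy as np
import torch

ckpt = torch.load("checkpoints/checkpoint_epoch_5000.pth", map_location="cpu")

for name, w in ckpt["state_dict"].items():
    w32 = w.numpy().reshape(-1)
    # Quantize to f16, then back to f32 for the WGSL literals
    w16 = w32.astype(np.float16).astype(np.float32)
    rel_mse = np.mean((w16 - w32) ** 2) / max(np.mean(w32 ** 2), 1e-12)
    print(f"{name}: {w32.size} weights, relative MSE {rel_mse:.2e}")

    literals = ", ".join(f"{v:.6f}" for v in w16)
    wgsl = f"const weights: array<f32, {w16.size}> = array({literals});"
```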
---

## Validation Workflow

### Script: `scripts/validate_cnn_v2.sh`

**End-to-end pipeline:**
```bash
./scripts/validate_cnn_v2.sh checkpoints/checkpoint_epoch_5000.pth
```

**Steps automated:**
1. Export checkpoint → .wgsl shaders
2. Rebuild the `cnn_test` tool
3. Process test images with CNN v2
4. Display input/output results

**Usage:**
```bash
# Basic usage
./scripts/validate_cnn_v2.sh checkpoint.pth

# Custom paths
./scripts/validate_cnn_v2.sh checkpoint.pth \
    -i my_test_images/ \
    -o results/ \
    -b build_release

# Skip rebuild (iterate on checkpoint only)
./scripts/validate_cnn_v2.sh checkpoint.pth --skip-build

# Skip export (iterate on test images only)
./scripts/validate_cnn_v2.sh checkpoint.pth --skip-export

# Show help
./scripts/validate_cnn_v2.sh --help
```

**Options:**
- `-b, --build-dir DIR` - build directory (default: `build`)
- `-w, --workspace NAME` - workspace name (default: `main`)
- `-i, --images DIR` - test images directory (default: `training/validation`)
- `-o, --output DIR` - output directory (default: `validation_results`)
- `--skip-build` - use existing `cnn_test` binary
- `--skip-export` - use existing .wgsl shaders
- `-h, --help` - show full usage

**Output:**
- Input images: `<test_images_dir>/*.png`
- Output images: `<output_dir>/*_output.png`
- Opens the results directory in the system file browser

---

## Implementation Checklist

### Phase 1: Shaders (Core Infrastructure)

- [ ] `workspaces/main/shaders/cnn_v2_static.wgsl` - static features compute
  - [ ] RGBD sampling from framebuffer
  - [ ] UV coordinate calculation
  - [ ] sin(10\*uv.x) computation
  - [ ] Bias dimension (constant 1.0)
  - [ ] Float16 packing via `pack2x16float()`
  - [ ] Output to `texture_storage_2d<rgba32uint>`

- [ ] `workspaces/main/shaders/cnn_v2_layer_template.wgsl` - layer template
  - [ ] Static features unpacking
  - [ ] Previous-layer unpacking (8×f16)
  - [ ] Convolution implementation (1×1, 3×3, 5×5)
  - [ ] ReLU activation
  - [ ] Output packing (8×f16)
  - [ ] Proper padding handling

### Phase 2: C++ Effect Class

- [ ] `src/gpu/effects/cnn_v2_effect.h` - header
  - [ ] Class declaration inheriting from `PostProcessEffect`
  - [ ] Static features texture member
  - [ ] Layer textures vector
  - [ ] Pipeline and bind group members

- [ ] `src/gpu/effects/cnn_v2_effect.cc` - implementation
  - [ ] Constructor: load shaders, create textures
  - [ ] `init()`: create pipelines, bind groups
  - [ ] `render()`: multi-pass execution
    - [ ] Pass 0: compute static features
    - [ ] Passes 1-N: CNN layers
    - [ ] Final: composite to output
  - [ ] Proper resource cleanup

- [ ] Integration
  - [ ] Add to `src/gpu/demo_effects.h` includes
  - [ ] Add `cnn_v2_effect.cc` to `CMakeLists.txt` (headless + normal)
  - [ ] Add shaders to `workspaces/main/assets.txt`
  - [ ] Add to `src/tests/gpu/test_demo_effects.cc`

### Phase 3: Training Pipeline

- [ ] `training/train_cnn_v2.py` - training script
  - [ ] Static feature extraction function
  - [ ] CNNv2 PyTorch model class
  - [ ] Patch-based dataloader
  - [ ] Training loop with checkpointing
  - [ ] Command-line argument parsing
  - [ ] Inference mode (ground-truth generation)

- [ ] `training/export_cnn_v2_shader.py` - export script
  - [ ] Checkpoint loading
  - [ ] Weight extraction and f16 quantization
  - [ ] Per-layer WGSL generation
  - [ ] File output to workspace shaders/
  - [ ] Metadata preservation

### Phase 4: Tools & Validation

- [ ] `scripts/validate_cnn_v2.sh` - end-to-end validation
  - [ ] Command-line argument parsing
  - [ ] Shader export orchestration
  - [ ] Build orchestration
  - [ ] Batch image processing
  - [ ] Results display

- [ ] `src/tools/cnn_test_main.cc` - tool updates
  - [ ] Add `--cnn-version v2` flag
  - [ ] CNNv2Effect instantiation path
  - [ ] Static features pass execution
  - [ ] Multi-layer processing

### Phase 5: Documentation

- [ ] `doc/HOWTO.md` - usage guide
  - [ ] Training section (CNN v2)
  - [ ] Export section
  - [ ] Validation section
  - [ ] Examples

- [ ] `README.md` - project overview update
  - [ ] Mention CNN v2 capability

---

## File Structure

### New Files

```
# Shaders (generated by export script)
workspaces/main/shaders/cnn_v2_static.wgsl   # Static features compute
workspaces/main/shaders/cnn_v2_layer_0.wgsl  # Input layer (generated)
workspaces/main/shaders/cnn_v2_layer_1.wgsl  # Inner layer (generated)
workspaces/main/shaders/cnn_v2_layer_2.wgsl  # Output layer (generated)

# C++ implementation
src/gpu/effects/cnn_v2_effect.h   # Effect class header
src/gpu/effects/cnn_v2_effect.cc  # Effect implementation

# Python training/export
training/train_cnn_v2.py          # Training script
training/export_cnn_v2_shader.py  # Shader generator
training/validation/              # Test images directory

# Scripts
scripts/validate_cnn_v2.sh        # End-to-end validation

# Documentation
doc/CNN_V2.md                     # This file
```

### Modified Files

```
src/gpu/demo_effects.h              # Add CNNv2Effect include
CMakeLists.txt                      # Add cnn_v2_effect.cc
workspaces/main/assets.txt          # Add cnn_v2 shaders
workspaces/main/timeline.seq        # Optional: add CNNv2Effect
src/tests/gpu/test_demo_effects.cc  # Add CNNv2 test case
src/tools/cnn_test_main.cc          # Add --cnn-version v2
doc/HOWTO.md                        # Add CNN v2 sections
TODO.md                             # Add CNN v2 task
```

### Unchanged (v1 Preserved)

```
training/train_cnn.py               # Original training
src/gpu/effects/cnn_effect.*        # Original effect
workspaces/main/shaders/cnn_*.wgsl  # Original shaders
```

---

## Performance Characteristics

### Static Features Compute
- **Cost:** ~0.1 ms @ 1080p
- **Frequency:** once per frame
- **Operations:** sin(), texture sampling, packing

### CNN Layers (example 3-layer)
- **Layer0 (1×1, 8→16):** ~0.3 ms
- **Layer1 (3×3, 24→8):** ~0.8 ms
- **Layer2 (5×5, 16→4):** ~1.2 ms
- **Total:** ~2.4 ms @ 1080p

### Memory Usage
- Static features: 1920×1080×8×2 bytes ≈ 33 MB (f16)
- Layer buffers: 1920×1080×16×2 bytes ≈ 66 MB (max 16 channels)
- Weights: ~6.8 KB (f16, in shader code)
- **Total GPU memory:** ~100 MB

---

## Size Budget

### CNN v1 vs v2

| Metric | v1 | v2 | Delta |
|--------|----|----|-------|
| Weights (count) | 800 | 3456 | +2656 |
| Storage (f32) | 3.2 KB | 13.5 KB | +10.3 KB |
| Storage (f16) | N/A | 6.8 KB | +6.8 KB |
| Shader code | ~500 lines | ~800 lines | +300 lines |

### Mitigation Strategies

**Reduce channels** (percentages re-derived in the sketch after this list):
- [16,8,4] → [8,4,4] saves ~47% of weights
- [16,8,4] → [4,4,4] saves ~52% of weights

**Smaller kernels:**
- [1,3,5] → [1,3,3] saves ~30% of weights
- [1,3,5] → [1,1,3] saves ~74% of weights

**Quantization:**
- int8 weights: saves 75% vs f32 (requires quantization-aware training)
- 4-bit weights: saves 87.5% vs f32 (extreme, needs research)

**Target:** keep CNN v2 under 10 KB for the 64k demo constraint
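These savings can be re-derived with the `cnn_v2_weight_count` helper sketched in the Layer Structure section (our helper, assumed in scope):

```python
base = cnn_v2_weight_count([1, 3, 5], [16, 8, 4])  # 3456 weights
for kernels, channels in [([1, 3, 5], [8, 4, 4]),
                          ([1, 3, 5], [4, 4, 4]),
                          ([1, 3, 3], [16, 8, 4]),
                          ([1, 1, 3], [16, 8, 4])]:
    n = cnn_v2_weight_count(kernels, channels)
    print(kernels, channels, n, f"saves {1 - n / base:.0%}")
# -> 1840 (47%), 1664 (52%), 2432 (30%), 896 (74%)
```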
---

## Future Extensions

### More Features (uint8 Packing)

```wgsl
// 16 uint8 features per texel (texture_storage_2d<rgba32uint>,
// 4 per u32, e.g. via pack4x8unorm)
// [R, G, B, D, uv.x, uv.y, sin10.x, sin10.y,
//  sin20.x, sin20.y, dx, dy, gray_mip1, gray_mip2, variance, bias]
```
- Trades precision for quantity
- Requires quantization-aware training

### Temporal Features

- Previous-frame RGBA (motion awareness)
- Optical flow vectors
- Requires a multi-frame buffer

### Learned Position Encodings

- Replace hand-crafted sin(10\*uv) with learned embeddings
- Requires a separate embedding network
- Similar to NeRF position encoding

### Dynamic Architecture

- Runtime kernel-size selection based on scene
- Conditional layer execution (skip connections)
- Layer pruning for performance

---

## References

- **v1 Implementation:** `src/gpu/effects/cnn_effect.*`
- **Training Guide:** `doc/HOWTO.md` (CNN Training section)
- **Test Tool:** `doc/CNN_TEST_TOOL.md`
- **Shader System:** `doc/SEQUENCE.md`
- **Size Measurement:** `doc/SIZE_MEASUREMENT.md`

---

## Appendix: Design Decisions

### Why Bias as a Static Feature?

**Alternatives considered:**
1. Separate bias array per layer (Option B)
2. Bias as a static feature = 1.0 (Option A, chosen)

**Decision rationale:**
- Simpler shader code (fewer bindings)
- Standard NN formulation (augmented input)
- Saves 56-112 bytes per model
- 7 learned features sufficient for the initial v2 implementation
- Can extend to uint8 packing if more than 7 features are needed

### Why Float16 for Weights?

**Alternatives considered:**
1. Keep f32 (larger, more accurate)
2. Use f16 (smaller, GPU-native)
3. Use int8 (smallest, needs quantization-aware training)

**Decision rationale:**
- f16 saves 50% vs f32 (critical for the 64k target)
- GPU-native support (`pack2x16float` in WGSL)
- <0.1% accuracy loss (acceptable)
- Simpler than int8 quantization

### Why Multi-Frequency Position Encoding?

**Inspiration:** NeRF (Neural Radiance Fields)

**Benefits:**
- Helps the network learn high-frequency details
- Better than raw UV coordinates alone
- Small footprint (1D per frequency)

**Future:** add sin(20\*uv), sin(40\*uv) if more than 7 feature slots become available
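For concreteness, the multi-frequency scheme this points toward, sketched in numpy (frequencies beyond 10 are illustrative, not committed values):

```python
import numpy as np

def positional_encoding(u, freqs=(10.0, 20.0, 40.0)):
    """Map a normalized coordinate u in [0,1] to periodic features,
    one sin per frequency, NeRF-style."""
    u = np.asarray(u, dtype=np.float32)
    return np.stack([np.sin(f * u) for f in freqs], axis=-1)

# CNN v2 ships only the first term, sin(10*u), as static feature slot 6:
uv_x = np.linspace(0.0, 1.0, 1920, dtype=np.float32)
enc = positional_encoding(uv_x)  # shape (1920, 3)
```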
-d "$TEST_IMAGES_DIR" ]] && error "Test images directory not found: $TEST_IMAGES_DIR" + +SHADER_DIR="workspaces/$WORKSPACE/shaders" +CNN_TEST="$BUILD_DIR/cnn_test" + +log "Configuration:" +log " Checkpoint: $CHECKPOINT" +log " Build dir: $BUILD_DIR" +log " Workspace: $WORKSPACE" +log " Shader dir: $SHADER_DIR" +log " Test images: $TEST_IMAGES_DIR" +log " Output dir: $OUTPUT_DIR" +echo + +# Step 1: Export shaders +if [[ "$SKIP_EXPORT" = false ]]; then + log "Step 1/4: Exporting shaders from checkpoint..." + [[ ! -d "$SHADER_DIR" ]] && error "Shader directory not found: $SHADER_DIR" + + if [[ ! -f "training/export_cnn_v2_shader.py" ]]; then + error "Export script not found: training/export_cnn_v2_shader.py" + fi + + $PYTHON training/export_cnn_v2_shader.py "$CHECKPOINT" --output-dir "$SHADER_DIR" \ + || error "Shader export failed" + + log "✓ Shaders exported to $SHADER_DIR" +else + warn "Skipping shader export (using existing .wgsl files)" +fi + +# Step 2: Rebuild cnn_test +if [[ "$SKIP_BUILD" = false ]]; then + log "Step 2/4: Rebuilding cnn_test..." + + cmake --build "$BUILD_DIR" -j4 --target cnn_test \ + || error "Build failed" + + log "✓ Built $CNN_TEST" +else + warn "Skipping build (using existing binary)" +fi + +[[ ! -x "$CNN_TEST" ]] && error "cnn_test not found or not executable: $CNN_TEST" + +# Step 3: Process test images +log "Step 3/4: Processing test images..." +mkdir -p "$OUTPUT_DIR" + +# Find PNG images +mapfile -t IMAGES < <(find "$TEST_IMAGES_DIR" -maxdepth 1 -name "*.png" | sort) +[[ ${#IMAGES[@]} -eq 0 ]] && error "No PNG images found in $TEST_IMAGES_DIR" + +log "Found ${#IMAGES[@]} test image(s)" + +for img in "${IMAGES[@]}"; do + basename=$(basename "$img" .png) + output="$OUTPUT_DIR/${basename}_output.png" + + log " Processing $basename.png..." + "$CNN_TEST" "$img" "$output" --cnn-version v2 \ + || warn " Failed: $basename.png" +done + +log "✓ Processed ${#IMAGES[@]} image(s)" + +# Step 4: Display results +log "Step 4/4: Opening results..." + +case "$(uname -s)" in + Darwin*) + open "$OUTPUT_DIR" + ;; + Linux*) + if command -v xdg-open &> /dev/null; then + xdg-open "$OUTPUT_DIR" + else + log "Results saved to: $OUTPUT_DIR" + fi + ;; + MINGW*|MSYS*|CYGWIN*) + explorer "$OUTPUT_DIR" + ;; + *) + log "Results saved to: $OUTPUT_DIR" + ;; +esac + +log "✓ Validation complete!" +log "" +log "Results:" +log " Input: $TEST_IMAGES_DIR/*.png" +log " Output: $OUTPUT_DIR/*_output.png" |
