From 161a59fa50bb92e3664c389fa03b95aefe349b3f Mon Sep 17 00:00:00 2001
From: skal
Date: Sun, 15 Feb 2026 18:44:17 +0100
Subject: refactor(cnn): isolate CNN v2 to cnn_v2/ subdirectory

Move all CNN v2 files to a dedicated cnn_v2/ directory to prepare for
CNN v3 development. Zero functional changes.

Structure:
- cnn_v2/src/      - C++ effect implementation
- cnn_v2/shaders/  - WGSL shaders (6 files)
- cnn_v2/weights/  - Binary weights (3 files)
- cnn_v2/training/ - Python training scripts (4 files)
- cnn_v2/scripts/  - Shell scripts (train_cnn_v2_full.sh)
- cnn_v2/tools/    - Validation tools (HTML)
- cnn_v2/docs/     - Documentation (4 markdown files)

Changes:
- Update CMake source list to cnn_v2/src/cnn_v2_effect.cc
- Update assets.txt with relative paths to cnn_v2/
- Update includes to ../../cnn_v2/src/cnn_v2_effect.h
- Add PROJECT_ROOT resolution to Python/shell scripts
- Update doc references in HOWTO.md, TODO.md
- Add cnn_v2/README.md

Verification: 34/34 tests passing, demo runs correctly.

Co-Authored-By: Claude Sonnet 4.5
---
 cnn_v2/docs/CNN_V2.md               | 813 ++++++++++++++++++++++++++++++++++++
 cnn_v2/docs/CNN_V2_BINARY_FORMAT.md | 235 +++++++++++
 cnn_v2/docs/CNN_V2_DEBUG_TOOLS.md   | 143 +++++++
 cnn_v2/docs/CNN_V2_WEB_TOOL.md      | 348 +++++++++++++++
 4 files changed, 1539 insertions(+)
 create mode 100644 cnn_v2/docs/CNN_V2.md
 create mode 100644 cnn_v2/docs/CNN_V2_BINARY_FORMAT.md
 create mode 100644 cnn_v2/docs/CNN_V2_DEBUG_TOOLS.md
 create mode 100644 cnn_v2/docs/CNN_V2_WEB_TOOL.md

diff --git a/cnn_v2/docs/CNN_V2.md b/cnn_v2/docs/CNN_V2.md
new file mode 100644
index 0000000..b7fd6f8
--- /dev/null
+++ b/cnn_v2/docs/CNN_V2.md
@@ -0,0 +1,813 @@
# CNN v2: Parametric Static Features

**Technical Design Document**

---

## Overview

CNN v2 extends the original CNN post-processing effect with parametric static features, enabling richer spatial and frequency-domain inputs for improved visual quality.

**Key improvements over v1:**
- 7D static feature input (vs. 4D RGBD)
- Multi-frequency position encoding (NeRF-style)
- Configurable mip level for p0-p3 parametric features (0-3)
- Per-layer configurable kernel sizes (1×1, 3×3, 5×5)
- Variable channel counts per layer
- Float16 weight storage (~3.2 KB for 3-layer model)
- Bias integrated as static feature dimension
- Storage buffer architecture (dynamic layer count)
- Binary weight format v2 for runtime loading
- Sigmoid activation for layer 0 and final layer (smooth [0,1] mapping)

**Status:** ✅ Complete. Sigmoid activation, stable training, validation tools operational.

**Breaking Change:**
- Models trained with the previous `clamp()` activation are incompatible; retraining is required.
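
To make the activation change concrete, here is a minimal PyTorch sketch of the two output activations (illustrative only, not the training script itself):

```python
import torch

x = torch.linspace(-3.0, 3.0, 7, requires_grad=True)

# v1-style output: hard clamp. The gradient of clamp(x, 0, 1) is zero
# wherever x < 0 or x > 1, so saturated outputs stop receiving updates
# ("gradient blocking at boundaries").
y_v1 = torch.clamp(x, 0.0, 1.0)

# v2 output: sigmoid. Its gradient sigmoid(x) * (1 - sigmoid(x)) is
# nonzero everywhere, so saturated outputs can still be pulled back
# toward the target during training.
y_v2 = torch.sigmoid(x)
```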
+ +**TODO:** +- 8-bit quantization with QAT for 2× size reduction (~1.6 KB) + +--- + +## Architecture + +### Pipeline Overview + +``` +Input RGBD → Static Features Compute → CNN Layers → Output RGBA + └─ computed once/frame ─┘ └─ multi-pass ─┘ +``` + +**Detailed Data Flow:** + +``` + ┌─────────────────────────────────────────┐ + │ Static Features (computed once) │ + │ 8D: p0,p1,p2,p3,uv_x,uv_y,sin10x,bias │ + └──────────────┬──────────────────────────┘ + │ + │ 8D (broadcast to all layers) + ├───────────────────────────┐ + │ │ + ┌──────────────┐ │ │ + │ Input RGBD │──────────────┤ │ + │ 4D │ 4D │ │ + └──────────────┘ │ │ + ▼ │ + ┌────────────┐ │ + │ Layer 0 │ (12D input) │ + │ (CNN) │ = 4D + 8D │ + │ 12D → 4D │ │ + └─────┬──────┘ │ + │ 4D output │ + │ │ + ├───────────────────────────┘ + │ │ + ▼ │ + ┌────────────┐ │ + │ Layer 1 │ (12D input) │ + │ (CNN) │ = 4D + 8D │ + │ 12D → 4D │ │ + └─────┬──────┘ │ + │ 4D output │ + │ │ + ├───────────────────────────┘ + ▼ │ + ... │ + │ │ + ▼ │ + ┌────────────┐ │ + │ Layer N │ (12D input) │ + │ (output) │◄──────────────────┘ + │ 12D → 4D │ + └─────┬──────┘ + │ 4D (RGBA) + ▼ + Output +``` + +**Key Points:** +- Static features computed once, broadcast to all CNN layers +- Each layer: previous 4D output + 8D static → 12D input → 4D output +- Ping-pong buffering between layers +- Layer 0 special case: uses input RGBD instead of previous layer output + +**Static Features Texture:** +- Name: `static_features` +- Format: `texture_storage_2d` (4×u32) +- Data: 8 float16 values packed via `pack2x16float()` +- Computed once per frame, read by all CNN layers +- Lifetime: Entire frame (all CNN layer passes) + +**CNN Layers:** +- Layer 0: input RGBD (4D) + static (8D) = 12D → 4 channels +- Layer 1+: previous output (4D) + static (8D) = 12D → 4 channels +- All layers: uniform 12D input, 4D output (ping-pong buffer) +- Storage: `texture_storage_2d` (4 channels as 2×f16 pairs) + +**Activation Functions:** +- Layer 0 & final layer: `sigmoid(x)` for smooth [0,1] mapping +- Middle layers: `ReLU` (max(0, x)) +- Rationale: Sigmoid prevents gradient blocking at boundaries, enabling better convergence +- Breaking change: Models trained with `clamp(x, 0, 1)` are incompatible, retrain required + +--- + +## Static Features (7D + 1 bias) + +### Feature Layout + +**8 float16 values per pixel:** + +```wgsl +// Slot 0-3: Parametric features (p0, p1, p2, p3) +// Sampled from configurable mip level (0=original, 1=half, 2=quarter, 3=eighth) +// Training sets mip_level via --mip-level flag, stored in binary format v2 +let p0 = ...; // RGB.r from selected mip level +let p1 = ...; // RGB.g from selected mip level +let p2 = ...; // RGB.b from selected mip level +let p3 = ...; // Depth or RGB channel from mip level + +// Slot 4-5: UV coordinates (normalized screen space) +let uv_x = coord.x / resolution.x; // Horizontal position [0,1] +let uv_y = coord.y / resolution.y; // Vertical position [0,1] + +// Slot 6: Multi-frequency position encoding +let sin20_y = sin(20.0 * uv_y); // Periodic feature (frequency=20, vertical) + +// Slot 7: Bias dimension (always 1.0) +let bias = 1.0; // Learned bias per output channel + +// Packed storage: [p0, p1, p2, p3, uv.x, uv.y, sin(20*uv.y), 1.0] +``` + +### Input Channel Mapping + +**Weight tensor layout (12 input channels per layer):** + +| Input Channel | Feature | Description | +|--------------|---------|-------------| +| 0-3 | Previous layer output | 4D RGBA from prior CNN layer (or input RGBD for Layer 0) | +| 4-11 | Static features | 8D: p0, p1, p2, 
p3, uv_x, uv_y, sin20_y, bias | + +**Static feature channel details:** +- Channel 4 → p0 (RGB.r from mip level) +- Channel 5 → p1 (RGB.g from mip level) +- Channel 6 → p2 (RGB.b from mip level) +- Channel 7 → p3 (depth or RGB channel from mip level) +- Channel 8 → p4 (uv_x: normalized horizontal position) +- Channel 9 → p5 (uv_y: normalized vertical position) +- Channel 10 → p6 (sin(20*uv_y): periodic encoding) +- Channel 11 → p7 (bias: constant 1.0) + +**Note:** When generating identity weights, p4-p7 correspond to input channels 8-11, not 4-7. + +### Feature Rationale + +| Feature | Dimension | Purpose | Priority | +|---------|-----------|---------|----------| +| p0-p3 | 4D | Parametric auxiliary features (mips, gradients, etc.) | Essential | +| UV coords | 2D | Spatial position awareness | Essential | +| sin(20\*uv.y) | 1D | Periodic position encoding (vertical) | Medium | +| Bias | 1D | Learned bias (standard NN) | Essential | + +**Note:** Input image RGBD (mip 0) fed only to Layer 0. Subsequent layers see static features + previous layer output. + +**Why bias as static feature:** +- Simpler shader code (single weight array) +- Standard NN formulation: y = Wx (x includes bias term) +- Saves 56-112 bytes (no separate bias buffer) +- 7 features sufficient for initial implementation + +### Future Feature Extensions + +**Option: Additional encodings:** +- `sin(40*uv.y)` - Higher frequency encoding +- `gray_mip1` - Multi-scale luminance +- `dx`, `dy` - Sobel gradients +- `variance` - Local texture measure +- `laplacian` - Edge detection + +**Option: uint8 packing (16+ features):** +```wgsl +// texture_storage_2d stores 16 uint8 values +// Trade precision for feature count +// [R, G, B, D, uv.x, uv.y, sin10.x, sin10.y, +// sin20.x, sin20.y, dx, dy, gray_mip1, gray_mip2, var, bias] +``` +Requires quantization-aware training. + +--- + +## Layer Structure + +### Example 3-Layer Network + +``` +Layer 0: input RGBD (4D) + static (8D) = 12D → 4 channels (3×3 kernel) +Layer 1: previous (4D) + static (8D) = 12D → 4 channels (3×3 kernel) +Layer 2: previous (4D) + static (8D) = 12D → 4 channels (3×3 kernel, output RGBA) +``` + +**Output:** 4 channels (RGBA). Training targets preserve alpha from target images. 
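
The parameter counts in the next section follow mechanically from this uniform 12D→4D layout; a few lines of Python reproduce them (sanity check only):

```python
IN_CH, OUT_CH = 12, 4  # 4D previous output + 8D static -> 4D RGBA

def weights_per_layer(kernel_size: int) -> int:
    # Bias-free convolution: in_channels * k * k * out_channels
    return IN_CH * kernel_size * kernel_size * OUT_CH

total = sum(weights_per_layer(k) for k in (3, 3, 3))
print(total, 2 * total)  # 1296 weights, 2592 bytes as f16
```
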
### Weight Calculations

**Per-layer weights (uniform 12D→4D, 3×3 kernels):**
```
Layer 0: 12 × 3 × 3 × 4 = 432 weights
Layer 1: 12 × 3 × 3 × 4 = 432 weights
Layer 2: 12 × 3 × 3 × 4 = 432 weights
Total:   1296 weights
```

**Storage sizes:**
- f32: 1296 × 4 = 5,184 bytes (~5.1 KB)
- f16: 1296 × 2 = 2,592 bytes (~2.5 KB) ✓ **recommended**

**Comparison to v1:**
- v1: ~800 weights (3.2 KB f32)
- v2: ~1296 weights (2.5 KB f16)
- **Uniform architecture, smaller than v1 f32**

### Kernel Size Guidelines

**1×1 kernel (pointwise):**
- No spatial context, channel mixing only
- Weights: `12 × 4 = 48` per layer
- Use for: Fast inference, channel remapping

**3×3 kernel (standard conv):**
- Local spatial context (recommended)
- Weights: `12 × 9 × 4 = 432` per layer
- Use for: Most layers (balanced quality/size)

**5×5 kernel (large receptive field):**
- Wide spatial context
- Weights: `12 × 25 × 4 = 1200` per layer
- Use for: Output layer, fine detail enhancement

### Channel Storage (4×f16 per texel)

```wgsl
@group(0) @binding(1) var layer_input: texture_2d<u32>;

fn unpack_channels(coord: vec2<i32>) -> vec4<f32> {
    let packed = textureLoad(layer_input, coord, 0);
    let v0 = unpack2x16float(packed.x); // [ch0, ch1]
    let v1 = unpack2x16float(packed.y); // [ch2, ch3]
    return vec4<f32>(v0.x, v0.y, v1.x, v1.y);
}

fn pack_channels(values: vec4<f32>) -> vec4<u32> {
    return vec4<u32>(
        pack2x16float(vec2<f32>(values.x, values.y)),
        pack2x16float(vec2<f32>(values.z, values.w)),
        0u, // Unused
        0u  // Unused
    );
}
```

---

## Training Workflow

### Script: `training/train_cnn_v2.py`

**Static Feature Extraction:**

```python
import cv2
import numpy as np

def compute_static_features(rgb, depth, mip_level=0):
    """Generate parametric features (8D: p0-p3 + spatial).

    Args:
        mip_level: 0=original, 1=half res, 2=quarter res, 3=eighth res
    """
    h, w = rgb.shape[:2]

    # Generate mip level for p0-p3 (downsample then upsample)
    if mip_level > 0:
        mip_rgb = rgb.copy()
        for _ in range(mip_level):
            mip_rgb = cv2.pyrDown(mip_rgb)
        for _ in range(mip_level):
            mip_rgb = cv2.pyrUp(mip_rgb)
        if mip_rgb.shape[:2] != (h, w):
            mip_rgb = cv2.resize(mip_rgb, (w, h), interpolation=cv2.INTER_LINEAR)
    else:
        mip_rgb = rgb

    # Parametric features from mip level
    p0, p1, p2, p3 = mip_rgb[..., 0], mip_rgb[..., 1], mip_rgb[..., 2], depth

    # UV coordinates (normalized)
    uv_x = np.linspace(0, 1, w)[None, :].repeat(h, axis=0)
    uv_y = np.linspace(0, 1, h)[:, None].repeat(w, axis=1)

    # Multi-frequency position encoding
    sin10_x = np.sin(10.0 * uv_x)

    # Bias dimension (always 1.0)
    bias = np.ones_like(p0)

    # Stack: [p0, p1, p2, p3, uv.x, uv.y, sin10_x, bias]
    return np.stack([p0, p1, p2, p3, uv_x, uv_y, sin10_x, bias], axis=-1)
```

**Network Definition:**

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNv2(nn.Module):
    def __init__(self, kernel_sizes, num_layers=3):
        super().__init__()
        if isinstance(kernel_sizes, int):
            kernel_sizes = [kernel_sizes] * num_layers
        self.kernel_sizes = kernel_sizes
        self.layers = nn.ModuleList()

        # All layers: 12D input (4 prev + 8 static) → 4D output
        for kernel_size in kernel_sizes:
            self.layers.append(
                nn.Conv2d(12, 4, kernel_size=kernel_size,
                          padding=kernel_size // 2, bias=False)
            )

    def forward(self, input_rgbd, static_features):
        # Layer 0: input RGBD (4D) + static (8D) = 12D
        x = torch.cat([input_rgbd, static_features], dim=1)
        x = self.layers[0](x)
        x = torch.sigmoid(x)  # Soft [0,1] for layer 0

        # Layer 1+: previous output (4D) + static (8D) = 12D
        for i in range(1, 
len(self.layers)):
            x_input = torch.cat([x, static_features], dim=1)
            x = self.layers[i](x_input)
            if i < len(self.layers) - 1:
                x = F.relu(x)
            else:
                x = torch.sigmoid(x)  # Soft [0,1] for final layer

        return x  # RGBA output
```

**Training Configuration:**

```python
# Hyperparameters
kernel_sizes = [3, 3, 3]  # Per-layer kernel sizes (e.g., [1,3,5])
num_layers = 3            # Number of CNN layers
mip_level = 0             # Mip level for p0-p3: 0=orig, 1=half, 2=quarter, 3=eighth
grayscale_loss = False    # Compute loss on grayscale (Y) instead of RGBA
learning_rate = 1e-3
batch_size = 16
epochs = 5000

# Dataset: Input RGB, Target RGBA (preserves alpha channel from image)
# Model outputs RGBA, loss compares all 4 channels (or grayscale if --grayscale-loss)

# Training loop (standard PyTorch f32)
for epoch in range(epochs):
    for rgb_batch, depth_batch, target_batch in dataloader:
        # Compute static features (8D) with mip level
        static_feat = compute_static_features(rgb_batch, depth_batch, mip_level)

        # Input RGBD (4D)
        input_rgbd = torch.cat([rgb_batch, depth_batch.unsqueeze(1)], dim=1)

        # Forward pass
        output = model(input_rgbd, static_feat)

        # Loss computation (grayscale or RGBA)
        if grayscale_loss:
            # Convert RGBA to grayscale: Y = 0.299*R + 0.587*G + 0.114*B
            output_gray = 0.299 * output[:, 0:1] + 0.587 * output[:, 1:2] + 0.114 * output[:, 2:3]
            target_gray = 0.299 * target_batch[:, 0:1] + 0.587 * target_batch[:, 1:2] + 0.114 * target_batch[:, 2:3]
            loss = criterion(output_gray, target_gray)
        else:
            loss = criterion(output, target_batch)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

**Checkpoint Format:**

```python
torch.save({
    'state_dict': model.state_dict(),  # f32 weights
    'config': {
        'kernel_sizes': [3, 3, 3],  # Per-layer kernel sizes
        'num_layers': 3,
        'mip_level': 0,             # Mip level used for p0-p3
        'grayscale_loss': False,    # Whether grayscale loss was used
        'features': ['p0', 'p1', 'p2', 'p3', 'uv.x', 'uv.y', 'sin10_x', 'bias']
    },
    'epoch': epoch,
    'loss': loss.item()
}, f'checkpoints/checkpoint_epoch_{epoch}.pth')
```

---

## Export Workflow

### Script: `training/export_cnn_v2_shader.py`

**Process:**
1. Load checkpoint (f32 PyTorch weights)
2. Extract layer configs (kernels, channels)
3. Quantize weights to float16: `weights_f16 = weights_f32.astype(np.float16)`
4. Generate WGSL shader per layer
5. Write to `workspaces/<workspace>/shaders/cnn_v2/cnn_v2_*.wgsl`

**Example Generated Shader:**

```wgsl
// cnn_v2_layer_0.wgsl - Auto-generated from checkpoint_epoch_5000.pth

const KERNEL_SIZE: u32 = 1u;
const IN_CHANNELS: u32 = 8u; // 7 features + bias
const OUT_CHANNELS: u32 = 16u;

// Weights quantized to float16 (stored as f32 in shader)
const weights: array<f32, 128> = array<f32, 128>(
    0.123047, -0.089844, 0.234375, 0.456055, ... 
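    /* remaining weight literals elided in this excerpt; the export script
       emits the full IN_CHANNELS × OUT_CHANNELS (128-value) array here */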
+); + +@group(0) @binding(0) var static_features: texture_2d; +@group(0) @binding(1) var output_texture: texture_storage_2d; + +@compute @workgroup_size(8, 8) +fn main(@builtin(global_invocation_id) id: vec3) { + // Load static features (8D) + let static_feat = get_static_features(vec2(id.xy)); + + // Convolution (1×1 kernel = pointwise) + var output: array; + for (var c: u32 = 0u; c < OUT_CHANNELS; c++) { + var sum: f32 = 0.0; + for (var k: u32 = 0u; k < IN_CHANNELS; k++) { + sum += weights[c * IN_CHANNELS + k] * static_feat[k]; + } + output[c] = max(0.0, sum); // ReLU activation + } + + // Pack and store (8×f16 per texel) + textureStore(output_texture, vec2(id.xy), pack_f16x8(output)); +} +``` + +**Float16 Quantization:** +- Training uses f32 throughout (PyTorch standard) +- Export converts to np.float16, then back to f32 for WGSL literals +- **Expected discrepancy:** <0.1% MSE (acceptable) +- Validation via HTML tool (see below) + +--- + +## Validation Workflow + +### HTML Tool: `tools/cnn_v2_test/index.html` + +**WebGPU-based testing tool** with layer visualization. + +**Usage:** +1. Open `tools/cnn_v2_test/index.html` in browser +2. Drop `.bin` weights file (from `export_cnn_v2_weights.py`) +3. Drop PNG test image +4. View results with layer inspection + +**Features:** +- Live CNN inference with WebGPU +- Layer-by-layer visualization (static features + all CNN layers) +- Weight visualization (per-layer kernels) +- View modes: CNN output, original, diff (×10) +- Blend control for comparing with original + +**Export weights:** +```bash +./training/export_cnn_v2_weights.py checkpoints/checkpoint_epoch_100.pth \ + --output-weights workspaces/main/cnn_v2_weights.bin +``` + +See `doc/CNN_V2_WEB_TOOL.md` for detailed documentation + +--- + +## Implementation Checklist + +### Phase 1: Shaders (Core Infrastructure) + +- [ ] `workspaces/main/shaders/cnn_v2/cnn_v2_static.wgsl` - Static features compute + - [ ] RGBD sampling from framebuffer + - [ ] UV coordinate calculation + - [ ] sin(10\*uv.x) computation + - [ ] Bias dimension (constant 1.0) + - [ ] Float16 packing via `pack2x16float()` + - [ ] Output to `texture_storage_2d` + +- [ ] `workspaces/main/shaders/cnn_v2/cnn_v2_layer_template.wgsl` - Layer template + - [ ] Static features unpacking + - [ ] Previous layer unpacking (8×f16) + - [ ] Convolution implementation (1×1, 3×3, 5×5) + - [ ] ReLU activation + - [ ] Output packing (8×f16) + - [ ] Proper padding handling + +### Phase 2: C++ Effect Class + +- [ ] `src/effects/cnn_v2_effect.h` - Header + - [ ] Class declaration inheriting from `PostProcessEffect` + - [ ] Static features texture member + - [ ] Layer textures vector + - [ ] Pipeline and bind group members + +- [ ] `src/effects/cnn_v2_effect.cc` - Implementation + - [ ] Constructor: Load shaders, create textures + - [ ] `init()`: Create pipelines, bind groups + - [ ] `render()`: Multi-pass execution + - [ ] Pass 0: Compute static features + - [ ] Pass 1-N: CNN layers + - [ ] Final: Composite to output + - [ ] Proper resource cleanup + +- [ ] Integration + - [ ] Add to `src/gpu/demo_effects.h` includes + - [ ] Add `cnn_v2_effect.cc` to `CMakeLists.txt` (headless + normal) + - [ ] Add shaders to `workspaces/main/assets.txt` + - [ ] Add to `src/tests/gpu/test_demo_effects.cc` + +### Phase 3: Training Pipeline + +- [ ] `training/train_cnn_v2.py` - Training script + - [ ] Static feature extraction function + - [ ] CNNv2 PyTorch model class + - [ ] Patch-based dataloader + - [ ] Training loop with checkpointing + - [ ] Command-line 
argument parsing + - [ ] Inference mode (ground truth generation) + +- [ ] `training/export_cnn_v2_shader.py` - Export script + - [ ] Checkpoint loading + - [ ] Weight extraction and f16 quantization + - [ ] Per-layer WGSL generation + - [ ] File output to workspace shaders/ + - [ ] Metadata preservation + +### Phase 4: Tools & Validation + +- [x] HTML validation tool - WebGPU inference with layer visualization + - [ ] Command-line argument parsing + - [ ] Shader export orchestration + - [ ] Build orchestration + - [ ] Batch image processing + - [ ] Results display + +- [ ] `src/tools/cnn_test_main.cc` - Tool updates + - [ ] Add `--cnn-version v2` flag + - [ ] CNNv2Effect instantiation path + - [ ] Static features pass execution + - [ ] Multi-layer processing + +### Phase 5: Documentation + +- [ ] `doc/HOWTO.md` - Usage guide + - [ ] Training section (CNN v2) + - [ ] Export section + - [ ] Validation section + - [ ] Examples + +- [ ] `README.md` - Project overview update + - [ ] Mention CNN v2 capability + +--- + +## File Structure + +### New Files + +``` +# Shaders (generated by export script) +workspaces/main/shaders/cnn_v2/cnn_v2_static.wgsl # Static features compute +workspaces/main/shaders/cnn_v2/cnn_v2_layer_0.wgsl # Input layer (generated) +workspaces/main/shaders/cnn_v2/cnn_v2_layer_1.wgsl # Inner layer (generated) +workspaces/main/shaders/cnn_v2/cnn_v2_layer_2.wgsl # Output layer (generated) + +# C++ implementation +src/effects/cnn_v2_effect.h # Effect class header +src/effects/cnn_v2_effect.cc # Effect implementation + +# Python training/export +training/train_cnn_v2.py # Training script +training/export_cnn_v2_shader.py # Shader generator +training/validation/ # Test images directory + +# Validation +tools/cnn_v2_test/index.html # WebGPU validation tool + +# Documentation +doc/CNN_V2.md # This file +``` + +### Modified Files + +``` +src/gpu/demo_effects.h # Add CNNv2Effect include +CMakeLists.txt # Add cnn_v2_effect.cc +workspaces/main/assets.txt # Add cnn_v2 shaders +workspaces/main/timeline.seq # Optional: add CNNv2Effect +src/tests/gpu/test_demo_effects.cc # Add CNNv2 test case +src/tools/cnn_test_main.cc # Add --cnn-version v2 +doc/HOWTO.md # Add CNN v2 sections +TODO.md # Add CNN v2 task +``` + +### Unchanged (v1 Preserved) + +``` +training/train_cnn.py # Original training +src/effects/cnn_effect.* # Original effect +workspaces/main/shaders/cnn_*.wgsl # Original v1 shaders +``` + +--- + +## Performance Characteristics + +### Static Features Compute +- **Cost:** ~0.1ms @ 1080p +- **Frequency:** Once per frame +- **Operations:** sin(), texture sampling, packing + +### CNN Layers (Example 3-layer) +- **Layer0 (1×1, 8→16):** ~0.3ms +- **Layer1 (3×3, 23→8):** ~0.8ms +- **Layer2 (5×5, 15→4):** ~1.2ms +- **Total:** ~2.4ms @ 1080p + +### Memory Usage +- Static features: 1920×1080×8×2 = 33 MB (f16) +- Layer buffers: 1920×1080×16×2 = 66 MB (max 16 channels) +- Weights: ~6.4 KB (f16, in shader code) +- **Total GPU memory:** ~100 MB + +--- + +## Size Budget + +### CNN v1 vs v2 + +| Metric | v1 | v2 | Delta | +|--------|----|----|-------| +| Weights (count) | 800 | 3268 | +2468 | +| Storage (f32) | 3.2 KB | 13.1 KB | +9.9 KB | +| Storage (f16) | N/A | 6.5 KB | +6.5 KB | +| Shader code | ~500 lines | ~800 lines | +300 lines | + +### Mitigation Strategies + +**Reduce channels:** +- [16,8,4] → [8,4,4] saves ~50% weights +- [16,8,4] → [4,4,4] saves ~60% weights + +**Smaller kernels:** +- [1,3,5] → [1,3,3] saves ~30% weights +- [1,3,5] → [1,1,3] saves ~50% weights + +**Quantization:** +- 
int8 weights: saves 75% (requires QAT training) +- 4-bit weights: saves 87.5% (extreme, needs research) + +**Target:** Keep CNN v2 under 10 KB for 64k demo constraint + +--- + +## Future Extensions + +### Flexible Feature Layout (Binary Format v3) + +**TODO:** Support arbitrary feature vector layouts and ordering in binary format. + +**Current Limitation:** +- Feature layout hardcoded: `[p0, p1, p2, p3, uv_x, uv_y, sin10_x, bias]` +- Shader must match training script exactly +- Experimentation requires shader recompilation + +**Proposed Enhancement:** +- Add feature descriptor to binary format header +- Specify feature types, sources, and ordering +- Runtime shader generation or dynamic feature indexing +- Examples: `[R, G, B, dx, dy, uv_x, bias]` or `[mip1.r, mip2.g, laplacian, uv_x, sin20_x, bias]` + +**Benefits:** +- Training experiments without C++/shader changes +- A/B test different feature combinations +- Single binary format, multiple architectures +- Faster iteration on feature engineering + +**Implementation Options:** +1. **Static approach:** Generate shader code from descriptor at load time +2. **Dynamic approach:** Array-based indexing with feature map uniform +3. **Hybrid:** Precompile common layouts, fallback to dynamic + +See `doc/CNN_V2_BINARY_FORMAT.md` for proposed descriptor format. + +--- + +### More Features (uint8 Packing) + +```wgsl +// 16 uint8 features per texel (texture_storage_2d) +// [R, G, B, D, uv.x, uv.y, sin10.x, sin10.y, +// sin20.x, sin20.y, dx, dy, gray_mip1, gray_mip2, variance, bias] +``` +- Trade precision for quantity +- Requires quantization-aware training + +### Temporal Features + +- Previous frame RGBA (motion awareness) +- Optical flow vectors +- Requires multi-frame buffer + +### Learned Position Encodings + +- Replace hand-crafted sin(10\*uv) with learned embeddings +- Requires separate embedding network +- Similar to NeRF position encoding + +### Dynamic Architecture + +- Runtime kernel size selection based on scene +- Conditional layer execution (skip connections) +- Layer pruning for performance + +--- + +## References + +- **v1 Implementation:** `src/effects/cnn_effect.*` +- **Training Guide:** `doc/HOWTO.md` (CNN Training section) +- **Test Tool:** `doc/CNN_TEST_TOOL.md` +- **Shader System:** `doc/SEQUENCE.md` +- **Size Measurement:** `doc/SIZE_MEASUREMENT.md` + +--- + +## Appendix: Design Decisions + +### Why Bias as Static Feature? + +**Alternatives considered:** +1. Separate bias array per layer (Option B) +2. Bias as static feature = 1.0 (Option A, chosen) + +**Decision rationale:** +- Simpler shader code (fewer bindings) +- Standard NN formulation (augmented input) +- Saves 56-112 bytes per model +- 7 features sufficient for v1 implementation +- Can extend to uint8 packing if >7 features needed + +### Why Float16 for Weights? + +**Alternatives considered:** +1. Keep f32 (larger, more accurate) +2. Use f16 (smaller, GPU-native) +3. Use int8 (smallest, needs QAT) + +**Decision rationale:** +- f16 saves 50% vs f32 (critical for 64k target) +- GPU-native support (pack2x16float in WGSL) +- <0.1% accuracy loss (acceptable) +- Simpler than int8 quantization + +### Why Multi-Frequency Position Encoding? 
+ +**Inspiration:** NeRF (Neural Radiance Fields) + +**Benefits:** +- Helps network learn high-frequency details +- Better than raw UV coordinates +- Small footprint (1D per frequency) + +**Future:** Add sin(20\*uv), sin(40\*uv) if >7 features available + +--- + +## Related Documentation + +- `doc/CNN_V2_BINARY_FORMAT.md` - Binary weight file specification (.bin format) +- `doc/CNN_V2_WEB_TOOL.md` - WebGPU testing tool with layer visualization +- `doc/CNN_TEST_TOOL.md` - C++ offline validation tool (deprecated) +- `doc/HOWTO.md` - Training and validation workflows + +--- + +**Document Version:** 1.0 +**Last Updated:** 2026-02-12 +**Status:** Design approved, ready for implementation diff --git a/cnn_v2/docs/CNN_V2_BINARY_FORMAT.md b/cnn_v2/docs/CNN_V2_BINARY_FORMAT.md new file mode 100644 index 0000000..59c859d --- /dev/null +++ b/cnn_v2/docs/CNN_V2_BINARY_FORMAT.md @@ -0,0 +1,235 @@ +# CNN v2 Binary Weight Format Specification + +Binary format for storing trained CNN v2 weights with static feature architecture. + +**File Extension:** `.bin` +**Byte Order:** Little-endian +**Version:** 2.0 (supports mip-level for parametric features) +**Backward Compatible:** Version 1.0 files supported (mip_level=0) + +--- + +## File Structure + +**Version 2 (current):** +``` +┌─────────────────────┐ +│ Header (20 bytes) │ +├─────────────────────┤ +│ Layer Info │ +│ (20 bytes × N) │ +├─────────────────────┤ +│ Weight Data │ +│ (variable size) │ +└─────────────────────┘ +``` + +**Version 1 (legacy):** +``` +┌─────────────────────┐ +│ Header (16 bytes) │ +├─────────────────────┤ +│ Layer Info │ +│ (20 bytes × N) │ +├─────────────────────┤ +│ Weight Data │ +│ (variable size) │ +└─────────────────────┘ +``` + +--- + +## Header + +**Version 2 (20 bytes):** + +| Offset | Type | Field | Description | +|--------|------|----------------|--------------------------------------| +| 0x00 | u32 | magic | Magic number: `0x32_4E_4E_43` ("CNN2") | +| 0x04 | u32 | version | Format version (2 for current) | +| 0x08 | u32 | num_layers | Number of CNN layers (excludes static features) | +| 0x0C | u32 | total_weights | Total f16 weight count across all layers | +| 0x10 | u32 | mip_level | Mip level for p0-p3 features (0=original, 1=half, 2=quarter, 3=eighth) | + +**Version 1 (16 bytes) - Legacy:** + +| Offset | Type | Field | Description | +|--------|------|----------------|--------------------------------------| +| 0x00 | u32 | magic | Magic number: `0x32_4E_4E_43` ("CNN2") | +| 0x04 | u32 | version | Format version (1) | +| 0x08 | u32 | num_layers | Number of CNN layers | +| 0x0C | u32 | total_weights | Total f16 weight count | + +**Note:** Loaders should check version field and handle both formats. Version 1 files treated as mip_level=0. + +--- + +## Layer Info (20 bytes per layer) + +Repeated `num_layers` times: +- **Version 2:** Starting at offset 0x14 (20 bytes) +- **Version 1:** Starting at offset 0x10 (16 bytes) + +| Offset | Type | Field | Description | +|-------------|------|----------------|--------------------------------------| +| 0x00 | u32 | kernel_size | Convolution kernel dimension (3, 5, 7, etc.) | +| 0x04 | u32 | in_channels | Input channel count (includes 8 static features for Layer 1) | +| 0x08 | u32 | out_channels | Output channel count (max 8) | +| 0x0C | u32 | weight_offset | Weight array start index (f16 units, relative to weight data section) | +| 0x10 | u32 | weight_count | Number of f16 weights for this layer | + +**Layer Order:** Sequential (Layer 1, Layer 2, Layer 3, ...) 
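
The two tables above are enough to write a loader; below is a minimal Python reader sketch (field names follow the tables, weight packing is described in the next section, error handling omitted):

```python
import struct

import numpy as np

def read_cnn_v2(path):
    data = open(path, "rb").read()
    magic, version, num_layers, total_weights = struct.unpack_from("<4I", data, 0)
    assert magic == 0x324E4E43, "not a CNN2 file"  # "CNN2" in little-endian
    assert version in (1, 2), "unsupported version"
    header_size = 20 if version == 2 else 16
    mip_level = struct.unpack_from("<I", data, 16)[0] if version == 2 else 0

    fields = ("kernel_size", "in_channels", "out_channels",
              "weight_offset", "weight_count")
    layers = [dict(zip(fields, struct.unpack_from("<5I", data, header_size + 20 * i)))
              for i in range(num_layers)]

    # Weight data: u32-packed f16 pairs, low half first (see next section).
    # In a little-endian file this is byte-identical to a flat f16 stream.
    weights = np.frombuffer(data, dtype="<f2", count=total_weights,
                            offset=header_size + 20 * num_layers)
    return {"mip_level": mip_level, "layers": layers, "weights": weights}
```
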
+ +--- + +## Weight Data (variable size) + +Starts at offset: +- **Version 2:** `20 + (num_layers × 20)` +- **Version 1:** `16 + (num_layers × 20)` + +**Format:** Packed f16 pairs stored as u32 +**Packing:** `u32 = (f16_hi << 16) | f16_lo` +**Storage:** Sequential by layer, then by output channel, input channel, spatial position + +**Weight Indexing:** +``` +weight_idx = output_ch × (in_channels × kernel_size²) + + input_ch × kernel_size² + + (ky × kernel_size + kx) +``` + +Where: +- `output_ch` ∈ [0, out_channels) +- `input_ch` ∈ [0, in_channels) +- `ky`, `kx` ∈ [0, kernel_size) + +**Unpacking f16 from u32:** +```c +uint32_t packed = weights_buffer[weight_idx / 2]; +uint16_t f16_bits = (weight_idx % 2 == 0) ? (packed & 0xFFFF) : (packed >> 16); +``` + +--- + +## Example: 3-Layer Network (Version 2) + +**Configuration:** +- Mip level: 0 (original resolution) +- Layer 0: 12→4, kernel 3×3 (432 weights) +- Layer 1: 12→4, kernel 3×3 (432 weights) +- Layer 2: 12→4, kernel 3×3 (432 weights) + +**File Layout:** +``` +Offset Size Content +------ ---- ------- +0x00 20 Header (magic, version=2, layers=3, weights=1296, mip_level=0) +0x14 20 Layer 0 info (kernel=3, in=12, out=4, offset=0, count=432) +0x28 20 Layer 1 info (kernel=3, in=12, out=4, offset=432, count=432) +0x3C 20 Layer 2 info (kernel=3, in=12, out=4, offset=864, count=432) +0x50 2592 Weight data (1296 u32 packed f16 pairs) + ---- +Total: 2672 bytes (~2.6 KB) +``` + +--- + +## Static Features + +Not stored in .bin file (computed at runtime): + +**8D Input Features:** +1. **p0** - Parametric feature 0 (from mip level) +2. **p1** - Parametric feature 1 (from mip level) +3. **p2** - Parametric feature 2 (from mip level) +4. **p3** - Parametric feature 3 (depth or from mip level) +5. **UV_X** - Normalized x coordinate [0,1] +6. **UV_Y** - Normalized y coordinate [0,1] +7. **sin(20 × UV_Y)** - Spatial frequency encoding (vertical, frequency=20) +8. **1.0** - Bias term + +**Mip Level Usage (p0-p3):** +- `mip_level=0`: RGB from original resolution (mip 0) +- `mip_level=1`: RGB from half resolution (mip 1), upsampled +- `mip_level=2`: RGB from quarter resolution (mip 2), upsampled +- `mip_level=3`: RGB from eighth resolution (mip 3), upsampled + +**Layer 0** receives input RGBD (4D) + static features (8D) = 12D input → 4D output. +**Layer 1+** receive previous layer output (4D) + static features (8D) = 12D input → 4D output. + +--- + +## Validation + +**Magic Check:** +```c +uint32_t magic; +fread(&magic, 4, 1, fp); +if (magic != 0x32_4E_4E_43) { error("Invalid CNN v2 file"); } +``` + +**Version Check:** +```c +uint32_t version; +fread(&version, 4, 1, fp); +if (version != 1 && version != 2) { error("Unsupported version"); } +uint32_t header_size = (version == 1) ? 16 : 20; +``` + +**Size Check:** +```c +expected_size = header_size + (num_layers × 20) + (total_weights × 2); +if (file_size != expected_size) { error("Size mismatch"); } +``` + +**Weight Offset Sanity:** +```c +// Each layer's offset should match cumulative count +uint32_t cumulative = 0; +for (int i = 0; i < num_layers; i++) { + if (layers[i].weight_offset != cumulative) { error("Invalid offset"); } + cumulative += layers[i].weight_count; +} +if (cumulative != total_weights) { error("Total mismatch"); } +``` + +--- + +## Future Extensions + +**TODO: Flexible Feature Layout** + +Current limitation: Feature vector layout is hardcoded as `[p0, p1, p2, p3, uv_x, uv_y, sin10_x, bias]`. 
+ +Proposed enhancement for version 3: +- Add feature descriptor section to header +- Specify feature count, types, and ordering +- Support arbitrary 7D feature combinations (e.g., `[R, G, B, dx, dy, uv_x, bias]`) +- Allow runtime shader generation based on descriptor +- Enable experimentation without recompiling shaders + +Example descriptor format: +``` +struct FeatureDescriptor { + u32 feature_count; // Number of features (typically 7-8) + u32 feature_types[8]; // Type enum per feature + u32 feature_sources[8]; // Source enum (mip0, mip1, gradient, etc.) + u32 reserved[8]; // Future use +} +``` + +Benefits: +- Training can experiment with different feature combinations +- No shader recompilation needed +- Single binary format supports multiple architectures +- Easier A/B testing of feature effectiveness + +--- + +## Related Files + +- `training/export_cnn_v2_weights.py` - Binary export tool +- `src/effects/cnn_v2_effect.cc` - C++ loader +- `tools/cnn_v2_test/index.html` - WebGPU validator +- `doc/CNN_V2.md` - Architecture design diff --git a/cnn_v2/docs/CNN_V2_DEBUG_TOOLS.md b/cnn_v2/docs/CNN_V2_DEBUG_TOOLS.md new file mode 100644 index 0000000..8d1289a --- /dev/null +++ b/cnn_v2/docs/CNN_V2_DEBUG_TOOLS.md @@ -0,0 +1,143 @@ +# CNN v2 Debugging Tools + +Tools for investigating CNN v2 mismatch between HTML tool and cnn_test. + +--- + +## Identity Weight Generator + +**Purpose:** Generate trivial .bin files with identity passthrough for debugging. + +**Script:** `training/gen_identity_weights.py` + +**Usage:** +```bash +# 1×1 identity (default) +./training/gen_identity_weights.py workspaces/main/weights/cnn_v2_identity.bin + +# 3×3 identity +./training/gen_identity_weights.py workspaces/main/weights/cnn_v2_identity_3x3.bin --kernel-size 3 + +# Mix mode: 50-50 blend (0.5*p0+0.5*p4, etc) +./training/gen_identity_weights.py output.bin --mix + +# Static features only: p4→ch0, p5→ch1, p6→ch2, p7→ch3 +./training/gen_identity_weights.py output.bin --p47 + +# Custom mip level +./training/gen_identity_weights.py output.bin --kernel-size 1 --mip-level 2 +``` + +**Output:** +- Single layer, 12D→4D (4 input channels + 8 static features) +- Identity mode: Output Ch{0,1,2,3} = Input Ch{0,1,2,3} +- Mix mode (--mix): Output Ch{i} = 0.5*Input Ch{i} + 0.5*Input Ch{i+4} (50-50 blend, avoids overflow) +- Static mode (--p47): Output Ch{i} = Input Ch{i+4} (static features only, visualizes p4-p7) +- Minimal file size (~136 bytes for 1×1, ~904 bytes for 3×3) + +**Validation:** +Load in HTML tool or cnn_test - output should match input (RGB only, ignoring static features). + +--- + +## Composited Layer Visualization + +**Purpose:** Save current layer view as single composited image (4 channels side-by-side, grayscale). + +**Location:** HTML tool - "Layer Visualization" panel + +**Usage:** +1. Load image + weights in HTML tool +2. Select layer to visualize (Static 0-3, Static 4-7, Layer 0, Layer 1, etc.) +3. Click "Save Composited" button +4. Downloads PNG: `composited_layer{N}_{W}x{H}.png` + +**Output:** +- 4 channels stacked horizontally +- Grayscale representation +- Useful for comparing layer activations across tools + +--- + +## Debugging Strategy + +### Track a) Binary Conversion Chain + +**Hypothesis:** Conversion error in .bin ↔ base64 ↔ Float32Array + +**Test:** +1. Generate identity weights: + ```bash + ./training/gen_identity_weights.py workspaces/main/weights/test_identity.bin + ``` + +2. Load in HTML tool - output should match input RGB + +3. 
If mismatch: + - Check Python export: f16 packing in `export_cnn_v2_weights.py` line 105 + - Check HTML parsing: `unpackF16()` in `index.html` line 805-815 + - Check weight indexing: `get_weight()` shader function + +**Key locations:** +- Python: `np.float16` → `view(np.uint32)` (line 105 of export script) +- JS: `DataView` → `unpackF16()` → manual f16 decode (line 773-803) +- WGSL: `unpack2x16float()` built-in (line 492 of shader) + +### Track b) Layer Visualization + +**Purpose:** Confirm layer outputs match between HTML and C++ + +**Method:** +1. Run identical input through both tools +2. Save composited layers from HTML tool +3. Compare with cnn_test output +4. Use identity weights to isolate weight loading from computation + +### Track c) Trivial Test Case + +**Use identity weights to test:** +- Weight loading (binary parsing) +- Feature generation (static features) +- Convolution (should be passthrough) +- Output packing + +**Expected behavior:** +- Input RGB → Output RGB (exact match) +- Static features ignored (all zeros in identity matrix) + +--- + +## Known Issues + +### ~~Layer 0 Visualization Scale~~ [FIXED] + +**Issue:** Layer 0 output displayed at 0.5× brightness (divided by 2). + +**Cause:** Line 1530 used `vizScale = 0.5` for all CNN layers, but Layer 0 is clamped [0,1] and doesn't need dimming. + +**Fix:** Use scale 1.0 for Layer 0 output (layerIdx=1), 0.5 only for middle layers (ReLU, unbounded). + +### Remaining Mismatch + +**Current:** HTML tool and cnn_test produce different outputs for same input/weights. + +**Suspects:** +1. F16 unpacking difference (CPU vs GPU vs JS) +2. Static feature generation (RGBD, UV, sin encoding) +3. Convolution kernel iteration order +4. Output packing/unpacking + +**Next steps:** +1. Test with identity weights (eliminates weight loading) +2. Compare composited layer outputs +3. Add debug visualization for static features +4. Hex dump comparison (first 8 pixels) - use `--debug-hex` flag in cnn_test + +--- + +## Related Documentation + +- `doc/CNN_V2.md` - CNN v2 architecture +- `doc/CNN_V2_WEB_TOOL.md` - HTML tool documentation +- `doc/CNN_TEST_TOOL.md` - cnn_test CLI tool +- `training/export_cnn_v2_weights.py` - Binary export format diff --git a/cnn_v2/docs/CNN_V2_WEB_TOOL.md b/cnn_v2/docs/CNN_V2_WEB_TOOL.md new file mode 100644 index 0000000..b6f5b0b --- /dev/null +++ b/cnn_v2/docs/CNN_V2_WEB_TOOL.md @@ -0,0 +1,348 @@ +# CNN v2 Web Testing Tool + +Browser-based WebGPU tool for validating CNN v2 inference with layer visualization and weight inspection. 
+ +**Location:** `tools/cnn_v2_test/index.html` + +--- + +## Status (2026-02-13) + +**Working:** +- ✅ WebGPU initialization and device setup +- ✅ Binary weight file parsing (v1 and v2 formats) +- ✅ Automatic mip-level detection from binary format v2 +- ✅ Weight statistics (min/max per layer) +- ✅ UI layout with collapsible panels +- ✅ Mode switching (Activations/Weights tabs) +- ✅ Canvas context management (2D for weights, WebGPU for activations) +- ✅ Weight visualization infrastructure (layer selection, grid layout) +- ✅ Layer naming matches codebase convention (Layer 0, Layer 1, Layer 2) +- ✅ Static features split visualization (Static 0-3, Static 4-7) +- ✅ All layers visible including output layer (Layer 2) +- ✅ Video playback support (MP4, WebM) with frame-by-frame controls +- ✅ Video looping (automatic continuous playback) +- ✅ Mip level selection (p0-p3 features at different resolutions) + +**Recent Changes (Latest):** +- Binary format v2 support: Reads mip_level from 20-byte header +- Backward compatible: v1 (16-byte header) → mip_level=0 +- Auto-update UI dropdown when loading weights with mip_level +- Display mip_level in metadata panel +- Code refactoring: Extracted FULLSCREEN_QUAD_VS shader (reused 3× across pipelines) +- Added helper methods: `getDimensions()`, `setVideoControlsEnabled()` +- Improved code organization with section headers and comments +- Moved Mip Level selector to bottom of left sidebar (removed "Features (p0-p3)" label) +- Added `loop` attribute to video element for automatic continuous playback + +**Previous Fixes:** +- Fixed Layer 2 not appearing (was excluded from layerOutputs due to isOutput check) +- Fixed canvas context switching (force clear before recreation) +- Added Static 0-3 / Static 4-7 buttons to view all 8 static feature channels +- Aligned naming with train_cnn_v2.py/.wgsl: Layer 0, Layer 1, Layer 2 (not Layer 1, 2, 3) +- Disabled Static buttons in weights mode (no learnable weights) + +**Known Issues:** +- Layer activation visualization may show black if texture data not properly unpacked +- Weight kernel display depends on correct 2D context creation after canvas recreation + +--- + +## Architecture + +### File Structure +- Single-file HTML tool (~1100 lines) +- Embedded shaders: STATIC_SHADER, CNN_SHADER, DISPLAY_SHADER, LAYER_VIZ_SHADER +- Shared WGSL component: FULLSCREEN_QUAD_VS (reused across render pipelines) +- **Embedded default weights:** DEFAULT_WEIGHTS_B64 (base64-encoded binary v2) + - Current: 4 layers (3×3, 5×5, 3×3, 3×3), 2496 f16 weights, mip_level=2 + - Source: `workspaces/main/weights/cnn_v2_weights.bin` + - Updates: Re-encode binary with `base64 -i ` and update constant +- Pure WebGPU (no external dependencies) + +### Code Organization + +**Recent Refactoring (2026-02-13):** +- Extracted `FULLSCREEN_QUAD_VS` constant: Reused fullscreen quad vertex shader (2 triangles covering NDC) +- Added helper methods to CNNTester class: + - `getDimensions()`: Returns current source dimensions (video or image) + - `setVideoControlsEnabled(enabled)`: Centralized video control enable/disable +- Consolidated duplicate vertex shader code (used in mipmap generation, display, layer visualization) +- Added section headers in JavaScript for better navigation +- Improved inline comments explaining shader architecture + +**Benefits:** +- Reduced code duplication (~40 lines saved) +- Easier maintenance (single source of truth for fullscreen quad) +- Clearer separation of concerns + +### Key Components + +**1. 
Weight Parsing** +- Reads binary format v2: header (20B) + layer info (20B×N) + f16 weights +- Backward compatible with v1: header (16B), mip_level defaults to 0 +- Computes min/max per layer via f16 unpacking +- Stores `{ layers[], weights[], mipLevel, fileSize }` +- Auto-sets UI mip-level dropdown from loaded weights + +**2. CNN Pipeline** +- Static features computation (RGBD + UV + sin + bias → 7D packed) +- Layer-by-layer convolution with storage buffer weights +- Ping-pong buffers for intermediate results +- Copy to persistent textures for visualization + +**3. Visualization Modes** + +**Activations Mode:** +- 4 grayscale views per layer (channels 0-3 of up to 8 total) +- WebGPU compute → unpack f16 → scale → grayscale +- Auto-scale: Static features = 1.0, CNN layers = 0.2 +- Static features: Shows R,G,B,D (first 4 of 8: RGBD+UV+sin+bias) +- CNN layers: Shows first 4 output channels + +**Weights Mode:** +- 2D canvas rendering per output channel +- Shows all input kernels horizontally +- Normalized by layer min/max → [0, 1] → grayscale +- 20px cells, 2px padding between kernels + +### Texture Management + +**Persistent Storage (layerTextures[]):** +- One texture per layer output (static + all CNN layers) +- `rgba32uint` format (packed f16 data) +- `COPY_DST` usage for storing results + +**Compute Buffers (computeTextures[]):** +- 2 textures for ping-pong computation +- Reused across all layers +- `COPY_SRC` usage for copying to persistent storage + +**Pipeline:** +``` +Static pass → copy to layerTextures[0] +For each CNN layer i: + Compute (ping-pong) → copy to layerTextures[i+1] +``` + +### Layer Indexing + +**UI Layer Buttons:** +- "Static" → layerOutputs[0] (7D input features) +- "Layer 1" → layerOutputs[1] (CNN layer 1 output, uses weights.layers[0]) +- "Layer 2" → layerOutputs[2] (CNN layer 2 output, uses weights.layers[1]) +- "Layer N" → layerOutputs[N] (CNN layer N output, uses weights.layers[N-1]) + +**Weights Table:** +- "Layer 1" → weights.layers[0] (first CNN layer weights) +- "Layer 2" → weights.layers[1] (second CNN layer weights) +- "Layer N" → weights.layers[N-1] + +**Consistency:** Both UI and weights table use same numbering (1, 2, 3...) for CNN layers. + +--- + +## Known Issues + +### Issue #1: Layer Activations Show Black + +**Symptom:** +- All 4 channel canvases render black +- UV gradient test (debug mode 10) works +- Raw packed data test (mode 11) shows black +- Unpacked f16 test (mode 12) shows black + +**Diagnosis:** +- Texture access works (UV gradient visible) +- Texture data is all zeros (packed.x = 0) +- Textures being read are empty + +**Root Cause:** +- `copyTextureToTexture` operations may not be executing +- Possible ordering issue (copies not submitted before visualization) +- Alternative: textures created with wrong usage flags + +**Investigation Steps Taken:** +1. Added `onSubmittedWorkDone()` wait before visualization +2. Verified texture creation with `COPY_SRC` and `COPY_DST` flags +3. Confirmed separate texture allocation per layer (no aliasing) +4. Added debug shader modes to isolate issue + +**Next Steps:** +- Verify encoder contains copy commands (add debug logging) +- Check if compute passes actually write data (add known-value test) +- Test copyTextureToTexture in isolation +- Consider CPU readback to verify texture contents + +### Issue #2: Weight Visualization Empty + +**Symptom:** +- Canvases created with correct dimensions (logged) +- No visual output (black canvases) +- Console logs show method execution + +**Potential Causes:** +1. 
Weight indexing calculation incorrect +2. Canvas not properly attached to DOM when rendering +3. 2D context operations not flushing +4. Min/max normalization producing black (all values equal?) + +**Debug Added:** +- Comprehensive logging of dimensions, indices, ranges +- Canvas context check before rendering + +**Next Steps:** +- Add test rendering (fixed gradient) to verify 2D context works +- Log sample weight values to verify data access +- Check if canvas is visible in DOM inspector +- Verify min/max calculation produces valid range + +--- + +## UI Layout + +### Header +- Controls: Blend slider, Depth input, View mode display +- Drop zone for .bin weight files + +### Content Area + +**Left Sidebar (300px):** +1. Drop zone for .bin weight files +2. Weights Info panel (file size, layer table with min/max) +3. Weights Visualization panel (per-layer kernel display) +4. **Mip Level selector** (bottom) - Select p0/p1/p2 for static features + +**Main Canvas (center):** +- CNN output display with video controls (Play/Pause, Frame ◄/►) +- Supports both PNG images and video files (MP4, WebM) +- Video loops automatically for continuous playback + +**Right Sidebar (panels):** +1. **Layer Visualization Panel** (top, flex: 1) + - Layer selection buttons (Static 0-3, Static 4-7, Layer 0, Layer 1, ...) + - 2×2 grid of channel views (grayscale activations) + - 4× zoom view at bottom + +### Footer +- Status line (GPU timing, dimensions, mode) +- Console log (scrollable, color-coded) + +--- + +## Shader Details + +### LAYER_VIZ_SHADER + +**Purpose:** Display single channel from packed layer texture + +**Inputs:** +- `@binding(0) layer_tex: texture_2d` - Packed f16 layer data +- `@binding(1) viz_params: vec2` - (channel_idx, scale) + +**Debug Modes:** +- Channel 10: UV gradient (texture coordinate test) +- Channel 11: Raw packed u32 data +- Channel 12: First unpacked f16 value + +**Normal Operation:** +- Unpack all 8 f16 channels from rgba32uint +- Select channel by index (0-7) +- Apply scale factor (1.0 for static, 0.2 for CNN) +- Clamp to [0, 1] and output grayscale + +**Scale Rationale:** +- Static features (RGBD, UV): already in [0, 1] range +- CNN activations: post-ReLU [0, ~5], need scaling for visibility + +--- + +## Binary Weight Format + +See `doc/CNN_V2_BINARY_FORMAT.md` for complete specification. + +**Quick Summary:** +- Header: 16 bytes (magic, version, layer count, total weights) +- Layer info: 20 bytes × N (kernel size, channels, offsets) +- Weights: Packed f16 pairs as u32 + +--- + +## Testing Workflow + +### Load & Parse +1. Drop PNG image → displays original +2. Drop .bin weights → parses and shows info table +3. Auto-runs CNN pipeline + +### Verify Pipeline +1. Check console for "Running CNN pipeline" +2. Verify "Completed in Xms" +3. Check "Layer visualization ready: N layers" + +### Debug Activations +1. Select "Activations" tab +2. Click layer buttons to switch +3. Check console for texture/canvas logs +4. If black: note which debug modes work (UV vs data) + +### Debug Weights +1. Select "Weights" tab +2. Click Layer 1 or Layer 2 (Layer 0 has no weights) +3. Check console for "Visualizing Layer N weights" +4. Check canvas dimensions logged +5. 
Verify weight range is non-trivial (not [0, 0]) + +--- + +## Integration with Main Project + +**Training Pipeline:** +```bash +# Generate weights +./training/train_cnn_v2.py --export-binary + +# Test in browser +open tools/cnn_v2_test/index.html +# Drop: workspaces/main/cnn_v2_weights.bin +# Drop: training/input/test.png +``` + +**Validation:** +- Compare against demo CNNv2Effect (visual check) +- Verify layer count matches binary file +- Check weight ranges match training logs + +--- + +## Future Enhancements + +- [ ] Fix layer activation visualization (black texture issue) +- [ ] Fix weight kernel display (empty canvas issue) +- [ ] Add per-channel auto-scaling (compute min/max from visible data) +- [ ] Export rendered outputs (download PNG) +- [ ] Side-by-side comparison with original +- [ ] Heatmap mode (color-coded activations) +- [ ] Weight statistics overlay (mean, std, sparsity) +- [ ] Batch processing (multiple images in sequence) +- [ ] Integration with Python training (live reload) + +--- + +## Code Metrics + +- Total lines: ~1100 +- JavaScript: ~700 lines +- WGSL shaders: ~300 lines +- HTML/CSS: ~100 lines + +**Dependencies:** None (pure WebGPU + HTML5) + +--- + +## Related Files + +- `doc/CNN_V2.md` - CNN v2 architecture and design +- `doc/CNN_TEST_TOOL.md` - C++ offline testing tool (deprecated) +- `training/train_cnn_v2.py` - Training script with binary export +- `workspaces/main/cnn_v2_weights.bin` - Trained weights -- cgit v1.2.3