| author | skal <pascal.massimino@gmail.com> | 2026-02-15 18:44:17 +0100 |
| committer | skal <pascal.massimino@gmail.com> | 2026-02-15 18:44:17 +0100 |
| commit | 161a59fa50bb92e3664c389fa03b95aefe349b3f (patch) | |
| tree | 71548f64b2bdea958388f9063b74137659d70306 /doc | |
| parent | 9c3b72c710bf1ffa7e18f7c7390a425d57487eba (diff) | |
refactor(cnn): isolate CNN v2 to cnn_v2/ subdirectory
Move all CNN v2 files to dedicated cnn_v2/ directory to prepare for CNN v3 development. Zero functional changes.
Structure:
- cnn_v2/src/ - C++ effect implementation
- cnn_v2/shaders/ - WGSL shaders (6 files)
- cnn_v2/weights/ - Binary weights (3 files)
- cnn_v2/training/ - Python training scripts (4 files)
- cnn_v2/scripts/ - Shell scripts (train_cnn_v2_full.sh)
- cnn_v2/tools/ - Validation tools (HTML)
- cnn_v2/docs/ - Documentation (4 markdown files)
Changes:
- Update CMake source list to cnn_v2/src/cnn_v2_effect.cc
- Update assets.txt with relative paths to cnn_v2/
- Update includes to ../../cnn_v2/src/cnn_v2_effect.h
- Add PROJECT_ROOT resolution to Python/shell scripts
- Update doc references in HOWTO.md, TODO.md
- Add cnn_v2/README.md
Verification: 34/34 tests passing, demo runs correctly.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
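The PROJECT_ROOT resolution mentioned in the change list is typically done relative to the script's own path, so the relocated scripts work from any working directory. A minimal Python sketch of the idea — the variable name and directory depth are illustrative assumptions, not taken from this commit:

```python
from pathlib import Path

# e.g. cnn_v2/training/train_cnn_v2.py -> two parents up to the repository root
# (assumed layout; adjust parents[...] to the script's actual depth).
PROJECT_ROOT = Path(__file__).resolve().parents[2]

# Paths are then built from the root instead of the current working directory.
WEIGHTS = PROJECT_ROOT / "workspaces" / "main" / "weights" / "cnn_v2_weights.bin"
```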
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/CNN_V2.md | 813 |
| -rw-r--r-- | doc/CNN_V2_BINARY_FORMAT.md | 235 |
| -rw-r--r-- | doc/CNN_V2_DEBUG_TOOLS.md | 143 |
| -rw-r--r-- | doc/CNN_V2_WEB_TOOL.md | 348 |
| -rw-r--r-- | doc/HOWTO.md | 32 |
5 files changed, 16 insertions, 1555 deletions
diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
deleted file mode 100644
index b7fd6f8..0000000
--- a/doc/CNN_V2.md
+++ /dev/null
@@ -1,813 +0,0 @@
-# CNN v2: Parametric Static Features
-
-**Technical Design Document**
-
----
-
-## Overview
-
-CNN v2 extends the original CNN post-processing effect with parametric static features, enabling richer spatial and frequency-domain inputs for improved visual quality.
-
-**Key improvements over v1:**
-- 7D static feature input (vs 4D RGBD)
-- Multi-frequency position encoding (NeRF-style)
-- Configurable mip-level for p0-p3 parametric features (0-3)
-- Per-layer configurable kernel sizes (1×1, 3×3, 5×5)
-- Variable channel counts per layer
-- Float16 weight storage (~2.5 KB for the 3-layer model)
-- Bias integrated as a static feature dimension
-- Storage buffer architecture (dynamic layer count)
-- Binary weight format v2 for runtime loading
-- Sigmoid activation for layer 0 and the final layer (smooth [0,1] mapping)
-
-**Status:** ✅ Complete. Sigmoid activation, stable training, and validation tools operational.
-
-**Breaking Change:**
-- Models trained with `clamp()` are incompatible. Retraining is required.
-
-**TODO:**
-- 8-bit quantization with QAT for a 2× size reduction (~1.3 KB)
-
----
-
-## Architecture
-
-### Pipeline Overview
-
-```
-Input RGBD → Static Features Compute → CNN Layers → Output RGBA
-             └─ computed once/frame ─┘  └─ multi-pass ─┘
-```
-
-**Detailed Data Flow:**
-
-```
-        ┌─────────────────────────────────────────┐
-        │ Static Features (computed once)         │
-        │ 8D: p0,p1,p2,p3,uv_x,uv_y,sin10x,bias   │
-        └──────────────┬──────────────────────────┘
-                       │ 8D (broadcast to all layers)
-                       │
-  ┌──────────────┐     │
-  │ Input RGBD   │─4D─┐│
-  └──────────────┘    ▼▼
-               ┌────────────┐
-               │  Layer 0   │  (12D input)
-               │  (CNN)     │  = 4D + 8D
-               │  12D → 4D  │
-               └─────┬──────┘
-                     │ 4D output
-                     ▼  + static (8D)
-               ┌────────────┐
-               │  Layer 1   │  (12D input)
-               │  (CNN)     │  = 4D + 8D
-               │  12D → 4D  │
-               └─────┬──────┘
-                     │ 4D output
-                     ▼  + static (8D)
-                    ...
-                     │
-                     ▼  + static (8D)
-               ┌────────────┐
-               │  Layer N   │  (12D input)
-               │  (output)  │  = 4D + 8D
-               │  12D → 4D  │
-               └─────┬──────┘
-                     │ 4D (RGBA)
-                     ▼
-                  Output
-```
-
-**Key Points:**
-- Static features computed once, broadcast to all CNN layers
-- Each layer: previous 4D output + 8D static → 12D input → 4D output
-- Ping-pong buffering between layers
-- Layer 0 special case: uses input RGBD instead of previous layer output
-
-**Static Features Texture:**
-- Name: `static_features`
-- Format: `texture_storage_2d<rgba32uint, write>` (4×u32)
-- Data: 8 float16 values packed via `pack2x16float()`
-- Computed once per frame, read by all CNN layers
-- Lifetime: entire frame (all CNN layer passes)
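For reference, this texel packing can be reproduced on the CPU. A minimal NumPy sketch of the 8×f16 → 4×u32 layout (`pack2x16float()` places the first value in the low half); the helper names are illustrative, not repository code:

```python
import numpy as np

def pack_texel(features):
    """Pack 8 float16 features into 4 u32 words (2 f16 per word, low half first)."""
    f16 = np.array(features, dtype=np.float16)        # 8 values
    bits = f16.view(np.uint16).astype(np.uint32)      # raw f16 bit patterns
    return bits[0::2] | (bits[1::2] << 16)            # [r, g, b, a] as u32

def unpack_texel(words):
    """Inverse: 4 u32 words back to 8 float16 features."""
    words = np.asarray(words, dtype=np.uint32)
    lo = (words & 0xFFFF).astype(np.uint16)
    hi = (words >> 16).astype(np.uint16)
    return np.stack([lo, hi], axis=1).reshape(-1).view(np.float16)

vals = [0.5, 0.25, 0.125, 1.0, 0.0, 1.0, 0.333, 1.0]
texel = pack_texel(vals)
assert np.array_equal(unpack_texel(texel), np.array(vals, dtype=np.float16))
```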
-
-**CNN Layers:**
-- Layer 0: input RGBD (4D) + static (8D) = 12D → 4 channels
-- Layer 1+: previous output (4D) + static (8D) = 12D → 4 channels
-- All layers: uniform 12D input, 4D output (ping-pong buffer)
-- Storage: `texture_storage_2d<rgba32uint>` (4 channels as 2×f16 pairs)
-
-**Activation Functions:**
-- Layer 0 & final layer: `sigmoid(x)` for a smooth [0,1] mapping
-- Middle layers: `ReLU` (max(0, x))
-- Rationale: sigmoid prevents gradient blocking at the boundaries, enabling better convergence
-- Breaking change: models trained with `clamp(x, 0, 1)` are incompatible; retraining is required
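A quick numeric illustration of the gradient argument: `clamp` has zero derivative outside [0,1], so saturated activations stop learning, while the sigmoid's derivative is small but nonzero everywhere. A minimal NumPy check (illustrative sketch, not repository code):

```python
import numpy as np

def clamp_grad(x):
    # d/dx clamp(x, 0, 1): exactly zero outside [0, 1]
    return ((x > 0.0) & (x < 1.0)).astype(np.float32)

def sigmoid_grad(x):
    # d/dx sigmoid(x) = s * (1 - s): nonzero for every finite x
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

x = np.array([-2.0, 0.5, 2.0], dtype=np.float32)
print(clamp_grad(x))    # [0. 1. 0.]            -> out-of-range units get no gradient
print(sigmoid_grad(x))  # ≈ [0.105 0.235 0.105] -> gradient flows everywhere
```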
-
----
-
-## Static Features (7D + 1 bias)
-
-### Feature Layout
-
-**8 float16 values per pixel:**
-
-```wgsl
-// Slot 0-3: Parametric features (p0, p1, p2, p3)
-// Sampled from configurable mip level (0=original, 1=half, 2=quarter, 3=eighth)
-// Training sets mip_level via --mip-level flag, stored in binary format v2
-let p0 = ...; // RGB.r from selected mip level
-let p1 = ...; // RGB.g from selected mip level
-let p2 = ...; // RGB.b from selected mip level
-let p3 = ...; // Depth or RGB channel from mip level
-
-// Slot 4-5: UV coordinates (normalized screen space)
-let uv_x = coord.x / resolution.x; // Horizontal position [0,1]
-let uv_y = coord.y / resolution.y; // Vertical position [0,1]
-
-// Slot 6: Multi-frequency position encoding
-let sin20_y = sin(20.0 * uv_y); // Periodic feature (frequency=20, vertical)
-
-// Slot 7: Bias dimension (always 1.0)
-let bias = 1.0; // Learned bias per output channel
-
-// Packed storage: [p0, p1, p2, p3, uv.x, uv.y, sin(20*uv.y), 1.0]
-```
-
-### Input Channel Mapping
-
-**Weight tensor layout (12 input channels per layer):**
-
-| Input Channel | Feature | Description |
-|--------------|---------|-------------|
-| 0-3 | Previous layer output | 4D RGBA from prior CNN layer (or input RGBD for Layer 0) |
-| 4-11 | Static features | 8D: p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias |
-
-**Static feature channel details:**
-- Channel 4 → p0 (RGB.r from mip level)
-- Channel 5 → p1 (RGB.g from mip level)
-- Channel 6 → p2 (RGB.b from mip level)
-- Channel 7 → p3 (depth or RGB channel from mip level)
-- Channel 8 → p4 (uv_x: normalized horizontal position)
-- Channel 9 → p5 (uv_y: normalized vertical position)
-- Channel 10 → p6 (sin(20*uv_y): periodic encoding)
-- Channel 11 → p7 (bias: constant 1.0)
-
-**Note:** When generating identity weights, p4-p7 correspond to input channels 8-11, not 4-7.
-
-### Feature Rationale
-
-| Feature | Dimension | Purpose | Priority |
-|---------|-----------|---------|----------|
-| p0-p3 | 4D | Parametric auxiliary features (mips, gradients, etc.) | Essential |
-| UV coords | 2D | Spatial position awareness | Essential |
-| sin(20\*uv.y) | 1D | Periodic position encoding (vertical) | Medium |
-| Bias | 1D | Learned bias (standard NN) | Essential |
-
-**Note:** Input image RGBD (mip 0) is fed only to Layer 0. Subsequent layers see static features + previous layer output.
-
-**Why bias as static feature:**
-- Simpler shader code (single weight array)
-- Standard NN formulation: y = Wx (x includes the bias term)
-- Saves 56-112 bytes (no separate bias buffer)
-- 7 features sufficient for the initial implementation
-
-### Future Feature Extensions
-
-**Option: Additional encodings:**
-- `sin(40*uv.y)` - Higher frequency encoding
-- `gray_mip1` - Multi-scale luminance
-- `dx`, `dy` - Sobel gradients
-- `variance` - Local texture measure
-- `laplacian` - Edge detection
-
-**Option: uint8 packing (16+ features):**
-```wgsl
-// texture_storage_2d<rgba8unorm> stores 16 uint8 values
-// Trade precision for feature count
-// [R, G, B, D, uv.x, uv.y, sin10.x, sin10.y,
-//  sin20.x, sin20.y, dx, dy, gray_mip1, gray_mip2, var, bias]
-```
-Requires quantization-aware training.
-
----
-
-## Layer Structure
-
-### Example 3-Layer Network
-
-```
-Layer 0: input RGBD (4D) + static (8D) = 12D → 4 channels (3×3 kernel)
-Layer 1: previous (4D) + static (8D) = 12D → 4 channels (3×3 kernel)
-Layer 2: previous (4D) + static (8D) = 12D → 4 channels (3×3 kernel, output RGBA)
-```
-
-**Output:** 4 channels (RGBA). Training targets preserve alpha from target images.
-
-### Weight Calculations
-
-**Per-layer weights (uniform 12D→4D, 3×3 kernels):**
-```
-Layer 0: 12 × 3 × 3 × 4 = 432 weights
-Layer 1: 12 × 3 × 3 × 4 = 432 weights
-Layer 2: 12 × 3 × 3 × 4 = 432 weights
-Total:   1296 weights
-```
-
-**Storage sizes:**
-- f32: 1296 × 4 = 5,184 bytes (~5.1 KB)
-- f16: 1296 × 2 = 2,592 bytes (~2.5 KB) ✓ **recommended**
-
-**Comparison to v1:**
-- v1: ~800 weights (3.2 KB f32)
-- v2: ~1296 weights (2.5 KB f16)
-- **Uniform architecture, smaller than v1 f32**
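The totals above follow directly from `in × k × k × out` per layer. A small Python sketch that reproduces them (the helper name is illustrative):

```python
def conv_weights(in_ch, out_ch, kernel):
    """Weight count and f16 byte size for one conv layer (bias folded into input)."""
    count = in_ch * kernel * kernel * out_ch
    return count, count * 2  # f16 = 2 bytes

layers = [(12, 4, 3)] * 3                       # uniform 3-layer 12D→4D, 3×3 kernels
total = sum(conv_weights(i, o, k)[0] for i, o, k in layers)
print(total, total * 2)                          # 1296 weights, 2592 bytes (~2.5 KB)
```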
-
-### Kernel Size Guidelines
-
-**1×1 kernel (pointwise):**
-- No spatial context, channel mixing only
-- Weights: `12 × 4 = 48` per layer
-- Use for: fast inference, channel remapping
-
-**3×3 kernel (standard conv):**
-- Local spatial context (recommended)
-- Weights: `12 × 9 × 4 = 432` per layer
-- Use for: most layers (balanced quality/size)
-
-**5×5 kernel (large receptive field):**
-- Wide spatial context
-- Weights: `12 × 25 × 4 = 1200` per layer
-- Use for: output layer, fine detail enhancement
-
-### Channel Storage (4×f16 per texel)
-
-```wgsl
-@group(0) @binding(1) var layer_input: texture_2d<u32>;
-
-fn unpack_channels(coord: vec2<i32>) -> vec4<f32> {
-    let packed = textureLoad(layer_input, coord, 0);
-    let v0 = unpack2x16float(packed.x); // [ch0, ch1]
-    let v1 = unpack2x16float(packed.y); // [ch2, ch3]
-    return vec4<f32>(v0.x, v0.y, v1.x, v1.y);
-}
-
-fn pack_channels(values: vec4<f32>) -> vec4<u32> {
-    return vec4<u32>(
-        pack2x16float(vec2(values.x, values.y)),
-        pack2x16float(vec2(values.z, values.w)),
-        0u, // Unused
-        0u  // Unused
-    );
-}
-```
-
----
-
-## Training Workflow
-
-### Script: `training/train_cnn_v2.py`
-
-**Static Feature Extraction:**
-
-```python
-def compute_static_features(rgb, depth, mip_level=0):
-    """Generate parametric features (8D: p0-p3 + spatial).
-
-    Args:
-        mip_level: 0=original, 1=half res, 2=quarter res, 3=eighth res
-    """
-    h, w = rgb.shape[:2]
-
-    # Generate mip level for p0-p3 (downsample then upsample)
-    if mip_level > 0:
-        mip_rgb = rgb.copy()
-        for _ in range(mip_level):
-            mip_rgb = cv2.pyrDown(mip_rgb)
-        for _ in range(mip_level):
-            mip_rgb = cv2.pyrUp(mip_rgb)
-        if mip_rgb.shape[:2] != (h, w):
-            mip_rgb = cv2.resize(mip_rgb, (w, h), interpolation=cv2.INTER_LINEAR)
-    else:
-        mip_rgb = rgb
-
-    # Parametric features from mip level
-    p0, p1, p2, p3 = mip_rgb[..., 0], mip_rgb[..., 1], mip_rgb[..., 2], depth
-
-    # UV coordinates (normalized)
-    uv_x = np.linspace(0, 1, w)[None, :].repeat(h, axis=0)
-    uv_y = np.linspace(0, 1, h)[:, None].repeat(w, axis=1)
-
-    # Multi-frequency position encoding
-    sin10_x = np.sin(10.0 * uv_x)
-
-    # Bias dimension (always 1.0)
-    bias = np.ones_like(p0)
-
-    # Stack: [p0, p1, p2, p3, uv.x, uv.y, sin10_x, bias]
-    return np.stack([p0, p1, p2, p3, uv_x, uv_y, sin10_x, bias], axis=-1)
-```
-
-**Network Definition:**
-
-```python
-class CNNv2(nn.Module):
-    def __init__(self, kernel_sizes, num_layers=3):
-        super().__init__()
-        if isinstance(kernel_sizes, int):
-            kernel_sizes = [kernel_sizes] * num_layers
-        self.kernel_sizes = kernel_sizes
-        self.layers = nn.ModuleList()
-
-        # All layers: 12D input (4 prev + 8 static) → 4D output
-        for kernel_size in kernel_sizes:
-            self.layers.append(
-                nn.Conv2d(12, 4, kernel_size=kernel_size,
-                          padding=kernel_size//2, bias=False)
-            )
-
-    def forward(self, input_rgbd, static_features):
-        # Layer 0: input RGBD (4D) + static (8D) = 12D
-        x = torch.cat([input_rgbd, static_features], dim=1)
-        x = self.layers[0](x)
-        x = torch.sigmoid(x)  # Soft [0,1] for layer 0
-
-        # Layer 1+: previous output (4D) + static (8D) = 12D
-        for i in range(1, len(self.layers)):
-            x_input = torch.cat([x, static_features], dim=1)
-            x = self.layers[i](x_input)
-            if i < len(self.layers) - 1:
-                x = F.relu(x)
-            else:
-                x = torch.sigmoid(x)  # Soft [0,1] for final layer
-
-        return x  # RGBA output
-```
-
-**Training Configuration:**
-
-```python
-# Hyperparameters
-kernel_sizes = [3, 3, 3]  # Per-layer kernel sizes (e.g., [1,3,5])
-num_layers = 3            # Number of CNN layers
-mip_level = 0             # Mip level for p0-p3: 0=orig, 1=half, 2=quarter, 3=eighth
-grayscale_loss = False    # Compute loss on grayscale (Y) instead of RGBA
-learning_rate = 1e-3
-batch_size = 16
-epochs = 5000
-
-# Dataset: Input RGB, Target RGBA (preserves alpha channel from image)
-# Model outputs RGBA, loss compares all 4 channels (or grayscale if --grayscale-loss)
-
-# Training loop (standard PyTorch f32)
-for epoch in range(epochs):
-    for rgb_batch, depth_batch, target_batch in dataloader:
-        # Compute static features (8D) with mip level
-        static_feat = compute_static_features(rgb_batch, depth_batch, mip_level)
-
-        # Input RGBD (4D)
-        input_rgbd = torch.cat([rgb_batch, depth_batch.unsqueeze(1)], dim=1)
-
-        # Forward pass
-        output = model(input_rgbd, static_feat)
-
-        # Loss computation (grayscale or RGBA)
-        if grayscale_loss:
-            # Convert RGBA to grayscale: Y = 0.299*R + 0.587*G + 0.114*B
-            output_gray = 0.299 * output[:, 0:1] + 0.587 * output[:, 1:2] + 0.114 * output[:, 2:3]
-            target_gray = 0.299 * target_batch[:, 0:1] + 0.587 * target_batch[:, 1:2] + 0.114 * target_batch[:, 2:3]
-            loss = criterion(output_gray, target_gray)
-        else:
-            loss = criterion(output, target_batch)
-
-        # Backward pass
-        optimizer.zero_grad()
-        loss.backward()
-        optimizer.step()
-```
-
-**Checkpoint Format:**
-
-```python
-torch.save({
-    'state_dict': model.state_dict(),  # f32 weights
-    'config': {
-        'kernel_sizes': [3, 3, 3],  # Per-layer kernel sizes
-        'num_layers': 3,
-        'mip_level': 0,             # Mip level used for p0-p3
-        'grayscale_loss': False,    # Whether grayscale loss was used
-        'features': ['p0', 'p1', 'p2', 'p3', 'uv.x', 'uv.y', 'sin10_x', 'bias']
-    },
-    'epoch': epoch,
-    'loss': loss.item()
-}, f'checkpoints/checkpoint_epoch_{epoch}.pth')
-```
-
----
-
-## Export Workflow
-
-### Script: `training/export_cnn_v2_shader.py`
-
-**Process:**
-1. Load checkpoint (f32 PyTorch weights)
-2. Extract layer configs (kernels, channels)
-3. Quantize weights to float16: `weights_f16 = weights_f32.astype(np.float16)`
-4. Generate WGSL shader per layer
-5. Write to `workspaces/<workspace>/shaders/cnn_v2/cnn_v2_*.wgsl`
-
-**Example Generated Shader:**
-
-```wgsl
-// cnn_v2_layer_0.wgsl - Auto-generated from checkpoint_epoch_5000.pth
-
-const KERNEL_SIZE: u32 = 1u;
-const IN_CHANNELS: u32 = 8u;   // 7 features + bias
-const OUT_CHANNELS: u32 = 16u;
-
-// Weights quantized to float16 (stored as f32 in shader)
-const weights: array<f32, 128> = array(
-    0.123047, -0.089844, 0.234375, 0.456055, ...
-);
-
-@group(0) @binding(0) var static_features: texture_2d<u32>;
-@group(0) @binding(1) var output_texture: texture_storage_2d<rgba32uint, write>;
-
-@compute @workgroup_size(8, 8)
-fn main(@builtin(global_invocation_id) id: vec3<u32>) {
-    // Load static features (8D)
-    let static_feat = get_static_features(vec2<i32>(id.xy));
-
-    // Convolution (1×1 kernel = pointwise)
-    var output: array<f32, OUT_CHANNELS>;
-    for (var c: u32 = 0u; c < OUT_CHANNELS; c++) {
-        var sum: f32 = 0.0;
-        for (var k: u32 = 0u; k < IN_CHANNELS; k++) {
-            sum += weights[c * IN_CHANNELS + k] * static_feat[k];
-        }
-        output[c] = max(0.0, sum); // ReLU activation
-    }
-
-    // Pack and store (8×f16 per texel)
-    textureStore(output_texture, vec2<i32>(id.xy), pack_f16x8(output));
-}
-```
-
-**Float16 Quantization:**
-- Training uses f32 throughout (PyTorch standard)
-- Export converts to np.float16, then back to f32 for WGSL literals
-- **Expected discrepancy:** <0.1% MSE (acceptable)
-- Validation via HTML tool (see below)
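The f16 round-trip can be sanity-checked offline. A minimal NumPy sketch using synthetic stand-in weights (the distribution is an assumption, not trained data):

```python
import numpy as np

rng = np.random.default_rng(0)
w32 = rng.normal(0.0, 0.5, size=1296).astype(np.float32)  # stand-in for trained weights
w16 = w32.astype(np.float16).astype(np.float32)           # export round-trip

mse = float(np.mean((w32 - w16) ** 2))
rel = mse / float(np.mean(w32 ** 2))
print(f"relative MSE: {rel:.2e}")  # typically ~1e-7 for f16, far below the 0.1% budget
```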
-
----
-
-## Validation Workflow
-
-### HTML Tool: `tools/cnn_v2_test/index.html`
-
-**WebGPU-based testing tool** with layer visualization.
-
-**Usage:**
-1. Open `tools/cnn_v2_test/index.html` in a browser
-2. Drop a `.bin` weights file (from `export_cnn_v2_weights.py`)
-3. Drop a PNG test image
-4. View results with layer inspection
-
-**Features:**
-- Live CNN inference with WebGPU
-- Layer-by-layer visualization (static features + all CNN layers)
-- Weight visualization (per-layer kernels)
-- View modes: CNN output, original, diff (×10)
-- Blend control for comparing with the original
-
-**Export weights:**
-```bash
-./training/export_cnn_v2_weights.py checkpoints/checkpoint_epoch_100.pth \
-    --output-weights workspaces/main/cnn_v2_weights.bin
-```
-
-See `doc/CNN_V2_WEB_TOOL.md` for detailed documentation.
-
----
-
-## Implementation Checklist
-
-### Phase 1: Shaders (Core Infrastructure)
-
-- [ ] `workspaces/main/shaders/cnn_v2/cnn_v2_static.wgsl` - Static features compute
-  - [ ] RGBD sampling from framebuffer
-  - [ ] UV coordinate calculation
-  - [ ] sin(10\*uv.x) computation
-  - [ ] Bias dimension (constant 1.0)
-  - [ ] Float16 packing via `pack2x16float()`
-  - [ ] Output to `texture_storage_2d<rgba32uint>`
-
-- [ ] `workspaces/main/shaders/cnn_v2/cnn_v2_layer_template.wgsl` - Layer template
-  - [ ] Static features unpacking
-  - [ ] Previous layer unpacking (8×f16)
-  - [ ] Convolution implementation (1×1, 3×3, 5×5)
-  - [ ] ReLU activation
-  - [ ] Output packing (8×f16)
-  - [ ] Proper padding handling
-
-### Phase 2: C++ Effect Class
-
-- [ ] `src/effects/cnn_v2_effect.h` - Header
-  - [ ] Class declaration inheriting from `PostProcessEffect`
-  - [ ] Static features texture member
-  - [ ] Layer textures vector
-  - [ ] Pipeline and bind group members
-
-- [ ] `src/effects/cnn_v2_effect.cc` - Implementation
-  - [ ] Constructor: load shaders, create textures
-  - [ ] `init()`: create pipelines, bind groups
-  - [ ] `render()`: multi-pass execution
-    - [ ] Pass 0: compute static features
-    - [ ] Pass 1-N: CNN layers
-    - [ ] Final: composite to output
-  - [ ] Proper resource cleanup
-
-- [ ] Integration
-  - [ ] Add to `src/gpu/demo_effects.h` includes
-  - [ ] Add `cnn_v2_effect.cc` to `CMakeLists.txt` (headless + normal)
-  - [ ] Add shaders to `workspaces/main/assets.txt`
-  - [ ] Add to `src/tests/gpu/test_demo_effects.cc`
-
-### Phase 3: Training Pipeline
-
-- [ ] `training/train_cnn_v2.py` - Training script
-  - [ ] Static feature extraction function
-  - [ ] CNNv2 PyTorch model class
-  - [ ] Patch-based dataloader
-  - [ ] Training loop with checkpointing
-  - [ ] Command-line argument parsing
-  - [ ] Inference mode (ground truth generation)
-
-- [ ] `training/export_cnn_v2_shader.py` - Export script
-  - [ ] Checkpoint loading
-  - [ ] Weight extraction and f16 quantization
-  - [ ] Per-layer WGSL generation
-  - [ ] File output to workspace shaders/
-  - [ ] Metadata preservation
-
-### Phase 4: Tools & Validation
-
-- [x] HTML validation tool - WebGPU inference with layer visualization
-  - [ ] Command-line argument parsing
-  - [ ] Shader export orchestration
-  - [ ] Build orchestration
-  - [ ] Batch image processing
-  - [ ] Results display
-
-- [ ] `src/tools/cnn_test_main.cc` - Tool updates
-  - [ ] Add `--cnn-version v2` flag
-  - [ ] CNNv2Effect instantiation path
-  - [ ] Static features pass execution
-  - [ ] Multi-layer processing
-
-### Phase 5: Documentation
-
-- [ ] `doc/HOWTO.md` - Usage guide
-  - [ ] Training section (CNN v2)
-  - [ ] Export section
-  - [ ] Validation section
-  - [ ] Examples
-
-- [ ] `README.md` - Project overview update
-  - [ ] Mention CNN v2 capability
-
----
-
-## File Structure
-
-### New Files
-
-```
-# Shaders (generated by export script)
-workspaces/main/shaders/cnn_v2/cnn_v2_static.wgsl   # Static features compute
-workspaces/main/shaders/cnn_v2/cnn_v2_layer_0.wgsl  # Input layer (generated)
-workspaces/main/shaders/cnn_v2/cnn_v2_layer_1.wgsl  # Inner layer (generated)
-workspaces/main/shaders/cnn_v2/cnn_v2_layer_2.wgsl  # Output layer (generated)
-
-# C++ implementation
-src/effects/cnn_v2_effect.h       # Effect class header
-src/effects/cnn_v2_effect.cc      # Effect implementation
-
-# Python training/export
-training/train_cnn_v2.py          # Training script
-training/export_cnn_v2_shader.py  # Shader generator
-training/validation/              # Test images directory
-
-# Validation
-tools/cnn_v2_test/index.html      # WebGPU validation tool
-
-# Documentation
-doc/CNN_V2.md                     # This file
-```
-
-### Modified Files
-
-```
-src/gpu/demo_effects.h              # Add CNNv2Effect include
-CMakeLists.txt                      # Add cnn_v2_effect.cc
-workspaces/main/assets.txt          # Add cnn_v2 shaders
-workspaces/main/timeline.seq        # Optional: add CNNv2Effect
-src/tests/gpu/test_demo_effects.cc  # Add CNNv2 test case
-src/tools/cnn_test_main.cc          # Add --cnn-version v2
-doc/HOWTO.md                        # Add CNN v2 sections
-TODO.md                             # Add CNN v2 task
-```
-
-### Unchanged (v1 Preserved)
-
-```
-training/train_cnn.py               # Original training
-src/effects/cnn_effect.*            # Original effect
-workspaces/main/shaders/cnn_*.wgsl  # Original v1 shaders
-```
-
----
-
-## Performance Characteristics
-
-### Static Features Compute
-- **Cost:** ~0.1ms @ 1080p
-- **Frequency:** once per frame
-- **Operations:** sin(), texture sampling, packing
-
-### CNN Layers (Example 3-layer)
-- **Layer 0 (1×1, 8→16):** ~0.3ms
-- **Layer 1 (3×3, 23→8):** ~0.8ms
-- **Layer 2 (5×5, 15→4):** ~1.2ms
-- **Total:** ~2.4ms @ 1080p
-
-### Memory Usage
-- Static features: 1920×1080×8×2 = 33 MB (f16)
-- Layer buffers: 1920×1080×16×2 = 66 MB (max 16 channels)
-- Weights: ~6.4 KB (f16, in shader code)
-- **Total GPU memory:** ~100 MB
-
----
-
-## Size Budget
-
-### CNN v1 vs v2
-
-| Metric | v1 | v2 | Delta |
-|--------|----|----|-------|
-| Weights (count) | 800 | 3268 | +2468 |
-| Storage (f32) | 3.2 KB | 13.1 KB | +9.9 KB |
-| Storage (f16) | N/A | 6.5 KB | +6.5 KB |
-| Shader code | ~500 lines | ~800 lines | +300 lines |
-
-### Mitigation Strategies
-
-**Reduce channels:**
-- [16,8,4] → [8,4,4] saves ~50% weights
-- [16,8,4] → [4,4,4] saves ~60% weights
-
-**Smaller kernels:**
-- [1,3,5] → [1,3,3] saves ~30% weights
-- [1,3,5] → [1,1,3] saves ~50% weights
-
-**Quantization:**
-- int8 weights: saves 75% (requires QAT training)
-- 4-bit weights: saves 87.5% (extreme, needs research)
-
-**Target:** keep CNN v2 under 10 KB for the 64k demo constraint
-
----
-
-## Future Extensions
-
-### Flexible Feature Layout (Binary Format v3)
-
-**TODO:** Support arbitrary feature vector layouts and ordering in the binary format.
-
-**Current Limitation:**
-- Feature layout hardcoded: `[p0, p1, p2, p3, uv_x, uv_y, sin10_x, bias]`
-- Shader must match the training script exactly
-- Experimentation requires shader recompilation
-
-**Proposed Enhancement:**
-- Add a feature descriptor to the binary format header
-- Specify feature types, sources, and ordering
-- Runtime shader generation or dynamic feature indexing
-- Examples: `[R, G, B, dx, dy, uv_x, bias]` or `[mip1.r, mip2.g, laplacian, uv_x, sin20_x, bias]`
-
-**Benefits:**
-- Training experiments without C++/shader changes
-- A/B test different feature combinations
-- Single binary format, multiple architectures
-- Faster iteration on feature engineering
-
-**Implementation Options:**
-1. **Static approach:** Generate shader code from the descriptor at load time
-2. **Dynamic approach:** Array-based indexing with a feature map uniform
-3. **Hybrid:** Precompile common layouts, fall back to dynamic
-
-See `doc/CNN_V2_BINARY_FORMAT.md` for the proposed descriptor format.
-
----
-
-### More Features (uint8 Packing)
-
-```wgsl
-// 16 uint8 features per texel (texture_storage_2d<rgba8unorm>)
-// [R, G, B, D, uv.x, uv.y, sin10.x, sin10.y,
-//  sin20.x, sin20.y, dx, dy, gray_mip1, gray_mip2, variance, bias]
-```
-- Trade precision for quantity
-- Requires quantization-aware training
-
-### Temporal Features
-
-- Previous frame RGBA (motion awareness)
-- Optical flow vectors
-- Requires a multi-frame buffer
-
-### Learned Position Encodings
-
-- Replace hand-crafted sin(10\*uv) with learned embeddings
-- Requires a separate embedding network
-- Similar to NeRF position encoding
-
-### Dynamic Architecture
-
-- Runtime kernel size selection based on scene
-- Conditional layer execution (skip connections)
-- Layer pruning for performance
-
----
-
-## References
-
-- **v1 Implementation:** `src/effects/cnn_effect.*`
-- **Training Guide:** `doc/HOWTO.md` (CNN Training section)
-- **Test Tool:** `doc/CNN_TEST_TOOL.md`
-- **Shader System:** `doc/SEQUENCE.md`
-- **Size Measurement:** `doc/SIZE_MEASUREMENT.md`
-
----
-
-## Appendix: Design Decisions
-
-### Why Bias as Static Feature?
-
-**Alternatives considered:**
-1. Separate bias array per layer (Option B)
-2. Bias as static feature = 1.0 (Option A, chosen)
-
-**Decision rationale:**
-- Simpler shader code (fewer bindings)
-- Standard NN formulation (augmented input)
-- Saves 56-112 bytes per model
-- 7 features sufficient for the v1 implementation
-- Can extend to uint8 packing if >7 features are needed
-
-### Why Float16 for Weights?
-
-**Alternatives considered:**
-1. Keep f32 (larger, more accurate)
-2. Use f16 (smaller, GPU-native)
-3. Use int8 (smallest, needs QAT)
-
-**Decision rationale:**
-- f16 saves 50% vs f32 (critical for the 64k target)
-- GPU-native support (pack2x16float in WGSL)
-- <0.1% accuracy loss (acceptable)
-- Simpler than int8 quantization
-
-### Why Multi-Frequency Position Encoding?
-
-**Inspiration:** NeRF (Neural Radiance Fields)
-
-**Benefits:**
-- Helps the network learn high-frequency details
-- Better than raw UV coordinates
-- Small footprint (1D per frequency)
-
-**Future:** add sin(20\*uv), sin(40\*uv) if >7 features become available
-
----
-
-## Related Documentation
-
-- `doc/CNN_V2_BINARY_FORMAT.md` - Binary weight file specification (.bin format)
-- `doc/CNN_V2_WEB_TOOL.md` - WebGPU testing tool with layer visualization
-- `doc/CNN_TEST_TOOL.md` - C++ offline validation tool (deprecated)
-- `doc/HOWTO.md` - Training and validation workflows
-
----
-
-**Document Version:** 1.0
-**Last Updated:** 2026-02-12
-**Status:** Design approved, ready for implementation
diff --git a/doc/CNN_V2_BINARY_FORMAT.md b/doc/CNN_V2_BINARY_FORMAT.md
deleted file mode 100644
index 59c859d..0000000
--- a/doc/CNN_V2_BINARY_FORMAT.md
+++ /dev/null
@@ -1,235 +0,0 @@
-# CNN v2 Binary Weight Format Specification
-
-Binary format for storing trained CNN v2 weights with the static feature architecture.
-
-**File Extension:** `.bin`
-**Byte Order:** Little-endian
-**Version:** 2.0 (supports mip-level for parametric features)
-**Backward Compatible:** Version 1.0 files supported (mip_level=0)
-
----
-
-## File Structure
-
-**Version 2 (current):**
-```
-┌─────────────────────┐
-│ Header (20 bytes)   │
-├─────────────────────┤
-│ Layer Info          │
-│ (20 bytes × N)      │
-├─────────────────────┤
-│ Weight Data         │
-│ (variable size)     │
-└─────────────────────┘
-```
-
-**Version 1 (legacy):**
-```
-┌─────────────────────┐
-│ Header (16 bytes)   │
-├─────────────────────┤
-│ Layer Info          │
-│ (20 bytes × N)      │
-├─────────────────────┤
-│ Weight Data         │
-│ (variable size)     │
-└─────────────────────┘
-```
-
----
-
-## Header
-
-**Version 2 (20 bytes):**
-
-| Offset | Type | Field | Description |
-|--------|------|---------------|--------------------------------------|
-| 0x00 | u32 | magic | Magic number: `0x324E4E43` ("CNN2" as little-endian bytes) |
-| 0x04 | u32 | version | Format version (2 for current) |
-| 0x08 | u32 | num_layers | Number of CNN layers (excludes static features) |
-| 0x0C | u32 | total_weights | Total f16 weight count across all layers |
-| 0x10 | u32 | mip_level | Mip level for p0-p3 features (0=original, 1=half, 2=quarter, 3=eighth) |
-
-**Version 1 (16 bytes) - Legacy:**
-
-| Offset | Type | Field | Description |
-|--------|------|---------------|--------------------------------------|
-| 0x00 | u32 | magic | Magic number: `0x324E4E43` ("CNN2" as little-endian bytes) |
-| 0x04 | u32 | version | Format version (1) |
-| 0x08 | u32 | num_layers | Number of CNN layers |
-| 0x0C | u32 | total_weights | Total f16 weight count |
-
-**Note:** Loaders should check the version field and handle both formats. Version 1 files are treated as mip_level=0.
-
----
-
-## Layer Info (20 bytes per layer)
-
-Repeated `num_layers` times:
-- **Version 2:** starting at offset 0x14 (20 bytes)
-- **Version 1:** starting at offset 0x10 (16 bytes)
-
-| Offset | Type | Field | Description |
-|--------|------|---------------|--------------------------------------|
-| 0x00 | u32 | kernel_size | Convolution kernel dimension (3, 5, 7, etc.) |
-| 0x04 | u32 | in_channels | Input channel count (includes the 8 static feature channels) |
-| 0x08 | u32 | out_channels | Output channel count (max 8) |
-| 0x0C | u32 | weight_offset | Weight array start index (f16 units, relative to the weight data section) |
-| 0x10 | u32 | weight_count | Number of f16 weights for this layer |
-
-**Layer Order:** Sequential (Layer 0, Layer 1, Layer 2, ...)
-
----
-
-## Weight Data (variable size)
-
-Starts at offset:
-- **Version 2:** `20 + (num_layers × 20)`
-- **Version 1:** `16 + (num_layers × 20)`
-
-**Format:** Packed f16 pairs stored as u32
-**Packing:** `u32 = (f16_hi << 16) | f16_lo`
-**Storage:** Sequential by layer, then by output channel, input channel, spatial position
-
-**Weight Indexing:**
-```
-weight_idx = output_ch × (in_channels × kernel_size²) +
-             input_ch × kernel_size² +
-             (ky × kernel_size + kx)
-```
-
-Where:
-- `output_ch` ∈ [0, out_channels)
-- `input_ch` ∈ [0, in_channels)
-- `ky`, `kx` ∈ [0, kernel_size)
-
-**Unpacking f16 from u32:**
-```c
-uint32_t packed = weights_buffer[weight_idx / 2];
-uint16_t f16_bits = (weight_idx % 2 == 0) ? (packed & 0xFFFF) : (packed >> 16);
-```
-
----
-
-## Example: 3-Layer Network (Version 2)
-
-**Configuration:**
-- Mip level: 0 (original resolution)
-- Layer 0: 12→4, kernel 3×3 (432 weights)
-- Layer 1: 12→4, kernel 3×3 (432 weights)
-- Layer 2: 12→4, kernel 3×3 (432 weights)
-
-**File Layout:**
-```
-Offset  Size  Content
-------  ----  -------
-0x00    20    Header (magic, version=2, layers=3, weights=1296, mip_level=0)
-0x14    20    Layer 0 info (kernel=3, in=12, out=4, offset=0, count=432)
-0x28    20    Layer 1 info (kernel=3, in=12, out=4, offset=432, count=432)
-0x3C    20    Layer 2 info (kernel=3, in=12, out=4, offset=864, count=432)
-0x50    2592  Weight data (1296 f16 weights packed as 648 u32 words)
-
----
-Total: 2672 bytes (~2.6 KB)
-```
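A minimal Python reader for this layout, assuming a little-endian host (function names are illustrative; the repository's actual loaders are `src/effects/cnn_v2_effect.cc` and the HTML tool):

```python
import struct
import numpy as np

def load_cnn_v2(path):
    """Parse a CNN v2 .bin file (v1: 16-byte header, v2: 20-byte header with mip_level)."""
    data = open(path, "rb").read()
    magic, version, num_layers, total = struct.unpack_from("<4I", data, 0)
    assert magic == 0x324E4E43, "not a CNN2 file"
    mip_level = struct.unpack_from("<I", data, 16)[0] if version == 2 else 0
    off = 20 if version == 2 else 16
    layers = []
    for _ in range(num_layers):
        kernel, in_ch, out_ch, w_off, w_cnt = struct.unpack_from("<5I", data, off)
        layers.append(dict(kernel=kernel, in_ch=in_ch, out_ch=out_ch,
                           offset=w_off, count=w_cnt))
        off += 20
    # lo-half-first u32 packing == consecutive little-endian f16 values
    weights = np.frombuffer(data, dtype=np.float16, offset=off, count=total)
    return mip_level, layers, weights

def weight_idx(out_c, in_c, ky, kx, in_channels, kernel_size):
    """Flat index into a layer's weight slice, matching the indexing formula above."""
    return (out_c * in_channels * kernel_size * kernel_size
            + in_c * kernel_size * kernel_size
            + ky * kernel_size + kx)
```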
-
----
-
-## Static Features
-
-Not stored in the .bin file (computed at runtime):
-
-**8D Input Features:**
-1. **p0** - Parametric feature 0 (from mip level)
-2. **p1** - Parametric feature 1 (from mip level)
-3. **p2** - Parametric feature 2 (from mip level)
-4. **p3** - Parametric feature 3 (depth or from mip level)
-5. **UV_X** - Normalized x coordinate [0,1]
-6. **UV_Y** - Normalized y coordinate [0,1]
-7. **sin(20 × UV_Y)** - Spatial frequency encoding (vertical, frequency=20)
-8. **1.0** - Bias term
-
-**Mip Level Usage (p0-p3):**
-- `mip_level=0`: RGB from original resolution (mip 0)
-- `mip_level=1`: RGB from half resolution (mip 1), upsampled
-- `mip_level=2`: RGB from quarter resolution (mip 2), upsampled
-- `mip_level=3`: RGB from eighth resolution (mip 3), upsampled
-
-**Layer 0** receives input RGBD (4D) + static features (8D) = 12D input → 4D output.
-**Layer 1+** receive previous layer output (4D) + static features (8D) = 12D input → 4D output.
-
----
-
-## Validation
-
-**Magic Check:**
-```c
-uint32_t magic;
-fread(&magic, 4, 1, fp);
-if (magic != 0x324E4E43) { error("Invalid CNN v2 file"); }
-```
-
-**Version Check:**
-```c
-uint32_t version;
-fread(&version, 4, 1, fp);
-if (version != 1 && version != 2) { error("Unsupported version"); }
-uint32_t header_size = (version == 1) ? 16 : 20;
-```
-
-**Size Check:**
-```c
-expected_size = header_size + (num_layers * 20) + (total_weights * 2);
-if (file_size != expected_size) { error("Size mismatch"); }
-```
-
-**Weight Offset Sanity:**
-```c
-// Each layer's offset should match the cumulative count
-uint32_t cumulative = 0;
-for (int i = 0; i < num_layers; i++) {
-    if (layers[i].weight_offset != cumulative) { error("Invalid offset"); }
-    cumulative += layers[i].weight_count;
-}
-if (cumulative != total_weights) { error("Total mismatch"); }
-```
-
----
-
-## Future Extensions
-
-**TODO: Flexible Feature Layout**
-
-Current limitation: the feature vector layout is hardcoded as `[p0, p1, p2, p3, uv_x, uv_y, sin10_x, bias]`.
-
-Proposed enhancement for version 3:
-- Add a feature descriptor section to the header
-- Specify feature count, types, and ordering
-- Support arbitrary 7D feature combinations (e.g., `[R, G, B, dx, dy, uv_x, bias]`)
-- Allow runtime shader generation based on the descriptor
-- Enable experimentation without recompiling shaders
-
-Example descriptor format:
-```
-struct FeatureDescriptor {
-    u32 feature_count;       // Number of features (typically 7-8)
-    u32 feature_types[8];    // Type enum per feature
-    u32 feature_sources[8];  // Source enum (mip0, mip1, gradient, etc.)
-    u32 reserved[8];         // Future use
-}
-```
-
-Benefits:
-- Training can experiment with different feature combinations
-- No shader recompilation needed
-- Single binary format supports multiple architectures
-- Easier A/B testing of feature effectiveness
-
----
-
-## Related Files
-
-- `training/export_cnn_v2_weights.py` - Binary export tool
-- `src/effects/cnn_v2_effect.cc` - C++ loader
-- `tools/cnn_v2_test/index.html` - WebGPU validator
-- `doc/CNN_V2.md` - Architecture design
diff --git a/doc/CNN_V2_DEBUG_TOOLS.md b/doc/CNN_V2_DEBUG_TOOLS.md
deleted file mode 100644
index 8d1289a..0000000
--- a/doc/CNN_V2_DEBUG_TOOLS.md
+++ /dev/null
@@ -1,143 +0,0 @@
-# CNN v2 Debugging Tools
-
-Tools for investigating the CNN v2 mismatch between the HTML tool and cnn_test.
-
----
-
-## Identity Weight Generator
-
-**Purpose:** Generate trivial .bin files with identity passthrough for debugging.
-
-**Script:** `training/gen_identity_weights.py`
-
-**Usage:**
-```bash
-# 1×1 identity (default)
-./training/gen_identity_weights.py workspaces/main/weights/cnn_v2_identity.bin
-
-# 3×3 identity
-./training/gen_identity_weights.py workspaces/main/weights/cnn_v2_identity_3x3.bin --kernel-size 3
-
-# Mix mode: 50-50 blend (0.5*p0+0.5*p4, etc.)
-./training/gen_identity_weights.py output.bin --mix
-
-# Static features only: p4→ch0, p5→ch1, p6→ch2, p7→ch3
-./training/gen_identity_weights.py output.bin --p47
-
-# Custom mip level
-./training/gen_identity_weights.py output.bin --kernel-size 1 --mip-level 2
-```
-
-**Output:**
-- Single layer, 12D→4D (4 input channels + 8 static features)
-- Identity mode: Output Ch{0,1,2,3} = Input Ch{0,1,2,3}
-- Mix mode (--mix): Output Ch{i} = 0.5*Input Ch{i} + 0.5*Input Ch{i+4} (50-50 blend, avoids overflow)
-- Static mode (--p47): Output Ch{i} = Input Ch{i+4} (static features only, visualizes p4-p7)
-- Minimal file size (~136 bytes for 1×1, ~904 bytes for 3×3)
-
-**Validation:**
-Load in the HTML tool or cnn_test - the output should match the input (RGB only, ignoring static features).
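A minimal sketch of what such an identity layer looks like as a weight tensor (illustrative only; `gen_identity_weights.py` is the authoritative generator):

```python
import numpy as np

def identity_layer_weights(kernel_size=1, in_ch=12, out_ch=4):
    """Output channel i copies input channel i at the kernel center; all else zero."""
    w = np.zeros((out_ch, in_ch, kernel_size, kernel_size), dtype=np.float16)
    c = kernel_size // 2
    for i in range(out_ch):
        w[i, i, c, c] = 1.0      # ch0..ch3 pass the previous RGBA straight through
    return w.reshape(-1)         # flatten in (out, in, ky, kx) order for the .bin
```

The `--mix` and `--p47` variants would instead place weight on the static-feature half of the 12D input (channels 4-11); see the channel-mapping note in `doc/CNN_V2.md` for which slots p4-p7 occupy.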
-
----
-
-## Composited Layer Visualization
-
-**Purpose:** Save the current layer view as a single composited image (4 channels side-by-side, grayscale).
-
-**Location:** HTML tool - "Layer Visualization" panel
-
-**Usage:**
-1. Load image + weights in the HTML tool
-2. Select the layer to visualize (Static 0-3, Static 4-7, Layer 0, Layer 1, etc.)
-3. Click the "Save Composited" button
-4. Downloads a PNG: `composited_layer{N}_{W}x{H}.png`
-
-**Output:**
-- 4 channels stacked horizontally
-- Grayscale representation
-- Useful for comparing layer activations across tools
-
----
-
-## Debugging Strategy
-
-### Track a) Binary Conversion Chain
-
-**Hypothesis:** Conversion error in .bin ↔ base64 ↔ Float32Array
-
-**Test:**
-1. Generate identity weights:
-   ```bash
-   ./training/gen_identity_weights.py workspaces/main/weights/test_identity.bin
-   ```
-
-2. Load in the HTML tool - the output should match the input RGB
-
-3. If there is a mismatch:
-   - Check the Python export: f16 packing in `export_cnn_v2_weights.py` line 105
-   - Check the HTML parsing: `unpackF16()` in `index.html` lines 805-815
-   - Check the weight indexing: `get_weight()` shader function
-
-**Key locations:**
-- Python: `np.float16` → `view(np.uint32)` (line 105 of the export script)
-- JS: `DataView` → `unpackF16()` → manual f16 decode (lines 773-803)
-- WGSL: `unpack2x16float()` built-in (line 492 of the shader)
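The three f16 decoders can be cross-checked against each other offline. A minimal sketch comparing a hand-rolled decode (in the spirit of the JS tool's manual path) against NumPy's f16 — illustrative, not the tool's actual code:

```python
import struct
import numpy as np

def manual_f16(bits):
    """Decode one f16 bit pattern by hand (sign / exponent / mantissa)."""
    s = (bits >> 15) & 1
    e = (bits >> 10) & 0x1F
    m = bits & 0x3FF
    if e == 0:                    # subnormal
        val = m * 2.0 ** -24
    elif e == 31:                 # inf / NaN
        val = float("inf") if m == 0 else float("nan")
    else:
        val = (1.0 + m / 1024.0) * 2.0 ** (e - 15)
    return -val if s else val

for bits in (0x3C00, 0xC000, 0x3555, 0x0001):  # 1.0, -2.0, ~1/3, smallest subnormal
    ref = np.frombuffer(struct.pack("<H", bits), dtype=np.float16)[0]
    assert manual_f16(bits) == float(ref), hex(bits)
```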
-
-### Track b) Layer Visualization
-
-**Purpose:** Confirm layer outputs match between HTML and C++
-
-**Method:**
-1. Run identical input through both tools
-2. Save composited layers from the HTML tool
-3. Compare with cnn_test output
-4. Use identity weights to isolate weight loading from computation
-
-### Track c) Trivial Test Case
-
-**Use identity weights to test:**
-- Weight loading (binary parsing)
-- Feature generation (static features)
-- Convolution (should be passthrough)
-- Output packing
-
-**Expected behavior:**
-- Input RGB → Output RGB (exact match)
-- Static features ignored (all zeros in the identity matrix)
-
----
-
-## Known Issues
-
-### ~~Layer 0 Visualization Scale~~ [FIXED]
-
-**Issue:** Layer 0 output displayed at 0.5× brightness (divided by 2).
-
-**Cause:** Line 1530 used `vizScale = 0.5` for all CNN layers, but Layer 0 is clamped to [0,1] and doesn't need dimming.
-
-**Fix:** Use scale 1.0 for Layer 0 output (layerIdx=1), 0.5 only for middle layers (ReLU, unbounded).
-
-### Remaining Mismatch
-
-**Current:** The HTML tool and cnn_test produce different outputs for the same input/weights.
-
-**Suspects:**
-1. F16 unpacking difference (CPU vs GPU vs JS)
-2. Static feature generation (RGBD, UV, sin encoding)
-3. Convolution kernel iteration order
-4. Output packing/unpacking
-
-**Next steps:**
-1. Test with identity weights (eliminates weight loading)
-2. Compare composited layer outputs
-3. Add debug visualization for static features
-4. Hex dump comparison (first 8 pixels) - use the `--debug-hex` flag in cnn_test
-
----
-
-## Related Documentation
-
-- `doc/CNN_V2.md` - CNN v2 architecture
-- `doc/CNN_V2_WEB_TOOL.md` - HTML tool documentation
-- `doc/CNN_TEST_TOOL.md` - cnn_test CLI tool
-- `training/export_cnn_v2_weights.py` - Binary export format
diff --git a/doc/CNN_V2_WEB_TOOL.md b/doc/CNN_V2_WEB_TOOL.md
deleted file mode 100644
index b6f5b0b..0000000
--- a/doc/CNN_V2_WEB_TOOL.md
+++ /dev/null
@@ -1,348 +0,0 @@
-# CNN v2 Web Testing Tool
-
-Browser-based WebGPU tool for validating CNN v2 inference, with layer visualization and weight inspection.
-
-**Location:** `tools/cnn_v2_test/index.html`
-
----
-
-## Status (2026-02-13)
-
-**Working:**
-- ✅ WebGPU initialization and device setup
-- ✅ Binary weight file parsing (v1 and v2 formats)
-- ✅ Automatic mip-level detection from binary format v2
-- ✅ Weight statistics (min/max per layer)
-- ✅ UI layout with collapsible panels
-- ✅ Mode switching (Activations/Weights tabs)
-- ✅ Canvas context management (2D for weights, WebGPU for activations)
-- ✅ Weight visualization infrastructure (layer selection, grid layout)
-- ✅ Layer naming matches codebase convention (Layer 0, Layer 1, Layer 2)
-- ✅ Static features split visualization (Static 0-3, Static 4-7)
-- ✅ All layers visible including the output layer (Layer 2)
-- ✅ Video playback support (MP4, WebM) with frame-by-frame controls
-- ✅ Video looping (automatic continuous playback)
-- ✅ Mip level selection (p0-p3 features at different resolutions)
-
-**Recent Changes (Latest):**
-- Binary format v2 support: reads mip_level from the 20-byte header
-- Backward compatible: v1 (16-byte header) → mip_level=0
-- Auto-update UI dropdown when loading weights with mip_level
-- Display mip_level in the metadata panel
-- Code refactoring: extracted FULLSCREEN_QUAD_VS shader (reused 3× across pipelines)
-- Added helper methods: `getDimensions()`, `setVideoControlsEnabled()`
-- Improved code organization with section headers and comments
-- Moved the Mip Level selector to the bottom of the left sidebar (removed the "Features (p0-p3)" label)
-- Added the `loop` attribute to the video element for automatic continuous playback
-
-**Previous Fixes:**
-- Fixed Layer 2 not appearing (was excluded from layerOutputs due to an isOutput check)
-- Fixed canvas context switching (force clear before recreation)
-- Added Static 0-3 / Static 4-7 buttons to view all 8 static feature channels
-- Aligned naming with train_cnn_v2.py/.wgsl: Layer 0, Layer 1, Layer 2 (not Layer 1, 2, 3)
-- Disabled Static buttons in weights mode (no learnable weights)
-
-**Known Issues:**
-- Layer activation visualization may show black if texture data is not properly unpacked
-- Weight kernel display depends on correct 2D context creation after canvas recreation
-
----
-
-## Architecture
-
-### File Structure
-- Single-file HTML tool (~1100 lines)
-- Embedded shaders: STATIC_SHADER, CNN_SHADER, DISPLAY_SHADER, LAYER_VIZ_SHADER
-- Shared WGSL component: FULLSCREEN_QUAD_VS (reused across render pipelines)
-- **Embedded default weights:** DEFAULT_WEIGHTS_B64 (base64-encoded binary v2)
-  - Current: 4 layers (3×3, 5×5, 3×3, 3×3), 2496 f16 weights, mip_level=2
-  - Source: `workspaces/main/weights/cnn_v2_weights.bin`
-  - Updates: re-encode the binary with `base64 -i <file>` and update the constant
-- Pure WebGPU (no external dependencies)
-
-### Code Organization
-
-**Recent Refactoring (2026-02-13):**
-- Extracted the `FULLSCREEN_QUAD_VS` constant: reused fullscreen quad vertex shader (2 triangles covering NDC)
-- Added helper methods to the CNNTester class:
-  - `getDimensions()`: returns current source dimensions (video or image)
-  - `setVideoControlsEnabled(enabled)`: centralized video control enable/disable
-- Consolidated duplicate vertex shader code (used in mipmap generation, display, layer visualization)
-- Added section headers in JavaScript for better navigation
-- Improved inline comments explaining the shader architecture
-
-**Benefits:**
-- Reduced code duplication (~40 lines saved)
-- Easier maintenance (single source of truth for the fullscreen quad)
-- Clearer separation of concerns
-
-### Key Components
-
-**1. Weight Parsing**
-- Reads binary format v2: header (20B) + layer info (20B×N) + f16 weights
-- Backward compatible with v1: header (16B), mip_level defaults to 0
-- Computes min/max per layer via f16 unpacking
-- Stores `{ layers[], weights[], mipLevel, fileSize }`
-- Auto-sets the UI mip-level dropdown from loaded weights
-
-**2. CNN Pipeline**
-- Static features computation (RGBD + UV + sin + bias → 8D packed)
-- Layer-by-layer convolution with storage buffer weights
-- Ping-pong buffers for intermediate results
-- Copy to persistent textures for visualization
-
-**3. Visualization Modes**
-
-**Activations Mode:**
-- 4 grayscale views per layer (channels 0-3 of up to 8 total)
-- WebGPU compute → unpack f16 → scale → grayscale
-- Auto-scale: static features = 1.0, CNN layers = 0.2
-- Static features: shows R,G,B,D (first 4 of 8: RGBD+UV+sin+bias)
-- CNN layers: shows the first 4 output channels
-
-**Weights Mode:**
-- 2D canvas rendering per output channel
-- Shows all input kernels horizontally
-- Normalized by layer min/max → [0, 1] → grayscale
-- 20px cells, 2px padding between kernels
-
-### Texture Management
-
-**Persistent Storage (layerTextures[]):**
-- One texture per layer output (static + all CNN layers)
-- `rgba32uint` format (packed f16 data)
-- `COPY_DST` usage for storing results
-
-**Compute Buffers (computeTextures[]):**
-- 2 textures for ping-pong computation
-- Reused across all layers
-- `COPY_SRC` usage for copying to persistent storage
-
-**Pipeline:**
-```
-Static pass → copy to layerTextures[0]
-For each CNN layer i:
-    Compute (ping-pong) → copy to layerTextures[i+1]
-```
-
-### Layer Indexing
-
-**UI Layer Buttons:**
-- "Static" → layerOutputs[0] (8D input features)
-- "Layer 0" → layerOutputs[1] (first CNN layer output, uses weights.layers[0])
-- "Layer 1" → layerOutputs[2] (second CNN layer output, uses weights.layers[1])
-- "Layer N" → layerOutputs[N+1] (CNN layer N output, uses weights.layers[N])
-
-**Weights Table:**
-- "Layer 0" → weights.layers[0] (first CNN layer weights)
-- "Layer 1" → weights.layers[1] (second CNN layer weights)
-- "Layer N" → weights.layers[N]
-
-**Consistency:** Both the UI and the weights table use the same zero-based numbering (Layer 0, 1, 2, ...) for CNN layers.
-
----
-
-## Known Issues
-
-### Issue #1: Layer Activations Show Black
-
-**Symptom:**
-- All 4 channel canvases render black
-- UV gradient test (debug mode 10) works
-- Raw packed data test (mode 11) shows black
-- Unpacked f16 test (mode 12) shows black
-
-**Diagnosis:**
-- Texture access works (UV gradient visible)
-- Texture data is all zeros (packed.x = 0)
-- The textures being read are empty
-
-**Root Cause:**
-- `copyTextureToTexture` operations may not be executing
-- Possible ordering issue (copies not submitted before visualization)
-- Alternative: textures created with the wrong usage flags
-
-**Investigation Steps Taken:**
-1. Added an `onSubmittedWorkDone()` wait before visualization
-2. Verified texture creation with `COPY_SRC` and `COPY_DST` flags
-3. Confirmed separate texture allocation per layer (no aliasing)
-4. Added debug shader modes to isolate the issue
-
-**Next Steps:**
-- Verify the encoder contains copy commands (add debug logging)
-- Check whether compute passes actually write data (add a known-value test)
-- Test copyTextureToTexture in isolation
-- Consider CPU readback to verify texture contents
-
-### Issue #2: Weight Visualization Empty
-
-**Symptom:**
-- Canvases created with correct dimensions (logged)
-- No visual output (black canvases)
-- Console logs show method execution
-
-**Potential Causes:**
-1. Weight indexing calculation incorrect
-2. Canvas not properly attached to the DOM when rendering
-3. 2D context operations not flushing
-4. Min/max normalization producing black (all values equal?)
-
-**Debug Added:**
-- Comprehensive logging of dimensions, indices, ranges
-- Canvas context check before rendering
-
-**Next Steps:**
-- Add a test rendering (fixed gradient) to verify the 2D context works
-- Log sample weight values to verify data access
-- Check whether the canvas is visible in the DOM inspector
-- Verify the min/max calculation produces a valid range
-
----
-
-## UI Layout
-
-### Header
-- Controls: blend slider, depth input, view mode display
-- Drop zone for .bin weight files
-
-### Content Area
-
-**Left Sidebar (300px):**
-1. Drop zone for .bin weight files
-2. Weights Info panel (file size, layer table with min/max)
-3. Weights Visualization panel (per-layer kernel display)
-4. **Mip Level selector** (bottom) - select p0/p1/p2 for static features
-
-**Main Canvas (center):**
-- CNN output display with video controls (Play/Pause, Frame ◄/►)
-- Supports both PNG images and video files (MP4, WebM)
-- Video loops automatically for continuous playback
-
-**Right Sidebar (panels):**
-1. **Layer Visualization Panel** (top, flex: 1)
-   - Layer selection buttons (Static 0-3, Static 4-7, Layer 0, Layer 1, ...)
-   - 2×2 grid of channel views (grayscale activations)
-   - 4× zoom view at the bottom
-
-### Footer
-- Status line (GPU timing, dimensions, mode)
-- Console log (scrollable, color-coded)
-
----
-
-## Shader Details
-
-### LAYER_VIZ_SHADER
-
-**Purpose:** Display a single channel from a packed layer texture
-
-**Inputs:**
-- `@binding(0) layer_tex: texture_2d<u32>` - packed f16 layer data
-- `@binding(1) viz_params: vec2<f32>` - (channel_idx, scale)
-
-**Debug Modes:**
-- Channel 10: UV gradient (texture coordinate test)
-- Channel 11: raw packed u32 data
-- Channel 12: first unpacked f16 value
-
-**Normal Operation:**
-- Unpack all 8 f16 channels from rgba32uint
-- Select a channel by index (0-7)
-- Apply the scale factor (1.0 for static, 0.2 for CNN)
-- Clamp to [0, 1] and output grayscale
-
-**Scale Rationale:**
-- Static features (RGBD, UV): already in the [0, 1] range
-- CNN activations: post-ReLU [0, ~5], need scaling for visibility
-
----
-
-## Binary Weight Format
-
-See `doc/CNN_V2_BINARY_FORMAT.md` for the complete specification.
-
-**Quick Summary:**
-- Header: 20 bytes for v2 (16 bytes for legacy v1): magic, version, layer count, total weights, mip_level (v2 only)
-- Layer info: 20 bytes × N (kernel size, channels, offsets)
-- Weights: packed f16 pairs as u32
-
----
-
-## Testing Workflow
-
-### Load & Parse
-1. Drop a PNG image → displays the original
-2. Drop a .bin weights file → parses and shows the info table
-3. Auto-runs the CNN pipeline
-
-### Verify Pipeline
-1. Check the console for "Running CNN pipeline"
-2. Verify "Completed in Xms"
-3. Check "Layer visualization ready: N layers"
-
-### Debug Activations
-1. Select the "Activations" tab
-2. Click layer buttons to switch
-3. Check the console for texture/canvas logs
-4. If black: note which debug modes work (UV vs data)
-
-### Debug Weights
-1. Select the "Weights" tab
-2. Click Layer 0, Layer 1, ... (the Static views have no learnable weights)
-3. Check the console for "Visualizing Layer N weights"
-4. Check the canvas dimensions logged
-5. Verify the weight range is non-trivial (not [0, 0])
-
----
-
-## Integration with Main Project
-
-**Training Pipeline:**
-```bash
-# Generate weights
-./training/train_cnn_v2.py --export-binary
-
-# Test in browser
-open tools/cnn_v2_test/index.html
-# Drop: workspaces/main/cnn_v2_weights.bin
-# Drop: training/input/test.png
-```
-
-**Validation:**
-- Compare against the demo CNNv2Effect (visual check)
-- Verify the layer count matches the binary file
-- Check weight ranges match the training logs
-
----
-
-## Future Enhancements
-
-- [ ] Fix layer activation visualization (black texture issue)
-- [ ] Fix weight kernel display (empty canvas issue)
-- [ ] Add per-channel auto-scaling (compute min/max from visible data)
-- [ ] Export rendered outputs (download PNG)
-- [ ] Side-by-side comparison with the original
-- [ ] Heatmap mode (color-coded activations)
-- [ ] Weight statistics overlay (mean, std, sparsity)
-- [ ] Batch processing (multiple images in sequence)
-- [ ] Integration with Python training (live reload)
-
----
-
-## Code Metrics
-
-- Total lines: ~1100
-- JavaScript: ~700 lines
-- WGSL shaders: ~300 lines
-- HTML/CSS: ~100 lines
-
-**Dependencies:** None (pure WebGPU + HTML5)
-
----
-
-## Related Files
-
-- `doc/CNN_V2.md` - CNN v2 architecture and design
-- `doc/CNN_TEST_TOOL.md` - C++ offline testing tool (deprecated)
-- `training/train_cnn_v2.py` - Training script with binary export
-- `workspaces/main/cnn_v2_weights.bin` - Trained weights
diff --git a/doc/HOWTO.md b/doc/HOWTO.md
index 0dc9ec7..a309b27 100644
--- a/doc/HOWTO.md
+++ b/doc/HOWTO.md
@@ -145,31 +145,31 @@ Enhanced CNN with parametric static features (7D input: RGBD + UV + sin encoding
 **Complete Pipeline** (recommended):
 ```bash
 # Train → Export → Build → Validate (default config)
-./scripts/train_cnn_v2_full.sh
+./cnn_v2/scripts/train_cnn_v2_full.sh
 
 # Rapid debug (1 layer, 3×3, 5 epochs)
-./scripts/train_cnn_v2_full.sh --num-layers 1 --kernel-sizes 3 --epochs 5 --output-weights test.bin
+./cnn_v2/scripts/train_cnn_v2_full.sh --num-layers 1 --kernel-sizes 3 --epochs 5 --output-weights test.bin
 
 # Custom training parameters
-./scripts/train_cnn_v2_full.sh --epochs 500 --batch-size 32 --checkpoint-every 100
+./cnn_v2/scripts/train_cnn_v2_full.sh --epochs 500 --batch-size 32 --checkpoint-every 100
 
 # Custom architecture
-./scripts/train_cnn_v2_full.sh --kernel-sizes 3,5,3 --num-layers 3 --mip-level 1
+./cnn_v2/scripts/train_cnn_v2_full.sh --kernel-sizes 3,5,3 --num-layers 3 --mip-level 1
 
 # Custom output path
-./scripts/train_cnn_v2_full.sh --output-weights workspaces/test/cnn_weights.bin
+./cnn_v2/scripts/train_cnn_v2_full.sh --output-weights workspaces/test/cnn_weights.bin
 
 # Grayscale loss (compute loss on luminance instead of RGBA)
-./scripts/train_cnn_v2_full.sh --grayscale-loss
+./cnn_v2/scripts/train_cnn_v2_full.sh --grayscale-loss
 
 # Custom directories
-./scripts/train_cnn_v2_full.sh --input training/input --target training/target_2
+./cnn_v2/scripts/train_cnn_v2_full.sh --input training/input --target training/target_2
 
 # Full-image mode (instead of patch-based)
-./scripts/train_cnn_v2_full.sh --full-image --image-size 256
+./cnn_v2/scripts/train_cnn_v2_full.sh --full-image --image-size 256
 
 # See all options
-./scripts/train_cnn_v2_full.sh --help
+./cnn_v2/scripts/train_cnn_v2_full.sh --help
 ```
 
 **Defaults:** 200 epochs, 3×3 kernels, 8→4→4 channels, batch-size 16, patch-based (8×8, harris detector).
@@ -184,33 +184,33 @@ Enhanced CNN with parametric static features (7D input: RGBD + UV + sin encoding
 **Validation Only** (skip training):
 ```bash
 # Use latest checkpoint
-./scripts/train_cnn_v2_full.sh --validate
+./cnn_v2/scripts/train_cnn_v2_full.sh --validate
 
 # Use specific checkpoint
-./scripts/train_cnn_v2_full.sh --validate checkpoints/checkpoint_epoch_50.pth
+./cnn_v2/scripts/train_cnn_v2_full.sh --validate checkpoints/checkpoint_epoch_50.pth
 ```
 
 **Manual Training:**
 ```bash
 # Default config
-./training/train_cnn_v2.py \
+./cnn_v2/training/train_cnn_v2.py \
   --input training/input/ --target training/target_2/ \
   --epochs 100 --batch-size 16 --checkpoint-every 5
 
 # Custom architecture (per-layer kernel sizes)
-./training/train_cnn_v2.py \
+./cnn_v2/training/train_cnn_v2.py \
   --input training/input/ --target training/target_2/ \
   --kernel-sizes 1,3,5 \
  --epochs 5000 --batch-size 16
 
 # Mip-level for p0-p3 features (0=original, 1=half, 2=quarter, 3=eighth)
-./training/train_cnn_v2.py \
+./cnn_v2/training/train_cnn_v2.py \
  --input training/input/ --target training/target_2/ \
  --mip-level 1 \
  --epochs 100 --batch-size 16
 
 # Grayscale loss (compute loss on luminance Y = 0.299*R + 0.587*G + 0.114*B)
-./training/train_cnn_v2.py \
+./cnn_v2/training/train_cnn_v2.py \
  --input training/input/ --target training/target_2/ \
  --grayscale-loss \
  --epochs 100 --batch-size 16
@@ -236,7 +236,7 @@ Use `--quiet` for streamlined output in scripts (used automatically by train_cnn
 ```
 
-**Validation:** Use HTML tool (`tools/cnn_v2_test/index.html`) for CNN v2 validation. See `doc/CNN_V2_WEB_TOOL.md`.
+**Validation:** Use HTML tool (`cnn_v2/tools/cnn_v2_test/index.html`) for CNN v2 validation. See `cnn_v2/docs/CNN_V2_WEB_TOOL.md`.
 
 ---
