Diffstat (limited to 'doc')
| Mode | File | Lines changed |
|------|------|---------------|
| -rw-r--r-- | doc/CNN_TEST_TOOL.md | 19 |
| -rw-r--r-- | doc/CNN_V2.md | 39 |
| -rw-r--r-- | doc/CNN_V2_DEBUG_TOOLS.md | 143 |
| -rw-r--r-- | doc/COMPLETED.md | 9 |
| -rw-r--r-- | doc/HOWTO.md | 21 |
5 files changed, 222 insertions, 9 deletions
diff --git a/doc/CNN_TEST_TOOL.md b/doc/CNN_TEST_TOOL.md
index 82d5799..4307894 100644
--- a/doc/CNN_TEST_TOOL.md
+++ b/doc/CNN_TEST_TOOL.md
@@ -41,10 +41,11 @@ Standalone tool for validating trained CNN shaders with GPU-to-CPU readback. Sup
 cnn_test input.png output.png [OPTIONS]
 
 OPTIONS:
-  --cnn-version N         CNN version: 1 (default) or 2
+  --cnn-version N         CNN version: 1 (default) or 2 (ignored with --weights)
+  --weights PATH          Load weights from .bin (forces CNN v2, overrides layer config)
   --blend F               Final blend amount (0.0-1.0, default: 1.0)
   --format ppm|png        Output format (default: png)
-  --layers N              Number of CNN layers (1-10, v1 only, default: 3)
+  --layers N              Number of CNN layers (1-10, v1 only, default: 3, ignored with --weights)
   --save-intermediates DIR  Save intermediate layers to directory
   --debug-hex             Print first 8 pixels as hex (debug)
   --help                  Show usage
@@ -55,9 +56,12 @@ OPTIONS:
 # CNN v1 (render pipeline, 3 layers)
 ./build/cnn_test input.png output.png --cnn-version 1
 
-# CNN v2 (compute, storage buffer, dynamic layers)
+# CNN v2 (compute, storage buffer, uses asset system weights)
 ./build/cnn_test input.png output.png --cnn-version 2
 
+# CNN v2 with runtime weight loading (loads layer config from .bin)
+./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin
+
 # 50% blend with original (v2)
 ./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5
 
@@ -65,6 +69,8 @@ OPTIONS:
 ./build/cnn_test input.png output.png --cnn-version 2 --debug-hex
 ```
 
+**Important:** When using `--weights`, the layer count and kernel sizes are read from the binary file header, overriding any `--layers` or `--cnn-version` arguments.
+
 ---
 
 ## Implementation Details
@@ -119,6 +125,13 @@ std::vector<uint8_t> OffscreenRenderTarget::read_pixels() {
 
 **Binary format:** Header (20B) + layer info (20B×N) + f16 weights
 
+**Weight Loading:**
+- **Without `--weights`:** Loads from asset system (`ASSET_WEIGHTS_CNN_V2`)
+- **With `--weights PATH`:** Loads from external `.bin` file (e.g., checkpoint exports)
+  - Layer count and kernel sizes parsed from binary header
+  - Overrides any `--layers` or `--cnn-version` arguments
+  - Enables runtime testing of training checkpoints without rebuild
+
 ---
 
 ## Build Integration
diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
index 577cf9e..2d1d4c4 100644
--- a/doc/CNN_V2.md
+++ b/doc/CNN_V2.md
@@ -18,15 +18,15 @@ CNN v2 extends the original CNN post-processing effect with parametric static fe
 - Bias integrated as static feature dimension
 - Storage buffer architecture (dynamic layer count)
 - Binary weight format v2 for runtime loading
+- Sigmoid activation for layer 0 and final layer (smooth [0,1] mapping)
 
-**Status:** ✅ Complete. Training pipeline functional, validation tools ready, mip-level support integrated.
+**Status:** ✅ Complete. Sigmoid activation, stable training, validation tools operational.
 
-**Known Issues:**
-- ⚠️ **cnn_test output differs from HTML validation tool** - Visual discrepancy remains after fixing uv_y inversion and Layer 0 activation. Root cause under investigation. Both tools should produce identical output given same weights/input.
+**Breaking Change:**
+- Models trained with `clamp()` are incompatible; retraining is required.
 
 **TODO:**
 - 8-bit quantization with QAT for 2× size reduction (~1.6 KB)
-- Debug cnn_test vs HTML tool output difference
 
 ---
 
@@ -106,6 +106,12 @@ Input RGBD → Static Features Compute → CNN Layers → Output RGBA
 - All layers: uniform 12D input, 4D output (ping-pong buffer)
 - Storage: `texture_storage_2d<rgba32uint>` (4 channels as 2×f16 pairs)
 
+**Activation Functions:**
+- Layer 0 & final layer: `sigmoid(x)` for smooth [0,1] mapping
+- Middle layers: `ReLU` (max(0, x))
+- Rationale: `clamp()` passes zero gradient outside [0,1], which blocks learning for saturated outputs; sigmoid has a nonzero gradient everywhere, enabling better convergence
+- Breaking change: Models trained with `clamp(x, 0, 1)` are incompatible; retraining is required
+
 ---
 
 ## Static Features (7D + 1 bias)
@@ -136,6 +142,27 @@ let bias = 1.0; // Learned bias per output channel
 // Packed storage: [p0, p1, p2, p3, uv.x, uv.y, sin(20*uv.y), 1.0]
 ```
 
+### Input Channel Mapping
+
+**Weight tensor layout (12 input channels per layer):**
+
+| Input Channel | Feature | Description |
+|--------------|---------|-------------|
+| 0-3 | Previous layer output | 4D RGBA from prior CNN layer (or input RGBD for Layer 0) |
+| 4-11 | Static features | 8D: p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias |
+
+**Static feature channel details:**
+- Channel 4 → p0 (RGB.r from mip level)
+- Channel 5 → p1 (RGB.g from mip level)
+- Channel 6 → p2 (RGB.b from mip level)
+- Channel 7 → p3 (depth or RGB channel from mip level)
+- Channel 8 → p4 (uv_x: normalized horizontal position)
+- Channel 9 → p5 (uv_y: normalized vertical position)
+- Channel 10 → p6 (sin(20*uv_y): periodic encoding)
+- Channel 11 → p7 (bias: constant 1.0)
+
+**Note:** When generating identity weights, p4-p7 correspond to input channels 8-11, not 4-7.
+
 ### Feature Rationale
 
 | Feature | Dimension | Purpose | Priority |
@@ -311,7 +338,7 @@ class CNNv2(nn.Module):
         # Layer 0: input RGBD (4D) + static (8D) = 12D
         x = torch.cat([input_rgbd, static_features], dim=1)
         x = self.layers[0](x)
-        x = torch.clamp(x, 0, 1)  # Output layer 0 (4 channels)
+        x = torch.sigmoid(x)  # Soft [0,1] for layer 0
 
         # Layer 1+: previous output (4D) + static (8D) = 12D
         for i in range(1, len(self.layers)):
@@ -320,7 +347,7 @@ class CNNv2(nn.Module):
             if i < len(self.layers) - 1:
                 x = F.relu(x)
             else:
-                x = torch.clamp(x, 0, 1)  # Final output [0,1]
+                x = torch.sigmoid(x)  # Soft [0,1] for final layer
 
         return x  # RGBA output
 ```
diff --git a/doc/CNN_V2_DEBUG_TOOLS.md b/doc/CNN_V2_DEBUG_TOOLS.md
new file mode 100644
index 0000000..8d1289a
--- /dev/null
+++ b/doc/CNN_V2_DEBUG_TOOLS.md
@@ -0,0 +1,143 @@
+# CNN v2 Debugging Tools
+
+Tools for investigating the CNN v2 output mismatch between the HTML tool and cnn_test.
+
+---
+
+## Identity Weight Generator
+
+**Purpose:** Generate trivial .bin files with identity passthrough for debugging.
+
+**Script:** `training/gen_identity_weights.py`
+
+**Usage:**
+```bash
+# 1×1 identity (default)
+./training/gen_identity_weights.py workspaces/main/weights/cnn_v2_identity.bin
+
+# 3×3 identity
+./training/gen_identity_weights.py workspaces/main/weights/cnn_v2_identity_3x3.bin --kernel-size 3
+
+# Mix mode: 50-50 blend (0.5*p0+0.5*p4, etc)
+./training/gen_identity_weights.py output.bin --mix
+
+# Static features only: p4→ch0, p5→ch1, p6→ch2, p7→ch3
+./training/gen_identity_weights.py output.bin --p47
+
+# Custom mip level
+./training/gen_identity_weights.py output.bin --kernel-size 1 --mip-level 2
+```
+
+**Output:**
+- Single layer, 12D→4D (4 input channels + 8 static features)
+- Identity mode: Output Ch{0,1,2,3} = Input Ch{0,1,2,3}
+- Mix mode (--mix): Output Ch{i} = 0.5*Input Ch{i} + 0.5*Input Ch{i+4} (50-50 blend, avoids overflow)
+- Static mode (--p47): Output Ch{i} = Input Ch{i+8} (static features only, visualizes p4-p7, which live in input channels 8-11)
+- Minimal file size (~136 bytes for 1×1, ~904 bytes for 3×3)
+
+**Validation:**
+Load in the HTML tool or cnn_test: the output should match the input (RGB only, ignoring static features).
+
+---
+
+## Composited Layer Visualization
+
+**Purpose:** Save the current layer view as a single composited image (4 channels side by side, grayscale).
+
+**Location:** HTML tool - "Layer Visualization" panel
+
+**Usage:**
+1. Load image + weights in HTML tool
+2. Select layer to visualize (Static 0-3, Static 4-7, Layer 0, Layer 1, etc.)
+3. Click "Save Composited" button
+4. Downloads PNG: `composited_layer{N}_{W}x{H}.png`
+
+**Output:**
+- 4 channels stacked horizontally
+- Grayscale representation
+- Useful for comparing layer activations across tools
+
+---
+
+## Debugging Strategy
+
+### Track a) Binary Conversion Chain
+
+**Hypothesis:** Conversion error in .bin ↔ base64 ↔ Float32Array
+
+**Test:**
+1. Generate identity weights:
+   ```bash
+   ./training/gen_identity_weights.py workspaces/main/weights/test_identity.bin
+   ```
+
+2. Load in the HTML tool; the output should match the input RGB
+
+3. If mismatch:
+   - Check Python export: f16 packing in `export_cnn_v2_weights.py` line 105
+   - Check HTML parsing: `unpackF16()` in `index.html` lines 805-815
+   - Check weight indexing: `get_weight()` shader function
+
+**Key locations:**
+- Python: `np.float16` → `view(np.uint32)` (line 105 of export script)
+- JS: `DataView` → `unpackF16()` → manual f16 decode (lines 773-803)
+- WGSL: `unpack2x16float()` built-in (line 492 of shader)
+
+### Track b) Layer Visualization
+
+**Purpose:** Confirm that layer outputs match between the HTML tool and the C++ cnn_test
+
+**Method:**
+1. Run identical input through both tools
+2. Save composited layers from HTML tool
+3. Compare with cnn_test output
+4. Use identity weights to isolate weight loading from computation
+
+### Track c) Trivial Test Case
+
+**Use identity weights to test:**
+- Weight loading (binary parsing)
+- Feature generation (static features)
+- Convolution (should be passthrough)
+- Output packing
+
+**Expected behavior:**
+- Input RGB → Output RGB (exact match)
+- Static features ignored (all zeros in identity matrix)
+
+---
+
+## Known Issues
+
+### ~~Layer 0 Visualization Scale~~ [FIXED]
+
+**Issue:** Layer 0 output displayed at 0.5× brightness (divided by 2).
+
+**Cause:** Line 1530 used `vizScale = 0.5` for all CNN layers, but Layer 0 output is already bounded to [0,1] and doesn't need dimming.
+
+**Fix:** Use scale 1.0 for Layer 0 output (layerIdx=1), 0.5 only for middle layers (ReLU, unbounded).
+
+### Remaining Mismatch
+
+**Current:** HTML tool and cnn_test produce different outputs for the same input/weights.
+
+**Suspects:**
+1. F16 unpacking difference (CPU vs GPU vs JS)
+2. Static feature generation (RGBD, UV, sin encoding)
+3. Convolution kernel iteration order
+4. Output packing/unpacking
+
+**Next steps:**
+1. Test with identity weights (eliminates weight loading)
+2. Compare composited layer outputs
+3. Add debug visualization for static features
+4. Hex dump comparison of the first 8 pixels (use the `--debug-hex` flag in cnn_test)
+
+---
+
+## Related Documentation
+
+- `doc/CNN_V2.md` - CNN v2 architecture
+- `doc/CNN_V2_WEB_TOOL.md` - HTML tool documentation
+- `doc/CNN_TEST_TOOL.md` - cnn_test CLI tool
+- `training/export_cnn_v2_weights.py` - Binary export format
diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md
index 01c4408..c7b2cae 100644
--- a/doc/COMPLETED.md
+++ b/doc/COMPLETED.md
@@ -455,3 +455,12 @@ Use `read @doc/archive/FILENAME.md` to access archived documents.
 - **test_mesh tool**: Implemented a standalone `test_mesh` tool for visualizing OBJ files with debug normal display.
 - **Task #39: Visual Debugging System**: Implemented a comprehensive set of wireframe primitives (Sphere, Cone, Cross, Line, Trajectory) in `VisualDebug`. Updated `test_3d_render` to demonstrate usage.
 - **Task #68: Mesh Wireframe Rendering**: Added `add_mesh_wireframe` to `VisualDebug` to visualize triangle edges for mesh objects. Integrated into `Renderer3D` debug path and `test_mesh` tool.
+
+#### CNN v2 Training Pipeline Improvements (February 14, 2026) 🎯
+- **Critical Training Fixes**: Resolved checkpoint-saving and argument-handling bugs in the CNN v2 training pipeline. **Bug 1 (Missing Checkpoints)**: Training completed successfully, but no checkpoint was saved when `epochs < checkpoint_every`. Solution: Always save a final checkpoint after training completes, regardless of the interval setting. **Bug 2 (Stale Checkpoints)**: Old checkpoint files from previous runs with different parameters weren't overwritten because of an `if not exists` check. Solution: Remove the existence check and always overwrite the final checkpoint. **Bug 3 (Ignored num_layers)**: When comma-separated kernel sizes were provided (e.g., `--kernel-sizes 3,1,3`), `--num-layers` was only used for validation instead of being derived from the list length. Solution: Derive `num_layers` from the kernel_sizes list length when multiple values are provided. **Bug 4 (Argument Passing)**: The shell script passed unquoted variables to Python, which could cause parsing issues with special characters. Solution: Quote all shell variables passed to Python scripts.
+
+- **Output Streamlining**: Reduced verbose training pipeline output by roughly 70%. **Export Section**: Added a `--quiet` flag to `export_cnn_v2_weights.py`, producing a single-line summary instead of a detailed layer-by-layer breakdown (e.g., "Exported 3 layers, 912 weights, 1904 bytes → test.bin"). **Validation Section**: Changed from printing 10+ lines per image (loading, processing, saving) to a compact single-line format listing all images at once (e.g., "Processing images: img_000 img_001 img_002 ✓"). **Result**: Training pipeline output reduced from ~100 lines to ~30 lines while preserving essential information, which makes rapid iteration more pleasant.
+
+- **Documentation Updates**: Updated the CNN v2 training section of `doc/HOWTO.md` to document the new behavior: always saves a final checkpoint, derives num_layers from the kernel_sizes list, and uses streamlined output with the `--quiet` flag. Added examples for both verbose and quiet export modes.
+
+- **Files Modified**: `training/train_cnn_v2.py` (checkpoint saving logic, num_layers derivation), `scripts/train_cnn_v2_full.sh` (variable quoting, validation output, checkpoint validation), `training/export_cnn_v2_weights.py` (--quiet flag support), `doc/HOWTO.md` (documentation). **Impact**: The training pipeline is now robust for rapid experimentation with different architectures and no longer requires manual checkpoint management or workarounds for short training runs.
diff --git a/doc/HOWTO.md b/doc/HOWTO.md
index 85ce801..506bf0a 100644
--- a/doc/HOWTO.md
+++ b/doc/HOWTO.md
@@ -139,12 +139,18 @@ Enhanced CNN with parametric static features (7D input: RGBD + UV + sin encoding
 # Train → Export → Build → Validate (default config)
 ./scripts/train_cnn_v2_full.sh
 
+# Rapid debug (1 layer, 3×3, 5 epochs)
+./scripts/train_cnn_v2_full.sh --num-layers 1 --kernel-sizes 3 --epochs 5 --output-weights test.bin
+
 # Custom training parameters
 ./scripts/train_cnn_v2_full.sh --epochs 500 --batch-size 32 --checkpoint-every 100
 
 # Custom architecture
 ./scripts/train_cnn_v2_full.sh --kernel-sizes 3,5,3 --num-layers 3 --mip-level 1
 
+# Custom output path
+./scripts/train_cnn_v2_full.sh --output-weights workspaces/test/cnn_weights.bin
+
 # Grayscale loss (compute loss on luminance instead of RGBA)
 ./scripts/train_cnn_v2_full.sh --grayscale-loss
 
@@ -160,8 +166,11 @@ Enhanced CNN with parametric static features (7D input: RGBD + UV + sin encoding
 
 **Defaults:** 200 epochs, 3×3 kernels, 8→4→4 channels, batch-size 16, patch-based (8×8, harris detector).
 - Live progress with single-line update
+- Always saves final checkpoint (regardless of --checkpoint-every interval)
+- When multiple kernel sizes are provided (e.g., 3,5,3), num_layers is derived from the list length
 - Validates all input images on final epoch
 - Exports binary weights (storage buffer architecture)
+- Streamlined output: single-line export summary, compact validation
 - All parameters configurable via command-line
 
 **Validation Only** (skip training):
@@ -201,12 +210,19 @@ Enhanced CNN with parametric static features (7D input: RGBD + UV + sin encoding
 
 **Export Binary Weights:**
 ```bash
+# Verbose output (shows all layer details)
 ./training/export_cnn_v2_weights.py checkpoints/checkpoint_epoch_100.pth \
   --output-weights workspaces/main/cnn_v2_weights.bin
+
+# Quiet mode (single-line summary)
+./training/export_cnn_v2_weights.py checkpoints/checkpoint_epoch_100.pth \
+  --output-weights workspaces/main/cnn_v2_weights.bin \
+  --quiet
 ```
 
 Generates binary format: header + layer info + f16 weights (~3.2 KB for 3-layer model).
 Storage buffer architecture allows dynamic layer count.
+Use `--quiet` for streamlined output in scripts (used automatically by train_cnn_v2_full.sh).
 
 **TODO:** 8-bit quantization for 2× size reduction (~1.6 KB). Requires quantization-aware training (QAT).
 
@@ -268,6 +284,9 @@ See `doc/ASSET_SYSTEM.md` and `doc/WORKSPACE_SYSTEM.md`.
 # CNN v2 (recommended, fully functional)
 ./build/cnn_test input.png output.png --cnn-version 2
 
+# CNN v2 with runtime weight loading (loads layer config from .bin)
+./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin
+
 # CNN v1 (produces incorrect output, debug only)
 ./build/cnn_test input.png output.png --cnn-version 1
 
@@ -282,6 +301,8 @@ See `doc/ASSET_SYSTEM.md` and `doc/WORKSPACE_SYSTEM.md`.
 - **CNN v2:** ✅ Fully functional, matches CNNv2Effect
 - **CNN v1:** ⚠️ Produces incorrect output, use CNNEffect in demo for validation
 
+**Note:** `--weights` loads layer count and kernel sizes from the binary file, overriding `--layers` and forcing CNN v2.
+
 See `doc/CNN_TEST_TOOL.md` for full documentation.
 
 ---
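The byte counts quoted in this commit (about 136 bytes for a 1×1 identity file, about 904 bytes for 3×3, and 1904 bytes in the `--quiet` export example) can be checked against the documented layout: a 20-byte header, 20 bytes of layer info per layer, then f16 weights for a 12-input, 4-output convolution per layer. The sketch below is a hypothetical size model, not code from the repository; it assumes no padding, and a 3,1,3 kernel layout is simply one configuration that reproduces the 912-weight example.

```python
# Hypothetical size model for CNN v2 .bin files, assuming the documented
# layout: 20-byte header + 20 bytes of layer info per layer + f16 weights,
# with every layer mapping 12 input channels to 4 output channels.
HEADER_BYTES = 20
LAYER_INFO_BYTES = 20
IN_CHANNELS = 12   # 4 previous-layer channels + 8 static features
OUT_CHANNELS = 4
F16_BYTES = 2


def weight_count(kernel_sizes):
    """Total f16 weights: in_channels * out_channels * k * k per layer."""
    return sum(IN_CHANNELS * OUT_CHANNELS * k * k for k in kernel_sizes)


def file_size(kernel_sizes):
    """Predicted .bin size in bytes (no padding assumed)."""
    return (HEADER_BYTES
            + LAYER_INFO_BYTES * len(kernel_sizes)
            + F16_BYTES * weight_count(kernel_sizes))


def quiet_summary(kernel_sizes, path):
    """Summary line in the style of the `--quiet` export example."""
    return (f"Exported {len(kernel_sizes)} layers, "
            f"{weight_count(kernel_sizes)} weights, "
            f"{file_size(kernel_sizes)} bytes → {path}")


if __name__ == "__main__":
    print(file_size([1]))                        # 136
    print(file_size([3]))                        # 904
    print(quiet_summary([3, 1, 3], "test.bin"))  # Exported 3 layers, 912 weights, 1904 bytes → test.bin
```

A mismatch between this prediction and an actual file size is a cheap first signal that the header or layer-info parsing disagrees with the exporter.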
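Track a) suspects the f16 chain: Python `view(np.uint32)`, a hand-written JS `unpackF16()`, and WGSL `unpack2x16float()`. A small Python reference makes it possible to check any decoder bit for bit. This sketch assumes a little-endian host, where the first f16 of each pair lands in the low 16 bits, which is also the half that `unpack2x16float()` returns as component 0. `pack_pair`, `decode_f16`, and `unpack_pair` are illustrative names, not functions from the repository.

```python
import struct


def pack_pair(lo, hi):
    """Pack two floats as f16 into one little-endian u32 (`lo` -> low 16 bits)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def decode_f16(bits):
    """Manual IEEE 754 binary16 decode of a 16-bit integer."""
    sign = -1.0 if bits & 0x8000 else 1.0
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if exponent == 0:                      # zero or subnormal: frac * 2^-24
        return sign * fraction * 2.0 ** -24
    if exponent == 0x1F:                   # infinity or NaN
        return (sign * float("inf")) if fraction == 0 else float("nan")
    return sign * (1.0 + fraction / 1024.0) * 2.0 ** (exponent - 15)


def unpack_pair(word):
    """Inverse of pack_pair, in the same component order as unpack2x16float()."""
    return decode_f16(word & 0xFFFF), decode_f16(word >> 16)


if __name__ == "__main__":
    word = pack_pair(0.5, -1.25)
    print(hex(word))          # 0xbd003800
    print(unpack_pair(word))  # (0.5, -1.25)
```

Comparing `decode_f16` against `struct`'s `'e'` format over all 65,536 bit patterns validates the manual decoder; the same table can then be used to check a JS `unpackF16()` port.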
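The activation rule in CNN_V2.md (sigmoid on layer 0 and on the final layer, ReLU in between) and the gradient argument for dropping `clamp()` can be illustrated without a framework. In this sketch, hand-written derivatives stand in for autograd, and all names are illustrative rather than taken from `train_cnn_v2.py`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)                    # mathematically > 0 for all x


def clamp_grad(x):
    # Gradient of clamp(x, 0, 1): 1 inside [0, 1], 0 once the value saturates.
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def activation_name(layer_idx, num_layers):
    """Layer 0 and the final layer use sigmoid; middle layers use ReLU."""
    if layer_idx == 0 or layer_idx == num_layers - 1:
        return "sigmoid"
    return "relu"


if __name__ == "__main__":
    print([activation_name(i, 3) for i in range(3)])  # ['sigmoid', 'relu', 'sigmoid']
    for x in (-2.0, 0.5, 1.5):
        print(x, clamp_grad(x), sigmoid_grad(x))
```

Under this rule a one-layer network, such as the weights produced by `gen_identity_weights.py`, counts as both layer 0 and the final layer, so `activation_name(0, 1)` returns `'sigmoid'`.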
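The input-channel table in CNN_V2.md can be expressed as a small function that assembles the 12 values a layer reads at one pixel. This commit does not specify how UV is normalized, so the pixel-center convention `(x + 0.5) / width` is an assumption, as are all function names.

```python
import math


def static_features(mip_p, x, y, width, height):
    """8 static features: [p0, p1, p2, p3, uv_x, uv_y, sin(20*uv_y), 1.0]."""
    uv_x = (x + 0.5) / width                 # assumed pixel-center UV
    uv_y = (y + 0.5) / height
    p0, p1, p2, p3 = mip_p                   # p0..p3 sampled from the mip level
    return [p0, p1, p2, p3, uv_x, uv_y, math.sin(20.0 * uv_y), 1.0]


def layer_input(prev_out, statics):
    """Channels 0-3: previous layer output (input RGBD for layer 0); 4-11: statics."""
    if len(prev_out) != 4 or len(statics) != 8:
        raise ValueError("expected 4 previous-layer values and 8 static features")
    return list(prev_out) + list(statics)


def channel_of_p(k):
    """Input channel holding static feature p_k: p0 -> 4, ..., p4 -> 8, p7 -> 11."""
    if not 0 <= k < 8:
        raise ValueError("static features are p0..p7")
    return 4 + k


if __name__ == "__main__":
    s = static_features((0.1, 0.2, 0.3, 0.4), x=0, y=0, width=2, height=2)
    print(len(layer_input((0.0, 0.0, 0.0, 0.0), s)))   # 12
    print([channel_of_p(k) for k in range(4, 8)])      # [8, 9, 10, 11]
```

The last line is the pitfall called out in the channel-mapping note: p4-p7 sit at input channels 8-11, not 4-7.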
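The three identity-generator modes reduce to placing 1.0 or 0.5 at the center tap of a few input/output channel pairs. This is a hypothetical re-creation, not `training/gen_identity_weights.py` itself: the `[out][in][tap]` ordering is an assumption, file writing is omitted, mix mode follows the documented `0.5*Input Ch{i} + 0.5*Input Ch{i+4}` formula, and the `p47` mode reads input channels 8-11, following the CNN_V2.md note that p4-p7 live there.

```python
IN_CH, OUT_CH = 12, 4


def identity_weights(kernel_size=1, mode="identity"):
    """Return weights as w[out][in][tap], taps in row-major k×k order."""
    taps = kernel_size * kernel_size
    center = taps // 2                     # middle tap of the k×k window
    w = [[[0.0] * taps for _ in range(IN_CH)] for _ in range(OUT_CH)]
    for o in range(OUT_CH):
        if mode == "identity":             # out ch o = in ch o
            w[o][o][center] = 1.0
        elif mode == "mix":                # 0.5 * in ch o + 0.5 * in ch o+4
            w[o][o][center] = 0.5
            w[o][o + 4][center] = 0.5
        elif mode == "p47":                # out ch o = static p(4+o) = in ch o+8
            w[o][o + 8][center] = 1.0
        else:
            raise ValueError(f"unknown mode: {mode}")
    return w


if __name__ == "__main__":
    w3 = identity_weights(3)
    print(sum(len(taps) for out in w3 for taps in out))  # 432 weights for 3×3
```

The 432 weights for the 3×3 case (864 bytes as f16) plus the 40 bytes of header and layer info match the ~904-byte figure quoted for the 3×3 identity file.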
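Two of the `train_cnn_v2.py` fixes listed in COMPLETED.md are small pieces of control logic. The sketch below is illustrative only, with hypothetical function names and signatures: a comma-separated `--kernel-sizes` list defines the layer count, a single value is assumed to be repeated `num_layers` times, and the final epoch is always among the saved checkpoints, even when `epochs < checkpoint_every`.

```python
def resolve_kernel_sizes(kernel_sizes_arg, num_layers):
    """'3,1,3' -> [3, 1, 3] (num_layers ignored); '3' -> [3] * num_layers."""
    sizes = [int(s) for s in kernel_sizes_arg.split(",")]
    if len(sizes) > 1:
        return sizes                      # list length defines the layer count
    return sizes * num_layers             # assumed: single value is repeated


def checkpoint_epochs(epochs, checkpoint_every):
    """Epochs at which a checkpoint is written: every N epochs, plus the last."""
    saves = list(range(checkpoint_every, epochs + 1, checkpoint_every))
    if not saves or saves[-1] != epochs:
        saves.append(epochs)              # final checkpoint is always written
    return saves


if __name__ == "__main__":
    print(resolve_kernel_sizes("3,5,3", 1))   # [3, 5, 3]
    print(checkpoint_epochs(5, 100))          # [5]
```

The second example corresponds to the "Rapid debug" HOWTO command (5 epochs), which previously produced no checkpoint at all with the default interval.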
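For the "hex dump comparison (first 8 pixels)" step, a helper that formats two RGBA8 buffers the same way and reports the first differing pixel keeps the comparison mechanical. Tightly packed 4-byte pixels are an assumption, and this does not reproduce the actual output format of cnn_test's `--debug-hex`.

```python
def hex_pixels(buf, count=8):
    """Hex string per pixel for the first `count` RGBA8 pixels."""
    return [bytes(buf[i * 4:(i + 1) * 4]).hex() for i in range(count)]


def first_mismatch(buf_a, buf_b, count=8):
    """Index of the first differing pixel among the first `count`, or None."""
    pairs = zip(hex_pixels(buf_a, count), hex_pixels(buf_b, count))
    for i, (a, b) in enumerate(pairs):
        if a != b:
            return i
    return None


if __name__ == "__main__":
    a = bytes(range(32))
    b = bytearray(a)
    b[13] ^= 0xFF                 # corrupt one byte of pixel 3
    print(hex_pixels(a)[:2])      # ['00010203', '04050607']
    print(first_mismatch(a, b))   # 3
```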
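A "Save Composited" image can be modelled as four H×W channel planes laid side by side into one H×4W grayscale image. This is a sketch of the idea, not the HTML tool's code: the `scale` argument mirrors the vizScale fix (1.0 for bounded outputs, 0.5 for unbounded middle ReLU layers), and clamping to [0, 1] before quantizing to 8 bits is an assumption.

```python
def composite(channels, scale=1.0):
    """Place 4 planes (lists of rows) side by side as 8-bit grayscale rows."""
    if len(channels) != 4:
        raise ValueError("expected exactly 4 channel planes")
    height = len(channels[0])
    rows = []
    for y in range(height):
        row = []
        for plane in channels:
            for v in plane[y]:
                v = min(max(v * scale, 0.0), 1.0)   # assumed clamp before quantizing
                row.append(round(v * 255))
        rows.append(row)
    return rows


if __name__ == "__main__":
    planes = [[[0.0, 1.0]], [[0.5, 0.5]], [[2.0, -1.0]], [[1.0, 0.0]]]
    print(len(composite(planes)[0]))  # 8 pixels: 4 planes × 2 columns
```

Composites produced this way from both tools can then be compared pixel by pixel, provided both sides use the same scale for the layer being inspected.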
