diff options
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/CNN_TEST_TOOL.md | 19 | ||||
| -rw-r--r-- | doc/CNN_V2.md | 39 | ||||
| -rw-r--r-- | doc/CNN_V2_DEBUG_TOOLS.md | 11 | ||||
| -rw-r--r-- | doc/COMPLETED.md | 9 | ||||
| -rw-r--r-- | doc/HOWTO.md | 21 |
5 files changed, 88 insertions, 11 deletions
diff --git a/doc/CNN_TEST_TOOL.md b/doc/CNN_TEST_TOOL.md index 82d5799..4307894 100644 --- a/doc/CNN_TEST_TOOL.md +++ b/doc/CNN_TEST_TOOL.md @@ -41,10 +41,11 @@ Standalone tool for validating trained CNN shaders with GPU-to-CPU readback. Sup cnn_test input.png output.png [OPTIONS] OPTIONS: - --cnn-version N CNN version: 1 (default) or 2 + --cnn-version N CNN version: 1 (default) or 2 (ignored with --weights) + --weights PATH Load weights from .bin (forces CNN v2, overrides layer config) --blend F Final blend amount (0.0-1.0, default: 1.0) --format ppm|png Output format (default: png) - --layers N Number of CNN layers (1-10, v1 only, default: 3) + --layers N Number of CNN layers (1-10, v1 only, default: 3, ignored with --weights) --save-intermediates DIR Save intermediate layers to directory --debug-hex Print first 8 pixels as hex (debug) --help Show usage @@ -55,9 +56,12 @@ OPTIONS: # CNN v1 (render pipeline, 3 layers) ./build/cnn_test input.png output.png --cnn-version 1 -# CNN v2 (compute, storage buffer, dynamic layers) +# CNN v2 (compute, storage buffer, uses asset system weights) ./build/cnn_test input.png output.png --cnn-version 2 +# CNN v2 with runtime weight loading (loads layer config from .bin) +./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin + # 50% blend with original (v2) ./build/cnn_test input.png output.png --cnn-version 2 --blend 0.5 @@ -65,6 +69,8 @@ OPTIONS: ./build/cnn_test input.png output.png --cnn-version 2 --debug-hex ``` +**Important:** When using `--weights`, the layer count and kernel sizes are read from the binary file header, overriding any `--layers` or `--cnn-version` arguments. + --- ## Implementation Details @@ -119,6 +125,13 @@ std::vector<uint8_t> OffscreenRenderTarget::read_pixels() { **Binary format:** Header (20B) + layer info (20B×N) + f16 weights +**Weight Loading:** +- **Without `--weights`:** Loads from asset system (`ASSET_WEIGHTS_CNN_V2`) +- **With `--weights PATH`:** Loads from external `.bin` file (e.g., checkpoint exports) + - Layer count and kernel sizes parsed from binary header + - Overrides any `--layers` or `--cnn-version` arguments + - Enables runtime testing of training checkpoints without rebuild + --- ## Build Integration diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md index 577cf9e..2d1d4c4 100644 --- a/doc/CNN_V2.md +++ b/doc/CNN_V2.md @@ -18,15 +18,15 @@ CNN v2 extends the original CNN post-processing effect with parametric static fe - Bias integrated as static feature dimension - Storage buffer architecture (dynamic layer count) - Binary weight format v2 for runtime loading +- Sigmoid activation for layer 0 and final layer (smooth [0,1] mapping) -**Status:** ✅ Complete. Training pipeline functional, validation tools ready, mip-level support integrated. +**Status:** ✅ Complete. Sigmoid activation, stable training, validation tools operational. -**Known Issues:** -- ⚠️ **cnn_test output differs from HTML validation tool** - Visual discrepancy remains after fixing uv_y inversion and Layer 0 activation. Root cause under investigation. Both tools should produce identical output given same weights/input. +**Breaking Change:** +- Models trained with `clamp()` incompatible. Retrain required. **TODO:** - 8-bit quantization with QAT for 2× size reduction (~1.6 KB) -- Debug cnn_test vs HTML tool output difference --- @@ -106,6 +106,12 @@ Input RGBD → Static Features Compute → CNN Layers → Output RGBA - All layers: uniform 12D input, 4D output (ping-pong buffer) - Storage: `texture_storage_2d<rgba32uint>` (4 channels as 2×f16 pairs) +**Activation Functions:** +- Layer 0 & final layer: `sigmoid(x)` for smooth [0,1] mapping +- Middle layers: `ReLU` (max(0, x)) +- Rationale: Sigmoid prevents gradient blocking at boundaries, enabling better convergence +- Breaking change: Models trained with `clamp(x, 0, 1)` are incompatible, retrain required + --- ## Static Features (7D + 1 bias) @@ -136,6 +142,27 @@ let bias = 1.0; // Learned bias per output channel // Packed storage: [p0, p1, p2, p3, uv.x, uv.y, sin(20*uv.y), 1.0] ``` +### Input Channel Mapping + +**Weight tensor layout (12 input channels per layer):** + +| Input Channel | Feature | Description | +|--------------|---------|-------------| +| 0-3 | Previous layer output | 4D RGBA from prior CNN layer (or input RGBD for Layer 0) | +| 4-11 | Static features | 8D: p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias | + +**Static feature channel details:** +- Channel 4 → p0 (RGB.r from mip level) +- Channel 5 → p1 (RGB.g from mip level) +- Channel 6 → p2 (RGB.b from mip level) +- Channel 7 → p3 (depth or RGB channel from mip level) +- Channel 8 → p4 (uv_x: normalized horizontal position) +- Channel 9 → p5 (uv_y: normalized vertical position) +- Channel 10 → p6 (sin(20*uv_y): periodic encoding) +- Channel 11 → p7 (bias: constant 1.0) + +**Note:** When generating identity weights, p4-p7 correspond to input channels 8-11, not 4-7. + ### Feature Rationale | Feature | Dimension | Purpose | Priority | @@ -311,7 +338,7 @@ class CNNv2(nn.Module): # Layer 0: input RGBD (4D) + static (8D) = 12D x = torch.cat([input_rgbd, static_features], dim=1) x = self.layers[0](x) - x = torch.clamp(x, 0, 1) # Output layer 0 (4 channels) + x = torch.sigmoid(x) # Soft [0,1] for layer 0 # Layer 1+: previous output (4D) + static (8D) = 12D for i in range(1, len(self.layers)): @@ -320,7 +347,7 @@ class CNNv2(nn.Module): if i < len(self.layers) - 1: x = F.relu(x) else: - x = torch.clamp(x, 0, 1) # Final output [0,1] + x = torch.sigmoid(x) # Soft [0,1] for final layer return x # RGBA output ``` diff --git a/doc/CNN_V2_DEBUG_TOOLS.md b/doc/CNN_V2_DEBUG_TOOLS.md index b6dc65f..8d1289a 100644 --- a/doc/CNN_V2_DEBUG_TOOLS.md +++ b/doc/CNN_V2_DEBUG_TOOLS.md @@ -18,14 +18,21 @@ Tools for investigating CNN v2 mismatch between HTML tool and cnn_test. # 3×3 identity ./training/gen_identity_weights.py workspaces/main/weights/cnn_v2_identity_3x3.bin --kernel-size 3 +# Mix mode: 50-50 blend (0.5*p0+0.5*p4, etc) +./training/gen_identity_weights.py output.bin --mix + +# Static features only: p4→ch0, p5→ch1, p6→ch2, p7→ch3 +./training/gen_identity_weights.py output.bin --p47 + # Custom mip level ./training/gen_identity_weights.py output.bin --kernel-size 1 --mip-level 2 ``` **Output:** - Single layer, 12D→4D (4 input channels + 8 static features) -- Identity matrix: Output Ch{0,1,2,3} = Input Ch{0,1,2,3} -- Static features (Ch 4-11) are zeroed +- Identity mode: Output Ch{0,1,2,3} = Input Ch{0,1,2,3} +- Mix mode (--mix): Output Ch{i} = 0.5*Input Ch{i} + 0.5*Input Ch{i+4} (50-50 blend, avoids overflow) +- Static mode (--p47): Output Ch{i} = Input Ch{i+4} (static features only, visualizes p4-p7) - Minimal file size (~136 bytes for 1×1, ~904 bytes for 3×3) **Validation:** diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md index 01c4408..c7b2cae 100644 --- a/doc/COMPLETED.md +++ b/doc/COMPLETED.md @@ -455,3 +455,12 @@ Use `read @doc/archive/FILENAME.md` to access archived documents. - **test_mesh tool**: Implemented a standalone `test_mesh` tool for visualizing OBJ files with debug normal display. - **Task #39: Visual Debugging System**: Implemented a comprehensive set of wireframe primitives (Sphere, Cone, Cross, Line, Trajectory) in `VisualDebug`. Updated `test_3d_render` to demonstrate usage. - **Task #68: Mesh Wireframe Rendering**: Added `add_mesh_wireframe` to `VisualDebug` to visualize triangle edges for mesh objects. Integrated into `Renderer3D` debug path and `test_mesh` tool. + +#### CNN v2 Training Pipeline Improvements (February 14, 2026) 🎯 +- **Critical Training Fixes**: Resolved checkpoint saving and argument handling bugs in CNN v2 training pipeline. **Bug 1 (Missing Checkpoints)**: Training completed successfully but no checkpoint saved when `epochs < checkpoint_every` interval. Solution: Always save final checkpoint after training completes, regardless of interval settings. **Bug 2 (Stale Checkpoints)**: Old checkpoint files from previous runs with different parameters weren't overwritten due to `if not exists` check. Solution: Remove existence check, always overwrite final checkpoint. **Bug 3 (Ignored num_layers)**: When providing comma-separated kernel sizes (e.g., `--kernel-sizes 3,1,3`), the `--num-layers` parameter was used only for validation but not derived from list length. Solution: Derive `num_layers` from kernel_sizes list length when multiple values provided. **Bug 4 (Argument Passing)**: Shell script passed unquoted variables to Python, potentially causing parsing issues with special characters. Solution: Quote all shell variables when passing to Python scripts. + +- **Output Streamlining**: Reduced verbose training pipeline output by 90%. **Export Section**: Added `--quiet` flag to `export_cnn_v2_weights.py`, producing single-line summary instead of detailed layer-by-layer breakdown (e.g., "Exported 3 layers, 912 weights, 1904 bytes → test.bin"). **Validation Section**: Changed from printing 10+ lines per image (loading, processing, saving) to compact single-line format showing all images at once (e.g., "Processing images: img_000 img_001 img_002 ✓"). **Result**: Training pipeline output reduced from ~100 lines to ~30 lines while preserving essential information. Makes rapid iteration more pleasant. + +- **Documentation Updates**: Updated `doc/HOWTO.md` CNN v2 training section to document new behavior: always saves final checkpoint, derives num_layers from kernel_sizes list, uses streamlined output with `--quiet` flag. Added examples for both verbose and quiet export modes. + +- **Files Modified**: `training/train_cnn_v2.py` (checkpoint saving logic, num_layers derivation), `scripts/train_cnn_v2_full.sh` (variable quoting, validation output, checkpoint validation), `training/export_cnn_v2_weights.py` (--quiet flag support), `doc/HOWTO.md` (documentation). **Impact**: Training pipeline now robust for rapid experimentation with different architectures, no longer requires manual checkpoint management or workarounds for short training runs. diff --git a/doc/HOWTO.md b/doc/HOWTO.md index 85ce801..506bf0a 100644 --- a/doc/HOWTO.md +++ b/doc/HOWTO.md @@ -139,12 +139,18 @@ Enhanced CNN with parametric static features (7D input: RGBD + UV + sin encoding # Train → Export → Build → Validate (default config) ./scripts/train_cnn_v2_full.sh +# Rapid debug (1 layer, 3×3, 5 epochs) +./scripts/train_cnn_v2_full.sh --num-layers 1 --kernel-sizes 3 --epochs 5 --output-weights test.bin + # Custom training parameters ./scripts/train_cnn_v2_full.sh --epochs 500 --batch-size 32 --checkpoint-every 100 # Custom architecture ./scripts/train_cnn_v2_full.sh --kernel-sizes 3,5,3 --num-layers 3 --mip-level 1 +# Custom output path +./scripts/train_cnn_v2_full.sh --output-weights workspaces/test/cnn_weights.bin + # Grayscale loss (compute loss on luminance instead of RGBA) ./scripts/train_cnn_v2_full.sh --grayscale-loss @@ -160,8 +166,11 @@ Enhanced CNN with parametric static features (7D input: RGBD + UV + sin encoding **Defaults:** 200 epochs, 3×3 kernels, 8→4→4 channels, batch-size 16, patch-based (8×8, harris detector). - Live progress with single-line update +- Always saves final checkpoint (regardless of --checkpoint-every interval) +- When multiple kernel sizes provided (e.g., 3,5,3), num_layers derived from list length - Validates all input images on final epoch - Exports binary weights (storage buffer architecture) +- Streamlined output: single-line export summary, compact validation - All parameters configurable via command-line **Validation Only** (skip training): @@ -201,12 +210,19 @@ Enhanced CNN with parametric static features (7D input: RGBD + UV + sin encoding **Export Binary Weights:** ```bash +# Verbose output (shows all layer details) ./training/export_cnn_v2_weights.py checkpoints/checkpoint_epoch_100.pth \ --output-weights workspaces/main/cnn_v2_weights.bin + +# Quiet mode (single-line summary) +./training/export_cnn_v2_weights.py checkpoints/checkpoint_epoch_100.pth \ + --output-weights workspaces/main/cnn_v2_weights.bin \ + --quiet ``` Generates binary format: header + layer info + f16 weights (~3.2 KB for 3-layer model). Storage buffer architecture allows dynamic layer count. +Use `--quiet` for streamlined output in scripts (used automatically by train_cnn_v2_full.sh). **TODO:** 8-bit quantization for 2× size reduction (~1.6 KB). Requires quantization-aware training (QAT). @@ -268,6 +284,9 @@ See `doc/ASSET_SYSTEM.md` and `doc/WORKSPACE_SYSTEM.md`. # CNN v2 (recommended, fully functional) ./build/cnn_test input.png output.png --cnn-version 2 +# CNN v2 with runtime weight loading (loads layer config from .bin) +./build/cnn_test input.png output.png --weights checkpoints/checkpoint_epoch_100.pth.bin + # CNN v1 (produces incorrect output, debug only) ./build/cnn_test input.png output.png --cnn-version 1 @@ -282,6 +301,8 @@ See `doc/ASSET_SYSTEM.md` and `doc/WORKSPACE_SYSTEM.md`. - **CNN v2:** ✅ Fully functional, matches CNNv2Effect - **CNN v1:** ⚠️ Produces incorrect output, use CNNEffect in demo for validation +**Note:** `--weights` loads layer count and kernel sizes from the binary file, overriding `--layers` and forcing CNN v2. + See `doc/CNN_TEST_TOOL.md` for full documentation. --- |
