| author | skal <pascal.massimino@gmail.com> | 2026-02-13 16:52:41 +0100 |
|---|---|---|
| committer | skal <pascal.massimino@gmail.com> | 2026-02-13 16:52:41 +0100 |
| commit | e4c1641201af04c9919410325f4e0865e8b88d5d (patch) | |
| tree | ba554f74ad7ca0cc5619d36cd4c9fcd4707a528a /doc | |
| parent | 250491dc3044549edee8418d680d1e47920833f4 (diff) | |
Doc: Update CNN v2 docs for binary format v2 and mip-level support
Updated documentation to reflect binary format v2 with mip_level field.
Changes:
- CNN_V2_BINARY_FORMAT.md: Document v2 (20-byte header) with mip_level, v1 backward compat
- CNN_V2_WEB_TOOL.md: Document auto-detection of mip_level, UI updates
- CNN_V2.md: Update overview with mip-level feature, training pipeline
Binary format v2:
- Header: 20 bytes (was 16)
- New field: mip_level (u32) at offset 0x10
- Backward compatible: loaders treat v1 files as mip_level=0
Documentation complete for full mip-level pipeline integration.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
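
The header change described in this commit can be sketched as a small loader. This is a minimal illustration, not the project's actual loader: the function names `parse_header` and `expected_file_size`, the returned dictionary keys, and the choice of Python are all assumptions made for the example. The field order, sizes, and the size formula follow the commit message and the updated CNN_V2_BINARY_FORMAT.md.

```python
import struct

MAGIC = 0x324E4E43       # stored little-endian, so the file starts with b"CNN2"
LAYER_INFO_SIZE = 20     # bytes per layer record (unchanged between v1 and v2)


def parse_header(data: bytes) -> dict:
    """Parse a v1 (16-byte) or v2 (20-byte) header (hypothetical helper)."""
    if len(data) < 16:
        raise ValueError("file too short for a header")
    # The first four u32 fields are identical in both versions.
    magic, version, num_layers, total_weights = struct.unpack_from("<4I", data, 0)
    if magic != MAGIC:
        raise ValueError("invalid magic number")
    if version == 1:
        header_size, mip_level = 16, 0   # v1 has no mip_level field; treat as 0
    elif version == 2:
        if len(data) < 20:
            raise ValueError("file too short for a v2 header")
        header_size = 20
        (mip_level,) = struct.unpack_from("<I", data, 0x10)
    else:
        raise ValueError(f"unsupported version {version}")
    return {
        "version": version,
        "num_layers": num_layers,
        "total_weights": total_weights,
        "mip_level": mip_level,
        "header_size": header_size,
    }


def expected_file_size(header: dict) -> int:
    """Header + layer records + 2 bytes per f16 weight, as in the doc's size check."""
    return (header["header_size"]
            + header["num_layers"] * LAYER_INFO_SIZE
            + header["total_weights"] * 2)
```

With the 3-layer example from the updated doc (version 2, 3 layers, 1296 weights), `expected_file_size` gives 20 + 60 + 2592 = 2672 bytes, which matches the total in the new file layout table.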
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/CNN_V2.md | 17 |
| -rw-r--r-- | doc/CNN_V2_BINARY_FORMAT.md | 101 |
| -rw-r--r-- | doc/CNN_V2_WEB_TOOL.md | 13 |
3 files changed, 94 insertions, 37 deletions
diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
index 49086ca..a66dc1d 100644
--- a/doc/CNN_V2.md
+++ b/doc/CNN_V2.md
@@ -11,14 +11,15 @@ CNN v2 extends the original CNN post-processing effect with parametric static fe
 **Key improvements over v1:**
 - 7D static feature input (vs 4D RGB)
 - Multi-frequency position encoding (NeRF-style)
+- Configurable mip-level for p0-p3 parametric features (0-3)
 - Per-layer configurable kernel sizes (1×1, 3×3, 5×5)
 - Variable channel counts per layer
 - Float16 weight storage (~3.2 KB for 3-layer model)
 - Bias integrated as static feature dimension
 - Storage buffer architecture (dynamic layer count)
-- Binary weight format for runtime loading
+- Binary weight format v2 for runtime loading
 
-**Status:** ✅ Complete. Training pipeline functional, validation tools ready.
+**Status:** ✅ Complete. Training pipeline functional, validation tools ready, mip-level support integrated.
 **TODO:** 8-bit quantization with QAT for 2× size reduction (~1.6 KB)
 
 ---
@@ -109,12 +110,12 @@ Input RGBD → Static Features Compute → CNN Layers → Output RGBA
 
 ```wgsl
 // Slot 0-3: Parametric features (p0, p1, p2, p3)
-// Can be: mip1/2 RGBD, grayscale, gradients, etc.
-// Distinct from input image RGBD (fed only to Layer 0)
-let p0 = ...; // Parametric feature 0 (e.g., mip1.r or grayscale)
-let p1 = ...; // Parametric feature 1
-let p2 = ...; // Parametric feature 2
-let p3 = ...; // Parametric feature 3
+// Sampled from configurable mip level (0=original, 1=half, 2=quarter, 3=eighth)
+// Training sets mip_level via --mip-level flag, stored in binary format v2
+let p0 = ...; // RGB.r from selected mip level
+let p1 = ...; // RGB.g from selected mip level
+let p2 = ...; // RGB.b from selected mip level
+let p3 = ...; // Depth or RGB channel from mip level
 
 // Slot 4-5: UV coordinates (normalized screen space)
 let uv_x = coord.x / resolution.x; // Horizontal position [0,1]
diff --git a/doc/CNN_V2_BINARY_FORMAT.md b/doc/CNN_V2_BINARY_FORMAT.md
index 650177f..fd758ee 100644
--- a/doc/CNN_V2_BINARY_FORMAT.md
+++ b/doc/CNN_V2_BINARY_FORMAT.md
@@ -4,12 +4,27 @@ Binary format for storing trained CNN v2 weights with static feature architectur
 
 **File Extension:** `.bin`
 **Byte Order:** Little-endian
-**Version:** 1.0
+**Version:** 2.0 (supports mip-level for parametric features)
+**Backward Compatible:** Version 1.0 files supported (mip_level=0)
 
 ---
 
 ## File Structure
 
+**Version 2 (current):**
+```
+┌─────────────────────┐
+│  Header (20 bytes)  │
+├─────────────────────┤
+│  Layer Info         │
+│  (20 bytes × N)     │
+├─────────────────────┤
+│  Weight Data        │
+│  (variable size)    │
+└─────────────────────┘
+```
+
+**Version 1 (legacy):**
 ```
 ┌─────────────────────┐
 │  Header (16 bytes)  │
@@ -24,20 +39,36 @@ Binary format for storing trained CNN v2 weights with static feature architectur
 
 ---
 
-## Header (16 bytes)
+## Header
+
+**Version 2 (20 bytes):**
 
 | Offset | Type | Field | Description |
 |--------|------|----------------|--------------------------------------|
 | 0x00 | u32 | magic | Magic number: `0x32_4E_4E_43` ("CNN2") |
-| 0x04 | u32 | version | Format version (currently 1) |
+| 0x04 | u32 | version | Format version (2 for current) |
 | 0x08 | u32 | num_layers | Number of CNN layers (excludes static features) |
 | 0x0C | u32 | total_weights | Total f16 weight count across all layers |
+| 0x10 | u32 | mip_level | Mip level for p0-p3 features (0=original, 1=half, 2=quarter, 3=eighth) |
+
+**Version 1 (16 bytes) - Legacy:**
+
+| Offset | Type | Field | Description |
+|--------|------|----------------|--------------------------------------|
+| 0x00 | u32 | magic | Magic number: `0x32_4E_4E_43` ("CNN2") |
+| 0x04 | u32 | version | Format version (1) |
+| 0x08 | u32 | num_layers | Number of CNN layers |
+| 0x0C | u32 | total_weights | Total f16 weight count |
+
+**Note:** Loaders should check version field and handle both formats. Version 1 files treated as mip_level=0.
 
 ---
 
 ## Layer Info (20 bytes per layer)
 
-Repeated `num_layers` times, starting at offset 0x10.
+Repeated `num_layers` times:
+- **Version 2:** Starting at offset 0x14 (20 bytes)
+- **Version 1:** Starting at offset 0x10 (16 bytes)
 
 | Offset | Type | Field | Description |
 |-------------|------|----------------|--------------------------------------|
@@ -53,7 +84,9 @@ Repeated `num_layers` times, starting at offset 0x10.
 
 ## Weight Data (variable size)
 
-Starts at offset: `16 + (num_layers × 20)`
+Starts at offset:
+- **Version 2:** `20 + (num_layers × 20)`
+- **Version 1:** `16 + (num_layers × 20)`
 
 **Format:** Packed f16 pairs stored as u32
 **Packing:** `u32 = (f16_hi << 16) | f16_lo`
@@ -79,24 +112,25 @@ uint16_t f16_bits = (weight_idx % 2 == 0) ? (packed & 0xFFFF) : (packed >> 16);
 
 ---
 
-## Example: 3-Layer Network
+## Example: 3-Layer Network (Version 2)
 
 **Configuration:**
-- Layer 1: 15→8, kernel 3×3 (1,080 weights)
-- Layer 2: 8→4, kernel 3×3 (288 weights)
-- Layer 3: 4→3, kernel 3×3 (108 weights)
+- Mip level: 0 (original resolution)
+- Layer 0: 12→4, kernel 3×3 (432 weights)
+- Layer 1: 12→4, kernel 3×3 (432 weights)
+- Layer 2: 12→4, kernel 3×3 (432 weights)
 
 **File Layout:**
 ```
 Offset  Size  Content
 ------  ----  -------
-0x00    16    Header (magic, version=1, layers=3, weights=1476)
-0x10    20    Layer 1 info (kernel=3, in=15, out=8, offset=0, count=1080)
-0x24    20    Layer 2 info (kernel=3, in=8, out=4, offset=1080, count=288)
-0x38    20    Layer 3 info (kernel=3, in=4, out=3, offset=1368, count=108)
-0x4C    1476  Weight data (738 u32 packed f16 pairs)
+0x00    20    Header (magic, version=2, layers=3, weights=1296, mip_level=0)
+0x14    20    Layer 0 info (kernel=3, in=12, out=4, offset=0, count=432)
+0x28    20    Layer 1 info (kernel=3, in=12, out=4, offset=432, count=432)
+0x3C    20    Layer 2 info (kernel=3, in=12, out=4, offset=864, count=432)
+0x50    2592  Weight data (1296 u32 packed f16 pairs)
         ----
-Total: 1528 bytes (~1.5 KB)
+Total: 2672 bytes (~2.6 KB)
 ```
 
 ---
@@ -105,17 +139,24 @@ Total: 1528 bytes (~1.5 KB)
 
 Not stored in .bin file (computed at runtime):
 
-**7D Input Features (packed as 8 channels):**
-1. R (red channel)
-2. G (green channel)
-3. B (blue channel)
-4. D (depth value)
-5. UV_X (normalized x coordinate)
-6. UV_Y (normalized y coordinate)
-7. sin(10 × UV_X) (spatial frequency encoding)
-8. 1.0 (bias term)
+**8D Input Features:**
+1. **p0** - Parametric feature 0 (from mip level)
+2. **p1** - Parametric feature 1 (from mip level)
+3. **p2** - Parametric feature 2 (from mip level)
+4. **p3** - Parametric feature 3 (depth or from mip level)
+5. **UV_X** - Normalized x coordinate [0,1]
+6. **UV_Y** - Normalized y coordinate [0,1]
+7. **sin(10 × UV_X)** - Spatial frequency encoding
+8. **1.0** - Bias term
+
+**Mip Level Usage (p0-p3):**
+- `mip_level=0`: RGB from original resolution (mip 0)
+- `mip_level=1`: RGB from half resolution (mip 1), upsampled
+- `mip_level=2`: RGB from quarter resolution (mip 2), upsampled
+- `mip_level=3`: RGB from eighth resolution (mip 3), upsampled
 
-**First CNN layer** receives all 8 static features + 0-7 previous layer outputs (total 8-15 input channels).
+**Layer 0** receives input RGBD (4D) + static features (8D) = 12D input → 4D output.
+**Layer 1+** receive previous layer output (4D) + static features (8D) = 12D input → 4D output.
 
 ---
 
@@ -128,9 +169,17 @@ fread(&magic, 4, 1, fp);
 if (magic != 0x32_4E_4E_43) { error("Invalid CNN v2 file"); }
 ```
 
+**Version Check:**
+```c
+uint32_t version;
+fread(&version, 4, 1, fp);
+if (version != 1 && version != 2) { error("Unsupported version"); }
+uint32_t header_size = (version == 1) ? 16 : 20;
+```
+
 **Size Check:**
 ```c
-expected_size = 16 + (num_layers × 20) + (total_weights × 2);
+expected_size = header_size + (num_layers × 20) + (total_weights × 2);
 if (file_size != expected_size) { error("Size mismatch"); }
 ```
 
diff --git a/doc/CNN_V2_WEB_TOOL.md b/doc/CNN_V2_WEB_TOOL.md
index 8c661b2..25f4ec7 100644
--- a/doc/CNN_V2_WEB_TOOL.md
+++ b/doc/CNN_V2_WEB_TOOL.md
@@ -10,7 +10,8 @@ Browser-based WebGPU tool for validating CNN v2 inference with layer visualizati
 
 **Working:**
 - ✅ WebGPU initialization and device setup
-- ✅ Binary weight file parsing (.bin format)
+- ✅ Binary weight file parsing (v1 and v2 formats)
+- ✅ Automatic mip-level detection from binary format v2
 - ✅ Weight statistics (min/max per layer)
 - ✅ UI layout with collapsible panels
 - ✅ Mode switching (Activations/Weights tabs)
@@ -24,6 +25,10 @@ Browser-based WebGPU tool for validating CNN v2 inference with layer visualizati
 - ✅ Mip level selection (p0-p3 features at different resolutions)
 
 **Recent Changes (Latest):**
+- Binary format v2 support: Reads mip_level from 20-byte header
+- Backward compatible: v1 (16-byte header) → mip_level=0
+- Auto-update UI dropdown when loading weights with mip_level
+- Display mip_level in metadata panel
 - Code refactoring: Extracted FULLSCREEN_QUAD_VS shader (reused 3× across pipelines)
 - Added helper methods: `getDimensions()`, `setVideoControlsEnabled()`
 - Improved code organization with section headers and comments
@@ -70,9 +75,11 @@ Browser-based WebGPU tool for validating CNN v2 inference with layer visualizati
 ### Key Components
 
 **1. Weight Parsing**
-- Reads binary format: header (16B) + layer info (20B×N) + f16 weights
+- Reads binary format v2: header (20B) + layer info (20B×N) + f16 weights
+- Backward compatible with v1: header (16B), mip_level defaults to 0
 - Computes min/max per layer via f16 unpacking
-- Stores `{ layers[], weights[], fileSize }`
+- Stores `{ layers[], weights[], mipLevel, fileSize }`
+- Auto-sets UI mip-level dropdown from loaded weights
 
 **2. CNN Pipeline**
 - Static features computation (RGBD + UV + sin + bias → 7D packed)
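
The weight-data packing rule quoted in the CNN_V2_BINARY_FORMAT.md changes (`u32 = (f16_hi << 16) | f16_lo`, with even weight indices in the low half) can be illustrated with a short sketch. The helper names `pack_f16_pair` and `unpack_f16_weights` are hypothetical. The sketch also assumes that weight `i` lives in word `i // 2`, which the selection expression in the hunk context implies but does not state outright.

```python
import struct


def pack_f16_pair(lo: float, hi: float) -> int:
    """Pack two floats as f16 into one u32: (f16_hi << 16) | f16_lo."""
    (lo_bits,) = struct.unpack("<H", struct.pack("<e", lo))
    (hi_bits,) = struct.unpack("<H", struct.pack("<e", hi))
    return (hi_bits << 16) | lo_bits


def unpack_f16_weights(words, count):
    """Return `count` floats from a sequence of packed u32 words.

    Assumption: weight i is stored in words[i // 2]; even indices use the
    low 16 bits and odd indices use the high 16 bits.
    """
    weights = []
    for i in range(count):
        packed = words[i // 2]
        bits = (packed & 0xFFFF) if i % 2 == 0 else (packed >> 16)
        # Reinterpret the 16 raw bits as an IEEE 754 half-precision value.
        weights.append(struct.unpack("<e", struct.pack("<H", bits))[0])
    return weights
```

Under this packing, the 1296 weights of the version 2 example occupy 648 u32 words (2592 bytes). The "1296 u32" wording in the new file layout table therefore appears to count f16 values rather than words, while the 2592-byte size and the 2672-byte total are consistent with it.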
