Doc: Update CNN v2 docs for binary format v2 and mip-level support

Updated documentation to reflect binary format v2 with mip_level field.

Changes:
- CNN_V2_BINARY_FORMAT.md: Document v2 (20-byte header) with mip_level, v1 backward compat
- CNN_V2_WEB_TOOL.md: Document auto-detection of mip_level, UI updates
- CNN_V2.md: Update overview with mip-level feature, training pipeline

Binary format v2:
- Header: 20 bytes (was 16)
- New field: mip_level (u32) at offset 0x10
- Backward compatible: v1 files are loaded with mip_level=0

Documentation complete for full mip-level pipeline integration.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
author: skal <pascal.massimino@gmail.com> 2026-02-13 16:52:41 +0100
committer: skal <pascal.massimino@gmail.com> 2026-02-13 16:52:41 +0100
commit: e4c1641201af04c9919410325f4e0865e8b88d5d (patch)
tree: ba554f74ad7ca0cc5619d36cd4c9fcd4707a528a
parent: 250491dc3044549edee8418d680d1e47920833f4 (diff)
3 files changed, 94 insertions, 37 deletions
diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
index 49086ca..a66dc1d 100644
--- a/doc/CNN_V2.md
+++ b/doc/CNN_V2.md
@@ -11,14 +11,15 @@ CNN v2 extends the original CNN post-processing effect with parametric static fe
 **Key improvements over v1:**
 - 7D static feature input (vs 4D RGB)
 - Multi-frequency position encoding (NeRF-style)
+- Configurable mip-level for p0-p3 parametric features (0-3)
 - Per-layer configurable kernel sizes (1×1, 3×3, 5×5)
 - Variable channel counts per layer
 - Float16 weight storage (~3.2 KB for 3-layer model)
 - Bias integrated as static feature dimension
 - Storage buffer architecture (dynamic layer count)
-- Binary weight format for runtime loading
+- Binary weight format v2 for runtime loading
 
-**Status:** ✅ Complete. Training pipeline functional, validation tools ready.
+**Status:** ✅ Complete. Training pipeline functional, validation tools ready, mip-level support integrated.
 **TODO:** 8-bit quantization with QAT for 2× size reduction (~1.6 KB)
 
 ---
@@ -109,12 +110,12 @@ Input RGBD → Static Features Compute → CNN Layers → Output RGBA
 
 ```wgsl
 // Slot 0-3: Parametric features (p0, p1, p2, p3)
-// Can be: mip1/2 RGBD, grayscale, gradients, etc.
-// Distinct from input image RGBD (fed only to Layer 0)
-let p0 = ...;  // Parametric feature 0 (e.g., mip1.r or grayscale)
-let p1 = ...;  // Parametric feature 1
-let p2 = ...;  // Parametric feature 2
-let p3 = ...;  // Parametric feature 3
+// Sampled from configurable mip level (0=original, 1=half, 2=quarter, 3=eighth)
+// Training sets mip_level via --mip-level flag, stored in binary format v2
+let p0 = ...;  // RGB.r from selected mip level
+let p1 = ...;  // RGB.g from selected mip level
+let p2 = ...;  // RGB.b from selected mip level
+let p3 = ...;  // Depth or RGB channel from mip level
 
 // Slot 4-5: UV coordinates (normalized screen space)
 let uv_x = coord.x / resolution.x;  // Horizontal position [0,1]
diff --git a/doc/CNN_V2_BINARY_FORMAT.md b/doc/CNN_V2_BINARY_FORMAT.md
index 650177f..fd758ee 100644
--- a/doc/CNN_V2_BINARY_FORMAT.md
+++ b/doc/CNN_V2_BINARY_FORMAT.md
@@ -4,12 +4,27 @@ Binary format for storing trained CNN v2 weights with static feature architectur
 
 **File Extension:** `.bin`
 **Byte Order:** Little-endian
-**Version:** 1.0
+**Version:** 2.0 (supports mip-level for parametric features)
+**Backward Compatible:** Version 1.0 files are still supported (read as mip_level=0)
 
 ---
 
 ## File Structure
 
+**Version 2 (current):**
+```
+┌─────────────────────┐
+│  Header (20 bytes)  │
+├─────────────────────┤
+│  Layer Info         │
+│  (20 bytes × N)     │
+├─────────────────────┤
+│  Weight Data        │
+│  (variable size)    │
+└─────────────────────┘
+```
+
+**Version 1 (legacy):**
 ```
 ┌─────────────────────┐
 │  Header (16 bytes)  │
@@ -24,20 +39,36 @@ Binary format for storing trained CNN v2 weights with static feature architectur
 
 ---
 
-## Header (16 bytes)
+## Header
+
+**Version 2 (20 bytes):**
 
 | Offset | Type | Field          | Description                          |
 |--------|------|----------------|--------------------------------------|
 | 0x00   | u32  | magic          | Magic number: `0x32_4E_4E_43` ("CNN2") |
-| 0x04   | u32  | version        | Format version (currently 1)         |
+| 0x04   | u32  | version        | Format version (2 = current, 1 = legacy) |
 | 0x08   | u32  | num_layers     | Number of CNN layers (excludes static features) |
 | 0x0C   | u32  | total_weights  | Total f16 weight count across all layers |
+| 0x10   | u32  | mip_level      | Mip level for p0-p3 features (0=original, 1=half, 2=quarter, 3=eighth) |
+
+**Version 1 (16 bytes) - Legacy:**
+
+| Offset | Type | Field          | Description                          |
+|--------|------|----------------|--------------------------------------|
+| 0x00   | u32  | magic          | Magic number: `0x32_4E_4E_43` ("CNN2") |
+| 0x04   | u32  | version        | Format version (1)                   |
+| 0x08   | u32  | num_layers     | Number of CNN layers                 |
+| 0x0C   | u32  | total_weights  | Total f16 weight count               |
+
+**Note:** Loaders should check the version field and handle both formats; version 1 files are treated as mip_level=0.
 
 ---
 
 ## Layer Info (20 bytes per layer)
 
-Repeated `num_layers` times, starting at offset 0x10.
+Repeated `num_layers` times:
+- **Version 2:** Starting at offset 0x14 (right after the 20-byte header)
+- **Version 1:** Starting at offset 0x10 (right after the 16-byte header)
 
 | Offset      | Type | Field          | Description                          |
 |-------------|------|----------------|--------------------------------------|
@@ -53,7 +84,9 @@ Repeated `num_layers` times, starting at offset 0x10.
 
 ## Weight Data (variable size)
 
-Starts at offset: `16 + (num_layers × 20)`
+Starts at offset:
+- **Version 2:** `20 + (num_layers × 20)`
+- **Version 1:** `16 + (num_layers × 20)`
 
 **Format:** Packed f16 pairs stored as u32
 **Packing:** `u32 = (f16_hi << 16) | f16_lo`
@@ -79,24 +112,25 @@ uint16_t f16_bits = (weight_idx % 2 == 0) ? (packed & 0xFFFF) : (packed >> 16);
 
 ---
 
-## Example: 3-Layer Network
+## Example: 3-Layer Network (Version 2)
 
 **Configuration:**
-- Layer 1: 15→8, kernel 3×3 (1,080 weights)
-- Layer 2: 8→4, kernel 3×3 (288 weights)
-- Layer 3: 4→3, kernel 3×3 (108 weights)
+- Mip level: 0 (original resolution)
+- Layer 0: 12→4, kernel 3×3 (432 weights)
+- Layer 1: 12→4, kernel 3×3 (432 weights)
+- Layer 2: 12→4, kernel 3×3 (432 weights)
 
 **File Layout:**
 ```
 Offset   Size   Content
 ------   ----   -------
-0x00     16     Header (magic, version=1, layers=3, weights=1476)
-0x10     20     Layer 1 info (kernel=3, in=15, out=8, offset=0, count=1080)
-0x24     20     Layer 2 info (kernel=3, in=8, out=4, offset=1080, count=288)
-0x38     20     Layer 3 info (kernel=3, in=4, out=3, offset=1368, count=108)
-0x4C     1476   Weight data (738 u32 packed f16 pairs)
+0x00     20     Header (magic, version=2, layers=3, weights=1296, mip_level=0)
+0x14     20     Layer 0 info (kernel=3, in=12, out=4, offset=0, count=432)
+0x28     20     Layer 1 info (kernel=3, in=12, out=4, offset=432, count=432)
+0x3C     20     Layer 2 info (kernel=3, in=12, out=4, offset=864, count=432)
+0x50     2592   Weight data (648 u32 packed f16 pairs = 1296 f16 weights)
          ----
-Total:   1528 bytes (~1.5 KB)
+Total:   2672 bytes (~2.6 KB)
 ```
 
 ---
@@ -105,17 +139,24 @@ Total:   1528 bytes (~1.5 KB)
 
 Not stored in .bin file (computed at runtime):
 
-**7D Input Features (packed as 8 channels):**
-1. R (red channel)
-2. G (green channel)
-3. B (blue channel)
-4. D (depth value)
-5. UV_X (normalized x coordinate)
-6. UV_Y (normalized y coordinate)
-7. sin(10 × UV_X) (spatial frequency encoding)
-8. 1.0 (bias term)
+**8D Input Features:**
+1. **p0** - Parametric feature 0 (from mip level)
+2. **p1** - Parametric feature 1 (from mip level)
+3. **p2** - Parametric feature 2 (from mip level)
+4. **p3** - Parametric feature 3 (depth or from mip level)
+5. **UV_X** - Normalized x coordinate [0,1]
+6. **UV_Y** - Normalized y coordinate [0,1]
+7. **sin(10 × UV_X)** - Spatial frequency encoding
+8. **1.0** - Bias term
+
+**Mip Level Usage (p0-p3):**
+- `mip_level=0`: RGB from original resolution (mip 0)
+- `mip_level=1`: RGB from half resolution (mip 1), upsampled
+- `mip_level=2`: RGB from quarter resolution (mip 2), upsampled
+- `mip_level=3`: RGB from eighth resolution (mip 3), upsampled
 
-**First CNN layer** receives all 8 static features + 0-7 previous layer outputs (total 8-15 input channels).
+**Layer 0** receives input RGBD (4D) + static features (8D) = 12D input → 4D output.
+**Layers 1+** receive previous layer output (4D) + static features (8D) = 12D input → 4D output.
 
 ---
 
@@ -128,9 +169,17 @@ fread(&magic, 4, 1, fp);
 if (magic != 0x32_4E_4E_43) { error("Invalid CNN v2 file"); }
 ```
 
+**Version Check:**
+```c
+uint32_t version;
+fread(&version, 4, 1, fp);
+if (version != 1 && version != 2) { error("Unsupported version"); }
+uint32_t header_size = (version == 1) ? 16 : 20;
+```
+
 **Size Check:**
 ```c
-expected_size = 16 + (num_layers × 20) + (total_weights × 2);
+size_t expected_size = header_size + (num_layers * 20) + (total_weights * 2);
 if (file_size != expected_size) { error("Size mismatch"); }
 ```
 
diff --git a/doc/CNN_V2_WEB_TOOL.md b/doc/CNN_V2_WEB_TOOL.md
index 8c661b2..25f4ec7 100644
--- a/doc/CNN_V2_WEB_TOOL.md
+++ b/doc/CNN_V2_WEB_TOOL.md
@@ -10,7 +10,8 @@ Browser-based WebGPU tool for validating CNN v2 inference with layer visualizati
 
 **Working:**
 - ✅ WebGPU initialization and device setup
-- ✅ Binary weight file parsing (.bin format)
+- ✅ Binary weight file parsing (v1 and v2 formats)
+- ✅ Automatic mip-level detection from binary format v2
 - ✅ Weight statistics (min/max per layer)
 - ✅ UI layout with collapsible panels
 - ✅ Mode switching (Activations/Weights tabs)
@@ -24,6 +25,10 @@ Browser-based WebGPU tool for validating CNN v2 inference with layer visualizati
 - ✅ Mip level selection (p0-p3 features at different resolutions)
 
 **Recent Changes (Latest):**
+- Binary format v2 support: Reads mip_level from 20-byte header
+- Backward compatible: v1 (16-byte header) → mip_level=0
+- Auto-update UI dropdown when loading weights with mip_level
+- Display mip_level in metadata panel
 - Code refactoring: Extracted FULLSCREEN_QUAD_VS shader (reused 3× across pipelines)
 - Added helper methods: `getDimensions()`, `setVideoControlsEnabled()`
 - Improved code organization with section headers and comments
@@ -70,9 +75,11 @@ Browser-based WebGPU tool for validating CNN v2 inference with layer visualizati
 ### Key Components
 
 **1. Weight Parsing**
-- Reads binary format: header (16B) + layer info (20B×N) + f16 weights
+- Reads binary format v2: header (20B) + layer info (20B×N) + f16 weights
+- Backward compatible with v1: header (16B), mip_level defaults to 0
 - Computes min/max per layer via f16 unpacking
-- Stores `{ layers[], weights[], fileSize }`
+- Stores `{ layers[], weights[], mipLevel, fileSize }`
+- Auto-sets UI mip-level dropdown from loaded weights
 
 **2. CNN Pipeline**
 - Static features computation (RGBD + UV + sin + bias → 7D packed)