summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorskal <pascal.massimino@gmail.com>2026-02-13 16:52:41 +0100
committerskal <pascal.massimino@gmail.com>2026-02-13 16:52:41 +0100
commite4c1641201af04c9919410325f4e0865e8b88d5d (patch)
treeba554f74ad7ca0cc5619d36cd4c9fcd4707a528a
parent250491dc3044549edee8418d680d1e47920833f4 (diff)
Doc: Update CNN v2 docs for binary format v2 and mip-level support
Updated documentation to reflect binary format v2 with mip_level field. Changes: - CNN_V2_BINARY_FORMAT.md: Document v2 (20-byte header) with mip_level, v1 backward compat - CNN_V2_WEB_TOOL.md: Document auto-detection of mip_level, UI updates - CNN_V2.md: Update overview with mip-level feature, training pipeline Binary format v2: - Header: 20 bytes (was 16) - New field: mip_level (u32) at offset 0x10 - Backward compatible: v1 loaders treat as mip_level=0 Documentation complete for full mip-level pipeline integration. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
-rw-r--r--doc/CNN_V2.md17
-rw-r--r--doc/CNN_V2_BINARY_FORMAT.md101
-rw-r--r--doc/CNN_V2_WEB_TOOL.md13
3 files changed, 94 insertions, 37 deletions
diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
index 49086ca..a66dc1d 100644
--- a/doc/CNN_V2.md
+++ b/doc/CNN_V2.md
@@ -11,14 +11,15 @@ CNN v2 extends the original CNN post-processing effect with parametric static fe
**Key improvements over v1:**
- 7D static feature input (vs 4D RGB)
- Multi-frequency position encoding (NeRF-style)
+- Configurable mip-level for p0-p3 parametric features (0-3)
- Per-layer configurable kernel sizes (1×1, 3×3, 5×5)
- Variable channel counts per layer
- Float16 weight storage (~3.2 KB for 3-layer model)
- Bias integrated as static feature dimension
- Storage buffer architecture (dynamic layer count)
-- Binary weight format for runtime loading
+- Binary weight format v2 for runtime loading
-**Status:** ✅ Complete. Training pipeline functional, validation tools ready.
+**Status:** ✅ Complete. Training pipeline functional, validation tools ready, mip-level support integrated.
**TODO:** 8-bit quantization with QAT for 2× size reduction (~1.6 KB)
---
@@ -109,12 +110,12 @@ Input RGBD → Static Features Compute → CNN Layers → Output RGBA
```wgsl
// Slot 0-3: Parametric features (p0, p1, p2, p3)
-// Can be: mip1/2 RGBD, grayscale, gradients, etc.
-// Distinct from input image RGBD (fed only to Layer 0)
-let p0 = ...; // Parametric feature 0 (e.g., mip1.r or grayscale)
-let p1 = ...; // Parametric feature 1
-let p2 = ...; // Parametric feature 2
-let p3 = ...; // Parametric feature 3
+// Sampled from configurable mip level (0=original, 1=half, 2=quarter, 3=eighth)
+// Training sets mip_level via --mip-level flag, stored in binary format v2
+let p0 = ...; // RGB.r from selected mip level
+let p1 = ...; // RGB.g from selected mip level
+let p2 = ...; // RGB.b from selected mip level
+let p3 = ...; // Depth or RGB channel from mip level
// Slot 4-5: UV coordinates (normalized screen space)
let uv_x = coord.x / resolution.x; // Horizontal position [0,1]
diff --git a/doc/CNN_V2_BINARY_FORMAT.md b/doc/CNN_V2_BINARY_FORMAT.md
index 650177f..fd758ee 100644
--- a/doc/CNN_V2_BINARY_FORMAT.md
+++ b/doc/CNN_V2_BINARY_FORMAT.md
@@ -4,12 +4,27 @@ Binary format for storing trained CNN v2 weights with static feature architectur
**File Extension:** `.bin`
**Byte Order:** Little-endian
-**Version:** 1.0
+**Version:** 2.0 (supports mip-level for parametric features)
+**Backward Compatible:** Version 1.0 files supported (mip_level=0)
---
## File Structure
+**Version 2 (current):**
+```
+┌─────────────────────┐
+│ Header (20 bytes) │
+├─────────────────────┤
+│ Layer Info │
+│ (20 bytes × N) │
+├─────────────────────┤
+│ Weight Data │
+│ (variable size) │
+└─────────────────────┘
+```
+
+**Version 1 (legacy):**
```
┌─────────────────────┐
│ Header (16 bytes) │
@@ -24,20 +39,36 @@ Binary format for storing trained CNN v2 weights with static feature architectur
---
-## Header (16 bytes)
+## Header
+
+**Version 2 (20 bytes):**
| Offset | Type | Field | Description |
|--------|------|----------------|--------------------------------------|
| 0x00 | u32 | magic | Magic number: `0x32_4E_4E_43` ("CNN2") |
-| 0x04 | u32 | version | Format version (currently 1) |
+| 0x04 | u32 | version | Format version (2 for current) |
| 0x08 | u32 | num_layers | Number of CNN layers (excludes static features) |
| 0x0C | u32 | total_weights | Total f16 weight count across all layers |
+| 0x10 | u32 | mip_level | Mip level for p0-p3 features (0=original, 1=half, 2=quarter, 3=eighth) |
+
+**Version 1 (16 bytes) - Legacy:**
+
+| Offset | Type | Field | Description |
+|--------|------|----------------|--------------------------------------|
+| 0x00 | u32 | magic | Magic number: `0x32_4E_4E_43` ("CNN2") |
+| 0x04 | u32 | version | Format version (1) |
+| 0x08 | u32 | num_layers | Number of CNN layers |
+| 0x0C | u32 | total_weights | Total f16 weight count |
+
+**Note:** Loaders should check version field and handle both formats. Version 1 files treated as mip_level=0.
---
## Layer Info (20 bytes per layer)
-Repeated `num_layers` times, starting at offset 0x10.
+Repeated `num_layers` times:
+- **Version 2:** Starting at offset 0x14 (20 bytes)
+- **Version 1:** Starting at offset 0x10 (16 bytes)
| Offset | Type | Field | Description |
|-------------|------|----------------|--------------------------------------|
@@ -53,7 +84,9 @@ Repeated `num_layers` times, starting at offset 0x10.
## Weight Data (variable size)
-Starts at offset: `16 + (num_layers × 20)`
+Starts at offset:
+- **Version 2:** `20 + (num_layers × 20)`
+- **Version 1:** `16 + (num_layers × 20)`
**Format:** Packed f16 pairs stored as u32
**Packing:** `u32 = (f16_hi << 16) | f16_lo`
@@ -79,24 +112,25 @@ uint16_t f16_bits = (weight_idx % 2 == 0) ? (packed & 0xFFFF) : (packed >> 16);
---
-## Example: 3-Layer Network
+## Example: 3-Layer Network (Version 2)
**Configuration:**
-- Layer 1: 15→8, kernel 3×3 (1,080 weights)
-- Layer 2: 8→4, kernel 3×3 (288 weights)
-- Layer 3: 4→3, kernel 3×3 (108 weights)
+- Mip level: 0 (original resolution)
+- Layer 0: 12→4, kernel 3×3 (432 weights)
+- Layer 1: 12→4, kernel 3×3 (432 weights)
+- Layer 2: 12→4, kernel 3×3 (432 weights)
**File Layout:**
```
Offset Size Content
------ ---- -------
-0x00 16 Header (magic, version=1, layers=3, weights=1476)
-0x10 20 Layer 1 info (kernel=3, in=15, out=8, offset=0, count=1080)
-0x24 20 Layer 2 info (kernel=3, in=8, out=4, offset=1080, count=288)
-0x38 20 Layer 3 info (kernel=3, in=4, out=3, offset=1368, count=108)
-0x4C 1476 Weight data (738 u32 packed f16 pairs)
+0x00 20 Header (magic, version=2, layers=3, weights=1296, mip_level=0)
+0x14 20 Layer 0 info (kernel=3, in=12, out=4, offset=0, count=432)
+0x28 20 Layer 1 info (kernel=3, in=12, out=4, offset=432, count=432)
+0x3C 20 Layer 2 info (kernel=3, in=12, out=4, offset=864, count=432)
+0x50 2592 Weight data (1296 u32 packed f16 pairs)
----
-Total: 1528 bytes (~1.5 KB)
+Total: 2672 bytes (~2.6 KB)
```
---
@@ -105,17 +139,24 @@ Total: 1528 bytes (~1.5 KB)
Not stored in .bin file (computed at runtime):
-**7D Input Features (packed as 8 channels):**
-1. R (red channel)
-2. G (green channel)
-3. B (blue channel)
-4. D (depth value)
-5. UV_X (normalized x coordinate)
-6. UV_Y (normalized y coordinate)
-7. sin(10 × UV_X) (spatial frequency encoding)
-8. 1.0 (bias term)
+**8D Input Features:**
+1. **p0** - Parametric feature 0 (from mip level)
+2. **p1** - Parametric feature 1 (from mip level)
+3. **p2** - Parametric feature 2 (from mip level)
+4. **p3** - Parametric feature 3 (depth or from mip level)
+5. **UV_X** - Normalized x coordinate [0,1]
+6. **UV_Y** - Normalized y coordinate [0,1]
+7. **sin(10 × UV_X)** - Spatial frequency encoding
+8. **1.0** - Bias term
+
+**Mip Level Usage (p0-p3):**
+- `mip_level=0`: RGB from original resolution (mip 0)
+- `mip_level=1`: RGB from half resolution (mip 1), upsampled
+- `mip_level=2`: RGB from quarter resolution (mip 2), upsampled
+- `mip_level=3`: RGB from eighth resolution (mip 3), upsampled
-**First CNN layer** receives all 8 static features + 0-7 previous layer outputs (total 8-15 input channels).
+**Layer 0** receives input RGBD (4D) + static features (8D) = 12D input → 4D output.
+**Layer 1+** receive previous layer output (4D) + static features (8D) = 12D input → 4D output.
---
@@ -128,9 +169,17 @@ fread(&magic, 4, 1, fp);
if (magic != 0x32_4E_4E_43) { error("Invalid CNN v2 file"); }
```
+**Version Check:**
+```c
+uint32_t version;
+fread(&version, 4, 1, fp);
+if (version != 1 && version != 2) { error("Unsupported version"); }
+uint32_t header_size = (version == 1) ? 16 : 20;
+```
+
**Size Check:**
```c
-expected_size = 16 + (num_layers × 20) + (total_weights × 2);
+expected_size = header_size + (num_layers × 20) + (total_weights × 2);
if (file_size != expected_size) { error("Size mismatch"); }
```
diff --git a/doc/CNN_V2_WEB_TOOL.md b/doc/CNN_V2_WEB_TOOL.md
index 8c661b2..25f4ec7 100644
--- a/doc/CNN_V2_WEB_TOOL.md
+++ b/doc/CNN_V2_WEB_TOOL.md
@@ -10,7 +10,8 @@ Browser-based WebGPU tool for validating CNN v2 inference with layer visualizati
**Working:**
- ✅ WebGPU initialization and device setup
-- ✅ Binary weight file parsing (.bin format)
+- ✅ Binary weight file parsing (v1 and v2 formats)
+- ✅ Automatic mip-level detection from binary format v2
- ✅ Weight statistics (min/max per layer)
- ✅ UI layout with collapsible panels
- ✅ Mode switching (Activations/Weights tabs)
@@ -24,6 +25,10 @@ Browser-based WebGPU tool for validating CNN v2 inference with layer visualizati
- ✅ Mip level selection (p0-p3 features at different resolutions)
**Recent Changes (Latest):**
+- Binary format v2 support: Reads mip_level from 20-byte header
+- Backward compatible: v1 (16-byte header) → mip_level=0
+- Auto-update UI dropdown when loading weights with mip_level
+- Display mip_level in metadata panel
- Code refactoring: Extracted FULLSCREEN_QUAD_VS shader (reused 3× across pipelines)
- Added helper methods: `getDimensions()`, `setVideoControlsEnabled()`
- Improved code organization with section headers and comments
@@ -70,9 +75,11 @@ Browser-based WebGPU tool for validating CNN v2 inference with layer visualizati
### Key Components
**1. Weight Parsing**
-- Reads binary format: header (16B) + layer info (20B×N) + f16 weights
+- Reads binary format v2: header (20B) + layer info (20B×N) + f16 weights
+- Backward compatible with v1: header (16B), mip_level defaults to 0
- Computes min/max per layer via f16 unpacking
-- Stores `{ layers[], weights[], fileSize }`
+- Stores `{ layers[], weights[], mipLevel, fileSize }`
+- Auto-sets UI mip-level dropdown from loaded weights
**2. CNN Pipeline**
- Static features computation (RGBD + UV + sin + bias → 7D packed)