author    skal <pascal.massimino@gmail.com> 2026-02-13 11:44:41 +0100
committer skal <pascal.massimino@gmail.com> 2026-02-13 11:44:41 +0100
commit    c27b34279c0d1c2a8f1dbceb0e154b585b5c6916
tree      5918fbaadad369ec8213df1682919ebaf9f57b56 /doc
parent    6ca832296a74b3a3342320cf4edaa368ebc56afe
CNN v2 Web Tool: Unify layer terminology and add binary format spec

- Rename 'Static (L0)' → 'Static' (clearer, less confusing)
- Update channel labels: 'R/G/B/D' → 'Ch0 (R)/Ch1 (G)/Ch2 (B)/Ch3 (D)'
- Add 'Layer' prefix in weights table for consistency
- Document layer indexing: Static + Layer 1,2,3... (UI) ↔ weights.layers[0,1,2...]
- Add explanatory notes about 7D input and 4-of-8 channel display
- Create doc/CNN_V2_BINARY_FORMAT.md with complete .bin specification
- Cross-reference spec in CNN_V2.md and CNN_V2_WEB_TOOL.md

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Diffstat (limited to 'doc')
-rw-r--r-- doc/CNN_V2.md                 9
-rw-r--r-- doc/CNN_V2_BINARY_FORMAT.md 155
-rw-r--r-- doc/CNN_V2_WEB_TOOL.md       45
3 files changed, 188 insertions, 21 deletions
diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
index 09d0841..588c3db 100644
--- a/doc/CNN_V2.md
+++ b/doc/CNN_V2.md
@@ -669,6 +669,15 @@ workspaces/main/shaders/cnn_*.wgsl # Original v1 shaders
---
+## Related Documentation
+
+- `doc/CNN_V2_BINARY_FORMAT.md` - Binary weight file specification (.bin format)
+- `doc/CNN_V2_WEB_TOOL.md` - WebGPU testing tool with layer visualization
+- `doc/CNN_TEST_TOOL.md` - C++ offline validation tool (deprecated)
+- `doc/HOWTO.md` - Training and validation workflows
+
+---
+
**Document Version:** 1.0
**Last Updated:** 2026-02-12
**Status:** Design approved, ready for implementation
diff --git a/doc/CNN_V2_BINARY_FORMAT.md b/doc/CNN_V2_BINARY_FORMAT.md
new file mode 100644
index 0000000..650177f
--- /dev/null
+++ b/doc/CNN_V2_BINARY_FORMAT.md
@@ -0,0 +1,155 @@
+# CNN v2 Binary Weight Format Specification
+
+Binary format for storing the trained weights of the CNN v2 static-feature architecture.
+
+**File Extension:** `.bin`
+**Byte Order:** Little-endian
+**Version:** 1.0
+
+---
+
+## File Structure
+
+```
+┌─────────────────────┐
+│ Header (16 bytes)   │
+├─────────────────────┤
+│ Layer Info          │
+│ (20 bytes × N)      │
+├─────────────────────┤
+│ Weight Data         │
+│ (variable size)     │
+└─────────────────────┘
+```
+
+---
+
+## Header (16 bytes)
+
+| Offset | Type | Field | Description |
+|--------|------|----------------|--------------------------------------|
+| 0x00 | u32 | magic | Magic number: `0x32_4E_4E_43` ("CNN2") |
+| 0x04 | u32 | version | Format version (currently 1) |
+| 0x08 | u32 | num_layers | Number of CNN layers (excludes static features) |
+| 0x0C | u32 | total_weights | Total f16 weight count across all layers |
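
A minimal C sketch of reading the header from an in-memory buffer. It decodes each field byte by byte as little-endian, so it does not depend on host byte order. `read_u32_le`, `CnnV2Header`, `CNN_V2_MAGIC` and `parse_header` are hypothetical names for illustration, not taken from the C++ loader (`src/gpu/effects/cnn_v2_effect.cc`).

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian u32 independent of host byte order. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical in-memory view of the 16-byte header. */
typedef struct {
    uint32_t magic, version, num_layers, total_weights;
} CnnV2Header;

#define CNN_V2_MAGIC 0x324E4E43u /* on-disk bytes: 'C' 'N' 'N' '2' */

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
static int parse_header(const uint8_t *buf, size_t len, CnnV2Header *h) {
    if (len < 16) return -1;
    h->magic         = read_u32_le(buf + 0x00);
    h->version       = read_u32_le(buf + 0x04);
    h->num_layers    = read_u32_le(buf + 0x08);
    h->total_weights = read_u32_le(buf + 0x0C);
    return h->magic == CNN_V2_MAGIC ? 0 : -1;
}
```

Because the magic is compared after little-endian decoding, a file whose first four bytes are the ASCII text "CNN2" passes the check on any host.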
+
+---
+
+## Layer Info (20 bytes per layer)
+
+Repeated `num_layers` times, starting at offset 0x10.
+
+| Offset | Type | Field | Description |
+|-------------|------|----------------|--------------------------------------|
+| 0x00 | u32 | kernel_size | Side length of the square convolution kernel (3, 5, 7, etc.) |
+| 0x04 | u32 | in_channels | Input channel count (includes 8 static features for Layer 1) |
+| 0x08 | u32 | out_channels | Output channel count (max 8) |
+| 0x0C | u32 | weight_offset | Weight array start index (f16 units, relative to weight data section) |
+| 0x10 | u32 | weight_count | Number of f16 weights for this layer |
+
+**Layer Order:** Sequential (Layer 1, Layer 2, Layer 3, ...)
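
The layer records and the start of the weight data can be located with the same byte-level decoding. This sketch uses hypothetical names (`CnnV2LayerInfo`, `layer_info_offset`, `weight_data_offset`, `parse_layer_info`, `layer_count_ok`). It also checks that `weight_count` equals `out_channels × in_channels × kernel_size²`, which is what the weight indexing formula and the 3-layer example imply; whether the real loader performs this check is not stated in this spec.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian u32 independent of host byte order. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical in-memory view of one 20-byte layer record. */
typedef struct {
    uint32_t kernel_size, in_channels, out_channels, weight_offset, weight_count;
} CnnV2LayerInfo;

/* Layer records start right after the 16-byte header. */
static size_t layer_info_offset(uint32_t i) { return 16 + (size_t)i * 20; }

/* The weight data section follows the last layer record. */
static size_t weight_data_offset(uint32_t num_layers) {
    return 16 + (size_t)num_layers * 20;
}

/* Decode record i; the caller must ensure buf holds at least
 * layer_info_offset(i) + 20 bytes. */
static void parse_layer_info(const uint8_t *buf, uint32_t i, CnnV2LayerInfo *L) {
    const uint8_t *p = buf + layer_info_offset(i);
    L->kernel_size   = read_u32_le(p + 0x00);
    L->in_channels   = read_u32_le(p + 0x04);
    L->out_channels  = read_u32_le(p + 0x08);
    L->weight_offset = read_u32_le(p + 0x0C);
    L->weight_count  = read_u32_le(p + 0x10);
}

/* Consistency check implied by the indexing formula: out * in * k^2 weights.
 * 64-bit math avoids overflow on corrupt records. */
static int layer_count_ok(const CnnV2LayerInfo *L) {
    uint64_t k2 = (uint64_t)L->kernel_size * L->kernel_size;
    return (uint64_t)L->out_channels * L->in_channels * k2 == L->weight_count;
}
```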
+
+---
+
+## Weight Data (variable size)
+
+Starts at offset: `16 + (num_layers × 20)`
+
+**Format:** Packed f16 pairs stored as u32
+**Packing:** `u32 = (f16_hi << 16) | f16_lo`
+**Storage:** Sequential by layer, then by output channel, input channel, spatial position
+
+**Weight Indexing:**
+```
+weight_idx = output_ch × (in_channels × kernel_size²) +
+             input_ch × kernel_size² +
+             (ky × kernel_size + kx)
+```
+
+Where:
+- `output_ch` ∈ [0, out_channels)
+- `input_ch` ∈ [0, in_channels)
+- `ky`, `kx` ∈ [0, kernel_size)
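
The indexing formula translates directly into a helper. The result is layer-local: the layer's `weight_offset` still has to be added before reading from the weight data section. `weight_index` is a hypothetical name.

```c
#include <stdint.h>

/* Layer-local f16 index of weight (oc, ic, ky, kx), following the spec's
 * formula: output channel, then input channel, then row-major (ky, kx). */
static uint32_t weight_index(uint32_t in_channels, uint32_t kernel_size,
                             uint32_t oc, uint32_t ic, uint32_t ky, uint32_t kx) {
    uint32_t k2 = kernel_size * kernel_size;
    return oc * (in_channels * k2) + ic * k2 + (ky * kernel_size + kx);
}
```

For Layer 1 of the 3-layer example (15→8, 3×3), the last weight (oc=7, ic=14, ky=2, kx=2) lands at index 1079, one less than that layer's `weight_count` of 1080.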
+
+**Unpacking f16 from u32:**
+```c
+uint32_t f16_idx = layer.weight_offset + weight_idx; // layer-local index + layer offset
+uint32_t packed = weights_buffer[f16_idx / 2];
+uint16_t f16_bits = (f16_idx % 2 == 0) ? (packed & 0xFFFF) : (packed >> 16);
+```
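
A self-contained sketch of the packing rule, the index-to-word lookup, and a conversion from the f16 bit pattern to `float`. The spec does not say how the loader converts f16 values; the decoder below assumes "f16" means IEEE 754 binary16, the format WGSL's `f16` uses. WGSL's `unpack2x16float` likewise takes component 0 from bits 0..15, matching the "even index = low half" rule. `pack_f16_pair`, `fetch_f16` and `f16_to_f32` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Packing rule from the spec: u32 = (f16_hi << 16) | f16_lo. */
static uint32_t pack_f16_pair(uint16_t lo, uint16_t hi) {
    return ((uint32_t)hi << 16) | lo;
}

/* Fetch f16 number idx, counted from the start of the weight data section. */
static uint16_t fetch_f16(const uint32_t *words, uint32_t idx) {
    uint32_t packed = words[idx / 2];
    return (idx % 2 == 0) ? (uint16_t)(packed & 0xFFFFu) : (uint16_t)(packed >> 16);
}

/* IEEE 754 binary16 -> binary32 (handles zero, subnormals, inf and NaN). */
static float f16_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0x1Fu) {                      /* inf or NaN */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {                   /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {                  /* signed zero */
        bits = sign;
    } else {                                 /* subnormal: normalize */
        exp = 113u;
        while ((mant & 0x400u) == 0) { mant <<= 1; exp--; }
        bits = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);             /* bit-cast without aliasing UB */
    return f;
}
```

To read a given weight of a layer, pass `layer.weight_offset` plus the layer-local index as `idx`, with `words` pointing at the first u32 of the weight data section.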
+
+---
+
+## Example: 3-Layer Network
+
+**Configuration:**
+- Layer 1: 15→8, kernel 3×3 (1,080 weights)
+- Layer 2: 8→4, kernel 3×3 (288 weights)
+- Layer 3: 4→3, kernel 3×3 (108 weights)
+
+**File Layout:**
+```
+Offset  Size  Content
+------  ----  -------
+0x00      16  Header (magic, version=1, layers=3, weights=1476)
+0x10      20  Layer 1 info (kernel=3, in=15, out=8, offset=0, count=1080)
+0x24      20  Layer 2 info (kernel=3, in=8, out=4, offset=1080, count=288)
+0x38      20  Layer 3 info (kernel=3, in=4, out=3, offset=1368, count=108)
+0x4C    2952  Weight data (1476 f16 = 738 u32 packed pairs)
+        ----
+Total:  3028 bytes (~3.0 KB)
+```
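
The offsets and sizes in the layout follow from the three layer configurations. The sketch below recomputes them; `LayerCfg` and `cnn_v2_file_size` are hypothetical names. It assumes no padding word after the last weight, which does not matter here because the example's 1476 weights are an even count. For the example configuration it gives 1476 weights and a 3028-byte file: 16 + 3 × 20 = 76 bytes of header and layer records, plus 1476 × 2 = 2952 bytes of weight data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-layer configuration: kernel size, input and output channels. */
typedef struct { uint32_t k, in, out; } LayerCfg;

/* Header + layer records + 2 bytes per f16 weight (no trailing padding assumed).
 * Optionally reports the total f16 weight count. */
static size_t cnn_v2_file_size(const LayerCfg *cfg, uint32_t n, uint32_t *total_weights) {
    uint32_t total = 0;
    for (uint32_t i = 0; i < n; i++)
        total += cfg[i].out * cfg[i].in * cfg[i].k * cfg[i].k;
    if (total_weights) *total_weights = total;
    return 16 + (size_t)n * 20 + (size_t)total * 2;
}
```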
+
+---
+
+## Static Features
+
+Not stored in the .bin file (computed at runtime):
+
+**7D Input Features (packed as 8 channels):**
+1. R (red channel)
+2. G (green channel)
+3. B (blue channel)
+4. D (depth value)
+5. UV_X (normalized x coordinate)
+6. UV_Y (normalized y coordinate)
+7. sin(10 × UV_X) (spatial frequency encoding)
+8. 1.0 (bias term)
+
+**First CNN layer** receives all 8 static features + 0-7 previous layer outputs (total 8-15 input channels).
+
+---
+
+## Validation
+
+**Magic Check:**
+```c
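+uint32_t magic;  // fread into a native u32 assumes a little-endian host
+if (fread(&magic, sizeof magic, 1, fp) != 1 || magic != 0x324E4E43u) {
+    error("Invalid CNN v2 file");  // on-disk bytes: 'C' 'N' 'N' '2'
+}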
+```
+
+**Size Check:**
+```c
+// 2 bytes per f16; assumes no padding word when total_weights is odd
+size_t expected_size = 16 + (size_t)num_layers * 20 + (size_t)total_weights * 2;
+if (file_size != expected_size) { error("Size mismatch"); }
+```
+
+**Weight Offset Sanity:**
+```c
+// Each layer's offset should match the cumulative count of the preceding layers
+uint32_t cumulative = 0;
+for (uint32_t i = 0; i < num_layers; i++) {
+    if (layers[i].weight_offset != cumulative) { error("Invalid offset"); }
+    cumulative += layers[i].weight_count;
+}
+if (cumulative != total_weights) { error("Total mismatch"); }
+```
+
+---
+
+## Related Files
+
+- `training/export_cnn_v2_weights.py` - Binary export tool
+- `src/gpu/effects/cnn_v2_effect.cc` - C++ loader
+- `tools/cnn_v2_test/index.html` - WebGPU validator
+- `doc/CNN_V2.md` - Architecture design
diff --git a/doc/CNN_V2_WEB_TOOL.md b/doc/CNN_V2_WEB_TOOL.md
index 2fbc70e..81549ab 100644
--- a/doc/CNN_V2_WEB_TOOL.md
+++ b/doc/CNN_V2_WEB_TOOL.md
@@ -49,9 +49,11 @@ Browser-based WebGPU tool for validating CNN v2 inference with layer visualizati
**3. Visualization Modes**
**Activations Mode:**
-- 4 grayscale views per layer (channels 0-3)
+- 4 grayscale views per layer (channels 0-3 of up to 8 total)
- WebGPU compute → unpack f16 → scale → grayscale
-- Auto-scale: Layer 0 (static) = 1.0, CNN layers = 0.2
+- Auto-scale: Static features = 1.0, CNN layers = 0.2
+- Static features: Shows R,G,B,D (first 4 of 8: RGBD+UV+sin+bias)
+- CNN layers: Shows first 4 output channels
**Weights Mode:**
- 2D canvas rendering per output channel
@@ -78,6 +80,21 @@ For each CNN layer i:
Compute (ping-pong) → copy to layerTextures[i+1]
```
+### Layer Indexing
+
+**UI Layer Buttons:**
+- "Static" → layerOutputs[0] (7D input features)
+- "Layer 1" → layerOutputs[1] (CNN layer 1 output, uses weights.layers[0])
+- "Layer 2" → layerOutputs[2] (CNN layer 2 output, uses weights.layers[1])
+- "Layer N" → layerOutputs[N] (CNN layer N output, uses weights.layers[N-1])
+
+**Weights Table:**
+- "Layer 1" → weights.layers[0] (first CNN layer weights)
+- "Layer 2" → weights.layers[1] (second CNN layer weights)
+- "Layer N" → weights.layers[N-1]
+
+**Consistency:** Both the UI and the weights table use the same numbering (1, 2, 3...) for CNN layers.
+
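
The mapping above reduces to an off-by-one between UI numbering and `weights.layers`. It is sketched here in C for consistency with the other examples; the tool itself is WebGPU/JavaScript. `LayerIndices`, `ui_layer_indices` and `ui_layer_label` are hypothetical names, and UI index 0 stands for the "Static" button, which has no weights.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: UI index 0 is the "Static" button, UI index N >= 1 is "Layer N". */
typedef struct {
    int layer_output;  /* index into layerOutputs */
    int weights_layer; /* index into weights.layers, or -1 for Static */
} LayerIndices;

static LayerIndices ui_layer_indices(int ui_index) {
    LayerIndices r;
    r.layer_output  = ui_index;
    r.weights_layer = ui_index > 0 ? ui_index - 1 : -1;
    return r;
}

/* Label shown on the layer button (and, for N >= 1, in the weights table). */
static void ui_layer_label(int ui_index, char *buf, size_t len) {
    if (ui_index == 0)
        snprintf(buf, len, "Static");
    else
        snprintf(buf, len, "Layer %d", ui_index);
}
```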
---
## Known Issues
@@ -192,26 +209,12 @@ For each CNN layer i:
## Binary Weight Format
-**Header (16 bytes):**
-```
-u32 magic; // 0x32_4E_4E_43 ("CNN2")
-u32 version; // Format version
-u32 num_layers; // Layer count
-u32 total_weights;// Total f16 weight count
-```
-
-**Layer Info (20 bytes × N):**
-```
-u32 kernel_size; // 3, 5, 7, etc.
-u32 in_channels; // Input channel count
-u32 out_channels; // Output channel count
-u32 weight_offset; // Offset in f16 units
-u32 weight_count; // Number of f16 weights
-```
+See `doc/CNN_V2_BINARY_FORMAT.md` for the complete specification.
-**Weights (variable):**
-- Packed f16 pairs as u32 (lo 16 bits, hi 16 bits)
-- Sequential storage: [layer0_weights][layer1_weights]...
+**Quick Summary:**
+- Header: 16 bytes (magic, version, layer count, total weights)
+- Layer info: 20 bytes × N (kernel size, channels, offsets)
+- Weights: Packed f16 pairs as u32
---