Diffstat (limited to 'doc/CNN_V2_BINARY_FORMAT.md')
| -rw-r--r-- | doc/CNN_V2_BINARY_FORMAT.md | 155 |
1 files changed, 155 insertions, 0 deletions
diff --git a/doc/CNN_V2_BINARY_FORMAT.md b/doc/CNN_V2_BINARY_FORMAT.md
new file mode 100644
index 0000000..650177f
--- /dev/null
+++ b/doc/CNN_V2_BINARY_FORMAT.md
@@ -0,0 +1,155 @@
+# CNN v2 Binary Weight Format Specification
+
+Binary format for storing trained CNN v2 weights with static feature architecture.
+
+**File Extension:** `.bin`
+**Byte Order:** Little-endian
+**Version:** 1.0
+
+---
+
+## File Structure
+
+```
+┌─────────────────────┐
+│ Header (16 bytes)   │
+├─────────────────────┤
+│ Layer Info          │
+│   (20 bytes × N)    │
+├─────────────────────┤
+│ Weight Data         │
+│   (variable size)   │
+└─────────────────────┘
+```
+
+---
+
+## Header (16 bytes)
+
+| Offset | Type | Field          | Description                          |
+|--------|------|----------------|--------------------------------------|
+| 0x00   | u32  | magic          | Magic number: `0x32_4E_4E_43` ("CNN2") |
+| 0x04   | u32  | version        | Format version (currently 1)         |
+| 0x08   | u32  | num_layers     | Number of CNN layers (excludes static features) |
+| 0x0C   | u32  | total_weights  | Total f16 weight count across all layers |
+
+---
+
+## Layer Info (20 bytes per layer)
+
+Repeated `num_layers` times, starting at offset 0x10.
+
+| Offset      | Type | Field          | Description                          |
+|-------------|------|----------------|--------------------------------------|
+| 0x00        | u32  | kernel_size    | Convolution kernel dimension (3, 5, 7, etc.) |
+| 0x04        | u32  | in_channels    | Input channel count (includes 8 static features for Layer 1) |
+| 0x08        | u32  | out_channels   | Output channel count (max 8)         |
+| 0x0C        | u32  | weight_offset  | Weight array start index (f16 units, relative to weight data section) |
+| 0x10        | u32  | weight_count   | Number of f16 weights for this layer |
+
+**Layer Order:** Sequential (Layer 1, Layer 2, Layer 3, ...)
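A minimal sketch of reading the header and layer-info records described above from an in-memory copy of the file. The names `CnnV2Header`, `CnnV2LayerInfo`, `read_u32_le`, `cnn_v2_parse_header`, and `cnn_v2_parse_layer` are hypothetical and are not taken from the project's loader. Each field is assembled byte by byte, so the sketch should not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CNN_V2_MAGIC       0x324E4E43u  /* bytes 'C','N','N','2' in file order */
#define CNN_V2_HEADER_SIZE 16u
#define CNN_V2_LAYER_SIZE  20u

/* Hypothetical in-memory mirrors of the two record layouts in the spec. */
typedef struct {
    uint32_t magic, version, num_layers, total_weights;
} CnnV2Header;

typedef struct {
    uint32_t kernel_size, in_channels, out_channels, weight_offset, weight_count;
} CnnV2LayerInfo;

/* Assemble a little-endian u32 from 4 bytes, independent of host byte order. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
static int cnn_v2_parse_header(const uint8_t *buf, size_t len, CnnV2Header *out) {
    assert(buf != NULL && out != NULL);
    if (len < CNN_V2_HEADER_SIZE) return -1;
    out->magic         = read_u32_le(buf + 0x00);
    out->version       = read_u32_le(buf + 0x04);
    out->num_layers    = read_u32_le(buf + 0x08);
    out->total_weights = read_u32_le(buf + 0x0C);
    return out->magic == CNN_V2_MAGIC ? 0 : -1;
}

/* Reads layer record `index` (0-based); returns -1 if it lies outside the buffer. */
static int cnn_v2_parse_layer(const uint8_t *buf, size_t len, uint32_t index,
                              CnnV2LayerInfo *out) {
    assert(buf != NULL && out != NULL);
    size_t base = CNN_V2_HEADER_SIZE + (size_t)index * CNN_V2_LAYER_SIZE;
    if (len < base + CNN_V2_LAYER_SIZE) return -1;
    out->kernel_size   = read_u32_le(buf + base + 0x00);
    out->in_channels   = read_u32_le(buf + base + 0x04);
    out->out_channels  = read_u32_le(buf + base + 0x08);
    out->weight_offset = read_u32_le(buf + base + 0x0C);
    out->weight_count  = read_u32_le(buf + base + 0x10);
    return 0;
}
```

A caller would typically parse the header first, then loop `index` from 0 to `num_layers - 1`. Treating a bad magic as a parse failure is a design choice of this sketch, not something the format mandates.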
+
+---
+
+## Weight Data (variable size)
+
+Starts at offset: `16 + (num_layers × 20)`
+
+**Format:** Packed f16 pairs stored as u32
+**Packing:** `u32 = (f16_hi << 16) | f16_lo`
+**Storage:** Sequential by layer, then by output channel, input channel, spatial position
+
+**Weight Indexing:**
+```
+weight_idx = output_ch × (in_channels × kernel_size²) +
+             input_ch × kernel_size² +
+             (ky × kernel_size + kx)
+```
+
+Where:
+- `output_ch` ∈ [0, out_channels)
+- `input_ch` ∈ [0, in_channels)
+- `ky`, `kx` ∈ [0, kernel_size)
+
+**Unpacking f16 from u32:**
+```c
+uint32_t packed = weights_buffer[weight_idx / 2];
+uint16_t f16_bits = (weight_idx % 2 == 0) ? (packed & 0xFFFF) : (packed >> 16);
+```
+
+---
+
+## Example: 3-Layer Network
+
+**Configuration:**
+- Layer 1: 15→8, kernel 3×3 (1,080 weights)
+- Layer 2: 8→4, kernel 3×3 (288 weights)
+- Layer 3: 4→3, kernel 3×3 (108 weights)
+
+**File Layout (sizes in bytes):**
+```
+Offset  Size  Content
+------  ----  -------
+0x00    16    Header (magic, version=1, layers=3, weights=1476)
+0x10    20    Layer 1 info (kernel=3, in=15, out=8, offset=0, count=1080)
+0x24    20    Layer 2 info (kernel=3, in=8, out=4, offset=1080, count=288)
+0x38    20    Layer 3 info (kernel=3, in=4, out=3, offset=1368, count=108)
+0x4C    2952  Weight data (1476 f16 values packed as 738 u32 words)
+        ----
+Total:  3028 bytes (~3.0 KB)
+```
+
+---
+
+## Static Features
+
+Not stored in .bin file (computed at runtime):
+
+**7D Input Features (packed as 8 channels):**
+1. R (red channel)
+2. G (green channel)
+3. B (blue channel)
+4. D (depth value)
+5. UV_X (normalized x coordinate)
+6. UV_Y (normalized y coordinate)
+7. sin(10 × UV_X) (spatial frequency encoding)
+8. 1.0 (bias term)
+
+**First CNN layer** receives all 8 static features + 0-7 previous layer outputs (total 8-15 input channels).
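Returning to the Weight Data section, the indexing formula and the unpacking rule can be combined into a small lookup path. The sketch below is illustrative only: the function names are hypothetical, `packed` is assumed to hold one layer's weights already loaded as host-order `uint32_t` words starting at an even f16 index, and the f16 values are assumed to be IEEE 754 binary16, which the spec does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flat index within one layer: output channel, then input channel, then (ky, kx). */
static uint32_t cnn_v2_weight_index(uint32_t in_channels, uint32_t kernel_size,
                                    uint32_t output_ch, uint32_t input_ch,
                                    uint32_t ky, uint32_t kx) {
    uint32_t k2 = kernel_size * kernel_size;
    return output_ch * (in_channels * k2) + input_ch * k2 + ky * kernel_size + kx;
}

/* Even indices sit in the low 16 bits of a packed word, odd indices in the high 16. */
static uint16_t cnn_v2_unpack_f16(const uint32_t *packed, uint32_t idx) {
    assert(packed != NULL);
    uint32_t word = packed[idx / 2];
    return (uint16_t)((idx % 2 == 0) ? (word & 0xFFFFu) : (word >> 16));
}

/* Widen binary16 bits to a float by rebuilding the binary32 bit pattern
   (assumes IEEE 754 half precision and a 32-bit IEEE float on the host). */
static float f16_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t frac = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (frac == 0) {
            bits = sign;                                   /* signed zero */
        } else {
            uint32_t shift = 0;                            /* subnormal: normalize */
            while ((frac & 0x400u) == 0) { frac <<= 1; shift++; }
            frac &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (frac << 13);
        }
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (frac << 13);          /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (frac << 13); /* rebias 15 -> 127 */
    }
    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

For a layer whose `weight_offset` is odd, a reader would presumably add `weight_offset` to the index before unpacking against the whole weight section rather than a per-layer slice; that case is left out of the sketch.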
+
+---
+
+## Validation
+
+**Magic Check:**
+```c
+uint32_t magic;
+if (fread(&magic, sizeof magic, 1, fp) != 1) { error("Read failed"); }
+if (magic != 0x324E4E43) { error("Invalid CNN v2 file"); }  // "CNN2" on a little-endian host
+```
+
+**Size Check:**
+```c
+size_t expected_size = 16 + ((size_t)num_layers * 20) + ((size_t)total_weights * 2);
+if (file_size != expected_size) { error("Size mismatch"); }
+```
+
+**Weight Offset Sanity:**
+```c
+// Each layer's offset should match cumulative count
+uint32_t cumulative = 0;
+for (uint32_t i = 0; i < num_layers; i++) {
+    if (layers[i].weight_offset != cumulative) { error("Invalid offset"); }
+    cumulative += layers[i].weight_count;
+}
+if (cumulative != total_weights) { error("Total mismatch"); }
+```
+
+---
+
+## Related Files
+
+- `training/export_cnn_v2_weights.py` - Binary export tool
+- `src/gpu/effects/cnn_v2_effect.cc` - C++ loader
+- `tools/cnn_v2_test/index.html` - WebGPU validator
+- `doc/CNN_V2.md` - Architecture design
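The size and offset checks from the Validation section can be folded into one routine that also cross-checks each layer's shape. This is a sketch under assumptions: `CnnV2LayerInfo` and `cnn_v2_check_layers` are hypothetical names, and the two per-layer shape checks (`out_channels` at most 8, and `weight_count` equal to `out_channels × in_channels × kernel_size²`) are inferred from the layer table and the indexing formula rather than listed as required checks in the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory mirror of one 20-byte layer-info record. */
typedef struct {
    uint32_t kernel_size, in_channels, out_channels, weight_offset, weight_count;
} CnnV2LayerInfo;

/* Returns NULL when every check passes, otherwise a short description
   of the first failure found. */
static const char *cnn_v2_check_layers(const CnnV2LayerInfo *layers,
                                       uint32_t num_layers,
                                       uint32_t total_weights,
                                       uint64_t file_size) {
    assert(layers != NULL || num_layers == 0);

    /* Header (16) + layer records (20 each) + 2 bytes per f16 weight. */
    uint64_t expected_size = 16u + (uint64_t)num_layers * 20u
                                 + (uint64_t)total_weights * 2u;
    if (file_size != expected_size) return "size mismatch";

    uint64_t cumulative = 0;
    for (uint32_t i = 0; i < num_layers; i++) {
        const CnnV2LayerInfo *l = &layers[i];
        /* Assumed check: the layer table gives 8 as the output channel maximum. */
        if (l->out_channels > 8) return "too many output channels";
        /* Assumed check: count implied by the weight indexing formula. */
        uint64_t implied = (uint64_t)l->out_channels * l->in_channels
                         * l->kernel_size * l->kernel_size;
        if (l->weight_count != implied) return "weight count does not match shape";
        /* Offsets are cumulative f16 counts, as in the spec's own check. */
        if (l->weight_offset != cumulative) return "invalid offset";
        cumulative += l->weight_count;
    }
    return cumulative == total_weights ? NULL : "total mismatch";
}
```

Returning a message instead of calling `error()` keeps the sketch free of any project-specific error handler; a real loader may prefer its own reporting mechanism.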
