Diffstat (limited to 'doc/CNN_V2_BINARY_FORMAT.md')
| -rw-r--r-- | doc/CNN_V2_BINARY_FORMAT.md | 235 |
1 files changed, 0 insertions, 235 deletions
diff --git a/doc/CNN_V2_BINARY_FORMAT.md b/doc/CNN_V2_BINARY_FORMAT.md
deleted file mode 100644
index 59c859d..0000000
--- a/doc/CNN_V2_BINARY_FORMAT.md
+++ /dev/null
@@ -1,235 +0,0 @@
-# CNN v2 Binary Weight Format Specification
-
-Binary format for storing trained CNN v2 weights with static feature architecture.
-
-**File Extension:** `.bin`
-**Byte Order:** Little-endian
-**Version:** 2.0 (supports mip-level for parametric features)
-**Backward Compatible:** Version 1.0 files supported (mip_level=0)
-
----
-
-## File Structure
-
-**Version 2 (current):**
-```
-┌─────────────────────┐
-│ Header (20 bytes)   │
-├─────────────────────┤
-│ Layer Info          │
-│ (20 bytes × N)      │
-├─────────────────────┤
-│ Weight Data         │
-│ (variable size)     │
-└─────────────────────┘
-```
-
-**Version 1 (legacy):**
-```
-┌─────────────────────┐
-│ Header (16 bytes)   │
-├─────────────────────┤
-│ Layer Info          │
-│ (20 bytes × N)      │
-├─────────────────────┤
-│ Weight Data         │
-│ (variable size)     │
-└─────────────────────┘
-```
-
----
-
-## Header
-
-**Version 2 (20 bytes):**
-
-| Offset | Type | Field | Description |
-|--------|------|----------------|--------------------------------------|
-| 0x00 | u32 | magic | Magic number: `0x32_4E_4E_43` ("CNN2") |
-| 0x04 | u32 | version | Format version (2 for current) |
-| 0x08 | u32 | num_layers | Number of CNN layers (excludes static features) |
-| 0x0C | u32 | total_weights | Total f16 weight count across all layers |
-| 0x10 | u32 | mip_level | Mip level for p0-p3 features (0=original, 1=half, 2=quarter, 3=eighth) |
-
-**Version 1 (16 bytes) - Legacy:**
-
-| Offset | Type | Field | Description |
-|--------|------|----------------|--------------------------------------|
-| 0x00 | u32 | magic | Magic number: `0x32_4E_4E_43` ("CNN2") |
-| 0x04 | u32 | version | Format version (1) |
-| 0x08 | u32 | num_layers | Number of CNN layers |
-| 0x0C | u32 | total_weights | Total f16 weight count |
-
-**Note:** Loaders should check version field and handle both formats. Version 1 files treated as mip_level=0.
-
----
-
-## Layer Info (20 bytes per layer)
-
-Repeated `num_layers` times:
-- **Version 2:** Starting at offset 0x14 (20 bytes)
-- **Version 1:** Starting at offset 0x10 (16 bytes)
-
-| Offset | Type | Field | Description |
-|-------------|------|----------------|--------------------------------------|
-| 0x00 | u32 | kernel_size | Convolution kernel dimension (3, 5, 7, etc.) |
-| 0x04 | u32 | in_channels | Input channel count (includes 8 static features for Layer 1) |
-| 0x08 | u32 | out_channels | Output channel count (max 8) |
-| 0x0C | u32 | weight_offset | Weight array start index (f16 units, relative to weight data section) |
-| 0x10 | u32 | weight_count | Number of f16 weights for this layer |
-
-**Layer Order:** Sequential (Layer 1, Layer 2, Layer 3, ...)
-
----
-
-## Weight Data (variable size)
-
-Starts at offset:
-- **Version 2:** `20 + (num_layers × 20)`
-- **Version 1:** `16 + (num_layers × 20)`
-
-**Format:** Packed f16 pairs stored as u32
-**Packing:** `u32 = (f16_hi << 16) | f16_lo`
-**Storage:** Sequential by layer, then by output channel, input channel, spatial position
-
-**Weight Indexing:**
-```
-weight_idx = output_ch × (in_channels × kernel_size²) +
-             input_ch × kernel_size² +
-             (ky × kernel_size + kx)
-```
-
-Where:
-- `output_ch` ∈ [0, out_channels)
-- `input_ch` ∈ [0, in_channels)
-- `ky`, `kx` ∈ [0, kernel_size)
-
-**Unpacking f16 from u32:**
-```c
-uint32_t packed = weights_buffer[weight_idx / 2];
-uint16_t f16_bits = (weight_idx % 2 == 0) ? (packed & 0xFFFF) : (packed >> 16);
-```
-
----
-
-## Example: 3-Layer Network (Version 2)
-
-**Configuration:**
-- Mip level: 0 (original resolution)
-- Layer 0: 12→4, kernel 3×3 (432 weights)
-- Layer 1: 12→4, kernel 3×3 (432 weights)
-- Layer 2: 12→4, kernel 3×3 (432 weights)
-
-**File Layout:**
-```
-Offset  Size  Content
-------  ----  -------
-0x00    20    Header (magic, version=2, layers=3, weights=1296, mip_level=0)
-0x14    20    Layer 0 info (kernel=3, in=12, out=4, offset=0, count=432)
-0x28    20    Layer 1 info (kernel=3, in=12, out=4, offset=432, count=432)
-0x3C    20    Layer 2 info (kernel=3, in=12, out=4, offset=864, count=432)
-0x50    2592  Weight data (1296 f16 weights packed into 648 u32 words)
-
----
-Total: 2672 bytes (~2.6 KB)
-```
-
----
-
-## Static Features
-
-Not stored in .bin file (computed at runtime):
-
-**8D Input Features:**
-1. **p0** - Parametric feature 0 (from mip level)
-2. **p1** - Parametric feature 1 (from mip level)
-3. **p2** - Parametric feature 2 (from mip level)
-4. **p3** - Parametric feature 3 (depth or from mip level)
-5. **UV_X** - Normalized x coordinate [0,1]
-6. **UV_Y** - Normalized y coordinate [0,1]
-7. **sin(20 × UV_Y)** - Spatial frequency encoding (vertical, frequency=20)
-8. **1.0** - Bias term
-
-**Mip Level Usage (p0-p3):**
-- `mip_level=0`: RGB from original resolution (mip 0)
-- `mip_level=1`: RGB from half resolution (mip 1), upsampled
-- `mip_level=2`: RGB from quarter resolution (mip 2), upsampled
-- `mip_level=3`: RGB from eighth resolution (mip 3), upsampled
-
-**Layer 0** receives input RGBD (4D) + static features (8D) = 12D input → 4D output.
-**Layer 1+** receive previous layer output (4D) + static features (8D) = 12D input → 4D output.
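Reviewer note, not part of the removed file: the arithmetic behind the 3-layer example can be checked with a minimal C sketch. The helper names `layer_weight_count`, `weight_index`, and `expected_file_size` are hypothetical and do not come from the repository; the constants (16- or 20-byte header, 20-byte layer records, 2 bytes per f16 weight) are taken from the spec text, and the sketch assumes the weight section holds exactly `total_weights × 2` bytes, as the spec's own size check does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of f16 weights in one layer. */
static uint32_t layer_weight_count(uint32_t kernel, uint32_t in_ch, uint32_t out_ch) {
    return out_ch * in_ch * kernel * kernel;
}

/* Hypothetical helper: flat index of one weight inside its layer,
 * following the "Weight Indexing" formula in the spec. */
static uint32_t weight_index(uint32_t kernel, uint32_t in_ch,
                             uint32_t out_c, uint32_t in_c,
                             uint32_t ky, uint32_t kx) {
    assert(in_c < in_ch && ky < kernel && kx < kernel);
    return out_c * (in_ch * kernel * kernel) + in_c * (kernel * kernel) +
           (ky * kernel + kx);
}

/* Hypothetical helper: total file size in bytes for a given header version. */
static uint32_t expected_file_size(uint32_t version, uint32_t num_layers,
                                   uint32_t total_weights) {
    uint32_t header_size = (version == 1) ? 16u : 20u;
    return header_size + num_layers * 20u + total_weights * 2u;
}
```

For the example network, `layer_weight_count(3, 12, 4)` gives 432 and `expected_file_size(2, 3, 1296)` gives 2672, which matches the layout table.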
-
----
-
-## Validation
-
-**Magic Check:**
-```c
-uint32_t magic;
-fread(&magic, 4, 1, fp);
-if (magic != 0x324E4E43) { error("Invalid CNN v2 file"); }
-```
-
-**Version Check:**
-```c
-uint32_t version;
-fread(&version, 4, 1, fp);
-if (version != 1 && version != 2) { error("Unsupported version"); }
-uint32_t header_size = (version == 1) ? 16 : 20;
-```
-
-**Size Check:**
-```c
-expected_size = header_size + (num_layers * 20) + (total_weights * 2);
-if (file_size != expected_size) { error("Size mismatch"); }
-```
-
-**Weight Offset Sanity:**
-```c
-// Each layer's offset should match cumulative count
-uint32_t cumulative = 0;
-for (int i = 0; i < num_layers; i++) {
-    if (layers[i].weight_offset != cumulative) { error("Invalid offset"); }
-    cumulative += layers[i].weight_count;
-}
-if (cumulative != total_weights) { error("Total mismatch"); }
-```
-
----
-
-## Future Extensions
-
-**TODO: Flexible Feature Layout**
-
-Current limitation: Feature vector layout is hardcoded as `[p0, p1, p2, p3, uv_x, uv_y, sin10_x, bias]`.
-
-Proposed enhancement for version 3:
-- Add feature descriptor section to header
-- Specify feature count, types, and ordering
-- Support arbitrary 7D feature combinations (e.g., `[R, G, B, dx, dy, uv_x, bias]`)
-- Allow runtime shader generation based on descriptor
-- Enable experimentation without recompiling shaders
-
-Example descriptor format:
-```
-struct FeatureDescriptor {
-    u32 feature_count;       // Number of features (typically 7-8)
-    u32 feature_types[8];    // Type enum per feature
-    u32 feature_sources[8];  // Source enum (mip0, mip1, gradient, etc.)
-    u32 reserved[8];         // Future use
-}
-```
-
-Benefits:
-- Training can experiment with different feature combinations
-- No shader recompilation needed
-- Single binary format supports multiple architectures
-- Easier A/B testing of feature effectiveness
-
----
-
-## Related Files
-
-- `training/export_cnn_v2_weights.py` - Binary export tool
-- `src/effects/cnn_v2_effect.cc` - C++ loader
-- `tools/cnn_v2_test/index.html` - WebGPU validator
-- `doc/CNN_V2.md` - Architecture design
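Reviewer note, not part of the removed file: the header rules and the f16 unpacking rule described in the deleted spec could be combined roughly as follows. This is a minimal sketch that parses from an in-memory byte buffer. It is not the loader in `src/effects/cnn_v2_effect.cc`, whose code is not shown in this diff. The names `Cnn2Header`, `read_u32_le`, `cnn2_parse_header`, and `cnn2_f16_bits` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes 'C','N','N','2' read as a little-endian u32. */
#define CNN2_MAGIC 0x324E4E43u

/* Hypothetical in-memory view of the header fields from the spec. */
typedef struct {
    uint32_t version;
    uint32_t num_layers;
    uint32_t total_weights;
    uint32_t mip_level;    /* forced to 0 for version 1 files */
    uint32_t header_size;  /* 16 for version 1, 20 for version 2 */
} Cnn2Header;

/* Read a little-endian u32 regardless of host byte order. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 on a malformed or unsupported header. */
static int cnn2_parse_header(const uint8_t *buf, size_t len, Cnn2Header *out) {
    assert(buf != NULL && out != NULL);
    if (len < 16 || read_u32_le(buf) != CNN2_MAGIC) return -1;
    out->version = read_u32_le(buf + 4);
    if (out->version != 1 && out->version != 2) return -1;
    out->header_size = (out->version == 1) ? 16u : 20u;
    if (len < out->header_size) return -1;
    out->num_layers = read_u32_le(buf + 8);
    out->total_weights = read_u32_le(buf + 12);
    out->mip_level = (out->version == 2) ? read_u32_le(buf + 16) : 0u;
    return 0;
}

/* Raw f16 bits of one weight: even indices sit in the low half-word. */
static uint16_t cnn2_f16_bits(const uint32_t *packed, uint32_t weight_idx) {
    uint32_t word = packed[weight_idx / 2];
    return (uint16_t)((weight_idx % 2 == 0) ? (word & 0xFFFFu) : (word >> 16));
}
```

Reading fields byte by byte avoids depending on the host's endianness, which a plain `fread` into a `uint32_t` would do. The layer records would then start at `header_size`, and the weight data at `header_size + num_layers * 20`.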
