Diffstat (limited to 'cnn_v2/docs/CNN_V2_BINARY_FORMAT.md')
-rw-r--r-- cnn_v2/docs/CNN_V2_BINARY_FORMAT.md | 235
1 files changed, 235 insertions, 0 deletions
diff --git a/cnn_v2/docs/CNN_V2_BINARY_FORMAT.md b/cnn_v2/docs/CNN_V2_BINARY_FORMAT.md
new file mode 100644
index 0000000..59c859d
--- /dev/null
+++ b/cnn_v2/docs/CNN_V2_BINARY_FORMAT.md
@@ -0,0 +1,235 @@
+# CNN v2 Binary Weight Format Specification
+
+Binary format for storing trained CNN v2 weights with static feature architecture.
+
+**File Extension:** `.bin`
+**Byte Order:** Little-endian
+**Version:** 2 (adds a `mip_level` header field for the parametric features)
+**Backward Compatible:** Version 1 files are supported and read as `mip_level=0`
+
+---
+
+## File Structure
+
+**Version 2 (current):**
+```
+┌─────────────────────┐
+│ Header (20 bytes) │
+├─────────────────────┤
+│ Layer Info │
+│ (20 bytes × N) │
+├─────────────────────┤
+│ Weight Data │
+│ (variable size) │
+└─────────────────────┘
+```
+
+**Version 1 (legacy):**
+```
+┌─────────────────────┐
+│ Header (16 bytes) │
+├─────────────────────┤
+│ Layer Info │
+│ (20 bytes × N) │
+├─────────────────────┤
+│ Weight Data │
+│ (variable size) │
+└─────────────────────┘
+```
+
+---
+
+## Header
+
+**Version 2 (20 bytes):**
+
+| Offset | Type | Field | Description |
+|--------|------|----------------|--------------------------------------|
+| 0x00 | u32 | magic | Magic number: `0x32_4E_4E_43` ("CNN2") |
+| 0x04 | u32 | version | Format version (2 for current) |
+| 0x08 | u32 | num_layers | Number of CNN layers (excludes static features) |
+| 0x0C | u32 | total_weights | Total f16 weight count across all layers |
+| 0x10 | u32 | mip_level | Mip level for p0-p3 features (0=original, 1=half, 2=quarter, 3=eighth) |
+
+**Version 1 (16 bytes) - Legacy:**
+
+| Offset | Type | Field | Description |
+|--------|------|----------------|--------------------------------------|
+| 0x00 | u32 | magic | Magic number: `0x32_4E_4E_43` ("CNN2") |
+| 0x04 | u32 | version | Format version (1) |
+| 0x08 | u32 | num_layers | Number of CNN layers |
+| 0x0C | u32 | total_weights | Total f16 weight count |
+
+**Note:** Loaders should check the `version` field and handle both formats. Version 1 files are treated as `mip_level=0`.
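The two header layouts above could be parsed along the following lines. This is a minimal sketch, not the project's loader: `CnnV2Header`, `read_u32_le`, and `cnn_v2_parse_header` are hypothetical names introduced here, and the file is assumed to be already loaded into a byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory header; field names follow the tables above. */
typedef struct {
    uint32_t version;
    uint32_t num_layers;
    uint32_t total_weights;
    uint32_t mip_level;    /* forced to 0 for version 1 files */
    uint32_t header_size;  /* 16 (v1) or 20 (v2) bytes */
} CnnV2Header;

/* Read a little-endian u32 byte by byte, independent of host endianness. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer does not hold a valid v1/v2 header. */
static int cnn_v2_parse_header(const uint8_t *buf, size_t len, CnnV2Header *out) {
    assert(buf != NULL && out != NULL);
    if (len < 16) return -1;
    if (read_u32_le(buf) != 0x324E4E43u) return -1;  /* bytes "CNN2" on disk */
    uint32_t version = read_u32_le(buf + 4);
    if (version != 1 && version != 2) return -1;
    out->version = version;
    out->num_layers = read_u32_le(buf + 8);
    out->total_weights = read_u32_le(buf + 12);
    if (version == 2) {
        if (len < 20) return -1;                     /* v2 needs the extra field */
        out->mip_level = read_u32_le(buf + 16);
        out->header_size = 20;
    } else {
        out->mip_level = 0;                          /* legacy files: original resolution */
        out->header_size = 16;
    }
    return 0;
}
```

Reading byte by byte avoids depending on the host being little-endian, which a plain `fread` into a `uint32_t` would.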
+
+---
+
+## Layer Info (20 bytes per layer)
+
+Repeated `num_layers` times:
+- **Version 2:** Starting at offset 0x14 (20 bytes)
+- **Version 1:** Starting at offset 0x10 (16 bytes)
+
+| Offset | Type | Field | Description |
+|-------------|------|----------------|--------------------------------------|
+| 0x00 | u32 | kernel_size | Convolution kernel dimension (3, 5, 7, etc.) |
+| 0x04 | u32 | in_channels | Input channel count (includes the 8 static features, e.g. 4 + 8 = 12) |
+| 0x08 | u32 | out_channels | Output channel count (max 8) |
+| 0x0C | u32 | weight_offset | Weight array start index (f16 units, relative to weight data section) |
+| 0x10 | u32 | weight_count | Number of f16 weights for this layer |
+
+**Layer Order:** Sequential (Layer 0, Layer 1, Layer 2, ...)
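As a sketch of how one entry of the layer table might be read, assuming the whole file is in memory: `CnnV2LayerInfo` and `cnn_v2_parse_layer` are hypothetical names, and `header_size` is the 16 or 20 bytes determined from the version field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one 20-byte layer info entry. */
typedef struct {
    uint32_t kernel_size;
    uint32_t in_channels;
    uint32_t out_channels;
    uint32_t weight_offset;  /* in f16 units, relative to the weight data section */
    uint32_t weight_count;   /* number of f16 weights */
} CnnV2LayerInfo;

/* Read a little-endian u32 byte by byte. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse layer entry `index`; returns 0 on success, -1 if the buffer is too short. */
static int cnn_v2_parse_layer(const uint8_t *file, size_t len, uint32_t header_size,
                              uint32_t index, CnnV2LayerInfo *out) {
    assert(file != NULL && out != NULL);
    size_t start = (size_t)header_size + (size_t)index * 20u;
    if (len < start + 20u) return -1;
    const uint8_t *p = file + start;
    out->kernel_size   = read_u32_le(p + 0x00);
    out->in_channels   = read_u32_le(p + 0x04);
    out->out_channels  = read_u32_le(p + 0x08);
    out->weight_offset = read_u32_le(p + 0x0C);
    out->weight_count  = read_u32_le(p + 0x10);
    return 0;
}
```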
+
+---
+
+## Weight Data (variable size)
+
+Starts at offset:
+- **Version 2:** `20 + (num_layers × 20)`
+- **Version 1:** `16 + (num_layers × 20)`
+
+**Format:** Packed f16 pairs stored as u32
+**Packing:** `u32 = (f16_hi << 16) | f16_lo`
+**Storage:** Sequential by layer, then by output channel, input channel, spatial position
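On the writer side, the packing rule might look like the sketch below. `cnn_v2_pack_f16` is a hypothetical helper, not the export tool's code. The spec does not say how an odd weight count is padded, so zero-filling the last high half is an assumption made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack `count` f16 bit patterns into u32 words: even indices go to the low
 * 16 bits, odd indices to the high 16 bits. `out` must have room for
 * (count + 1) / 2 words. ASSUMPTION: an odd count leaves the last high half
 * zero. Returns the number of u32 words written. */
static size_t cnn_v2_pack_f16(const uint16_t *f16, size_t count, uint32_t *out) {
    assert(f16 != NULL && out != NULL);
    size_t words = (count + 1) / 2;
    for (size_t i = 0; i < words; i++) {
        uint32_t lo = f16[2 * i];
        uint32_t hi = (2 * i + 1 < count) ? f16[2 * i + 1] : 0u;
        out[i] = (hi << 16) | lo;  /* u32 = (f16_hi << 16) | f16_lo */
    }
    return words;
}
```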
+
+**Weight Indexing:**
+```
+weight_idx = output_ch × (in_channels × kernel_size²) +
+ input_ch × kernel_size² +
+ (ky × kernel_size + kx)
+```
+
+Where:
+- `output_ch` ∈ [0, out_channels)
+- `input_ch` ∈ [0, in_channels)
+- `ky`, `kx` ∈ [0, kernel_size)
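The indexing formula translates directly into C. The function name `cnn_v2_weight_index` is hypothetical. The result is relative to the start of the layer's weights, so a reader would still add the layer's `weight_offset` to address the weight section.

```c
#include <assert.h>
#include <stdint.h>

/* Index of one weight within its layer, ordered by output channel,
 * then input channel, then spatial position (row-major ky, kx). */
static uint32_t cnn_v2_weight_index(uint32_t output_ch, uint32_t input_ch,
                                    uint32_t ky, uint32_t kx,
                                    uint32_t in_channels, uint32_t kernel_size) {
    assert(input_ch < in_channels && ky < kernel_size && kx < kernel_size);
    uint32_t k2 = kernel_size * kernel_size;
    return output_ch * (in_channels * k2) + input_ch * k2 + (ky * kernel_size + kx);
}
```

For a 12→4 layer with a 3×3 kernel, the last weight (output 3, input 11, ky = kx = 2) lands at 3 × 108 + 11 × 9 + 8 = 431, matching the 432-weight layer size.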
+
+**Unpacking f16 from u32:**
+```c
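+// weight_idx is relative to the layer; add the layer's weight_offset to index the weight section
+uint32_t idx = layers[i].weight_offset + weight_idx;
+uint32_t packed = weights_buffer[idx / 2];
+uint16_t f16_bits = (idx % 2 == 0) ? (packed & 0xFFFF) : (packed >> 16);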
+```
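On a CPU without native half-float support, the extracted bits still need converting to `float`. The sketch below pairs the unpacking step with a standard IEEE 754 binary16 to binary32 widening. The names `cnn_v2_unpack_f16` and `f16_bits_to_f32` are hypothetical, and the weight section is assumed to be already loaded as host-order `uint32_t` words.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Extract the f16 bit pattern at index `idx` (f16 units) from the packed words. */
static uint16_t cnn_v2_unpack_f16(const uint32_t *weights, uint32_t idx) {
    assert(weights != NULL);
    uint32_t packed = weights[idx / 2];
    return (uint16_t)((idx % 2 == 0) ? (packed & 0xFFFFu) : (packed >> 16));
}

/* Widen an IEEE 754 half-precision bit pattern to a float. */
static float f16_bits_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                                  /* signed zero */
        } else {
            int shift = 0;                                /* subnormal: renormalize */
            while ((mant & 0x400u) == 0) { mant <<= 1; shift++; }
            mant &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 + 1 - shift) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);         /* infinity or NaN */
    } else {
        bits = sign | ((exp + (127 - 15)) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);                          /* bit-cast without aliasing issues */
    return f;
}
```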
+
+---
+
+## Example: 3-Layer Network (Version 2)
+
+**Configuration:**
+- Mip level: 0 (original resolution)
+- Layer 0: 12→4, kernel 3×3 (432 weights)
+- Layer 1: 12→4, kernel 3×3 (432 weights)
+- Layer 2: 12→4, kernel 3×3 (432 weights)
+
+**File Layout:**
+```
+Offset Size Content
+------ ---- -------
+0x00 20 Header (magic, version=2, layers=3, weights=1296, mip_level=0)
+0x14 20 Layer 0 info (kernel=3, in=12, out=4, offset=0, count=432)
+0x28 20 Layer 1 info (kernel=3, in=12, out=4, offset=432, count=432)
+0x3C 20 Layer 2 info (kernel=3, in=12, out=4, offset=864, count=432)
+0x50 2592 Weight data (1296 f16 weights packed into 648 u32 words)
+ ----
+Total: 2672 bytes (~2.6 KB)
+```
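The arithmetic behind this layout can be checked with two small helpers. The names are hypothetical. The size formula counts 2 bytes per f16 weight, as the validation section does, and therefore assumes an even `total_weights`.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the weight data section: header plus the layer table. */
static uint64_t cnn_v2_weight_data_offset(uint32_t version, uint32_t num_layers) {
    assert(version == 1 || version == 2);
    uint64_t header_size = (version == 1) ? 16u : 20u;
    return header_size + (uint64_t)num_layers * 20u;
}

/* Expected total file size in bytes (2 bytes per f16 weight). */
static uint64_t cnn_v2_expected_size(uint32_t version, uint32_t num_layers,
                                     uint32_t total_weights) {
    return cnn_v2_weight_data_offset(version, num_layers) +
           (uint64_t)total_weights * 2u;
}
```

For the example above: 20 + 3 × 20 = 80 (0x50) bytes before the weights, plus 1296 × 2 = 2592 bytes of weight data, giving 2672 bytes.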
+
+---
+
+## Static Features
+
+Not stored in .bin file (computed at runtime):
+
+**8D Input Features:**
+1. **p0** - Parametric feature 0 (from mip level)
+2. **p1** - Parametric feature 1 (from mip level)
+3. **p2** - Parametric feature 2 (from mip level)
+4. **p3** - Parametric feature 3 (depth or from mip level)
+5. **UV_X** - Normalized x coordinate [0,1]
+6. **UV_Y** - Normalized y coordinate [0,1]
+7. **sin(20 × UV_Y)** - Spatial frequency encoding (vertical, frequency=20)
+8. **1.0** - Bias term
+
+**Mip Level Usage (p0-p3):**
+- `mip_level=0`: RGB from original resolution (mip 0)
+- `mip_level=1`: RGB from half resolution (mip 1), upsampled
+- `mip_level=2`: RGB from quarter resolution (mip 2), upsampled
+- `mip_level=3`: RGB from eighth resolution (mip 3), upsampled
+
+**Layer 0** receives input RGBD (4D) + static features (8D) = 12D input → 4D output.
+**Layer 1+** receive previous layer output (4D) + static features (8D) = 12D input → 4D output.
+
+---
+
+## Validation
+
+**Magic Check:**
+```c
+uint32_t magic;
+fread(&magic, 4, 1, fp);
+if (magic != 0x324E4E43) { error("Invalid CNN v2 file"); }  // bytes "CNN2" on disk
+```
+
+**Version Check:**
+```c
+uint32_t version;
+fread(&version, 4, 1, fp);
+if (version != 1 && version != 2) { error("Unsupported version"); }
+uint32_t header_size = (version == 1) ? 16 : 20;
+```
+
+**Size Check:**
+```c
+// 20 bytes per layer info entry, 2 bytes per f16 weight
+uint64_t expected_size = header_size + ((uint64_t)num_layers * 20) + ((uint64_t)total_weights * 2);
+if (file_size != expected_size) { error("Size mismatch"); }
+```
+
+**Weight Offset Sanity:**
+```c
+// Each layer's offset should match cumulative count
+uint32_t cumulative = 0;
+for (int i = 0; i < num_layers; i++) {
+ if (layers[i].weight_offset != cumulative) { error("Invalid offset"); }
+ cumulative += layers[i].weight_count;
+}
+if (cumulative != total_weights) { error("Total mismatch"); }
+```
+
+---
+
+## Future Extensions
+
+**TODO: Flexible Feature Layout**
+
+Current limitation: Feature vector layout is hardcoded as `[p0, p1, p2, p3, uv_x, uv_y, sin(20 × uv_y), bias]`.
+
+Proposed enhancement for version 3:
+- Add feature descriptor section to header
+- Specify feature count, types, and ordering
+- Support arbitrary 7D feature combinations (e.g., `[R, G, B, dx, dy, uv_x, bias]`)
+- Allow runtime shader generation based on descriptor
+- Enable experimentation without recompiling shaders
+
+Example descriptor format:
+```
+struct FeatureDescriptor {
+ u32 feature_count; // Number of features (typically 7-8)
+ u32 feature_types[8]; // Type enum per feature
+ u32 feature_sources[8]; // Source enum (mip0, mip1, gradient, etc.)
+ u32 reserved[8]; // Future use
+}
+```
+
+Benefits:
+- Training can experiment with different feature combinations
+- No shader recompilation needed
+- Single binary format supports multiple architectures
+- Easier A/B testing of feature effectiveness
+
+---
+
+## Related Files
+
+- `training/export_cnn_v2_weights.py` - Binary export tool
+- `src/effects/cnn_v2_effect.cc` - C++ loader
+- `tools/cnn_v2_test/index.html` - WebGPU validator
+- `doc/CNN_V2.md` - Architecture design