# CNN v2 Binary Weight Format Specification

Binary format for storing trained CNN v2 weights with static feature architecture.

**File Extension:** `.bin`
**Byte Order:** Little-endian
**Version:** 2.0 (supports mip-level for parametric features)
**Backward Compatible:** Version 1.0 files supported (mip_level=0)

---

## File Structure

**Version 2 (current):**
```
┌─────────────────────┐
│  Header (20 bytes)  │
├─────────────────────┤
│  Layer Info         │
│  (20 bytes × N)     │
├─────────────────────┤
│  Weight Data        │
│  (variable size)    │
└─────────────────────┘
```

**Version 1 (legacy):**
```
┌─────────────────────┐
│  Header (16 bytes)  │
├─────────────────────┤
│  Layer Info         │
│  (20 bytes × N)     │
├─────────────────────┤
│  Weight Data        │
│  (variable size)    │
└─────────────────────┘
```

---

## Header

**Version 2 (20 bytes):**

| Offset | Type | Field          | Description                          |
|--------|------|----------------|--------------------------------------|
| 0x00   | u32  | magic          | Magic number: `0x324E4E43` (bytes `43 4E 4E 32` = "CNN2" on disk) |
| 0x04   | u32  | version        | Format version (2 for current)       |
| 0x08   | u32  | num_layers     | Number of CNN layers (excludes static features) |
| 0x0C   | u32  | total_weights  | Total f16 weight count across all layers |
| 0x10   | u32  | mip_level      | Mip level for p0-p3 features (0=original, 1=half, 2=quarter, 3=eighth) |

**Version 1 (16 bytes) - Legacy:**

| Offset | Type | Field          | Description                          |
|--------|------|----------------|--------------------------------------|
| 0x00   | u32  | magic          | Magic number: `0x324E4E43` (bytes `43 4E 4E 32` = "CNN2" on disk) |
| 0x04   | u32  | version        | Format version (1)                   |
| 0x08   | u32  | num_layers     | Number of CNN layers                 |
| 0x0C   | u32  | total_weights  | Total f16 weight count               |

**Note:** Loaders should check the version field and handle both formats. Version 1 files are treated as mip_level=0.
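A minimal sketch of a version-aware header parser, assuming the whole file has already been read into a byte buffer. `CnnV2Header`, `read_u32_le` and `parse_header` are hypothetical names for illustration, not the API of the C++ loader. Decoding byte by byte keeps the parser correct on big-endian hosts as well.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical header struct; field names follow the tables above.
typedef struct {
    uint32_t magic, version, num_layers, total_weights, mip_level;
    uint32_t header_size;  // 16 for version 1, 20 for version 2
} CnnV2Header;

// Decode a little-endian u32 independently of host byte order.
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns 0 on success, -1 on truncated input, bad magic, or unknown version.
static int parse_header(const uint8_t *buf, size_t len, CnnV2Header *h) {
    if (len < 16) return -1;
    h->magic         = read_u32_le(buf + 0x00);
    h->version       = read_u32_le(buf + 0x04);
    h->num_layers    = read_u32_le(buf + 0x08);
    h->total_weights = read_u32_le(buf + 0x0C);
    if (h->magic != 0x324E4E43u) return -1;  // bytes "CNN2"
    if (h->version == 1) {
        h->mip_level = 0;  // version 1 has no mip_level field
        h->header_size = 16;
    } else if (h->version == 2) {
        if (len < 20) return -1;
        h->mip_level = read_u32_le(buf + 0x10);
        h->header_size = 20;
    } else {
        return -1;
    }
    return 0;
}
```

The layer records then start at `header_size`, so the rest of a loader can ignore the version once the header is parsed.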

---

## Layer Info (20 bytes per layer)

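Repeated `num_layers` times; offsets in the table below are relative to the start of each record. The first record begins:
- **Version 2:** at offset 0x14 (right after the 20-byte header)
- **Version 1:** at offset 0x10 (right after the 16-byte header)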

| Offset      | Type | Field          | Description                          |
|-------------|------|----------------|--------------------------------------|
| 0x00        | u32  | kernel_size    | Convolution kernel dimension (3, 5, 7, etc.) |
| 0x04        | u32  | in_channels    | Input channel count (includes the 8 static features, for every layer) |
| 0x08        | u32  | out_channels   | Output channel count (max 8)         |
| 0x0C        | u32  | weight_offset  | Weight array start index (f16 units, relative to weight data section) |
| 0x10        | u32  | weight_count   | Number of f16 weights for this layer |

**Layer Order:** Sequential (Layer 0, Layer 1, Layer 2, ...)
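A sketch of reading record `i` from an in-memory file, assuming the header has already been parsed so `header_size` (16 or 20) is known. `CnnV2LayerInfo` and `parse_layer_info` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical struct mirroring one 20-byte layer record.
typedef struct {
    uint32_t kernel_size, in_channels, out_channels, weight_offset, weight_count;
} CnnV2LayerInfo;

// Decode a little-endian u32 independently of host byte order.
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Parse layer record i. Returns 0 on success, -1 if it runs past the buffer.
static int parse_layer_info(const uint8_t *buf, size_t len, uint32_t header_size,
                            uint32_t i, CnnV2LayerInfo *out) {
    size_t base = (size_t)header_size + (size_t)i * 20;  // record start
    if (base + 20 > len) return -1;
    out->kernel_size   = read_u32_le(buf + base + 0x00);
    out->in_channels   = read_u32_le(buf + base + 0x04);
    out->out_channels  = read_u32_le(buf + base + 0x08);
    out->weight_offset = read_u32_le(buf + base + 0x0C);
    out->weight_count  = read_u32_le(buf + base + 0x10);
    return 0;
}
```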

---

## Weight Data (variable size)

Starts at offset:
- **Version 2:** `20 + (num_layers × 20)`
- **Version 1:** `16 + (num_layers × 20)`

**Format:** Packed f16 pairs stored as u32
**Packing:** `u32 = (f16_hi << 16) | f16_lo`, where `f16_lo` is the weight at the even index and `f16_hi` the weight at the following odd index
**Storage:** Sequential by layer, then by output channel, input channel, spatial position
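The packing rule can be written as a pair of helpers; `pack_f16_pair` and `unpack_f16` are hypothetical names, and the arguments are raw f16 bit patterns rather than decoded values.

```c
#include <stdint.h>

// Even-index weight goes in the low 16 bits, the next (odd) one in the high 16 bits.
static uint32_t pack_f16_pair(uint16_t f16_lo, uint16_t f16_hi) {
    return ((uint32_t)f16_hi << 16) | f16_lo;
}

// Inverse: k = 0 returns the low half, k = 1 the high half.
static uint16_t unpack_f16(uint32_t packed, unsigned k) {
    return (uint16_t)(k == 0 ? (packed & 0xFFFFu) : (packed >> 16));
}
```

Because each word is stored little-endian, the low half (even index) precedes the high half in the file, so the weight section is byte-identical to a plain little-endian f16 array.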

**Weight Indexing:**
```
weight_idx = output_ch × (in_channels × kernel_size²) +
             input_ch × kernel_size² +
             (ky × kernel_size + kx)
```

Where:
- `output_ch` ∈ [0, out_channels)
- `input_ch` ∈ [0, in_channels)
- `ky`, `kx` ∈ [0, kernel_size)
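The indexing formula transcribed into a helper; `weight_index` is a hypothetical name. The result is relative to the layer, so it must still be added to that layer's `weight_offset` before indexing the packed weight section.

```c
#include <stdint.h>

// Per-layer f16 index: output channel, then input channel, then row-major (ky, kx).
static uint32_t weight_index(uint32_t in_channels, uint32_t kernel_size,
                             uint32_t output_ch, uint32_t input_ch,
                             uint32_t ky, uint32_t kx) {
    uint32_t k2 = kernel_size * kernel_size;
    return output_ch * (in_channels * k2) + input_ch * k2 + (ky * kernel_size + kx);
}
```

For a 12→4, 3×3 layer the last weight (`output_ch=3`, `input_ch=11`, `ky=kx=2`) lands at index 431, one below its `weight_count` of 432.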

**Unpacking f16 from u32:**
```c
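// weight_idx is relative to the layer; weights_buffer is the whole weight data section
uint32_t global_idx = layer.weight_offset + weight_idx;
uint32_t packed = weights_buffer[global_idx / 2];
uint16_t f16_bits = (global_idx % 2 == 0) ? (packed & 0xFFFF) : (packed >> 16);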
```
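Shaders can decode the pairs directly (WGSL's `unpack2x16float` returns the low half as component 0), but a CPU-side loader or test harness needs an explicit IEEE 754 binary16 → float conversion. A minimal sketch, assuming the weight section has already been loaded into a host-order `uint32_t` array (true when `fread` is used on a little-endian host); `f16_to_f32` and `load_weight` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

// Convert IEEE 754 binary16 bits to a float by building the binary32 bit pattern.
static float f16_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;  // signed zero
        } else {
            // Subnormal: shift until the implicit bit appears, adjusting the exponent.
            exp = 127 - 15 + 1;
            while ((mant & 0x400u) == 0) { mant <<= 1; exp--; }
            mant &= 0x3FFu;
            bits = sign | (exp << 23) | (mant << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);  // infinity or NaN
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);  // rebias exponent
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

// Fetch weight weight_idx of a layer whose data starts at weight_offset (f16 units).
static float load_weight(const uint32_t *weights_buffer, uint32_t weight_offset,
                         uint32_t weight_idx) {
    uint32_t g = weight_offset + weight_idx;
    uint32_t packed = weights_buffer[g / 2];
    uint16_t bits = (g % 2 == 0) ? (uint16_t)(packed & 0xFFFFu) : (uint16_t)(packed >> 16);
    return f16_to_f32(bits);
}
```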

---

## Example: 3-Layer Network (Version 2)

**Configuration:**
- Mip level: 0 (original resolution)
- Layer 0: 12→4, kernel 3×3 (432 weights)
- Layer 1: 12→4, kernel 3×3 (432 weights)
- Layer 2: 12→4, kernel 3×3 (432 weights)

**File Layout:**
```
Offset   Size   Content
------   ----   -------
0x00     20     Header (magic, version=2, layers=3, weights=1296, mip_level=0)
0x14     20     Layer 0 info (kernel=3, in=12, out=4, offset=0, count=432)
0x28     20     Layer 1 info (kernel=3, in=12, out=4, offset=432, count=432)
0x3C     20     Layer 2 info (kernel=3, in=12, out=4, offset=864, count=432)
0x50     2592   Weight data (1296 f16 weights packed into 648 u32)
         ----
Total:   2672 bytes (~2.6 KB)
```
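The layout arithmetic of this example can be reproduced with a few helpers. The names are hypothetical, and the size formula assumes `total_weights` is even so the data fills whole u32 words, as it does here.

```c
#include <stdint.h>

// Header size per version: 16 bytes (v1) or 20 bytes (v2).
static uint32_t header_size_for(uint32_t version) {
    return (version == 1) ? 16u : 20u;
}

// Weights in one layer: out_channels × in_channels × kernel_size².
static uint32_t layer_weight_count(uint32_t kernel_size, uint32_t in_ch, uint32_t out_ch) {
    return out_ch * in_ch * kernel_size * kernel_size;
}

// Byte offset of the weight data section (header + 20-byte layer records).
static uint32_t weight_data_offset(uint32_t version, uint32_t num_layers) {
    return header_size_for(version) + num_layers * 20u;
}

// Total file size at 2 bytes per f16 weight (assumes an even total_weights).
static uint32_t expected_file_size(uint32_t version, uint32_t num_layers,
                                   uint32_t total_weights) {
    return weight_data_offset(version, num_layers) + total_weights * 2u;
}
```

With three 12→4, 3×3 layers this gives 432 weights per layer, weight data at 0x50, and 2672 bytes in total (2668 for the same network saved as version 1).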

---

## Static Features

Not stored in the .bin file (computed at runtime):

**8D Input Features:**
1. **p0** - Parametric feature 0 (from mip level)
2. **p1** - Parametric feature 1 (from mip level)
3. **p2** - Parametric feature 2 (from mip level)
4. **p3** - Parametric feature 3 (depth or from mip level)
5. **UV_X** - Normalized x coordinate [0,1]
6. **UV_Y** - Normalized y coordinate [0,1]
7. **sin(10 × UV_X)** - Spatial frequency encoding
8. **1.0** - Bias term

**Mip Level Usage (p0-p3):**
- `mip_level=0`: RGB from original resolution (mip 0)
- `mip_level=1`: RGB from half resolution (mip 1), upsampled
- `mip_level=2`: RGB from quarter resolution (mip 2), upsampled
- `mip_level=3`: RGB from eighth resolution (mip 3), upsampled
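The source resolution for p0-p3 follows from `mip_level`. A sketch assuming the conventional mip chain, where each level halves the previous one, rounding down and never going below 1 texel (this is how WebGPU defines mip level extents); the spec itself does not say how odd sizes are rounded, and `mip_extent` is a hypothetical helper.

```c
#include <stdint.h>

// Width or height of a mip level: max(1, size >> level).
static uint32_t mip_extent(uint32_t size, uint32_t mip_level) {
    uint32_t s = size >> mip_level;
    return s > 0 ? s : 1;
}
```

For a 1920×1080 input, `mip_level=3` samples p0-p3 from a 240×135 level before upsampling.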

**Layer 0** receives input RGBD (4D) + static features (8D) = 12D input → 4D output.
**Layer 1+** receive previous layer output (4D) + static features (8D) = 12D input → 4D output.
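These rules imply a consistency check a loader could run on the layer records: layer 0 must take 4 (RGBD) + 8 channels, and each later layer must take the previous layer's output plus 8. A sketch under that reading; `check_channel_chain` and the two macros are hypothetical names.

```c
#include <stdint.h>

#define CNN_V2_STATIC_FEATURES 8u  // p0..p3, uv_x, uv_y, sin(10*uv_x), bias
#define CNN_V2_RGBD_CHANNELS   4u  // layer 0 input besides the static features

// Returns the index of the first layer whose channel counts break the chain, or -1.
static int check_channel_chain(uint32_t num_layers, const uint32_t *in_ch,
                               const uint32_t *out_ch) {
    for (uint32_t i = 0; i < num_layers; i++) {
        uint32_t prev = (i == 0) ? CNN_V2_RGBD_CHANNELS : out_ch[i - 1];
        if (in_ch[i] != prev + CNN_V2_STATIC_FEATURES) return (int)i;
        if (out_ch[i] > 8u) return (int)i;  // out_channels is capped at 8
    }
    return -1;
}
```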

---

## Validation

**Magic Check:**
```c
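uint32_t magic;
// fread copies raw bytes, so this comparison assumes a little-endian host
if (fread(&magic, 4, 1, fp) != 1) { error("Truncated file"); }
if (magic != 0x324E4E43) { error("Invalid CNN v2 file"); }  // bytes "CNN2"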
```

**Version Check:**
```c
uint32_t version;
if (fread(&version, 4, 1, fp) != 1) { error("Truncated file"); }
if (version != 1 && version != 2) { error("Unsupported version"); }
uint32_t header_size = (version == 1) ? 16 : 20;
```

**Size Check:**
```c
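// 20 bytes per layer record, 2 bytes per f16 weight; 64-bit math avoids overflow
uint64_t expected_size = header_size + (uint64_t)num_layers * 20 + (uint64_t)total_weights * 2;
if (file_size != expected_size) { error("Size mismatch"); }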
```

**Weight Offset Sanity:**
```c
// Each layer's offset should match cumulative count
uint32_t cumulative = 0;
for (int i = 0; i < num_layers; i++) {
    if (layers[i].weight_offset != cumulative) { error("Invalid offset"); }
    cumulative += layers[i].weight_count;
}
if (cumulative != total_weights) { error("Total mismatch"); }
```
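The checks above can be combined into one routine that validates an in-memory file. A sketch, assuming the whole file is in a byte buffer; `cnn_v2_validate` and its error codes are hypothetical. Two checks go beyond the list above and are assumptions: `mip_level` is limited to 0..3 (the values the header table defines), and each `weight_count` must equal `out_channels × in_channels × kernel_size²`.

```c
#include <stddef.h>
#include <stdint.h>

// Decode a little-endian u32 independently of host byte order.
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns 0 if buf[0..len) is well formed, else a negative code:
// -1 magic/truncated, -2 version, -3 size, -4 mip_level, -5 offset, -6 shape, -7 total.
static int cnn_v2_validate(const uint8_t *buf, size_t len) {
    if (len < 16 || read_u32_le(buf) != 0x324E4E43u) return -1;  // "CNN2"
    uint32_t version = read_u32_le(buf + 4);
    if (version != 1 && version != 2) return -2;
    size_t header_size = (version == 1) ? 16 : 20;
    uint32_t num_layers = read_u32_le(buf + 8);
    uint32_t total_weights = read_u32_le(buf + 12);

    // 64-bit arithmetic so corrupt counts cannot overflow the size check.
    uint64_t expected = (uint64_t)header_size + (uint64_t)num_layers * 20u +
                        (uint64_t)total_weights * 2u;
    if ((uint64_t)len != expected) return -3;  // also keeps every record in bounds

    // Assumption: only mip levels 0..3 are meaningful.
    if (version == 2 && read_u32_le(buf + 16) > 3) return -4;

    uint64_t cumulative = 0;
    for (uint32_t i = 0; i < num_layers; i++) {
        const uint8_t *rec = buf + header_size + (size_t)i * 20;
        uint32_t k = read_u32_le(rec + 0), in = read_u32_le(rec + 4);
        uint32_t out = read_u32_le(rec + 8), off = read_u32_le(rec + 12);
        uint32_t count = read_u32_le(rec + 16);
        if (off != cumulative) return -5;                    // offsets must be contiguous
        if ((uint64_t)out * in * k * k != count) return -6;  // assumed shape check
        cumulative += count;
    }
    return (cumulative == total_weights) ? 0 : -7;
}
```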

---

## Future Extensions

**TODO: Flexible Feature Layout**

Current limitation: Feature vector layout is hardcoded as `[p0, p1, p2, p3, uv_x, uv_y, sin10_x, bias]`.

Proposed enhancement for version 3:
- Add feature descriptor section to header
- Specify feature count, types, and ordering
- Support arbitrary feature combinations, not only the current 8D layout (e.g., a 7D `[R, G, B, dx, dy, uv_x, bias]`)
- Allow runtime shader generation based on descriptor
- Enable experimentation without recompiling shaders

Example descriptor format:
```
struct FeatureDescriptor {
  u32 feature_count;           // Number of features (typically 7-8)
  u32 feature_types[8];        // Type enum per feature
  u32 feature_sources[8];      // Source enum (mip0, mip1, gradient, etc.)
  u32 reserved[8];             // Future use
};
```

Benefits:
- Training can experiment with different feature combinations
- No shader recompilation needed
- Single binary format supports multiple architectures
- Easier A/B testing of feature effectiveness

---

## Related Files

- `training/export_cnn_v2_weights.py` - Binary export tool
- `src/gpu/effects/cnn_v2_effect.cc` - C++ loader
- `tools/cnn_v2_test/index.html` - WebGPU validator
- `doc/CNN_V2.md` - Architecture design