# CNN v2 Binary Weight Format Specification

Binary format for storing trained CNN v2 weights with static feature architecture.

**File Extension:** `.bin`
**Byte Order:** Little-endian
**Version:** 2.0 (supports mip-level for parametric features)
**Backward Compatible:** Version 1.0 files supported (mip_level=0)

---

## File Structure

**Version 2 (current):**
```
┌─────────────────────┐
│  Header (20 bytes)  │
├─────────────────────┤
│  Layer Info         │
│  (20 bytes × N)     │
├─────────────────────┤
│  Weight Data        │
│  (variable size)    │
└─────────────────────┘
```

**Version 1 (legacy):**
```
┌─────────────────────┐
│  Header (16 bytes)  │
├─────────────────────┤
│  Layer Info         │
│  (20 bytes × N)     │
├─────────────────────┤
│  Weight Data        │
│  (variable size)    │
└─────────────────────┘
```

---

## Header

**Version 2 (20 bytes):**

| Offset | Type | Field          | Description                          |
|--------|------|----------------|--------------------------------------|
| 0x00   | u32  | magic          | Magic number: `0x324E4E43` (bytes `43 4E 4E 32` on disk, ASCII "CNN2") |
| 0x04   | u32  | version        | Format version (2 for current)       |
| 0x08   | u32  | num_layers     | Number of CNN layers (excludes static features) |
| 0x0C   | u32  | total_weights  | Total f16 weight count across all layers |
| 0x10   | u32  | mip_level      | Mip level for p0-p3 features (0=original, 1=half, 2=quarter, 3=eighth) |

**Version 1 (16 bytes) - Legacy:**

| Offset | Type | Field          | Description                          |
|--------|------|----------------|--------------------------------------|
| 0x00   | u32  | magic          | Magic number: `0x324E4E43` (bytes `43 4E 4E 32` on disk, ASCII "CNN2") |
| 0x04   | u32  | version        | Format version (1)                   |
| 0x08   | u32  | num_layers     | Number of CNN layers                 |
| 0x0C   | u32  | total_weights  | Total f16 weight count               |

**Note:** Loaders should check the `version` field and handle both formats. Version 1 files have no `mip_level` field and are treated as `mip_level=0`.
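
A loader might handle both header versions along the following lines. This is a minimal sketch that parses from an in-memory byte buffer; `CnnV2Header`, `read_u32_le`, and `parse_header` are hypothetical names chosen for illustration and are not taken from the C++ loader.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the header; field names follow the tables above. */
typedef struct {
    uint32_t magic, version, num_layers, total_weights, mip_level;
    uint32_t header_size; /* 16 for version 1, 20 for version 2 */
} CnnV2Header;

/* Decode a little-endian u32 byte by byte, so host endianness does not matter. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 on a short buffer, bad magic, or unknown version. */
static int parse_header(const uint8_t *buf, size_t len, CnnV2Header *out) {
    if (len < 16) return -1;
    out->magic = read_u32_le(buf + 0x00);
    out->version = read_u32_le(buf + 0x04);
    out->num_layers = read_u32_le(buf + 0x08);
    out->total_weights = read_u32_le(buf + 0x0C);
    if (out->magic != 0x324E4E43u) return -1;
    if (out->version == 1) {
        out->header_size = 16;
        out->mip_level = 0; /* version 1 has no mip_level field */
    } else if (out->version == 2) {
        if (len < 20) return -1;
        out->header_size = 20;
        out->mip_level = read_u32_le(buf + 0x10);
    } else {
        return -1;
    }
    return 0;
}
```

Decoding each field byte by byte avoids relying on the host being little-endian, at the cost of a few extra shifts per field.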

---

## Layer Info (20 bytes per layer)

Repeated `num_layers` times:
- **Version 2:** Starting at offset 0x14 (20 bytes)
- **Version 1:** Starting at offset 0x10 (16 bytes)

| Offset      | Type | Field          | Description                          |
|-------------|------|----------------|--------------------------------------|
| 0x00        | u32  | kernel_size    | Convolution kernel dimension (3, 5, 7, etc.) |
| 0x04        | u32  | in_channels    | Input channel count (includes the 8 static features) |
| 0x08        | u32  | out_channels   | Output channel count (max 8)         |
| 0x0C        | u32  | weight_offset  | Weight array start index (f16 units, relative to weight data section) |
| 0x10        | u32  | weight_count   | Number of f16 weights for this layer |

**Layer Order:** Sequential (Layer 0, Layer 1, Layer 2, ...)
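
Locating and decoding one layer record can be sketched as follows. The names `CnnV2LayerInfo` and `read_layer_info` are hypothetical; the caller is assumed to pass a `header_size` of 16 or 20 depending on the file version.

```c
#include <stddef.h>
#include <stdint.h>

#define LAYER_INFO_SIZE 20u

/* Hypothetical struct mirroring the five u32 fields of one layer record. */
typedef struct {
    uint32_t kernel_size, in_channels, out_channels, weight_offset, weight_count;
} CnnV2LayerInfo;

/* Decode a little-endian u32 independent of host byte order. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Reads layer `index` from a whole-file image.
 * Returns 0 on success, -1 if the record would run past the buffer. */
static int read_layer_info(const uint8_t *file, size_t len, uint32_t header_size,
                           uint32_t index, CnnV2LayerInfo *out) {
    size_t start = (size_t)header_size + (size_t)index * LAYER_INFO_SIZE;
    if (start > len || len - start < LAYER_INFO_SIZE) return -1;
    const uint8_t *p = file + start;
    out->kernel_size   = read_u32_le(p + 0x00);
    out->in_channels   = read_u32_le(p + 0x04);
    out->out_channels  = read_u32_le(p + 0x08);
    out->weight_offset = read_u32_le(p + 0x0C);
    out->weight_count  = read_u32_le(p + 0x10);
    return 0;
}
```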

---

## Weight Data (variable size)

Starts at offset:
- **Version 2:** `20 + (num_layers × 20)`
- **Version 1:** `16 + (num_layers × 20)`

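**Format:** Packed f16 pairs stored as u32
**Packing:** `u32 = (f16_hi << 16) | f16_lo`, where `f16_lo` is the even-indexed weight and `f16_hi` is the odd-indexed weight that follows it
**Storage:** Sequential by layer, then by output channel, input channel, and spatial position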

**Weight Indexing:**
```
weight_idx = output_ch × (in_channels × kernel_size²) +
             input_ch × kernel_size² +
             (ky × kernel_size + kx)
```

Where:
- `output_ch` ∈ [0, out_channels)
- `input_ch` ∈ [0, in_channels)
- `ky`, `kx` ∈ [0, kernel_size)

**Unpacking f16 from u32:**
```c
uint32_t packed = weights_buffer[weight_idx / 2];
uint16_t f16_bits = (weight_idx % 2 == 0) ? (packed & 0xFFFF) : (packed >> 16);
```
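
The indexing and unpacking steps above can be combined into a small CPU-side reader. This is a sketch under two assumptions: the global f16 index is the layer's `weight_offset` plus the per-layer `weight_idx`, and weights are standard IEEE 754 half-precision values. The helper names `weight_index`, `f16_to_f32`, and `read_weight` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Per-layer weight index, following the Weight Indexing formula. */
static uint32_t weight_index(uint32_t out_ch, uint32_t in_ch, uint32_t ky,
                             uint32_t kx, uint32_t in_channels, uint32_t k) {
    return out_ch * (in_channels * k * k) + in_ch * (k * k) + (ky * k + kx);
}

/* Convert IEEE 754 half-precision bits to a float without hardware f16 support. */
static float f16_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0 && man == 0) {
        bits = sign;                                  /* signed zero */
    } else if (exp == 0) {
        uint32_t shift = 0;                           /* subnormal: renormalize */
        while ((man & 0x400u) == 0) { man <<= 1; shift++; }
        bits = sign | ((113u - shift) << 23) | ((man & 0x3FFu) << 13);
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (man << 13);      /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13); /* rebias 15 -> 127 */
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Fetch one weight from the packed u32 array (even index = low half). */
static float read_weight(const uint32_t *packed, uint32_t weight_offset,
                         uint32_t weight_idx) {
    uint32_t idx = weight_offset + weight_idx;        /* assumed global f16 index */
    uint32_t word = packed[idx / 2];
    uint16_t bits = (idx % 2 == 0) ? (uint16_t)(word & 0xFFFFu)
                                   : (uint16_t)(word >> 16);
    return f16_to_f32(bits);
}
```

For a 12-input, 3×3 layer, `weight_index(3, 11, 2, 2, 12, 3)` yields 431, the last of that layer's 432 weights.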

---

## Example: 3-Layer Network (Version 2)

**Configuration:**
- Mip level: 0 (original resolution)
- Layer 0: 12→4, kernel 3×3 (432 weights)
- Layer 1: 12→4, kernel 3×3 (432 weights)
- Layer 2: 12→4, kernel 3×3 (432 weights)

**File Layout:**
```
Offset   Size   Content
------   ----   -------
0x00     20     Header (magic, version=2, layers=3, weights=1296, mip_level=0)
0x14     20     Layer 0 info (kernel=3, in=12, out=4, offset=0, count=432)
0x28     20     Layer 1 info (kernel=3, in=12, out=4, offset=432, count=432)
0x3C     20     Layer 2 info (kernel=3, in=12, out=4, offset=864, count=432)
0x50     2592   Weight data (1296 f16 weights packed into 648 u32)
         ----
Total:   2672 bytes (~2.6 KB)
```
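
The offsets in this layout follow directly from the header fields. A small sketch of the arithmetic, assuming 2 bytes per f16 weight as in the example (`CnnV2Layout` and `compute_layout` are hypothetical names):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of where each section of the file begins. */
typedef struct {
    size_t layer_info_offset;  /* equals the header size */
    size_t weight_data_offset; /* first byte after the layer info table */
    size_t file_size;          /* total bytes expected on disk */
} CnnV2Layout;

static CnnV2Layout compute_layout(uint32_t version, uint32_t num_layers,
                                  uint32_t total_weights) {
    CnnV2Layout l;
    l.layer_info_offset = (version == 1) ? 16 : 20;
    l.weight_data_offset = l.layer_info_offset + (size_t)num_layers * 20;
    l.file_size = l.weight_data_offset + (size_t)total_weights * 2;
    return l;
}
```

With `version=2`, `num_layers=3`, and `total_weights=1296`, this gives a weight data offset of 0x50 and a file size of 2672 bytes, matching the table above.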

---

## Static Features

Not stored in .bin file (computed at runtime):

**8D Input Features:**
1. **p0** - Parametric feature 0 (from mip level)
2. **p1** - Parametric feature 1 (from mip level)
3. **p2** - Parametric feature 2 (from mip level)
4. **p3** - Parametric feature 3 (depth or from mip level)
5. **UV_X** - Normalized x coordinate [0,1]
6. **UV_Y** - Normalized y coordinate [0,1]
7. **sin(20 × UV_Y)** - Spatial frequency encoding (vertical, frequency=20)
8. **1.0** - Bias term

**Mip Level Usage (p0-p3):**
- `mip_level=0`: RGB from original resolution (mip 0)
- `mip_level=1`: RGB from half resolution (mip 1), upsampled
- `mip_level=2`: RGB from quarter resolution (mip 2), upsampled
- `mip_level=3`: RGB from eighth resolution (mip 3), upsampled

**Layer 0** receives input RGBD (4D) + static features (8D) = 12D input → 4D output.
**Layer 1+** receive previous layer output (4D) + static features (8D) = 12D input → 4D output.

---

## Validation

**Magic Check:**
```c
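uint32_t magic;
if (fread(&magic, 4, 1, fp) != 1) { error("Truncated file"); }
// Direct comparison assumes a little-endian host, matching the file's byte order
if (magic != 0x324E4E43) { error("Invalid CNN v2 file"); }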
```

**Version Check:**
```c
uint32_t version;
fread(&version, 4, 1, fp);
if (version != 1 && version != 2) { error("Unsupported version"); }
uint32_t header_size = (version == 1) ? 16 : 20;
```

**Size Check:**
```c
// 20 bytes per layer info record, 2 bytes per f16 weight
size_t expected_size = header_size + (size_t)num_layers * 20 + (size_t)total_weights * 2;
if (file_size != expected_size) { error("Size mismatch"); }
```

**Weight Offset Sanity:**
```c
// Each layer's offset should match cumulative count
uint32_t cumulative = 0;
for (int i = 0; i < num_layers; i++) {
    if (layers[i].weight_offset != cumulative) { error("Invalid offset"); }
    cumulative += layers[i].weight_count;
}
if (cumulative != total_weights) { error("Total mismatch"); }
```
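
**Layer Shape Consistency (optional):**

A further check is suggested by the weight indexing formula, though this specification does not require it: each layer's `weight_count` would be expected to equal `out_channels × in_channels × kernel_size²`, as in the 3-layer example (4 × 12 × 9 = 432). The sketch below assumes that relationship holds for every layer and also applies the `out_channels` limit of 8 from the Layer Info table. `CnnV2LayerInfo` and `layer_shape_ok` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical struct mirroring one layer info record. */
typedef struct {
    uint32_t kernel_size, in_channels, out_channels, weight_offset, weight_count;
} CnnV2LayerInfo;

/* Returns 1 if weight_count matches the layer dimensions, 0 otherwise.
 * Assumes weight_count == out_channels * in_channels * kernel_size^2. */
static int layer_shape_ok(const CnnV2LayerInfo *l) {
    if (l->kernel_size == 0 || l->in_channels == 0) return 0;
    if (l->out_channels == 0 || l->out_channels > 8) return 0;
    /* 64-bit intermediates with early exits so the products cannot wrap. */
    uint64_t k2 = (uint64_t)l->kernel_size * l->kernel_size;
    if (k2 > UINT32_MAX) return 0;
    uint64_t per_out = (uint64_t)l->in_channels * k2;
    if (per_out > UINT32_MAX) return 0;
    return per_out * l->out_channels == l->weight_count;
}
```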

---

## Future Extensions

**TODO: Flexible Feature Layout**

Current limitation: Feature vector layout is hardcoded as `[p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias]`.

Proposed enhancement for version 3:
- Add feature descriptor section to header
- Specify feature count, types, and ordering
- Support arbitrary 7D feature combinations (e.g., `[R, G, B, dx, dy, uv_x, bias]`)
- Allow runtime shader generation based on descriptor
- Enable experimentation without recompiling shaders

Example descriptor format:
```
struct FeatureDescriptor {
  u32 feature_count;           // Number of features (typically 7-8)
  u32 feature_types[8];        // Type enum per feature
  u32 feature_sources[8];      // Source enum (mip0, mip1, gradient, etc.)
  u32 reserved[8];             // Future use
}
```

Benefits:
- Training can experiment with different feature combinations
- No shader recompilation needed
- Single binary format supports multiple architectures
- Easier A/B testing of feature effectiveness

---

## Related Files

- `training/export_cnn_v2_weights.py` - Binary export tool
- `src/gpu/effects/cnn_v2_effect.cc` - C++ loader
- `tools/cnn_v2_test/index.html` - WebGPU validator
- `doc/CNN_V2.md` - Architecture design