# CNN v2 Binary Weight Format Specification

Binary format for storing trained CNN v2 weights with static feature architecture.

**File Extension:** `.bin`
**Byte Order:** Little-endian
**Version:** 1.0

---

## File Structure

```
┌─────────────────────┐
│  Header (16 bytes)  │
├─────────────────────┤
│  Layer Info         │
│  (20 bytes × N)     │
├─────────────────────┤
│  Weight Data        │
│  (variable size)    │
└─────────────────────┘
```

---

## Header (16 bytes)

| Offset | Type | Field          | Description                          |
|--------|------|----------------|--------------------------------------|
| 0x00   | u32  | magic          | Magic number: `0x324E4E43` (bytes "CNN2" on disk) |
| 0x04   | u32  | version        | Format version (currently 1)         |
| 0x08   | u32  | num_layers     | Number of CNN layers (excludes static features) |
| 0x0C   | u32  | total_weights  | Total f16 weight count across all layers |

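As an illustration of the table above, here is a minimal sketch of parsing the header from a byte buffer. The names `Cnn2Header`, `cnn2_read_u32le`, and `cnn2_parse_header` are hypothetical and not taken from the actual loader; the sketch decodes each field byte by byte so it does not depend on the host's byte order.

```c
#include <stdint.h>
#include <stddef.h>

#define CNN2_MAGIC       0x324E4E43u  /* bytes 'C','N','N','2' on disk */
#define CNN2_HEADER_SIZE 16u

/* Hypothetical in-memory mirror of the 16-byte header. */
typedef struct {
    uint32_t magic;
    uint32_t version;
    uint32_t num_layers;
    uint32_t total_weights;
} Cnn2Header;

/* Decode a little-endian u32 regardless of host byte order. */
static uint32_t cnn2_read_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
static int cnn2_parse_header(const uint8_t *buf, size_t len, Cnn2Header *out) {
    if (len < CNN2_HEADER_SIZE) return -1;
    out->magic         = cnn2_read_u32le(buf + 0x00);
    out->version       = cnn2_read_u32le(buf + 0x04);
    out->num_layers    = cnn2_read_u32le(buf + 0x08);
    out->total_weights = cnn2_read_u32le(buf + 0x0C);
    return (out->magic == CNN2_MAGIC) ? 0 : -1;
}
```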
---

## Layer Info (20 bytes per layer)

Repeated `num_layers` times, starting at offset 0x10.

| Offset      | Type | Field          | Description                          |
|-------------|------|----------------|--------------------------------------|
| 0x00        | u32  | kernel_size    | Convolution kernel dimension (3, 5, 7, etc.) |
| 0x04        | u32  | in_channels    | Input channel count (includes 8 static features for Layer 1) |
| 0x08        | u32  | out_channels   | Output channel count (max 8)         |
| 0x0C        | u32  | weight_offset  | Weight array start index (f16 units, relative to weight data section) |
| 0x10        | u32  | weight_count   | Number of f16 weights for this layer |

**Layer Order:** Sequential (Layer 1, Layer 2, Layer 3, ...)

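The layer table can be read one record at a time. The sketch below assumes the whole file is already in memory; `Cnn2LayerInfo` and `cnn2_parse_layer` are hypothetical names used only for illustration. Record `index` starts at byte `16 + index × 20`.

```c
#include <stdint.h>
#include <stddef.h>

#define CNN2_HEADER_SIZE     16u
#define CNN2_LAYER_INFO_SIZE 20u

/* Hypothetical in-memory mirror of one 20-byte layer record. */
typedef struct {
    uint32_t kernel_size;
    uint32_t in_channels;
    uint32_t out_channels;
    uint32_t weight_offset;  /* in f16 units, relative to the weight data section */
    uint32_t weight_count;   /* in f16 units */
} Cnn2LayerInfo;

/* Decode a little-endian u32 regardless of host byte order. */
static uint32_t cnn2_read_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse layer record `index` (0-based). Returns 0 on success,
 * -1 if the record would extend past the end of the buffer. */
static int cnn2_parse_layer(const uint8_t *buf, size_t len,
                            uint32_t index, Cnn2LayerInfo *out) {
    size_t start = CNN2_HEADER_SIZE + (size_t)index * CNN2_LAYER_INFO_SIZE;
    if (len < start + CNN2_LAYER_INFO_SIZE) return -1;
    const uint8_t *p = buf + start;
    out->kernel_size   = cnn2_read_u32le(p + 0x00);
    out->in_channels   = cnn2_read_u32le(p + 0x04);
    out->out_channels  = cnn2_read_u32le(p + 0x08);
    out->weight_offset = cnn2_read_u32le(p + 0x0C);
    out->weight_count  = cnn2_read_u32le(p + 0x10);
    return 0;
}
```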
---

## Weight Data (variable size)

Starts at offset: `16 + (num_layers × 20)`

**Format:** Packed f16 pairs stored as u32
**Packing:** `u32 = (f16_hi << 16) | f16_lo`
**Storage:** Sequential by layer, then by output channel, input channel, spatial position

**Weight Indexing:**
```
weight_idx = output_ch × (in_channels × kernel_size²) +
             input_ch × kernel_size² +
             (ky × kernel_size + kx)
```

Where:
- `output_ch` ∈ [0, out_channels)
- `input_ch` ∈ [0, in_channels)
- `ky`, `kx` ∈ [0, kernel_size)

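The indexing formula translates directly into a small helper. This is a sketch with a hypothetical function name; the result is an index within one layer, so a reader would presumably add that layer's `weight_offset` to address the shared weight data section.

```c
#include <stdint.h>

/* Index of a single weight within its layer, in f16 units.
 * Layout: output channel (slowest), input channel, then ky, kx (fastest). */
static uint32_t cnn2_weight_index(uint32_t kernel_size, uint32_t in_channels,
                                  uint32_t output_ch, uint32_t input_ch,
                                  uint32_t ky, uint32_t kx) {
    uint32_t k2 = kernel_size * kernel_size;
    return output_ch * (in_channels * k2) +
           input_ch * k2 +
           (ky * kernel_size + kx);
}
```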
**Unpacking f16 from u32:**
```c
// weight_idx is counted in f16 units from the start of the weight data section
uint32_t packed = weights_buffer[weight_idx / 2];
uint16_t f16_bits = (weight_idx % 2 == 0) ? (packed & 0xFFFF) : (packed >> 16);
```
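For CPU-side inspection, the extracted 16 bits still need to be widened to a 32-bit float. The sketch below assumes the stored values are IEEE 754 binary16 (the spec only says "f16") and uses hypothetical helper names; it covers packing, unpacking, and conversion including subnormals, infinities, and NaN.

```c
#include <stdint.h>
#include <string.h>

/* Pack two f16 bit patterns into one u32: low half first, high half second. */
static uint32_t cnn2_pack_f16_pair(uint16_t lo, uint16_t hi) {
    return ((uint32_t)hi << 16) | lo;
}

/* Extract the f16 bit pattern at position idx (in f16 units). */
static uint16_t cnn2_unpack_f16(const uint32_t *packed, uint32_t idx) {
    uint32_t word = packed[idx / 2];
    return (uint16_t)((idx % 2 == 0) ? (word & 0xFFFFu) : (word >> 16));
}

/* Widen an IEEE 754 binary16 bit pattern to a 32-bit float (assumed encoding). */
static float cnn2_f16_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (man == 0) {
            bits = sign;                      /* signed zero */
        } else {
            /* Subnormal: shift the mantissa up until the implicit bit appears. */
            uint32_t shift = 0;
            while ((man & 0x400u) == 0) { man <<= 1; shift++; }
            man &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);  /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);  /* rebias 15 -> 127 */
    }
    float f;
    memcpy(&f, &bits, sizeof f);  /* type-pun without violating aliasing rules */
    return f;
}
```

Calling `cnn2_f16_to_f32(cnn2_unpack_f16(weights_buffer, idx))` then yields the weight at f16 index `idx` as a `float`.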

---

## Example: 3-Layer Network

**Configuration:**
- Layer 1: 15→8, kernel 3×3 (1,080 weights)
- Layer 2: 8→4, kernel 3×3 (288 weights)
- Layer 3: 4→3, kernel 3×3 (108 weights)

**File Layout:**
```
Offset   Size   Content
------   ----   -------
0x00     16     Header (magic, version=1, layers=3, weights=1476)
0x10     20     Layer 1 info (kernel=3, in=15, out=8, offset=0, count=1080)
0x24     20     Layer 2 info (kernel=3, in=8, out=4, offset=1080, count=288)
0x38     20     Layer 3 info (kernel=3, in=4, out=3, offset=1368, count=108)
0x4C     2952   Weight data (738 u32 words, each packing two f16 values)
         ----
Total:   3028 bytes (~3.0 KB)
```
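The numbers in this example can be recomputed from the layer shapes. The sketch below uses hypothetical helper names and assumes that each layer stores exactly `out_channels × in_channels × kernel_size²` weights, which is what the indexing formula implies and what the per-layer counts above match; the file size follows the formula in the Validation section.

```c
#include <stdint.h>
#include <stddef.h>

/* Weights in one layer (assumed: no separate bias array, bias is an input channel). */
static uint32_t cnn2_layer_weight_count(uint32_t kernel_size,
                                        uint32_t in_channels,
                                        uint32_t out_channels) {
    return out_channels * in_channels * kernel_size * kernel_size;
}

/* Byte offset at which the weight data section starts. */
static size_t cnn2_weight_data_offset(uint32_t num_layers) {
    return 16u + (size_t)num_layers * 20u;
}

/* Total file size in bytes: header + layer table + 2 bytes per f16 weight. */
static size_t cnn2_expected_file_size(uint32_t num_layers, uint32_t total_weights) {
    return cnn2_weight_data_offset(num_layers) + (size_t)total_weights * 2u;
}
```

For the three layers above, the helpers give 1080, 288, and 108 weights, a weight data offset of 76 (`0x4C`), and a file size of `16 + 60 + 2 × 1476` bytes.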

---

## Static Features

Not stored in .bin file (computed at runtime):

**7D Input Features + Bias (packed as 8 channels):**
1. R (red channel)
2. G (green channel)
3. B (blue channel)
4. D (depth value)
5. UV_X (normalized x coordinate)
6. UV_Y (normalized y coordinate)
7. sin(10 × UV_X) (spatial frequency encoding)
8. 1.0 (bias term)

**First CNN layer** receives all 8 static features + 0-7 previous layer outputs (total 8-15 input channels).

---

## Validation

**Magic Check:**
```c
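uint32_t magic;
// Reads in host byte order, which matches the file on little-endian hosts
if (fread(&magic, 4, 1, fp) != 1) { error("Truncated CNN v2 file"); }
if (magic != 0x324E4E43) { error("Invalid CNN v2 file"); }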
```

**Size Check:**
```c
// Header + layer table + 2 bytes per f16 weight
size_t expected_size = 16 + ((size_t)num_layers * 20) + ((size_t)total_weights * 2);
if (file_size != expected_size) { error("Size mismatch"); }
```

**Weight Offset Sanity:**
```c
// Each layer's offset should match cumulative count
uint32_t cumulative = 0;
for (int i = 0; i < num_layers; i++) {
    if (layers[i].weight_offset != cumulative) { error("Invalid offset"); }
    cumulative += layers[i].weight_count;
}
if (cumulative != total_weights) { error("Total mismatch"); }
```

---

## Related Files

- `training/export_cnn_v2_weights.py` - Binary export tool
- `src/gpu/effects/cnn_v2_effect.cc` - C++ loader
- `tools/cnn_v2_test/index.html` - WebGPU validator
- `doc/CNN_V2.md` - Architecture design