# CNN v2: Parametric Static Features
**Technical Design Document**
---
## Overview
CNN v2 extends the original CNN post-processing effect with parametric static features, enabling richer spatial and frequency-domain inputs for improved visual quality.
**Key improvements over v1:**
- 7D static feature input (vs the 4-channel input in v1)
- Multi-frequency position encoding (NeRF-style)
- Per-layer configurable kernel sizes (1×1, 3×3, 5×5)
- Variable channel counts per layer
- Float16 weight storage (~6.8 KB for the example 3-layer model)
- Bias integrated as static feature dimension
- Storage buffer architecture (dynamic layer count)
- Binary weight format for runtime loading
**Status:** ✅ Complete. Training pipeline functional, validation tools ready.
**TODO:** 8-bit quantization with QAT for a further 2× size reduction (~3.4 KB)
---
## Architecture
### Pipeline Overview
```
Input RGBD → Static Features Compute → CNN Layers → Output RGBA
└─ computed once/frame ─┘ └─ multi-pass ─┘
```
**Static Features Texture:**
- Name: `static_features`
- Format: `texture_storage_2d<rgba32uint, write>` (4×u32)
- Data: 8 float16 values packed via `pack2x16float()` (bit layout sketched below)
- Computed once per frame, read by all CNN layers
- Lifetime: Entire frame (all CNN layer passes)
**CNN Layers:**
- Input Layer: 7D static features → C₀ channels
- Inner Layers: (7D + Cᵢ₋₁) → Cᵢ channels
- Output Layer: (7D + Cₙ) → 4D RGBA
- Storage: `texture_storage_2d<rgba32uint>` (8×f16 per texel recommended)
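Both textures pack eight f16 values into one `rgba32uint` texel via `pack2x16float()`. A minimal numpy sketch of that bit layout (hypothetical helpers `pack_f16x8` / `unpack_f16x8`, not part of the codebase; handy when CPU-side tooling reads packed texels back for validation):
```python
import numpy as np

def pack_f16x8(values):
    """Pack 8 floats as 4 u32 words: value 2i in the low 16 bits, value 2i+1 in the high 16."""
    h = np.asarray(values, dtype=np.float16).view(np.uint16).astype(np.uint32)
    return (h[1::2] << 16) | h[0::2]

def unpack_f16x8(words):
    """Inverse of pack_f16x8: recover the 8 float16 values from 4 packed u32 words."""
    words = np.asarray(words, dtype=np.uint32)
    lo = (words & 0xFFFF).astype(np.uint16).view(np.float16)
    hi = (words >> 16).astype(np.uint16).view(np.float16)
    return np.stack([lo, hi], axis=-1).reshape(-1)

feat = [0.5, 0.25, 0.125, 1.0, 0.3, 0.7, np.sin(10 * 0.3), 1.0]   # [R,G,B,D,uv.x,uv.y,sin10,bias]
assert np.allclose(unpack_f16x8(pack_f16x8(feat)), np.asarray(feat, dtype=np.float16))
```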
---
## Static Features (7D + 1 bias)
### Feature Layout
**8 float16 values per pixel:**
```wgsl
// Slot 0-3: RGBD (core pixel data)
let r = rgba.r; // Red channel
let g = rgba.g; // Green channel
let b = rgba.b; // Blue channel
let d = depth; // Depth value
// Slot 4-5: UV coordinates (normalized screen space)
let uv_x = coord.x / resolution.x; // Horizontal position [0,1]
let uv_y = coord.y / resolution.y; // Vertical position [0,1]
// Slot 6: Multi-frequency position encoding
let sin10_x = sin(10.0 * uv_x); // Periodic feature (frequency=10)
// Slot 7: Bias dimension (always 1.0)
let bias = 1.0; // Learned bias per output channel
// Packed storage: [R, G, B, D, uv.x, uv.y, sin(10*uv.x), 1.0]
```
### Feature Rationale
| Feature | Dimension | Purpose | Priority |
|---------|-----------|---------|----------|
| RGBD | 4D | Core pixel information | Essential |
| UV coords | 2D | Spatial position awareness | Essential |
| sin(10\*uv.x) | 1D | Periodic position encoding | Medium |
| Bias | 1D | Learned bias (standard NN) | Essential |
**Why bias as static feature:**
- Simpler shader code (single weight array)
- Standard NN formulation: y = Wx, where x is augmented with a constant 1.0 bias term (see the sketch below)
- Saves 56-112 bytes (no separate bias buffer)
- 7 features sufficient for initial implementation
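A minimal numpy sketch of the augmented-input equivalence (illustrative only; weights and shapes are random): folding the bias column into W and appending a constant 1.0 to x reproduces Wx + b exactly.
```python
import numpy as np

rng = np.random.default_rng(0)
W_prime = rng.standard_normal((16, 7))    # weights for the 7 real static features
b = rng.standard_normal(16)               # per-output-channel bias

x = rng.standard_normal(7)                # one pixel's 7 static features
x_aug = np.concatenate([x, [1.0]])        # 8th feature: constant bias input
W = np.concatenate([W_prime, b[:, None]], axis=1)   # bias folded in as a weight column

assert np.allclose(W @ x_aug, W_prime @ x + b)      # identical output, single weight array
```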
### Future Feature Extensions
**Option: Replace sin(10\*uv.x) with:**
- `sin(20*uv.x)` - Higher frequency encoding
- `gray_mip1` - Multi-scale luminance
- `dx`, `dy` - Sobel gradients
- `variance` - Local texture measure
- `laplacian` - Edge detection
**Option: uint8 packing (16+ features):**
```wgsl
// texture_storage_2d<rgba32uint> holds 16 uint8 values (4 per u32, e.g. via pack4x8unorm())
// Trade precision for feature count
// [R, G, B, D, uv.x, uv.y, sin10.x, sin10.y,
// sin20.x, sin20.y, dx, dy, gray_mip1, gray_mip2, var, bias]
```
Requires quantization-aware training.
---
## Layer Structure
### Example 3-Layer Network
```
Input: 7D static → 16 channels (1×1 kernel, pointwise)
Layer1: (7+16)D → 8 channels (3×3 kernel, spatial)
Layer2: (7+8)D → 4 channels (5×5 kernel, large receptive field)
```
### Weight Calculations
**Per-layer weights:**
```
Input:  8 × 1 × 1 × 16       =  128 weights   (8 = 7 static features + bias)
Layer1: (8 + 16) × 3 × 3 × 8 = 1728 weights
Layer2: (8 + 8) × 5 × 5 × 4  = 1600 weights
Total:                         3456 weights
```
**Storage sizes** (the sketch below reproduces these numbers):
- f32: 3456 × 4 = 13,824 bytes (~13.5 KB)
- f16: 3456 × 2 = 6,912 bytes (~6.8 KB) ✓ **recommended**
**Comparison to v1:**
- v1: ~800 weights (3.2 KB f32)
- v2: ~3456 weights (~6.8 KB f16)
- **Growth: ~2× size for parametric features**
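Direct arithmetic for the example configuration (illustrative; matches the table above):
```python
# Each layer sees the 8 static inputs (7 features + bias) plus the previous layer's channels.
kernels, channels, static_dim = [1, 3, 5], [16, 8, 4], 8
total, prev = 0, 0
for k, c_out in zip(kernels, channels):
    total += (static_dim + prev) * k * k * c_out   # (static + prev) × k² × c_out
    prev = c_out
print(total, total * 2, total * 4)                 # 3456 weights, 6912 bytes f16, 13824 bytes f32
```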
### Kernel Size Guidelines
**1×1 kernel (pointwise):**
- No spatial context, channel mixing only
- Weights: `(8 + C_in) × C_out` (8 = static features incl. bias; C_in = previous layer's channels, 0 for the input layer)
- Use for: Input layer, bottleneck layers
**3×3 kernel (standard conv):**
- Local spatial context
- Weights: `(8 + C_in) × 9 × C_out`
- Use for: Most inner layers
**5×5 kernel (large receptive field):**
- Wide spatial context
- Weights: `(8 + C_in) × 25 × C_out`
- Use for: Output layer, detail enhancement
### Channel Storage (8×f16 per texel)
```wgsl
@group(0) @binding(1) var layer_input: texture_2d<u32>;
fn unpack_channels(coord: vec2<i32>) -> array<f32, 8> {
let packed = textureLoad(layer_input, coord, 0);
return array(
unpack2x16float(packed.x).x, unpack2x16float(packed.x).y,
unpack2x16float(packed.y).x, unpack2x16float(packed.y).y,
unpack2x16float(packed.z).x, unpack2x16float(packed.z).y,
unpack2x16float(packed.w).x, unpack2x16float(packed.w).y
);
}
fn pack_channels(values: array<f32, 8>) -> vec4<u32> {
return vec4(
pack2x16float(vec2(values[0], values[1])),
pack2x16float(vec2(values[2], values[3])),
pack2x16float(vec2(values[4], values[5])),
pack2x16float(vec2(values[6], values[7]))
);
}
```
---
## Training Workflow
### Script: `training/train_cnn_v2.py`
**Static Feature Extraction:**
```python
import numpy as np

def compute_static_features(rgb, depth):
"""Generate 7D static features + bias dimension."""
h, w = rgb.shape[:2]
# RGBD channels
r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
# UV coordinates (normalized)
uv_x = np.linspace(0, 1, w)[None, :].repeat(h, axis=0)
uv_y = np.linspace(0, 1, h)[:, None].repeat(w, axis=1)
# Multi-frequency position encoding
sin10_x = np.sin(10.0 * uv_x)
# Bias dimension (always 1.0)
bias = np.ones_like(r)
# Stack: [R, G, B, D, uv.x, uv.y, sin10_x, bias]
return np.stack([r, g, b, depth, uv_x, uv_y, sin10_x, bias], axis=-1)
```
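A possible glue step (illustrative; assumes `rgb` and `depth` arrays are already loaded): the function above returns an H×W×8 array, while the PyTorch model below expects an N×8×H×W tensor.
```python
import torch

# rgb: (H, W, 3) float array, depth: (H, W) float array (assumed loaded elsewhere)
feat_hwc = compute_static_features(rgb, depth)   # (H, W, 8) numpy array
feat = torch.from_numpy(feat_hwc).float()        # f32 torch tensor
feat = feat.permute(2, 0, 1).unsqueeze(0)        # (1, 8, H, W) for nn.Conv2d
```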
**Network Definition:**
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNv2(nn.Module):
    def __init__(self, kernels=(1, 3, 5), channels=(16, 8, 4)):
super().__init__()
# Input layer: 8D (7 features + bias) → channels[0]
self.layer0 = nn.Conv2d(8, channels[0], kernel_size=kernels[0],
padding=kernels[0]//2, bias=False)
# Inner layers: (7 features + bias + C_prev) → C_next
in_ch_1 = 8 + channels[0] # static + layer0 output
self.layer1 = nn.Conv2d(in_ch_1, channels[1], kernel_size=kernels[1],
padding=kernels[1]//2, bias=False)
# Output layer: (7 features + bias + C_last) → 4 (RGBA)
in_ch_2 = 8 + channels[1]
self.layer2 = nn.Conv2d(in_ch_2, 4, kernel_size=kernels[2],
padding=kernels[2]//2, bias=False)
    def forward(self, static_features):
# Layer 0: Use full 8D static features (includes bias)
x0 = self.layer0(static_features)
x0 = F.relu(x0)
# Layer 1: Concatenate static + layer0 output
x1_input = torch.cat([static_features, x0], dim=1)
x1 = self.layer1(x1_input)
x1 = F.relu(x1)
# Layer 2: Concatenate static + layer1 output
x2_input = torch.cat([static_features, x1], dim=1)
output = self.layer2(x2_input)
return torch.sigmoid(output) # RGBA output [0,1]
```
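A quick shape check for the example configuration (illustrative only; random input, no trained weights):
```python
import torch

model = CNNv2(kernels=[1, 3, 5], channels=[16, 8, 4])
static = torch.rand(1, 8, 64, 64)      # one 64×64 tile, 8 static dims (incl. bias)
out = model(static)
assert out.shape == (1, 4, 64, 64)     # RGBA, in [0,1] thanks to the sigmoid
```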
**Training Configuration:**
```python
# Hyperparameters
kernels = [1, 3, 5] # Per-layer kernel sizes
channels = [16, 8, 4] # Per-layer output channels
learning_rate = 1e-3
batch_size = 16
epochs = 5000
model = CNNv2(kernels=kernels, channels=channels)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Training loop (standard PyTorch f32)
for epoch in range(epochs):
    for rgb_batch, depth_batch, target_batch in dataloader:
        # Static features as an N×8×H×W tensor (see compute_static_features above)
        static_feat = compute_static_features(rgb_batch, depth_batch)
# Forward pass
output = model(static_feat)
loss = criterion(output, target_batch)
# Backward pass
optimizer.zero_grad()
loss.backward()
optimizer.step()
```
**Checkpoint Format:**
```python
torch.save({
'state_dict': model.state_dict(), # f32 weights
'config': {
'kernels': [1, 3, 5],
'channels': [16, 8, 4],
'features': ['R', 'G', 'B', 'D', 'uv.x', 'uv.y', 'sin10_x', 'bias']
},
'epoch': epoch,
'loss': loss.item()
}, f'checkpoints/checkpoint_epoch_{epoch}.pth')
```
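A loading sketch (assuming the checkpoint layout above): rebuilding the model from the stored config means export and validation never have to guess the architecture.
```python
import torch

ckpt = torch.load('checkpoints/checkpoint_epoch_5000.pth', map_location='cpu')
cfg = ckpt['config']
model = CNNv2(kernels=cfg['kernels'], channels=cfg['channels'])   # class defined above
model.load_state_dict(ckpt['state_dict'])
model.eval()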
---
## Export Workflow
### Script: `training/export_cnn_v2_shader.py`
**Process:**
1. Load checkpoint (f32 PyTorch weights)
2. Extract layer configs (kernels, channels)
3. Quantize weights to float16: `weights_f16 = weights_f32.astype(np.float16)` (sketched below)
4. Generate WGSL shader per layer
5. Write to `workspaces/<workspace>/shaders/cnn_v2_*.wgsl`
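A simplified sketch of the quantize-and-emit step (hypothetical helper `emit_wgsl_weights`; the real `export_cnn_v2_shader.py` also handles the surrounding shader template, and the flattening order must match the shader's weight indexing):
```python
import numpy as np
import torch

def emit_wgsl_weights(layer: torch.nn.Conv2d) -> str:
    w = layer.weight.detach().cpu().numpy()           # (C_out, C_in, k, k), f32
    w_f16 = w.astype(np.float16).astype(np.float32)   # quantize to f16, widen back for literals
    flat = w_f16.reshape(-1)                          # C_out-major flattening
    body = ", ".join(f"{v:.6f}" for v in flat)
    return f"const weights: array<f32, {flat.size}> = array({body});"

# e.g. for the input layer of a loaded model (see the checkpoint-loading sketch above):
# print(emit_wgsl_weights(model.layer0))              # 128 literals for the 8 → 16, 1×1 layer
```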
**Example Generated Shader:**
```wgsl
// cnn_v2_layer_0.wgsl - Auto-generated from checkpoint_epoch_5000.pth
const KERNEL_SIZE: u32 = 1u;
const IN_CHANNELS: u32 = 8u; // 7 features + bias
const OUT_CHANNELS: u32 = 16u;
// Weights quantized to float16 (stored as f32 in shader)
const weights: array<f32, 128> = array(
0.123047, -0.089844, 0.234375, 0.456055, ...
);
@group(0) @binding(0) var static_features: texture_2d<u32>;
@group(0) @binding(1) var output_texture: texture_storage_2d<rgba32uint, write>;
@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
// Load static features (8D)
let static_feat = get_static_features(vec2<i32>(id.xy));
// Convolution (1×1 kernel = pointwise)
var output: array<f32, OUT_CHANNELS>;
for (var c: u32 = 0u; c < OUT_CHANNELS; c++) {
var sum: f32 = 0.0;
for (var k: u32 = 0u; k < IN_CHANNELS; k++) {
sum += weights[c * IN_CHANNELS + k] * static_feat[k];
}
output[c] = max(0.0, sum); // ReLU activation
}
// Pack and store (8×f16 per texel)
textureStore(output_texture, vec2<i32>(id.xy), pack_f16x8(output));
}
```
**Float16 Quantization:**
- Training uses f32 throughout (PyTorch standard)
- Export converts to np.float16, then back to f32 for WGSL literals
- **Expected discrepancy:** <0.1% relative MSE (acceptable; a quick offline check is sketched below)
- Validation via `validate_cnn_v2.sh` compares outputs
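A quick offline sanity check (illustrative, not the validation script) for the f16 round-trip error on a trained model:
```python
import numpy as np

def f16_roundtrip_error(model):
    """Mean relative MSE introduced by casting each weight tensor f32 → f16 → f32."""
    errs = []
    for p in model.parameters():
        w = p.detach().cpu().numpy()
        w_rt = w.astype(np.float16).astype(np.float32)
        errs.append(((w - w_rt) ** 2).mean() / ((w ** 2).mean() + 1e-12))
    return float(np.mean(errs))

# print(f16_roundtrip_error(model))   # expected well below 1e-3 (<0.1%)
```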
---
## Validation Workflow
### Script: `scripts/validate_cnn_v2.sh`
**End-to-end pipeline:**
```bash
./scripts/validate_cnn_v2.sh checkpoints/checkpoint_epoch_5000.pth
```
**Steps automated:**
1. Export checkpoint → .wgsl shaders
2. Rebuild `cnn_test` tool
3. Process test images with CNN v2
4. Display input/output results
**Usage:**
```bash
# Basic usage
./scripts/validate_cnn_v2.sh checkpoint.pth
# Custom paths
./scripts/validate_cnn_v2.sh checkpoint.pth \
-i my_test_images/ \
-o results/ \
-b build_release
# Skip rebuild (iterate on checkpoint only)
./scripts/validate_cnn_v2.sh checkpoint.pth --skip-build
# Skip export (iterate on test images only)
./scripts/validate_cnn_v2.sh checkpoint.pth --skip-export
# Show help
./scripts/validate_cnn_v2.sh --help
```
**Options:**
- `-b, --build-dir DIR` - Build directory (default: build)
- `-w, --workspace NAME` - Workspace name (default: main)
- `-i, --images DIR` - Test images directory (default: training/validation)
- `-o, --output DIR` - Output directory (default: validation_results)
- `--skip-build` - Use existing cnn_test binary
- `--skip-export` - Use existing .wgsl shaders
- `-h, --help` - Show full usage
**Output:**
- Input images: `<test_images_dir>/*.png`
- Output images: `<output_dir>/*_output.png`
- Opens results directory in system file browser
---
## Implementation Checklist
### Phase 1: Shaders (Core Infrastructure)
- [ ] `workspaces/main/shaders/cnn_v2_static.wgsl` - Static features compute
- [ ] RGBD sampling from framebuffer
- [ ] UV coordinate calculation
- [ ] sin(10\*uv.x) computation
- [ ] Bias dimension (constant 1.0)
- [ ] Float16 packing via `pack2x16float()`
- [ ] Output to `texture_storage_2d<rgba32uint>`
- [ ] `workspaces/main/shaders/cnn_v2_layer_template.wgsl` - Layer template
- [ ] Static features unpacking
- [ ] Previous layer unpacking (8×f16)
- [ ] Convolution implementation (1×1, 3×3, 5×5)
- [ ] ReLU activation
- [ ] Output packing (8×f16)
- [ ] Proper padding handling
### Phase 2: C++ Effect Class
- [ ] `src/gpu/effects/cnn_v2_effect.h` - Header
- [ ] Class declaration inheriting from `PostProcessEffect`
- [ ] Static features texture member
- [ ] Layer textures vector
- [ ] Pipeline and bind group members
- [ ] `src/gpu/effects/cnn_v2_effect.cc` - Implementation
- [ ] Constructor: Load shaders, create textures
- [ ] `init()`: Create pipelines, bind groups
- [ ] `render()`: Multi-pass execution
- [ ] Pass 0: Compute static features
- [ ] Pass 1-N: CNN layers
- [ ] Final: Composite to output
- [ ] Proper resource cleanup
- [ ] Integration
- [ ] Add to `src/gpu/demo_effects.h` includes
- [ ] Add `cnn_v2_effect.cc` to `CMakeLists.txt` (headless + normal)
- [ ] Add shaders to `workspaces/main/assets.txt`
- [ ] Add to `src/tests/gpu/test_demo_effects.cc`
### Phase 3: Training Pipeline
- [ ] `training/train_cnn_v2.py` - Training script
- [ ] Static feature extraction function
- [ ] CNNv2 PyTorch model class
- [ ] Patch-based dataloader
- [ ] Training loop with checkpointing
- [ ] Command-line argument parsing
- [ ] Inference mode (ground truth generation)
- [ ] `training/export_cnn_v2_shader.py` - Export script
- [ ] Checkpoint loading
- [ ] Weight extraction and f16 quantization
- [ ] Per-layer WGSL generation
- [ ] File output to workspace shaders/
- [ ] Metadata preservation
### Phase 4: Tools & Validation
- [ ] `scripts/validate_cnn_v2.sh` - End-to-end validation
- [ ] Command-line argument parsing
- [ ] Shader export orchestration
- [ ] Build orchestration
- [ ] Batch image processing
- [ ] Results display
- [ ] `src/tools/cnn_test_main.cc` - Tool updates
- [ ] Add `--cnn-version v2` flag
- [ ] CNNv2Effect instantiation path
- [ ] Static features pass execution
- [ ] Multi-layer processing
### Phase 5: Documentation
- [ ] `doc/HOWTO.md` - Usage guide
- [ ] Training section (CNN v2)
- [ ] Export section
- [ ] Validation section
- [ ] Examples
- [ ] `README.md` - Project overview update
- [ ] Mention CNN v2 capability
---
## File Structure
### New Files
```
# Shaders (generated by export script)
workspaces/main/shaders/cnn_v2_static.wgsl # Static features compute
workspaces/main/shaders/cnn_v2_layer_0.wgsl # Input layer (generated)
workspaces/main/shaders/cnn_v2_layer_1.wgsl # Inner layer (generated)
workspaces/main/shaders/cnn_v2_layer_2.wgsl # Output layer (generated)
# C++ implementation
src/gpu/effects/cnn_v2_effect.h # Effect class header
src/gpu/effects/cnn_v2_effect.cc # Effect implementation
# Python training/export
training/train_cnn_v2.py # Training script
training/export_cnn_v2_shader.py # Shader generator
training/validation/ # Test images directory
# Scripts
scripts/validate_cnn_v2.sh # End-to-end validation
# Documentation
doc/CNN_V2.md # This file
```
### Modified Files
```
src/gpu/demo_effects.h # Add CNNv2Effect include
CMakeLists.txt # Add cnn_v2_effect.cc
workspaces/main/assets.txt # Add cnn_v2 shaders
workspaces/main/timeline.seq # Optional: add CNNv2Effect
src/tests/gpu/test_demo_effects.cc # Add CNNv2 test case
src/tools/cnn_test_main.cc # Add --cnn-version v2
doc/HOWTO.md # Add CNN v2 sections
TODO.md # Add CNN v2 task
```
### Unchanged (v1 Preserved)
```
training/train_cnn.py # Original training
src/gpu/effects/cnn_effect.* # Original effect
workspaces/main/shaders/cnn_*.wgsl # Original shaders
```
---
## Performance Characteristics
### Static Features Compute
- **Cost:** ~0.1ms @ 1080p
- **Frequency:** Once per frame
- **Operations:** sin(), texture sampling, packing
### CNN Layers (Example 3-layer)
- **Layer0 (1×1, 8→16):** ~0.3ms
- **Layer1 (3×3, 24→8):** ~0.8ms
- **Layer2 (5×5, 16→4):** ~1.2ms
- **Total:** ~2.4ms @ 1080p
### Memory Usage
- Static features: 1920×1080×8×2 = 33 MB (f16)
- Layer buffers: 1920×1080×16×2 = 66 MB (max 16 channels)
- Weights: ~6.4 KB (f16, in shader code)
- **Total GPU memory:** ~100 MB (see the estimate sketch below)
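A back-of-the-envelope sketch behind these numbers (hypothetical helper `texture_mb`, 2 bytes per f16 value):
```python
def texture_mb(width, height, values_per_texel, bytes_per_value=2):
    """Rough texture size in MB for f16-packed data."""
    return width * height * values_per_texel * bytes_per_value / 1e6

print(texture_mb(1920, 1080, 8))    # static features: ~33 MB
print(texture_mb(1920, 1080, 16))   # widest layer (16 channels = two rgba32uint texels/pixel): ~66 MB
```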
---
## Size Budget
### CNN v1 vs v2
| Metric | v1 | v2 | Delta |
|--------|----|----|-------|
| Weights (count) | 800 | 3456 | +2656 |
| Storage (f32) | 3.2 KB | 13.5 KB | +10.3 KB |
| Storage (f16) | N/A | 6.8 KB | +6.8 KB |
| Shader code | ~500 lines | ~800 lines | +300 lines |
### Mitigation Strategies
**Reduce channels:**
- [16,8,4] → [8,4,4] cuts the weight count by ~47%
- [16,8,4] → [4,4,4] cuts the weight count by ~52%
**Smaller kernels:**
- [1,3,5] → [1,3,3] cuts the weight count by ~30%
- [1,3,5] → [1,1,3] cuts the weight count by ~74%
**Quantization:**
- int8 weights: 50% smaller than f16 (requires QAT)
- 4-bit weights: 75% smaller than f16 (extreme, needs research)
**Target:** Keep CNN v2 under 10 KB for the 64k demo size constraint (the sketch below puts numbers on the options above)
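A rough comparison of the options above, using the same counting formula as the Weight Calculations section (hypothetical helper `cnn_v2_weight_count`):
```python
def cnn_v2_weight_count(kernels, channels, static_dim=8):
    """Each layer sees the 8 static inputs (7 features + bias) plus the previous layer's channels."""
    total, prev = 0, 0
    for k, c_out in zip(kernels, channels):
        total += (static_dim + prev) * k * k * c_out
        prev = c_out
    return total

for kernels, channels in [([1, 3, 5], [16, 8, 4]),   # baseline example (~6.8 KB f16)
                          ([1, 3, 5], [8, 4, 4]),    # fewer channels
                          ([1, 3, 3], [16, 8, 4]),   # smaller output kernel
                          ([1, 1, 3], [16, 8, 4])]:  # mostly pointwise
    n = cnn_v2_weight_count(kernels, channels)
    print(kernels, channels, n, f"~{n * 2 / 1024:.2f} KiB as f16")
```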
---
## Future Extensions
### More Features (uint8 Packing)
```wgsl
// 16 uint8 features per texel (texture_storage_2d<rgba32uint>, 4 packed per u32)
// [R, G, B, D, uv.x, uv.y, sin10.x, sin10.y,
// sin20.x, sin20.y, dx, dy, gray_mip1, gray_mip2, variance, bias]
```
- Trade precision for quantity
- Requires quantization-aware training
### Temporal Features
- Previous frame RGBA (motion awareness)
- Optical flow vectors
- Requires multi-frame buffer
### Learned Position Encodings
- Replace hand-crafted sin(10\*uv) with learned embeddings
- Requires separate embedding network
- Similar to NeRF position encoding
### Dynamic Architecture
- Runtime kernel size selection based on scene
- Conditional layer execution (skip connections)
- Layer pruning for performance
---
## References
- **v1 Implementation:** `src/gpu/effects/cnn_effect.*`
- **Training Guide:** `doc/HOWTO.md` (CNN Training section)
- **Test Tool:** `doc/CNN_TEST_TOOL.md`
- **Shader System:** `doc/SEQUENCE.md`
- **Size Measurement:** `doc/SIZE_MEASUREMENT.md`
---
## Appendix: Design Decisions
### Why Bias as Static Feature?
**Alternatives considered:**
1. Separate bias array per layer (Option B)
2. Bias as static feature = 1.0 (Option A, chosen)
**Decision rationale:**
- Simpler shader code (fewer bindings)
- Standard NN formulation (augmented input)
- Saves 56-112 bytes per model
- 7 features sufficient for the initial v2 implementation
- Can extend to uint8 packing if >7 features needed
### Why Float16 for Weights?
**Alternatives considered:**
1. Keep f32 (larger, more accurate)
2. Use f16 (smaller, GPU-native)
3. Use int8 (smallest, needs QAT)
**Decision rationale:**
- f16 saves 50% vs f32 (critical for 64k target)
- GPU-native support (pack2x16float in WGSL)
- <0.1% accuracy loss (acceptable)
- Simpler than int8 quantization
### Why Multi-Frequency Position Encoding?
**Inspiration:** NeRF (Neural Radiance Fields)
**Benefits:**
- Helps network learn high-frequency details
- Better than raw UV coordinates
- Small footprint (1D per frequency)
**Future:** Add sin(20\*uv), sin(40\*uv) if >7 features available
---
**Document Version:** 1.0
**Last Updated:** 2026-02-12
**Status:** Design approved, ready for implementation