path: root/cnn_v3/docs/CNN_V3.md
author    skal <pascal.massimino@gmail.com>  2026-03-21 09:54:16 +0100
committer skal <pascal.massimino@gmail.com>  2026-03-21 09:54:16 +0100
commit    5e740fc8f5f48fdd8ec4b84ae0c9a3c74e387d4f (patch)
tree      c330c8402e771d4b02316331d734802337d413c4 /cnn_v3/docs/CNN_V3.md
parent    673a24215b2670007317060325256059d1448f3b (diff)
docs(cnn_v3): update CNN_V3.md + HOWTO.md to reflect Phases 1-5 complete
- CNN_V3.md: status line, architecture channel counts (8/16→4/8), FiLM MLP output count (96→40 params), size budget table (real implemented values)
- HOWTO.md: Phase status table (5→done, add phase 6 training TODO), sections 3-5 rewritten to reflect what exists vs what is still planned
Diffstat (limited to 'cnn_v3/docs/CNN_V3.md')
-rw-r--r-- | cnn_v3/docs/CNN_V3.md | 66
1 file changed, 23 insertions(+), 43 deletions(-)
diff --git a/cnn_v3/docs/CNN_V3.md b/cnn_v3/docs/CNN_V3.md
index 9d64fe3..3f8f7db 100644
--- a/cnn_v3/docs/CNN_V3.md
+++ b/cnn_v3/docs/CNN_V3.md
@@ -19,9 +19,7 @@ CNN v3 is a next-generation post-processing effect using:
- Training from both Blender renders and real photos
- Strict test framework: per-pixel bit-exact validation across all implementations
-**Status:** Design phase. G-buffer implementation is prerequisite.
-
-**Prerequisites:** G-buffer (GEOM_BUFFER.md) must be implemented first.
+**Status:** Phases 1–5 complete. Parity validated (max_err=4.88e-4 ≤ 1/255). Next: `train_cnn_v3.py` for FiLM MLP training.
---
@@ -40,17 +38,17 @@ G-Buffer (albedo, normal, depth, matID, UV)
U-Net
┌─────────────────────────────────────────┐
│ Encoder │
- │ enc0 (H×W, 8ch) ────────────skip──────┤
+ │ enc0 (H×W, 4ch) ────────────skip──────┤
│ ↓ down (avg pool 2×2) │
- │ enc1 (H/2×W/2, 16ch) ───────skip──────┤
+ │ enc1 (H/2×W/2, 8ch) ────────skip──────┤
│ ↓ down │
- │ bottleneck (H/4×W/4, 16ch) │
+ │ bottleneck (H/4×W/4, 8ch) │
│ │
│ Decoder │
- │ ↑ up (bilinear 2×) + skip enc1 │
- │ dec1 (H/2×W/2, 16ch) │
+ │ ↑ up (nearest ×2) + skip enc1 │
+ │ dec1 (H/2×W/2, 4ch) │
│ ↑ up + skip enc0 │
- │ dec0 (H×W, 8ch) │
+ │ dec0 (H×W, 4ch) │
└─────────────────────────────────────────┘
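The tiny U-Net above can be traced as a shape sketch. A minimal NumPy version, where `conv` is a stand-in that only mixes channels (a 1×1 matmul with random weights instead of the real trained 3×3 convolutions), just to verify the channel counts and skip concatenations:

```python
import numpy as np

def conv(x, c_out):
    # Stand-in for a trained 3x3 conv: random channel mix + ReLU (shapes only).
    k = np.random.randn(x.shape[-1], c_out).astype(np.float32) * 0.1
    return np.maximum(x @ k, 0.0)

def down(x):
    # 2x2 average pooling.
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def up(x):
    # Nearest-neighbour 2x upsample.
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.random.randn(64, 64, 20).astype(np.float32)   # 20ch feature buffer
e0 = conv(x, 4)                                      # enc0: HxW, 4ch (skip)
e1 = conv(down(e0), 8)                               # enc1: H/2xW/2, 8ch (skip)
b = conv(down(e1), 8)                                # bottleneck: H/4xW/4, 8ch
d1 = conv(np.concatenate([up(b), e1], axis=-1), 4)   # dec1 in: 8+8=16ch -> 4ch
d0 = conv(np.concatenate([up(d1), e0], axis=-1), 4)  # dec0 in: 4+4=8ch -> 4ch
```

Running this confirms the skip-concatenation widths (16ch into dec1, 8ch into dec0) match the conv sizes in the budget table further down.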
@@ -80,14 +78,14 @@ A small MLP takes a conditioning vector `c` and outputs all γ/β:
c = [beat_phase, beat_time/8, audio_intensity, style_p0, style_p1] (5D)
↓ Linear(5 → 16) → ReLU
↓ Linear(16 → N_film_params)
- → [γ_enc0(8ch), β_enc0(8ch), γ_enc1(16ch), β_enc1(16ch),
- γ_dec1(16ch), β_dec1(16ch), γ_dec0(8ch), β_dec0(8ch)]
- = 2 × (8+16+16+8) = 96 parameters output
+ → [γ_enc0(4ch), β_enc0(4ch), γ_enc1(8ch), β_enc1(8ch),
+ γ_dec1(4ch), β_dec1(4ch), γ_dec0(4ch), β_dec0(4ch)]
+ = 2 × (4+8+4+4) = 40 parameters output
```
**Runtime cost:** trivial (one MLP forward pass per frame, CPU-side).
**Training:** jointly trained with U-Net — backprop through FiLM to MLP.
-**Size:** MLP weights ~(5×16 + 16×96) × 2 bytes f16 ≈ 3 KB.
+**Size:** MLP weights ~(5×16 + 16×40) × 2 bytes f16 ≈ 1.4 KB.
**Why FiLM instead of just uniform parameters?**
- γ/β are per-channel, enabling fine-grained style control
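The 5→16→40 FiLM MLP can be sketched in a few lines. Random weights stand in for the trained f16 tensors, and the per-stage slicing order (enc0, enc1, dec1, dec0) is an assumption about the packing:

```python
import numpy as np

# Random stand-ins for the trained MLP weights.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((5, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((16, 40)), np.zeros(40)

def film_params(c):
    h = np.maximum(c @ W1 + b1, 0.0)  # Linear(5 -> 16) + ReLU
    out = h @ W2 + b2                 # Linear(16 -> 40)
    # Slice into per-stage (gamma, beta): 2 * (4 + 8 + 4 + 4) = 40 values.
    params, i = {}, 0
    for name, n in [("enc0", 4), ("enc1", 8), ("dec1", 4), ("dec0", 4)]:
        params[name] = (out[i:i + n], out[i + n:i + 2 * n])
        i += 2 * n
    return params

c = np.array([0.25, 0.5, 0.8, 0.0, 1.0])  # beat_phase, beat_time/8, intensity, p0, p1
gamma_beta = film_params(c)
```

Since `c` changes once per frame, this forward pass runs CPU-side and only the 40 resulting γ/β scalars need to reach the GPU.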
@@ -346,38 +344,20 @@ All f16, little-endian, same packing as v2 (`pack2x16float`).
**CNN v3 target: ≤ 6 KB weights**
-| Component | Params | f16 bytes |
-|-----------|--------|-----------|
-| enc0: Conv(20→8, 3×3) | 20×8×9=1440 | 2880 |
-| enc1: Conv(8→16, 3×3) | 8×16×9=1152 | 2304 |
-| bottleneck: Conv(16→16, 3×3) | 16×16×9=2304 | 4608 |
-| dec1: Conv(32→8, 3×3) | 32×8×9=2304 | 4608 |
-| dec0: Conv(16→8, 3×3) | 16×8×9=1152 | 2304 |
-| output: Conv(8→4, 1×1) | 8×4=32 | 64 |
-| FiLM MLP (~96 outputs) | ~1600 | 3200 |
-| **Total** | | **~20 KB** |
-
-This exceeds target. **Mitigation strategies:**
-
-1. **Reduce channels:** [4, 8] instead of [8, 16] → cuts conv params by ~4×
-2. **1 level only:** remove H/4 level → drops bottleneck + one dec level
-3. **1×1 conv at bottleneck** (no spatial, just channel mixing)
-4. **FiLM only at bottleneck** → smaller MLP output
+**Implemented architecture (~5.4 KB, within the ≤ 6 KB target):**
-**Conservative plan (fits ≤ 6 KB):**
-```
-enc0: Conv(20→4, 3×3) = 20×4×9 = 720 weights
-enc1: Conv(4→8, 3×3) = 4×8×9 = 288 weights
-bottleneck: Conv(8→8, 1×1) = 8×8×1 = 64 weights
-dec1: Conv(16→4, 3×3) = 16×4×9 = 576 weights
-dec0: Conv(12→4, 3×3) = 12×4×9 = 432 weights
-output: Conv(4→4, 1×1) = 4×4 = 16 weights
-FiLM MLP (5→24 outputs) = 5×16+16×24 = 464 weights
-Total: ~2560 weights × 2B = ~5.0 KB f16 ✓
-```
+| Component | Weights | Bias | Total f16 |
+|-----------|---------|------|-----------|
+| enc0: Conv(20→4, 3×3) | 20×4×9=720 | +4 | 724 |
+| enc1: Conv(4→8, 3×3) | 4×8×9=288 | +8 | 296 |
+| bottleneck: Conv(8→8, 1×1) | 8×8×1=64 | +8 | 72 |
+| dec1: Conv(16→4, 3×3) | 16×4×9=576 | +4 | 580 |
+| dec0: Conv(8→4, 3×3) | 8×4×9=288 | +4 | 292 |
+| FiLM MLP (5→16→40) | 5×16+16×40=720 | +16+40 | 776 |
+| **Total** | | | **2740 ≈ 5.4 KB f16** |
-Note: enc0 input is 20ch (feature buffer), dec1 input is 16ch (8 bottleneck + 8 skip),
-dec0 input is 12ch (4 dec1 output + 8 enc0 skip). Skip connections concatenate.
+Skip connections: dec1 input = 8ch (bottleneck) + 8ch (enc1 skip) = 16ch.
+dec0 input = 4ch (dec1) + 4ch (enc0 skip) = 8ch.
---
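The table arithmetic can be checked directly. The conv layers alone come to 1964 f16 values (~3.9 KB); the FiLM MLP row adds 776 more, so the full model is 2740 values, i.e. 5480 bytes (~5.4 KB), under the 6 KB target:

```python
# Per-layer f16 value counts from the size budget table (weights + biases).
layers = {
    "enc0":       20 * 4 * 9 + 4,             # 724
    "enc1":       4 * 8 * 9 + 8,              # 296
    "bottleneck": 8 * 8 * 1 + 8,              # 72
    "dec1":       16 * 4 * 9 + 4,             # 580
    "dec0":       8 * 4 * 9 + 4,              # 292
    "film_mlp":   5 * 16 + 16 * 40 + 16 + 40, # 776
}
conv_total = sum(v for k, v in layers.items() if k != "film_mlp")
total = sum(layers.values())
print(conv_total, total, total * 2)  # 1964 convs, 2740 total -> 5480 bytes
```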