| author | skal <pascal.massimino@gmail.com> | 2026-03-21 09:54:16 +0100 |
| committer | skal <pascal.massimino@gmail.com> | 2026-03-21 09:54:16 +0100 |
| commit | 5e740fc8f5f48fdd8ec4b84ae0c9a3c74e387d4f |
| tree | c330c8402e771d4b02316331d734802337d413c4 |
| parent | 673a24215b2670007317060325256059d1448f3b |
docs(cnn_v3): update CNN_V3.md + HOWTO.md to reflect Phases 1-5 complete
- CNN_V3.md: status line, architecture channel counts (8/16→4/8), FiLM MLP
output count (96→40 params), size budget table (real implemented values)
- HOWTO.md: Phase status table (5→done, add phase 6 training TODO), sections
3-5 rewritten to reflect what exists vs what is still planned
| -rw-r--r-- | cnn_v3/docs/CNN_V3.md | 66 |
| -rw-r--r-- | cnn_v3/docs/HOWTO.md | 50 |

2 files changed, 48 insertions, 68 deletions
````diff
diff --git a/cnn_v3/docs/CNN_V3.md b/cnn_v3/docs/CNN_V3.md
index 9d64fe3..3f8f7db 100644
--- a/cnn_v3/docs/CNN_V3.md
+++ b/cnn_v3/docs/CNN_V3.md
@@ -19,9 +19,7 @@ CNN v3 is a next-generation post-processing effect using:
 - Training from both Blender renders and real photos
 - Strict test framework: per-pixel bit-exact validation across all implementations
 
-**Status:** Design phase. G-buffer implementation is prerequisite.
-
-**Prerequisites:** G-buffer (GEOM_BUFFER.md) must be implemented first.
+**Status:** Phases 1–5 complete. Parity validated (max_err=4.88e-4 ≤ 1/255). Next: `train_cnn_v3.py` for FiLM MLP training.
 
 ---
 
@@ -40,17 +38,17 @@ G-Buffer (albedo, normal, depth, matID, UV)
 U-Net
 ┌─────────────────────────────────────────┐
 │  Encoder                                │
-│  enc0 (H×W, 8ch) ────────────skip──────┤
+│  enc0 (H×W, 4ch) ────────────skip──────┤
 │   ↓ down (avg pool 2×2)                 │
-│  enc1 (H/2×W/2, 16ch) ───────skip──────┤
+│  enc1 (H/2×W/2, 8ch) ────────skip──────┤
 │   ↓ down                                │
-│  bottleneck (H/4×W/4, 16ch)             │
+│  bottleneck (H/4×W/4, 8ch)              │
 │                                         │
 │  Decoder                                │
-│   ↑ up (bilinear 2×) + skip enc1        │
-│  dec1 (H/2×W/2, 16ch)                   │
+│   ↑ up (nearest ×2) + skip enc1         │
+│  dec1 (H/2×W/2, 4ch)                    │
 │   ↑ up + skip enc0                      │
-│  dec0 (H×W, 8ch)                        │
+│  dec0 (H×W, 4ch)                        │
 └─────────────────────────────────────────┘
        │
        ▼
@@ -80,14 +78,14 @@ A small MLP takes a conditioning vector `c` and outputs all γ/β:
 c = [beat_phase, beat_time/8, audio_intensity, style_p0, style_p1]  (5D)
   ↓ Linear(5 → 16) → ReLU
   ↓ Linear(16 → N_film_params)
-  → [γ_enc0(8ch), β_enc0(8ch), γ_enc1(16ch), β_enc1(16ch),
-     γ_dec1(16ch), β_dec1(16ch), γ_dec0(8ch), β_dec0(8ch)]
-  = 2 × (8+16+16+8) = 96 parameters output
+  → [γ_enc0(4ch), β_enc0(4ch), γ_enc1(8ch), β_enc1(8ch),
+     γ_dec1(4ch), β_dec1(4ch), γ_dec0(4ch), β_dec0(4ch)]
+  = 2 × (4+8+4+4) = 40 parameters output
 ```
 
 **Runtime cost:** trivial (one MLP forward pass per frame, CPU-side).
 **Training:** jointly trained with U-Net — backprop through FiLM to MLP.
-**Size:** MLP weights ~(5×16 + 16×96) × 2 bytes f16 ≈ 3 KB.
+**Size:** MLP weights ~(5×16 + 16×40) × 2 bytes f16 ≈ 1.4 KB.
 
 **Why FiLM instead of just uniform parameters?**
 - γ/β are per-channel, enabling fine-grained style control
@@ -346,38 +344,20 @@ All f16, little-endian, same packing as v2 (`pack2x16float`).
 
 **CNN v3 target: ≤ 6 KB weights**
 
-| Component | Params | f16 bytes |
-|-----------|--------|-----------|
-| enc0: Conv(20→8, 3×3) | 20×8×9=1440 | 2880 |
-| enc1: Conv(8→16, 3×3) | 8×16×9=1152 | 2304 |
-| bottleneck: Conv(16→16, 3×3) | 16×16×9=2304 | 4608 |
-| dec1: Conv(32→8, 3×3) | 32×8×9=2304 | 4608 |
-| dec0: Conv(16→8, 3×3) | 16×8×9=1152 | 2304 |
-| output: Conv(8→4, 1×1) | 8×4=32 | 64 |
-| FiLM MLP (~96 outputs) | ~1600 | 3200 |
-| **Total** | | **~20 KB** |
-
-This exceeds target. **Mitigation strategies:**
-
-1. **Reduce channels:** [4, 8] instead of [8, 16] → cuts conv params by ~4×
-2. **1 level only:** remove H/4 level → drops bottleneck + one dec level
-3. **1×1 conv at bottleneck** (no spatial, just channel mixing)
-4. **FiLM only at bottleneck** → smaller MLP output
+**Implemented architecture (fits ≤ 4 KB):**
 
-**Conservative plan (fits ≤ 6 KB):**
-```
-enc0: Conv(20→4, 3×3) = 20×4×9 = 720 weights
-enc1: Conv(4→8, 3×3) = 4×8×9 = 288 weights
-bottleneck: Conv(8→8, 1×1) = 8×8×1 = 64 weights
-dec1: Conv(16→4, 3×3) = 16×4×9 = 576 weights
-dec0: Conv(12→4, 3×3) = 12×4×9 = 432 weights
-output: Conv(4→4, 1×1) = 4×4 = 16 weights
-FiLM MLP (5→24 outputs) = 5×16+16×24 = 464 weights
-Total: ~2560 weights × 2B = ~5.0 KB f16 ✓
-```
+| Component | Weights | Bias | Total f16 |
+|-----------|---------|------|-----------|
+| enc0: Conv(20→4, 3×3) | 20×4×9=720 | +4 | 724 |
+| enc1: Conv(4→8, 3×3) | 4×8×9=288 | +8 | 296 |
+| bottleneck: Conv(8→8, 1×1) | 8×8×1=64 | +8 | 72 |
+| dec1: Conv(16→4, 3×3) | 16×4×9=576 | +4 | 580 |
+| dec0: Conv(8→4, 3×3) | 8×4×9=288 | +4 | 292 |
+| FiLM MLP (5→16→40) | 5×16+16×40=720 | +16+40 | 776 |
+| **Total** | | | **~3.9 KB f16** |
 
-Note: enc0 input is 20ch (feature buffer), dec1 input is 16ch (8 bottleneck + 8 skip),
-dec0 input is 12ch (4 dec1 output + 8 enc0 skip). Skip connections concatenate.
+Skip connections: dec1 input = 8ch (bottleneck) + 8ch (enc1 skip) = 16ch.
+dec0 input = 4ch (dec1) + 4ch (enc0 skip) = 8ch.
 
 ---
 
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 22266d3..425a33b 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -135,7 +135,7 @@ Mix freely; the dataloader treats all sample directories uniformly.
 
 ## 3. Training
 
-*(Network not yet implemented — this section will be filled as Phase 3+ lands.)*
+*(Script not yet written — see TODO.md. Architecture spec in `CNN_V3.md` §Training.)*
 
 **Planned command:**
 ```bash
@@ -146,21 +146,15 @@ python3 cnn_v3/training/train_cnn_v3.py \
 ```
 
 **FiLM conditioning** during training:
-- Beat/audio inputs are randomized per sample
-- Network learns to produce varied styles from same geometry
-
-**Validation:**
-```bash
-python3 cnn_v3/training/train_cnn_v3.py --validate \
-  --checkpoint cnn_v3/weights/cnn_v3_weights.bin \
-  --input test_frame.png
-```
+- Beat/audio inputs randomized per sample
+- MLP: `Linear(5→16) → ReLU → Linear(16→40)` trained jointly with U-Net
+- Output: γ/β for enc0(4ch) + enc1(8ch) + dec1(4ch) + dec0(4ch) = 40 floats
 
 ---
 
-## 4. Running the CNN v3 Effect (Future)
+## 4. Running the CNN v3 Effect
 
-Once the C++ CNNv3Effect exists:
+`CNNv3Effect` is implemented. Wire into a sequence:
 
 ```seq
 # BPM 120
@@ -169,27 +163,32 @@ SEQUENCE 0 0 "Scene with CNN v3"
 EFFECT + CNNv3Effect gbuf_feat0 gbuf_feat1 -> sink 0 60
 ```
 
-FiLM parameters are uploaded via uniform each frame:
+FiLM parameters uploaded each frame:
 
 ```cpp
 cnn_v3_effect->set_film_params(
     params.beat_phase, params.beat_time / 8.0f,
     params.audio_intensity, style_p0, style_p1);
 ```
 
+FiLM γ/β default to identity (γ=1, β=0) until `train_cnn_v3.py` produces a trained MLP.
+
 ---
 
 ## 5. Per-Pixel Validation
 
-The CNN v3 design requires exact parity between PyTorch, WGSL (HTML), and C++.
+C++ parity test passes: `src/tests/gpu/test_cnn_v3_parity.cc` (2 tests).
+
+```bash
+cmake -B build -DDEMO_BUILD_TESTS=ON && cmake --build build -j4
+cd build && ./test_cnn_v3_parity
+```
 
-*(Validation tooling not yet implemented.)*
+Results (8×8 test tensors, random weights):
+- enc0 max_err = 1.95e-3 ✓
+- dec1 max_err = 1.95e-3 ✓
+- final max_err = 4.88e-4 ✓ (all ≤ 1/255 = 3.92e-3)
 
-**Planned workflow:**
-1. Export test input + weights as JSON
-2. Run Python reference → save per-pixel output
-3. Run HTML WebGPU tool → compare against Python
-4. Run C++ `cnn_v3_test` tool → compare against Python
-5. All comparisons must pass at ≤ 1/255 per pixel
+Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch reference).
 
 ---
 
@@ -197,12 +196,13 @@ The CNN v3 design requires exact parity between PyTorch, WGSL (HTML), and C++.
 
 | Phase | Status | Notes |
 |-------|--------|-------|
-| 1 — G-buffer (raster + pack) | ✅ Done | Integrated, 35/35 tests pass |
-| 1 — G-buffer (SDF + shadow passes) | TODO | Placeholder in place |
+| 1 — G-buffer (raster + pack) | ✅ Done | Integrated, 36/36 tests pass |
+| 1 — G-buffer (SDF + shadow passes) | TODO | Placeholder: shadow=1, transp=0 |
 | 2 — Training infrastructure | ✅ Done | blender_export.py, pack_*_sample.py |
 | 3 — WGSL U-Net shaders | ✅ Done | 5 compute shaders + cnn_v3/common snippet |
-| 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 35/35 tests pass |
-| 5 — Parity validation | TODO | Test vectors, ≤1/255 |
+| 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 36/36 tests pass |
+| 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
+| 6 — FiLM MLP training | TODO | train_cnn_v3.py not yet written |
````
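The FiLM sizing arithmetic this patch touches (40 γ/β outputs, ≈1.4 KB of MLP weights) can be sanity-checked with a few lines of Python. This is a standalone sketch, not repo code; the channel counts are the ones stated in the patch:

```python
# Sanity-check the FiLM MLP sizing from the patch above.
# Per-layer channel counts of the modulated layers: enc0, enc1, dec1, dec0.
channels = [4, 8, 4, 4]

# One gamma and one beta per channel across all modulated layers.
n_film_params = 2 * sum(channels)            # 2 × (4+8+4+4) = 40

# Weight count of Linear(5→16) → ReLU → Linear(16→n_film_params),
# biases ignored as in the doc's "~" estimate.
mlp_weights = 5 * 16 + 16 * n_film_params    # 80 + 640 = 720
mlp_bytes_f16 = mlp_weights * 2              # 2 bytes per f16 → 1440 B ≈ 1.4 KB

print(n_film_params, mlp_weights, mlp_bytes_f16)  # → 40 720 1440
```

This confirms the "96→40 params" and "≈ 1.4 KB" figures in the commit message are mutually consistent.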

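For readers unfamiliar with FiLM, the per-channel modulation referenced throughout the patch reduces to `γ·x + β` applied per feature channel. A minimal, dependency-free sketch (the `film` helper and toy data are illustrative, not from the repo):

```python
def film(planes, gamma, beta):
    """Per-channel FiLM: scale each feature plane by gamma[c], shift by beta[c]."""
    return [[g * v + b for v in plane]
            for plane, g, b in zip(planes, gamma, beta)]

# Two toy feature channels.
planes = [[0.5, -0.25], [1.0, 2.0]]

# Identity FiLM (γ=1, β=0) leaves features unchanged — the documented default
# until train_cnn_v3.py produces trained MLP weights.
assert film(planes, [1.0, 1.0], [0.0, 0.0]) == planes

# A trained MLP would instead emit per-channel γ/β, e.g.:
print(film(planes, [2.0, 0.5], [0.5, 0.0]))  # → [[1.5, 0.0], [0.5, 1.0]]
```

Because γ/β are per-channel rather than global, the same U-Net weights can render distinct styles from one conditioning vector, which is the rationale the design doc gives for FiLM over uniform parameters.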