Diffstat (limited to 'cnn_v3/docs/HOWTO.md')
| -rw-r--r-- | cnn_v3/docs/HOWTO.md | 170 |
1 file changed, 164 insertions(+), 6 deletions(-)
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 5cfc371..9a3efdf 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -233,12 +233,13 @@ channel-dropout training.
 
 ```bash
 python3 cnn_v3/training/pack_photo_sample.py \
-    --photo cnn_v3/training/input/photo1.jpg \
+    --photo input/photo1.jpg \
+    --target target/photo1_styled.png \
     --output dataset/photos/sample_001/
 ```
 
-The output `target.png` defaults to the input photo (no style). Copy in
-your stylized version as `target.png` before training.
+`--target` is required and must be a stylized ground-truth image at the same
+resolution as the photo. The script writes it as `target.png` in the sample dir.
 
 ### Dataset layout
@@ -285,10 +286,31 @@ python3 train_cnn_v3.py \
     --patch-size 32 --detector random
 ```
 
+### Single-sample training
+
+Use `--single-sample <dir>` to train on one specific sample directory.
+Implies `--full-image` and `--batch-size 1` automatically.
+
+```bash
+# Pack input/target pair into a sample directory first
+python3 pack_photo_sample.py \
+    --photo input/photo1.png \
+    --target target/photo1_styled.png \
+    --output dataset/simple/sample_001/
+
+# Train on that sample only
+python3 train_cnn_v3.py \
+    --single-sample dataset/simple/sample_001/ \
+    --epochs 500
+```
+
+All other flags (`--epochs`, `--lr`, `--checkpoint-dir`, `--enc-channels`, etc.) work normally.
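The packing step above requires `--target` at the same resolution as the photo, so a quick pre-flight check can save a failed run. A minimal stdlib-only sketch, assuming both files are PNGs; `png_size` is a hypothetical helper for illustration, not part of `pack_photo_sample.py`:

```python
import struct

def png_size(path):
    """Read (width, height) straight from the PNG IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(24)
    # 8-byte PNG signature, 4-byte chunk length, then the IHDR tag
    if header[:8] != b"\x89PNG\r\n\x1a\n" or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG")
    return struct.unpack(">II", header[16:24])  # big-endian width, height

# Example use before packing (paths are illustrative):
# assert png_size("input/photo1.png") == png_size("target/photo1_styled.png"), \
#     "photo/target resolution mismatch"
```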
+
 ### Key flags
 
 | Flag | Default | Notes |
 |------|---------|-------|
+| `--single-sample DIR` | — | Train on one sample dir; implies `--full-image`, `--batch-size 1` |
 | `--input DIR` | `training/dataset` | Root with `full/` or `simple/` subdirs |
 | `--input-mode` | `simple` | `simple`=photos, `full`=Blender G-buffer |
 | `--patch-size N` | `64` | Patch crop size |
@@ -417,10 +439,10 @@ FiLM γ/β are computed CPU-side by the FiLM MLP (Phase 4) and uploaded each frame.
 |-------|---------|------|-----------|
 | enc0 | 20×4×9=720 | +4 | 724 |
 | enc1 | 4×8×9=288 | +8 | 296 |
-| bottleneck | 8×8×1=64 | +8 | 72 |
+| bottleneck | 8×8×9=576 | +8 | 584 |
 | dec1 | 16×4×9=576 | +4 | 580 |
 | dec0 | 8×4×9=288 | +4 | 292 |
-| **Total** | | | **2064 f16 = ~4 KB** |
+| **Total** | | | **2476 f16 = ~4.84 KB** |
 
 **Asset IDs** (registered in `workspaces/main/assets.txt` + `src/effects/shaders.cc`):
 `SHADER_CNN_V3_COMMON`, `SHADER_CNN_V3_ENC0`, `SHADER_CNN_V3_ENC1`,
@@ -587,9 +609,145 @@ Visualization panel still works.
 
 ---
 
-## 10. See Also
+## 10. Python / WGSL Parity Check (infer_cnn_v3 + cnn_test)
+
+Two complementary tools for comparing PyTorch inference against the live WGSL
+compute shaders on the same input image.
+
+### 10a. infer_cnn_v3.py — PyTorch reference inference
+
+**Location:** `cnn_v3/training/infer_cnn_v3.py`
+
+Runs the trained `CNNv3` model in Python and saves the RGBA output as PNG.
+
+**Simple mode** (single PNG, geometry zeroed):
+```bash
+cd cnn_v3/training
+python3 infer_cnn_v3.py photo.png out_python.png \
+    --checkpoint checkpoints/checkpoint_epoch_200.pth
+```
+
+**Full mode** (sample directory with all G-buffer files):
+```bash
+python3 infer_cnn_v3.py dataset/simple/sample_000/ out_python.png \
+    --checkpoint checkpoints/checkpoint_epoch_200.pth
+```
+
+**Identity FiLM** — bypass MLP, use γ=1 β=0 (matches C++ `cnn_test` default):
+```bash
+python3 infer_cnn_v3.py photo.png out_python.png \
+    --checkpoint checkpoints/checkpoint_epoch_200.pth \
+    --identity-film
+```
+
+**Options:**
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--checkpoint CKPT` | auto-find latest | Path to `.pth` checkpoint |
+| `--enc-channels C` | from checkpoint | `4,8` — must match training config |
+| `--cond F F F F F` | `0 0 0 0 0` | FiLM conditioning (beat_phase, beat_norm, audio, style0, style1) |
+| `--identity-film` | off | Bypass FiLM MLP, use γ=1 β=0 |
+| `--blend F` | `1.0` | Blend with albedo: 0=input, 1=CNN |
+| `--debug-hex` | off | Print first 8 output pixels as hex |
+
+In **simple mode**, geometry channels are zeroed: `normal=(0.5,0.5)` (oct-encodes
+to ≈(0,0,1)), `depth=0`, `matid=0`, `shadow=1`, `transp=0`.
+
+The checkpoint `config` dict (saved by `train_cnn_v3.py`) sets `enc_channels`
+and `film_cond_dim` automatically; `--enc-channels` is only needed if the
+checkpoint lacks a config key.
+
+---
+
+### 10b. cnn_test — WGSL / GPU reference inference
+
+**Location:** `tools/cnn_test.cc` **Binary:** `build/cnn_test`
+
+Packs the same 20-channel feature tensor as `infer_cnn_v3.py`, uploads it to
+GPU, runs the five `CNNv3Effect` compute passes, and saves the RGBA16Float
+output as PNG.
+
+**Build** (requires `DEMO_BUILD_TESTS=ON` or `DEMO_WORKSPACE=main`):
+```bash
+cmake -B build -DDEMO_BUILD_TESTS=ON && cmake --build build -j4 --target cnn_test
+```
+
+**Simple mode:**
+```bash
+./build/cnn_test photo.png out_gpu.png --weights workspaces/main/weights/cnn_v3_weights.bin
+```
+
+**Full mode** (sample directory):
+```bash
+./build/cnn_test dataset/simple/sample_000/albedo.png out_gpu.png \
+    --sample-dir dataset/simple/sample_000/ \
+    --weights workspaces/main/weights/cnn_v3_weights.bin
+```
+
+**Options:**
+
+| Flag | Description |
+|------|-------------|
+| `--sample-dir DIR` | Load all G-buffer files (albedo/normal/depth/matid/shadow/transp) |
+| `--weights FILE` | `cnn_v3_weights.bin` (uses asset-embedded weights if omitted) |
+| `--debug-hex` | Print first 8 output pixels as hex |
+| `--help` | Show usage |
+
+FiLM is always **identity** (γ=1, β=0) — matching the C++ `CNNv3Effect` default
+until GPU-side FiLM MLP evaluation is added.
+
+---
+
+### 10c. Side-by-side comparison
+
+For a pixel-accurate comparison, use `--identity-film` in Python and `--debug-hex`
+in both tools:
+
+```bash
+cd cnn_v3/training
+
+# 1. Python inference (identity FiLM)
+python3 infer_cnn_v3.py photo.png out_python.png \
+    --checkpoint checkpoints/checkpoint_epoch_200.pth \
+    --identity-film --debug-hex
+
+# 2. GPU inference (always identity FiLM)
+./build/cnn_test photo.png out_gpu.png \
+    --weights workspaces/main/weights/cnn_v3_weights.bin \
+    --debug-hex
+```
+
+Both tools print the first 8 pixels in the same format:
+```
+  [0] 0x7F804000 (0.4980 0.5020 0.2510 0.0000)
+```
+
+**Expected delta:** ≤ 1/255 (≈ 4e-3) per channel, matching the parity test
+(`test_cnn_v3_parity`). Larger deltas indicate a weight mismatch — re-export
+with `export_cnn_v3_weights.py` and verify the `.bin` size is 4952 bytes.
+
+---
+
+### 10d. Feature format note
+
+Both tools pack features in **training format** ([0,1] oct-encoded normals),
+not the runtime `gbuf_pack.wgsl` format (which remaps normals to [-1,1]).
+This makes `infer_cnn_v3.py` ↔ `cnn_test` directly comparable.
+
+The live pipeline (`GBufferEffect → gbuf_pack.wgsl → CNNv3Effect`) uses [-1,1]
+normals — that is the intended inference distribution after a full training run
+with `--input-mode full` (Blender renders). For training on photos
+(`--input-mode simple`), [0,1] normals are correct since channel dropout
+teaches the network to handle absent geometry.
+
+---
+
+## 11. See Also
 
 - `cnn_v3/docs/CNN_V3.md` — Full architecture design (U-Net, FiLM, feature layout)
 - `doc/EFFECT_WORKFLOW.md` — General effect integration guide
 - `cnn_v2/docs/CNN_V2.md` — Reference implementation (simpler, operational)
 - `src/tests/gpu/test_demo_effects.cc` — GBufferEffect + GBufViewEffect tests
+- `src/tests/gpu/test_cnn_v3_parity.cc` — Zero/random weight parity tests
+- `cnn_v3/training/export_cnn_v3_weights.py` — Export trained checkpoint → `.bin`
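As a sanity check on the corrected layer table and the 4952-byte weight file: every layer is a 3×3 convolution, so each in/out channel pair contributes 9 weights plus one bias per output channel. A short sketch re-deriving the counts; the layer widths are read off the table, and the decoder input widths (16 and 8) reflect the U-Net skip concatenation the table implies:

```python
# Re-derive the per-layer parameter counts from the layer table.
# 3x3 convolutions: in_ch * out_ch * 9 weights, plus out_ch biases.
layers = [
    ("enc0",       20, 4),
    ("enc1",        4, 8),
    ("bottleneck",  8, 8),
    ("dec1",       16, 4),  # 16 = bottleneck 8 + enc1 skip 8 (assumed)
    ("dec0",        8, 4),  # 8  = dec1 4 + enc0 skip 4 (assumed)
]

total = 0
for name, cin, cout in layers:
    params = cin * cout * 9 + cout
    print(f"{name:>10}: {cin}x{cout}x9={cin * cout * 9:4d}  +{cout}  -> {params}")
    total += params

print(f"total: {total} params = {total * 2} bytes as f16")
# total: 2476 params = 4952 bytes as f16
```

The f16 total (2476 × 2 bytes = 4952) matches the `.bin` size the parity section tells you to verify after export.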

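The expected-delta rule from the side-by-side comparison (≤ 1/255 per channel) can also be checked programmatically once both outputs are decoded to float arrays. A minimal sketch; PNG decoding is omitted and the function names are ours, not part of either tool:

```python
import numpy as np

def max_channel_delta(img_a, img_b):
    """Largest per-channel absolute difference between two float images."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    return float(np.abs(a - b).max())

def parity_ok(img_a, img_b, tol=1.0 / 255.0):
    """True when every channel of every pixel differs by at most tol."""
    return max_channel_delta(img_a, img_b) <= tol
```

With `out_python.png` and `out_gpu.png` decoded to arrays in [0, 1], `parity_ok(py, gpu)` mirrors the parity-test tolerance; a failure points at a weight mismatch, as the comparison section describes.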