Diffstat (limited to 'cnn_v3/docs/HOWTO.md')
| -rw-r--r-- | cnn_v3/docs/HOWTO.md | 47 |
1 files changed, 28 insertions, 19 deletions
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 9a3efdf..ff8793f 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -267,22 +267,30 @@ Two source files:
 ```bash
 cd cnn_v3/training
 
-# Patch-based (default) — 64×64 patches around Harris corners
-python3 train_cnn_v3.py \
+# Recommended: [8,16] channels + multi-scale loss (matches runtime)
+uv run python3 train_cnn_v3.py \
   --input dataset/ \
-  --input-mode simple \
-  --epochs 200
+  --enc-channels 8,16 \
+  --epochs 5000 \
+  --checkpoint-dir checkpoints_8_16
 
 # Full-image mode (resizes to 256×256)
-python3 train_cnn_v3.py \
+uv run python3 train_cnn_v3.py \
   --input dataset/ \
-  --input-mode full \
+  --enc-channels 8,16 \
   --full-image --image-size 256 \
-  --epochs 500
+  --epochs 5000
+
+# Size-budget variant [4,8] (fits 6 KB)
+uv run python3 train_cnn_v3.py \
+  --input dataset/ \
+  --enc-channels 4,8 \
+  --epochs 5000
 
 # Quick smoke test: 1 epoch, small patches, random detector
-python3 train_cnn_v3.py \
+uv run python3 train_cnn_v3.py \
   --input dataset/ --epochs 1 \
+  --enc-channels 8,16 \
   --patch-size 32 --detector random
 ```
 
@@ -318,7 +326,7 @@ All other flags (`--epochs`, `--lr`, `--checkpoint-dir`, `--enc-channels`, etc.)
 | `--detector` | `harris` | `harris` \| `shi-tomasi` \| `fast` \| `gradient` \| `random` |
 | `--channel-dropout-p F` | `0.3` | Dropout prob for geometric channels |
 | `--full-image` | off | Resize full image instead of cropping patches |
-| `--enc-channels C` | `4,8` | Encoder channel counts, comma-separated |
+| `--enc-channels C` | `4,8` | Encoder channel counts: `8,16` (current default runtime), `4,8` (size budget) |
 | `--film-cond-dim N` | `5` | FiLM conditioning input size |
 | `--epochs N` | `200` | Training epochs |
 | `--batch-size N` | `16` | Batch size |
@@ -397,6 +405,7 @@ Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch referen
 | 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
 | 6 — FiLM MLP training | ✅ Done | train_cnn_v3.py + cnn_v3_utils.py written |
 | 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
+| 8 — Architecture upgrade [8,16] | ✅ Done | enc_channels=[8,16], multi-scale loss, 16ch textures split into lo/hi pairs |
 | 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |
 
 ---
@@ -408,10 +417,10 @@ The common snippet provides `get_w()` and `unpack_8ch()`.
 
 | Pass | Shader | Input(s) | Output | Dims |
 |------|--------|----------|--------|------|
-| enc0 | `cnn_v3_enc0.wgsl` | feat_tex0+feat_tex1 (20ch) | enc0_tex rgba16float (4ch) | full |
-| enc1 | `cnn_v3_enc1.wgsl` | enc0_tex (AvgPool2×2 inline) | enc1_tex rgba32uint (8ch) | ½ |
-| bottleneck | `cnn_v3_bottleneck.wgsl` | enc1_tex (AvgPool2×2 inline) | bottleneck_tex rgba32uint (8ch) | ¼ |
-| dec1 | `cnn_v3_dec1.wgsl` | bottleneck_tex + enc1_tex (skip) | dec1_tex rgba16float (4ch) | ½ |
+| enc0 | `cnn_v3_enc0.wgsl` | feat_tex0+feat_tex1 (20ch) | enc0_tex rgba32uint (8ch) | full |
+| enc1 | `cnn_v3_enc1.wgsl` | enc0_tex (AvgPool2×2 inline) | enc1_lo+enc1_hi rgba32uint (16ch split) | ½ |
+| bottleneck | `cnn_v3_bottleneck.wgsl` | enc1_lo+enc1_hi (AvgPool2×2 inline) | bn_lo+bn_hi rgba32uint (16ch split) | ¼ |
+| dec1 | `cnn_v3_dec1.wgsl` | bn_lo+bn_hi + enc1_lo+enc1_hi (skip) | dec1_tex rgba32uint (8ch) | ½ |
 | dec0 | `cnn_v3_dec0.wgsl` | dec1_tex + enc0_tex (skip) | output_tex rgba16float (4ch) | full |
 
 **Parity rules baked into the shaders:**
@@ -437,12 +446,12 @@ FiLM γ/β are computed CPU-side by the FiLM MLP (Phase 4) and uploaded each fra
 **Weight offsets** (f16 units, including bias):
 | Layer | Weights | Bias | Total f16 |
 |-------|---------|------|-----------|
-| enc0 | 20×4×9=720 | +4 | 724 |
-| enc1 | 4×8×9=288 | +8 | 296 |
-| bottleneck | 8×8×9=576 | +8 | 584 |
-| dec1 | 16×4×9=576 | +4 | 580 |
-| dec0 | 8×4×9=288 | +4 | 292 |
-| **Total** | | | **2476 f16 = ~4.84 KB** |
+| enc0 | 20×8×9=1440 | +8 | 1448 |
+| enc1 | 8×16×9=1152 | +16 | 1168 |
+| bottleneck | 16×16×9=2304 | +16 | 2320 |
+| dec1 | 32×8×9=2304 | +8 | 2312 |
+| dec0 | 16×4×9=576 | +4 | 580 |
+| **Total** | | | **7828 f16 = ~15.3 KB** |
 
 **Asset IDs** (registered in `workspaces/main/assets.txt` + `src/effects/shaders.cc`):
 `SHADER_CNN_V3_COMMON`, `SHADER_CNN_V3_ENC0`, `SHADER_CNN_V3_ENC1`,
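
The weight-offset rows in the last hunk all appear to follow one rule: `in_channels × out_channels × 9` weights for a 3×3 kernel, plus one bias per output channel. The decoder inputs are doubled by the skip concatenation. The sketch below recomputes both totals from that rule as a cross-check. The helper names `layer_sizes` and `total_f16` are hypothetical and not part of the repository. The 20-channel input and 4-channel output are taken from the pass table, and "KB" is assumed to mean 1024 bytes.

```python
def layer_sizes(enc_channels, in_ch=20, out_ch=4, kernel=3):
    """Return f16 counts (weights + bias) per layer for a two-level U-Net.

    Hypothetical helper: the layer shapes are inferred from the tables in
    the diff, not read from the training code.
    """
    c0, c1 = enc_channels
    taps = kernel * kernel  # 9 taps for a 3x3 convolution
    shapes = {
        "enc0": (in_ch, c0),
        "enc1": (c0, c1),
        "bottleneck": (c1, c1),
        "dec1": (c1 + c1, c0),      # bottleneck output + enc1 skip
        "dec0": (c0 + c0, out_ch),  # dec1 output + enc0 skip
    }
    # weights = cin * cout * taps, bias = one value per output channel
    return {name: cin * cout * taps + cout for name, (cin, cout) in shapes.items()}


def total_f16(enc_channels):
    """Total number of f16 values across all five layers."""
    return sum(layer_sizes(enc_channels).values())


if __name__ == "__main__":
    for cfg in ((4, 8), (8, 16)):
        n = total_f16(cfg)
        # Each f16 value occupies 2 bytes.
        print(cfg, n, "f16 =", round(n * 2 / 1024, 2), "KiB")
```

Under these assumptions the `4,8` configuration comes to 2476 f16 values and the `8,16` configuration to 7828, matching the removed and added **Total** rows. Only the smaller variant stays within the 6 KB budget mentioned in the training example.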
