Diffstat (limited to 'cnn_v3/docs/HOWTO.md')
-rw-r--r-- cnn_v3/docs/HOWTO.md | 47
1 files changed, 28 insertions, 19 deletions
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 9a3efdf..ff8793f 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -267,22 +267,30 @@ Two source files:
```bash
cd cnn_v3/training
-# Patch-based (default) — 64×64 patches around Harris corners
-python3 train_cnn_v3.py \
+# Recommended: [8,16] channels + multi-scale loss (matches runtime)
+uv run python3 train_cnn_v3.py \
--input dataset/ \
- --input-mode simple \
- --epochs 200
+ --enc-channels 8,16 \
+ --epochs 5000 \
+ --checkpoint-dir checkpoints_8_16
# Full-image mode (resizes to 256×256)
-python3 train_cnn_v3.py \
+uv run python3 train_cnn_v3.py \
--input dataset/ \
- --input-mode full \
+ --enc-channels 8,16 \
--full-image --image-size 256 \
- --epochs 500
+ --epochs 5000
+
+# Size-budget variant [4,8] (fits 6 KB)
+uv run python3 train_cnn_v3.py \
+ --input dataset/ \
+ --enc-channels 4,8 \
+ --epochs 5000
# Quick smoke test: 1 epoch, small patches, random detector
-python3 train_cnn_v3.py \
+uv run python3 train_cnn_v3.py \
--input dataset/ --epochs 1 \
+ --enc-channels 8,16 \
--patch-size 32 --detector random
```
@@ -318,7 +326,7 @@ All other flags (`--epochs`, `--lr`, `--checkpoint-dir`, `--enc-channels`, etc.)
| `--detector` | `harris` | `harris` \| `shi-tomasi` \| `fast` \| `gradient` \| `random` |
| `--channel-dropout-p F` | `0.3` | Dropout prob for geometric channels |
| `--full-image` | off | Resize full image instead of cropping patches |
-| `--enc-channels C` | `4,8` | Encoder channel counts, comma-separated |
+| `--enc-channels C` | `4,8` | Encoder channel counts, comma-separated: `8,16` (matches the current runtime), `4,8` (size budget) |
| `--film-cond-dim N` | `5` | FiLM conditioning input size |
| `--epochs N` | `200` | Training epochs |
| `--batch-size N` | `16` | Batch size |
@@ -397,6 +405,7 @@ Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch referen
| 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
| 6 — FiLM MLP training | ✅ Done | train_cnn_v3.py + cnn_v3_utils.py written |
| 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
+| 8 — Architecture upgrade [8,16] | ✅ Done | enc_channels=[8,16], multi-scale loss, 16ch textures split into lo/hi pairs |
| 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |
---
@@ -408,10 +417,10 @@ The common snippet provides `get_w()` and `unpack_8ch()`.
| Pass | Shader | Input(s) | Output | Dims |
|------|--------|----------|--------|------|
-| enc0 | `cnn_v3_enc0.wgsl` | feat_tex0+feat_tex1 (20ch) | enc0_tex rgba16float (4ch) | full |
-| enc1 | `cnn_v3_enc1.wgsl` | enc0_tex (AvgPool2×2 inline) | enc1_tex rgba32uint (8ch) | ½ |
-| bottleneck | `cnn_v3_bottleneck.wgsl` | enc1_tex (AvgPool2×2 inline) | bottleneck_tex rgba32uint (8ch) | ¼ |
-| dec1 | `cnn_v3_dec1.wgsl` | bottleneck_tex + enc1_tex (skip) | dec1_tex rgba16float (4ch) | ½ |
+| enc0 | `cnn_v3_enc0.wgsl` | feat_tex0+feat_tex1 (20ch) | enc0_tex rgba32uint (8ch) | full |
+| enc1 | `cnn_v3_enc1.wgsl` | enc0_tex (AvgPool2×2 inline) | enc1_lo+enc1_hi rgba32uint (16ch split) | ½ |
+| bottleneck | `cnn_v3_bottleneck.wgsl` | enc1_lo+enc1_hi (AvgPool2×2 inline) | bn_lo+bn_hi rgba32uint (16ch split) | ¼ |
+| dec1 | `cnn_v3_dec1.wgsl` | bn_lo+bn_hi + enc1_lo+enc1_hi (skip) | dec1_tex rgba32uint (8ch) | ½ |
| dec0 | `cnn_v3_dec0.wgsl` | dec1_tex + enc0_tex (skip) | output_tex rgba16float (4ch) | full |
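
The pass table stores 8 channels per `rgba32uint` texel and splits 16-channel activations into lo/hi texture pairs. The sketch below illustrates one way such a layout can work. It assumes two f16 values per 32-bit word with the first value in the low 16 bits (the convention of WGSL's `pack2x16float`); the diff does not state the actual packing order used by the shaders. The helper names `pack_8ch_ref`, `unpack_8ch_ref`, and `split_16ch_ref` are hypothetical and are not part of the repository.

```python
import struct


def pack_8ch_ref(values):
    """Pack 8 floats as f16 into 4 uint32 words (one rgba32uint texel).

    Assumption: two f16 per word, first value in the low 16 bits.
    """
    if len(values) != 8:
        raise ValueError("expected exactly 8 channel values")
    raw = struct.pack("<8e", *values)       # 8 half-precision floats, 16 bytes
    return list(struct.unpack("<4I", raw))  # reinterpret as 4 little-endian u32


def unpack_8ch_ref(words):
    """Inverse of pack_8ch_ref: 4 uint32 words back to 8 floats."""
    if len(words) != 4:
        raise ValueError("expected exactly 4 words")
    raw = struct.pack("<4I", *words)
    return list(struct.unpack("<8e", raw))


def split_16ch_ref(values):
    """Split 16 channels into a (lo, hi) pair of packed texels."""
    if len(values) != 16:
        raise ValueError("expected exactly 16 channel values")
    return pack_8ch_ref(values[:8]), pack_8ch_ref(values[8:])
```

Under this layout, channels 0 to 7 land in the lo texture and channels 8 to 15 in the hi texture, so a pass that consumes 16 channels reads two texels per pixel.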
**Parity rules baked into the shaders:**
@@ -437,12 +446,12 @@ FiLM γ/β are computed CPU-side by the FiLM MLP (Phase 4) and uploaded each fra
**Weight offsets** (f16 units, including bias):
| Layer | Weights | Bias | Total f16 |
|-------|---------|------|-----------|
-| enc0 | 20×4×9=720 | +4 | 724 |
-| enc1 | 4×8×9=288 | +8 | 296 |
-| bottleneck | 8×8×9=576 | +8 | 584 |
-| dec1 | 16×4×9=576 | +4 | 580 |
-| dec0 | 8×4×9=288 | +4 | 292 |
-| **Total** | | | **2476 f16 = ~4.84 KB** |
+| enc0 | 20×8×9=1440 | +8 | 1448 |
+| enc1 | 8×16×9=1152 | +16 | 1168 |
+| bottleneck | 16×16×9=2304 | +16 | 2320 |
+| dec1 | 32×8×9=2304 | +8 | 2312 |
+| dec0 | 16×4×9=576 | +4 | 580 |
+| **Total** | | | **7828 f16 = ~15.3 KB** |
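
The totals in the weight-offset table can be reproduced with a short calculation. This is a sketch only: it assumes every layer is a 3×3 convolution with one bias per output channel, a 20-channel feature input, and a 4-channel output, as the table rows imply. The function names are hypothetical and are not taken from `train_cnn_v3.py`.

```python
def conv3x3_f16_count(c_in, c_out):
    """f16 values for one 3x3 conv layer: weights plus one bias per output channel."""
    return c_in * c_out * 9 + c_out


def unet_weight_budget(enc_channels, feat_channels=20, out_channels=4):
    """Return (per-layer counts, total f16 count, size in KB at 2 bytes per f16)."""
    e0, e1 = enc_channels
    layers = {
        "enc0": (feat_channels, e0),
        "enc1": (e0, e1),
        "bottleneck": (e1, e1),
        "dec1": (e1 + e1, e0),            # bottleneck output + enc1 skip
        "dec0": (e0 + e0, out_channels),  # dec1 output + enc0 skip
    }
    counts = {name: conv3x3_f16_count(ci, co) for name, (ci, co) in layers.items()}
    total = sum(counts.values())
    return counts, total, total * 2 / 1024


if __name__ == "__main__":
    for channels in ([4, 8], [8, 16]):
        counts, total, kb = unet_weight_budget(channels)
        print(channels, counts, total, f"{kb:.2f} KB")
```

With `[4, 8]` this gives 2476 f16 values (about 4.84 KB), and with `[8, 16]` it gives 7828 f16 values (about 15.3 KB), matching the old and new table rows.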
**Asset IDs** (registered in `workspaces/main/assets.txt` + `src/effects/shaders.cc`):
`SHADER_CNN_V3_COMMON`, `SHADER_CNN_V3_ENC0`, `SHADER_CNN_V3_ENC1`,