1 file changed, 28 insertions, 19 deletions
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 9a3efdf..ff8793f 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -267,22 +267,30 @@ Two source files:
 ```bash
 cd cnn_v3/training
 
-# Patch-based (default) — 64×64 patches around Harris corners
-python3 train_cnn_v3.py \
+# Recommended: [8,16] channels + multi-scale loss (matches runtime)
+uv run python3 train_cnn_v3.py \
     --input dataset/ \
-    --input-mode simple \
-    --epochs 200
+    --enc-channels 8,16 \
+    --epochs 5000 \
+    --checkpoint-dir checkpoints_8_16
 
 # Full-image mode (resizes to 256×256)
-python3 train_cnn_v3.py \
+uv run python3 train_cnn_v3.py \
     --input dataset/ \
-    --input-mode full \
+    --enc-channels 8,16 \
     --full-image --image-size 256 \
-    --epochs 500
+    --epochs 5000
+
+# Size-budget variant: [4,8] channels (fits in 6 KB)
+uv run python3 train_cnn_v3.py \
+    --input dataset/ \
+    --enc-channels 4,8 \
+    --epochs 5000
 
 # Quick smoke test: 1 epoch, small patches, random detector
-python3 train_cnn_v3.py \
+uv run python3 train_cnn_v3.py \
     --input dataset/ --epochs 1 \
+    --enc-channels 8,16 \
     --patch-size 32 --detector random
 ```
 
@@ -318,7 +326,7 @@ All other flags (`--epochs`, `--lr`, `--checkpoint-dir`, `--enc-channels`, etc.)
 | `--detector` | `harris` | `harris` \| `shi-tomasi` \| `fast` \| `gradient` \| `random` |
 | `--channel-dropout-p F` | `0.3` | Dropout prob for geometric channels |
 | `--full-image` | off | Resize full image instead of cropping patches |
-| `--enc-channels C` | `4,8` | Encoder channel counts, comma-separated |
+| `--enc-channels C` | `4,8` | Encoder channel counts, comma-separated: `8,16` (matches the current runtime), `4,8` (size budget) |
 | `--film-cond-dim N` | `5` | FiLM conditioning input size |
 | `--epochs N` | `200` | Training epochs |
 | `--batch-size N` | `16` | Batch size |
@@ -397,6 +405,7 @@ Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch referen
 | 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
 | 6 — FiLM MLP training | ✅ Done | train_cnn_v3.py + cnn_v3_utils.py written |
 | 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
+| 8 — Architecture upgrade [8,16] | ✅ Done | enc_channels=[8,16], multi-scale loss, 16ch textures split into lo/hi pairs |
 | 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |
 
 ---
@@ -408,10 +417,10 @@ The common snippet provides `get_w()` and `unpack_8ch()`.
 
 | Pass | Shader | Input(s) | Output | Dims |
 |------|--------|----------|--------|------|
-| enc0 | `cnn_v3_enc0.wgsl` | feat_tex0+feat_tex1 (20ch) | enc0_tex rgba16float (4ch) | full |
-| enc1 | `cnn_v3_enc1.wgsl` | enc0_tex (AvgPool2×2 inline) | enc1_tex rgba32uint (8ch) | ½ |
-| bottleneck | `cnn_v3_bottleneck.wgsl` | enc1_tex (AvgPool2×2 inline) | bottleneck_tex rgba32uint (8ch) | ¼ |
-| dec1 | `cnn_v3_dec1.wgsl` | bottleneck_tex + enc1_tex (skip) | dec1_tex rgba16float (4ch) | ½ |
+| enc0 | `cnn_v3_enc0.wgsl` | feat_tex0+feat_tex1 (20ch) | enc0_tex rgba32uint (8ch) | full |
+| enc1 | `cnn_v3_enc1.wgsl` | enc0_tex (AvgPool2×2 inline) | enc1_lo+enc1_hi rgba32uint (16ch split) | ½ |
+| bottleneck | `cnn_v3_bottleneck.wgsl` | enc1_lo+enc1_hi (AvgPool2×2 inline) | bn_lo+bn_hi rgba32uint (16ch split) | ¼ |
+| dec1 | `cnn_v3_dec1.wgsl` | bn_lo+bn_hi + enc1_lo+enc1_hi (skip) | dec1_tex rgba32uint (8ch) | ½ |
 | dec0 | `cnn_v3_dec0.wgsl` | dec1_tex + enc0_tex (skip) | output_tex rgba16float (4ch) | full |
 
 **Parity rules baked into the shaders:**
@@ -437,12 +446,12 @@ FiLM γ/β are computed CPU-side by the FiLM MLP (Phase 4) and uploaded each fra
 **Weight offsets** (f16 units, including bias):
 | Layer | Weights | Bias | Total f16 |
 |-------|---------|------|-----------|
-| enc0  | 20×4×9=720 | +4 | 724 |
-| enc1  | 4×8×9=288  | +8 | 296 |
-| bottleneck | 8×8×9=576 | +8 | 584 |
-| dec1  | 16×4×9=576 | +4 | 580 |
-| dec0  | 8×4×9=288  | +4 | 292 |
-| **Total** | | | **2476 f16 = ~4.84 KB** |
+| enc0  | 20×8×9=1440 | +8  | 1448 |
+| enc1  | 8×16×9=1152 | +16 | 1168 |
+| bottleneck | 16×16×9=2304 | +16 | 2320 |
+| dec1  | 32×8×9=2304 | +8  | 2312 |
+| dec0  | 16×4×9=576  | +4  | 580  |
+| **Total** | | | **7828 f16 = ~15.3 KB** |
 
 **Asset IDs** (registered in `workspaces/main/assets.txt` + `src/effects/shaders.cc`):
 `SHADER_CNN_V3_COMMON`, `SHADER_CNN_V3_ENC0`, `SHADER_CNN_V3_ENC1`,