1 files changed, 17 insertions, 24 deletions
diff --git a/cnn_v3/docs/HOW_TO_CNN.md b/cnn_v3/docs/HOW_TO_CNN.md
index f5f1b1a..09db97c 100644
--- a/cnn_v3/docs/HOW_TO_CNN.md
+++ b/cnn_v3/docs/HOW_TO_CNN.md
@@ -28,26 +28,13 @@ CNN v3 is a 2-level U-Net with FiLM conditioning, designed to run in real-time a
 
 **Architecture:**
 
-```
-Input: 20-channel G-buffer feature textures (rgba32uint)
-  │
-  enc0 ──── Conv(20→4, 3×3) + FiLM + ReLU         ┐ full res
-  │    ↘ skip                                       │
-  enc1 ──── AvgPool2×2 + Conv(4→8, 3×3) + FiLM    ┐ ½ res
-  │    ↘ skip                                       │
-  bottleneck AvgPool2×2 + Conv(8→8, 1×1) + ReLU   ¼ res (no FiLM)
-  │                                                 │
-  dec1 ←── upsample×2 + cat(enc1 skip) + Conv(16→4, 3×3) + FiLM
-  │                                                 │ ½ res
-  dec0 ←── upsample×2 + cat(enc0 skip) + Conv(8→4, 3×3) + FiLM + sigmoid
-                                                    full res → RGBA output
-```
+![CNN v3 U-Net + FiLM Architecture](cnn_v3_architecture.png)
 
 **FiLM MLP:** `Linear(5→16) → ReLU → Linear(16→40)` trained jointly with U-Net.
 - Input: `[beat_phase, beat_norm, audio_intensity, style_p0, style_p1]`
 - Output: 40 γ/β values controlling style across all 4 FiLM layers
 
-**Weight budget:** ~3.9 KB f16 (fits ≤6 KB target)
+**Weight budget:** ~4.84 KB f16 conv (fits ≤6 KB target)
 
 **Two data paths:**
 - **Simple mode** — real photos with zeroed geometric channels (normal, depth, matid)
@@ -307,7 +294,9 @@ uv run train_cnn_v3.py --input dataset/ --epochs 1 \
 uv run train_cnn_v3.py \
     --input dataset/ \
     --input-mode simple \
-    --epochs 200
+    --epochs 200 \
+    --edge-loss-weight 0.1 \
+    --film-warmup-epochs 50
 ```
 
 **Blender G-buffer training:**
@@ -315,7 +304,9 @@ uv run train_cnn_v3.py \
 uv run train_cnn_v3.py \
     --input dataset/ \
     --input-mode full \
-    --epochs 200
+    --epochs 200 \
+    --edge-loss-weight 0.1 \
+    --film-warmup-epochs 50
 ```
 
 **Full-image mode (better global coherence, slower):**
@@ -360,12 +351,14 @@ uv run train_cnn_v3.py \
 | `--checkpoint-dir DIR` | `checkpoints/` | Set per-experiment |
 | `--checkpoint-every N` | 50 | 0 to disable intermediate checkpoints |
 | `--resume [CKPT]` | — | Resume from checkpoint path; if path missing, uses latest in `--checkpoint-dir` |
+| `--edge-loss-weight F` | 0.1 | Sobel gradient loss weight alongside MSE; improves style/edge capture; 0=MSE only |
+| `--film-warmup-epochs N` | 50 | Freeze FiLM MLP for first N epochs (phase-1), then unfreeze at lr×0.1; 0=joint training |
 
 ### Architecture at startup
 
 The model prints its parameter count:
 ```
-Model: enc=[4, 8]  film_cond_dim=5  params=2740  (~5.4 KB f16)
+Model: enc=[4, 8]  film_cond_dim=5  params=3252  (~6.4 KB f16)
 ```
 
 If `params` is much higher, `--enc-channels` was changed; update C++ constants accordingly.
@@ -489,7 +482,7 @@ Use `--html-output PATH` to write to a different `weights.js` location.
 
 Output files are registered in `workspaces/main/assets.txt` as:
 ```
-WEIGHTS_CNN_V3, BINARY, weights/cnn_v3_weights.bin, "CNN v3 conv weights (f16, 3928 bytes)"
+WEIGHTS_CNN_V3, BINARY, weights/cnn_v3_weights.bin, "CNN v3 conv weights (f16, 4952 bytes)"
 WEIGHTS_CNN_V3_FILM_MLP, BINARY, weights/cnn_v3_film_mlp.bin, "CNN v3 FiLM MLP weights (f32, 3104 bytes)"
 ```
 
@@ -501,10 +494,10 @@ WEIGHTS_CNN_V3_FILM_MLP, BINARY, weights/cnn_v3_film_mlp.bin, "CNN v3 FiLM MLP w
 |-------|-----------|-------|
 | enc0 Conv(20→4,3×3)+bias | 724 | — |
 | enc1 Conv(4→8,3×3)+bias | 296 | — |
-| bottleneck Conv(8→8,1×1)+bias | 72 | — |
+| bottleneck Conv(8→8,3×3,dil=2)+bias | 584 | — |
 | dec1 Conv(16→4,3×3)+bias | 580 | — |
 | dec0 Conv(8→4,3×3)+bias | 292 | — |
-| **Total** | **1964 f16** | **3928 bytes** |
+| **Total** | **2476 f16** | **4952 bytes** |
 
 **`cnn_v3_film_mlp.bin`** — FiLM MLP weights as raw f32, row-major:
 
@@ -534,8 +527,8 @@ Checkpoint: epoch=200  loss=0.012345
   enc_channels=[4, 8]  film_cond_dim=5
 
 cnn_v3_weights.bin
-  1964 f16 values → 982 u32 → 3928 bytes
-  Upload via CNNv3Effect::upload_weights(queue, data, 3928)
+  2476 f16 values → 1238 u32 → 4952 bytes
+  Upload via CNNv3Effect::upload_weights(queue, data, 4952)
 
 cnn_v3_film_mlp.bin
   L0: weight (16, 5) + bias (16,)
@@ -824,7 +817,7 @@ all geometric channels (normal, depth, depth_grad, mat_id, prev) = 0.
 ### Pitfalls
 
 - `rgba32uint` and `rgba16float` textures both need `STORAGE_BINDING | TEXTURE_BINDING` usage.
-- Weight offsets are **f16 indices** (enc0=0, enc1=724, bn=1020, dec1=1092, dec0=1672).
+- Weight offsets are **f16 indices** (enc0=0, enc1=724, bn=1020, dec1=1604, dec0=2184).
 - Uniform buffer layouts must match WGSL `Params` structs exactly (padding included).
 
 ---
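
Review note: the updated sizes and offsets in this diff (584 f16 for the 3×3 bottleneck, 2476 f16 / 4952 bytes in total, and the new `dec1`/`dec0` offsets) can be re-derived from the layer shapes in the weight table. The snippet below is a minimal sketch for that check. The `LAYERS` table and both helper functions are hypothetical names introduced here, not part of `train_cnn_v3.py`. It assumes each conv stores `cin × cout × k × k` weights plus `cout` biases, packed back to back in table order; dilation does not change the parameter count.

```python
# Hypothetical layer table mirroring the weight-layout table in the doc:
# (name, in_channels, out_channels, kernel_size). Not taken from the real code.
LAYERS = [
    ("enc0", 20, 4, 3),
    ("enc1", 4, 8, 3),
    ("bn", 8, 8, 3),    # bottleneck: 3x3 with dilation 2 (same count as plain 3x3)
    ("dec1", 16, 4, 3),
    ("dec0", 8, 4, 3),
]


def conv_param_count(cin, cout, k):
    """Weights plus one bias per output channel."""
    return cin * cout * k * k + cout


def weight_layout(layers):
    """Return ({layer name: f16 start index}, total f16 count),
    assuming layers are packed contiguously in the given order."""
    offsets = {}
    cursor = 0
    for name, cin, cout, k in layers:
        offsets[name] = cursor
        cursor += conv_param_count(cin, cout, k)
    return offsets, cursor


offsets, total_f16 = weight_layout(LAYERS)
total_bytes = total_f16 * 2  # f16 = 2 bytes per value

# FiLM MLP: Linear(5->16) + Linear(16->40), weights plus biases, stored as f32.
film_mlp_params = (5 * 16 + 16) + (16 * 40 + 40)
film_mlp_bytes = film_mlp_params * 4

print(offsets)
print(total_f16, total_bytes)
print(film_mlp_params, film_mlp_bytes, total_f16 + film_mlp_params)
```

Under these assumptions the conv total and the FiLM MLP count together give the `params` value printed at startup, so the three numbers in the diff (weight table, `assets.txt` comment, startup line) can be cross-checked in one place whenever `--enc-channels` or a kernel size changes.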