Diffstat (limited to 'cnn_v3/docs/HOW_TO_CNN.md')
-rw-r--r--  cnn_v3/docs/HOW_TO_CNN.md | 173
1 file changed, 95 insertions(+), 78 deletions(-)
diff --git a/cnn_v3/docs/HOW_TO_CNN.md b/cnn_v3/docs/HOW_TO_CNN.md
index 4966a61..09db97c 100644
--- a/cnn_v3/docs/HOW_TO_CNN.md
+++ b/cnn_v3/docs/HOW_TO_CNN.md
@@ -28,26 +28,13 @@ CNN v3 is a 2-level U-Net with FiLM conditioning, designed to run in real-time a
**Architecture:**
-```
-Input: 20-channel G-buffer feature textures (rgba32uint)
- │
- enc0 ──── Conv(20→4, 3×3) + FiLM + ReLU ┐ full res
- │ ↘ skip │
- enc1 ──── AvgPool2×2 + Conv(4→8, 3×3) + FiLM ┐ ½ res
- │ ↘ skip │
- bottleneck AvgPool2×2 + Conv(8→8, 1×1) + ReLU ¼ res (no FiLM)
- │ │
- dec1 ←── upsample×2 + cat(enc1 skip) + Conv(16→4, 3×3) + FiLM
- │ │ ½ res
- dec0 ←── upsample×2 + cat(enc0 skip) + Conv(8→4, 3×3) + FiLM + sigmoid
- full res → RGBA output
-```
+![CNN v3 U-Net + FiLM Architecture](cnn_v3_architecture.png)
**FiLM MLP:** `Linear(5→16) → ReLU → Linear(16→40)` trained jointly with U-Net.
- Input: `[beat_phase, beat_norm, audio_intensity, style_p0, style_p1]`
- Output: 40 γ/β values controlling style across all 4 FiLM layers
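A minimal PyTorch sketch of that MLP — the layer sizes follow the line above, and the 40 outputs correspond to per-channel γ/β for the four FiLM'd convs (enc0=4, enc1=8, dec1=4, dec0=4 channels); the exact γ/β ordering shown here is an assumption:
```python
import torch
import torch.nn as nn

# Sizes from the description above; 776 parameters total, matching the 776 f32
# values exported to cnn_v3_film_mlp.bin. The gamma/beta split is illustrative.
film_mlp = nn.Sequential(
    nn.Linear(5, 16),    # cond = [beat_phase, beat_norm, audio_intensity, style_p0, style_p1]
    nn.ReLU(),
    nn.Linear(16, 40),   # 40 = 2 * (4 + 8 + 4 + 4) FiLM'd channels
)

cond = torch.tensor([[0.25, 0.5, 0.8, 0.0, 1.0]])   # example conditioning vector
gamma, beta = film_mlp(cond).chunk(2, dim=1)        # assumed split: 20 gamma + 20 beta
```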
-**Weight budget:** ~3.9 KB f16 (fits ≤6 KB target)
+**Weight budget:** ~4.84 KB of f16 conv weights (fits ≤6 KB target)
**Two data paths:**
- **Simple mode** — real photos with zeroed geometric channels (normal, depth, matid)
@@ -58,13 +45,13 @@ Input: 20-channel G-buffer feature textures (rgba32uint)
```
photos/Blender → pack → dataset/ → train_cnn_v3.py → checkpoint.pth
- export_cnn_v3_weights.py
- ┌─────────┴──────────┐
- cnn_v3_weights.bin cnn_v3_film_mlp.bin
- │
- CNNv3Effect::upload_weights()
- │
- demo / HTML tool
+ export_cnn_v3_weights.py [--html]
+ ┌──────────┴────────────┬──────────────┐
+ cnn_v3_weights.bin cnn_v3_film_mlp.bin weights.js
+ │ (HTML tool
+ CNNv3Effect::upload_weights() defaults)
+ │
+ demo
```
---
@@ -107,15 +94,6 @@ The network learns the mapping `albedo → target`. If you pass the same image a
input and target, the network learns identity (useful as a sanity check, not for real
training). Confirm `target.png` looks correct before running training.
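A quick way to catch that identity trap before committing to a long run — a sketch only; the photo path is illustrative:
```python
from pathlib import Path
from PIL import Image

sample = Path("dataset/simple/sample_001")   # packed sample dir from above
photo  = Path("photos/photo_001.png")        # hypothetical path of the original photo

target = Image.open(sample / "target.png")
source = Image.open(photo)
print("source size:", source.size, "target size:", target.size)

# A target that is byte-identical to the source would only teach the network identity.
if target.size == source.size and target.tobytes() == source.convert(target.mode).tobytes():
    print("WARNING: target.png is identical to the source photo")
```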
-**Alternative — pack without a target yet:**
-```bash
-python3 pack_photo_sample.py \
- --photo /path/to/photo.png \
- --output dataset/simple/sample_001/
-# target.png defaults to a copy of the input; replace it before training:
-cp my_stylized_version.png dataset/simple/sample_001/target.png
-```
-
**Batch packing:**
```bash
for f in photos/*.png; do
@@ -284,55 +262,78 @@ The U-Net conv weights and FiLM MLP train **jointly** in a single run. No separa
### Prerequisites
+`train_cnn_v3.py` and `export_cnn_v3_weights.py` carry inline `uv` dependency metadata
+(`# /// script`). Use `uv run` — no manual `pip install` needed:
+
```bash
-pip install torch torchvision pillow numpy opencv-python
cd cnn_v3/training
+uv run train_cnn_v3.py --input dataset/ --epochs 1 --patch-size 32 --detector random
```
-**With `uv` (no pip needed):** dependencies are declared inline in `train_cnn_v3.py`
-and installed automatically on first run:
+**Without `uv` (manual pip):**
```bash
+pip install torch torchvision pillow numpy opencv-python
cd cnn_v3/training
-uv run train_cnn_v3.py --input dataset/ --epochs 1 --patch-size 32 --detector random
+python3 train_cnn_v3.py ...
```
+The pack scripts (`pack_photo_sample.py`, `pack_blender_sample.py`) and
+`gen_test_vectors.py` do **not** have uv metadata — run them with `python3` directly
+(they only need `numpy`, `pillow`, and optionally `openexr`).
+
### Quick-start commands
**Smoke test — 1 epoch, validates end-to-end without GPU:**
```bash
-python3 train_cnn_v3.py --input dataset/ --epochs 1 \
+uv run train_cnn_v3.py --input dataset/ --epochs 1 \
--patch-size 32 --detector random
```
**Standard photo training (patch-based):**
```bash
-python3 train_cnn_v3.py \
+uv run train_cnn_v3.py \
--input dataset/ \
--input-mode simple \
- --epochs 200
+ --epochs 200 \
+ --edge-loss-weight 0.1 \
+ --film-warmup-epochs 50
```
**Blender G-buffer training:**
```bash
-python3 train_cnn_v3.py \
+uv run train_cnn_v3.py \
--input dataset/ \
--input-mode full \
- --epochs 200
+ --epochs 200 \
+ --edge-loss-weight 0.1 \
+ --film-warmup-epochs 50
```
**Full-image mode (better global coherence, slower):**
```bash
-python3 train_cnn_v3.py \
+uv run train_cnn_v3.py \
--input dataset/ \
--input-mode full \
--full-image --image-size 256 \
--epochs 500
```
+**Single-sample training (overfit on one input/target pair):**
+```bash
+# Pack first
+./gen_sample.sh input/photo.png target/photo_styled.png dataset/simple/sample_001/
+
+# Train — --full-image and --batch-size 1 are implied
+uv run train_cnn_v3.py \
+ --single-sample dataset/simple/sample_001/ \
+ --epochs 500
+```
+
### Flag reference
| Flag | Default | Notes |
|------|---------|-------|
+| `--single-sample DIR` | — | Train on one sample dir; implies `--full-image`, `--batch-size 1` |
| `--input DIR` | `training/dataset` | Dataset root; always set explicitly |
| `--input-mode` | `simple` | `simple`=photos, `full`=Blender G-buffer |
| `--epochs N` | 200 | 500 recommended for full-image mode |
@@ -340,7 +341,8 @@ python3 train_cnn_v3.py \
| `--lr F` | 1e-3 | Reduce to 1e-4 if loss oscillates or NaN |
| `--patch-size N` | 64 | Smaller = faster epoch, less spatial context |
| `--patches-per-image N` | 256 | Reduce for small datasets |
-| `--detector` | `harris` | `random` for smoke tests; `shi-tomasi` as alternative |
+| `--detector` | `harris` | `random` for smoke tests; also `shi-tomasi`, `fast`, `gradient` |
+| `--patch-search-window N` | 0 | Search ±N px in target to find best alignment (grayscale MSE) per patch; 0=disabled. Use when source and target are not perfectly co-registered (e.g. photo + hand-painted target). Offsets cached at dataset init. |
| `--channel-dropout-p F` | 0.3 | Lower if all samples have geometry (Blender only) |
| `--full-image` | off | Resize full image instead of patch crops |
| `--image-size N` | 256 | Resize target; only used with `--full-image` |
@@ -348,12 +350,15 @@ python3 train_cnn_v3.py \
| `--film-cond-dim N` | 5 | Must match `CNNv3FiLMParams` field count in C++ |
| `--checkpoint-dir DIR` | `checkpoints/` | Set per-experiment |
| `--checkpoint-every N` | 50 | 0 to disable intermediate checkpoints |
+| `--resume [CKPT]` | — | Resume from checkpoint path; if path missing, uses latest in `--checkpoint-dir` |
+| `--edge-loss-weight F` | 0.1 | Sobel gradient loss weight alongside MSE (see the sketch after this table); improves style/edge capture; 0=MSE only |
+| `--film-warmup-epochs N` | 50 | Freeze FiLM MLP for first N epochs (phase-1), then unfreeze at lr×0.1; 0=joint training |
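As an illustration of the `--edge-loss-weight` term, a sketch of an MSE + Sobel-gradient loss (assumed form — `train_cnn_v3.py` may differ in kernel normalization, padding, or whether L1 or L2 is used on the gradients):
```python
import torch
import torch.nn.functional as F

# Sobel kernels for x/y image gradients.
_KX = torch.tensor([[-1., 0., 1.],
                    [-2., 0., 2.],
                    [-1., 0., 1.]])
_KY = _KX.t().contiguous()

def sobel_gradients(img):
    """Per-channel Sobel gradients of an (N, C, H, W) tensor."""
    c = img.shape[1]
    kx = _KX.to(img).repeat(c, 1, 1, 1)   # depthwise conv weights (c, 1, 3, 3)
    ky = _KY.to(img).repeat(c, 1, 1, 1)
    return (F.conv2d(img, kx, padding=1, groups=c),
            F.conv2d(img, ky, padding=1, groups=c))

def training_loss(pred, target, edge_weight=0.1):
    """MSE plus a Sobel edge term weighted by --edge-loss-weight."""
    mse = F.mse_loss(pred, target)
    pgx, pgy = sobel_gradients(pred)
    tgx, tgy = sobel_gradients(target)
    edge = F.l1_loss(pgx, tgx) + F.l1_loss(pgy, tgy)
    return mse + edge_weight * edge
```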
### Architecture at startup
The model prints its parameter count:
```
-Model: enc=[4, 8] film_cond_dim=5 params=2740 (~5.4 KB f16)
+Model: enc=[4, 8] film_cond_dim=5 params=3252 (~6.4 KB f16)
```
If `params` is much higher, `--enc-channels` was changed; update C++ constants accordingly.
@@ -454,17 +459,30 @@ The final checkpoint is always written even if `--checkpoint-every 0`.
## 3. Exporting Weights
-Converts a trained `.pth` checkpoint to two raw binary files for the C++ runtime.
+Converts a trained `.pth` checkpoint to two raw binary files for the C++ runtime,
+and optionally updates the HTML tool's embedded defaults.
+**Standard export (C++ runtime only):**
```bash
cd cnn_v3/training
-python3 export_cnn_v3_weights.py checkpoints/checkpoint_epoch_200.pth \
+uv run export_cnn_v3_weights.py checkpoints/checkpoint_epoch_200.pth \
--output ../../workspaces/main/weights/
```
+**Export + update HTML tool defaults (`cnn_v3/tools/weights.js`):**
+```bash
+uv run export_cnn_v3_weights.py checkpoints/checkpoint_epoch_200.pth \
+ --output ../../workspaces/main/weights/ \
+ --html
+```
+
+`--html` base64-encodes both `.bin` files and rewrites `cnn_v3/tools/weights.js`
+so the HTML tool loads the new weights as its embedded defaults at startup.
+Use `--html-output PATH` to write to a different `weights.js` location.
+
Output files are registered in `workspaces/main/assets.txt` as:
```
-WEIGHTS_CNN_V3, BINARY, weights/cnn_v3_weights.bin, "CNN v3 conv weights (f16, 3928 bytes)"
+WEIGHTS_CNN_V3, BINARY, weights/cnn_v3_weights.bin, "CNN v3 conv weights (f16, 4952 bytes)"
WEIGHTS_CNN_V3_FILM_MLP, BINARY, weights/cnn_v3_film_mlp.bin, "CNN v3 FiLM MLP weights (f32, 3104 bytes)"
```
@@ -476,10 +494,10 @@ WEIGHTS_CNN_V3_FILM_MLP, BINARY, weights/cnn_v3_film_mlp.bin, "CNN v3 FiLM MLP w
|-------|-----------|-------|
| enc0 Conv(20→4,3×3)+bias | 724 | — |
| enc1 Conv(4→8,3×3)+bias | 296 | — |
-| bottleneck Conv(8→8,1×1)+bias | 72 | — |
+| bottleneck Conv(8→8,3×3,dil=2)+bias | 584 | — |
| dec1 Conv(16→4,3×3)+bias | 580 | — |
| dec0 Conv(8→4,3×3)+bias | 292 | — |
-| **Total** | **1964 f16** | **3928 bytes** |
+| **Total** | **2476 f16** | **4952 bytes** |
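The counts follow directly from the kernel shapes (`in·out·k²` weights plus `out` biases), and the running totals reproduce the f16 weight offsets listed under Pitfalls in §7 — a quick check:
```python
# Per-layer f16 value counts: in_ch * out_ch * 3*3 weights + out_ch biases.
layers = [
    ("enc0",       20 * 4 * 9 + 4),   # 724
    ("enc1",        4 * 8 * 9 + 8),   # 296
    ("bottleneck",  8 * 8 * 9 + 8),   # 584 (dilation widens the receptive field, not the count)
    ("dec1",       16 * 4 * 9 + 4),   # 580
    ("dec0",        8 * 4 * 9 + 4),   # 292
]

offset = 0
for name, count in layers:
    print(f"{name:10s} f16 offset {offset:4d}  count {count}")
    offset += count
# Offsets: enc0=0, enc1=724, bn=1020, dec1=1604, dec0=2184.

print("total:", offset, "f16 =", offset * 2, "bytes")   # 2476 f16 = 4952 bytes
# Adding the 776 FiLM-MLP parameters gives 3252, matching the params=3252 startup printout.
```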
**`cnn_v3_film_mlp.bin`** — FiLM MLP weights as raw f32, row-major:
@@ -509,8 +527,8 @@ Checkpoint: epoch=200 loss=0.012345
enc_channels=[4, 8] film_cond_dim=5
cnn_v3_weights.bin
- 1964 f16 values → 982 u32 → 3928 bytes
- Upload via CNNv3Effect::upload_weights(queue, data, 3928)
+ 2476 f16 values → 1238 u32 → 4952 bytes
+ Upload via CNNv3Effect::upload_weights(queue, data, 4952)
cnn_v3_film_mlp.bin
L0: weight (16, 5) + bias (16,)
@@ -543,10 +561,12 @@ It owns:
```
SEQUENCE 0 0 "Scene with CNN v3"
- EFFECT + GBufferEffect prev_cnn -> gbuf_feat0 gbuf_feat1 0 60
- EFFECT + CNNv3Effect gbuf_feat0 gbuf_feat1 -> sink 0 60
+ EFFECT + GBufferEffect source -> gbuf_feat0 gbuf_feat1 0 60
+ EFFECT + CNNv3Effect gbuf_feat0 gbuf_feat1 -> sink 0 60
```
+Temporal feedback (`prev_cnn`) is wired automatically by `wire_dag()` — no explicit input needed in the `.seq` file.
+
Or direct C++:
```cpp
#include "cnn_v3/src/cnn_v3_effect.h"
@@ -636,8 +656,8 @@ Do not reference them from outside the effect unless debugging.
```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release
-cmake --build build -j$(nproc)
-./build/demo
+cmake --build build -j4
+./build/demo64k
```
### Expected visual output
@@ -733,13 +753,14 @@ If results drift after shader edits, verify these invariants match the Python re
## 7. HTML WebGPU Tool
-**Location:** `cnn_v3/tools/` — three files, no build step.
+**Location:** `cnn_v3/tools/` — four files, no build step.
| File | Lines | Contents |
|------|-------|----------|
-| `index.html` | 147 | HTML + CSS |
-| `shaders.js` | 252 | WGSL shader constants, weight-offset constants |
-| `tester.js` | 540 | `CNNv3Tester` class, event wiring |
+| `index.html` | 168 | HTML + CSS |
+| `shaders.js` | 312 | WGSL shader constants, weight-offset constants |
+| `tester.js` | 913 | `CNNv3Tester` class, inference pipeline, layer viz |
+| `weights.js` | 7 | Embedded default weights (base64); auto-generated by `--html` |
### Usage
@@ -750,32 +771,27 @@ python3 -m http.server 8080
# Open: http://localhost:8080/cnn_v3/tools/
```
-Or on macOS with Chrome:
+Weights are **loaded automatically at startup** from `weights.js` (embedded base64).
+If the tool is served from the repo root, it also tries to fetch the latest
+`workspaces/main/weights/*.bin` over HTTP and uses those if available.
+Use the **↺ Reload** button to re-fetch after updating weights on disk.
+
+To update the embedded defaults after a training run, use `--html` (§3):
```bash
-open -a "Google Chrome" --args --allow-file-access-from-files
-open cnn_v3/tools/index.html
+uv run export_cnn_v3_weights.py checkpoints/checkpoint.pth \
+ --output ../../workspaces/main/weights/ --html
```
### Workflow
-1. **Drop `cnn_v3_weights.bin`** onto the left "weights" drop zone.
-2. **Drop a PNG or video** onto the centre canvas → CNN runs immediately.
-3. _(Optional)_ **Drop `cnn_v3_film_mlp.bin`** → FiLM sliders become active.
-4. Adjust **beat_phase / beat_norm / audio_int / style_p0 / style_p1** sliders → reruns on change.
-5. Click layer buttons (**Feat · Enc0 · Enc1 · BN · Dec1 · Output**) in the right panel to inspect activations.
-6. **Save PNG** to export the current output.
+1. **Drop a PNG or video** onto the canvas → CNN runs immediately (weights pre-loaded).
+2. Adjust **beat_phase / beat_norm / audio_int / style_p0 / style_p1** sliders.
+3. Click layer buttons (**Feat · Enc0 · Enc1 · BN · Dec1 · Output**) to inspect activations.
+4. **Save PNG** to export the current output.
+5. _(Optional)_ Drop updated `.bin` files onto the left panel to override embedded weights.
Keyboard: `[SPACE]` toggle original · `[D]` diff×10.
-### Input files
-
-| File | Format | Notes |
-|------|--------|-------|
-| `cnn_v3_weights.bin` | raw u32 (no header) | 982 u32 = 1964 f16 = ~3.9 KB |
-| `cnn_v3_film_mlp.bin` | raw f32 | 776 f32 = 3.1 KB; optional — identity FiLM used if absent |
-
-Both produced by `export_cnn_v3_weights.py` (§3).
-
### Texture chain
| Texture | Format | Size |
@@ -801,7 +817,7 @@ all geometric channels (normal, depth, depth_grad, mat_id, prev) = 0.
### Pitfalls
- `rgba32uint` and `rgba16float` textures both need `STORAGE_BINDING | TEXTURE_BINDING` usage.
-- Weight offsets are **f16 indices** (enc0=0, enc1=724, bn=1020, dec1=1092, dec0=1672).
+- Weight offsets are **f16 indices** (enc0=0, enc1=724, bn=1020, dec1=1604, dec0=2184).
- Uniform buffer layouts must match WGSL `Params` structs exactly (padding included).
---
@@ -816,7 +832,7 @@ all geometric channels (normal, depth, depth_grad, mat_id, prev) = 0.
| `cnn_v3/training/pack_photo_sample.py` | Photo → zeroed-geometry sample directory |
| `cnn_v3/training/cnn_v3_utils.py` | Dataset class, feature assembly, channel dropout, salient-point detection |
| `cnn_v3/training/train_cnn_v3.py` | CNNv3 model definition, training loop, CLI |
-| `cnn_v3/training/export_cnn_v3_weights.py` | Checkpoint → `cnn_v3_weights.bin` + `cnn_v3_film_mlp.bin` |
+| `cnn_v3/training/export_cnn_v3_weights.py` | Checkpoint → `cnn_v3_weights.bin` + `cnn_v3_film_mlp.bin`; `--html` rewrites `weights.js` |
| `cnn_v3/training/gen_test_vectors.py` | NumPy reference forward pass + C header generator |
| `cnn_v3/test_vectors.h` | Compiled-in test vectors (auto-generated, do not edit) |
| `cnn_v3/src/cnn_v3_effect.h` | C++ class, Params structs, `CNNv3FiLMParams` API |
@@ -827,6 +843,7 @@ all geometric channels (normal, depth, depth_grad, mat_id, prev) = 0.
| `cnn_v3/tools/index.html` | HTML tool — UI shell + CSS |
| `cnn_v3/tools/shaders.js` | HTML tool — inline WGSL shaders + weight-offset constants |
| `cnn_v3/tools/tester.js` | HTML tool — CNNv3Tester class, inference pipeline, layer viz |
+| `cnn_v3/tools/weights.js` | HTML tool — embedded default weights (base64, auto-generated) |
| `cnn_v2/tools/cnn_v2_test/index.html` | HTML tool reference pattern (v2) |
---