author: skal <pascal.massimino@gmail.com> 2026-02-11 09:27:06 +0100
committer: skal <pascal.massimino@gmail.com> 2026-02-11 09:27:06 +0100
commit: 5cc6da3831d5bce35af353c14f15c30dbc66b081
tree: 800526fe4c3a937121d7f1e87b85ff89cc3c7fcb /doc
parent: 66a489f64209925ec9615c9f6c4907e4e3caf9e2
fix: CNN training/inference to match WGSL sliding window

Training now computes loss only on center pixels (excludes conv padding borders).
Inference changed from tiling to full-image sliding window.
Both match cnn_layer.wgsl: each pixel processed from NxN neighborhood.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Diffstat (limited to 'doc')
-rw-r--r-- doc/HOWTO.md 15
1 files changed, 11 insertions, 4 deletions
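
The center-pixel rule from the commit message can be illustrated with a minimal sketch. It is not taken from `train_cnn.py`: the helper names `border_width` and `center_mse` are hypothetical, and it assumes stride-1 convolutions in which each k×k layer affects `k // 2` pixels per side. With three 3×3 layers that assumption reproduces the 3px border stated in the HOWTO change.

```python
def border_width(num_layers, kernel_size=3):
    """Pixels per side influenced by padding after stacking stride-1 convs.

    Assumption: each k x k layer contributes k // 2 pixels per side.
    """
    return num_layers * (kernel_size // 2)


def center_mse(pred, target, border):
    """Mean squared error over interior pixels only (2-D lists of floats)."""
    h, w = len(pred), len(pred[0])
    if h <= 2 * border or w <= 2 * border:
        raise ValueError("patch too small for this border")
    total, count = 0.0, 0
    for y in range(border, h - border):
        for x in range(border, w - border):
            diff = pred[y][x] - target[y][x]
            total += diff * diff
            count += 1
    return total / count


# Hypothetical 32x32 patch: wrong only inside the 3px border, correct in the center.
border = border_width(num_layers=3, kernel_size=3)
size = 32
target = [[0.0] * size for _ in range(size)]
pred = [
    [0.0 if border <= y < size - border and border <= x < size - border else 9.0
     for x in range(size)]
    for y in range(size)
]
print(border, center_mse(pred, target, border))  # prints "3 0.0"
```

Under these assumptions, errors confined to the border do not contribute to the loss, so a 32×32 patch is effectively scored on its 26×26 interior.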
diff --git a/doc/HOWTO.md b/doc/HOWTO.md
index db324ec..7b0daa0 100644
--- a/doc/HOWTO.md
+++ b/doc/HOWTO.md
@@ -89,7 +89,7 @@ make run_util_tests # Utility tests
## Training
### Patch-Based (Recommended)
-Extracts patches at salient points, preserves natural pixel scale:
+Extracts patches at salient points, trains on center pixels only (matches WGSL sliding window):
```bash
# Train with 32×32 patches at detected corners/edges
./training/train_cnn.py \
@@ -99,10 +99,15 @@ Extracts patches at salient points, preserves natural pixel scale:
--checkpoint-every 1000
```
+**Training behavior:**
+- Loss computed only on center pixels (excludes conv padding borders)
+- For 3-layer network: excludes 3px border on each side
+- Matches GPU shader sliding-window paradigm
+
**Detectors:** `harris` (default), `fast`, `shi-tomasi`, `gradient`
-### Full-Image (Legacy)
-Resizes to 256×256 (distorts scale):
+### Full-Image
+Processes entire image with sliding window (matches WGSL):
```bash
./training/train_cnn.py \
--input training/input/ --target training/output/ \
@@ -115,12 +120,14 @@ Resizes to 256×256 (distorts scale):
# Generate shaders from checkpoint
./training/train_cnn.py --export-only checkpoints/checkpoint_epoch_5000.pth
-# Generate ground truth for comparison
+# Generate ground truth (sliding window, no tiling)
./training/train_cnn.py --infer input.png \
--export-only checkpoints/checkpoint_epoch_5000.pth \
--output ground_truth.png
```
+**Inference:** Processes full image with sliding window (each pixel from NxN neighborhood). No tiling artifacts.
+
**Kernel sizes:** 3×3 (36 weights), 5×5 (100 weights), 7×7 (196 weights)
---
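
To illustrate why a full-image sliding window avoids seams, here is a small single-channel sketch. It is not the project's inference code: `conv_same` and `conv_tiled` are hypothetical helpers, the zero padding is an assumption, and the strip-wise tiling is only one guess at how a tiled pass could behave. It compares one stride-1 3×3 layer applied to the whole image against the same layer applied to independently padded column strips.

```python
def conv_same(img, kernel):
    """Stride-1 sliding window with zero padding; output matches input size."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    # Neighbors outside the image count as zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + r][dx + r] * img[yy][xx]
            out[y][x] = acc
    return out


def conv_tiled(img, kernel, tile):
    """Assumed tiling scheme: process column strips independently, then stitch."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for x0 in range(0, w, tile):
        strip = [row[x0:x0 + tile] for row in img]
        strip_out = conv_same(strip, kernel)
        for y in range(h):
            out[y][x0:x0 + tile] = strip_out[y]
    return out


image = [[1.0] * 8 for _ in range(5)]   # flat 5x8 test image
box = [[1.0] * 3 for _ in range(3)]     # 3x3 box kernel

full = conv_same(image, box)
tiled = conv_tiled(image, box, tile=4)
print(full[2][3], tiled[2][3])  # prints "9.0 6.0"
```

In this toy setup the pixel next to the strip boundary sees its complete 3×3 neighborhood only in the full-image pass; the tiled pass pads at the seam and produces a different value there.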