From 2adcf1bac1ec651861930eb2af00641eb23f6ef1 Mon Sep 17 00:00:00 2001 From: skal Date: Tue, 10 Feb 2026 22:54:38 +0100 Subject: docs: Update CNN training documentation with patch extraction Streamlined and updated all training docs with new patch-based approach. Changes: - HOWTO.md: Updated training section with patch/full-image examples - CNN_EFFECT.md: Streamlined training workflow, added detector info - training/README.md: Complete rewrite with detector comparison table New sections: - Detector comparison (harris, fast, shi-tomasi, gradient) - Practical examples for different use cases - Tips for patch size and batch size selection - Benefits of patch-based training Co-Authored-By: Claude Sonnet 4.5 --- doc/CNN_EFFECT.md | 71 ++++++++------------ doc/HOWTO.md | 33 ++++++--- training/README.md | 193 ++++++++++++++++++++++++++--------------------------- 3 files changed, 146 insertions(+), 151 deletions(-) diff --git a/doc/CNN_EFFECT.md b/doc/CNN_EFFECT.md index 22cf985..06065b1 100644 --- a/doc/CNN_EFFECT.md +++ b/doc/CNN_EFFECT.md @@ -98,73 +98,54 @@ workspaces/main/shaders/cnn/ ### 1. Prepare Training Data -Collect input/target image pairs: -- **Input:** RGBA (RGB + depth as alpha channel, D=1/z) -- **Target:** Grayscale stylized output - -```bash -training/input/img_000.png # RGBA render (RGB + depth) +Input/target image pairs: +``` +training/input/img_000.png # RGBA (RGB + alpha) training/output/img_000.png # Grayscale target ``` -**Note:** Input images must be RGBA where alpha = inverse depth (1/z) +**Note:** Alpha channel can be depth (1/z) or constant (255). Network learns from RGB primarily. ### 2. 
Train Network +**Patch-based (Recommended)** - Preserves natural pixel scale: ```bash python3 training/train_cnn.py \ - --input training/input \ - --target training/output \ - --layers 1 \ - --kernel-sizes 3 \ - --epochs 500 \ - --checkpoint-every 50 + --input training/input --target training/output \ + --patch-size 32 --patches-per-image 64 --detector harris \ + --layers 3 --kernel-sizes 3,5,3 \ + --epochs 5000 --batch-size 16 --checkpoint-every 1000 ``` -**Multi-layer example (3 layers with varying kernel sizes):** +**Detectors:** `harris` (corners), `fast` (features), `shi-tomasi` (corners), `gradient` (edges) + +**Full-image (Legacy)** - Resizes to 256×256: ```bash python3 training/train_cnn.py \ - --input training/input \ - --target training/output \ - --layers 3 \ - --kernel-sizes 3,5,3 \ - --epochs 1000 \ - --checkpoint-every 100 + --input training/input --target training/output \ + --layers 3 --kernel-sizes 3,5,3 \ + --epochs 10000 --batch-size 8 --checkpoint-every 1000 ``` -**Note:** Training script auto-generates: -- `cnn_weights_generated.wgsl` - weight arrays for all layers -- `cnn_layer.wgsl` - shader with layer switches and original input binding +**Auto-generates:** +- `cnn_weights_generated.wgsl` - Weight arrays +- `cnn_layer.wgsl` - Layer shader -**Resume from checkpoint:** -```bash -python3 training/train_cnn.py \ - --input training/input \ - --target training/output \ - --resume training/checkpoints/checkpoint_epoch_200.pth -``` +### 3. 
Export & Validate

-**Export WGSL from checkpoint (no training):**
```bash
-python3 training/train_cnn.py \
-    --export-only training/checkpoints/checkpoint_epoch_200.pth \
-    --output workspaces/main/shaders/cnn/cnn_weights_generated.wgsl
-```
+# Export shaders
+./training/train_cnn.py --export-only training/checkpoints/checkpoint_epoch_5000.pth

-**Generate ground truth (for shader validation):**
-```bash
-python3 training/train_cnn.py \
-    --infer training/input/img_000.png \
-    --export-only training/checkpoints/checkpoint_epoch_200.pth \
-    --output training/ground_truth.png
+# Generate ground truth
+./training/train_cnn.py --infer input.png \
+    --export-only training/checkpoints/checkpoint_epoch_5000.pth --output ground_truth.png
```

-### 3. Rebuild Demo
+### 4. Rebuild Demo

-Training script auto-generates both `cnn_weights_generated.wgsl` and `cnn_layer.wgsl`:
```bash
-cmake --build build -j4
-./build/demo64k
+cmake --build build -j4 && ./build/demo64k
```

---
diff --git a/doc/HOWTO.md b/doc/HOWTO.md
index 5ea6afd..ba550bb 100644
--- a/doc/HOWTO.md
+++ b/doc/HOWTO.md
@@ -88,23 +88,40 @@ make run_util_tests    # Utility tests

## Training

+### Patch-Based (Recommended)
+Extracts patches at salient points and preserves natural pixel scale:
```bash
-./training/train_cnn.py --layers 3 --kernel_sizes 3,5,3 --epochs 10000 --batch_size 8 --input training/input/ --target training/output/ --checkpoint-every 1000
+# Train with 32×32 patches at detected corners/edges
+./training/train_cnn.py \
+    --input training/input/ --target training/output/ \
+    --patch-size 32 --patches-per-image 64 --detector harris \
+    --layers 3 --kernel-sizes 3,5,3 --epochs 5000 --batch-size 16 \
+    --checkpoint-every 1000
```

-Generate shaders from checkpoint:
+**Detectors:** `harris` (default), `fast`, `shi-tomasi`, `gradient`
+
+### Full-Image (Legacy)
+Resizes to 256×256 (distorts scale):
```bash
-./training/train_cnn.py --export-only training/checkpoints/checkpoint_epoch_7000.pth
+./training/train_cnn.py \
+    --input 
training/input/ --target training/output/ \
+    --layers 3 --kernel-sizes 3,5,3 --epochs 10000 --batch-size 8 \
+    --checkpoint-every 1000
```

-Generate ground truth (for shader validation):
+### Export & Validation
```bash
-./training/train_cnn.py --infer input.png --export-only checkpoints/checkpoint_epoch_7000.pth --output ground_truth.png
+# Generate shaders from checkpoint
+./training/train_cnn.py --export-only training/checkpoints/checkpoint_epoch_5000.pth
+
+# Generate ground truth for comparison
+./training/train_cnn.py --infer input.png \
+    --export-only training/checkpoints/checkpoint_epoch_5000.pth \
+    --output ground_truth.png
```

-**Note:** Kernel sizes must match shader functions:
-- 3×3 kernel → `cnn_conv3x3_7to4` (36 weights: 9 pos × 4 channels)
-- 5×5 kernel → `cnn_conv5x5_7to4` (100 weights: 25 pos × 4 channels)
+**Kernel sizes:** 3×3 (36 weights), 5×5 (100 weights), 7×7 (196 weights)

---
diff --git a/training/README.md b/training/README.md
index 0a46718..e78b471 100644
--- a/training/README.md
+++ b/training/README.md
@@ -1,117 +1,109 @@
# CNN Training Tools

-Tools for training and preparing data for the CNN post-processing effect.
+PyTorch-based training for image-to-image stylization with patch extraction.

---

-## train_cnn.py
-
-PyTorch-based training script for image-to-image stylization.
-
-### Basic Usage
+## Quick Start

```bash
-python3 train_cnn.py --input <input_dir> --target <target_dir> [options]
+# Patch-based (recommended)
+python3 train_cnn.py \
+    --input training/input --target training/output \
+    --patch-size 32 --patches-per-image 64 --detector harris \
+    --layers 3 --kernel-sizes 3,5,3 --epochs 5000 --batch-size 16
+
+# Full-image (legacy)
+python3 train_cnn.py \
+    --input training/input --target training/output \
+    --layers 3 --kernel-sizes 3,5,3 --epochs 10000 --batch-size 8
```

+---
+
+## Patch-Based Training (Recommended)
+
+Extracts patches at salient points and preserves natural pixel scale.
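
The extraction idea can be sketched in a few lines of plain NumPy. This is an illustrative, hypothetical approximation of the `gradient` detector, not the actual `train_cnn.py` implementation; the function name and selection heuristic are invented:

```python
import numpy as np

def extract_gradient_patches(img, patch_size=32, patches_per_image=64):
    """Crop patch_size squares centered on the strongest-gradient pixels."""
    gray = img.mean(axis=2) if img.ndim == 3 else img
    gy, gx = np.gradient(gray.astype(np.float32))
    mag = np.hypot(gx, gy)  # per-pixel gradient magnitude = saliency score
    half = patch_size // 2
    # Zero out a border so every selected center yields a full patch.
    mag[:half, :] = mag[-half:, :] = 0
    mag[:, :half] = mag[:, -half:] = 0
    # Indices of the strongest responses, best first.
    idx = np.argsort(mag, axis=None)[::-1][:patches_per_image]
    ys, xs = np.unravel_index(idx, mag.shape)
    return [img[y - half:y + half, x - half:x + half] for y, x in zip(ys, xs)]

img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
patches = extract_gradient_patches(img)  # 64 patches of shape (32, 32, 3)
```

A `harris` or `shi-tomasi` detector would replace the gradient-magnitude score with a corner-response score; the border handling and cropping would presumably follow the same pattern.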
+ +### Detectors + +| Detector | Best For | Speed | +|----------|----------|-------| +| `harris` (default) | Corners, structured scenes | Medium | +| `fast` | Dense features, textures | Fast | +| `shi-tomasi` | High-quality corners | Medium | +| `gradient` | Edges, high-contrast areas | Fast | + ### Examples -**Single layer, 3×3 kernel:** +**Single layer, Harris corners:** ```bash python3 train_cnn.py --input training/input --target training/output \ - --layers 1 --kernel-sizes 3 --epochs 500 + --patch-size 32 --patches-per-image 64 --detector harris \ + --layers 1 --kernel-sizes 3 --epochs 2000 ``` -**Multi-layer, mixed kernels:** +**Multi-layer, FAST features:** ```bash python3 train_cnn.py --input training/input --target training/output \ - --layers 3 --kernel-sizes 3,5,3 --epochs 1000 + --patch-size 32 --patches-per-image 128 --detector fast \ + --layers 3 --kernel-sizes 3,5,3 --epochs 5000 --batch-size 16 ``` -**With checkpointing:** +**Edge-focused (gradient detector):** ```bash python3 train_cnn.py --input training/input --target training/output \ - --epochs 500 --checkpoint-every 50 + --patch-size 16 --patches-per-image 96 --detector gradient \ + --layers 2 --kernel-sizes 3,3 --epochs 3000 ``` -**Resume from checkpoint:** -```bash -python3 train_cnn.py --input training/input --target training/output \ - --resume training/checkpoints/checkpoint_epoch_200.pth -``` +### Benefits -### Options +- **Preserves scale:** No resize distortion +- **More samples:** 64 patches × 10 images = 640 samples vs 10 +- **Focused learning:** Trains on interesting features, not flat areas +- **Better generalization:** Network sees diverse local patterns + +--- + +## Options | Option | Default | Description | |--------|---------|-------------| | `--input` | *required* | Input image directory | | `--target` | *required* | Target image directory | +| `--patch-size` | None | Patch size (e.g., 32). 
Omit for full-image mode | +| `--patches-per-image` | 64 | Patches to extract per image | +| `--detector` | harris | harris\|fast\|shi-tomasi\|gradient | | `--layers` | 1 | Number of CNN layers | -| `--kernel-sizes` | 3 | Comma-separated kernel sizes (auto-repeats if single value) | +| `--kernel-sizes` | 3 | Comma-separated (e.g., 3,5,3) | | `--epochs` | 100 | Training epochs | -| `--batch-size` | 4 | Batch size | +| `--batch-size` | 4 | Batch size (use 16 for patches, 8 for full-image) | | `--learning-rate` | 0.001 | Learning rate | -| `--output` | `workspaces/main/shaders/cnn/cnn_weights_generated.wgsl` | Output WGSL file | -| `--checkpoint-every` | 0 | Save checkpoint every N epochs (0=disabled) | -| `--checkpoint-dir` | `training/checkpoints` | Checkpoint directory | -| `--resume` | None | Resume from checkpoint file | - -### Architecture - -- **Layer 0:** `CoordConv2d` - accepts (x,y) patch center + 3×3 RGBA samples -- **Layers 1+:** Standard `Conv2d` - 3×3 RGBA samples only -- **Activation:** Tanh between layers -- **Output:** Residual connection (30% stylization blend) - -### Requirements - -```bash -pip install torch torchvision pillow -``` +| `--checkpoint-every` | 0 | Save every N epochs (0=off) | +| `--resume` | None | Resume from checkpoint | +| `--export-only` | None | Export WGSL without training | +| `--infer` | None | Generate ground truth PNG | --- -## image_style_processor.py - -Generates stylized target images from raw renders. - -### Usage +## Export & Validation +**Export shaders from checkpoint:** ```bash -python3 image_style_processor.py