# CNN Training Tools

PyTorch-based training for image-to-image stylization with patch extraction.

---

## Quick Start

```bash
# Patch-based (recommended)
python3 train_cnn.py \
  --input training/input --target training/output \
  --patch-size 32 --patches-per-image 64 --detector harris \
  --layers 3 --kernel-sizes 3,5,3 --epochs 5000 --batch-size 16

# Full-image (legacy)
python3 train_cnn.py \
  --input training/input --target training/output \
  --layers 3 --kernel-sizes 3,5,3 --epochs 10000 --batch-size 8
```

---

## Patch-Based Training (Recommended)

Extracts training patches centered on salient feature points, preserving each image's natural pixel scale instead of resizing whole frames.

### Detectors

| Detector | Best For | Speed |
|----------|----------|-------|
| `harris` (default) | Corners, structured scenes | Medium |
| `fast` | Dense features, textures | Fast |
| `shi-tomasi` | High-quality corners | Medium |
| `gradient` | Edges, high-contrast areas | Fast |
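
The idea behind the `gradient` detector can be sketched in a few lines of NumPy: rank pixels by gradient magnitude and cut patches around the strongest responses. This is a hypothetical re-implementation for illustration; `train_cnn.py`'s actual detectors (and their thresholds/spacing rules) may differ.

```python
import numpy as np

def extract_patches_gradient(img, patch_size=32, n_patches=64):
    """Pick patch centers at the strongest gradient-magnitude pixels
    (illustrative sketch, not the script's real implementation)."""
    half = patch_size // 2
    gy, gx = np.gradient(img.astype(np.float32))
    mag = np.hypot(gx, gy)
    # Zero the borders so every selected patch fits inside the image.
    mag[:half, :] = 0
    mag[-half:, :] = 0
    mag[:, :half] = 0
    mag[:, -half:] = 0
    # Take the n_patches strongest responses.
    idx = np.argsort(mag.ravel())[::-1][:n_patches]
    ys, xs = np.unravel_index(idx, mag.shape)
    return [img[y - half:y + half, x - half:x + half] for y, x in zip(ys, xs)]

# Synthetic test image: a bright square produces strong edges.
img = np.zeros((128, 128), dtype=np.float32)
img[32:96, 32:96] = 1.0
patches = extract_patches_gradient(img, patch_size=32, n_patches=8)
print(len(patches), patches[0].shape)  # 8 (32, 32)
```

A real detector would also enforce a minimum distance between patch centers so samples don't pile up on one edge; this sketch skips that for brevity.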

### Examples

**Single layer, Harris corners:**
```bash
python3 train_cnn.py --input training/input --target training/output \
  --patch-size 32 --patches-per-image 64 --detector harris \
  --layers 1 --kernel-sizes 3 --epochs 2000
```

**Multi-layer, FAST features:**
```bash
python3 train_cnn.py --input training/input --target training/output \
  --patch-size 32 --patches-per-image 128 --detector fast \
  --layers 3 --kernel-sizes 3,5,3 --epochs 5000 --batch-size 16
```

**Edge-focused (gradient detector):**
```bash
python3 train_cnn.py --input training/input --target training/output \
  --patch-size 16 --patches-per-image 96 --detector gradient \
  --layers 2 --kernel-sizes 3,3 --epochs 3000
```

### Benefits

- **Preserves scale:** No resize distortion
- **More samples:** 64 patches × 10 images = 640 samples vs 10
- **Focused learning:** Trains on interesting features, not flat areas
- **Better generalization:** Network sees diverse local patterns

---

## Options

| Option | Default | Description |
|--------|---------|-------------|
| `--input` | *required* | Input image directory |
| `--target` | *required* | Target image directory |
| `--patch-size` | None | Patch size (e.g., 32). Omit for full-image mode |
| `--patches-per-image` | 64 | Patches to extract per image |
| `--detector` | harris | harris\|fast\|shi-tomasi\|gradient |
| `--layers` | 1 | Number of CNN layers |
| `--kernel-sizes` | 3 | Comma-separated (e.g., 3,5,3) |
| `--epochs` | 100 | Training epochs |
| `--batch-size` | 4 | Batch size (use 16 for patches, 8 for full-image) |
| `--learning-rate` | 0.001 | Learning rate |
| `--checkpoint-every` | 0 | Save every N epochs (0=off) |
| `--resume` | None | Resume from checkpoint |
| `--export-only` | None | Export WGSL without training |
| `--infer` | None | Generate ground truth PNG |

---

## Export & Validation

**Export shaders from checkpoint:**
```bash
python3 train_cnn.py --export-only checkpoints/checkpoint_epoch_5000.pth
```

**Generate ground truth for comparison:**
```bash
python3 train_cnn.py --infer input.png \
  --export-only checkpoints/checkpoint_epoch_5000.pth \
  --output ground_truth.png
```

**Auto-generates:**
- `cnn_weights_generated.wgsl` - Weight arrays
- `cnn_layer.wgsl` - Layer shader with correct architecture
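
As a rough idea of what the weight export involves, a conv weight tensor can be flattened into a WGSL constant array. The array name and exact layout below are guesses; `cnn_weights_generated.wgsl` is the authority on the real format.

```python
import numpy as np

def weights_to_wgsl(name, w):
    """Flatten a conv weight tensor into a WGSL const array declaration
    (illustrative only; the script's real output layout may differ)."""
    flat = np.asarray(w, dtype=np.float32).ravel()
    body = ", ".join(f"{v:.6f}" for v in flat)
    return f"const {name}: array<f32, {flat.size}> = array<f32, {flat.size}>({body});"

# A 1x1-kernel layer with 7 inputs and 1 output -> 7 weights.
print(weights_to_wgsl("layer0_w", np.zeros((1, 7, 1, 1))))
```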

---

## Workflow

### 1. Render Raw Frames

```bash
./build/demo64k --headless --duration 5 --output training/input/
```

### 2. Generate Stylized Targets

```bash
python3 training/image_style_processor.py training/input/ training/output/ pencil_sketch
```

**Available styles:** pencil_sketch, ink_drawing, charcoal_pastel, glitch_art, circuit_board, wireframe_topo

### 3. Train CNN

```bash
python3 train_cnn.py \
  --input training/input --target training/output \
  --patch-size 32 --patches-per-image 64 \
  --layers 3 --kernel-sizes 3,5,3 --epochs 5000 --checkpoint-every 1000
```

### 4. Rebuild Demo

```bash
cmake --build build -j4 && ./build/demo64k
```

---

## Architecture

**Input:** 7 channels = [RGBD, UV coords, grayscale] normalized to [-1,1]

**Output:** Grayscale [0,1]

**Layers:**
- **Inner (0..N-2):** Conv2d(7→4) + tanh → RGBD output [-1,1]
- **Final (N-1):** Conv2d(7→1) + clamp(0,1) → Grayscale output

**Coordinate awareness:** Layer 0 receives UV coords for position-dependent effects (vignetting, radial gradients).
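
In PyTorch terms, the stack above might look like the following sketch. Re-concatenating the UV/grayscale channels between layers is an assumption made here to keep every layer's input at 7 channels; the real wiring in `train_cnn.py` may differ.

```python
import torch
import torch.nn as nn

class StylizeCNN(nn.Module):
    """Sketch of the described architecture (hypothetical layer wiring)."""
    def __init__(self, layers=3, kernel_sizes=(3, 5, 3)):
        super().__init__()
        convs = []
        for i, k in enumerate(kernel_sizes):
            out_ch = 1 if i == layers - 1 else 4      # final layer -> grayscale
            convs.append(nn.Conv2d(7, out_ch, k, padding=k // 2))
        self.convs = nn.ModuleList(convs)

    def forward(self, x):
        # x: (B, 7, H, W) = [RGBD, UV, grayscale], normalized to [-1, 1]
        static = x[:, 4:]                             # UV + grayscale channels
        for conv in self.convs[:-1]:
            rgbd = torch.tanh(conv(x))                # inner: 7 -> 4, tanh
            x = torch.cat([rgbd, static], dim=1)      # rebuild 7-channel input
        return self.convs[-1](x).clamp(0.0, 1.0)      # final: 7 -> 1, clamp [0,1]

model = StylizeCNN()
out = model(torch.randn(1, 7, 32, 32))
print(out.shape)  # torch.Size([1, 1, 32, 32])
```

Note how the per-layer kernel sizes map directly to `--kernel-sizes 3,5,3`, and padding of `k // 2` keeps the spatial resolution unchanged through the stack.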

---

## Tips

- **Training data:** 10-50 image pairs recommended
- **Patch size:** 32×32 good balance (16×16 for detail, 64×64 for context)
- **Patches per image:** 64-128 for good coverage
- **Batch size:** Higher for patches (16) vs full-image (8)
- **Checkpoints:** Save every 500-1000 epochs
- **Loss plateaus:** Lower learning rate (0.0001) or add layers
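
`train_cnn.py` only exposes a fixed `--learning-rate`, so lowering it on a plateau currently means resuming from a checkpoint with a smaller value. If you modify the script, PyTorch's built-in `ReduceLROnPlateau` scheduler can automate that drop; a minimal sketch (the model and loop here are stand-ins, only the scheduler wiring is the point):

```python
import torch

# Hypothetical tiny model/loop; the real script's loop is more involved.
model = torch.nn.Conv2d(7, 1, 3, padding=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.1, patience=200)

for epoch in range(3):
    loss = model(torch.randn(4, 7, 32, 32)).abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step(loss.item())           # drops lr 10x after `patience` stalled epochs

print(opt.param_groups[0]["lr"])      # 0.001 (no plateau seen yet)
```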

---

## Requirements

```bash
pip install torch torchvision pillow opencv-python numpy
```

---

## References

- **CNN Effect:** `doc/CNN_EFFECT.md`
- **Timeline:** `doc/SEQUENCE.md`
- **HOWTO:** `doc/HOWTO.md`