Diffstat (limited to 'doc')
-rw-r--r--  doc/CNN_EFFECT.md                   75
-rw-r--r--  doc/CNN_RGBD_GRAYSCALE_SUMMARY.md  134
-rw-r--r--  doc/HOWTO.md                         8
3 files changed, 189 insertions, 28 deletions
diff --git a/doc/CNN_EFFECT.md b/doc/CNN_EFFECT.md
index ae0f38a..b7d157f 100644
--- a/doc/CNN_EFFECT.md
+++ b/doc/CNN_EFFECT.md
@@ -21,27 +21,44 @@ Trainable convolutional neural network layers for artistic stylization (painterl
## Architecture
-### Coordinate-Aware Layer 0
+### RGBD → Grayscale Pipeline
-Layer 0 accepts normalized (x,y) patch center coordinates alongside RGBA samples:
+**Input:** RGBD (RGB + inverse depth D=1/z)
+**Output:** Grayscale (1 channel)
+**Layer Input:** 7 channels = [RGBD, UV coords, grayscale] all normalized to [-1,1]
+
+**Architecture:**
+- **Inner layers (0..N-2):** Conv2d(7→4) - output RGBD
+- **Final layer (N-1):** Conv2d(7→1) - output grayscale
```wgsl
-fn cnn_conv3x3_with_coord(
+// Inner layers: 7→4 (RGBD output)
+fn cnn_conv3x3_7to4(
tex: texture_2d<f32>,
samp: sampler,
- uv: vec2<f32>, # Center position [0,1]
+ uv: vec2<f32>,
resolution: vec2<f32>,
- rgba_weights: array<mat4x4<f32>, 9>, # 9 samples × 4×4 matrix
- coord_weights: mat2x4<f32>, # 2 coords → 4 outputs
- bias: vec4<f32>
+    original: vec4<f32>,               // Original RGBD [0,1]
+    weights: array<array<f32, 8>, 36>  // 9 positions × 4 outputs × (7 weights + bias)
) -> vec4<f32>
-```
-**Input structure:** 9 RGBA samples (36 values) + 1 xy coordinate (2 values) = 38 inputs → 4 outputs
+// Final layer: 7→1 (grayscale output)
+fn cnn_conv3x3_7to1(
+ tex: texture_2d<f32>,
+ samp: sampler,
+ uv: vec2<f32>,
+ resolution: vec2<f32>,
+ original: vec4<f32>,
+    weights: array<array<f32, 8>, 9>   // 9 positions × (7 weights + bias)
+) -> f32
+```
-**Size impact:** +32B coord weights, kernel-agnostic
+**Input structure:** 9 samples × 7 channels = 63 inputs → 4 outputs (inner) or 1 output (final)
+
+**Input normalization (all to [-1,1]):**
+- RGBD: `(rgbd - 0.5) * 2`
+- UV coords: `(uv - 0.5) * 2`
+- Grayscale: `(0.2126*R + 0.7152*G + 0.0722*B - 0.5) * 2`
-**Use cases:** Position-dependent stylization (vignettes, corner darkening, radial gradients)
+**Activation:** tanh for inner layers, none for final layer
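+
+The same normalization can be mirrored on the training side. A minimal sketch of a helper that assembles the 7-channel input, assuming `rgbd` and `uv` tensors in [0,1] (the helper name and shapes are illustrative, not the exact training code):
+
+```python
+import torch
+
+def build_layer_input(rgbd: torch.Tensor, uv: torch.Tensor) -> torch.Tensor:
+    """rgbd: (B, 4, H, W) in [0,1]; uv: (B, 2, H, W) in [0,1]."""
+    gray = 0.2126 * rgbd[:, 0:1] + 0.7152 * rgbd[:, 1:2] + 0.0722 * rgbd[:, 2:3]
+    return torch.cat([(rgbd - 0.5) * 2.0,   # RGBD -> [-1,1]
+                      (uv - 0.5) * 2.0,     # UV   -> [-1,1]
+                      (gray - 0.5) * 2.0],  # gray -> [-1,1]
+                     dim=1)                 # (B, 7, H, W)
+```
+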
### Multi-Layer Architecture
@@ -80,18 +97,15 @@ workspaces/main/shaders/cnn/
### 1. Prepare Training Data
Collect input/target image pairs:
-- **Input:** Raw 3D render
-- **Target:** Artistic style (hand-painted, filtered, stylized)
+- **Input:** RGBA (RGB + depth as alpha channel, D=1/z)
+- **Target:** Grayscale stylized output
```bash
-training/input/img_000.png # Raw render
-training/output/img_000.png # Stylized target
+training/input/img_000.png # RGBA render (RGB + depth)
+training/output/img_000.png # Grayscale target
```
-Use `image_style_processor.py` to generate targets:
-```bash
-python3 training/image_style_processor.py input/ output/ pencil_sketch
-```
+**Note:** Input images must be RGBA where alpha = inverse depth (1/z)
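+
+A minimal packing sketch, assuming the render and a camera-space depth buffer are already on disk (the input file names, and the z >= 1 assumption that keeps D = 1/z inside [0,1], are illustrative):
+
+```python
+import numpy as np
+from PIL import Image
+
+rgb = np.asarray(Image.open("render_000.png").convert("RGB"))  # (H, W, 3) uint8
+z = np.load("depth_000.npy")                                   # (H, W) float32 depth
+inv_depth = np.clip(1.0 / np.maximum(z, 1e-6), 0.0, 1.0)       # D = 1/z in [0,1]
+alpha = (inv_depth * 255.0).round().astype(np.uint8)
+Image.fromarray(np.dstack([rgb, alpha]), "RGBA").save("training/input/img_000.png")
+```
+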
### 2. Train Network
@@ -245,20 +259,25 @@ Expands to:
**Weight Storage:**
-**Layer 0 (coordinate-aware):**
+**Inner layers (7→4 RGBD output):**
```wgsl
-const rgba_weights_layer0: array<mat4x4<f32>, 9> = array(...);
-const coord_weights_layer0 = mat2x4<f32>(
- 0.1, -0.2, 0.0, 0.0, # x-coord weights
- -0.1, 0.0, 0.2, 0.0 # y-coord weights
+// Structure: array<array<f32, 8>, 36>
+// 9 positions × 4 output channels, each with 7 weights + bias
+const weights_layer0: array<array<f32, 8>, 36> = array(
+ array<f32, 8>(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0), // pos0_ch0
+ array<f32, 8>(w1_r, w1_g, w1_b, w1_d, w1_u, w1_v, w1_gray, bias1), // pos0_ch1
+ // ... 34 more entries
);
-const bias_layer0 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
```
-**Layers 1+ (standard):**
+**Final layer (7→1 grayscale output):**
```wgsl
-const weights_layer1: array<mat4x4<f32>, 9> = array(...);
-const bias_layer1 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
+// Structure: array<array<f32, 8>, 9>
+// 9 positions, each with 7 weights + bias
+const weights_layerN: array<array<f32, 8>, 9> = array(
+ array<f32, 8>(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0), // pos0
+ // ... 8 more entries
+);
```
---
diff --git a/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md
new file mode 100644
index 0000000..4c13693
--- /dev/null
+++ b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md
@@ -0,0 +1,134 @@
+# CNN RGBD→Grayscale Architecture Implementation
+
+## Summary
+
+Implemented CNN architecture upgrade: RGBD input → grayscale output with 7-channel augmented input.
+
+## Changes Made
+
+### Architecture
+
+**Input:** RGBD (4 channels: RGB + inverse depth D=1/z)
+**Output:** Grayscale (1 channel)
+**Layer Input:** 7 channels = [RGBD, UV coords, grayscale] all normalized to [-1,1]
+
+**Layer Configuration:**
+- Inner layers (0..N-2): Conv2d(7→4) - output RGBD with tanh activation
+- Final layer (N-1): Conv2d(7→1) - output grayscale, no activation
+
+### Input Normalization (all to [-1,1])
+
+- **RGBD:** `(rgbd - 0.5) * 2`
+- **UV coords:** `(uv - 0.5) * 2`
+- **Grayscale:** `(0.2126*R + 0.7152*G + 0.0722*B - 0.5) * 2`
+
+**Rationale:** Zero-centered inputs suit the tanh activation and improve gradient flow.
+
+### Modified Files
+
+**Training (`/Users/skal/demo/training/train_cnn.py`):**
+1. Removed `CoordConv2d` class
+2. Updated `SimpleCNN`:
+ - Inner layers: `Conv2d(7, 4)` - RGBD output
+ - Final layer: `Conv2d(7, 1)` - grayscale output
+3. Updated `forward()` (see the sketch after this list):
+ - Normalize RGBD/coords/gray to [-1,1]
+ - Concatenate 7-channel input for each layer
+ - Apply tanh (inner) or none (final)
+ - Denormalize final output
+4. Updated `export_weights_to_wgsl()`:
+ - Inner: `array<array<f32, 8>, 36>` (9 pos × 4 ch × 8 values)
+ - Final: `array<array<f32, 8>, 9>` (9 pos × 8 values)
+5. Updated `generate_layer_shader()`:
+ - Use `cnn_conv3x3_7to4` for inner layers
+ - Use `cnn_conv3x3_7to1` for final layer
+ - Denormalize outputs from [-1,1] to [0,1]
+6. Updated `ImagePairDataset`:
+ - Load RGBA input (was RGB)
+
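+A minimal sketch of the `forward()` structure described in item 3 (the UV grid construction and helper names are illustrative, not the exact implementation):
+
+```python
+import torch
+import torch.nn as nn
+
+class SimpleCNN(nn.Module):
+    def __init__(self, num_layers: int = 3):
+        super().__init__()
+        self.inner = nn.ModuleList(
+            [nn.Conv2d(7, 4, 3, padding=1) for _ in range(num_layers - 1)])
+        self.final = nn.Conv2d(7, 1, 3, padding=1)
+
+    @staticmethod
+    def _augment(x, uv):
+        gray = 0.2126 * x[:, 0:1] + 0.7152 * x[:, 1:2] + 0.0722 * x[:, 2:3]
+        return torch.cat([(x - 0.5) * 2, (uv - 0.5) * 2, (gray - 0.5) * 2], dim=1)
+
+    def forward(self, rgbd: torch.Tensor) -> torch.Tensor:
+        b, _, h, w = rgbd.shape
+        yy, xx = torch.meshgrid(torch.linspace(0, 1, h), torch.linspace(0, 1, w),
+                                indexing="ij")
+        uv = torch.stack([xx, yy]).expand(b, 2, h, w).to(rgbd)
+        x = rgbd                                    # RGBD in [0,1]
+        for conv in self.inner:
+            x = torch.tanh(conv(self._augment(x, uv))) * 0.5 + 0.5  # back to [0,1]
+        out = self.final(self._augment(x, uv))      # final layer: no activation
+        return out * 0.5 + 0.5                      # denormalize to [0,1]
+```
+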
+**Shaders (`/Users/skal/demo/workspaces/main/shaders/cnn/cnn_conv3x3.wgsl`):**
+1. Added `cnn_conv3x3_7to4()`:
+ - 7-channel input: [RGBD, uv_x, uv_y, gray]
+ - 4-channel output: RGBD
+ - Weights: `array<array<f32, 8>, 36>`
+2. Added `cnn_conv3x3_7to1()`:
+ - 7-channel input: [RGBD, uv_x, uv_y, gray]
+ - 1-channel output: grayscale
+ - Weights: `array<array<f32, 8>, 9>`
+
+**Documentation (`/Users/skal/demo/doc/CNN_EFFECT.md`):**
+1. Updated architecture section with RGBD→grayscale pipeline
+2. Updated training data requirements (RGBA input)
+3. Updated weight storage format
+
+### No C++ Changes
+
+CNNLayerParams and bind groups remain unchanged.
+
+## Data Flow
+
+1. Layer 0 captures original RGBD to `captured_frame`
+2. Each layer:
+ - Samples previous layer output (RGBD in [0,1])
+ - Normalizes RGBD to [-1,1]
+ - Computes UV coords and grayscale, normalizes to [-1,1]
+ - Concatenates 7-channel input
+ - Applies convolution with layer-specific weights
+   - Applies tanh activation (inner layers only)
+   - Yields RGBD (inner) or grayscale (final) output in [-1,1]
+ - Denormalizes to [0,1] for texture storage
+ - Blends with original
+
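+The final blend step amounts to a linear interpolation in [0,1] texture space. A trivial sketch (the `blend` factor stands in for whatever per-layer parameter `CNNLayerParams` carries; the name is hypothetical):
+
+```python
+def blend_with_original(original, layer_out, blend):
+    # blend = 0 keeps the captured frame, blend = 1 keeps the CNN output
+    return (1.0 - blend) * original + blend * layer_out
+```
+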
+## Next Steps
+
+1. **Prepare RGBD training data:**
+ - Input: RGBA images (RGB + depth in alpha)
+ - Target: Grayscale stylized output
+
+2. **Train network:**
+ ```bash
+ python3 training/train_cnn.py \
+ --input training/input \
+ --target training/output \
+ --layers 3 \
+ --epochs 1000
+ ```
+
+3. **Verify generated shaders:**
+ - Check `cnn_weights_generated.wgsl` structure
+ - Check `cnn_layer.wgsl` uses new conv functions
+
+4. **Test in demo:**
+ ```bash
+ cmake --build build -j4
+ ./build/demo64k
+ ```
+
+## Design Rationale
+
+**Why [-1,1] normalization?**
+- Centered inputs for tanh (operates best around 0)
+- Better gradient flow
+- Standard ML practice for normalized data
+
+**Why RGBD throughout vs RGB?**
+- Depth information propagates through network
+- Enables depth-aware stylization
+- Consistent 4-channel processing
+
+**Why 7-channel input?**
+- Coordinates: position-dependent effects (vignettes)
+- Grayscale: luminance-aware processing
+- RGBD: full color+depth information
+- Enables richer feature learning
+
+## Testing Checklist
+
+- [ ] Train network with RGBD input data
+- [ ] Verify `cnn_weights_generated.wgsl` structure
+- [ ] Verify `cnn_layer.wgsl` uses `7to4`/`7to1` functions
+- [ ] Build demo without errors
+- [ ] Visual test: inner layers show RGBD evolution
+- [ ] Visual test: final layer produces grayscale
+- [ ] Visual test: blending works correctly
+- [ ] Compare quality with previous RGB→RGB architecture
diff --git a/doc/HOWTO.md b/doc/HOWTO.md
index bdc0214..2c813f7 100644
--- a/doc/HOWTO.md
+++ b/doc/HOWTO.md
@@ -86,6 +86,14 @@ make run_util_tests # Utility tests
---
+## Training
+
+```bash
./training/train_cnn.py --layers 3 --kernel_sizes 3,5,3 \
    --epochs 10000 --batch_size 8 \
    --input training/input/ --target training/output/ \
    --checkpoint-every 1000
+```
+
+---
+
## Timeline
Edit `workspaces/main/timeline.seq`: