author: skal <pascal.massimino@gmail.com>, 2026-02-10 16:44:39 +0100
committer: skal <pascal.massimino@gmail.com>, 2026-02-10 16:44:39 +0100
commit: 61104d5b9e1774c11f0dba3b6d6018dabc2bce8f
tree: 882e642721984cc921cbe5678fe7905721a2ad40 (path: /doc/CNN_RGBD_GRAYSCALE_SUMMARY.md)
parent: 3942653de11542acc4892470243a8a6bf8d5c4f7
feat: CNN RGBD→grayscale with 7-channel augmented input

Upgrade CNN architecture to process RGBD input, output grayscale, with 7-channel layer inputs (RGBD + UV coords + grayscale).

Architecture changes:
- Inner layers: Conv2d(7→4) output RGBD
- Final layer: Conv2d(7→1) output grayscale
- All inputs normalized to [-1,1] for tanh activation
- Removed CoordConv2d in favor of unified 7-channel input

Training (train_cnn.py):
- SimpleCNN: 7→4 (inner), 7→1 (final) architecture
- Forward: Normalize RGBD/coords/gray to [-1,1]
- Weight export: array<array<f32, 8>, 36> (inner), array<array<f32, 8>, 9> (final)
- Dataset: Load RGBA (RGBD) input

Shaders (cnn_conv3x3.wgsl):
- Added cnn_conv3x3_7to4: 7-channel input → RGBD output
- Added cnn_conv3x3_7to1: 7-channel input → grayscale output
- Both normalize inputs and use flattened weight arrays

Documentation:
- CNN_EFFECT.md: Updated architecture, training, weight format
- CNN_RGBD_GRAYSCALE_SUMMARY.md: Implementation summary
- HOWTO.md: Added training command example

Next: Train with RGBD input data

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Diffstat (limited to 'doc/CNN_RGBD_GRAYSCALE_SUMMARY.md')
-rw-r--r-- doc/CNN_RGBD_GRAYSCALE_SUMMARY.md | 134
1 files changed, 134 insertions, 0 deletions
diff --git a/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md
new file mode 100644
index 0000000..4c13693
--- /dev/null
+++ b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md
@@ -0,0 +1,134 @@
+# CNN RGBD→Grayscale Architecture Implementation
+
+## Summary
+
+Implemented CNN architecture upgrade: RGBD input → grayscale output with 7-channel augmented input.
+
+## Changes Made
+
+### Architecture
+
+**Input:** RGBD (4 channels: RGB + inverse depth D=1/z)
+**Output:** Grayscale (1 channel)
+**Layer Input:** 7 channels = [RGBD, UV coords, grayscale] all normalized to [-1,1]
+
+**Layer Configuration:**
+- Inner layers (0..N-2): Conv2d(7→4) - output RGBD with tanh activation
+- Final layer (N-1): Conv2d(7→1) - output grayscale, no activation
+
+### Input Normalization (all to [-1,1])
+
+- **RGBD:** `(rgbd - 0.5) * 2`
+- **UV coords:** `(uv - 0.5) * 2`
+- **Grayscale:** `(0.2126*R + 0.7152*G + 0.0722*B - 0.5) * 2`
+
+**Rationale:** Zero-centered inputs for tanh activation, better gradient flow.
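
The normalization rules above can be sketched per pixel in plain Python. This is an illustration only, not code from `train_cnn.py`: the helper names `to_signed` and `build_layer_input` are hypothetical, and every input is assumed to already lie in [0,1]. The channel order follows the `[RGBD, uv_x, uv_y, gray]` layout listed for the shaders.

```python
def to_signed(x):
    """Map a value from [0, 1] to [-1, 1]."""
    return (x - 0.5) * 2.0


def build_layer_input(r, g, b, d, u, v):
    """Return the 7-channel input [R, G, B, D, u, v, gray], each in [-1, 1].

    Hypothetical helper: all six arguments are assumed to be in [0, 1].
    """
    # Grayscale uses the same luma weights as the formula in the document.
    gray = 0.2126 * r + 0.7152 * g + 0.0722 * b
    return [to_signed(c) for c in (r, g, b, d, u, v, gray)]


if __name__ == "__main__":
    # A mid-gray pixel at the image center maps to (approximately) all zeros.
    print(build_layer_input(0.5, 0.5, 0.5, 0.5, 0.5, 0.5))
```

Because the luma weights sum to 1, grayscale stays in [0,1] whenever RGB does, so the same `(x - 0.5) * 2` mapping applies to all seven channels.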
+
+### Modified Files
+
+**Training (`/Users/skal/demo/training/train_cnn.py`):**
+1. Removed `CoordConv2d` class
+2. Updated `SimpleCNN`:
+ - Inner layers: `Conv2d(7, 4)` - RGBD output
+ - Final layer: `Conv2d(7, 1)` - grayscale output
+3. Updated `forward()`:
+ - Normalize RGBD/coords/gray to [-1,1]
+ - Concatenate 7-channel input for each layer
+ - Apply tanh (inner layers) or no activation (final layer)
+ - Denormalize final output
+4. Updated `export_weights_to_wgsl()`:
+ - Inner: `array<array<f32, 8>, 36>` (9 pos × 4 ch × 8 values)
+ - Final: `array<array<f32, 8>, 9>` (9 pos × 8 values)
+5. Updated `generate_layer_shader()`:
+ - Use `cnn_conv3x3_7to4` for inner layers
+ - Use `cnn_conv3x3_7to1` for final layer
+ - Denormalize outputs from [-1,1] to [0,1]
+6. Updated `ImagePairDataset`:
+ - Load RGBA input (was RGB)
+
+**Shaders (`/Users/skal/demo/workspaces/main/shaders/cnn/cnn_conv3x3.wgsl`):**
+1. Added `cnn_conv3x3_7to4()`:
+ - 7-channel input: [RGBD, uv_x, uv_y, gray]
+ - 4-channel output: RGBD
+ - Weights: `array<array<f32, 8>, 36>`
+2. Added `cnn_conv3x3_7to1()`:
+ - 7-channel input: [RGBD, uv_x, uv_y, gray]
+ - 1-channel output: grayscale
+ - Weights: `array<array<f32, 8>, 9>`
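
One way to picture the flattened weight arrays is the sketch below. It is not the actual `export_weights_to_wgsl()` code, and it rests on two labeled assumptions: rows are ordered position-major (`index = pos * out_channels + out_ch`, read off the "9 pos × 4 ch × 8 values" note), and the eighth slot of each row is simple padding. The real export may order rows differently or use that slot for something else, such as a bias. The name `flatten_conv_weights` is hypothetical.

```python
def flatten_conv_weights(weights, pad=0.0):
    """Flatten weights[out_ch][in_ch][ky][kx] into rows of 8 floats.

    Assumptions (not confirmed by the source):
      * one row per (kernel position, output channel), position-major
      * 7 input-channel weights per row, plus one padding slot
    """
    out_channels = len(weights)
    rows = []
    for pos in range(9):               # 3x3 kernel positions
        ky, kx = divmod(pos, 3)
        for oc in range(out_channels):
            row = [weights[oc][ic][ky][kx] for ic in range(7)]
            row.append(pad)            # eighth slot: padding (assumed)
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Dummy weights: 4 output channels, 7 input channels, 3x3 kernel.
    inner = [[[[0.0] * 3 for _ in range(3)] for _ in range(7)] for _ in range(4)]
    final = inner[:1]
    print(len(flatten_conv_weights(inner)), len(flatten_conv_weights(final)))
```

With 4 output channels this yields 36 rows and with 1 output channel it yields 9, matching the `array<array<f32, 8>, 36>` and `array<array<f32, 8>, 9>` shapes.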
+
+**Documentation (`/Users/skal/demo/doc/CNN_EFFECT.md`):**
+1. Updated architecture section with RGBD→grayscale pipeline
+2. Updated training data requirements (RGBA input)
+3. Updated weight storage format
+
+### No C++ Changes
+
+CNNLayerParams and bind groups remain unchanged.
+
+## Data Flow
+
+1. Layer 0 captures original RGBD to `captured_frame`
+2. Each layer:
+ - Samples previous layer output (RGBD in [0,1])
+ - Normalizes RGBD to [-1,1]
+ - Computes UV coords and grayscale, normalizes to [-1,1]
+ - Concatenates 7-channel input
+ - Applies convolution with layer-specific weights
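+ - Applies tanh (inner layers only), bounding the RGBD output to [-1,1]
+ - Leaves the final layer's grayscale output unactivated, so it is not bounded to [-1,1] by construction
+ - Denormalizes from [-1,1] to [0,1] for texture storage
+ - Blends with the original frame captured by layer 0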
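
The tail of the per-layer flow (activation, denormalization, blend) can be sketched as follows. This is an illustration rather than the generated shader logic: `finish_layer` and `blend_amount` are hypothetical names, and the blend is assumed to be a plain linear mix, which the source does not specify.

```python
import math


def finish_layer(conv_out, original, blend_amount, use_tanh=True):
    """Post-process one pixel's convolution output.

    conv_out     : raw convolution results, one float per output channel
    original     : the original pixel's channels in [0, 1]
    blend_amount : 0.0 keeps the original, 1.0 keeps the CNN result
                   (hypothetical parameter; a linear mix is assumed)
    use_tanh     : True for inner layers, False for the final layer
    """
    activated = [math.tanh(x) if use_tanh else x for x in conv_out]
    # Denormalize from [-1, 1] back to [0, 1] for texture storage.
    stored = [a * 0.5 + 0.5 for a in activated]
    # Linear blend between the original and the processed value.
    return [o + (s - o) * blend_amount for o, s in zip(original, stored)]


if __name__ == "__main__":
    print(finish_layer([0.0, 0.0, 0.0, 0.0], [0.2, 0.4, 0.6, 0.8], 0.5))
```

With `use_tanh=False` the stored value can leave [0,1], so a final layer written this way would need clamping or a target that keeps outputs in range.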
+
+## Next Steps
+
+1. **Prepare RGBD training data:**
+ - Input: RGBA images (RGB + depth in alpha)
+ - Target: Grayscale stylized output
+
+2. **Train network:**
+ ```bash
+ python3 training/train_cnn.py \
+ --input training/input \
+ --target training/output \
+ --layers 3 \
+ --epochs 1000
+ ```
+
+3. **Verify generated shaders:**
+ - Check `cnn_weights_generated.wgsl` structure
+ - Check `cnn_layer.wgsl` uses new conv functions
+
+4. **Test in demo:**
+ ```bash
+ cmake --build build -j4
+ ./build/demo64k
+ ```
+
+## Design Rationale
+
+**Why [-1,1] normalization?**
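+- tanh is steepest near 0 and saturates for large |x|, so zero-centered inputs keep activations in its responsive range
+- Gradients stay larger away from the saturated tails, which helps training
+- Input normalization is standard ML practice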
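
The gradient argument can be checked numerically with the derivative of tanh, which is 1 - tanh(x)^2. The snippet below is a small illustration using only the standard library; `tanh_grad` is a hypothetical helper name.

```python
import math


def tanh_grad(x):
    """Derivative of tanh at x: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


if __name__ == "__main__":
    # The slope is largest at 0 and shrinks quickly as |x| grows.
    for x in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(f"x = {x:3.1f}   tanh'(x) = {tanh_grad(x):.4f}")
```

The slope is exactly 1 at x = 0 and falls below 0.1 by x = 2, which is why inputs centered on 0 pass more gradient back than inputs clustered in one tail.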
+
+**Why RGBD throughout vs RGB?**
+- Depth information propagates through network
+- Enables depth-aware stylization
+- Consistent 4-channel processing
+
+**Why 7-channel input?**
+- Coordinates: position-dependent effects (vignettes)
+- Grayscale: luminance-aware processing
+- RGBD: full color+depth information
+- Enables richer feature learning
+
+## Testing Checklist
+
+- [ ] Train network with RGBD input data
+- [ ] Verify `cnn_weights_generated.wgsl` structure
+- [ ] Verify `cnn_layer.wgsl` uses `7to4`/`7to1` functions
+- [ ] Build demo without errors
+- [ ] Visual test: inner layers show RGBD evolution
+- [ ] Visual test: final layer produces grayscale
+- [ ] Visual test: blending works correctly
+- [ ] Compare quality with previous RGB→RGB architecture