diff options
| author | skal <pascal.massimino@gmail.com> | 2026-02-10 16:44:39 +0100 |
|---|---|---|
| committer | skal <pascal.massimino@gmail.com> | 2026-02-10 16:44:39 +0100 |
| commit | 61104d5b9e1774c11f0dba3b6d6018dabc2bce8f (patch) | |
| tree | 882e642721984cc921cbe5678fe7905721a2ad40 /doc/CNN_RGBD_GRAYSCALE_SUMMARY.md | |
| parent | 3942653de11542acc4892470243a8a6bf8d5c4f7 (diff) | |
feat: CNN RGBD→grayscale with 7-channel augmented input
Upgrade CNN architecture to process RGBD input, output grayscale, with
7-channel layer inputs (RGBD + UV coords + grayscale).
Architecture changes:
- Inner layers: Conv2d(7→4) output RGBD
- Final layer: Conv2d(7→1) output grayscale
- All inputs normalized to [-1,1] for tanh activation
- Removed CoordConv2d in favor of unified 7-channel input
Training (train_cnn.py):
- SimpleCNN: 7→4 (inner), 7→1 (final) architecture
- Forward: Normalize RGBD/coords/gray to [-1,1]
- Weight export: array<array<f32, 8>, 36> (inner), array<f32, 8>, 9> (final)
- Dataset: Load RGBA (RGBD) input
Shaders (cnn_conv3x3.wgsl):
- Added cnn_conv3x3_7to4: 7-channel input → RGBD output
- Added cnn_conv3x3_7to1: 7-channel input → grayscale output
- Both normalize inputs and use flattened weight arrays
Documentation:
- CNN_EFFECT.md: Updated architecture, training, weight format
- CNN_RGBD_GRAYSCALE_SUMMARY.md: Implementation summary
- HOWTO.md: Added training command example
Next: Train with RGBD input data
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Diffstat (limited to 'doc/CNN_RGBD_GRAYSCALE_SUMMARY.md')
| -rw-r--r-- | doc/CNN_RGBD_GRAYSCALE_SUMMARY.md | 134 |
1 files changed, 134 insertions, 0 deletions
diff --git a/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md new file mode 100644 index 0000000..4c13693 --- /dev/null +++ b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md @@ -0,0 +1,134 @@ +# CNN RGBD→Grayscale Architecture Implementation + +## Summary + +Implemented CNN architecture upgrade: RGBD input → grayscale output with 7-channel augmented input. + +## Changes Made + +### Architecture + +**Input:** RGBD (4 channels: RGB + inverse depth D=1/z) +**Output:** Grayscale (1 channel) +**Layer Input:** 7 channels = [RGBD, UV coords, grayscale] all normalized to [-1,1] + +**Layer Configuration:** +- Inner layers (0..N-2): Conv2d(7→4) - output RGBD with tanh activation +- Final layer (N-1): Conv2d(7→1) - output grayscale, no activation + +### Input Normalization (all to [-1,1]) + +- **RGBD:** `(rgbd - 0.5) * 2` +- **UV coords:** `(uv - 0.5) * 2` +- **Grayscale:** `(0.2126*R + 0.7152*G + 0.0722*B - 0.5) * 2` + +**Rationale:** Zero-centered inputs for tanh activation, better gradient flow. + +### Modified Files + +**Training (`/Users/skal/demo/training/train_cnn.py`):** +1. Removed `CoordConv2d` class +2. Updated `SimpleCNN`: + - Inner layers: `Conv2d(7, 4)` - RGBD output + - Final layer: `Conv2d(7, 1)` - grayscale output +3. Updated `forward()`: + - Normalize RGBD/coords/gray to [-1,1] + - Concatenate 7-channel input for each layer + - Apply tanh (inner) or none (final) + - Denormalize final output +4. Updated `export_weights_to_wgsl()`: + - Inner: `array<array<f32, 8>, 36>` (9 pos × 4 ch × 8 values) + - Final: `array<array<f32, 8>, 9>` (9 pos × 8 values) +5. Updated `generate_layer_shader()`: + - Use `cnn_conv3x3_7to4` for inner layers + - Use `cnn_conv3x3_7to1` for final layer + - Denormalize outputs from [-1,1] to [0,1] +6. Updated `ImagePairDataset`: + - Load RGBA input (was RGB) + +**Shaders (`/Users/skal/demo/workspaces/main/shaders/cnn/cnn_conv3x3.wgsl`):** +1. Added `cnn_conv3x3_7to4()`: + - 7-channel input: [RGBD, uv_x, uv_y, gray] + - 4-channel output: RGBD + - Weights: `array<array<f32, 8>, 36>` +2. Added `cnn_conv3x3_7to1()`: + - 7-channel input: [RGBD, uv_x, uv_y, gray] + - 1-channel output: grayscale + - Weights: `array<array<f32, 8>, 9>` + +**Documentation (`/Users/skal/demo/doc/CNN_EFFECT.md`):** +1. Updated architecture section with RGBD→grayscale pipeline +2. Updated training data requirements (RGBA input) +3. Updated weight storage format + +### No C++ Changes + +CNNLayerParams and bind groups remain unchanged. + +## Data Flow + +1. Layer 0 captures original RGBD to `captured_frame` +2. Each layer: + - Samples previous layer output (RGBD in [0,1]) + - Normalizes RGBD to [-1,1] + - Computes UV coords and grayscale, normalizes to [-1,1] + - Concatenates 7-channel input + - Applies convolution with layer-specific weights + - Outputs RGBD (inner) or grayscale (final) in [-1,1] + - Applies tanh (inner only) + - Denormalizes to [0,1] for texture storage + - Blends with original + +## Next Steps + +1. **Prepare RGBD training data:** + - Input: RGBA images (RGB + depth in alpha) + - Target: Grayscale stylized output + +2. **Train network:** + ```bash + python3 training/train_cnn.py \ + --input training/input \ + --target training/output \ + --layers 3 \ + --epochs 1000 + ``` + +3. **Verify generated shaders:** + - Check `cnn_weights_generated.wgsl` structure + - Check `cnn_layer.wgsl` uses new conv functions + +4. **Test in demo:** + ```bash + cmake --build build -j4 + ./build/demo64k + ``` + +## Design Rationale + +**Why [-1,1] normalization?** +- Centered inputs for tanh (operates best around 0) +- Better gradient flow +- Standard ML practice for normalized data + +**Why RGBD throughout vs RGB?** +- Depth information propagates through network +- Enables depth-aware stylization +- Consistent 4-channel processing + +**Why 7-channel input?** +- Coordinates: position-dependent effects (vignettes) +- Grayscale: luminance-aware processing +- RGBD: full color+depth information +- Enables richer feature learning + +## Testing Checklist + +- [ ] Train network with RGBD input data +- [ ] Verify `cnn_weights_generated.wgsl` structure +- [ ] Verify `cnn_layer.wgsl` uses `7to4`/`7to1` functions +- [ ] Build demo without errors +- [ ] Visual test: inner layers show RGBD evolution +- [ ] Visual test: final layer produces grayscale +- [ ] Visual test: blending works correctly +- [ ] Compare quality with previous RGB→RGB architecture |
