diff options
| author | skal <pascal.massimino@gmail.com> | 2026-02-15 18:52:48 +0100 |
|---|---|---|
| committer | skal <pascal.massimino@gmail.com> | 2026-02-15 18:52:48 +0100 |
| commit | d4b67e2f6ab48ab9ec658140be4f1999f604559a (patch) | |
| tree | 2502b0dc89748f7cfe674d3c177bd1528ce1c231 /doc/CNN_RGBD_GRAYSCALE_SUMMARY.md | |
| parent | 161a59fa50bb92e3664c389fa03b95aefe349b3f (diff) | |
archive(cnn): move CNN v1 to cnn_v1/ subdirectory
Consolidate CNN v1 (CNNEffect) into dedicated directory:
- C++ effect: src/effects → cnn_v1/src/
- Shaders: workspaces/main/shaders/cnn → cnn_v1/shaders/
- Training: training/train_cnn.py → cnn_v1/training/
- Docs: doc/CNN*.md → cnn_v1/docs/
Updated all references:
- CMake source list
- C++ includes (relative paths: ../../cnn_v1/src/)
- Asset paths (../../cnn_v1/shaders/)
- Documentation cross-references
CNN v1 remains active in timeline. For new work, use CNN v2 with
enhanced features (7D static, storage buffer, sigmoid activation).
Tests: 34/34 passing (100%)
Diffstat (limited to 'doc/CNN_RGBD_GRAYSCALE_SUMMARY.md')
| -rw-r--r-- | doc/CNN_RGBD_GRAYSCALE_SUMMARY.md | 136 |
1 file changed, 0 insertions, 136 deletions
diff --git a/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md
deleted file mode 100644
index 3439f2c..0000000
--- a/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md
+++ /dev/null
@@ -1,136 +0,0 @@

# CNN RGBD→Grayscale Architecture Implementation

## Summary

Implemented CNN architecture upgrade: RGBD input → grayscale output with 7-channel augmented input.

## Changes Made

### Architecture

**Input:** RGBD (4 channels: RGB + inverse depth D=1/z)
**Output:** Grayscale (1 channel)
**Layer Input:** 7 channels = [RGBD, UV coords, grayscale], all normalized to [-1,1]

**Layer Configuration:**
- Inner layers (0..N-2): Conv2d(7→4) - output RGBD with tanh activation
- Final layer (N-1): Conv2d(7→1) - output grayscale, no activation

### Input Normalization (all to [-1,1])

- **RGBD:** `(rgbd - 0.5) * 2`
- **UV coords:** `(uv - 0.5) * 2`
- **Grayscale:** `dot(original.rgb, vec3<f32>(0.2126, 0.7152, 0.0722))` (computed once, passed as a parameter)

**Rationale:** Zero-centered inputs suit the tanh activation and improve gradient flow.

### Modified Files

**Training (`/Users/skal/demo/training/train_cnn.py`):**
1. Removed the `CoordConv2d` class
2. Updated `SimpleCNN`:
   - Inner layers: `Conv2d(7, 4)` - RGBD output
   - Final layer: `Conv2d(7, 1)` - grayscale output
3. Updated `forward()`:
   - Normalize RGBD/coords/gray to [-1,1]
   - Concatenate the 7-channel input for each layer
   - Apply tanh (inner) or no activation (final)
   - Denormalize the final output
4. Updated `export_weights_to_wgsl()`:
   - Inner: `array<array<f32, 8>, 36>` (9 positions × 4 channels × 8 values)
   - Final: `array<array<f32, 8>, 9>` (9 positions × 8 values)
5. Updated `generate_layer_shader()`:
   - Use `cnn_conv3x3_7to4` for inner layers
   - Use `cnn_conv3x3_7to1` for the final layer
   - Denormalize outputs from [-1,1] to [0,1]
6. Updated `ImagePairDataset`:
   - Load RGBA input (was RGB)

**Shaders (`/Users/skal/demo/workspaces/main/shaders/cnn/cnn_conv3x3.wgsl`):**
1. Added `cnn_conv3x3_7to4()`:
   - 7-channel input: [RGBD, uv_x, uv_y, gray] (gray passed as a parameter)
   - 4-channel output: RGBD
   - Weights: `array<array<f32, 8>, 36>`
2. Added `cnn_conv3x3_7to1()`:
   - 7-channel input: [RGBD, uv_x, uv_y, gray] (gray passed as a parameter)
   - 1-channel output: grayscale
   - Weights: `array<array<f32, 8>, 9>`
3. Optimized: gray is computed once in the caller using `dot()`, not per function

**Documentation (`/Users/skal/demo/doc/CNN_EFFECT.md`):**
1. Updated the architecture section with the RGBD→grayscale pipeline
2. Updated the training data requirements (RGBA input)
3. Updated the weight storage format

### No C++ Changes

`CNNLayerParams` and the bind groups remain unchanged.

## Data Flow

1. Layer 0 captures the original RGBD to `captured_frame`
2. Each layer:
   - Samples the previous layer's output (RGBD in [0,1])
   - Normalizes RGBD to [-1,1]
   - Computes gray once using `dot()` (at the `fs_main` level)
   - Normalizes UV coords to [-1,1] (inside the conv functions)
   - Concatenates the 7-channel input
   - Applies the convolution with layer-specific weights
   - Outputs RGBD (inner) or grayscale (final) in [-1,1]
   - Applies tanh (inner layers only)
   - Denormalizes to [0,1] for texture storage
   - Blends with the original

## Next Steps

1. **Prepare RGBD training data:**
   - Input: RGBA images (RGB + depth in alpha)
   - Target: grayscale stylized output

2. **Train the network:**
   ```bash
   python3 training/train_cnn.py \
       --input training/input \
       --target training/output \
       --layers 3 \
       --epochs 1000
   ```

3. **Verify the generated shaders:**
   - Check the `cnn_weights_generated.wgsl` structure
   - Check that `cnn_layer.wgsl` uses the new conv functions

4. **Test in the demo:**
   ```bash
   cmake --build build -j4
   ./build/demo64k
   ```

## Design Rationale

**Why [-1,1] normalization?**
- Centered inputs for tanh (which operates best around 0)
- Better gradient flow
- Standard ML practice for normalized data

**Why RGBD throughout vs RGB?**
- Depth information propagates through the network
- Enables depth-aware stylization
- Consistent 4-channel processing

**Why a 7-channel input?**
- Coordinates: position-dependent effects (e.g. vignettes)
- Grayscale: luminance-aware processing
- RGBD: full color+depth information
- Enables richer feature learning

## Testing Checklist

- [ ] Train the network with RGBD input data
- [ ] Verify the `cnn_weights_generated.wgsl` structure
- [ ] Verify that `cnn_layer.wgsl` uses the `7to4`/`7to1` functions
- [ ] Build the demo without errors
- [ ] Visual test: inner layers show RGBD evolution
- [ ] Visual test: final layer produces grayscale
- [ ] Visual test: blending works correctly
- [ ] Compare quality with the previous RGB→RGB architecture
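
The normalization rules and the 7-channel layer input described above can be sketched per pixel in plain Python. This is an illustrative reconstruction, not code from the repository; `to_signed`, `from_signed`, and `prepare_layer_input` are hypothetical names mirroring the `(x - 0.5) * 2` mapping and the `dot()`-based gray computation from the document:

```python
# Illustrative sketch of the per-pixel 7-channel input assembly.
# Function names are hypothetical; the real logic lives in train_cnn.py
# (PyTorch tensors) and cnn_conv3x3.wgsl (WGSL), not in this form.

REC709 = (0.2126, 0.7152, 0.0722)  # BT.709 luma weights used for the gray channel

def to_signed(x):
    """Map [0,1] -> [-1,1], matching `(x - 0.5) * 2`."""
    return (x - 0.5) * 2.0

def from_signed(x):
    """Map [-1,1] -> [0,1] for texture storage (denormalization)."""
    return x * 0.5 + 0.5

def prepare_layer_input(rgbd, uv, original_rgb):
    """Build the 7-channel layer input [RGBD, uv_x, uv_y, gray], all in [-1,1].

    `gray` is computed once from the original RGB (as the shader does at the
    fs_main level via dot()) and then normalized like every other channel.
    """
    gray = sum(w * c for w, c in zip(REC709, original_rgb))
    return ([to_signed(c) for c in rgbd]
            + [to_signed(u) for u in uv]
            + [to_signed(gray)])
```

An inner layer would feed these 7 values through its `Conv2d(7, 4)` weights and tanh, then `from_signed` the result before writing it back to the texture; the final `Conv2d(7, 1)` layer skips the activation.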
