Diffstat (limited to 'doc/CNN_RGBD_GRAYSCALE_SUMMARY.md')
 -rw-r--r--  doc/CNN_RGBD_GRAYSCALE_SUMMARY.md | 136 ----------------------------------
 1 file changed, 0 insertions(+), 136 deletions(-)
diff --git a/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md
deleted file mode 100644
index 3439f2c..0000000
--- a/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md
+++ /dev/null
@@ -1,136 +0,0 @@

# CNN RGBD→Grayscale Architecture Implementation

## Summary

Implemented a CNN architecture upgrade: RGBD input → grayscale output, with a 7-channel augmented input per layer.

## Changes Made

### Architecture

**Input:** RGBD (4 channels: RGB + inverse depth D = 1/z)
**Output:** Grayscale (1 channel)
**Layer input:** 7 channels = [RGBD, UV coords, grayscale], all normalized to [-1,1]

**Layer configuration:**
- Inner layers (0..N-2): Conv2d(7→4) - output RGBD with tanh activation
- Final layer (N-1): Conv2d(7→1) - output grayscale, no activation

### Input Normalization (all to [-1,1])

- **RGBD:** `(rgbd - 0.5) * 2`
- **UV coords:** `(uv - 0.5) * 2`
- **Grayscale:** `dot(original.rgb, vec3<f32>(0.2126, 0.7152, 0.0722))` (computed once, passed as a parameter)

**Rationale:** zero-centered inputs suit the tanh activation and improve gradient flow.

### Modified Files

**Training (`/Users/skal/demo/training/train_cnn.py`):**
1. Removed the `CoordConv2d` class
2. Updated `SimpleCNN`:
   - Inner layers: `Conv2d(7, 4)` - RGBD output
   - Final layer: `Conv2d(7, 1)` - grayscale output
3. Updated `forward()`:
   - Normalize RGBD/coords/gray to [-1,1]
   - Concatenate the 7-channel input for each layer
   - Apply tanh (inner layers) or no activation (final layer)
   - Denormalize the final output
4. Updated `export_weights_to_wgsl()`:
   - Inner: `array<array<f32, 8>, 36>` (9 positions × 4 channels × 8 values)
   - Final: `array<array<f32, 8>, 9>` (9 positions × 8 values)
5. Updated `generate_layer_shader()`:
   - Use `cnn_conv3x3_7to4` for inner layers
   - Use `cnn_conv3x3_7to1` for the final layer
   - Denormalize outputs from [-1,1] to [0,1]
6. Updated `ImagePairDataset`:
   - Load RGBA input (was RGB)

**Shaders (`/Users/skal/demo/workspaces/main/shaders/cnn/cnn_conv3x3.wgsl`):**
1. Added `cnn_conv3x3_7to4()`:
   - 7-channel input: [RGBD, uv_x, uv_y, gray] (gray passed as a parameter)
   - 4-channel output: RGBD
   - Weights: `array<array<f32, 8>, 36>`
2. Added `cnn_conv3x3_7to1()`:
   - 7-channel input: [RGBD, uv_x, uv_y, gray] (gray passed as a parameter)
   - 1-channel output: grayscale
   - Weights: `array<array<f32, 8>, 9>`
3. Optimization: gray is computed once in the caller using `dot()`, not per function

**Documentation (`/Users/skal/demo/doc/CNN_EFFECT.md`):**
1. Updated the architecture section with the RGBD→grayscale pipeline
2. Updated the training data requirements (RGBA input)
3. Updated the weight storage format

### No C++ Changes

`CNNLayerParams` and the bind groups remain unchanged.

## Data Flow

1. Layer 0 captures the original RGBD frame to `captured_frame`
2. Each layer:
   - Samples the previous layer's output (RGBD in [0,1])
   - Normalizes RGBD to [-1,1]
   - Computes gray once using `dot()` (at the `fs_main` level)
   - Normalizes UV coords to [-1,1] (inside the conv functions)
   - Concatenates the 7-channel input
   - Applies the convolution with layer-specific weights
   - Outputs RGBD (inner) or grayscale (final) in [-1,1]
   - Applies tanh (inner layers only)
   - Denormalizes to [0,1] for texture storage
   - Blends with the original

## Next Steps

1. **Prepare RGBD training data:**
   - Input: RGBA images (RGB + depth in alpha)
   - Target: grayscale stylized output

2. **Train the network:**
   ```bash
   python3 training/train_cnn.py \
       --input training/input \
       --target training/output \
       --layers 3 \
       --epochs 1000
   ```

3. **Verify the generated shaders:**
   - Check the `cnn_weights_generated.wgsl` structure
   - Check that `cnn_layer.wgsl` uses the new conv functions

4. **Test in the demo:**
   ```bash
   cmake --build build -j4
   ./build/demo64k
   ```

## Design Rationale

**Why [-1,1] normalization?**
- Centered inputs for tanh (which operates best around 0)
- Better gradient flow
- Standard ML practice for normalized data

**Why RGBD throughout vs RGB?**
- Depth information propagates through the network
- Enables depth-aware stylization
- Consistent 4-channel processing

**Why a 7-channel input?**
- Coordinates: position-dependent effects (e.g. vignettes)
- Grayscale: luminance-aware processing
- RGBD: full color + depth information
- Enables richer feature learning

## Testing Checklist

- [ ] Train the network with RGBD input data
- [ ] Verify the `cnn_weights_generated.wgsl` structure
- [ ] Verify that `cnn_layer.wgsl` uses the `7to4`/`7to1` functions
- [ ] Build the demo without errors
- [ ] Visual test: inner layers show RGBD evolution
- [ ] Visual test: final layer produces grayscale
- [ ] Visual test: blending works correctly
- [ ] Compare quality with the previous RGB→RGB architecture
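The input normalization and 7-channel assembly described above are simple enough to sketch in plain Python. This is an illustrative reimplementation of the per-pixel math only; the function names here are hypothetical and do not come from `train_cnn.py`:

```python
def normalize_to_signed(x):
    """Map [0,1] -> [-1,1], as applied to the RGBD, UV, and grayscale inputs:
    (x - 0.5) * 2."""
    return (x - 0.5) * 2.0

def luminance(r, g, b):
    """Rec. 709 luma, matching the shader's
    dot(original.rgb, vec3<f32>(0.2126, 0.7152, 0.0722))."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def build_layer_input(rgbd, uv, gray):
    """Assemble the per-pixel 7-channel input [R, G, B, D, u, v, gray],
    with every component normalized to [-1,1].
    rgbd: 4-tuple, uv: 2-tuple, gray: float, all in [0,1]."""
    return tuple(normalize_to_signed(c) for c in (*rgbd, *uv, gray))
```

For example, `build_layer_input((0.0, 0.5, 1.0, 0.25), (0.5, 0.5), 1.0)` yields `(-1.0, 0.0, 1.0, -0.5, 0.0, 0.0, 1.0)`. After the convolution, inner layers would map the tanh output back to [0,1] for texture storage (the inverse of `normalize_to_signed`).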

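As an aid for the "verify `cnn_weights_generated.wgsl` structure" items in the checklist, here is one plausible reading of the inner-layer weight layout `array<array<f32, 8>, 36>` (9 taps × 4 output channels). Both the `[tap][out_channel]` ordering and the meaning of the 8th value per entry are assumptions here and should be checked against the exporter in `train_cnn.py`:

```python
def weight_entry_index(tap, out_channel, out_channels=4):
    """Flat index into the 36-entry inner-layer weight array, assuming
    row-major [tap][out_channel] ordering (an assumption, not verified
    against export_weights_to_wgsl)."""
    return tap * out_channels + out_channel

def conv_output(taps, weights, out_channel, out_channels=4):
    """Accumulate one output channel of the 3x3 convolution.
    taps: 9 seven-channel neighbor samples; weights: 36 entries of 8 floats.
    Each entry contributes a 7-way dot product plus its 8th value, which is
    assumed here to carry the entry's share of the bias."""
    acc = 0.0
    for t, x in enumerate(taps):
        w = weights[weight_entry_index(t, out_channel, out_channels)]
        acc += sum(wi * xi for wi, xi in zip(w[:7], x)) + w[7]
    return acc
```

The final layer's `array<array<f32, 8>, 9>` would be the same shape with `out_channels=1`, i.e. one 8-float entry per tap.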