# CNN RGBD→Grayscale Architecture Implementation

## Summary

Implemented the CNN architecture upgrade: RGBD input → grayscale output, with a 7-channel augmented input per layer.

## Changes Made

### Architecture

**Input:** RGBD (4 channels: RGB + inverse depth D = 1/z)
**Output:** Grayscale (1 channel)
**Layer input:** 7 channels = [RGBD, UV coords, grayscale], all normalized to [-1,1]

**Layer configuration:**
- Inner layers (0..N-2): Conv2d(7→4) — RGBD output with tanh activation
- Final layer (N-1): Conv2d(7→1) — grayscale output, no activation

### Input Normalization (all to [-1,1])

- **RGBD:** `(rgbd - 0.5) * 2`
- **UV coords:** `(uv - 0.5) * 2`
- **Grayscale:** `(0.2126*R + 0.7152*G + 0.0722*B - 0.5) * 2`

**Rationale:** Zero-centered inputs suit the tanh activation and improve gradient flow.

### Modified Files

**Training (`/Users/skal/demo/training/train_cnn.py`):**
1. Removed the `CoordConv2d` class
2. Updated `SimpleCNN`:
   - Inner layers: `Conv2d(7, 4)` — RGBD output
   - Final layer: `Conv2d(7, 1)` — grayscale output
3. Updated `forward()` (see the sketch after the Data Flow section):
   - Normalize RGBD/coords/gray to [-1,1]
   - Concatenate the 7-channel input for each layer
   - Apply tanh (inner layers) or no activation (final layer)
   - Denormalize the final output
4. Updated `export_weights_to_wgsl()`:
   - Inner: `array<…, 36>` (9 positions × 4 channels × 8 values)
   - Final: `array<…, 9>` (9 positions × 8 values)
5. Updated `generate_layer_shader()`:
   - Use `cnn_conv3x3_7to4` for inner layers
   - Use `cnn_conv3x3_7to1` for the final layer
   - Denormalize outputs from [-1,1] to [0,1]
6. Updated `ImagePairDataset`:
   - Load RGBA input (was RGB)

**Shaders (`/Users/skal/demo/workspaces/main/shaders/cnn/cnn_conv3x3.wgsl`):**
1. Added `cnn_conv3x3_7to4()`:
   - 7-channel input: [RGBD, uv_x, uv_y, gray]
   - 4-channel output: RGBD
   - Weights: `array<…, 36>`
2. Added `cnn_conv3x3_7to1()`:
   - 7-channel input: [RGBD, uv_x, uv_y, gray]
   - 1-channel output: grayscale
   - Weights: `array<…, 9>`

**Documentation (`/Users/skal/demo/doc/CNN_EFFECT.md`):**
1. Updated the architecture section with the RGBD→grayscale pipeline
2. Updated the training data requirements (RGBA input)
3. Updated the weight storage format

### No C++ Changes

`CNNLayerParams` and the bind groups remain unchanged.

## Data Flow

1. Layer 0 captures the original RGBD to `captured_frame`.
2. Each layer:
   - Samples the previous layer's output (RGBD in [0,1])
   - Normalizes RGBD to [-1,1]
   - Computes UV coords and grayscale, normalized to [-1,1]
   - Concatenates the 7-channel input
   - Applies the convolution with layer-specific weights
   - Outputs RGBD (inner) or grayscale (final) in [-1,1]
   - Applies tanh (inner layers only)
   - Denormalizes to [0,1] for texture storage
   - Blends with the original
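For reference, here is a minimal PyTorch sketch of the per-layer flow above. It is illustrative, not the exact `SimpleCNN` code: `forward_sketch`, `luma`, and `make_uv` are hypothetical names, layer construction and the blend with the original frame are omitted, and 3×3 convolutions with `padding=1` are assumed.

```python
import torch

def luma(rgb: torch.Tensor) -> torch.Tensor:
    # Rec. 709 luma from the RGB channels (inputs in [0,1]).
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def make_uv(n: int, h: int, w: int, device) -> torch.Tensor:
    # Per-pixel UV coordinates in [0,1], shaped (N, 2, H, W).
    v, u = torch.meshgrid(
        torch.linspace(0.0, 1.0, h, device=device),
        torch.linspace(0.0, 1.0, w, device=device),
        indexing="ij",
    )
    return torch.stack([u, v]).unsqueeze(0).expand(n, -1, -1, -1)

def forward_sketch(layers, rgbd: torch.Tensor) -> torch.Tensor:
    # rgbd: (N, 4, H, W) in [0,1]. `layers` holds Conv2d(7, 4) modules
    # followed by a single Conv2d(7, 1) for the grayscale output.
    n, _, h, w = rgbd.shape
    uv = make_uv(n, h, w, rgbd.device)
    x = rgbd
    for i, conv in enumerate(layers):
        gray = luma(x[:, :3])
        # 7-channel input [RGBD, uv_x, uv_y, gray], normalized to [-1,1].
        feat = torch.cat([x, uv, gray], dim=1) * 2.0 - 1.0
        y = conv(feat)
        if i < len(layers) - 1:
            x = torch.tanh(y) * 0.5 + 0.5  # inner layer: RGBD back to [0,1]
        else:
            x = y * 0.5 + 0.5              # final layer: no activation
    return x
```

A matching layer list for a 3-layer network would be something like `[nn.Conv2d(7, 4, 3, padding=1), nn.Conv2d(7, 4, 3, padding=1), nn.Conv2d(7, 1, 3, padding=1)]`.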
## Next Steps

1. **Prepare RGBD training data:**
   - Input: RGBA images (RGB + depth in alpha)
   - Target: grayscale stylized output
2. **Train the network:**
   ```bash
   python3 training/train_cnn.py \
       --input training/input \
       --target training/output \
       --layers 3 \
       --epochs 1000
   ```
3. **Verify the generated shaders:**
   - Check the `cnn_weights_generated.wgsl` structure
   - Check that `cnn_layer.wgsl` uses the new conv functions
4. **Test in the demo:**
   ```bash
   cmake --build build -j4
   ./build/demo64k
   ```

## Design Rationale

**Why [-1,1] normalization?**
- Centered inputs for tanh (which operates best around 0)
- Better gradient flow
- Standard ML practice for normalized data

**Why RGBD throughout vs. RGB?**
- Depth information propagates through the network
- Enables depth-aware stylization
- Consistent 4-channel processing

**Why a 7-channel input?**
- Coordinates: position-dependent effects (e.g., vignettes)
- Grayscale: luminance-aware processing
- RGBD: full color + depth information
- Enables richer feature learning

## Testing Checklist

- [ ] Train the network with RGBD input data
- [ ] Verify the `cnn_weights_generated.wgsl` structure
- [ ] Verify `cnn_layer.wgsl` uses the `7to4`/`7to1` functions
- [ ] Build the demo without errors
- [ ] Visual test: inner layers show RGBD evolution
- [ ] Visual test: final layer produces grayscale
- [ ] Visual test: blending works correctly
- [ ] Compare quality with the previous RGB→RGB architecture (see the sketch below)
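For the final checklist item, a minimal sketch of a numeric comparison, assuming captures from both architectures are saved as images. The file names are placeholders, and PSNR against a training target is just one reasonable metric.

```python
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    # Peak signal-to-noise ratio between two same-sized [0,255] images.
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0**2 / mse)

def load_gray(path: str) -> np.ndarray:
    return np.asarray(Image.open(path).convert("L"))

# Placeholder file names: a training target plus one capture per architecture.
target = load_gray("training/output/sample.png")
for name in ("capture_rgb_rgb.png", "capture_rgbd_gray.png"):
    print(f"{name}: {psnr(target, load_gray(name)):.2f} dB vs. target")
```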