# CNN RGBD→Grayscale Architecture Implementation

## Summary

Implemented the CNN architecture upgrade: RGBD input → grayscale output, with a 7-channel augmented input per layer.

## Changes Made

### Architecture

**Input:** RGBD (4 channels: RGB + inverse depth D = 1/z)
**Output:** Grayscale (1 channel)
**Layer input:** 7 channels = [RGBD, UV coords, grayscale], all normalized to [-1,1]

**Layer configuration:**
- Inner layers (0..N-2): Conv2d(7→4) — RGBD output with tanh activation
- Final layer (N-1): Conv2d(7→1) — grayscale output, no activation

### Input Normalization (all to [-1,1])

- **RGBD:** `(rgbd - 0.5) * 2`
- **UV coords:** `(uv - 0.5) * 2`
- **Grayscale:** `(0.2126*R + 0.7152*G + 0.0722*B - 0.5) * 2`

**Rationale:** Zero-centered inputs suit the tanh activation and improve gradient flow.

### Modified Files

**Training (`/Users/skal/demo/training/train_cnn.py`):**
1. Removed the `CoordConv2d` class
2. Updated `SimpleCNN`:
   - Inner layers: `Conv2d(7, 4)` — RGBD output
   - Final layer: `Conv2d(7, 1)` — grayscale output
3. Updated `forward()` (see the sketch after the Data Flow section):
   - Normalize RGBD/coords/gray to [-1,1]
   - Concatenate the 7-channel input for each layer
   - Apply tanh (inner layers) or no activation (final layer)
   - Denormalize the final output
4. Updated `export_weights_to_wgsl()`:
   - Inner: `array<…, 36>` (9 positions × 4 channels × 8 values)
   - Final: `array<…, 9>` (9 positions × 8 values)
5. Updated `generate_layer_shader()`:
   - Use `cnn_conv3x3_7to4` for inner layers
   - Use `cnn_conv3x3_7to1` for the final layer
   - Denormalize outputs from [-1,1] to [0,1]
6. Updated `ImagePairDataset`:
   - Load RGBA input (was RGB)

**Shaders (`/Users/skal/demo/workspaces/main/shaders/cnn/cnn_conv3x3.wgsl`):**
1. Added `cnn_conv3x3_7to4()`:
   - 7-channel input: [RGBD, uv_x, uv_y, gray]
   - 4-channel output: RGBD
   - Weights: `array<…, 36>`
2. Added `cnn_conv3x3_7to1()`:
   - 7-channel input: [RGBD, uv_x, uv_y, gray]
   - 1-channel output: grayscale
   - Weights: `array<…, 9>`

**Documentation (`/Users/skal/demo/doc/CNN_EFFECT.md`):**
1. Updated the architecture section with the RGBD→grayscale pipeline
2. Updated the training data requirements (RGBA input)
3. Updated the weight storage format

### No C++ Changes

`CNNLayerParams` and the bind groups remain unchanged.

## Data Flow

1. Layer 0 captures the original RGBD to `captured_frame`.
2. Each layer:
   - Samples the previous layer's output (RGBD in [0,1])
   - Normalizes RGBD to [-1,1]
   - Computes UV coords and grayscale, normalized to [-1,1]
   - Concatenates the 7-channel input
   - Applies the convolution with layer-specific weights
   - Outputs RGBD (inner) or grayscale (final) in [-1,1]
   - Applies tanh (inner layers only)
   - Denormalizes to [0,1] for texture storage
   - Blends with the original
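For reference, here is a minimal PyTorch sketch of the per-layer flow above. It is illustrative, not the exact `SimpleCNN` code: `forward_sketch`, `luma`, and `make_uv` are hypothetical names, layer construction and the blend with the original frame are omitted, and 3×3 convolutions with `padding=1` are assumed.

```python
import torch

def luma(rgb: torch.Tensor) -> torch.Tensor:
    # Rec. 709 luma from the RGB channels (inputs in [0,1]).
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def make_uv(n: int, h: int, w: int, device) -> torch.Tensor:
    # Per-pixel UV coordinates in [0,1], shaped (N, 2, H, W).
    v, u = torch.meshgrid(
        torch.linspace(0.0, 1.0, h, device=device),
        torch.linspace(0.0, 1.0, w, device=device),
        indexing="ij",
    )
    return torch.stack([u, v]).unsqueeze(0).expand(n, -1, -1, -1)

def forward_sketch(layers, rgbd: torch.Tensor) -> torch.Tensor:
    # rgbd: (N, 4, H, W) in [0,1]. `layers` holds Conv2d(7, 4) modules
    # followed by a single Conv2d(7, 1) for the grayscale output.
    n, _, h, w = rgbd.shape
    uv = make_uv(n, h, w, rgbd.device)
    x = rgbd
    for i, conv in enumerate(layers):
        gray = luma(x[:, :3])
        # 7-channel input [RGBD, uv_x, uv_y, gray], normalized to [-1,1].
        feat = torch.cat([x, uv, gray], dim=1) * 2.0 - 1.0
        y = conv(feat)
        if i < len(layers) - 1:
            x = torch.tanh(y) * 0.5 + 0.5  # inner layer: RGBD back to [0,1]
        else:
            x = y * 0.5 + 0.5              # final layer: no activation
    return x
```

A matching layer list for a 3-layer network would be something like `[nn.Conv2d(7, 4, 3, padding=1), nn.Conv2d(7, 4, 3, padding=1), nn.Conv2d(7, 1, 3, padding=1)]`.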
## Next Steps

1. **Prepare RGBD training data:**
   - Input: RGBA images (RGB + depth in alpha)
   - Target: grayscale stylized output
2. **Train the network:**
   ```bash
   python3 training/train_cnn.py \
       --input training/input \
       --target training/output \
       --layers 3 \
       --epochs 1000
   ```
3. **Verify the generated shaders:**
   - Check the `cnn_weights_generated.wgsl` structure
   - Check that `cnn_layer.wgsl` uses the new conv functions
4. **Test in the demo:**
   ```bash
   cmake --build build -j4
   ./build/demo64k
   ```

## Design Rationale

**Why [-1,1] normalization?**
- Centered inputs for tanh (which operates best around 0)
- Better gradient flow
- Standard ML practice for normalized data

**Why RGBD throughout vs. RGB?**
- Depth information propagates through the network
- Enables depth-aware stylization
- Consistent 4-channel processing

**Why a 7-channel input?**
- Coordinates: position-dependent effects (e.g., vignettes)
- Grayscale: luminance-aware processing
- RGBD: full color + depth information
- Enables richer feature learning

## Testing Checklist

- [ ] Train the network with RGBD input data
- [ ] Verify the `cnn_weights_generated.wgsl` structure
- [ ] Verify `cnn_layer.wgsl` uses the `7to4`/`7to1` functions
- [ ] Build the demo without errors
- [ ] Visual test: inner layers show RGBD evolution
- [ ] Visual test: final layer produces grayscale
- [ ] Visual test: blending works correctly
- [ ] Compare quality with the previous RGB→RGB architecture (see the sketch below)
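For the final checklist item, a minimal sketch of a numeric comparison, assuming captures from both architectures are saved as images. The file names are placeholders, and PSNR against a training target is just one reasonable metric.

```python
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    # Peak signal-to-noise ratio between two same-sized [0,255] images.
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0**2 / mse)

def load_gray(path: str) -> np.ndarray:
    return np.asarray(Image.open(path).convert("L"))

# Placeholder file names: a training target plus one capture per architecture.
target = load_gray("training/output/sample.png")
for name in ("capture_rgb_rgb.png", "capture_rgbd_gray.png"):
    print(f"{name}: {psnr(target, load_gray(name)):.2f} dB vs. target")
```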