Diffstat (limited to 'doc/CNN_RGBD_GRAYSCALE_SUMMARY.md')
-rw-r--r--  doc/CNN_RGBD_GRAYSCALE_SUMMARY.md  136
1 file changed, 0 insertions(+), 136 deletions(-)
diff --git a/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md
deleted file mode 100644
index 3439f2c..0000000
--- a/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md
+++ /dev/null
@@ -1,136 +0,0 @@
-# CNN RGBD→Grayscale Architecture Implementation
-
-## Summary
-
-Implemented CNN architecture upgrade: RGBD input → grayscale output with 7-channel augmented input.
-
-## Changes Made
-
-### Architecture
-
-**Input:** RGBD (4 channels: RGB + inverse depth D = 1/z)
-**Output:** Grayscale (1 channel)
-**Per-layer input:** 7 channels = [RGBD (4), UV coords (2), grayscale (1)], all normalized to [-1,1]
-
-**Layer Configuration:**
-- Inner layers (0..N-2): Conv2d(7→4) - output RGBD with tanh activation
-- Final layer (N-1): Conv2d(7→1) - output grayscale, no activation
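
In PyTorch terms, the two-part configuration above might look like the following minimal sketch (the class name, default layer count, and `padding=1` are assumptions, not taken from `train_cnn.py`; only the 7→4 / 7→1 channel split and the 3×3 kernel come from this document):

```python
import torch.nn as nn

class SimpleCNNSketch(nn.Module):
    """Minimal sketch of the layer configuration above (details assumed)."""
    def __init__(self, num_layers=3):
        super().__init__()
        # Inner layers 0..N-2: 7-channel augmented input -> 4-channel RGBD.
        self.inner = nn.ModuleList(
            [nn.Conv2d(7, 4, kernel_size=3, padding=1) for _ in range(num_layers - 1)]
        )
        # Final layer N-1: 7-channel input -> 1-channel grayscale, no activation.
        self.final = nn.Conv2d(7, 1, kernel_size=3, padding=1)
```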
-
-### Input Normalization (all to [-1,1])
-
-- **RGBD:** `(rgbd - 0.5) * 2`
-- **UV coords:** `(uv - 0.5) * 2`
-- **Grayscale:** `dot(original.rgb, vec3<f32>(0.2126, 0.7152, 0.0722))` (computed once, passed as parameter)
-
-**Rationale:** Zero-centered inputs match tanh's operating range and improve gradient flow.
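
The three normalization rules above can be sketched per-pixel in plain Python (the real code works on whole tensors/texels; only the formulas are taken from this document):

```python
# Rec.709 luma weights used by the dot() above; they sum to 1.0.
REC709 = (0.2126, 0.7152, 0.0722)

def augmented_input(rgbd, uv):
    """rgbd = (r, g, b, d) and uv = (u, v), all in [0,1]; returns 7 values in [-1,1]."""
    gray = sum(c * w for c, w in zip(rgbd[:3], REC709))  # computed once per pixel
    # (x - 0.5) * 2 for every channel, grayscale included.
    return [2.0 * c - 1.0 for c in (*rgbd, *uv, gray)]
```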
-
-### Modified Files
-
-**Training (`/Users/skal/demo/training/train_cnn.py`):**
-1. Removed `CoordConv2d` class
-2. Updated `SimpleCNN`:
- - Inner layers: `Conv2d(7, 4)` - RGBD output
- - Final layer: `Conv2d(7, 1)` - grayscale output
-3. Updated `forward()`:
- - Normalize RGBD/coords/gray to [-1,1]
- - Concatenate 7-channel input for each layer
- - Apply tanh (inner) or none (final)
- - Denormalize final output
-4. Updated `export_weights_to_wgsl()`:
- - Inner: `array<array<f32, 8>, 36>` (9 pos × 4 ch × 8 values)
- - Final: `array<array<f32, 8>, 9>` (9 pos × 8 values)
-5. Updated `generate_layer_shader()`:
- - Use `cnn_conv3x3_7to4` for inner layers
- - Use `cnn_conv3x3_7to1` for final layer
- - Denormalize outputs from [-1,1] to [0,1]
-6. Updated `ImagePairDataset`:
- - Load RGBA input (was RGB)
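
The `forward()` changes in step 3 might look like this hedged PyTorch sketch (tensor shapes, argument names, and the exact channel ordering in the concatenation are assumptions):

```python
import torch

# Rec.709 luma weights, broadcast over (N, 3, H, W).
LUMA = torch.tensor([0.2126, 0.7152, 0.0722]).view(1, 3, 1, 1)

def forward_sketch(inner_layers, final_layer, rgbd, uv):
    """rgbd: (N,4,H,W) and uv: (N,2,H,W), both in [0,1]."""
    gray = (rgbd[:, :3] * LUMA).sum(dim=1, keepdim=True) * 2 - 1  # luma once, then [-1,1]
    uv_n = uv * 2 - 1
    x = rgbd * 2 - 1
    for conv in inner_layers:                                  # inner: tanh, stays in [-1,1]
        x = torch.tanh(conv(torch.cat([x, uv_n, gray], dim=1)))
    out = final_layer(torch.cat([x, uv_n, gray], dim=1))       # final: no activation
    return (out + 1) * 0.5                                     # denormalize to [0,1]
```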
-
-**Shaders (`/Users/skal/demo/workspaces/main/shaders/cnn/cnn_conv3x3.wgsl`):**
-1. Added `cnn_conv3x3_7to4()`:
- - 7-channel input: [RGBD, uv_x, uv_y, gray] (gray passed as parameter)
- - 4-channel output: RGBD
- - Weights: `array<array<f32, 8>, 36>`
-2. Added `cnn_conv3x3_7to1()`:
- - 7-channel input: [RGBD, uv_x, uv_y, gray] (gray passed as parameter)
- - 1-channel output: grayscale
- - Weights: `array<array<f32, 8>, 9>`
-3. Optimized: gray computed once in caller using `dot()`, not per-function
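
As a readability aid, here is a pure-Python reference for how `cnn_conv3x3_7to4` might consume the `array<array<f32, 8>, 36>` layout. Both the tap-major indexing and the interpretation of the 8th slot as a bias are assumptions, not read from the shader:

```python
def conv3x3_7to4(taps, weights):
    """taps: 9 seven-channel input vectors (3x3 neighborhood, row-major).
    weights: 36 entries of 8 floats each (9 taps x 4 output channels,
    assumed tap-major; slots 0..6 are input weights, slot 7 a bias term)."""
    out = [0.0] * 4
    for t in range(9):
        for c in range(4):
            w = weights[t * 4 + c]
            out[c] += sum(x * wi for x, wi in zip(taps[t], w[:7])) + w[7]
    return out
```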
-
-**Documentation (`/Users/skal/demo/doc/CNN_EFFECT.md`):**
-1. Updated architecture section with RGBD→grayscale pipeline
-2. Updated training data requirements (RGBA input)
-3. Updated weight storage format
-
-### No C++ Changes
-
-CNNLayerParams and bind groups remain unchanged.
-
-## Data Flow
-
-1. Layer 0 captures original RGBD to `captured_frame`
-2. Each layer:
- - Samples previous layer output (RGBD in [0,1])
- - Normalizes RGBD to [-1,1]
- - Computes gray once using `dot()` (fs_main level)
- - Normalizes UV coords to [-1,1] (inside conv functions)
- - Concatenates 7-channel input
-   - Applies convolution with layer-specific weights
-   - Applies tanh (inner layers only; the final layer has no activation)
-   - Produces RGBD (inner) or grayscale (final) in [-1,1]
-   - Denormalizes to [0,1] for texture storage
- - Blends with original
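
The blend step is not specified above; a plain per-channel linear interpolation is one plausible reading (the mix factor and the formula itself are assumptions):

```python
def blend_with_original(stylized, original, t):
    """Per-channel lerp: t = 0 keeps the original, t = 1 keeps the CNN output."""
    return [(1.0 - t) * o + t * s for s, o in zip(stylized, original)]
```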
-
-## Next Steps
-
-1. **Prepare RGBD training data:**
- - Input: RGBA images (RGB + depth in alpha)
- - Target: Grayscale stylized output
-
-2. **Train network:**
- ```bash
- python3 training/train_cnn.py \
- --input training/input \
- --target training/output \
- --layers 3 \
- --epochs 1000
- ```
-
-3. **Verify generated shaders:**
- - Check `cnn_weights_generated.wgsl` structure
- - Check `cnn_layer.wgsl` uses new conv functions
-
-4. **Test in demo:**
- ```bash
- cmake --build build -j4
- ./build/demo64k
- ```
-
-## Design Rationale
-
-**Why [-1,1] normalization?**
-- Centered inputs for tanh (operates best around 0)
-- Better gradient flow
-- Standard ML practice for normalized data
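
The gradient argument can be checked numerically with the standard identity for tanh's derivative (nothing project-specific):

```python
import math

def tanh_grad(x):
    """d/dx tanh(x) = 1 - tanh(x)^2: largest at 0, vanishing as |x| grows."""
    return 1.0 - math.tanh(x) ** 2
```

At x = 0 the gradient is exactly 1, while at x = 3 it has already collapsed below 0.01, which is why zero-centered inputs keep the layers in the well-conditioned part of the activation.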
-
-**Why RGBD throughout vs RGB?**
-- Depth information propagates through network
-- Enables depth-aware stylization
-- Consistent 4-channel processing
-
-**Why 7-channel input?**
-- Coordinates: position-dependent effects (vignettes)
-- Grayscale: luminance-aware processing
-- RGBD: full color+depth information
-- Enables richer feature learning
-
-## Testing Checklist
-
-- [ ] Train network with RGBD input data
-- [ ] Verify `cnn_weights_generated.wgsl` structure
-- [ ] Verify `cnn_layer.wgsl` uses `7to4`/`7to1` functions
-- [ ] Build demo without errors
-- [ ] Visual test: inner layers show RGBD evolution
-- [ ] Visual test: final layer produces grayscale
-- [ ] Visual test: blending works correctly
-- [ ] Compare quality with previous RGB→RGB architecture