diff options
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/CNN_EFFECT.md | 75 | ||||
| -rw-r--r-- | doc/CNN_RGBD_GRAYSCALE_SUMMARY.md | 134 | ||||
| -rw-r--r-- | doc/HOWTO.md | 8 |
3 files changed, 189 insertions, 28 deletions
diff --git a/doc/CNN_EFFECT.md b/doc/CNN_EFFECT.md index ae0f38a..b7d157f 100644 --- a/doc/CNN_EFFECT.md +++ b/doc/CNN_EFFECT.md @@ -21,27 +21,44 @@ Trainable convolutional neural network layers for artistic stylization (painterl ## Architecture -### Coordinate-Aware Layer 0 +### RGBD → Grayscale Pipeline -Layer 0 accepts normalized (x,y) patch center coordinates alongside RGBA samples: +**Input:** RGBD (RGB + inverse depth D=1/z) +**Output:** Grayscale (1 channel) +**Layer Input:** 7 channels = [RGBD, UV coords, grayscale] all normalized to [-1,1] + +**Architecture:** +- **Inner layers (0..N-2):** Conv2d(7→4) - output RGBD +- **Final layer (N-1):** Conv2d(7→1) - output grayscale ```wgsl -fn cnn_conv3x3_with_coord( +// Inner layers: 7→4 (RGBD output) +fn cnn_conv3x3_7to4( tex: texture_2d<f32>, samp: sampler, - uv: vec2<f32>, # Center position [0,1] + uv: vec2<f32>, resolution: vec2<f32>, - rgba_weights: array<mat4x4<f32>, 9>, # 9 samples × 4×4 matrix - coord_weights: mat2x4<f32>, # 2 coords → 4 outputs - bias: vec4<f32> + original: vec4<f32>, # Original RGBD [0,1] + weights: array<array<f32, 8>, 36> # 9 pos × 4 out × (7 weights + bias) ) -> vec4<f32> -``` -**Input structure:** 9 RGBA samples (36 values) + 1 xy coordinate (2 values) = 38 inputs → 4 outputs +// Final layer: 7→1 (grayscale output) +fn cnn_conv3x3_7to1( + tex: texture_2d<f32>, + samp: sampler, + uv: vec2<f32>, + resolution: vec2<f32>, + original: vec4<f32>, + weights: array<array<f32, 8>, 9> # 9 pos × (7 weights + bias) +) -> f32 +``` -**Size impact:** +32B coord weights, kernel-agnostic +**Input normalization (all to [-1,1]):** +- RGBD: `(rgbd - 0.5) * 2` +- UV coords: `(uv - 0.5) * 2` +- Grayscale: `(0.2126*R + 0.7152*G + 0.0722*B - 0.5) * 2` -**Use cases:** Position-dependent stylization (vignettes, corner darkening, radial gradients) +**Activation:** tanh for inner layers, none for final layer ### Multi-Layer Architecture @@ -80,18 +97,15 @@ workspaces/main/shaders/cnn/ ### 1. Prepare Training Data Collect input/target image pairs: -- **Input:** Raw 3D render -- **Target:** Artistic style (hand-painted, filtered, stylized) +- **Input:** RGBA (RGB + depth as alpha channel, D=1/z) +- **Target:** Grayscale stylized output ```bash -training/input/img_000.png # Raw render -training/output/img_000.png # Stylized target +training/input/img_000.png # RGBA render (RGB + depth) +training/output/img_000.png # Grayscale target ``` -Use `image_style_processor.py` to generate targets: -```bash -python3 training/image_style_processor.py input/ output/ pencil_sketch -``` +**Note:** Input images must be RGBA where alpha = inverse depth (1/z) ### 2. Train Network @@ -245,20 +259,25 @@ Expands to: **Weight Storage:** -**Layer 0 (coordinate-aware):** +**Inner layers (7→4 RGBD output):** ```wgsl -const rgba_weights_layer0: array<mat4x4<f32>, 9> = array(...); -const coord_weights_layer0 = mat2x4<f32>( - 0.1, -0.2, 0.0, 0.0, # x-coord weights - -0.1, 0.0, 0.2, 0.0 # y-coord weights +// Structure: array<array<f32, 8>, 36> +// 9 positions × 4 output channels, each with 7 weights + bias +const weights_layer0: array<array<f32, 8>, 36> = array( + array<f32, 8>(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0), // pos0_ch0 + array<f32, 8>(w1_r, w1_g, w1_b, w1_d, w1_u, w1_v, w1_gray, bias1), // pos0_ch1 + // ... 34 more entries ); -const bias_layer0 = vec4<f32>(0.0, 0.0, 0.0, 0.0); ``` -**Layers 1+ (standard):** +**Final layer (7→1 grayscale output):** ```wgsl -const weights_layer1: array<mat4x4<f32>, 9> = array(...); -const bias_layer1 = vec4<f32>(0.0, 0.0, 0.0, 0.0); +// Structure: array<array<f32, 8>, 9> +// 9 positions, each with 7 weights + bias +const weights_layerN: array<array<f32, 8>, 9> = array( + array<f32, 8>(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0), // pos0 + // ... 8 more entries +); ``` --- diff --git a/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md new file mode 100644 index 0000000..4c13693 --- /dev/null +++ b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md @@ -0,0 +1,134 @@ +# CNN RGBD→Grayscale Architecture Implementation + +## Summary + +Implemented CNN architecture upgrade: RGBD input → grayscale output with 7-channel augmented input. + +## Changes Made + +### Architecture + +**Input:** RGBD (4 channels: RGB + inverse depth D=1/z) +**Output:** Grayscale (1 channel) +**Layer Input:** 7 channels = [RGBD, UV coords, grayscale] all normalized to [-1,1] + +**Layer Configuration:** +- Inner layers (0..N-2): Conv2d(7→4) - output RGBD with tanh activation +- Final layer (N-1): Conv2d(7→1) - output grayscale, no activation + +### Input Normalization (all to [-1,1]) + +- **RGBD:** `(rgbd - 0.5) * 2` +- **UV coords:** `(uv - 0.5) * 2` +- **Grayscale:** `(0.2126*R + 0.7152*G + 0.0722*B - 0.5) * 2` + +**Rationale:** Zero-centered inputs for tanh activation, better gradient flow. + +### Modified Files + +**Training (`/Users/skal/demo/training/train_cnn.py`):** +1. Removed `CoordConv2d` class +2. Updated `SimpleCNN`: + - Inner layers: `Conv2d(7, 4)` - RGBD output + - Final layer: `Conv2d(7, 1)` - grayscale output +3. Updated `forward()`: + - Normalize RGBD/coords/gray to [-1,1] + - Concatenate 7-channel input for each layer + - Apply tanh (inner) or none (final) + - Denormalize final output +4. Updated `export_weights_to_wgsl()`: + - Inner: `array<array<f32, 8>, 36>` (9 pos × 4 ch × 8 values) + - Final: `array<array<f32, 8>, 9>` (9 pos × 8 values) +5. Updated `generate_layer_shader()`: + - Use `cnn_conv3x3_7to4` for inner layers + - Use `cnn_conv3x3_7to1` for final layer + - Denormalize outputs from [-1,1] to [0,1] +6. Updated `ImagePairDataset`: + - Load RGBA input (was RGB) + +**Shaders (`/Users/skal/demo/workspaces/main/shaders/cnn/cnn_conv3x3.wgsl`):** +1. Added `cnn_conv3x3_7to4()`: + - 7-channel input: [RGBD, uv_x, uv_y, gray] + - 4-channel output: RGBD + - Weights: `array<array<f32, 8>, 36>` +2. Added `cnn_conv3x3_7to1()`: + - 7-channel input: [RGBD, uv_x, uv_y, gray] + - 1-channel output: grayscale + - Weights: `array<array<f32, 8>, 9>` + +**Documentation (`/Users/skal/demo/doc/CNN_EFFECT.md`):** +1. Updated architecture section with RGBD→grayscale pipeline +2. Updated training data requirements (RGBA input) +3. Updated weight storage format + +### No C++ Changes + +CNNLayerParams and bind groups remain unchanged. + +## Data Flow + +1. Layer 0 captures original RGBD to `captured_frame` +2. Each layer: + - Samples previous layer output (RGBD in [0,1]) + - Normalizes RGBD to [-1,1] + - Computes UV coords and grayscale, normalizes to [-1,1] + - Concatenates 7-channel input + - Applies convolution with layer-specific weights + - Outputs RGBD (inner) or grayscale (final) in [-1,1] + - Applies tanh (inner only) + - Denormalizes to [0,1] for texture storage + - Blends with original + +## Next Steps + +1. **Prepare RGBD training data:** + - Input: RGBA images (RGB + depth in alpha) + - Target: Grayscale stylized output + +2. **Train network:** + ```bash + python3 training/train_cnn.py \ + --input training/input \ + --target training/output \ + --layers 3 \ + --epochs 1000 + ``` + +3. **Verify generated shaders:** + - Check `cnn_weights_generated.wgsl` structure + - Check `cnn_layer.wgsl` uses new conv functions + +4. **Test in demo:** + ```bash + cmake --build build -j4 + ./build/demo64k + ``` + +## Design Rationale + +**Why [-1,1] normalization?** +- Centered inputs for tanh (operates best around 0) +- Better gradient flow +- Standard ML practice for normalized data + +**Why RGBD throughout vs RGB?** +- Depth information propagates through network +- Enables depth-aware stylization +- Consistent 4-channel processing + +**Why 7-channel input?** +- Coordinates: position-dependent effects (vignettes) +- Grayscale: luminance-aware processing +- RGBD: full color+depth information +- Enables richer feature learning + +## Testing Checklist + +- [ ] Train network with RGBD input data +- [ ] Verify `cnn_weights_generated.wgsl` structure +- [ ] Verify `cnn_layer.wgsl` uses `7to4`/`7to1` functions +- [ ] Build demo without errors +- [ ] Visual test: inner layers show RGBD evolution +- [ ] Visual test: final layer produces grayscale +- [ ] Visual test: blending works correctly +- [ ] Compare quality with previous RGB→RGB architecture diff --git a/doc/HOWTO.md b/doc/HOWTO.md index bdc0214..2c813f7 100644 --- a/doc/HOWTO.md +++ b/doc/HOWTO.md @@ -86,6 +86,14 @@ make run_util_tests # Utility tests --- +## Training + +```bash +./training/train_cnn.py --layers 3 --kernel_sizes 3,5,3 --epochs 10000 --batch_size 8 --input training/input/ --target training/output/ --checkpoint-every 1000 +``` + +--- + ## Timeline Edit `workspaces/main/timeline.seq`: |
