Diffstat (limited to 'doc')
 -rw-r--r--  doc/AI_RULES.md                     19
 -rw-r--r--  doc/CNN_DEBUG.md                    43
 -rw-r--r--  doc/CNN_EFFECT.md                  378
 -rw-r--r--  doc/CNN_RGBD_GRAYSCALE_SUMMARY.md  134
 -rw-r--r--  doc/COMPLETED.md                    16
 -rw-r--r--  doc/CONTRIBUTING.md                 15
 -rw-r--r--  doc/EFFECT_WORKFLOW.md             228
 -rw-r--r--  doc/HOWTO.md                        19
 -rw-r--r--  doc/RECIPE.md                        4
9 files changed, 726 insertions, 130 deletions
diff --git a/doc/AI_RULES.md b/doc/AI_RULES.md index d18a0cc..1a4ee78 100644 --- a/doc/AI_RULES.md +++ b/doc/AI_RULES.md @@ -5,3 +5,22 @@ - Prefer small, reviewable commits - All `cmake --build` commands must use the `-j4` option for parallel building. - after a task, a 'big' final commit should contain a short handoff tag like "handoff(Gemini):..." if you're gemini-cli, or "handoff(Claude): ..." if you're claude-code. + +## Adding Visual Effects + +**IMPORTANT:** When adding new visual effects, follow the complete workflow in `doc/EFFECT_WORKFLOW.md`. + +**Required steps (must complete ALL):** +1. Create effect files (.h, .cc, .wgsl) +2. Add shader to `workspaces/main/assets.txt` +3. Add `.cc` to CMakeLists.txt GPU_SOURCES (BOTH sections: headless and normal) +4. Include header in `src/gpu/demo_effects.h` +5. Add to timeline with `EFFECT +` (priority modifier is REQUIRED) +6. Add to test list in `src/tests/gpu/test_demo_effects.cc` +7. Build and verify: `cmake --build build -j4 && cd build && ./test_demo_effects` + +**Common mistakes to avoid:** +- Missing priority modifier in timeline (`EFFECT` must be `EFFECT +`, `EFFECT =`, or `EFFECT -`) +- Adding `.cc` to only one CMakeLists.txt section (need BOTH headless and normal) +- Wrong asset ID (check assets.txt entry name → `ASSET_SHADER_<NAME>`) +- Forgetting to add to test file diff --git a/doc/CNN_DEBUG.md b/doc/CNN_DEBUG.md new file mode 100644 index 0000000..dba0b60 --- /dev/null +++ b/doc/CNN_DEBUG.md @@ -0,0 +1,43 @@ +# CNN Effect Black Screen Bug - Resolution (2026-02) + +## Problem +CNN post-processing effect showed black screen when activated at 11.50s, despite scene rendering correctly before CNN started. + +## Root Causes + +### Bug 1: Framebuffer Capture Timing +**Location**: `src/gpu/effect.cc` +**Issue**: Capture ran INSIDE post-effect loop after ping-pong buffer swaps. CNN layers 1+ captured wrong buffer (output being written to, not scene). +**Fix**: Moved capture before loop starts (lines 308-346). Capture now copies `framebuffer_a` to `captured_frame` auxiliary texture ONCE before any post-effects run. + +### Bug 2: Missing Uniforms Update ⚠️ CRITICAL +**Location**: `src/gpu/effects/cnn_effect.cc` +**Issue**: `CNNEffect::update_bind_group()` never updated `uniforms_` buffer. `uniforms.resolution` uninitialized (0,0 or garbage) → UV calculation `p.xy / uniforms.resolution` produced NaN → all texture samples black. +**Fix**: Added uniforms update before bind group creation (lines 132-142): +```cpp +const CommonPostProcessUniforms u = { + .resolution = {(float)width_, (float)height_}, + .aspect_ratio = (float)width_ / (float)height_, + .time = 0.0f, + .beat = 0.0f, + .audio_intensity = 0.0f, +}; +uniforms_.update(ctx_.queue, u); +``` + +## Key Lessons + +1. **All post-process effects MUST update `uniforms_` buffer** - Required for UV calculations and shader parameters +2. **Framebuffer capture timing is critical** - Must happen before post-chain ping-pong starts +3. **Uninitialized uniforms cause silent failures** - Produces black output without validation errors +4. 
**Post-effects must render or chain breaks** - `loadOp=Load` preserves previous (black) content if no draw call executes + +## Files Modified +- `src/gpu/effect.cc`: Lines 308-346 (capture timing) +- `src/gpu/effects/cnn_effect.cc`: Lines 132-142 (uniforms update) + +## Verification +Test: `demo64k --seek 11.5` +- ✅ Scene visible with RotatingCube +- ✅ CNN stylization applied +- ✅ All 3 layers process with correct original texture reference diff --git a/doc/CNN_EFFECT.md b/doc/CNN_EFFECT.md index 9045739..d51c187 100644 --- a/doc/CNN_EFFECT.md +++ b/doc/CNN_EFFECT.md @@ -6,157 +6,281 @@ Neural network-based stylization for rendered scenes. ## Overview -The CNN effect applies trainable convolutional neural network layers to post-process 3D rendered output, enabling artistic stylization (e.g., painterly, sketch, cel-shaded effects) with minimal runtime overhead. +Trainable convolutional neural network layers for artistic stylization (painterly, sketch, cel-shaded effects) with minimal runtime overhead. **Key Features:** -- Multi-layer convolutions (3×3, 5×5, 7×7 kernels) +- Position-aware layer 0 (coordinate input for vignetting, edge effects) +- Multi-layer convolutions (3×3, 5×5, 7×7 kernels) with automatic chaining +- Original input available to all layers via framebuffer capture +- Configurable final blend with original scene - Modular WGSL shader architecture -- Hardcoded weights (trained offline) -- Residual connections for stable learning +- Hardcoded weights (trained offline via PyTorch) - ~5-8 KB binary footprint --- ## Architecture -### File Structure - -``` -src/gpu/effects/ - cnn_effect.h # CNNEffect class - cnn_effect.cc # Implementation +### RGBD → Grayscale Pipeline -workspaces/main/shaders/cnn/ - cnn_activation.wgsl # Activation functions (tanh, ReLU, sigmoid, leaky_relu) - cnn_conv3x3.wgsl # 3×3 convolution - cnn_conv5x5.wgsl # 5×5 convolution - cnn_conv7x7.wgsl # 7×7 convolution - cnn_weights_generated.wgsl # Weight arrays (generated by training script) - cnn_layer.wgsl # Main shader (composes above snippets) -``` +**Input:** RGBD (RGB + inverse depth D=1/z) +**Output:** Grayscale (1 channel) +**Layer Input:** 7 channels = [RGBD, UV coords, grayscale] all normalized to [-1,1] -### Shader Composition +**Architecture:** +- **Inner layers (0..N-2):** Conv2d(7→4) - output RGBD +- **Final layer (N-1):** Conv2d(7→1) - output grayscale -`cnn_layer.wgsl` uses `#include` directives (resolved by `ShaderComposer`): ```wgsl -#include "common_uniforms" -#include "cnn_activation" -#include "cnn_conv3x3" -#include "cnn_weights_generated" +// Inner layers: 7→4 (RGBD output) +fn cnn_conv3x3_7to4( + tex: texture_2d<f32>, + samp: sampler, + uv: vec2<f32>, + resolution: vec2<f32>, + original: vec4<f32>, # Original RGBD [-1,1] + weights: array<array<f32, 8>, 36> # 9 pos × 4 out × (7 weights + bias) +) -> vec4<f32> + +// Final layer: 7→1 (grayscale output) +fn cnn_conv3x3_7to1( + tex: texture_2d<f32>, + samp: sampler, + uv: vec2<f32>, + resolution: vec2<f32>, + original: vec4<f32>, + weights: array<array<f32, 8>, 9> # 9 pos × (7 weights + bias) +) -> f32 ``` ---- +**Input normalization:** +- **fs_main** normalizes textures once: `(tex - 0.5) * 2` → [-1,1] +- **Conv functions** normalize UV coords: `(uv - 0.5) * 2` → [-1,1] +- **Grayscale** computed from normalized RGBD: `0.2126*R + 0.7152*G + 0.0722*B` +- **Inter-layer data** stays in [-1,1] (no denormalization) +- **Final output** denormalized for display: `(result + 1.0) * 0.5` → [0,1] -## Usage +**Activation:** tanh for inner layers (output 
stays [-1,1]), none for final layer -### C++ Integration +### Multi-Layer Architecture -```cpp -#include "gpu/effects/cnn_effect.h" +CNNEffect supports multi-layer networks via automatic effect chaining: -// Create effect (1 layer for now, expandable to 4) -auto cnn = std::make_shared<CNNEffect>(ctx, /*num_layers=*/1); +1. **Timeline specifies total layers**: `CNNEffect layers=3 blend=0.7` +2. **Compiler expands to chain**: 3 separate CNNEffect instances (layer 0→1→2) +3. **Framebuffer capture**: Layer 0 captures original input to `"captured_frame"` +4. **Original input binding**: All layers access original via `@binding(4)` +5. **Final blend**: Last layer blends result with original: `mix(original, result, 0.7)` -// Add to timeline -timeline.add_effect(cnn, start_time, end_time); -``` +**Framebuffer Capture API:** +- `Effect::needs_framebuffer_capture()` - effect requests pre-capture +- MainSequence automatically blits input → `"captured_frame"` auxiliary texture +- Generic mechanism usable by any effect -### Timeline Example +### File Structure ``` -SEQUENCE 10.0 0 - EFFECT CNNEffect 10.0 15.0 0 # Apply CNN stylization for 5 seconds +src/gpu/effects/ + cnn_effect.h/cc # CNNEffect class + framebuffer capture + +workspaces/main/shaders/cnn/ + cnn_activation.wgsl # tanh, ReLU, sigmoid, leaky_relu + cnn_conv3x3.wgsl # 3×3 convolution (standard + coord-aware) + cnn_conv5x5.wgsl # 5×5 convolution (standard + coord-aware) + cnn_conv7x7.wgsl # 7×7 convolution (standard + coord-aware) + cnn_weights_generated.wgsl # Weight arrays (auto-generated by train_cnn.py) + cnn_layer.wgsl # Main shader with layer switches (auto-generated by train_cnn.py) ``` --- -## Training Workflow (Planned) +## Training Workflow + +### 1. Prepare Training Data + +Collect input/target image pairs: +- **Input:** RGBA (RGB + depth as alpha channel, D=1/z) +- **Target:** Grayscale stylized output -**Step 1: Prepare Training Data** ```bash -# Collect before/after image pairs -# - Before: Raw 3D render -# - After: Target artistic style (hand-painted, filtered, etc.) +training/input/img_000.png # RGBA render (RGB + depth) +training/output/img_000.png # Grayscale target ``` -**Step 2: Train Network** +**Note:** Input images must be RGBA where alpha = inverse depth (1/z) + +### 2. 
Train Network + ```bash -python scripts/train_cnn.py \ - --input rendered_scene.png \ - --target stylized_scene.png \ +python3 training/train_cnn.py \ + --input training/input \ + --target training/output \ + --layers 1 \ + --kernel-sizes 3 \ + --epochs 500 \ + --checkpoint-every 50 +``` + +**Multi-layer example (3 layers with varying kernel sizes):** +```bash +python3 training/train_cnn.py \ + --input training/input \ + --target training/output \ --layers 3 \ - --kernel_sizes 3,5,3 \ - --epochs 100 + --kernel-sizes 3,5,3 \ + --epochs 1000 \ + --checkpoint-every 100 +``` + +**Note:** Training script auto-generates: +- `cnn_weights_generated.wgsl` - weight arrays for all layers +- `cnn_layer.wgsl` - shader with layer switches and original input binding + +**Resume from checkpoint:** +```bash +python3 training/train_cnn.py \ + --input training/input \ + --target training/output \ + --resume training/checkpoints/checkpoint_epoch_200.pth ``` -**Step 3: Export Weights** -```python -# scripts/train_cnn.py automatically generates: -# workspaces/main/shaders/cnn/cnn_weights_generated.wgsl +**Export WGSL from checkpoint (no training):** +```bash +python3 training/train_cnn.py \ + --export-only training/checkpoints/checkpoint_epoch_200.pth \ + --output workspaces/main/shaders/cnn/cnn_weights_generated.wgsl ``` -**Step 4: Rebuild** +### 3. Rebuild Demo + +Training script auto-generates both `cnn_weights_generated.wgsl` and `cnn_layer.wgsl`: ```bash cmake --build build -j4 +./build/demo64k ``` --- -## Implementation Details +## Usage + +### C++ Integration -### Convolution Function Signature +**Single layer (manual):** +```cpp +#include "gpu/effects/cnn_effect.h" -```wgsl -fn cnn_conv3x3( - tex: texture_2d<f32>, - samp: sampler, - uv: vec2<f32>, - resolution: vec2<f32>, - weights: array<mat4x4<f32>, 9>, # 9 samples × 4×4 matrix - bias: vec4<f32> -) -> vec4<f32> +CNNEffectParams p; +p.layer_index = 0; +p.total_layers = 1; +p.blend_amount = 1.0f; +auto cnn = std::make_shared<CNNEffect>(ctx, p); +timeline.add_effect(cnn, start_time, end_time); ``` -- Samples 9 pixels (3×3 neighborhood) -- Applies 4×4 weight matrix per sample (RGBA channels) -- Returns weighted sum + bias (pre-activation) +**Multi-layer (automatic via timeline compiler):** -### Weight Storage +Use timeline syntax - `seq_compiler` expands to multiple instances. -Weights are stored as WGSL constants: -```wgsl -const weights_layer0: array<mat4x4<f32>, 9> = array( - mat4x4<f32>(1.0, 0.0, 0.0, 0.0, ...), # Center pixel - mat4x4<f32>(0.0, 0.0, 0.0, 0.0, ...), # Neighbor 1 - // ... 
7 more matrices -); -const bias_layer0 = vec4<f32>(0.0, 0.0, 0.0, 0.0); +### Timeline Examples + +**Single-layer CNN (full stylization):** +``` +SEQUENCE 10.0 0 + EFFECT + Hybrid3DEffect 0.00 5.00 + EFFECT + CNNEffect 0.50 5.00 layers=1 ``` -### Residual Connection +**Multi-layer CNN with blend:** +``` +SEQUENCE 10.0 0 + EFFECT + Hybrid3DEffect 0.00 5.00 + EFFECT + CNNEffect 0.50 5.00 layers=3 blend=0.7 +``` -Final layer adds original input: -```wgsl -if (params.use_residual != 0) { - let input = textureSample(txt, smplr, uv); - result = input + result * 0.3; # Blend 30% stylization +Expands to: +```cpp +// Layer 0 (captures original, blend=1.0) +{ + CNNEffectParams p; + p.layer_index = 0; + p.total_layers = 3; + p.blend_amount = 1.0f; + seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 1); +} +// Layer 1 (blend=1.0) +{ + CNNEffectParams p; + p.layer_index = 1; + p.total_layers = 3; + p.blend_amount = 1.0f; + seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 2); +} +// Layer 2 (final blend=0.7) +{ + CNNEffectParams p; + p.layer_index = 2; + p.total_layers = 3; + p.blend_amount = 0.7f; + seq->add_effect(std::make_shared<CNNEffect>(ctx, p), 0.50f, 5.00f, 3); } ``` --- -## Multi-Layer Rendering (Future) +## Shader Structure -For N layers, use ping-pong textures: +**Bindings:** +```wgsl +@group(0) @binding(0) var smplr: sampler; +@group(0) @binding(1) var txt: texture_2d<f32>; // Current layer input +@group(0) @binding(2) var<uniform> uniforms: CommonUniforms; +@group(0) @binding(3) var<uniform> params: CNNLayerParams; +@group(0) @binding(4) var original_input: texture_2d<f32>; // Layer 0 input (captured) +``` + +**Fragment shader logic:** +```wgsl +@fragment fn fs_main(@builtin(position) p: vec4<f32>) -> @location(0) vec4<f32> { + let uv = p.xy / uniforms.resolution; + let input = textureSample(txt, smplr, uv); // Layer N-1 output + let original = textureSample(original_input, smplr, uv); // Layer 0 input + + var result = vec4<f32>(0.0); + + if (params.layer_index == 0) { + result = cnn_conv3x3_with_coord(txt, smplr, uv, uniforms.resolution, + rgba_weights_layer0, coord_weights_layer0, bias_layer0); + result = cnn_tanh(result); + } + // ... other layers + // Blend with ORIGINAL input (not previous layer) + return mix(original, result, params.blend_amount); +} ``` -Pass 0: input → temp_a (conv + activate) -Pass 1: temp_a → temp_b (conv + activate) -Pass 2: temp_b → temp_a (conv + activate) -Pass 3: temp_a → screen (conv + activate + residual) + +**Weight Storage:** + +**Inner layers (7→4 RGBD output):** +```wgsl +// Structure: array<array<f32, 8>, 36> +// 9 positions × 4 output channels, each with 7 weights + bias +const weights_layer0: array<array<f32, 8>, 36> = array( + array<f32, 8>(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0), // pos0_ch0 + array<f32, 8>(w1_r, w1_g, w1_b, w1_d, w1_u, w1_v, w1_gray, bias1), // pos0_ch1 + // ... 34 more entries +); ``` -**Current Status:** Single-layer implementation. Multi-pass infrastructure ready but not exposed. +**Final layer (7→1 grayscale output):** +```wgsl +// Structure: array<array<f32, 8>, 9> +// 9 positions, each with 7 weights + bias +const weights_layerN: array<array<f32, 8>, 9> = array( + array<f32, 8>(w0_r, w0_g, w0_b, w0_d, w0_u, w0_v, w0_gray, bias0), // pos0 + // ... 
8 more entries +); +``` --- @@ -164,60 +288,72 @@ Pass 3: temp_a → screen (conv + activate + residual) | Component | Size | Notes | |-----------|------|-------| -| `cnn_activation.wgsl` | ~200 B | 4 activation functions | -| `cnn_conv3x3.wgsl` | ~400 B | 3×3 convolution logic | -| `cnn_conv5x5.wgsl` | ~600 B | 5×5 convolution logic | -| `cnn_conv7x7.wgsl` | ~800 B | 7×7 convolution logic | -| `cnn_layer.wgsl` | ~800 B | Main shader | -| `cnn_effect.cc` | ~300 B | C++ implementation | -| **Weights (variable)** | **2-6 KB** | Depends on network depth/width | -| **Total** | **5-9 KB** | Acceptable for 64k demo | +| Activation functions | ~200 B | 4 functions | +| Conv3x3 (standard + coord) | ~500 B | Both variants | +| Conv5x5 (standard + coord) | ~700 B | Both variants | +| Conv7x7 (standard + coord) | ~900 B | Both variants | +| Main shader | ~800 B | Layer composition | +| C++ implementation | ~300 B | Effect class | +| **Coord weights** | **+32 B** | Per-layer overhead (layer 0 only) | +| **RGBA weights** | **2-6 KB** | Depends on depth/kernel sizes | +| **Total** | **5-9 KB** | Acceptable for 64k | -**Optimization Strategies:** +**Optimization strategies:** - Quantize weights (float32 → int8) - Prune near-zero weights -- Share weights across layers -- Use separable convolutions (not yet implemented) +- Use separable convolutions --- ## Testing ```bash -# Run effect test -./build/test_demo_effects - -# Visual test in demo -./build/demo64k # CNN appears in timeline if added +./build/test_demo_effects # CNN construction/shader tests +./build/demo64k # Visual test ``` -**Test Coverage:** -- Construction/initialization -- Shader compilation -- Bind group creation -- Render pass execution - --- +## Blend Parameter Behavior + +**blend_amount** controls final compositing with original: +- `blend=0.0`: Pure original (no CNN effect) +- `blend=0.5`: 50% original + 50% CNN +- `blend=1.0`: Pure CNN output (full stylization) + +**Important:** Blend uses captured layer 0 input, not previous layer output. 
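In other words, the last layer computes `mix(original, result, blend_amount)` against the frame captured before layer 0 ran. A minimal Python sketch of that compositing (illustrative only; the real blend happens in the WGSL fragment shader, and the variable names here are hypothetical):

```python
import numpy as np

def final_blend(original, cnn_result, blend_amount):
    # Mirrors mix(original, result, blend): 0.0 keeps the captured scene, 1.0 keeps the CNN output.
    return (1.0 - blend_amount) * original + blend_amount * cnn_result

scene = np.array([0.8, 0.2, 0.1])      # captured layer-0 input (one pixel)
stylized = np.array([0.5, 0.5, 0.5])   # last CNN layer output (one pixel)
print(final_blend(scene, stylized, 0.0))   # [0.8 0.2 0.1] -> original untouched
print(final_blend(scene, stylized, 0.7))   # [0.59 0.41 0.38] -> mostly stylized
```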
+ +**Example use cases:** +- `blend=1.0`: Full stylization (default) +- `blend=0.7`: Subtle effect preserving original details +- `blend=0.3`: Light artistic touch + ## Troubleshooting **Shader compilation fails:** - Check `cnn_weights_generated.wgsl` syntax -- Verify all snippets registered in `shaders.cc::InitShaderComposer()` +- Verify snippets registered in `shaders.cc::InitShaderComposer()` +- Ensure `cnn_layer.wgsl` has 5 bindings (including `original_input`) **Black/corrupted output:** -- Weights likely untrained (using placeholder identity) -- Check residual blending factor (0.3 default) +- Weights untrained (identity placeholder) +- Check `captured_frame` auxiliary texture is registered +- Verify layer priorities in timeline are sequential + +**Wrong blend result:** +- Ensure layer 0 has `needs_framebuffer_capture() == true` +- Check MainSequence framebuffer capture logic +- Verify `original_input` binding is populated -**Performance issues:** -- Reduce kernel sizes (7×7 → 3×3) -- Decrease layer count -- Profile with `--hot-reload` to measure frame time +**Training loss not decreasing:** +- Lower learning rate (`--learning-rate 0.0001`) +- More epochs (`--epochs 1000`) +- Check input/target image alignment --- ## References -- **Shader Composition:** `doc/SEQUENCE.md` (shader parameters) -- **Effect System:** `src/gpu/effect.h` (Effect base class) -- **Training (external):** TensorFlow/PyTorch CNN tutorials +- **Training Script:** `training/train_cnn.py` +- **Shader Composition:** `doc/SEQUENCE.md` +- **Effect System:** `src/gpu/effect.h` diff --git a/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md new file mode 100644 index 0000000..4c13693 --- /dev/null +++ b/doc/CNN_RGBD_GRAYSCALE_SUMMARY.md @@ -0,0 +1,134 @@ +# CNN RGBD→Grayscale Architecture Implementation + +## Summary + +Implemented CNN architecture upgrade: RGBD input → grayscale output with 7-channel augmented input. + +## Changes Made + +### Architecture + +**Input:** RGBD (4 channels: RGB + inverse depth D=1/z) +**Output:** Grayscale (1 channel) +**Layer Input:** 7 channels = [RGBD, UV coords, grayscale] all normalized to [-1,1] + +**Layer Configuration:** +- Inner layers (0..N-2): Conv2d(7→4) - output RGBD with tanh activation +- Final layer (N-1): Conv2d(7→1) - output grayscale, no activation + +### Input Normalization (all to [-1,1]) + +- **RGBD:** `(rgbd - 0.5) * 2` +- **UV coords:** `(uv - 0.5) * 2` +- **Grayscale:** `(0.2126*R + 0.7152*G + 0.0722*B - 0.5) * 2` + +**Rationale:** Zero-centered inputs for tanh activation, better gradient flow. + +### Modified Files + +**Training (`/Users/skal/demo/training/train_cnn.py`):** +1. Removed `CoordConv2d` class +2. Updated `SimpleCNN`: + - Inner layers: `Conv2d(7, 4)` - RGBD output + - Final layer: `Conv2d(7, 1)` - grayscale output +3. Updated `forward()`: + - Normalize RGBD/coords/gray to [-1,1] + - Concatenate 7-channel input for each layer + - Apply tanh (inner) or none (final) + - Denormalize final output +4. Updated `export_weights_to_wgsl()`: + - Inner: `array<array<f32, 8>, 36>` (9 pos × 4 ch × 8 values) + - Final: `array<array<f32, 8>, 9>` (9 pos × 8 values) +5. Updated `generate_layer_shader()`: + - Use `cnn_conv3x3_7to4` for inner layers + - Use `cnn_conv3x3_7to1` for final layer + - Denormalize outputs from [-1,1] to [0,1] +6. Updated `ImagePairDataset`: + - Load RGBA input (was RGB) + +**Shaders (`/Users/skal/demo/workspaces/main/shaders/cnn/cnn_conv3x3.wgsl`):** +1. 
Added `cnn_conv3x3_7to4()`: + - 7-channel input: [RGBD, uv_x, uv_y, gray] + - 4-channel output: RGBD + - Weights: `array<array<f32, 8>, 36>` +2. Added `cnn_conv3x3_7to1()`: + - 7-channel input: [RGBD, uv_x, uv_y, gray] + - 1-channel output: grayscale + - Weights: `array<array<f32, 8>, 9>` + +**Documentation (`/Users/skal/demo/doc/CNN_EFFECT.md`):** +1. Updated architecture section with RGBD→grayscale pipeline +2. Updated training data requirements (RGBA input) +3. Updated weight storage format + +### No C++ Changes + +CNNLayerParams and bind groups remain unchanged. + +## Data Flow + +1. Layer 0 captures original RGBD to `captured_frame` +2. Each layer: + - Samples previous layer output (RGBD in [0,1]) + - Normalizes RGBD to [-1,1] + - Computes UV coords and grayscale, normalizes to [-1,1] + - Concatenates 7-channel input + - Applies convolution with layer-specific weights + - Outputs RGBD (inner) or grayscale (final) in [-1,1] + - Applies tanh (inner only) + - Denormalizes to [0,1] for texture storage + - Blends with original + +## Next Steps + +1. **Prepare RGBD training data:** + - Input: RGBA images (RGB + depth in alpha) + - Target: Grayscale stylized output + +2. **Train network:** + ```bash + python3 training/train_cnn.py \ + --input training/input \ + --target training/output \ + --layers 3 \ + --epochs 1000 + ``` + +3. **Verify generated shaders:** + - Check `cnn_weights_generated.wgsl` structure + - Check `cnn_layer.wgsl` uses new conv functions + +4. **Test in demo:** + ```bash + cmake --build build -j4 + ./build/demo64k + ``` + +## Design Rationale + +**Why [-1,1] normalization?** +- Centered inputs for tanh (operates best around 0) +- Better gradient flow +- Standard ML practice for normalized data + +**Why RGBD throughout vs RGB?** +- Depth information propagates through network +- Enables depth-aware stylization +- Consistent 4-channel processing + +**Why 7-channel input?** +- Coordinates: position-dependent effects (vignettes) +- Grayscale: luminance-aware processing +- RGBD: full color+depth information +- Enables richer feature learning + +## Testing Checklist + +- [ ] Train network with RGBD input data +- [ ] Verify `cnn_weights_generated.wgsl` structure +- [ ] Verify `cnn_layer.wgsl` uses `7to4`/`7to1` functions +- [ ] Build demo without errors +- [ ] Visual test: inner layers show RGBD evolution +- [ ] Visual test: final layer produces grayscale +- [ ] Visual test: blending works correctly +- [ ] Compare quality with previous RGB→RGB architecture diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md index d1c89af..2336f62 100644 --- a/doc/COMPLETED.md +++ b/doc/COMPLETED.md @@ -29,6 +29,22 @@ Detailed historical documents have been moved to `doc/archive/` for reference: Use `read @doc/archive/FILENAME.md` to access archived documents. 
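As a reading aid for the RGBD-to-grayscale data flow summarized above, here is a minimal PyTorch sketch of the forward pass, assuming 3x3 kernels for every layer and a simple UV/grayscale augmentation. The actual `training/train_cnn.py` may structure its model differently; this is not the project's implementation.

```python
import torch
import torch.nn as nn

class SketchCNN(nn.Module):
    """7-channel RGBD -> grayscale: Conv2d(7,4)+tanh for inner layers, Conv2d(7,1) final."""
    def __init__(self, num_layers=3, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.inner = nn.ModuleList(
            [nn.Conv2d(7, 4, kernel_size, padding=pad) for _ in range(num_layers - 1)])
        self.final = nn.Conv2d(7, 1, kernel_size, padding=pad)

    @staticmethod
    def augment(rgbd):
        # rgbd: (N, 4, H, W) in [-1, 1]; append UV coordinates and luminance -> (N, 7, H, W).
        n, _, h, w = rgbd.shape
        v, u = torch.meshgrid(torch.linspace(-1, 1, h, device=rgbd.device),
                              torch.linspace(-1, 1, w, device=rgbd.device), indexing="ij")
        uv = torch.stack([u, v]).unsqueeze(0).expand(n, 2, h, w)
        gray = 0.2126 * rgbd[:, 0:1] + 0.7152 * rgbd[:, 1:2] + 0.0722 * rgbd[:, 2:3]
        return torch.cat([rgbd, uv, gray], dim=1)

    def forward(self, rgbd01):                      # RGBD textures in [0, 1]
        x = (rgbd01 - 0.5) * 2.0                    # normalize to [-1, 1]
        for conv in self.inner:
            x = torch.tanh(conv(self.augment(x)))   # inner layers: RGBD out, stays in [-1, 1]
        gray = self.final(self.augment(x))          # final layer: 1 channel, no activation
        return (gray + 1.0) * 0.5                   # denormalize for display/storage
```

With `num_layers=3` this matches the 3-layer timeline example; mixed kernel sizes such as `--kernel-sizes 3,5,3` would need a per-layer kernel size instead of the single value used here.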
+## Recently Completed (February 10, 2026) + +- [x] **WGPU Boilerplate Factorization** + - **Goal**: Reduce repetitive WGPU code via builder pattern helpers + - **Implementation**: + - Created `BindGroupLayoutBuilder` and `BindGroupBuilder` for declarative bind group creation + - Created `RenderPipelineBuilder` to simplify pipeline setup with ShaderComposer integration + - Created `SamplerCache` singleton to deduplicate sampler instances + - Refactored `post_process_helper.cc`, `cnn_effect.cc`, `rotating_cube_effect.cc` + - **Result**: + - Bind group creation: 19 instances reduced from 14→4 lines each + - Pipeline creation: 30-50 lines reduced to 8 lines + - Sampler deduplication: 6 instances → cached + - Total: -122 lines boilerplate, binary size unchanged (6.3M debug) + - Tests pass, prevents binding index errors + ## Recently Completed (February 9, 2026) - [x] **External Library Size Measurement (Task #76)** diff --git a/doc/CONTRIBUTING.md b/doc/CONTRIBUTING.md index 9cd785b..98df873 100644 --- a/doc/CONTRIBUTING.md +++ b/doc/CONTRIBUTING.md @@ -65,12 +65,15 @@ See `doc/CODING_STYLE.md` for detailed examples. ## Development Protocols ### Adding Visual Effect -1. Implement `Effect` subclass in `src/gpu/demo_effects.cc` -2. Add to workspace `timeline.seq` (e.g., `workspaces/main/timeline.seq`) -3. **Update `test_demo_effects.cc`**: - - Add to test list - - Increment `EXPECTED_*_COUNT` -4. Verify: +1. Create effect class files (use `tools/shadertoy/convert_shadertoy.py` or templates) +2. Add shader to `workspaces/main/assets.txt` +3. Add effect `.cc` file to `CMakeLists.txt` GPU_SOURCES (both sections) +4. Include header in `src/gpu/demo_effects.h` +5. Add to workspace `timeline.seq` (e.g., `workspaces/main/timeline.seq`) +6. **Update `src/tests/gpu/test_demo_effects.cc`**: + - Add to `post_process_effects` list (lines 80-93) or `scene_effects` list (lines 125-137) + - Example: `{"MyEffect", std::make_shared<MyEffect>(fixture.ctx())},` +7. Verify: ```bash cmake -S . -B build -DDEMO_BUILD_TESTS=ON cmake --build build -j4 --target test_demo_effects diff --git a/doc/EFFECT_WORKFLOW.md b/doc/EFFECT_WORKFLOW.md new file mode 100644 index 0000000..45c47b7 --- /dev/null +++ b/doc/EFFECT_WORKFLOW.md @@ -0,0 +1,228 @@ +# Effect Creation Workflow + +**Target Audience:** AI coding agents and developers + +Automated checklist for adding new visual effects to the demo. + +--- + +## Quick Reference + +**For ShaderToy conversions:** Use `tools/shadertoy/convert_shadertoy.py` then follow steps 3-8 below. + +**For custom effects:** Follow all steps 1-8. + +--- + +## Step-by-Step Workflow + +### 1. Create Effect Files + +**Location:** +- Header: `src/gpu/effects/<effect_name>_effect.h` +- Implementation: `src/gpu/effects/<effect_name>_effect.cc` +- Shader: `workspaces/main/shaders/<effect_name>.wgsl` + +**Naming Convention:** +- Class name: `<EffectName>Effect` (e.g., `TunnelEffect`, `PlasmaEffect`) +- Files: `<effect_name>_effect.*` (snake_case) + +**Base Class:** +- Post-process effects: inherit from `PostProcessEffect` +- Scene effects: inherit from `Effect` + +**Template:** See `tools/shadertoy/template.*` or use `convert_shadertoy.py` + +### 2. Add Shader to Assets + +**File:** `workspaces/main/assets.txt` + +**Format:** +``` +SHADER_<UPPER_SNAKE_NAME>, NONE, shaders/<effect_name>.wgsl, "Effect description" +``` + +**Example:** +``` +SHADER_TUNNEL, NONE, shaders/tunnel.wgsl, "Tunnel effect shader" +``` + +**Asset ID:** Will be `AssetId::ASSET_SHADER_<UPPER_SNAKE_NAME>` in C++ + +### 3. 
Add to CMakeLists.txt + +**File:** `CMakeLists.txt` + +**Action:** Add `src/gpu/effects/<effect_name>_effect.cc` to **BOTH** GPU_SOURCES sections: +- Headless mode section (around line 141-167) +- Normal mode section (around line 171-197) + +**Location:** After similar effects (post-process with post-process, scene with scene) + +**Example:** +```cmake +# In headless section (line ~152): + src/gpu/effects/solarize_effect.cc + src/gpu/effects/tunnel_effect.cc # <-- Add here + src/gpu/effects/chroma_aberration_effect.cc + +# In normal section (line ~183): + src/gpu/effects/solarize_effect.cc + src/gpu/effects/tunnel_effect.cc # <-- Add here + src/gpu/effects/chroma_aberration_effect.cc +``` + +### 4. Include in demo_effects.h + +**File:** `src/gpu/demo_effects.h` + +**Action:** Add include directive: +```cpp +#include "gpu/effects/<effect_name>_effect.h" +``` + +**Location:** Alphabetically with other effect includes + +### 5. Add to Timeline + +**File:** `workspaces/main/timeline.seq` + +**Format:** +``` +SEQUENCE <start_time> <priority> + EFFECT <+|=|-> <EffectName>Effect <local_start> <local_end> [params...] +``` + +**Priority Modifiers (REQUIRED):** +- `+` : Increment priority +- `=` : Same priority as previous effect +- `-` : Decrement priority (for backgrounds) + +**Example:** +``` +SEQUENCE 0.0 0 + EFFECT + TunnelEffect 0.0 10.0 +``` + +**Common Mistake:** Missing priority modifier (`+`, `=`, `-`) after EFFECT keyword + +### 6. Update Tests + +**File:** `src/tests/gpu/test_demo_effects.cc` + +**Action:** Add effect to appropriate list: + +**Post-Process Effects (lines 80-93):** +```cpp +{"TunnelEffect", std::make_shared<TunnelEffect>(fixture.ctx())}, +``` + +**Scene Effects (lines 125-137):** +```cpp +{"TunnelEffect", std::make_shared<TunnelEffect>(fixture.ctx())}, +``` + +**3D Effects:** If requires Renderer3D, add to `requires_3d` check (line 148-151) + +### 7. Build and Test + +```bash +# Full build +cmake --build build -j4 + +# Run effect tests +cmake -S . -B build -DDEMO_BUILD_TESTS=ON +cmake --build build -j4 --target test_demo_effects +cd build && ./test_demo_effects + +# Run all tests +cd build && ctest +``` + +### 8. Verify + +**Checklist:** +- [ ] Effect compiles without errors +- [ ] Effect appears in timeline +- [ ] test_demo_effects passes +- [ ] Effect renders correctly: `./build/demo64k` +- [ ] No shader compilation errors +- [ ] Follows naming conventions + +--- + +## Common Issues + +### Build Error: "no member named 'ASSET_..._SHADER'" + +**Cause:** Shader not in assets.txt or wrong asset ID name + +**Fix:** +1. Check `workspaces/main/assets.txt` has shader entry +2. 
Asset ID is `ASSET_` + uppercase entry name (e.g., `SHADER_TUNNEL` → `ASSET_SHADER_TUNNEL`) + +### Build Error: "undefined symbol for architecture" + +**Cause:** Effect not in CMakeLists.txt GPU_SOURCES + +**Fix:** Add `.cc` file to BOTH sections (headless and normal mode) + +### Timeline Parse Error: "Expected '+', '=', or '-'" + +**Cause:** Missing priority modifier after EFFECT keyword + +**Fix:** Use `EFFECT +`, `EFFECT =`, or `EFFECT -` (never just `EFFECT`) + +### Test Failure: Effect not in test list + +**Cause:** Effect not added to test_demo_effects.cc + +**Fix:** Add to `post_process_effects` or `scene_effects` list + +--- + +## Automation Script Example + +```bash +#!/bin/bash +# Example automation for AI agents + +EFFECT_NAME="$1" # CamelCase (e.g., "Tunnel") +SNAKE_NAME=$(echo "$EFFECT_NAME" | sed 's/\([A-Z]\)/_\L\1/g' | sed 's/^_//') +UPPER_NAME=$(echo "$SNAKE_NAME" | tr '[:lower:]' '[:upper:]') + +echo "Creating effect: $EFFECT_NAME" +echo " Snake case: $SNAKE_NAME" +echo " Upper case: $UPPER_NAME" + +# 1. Generate files (if using ShaderToy) +# ./tools/shadertoy/convert_shadertoy.py shader.txt "$EFFECT_NAME" + +# 2. Add to assets.txt +echo "SHADER_${UPPER_NAME}, NONE, shaders/${SNAKE_NAME}.wgsl, \"${EFFECT_NAME} effect\"" \ + >> workspaces/main/assets.txt + +# 3. Add to CMakeLists.txt (both sections) +# Use Edit tool to add to both GPU_SOURCES sections + +# 4. Add include to demo_effects.h +# Use Edit tool to add #include line + +# 5. Add to timeline.seq +# Use Edit tool to add EFFECT line with priority modifier + +# 6. Add to test file +# Use Edit tool to add to appropriate test list + +# 7. Build +cmake --build build -j4 +``` + +--- + +## See Also + +- `tools/shadertoy/README.md` - ShaderToy conversion guide +- `doc/SEQUENCE.md` - Timeline format documentation +- `doc/CONTRIBUTING.md` - General contribution guidelines +- `src/gpu/effects/` - Existing effect examples diff --git a/doc/HOWTO.md b/doc/HOWTO.md index bdc0214..a57a161 100644 --- a/doc/HOWTO.md +++ b/doc/HOWTO.md @@ -86,12 +86,29 @@ make run_util_tests # Utility tests --- +## Training + +```bash +./training/train_cnn.py --layers 3 --kernel_sizes 3,5,3 --epochs 10000 --batch_size 8 --input training/input/ --target training/output/ --checkpoint-every 1000 +``` + +Generate shaders from checkpoint: +```bash +./training/train_cnn.py --export-only training/checkpoints/checkpoint_epoch_7000.pth +``` + +**Note:** Kernel sizes must match shader functions: +- 3×3 kernel → `cnn_conv3x3_7to4` (36 weights: 9 pos × 4 channels) +- 5×5 kernel → `cnn_conv5x5_7to4` (100 weights: 25 pos × 4 channels) + +--- + ## Timeline Edit `workspaces/main/timeline.seq`: ```text SEQUENCE 0.0 0 - EFFECT HeptagonEffect 0.0 60.0 0 + EFFECT + HeptagonEffect 0.0 60.0 0 ``` Rebuild to apply. See `doc/SEQUENCE.md`. diff --git a/doc/RECIPE.md b/doc/RECIPE.md index 6404391..d563027 100644 --- a/doc/RECIPE.md +++ b/doc/RECIPE.md @@ -157,8 +157,8 @@ void MyEffect::render(WGPUTextureView prev, WGPUTextureView target, **.seq syntax:** ``` -EFFECT MyEffect 0.0 10.0 strength=0.5 speed=3.0 -EFFECT MyEffect 10.0 20.0 strength=2.0 # speed keeps previous value +EFFECT + MyEffect 0.0 10.0 strength=0.5 speed=3.0 +EFFECT = MyEffect 10.0 20.0 strength=2.0 # speed keeps previous value ``` **Example:** `src/gpu/effects/flash_effect.cc`, `src/gpu/effects/chroma_aberration_effect.cc` |
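The workflow and automation sketch above hinge on consistent name derivation (snake_case files, `SHADER_<UPPER_SNAKE>` asset entries, `ASSET_SHADER_<NAME>` IDs, `EFFECT +` timeline lines). A small Python equivalent of the bash name-conversion step, with a hypothetical helper and output strings that are not part of the repository:

```python
import re

def effect_names(camel: str) -> dict:
    """Derive the names the workflow steps expect from a CamelCase effect name."""
    snake = re.sub(r"(?<!^)([A-Z])", r"_\1", camel).lower()   # ChromaAberration -> chroma_aberration
    upper = snake.upper()                                     # chroma_aberration -> CHROMA_ABERRATION
    return {
        "sources":    f"src/gpu/effects/{snake}_effect.h / .cc",
        "shader":     f"workspaces/main/shaders/{snake}.wgsl",
        "assets.txt": f'SHADER_{upper}, NONE, shaders/{snake}.wgsl, "{camel} effect"',
        "asset id":   f"AssetId::ASSET_SHADER_{upper}",
        "timeline":   f"EFFECT + {camel}Effect 0.0 10.0",
    }

for key, value in effect_names("ChromaAberration").items():
    print(f"{key:11} {value}")
```

Printing this table for a new effect before editing files makes it easy to catch the "wrong asset ID" mistake called out in `doc/AI_RULES.md`.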
