 -rw-r--r--  doc/CNN_EFFECT.md   | 239
 -rw-r--r--  training/README.md  | 233
 2 files changed, 238 insertions(+), 234 deletions(-)
diff --git a/doc/CNN_EFFECT.md b/doc/CNN_EFFECT.md
index 9045739..ec70b13 100644
--- a/doc/CNN_EFFECT.md
+++ b/doc/CNN_EFFECT.md
@@ -6,12 +6,13 @@ Neural network-based stylization for rendered scenes.
 
 ## Overview
 
-The CNN effect applies trainable convolutional neural network layers to post-process 3D rendered output, enabling artistic stylization (e.g., painterly, sketch, cel-shaded effects) with minimal runtime overhead.
+Trainable convolutional neural network layers for artistic stylization (painterly, sketch, cel-shaded effects) with minimal runtime overhead.
 
 **Key Features:**
+- Position-aware layer 0 (coordinate input for vignetting, edge effects)
 - Multi-layer convolutions (3×3, 5×5, 7×7 kernels)
 - Modular WGSL shader architecture
-- Hardcoded weights (trained offline)
+- Hardcoded weights (trained offline via PyTorch)
 - Residual connections for stable learning
 - ~5-8 KB binary footprint
 
@@ -19,144 +20,141 @@ The CNN effect applies trainable convolutional neural network layers to post-pro
 ## Architecture
 
+### Coordinate-Aware Layer 0
+
+Layer 0 accepts normalized (x,y) patch center coordinates alongside RGBA samples:
+
+```wgsl
+fn cnn_conv3x3_with_coord(
+    tex: texture_2d<f32>,
+    samp: sampler,
+    uv: vec2<f32>,                        // Center position in [0,1]
+    resolution: vec2<f32>,
+    rgba_weights: array<mat4x4<f32>, 9>,  // 9 samples × 4×4 matrix
+    coord_weights: mat2x4<f32>,           // 2 coords → 4 outputs
+    bias: vec4<f32>
+) -> vec4<f32>
+```
+
+**Input structure:** 9 RGBA samples (36 values) + 1 xy coordinate (2 values) = 38 inputs → 4 outputs
+
+**Size impact:** +32 B of coordinate weights, independent of kernel size
+
+**Use cases:** Position-dependent stylization (vignettes, corner darkening, radial gradients)
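+
+For intuition, here is a minimal PyTorch sketch of the same layer (illustrative
+only, not the literal `train_cnn.py` code): a 3×3 convolution over RGBA plus a
+1×1 convolution over the coordinate grid, giving the 36 + 2 = 38 inputs above.
+
+```python
+import torch
+import torch.nn as nn
+
+class CoordConv2d(nn.Module):
+    """3x3 conv over RGBA + 1x1 conv over (x, y): the mat2x4 analogue."""
+
+    def __init__(self, in_ch: int = 4, out_ch: int = 4, kernel_size: int = 3):
+        super().__init__()
+        self.rgba = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
+        self.coord = nn.Conv2d(2, out_ch, kernel_size=1, bias=False)
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        n, _, h, w = x.shape
+        # Normalized [0, 1] grid, same convention as the WGSL uv argument.
+        ys = torch.linspace(0.0, 1.0, h, device=x.device)
+        xs = torch.linspace(0.0, 1.0, w, device=x.device)
+        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
+        coords = torch.stack((gx, gy)).expand(n, 2, h, w)
+        return self.rgba(x) + self.coord(coords)  # shared bias lives in self.rgba
+```
+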
 ### File Structure
 
 ```
 src/gpu/effects/
-  cnn_effect.h                # CNNEffect class
-  cnn_effect.cc               # Implementation
+  cnn_effect.h/cc             # CNNEffect class
 
 workspaces/main/shaders/cnn/
-  cnn_activation.wgsl         # Activation functions (tanh, ReLU, sigmoid, leaky_relu)
-  cnn_conv3x3.wgsl            # 3×3 convolution
-  cnn_conv5x5.wgsl            # 5×5 convolution
-  cnn_conv7x7.wgsl            # 7×7 convolution
-  cnn_weights_generated.wgsl  # Weight arrays (generated by training script)
+  cnn_activation.wgsl         # tanh, ReLU, sigmoid, leaky_relu
+  cnn_conv3x3.wgsl            # 3×3 convolution (standard + coord-aware)
+  cnn_conv5x5.wgsl            # 5×5 convolution (standard + coord-aware)
+  cnn_conv7x7.wgsl            # 7×7 convolution (standard + coord-aware)
+  cnn_weights_generated.wgsl  # Weight arrays (auto-generated)
   cnn_layer.wgsl              # Main shader (composes above snippets)
 ```
 
-### Shader Composition
-
-`cnn_layer.wgsl` uses `#include` directives (resolved by `ShaderComposer`):
-```wgsl
-#include "common_uniforms"
-#include "cnn_activation"
-#include "cnn_conv3x3"
-#include "cnn_weights_generated"
-```
-
 ---
 
-## Usage
-
-### C++ Integration
+## Training Workflow
 
-```cpp
-#include "gpu/effects/cnn_effect.h"
+### 1. Prepare Training Data
 
-// Create effect (1 layer for now, expandable to 4)
-auto cnn = std::make_shared<CNNEffect>(ctx, /*num_layers=*/1);
+Collect input/target image pairs:
+- **Input:** Raw 3D render
+- **Target:** Artistic style (hand-painted, filtered, stylized)
 
-// Add to timeline
-timeline.add_effect(cnn, start_time, end_time);
+```bash
+training/input/img_000.png   # Raw render
+training/output/img_000.png  # Stylized target
 ```
 
-### Timeline Example
-
-```
-SEQUENCE 10.0 0
-  EFFECT CNNEffect 10.0 15.0 0   # Apply CNN stylization for 5 seconds
+Use `image_style_processor.py` to generate targets:
+```bash
+python3 training/image_style_processor.py input/ output/ pencil_sketch
 ```
 
 ---
 
-## Training Workflow (Planned)
+### 2. Train Network
 
-**Step 1: Prepare Training Data**
 ```bash
-# Collect before/after image pairs
-# - Before: Raw 3D render
-# - After: Target artistic style (hand-painted, filtered, etc.)
+python3 training/train_cnn.py \
+  --input training/input \
+  --target training/output \
+  --layers 1 \
+  --kernel-sizes 3 \
+  --epochs 500 \
+  --checkpoint-every 50
 ```
 
-**Step 2: Train Network**
+**Multi-layer example:**
 ```bash
-python scripts/train_cnn.py \
-  --input rendered_scene.png \
-  --target stylized_scene.png \
+python3 training/train_cnn.py \
+  --input training/input \
+  --target training/output \
   --layers 3 \
-  --kernel_sizes 3,5,3 \
-  --epochs 100
+  --kernel-sizes 3,5,3 \
+  --epochs 1000 \
+  --checkpoint-every 100
 ```
 
-**Step 3: Export Weights**
-```python
-# scripts/train_cnn.py automatically generates:
-# workspaces/main/shaders/cnn/cnn_weights_generated.wgsl
+**Resume from checkpoint:**
+```bash
+python3 training/train_cnn.py \
+  --input training/input \
+  --target training/output \
+  --resume training/checkpoints/checkpoint_epoch_200.pth
 ```
 
-**Step 4: Rebuild**
+### 3. Rebuild Demo
+
+The training script auto-generates `cnn_weights_generated.wgsl`; rebuilding bakes the new weights in:
 ```bash
 cmake --build build -j4
+./build/demo64k
 ```
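+
+Under the hood this is a standard PyTorch image-to-image regression. A
+stripped-down sketch of the loop (illustrative; an MSE objective is assumed,
+and CLI handling and checkpointing are omitted):
+
+```python
+import torch
+import torch.nn as nn
+
+def train(model: nn.Module, loader, epochs: int = 500, lr: float = 1e-3) -> None:
+    """Fit model(input) to the stylized target with an MSE loss."""
+    opt = torch.optim.Adam(model.parameters(), lr=lr)
+    loss_fn = nn.MSELoss()
+    for epoch in range(epochs):
+        total = 0.0
+        for inputs, targets in loader:   # paired (render, stylized) batches
+            opt.zero_grad()
+            loss = loss_fn(model(inputs), targets)
+            loss.backward()
+            opt.step()
+            total += loss.item()
+        print(f"epoch {epoch}: loss {total / len(loader):.6f}")
+```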
 
 ---
 
-## Implementation Details
-
-### Convolution Function Signature
-
-```wgsl
-fn cnn_conv3x3(
-    tex: texture_2d<f32>,
-    samp: sampler,
-    uv: vec2<f32>,
-    resolution: vec2<f32>,
-    weights: array<mat4x4<f32>, 9>,  # 9 samples × 4×4 matrix
-    bias: vec4<f32>
-) -> vec4<f32>
-```
+## Usage
 
-- Samples 9 pixels (3×3 neighborhood)
-- Applies 4×4 weight matrix per sample (RGBA channels)
-- Returns weighted sum + bias (pre-activation)
+### C++ Integration
 
-### Weight Storage
+```cpp
+#include "gpu/effects/cnn_effect.h"
 
-Weights are stored as WGSL constants:
-```wgsl
-const weights_layer0: array<mat4x4<f32>, 9> = array(
-    mat4x4<f32>(1.0, 0.0, 0.0, 0.0, ...),  # Center pixel
-    mat4x4<f32>(0.0, 0.0, 0.0, 0.0, ...),  # Neighbor 1
-    // ... 7 more matrices
-);
-const bias_layer0 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
+auto cnn = std::make_shared<CNNEffect>(ctx, /*num_layers=*/1);
+timeline.add_effect(cnn, start_time, end_time);
 ```
 
-### Residual Connection
+### Timeline Example
 
-Final layer adds original input:
-```wgsl
-if (params.use_residual != 0) {
-    let input = textureSample(txt, smplr, uv);
-    result = input + result * 0.3;  # Blend 30% stylization
-}
+```
+SEQUENCE 10.0 0
+  EFFECT CNNEffect 10.0 15.0 0
 ```
 
 ---
 
-## Multi-Layer Rendering (Future)
-
-For N layers, use ping-pong textures:
+## Weight Storage
 
-```
-Pass 0: input → temp_a (conv + activate)
-Pass 1: temp_a → temp_b (conv + activate)
-Pass 2: temp_b → temp_a (conv + activate)
-Pass 3: temp_a → screen (conv + activate + residual)
+**Layer 0 (coordinate-aware):**
+```wgsl
+const rgba_weights_layer0: array<mat4x4<f32>, 9> = array(...);
+const coord_weights_layer0 = mat2x4<f32>(
+    0.1, -0.2, 0.0, 0.0,  // x-coord weights
+    -0.1, 0.0, 0.2, 0.0   // y-coord weights
+);
+const bias_layer0 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
 ```
 
-**Current Status:** Single-layer implementation. Multi-pass infrastructure ready but not exposed.
+**Layers 1+ (standard):**
+```wgsl
+const weights_layer1: array<mat4x4<f32>, 9> = array(...);
+const bias_layer1 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
+```
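+
+The generator that writes these constants is part of `train_cnn.py`; a sketch of
+the core formatting step (illustrative names, and note that the value order fed
+to `mat4x4<f32>(...)` must match the column-major layout the convolution
+snippets expect):
+
+```python
+def emit_mat4_array(name: str, weights) -> str:
+    """Format a (9, 4, 4) weight tensor as a WGSL array<mat4x4<f32>, 9> constant."""
+    mats = []
+    for m in weights:  # one 4x4 matrix per kernel tap
+        vals = ", ".join(f"{float(v):.6f}" for col in m for v in col)
+        mats.append(f"    mat4x4<f32>({vals})")
+    return f"const {name}: array<mat4x4<f32>, 9> = array(\n" + ",\n".join(mats) + "\n);"
+```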
 
 ---
 
@@ -164,60 +162,51 @@ Pass 3: temp_a → screen (conv + activate + residual)
 
 | Component | Size | Notes |
 |-----------|------|-------|
-| `cnn_activation.wgsl` | ~200 B | 4 activation functions |
-| `cnn_conv3x3.wgsl` | ~400 B | 3×3 convolution logic |
-| `cnn_conv5x5.wgsl` | ~600 B | 5×5 convolution logic |
-| `cnn_conv7x7.wgsl` | ~800 B | 7×7 convolution logic |
-| `cnn_layer.wgsl` | ~800 B | Main shader |
-| `cnn_effect.cc` | ~300 B | C++ implementation |
-| **Weights (variable)** | **2-6 KB** | Depends on network depth/width |
-| **Total** | **5-9 KB** | Acceptable for 64k demo |
+| Activation functions | ~200 B | 4 functions |
+| Conv3x3 (standard + coord) | ~500 B | Both variants |
+| Conv5x5 (standard + coord) | ~700 B | Both variants |
+| Conv7x7 (standard + coord) | ~900 B | Both variants |
+| Main shader | ~800 B | Layer composition |
+| C++ implementation | ~300 B | Effect class |
+| **Coord weights** | **+32 B** | One-time overhead (layer 0 only) |
+| **RGBA weights** | **2-6 KB** | Depends on depth/kernel sizes |
+| **Total** | **5-9 KB** | Acceptable for 64k |
 
-**Optimization Strategies:**
+**Optimization strategies:**
 - Quantize weights (float32 → int8)
 - Prune near-zero weights
-- Share weights across layers
-- Use separable convolutions (not yet implemented)
+- Use separable convolutions
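+
+These are future work, not implemented. As an illustration of the first idea,
+symmetric int8 quantization of the exported weights (hypothetical helper,
+roughly 4× smaller storage before compression):
+
+```python
+import numpy as np
+
+def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
+    """Map float32 weights to int8 plus one float scale per tensor."""
+    scale = float(np.abs(weights).max()) / 127.0 or 1.0
+    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
+    return q, scale
+
+def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
+    return q.astype(np.float32) * scale  # applied once at shader generation
+```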
 
 ---
 
 ## Testing
 
 ```bash
-# Run effect test
-./build/test_demo_effects
-
-# Visual test in demo
-./build/demo64k   # CNN appears in timeline if added
+./build/test_demo_effects  # CNN construction/shader tests
+./build/demo64k            # Visual test
 ```
 
-**Test Coverage:**
-- Construction/initialization
-- Shader compilation
-- Bind group creation
-- Render pass execution
-
 ---
 
 ## Troubleshooting
 
 **Shader compilation fails:**
 - Check `cnn_weights_generated.wgsl` syntax
-- Verify all snippets registered in `shaders.cc::InitShaderComposer()`
+- Verify snippets registered in `shaders.cc::InitShaderComposer()`
 
 **Black/corrupted output:**
-- Weights likely untrained (using placeholder identity)
-- Check residual blending factor (0.3 default)
+- Weights untrained (identity placeholder)
+- Check residual blending (0.3 default)
 
-**Performance issues:**
-- Reduce kernel sizes (7×7 → 3×3)
-- Decrease layer count
-- Profile with `--hot-reload` to measure frame time
+**Training loss not decreasing:**
+- Lower learning rate (`--learning-rate 0.0001`)
+- More epochs (`--epochs 1000`)
+- Check input/target image alignment
 
 ---
 
 ## References
 
-- **Shader Composition:** `doc/SEQUENCE.md` (shader parameters)
-- **Effect System:** `src/gpu/effect.h` (Effect base class)
-- **Training (external):** TensorFlow/PyTorch CNN tutorials
+- **Training Script:** `training/train_cnn.py`
+- **Shader Composition:** `doc/SEQUENCE.md`
+- **Effect System:** `src/gpu/effect.h`
diff --git a/training/README.md b/training/README.md
index 08379ee..0a46718 100644
--- a/training/README.md
+++ b/training/README.md
@@ -1,167 +1,182 @@
-# Image Style Processor
+# CNN Training Tools
 
-A comprehensive Python script that applies artistic hand-drawn and futuristic effects to images.
+Tools for training and preparing data for the CNN post-processing effect.
 
-## Requirements
+---
 
-- Python 3
-- OpenCV (cv2)
-- NumPy
+## train_cnn.py
+
+PyTorch-based training script for image-to-image stylization.
+
+### Basic Usage
 
-Install dependencies:
 ```bash
-pip install opencv-python numpy
+python3 train_cnn.py --input <input_dir> --target <target_dir> [options]
 ```
 
-## Usage
+### Examples
 
+**Single layer, 3×3 kernel:**
 ```bash
-python3 image_style_processor.py <input_directory> <output_directory> <style>
+python3 train_cnn.py --input training/input --target training/output \
+  --layers 1 --kernel-sizes 3 --epochs 500
 ```
 
-### Arguments
+**Multi-layer, mixed kernels:**
+```bash
+python3 train_cnn.py --input training/input --target training/output \
+  --layers 3 --kernel-sizes 3,5,3 --epochs 1000
+```
 
-- `input_directory`: Directory containing your input images (PNG, JPG, JPEG)
-- `output_directory`: Directory where processed images will be saved (created if doesn't exist)
-- `style`: The artistic style to apply (see below)
+**With checkpointing:**
+```bash
+python3 train_cnn.py --input training/input --target training/output \
+  --epochs 500 --checkpoint-every 50
+```
 
-## Available Styles
+**Resume from checkpoint:**
+```bash
+python3 train_cnn.py --input training/input --target training/output \
+  --resume training/checkpoints/checkpoint_epoch_200.pth
+```
+
+### Options
 
-### Sketch Styles
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--input` | *required* | Input image directory |
+| `--target` | *required* | Target image directory |
+| `--layers` | 1 | Number of CNN layers |
+| `--kernel-sizes` | 3 | Comma-separated kernel sizes (a single value repeats for all layers) |
+| `--epochs` | 100 | Training epochs |
+| `--batch-size` | 4 | Batch size |
+| `--learning-rate` | 0.001 | Learning rate |
+| `--output` | `workspaces/main/shaders/cnn/cnn_weights_generated.wgsl` | Output WGSL file |
+| `--checkpoint-every` | 0 | Save checkpoint every N epochs (0 = disabled) |
+| `--checkpoint-dir` | `training/checkpoints` | Checkpoint directory |
+| `--resume` | None | Resume from checkpoint file |
 
-1. **pencil_sketch** - Dense cross-hatching with progressive layers in shadows
-   - Best for: Detailed technical drawings, architectural scenes
-   - Features: Clean line art, 5 layers of cross-hatching, strong shadow definition
+### Architecture
 
-2. **ink_drawing** - Bold black outlines with comic book aesthetic
-   - Best for: Graphic novel style, high contrast scenes
-   - Features: Bold outlines, posterized tones, minimal shading
+- **Layer 0:** `CoordConv2d` - accepts (x,y) patch center + 3×3 RGBA samples
+- **Layers 1+:** Standard `Conv2d` - 3×3 RGBA samples only
+- **Activation:** Tanh between layers
+- **Output:** Residual connection (30% stylization blend), sketched below
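+
+A condensed sketch of the forward pass this implies (illustrative; see
+`train_cnn.py` for the real definitions, including `CoordConv2d`, whose import
+path here is assumed):
+
+```python
+import torch
+import torch.nn as nn
+from train_cnn import CoordConv2d  # layer-0 class from the training script
+
+class StylizerCNN(nn.Module):
+    def __init__(self, num_layers: int = 1):
+        super().__init__()
+        layers = [CoordConv2d(4, 4)]  # layer 0: RGBA + (x, y) coordinates
+        layers += [nn.Conv2d(4, 4, 3, padding=1) for _ in range(num_layers - 1)]
+        self.layers = nn.ModuleList(layers)
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        y = x
+        for layer in self.layers:
+            y = torch.tanh(layer(y))  # tanh between layers
+        return x + 0.3 * y            # residual: 30% stylization blend
+```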
 
-3. **charcoal_pastel** - Dramatic contrasts with soft, smudged textures
-   - Best for: Portraits, dramatic landscapes
-   - Features: Soft blending, grainy texture, highlighted areas
+### Requirements
 
-4. **conte_crayon** - Directional strokes following image contours
-   - Best for: Figure studies, natural forms
-   - Features: Stroke direction follows gradients, cross-hatching in dark areas
+```bash
+pip install torch torchvision pillow
+```
 
-5. **gesture_sketch** - Loose, quick observational sketch style
-   - Best for: Quick studies, energetic compositions
-   - Features: Randomized line wobble, sparse suggestion lines
+---
 
-### Futuristic Styles
+## image_style_processor.py
 
-6. **circuit_board** - Tech blueprint with circuit paths and geometric patterns
-   - Best for: Sci-fi imagery, technological themes
-   - Features: Multi-layer circuit paths, connection nodes, technical grid overlay
+Generates stylized target images from raw renders.
 
-7. **glitch_art** - Digital corruption with scan line shifts and pixel sorting
-   - Best for: Cyberpunk aesthetics, digital art
-   - Features: Horizontal scan artifacts, block displacement, pixel sorting, noise strips
+### Usage
 
-8. **wireframe_topo** - Topographic contour lines with holographic grid
-   - Best for: Landscape, abstract patterns, sci-fi hologram effect
-   - Features: 20 contour levels, scan lines, measurement markers, grid overlay
+```bash
+python3 image_style_processor.py <input_dir> <output_dir> <style>
+```
 
-9. **data_mosaic** - Voronoi geometric fragmentation with angular cells
-   - Best for: Abstract art, geometric compositions
-   - Features: 200 Voronoi cells, posterized tones, embedded geometric patterns
+### Available Styles
 
-10. **holographic_scan** - CRT/hologram display with scanlines and HUD elements
-    - Best for: Retro-futuristic, heads-up display aesthetic
-    - Features: Scanlines, interference patterns, glitch effects, corner brackets, crosshair
+**Sketch:**
+- `pencil_sketch` - Dense cross-hatching
+- `ink_drawing` - Bold outlines, comic style
+- `charcoal_pastel` - Soft, dramatic contrasts
+- `conte_crayon` - Directional strokes
+- `gesture_sketch` - Loose, energetic lines
 
-## Examples
+**Futuristic:**
+- `circuit_board` - Tech blueprint
+- `glitch_art` - Digital corruption
+- `wireframe_topo` - Topographic contours
+- `data_mosaic` - Voronoi fragmentation
+- `holographic_scan` - CRT/HUD aesthetic
 
-### Sketch Effects
+### Examples
 
-Process images with pencil sketch:
 ```bash
-python3 image_style_processor.py ./photos ./output pencil_sketch
-```
+# Generate pencil sketch targets
+python3 image_style_processor.py input/ output/ pencil_sketch
 
-Apply ink drawing style:
-```bash
-python3 image_style_processor.py ./input ./sketches ink_drawing
+# Generate glitch art targets
+python3 image_style_processor.py input/ output/ glitch_art
 ```
 
-Create charcoal effect:
+### Requirements
+
 ```bash
-python3 image_style_processor.py ./images ./results charcoal_pastel
+pip install opencv-python numpy
 ```
 
-### Futuristic Effects
+---
+
+## Workflow
+
+### 1. Render Raw Frames
 
-Apply circuit board style:
+Generate raw 3D renders as input:
 ```bash
-python3 image_style_processor.py ./photos ./output circuit_board
+./build/demo64k --headless --duration 5 --output training/input/
 ```
 
-Create glitch art:
+### 2. Generate Stylized Targets
+
+Apply an artistic style:
 ```bash
-python3 image_style_processor.py ./input ./glitched glitch_art
+python3 training/image_style_processor.py training/input/ training/output/ pencil_sketch
 ```
 
-Apply holographic effect:
+### 3. Train CNN
+
+Train the network to reproduce the style:
 ```bash
-python3 image_style_processor.py ./images ./holo holographic_scan
+python3 training/train_cnn.py \
+  --input training/input \
+  --target training/output \
+  --epochs 500 \
+  --checkpoint-every 50
 ```
 
-## Output
+### 4. Rebuild Demo
 
-- Processed images are saved to the output directory with **the same filename** as the input
-- Supported input formats: PNG, JPG, JPEG (case-insensitive)
-- Output format: PNG (preserves quality)
-- Original images are never modified
+Weights are auto-exported to `cnn_weights_generated.wgsl`:
+```bash
+cmake --build build -j4
+./build/demo64k
+```
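+
+Steps 1-2 must leave `training/input/` and `training/output/` with matching
+filenames (the processor writes each target under the same name as its input).
+A quick sanity check (hypothetical helper):
+
+```python
+from pathlib import Path
+
+def check_pairs(input_dir: str, target_dir: str) -> None:
+    """Report frames that are missing a same-named stylized target."""
+    inputs = {p.name for p in Path(input_dir).glob("*.png")}
+    targets = {p.name for p in Path(target_dir).glob("*.png")}
+    for name in sorted(inputs - targets):
+        print("missing target:", name)
+    print(f"{len(inputs & targets)} usable training pairs")
+```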
 
 ---
 
-## Style Comparison
+## Tips
 
-### Sketch Styles
-- **pencil_sketch**: Most detailed, traditional drawing look
-- **ink_drawing**: Boldest, most graphic/comic-like
-- **charcoal_pastel**: Softest, most artistic/painterly
-- **conte_crayon**: Most directional, follows contours
-- **gesture_sketch**: Loosest, most expressive
+- **Training data:** 10-50 image pairs recommended
+- **Resolution:** 256×256 (auto-resized during training)
+- **Checkpoints:** Save every 50-100 epochs for long runs (see the sketch below)
+- **Loss plateaus:** Try a lower learning rate (0.0001) or more layers
+- **Residual connection:** Prevents catastrophic divergence (the input is always blended in)
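+
+Checkpointing (`--checkpoint-every` / `--resume`) boils down to saving model
+and optimizer state together; roughly (illustrative, not the exact script code):
+
+```python
+import torch
+
+def save_checkpoint(path, model, optimizer, epoch):
+    torch.save({"epoch": epoch,
+                "model": model.state_dict(),
+                "optimizer": optimizer.state_dict()}, path)
+
+def load_checkpoint(path, model, optimizer):
+    state = torch.load(path)
+    model.load_state_dict(state["model"])
+    optimizer.load_state_dict(state["optimizer"])
+    return state["epoch"] + 1  # first epoch to run after resuming
+```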
 
-### Futuristic Styles
-- **circuit_board**: Cleanest, most technical/blueprint-like
-- **glitch_art**: Most chaotic, digital corruption aesthetic
-- **wireframe_topo**: Most structured, topographic/hologram feel
-- **data_mosaic**: Most geometric, fragmented cells
-- **holographic_scan**: Most retro-futuristic, HUD/CRT display
 
-## Tips
+---
 
-- Images are automatically converted to grayscale before processing
-- All styles work best with high-resolution images (300+ DPI recommended)
-- Processing time varies by style:
-  - Fast: ink_drawing, glitch_art, holographic_scan
-  - Medium: charcoal_pastel, gesture_sketch, circuit_board, wireframe_topo
-  - Slow: pencil_sketch, conte_crayon, data_mosaic (due to intensive computation)
-- For batch processing large collections, consider processing in smaller batches
-- Randomized styles (glitch_art, gesture_sketch, data_mosaic) will produce slightly different results each run
+## Coordinate-Aware Layer 0
 
-## Technical Notes
+Layer 0 receives normalized (x,y) patch center coordinates, enabling position-dependent effects:
 
-### Randomization
-Some styles use randomization for natural variation:
-- **glitch_art**: Random scan line shifts, block positions
-- **gesture_sketch**: Random line wobble, stroke placement
-- **data_mosaic**: Random Voronoi cell centers
-- **circuit_board**: Random pattern placement in dark regions
-- **holographic_scan**: Random glitch line positions
+- **Vignetting:** Darker edges
+- **Radial gradients:** Center-focused stylization
+- **Corner effects:** Edge-specific treatments
 
-### Processing Details
-- **pencil_sketch**: Uses 5-level progressive cross-hatching algorithm
-- **conte_crayon**: Follows Sobel gradients for directional strokes
-- **wireframe_topo**: Generates 20 brightness-based contour levels
-- **data_mosaic**: Creates 200 Voronoi cells via nearest-neighbor algorithm
-- **holographic_scan**: Applies scanline patterns and interference waves
+The training-side coordinate grid is generated automatically during the forward pass; no manual setup is needed.
 
-## License
+Size impact: +32 B of coordinate weights (independent of kernel size).
 
-Free to use and modify for any purpose.
+---
 
-## Version
+## References
 
-Version 1.0 - Complete collection of 10 artistic styles (5 sketch + 5 futuristic)
+- **CNN Effect Documentation:** `doc/CNN_EFFECT.md`
+- **Training Architecture:** See `train_cnn.py` (`CoordConv2d` class)