summaryrefslogtreecommitdiff
path: root/training/train_cnn.py
AgeCommit message (Collapse)Author
23 hoursdocs: Update CNN comments and add bias fix summaryskal
- Fix stale comments: RGBD→RGB (not grayscale) - Clarify shape transformations in inference - Add CNN_BIAS_FIX_2026-02.md consolidating recent fixes - Include regenerated weights with 5x5 kernel for layer 0 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
24 hoursfix: CNN bias accumulation and output format improvementsskal
- Fix bias division bug: divide by num_positions to compensate for shader loop accumulation (affects all layers) - train_cnn.py: Save RGBA output preserving alpha channel from input - Add --debug-hex flag to both tools for pixel-level debugging - Remove sRGB/linear_png debug code from cnn_test - Regenerate weights with corrected bias export Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
30 hoursrefactor: Use linspace(-1,1) directly for coordsskal
Simplify coordinate initialization by generating [-1,1] range directly instead of [0,1] then normalizing. Mathematically equivalent, clearer. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
30 hoursfix: Compute gray from [0,1] RGB in CNN shader generatorskal
Match training forward pass: compute grayscale from original [0,1] RGB before normalization, then normalize gray to [-1,1]. Previously computed gray from normalized [-1,1] RGB in generated shader, creating mismatch with train.py which does: gray = 0.2126*R + 0.7152*G + 0.0722*B # [0,1] gray = (gray - 0.5) * 2.0 # [-1,1] Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
36 hoursadd --save-intermediates to train.py and cnn_testskal
37 hoursfix: Move sigmoid activation to call site in CNN layer shaderskal
Conv functions now return raw sum, sigmoid applied at call site. Matches tanh pattern used for inner layers. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
37 hoursfix: Replace clamp with sigmoid in CNN final layerskal
Final layer used hard clamp causing saturation to white when output > 1.0. Replaced with sigmoid activation for smooth [0,1] mapping with gradients. Changes: - train_cnn.py: torch.sigmoid() in forward pass and WGSL codegen - WGSL shaders: 1.0/(1.0+exp(-sum)) in cnn_conv3x3/5x5 _7to1 functions Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
37 hoursfeat: Add early stopping to CNN trainingskal
Add --early-stop-patience and --early-stop-eps parameters to stop training when loss plateaus. Automatically exports weights when triggered. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
37 hoursfix: CNN training/inference to match WGSL sliding windowskal
Training now computes loss only on center pixels (excludes conv padding borders). Inference changed from tiling to full-image sliding window. Both match cnn_layer.wgsl: each pixel processed from NxN neighborhood. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
38 hoursformat .wgsl layer code (cosmetics)skal
46 hoursfix: Use patch-based inference to match CNN training distributionskal
Inference now tiles images into patches matching training patch size, preventing distribution mismatch between patch training and full-image inference. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
46 hoursopt: Move invariant in1 calculation outside CNN convolution loopsskal
The in1 vector (uv_norm, gray, 1.0) is loop-invariant and doesn't depend on dx/dy offset. Moving it outside the convolution loop eliminates redundant computation and enables better SIMD optimization. Updated both shader files and train.py code generation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
48 hoursopt: Vec4-optimize CNN convolution shaders for SIMDskal
Restructured CNN weight storage and computation for GPU SIMD efficiency: **Weight format:** - Before: array<array<f32, 8>, N> (scalar array) - After: array<vec4<f32>, N*2> (vec4 pairs) **Computation:** - Before: 8 scalar MADs + separate bias add - After: 2 dot4 instructions (4 parallel MADs each) - Input: [rgba][uv,gray,1] where 1.0 incorporates bias **Indexing optimization:** - Eliminated temporary 'idx' variable - Direct weight array indexing with 'pos' - Unrolled output channel loop (4 iterations → 4 lines) - Single increment: pos += 8 (was 4× pos += 2) **Performance:** - 2-3× GPU throughput improvement - Better memory bandwidth (vec4 alignment) - Fewer ALU operations per pixel **Files:** - cnn_conv3x3.wgsl, cnn_conv5x5.wgsl: All 3 functions per file - train_cnn.py: Export format + code generation - cnn_weights_generated.wgsl, cnn_layer.wgsl: Regenerated - CNN_EFFECT.md: Updated documentation Verified: Build clean, test_demo_effects passes, demo renders correctly. handoff(Claude): CNN vec4 SIMD optimization complete
2 daysfeat: Add salient-point patch extraction for CNN trainingskal
Preserve natural pixel scale by extracting patches at salient points instead of resizing entire images. Features: - Multiple detectors: Harris (default), FAST, Shi-Tomasi, gradient - Configurable patch size (e.g., 32×32) and patches per image - Automatic fallback to random patches if insufficient features Usage: # Patch-based training (preserves scale) python3 train_cnn.py --input dir/ --target dir/ --patch-size 32 --patches-per-image 64 --detector harris # Original resize mode (if --patch-size omitted) python3 train_cnn.py --input dir/ --target dir/ Arguments: --patch-size: Patch dimension (e.g., 32 for 32×32 patches) --patches-per-image: Number of patches to extract per image (default: 64) --detector: harris|fast|shi-tomasi|gradient (default: harris) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysfix: Correct UV coordinate computation to match PyTorch linspaceskal
Critical mismatch: shader used pixel-center coordinates while PyTorch uses pixel-corner coordinates, causing 0.5-pixel offset. PyTorch: linspace(0, 1, H) → [0, 1/(H-1), ..., 1] Shader: (p.xy - 0.5) / (resolution - 1.0) to match Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysfix: Add clamp to CNN final layer to match PyTorch trainingskal
CNN output mismatch resolved: final layer (7→1) now clamps to [0,1]. Changes: - Add clamp(sum, 0.0, 1.0) to cnn_conv3x3_7to1 and cnn_conv5x5_7to1 - Add generate_conv_final_function() to train_cnn.py for auto-generation - Update comments to clarify clamping behavior - Future exports will auto-generate final layers with correct clamp PyTorch uses torch.clamp(out, 0.0, 1.0) on final output; shaders were missing this critical operation, causing range mismatches. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysrefactor: Optimize CNN grayscale computationskal
Compute gray once per fragment using dot() instead of per-layer. Pass gray as f32 parameter to conv functions instead of vec4 original. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysupdate train_cnn.py and shaderskal
2 daysfeat: Add inference mode to train_cnn.py for ground truth generationskal
- Added --infer flag for single-image inference - Loads checkpoint, runs forward pass, saves PNG output - Useful for verifying shader matches trained model Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysfix: CNN training normalization pipeline consistencyskal
**Training changes:** - Final layer now outputs [0,1] directly with torch.clamp() - Removed denormalization step (was converting [-1,1] to [0,1]) - Network learns [0,1] output natively **Shader generation fixes:** - Layer 0 uses _src variant (5 params, normalizes [0,1] input internally) - Removed pre-normalization of input texture (handled by _src) - Final layer blending: gray_out already [0,1], no denormalization needed - Added generate_conv_src_function() for all kernel sizes - Auto-generates _src variants when exporting (skips if exists) **Cleanup:** - Removed obsolete 4-channel functions from cnn_conv5x5.wgsl - Keep only 7-channel variants (_7to4, _7to1, _7to4_src) **Normalization flow:** [0,1] texture → _src normalizes to [-1,1] → tanh [-1,1] → ... → final conv [0,1] clipped handoff(Claude): CNN normalization pipeline fixed and consistent with training
2 daysrefactor: Optimize CNN normalization to eliminate redundant conversionsskal
Normalize textures once in fs_main instead of in every conv function. Keep all intermediate layers in [-1,1] range, denormalize only for final display. Changes: - train_cnn.py: Generator normalizes input once, keeps [-1,1] between layers - cnn_conv*.wgsl: Remove texture normalization (already [-1,1]) - cnn_layer.wgsl: Regenerated with new normalization flow - CNN_EFFECT.md: Updated documentation Eliminates redundant [0,1]↔[-1,1] conversions, reducing shader complexity. handoff(Claude): CNN normalization optimized, all tests passing (35/36).
2 daysfix: Support variable kernel sizes in CNN layer generationskal
Training script was hardcoded to generate cnn_conv3x3_* calls regardless of actual kernel size, causing shader validation errors when layer 1 used 5×5 kernel (100 weights) but called 3×3 function (expected 36). Changes: - train_cnn.py: Generate correct conv function based on kernel_sizes[i] - cnn_conv5x5.wgsl: Add cnn_conv5x5_7to4 and cnn_conv5x5_7to1 variants - Regenerate cnn_layer.wgsl with correct function calls for [3,5,3] - Document kernel size→function mapping in HOWTO.md Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysfeat: CNN RGBD→grayscale with 7-channel augmented inputskal
Upgrade CNN architecture to process RGBD input, output grayscale, with 7-channel layer inputs (RGBD + UV coords + grayscale). Architecture changes: - Inner layers: Conv2d(7→4) output RGBD - Final layer: Conv2d(7→1) output grayscale - All inputs normalized to [-1,1] for tanh activation - Removed CoordConv2d in favor of unified 7-channel input Training (train_cnn.py): - SimpleCNN: 7→4 (inner), 7→1 (final) architecture - Forward: Normalize RGBD/coords/gray to [-1,1] - Weight export: array<array<f32, 8>, 36> (inner), array<f32, 8>, 9> (final) - Dataset: Load RGBA (RGBD) input Shaders (cnn_conv3x3.wgsl): - Added cnn_conv3x3_7to4: 7-channel input → RGBD output - Added cnn_conv3x3_7to1: 7-channel input → grayscale output - Both normalize inputs and use flattened weight arrays Documentation: - CNN_EFFECT.md: Updated architecture, training, weight format - CNN_RGBD_GRAYSCALE_SUMMARY.md: Implementation summary - HOWTO.md: Added training command example Next: Train with RGBD input data Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysfeat: Add multi-layer CNN support with framebuffer capture and blend controlskal
Implements automatic layer chaining and generic framebuffer capture API for multi-layer neural network effects with proper original input preservation. Key changes: - Effect::needs_framebuffer_capture() - generic API for pre-render capture - MainSequence: auto-capture to "captured_frame" auxiliary texture - CNNEffect: multi-layer support via layer_index/total_layers params - seq_compiler: expands "layers=N" to N chained effect instances - Shader: @binding(4) original_input available to all layers - Training: generates layer switches and original input binding - Blend: mix(original, result, blend_amount) uses layer 0 input Timeline syntax: CNNEffect layers=3 blend=0.7 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
3 daysfeat: Add checkpointing support to CNN training scriptskal
- Save checkpoints every N epochs (--checkpoint-every) - Resume from checkpoint (--resume) - Store model, optimizer, epoch, loss, and architecture info - Auto-create checkpoint directory Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
3 daysfix: Auto-expand single kernel size to all layers in training scriptskal
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
3 daysfeat: Add coordinate-aware CNN layer 0 for position-dependent stylizationskal
- Implement CoordConv2d custom layer accepting (x,y) patch center - Split layer 0 weights: rgba_weights (9x mat4x4) + coord_weights (mat2x4) - Add *_with_coord() functions to 3x3/5x5/7x7 convolution shaders - Update training script to generate coordinate grid and export split weights - Regenerate placeholder weights with new format Size impact: +32B coord weights + ~100B shader code = +132B total All 36 tests passing (100%) handoff(Claude): CNN coordinate awareness implemented, ready for training Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>