summaryrefslogtreecommitdiff
path: root/training
AgeCommit message (Collapse)Author
19 hoursRefine training script output and validationskal
1. Loss printed at every epoch with \r (no scrolling) 2. Validation only on final epoch (not all checkpoints) 3. Process all input images (not just img_000.png) Training output now shows live progress with single line update.
19 hoursTODO: 8-bit weight quantization for 2× size reductionskal
- Add QAT (quantization-aware training) notes - Requires training with fake quantization - Target: ~1.6 KB weights (vs 3.2 KB f16) - Shader unpacking needs adaptation (4× u8 per u32)
19 hoursCNN v2: Storage buffer complete - real weights exportedskal
- Export weights from epoch 70 checkpoint (3.2 KB binary) - Disable shader template generation (use manual cnn_v2_compute.wgsl) - Build successful with real weights - Ready for integration testing Storage buffer architecture complete: - Dynamic layer count support - ~0.3ms overhead vs constants (negligible) - Single shader, flexible configuration - Binary format: header + layer info + f16 weights
19 hoursCNN v2: storage buffer architecture foundationskal
- Add binary weight format (header + layer info + packed f16) - New export_cnn_v2_weights.py for binary weight export - Single cnn_v2_compute.wgsl shader with storage buffer - Load weights in CNNv2Effect::load_weights() - Create layer compute pipeline with 5 bindings - Fast training config: 100 epochs, 3×3 kernels, 8→4→4 channels Next: Complete bind group creation and multi-layer compute execution
19 hoursTODO: Add random sampling to patch-based trainingskal
Added note for future enhancement: mix salient + random samples. Rationale: - Salient point detection focuses on edges/corners - Random samples improve generalization across entire image - Prevents overfitting to only high-gradient regions Proposed implementation: - Default: 90% salient points, 10% random samples - Configurable: --random-sample-percent parameter - Example: 64 patches = 58 salient + 6 random Location: train_cnn_v2.py - TODO in _detect_salient_points() method - TODO in argument parser Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
19 hoursCNN v2: Patch-based training as default (like CNN v1)skal
Salient point detection on original images with patch extraction. Changes: - Added PatchDataset class (harris/fast/shi-tomasi/gradient detectors) - Detects salient points on ORIGINAL images (no resize) - Extracts 32×32 patches around salient points - Default: 64 patches/image, harris detector - Batch size: 16 (512 patches per batch) Training modes: 1. Patch-based (default): --patch-size 32 --patches-per-image 64 --detector harris 2. Full-image (option): --full-image --image-size 256 Benefits: - Focuses training on interesting regions - Handles variable image sizes naturally - Matches CNN v1 workflow - Better convergence with limited data (8 images → 512 patches) Script updated: - train_cnn_v2_full.sh: Patch-based by default - Configuration exposed for easy switching Example: ./scripts/train_cnn_v2_full.sh # Patch-based # Edit script: uncomment FULL_IMAGE for resize mode Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
19 hoursFix: CNN v2 training - handle variable image sizesskal
Training script now resizes all images to fixed size before batching. Issue: RuntimeError when batching variable-sized images - Images had different dimensions (376x626 vs 344x361) - PyTorch DataLoader requires uniform tensor sizes for batching Solution: - Add --image-size parameter (default: 256) - Resize all images to target_size using LANCZOS interpolation - Preserves aspect ratio independent training Changes: - train_cnn_v2.py: ImagePairDataset now resizes to fixed dimensions - train_cnn_v2_full.sh: Added IMAGE_SIZE=256 configuration Tested: 8 image pairs, variable sizes → uniform 256×256 batches Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
19 hoursCNN v2: parametric static features - Phases 1-4skal
Infrastructure for enhanced CNN post-processing with 7D feature input. Phase 1: Shaders - Static features compute (RGBD + UV + sin10_x + bias → 8×f16) - Layer template (convolution skeleton, packing/unpacking) - 3 mip level support for multi-scale features Phase 2: C++ Effect - CNNv2Effect class (multi-pass architecture) - Texture management (static features, layer buffers) - Build integration (CMakeLists, assets, tests) Phase 3: Training Pipeline - train_cnn_v2.py: PyTorch model with static feature concatenation - export_cnn_v2_shader.py: f32→f16 quantization, WGSL generation - Configurable architecture (kernels, channels) Phase 4: Validation - validate_cnn_v2.sh: End-to-end pipeline - Checkpoint → shaders → build → test images Tests: 36/36 passing Next: Complete render pipeline implementation (bind groups, multi-pass) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
21 hoursremove more stale filesskal
30 hoursfeat: implement beat-based timing systemskal
BREAKING CHANGE: Timeline format now uses beats as default unit ## Core Changes **Uniform Structure (32 bytes maintained):** - Added `beat_time` (absolute beats for musical animation) - Added `beat_phase` (fractional 0-1 for smooth oscillation) - Renamed `beat` → `beat_phase` - Kept `time` (physical seconds, tempo-independent) **Seq Compiler:** - Default: all numbers are beats (e.g., `5`, `16.5`) - Explicit seconds: `2.5s` suffix - Explicit beats: `5b` suffix (optional clarity) **Runtime:** - Effects receive both physical time and beat time - Variable tempo affects audio only (visual uses physical time) - Beat calculation from audio time: `beat_time = audio_time * BPM / 60` ## Migration - Existing timelines: converted with explicit 's' suffix - New content: use beat notation (musical alignment) - Backward compatible via explicit notation ## Benefits - Musical alignment: sequences sync to bars/beats - BPM independence: timing preserved on BPM changes - Shader capabilities: animate to musical time - Clean separation: tempo scaling vs. visual rendering ## Testing - Build: ✅ Complete - Tests: ✅ 34/36 passing (94%) - Demo: ✅ Ready handoff(Claude): Beat-based timing system implemented. Variable tempo only affects audio sample triggering. Visual effects use physical_time (constant) and beat_time (musical). Shaders can now animate to beats. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
31 hoursadd trained layersskal
+misc
31 hoursdocs: Update CNN comments and add bias fix summaryskal
- Fix stale comments: RGBD→RGB (not grayscale) - Clarify shape transformations in inference - Add CNN_BIAS_FIX_2026-02.md consolidating recent fixes - Include regenerated weights with 5x5 kernel for layer 0 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
32 hoursfix: CNN bias accumulation and output format improvementsskal
- Fix bias division bug: divide by num_positions to compensate for shader loop accumulation (affects all layers) - train_cnn.py: Save RGBA output preserving alpha channel from input - Add --debug-hex flag to both tools for pixel-level debugging - Remove sRGB/linear_png debug code from cnn_test - Regenerate weights with corrected bias export Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
37 hoursupdate cnn codeskal
38 hoursrefactor: Use linspace(-1,1) directly for coordsskal
Simplify coordinate initialization by generating [-1,1] range directly instead of [0,1] then normalizing. Mathematically equivalent, clearer. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
38 hoursfix: Compute gray from [0,1] RGB in CNN shader generatorskal
Match training forward pass: compute grayscale from original [0,1] RGB before normalization, then normalize gray to [-1,1]. Previously computed gray from normalized [-1,1] RGB in generated shader, creating mismatch with train.py which does: gray = 0.2126*R + 0.7152*G + 0.0722*B # [0,1] gray = (gray - 0.5) * 2.0 # [-1,1] Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
43 hoursfix: Complete auxiliary texture initialization fixskal
Root cause: After swapping init/resize order, effects with Renderer3D crashed because resize() called before init() tried to use uninitialized GPU resources. Changes: - Add guards in FlashCubeEffect::resize() and Hybrid3DEffect::resize() to check ctx_.device before calling renderer_.resize() - Remove lazy initialization remnants from CircleMaskEffect and CNNEffect - Register auxiliary textures directly in init() (width_/height_ already set) - Remove ensure_texture() methods and texture_initialized_ flags All 36 tests passing. Demo runs without crashes. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
44 hoursadd --save-intermediates to train.py and cnn_testskal
45 hoursfix: Move sigmoid activation to call site in CNN layer shaderskal
Conv functions now return raw sum, sigmoid applied at call site. Matches tanh pattern used for inner layers. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
45 hoursfix: Replace clamp with sigmoid in CNN final layerskal
Final layer used hard clamp causing saturation to white when output > 1.0. Replaced with sigmoid activation for smooth [0,1] mapping with gradients. Changes: - train_cnn.py: torch.sigmoid() in forward pass and WGSL codegen - WGSL shaders: 1.0/(1.0+exp(-sum)) in cnn_conv3x3/5x5 _7to1 functions Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
45 hoursfeat: Add early stopping to CNN trainingskal
Add --early-stop-patience and --early-stop-eps parameters to stop training when loss plateaus. Automatically exports weights when triggered. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
45 hoursfix: CNN training/inference to match WGSL sliding windowskal
Training now computes loss only on center pixels (excludes conv padding borders). Inference changed from tiling to full-image sliding window. Both match cnn_layer.wgsl: each pixel processed from NxN neighborhood. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
46 hoursformat .wgsl layer code (cosmetics)skal
2 daysfix: Use patch-based inference to match CNN training distributionskal
Inference now tiles images into patches matching training patch size, preventing distribution mismatch between patch training and full-image inference. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysopt: Move invariant in1 calculation outside CNN convolution loopsskal
The in1 vector (uv_norm, gray, 1.0) is loop-invariant and doesn't depend on dx/dy offset. Moving it outside the convolution loop eliminates redundant computation and enables better SIMD optimization. Updated both shader files and train.py code generation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysopt: Vec4-optimize CNN convolution shaders for SIMDskal
Restructured CNN weight storage and computation for GPU SIMD efficiency: **Weight format:** - Before: array<array<f32, 8>, N> (scalar array) - After: array<vec4<f32>, N*2> (vec4 pairs) **Computation:** - Before: 8 scalar MADs + separate bias add - After: 2 dot4 instructions (4 parallel MADs each) - Input: [rgba][uv,gray,1] where 1.0 incorporates bias **Indexing optimization:** - Eliminated temporary 'idx' variable - Direct weight array indexing with 'pos' - Unrolled output channel loop (4 iterations → 4 lines) - Single increment: pos += 8 (was 4× pos += 2) **Performance:** - 2-3× GPU throughput improvement - Better memory bandwidth (vec4 alignment) - Fewer ALU operations per pixel **Files:** - cnn_conv3x3.wgsl, cnn_conv5x5.wgsl: All 3 functions per file - train_cnn.py: Export format + code generation - cnn_weights_generated.wgsl, cnn_layer.wgsl: Regenerated - CNN_EFFECT.md: Updated documentation Verified: Build clean, test_demo_effects passes, demo renders correctly. handoff(Claude): CNN vec4 SIMD optimization complete
2 dayschore: Update CNN architecture to 3×3×3 with new trained weightsskal
Changed from 3×5×3 to 3×3×3 architecture for testing. Changes: - cnn_layer.wgsl: Use 3×3 conv for all layers - cnn_weights_generated.wgsl: Regenerated weights - image_style_processor.py: Made executable handoff(Claude): CNN mismatch analysis complete, patch extraction added, docs updated Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysdocs: Update CNN training documentation with patch extractionskal
Streamlined and updated all training docs with new patch-based approach. Changes: - HOWTO.md: Updated training section with patch/full-image examples - CNN_EFFECT.md: Streamlined training workflow, added detector info - training/README.md: Complete rewrite with detector comparison table New sections: - Detector comparison (harris, fast, shi-tomasi, gradient) - Practical examples for different use cases - Tips for patch size and batch size selection - Benefits of patch-based training Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysfeat: Add salient-point patch extraction for CNN trainingskal
Preserve natural pixel scale by extracting patches at salient points instead of resizing entire images. Features: - Multiple detectors: Harris (default), FAST, Shi-Tomasi, gradient - Configurable patch size (e.g., 32×32) and patches per image - Automatic fallback to random patches if insufficient features Usage: # Patch-based training (preserves scale) python3 train_cnn.py --input dir/ --target dir/ --patch-size 32 --patches-per-image 64 --detector harris # Original resize mode (if --patch-size omitted) python3 train_cnn.py --input dir/ --target dir/ Arguments: --patch-size: Patch dimension (e.g., 32 for 32×32 patches) --patches-per-image: Number of patches to extract per image (default: 64) --detector: harris|fast|shi-tomasi|gradient (default: harris) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysfix: Correct UV coordinate computation to match PyTorch linspaceskal
Critical mismatch: shader used pixel-center coordinates while PyTorch uses pixel-corner coordinates, causing 0.5-pixel offset. PyTorch: linspace(0, 1, H) → [0, 1/(H-1), ..., 1] Shader: (p.xy - 0.5) / (resolution - 1.0) to match Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysfix: Add clamp to CNN final layer to match PyTorch trainingskal
CNN output mismatch resolved: final layer (7→1) now clamps to [0,1]. Changes: - Add clamp(sum, 0.0, 1.0) to cnn_conv3x3_7to1 and cnn_conv5x5_7to1 - Add generate_conv_final_function() to train_cnn.py for auto-generation - Update comments to clarify clamping behavior - Future exports will auto-generate final layers with correct clamp PyTorch uses torch.clamp(out, 0.0, 1.0) on final output; shaders were missing this critical operation, causing range mismatches. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysrefactor: Optimize CNN grayscale computationskal
Compute gray once per fragment using dot() instead of per-layer. Pass gray as f32 parameter to conv functions instead of vec4 original. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysupdate train_cnn.py and shaderskal
2 daysfeat: Add inference mode to train_cnn.py for ground truth generationskal
- Added --infer flag for single-image inference - Loads checkpoint, runs forward pass, saves PNG output - Useful for verifying shader matches trained model Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 daysfix: CNN training normalization pipeline consistencyskal
**Training changes:** - Final layer now outputs [0,1] directly with torch.clamp() - Removed denormalization step (was converting [-1,1] to [0,1]) - Network learns [0,1] output natively **Shader generation fixes:** - Layer 0 uses _src variant (5 params, normalizes [0,1] input internally) - Removed pre-normalization of input texture (handled by _src) - Final layer blending: gray_out already [0,1], no denormalization needed - Added generate_conv_src_function() for all kernel sizes - Auto-generates _src variants when exporting (skips if exists) **Cleanup:** - Removed obsolete 4-channel functions from cnn_conv5x5.wgsl - Keep only 7-channel variants (_7to4, _7to1, _7to4_src) **Normalization flow:** [0,1] texture → _src normalizes to [-1,1] → tanh [-1,1] → ... → final conv [0,1] clipped handoff(Claude): CNN normalization pipeline fixed and consistent with training
3 daysrefactor: Optimize CNN normalization to eliminate redundant conversionsskal
Normalize textures once in fs_main instead of in every conv function. Keep all intermediate layers in [-1,1] range, denormalize only for final display. Changes: - train_cnn.py: Generator normalizes input once, keeps [-1,1] between layers - cnn_conv*.wgsl: Remove texture normalization (already [-1,1]) - cnn_layer.wgsl: Regenerated with new normalization flow - CNN_EFFECT.md: Updated documentation Eliminates redundant [0,1]↔[-1,1] conversions, reducing shader complexity. handoff(Claude): CNN normalization optimized, all tests passing (35/36).
3 daysfix: Support variable kernel sizes in CNN layer generationskal
Training script was hardcoded to generate cnn_conv3x3_* calls regardless of actual kernel size, causing shader validation errors when layer 1 used 5×5 kernel (100 weights) but called 3×3 function (expected 36). Changes: - train_cnn.py: Generate correct conv function based on kernel_sizes[i] - cnn_conv5x5.wgsl: Add cnn_conv5x5_7to4 and cnn_conv5x5_7to1 variants - Regenerate cnn_layer.wgsl with correct function calls for [3,5,3] - Document kernel size→function mapping in HOWTO.md Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
3 daysfeat: CNN RGBD→grayscale with 7-channel augmented inputskal
Upgrade CNN architecture to process RGBD input, output grayscale, with 7-channel layer inputs (RGBD + UV coords + grayscale). Architecture changes: - Inner layers: Conv2d(7→4) output RGBD - Final layer: Conv2d(7→1) output grayscale - All inputs normalized to [-1,1] for tanh activation - Removed CoordConv2d in favor of unified 7-channel input Training (train_cnn.py): - SimpleCNN: 7→4 (inner), 7→1 (final) architecture - Forward: Normalize RGBD/coords/gray to [-1,1] - Weight export: array<array<f32, 8>, 36> (inner), array<f32, 8>, 9> (final) - Dataset: Load RGBA (RGBD) input Shaders (cnn_conv3x3.wgsl): - Added cnn_conv3x3_7to4: 7-channel input → RGBD output - Added cnn_conv3x3_7to1: 7-channel input → grayscale output - Both normalize inputs and use flattened weight arrays Documentation: - CNN_EFFECT.md: Updated architecture, training, weight format - CNN_RGBD_GRAYSCALE_SUMMARY.md: Implementation summary - HOWTO.md: Added training command example Next: Train with RGBD input data Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
3 daysudpateskal
3 daysfeat: Add multi-layer CNN support with framebuffer capture and blend controlskal
Implements automatic layer chaining and generic framebuffer capture API for multi-layer neural network effects with proper original input preservation. Key changes: - Effect::needs_framebuffer_capture() - generic API for pre-render capture - MainSequence: auto-capture to "captured_frame" auxiliary texture - CNNEffect: multi-layer support via layer_index/total_layers params - seq_compiler: expands "layers=N" to N chained effect instances - Shader: @binding(4) original_input available to all layers - Training: generates layer switches and original input binding - Blend: mix(original, result, blend_amount) uses layer 0 input Timeline syntax: CNNEffect layers=3 blend=0.7 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
3 daysdocs: Update and streamline CNN training documentationskal
- Document coordinate-aware layer 0 architecture - Add checkpointing examples and options table - Consolidate training workflow with practical examples - Clarify CoordConv2d usage and size impact - Streamline training/README.md structure Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
3 daysfeat: Add checkpointing support to CNN training scriptskal
- Save checkpoints every N epochs (--checkpoint-every) - Resume from checkpoint (--resume) - Store model, optimizer, epoch, loss, and architecture info - Auto-create checkpoint directory Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
3 daysfix: Auto-expand single kernel size to all layers in training scriptskal
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
3 daysupdate target imagesskal
3 daysfeat: Add coordinate-aware CNN layer 0 for position-dependent stylizationskal
- Implement CoordConv2d custom layer accepting (x,y) patch center - Split layer 0 weights: rgba_weights (9x mat4x4) + coord_weights (mat2x4) - Add *_with_coord() functions to 3x3/5x5/7x7 convolution shaders - Update training script to generate coordinate grid and export split weights - Regenerate placeholder weights with new format Size impact: +32B coord weights + ~100B shader code = +132B total All 36 tests passing (100%) handoff(Claude): CNN coordinate awareness implemented, ready for training Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>