path: root/training/train_cnn_v2.py
Age | Commit message | Author
16 hours | CNN v2: Change feature #6 from sin(10*x) to sin(20*y) | skal
Update positional encoding to use the vertical coordinate at a higher frequency.

Changes:
- train_cnn_v2.py: sin10_x → sin20_y (computed from uv_y)
- cnn_v2_static.wgsl: sin10_x → sin20_y (computed from uv_y)
- index.html: sin10_x → sin20_y (STATIC_SHADER)
- CNN_V2.md: Update feature descriptions and examples
- CNN_V2_BINARY_FORMAT.md: Update static features documentation

Feature vector: [p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias]

Rationale: Higher frequency (20 vs 10) + vertical axis provides better spatial discrimination for position encoding.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
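A minimal NumPy sketch of the positional part of the new static feature vector (the parametric p0-p3 channels are omitted and the helper name is illustrative, not the one in train_cnn_v2.py):

    import numpy as np

    def compute_positional_features(height, width):
        # Positional half of [p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias].
        ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
        uv_x = xs / max(width - 1, 1)    # normalized horizontal coordinate
        uv_y = ys / max(height - 1, 1)   # normalized vertical coordinate
        sin20_y = np.sin(20.0 * uv_y)    # new feature #6 (was sin(10 * uv_x))
        bias = np.ones_like(uv_x)        # constant bias channel
        return np.stack([uv_x, uv_y, sin20_y, bias], axis=0)  # 4 x H x W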
16 hours | CNN v2: Add TODO for flexible feature layout in binary format v3 | skal
Document a future enhancement for arbitrary feature vector layouts.

Proposed feature descriptor in binary format v3:
- Specify feature types, sources, and ordering
- Enable runtime experimentation without shader recompilation
- Examples: [R,G,B,dx,dy,uv_x,bias] or [mip1.r,mip2.g,laplacian,uv_x,sin20_x,bias]

Added TODOs in:
- CNN_V2_BINARY_FORMAT.md: Detailed proposal with struct layout
- CNN_V2.md: Future extensions section
- train_cnn_v2.py: compute_static_features() docstring
- cnn_v2_static.wgsl: Shader header comment
- cnn_v2_effect.cc: Version check comment

Current limitation: hardcoded [p0,p1,p2,p3,uv_x,uv_y,sin10_x,bias] layout.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
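A hypothetical sketch of what such a v3 feature descriptor could look like on the Python side (all names and fields are illustrative; the format itself is still a TODO):

    from dataclasses import dataclass
    from enum import IntEnum

    class FeatureSource(IntEnum):
        INPUT_CHANNEL = 0   # R, G, B or D of the input texture
        MIP_CHANNEL = 1     # channel sampled from a given mip level
        GRADIENT = 2        # dx, dy, laplacian
        COORDINATE = 3      # uv_x, uv_y
        SINUSOID = 4        # sin(freq * coordinate)
        CONSTANT = 5        # bias

    @dataclass
    class FeatureDescriptor:
        source: FeatureSource
        channel: int = 0    # which channel / coordinate axis
        param: float = 0.0  # e.g. mip level or sinusoid frequency

    # One of the example layouts from the TODO: [R, G, B, dx, dy, uv_x, bias]
    layout = [
        FeatureDescriptor(FeatureSource.INPUT_CHANNEL, channel=0),
        FeatureDescriptor(FeatureSource.INPUT_CHANNEL, channel=1),
        FeatureDescriptor(FeatureSource.INPUT_CHANNEL, channel=2),
        FeatureDescriptor(FeatureSource.GRADIENT, channel=0),
        FeatureDescriptor(FeatureSource.GRADIENT, channel=1),
        FeatureDescriptor(FeatureSource.COORDINATE, channel=0),
        FeatureDescriptor(FeatureSource.CONSTANT),
    ]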
16 hours | CNN v2: Add --mip-level option for parametric features | skal
Add mip level control for the p0-p3 features (0=original, 1=half, 2=quarter, 3=eighth). Uses pyrDown/pyrUp for proper Gaussian filtering during mip generation.

Changes:
- compute_static_features(): Accept a mip_level param, generate the mip via the cv2 pyramid
- PatchDataset/ImagePairDataset: Pass mip_level to feature computation
- CLI: Add --mip-level arg with choices [0,1,2,3]
- Save mip_level in the checkpoint config for tracking
- Doc updates: HOWTO.md and CNN_V2.md

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
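A minimal sketch of the cv2 pyramid step described above (the helper name is illustrative, not the actual function in train_cnn_v2.py):

    import cv2

    def make_mip(image, mip_level):
        # pyrDown applies Gaussian filtering + 2x downsample per step:
        # 0=original, 1=half, 2=quarter, 3=eighth.
        mip = image
        for _ in range(mip_level):
            mip = cv2.pyrDown(mip)
        # pyrUp back to full resolution so p0-p3 stay pixel-aligned with the input.
        for _ in range(mip_level):
            mip = cv2.pyrUp(mip)
        h, w = image.shape[:2]
        if mip.shape[:2] != (h, w):
            # pyrDown/pyrUp round odd dimensions, so snap back to the exact size
            mip = cv2.resize(mip, (w, h), interpolation=cv2.INTER_LINEAR)
        return mip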
20 hours | CNN v2 training: Use target image alpha channel | skal
Changed target loading from RGB to RGBA to preserve transparency. The model learns to predict the alpha channel from the target image instead of constant 1.0 padding.

Before: target padded with alpha=1.0
After: target uses the actual alpha from the image (or 1.0 if the image has no alpha)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
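A minimal Pillow sketch of the RGBA target loading (the helper name is illustrative); convert('RGBA') fills alpha with 255 for images without an alpha channel, which matches the old constant-1.0 behavior:

    import numpy as np
    from PIL import Image

    def load_target_rgba(path):
        img = Image.open(path).convert("RGBA")              # keeps a real alpha channel if present
        return np.asarray(img, dtype=np.float32) / 255.0    # H x W x 4 in [0, 1]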
20 hours | CNN v2: Restore per-layer kernel sizes support | skal
Training:
- train_cnn_v2.py: Accept --kernel-sizes as a comma-separated list
- CNNv2 model: Per-layer kernel sizes (e.g., [1,3,5])
- A single value replicates across layers (e.g., "3" → [3,3,3])

Export:
- export_cnn_v2_weights.py: Backward compatible with old checkpoints
- Handles both the kernel_size (old) and kernel_sizes (new) formats

Documentation:
- CNN_V2.md: Updated code examples and config format
- HOWTO.md: Updated training examples to show the comma-separated syntax

Binary format: already supports per-layer kernel sizes (no changes)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
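A sketch of the comma-separated parsing and single-value replication (the helper name is illustrative):

    def parse_kernel_sizes(arg, num_layers):
        # "1,3,5" -> [1, 3, 5]; a single value such as "3" replicates to [3, 3, 3]
        sizes = [int(s) for s in arg.split(",")]
        if len(sizes) == 1:
            sizes = sizes * num_layers
        if len(sizes) != num_layers:
            raise ValueError(f"expected 1 or {num_layers} kernel sizes, got {len(sizes)}")
        return sizes

    # parse_kernel_sizes("3", 3) -> [3, 3, 3]; parse_kernel_sizes("1,3,5", 3) -> [1, 3, 5]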
20 hours | CNN v2: Refactor to uniform 12D→4D architecture | skal
**Architecture changes:**
- Static features (8D): p0-p3 (parametric) + uv_x, uv_y, sin(10×uv_x), bias
- Input RGBD (4D): fed separately to all layers
- All layers: uniform 12D→4D (4 prev/input + 8 static → 4 output)
- Bias integrated in the static features (bias=False in PyTorch)

**Weight calculations:**
- 3 layers × (12 × 3×3 × 4) = 1296 weights
- f16: 2.6 KB (vs old variable arch: ~6.4 KB)

**Updated files:**

*Training (Python):*
- train_cnn_v2.py: Uniform model, takes input_rgbd + static_features
- export_cnn_v2_weights.py: Binary export for storage buffers
- export_cnn_v2_shader.py: Per-layer shader export (debugging)

*Shaders (WGSL):*
- cnn_v2_static.wgsl: p0-p3 parametric features (mips/gradients)
- cnn_v2_compute.wgsl: 12D input, 4D output, vec4 packing

*Tools:*
- HTML tool (cnn_v2_test): Updated for 12D→4D, layer visualization

*Docs:*
- CNN_V2.md: Updated architecture, training, validation sections
- HOWTO.md: Reference the HTML tool for validation

*Removed:*
- validate_cnn_v2.sh: Obsolete (used the CNN v1 tool)

All code is consistent with bias=False (the bias is carried in the static features as a constant 1.0).

handoff(Claude): CNN v2 architecture finalized and documented
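An illustrative PyTorch sketch of the uniform 12D→4D model (not the exact CNNv2 class from train_cnn_v2.py; the hidden activation in particular is an assumption):

    import torch
    import torch.nn as nn

    class CNNv2Sketch(nn.Module):
        def __init__(self, num_layers=3, kernel_size=3):
            super().__init__()
            # Every layer: 4 (prev output, or input RGBD for layer 0) + 8 static -> 4,
            # with bias=False because the bias is carried as a static feature.
            self.layers = nn.ModuleList([
                nn.Conv2d(12, 4, kernel_size, padding=kernel_size // 2, bias=False)
                for _ in range(num_layers)
            ])
            # Weight count check: 3 layers * (12 * 3*3 * 4) = 1296 weights.

        def forward(self, input_rgbd, static_features):
            x = input_rgbd                                          # N x 4 x H x W
            for i, layer in enumerate(self.layers):
                x = layer(torch.cat([x, static_features], dim=1))   # 12 channels in
                if i + 1 < len(self.layers):
                    x = torch.relu(x)                               # hidden activation (assumed)
            return x                                                # N x 4 x H x W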
45 hours | Refine training script output and validation | skal
1. Loss printed at every epoch with \r (no scrolling)
2. Validation only on the final epoch (not at all checkpoints)
3. Process all input images (not just img_000.png)

Training output now shows live progress with a single-line update.
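A self-contained sketch of the single-line progress output (the training call is a stub standing in for the real loop):

    import random
    import sys

    def train_one_epoch():
        return random.random()   # stand-in for the real per-epoch training loop

    num_epochs = 200
    for epoch in range(num_epochs):
        loss = train_one_epoch()
        # \r rewrites the same console line, so the loss does not scroll
        sys.stdout.write(f"\repoch {epoch + 1}/{num_epochs}  loss {loss:.6f}")
        sys.stdout.flush()
    print()   # newline after the last epoch; validation runs only at this point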
45 hours | TODO: 8-bit weight quantization for 2× size reduction | skal
- Add QAT (quantization-aware training) notes - Requires training with fake quantization - Target: ~1.6 KB weights (vs 3.2 KB f16) - Shader unpacking needs adaptation (4× u8 per u32)
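A sketch of the packing side of this TODO, assuming NumPy and a simple per-tensor scale (QAT itself is not shown, and nothing here exists in the repo yet):

    import numpy as np

    def quantize_and_pack_u8(weights):
        # Per-tensor scale; values are stored with a +128 offset, so u8 covers
        # the range [-128 * scale, 127 * scale].
        max_abs = float(np.abs(weights).max())
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q = np.clip(np.round(weights / scale) + 128, 0, 255).astype(np.uint8).reshape(-1)
        if q.size % 4:
            q = np.pad(q, (0, 4 - q.size % 4))   # pad to a multiple of 4 bytes
        packed = q.view(np.uint32)               # 4x u8 per u32 (little-endian)
        return packed, scale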
45 hours | TODO: Add random sampling to patch-based training | skal
Added a note for a future enhancement: mix salient + random samples.

Rationale:
- Salient point detection focuses on edges/corners
- Random samples improve generalization across the entire image
- Prevents overfitting to only high-gradient regions

Proposed implementation (see the sketch after this entry):
- Default: 90% salient points, 10% random samples
- Configurable: --random-sample-percent parameter
- Example: 64 patches = 58 salient + 6 random

Location: train_cnn_v2.py
- TODO in the _detect_salient_points() method
- TODO in the argument parser

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
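A sketch of the proposed mix, assuming NumPy and (x, y) salient points; the helper name and signature are illustrative, not the planned CLI:

    import numpy as np

    def mix_sample_points(salient_points, image_shape, patches_per_image=64,
                          random_percent=10, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        n_random = round(patches_per_image * random_percent / 100)   # 6 for the 64-patch default
        n_salient = patches_per_image - n_random                     # 58
        h, w = image_shape[:2]
        random_points = np.stack(
            [rng.integers(0, w, n_random), rng.integers(0, h, n_random)], axis=1)
        return np.concatenate([salient_points[:n_salient], random_points], axis=0)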
45 hours | CNN v2: Patch-based training as default (like CNN v1) | skal
Salient point detection on the original images with patch extraction.

Changes:
- Added PatchDataset class (harris/fast/shi-tomasi/gradient detectors)
- Detects salient points on the ORIGINAL images (no resize)
- Extracts 32×32 patches around the salient points
- Default: 64 patches/image, harris detector
- Batch size: 16 (512 patches per batch)

Training modes:
1. Patch-based (default): --patch-size 32 --patches-per-image 64 --detector harris
2. Full-image (option): --full-image --image-size 256

Benefits:
- Focuses training on interesting regions
- Handles variable image sizes naturally
- Matches the CNN v1 workflow
- Better convergence with limited data (8 images → 512 patches)

Script updated:
- train_cnn_v2_full.sh: Patch-based by default
- Configuration exposed for easy switching

Example:
  ./scripts/train_cnn_v2_full.sh   # Patch-based
  # Edit the script: uncomment FULL_IMAGE for resize mode

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
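A minimal OpenCV sketch of the Harris-based default path (PatchDataset itself also supports the fast/shi-tomasi/gradient detectors; the helper name and thresholds are illustrative):

    import cv2
    import numpy as np

    def extract_salient_patches(image_bgr, patch_size=32, patches_per_image=64):
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        corners = cv2.goodFeaturesToTrack(
            gray, maxCorners=patches_per_image, qualityLevel=0.01,
            minDistance=patch_size, useHarrisDetector=True)   # thresholds are illustrative
        if corners is None:
            return []
        half = patch_size // 2
        patches = []
        for x, y in corners.reshape(-1, 2).astype(int):
            # clamp so the 32x32 window stays inside the original-size image
            x = int(np.clip(x, half, image_bgr.shape[1] - half))
            y = int(np.clip(y, half, image_bgr.shape[0] - half))
            patches.append(image_bgr[y - half:y + half, x - half:x + half])
        return patches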
45 hours | Fix: CNN v2 training - handle variable image sizes | skal
The training script now resizes all images to a fixed size before batching.

Issue: RuntimeError when batching variable-sized images
- Images had different dimensions (376x626 vs 344x361)
- The PyTorch DataLoader requires uniform tensor sizes for batching

Solution:
- Add an --image-size parameter (default: 256)
- Resize all images to target_size using LANCZOS interpolation
- Training becomes independent of the original aspect ratios

Changes:
- train_cnn_v2.py: ImagePairDataset now resizes to fixed dimensions
- train_cnn_v2_full.sh: Added IMAGE_SIZE=256 configuration

Tested: 8 image pairs, variable sizes → uniform 256×256 batches

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
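A minimal Pillow sketch of the fix (the helper name is illustrative):

    from PIL import Image

    def load_resized_pair(input_path, target_path, image_size=256):
        size = (image_size, image_size)
        # LANCZOS resampling, so every pair batches as uniform image_size x image_size tensors
        inp = Image.open(input_path).resize(size, Image.LANCZOS)
        tgt = Image.open(target_path).resize(size, Image.LANCZOS)
        return inp, tgt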
45 hours | CNN v2: parametric static features - Phases 1-4 | skal
Infrastructure for enhanced CNN post-processing with a 7D feature input.

Phase 1: Shaders
- Static features compute (RGBD + UV + sin10_x + bias → 8×f16)
- Layer template (convolution skeleton, packing/unpacking)
- 3 mip level support for multi-scale features

Phase 2: C++ Effect
- CNNv2Effect class (multi-pass architecture)
- Texture management (static features, layer buffers)
- Build integration (CMakeLists, assets, tests)

Phase 3: Training Pipeline
- train_cnn_v2.py: PyTorch model with static feature concatenation
- export_cnn_v2_shader.py: f32→f16 quantization, WGSL generation
- Configurable architecture (kernels, channels)

Phase 4: Validation
- validate_cnn_v2.sh: End-to-end pipeline
- Checkpoint → shaders → build → test images

Tests: 36/36 passing

Next: Complete the render pipeline implementation (bind groups, multi-pass)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
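A sketch of the f32→f16 quantization step in the export stage, assuming a PyTorch state dict (the WGSL generation part of export_cnn_v2_shader.py is not shown):

    import numpy as np

    def quantize_weights_f16(state_dict):
        # Cast every checkpoint tensor to half precision for embedding in the shader.
        return {name: t.detach().cpu().numpy().astype(np.float16)
                for name, t in state_dict.items()}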