|
Update positional encoding to use vertical coordinate at higher frequency.
Changes:
- train_cnn_v2.py: sin10_x → sin20_y (computed from uv_y)
- cnn_v2_static.wgsl: sin10_x → sin20_y (computed from uv_y)
- index.html: sin10_x → sin20_y (STATIC_SHADER)
- CNN_V2.md: Update feature descriptions and examples
- CNN_V2_BINARY_FORMAT.md: Update static features documentation
Feature vector: [p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias]
Rationale: Higher frequency (20 vs 10) + vertical axis provides better
spatial discrimination for position encoding.
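A minimal sketch of how the updated positional features could be computed on the training side (illustrative names, not the exact train_cnn_v2.py code):

```python
import numpy as np

def static_position_features(height, width):
    """Positional part of the static feature vector: uv_x, uv_y, sin20_y, bias."""
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
    uv_x = xs / max(width - 1, 1)      # normalized horizontal coordinate
    uv_y = ys / max(height - 1, 1)     # normalized vertical coordinate
    sin20_y = np.sin(20.0 * uv_y)      # higher-frequency vertical encoding
    bias = np.ones_like(uv_x)          # constant 1.0 channel (bias term)
    return np.stack([uv_x, uv_y, sin20_y, bias], axis=0)
```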
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Document future enhancement for arbitrary feature vector layouts.
Proposed feature descriptor in binary format v3:
- Specify feature types, sources, and ordering
- Enable runtime experimentation without shader recompilation
- Examples: [R,G,B,dx,dy,uv_x,bias] or [mip1.r,mip2.g,laplacian,uv_x,sin20_x,bias]
Added TODOs in:
- CNN_V2_BINARY_FORMAT.md: Detailed proposal with struct layout
- CNN_V2.md: Future extensions section
- train_cnn_v2.py: compute_static_features() docstring
- cnn_v2_static.wgsl: Shader header comment
- cnn_v2_effect.cc: Version check comment
Current limitation: Hardcoded [p0,p1,p2,p3,uv_x,uv_y,sin10_x,bias] layout.
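As an illustration only, one possible shape for such a descriptor table; the type codes, field names, and packing below are hypothetical and are not the proposal in CNN_V2_BINARY_FORMAT.md:

```python
import struct

# Hypothetical per-feature descriptor: (type, source, param), e.g.
# type 0 = mip color channel, 1 = gradient, 2 = uv, 3 = sin(k*uv), 4 = bias.
FEATURES = [
    (0, 0, 0),   # mip0.r
    (0, 0, 1),   # mip0.g
    (2, 0, 0),   # uv_x
    (3, 1, 20),  # sin(20 * uv_y)
    (4, 0, 0),   # bias = 1.0
]

def pack_feature_table(features):
    """Serialize the descriptor table as little-endian u8 triples with a count prefix."""
    blob = struct.pack("<B", len(features))
    for ftype, source, param in features:
        blob += struct.pack("<BBB", ftype, source, param)
    return blob
```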
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Add mip level control for p0-p3 features (0=original, 1=half, 2=quarter, 3=eighth).
Uses pyrDown/pyrUp for proper Gaussian filtering during mip generation.
Changes:
- compute_static_features(): Accept mip_level param, generate mip via cv2 pyramid
- PatchDataset/ImagePairDataset: Pass mip_level to feature computation
- CLI: Add --mip-level arg with choices [0,1,2,3]
- Save mip_level in checkpoint config for tracking
- Doc updates: HOWTO.md and CNN_V2.md
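Roughly how the mip generation could look with OpenCV pyramids (a sketch, not the exact compute_static_features() code):

```python
import cv2

def to_mip_level(img, mip_level):
    """Downsample with Gaussian pyramids, then upsample back to the original size.

    mip_level 0 returns the image unchanged; 1/2/3 give half/quarter/eighth
    resolution content at full resolution.
    """
    out = img
    for _ in range(mip_level):
        out = cv2.pyrDown(out)
    for _ in range(mip_level):
        out = cv2.pyrUp(out)
    # Pyramid round-trips can be off by a pixel; resize to match exactly.
    return cv2.resize(out, (img.shape[1], img.shape[0]), interpolation=cv2.INTER_LINEAR)
```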
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Changed target loading from RGB to RGBA to preserve transparency.
Model learns to predict alpha channel from target image instead of
constant 1.0 padding.
Before: Target padded with alpha=1.0
After: Target uses actual alpha from image (or 1.0 if no alpha)
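A sketch of the RGBA target loading described above; PIL's convert("RGBA") supplies an opaque alpha channel when the source image has none (function name is illustrative):

```python
import numpy as np
from PIL import Image

def load_target_rgba(path):
    """Load the target with its real alpha; images without alpha get alpha = 1.0."""
    img = Image.open(path).convert("RGBA")            # adds an opaque alpha if missing
    return np.asarray(img, dtype=np.float32) / 255.0  # H x W x 4 in [0, 1]
```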
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Training:
- train_cnn_v2.py: Accept --kernel-sizes as comma-separated list
- CNNv2 model: Per-layer kernel sizes (e.g., [1,3,5])
- Single value replicates across layers (e.g., "3" → [3,3,3])
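A sketch of the argument parsing this implies (not the exact train_cnn_v2.py code):

```python
def parse_kernel_sizes(arg, num_layers=3):
    """Parse --kernel-sizes: "1,3,5" -> [1, 3, 5]; "3" -> [3, 3, 3]."""
    sizes = [int(s) for s in arg.split(",")]
    if len(sizes) == 1:
        sizes = sizes * num_layers   # single value replicated across layers
    if len(sizes) != num_layers:
        raise ValueError(f"expected 1 or {num_layers} kernel sizes, got {len(sizes)}")
    return sizes
```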
Export:
- export_cnn_v2_weights.py: Backward compatible with old checkpoints
- Handles both kernel_size (old) and kernel_sizes (new) format
Documentation:
- CNN_V2.md: Updated code examples and config format
- HOWTO.md: Updated training examples to show comma-separated syntax
Binary format: Already supports per-layer kernel sizes (no changes)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
**Architecture changes:**
- Static features (8D): p0-p3 (parametric) + uv_x, uv_y, sin(10×uv_x), bias
- Input RGBD (4D): fed separately to all layers
- All layers: uniform 12D→4D (4 prev/input + 8 static → 4 output)
- Bias integrated in static features (bias=False in PyTorch)
**Weight calculations:**
- 3 layers × (12 × 3×3 × 4) = 1296 weights
- f16: 2.6 KB (vs old variable arch: ~6.4 KB)
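A PyTorch sketch of one such uniform layer, matching the counts above (illustrative, not the exact CNNv2 module):

```python
import torch
import torch.nn as nn

class UniformLayer(nn.Module):
    """One 12D -> 4D layer: 4 channels from the previous layer (or the input RGBD)
    concatenated with the 8 static feature channels; no learned bias, since the
    constant 1.0 channel in the static features plays that role."""

    def __init__(self, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(12, 4, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, prev_or_rgbd, static_features):
        x = torch.cat([prev_or_rgbd, static_features], dim=1)  # N x 12 x H x W
        return self.conv(x)

# Parameters per layer: 12 * 3*3 * 4 = 432; three layers -> 1296 weights.
```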
**Updated files:**
*Training (Python):*
- train_cnn_v2.py: Uniform model, takes input_rgbd + static_features
- export_cnn_v2_weights.py: Binary export for storage buffers
- export_cnn_v2_shader.py: Per-layer shader export (debugging)
*Shaders (WGSL):*
- cnn_v2_static.wgsl: p0-p3 parametric features (mips/gradients)
- cnn_v2_compute.wgsl: 12D input, 4D output, vec4 packing
*Tools:*
- HTML tool (cnn_v2_test): Updated for 12D→4D, layer visualization
*Docs:*
- CNN_V2.md: Updated architecture, training, validation sections
- HOWTO.md: Reference HTML tool for validation
*Removed:*
- validate_cnn_v2.sh: Obsolete (used CNN v1 tool)
All code consistent with bias=False (bias in static features as 1.0).
handoff(Claude): CNN v2 architecture finalized and documented
|
|
1. Loss printed at every epoch with \r (no scrolling)
2. Validation only on final epoch (not all checkpoints)
3. Process all input images (not just img_000.png)
Training output now shows live progress with a single-line update.
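A minimal sketch of the single-line progress output (the loss value here is only a stand-in):

```python
import random

num_epochs = 50

for epoch in range(num_epochs):
    loss = random.random()  # stand-in for the real per-epoch loss
    # Carriage return rewrites the same line instead of scrolling the terminal.
    print(f"\repoch {epoch + 1}/{num_epochs}  loss {loss:.6f}", end="", flush=True)
print()  # final newline once the loop is done
```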
|
|
- Add QAT (quantization-aware training) notes
- Requires training with fake quantization
- Target: ~1.6 KB weights (vs 3.2 KB f16)
- Shader unpacking needs adaptation (4× u8 per u32)
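A sketch of the storage-side packing this would imply, assuming symmetric per-tensor u8 quantization (the training-side fake quantization and the shader unpacking are still TODO):

```python
import numpy as np

def quantize_and_pack(weights):
    """Symmetric per-tensor u8 quantization, then 4 bytes packed per u32.

    Storage side only; QAT would also apply this rounding as a fake-quant op
    during training so the model learns to tolerate it.
    """
    scale = float(np.abs(weights).max()) / 127.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(weights / scale) + 128, 0, 255).astype(np.uint8)
    q = np.pad(q.ravel(), (0, (-q.size) % 4))            # pad to a multiple of 4 bytes
    packed = np.ascontiguousarray(q.reshape(-1, 4)).view(np.uint32).ravel()
    return packed, scale                                 # shader side: 4x u8 per u32
```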
|
|
Added note for future enhancement: mix salient + random samples.
Rationale:
- Salient point detection focuses on edges/corners
- Random samples improve generalization across entire image
- Prevents overfitting to only high-gradient regions
Proposed implementation:
- Default: 90% salient points, 10% random samples
- Configurable: --random-sample-percent parameter
- Example: 64 patches = 58 salient + 6 random
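A sketch of the proposed mixing, with hypothetical names (this is the TODO, not implemented code):

```python
import numpy as np

def mix_sample_points(salient_points, height, width, total=64, random_percent=10):
    """Mix detected salient points (N x 2 array of x, y) with uniform random points.

    E.g. total=64, random_percent=10 -> 58 salient + 6 random.
    """
    n_random = round(total * random_percent / 100)
    n_salient = total - n_random
    chosen = salient_points[:n_salient]
    random_pts = np.column_stack([
        np.random.randint(0, width, n_random),   # x
        np.random.randint(0, height, n_random),  # y
    ])
    return np.concatenate([chosen, random_pts], axis=0)
```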
Location: train_cnn_v2.py
- TODO in _detect_salient_points() method
- TODO in argument parser
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Salient point detection on original images with patch extraction.
Changes:
- Added PatchDataset class (harris/fast/shi-tomasi/gradient detectors)
- Detects salient points on ORIGINAL images (no resize)
- Extracts 32×32 patches around salient points
- Default: 64 patches/image, harris detector
- Batch size: 16 (512 patches per batch)
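Roughly what the Harris-based patch extraction looks like (a simplified sketch of the harris path only; the other detectors and PatchDataset plumbing are omitted):

```python
import cv2
import numpy as np

def extract_salient_patches(img_bgr, patches_per_image=64, patch_size=32):
    """Detect corners on the full-resolution image and cut patches around them."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    corners = cv2.goodFeaturesToTrack(
        gray, maxCorners=patches_per_image, qualityLevel=0.01,
        minDistance=patch_size, useHarrisDetector=True)
    if corners is None:
        return []
    half = patch_size // 2
    patches = []
    for x, y in corners.reshape(-1, 2).astype(int):
        # Clamp so the patch stays inside the image.
        x = np.clip(x, half, img_bgr.shape[1] - half)
        y = np.clip(y, half, img_bgr.shape[0] - half)
        patches.append(img_bgr[y - half:y + half, x - half:x + half])
    return patches
```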
Training modes:
1. Patch-based (default): --patch-size 32 --patches-per-image 64 --detector harris
2. Full-image (option): --full-image --image-size 256
Benefits:
- Focuses training on interesting regions
- Handles variable image sizes naturally
- Matches CNN v1 workflow
- Better convergence with limited data (8 images → 512 patches)
Script updated:
- train_cnn_v2_full.sh: Patch-based by default
- Configuration exposed for easy switching
Example:
./scripts/train_cnn_v2_full.sh # Patch-based
# Edit script: uncomment FULL_IMAGE for resize mode
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Training script now resizes all images to a fixed size before batching.
Issue: RuntimeError when batching variable-sized images
- Images had different dimensions (376x626 vs 344x361)
- PyTorch DataLoader requires uniform tensor sizes for batching
Solution:
- Add --image-size parameter (default: 256)
- Resize all images to target_size using LANCZOS interpolation
- Makes training independent of the original image aspect ratios
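A sketch of the fixed-size loading (function and parameter names are illustrative):

```python
from PIL import Image

def load_resized_pair(input_path, target_path, image_size=256):
    """Resize both images of a pair to the same square size so tensors batch cleanly."""
    size = (image_size, image_size)
    inp = Image.open(input_path).resize(size, Image.LANCZOS)
    tgt = Image.open(target_path).resize(size, Image.LANCZOS)
    return inp, tgt
```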
Changes:
- train_cnn_v2.py: ImagePairDataset now resizes to fixed dimensions
- train_cnn_v2_full.sh: Added IMAGE_SIZE=256 configuration
Tested: 8 image pairs, variable sizes → uniform 256×256 batches
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Infrastructure for enhanced CNN post-processing with 7D feature input.
Phase 1: Shaders
- Static feature compute shader (RGBD + UV + sin10_x + bias → 8×f16)
- Layer template (convolution skeleton, packing/unpacking)
- Support for 3 mip levels (multi-scale features)
Phase 2: C++ Effect
- CNNv2Effect class (multi-pass architecture)
- Texture management (static features, layer buffers)
- Build integration (CMakeLists, assets, tests)
Phase 3: Training Pipeline
- train_cnn_v2.py: PyTorch model with static feature concatenation
- export_cnn_v2_shader.py: f32→f16 quantization, WGSL generation
- Configurable architecture (kernels, channels)
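A sketch of the f32→f16 step only, assuming a standard PyTorch checkpoint layout (illustrative; the real exporter also emits the WGSL/binary layout described in the docs):

```python
import numpy as np
import torch

def export_f16(checkpoint_path, out_path):
    """Flatten checkpoint weights and write them as f16."""
    state = torch.load(checkpoint_path, map_location="cpu")
    tensors = state.get("model_state_dict", state)     # assumed checkpoint key
    blob = b"".join(
        t.detach().numpy().astype(np.float16).tobytes() for t in tensors.values()
    )
    with open(out_path, "wb") as f:
        f.write(blob)
```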
Phase 4: Validation
- validate_cnn_v2.sh: End-to-end pipeline
- Checkpoint → shaders → build → test images
Tests: 36/36 passing
Next: Complete render pipeline implementation (bind groups, multi-pass)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|