demo.git/checkpoints, branch main

Fix --mix option: blend prev layer with static p4-p7, not p0-p3

2026-02-14T00:04:07Z

Updated gen_identity_weights.py --mix mode to use static features p4-p7 (uv_x, uv_y, sin20_y, bias) at channels 8-11 instead of p0-p3 (RGB+D) at channels 4-7.

Before: 0.5*prev[i] + 0.5*static_p{i} (channels 4-7)
After: 0.5*prev[i] + 0.5*static_p{4+i} (channels 8-11)

Co-Authored-By: Claude Sonnet 4.5
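The corrected blend can be illustrated with a minimal sketch. It is not the code in gen_identity_weights.py: the function name `make_mix_weights`, the `[out][in][ky][kx]` weight layout, and the 12-channel input ordering (previous layer at 0-3, p0-p3 at 4-7, p4-p7 at 8-11) are assumptions taken from the channel numbers quoted in this commit.

```python
# Assumed input layout (hypothetical, inferred from the commit text):
#   channels 0-3  : previous layer outputs
#   channels 4-7  : static p0-p3 (RGB+D)
#   channels 8-11 : static p4-p7 (uv_x, uv_y, sin20_y, bias)
def make_mix_weights(kernel_size=3, out_channels=4, in_channels=12):
    """Build weights that compute 0.5*prev[i] + 0.5*static_p{4+i} per output i."""
    center = kernel_size // 2
    w = [[[[0.0] * kernel_size for _ in range(kernel_size)]
          for _ in range(in_channels)]
         for _ in range(out_channels)]
    for i in range(out_channels):
        w[i][i][center][center] = 0.5      # 0.5 * prev[i]
        w[i][8 + i][center][center] = 0.5  # 0.5 * static_p{4+i} (channels 8-11)
    return w


weights = make_mix_weights()
```

Only the center tap of each kernel is non-zero, so every output pixel depends on the same pixel of its two source channels and channels 4-7 contribute nothing.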

CNN v2: Remove vizScale, always clip to [0,1]

2026-02-13T22:42:53Z

All layers now use scale 1.0; the shader clamps values >1.

Co-Authored-By: Claude Sonnet 4.5
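The clamping itself happens in the shader. As a rough illustration in Python, mapping an activation to a display value with scale 1.0 might look like the sketch below. The helper name `to_display_byte` and the 8-bit output are assumptions made for the example.

```python
def to_display_byte(value):
    """Clamp an activation to [0, 1] (scale 1.0) and map it to an 8-bit gray level."""
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * 255)
```

With this scheme, ReLU outputs above 1 saturate to white instead of being dimmed by a per-layer factor.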

CNN v2: Fix Layer 0 visualization scale (was 0.5, now 1.0)

2026-02-13T22:40:30Z

Layer 0 output is clamped to [0,1] and does not need 0.5 dimming. Middle layers (ReLU) keep 0.5 scale for values >1.

Co-Authored-By: Claude Sonnet 4.5

CNN v2: Alpha channel depth handling and layer visualization

2026-02-13T22:17:42Z

Training changes:
- Changed p3 default depth from 0.0 to 1.0 (far plane semantics)
- Extract depth from target alpha channel in both datasets
- Consistent alpha-as-depth across training/validation

Test tool enhancements (cnn_test):
- Added load_depth_from_alpha() for R32Float depth texture
- Fixed bind group layout for UnfilterableFloat sampling
- Added --save-intermediates with per-channel grayscale composites
- Each layer saved as 4x wide PNG (p0-p3 stacked horizontally)
- Global layers_composite.png for vertical layer stack overview

Investigation notes:
- Static features p4-p7 ARE computed and bound correctly
- Sin_20_y pattern visibility difference between tools under investigation
- Binary weights timestamp (Feb 13 20:36) vs HTML tool (Feb 13 22:12)
- Next: Update HTML tool with canonical binary weights

handoff(Claude): HTML tool weights update pending - base64 encoded canonical weights ready in /tmp/weights_b64.txt for line 392 replacement.

Co-Authored-By: Claude Sonnet 4.5
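The alpha-as-depth convention from the training changes can be sketched as follows. This is not the dataset code: the helper `split_rgbd` is hypothetical, and falling back to the default depth when a pixel has no alpha channel is an assumption about where the p3 default applies.

```python
def split_rgbd(pixel, default_depth=1.0):
    """Split an RGB or RGBA pixel (floats in [0, 1]) into (rgb, depth)."""
    if len(pixel) == 4:
        r, g, b, a = pixel
        return (r, g, b), a            # alpha channel carries depth
    r, g, b = pixel
    return (r, g, b), default_depth    # no alpha: assume far plane (1.0)
```

A default of 1.0 places pixels without depth at the far plane. The previous default of 0.0 would have placed them at the opposite end of the depth range.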

Refactor: Move application entry points to src/app/

2026-02-13T07:14:07Z

Moved main.cc, stub_main.cc, and test_demo.cc from src/ to src/app/ for better organization. Updated cmake/DemoExecutables.cmake paths.

handoff(Claude): App files reorganized into src/app/ directory

Refine training script output and validation

2026-02-12T11:17:59Z

1. Loss printed at every epoch with \r (no scrolling)
2. Validation only on final epoch (not all checkpoints)
3. Process all input images (not just img_000.png)

Training output now shows live progress with single line update.
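The single-line progress update in item 1 relies on a carriage return moving the cursor back to the start of the line. A minimal sketch follows. The function names and the message format are illustrative, not taken from the training script.

```python
import sys


def format_progress(epoch, total_epochs, loss):
    """Build a status line that overwrites the previous one via a leading \\r."""
    return f"\rEpoch {epoch}/{total_epochs} loss={loss:.6f}"


def report(epoch, total_epochs, loss, stream=sys.stdout):
    stream.write(format_progress(epoch, total_epochs, loss))
    if epoch == total_epochs:
        stream.write("\n")  # finish the line once training ends
    stream.flush()          # no newline is written, so flush explicitly


if __name__ == "__main__":
    for epoch, loss in enumerate([0.9, 0.5, 0.3], start=1):
        report(epoch, 3, loss)
```

The explicit flush matters because line-buffered terminals would otherwise hold the text back until a newline arrives.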

TODO: 8-bit weight quantization for 2× size reduction

2026-02-12T11:11:53Z

- Add QAT (quantization-aware training) notes
- Requires training with fake quantization
- Target: ~1.6 KB weights (vs 3.2 KB f16)
- Shader unpacking needs adaptation (4× u8 per u32)
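The 2× figure follows from storage width: f16 uses 2 bytes per weight and u8 uses 1, so 3.2 KB becomes roughly 1.6 KB. The sketch below shows one possible scheme, assuming affine min/max quantization and little-endian packing. The TODO does not fix a scheme, and all function names are hypothetical.

```python
import struct


def quantize_u8(weights, w_min, w_max):
    """Map floats in [w_min, w_max] to integer codes 0..255 (affine, assumed scheme)."""
    scale = (w_max - w_min) / 255.0
    return [min(255, max(0, round((w - w_min) / scale))) for w in weights]


def fake_quantize(weights, w_min, w_max):
    """Quantize then dequantize, so training sees the rounding error."""
    scale = (w_max - w_min) / 255.0
    return [q * scale + w_min for q in quantize_u8(weights, w_min, w_max)]


def pack_u8x4(codes):
    """Pack 8-bit codes into little-endian u32 words, 4 codes per word (zero padded)."""
    padded = list(codes) + [0] * (-len(codes) % 4)
    return [struct.unpack("<I", bytes(padded[i:i + 4]))[0]
            for i in range(0, len(padded), 4)]


def unpack_u8x4(word):
    """Recover the 4 codes from one u32, lowest byte first (mirrors the shader side)."""
    return [(word >> (8 * k)) & 0xFF for k in range(4)]
```

On the shader side, the equivalent of `unpack_u8x4` followed by a multiply-add with the scale and minimum would replace the current f16 unpacking.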

CNN v2: Storage buffer complete - real weights exported

2026-02-12T11:10:40Z

- Export weights from epoch 70 checkpoint (3.2 KB binary)
- Disable shader template generation (use manual cnn_v2_compute.wgsl)
- Build successful with real weights
- Ready for integration testing

Storage buffer architecture complete:
- Dynamic layer count support
- ~0.3ms overhead vs constants (negligible)
- Single shader, flexible configuration
- Binary format: header + layer info + f16 weights

CNN v2: Complete multi-layer compute execution

2026-02-12T11:09:27Z

- Create bind groups per layer with ping-pong buffers
- Update layer params uniform per dispatch
- Execute all layers in sequence with proper input/output swapping
- Ready for weight export and end-to-end testing
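The input/output swapping can be sketched independently of the GPU API. The helper below is illustrative only: it assumes exactly two intermediate buffers, indexed 0 and 1, and ignores the extra inputs (such as static features) that the real bind groups carry.

```python
def plan_dispatches(num_layers):
    """Return (layer, src_buffer, dst_buffer) for each dispatch using two ping-pong buffers."""
    plan = []
    src, dst = 0, 1
    for layer in range(num_layers):
        plan.append((layer, src, dst))
        src, dst = dst, src  # this layer's output becomes the next layer's input
    return plan
```

For three layers this yields `[(0, 0, 1), (1, 1, 0), (2, 0, 1)]`, so the final result sits in the destination buffer of the last entry.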

CNN v2: storage buffer architecture foundation

2026-02-12T11:08:22Z

- Add binary weight format (header + layer info + packed f16)
- New export_cnn_v2_weights.py for binary weight export
- Single cnn_v2_compute.wgsl shader with storage buffer
- Load weights in CNNv2Effect::load_weights()
- Create layer compute pipeline with 5 bindings
- Fast training config: 100 epochs, 3×3 kernels, 8→4→4 channels

Next: Complete bind group creation and multi-layer compute execution
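A header + layer info + packed f16 layout could look like the sketch below. The field choices are hypothetical (magic value, version, per-layer channel counts, kernel size, weight offset, and weight count) and do not describe the actual format written by export_cnn_v2_weights.py. Only the three-part structure and the f16 weights come from this commit.

```python
import struct

MAGIC = b"CNN2"        # hypothetical magic value
HEADER_FMT = "<4sII"   # magic, version, layer count (assumed fields)
LAYER_FMT = "<5I"      # in_ch, out_ch, kernel, weight offset, weight count (assumed)


def pack_weights(layers):
    """layers: list of (in_ch, out_ch, kernel_size, [weights])."""
    header = struct.pack(HEADER_FMT, MAGIC, 1, len(layers))
    info, blob, offset = b"", b"", 0
    for in_ch, out_ch, k, weights in layers:
        info += struct.pack(LAYER_FMT, in_ch, out_ch, k, offset, len(weights))
        blob += struct.pack(f"<{len(weights)}e", *weights)  # 'e' is IEEE 754 half
        offset += len(weights)
    return header + info + blob


def unpack_weights(data):
    magic, _version, count = struct.unpack_from(HEADER_FMT, data, 0)
    if magic != MAGIC:
        raise ValueError("unexpected magic value")
    info_start = struct.calcsize(HEADER_FMT)
    info_size = struct.calcsize(LAYER_FMT)
    weights_start = info_start + info_size * count
    layers = []
    for i in range(count):
        in_ch, out_ch, k, offset, n = struct.unpack_from(
            LAYER_FMT, data, info_start + info_size * i)
        weights = list(struct.unpack_from(f"<{n}e", data, weights_start + 2 * offset))
        layers.append((in_ch, out_ch, k, weights))
    return layers
```

Because each layer record stores its own offset and count, a reader can handle any number of layers without recompiling the shader, which is the property the storage buffer design is after.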