demo.git - Vibe-coded 64k demo system

Age	Commit message	Author
2026-02-14	Fix --mix option: blend prev layer with static p4-p7, not p0-p3	skal
	Updated gen_identity_weights.py --mix mode to use static features p4-p7 (uv_x, uv_y, sin20_y, bias) at channels 8-11 instead of p0-p3 (RGB+D) at channels 4-7.
	Before: 0.5prev[i] + 0.5static_p{i} (channels 4-7)
	After: 0.5prev[i] + 0.5static_p{4+i} (channels 8-11)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
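The blend described in this commit can be sketched as a small weight matrix. This is only an illustration, not the contents of gen_identity_weights.py: the 12-channel input layout (previous layer output at channels 0-3, p0-p3 at 4-7, p4-p7 at 8-11) is an assumption inferred from the message, and `make_mix_weights` and `apply_weights` are hypothetical names.

```python
# Assumed input layout (not confirmed by the source):
#   channels 0-3  : previous layer output
#   channels 4-7  : p0-p3 (RGB + depth)
#   channels 8-11 : p4-p7 (uv_x, uv_y, sin20_y, bias)
IN_CHANNELS = 12
OUT_CHANNELS = 4
STATIC_OFFSET = 8  # first channel of the static features p4-p7


def make_mix_weights(blend=0.5):
    """Build an OUT x IN matrix: out[i] = (1-blend)*prev[i] + blend*static_p{4+i}."""
    weights = [[0.0] * IN_CHANNELS for _ in range(OUT_CHANNELS)]
    for i in range(OUT_CHANNELS):
        weights[i][i] = 1.0 - blend            # previous layer, channel i
        weights[i][STATIC_OFFSET + i] = blend  # static feature p{4+i}
    return weights


def apply_weights(weights, pixel):
    """Apply the matrix to one 12-channel pixel (a 1x1 convolution tap)."""
    return [sum(w * x for w, x in zip(row, pixel)) for row in weights]
```

With this layout, channels 4-7 get zero weight, which matches the "After" line: p0-p3 no longer contribute to the mix.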
2026-02-13	CNN v2: Remove vizScale, always clip to [0,1]	skal
	All layers now use scale 1.0, shader clamps values >1.
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-13	CNN v2: Fix Layer 0 visualization scale (was 0.5, now 1.0)	skal
	Layer 0 output is clamped [0,1], does not need 0.5 dimming. Middle layers (ReLU) keep 0.5 scale for values >1.
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-13	CNN v2: Alpha channel depth handling and layer visualization	skal
	Training changes:
	- Changed p3 default depth from 0.0 to 1.0 (far plane semantics)
	- Extract depth from target alpha channel in both datasets
	- Consistent alpha-as-depth across training/validation
	Test tool enhancements (cnn_test):
	- Added load_depth_from_alpha() for R32Float depth texture
	- Fixed bind group layout for UnfilterableFloat sampling
	- Added --save-intermediates with per-channel grayscale composites
	- Each layer saved as 4x wide PNG (p0-p3 stacked horizontally)
	- Global layers_composite.png for vertical layer stack overview
	Investigation notes:
	- Static features p4-p7 ARE computed and bound correctly
	- Sin_20_y pattern visibility difference between tools under investigation
	- Binary weights timestamp (Feb 13 20:36) vs HTML tool (Feb 13 22:12)
	- Next: Update HTML tool with canonical binary weights
	handoff(Claude): HTML tool weights update pending - base64 encoded canonical weights ready in /tmp/weights_b64.txt for line 392 replacement.
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
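The alpha-as-depth convention from the training changes can be illustrated as follows. This is a sketch under assumptions: the helper name `depth_from_alpha`, the 8-bit pixel tuples, and the division by 255 are not taken from the training script; only the 1.0 far-plane default comes from the commit message.

```python
FAR_PLANE = 1.0  # default depth for p3 when no alpha channel is present


def depth_from_alpha(pixel):
    """Return a depth in [0, 1] for one pixel.

    pixel is (r, g, b) or (r, g, b, a) with 8-bit channels (assumed encoding).
    """
    if len(pixel) < 4:
        return FAR_PLANE        # no alpha: treat as far plane
    return pixel[3] / 255.0     # alpha reinterpreted as normalized depth


def depth_map(pixels):
    """Apply the same rule to every pixel, as both datasets would."""
    return [depth_from_alpha(p) for p in pixels]
```

Using one helper for training and validation data is one way to keep the alpha-as-depth handling consistent between the two.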
2026-02-13	Refactor: Move application entry points to src/app/	skal
	Moved main.cc, stub_main.cc, and test_demo.cc from src/ to src/app/ for better organization. Updated cmake/DemoExecutables.cmake paths.
	handoff(Claude): App files reorganized into src/app/ directory
2026-02-12	Refine training script output and validation	skal
	1. Loss printed at every epoch with \r (no scrolling)
	2. Validation only on final epoch (not all checkpoints)
	3. Process all input images (not just img_000.png)
	Training output now shows live progress with single line update.
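A single-line progress update of the kind described in item 1 might look like this. It is a minimal sketch, not the training script itself: `report_loss`, its parameters, and the output format are hypothetical.

```python
import sys


def report_loss(epoch, total_epochs, loss, stream=sys.stdout):
    """Overwrite the current terminal line with the latest loss."""
    # "\r" returns the cursor to column 0 without starting a new line,
    # so each call replaces the previous message instead of scrolling.
    stream.write(f"\rEpoch {epoch}/{total_epochs}  loss={loss:.6f}")
    stream.flush()  # no trailing newline, so flush explicitly


def finish_progress(stream=sys.stdout):
    """Emit the newline once training ends so later output starts cleanly."""
    stream.write("\n")
    stream.flush()
```

Calling `report_loss` once per epoch and `finish_progress` after the last one keeps the log to a single updating line.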
2026-02-12	TODO: 8-bit weight quantization for 2× size reduction	skal
	- Add QAT (quantization-aware training) notes
	- Requires training with fake quantization
	- Target: ~1.6 KB weights (vs 3.2 KB f16)
	- Shader unpacking needs adaptation (4× u8 per u32)
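The two ideas in this TODO, fake quantization and packing four u8 values per u32, can be sketched as below. Everything here is illustrative: the function names, the per-tensor min/max range, and the lowest-byte-first packing order are assumptions, not the planned implementation.

```python
def quantize_u8(w, w_min, w_max):
    """Map a float in [w_min, w_max] to an integer in [0, 255]."""
    scale = (w_max - w_min) / 255.0
    q = round((w - w_min) / scale)
    return max(0, min(255, q))  # clamp values outside the range


def dequantize_u8(q, w_min, w_max):
    """Inverse mapping back to a float."""
    return w_min + q * (w_max - w_min) / 255.0


def fake_quantize(w, w_min, w_max):
    """Round-trip through u8 so training sees the quantization error."""
    return dequantize_u8(quantize_u8(w, w_min, w_max), w_min, w_max)


def pack_u8x4(q0, q1, q2, q3):
    """Pack four u8 values into one u32, lowest byte first (assumed order)."""
    return q0 | (q1 << 8) | (q2 << 16) | (q3 << 24)


def unpack_u8x4(word):
    """Recover the four u8 values, as a shader-side unpack would."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]
```

Each weight drops from 2 bytes (f16) to 1 byte (u8), which is where the 2× size reduction in the commit title comes from.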
2026-02-12	CNN v2: Storage buffer complete - real weights exported	skal
	- Export weights from epoch 70 checkpoint (3.2 KB binary)
	- Disable shader template generation (use manual cnn_v2_compute.wgsl)
	- Build successful with real weights
	- Ready for integration testing
	Storage buffer architecture complete:
	- Dynamic layer count support
	- ~0.3ms overhead vs constants (negligible)
	- Single shader, flexible configuration
	- Binary format: header + layer info + f16 weights
2026-02-12	CNN v2: Complete multi-layer compute execution	skal
	- Create bind groups per layer with ping-pong buffers
	- Update layer params uniform per dispatch
	- Execute all layers in sequence with proper input/output swapping
	- Ready for weight export and end-to-end testing
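The ping-pong pattern listed above can be modelled on the CPU in a few lines. This is a sketch of the general technique rather than the repo's GPU code: `run_layers`, `scale_layer`, and the list-based buffers are hypothetical stand-ins for the per-layer bind groups and dispatches.

```python
def run_layers(layers, first_input):
    """Run layers in sequence, alternating between two fixed buffers."""
    buffers = [list(first_input), [0.0] * len(first_input)]
    src = 0  # index of the buffer holding the current input
    for layer in layers:
        dst = 1 - src                     # the other buffer receives the output
        layer(buffers[src], buffers[dst])
        src = dst                         # swap roles for the next layer
    return buffers[src]                   # buffer written by the last layer


def scale_layer(factor):
    """Toy layer: writes src * factor into dst (stands in for one dispatch)."""
    def layer(src, dst):
        for i, value in enumerate(src):
            dst[i] = value * factor
    return layer
```

For example, `run_layers([scale_layer(2.0), scale_layer(3.0)], [1.0, 2.0])` reads from buffer 0, writes buffer 1, then reads buffer 1 and writes buffer 0. Only two buffers are needed regardless of layer count.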
2026-02-12	CNN v2: storage buffer architecture foundation	skal
	- Add binary weight format (header + layer info + packed f16)
	- New export_cnn_v2_weights.py for binary weight export
	- Single cnn_v2_compute.wgsl shader with storage buffer
	- Load weights in CNNv2Effect::load_weights()
	- Create layer compute pipeline with 5 bindings
	- Fast training config: 100 epochs, 3×3 kernels, 8→4→4 channels
	Next: Complete bind group creation and multi-layer compute execution
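A "header + layer info + packed f16" layout could be written and read back as sketched here. The commit does not give the actual field layout, so every field below (the `CNN2` magic, version, per-layer kernel size, channel counts, and weight offset) is a hypothetical choice for illustration and should not be read as the format used by export_cnn_v2_weights.py.

```python
import struct

MAGIC = b"CNN2"       # hypothetical file tag
HEADER_FMT = "<4sII"  # magic, version, layer count (assumed fields)
LAYER_FMT = "<IIII"   # kernel size, in channels, out channels, weight offset


def export_weights(layers, version=1):
    """layers: list of (kernel, in_ch, out_ch, weights) tuples."""
    blob = struct.pack(HEADER_FMT, MAGIC, version, len(layers))
    payload = b""
    offset = 0  # offset into the weight array, counted in f16 elements
    for kernel, in_ch, out_ch, weights in layers:
        if len(weights) != kernel * kernel * in_ch * out_ch:
            raise ValueError("weight count does not match layer shape")
        blob += struct.pack(LAYER_FMT, kernel, in_ch, out_ch, offset)
        payload += struct.pack(f"<{len(weights)}e", *weights)  # 'e' = f16
        offset += len(weights)
    return blob + payload


def load_weights(blob):
    """Parse a blob produced by export_weights."""
    magic, version, count = struct.unpack_from(HEADER_FMT, blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    pos = struct.calcsize(HEADER_FMT)
    infos = []
    for _ in range(count):
        infos.append(struct.unpack_from(LAYER_FMT, blob, pos))
        pos += struct.calcsize(LAYER_FMT)
    layers = []
    for kernel, in_ch, out_ch, offset in infos:
        n = kernel * kernel * in_ch * out_ch
        values = struct.unpack_from(f"<{n}e", blob, pos + 2 * offset)
        layers.append((kernel, in_ch, out_ch, list(values)))
    return version, layers
```

Storing the layer table ahead of the weights lets a loader learn the layer count and shapes before touching the payload, which is compatible with the dynamic layer count mentioned elsewhere in this log.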