demo.git - Vide-coded 64k demo system

Age	Commit message (Collapse)	Author
10 hours	Refactor: Move application entry points to src/app/	skal
	Moved main.cc, stub_main.cc, and test_demo.cc from src/ to src/app/ for better organization. Updated cmake/DemoExecutables.cmake paths. handoff(Claude): App files reorganized into src/app/ directory
30 hours	Refine training script output and validation	skal
	1. Loss printed at every epoch with \r (no scrolling) 2. Validation only on final epoch (not all checkpoints) 3. Process all input images (not just img_000.png) Training output now shows live progress with single line update.
30 hours	TODO: 8-bit weight quantization for 2× size reduction	skal
	- Add QAT (quantization-aware training) notes - Requires training with fake quantization - Target: ~1.6 KB weights (vs 3.2 KB f16) - Shader unpacking needs adaptation (4× u8 per u32)
30 hours	CNN v2: Storage buffer complete - real weights exported	skal
	- Export weights from epoch 70 checkpoint (3.2 KB binary) - Disable shader template generation (use manual cnn_v2_compute.wgsl) - Build successful with real weights - Ready for integration testing Storage buffer architecture complete: - Dynamic layer count support - ~0.3ms overhead vs constants (negligible) - Single shader, flexible configuration - Binary format: header + layer info + f16 weights
30 hours	CNN v2: Complete multi-layer compute execution	skal
	- Create bind groups per layer with ping-pong buffers - Update layer params uniform per dispatch - Execute all layers in sequence with proper input/output swapping - Ready for weight export and end-to-end testing
30 hours	CNN v2: storage buffer architecture foundation	skal
	- Add binary weight format (header + layer info + packed f16) - New export_cnn_v2_weights.py for binary weight export - Single cnn_v2_compute.wgsl shader with storage buffer - Load weights in CNNv2Effect::load_weights() - Create layer compute pipeline with 5 bindings - Fast training config: 100 epochs, 3×3 kernels, 8→4→4 channels Next: Complete bind group creation and multi-layer compute execution