| Age | Commit message (Collapse) | Author |
|
Moved main.cc, stub_main.cc, and test_demo.cc from src/ to src/app/
for better organization. Updated cmake/DemoExecutables.cmake paths.
handoff(Claude): App files reorganized into src/app/ directory
|
|
1. Loss printed at every epoch with \r (no scrolling)
2. Validation only on final epoch (not all checkpoints)
3. Process all input images (not just img_000.png)
Training output now shows live progress with single line update.
|
|
- Add QAT (quantization-aware training) notes
- Requires training with fake quantization
- Target: ~1.6 KB weights (vs 3.2 KB f16)
- Shader unpacking needs adaptation (4× u8 per u32)
|
|
- Export weights from epoch 70 checkpoint (3.2 KB binary)
- Disable shader template generation (use manual cnn_v2_compute.wgsl)
- Build successful with real weights
- Ready for integration testing
Storage buffer architecture complete:
- Dynamic layer count support
- ~0.3ms overhead vs constants (negligible)
- Single shader, flexible configuration
- Binary format: header + layer info + f16 weights
|
|
- Create bind groups per layer with ping-pong buffers
- Update layer params uniform per dispatch
- Execute all layers in sequence with proper input/output swapping
- Ready for weight export and end-to-end testing
|
|
- Add binary weight format (header + layer info + packed f16)
- New export_cnn_v2_weights.py for binary weight export
- Single cnn_v2_compute.wgsl shader with storage buffer
- Load weights in CNNv2Effect::load_weights()
- Create layer compute pipeline with 5 bindings
- Fast training config: 100 epochs, 3×3 kernels, 8→4→4 channels
Next: Complete bind group creation and multi-layer compute execution
|