| Age | Commit message (Collapse) | Author | |
|---|---|---|---|
| 20 hours | docs: Add CNN flatten mode technical analysis | skal | |
| Comprehensive analysis of single-pass CNN shader architecture: - Full flatten (3 layers): 544 bytes/thread register pressure - NOT recommended - Partial flatten (layers 1+2): 288 bytes/thread - marginal benefit - Current multi-pass: Optimal for GPU occupancy and maintainability Recommendation: Keep current 3-pass architecture. Alternative size optimizations: weight quantization, kernel reduction. handoff(Claude): CNN flatten analysis documented | |||
