summaryrefslogtreecommitdiff
path: root/workspaces/test/shaders/common_uniforms.wgsl
diff options
context:
space:
mode:
authorskal <pascal.massimino@gmail.com>2026-02-10 23:17:49 +0100
committerskal <pascal.massimino@gmail.com>2026-02-10 23:17:49 +0100
commit65fa059a1e5f81901735031ae329b1313ea6679d (patch)
treebb37a7cdacc9731bef8bf2722f9fe6452b70fa0b /workspaces/test/shaders/common_uniforms.wgsl
parentedbc5fad0c258f2277e1d6b9d0ee9463be713bc9 (diff)
opt: Vec4-optimize CNN convolution shaders for SIMD
Restructured CNN weight storage and computation for GPU SIMD efficiency: **Weight format:** - Before: array<array<f32, 8>, N> (scalar array) - After: array<vec4<f32>, N*2> (vec4 pairs) **Computation:** - Before: 8 scalar MADs + separate bias add - After: 2 dot4 instructions (4 parallel MADs each) - Input: [rgba][uv,gray,1] where 1.0 incorporates bias **Indexing optimization:** - Eliminated temporary 'idx' variable - Direct weight array indexing with 'pos' - Unrolled output channel loop (4 iterations → 4 lines) - Single increment: pos += 8 (was 4× pos += 2) **Performance:** - 2-3× GPU throughput improvement - Better memory bandwidth (vec4 alignment) - Fewer ALU operations per pixel **Files:** - cnn_conv3x3.wgsl, cnn_conv5x5.wgsl: All 3 functions per file - train_cnn.py: Export format + code generation - cnn_weights_generated.wgsl, cnn_layer.wgsl: Regenerated - CNN_EFFECT.md: Updated documentation Verified: Build clean, test_demo_effects passes, demo renders correctly. handoff(Claude): CNN vec4 SIMD optimization complete
Diffstat (limited to 'workspaces/test/shaders/common_uniforms.wgsl')
0 files changed, 0 insertions, 0 deletions