demo.git/src/gpu, branch main

style: apply clang-format

2026-05-21T06:10:47Z

fix: correct WGPUTexelCopyBufferInfo initialization order

2026-05-21T06:03:58Z

fix: code review cleanup — bugs, dead code, factorization, simplification

2026-05-20T21:21:59Z

Bugs: - B1: fix dead tempo debug (prev_tempo captured after assignment) - B2: fix ReloadAssetsFromFile leak for disk-loaded assets; simplify DropAsset - B3: fix get_free_pool_slot leak (unregister synth + free data on reuse) - B4: volatile -> std::atomic with acquire/release in miniaudio_backend, synth - B5: fix unaligned reads in scene_loader (memcpy-based read_f32/read_u32) - B6: fix shader module + BGL + pipeline layout leaks in gpu.cc, pipeline_builder Dead code: - D1: remove unused particle_defs.h - D3: remove create_post_process_pipeline_simple (zero callers) - D4: remove empty gpu_draw() - D5: remove write-only Hybrid3D::initialized_ - D6: remove legacy pending buffer path in audio.cc Factorization: - F1: Effect::run_fullscreen_pass() replaces boilerplate in 5 effects - F2: particle_common.wgsl snippet, #include in 3 WGSL shaders - F3: gpu_create_shader_module() helper, used in 3 call sites - F5: get_world_aabb() shared between bvh.cc and physics.cc - F6: samples_to_seconds() replaces 6 inline expressions - F7: gpu_create_linear/nearest_sampler use SamplerCache; add nearest() preset Simplification: - S9+S1: WgslSamplerType param; Scene2Effect collapsed to thin wrapper - S4: FFT heap allocs -> stack arrays (zero allocs on hot path) - S5: ObjectType::CUBE documented as legacy alias for BOX; default changed - S6: bind group dirty-flag in Renderer3D; remove duplicate pipeline set - S7: create_gpu_procedural() helper in texture_manager (~80 lines removed) 37/37 tests passing. handoff(Claude): code review batch — all items verified, no regressions.

fix: code review cleanup — bugs, dead code, factorization (-167 lines)

2026-05-20T20:44:44Z

Bugs: - B1: fix dead tempo debug (prev_tempo captured after assignment) - B2: fix ReloadAssetsFromFile leak for disk-loaded assets; simplify DropAsset - B3: fix get_free_pool_slot leak (unregister synth + free data on reuse) - B4: volatile -> std::atomic with acquire/release in miniaudio_backend, synth - B5: fix unaligned reads in scene_loader (memcpy-based read_f32/read_u32) - B6: fix shader module + BGL + pipeline layout leaks in gpu.cc, pipeline_builder Dead code: - D1: remove unused particle_defs.h - D3: remove create_post_process_pipeline_simple (zero callers) - D4: remove empty gpu_draw() - D5: remove write-only Hybrid3D::initialized_ - D6: remove legacy pending buffer path in audio.cc Factorization: - F1: Effect::run_fullscreen_pass() replaces boilerplate in 5 effects - F2: particle_common.wgsl snippet, #include in 3 WGSL shaders - F3: gpu_create_shader_module() helper, used in 3 call sites - F5: get_world_aabb() shared between bvh.cc and physics.cc - F6: samples_to_seconds() replaces 6 inline expressions - F7: gpu_create_linear/nearest_sampler use SamplerCache; add nearest() preset 37/37 tests passing. handoff(Claude): code review batch — all items verified, no regressions.

chore(src/gpu): remove stale cnn_v1/v2 artifacts and dead comments

2026-03-26T06:22:19Z

- demo_effects.h: drop commented-out cnn_v1/v2 includes (will be removed) - gpu.cc: replace stale "V2:" migration comments with accurate descriptions - effect.h, sequence.h: drop redundant #ifndef guards (kept #pragma once) handoff(Gemini): stale comment cleanup in src/gpu/ — no logic changes

feat(cnn_v3): upgrade architecture to enc_channels=[8,16]

2026-03-26T06:03:01Z

Double encoder capacity: enc0 4→8ch, enc1 8→16ch, bottleneck 16→16ch, dec1 32→8ch, dec0 16→4ch. Total weights 2476→7828 f16 (~15.3 KB). FiLM MLP output 40→72 params (L1: 16×40→16×72). 16-ch textures split into _lo/_hi rgba32uint pairs (enc1, bottleneck). enc0 and dec1 textures changed from rgba16float to rgba32uint (8ch). GBUF_RGBA32UINT node gains CopySrc for parity test readback. - WGSL shaders: all 5 passes rewritten for new channel counts - C++ CNNv3Effect: new weight offsets/sizes, 8ch uniform structs - Web tool (shaders.js + tester.js): matching texture formats and bindings - Parity test: readback_rgba32uint_8ch helper, updated vector counts - Training scripts: default enc_channels=[8,16], updated docstrings - Docs + architecture PNG regenerated handoff(Gemini): CNN v3 [8,16] upgrade complete. All code, tests, web tool, training scripts, and docs updated. Next: run training pass.

feat(gbuffer): wire_dag() + find_downstream_output() for temporal feedback

2026-03-23T06:54:18Z

- Add Effect::wire_dag() virtual (called from init_effect_nodes after full DAG built) - Add Effect::find_downstream_output() protected helper (first downstream consumer output) - GBufferEffect::wire_dag() auto-sets cnn_output_node_ via find_downstream_output, guarding against sink (external view, null texture) - GBufferEffect::post_render() null-checks src texture before CopyTextureToTexture - Tests: find_downstream_output cases + wire_dag integration in test_effect_base - Doc: SEQUENCE.md updated with wire_dag pattern, helper contract, and sink guard Co-Authored-By: Claude Sonnet 4.6

feat(cnn_v3): GBufferEffect temporal feedback via post_render()

2026-03-23T06:31:14Z

- Add Effect::post_render() virtual hook, called after all effects in the sequence have rendered each frame. Default is no-op. - Sequence::render_effects() runs a second pass invoking post_render() on all DAG nodes after the render pass completes. - GBufferEffect: declare internal node_prev_tex_ (U8X4_NORM) for persistent prev-frame CNN output. post_render() copies cnn_output_node_ → node_prev_tex_ via CopyTextureToTexture. render() binds node_prev_tex_ as prev_cnn (binding 6) — zero on frame 0 (matches training convention). - Expose set_cnn_output_node(name) API; call once at setup. - Drop brittle ping-pong / input_nodes_[0] fallback. - Update doc/SEQUENCE.md: post_render() semantics, frame execution order, temporal feedback canonical pattern, node types table with G-buffer types. - Update cnn_v3/docs/HOWTO.md: temporal feedback wiring section. 36/36 tests passing. handoff(Gemini): prev.rgb temporal feedback now correct and generic. Set set_cnn_output_node("sink") (or CNN output node name) once at setup.

feat(cnn_v3): GBufDeferredEffect — simple deferred render (albedo * shadow)

2026-03-22T18:58:04Z

New effect unpacks feat_tex0/feat_tex1 and outputs albedo * shadow. Replaces CNNv3Effect in cnn_v3_test sequence until training is complete. 37/37 tests passing. handoff(Gemini): GBufDeferredEffect wired in timeline; CNN v3 pipeline: GBufferEffect → GBufDeferredEffect → sink.

feat(cnn_v3): add G-buffer visualizer + web sample loader (Phase 7)

2026-03-22T15:21:25Z

C++ GBufViewEffect: renders all 20 feature channels from feat_tex0/feat_tex1 in a 4×5 tiled grid. Custom BGL with WGPUTextureSampleType_Uint; bind group rebuilt per frame via wgpuRenderPipelineGetBindGroupLayout. Web tool: "Load sample directory" button — webkitdirectory picker, FULL_PACK_SHADER compute (matches gbuf_pack.wgsl packing), runFromFeat() skips photo-pack step, computePSNR() readback + comparison vs target.png side-by-side. 36/36 tests pass. Docs updated: HOWTO.md §9, README, PROJECT_CONTEXT, TODO, COMPLETED. handoff(Gemini): CNN v3 Phase 7 done. Next: run train_cnn_v3.py (see HOWTO §3).