demo.git/cmake, branch main

fix: code review cleanup — bugs, dead code, factorization, simplification

2026-05-20T21:21:59Z

Bugs: - B1: fix dead tempo debug (prev_tempo captured after assignment) - B2: fix ReloadAssetsFromFile leak for disk-loaded assets; simplify DropAsset - B3: fix get_free_pool_slot leak (unregister synth + free data on reuse) - B4: volatile -> std::atomic with acquire/release in miniaudio_backend, synth - B5: fix unaligned reads in scene_loader (memcpy-based read_f32/read_u32) - B6: fix shader module + BGL + pipeline layout leaks in gpu.cc, pipeline_builder Dead code: - D1: remove unused particle_defs.h - D3: remove create_post_process_pipeline_simple (zero callers) - D4: remove empty gpu_draw() - D5: remove write-only Hybrid3D::initialized_ - D6: remove legacy pending buffer path in audio.cc Factorization: - F1: Effect::run_fullscreen_pass() replaces boilerplate in 5 effects - F2: particle_common.wgsl snippet, #include in 3 WGSL shaders - F3: gpu_create_shader_module() helper, used in 3 call sites - F5: get_world_aabb() shared between bvh.cc and physics.cc - F6: samples_to_seconds() replaces 6 inline expressions - F7: gpu_create_linear/nearest_sampler use SamplerCache; add nearest() preset Simplification: - S9+S1: WgslSamplerType param; Scene2Effect collapsed to thin wrapper - S4: FFT heap allocs -> stack arrays (zero allocs on hot path) - S5: ObjectType::CUBE documented as legacy alias for BOX; default changed - S6: bind group dirty-flag in Renderer3D; remove duplicate pipeline set - S7: create_gpu_procedural() helper in texture_manager (~80 lines removed) 37/37 tests passing. handoff(Claude): code review batch — all items verified, no regressions.

ans: order-0 rANS coder + WGSL asset compression

2026-05-14T17:11:28Z

Adds src/util/ans.{h,cc}, a per-chunk-adaptive order-0 rANS entropy coder. Decoder is always built; encoder is gated on ANS_ENABLE_ENCODER (tools only). Both sides take an optional 256-entry initial_counts table to seed the adaptive model. The per-chunk initial state is (1 << kBits). Higher initial states (e.g. with a signature packed into the upper bits) force a renorm-emit at iter 0 that the decoder never consumes, corrupting multi-chunk streams once stats become skewed. Asset pipeline: - AssetRecord gains 'compression' and 'uncompressed_size' fields. - asset_packer scans every WGSL file to build a corpus-wide byte histogram, then ANS-encodes each shader using that histogram as the seed. Histogram and accessor are emitted alongside the asset table. Round-trip verification runs at pack time for every compressed asset; failures fall back to uncompressed storage. - asset_manager decompresses on first GetAsset(), caches the heap-allocated buffer, and DropAsset / ReloadAssetsFromFile free it along with the procedural cache. - Disk-load (dev) builds are unchanged: WGSL paths stay as filenames. Tests: - src/tests/util/test_ans.cc: roundtrip variants (empty, single byte, single-symbol run, all-zeros, random uniform/skewed, repeated ASCII), seeded-vs-uniform compression, rejection of mismatched counts / corruption / truncation, PeekUncompressedSize. - 37/37 dev, 36/36 STRIP_ALL. Compression observed: WGSL shaders shrink to ~0.62-0.71x in the main workspace (81 of 105 assets qualify). Docs: - doc/ANS.md (new): algorithm, bitstream, API, asset pipeline integration, compression numbers, limitations, tests. - doc/ASSET_SYSTEM.md: new Compression section + updated technical guarantees for compressed assets. - doc/COMPLETED.md: May 2026 entry. - PROJECT_CONTEXT.md: Build status line mentions WGSL ANS compression. - CLAUDE.md, GEMINI.md: tier-3 build doc list includes ANS.md.

feat(cnn_v3): add infer_cnn_v3.py + rewrite cnn_test for v3 parity

2026-03-25T07:07:53Z

- cnn_v3/training/infer_cnn_v3.py: PyTorch inference tool; simple mode (single PNG, zeroed geometry) and full mode (sample directory); supports --identity-film (γ=1 β=0) to match C++ default, --cond for FiLM MLP, --blend, --debug-hex for pixel comparison - tools/cnn_test.cc: full rewrite, v3 only; packs 20-channel features on CPU (training format: [0,1] oct normals, pyrdown mip), uploads to GPU, runs CNNv3Effect, reads back RGBA16Float, saves PNG; --sample-dir for full G-buffer input, --weights for .bin override, --debug-hex - cmake/DemoTests.cmake: add cnn_v3/src include path, drop unused offscreen_render_target.cc from cnn_test sources - cnn_v3/docs/HOWTO.md: new §10 documenting both tools, comparison workflow, and feature-format convention (training vs runtime) handoff(Gemini): cnn_test + infer_cnn_v3.py ready for parity testing. Run both with --identity-film / --debug-hex on same image to compare.

feat(cnn_v3): GBufDeferredEffect — simple deferred render (albedo * shadow)

2026-03-22T18:58:04Z

New effect unpacks feat_tex0/feat_tex1 and outputs albedo * shadow. Replaces CNNv3Effect in cnn_v3_test sequence until training is complete. 37/37 tests passing. handoff(Gemini): GBufDeferredEffect wired in timeline; CNN v3 pipeline: GBufferEffect → GBufDeferredEffect → sink.

feat(cnn_v3): add G-buffer visualizer + web sample loader (Phase 7)

2026-03-22T15:21:25Z

C++ GBufViewEffect: renders all 20 feature channels from feat_tex0/feat_tex1 in a 4×5 tiled grid. Custom BGL with WGPUTextureSampleType_Uint; bind group rebuilt per frame via wgpuRenderPipelineGetBindGroupLayout. Web tool: "Load sample directory" button — webkitdirectory picker, FULL_PACK_SHADER compute (matches gbuf_pack.wgsl packing), runFromFeat() skips photo-pack step, computePSNR() readback + comparison vs target.png side-by-side. 36/36 tests pass. Docs updated: HOWTO.md §9, README, PROJECT_CONTEXT, TODO, COMPLETED. handoff(Gemini): CNN v3 Phase 7 done. Next: run train_cnn_v3.py (see HOWTO §3).

feat(cnn_v3): Phase 5 complete — parity validation passing (36/36 tests)

2026-03-21T08:51:58Z

- Add test_cnn_v3_parity.cc: zero_weights + random_weights tests - Add gen_test_vectors.py: PyTorch reference implementation for enc0/enc1/bn/dec1/dec0 - Add test_vectors.h: generated C header with enc0, dec1, output expected values - Fix declare_nodes(): intermediate textures at fractional resolutions (W/2, W/4) using new NodeRegistry::default_width()/default_height() getters - Add layer-by-layer readback (enc0, dec1) for regression coverage - Final parity: enc0 max_err=1.95e-3, dec1 max_err=1.95e-3, out max_err=4.88e-4 handoff(Claude): CNN v3 parity done. Next: train_cnn_v3.py (FiLM MLP training).

feat(cnn_v3): Phase 4 complete — CNNv3Effect C++ + FiLM uniform upload

2026-03-21T07:52:53Z

- cnn_v3/src/cnn_v3_effect.{h,cc}: full Effect subclass with 5 compute passes (enc0→enc1→bottleneck→dec1→dec0), shared weights storage buffer, per-pass uniform buffers, set_film_params() API - Fixed WGSL/C++ struct alignment: vec3u has align=16, so CnnV3Params4ch is 64 bytes and CnnV3ParamsEnc1 is 96 bytes (not 48/80) - Weight offsets computed as explicit formulas (e.g. 20*4*9+4) for clarity - Registered in CMake, shaders.h/cc, demo_effects.h, test_demo_effects.cc - 35/35 tests pass handoff(Gemini): CNN v3 Phase 5 next — parity validation (Python ref vs WGSL)

feat(cnn_v3): Phase 1 complete - GBufferEffect integrated + HOWTO playbook

2026-03-20T08:22:18Z

- Wire GBufferEffect into demo build: assets.txt, DemoSourceLists.cmake, demo_effects.h, shaders.h/cc. ShaderComposer::Compose() applied to gbuf_raster.wgsl (resolves #include "common_uniforms"). - Add GBufferEffect construction test. 35/35 passing. - Write cnn_v3/docs/HOWTO.md: G-buffer wiring, training data prep, training plan, per-pixel validation workflow, phase status table, troubleshooting guide. - Add project hooks: remind to update HOWTO.md on cnn_v3/ edits; warn on direct str_view(*_wgsl) usage bypassing ShaderComposer. - Update PROJECT_CONTEXT.md and TODO.md: Phase 1 done, Phase 3 (WGSL U-Net shaders) is next active. handoff(Gemini): CNN v3 Phase 3 is next - WGSL enc/dec/bottleneck/FiLM shaders in cnn_v3/shaders/. See cnn_v3/docs/CNN_V3.md Architecture section and cnn_v3/docs/HOWTO.md section 3 for spec. GBufferEffect outputs feat_tex0 + feat_tex1 (rgba32uint, 20ch, 32 bytes/pixel). C++ CNNv3Effect (Phase 4) takes those as input nodes.

fix(build): move generated assets to per-build binary dir

2026-03-12T20:04:38Z

assets_data.cc was shared in src/generated/ across all builds. When a native debug build regenerated it with --disk_load (file paths), then build_win compiled with STRIP_ALL=ON, the disk-load fopen() path was compiled out, leaving raw macOS paths as shader source content — causing wgpu WGSL parse errors on Wine. Fix: GEN_DEMO_H/CC and all stamps now live in CMAKE_CURRENT_BINARY_DIR/ src/generated/ so each build dir independently generates assets in the correct mode (embedded vs disk-load). Added CMAKE_CURRENT_BINARY_DIR/src to CORE_INCLUDES so the binary-dir assets.h is resolved first. handoff(Gemini): build system fix — assets are now per-build-dir; tests 35/35 pass; build_win produces embedded shaders. Co-Authored-By: Claude Sonnet 4.6

fix(assets): regenerate assets when DEMO_STRIP_ALL toggles

2026-03-12T19:52:15Z

Write asset_packer_mode.flag at configure time so that switching between disk-load (debug) and embedded (STRIP_ALL) modes invalidates assets_data.cc. Previously a stale embedded assets_data.cc caused fopen() on WGSL bytes, silently failing all shader snippet registration (vs_main not found crash). Co-Authored-By: Claude Sonnet 4.6