demo.git - Vide-coded 64k demo system

Age	Commit message	Author
9 days	style: apply clang-format [HEAD, main]	skal

9 days	fix: code review cleanup — bugs, dead code, factorization, simplification	skal
	Bugs:
	- B1: fix dead tempo debug (prev_tempo captured after assignment)
	- B2: fix ReloadAssetsFromFile leak for disk-loaded assets; simplify DropAsset
	- B3: fix get_free_pool_slot leak (unregister synth + free data on reuse)
	- B4: volatile -> std::atomic with acquire/release in miniaudio_backend, synth
	- B5: fix unaligned reads in scene_loader (memcpy-based read_f32/read_u32)
	- B6: fix shader module + BGL + pipeline layout leaks in gpu.cc, pipeline_builder
	Dead code:
	- D1: remove unused particle_defs.h
	- D3: remove create_post_process_pipeline_simple (zero callers)
	- D4: remove empty gpu_draw()
	- D5: remove write-only Hybrid3D::initialized_
	- D6: remove legacy pending buffer path in audio.cc
	Factorization:
	- F1: Effect::run_fullscreen_pass() replaces boilerplate in 5 effects
	- F2: particle_common.wgsl snippet, #include in 3 WGSL shaders
	- F3: gpu_create_shader_module() helper, used in 3 call sites
	- F5: get_world_aabb() shared between bvh.cc and physics.cc
	- F6: samples_to_seconds() replaces 6 inline expressions
	- F7: gpu_create_linear/nearest_sampler use SamplerCache; add nearest() preset
	Simplification:
	- S9+S1: WgslSamplerType param; Scene2Effect collapsed to thin wrapper
	- S4: FFT heap allocs -> stack arrays (zero allocs on hot path)
	- S5: ObjectType::CUBE documented as legacy alias for BOX; default changed
	- S6: bind group dirty-flag in Renderer3D; remove duplicate pipeline set
	- S7: create_gpu_procedural() helper in texture_manager (~80 lines removed)
	37/37 tests passing.
	handoff(Claude): code review batch; all items verified, no regressions.
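Item B5 above replaces direct reads with memcpy-based helpers so that values at unaligned offsets are decoded safely. The following Python sketch illustrates the same idea: copy bytes out of the buffer and decode them, rather than reinterpreting an address. The helper names mirror read_f32/read_u32 from the commit, but the signatures, the little-endian layout, and the sample blob are assumptions for illustration only, not the project's code.

```python
import struct

def read_u32(buf: bytes, offset: int) -> int:
    """Decode a little-endian u32 at any byte offset (alignment is irrelevant)."""
    return struct.unpack_from("<I", buf, offset)[0]

def read_f32(buf: bytes, offset: int) -> float:
    """Decode a little-endian f32 at any byte offset."""
    return struct.unpack_from("<f", buf, offset)[0]

# Hypothetical record: a 1-byte tag followed by a u32 and an f32,
# so both values start at odd (unaligned) offsets.
blob = b"\x01" + struct.pack("<If", 42, 1.5)
value_u32 = read_u32(blob, 1)
value_f32 = read_f32(blob, 5)
```

In C++ the equivalent is a fixed-size std::memcpy into a local variable, which compilers typically lower to a single load where the target allows it.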
2026-03-27	fix(cnn_v3): remove dec0 ReLU, load FiLM MLP at runtime	skal
	Two bugs blocking training convergence: 1. dec0 ReLU before sigmoid constrained output to [0.5,1.0] — network could never produce dark pixels. Removed F.relu in train_cnn_v3.py and max(0,…) in cnn_v3_dec0.wgsl. Test vectors regenerated. 2. set_film_params() used hardcoded heuristics instead of the trained MLP. Added CNNv3FilmMlp struct + load_film_mlp() to cnn_v3_effect.h/.cc. MLP auto-loaded from ASSET_WEIGHTS_CNN_V3_FILM_MLP at construction; Linear(5→16)→ReLU→Linear(16→72) runs CPU-side each frame. 36/36 tests pass. Parity max_err=4.88e-4 unchanged. handoff(Gemini): retrain from scratch — needs ≥50 samples (currently 11). See cnn_v3/docs/HOWTO.md §2-3.
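The first bug follows from simple arithmetic: ReLU clamps its input to be non-negative, and sigmoid(0) = 0.5, so sigmoid(relu(x)) can never fall below 0.5. A minimal Python sketch of that reasoning (scalar functions only, not the training code itself):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def relu(x: float) -> float:
    return max(0.0, x)

pre_activations = [-6.0, -1.0, 0.0, 1.0, 6.0]

# With the ReLU in place, every negative pre-activation collapses to 0.5.
with_relu = [sigmoid(relu(x)) for x in pre_activations]

# Without it, the output covers the full (0, 1) range, so dark pixels are reachable.
without_relu = [sigmoid(x) for x in pre_activations]
```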
2026-03-26	feat(cnn_v3): upgrade architecture to enc_channels=[8,16]	skal
	Double encoder capacity: enc0 4→8ch, enc1 8→16ch, bottleneck 16→16ch, dec1 32→8ch, dec0 16→4ch. Total weights 2476→7828 f16 (~15.3 KB). FiLM MLP output 40→72 params (L1: 16×40→16×72). 16-ch textures split into _lo/_hi rgba32uint pairs (enc1, bottleneck). enc0 and dec1 textures changed from rgba16float to rgba32uint (8ch). GBUF_RGBA32UINT node gains CopySrc for parity test readback. - WGSL shaders: all 5 passes rewritten for new channel counts - C++ CNNv3Effect: new weight offsets/sizes, 8ch uniform structs - Web tool (shaders.js + tester.js): matching texture formats and bindings - Parity test: readback_rgba32uint_8ch helper, updated vector counts - Training scripts: default enc_channels=[8,16], updated docstrings - Docs + architecture PNG regenerated handoff(Gemini): CNN v3 [8,16] upgrade complete. All code, tests, web tool, training scripts, and docs updated. Next: run training pass.
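The totals quoted above can be cross-checked with a short calculation. This sketch assumes every layer is a 3×3 convolution with one bias per output channel, that enc0 reads the 20 packed G-buffer channels, and that FiLM supplies one gamma and one beta per channel of enc0, enc1, dec1, and dec0. None of these assumptions is stated in this commit; they are simply the ones under which the numbers line up.

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights plus biases for a k x k convolution (assumed layout)."""
    return c_in * c_out * k * k + c_out

# (input channels, output channels); dec1/dec0 inputs include skip connections.
layers = {
    "enc0": (20, 8),        # 20 input channels is an assumption
    "enc1": (8, 16),
    "bottleneck": (16, 16),
    "dec1": (32, 8),
    "dec0": (16, 4),
}

total_f16 = sum(conv_params(c_in, c_out) for c_in, c_out in layers.values())
size_kb = total_f16 * 2 / 1024          # 2 bytes per f16

# FiLM: gamma + beta for each modulated channel (assumed layer set).
film_params = 2 * sum([8, 16, 8, 4])
```

Under these assumptions total_f16 is 7828, size_kb is about 15.3, and film_params is 72, matching the figures in the commit message.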
2026-03-25	feat(cnn_v3): 3×3 dilated bottleneck + Sobel loss + FiLM warmup + architecture PNG	skal
	- Replace 1×1 pointwise bottleneck with Conv(8→8, 3×3, dilation=2): effective RF grows from ~13px to ~29px at ¼res (~+1 KB weights)
	- Add Sobel edge loss in training (--edge-loss-weight, default 0.1)
	- Add FiLM 2-phase training: freeze MLP for warmup epochs then unfreeze at lr×0.1 (--film-warmup-epochs, default 50)
	- Update weight layout: BN 72→584 f16, total 1964→2476 f16 (4952 B)
	- Cascade offsets in C++ effect, JS tool, export/gen_test_vectors scripts
	- Regenerate test_vectors.h (1238 u32); parity max_err=9.77e-04
	- Generate dark-theme U-Net+FiLM architecture PNG (gen_architecture_png.py)
	- Replace ASCII art in CNN_V3.md and HOW_TO_CNN.md with PNG embed
	handoff(Gemini): bottleneck dilation + Sobel loss + FiLM warmup landed. Next: run first real training pass (see cnn_v3/docs/HOWTO.md §3).
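The weight-layout numbers in this commit are consistent with two f16 values stored per u32 word: 2476 f16 occupy 4952 bytes, or 1238 words. The sketch below packs f16 pairs into u32 words with Python's struct module. Placing the first value in the low 16 bits is an assumption here, and the helper names are hypothetical.

```python
import struct

def pack_f16_pairs(values):
    """Pack floats as f16, two per little-endian u32 (first value in the low half)."""
    values = list(values)
    if len(values) % 2:
        values.append(0.0)  # pad to an even count
    raw = struct.pack(f"<{len(values)}e", *values)
    return list(struct.unpack(f"<{len(values) // 2}I", raw))

def unpack_f16_pair(word: int):
    """Inverse of the packing above for a single u32 word."""
    return struct.unpack("<2e", struct.pack("<I", word))

# Conv(8->8, 3x3) with one bias per output channel (bias is an assumption).
bottleneck_f16 = 8 * 8 * 3 * 3 + 8

words = pack_f16_pairs([0.0] * 2476)
total_bytes = len(words) * 4
pair = unpack_f16_pair(pack_f16_pairs([1.0, -2.0])[0])
```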
2026-03-23	fix(cnn_v3_debug): add CNNv3Effect to debug sequence for prev.r/g/b temporal feedback	skal
	timeline.seq is the canonical source; timeline.cc was wrongly hand-edited. Add CNNv3Effect + cnn_out (gbuf_albedo) node to cnn_v3_debug sequence so wire_dag() can wire GBufferEffect.cnn_output_node_ correctly.
	Also fix node_prev_tex_ NodeType: F16X8 (Rgba16Float+CopyDst) to match CNNv3Effect output format (GBUF_ALBEDO = Rgba16Float).
	Regenerated timeline.cc via: python3 tools/seq_compiler.py workspaces/main/timeline.seq
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23	feat(gbuffer): wire_dag() + find_downstream_output() for temporal feedback	skal
	- Add Effect::wire_dag() virtual (called from init_effect_nodes after full DAG built) - Add Effect::find_downstream_output() protected helper (first downstream consumer output) - GBufferEffect::wire_dag() auto-sets cnn_output_node_ via find_downstream_output, guarding against sink (external view, null texture) - GBufferEffect::post_render() null-checks src texture before CopyTextureToTexture - Tests: find_downstream_output cases + wire_dag integration in test_effect_base - Doc: SEQUENCE.md updated with wire_dag pattern, helper contract, and sink guard Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23	feat(cnn_v3): GBufferEffect temporal feedback via post_render()	skal
	- Add Effect::post_render() virtual hook, called after all effects in the sequence have rendered each frame. Default is no-op. - Sequence::render_effects() runs a second pass invoking post_render() on all DAG nodes after the render pass completes. - GBufferEffect: declare internal node_prev_tex_ (U8X4_NORM) for persistent prev-frame CNN output. post_render() copies cnn_output_node_ → node_prev_tex_ via CopyTextureToTexture. render() binds node_prev_tex_ as prev_cnn (binding 6) — zero on frame 0 (matches training convention). - Expose set_cnn_output_node(name) API; call once at setup. - Drop brittle ping-pong / input_nodes_[0] fallback. - Update doc/SEQUENCE.md: post_render() semantics, frame execution order, temporal feedback canonical pattern, node types table with G-buffer types. - Update cnn_v3/docs/HOWTO.md: temporal feedback wiring section. 36/36 tests passing. handoff(Gemini): prev.rgb temporal feedback now correct and generic. Set set_cnn_output_node("sink") (or CNN output node name) once at setup.
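The two-pass frame order described above can be modelled in a few lines. This is a toy Python model, not the C++ API: the class names, the scalar standing in for a texture, and the frame + 1 output are all hypothetical. It shows why the producer sees zero on frame 0 and the previous frame's CNN output afterwards.

```python
class Effect:
    def render(self, frame: int) -> None:
        pass

    def post_render(self, frame: int) -> None:
        pass  # default is a no-op, as in the commit description

class CnnLike(Effect):
    def __init__(self):
        self.output = 0.0

    def render(self, frame):
        self.output = float(frame + 1)  # stand-in for the CNN output texture

class GBufferLike(Effect):
    def __init__(self, cnn: CnnLike):
        self.cnn = cnn
        self.prev = 0.0        # persistent prev-frame copy, zero on frame 0
        self.seen_prev = []

    def render(self, frame):
        self.seen_prev.append(self.prev)  # what would be bound as prev_cnn

    def post_render(self, frame):
        self.prev = self.cnn.output       # stand-in for CopyTextureToTexture

def render_effects(effects, frame):
    for e in effects:       # pass 1: every effect renders
        e.render(frame)
    for e in effects:       # pass 2: hooks run after all rendering is done
        e.post_render(frame)

cnn = CnnLike()
gbuf = GBufferLike(cnn)
for frame in range(3):
    render_effects([gbuf, cnn], frame)
```

Running the copy in a separate pass means the producer never reads a texture that a later effect is still writing in the same frame.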
2026-03-23	wip(cnn_v3): shadow→dif intermediate + scene tweaks + migration plan	skal
	- gbuf_shadow.wgsl: normal bias 0.05→0.02 - gbuf_pack.wgsl: compute dif=diffuse*shadow, drop shadow from t1.z, store dif in t1.w (INTERMEDIATE — incorrect packing, see migration plan) - gbuf_deferred.wgsl: read dif from t1.w.x (matches intermediate packing) - gbuf_view.wgsl: expand to 4×6 grid, show dif.r/g/b in row 5 (INTERMEDIATE — to be reverted to 4×5 with ch18=dif) - gbuffer_effect.cc: add small hovering sphere (r=0.6) above scene; swap cube/sphere positions; both spheres pulsate - docs/GBUF_DIF_MIGRATION.md: full migration plan with checklist handoff(Claude): intermediate commit — GBUF_DIF_MIGRATION.md §Current State describes what is wrong and the full implementation checklist (5 steps).
2026-03-22	refactor(cnn_v3): simplify sphere SDF in shadow pass, remove per-frame alloc	skal
	gbuf_shadow.wgsl — dfWithID(): - Sphere: replace inv_model local-space transform with direct world-space formula (length(p - center) - radius). Exact, no matrix multiply, no floating-point error from matrix inversion that can corrupt soft-shadow penumbra over 64 march steps. - lp/scale now computed only inside the cases that need them (box/torus/plane) instead of eagerly for every object. gbuffer_effect.cc — upload_scene_data(): - Replace per-frame std::vector<GBufObjectData> heap allocation with a file-static staging buffer s_obj_staging[256]: zero alloc per frame. handoff(Gemini): sphere SDF now exact; shadow march should be cleaner.
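The world-space sphere distance named above is a one-liner. A minimal Python sketch of the formula length(p - center) - radius, with a hypothetical function name:

```python
import math

def sd_sphere_world(p, center, radius: float) -> float:
    """Signed distance from point p to a sphere, evaluated directly in world space."""
    return math.dist(p, center) - radius

outside = sd_sphere_world((3.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0)
on_surface = sd_sphere_world((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), 1.0)
inside = sd_sphere_world((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0)
```

Because no inverse model matrix is involved, the result stays an exact Euclidean distance even when the sphere is scaled.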
2026-03-22	fix(cnn_v3): shadow pass — 5 bugs fixed, labels in gbuf_view	skal
	1. Camera Y-inversion: proj.m[5] = -proj.m[5] in upload_scene_data + WGPUFrontFace_CCW on raster pipeline. 2. Shadow formula: replace shadowWithStoredDistance with 64-step IQ soft shadow (8d/t, unbounded). 3. Local→world SDF scale: d = length(obj.model[0].xyz). 4. Shadow bias: use rasterized normal from normal_mat_tex (binding 4) instead of light direction — fixes terminator self-shadow on spheres. 5. ShaderComposer: GBufViewEffect now resolves #include via ShaderComposer::Get().Compose(). Also: per-tile channel labels in gbuf_view.wgsl via debug_str. Scene simplified to 1 cube + 1 sphere for debugging (restore TODO). Scale propagation for pulsating sphere confirmed correct end-to-end. handoff(Gemini): shadow validated. Next: restore full scene in GBufferEffect::set_scene() (20 cubes + 4 spheres, 2 lights), then run training pass per cnn_v3/docs/HOWTO.md §3.
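Item 2 refers to the well-known soft-shadow march in which the running minimum of k·d/t approximates penumbra width, here with k = 8 and 64 steps. The sketch below is a plain-Python rendition of that general technique against a single sphere. The start offset, hit epsilon, and the t_max early-out are assumptions and may differ from gbuf_shadow.wgsl.

```python
import math

def sd_sphere(p, center, radius):
    return math.dist(p, center) - radius

def soft_shadow(origin, direction, sdf, k=8.0, steps=64, t_min=0.02, t_max=50.0):
    """Return 0.0 (fully shadowed) .. 1.0 (fully lit) along a normalized light ray."""
    res = 1.0
    t = t_min
    for _ in range(steps):
        p = tuple(o + d * t for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < 1e-4:
            return 0.0                    # ray hit an occluder
        res = min(res, k * dist / t)      # narrow misses darken the result
        t += dist                         # sphere-tracing step
        if t > t_max:
            break
    return max(0.0, min(1.0, res))

def scene(p):
    # Hypothetical occluder: one sphere above the origin.
    return sd_sphere(p, (0.0, 2.0, 0.0), 0.5)

blocked = soft_shadow((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), scene)
open_sky = soft_shadow((5.0, 0.0, 0.0), (0.0, 1.0, 0.0), scene)
grazing = soft_shadow((0.6, 0.0, 0.0), (0.0, 1.0, 0.0), scene)
```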
2026-03-22	docs+feat(cnn_v3): compact context, re-enable shadow in GBufDeferredEffect	skal
	- TODO/PROJECT_CONTEXT updated to reflect operational pipeline state - GBufDeferredEffect: shadow re-enabled (albedo * (ambient + diffuse * shadow)) feat_tex1 binding restored for shadow channel debugging handoff(Gemini): shadow pass live again — investigate why shadow looks broken.
2026-03-22	fix(cnn_v3): frontFace_CW for raster pipeline + sphere impostor in gbuf_raster	skal
	- Missing WGPUFrontFace_CW (Y-flipped perspective) caused back faces to render instead of front faces → cubes appeared inside-out. - Sphere objects now use ray-sphere impostor in fs_main: correct silhouette, smooth normal from hit point, and reprojected clip-space depth.
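A ray-sphere impostor rests on the standard analytic ray-sphere intersection, from which the hit point and smooth normal follow. A minimal Python sketch of that intersection, assuming a normalized ray direction; the function name is hypothetical and the clip-space depth reprojection is omitted:

```python
import math

def ray_sphere(origin, direction, center, radius):
    """Return (t, hit, normal) for the nearest hit in front of the ray, else None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(a * d for a, d in zip(oc, direction))
    c = sum(a * a for a in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None                       # ray misses the sphere
    t = -b - math.sqrt(disc)
    if t < 0.0:
        return None                       # nearest hit is behind the origin
    hit = [o + d * t for o, d in zip(origin, direction)]
    normal = [(h - cc) / radius for h, cc in zip(hit, center)]
    return t, hit, normal

hit_result = ray_sphere((0.0, 0.0, 5.0), (0.0, 0.0, -1.0), (0.0, 0.0, 0.0), 1.0)
miss_result = ray_sphere((0.0, 3.0, 5.0), (0.0, 0.0, -1.0), (0.0, 0.0, 0.0), 1.0)
```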
2026-03-22	fix(cnn_v3): resolve #include via ShaderComposer in GBufDeferredEffect	skal
	Raw WGSL was sent to WebGPU without resolving the math/normal include. Also removed unused feat_tex1 binding (shadow dropped for now).
2026-03-22	feat(cnn_v3): GBufDeferredEffect — simple deferred render (albedo * shadow)	skal
	New effect unpacks feat_tex0/feat_tex1 and outputs albedo * shadow. Replaces CNNv3Effect in cnn_v3_test sequence until training is complete. 37/37 tests passing. handoff(Gemini): GBufDeferredEffect wired in timeline; CNN v3 pipeline: GBufferEffect → GBufDeferredEffect → sink.
2026-03-22	fix(cnn_v3): call set_scene() in constructor + orbiting camera	skal
	- GBufferEffect::render() was a no-op (scene_ready_=false) because set_scene() was never called from the timeline sequence constructor. Fixed by calling set_scene() at the end of the constructor. - Camera now orbits the scene at 0.3 rad/s (R=6, y=2.5). handoff(Gemini): cnn_v3_test sequence now renders G-buffer + GBufViewEffect with animated orbiting camera.
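The orbit parameters above (0.3 rad/s, R=6, y=2.5) translate into a simple circular path. In this sketch the starting phase and the direction of rotation are assumptions, chosen so that t = 0 gives (0, 2.5, 6):

```python
import math

def orbit_camera(t: float, radius: float = 6.0, height: float = 2.5, omega: float = 0.3):
    """Camera position after t seconds on a horizontal circle around the origin."""
    angle = omega * t
    return (radius * math.sin(angle), height, radius * math.cos(angle))

start = orbit_camera(0.0)
later = orbit_camera(10.0)
period_seconds = 2.0 * math.pi / 0.3   # time for one full revolution
```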
2026-03-22	refactor(cnn_v3): GBufferEffect cleanup	skal
	Remove dead code and reduce duplication: - drop create_bilinear_sampler() (never called) - drop update_pack_bind_group() stub and pack_bind_group_ member - drop node_feat0_/node_feat1_; use output_nodes_[0/1] directly - Compose({}, src) consistently for all three pipelines - extract clear_r8_node() helper to replace two identical 10-line blocks No behavior change. 36/36 tests pass.
2026-03-22	feat(cnn_v3): GBufferEffect Pass 2 — SDF shadow raymarching	skal
	Implements gbuf_shadow.wgsl: fullscreen render pass that reads depth from Pass 1, reconstructs world-space positions, evaluates a proxy-box SDF for each object (via inv_model), computes soft shadows for both directional lights using shadowWithStoredDistance(), and writes shadow factor to the RGBA8Unorm node_shadow_ target consumed by gbuf_pack.wgsl. Bind layout: B0=GlobalUniforms, B1=ObjectsBuffer (storage-read), B2=texture_depth_2d, B3=GBufLightsUniforms. Sky fragments (depth=1.0) are output as 1.0 (fully lit). Falls back to clear(1.0) if pipeline is not ready. 36/36 tests pass. handoff(Gemini): Pass 2 done. Pass 3 (transparency) still TODO. Phase 4 (type-aware SDF) optional after visual validation.
2026-03-22	feat(cnn_v3): GBufferEffect internal scene + GBufViewEffect debug wiring	skal
	GBufferEffect: - set_scene() now owns Scene/Camera internally; no external pointers needed - 20 randomly rotating cubes (xorshift32 seed, axis-angle animation) - 4 pumping spheres (radius = base_r * (1 + audio_intensity * 0.8)) - Camera at (0,2.5,6) looking at origin; aspect updated per-frame - GBufLightsUniforms: 2 directional lights (warm key + cool fill) - object_type written to ObjectData.params.x (ready for SDF shadow) - shadow/transp nodes cleared via zero-draw render passes (placeholder) - bilinear sampler cached via create_linear_sampler() / sampler_.get() - dead placeholder textures removed GBufViewEffect: - gbuf_view.wgsl: all channels now fully grayscale (removed color tint) - seq_compiler.py: GBufViewEffect added to CLASS_TO_HEADER - timeline.seq: cnn_v3_test uses GBufViewEffect -> sink for debug view Docs: HOWTO.md §1 updated with set_scene() description + §1b implementation plan for Pass 2 SDF shadow (shader spec, bind layout, C++ additions) handoff(Gemini): GBufferEffect has internal scene, 36/36 tests green. Next: implement Pass 2 shadow (gbuf_shadow.wgsl) per §1b plan in HOWTO.md.
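Two pieces of the scene setup above are easy to sketch: the xorshift32 generator and the audio-driven sphere radius. The shift triple (13, 17, 5) is the common xorshift32 variant and is assumed here; the seed and the way random values map to cube rotations are not given in the commit, so they are left out.

```python
def xorshift32(state: int) -> int:
    """One step of the common 13/17/5 xorshift32 generator (state must be non-zero)."""
    state ^= (state << 13) & 0xFFFFFFFF
    state ^= state >> 17
    state ^= (state << 5) & 0xFFFFFFFF
    return state & 0xFFFFFFFF

def sphere_radius(base_r: float, audio_intensity: float) -> float:
    """Pumping radius as quoted in the commit: base_r * (1 + audio_intensity * 0.8)."""
    return base_r * (1.0 + audio_intensity * 0.8)

first = xorshift32(1)
quiet = sphere_radius(1.0, 0.0)
loud = sphere_radius(1.0, 1.0)
```

A fixed seed makes the layout reproducible from run to run, which matters when rendered frames are compared against reference data.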
2026-03-22	feat(cnn_v3): add G-buffer visualizer + web sample loader (Phase 7)	skal
	C++ GBufViewEffect: renders all 20 feature channels from feat_tex0/feat_tex1 in a 4×5 tiled grid. Custom BGL with WGPUTextureSampleType_Uint; bind group rebuilt per frame via wgpuRenderPipelineGetBindGroupLayout. Web tool: "Load sample directory" button — webkitdirectory picker, FULL_PACK_SHADER compute (matches gbuf_pack.wgsl packing), runFromFeat() skips photo-pack step, computePSNR() readback + comparison vs target.png side-by-side. 36/36 tests pass. Docs updated: HOWTO.md §9, README, PROJECT_CONTEXT, TODO, COMPLETED. handoff(Gemini): CNN v3 Phase 7 done. Next: run train_cnn_v3.py (see HOWTO §3).
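For reference, the PSNR comparison mentioned above normally follows the standard definition 10·log10(peak² / MSE). The sketch below applies it to flat lists of pixel values. It is a generic illustration, not the web tool's computePSNR(), whose value range and channel handling are not described here.

```python
import math

def compute_psnr(a, b, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized value lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0.0:
        return math.inf                   # identical images
    return 10.0 * math.log10(peak * peak / mse)

identical = compute_psnr([0.2, 0.4], [0.2, 0.4])
offset = compute_psnr([0.0, 0.0], [0.1, 0.1])
```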
2026-03-22	fix(cnn_v3): fix texture format mismatches in cnn_v3_test sequence	skal
	- seq_compiler: add gbuf_albedo/gbuf_rgba32uint to NODE_TYPES - timeline: declare gbuf_feat0/feat1 as gbuf_rgba32uint, route CNNv3Effect output through cnn_v3_out (gbuf_albedo) + Passthrough to sink (dec0 can't write directly to Rgba8Unorm sink) - cnn_v3_effect: fix update_bind_groups using .set() instead of .replace() causing FATAL assert on second frame - TODO: add CNN v3 "2D mode" (G-buffer-free) future task handoff(Gemini): CNNv3Effect now runs without crashes at --seek 48
2026-03-22	feat(cnn_v3): wire trained weights into CNNv3Effect + add timeline test sequence	skal
	- CNNv3Effect constructor loads ASSET_WEIGHTS_CNN_V3 via GetAsset on startup - seq_compiler.py: CLASS_TO_HEADER supports full #include paths for cnn_v3/ classes - timeline.seq: add cnn_v3_test sequence at 48s (GBufferEffect → CNNv3Effect) - test_cnn_v3_parity: zero_weights test now explicitly uploads zeros to override asset handoff(Gemini): CNNv3Effect ready; export weights to workspaces/main/weights/ and seek to 48s to test
2026-03-21	refactor(cnn_v3): code review — comments, simplifications, test fix	skal
	C++: - cnn_v3_effect.cc: fix declare_nodes comment (output node declared by caller) - cnn_v3_effect.cc: add TODO(phase-7) marker for FiLM MLP replacement WGSL: - cnn_v3_bottleneck.wgsl: consolidate _pad fields onto one line, explain why array<u32,3> is invalid in uniform address space - cnn_v3_enc0.wgsl: fix "12xu8" → "12ch u8norm" in header comment - cnn_v3_dec0.wgsl: clarify parity note (sigmoid after FiLM+ReLU, not raw conv) - cnn_v3_common.wgsl: clarify unpack_8ch pack layout (low/high 16 bits) Python: - cnn_v3_utils.py: replace PIL-based _upsample_nearest (uint8 round-trip) with pure numpy index arithmetic; rename _resize_rgb → _resize_img (handles any channel count); add comment on normal zero-pad workaround - export_cnn_v3_weights.py: add cross-ref to cnn_v3_effect.cc constants; clarify weight count comments with Conv notation Test: - test_cnn_v3_parity.cc: enc0/dec1 layer failures now return 0 (were print-only) handoff(Gemini): CNN v3 review complete, 36/36 tests passing.
2026-03-21	feat(cnn_v3): Phase 5 complete — parity validation passing (36/36 tests)	skal
	- Add test_cnn_v3_parity.cc: zero_weights + random_weights tests - Add gen_test_vectors.py: PyTorch reference implementation for enc0/enc1/bn/dec1/dec0 - Add test_vectors.h: generated C header with enc0, dec1, output expected values - Fix declare_nodes(): intermediate textures at fractional resolutions (W/2, W/4) using new NodeRegistry::default_width()/default_height() getters - Add layer-by-layer readback (enc0, dec1) for regression coverage - Final parity: enc0 max_err=1.95e-3, dec1 max_err=1.95e-3, out max_err=4.88e-4 handoff(Claude): CNN v3 parity done. Next: train_cnn_v3.py (FiLM MLP training).
2026-03-21	feat(cnn_v3): Phase 4 complete — CNNv3Effect C++ + FiLM uniform upload	skal
	- cnn_v3/src/cnn_v3_effect.{h,cc}: full Effect subclass with 5 compute passes (enc0→enc1→bottleneck→dec1→dec0), shared weights storage buffer, per-pass uniform buffers, set_film_params() API - Fixed WGSL/C++ struct alignment: vec3u has align=16, so CnnV3Params4ch is 64 bytes and CnnV3ParamsEnc1 is 96 bytes (not 48/80) - Weight offsets computed as explicit formulas (e.g. 2049+4) for clarity - Registered in CMake, shaders.h/cc, demo_effects.h, test_demo_effects.cc - 35/35 tests pass handoff(Gemini): CNN v3 Phase 5 next — parity validation (Python ref vs WGSL)
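The alignment fix above comes from WGSL layout rules: a vec3u has size 12 but alignment 16, and a struct's size is rounded up to its largest member alignment. The small layout calculator below shows the effect on a hypothetical struct. The member list is invented for illustration and is not the real CnnV3Params4ch.

```python
def round_up(align: int, n: int) -> int:
    return (n + align - 1) // align * align

# (alignment, size) in bytes for a few WGSL host-shareable types.
WGSL_TYPES = {
    "u32": (4, 4),
    "f32": (4, 4),
    "vec2u": (8, 8),
    "vec3u": (16, 12),
    "vec4u": (16, 16),
}

def struct_layout(members):
    """Return ({name: offset}, struct_size) following WGSL alignment rules."""
    offset, max_align, offsets = 0, 1, {}
    for name, ty in members:
        align, size = WGSL_TYPES[ty]
        offset = round_up(align, offset)
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    return offsets, round_up(max_align, offset)

# Hypothetical uniform struct, for illustration only.
members = [("weight_offset", "u32"), ("dims", "vec3u"), ("count", "u32")]
offsets, size = struct_layout(members)
packed_size = sum(WGSL_TYPES[ty][1] for _, ty in members)
```

Here the naive packed size is 20 bytes while the WGSL size is 32, the same kind of gap as the 48 versus 64 bytes noted in the commit. The C++ mirror struct needs explicit padding to match.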
2026-03-20	feat(cnn_v3): Phase 1 complete - GBufferEffect integrated + HOWTO playbook	skal
	- Wire GBufferEffect into demo build: assets.txt, DemoSourceLists.cmake, demo_effects.h, shaders.h/cc. ShaderComposer::Compose() applied to gbuf_raster.wgsl (resolves #include "common_uniforms"). - Add GBufferEffect construction test. 35/35 passing. - Write cnn_v3/docs/HOWTO.md: G-buffer wiring, training data prep, training plan, per-pixel validation workflow, phase status table, troubleshooting guide. - Add project hooks: remind to update HOWTO.md on cnn_v3/ edits; warn on direct str_view(*_wgsl) usage bypassing ShaderComposer. - Update PROJECT_CONTEXT.md and TODO.md: Phase 1 done, Phase 3 (WGSL U-Net shaders) is next active. handoff(Gemini): CNN v3 Phase 3 is next - WGSL enc/dec/bottleneck/FiLM shaders in cnn_v3/shaders/. See cnn_v3/docs/CNN_V3.md Architecture section and cnn_v3/docs/HOWTO.md section 3 for spec. GBufferEffect outputs feat_tex0 + feat_tex1 (rgba32uint, 20ch, 32 bytes/pixel). C++ CNNv3Effect (Phase 4) takes those as input nodes.
2026-03-20	feat(cnn_v3): G-buffer phase 1 + training infrastructure	skal
	G-buffer (Phase 1): - Add NodeTypes GBUF_ALBEDO/DEPTH32/R8/RGBA32UINT to NodeRegistry - GBufferEffect: MRT raster pass (albedo+normal_mat+depth) + pack compute - Shaders: gbuf_raster.wgsl (MRT), gbuf_pack.wgsl (feature packing, 32B/px) - Shadow/SDF passes stubbed (placeholder textures), CMake integration deferred Training infrastructure (Phase 2): - blender_export.py: headless EXR export with all G-buffer render passes - pack_blender_sample.py: EXR → per-channel PNGs (oct-normals, 1/z depth) - pack_photo_sample.py: photo → zero-filled G-buffer sample layout handoff(Gemini): G-buffer phases 3-5 remain (U-Net shaders, CNNv3Effect, parity)
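The "oct-normals" mentioned for pack_blender_sample.py refers to octahedral encoding, which maps a unit normal to two values in [-1, 1]. The sketch below shows the standard encode/decode pair in plain Python. How the script quantizes those values into PNG channels, and how it scales 1/z depth, are not stated in the commit and are not modelled here.

```python
import math

def _sign_not_zero(v: float) -> float:
    return 1.0 if v >= 0.0 else -1.0

def _fold(x: float, y: float):
    """Reflect across the octahedron diagonals (used for the lower hemisphere)."""
    return (1.0 - abs(y)) * _sign_not_zero(x), (1.0 - abs(x)) * _sign_not_zero(y)

def oct_encode(n):
    """Map a non-zero normal (x, y, z) to octahedral coordinates in [-1, 1]^2."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    x, y, z = x / s, y / s, z / s
    if z < 0.0:
        x, y = _fold(x, y)
    return (x, y)

def oct_decode(e):
    """Inverse of oct_encode; returns a unit-length normal."""
    x, y = e
    z = 1.0 - abs(x) - abs(y)
    if z < 0.0:
        x, y = _fold(x, y)
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)

samples = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (0.6, 0.0, -0.8), (-0.48, 0.6, 0.64)]
round_trips = [oct_decode(oct_encode(n)) for n in samples]
```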