demo.git - Vibe-coded 64k demo system

Age	Commit message	Author
2026-03-29	docs: consolidate and sync docs with current codebase state	skal
	- PROJECT_CONTEXT.md: fix effect count (12→18), shader count (27→37), update CNN v3 pipeline description, tighten Next Up section
	- TODO.md: fix priority numbering, restore GPU PCM synthesis as pending, streamline CNN v3 section, consolidate Future items
	- doc/SEQUENCE.md: effect count 12→18
	- cnn_v3/README.md: phases 1–7→1–9, test count 36→38, add phases 8–9
	- cnn_v3/docs/HOWTO.md: fix dataset layout blender/photos→full/simple, update test counts 36→38 throughout
	- doc/COMPLETED.md: archive FFT/timing/OLA fixes, remove false GPU PCM claim
	- src/audio/audio_engine.cc: fix step comment numbering (6→5)
	- src/audio/synth.cc: remove stale fractional_pos tempo-scaling comment
	handoff(Gemini): docs now accurate: 18 effects, 37 shaders, 38/38 tests, GPU PCM synthesis back in TODO as pending, CNN v3 dataset layout corrected.
2026-03-27	fix(cnn_v3): remove dec0 ReLU, load FiLM MLP at runtime	skal
	Two bugs blocking training convergence:
	1. dec0 ReLU before sigmoid constrained output to [0.5,1.0], so the network could never produce dark pixels. Removed F.relu in train_cnn_v3.py and max(0,…) in cnn_v3_dec0.wgsl. Test vectors regenerated.
	2. set_film_params() used hardcoded heuristics instead of the trained MLP. Added CNNv3FilmMlp struct + load_film_mlp() to cnn_v3_effect.h/.cc. MLP auto-loaded from ASSET_WEIGHTS_CNN_V3_FILM_MLP at construction; Linear(5→16)→ReLU→Linear(16→72) runs CPU-side each frame.
	36/36 tests pass. Parity max_err=4.88e-4 unchanged.
	handoff(Gemini): retrain from scratch; needs ≥50 samples (currently 11). See cnn_v3/docs/HOWTO.md §2-3.
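The first bug follows directly from the shape of the two functions: ReLU never outputs a negative value, and sigmoid(0) = 0.5, so a sigmoid fed by a ReLU cannot go below 0.5. A minimal sketch in plain Python (the function names `dec0_old` and `dec0_fixed` are illustrative, not taken from train_cnn_v3.py):

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dec0_old(pre_activation: float) -> float:
    # Old output head: ReLU clamps negatives to 0, so sigmoid never sees x < 0.
    return sigmoid(max(0.0, pre_activation))


def dec0_fixed(pre_activation: float) -> float:
    # Fixed output head: sigmoid applied directly to the conv output.
    return sigmoid(pre_activation)


samples = [-6.0, -1.0, 0.0, 1.0, 6.0]
old_out = [dec0_old(v) for v in samples]
new_out = [dec0_fixed(v) for v in samples]
print("old min:", min(old_out), "new min:", min(new_out))
```

With the ReLU in place, every negative pre-activation collapses to the same mid-gray value, which is why dark pixels were unreachable.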
2026-03-26	feat(cnn_v3): upgrade architecture to enc_channels=[8,16]	skal
	Double encoder capacity: enc0 4→8ch, enc1 8→16ch, bottleneck 16→16ch, dec1 32→8ch, dec0 16→4ch. Total weights 2476→7828 f16 (~15.3 KB). FiLM MLP output 40→72 params (L1: 16×40→16×72). 16-ch textures split into _lo/_hi rgba32uint pairs (enc1, bottleneck). enc0 and dec1 textures changed from rgba16float to rgba32uint (8ch). GBUF_RGBA32UINT node gains CopySrc for parity test readback.
	- WGSL shaders: all 5 passes rewritten for new channel counts
	- C++ CNNv3Effect: new weight offsets/sizes, 8ch uniform structs
	- Web tool (shaders.js + tester.js): matching texture formats and bindings
	- Parity test: readback_rgba32uint_8ch helper, updated vector counts
	- Training scripts: default enc_channels=[8,16], updated docstrings
	- Docs + architecture PNG regenerated
	handoff(Gemini): CNN v3 [8,16] upgrade complete. All code, tests, web tool, training scripts, and docs updated. Next: run training pass.
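The weight totals quoted above can be cross-checked with a little arithmetic. The sketch below assumes layer shapes inferred from this log (20 input channels, 3×3 kernels everywhere except an optional 1×1 bottleneck, skip connections by concatenation, one bias per output channel, FiLM on enc0/enc1/dec1/dec0). The helper names are hypothetical; the fact that the sums reproduce 1964, 2476, and 7828 suggests the assumed shapes are right, but it is not a statement about the real weight layout.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    # Weights (c_out * c_in * k * k) plus one bias per output channel.
    return c_in * c_out * k * k + c_out


def unet_weight_count(enc_channels, in_ch=20, out_ch=4, bottleneck_k=3):
    c0, c1 = enc_channels
    layers = {
        "enc0": conv_params(in_ch, c0, 3),
        "enc1": conv_params(c0, c1, 3),
        "bottleneck": conv_params(c1, c1, bottleneck_k),
        "dec1": conv_params(c1 + c1, c0, 3),      # cat(bottleneck, enc1)
        "dec0": conv_params(c0 + c0, out_ch, 3),  # cat(dec1, enc0)
    }
    return layers, sum(layers.values())


def film_param_count(enc_channels, out_ch=4) -> int:
    # One gamma and one beta per channel of enc0, enc1, dec1, dec0.
    c0, c1 = enc_channels
    return 2 * (c0 + c1 + c0 + out_ch)


layers, total = unet_weight_count((8, 16))
print(layers, total, f"{total * 2 / 1024:.1f} KB as f16")
print("FiLM params:", film_param_count((8, 16)))
```

Calling `unet_weight_count((4, 8))` gives the earlier 2476 total, and passing `bottleneck_k=1` gives the 1964 figure from before the dilated bottleneck.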
2026-03-25	feat(cnn_v3): 3×3 dilated bottleneck + Sobel loss + FiLM warmup + architecture PNG	skal
	- Replace 1×1 pointwise bottleneck with Conv(8→8, 3×3, dilation=2): effective RF grows from ~13px to ~29px at ¼res (~+1 KB weights)
	- Add Sobel edge loss in training (--edge-loss-weight, default 0.1)
	- Add FiLM 2-phase training: freeze MLP for warmup epochs then unfreeze at lr×0.1 (--film-warmup-epochs, default 50)
	- Update weight layout: BN 72→584 f16, total 1964→2476 f16 (4952 B)
	- Cascade offsets in C++ effect, JS tool, export/gen_test_vectors scripts
	- Regenerate test_vectors.h (1238 u32); parity max_err=9.77e-04
	- Generate dark-theme U-Net+FiLM architecture PNG (gen_architecture_png.py)
	- Replace ASCII art in CNN_V3.md and HOW_TO_CNN.md with PNG embed
	handoff(Gemini): bottleneck dilation + Sobel loss + FiLM warmup landed. Next: run first real training pass (see cnn_v3/docs/HOWTO.md §3).
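For readers unfamiliar with an edge loss, here is a rough single-channel sketch of the idea. It assumes an L1 pixel loss as the base term and an L1 difference of Sobel gradients as the edge term. The log does not say which norms train_cnn_v3.py uses, so both choices, and all function names, are illustrative.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def conv3x3(img, kernel):
    # Zero-padded 3x3 correlation over a 2D list of floats.
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(3):
                for dx in range(3):
                    yy, xx = y + dy - 1, x + dx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy][dx] * img[yy][xx]
            out[y][x] = acc
    return out


def mean_abs_diff(a, b):
    diffs = [abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def sobel_edge_loss(pred, target):
    # Average L1 distance between horizontal and vertical Sobel responses.
    gx = mean_abs_diff(conv3x3(pred, SOBEL_X), conv3x3(target, SOBEL_X))
    gy = mean_abs_diff(conv3x3(pred, SOBEL_Y), conv3x3(target, SOBEL_Y))
    return 0.5 * (gx + gy)


def total_loss(pred, target, edge_loss_weight=0.1):
    return mean_abs_diff(pred, target) + edge_loss_weight * sobel_edge_loss(pred, target)


step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]  # vertical edge
flat = [[0.5] * 4 for _ in range(4)]             # same mean, no edge
print(total_loss(flat, step), total_loss(step, step))
```

The flat image is penalised twice: once for per-pixel error and once for missing the edge, which is the behaviour the edge term is meant to add.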
2026-03-25	feat(cnn_v3/training): add --single-sample option + doc fixes	skal
	- train_cnn_v3.py: --single-sample <dir> implies --full-image + --batch-size 1
	- cnn_v3_utils.py: CNNv3Dataset accepts single_sample= kwarg (explicit override)
	- HOWTO.md: document --single-sample workflow, fix pack_photo_sample.py usage (--target required)
	- HOW_TO_CNN.md: fix GBufferEffect seq input (prev_cnn→source), fix binary name (demo→demo64k), add --resume to flag table, remove stale "pack without target" block
	handoff(Gemini): --single-sample <dir> added to train_cnn_v3.py; docs audited and corrected
2026-03-25	feat(cnn_v3): add infer_cnn_v3.py + rewrite cnn_test for v3 parity	skal
	- cnn_v3/training/infer_cnn_v3.py: PyTorch inference tool; simple mode (single PNG, zeroed geometry) and full mode (sample directory); supports --identity-film (γ=1 β=0) to match C++ default, --cond for FiLM MLP, --blend, --debug-hex for pixel comparison
	- tools/cnn_test.cc: full rewrite, v3 only; packs 20-channel features on CPU (training format: [0,1] oct normals, pyrdown mip), uploads to GPU, runs CNNv3Effect, reads back RGBA16Float, saves PNG; --sample-dir for full G-buffer input, --weights for .bin override, --debug-hex
	- cmake/DemoTests.cmake: add cnn_v3/src include path, drop unused offscreen_render_target.cc from cnn_test sources
	- cnn_v3/docs/HOWTO.md: new §10 documenting both tools, comparison workflow, and feature-format convention (training vs runtime)
	handoff(Gemini): cnn_test + infer_cnn_v3.py ready for parity testing. Run both with --identity-film / --debug-hex on same image to compare.
2026-03-25	feat(cnn_v3/tools): embed default weights in HTML tool; add --html export flag	skal
	- cnn_v3/tools/weights.js: new file with base64-encoded cnn_v3_weights.bin + cnn_v3_film_mlp.bin; loaded at startup so the tool works without dropping files
	- tester.js: preload() falls back to embedded weights.js constants when fetch fails; logs "Loaded embedded" vs "Preloaded" to distinguish the two paths
	- index.html: load weights.js before tester.js
	- export_cnn_v3_weights.py: add --html / --html-output flags that call update_weights_js() to regenerate weights.js after a training run
	- HOW_TO_CNN.md: update pipeline diagram, §3 export commands, §7 HTML tool section (file table, workflow, weights.js description), Appendix A
	handoff(Gemini): weights.js now the canonical source for HTML tool defaults; regenerate with `uv run export_cnn_v3_weights.py <ckpt> --output ... --html`
2026-03-23	docs: update temporal feedback docs — wire_dag auto-wiring, F16X8 format	skal
	- HOWTO.md: replace manual set_cnn_output_node() instructions with wire_dag() auto-wiring explanation; add timeline.seq snippet as canonical wiring example; document F16X8/GBUF_ALBEDO format requirement
	- CNN_V3.md: fix prev_tex format (rgba16float, not rgba8unorm); mark timeline.seq CNNv3Effect TODO as done
	- SEQUENCE.md: already updated in previous commit (wire_dag pattern, format-matching rule, sink guard)
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23	feat(cnn_v3): GBufferEffect temporal feedback via post_render()	skal
	- Add Effect::post_render() virtual hook, called after all effects in the sequence have rendered each frame. Default is no-op.
	- Sequence::render_effects() runs a second pass invoking post_render() on all DAG nodes after the render pass completes.
	- GBufferEffect: declare internal node_prev_tex_ (U8X4_NORM) for persistent prev-frame CNN output. post_render() copies cnn_output_node_ → node_prev_tex_ via CopyTextureToTexture. render() binds node_prev_tex_ as prev_cnn (binding 6), zero on frame 0 (matches training convention).
	- Expose set_cnn_output_node(name) API; call once at setup.
	- Drop brittle ping-pong / input_nodes_[0] fallback.
	- Update doc/SEQUENCE.md: post_render() semantics, frame execution order, temporal feedback canonical pattern, node types table with G-buffer types.
	- Update cnn_v3/docs/HOWTO.md: temporal feedback wiring section.
	36/36 tests passing.
	handoff(Gemini): prev.rgb temporal feedback now correct and generic. Set set_cnn_output_node("sink") (or CNN output node name) once at setup.
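The two-pass frame order described above can be modelled in a few lines. This is a language-neutral sketch in Python rather than the project's C++; the classes `GBufferStub` and `CnnStub`, the dict of named nodes, and the integer "textures" are all stand-ins invented for illustration.

```python
class Effect:
    def render(self, frame, nodes):
        raise NotImplementedError

    def post_render(self, frame, nodes):
        # Default hook is a no-op, mirroring the description above.
        pass


class GBufferStub(Effect):
    def __init__(self, cnn_output_node):
        self.cnn_output_node = cnn_output_node
        self.prev_tex = 0   # zero on frame 0
        self.seen_prev = []

    def render(self, frame, nodes):
        # The previous frame's CNN output is what gets bound as prev_cnn.
        self.seen_prev.append(self.prev_tex)
        nodes["gbuf"] = ("features", frame, self.prev_tex)

    def post_render(self, frame, nodes):
        # Runs after every effect has rendered, so the CNN output is final.
        self.prev_tex = nodes[self.cnn_output_node]


class CnnStub(Effect):
    def render(self, frame, nodes):
        nodes["sink"] = 100 + frame  # stand-in for the stylised image


def render_effects(effects, frame, nodes):
    for effect in effects:   # pass 1: render in DAG order
        effect.render(frame, nodes)
    for effect in effects:   # pass 2: post_render hooks
        effect.post_render(frame, nodes)


gbuf = GBufferStub(cnn_output_node="sink")
effects = [gbuf, CnnStub()]
nodes = {}
for frame in range(3):
    render_effects(effects, frame, nodes)
print(gbuf.seen_prev)
```

Because the copy happens in a separate pass, the G-buffer effect can sit upstream of the CNN in the DAG and still read the CNN's output from the previous frame, without any ping-pong bookkeeping.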
2026-03-23	feat(cnn_v3): shadow→dif migration complete (ch18)	skal
	Replace raw shadow (ch18) with dif = max(0,dot(normal,KEY_LIGHT))*shadow across all layers. Channel count stays 20, weight shapes unchanged.
	- gbuf_pack.wgsl: t1.z = pack4x8unorm(mip2.g, mip2.b, dif, transp); t1.w = 0u
	- gbuf_deferred.wgsl: read dif from unpack4x8unorm(t1.z).z
	- gbuf_view.wgsl: revert to 4×5 grid, ch18=dif label, ch19=trns label
	- tools/shaders.js: FULL_PACK_SHADER adds oct_decode + computes dif
	- cnn_v3_utils.py: assemble_features() computes dif on-the-fly via oct_decode
	- docs: CNN_V3.md, HOWTO.md, HOW_TO_CNN.md, GBUF_DIF_MIGRATION.md updated
	handoff(Gemini): shadow→dif migration done, ready for first training pass
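To make the packing concrete, the sketch below emulates WGSL's `pack4x8unorm` / `unpack4x8unorm` in Python (component i occupies bits 8i to 8i+7, values are clamped to [0,1] and rounded to 8 bits) and applies the dif formula from this commit. The `KEY_LIGHT` direction used here is a placeholder; the real constant is not given in the log.

```python
import math


def pack4x8unorm(x, y, z, w):
    # Component i goes into bits [8*i, 8*i+7], rounded to the nearest 8-bit step.
    word = 0
    for i, v in enumerate((x, y, z, w)):
        q = int(math.floor(0.5 + 255.0 * min(1.0, max(0.0, v))))
        word |= q << (8 * i)
    return word


def unpack4x8unorm(word):
    return tuple(((word >> (8 * i)) & 0xFF) / 255.0 for i in range(4))


KEY_LIGHT = (0.0, 1.0, 0.0)  # hypothetical direction, for illustration only


def dif_term(normal, shadow, key_light=KEY_LIGHT):
    n_dot_l = sum(n * l for n, l in zip(normal, key_light))
    return max(0.0, n_dot_l) * shadow


# Upward-facing surface, half in shadow.
dif = dif_term((0.0, 1.0, 0.0), shadow=0.5)
t1_z = pack4x8unorm(0.25, 0.75, dif, 1.0)  # (mip2.g, mip2.b, dif, transp)
dif_read = unpack4x8unorm(t1_z)[2]         # the .z component
print(hex(t1_z), dif, dif_read)
```

The round trip loses at most half an 8-bit step, which is why dif can share a word with the other u8 channels.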
2026-03-23	wip(cnn_v3): shadow→dif intermediate + scene tweaks + migration plan	skal
	- gbuf_shadow.wgsl: normal bias 0.05→0.02
	- gbuf_pack.wgsl: compute dif=diffuse*shadow, drop shadow from t1.z, store dif in t1.w (INTERMEDIATE: incorrect packing, see migration plan)
	- gbuf_deferred.wgsl: read dif from t1.w.x (matches intermediate packing)
	- gbuf_view.wgsl: expand to 4×6 grid, show dif.r/g/b in row 5 (INTERMEDIATE: to be reverted to 4×5 with ch18=dif)
	- gbuffer_effect.cc: add small hovering sphere (r=0.6) above scene; swap cube/sphere positions; both spheres pulsate
	- docs/GBUF_DIF_MIGRATION.md: full migration plan with checklist
	handoff(Claude): intermediate commit; GBUF_DIF_MIGRATION.md §Current State describes what is wrong and the full implementation checklist (5 steps).
2026-03-22	feat(cnn_v3): Phase 4 — type-aware SDF in shadow pass	skal
	dfWithID() in gbuf_shadow.wgsl now branches on obj.params.x (ObjectType) instead of using sdBox for everything:
	0=CUBE → sdBox(lp, vec3(1))
	1=SPHERE → sdSphere(lp, 1.0)
	2=PLANE → sdPlane(lp, vec3(0,1,0), obj.params.y)
	3=TORUS → sdTorus(lp, vec2(0.8, 0.2))
	36/36 tests pass.
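The dispatch table above can be sketched with the commonly used signed-distance formulas for these primitives. The bodies below are the standard textbook definitions written in Python; whether gbuf_shadow.wgsl uses exactly these forms (in particular the sign convention of the plane offset) is an assumption, and `df_for_type` is a made-up name.

```python
import math


def sd_box(p, b):
    q = [abs(p[i]) - b[i] for i in range(3)]
    outside = math.sqrt(sum(max(c, 0.0) ** 2 for c in q))
    inside = min(max(q[0], q[1], q[2]), 0.0)
    return outside + inside


def sd_sphere(p, r):
    return math.sqrt(sum(c * c for c in p)) - r


def sd_plane(p, n, h):
    # n must be normalised; h is the plane offset along n (sign is assumed).
    return sum(a * b for a, b in zip(p, n)) + h


def sd_torus(p, t):
    # t = (major radius, minor radius), ring lying in the XZ plane.
    ring = math.hypot(p[0], p[2]) - t[0]
    return math.hypot(ring, p[1]) - t[1]


CUBE, SPHERE, PLANE, TORUS = 0, 1, 2, 3


def df_for_type(object_type, lp, param_y=0.0):
    # lp is the point in the object's local space.
    if object_type == CUBE:
        return sd_box(lp, (1.0, 1.0, 1.0))
    if object_type == SPHERE:
        return sd_sphere(lp, 1.0)
    if object_type == PLANE:
        return sd_plane(lp, (0.0, 1.0, 0.0), param_y)
    if object_type == TORUS:
        return sd_torus(lp, (0.8, 0.2))
    raise ValueError(f"unknown object type {object_type}")


probe = (2.0, 0.0, 0.0)
print([df_for_type(t, probe) for t in (CUBE, SPHERE, PLANE, TORUS)])
```

Using a box proxy for a sphere overestimates the occluder near its corners, which is presumably why the type-aware branch gives tighter shadows.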
2026-03-22	feat(cnn_v3): GBufferEffect Pass 2 — SDF shadow raymarching	skal
	Implements gbuf_shadow.wgsl: fullscreen render pass that reads depth from Pass 1, reconstructs world-space positions, evaluates a proxy-box SDF for each object (via inv_model), computes soft shadows for both directional lights using shadowWithStoredDistance(), and writes shadow factor to the RGBA8Unorm node_shadow_ target consumed by gbuf_pack.wgsl.
	Bind layout: B0=GlobalUniforms, B1=ObjectsBuffer (storage-read), B2=texture_depth_2d, B3=GBufLightsUniforms.
	Sky fragments (depth=1.0) are output as 1.0 (fully lit). Falls back to clear(1.0) if pipeline is not ready.
	36/36 tests pass.
	handoff(Gemini): Pass 2 done. Pass 3 (transparency) still TODO. Phase 4 (type-aware SDF) optional after visual validation.
2026-03-22	feat(cnn_v3): GBufferEffect internal scene + GBufViewEffect debug wiring	skal
	GBufferEffect:
	- set_scene() now owns Scene/Camera internally; no external pointers needed
	- 20 randomly rotating cubes (xorshift32 seed, axis-angle animation)
	- 4 pumping spheres (radius = base_r * (1 + audio_intensity * 0.8))
	- Camera at (0,2.5,6) looking at origin; aspect updated per-frame
	- GBufLightsUniforms: 2 directional lights (warm key + cool fill)
	- object_type written to ObjectData.params.x (ready for SDF shadow)
	- shadow/transp nodes cleared via zero-draw render passes (placeholder)
	- bilinear sampler cached via create_linear_sampler() / sampler_.get()
	- dead placeholder textures removed
	GBufViewEffect:
	- gbuf_view.wgsl: all channels now fully grayscale (removed color tint)
	- seq_compiler.py: GBufViewEffect added to CLASS_TO_HEADER
	- timeline.seq: cnn_v3_test uses GBufViewEffect -> sink for debug view
	Docs: HOWTO.md §1 updated with set_scene() description + §1b implementation plan for Pass 2 SDF shadow (shader spec, bind layout, C++ additions)
	handoff(Gemini): GBufferEffect has internal scene, 36/36 tests green. Next: implement Pass 2 shadow (gbuf_shadow.wgsl) per §1b plan in HOWTO.md.
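Two of the scene details above are easy to illustrate: a deterministic xorshift32 generator and the audio-driven sphere radius. The sketch assumes the classic Marsaglia 13/17/5 shift triple; the log only says "xorshift32", so the exact variant and the `random_unit_float` mapping are assumptions.

```python
def xorshift32(state: int) -> int:
    # Classic 32-bit xorshift step; a zero state stays zero, so seed with non-zero.
    state &= 0xFFFFFFFF
    state ^= (state << 13) & 0xFFFFFFFF
    state ^= state >> 17
    state ^= (state << 5) & 0xFFFFFFFF
    return state


def random_unit_float(state: int):
    # Advance the generator and map the new state to [0, 1).
    state = xorshift32(state)
    return state, state / 0x100000000


def pumped_radius(base_r: float, audio_intensity: float) -> float:
    # Formula quoted in the commit message.
    return base_r * (1.0 + audio_intensity * 0.8)


state = 1
axis = []
for _ in range(3):  # e.g. three components of a rotation axis
    state, u = random_unit_float(state)
    axis.append(2.0 * u - 1.0)
print(axis, pumped_radius(0.5, 1.0))
```

A fixed seed keeps the cube layout identical from run to run, which matters when rendered frames are compared against reference output.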
2026-03-22	fix(cnn_v3/tools): remove unused sampler binding from FULL_PACK_SHADER	skal
	WebGPU auto-reflects the BGL from the shader; a declared-but-unused sampler binding is omitted from the layout, causing CreateBindGroup to reject it. Removed binding 6 (sampler) entirely — all reads use textureLoad(). Renumbered f0/f1 from 7/8 to 6/7 to match.
2026-03-22	feat(cnn_v3): add G-buffer visualizer + web sample loader (Phase 7)	skal
	C++ GBufViewEffect: renders all 20 feature channels from feat_tex0/feat_tex1 in a 4×5 tiled grid. Custom BGL with WGPUTextureSampleType_Uint; bind group rebuilt per frame via wgpuRenderPipelineGetBindGroupLayout.
	Web tool: "Load sample directory" button: webkitdirectory picker, FULL_PACK_SHADER compute (matches gbuf_pack.wgsl packing), runFromFeat() skips photo-pack step, computePSNR() readback + comparison vs target.png side-by-side.
	36/36 tests pass. Docs updated: HOWTO.md §9, README, PROJECT_CONTEXT, TODO, COMPLETED.
	handoff(Gemini): CNN v3 Phase 7 done. Next: run train_cnn_v3.py (see HOWTO §3).
2026-03-22	feat(cnn_v3): add weight assets to assets.txt, update HOW_TO_CNN export docs	skal
	- Add WEIGHTS_CNN_V3 and WEIGHTS_CNN_V3_FILM_MLP to workspaces/main/assets.txt
	- Add opencv-python and pillow to export_cnn_v3_weights.py uv inline deps
	- Update HOW_TO_CNN.md §3 export target → workspaces/main/weights/
	- Update HOW_TO_CNN.md §4 weight loading → SafeGetAsset (asset system)
	handoff(Gemini): cnn_v3 weight assets registered; export and C++ load path documented
2026-03-22	docs(cnn_v3): add uv inline deps to train_cnn_v3.py + HOW_TO_CNN note	skal
	handoff(Gemini): train_cnn_v3.py now has uv script metadata block (torch, torchvision, numpy, pillow, opencv-python). HOW_TO_CNN §2 Prerequisites updated with uv quick-start alternative.
2026-03-22	docs(cnn_v3): add Windows 10 + CUDA training section to HOW_TO_CNN §2	skal

2026-03-22	fix(cnn_v3): resize target to albedo dims when sizes differ	skal
	target.png can have a different resolution than albedo.png in simple samples; patch slicing into the smaller target produced 0×0 tensors, crashing torch.stack in the DataLoader collate. handoff(Gemini): target resized in _load_sample (LANCZOS) + note in HOW_TO_CNN §1c.
2026-03-22	docs(cnn_v3): add full Old House example to HOW_TO_CNN §1b	skal
	handoff(Gemini): added render + batch-pack example commands at end of section 1b
2026-03-22	fix(cnn_v3): native OPEN_EXR_MULTILAYER + quiet render + flexible channel names	skal
	blender_export.py:
	- Replace broken compositor FileOutput approach with native OPEN_EXR_MULTILAYER render output; all enabled passes included automatically, no socket wiring needed
	- Suppress Fra:/Mem: render spam via os.dup2 fd redirect; per-frame progress printed to stderr via render_post handler
	pack_blender_sample.py:
	- get_pass_r: try .R/.X/.Y/.Z/'' suffixes + aliases param for Depth→Z fallback
	- combined_rgba loaded once via ("Combined","Image") loop; shared by transp+target
	- Remove unused sys import
	HOW_TO_CNN.md: update channel table to native EXR naming (Depth.Z, IndexOB.X, Shadow.X), fix example command, note Shadow defaults to 255 when absent
	handoff(Gemini): blender pipeline now produces correct multilayer EXR with all G-buffer passes; pack script handles native channel naming
2026-03-22	docs(cnn_v3): blender4 alias + Blender 4.5 LTS requirement for training data	skal

2026-03-22	docs(cnn_v3): clarify --output is a base dir, not a frame_### pattern	skal

2026-03-22	docs(cnn_v3): update HOW_TO_CNN for Blender 5.x compatibility	skal

2026-03-22	fix(cnn_v3): blender_export --view-layer flag + fallback to layer[0]	skal
	Fixes KeyError when blend file uses a non-default view layer name. Adds --view-layer NAME arg; pass '?' to list available layers. Defaults to index 0 with a clear error if the name is not found. handoff(Gemini): blender_export.py view layer selection now robust
2026-03-22	feat(cnn_v3): gen_sample tool + 7 simple training samples	skal
	- pack_photo_sample.py: --target now required (no albedo fallback)
	- gen_sample.py: bash wrapper with positional args (input target output_dir)
	- input/photo7.jpg: copy of photo2 (second style target)
	- target_1: photo2_1_out→photo2_out, photo2_2_out→photo7_out
	- dataset/simple/sample_001..007: 7 packed photo/target pairs
	handoff(Gemini): training data ready; next step is train_cnn_v3.py run
2026-03-21	feat(cnn_v3): HTML WebGPU tool (index.html + shaders.js + tester.js)	skal
	3-file tool, 939 lines total. Implements full U-Net+FiLM inference in the browser: Pack→Enc0→Enc1→Bottleneck→Dec1→Dec0 compute passes, layer visualisation (Feat/Enc0/Enc1/BN/Dec1/Output), FiLM MLP sliders, drag-drop weights + image/video, Save PNG, diff/blend view modes. HOW_TO_CNN.md §7 updated to reflect tool is implemented.
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21	feat(cnn_v3): export script + HOW_TO_CNN.md playbook	skal
	- export_cnn_v3_weights.py: .pth → cnn_v3_weights.bin (f16 packed u32) + cnn_v3_film_mlp.bin (f32)
	- HOW_TO_CNN.md: full pipeline playbook (data collection, training, export, C++ wiring, parity, HTML tool)
	- TODO.md: mark export script done
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
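"f16 packed u32" means two half-precision values share one 32-bit word. A minimal sketch using only the standard library follows. It assumes little-endian order with the first value in the low 16 bits (the convention WGSL's `unpack2x16float` expects), and zero-pads an odd-length list; both are assumptions about the real export script, and the function names are hypothetical.

```python
import struct


def pack_f16_pairs(values):
    # Convert floats to IEEE half precision and pair them up into u32 words.
    vals = list(values)
    if len(vals) % 2:
        vals.append(0.0)  # pad so every word holds two halves
    raw = struct.pack(f"<{len(vals)}e", *vals)
    return list(struct.unpack(f"<{len(vals) // 2}I", raw))


def unpack_f16_pairs(words, count):
    # Inverse of pack_f16_pairs; `count` drops any padding value.
    raw = struct.pack(f"<{len(words)}I", *words)
    return list(struct.unpack(f"<{len(words) * 2}e", raw))[:count]


weights = [1.0, -2.0, 0.5]
words = pack_f16_pairs(weights)
print([hex(w) for w in words], unpack_f16_pairs(words, len(weights)))
```

Values that are not exactly representable in f16 come back rounded, so a parity test has to compare against a tolerance rather than for equality.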
2026-03-21	feat(cnn_v3): Phase 6 — training script (train_cnn_v3.py + cnn_v3_utils.py)	skal
	- train_cnn_v3.py: CNNv3 U-Net+FiLM model, training loop, CLI
	- cnn_v3_utils.py: image I/O, pyrdown, depth_gradient, assemble_features, apply_channel_dropout, detect_salient_points, CNNv3Dataset
	- Patch-based training (default 64×64) with salient-point extraction (harris/shi-tomasi/fast/gradient/random detectors, pre-cached at init)
	- Channel dropout for geometric/context/temporal channels
	- Random FiLM conditioning per sample for joint MLP+U-Net training
	- docs: HOWTO.md §3 updated with commands and flag reference
	- TODO.md: Phase 6 marked done, export script noted as next step
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21	docs(cnn_v3): update CNN_V3.md + HOWTO.md to reflect Phases 1-5 complete	skal
	- CNN_V3.md: status line, architecture channel counts (8/16→4/8), FiLM MLP output count (96→40 params), size budget table (real implemented values)
	- HOWTO.md: Phase status table (5→done, add phase 6 training TODO), sections 3-5 rewritten to reflect what exists vs what is still planned
2026-03-21	feat(cnn_v3): Phase 4 complete — CNNv3Effect C++ + FiLM uniform upload	skal
	- cnn_v3/src/cnn_v3_effect.{h,cc}: full Effect subclass with 5 compute passes (enc0→enc1→bottleneck→dec1→dec0), shared weights storage buffer, per-pass uniform buffers, set_film_params() API
	- Fixed WGSL/C++ struct alignment: vec3u has align=16, so CnnV3Params4ch is 64 bytes and CnnV3ParamsEnc1 is 96 bytes (not 48/80)
	- Weight offsets computed as explicit formulas (e.g. 2049+4) for clarity
	- Registered in CMake, shaders.h/cc, demo_effects.h, test_demo_effects.cc
	- 35/35 tests pass
	handoff(Gemini): CNN v3 Phase 5 next: parity validation (Python ref vs WGSL)
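The alignment fix above follows from WGSL's layout rules: each member starts at a multiple of its alignment, and the struct size is rounded up to the largest member alignment. The sketch below applies those rules to a small made-up struct. The field list is purely illustrative and is not the layout of CnnV3Params4ch or CnnV3ParamsEnc1.

```python
# (alignment, size) in bytes for a few WGSL host-shareable types.
WGSL_TYPES = {
    "u32": (4, 4),
    "f32": (4, 4),
    "vec2u": (8, 8),
    "vec3u": (16, 12),  # 12 bytes of data, but aligned to 16
    "vec4f": (16, 16),
}


def round_up(align: int, n: int) -> int:
    return (n + align - 1) // align * align


def wgsl_struct_layout(fields):
    """Return ({name: offset}, struct_size) for a list of (name, type)."""
    offset, struct_align, offsets = 0, 1, {}
    for name, ty in fields:
        align, size = WGSL_TYPES[ty]
        offset = round_up(align, offset)
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    return offsets, round_up(struct_align, offset)


# Hypothetical parameter block, not taken from the project.
fields = [("weight_offset", "u32"), ("dims", "vec3u"), ("flags", "u32")]
offsets, size = wgsl_struct_layout(fields)
naive = sum(WGSL_TYPES[ty][1] for _, ty in fields)
print(offsets, size, "vs naive packed size", naive)
```

A C++ mirror struct therefore needs explicit padding (or `alignas(16)`) before each vec3-like member, otherwise the uniform upload is silently shifted.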
2026-03-21	feat(cnn_v3): Phase 3 complete — WGSL U-Net inference shaders	skal
	5 compute shaders + cnn_v3/common snippet:
	enc0: Conv(20→4,3×3) + FiLM + ReLU, full-res
	enc1: AvgPool + Conv(4→8,3×3) + FiLM + ReLU, half-res
	bottleneck: AvgPool + Conv(8→8,1×1) + ReLU, quarter-res
	dec1: NearestUp + cat(enc1) + Conv(16→4) + FiLM, half-res
	dec0: NearestUp + cat(enc0) + Conv(8→4) + FiLM + Sigmoid, full-res
	Parity rules: zero-pad conv, AvgPool down, NearestUp, FiLM after conv+bias, skip=concat, OIHW weights+bias layout. Matches PyTorch train_cnn_v3.py forward() exactly.
	Registered in workspaces/main/assets.txt + src/effects/shaders.cc. Weight layout + Params struct documented in cnn_v3/docs/HOWTO.md §7.
	Next: Phase 4 (C++ CNNv3Effect + FiLM uniform upload).
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
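Three of the parity rules are simple enough to show on toy data: 2×2 average pooling, nearest-neighbour upsampling, and FiLM as a per-channel scale and shift (γ·x + β) applied after the convolution. This is a plain-Python sketch on nested lists with invented helper names, not the shader or PyTorch code.

```python
def avg_pool_2x2(img):
    # Halve resolution by averaging each 2x2 block.
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def nearest_up_2x(img):
    # Double resolution by repeating each pixel into a 2x2 block.
    return [
        [img[y // 2][x // 2] for x in range(2 * len(img[0]))]
        for y in range(2 * len(img))
    ]


def film(channels, gamma, beta):
    # Feature-wise linear modulation: one (gamma, beta) pair per channel.
    return [
        [[g * v + b for v in row] for row in ch]
        for ch, g, b in zip(channels, gamma, beta)
    ]


plane = [[1.0, 3.0], [5.0, 7.0]]
pooled = avg_pool_2x2(plane)
restored = nearest_up_2x(pooled)
modulated = film([plane, plane], gamma=[1.0, 2.0], beta=[0.0, -1.0])
print(pooled, restored, modulated)
```

With γ=1 and β=0 the FiLM step is the identity, which matches the --identity-film convention mentioned elsewhere in this log.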
2026-03-20	feat(cnn_v3): Phase 1 complete - GBufferEffect integrated + HOWTO playbook	skal
	- Wire GBufferEffect into demo build: assets.txt, DemoSourceLists.cmake, demo_effects.h, shaders.h/cc. ShaderComposer::Compose() applied to gbuf_raster.wgsl (resolves #include "common_uniforms").
	- Add GBufferEffect construction test. 35/35 passing.
	- Write cnn_v3/docs/HOWTO.md: G-buffer wiring, training data prep, training plan, per-pixel validation workflow, phase status table, troubleshooting guide.
	- Add project hooks: remind to update HOWTO.md on cnn_v3/ edits; warn on direct str_view(*_wgsl) usage bypassing ShaderComposer.
	- Update PROJECT_CONTEXT.md and TODO.md: Phase 1 done, Phase 3 (WGSL U-Net shaders) is next active.
	handoff(Gemini): CNN v3 Phase 3 is next - WGSL enc/dec/bottleneck/FiLM shaders in cnn_v3/shaders/. See cnn_v3/docs/CNN_V3.md Architecture section and cnn_v3/docs/HOWTO.md section 3 for spec. GBufferEffect outputs feat_tex0 + feat_tex1 (rgba32uint, 20ch, 32 bytes/pixel). C++ CNNv3Effect (Phase 4) takes those as input nodes.
2026-03-19	docs(cnn_v3): full design doc — U-Net + FiLM architecture plan	skal
	- CNN_V3.md: complete design document
	  - U-Net enc_channels=[4,8], ~5 KB f16 weights
	  - FiLM conditioning (5D → γ/β per level, CPU-side MLP)
	  - 20-channel feature buffer, 32 bytes/pixel: two rgba32uint textures
	    - feat_tex0: albedo.rgb, normal.xy, depth, depth_grad.xy (f16)
	    - feat_tex1: mat_id, prev.rgb, mip1.rgb, mip2.rgb, shadow, transp (u8)
	  - 4-pass G-buffer: raster MRT + SDF compute + lighting + pack
	  - Per-pixel parity framework: PyTorch / HTML WebGPU / C++ WebGPU (≤1/255)
	  - Training pipelines: Blender full G-buffer + photo-only (channel dropout)
	  - train_cnn_v3_full.sh spec (modelled on v2 script)
	  - HTML tool adaptation plan from cnn_v2/tools/cnn_v2_test/index.html
	  - Binary format v3 header spec
	  - 8-phase ordered implementation checklist
	- TODO.md: add CNN v3 U-Net+FiLM future task with phases
	- cnn_v3/README.md: update status to design phase
	handoff(Gemini): CNN v3 design complete. Phase 0 (stub G-buffer) unblocks all other phases: one compute shader writing feat_tex0+feat_tex1 with synthetic values from the current framebuffer. See cnn_v3/docs/CNN_V3.md Implementation Checklist.
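The "20 channels, 32 bytes/pixel" figure can be checked from the channel lists in this commit. The sketch below simply tallies them. It assumes 2 bytes per f16 channel, 1 byte per u8 channel, and 16 bytes per rgba32uint texel, and treats the leftover bytes in feat_tex1 as unused padding.

```python
# Channel groups as listed in the design summary: name -> channel count.
FEAT_TEX0_F16 = {"albedo.rgb": 3, "normal.xy": 2, "depth": 1, "depth_grad.xy": 2}
FEAT_TEX1_U8 = {"mat_id": 1, "prev.rgb": 3, "mip1.rgb": 3, "mip2.rgb": 3,
                "shadow": 1, "transp": 1}
RGBA32UINT_BYTES = 16  # four 32-bit words per texel

tex0_channels = sum(FEAT_TEX0_F16.values())
tex1_channels = sum(FEAT_TEX1_U8.values())
tex0_bytes = tex0_channels * 2   # f16 = 2 bytes each
tex1_bytes = tex1_channels * 1   # u8 = 1 byte each
spare_bytes = RGBA32UINT_BYTES - tex1_bytes

total_channels = tex0_channels + tex1_channels
bytes_per_pixel = 2 * RGBA32UINT_BYTES

print(total_channels, "channels,", bytes_per_pixel, "bytes/pixel,",
      spare_bytes, "spare bytes in feat_tex1")
```

feat_tex0 fills its texel exactly, while feat_tex1 leaves one 32-bit word free.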