<feed xmlns='http://www.w3.org/2005/Atom'>
<title>demo.git/cnn_v3/docs, branch main</title>
<subtitle>Vibe-coded 64k demo system</subtitle>
<id>https://git.taar-o.com/demo.git/atom?h=main</id>
<link rel='self' href='https://git.taar-o.com/demo.git/atom?h=main'/>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/'/>
<updated>2026-03-29T08:15:38Z</updated>
<entry>
<title>docs: consolidate and sync docs with current codebase state</title>
<updated>2026-03-29T08:15:38Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-29T08:15:38Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=e22256e374694fd92cc55ba198d3f7b1911713fe'/>
<id>urn:sha1:e22256e374694fd92cc55ba198d3f7b1911713fe</id>
<content type='text'>
- PROJECT_CONTEXT.md: fix effect count (12→18), shader count (27→37),
  update CNN v3 pipeline description, tighten Next Up section
- TODO.md: fix priority numbering, restore GPU PCM synthesis as pending,
  streamline CNN v3 section, consolidate Future items
- doc/SEQUENCE.md: effect count 12→18
- cnn_v3/README.md: phases 1–7→1–9, test count 36→38, add phases 8–9
- cnn_v3/docs/HOWTO.md: fix dataset layout blender/photos→full/simple,
  update test counts 36→38 throughout
- doc/COMPLETED.md: archive FFT/timing/OLA fixes, remove false GPU PCM claim
- src/audio/audio_engine.cc: fix step comment numbering (6→5)
- src/audio/synth.cc: remove stale fractional_pos tempo-scaling comment

handoff(Gemini): docs now accurate — 18 effects, 37 shaders, 38/38 tests,
GPU PCM synthesis back in TODO as pending, CNN v3 dataset layout corrected.
</content>
</entry>
<entry>
<title>fix(cnn_v3): remove dec0 ReLU, load FiLM MLP at runtime</title>
<updated>2026-03-27T06:59:00Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-27T06:59:00Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=fb13e67acbc7d7dd2974a456fcb134966c47cee0'/>
<id>urn:sha1:fb13e67acbc7d7dd2974a456fcb134966c47cee0</id>
<content type='text'>
Two bugs blocking training convergence:

1. dec0 ReLU before sigmoid constrained output to [0.5,1.0] — network
   could never produce dark pixels. Removed F.relu in train_cnn_v3.py
   and max(0,…) in cnn_v3_dec0.wgsl. Test vectors regenerated.

2. set_film_params() used hardcoded heuristics instead of the trained MLP.
   Added CNNv3FilmMlp struct + load_film_mlp() to cnn_v3_effect.h/.cc.
   MLP auto-loaded from ASSET_WEIGHTS_CNN_V3_FILM_MLP at construction;
   Linear(5→16)→ReLU→Linear(16→72) runs CPU-side each frame.

36/36 tests pass. Parity max_err=4.88e-4 unchanged.

handoff(Gemini): retrain from scratch — needs ≥50 samples (currently 11).
See cnn_v3/docs/HOWTO.md §2-3.
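
Illustration for bug 1 above: a minimal standard-library Python sketch (not the code from train_cnn_v3.py) of why a ReLU placed before a sigmoid cannot produce values below 0.5. The helper names and sample inputs are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Clamp negative inputs to zero."""
    return max(0.0, x)

# Hypothetical pre-activation values, from strongly negative to strongly positive.
xs = [-8.0, -2.0, 0.0, 2.0, 8.0]

with_relu = [sigmoid(relu(x)) for x in xs]  # old dec0 behaviour
without_relu = [sigmoid(x) for x in xs]     # after removing the ReLU

# relu() never returns less than 0 and sigmoid(0) = 0.5, so 0.5 is the floor.
print("min with ReLU:   ", min(with_relu))
# Without the clamp, negative inputs map below 0.5, so dark pixels are reachable.
print("min without ReLU:", min(without_relu))
```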
</content>
</entry>
<entry>
<title>feat(cnn_v3): upgrade architecture to enc_channels=[8,16]</title>
<updated>2026-03-26T06:03:01Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-26T06:03:01Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=8f14bdd66cb002b2f89265b2a578ad93249089c9'/>
<id>urn:sha1:8f14bdd66cb002b2f89265b2a578ad93249089c9</id>
<content type='text'>
Double encoder capacity: enc0 4→8ch, enc1 8→16ch, bottleneck 16→16ch,
dec1 32→8ch, dec0 16→4ch. Total weights 2476→7828 f16 (~15.3 KB).
FiLM MLP output 40→72 params (L1: 16×40→16×72).

16-ch textures split into _lo/_hi rgba32uint pairs (enc1, bottleneck).
enc0 and dec1 textures changed from rgba16float to rgba32uint (8ch).
GBUF_RGBA32UINT node gains CopySrc for parity test readback.

- WGSL shaders: all 5 passes rewritten for new channel counts
- C++ CNNv3Effect: new weight offsets/sizes, 8ch uniform structs
- Web tool (shaders.js + tester.js): matching texture formats and bindings
- Parity test: readback_rgba32uint_8ch helper, updated vector counts
- Training scripts: default enc_channels=[8,16], updated docstrings
- Docs + architecture PNG regenerated

handoff(Gemini): CNN v3 [8,16] upgrade complete. All code, tests, web
tool, training scripts, and docs updated. Next: run training pass.
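
Sanity check of the size figures above, as a small Python sketch. It assumes 2 bytes per f16 weight and reads "KB" as 1024 bytes; the function name is hypothetical.

```python
F16_BYTES = 2  # one IEEE half-precision float

def f16_size(count):
    """Return (bytes, KiB) for `count` f16 weights."""
    n_bytes = count * F16_BYTES
    return n_bytes, n_bytes / 1024.0

old_bytes, old_kib = f16_size(2476)  # total before this commit
new_bytes, new_kib = f16_size(7828)  # total after this commit

print("old:", old_bytes, "B =", round(old_kib, 1), "KiB")
print("new:", new_bytes, "B =", round(new_kib, 1), "KiB")

# FiLM MLP second-layer weight matrix, biases not counted: 16x40 -> 16x72.
film_l1_old = 16 * 40
film_l1_new = 16 * 72
print("FiLM L1 weights:", film_l1_old, "->", film_l1_new)
```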
</content>
</entry>
<entry>
<title>feat(cnn_v3): 3×3 dilated bottleneck + Sobel loss + FiLM warmup + architecture PNG</title>
<updated>2026-03-25T09:05:42Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-25T09:05:42Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=ce6e5b99f26e4e7c69a3cacf360bd0d492de928c'/>
<id>urn:sha1:ce6e5b99f26e4e7c69a3cacf360bd0d492de928c</id>
<content type='text'>
- Replace 1×1 pointwise bottleneck with Conv(8→8, 3×3, dilation=2):
  effective RF grows from ~13px to ~29px at ¼res (~+1 KB weights)
- Add Sobel edge loss in training (--edge-loss-weight, default 0.1)
- Add FiLM 2-phase training: freeze MLP for warmup epochs then
  unfreeze at lr×0.1 (--film-warmup-epochs, default 50)
- Update weight layout: BN 72→584 f16, total 1964→2476 f16 (4952 B)
- Cascade offsets in C++ effect, JS tool, export/gen_test_vectors scripts
- Regenerate test_vectors.h (1238 u32); parity max_err=9.77e-04
- Generate dark-theme U-Net+FiLM architecture PNG (gen_architecture_png.py)
- Replace ASCII art in CNN_V3.md and HOW_TO_CNN.md with PNG embed

handoff(Gemini): bottleneck dilation + Sobel loss + FiLM warmup landed.
Next: run first real training pass (see cnn_v3/docs/HOWTO.md §3).
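
The 2-phase FiLM schedule above can be sketched as follows. This is a hypothetical helper, not the logic in train_cnn_v3.py: it assumes 0-based epochs and that unfreezing happens exactly at epoch == warmup_epochs.

```python
def film_mlp_lr(epoch, base_lr, warmup_epochs=50, unfreeze_scale=0.1):
    """Learning rate for the FiLM MLP at a given 0-based epoch.

    Returns None while the MLP is frozen (warmup phase), then
    base_lr * unfreeze_scale once it is unfrozen.
    """
    if epoch in range(warmup_epochs):
        return None  # frozen: parameters receive no updates
    return base_lr * unfreeze_scale

# Hypothetical base learning rate, for illustration only.
BASE_LR = 1e-3
for epoch in (0, 49, 50, 100):
    print(epoch, film_mlp_lr(epoch, BASE_LR))
```

In a PyTorch training loop, the frozen phase would typically correspond to requires_grad=False on the MLP parameters, and the second phase to a separate optimizer parameter group with the reduced rate.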
</content>
</entry>
<entry>
<title>feat(cnn_v3/training): add --single-sample option + doc fixes</title>
<updated>2026-03-25T07:27:39Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-25T07:27:39Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=3e4fece8fce11b368b4c7bab284242bf18e6a0b1'/>
<id>urn:sha1:3e4fece8fce11b368b4c7bab284242bf18e6a0b1</id>
<content type='text'>
- train_cnn_v3.py: --single-sample &lt;dir&gt; implies --full-image + --batch-size 1
- cnn_v3_utils.py: CNNv3Dataset accepts single_sample= kwarg (explicit override)
- HOWTO.md: document --single-sample workflow, fix pack_photo_sample.py usage (--target required)
- HOW_TO_CNN.md: fix GBufferEffect seq input (prev_cnn→source), fix binary name (demo→demo64k), add --resume to flag table, remove stale "pack without target" block

handoff(Gemini): --single-sample &lt;dir&gt; added to train_cnn_v3.py; docs audited and corrected
</content>
</entry>
<entry>
<title>feat(cnn_v3): add infer_cnn_v3.py + rewrite cnn_test for v3 parity</title>
<updated>2026-03-25T07:07:53Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-25T07:07:53Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=64095c683f15e8bd7c19d32041fcc81b1bd6c214'/>
<id>urn:sha1:64095c683f15e8bd7c19d32041fcc81b1bd6c214</id>
<content type='text'>
- cnn_v3/training/infer_cnn_v3.py: PyTorch inference tool; simple mode
  (single PNG, zeroed geometry) and full mode (sample directory); supports
  --identity-film (γ=1 β=0) to match C++ default, --cond for FiLM MLP,
  --blend, --debug-hex for pixel comparison
- tools/cnn_test.cc: full rewrite, v3 only; packs 20-channel features
  on CPU (training format: [0,1] oct normals, pyrdown mip), uploads to
  GPU, runs CNNv3Effect, reads back RGBA16Float, saves PNG; --sample-dir
  for full G-buffer input, --weights for .bin override, --debug-hex
- cmake/DemoTests.cmake: add cnn_v3/src include path, drop unused
  offscreen_render_target.cc from cnn_test sources
- cnn_v3/docs/HOWTO.md: new §10 documenting both tools, comparison
  workflow, and feature-format convention (training vs runtime)

handoff(Gemini): cnn_test + infer_cnn_v3.py ready for parity testing.
Run both with --identity-film / --debug-hex on same image to compare.
</content>
</entry>
<entry>
<title>feat(cnn_v3/tools): embed default weights in HTML tool; add --html export flag</title>
<updated>2026-03-25T05:25:53Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-25T05:25:53Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=a71c95c8caf7e570c3f484ce1a53b7acb5ef2006'/>
<id>urn:sha1:a71c95c8caf7e570c3f484ce1a53b7acb5ef2006</id>
<content type='text'>
- cnn_v3/tools/weights.js: new file — base64-encoded cnn_v3_weights.bin +
  cnn_v3_film_mlp.bin; loaded at startup so the tool works without dropping files
- tester.js: preload() falls back to embedded weights.js constants when fetch
  fails; logs "Loaded embedded" vs "Preloaded" to distinguish the two paths
- index.html: load weights.js before tester.js
- export_cnn_v3_weights.py: add --html / --html-output flags that call
  update_weights_js() to regenerate weights.js after a training run
- HOW_TO_CNN.md: update pipeline diagram, §3 export commands, §7 HTML tool
  section (file table, workflow, weights.js description), Appendix A

handoff(Gemini): weights.js now the canonical source for HTML tool defaults;
regenerate with `uv run export_cnn_v3_weights.py &lt;ckpt&gt; --output ... --html`
</content>
</entry>
<entry>
<title>docs: update temporal feedback docs — wire_dag auto-wiring, F16X8 format</title>
<updated>2026-03-23T07:07:31Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-23T07:07:31Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=193fdc276a43ad94a961aa9214be27cfa879d225'/>
<id>urn:sha1:193fdc276a43ad94a961aa9214be27cfa879d225</id>
<content type='text'>
- HOWTO.md: replace manual set_cnn_output_node() instructions with
  wire_dag() auto-wiring explanation; add timeline.seq snippet as canonical
  wiring example; document F16X8/GBUF_ALBEDO format requirement
- CNN_V3.md: fix prev_tex format (rgba16float, not rgba8unorm);
  mark timeline.seq CNNv3Effect TODO as done
- SEQUENCE.md: already updated in previous commit (wire_dag pattern,
  format-matching rule, sink guard)

Co-Authored-By: Claude Sonnet 4.6 &lt;noreply@anthropic.com&gt;
</content>
</entry>
<entry>
<title>feat(cnn_v3): GBufferEffect temporal feedback via post_render()</title>
<updated>2026-03-23T06:31:14Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-23T06:31:14Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=1e3813355e37f903314ec2069ff788c6f69becfd'/>
<id>urn:sha1:1e3813355e37f903314ec2069ff788c6f69becfd</id>
<content type='text'>
- Add Effect::post_render() virtual hook, called after all effects in
  the sequence have rendered each frame. Default is no-op.
- Sequence::render_effects() runs a second pass invoking post_render()
  on all DAG nodes after the render pass completes.
- GBufferEffect: declare internal node_prev_tex_ (U8X4_NORM) for
  persistent prev-frame CNN output. post_render() copies cnn_output_node_
  → node_prev_tex_ via CopyTextureToTexture. render() binds node_prev_tex_
  as prev_cnn (binding 6) — zero on frame 0 (matches training convention).
- Expose set_cnn_output_node(name) API; call once at setup.
- Drop brittle ping-pong / input_nodes_[0] fallback.
- Update doc/SEQUENCE.md: post_render() semantics, frame execution order,
  temporal feedback canonical pattern, node types table with G-buffer types.
- Update cnn_v3/docs/HOWTO.md: temporal feedback wiring section.

36/36 tests passing.

handoff(Gemini): prev.rgb temporal feedback now correct and generic.
Call set_cnn_output_node("sink") (or pass the CNN output node's name) once at setup.
</content>
</entry>
<entry>
<title>feat(cnn_v3): shadow→dif migration complete (ch18)</title>
<updated>2026-03-22T23:43:20Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-22T23:43:20Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=13cf1438caa56b34529d4031ddf73d38286b70e5'/>
<id>urn:sha1:13cf1438caa56b34529d4031ddf73d38286b70e5</id>
<content type='text'>
Replace raw shadow (ch18) with dif = max(0,dot(normal,KEY_LIGHT))*shadow
across all layers. Channel count stays 20, weight shapes unchanged.

- gbuf_pack.wgsl: t1.z = pack4x8unorm(mip2.g, mip2.b, dif, transp); t1.w = 0u
- gbuf_deferred.wgsl: read dif from unpack4x8unorm(t1.z).z
- gbuf_view.wgsl: revert to 4×5 grid, ch18=dif label, ch19=trns label
- tools/shaders.js: FULL_PACK_SHADER adds oct_decode + computes dif
- cnn_v3_utils.py: assemble_features() computes dif on-the-fly via oct_decode
- docs: CNN_V3.md, HOWTO.md, HOW_TO_CNN.md, GBUF_DIF_MIGRATION.md updated

handoff(Gemini): shadow→dif migration done, ready for first training pass
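
A minimal Python sketch of the dif formula above and of its round trip through an 8-bit unorm slot, emulating the WGSL pack4x8unorm / unpack4x8unorm built-ins. The KEY_LIGHT direction and the sample values are hypothetical, not the project's constants.

```python
def dot3(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def compute_dif(normal, key_light, shadow):
    """dif = max(0, dot(normal, KEY_LIGHT)) * shadow"""
    return max(0.0, dot3(normal, key_light)) * shadow

def pack4x8unorm(x, y, z, w):
    """Emulates WGSL pack4x8unorm: x lands in the lowest byte, w in the highest."""
    packed = 0
    for i, v in enumerate((x, y, z, w)):
        clamped = min(1.0, max(0.0, v))
        packed += int(0.5 + 255.0 * clamped) * 256 ** i
    return packed

def unpack4x8unorm(u):
    """Emulates WGSL unpack4x8unorm: returns (x, y, z, w) in [0, 1]."""
    return tuple((u // 256 ** i) % 256 / 255.0 for i in range(4))

KEY_LIGHT = (0.0, 1.0, 0.0)  # hypothetical unit light direction
normal = (0.0, 0.6, 0.8)     # hypothetical unit surface normal
dif = compute_dif(normal, KEY_LIGHT, shadow=0.5)

# Placeholder values stand in for mip2.g, mip2.b and transp.
t1_z = pack4x8unorm(0.25, 0.5, dif, 1.0)
dif_read = unpack4x8unorm(t1_z)[2]  # the .z component, as in gbuf_deferred.wgsl
print(dif, dif_read)
```

The value read back differs from the original by at most the 8-bit quantization step.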
</content>
</entry>
</feed>
