<feed xmlns='http://www.w3.org/2005/Atom'>
<title>demo.git/src/tests/gpu, branch main</title>
<subtitle>Vibe-coded 64k demo system</subtitle>
<id>https://git.taar-o.com/demo.git/atom?h=main</id>
<link rel='self' href='https://git.taar-o.com/demo.git/atom?h=main'/>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/'/>
<updated>2026-05-21T06:10:47Z</updated>
<entry>
<title>style: apply clang-format</title>
<updated>2026-05-21T06:10:47Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-05-21T06:10:47Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=d806027dcaeadcdd8d2febd88bc46b2fd2c465de'/>
<id>urn:sha1:d806027dcaeadcdd8d2febd88bc46b2fd2c465de</id>
<content type='text'>
</content>
</entry>
<entry>
<title>feat(cnn_v3): upgrade architecture to enc_channels=[8,16]</title>
<updated>2026-03-26T06:03:01Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-26T06:03:01Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=8f14bdd66cb002b2f89265b2a578ad93249089c9'/>
<id>urn:sha1:8f14bdd66cb002b2f89265b2a578ad93249089c9</id>
<content type='text'>
Double encoder capacity: enc0 output 4→8ch, enc1 output 8→16ch. New layer shapes:
bottleneck Conv(16→16), dec1 Conv(32→8), dec0 Conv(16→4). Total weights 2476→7828 f16
(~15.3 KB). FiLM MLP output 40→72 params (L1 weights: 16×40→16×72).
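
The totals above can be re-derived from the layer shapes. The Python sketch below assumes (inferred from the counts in this log, not read from the code) that every layer is a 3×3 conv with one bias per output channel, that enc0 reads the 20 G-buffer feature channels, that decoder inputs are skip concatenations, that weights are stored in pass order, and that FiLM emits one gamma and one beta per channel of every layer except the bottleneck. The offsets it prints are illustrative, not the constants in cnn_v3_effect.cc.

```python
# Cross-check of the CNN v3 [8,16] weight budget (sketch; shapes inferred
# from the counts in this log, not read from cnn_v3_effect.cc).
LAYERS = [  # (name, in_ch, out_ch, kernel)
    ("enc0", 20, 8, 3),         # 20 G-buffer feature channels in
    ("enc1", 8, 16, 3),
    ("bottleneck", 16, 16, 3),  # 3x3, dilation=2 (dilation adds no weights)
    ("dec1", 32, 8, 3),         # assumed: bottleneck (16) + enc1 skip (16)
    ("dec0", 16, 4, 3),         # assumed: dec1 (8) + enc0 skip (8)
]


def conv_params(in_ch, out_ch, k):
    """Weights of a k x k conv plus one bias per output channel."""
    return in_ch * out_ch * k * k + out_ch


def weight_offsets(layers):
    """Return ({name: (offset, count)}, total) in f16 units, layers packed in order."""
    table, offset = {}, 0
    for name, cin, cout, k in layers:
        count = conv_params(cin, cout, k)
        table[name] = (offset, count)
        offset += count
    return table, offset


table, total = weight_offsets(LAYERS)
# FiLM: one gamma and one beta per channel of every layer except the bottleneck.
film_outputs = 2 * sum(cout for name, _, cout, _ in LAYERS if name != "bottleneck")

for name, (off, count) in table.items():
    print(f"{name:10s} offset={off:5d} count={count}")
print("total f16:", total, "bytes:", total * 2, f"(~{total * 2 / 1024:.1f} KB)")
print("FiLM MLP outputs:", film_outputs)  # 72
```

Feeding in the previous shapes (enc0 20→4, enc1 4→8, bottleneck 8→8, dec1 16→4, dec0 8→4) gives the earlier 2476 f16 total, which supports the inferred layout.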

16-ch textures (enc1, bottleneck) are split into _lo/_hi rgba32uint pairs.
enc0 and dec1 textures changed from rgba16float (4ch) to rgba32uint (8 f16 channels).
The GBUF_RGBA32UINT node gains CopySrc usage for parity-test readback.
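
Eight f16 channels fit in one rgba32uint texel because each of its four u32 components carries two 16-bit halves. The sketch below assumes that component k holds channel 2k in its low 16 bits and channel 2k+1 in its high 16 bits (the convention of WGSL's pack2x16float, and consistent with the low/high note on unpack_8ch further down this log), and that a 16-channel tensor puts channels 0 to 7 in the _lo texture and 8 to 15 in _hi. pack_8ch, unpack_8ch and split_16ch are hypothetical helper names.

```python
import numpy as np


def pack_8ch(channels):
    """Pack 8 float channels into 4 u32 words (one rgba32uint texel).

    Assumed layout: word k = channel 2k in bits 0-15, channel 2k+1 in bits
    16-31, mirroring WGSL pack2x16float. Multiplying by 65536 is a left
    shift by 16 bits."""
    bits = np.asarray(channels, dtype=np.float16).view(np.uint16).astype(np.uint32)
    return bits[0::2] + bits[1::2] * 65536


def unpack_8ch(words):
    """Inverse of pack_8ch: 4 u32 words -> 8 float16 channels."""
    words = np.asarray(words, dtype=np.uint32)
    out = np.empty(8, dtype=np.uint16)
    out[0::2] = (words % 65536).astype(np.uint16)   # low 16 bits
    out[1::2] = (words // 65536).astype(np.uint16)  # high 16 bits
    return out.view(np.float16)


def split_16ch(channels):
    """16 channels -> (_lo texel with channels 0-7, _hi texel with 8-15)."""
    channels = np.asarray(channels)
    return pack_8ch(channels[:8]), pack_8ch(channels[8:])


vals = [0.5, -1.0, 0.25, 2.0, 0.0, 1.5, -0.125, 3.0]
words = pack_8ch(vals)
print(hex(int(words[0])))  # 0xbc003800: -1.0 (0xBC00) high, 0.5 (0x3800) low
print(unpack_8ch(words))   # round-trips exactly: these values are f16-representable
```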

- WGSL shaders: all 5 passes rewritten for new channel counts
- C++ CNNv3Effect: new weight offsets/sizes, 8ch uniform structs
- Web tool (shaders.js + tester.js): matching texture formats and bindings
- Parity test: readback_rgba32uint_8ch helper, updated vector counts
- Training scripts: default enc_channels=[8,16], updated docstrings
- Docs + architecture PNG regenerated

handoff(Gemini): CNN v3 [8,16] upgrade complete. All code, tests, web
tool, training scripts, and docs updated. Next: run training pass.
</content>
</entry>
<entry>
<title>feat(cnn_v3): 3×3 dilated bottleneck + Sobel loss + FiLM warmup + architecture PNG</title>
<updated>2026-03-25T09:05:42Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-25T09:05:42Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=ce6e5b99f26e4e7c69a3cacf360bd0d492de928c'/>
<id>urn:sha1:ce6e5b99f26e4e7c69a3cacf360bd0d492de928c</id>
<content type='text'>
- Replace 1×1 pointwise bottleneck with Conv(8→8, 3×3, dilation=2) at ¼ res:
  effective RF grows from ~13px to ~29px (~+1 KB weights)
- Add Sobel edge loss in training (--edge-loss-weight, default 0.1)
- Add FiLM 2-phase training: freeze the MLP for the warmup epochs, then
  unfreeze it at lr×0.1 (--film-warmup-epochs, default 50)
- Update weight layout: BN 72→584 f16, total 1964→2476 f16 (4952 B)
- Propagate the shifted weight offsets to the C++ effect, JS tool, and export/gen_test_vectors scripts
- Regenerate test_vectors.h (1238 u32); parity max_err=9.77e-04
- Generate dark-theme U-Net+FiLM architecture PNG (gen_architecture_png.py)
- Replace ASCII art in CNN_V3.md and HOW_TO_CNN.md with PNG embed
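
The receptive-field and size figures in the first bullet are consistent with simple arithmetic. A k×k conv with dilation d widens the receptive field by (k-1)·d pixels at its own resolution; assuming the ~13px and ~29px figures are full-resolution pixels and the bottleneck runs at ¼ resolution, the new layer adds (3-1)·2·4 = 16px. Dilation adds no weights, so the size change is only 1×1 → 3×3 on 8→8 channels. A minimal sketch (the ~13px baseline is taken from the log, not derived):

```python
def rf_growth(kernel, dilation, scale_to_full_res):
    """Extra receptive field, in full-res pixels, added by one conv layer.

    A k x k conv with dilation d spans (k - 1) * d + 1 taps, so it widens the
    receptive field by (k - 1) * d pixels at its own resolution; multiplying by
    the downsampling factor converts that to full-resolution pixels."""
    return (kernel - 1) * dilation * scale_to_full_res


def conv_params(in_ch, out_ch, k):
    """Weights of a k x k conv plus one bias per output channel."""
    return in_ch * out_ch * k * k + out_ch


old_bn = conv_params(8, 8, 1)  # 1x1 pointwise bottleneck: 72 f16
new_bn = conv_params(8, 8, 3)  # 3x3 dilated bottleneck: 584 f16
extra_rf = rf_growth(3, 2, 4) - rf_growth(1, 1, 4)  # assumed to run at 1/4 res

print(old_bn, new_bn, (new_bn - old_bn) * 2, "bytes")  # 72 584 1024 bytes
print("RF: ~13 px ->", 13 + extra_rf, "px")  # RF: ~13 px -> 29 px
```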

handoff(Gemini): bottleneck dilation + Sobel loss + FiLM warmup landed.
Next: run first real training pass (see cnn_v3/docs/HOWTO.md §3).
</content>
</entry>
<entry>
<title>feat(gbuffer): wire_dag() + find_downstream_output() for temporal feedback</title>
<updated>2026-03-23T06:54:18Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-23T06:54:18Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=491a3c1ccbd0f46be655e97d2e3697135df6e3a2'/>
<id>urn:sha1:491a3c1ccbd0f46be655e97d2e3697135df6e3a2</id>
<content type='text'>
- Add Effect::wire_dag() virtual (called from init_effect_nodes once the full DAG is built)
- Add Effect::find_downstream_output() protected helper (returns the output of the first downstream consumer)
- GBufferEffect::wire_dag() auto-sets cnn_output_node_ via find_downstream_output,
  guarding against the sink (an external view with a null texture)
- GBufferEffect::post_render() null-checks src texture before CopyTextureToTexture
- Tests: find_downstream_output cases + wire_dag integration in test_effect_base
- Doc: SEQUENCE.md updated with wire_dag pattern, helper contract, and sink guard
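
The helper contract and sink guard described above can be modelled in a few lines. This is a Python sketch with a hypothetical Node type and texture table; the real C++ signatures of find_downstream_output() and wire_dag() are not shown in this log.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one effect node in the sequence DAG."""
    name: str
    inputs: list = field(default_factory=list)
    output: str = ""


def find_downstream_output(nodes, producer):
    """Output name of the first node (in execution order) that reads
    producer.output, or None if nothing consumes it."""
    for node in nodes:
        if producer.output in node.inputs:
            return node.output
    return None


def wire_dag(producer, nodes, textures):
    """Return the feedback source for producer, skipping a consumer whose
    output has no texture (the sink is an external view: texture None)."""
    out = find_downstream_output(nodes, producer)
    if out is None or textures.get(out) is None:
        return None
    return out


gbuf = Node("gbuf", [], "feat")
cnn = Node("cnn", ["feat"], "cnn_out")
deferred = Node("deferred", ["feat"], "sink")
textures = {"feat": "feat_tex", "cnn_out": "cnn_tex", "sink": None}

print(wire_dag(gbuf, [gbuf, cnn], textures))       # cnn_out
print(wire_dag(gbuf, [gbuf, deferred], textures))  # None
```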

Co-Authored-By: Claude Sonnet 4.6 &lt;noreply@anthropic.com&gt;
</content>
</entry>
<entry>
<title>feat(cnn_v3): GBufDeferredEffect — simple deferred render (albedo * shadow)</title>
<updated>2026-03-22T18:58:04Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-22T18:58:04Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=9bf9b0aa0573f77bd667e6976a8bb413153daa1d'/>
<id>urn:sha1:9bf9b0aa0573f77bd667e6976a8bb413153daa1d</id>
<content type='text'>
New effect unpacks feat_tex0/feat_tex1 and outputs albedo * shadow.
Replaces CNNv3Effect in the cnn_v3_test sequence until training is complete.
37/37 tests passing.

handoff(Gemini): GBufDeferredEffect wired in timeline; CNN v3 pipeline: GBufferEffect → GBufDeferredEffect → sink.
</content>
</entry>
<entry>
<title>feat(cnn_v3): add G-buffer visualizer + web sample loader (Phase 7)</title>
<updated>2026-03-22T15:21:25Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-22T15:21:25Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=159ca2ca19345515cdfebed9fd88646730492cd2'/>
<id>urn:sha1:159ca2ca19345515cdfebed9fd88646730492cd2</id>
<content type='text'>
C++ GBufViewEffect: renders all 20 feature channels from feat_tex0/feat_tex1
in a 4×5 tiled grid. Uses a custom bind group layout (BGL) with WGPUTextureSampleType_Uint;
the bind group is rebuilt every frame via wgpuRenderPipelineGetBindGroupLayout.
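
Choosing which of the 20 channels a given pixel of the 4×5 grid shows reduces to integer tile arithmetic. A minimal sketch, assuming 4 columns by 5 rows with channels numbered row-major from the top-left; the log does not state the actual orientation or order, and tile_lookup is a hypothetical name.

```python
def tile_lookup(u, v, cols=4, rows=5):
    """Map a screen UV in [0, 1] to (channel, local_u, local_v) for a
    cols x rows grid of channel tiles, numbered row-major from top-left."""
    col = min(int(u * cols), cols - 1)  # clamp so u == 1.0 stays in the last tile
    row = min(int(v * rows), rows - 1)
    return row * cols + col, u * cols - col, v * rows - row


print(tile_lookup(0.3, 0.5)[0])  # 9: column 1, row 2
```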

Web tool: "Load sample directory" button (webkitdirectory picker); FULL_PACK_SHADER
compute pass (matches gbuf_pack.wgsl packing); runFromFeat() skips the photo-pack step;
computePSNR() reads back the result and compares it side by side with target.png.
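
computePSNR() itself is not shown in this log; the standard definition it presumably follows is PSNR = 10·log10(MAX² / MSE). A Python sketch of that definition, assuming images normalized to [0, 1] (pass max_val=255 for 8-bit data):

```python
import math

import numpy as np


def compute_psnr(output, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; inf for identical images."""
    output = np.asarray(output, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = float(np.mean((output - target) ** 2))
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)


target = np.zeros((4, 4, 3))
print(compute_psnr(target + 0.1, target))  # about 20 dB (MSE = 0.01)
```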

36/36 tests pass. Docs updated: HOWTO.md §9, README, PROJECT_CONTEXT, TODO,
COMPLETED.

handoff(Gemini): CNN v3 Phase 7 done. Next: run train_cnn_v3.py (see HOWTO §3).
</content>
</entry>
<entry>
<title>feat(cnn_v3): wire trained weights into CNNv3Effect + add timeline test sequence</title>
<updated>2026-03-22T11:53:13Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-22T11:53:13Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=581c67b75aa3c089c86f764b67e6de7476a13993'/>
<id>urn:sha1:581c67b75aa3c089c86f764b67e6de7476a13993</id>
<content type='text'>
- CNNv3Effect constructor loads ASSET_WEIGHTS_CNN_V3 via GetAsset on startup
- seq_compiler.py: CLASS_TO_HEADER supports full #include paths for cnn_v3/ classes
- timeline.seq: add cnn_v3_test sequence at 48s (GBufferEffect → CNNv3Effect)
- test_cnn_v3_parity: zero_weights test now explicitly uploads zeros to override the asset weights

handoff(Gemini): CNNv3Effect ready; export weights to workspaces/main/weights/ and seek to 48s to test
</content>
</entry>
<entry>
<title>refactor(cnn_v3): code review — comments, simplifications, test fix</title>
<updated>2026-03-21T13:01:30Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-21T13:01:30Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=bf33fee131b1eee03bc5a765ba360299bbcead06'/>
<id>urn:sha1:bf33fee131b1eee03bc5a765ba360299bbcead06</id>
<content type='text'>
C++:
- cnn_v3_effect.cc: fix declare_nodes comment (the output node is declared by the caller)
- cnn_v3_effect.cc: add TODO(phase-7) marker for FiLM MLP replacement

WGSL:
- cnn_v3_bottleneck.wgsl: consolidate _pad fields onto one line, explain why
  array&lt;u32,3&gt; is invalid in uniform address space
- cnn_v3_enc0.wgsl: fix "12xu8" → "12ch u8norm" in header comment
- cnn_v3_dec0.wgsl: clarify parity note (sigmoid after FiLM+ReLU, not raw conv)
- cnn_v3_common.wgsl: clarify unpack_8ch pack layout (low/high 16 bits)

Python:
- cnn_v3_utils.py: replace PIL-based _upsample_nearest (which round-tripped through
  uint8) with pure numpy index arithmetic; rename _resize_rgb → _resize_img (handles
  any channel count); add a comment on the zero-pad workaround for normals
- export_cnn_v3_weights.py: add cross-ref to cnn_v3_effect.cc constants;
  clarify weight count comments with Conv notation
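
Nearest-neighbour resizing needs no image library: output index i reads source index floor(i·in/out), which integer arithmetic computes exactly and which leaves dtype, sign and channel count untouched, so signed float channels such as normals survive where a uint8 round-trip would clip them. A sketch of the technique, not the project's _upsample_nearest:

```python
import numpy as np


def upsample_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array.

    Output pixel i reads source pixel floor(i * in / out), computed with
    integer arithmetic, so dtype, sign and channel count pass through
    unchanged (no uint8 round-trip)."""
    in_h, in_w = img.shape[:2]
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    return img[rows[:, None], cols[None, :]]


normals = np.array([[[-1.0, 0.5, 0.0], [0.0, 0.0, 1.0]]])  # (1, 2, 3), signed floats
up = upsample_nearest(normals, 2, 4)
print(up.shape, up.dtype)  # (2, 4, 3) float64
```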

Test:
- test_cnn_v3_parity.cc: enc0/dec1 layer failures now return 0 (were print-only)

handoff(Gemini): CNN v3 review complete, 36/36 tests passing.
</content>
</entry>
<entry>
<title>feat(cnn_v3): Phase 5 complete — parity validation passing (36/36 tests)</title>
<updated>2026-03-21T08:51:58Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-21T08:51:58Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=673a24215b2670007317060325256059d1448f3b'/>
<id>urn:sha1:673a24215b2670007317060325256059d1448f3b</id>
<content type='text'>
- Add test_cnn_v3_parity.cc: zero_weights + random_weights tests
- Add gen_test_vectors.py: PyTorch reference implementation for enc0/enc1/bn/dec1/dec0
- Add test_vectors.h: generated C header with expected enc0, dec1, and final-output values
- Fix declare_nodes(): intermediate textures now use fractional resolutions (W/2, W/4),
  via the new NodeRegistry::default_width()/default_height() getters
- Add layer-by-layer readback (enc0, dec1) for regression coverage
- Final parity: enc0 max_err=1.95e-3, dec1 max_err=1.95e-3, out max_err=4.88e-4
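
The max_err figures coincide with float16 spacing: 4.88e-4, 9.77e-4 (reported for the dilated-bottleneck commit) and 1.95e-3 are 2^-11, 2^-10 and 2^-9, i.e. one f16 step at magnitudes 0.5, 1 and 2. That suggests, though the log does not say so, that the residuals are on the order of one f16 rounding step. A quick Python check, with a hypothetical max_abs_err helper of the kind a parity test compares against its tolerance:

```python
import numpy as np


def max_abs_err(actual, expected):
    """Largest element-wise |actual - expected|, computed in float32."""
    a = np.asarray(actual, dtype=np.float32)
    e = np.asarray(expected, dtype=np.float32)
    return float(np.max(np.abs(a - e)))


# One f16 step (ULP) at the magnitudes where the reported errors line up.
for x in (0.5, 1.0, 2.0):
    print(x, float(np.spacing(np.float16(x))))
# 0.5 0.00048828125
# 1.0 0.0009765625
# 2.0 0.001953125
```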

handoff(Claude): CNN v3 parity done. Next: train_cnn_v3.py (FiLM MLP training).
</content>
</entry>
<entry>
<title>feat(cnn_v3): Phase 4 complete — CNNv3Effect C++ + FiLM uniform upload</title>
<updated>2026-03-21T07:52:53Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-03-21T07:52:53Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=fe008df92f7a68d81c9bedb4328da7001e0775f0'/>
<id>urn:sha1:fe008df92f7a68d81c9bedb4328da7001e0775f0</id>
<content type='text'>
- cnn_v3/src/cnn_v3_effect.{h,cc}: full Effect subclass with 5 compute
  passes (enc0→enc1→bottleneck→dec1→dec0), shared weights storage buffer,
  per-pass uniform buffers, set_film_params() API
- Fixed WGSL/C++ struct alignment: vec3u has align=16, so CnnV3Params4ch
  is 64 bytes and CnnV3ParamsEnc1 is 96 bytes (not 48/80)
- Weight offsets written as explicit formulas (e.g. 20*4*9+4, i.e. in_ch*out_ch*3*3 + out_ch biases) for clarity
- Registered in CMake, shaders.h/cc, demo_effects.h, test_demo_effects.cc
- 35/35 tests pass
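
The 48→64 and 80→96 corrections follow from WGSL's layout rules: vec3u has SizeOf 12 but AlignOf 16, each member starts at the next multiple of its alignment, and the struct size is rounded up to its largest member alignment. The field lists below are hypothetical (the log does not list the real members), but a u32 followed by a vec3u pad and vec4f FiLM parameters reproduces both numbers when compared with a naive C-style packed size.

```python
# (size, align) in bytes for the host-shareable WGSL types used here.
WGSL_TYPES = {
    "u32": (4, 4), "f32": (4, 4),
    "vec2u": (8, 8), "vec2f": (8, 8),
    "vec3u": (12, 16), "vec3f": (12, 16),
    "vec4u": (16, 16), "vec4f": (16, 16),
}


def round_up(value, align):
    return (value + align - 1) // align * align


def wgsl_layout(members):
    """Return ({field: offset}, struct_size) per WGSL struct layout rules."""
    offsets, offset, struct_align = {}, 0, 1
    for name, ty in members:
        size, align = WGSL_TYPES[ty]
        offset = round_up(offset, align)  # member starts at a multiple of its alignment
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    return offsets, round_up(offset, struct_align)  # size rounded to the largest alignment


def packed_size(members):
    """Naive C-style size: vec3u counted as 12 bytes with no extra alignment."""
    return sum(WGSL_TYPES[ty][0] for _, ty in members)


# Hypothetical field lists (the real structs are not listed in the log).
PARAMS_4CH = [("weight_offset", "u32"), ("_pad", "vec3u"),
              ("gamma", "vec4f"), ("beta", "vec4f")]
PARAMS_ENC1 = [("weight_offset", "u32"), ("_pad", "vec3u"),
               ("gamma_lo", "vec4f"), ("gamma_hi", "vec4f"),
               ("beta_lo", "vec4f"), ("beta_hi", "vec4f")]

for label, members in (("CnnV3Params4ch", PARAMS_4CH), ("CnnV3ParamsEnc1", PARAMS_ENC1)):
    print(label, "naive", packed_size(members), "-> WGSL", wgsl_layout(members)[1])
# CnnV3Params4ch naive 48 -> WGSL 64
# CnnV3ParamsEnc1 naive 80 -> WGSL 96
```

In this sketch the vec3u pad lands at offset 16 rather than 4, which pushes every following vec4f up by 12 bytes; a C++ mirror of the struct needs the same padding to match.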

handoff(Gemini): CNN v3 Phase 5 next — parity validation (Python ref vs WGSL)
</content>
</entry>
</feed>
