<feed xmlns='http://www.w3.org/2005/Atom'>
<title>demo.git/checkpoints, branch main</title>
<subtitle>Vibe-coded 64k demo system</subtitle>
<id>https://git.taar-o.com/demo.git/atom?h=main</id>
<link rel='self' href='https://git.taar-o.com/demo.git/atom?h=main'/>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/'/>
<updated>2026-02-14T00:04:07Z</updated>
<entry>
<title>Fix --mix option: blend prev layer with static p4-p7, not p0-p3</title>
<updated>2026-02-14T00:04:07Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-02-14T00:01:52Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=1b760f3b413d28652965a51f629d3c2b8d33ce22'/>
<id>urn:sha1:1b760f3b413d28652965a51f629d3c2b8d33ce22</id>
<content type='text'>
Updated gen_identity_weights.py --mix mode to use static features
p4-p7 (uv_x, uv_y, sin20_y, bias) at channels 8-11 instead of
p0-p3 (RGB+D) at channels 4-7.

Before: 0.5*prev[i] + 0.5*static_p{i} (channels 4-7)
After:  0.5*prev[i] + 0.5*static_p{4+i} (channels 8-11)
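
A minimal sketch of the "After" blend, not the actual
gen_identity_weights.py code. The function name, the 12-channel input
width, and the assumption that prev[i] sits at input channels 0-3 are
hypothetical; only the 0.5/0.5 blend and the static channels 8-11 come
from the message above.

```python
# Hypothetical sketch: build one weight row per output channel so that
# out[i] = 0.5*prev[i] + 0.5*static_p{4+i}.
def mix_weights(num_inputs=12, prev_base=0, static_base=8, count=4):
    """Return `count` rows of per-input-channel weights.

    Assumed layout: prev[i] at channel prev_base+i, static p4-p7 at
    channels static_base..static_base+3 (8-11).
    """
    rows = []
    for i in range(count):
        row = [0.0] * num_inputs
        row[prev_base + i] = 0.5    # half of the previous layer output
        row[static_base + i] = 0.5  # half of static feature p{4+i}
        rows.append(row)
    return rows

rows = mix_weights()
```

Each row sums to 1.0, so a constant input passes through unchanged.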

Co-Authored-By: Claude Sonnet 4.5 &lt;noreply@anthropic.com&gt;
</content>
</entry>
<entry>
<title>CNN v2: Remove vizScale, always clip to [0,1]</title>
<updated>2026-02-13T22:42:53Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-02-13T22:42:53Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=112d7f82a4fccbe26ae2b4696d87464494253ce1'/>
<id>urn:sha1:112d7f82a4fccbe26ae2b4696d87464494253ce1</id>
<content type='text'>
All layers now use scale 1.0, shader clamps values &gt;1.

Co-Authored-By: Claude Sonnet 4.5 &lt;noreply@anthropic.com&gt;
</content>
</entry>
<entry>
<title>CNN v2: Fix Layer 0 visualization scale (was 0.5, now 1.0)</title>
<updated>2026-02-13T22:40:30Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-02-13T22:40:30Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=25044d63057cdb134cc3930bb67b178cff1aebb4'/>
<id>urn:sha1:25044d63057cdb134cc3930bb67b178cff1aebb4</id>
<content type='text'>
Layer 0 output is clamped [0,1], does not need 0.5 dimming.
Middle layers (ReLU) keep 0.5 scale for values &gt;1.

Co-Authored-By: Claude Sonnet 4.5 &lt;noreply@anthropic.com&gt;
</content>
</entry>
<entry>
<title>CNN v2: Alpha channel depth handling and layer visualization</title>
<updated>2026-02-13T22:17:42Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-02-13T22:17:42Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=6fa9ccf86b0bbefb48cefae19d4162115a3d63d3'/>
<id>urn:sha1:6fa9ccf86b0bbefb48cefae19d4162115a3d63d3</id>
<content type='text'>
Training changes:
- Changed p3 default depth from 0.0 to 1.0 (far plane semantics)
- Extract depth from target alpha channel in both datasets
- Consistent alpha-as-depth across training/validation
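
A small sketch of the alpha-as-depth convention, assuming pixels are
tuples of floats in [0,1]. The helper name and the tuple representation
are hypothetical; the 1.0 (far plane) default is the one stated above.

```python
# Hypothetical helper: split a pixel into RGB plus depth, reading depth
# from the alpha channel when present and using the far plane otherwise.
def split_rgbd(pixel, default_depth=1.0):
    """pixel is (r, g, b) or (r, g, b, a); return (rgb, depth)."""
    rgb = tuple(pixel[:3])
    depth = pixel[3] if len(pixel) == 4 else default_depth
    return rgb, depth

rgb, depth = split_rgbd((0.2, 0.4, 0.6))
```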

Test tool enhancements (cnn_test):
- Added load_depth_from_alpha() for R32Float depth texture
- Fixed bind group layout for UnfilterableFloat sampling
- Added --save-intermediates with per-channel grayscale composites
- Each layer saved as 4x wide PNG (p0-p3 stacked horizontally)
- Global layers_composite.png for vertical layer stack overview

Investigation notes:
- Static features p4-p7 ARE computed and bound correctly
- sin20_y pattern visibility differs between cnn_test and the HTML tool (under investigation)
- Timestamps differ: binary weights (Feb 13 20:36) vs HTML tool (Feb 13 22:12)
- Next: Update HTML tool with canonical binary weights

handoff(Claude): HTML tool weights update pending - base64 encoded
canonical weights ready in /tmp/weights_b64.txt for line 392 replacement.

Co-Authored-By: Claude Sonnet 4.5 &lt;noreply@anthropic.com&gt;
</content>
</entry>
<entry>
<title>Refactor: Move application entry points to src/app/</title>
<updated>2026-02-13T07:14:07Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-02-13T07:14:07Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=10673f00dfece584ba81d581b69c9ba706a5ea5a'/>
<id>urn:sha1:10673f00dfece584ba81d581b69c9ba706a5ea5a</id>
<content type='text'>
Moved main.cc, stub_main.cc, and test_demo.cc from src/ to src/app/
for better organization. Updated cmake/DemoExecutables.cmake paths.

handoff(Claude): App files reorganized into src/app/ directory
</content>
</entry>
<entry>
<title>Refine training script output and validation</title>
<updated>2026-02-12T11:17:59Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-02-12T11:17:59Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=ff4c1213636e66d4457a95cad12300c58e8d6781'/>
<id>urn:sha1:ff4c1213636e66d4457a95cad12300c58e8d6781</id>
<content type='text'>
1. Loss printed at every epoch with \r (no scrolling)
2. Validation only on final epoch (not all checkpoints)
3. Process all input images (not just img_000.png)

Training output now shows live progress with single line update.
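
A minimal sketch of the single-line progress update, assuming a plain
text terminal. The function names and the message layout are
hypothetical and not taken from the training script.

```python
import sys

def format_progress(epoch, total, loss):
    """Build the status line; the leading \\r returns to column 0."""
    return f"\rEpoch {epoch}/{total}  loss={loss:.6f}"

def report(epoch, total, loss, out=sys.stdout):
    """Overwrite the current line, and end it after the final epoch."""
    out.write(format_progress(epoch, total, loss))
    out.flush()  # without a newline the text may otherwise stay buffered
    if epoch == total:
        out.write("\n")

for epoch in range(1, 4):
    report(epoch, 3, 1.0 / epoch)
```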
</content>
</entry>
<entry>
<title>TODO: 8-bit weight quantization for 2× size reduction</title>
<updated>2026-02-12T11:11:53Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-02-12T11:11:53Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=eaf0bd855306e70ca03f2d6579b4d6551aff6482'/>
<id>urn:sha1:eaf0bd855306e70ca03f2d6579b4d6551aff6482</id>
<content type='text'>
- Add QAT (quantization-aware training) notes
- Requires training with fake quantization
- Target: ~1.6 KB weights (vs 3.2 KB f16)
- Shader unpacking needs adaptation (4× u8 per u32)
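
A sketch of the 4x u8 per u32 packing on the export side. The linear
quantization range and the low-byte-first order are assumptions made for
illustration; the real format may differ.

```python
# Hypothetical sketch: quantize floats to 8 bits and pack four per word.
def quantize_u8(w, lo, hi):
    """Map a float to 0..255 linearly over [lo, hi], clamping outside."""
    t = (min(max(w, lo), hi) - lo) / (hi - lo)
    return int(round(t * 255))

def pack4(q):
    """Pack four 0..255 values into one 32-bit word, first in the low byte."""
    return int.from_bytes(bytes(q), "little")

def unpack4(word):
    """Inverse of pack4: recover the four 8-bit values."""
    return list(word.to_bytes(4, "little"))

word = pack4([1, 2, 3, 4])
```

One byte per weight instead of two (f16) gives the 2x size reduction
targeted above.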
</content>
</entry>
<entry>
<title>CNN v2: Storage buffer complete - real weights exported</title>
<updated>2026-02-12T11:10:40Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-02-12T11:10:40Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=e8344bc84ec0f571e5c5aafffe7c914abe226bd6'/>
<id>urn:sha1:e8344bc84ec0f571e5c5aafffe7c914abe226bd6</id>
<content type='text'>
- Export weights from epoch 70 checkpoint (3.2 KB binary)
- Disable shader template generation (use manual cnn_v2_compute.wgsl)
- Build successful with real weights
- Ready for integration testing

Storage buffer architecture complete:
- Dynamic layer count support
- ~0.3ms overhead vs constants (negligible)
- Single shader, flexible configuration
- Binary format: header + layer info + f16 weights
</content>
</entry>
<entry>
<title>CNN v2: Complete multi-layer compute execution</title>
<updated>2026-02-12T11:09:27Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-02-12T11:09:27Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=b7b61f63bbc9645d843ef81c159c14e7e79aea6a'/>
<id>urn:sha1:b7b61f63bbc9645d843ef81c159c14e7e79aea6a</id>
<content type='text'>
- Create bind groups per layer with ping-pong buffers
- Update layer params uniform per dispatch
- Execute all layers in sequence with proper input/output swapping
- Ready for weight export and end-to-end testing
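
A CPU-side sketch of the ping-pong scheme, with plain Python callables
standing in for compute dispatches. The function name and the list-based
buffers are hypothetical; the actual code uses GPU bind groups.

```python
# Hypothetical sketch: two buffers alternate as input and output, so
# each layer reads what the previous one wrote.
def run_layers(layers, first_input):
    """Run `layers` in sequence and return the final output buffer."""
    buffers = [list(first_input), [0.0] * len(first_input)]
    src = 0
    for layer in layers:
        dst = 1 - src                          # the other buffer
        buffers[dst][:] = layer(buffers[src])  # "dispatch" into dst
        src = dst                              # swap roles for the next layer
    return buffers[src]

result = run_layers(
    [lambda x: [v + 1 for v in x], lambda x: [v * 2 for v in x]],
    [1.0, 2.0],
)
```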
</content>
</entry>
<entry>
<title>CNN v2: storage buffer architecture foundation</title>
<updated>2026-02-12T11:08:22Z</updated>
<author>
<name>skal</name>
<email>pascal.massimino@gmail.com</email>
</author>
<published>2026-02-12T11:08:22Z</published>
<link rel='alternate' type='text/html' href='https://git.taar-o.com/demo.git/commit/?id=4d87a6d781c3f159d216f4cd9251e3d7bd63554f'/>
<id>urn:sha1:4d87a6d781c3f159d216f4cd9251e3d7bd63554f</id>
<content type='text'>
- Add binary weight format (header + layer info + packed f16)
- New export_cnn_v2_weights.py for binary weight export
- Single cnn_v2_compute.wgsl shader with storage buffer
- Load weights in CNNv2Effect::load_weights()
- Create layer compute pipeline with 5 bindings
- Fast training config: 100 epochs, 3×3 kernels, 8→4→4 channels

Next: Complete bind group creation and multi-layer compute execution
</content>
</entry>
</feed>
