| author | skal <pascal.massimino@gmail.com> | 2026-03-21 10:50:02 +0100 |
|---|---|---|
| committer | skal <pascal.massimino@gmail.com> | 2026-03-21 10:50:02 +0100 |
| commit | 35355b17576e93b035a2a78ecd05771e98f068ee (patch) | |
| tree | a1c1a4563a62ad69c808383fcf0bce1ccf4c5765 /cnn_v3/docs/HOW_TO_CNN.md | |
| parent | e343021ac007549c76e58b27a361b11dd3f6a136 (diff) | |
feat(cnn_v3): HTML WebGPU tool (index.html + shaders.js + tester.js)
3-file tool, 939 lines total. Implements full U-Net+FiLM inference in
the browser: Pack→Enc0→Enc1→Bottleneck→Dec1→Dec0 compute passes,
layer visualisation (Feat/Enc0/Enc1/BN/Dec1/Output), FiLM MLP sliders,
drag-drop weights + image/video, Save PNG, diff/blend view modes.
HOW_TO_CNN.md §7 updated to reflect tool is implemented.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Diffstat (limited to 'cnn_v3/docs/HOW_TO_CNN.md')
| -rw-r--r-- | cnn_v3/docs/HOW_TO_CNN.md | 141 |
1 files changed, 50 insertions, 91 deletions
diff --git a/cnn_v3/docs/HOW_TO_CNN.md b/cnn_v3/docs/HOW_TO_CNN.md
index 8c41ab0..f325a38 100644
--- a/cnn_v3/docs/HOW_TO_CNN.md
+++ b/cnn_v3/docs/HOW_TO_CNN.md
@@ -646,120 +646,76 @@ If results drift after shader edits, verify these invariants match the Python re
 
 ## 7. HTML WebGPU Tool
 
-### Current state
+**Location:** `cnn_v3/tools/` — three files, no build step.
 
-There is no dedicated CNN v3 HTML tool yet.
-The CNN v2 tool (`cnn_v2/tools/cnn_v2_test/index.html`) is the reference pattern.
+| File | Lines | Contents |
+|------|-------|----------|
+| `index.html` | 147 | HTML + CSS |
+| `shaders.js` | 252 | WGSL shader constants, weight-offset constants |
+| `tester.js` | 540 | `CNNv3Tester` class, event wiring |
 
-### CNN v2 tool as reference
+### Usage
 
-The v2 tool is a single self-contained HTML file demonstrating:
-- Inline WGSL shaders (no build step)
-- Drag-and-drop `.bin` weight loading
-- Image/video file input
-- Intermediate layer visualisation
-- View modes: CNN output / original / diff×10
-- Side panel with per-layer weight statistics
-
-A v3 tool follows the same pattern with a more complex texture chain.
-
-### What a CNN v3 HTML tool requires
-
-**WGSL shaders to inline** (resolve `#include "cnn_v3/common"` via JS string substitution):
-
-```js
-const common = `/* contents of cnn_v3_common.wgsl */`;
-const enc0_src = enc0_template.replace('#include "cnn_v3/common"', common);
+```bash
+# Requires HTTP server (WebGPU blocked on file://)
+cd /path/to/demo
+python3 -m http.server 8080
+# Open: http://localhost:8080/cnn_v3/tools/
 ```
 
-**Texture chain:**
-
-| Texture | Format | Size |
-|---------|--------|------|
-| feat_tex0 (input) | rgba32uint | W × H |
-| feat_tex1 (input) | rgba32uint | W × H |
-| enc0_tex | rgba16float | W × H |
-| enc1_tex | rgba32uint | W/2 × H/2 |
-| bottleneck_tex | rgba32uint | W/4 × H/4 |
-| dec1_tex | rgba16float | W/2 × H/2 |
-| output_tex | rgba16float | W × H |
-
-`rgba32uint` textures cannot be sampled; use `textureLoad` — already done in the shaders.
-
-**Weight loading:**
-
-```js
-const resp = await fetch('cnn_v3_weights.bin');
-const buf = await resp.arrayBuffer();
-const gpu_buf = device.createBuffer({
-  size: buf.byteLength,
-  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST
-});
-device.queue.writeBuffer(gpu_buf, 0, buf);
+Or on macOS with Chrome:
+```bash
+open -a "Google Chrome" --args --allow-file-access-from-files
+open cnn_v3/tools/index.html
 ```
 
-**FiLM MLP inference (JS-side):**
+### Workflow
 
-```js
-// Load cnn_v3_film_mlp.bin as Float32Array
-const mlp = new Float32Array(await (await fetch('cnn_v3_film_mlp.bin')).arrayBuffer());
-const L0_W = mlp.subarray(0, 80); // (16×5) row-major
-const L0_b = mlp.subarray(80, 96);
-const L1_W = mlp.subarray(96, 736); // (40×16) row-major
-const L1_b = mlp.subarray(736, 776);
+1. **Drop `cnn_v3_weights.bin`** onto the left "weights" drop zone.
+2. **Drop a PNG or video** onto the centre canvas → CNN runs immediately.
+3. _(Optional)_ **Drop `cnn_v3_film_mlp.bin`** → FiLM sliders become active.
+4. Adjust **beat_phase / beat_norm / audio_int / style_p0 / style_p1** sliders → reruns on change.
+5. Click layer buttons (**Feat · Enc0 · Enc1 · BN · Dec1 · Output**) in the right panel to inspect activations.
+6. **Save PNG** to export the current output.
 
-function mlp_forward(cond5) {
-  // h = relu(L0_W @ cond + L0_b)
-  const h = new Float32Array(16);
-  for (let o = 0; o < 16; o++) {
-    let s = L0_b[o];
-    for (let i = 0; i < 5; i++) s += L0_W[o * 5 + i] * cond5[i];
-    h[o] = Math.max(0, s);
-  }
-  // out = L1_W @ h + L1_b
-  const out = new Float32Array(40);
-  for (let o = 0; o < 40; o++) {
-    let s = L1_b[o];
-    for (let i = 0; i < 16; i++) s += L1_W[o * 16 + i] * h[i];
-    out[o] = s;
-  }
-  return out; // [γenc0×4, βenc0×4, γenc1×8, βenc1×8, γdec1×4, βdec1×4, γdec0×4, βdec0×4]
-}
-```
+Keyboard: `[SPACE]` toggle original · `[D]` diff×10.
 
-The 40 outputs split into per-layer γ/β and uploaded to the 5 params uniform buffers
-before each compute dispatch.
+### Input files
 
-**Input feature assembly from a photo:**
+| File | Format | Notes |
+|------|--------|-------|
+| `cnn_v3_weights.bin` | raw u32 (no header) | 982 u32 = 1964 f16 = ~3.9 KB |
+| `cnn_v3_film_mlp.bin` | raw f32 | 776 f32 = 3.1 KB; optional — identity FiLM used if absent |
 
-For simple (photo-only) mode, build `feat_tex0` and `feat_tex1` from the image data:
-- `feat_tex0`: pack albedo RGB (f16×3), normal XY (128,128 neutral → 0.0 in oct), depth (0), depth_grad (0,0) as `pack2x16float` into rgba32uint
-- `feat_tex1`: pack mat_id (0), prev.rgb (0,0,0), mip1.rgb, mip2.rgb, shadow (1.0), transp (0) as `pack4x8unorm` into rgba32uint
+Both produced by `export_cnn_v3_weights.py` (§3).
 
-See `cnn_v3/shaders/gbuf_pack.wgsl` for the exact packing layout (mirrors `GBufferEffect`).
+### Texture chain
 
-### Serving locally
+| Texture | Format | Size |
+|---------|--------|------|
+| `feat_tex0` | rgba32uint | W × H (8 f16: albedo, normal, depth, depth_grad) |
+| `feat_tex1` | rgba32uint | W × H (12 u8: mat_id, prev, mip1, mip2, shadow, transp) |
+| `enc0_tex` | rgba16float | W × H |
+| `enc1_tex` | rgba32uint | W/2 × H/2 (8 f16 packed) |
+| `bn_tex` | rgba32uint | W/4 × H/4 |
+| `dec1_tex` | rgba16float | W/2 × H/2 |
+| `output_tex` | rgba16float | W × H → displayed on canvas |
 
-Chrome requires a real HTTP server for WebGPU (not `file://`):
+### Simple mode (photo input)
 
-```bash
-python3 -m http.server 8080
-# Open: http://localhost:8080/cnn_v3/tools/cnn_v3_test/index.html
-```
+Albedo = image RGB, mip1/mip2 from GPU mipmaps, shadow = 1.0, transp = 1 − alpha,
+all geometric channels (normal, depth, depth_grad, mat_id, prev) = 0.
 
 ### Browser requirements
 
-- Chrome 113+ with WebGPU enabled (default on desktop)
+- Chrome 113+ / Edge 113+ (WebGPU on by default)
 - Firefox Nightly with `dom.webgpu.enabled = true`
-- Required features: check `device.features.has('shader-f16')` for f16 support;
-  fall back to f32 accumulation if absent
 
 ### Pitfalls
 
-- `rgba32uint` requires `STORAGE` + `TEXTURE_BINDING` usage flags; missing either causes bind group creation failure
-- WGSL `#include "cnn_v3/common"` must be resolved via JS string replace before passing to `device.createShaderModule()`
-- Workgroup dispatch: `Math.ceil(W / 8)` × `Math.ceil(H / 8)` — same formula as C++
-- Cross-origin image loading requires CORS headers or same-origin hosting
+- `rgba32uint` and `rgba16float` textures both need `STORAGE_BINDING | TEXTURE_BINDING` usage.
+- Weight offsets are **f16 indices** (enc0=0, enc1=724, bn=1020, dec1=1092, dec0=1672).
+- Uniform buffer layouts must match WGSL `Params` structs exactly (padding included).
 
 ---
 
@@ -780,6 +736,9 @@ python3 -m http.server 8080
 | `cnn_v3/src/gbuffer_effect.h/.cc` | GBufferEffect: rasterise + pack G-buffer feature textures |
 | `src/tests/gpu/test_cnn_v3_parity.cc` | Per-pixel parity test (WGSL vs. Python reference) |
 | `cnn_v3/docs/CNN_V3.md` | Full architecture spec (U-Net, FiLM, WGSL uniform layouts) |
+| `cnn_v3/tools/index.html` | HTML tool — UI shell + CSS |
+| `cnn_v3/tools/shaders.js` | HTML tool — inline WGSL shaders + weight-offset constants |
+| `cnn_v3/tools/tester.js` | HTML tool — CNNv3Tester class, inference pipeline, layer viz |
 | `cnn_v2/tools/cnn_v2_test/index.html` | HTML tool reference pattern (v2) |
 
 ---
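
The weight-file figures quoted in the diff (982 u32 = 1964 f16, with layer offsets given as f16 indices) can be sanity-checked with a short sketch. This is not code from `shaders.js` or `tester.js`; `layerSizes`, `f16ToNumber` and `readWeight` are hypothetical helper names, and the low-half-first ordering inside each u32 is an assumption borrowed from the WGSL `unpack2x16float` convention.

```javascript
// Weight offsets quoted from the Pitfalls list: f16 indices, not byte offsets.
const OFFSETS = { enc0: 0, enc1: 724, bn: 1020, dec1: 1092, dec0: 1672 };
const TOTAL_F16 = 982 * 2; // 982 u32 words, two f16 halves per word

// Hypothetical helper: number of f16 values per layer, derived from
// consecutive offsets (the last layer runs to the end of the file).
function layerSizes(offsets, total) {
  const names = Object.keys(offsets).sort((a, b) => offsets[a] - offsets[b]);
  const sizes = {};
  names.forEach((name, i) => {
    const end = i + 1 < names.length ? offsets[names[i + 1]] : total;
    sizes[name] = end - offsets[name];
  });
  return sizes;
}

// Decode one IEEE 754 half-precision bit pattern into a JS number.
function f16ToNumber(bits) {
  const sign = bits & 0x8000 ? -1 : 1;
  const exp = (bits >> 10) & 0x1f;
  const frac = bits & 0x3ff;
  if (exp === 0) return sign * frac * 2 ** -24; // zero or subnormal
  if (exp === 31) return frac ? NaN : sign * Infinity;
  return sign * (1 + frac / 1024) * 2 ** (exp - 15);
}

// Read the weight at f16 index `index` from the raw u32 words.
// Assumption: the low 16 bits of each word hold the even-indexed value.
function readWeight(words, index) {
  const word = words[index >> 1];
  const bits = index & 1 ? word >>> 16 : word & 0xffff;
  return f16ToNumber(bits);
}
```

With a `Uint32Array` view over the dropped file, `readWeight(words, OFFSETS.enc1)` would then return the first enc1 weight, provided the half-ordering assumption holds for the exporter.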
