Diffstat (limited to 'cnn_v3')
-rw-r--r-- cnn_v3/docs/CNN_V3.md | 87
-rw-r--r-- cnn_v3/docs/HOWTO.md | 47
-rw-r--r-- cnn_v3/docs/HOW_TO_CNN.md | 32
-rw-r--r-- cnn_v3/docs/cnn_v3_architecture.png | bin 254783 -> 256685 bytes
-rw-r--r-- cnn_v3/docs/gen_architecture_png.py | 18
-rw-r--r-- cnn_v3/shaders/cnn_v3_bottleneck.wgsl | 54
-rw-r--r-- cnn_v3/shaders/cnn_v3_dec0.wgsl | 43
-rw-r--r-- cnn_v3/shaders/cnn_v3_dec1.wgsl | 85
-rw-r--r-- cnn_v3/shaders/cnn_v3_enc0.wgsl | 37
-rw-r--r-- cnn_v3/shaders/cnn_v3_enc1.wgsl | 75
-rw-r--r-- cnn_v3/src/cnn_v3_effect.cc | 247
-rw-r--r-- cnn_v3/src/cnn_v3_effect.h | 101
-rw-r--r-- cnn_v3/test_vectors.h | 667
-rw-r--r-- cnn_v3/tools/shaders.js | 186
-rw-r--r-- cnn_v3/tools/tester.js | 88
-rw-r--r-- cnn_v3/tools/weights.js | 4
-rw-r--r-- cnn_v3/training/export_cnn_v3_weights.py | 51
-rw-r--r-- cnn_v3/training/gen_test_vectors.py | 91
-rw-r--r-- cnn_v3/training/infer_cnn_v3.py | 4
-rw-r--r-- cnn_v3/training/train_cnn_v3.py | 28
20 files changed, 1241 insertions, 704 deletions
diff --git a/cnn_v3/docs/CNN_V3.md b/cnn_v3/docs/CNN_V3.md
index d775e2b..a197a1d 100644
--- a/cnn_v3/docs/CNN_V3.md
+++ b/cnn_v3/docs/CNN_V3.md
@@ -19,7 +19,7 @@ CNN v3 is a next-generation post-processing effect using:
- Training from both Blender renders and real photos
- Strict test framework: per-pixel bit-exact validation across all implementations
-**Status:** Phases 1–5 complete. Parity validated (max_err=4.88e-4 ≤ 1/255). Next: `train_cnn_v3.py` for FiLM MLP training.
+**Status:** Phases 1–7 complete. Architecture upgraded to enc_channels=[8,16] for improved capacity. Parity test and runtime updated. Next: training pass.
---
@@ -52,14 +52,14 @@ A small MLP takes a conditioning vector `c` and outputs all γ/β:
c = [beat_phase, beat_time/8, audio_intensity, style_p0, style_p1] (5D)
↓ Linear(5 → 16) → ReLU
↓ Linear(16 → N_film_params)
- → [γ_enc0(4ch), β_enc0(4ch), γ_enc1(8ch), β_enc1(8ch),
- γ_dec1(4ch), β_dec1(4ch), γ_dec0(4ch), β_dec0(4ch)]
- = 2 × (4+8+4+4) = 40 parameters output
+ → [γ_enc0(8ch), β_enc0(8ch), γ_enc1(16ch), β_enc1(16ch),
+ γ_dec1(8ch), β_dec1(8ch), γ_dec0(4ch), β_dec0(4ch)]
+ = 2 × (8+16+8+4) = 72 parameters output
```
**Runtime cost:** trivial (one MLP forward pass per frame, CPU-side).
**Training:** jointly trained with U-Net — backprop through FiLM to MLP.
-**Size:** MLP weights ~(5×16 + 16×40) × 2 bytes f16 ≈ 1.4 KB.
+**Size:** MLP weights ~(5×16 + 16×72) × 2 bytes f16 ≈ 2.5 KB.
**Why FiLM instead of just uniform parameters?**
- γ/β are per-channel, enabling fine-grained style control
@@ -318,22 +318,25 @@ All f16, little-endian, same packing as v2 (`pack2x16float`).
## Size Budget
-**CNN v3 target: ≤ 6 KB weights**
+**CNN v3 target: ≤ 6 KB weights (conv only); current arch prioritises quality**
-**Implemented architecture (fits ≤ 4 KB):**
+**Implemented architecture (enc_channels=[8,16] — ~15.3 KB conv f16):**
| Component | Weights | Bias | Total f16 |
|-----------|---------|------|-----------|
-| enc0: Conv(20→4, 3×3) | 20×4×9=720 | +4 | 724 |
-| enc1: Conv(4→8, 3×3) | 4×8×9=288 | +8 | 296 |
-| bottleneck: Conv(8→8, 3×3, dil=2) | 8×8×9=576 | +8 | 584 |
-| dec1: Conv(16→4, 3×3) | 16×4×9=576 | +4 | 580 |
-| dec0: Conv(8→4, 3×3) | 8×4×9=288 | +4 | 292 |
-| FiLM MLP (5→16→40) | 5×16+16×40=720 | +16+40 | 776 |
-| **Total conv** | | | **~4.84 KB f16** |
+| enc0: Conv(20→8, 3×3) | 20×8×9=1440 | +8 | 1448 |
+| enc1: Conv(8→16, 3×3) | 8×16×9=1152 | +16 | 1168 |
+| bottleneck: Conv(16→16, 3×3, dil=2) | 16×16×9=2304 | +16 | 2320 |
+| dec1: Conv(32→8, 3×3) | 32×8×9=2304 | +8 | 2312 |
+| dec0: Conv(16→4, 3×3) | 16×4×9=576 | +4 | 580 |
+| **Total conv** | | | **7828 f16 = ~15.3 KB** |
+| FiLM MLP (5→16→72) | 5×16+16×72=1232 | +16+72 | 1320 |
+| **Total incl. MLP** | | | **9148 f16 = ~17.9 KB** |
-Skip connections: dec1 input = 8ch (bottleneck) + 8ch (enc1 skip) = 16ch.
-dec0 input = 4ch (dec1) + 4ch (enc0 skip) = 8ch.
+Skip connections: dec1 input = 16ch (bottleneck up) + 16ch (enc1 skip) = 32ch.
+dec0 input = 8ch (dec1 up) + 8ch (enc0 skip) = 16ch.
+
+**Smaller variant (enc_channels=[4,8] — ~4.84 KB conv f16):** fits 6 KB target but has lower representational capacity. Train with `--enc-channels 4,8` if size-critical.
---
@@ -507,7 +510,7 @@ All tests: max per-pixel per-channel absolute error ≤ 1/255 (PyTorch f32 vs We
```python
class CNNv3(nn.Module):
- def __init__(self, enc_channels=[4,8], film_cond_dim=5):
+ def __init__(self, enc_channels=[8,16], film_cond_dim=5):
super().__init__()
# Encoder
self.enc = nn.ModuleList([
@@ -681,11 +684,11 @@ Parity results:
```
Pass 0: pack_gbuffer.wgsl — assemble G-buffer channels into storage texture
-Pass 1: cnn_v3_enc0.wgsl — encoder level 0 (20→4ch, 3×3)
-Pass 2: cnn_v3_enc1.wgsl — encoder level 1 (4→8ch, 3×3) + downsample
-Pass 3: cnn_v3_bottleneck.wgsl — bottleneck (8→8, 3×3, dilation=2)
-Pass 4: cnn_v3_dec1.wgsl — decoder level 1: upsample + skip + (16→4, 3×3)
-Pass 5: cnn_v3_dec0.wgsl — decoder level 0: upsample + skip + (8→4, 3×3)
+Pass 1: cnn_v3_enc0.wgsl — encoder level 0 (20→8ch, 3×3)
+Pass 2: cnn_v3_enc1.wgsl — encoder level 1 (8→16ch, 3×3) + downsample
+Pass 3: cnn_v3_bottleneck.wgsl — bottleneck (16→16, 3×3, dilation=2)
+Pass 4: cnn_v3_dec1.wgsl — decoder level 1: upsample + skip + (32→8, 3×3)
+Pass 5: cnn_v3_dec0.wgsl — decoder level 0: upsample + skip + (16→4, 3×3)
Pass 6: cnn_v3_output.wgsl — sigmoid + composite to framebuffer
```
@@ -788,11 +791,11 @@ Status bar shows which channels are loaded.
| Shader | Replaces | Notes |
|--------|----------|-------|
| `PACK_SHADER` | `STATIC_SHADER` | 20ch into feat_tex0 + feat_tex1 (rgba32uint each) |
-| `ENC0_SHADER` | part of `CNN_SHADER` | Conv(20→4, 3×3) + FiLM + ReLU; writes enc0_tex |
-| `ENC1_SHADER` | | Conv(4→8, 3×3) + FiLM + ReLU + avg_pool2×2; writes enc1_tex (half-res) |
-| `BOTTLENECK_SHADER` | | Conv(8→8, 3×3, dilation=2) + ReLU; writes bn_tex |
-| `DEC1_SHADER` | | nearest upsample×2 + concat(bn, enc1_skip) + Conv(16→4, 3×3) + FiLM + ReLU |
-| `DEC0_SHADER` | | nearest upsample×2 + concat(dec1, enc0_skip) + Conv(8→4, 3×3) + FiLM + ReLU |
+| `ENC0_SHADER` | part of `CNN_SHADER` | Conv(20→8, 3×3) + FiLM + ReLU; writes enc0_tex (rgba32uint, 8ch) |
+| `ENC1_SHADER` | | Conv(8→16, 3×3) + FiLM + ReLU + avg_pool2×2; writes enc1_lo+enc1_hi (2× rgba32uint, 16ch split) |
+| `BOTTLENECK_SHADER` | | Conv(16→16, 3×3, dilation=2) + ReLU; writes bn_lo+bn_hi (2× rgba32uint, 16ch split) |
+| `DEC1_SHADER` | | nearest upsample×2 + concat(bn, enc1_skip) + Conv(32→8, 3×3) + FiLM + ReLU; writes dec1_tex (rgba32uint, 8ch) |
+| `DEC0_SHADER` | | nearest upsample×2 + concat(dec1, enc0_skip) + Conv(16→4, 3×3) + FiLM + ReLU; writes rgba16float |
| `OUTPUT_SHADER` | | Conv(4→4, 1×1) + sigmoid → composites to canvas |
FiLM γ/β computed JS-side from sliders (tiny MLP forward pass in JS), uploaded as uniform.
@@ -805,15 +808,15 @@ FiLM γ/β computed JS-side from sliders (tiny MLP forward pass in JS), uploaded
|------|------|--------|----------|
| `feat_tex0` | W×H | rgba32uint | feature buffer slots 0–7 (f16) |
| `feat_tex1` | W×H | rgba32uint | feature buffer slots 8–19 (u8+spare) |
-| `enc0_tex` | W×H | rgba32uint | 4 channels f16 (enc0 output, skip) |
-| `enc1_tex` | W/2×H/2 | rgba32uint | 8 channels f16 (enc1 out, skip) — 2 texels per pixel |
-| `bn_tex` | W/2×H/2 | rgba32uint | 8 channels f16 (bottleneck output) |
-| `dec1_tex` | W×H | rgba32uint | 4 channels f16 (dec1 output) |
-| `dec0_tex` | W×H | rgba32uint | 4 channels f16 (dec0 output) |
+| `enc0_tex` | W×H | rgba32uint | 8 channels f16 (enc0 output, skip) |
+| `enc1_lo` + `enc1_hi` | W/2×H/2 each | rgba32uint | 16 channels f16 split (enc1 out, skip) |
+| `bn_lo` + `bn_hi` | W/4×H/4 each | rgba32uint | 16 channels f16 split (bottleneck output) |
+| `dec1_tex` | W/2×H/2 | rgba32uint | 8 channels f16 (dec1 output) |
+| `dec0_tex` | W×H | rgba16float | 4 channels f16 (final RGBA output) |
| `prev_tex` | W×H | rgba16float | previous CNN output (temporal, `F16X8`) |
-Skip connections: enc0_tex and enc1_tex are **kept alive** across the full forward pass
-(not ping-ponged away). DEC1 and DEC0 read them directly.
+Skip connections: enc0_tex (8ch) and enc1_lo/enc1_hi (16ch split) are **kept alive** across the
+full forward pass (not ping-ponged away). DEC1 and DEC0 read them directly.
---
@@ -856,7 +859,7 @@ python3 -m http.server 8000
Ordered for parallel execution where possible. Phases 1 and 2 are independent.
-**Architecture locked:** enc_channels = [4, 8]. See Size Budget for weight counts.
+**Architecture:** enc_channels = [8, 16]. See Size Budget for weight counts.
---
@@ -881,7 +884,7 @@ before the real G-buffer exists. Wire real G-buffer in Phase 5.
**1a. PyTorch model**
- [ ] `cnn_v3/training/train_cnn_v3.py`
- - [ ] `CNNv3` class: U-Net [4,8], FiLM MLP (5→16→48), channel dropout
+ - [ ] `CNNv3` class: U-Net [8,16], FiLM MLP (5→16→72), channel dropout
- [ ] `GBufferDataset`: loads 20-channel feature tensors from packed PNGs
- [ ] Training loop, checkpointing, grayscale/RGBA loss option
@@ -919,11 +922,11 @@ no batch norm at inference, `#include` existing snippets where possible.
- writes feat_tex0 (f16×8) + feat_tex1 (u8×12, spare)
**2b. U-Net compute shaders**
-- [ ] `src/effects/cnn_v3_enc0.wgsl` — Conv(20→4, 3×3) + FiLM + ReLU
-- [ ] `src/effects/cnn_v3_enc1.wgsl` — Conv(4→8, 3×3) + FiLM + ReLU + avg_pool 2×2
-- [ ] `src/effects/cnn_v3_bottleneck.wgsl` — Conv(8→8, 1×1) + FiLM + ReLU
-- [ ] `src/effects/cnn_v3_dec1.wgsl` — nearest upsample×2 + concat enc1_skip + Conv(16→4, 3×3) + FiLM + ReLU
-- [ ] `src/effects/cnn_v3_dec0.wgsl` — nearest upsample×2 + concat enc0_skip + Conv(8→4, 3×3) + FiLM + ReLU
+- [ ] `src/effects/cnn_v3_enc0.wgsl` — Conv(20→8, 3×3) + FiLM + ReLU
+- [ ] `src/effects/cnn_v3_enc1.wgsl` — Conv(8→16, 3×3) + FiLM + ReLU + avg_pool 2×2
+- [ ] `src/effects/cnn_v3_bottleneck.wgsl` — Conv(16→16, 3×3, dilation=2) + ReLU
+- [ ] `src/effects/cnn_v3_dec1.wgsl` — nearest upsample×2 + concat enc1_skip + Conv(32→8, 3×3) + FiLM + ReLU
+- [ ] `src/effects/cnn_v3_dec0.wgsl` — nearest upsample×2 + concat enc0_skip + Conv(16→4, 3×3) + FiLM + ReLU
- [ ] `src/effects/cnn_v3_output.wgsl` — Conv(4→4, 1×1) + sigmoid → composite to framebuffer
Reuse from existing shaders:
@@ -941,7 +944,7 @@ Reuse from existing shaders:
- [ ] `src/effects/cnn_v3_effect.h` — class declaration
- textures: feat_tex0, feat_tex1, enc0_tex, enc1_tex (half-res), bn_tex (half-res), dec1_tex, dec0_tex
- **`WGPUTexture prev_cnn_tex_`** — persistent RGBA8, owned by effect, initialized black
- - `FilmParams` uniform buffer (γ/β for 4 levels = 48 floats = 192 bytes)
+ - `FilmParams` uniform buffer (γ/β for 4 levels = 72 floats = 288 bytes)
- FiLM MLP weights (loaded from .bin, run CPU-side per frame)
- [ ] `src/effects/cnn_v3_effect.cc` — implementation
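Reviewer note on the CNN_V3.md changes: the FiLM output size (40 before, 72 after) follows directly from the per-layer channel counts listed in the doc. The sketch below is a hedged illustration, not code from the repository; `film_layout` and `mlp_value_count` are hypothetical helper names, and the slice order simply mirrors the order in which the doc lists γ/β.

```python
def film_layout(enc_channels, out_channels=4):
    """Return ([(name, start, stop)], total) slices into the FiLM MLP output.

    Assumption: parameters are laid out in the order the doc lists them:
    gamma/beta for enc0, enc1, dec1, dec0.
    """
    c0, c1 = enc_channels
    # Channels modulated by FiLM per layer; dec1 outputs c0 channels again.
    layers = [("enc0", c0), ("enc1", c1), ("dec1", c0), ("dec0", out_channels)]
    layout, pos = [], 0
    for name, ch in layers:
        for kind in ("gamma", "beta"):
            layout.append((f"{kind}_{name}", pos, pos + ch))
            pos += ch
    return layout, pos


def mlp_value_count(cond_dim, hidden, n_film):
    """Weights plus biases of Linear(cond_dim, hidden) and Linear(hidden, n_film)."""
    return cond_dim * hidden + hidden + hidden * n_film + n_film


layout, n_film = film_layout([8, 16])
print(n_film, mlp_value_count(5, 16, n_film))
for name, start, stop in layout:
    print(f"{name:12s} [{start:2d}, {stop:2d})")
```

Running the same helper with `[4, 8]` reproduces the 40-parameter figure that the old text quoted.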
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 9a3efdf..ff8793f 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -267,22 +267,30 @@ Two source files:
```bash
cd cnn_v3/training
-# Patch-based (default) — 64×64 patches around Harris corners
-python3 train_cnn_v3.py \
+# Recommended: [8,16] channels + multi-scale loss (matches runtime)
+uv run python3 train_cnn_v3.py \
--input dataset/ \
- --input-mode simple \
- --epochs 200
+ --enc-channels 8,16 \
+ --epochs 5000 \
+ --checkpoint-dir checkpoints_8_16
# Full-image mode (resizes to 256×256)
-python3 train_cnn_v3.py \
+uv run python3 train_cnn_v3.py \
--input dataset/ \
- --input-mode full \
+ --enc-channels 8,16 \
--full-image --image-size 256 \
- --epochs 500
+ --epochs 5000
+
+# Size-budget variant [4,8] (fits 6 KB)
+uv run python3 train_cnn_v3.py \
+ --input dataset/ \
+ --enc-channels 4,8 \
+ --epochs 5000
# Quick smoke test: 1 epoch, small patches, random detector
-python3 train_cnn_v3.py \
+uv run python3 train_cnn_v3.py \
--input dataset/ --epochs 1 \
+ --enc-channels 8,16 \
--patch-size 32 --detector random
```
@@ -318,7 +326,7 @@ All other flags (`--epochs`, `--lr`, `--checkpoint-dir`, `--enc-channels`, etc.)
| `--detector` | `harris` | `harris` \| `shi-tomasi` \| `fast` \| `gradient` \| `random` |
| `--channel-dropout-p F` | `0.3` | Dropout prob for geometric channels |
| `--full-image` | off | Resize full image instead of cropping patches |
-| `--enc-channels C` | `4,8` | Encoder channel counts, comma-separated |
+| `--enc-channels C` | `4,8` | Encoder channel counts: `8,16` (current default runtime), `4,8` (size budget) |
| `--film-cond-dim N` | `5` | FiLM conditioning input size |
| `--epochs N` | `200` | Training epochs |
| `--batch-size N` | `16` | Batch size |
@@ -397,6 +405,7 @@ Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch referen
| 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
| 6 — FiLM MLP training | ✅ Done | train_cnn_v3.py + cnn_v3_utils.py written |
| 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
+| 8 — Architecture upgrade [8,16] | ✅ Done | enc_channels=[8,16], multi-scale loss, 16ch textures split into lo/hi pairs |
| 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |
---
@@ -408,10 +417,10 @@ The common snippet provides `get_w()` and `unpack_8ch()`.
| Pass | Shader | Input(s) | Output | Dims |
|------|--------|----------|--------|------|
-| enc0 | `cnn_v3_enc0.wgsl` | feat_tex0+feat_tex1 (20ch) | enc0_tex rgba16float (4ch) | full |
-| enc1 | `cnn_v3_enc1.wgsl` | enc0_tex (AvgPool2×2 inline) | enc1_tex rgba32uint (8ch) | ½ |
-| bottleneck | `cnn_v3_bottleneck.wgsl` | enc1_tex (AvgPool2×2 inline) | bottleneck_tex rgba32uint (8ch) | ¼ |
-| dec1 | `cnn_v3_dec1.wgsl` | bottleneck_tex + enc1_tex (skip) | dec1_tex rgba16float (4ch) | ½ |
+| enc0 | `cnn_v3_enc0.wgsl` | feat_tex0+feat_tex1 (20ch) | enc0_tex rgba32uint (8ch) | full |
+| enc1 | `cnn_v3_enc1.wgsl` | enc0_tex (AvgPool2×2 inline) | enc1_lo+enc1_hi rgba32uint (16ch split) | ½ |
+| bottleneck | `cnn_v3_bottleneck.wgsl` | enc1_lo+enc1_hi (AvgPool2×2 inline) | bn_lo+bn_hi rgba32uint (16ch split) | ¼ |
+| dec1 | `cnn_v3_dec1.wgsl` | bn_lo+bn_hi + enc1_lo+enc1_hi (skip) | dec1_tex rgba32uint (8ch) | ½ |
| dec0 | `cnn_v3_dec0.wgsl` | dec1_tex + enc0_tex (skip) | output_tex rgba16float (4ch) | full |
**Parity rules baked into the shaders:**
@@ -437,12 +446,12 @@ FiLM γ/β are computed CPU-side by the FiLM MLP (Phase 4) and uploaded each fra
**Weight offsets** (f16 units, including bias):
| Layer | Weights | Bias | Total f16 |
|-------|---------|------|-----------|
-| enc0 | 20×4×9=720 | +4 | 724 |
-| enc1 | 4×8×9=288 | +8 | 296 |
-| bottleneck | 8×8×9=576 | +8 | 584 |
-| dec1 | 16×4×9=576 | +4 | 580 |
-| dec0 | 8×4×9=288 | +4 | 292 |
-| **Total** | | | **2476 f16 = ~4.84 KB** |
+| enc0 | 20×8×9=1440 | +8 | 1448 |
+| enc1 | 8×16×9=1152 | +16 | 1168 |
+| bottleneck | 16×16×9=2304 | +16 | 2320 |
+| dec1 | 32×8×9=2304 | +8 | 2312 |
+| dec0 | 16×4×9=576 | +4 | 580 |
+| **Total** | | | **7828 f16 = ~15.3 KB** |
**Asset IDs** (registered in `workspaces/main/assets.txt` + `src/effects/shaders.cc`):
`SHADER_CNN_V3_COMMON`, `SHADER_CNN_V3_ENC0`, `SHADER_CNN_V3_ENC1`,
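Reviewer note on the HOWTO.md weight table: the per-layer totals can be re-derived from the channel counts, which makes it easy to check the C++ offset constants after a change to `--enc-channels`. This is a minimal sketch under one stated assumption: layers are stored back to back in the order enc0, enc1, bottleneck, dec1, dec0. The helper names are hypothetical.

```python
def conv_layers(enc_channels, in_channels=20, out_channels=4):
    """(name, c_in, c_out) for each 3x3 conv, following the table above."""
    c0, c1 = enc_channels
    return [
        ("enc0", in_channels, c0),
        ("enc1", c0, c1),
        ("bottleneck", c1, c1),
        ("dec1", c1 + c1, c0),            # upsampled bottleneck + enc1 skip
        ("dec0", c0 + c0, out_channels),  # upsampled dec1 + enc0 skip
    ]


def weight_offsets(enc_channels):
    """Return ({name: (offset, size)}, total), all in f16 units.

    Assumption: layers are packed contiguously in the order listed.
    """
    offsets, pos = {}, 0
    for name, c_in, c_out in conv_layers(enc_channels):
        size = c_in * c_out * 9 + c_out  # 3x3 kernel weights + bias
        offsets[name] = (pos, size)
        pos += size
    return offsets, pos


offsets, total = weight_offsets([8, 16])
for name, (off, size) in offsets.items():
    print(f"{name:10s} offset={off:5d} size={size:5d}")
print(f"total={total} f16 = {total * 2} bytes")
```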
diff --git a/cnn_v3/docs/HOW_TO_CNN.md b/cnn_v3/docs/HOW_TO_CNN.md
index 09db97c..11ed260 100644
--- a/cnn_v3/docs/HOW_TO_CNN.md
+++ b/cnn_v3/docs/HOW_TO_CNN.md
@@ -358,7 +358,7 @@ uv run train_cnn_v3.py \
The model prints its parameter count:
```
-Model: enc=[4, 8] film_cond_dim=5 params=3252 (~6.4 KB f16)
+Model: enc=[8, 16] film_cond_dim=5 params=9148 (~17.9 KB f16)
```
If `params` is much higher, `--enc-channels` was changed; update C++ constants accordingly.
@@ -492,12 +492,12 @@ WEIGHTS_CNN_V3_FILM_MLP, BINARY, weights/cnn_v3_film_mlp.bin, "CNN v3 FiLM MLP w
| Layer | f16 count | Bytes |
|-------|-----------|-------|
-| enc0 Conv(20→4,3×3)+bias | 724 | — |
-| enc1 Conv(4→8,3×3)+bias | 296 | — |
-| bottleneck Conv(8→8,3×3,dil=2)+bias | 584 | — |
-| dec1 Conv(16→4,3×3)+bias | 580 | — |
-| dec0 Conv(8→4,3×3)+bias | 292 | — |
-| **Total** | **2476 f16** | **4952 bytes** |
+| enc0 Conv(20→8,3×3)+bias | 1448 | — |
+| enc1 Conv(8→16,3×3)+bias | 1168 | — |
+| bottleneck Conv(16→16,3×3,dil=2)+bias | 2320 | — |
+| dec1 Conv(32→8,3×3)+bias | 2312 | — |
+| dec0 Conv(16→4,3×3)+bias | 580 | — |
+| **Total** | **7828 f16** | **15656 bytes** |
**`cnn_v3_film_mlp.bin`** — FiLM MLP weights as raw f32, row-major:
@@ -505,9 +505,9 @@ WEIGHTS_CNN_V3_FILM_MLP, BINARY, weights/cnn_v3_film_mlp.bin, "CNN v3 FiLM MLP w
|-------|-------|-----------|
| L0 weight | (16, 5) | 80 |
| L0 bias | (16,) | 16 |
-| L1 weight | (40, 16) | 640 |
-| L1 bias | (40,) | 40 |
-| **Total** | | **776 f32 = 3104 bytes** |
+| L1 weight | (72, 16) | 1152 |
+| L1 bias | (72,) | 72 |
+| **Total** | | **1320 f32 = 5280 bytes** |
The FiLM MLP is for CPU-side inference (future — see §4d). The U-Net weights in
`cnn_v3_weights.bin` are what you need immediately.
@@ -524,16 +524,16 @@ The export script produces this layout: `u32 = u16[0::2] | (u16[1::2] << 16)`.
```
Checkpoint: epoch=200 loss=0.012345
- enc_channels=[4, 8] film_cond_dim=5
+ enc_channels=[8, 16] film_cond_dim=5
cnn_v3_weights.bin
- 2476 f16 values → 1238 u32 → 4952 bytes
- Upload via CNNv3Effect::upload_weights(queue, data, 4952)
+ 7828 f16 values → 3914 u32 → 15656 bytes
+ Upload via CNNv3Effect::upload_weights(queue, data, 15656)
cnn_v3_film_mlp.bin
L0: weight (16, 5) + bias (16,)
- L1: weight (40, 16) + bias (40,)
- 776 f32 values → 3104 bytes
+ L1: weight (72, 16) + bias (72,)
+ 1320 f32 values → 5280 bytes
```
### Pitfalls
@@ -542,7 +542,7 @@ cnn_v3_film_mlp.bin
assertion in the export script fires. The C++ weight-offset constants (`kEnc0Weights` etc.)
in `cnn_v3_effect.cc` must also be updated to match.
- **Old checkpoint missing `config`:** if `config` key is absent (checkpoint from a very early
- version), the script defaults to `enc_channels=[4,8], film_cond_dim=5`.
+ version), the script defaults to `enc_channels=[8,16], film_cond_dim=5`.
- **`weights_only=True`:** requires PyTorch ≥ 2.0. If you get a warning, upgrade torch.
---
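Reviewer note on the export sizes: the `7828 f16 values → 3914 u32 → 15656 bytes` line follows from packing two f16 values per u32, even index in the low half, as the doc's `u32 = u16[0::2] | (u16[1::2] << 16)` formula states. The sketch below reproduces that layout with the standard library only. `pack_f16_pairs` and `read_f16` are hypothetical names, not the export script's functions, and the zero padding for odd counts is an assumption.

```python
import struct


def pack_f16_pairs(values):
    """Pack floats as f16 pairs into little-endian u32 words.

    The even-indexed value goes into the low 16 bits of each word.
    """
    values = list(values)
    if len(values) % 2:
        values.append(0.0)  # assumption: pad odd counts with a zero
    halves = [struct.unpack("<H", struct.pack("<e", v))[0] for v in values]
    words = [lo | (hi << 16) for lo, hi in zip(halves[0::2], halves[1::2])]
    return struct.pack(f"<{len(words)}I", *words)


def read_f16(blob, index):
    """Read the f16 value at a given f16 index from the packed u32 blob."""
    word = struct.unpack_from("<I", blob, 4 * (index // 2))[0]
    half = (word >> (16 * (index % 2))) & 0xFFFF
    return struct.unpack("<e", struct.pack("<H", half))[0]


blob = pack_f16_pairs([0.5] * 7828)
print(len(blob) // 4, "u32 words,", len(blob), "bytes")

small = pack_f16_pairs([1.0, -2.0, 0.25, 3.5])
print([read_f16(small, i) for i in range(4)])
```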
diff --git a/cnn_v3/docs/cnn_v3_architecture.png b/cnn_v3/docs/cnn_v3_architecture.png
index 2116c2b..474f488 100644
--- a/cnn_v3/docs/cnn_v3_architecture.png
+++ b/cnn_v3/docs/cnn_v3_architecture.png
Binary files differ
diff --git a/cnn_v3/docs/gen_architecture_png.py b/cnn_v3/docs/gen_architecture_png.py
index bd60a97..1c2ff65 100644
--- a/cnn_v3/docs/gen_architecture_png.py
+++ b/cnn_v3/docs/gen_architecture_png.py
@@ -108,20 +108,20 @@ def dim_label(x, y, txt):
box(EX, Y_IN, BW, BH_IO, C_IO, 'G-Buffer Features',
'20 channels · full res')
-box(EX, Y_E0, BW, BH, C_ENC, 'enc0 Conv(20→4, 3×3) + FiLM + ReLU',
- 'full res · 4 ch')
+box(EX, Y_E0, BW, BH, C_ENC, 'enc0 Conv(20→8, 3×3) + FiLM + ReLU',
+ 'full res · 8 ch')
-box(EX, Y_E1, BW, BH, C_ENC, 'enc1 Conv(4→8, 3×3) + FiLM + ReLU',
- '½ res · 8 ch · (AvgPool↓ on input)')
+box(EX, Y_E1, BW, BH, C_ENC, 'enc1 Conv(8→16, 3×3) + FiLM + ReLU',
+ '½ res · 16 ch · (AvgPool↓ on input)')
box(BX, Y_BN, BW_BN, BH_BN, C_BN,
- 'bottleneck Conv(8→8, 3×3, dilation=2) + ReLU',
- '¼ res · 8 ch · no FiLM · effective RF ≈ 10 px @ ½res')
+ 'bottleneck Conv(16→16, 3×3, dilation=2) + ReLU',
+ '¼ res · 16 ch · no FiLM · effective RF ≈ 10 px @ ½res')
-box(DX, Y_D1, BW, BH, C_DEC, 'dec1 Conv(16→4, 3×3) + FiLM + ReLU',
- '½ res · 4 ch · (upsample↑ + cat enc1 skip)')
+box(DX, Y_D1, BW, BH, C_DEC, 'dec1 Conv(32→8, 3×3) + FiLM + ReLU',
+ '½ res · 8 ch · (upsample↑ + cat enc1 skip)')
-box(DX, Y_D0, BW, BH, C_DEC, 'dec0 Conv(8→4, 3×3) + FiLM + sigmoid',
+box(DX, Y_D0, BW, BH, C_DEC, 'dec0 Conv(16→4, 3×3) + FiLM + sigmoid',
'full res · 4 ch · (upsample↑ + cat enc0 skip)')
box(DX, Y_OUT, BW, BH_IO, C_IO, 'RGBA Output',
diff --git a/cnn_v3/shaders/cnn_v3_bottleneck.wgsl b/cnn_v3/shaders/cnn_v3_bottleneck.wgsl
index e30682b..09819cc 100644
--- a/cnn_v3/shaders/cnn_v3_bottleneck.wgsl
+++ b/cnn_v3/shaders/cnn_v3_bottleneck.wgsl
@@ -1,43 +1,49 @@
// CNN v3 — Bottleneck
-// AvgPool2x2(enc1) + Conv(8->8, 3x3, dilation=2) + ReLU (no FiLM)
+// AvgPool2x2(enc1) + Conv(16->16, 3x3, dilation=2) + ReLU (no FiLM)
//
-// Input: enc1_tex (rgba32uint, 8xf16) half-res
-// Output: bottleneck_out (rgba32uint, 8xf16) quarter-res (dispatch at quarter-res dims)
+// Input: enc1_tex_lo (rgba32uint, 8xf16) half-res ch 0-7
+// enc1_tex_hi (rgba32uint, 8xf16) half-res ch 8-15
+// Output: bn_out_lo (rgba32uint, 8xf16) quarter-res
+// bn_out_hi (rgba32uint, 8xf16) quarter-res
//
// Weight layout (f16, OIHW + bias):
-// [0 .. 8*8*9) conv: w[out][in][ky*3+kx] (3x3 kernel, OIHW)
-// [576 .. +8) bias: b[out]
+// [0 .. 16*16*9) conv: w[out][in][ky*3+kx]
+// [2304 .. +16) bias: b[out]
#include "cnn_v3/common"
-const BN_IN: u32 = 8u;
-const BN_OUT: u32 = 8u;
+const BN_IN: u32 = 16u;
+const BN_OUT: u32 = 16u;
const BN_DILATION: i32 = 2;
struct Params {
weight_offset: u32,
- _pad0: u32, _pad1: u32, _pad2: u32, // 3 explicit pads: array<u32,3> invalid in uniform
+ _pad0: u32, _pad1: u32, _pad2: u32,
}
-@group(0) @binding(0) var enc1_tex: texture_2d<u32>;
-@group(0) @binding(1) var<storage, read> weights: array<u32>;
-@group(0) @binding(2) var<uniform> params: Params;
-@group(0) @binding(3) var bottleneck_out: texture_storage_2d<rgba32uint, write>;
+@group(0) @binding(0) var enc1_tex_lo: texture_2d<u32>;
+@group(0) @binding(1) var enc1_tex_hi: texture_2d<u32>;
+@group(0) @binding(2) var<storage, read> weights: array<u32>;
+@group(0) @binding(3) var<uniform> params: Params;
+@group(0) @binding(4) var bn_out_lo: texture_storage_2d<rgba32uint, write>;
+@group(0) @binding(5) var bn_out_hi: texture_storage_2d<rgba32uint, write>;
-// Avg-pool 2x2 from enc1_tex at quarter-res coord qcoord.
-// Returns zeros for OOB quarter-res coords (zero-padding for the 3x3 conv).
-fn load_enc1_avg(qcoord: vec2i, half_dims: vec2i) -> array<f32, 8> {
+fn load_enc1_avg(qcoord: vec2i, half_dims: vec2i) -> array<f32, 16> {
let quart_dims = half_dims / 2;
if (qcoord.x < 0 || qcoord.y < 0 || qcoord.x >= quart_dims.x || qcoord.y >= quart_dims.y) {
- return array<f32, 8>(0., 0., 0., 0., 0., 0., 0., 0.);
+ return array<f32, 16>(0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.);
}
let base = qcoord * 2;
var s: array<f32, BN_IN>;
for (var dy: i32 = 0; dy < 2; dy++) {
for (var dx: i32 = 0; dx < 2; dx++) {
let hc = clamp(base + vec2i(dx, dy), vec2i(0), half_dims - vec2i(1));
- let f = unpack_8ch(enc1_tex, hc);
- for (var i: u32 = 0u; i < BN_IN; i++) { s[i] += f[i]; }
+ let lo = unpack_8ch(enc1_tex_lo, hc);
+ let hi = unpack_8ch(enc1_tex_hi, hc);
+ for (var i: u32 = 0u; i < 8u; i++) {
+ s[i] += lo[i];
+ s[i + 8u] += hi[i];
+ }
}
}
for (var i: u32 = 0u; i < BN_IN; i++) { s[i] *= 0.25; }
@@ -46,7 +52,7 @@ fn load_enc1_avg(qcoord: vec2i, half_dims: vec2i) -> array<f32, 8> {
@compute @workgroup_size(8, 8)
fn bottleneck_main(@builtin(global_invocation_id) id: vec3u) {
- let half_dims = vec2i(textureDimensions(enc1_tex));
+ let half_dims = vec2i(textureDimensions(enc1_tex_lo));
let quart_dims = half_dims / 2;
let coord = vec2i(id.xy);
if (coord.x >= quart_dims.x || coord.y >= quart_dims.y) { return; }
@@ -55,7 +61,7 @@ fn bottleneck_main(@builtin(global_invocation_id) id: vec3u) {
var out: array<f32, BN_OUT>;
for (var o: u32 = 0u; o < BN_OUT; o++) {
- var sum = get_w(wo, BN_OUT * BN_IN * 9u + o); // bias (at end of 3x3 conv weights)
+ var sum = get_w(wo, BN_OUT * BN_IN * 9u + o); // bias
for (var ky: i32 = -1; ky <= 1; ky++) {
for (var kx: i32 = -1; kx <= 1; kx++) {
let feat = load_enc1_avg(coord + vec2i(kx, ky) * BN_DILATION, half_dims);
@@ -68,10 +74,16 @@ fn bottleneck_main(@builtin(global_invocation_id) id: vec3u) {
out[o] = max(0.0, sum);
}
- textureStore(bottleneck_out, coord, vec4u(
+ textureStore(bn_out_lo, coord, vec4u(
pack2x16float(vec2f(out[0], out[1])),
pack2x16float(vec2f(out[2], out[3])),
pack2x16float(vec2f(out[4], out[5])),
pack2x16float(vec2f(out[6], out[7]))
));
+ textureStore(bn_out_hi, coord, vec4u(
+ pack2x16float(vec2f(out[8], out[9])),
+ pack2x16float(vec2f(out[10], out[11])),
+ pack2x16float(vec2f(out[12], out[13])),
+ pack2x16float(vec2f(out[14], out[15]))
+ ));
}
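Reviewer note on the bottleneck shader: `load_enc1_avg` does two things at once, a 2×2 average pool from half-res to quarter-res and zero padding for taps that fall outside the quarter-res grid. The single-channel Python reference below mirrors that logic for illustration; it is a sketch of what the WGSL in this diff appears to do, with hypothetical names and plain nested lists standing in for textures.

```python
def load_avg(half, qx, qy):
    """2x2 average of one half-res channel at quarter-res coord (qx, qy).

    Returns 0.0 for out-of-bounds quarter-res coords (zero padding).
    `half` is a list of rows of floats.
    """
    hh, hw = len(half), len(half[0])
    qh, qw = hh // 2, hw // 2
    if qx < 0 or qy < 0 or qx >= qw or qy >= qh:
        return 0.0
    total = 0.0
    for dy in range(2):
        for dx in range(2):
            # Clamp to the half-res grid, as the shader does.
            y = min(max(2 * qy + dy, 0), hh - 1)
            x = min(max(2 * qx + dx, 0), hw - 1)
            total += half[y][x]
    return 0.25 * total


def dilated_taps(qx, qy, dilation=2):
    """Quarter-res coords read by a 3x3 kernel with the given dilation."""
    return [(qx + kx * dilation, qy + ky * dilation)
            for ky in (-1, 0, 1) for kx in (-1, 0, 1)]


half = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(load_avg(half, 0, 0), load_avg(half, 1, 1), load_avg(half, 2, 0))
print(dilated_taps(0, 0))
```

With dilation 2 on a small quarter-res grid, most taps near the border land outside the grid and contribute zeros.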
diff --git a/cnn_v3/shaders/cnn_v3_dec0.wgsl b/cnn_v3/shaders/cnn_v3_dec0.wgsl
index a2a70ac..617b5a2 100644
--- a/cnn_v3/shaders/cnn_v3_dec0.wgsl
+++ b/cnn_v3/shaders/cnn_v3_dec0.wgsl
@@ -1,19 +1,17 @@
// CNN v3 — Decoder level 0 + output
-// NearestUp2x(dec1) + cat(enc0_skip) -> Conv(8->4, 3x3, zero-pad) + FiLM + ReLU + Sigmoid
+// NearestUp2x(dec1) + cat(enc0_skip) -> Conv(16->4, 3x3) + FiLM + ReLU + Sigmoid
//
-// Inputs: dec1_tex (rgba16float, 4ch) half-res
-// enc0_tex (rgba16float, 4ch) full-res (skip connection)
-// Output: output_tex (rgba16float, 4ch) full-res (dispatch at full-res dims)
+// Inputs: dec1_tex (rgba32uint, 8xf16) half-res
+// enc0_tex (rgba32uint, 8xf16) full-res (skip connection)
+// Output: output_tex (rgba16float, 4ch) full-res
//
// Weight layout (f16, OIHW + bias):
-// [0 .. 8*4*9) conv: w[out][in][ky][kx] (in=8: 4 dec1 + 4 enc0 skip)
-// [288 .. +4) bias: b[out]
-//
-// Parity note: sigmoid applied after FiLM+ReLU, not after raw conv (matches train_cnn_v3.py).
+// [0 .. 16*4*9) conv: w[out][in][ky][kx] (in=16: 8 dec1 + 8 enc0 skip)
+// [576 .. +4) bias: b[out]
#include "cnn_v3/common"
-const DEC0_IN: u32 = 8u;
+const DEC0_IN: u32 = 16u;
const DEC0_OUT: u32 = 4u;
struct Params {
@@ -23,25 +21,27 @@ struct Params {
beta: vec4f,
}
-@group(0) @binding(0) var dec1_tex: texture_2d<f32>;
-@group(0) @binding(1) var enc0_tex: texture_2d<f32>;
+@group(0) @binding(0) var dec1_tex: texture_2d<u32>;
+@group(0) @binding(1) var enc0_tex: texture_2d<u32>;
@group(0) @binding(2) var<storage, read> weights: array<u32>;
@group(0) @binding(3) var<uniform> params: Params;
@group(0) @binding(4) var output_tex: texture_storage_2d<rgba16float, write>;
-// Load 8 concatenated channels at full-res coord:
-// ch 0-3: dec1 nearest-up (dec1_tex[coord/2])
-// ch 4-7: enc0 skip (enc0_tex[coord])
-// Returns zeros for OOB coord (zero-padding for the conv).
-fn load_dec0_concat(coord: vec2i, full_dims: vec2i) -> array<f32, 8> {
+// Load 16ch: ch 0-7 from dec1 nearest-up, ch 8-15 from enc0 skip.
+fn load_dec0_concat(coord: vec2i, full_dims: vec2i) -> array<f32, 16> {
+ var r: array<f32, 16>;
if (coord.x < 0 || coord.y < 0 || coord.x >= full_dims.x || coord.y >= full_dims.y) {
- return array<f32, 8>(0., 0., 0., 0., 0., 0., 0., 0.);
+ return r;
}
let half_dims = vec2i(textureDimensions(dec1_tex));
- let hc = clamp(coord / 2, vec2i(0), half_dims - vec2i(1));
- let d = textureLoad(dec1_tex, hc, 0);
- let e = textureLoad(enc0_tex, coord, 0);
- return array<f32, 8>(d.x, d.y, d.z, d.w, e.x, e.y, e.z, e.w);
+ let hc = clamp(coord / 2, vec2i(0), half_dims - vec2i(1));
+ let d = unpack_8ch(dec1_tex, hc);
+ let e = unpack_8ch(enc0_tex, coord);
+ for (var i: u32 = 0u; i < 8u; i++) {
+ r[i] = d[i];
+ r[i + 8u] = e[i];
+ }
+ return r;
}
@compute @workgroup_size(8, 8)
@@ -64,7 +64,6 @@ fn dec0_main(@builtin(global_invocation_id) id: vec3u) {
}
}
}
- // FiLM + ReLU + Sigmoid (matches training forward())
let v = max(0.0, params.gamma[o] * sum + params.beta[o]);
out[o] = 1.0 / (1.0 + exp(-v));
}
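Reviewer note on the dec0 output stage: the last two lines of the loop apply FiLM, then ReLU, then sigmoid. A scalar Python sketch of that chain is below (`dec0_activation` is a hypothetical name). One consequence worth keeping in mind when reading parity results: because the sigmoid input is already clamped to be non-negative, the output is never below 0.5. Whether that is intended is not stated in this diff.

```python
import math


def dec0_activation(conv_sum, gamma, beta):
    """FiLM (gamma * x + beta), then ReLU, then sigmoid, for one channel."""
    v = max(0.0, gamma * conv_sum + beta)
    return 1.0 / (1.0 + math.exp(-v))


for x in (-3.0, 0.0, 2.0):
    print(x, dec0_activation(x, gamma=1.0, beta=0.0))
```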
diff --git a/cnn_v3/shaders/cnn_v3_dec1.wgsl b/cnn_v3/shaders/cnn_v3_dec1.wgsl
index 28ae3dc..fadea3b 100644
--- a/cnn_v3/shaders/cnn_v3_dec1.wgsl
+++ b/cnn_v3/shaders/cnn_v3_dec1.wgsl
@@ -1,53 +1,71 @@
// CNN v3 — Decoder level 1
-// NearestUp2x(bottleneck) + cat(enc1_skip) -> Conv(16->4, 3x3, zero-pad) + FiLM + ReLU
+// NearestUp2x(bottleneck) + cat(enc1_skip) -> Conv(32->8, 3x3) + FiLM + ReLU
//
-// Inputs: bottleneck_tex (rgba32uint, 8xf16) quarter-res
-// enc1_tex (rgba32uint, 8xf16) half-res (skip connection)
-// Output: dec1_out (rgba16float, 4ch) half-res (dispatch at half-res dims)
+// Inputs: bn_tex_lo (rgba32uint, 8xf16) quarter-res ch 0-7
+// bn_tex_hi (rgba32uint, 8xf16) quarter-res ch 8-15
+// enc1_tex_lo (rgba32uint, 8xf16) half-res skip ch 0-7
+// enc1_tex_hi (rgba32uint, 8xf16) half-res skip ch 8-15
+// Output: dec1_out (rgba32uint, 8xf16) half-res
//
// Weight layout (f16, OIHW + bias):
-// [0 .. 16*4*9) conv: w[out][in][ky][kx] (in=16: 8 bottleneck + 8 enc1 skip)
-// [576 .. +4) bias: b[out]
+// [0 .. 32*8*9) conv: w[out][in][ky][kx] (in=32: 16 bn + 16 enc1 skip)
+// [2304 .. +8) bias: b[out]
#include "cnn_v3/common"
-const DEC1_IN: u32 = 16u;
-const DEC1_OUT: u32 = 4u;
+const DEC1_IN: u32 = 32u;
+const DEC1_OUT: u32 = 8u;
struct Params {
weight_offset: u32,
_pad: vec3u,
- gamma: vec4f,
- beta: vec4f,
+ gamma_lo: vec4f,
+ gamma_hi: vec4f,
+ beta_lo: vec4f,
+ beta_hi: vec4f,
}
-@group(0) @binding(0) var bottleneck_tex: texture_2d<u32>;
-@group(0) @binding(1) var enc1_tex: texture_2d<u32>;
-@group(0) @binding(2) var<storage, read> weights: array<u32>;
-@group(0) @binding(3) var<uniform> params: Params;
-@group(0) @binding(4) var dec1_out: texture_storage_2d<rgba16float, write>;
+@group(0) @binding(0) var bn_tex_lo: texture_2d<u32>;
+@group(0) @binding(1) var bn_tex_hi: texture_2d<u32>;
+@group(0) @binding(2) var enc1_tex_lo: texture_2d<u32>;
+@group(0) @binding(3) var enc1_tex_hi: texture_2d<u32>;
+@group(0) @binding(4) var<storage, read> weights: array<u32>;
+@group(0) @binding(5) var<uniform> params: Params;
+@group(0) @binding(6) var dec1_out: texture_storage_2d<rgba32uint, write>;
-// Load 16 concatenated channels at half-res coord hcoord:
-// ch 0-7: bottleneck nearest-up (bottleneck_tex[hcoord/2])
-// ch 8-15: enc1 skip (enc1_tex[hcoord])
-// Returns zeros for OOB hcoord (zero-padding for the conv).
-fn load_dec1_concat(hcoord: vec2i, half_dims: vec2i) -> array<f32, 16> {
+fn film_gamma(o: u32) -> f32 {
+ if (o < 4u) { return params.gamma_lo[o]; }
+ return params.gamma_hi[o - 4u];
+}
+fn film_beta(o: u32) -> f32 {
+ if (o < 4u) { return params.beta_lo[o]; }
+ return params.beta_hi[o - 4u];
+}
+
+// Load 32ch: [bn_nearest_up(16ch), enc1_skip(16ch)]
+fn load_dec1_concat(hcoord: vec2i, half_dims: vec2i) -> array<f32, 32> {
+ var r: array<f32, 32>;
if (hcoord.x < 0 || hcoord.y < 0 || hcoord.x >= half_dims.x || hcoord.y >= half_dims.y) {
- return array<f32, 16>(0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.);
+ return r;
}
let quart_dims = half_dims / 2;
- let qc = clamp(hcoord / 2, vec2i(0), quart_dims - vec2i(1));
- let b = unpack_8ch(bottleneck_tex, qc);
- let s = unpack_8ch(enc1_tex, hcoord);
- return array<f32, 16>(
- b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
- s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7]
- );
+ let qc = clamp(hcoord / 2, vec2i(0), quart_dims - vec2i(1));
+ let blo = unpack_8ch(bn_tex_lo, qc);
+ let bhi = unpack_8ch(bn_tex_hi, qc);
+ let slo = unpack_8ch(enc1_tex_lo, hcoord);
+ let shi = unpack_8ch(enc1_tex_hi, hcoord);
+ for (var i: u32 = 0u; i < 8u; i++) {
+ r[i] = blo[i];
+ r[i + 8u] = bhi[i];
+ r[i + 16u] = slo[i];
+ r[i + 24u] = shi[i];
+ }
+ return r;
}
@compute @workgroup_size(8, 8)
fn dec1_main(@builtin(global_invocation_id) id: vec3u) {
- let half_dims = vec2i(textureDimensions(enc1_tex));
+ let half_dims = vec2i(textureDimensions(enc1_tex_lo));
let coord = vec2i(id.xy);
if (coord.x >= half_dims.x || coord.y >= half_dims.y) { return; }
@@ -65,8 +83,13 @@ fn dec1_main(@builtin(global_invocation_id) id: vec3u) {
}
}
}
- out[o] = max(0.0, params.gamma[o] * sum + params.beta[o]);
+ out[o] = max(0.0, film_gamma(o) * sum + film_beta(o));
}
- textureStore(dec1_out, coord, vec4f(out[0], out[1], out[2], out[3]));
+ textureStore(dec1_out, coord, vec4u(
+ pack2x16float(vec2f(out[0], out[1])),
+ pack2x16float(vec2f(out[2], out[3])),
+ pack2x16float(vec2f(out[4], out[5])),
+ pack2x16float(vec2f(out[6], out[7]))
+ ));
}
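Reviewer note on the dec1 shader: the 32-channel input order matters for the weight layout, since channels 0-15 come from the nearest-upsampled bottleneck (lo then hi) and channels 16-31 from the enc1 skip (lo then hi). The sketch below restates `load_dec1_concat` in Python for illustration. Textures are modelled as nested lists whose texels are 8-float lists, which is an assumption of this sketch rather than anything in the repository.

```python
def load_dec1_concat(bn_lo, bn_hi, enc1_lo, enc1_hi, hx, hy):
    """Return the 32 input channels for dec1 at half-res coord (hx, hy).

    bn_* are quarter-res grids, enc1_* are half-res grids; each texel is a
    list of 8 floats. Out-of-bounds coords yield zeros (conv zero padding).
    """
    hh, hw = len(enc1_lo), len(enc1_lo[0])
    if hx < 0 or hy < 0 or hx >= hw or hy >= hh:
        return [0.0] * 32
    qh, qw = hh // 2, hw // 2
    # Nearest-neighbour upsample: half-res coord maps to coord // 2.
    qx = min(max(hx // 2, 0), qw - 1)
    qy = min(max(hy // 2, 0), qh - 1)
    return bn_lo[qy][qx] + bn_hi[qy][qx] + enc1_lo[hy][hx] + enc1_hi[hy][hx]


# Tiny example: 2x2 half-res skip, 1x1 quarter-res bottleneck.
bn_lo, bn_hi = [[[1.0] * 8]], [[[2.0] * 8]]
enc1_lo = [[[3.0] * 8 for _ in range(2)] for _ in range(2)]
enc1_hi = [[[4.0] * 8 for _ in range(2)] for _ in range(2)]
r = load_dec1_concat(bn_lo, bn_hi, enc1_lo, enc1_hi, 1, 1)
print(r[0], r[8], r[16], r[24], len(r))
```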
diff --git a/cnn_v3/shaders/cnn_v3_enc0.wgsl b/cnn_v3/shaders/cnn_v3_enc0.wgsl
index e171ca7..84d40fd 100644
--- a/cnn_v3/shaders/cnn_v3_enc0.wgsl
+++ b/cnn_v3/shaders/cnn_v3_enc0.wgsl
@@ -1,32 +1,42 @@
// CNN v3 — Encoder level 0
-// Conv(20->4, 3x3, zero-pad) + FiLM + ReLU
+// Conv(20->8, 3x3, zero-pad) + FiLM + ReLU
//
// Input: feat_tex0 (rgba32uint, 8xf16), feat_tex1 (rgba32uint, 12ch u8norm) full-res
-// Output: enc0_out (rgba16float, 4ch) full-res
+// Output: enc0_out (rgba32uint, 8xf16) full-res
//
// Weight layout (f16, OIHW + bias):
-// [0 .. 20*4*9) conv: w[out][in][ky][kx]
-// [720 .. +4) bias: b[out]
+// [0 .. 20*8*9) conv: w[out][in][ky][kx]
+// [1440 .. +8) bias: b[out]
#include "cnn_v3/common"
const ENC0_IN: u32 = 20u;
-const ENC0_OUT: u32 = 4u;
+const ENC0_OUT: u32 = 8u;
struct Params {
weight_offset: u32,
_pad: vec3u,
- gamma: vec4f,
- beta: vec4f,
+ gamma_lo: vec4f,
+ gamma_hi: vec4f,
+ beta_lo: vec4f,
+ beta_hi: vec4f,
}
@group(0) @binding(0) var feat_tex0: texture_2d<u32>;
@group(0) @binding(1) var feat_tex1: texture_2d<u32>;
@group(0) @binding(2) var<storage, read> weights: array<u32>;
@group(0) @binding(3) var<uniform> params: Params;
-@group(0) @binding(4) var enc0_out: texture_storage_2d<rgba16float, write>;
+@group(0) @binding(4) var enc0_out: texture_storage_2d<rgba32uint, write>;
+
+fn film_gamma(o: u32) -> f32 {
+ if (o < 4u) { return params.gamma_lo[o]; }
+ return params.gamma_hi[o - 4u];
+}
+fn film_beta(o: u32) -> f32 {
+ if (o < 4u) { return params.beta_lo[o]; }
+ return params.beta_hi[o - 4u];
+}
-// Unpack all 20 feature channels at coord. Returns zeros for OOB (zero-padding).
fn load_feat(coord: vec2i, dims: vec2i) -> array<f32, 20> {
if (coord.x < 0 || coord.y < 0 || coord.x >= dims.x || coord.y >= dims.y) {
return array<f32, 20>(0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.);
@@ -68,8 +78,13 @@ fn enc0_main(@builtin(global_invocation_id) id: vec3u) {
}
}
}
- out[o] = max(0.0, params.gamma[o] * sum + params.beta[o]);
+ out[o] = max(0.0, film_gamma(o) * sum + film_beta(o));
}
- textureStore(enc0_out, coord, vec4f(out[0], out[1], out[2], out[3]));
+ textureStore(enc0_out, coord, vec4u(
+ pack2x16float(vec2f(out[0], out[1])),
+ pack2x16float(vec2f(out[2], out[3])),
+ pack2x16float(vec2f(out[4], out[5])),
+ pack2x16float(vec2f(out[6], out[7]))
+ ));
}
diff --git a/cnn_v3/shaders/cnn_v3_enc1.wgsl b/cnn_v3/shaders/cnn_v3_enc1.wgsl
index 23e485d..eb41279 100644
--- a/cnn_v3/shaders/cnn_v3_enc1.wgsl
+++ b/cnn_v3/shaders/cnn_v3_enc1.wgsl
@@ -1,58 +1,67 @@
// CNN v3 — Encoder level 1
-// AvgPool2x2(enc0) + Conv(4->8, 3x3, zero-pad) + FiLM + ReLU
+// AvgPool2x2(enc0) + Conv(8->16, 3x3, zero-pad) + FiLM + ReLU
//
-// Input: enc0_tex (rgba16float, 4ch) full-res
-// Output: enc1_out (rgba32uint, 8xf16) half-res (dispatch at half-res dims)
+// Input: enc0_tex (rgba32uint, 8xf16) full-res
+// Output: enc1_out_lo (rgba32uint, 8xf16) half-res ch 0-7
+// enc1_out_hi (rgba32uint, 8xf16) half-res ch 8-15
//
// Weight layout (f16, OIHW + bias):
-// [0 .. 4*8*9) conv: w[out][in][ky][kx]
-// [288 .. +8) bias: b[out]
+// [0 .. 8*16*9) conv: w[out][in][ky][kx]
+// [1152 .. +16) bias: b[out]
#include "cnn_v3/common"
-const ENC1_IN: u32 = 4u;
-const ENC1_OUT: u32 = 8u;
+const ENC1_IN: u32 = 8u;
+const ENC1_OUT: u32 = 16u;
struct Params {
weight_offset: u32,
_pad: vec3u,
- gamma_lo: vec4f, // FiLM gamma ch 0-3
- gamma_hi: vec4f, // FiLM gamma ch 4-7
- beta_lo: vec4f, // FiLM beta ch 0-3
- beta_hi: vec4f, // FiLM beta ch 4-7
+ gamma_0: vec4f,
+ gamma_1: vec4f,
+ gamma_2: vec4f,
+ gamma_3: vec4f,
+ beta_0: vec4f,
+ beta_1: vec4f,
+ beta_2: vec4f,
+ beta_3: vec4f,
}
-@group(0) @binding(0) var enc0_tex: texture_2d<f32>;
+@group(0) @binding(0) var enc0_tex: texture_2d<u32>;
@group(0) @binding(1) var<storage, read> weights: array<u32>;
@group(0) @binding(2) var<uniform> params: Params;
-@group(0) @binding(3) var enc1_out: texture_storage_2d<rgba32uint, write>;
+@group(0) @binding(3) var enc1_out_lo: texture_storage_2d<rgba32uint, write>;
+@group(0) @binding(4) var enc1_out_hi: texture_storage_2d<rgba32uint, write>;
fn film_gamma(o: u32) -> f32 {
- if (o < 4u) { return params.gamma_lo[o]; }
- return params.gamma_hi[o - 4u];
+ if (o < 4u) { return params.gamma_0[o]; }
+ if (o < 8u) { return params.gamma_1[o - 4u]; }
+ if (o < 12u) { return params.gamma_2[o - 8u]; }
+ return params.gamma_3[o - 12u];
}
fn film_beta(o: u32) -> f32 {
- if (o < 4u) { return params.beta_lo[o]; }
- return params.beta_hi[o - 4u];
+ if (o < 4u) { return params.beta_0[o]; }
+ if (o < 8u) { return params.beta_1[o - 4u]; }
+ if (o < 12u) { return params.beta_2[o - 8u]; }
+ return params.beta_3[o - 12u];
}
-// Avg-pool 2x2 from enc0_tex at half-res coord hcoord.
-// Returns zeros for OOB half-res coords (zero-padding for the conv).
-fn load_enc0_avg(hcoord: vec2i, full_dims: vec2i) -> array<f32, 4> {
+fn load_enc0_avg(hcoord: vec2i, full_dims: vec2i) -> array<f32, 8> {
let half_dims = full_dims / 2;
if (hcoord.x < 0 || hcoord.y < 0 || hcoord.x >= half_dims.x || hcoord.y >= half_dims.y) {
- return array<f32, 4>(0., 0., 0., 0.);
+ return array<f32, 8>(0., 0., 0., 0., 0., 0., 0., 0.);
}
let base = hcoord * 2;
- var s = vec4f(0.);
+ var s: array<f32, 8>;
for (var dy: i32 = 0; dy < 2; dy++) {
for (var dx: i32 = 0; dx < 2; dx++) {
let fc = clamp(base + vec2i(dx, dy), vec2i(0), full_dims - vec2i(1));
- s += textureLoad(enc0_tex, fc, 0);
+ let f = unpack_8ch(enc0_tex, fc);
+ for (var i: u32 = 0u; i < 8u; i++) { s[i] += f[i]; }
}
}
- let avg = s * 0.25;
- return array<f32, 4>(avg.x, avg.y, avg.z, avg.w);
+ for (var i: u32 = 0u; i < 8u; i++) { s[i] *= 0.25; }
+ return s;
}
@compute @workgroup_size(8, 8)
@@ -79,10 +88,16 @@ fn enc1_main(@builtin(global_invocation_id) id: vec3u) {
out[o] = max(0.0, film_gamma(o) * sum + film_beta(o));
}
- textureStore(enc1_out, coord, vec4u(
- pack2x16float(vec2f(out[0], out[1])),
- pack2x16float(vec2f(out[2], out[3])),
- pack2x16float(vec2f(out[4], out[5])),
- pack2x16float(vec2f(out[6], out[7]))
+ textureStore(enc1_out_lo, coord, vec4u(
+ pack2x16float(vec2f(out[0], out[1])),
+ pack2x16float(vec2f(out[2], out[3])),
+ pack2x16float(vec2f(out[4], out[5])),
+ pack2x16float(vec2f(out[6], out[7]))
+ ));
+ textureStore(enc1_out_hi, coord, vec4u(
+ pack2x16float(vec2f(out[8], out[9])),
+ pack2x16float(vec2f(out[10], out[11])),
+ pack2x16float(vec2f(out[12], out[13])),
+ pack2x16float(vec2f(out[14], out[15]))
));
}
diff --git a/cnn_v3/src/cnn_v3_effect.cc b/cnn_v3/src/cnn_v3_effect.cc
index 1391eba..dc26751 100644
--- a/cnn_v3/src/cnn_v3_effect.cc
+++ b/cnn_v3/src/cnn_v3_effect.cc
@@ -1,5 +1,5 @@
// CNN v3 Effect — U-Net + FiLM inference (5 compute passes)
-// See cnn_v3/docs/CNN_V3.md for architecture, HOWTO.md §7 for shader details.
+// See cnn_v3/docs/CNN_V3.md for architecture, HOWTO.md for shader details.
#include "cnn_v3_effect.h"
@@ -17,17 +17,16 @@
#include <cstring>
// ---------------------------------------------------------------------------
-// Weight layout constants — explicit formulas matching WGSL shader comments
-// ---------------------------------------------------------------------------
+// Weight layout constants — enc_channels=[8,16]
//
// Format: Conv(IN→OUT, KxK) has OUT*IN*K*K weights + OUT biases
-// Layout: OIHW order (out × in × kH × kW), biases appended after conv weights
-//
-static const uint32_t kEnc0Weights = 20 * 4 * 9 + 4; // Conv(20→4,3×3)+bias
-static const uint32_t kEnc1Weights = 4 * 8 * 9 + 8; // Conv(4→8,3×3)+bias
-static const uint32_t kBnWeights = 8 * 8 * 9 + 8; // Conv(8→8,3×3,dilation=2)+bias
-static const uint32_t kDec1Weights = 16 * 4 * 9 + 4; // Conv(16→4,3×3)+bias
-static const uint32_t kDec0Weights = 8 * 4 * 9 + 4; // Conv(8→4,3×3)+bias
+// Layout: OIHW order (out × in × kH × kW), biases appended
+// ---------------------------------------------------------------------------
+static const uint32_t kEnc0Weights = 20 * 8 * 9 + 8; // Conv(20→8, 3×3)+bias = 1448
+static const uint32_t kEnc1Weights = 8 * 16 * 9 + 16; // Conv(8→16, 3×3)+bias = 1168
+static const uint32_t kBnWeights = 16 * 16 * 9 + 16; // Conv(16→16, 3×3,dil=2)+bias = 2320
+static const uint32_t kDec1Weights = 32 * 8 * 9 + 8; // Conv(32→8, 3×3)+bias = 2312
+static const uint32_t kDec0Weights = 16 * 4 * 9 + 4; // Conv(16→4, 3×3)+bias = 580
static const uint32_t kEnc0Offset = 0;
static const uint32_t kEnc1Offset = kEnc0Offset + kEnc0Weights;
@@ -35,13 +34,12 @@ static const uint32_t kBnOffset = kEnc1Offset + kEnc1Weights;
static const uint32_t kDec1Offset = kBnOffset + kBnWeights;
static const uint32_t kDec0Offset = kDec1Offset + kDec1Weights;
static const uint32_t kTotalF16 = kDec0Offset + kDec0Weights;
+// = 1448 + 1168 + 2320 + 2312 + 580 = 7828 f16
-// Weights buffer size in bytes: f16 values are packed two-per-u32.
-// Round up to u32 boundary.
static const uint32_t kWeightsBufBytes = ((kTotalF16 + 1) / 2) * 4;
// ---------------------------------------------------------------------------
-// Shader source externs (registered in shaders.cc via InitShaderComposer)
+// Shader source externs
// ---------------------------------------------------------------------------
extern const char* cnn_v3_enc0_wgsl;
extern const char* cnn_v3_enc1_wgsl;
@@ -103,14 +101,6 @@ static WGPUBindGroupLayoutEntry bgl_uint_tex(uint32_t binding) {
e.texture.viewDimension = WGPUTextureViewDimension_2D;
return e;
}
-static WGPUBindGroupLayoutEntry bgl_float_tex(uint32_t binding) {
- WGPUBindGroupLayoutEntry e = {};
- e.binding = binding;
- e.visibility = WGPUShaderStage_Compute;
- e.texture.sampleType = WGPUTextureSampleType_Float;
- e.texture.viewDimension = WGPUTextureViewDimension_2D;
- return e;
-}
static WGPUBindGroupLayoutEntry bgl_storage_buf(uint32_t binding) {
WGPUBindGroupLayoutEntry e = {};
e.binding = binding;
@@ -151,45 +141,46 @@ CNNv3Effect::CNNv3Effect(const GpuContext& ctx,
const std::string& prefix =
outputs.empty() ? std::string("cnn_v3") : outputs[0];
- node_enc0_ = prefix + "_enc0";
- node_enc1_ = prefix + "_enc1";
- node_bottleneck_ = prefix + "_bottleneck";
- node_dec1_ = prefix + "_dec1";
+ node_enc0_ = prefix + "_enc0";
+ node_enc1_lo_ = prefix + "_enc1_lo";
+ node_enc1_hi_ = prefix + "_enc1_hi";
+ node_bn_lo_ = prefix + "_bn_lo";
+ node_bn_hi_ = prefix + "_bn_hi";
+ node_dec1_ = prefix + "_dec1";
- // Allocate zeroed weights buffer (f16 pairs packed as u32).
- // Weights are zero-initialized; load_weights() can fill from file later.
weights_buf_ = gpu_create_buffer(
ctx_.device, kWeightsBufBytes,
WGPUBufferUsage_Storage | WGPUBufferUsage_CopyDst);
- // Initialize uniform buffers.
enc0_params_buf_.init(ctx_.device);
enc1_params_buf_.init(ctx_.device);
bn_params_buf_.init(ctx_.device);
dec1_params_buf_.init(ctx_.device);
dec0_params_buf_.init(ctx_.device);
- // Set weight offsets (FiLM γ/β default to identity: γ=1, β=0).
+ // Set weight offsets; FiLM γ/β default to identity (γ=1, β=0).
enc0_params_.weight_offset = kEnc0Offset;
- for (int i = 0; i < 4; ++i) { enc0_params_.gamma[i] = 1.0f; }
-
- enc1_params_.weight_offset = kEnc1Offset;
for (int i = 0; i < 4; ++i) {
- enc1_params_.gamma_lo[i] = 1.0f;
- enc1_params_.gamma_hi[i] = 1.0f;
+ enc0_params_.gamma_lo[i] = 1.0f;
+ enc0_params_.gamma_hi[i] = 1.0f;
}
+ enc1_params_.weight_offset = kEnc1Offset;
+ for (int i = 0; i < 16; ++i) { enc1_params_.gamma[i] = 1.0f; }
+
bn_params_.weight_offset = kBnOffset;
dec1_params_.weight_offset = kDec1Offset;
- for (int i = 0; i < 4; ++i) { dec1_params_.gamma[i] = 1.0f; }
+ for (int i = 0; i < 4; ++i) {
+ dec1_params_.gamma_lo[i] = 1.0f;
+ dec1_params_.gamma_hi[i] = 1.0f;
+ }
dec0_params_.weight_offset = kDec0Offset;
for (int i = 0; i < 4; ++i) { dec0_params_.gamma[i] = 1.0f; }
create_pipelines();
- // Load trained weights from asset system (zero-initialized if absent).
size_t weights_size = 0;
const void* weights_data =
GetAsset(AssetId::ASSET_WEIGHTS_CNN_V3, &weights_size);
@@ -206,20 +197,21 @@ void CNNv3Effect::declare_nodes(NodeRegistry& registry) {
const int W = registry.default_width();
const int H = registry.default_height();
- // enc0_tex: rgba16float full-res
- registry.declare_node(node_enc0_, NodeType::GBUF_ALBEDO, W, H);
- // enc1_tex: rgba32uint half-res — shaders use textureDimensions() for bounds
- registry.declare_node(node_enc1_, NodeType::GBUF_RGBA32UINT, W / 2, H / 2);
- // bottleneck_tex: rgba32uint quarter-res
- registry.declare_node(node_bottleneck_, NodeType::GBUF_RGBA32UINT, W / 4, H / 4);
- // dec1_tex: rgba16float half-res
- registry.declare_node(node_dec1_, NodeType::GBUF_ALBEDO, W / 2, H / 2);
+ // enc0: rgba32uint full-res (8ch packed f16)
+ registry.declare_node(node_enc0_, NodeType::GBUF_RGBA32UINT, W, H);
+ // enc1: two rgba32uint half-res (8ch each = 16ch total)
+ registry.declare_node(node_enc1_lo_, NodeType::GBUF_RGBA32UINT, W / 2, H / 2);
+ registry.declare_node(node_enc1_hi_, NodeType::GBUF_RGBA32UINT, W / 2, H / 2);
+ // bottleneck: two rgba32uint quarter-res (8ch each = 16ch total)
+ registry.declare_node(node_bn_lo_, NodeType::GBUF_RGBA32UINT, W / 4, H / 4);
+ registry.declare_node(node_bn_hi_, NodeType::GBUF_RGBA32UINT, W / 4, H / 4);
+ // dec1: rgba32uint half-res (8ch packed f16)
+ registry.declare_node(node_dec1_, NodeType::GBUF_RGBA32UINT, W / 2, H / 2);
// output_nodes_[0]: rgba16float full-res — declared externally by caller
}
// ---------------------------------------------------------------------------
-// set_film_params — simple linear mapping (placeholder, no MLP yet)
-// TODO(phase-7): replace with CPU forward pass through cnn_v3_film_mlp.bin
+// upload_weights / set_film_params
// ---------------------------------------------------------------------------
void CNNv3Effect::upload_weights(WGPUQueue queue, const void* data,
@@ -228,26 +220,26 @@ void CNNv3Effect::upload_weights(WGPUQueue queue, const void* data,
}
void CNNv3Effect::set_film_params(const CNNv3FiLMParams& fp) {
- // Identity + audio/beat modulation.
- // Replace with FiLM MLP output once training is done.
const float a = fp.audio_intensity;
const float b = fp.beat_phase;
for (int i = 0; i < 4; ++i) {
- enc0_params_.gamma[i] = 1.0f + a * 0.5f;
- enc0_params_.beta[i] = b * 0.1f;
+ enc0_params_.gamma_lo[i] = 1.0f + a * 0.5f;
+ enc0_params_.gamma_hi[i] = 1.0f + a * 0.5f;
+ enc0_params_.beta_lo[i] = b * 0.1f;
+ enc0_params_.beta_hi[i] = b * 0.1f;
}
- for (int i = 0; i < 4; ++i) {
- enc1_params_.gamma_lo[i] = 1.0f + a * 0.3f;
- enc1_params_.gamma_hi[i] = 1.0f + a * 0.3f;
- enc1_params_.beta_lo[i] = fp.beat_norm * 0.1f;
- enc1_params_.beta_hi[i] = fp.beat_norm * 0.1f;
+ for (int i = 0; i < 16; ++i) {
+ enc1_params_.gamma[i] = 1.0f + a * 0.3f;
+ enc1_params_.beta[i] = fp.beat_norm * 0.1f;
}
for (int i = 0; i < 4; ++i) {
- dec1_params_.gamma[i] = 1.0f + fp.style_p0 * 0.5f;
- dec1_params_.beta[i] = fp.style_p1 * 0.1f;
- dec0_params_.gamma[i] = 1.0f + fp.style_p0 * 0.5f;
- dec0_params_.beta[i] = fp.style_p1 * 0.1f;
+ dec1_params_.gamma_lo[i] = 1.0f + fp.style_p0 * 0.5f;
+ dec1_params_.gamma_hi[i] = 1.0f + fp.style_p0 * 0.5f;
+ dec1_params_.beta_lo[i] = fp.style_p1 * 0.1f;
+ dec1_params_.beta_hi[i] = fp.style_p1 * 0.1f;
+ dec0_params_.gamma[i] = 1.0f + fp.style_p0 * 0.5f;
+ dec0_params_.beta[i] = fp.style_p1 * 0.1f;
}
}
@@ -258,7 +250,6 @@ void CNNv3Effect::set_film_params(const CNNv3FiLMParams& fp) {
void CNNv3Effect::render(WGPUCommandEncoder encoder,
const UniformsSequenceParams& params,
NodeRegistry& nodes) {
- // Upload params uniforms.
enc0_params_buf_.update(ctx_.queue, enc0_params_);
enc1_params_buf_.update(ctx_.queue, enc1_params_);
bn_params_buf_.update(ctx_.queue, bn_params_);
@@ -270,7 +261,6 @@ void CNNv3Effect::render(WGPUCommandEncoder encoder,
const int W = (int)params.resolution.x;
const int H = (int)params.resolution.y;
- // Dispatch helper: ceil(dim / 8) workgroups
auto dispatch = [&](WGPUComputePipeline pipe, WGPUBindGroup bg,
int w, int h) {
WGPUComputePassDescriptor pass_desc = {};
@@ -304,14 +294,14 @@ void CNNv3Effect::create_pipelines() {
// --- enc0 ---
// B0: feat_tex0 (u32), B1: feat_tex1 (u32), B2: weights (storage),
- // B3: params (uniform), B4: enc0_out (storage_tex rgba16float write)
+ // B3: params (uniform, 96B), B4: enc0_out (rgba32uint write)
{
WGPUBindGroupLayoutEntry e[5] = {
bgl_uint_tex(0),
bgl_uint_tex(1),
bgl_storage_buf(2),
- bgl_uniform_buf(3, sizeof(CnnV3Params4ch)), // 64 bytes
- bgl_storage_tex_write(4, WGPUTextureFormat_RGBA16Float),
+ bgl_uniform_buf(3, sizeof(CnnV3Params8ch)),
+ bgl_storage_tex_write(4, WGPUTextureFormat_RGBA32Uint),
};
WGPUBindGroupLayout bgl = make_bgl(dev, e, 5);
WGPUShaderModule sh = make_shader(dev, cnn_v3_enc0_wgsl);
@@ -321,16 +311,18 @@ void CNNv3Effect::create_pipelines() {
}
// --- enc1 ---
- // B0: enc0_tex (f32), B1: weights (storage),
- // B2: params (uniform), B3: enc1_out (storage_tex rgba32uint write)
+ // B0: enc0_tex (u32), B1: weights (storage),
+ // B2: params (uniform, 160B), B3: enc1_out_lo (rgba32uint write),
+ // B4: enc1_out_hi (rgba32uint write)
{
- WGPUBindGroupLayoutEntry e[4] = {
- bgl_float_tex(0),
+ WGPUBindGroupLayoutEntry e[5] = {
+ bgl_uint_tex(0),
bgl_storage_buf(1),
- bgl_uniform_buf(2, sizeof(CnnV3ParamsEnc1)),
+ bgl_uniform_buf(2, sizeof(CnnV3Params16ch)),
bgl_storage_tex_write(3, WGPUTextureFormat_RGBA32Uint),
+ bgl_storage_tex_write(4, WGPUTextureFormat_RGBA32Uint),
};
- WGPUBindGroupLayout bgl = make_bgl(dev, e, 4);
+ WGPUBindGroupLayout bgl = make_bgl(dev, e, 5);
WGPUShaderModule sh = make_shader(dev, cnn_v3_enc1_wgsl);
enc1_pipeline_.set(make_compute_pipeline(dev, sh, "enc1_main", bgl));
wgpuShaderModuleRelease(sh);
@@ -338,16 +330,19 @@ void CNNv3Effect::create_pipelines() {
}
// --- bottleneck ---
- // B0: enc1_tex (u32), B1: weights (storage),
- // B2: params (uniform), B3: bottleneck_out (storage_tex rgba32uint write)
+ // B0: enc1_tex_lo (u32), B1: enc1_tex_hi (u32), B2: weights (storage),
+ // B3: params (uniform, 16B), B4: bn_out_lo (rgba32uint write),
+ // B5: bn_out_hi (rgba32uint write)
{
- WGPUBindGroupLayoutEntry e[4] = {
+ WGPUBindGroupLayoutEntry e[6] = {
bgl_uint_tex(0),
- bgl_storage_buf(1),
- bgl_uniform_buf(2, sizeof(CnnV3ParamsBn)),
- bgl_storage_tex_write(3, WGPUTextureFormat_RGBA32Uint),
+ bgl_uint_tex(1),
+ bgl_storage_buf(2),
+ bgl_uniform_buf(3, sizeof(CnnV3ParamsBn)),
+ bgl_storage_tex_write(4, WGPUTextureFormat_RGBA32Uint),
+ bgl_storage_tex_write(5, WGPUTextureFormat_RGBA32Uint),
};
- WGPUBindGroupLayout bgl = make_bgl(dev, e, 4);
+ WGPUBindGroupLayout bgl = make_bgl(dev, e, 6);
WGPUShaderModule sh = make_shader(dev, cnn_v3_bottleneck_wgsl);
bn_pipeline_.set(make_compute_pipeline(dev, sh, "bottleneck_main", bgl));
wgpuShaderModuleRelease(sh);
@@ -355,17 +350,21 @@ void CNNv3Effect::create_pipelines() {
}
// --- dec1 ---
- // B0: bottleneck_tex (u32), B1: enc1_tex (u32), B2: weights (storage),
- // B3: params (uniform), B4: dec1_out (storage_tex rgba16float write)
+ // B0: bn_tex_lo (u32), B1: bn_tex_hi (u32),
+ // B2: enc1_tex_lo (u32), B3: enc1_tex_hi (u32),
+ // B4: weights (storage), B5: params (uniform, 96B),
+ // B6: dec1_out (rgba32uint write)
{
- WGPUBindGroupLayoutEntry e[5] = {
+ WGPUBindGroupLayoutEntry e[7] = {
bgl_uint_tex(0),
bgl_uint_tex(1),
- bgl_storage_buf(2),
- bgl_uniform_buf(3, sizeof(CnnV3Params4ch)), // 64 bytes
- bgl_storage_tex_write(4, WGPUTextureFormat_RGBA16Float),
+ bgl_uint_tex(2),
+ bgl_uint_tex(3),
+ bgl_storage_buf(4),
+ bgl_uniform_buf(5, sizeof(CnnV3Params8ch)),
+ bgl_storage_tex_write(6, WGPUTextureFormat_RGBA32Uint),
};
- WGPUBindGroupLayout bgl = make_bgl(dev, e, 5);
+ WGPUBindGroupLayout bgl = make_bgl(dev, e, 7);
WGPUShaderModule sh = make_shader(dev, cnn_v3_dec1_wgsl);
dec1_pipeline_.set(make_compute_pipeline(dev, sh, "dec1_main", bgl));
wgpuShaderModuleRelease(sh);
@@ -373,14 +372,14 @@ void CNNv3Effect::create_pipelines() {
}
// --- dec0 ---
- // B0: dec1_tex (f32), B1: enc0_tex (f32), B2: weights (storage),
- // B3: params (uniform), B4: output_tex (storage_tex rgba16float write)
+ // B0: dec1_tex (u32), B1: enc0_tex (u32), B2: weights (storage),
+ // B3: params (uniform, 64B), B4: output_tex (rgba16float write)
{
WGPUBindGroupLayoutEntry e[5] = {
- bgl_float_tex(0),
- bgl_float_tex(1),
+ bgl_uint_tex(0),
+ bgl_uint_tex(1),
bgl_storage_buf(2),
- bgl_uniform_buf(3, sizeof(CnnV3Params4ch)), // 64 bytes
+ bgl_uniform_buf(3, sizeof(CnnV3Params4ch)),
bgl_storage_tex_write(4, WGPUTextureFormat_RGBA16Float),
};
WGPUBindGroupLayout bgl = make_bgl(dev, e, 5);
@@ -395,14 +394,12 @@ void CNNv3Effect::create_pipelines() {
// update_bind_groups — rebuilt each frame (node views may be recreated)
// ---------------------------------------------------------------------------
-// Helper: set a texture view binding entry.
static void bg_tex(WGPUBindGroupEntry& e, uint32_t binding,
WGPUTextureView view) {
e = {};
e.binding = binding;
e.textureView = view;
}
-// Helper: set a buffer binding entry.
static void bg_buf(WGPUBindGroupEntry& e, uint32_t binding, WGPUBuffer buf,
uint64_t size) {
e = {};
@@ -414,13 +411,15 @@ static void bg_buf(WGPUBindGroupEntry& e, uint32_t binding, WGPUBuffer buf,
void CNNv3Effect::update_bind_groups(NodeRegistry& nodes) {
WGPUDevice dev = ctx_.device;
- WGPUTextureView feat0_view = nodes.get_view(input_nodes_[0]);
- WGPUTextureView feat1_view = nodes.get_view(input_nodes_[1]);
- WGPUTextureView enc0_view = nodes.get_view(node_enc0_);
- WGPUTextureView enc1_view = nodes.get_view(node_enc1_);
- WGPUTextureView bn_view = nodes.get_view(node_bottleneck_);
- WGPUTextureView dec1_view = nodes.get_view(node_dec1_);
- WGPUTextureView out_view = nodes.get_view(output_nodes_[0]);
+ WGPUTextureView feat0_view = nodes.get_view(input_nodes_[0]);
+ WGPUTextureView feat1_view = nodes.get_view(input_nodes_[1]);
+ WGPUTextureView enc0_view = nodes.get_view(node_enc0_);
+ WGPUTextureView enc1_lo_view = nodes.get_view(node_enc1_lo_);
+ WGPUTextureView enc1_hi_view = nodes.get_view(node_enc1_hi_);
+ WGPUTextureView bn_lo_view = nodes.get_view(node_bn_lo_);
+ WGPUTextureView bn_hi_view = nodes.get_view(node_bn_hi_);
+ WGPUTextureView dec1_view = nodes.get_view(node_dec1_);
+ WGPUTextureView out_view = nodes.get_view(output_nodes_[0]);
WGPUBuffer wb = weights_buf_.buffer;
@@ -437,49 +436,55 @@ void CNNv3Effect::update_bind_groups(NodeRegistry& nodes) {
return bg;
};
- // enc0: feat_tex0(B0), feat_tex1(B1), weights(B2), params(B3), enc0_out(B4)
+ // enc0: feat0(B0), feat1(B1), weights(B2), params(B3), enc0_out(B4)
{
WGPUBindGroupEntry e[5] = {};
bg_tex(e[0], 0, feat0_view);
bg_tex(e[1], 1, feat1_view);
bg_buf(e[2], 2, wb, kWeightsBufBytes);
- bg_buf(e[3], 3, enc0_params_buf_.get().buffer, sizeof(CnnV3Params4ch));
+ bg_buf(e[3], 3, enc0_params_buf_.get().buffer, sizeof(CnnV3Params8ch));
bg_tex(e[4], 4, enc0_view);
enc0_bg_.replace(make_bg(enc0_pipeline_.get(), e, 5));
}
- // enc1: enc0_tex(B0), weights(B1), params(B2), enc1_out(B3)
+ // enc1: enc0(B0), weights(B1), params(B2), enc1_lo(B3), enc1_hi(B4)
{
- WGPUBindGroupEntry e[4] = {};
+ WGPUBindGroupEntry e[5] = {};
bg_tex(e[0], 0, enc0_view);
bg_buf(e[1], 1, wb, kWeightsBufBytes);
- bg_buf(e[2], 2, enc1_params_buf_.get().buffer, sizeof(CnnV3ParamsEnc1));
- bg_tex(e[3], 3, enc1_view);
- enc1_bg_.replace(make_bg(enc1_pipeline_.get(), e, 4));
+ bg_buf(e[2], 2, enc1_params_buf_.get().buffer, sizeof(CnnV3Params16ch));
+ bg_tex(e[3], 3, enc1_lo_view);
+ bg_tex(e[4], 4, enc1_hi_view);
+ enc1_bg_.replace(make_bg(enc1_pipeline_.get(), e, 5));
}
- // bottleneck: enc1_tex(B0), weights(B1), params(B2), bn_out(B3)
+ // bottleneck: enc1_lo(B0), enc1_hi(B1), weights(B2), params(B3), bn_lo(B4), bn_hi(B5)
{
- WGPUBindGroupEntry e[4] = {};
- bg_tex(e[0], 0, enc1_view);
- bg_buf(e[1], 1, wb, kWeightsBufBytes);
- bg_buf(e[2], 2, bn_params_buf_.get().buffer, sizeof(CnnV3ParamsBn));
- bg_tex(e[3], 3, bn_view);
- bn_bg_.replace(make_bg(bn_pipeline_.get(), e, 4));
+ WGPUBindGroupEntry e[6] = {};
+ bg_tex(e[0], 0, enc1_lo_view);
+ bg_tex(e[1], 1, enc1_hi_view);
+ bg_buf(e[2], 2, wb, kWeightsBufBytes);
+ bg_buf(e[3], 3, bn_params_buf_.get().buffer, sizeof(CnnV3ParamsBn));
+ bg_tex(e[4], 4, bn_lo_view);
+ bg_tex(e[5], 5, bn_hi_view);
+ bn_bg_.replace(make_bg(bn_pipeline_.get(), e, 6));
}
- // dec1: bn_tex(B0), enc1_tex(B1), weights(B2), params(B3), dec1_out(B4)
+ // dec1: bn_lo(B0), bn_hi(B1), enc1_lo(B2), enc1_hi(B3),
+ // weights(B4), params(B5), dec1_out(B6)
{
- WGPUBindGroupEntry e[5] = {};
- bg_tex(e[0], 0, bn_view);
- bg_tex(e[1], 1, enc1_view);
- bg_buf(e[2], 2, wb, kWeightsBufBytes);
- bg_buf(e[3], 3, dec1_params_buf_.get().buffer, sizeof(CnnV3Params4ch));
- bg_tex(e[4], 4, dec1_view);
- dec1_bg_.replace(make_bg(dec1_pipeline_.get(), e, 5));
+ WGPUBindGroupEntry e[7] = {};
+ bg_tex(e[0], 0, bn_lo_view);
+ bg_tex(e[1], 1, bn_hi_view);
+ bg_tex(e[2], 2, enc1_lo_view);
+ bg_tex(e[3], 3, enc1_hi_view);
+ bg_buf(e[4], 4, wb, kWeightsBufBytes);
+ bg_buf(e[5], 5, dec1_params_buf_.get().buffer, sizeof(CnnV3Params8ch));
+ bg_tex(e[6], 6, dec1_view);
+ dec1_bg_.replace(make_bg(dec1_pipeline_.get(), e, 7));
}
- // dec0: dec1_tex(B0), enc0_tex(B1), weights(B2), params(B3), output(B4)
+ // dec0: dec1(B0), enc0(B1), weights(B2), params(B3), output(B4)
{
WGPUBindGroupEntry e[5] = {};
bg_tex(e[0], 0, dec1_view);
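Reviewer note on the weight-layout constants in `cnn_v3_effect.cc` above: each layer stores `OUT*IN*K*K` conv weights followed by `OUT` biases, and the layers are laid end to end in the order enc0, enc1, bottleneck, dec1, dec0. The sketch below recomputes the per-layer counts, the running offsets and the buffer size from the channel counts alone. `conv_f16_count` and the `Sketch`-suffixed constants are hypothetical names introduced here for illustration, not identifiers from the commit.

```cpp
#include <cstdint>

// Hypothetical helper (not in the commit): number of f16 values for one
// Conv(in_ch -> out_ch, k x k) layer, weights in OIHW order plus biases.
constexpr uint32_t conv_f16_count(uint32_t in_ch, uint32_t out_ch,
                                  uint32_t k = 3) {
  return out_ch * in_ch * k * k + out_ch;
}

// Per-layer sizes for enc_channels=[8,16].
constexpr uint32_t kEnc0Sketch = conv_f16_count(20, 8);   // 1448
constexpr uint32_t kEnc1Sketch = conv_f16_count(8, 16);   // 1168
constexpr uint32_t kBnSketch   = conv_f16_count(16, 16);  // 2320
constexpr uint32_t kDec1Sketch = conv_f16_count(32, 8);   // 2312
constexpr uint32_t kDec0Sketch = conv_f16_count(16, 4);   // 580

// Running offsets (in f16 units) into the shared weights buffer.
constexpr uint32_t kEnc1OffSketch = kEnc0Sketch;
constexpr uint32_t kBnOffSketch   = kEnc1OffSketch + kEnc1Sketch;
constexpr uint32_t kDec1OffSketch = kBnOffSketch + kBnSketch;
constexpr uint32_t kDec0OffSketch = kDec1OffSketch + kDec1Sketch;
constexpr uint32_t kTotalSketch   = kDec0OffSketch + kDec0Sketch;

// Two f16 values are packed per u32, so round up to a whole u32.
constexpr uint32_t kBytesSketch = ((kTotalSketch + 1) / 2) * 4;
```

Under these assumptions the total comes to 7828 f16 values, matching the comment in the diff, and the buffer is 15656 bytes. The bias block of a layer starts at `OUT*IN*9` within that layer, which is where the `[1440 .. +8)` and `[1152 .. +16)` shader comments come from.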
diff --git a/cnn_v3/src/cnn_v3_effect.h b/cnn_v3/src/cnn_v3_effect.h
index 36e2797..070f988 100644
--- a/cnn_v3/src/cnn_v3_effect.h
+++ b/cnn_v3/src/cnn_v3_effect.h
@@ -2,6 +2,13 @@
// Runs 5 compute passes (enc0→enc1→bottleneck→dec1→dec0) on G-buffer feature
// textures produced by GBufferEffect.
//
+// Architecture: enc_channels=[8,16]
+// enc0: Conv(20→8, 3×3) + FiLM8 + ReLU H×W rgba32uint
+// enc1: Conv(8→16, 3×3) + FiLM16 + ReLU H/2×W/2 2× rgba32uint
+// bottleneck: Conv(16→16, 3×3, dil=2) + ReLU H/4×W/4 2× rgba32uint
+// dec1: Conv(32→8, 3×3) + FiLM8 + ReLU H/2×W/2 rgba32uint
+// dec0: Conv(16→4, 3×3) + FiLM4 + ReLU + sig H×W rgba16float
+//
// Inputs: feat_tex0, feat_tex1 (rgba32uint, 20-channel G-buffer)
// Output: output_tex (rgba16float, 4-channel RGBA)
@@ -18,35 +25,19 @@
// Per-pass params uniform layouts (mirror WGSL Params structs exactly)
// ---------------------------------------------------------------------------
-// enc0, dec1, dec0: 4-channel FiLM
+// enc0, dec1: 8-channel FiLM (lo/hi vec4 split)
//
-// WGSL layout (vec3u has align=16, so _pad sits at offset 16):
-// offset 0: weight_offset (u32, 4 bytes)
-// offset 4: (12 bytes implicit padding before vec3u)
-// offset 16: _pad (vec3u, 12 bytes)
-// offset 28: (4 bytes implicit padding before vec4f)
-// offset 32: gamma (vec4f, 16 bytes)
-// offset 48: beta (vec4f, 16 bytes)
-// total: 64 bytes
-struct CnnV3Params4ch {
- uint32_t weight_offset; // offset 0
- uint32_t _pad[7]; // offsets 4-31 (mirrors implicit + vec3u + post-pad)
- float gamma[4]; // offset 32
- float beta[4]; // offset 48
-};
-static_assert(sizeof(CnnV3Params4ch) == 64, "CnnV3Params4ch must be 64 bytes");
-
-// enc1: 8-channel FiLM (split into lo/hi vec4 pairs)
-//
-// WGSL layout (same header padding as above):
-// offset 0: weight_offset (u32, 4 bytes)
-// offset 16: _pad (vec3u, 12 bytes)
-// offset 32: gamma_lo (vec4f, 16 bytes)
-// offset 48: gamma_hi (vec4f, 16 bytes)
-// offset 64: beta_lo (vec4f, 16 bytes)
-// offset 80: beta_hi (vec4f, 16 bytes)
+// WGSL layout:
+// offset 0: weight_offset (u32)
+// offset 4-15: implicit pad, vec3u aligned to 16
+// offset 16: _pad (vec3u, 12 bytes)
+// offset 28-31: implicit pad
+// offset 32: gamma_lo (vec4f)
+// offset 48: gamma_hi (vec4f)
+// offset 64: beta_lo (vec4f)
+// offset 80: beta_hi (vec4f)
// total: 96 bytes
-struct CnnV3ParamsEnc1 {
+struct CnnV3Params8ch {
uint32_t weight_offset; // offset 0
uint32_t _pad[7]; // offsets 4-31
float gamma_lo[4]; // offset 32
@@ -54,10 +45,41 @@ struct CnnV3ParamsEnc1 {
float beta_lo[4]; // offset 64
float beta_hi[4]; // offset 80
};
-static_assert(sizeof(CnnV3ParamsEnc1) == 96,
- "CnnV3ParamsEnc1 must be 96 bytes");
+static_assert(sizeof(CnnV3Params8ch) == 96, "CnnV3Params8ch must be 96 bytes");
+
+// enc1: 16-channel FiLM (four vec4 groups for gamma + four for beta)
+//
+// WGSL layout:
+// offset 0: weight_offset (u32)
+// offset 16: _pad (vec3u)
+// offset 32: gamma_0..3 (4x vec4f = 64 bytes)
+// offset 96: beta_0..3 (4x vec4f = 64 bytes)
+// total: 160 bytes
+struct CnnV3Params16ch {
+ uint32_t weight_offset; // offset 0
+ uint32_t _pad[7]; // offsets 4-31
+ float gamma[16]; // offsets 32-95
+ float beta[16]; // offsets 96-159
+};
+static_assert(sizeof(CnnV3Params16ch) == 160, "CnnV3Params16ch must be 160 bytes");
+
+// dec0: 4-channel FiLM
+//
+// WGSL layout:
+// offset 0: weight_offset (u32)
+// offset 16: _pad (vec3u)
+// offset 32: gamma (vec4f)
+// offset 48: beta (vec4f)
+// total: 64 bytes
+struct CnnV3Params4ch {
+ uint32_t weight_offset; // offset 0
+ uint32_t _pad[7]; // offsets 4-31
+ float gamma[4]; // offset 32
+ float beta[4]; // offset 48
+};
+static_assert(sizeof(CnnV3Params4ch) == 64, "CnnV3Params4ch must be 64 bytes");
-// bottleneck: no FiLM — 4 plain u32s, no alignment gap
+// bottleneck: no FiLM — weight_offset + 3 pads
struct CnnV3ParamsBn {
uint32_t weight_offset;
uint32_t _pad[3];
@@ -90,14 +112,15 @@ class CNNv3Effect : public Effect {
void set_film_params(const CNNv3FiLMParams& fp);
// Upload packed-f16 weights (kWeightsBufBytes bytes of u32 pairs).
- // Used for testing and inference from trained .bin files.
void upload_weights(WGPUQueue queue, const void* data, uint32_t size_bytes);
private:
// Intermediate node names (prefixed from output[0])
std::string node_enc0_;
- std::string node_enc1_;
- std::string node_bottleneck_;
+ std::string node_enc1_lo_;
+ std::string node_enc1_hi_;
+ std::string node_bn_lo_;
+ std::string node_bn_hi_;
std::string node_dec1_;
// 5 compute pipelines
@@ -115,20 +138,20 @@ class CNNv3Effect : public Effect {
BindGroup dec0_bg_;
// Params uniform buffers (one per pass)
- UniformBuffer<CnnV3Params4ch> enc0_params_buf_;
- UniformBuffer<CnnV3ParamsEnc1> enc1_params_buf_;
+ UniformBuffer<CnnV3Params8ch> enc0_params_buf_;
+ UniformBuffer<CnnV3Params16ch> enc1_params_buf_;
UniformBuffer<CnnV3ParamsBn> bn_params_buf_;
- UniformBuffer<CnnV3Params4ch> dec1_params_buf_;
+ UniformBuffer<CnnV3Params8ch> dec1_params_buf_;
UniformBuffer<CnnV3Params4ch> dec0_params_buf_;
// Shared packed-f16 weights (storage buffer, read-only in all shaders)
GpuBuffer weights_buf_;
// Per-pass params shadow (updated by set_film_params, uploaded in render)
- CnnV3Params4ch enc0_params_{};
- CnnV3ParamsEnc1 enc1_params_{};
+ CnnV3Params8ch enc0_params_{};
+ CnnV3Params16ch enc1_params_{};
CnnV3ParamsBn bn_params_{};
- CnnV3Params4ch dec1_params_{};
+ CnnV3Params8ch dec1_params_{};
CnnV3Params4ch dec0_params_{};
void create_pipelines();
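Reviewer note on the params structs in `cnn_v3_effect.h` above: the C++ structs rely on `uint32_t _pad[7]` to reproduce the WGSL header (a `u32`, implicit padding up to the 16-byte-aligned `vec3u`, then padding up to the first `vec4f` at offset 32). The sketch below spells out those offsets with `offsetof` so the mirroring can be checked at compile time. The `Sketch`-suffixed structs and `gamma16_offset` are hypothetical stand-ins for illustration, and the checks assume 4-byte `float` and `uint32_t` with no compiler-inserted padding, which holds on mainstream platforms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for CnnV3Params8ch (enc0, dec1).
struct Params8chSketch {
  uint32_t weight_offset;  // offset 0
  uint32_t _pad[7];        // offsets 4-31: implicit pad + vec3u + pad
  float gamma_lo[4];       // offset 32
  float gamma_hi[4];       // offset 48
  float beta_lo[4];        // offset 64
  float beta_hi[4];        // offset 80
};

// Hypothetical stand-in for CnnV3Params16ch (enc1). The flat arrays
// overlay the WGSL members gamma_0..gamma_3 and beta_0..beta_3.
struct Params16chSketch {
  uint32_t weight_offset;  // offset 0
  uint32_t _pad[7];        // offsets 4-31
  float gamma[16];         // offsets 32-95
  float beta[16];          // offsets 96-159
};

// Byte offset of FiLM gamma channel `ch` (0..15) in the 16-channel struct.
// Channel ch lands in WGSL member gamma_{ch/4}, component ch%4.
constexpr std::size_t gamma16_offset(std::size_t ch) {
  return offsetof(Params16chSketch, gamma) + ch * sizeof(float);
}

static_assert(sizeof(Params8chSketch) == 96, "8ch params must be 96 bytes");
static_assert(sizeof(Params16chSketch) == 160, "16ch params must be 160 bytes");
```

With this layout, `gamma[4]` in C++ sits at byte 48, the same place as `gamma_1.x` in the enc1 shader, so the `for (int i = 0; i < 16; ++i)` loops in `set_film_params` fill the four `vec4f` groups in order.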
diff --git a/cnn_v3/test_vectors.h b/cnn_v3/test_vectors.h
index 3e256a3..647b84e 100644
--- a/cnn_v3/test_vectors.h
+++ b/cnn_v3/test_vectors.h
@@ -7,80 +7,83 @@
static const int kCnnV3TestW = 8;
static const int kCnnV3TestH = 8;
+// ENC0_OUT=8 ENC1_OUT=16 BN=16 DEC1_OUT=8 DEC0_OUT=4
+// TOTAL_F16=7828 (enc_channels=[8,16])
+
// 256 u32 values
static const uint32_t kCnnV3TestFeat0U32[256] = {
- 0x322d3094u, 0x3b8e35f9u, 0x3384380bu, 0x356a2ec5u, 0x36223a87u, 0x3a6c3a2eu, 0x38df2f18u, 0x366f3bacu,
- 0x3b632a85u, 0x382d3b12u, 0x32c7386bu, 0x37fa3a8eu, 0x39772856u, 0x3aa23b9bu, 0x2ad3346fu, 0x3b7f3b86u,
- 0x39233842u, 0x36b73767u, 0x2e312fa5u, 0x3ab13373u, 0x334130abu, 0x32e23864u, 0x38823139u, 0x390235e4u,
- 0x30d53b4cu, 0x3b383b4fu, 0x390d3aa2u, 0x391d38f6u, 0x24383986u, 0x38af3baau, 0x36093a40u, 0x38142259u,
- 0x36fe380fu, 0x33ff356fu, 0x36013838u, 0x31893bc2u, 0x34b4351du, 0x37fd3859u, 0x3b9a3926u, 0x398b3490u,
- 0x34332669u, 0x27ef376bu, 0x396d38f1u, 0x382239f9u, 0x365638d5u, 0x2e662948u, 0x3bf7393du, 0x3876240cu,
- 0x3a9d3a02u, 0x38f6385du, 0x3adf3993u, 0x3b692fd4u, 0x3ab126a9u, 0x323a2ce9u, 0x37201bfau, 0x3150355au,
- 0x36703738u, 0x3b253a24u, 0x2ff63938u, 0x3b4e34bau, 0x36822daeu, 0x3b9b3b8au, 0x39573694u, 0x3a07374fu,
- 0x309d280bu, 0x337138eau, 0x359f3954u, 0x3a8e3b18u, 0x3a2f37e3u, 0x37e83457u, 0x3ae33252u, 0x3b383a96u,
- 0x3bad3b05u, 0x3b74334fu, 0x36c33892u, 0x357b387cu, 0x33b9349eu, 0x37f22d47u, 0x390b3b1au, 0x36dc382bu,
- 0x32b2376eu, 0x32593a95u, 0x3a1439bcu, 0x3ae73899u, 0x3b0e34a8u, 0x3a6439d6u, 0x3ac53951u, 0x36b93bf2u,
- 0x39f53a83u, 0x3b6a373fu, 0x38863650u, 0x333a2ec8u, 0x36583abau, 0x33df364eu, 0x3a7237deu, 0x2c7d3b29u,
- 0x377a3899u, 0x372838eau, 0x378d3661u, 0x380238a8u, 0x3a8b378eu, 0x357639f7u, 0x3ad43a68u, 0x38a930e9u,
- 0x39ea3491u, 0x395f33a4u, 0x38173415u, 0x361a3b97u, 0x3be53b02u, 0x314f3b00u, 0x281d3a8fu, 0x3af7364bu,
- 0x38433983u, 0x3a803635u, 0x377f39adu, 0x335c3b24u, 0x39243174u, 0x33ea3bc7u, 0x307733fdu, 0x333f3ae2u,
- 0x3bed3807u, 0x38742237u, 0x3a763819u, 0x369135afu, 0x39ed3160u, 0x30603a47u, 0x3b25364cu, 0x34c8198bu,
- 0x35583871u, 0x375c345au, 0x383d31cfu, 0x389a39a7u, 0x3ac12df6u, 0x3a1e3199u, 0x3a4335c5u, 0x31f9329au,
- 0x283737f4u, 0x39cb3336u, 0x2d2c3ab3u, 0x3a613b0eu, 0x39963af5u, 0x38333965u, 0x3b5a3939u, 0x350d2e6fu,
- 0x3b8f2ca3u, 0x39673720u, 0x3bee3abbu, 0x3a65312du, 0x2a423b19u, 0x35ad3a08u, 0x381d3930u, 0x30543428u,
- 0x2e9d2f7cu, 0x359f391au, 0x398932efu, 0x3850397fu, 0x362b3b7bu, 0x2ccf3ab0u, 0x3be839ebu, 0x38a33ac6u,
- 0x35a73904u, 0x3a2a3970u, 0x37e13bfcu, 0x38c42bd9u, 0x33d52f9eu, 0x39d93543u, 0x314e31e2u, 0x3afc29c1u,
- 0x291d398cu, 0x3878273eu, 0x38c63485u, 0x3b6336f4u, 0x396f349bu, 0x3ba62aebu, 0x39ea3bd9u, 0x330a3772u,
- 0x39e43a80u, 0x3738331au, 0x3a9c3768u, 0x39253979u, 0x34543933u, 0x29d835f3u, 0x36ee3a4cu, 0x33da3703u,
- 0x38b432b4u, 0x2c1c3371u, 0x36063a24u, 0x36e73615u, 0x35223a85u, 0x3b843a10u, 0x36e83949u, 0x375439fbu,
- 0x383436a1u, 0x2eac3515u, 0x2fed36a3u, 0x38753691u, 0x28a33b72u, 0x375338f9u, 0x33fc2530u, 0x32f02f95u,
- 0x366c3465u, 0x140e383bu, 0x2dfd312eu, 0x35443866u, 0x33193863u, 0x3b882634u, 0x300f2eefu, 0x3bda30b1u,
- 0x38e238f1u, 0x2da93be5u, 0x32873bccu, 0x36b938fcu, 0x3b733625u, 0x3bfa30c6u, 0x39313611u, 0x2b5f3bbeu,
- 0x388b3b62u, 0x30c639a3u, 0x39633844u, 0x30f6374du, 0x3ad633d0u, 0x39ac286au, 0x1faa3bffu, 0x39653127u,
- 0x38b82baeu, 0x38b53979u, 0x399435d8u, 0x32a538c1u, 0x3b0e3881u, 0x378c3956u, 0x2d7f3525u, 0x21ba33d4u,
- 0x331f3be5u, 0x31663a85u, 0x36b1348au, 0x3a633531u, 0x3b013ba9u, 0x3a3730eau, 0x3b4f30bcu, 0x35623825u,
- 0x220c3106u, 0x3b5033efu, 0x3bc23a61u, 0x38bd2e73u, 0x3858341du, 0x34893521u, 0x31de3897u, 0x39353782u,
- 0x3b72301au, 0x3a8e380cu, 0x39ae393bu, 0x3b0039bbu, 0x347438e9u, 0x38da2e5eu, 0x33b92c3fu, 0x38642bc5u,
+ 0x31f13a6fu, 0x397d3bbdu, 0x398c2dc4u, 0x356738d4u, 0x39c0342cu, 0x353c39eeu, 0x344d3b03u, 0x38cd2c79u,
+ 0x346d2edau, 0x3495380fu, 0x39a032e1u, 0x3ab83860u, 0x37bc38a6u, 0x3a56342eu, 0x33c0324du, 0x39ee3abcu,
+ 0x28fb38e7u, 0x38003012u, 0x376d39f9u, 0x31e6246du, 0x398e38b6u, 0x298c2847u, 0x2d58342eu, 0x2feb3856u,
+ 0x3a383930u, 0x3b963bc4u, 0x27873460u, 0x39ab3545u, 0x35c12ba8u, 0x330534c5u, 0x394639a6u, 0x3b493849u,
+ 0x30ca2e9au, 0x393f3623u, 0x2cbd38acu, 0x34e23babu, 0x38fd35beu, 0x394d3b90u, 0x32013ba3u, 0x3aa238f3u,
+ 0x35cb3812u, 0x30833be1u, 0x3afd3693u, 0x3bd83b34u, 0x2a863b14u, 0x36da3b70u, 0x364f3937u, 0x389b3a82u,
+ 0x32333ac4u, 0x370c31ebu, 0x398b1cdeu, 0x387534bdu, 0x31762c7bu, 0x34e438fcu, 0x335332dfu, 0x38fc32e0u,
+ 0x35ce315eu, 0x381c3981u, 0x340438c7u, 0x369f3b76u, 0x3b3c3497u, 0x38e9393eu, 0x2b7e3b18u, 0x396c3ab4u,
+ 0x341a35c3u, 0x3b872eb5u, 0x3427308bu, 0x39ae3987u, 0x3a833125u, 0x3aca3815u, 0x297b2d96u, 0x30cd3919u,
+ 0x23393833u, 0x3b323a3du, 0x3bab3ab6u, 0x371b3b3au, 0x3b2b3660u, 0x3bc83b6au, 0x32773a26u, 0x3b943b8fu,
+ 0x3adc3954u, 0x3b8e387du, 0x3ae92e03u, 0x356d2b1bu, 0x37ab3055u, 0x3b9e336fu, 0x3b2f3879u, 0x26a63473u,
+ 0x399e3a35u, 0x3a86385du, 0x38b43ab8u, 0x39c23bbau, 0x35c8396au, 0x33b618a0u, 0x39ae3a1du, 0x3901354cu,
+ 0x34383240u, 0x369c3b94u, 0x37cb33d7u, 0x39d63631u, 0x38bc3bb2u, 0x35d638bbu, 0x351c3884u, 0x3aee1479u,
+ 0x37e036dfu, 0x3ab22d58u, 0x36923173u, 0x37533ab9u, 0x32b33b73u, 0x3bd339b9u, 0x398a3866u, 0x3a03364bu,
+ 0x39a03b45u, 0x38b63b7du, 0x34e0389eu, 0x3b0e39adu, 0x3375392cu, 0x3a0c385au, 0x31033b84u, 0x369b3014u,
+ 0x3bd83005u, 0x384536cau, 0x37303344u, 0x37d23be1u, 0x33c32085u, 0x3bea3bd4u, 0x38ec32bcu, 0x389f2cecu,
+ 0x3a8d3008u, 0x31893b84u, 0x2e883767u, 0x2c2b356du, 0x3bec343cu, 0x3adf29b6u, 0x383b2cf5u, 0x3b74373cu,
+ 0x31c0325fu, 0x39ce39fdu, 0x26b03998u, 0x3ab83429u, 0x38d138cfu, 0x342a34bcu, 0x39d23b93u, 0x3b80300fu,
+ 0x355538a9u, 0x39ce365bu, 0x38a13605u, 0x324139afu, 0x35b33b8eu, 0x399736b6u, 0x370b38dfu, 0x3b393933u,
+ 0x3a983a67u, 0x36b63422u, 0x29a8386eu, 0x352d3a29u, 0x34263486u, 0x35693489u, 0x38ab38d7u, 0x329b3417u,
+ 0x2def3a9cu, 0x382a381au, 0x3b163898u, 0x3bca38feu, 0x32df3af9u, 0x358433b8u, 0x386d3644u, 0x331d3371u,
+ 0x3960361au, 0x3bcf3981u, 0x3649327au, 0x39853bb8u, 0x385138bdu, 0x3b262cd5u, 0x38933901u, 0x36453851u,
+ 0x3a0b3413u, 0x28583814u, 0x3b6e3663u, 0x369d35fbu, 0x3a6a38feu, 0x3acd3959u, 0x3b662f4bu, 0x38273b3bu,
+ 0x32cf38ffu, 0x327d372bu, 0x3bd925a0u, 0x38bf3425u, 0x35ee34ceu, 0x355c365du, 0x3bb03919u, 0x3bc936dcu,
+ 0x38db3958u, 0x2c073396u, 0x38983ad1u, 0x3a7f36a1u, 0x3a4534f8u, 0x3b5e353fu, 0x3067344au, 0x31363996u,
+ 0x393f3b98u, 0x32af3363u, 0x38dd3b64u, 0x2f0e3b26u, 0x38293b67u, 0x35f73959u, 0x3baa2b92u, 0x35d53bf5u,
+ 0x315e3b8du, 0x2cae30f3u, 0x2ab23643u, 0x306e3651u, 0x3aab39d8u, 0x326a3981u, 0x33fc3a23u, 0x388a3a13u,
+ 0x2d5835deu, 0x3b843a9fu, 0x3bb73954u, 0x323a3b79u, 0x359739dau, 0x310637cdu, 0x39cf3bccu, 0x35f23445u,
+ 0x39fb36c2u, 0x35253b68u, 0x38bf378fu, 0x3809364bu, 0x315e2ec4u, 0x3b52366bu, 0x33fe3597u, 0x3b9839f1u,
+ 0x20683759u, 0x36a43981u, 0x34723964u, 0x358318efu, 0x364a39a3u, 0x376d3303u, 0x2fa53a05u, 0x39573989u,
+ 0x3bc335efu, 0x350f3b28u, 0x34ef27f6u, 0x3861328du, 0x3af63b5au, 0x36dc34e5u, 0x385a28adu, 0x3b822c0eu,
+ 0x32db3aefu, 0x388b3998u, 0x379c3a69u, 0x391e3223u, 0x38ad2d23u, 0x397b32ceu, 0x2f023aa0u, 0x39de3667u,
};
// 256 u32 values
static const uint32_t kCnnV3TestFeat1U32[256] = {
- 0xee7c0a1du, 0x290beb5au, 0x34aedb72u, 0x00000000u, 0x9c43a772u, 0x9ac02fbau, 0xca762320u, 0x00000000u,
- 0xed95234bu, 0xd266c660u, 0x23e572b0u, 0x00000000u, 0x4f3e3e4cu, 0xe9f050c2u, 0x8c8848c4u, 0x00000000u,
- 0xddf4a20bu, 0x90217921u, 0x0cbbcb9bu, 0x00000000u, 0x790f2266u, 0xd31ceb5cu, 0xa7b58b42u, 0x00000000u,
- 0x21fdd340u, 0x35c8450eu, 0xdab84239u, 0x00000000u, 0xfaafaf58u, 0xc0bd647bu, 0x191bc271u, 0x00000000u,
- 0x9e839693u, 0xd447d632u, 0xa3e3cd34u, 0x00000000u, 0x9816acb2u, 0x77a4c5f5u, 0x3eaeccfbu, 0x00000000u,
- 0x47e04ba9u, 0xbee48e8du, 0x11df34c8u, 0x00000000u, 0x15a08a3cu, 0x658be5c3u, 0xc6403f48u, 0x00000000u,
- 0xa8337739u, 0x97094582u, 0x88bce4acu, 0x00000000u, 0x1c5a2203u, 0x54f080bcu, 0x145a7a01u, 0x00000000u,
- 0xc216a0ffu, 0xc036cf58u, 0x42127f23u, 0x00000000u, 0x4afdd8fau, 0x5144b748u, 0xe3a9493du, 0x00000000u,
- 0x7d1010ddu, 0xc31737aeu, 0x72e658f1u, 0x00000000u, 0xb2bc988bu, 0x874068abu, 0x4752b9ecu, 0x00000000u,
- 0xe055263eu, 0xb57d6353u, 0xc4f356bdu, 0x00000000u, 0xf2b9ce80u, 0x3faf6989u, 0x1770771eu, 0x00000000u,
- 0x950fc854u, 0x537f6518u, 0x6f8f1b03u, 0x00000000u, 0x3c137b49u, 0x660207d5u, 0x64ac0a72u, 0x00000000u,
- 0x59be07efu, 0xbe09834bu, 0x97b811efu, 0x00000000u, 0x7967f639u, 0x1cdaeda5u, 0x921b66a8u, 0x00000000u,
- 0x2cce2e38u, 0x506c746au, 0x6a374c25u, 0x00000000u, 0x242b888du, 0x63b59666u, 0x4455c37cu, 0x00000000u,
- 0xd98a0ed3u, 0xdc14021au, 0x012b5d82u, 0x00000000u, 0x9a37ff7fu, 0xa3fb2747u, 0x60c3dd9du, 0x00000000u,
- 0x7818642eu, 0xca374746u, 0x60c22570u, 0x00000000u, 0x10804844u, 0x5f5ca629u, 0x40ff019fu, 0x00000000u,
- 0x61fa17b2u, 0x3ae80a51u, 0x265e1089u, 0x00000000u, 0xfc40da19u, 0x20fd6d3au, 0xb4c2e06fu, 0x00000000u,
- 0xb7b31acdu, 0x9e273818u, 0xe955351fu, 0x00000000u, 0x0146b1d6u, 0x4d3790ceu, 0x2f2ef0b7u, 0x00000000u,
- 0x93b16f10u, 0xa2b2d58cu, 0xe5dcdf1fu, 0x00000000u, 0x61354928u, 0x3c63db78u, 0xec9da3a4u, 0x00000000u,
- 0xac48ee35u, 0xc3c4f767u, 0x71ea1e0bu, 0x00000000u, 0x7287c339u, 0x63988fb6u, 0xbfe036acu, 0x00000000u,
- 0x35eae594u, 0xf9b41907u, 0x2d097146u, 0x00000000u, 0x7602d6deu, 0x508a8127u, 0xa47c939bu, 0x00000000u,
- 0xae41d19eu, 0xeb2d9aadu, 0xca0a22dbu, 0x00000000u, 0x3fa92484u, 0x34e77d30u, 0xe2f5759du, 0x00000000u,
- 0x7ce514bbu, 0x18f8b09du, 0xd3314b39u, 0x00000000u, 0xa600b305u, 0x068bd432u, 0xc86814d2u, 0x00000000u,
- 0x9b7cfb72u, 0x9d56d54bu, 0xdd6c8907u, 0x00000000u, 0x7edb5e71u, 0x7615827du, 0x9e0a75a4u, 0x00000000u,
- 0x32a1e232u, 0x26d36ecdu, 0xd801ced0u, 0x00000000u, 0x372fa45eu, 0x811cb66bu, 0x45181f97u, 0x00000000u,
- 0x3aff4aa1u, 0x9908111eu, 0xcd679c4eu, 0x00000000u, 0x71206dc3u, 0x2383b298u, 0x3e95f804u, 0x00000000u,
- 0x2a217f2du, 0xe1ffcadau, 0x51ccb6e1u, 0x00000000u, 0x5fb9577bu, 0x122f7d23u, 0x722f227fu, 0x00000000u,
- 0xe9f6f5f2u, 0x68e22b74u, 0xa6b7e5eeu, 0x00000000u, 0x2e93d042u, 0x2497b6f1u, 0xbb4be878u, 0x00000000u,
- 0x10d4106bu, 0x72ce2922u, 0x511385eau, 0x00000000u, 0x04296d0bu, 0x87fd229fu, 0xf6c99a1cu, 0x00000000u,
- 0x11b3b25eu, 0xd0d5e251u, 0x8a07a0e6u, 0x00000000u, 0xb93b2f92u, 0x18b76f8du, 0xde7cce09u, 0x00000000u,
- 0x02ec3339u, 0xe824852au, 0xa8660512u, 0x00000000u, 0x5665b9b3u, 0x01d16dd3u, 0x9c67c9b7u, 0x00000000u,
- 0x16622051u, 0x9bdad41eu, 0xc5ecdbb8u, 0x00000000u, 0x446dc047u, 0x3d1cea2eu, 0x38d1dcddu, 0x00000000u,
- 0x398f04ebu, 0x1d29069eu, 0x3fec755bu, 0x00000000u, 0xa8c8d0adu, 0x4d71c198u, 0xc7ea4e97u, 0x00000000u,
+ 0x7a83c8aau, 0x29e8719fu, 0x3699bcbcu, 0x00000000u, 0x9107b1fbu, 0x558c0259u, 0xabfda3b7u, 0x00000000u,
+ 0xec4ac44au, 0x5ad3c0fbu, 0x8d47c5b9u, 0x00000000u, 0xd4fcca52u, 0x35d9a170u, 0x82ba7eacu, 0x00000000u,
+ 0x4e248fe3u, 0xa082bbdcu, 0xbe3a97b4u, 0x00000000u, 0x24103d56u, 0x2ffdc6e0u, 0x05edd340u, 0x00000000u,
+ 0x03161c84u, 0x0cfddbbcu, 0xb4f18c97u, 0x00000000u, 0xe7674f92u, 0x68e263f6u, 0xe1f4d9eeu, 0x00000000u,
+ 0xbacccb89u, 0xcebe5003u, 0xbd69c8aau, 0x00000000u, 0x3319c72bu, 0x85fb249au, 0x8684754du, 0x00000000u,
+ 0xa44067f2u, 0xcbce548eu, 0x1cd64d08u, 0x00000000u, 0xacb71e95u, 0xd541bdb0u, 0x1d92cc04u, 0x00000000u,
+ 0x6a3aec01u, 0xe5423025u, 0x063d68a2u, 0x00000000u, 0x29227b36u, 0xcdcd9d0au, 0x2bfacd74u, 0x00000000u,
+ 0xb3b535f2u, 0xe6a88063u, 0x7fa256a6u, 0x00000000u, 0x7c0fdb10u, 0x6dc2874bu, 0x5f75d6b5u, 0x00000000u,
+ 0x614b2490u, 0x8dbc72c1u, 0xe5427ceau, 0x00000000u, 0xfb3cdb08u, 0xdda9be44u, 0x1ea019fcu, 0x00000000u,
+ 0x4e88770cu, 0xc1c76656u, 0x8ba7cf95u, 0x00000000u, 0x55966134u, 0x4faa2974u, 0x1df6f0e6u, 0x00000000u,
+ 0x55195f40u, 0x6c03d30au, 0x6a7a1ba0u, 0x00000000u, 0x8c7b118fu, 0x45e2bf42u, 0x34b9e6e2u, 0x00000000u,
+ 0xe3345f64u, 0x67bfbd40u, 0x4137594eu, 0x00000000u, 0xa2bba9fcu, 0x8ad5501du, 0x939218f0u, 0x00000000u,
+ 0x38b53f86u, 0xb9210d7du, 0x313a2732u, 0x00000000u, 0xd61e9b83u, 0x6c8e8f0du, 0x68bd6bb8u, 0x00000000u,
+ 0x63741f09u, 0x6c479557u, 0xaa5246b0u, 0x00000000u, 0x7f273739u, 0x2076c006u, 0x90fc88f6u, 0x00000000u,
+ 0x445a89adu, 0x1bbb08e7u, 0xd705b821u, 0x00000000u, 0x5008deddu, 0x9854c474u, 0x2c0119c7u, 0x00000000u,
+ 0x56dcbd72u, 0x62c73f23u, 0x22471b81u, 0x00000000u, 0xf92dbb85u, 0x4fc512eeu, 0x952ddb21u, 0x00000000u,
+ 0x90c5dde4u, 0x6debf281u, 0x95ea6a56u, 0x00000000u, 0x90e13d88u, 0x147f3a0cu, 0xab4899e4u, 0x00000000u,
+ 0x1d3cca56u, 0x6f591c34u, 0x5d5dccf7u, 0x00000000u, 0x729d1c17u, 0x1268402au, 0xb38f0640u, 0x00000000u,
+ 0x6b272347u, 0x0f94b0b4u, 0x00e78afeu, 0x00000000u, 0x8a42b85au, 0xaa8fd193u, 0x8c1dacedu, 0x00000000u,
+ 0xb86f48f9u, 0xc1ba8424u, 0xdf392d7bu, 0x00000000u, 0xfb8e9b42u, 0xe281dff4u, 0xcbb695f0u, 0x00000000u,
+ 0x6e543497u, 0x80f84a5fu, 0x9db53238u, 0x00000000u, 0x2b614898u, 0xad7a95e1u, 0x96984562u, 0x00000000u,
+ 0x852b8218u, 0xd5949ca1u, 0xafea1ba3u, 0x00000000u, 0xeeb2b025u, 0xf9fca2cau, 0xd3478a80u, 0x00000000u,
+ 0x4c43b114u, 0x30603f20u, 0x4c9fa38au, 0x00000000u, 0xb66b5f31u, 0xdd426aaau, 0xe151d5aau, 0x00000000u,
+ 0xaf1977c2u, 0xae5720bfu, 0xf3b236ecu, 0x00000000u, 0xaf8d721cu, 0x2416d805u, 0x800aa2b6u, 0x00000000u,
+ 0x5bd23787u, 0xa310dfaau, 0xa2c60893u, 0x00000000u, 0x4b5c88e9u, 0x4cb8f96eu, 0x16bc4202u, 0x00000000u,
+ 0x4b1275c5u, 0xd3e51fbeu, 0xa7b6a819u, 0x00000000u, 0xf171e7c3u, 0xed23415bu, 0xdb58b564u, 0x00000000u,
+ 0x2dffef8fu, 0x557b5bd1u, 0x0eb74243u, 0x00000000u, 0x17f1978du, 0x3b26e0cfu, 0x47b7263du, 0x00000000u,
+ 0x03f0a396u, 0xd7c40f27u, 0x86e9d1e7u, 0x00000000u, 0xa7e375a6u, 0xa74f7353u, 0xf7794623u, 0x00000000u,
+ 0x25b792e0u, 0x57b9f177u, 0x4ef220d1u, 0x00000000u, 0x8d2d46e1u, 0x86b44f5fu, 0xca3ff1e4u, 0x00000000u,
+ 0x8f860c12u, 0x3be6e55du, 0x90925db2u, 0x00000000u, 0xf6ce99bfu, 0x94306929u, 0x97a75fb5u, 0x00000000u,
+ 0x2d0e5092u, 0x02059320u, 0x35a780d5u, 0x00000000u, 0x067071f0u, 0x73c6b996u, 0xe1d8e8aau, 0x00000000u,
+ 0x725dd47fu, 0xa5312b92u, 0xdb0a3019u, 0x00000000u, 0xc358b879u, 0xa68f590eu, 0xc8b545cbu, 0x00000000u,
};
-// 1238 u32 values
-static const uint32_t kCnnV3TestWeightsU32[1238] = {
+// 3914 u32 values
+static const uint32_t kCnnV3TestWeightsU32[3914] = {
0xa8b23143u, 0x2f9432e3u, 0x3491b3cbu, 0x317e3104u, 0xa79fb324u, 0x3419acf6u, 0x32322d86u, 0xb13da859u,
0xb4302831u, 0x2d0e324au, 0xad9630f5u, 0x338c3485u, 0xb1dd3158u, 0xb461a51du, 0x2f07b2a3u, 0x347d30b3u,
0xacf9aeb0u, 0xb1f6a4adu, 0xa377b31bu, 0x2e85b13eu, 0x3263a8d4u, 0xaf352fb1u, 0x31da3261u, 0xb010ac52u,
@@ -235,91 +238,467 @@ static const uint32_t kCnnV3TestWeightsU32[1238] = {
0x243fb10eu, 0x3424b427u, 0xb1ccb339u, 0xb3bd3118u, 0x305533afu, 0x2f5eb424u, 0x30f12d0eu, 0x3031324du,
0xaed12a9eu, 0x34632f93u, 0x2e502ab9u, 0x30eba8d4u, 0xb28534c7u, 0x260fb1b7u, 0x297fa1b9u, 0xab5ab454u,
0x2a8b2a5fu, 0x303a2e0bu, 0x31932d6fu, 0x25c32ccau, 0xb3a82c14u, 0x2435b05bu, 0x2ee03329u, 0x2b16b3ddu,
- 0x307eb158u, 0x2b2d3249u, 0xae332b04u, 0x32fea821u, 0x2211304au, 0xb451ad0fu,
+ 0x307eb158u, 0x2b2d3249u, 0xae332b04u, 0x32fea821u, 0x2211304au, 0xb451ad0fu, 0xb1e5b2dbu, 0x3444acddu,
+ 0xb1171a55u, 0xae36b392u, 0xac7b3210u, 0x31d0313bu, 0x2c2cb379u, 0xab843468u, 0x3410b450u, 0x22cd335eu,
+ 0xb1892803u, 0x92e23222u, 0x2f07b47au, 0x32523453u, 0xb44ab047u, 0x3432343au, 0x2d7724fbu, 0xaa29a5bfu,
+ 0xb3beb34eu, 0x3277b122u, 0xb13fb2cdu, 0xb1782778u, 0x28ddb277u, 0x2cd8ad11u, 0xb2b333eau, 0x33ba33f0u,
+ 0x2d0b3252u, 0x2d592c9du, 0xb4a42f51u, 0x2a933466u, 0xacb63167u, 0x1dfbb4aeu, 0xa8d71c6bu, 0xb0ceae28u,
+ 0xacca243bu, 0xb2483483u, 0xafe9aeedu, 0x8f1326a7u, 0x34522d85u, 0x2f67b020u, 0xb090b48fu, 0xb481a593u,
+ 0x2ed92c85u, 0x211730bdu, 0xabfb2bfcu, 0xb3aeb467u, 0x34c22df4u, 0x286cb4a6u, 0x324630d2u, 0x2c9b26f2u,
+ 0x32e42f8eu, 0x3417b341u, 0x3274b48du, 0xb1ddb410u, 0xa834b4c3u, 0xb26aae5cu, 0xab7fa783u, 0x338c3124u,
+ 0xb3362ddcu, 0x33f0afdbu, 0xab27b3e5u, 0x3454343fu, 0x2e6eaad2u, 0x30dea6a5u, 0xb2d5b47fu, 0xb1232c63u,
+ 0xadb72e60u, 0x3221336cu, 0x313d9c67u, 0x9b0eb065u, 0x32eeb1cfu, 0x33b93234u, 0x346a333fu, 0x3425b137u,
+ 0xa9f42978u, 0xae0b28a3u, 0xb0f7b00fu, 0x9839b402u, 0x2cff3372u, 0xa97c2272u, 0xb195a578u, 0xb1cb3233u,
+ 0x30fe3029u, 0x32f829c0u, 0x3354b003u, 0x31bc3068u, 0x32a62e52u, 0xaa2234bcu, 0x30b23207u, 0x3419a73du,
+ 0x2907ac0eu, 0xb144b391u, 0xabf2328bu, 0xb0e1ac11u, 0x31df9d28u, 0xb4203396u, 0xa50729bfu, 0xa80d2c63u,
+ 0xa452abc9u, 0x10d52a4cu, 0x321aa447u, 0xae1730b7u, 0x32ca31c5u, 0x2a5ab2a7u, 0x3098b01fu, 0x2e95b104u,
+ 0x1ed9b0b4u, 0xac8f344fu, 0x34ad3337u, 0xb26a3332u, 0xb47e3223u, 0x331eac19u, 0x25072f40u, 0x31ffac4eu,
+ 0xa4da3006u, 0xb12f3389u, 0x2d7cb254u, 0xb0da3488u, 0xb2ecb0ceu, 0xb14132ecu, 0x34b617d0u, 0x2857b4afu,
+ 0x31e71f8du, 0xaae3ad8fu, 0x30a0b260u, 0xb2fa3177u, 0x338bac17u, 0xafbab4c6u, 0xae5f283au, 0xa623b060u,
+ 0x248db21du, 0x29c82ff0u, 0x329db3d0u, 0x3116b23eu, 0x316ead5bu, 0xb204b1a4u, 0xb47c974fu, 0x304db146u,
+ 0xb406327au, 0x31b63356u, 0x2f9f3319u, 0x23b52eb0u, 0x34062ddeu, 0xaf13b3acu, 0x3445b41bu, 0x2ebca834u,
+ 0x34b7328du, 0x31c0b27eu, 0xb4553370u, 0xad9430e1u, 0x20512db5u, 0xb301b09du, 0xb39eb35bu, 0xadb52d4bu,
+ 0x2f5eb171u, 0x26042f2du, 0xac66342du, 0xb4143273u, 0x34b0309bu, 0x2a1e32a8u, 0xada22cdeu, 0x31312ee4u,
+ 0x9ca134c8u, 0x2b5db436u, 0xb0e7b351u, 0x3070ae93u, 0xb26bb212u, 0x332bb45eu, 0xb46b2f6eu, 0x2882b487u,
+ 0x2b6eb02du, 0x3410a906u, 0x2ee1b013u, 0x3460b448u, 0x3097349eu, 0xb160a558u, 0x308a3200u, 0xa77eb157u,
+ 0x3244a5b7u, 0x2d7f2f0fu, 0xb0682dc1u, 0xb45dacedu, 0xa9253182u, 0xb0e4a8beu, 0x2abfb194u, 0xb42fb123u,
+ 0xacbe3123u, 0xa946ac99u, 0xaee1320du, 0x343830f4u, 0xa93f2e2du, 0xa67830c2u, 0x23c4aa92u, 0xb399af00u,
+ 0xb339aa8au, 0x285faae1u, 0xb4743423u, 0xa67e2caau, 0xb0cfb49bu, 0xb170b353u, 0xab94b054u, 0xb4ca2476u,
+ 0xb3ceb27eu, 0xae9027a1u, 0xb157276fu, 0x343db491u, 0xb32ab385u, 0x349fb2c9u, 0x2c3c2c85u, 0xb3e734acu,
+ 0xb1af348eu, 0xaa232cbau, 0x3424ac75u, 0x34c6b2bcu, 0x2db7aca4u, 0xb43f347eu, 0x2935340fu, 0xb2bc2fd9u,
+ 0x2eaa2521u, 0xb2a0a6b8u, 0x32ceb0eau, 0x3003b478u, 0xb4ba34cbu, 0x2eb0b282u, 0x2ae5b439u, 0x2ac52f11u,
+ 0x2f93ad2du, 0xb19d2b3eu, 0x335528d9u, 0xa4552e68u, 0xb3f3aedau, 0xb4b1b0e7u, 0xb15434acu, 0xb25d320cu,
+ 0xaa47b028u, 0x31baaebeu, 0x33353465u, 0x3150b2a7u, 0x33f0b2c2u, 0xae48217eu, 0xb4b0b296u, 0x33f3b0d7u,
+ 0x348331b5u, 0x2b14b3aau, 0x2699b0aau, 0xb028aee5u, 0xb21529acu, 0x2dcba4beu, 0x3422b324u, 0x32231b09u,
+ 0x30082de9u, 0x33343026u, 0xb0422c5du, 0x2c14b3b0u, 0xb0f8b42au, 0x2782b438u, 0x2c3db286u, 0x322e3196u,
+ 0x2785af50u, 0xa42a28d7u, 0xb1912df5u, 0xb3b8349au, 0x2b3a3252u, 0x30b8b0e6u, 0x3145b3fau, 0xb3792ae9u,
+ 0x30d1b409u, 0x3379aff4u, 0x2f9c3108u, 0x3003b040u, 0xb3d8344du, 0xb118aeadu, 0x30a7ac19u, 0x296fa3eau,
+ 0xb2e3af23u, 0x32e8b07au, 0xacaf2bdeu, 0xb42e3191u, 0xb12fb0bau, 0x2fea34a9u, 0xb4c12c86u, 0x2d34b210u,
+ 0x3413ac9eu, 0xa80a3115u, 0x349030cau, 0x2d79a5feu, 0xb0d33405u, 0xb3ba336cu, 0x316e2db3u, 0x997d2c67u,
+ 0xb2b02c82u, 0xb06bb176u, 0xa8762b82u, 0x335ab0e8u, 0xa825acd7u, 0xb21baaefu, 0xa7eb3072u, 0x3474b4a5u,
+ 0x304b3210u, 0x33d03316u, 0x2c43ae3eu, 0xb0782c2eu, 0xb36ab157u, 0x31652e1eu, 0xaf0e3103u, 0xa720b1a9u,
+ 0x34062731u, 0x33fd3332u, 0x347632bau, 0xb458a54du, 0xac37b3ffu, 0xb256b0a7u, 0xb1d9345au, 0x31023469u,
+ 0x2ec22d6au, 0x2d9fae79u, 0x3434b204u, 0xb0b3af9cu, 0x2e02300eu, 0x2d16a77bu, 0x2ea8b28au, 0xb05db067u,
+ 0xae70b27cu, 0xa70aae7bu, 0xb1e5b1f2u, 0xb3202669u, 0x34432f41u, 0x33b4a99fu, 0xb463b3b6u, 0x31033040u,
+ 0xb4413296u, 0xacbe3151u, 0xb247af8du, 0xb43034c3u, 0x2d5b3247u, 0x30113083u, 0xb21232e2u, 0x2b91b4cbu,
+ 0xb339ad9du, 0xae76b207u, 0xb441b420u, 0xaf3733fdu, 0x347534a1u, 0x3020b2ddu, 0x2f522bbeu, 0xb1fbb3c1u,
+ 0x2bffb410u, 0x2e9b34acu, 0x34a7300cu, 0xb46a2ed6u, 0xad32330cu, 0x1c72b4c2u, 0xb0a93432u, 0xb2439d37u,
+ 0xb407b43fu, 0x32fd305cu, 0xb010b2d9u, 0xb48bb1dbu, 0x2fad281bu, 0xac4c2daeu, 0xb3cb283au, 0x3425b2a3u,
+ 0x340f3105u, 0x2dbd3400u, 0xaa66341cu, 0xb433b4c4u, 0xb3c5b37fu, 0x2fbbadb2u, 0x2aee3195u, 0x30c2b2cau,
+ 0xb4adad93u, 0xae81b036u, 0x31fdb056u, 0x2f8230feu, 0x31e9b2eeu, 0xb1dc2eceu, 0xa1112db0u, 0xb372a8a8u,
+ 0xb3ae21a0u, 0xb30eb272u, 0xb021345cu, 0xaed5ad8fu, 0x28d633a5u, 0xafaf3406u, 0x2851b365u, 0x31ccadeeu,
+ 0xa4363349u, 0x28123486u, 0x9ec6b066u, 0x31d6b471u, 0xaf58b3a7u, 0xb4632904u, 0x2f88b40au, 0xa30ab41cu,
+ 0xb139ae32u, 0x310fb1fau, 0xb2e93456u, 0x2b47346au, 0x30082d71u, 0x32242fd7u, 0x2520a114u, 0xb394aec4u,
+ 0xb47a2e52u, 0xb3cab04fu, 0x2d54b1f0u, 0xaa102cefu, 0xa91eb3deu, 0x303cada8u, 0x9d2f2a12u, 0x347aa000u,
+ 0x976c30ecu, 0xb48ca262u, 0xaef23440u, 0x34cbb0b0u, 0xb4583033u, 0xb48eb435u, 0xb41d2ba7u, 0x34323429u,
+ 0x2c7b2713u, 0xb430affeu, 0xb3a5308fu, 0xaf40b3f5u, 0xa911b4aeu, 0x31c92ffau, 0x32a92c01u, 0x32b2ab2eu,
+ 0xb1511bf4u, 0xb0523444u, 0xb0f1b48cu, 0xadb5ad1fu, 0x335b2394u, 0x2261aff9u, 0x322eae23u, 0xb39db293u,
+ 0x2e9a175cu, 0x2ae231adu, 0xac1f3142u, 0xb35fb47cu, 0xae92b46eu, 0xad84998au, 0xa1f528a7u, 0x2e692ed5u,
+ 0xade8aefdu, 0x2b682a8au, 0xae84af1eu, 0x2da92f74u, 0xa887b302u, 0xb474338au, 0xb27832d2u, 0xa9d01c74u,
+ 0x33fab4b4u, 0x345ca6ecu, 0x3308b018u, 0x285d340eu, 0x2f1db1b5u, 0x34442786u, 0xb072a8aau, 0x2fa1b2f2u,
+ 0xb487348du, 0x3419309eu, 0xb37cb415u, 0x2daf2f5fu, 0x326cb424u, 0x33d12332u, 0x32003342u, 0xb0c420e2u,
+ 0xb3aa1b58u, 0xb3072e1au, 0xb4471eddu, 0xb125afb3u, 0x31822c9cu, 0x32bab440u, 0xae0f32edu, 0xab7b2e1bu,
+ 0x2467b2a4u, 0x3344b3f1u, 0x322d3413u, 0x286c28cdu, 0xb0d734a7u, 0x28c2b412u, 0xaed2ae93u, 0x2b64b4bbu,
+ 0xb1feb088u, 0xad35b469u, 0xae463347u, 0x30ed2ee0u, 0x322dae9au, 0x288d2f5eu, 0xb477b4cau, 0x316b34b6u,
+ 0x33b93433u, 0x319ab49bu, 0x318fa781u, 0xb011b4bcu, 0x31a030d3u, 0xafa22c99u, 0x347baf48u, 0xa810a8abu,
+ 0xaaf32805u, 0xa932280cu, 0x33db3246u, 0x28f03426u, 0x2fe2a871u, 0x28d6327fu, 0x30cb2faeu, 0x30ba2ec6u,
+ 0x93fdb198u, 0xb4c72ac7u, 0x2c3a32d6u, 0x3213b31fu, 0x30f6302bu, 0x32cdac25u, 0x296aaa9au, 0x2d90b4cbu,
+ 0x29273030u, 0x31f3983bu, 0xb0432ef7u, 0x2ff63468u, 0x348634afu, 0x30e73406u, 0xb24b2c34u, 0x311826a9u,
+ 0x33122268u, 0xac92b37fu, 0xa9331b54u, 0xb2a7b384u, 0xa0cc327bu, 0xb4a4b455u, 0xb272a413u, 0x31ac2fe0u,
+ 0x2b14b137u, 0xaf8bb2a2u, 0x26c63377u, 0xb1463369u, 0xa476ae87u, 0x34c7a21fu, 0x32c2b367u, 0x281632fcu,
+ 0x21e8adfdu, 0x31beaff8u, 0x3417943bu, 0x20662cb6u, 0x18d5300du, 0xb20f33c4u, 0x31fe348bu, 0xa14032edu,
+ 0xaf6b3279u, 0xb2e73108u, 0xa82baf69u, 0xac57aed3u, 0xb0ac3225u, 0xb41830b6u, 0x319b311au, 0x3046ab8du,
+ 0x31c633f8u, 0xa0d9ac35u, 0xaff7315du, 0xae9a31d6u, 0xb3ef31e0u, 0x34c7b460u, 0x335a3318u, 0xa8bd3455u,
+ 0xb31bb018u, 0xb16011f1u, 0xb4c3342eu, 0xb0deb2dau, 0xb144b1c8u, 0xaea02dfeu, 0x337e2c6bu, 0x2f782be0u,
+ 0xb288ad2au, 0x3475aee4u, 0xb0462cadu, 0x3271a78au, 0x28a82e78u, 0x2e9f3365u, 0xb1f52d7bu, 0x3477b38fu,
+ 0xaa7a2b98u, 0xb4352ad6u, 0xb171322fu, 0x254831d9u, 0xb483b04au, 0xb2363210u, 0x30eab44du, 0xac40adc2u,
+ 0x32e5b0dbu, 0xb190afc8u, 0xb1b3af53u, 0xb401af01u, 0xaa5fb28eu, 0xb4403349u, 0x33d7b184u, 0x2ee4337du,
+ 0xaa29b0fcu, 0xb322b18bu, 0x29d3ac9cu, 0x2c41ad61u, 0xadebafd3u, 0x33433195u, 0x297eaed5u, 0xb44ab042u,
+ 0xb2513091u, 0x9ff6a9b5u, 0xb3c9b20fu, 0x3458add6u, 0xac8e2c6cu, 0x2bb3b46du, 0x2689b3d4u, 0xa9de30d6u,
+ 0xb458afb4u, 0xa960b490u, 0xb0ea335cu, 0x2f1429b0u, 0xb2a9b437u, 0x2e88ae7bu, 0x307fa8afu, 0x30f6330fu,
+ 0x30e234bau, 0x326232c3u, 0x31cb306cu, 0xa39e342cu, 0x20972d4fu, 0xa94632bdu, 0x31bf3224u, 0xb0d2346fu,
+ 0x30ef3006u, 0x33d03430u, 0x2a1429a6u, 0x348fb263u, 0x3051b0b4u, 0xa88431aeu, 0xb1f9b18cu, 0x1e9c32d8u,
+ 0x302bb3dfu, 0x333ab0ccu, 0xb1c1af50u, 0x1f16b3f9u, 0x30d3a791u, 0xadffa570u, 0xb042b094u, 0xb26e341bu,
+ 0xaea534b0u, 0xb0d3b3e7u, 0x3192b43du, 0xae421fb6u, 0x2d042e39u, 0x346e3439u, 0x321a2ef0u, 0x9d58319du,
+ 0xb40c2fdau, 0x346f30f4u, 0x3274b3f8u, 0xb1423428u, 0x341c239fu, 0xb325346bu, 0x2f9cadccu, 0xaebbb159u,
+ 0xb2a7b494u, 0x2de128c9u, 0x31582db8u, 0xb059aa82u, 0xb1a2b4a7u, 0xb22eaf34u, 0xb294b49bu, 0x2cfa2e61u,
+ 0x31ba342fu, 0xb2f52934u, 0xb43528ecu, 0xac37a94fu, 0xb03aa98bu, 0x328c2f23u, 0x2ce83128u, 0xb48025e4u,
+ 0x26e93048u, 0xae9cb067u, 0x2cc7342fu, 0x2ca5a366u, 0xa494aaa0u, 0xa89fb007u, 0xb46b2f20u, 0xb02b31aau,
+ 0x2d7b2f03u, 0x2c272ddcu, 0x1d203468u, 0x2fe43076u, 0xb029b110u, 0xb045a9d6u, 0xac30ad1eu, 0xb1612bfau,
+ 0xa8cc31d8u, 0xb4052febu, 0xb487a49du, 0xae002ec1u, 0xaec13450u, 0xb0a828e6u, 0xa5102c9du, 0xb1b0acf8u,
+ 0xb34cb375u, 0xaabc34cau, 0x3459b4c8u, 0xaa41b49cu, 0x33f6302du, 0xa5963016u, 0xb46c348eu, 0x32b534adu,
+ 0xb0b5341eu, 0x2e45b2ddu, 0x231e20dfu, 0x2c9c3414u, 0xabadaf41u, 0xb1e8b309u, 0x31f5b2d9u, 0x24a92fdeu,
+ 0x2e78acd3u, 0x33523202u, 0x32242739u, 0xa3d31ef2u, 0xac5d3304u, 0xaf7eabfeu, 0x26e421bfu, 0xa9562e29u,
+ 0x330bb16bu, 0x340fadeau, 0x289db19eu, 0x324f25a0u, 0x33dfb2efu, 0x2a9d24d4u, 0xb4741e3fu, 0xb08a9726u,
+ 0x2dbaae0au, 0xb3cf95a9u, 0xb209b419u, 0x2c29b1efu, 0x3354b230u, 0xb3dab193u, 0xafdc3337u, 0x3468a9bau,
+ 0x30203444u, 0x2b2bb423u, 0x344a34b2u, 0xabfdb391u, 0xb31caf15u, 0xb0582f7bu, 0xb3ddadf5u, 0xb29094ffu,
+ 0xb2d9a937u, 0xb2d1345bu, 0xb1d122b9u, 0xb1b7b411u, 0xb2ecb446u, 0x2dd49c01u, 0xa888b3ecu, 0xb11a2a3du,
+ 0xb41caceau, 0xa51bb160u, 0x34c434b5u, 0xa9e8349bu, 0xaec6b407u, 0xb4aab109u, 0xa731b3c6u, 0xb223ae28u,
+ 0xad7d3026u, 0xb0f932c2u, 0xb21125e6u, 0xb2ac99c6u, 0x2c4fb27fu, 0xb24ab089u, 0xb4633420u, 0x276a2901u,
+ 0xa741b0cau, 0x3040b3eau, 0xb1e3b158u, 0x33baaf1cu, 0x344a31b1u, 0x2e422dd6u, 0x33c8a222u, 0xad911eafu,
+ 0x27162d64u, 0xac1fb338u, 0x32a9af2bu, 0x322daca7u, 0x30dea86du, 0x3414b301u, 0xb09da1f9u, 0xb083b0f7u,
+ 0x302dad1eu, 0x2e7e3451u, 0xa7e5273fu, 0x32b7324du, 0x345aaff3u, 0x31c3ac07u, 0x2fd2347bu, 0x2a9eb280u,
+ 0x318db352u, 0x2f7fb033u, 0xb22033e0u, 0x2da6ac1du, 0xb23eaa2cu, 0xab2cb485u, 0x29af2dbbu, 0xa793b29au,
+ 0xb056ad03u, 0xaf4f3067u, 0x3410262au, 0x2c7e2608u, 0xb22eb419u, 0x3116b3a9u, 0xb2053273u, 0xaecab164u,
+ 0x2cad2affu, 0x33c82f2cu, 0x2e19afdfu, 0xaf752896u, 0xb3c433d3u, 0x30603012u, 0x21b533f9u, 0x324baedau,
+ 0x0a353133u, 0xb0753125u, 0x33032ae0u, 0xa10eaefdu, 0x31ba3454u, 0xac68336eu, 0xb2ac34c4u, 0xa25e21c4u,
+ 0x33d831bbu, 0x2cdf3070u, 0x1e58a993u, 0xb49b1db7u, 0x32d6177cu, 0x31ad3005u, 0x2473ac35u, 0xac7eb47bu,
+ 0xb0b3a3dcu, 0x2f50b480u, 0xaa7aa8fcu, 0x25d0b0cdu, 0xa3fea8c2u, 0xb4bd2a36u, 0xb41e33d4u, 0xb19ea78au,
+ 0xb038234eu, 0xabf632eau, 0xb0d22d54u, 0xab5cb49du, 0xa637259bu, 0x2ff1346fu, 0xafdf34c2u, 0x2844318du,
+ 0x34aa2c4bu, 0x33a82deau, 0xb3273056u, 0xa0d12fa8u, 0x31e43405u, 0xb307b39au, 0x9c26b422u, 0x304bb406u,
+ 0x32732805u, 0x21e1b03au, 0x3362b358u, 0xb3a2b23du, 0x31e1b1aau, 0x2c65b392u, 0x30d230bau, 0xb435b1f3u,
+ 0xb1b6a7ddu, 0xad812fa7u, 0xb36ca7c7u, 0xb20a2930u, 0xb41e2ee0u, 0x2f163492u, 0xb4aab21cu, 0xb2aaae5bu,
+ 0x3328b168u, 0xb2362bedu, 0xb0b0285au, 0xb47c3492u, 0x313e3076u, 0xb24db428u, 0x3312ac71u, 0x2df725afu,
+ 0x324d316fu, 0x1d67a349u, 0xb08b3030u, 0xb49a32ceu, 0x31c4b1d7u, 0xb2b9ae22u, 0x24b9b204u, 0xb41f32a6u,
+ 0xa55ab482u, 0xb03e3461u, 0x33723468u, 0x3216b02au, 0x1f442f23u, 0x320fa9ecu, 0x30ef31a1u, 0xa29a34c9u,
+ 0x32f631e5u, 0xb465b18au, 0x32d9ab51u, 0x2faeb00du, 0xad8eb4bbu, 0xb31c2c70u, 0xb299339du, 0xb17233deu,
+ 0x2c0634bfu, 0xb42cb3fbu, 0x2ae7b415u, 0xab7431f2u, 0x2adba83du, 0x2f392513u, 0x301ba455u, 0xadc034c6u,
+ 0xb330b4afu, 0xadfb281cu, 0x2ce53439u, 0x31b92241u, 0x3224b0a3u, 0x2c98b4c5u, 0x3187342cu, 0xabea32cau,
+ 0x27cf323au, 0xb43c2c8du, 0x27d4b1bbu, 0x30a3b43cu, 0x34093287u, 0x3439b3a5u, 0xa3f8b427u, 0xb4203288u,
+ 0xb48fb4a0u, 0x31fc1eedu, 0xb335b327u, 0xb11a2944u, 0x3451306cu, 0x333e311cu, 0x32612204u, 0xb425b217u,
+ 0xb44d2dfdu, 0x2f5b3389u, 0xaf263457u, 0x30a4a041u, 0x266eb0f0u, 0x34992e20u, 0xb35530b3u, 0x2ad1a394u,
+ 0xa93ab2c9u, 0xb352b026u, 0xb26b1f66u, 0xb2582af4u, 0xb1da304bu, 0xb48c339cu, 0xaee822c2u, 0xa45db147u,
+ 0x33f93291u, 0xb2a3af8eu, 0x2a4c303au, 0x30ffb1beu, 0x2de421d1u, 0xa8adad67u, 0x30edb223u, 0x3158b296u,
+ 0x30bc31b9u, 0xb05bb49du, 0x2b56a4ddu, 0xacc73406u, 0xb2763333u, 0xb44cacf1u, 0xb2b530a8u, 0x3149b0aau,
+ 0xb4aa2555u, 0x1e8db4c7u, 0xb223af7du, 0xb1372c5du, 0xaca32b15u, 0xb2f3b204u, 0xb483b10du, 0x33b8af5eu,
+ 0xacea325du, 0x292c29d3u, 0xb1103250u, 0xaf9dafd2u, 0xaf1eb468u, 0x33e93342u, 0xa671b37fu, 0x9e8aac8eu,
+ 0xb49b2834u, 0x348c3292u, 0x2e292de4u, 0xae5d244cu, 0x2a75b2bdu, 0x323bb45eu, 0xaa8caa13u, 0x3481aea3u,
+ 0xb2b2b23eu, 0x2e16b26cu, 0xb11434b9u, 0xb04033cau, 0xb4692daeu, 0x2def9957u, 0xb1092594u, 0x3483324fu,
+ 0x2d38a644u, 0xb0f73193u, 0xb03832ffu, 0xb373314cu, 0x252ab010u, 0x32ba23a8u, 0xb38b34c3u, 0xad78b064u,
+ 0x9b7ca27au, 0x313232d7u, 0xa6e9b49eu, 0xa6152c89u, 0xabe0b317u, 0xb1ffaac9u, 0xaedc30ecu, 0x20d52ad6u,
+ 0xb342333fu, 0xafabb368u, 0xafbca948u, 0xb0e4310eu, 0xaa9fb3f9u, 0x32bb3140u, 0x2872307bu, 0x2ef7304eu,
+ 0x27ccb11au, 0xaf063132u, 0xa1b02deau, 0xb3df17c3u, 0x33052e62u, 0xad24b2e2u, 0xb484b455u, 0x9c0334b3u,
+ 0xb053ad8cu, 0xb0c2349cu, 0xb0273055u, 0x2f0ab420u, 0x2ec0b4c9u, 0x30bab0cbu, 0x304e312eu, 0x2fd3b491u,
+ 0xb0f134b5u, 0x3205b056u, 0x2e2bb285u, 0xb47130e6u, 0x29d42821u, 0x333aaf8cu, 0xad37b27au, 0xb1f132b2u,
+ 0xb40a322cu, 0xb4149e9eu, 0x31942ed5u, 0x349f32b8u, 0x312a2e9bu, 0x33ed1b13u, 0x335d2d85u, 0x9802b114u,
+ 0xb1b5306au, 0x3497b18bu, 0xb21ca906u, 0xabe2ae79u, 0x34102cbau, 0x2dd334acu, 0xaed72fadu, 0xb16b32dcu,
+ 0x348bb38fu, 0xaf8a32feu, 0xa5112ebau, 0xb018aecbu, 0x2793b2fdu, 0x1f7da81au, 0x3436b177u, 0x33ac2d52u,
+ 0x2cb8b381u, 0xae7e32b6u, 0xb083b1dau, 0xa80eb05cu, 0xb064aeafu, 0xb44a2db4u, 0x2f49ae21u, 0xb15cb239u,
+ 0xb4cbb1bfu, 0xb46da407u, 0x34c0b05cu, 0xb1783404u, 0xa593af4cu, 0x31d0329du, 0xb336b32du, 0x331baa65u,
+ 0xb38a3356u, 0xb29c29b5u, 0x33ff2f26u, 0xb401a6f3u, 0xab072a16u, 0xae2131f8u, 0xb0583333u, 0xaf6eb293u,
+ 0xb033b477u, 0xb2de2e26u, 0xb30fb376u, 0x3331b40eu, 0xb45a3443u, 0x308e30feu, 0xadd734c5u, 0xa735b01eu,
+ 0x306432d9u, 0xb0e8a45cu, 0x32cf305cu, 0x2e79abafu, 0xa29babaau, 0x314b23a7u, 0xa8463496u, 0xacdbae0eu,
+ 0xafccb308u, 0xa98ab3f1u, 0xade6b263u, 0xb0e131a3u, 0x33b42c4cu, 0xb1e12924u, 0x2af1adf6u, 0xb2bf1c10u,
+ 0x269da816u, 0x2c51b078u, 0xae46b420u, 0xb09f31d8u, 0xaae92c0au, 0x33fdaf0cu, 0xb10eae05u, 0x30b933dcu,
+ 0xb4532efbu, 0x2d50b315u, 0xb145b194u, 0xb01c315cu, 0xb473b41du, 0x2c7fa85du, 0xb3793315u, 0x2494b1d3u,
+ 0xa15dad38u, 0x26e53412u, 0x34b3aa85u, 0x3426b42du, 0x189e1416u, 0x1c80340du, 0xb05ea8bbu, 0x29592e19u,
+ 0x3005b17cu, 0xabe22d88u, 0xa81ab4c2u, 0xb414b4beu, 0xb403a954u, 0x2df9b04fu, 0x34c7af59u, 0x2a8d33feu,
+ 0x3311347cu, 0x30cbaef4u, 0x2f16b2f9u, 0xb4a39d7bu, 0x345fb056u, 0xb21eaf12u, 0xa263b200u, 0x30f43294u,
+ 0xb198a8dau, 0xa5bdb45cu, 0x3069320bu, 0x2b46ab05u, 0x2a12a8e5u, 0x2ef32161u, 0x343b1cbcu, 0xb4a6a987u,
+ 0x0df7b110u, 0x2806263bu, 0xb054301fu, 0x2fd62e55u, 0xab83aad0u, 0xaa9bb22au, 0x33da3182u, 0xafb42b31u,
+ 0x309c2f00u, 0x3245ae71u, 0xb4b2b418u, 0xb29bb33du, 0xa9c33347u, 0xa972b052u, 0xafa0ac0au, 0x2dfdb2f4u,
+ 0x314ab2fau, 0x3071b1bau, 0xb33231deu, 0x3451b3efu, 0x2d00a5adu, 0xb040b494u, 0x336a1aa2u, 0x3149b063u,
+ 0xb3632e3bu, 0xb108afefu, 0x2d32a86au, 0xb0e8a91du, 0x2a7d2f86u, 0x282e21efu, 0x2eeeb157u, 0xad3eac52u,
+ 0x2b70b2bau, 0xae222be2u, 0x32a232e9u, 0x2f7db382u, 0x34ccac14u, 0xb037b148u, 0xb071a872u, 0xa1dd2f5fu,
+ 0x1bd5b15cu, 0x319d3377u, 0xb01091c2u, 0x3344b225u, 0x298b3009u, 0xb3433349u, 0xb1702f02u, 0xa548b349u,
+ 0x343cac53u, 0x320caec2u, 0x2e7330c7u, 0xab123409u, 0xb283ac3du, 0x334bb04eu, 0xae18a3dfu, 0xa09fb173u,
+ 0xb0183029u, 0x2cfc30f3u, 0xb0c83185u, 0xb2a62d08u, 0x304fad3bu, 0xb1fe2d0eu, 0xb44a32e1u, 0x34a52b50u,
+ 0x31c624d7u, 0x316f2b4fu, 0x2f572dfau, 0x30a5b13au, 0x349db108u, 0xb10eb13au, 0xb0ef32e3u, 0x301fad57u,
+ 0xb28cb28du, 0xb0eca031u, 0xb4b6b113u, 0xaefbacfau, 0x2ca8b480u, 0xb41f2cddu, 0x3328af07u, 0xaac32a4fu,
+ 0xb19aa55cu, 0x33f331fdu, 0xb14d33e2u, 0xb284af44u, 0xae6caf67u, 0xb1f52d1cu, 0xb39f3407u, 0x32d1b4bcu,
+ 0xad8e2fd3u, 0xb14f2b56u, 0xa46eb25eu, 0x2c6dada3u, 0x2e42b169u, 0xb2673120u, 0xa8a4b461u, 0x2dca3143u,
+ 0xab762fc1u, 0x24e2b117u, 0x3058b1adu, 0xb002b1d1u, 0xa0151c05u, 0xa8ce2bbfu, 0x32873407u, 0x2cf93347u,
+ 0x2c312d2fu, 0xaed5a900u, 0xb48e2b11u, 0x32c0b4bau, 0x239eb45bu, 0xb1cc2c6au, 0xb1f33399u, 0x310d3362u,
+ 0xb2b5b0fau, 0xabd6af0fu, 0x34cab0e2u, 0x3259b131u, 0x349cb28cu, 0xaa04b145u, 0xb34a287fu, 0xb0533215u,
+ 0x27a02791u, 0xb4442be2u, 0xb281b25du, 0xb2d1af9au, 0x1ff6340du, 0xad792cbfu, 0x17cfb04cu, 0xae01afe6u,
+ 0x1b0ca742u, 0xb15034a6u, 0xb2bb9f63u, 0x302cadc3u, 0x2a3fb240u, 0xb44eaa66u, 0xaaa33428u, 0xaa6d3174u,
+ 0xb0063433u, 0x30f8aabeu, 0xad35ae01u, 0x284534c0u, 0x2d822f46u, 0x32f2324cu, 0xafe428dcu, 0x2c65b495u,
+ 0xa7993240u, 0xb012b270u, 0x32771e23u, 0x2c4eb24du, 0x343030ddu, 0x3418b35cu, 0x311f2bcbu, 0x1b8eb449u,
+ 0xaf6b2cb5u, 0x282ca940u, 0xb4662809u, 0xb17fa8a6u, 0x34933400u, 0xb23dafb2u, 0xa2d6b163u, 0x016d3331u,
+ 0xb46bb445u, 0x3295b13cu, 0x28d62caau, 0x341faef4u, 0x326eae7cu, 0xa11f339au, 0xb0493392u, 0xb3e92e15u,
+ 0xb16da401u, 0xb4c4b433u, 0x2ae0aebbu, 0xaf9da6e0u, 0x343834b7u, 0xaa85b330u, 0x307c2db7u, 0x30bbad22u,
+ 0xae75b337u, 0x2cd43028u, 0xb45a3279u, 0xb456b240u, 0xb4c2b1e9u, 0x2459192cu, 0xac6b2e10u, 0x2eadb4c5u,
+ 0x3308b447u, 0x24aeb495u, 0x29913395u, 0x2ef92c2au, 0x2eb8b340u, 0x348d334eu, 0xadf9a860u, 0x269630ddu,
+ 0xb24bb0a9u, 0xa645b255u, 0xb49cb255u, 0x3420b44du, 0xb419b419u, 0xb0e5a77fu, 0x33e43471u, 0xa7e0ab51u,
+ 0xb470acccu, 0xb33faff4u, 0x28adb3a2u, 0xb3a0b3b4u, 0xa86fa1fcu, 0xb4c2b2f5u, 0xb112b1e5u, 0x33fa31c4u,
+ 0xab4bab39u, 0x2ce03014u, 0xb388b211u, 0x28fba8cbu, 0xaff0b2f0u, 0xb095af74u, 0x2de1a607u, 0xb1bb1dc5u,
+ 0x322eb3a0u, 0xb434a25cu, 0x31d6b0a5u, 0x333730d5u, 0xa9ec2bd7u, 0xb1e0a9a2u, 0x2a892dcau, 0x3368b2a0u,
+ 0xb453ad0du, 0x331aae9cu, 0x9de63401u, 0x33343462u, 0x32e8323fu, 0x30edac89u, 0x303829f1u, 0x3071b230u,
+ 0xaf943465u, 0xacac34abu, 0xb372b1c4u, 0x28363179u, 0x2c2aaf68u, 0xa7efae73u, 0x2e7532f7u, 0x2a6a310au,
+ 0xb0f734b8u, 0xb3d3b090u, 0x2c5a2826u, 0xae523057u, 0x2d14b4afu, 0xafe434cau, 0xb031b212u, 0xb0c8a9bau,
+ 0x2d4fade0u, 0xabfcadf3u, 0x3403b46au, 0xae99ab27u, 0x2d27b3f1u, 0xb46832fcu, 0xb2cab2d0u, 0x340b2bdbu,
+ 0xadffa8c8u, 0x30a8311bu, 0xb283b0ddu, 0x314bad76u, 0x2e5e2f7eu, 0xb0473253u, 0xb465ae28u, 0x2a553314u,
+ 0xb422afc1u, 0x30c7304au, 0xb1b02e62u, 0x30dc33efu, 0x2dcc2e0fu, 0x341d33c3u, 0xaa662b58u, 0x3220ad3cu,
+ 0xb40f2d8eu, 0x33afb2dfu, 0x34563362u, 0x3174a98bu, 0xaf14324du, 0x34c5b2cfu, 0x2b7ab483u, 0x2efb2dc5u,
+ 0x3041b0b1u, 0x3251b172u, 0x2f8eb165u, 0xb1f93171u, 0x315cb108u, 0xb1f0b3f3u, 0x3421a6d5u, 0xac5028c1u,
+ 0xa5933498u, 0xaac9a917u, 0xaeb1af36u, 0xb4682f37u, 0xb1bd2efbu, 0xb2c9ac72u, 0x3067280du, 0xb276ad95u,
+ 0x321cb140u, 0x3146b076u, 0xb1faae7bu, 0xa4d5305du, 0x346ab480u, 0xada1b2c3u, 0x9e7fac6du, 0xb4be317bu,
+ 0x2d9b3427u, 0x32183454u, 0x2bc2341bu, 0xa862ad7bu, 0xa99d2e12u, 0xb3d63073u, 0xb1ae3256u, 0xb29d3443u,
+ 0x31fdb31fu, 0x3096321eu, 0xb068346cu, 0xadaf3246u, 0xa82e25bfu, 0x29b8b4beu, 0xb24030c1u, 0x24ab3425u,
+ 0x2d9bb22du, 0xb431b1eau, 0xb37a2f1eu, 0x309bb390u, 0x31f3b445u, 0xb2e331b0u, 0xb3bf21a6u, 0xb4322d17u,
+ 0xb19d3444u, 0xb3232f06u, 0xa789b193u, 0x34ca299du, 0xb4ccb48du, 0xad24adbfu, 0x33ec335fu, 0xb2da334eu,
+ 0x30b9aac0u, 0x2e45af32u, 0xaec8b10cu, 0xb0ca32ecu, 0xaf713475u, 0xb0b5b493u, 0x1c32b250u, 0x2cd7b370u,
+ 0xad473333u, 0xabaeb25du, 0x2b07317eu, 0x25b6b3a8u, 0xaea4a4e7u, 0x349eb37cu, 0xb1e6b238u, 0x2964b2ceu,
+ 0xb0cfa854u, 0x2eaaa96cu, 0x33f9b031u, 0xb1ccaf70u, 0x31f6345du, 0x328eae06u, 0xade221c8u, 0x30d73285u,
+ 0x2c19afc3u, 0xb4caad04u, 0x279f346fu, 0x2bac2b8fu, 0xb00827c7u, 0x2cac3487u, 0xa6f22e90u, 0xb42fb013u,
+ 0xb238adc0u, 0x33472e39u, 0xae6baf93u, 0xb2bc2e8bu, 0xad3a341fu, 0xb23fb3a8u, 0xb0db3428u, 0xb29a2e3bu,
+ 0x3348b1e0u, 0xb49fb2c2u, 0x32f71210u, 0xb324aeafu, 0x2ec934a1u, 0x30d732bcu, 0xac6f2f9du, 0x302aaebfu,
+ 0x34533029u, 0xafe32946u, 0x2c5e9e6fu, 0x3197a9a1u, 0xb4b62a80u, 0x2d21295du, 0xaaa033f9u, 0xa3ae29b8u,
+ 0xb42724ddu, 0x33e133a1u, 0x28582762u, 0x2b86b43fu, 0x3003a971u, 0x300db459u, 0x2dd3b017u, 0xb13f2ce2u,
+ 0x3272a893u, 0xb4242c0eu, 0xaf3aa862u, 0x2c662d70u, 0x2c23b4b3u, 0x31293082u, 0xaf8ab43cu, 0xb226aa0cu,
+ 0xb41eaa16u, 0x2e7ab36du, 0x300fb08fu, 0x28d1b457u, 0x31b4ac2bu, 0xacc73174u, 0xb13b2cb6u, 0xb4052a2du,
+ 0xb217a906u, 0xa83b335cu, 0xad5d2af9u, 0xb1b4b00eu, 0x31c033e8u, 0x30bdb30bu, 0xb4a7aa42u, 0xb31a245bu,
+ 0x2f87b192u, 0x3386b0aau, 0x32612e09u, 0x33152c6cu, 0x278cad50u, 0x33c120e2u, 0x29ffb26fu, 0xacfd30e3u,
+ 0xace0330fu, 0x2a74b246u, 0xb4a7aa1au, 0xb1dab421u, 0x3387312cu, 0xb3932703u, 0x32d42549u, 0xb00330cau,
+ 0x33b1b339u, 0xb44932ccu, 0xb3bc3237u, 0x30f7b431u, 0xaa902594u, 0x328433c1u, 0x2c75b43fu, 0x27c1313au,
+ 0x2f2534a3u, 0xb1cbb39au, 0x34c53244u, 0xb3acb3b6u, 0xb42f310fu, 0x3377b424u, 0x335331c5u, 0xaed12dc8u,
+ 0xafbb33cdu, 0x336bb04fu, 0x33992ef2u, 0x3458b29bu, 0x2e742c54u, 0xac9433f3u, 0x2d643037u, 0x339f3186u,
+ 0xb4272eeau, 0x345cb215u, 0x347d3480u, 0xac84b472u, 0x30db30fcu, 0x2c34b3f3u, 0x9c2cb2dbu, 0xb44ab0ffu,
+ 0x33e4b408u, 0x342530b2u, 0xaaa9a49cu, 0x2c77ad70u, 0x30baadb3u, 0x333731c9u, 0x2427b0f0u, 0xb471b14fu,
+ 0x27862ffbu, 0xb3eb304du, 0x34bf34c2u, 0x3307a560u, 0x3090b0ecu, 0x320b3348u, 0xa5ae31fcu, 0x2f533225u,
+ 0xb368b30fu, 0x2cd13139u, 0x33d221f4u, 0xabfa2e21u, 0x2c0baa9cu, 0x28f130ffu, 0x305b1d62u, 0x32582fe2u,
+ 0x30e5b433u, 0xb415203au, 0x30282ee0u, 0x31a42342u, 0xb1b9afacu, 0xaae8adbbu, 0x32a22f86u, 0xb242303eu,
+ 0xb28fad23u, 0xb0d62f2cu, 0x319a23cdu, 0xafbe2e90u, 0x2de4ae6bu, 0xb3c02799u, 0x1fc0332eu, 0xb1caa417u,
+ 0xae6bacb9u, 0x292db067u, 0x3379a5cbu, 0xb163343du, 0x317ab3aeu, 0x32fb34c0u, 0xb0c7b493u, 0x3229b462u,
+ 0x20372a41u, 0xb0c1282fu, 0xb4812bceu, 0x30802d9bu, 0xae722ea5u, 0x30233244u, 0xb1dab323u, 0xb00b2d44u,
+ 0xb2a0b34fu, 0xa2d3b26cu, 0xb2372c20u, 0xb0343014u, 0xb14faf3cu, 0x346a317au, 0xb4669c9cu, 0xb291b099u,
+ 0xb3f93295u, 0x2acc347au, 0x32ab2173u, 0x3152b489u, 0x328bb35cu, 0xb437247bu, 0xb48a3266u, 0x31b8b2f1u,
+ 0x3353a565u, 0x2cbe2d33u, 0xb309affau, 0xb396346du, 0xb05e2475u, 0x2b44b087u, 0x315231b8u, 0xaff43315u,
+ 0xb4b6a543u, 0xaf9730bdu, 0xb25928b9u, 0x32173222u, 0xb45c333fu, 0xb0d834bau, 0xb0d0af51u, 0x280bb077u,
+ 0xac14b0ffu, 0x32742dfau, 0x2b3e2f67u, 0x3212b2c3u, 0xb087b471u, 0x2e6eb441u, 0x2dd7ad0eu, 0xa6ccade0u,
+ 0xb45da675u, 0x2d6ba95cu, 0xb4142713u, 0xaf572e74u, 0xb29db4a9u, 0xaf4895ffu, 0xb0fb2f77u, 0xb01a313du,
+ 0x31bf2ca4u, 0xb45aac65u, 0x34c22572u, 0xb15b3481u, 0xadf430e9u, 0x2e5fb286u, 0x26071045u, 0xae473249u,
+ 0xad232e6bu, 0xacafb464u, 0xa1243452u, 0x2f36b386u, 0x33d53473u, 0xacc02987u, 0x32112a32u, 0x321f32f0u,
+ 0xb351a6d7u, 0xb2992950u, 0x928cb437u, 0x32cd2df7u, 0x32dc316eu, 0x2fce30feu, 0x2cd8b4b4u, 0xb43db349u,
+ 0x3235b299u, 0x344f30afu, 0x338d3499u, 0x3344b30cu, 0x33203280u, 0xb35cb2b0u, 0xb2433412u, 0xa6c3ad0cu,
+ 0xb49ab336u, 0x333b34beu, 0xb02333ebu, 0xa9b2b160u, 0xb4bfa72cu, 0x34c5315du, 0xac4a2468u, 0xb3c1ac1eu,
+ 0x153eb07du, 0x26eba93cu, 0xb14632b4u, 0xb46fb489u, 0x296ab306u, 0x336d31ebu, 0xab702d30u, 0xab92b2a2u,
+ 0x331bae6fu, 0xb142a014u, 0xb2fbb00eu, 0xb4513405u, 0xb216b4adu, 0x233631f8u, 0xad772cd5u, 0x32d7b424u,
+ 0xb47cb1eau, 0xb4002995u, 0x31c4b3c3u, 0xb0f79d28u, 0x3067add0u, 0xb35db42au, 0x2c1c2c77u, 0xacd13489u,
+ 0x25c4b42du, 0xb2eb2d50u, 0xb1ce280fu, 0x326f329eu, 0x347ba968u, 0xb31e349bu, 0x2bb4ad14u, 0xac82b3f9u,
+ 0x3457b096u, 0x310baacau, 0xb312ac5au, 0xb4cc32f9u, 0xad35b443u, 0xae142f2du, 0xaeffb42au, 0xb4bfb0e5u,
+ 0xb35c2558u, 0x2d6cb014u, 0xb08b3052u, 0xb0dcae8fu, 0xb1f13463u, 0xa839b215u, 0xb297b1f7u, 0x2d62b3c2u,
+ 0x2dc42c4eu, 0xaeab313fu, 0x307a3404u, 0x303e31dau, 0xb02aad6fu, 0xb3bda9f3u, 0xb463a77cu, 0x3424b4c3u,
+ 0x28c02f45u, 0x296bae1au, 0x2dda3385u, 0xa61d2bdfu, 0x2fa0adedu, 0xa9c7349eu, 0xa985ac24u, 0x34b6a9b8u,
+ 0xad682ff1u, 0xa951acefu, 0x2446b473u, 0x2e2a31fau, 0x3061b1cau, 0xb08f3037u, 0x2c432cb8u, 0xb2fd3286u,
+ 0xa04fb1dcu, 0xaa7bb460u, 0xb2bf3189u, 0x34b4a627u, 0x941f1b46u, 0xb2072884u, 0x2ecfb2c6u, 0xb009ac01u,
+ 0x32a5b480u, 0x30cdb47du, 0x2b94ac88u, 0xb29eb0d8u, 0x2f832521u, 0xa8e3b1d9u, 0xb174a840u, 0xb2cc28ffu,
+ 0xb36c1dfdu, 0xaf3ca071u, 0xa917a756u, 0x9e2f33d5u, 0xb144321du, 0xb414adaeu, 0x2e7aaa16u, 0x2910327eu,
+ 0xb2322d61u, 0x319ab471u, 0xb3d9b265u, 0x34733433u, 0xaca92c77u, 0xb1ab3266u, 0xad7ea96cu, 0x340833edu,
+ 0xb3792dcbu, 0xb0e4a178u, 0xb2f13364u, 0x2fdab447u, 0x327fa72du, 0x31c8b415u, 0xb488b471u, 0xb15aa5f1u,
+ 0xb48d2b33u, 0x305cb476u, 0x2e02b191u, 0xb43b30ccu, 0xb072312eu, 0xb3f0ac3eu, 0x28aa3280u, 0x348019bcu,
+ 0x3381ae5bu, 0x341d3038u, 0x34b733d4u, 0x31afb312u, 0x2d8b31d9u, 0x3477b03fu, 0x8c2db1adu, 0xb27e229au,
+ 0xb3e53046u, 0x32b22f6au, 0x3005adc8u, 0x32a4b297u, 0xb2ab3457u, 0x1b2a2f5eu, 0x2fee24d7u, 0x2bb49eebu,
+ 0xacaab12bu, 0xb37d3303u, 0xb2102b7eu, 0x2a1232a7u, 0xb0ceab72u, 0x3465a8dau, 0xa4b01da8u, 0xaca530bau,
+ 0xae43a993u, 0x301bb2b4u, 0x33ca332au, 0xa7683350u, 0x2d4cb025u, 0x305c345du, 0x2f409a55u, 0x3406313cu,
+ 0xa922310eu, 0x84a7b059u, 0x3440ae99u, 0xa44b1ce5u, 0x2d5aadd4u, 0xb31f2229u, 0xad6a3317u, 0xb1a7a92eu,
+ 0x2fc2281au, 0xb44e3352u, 0x2fd4348bu, 0xb1adb024u, 0x3337afd5u, 0xa5b6b001u, 0x33242fe6u, 0x30dc2e17u,
+ 0x333fb246u, 0xb323b04eu, 0xb1ba3492u, 0x2cd7b2fau, 0xb30aaef5u, 0x30f92bc4u, 0xb29d31a1u, 0x3206285eu,
+ 0xb19db282u, 0x2391b4c7u, 0x304eb1ddu, 0xb1983470u, 0xb1882dbau, 0x30f7b4a1u, 0x344aa912u, 0xb2942544u,
+ 0xb1b4a7b7u, 0x2f553342u, 0x2446ae19u, 0x316e2e82u, 0xb495b408u, 0xac08b367u, 0xb2fba51du, 0xa87caad3u,
+ 0x3400b381u, 0xac482b9du, 0xaf231c3eu, 0x29c332ceu, 0x33e49a46u, 0x3201327du, 0xb2a9b145u, 0xa4d12f4du,
+ 0x31012f0eu, 0xb022afdcu, 0xb0599da2u, 0x30bd2e6du, 0x304ca44bu, 0x30161e6au, 0x3096b148u, 0xb07b3021u,
+ 0x32d2284cu, 0x3088b0cfu, 0x3308345cu, 0x2c1eab90u, 0x294234b9u, 0xb45db364u, 0xa9de3193u, 0xadd9b0dcu,
+ 0xa55730f1u, 0x30833313u, 0x3376348du, 0x2e1cb0b6u, 0x34b7b49au, 0x2e54b097u, 0xb28532b4u, 0xb4af32aeu,
+ 0xa44ca875u, 0x2aa2acd6u, 0xaea533f7u, 0x333f2a16u, 0xb469313eu, 0xb45a3440u, 0x29ca2ad1u, 0xb2c03180u,
+ 0x30d7b228u, 0xa4b1b2bbu, 0x2dcf3073u, 0xaf87b18eu, 0xa7e23481u, 0xb162b460u, 0x34ccae0du, 0xb48caf94u,
+ 0xb3ff1ff0u, 0x34bab057u, 0xb4502fdcu, 0xb24ea7e2u, 0x2ab0b33du, 0x30aa1af3u, 0xb0dd3388u, 0x21873410u,
+ 0x32a4b0a1u, 0x2892b475u, 0xb0393041u, 0xb0fd32fbu, 0xb40c3287u, 0xac92b288u, 0xb2d332ccu, 0xb1303335u,
+ 0xada8a74bu, 0x3444b355u, 0x2f4fa8bfu, 0x143829c8u, 0x33a62c7au, 0x24fd2fa7u, 0x2c162d93u, 0x2ccaa923u,
+ 0x30f2a855u, 0x346a3480u, 0x2bd13228u, 0xb22e30cau, 0xb0a9a272u, 0x347bb4c4u, 0x203532fcu, 0x2f69b0b8u,
+ 0x31d3a902u, 0xadcc34ccu, 0xb077b0e8u, 0xac12b3cfu, 0xb367b4a2u, 0xb288b062u, 0xb3933075u, 0xb15b3307u,
+ 0xb4b4b384u, 0xb0ac3022u, 0xb3751dd6u, 0xb33134a9u, 0x32c0aac4u, 0xb079b0e6u, 0xb05a31ecu, 0xa989b4c6u,
+ 0x33deb4aeu, 0x2a602d72u, 0x2faeb1bbu, 0xb4382eceu, 0x2a193350u, 0xb08d319eu, 0x2fc6b342u, 0xac6031a4u,
+ 0x321cb493u, 0xae2eb1a5u, 0xb3ae2cedu, 0xb48eb41cu, 0xaf3e3400u, 0x34b22e2bu, 0x23b62e82u, 0xacba226du,
+ 0x34682baau, 0xa9d23188u, 0x335dae6fu, 0x32b6b2feu, 0x3024b0b4u, 0xa31131e1u, 0x329fb1e8u, 0xaee3ad16u,
+ 0xa8fc1e27u, 0xb2f72f07u, 0x3385a8f0u, 0xa68eb474u, 0x33722e2au, 0x323b2b34u, 0xada7b153u, 0x3369343fu,
+ 0xa3882873u, 0xb43ca784u, 0x321c3389u, 0xb03ca6b8u, 0xb33cae08u, 0x285a3042u, 0x2cb9b11bu, 0xb340349au,
+ 0x9b2da660u, 0x24dab266u, 0xb2942c12u, 0x32d8b2a1u, 0x2d01af3eu, 0x32832cdeu, 0x328a309bu, 0xa7c32ccfu,
+ 0xaca5ac16u, 0xb31eaa44u, 0xac3fa9d7u, 0xb051b478u, 0x29fc30d6u, 0x268f31e1u, 0x31af32ccu, 0x3494b3b7u,
+ 0x2b70b4a9u, 0xae75319eu, 0xad4b2b92u, 0xb4b8b438u, 0xb3b6b3c7u, 0x23f22c42u, 0x2f5fa6d7u, 0xb066b4a3u,
+ 0x33df3153u, 0xb405a8bcu, 0x336531bcu, 0xb0edaaa2u, 0x2affb461u, 0x2ed533dfu, 0x2d7daf05u, 0xa36d3260u,
+ 0xac273224u, 0x2a60add1u, 0x3495a56du, 0xac692109u, 0x275dac09u, 0x3048ad92u, 0xac73ac95u, 0xb4372c7eu,
+ 0x1a1c2df7u, 0x3036a737u, 0x32a630f5u, 0xaea4accfu, 0x2a48301eu, 0x28442d10u, 0xb1f4313cu, 0xb2902847u,
+ 0x2e4cab20u, 0xaca09c42u, 0x313d3364u, 0x3452240au, 0x3188a9cfu, 0xb01db260u, 0xb391ae38u, 0xb1e9146fu,
+ 0xb1f32d27u, 0x3481b33bu, 0x34312e14u, 0xb47b9e37u, 0x287b29b6u, 0x32f0b4a8u, 0x30b32c9bu, 0x2c8c9b78u,
+ 0x348cb0a8u, 0xb27caba0u, 0x2d3bb443u, 0x339934a4u, 0x34b62fc3u, 0xae2c3047u, 0xb4092bc6u, 0xaa55b158u,
+ 0x345bb1bau, 0xacf534c2u, 0xab153004u, 0xae43b2ecu, 0xae9ab11fu, 0xb371b179u, 0x33cab16du, 0xaebbb3f7u,
+ 0xad96245fu, 0xb4622169u, 0x28e02db4u, 0x292a2ef3u, 0xadc1a8c5u, 0x02aab141u, 0xae98b48du, 0xb3832ed5u,
+ 0x2dd8b432u, 0x31ccabdau, 0x34633433u, 0xb086a9b7u, 0x2cec2c2cu, 0xaed82cb3u, 0xb191b430u, 0xb339b2a3u,
+ 0xb4ab3418u, 0x344c2bd3u, 0x242aad60u, 0xaf052f85u, 0x3217a2deu, 0x31ec30b6u, 0xacfb34aeu, 0x2be530bfu,
+ 0x309b2e66u, 0x336830cbu, 0x2de5adc9u, 0x2b43b038u, 0x319630d3u, 0x1d5caa22u, 0xb25bae61u, 0x33eab099u,
+ 0x348faf56u, 0x3048b0f7u, 0x2f8f29adu, 0x2ff02d87u, 0x28e033c9u, 0xb066b426u, 0x9b482fffu, 0x2b6b2775u,
+ 0x3142ac82u, 0x301d304bu, 0x337f3022u, 0x347c2b53u, 0x2d11b350u, 0xa2ff302au, 0x27b0b011u, 0x33dbad32u,
+ 0xb23aad80u, 0xabfd29deu, 0x28fbb3c2u, 0xb24625d0u, 0x292c2b80u, 0xad513141u, 0x3455b0d3u, 0xb48a28b4u,
+ 0xb1d23228u, 0x2fe1317eu, 0x34783480u, 0x311eb055u, 0x2cf6af08u, 0xb4472b82u, 0xb43fb4c0u, 0xb0a7b325u,
+ 0x21b933d0u, 0x327f31b3u, 0xac2c3083u, 0x30a632fcu, 0x33b630abu, 0x24642b94u, 0xa6ff337eu, 0xb44831fbu,
+ 0x2459b4cau, 0x3493b418u, 0x3030b4bfu, 0x347db0edu, 0xb48aac8cu, 0x2819b48bu, 0x349b34acu, 0xa150ae40u,
+ 0x22d0b3acu, 0xb0c82ba8u, 0x33ad2c4eu, 0xb134b4cau, 0xb339b192u, 0x2cd6b269u, 0x3094b3a1u, 0x3418336eu,
+ 0xb1cf2e27u, 0xb125b3f6u, 0x3470346fu, 0xaeb1ac89u, 0xb0001e27u, 0xb15a349bu, 0xa96dad06u, 0xac80b333u,
+ 0xb2e832d9u, 0xab2c2a74u, 0xb4aeb253u, 0xb435b05au, 0x31f5acbbu, 0x344b3038u, 0x2ffba99bu, 0xb16f2e59u,
+ 0x2495af8bu, 0xa5de31b2u, 0x34bdb13au, 0x3460b362u, 0x328231efu, 0xb19a31aau, 0x30ee2d17u, 0xb06330b8u,
+ 0xae9ab21fu, 0x29aba231u, 0x312a3278u, 0xb08a2ec4u, 0xae52a27du, 0xb4b4b117u, 0xb0afae2au, 0xb4342986u,
+ 0xb45fb068u, 0x293d2d6eu, 0xb2c5ae3eu, 0x2a7b3465u, 0x24d12ee1u, 0x339c31e3u, 0x2f4cac81u, 0x2dd5b130u,
+ 0xb4bcabbfu, 0x33133460u, 0x34c7b004u, 0xa652af06u, 0xaf8132bfu, 0x2e5ca799u, 0xa9f7b318u, 0x2ecaaf52u,
+ 0xb122b2fdu, 0xb29a300fu, 0x3016a9fdu, 0xb3c73119u, 0x304d32c2u, 0x3111ae2cu, 0x2bc03062u, 0x3082287eu,
+ 0x2a0834cbu, 0x30b131edu, 0x345d32d5u, 0xb26cb1e5u, 0xa7ca2aa2u, 0x31b9296eu, 0xaa583425u, 0xb1d2302fu,
+ 0xb497254fu, 0x26f7189cu, 0x2dbd3085u, 0xb4bfad70u, 0x348bb362u, 0xaeb7340eu, 0xa8e2b0a1u, 0xad7f2f4bu,
+ 0xab77b427u, 0xaeb9b099u, 0x2e19b279u, 0xb0662417u, 0x9c7b2823u, 0x346131afu, 0x335fb317u, 0x2f652fddu,
+ 0xadfeacd4u, 0x34a1b4bfu, 0xab802c17u, 0xb46926b9u, 0x2cbdb244u, 0x31832062u, 0x2bca2e19u, 0xb425320eu,
+ 0x30242f54u, 0xb1d8b051u, 0xb1ccb17eu, 0xb2ff234eu, 0xb3952e1eu, 0xaef0b48bu, 0x34b1b1ebu, 0xb156af91u,
+ 0xacd8b1d2u, 0x316a2c11u, 0x2cd6ad33u, 0x32a7a183u, 0xb009294du, 0x32252fd1u, 0x34ada54fu, 0xb1d23347u,
+ 0x2bc13181u, 0x20482559u, 0x2c4232c9u, 0xb45525cbu, 0x33d6a7f1u, 0xb4503011u, 0x3246335fu, 0xb496272cu,
+ 0xb44728cbu, 0x2cf3aee6u, 0xad7633eeu, 0x340432eau, 0xb101b499u, 0x30ee3008u, 0x2e05ac22u, 0x309b2c4fu,
+ 0xb3f43033u, 0xb30db290u, 0x33f13427u, 0xb35c31b6u, 0xb20129b4u, 0xb079ad0eu, 0xb2d43251u, 0x34a52d9du,
+ 0xb40a2f2fu, 0x2d7f33d5u, 0x2e22a6b6u, 0xa3073347u, 0xa74e24d0u, 0x3183b41au, 0xac0cb32cu, 0xb039ac59u,
+ 0x2c88ab7du, 0xb24aaf3eu, 0x34853318u, 0xaea4a65du, 0xb472b458u, 0x3421ac1du, 0xadca306eu, 0xa979347eu,
+ 0xa9a8b049u, 0x3367b18au, 0x2e11ad55u, 0x33b7ac43u, 0x9f382d78u, 0x33b8b310u, 0xb0212731u, 0xb4473342u,
+ 0x25a8af24u, 0xa326ae27u, 0xb31ab086u, 0xb293b472u, 0x29ca3459u, 0xac719ec8u, 0xb2e33000u, 0x2dfdb04cu,
+ 0x3130a26eu, 0x323533a0u, 0xb3d8b210u, 0x3172b259u, 0x341633f2u, 0x3289afc0u, 0x32332591u, 0xb2cd276eu,
+ 0x3464b40au, 0xadbda9b3u, 0xaca6ad85u, 0xb4333446u, 0x3190b350u, 0xb3222797u, 0xb14bb478u, 0xb4aab098u,
+ 0x3093b173u, 0x33a532d3u, 0xaeefac3au, 0x312c34ccu, 0xb0b525bdu, 0x20b8aeddu, 0xb4ab2677u, 0x29ce3469u,
+ 0xb18a2ad9u, 0x308e3116u, 0x2b4cb3c6u, 0x32712e13u, 0x3340b1a5u, 0xb3bfa5ebu, 0xb1cdb392u, 0x986bb02cu,
+ 0xb20430c7u, 0xb28034bbu, 0x341e2ba2u, 0x2dae3262u, 0xb3063023u, 0xa90fb4b5u, 0xad76b3e2u, 0x33feb14eu,
+ 0x3220334cu, 0xb2a4301cu, 0x2de9b456u, 0xad042d76u, 0x9eaba432u, 0xaf23b0feu, 0x2bedaa45u, 0x25e93460u,
+ 0x31e53070u, 0x2de1b36cu, 0x2990a17fu, 0x33cdb46eu, 0xb4c43459u, 0xb224b184u, 0x2b883167u, 0x3207b232u,
+ 0xa603ada0u, 0x2291ac54u, 0xb054a21au, 0x33d7a9cbu, 0x33ff171du, 0x312927b3u, 0x3193afedu, 0xb49db086u,
+ 0xb498b284u, 0x34cd344bu, 0xb3863109u, 0x30ae33a9u, 0x1599ae71u, 0xb4922e36u, 0x30761393u, 0xa851b1fcu,
+ 0x308624ffu, 0x32b03475u, 0x2a6c2cabu, 0x26f02bedu, 0x2f9db43bu, 0x1fab3027u, 0x2e9f331du, 0xb49cad46u,
+ 0xadb1b43bu, 0xb265a5d2u, 0xb12eb1f7u, 0x31353163u, 0xb24aac7fu, 0xb3edb378u, 0xb45dac2du, 0x346eae63u,
+ 0x32321c9cu, 0xb074b31eu, 0xac793449u, 0x34922db3u, 0xaf7a27cdu, 0xb44724e8u, 0xb4bbb371u, 0x2e943360u,
+ 0xb15bafbeu, 0x34072f08u, 0xaedaafb7u, 0x338e32cdu, 0xaff133f3u, 0xa89cb34au, 0x335fafd8u, 0x2d24b473u,
+ 0x349333ccu, 0x34522dc4u, 0xad67b413u, 0x349fb4b7u, 0xb48833bbu, 0xad1cb4a0u, 0x32a9b0a3u, 0x3195b285u,
+ 0xacb5282fu, 0x34c7b2cdu, 0xa8a82c88u, 0x33caacf3u, 0x31fa322bu, 0x33732c76u, 0x2fd3afb5u, 0x2baeab57u,
+ 0x34a734c8u, 0x29ac1fc1u, 0x34a933a1u, 0xb42d3159u, 0x339daed0u, 0x30a73127u, 0xaa822e73u, 0x31eeab6eu,
+ 0x2f19afebu, 0x329e3058u, 0x30a5aeebu, 0xb2f3b38eu, 0xb445aa59u, 0x33b7344cu, 0x327fb455u, 0x2fc11d74u,
+ 0x29e9336eu, 0xb4c6b35eu, 0x34509791u, 0xace12994u, 0x33f8b481u, 0x2fddaea3u, 0xb0ba2f75u, 0xac6ab430u,
+ 0x2fb1af06u, 0xb1543224u, 0xb4692456u, 0x3107b17cu, 0x314f31a2u, 0x27d7b2deu, 0x34b8a9f0u, 0x342e2debu,
+ 0xa78cb2cfu, 0xad133137u, 0x1f312d17u, 0xb24334b9u, 0xab8f342eu, 0x2465a4ebu, 0x3473b081u, 0x348a33e7u,
+ 0x2e8534c5u, 0x32bca76cu, 0x326cb492u, 0x2ea796abu, 0xad0bacd6u, 0xa9dab47fu, 0xb149ab0au, 0x2a76b0dau,
+ 0x3257aeffu, 0x29f7b376u, 0x2456b287u, 0x2d772809u, 0x1dd2a993u, 0xa5aeb220u, 0x2e61ace0u, 0x315da901u,
+ 0xb113a9afu, 0xb45eb058u, 0x325a2622u, 0xa8c7312bu, 0x32fbb1d6u, 0xabbdb441u, 0xac1a32d9u, 0xb07eb356u,
+ 0xb2ae2bafu, 0xa8c02df3u, 0xb0f9ae5cu, 0xabef3235u, 0xa8953094u, 0xa892b3d6u, 0xa67a1cb9u, 0x33e33212u,
+ 0x2a5c3182u, 0x2d8e265eu, 0x31fdb013u, 0xb07832dau, 0x32b233bcu, 0x33daa9cdu, 0xb488b34au, 0x2907add8u,
+ 0x343eadb4u, 0xb033b0b6u, 0xb4c32d2du, 0xacb5305eu, 0x2f5a3496u, 0xb3d2b44cu, 0x3458b3e8u, 0xa98f2f11u,
+ 0x332db1c0u, 0x2ff0b1bbu, 0xafe3310eu, 0xb1521a0au, 0x3477aca6u, 0x32a82f6bu, 0xb103263cu, 0x0d953016u,
+ 0xaabab234u, 0x30852d2eu, 0x3338b0eeu, 0xb065aca0u, 0x31d7b1dbu, 0xa11aad9au, 0x2f25b152u, 0xae2b3165u,
+ 0xb4823446u, 0x2ad2b30cu, 0xb0c92fbcu, 0xaef92827u, 0x2e1e2827u, 0xa6ce348au, 0x29fe2db1u, 0xb0beaf02u,
+ 0xb1a134afu, 0xb2c93193u,
};
-// 256 uint16 values (raw f16 bits)
-static const uint16_t kCnnV3ExpectedEnc0U16[256] = {
- 0x3c3fu, 0x0000u, 0x2aeeu, 0x3cdfu, 0x0000u, 0x0000u, 0x3a34u, 0x0000u,
- 0x33e1u, 0x251du, 0x29e7u, 0x3dd0u, 0x0000u, 0x3996u, 0x2e7du, 0x3847u,
- 0x259bu, 0x29a6u, 0x3a17u, 0x0000u, 0x3022u, 0x0000u, 0x3c4bu, 0x3c15u,
- 0x0000u, 0x0000u, 0x38e0u, 0x3a98u, 0x0000u, 0x37dbu, 0x0000u, 0x0000u,
- 0x0000u, 0x0000u, 0x0000u, 0x4027u, 0x0000u, 0x393cu, 0x0000u, 0x3c3bu,
- 0x0000u, 0x31c4u, 0x3918u, 0x3f6fu, 0x0000u, 0x0000u, 0x0000u, 0x3c35u,
- 0x0000u, 0x0000u, 0x0000u, 0x403eu, 0x0000u, 0x32b6u, 0x0000u, 0x4008u,
- 0x3440u, 0x0000u, 0x0000u, 0x4003u, 0x0000u, 0x0000u, 0x0000u, 0x3d6bu,
- 0x0000u, 0x0000u, 0x0000u, 0x4115u, 0x0000u, 0x0000u, 0x0000u, 0x3bcdu,
- 0x30acu, 0x301eu, 0x3a8eu, 0x40e1u, 0x0000u, 0x0000u, 0x2dc0u, 0x401au,
- 0x0000u, 0x0000u, 0x3638u, 0x3df2u, 0x0000u, 0x3c65u, 0x0000u, 0x3feau,
- 0x2d79u, 0x0000u, 0x2e52u, 0x3f56u, 0x0000u, 0x0000u, 0x0000u, 0x3e3fu,
- 0x34d0u, 0x0000u, 0x0000u, 0x3c46u, 0x38b0u, 0x3324u, 0x0000u, 0x4018u,
- 0x0000u, 0x3385u, 0x0000u, 0x408du, 0x31ddu, 0x3585u, 0x40bau, 0x4009u,
- 0x0000u, 0x2fd2u, 0x0000u, 0x4147u, 0x3baau, 0x0000u, 0x0000u, 0x3c42u,
- 0x0000u, 0x0000u, 0x3378u, 0x3fc6u, 0x30cbu, 0x0000u, 0x3978u, 0x3440u,
- 0x0000u, 0x0000u, 0x0000u, 0x38eeu, 0x0000u, 0x0000u, 0x0000u, 0x4117u,
- 0x0000u, 0x0000u, 0x0000u, 0x4089u, 0x0000u, 0x3647u, 0x0000u, 0x43cfu,
- 0x3752u, 0x2d2bu, 0x0000u, 0x3c2bu, 0x0000u, 0x3615u, 0x39cau, 0x0000u,
- 0x0000u, 0x0000u, 0x0000u, 0x3e2du, 0x0000u, 0x0000u, 0x0000u, 0x3e18u,
- 0x0000u, 0x0000u, 0x0000u, 0x3d99u, 0x2ca5u, 0x0000u, 0x0000u, 0x3d64u,
- 0x0000u, 0x2b7fu, 0x0000u, 0x3f9eu, 0x0000u, 0x0000u, 0x0000u, 0x4133u,
- 0x0000u, 0x0000u, 0x0000u, 0x3fc4u, 0x0000u, 0x0000u, 0x0000u, 0x3c91u,
- 0x0000u, 0x2a5du, 0x0000u, 0x4166u, 0x0000u, 0x0000u, 0x0000u, 0x4089u,
- 0x3165u, 0x0000u, 0x0000u, 0x3f6eu, 0x0000u, 0x0000u, 0x358du, 0x417fu,
- 0x0000u, 0x356cu, 0x0000u, 0x4243u, 0x3c04u, 0x0000u, 0x0000u, 0x406bu,
- 0x0000u, 0x315bu, 0x0000u, 0x40b7u, 0x0000u, 0x34beu, 0x0000u, 0x4108u,
- 0x0000u, 0x390au, 0x2607u, 0x408fu, 0x0000u, 0x0000u, 0x0000u, 0x3b05u,
- 0x3407u, 0x0000u, 0x0000u, 0x3d13u, 0x0000u, 0x33b5u, 0x0000u, 0x3dafu,
- 0x0000u, 0x0000u, 0x0000u, 0x3d80u, 0x0000u, 0x2f2fu, 0x0000u, 0x3d4cu,
- 0x0000u, 0x0000u, 0x0000u, 0x416eu, 0x0000u, 0x0000u, 0x0000u, 0x402au,
- 0x0000u, 0x3b06u, 0x0000u, 0x3f77u, 0x0000u, 0x37fbu, 0x0000u, 0x4060u,
+// enc0: 8ch rgba32uint → W*H*8 f16 values
+// 512 uint16 values (raw f16 bits)
+static const uint16_t kCnnV3ExpectedEnc0U16[512] = {
+ 0x0000u, 0x0000u, 0x35f8u, 0x3d38u, 0x0000u, 0x3482u, 0x3b5fu, 0x395fu,
+ 0x0000u, 0x0000u, 0x3c07u, 0x3e43u, 0x0000u, 0x3596u, 0x3a1au, 0x0000u,
+ 0x0000u, 0x0000u, 0x3833u, 0x3683u, 0x0000u, 0x0000u, 0x3a89u, 0x0000u,
+ 0x0000u, 0x0000u, 0x3652u, 0x0000u, 0x0000u, 0x2d9eu, 0x396fu, 0x3b94u,
+ 0x0000u, 0x0000u, 0x38cau, 0x0000u, 0x0000u, 0x3778u, 0x39bdu, 0x3d2cu,
+ 0x0000u, 0x0000u, 0x3499u, 0x3d27u, 0x0000u, 0x3d28u, 0x3a27u, 0x34b7u,
+ 0x2f46u, 0x0000u, 0x288bu, 0x3d16u, 0x0000u, 0x346bu, 0x39eau, 0x380au,
+ 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x3847u, 0x3b39u, 0x391du,
+ 0x0000u, 0x0000u, 0x0000u, 0x3db2u, 0x0000u, 0x32b1u, 0x0000u, 0x3c0eu,
+ 0x0000u, 0x0000u, 0x3605u, 0x41ccu, 0x0000u, 0x387bu, 0x0000u, 0x3a0cu,
+ 0x0000u, 0x0000u, 0x0000u, 0x4052u, 0x0000u, 0x0000u, 0x0000u, 0x38aeu,
+ 0x0000u, 0x3c4eu, 0x0000u, 0x4098u, 0x0000u, 0x0000u, 0x3819u, 0x3808u,
+ 0x0000u, 0x2e82u, 0x0000u, 0x3a36u, 0x0000u, 0x28d2u, 0x0000u, 0x3cf2u,
+ 0x0000u, 0x338fu, 0x2e75u, 0x3e0bu, 0x0000u, 0x0000u, 0x0000u, 0x0000u,
+ 0x0000u, 0x3a58u, 0x0000u, 0x3ec8u, 0x0000u, 0x3f9eu, 0x0000u, 0x3c8du,
+ 0x0000u, 0x0000u, 0x3020u, 0x3a66u, 0x0000u, 0x0000u, 0x361du, 0x0000u,
+ 0x0000u, 0x0000u, 0x3461u, 0x3f2eu, 0x0000u, 0x0000u, 0x0000u, 0x3c54u,
+ 0x0000u, 0x0000u, 0x0000u, 0x4086u, 0x0000u, 0x394cu, 0x0000u, 0x3945u,
+ 0x0000u, 0x0000u, 0x0000u, 0x3e53u, 0x0000u, 0x30a6u, 0x0000u, 0x3df7u,
+ 0x3985u, 0x2ca1u, 0x0000u, 0x41b9u, 0x0000u, 0x0000u, 0x3c0bu, 0x3a7cu,
+ 0x360eu, 0x0000u, 0x0000u, 0x3defu, 0x0000u, 0x38dcu, 0x0000u, 0x3821u,
+ 0x0000u, 0x2ceeu, 0x0000u, 0x3db1u, 0x0000u, 0x3936u, 0x0000u, 0x3963u,
+ 0x2bffu, 0x0000u, 0x0000u, 0x3c6au, 0x0000u, 0x3351u, 0x0000u, 0x36ccu,
+ 0x0000u, 0x0000u, 0x0000u, 0x2df7u, 0x0000u, 0x3a2fu, 0x392eu, 0x0000u,
+ 0x0000u, 0x0000u, 0x0000u, 0x393eu, 0x0000u, 0x0000u, 0x0000u, 0x2f2du,
+ 0x0000u, 0x2dc9u, 0x3838u, 0x3a4au, 0x0000u, 0x33fau, 0x0000u, 0x3d8bu,
+ 0x0000u, 0x0000u, 0x0000u, 0x3fdbu, 0x0000u, 0x3d14u, 0x3415u, 0x3cdau,
+ 0x365bu, 0x0000u, 0x0000u, 0x3c96u, 0x0000u, 0x3981u, 0x3540u, 0x3773u,
+ 0x0000u, 0x3860u, 0x0000u, 0x38fbu, 0x0000u, 0x358cu, 0x38cau, 0x3c02u,
+ 0x0000u, 0x0000u, 0x0000u, 0x4072u, 0x0000u, 0x349bu, 0x363fu, 0x3d97u,
+ 0x0000u, 0x0000u, 0x0000u, 0x3e0eu, 0x0000u, 0x3838u, 0x3153u, 0x4168u,
+ 0x392du, 0x0000u, 0x2e0du, 0x3332u, 0x0000u, 0x3580u, 0x3c0au, 0x0000u,
+ 0x3084u, 0x0000u, 0x0000u, 0x3e45u, 0x0000u, 0x3960u, 0x0000u, 0x3906u,
+ 0x0000u, 0x0000u, 0x0000u, 0x404cu, 0x0000u, 0x0000u, 0x0000u, 0x3c9eu,
+ 0x36d6u, 0x3c01u, 0x0000u, 0x4098u, 0x0000u, 0x2d2du, 0x384bu, 0x32a5u,
+ 0x0000u, 0x0000u, 0x0000u, 0x3ad5u, 0x0000u, 0x3b5au, 0x38b8u, 0x2eeeu,
+ 0x0000u, 0x3829u, 0x0000u, 0x4026u, 0x0000u, 0x0000u, 0x3c89u, 0x324cu,
+ 0x0000u, 0x0000u, 0x3537u, 0x3b1au, 0x0000u, 0x3a4du, 0x36e4u, 0x3dd5u,
+ 0x351fu, 0x0000u, 0x343du, 0x3d90u, 0x0000u, 0x3d1cu, 0x0000u, 0x3c39u,
+ 0x0000u, 0x0000u, 0x0000u, 0x3479u, 0x0000u, 0x0000u, 0x3924u, 0x3b23u,
+ 0x0000u, 0x0000u, 0x3814u, 0x3b8eu, 0x0000u, 0x0000u, 0x0000u, 0x3caeu,
+ 0x0000u, 0x2b1au, 0x0000u, 0x3e9eu, 0x0000u, 0x3cdeu, 0x39d5u, 0x3c24u,
+ 0x2917u, 0x0000u, 0x0000u, 0x401bu, 0x0000u, 0x3891u, 0x3d51u, 0x0000u,
+ 0x0000u, 0x0000u, 0x0000u, 0x381du, 0x0000u, 0x3bd6u, 0x303bu, 0x3c46u,
+ 0x0000u, 0x0000u, 0x0000u, 0x3e2cu, 0x0000u, 0x0000u, 0x0000u, 0x3a03u,
+ 0x0000u, 0x24d3u, 0x0000u, 0x3feau, 0x0000u, 0x0000u, 0x0000u, 0x37cbu,
+ 0x0000u, 0x2dacu, 0x0000u, 0x40b0u, 0x0000u, 0x3a84u, 0x3bb0u, 0x3c68u,
+ 0x0000u, 0x0000u, 0x3600u, 0x3871u, 0x0000u, 0x0000u, 0x0000u, 0x377eu,
+ 0x0000u, 0x0000u, 0x0000u, 0x3c08u, 0x0000u, 0x0000u, 0x390au, 0x338bu,
+ 0x39ddu, 0x0000u, 0x0000u, 0x4047u, 0x0000u, 0x3d25u, 0x0000u, 0x3e6bu,
+ 0x0000u, 0x2f9cu, 0x331fu, 0x4008u, 0x0000u, 0x3dacu, 0x0000u, 0x3baeu,
+ 0x0000u, 0x0000u, 0x0000u, 0x37c3u, 0x0000u, 0x0000u, 0x392au, 0x3a02u,
+ 0x0000u, 0x349fu, 0x3993u, 0x3c83u, 0x0000u, 0x407du, 0x0000u, 0x3865u,
+ 0x2524u, 0x0000u, 0x0000u, 0x4015u, 0x0000u, 0x0000u, 0x3615u, 0x4083u,
+ 0x0000u, 0x338cu, 0x38b5u, 0x3ba1u, 0x0000u, 0x3a01u, 0x3890u, 0x3e4fu,
+ 0x0000u, 0x2b96u, 0x0000u, 0x31fbu, 0x0000u, 0x0000u, 0x3a31u, 0x0000u,
+ 0x0000u, 0x0000u, 0x0000u, 0x3960u, 0x0000u, 0x3ce2u, 0x2f69u, 0x3ad2u,
+ 0x0000u, 0x31e5u, 0x0000u, 0x3f45u, 0x0000u, 0x401au, 0x0000u, 0x3a5bu,
+ 0x0000u, 0x0000u, 0x0000u, 0x3c62u, 0x0000u, 0x3fceu, 0x0000u, 0x0000u,
+ 0x0000u, 0x0000u, 0x0000u, 0x3c3bu, 0x0000u, 0x3f32u, 0x0000u, 0x0000u,
+ 0x39f1u, 0x0000u, 0x0000u, 0x3eabu, 0x0000u, 0x3ed2u, 0x3066u, 0x0000u,
+ 0x0000u, 0x3a6bu, 0x0000u, 0x3bc6u, 0x0000u, 0x41e9u, 0x0000u, 0x3c93u,
+ 0x0000u, 0x0000u, 0x0000u, 0x3f39u, 0x0000u, 0x41b8u, 0x0000u, 0x0000u,
+ 0x0000u, 0x0000u, 0x0000u, 0x3e10u, 0x0000u, 0x3310u, 0x2701u, 0x35d3u,
};
-// kCnnV3Dec1HW = (W/2) x (H/2) = 4 x 4
-// 64 uint16 values (raw f16 bits)
-static const uint16_t kCnnV3ExpectedDec1U16[64] = {
- 0x38dcu, 0x3d03u, 0x0000u, 0x39b0u, 0x3965u, 0x3dd1u, 0x30fdu, 0x3adau,
- 0x387au, 0x3c79u, 0x3114u, 0x3c0eu, 0x0000u, 0x3a66u, 0x2ed6u, 0x3816u,
- 0x3a16u, 0x3dbau, 0x0000u, 0x3a4du, 0x3cf6u, 0x3fccu, 0x0000u, 0x3c1cu,
- 0x367bu, 0x3f06u, 0x0000u, 0x3b5cu, 0x0000u, 0x39ecu, 0x3660u, 0x3781u,
- 0x3936u, 0x3accu, 0x0000u, 0x38dbu, 0x3d0fu, 0x3e45u, 0x0000u, 0x38bau,
- 0x3905u, 0x3b8eu, 0x265du, 0x3c1eu, 0x0000u, 0x3881u, 0x2c6cu, 0x0000u,
- 0x3905u, 0x3c23u, 0x0000u, 0x3271u, 0x3837u, 0x35e1u, 0x0000u, 0x0000u,
- 0x3961u, 0x3c10u, 0x0000u, 0x0000u, 0x3594u, 0x3af9u, 0x382cu, 0x0000u,
+// dec1: 8ch rgba32uint half-res → (W/2)*(H/2)*8 f16 values
+// 128 uint16 values (raw f16 bits)
+static const uint16_t kCnnV3ExpectedDec1U16[128] = {
+ 0x0000u, 0x0000u, 0x3c85u, 0x0000u, 0x0000u, 0x0000u, 0x3d0eu, 0x3346u,
+ 0x0000u, 0x0000u, 0x386du, 0x0000u, 0x0000u, 0x3473u, 0x4075u, 0x0000u,
+ 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x415fu, 0x0000u,
+ 0x0000u, 0x0000u, 0x387eu, 0x0000u, 0x0000u, 0x0000u, 0x4021u, 0x0000u,
+ 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x3da5u, 0x3636u,
+ 0x0000u, 0x0000u, 0x3699u, 0x0000u, 0x0000u, 0x0000u, 0x40c6u, 0x0000u,
+ 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x42a2u, 0x0000u,
+ 0x0000u, 0x0000u, 0x35d0u, 0x0000u, 0x0000u, 0x2cc3u, 0x411fu, 0x0000u,
+ 0x33a6u, 0x0000u, 0x0000u, 0x3926u, 0x0000u, 0x38b5u, 0x3fb4u, 0x2ca8u,
+ 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x40aeu, 0x30b4u,
+ 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x3f72u, 0x0000u,
+ 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x0000u, 0x3cf4u, 0x0000u,
+ 0x0000u, 0x0000u, 0x0000u, 0x38aau, 0x0000u, 0x393bu, 0x3f32u, 0x3621u,
+ 0x0000u, 0x0000u, 0x0000u, 0x2b74u, 0x3a04u, 0x0000u, 0x3a33u, 0x3780u,
+ 0x0000u, 0x0000u, 0x0000u, 0x2f55u, 0x0000u, 0x35a3u, 0x36ccu, 0x32f2u,
+ 0x0000u, 0x0000u, 0x0000u, 0x2e9eu, 0x31acu, 0x0000u, 0x2e94u, 0x0000u,
};
// 256 uint16 values (raw f16 bits)
static const uint16_t kCnnV3ExpectedOutputU16[256] = {
- 0x3988u, 0x391du, 0x3800u, 0x390au, 0x3800u, 0x39e6u, 0x3800u, 0x3836u,
- 0x3959u, 0x39e8u, 0x3800u, 0x3817u, 0x38c4u, 0x39cbu, 0x3800u, 0x392au,
- 0x3837u, 0x3961u, 0x3800u, 0x3884u, 0x38a4u, 0x391fu, 0x3800u, 0x3800u,
- 0x3943u, 0x38e9u, 0x3800u, 0x3800u, 0x3920u, 0x397fu, 0x3800u, 0x3800u,
- 0x3a53u, 0x3800u, 0x3800u, 0x39deu, 0x393cu, 0x3956u, 0x3800u, 0x3b15u,
- 0x3960u, 0x383cu, 0x3800u, 0x3aa5u, 0x38b9u, 0x3966u, 0x3800u, 0x3a4bu,
- 0x38eau, 0x392au, 0x3800u, 0x3b2fu, 0x38c2u, 0x3800u, 0x3800u, 0x3aafu,
- 0x3a59u, 0x3879u, 0x3800u, 0x3a5bu, 0x3924u, 0x3933u, 0x3800u, 0x38c0u,
- 0x393bu, 0x3800u, 0x3800u, 0x3a0bu, 0x38ecu, 0x385cu, 0x3800u, 0x3b25u,
- 0x3968u, 0x384bu, 0x3800u, 0x39dbu, 0x3800u, 0x3972u, 0x3800u, 0x3b7cu,
- 0x38b9u, 0x3800u, 0x3800u, 0x3b3fu, 0x388eu, 0x3898u, 0x3800u, 0x39d2u,
- 0x38fau, 0x3800u, 0x3800u, 0x391eu, 0x3872u, 0x3966u, 0x3800u, 0x38c1u,
- 0x38c5u, 0x3800u, 0x3800u, 0x3a4au, 0x3a61u, 0x3800u, 0x3800u, 0x3b9cu,
- 0x38edu, 0x3800u, 0x3800u, 0x3b9du, 0x3844u, 0x38a2u, 0x3800u, 0x3b5au,
- 0x3800u, 0x38edu, 0x3800u, 0x3a57u, 0x3800u, 0x3828u, 0x3800u, 0x3ad7u,
- 0x3810u, 0x3800u, 0x3800u, 0x3aa6u, 0x38ceu, 0x38e7u, 0x3800u, 0x3800u,
- 0x3921u, 0x3800u, 0x3800u, 0x3a61u, 0x3a11u, 0x3800u, 0x3800u, 0x3b23u,
- 0x3994u, 0x3800u, 0x3800u, 0x3b95u, 0x3995u, 0x3800u, 0x3800u, 0x3b83u,
- 0x38c6u, 0x3a05u, 0x3800u, 0x3b7cu, 0x3887u, 0x385au, 0x3800u, 0x3b0bu,
- 0x38efu, 0x3800u, 0x3800u, 0x398eu, 0x39edu, 0x38d8u, 0x3800u, 0x381bu,
- 0x3932u, 0x3800u, 0x3800u, 0x3a29u, 0x3992u, 0x3800u, 0x3800u, 0x3ac4u,
- 0x394du, 0x3800u, 0x3800u, 0x3b3bu, 0x384bu, 0x3800u, 0x3800u, 0x3b07u,
- 0x3991u, 0x384cu, 0x3800u, 0x3b38u, 0x392eu, 0x3834u, 0x3800u, 0x3ab9u,
- 0x397fu, 0x3800u, 0x3800u, 0x3948u, 0x38d1u, 0x3800u, 0x3800u, 0x3825u,
- 0x3938u, 0x3800u, 0x3800u, 0x39a1u, 0x3991u, 0x3800u, 0x3800u, 0x3ac0u,
- 0x3998u, 0x3800u, 0x3800u, 0x3adfu, 0x3973u, 0x3800u, 0x3800u, 0x3b7bu,
- 0x39fdu, 0x3800u, 0x3800u, 0x3b0du, 0x3991u, 0x3800u, 0x3800u, 0x3a5du,
- 0x38b6u, 0x3800u, 0x3800u, 0x39cau, 0x38acu, 0x3840u, 0x3800u, 0x3825u,
- 0x3813u, 0x3800u, 0x3800u, 0x398fu, 0x3800u, 0x3800u, 0x3800u, 0x3a33u,
- 0x3800u, 0x3800u, 0x3800u, 0x398eu, 0x3845u, 0x3800u, 0x3800u, 0x3a2du,
- 0x384fu, 0x3800u, 0x3800u, 0x3a2eu, 0x3800u, 0x3800u, 0x3800u, 0x3a3fu,
- 0x3834u, 0x3800u, 0x3800u, 0x39ebu, 0x387eu, 0x3839u, 0x393au, 0x3989u,
+ 0x3800u, 0x3800u, 0x3a22u, 0x3800u, 0x3800u, 0x3800u, 0x3b3du, 0x391fu,
+ 0x3800u, 0x3800u, 0x3b8au, 0x3979u, 0x3800u, 0x3800u, 0x3b53u, 0x38ceu,
+ 0x3800u, 0x3800u, 0x3af2u, 0x38a3u, 0x3800u, 0x3800u, 0x3b4au, 0x389du,
+ 0x3800u, 0x3898u, 0x3addu, 0x3800u, 0x3800u, 0x3800u, 0x3accu, 0x3800u,
+ 0x3800u, 0x3800u, 0x3b36u, 0x3800u, 0x3800u, 0x3977u, 0x3b7du, 0x39a3u,
+ 0x3800u, 0x3800u, 0x3b90u, 0x3971u, 0x3800u, 0x3800u, 0x3bceu, 0x38e1u,
+ 0x3800u, 0x3800u, 0x3be5u, 0x3925u, 0x3800u, 0x3800u, 0x3ac5u, 0x38bbu,
+ 0x3800u, 0x3925u, 0x3b70u, 0x3a5fu, 0x3800u, 0x3800u, 0x3aeeu, 0x3800u,
+ 0x3800u, 0x3800u, 0x3afeu, 0x3800u, 0x3800u, 0x3952u, 0x3bb0u, 0x39fbu,
+ 0x3800u, 0x3800u, 0x3b64u, 0x3800u, 0x3800u, 0x39e9u, 0x3bb4u, 0x38c8u,
+ 0x3800u, 0x3852u, 0x3bd6u, 0x396au, 0x3800u, 0x3808u, 0x3bdfu, 0x3800u,
+ 0x3800u, 0x3826u, 0x3bc8u, 0x3928u, 0x3800u, 0x3800u, 0x3b10u, 0x38e7u,
+ 0x3800u, 0x3800u, 0x3baau, 0x3800u, 0x3800u, 0x3800u, 0x3b11u, 0x3a64u,
+ 0x3800u, 0x3800u, 0x3bccu, 0x3b12u, 0x3800u, 0x3800u, 0x3bd3u, 0x3a4fu,
+ 0x3800u, 0x3800u, 0x3b8bu, 0x3923u, 0x3800u, 0x3800u, 0x3b90u, 0x3927u,
+ 0x3800u, 0x3800u, 0x3ba9u, 0x38d6u, 0x3800u, 0x3800u, 0x3b40u, 0x3800u,
+ 0x3800u, 0x3800u, 0x3b85u, 0x3800u, 0x3800u, 0x3800u, 0x3bd1u, 0x3800u,
+ 0x3800u, 0x3882u, 0x3bc3u, 0x3800u, 0x3800u, 0x3800u, 0x3b2cu, 0x3a92u,
+ 0x3800u, 0x3800u, 0x3baeu, 0x3816u, 0x3800u, 0x3800u, 0x3bb0u, 0x3a63u,
+ 0x3800u, 0x3800u, 0x3b2bu, 0x3acdu, 0x3800u, 0x3800u, 0x3ae6u, 0x393eu,
+ 0x3800u, 0x3800u, 0x3b7bu, 0x3800u, 0x3800u, 0x3800u, 0x3bc9u, 0x3999u,
+ 0x3800u, 0x3800u, 0x3bbau, 0x3863u, 0x3800u, 0x3800u, 0x3bafu, 0x392du,
+ 0x3800u, 0x3800u, 0x3b65u, 0x3853u, 0x3800u, 0x3800u, 0x3b82u, 0x3800u,
+ 0x3800u, 0x3800u, 0x3b73u, 0x3953u, 0x3800u, 0x3800u, 0x3a01u, 0x3800u,
+ 0x3800u, 0x3800u, 0x3b96u, 0x3800u, 0x3800u, 0x3800u, 0x3b3du, 0x3800u,
+ 0x3800u, 0x3800u, 0x3b88u, 0x3b20u, 0x3800u, 0x3800u, 0x3ae2u, 0x395au,
+ 0x3803u, 0x38f7u, 0x3b09u, 0x3a3eu, 0x3800u, 0x3800u, 0x3b58u, 0x399cu,
+ 0x3800u, 0x3825u, 0x3a2fu, 0x3a2fu, 0x38bfu, 0x393eu, 0x38d7u, 0x39e9u,
+ 0x3870u, 0x3800u, 0x3b09u, 0x3800u, 0x3800u, 0x3800u, 0x3a92u, 0x3800u,
+ 0x3800u, 0x3a3au, 0x3800u, 0x3a01u, 0x38eau, 0x3945u, 0x3939u, 0x3800u,
+ 0x3800u, 0x3801u, 0x3856u, 0x3800u, 0x3800u, 0x3800u, 0x385du, 0x3829u,
+ 0x3800u, 0x384du, 0x3828u, 0x383au, 0x3800u, 0x380bu, 0x38beu, 0x3800u,
};
diff --git a/cnn_v3/tools/shaders.js b/cnn_v3/tools/shaders.js
index 36f53c8..6f2176d 100644
--- a/cnn_v3/tools/shaders.js
+++ b/cnn_v3/tools/shaders.js
@@ -1,10 +1,10 @@
'use strict';
// CNN v3 WGSL shaders — matches cnn_v3/shaders/*.wgsl exactly.
-// Weight offsets (f16 index): enc0=0, enc1=724, bn=1020, dec1=1604, dec0=2184, total=2476
-// BN is now Conv(8→8, 3×3, dilation=2): 8*8*9+8=584 weights (was 72 for 1×1)
+// Architecture: enc_channels=[8,16]
+// Weight offsets (f16 index): enc0=0, enc1=1448, bn=2616, dec1=4936, dec0=7248, total=7828
-const ENC0_OFF=0, ENC1_OFF=724, BN_OFF=1020, DEC1_OFF=1604, DEC0_OFF=2184;
-const TOTAL_F16=2476, TOTAL_U32=1238;
+const ENC0_OFF=0, ENC1_OFF=1448, BN_OFF=2616, DEC1_OFF=4936, DEC0_OFF=7248;
+const TOTAL_F16=7828, TOTAL_U32=3914;
// Inlined helpers — prepended to shaders that need them.
const H = `
@@ -41,20 +41,24 @@ fn main(@builtin(global_invocation_id) id:vec3u){
pack4x8unorm(vec4f(m2.g,m2.b,1.,tr)),0u));
}`;
-// Enc0: Conv(20→4, 3×3, zero-pad) + FiLM + ReLU → rgba16float
-// Params (48 bytes): weight_offset u32 _pad×3 gamma vec4f beta vec4f
+// Enc0: Conv(20→8, 3×3, zero-pad) + FiLM + ReLU → rgba32uint (pack2x16float, 8ch)
+// Params (80 bytes): wo u32 _pad×3 gl gh bl bh (vec4f×4)
const ENC0_SHADER=H+`
-struct P{wo:u32,_a:u32,_b:u32,_c:u32,g:vec4f,b:vec4f}
+struct P{wo:u32,_a:u32,_b:u32,_c:u32,gl:vec4f,gh:vec4f,bl:vec4f,bh:vec4f}
@group(0) @binding(0) var t0:texture_2d<u32>;
@group(0) @binding(1) var t1:texture_2d<u32>;
@group(0) @binding(2) var<storage,read> weights:array<u32>;
@group(0) @binding(3) var<uniform> p:P;
-@group(0) @binding(4) var out:texture_storage_2d<rgba16float,write>;
+@group(0) @binding(4) var out:texture_storage_2d<rgba32uint,write>;
+fn fg(o:u32)->f32{if(o<4u){return p.gl[o];}return p.gh[o-4u];}
+fn fb(o:u32)->f32{if(o<4u){return p.bl[o];}return p.bh[o-4u];}
fn feat(c:vec2i,d:vec2i)->array<f32,20>{
if(c.x<0||c.y<0||c.x>=d.x||c.y>=d.y){return array<f32,20>(0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.);}
- let a=unpack2x16float(textureLoad(t0,c,0).x); let b=unpack2x16float(textureLoad(t0,c,0).y);
- let cc=unpack2x16float(textureLoad(t0,c,0).z);let dd=unpack2x16float(textureLoad(t0,c,0).w);
- let e=unpack4x8unorm(textureLoad(t1,c,0).x); let f=unpack4x8unorm(textureLoad(t1,c,0).y);
+ let t=textureLoad(t0,c,0);
+ let a=unpack2x16float(t.x);let b=unpack2x16float(t.y);
+ let cc=unpack2x16float(t.z);let dd=unpack2x16float(t.w);
+ let e=unpack4x8unorm(textureLoad(t1,c,0).x);
+ let f=unpack4x8unorm(textureLoad(t1,c,0).y);
let g=unpack4x8unorm(textureLoad(t1,c,0).z);
return array<f32,20>(a.x,a.y,b.x,b.y,cc.x,cc.y,dd.x,dd.y,e.x,e.y,e.z,e.w,f.x,f.y,f.z,f.w,g.x,g.y,g.z,g.w);
}
@@ -62,41 +66,50 @@ fn feat(c:vec2i,d:vec2i)->array<f32,20>{
fn main(@builtin(global_invocation_id) id:vec3u){
let c=vec2i(id.xy); let d=vec2i(textureDimensions(t0));
if(c.x>=d.x||c.y>=d.y){return;}
- const IN:u32=20u; const OUT:u32=4u;
- var o:array<f32,4>;
+ const IN:u32=20u; const OUT:u32=8u;
+ var o:array<f32,8>;
for(var oc:u32=0u;oc<OUT;oc++){
var s=get_w(p.wo,OUT*IN*9u+oc);
for(var ky:i32=-1;ky<=1;ky++){for(var kx:i32=-1;kx<=1;kx++){
let ft=feat(c+vec2i(kx,ky),d); let ki=u32(ky+1)*3u+u32(kx+1);
for(var i:u32=0u;i<IN;i++){s+=get_w(p.wo,oc*IN*9u+i*9u+ki)*ft[i];}
}}
- o[oc]=max(0.,p.g[oc]*s+p.b[oc]);
+ o[oc]=max(0.,fg(oc)*s+fb(oc));
}
- textureStore(out,c,vec4f(o[0],o[1],o[2],o[3]));
+ textureStore(out,c,vec4u(pack2x16float(vec2f(o[0],o[1])),pack2x16float(vec2f(o[2],o[3])),
+ pack2x16float(vec2f(o[4],o[5])),pack2x16float(vec2f(o[6],o[7]))));
}`;
-// Enc1: AvgPool(enc0) + Conv(4→8, 3×3) + FiLM + ReLU → rgba32uint half-res
-// Params (80 bytes): wo u32 _pad×3 glo ghi blo bhi vec4f×4
+// Enc1: AvgPool(enc0) + Conv(8→16, 3×3) + FiLM + ReLU → 2× rgba32uint half-res (lo ch0-7, hi ch8-15)
+// Params (144 bytes): wo u32 _pad×3 g0 g1 g2 g3 b0 b1 b2 b3 (vec4f×8)
const ENC1_SHADER=H+`
-struct P{wo:u32,_a:u32,_b:u32,_c:u32,gl:vec4f,gh:vec4f,bl:vec4f,bh:vec4f}
-@group(0) @binding(0) var e0:texture_2d<f32>;
+struct P{wo:u32,_a:u32,_b:u32,_c:u32,g0:vec4f,g1:vec4f,g2:vec4f,g3:vec4f,b0:vec4f,b1:vec4f,b2:vec4f,b3:vec4f}
+@group(0) @binding(0) var e0:texture_2d<u32>;
@group(0) @binding(1) var<storage,read> weights:array<u32>;
@group(0) @binding(2) var<uniform> p:P;
-@group(0) @binding(3) var out:texture_storage_2d<rgba32uint,write>;
-fn fg(o:u32)->f32{if(o<4u){return p.gl[o];}return p.gh[o-4u];}
-fn fb(o:u32)->f32{if(o<4u){return p.bl[o];}return p.bh[o-4u];}
-fn avg(hc:vec2i,fd:vec2i)->array<f32,4>{
- let hd=fd/2; if(hc.x<0||hc.y<0||hc.x>=hd.x||hc.y>=hd.y){return array<f32,4>(0.,0.,0.,0.);}
- var s=vec4f(0.);
- for(var y:i32=0;y<2;y++){for(var x:i32=0;x<2;x++){s+=textureLoad(e0,clamp(hc*2+vec2i(x,y),vec2i(0),fd-vec2i(1)),0);}}
- let a=s*.25; return array<f32,4>(a.x,a.y,a.z,a.w);
+@group(0) @binding(3) var olo:texture_storage_2d<rgba32uint,write>;
+@group(0) @binding(4) var ohi:texture_storage_2d<rgba32uint,write>;
+fn fg(o:u32)->f32{
+ if(o<4u){return p.g0[o];}if(o<8u){return p.g1[o-4u];}
+ if(o<12u){return p.g2[o-8u];}return p.g3[o-12u];}
+fn fb(o:u32)->f32{
+ if(o<4u){return p.b0[o];}if(o<8u){return p.b1[o-4u];}
+ if(o<12u){return p.b2[o-8u];}return p.b3[o-12u];}
+fn avg(hc:vec2i,fd:vec2i)->array<f32,8>{
+ let hd=fd/2; if(hc.x<0||hc.y<0||hc.x>=hd.x||hc.y>=hd.y){return array<f32,8>(0.,0.,0.,0.,0.,0.,0.,0.);}
+ var s:array<f32,8>;
+ for(var y:i32=0;y<2;y++){for(var x:i32=0;x<2;x++){
+ let f=unpack8(e0,clamp(hc*2+vec2i(x,y),vec2i(0),fd-vec2i(1)));
+ for(var i:u32=0u;i<8u;i++){s[i]+=f[i];}
+ }}
+ for(var i:u32=0u;i<8u;i++){s[i]*=.25;} return s;
}
@compute @workgroup_size(8,8)
fn main(@builtin(global_invocation_id) id:vec3u){
let fd=vec2i(textureDimensions(e0)); let hd=fd/2; let c=vec2i(id.xy);
if(c.x>=hd.x||c.y>=hd.y){return;}
- const IN:u32=4u; const OUT:u32=8u;
- var o:array<f32,8>;
+ const IN:u32=8u; const OUT:u32=16u;
+ var o:array<f32,16>;
for(var oc:u32=0u;oc<OUT;oc++){
var s=get_w(p.wo,OUT*IN*9u+oc);
for(var ky:i32=-1;ky<=1;ky++){for(var kx:i32=-1;kx<=1;kx++){
@@ -105,97 +118,114 @@ fn main(@builtin(global_invocation_id) id:vec3u){
}}
o[oc]=max(0.,fg(oc)*s+fb(oc));
}
- textureStore(out,c,vec4u(pack2x16float(vec2f(o[0],o[1])),pack2x16float(vec2f(o[2],o[3])),
+ textureStore(olo,c,vec4u(pack2x16float(vec2f(o[0],o[1])),pack2x16float(vec2f(o[2],o[3])),
pack2x16float(vec2f(o[4],o[5])),pack2x16float(vec2f(o[6],o[7]))));
+ textureStore(ohi,c,vec4u(pack2x16float(vec2f(o[8],o[9])),pack2x16float(vec2f(o[10],o[11])),
+ pack2x16float(vec2f(o[12],o[13])),pack2x16float(vec2f(o[14],o[15]))));
}`;
-// Bottleneck: AvgPool(enc1) + Conv(8→8, 3×3, dilation=2) + ReLU → rgba32uint quarter-res (no FiLM)
+// Bottleneck: AvgPool(enc1) + Conv(16→16, 3×3, dil=2) + ReLU → 2× rgba32uint quarter-res (no FiLM)
// Params (16 bytes): wo u32 _pad×3
const BN_SHADER=H+`
struct P{wo:u32,_a:u32,_b:u32,_c:u32}
-@group(0) @binding(0) var e1:texture_2d<u32>;
-@group(0) @binding(1) var<storage,read> weights:array<u32>;
-@group(0) @binding(2) var<uniform> p:P;
-@group(0) @binding(3) var out:texture_storage_2d<rgba32uint,write>;
-fn avg(qc:vec2i,hd:vec2i)->array<f32,8>{
- let qd=hd/2; if(qc.x<0||qc.y<0||qc.x>=qd.x||qc.y>=qd.y){return array<f32,8>(0.,0.,0.,0.,0.,0.,0.,0.);}
- var s:array<f32,8>;
+@group(0) @binding(0) var elo:texture_2d<u32>;
+@group(0) @binding(1) var ehi:texture_2d<u32>;
+@group(0) @binding(2) var<storage,read> weights:array<u32>;
+@group(0) @binding(3) var<uniform> p:P;
+@group(0) @binding(4) var olo:texture_storage_2d<rgba32uint,write>;
+@group(0) @binding(5) var ohi:texture_storage_2d<rgba32uint,write>;
+fn avg(qc:vec2i,hd:vec2i)->array<f32,16>{
+ let qd=hd/2; if(qc.x<0||qc.y<0||qc.x>=qd.x||qc.y>=qd.y){return array<f32,16>(0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.);}
+ var s:array<f32,16>;
for(var y:i32=0;y<2;y++){for(var x:i32=0;x<2;x++){
- let f=unpack8(e1,clamp(qc*2+vec2i(x,y),vec2i(0),hd-vec2i(1)));
- for(var i:u32=0u;i<8u;i++){s[i]+=f[i];}
+ let c2=clamp(qc*2+vec2i(x,y),vec2i(0),hd-vec2i(1));
+ let lo=unpack8(elo,c2);let hi=unpack8(ehi,c2);
+ for(var i:u32=0u;i<8u;i++){s[i]+=lo[i];s[i+8u]+=hi[i];}
}}
- for(var i:u32=0u;i<8u;i++){s[i]*=.25;} return s;
+ for(var i:u32=0u;i<16u;i++){s[i]*=.25;} return s;
}
@compute @workgroup_size(8,8)
fn main(@builtin(global_invocation_id) id:vec3u){
- let hd=vec2i(textureDimensions(e1)); let qd=hd/2; let c=vec2i(id.xy);
+ let hd=vec2i(textureDimensions(elo)); let qd=hd/2; let c=vec2i(id.xy);
if(c.x>=qd.x||c.y>=qd.y){return;}
- var o:array<f32,8>;
- for(var oc:u32=0u;oc<8u;oc++){
- var s=get_w(p.wo,576u+oc);
+ var o:array<f32,16>;
+ for(var oc:u32=0u;oc<16u;oc++){
+ var s=get_w(p.wo,2304u+oc);
for(var ky:i32=-1;ky<=1;ky++){for(var kx:i32=-1;kx<=1;kx++){
let ft=avg(c+vec2i(kx,ky)*2,hd); let ki=u32(ky+1)*3u+u32(kx+1);
- for(var i:u32=0u;i<8u;i++){s+=get_w(p.wo,oc*72u+i*9u+ki)*ft[i];}
+ for(var i:u32=0u;i<16u;i++){s+=get_w(p.wo,oc*144u+i*9u+ki)*ft[i];}
}}
o[oc]=max(0.,s);
}
- textureStore(out,c,vec4u(pack2x16float(vec2f(o[0],o[1])),pack2x16float(vec2f(o[2],o[3])),
+ textureStore(olo,c,vec4u(pack2x16float(vec2f(o[0],o[1])),pack2x16float(vec2f(o[2],o[3])),
pack2x16float(vec2f(o[4],o[5])),pack2x16float(vec2f(o[6],o[7]))));
+ textureStore(ohi,c,vec4u(pack2x16float(vec2f(o[8],o[9])),pack2x16float(vec2f(o[10],o[11])),
+ pack2x16float(vec2f(o[12],o[13])),pack2x16float(vec2f(o[14],o[15]))));
}`;
-// Dec1: NearestUp(bn)+cat(enc1_skip) → Conv(16→4,3×3) + FiLM + ReLU → rgba16float half-res
-// Params (48 bytes): same layout as enc0
+// Dec1: NearestUp(bn_lo/hi)+cat(enc1_lo/hi) → Conv(32→8,3×3) + FiLM + ReLU → rgba32uint half-res
+// Params (80 bytes): wo u32 _pad×3 gl gh bl bh (vec4f×4)
const DEC1_SHADER=H+`
-struct P{wo:u32,_a:u32,_b:u32,_c:u32,g:vec4f,b:vec4f}
-@group(0) @binding(0) var bn:texture_2d<u32>;
-@group(0) @binding(1) var e1:texture_2d<u32>;
-@group(0) @binding(2) var<storage,read> weights:array<u32>;
-@group(0) @binding(3) var<uniform> p:P;
-@group(0) @binding(4) var out:texture_storage_2d<rgba16float,write>;
-fn cat(hc:vec2i,hd:vec2i)->array<f32,16>{
- if(hc.x<0||hc.y<0||hc.x>=hd.x||hc.y>=hd.y){return array<f32,16>(0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.);}
- let qd=hd/2; let b=unpack8(bn,clamp(hc/2,vec2i(0),qd-vec2i(1)));
- let s=unpack8(e1,hc);
- return array<f32,16>(b[0],b[1],b[2],b[3],b[4],b[5],b[6],b[7],s[0],s[1],s[2],s[3],s[4],s[5],s[6],s[7]);
+struct P{wo:u32,_a:u32,_b:u32,_c:u32,gl:vec4f,gh:vec4f,bl:vec4f,bh:vec4f}
+@group(0) @binding(0) var bnlo:texture_2d<u32>;
+@group(0) @binding(1) var bnhi:texture_2d<u32>;
+@group(0) @binding(2) var e1lo:texture_2d<u32>;
+@group(0) @binding(3) var e1hi:texture_2d<u32>;
+@group(0) @binding(4) var<storage,read> weights:array<u32>;
+@group(0) @binding(5) var<uniform> p:P;
+@group(0) @binding(6) var out:texture_storage_2d<rgba32uint,write>;
+fn fg(o:u32)->f32{if(o<4u){return p.gl[o];}return p.gh[o-4u];}
+fn fb(o:u32)->f32{if(o<4u){return p.bl[o];}return p.bh[o-4u];}
+fn cat(hc:vec2i,hd:vec2i)->array<f32,32>{
+ if(hc.x<0||hc.y<0||hc.x>=hd.x||hc.y>=hd.y){return array<f32,32>(0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.);}
+ let qd=hd/2; let q=clamp(hc/2,vec2i(0),qd-vec2i(1));
+ let b0=unpack8(bnlo,q);let b1=unpack8(bnhi,q);
+ let s0=unpack8(e1lo,hc);let s1=unpack8(e1hi,hc);
+ return array<f32,32>(b0[0],b0[1],b0[2],b0[3],b0[4],b0[5],b0[6],b0[7],
+ b1[0],b1[1],b1[2],b1[3],b1[4],b1[5],b1[6],b1[7],
+ s0[0],s0[1],s0[2],s0[3],s0[4],s0[5],s0[6],s0[7],
+ s1[0],s1[1],s1[2],s1[3],s1[4],s1[5],s1[6],s1[7]);
}
@compute @workgroup_size(8,8)
fn main(@builtin(global_invocation_id) id:vec3u){
- let hd=vec2i(textureDimensions(e1)); let c=vec2i(id.xy);
+ let hd=vec2i(textureDimensions(e1lo)); let c=vec2i(id.xy);
if(c.x>=hd.x||c.y>=hd.y){return;}
- const IN:u32=16u; const OUT:u32=4u;
- var o:array<f32,4>;
+ const IN:u32=32u; const OUT:u32=8u;
+ var o:array<f32,8>;
for(var oc:u32=0u;oc<OUT;oc++){
var s=get_w(p.wo,OUT*IN*9u+oc);
for(var ky:i32=-1;ky<=1;ky++){for(var kx:i32=-1;kx<=1;kx++){
let ft=cat(c+vec2i(kx,ky),hd); let ki=u32(ky+1)*3u+u32(kx+1);
for(var i:u32=0u;i<IN;i++){s+=get_w(p.wo,oc*IN*9u+i*9u+ki)*ft[i];}
}}
- o[oc]=max(0.,p.g[oc]*s+p.b[oc]);
+ o[oc]=max(0.,fg(oc)*s+fb(oc));
}
- textureStore(out,c,vec4f(o[0],o[1],o[2],o[3]));
+ textureStore(out,c,vec4u(pack2x16float(vec2f(o[0],o[1])),pack2x16float(vec2f(o[2],o[3])),
+ pack2x16float(vec2f(o[4],o[5])),pack2x16float(vec2f(o[6],o[7]))));
}`;
-// Dec0: NearestUp(dec1)+cat(enc0_skip) → Conv(8→4,3×3) + FiLM + ReLU + Sigmoid → rgba16float
-// Params (48 bytes): same layout as enc0
+// Dec0: NearestUp(dec1)+cat(enc0_skip) → Conv(16→4,3×3) + FiLM + ReLU + Sigmoid → rgba16float
+// Params (48 bytes): wo u32 _pad×3 g vec4f b vec4f
const DEC0_SHADER=H+`
struct P{wo:u32,_a:u32,_b:u32,_c:u32,g:vec4f,b:vec4f}
-@group(0) @binding(0) var d1:texture_2d<f32>;
-@group(0) @binding(1) var e0:texture_2d<f32>;
+@group(0) @binding(0) var d1:texture_2d<u32>;
+@group(0) @binding(1) var e0:texture_2d<u32>;
@group(0) @binding(2) var<storage,read> weights:array<u32>;
@group(0) @binding(3) var<uniform> p:P;
@group(0) @binding(4) var out:texture_storage_2d<rgba16float,write>;
-fn cat(c:vec2i,fd:vec2i)->array<f32,8>{
- if(c.x<0||c.y<0||c.x>=fd.x||c.y>=fd.y){return array<f32,8>(0.,0.,0.,0.,0.,0.,0.,0.);}
+fn cat(c:vec2i,fd:vec2i)->array<f32,16>{
+ if(c.x<0||c.y<0||c.x>=fd.x||c.y>=fd.y){return array<f32,16>(0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.,0.);}
let hd=vec2i(textureDimensions(d1));
- let a=textureLoad(d1,clamp(c/2,vec2i(0),hd-vec2i(1)),0);
- let b=textureLoad(e0,c,0);
- return array<f32,8>(a.x,a.y,a.z,a.w,b.x,b.y,b.z,b.w);
+ let a=unpack8(d1,clamp(c/2,vec2i(0),hd-vec2i(1)));
+ let b=unpack8(e0,c);
+ return array<f32,16>(a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7],
+ b[0],b[1],b[2],b[3],b[4],b[5],b[6],b[7]);
}
@compute @workgroup_size(8,8)
fn main(@builtin(global_invocation_id) id:vec3u){
let fd=vec2i(textureDimensions(e0)); let c=vec2i(id.xy);
if(c.x>=fd.x||c.y>=fd.y){return;}
- const IN:u32=8u; const OUT:u32=4u;
+ const IN:u32=16u; const OUT:u32=4u;
var o:array<f32,4>;
for(var oc:u32=0u;oc<OUT;oc++){
var s=get_w(p.wo,OUT*IN*9u+oc);
@@ -227,8 +257,6 @@ const DISP_SHADER=`
}`;
// Viz f32: show one channel of rgba16float layer
-// Uniform layout: ch(u32) _p(u32) ox(i32) oy(i32) — 16 bytes
-// ox/oy = texel offset (top-left of view); 0,0 for full-texture vignettes.
const VIZ_F32=`
struct Vu{ch:u32,_p:u32,ox:i32,oy:i32}
@group(0) @binding(0) var t:texture_2d<f32>;
@@ -265,8 +293,6 @@ struct Vu{ch:u32,_p:u32,ox:i32,oy:i32}
// Full G-buffer pack: assembles feat_tex0/feat_tex1 from individual G-buffer images.
// Bindings: albedo(0) normal(1) depth(2) matid(3) shadow(4) transp(5) f0(6) f1(7)
-// All source textures are rgba8unorm (browser-loaded images, R channel for depth/matid/shadow/transp).
-// Uses textureLoad() only (no sampler needed). Matches gbuf_pack.wgsl packing exactly.
const FULL_PACK_SHADER=`
@group(0) @binding(0) var albedo: texture_2d<f32>;
@group(0) @binding(1) var normal: texture_2d<f32>;
@@ -295,7 +321,7 @@ fn main(@builtin(global_invocation_id) id:vec3u){
if(c.x>=d.x||c.y>=d.y){return;}
let alb=textureLoad(albedo,c,0).rgb;
let nrm=textureLoad(normal,c,0).rg;
- let oct=nrm*2.-vec2f(1.); // [0,1] -> [-1,1]
+ let oct=nrm*2.-vec2f(1.);
let dv=ld(c,d);
let dzdx=(ld(c+vec2i(1,0),d)-ld(c-vec2i(1,0),d))*.5;
let dzdy=(ld(c+vec2i(0,1),d)-ld(c-vec2i(0,1),d))*.5;
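
Reviewer note on the shaders.js hunk above: the new offsets (enc0=0, enc1=1448, bn=2616, dec1=4936, dec0=7248, total=7828) can be cross-checked from the channel counts alone. The sketch below assumes every layer is a 3×3 convolution storing `in*out*9` weights followed by `out` biases, which is what the `get_w(p.wo,OUT*IN*9u+oc)` bias lookups suggest. `convCount` and `layerOffsets` are hypothetical helper names and are not part of the repository.

```javascript
// Hypothetical helpers (not in shaders.js): derive the f16 weight offsets of
// the five conv layers from the channel configuration.
// Assumption: each layer stores in*out*k*k weights followed by `out` biases.
function convCount(inCh, outCh, k = 3) {
  return inCh * outCh * k * k + outCh;
}

function layerOffsets(featCh, encChannels, outCh) {
  const [c0, c1] = encChannels;
  const layers = [
    ['enc0', featCh, c0],
    ['enc1', c0, c1],
    ['bn', c1, c1],
    ['dec1', c1 + c1, c0],    // upsampled bottleneck concatenated with enc1 skip
    ['dec0', c0 + c0, outCh], // upsampled dec1 concatenated with enc0 skip
  ];
  const table = {};
  let off = 0;
  for (const [name, inCh, oCh] of layers) {
    const cnt = convCount(inCh, oCh);
    table[name] = { off, cnt };
    off += cnt;
  }
  // Two f16 values are packed per u32 in the weights storage buffer.
  return { table, totalF16: off, totalU32: off / 2 };
}

const v3 = layerOffsets(20, [8, 16], 4);
console.log(v3.table, v3.totalF16, v3.totalU32);
```

With `encChannels=[4,8]` the same function yields the removed constants (724, 1020, 1604, 2184, total 2476), so both sides of the hunk are consistent with this layout.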
diff --git a/cnn_v3/tools/tester.js b/cnn_v3/tools/tester.js
index 69f358b..ebe888a 100644
--- a/cnn_v3/tools/tester.js
+++ b/cnn_v3/tools/tester.js
@@ -105,9 +105,9 @@ class CNNv3Tester {
const u32 = new Uint32Array(buf);
if (u32.length < TOTAL_U32) throw new Error(`Too small: ${u32.length} u32, need ${TOTAL_U32}`);
const layers = [
- {n:'enc0',off:ENC0_OFF,cnt:724},{n:'enc1',off:ENC1_OFF,cnt:296},
- {n:'bn', off:BN_OFF, cnt:584},{n:'dec1',off:DEC1_OFF,cnt:580},
- {n:'dec0',off:DEC0_OFF,cnt:292},
+ {n:'enc0',off:ENC0_OFF,cnt:1448},{n:'enc1',off:ENC1_OFF,cnt:1168},
+ {n:'bn', off:BN_OFF, cnt:2320},{n:'dec1',off:DEC1_OFF,cnt:2312},
+ {n:'dec0',off:DEC0_OFF,cnt:580},
];
let html=`<div style="margin-bottom:7px"><b>Size:</b> ${(buf.byteLength/1024).toFixed(1)} KB &nbsp; <b>Weights:</b> ${TOTAL_F16} f16</div>
<table><thead><tr><th>Layer</th><th>Offset</th><th>Count</th><th>Min</th><th>Max</th></tr></thead><tbody>`;
@@ -126,11 +126,11 @@ class CNNv3Tester {
parseFilm(buf) {
const f32=new Float32Array(buf);
- if (f32.length < 776) throw new Error(`FiLM too small: ${f32.length}`);
+ if (f32.length < 1320) throw new Error(`FiLM too small: ${f32.length}`);
let o=0;
- const l0w=f32.slice(o,o+=80), l0b=f32.slice(o,o+=16);
- const l1w=f32.slice(o,o+=640),l1b=f32.slice(o,o+=40);
- this.log(`FiLM MLP: L0(16×5) L1(40×16), ${f32.length} f32`);
+ const l0w=f32.slice(o,o+=80), l0b=f32.slice(o,o+=16);
+ const l1w=f32.slice(o,o+=1152),l1b=f32.slice(o,o+=72);
+ this.log(`FiLM MLP: L0(16×5) L1(72×16), ${f32.length} f32`);
return {l0w,l0b,l1w,l1b};
}
@@ -138,22 +138,24 @@ class CNNv3Tester {
const {l0w,l0b,l1w,l1b}=this.filmMlp;
const h=new Float32Array(16);
for(let j=0;j<16;j++){let s=l0b[j];for(let i=0;i<5;i++)s+=l0w[j*5+i]*cond[i];h[j]=Math.max(0,s);}
- const o=new Float32Array(40);
- for(let j=0;j<40;j++){let s=l1b[j];for(let i=0;i<16;i++)s+=l1w[j*16+i]*h[i];o[j]=s;}
+ const o=new Float32Array(72);
+ for(let j=0;j<72;j++){let s=l1b[j];for(let i=0;i<16;i++)s+=l1w[j*16+i]*h[i];o[j]=s;}
return o;
}
filmParams() {
- const I4=[1,1,1,1],Z4=[0,0,0,0],I8=[1,1,1,1,1,1,1,1],Z8=[0,0,0,0,0,0,0,0];
- if (!this.filmMlp) return {ge0:I4,be0:Z4,ge1:I8,be1:Z8,gd1:I4,bd1:Z4,gd0:I4,bd0:Z4};
+ const I4=Array(4).fill(1),Z4=Array(4).fill(0);
+ const I8=Array(8).fill(1),Z8=Array(8).fill(0);
+ const I16=Array(16).fill(1),Z16=Array(16).fill(0);
+ if (!this.filmMlp) return {ge0:I8,be0:Z8,ge1:I16,be1:Z16,gd1:I8,bd1:Z8,gd0:I4,bd0:Z4};
const v=document.getElementById.bind(document);
const cond=[v('sBP').value,v('sBN').value,v('sAI').value,v('sP0').value,v('sP1').value].map(Number);
const f=this.filmFwd(cond);
return {
- ge0:[...f.slice(0,4)], be0:[...f.slice(4,8)],
- ge1:[...f.slice(8,16)],be1:[...f.slice(16,24)],
- gd1:[...f.slice(24,28)],bd1:[...f.slice(28,32)],
- gd0:[...f.slice(32,36)],bd0:[...f.slice(36,40)],
+ ge0:[...f.slice(0,8)], be0:[...f.slice(8,16)],
+ ge1:[...f.slice(16,32)],be1:[...f.slice(32,48)],
+ gd1:[...f.slice(48,56)],bd1:[...f.slice(56,64)],
+ gd0:[...f.slice(64,68)],bd0:[...f.slice(68,72)],
};
}
@@ -177,6 +179,14 @@ class CNNv3Tester {
for(let i=0;i<4;i++)v.setFloat32(64+i*4,b[i+4],true);
return buf;
}
+ // Params16 (144 bytes): wo u32 _pad×3 gamma[16] beta[16] vec4f×8
+ u16(wo,g,b){
+ const buf=new ArrayBuffer(144),v=new DataView(buf);
+ v.setUint32(0,wo,true);
+ for(let i=0;i<16;i++)v.setFloat32(16+i*4,g[i],true);
+ for(let i=0;i<16;i++)v.setFloat32(80+i*4,b[i],true);
+ return buf;
+ }
// ParamsBN (16 bytes): wo u32 _pad×3
ubn(wo){const buf=new ArrayBuffer(16);new DataView(buf).setUint32(0,wo,true);return buf;}
@@ -330,8 +340,10 @@ class CNNv3Tester {
const mk=(fmt,tw,th)=>this.device.createTexture({size:[tw,th],format:fmt,
usage:GPUTextureUsage.STORAGE_BINDING|GPUTextureUsage.TEXTURE_BINDING|GPUTextureUsage.COPY_SRC});
const f0=mk('rgba32uint',w,h),f1=mk('rgba32uint',w,h);
- const e0=mk('rgba16float',w,h),e1=mk('rgba32uint',W2,H2);
- const bn=mk('rgba32uint',W4,H4),d1=mk('rgba16float',W2,H2),ot=mk('rgba16float',w,h);
+ const e0=mk('rgba32uint',w,h); // 8ch
+ const e1_lo=mk('rgba32uint',W2,H2),e1_hi=mk('rgba32uint',W2,H2); // 16ch split
+ const bn_lo=mk('rgba32uint',W4,H4),bn_hi=mk('rgba32uint',W4,H4); // 16ch split
+ const d1=mk('rgba32uint',W2,H2),ot=mk('rgba16float',w,h); // d1=8ch
// Weights GPU buffer (cached)
if(!this.weightsGPU){
@@ -346,11 +358,11 @@ class CNNv3Tester {
const b=this.device.createBuffer({size:data.byteLength,usage:GPUBufferUsage.UNIFORM|GPUBufferUsage.COPY_DST});
this.device.queue.writeBuffer(b,0,data); return b;
};
- const uE0=wu(this.u4(ENC0_OFF,fp.ge0,fp.be0));
- const uE1=wu(this.u8(ENC1_OFF,fp.ge1,fp.be1));
+ const uE0=wu(this.u8( ENC0_OFF,fp.ge0,fp.be0));
+ const uE1=wu(this.u16(ENC1_OFF,fp.ge1,fp.be1));
const uBN=wu(this.ubn(BN_OFF));
- const uD1=wu(this.u4(DEC1_OFF,fp.gd1,fp.bd1));
- const uD0=wu(this.u4(DEC0_OFF,fp.gd0,fp.bd0));
+ const uD1=wu(this.u8( DEC1_OFF,fp.gd1,fp.bd1));
+ const uD0=wu(this.u4( DEC0_OFF,fp.gd0,fp.bd0));
const dispData=new ArrayBuffer(16);
const dispView=new DataView(dispData);
@@ -366,9 +378,9 @@ class CNNv3Tester {
cp(this.getPack(), bg(this.getPack(), rv(this.inputTex),this.linearSampler,rv(f0),rv(f1)), ceil8(w),ceil8(h));
cp(this.getEnc0(), bg(this.getEnc0(), rv(f0),rv(f1),{buffer:wg},{buffer:uE0},rv(e0)), ceil8(w),ceil8(h));
- cp(this.getEnc1(), bg(this.getEnc1(), rv(e0),{buffer:wg},{buffer:uE1},rv(e1)), ceil8(W2),ceil8(H2));
- cp(this.getBN(), bg(this.getBN(), rv(e1),{buffer:wg},{buffer:uBN},rv(bn)), ceil8(W4),ceil8(H4));
- cp(this.getDec1(), bg(this.getDec1(), rv(bn),rv(e1),{buffer:wg},{buffer:uD1},rv(d1)), ceil8(W2),ceil8(H2));
+ cp(this.getEnc1(), bg(this.getEnc1(), rv(e0),{buffer:wg},{buffer:uE1},rv(e1_lo),rv(e1_hi)), ceil8(W2),ceil8(H2));
+ cp(this.getBN(), bg(this.getBN(), rv(e1_lo),rv(e1_hi),{buffer:wg},{buffer:uBN},rv(bn_lo),rv(bn_hi)), ceil8(W4),ceil8(H4));
+ cp(this.getDec1(), bg(this.getDec1(), rv(bn_lo),rv(bn_hi),rv(e1_lo),rv(e1_hi),{buffer:wg},{buffer:uD1},rv(d1)), ceil8(W2),ceil8(H2));
cp(this.getDec0(), bg(this.getDec0(), rv(d1),rv(e0),{buffer:wg},{buffer:uD0},rv(ot)), ceil8(w),ceil8(h));
const dbg=bg(this.getDisp(),rv(ot),rv(this.inputTex),{buffer:uDp});
@@ -387,7 +399,7 @@ class CNNv3Tester {
// Store for layer viz & redisplay
this.destroyLayerTex();
- this.layerTextures={feat0:f0,feat1:f1,enc0:e0,enc1:e1,bn,dec1:d1,dec0:ot};
+ this.layerTextures={feat0:f0,feat1:f1,enc0:e0,enc1:e1_lo,bn:bn_lo,dec1:d1,dec0:ot};
this.lastResult={ot,itex:this.inputTex,uDp,dispPL:this.getDisp(),w,h};
this.updateVizPanel();
this.refreshZoom();
@@ -442,10 +454,10 @@ class CNNv3Tester {
updateVizPanel() {
const DEFS=[
{id:'feat0', lbl:'Feat', t:'u32',nch:8, ch:['alb.r','alb.g','alb.b','nrm.x','nrm.y','depth','dgx','dgy']},
- {id:'enc0', lbl:'Enc0', t:'f32',nch:4, ch:['c0','c1','c2','c3']},
+ {id:'enc0', lbl:'Enc0', t:'u32',nch:8, ch:['c0','c1','c2','c3','c4','c5','c6','c7']},
{id:'enc1', lbl:'Enc1', t:'u32',nch:8, ch:['c0','c1','c2','c3','c4','c5','c6','c7']},
{id:'bn', lbl:'BN', t:'u32',nch:8, ch:['c0','c1','c2','c3','c4','c5','c6','c7']},
- {id:'dec1', lbl:'Dec1', t:'f32',nch:4, ch:['c0','c1','c2','c3']},
+ {id:'dec1', lbl:'Dec1', t:'u32',nch:8, ch:['c0','c1','c2','c3','c4','c5','c6','c7']},
{id:'dec0', lbl:'Dec0', t:'f32',nch:4, ch:['R','G','B','A']},
];
this.vizDefs=DEFS;
@@ -753,8 +765,10 @@ class CNNv3Tester {
const mk = (fmt, tw, th) => this.device.createTexture({size:[tw,th], format:fmt,
usage:GPUTextureUsage.STORAGE_BINDING|GPUTextureUsage.TEXTURE_BINDING|GPUTextureUsage.COPY_SRC});
- const e0=mk('rgba16float',w,h), e1=mk('rgba32uint',W2,H2);
- const bn=mk('rgba32uint',W4,H4), d1=mk('rgba16float',W2,H2), ot=mk('rgba16float',w,h);
+ const e0=mk('rgba32uint',w,h); // 8ch
+ const e1_lo=mk('rgba32uint',W2,H2),e1_hi=mk('rgba32uint',W2,H2); // 16ch split
+ const bn_lo=mk('rgba32uint',W4,H4),bn_hi=mk('rgba32uint',W4,H4); // 16ch split
+ const d1=mk('rgba32uint',W2,H2), ot=mk('rgba16float',w,h); // d1=8ch
if (!this.weightsGPU) {
this.weightsGPU = this.device.createBuffer({size:this.weightsBuffer.byteLength,
@@ -767,11 +781,11 @@ class CNNv3Tester {
const b = this.device.createBuffer({size:data.byteLength, usage:GPUBufferUsage.UNIFORM|GPUBufferUsage.COPY_DST});
this.device.queue.writeBuffer(b, 0, data); return b;
};
- const uE0=wu(this.u4(ENC0_OFF,fp.ge0,fp.be0));
- const uE1=wu(this.u8(ENC1_OFF,fp.ge1,fp.be1));
+ const uE0=wu(this.u8( ENC0_OFF,fp.ge0,fp.be0));
+ const uE1=wu(this.u16(ENC1_OFF,fp.ge1,fp.be1));
const uBN=wu(this.ubn(BN_OFF));
- const uD1=wu(this.u4(DEC1_OFF,fp.gd1,fp.bd1));
- const uD0=wu(this.u4(DEC0_OFF,fp.gd0,fp.bd0));
+ const uD1=wu(this.u8( DEC1_OFF,fp.gd1,fp.bd1));
+ const uD0=wu(this.u4( DEC0_OFF,fp.gd0,fp.bd0));
const dispData=new ArrayBuffer(16);
new DataView(dispData).setFloat32(4, this.blend, true);
const uDp=wu(dispData);
@@ -784,9 +798,9 @@ class CNNv3Tester {
const ceil8 = (n) => Math.ceil(n/8);
cp(this.getEnc0(), bg(this.getEnc0(), rv(f0),rv(f1),{buffer:wg},{buffer:uE0},rv(e0)), ceil8(w), ceil8(h));
- cp(this.getEnc1(), bg(this.getEnc1(), rv(e0),{buffer:wg},{buffer:uE1},rv(e1)), ceil8(W2), ceil8(H2));
- cp(this.getBN(), bg(this.getBN(), rv(e1),{buffer:wg},{buffer:uBN},rv(bn)), ceil8(W4), ceil8(H4));
- cp(this.getDec1(), bg(this.getDec1(), rv(bn),rv(e1),{buffer:wg},{buffer:uD1},rv(d1)), ceil8(W2), ceil8(H2));
+ cp(this.getEnc1(), bg(this.getEnc1(), rv(e0),{buffer:wg},{buffer:uE1},rv(e1_lo),rv(e1_hi)), ceil8(W2), ceil8(H2));
+ cp(this.getBN(), bg(this.getBN(), rv(e1_lo),rv(e1_hi),{buffer:wg},{buffer:uBN},rv(bn_lo),rv(bn_hi)), ceil8(W4), ceil8(H4));
+ cp(this.getDec1(), bg(this.getDec1(), rv(bn_lo),rv(bn_hi),rv(e1_lo),rv(e1_hi),{buffer:wg},{buffer:uD1},rv(d1)), ceil8(W2), ceil8(H2));
cp(this.getDec0(), bg(this.getDec0(), rv(d1),rv(e0),{buffer:wg},{buffer:uD0},rv(ot)), ceil8(w), ceil8(h));
const dbg = bg(this.getDisp(), rv(ot), rv(this.inputTex), {buffer:uDp});
@@ -807,7 +821,7 @@ class CNNv3Tester {
}
this.destroyLayerTex();
- this.layerTextures = {feat0:f0, feat1:f1, enc0:e0, enc1:e1, bn, dec1:d1, output:ot};
+ this.layerTextures = {feat0:f0, feat1:f1, enc0:e0, enc1:e1_lo, bn:bn_lo, dec1:d1, output:ot};
this.lastResult = {ot, itex:this.inputTex, uDp, dispPL:this.getDisp(), w, h};
this.updateVizPanel();
this.refreshZoom();
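
Reviewer note on the tester.js hunks above: the FiLM constants (72 outputs, 1320 floats, and the slice boundaries in `filmParams()`) follow from the per-layer channel counts. The sketch below assumes the MLP output is laid out as gamma then beta for each layer in the order enc0, enc1, dec1, dec0, and that the file stores L0 weights, L0 biases, L1 weights, L1 biases in that order, as `parseFilm()` reads them. `filmLayout` and `filmMlpSize` are hypothetical names used only for illustration.

```javascript
// Hypothetical helpers (not in tester.js): recompute the FiLM vector layout.
// Assumption: for each layer the MLP emits `n` gammas followed by `n` betas.
function filmLayout(layers) {
  const slices = {};
  let o = 0;
  for (const [name, n] of layers) {
    slices['g' + name] = [o, o + n]; o += n; // gamma range [start, end)
    slices['b' + name] = [o, o + n]; o += n; // beta range  [start, end)
  }
  return { slices, nOut: o };
}

// Number of f32 values in a two-layer MLP: W0, b0, W1, b1.
function filmMlpSize(nIn, nHidden, nOut) {
  return nHidden * nIn + nHidden + nOut * nHidden + nOut;
}

const layout = filmLayout([['e0', 8], ['e1', 16], ['d1', 8], ['d0', 4]]);
console.log(layout.slices, layout.nOut, filmMlpSize(5, 16, layout.nOut));
```

The resulting ranges match the `f.slice(...)` arguments in the new `filmParams()`, and the size matches the `f32.length < 1320` guard in `parseFilm()`.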
diff --git a/cnn_v3/tools/weights.js b/cnn_v3/tools/weights.js
index dde1ed4..2c7b31b 100644
--- a/cnn_v3/tools/weights.js
+++ b/cnn_v3/tools/weights.js
@@ -1,4 +1,4 @@
'use strict';
// Auto-generated by export_cnn_v3_weights.py --html — do not edit by hand.
-const CNN_V3_WEIGHTS_B64='ias6I32xLDG5Masbdq4qIz+xrLQcshe3Ja1drluwb7crtHi38DZ8OL02eTh4Oe44HDTzN381TpwQqDCpCiP2pjipZywPL7CjNipXJc2qwiraJoetwijzphmqfCimJRgsX6tvqeuie6cRqoMpBhvSpbUiWSMFIlqrzCnHDiSiE6zJpYshR6udJMAmdRSkqHMVq6v0J68o6SL3p/mroia3I0uqEKobrMSdOqY5LAmgACqWqjch/SpcopUq6iJkJpEs6CumqH+lYqvLqjUs0ip5oKAkdqq9CTcnfamjJgSp36TmLZAsQLHTotyn27QPs1a0L7RwtOGwerXMswizdDeQOPw4BDj+OB85ADUuNhU4r7QGsAG0b7PsrP6ynrUosfuzALX4tNi3yrbNtDu2RbhRtSy4/TTgNrE1DDV2NyE2ijIMNqIyMibho5KvrSA1olmnQy31LbSm1KZPLHgpYqytLEKsuqVeHq+sWTWQObez6ze1PLI0krIhOdgsdaEKN6C2EzOBOzQ2WLePNx8wMrTRMiW4J7LMOBcvrrq/LkWtdq1vLIIgdKqAJSAvxa0SKXsoR6/QnXosIarzLJwjxa/VKmclhabVKL4lTSo7LNopdaspqzIp3Z4oqU2pNyxpqQ6fWKvCq3cmtqksmaEgF6hfoI2fc6rSJ0SoxxpYoRGfvijcqTqmEiNpLHKa6yDtKqUsgaj5IVsnopQupcSsNabjKqkq4SuGqjWIeavyItemJhvpK2WsI6snKwKruqijqY+sZzM/NsGzjDWZOTU0oKnJNmMtI7DBMSyx4C9JOZs0UbTvMl+nd7OMJYa1z6jONRQwy7d/EQir56pzMUqvhCzONFIj8rJVJZOvU60gLiuwMyxgNSwh9rPcLpiuDazUMWGs/6iENDsw1bSUKHCvsLFKL4UtSqloIxUw57DtoHEtYayELMEiLCDlqpymv6nIqhWqIq+9sfOwH7JjtLG0HrJAsLety7RKtbm0ibSysym0ebE1r4cdSzRJNYw0LzWVN6s3ZThgOOA4m6uKLSqsuCuOM1sucrK7pp+qPLF2J8+sIR2BM5uaH7ARLzGtn6eSK90npywwK1aoZKV6rC0qyxREq6asNCmZJVGo2is9qZ2VhSpLmYosHiZ0o3WWqCJ/qSAoXiY7Kk0owJ+QK+SoR6vnptEV3apbqVglNizUoiWigqZupOKdCayWKW8r1imMqvYiyyu6pOYr8imLKd4pGyyAK0ekOagyp1Mo36/ur8q0wa+Zsrmx567Er3e0FbXLtKO0Fa/1suazq611rHWupzVoNuY12ze3OLk32DgVOac5N7a1s5G0SrVpsZO0VLXJsvGxVLQ1tB20YbWGsTawL7JDsMSwtTXBN0E3AzhuOGM4GDcVOYM5hrGVLaSmV6DHMwowdLO9Lw2wtiHHp3ms1SUyKiccY6Z6qWciQhnhqO8o1CY9rNclpKwpKoYjlyDzo/WrwiQdpcCrXiqmqh4qgZ0TJvynaCzGnUCsTyWfLL+nqSpMHYkkOaTpmian9iiRmL0q4SWtJd4n9KiXqQKmrhVtKAusHZ/KKJGqISYXJxUhXCkEmKGsVSorpK0sDKxUpygmHCA+KWqquyfzqoqosabFqi2hoCsZp7or5ilzKtsmrxmCo2is0ycBJjUsvKyzrD8o9KkGq+aqtSz4qMYlcKTuKLOo4yClIAshSKzLJIKoDqCYpxwshqwSingh55sqkSQrgqa+JH4qQSVRKBId0JUYrDWdciwRIderTKW9LGasCSyuKpKnrKriqomXayzqJn6oMh5OIUwr4R1EKOSkaqbBqnUqoJ0mK9KfXiugH8ckgyyIqsKr8ic8ojklvCYcq86nKiaoKpeooqusoQ+s3Si6oQQfvaCgK88qviqtq0ShcBhLJxKstaXCqgEYTSrrnM2goK9ft+m5NCcrHicvOjDPoWMztSiasJ4lU7U/N0A2rbSlLO426i/ps
y60Qrgtti61kLDcsEmzRqfYNbMbVzc/sGYnTq6LrjIsIzDgrZAqQa+/sEcpE5jkLrCuya6uqGqkabBbIWunbyzBroyw9JkIMDkp7K7NryUoiyrIKC2wqCtvrUIo8q2ArdOtkCrdrgKuRLFXMGwtiC8+r78wSrCGsIwwRy7MMPUvKbCZqmWwzSkqIdEwOyWYKQCxX621LHmwDCFhK6UwERn4LFcrOS5SrWSnEa7qqMUmnC9uin+u1jBzN6stMjaAsf24abC+MJuxVbGZNIKrZDiBsM+xbzVOLZKzrzQtOsW3rqkiMUC8PKnlwZW9TDEMpLem4DBfrRwvQ61PLnKg7q4KtiAy0TTtqWg0jTFzMn01kjGFtX4vvjVZuW0qJTPVtNIwxzNzuNMlF6uWudsZgLaQuBW0qLrNJLEweDDdplgkAa5TrAgxbKa5MyoznTp4K7Ez3zhhqZMxHTT4Mvkl37LXr0KmILF+NrIsA6oCvMexUT3Pvw69sjjowffAyLEPsEgfMZOMqmkwB5iuptgv7K8KuKyxFaBZupe2P7G2u6a2ma6ELwc0lbEeL6I0LiwKLrM0+SY1O3Q5ITdVsJ+zSTN8vOe0iCZBI8cwTjFAKo0vYiyhrCOtvLB6rAgyfTb8tHoc3KuOrycy1TSSONswxrWINDOx2bfWLFavDrH8MsY3G7QppUM7JsAuOOs8o79nozyvW6+3LLcwZSLboPKrKCtbJH+ovKt6tV0q8rKWtFut2rgSNaozfrzPvIg0HTXvJLkvViTFL4ktfjDbNlsmIbUYKXc1ka/ROE23XLVmKlQxkat9MeK0MrAtN982nrgRMa43YjIXtUU276BsuD24Izj4OXMwQLSLLl0x8zI+uJ4xp6tctGg0RDBsNyM0VrpHuRO71DEmKaq8tadjIzy0lb0krDu6QTZxsBKudbkrtIc4GjiHORQ9Bjf3Nzk6PzVQOWk8o6gHIxsuXS0tMoczyTLoJoE1kDSdOC07MzO/N5w4XCihM+c4sDJBLls4EjFVrJ4y/jScMKsxc67nqHa2h7OcsvG3TbdbtpG47rNnNOA0aDBGNnQ1KjH9N1I3/LQ/uGS8brktur+6ji3mta6pir0Ruti9qLLMLla637i7tuS7PbESrfkf2TRVKBEsgTYppricvSxdqDkgOawrKqIoHqzmqOQpGStPKkqpIiQVodasTKhUKAgq9zEhJE6vRi1PKZUrfCyZLkk2Oi5BMewsW7FBsWk0JC+BrkouNq5VqZayjiqpLauwtTDnLtKzobX7IqOmkKfUKuUr+LaWsiuyHqupNLCt9KpBL6Om1zEUM+g2fLvgtvq8YblxuSy5I7gGtLK8UD0KOaA6jqwaLbo87zE+uZ6p5Tdnscu3TTZnMd80/y1Kp7q4HDFHMIWTSjKPLQgxR7ImtPC0DrPiNpMxz7mjtPazZLIvr2QuzTimLj23wjT1phOylzpkpU+04bsNuwc5n7kau52+p7T4uku5TbF0Phw+qzF5PPM70SkuP0M+pyYCKpcxFyBerTgzCK51sSg0VCi6KampeiwHIIYpZirjKU0jKqUjH/weJKAnI14rr6sapGgsHTPsM4Kw96J2NEC0NzYpNa223zYMN2s2I7TeMxStDyfENpMx6TBTsiuzDTEHrUYoJaXBrHirNi4PNQ46obXxuGOwSilyqPY4RLVetoe0dzDtsN21ArFet4+04z7PNBU30DsLuW67QDAuvCS8BbgtMfY277XitNkzaB9WLho5kzccs2+4ZTkBKyG2lzufMGW3Aj0OprcxQDi9tjukFTeYtW4vCLyhNni8lrqLPCa4OLwPNGq9IDUotpG7p6gmuZe8HDVNukq7xLzwpJK8kDDEO2M0Z7AGOJIx6r1DJHG0W7TgOcMwr7lRNQCy6yrjKqO3p6mYLs2zKCBsKl+xPqxPongsKKQuK3MsbiwUrdsrVCgqJv8s6yq5pBAiYyRnrGylCLUcsKm3K6mKrlG00zZhNxAxrTTtI
jqwjikDsKI1lCyqs24h3ahiKTa3jbCRsO22ECQCr963R7LqNOqqva5jN1st9ykBOHgf+bLKmWiwFbM1pXOxCzi/q8M1ni+ssbAwwzTgMIk3kjwCPFY9MbFXo3C09bSksnK0mK5hsCesTbTVtaS07DTEpkYoCTydOlI89K0ltYeyJSymrHusIDhZNkA2/zBFMymwBLE/rBmxzLrVuTG79661ItEiNCxTsBs2cjqIMu81ma4VN4829rDzp6YwzLtavGO6AbtIMZ63C7u1LO+5w76EuQ693Le6Mlom3LT1NIqi8rJFNtKliSjXpxefyiIOKE8fnSj0KpusFSCWqCklBiQjKeMj5ayBnYYphiUiqFsw4i8CtHU0dqL2LEY1DSl4M5g2GbZosbUwyTFFMyy027SQri+u9pXuMTmvpbDYr0Gxf6iOJxyseaWKsM4jErSosJit4qsrLHmyEjA2I+MyBCkLOJI0PzNYMT41OTRSMOwx0jUIo80wcTFzpDIsAzMFtm20ALNms8mrIalPtDexMLKCNqguODZENfQukzQ7NKMJwi7CKzyzD7bWNKUrd6SGN641bzTyMd6xjRwYrM6z67P+MUKchZvgJ8+2C6pHsh+48a8PrWm1IbRdsD2wuK/Nr8eva7J1rae2t7ObLr4dca47q6Wls6b5rGoqOS68r8mxW7boIPWvwbLToXCuwbDcNXk0MTOtMwgr8ShONF8x5TF5tp+uTLZQtYevpLRutIeW+65GpysyOzY1tZenHiiZt0m1CrUkr/8uYZIMMDU1/jKrtIUlmSsnnDw21yp6L0c44DD0L5k1MTNBMJow5jH/MbYyPC96LRI2XjUyq28mvRpBrL6l2a4GJRWWS65srz2yGraAKqSyh7F8qmGhIbLmNTo0BzPWMjkurShmNN8wpzFbtn2uL7ZOtcGu1bSltEWewK/4rI0ycDaYtB2qkZ0zt4m1b7RFsykvCCxEMQw2NDPFtGoxvq7BJas2iR4iL8k30zEJL6k14jMbLnwvDi0JLaIvHy/6KuA0bTRFI0gtqyxloqitLRjaKbYk8izCNeo1LzQyOEE3NjcMNtU35DXXLwAz4jCJOAY6njTkOEQ43zA1Ns43ITfhN8A3BDjrNGs2zTUBM7w1mTT8Nh44yTc7Nl03TDcNNc4zdTXQMgU0KjO8NIUyZTQ4OXU4TzjcOPM3xDjLOEQ5wThuMhwphTHFJO4xQDNVL0kxOy8wLzgiIas2LlEvay2YHyittiW9sto0jjFhOw==';
-const CNN_V3_FILM_MLP_B64='3JR3PmW94L1BYem+rRCuvtBlqz0Wsa49ZSGRPVixHjxR64y+EqMcvtQAMjypcGE+37fhvtL8lD7HpeW++M6cPvIjYr4f1j8+OWUovpNyqr6x5VE9cBP4POZUSzxw/bg83+JnPX5qAr8zwDK+TzKQPWkdmT5fmsC+ZnXMvi8piz3DTba+DqKNviGjvb7BAyW+y7ajvUgKzbyCNrK96745vj5/p70s9GC9i6GKvKQiCr3Z4c298O8FvxLSTb5I0Qo+HuiqvcieCT6DHGy8sd7GuwdUx7s5zA68eC27u48Lv71xkLq+HtTNvsV13b5meBE+O0eAPHpPmTzOJY088l9vPDUwrjyf0MG+NX4dvseuz77AXnI93E3uPSkEYL6XE5O+K95EPqzOVL6lAeq9yk39vCEhRr4QcIi+KhVZvogH870TRLG+acdtvvqzB7/jigO//eUMP+Lzr74MwKq80fz9vHYujD6E6K2+IwUXP+3jjr4KnVQ++F1cvvz9Qb6snIq7lurPPKbmCj6ZqnA+t5s7vgDEsj6NZaM+rq89PvmRhD7JNWM+GUGHPVz1ijziwBA+hMBIPVA+trwvHM89wFALPmqfyz1bBzs+s4BAvhyRur2zA7o+ahQPvSyNfbsT5E09+WJ9Pkl/+L1LkoQ+fqxZvjvHFz4AcIe5KRgQPqFuJj5eOrQ+Sc2WPsEWi73F3BS+lgMtP0H/AD74d8O7+PNpPt7zGz+GB3e+VmggPxhtUD7bZgA/MvQ6PiZdTL4079s+MJFfvdSUmb2MaRE+PHRXPmQl0L0a2jS+EJsUPVj5r71eclY+igMBPiCd3b1qwy++HHcjvrCyBD6wM3C9yjhpPsmzBL2gpqw8xE5qPm9A/z1jkBA++eOSPe/oHT3QBTu9P7UMPgoWXT7B9W8+Bv9MvrdjSTxANMS7xJgzPvKzlz3uup09gGQTPuom+z0AxP09XsYNvmSFbLxv+3U+1wZzvlnPCb45cGU9ujnsviDN/zwg8+69KMKqPRl+hz6ddUG+6VukvfT1ID5rWR++w0nmvd7YI7+J4RO9T5ZuvQWiCr43WRO/JIsSPkaEJL/oZAS9wH0qv+pOLT7qoni+M9oSv4Dx4TuAkyW7mEFYPZhRib3QUqQ9hPVJvrCL5TwA3gC97EwGPhSV8L3k4x2+gMNXPJaPD77Q5y29pCWUvaS0r71+GIs9ECCDPpNcYjwyNvU8fwwRv8zjb74fQzE+/BFVvg3a/L5xX0Y++qYCv3ALgb1e4SS/QA19vh8MzL3fI4q+WBK9PZaOI75gG0i+TBvOvcINGT7gwTE83Ep8vg17e760kWs+UodWPpZrdD5Ms+49sENrPsYZKj76mFK+GAtyu1AEUz7baU0+gWfbPetWFTxZCoC9SA1YvnoDlj2HIvu91GgMvhyvxz1W+D4+GApiPabLO7wA5Ko644UCvgxTGz3LnI27LQgRvrnhw7y/McS9rXb0PkKb072Wi0a+F9PdPba+Az/hZR6+/8n4PnKPfj6Vx+c+8EacPIonVb4HE9k+PWtqvRQXJ760rxc9XHonvrSjiL7Y6Wk+X78oPXDCGT67Nny++E5iPXdDHT1sqDi+B4ItvoAH3r2zzdq85XhePIZolj5J0hY+s/isPaatrb7AdNs+8r+2vd8pDD6qzMM9eouUPsOeMz3BcII+FvpZPiF1pj6WdUG+j/devalgoTypoga9c0LEPc3fur2xte29np00viIJ5bzvYS49eWp6viSBDr6IYUe9SiOpvoBpdr3doXK+wDKou59EljxBZdC9oQC+PFTUWz0YBpY+T+5gvDaGIT+RZBs84iqJPiO15z7wWw8/nFHsvdvGID+Q2FE9OojpPqxLqD02EkS+dbfNPtnUlz2Ukgi+AiD9PPMD+z2V2i49j6osvt2EcD5sotK9a/FvvWeJ1j2C5vM9cPZ0vWTyQz72hWM+X9dMPa7bQrwyUZQ9GB83
PfxbHD4g9FU+dX8XPjALx7yEcb+9up8MvhaPkD3QiYc91xN6vR4cT76kCnu+Ws0VvmA9W7yv4G89ZvsgPiVaor2ok/k9BIWJPZ5+7rno6QM+fbTrvKHqZj3NOOy8MI0HPXi5VL7iUmQ+KlTCvRSdSb7/IpU8fpXYvZHMgT2K2Us+UShZPo5z/70zGWA897+RvtLUiL4dUoi9n2wGvjaqzzznj/Q9UE0DvtqEub2WgXu++SI3PoQMk76Qa7E9PZeHvEyiXD5pKze+zpu/PepKg75irr286FJiPhIvw7tmUfk9nmWhPvCF5b2dbwU+KgwYPr6Egr5wVR6+Q0mBPoJx37xmKsg7nEbsvR+DJb0fjcw8ZlI7vTQInDw5elu+z1rWvRbWw72wLds8Qa3xvciA37008EG8EFEcvpXUKz7zSxw9tTgLPtwUgL2By/Q+Zv2+PQGEa75hnzk+mOKSPumBOj7TR6k9EBtfPuuk5z4kzPo95mqtvduNIz5nbhQ8WXzxPfMPXj6bNCU+43hzPmlAxr1LbM298vepPXV0CTyGzA4+uuxMvQIuXr4wo4Y+gG/9O73WCT6+VY29QlN9Pcs5Pj60l129snY8vjA9jD6cniU+2xM+vuAR9zshZ9Q9DufPPXYA3T3mliM+hMWzvZxOX76a5wQ+z5MvvT7p0r1+YLA98dwXPoI7ED5IrOy+V2TKPfAEEr7pdiy+rWIkvu7Z9L2RqRG+QBKju/A1Fr6QulK+rIGmvkLYHT6WOQa+4hGuvRn3o7u3Xy490raePrChM75vM20+B3g0vWQsMz5wWe69eI1IPgDCszoM0eA9aChGvvWVtb2gl/49XO2APacEGT57JqQ8mm7yPQmfij7BtZ4997PLPU49RT6skDs+6cy5va3cT72YRT69OqFpPpBEET1Ba/q9pWadPG+tVj7+1T8+0OgJvqrNSDy4bZE+90RAPljRlT6BR7Q+SKUhPoFsMz0BgVU9eDulve35Oz6Aksi7mSxxvWnAFD44KjU8FoCRvlp8Mb4+d/Q4mNqnPce5yTyQvcs+hfypPc51Ar6/P+Q92RvPPQBLmjuyIsG+eEwnvikEjT7Dtq6+O61Fvu3Ohrz7FRU+FzGvvYL5nj5chL+9eQcRvsfvhz0de4k+lwiNvuAUMj7GkTg+5JKKPeASCT7EhLy7Q++5PVd2o70jiaO9HvEzPZ/UK74CcQE/7tV3vnItBbyarB8+7oEcPvcT/L0UN/8+5ODOPZ/Tpz7gIQG+O1sZPpgroj4sATw9zVYjvvjOVD59ezc+9p8aviKffD1qmzA8R7zGvaI9rr0L/6g9hefZvTgVY77bpta9LMzuvcA/eD43miY+nTOWOa53Nb684CC+rLrmvXRhVT4IuxC+DNO0PYknQr27t/U9Y0PvPRpzCz7kGdQ9ql7BPcBAiLto9qe9HrjlvXb5T7335Zy9vFgju8NHVb5fvWo+GRQnvoaXXD3Vtiw9nonVPeokEbyDXYc+YLW7PJdgAz6isls+Qk7GveS5xr1Gits+cohJvXRoeTuwqR2+3wgHP+tNBr0B4GG9HkO3PhHJaj8QvZU9lbHfPmwI7j1ZgGI/WPXrvZutJz21hbI+N+YEPezs6bvmUFY+dfZbPd/Rnz7izxK+GquBPbJHZb0HPzs+jrptPVg8Aj8wAVi9N6jgPv5Jbr6MM00+MqIxPrwAZz0oERA+fEoLvudJH71ysgY/vhOYPaEPDr7asVQ+D8quPqWcmj1ewQg/Rqc7vrgRtz5gSTc+/Km0vEqH4j3BZ6M97evKvTmcJb5Htp492/KfPsn4vj2lx069ZroOPqXGXz74XFY7cRZWPiJgdz56er4+MAZfvg6wE71r3yc7uwqFPs7Ny71CA4o+k+GLPs7aej2b4789izC8vfOUjz5D2OU+RhUgvhVKgT5ak20+yOWBPjh0Q73A3qS9qYawvbt4YT7O1dI+mQoQP0LcDr6YwiY+yi7vvVpA8b78DAi+4jt7
v5U5P75FGvu9lpeTPnhZ/r7zTIg+Hx+Avj7xOD/oW5o+D35evsYRHr5BbGi+3fQnPX3iyT2Ob3s+T6bzPUKchD5O6d6+IxjNPjWTW7wa0vk+n14dvhmT5z5tW/Y+8yOQvoRtZD6JwbU93pZLP1mnjz4C1bs9VUXBPqtl7z0=';
+const CNN_V3_WEIGHTS_B64='4iibrh0qJa0FsgStWKccogcq/6UosD2uQbT6tcmxh6uOsHmkurEjtDa0HrZ/t8+1HrTxtg60wSz1o60s/irWJi2rLaQxLXuoOR49pdEczqYxplgbkagELAYq7Kf0KbCcDSxIIXcnESLGpWyrYiSZIfYiEampI3kg1iUDI2cjGCsxqiis1Z13pQuk7SucLKqp+ChaqKIr66p5KYYpJiaFH5mrgiY4LKgUEijmmHYU2Cs3Kd4jiClPpk8hgiwEop2qIiwHJ52r/SkEqGUpLSnVp9qknCafpAqofixXKAowOSgNscGciSzTqYEw66eiqSekaq7Zsi+syqwjq2+oErIGs1qwa7SNtum0bbRmtLy0LCoKMe0vWi5HL10yzTFlMTsxICiAJ1mm8iwBKCwgqqVzp4Gk/q/QrrWwTaxLsrGxP7FxsaSrtKBmpmopd6UAIDAkYJ9vIXIuiaLVKhMneZxPI4ashCDoI2wNYq8BKD8norYwtBayX7T1spopzSUZLrYyuLNRtZmp37M7sXQsNSolLE81M7Srs0Mo9K6qsvUwwyZjqSaXkBmWKfKmX6MopVYmb6gWqSgjRJCMHPIe2Sa+jbYqRixkqLSnCKsaq1sof6wqIqUqICghqTAbdqTnHYwpgazRIBohhidKrCal7al1qnOrm6ieqe4a/6kzIU+cdaZgq3eer6hOqw0qDStJoMMqH6kNKl0s0x/jJnOkNSe9qkYpEKx9mB4pbKQnrHcVYCnyF8Ygh6ZipgwpQJhcHk6hPK6oqCGtt7QJtGGxE7B5rvysGK3ipvQv2qynrSOqBrD9rv6pbC4LL0Mwwq9hpEQp150spCAwQK+cqTusZ65yrROvCakUrmGpe6QALoYtiynTrBUslKXPLcCmzC1XLrEv0yq1Ib8obTASMGkxyihjqBumxaM0qUotk6IPn1grpKICJGCrgyk4KfGqB5gRqKAnT7HHryUsNrQWtMSuFyb8rGouq6+3st+ux7Wltq6zravesnqUeLO9tdyvR7gwuOG2c7Ywtk+yWqpSJzUm2ylGrDIbPgfjKBUtDCLmq4OpqS2zKGClYKnzq9mqOikaJqOsZKuYofYo4ys+qyil/yvupV2ripb/mcUobSwVob6ql6GwJNolRaw+nsQlbyHCKagp5qo1LNgbYCuHIH6ltyEgKD0rNiRbKksX9ijyKO2h9qrRJPChHSrop9mqYCR1LC6q/qb/qG+ksiBUqiykO6YPqLilfaq5KH8gNat0pVkwgayDsqWkW6cIrFMu4qilsPCoDrBrs0Gq5q08sYeoqLOZsyuwT7aStem127RNtWK0cCpfHyYsty9KLJMqAy6TLAovai3FKDwsmqxep6ElCiwPp+0q1qt5sG6pkbB/sIyt47HGsT+y3yZpoOaoryZsq/ogvyWRLiAaFyglKfGqRKpNrDCnYB70oY6ljB5eKXksMah2H1iqQJkHGy0oOipuKwWqOKy9IbonQCVvJcWmzKjwK2oq1iSUqpUmMKaZqWShMikgJmKseSU4JK6gKKx1qFGsfiiOqm0qIp3SKOkk/B1VrA+sLyWpIHciSab3ICMlxqROpekrCKznq8yk/qukJ8apaSteKhKlnaEMLCInLaw8LDmkFKo7pmopD6Vjq56iGiurntQqo6KlpyqhQywULOCrZSxArCSqW6z8Jzwr7qq7rASnFhrxKr0sjSparBsmIiSkJKerQCm0LJ0p4qi2rIIsRqw8Jl6lTywCLAulEyzrJ0usnKzPoJepdSzRqOAqzRoKq7Qp9KlpIbqmi6y4JU8p4KsDqpSsEioHK6arWqOho1KlIqCwIr0sAyyaLEcoeqWSqK2iz6gRLMeq1x3uqRossSuEK2MoqqttJ3eq4yhtqvkqpyynK4abJizhE7YkAiwnIImkUyy1qlEmpakRKbKg1SshGhKke5piJVqqka0YMYYwQKkesPOsMytsLlatQSjDLcitzDDPr
uUjdjBwJOytPLGJrRktqh61J3EjeLApsKawx6prL9CpOiqapGqu1a4rsusd3yohsV+l+66io8UsjLIkqBu0YK08smOmlLTDJUQnBpxlLF6xZLBwsWWsGLKhr3Kp+bLMLkwuSinQm1CsQbHwJB8sRCdDJmasK62DKo2nTbAmrFoqbC4PrPWyQakvsJOxFypLrLuxralqKACsmq3JoMGqritenBssBLBRKDEqBalKrZ2woKsNMTwtNaLQq02t4iXnKbExaC6XLfczoDOCMLMsOy3pJfAynjBcrVikYjDaMhc0HS/tJVcyCzHaM00yzDMoMRul0jCQMIilwi4uLA6q1K13qEkwBqmDMAUtmy4qKzIopKg6LnwsJi2DM32sZDMiHSouNrExM8oQ8zAZNNAx3TDgsH+tLrD2JAuwmS83nysumCgoqM+trawsr0uxay0eskayQrEjKjovurBHJgCiPyzNsniyzLASnqQqJi0YsjCuX6gbssqxMC6GsKwqlrIcKvCwZi78JPgeFLDIpsYutKF+K8wxBC7FM4cXEytNJv0ouTHALIM0lDGYL1ozpzEHLnwhWjQGFPYvQSb/M7wlZqi6qKAzDqskJzMdDLFBscUwCKE1MOQw8i5hr+Os4B/uKT6wbqgVLFYtmy96pEcxACcLMGAvMyBsLCuuM644LeoqN7GNrhOwXS9mJq+vaayorT0wiCxXsJywmrAHMImw+KGoMHiqQiH8M1ox2KmiGImvaS3ArIOtEZmnKiQte6LoLjAuxS67JG8xKDBeKvgtHCt9Jzsmri8oMJAwvC6IIVwuhC/VMAoggqucqKmlKqgRKGWpQCWsrhSlHSk/po6sGDA8MMAuWbLbssix+KaGK2onzC/7LQSgZKccqdwtMCUJMu4wFDL4MXkynS6BMnUypSkuKTYvXzH6MJsvFS8jMDodrqmIK+mquCWUJpOmACtmrkSvuhoOLnMxYS0VMqExBy53o28v3C3XKM6sKS9RMCkvzpYSlu0uCit4sE+uP6PUKlqvg68ZsTai4bB0saOlcqd3JJWsrrJrqRuvkqo3nJ8smS9ppZcsLa8gp62mczPCKbIx+C3bLHYzDjHbMqoxMS80Kq8nIiEvLawvjTGVKjosNa1ArP2deCyUrpIlUSgsqV2uNCLpMbsnyjEKKVsoty6IMN0jXimNLeUvSzG7MXEvYpXZJdwnUqbrrnyq7ypzqM+se66fqSUoUKjbqE8pmasoLUSq6bD/HpCxoahpLDSu1CyFLeowJSodL60rajNFLjcxDDF7Mfsr6ylJMcgpeSk5qTAxgalWJ5swui5NCg4seyxMLJGpqS0PGrwtUqpiLEwvOylrLnuq7ikkMbOkBy7OKdwhEDEaq0mpmKBFLRgpAi6sqFIujKKFpyqsLB4lGZKuQbBULHMqva7Sq1Ww9SlYKuEq/KRrrsipParqrJGZgSkSJt0ta6UGrBusBS0vKhcoryctMBklSTNELb4t2SeeMLSkcyqKLSMnAjHvKQsxf6pCqYWhkimNrHmsj6o3rQauNihOruQks68csHqqEC7hK02t7qyfr8Or8qweKGms6q0CJ2UpKiy2oe2tUy0cpC0scK3hGaOudh0jLrMq+ixSrcStvykerE2pgKnYqXysmSmdrsao3CGwrh+qlpFBJ7iqjyuWsLIunSSnH1UqliTFrUwt+Kb5LbOoYC2Yresp0x7KLHEsC608m6Wsyqm8rpwiaTDxJrukvi2Wm14dqDDdLlIuPKGjLj8uIDG9HGyfPaiSMMUwe68ksCMiF5zJKQSlX7DJrWsPALBAsVKZVq41jqOlCLA1s5gYzTDtkxuqg6rgLx8pCy/zJU4mWzFEMhskTTGuM0opfTOlME0yuzANLModgS8LMLEupC/FGgUuJBFsrTwusakpnz2rl6Ecpg8vpCYzKuwOmTGMKKwrojCMMaQwMDFuLDMspamELfSmhyc3ozAtXK9Gq8EoZa4WnCKkf6QvpR2k9K7js
Ksn667CL+GpW64Qs7+s2TAGL4svP6i+EdElqCwDIxao9yd5KwkyFTAJL+YudClMlJ8wN6kSMEcpfCyuL6sw4jCSLWQweaydpFqvhik/L6wsOKLYLK6uD6+HqtYtZqjPLOQwmbeSsaaiDLLvHfQwarNXLrodCrdVrjgpOLRULDqojK/xLK0cObjispmw6Cp0pVMlJbXstNavq7LtK9w0BrLvoj0gt7KVnecmKrZ4r3GsJyUzp9MgKa0CLCUvubdJtJukJiukK6opG7DvMoYtCbRRs4azei9gMZYwSTAKqVAyDzOOLEgxTS4ULYUiuhogL1EuVC3WMC4qUC7ELzgx0zB6LdwqbC4VLvItjS8BLl0szylLKVwxWyTiMBgyGi9iMQQrrDGXLA0xVjEqMhmfvisxp9souKsXK3+lSimNKZkuljD8LYwuLDBgJyInDC1gMOUwpTD3MLUwNynNMLswFS5dLKW4qbheuC+5PLhJt7S5grf8uD6sJyGPoEctIqeSn5Qs7iyfJVMgHi/NLrMvZClOKm0sbCvKKQAwZZ7GMGYsSCKTLfsu0Sj/MDmZF6wDrBwrdixoqo2kkJUcLDwuQDIyLy2rEyq+sAQzhzNJMP8sXS47LdmofpxQKPel5CZmL4cqWi0EMBoqRy4ALx4xdS5GMG0bfCyTJm4uNS26I3wwFiyCL2UMUTF1LFAjQDFAMZkonSjQMOMrBC8zJy0smDDAKj8upZ+SMP8tIC+1JiUpSjJSMAUvCzAUKsAhTypzLfIogi/BLaiiwifCLuOlYS57LEosFagQp2gonKtcLOsr5C9OKmIxCjCKLWQpIC4/KCAYBi82MOswCjErKOae2jCrL3M46zd4NlU3FDbIN6g4Pze9N+8aFSZurF0q6Z5ALNSkyKkCLMQmqS4DMNYwOTHOLYUwWSCDMGwYgSJYLkAWmi8PL+onrTBDLygtx6iREAasUiz3JPYr56kFJ0CoQq8Tr7ocYK6gpF6opayFsfctjzCDpOgstjAUKlsuci1TKSgvUDE1HLYgaCt0MIIqXyXgKxMldCl2MEwwai01Jy8ocjC6LNAlAq1LLBUpKKSHqmsl0SkcpFIr3qqpLBWmaC1IJkom6SCfKsqqeqt7q6GljapLqsurpKxfrLesiitMJQIpO6bmoxgjmaC3qDqceix6quSsXplCrYurJRU0LMSpOS3rGBYt6yeTolEoOy0tq6Mq/qUGrJ0qrSxNqpOiwyXfKaUsmCTpqh2tsqqpFz4qVyNJraqUbym/Kyql9avyKcyqriMlltipmymMKAElOyQJrFEmLyqpJCepiqoUH9YkY6e3q5QkOKl6Fvks26hgJy+pXSb8KxctbapoqoSrJitXqj4s7qdQqH+dmCcVLdupFSmTKkkjDSQAnkIq5KgmpDysL6RLqGMZsZ6GrMwqAysfLSqoEq0FJReqVyzJKOipKi2Zmsiov6ryASivVK1prSamIq2Tqx6v9qhMrECl96gvqoiw/6QSpmWsda+1njWsA65FHKCoC7D2rjMoYK7Cqi6xVK+yqCOw9LC+rTalWarzKU6tP6oCKwGSJCBcK86v2K6uqx+w8K+eqI2sCR73rnioZKQjrKWvM6cZplCsV6wlruAt+zKwM90wTDCYNDE0YTZRNvkpfSx2rPYhNyuNKourhSU4KmCuE7APodanvKz8rO6btiZLrkCxxB9cnWIgValoJdqtNiV/Jken6SPBFGamxC09Lc4taaQ9KF4mPaY9M/qv5rH4ppCvKbC7rNmuBJ8tJsew967PqDMoragvIW2w+aThrI+wZqrKsN2wxarOrGclTCnpJxKc249bqT2tMa0QqbIxYSh1pGirVp+JLvgvmCVJLv0siy1dLIQlxK9cpjawf6TDsCKntSl6sK4oK64fqfqsh6W2LomrUyjRrDQuhaUCpmsso6zErV6tgS9AJTqo0bEurpqvKLLStM6woK/0sESu3CzyrpinPai5q+ikuDJQKYEsQ6kJs
bwoR7SttGGzEa17raqw1qpPK0kreCCzrsql0ywNrQwqVakHLkonki/eJFosiyxLpassoiVir7Ish6/rKMOlAqv+r2WuQBpyohUu8KZ8rWSoQqtwrz8q0CSNKfilL6pDLfQSi5teLjYj2rIAsYmxAbS6smasW7LvsqClHSamrmshmSk/scawvy2jLhsvEihCq3yq9LKUsfuwD6Qqr9utEa19L9GLyK7nrWUtXyiQqXItSpOTLX8uTis0p+kuoqfqqGAvlCQErLyufCsDrU2oSBkaqLctqq51IJ4nvC07LSwt6S3+J4Wuihi9mGAuOatvqWUu4qo/MAsqQa/lsTaiyrOlsiyz1rHesEOwiizDrIEo1iwXsSetQTN2LbsvTbCPsmCs7q8utDiy4bHArDmwjC57nfqqtquzKuAtByqQoEUiK7BEsJizgqgxqnyyjK5SsnWznLDmsw2tiankssqwZbEIs2CwPiuBq4qpFarrLiisDygrKNOOnLGXriKoNbOcsGSyZbPKrmeqI7DUsIKxfqmxp4yx1rADsXmqYB4AsX6uzrISrc+x7rEbscGwRrFRsWuvrbEmsA0ibCBdramuMysnL3+msizWpnMsOCtrpHCokC7CJBCqoLE=';
+const CNN_V3_FILM_MLP_B64='hvXqvX4wsL37Zt2+Ium8vv4fqz7JHEc+yyEiPv+RiT5u+UK+Pv7RvY3EWD6BBpM+W10mvZ1agL5uvRg+ofWtPsNorT18tZw+coVIvoQMpL69pmy8AN/PvTbaAT4sGey9t/xuviGtlr56gVE+u+WrPmKfAD6uL46+f/sbvuU0bb5dBuK9487nvkOmgL0PsUG+BvcGPigbpD10eH4+kVu7O+s+3D7qE4k99gKAPjjW/D7DcQA9+Cenvr+rzr1jM9g+uH6cvgc8CD3sida9KaeTPo6yeD1neKi+XNk5vhKzmb3ptg4+xLvbPn0XWb1qk4a+bW3JPuhz/T6aAck+o+Svvq94PL4UGk0+DWd5PoqmbD4XMtk91T/bPnTZgz69sb2+08+mvYPKqL7Je5Q+5jEFvm2Lrr624eO+8l+GvvGyvT4MF2a+jDN0vgllDD+k50i+as2xOhYbGb50yRc+Ps6evh1Zqb6I5qa+aVNdvdaq3L5ppFy++4UaPzJb3b6kqnW9aYMavufZxj3WrJy99KMDPkSjKz4mESy+h81oPcCyvr1v5zC+CfzdvbmSGD0bdJS9HrN8vrfBVb4k/m89tzXcvfRMmr0RhHy+PlrZvpqpOL6K+Pg8diGwvhCAAr67pmc+osXavRsQYz6aXq+8otKTvoIvm7707fS+zp1pPYbINT4pbHG+9T04vvsger5QiRc+JzI9vXoKYr7/tEY+wHSBPc9Md77xpoq9BkOLvrBFPz4Y1568CrSUvr2bZD6WCMY8hPn4vWhFFT4UL/M99BDxveaDKb7UIdA9npchvjDCBT2MgiI+QqJGPq7VVb7wkkq+yO4PPTC4gD0kfHI+RDq+PVmaeL4r6Cw+XQoxPoPc9z0iOyK+Nj5tvtzsST59Sow+R3CIuxZ3MD2Qx8c9ByoFvlvNRr7+GWy+JXYZPhm80rwMFQm+MJFuPv2Txr0ZF0m+EAr/vMnGjz4s5cw9ZIVevgLeaT2N5Dq+4P5EPrAZhz1xBL49UO5PPnziXz67Seo9btIjvgLAcj27Oic+XTgmPSIfm70iHgE+M/+rPEwEK72FU6a7/NdFvjxj+721zLI91qUzvuRDv706VMq92/9QvvTUmT1ELna+4NoCvARb9b0gsG2+fCitvQDTILxwkS69wBOMPF6qdT5eURu+gITfu0J9OD5gJma+7qoTvkCAJj7wiXM+9gxuPo0Epb29d3a+oBmUPCDkRj4nc0s+mDv5PTXNT75gycs85UrOvSSyeD59O4+9jgq8PRiuHb36wGa+XetMvhKxBL0hGwS+gWkXvgtMQD7IZD2+/ABdPY80ET7Un/w9V+04PmEKcr6Ap3W+z9kpPR+Ft70gpow9qqgyPq7Kfr6DQ0W+RnGIvgzIeL5KvBc+iMhVvhoECb0rniy+nJEPPrbXsj1ArwU+X/xcPv8JrLwnAWK+z91aPqpTgz2EoHO9fDLfvcFdwT13fM29fc9FvmtBTD6XcTk+sgCFvEY/NT2qrCg+9TQ6vG81F76+hGI+C83vPQrLLz7i2Bs9AKEbPjuvzT1PEV4+WVGNPrEl8Dxyw/M9ZCg9vvN/fD7t1xg+MGe+PSCzMT7iHs09mXeqvWN80D5o4fA9eufHPeF4vj0QwbG85ip5vsPb2r10ems9aY4MvooXv7zXt1Q+FIqmvVgpzbxPxJI7gySEujCCir4Z/jG+VlfkPSLA7T3OLk6+mYvTvbhqBL6k+Q8+N60ovoEwtj2l1I4+FVbYPVQ/lzs8/ei9c8B+PfTWUz6jEII9GPpJPWCEhr17GIk9IWBZPTo4+D21lg4+P53gPdbiTL6yllQ9iUyUPe4dWT6J+zI+0PSFPOtLhz65mkG+yghLPJ4biT7Y7cc9jsRPvvK/Aj6ketY9IvV7PnXdI75mN9I9HtwtvvadrL0w/Ze8D8U2PiRQYz43flm+gMBuPpW+tb3Q03m++D5wPtA6+jyekYs99t00
PiWf1r27YY4+Ra4VPgUcw71via68gXlgPpOzrrsxXgU++2OCvE+YJL6PgDg+TnSAPg3DbT7YY0A+zXMovQ7pVL4g89w9bicnPvLgAL5VTWU+hJQvvtpVJ74KPXU+7knZPQEv4z2L9XY9AqepPb682D2uyzy+Y7gmvvYjOT61vzG+c4C+PfIDpbwHdR2+nQ76PWNqZ70F1Z28Hct7Pi9gVj4XXl6+cHuhPc6vhz580k69gZXsPXp/EL3eyA++DhMqPoORX72tShO+qrYGvlfVFz62qWK+jTb2vTIli71RpV++foCsvS9+rL2pZg++1oWqPa7VIj5NRky+gpEkvsilLT4zINe9+rPVPeZ6ET4drjA+QZZCO43RQb5TSAC+OH4IPsT6Tz4CFxa+DT7fvY67eD6/6/s9RWIBPWFWNr7L6Jq91mM4Pkla/D1mx089V1LDPUfQH74qJzc9UjpvPmGAIT4JQk6+Nzn6PXCsqb1CRqa9XH64PUVK/L3Y3UO9Cr6mvWhBjz6bCvW8lCdAPQ3fTT6IqPO9wE4FPq1zBz5fbAk8oWBdvli/Hj58uri9Gh2YPuP2ab73ky4+n2EXPhlQAL5p5II94sYuvijgz70D4UK+3WzZPaJGHL5BbHU+wXksvuviaT4mKhE+3ZQrvtDFgD5snXy+uQNYvvoNWj6/iio+XLiVvFtbDL7DMx4+1O1NvgsPET5HiKQ9dwsLPsVzTD7cb+o9u0lhPRiBAr66BF0+teAoPf7ycb6U9r+9dawkPmiK6rzqEGg+SB4Cvk4qyT2AbH+9AE2oOlNeZz4s9mW+r3jyvfisyb0Yx8c8kAeovWzmeL4DHz8+giZ+vreLPLvgzV++a/tOPkwvoj06nCC+nK+cvcSmHr5Jh4i+hdAlu8blmD1KdVi+Kv4APnU5VT2/WTi+cqZ1PaviW74AaFM+xm9TPR82U7wkk0y+dsYBvk0QI715o0E+OG4KvnikZD0bik2+fo30vS859T37roU+O2CmPRqliT6Q2vW9I2NiPZvFbj2tG2U+nSjTumtvTb7pKT69eHgAvB716T0H04w986opvn2JGr53/AW+XblJvpQ0IL4xNnk8WJp5Ph6CgD2Z+xa9KdO/PPhu0T2O4Xs+VqsXPkg1AL1CIoi9rPkxPookhj3wfDY+Fp9AvnCiA76wUaO9pHthPtmrPr5XSHI+/dyPvSIoFL5QWZK9fJO9PW6tnL0CC029VbQMvuIhqT3Dj0u9JtdSvrqUUDx0U9497GsSvDDBzb10LzU9r0kevtjuJL5DaAk8n5VsvRLtQr6M/im+46MOPoan5j23jW6+eDozPtU4kbsu5w0+9vNyvhf6SD77aB8+coxAvvg8qD0VYGK+AlVJPVL3j75/YR6+Y1UNPn6zFT4NTaA9AgVxPrHw6zziMho+pfxvPYZecb7bxME9x1oYPuP4ET5KFmK+R0bJvfrmUz6bXjO9WrUPvuvk7D1Tfja+hBPRvbqeGT0kAA0+mqs4vrzlfz0HeJa+DskwPrHggL42J3++ibNXPrWNzDuXjQk+Q7pOPvAulj3Vj8A9dnVVPN74pbsBmgS7am0Tvos9CL6td4++kiGGvpt/er3+ZxI+/TN2vlRPCz6Z46Y9HinMveD3oD15ZA0+SS1EPopWJb4RC/K7xI4pvruITD5JSEK+1/iqvVDLpj1nQDW9hWPwuSD0ej7+UWe+/Vb+PeB0Fr65tKy9w8nrPTA8B77S15E9JvNQPtXOib2/1VU+G4doPrOvQ76n6j8+l6xBvoP7RL0aF429drjfvadXXr6fdDU+sxt4vgCjU76ep0C+zwMRPmBeNrxzOhS+VnCvPMHlBL5yTI69gPe/PYGWrz3Upb+9jtIrvsOFxz1ylVQ+G9A3vmoi7b2eUU2+MX+VPbO24b1xFao9QW4Lu8x0hb1Bq1Y97pvWPUZZCz4LalU9GysovloRrb6aF9O9sl4QvqDpKT02yPu7EXV2PrFtLj6eKA6+7bcO
PpwiBL4T4gS8cs53vTHZED4khZO9fuwwPqpZDr4SsAS9ClxZPnqEYD3xfTo8O3nvvTBrjL04P7E+FpzwvTHNbz4JjJG8da2RvZt1Ob4Atbw+X2tHPiLV3r1pZwU+Fp6pvqkuxb4rbWC+EYjXvejynjszfje+LUucPclbBb4=';
diff --git a/cnn_v3/training/export_cnn_v3_weights.py b/cnn_v3/training/export_cnn_v3_weights.py
index 78f5f25..2fa83d1 100644
--- a/cnn_v3/training/export_cnn_v3_weights.py
+++ b/cnn_v3/training/export_cnn_v3_weights.py
@@ -15,12 +15,12 @@ Outputs
<output_dir>/cnn_v3_weights.bin
Conv+bias weights for all 5 passes, packed as f16-pairs-in-u32.
Matches the format expected by CNNv3Effect::upload_weights().
- Layout: enc0 (724) | enc1 (296) | bottleneck (584) | dec1 (580) | dec0 (292)
- = 2476 f16 values = 1238 u32 = 4952 bytes.
+ Layout: enc0 (1448) | enc1 (1168) | bottleneck (2320) | dec1 (2312) | dec0 (580)
+ = 7828 f16 values = 3914 u32 = 15656 bytes.
<output_dir>/cnn_v3_film_mlp.bin
- FiLM MLP weights as raw f32: L0_W (5×16) L0_b (16) L1_W (16×40) L1_b (40).
- = 5*16 + 16 + 16*40 + 40 = 80 + 16 + 640 + 40 = 776 f32 = 3104 bytes.
+ FiLM MLP weights as raw f32: L0_W (5×16) L0_b (16) L1_W (16×72) L1_b (72).
+ = 5*16 + 16 + 16*72 + 72 = 80 + 16 + 1152 + 72 = 1320 f32 = 5280 bytes.
For future CPU-side MLP inference in CNNv3Effect::set_film_params().
Usage
@@ -44,17 +44,19 @@ sys.path.insert(0, str(Path(__file__).parent))
from train_cnn_v3 import CNNv3
# ---------------------------------------------------------------------------
-# Weight layout constants — must stay in sync with:
-# cnn_v3/src/cnn_v3_effect.cc (kEnc0Weights, kEnc1Weights, …)
-# cnn_v3/training/gen_test_vectors.py (same constants)
+# Weight layout helpers — derived from enc_channels at runtime.
+# Must stay in sync with cnn_v3/src/cnn_v3_effect.cc and gen_test_vectors.py.
# ---------------------------------------------------------------------------
-ENC0_WEIGHTS = 20 * 4 * 9 + 4 # Conv(20→4,3×3)+bias = 724
-ENC1_WEIGHTS = 4 * 8 * 9 + 8 # Conv(4→8,3×3)+bias = 296
-BN_WEIGHTS = 8 * 8 * 9 + 8 # Conv(8→8,3×3,dil=2)+bias = 584
-DEC1_WEIGHTS = 16 * 4 * 9 + 4 # Conv(16→4,3×3)+bias = 580
-DEC0_WEIGHTS = 8 * 4 * 9 + 4 # Conv(8→4,3×3)+bias = 292
-TOTAL_F16 = ENC0_WEIGHTS + ENC1_WEIGHTS + BN_WEIGHTS + DEC1_WEIGHTS + DEC0_WEIGHTS
-# = 2476
+N_IN = 20 # feature input channels (fixed)
+
+def weight_counts(enc_channels):
+ c0, c1 = enc_channels
+ enc0 = N_IN * c0 * 9 + c0
+ enc1 = c0 * c1 * 9 + c1
+ bn = c1 * c1 * 9 + c1
+ dec1 = (c1 * 2) * c0 * 9 + c0
+ dec0 = (c0 * 2) * 4 * 9 + 4
+ return enc0, enc1, bn, dec1, dec0
def pack_weights_u32(w_f16: np.ndarray) -> np.ndarray:
@@ -86,7 +88,7 @@ def export_weights(checkpoint_path: str, output_dir: str) -> None:
ckpt = torch.load(checkpoint_path, map_location='cpu', weights_only=True)
cfg = ckpt.get('config', {})
- enc_channels = cfg.get('enc_channels', [4, 8])
+ enc_channels = cfg.get('enc_channels', [8, 16])
film_cond_dim = cfg.get('film_cond_dim', 5)
model = CNNv3(enc_channels=enc_channels, film_cond_dim=film_cond_dim)
@@ -102,13 +104,18 @@ def export_weights(checkpoint_path: str, output_dir: str) -> None:
# -----------------------------------------------------------------------
# 1. CNN conv weights → cnn_v3_weights.bin
# -----------------------------------------------------------------------
+ enc0_w, enc1_w, bn_w, dec1_w, dec0_w = weight_counts(enc_channels)
+ total_f16 = enc0_w + enc1_w + bn_w + dec1_w + dec0_w
layers = [
- ('enc0', ENC0_WEIGHTS),
- ('enc1', ENC1_WEIGHTS),
- ('bottleneck', BN_WEIGHTS),
- ('dec1', DEC1_WEIGHTS),
- ('dec0', DEC0_WEIGHTS),
+ ('enc0', enc0_w),
+ ('enc1', enc1_w),
+ ('bottleneck', bn_w),
+ ('dec1', dec1_w),
+ ('dec0', dec0_w),
]
+ print(f" Weight layout: enc0={enc0_w} enc1={enc1_w} bn={bn_w} "
+ f"dec1={dec1_w} dec0={dec0_w} total={total_f16} f16 "
+ f"({total_f16*2/1024:.1f} KB)")
all_f16 = []
for name, expected in layers:
@@ -119,13 +126,13 @@ def export_weights(checkpoint_path: str, output_dir: str) -> None:
all_f16.append(chunk)
flat_f16 = np.concatenate(all_f16)
- assert len(flat_f16) == TOTAL_F16, f"total mismatch: {len(flat_f16)} != {TOTAL_F16}"
+ assert len(flat_f16) == total_f16, f"total mismatch: {len(flat_f16)} != {total_f16}"
packed_u32 = pack_weights_u32(flat_f16)
weights_path = out / 'cnn_v3_weights.bin'
packed_u32.astype('<u4').tofile(weights_path) # little-endian u32
print(f"\ncnn_v3_weights.bin")
- print(f" {TOTAL_F16} f16 values → {len(packed_u32)} u32 → {weights_path.stat().st_size} bytes")
+ print(f" {total_f16} f16 values → {len(packed_u32)} u32 → {weights_path.stat().st_size} bytes")
print(f" Upload via CNNv3Effect::upload_weights(queue, data, {len(packed_u32)*4})")
# -----------------------------------------------------------------------
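Review note: the sizes quoted in the updated docstring can be re-derived from `enc_channels` alone. The sketch below repeats the `weight_counts` helper from this hunk and adds a small `packed_size` function, which is a hypothetical name used only for illustration and is not part of the exporter. It assumes two f16 values per u32, as the docstring states.

```python
N_IN = 20  # feature input channels (fixed, as in the exporter)

def weight_counts(enc_channels):
    """Per-layer f16 counts: 3x3 conv weights plus one bias per output channel."""
    c0, c1 = enc_channels
    enc0 = N_IN * c0 * 9 + c0
    enc1 = c0 * c1 * 9 + c1
    bn = c1 * c1 * 9 + c1
    dec1 = (c1 * 2) * c0 * 9 + c0   # upsampled bottleneck concatenated with enc1 skip
    dec0 = (c0 * 2) * 4 * 9 + 4     # upsampled dec1 concatenated with enc0 skip, 4 outputs
    return enc0, enc1, bn, dec1, dec0

def packed_size(enc_channels):
    """Hypothetical helper: (total f16, u32 words, bytes) for the packed blob."""
    total_f16 = sum(weight_counts(enc_channels))
    n_u32 = (total_f16 + 1) // 2    # two f16 per u32, rounded up
    return total_f16, n_u32, n_u32 * 4

print(weight_counts([8, 16]))  # per-layer counts for the new default
print(packed_size([8, 16]))    # totals for the new default
print(packed_size([4, 8]))     # totals for the previous default
```

With `[8, 16]` this reproduces the 7828 f16 / 3914 u32 / 15656 bytes figures in the new docstring, and with `[4, 8]` the 2476 / 1238 / 4952 figures it replaces.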
diff --git a/cnn_v3/training/gen_test_vectors.py b/cnn_v3/training/gen_test_vectors.py
index 2eb889c..cdda5a5 100644
--- a/cnn_v3/training/gen_test_vectors.py
+++ b/cnn_v3/training/gen_test_vectors.py
@@ -15,17 +15,17 @@ import argparse
# Weight layout (f16 units, matching C++ cnn_v3_effect.cc constants)
# ---------------------------------------------------------------------------
-ENC0_IN, ENC0_OUT = 20, 4
-ENC1_IN, ENC1_OUT = 4, 8
-BN_IN, BN_OUT = 8, 8
-DEC1_IN, DEC1_OUT = 16, 4
-DEC0_IN, DEC0_OUT = 8, 4
+ENC0_IN, ENC0_OUT = 20, 8
+ENC1_IN, ENC1_OUT = 8, 16
+BN_IN, BN_OUT = 16, 16
+DEC1_IN, DEC1_OUT = 32, 8
+DEC0_IN, DEC0_OUT = 16, 4
-ENC0_WEIGHTS = ENC0_IN * ENC0_OUT * 9 + ENC0_OUT # 724
-ENC1_WEIGHTS = ENC1_IN * ENC1_OUT * 9 + ENC1_OUT # 296
-BN_WEIGHTS = BN_IN * BN_OUT * 9 + BN_OUT # 584 (3x3 dilation=2)
-DEC1_WEIGHTS = DEC1_IN * DEC1_OUT * 9 + DEC1_OUT # 580
-DEC0_WEIGHTS = DEC0_IN * DEC0_OUT * 9 + DEC0_OUT # 292
+ENC0_WEIGHTS = ENC0_IN * ENC0_OUT * 9 + ENC0_OUT # 1448
+ENC1_WEIGHTS = ENC1_IN * ENC1_OUT * 9 + ENC1_OUT # 1168
+BN_WEIGHTS = BN_IN * BN_OUT * 9 + BN_OUT # 2320 (3x3 dilation=2)
+DEC1_WEIGHTS = DEC1_IN * DEC1_OUT * 9 + DEC1_OUT # 2312
+DEC0_WEIGHTS = DEC0_IN * DEC0_OUT * 9 + DEC0_OUT # 580
ENC0_OFFSET = 0
ENC1_OFFSET = ENC0_OFFSET + ENC0_WEIGHTS
@@ -33,7 +33,7 @@ BN_OFFSET = ENC1_OFFSET + ENC1_WEIGHTS
DEC1_OFFSET = BN_OFFSET + BN_WEIGHTS
DEC0_OFFSET = DEC1_OFFSET + DEC1_WEIGHTS
TOTAL_F16 = DEC0_OFFSET + DEC0_WEIGHTS
-# 724 + 296 + 584 + 580 + 292 = 2476 (BN is now 3x3 dilation=2, was 72)
+# 1448 + 1168 + 2320 + 2312 + 580 = 7828
# ---------------------------------------------------------------------------
# Helpers
@@ -50,11 +50,11 @@ def get_w(w_f32, base, idx):
def enc0_forward(feat0, feat1, w, gamma, beta):
"""
- Conv(20->4, 3x3, zero-pad) + FiLM + ReLU → rgba16float (f16 stored).
+ Conv(20->8, 3x3, zero-pad) + FiLM + ReLU → rgba32uint (pack2x16float, f16 stored).
feat0: (H, W, 8) f32 — channels from unpack2x16float(feat_tex0)
feat1: (H, W, 12) f32 — channels from unpack4x8unorm(feat_tex1)
- gamma, beta: (ENC0_OUT,) f32 — FiLM params
- Returns: (H, W, 4) f32 — f16 precision (rgba16float texture boundary)
+ gamma, beta: (ENC0_OUT=8,) f32 — FiLM params
+ Returns: (H, W, 8) f32 — f16 precision (pack2x16float boundary)
"""
H, W = feat0.shape[:2]
wo = ENC0_OFFSET
@@ -72,14 +72,15 @@ def enc0_forward(feat0, feat1, w, gamma, beta):
s += wv * fp[ky:ky+H, kx:kx+W, i]
out[:, :, o] = np.maximum(0.0, gamma[o] * s + beta[o])
- return np.float16(out).astype(np.float32) # rgba16float texture boundary
+ return np.float16(out).astype(np.float32) # pack2x16float boundary (rgba32uint)
-def enc1_forward(enc0, w, gamma_lo, gamma_hi, beta_lo, beta_hi):
+def enc1_forward(enc0, w, gamma, beta):
"""
- AvgPool2x2(enc0, clamp-border) + Conv(4->8, 3x3, zero-pad) + FiLM + ReLU
- → rgba32uint (pack2x16float, f16 precision, half-res).
- enc0: (H, W, 4) f32 — rgba16float precision
+ AvgPool2x2(enc0, clamp-border) + Conv(8->16, 3x3, zero-pad) + FiLM + ReLU
+ → 2x rgba32uint (pack2x16float, f16 precision, half-res).
+ enc0: (H, W, 8) f32 — pack2x16float precision
+ gamma, beta: (ENC1_OUT=16,) f32 — FiLM params
"""
H, W = enc0.shape[:2]
hH, hW = H // 2, W // 2
@@ -99,8 +100,6 @@ def enc1_forward(enc0, w, gamma_lo, gamma_hi, beta_lo, beta_hi):
# 3x3 conv with zero-padding at half-res borders
ap = np.pad(avg, ((1, 1), (1, 1), (0, 0)), mode='constant')
- gamma = np.concatenate([gamma_lo, gamma_hi])
- beta = np.concatenate([beta_lo, beta_hi])
out = np.zeros((hH, hW, ENC1_OUT), dtype=np.float32)
for o in range(ENC1_OUT):
@@ -159,10 +158,11 @@ def bottleneck_forward(enc1, w):
def dec1_forward(bn, enc1, w, gamma, beta):
"""
- NearestUp2x(bn) + cat(enc1_skip) → Conv(16->4, 3x3, zero-pad) + FiLM + ReLU
- → rgba16float (half-res).
- bn: (qH, qW, 8) f32 — quarter-res bottleneck
- enc1: (hH, hW, 8) f32 — half-res skip connection
+ NearestUp2x(bn) + cat(enc1_skip) → Conv(32->8, 3x3, zero-pad) + FiLM + ReLU
+ → rgba32uint (pack2x16float, half-res).
+ bn: (qH, qW, 16) f32 — quarter-res bottleneck
+ enc1: (hH, hW, 16) f32 — half-res skip connection
+ gamma, beta: (DEC1_OUT=8,) f32 — FiLM params
"""
hH, hW = enc1.shape[:2]
qH, qW = bn.shape[:2]
@@ -188,15 +188,15 @@ def dec1_forward(bn, enc1, w, gamma, beta):
s += wv * fp[ky:ky+hH, kx:kx+hW, i]
out[:, :, o] = np.maximum(0.0, gamma[o] * s + beta[o])
- return np.float16(out).astype(np.float32) # rgba16float boundary
+ return np.float16(out).astype(np.float32) # pack2x16float boundary (rgba32uint)
def dec0_forward(dec1, enc0, w, gamma, beta):
"""
- NearestUp2x(dec1) + cat(enc0_skip) → Conv(8->4, 3x3, zero-pad) + FiLM + ReLU + sigmoid
+ NearestUp2x(dec1) + cat(enc0_skip) → Conv(16->4, 3x3, zero-pad) + FiLM + ReLU + sigmoid
→ rgba16float (full-res, final output).
- dec1: (hH, hW, 4) f32 — half-res
- enc0: (H, W, 4) f32 — full-res enc0 skip
+ dec1: (hH, hW, 8) f32 — half-res
+ enc0: (H, W, 8) f32 — full-res enc0 skip
"""
H, W = enc0.shape[:2]
hH, hW = dec1.shape[:2]
@@ -231,8 +231,7 @@ def forward_pass(feat0, feat1, w_f32, film):
enc0 = enc0_forward(feat0, feat1, w_f32,
film['enc0_gamma'], film['enc0_beta'])
enc1 = enc1_forward(enc0, w_f32,
- film['enc1_gamma_lo'], film['enc1_gamma_hi'],
- film['enc1_beta_lo'], film['enc1_beta_hi'])
+ film['enc1_gamma'], film['enc1_beta'])
bn = bottleneck_forward(enc1, w_f32)
dc1 = dec1_forward(bn, enc1, w_f32, film['dec1_gamma'], film['dec1_beta'])
dc0 = dec0_forward(dc1, enc0, w_f32, film['dec0_gamma'], film['dec0_beta'])
@@ -241,16 +240,14 @@ def forward_pass(feat0, feat1, w_f32, film):
def identity_film():
return {
- 'enc0_gamma': np.ones(ENC0_OUT, dtype=np.float32),
- 'enc0_beta': np.zeros(ENC0_OUT, dtype=np.float32),
- 'enc1_gamma_lo': np.ones(4, dtype=np.float32),
- 'enc1_gamma_hi': np.ones(4, dtype=np.float32),
- 'enc1_beta_lo': np.zeros(4, dtype=np.float32),
- 'enc1_beta_hi': np.zeros(4, dtype=np.float32),
- 'dec1_gamma': np.ones(DEC1_OUT, dtype=np.float32),
- 'dec1_beta': np.zeros(DEC1_OUT, dtype=np.float32),
- 'dec0_gamma': np.ones(DEC0_OUT, dtype=np.float32),
- 'dec0_beta': np.zeros(DEC0_OUT, dtype=np.float32),
+ 'enc0_gamma': np.ones(ENC0_OUT, dtype=np.float32), # 8
+ 'enc0_beta': np.zeros(ENC0_OUT, dtype=np.float32), # 8
+ 'enc1_gamma': np.ones(ENC1_OUT, dtype=np.float32), # 16
+ 'enc1_beta': np.zeros(ENC1_OUT, dtype=np.float32), # 16
+ 'dec1_gamma': np.ones(DEC1_OUT, dtype=np.float32), # 8
+ 'dec1_beta': np.zeros(DEC1_OUT, dtype=np.float32), # 8
+ 'dec0_gamma': np.ones(DEC0_OUT, dtype=np.float32), # 4
+ 'dec0_beta': np.zeros(DEC0_OUT, dtype=np.float32), # 4
}
@@ -324,8 +321,7 @@ def generate_vectors(W=8, H=8, seed=42):
enc0 = enc0_forward(feat0, feat1, w_f32,
film['enc0_gamma'], film['enc0_beta'])
enc1 = enc1_forward(enc0, w_f32,
- film['enc1_gamma_lo'], film['enc1_gamma_hi'],
- film['enc1_beta_lo'], film['enc1_beta_hi'])
+ film['enc1_gamma'], film['enc1_beta'])
bn = bottleneck_forward(enc1, w_f32)
dc1 = dec1_forward(bn, enc1, w_f32, film['dec1_gamma'], film['dec1_beta'])
out = dec0_forward(dc1, enc0, w_f32, film['dec0_gamma'], film['dec0_beta'])
@@ -333,8 +329,9 @@ def generate_vectors(W=8, H=8, seed=42):
feat0_u32 = pack_feat0_rgba32uint(feat0, H, W)
feat1_u32 = pack_feat1_rgba32uint(feat1_u8, H, W)
w_u32 = pack_weights_u32(w_f16)
+ # enc0: 8ch stored as pack2x16float → H*W*8 f16 values
enc0_u16 = np.float16(enc0.reshape(-1)).view(np.uint16)
- # dec1 is half-res (hH x hW x 4); store as-is
+ # dec1: 8ch half-res stored as pack2x16float → (H/2)*(W/2)*8 f16 values
dc1_u16 = np.float16(dc1.reshape(-1)).view(np.uint16)
out_u16 = np.float16(out.reshape(-1)).view(np.uint16) # raw f16 bits
@@ -386,11 +383,15 @@ def emit_c_header(v):
lines.append("};")
lines.append("")
+ lines.append(f"// ENC0_OUT={ENC0_OUT} ENC1_OUT={ENC1_OUT} BN={BN_OUT} DEC1_OUT={DEC1_OUT} DEC0_OUT={DEC0_OUT}")
+ lines.append(f"// TOTAL_F16={TOTAL_F16} (enc_channels=[{ENC0_OUT},{ENC1_OUT}])")
+ lines.append("")
array_u32("kCnnV3TestFeat0U32", v['feat0_u32'])
array_u32("kCnnV3TestFeat1U32", v['feat1_u32'])
array_u32("kCnnV3TestWeightsU32", v['w_u32'])
+ lines.append(f"// enc0: {ENC0_OUT}ch rgba32uint → W*H*{ENC0_OUT} f16 values")
array_u16("kCnnV3ExpectedEnc0U16", v['enc0_u16'])
- lines.append(f"// kCnnV3Dec1HW = (W/2) x (H/2) = {v['W']//2} x {v['H']//2}")
+ lines.append(f"// dec1: {DEC1_OUT}ch rgba32uint half-res → (W/2)*(H/2)*{DEC1_OUT} f16 values")
array_u16("kCnnV3ExpectedDec1U16", v['dc1_u16'])
array_u16("kCnnV3ExpectedOutputU16", v['out_u16'])
return "\n".join(lines)
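Review note: several comments in this file refer to a "pack2x16float boundary". The sketch below shows one way to pack f16 pairs into u32 words using only the standard library. It assumes the WGSL `pack2x16float` convention (first value in the low 16 bits) and zero-padding for an odd count; the body of the project's own `pack_weights_u32` is not shown in this diff, so `pack_f16_pairs` and `f16_round_trip` are hypothetical names and may differ from it in detail.

```python
import struct

def pack_f16_pairs(values):
    """Pack floats as IEEE half pairs into u32 words (even index in the low 16 bits)."""
    vals = list(values)
    if len(vals) % 2:
        vals.append(0.0)  # assumption: pad an odd-length list with zero
    raw = struct.pack(f"<{len(vals)}e", *vals)          # little-endian f16
    return list(struct.unpack(f"<{len(vals) // 2}I", raw))

def f16_round_trip(x):
    """Quantize a float to f16 precision, as happens at a texture boundary."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

words = pack_f16_pairs([1.0, 2.0, -0.5])
print([hex(w) for w in words])
print(f16_round_trip(0.1))
```

The round-trip helper mirrors what `np.float16(out).astype(np.float32)` does in the reference forward passes: values are stored at half precision between layers, so the reference has to be quantized at the same points to stay comparable with the GPU path.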
diff --git a/cnn_v3/training/infer_cnn_v3.py b/cnn_v3/training/infer_cnn_v3.py
index ca1c72a..b0fe9e6 100644
--- a/cnn_v3/training/infer_cnn_v3.py
+++ b/cnn_v3/training/infer_cnn_v3.py
@@ -129,8 +129,8 @@ def main():
p.add_argument('output', help='Output PNG')
p.add_argument('--checkpoint', '-c', metavar='CKPT',
help='Path to .pth checkpoint (auto-finds latest if omitted)')
- p.add_argument('--enc-channels', default='4,8',
- help='Encoder channels (default: 4,8 — must match checkpoint)')
+ p.add_argument('--enc-channels', default='8,16',
+ help='Encoder channels (default: 8,16 — must match checkpoint)')
p.add_argument('--cond', nargs=5, type=float, metavar='F', default=[0.0]*5,
help='FiLM conditioning: 5 floats (beat_phase beat_norm audio style0 style1)')
p.add_argument('--identity-film', action='store_true',
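Review note: `--enc-channels` has to match the checkpoint because the FiLM MLP output width is derived from it as well as the conv layers. A minimal sketch of that arithmetic follows, assuming the layout described in the docstrings of this commit (γ and β for enc0, enc1, dec1 and dec0, with dec1 producing `c0` channels and dec0 producing 4). The function name is hypothetical.

```python
def film_mlp_f32_count(enc_channels, cond_dim=5, hidden=16, out_channels=4):
    """Return (FiLM output width, total f32 parameters) for Linear -> ReLU -> Linear."""
    c0, c1 = enc_channels
    # One gamma and one beta per channel of enc0 (c0), enc1 (c1), dec1 (c0), dec0 (4).
    n_film = 2 * (c0 + c1 + c0 + out_channels)
    total = cond_dim * hidden + hidden + hidden * n_film + n_film
    return n_film, total

for channels in ([4, 8], [8, 16]):
    n_film, total = film_mlp_f32_count(channels)
    print(channels, n_film, total, total * 4)  # width, f32 count, bytes
```

For `[8, 16]` this gives 72 outputs and 1320 f32 values (5280 bytes), matching the updated exporter docstring; for `[4, 8]` it gives the earlier 40 and 776.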
diff --git a/cnn_v3/training/train_cnn_v3.py b/cnn_v3/training/train_cnn_v3.py
index c61c360..5b6a0be 100644
--- a/cnn_v3/training/train_cnn_v3.py
+++ b/cnn_v3/training/train_cnn_v3.py
@@ -5,18 +5,18 @@
# ///
"""CNN v3 Training Script — U-Net + FiLM
-Architecture:
- enc0 Conv(20→4, 3×3) + FiLM + ReLU H×W
- enc1 Conv(4→8, 3×3) + FiLM + ReLU + pool2 H/2×W/2
- bottleneck Conv(8→8, 3×3, dilation=2) + ReLU H/4×W/4
- dec1 upsample×2 + cat(enc1) Conv(16→4) + FiLM H/2×W/2
- dec0 upsample×2 + cat(enc0) Conv(8→4) + FiLM H×W
+Architecture (enc_channels=[8,16]):
+ enc0 Conv(20→8, 3×3) + FiLM + ReLU H×W rgba32uint (8ch)
+ enc1 Conv(8→16, 3×3) + FiLM + ReLU + pool2 H/2×W/2 2× rgba32uint (16ch split)
+ bottleneck Conv(16→16, 3×3, dilation=2) + ReLU H/4×W/4 2× rgba32uint (16ch split)
+ dec1 upsample×2 + cat(enc1) Conv(32→8) + FiLM H/2×W/2 rgba32uint (8ch)
+ dec0 upsample×2 + cat(enc0) Conv(16→4) + FiLM H×W rgba16float (4ch)
output sigmoid → RGBA
-FiLM MLP: Linear(5→16) → ReLU → Linear(16→40)
- 40 = 2 × (γ+β) for enc0(4) enc1(8) dec1(4) dec0(4)
+FiLM MLP: Linear(5→16) → ReLU → Linear(16→72)
+ 72 = 2 × (γ+β) for enc0(8) enc1(16) dec1(8) dec0(4)
-Weight budget: ~4.84 KB conv f16 (fits ≤6 KB target)
+Weight budget: ~15.3 KB conv f16 (7828 f16); total with MLP ~17.9 KB
Training improvements:
--edge-loss-weight Sobel edge loss alongside MSE (default 0.1)
@@ -47,14 +47,14 @@ def film_apply(x: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor) -> torc
class CNNv3(nn.Module):
"""U-Net + FiLM conditioning.
- enc_channels: [c0, c1] channel counts per encoder level, default [4, 8]
+ enc_channels: [c0, c1] channel counts per encoder level, default [8, 16]
film_cond_dim: FiLM conditioning input size, default 5
"""
def __init__(self, enc_channels=None, film_cond_dim: int = 5):
super().__init__()
if enc_channels is None:
- enc_channels = [4, 8]
+ enc_channels = [8, 16]
assert len(enc_channels) == 2, "Only 2-level U-Net supported"
c0, c1 = enc_channels
@@ -227,6 +227,10 @@ def train(args):
optimizer.zero_grad()
pred = model(feat, cond)
loss = criterion(pred, target)
+ if args.multiscale_weight > 0.0:
+ for scale in [2, 4]:
+ loss = loss + args.multiscale_weight * criterion(
+ F.avg_pool2d(pred, scale), F.avg_pool2d(target, scale))
if args.edge_loss_weight > 0.0:
loss = loss + args.edge_loss_weight * sobel_loss(pred, target)
loss.backward()
@@ -321,6 +325,8 @@ def main():
help='Resume from checkpoint path; if path missing, use latest in --checkpoint-dir')
p.add_argument('--edge-loss-weight', type=float, default=0.1,
help='Weight for Sobel edge loss alongside MSE (default 0.1; 0=disable)')
+ p.add_argument('--multiscale-weight', type=float, default=0.5,
+ help='Weight per pyramid level for multi-scale MSE (default 0.5; 0=disable)')
p.add_argument('--film-warmup-epochs', type=int, default=50,
help='Epochs to train U-Net only before unfreezing FiLM MLP (default 50; 0=joint)')
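Review note: the new `--multiscale-weight` term adds MSE computed on 2× and 4× average-pooled copies of the prediction and target. The dependency-free sketch below mimics that structure on nested lists so the effect can be checked by hand. It is an illustration only, not the training code, which uses `F.avg_pool2d` on tensors; all function names here are hypothetical.

```python
def avg_pool(img, k):
    """Non-overlapping k x k average pooling on a 2D list (remainder rows/cols dropped)."""
    h, w = len(img) // k, len(img[0]) // k
    return [[sum(img[y * k + dy][x * k + dx]
                 for dy in range(k) for dx in range(k)) / (k * k)
             for x in range(w)] for y in range(h)]

def mse(a, b):
    """Mean squared error between two equally sized 2D lists."""
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n

def multiscale_mse(pred, target, weight=0.5, scales=(2, 4)):
    """Full-resolution MSE plus a weighted MSE at each pooled scale."""
    loss = mse(pred, target)
    if weight > 0.0:
        for s in scales:
            loss += weight * mse(avg_pool(pred, s), avg_pool(target, s))
    return loss

zeros = [[0.0] * 4 for _ in range(4)]
ones = [[1.0] * 4 for _ in range(4)]
checker = [[1.0 if (x + y) % 2 == 0 else -1.0 for x in range(4)] for y in range(4)]

print(multiscale_mse(ones, zeros))     # constant offset: penalized at every scale
print(multiscale_mse(checker, zeros))  # pixel-level checkerboard: cancels when pooled
```

A constant offset is counted at every scale, while a pixel-level checkerboard error averages out after pooling, so the extra terms mainly add weight to low-frequency errors.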