diff options
| -rw-r--r-- | TODO.md | 18 | ||||
| -rw-r--r-- | cnn_v3/README.md | 3 | ||||
| -rw-r--r-- | cnn_v3/docs/CNN_V3.md | 1111 |
3 files changed, 1130 insertions, 2 deletions
@@ -60,7 +60,23 @@ Ongoing shader code hygiene for granular, reusable snippets. --- -## Future: CNN v3 8-bit Quantization +## Future: CNN v3 — U-Net + FiLM + +U-Net architecture with FiLM conditioning. Runtime style control via beat/audio. +Richer G-buffer input (normals, depth, material IDs). Per-pixel testability across +PyTorch / HTML WebGPU / C++ WebGPU. + +**Prerequisites:** G-buffer implementation (GEOM_BUFFER.md) +**Design:** `cnn_v3/docs/CNN_V3.md` + +**Phases:** +1. G-buffer prerequisite +2. Training infrastructure (Blender exporter + photo pipeline) +3. WGSL shaders (enc/dec/bottleneck, deterministic ops) +4. C++ effect class + FiLM uniform upload +5. Parity validation (test vectors, ≤1/255 per pixel) + +## Future: CNN v2 8-bit Quantization Reduce weights from f16 (~3.2 KB) to i8 (~1.6 KB). diff --git a/cnn_v3/README.md b/cnn_v3/README.md index fdbf648..a22d823 100644 --- a/cnn_v3/README.md +++ b/cnn_v3/README.md @@ -31,6 +31,7 @@ Add images directly to these directories and commit them. ## Status -**TODO:** Define CNN v3 architecture and feature set. +**Design phase.** Architecture defined, G-buffer prerequisite pending. +See `cnn_v3/docs/CNN_V3.md` for full design. See `cnn_v2/` for reference implementation. 
diff --git a/cnn_v3/docs/CNN_V3.md b/cnn_v3/docs/CNN_V3.md
new file mode 100644
index 0000000..9d64fe3
--- /dev/null
+++ b/cnn_v3/docs/CNN_V3.md
@@ -0,0 +1,1111 @@
# CNN v3: U-Net + FiLM

**Technical Design Document**

---

## Overview

CNN v3 is a next-generation post-processing effect using:
- **U-Net architecture** — encoder/decoder with skip connections for multi-scale stylization
- **FiLM conditioning** — Feature-wise Linear Modulation, enabling runtime style control via beat, audio, or manual parameters
- **G-Buffer input** — richer geometric inputs (normals, depth, material) instead of plain RGBD
- **Per-pixel testability** — exact match between PyTorch, HTML WebGPU, and C++ WebGPU

**Key improvements over v2:**
- Multi-scale processing (encoder captures global context, decoder restores detail)
- Runtime stylization without retraining (FiLM γ/β from beat/audio/time)
- Richer scene understanding from G-buffer (normals, material IDs)
- Training from both Blender renders and real photos
- Strict test framework: per-pixel bit-exact validation across all implementations

**Status:** Design phase. G-buffer implementation is a prerequisite.

**Prerequisites:** G-buffer (GEOM_BUFFER.md) must be implemented first.

---

## Architecture

### Pipeline Overview

```
G-Buffer (albedo, normal, depth, matID, UV)
        │
        ▼
  FiLM Conditioning
  (beat_time, audio_intensity, style_params)
        │ → γ[], β[] per channel
        ▼
      U-Net
┌─────────────────────────────────────────┐
│ Encoder                                 │
│  enc0 (H×W, 8ch) ────────────skip──────┤
│   ↓ down (avg pool 2×2)                 │
│  enc1 (H/2×W/2, 16ch) ───────skip──────┤
│   ↓ down                                │
│  bottleneck (H/4×W/4, 16ch)             │
│                                         │
│ Decoder                                 │
│   ↑ up (nearest ×2) + skip enc1         │
│  dec1 (H/2×W/2, 16ch)                   │
│   ↑ up + skip enc0                      │
│  dec0 (H×W, 8ch)                        │
└─────────────────────────────────────────┘
        │
        ▼
  output RGBA (H×W)
```

FiLM is applied **inside each encoder/decoder block**, after each convolution.
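The per-level Conv → FiLM → ReLU pattern can be pinned down with a small framework-neutral reference; a minimal numpy sketch (the helper names `conv3x3_same` / `film_block` are illustrative, not the actual training code):

```python
import numpy as np

def conv3x3_same(x, w):
    """Reference 3x3 'same' convolution with explicit zero padding.
    x: [C_in, H, W], w: [C_out, C_in, 3, 3] (OIHW layout). Slow but unambiguous."""
    c_out, h, wd = w.shape[0], x.shape[1], x.shape[2]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))         # zero-pad by kernel_size // 2
    y = np.zeros((c_out, h, wd), dtype=np.float32)
    for o in range(c_out):
        for i in range(x.shape[0]):
            for ky in range(3):
                for kx in range(3):
                    y[o] += w[o, i, ky, kx] * xp[i, ky:ky + h, kx:kx + wd]
    return y

def film_block(x, w, gamma, beta):
    """One U-Net level: Conv 3x3 -> FiLM(gamma, beta) -> ReLU. gamma/beta: [C_out]."""
    y = conv3x3_same(x, w)
    y = gamma[:, None, None] * y + beta[:, None, None]   # per-channel affine
    return np.maximum(y, 0.0)                            # ReLU = max(0, x)
```

With an identity kernel and γ = 1, β = 0 this reduces to a plain ReLU, which makes it a convenient first parity vector.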
+ +### U-Net Block (per level) + +``` +input → Conv 3×3 → BN (or none) → FiLM(γ,β) → ReLU → output +``` + +FiLM at level `l`: +``` +FiLM(x, γ_l, β_l) = γ_l ⊙ x + β_l (per-channel affine) +``` + +γ and β are computed from the conditioning MLP, one γ/β pair per channel per level. + +### FiLM Conditioning + +A small MLP takes a conditioning vector `c` and outputs all γ/β: + +``` +c = [beat_phase, beat_time/8, audio_intensity, style_p0, style_p1] (5D) + ↓ Linear(5 → 16) → ReLU + ↓ Linear(16 → N_film_params) + → [γ_enc0(8ch), β_enc0(8ch), γ_enc1(16ch), β_enc1(16ch), + γ_dec1(16ch), β_dec1(16ch), γ_dec0(8ch), β_dec0(8ch)] + = 2 × (8+16+16+8) = 96 parameters output +``` + +**Runtime cost:** trivial (one MLP forward pass per frame, CPU-side). +**Training:** jointly trained with U-Net — backprop through FiLM to MLP. +**Size:** MLP weights ~(5×16 + 16×96) × 2 bytes f16 ≈ 3 KB. + +**Why FiLM instead of just uniform parameters?** +- γ/β are per-channel, enabling fine-grained style control +- Network learns to use beat/audio meaningfully during training +- Same weights, different moods: dark/moody vs bright/energetic + +--- + +## G-Buffer Passes + +The G-buffer is populated by two passes writing to the same textures, merged by depth. +Textures need dual usage: `RENDER_ATTACHMENT | STORAGE_BINDING` → use `rgba16float`. 
```
Pass 1: Rasterize triangles → MRT (fragment shader)
  color[0]: albedo      rgba16float   material color (pre-lighting)
  color[1]: normal_mat  rgba16float   oct-normal XY in rg + mat_id in b
  depth:    depth32float              hardware z-test + z-write

Pass 2: SDF raymarching → compute shader
  reads:  depth32float texture (compare SDF hit depth vs rasterized)
  writes: albedo, normal_mat storage textures where SDF depth < rasterized
  writes: transparency r16float (volumetric density, not from rasterizer)
  writes: shadow r8unorm (SDF soft shadow, or shared with light pass)

Pass 3: Lighting / shadow pass → compute shader
  reads:  depth, normal_mat
  writes: shadow r8unorm (shadow map lookup or SDF shadow ray)

Pass 4: Pack → 32-byte CNN feature buffer (see below)
  reads:    all G-buffer textures + prev CNN output texture
  computes: depth_grad (finite diff), samples albedo MIP 1 and MIP 2
  writes:   feat_tex0 (rgba32uint) + feat_tex1 (rgba32uint)
```

**Depth unification:** the SDF pass reads the rasterized depth32float, converts its hit
distance to the same NDC depth value, and only overwrites when closer. Both sources end up
in the same depth texture which the pack pass reads for `depth` and `depth_grad`.

---

## Input Feature Buffer

**20 channels, 32 bytes/pixel**, packed into two `rgba32uint` textures (8 u32 total).
Mixed precision: geometric data as f16, color context and categorical data as u8.

**UV is NOT stored** — computed from `coord / resolution` in every shader (free).

---

### Texture 0 — 4 u32, 8 × f16 (geometric, high precision)

| u32 | f16 lo | f16 hi | Notes |
|-----|--------|--------|-------|
| [0] | albedo.r | albedo.g | pre-lighting material color |
| [1] | albedo.b | normal.x | oct-encoded normal X |
| [2] | normal.y | depth | 1/z normalized |
| [3] | depth_grad.x | depth_grad.y | finite diff of depth, signed |

Normal reconstructed via the octahedral decode: `n = normalize(x, y, 1 − |x| − |y|)`,
with the standard fold of x/y when the decoded z is negative.
Depth gradient captures surface discontinuities and orientation cues for the CNN.

---

### Texture 1 — 4 u32, 12 × u8 + 1 spare u32 (context, low precision)

| u32 | byte 0 | byte 1 | byte 2 | byte 3 |
|-----|--------|--------|--------|--------|
| [0] | mat_id | prev.r | prev.g | prev.b |
| [1] | mip1.r | mip1.g | mip1.b | mip2.r |
| [2] | mip2.g | mip2.b | shadow | transp. |
| [3] | — spare — | | | |

All packed via `pack4x8unorm`. Channels:
- **mat_id**: object/material index (u8/255), carries style category
- **prev.rgb**: previous CNN output (temporal feedback, recurrent)
- **mip1.rgb**: albedo at MIP 1 (½ resolution) — medium-frequency color context
- **mip2.rgb**: albedo at MIP 2 (¼ resolution) — low-frequency color context
- **shadow**: shadow intensity [0=fully shadowed, 1=fully lit] from shadow pass
- **transp.**: volumetric transparency [0=opaque, 1=transparent] for fog/smoke/volumetric light

**Texture 1 is fully packed. u32[3] is reserved for future use.**

---

### Pack compute shader

```wgsl
@compute @workgroup_size(8, 8)
fn pack_features(@builtin(global_invocation_id) id: vec3u) {
    let coord = vec2i(id.xy);
    let uv = (vec2f(coord) + 0.5) / resolution;

    let albedo = textureLoad(gbuf_albedo, coord, 0).rgb;
    let nm = textureLoad(gbuf_normal_mat, coord, 0);
    let depth = sample_depth(coord); // from depth32float
    let dzdx = (sample_depth(coord + vec2i(1,0)) - sample_depth(coord - vec2i(1,0))) * 0.5;
    let dzdy = (sample_depth(coord + vec2i(0,1)) - sample_depth(coord - vec2i(0,1))) * 0.5;
    let shadow = textureLoad(gbuf_shadow, coord, 0).r;
    let transp = textureLoad(gbuf_transp, coord, 0).r;
    let mat_id = unpack_mat_id(nm);          // u8 from normal_mat texture
    let normal = unpack_oct_normal(nm.rg);   // vec2f

    let mip1 = textureSampleLevel(gbuf_albedo, smplr, uv, 1.0).rgb;
    let mip2 = textureSampleLevel(gbuf_albedo, smplr, uv, 2.0).rgb;
    // textureSample needs implicit derivatives (fragment-only);
    // compute shaders must use textureSampleLevel with an explicit LOD:
    let prev = textureSampleLevel(prev_cnn_tex, smplr, uv, 0.0).rgb;

    textureStore(feat_tex0, coord, vec4u(
        pack2x16float(albedo.rg),
        pack2x16float(vec2(albedo.b, normal.x)),
        pack2x16float(vec2(normal.y, depth)),
        pack2x16float(vec2(dzdx, dzdy)),
    ));
    textureStore(feat_tex1, coord, vec4u(
        pack4x8unorm(vec4(mat_id, prev.r, prev.g, prev.b)),
        pack4x8unorm(vec4(mip1.r, mip1.g, mip1.b, mip2.r)),
        pack4x8unorm(vec4(mip2.g, mip2.b, shadow, transp)),
        0u,
    ));
}
```

---

### Full channel table (20 channels, 32 bytes/pixel)

| # | Name | Prec | Source |
|---|------|------|--------|
| 0 | albedo.r | f16 | Raster/SDF material color |
| 1 | albedo.g | f16 | |
| 2 | albedo.b | f16 | |
| 3 | normal.x | f16 | Oct-encoded, raster/SDF |
| 4 | normal.y | f16 | |
| 5 | depth | f16 | Unified depth (1/z) |
| 6 | depth_grad.x | f16 | Finite diff of depth |
| 7 | depth_grad.y | f16 | |
| 8 | mat_id | u8 | Object index / 255 |
| 9 | prev.r | u8 | Previous CNN output (temporal) |
| 10 | prev.g | u8 | |
| 11 | prev.b | u8 | |
| 12 | mip1.r | u8 | Albedo MIP 1 (½ res) |
| 13 | mip1.g | u8 | |
| 14 | mip1.b | u8 | |
| 15 | mip2.r | u8 | Albedo MIP 2 (¼ res) |
| 16 | mip2.g | u8 | |
| 17 | mip2.b | u8 | |
| 18 | shadow | u8 | Shadow intensity [0=dark, 1=lit] |
| 19 | transp. | u8 | Volumetric transparency [0=opaque, 1=clear] |

UV computed in-shader. Bias = 1.0 implicit (standard NN, not stored).

**Memory:** 1920×1080 × 32 bytes = **66 MB** feature buffer.
Plus prev_cnn texture (RGBA8): **8 MB**.

---

### 16-byte fallback (budget-constrained)

Drop temporal, MIPs, shadow, transparency. Geometric data only:

| u32 | channels |
|-----|----------|
| [0] | albedo.rg (f16) |
| [1] | albedo.b, normal.x (f16) |
| [2] | normal.y, depth (f16) |
| [3] | depth_grad.x, depth_grad.y (f16) |

8 channels, 16 bytes/pixel = **33 MB**. No temporal coherence, no lighting context.
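Offline tooling can mirror the two-texture packing bit-for-bit; a numpy sketch of the WGSL intrinsics and a Texture 0 texel (hypothetical helpers for test-vector generation, not runtime code):

```python
import numpy as np

def f16_bits(v):
    """Bit pattern of v rounded to IEEE f16, as an int."""
    return int(np.array(v, dtype=np.float16).view(np.uint16))

def pack2x16float(lo, hi):
    """Two floats -> one u32 holding an f16 pair, low half first (mirrors WGSL pack2x16float)."""
    return f16_bits(lo) | (f16_bits(hi) << 16)

def pack4x8unorm(v):
    """Four [0,1] floats -> one u32, component 0 in the low byte (mirrors WGSL pack4x8unorm)."""
    out = 0
    for i, x in enumerate(v):
        out |= int(min(max(x, 0.0), 1.0) * 255.0 + 0.5) << (8 * i)
    return out

def pack_texel0(albedo, normal_xy, depth, depth_grad):
    """One feat_tex0 texel: 8 x f16 in 4 x u32, matching the Texture 0 table."""
    return [pack2x16float(albedo[0], albedo[1]),
            pack2x16float(albedo[2], normal_xy[0]),
            pack2x16float(normal_xy[1], depth),
            pack2x16float(depth_grad[0], depth_grad[1])]
```

Matching the GPU packing on the CPU lets parity vectors be generated without a WebGPU context in the loop.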
Protocol: +- **Static parity test**: set `prev_cnn = black` → fully deterministic, run on single frame +- **Temporal parity test**: 2-frame sequence; frame 1's prev = frame 0's CNN output +- Test vector NPZ includes prev as explicit input: `test_<n>_{gbuf, prev, cond, expected}.npz` + +--- + +## Testability Framework + +**Goal:** Per-pixel bit-exact match (within f16 rounding tolerance) across: +1. PyTorch reference (f32) +2. HTML WebGPU validation tool +3. C++ WebGPU runtime + +### Protocol + +**Step 1: Reference generation (PyTorch f32)** +- Export test vectors: 4 canonical G-buffer images + conditioning vectors → expected output +- Store as PNG + NPZ: `cnn_v3/tests/vectors/test_<n>_{input,cond,expected}.{png,npz}` + +**Step 2: f16 export** +- Convert all weights to f16 (same as v2) +- Stored in binary format (see below) + +**Step 3: Deterministic operations (no ambiguity between impls)** +- Padding: `same` padding = zero-pad by `kernel_size//2`, explicit in all impls +- Downsampling: **average pooling 2×2**, stride 2 (not max pool — identical in all) +- Upsampling: **nearest neighbor** ×2 (no interpolation differences) +- Activation: ReLU = `max(0, x)` (exact), sigmoid = `1/(1+exp(-x))` (numerically identical) +- FiLM: `gamma * x + beta` applied per-channel (not per-pixel — channel broadcast) +- No batch norm at inference (fold BN into conv weights during export) + +**Step 4: Validation** +```bash +python3 cnn_v3/training/validate_parity.py \ + --weights cnn_v3/weights/model.bin \ + --test-vectors cnn_v3/tests/vectors/ \ + --tolerance 1 # max 1/255 per channel +``` + +Tolerance: f16 rounding introduces at most ~0.001 error. Display is 8-bit (1/255 ≈ 0.004). +**Acceptance criterion:** max per-pixel per-channel absolute error ≤ 1/255. 
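The acceptance criterion reduces to a few lines; a sketch of the comparison core (`check_parity` is an assumed helper name, the real `validate_parity.py` interface may differ):

```python
import numpy as np

def check_parity(expected, actual, tolerance=1.0 / 255.0):
    """Per-pixel, per-channel comparison of two HxWx4 float images.
    Passes iff the max absolute error is within the display quantum (1/255)."""
    err = np.abs(expected.astype(np.float64) - actual.astype(np.float64))
    return {"max": float(err.max()),
            "mean": float(err.mean()),
            "pass": bool(err.max() <= tolerance)}
```

Simulating f16 storage with `x.astype(np.float16)` stays well inside the threshold; that margin is exactly the headroom the acceptance criterion relies on.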
+ +### Parity Checklist + +For each layer, verify: +- [ ] Input shape matches +- [ ] Weight layout matches (OIHW = out_ch × in_ch × kH × kW) +- [ ] Padding: explicit zero-pad, not "reflect" or "replicate" +- [ ] Convolution output shape matches +- [ ] FiLM γ/β applied in correct order (after conv, before activation) +- [ ] Skip connection: concatenation along channel axis (not add) +- [ ] Upsample: nearest neighbor (not bilinear) + +--- + +## Binary Format + +Extends CNN v2 binary format with: + +**Header (v3, 28 bytes):** + +| Offset | Type | Field | Description | +|--------|------|-------|-------------| +| 0x00 | u32 | magic | `0x33_4E_4E_43` ("CNN3") | +| 0x04 | u32 | version | 3 | +| 0x08 | u32 | num_enc_levels | U-Net encoder levels (typically 2) | +| 0x0C | u32 | num_channels | Channels per level (e.g., [8,16]) | +| 0x10 | u32 | in_channels | Feature buffer input channels (20) | +| 0x14 | u32 | film_cond_dim | FiLM conditioning input size | +| 0x18 | u32 | total_weights | Total f16 weight count | + +**Sections** (sequential after header): +1. Encoder conv weights (per level) +2. Decoder conv weights (per level) +3. FiLM MLP weights (γ/β generator) + +All f16, little-endian, same packing as v2 (`pack2x16float`). + +--- + +## Size Budget + +**CNN v3 target: ≤ 6 KB weights** + +| Component | Params | f16 bytes | +|-----------|--------|-----------| +| enc0: Conv(20→8, 3×3) | 20×8×9=1440 | 2880 | +| enc1: Conv(8→16, 3×3) | 8×16×9=1152 | 2304 | +| bottleneck: Conv(16→16, 3×3) | 16×16×9=2304 | 4608 | +| dec1: Conv(32→8, 3×3) | 32×8×9=2304 | 4608 | +| dec0: Conv(16→8, 3×3) | 16×8×9=1152 | 2304 | +| output: Conv(8→4, 1×1) | 8×4=32 | 64 | +| FiLM MLP (~96 outputs) | ~1600 | 3200 | +| **Total** | | **~20 KB** | + +This exceeds target. **Mitigation strategies:** + +1. **Reduce channels:** [4, 8] instead of [8, 16] → cuts conv params by ~4× +2. **1 level only:** remove H/4 level → drops bottleneck + one dec level +3. 
**1×1 conv at bottleneck** (no spatial, just channel mixing)
4. **FiLM only at bottleneck** → smaller MLP output

**Conservative plan (fits ≤ 6 KB):**
```
enc0:       Conv(20→4, 3×3) = 20×4×9 = 720 weights
enc1:       Conv(4→8, 3×3)  = 4×8×9  = 288 weights
bottleneck: Conv(8→8, 1×1)  = 8×8×1  = 64 weights
dec1:       Conv(16→4, 3×3) = 16×4×9 = 576 weights
dec0:       Conv(8→4, 3×3)  = 8×4×9  = 288 weights
output:     Conv(4→4, 1×1)  = 4×4    = 16 weights
FiLM MLP (5→40 outputs)     = 5×16 + 16×40 = 720 weights
Total: ~2672 weights × 2B = ~5.2 KB f16 ✓
```

Note: enc0 input is 20ch (feature buffer), dec1 input is 16ch (8 bottleneck + 8 enc1 skip),
dec0 input is 8ch (4 dec1 output + 4 enc0 skip). Skip connections concatenate.
With FiLM at every level, the MLP emits 2 × (4+8+4+4) = 40 γ/β values.

---

## Training Data

Two sample types feed the same model. The key to compatibility is **channel dropout**
during training: geometric channels are randomly zeroed with probability p=0.3, forcing
the network to learn useful behaviour even when channels are absent. Photo samples are
then a natural zero-filled subset at inference.

---

### Pipeline A: Full G-buffer samples (Blender)

Blender Cycles exports all 20 channels as render passes in a single multi-layer EXR.

**Render passes required:**

| Pass | Blender name | Maps to |
|------|-------------|---------|
| Beauty (target) | `Combined` | Training target RGBA |
| Diffuse color | `DiffCol` | albedo.rgb |
| World normal | `Normal` | normal.xy (octahedral encode in post) |
| Depth | `Z` | depth (normalize by far plane) |
| Object index | `IndexOB` | mat_id |
| Shadow | `Shadow` | shadow (invert: 1−shadow_catcher) |
| Alpha / transmission | `Alpha` | transp. (0=opaque, 1=clear) |

depth_grad, mip1, mip2 are computed from albedo/depth during pack, not render passes.
prev = **zero** during training (no temporal history for static frames).
**Blender script: `cnn_v3/training/blender_export.py`**
```python
import bpy

# Enable passes
vl = bpy.context.scene.view_layers["ViewLayer"]
vl.use_pass_diffuse_color = True
vl.use_pass_normal = True
vl.use_pass_z = True
vl.use_pass_object_index = True
vl.use_pass_shadow = True

# Output: multi-layer EXR via compositor File Output node
# One EXR per frame, all passes in separate layers

# Run headless:
# blender -b scene.blend -P blender_export.py -- --output renders/frame_###
```

**Post-processing: `cnn_v3/training/pack_blender_sample.py`**
```bash
python3 pack_blender_sample.py \
    --exr renders/frame_001.exr \
    --output dataset/full/sample_001/
# Writes: albedo.png normal.png depth.png matid.png shadow.png transp.png target.png
```

depth_grad is computed on the fly in the dataloader (same central-difference kernel as the
runtime pack shader). mip1/mip2 are computed from albedo via pyrDown (same as runtime).

---

### Pipeline B: Simple photo samples (albedo + alpha only)

Input: a photo (RGB) + optional alpha mask. No geometry data.
Missing channels are **zero-filled** — the network degrades gracefully due to dropout training.

| Feature buffer channel | Value |
|-----------------------|-------|
| albedo.rgb | Photo RGB |
| normal.xy | **0, 0** (zero → network ignores) |
| depth | **0** |
| depth_grad.xy | **0, 0** |
| mat_id | **0** |
| prev.rgb | **0, 0, 0** (no history) |
| mip1.rgb | Computed from photo (pyrDown ×1) |
| mip2.rgb | Computed from photo (pyrDown ×2) |
| shadow | **1.0** (assume fully lit) |
| transp. | **1 − alpha** (from photo alpha channel, or 0 if no alpha) |

mip1/mip2 are still meaningful (they come from albedo, which we have).
`transp` from photo alpha lets the network see foreground/background separation when
available (e.g. cutout photos, PNG with alpha).
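The zero-fill table above boils down to a few lines of feature assembly; a sketch (`pack_photo_features` and `box_down_up` are illustrative names — `box_down_up` stands in for the runtime MIP chain with a box downsample plus nearest re-upsample, and assumes dimensions divisible by 4):

```python
import numpy as np

def box_down_up(img, factor):
    """Cheap stand-in for sampling an albedo MIP: box-average by `factor`,
    then nearest-neighbor upsample back to full resolution."""
    h, w, c = img.shape
    small = img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

def pack_photo_features(photo_rgb, alpha=None):
    """Photo -> HxWx20 feature tensor with geometric channels zero-filled."""
    h, w, _ = photo_rgb.shape
    feat = np.zeros((h, w, 20), dtype=np.float32)
    feat[..., 0:3] = photo_rgb                     # albedo.rgb
    # channels 3-7 (normal, depth, depth_grad) stay zero: no geometry
    # channel 8 (mat_id) and 9-11 (prev.rgb) stay zero: no history
    feat[..., 12:15] = box_down_up(photo_rgb, 2)   # mip1: 1/2-res color context
    feat[..., 15:18] = box_down_up(photo_rgb, 4)   # mip2: 1/4-res color context
    feat[..., 18] = 1.0                            # shadow: assume fully lit
    feat[..., 19] = 1.0 - alpha if alpha is not None else 0.0  # transp
    return feat
```

Channel indices follow the full channel table; the dropout-trained network treats the zeroed geometry exactly like a dropped-out sample.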
**Simple pack script: `cnn_v3/training/pack_photo_sample.py`**
```bash
# Input photo may be RGB or RGBA
python3 pack_photo_sample.py \
    --photo photos/img_001.png \
    --output dataset/simple/sample_001/
# Writes: albedo.png [zeros for normal/depth/matid/shadow] target.png (= albedo, no GT style)
```

For photo samples there is **no ground-truth styled target** — they are used for:
1. Fine-tuning after Blender pre-training (self-supervised or with manual target)
2. Inference-only testing (visual validation, no loss computed)
3. Parity testing (compare PyTorch vs WebGPU output on a photo input)

---

### Channel dropout (training robustness)

Applied per-sample in the dataloader's `__getitem__`:

```python
import random

GEOMETRIC_CHANNELS = [3, 4, 5, 6, 7]   # normal.xy, depth, depth_grad.xy
CONTEXT_CHANNELS   = [8, 18, 19]       # mat_id, shadow, transp
TEMPORAL_CHANNELS  = [9, 10, 11]       # prev.rgb

def apply_channel_dropout(feat, p_geom=0.3, p_context=0.2, p_temporal=0.5):
    # feat: [20, H, W] tensor; zeroing a channel group simulates its absence
    if random.random() < p_geom:
        feat[GEOMETRIC_CHANNELS] = 0.0  # simulate photo-only input
    if random.random() < p_context:
        feat[CONTEXT_CHANNELS] = 0.0
    if random.random() < p_temporal:
        feat[TEMPORAL_CHANNELS] = 0.0   # simulate first frame
    return feat
```

This ensures the network produces reasonable output regardless of which channels
are available, and that the full and simple pipelines can share one set of weights.
+ +--- + +### Dataset layout + +``` +cnn_v3/training/ + dataset/ + full/ # Blender samples (all 20 channels) + sample_000/ + albedo.png # RGB + normal.png # RG oct-encoded (or zero) + depth.png # R float16 EXR or 16-bit PNG + matid.png # R u8 + shadow.png # R u8 + transp.png # R u8 + target.png # RGBA styled target + simple/ # Photo samples (albedo+alpha only) + sample_000/ + albedo.png # RGB (or RGBA if alpha available) + target.png # = albedo (no GT, inference/parity only) + test_vectors/ + full_000_{feat,prev,cond,expected}.npz # parity: full G-buffer + simple_000_{feat,prev,cond,expected}.npz # parity: photo input +``` + +`feat.npz` stores the packed 20-channel float array (H×W×20, f32) ready for the model. +`prev.npz` stores the previous-frame CNN output (H×W×3, f32), zero for static tests. +`cond.npz` stores the FiLM conditioning vector (5-d). +`expected.npz` stores the PyTorch f32 reference output (H×W×4, f32). + +--- + +### Parity test matrix + +| Test | G-buffer | Prev | Notes | +|------|----------|------|-------| +| `full_static` | Blender sample | zero | Core correctness test | +| `simple_static` | Photo (zeros for geom) | zero | Photo path correctness | +| `full_temporal` | Blender frame 1 | frame 0 output | Temporal path | +| `zero_input` | All zeros | zero | Degenerate stability check | + +All tests: max per-pixel per-channel absolute error ≤ 1/255 (PyTorch f32 vs WebGPU f16). 
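A vector from the matrix above, e.g. the degenerate `zero_input` case, could be emitted along these lines (sketch; the real generator stores the PyTorch f32 reference output as `expected`, here a zero placeholder):

```python
import os
import tempfile
import numpy as np

def write_test_vector(path, feat, prev, cond, expected):
    """Store one parity case in the test-vector NPZ layout: feat/prev/cond/expected."""
    np.savez(path,
             feat=feat.astype(np.float32),
             prev=prev.astype(np.float32),
             cond=cond.astype(np.float32),
             expected=expected.astype(np.float32))

# Degenerate 'zero_input' stability case: all-zero features, zero history.
h, w = 16, 16
feat = np.zeros((h, w, 20), np.float32)
prev = np.zeros((h, w, 3), np.float32)
cond = np.zeros(5, np.float32)
expected = np.zeros((h, w, 4), np.float32)  # placeholder; real vectors hold the PyTorch output
path = os.path.join(tempfile.mkdtemp(), "zero_input_0.npz")
write_test_vector(path, feat, prev, cond, expected)
```

Keeping `prev` as an explicit array in every vector is what makes the temporal path replayable as a pure function of its inputs.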

---

## Training Script: `train_cnn_v3.py`

**Key differences from v2:**

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNv3(nn.Module):
    def __init__(self, enc_channels=[4, 8], film_cond_dim=5):
        super().__init__()
        # Encoder
        self.enc = nn.ModuleList([
            nn.Conv2d(20, enc_channels[0], 3, padding=1),  # 20-ch feature buffer in
            nn.Conv2d(enc_channels[0], enc_channels[1], 3, padding=1),
        ])
        # Bottleneck
        self.bottleneck = nn.Conv2d(enc_channels[1], enc_channels[1], 1)
        # Decoder (skip connections: concat → double channels)
        self.dec = nn.ModuleList([
            nn.Conv2d(enc_channels[1] * 2, enc_channels[0], 3, padding=1),
            nn.Conv2d(enc_channels[0] * 2, enc_channels[0], 3, padding=1),
        ])
        self.out_conv = nn.Conv2d(enc_channels[0], 4, 1)   # 1×1 RGBA head
        # FiLM MLP: conditioning → γ/β for each FiLM-conditioned level
        self.film_channels = [enc_channels[0], enc_channels[1],   # enc0, enc1
                              enc_channels[0], enc_channels[0]]   # dec1, dec0
        film_out = 2 * sum(self.film_channels)  # γ and β per channel (40 for [4,8])
        self.film_mlp = nn.Sequential(
            nn.Linear(film_cond_dim, 16), nn.ReLU(),
            nn.Linear(16, film_out),
        )

    def split_film(self, film):
        # Flat MLP output → per-level (γ, β) pairs, each [B, C_level]
        gamma, beta, off = [], [], 0
        for c in self.film_channels:
            gamma.append(film[:, off:off + c]); off += c
            beta.append(film[:, off:off + c]); off += c
        return gamma, beta

    def forward(self, gbuf, cond):
        # FiLM params from conditioning
        gamma, beta = self.split_film(self.film_mlp(cond))

        # Encoder
        skips = []
        x = gbuf
        for i, enc_layer in enumerate(self.enc):
            x = enc_layer(x)
            x = film_apply(x, gamma[i], beta[i])  # FiLM
            x = F.relu(x)
            skips.append(x)
            x = F.avg_pool2d(x, 2)  # ½ resolution

        # Bottleneck
        x = F.relu(self.bottleneck(x))

        # Decoder
        n_enc = len(self.enc)
        for i, dec_layer in enumerate(self.dec):
            x = F.interpolate(x, scale_factor=2, mode='nearest')  # ×2
            x = torch.cat([x, skips[-(i + 1)]], dim=1)  # skip
            x = dec_layer(x)
            x = film_apply(x, gamma[n_enc + i], beta[n_enc + i])  # FiLM
            x = F.relu(x)

        # 1×1 output conv, then sigmoid — mirrors the cnn_v3_output.wgsl pass
        return torch.sigmoid(self.out_conv(x))  # RGBA output [0,1]

def film_apply(x, gamma, beta):
    # gamma, beta: shape [B, C] → [B, C, 1, 1]
    return gamma.unsqueeze(-1).unsqueeze(-1) * x + beta.unsqueeze(-1).unsqueeze(-1)
```

**Export:** fold BN into conv weights (if BN used), quantize to f16, write binary v3.

---

## Training Pipeline Script: `cnn_v3/scripts/train_cnn_v3_full.sh`

Modelled directly on `cnn_v2/scripts/train_cnn_v2_full.sh`.
Same structure, same modes, +extended for v3 specifics (dataset packing, FiLM, parity vectors). + +### Modes (same pattern as v2) + +```bash +# Full pipeline: pack → train → export → build → validate +./train_cnn_v3_full.sh + +# Train only (dataset already packed) +./train_cnn_v3_full.sh --skip-pack + +# Validate only (skip training, use existing weights) +./train_cnn_v3_full.sh --validate +./train_cnn_v3_full.sh --validate checkpoints/checkpoint_epoch_100.pth + +# Export weights only +./train_cnn_v3_full.sh --export-only checkpoints/checkpoint_epoch_100.pth + +# Pack dataset only (run once after new Blender renders or photos) +./train_cnn_v3_full.sh --pack-only +``` + +### Pipeline steps + +``` +[1/5] Pack dataset pack_blender_sample.py / pack_photo_sample.py +[2/5] Train train_cnn_v3.py +[3/5] Export weights export_cnn_v3_weights.py → .bin + test vectors .npz +[4/5] Build demo cmake --build build -j4 --target demo64k +[5/5] Validate cnn_v3_test on all input images + parity check +``` + +Step 1 is skipped with `--skip-pack` (dataset already exists). +Steps 3–5 can be run independently with `--export-only` / `--validate`. 
+ +### Parameters + +**New vs v2:** + +| Flag | Default | Notes | +|------|---------|-------| +| `--enc-channels C` | `4,8` | Comma-separated encoder channel counts per level | +| `--film-cond-dim N` | `5` | FiLM conditioning vector size | +| `--input-mode MODE` | `simple` | `simple` (photo) or `full` (Blender G-buffer) | +| `--channel-dropout-p F` | `0.3` | Dropout probability for geometric channels | +| `--blender-dir DIR` | `training/blender_renders/` | Source EXRs for full mode | +| `--photos-dir DIR` | `training/photos/` | Source PNGs for simple mode | +| `--generate-vectors` | off | Also run `validate_parity.py` during export step | +| `--skip-pack` | off | Skip dataset packing (step 1) | + +**Kept from v2 unchanged:** + +| Flag | Default | +|------|---------| +| `--epochs N` | 200 | +| `--batch-size N` | 16 | +| `--lr FLOAT` | 1e-3 | +| `--checkpoint-every N` | 50 | +| `--patch-size N` | 8 | +| `--patches-per-image N` | 256 | +| `--detector TYPE` | harris | +| `--full-image` | off | +| `--image-size N` | 256 | +| `--input DIR` | `training/dataset/` | +| `--target DIR` | `training/dataset/` (same — target is inside sample dirs) | +| `--checkpoint-dir DIR` | `checkpoints/` | +| `--validation-dir DIR` | `validation_results/` | +| `--output-weights PATH` | `cnn_v3/weights/cnn_v3_weights.bin` | + +### Examples + +```bash +# Quick debug run: 1 level, 5 epochs, simple photos +./train_cnn_v3_full.sh --enc-channels 4,4 --epochs 5 --input-mode simple + +# Full Blender pipeline: 500 epochs, channel dropout, generate parity vectors +./train_cnn_v3_full.sh \ + --input-mode full \ + --blender-dir training/blender_renders/ \ + --enc-channels 4,8 \ + --epochs 500 \ + --channel-dropout-p 0.3 \ + --generate-vectors + +# Re-validate existing weights without retraining +./train_cnn_v3_full.sh --validate + +# Export only and open results +./train_cnn_v3_full.sh --export-only checkpoints/checkpoint_epoch_200.pth \ + --generate-vectors +``` + +### Validation output (step 5) + 

Same pattern as v2: runs `cnn_v3_test` on each image in `--input`, writes
`validation_results/<name>_output.png`, opens the folder.

If `--generate-vectors` was passed during export: also runs `validate_parity.py` and
prints a per-implementation max error table:

```
Parity results:
  HTML vs PyTorch: max=0.0039 mean=0.0008 ✓ PASS (threshold=0.0039)
  C++ vs PyTorch:  max=0.0039 mean=0.0007 ✓ PASS
```

---

## WGSL Implementation

**Compute shader approach** (same as v2, extended):

```
Pass 0: pack_gbuffer.wgsl       — assemble G-buffer channels into storage texture
Pass 1: cnn_v3_enc0.wgsl        — encoder level 0 (20→4ch, 3×3)
Pass 2: cnn_v3_enc1.wgsl        — encoder level 1 (4→8ch, 3×3) + downsample
Pass 3: cnn_v3_bottleneck.wgsl  — bottleneck (8→8, 1×1)
Pass 4: cnn_v3_dec1.wgsl        — decoder level 1: upsample + skip + (16→4, 3×3)
Pass 5: cnn_v3_dec0.wgsl        — decoder level 0: upsample + skip + (8→4, 3×3)
Pass 6: cnn_v3_output.wgsl      — sigmoid + composite to framebuffer
```

FiLM γ/β values are computed CPU-side each frame and uploaded as a small uniform buffer.

**Uniform: FiLM params (per-frame)**
```wgsl
struct FilmParams {
    // γ/β for all levels packed as vec4s, in export order:
    // γ_enc0 | β_enc0 | γ_enc1 (2×vec4) | β_enc1 (2×vec4) | γ_dec1 | β_dec1 | γ_dec0 | β_dec0
    params: array<vec4f, 10>,
}
// 40 floats × 4 bytes = 160 bytes uniform buffer (well within limits)
```

---

## HTML Validation Tool: `cnn_v3/tools/index.html`

**Base:** copy `cnn_v2/tools/cnn_v2_test/index.html`, adapt in-place.
Single self-contained HTML file, no build step, open directly in browser.
+ +--- + +### What is reused from v2 unchanged + +- Full CSS (drop zones, panels, layer-view grid, console, footer) +- WebGPU init boilerplate (adapter, device, queue) +- Drop zone + file input JS +- `FULLSCREEN_QUAD_VS` vertex shader +- Display / blit shader (output to canvas) +- Layer viz shader (grayscale 4-channel split + 4× zoom) +- Weight stats display (min/max per layer) +- Video playback controls (play/pause, step frame) +- Save PNG button, blend slider +- Console logging + +--- + +### Layout changes + +**Left sidebar** (replaces v2 left sidebar): +``` +[ Drop .bin weights ] +[ Weights Info panel ] ← same, but shows U-Net topology +[ Weights Viz panel ] ← same, shows enc0/enc1/bottleneck/dec layers +[ Input Mode toggle ] ← NEW: Simple (photo) / Full (G-buffer) +[ FiLM Conditioning panel ] ← NEW: beat_phase, audio_intensity, style_p0, style_p1 sliders +[ Temporal panel ] ← NEW: "Use temporal" toggle, "Capture prev frame" button +``` + +**Main canvas** (mostly same): +``` +[ bottom float bar ] + Video controls | Blend | View mode | G-buffer channel | Save PNG +``` +View modes (keyboard): `SPACE` = original, `D` = diff×10, `G` = G-buffer channel view. +G-buffer channel selector: albedo / normal.xy / depth / depth_grad / shadow / transp / prev. + +**Right sidebar** (replaces v2 layer viz): +``` +[ Layer Visualization panel ] + Buttons: Features | Enc0 | Enc1 | BN | Dec1 | Dec0 | Output + 4-channel grid (or 8-channel grid for Enc1/BN, shown as 2 rows) + Zoom view (4×, mouse-driven) +``` +"Features" button shows the 20-channel feature buffer split across 5 rows of 4. + +--- + +### Input modes + +**Simple mode (default):** drop one PNG or video. +- Albedo = image RGB +- Alpha → `transp = 1 − alpha` (if RGBA PNG) +- All geometric channels (normal, depth, depth_grad, mat_id) = 0 +- Shadow = 1.0 (fully lit) +- Prev = black (or captured from previous render) +- Mip1/mip2 computed from albedo in PACK_SHADER + +**Full mode:** drop multiple PNGs by filename convention. 
+The tool detects channel assignment by filename: +``` +*albedo* or *color* → albedo (RGB) +*normal* → normal (RG oct-encoded) +*depth* → depth (R, 16-bit PNG or EXR) +*matid* or *index* → mat_id (R u8) +*shadow* → shadow (R u8) +*transp* or *alpha* → transparency (R u8) +``` +Drop all files at once (or one-by-one). Missing channels stay zero. +Status bar shows which channels are loaded. + +--- + +### New WGSL shaders (inline, same pattern as v2) + +| Shader | Replaces | Notes | +|--------|----------|-------| +| `PACK_SHADER` | `STATIC_SHADER` | 20ch into feat_tex0 + feat_tex1 (rgba32uint each) | +| `ENC0_SHADER` | part of `CNN_SHADER` | Conv(20→4, 3×3) + FiLM + ReLU; writes enc0_tex | +| `ENC1_SHADER` | | Conv(4→8, 3×3) + FiLM + ReLU + avg_pool2×2; writes enc1_tex (half-res) | +| `BOTTLENECK_SHADER` | | Conv(8→8, 1×1) + FiLM + ReLU; writes bn_tex | +| `DEC1_SHADER` | | nearest upsample×2 + concat(bn, enc1_skip) + Conv(16→4, 3×3) + FiLM + ReLU | +| `DEC0_SHADER` | | nearest upsample×2 + concat(dec1, enc0_skip) + Conv(8→4, 3×3) + FiLM + ReLU | +| `OUTPUT_SHADER` | | Conv(4→4, 1×1) + sigmoid → composites to canvas | + +FiLM γ/β computed JS-side from sliders (tiny MLP forward pass in JS), uploaded as uniform. 
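The "tiny MLP forward pass in JS" is just two matrix products; a pure-numpy reference to port 1:1 (shapes assume the 5→16→N MLP from the design; N = 40 γ/β values if FiLM is applied at every level of the [4, 8] U-Net):

```python
import numpy as np

def film_forward(cond, w0, b0, w1, b1):
    """FiLM conditioning MLP: cond (5,) -> Linear(5,16) -> ReLU -> Linear(16,N).
    w0: (16, 5), b0: (16,), w1: (N, 16), b1: (N,). Returns the flat gamma/beta
    vector that gets split per level and uploaded as the FilmParams uniform."""
    h = np.maximum(w0 @ cond + b0, 0.0)   # ReLU
    return w1 @ h + b1

# Dummy weights for shape-checking; real values come from the FiLM MLP
# section of the .bin export.
cond = np.array([0.25, 0.5, 0.8, 0.0, 1.0], dtype=np.float32)  # beat, time, audio, style
rng = np.random.default_rng(0)
params = film_forward(cond, rng.standard_normal((16, 5)), np.zeros(16),
                      rng.standard_normal((40, 16)), np.zeros(40))
assert params.shape == (40,)
```

Running the same function in PyTorch, JS, and this reference on one `cond` vector gives a cheap pre-check before the full per-pixel parity run.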

---

### Textures (GPU-side, all rgba32uint or rgba16float)

| Name | Size | Format | Contents |
|------|------|--------|----------|
| `feat_tex0` | W×H | rgba32uint | feature buffer slots 0–7 (f16) |
| `feat_tex1` | W×H | rgba32uint | feature buffer slots 8–19 (u8 + spare) |
| `enc0_tex` | W×H | rgba32uint | 4 channels f16 (enc0 output, skip) |
| `enc1_tex` | W/2×H/2 | rgba32uint | 8 channels f16 (enc1 out, skip) — 8 × f16 fit in one 4 × u32 texel |
| `bn_tex` | W/4×H/4 | rgba32uint | 8 channels f16 (bottleneck output) |
| `dec1_tex` | W/2×H/2 | rgba32uint | 4 channels f16 (dec1 output) |
| `dec0_tex` | W×H | rgba32uint | 4 channels f16 (dec0 output) |
| `prev_tex` | W×H | rgba8unorm | previous CNN output (temporal) |

Skip connections: enc0_tex and enc1_tex are **kept alive** across the full forward pass
(not ping-ponged away). DEC1 and DEC0 read them directly.

---

### Parity test mode

Drop an NPZ file (from `validate_parity.py`) to activate:
- Loads `feat`, `prev`, `cond`, `expected` arrays
- Runs the full forward pass on the packed features
- Computes per-pixel per-channel absolute error vs `expected`
- Reports: max error, mean error, pass/fail (threshold = 1/255)
- Shows error map on canvas (amplified ×10, same as diff mode)

---

### File size estimate

| Component | Approx size |
|-----------|-------------|
| HTML/CSS (reused) | ~4 KB |
| JS logic (reused + new) | ~15 KB |
| PACK_SHADER | ~1.5 KB |
| ENC/DEC shaders (×6) | ~9 KB |
| Display/viz shaders (reused) | ~3 KB |
| **Total** | **~33 KB** |

---

### Usage

```bash
open cnn_v3/tools/index.html
# or
python3 -m http.server 8000
# → http://localhost:8000/cnn_v3/tools/
```

---

## Implementation Checklist

Ordered for parallel execution where possible. Phases 1 and 2 are independent.

**Architecture locked:** enc_channels = [4, 8]. See Size Budget for weight counts.
+ +--- + +### Phase 0 — Stub G-buffer (unblocks everything else) + +Minimal compute pass, no real geometry. Lets CNN v3 be developed and trained +before the real G-buffer exists. Wire real G-buffer in Phase 5. + +- [ ] `src/effects/cnn_v3_stub_gbuf.wgsl` — compute shader: + - albedo = sample current framebuffer (RGBA) + - normal.xy = (0.5, 0.5) — neutral, pointing toward camera + - depth = 0.5 — constant mid-range + - depth_grad.xy = 0, 0 + - mat_id = 0, prev.rgb = 0, shadow = 1.0, transp = 0.0 + - mip1/mip2 sampled from framebuffer via `textureSampleLevel` + - writes feat_tex0 + feat_tex1 (2 × rgba32uint) +- [ ] Wire into `CNNv3Effect::render()` as pass 0 (swapped out later for real G-buffer) + +--- + +### Phase 1 — Training infrastructure (parallel with Phase 2) + +**1a. PyTorch model** +- [ ] `cnn_v3/training/train_cnn_v3.py` + - [ ] `CNNv3` class: U-Net [4,8], FiLM MLP (5→16→48), channel dropout + - [ ] `GBufferDataset`: loads 20-channel feature tensors from packed PNGs + - [ ] Training loop, checkpointing, grayscale/RGBA loss option + +**1b. Data preparation** +- [ ] `cnn_v3/training/pack_photo_sample.py` — photo PNG → feat tensor (albedo + zeros) +- [ ] `cnn_v3/training/pack_blender_sample.py` — multi-layer EXR → packed channel PNGs +- [ ] `cnn_v3/training/blender_export.py` — headless Blender multi-pass render script + - passes: DiffCol, Normal, Z, IndexOB, Shadow, Alpha, Combined (target) + +**1c. Export and parity** +- [ ] `cnn_v3/training/export_cnn_v3_weights.py` — checkpoint → binary v3 .bin (f16) +- [ ] `cnn_v3/training/validate_parity.py` + - [ ] Generate test vectors (4 cases: full_static, simple_static, temporal, zero) + - [ ] Compare PyTorch f32 vs HTML WebGPU and C++ outputs + - [ ] Report max/mean error per channel, pass/fail at 1/255 + +**1d. 
Pipeline script** +- [ ] `cnn_v3/scripts/train_cnn_v3_full.sh` — pack → train → export → build → validate + - all flags from v2 + `--enc-channels`, `--film-cond-dim`, `--input-mode`, `--channel-dropout-p`, `--generate-vectors`, `--skip-pack` + +--- + +### Phase 2 — WGSL shaders (parallel with Phase 1) + +All shaders: explicit zero-pad (not clamp), nearest-neighbor upsample, +no batch norm at inference, `#include` existing snippets where possible. + +**2a. Pack pass** (replaces stub in Phase 0 when real G-buffer exists) +- [ ] `src/effects/cnn_v3_pack.wgsl` — full 20-channel packer + - `#include "camera_common"` for depth linearization + - reads albedo MIPs via `textureSampleLevel(..., 1.0)` and `(..., 2.0)` + - reads prev_cnn_tex (persistent RGBA8 owned by effect) + - reads depth32float, normal, shadow, transp G-buffer textures + - computes depth_grad (finite diff), oct-encodes normal if needed + - writes feat_tex0 (f16×8) + feat_tex1 (u8×12, spare) + +**2b. U-Net compute shaders** +- [ ] `src/effects/cnn_v3_enc0.wgsl` — Conv(20→4, 3×3) + FiLM + ReLU +- [ ] `src/effects/cnn_v3_enc1.wgsl` — Conv(4→8, 3×3) + FiLM + ReLU + avg_pool 2×2 +- [ ] `src/effects/cnn_v3_bottleneck.wgsl` — Conv(8→8, 1×1) + FiLM + ReLU +- [ ] `src/effects/cnn_v3_dec1.wgsl` — nearest upsample×2 + concat enc1_skip + Conv(16→4, 3×3) + FiLM + ReLU +- [ ] `src/effects/cnn_v3_dec0.wgsl` — nearest upsample×2 + concat enc0_skip + Conv(8→4, 3×3) + FiLM + ReLU +- [ ] `src/effects/cnn_v3_output.wgsl` — Conv(4→4, 1×1) + sigmoid → composite to framebuffer + +Reuse from existing shaders: +- `pack2x16float` / `unpack2x16float` pattern (from CNN v2 shaders) +- `pack4x8unorm` / `unpack4x8unorm` for feat_tex1 + +**2c. 
Register shaders** +- [ ] Add all shaders to `workspaces/main/assets.txt` +- [ ] Add externs to `src/effects/shaders.h` + `src/effects/shaders.cc` + +--- + +### Phase 3 — C++ effect + +- [ ] `src/effects/cnn_v3_effect.h` — class declaration + - textures: feat_tex0, feat_tex1, enc0_tex, enc1_tex (half-res), bn_tex (half-res), dec1_tex, dec0_tex + - **`WGPUTexture prev_cnn_tex_`** — persistent RGBA8, owned by effect, initialized black + - `FilmParams` uniform buffer (γ/β for 4 levels = 48 floats = 192 bytes) + - FiLM MLP weights (loaded from .bin, run CPU-side per frame) + +- [ ] `src/effects/cnn_v3_effect.cc` — implementation + - [ ] Constructor: create all textures at render resolution + - [ ] `render()`: 7-pass dispatch: stub_gbuf (or real) → enc0 → enc1 → bn → dec1 → dec0 → output + - [ ] Per-frame: run FiLM MLP (CPU), upload FilmParams uniform + - [ ] **After output pass: blit output → `prev_cnn_tex_`** (one GPU copy, cheap) + - [ ] `resize()`: recreate resolution-dependent textures (enc1/bn are half-res) + +- [ ] `cmake/DemoSourceLists.cmake` — add `cnn_v3_effect.cc` to COMMON_GPU_EFFECTS +- [ ] `src/gpu/demo_effects.h` — add `#include "effects/cnn_v3_effect.h"` +- [ ] `workspaces/main/timeline.seq` — add `EFFECT + CNNv3Effect` + +--- + +### Phase 4 — Test scene (rotating cubes + fog SDF → G-buffer) + +Provides a real G-buffer for visual validation before the production G-buffer exists. +Replaces the stub when ready. + +**4a. Raster G-buffer pass** (MRT) +- [ ] `src/effects/cnn_v3_scene_raster.wgsl` + - Based on `src/effects/rotating_cube.wgsl` + - Fragment outputs: `@location(0)` albedo rgba16float, `@location(1)` normal+matid rg16float + - Depth: hardware depth32float + - mat_id from push constant / uniform (per-draw-call object index) + +**4b. 
Fog SDF pass** (compute) +- [ ] `src/effects/cnn_v3_scene_sdf.wgsl` + - `#include "render/raymarching_id"` — provides `object_id` → mat_id + - `#include "render/shadows"` — `calc_shadow()` → shadow channel + - `#include "math/sdf_shapes"` — sdBox, sdSphere for fog/cube SDFs + - `#include "camera_common"` — ray setup + - Reads rasterized depth32float, overwrites G-buffer textures where SDF wins + - Writes transparency channel (volumetric fog density) + +**4c. C++ wrapper** +- [ ] `src/effects/cnn_v3_scene_effect.h/.cc` — `CNNv3SceneEffect` + - Owns G-buffer textures (albedo rgba16float, normal_mat rg16float, depth32float, shadow r8unorm, transp r16float) + - Pass 1: raster rotating cubes → MRT + - Pass 2: SDF fog compute → overwrite where closer + - Pass 3: lighting/shadow pass + - Outputs are bound as inputs to `CNNv3Effect`'s pack pass +- [ ] `cmake/DemoSourceLists.cmake` — add `.cc` +- [ ] `src/gpu/demo_effects.h` — add include + +--- + +### Phase 5 — C++ test + +Separate from v1/v2 tests. Uses `CNNv3SceneEffect` + `CNNv3Effect` together. 
+ +- [ ] `src/tests/gpu/test_cnn_v3.cc` + - [ ] Scene renders (stub G-buffer + real scene G-buffer) + - [ ] CNN v3 forward pass with random/identity weights + - [ ] Prev frame blit verified (frame 0 → frame 1 temporal path) + - [ ] FiLM conditioning: verify different cond vectors produce different outputs + - [ ] Shader compilation (all 7 passes) +- [ ] `cmake/DemoTests.cmake` — add test target + +--- + +### Phase 6 — HTML validation tool + +- [ ] Copy `cnn_v2/tools/cnn_v2_test/index.html` → `cnn_v3/tools/index.html` +- [ ] Replace `STATIC_SHADER` → `PACK_SHADER` (feat_tex0 + feat_tex1, mixed f16/u8) +- [ ] Replace `CNN_SHADER` → 6 U-Net shaders (ENC0/ENC1/BN/DEC1/DEC0/OUTPUT) +- [ ] Input mode toggle (Simple/Full) + filename-based channel detection +- [ ] FiLM conditioning sliders + JS MLP forward pass (tiny, runs in JS) +- [ ] Temporal: "capture prev frame" button + "use temporal" toggle +- [ ] Layer viz: U-Net hierarchy buttons (Features/Enc0/Enc1/BN/Dec1/Dec0/Output) +- [ ] G-buffer channel view (`G` key cycles: albedo/normal/depth/shadow/transp) +- [ ] Parity test mode: drop NPZ → run → max error report + error map + +--- + +### Phase 7 — Parity validation + +- [ ] Train model on photo samples (`--input-mode simple`, 200 epochs) +- [ ] Export weights + generate test vectors (`--generate-vectors`) +- [ ] HTML tool: drop .bin + test image → verify visual output +- [ ] `validate_parity.py`: HTML vs PyTorch ≤ 1/255, C++ vs PyTorch ≤ 1/255 +- [ ] All 4 test cases pass: full_static, simple_static, temporal, zero_input +- [ ] Wire `CNNv3SceneEffect` G-buffer into `CNNv3Effect` (replace stub) + +--- + +### Phase 8 — Production G-buffer (future) + +Wire the real hybrid renderer G-buffer (GEOM_BUFFER.md) into CNNv3Effect, +replacing `CNNv3SceneEffect`. Train on Blender full-pipeline samples. 
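The ≤ 1/255 criterion used throughout Phase 7 is a plain per-pixel, per-channel absolute-error bound. A NumPy sketch of the comparison (function name illustrative; `validate_parity.py` owns the real report):

```python
import numpy as np

def check_parity(actual, expected, threshold=1.0 / 255.0):
    """Compare two H x W x C float images in [0, 1]; pass iff the worst
    per-pixel, per-channel absolute error is within the threshold."""
    err = np.abs(actual.astype(np.float64) - expected.astype(np.float64))
    return {"max": float(err.max()), "mean": float(err.mean()),
            "pass": bool(err.max() <= threshold)}

a = np.zeros((4, 4, 4))
b = np.zeros((4, 4, 4))
b[0, 0, 0] = 2.0 / 255.0          # one channel off by two 8-bit steps: fail
result = check_parity(a, b)
assert result["pass"] is False
```

Applying the same check per output channel matches the "max/mean error per channel" report named in Phase 1c.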
+ 
---

## Differences from CNN v2

| | CNN v2 | CNN v3 |
|---|---|---|
| Architecture | Flat N-layer chain | U-Net encoder/decoder |
| Input | RGBD + positional enc | 20ch feature buffer (G-buffer + temporal + MIPs + shadow + transp.) |
| Style control | Static (post-train) | FiLM: runtime γ/β from audio/beat |
| Skip connections | None | Encoder→decoder concat |
| Multi-scale | No | Yes (2 levels) |
| Testability | HTML + C++ (informal) | Strict: test vectors, per-pixel tolerance |
| Training data | Input/output image pairs | G-buffer render passes (Blender or photo) |
| Weights | ~3.2 KB | ~3.4 KB (similar) |

---

## References

- **FiLM:** "FiLM: Visual Reasoning with a General Conditioning Layer" (Perez et al., 2018)
- **U-Net:** "U-Net: Convolutional Networks for Biomedical Image Segmentation" (Ronneberger et al., 2015)
- **G-Buffer design:** `doc/archive/GEOM_BUFFER.md`
- **CNN v2 reference:** `cnn_v2/docs/CNN_V2.md`
- **Binary format base:** `cnn_v2/docs/CNN_V2_BINARY_FORMAT.md`
- **Effect workflow:** `doc/EFFECT_WORKFLOW.md`

---

**Document Version:** 1.0
**Created:** 2026-03-19
**Status:** Design phase — G-buffer prerequisite pending
