Diffstat (limited to 'cnn_v3/docs/HOWTO.md')

 cnn_v3/docs/HOWTO.md | 61 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 46 insertions(+), 15 deletions(-)
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 5c5cc2a..5cfc371 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -79,7 +79,7 @@ Each frame, `GBufferEffect::render()` executes:
3. **Pass 3 — Transparency** — TODO (deferred; transp=0 for opaque scenes)
4. **Pass 4 — Pack compute** (`gbuf_pack.wgsl`) ✅
- - Reads all G-buffer textures + `prev_cnn` input
+ - Reads all G-buffer textures + persistent `prev_cnn` texture
- Writes `feat_tex0` + `feat_tex1` (rgba32uint, 20 channels, 32 bytes/pixel)
- Shadow / transp nodes cleared to 1.0 / 0.0 via zero-draw render passes
until Pass 2/3 are implemented.
@@ -90,9 +90,38 @@ Outputs are named from the `outputs` vector passed to the constructor:
```
outputs[0] → feat_tex0 (rgba32uint: albedo.rgb, normal.xy, depth, depth_grad.xy)
-outputs[1] → feat_tex1 (rgba32uint: mat_id, prev.rgb, mip1.rgb, mip2.rgb, shadow, transp)
+outputs[1] → feat_tex1 (rgba32uint: mat_id, prev.rgb, mip1.rgb, mip2.rgb, dif, transp)
```
+### Temporal feedback (prev.rgb)
+
+`GBufferEffect` owns a persistent internal node `<prefix>_prev` (`F16X8` = Rgba16Float,
+`CopySrc|CopyDst`). Each frame it is GPU-copied from the CNN effect's output after all
+effects render (`post_render`), then bound as `prev_cnn` in the pack shader (binding 6).
+
+**Wiring is automatic** via `wire_dag()`, called by `Sequence::init_effect_nodes()`.
+`GBufferEffect` scans the DAG for the first downstream consumer of its output nodes and
+uses that effect's output as `cnn_output_node_`. No manual call needed.
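The scan can be sketched as below. `EffectDecl` and `find_cnn_output` are illustrative stand-ins for the real DAG types in `Sequence`, not the actual API (the real code also skips the G-buffer effect itself):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative stand-in for a sequence DAG entry (not the real type).
struct EffectDecl {
    std::string name;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

// Walk effects in declaration order; the first one consuming any of the
// G-buffer output nodes is treated as the downstream CNN, and its first
// output node becomes the source for the prev_cnn copy.
std::string find_cnn_output(const std::vector<EffectDecl>& dag,
                            const std::vector<std::string>& gbuf_outputs) {
    for (const auto& fx : dag)
        for (const auto& in : fx.inputs)
            if (std::find(gbuf_outputs.begin(), gbuf_outputs.end(), in)
                    != gbuf_outputs.end())
                return fx.outputs.empty() ? std::string{} : fx.outputs[0];
    return {};  // no consumer: cnn_output_node_ stays empty, post_render no-ops
}
```

For the timeline below, scanning the two declared effects for consumers of `gbuf_feat0`/`gbuf_feat1` yields `cnn_out`.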
+
+**Requirement**: the sequence must include a `CNNv3Effect` downstream of `GBufferEffect`.
+In `timeline.seq`, declare an output node with the `gbuf_albedo` format and add both effects:
+
+```seq
+NODE cnn_out gbuf_albedo
+EFFECT + GBufferEffect source -> gbuf_feat0 gbuf_feat1 0 60
+EFFECT + CNNv3Effect gbuf_feat0 gbuf_feat1 -> cnn_out 0 60
+```
+
+If no CNN effect follows, `cnn_output_node_` stays empty and `post_render` is a no-op
+(prev.rgb will be zero — correct for static/debug-only sequences).
+
+Frame 0 behaviour: `_prev` is zeroed on allocation → `prev.rgb = 0`, matching the training
+convention (static frames use zero history).
+
+The copy uses `wgpuCommandEncoderCopyTextureToTexture` (no extra render pass overhead).
+`node_prev_tex_` is `F16X8` (Rgba16Float) to match the `GBUF_ALBEDO` format of CNNv3Effect's
+output — `CopyTextureToTexture` requires identical formats.
+
---
## 1b. GBufferEffect — Implementation Plan (Pass 2: SDF Shadow)
@@ -285,7 +314,7 @@ python3 train_cnn_v3.py \
Applied per-sample in `cnn_v3_utils.apply_channel_dropout()`:
- Geometric channels (normal, depth, depth_grad) zeroed with `p=channel_dropout_p`
-- Context channels (mat_id, shadow, transp) with `p≈0.2`
+- Context channels (mat_id, dif, transp) with `p≈0.2`
- Temporal channels (prev.rgb) with `p=0.5`
This ensures the network works for both full G-buffer and photo-only inputs.
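The same idea in a C++ sketch (the real implementation is the Python `apply_channel_dropout`). The channel indices are assumptions based on the 20-channel layout above, not the training code's exact grouping, and the sketch uses one coin flip per group per sample:

```cpp
#include <initializer_list>
#include <random>
#include <vector>

// Per-sample channel dropout sketch. Index groups are illustrative,
// mapped onto the assumed 20-channel feature layout.
void apply_channel_dropout(std::vector<float>& feat, std::mt19937& rng,
                           float p_geom, float p_ctx, float p_temporal) {
    std::uniform_real_distribution<float> u(0.0f, 1.0f);
    auto drop_group = [&](std::initializer_list<int> idx, float p) {
        if (u(rng) < p)                       // one flip per group, per sample
            for (int i : idx) feat[i] = 0.0f;
    };
    drop_group({3, 4, 5, 6, 7},  p_geom);      // normal.xy, depth, depth_grad.xy
    drop_group({8, 18, 19},      p_ctx);       // mat_id, dif, transp
    drop_group({9, 10, 11},      p_temporal);  // prev.rgb
}
```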
@@ -299,10 +328,12 @@ This ensures the network works for both full G-buffer and photo-only inputs.
```seq
# BPM 120
SEQUENCE 0 0 "Scene with CNN v3"
- EFFECT + GBufferEffect prev_cnn -> gbuf_feat0 gbuf_feat1 0 60
- EFFECT + CNNv3Effect gbuf_feat0 gbuf_feat1 -> sink 0 60
+ EFFECT + GBufferEffect source -> gbuf_feat0 gbuf_feat1 0 60
+ EFFECT + CNNv3Effect gbuf_feat0 gbuf_feat1 -> sink 0 60
```
+Temporal feedback is wired automatically by `wire_dag()` — no manual call needed.
+
FiLM parameters uploaded each frame:
```cpp
cnn_v3_effect->set_film_params(
@@ -455,15 +486,15 @@ GBufViewEffect(const GpuContext& ctx,
float start_time, float end_time)
```
-**Wiring example** (alongside GBufferEffect):
+**Wiring example** — declare everything in `timeline.seq`; temporal feedback is wired automatically:
-```cpp
-auto gbuf = std::make_shared<GBufferEffect>(ctx,
- std::vector<std::string>{"prev_cnn"},
- std::vector<std::string>{"gbuf_feat0", "gbuf_feat1"}, 0.0f, 60.0f);
-auto gview = std::make_shared<GBufViewEffect>(ctx,
- std::vector<std::string>{"gbuf_feat0", "gbuf_feat1"},
- std::vector<std::string>{"gbuf_view_out"}, 0.0f, 60.0f);
+```seq
+NODE gbuf_feat0 gbuf_rgba32uint
+NODE gbuf_feat1 gbuf_rgba32uint
+NODE cnn_out gbuf_albedo
+EFFECT + GBufferEffect source -> gbuf_feat0 gbuf_feat1 0 60
+EFFECT + CNNv3Effect gbuf_feat0 gbuf_feat1 -> cnn_out 0 60
+EFFECT + GBufViewEffect gbuf_feat0 gbuf_feat1 -> sink 0 60
```
**Grid layout** (output resolution = input resolution, channel cells each 1/4 W × 1/5 H):
@@ -474,7 +505,7 @@ auto gview = std::make_shared<GBufViewEffect>(ctx,
| 1 | `nrm.y` remap→[0,1] | `depth` (inverted) | `dzdx` ×20+0.5 | `dzdy` ×20+0.5 |
| 2 | `mat_id` | `prev.r` | `prev.g` | `prev.b` |
| 3 | `mip1.r` | `mip1.g` | `mip1.b` | `mip2.r` |
-| 4 | `mip2.g` | `mip2.b` | `shadow` | `transp` |
+| 4 | `mip2.g` | `mip2.b` | `dif` | `transp` |
All channels displayed as grayscale. 1-pixel gray grid lines separate cells. Dark background for out-of-range cells.
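The per-cell remaps from the table reduce to a few scalar helpers; a minimal sketch, assuming the formulas as listed (cell selection is plain integer division):

```cpp
// Which grid cell a pixel falls in: 4 columns, 5 rows.
struct Cell { int row, col; };
Cell cell_at(int x, int y, int width, int height) {
    return { y / (height / 5), x / (width / 4) };
}

// Scalar remaps applied before grayscale display (from the table above).
float remap_normal(float n)   { return n * 0.5f + 0.5f; }   // [-1,1] -> [0,1]
float remap_depth(float d)    { return 1.0f - d; }           // inverted
float remap_gradient(float g) { return g * 20.0f + 0.5f; }   // x20 + 0.5
```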
@@ -535,7 +566,7 @@ No sampler — all reads use `textureLoad()` (integer texel coordinates).
Packs channels identically to `gbuf_pack.wgsl`:
- `feat_tex0`: `pack2x16float(alb.rg)`, `pack2x16float(alb.b, nrm.x)`, `pack2x16float(nrm.y, depth)`, `pack2x16float(dzdx, dzdy)`
-- `feat_tex1`: `pack4x8unorm(matid,0,0,0)`, `pack4x8unorm(mip1.rgb, mip2.r)`, `pack4x8unorm(mip2.gb, shadow, transp)`
+- `feat_tex1`: `pack4x8unorm(matid,0,0,0)`, `pack4x8unorm(mip1.rgb, mip2.r)`, `pack4x8unorm(mip2.gb, dif, transp)`
- Depth gradients: central differences on depth R channel
- Mip1 / Mip2: box2 (2×2) / box4 (4×4) average filter on albedo
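For spot-checking packed textures on the CPU, the two WGSL intrinsics can be mirrored as below. This is a debugging aid, not project code; the half-float conversion truncates the mantissa rather than rounding to nearest, which is close enough for inspection:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// CPU reference for WGSL pack4x8unorm: clamp each component to [0,1],
// quantize to 8 bits; component i lands in byte i of the result.
uint32_t pack4x8unorm(float a, float b, float c, float d) {
    auto q = [](float v) -> uint32_t {
        v = std::fmin(std::fmax(v, 0.0f), 1.0f);
        return static_cast<uint32_t>(std::lround(v * 255.0f));
    };
    return q(a) | (q(b) << 8) | (q(c) << 16) | (q(d) << 24);
}

// float -> IEEE half (truncating; tiny values flush to zero).
uint32_t f32_to_f16(float f) {
    uint32_t x; std::memcpy(&x, &f, 4);
    uint32_t sign = (x >> 16) & 0x8000u;
    int32_t  exp  = static_cast<int32_t>((x >> 23) & 0xFF) - 127 + 15;
    uint32_t mant = (x >> 13) & 0x3FFu;
    if (exp <= 0)  return sign;            // underflow -> signed zero
    if (exp >= 31) return sign | 0x7C00u;  // overflow -> infinity
    return sign | (static_cast<uint32_t>(exp) << 10) | mant;
}

// CPU reference for WGSL pack2x16float: first component in the low 16 bits.
uint32_t pack2x16float(float a, float b) {
    return f32_to_f16(a) | (f32_to_f16(b) << 16);
}
```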