2 files changed, 134 insertions, 6 deletions
diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md
index 072c92f..a3a988c 100644
--- a/doc/COMPLETED.md
+++ b/doc/COMPLETED.md
@@ -36,6 +36,14 @@ Completed task archive. See `doc/archive/` for detailed historical documents.
 
 ## March 2026
 
+- [x] **CNN v3 shadow pass debugging** — Fixed 5 independent bugs in `gbuf_shadow.wgsl` + `gbuffer_effect.cc`:
+  1. **Camera Y-inversion**: `mat4::perspective` negates Y for the post-process chain; fixed with `proj.m[5] = -proj.m[5]` in `upload_scene_data` + `WGPUFrontFace_CCW` on the raster pipeline.
+  2. **Shadow formula**: replaced `shadowWithStoredDistance` (20 steps, bounded) with 64-step IQ soft shadow (`res = min(res, 8.0*d/t)`, unbounded march).
+  3. **Local→world SDF scale**: `sdBox/sdSphere` return local-space distance; fixed with `d *= length(obj.model[0].xyz)`.
+  4. **Shadow bias**: replaced light-direction bias (fails at terminator) with rasterized surface normal from `normal_mat_tex` (binding 4); `bias_pos = world + nor * 0.05`.
+  5. **ShaderComposer**: `GBufViewEffect` needed `ShaderComposer::Get().Compose()` to resolve `#include "debug/debug_print"`.
+  - Added per-tile labels to `gbuf_view.wgsl` via `debug_str`. Scale propagation for the pulsating sphere confirmed correct end-to-end. 36/36 tests.
+
 - [x] **CNN v3 Phase 7: Validation tools** — `GBufViewEffect` (C++ 4×5 channel grid, `cnn_v3/shaders/gbuf_view.wgsl`, `cnn_v3/src/gbuf_view_effect.{h,cc}`): renders all 20 G-buffer feature channels tiled on screen; custom BGL with `WGPUTextureSampleType_Uint`, bind group rebuilt per frame via `wgpuRenderPipelineGetBindGroupLayout`. Web tool "Load sample directory" (`cnn_v3/tools/tester.js` + `shaders.js`): `webkitdirectory` picker, `FULL_PACK_SHADER` compute (matches `gbuf_pack.wgsl`), `runFromFeat()` inference, PSNR vs `target.png`. 36/36 tests.
 
 - [x] **CNN v3 Phase 5: Parity validation** — `test_cnn_v3_parity.cc` (2 tests: zero_weights, random_weights). Root cause: intermediate nodes declared at full res instead of W/2, W/4. Fix: `NodeRegistry::default_width()/default_height()` getters + fractional resolution in `declare_nodes()`. Final max_err=4.88e-4 ✓. 36/36 tests.
diff --git a/doc/SEQUENCE.md b/doc/SEQUENCE.md
index 202bf09..3d7a6ce 100644
--- a/doc/SEQUENCE.md
+++ b/doc/SEQUENCE.md
@@ -91,21 +91,141 @@ class Effect {
   std::vector<std::string> input_nodes_;
   std::vector<std::string> output_nodes_;
 
-  virtual void declare_nodes(NodeRegistry& registry) {}  // Optional temp nodes
+  // Optional: declare internal nodes (depth buffers, intermediate textures).
+  virtual void declare_nodes(NodeRegistry& registry) {}
+
+  // Required: render this effect for the current frame.
   virtual void render(WGPUCommandEncoder encoder,
                       const UniformsSequenceParams& params,
                       NodeRegistry& nodes) = 0;
+
+  // Optional: called after ALL effects in the sequence have rendered.
+  // Use for end-of-frame bookkeeping, e.g. copying temporal feedback buffers.
+  // Default implementation is a no-op.
+  virtual void post_render(WGPUCommandEncoder encoder, NodeRegistry& nodes) {}
 };
 ```
 
+### Frame execution order
+
+Each frame, `Sequence::render_effects()` runs two passes over the DAG:
+
+1. **Render pass** — `dispatch_render()` on every effect in topological order
+2. **Post-render pass** — `post_render()` on every effect in the same order
+
+This ordering guarantees that by the time any `post_render()` runs, all output
+textures for the frame are fully written.  It is therefore safe to read any node
+texture the registry owns in `post_render()`; external views such as `"sink"` have no owned texture.
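+
+The two passes can be sketched with a mock, assuming a simplified `MockEffect` (hypothetical, no WebGPU types) that logs calls instead of encoding GPU work, and an effect list that is already topologically sorted. `run_frame` stands in for `Sequence::render_effects()`; it illustrates the ordering only and is not the real implementation.
+
+```cpp
+#include <cassert>
+#include <string>
+#include <vector>
+
+// Hypothetical stand-in for Effect: records calls instead of encoding GPU work.
+struct MockEffect {
+  std::string name;
+  std::vector<std::string>* trace;  // shared call log
+
+  void render() { trace->push_back(name + ".render"); }
+  void post_render() { trace->push_back(name + ".post_render"); }
+};
+
+// Sketch of the two passes in Sequence::render_effects(); `effects` is
+// assumed to be in topological order already.
+void run_frame(std::vector<MockEffect>& effects) {
+  for (auto& e : effects) e.render();       // pass 1: every render()
+  for (auto& e : effects) e.post_render();  // pass 2: every post_render()
+}
+```
+
+For a `gbuf` then `cnn` chain the log reads `gbuf.render`, `cnn.render`, `gbuf.post_render`, `cnn.post_render`, so the upstream G-buffer effect sees the CNN output as final in its own `post_render()`.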
+
+### Temporal feedback pattern
+
+DAG-based sequences cannot express read-after-write cycles within a single frame.
+Use `post_render()` + a persistent internal node to implement temporal feedback
+(e.g. CNN prev-frame input):
+
+```cpp
+class MyEffect : public Effect {
+  std::string node_prev_ = "my_effect_prev";  // internal persistent texture (example name)
+  std::string source_node_;                   // node to capture at end of frame
+
+ public:
+  void set_source_node(const std::string& n) { source_node_ = n; }
+
+  void declare_nodes(NodeRegistry& reg) override {
+    // Use a NodeType whose format matches source_node_ and has CopyDst.
+    reg.declare_node(node_prev_, NodeType::F16X8, -1, -1);
+  }
+
+  void render(WGPUCommandEncoder encoder, const UniformsSequenceParams& params,
+              NodeRegistry& nodes) override {
+    // Read node_prev_: holds source_node_'s output from the *previous* frame.
+    WGPUTextureView prev = nodes.get_view(node_prev_);
+    // ... use prev
+  }
+
+  void post_render(WGPUCommandEncoder enc, NodeRegistry& nodes) override {
+    if (source_node_.empty() || !nodes.has_node(source_node_)) return;
+    // Copy this frame's output into node_prev_ for next frame.
+    WGPUTexelCopyTextureInfo src = {.texture = nodes.get_texture(source_node_)};
+    WGPUTexelCopyTextureInfo dst = {.texture = nodes.get_texture(node_prev_)};
+    if (!src.texture) return;  // external view (e.g. "sink"): nothing to copy
+    // width_/height_: the effect's current render size (assumed members).
+    WGPUExtent3D ext = {(uint32_t)width_, (uint32_t)height_, 1};
+    wgpuCommandEncoderCopyTextureToTexture(enc, &src, &dst, &ext);
+  }
+};
+```
+
+**Why not use `input_nodes_[0]` or the ping-pong alias as prev?**  Aliasing makes
+`source` hold last frame's `sink` only when the effect is first in the
+sequence and no effect after the CNN overwrites `sink`.  Copying in `post_render()`
+captures the CNN output directly, so it is correct regardless of sequence structure.
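+
+The difference can be checked with a CPU-only model, assuming each texture is reduced to a single `int` in a `std::map`; `Registry` and `simulate_frame` are hypothetical. The CNN writes `frame * 10`, a later effect overwrites `sink`, and the end-of-frame capture copies `cnn_out` into `prev`.
+
+```cpp
+#include <cassert>
+#include <map>
+#include <string>
+
+// Hypothetical CPU model: each "texture" is a single int keyed by node name.
+using Registry = std::map<std::string, int>;
+
+// Simulates one frame of: feedback effect -> CNN -> post-process -> sink.
+// Returns the value the feedback effect read from its prev node this frame.
+int simulate_frame(Registry& nodes, int frame) {
+  // render pass: the feedback effect reads last frame's captured CNN output.
+  int prev_seen = nodes["prev"];
+  // The CNN writes a frame-dependent result to its own output node.
+  nodes["cnn_out"] = frame * 10;
+  // A post-CNN effect overwrites sink, so sink no longer equals cnn_out.
+  nodes["sink"] = nodes["cnn_out"] + 1;
+  // post_render pass: every render() has run, so cnn_out is final.
+  nodes["prev"] = nodes["cnn_out"];
+  return prev_seen;
+}
+```
+
+Calling `simulate_frame` for frames 1, 2 and 3 returns 0, 10 and 20: each frame sees exactly the previous frame's CNN output. Reading `sink` instead would have returned the post-processed 11 and 21.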
+
+**Current user**: `GBufferEffect` uses this pattern for `prev.rgb` (CNN temporal
+feedback). `cnn_output_node_` is wired automatically via `wire_dag()` — no
+manual `set_cnn_output_node()` call needed.
+
+### DAG wiring (`wire_dag`)
+
+```cpp
+// Effect base class
+virtual void wire_dag(const std::vector<EffectDAGNode>& dag) {}
+```
+
+Called once from `Sequence::init_effect_nodes()` after all `declare_nodes()`
+calls, so the full DAG is visible.  Override to resolve inter-effect
+dependencies that cannot be expressed through node names alone.
+
+`GBufferEffect::wire_dag()` delegates to the base-class helper
+`find_downstream_output(dag)`, then guards against wiring to `"sink"`:
+
+```cpp
+void GBufferEffect::wire_dag(const std::vector<EffectDAGNode>& dag) {
+  const std::string out = find_downstream_output(dag);
+  if (out != "sink") cnn_output_node_ = out;
+}
+```
+
+`"sink"` is registered as an external view (`texture == nullptr`); copying
+from it in `post_render` would crash.  When no CNN follows the G-buffer stage
+(e.g. debug/deferred sequences), `cnn_output_node_` stays empty and
+`post_render` is a no-op.
+
+#### `Effect::find_downstream_output`
+
+```cpp
+// protected helper — call from wire_dag()
+std::string find_downstream_output(const std::vector<EffectDAGNode>& dag) const;
+```
+
+Returns `output_nodes[0]` of the first direct downstream consumer in the DAG,
+or `""` if none exists.  The helper is agnostic about node semantics — it is
+the **caller's responsibility** to reject unsuitable results (e.g. `"sink"` or
+any other external/terminal node whose texture is not owned by the registry).
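+
+A minimal sketch of the lookup, assuming a simplified `EffectDAGNode` that carries only `input_nodes`/`output_nodes` vectors and an explicit `self` index instead of the member-function form (the real struct may differ). A *direct downstream consumer* is taken here to be the first node, in DAG order, that reads one of this effect's outputs.
+
+```cpp
+#include <algorithm>
+#include <cassert>
+#include <cstddef>
+#include <string>
+#include <vector>
+
+// Hypothetical simplified DAG node; the real EffectDAGNode layout may differ.
+struct EffectDAGNode {
+  std::vector<std::string> input_nodes;
+  std::vector<std::string> output_nodes;
+};
+
+// Returns output_nodes[0] of the first node that reads any of dag[self]'s
+// outputs, or "" if no such consumer exists.
+std::string find_downstream_output(const std::vector<EffectDAGNode>& dag,
+                                   std::size_t self) {
+  assert(self < dag.size());
+  const auto& mine = dag[self].output_nodes;
+  for (std::size_t i = 0; i < dag.size(); ++i) {
+    if (i == self) continue;
+    for (const auto& in : dag[i].input_nodes) {
+      if (std::find(mine.begin(), mine.end(), in) != mine.end()) {
+        return dag[i].output_nodes.empty() ? "" : dag[i].output_nodes[0];
+      }
+    }
+  }
+  return "";  // no downstream consumer
+}
+```
+
+In a debug sequence where the G-buffer feeds a viewer that writes straight to `sink`, this returns `"sink"`, which is exactly the value `GBufferEffect::wire_dag()` filters out.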
+
+`post_render` also null-checks the source texture as a belt-and-suspenders
+guard:
+
+```cpp
+WGPUTexture src_tex = nodes.get_texture(cnn_output_node_);
+if (!src_tex) return;  // external view — no owned texture to copy
+```
+
 ### Node System
 
 **Types**: Match WGSL texture formats
-- `U8X4_NORM`: RGBA8Unorm (default for source/sink/intermediate)
-- `F32X4`: RGBA32Float (HDR, compute outputs)
-- `F16X8`: 8-channel float16 (G-buffer normals/vectors)
-- `DEPTH24`: Depth24Plus (3D rendering)
-- `COMPUTE_F32`: Storage buffer (non-texture compute data)
+- `U8X4_NORM`: RGBA8Unorm — default for source/sink/intermediate; `COPY_SRC|COPY_DST`
+- `F32X4`: RGBA32Float — HDR, compute outputs
+- `F16X8`: 8-channel float16 — G-buffer normals/vectors
+- `DEPTH24`: Depth24Plus — 3D rendering
+- `COMPUTE_F32`: Storage buffer — non-texture compute data
+- `GBUF_ALBEDO`: RGBA16Float — G-buffer albedo/normal MRT; `RENDER_ATTACHMENT|TEXTURE_BINDING|STORAGE_BINDING|COPY_SRC`
+- `GBUF_DEPTH32`: Depth32Float — G-buffer depth; `RENDER_ATTACHMENT|TEXTURE_BINDING|COPY_SRC`
+- `GBUF_R8`: RGBA8Unorm — G-buffer single-channel (shadow, transp); `STORAGE_BINDING|TEXTURE_BINDING|RENDER_ATTACHMENT`
+- `GBUF_RGBA32UINT`: RGBA32Uint — packed feature textures (CNN v3 feat_tex0/1); `STORAGE_BINDING|TEXTURE_BINDING`
+
+`wgpuCommandEncoderCopyTextureToTexture` requires `COPY_SRC` on the source node
+and `COPY_DST` on the destination node.  The `node_prev_` format **must match**
+the source texture format, since `CopyTextureToTexture` requires matching
+formats.  `F16X8` (Rgba16Float, `CopySrc|CopyDst`) matches `GBUF_ALBEDO` (the
+CNNv3Effect output); use `U8X4_NORM` only when the source is also Rgba8Unorm.
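+
+The copy preconditions above can be expressed as a small check. This is a sketch over a hypothetical mirror of the node table (`Format`, `Usage`, `NodeDesc`, `can_copy` and the `k...` constants are not the engine's types); only the listed formats and flags are modelled, with `F16X8` given the Rgba16Float / `CopySrc|CopyDst` description from the copy note. WebGPU's own rule is slightly looser (formats may also differ only in sRGB-ness), which the sketch ignores.
+
+```cpp
+#include <cassert>
+#include <cstdint>
+
+// Hypothetical mirror of the formats used by the nodes below.
+enum class Format { RGBA8Unorm, RGBA16Float };
+
+// Hypothetical usage bits mirroring the flag names in the node table.
+enum Usage : std::uint32_t {
+  COPY_SRC = 1u << 0,
+  COPY_DST = 1u << 1,
+  TEXTURE_BINDING = 1u << 2,
+  STORAGE_BINDING = 1u << 3,
+  RENDER_ATTACHMENT = 1u << 4,
+};
+
+struct NodeDesc {
+  Format format;
+  std::uint32_t usage;
+};
+
+// Simplified copy check: same format, COPY_SRC on src, COPY_DST on dst.
+bool can_copy(const NodeDesc& src, const NodeDesc& dst) {
+  return src.format == dst.format && (src.usage & COPY_SRC) &&
+         (dst.usage & COPY_DST);
+}
+
+// Node descriptions transcribed from the list above (F16X8: copy flags only).
+const NodeDesc kGbufAlbedo{Format::RGBA16Float, RENDER_ATTACHMENT |
+                           TEXTURE_BINDING | STORAGE_BINDING | COPY_SRC};
+const NodeDesc kF16X8{Format::RGBA16Float, COPY_SRC | COPY_DST};
+const NodeDesc kU8X4Norm{Format::RGBA8Unorm, COPY_SRC | COPY_DST};
+const NodeDesc kGbufR8{Format::RGBA8Unorm,
+                       STORAGE_BINDING | TEXTURE_BINDING | RENDER_ATTACHMENT};
+```
+
+Under this model `can_copy(kGbufAlbedo, kF16X8)` holds, while copying `GBUF_ALBEDO` into `U8X4_NORM` fails on format and copying out of `GBUF_R8` fails for lack of `COPY_SRC`.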
 
 **Aliasing**: Compiler detects ping-pong patterns (Effect i writes A reads B, Effect i+1 writes B reads A) and aliases nodes to same backing texture.