Diffstat (limited to 'cnn_v3')
-rw-r--r--  cnn_v3/README.md               4
-rw-r--r--  cnn_v3/docs/HOWTO.md         235
-rw-r--r--  cnn_v3/src/gbuffer_effect.cc  10
3 files changed, 246 insertions(+), 3 deletions(-)
diff --git a/cnn_v3/README.md b/cnn_v3/README.md
index a22d823..f161bf4 100644
--- a/cnn_v3/README.md
+++ b/cnn_v3/README.md
@@ -31,7 +31,9 @@ Add images directly to these directories and commit them.
## Status
-**Design phase.** Architecture defined, G-buffer prerequisite pending.
+**Phase 1 complete.** G-buffer integrated (raster + pack), 35/35 tests pass.
+Training infrastructure ready. U-Net WGSL shaders are next.
+See `cnn_v3/docs/HOWTO.md` for the practical playbook.
See `cnn_v3/docs/CNN_V3.md` for full design.
See `cnn_v2/` for reference implementation.
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
new file mode 100644
index 0000000..88d4bbc
--- /dev/null
+++ b/cnn_v3/docs/HOWTO.md
@@ -0,0 +1,235 @@
+# CNN v3 How-To
+
+Practical playbook for the CNN v3 pipeline: G-buffer effect, training data,
+training the U-Net+FiLM network, and wiring everything into the demo.
+
+See `CNN_V3.md` for the full architecture design.
+
+---
+
+## 1. Using GBufferEffect in the Demo
+
+`GBufferEffect` is a full-class effect (Path B in `doc/EFFECT_WORKFLOW.md`).
+It rasterizes proxy geometry to MRT G-buffer textures and packs them into two
+`rgba32uint` feature textures (`feat_tex0`, `feat_tex1`) consumed by the CNN.
+
+### Registration (already done)
+
+- Shaders in `assets.txt`: `SHADER_GBUF_RASTER`, `SHADER_GBUF_PACK`
+- Source in `cmake/DemoSourceLists.cmake`: `cnn_v3/src/gbuffer_effect.cc`
+- Header included in `src/gpu/demo_effects.h`
+- Test in `src/tests/gpu/test_demo_effects.cc`
+
+### Adding to a Sequence
+
+`GBufferEffect` is not yet registered as a named effect in `seq_compiler.py`
+(no `.seq` syntax integration in Phase 1). Wire it up directly in C++ alongside
+your scene code, or add it to the timeline once the full CNNv3Effect lands.
+
+**C++ wiring example** (e.g. inside a Sequence or main.cc):
+
+```cpp
+#include "../../cnn_v3/src/gbuffer_effect.h"
+
+// Allocate once alongside your scene
+auto gbuf = std::make_shared<GBufferEffect>(
+ ctx, /*inputs=*/{"prev_cnn"}, // or any dummy node
+ /*outputs=*/{"gbuf_feat0", "gbuf_feat1"},
+ /*start=*/0.0f, /*end=*/60.0f);
+
+gbuf->set_scene(&my_scene, &my_camera);
+
+// In render loop, call before CNN pass:
+gbuf->render(encoder, params, nodes);
+```
+
+### Internal passes
+
+Each frame, `GBufferEffect::render()` executes:
+
+1. **Pass 1 — MRT rasterization** (`gbuf_raster.wgsl`)
+ - Proxy box (36 verts) × N objects, instanced
+ - MRT outputs: `gbuf_albedo` (rgba16float), `gbuf_normal_mat` (rgba16float)
+ - Depth test + write into `gbuf_depth` (depth32float)
+
+2. **Pass 2/3 — SDF + Lighting** — TODO (placeholder: shadow=1, transp=0)
+
+3. **Pass 4 — Pack compute** (`gbuf_pack.wgsl`)
+ - Reads all G-buffer textures + `prev_cnn` input
+ - Writes `feat_tex0` + `feat_tex1` (rgba32uint, 20 channels, 32 bytes/pixel)
+
+### Output node names
+
+By default the outputs are named from the `outputs` vector passed to the
+constructor. Use these names when binding the CNN effect input:
+
+```
+outputs[0] → feat_tex0 (rgba32uint: albedo.rgb, normal.xy, depth, depth_grad.xy)
+outputs[1] → feat_tex1 (rgba32uint: mat_id, prev.rgb, mip1.rgb, mip2.rgb, shadow, transp)
+```
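As a rough illustration of how two feature values can share one `rgba32uint` component, here is a Python sketch of `pack2x16float`-style half-float packing. This is an assumption for illustration only: the authoritative bit layout lives in `gbuf_pack.wgsl`, and feat_tex1's 12 channels cannot all be half floats (4 words hold only 8 fp16 values), so some of its channels must use lower precision.

```python
import numpy as np

def pack2x16float(a: float, b: float) -> int:
    # Mimics WGSL pack2x16float: two f32 -> one u32 (a lands in the low 16 bits).
    lo, hi = np.array([a, b], dtype=np.float16).view(np.uint16)
    return int(lo) | (int(hi) << 16)

def unpack2x16float(word: int) -> np.ndarray:
    # Inverse: one u32 -> two f32 recovered from the half-float halves.
    halves = np.array([word & 0xFFFF, (word >> 16) & 0xFFFF], dtype=np.uint16)
    return halves.view(np.float16).astype(np.float32)

# feat_tex0's 8 channels fit exactly into 4 u32 words as half-float pairs
# (channel order here is assumed, not taken from the shader):
albedo_r, albedo_g, albedo_b = 0.5, 0.25, 0.125
normal_x, normal_y = 0.75, 0.5
depth, grad_x, grad_y = 0.5, 0.0, 0.0
texel0 = [
    pack2x16float(albedo_r, albedo_g),
    pack2x16float(albedo_b, normal_x),
    pack2x16float(normal_y, depth),
    pack2x16float(grad_x, grad_y),
]
```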
+
+### Scene data
+
+Call `set_scene(scene, camera)` before the first render. The effect uploads
+`GlobalUniforms` (view-proj, camera pos, resolution) and `ObjectData` (model
+matrix, color) to GPU storage buffers each frame.
+
+---
+
+## 2. Preparing Training Data
+
+CNN v3 supports two data sources: Blender renders and real photos.
+
+### 2a. From Blender Renders
+
+```bash
+# 1. In Blender: run the export script (requires Blender 3.x+)
+blender --background scene.blend --python cnn_v3/training/blender_export.py \
+ -- --output /tmp/renders/ --frames 200
+
+# 2. Pack into sample directory
+python3 cnn_v3/training/pack_blender_sample.py \
+ --render-dir /tmp/renders/frame_0001/ \
+ --output dataset/blender/sample_0001/
+```
+
+Each sample directory contains:
+```
+sample_XXXX/
+ albedo.png — RGB uint8 (material color, pre-lighting)
+ normal.png — RG uint8 (oct-encoded XY, remap [0,1])
+ depth.png — R uint16 (1/z normalized, 16-bit)
+ matid.png — R uint8 (object index / 255)
+ shadow.png — R uint8 (0=dark, 255=lit)
+ transp.png — R uint8 (0=opaque, 255=transparent)
+ target.png — RGB/RGBA (stylized ground truth)
+```
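The oct encoding used for `normal.png` can be sketched as follows. This is a minimal reference assuming the standard octahedral mapping; the exact convention (axis order, hemisphere fold) is fixed by `blender_export.py`.

```python
import numpy as np

def oct_encode(n: np.ndarray) -> np.ndarray:
    # Unit normal -> octahedral XY, remapped to [0, 1] for the RG channels.
    n = n / np.sum(np.abs(n))          # project onto the L1 unit octahedron
    xy = n[:2]
    if n[2] < 0.0:                     # fold the lower hemisphere outward
        xy = (1.0 - np.abs(n[[1, 0]])) * np.where(n[:2] >= 0.0, 1.0, -1.0)
    return xy * 0.5 + 0.5

def oct_decode(e: np.ndarray) -> np.ndarray:
    # [0, 1] XY -> unit normal (inverse of oct_encode).
    f = e * 2.0 - 1.0
    n = np.array([f[0], f[1], 1.0 - abs(f[0]) - abs(f[1])])
    if n[2] < 0.0:                     # unfold the lower hemisphere
        n[:2] = (1.0 - np.abs(n[[1, 0]])) * np.where(n[:2] >= 0.0, 1.0, -1.0)
    return n / np.linalg.norm(n)
```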
+
+### 2b. From Real Photos
+
+Geometric channels are zeroed; the network degrades gracefully due to
+channel-dropout training.
+
+```bash
+python3 cnn_v3/training/pack_photo_sample.py \
+ --photo cnn_v3/training/input/photo1.jpg \
+ --output dataset/photos/sample_001/
+```
+
+The output `target.png` defaults to the input photo (no style). Copy in
+your stylized version as `target.png` before training.
+
+### Dataset layout
+
+```
+dataset/
+ blender/
+ sample_0001/ sample_0002/ ...
+ photos/
+ sample_001/ sample_002/ ...
+```
+
+Mix freely; the dataloader treats all sample directories uniformly.
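The discovery step of such a dataloader could look like this. A sketch only: `list_samples` and `missing_channels` are hypothetical names, since the actual loader ships with the not-yet-written `train_cnn_v3.py`. Channels absent on disk (all geometry, for photo samples) would be zero-filled at load time.

```python
from pathlib import Path

GEOM_CHANNELS = ["albedo", "normal", "depth", "matid", "shadow", "transp"]

def list_samples(root) -> list:
    # A sample is any directory that contains a target.png ground truth,
    # regardless of whether it lives under blender/ or photos/.
    return sorted(p.parent for p in Path(root).rglob("target.png"))

def missing_channels(sample_dir: Path) -> list:
    # Channels to zero-fill for this sample.
    return [c for c in GEOM_CHANNELS if not (sample_dir / f"{c}.png").exists()]
```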
+
+---
+
+## 3. Training
+
+*(Network not yet implemented — this section will be filled as Phase 3+ lands.)*
+
+**Planned command:**
+```bash
+python3 cnn_v3/training/train_cnn_v3.py \
+ --dataset dataset/ \
+ --epochs 500 \
+ --output cnn_v3/weights/cnn_v3_weights.bin
+```
+
+**FiLM conditioning** during training:
+- Beat/audio inputs are randomized per sample
+- Network learns to produce varied styles from same geometry
+
+**Validation:**
+```bash
+python3 cnn_v3/training/train_cnn_v3.py --validate \
+ --checkpoint cnn_v3/weights/cnn_v3_weights.bin \
+ --input test_frame.png
+```
+
+---
+
+## 4. Running the CNN v3 Effect (Future)
+
+Once the C++ CNNv3Effect exists:
+
+```seq
+# BPM 120
+SEQUENCE 0 0 "Scene with CNN v3"
+ EFFECT + GBufferEffect prev_cnn -> gbuf_feat0 gbuf_feat1 0 60
+ EFFECT + CNNv3Effect gbuf_feat0 gbuf_feat1 -> sink 0 60
+```
+
+FiLM parameters are uploaded via uniform each frame:
+```cpp
+cnn_v3_effect->set_film_params(
+ params.beat_phase, params.beat_time / 8.0f, params.audio_intensity,
+ style_p0, style_p1);
+```
+
+---
+
+## 5. Per-Pixel Validation
+
+The CNN v3 design requires exact parity between PyTorch, WGSL (HTML), and C++.
+
+*(Validation tooling not yet implemented.)*
+
+**Planned workflow:**
+1. Export test input + weights as JSON
+2. Run Python reference → save per-pixel output
+3. Run HTML WebGPU tool → compare against Python
+4. Run C++ `cnn_v3_test` tool → compare against Python
+5. All comparisons must pass at ≤ 1/255 per pixel
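The comparison itself reduces to a max-absolute-difference check. A sketch of what the checker would assert (the function name is hypothetical until the validation tooling lands):

```python
import numpy as np

TOL = 1.0 / 255.0  # one 8-bit quantization step

def parity_ok(reference: np.ndarray, candidate: np.ndarray) -> bool:
    # Per-pixel, per-channel max absolute difference between two float images.
    assert reference.shape == candidate.shape
    diff = np.abs(reference.astype(np.float64) - candidate.astype(np.float64))
    return float(diff.max()) <= TOL
```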
+
+---
+
+## 6. Phase Status
+
+| Phase | Status | Notes |
+|-------|--------|-------|
+| 1 — G-buffer (raster + pack) | ✅ Done | Integrated, 35/35 tests pass |
+| 1 — G-buffer (SDF + shadow passes) | TODO | Placeholder in place |
+| 2 — Training infrastructure | ✅ Done | blender_export.py, pack_*_sample.py |
+| 3 — WGSL U-Net shaders | TODO | enc/dec/bottleneck/FiLM |
+| 4 — C++ CNNv3Effect | TODO | FiLM uniform upload |
+| 5 — Parity validation | TODO | Test vectors, ≤1/255 |
+
+---
+
+## 7. Quick Troubleshooting
+
+**GBufferEffect renders nothing / albedo is black**
+- Check `set_scene()` was called before `render()`
+- Verify scene has at least one object
+- Check camera matrix is not degenerate (near/far, aspect)
+
+**Pack shader fails to compile**
+- `gbuf_pack.wgsl` uses no `#include`s; ShaderComposer compose is a no-op
+- Check `ASSET_SHADER_GBUF_PACK` resolves in assets.txt
+
+**Raster shader fails with `#include "common_uniforms"` error**
+- `ShaderComposer::Get().Compose({"common_uniforms"}, src)` must be called
+ before passing to `wgpuDeviceCreateShaderModule` — already done in effect.cc
+
+**G-buffer outputs wrong resolution**
+- `resize()` is not yet implemented in GBufferEffect; textures are fixed
+  at construction size. Support will be added when resizing is needed.
+
+---
+
+## See Also
+
+- `cnn_v3/docs/CNN_V3.md` — Full architecture design (U-Net, FiLM, feature layout)
+- `doc/EFFECT_WORKFLOW.md` — General effect integration guide
+- `cnn_v2/docs/CNN_V2.md` — Reference implementation (simpler, operational)
+- `src/tests/gpu/test_demo_effects.cc` — GBufferEffect construction test
diff --git a/cnn_v3/src/gbuffer_effect.cc b/cnn_v3/src/gbuffer_effect.cc
index fb0146e..750188f 100644
--- a/cnn_v3/src/gbuffer_effect.cc
+++ b/cnn_v3/src/gbuffer_effect.cc
@@ -4,6 +4,7 @@
#include "gbuffer_effect.h"
#include "3d/object.h"
#include "gpu/gpu.h"
+#include "gpu/shader_composer.h"
#include "util/fatal_error.h"
#include "util/mini_math.h"
#include <cstring>
@@ -390,9 +391,12 @@ void GBufferEffect::create_raster_pipeline() {
return; // Asset not loaded yet; pipeline creation deferred.
}
+ const std::string composed =
+ ShaderComposer::Get().Compose({"common_uniforms"}, src);
+
WGPUShaderSourceWGSL wgsl_src = {};
wgsl_src.chain.sType = WGPUSType_ShaderSourceWGSL;
- wgsl_src.code = str_view(src);
+ wgsl_src.code = str_view(composed.c_str());
WGPUShaderModuleDescriptor shader_desc = {};
shader_desc.nextInChain = &wgsl_src.chain;
@@ -466,9 +470,11 @@ void GBufferEffect::create_pack_pipeline() {
return;
}
+ const std::string composed = ShaderComposer::Get().Compose({}, src);
+
WGPUShaderSourceWGSL wgsl_src = {};
wgsl_src.chain.sType = WGPUSType_ShaderSourceWGSL;
- wgsl_src.code = str_view(src);
+ wgsl_src.code = str_view(composed.c_str());
WGPUShaderModuleDescriptor shader_desc = {};
shader_desc.nextInChain = &wgsl_src.chain;