 PROJECT_CONTEXT.md |   3
 doc/CNN.md         |  40
 doc/CNN_EFFECT.md  | 223
 3 files changed, 259 insertions(+), 7 deletions(-)
diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md
index ff6bc48..fb876e5 100644
--- a/PROJECT_CONTEXT.md
+++ b/PROJECT_CONTEXT.md
@@ -35,6 +35,7 @@
 - **Audio:** Sample-accurate sync. Zero heap allocations per frame. Variable tempo. Comprehensive tests.
 - **Shaders:** Parameterized effects (UniformHelper, .seq syntax). Modular WGSL composition.
 - **3D:** Hybrid SDF/rasterization with BVH. Binary scene loader. Blender pipeline.
+- **Effects:** CNN post-processing foundation (single-layer, modular snippets, ready for training integration).
 - **Build:** Asset dependency tracking. Size measurement. Hot-reload (debug-only).
 - **Testing:** **36/36 passing (100%)**
@@ -54,7 +55,7 @@ See `TODO.md` for current priorities and active tasks.
 - `doc/CONTRIBUTING.md` - Development protocols
 
 **Technical Reference:**
-- Core: `ASSET_SYSTEM.md`, `SEQUENCE.md`, `TRACKER.md`, `3D.md`
+- Core: `ASSET_SYSTEM.md`, `SEQUENCE.md`, `TRACKER.md`, `3D.md`, `CNN_EFFECT.md`
 - Formats: `SCENE_FORMAT.md`, `MASKING_SYSTEM.md`
 - Tools: `BUILD.md`, `WORKSPACE_SYSTEM.md`, `SIZE_MEASUREMENT.md`
 
diff --git a/doc/CNN.md b/doc/CNN.md
--- a/doc/CNN.md
+++ b/doc/CNN.md
@@ -1,11 +1,15 @@
 # Convolutional Neural Net Shader (CNN) post-processing
 
+**Status:** ✅ Foundation implemented (single-layer, expandable to multi-pass)
+
 ## Idea
 
 Have the input 3d scene be processed by a multi-layer CNN trained on the side.
 Input: some rendered scene.
 Output: 'stylized' scene with CNN post-processing.
 
+**See `doc/CNN_EFFECT.md` for implementation details, usage, and API reference.**
+
 ## Shader implementation
 
 ### input / output
@@ -36,16 +40,40 @@
 we need 3 or 4 layer ?
 
 Several different shaders for each layer.
 Ping-pong for input/output texture buffer between each layers?
 
-## Training
+## Implementation Status
+
+**Completed:**
+- ✅ Modular WGSL shader architecture (6 snippet files)
+- ✅ CNNEffect C++ class (single-layer rendering)
+- ✅ ShaderComposer integration (#include resolution)
+- ✅ Asset registration (7 new shader assets)
+- ✅ Test coverage (test_demo_effects.cc)
+- ✅ Placeholder identity weights for testing
+
+**Size:** ~3-4 KB shader code + ~2-4 KB weights = **5-8 KB total**
+
+**Pending:**
+- ⏳ Training script (`scripts/train_cnn.py`) to generate real weights
+- ⏳ Multi-layer rendering with ping-pong textures
+- ⏳ Weight quantization for size optimization
+
+---
+
+## Training (To Be Implemented)
 
 The layer weight/bias data are hard-coded in the shaders.
-Need training with external python script.
-File: CNN.py contains an example of what the training script could be.
-Just an example, doesn't match our requirement yet.
+Training workflow:
+
+1. Prepare image pairs (before: raw render, after: target style)
+2. Run `python scripts/train_cnn.py --input scene.png --target stylized.png`
+3. Script generates `cnn_weights_generated.wgsl`
+4. Rebuild: `cmake --build build -j4`
+
+**Reference:** File `CNN.py` contains training example (needs adaptation).
 
 Need a repository of reference image pairs (before/after) for training and validation.
-Each input image is randomly sampled into 3x3 patch of (r,g,b,1/z) input samples.
+Each input image is randomly sampled into 3×3 patch of (r,g,b,1/z) input samples.
 And trained to match the (r,g,b,a) output.
-Training generates the .wgsl code for layers' shaders, and the c++ code for the post-processing 'Effect'.
+Training generates the .wgsl code for layers' shaders.
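The training workflow added to `doc/CNN.md` above is described only in prose. As a minimal sketch of what `scripts/train_cnn.py` could look like, assuming PyTorch, a single 3×3 layer, and a tanh activation, the snippet below fits a convolution from (r, g, b, 1/z) patches to (r, g, b, a) targets and formats the result in the `weights_layer0` / `bias_layer0` layout described in `doc/CNN_EFFECT.md`. The function names and the column ordering of the emitted matrices are assumptions, not the project's actual script.

```python
# Hypothetical sketch only -- scripts/train_cnn.py does not exist yet.
# Fits one 3x3 convolution from (r, g, b, 1/z) patches to (r, g, b, a) targets
# and formats the weights in the layout expected by cnn_weights_generated.wgsl.
import torch
import torch.nn as nn


def train_single_layer(inputs, targets, epochs=100, lr=1e-3):
    """inputs/targets: float32 tensors of shape (N, 4, H, W) with values in [0, 1]."""
    conv = nn.Conv2d(4, 4, kernel_size=3, padding=1)  # one 4x4 matrix per 3x3 tap, plus bias
    opt = torch.optim.Adam(conv.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(torch.tanh(conv(inputs)), targets)  # tanh activation is an assumption
        loss.backward()
        opt.step()
    return conv


def emit_wgsl(conv, layer=0):
    """Dump trained weights as WGSL constants (matrix column order is an assumption)."""
    w, b = conv.weight.detach(), conv.bias.detach()  # shapes (4, 4, 3, 3) and (4,)
    taps = []
    for ky in range(3):
        for kx in range(3):
            # 16 scalars per tap; must match how cnn_conv3x3.wgsl multiplies weights and samples.
            vals = ", ".join(f"{w[o, i, ky, kx].item():.6f}" for i in range(4) for o in range(4))
            taps.append(f"    mat4x4<f32>({vals})")
    bias = ", ".join(f"{v.item():.6f}" for v in b)
    return (f"const weights_layer{layer}: array<mat4x4<f32>, 9> = array(\n"
            + ",\n".join(taps)
            + f"\n);\nconst bias_layer{layer} = vec4<f32>({bias});\n")
```

The emitted text would replace the placeholder identity weights in `cnn_weights_generated.wgsl`; whatever ordering is chosen has to agree with how the shader applies each per-tap matrix.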
diff --git a/doc/CNN_EFFECT.md b/doc/CNN_EFFECT.md
new file mode 100644
index 0000000..9045739
--- /dev/null
+++ b/doc/CNN_EFFECT.md
@@ -0,0 +1,223 @@
+# CNN Post-Processing Effect
+
+Neural network-based stylization for rendered scenes.
+
+---
+
+## Overview
+
+The CNN effect applies trainable convolutional neural network layers to post-process 3D rendered output, enabling artistic stylization (e.g., painterly, sketch, cel-shaded effects) with minimal runtime overhead.
+
+**Key Features:**
+- Multi-layer convolutions (3×3, 5×5, 7×7 kernels)
+- Modular WGSL shader architecture
+- Hardcoded weights (trained offline)
+- Residual connections for stable learning
+- ~5-8 KB binary footprint
+
+---
+
+## Architecture
+
+### File Structure
+
+```
+src/gpu/effects/
+  cnn_effect.h                 # CNNEffect class
+  cnn_effect.cc                # Implementation
+
+workspaces/main/shaders/cnn/
+  cnn_activation.wgsl          # Activation functions (tanh, ReLU, sigmoid, leaky_relu)
+  cnn_conv3x3.wgsl             # 3×3 convolution
+  cnn_conv5x5.wgsl             # 5×5 convolution
+  cnn_conv7x7.wgsl             # 7×7 convolution
+  cnn_weights_generated.wgsl   # Weight arrays (generated by training script)
+  cnn_layer.wgsl               # Main shader (composes above snippets)
+```
+
+### Shader Composition
+
+`cnn_layer.wgsl` uses `#include` directives (resolved by `ShaderComposer`):
+```wgsl
+#include "common_uniforms"
+#include "cnn_activation"
+#include "cnn_conv3x3"
+#include "cnn_weights_generated"
+```
+
+---
+
+## Usage
+
+### C++ Integration
+
+```cpp
+#include "gpu/effects/cnn_effect.h"
+
+// Create effect (1 layer for now, expandable to 4)
+auto cnn = std::make_shared<CNNEffect>(ctx, /*num_layers=*/1);
+
+// Add to timeline
+timeline.add_effect(cnn, start_time, end_time);
+```
+
+### Timeline Example
+
+```
+SEQUENCE 10.0 0
+  EFFECT CNNEffect 10.0 15.0 0   # Apply CNN stylization for 5 seconds
+```
+
+---
+
+## Training Workflow (Planned)
+
+**Step 1: Prepare Training Data**
+```bash
+# Collect before/after image pairs
+# - Before: Raw 3D render
+# - After: Target artistic style (hand-painted, filtered, etc.)
+```
+
+**Step 2: Train Network**
+```bash
+python scripts/train_cnn.py \
+  --input rendered_scene.png \
+  --target stylized_scene.png \
+  --layers 3 \
+  --kernel_sizes 3,5,3 \
+  --epochs 100
+```
+
+**Step 3: Export Weights**
+```python
+# scripts/train_cnn.py automatically generates:
+# workspaces/main/shaders/cnn/cnn_weights_generated.wgsl
+```
+
+**Step 4: Rebuild**
+```bash
+cmake --build build -j4
+```
+
+---
+
+## Implementation Details
+
+### Convolution Function Signature
+
+```wgsl
+fn cnn_conv3x3(
+  tex: texture_2d<f32>,
+  samp: sampler,
+  uv: vec2<f32>,
+  resolution: vec2<f32>,
+  weights: array<mat4x4<f32>, 9>,  // 9 samples × 4×4 matrix
+  bias: vec4<f32>
+) -> vec4<f32>
+```
+
+- Samples 9 pixels (3×3 neighborhood)
+- Applies 4×4 weight matrix per sample (RGBA channels)
+- Returns weighted sum + bias (pre-activation)
+
+### Weight Storage
+
+Weights are stored as WGSL constants:
+```wgsl
+const weights_layer0: array<mat4x4<f32>, 9> = array(
+  mat4x4<f32>(1.0, 0.0, 0.0, 0.0, ...),  // Center pixel
+  mat4x4<f32>(0.0, 0.0, 0.0, 0.0, ...),  // Neighbor 1
+  // ... 7 more matrices
+);
+const bias_layer0 = vec4<f32>(0.0, 0.0, 0.0, 0.0);
+```
+
+### Residual Connection
+
+Final layer adds original input:
+```wgsl
+if (params.use_residual != 0) {
+  let input = textureSample(txt, smplr, uv);
+  result = input + result * 0.3;  // Blend 30% stylization
+}
+```
+
+---
+
+## Multi-Layer Rendering (Future)
+
+For N layers, use ping-pong textures:
+
+```
+Pass 0: input  → temp_a  (conv + activate)
+Pass 1: temp_a → temp_b  (conv + activate)
+Pass 2: temp_b → temp_a  (conv + activate)
+Pass 3: temp_a → screen  (conv + activate + residual)
+```
+
+**Current Status:** Single-layer implementation. Multi-pass infrastructure ready but not exposed.
+
+---
+
+## Size Budget
+
+| Component | Size | Notes |
+|-----------|------|-------|
+| `cnn_activation.wgsl` | ~200 B | 4 activation functions |
+| `cnn_conv3x3.wgsl` | ~400 B | 3×3 convolution logic |
+| `cnn_conv5x5.wgsl` | ~600 B | 5×5 convolution logic |
+| `cnn_conv7x7.wgsl` | ~800 B | 7×7 convolution logic |
+| `cnn_layer.wgsl` | ~800 B | Main shader |
+| `cnn_effect.cc` | ~300 B | C++ implementation |
+| **Weights (variable)** | **2-6 KB** | Depends on network depth/width |
+| **Total** | **5-9 KB** | Acceptable for 64k demo |
+
+**Optimization Strategies:**
+- Quantize weights (float32 → int8)
+- Prune near-zero weights
+- Share weights across layers
+- Use separable convolutions (not yet implemented)
+
+---
+
+## Testing
+
+```bash
+# Run effect test
+./build/test_demo_effects
+
+# Visual test in demo
+./build/demo64k   # CNN appears in timeline if added
+```
+
+**Test Coverage:**
+- Construction/initialization
+- Shader compilation
+- Bind group creation
+- Render pass execution
+
+---
+
+## Troubleshooting
+
+**Shader compilation fails:**
+- Check `cnn_weights_generated.wgsl` syntax
+- Verify all snippets registered in `shaders.cc::InitShaderComposer()`
+
+**Black/corrupted output:**
+- Weights likely untrained (using placeholder identity)
+- Check residual blending factor (0.3 default)
+
+**Performance issues:**
+- Reduce kernel sizes (7×7 → 3×3)
+- Decrease layer count
+- Profile with `--hot-reload` to measure frame time
+
+---
+
+## References
+
+- **Shader Composition:** `doc/SEQUENCE.md` (shader parameters)
+- **Effect System:** `src/gpu/effect.h` (Effect base class)
+- **Training (external):** TensorFlow/PyTorch CNN tutorials
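The size budget in `doc/CNN_EFFECT.md` lists "Quantize weights (float32 → int8)" as an optimization strategy without spelling it out. One possible shape for that step, offered as an assumption rather than anything implemented, is symmetric per-layer quantization with a single scale factor:

```python
# Illustrative sketch only -- no quantization code exists in the repo yet.
# Symmetric int8 quantization of a layer's weights with one float scale,
# plus a round-trip check of the reconstruction error.
import numpy as np


def quantize_int8(weights: np.ndarray):
    """Return (int8 weights, scale) such that weights ~= int8_values * scale."""
    scale = max(float(np.abs(weights).max()), 1e-8) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale


if __name__ == "__main__":
    w = np.random.uniform(-0.5, 0.5, size=(9, 4, 4)).astype(np.float32)  # 9 taps of 4x4 weights
    q, scale = quantize_int8(w)
    err = float(np.abs(dequantize(q, scale) - w).max())
    print(f"scale={scale:.6f}  max round-trip error={err:.6f}")  # bounded by roughly scale/2
```

The size win only materializes if the generated `cnn_weights_generated.wgsl` (or a weight buffer) actually stores the int8 values and rescales them at load time or in the shader; re-emitting them as float32 literals would save nothing.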
