# GPU Procedural Texture Generation - Phase 4: Texture Composition

## Overview

Enable compute shaders to sample existing procedural textures as inputs, allowing multi-stage texture generation (blend, mask, modulate).

## Design

### Extended API

```cpp
struct GpuProceduralInputs {
  std::vector<std::string> input_texture_names;  // Names of existing textures
  std::vector<WGPUTextureView> input_views;      // Resolved views (internal)
};

void TextureManager::create_gpu_composite_texture(
    const std::string& name,
    const std::string& shader_func,
    const GpuProceduralParams& params,
    const GpuProceduralInputs& inputs);
```

### Shader Pattern

```wgsl
// gen_blend.wgsl - Blend two textures
@group(0) @binding(0) var output_tex: texture_storage_2d<rgba8unorm, write>;
@group(0) @binding(1) var<uniform> params: BlendParams;
@group(0) @binding(2) var input_a: texture_2d<f32>;
@group(0) @binding(3) var input_b: texture_2d<f32>;
@group(0) @binding(4) var tex_sampler: sampler;

struct BlendParams {
  width: u32,
  height: u32,
  blend_factor: f32,  // 0.0 = all A, 1.0 = all B
  _pad0: f32,
}

@compute @workgroup_size(8, 8, 1)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  if (id.x >= params.width || id.y >= params.height) { return; }

  // Sample at texel centers (+0.5) so linear filtering does not mix in neighbors
  let uv = (vec2<f32>(id.xy) + vec2<f32>(0.5)) /
           vec2<f32>(f32(params.width), f32(params.height));

  let color_a = textureSampleLevel(input_a, tex_sampler, uv, 0.0);
  let color_b = textureSampleLevel(input_b, tex_sampler, uv, 0.0);
  let blended = mix(color_a, color_b, params.blend_factor);

  textureStore(output_tex, id.xy, blended);
}
```
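On the host side, the uniform data passed to dispatch_composite has to match the WGSL `BlendParams` layout byte for byte. A minimal sketch of a mirror struct, assuming the host code fills it directly (the C++ struct reuses the WGSL name purely for illustration):

```cpp
#include <cstddef>
#include <cstdint>

// Host-side mirror of the WGSL BlendParams uniform (illustrative).
// u32 and f32 members are 4-byte aligned in WGSL, so the four scalars
// pack into 16 bytes with no implicit padding; _pad0 keeps the explicit
// 16-byte size of the WGSL declaration.
struct BlendParams {
  uint32_t width;
  uint32_t height;
  float blend_factor;  // 0.0 = all A, 1.0 = all B
  float _pad0;
};

static_assert(sizeof(BlendParams) == 16, "must match WGSL struct size");
static_assert(offsetof(BlendParams, blend_factor) == 8,
              "must match WGSL member offset");
```

The `static_assert`s catch layout drift at compile time if a field is added to only one side.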

### Bind Group Layout Changes

**Current (parameter-only generators, no input textures):**
- Binding 0: Storage texture (write)
- Binding 1: Uniform buffer

**New (composite generators with K = `num_input_textures` inputs):**
- Binding 0: Storage texture (write)
- Binding 1: Uniform buffer
- Bindings 2 to 1 + K: Input textures (read-only `texture_2d<f32>`)
- Binding 2 + K: Sampler (shared across all inputs)

### Implementation

#### 1. Extend ComputePipelineInfo

```cpp
struct ComputePipelineInfo {
  WGPUComputePipeline pipeline;
  const char* shader_code;
  size_t uniform_size;
  int num_input_textures;  // NEW: 0 for gen_noise/perlin/grid, 2+ for composite
};
```

#### 2. Update get_or_create_compute_pipeline

```cpp
WGPUComputePipeline get_or_create_compute_pipeline(
    const std::string& func_name,
    const char* shader_code,
    size_t uniform_size,
    int num_input_textures = 0);  // Default 0 (backward compatible)
```
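One way to keep the default argument backward compatible is to cache pipelines by `func_name` and remember the input count each one was built with. The sketch below uses a hypothetical `FakePipeline` in place of `WGPUComputePipeline` so it compiles without `webgpu.h`; `PipelineCache` is likewise an illustrative name, not the existing TextureManager code.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for WGPUComputePipeline so the sketch compiles
// without webgpu.h; the real code would store the opaque handle.
struct FakePipeline {
  std::string func_name;
  int num_input_textures;
};

// One pipeline per shader function, created on first use. Returns nullptr
// if a cached entry was built with a different num_input_textures, because
// its bind group layout would not match the caller's bindings.
class PipelineCache {
 public:
  const FakePipeline* get_or_create(const std::string& func_name,
                                    int num_input_textures = 0) {
    auto it = cache_.find(func_name);
    if (it == cache_.end()) {
      // First request: "create" the pipeline and record its input count.
      it = cache_.emplace(func_name,
                          FakePipeline{func_name, num_input_textures}).first;
    }
    if (it->second.num_input_textures != num_input_textures) {
      return nullptr;  // Layout mismatch.
    }
    return &it->second;
  }

  std::size_t size() const { return cache_.size(); }

 private:
  std::map<std::string, FakePipeline> cache_;
};
```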

Dynamically create bind group layout based on `num_input_textures`:
```cpp
// Binding 0: output texture
// Binding 1: uniform buffer
// Binding 2 to (2 + num_input_textures - 1): input textures
// Binding (2 + num_input_textures): sampler
```
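The binding rules above can be expressed as a small pure function that the layout code iterates over. This is a sketch with hypothetical names (`BindingSlot`, `composite_binding_slots`); it also assumes that generators with zero inputs get no sampler entry, so the Phase 1-3 layout stays exactly bindings 0 and 1.

```cpp
#include <vector>

// Kinds of entries in a composite bind group layout (illustrative).
enum class BindingKind { OutputStorageTexture, UniformBuffer, InputTexture, Sampler };

struct BindingSlot {
  unsigned binding;  // @binding(N) index in the WGSL shader
  BindingKind kind;
};

// Lists the layout entries for a generator with num_input_textures inputs.
std::vector<BindingSlot> composite_binding_slots(int num_input_textures) {
  std::vector<BindingSlot> slots = {
      {0, BindingKind::OutputStorageTexture},
      {1, BindingKind::UniformBuffer},
  };
  if (num_input_textures <= 0) {
    return slots;  // Phase 1-3 layout: no inputs, no sampler.
  }
  for (int i = 0; i < num_input_textures; ++i) {
    slots.push_back({2u + static_cast<unsigned>(i), BindingKind::InputTexture});
  }
  // The single shared sampler follows the last input texture.
  slots.push_back({2u + static_cast<unsigned>(num_input_textures),
                   BindingKind::Sampler});
  return slots;
}
```

Each slot would then map to one bind group layout entry (storage texture, uniform buffer, sampled texture, or filtering sampler).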

#### 3. New dispatch_composite

```cpp
void dispatch_composite(const std::string& func_name,
                       WGPUTexture target,
                       const GpuProceduralParams& params,
                       const void* uniform_data,
                       size_t uniform_size,
                       const std::vector<WGPUTextureView>& input_views);
```

Create bind group with:
- Output storage texture (binding 0)
- Uniform buffer (binding 1)
- Input texture views (binding 2+)
- Linear sampler (binding 2 + `input_views.size()`)
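Since the shaders declare `@workgroup_size(8, 8, 1)`, dispatch_composite needs to round the dispatch size up so edge texels are covered. A sketch of that arithmetic (`workgroups_for` is a hypothetical helper):

```cpp
#include <cstdint>

// Workgroup counts for a shader declared with @workgroup_size(8, 8, 1).
struct WorkgroupCount {
  uint32_t x, y, z;
};

constexpr uint32_t kWorkgroupSize = 8;

// Ceiling division so sizes that are not multiples of 8 are fully covered.
constexpr WorkgroupCount workgroups_for(uint32_t width, uint32_t height) {
  return {(width + kWorkgroupSize - 1) / kWorkgroupSize,
          (height + kWorkgroupSize - 1) / kWorkgroupSize, 1};
}
```

The three counts would be passed to the compute pass dispatch call (`wgpuComputePassEncoderDispatchWorkgroups` in current `webgpu.h` headers); the shader's early `return` discards the extra invocations when the size is not a multiple of 8.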

#### 4. Convenience Wrapper

```cpp
void create_gpu_composite_texture(const std::string& name,
                                  const std::string& shader_func,
                                  const GpuProceduralParams& params,
                                  const std::vector<std::string>& input_names);
```

Resolve `input_names` → `WGPUTextureView[]` via `get_texture_view()`.
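A sketch of the resolution step, assuming a missing input should abort the composite rather than bind a null view. `TextureViewHandle` stands in for `WGPUTextureView`, and the lookup is passed as a callable so the example runs without a TextureManager; the real code would call `get_texture_view()` directly.

```cpp
#include <functional>
#include <string>
#include <vector>

// Stand-in for WGPUTextureView (an opaque pointer in webgpu.h).
using TextureViewHandle = const void*;

// Resolves input names in order. Returns false and leaves out_views empty
// if any input is missing, so dispatch_composite never sees a null view.
bool resolve_input_views(
    const std::vector<std::string>& input_names,
    const std::function<TextureViewHandle(const std::string&)>& get_texture_view,
    std::vector<TextureViewHandle>* out_views) {
  out_views->clear();
  out_views->reserve(input_names.size());
  for (const std::string& name : input_names) {
    TextureViewHandle view = get_texture_view(name);
    if (view == nullptr) {
      out_views->clear();
      return false;  // Input not generated yet, or misspelled.
    }
    out_views->push_back(view);
  }
  return true;
}
```

Returning `false` instead of throwing is an assumption that suits builds compiled without exceptions.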

### Example Shaders

**gen_blend.wgsl** (~150 bytes)
- Blend two textures with lerp factor

**gen_mask.wgsl** (~180 bytes)
- Multiply texture A by texture B (use grid as mask)

**gen_modulate.wgsl** (~200 bytes)
- Multiply texture color by noise intensity

**gen_fbm_noise.wgsl** (~250 bytes)
- FBM using multiple octaves of pre-generated noise textures
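For the CPU-vs-GPU comparison in the integration test, a CPU reference of each per-texel operation is handy. The sketch below is an assumed interpretation of the one-line descriptions above; in particular, `modulate` treating the noise texture's red channel as its intensity is a guess, not something the shaders specify.

```cpp
#include <algorithm>
#include <array>

// RGBA color with channels in [0, 1].
using Rgba = std::array<float, 4>;

// gen_blend: the same lerp as WGSL mix(a, b, t).
Rgba blend(const Rgba& a, const Rgba& b, float t) {
  Rgba out{};
  for (int i = 0; i < 4; ++i) out[i] = a[i] + (b[i] - a[i]) * t;
  return out;
}

// gen_mask: component-wise multiply of A by mask B.
Rgba mask(const Rgba& a, const Rgba& b) {
  Rgba out{};
  for (int i = 0; i < 4; ++i) out[i] = a[i] * b[i];
  return out;
}

// gen_modulate: scale RGB by noise intensity (red channel, assumed),
// keeping the color's alpha unchanged.
Rgba modulate(const Rgba& color, const Rgba& noise) {
  const float intensity = std::clamp(noise[0], 0.0f, 1.0f);
  return {color[0] * intensity, color[1] * intensity, color[2] * intensity,
          color[3]};
}
```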

### Usage Example

```cpp
// Generate base textures
float noise_vals[2] = {123.0f, 4.0f};
GpuProceduralParams noise_params = {256, 256, noise_vals, 2};
tex_mgr.create_gpu_noise_texture("noise_a", noise_params);

float grid_vals[2] = {32.0f, 2.0f};
GpuProceduralParams grid_params = {256, 256, grid_vals, 2};
tex_mgr.create_gpu_grid_texture("grid", grid_params);

// Composite: apply grid as mask to noise (gen_mask takes no scalar params)
GpuProceduralParams composite = {256, 256, nullptr, 0};
std::vector<std::string> inputs = {"noise_a", "grid"};
tex_mgr.create_gpu_composite_texture("masked_noise", "gen_mask", composite, inputs);
```

### Asset Packer Syntax

```
# Phase 1-3: Parameter-only generators (no input textures)
NOISE_GPU, PROC_GPU(gen_noise, 1234, 16), _, "GPU noise"

# Phase 4: Multi-input composites
MASKED_NOISE, PROC_GPU(gen_mask, NOISE_GPU, GRID_GPU), _, "Masked noise"
```

**Syntax:** `PROC_GPU(shader_func, input_asset_1, input_asset_2, ...)`
- First arg: Shader function name
- Remaining args: asset IDs of input textures; arguments containing no uppercase letters are treated as scalar params (as in `gen_noise, 1234, 16`)
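A sketch of the argument classification, taking the "no uppercase means scalar" rule literally. `ProcGpuArgs` and `classify_proc_gpu_args` are hypothetical names, and the input is assumed to be already split on commas and trimmed. With this rule a float literal written with an uppercase exponent (`1E3`) would be misread as an asset ID, so lowercase `e` is safer.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Result of splitting PROC_GPU(...) arguments (illustrative).
struct ProcGpuArgs {
  std::string shader_func;
  std::vector<std::string> input_assets;   // e.g. NOISE_GPU, GRID_GPU
  std::vector<std::string> scalar_params;  // e.g. 1234, 16
};

// First argument is the shader function; any later argument containing an
// uppercase letter is an input asset ID, anything else a scalar param.
ProcGpuArgs classify_proc_gpu_args(const std::vector<std::string>& args) {
  ProcGpuArgs out;
  if (args.empty()) return out;
  out.shader_func = args[0];
  for (std::size_t i = 1; i < args.size(); ++i) {
    const std::string& arg = args[i];
    bool has_upper = false;
    for (unsigned char c : arg) {
      if (std::isupper(c)) {
        has_upper = true;
        break;
      }
    }
    (has_upper ? out.input_assets : out.scalar_params).push_back(arg);
  }
  return out;
}
```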

**asset_packer changes:**
1. Parse input asset dependencies
2. Set `depends_on` field in AssetRecord
3. Generate init-time ordering (topological sort)
4. Pass input texture names to create_gpu_composite_texture
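The init-time ordering in step 3 can be a standard topological sort. A sketch using Kahn's algorithm, assuming dependencies are collected into a map from asset ID to its input IDs (the function name and container choice are illustrative):

```cpp
#include <map>
#include <queue>
#include <string>
#include <vector>

// deps[asset] lists the assets it reads. Returns true and fills `order` so
// every input precedes the composites that use it; false if a cycle exists.
bool topo_sort_assets(const std::map<std::string, std::vector<std::string>>& deps,
                      std::vector<std::string>* order) {
  std::map<std::string, int> pending;                     // unresolved inputs per asset
  std::map<std::string, std::vector<std::string>> users;  // input -> dependents
  for (const auto& [asset, inputs] : deps) {
    pending.emplace(asset, 0);  // no-op if already registered as an input
    for (const std::string& in : inputs) {
      pending.emplace(in, 0);
      ++pending[asset];
      users[in].push_back(asset);
    }
  }
  std::queue<std::string> ready;
  for (const auto& [asset, count] : pending) {
    if (count == 0) ready.push(asset);
  }
  order->clear();
  while (!ready.empty()) {
    std::string asset = ready.front();
    ready.pop();
    order->push_back(asset);
    for (const std::string& user : users[asset]) {
      if (--pending[user] == 0) ready.push(user);
    }
  }
  return order->size() == pending.size();  // leftovers mean a cycle
}
```

Assets left unprocessed when the queue empties are exactly those on or behind a cycle, which gives asset_packer a natural place to report the error.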

### Size Impact

**Code additions:**
- Extended dispatch_composite: ~250 bytes
- Dynamic bind group layout: ~150 bytes
- create_gpu_composite_texture: ~100 bytes
- gen_blend.wgsl shader: ~150 bytes
- gen_mask.wgsl shader: ~180 bytes

**Total Phase 4:** ~830 bytes for 2 composite shaders

**Benefits:**
- Eliminate CPU-side texture compositing
- No CPU-side intermediate buffers or uploads (inputs are read directly on the GPU)
- Enables complex multi-stage effects (FBM, domain warping)

### Testing

**Unit test:**
```cpp
#include <cassert>

// Create base textures
float noise_vals[2] = {1.0f, 4.0f};
float grid_vals[2] = {32.0f, 2.0f};
tex_mgr.create_gpu_noise_texture("noise_a", {256, 256, noise_vals, 2});
tex_mgr.create_gpu_grid_texture("grid", {256, 256, grid_vals, 2});

// Composite (gen_mask takes no scalar params)
std::vector<std::string> inputs = {"noise_a", "grid"};
tex_mgr.create_gpu_composite_texture("masked", "gen_mask", {256, 256, {}, 0}, inputs);

// Verify the composite texture exists
WGPUTextureView view = tex_mgr.get_texture_view("masked");
assert(view != nullptr);
```

**Integration test:**
- Visual comparison of CPU vs GPU compositing
- Verify dependency ordering (inputs generated before composite)
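For the visual comparison, a tolerance-based byte compare is usually enough, since GPU float math and rgba8unorm quantization can differ from a CPU reference by one step. A sketch with a hypothetical `images_match` helper, assuming both images are tightly packed RGBA8 of the same dimensions:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// True if both images have the same size and no channel differs by more
// than `tolerance` (in 8-bit steps).
bool images_match(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b,
                  int tolerance = 1) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    int diff = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
    if (diff > tolerance) return false;
  }
  return true;
}
```

When reading the GPU texture back, WebGPU requires `bytesPerRow` in texture-to-buffer copies to be a multiple of 256, so any row padding must be stripped before comparing (256-wide RGBA8 rows are 1024 bytes and need none).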

## Future Extensions

**Domain Warping:**
```wgsl
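// Use noise texture to distort UVs of another texture
// (noise is in [0, 1], so recenter to [-0.5, 0.5] before scaling)
let offset = (textureSampleLevel(noise, tex_sampler, uv, 0.0).rg - vec2<f32>(0.5)) * 0.1;
let warped_uv = uv + offset;  // may leave [0, 1]; relies on the sampler's address mode
let color = textureSampleLevel(base, tex_sampler, warped_uv, 0.0);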
```

**Multi-octave FBM:**
```cpp
// Generate octaves at different frequencies
float octave_vals[3][2] = {{0.0f, 2.0f}, {0.0f, 4.0f}, {0.0f, 8.0f}};
tex_mgr.create_gpu_noise_texture("octave_0", {256, 256, octave_vals[0], 2});
tex_mgr.create_gpu_noise_texture("octave_1", {256, 256, octave_vals[1], 2});
tex_mgr.create_gpu_noise_texture("octave_2", {256, 256, octave_vals[2], 2});

// Composite with amplitude decay
std::vector<std::string> octaves = {"octave_0", "octave_1", "octave_2"};
tex_mgr.create_gpu_composite_texture("fbm", "gen_fbm_noise", {256, 256, {}, 0}, octaves);
```
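The "amplitude decay" could be supplied to the FBM shader as per-octave weights in `GpuProceduralParams`. A sketch of computing them, assuming a gain of 0.5 per octave and normalization so the weighted sum stays in [0, 1] (both are illustrative choices, and `fbm_octave_weights` is a hypothetical helper):

```cpp
#include <vector>

// Octave i gets gain^i, then all weights are divided by their sum so a
// weighted average of [0, 1] noise stays in [0, 1].
std::vector<float> fbm_octave_weights(int num_octaves, float gain = 0.5f) {
  std::vector<float> weights;
  float amplitude = 1.0f;
  float total = 0.0f;
  for (int i = 0; i < num_octaves; ++i) {
    weights.push_back(amplitude);
    total += amplitude;
    amplitude *= gain;
  }
  for (float& w : weights) w /= total;  // no-op loop when num_octaves <= 0
  return weights;
}
```

With three octaves this yields weights of 4/7, 2/7 and 1/7.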

**Mipmap Generation:**
- Use compute shaders to generate mipmaps
- Downsample with box/gaussian filter
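Each compute downsample pass would read mip level n and write level n + 1. A sketch of the level sizes such a chain needs (`mip_chain` is a hypothetical helper); its length is the `mipLevelCount` the texture would be created with:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Halve each dimension (rounding down, minimum 1) until reaching 1x1.
std::vector<std::pair<uint32_t, uint32_t>> mip_chain(uint32_t width, uint32_t height) {
  std::vector<std::pair<uint32_t, uint32_t>> levels;
  levels.emplace_back(width, height);
  while (width > 1 || height > 1) {
    width = std::max<uint32_t>(1, width / 2);
    height = std::max<uint32_t>(1, height / 2);
    levels.emplace_back(width, height);
  }
  return levels;
}
```

For non-square textures the shorter side clamps at 1 while the longer side keeps halving.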

## Architecture Notes

**Backward Compatibility:**
- Phase 1-3 generators unchanged (num_input_textures = 0)
- Existing API remains valid
- Optional feature (can defer to Phase 5+)

**Dependency Ordering:**
- Asset packer performs topological sort
- GPU init generates textures in dependency order
- Circular dependencies rejected by asset_packer at build time

**Sampler Reuse:**
- Single linear sampler shared across all composite shaders
- Created once in TextureManager::init()
- Saves ~50 bytes per shader

## Critical Files

**New:**
- `assets/final/shaders/compute/gen_blend.wgsl`
- `assets/final/shaders/compute/gen_mask.wgsl`
- `src/tests/test_gpu_composite.cc`

**Modified:**
- `src/gpu/texture_manager.h` - Add composite API (~40 lines)
- `src/gpu/texture_manager.cc` - Implement dispatch_composite (~200 lines)
- `tools/asset_packer.cc` - Parse composite syntax (~80 lines)

**Total:** ~320 lines code + 2 shaders (~330 bytes)