# GPU Procedural Texture Generation - Phase 4: Texture Composition

## Overview

Enable compute shaders to read existing procedural textures as sampled inputs, allowing multi-stage texture generation (blend, mask, modulate).

## Design

### Extended API

```cpp
struct GpuProceduralInputs {
  std::vector<std::string> input_texture_names;  // Names of existing textures
  std::vector<WGPUTextureView> input_views;      // Resolved views (internal)
};

void TextureManager::create_gpu_composite_texture(
    const std::string& name,
    const std::string& shader_func,
    const GpuProceduralParams& params,
    const GpuProceduralInputs& inputs);
```

### Shader Pattern

```wgsl
// gen_blend.wgsl - Blend two textures
// NOTE: rgba8unorm is assumed here; use the format of the target texture.
@group(0) @binding(0) var output_tex: texture_storage_2d<rgba8unorm, write>;
@group(0) @binding(1) var<uniform> params: BlendParams;
@group(0) @binding(2) var input_a: texture_2d<f32>;
@group(0) @binding(3) var input_b: texture_2d<f32>;
@group(0) @binding(4) var tex_sampler: sampler;

struct BlendParams {
  width: u32,
  height: u32,
  blend_factor: f32,  // 0.0 = all A, 1.0 = all B
  _pad0: f32,
}

@compute @workgroup_size(8, 8, 1)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  // Skip invocations that fall outside the output texture.
  if (id.x >= params.width || id.y >= params.height) {
    return;
  }
  // Sample at texel centers so the linear sampler does not blend neighbours.
  let uv = vec2<f32>((f32(id.x) + 0.5) / f32(params.width),
                     (f32(id.y) + 0.5) / f32(params.height));
  let color_a = textureSampleLevel(input_a, tex_sampler, uv, 0.0);
  let color_b = textureSampleLevel(input_b, tex_sampler, uv, 0.0);
  let blended = mix(color_a, color_b, params.blend_factor);
  textureStore(output_tex, id.xy, blended);
}
```

### Bind Group Layout Changes

**Current (single-input generators):**

- Binding 0: Storage texture (write)
- Binding 1: Uniform buffer

**New (multi-input generators):**

- Binding 0: Storage texture (write)
- Binding 1: Uniform buffer
- Binding 2+: Input textures (read, `texture_2d<f32>`)
- Binding N: Sampler (shared across all inputs), where N = 2 + `num_input_textures`

### Implementation

#### 1. Extend ComputePipelineInfo

```cpp
struct ComputePipelineInfo {
  WGPUComputePipeline pipeline;
  const char* shader_code;
  size_t uniform_size;
  int num_input_textures;  // NEW: 0 for gen_noise/perlin/grid, 2+ for composite
};
```

#### 2. Update get_or_create_compute_pipeline

```cpp
WGPUComputePipeline get_or_create_compute_pipeline(
    const std::string& func_name,
    const char* shader_code,
    size_t uniform_size,
    int num_input_textures = 0);  // Default 0 (backward compatible)
```

Dynamically create the bind group layout based on `num_input_textures`:

```cpp
// Binding 0: output texture
// Binding 1: uniform buffer
// Binding 2 to (2 + num_input_textures - 1): input textures
// Binding (2 + num_input_textures): sampler
```

#### 3. New dispatch_composite

```cpp
void dispatch_composite(const std::string& func_name,
                        WGPUTexture target,
                        const GpuProceduralParams& params,
                        const void* uniform_data,
                        size_t uniform_size,
                        const std::vector<WGPUTextureView>& input_views);
```

Create a bind group with:

- Output storage texture (binding 0)
- Uniform buffer (binding 1)
- Input texture views (binding 2+)
- Linear sampler (binding N)

#### 4. Convenience Wrapper

```cpp
void create_gpu_composite_texture(const std::string& name,
                                  const std::string& shader_func,
                                  const GpuProceduralParams& params,
                                  const std::vector<std::string>& input_names);
```

Resolve `input_names` → `WGPUTextureView[]` via `get_texture_view()`.
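As a rough illustration of the dynamic layout in step 2, the sketch below computes only the binding numbers and kinds, without touching any WebGPU types. `BindingKind`, `BindingSlot`, and `plan_composite_bindings` are hypothetical names used for illustration, not part of the TextureManager API. Omitting the sampler when there are no inputs is an assumption, made so that Phase 1-3 generators keep their two-binding layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one bind group layout entry.
enum class BindingKind { StorageTexture, UniformBuffer, SampledTexture, Sampler };

struct BindingSlot {
  uint32_t binding;
  BindingKind kind;
};

// Plans the binding numbers for a generator with `num_input_textures` inputs.
std::vector<BindingSlot> plan_composite_bindings(int num_input_textures) {
  assert(num_input_textures >= 0);
  std::vector<BindingSlot> slots;
  slots.push_back({0, BindingKind::StorageTexture});  // output texture
  slots.push_back({1, BindingKind::UniformBuffer});   // params
  for (int i = 0; i < num_input_textures; ++i) {
    // Input textures occupy bindings 2 .. 2 + num_input_textures - 1.
    slots.push_back({static_cast<uint32_t>(2 + i), BindingKind::SampledTexture});
  }
  if (num_input_textures > 0) {
    // Assumption: the shared sampler is only bound when there are inputs.
    slots.push_back({static_cast<uint32_t>(2 + num_input_textures),
                     BindingKind::Sampler});
  }
  return slots;
}
```

For `gen_blend` (two inputs) this yields bindings 0 through 4, matching the `@binding` indices in the shader pattern. A real implementation would translate each slot into the corresponding bind group layout entry.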
### Example Shaders

- **gen_blend.wgsl** (~150 bytes) - Blend two textures with a lerp factor
- **gen_mask.wgsl** (~180 bytes) - Multiply texture A by texture B (use grid as mask)
- **gen_modulate.wgsl** (~200 bytes) - Multiply texture color by noise intensity
- **gen_fbm_noise.wgsl** (~250 bytes) - FBM using multiple octaves of pre-generated noise textures

### Usage Example

```cpp
// Generate base textures
GpuProceduralParams noise_params = {256, 256, {123.0f, 4.0f}, 2};
tex_mgr.create_gpu_noise_texture("noise_a", noise_params);

GpuProceduralParams grid_params = {256, 256, {32.0f, 2.0f}, 2};
tex_mgr.create_gpu_grid_texture("grid", grid_params);

// Composite: Apply grid as mask to noise
float blend_vals[1] = {0.5f};
GpuProceduralParams composite = {256, 256, blend_vals, 1};
std::vector<std::string> inputs = {"noise_a", "grid"};
tex_mgr.create_gpu_composite_texture("masked_noise", "gen_mask", composite, inputs);
```

### Asset Packer Syntax

```
# Phase 1-3: Single-input generators
NOISE_GPU, PROC_GPU(gen_noise, 1234, 16), _, "GPU noise"

# Phase 4: Multi-input composites
MASKED_NOISE, PROC_GPU(gen_mask, NOISE_GPU, GRID_GPU), _, "Masked noise"
```

**Syntax:** `PROC_GPU(shader_func, input_asset_1, input_asset_2, ...)`

- First arg: Shader function name
- Remaining args: Asset IDs of input textures; args containing no uppercase letters are treated as scalar params

**asset_packer changes:**

1. Parse input asset dependencies
2. Set `depends_on` field in AssetRecord
3. Generate init-time ordering (topological sort)
4. Pass input texture names to `create_gpu_composite_texture`

### Size Impact

**Code additions:**

- Extended dispatch_composite: ~250 bytes
- Dynamic bind group layout: ~150 bytes
- create_gpu_composite_texture: ~100 bytes
- gen_blend.wgsl shader: ~150 bytes
- gen_mask.wgsl shader: ~180 bytes

**Total Phase 4:** ~830 bytes for 2 composite shaders

**Benefits:**

- Eliminates CPU-side texture compositing
- No CPU-side intermediate buffers
- Enables complex multi-stage effects (FBM, domain warping)

### Testing

**Unit test:**

```cpp
// Create base textures
tex_mgr.create_gpu_noise_texture("noise_a", {256, 256, {1.0f, 4.0f}, 2});
tex_mgr.create_gpu_grid_texture("grid", {256, 256, {32.0f, 2.0f}, 2});

// Composite
std::vector<std::string> inputs = {"noise_a", "grid"};
tex_mgr.create_gpu_composite_texture("masked", "gen_mask", {256, 256, {}, 0}, inputs);

// Verify
WGPUTextureView view = tex_mgr.get_texture_view("masked");
assert(view != nullptr);
```

**Integration test:**

- Visual comparison of CPU vs GPU compositing
- Verify dependency ordering (inputs generated before composite)

## Future Extensions

**Domain Warping:**

```wgsl
// Use noise texture to distort UVs of another texture
// (`sampler` is a WGSL type name, so the binding is called tex_sampler)
let offset = textureSampleLevel(noise, tex_sampler, uv, 0.0).rg * 0.1;
let warped_uv = uv + offset;
let color = textureSampleLevel(base, tex_sampler, warped_uv, 0.0);
```

**Multi-octave FBM:**

```cpp
// Generate octaves at different frequencies
tex_mgr.create_gpu_noise_texture("octave_0", {256, 256, {0.0f, 2.0f}, 2});
tex_mgr.create_gpu_noise_texture("octave_1", {256, 256, {0.0f, 4.0f}, 2});
tex_mgr.create_gpu_noise_texture("octave_2", {256, 256, {0.0f, 8.0f}, 2});

// Composite with amplitude decay
std::vector<std::string> octaves = {"octave_0", "octave_1", "octave_2"};
tex_mgr.create_gpu_composite_texture("fbm", "gen_fbm", {256, 256, {}, 0}, octaves);
```

**Mipmap Generation:**

- Use compute shaders to generate mipmaps
- Downsample with box/gaussian filter

## Architecture Notes

**Backward Compatibility:**

- Phase 1-3 generators unchanged (`num_input_textures = 0`)
- Existing API remains valid
- Optional feature (can defer to Phase 5+)

**Dependency Ordering:**

- Asset packer performs topological sort
- GPU init generates textures in dependency order
- Circular dependencies rejected at compile-time

**Sampler Reuse:**

- Single linear sampler shared across all composite shaders
- Created once in `TextureManager::init()`
- Saves ~50 bytes per shader

## Critical Files

**New:**

- `assets/final/shaders/compute/gen_blend.wgsl`
- `assets/final/shaders/compute/gen_mask.wgsl`
- `src/tests/test_gpu_composite.cc`

**Modified:**

- `src/gpu/texture_manager.h` - Add composite API (~40 lines)
- `src/gpu/texture_manager.cc` - Implement dispatch_composite (~200 lines)
- `tools/asset_packer.cc` - Parse composite syntax (~80 lines)

**Total:** ~320 lines code + 2 shaders (~330 bytes)
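
The dependency ordering described under Architecture Notes could be sketched roughly as below, using Kahn's algorithm. This is a minimal illustration under stated assumptions, not the packer's actual code: `AssetNode` is a hypothetical stand-in for AssetRecord that keeps only a name and its `depends_on` list, and `init_order` is an invented helper name. It requires C++17 for `std::optional`.

```cpp
#include <map>
#include <optional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for the packer's asset record.
struct AssetNode {
  std::string name;
  std::vector<std::string> depends_on;  // asset IDs of input textures
};

// Returns a generation order in which every input precedes its composite,
// or std::nullopt if there is a cycle or an unknown dependency.
std::optional<std::vector<std::string>> init_order(
    const std::vector<AssetNode>& assets) {
  std::map<std::string, int> pending;  // unresolved inputs per asset
  std::map<std::string, std::vector<std::string>> dependents;
  for (const auto& a : assets) pending[a.name] = 0;
  for (const auto& a : assets) {
    for (const auto& dep : a.depends_on) {
      if (pending.count(dep) == 0) return std::nullopt;  // unknown input
      ++pending[a.name];
      dependents[dep].push_back(a.name);
    }
  }
  std::queue<std::string> ready;  // assets whose inputs are all generated
  for (const auto& a : assets) {
    if (pending[a.name] == 0) ready.push(a.name);
  }
  std::vector<std::string> order;
  while (!ready.empty()) {
    const std::string current = ready.front();
    ready.pop();
    order.push_back(current);
    for (const auto& d : dependents[current]) {
      if (--pending[d] == 0) ready.push(d);
    }
  }
  // Assets left unprocessed are part of (or depend on) a cycle.
  if (order.size() != assets.size()) return std::nullopt;
  return order;
}
```

With the `MASKED_NOISE` entry from the asset packer example, `NOISE_GPU` and `GRID_GPU` come out before `MASKED_NOISE`; two assets that name each other as inputs produce `std::nullopt`, which the packer could report as an error.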