diff options
| author | skal <pascal.massimino@gmail.com> | 2026-03-21 08:52:53 +0100 |
|---|---|---|
| committer | skal <pascal.massimino@gmail.com> | 2026-03-21 08:52:53 +0100 |
| commit | fe008df92f7a68d81c9bedb4328da7001e0775f0 (patch) | |
| tree | 2c0182ef4df3b682ee5aa3ab22dcf3e2af08a4ed | |
| parent | a4ff60233fce134e8f779ef001872dfd9a8f9923 (diff) | |
feat(cnn_v3): Phase 4 complete — CNNv3Effect C++ + FiLM uniform upload
- cnn_v3/src/cnn_v3_effect.{h,cc}: full Effect subclass with 5 compute
passes (enc0→enc1→bottleneck→dec1→dec0), shared weights storage buffer,
per-pass uniform buffers, set_film_params() API
- Fixed WGSL/C++ struct alignment: vec3u has align=16, so CnnV3Params4ch
is 64 bytes and CnnV3ParamsEnc1 is 96 bytes (not 48/80)
- Weight offsets computed as explicit formulas (e.g. 20*4*9+4) for clarity
- Registered in CMake, shaders.h/cc, demo_effects.h, test_demo_effects.cc
- 35/35 tests pass
handoff(Gemini): CNN v3 Phase 5 next — parity validation (Python ref vs WGSL)
| -rw-r--r-- | PROJECT_CONTEXT.md | 4 | ||||
| -rw-r--r-- | TODO.md | 2 | ||||
| -rw-r--r-- | cmake/DemoSourceLists.cmake | 1 | ||||
| -rw-r--r-- | cnn_v3/docs/HOWTO.md | 2 | ||||
| -rw-r--r-- | cnn_v3/src/cnn_v3_effect.cc | 467 | ||||
| -rw-r--r-- | cnn_v3/src/cnn_v3_effect.h | 132 | ||||
| -rw-r--r-- | src/effects/shaders.cc | 5 | ||||
| -rw-r--r-- | src/effects/shaders.h | 7 | ||||
| -rw-r--r-- | src/gpu/demo_effects.h | 3 | ||||
| -rw-r--r-- | src/tests/gpu/test_demo_effects.cc | 5 |
10 files changed, 623 insertions, 5 deletions
diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md index f42ccf4..4767185 100644 --- a/PROJECT_CONTEXT.md +++ b/PROJECT_CONTEXT.md @@ -36,7 +36,7 @@ - **Audio:** Sample-accurate sync. Zero heap allocations per frame. Variable tempo. OLA-IDCT synthesis (v2 .spec): Hann analysis window, rectangular synthesis, 50% overlap, click-free. V1 (raw DCT-512) preserved for generated notes. .spec files regenerated as v2. - **Shaders:** Parameterized effects (UniformHelper, .seq syntax). Beat-synchronized animation support (`beat_time`, `beat_phase`). Modular WGSL composition with ShaderComposer. 27 shared common shaders (math, render, compute). Reusable snippets: `render/scratch_lines`, `render/ntsc_common` (NTSC signal processing, RGB and YIQ input variants via `sample_ntsc_signal` hook), `math/color` (YIQ/NTSC), `math/color_c64` (C64 palette, Bayer dither, border animation). - **3D:** Hybrid SDF/rasterization with BVH. Binary scene loader. Blender pipeline. -- **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated. **CNN v3 Phase 3 complete:** 5 WGSL inference shaders (enc0/enc1/bottleneck/dec1/dec0) + `cnn_v3/common` snippet. Zero-pad convs, AvgPool down, NearestUp, FiLM, skip-concat, sigmoid output. Registered in assets + shaders.cc. See `cnn_v3/docs/HOWTO.md` §7. +- **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated. **CNN v3 Phase 4 complete:** `CNNv3Effect` C++ class (5 compute passes, FiLM uniform upload, identity γ/β defaults). `set_film_params()` modulates all layers via beat/audio. WGSL params struct alignment fix (vec3u align=16 → 64/96-byte C++ mirrors). Registered in CMake, shaders.h/cc, demo_effects.h, tests. See `cnn_v3/docs/HOWTO.md`. - **Tools:** CNN test tool operational. Texture readback utility functional. Timeline editor (web-based, beat-aligned, audio playback). - **Build:** Asset dependency tracking. Size measurement. Hot-reload (debug-only). WSL (Windows 10) supported: native Linux build and cross-compile to `.exe` via `mingw-w64`. - **Sequence:** DAG-based effect routing with explicit node system. Python compiler with topological sort and ping-pong optimization. 12 effects operational (Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch). Effect times are absolute (seq_compiler adds sequence start offset). See `doc/SEQUENCE.md`. @@ -46,7 +46,7 @@ ## Next Up -**Active:** CNN v3 Phase 4 (C++ CNNv3Effect + FiLM uniform), Spectral Brush Editor +**Active:** CNN v3 Phase 5 (parity validation), Spectral Brush Editor **Ongoing:** Test infrastructure maintenance (35/35 passing) **Future:** Size optimization (64k target), 3D enhancements @@ -76,7 +76,7 @@ PyTorch / HTML WebGPU / C++ WebGPU. - Howto: `cnn_v3/docs/HOWTO.md` 2. ✅ Training infrastructure: `blender_export.py`, `pack_blender_sample.py`, `pack_photo_sample.py` 3. ✅ WGSL shaders: cnn_v3_common (snippet), enc0, enc1, bottleneck, dec1, dec0 -4. C++ CNNv3Effect + FiLM uniform upload +4. ✅ C++ CNNv3Effect + FiLM uniform upload 5. Parity validation (test vectors, ≤1/255 per pixel) ## Future: CNN v2 8-bit Quantization diff --git a/cmake/DemoSourceLists.cmake b/cmake/DemoSourceLists.cmake index 04bbb3b..742057a 100644 --- a/cmake/DemoSourceLists.cmake +++ b/cmake/DemoSourceLists.cmake @@ -41,6 +41,7 @@ set(COMMON_GPU_EFFECTS src/effects/scene1_effect.cc src/effects/scene2_effect.cc cnn_v3/src/gbuffer_effect.cc + cnn_v3/src/cnn_v3_effect.cc # TODO: Port CNN effects to v2 (complex v1 dependencies) # cnn_v1/src/cnn_v1_effect.cc # cnn_v2/src/cnn_v2_effect.cc diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md index ad71f1f..22266d3 100644 --- a/cnn_v3/docs/HOWTO.md +++ b/cnn_v3/docs/HOWTO.md @@ -201,7 +201,7 @@ The CNN v3 design requires exact parity between PyTorch, WGSL (HTML), and C++. | 1 — G-buffer (SDF + shadow passes) | TODO | Placeholder in place | | 2 — Training infrastructure | ✅ Done | blender_export.py, pack_*_sample.py | | 3 — WGSL U-Net shaders | ✅ Done | 5 compute shaders + cnn_v3/common snippet | -| 4 — C++ CNNv3Effect | TODO | FiLM uniform upload | +| 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 35/35 tests pass | | 5 — Parity validation | TODO | Test vectors, ≤1/255 | --- diff --git a/cnn_v3/src/cnn_v3_effect.cc b/cnn_v3/src/cnn_v3_effect.cc new file mode 100644 index 0000000..d13799c --- /dev/null +++ b/cnn_v3/src/cnn_v3_effect.cc @@ -0,0 +1,467 @@ +// CNN v3 Effect — U-Net + FiLM inference (5 compute passes) +// See cnn_v3/docs/CNN_V3.md for architecture, HOWTO.md §7 for shader details. + +#include "cnn_v3_effect.h" +#include "gpu/gpu.h" +#include "gpu/shader_composer.h" +#include "util/fatal_error.h" +#include <cstdint> +#include <cstring> + +// --------------------------------------------------------------------------- +// Weight layout constants — explicit formulas matching WGSL shader comments +// --------------------------------------------------------------------------- +// +// Format: Conv(IN→OUT, KxK) has OUT*IN*K*K weights + OUT biases +// Layout: OIHW order (out × in × kH × kW), biases appended after conv weights +// +static const uint32_t kEnc0Weights = 20 * 4 * 9 + 4; // Conv(20→4,3×3)+bias +static const uint32_t kEnc1Weights = 4 * 8 * 9 + 8; // Conv(4→8,3×3)+bias +static const uint32_t kBnWeights = 8 * 8 * 1 + 8; // Conv(8→8,1×1)+bias +static const uint32_t kDec1Weights = 16 * 4 * 9 + 4; // Conv(16→4,3×3)+bias +static const uint32_t kDec0Weights = 8 * 4 * 9 + 4; // Conv(8→4,3×3)+bias + +static const uint32_t kEnc0Offset = 0; +static const uint32_t kEnc1Offset = kEnc0Offset + kEnc0Weights; +static const uint32_t kBnOffset = kEnc1Offset + kEnc1Weights; +static const uint32_t kDec1Offset = kBnOffset + kBnWeights; +static const uint32_t kDec0Offset = kDec1Offset + kDec1Weights; +static const uint32_t kTotalF16 = kDec0Offset + kDec0Weights; + +// Weights buffer size in bytes: f16 values are packed two-per-u32. +// Round up to u32 boundary. +static const uint32_t kWeightsBufBytes = ((kTotalF16 + 1) / 2) * 4; + +// --------------------------------------------------------------------------- +// Shader source externs (registered in shaders.cc via InitShaderComposer) +// --------------------------------------------------------------------------- +extern const char* cnn_v3_enc0_wgsl; +extern const char* cnn_v3_enc1_wgsl; +extern const char* cnn_v3_bottleneck_wgsl; +extern const char* cnn_v3_dec1_wgsl; +extern const char* cnn_v3_dec0_wgsl; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +static WGPUShaderModule make_shader(WGPUDevice device, const char* wgsl) { + const std::string composed = + ShaderComposer::Get().Compose({"cnn_v3/common"}, wgsl); + + WGPUShaderSourceWGSL src = {}; + src.chain.sType = WGPUSType_ShaderSourceWGSL; + src.code = str_view(composed.c_str()); + + WGPUShaderModuleDescriptor desc = {}; + desc.nextInChain = &src.chain; + return wgpuDeviceCreateShaderModule(device, &desc); +} + +static WGPUBindGroupLayout make_bgl(WGPUDevice device, + const WGPUBindGroupLayoutEntry* entries, + uint32_t count) { + WGPUBindGroupLayoutDescriptor desc = {}; + desc.entryCount = count; + desc.entries = entries; + return wgpuDeviceCreateBindGroupLayout(device, &desc); +} + +static WGPUComputePipeline make_compute_pipeline(WGPUDevice device, + WGPUShaderModule shader, + const char* entry, + WGPUBindGroupLayout bgl) { + WGPUPipelineLayoutDescriptor pl_desc = {}; + pl_desc.bindGroupLayoutCount = 1; + pl_desc.bindGroupLayouts = &bgl; + WGPUPipelineLayout pl = wgpuDeviceCreatePipelineLayout(device, &pl_desc); + + WGPUComputePipelineDescriptor pipe_desc = {}; + pipe_desc.layout = pl; + pipe_desc.compute.module = shader; + pipe_desc.compute.entryPoint = str_view(entry); + WGPUComputePipeline pipe = wgpuDeviceCreateComputePipeline(device, &pipe_desc); + + wgpuPipelineLayoutRelease(pl); + return pipe; +} + +// BGL entry helpers +static WGPUBindGroupLayoutEntry bgl_uint_tex(uint32_t binding) { + WGPUBindGroupLayoutEntry e = {}; + e.binding = binding; + e.visibility = WGPUShaderStage_Compute; + e.texture.sampleType = WGPUTextureSampleType_Uint; + e.texture.viewDimension = WGPUTextureViewDimension_2D; + return e; +} +static WGPUBindGroupLayoutEntry bgl_float_tex(uint32_t binding) { + WGPUBindGroupLayoutEntry e = {}; + e.binding = binding; + e.visibility = WGPUShaderStage_Compute; + e.texture.sampleType = WGPUTextureSampleType_Float; + e.texture.viewDimension = WGPUTextureViewDimension_2D; + return e; +} +static WGPUBindGroupLayoutEntry bgl_storage_buf(uint32_t binding) { + WGPUBindGroupLayoutEntry e = {}; + e.binding = binding; + e.visibility = WGPUShaderStage_Compute; + e.buffer.type = WGPUBufferBindingType_ReadOnlyStorage; + return e; +} +static WGPUBindGroupLayoutEntry bgl_uniform_buf(uint32_t binding, + uint64_t min_size) { + WGPUBindGroupLayoutEntry e = {}; + e.binding = binding; + e.visibility = WGPUShaderStage_Compute; + e.buffer.type = WGPUBufferBindingType_Uniform; + e.buffer.minBindingSize = min_size; + return e; +} +static WGPUBindGroupLayoutEntry bgl_storage_tex_write( + uint32_t binding, WGPUTextureFormat fmt) { + WGPUBindGroupLayoutEntry e = {}; + e.binding = binding; + e.visibility = WGPUShaderStage_Compute; + e.storageTexture.access = WGPUStorageTextureAccess_WriteOnly; + e.storageTexture.format = fmt; + e.storageTexture.viewDimension = WGPUTextureViewDimension_2D; + return e; +} + +// --------------------------------------------------------------------------- +// Constructor +// --------------------------------------------------------------------------- + +CNNv3Effect::CNNv3Effect(const GpuContext& ctx, + const std::vector<std::string>& inputs, + const std::vector<std::string>& outputs, + float start_time, float end_time) + : Effect(ctx, inputs, outputs, start_time, end_time) { + HEADLESS_RETURN_IF_NULL(ctx_.device); + + const std::string& prefix = + outputs.empty() ? std::string("cnn_v3") : outputs[0]; + node_enc0_ = prefix + "_enc0"; + node_enc1_ = prefix + "_enc1"; + node_bottleneck_ = prefix + "_bottleneck"; + node_dec1_ = prefix + "_dec1"; + + // Allocate zeroed weights buffer (f16 pairs packed as u32). + // Weights are zero-initialized; load_weights() can fill from file later. + weights_buf_ = gpu_create_buffer( + ctx_.device, kWeightsBufBytes, + WGPUBufferUsage_Storage | WGPUBufferUsage_CopyDst); + + // Initialize uniform buffers. + enc0_params_buf_.init(ctx_.device); + enc1_params_buf_.init(ctx_.device); + bn_params_buf_.init(ctx_.device); + dec1_params_buf_.init(ctx_.device); + dec0_params_buf_.init(ctx_.device); + + // Set weight offsets (FiLM γ/β default to identity: γ=1, β=0). + enc0_params_.weight_offset = kEnc0Offset; + for (int i = 0; i < 4; ++i) { enc0_params_.gamma[i] = 1.0f; } + + enc1_params_.weight_offset = kEnc1Offset; + for (int i = 0; i < 4; ++i) { + enc1_params_.gamma_lo[i] = 1.0f; + enc1_params_.gamma_hi[i] = 1.0f; + } + + bn_params_.weight_offset = kBnOffset; + + dec1_params_.weight_offset = kDec1Offset; + for (int i = 0; i < 4; ++i) { dec1_params_.gamma[i] = 1.0f; } + + dec0_params_.weight_offset = kDec0Offset; + for (int i = 0; i < 4; ++i) { dec0_params_.gamma[i] = 1.0f; } + + create_pipelines(); +} + +// --------------------------------------------------------------------------- +// declare_nodes +// --------------------------------------------------------------------------- + +void CNNv3Effect::declare_nodes(NodeRegistry& registry) { + // enc0_tex: rgba16float full-res + registry.declare_node(node_enc0_, NodeType::GBUF_ALBEDO, -1, -1); + // enc1_tex: rgba32uint half-res + registry.declare_node(node_enc1_, NodeType::GBUF_RGBA32UINT, -1, -1); + // bottleneck_tex: rgba32uint quarter-res — declare at 1/4 resolution + registry.declare_node(node_bottleneck_, NodeType::GBUF_RGBA32UINT, -1, -1); + // dec1_tex: rgba16float half-res + registry.declare_node(node_dec1_, NodeType::GBUF_ALBEDO, -1, -1); + // output_tex: rgba16float full-res (the declared output_nodes_[0]) +} + +// --------------------------------------------------------------------------- +// set_film_params — simple linear mapping, no MLP yet +// --------------------------------------------------------------------------- + +void CNNv3Effect::set_film_params(const CNNv3FiLMParams& fp) { + // Identity + audio/beat modulation. + // Replace with FiLM MLP output once training is done. + const float a = fp.audio_intensity; + const float b = fp.beat_phase; + + for (int i = 0; i < 4; ++i) { + enc0_params_.gamma[i] = 1.0f + a * 0.5f; + enc0_params_.beta[i] = b * 0.1f; + } + for (int i = 0; i < 4; ++i) { + enc1_params_.gamma_lo[i] = 1.0f + a * 0.3f; + enc1_params_.gamma_hi[i] = 1.0f + a * 0.3f; + enc1_params_.beta_lo[i] = fp.beat_norm * 0.1f; + enc1_params_.beta_hi[i] = fp.beat_norm * 0.1f; + } + for (int i = 0; i < 4; ++i) { + dec1_params_.gamma[i] = 1.0f + fp.style_p0 * 0.5f; + dec1_params_.beta[i] = fp.style_p1 * 0.1f; + dec0_params_.gamma[i] = 1.0f + fp.style_p0 * 0.5f; + dec0_params_.beta[i] = fp.style_p1 * 0.1f; + } +} + +// --------------------------------------------------------------------------- +// render +// --------------------------------------------------------------------------- + +void CNNv3Effect::render(WGPUCommandEncoder encoder, + const UniformsSequenceParams& params, + NodeRegistry& nodes) { + // Upload params uniforms. + enc0_params_buf_.update(ctx_.queue, enc0_params_); + enc1_params_buf_.update(ctx_.queue, enc1_params_); + bn_params_buf_.update(ctx_.queue, bn_params_); + dec1_params_buf_.update(ctx_.queue, dec1_params_); + dec0_params_buf_.update(ctx_.queue, dec0_params_); + + update_bind_groups(nodes); + + const int W = (int)params.resolution.x; + const int H = (int)params.resolution.y; + + // Dispatch helper: ceil(dim / 8) workgroups + auto dispatch = [&](WGPUComputePipeline pipe, WGPUBindGroup bg, + int w, int h) { + WGPUComputePassDescriptor pass_desc = {}; + WGPUComputePassEncoder pass = + wgpuCommandEncoderBeginComputePass(encoder, &pass_desc); + wgpuComputePassEncoderSetPipeline(pass, pipe); + wgpuComputePassEncoderSetBindGroup(pass, 0, bg, 0, nullptr); + wgpuComputePassEncoderDispatchWorkgroups( + pass, + (uint32_t)((w + 7) / 8), + (uint32_t)((h + 7) / 8), + 1); + wgpuComputePassEncoderEnd(pass); + wgpuComputePassEncoderRelease(pass); + }; + + dispatch(enc0_pipeline_.get(), enc0_bg_.get(), W, H); + dispatch(enc1_pipeline_.get(), enc1_bg_.get(), W / 2, H / 2); + dispatch(bn_pipeline_.get(), bn_bg_.get(), W / 4, H / 4); + dispatch(dec1_pipeline_.get(), dec1_bg_.get(), W / 2, H / 2); + dispatch(dec0_pipeline_.get(), dec0_bg_.get(), W, H); +} + +// --------------------------------------------------------------------------- +// create_pipelines +// --------------------------------------------------------------------------- + +void CNNv3Effect::create_pipelines() { + HEADLESS_RETURN_IF_NULL(ctx_.device); + WGPUDevice dev = ctx_.device; + + // --- enc0 --- + // B0: feat_tex0 (u32), B1: feat_tex1 (u32), B2: weights (storage), + // B3: params (uniform), B4: enc0_out (storage_tex rgba16float write) + { + WGPUBindGroupLayoutEntry e[5] = { + bgl_uint_tex(0), + bgl_uint_tex(1), + bgl_storage_buf(2), + bgl_uniform_buf(3, sizeof(CnnV3Params4ch)), // 64 bytes + bgl_storage_tex_write(4, WGPUTextureFormat_RGBA16Float), + }; + WGPUBindGroupLayout bgl = make_bgl(dev, e, 5); + WGPUShaderModule sh = make_shader(dev, cnn_v3_enc0_wgsl); + enc0_pipeline_.set(make_compute_pipeline(dev, sh, "enc0_main", bgl)); + wgpuShaderModuleRelease(sh); + wgpuBindGroupLayoutRelease(bgl); + } + + // --- enc1 --- + // B0: enc0_tex (f32), B1: weights (storage), + // B2: params (uniform), B3: enc1_out (storage_tex rgba32uint write) + { + WGPUBindGroupLayoutEntry e[4] = { + bgl_float_tex(0), + bgl_storage_buf(1), + bgl_uniform_buf(2, sizeof(CnnV3ParamsEnc1)), + bgl_storage_tex_write(3, WGPUTextureFormat_RGBA32Uint), + }; + WGPUBindGroupLayout bgl = make_bgl(dev, e, 4); + WGPUShaderModule sh = make_shader(dev, cnn_v3_enc1_wgsl); + enc1_pipeline_.set(make_compute_pipeline(dev, sh, "enc1_main", bgl)); + wgpuShaderModuleRelease(sh); + wgpuBindGroupLayoutRelease(bgl); + } + + // --- bottleneck --- + // B0: enc1_tex (u32), B1: weights (storage), + // B2: params (uniform), B3: bottleneck_out (storage_tex rgba32uint write) + { + WGPUBindGroupLayoutEntry e[4] = { + bgl_uint_tex(0), + bgl_storage_buf(1), + bgl_uniform_buf(2, sizeof(CnnV3ParamsBn)), + bgl_storage_tex_write(3, WGPUTextureFormat_RGBA32Uint), + }; + WGPUBindGroupLayout bgl = make_bgl(dev, e, 4); + WGPUShaderModule sh = make_shader(dev, cnn_v3_bottleneck_wgsl); + bn_pipeline_.set(make_compute_pipeline(dev, sh, "bottleneck_main", bgl)); + wgpuShaderModuleRelease(sh); + wgpuBindGroupLayoutRelease(bgl); + } + + // --- dec1 --- + // B0: bottleneck_tex (u32), B1: enc1_tex (u32), B2: weights (storage), + // B3: params (uniform), B4: dec1_out (storage_tex rgba16float write) + { + WGPUBindGroupLayoutEntry e[5] = { + bgl_uint_tex(0), + bgl_uint_tex(1), + bgl_storage_buf(2), + bgl_uniform_buf(3, sizeof(CnnV3Params4ch)), // 64 bytes + bgl_storage_tex_write(4, WGPUTextureFormat_RGBA16Float), + }; + WGPUBindGroupLayout bgl = make_bgl(dev, e, 5); + WGPUShaderModule sh = make_shader(dev, cnn_v3_dec1_wgsl); + dec1_pipeline_.set(make_compute_pipeline(dev, sh, "dec1_main", bgl)); + wgpuShaderModuleRelease(sh); + wgpuBindGroupLayoutRelease(bgl); + } + + // --- dec0 --- + // B0: dec1_tex (f32), B1: enc0_tex (f32), B2: weights (storage), + // B3: params (uniform), B4: output_tex (storage_tex rgba16float write) + { + WGPUBindGroupLayoutEntry e[5] = { + bgl_float_tex(0), + bgl_float_tex(1), + bgl_storage_buf(2), + bgl_uniform_buf(3, sizeof(CnnV3Params4ch)), // 64 bytes + bgl_storage_tex_write(4, WGPUTextureFormat_RGBA16Float), + }; + WGPUBindGroupLayout bgl = make_bgl(dev, e, 5); + WGPUShaderModule sh = make_shader(dev, cnn_v3_dec0_wgsl); + dec0_pipeline_.set(make_compute_pipeline(dev, sh, "dec0_main", bgl)); + wgpuShaderModuleRelease(sh); + wgpuBindGroupLayoutRelease(bgl); + } +} + +// --------------------------------------------------------------------------- +// update_bind_groups — rebuilt each frame (node views may be recreated) +// --------------------------------------------------------------------------- + +// Helper: set a texture view binding entry. +static void bg_tex(WGPUBindGroupEntry& e, uint32_t binding, + WGPUTextureView view) { + e = {}; + e.binding = binding; + e.textureView = view; +} +// Helper: set a buffer binding entry. +static void bg_buf(WGPUBindGroupEntry& e, uint32_t binding, WGPUBuffer buf, + uint64_t size) { + e = {}; + e.binding = binding; + e.buffer = buf; + e.size = size; +} + +void CNNv3Effect::update_bind_groups(NodeRegistry& nodes) { + WGPUDevice dev = ctx_.device; + + WGPUTextureView feat0_view = nodes.get_view(input_nodes_[0]); + WGPUTextureView feat1_view = nodes.get_view(input_nodes_[1]); + WGPUTextureView enc0_view = nodes.get_view(node_enc0_); + WGPUTextureView enc1_view = nodes.get_view(node_enc1_); + WGPUTextureView bn_view = nodes.get_view(node_bottleneck_); + WGPUTextureView dec1_view = nodes.get_view(node_dec1_); + WGPUTextureView out_view = nodes.get_view(output_nodes_[0]); + + WGPUBuffer wb = weights_buf_.buffer; + + auto make_bg = [&](WGPUComputePipeline pipe, WGPUBindGroupEntry* e, + uint32_t count) -> WGPUBindGroup { + WGPUBindGroupLayout bgl = + wgpuComputePipelineGetBindGroupLayout(pipe, 0); + WGPUBindGroupDescriptor desc = {}; + desc.layout = bgl; + desc.entryCount = count; + desc.entries = e; + WGPUBindGroup bg = wgpuDeviceCreateBindGroup(dev, &desc); + wgpuBindGroupLayoutRelease(bgl); + return bg; + }; + + // enc0: feat_tex0(B0), feat_tex1(B1), weights(B2), params(B3), enc0_out(B4) + { + WGPUBindGroupEntry e[5] = {}; + bg_tex(e[0], 0, feat0_view); + bg_tex(e[1], 1, feat1_view); + bg_buf(e[2], 2, wb, kWeightsBufBytes); + bg_buf(e[3], 3, enc0_params_buf_.get().buffer, sizeof(CnnV3Params4ch)); + bg_tex(e[4], 4, enc0_view); + enc0_bg_.set(make_bg(enc0_pipeline_.get(), e, 5)); + } + + // enc1: enc0_tex(B0), weights(B1), params(B2), enc1_out(B3) + { + WGPUBindGroupEntry e[4] = {}; + bg_tex(e[0], 0, enc0_view); + bg_buf(e[1], 1, wb, kWeightsBufBytes); + bg_buf(e[2], 2, enc1_params_buf_.get().buffer, sizeof(CnnV3ParamsEnc1)); + bg_tex(e[3], 3, enc1_view); + enc1_bg_.set(make_bg(enc1_pipeline_.get(), e, 4)); + } + + // bottleneck: enc1_tex(B0), weights(B1), params(B2), bn_out(B3) + { + WGPUBindGroupEntry e[4] = {}; + bg_tex(e[0], 0, enc1_view); + bg_buf(e[1], 1, wb, kWeightsBufBytes); + bg_buf(e[2], 2, bn_params_buf_.get().buffer, sizeof(CnnV3ParamsBn)); + bg_tex(e[3], 3, bn_view); + bn_bg_.set(make_bg(bn_pipeline_.get(), e, 4)); + } + + // dec1: bn_tex(B0), enc1_tex(B1), weights(B2), params(B3), dec1_out(B4) + { + WGPUBindGroupEntry e[5] = {}; + bg_tex(e[0], 0, bn_view); + bg_tex(e[1], 1, enc1_view); + bg_buf(e[2], 2, wb, kWeightsBufBytes); + bg_buf(e[3], 3, dec1_params_buf_.get().buffer, sizeof(CnnV3Params4ch)); + bg_tex(e[4], 4, dec1_view); + dec1_bg_.set(make_bg(dec1_pipeline_.get(), e, 5)); + } + + // dec0: dec1_tex(B0), enc0_tex(B1), weights(B2), params(B3), output(B4) + { + WGPUBindGroupEntry e[5] = {}; + bg_tex(e[0], 0, dec1_view); + bg_tex(e[1], 1, enc0_view); + bg_buf(e[2], 2, wb, kWeightsBufBytes); + bg_buf(e[3], 3, dec0_params_buf_.get().buffer, sizeof(CnnV3Params4ch)); + bg_tex(e[4], 4, out_view); + dec0_bg_.set(make_bg(dec0_pipeline_.get(), e, 5)); + } +} diff --git a/cnn_v3/src/cnn_v3_effect.h b/cnn_v3/src/cnn_v3_effect.h new file mode 100644 index 0000000..c358990 --- /dev/null +++ b/cnn_v3/src/cnn_v3_effect.h @@ -0,0 +1,132 @@ +// CNN v3 Effect — U-Net + FiLM inference pass +// Runs 5 compute passes (enc0→enc1→bottleneck→dec1→dec0) on G-buffer feature +// textures produced by GBufferEffect. +// +// Inputs: feat_tex0, feat_tex1 (rgba32uint, 20-channel G-buffer) +// Output: output_tex (rgba16float, 4-channel RGBA) + +#pragma once + +#include <cstdint> + +#include "gpu/effect.h" +#include "gpu/sequence.h" +#include "gpu/uniform_helper.h" +#include "gpu/wgpu_resource.h" + +// --------------------------------------------------------------------------- +// Per-pass params uniform layouts (mirror WGSL Params structs exactly) +// --------------------------------------------------------------------------- + +// enc0, dec1, dec0: 4-channel FiLM +// +// WGSL layout (vec3u has align=16, so _pad sits at offset 16): +// offset 0: weight_offset (u32, 4 bytes) +// offset 4: (12 bytes implicit padding before vec3u) +// offset 16: _pad (vec3u, 12 bytes) +// offset 28: (4 bytes implicit padding before vec4f) +// offset 32: gamma (vec4f, 16 bytes) +// offset 48: beta (vec4f, 16 bytes) +// total: 64 bytes +struct CnnV3Params4ch { + uint32_t weight_offset; // offset 0 + uint32_t _pad[7]; // offsets 4-31 (mirrors implicit + vec3u + post-pad) + float gamma[4]; // offset 32 + float beta[4]; // offset 48 +}; +static_assert(sizeof(CnnV3Params4ch) == 64, "CnnV3Params4ch must be 64 bytes"); + +// enc1: 8-channel FiLM (split into lo/hi vec4 pairs) +// +// WGSL layout (same header padding as above): +// offset 0: weight_offset (u32, 4 bytes) +// offset 16: _pad (vec3u, 12 bytes) +// offset 32: gamma_lo (vec4f, 16 bytes) +// offset 48: gamma_hi (vec4f, 16 bytes) +// offset 64: beta_lo (vec4f, 16 bytes) +// offset 80: beta_hi (vec4f, 16 bytes) +// total: 96 bytes +struct CnnV3ParamsEnc1 { + uint32_t weight_offset; // offset 0 + uint32_t _pad[7]; // offsets 4-31 + float gamma_lo[4]; // offset 32 + float gamma_hi[4]; // offset 48 + float beta_lo[4]; // offset 64 + float beta_hi[4]; // offset 80 +}; +static_assert(sizeof(CnnV3ParamsEnc1) == 96, + "CnnV3ParamsEnc1 must be 96 bytes"); + +// bottleneck: no FiLM — 4 plain u32s, no alignment gap +struct CnnV3ParamsBn { + uint32_t weight_offset; + uint32_t _pad[3]; +}; +static_assert(sizeof(CnnV3ParamsBn) == 16, "CnnV3ParamsBn must be 16 bytes"); + +// --------------------------------------------------------------------------- +// FiLM conditioning inputs (CPU-side, uploaded via set_film_params each frame) +// --------------------------------------------------------------------------- +struct CNNv3FiLMParams { + float beat_phase = 0.0f; // 0-1 within current beat + float beat_norm = 0.0f; // beat_time / 8.0, normalized 8-beat cycle + float audio_intensity = 0.0f; // peak audio level 0-1 + float style_p0 = 0.0f; // user-defined style param + float style_p1 = 0.0f; // user-defined style param +}; + +class CNNv3Effect : public Effect { + public: + CNNv3Effect(const GpuContext& ctx, const std::vector<std::string>& inputs, + const std::vector<std::string>& outputs, float start_time, + float end_time); + + void declare_nodes(NodeRegistry& registry) override; + + void render(WGPUCommandEncoder encoder, const UniformsSequenceParams& params, + NodeRegistry& nodes) override; + + // Update FiLM conditioning; call before render() each frame. + void set_film_params(const CNNv3FiLMParams& fp); + + private: + // Intermediate node names (prefixed from output[0]) + std::string node_enc0_; + std::string node_enc1_; + std::string node_bottleneck_; + std::string node_dec1_; + + // 5 compute pipelines + ComputePipeline enc0_pipeline_; + ComputePipeline enc1_pipeline_; + ComputePipeline bn_pipeline_; + ComputePipeline dec1_pipeline_; + ComputePipeline dec0_pipeline_; + + // 5 bind groups (rebuilt each render since node views may change) + BindGroup enc0_bg_; + BindGroup enc1_bg_; + BindGroup bn_bg_; + BindGroup dec1_bg_; + BindGroup dec0_bg_; + + // Params uniform buffers (one per pass) + UniformBuffer<CnnV3Params4ch> enc0_params_buf_; + UniformBuffer<CnnV3ParamsEnc1> enc1_params_buf_; + UniformBuffer<CnnV3ParamsBn> bn_params_buf_; + UniformBuffer<CnnV3Params4ch> dec1_params_buf_; + UniformBuffer<CnnV3Params4ch> dec0_params_buf_; + + // Shared packed-f16 weights (storage buffer, read-only in all shaders) + GpuBuffer weights_buf_; + + // Per-pass params shadow (updated by set_film_params, uploaded in render) + CnnV3Params4ch enc0_params_{}; + CnnV3ParamsEnc1 enc1_params_{}; + CnnV3ParamsBn bn_params_{}; + CnnV3Params4ch dec1_params_{}; + CnnV3Params4ch dec0_params_{}; + + void create_pipelines(); + void update_bind_groups(NodeRegistry& nodes); +}; diff --git a/src/effects/shaders.cc b/src/effects/shaders.cc index 22c6a6d..f64e135 100644 --- a/src/effects/shaders.cc +++ b/src/effects/shaders.cc @@ -117,6 +117,11 @@ const char* ntsc_rgb_shader_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_NTSC_RGB); const char* ntsc_yiq_shader_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_NTSC_YIQ); const char* gbuf_raster_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_GBUF_RASTER); const char* gbuf_pack_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_GBUF_PACK); +const char* cnn_v3_enc0_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_CNN_V3_ENC0); +const char* cnn_v3_enc1_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_CNN_V3_ENC1); +const char* cnn_v3_bottleneck_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_CNN_V3_BOTTLENECK); +const char* cnn_v3_dec1_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_CNN_V3_DEC1); +const char* cnn_v3_dec0_wgsl = SafeGetAsset(AssetId::ASSET_SHADER_CNN_V3_DEC0); // Compute shaders const char* gen_noise_compute_wgsl = diff --git a/src/effects/shaders.h b/src/effects/shaders.h index cf095fb..4a77597 100644 --- a/src/effects/shaders.h +++ b/src/effects/shaders.h @@ -24,6 +24,13 @@ extern const char* ntsc_yiq_shader_wgsl; extern const char* gbuf_raster_wgsl; extern const char* gbuf_pack_wgsl; +// CNN v3 inference shaders +extern const char* cnn_v3_enc0_wgsl; +extern const char* cnn_v3_enc1_wgsl; +extern const char* cnn_v3_bottleneck_wgsl; +extern const char* cnn_v3_dec1_wgsl; +extern const char* cnn_v3_dec0_wgsl; + // Compute shaders extern const char* gen_noise_compute_wgsl; extern const char* gen_perlin_compute_wgsl; diff --git a/src/gpu/demo_effects.h b/src/gpu/demo_effects.h index 91ab6f2..66b920c 100644 --- a/src/gpu/demo_effects.h +++ b/src/gpu/demo_effects.h @@ -32,8 +32,9 @@ #include "effects/scratch_effect.h" #include "effects/ntsc_effect.h" -// CNN v3 G-buffer +// CNN v3 G-buffer + inference #include "../../cnn_v3/src/gbuffer_effect.h" +#include "../../cnn_v3/src/cnn_v3_effect.h" // TODO: Port CNN effects // #include "../../cnn_v1/src/cnn_v1_effect.h" diff --git a/src/tests/gpu/test_demo_effects.cc b/src/tests/gpu/test_demo_effects.cc index c2588f2..f5af5a9 100644 --- a/src/tests/gpu/test_demo_effects.cc +++ b/src/tests/gpu/test_demo_effects.cc @@ -84,6 +84,11 @@ static void test_effects() { fixture.ctx(), std::vector<std::string>{"source"}, std::vector<std::string>{"gbuf_feat0", "gbuf_feat1"}, 0.0f, 1000.0f)}, + {"CNNv3Effect", + std::make_shared<CNNv3Effect>( + fixture.ctx(), + std::vector<std::string>{"gbuf_feat0", "gbuf_feat1"}, + std::vector<std::string>{"cnn_v3_output"}, 0.0f, 1000.0f)}, }; int passed = 0; |
