| Age | Commit message | Author |
|
Extracted common WGSL functions into separate files in `common/shaders/` to improve reusability and maintainability.
- Created `common/shaders/render/fullscreen_vs.wgsl` for a reusable fullscreen vertex shader.
- Created `common/shaders/math/color.wgsl` for color conversion and tone mapping functions.
- Created `common/shaders/math/utils.wgsl` for general math utilities.
- Created `common/shaders/render/raymarching.wgsl` for SDF raymarching logic.
- Updated multiple shaders to use these new common snippets via `#include`.
- Fixed the shader asset validation test to correctly handle shaders that include the common vertex shader.
This refactoring makes the shader code more modular and easier to manage.
|
|
This refactoring improves the project's structure by decoupling visual effects from the core GPU module. All effect implementations have been moved to a new top-level directory, and the shared utilities have been consolidated into the parent directory.
- **Motivation**: To create a clearer separation of concerns, making the codebase easier to navigate and maintain. This move treats effects as a distinct layer that depends on the core GPU module, rather than being embedded within it.
- **Changes**:
- Created the new effects directory.
- Moved all effect source files to the new directory.
- Moved the shared helpers to the parent directory.
- Updated the build configuration to reflect the new file locations for all build targets.
- Corrected all include directives across the entire codebase to point to the new paths.
- Updated all markdown documentation to ensure file paths and architectural descriptions are accurate.
- Fixed several compiler errors related to incorrect enum casting that were exposed during cross-compilation for Windows.
- **Verification**:
- The entire project builds successfully for both native and Windows cross-compilation targets.
- All 34 tests pass (`ctest`: 100% tests passed, 0 tests failed out of 34).
- The full check script (native build and tests, tools build, Windows cross-compilation, and UPX packing of demo64k.exe from 7036416 to 4680704 bytes) completes without errors.
This change streamlines the project's architecture without altering any functionality.
|
|
Adds new helper for common post-process texture pattern (RenderAttachment
| TextureBinding | CopySrc usage). Refactors test_post_process_helper.cc
to use gpu_create_buffer() and gpu_create_post_process_texture(),
eliminating 91 lines of boilerplate.
- New: gpu_create_post_process_texture() in gpu.{h,cc}
- Refactor: test_post_process_helper.cc uses helpers instead of raw WGPU
- Doc: Updated WGPU_HELPERS.md with usage examples
- Verified: All tests passing (test_post_process_helper, test_demo_effects)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Reduces WGPUTextureViewDescriptor boilerplate from 5-7 lines to 1-2.
Helper supports optional mip_levels parameter (defaults to 1).
Updated 17 call sites across gpu/, tests/, and tools/.
Net: -82 lines. All tests passing (34/34).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Add texture creation helpers (gpu_create_texture_2d, gpu_create_storage_texture_2d,
gpu_create_mip_view) and extend BindGroupLayoutBuilder with uint_texture and
storage_texture methods.
Refactored files:
- cnn_v2_effect.cc: Use texture helpers (~70% code reduction in create_textures)
- rotating_cube_effect.cc: Use BindGroupLayoutBuilder and texture helpers
- circle_mask_effect.cc: Use BindGroupBuilder
Benefits:
- Improved code readability
- Reduced boilerplate for texture/bind group creation
- Consistent patterns across effects
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
**CNN v2 Changes:**
- Replace point sampling with bilinear interpolation for mip-level features
- Add linear sampler (binding 6) to static features shader
- Update CNNv2Effect, cnn_test, and HTML tool
**HTML Tool UI:**
- Move controls to floating bottom bar in central view
- Consolidate video controls + Blend/Depth/Save PNG in single container
- Increase left panel width: 300px → 315px (+5%)
- Remove per-frame debug messages (visualization, rendering logs)
**Technical:**
- WGSL: textureSample() with linear_sampler vs textureLoad()
- C++: Create WGPUSampler with Linear filtering
- HTML: Change sampler from 'nearest' to 'linear'
handoff(Claude): CNN v2 now uses bilinear mip-level sampling across all tools
|
|
Root cause: Binary format is [header:20B][layer_info:20B×N][weights].
Both cnn_test and CNNv2Effect uploaded entire file to weights_buffer,
but shader reads weights_buffer[0] expecting first weight, not header.
Fix: Skip header + layer_info when uploading to GPU buffer.
- cnn_test.cc: Calculate weights_offset, upload only weights section
- cnn_v2_effect.cc: Same fix for runtime effect
Before: layer_0 output showed [R, uv_x, uv_y, black] (wrong channels)
After: layer_0 output shows [R, G, B, D] (correct identity mapping)
Tests: 34/36 passing (2 unrelated failures)
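Note: the offset arithmetic behind this fix can be sketched as follows. This is a minimal illustration, assuming the 20-byte header and 20-byte per-layer records stated above; `WeightsSpan` and `weights_span()` are hypothetical names, not the project's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sizes taken from the commit message: [header:20B][layer_info:20B x N][weights].
constexpr std::size_t kHeaderBytes = 20;
constexpr std::size_t kLayerInfoBytes = 20;

// Byte range of the packed weights inside the file (hypothetical helper type).
struct WeightsSpan {
  std::size_t offset;  // first byte of the weights section
  std::size_t size;    // number of weight bytes to upload
};

// Returns the range that should be uploaded to the GPU weights buffer,
// so that weights_buffer[0] in the shader is the first weight, not the header.
inline WeightsSpan weights_span(std::size_t file_size, std::uint32_t num_layers) {
  const std::size_t offset = kHeaderBytes + kLayerInfoBytes * num_layers;
  assert(offset <= file_size && "file too small for header + layer_info");
  return {offset, file_size - offset};
}
```

For a 3-layer file the weights would start at byte 20 + 3 × 20 = 80.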
|
|
Training changes:
- Changed p3 default depth from 0.0 to 1.0 (far plane semantics)
- Extract depth from target alpha channel in both datasets
- Consistent alpha-as-depth across training/validation
Test tool enhancements (cnn_test):
- Added load_depth_from_alpha() for R32Float depth texture
- Fixed bind group layout for UnfilterableFloat sampling
- Added --save-intermediates with per-channel grayscale composites
- Each layer saved as 4x wide PNG (p0-p3 stacked horizontally)
- Global layers_composite.png for vertical layer stack overview
Investigation notes:
- Static features p4-p7 ARE computed and bound correctly
- Sin_20_y pattern visibility difference between tools under investigation
- Binary weights timestamp (Feb 13 20:36) vs HTML tool (Feb 13 22:12)
- Next: Update HTML tool with canonical binary weights
handoff(Claude): HTML tool weights update pending - base64 encoded
canonical weights ready in /tmp/weights_b64.txt for line 392 replacement.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Fix two issues causing validation errors in test_demo:
1. Remove redundant pipeline creation without layout (static_pipeline_)
2. Change vec3<u32> to 3× u32 fields in StaticFeatureParams struct
WGSL vec3<u32> has 16-byte alignment (WGSL uniform layout rules, similar to std140),
making the struct 32 bytes, while the C++ struct was 16 bytes. Explicit scalar fields give both sides the same layout.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
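Note: the size mismatch can be reproduced with a little layout arithmetic. The sketch below assumes a struct of one u32 followed by a vec3<u32>; the real field names and order of StaticFeatureParams are not given in this message, so `StaticFeatureParamsSketch` and its fields are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds n up to the next multiple of align (align must be a power of two).
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
  return (n + align - 1) & ~(align - 1);
}

// WGSL: u32 has align 4 / size 4, vec3<u32> has align 16 / size 12.
// For struct { a: u32, b: vec3<u32> } the vec3 starts at offset 16 and ends
// at byte 28; the struct alignment is 16, so its size rounds up to 32.
constexpr std::size_t kVec3Offset = round_up(4, 16);
constexpr std::size_t kWgslSizeWithVec3 = round_up(kVec3Offset + 12, 16);
static_assert(kVec3Offset == 16, "vec3<u32> is pushed to offset 16");
static_assert(kWgslSizeWithVec3 == 32, "WGSL struct with vec3 is 32 bytes");

// With four scalar u32 fields both sides agree on a tightly packed 16 bytes.
struct StaticFeatureParamsSketch {
  std::uint32_t mip_level;  // assumed field
  std::uint32_t pad0;       // assumed padding fields replacing the vec3
  std::uint32_t pad1;
  std::uint32_t pad2;
};
static_assert(sizeof(StaticFeatureParamsSketch) == 16, "C++ side is 16 bytes");
```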
|
|
Document future enhancement for arbitrary feature vector layouts.
Proposed feature descriptor in binary format v3:
- Specify feature types, sources, and ordering
- Enable runtime experimentation without shader recompilation
- Examples: [R,G,B,dx,dy,uv_x,bias] or [mip1.r,mip2.g,laplacian,uv_x,sin20_x,bias]
Added TODOs in:
- CNN_V2_BINARY_FORMAT.md: Detailed proposal with struct layout
- CNN_V2.md: Future extensions section
- train_cnn_v2.py: compute_static_features() docstring
- cnn_v2_static.wgsl: Shader header comment
- cnn_v2_effect.cc: Version check comment
Current limitation: Hardcoded [p0,p1,p2,p3,uv_x,uv_y,sin10_x,bias] layout.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Binary format v2 includes mip_level in header (20 bytes, was 16).
Effect reads mip_level and passes to static features shader via uniform.
Shader samples from correct mip texture based on mip_level.
Changes:
- export_cnn_v2_weights.py: Header v2 with mip_level field
- cnn_v2_effect.h: Add StaticFeatureParams, mip_level member, params buffer
- cnn_v2_effect.cc: Read mip_level from weights, create/bind params buffer, update per-frame
- cnn_v2_static.wgsl: Accept params uniform, sample from selected mip level
Binary format v2:
- Header: 20 bytes (magic, version=2, num_layers, total_weights, mip_level)
- Backward compatible: v1 weights load with mip_level=0
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
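Note: a version-aware header read might look like the sketch below. It assumes each header field is a 32-bit integer stored in the listed order and in host byte order; `CnnV2Header`, `read_u32()` and `parse_header()` are illustrative names, and the magic value is not checked because it is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Header fields as listed above (hypothetical struct name).
struct CnnV2Header {
  std::uint32_t magic;
  std::uint32_t version;
  std::uint32_t num_layers;
  std::uint32_t total_weights;
  std::uint32_t mip_level;  // absent in v1 files, defaults to 0
};

// Reads one u32 at a byte offset (assumes file and host byte order match).
inline std::uint32_t read_u32(const std::uint8_t* data, std::size_t offset) {
  std::uint32_t value;
  std::memcpy(&value, data + offset, sizeof(value));
  return value;
}

// Parses a v1 (16-byte) or v2 (20-byte) header.
// Returns the header size in bytes, or 0 if the data cannot be parsed.
inline std::size_t parse_header(const std::uint8_t* data, std::size_t size,
                                CnnV2Header* out) {
  assert(data != nullptr && out != nullptr);
  if (size < 16) return 0;
  out->magic = read_u32(data, 0);
  out->version = read_u32(data, 4);
  out->num_layers = read_u32(data, 8);
  out->total_weights = read_u32(data, 12);
  if (out->version == 1) {
    out->mip_level = 0;  // backward compatibility: v1 has no mip_level field
    return 16;
  }
  if (out->version == 2 && size >= 20) {
    out->mip_level = read_u32(data, 16);
    return 20;
  }
  return 0;
}
```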
|
|
Updated comments to clarify that per-layer kernel sizes are supported.
Code already handles this correctly via LayerInfo.kernel_size field.
Changes:
- cnn_v2_effect.h: Add comment about per-layer kernel sizes
- cnn_v2_compute.wgsl: Clarify LayerParams provides per-layer config
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
- Add --cnn-version <1|2> flag to select between CNN v1 and v2
- Implement beat_phase modulation for dynamic blend in both CNN effects
- Fix CNN v2 per-layer uniform buffer sharing (each layer needs own buffer)
- Fix CNN v2 y-axis orientation to match render pass convention
- Add Scene1Effect as base visual layer to test_demo timeline
- Reorganize CNN v2 shaders into cnn_v2/ subdirectory
- Update asset paths and documentation for new shader organization
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
FATAL_CHECK triggers when its condition is TRUE (the error case).
The magic/version checks were inverted: they tested == correct_value,
so valid weights caused a fatal error.
Changed to != checks so that only invalid data fails.
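Note: the polarity issue is easy to see in a small sketch. `FATAL_CHECK_SKETCH` below is a stand-in written for illustration, not the project's macro, and the expected magic/version constants are placeholder values.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Stand-in macro: aborts when `cond` is TRUE, i.e. `cond` describes the error.
#define FATAL_CHECK_SKETCH(cond, msg)           \
  do {                                          \
    if (cond) {                                 \
      std::fprintf(stderr, "FATAL: %s\n", msg); \
      std::abort();                             \
    }                                           \
  } while (0)

constexpr std::uint32_t kExpectedMagic = 0x12345678u;  // placeholder value
constexpr std::uint32_t kExpectedVersion = 2u;         // placeholder value

// The error condition: true only when the header is NOT what we expect.
inline bool header_is_bad(std::uint32_t magic, std::uint32_t version) {
  return magic != kExpectedMagic || version != kExpectedVersion;
}

// Passing `magic == kExpectedMagic` here would abort on valid files,
// which is the inverted behaviour this commit describes.
inline void validate_header(std::uint32_t magic, std::uint32_t version) {
  FATAL_CHECK_SKETCH(header_is_bad(magic, version), "bad magic/version");
}
```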
|
|
- Create bind groups per layer with ping-pong buffers
- Update layer params uniform per dispatch
- Execute all layers in sequence with proper input/output swapping
- Ready for weight export and end-to-end testing
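Note: the input/output swapping can be sketched with plain index arithmetic. This assumes two layer buffers, with layer 0 reading buffer 0 and writing buffer 1; which buffer the first layer actually reads from is an assumption, and `LayerIo`, `ping_pong()` and `final_buffer()` are hypothetical names.

```cpp
#include <cassert>

// Buffer indices used by one layer dispatch (hypothetical helper type).
struct LayerIo {
  int input;   // buffer index read by this layer
  int output;  // buffer index written by this layer
};

// Layer 0 reads buffer 0 and writes buffer 1, layer 1 reads 1 and writes 0,
// and so on, so each layer consumes what the previous one produced.
inline LayerIo ping_pong(int layer) {
  assert(layer >= 0);
  return {layer % 2, (layer + 1) % 2};
}

// Index of the buffer holding the result after `num_layers` dispatches.
inline int final_buffer(int num_layers) {
  assert(num_layers > 0);
  return num_layers % 2;
}
```

One bind group per layer would then reference `ping_pong(layer).input` and `ping_pong(layer).output`, which is why the bind groups cannot be shared between consecutive layers.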
|
|
- Add binary weight format (header + layer info + packed f16)
- New export_cnn_v2_weights.py for binary weight export
- Single cnn_v2_compute.wgsl shader with storage buffer
- Load weights in CNNv2Effect::load_weights()
- Create layer compute pipeline with 5 bindings
- Fast training config: 100 epochs, 3×3 kernels, 8→4→4 channels
Next: Complete bind group creation and multi-layer compute execution
|
|
Complete multi-pass compute execution for CNNv2Effect.
Implementation:
- Layer texture creation (ping-pong buffers for intermediate results)
- Static features compute pipeline with bind group layout
- Bind group creation with 5 bindings (input mips + depth + output)
- compute() override for multi-pass execution
- Static features pass with proper workgroup dispatch
Architecture:
- Static features: 8×f16 packed as 4×u32 (RGBD + UV + sin + bias)
- Layer buffers: 2×RGBA32Uint textures (8 channels f16 each)
- Input mips: 3 levels (0, 1, 2) for multi-scale features
- Workgroup size: 8×8 threads
Status:
- Static features compute pass functional
- Layer pipeline infrastructure ready
- All 36/36 tests passing
Next: Layer shader integration, multi-layer execution
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
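Note: two pieces of arithmetic follow from the architecture above. The sketch assumes one thread per pixel with 8×8 workgroups, and assumes the 16-bit halves are packed low half first (the ordering WGSL's pack2x16float uses); whether the shaders use that builtin is not stated here, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kWorkgroupSize = 8;  // 8x8 threads per workgroup

// Number of workgroups needed to cover `pixels` along one axis (ceil division).
inline std::uint32_t workgroup_count(std::uint32_t pixels) {
  assert(pixels > 0);
  return (pixels + kWorkgroupSize - 1) / kWorkgroupSize;
}

// Packs two f16 bit patterns into one u32, first value in the low 16 bits.
// Eight f16 channels therefore occupy four u32 values.
inline std::uint32_t pack_halves(std::uint16_t lo, std::uint16_t hi) {
  return static_cast<std::uint32_t>(lo) | (static_cast<std::uint32_t>(hi) << 16);
}

inline std::uint16_t unpack_lo(std::uint32_t packed) {
  return static_cast<std::uint16_t>(packed & 0xFFFFu);
}

inline std::uint16_t unpack_hi(std::uint32_t packed) {
  return static_cast<std::uint16_t>(packed >> 16);
}
```

A 1280×720 target would dispatch 160×90 workgroups; sizes that are not multiples of 8 round up, so the shader needs a bounds check for the extra threads.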
|
|
Infrastructure for enhanced CNN post-processing with 7D feature input.
Phase 1: Shaders
- Static features compute (RGBD + UV + sin10_x + bias → 8×f16)
- Layer template (convolution skeleton, packing/unpacking)
- 3 mip level support for multi-scale features
Phase 2: C++ Effect
- CNNv2Effect class (multi-pass architecture)
- Texture management (static features, layer buffers)
- Build integration (CMakeLists, assets, tests)
Phase 3: Training Pipeline
- train_cnn_v2.py: PyTorch model with static feature concatenation
- export_cnn_v2_shader.py: f32→f16 quantization, WGSL generation
- Configurable architecture (kernels, channels)
Phase 4: Validation
- validate_cnn_v2.sh: End-to-end pipeline
- Checkpoint → shaders → build → test images
Tests: 36/36 passing
Next: Complete render pipeline implementation (bind groups, multi-pass)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
- Add render/scene_query_mode to known placeholders in VerifyIncludes
- Remove warning for duplicate auxiliary texture registration (valid for multiple CNNEffect stacks)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
BREAKING CHANGE: Timeline format now uses beats as default unit
## Core Changes
**Uniform Structure (32 bytes maintained):**
- Added `beat_time` (absolute beats for musical animation)
- Renamed `beat` → `beat_phase` (fractional 0-1 for smooth oscillation)
- Kept `time` (physical seconds, tempo-independent)
**Seq Compiler:**
- Default: all numbers are beats (e.g., `5`, `16.5`)
- Explicit seconds: `2.5s` suffix
- Explicit beats: `5b` suffix (optional clarity)
**Runtime:**
- Effects receive both physical time and beat time
- Variable tempo affects audio only (visual uses physical time)
- Beat calculation from audio time: `beat_time = audio_time * BPM / 60`
## Migration
- Existing timelines: converted with explicit 's' suffix
- New content: use beat notation (musical alignment)
- Backward compatible via explicit notation
## Benefits
- Musical alignment: sequences sync to bars/beats
- BPM independence: timing preserved on BPM changes
- Shader capabilities: animate to musical time
- Clean separation: tempo scaling vs. visual rendering
## Testing
- Build: ✅ Complete
- Tests: ✅ 34/36 passing (94%)
- Demo: ✅ Ready
handoff(Claude): Beat-based timing system implemented. Variable tempo
only affects audio sample triggering. Visual effects use physical_time
(constant) and beat_time (musical). Shaders can now animate to beats.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
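Note: the timing rules above can be sketched as follows. The formula `beat_time = audio_time * BPM / 60` is from this message; treating a seconds-suffixed token as convertible to beats at a fixed BPM is an assumption about the seq compiler, and `BeatInfo`, `beats_from_audio_time()` and `parse_time_token()` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <string>

// Beat values handed to effects (hypothetical helper type).
struct BeatInfo {
  double beat_time;   // absolute beats since start
  double beat_phase;  // fractional part in [0, 1)
};

// beat_time = audio_time * BPM / 60, beat_phase = fract(beat_time).
inline BeatInfo beats_from_audio_time(double audio_time, double bpm) {
  assert(bpm > 0.0);
  const double beat_time = audio_time * bpm / 60.0;
  return {beat_time, beat_time - std::floor(beat_time)};
}

// Parses "5", "16.5" or "5b" as beats and "2.5s" as seconds,
// returning the value in beats at the given BPM.
inline double parse_time_token(const std::string& token, double bpm) {
  assert(!token.empty() && bpm > 0.0);
  const char suffix = token.back();
  // strtod stops at the first non-numeric character, so the suffix is ignored.
  const double value = std::strtod(token.c_str(), nullptr);
  if (suffix == 's') return value * bpm / 60.0;
  return value;  // bare number or explicit 'b' suffix: already in beats
}
```

At 120 BPM, `2.5s` corresponds to 5 beats, so a timeline converted with explicit `s` suffixes keeps its original wall-clock positions.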
|
|
SamplerCache singleton never released samplers, causing device to retain
references at shutdown. Add clear() method and call before fixture cleanup.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
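Note: the shape of the fix can be sketched with a generic cache. `HandleCacheSketch` is a hypothetical stand-in in which an `int` handle and two callbacks replace the real sampler type and its create/release calls; it is not written as a singleton.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Generic handle cache; Handle stands in for a GPU sampler object.
class HandleCacheSketch {
 public:
  using Handle = int;
  using CreateFn = std::function<Handle(int)>;
  using ReleaseFn = std::function<void(Handle)>;

  HandleCacheSketch(CreateFn create, ReleaseFn release)
      : create_(std::move(create)), release_(std::move(release)) {
    assert(create_ && release_);
  }

  // Returns the cached handle for `key`, creating it on first use.
  Handle get(int key) {
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;
    const Handle handle = create_(key);
    cache_.emplace(key, handle);
    return handle;
  }

  // Releases every cached handle; call this before the device is destroyed.
  void clear() {
    for (const auto& entry : cache_) release_(entry.second);
    cache_.clear();
  }

  std::size_t size() const { return cache_.size(); }

 private:
  CreateFn create_;
  ReleaseFn release_;
  std::map<int, Handle> cache_;
};
```

Without the `clear()` call, every handle still in the map keeps its reference alive at shutdown, which matches the leak described above.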
|
|
- Release queue reference after submit in texture_readback
- Add final wgpuDevicePoll before cleanup to sync GPU work
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
- Add cnn_conv1x1 to shader composer registration
- Add VerifyIncludes() to detect missing snippet registrations
- STRIP_ALL-protected verification warns about unregistered includes
- Fixes cnn_test runtime failure loading cnn_layer.wgsl
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Hardcoded vec2(1280.0f, 720.0f) → u.resolution
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
ctx_.device exists before init() but Renderer3D not initialized yet.
Changed guard from !ctx_.device to !initialized_ flag.
Set initialized_ = true after renderer_.init() in both effects.
All 36 tests pass. Demo runs without crash.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
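Note: the guard can be sketched in isolation. `FakeRenderer` and `GuardedEffectSketch` are hypothetical stand-ins for Renderer3D and the two effects; the sketch only shows the `initialized_` flag and does not model how the real renderer receives its initial size.

```cpp
#include <cassert>

// Stand-in renderer that must not be resized before init().
struct FakeRenderer {
  bool ready = false;
  int width = 0;
  int height = 0;
  int resize_calls = 0;

  void init() { ready = true; }
  void resize(int w, int h) {
    assert(ready && "resize() called before init()");
    width = w;
    height = h;
    ++resize_calls;
  }
};

class GuardedEffectSketch {
 public:
  void init() {
    renderer_.init();
    initialized_ = true;  // set only after the renderer is usable
  }

  void resize(int w, int h) {
    width_ = w;  // dimensions are always recorded
    height_ = h;
    if (!initialized_) return;  // guard on the flag, not on device presence
    renderer_.resize(w, h);
  }

  const FakeRenderer& renderer() const { return renderer_; }
  int width() const { return width_; }

 private:
  FakeRenderer renderer_;
  bool initialized_ = false;
  int width_ = 0;
  int height_ = 0;
};
```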
|
|
Root cause: After swapping init/resize order, effects with Renderer3D crashed
because resize() called before init() tried to use uninitialized GPU resources.
Changes:
- Add guards in FlashCubeEffect::resize() and Hybrid3DEffect::resize() to
check ctx_.device before calling renderer_.resize()
- Remove lazy initialization remnants from CircleMaskEffect and CNNEffect
- Register auxiliary textures directly in init() (width_/height_ already set)
- Remove ensure_texture() methods and texture_initialized_ flags
All 36 tests passing. Demo runs without crashes.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Simpler solution than lazy initialization: effects need correct
dimensions during init() to register auxiliary textures.
Changed initialization order in MainSequence:
- resize() sets width_/height_ FIRST
- init() can then use correct dimensions
Reverted lazy initialization complexity. One-line fix.
Tests: All 36 tests passing, demo runs without error
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Prevents init/resize ordering bug and avoids unnecessary reallocation.
Changes:
- Auxiliary textures created on first use (compute/update_bind_group)
- Added ensure_texture() methods to defer registration until resize()
- Added early return in resize() if dimensions unchanged
- Removed texture registration from init() methods
Benefits:
- No reallocation on window resize if dimensions match
- Texture created with correct dimensions from start
- Memory saved if effect never renders
Tests: All 36 tests passing
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Auxiliary textures were created during init() using default dimensions
(1280x720) before resize() was called with actual window size. This
caused compute shaders to receive uniforms with correct resolution but
render to wrong-sized textures.
Changes:
- Add MainSequence::resize_auxiliary_texture() to recreate textures
- Override resize() in CircleMaskEffect to resize circle_mask texture
- Override resize() in CNNEffect to resize captured_frame texture
- Bind groups are recreated with new texture views after resize
Tests: All 36 tests passing
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Root cause: Uniform buffers created but not initialized before bind group
creation, causing undefined UV coordinates in circle_mask_compute.wgsl.
Changes:
- Add get_common_uniforms() helper to Effect base class
- Refactor render()/compute() signatures: 5 params → CommonPostProcessUniforms&
- Fix uninitialized uniforms in CircleMaskEffect and CNNEffect
- Update all 19 effect implementations and headers
- Fix WGSL syntax error in FlashEffect (u.audio_intensity → audio_intensity)
- Update test files (test_sequence.cc)
Benefits:
- Cleaner API: construct uniforms once per frame, reuse across effects
- More maintainable: CommonPostProcessUniforms changes need no call site updates
- Fixes UV coordinate bug in circle_mask_compute.wgsl
All 36 tests passing (100%)
handoff(Claude): Effect API refactor complete
|
|
Fixed buffer mapping callback mode mismatch causing Unknown status.
Changed from WaitAnyOnly+ProcessEvents to AllowProcessEvents+DevicePoll.
Readback now functional but CNN output incorrect (all white).
Issue isolated to tool-specific binding/uniform setup - CNNEffect
in demo works correctly.
Technical details:
- WGPUCallbackMode_WaitAnyOnly requires wgpuInstanceWaitAny
- Using wgpuInstanceProcessEvents with WaitAnyOnly never fires callback
- Fixed by using AllowProcessEvents mode + wgpuDevicePoll
- Removed debug output and platform warnings
Status: 36/36 tests pass, readback works, CNN shader issue remains.
handoff(Claude): CNN test tool readback fixed, output debugging needed
|
|
Core GPU Utility (texture_readback):
- Reusable synchronous texture-to-CPU readback (~150 lines)
- STRIP_ALL guards (0 bytes in release builds)
- Handles COPY_BYTES_PER_ROW_ALIGNMENT (256-byte alignment)
- Refactored OffscreenRenderTarget to use new utility
CNN Test Tool (cnn_test):
- Standalone PNG→3-layer CNN→PNG/PPM tool (~450 lines)
- --blend parameter (0.0-1.0) for final layer mixing
- --format option (png/ppm) for output format
- ShaderComposer integration for include resolution
Build Integration:
- Added texture_readback.cc to GPU_SOURCES (both sections)
- Tool target with STB_IMAGE support
Testing:
- All 36 tests pass (100%)
- Processes 64×64 and 555×370 images successfully
- Ground-truth validation setup complete
Known Issues:
- BUG: Tool produces black output (uninitialized input texture)
- First intermediate texture not initialized before layer loop
- MSE 64860 vs Python ground truth (expected <10)
- Fix required: Copy input to intermediate[0] before processing
Documentation:
- doc/CNN_TEST_TOOL.md - Full technical reference
- Updated PROJECT_CONTEXT.md and COMPLETED.md
handoff(Claude): CNN test tool foundation complete, needs input init bugfix
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
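
The 256-byte row alignment mentioned above can be sketched as follows. WebGPU requires `bytesPerRow` in texture-to-buffer copies to be a multiple of 256, so a readback buffer usually has padded rows that must be stripped on the CPU. The function names and the 4-bytes-per-pixel usage are assumptions for illustration; the real texture_readback code may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// WebGPU's required alignment for bytesPerRow in buffer/texture copies.
constexpr uint32_t kCopyBytesPerRowAlignment = 256;

// Round the tightly packed row size up to the next multiple of 256.
constexpr uint32_t padded_bytes_per_row(uint32_t width, uint32_t bytes_per_pixel) {
  return (width * bytes_per_pixel + kCopyBytesPerRowAlignment - 1) /
         kCopyBytesPerRowAlignment * kCopyBytesPerRowAlignment;
}

// Copy rows out of a padded readback buffer into a tightly packed image.
std::vector<uint8_t> strip_row_padding(const std::vector<uint8_t>& padded,
                                       uint32_t width, uint32_t height,
                                       uint32_t bytes_per_pixel) {
  const std::size_t tight = static_cast<std::size_t>(width) * bytes_per_pixel;
  const std::size_t stride = padded_bytes_per_row(width, bytes_per_pixel);
  assert(padded.size() >= stride * height);
  std::vector<uint8_t> out(tight * height);
  for (uint32_t y = 0; y < height; ++y) {
    std::memcpy(out.data() + y * tight, padded.data() + y * stride, tight);
  }
  return out;
}
```

With 4 bytes per pixel, a 64-pixel-wide row is already aligned (256 bytes), while a 555-pixel-wide row (2220 bytes) is padded to 2304 bytes.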
|
|
PyTorch Conv2d uses zero-padding; shader was using Repeat mode which
wraps edges. ClampToEdge better approximates zero-padding behavior.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
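
To illustrate the difference between the three edge behaviors, here is a small CPU-side sketch for a single row of texels. `EdgeMode` and `sample_row` are hypothetical names; the actual fix only changes the sampler's address mode.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class EdgeMode { Repeat, ClampToEdge, ZeroPad };

// Fetch texel x from a row, resolving out-of-range coordinates per mode.
float sample_row(const std::vector<float>& row, int x, EdgeMode mode) {
  const int size = static_cast<int>(row.size());
  assert(size > 0);
  if (x >= 0 && x < size) return row[x];
  switch (mode) {
    case EdgeMode::Repeat:  // wrap around to the opposite edge
      return row[((x % size) + size) % size];
    case EdgeMode::ClampToEdge:  // reuse the nearest edge texel
      return row[std::min(std::max(x, 0), size - 1)];
    case EdgeMode::ZeroPad:  // PyTorch Conv2d default (padding_mode='zeros')
      return 0.0f;
  }
  return 0.0f;
}
```

Neither sampler mode reproduces zero padding exactly. Repeat pulls in values from the opposite side of the image, whereas ClampToEdge at least stays local to the border.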
|
|
Converted ShaderToy shader (Saturday cubism experiment) to Scene1Effect
following EFFECT_WORKFLOW.md automation guidelines.
**Changes:**
- Created Scene1Effect (.h, .cc) as scene effect (not post-process)
- Converted GLSL to WGSL with manual fixes:
- Replaced RESOLUTION/iTime with uniforms.resolution/time
- Fixed const expressions (normalize not allowed in const)
- Converted mainImage() to fs_main() return value
- Manual matrix rotation for scene transformation
- Added shader asset to workspaces/main/assets.txt
- Registered in CMakeLists.txt (both GPU_SOURCES sections)
- Added to demo_effects.h and shaders declarations
- Added to timeline.seq at 22.5s for 10s duration
- Added to test_demo_effects.cc scene_effects list
**Shader features:**
- Raymarching cube and sphere with ground plane
- Reflections and soft shadows
- Sky rendering with sun and horizon glow
- ACES tonemapping and sRGB output
- Time-based rotation animation
**Tests:** All effects tests passing (5/9 scene, 9/9 post-process)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Add BindGroupLayoutBuilder, BindGroupBuilder, RenderPipelineBuilder,
and SamplerCache to reduce repetitive WGPU code. Refactor
post_process_helper, cnn_effect, and rotating_cube_effect.
Changes:
- Bind group creation: 19 instances, 14→4 lines each
- Pipeline creation: 30-50→8 lines
- Sampler deduplication: 6 instances → cached
- Total boilerplate reduction: -122 lines across 3 files
Builder pattern prevents binding index errors and consolidates
platform-specific #ifdef in fewer locations. Binary size unchanged
(6.3M debug). Tests pass.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Two bugs causing black screen when CNN post-processing activated:
1. Framebuffer capture timing: Capture ran inside post-effect loop after
ping-pong swaps, causing layers 1+ to capture wrong buffer. Moved
capture before loop to copy framebuffer_a once before post-chain starts.
2. Missing uniforms update: CNNEffect never updated uniforms_ buffer,
leaving uniforms.resolution uninitialized (0,0). UV calculation
p.xy/uniforms.resolution divided by zero (Inf/NaN), causing all texture
samples to return black. Added uniforms update in update_bind_group().
Files modified:
- src/gpu/effect.cc: Capture before post-chain (lines 308-346)
- src/gpu/effects/cnn_effect.cc: Add uniforms update (lines 132-142)
- workspaces/main/shaders/cnn/cnn_layer.wgsl: Remove obsolete comment
- doc/CNN_DEBUG.md: Historical debugging doc
- CLAUDE.md: Reference CNN_DEBUG.md in historical section
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
CNNEffect's "original" input was black because FadeEffect (priority 1) ran
before CNNEffect (priority 1), fading the scene. Changed framebuffer capture
to use framebuffer_a (scene output) instead of current_input (post-chain).
Also add seq_compiler validation to detect post-process priority collisions
within and across concurrent sequences, preventing similar render order issues.
Updated stub_types.h WGPULoadOp enum values to match webgpu.h spec.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
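
The priority-collision check can be sketched as a pairwise scan over post-process entries. This is an assumption about how such a validation might look, not seq_compiler's actual code; `PostEffectEntry` and `find_priority_collisions` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened view of one post-process effect on the timeline.
struct PostEffectEntry {
  std::string name;
  float start_time;
  float end_time;
  int priority;
};

// Report every pair of effects that are active at the same time and share
// a priority, since their relative render order would be ambiguous.
std::vector<std::pair<std::string, std::string>> find_priority_collisions(
    const std::vector<PostEffectEntry>& entries) {
  std::vector<std::pair<std::string, std::string>> collisions;
  for (std::size_t i = 0; i < entries.size(); ++i) {
    assert(entries[i].start_time < entries[i].end_time);
    for (std::size_t j = i + 1; j < entries.size(); ++j) {
      const bool concurrent = entries[i].start_time < entries[j].end_time &&
                              entries[j].start_time < entries[i].end_time;
      if (concurrent && entries[i].priority == entries[j].priority) {
        collisions.emplace_back(entries[i].name, entries[j].name);
      }
    }
  }
  return collisions;
}
```

The quadratic scan is acceptable for a build-time tool handling a few dozen effects; entries from different sequences can be fed into the same list to cover the cross-sequence case.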
|
|
Implements automatic layer chaining and generic framebuffer capture API for
multi-layer neural network effects with proper original input preservation.
Key changes:
- Effect::needs_framebuffer_capture() - generic API for pre-render capture
- MainSequence: auto-capture to "captured_frame" auxiliary texture
- CNNEffect: multi-layer support via layer_index/total_layers params
- seq_compiler: expands "layers=N" to N chained effect instances
- Shader: @binding(4) original_input available to all layers
- Training: generates layer switches and original input binding
- Blend: mix(original, result, blend_amount) uses layer 0 input
Timeline syntax: CNNEffect layers=3 blend=0.7
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
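
A rough sketch of the "layers=N" expansion and the final blend follows. `CnnLayerInstance` and `expand_layers` are hypothetical names, and applying the blend only on the last layer is an assumption; `mix` mirrors the WGSL builtin of the same name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one chained CNNEffect instance.
struct CnnLayerInstance {
  int layer_index;
  int total_layers;
  float blend;
};

// Expand "layers=N blend=B" into N chained instances. Assumption: only the
// final layer blends with the original input; earlier layers pass through.
std::vector<CnnLayerInstance> expand_layers(int total_layers, float blend) {
  assert(total_layers > 0);
  assert(blend >= 0.0f && blend <= 1.0f);
  std::vector<CnnLayerInstance> chain;
  for (int i = 0; i < total_layers; ++i) {
    const bool is_last = (i == total_layers - 1);
    chain.push_back({i, total_layers, is_last ? blend : 1.0f});
  }
  return chain;
}

// Same definition as WGSL mix(a, b, t): linear interpolation from a to b.
float mix(float a, float b, float t) { return a * (1.0f - t) + b * t; }
```

For the timeline line `CNNEffect layers=3 blend=0.7`, this yields three instances with layer_index 0, 1 and 2, and only the last one carries the 0.7 blend amount.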
|
|
Implements multi-layer convolutional neural network shader for stylized
post-processing of 3D rendered scenes:
**Core Components:**
- CNNEffect: C++ effect class with single-layer rendering (expandable to multi-pass)
- Modular WGSL snippets: cnn_activation, cnn_conv3x3/5x5/7x7, cnn_weights_generated
- Placeholder identity-like weights for initial testing (to be replaced by trained weights)
**Architecture:**
- Flexible kernel sizes (3×3, 5×5, 7×7) via separate snippet files
- ShaderComposer integration (#include resolution)
- Residual connections (input + processed output)
- Supports parallel convolutions (design ready, single conv implemented)
**Size Impact:**
- ~3-4 KB shader code (snippets + main shader)
- ~2-4 KB weights (depends on network architecture when trained)
- Total: ~5-8 KB (acceptable for 64k demo)
**Testing:**
- CNNEffect added to test_demo_effects.cc
- 36/36 tests passing (100%)
**Next Steps:**
- Training script (scripts/train_cnn.py) to generate real weights
- Multi-layer rendering with ping-pong textures
- Weight quantization for size optimization
handoff(Claude): CNN effect foundation complete, ready for training integration
|
|
Implements DEMO_HEADLESS build option for fast iteration cycles:
- Functional GPU/platform stubs (not pure no-ops like STRIP_EXTERNAL_LIBS)
- Audio and timeline systems work normally
- No rendering overhead
- Useful for CI, audio development, timeline validation
Files added:
- doc/HEADLESS_MODE.md - Documentation
- src/gpu/headless_gpu.cc - Validated GPU stubs
- src/platform/headless_platform.cc - Time simulation (60Hz)
- scripts/test_headless.sh - End-to-end test script
Usage:
cmake -B build_headless -DDEMO_HEADLESS=ON
cmake --build build_headless -j4
./build_headless/demo64k --headless --duration 30
Progress printed every 5s. Compatible with --dump_wav mode.
handoff(Claude): Task #76 follow-up - headless mode complete
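
The 60Hz time simulation and 5-second progress interval could look roughly like this. `SimulatedClock` and its methods are hypothetical and are not taken from headless_platform.cc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-step clock: time advances only when tick() is called,
// so a headless run is deterministic and independent of wall-clock speed.
class SimulatedClock {
 public:
  explicit SimulatedClock(uint32_t ticks_per_second = 60,
                          uint32_t report_interval_seconds = 5)
      : ticks_per_second_(ticks_per_second),
        report_interval_ticks_(ticks_per_second * report_interval_seconds) {
    assert(ticks_per_second > 0 && report_interval_seconds > 0);
  }

  void tick() { ++frame_; }

  double now_seconds() const {
    return static_cast<double>(frame_) / ticks_per_second_;
  }

  // True exactly on the frames where a progress line should be printed.
  bool should_report_progress() const {
    return frame_ > 0 && frame_ % report_interval_ticks_ == 0;
  }

  // True once the simulated time reaches the requested --duration.
  bool finished(double duration_seconds) const {
    return now_seconds() >= duration_seconds;
  }

 private:
  uint32_t ticks_per_second_;
  uint32_t report_interval_ticks_;
  uint64_t frame_ = 0;
};
```

With `--duration 30`, such a loop would run 1800 simulated frames and report progress six times.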
|
|
- Use ma_backend_null for audio (100-200KB savings)
- Stub platform/gpu abstractions instead of external APIs
- Add DEMO_STRIP_EXTERNAL_LIBS build mode
- Create stub_types.h with minimal WebGPU opaque types
- Add scripts/measure_size.sh for automated measurement
Results: Demo=4.4MB, External=2.0MB (69% vs 31%)
handoff(Claude): Task #76 complete. Binary compiles but doesn't run (size measurement only).
|
|
CircleMaskEffect was creating shader modules directly without using
ShaderComposer, causing #include directives to fail at runtime.
Changes:
- Add ShaderComposer.Compose() for compute and render shaders
- Include shader_composer.h header
Fixes demo64k crash on CircleMaskEffect initialization.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Replace redundant CommonUniforms struct definitions across 13 shaders
with #include "common_uniforms" directive. Integrate ShaderComposer
preprocessing into all shader creation pipelines.
Changes:
- Replace 9-line CommonUniforms definitions with single #include line
- Add ShaderComposer.Compose() to create_post_process_pipeline()
- Add ShaderComposer.Compose() to gpu_create_render_pass()
- Add ShaderComposer.Compose() to gpu_create_compute_pass()
- Add InitShaderComposer() calls to test_effect_base and test_demo_effects
- Update test_shader_compilation to compose shaders before validation
Net reduction: 83 lines of duplicate code eliminated
All 35 tests passing (100%)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
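
The include preprocessing can be sketched as a line-based expansion with an include-once guard. This is not the real ShaderComposer: `compose_shader` is a hypothetical free function, and the snippet contents used with it are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Expand lines of the form  #include "name"  using a snippet registry.
// Each snippet is emitted at most once; unknown names are left untouched
// so the shader compiler reports them.
std::string compose_shader(const std::string& source,
                           const std::map<std::string, std::string>& snippets,
                           std::set<std::string>& already_included) {
  const std::string directive = "#include \"";
  std::istringstream in(source);
  std::string line;
  std::string out;
  while (std::getline(in, line)) {
    if (line.compare(0, directive.size(), directive) == 0) {
      const std::size_t close = line.find('"', directive.size());
      if (close != std::string::npos) {
        const std::string name =
            line.substr(directive.size(), close - directive.size());
        const auto it = snippets.find(name);
        if (it != snippets.end()) {
          if (already_included.insert(name).second) {
            out += compose_shader(it->second, snippets, already_included);
          }
          continue;  // directive consumed
        }
      }
    }
    out += line;
    out += '\n';
  }
  return out;
}
```

The include-once set is what allows several snippets to pull in `common_uniforms` without the struct being defined twice in the composed source.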
|
|
Replace hardcoded linear_sampler_ with configurable sampler map.
- SamplerType enum (LinearClamp, LinearRepeat, NearestClamp, NearestRepeat)
- get_or_create_sampler() for lazy sampler creation
- Default to LinearClamp for backward compatibility
Eliminates hardcoded assumptions and is more flexible for future use cases.
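
A minimal sketch of the lazy sampler map, assuming a plain `std::map` keyed by the enum. `FakeSampler` and `SamplerMap` are hypothetical stand-ins for the real WGPU sampler handle and its owner; the enum values and `get_or_create_sampler()` are named in the commit.

```cpp
#include <cassert>
#include <map>

enum class SamplerType { LinearClamp, LinearRepeat, NearestClamp, NearestRepeat };

// Hypothetical stand-in for a GPU sampler: it only records its settings.
struct FakeSampler {
  bool linear_filter;
  bool repeat;
};

class SamplerMap {
 public:
  // Return the cached sampler for this type, creating it on first request.
  // The default argument keeps old call sites on LinearClamp.
  const FakeSampler& get_or_create_sampler(
      SamplerType type = SamplerType::LinearClamp) {
    auto it = samplers_.find(type);
    if (it == samplers_.end()) {
      FakeSampler sampler;
      sampler.linear_filter = (type == SamplerType::LinearClamp ||
                               type == SamplerType::LinearRepeat);
      sampler.repeat = (type == SamplerType::LinearRepeat ||
                        type == SamplerType::NearestRepeat);
      it = samplers_.emplace(type, sampler).first;
      ++created_count_;
    }
    return it->second;
  }

  int created_count() const { return created_count_; }

 private:
  std::map<SamplerType, FakeSampler> samplers_;
  int created_count_ = 0;
};
```

References into a `std::map` stay valid across later insertions, so callers can hold on to the returned sampler while other types are created.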
|
|
Multi-input composite shaders with sampler support.
- Dynamic bind group layouts (N input textures + 1 sampler)
- dispatch_composite() for multi-input compute dispatch
- create_gpu_composite_texture() API
- gen_blend.wgsl and gen_mask.wgsl shaders
Guarded with #if !defined(STRIP_GPU_COMPOSITE) for easy removal.
Tests:
- Blend two noise textures
- Mask noise with grid
- Multi-stage composite (composite of composites)
Size: ~830 bytes (2 shaders + dispatch logic)
handoff(Claude): GPU procedural Phase 4 complete
|
|
Replace individual pipeline pointers with map-based system.
- Changed from 3 pointers to std::map<string, ComputePipelineInfo>
- Unified get_or_create_compute_pipeline() for lazy init
- Unified dispatch_compute() for all shaders
- Simplified create_gpu_*_texture() methods (~390 lines removed)
handoff(Claude): GPU procedural texture refactoring complete
|
|
Complete Phase 2 implementation:
- gen_perlin.wgsl: FBM with configurable octaves, amplitude decay
- gen_grid.wgsl: Grid pattern with configurable spacing/thickness
- TextureManager extensions: create_gpu_perlin_texture(), create_gpu_grid_texture()
- Asset packer now validates gen_noise, gen_perlin, gen_grid for PROC_GPU()
- 3 compute pipelines (lazy-init on first use)
Shader parameters:
- gen_perlin: seed, frequency, amplitude, amplitude_decay, octaves (32 bytes)
- gen_grid: width, height, grid_size, thickness (16 bytes)
test_3d_render migration:
- Replaced CPU sky texture (gen_perlin) with GPU version
- Replaced CPU noise texture (gen_noise) with GPU version
- Added new GPU grid texture (256x256, 32px grid, 2px lines)
Size impact:
- gen_perlin.wgsl: ~200 bytes (compressed)
- gen_grid.wgsl: ~100 bytes (compressed)
- Total Phase 2 code: ~300 bytes
- Cumulative (Phase 1+2): ~600 bytes
Testing:
- All 34 tests passing (100%)
- test_gpu_procedural validates all generators
- test_3d_render uses 3 GPU textures (noise, perlin, grid)
Next: Phase 3 - Variable dimensions, async generation, pipeline caching
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
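
The FBM accumulation with amplitude decay can be illustrated on the CPU. This sketch makes several assumptions: it uses hashed value noise rather than gradient (Perlin) noise, doubles the frequency per octave, and offsets the seed per octave; gen_perlin.wgsl may differ on all three. The parameter names follow the list in the commit, but the struct layout is not meant to match the 32-byte uniform.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Parameter names follow the commit; the layout here is illustrative only.
struct FbmParams {
  uint32_t seed;
  float frequency;
  float amplitude;
  float amplitude_decay;
  uint32_t octaves;
};

// Hash an integer lattice point to a value in [0, 1).
float lattice_hash(int x, int y, uint32_t seed) {
  uint32_t h = static_cast<uint32_t>(x) * 374761393u +
               static_cast<uint32_t>(y) * 668265263u + seed * 2246822519u;
  h = (h ^ (h >> 13)) * 1274126177u;
  h ^= h >> 16;
  return static_cast<float>(h & 0xFFFFFFu) / 16777216.0f;
}

// Smoothly interpolated value noise (stand-in for the shader's noise basis).
float value_noise(float x, float y, uint32_t seed) {
  const float fx = std::floor(x);
  const float fy = std::floor(y);
  const int ix = static_cast<int>(fx);
  const int iy = static_cast<int>(fy);
  float tx = x - fx;
  float ty = y - fy;
  tx = tx * tx * (3.0f - 2.0f * tx);  // smoothstep weights
  ty = ty * ty * (3.0f - 2.0f * ty);
  const float a = lattice_hash(ix, iy, seed);
  const float b = lattice_hash(ix + 1, iy, seed);
  const float c = lattice_hash(ix, iy + 1, seed);
  const float d = lattice_hash(ix + 1, iy + 1, seed);
  const float top = a + (b - a) * tx;
  const float bottom = c + (d - c) * tx;
  return top + (bottom - top) * ty;
}

// Sum octaves: each one doubles the frequency and scales the amplitude
// by amplitude_decay.
float fbm(float x, float y, const FbmParams& p) {
  assert(p.octaves > 0);
  float sum = 0.0f;
  float amplitude = p.amplitude;
  float frequency = p.frequency;
  for (uint32_t i = 0; i < p.octaves; ++i) {
    sum += amplitude * value_noise(x * frequency, y * frequency, p.seed + i);
    amplitude *= p.amplitude_decay;
    frequency *= 2.0f;
  }
  return sum;
}
```

With amplitude 1 and decay 0.5, four octaves are bounded by 1 + 0.5 + 0.25 + 0.125 = 1.875, so a shader would typically normalize by that sum before writing to an rgba8unorm texture.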
|
|
Phase 1 implementation complete:
- GPU compute shader for noise generation (gen_noise.wgsl)
- TextureManager extensions: create_gpu_noise_texture(), dispatch_noise_compute()
- Asset packer PROC_GPU() syntax support with validation
- ShaderComposer integration for #include resolution
- Zero CPU memory overhead (GPU-only textures)
- Init-time and on-demand generation modes
Technical details:
- 8×8 workgroup size for 256×256 textures
- UniformBuffer for params (width, height, seed, frequency)
- Storage texture binding (rgba8unorm, write-only)
- Lazy pipeline compilation on first use
- ~300 bytes code (Phase 1)
Testing:
- New test: test_gpu_procedural.cc (passes)
- All 34 tests passing (100%)
Future phases:
- Phase 2: Add gen_perlin, gen_grid compute shaders
- Phase 3: Variable dimensions, async generation
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
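
The dispatch size implied by an 8×8 workgroup can be computed with a ceiling division. `workgroup_count` is a hypothetical helper name; the constant 8 comes from the commit.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kWorkgroupSize = 8;  // 8x8 threads per workgroup

// Number of workgroups needed along one axis to cover `texels` texels.
constexpr uint32_t workgroup_count(uint32_t texels) {
  return (texels + kWorkgroupSize - 1) / kWorkgroupSize;
}

static_assert(workgroup_count(256) == 32, "256x256 needs a 32x32 dispatch");
```

For sizes that are not a multiple of 8, the last workgroup overshoots the texture, so the shader would need a bounds check before writing.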
|