| -rw-r--r-- | PROJECT_CONTEXT.md | 14 |
| -rw-r--r-- | TODO.md | 107 |
| -rw-r--r-- | cnn_v3/README.md | 4 |
| -rw-r--r-- | cnn_v3/docs/HOWTO.md | 12 |
| -rw-r--r-- | doc/COMPLETED.md | 6 |
| -rw-r--r-- | doc/SEQUENCE.md | 2 |
| -rw-r--r-- | src/audio/audio_engine.cc | 2 |
| -rw-r--r-- | src/audio/synth.cc | 5 |
8 files changed, 63 insertions, 89 deletions
diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md
index bffbeb8..d7fc771 100644
--- a/PROJECT_CONTEXT.md
+++ b/PROJECT_CONTEXT.md
@@ -34,21 +34,21 @@
 - **Timing System:** **Beat-based timelines** for musical synchronization. Sequences defined in beats, converted to seconds at runtime. Effects receive both physical time (constant) and beat time (musical). Variable tempo affects audio only. See `doc/BEAT_TIMING.md`.
 - **Workspace system:** Multi-workspace support. Easy switching with `-DDEMO_WORKSPACE=<name>`. Organized structure: `music/`, `weights/`, `obj/`, `shaders/`. Shared common shaders in `src/shaders/`. See `doc/WORKSPACE_SYSTEM.md`.
 - **Audio:** Sample-accurate sync. Zero heap allocations per frame. Variable tempo. OLA-IDCT synthesis (v2 .spec): Hann analysis window, rectangular synthesis, 50% overlap, click-free. V1 (raw DCT-512) preserved for generated notes. .spec files regenerated as v2.
-- **Shaders:** Parameterized effects (UniformHelper, .seq syntax). Beat-synchronized animation support (`beat_time`, `beat_phase`). Modular WGSL composition with ShaderComposer. 27 shared common shaders (math, render, compute). Reusable snippets: `render/scratch_lines`, `render/ntsc_common` (NTSC signal processing, RGB and YIQ input variants via `sample_ntsc_signal` hook), `math/color` (YIQ/NTSC), `math/color_c64` (C64 palette, Bayer dither, border animation).
+- **Shaders:** Parameterized effects (UniformHelper, .seq syntax). Beat-synchronized animation support (`beat_time`, `beat_phase`). Modular WGSL composition with ShaderComposer. 37 shared common shaders (math/, render/, compute/, debug/). Reusable snippets: `render/scratch_lines`, `render/ntsc_common` (NTSC signal processing, RGB and YIQ input variants via `sample_ntsc_signal` hook), `math/color` (YIQ/NTSC), `math/color_c64` (C64 palette, Bayer dither, border animation).
 - **3D:** Hybrid SDF/rasterization with BVH. Binary scene loader. Blender pipeline.
-- **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated. **CNN v3 Phases 1–9 complete** + runtime pipeline operational: `GBufferEffect` (MRT raster + sphere impostors + SDF shadow pass) → `GBufDeferredEffect` (albedo×diffuse debug view) wired in `cnn_v3_test` sequence. Two training bugs fixed: dec0 ReLU removed (full [0,1] output range), FiLM MLP loaded from `.bin` at runtime. Parity validated: max_err=4.88e-4. See `cnn_v3/docs/HOWTO.md`.
+- **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated. **CNN v3 Phases 1–9 complete** + runtime pipeline operational: `GBufferEffect` (MRT raster + sphere impostors + SDF shadow pass) → `GBufDeferredEffect` (albedo×diffuse) in `cnn_v3_test`; debug sequence adds `CNNv3Effect` → `GBufViewEffect`. Training bugs fixed: dec0 ReLU removed, FiLM MLP loaded from `.bin`. Parity validated: max_err=4.88e-4. See `cnn_v3/docs/HOWTO.md`.
 - **Tools:** CNN test tool operational. Texture readback utility functional. Timeline editor (web-based, beat-aligned, audio playback).
 - **Build:** Asset dependency tracking. Size measurement. Hot-reload (debug-only). WSL (Windows 10) supported: native Linux build and cross-compile to `.exe` via `mingw-w64`.
-- **Sequence:** DAG-based effect routing with explicit node system. Python compiler with topological sort and ping-pong optimization. 12 effects operational (Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch). Effect times are absolute (seq_compiler adds sequence start offset). See `doc/SEQUENCE.md`.
-- **Testing:** **36/36 passing**.
+- **Sequence:** DAG-based effect routing with explicit node system. Python compiler with topological sort and ping-pong optimization. 18 effects operational (Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch, Ntsc, NtscYiq, GBufferEffect, CNNv3Effect, GBufDeferredEffect, GBufViewEffect). Effect times are absolute (seq_compiler adds sequence start offset). See `doc/SEQUENCE.md`.
+- **Testing:** **38/38 passing**.
 ---
 ## Next Up
-**Active:** CNN v3 training bugs fixed ✅ — retrain from scratch with more data (≥50 samples). Spectral Brush Editor.
-**Ongoing:** Test infrastructure maintenance (38/38 passing)
-**Future:** CNN v3 training pass, size optimization (64k target)
+**Active:** Spectral Brush Editor (Task #5). CNN v3 data collection + retrain (≥50 samples needed, 11 collected).
+**Ongoing:** Test infrastructure (38/38 passing).
+**Future:** Size optimization (64k target), CNN v3 2D mode, CNN v2 8-bit quantization.
 See `TODO.md` for details.
@@ -12,32 +12,41 @@ Procedural spectrogram tool: 50-100× compression (5 KB .spec → ~100 bytes C++
 ---
-## Priority 2: Test Infrastructure Maintenance [ONGOING]
+## Priority 2: CNN v3 Training [IN PROGRESS]
-**Status:** 38/38 tests passing
+**Design:** `cnn_v3/docs/CNN_V3.md` | Phases 1–9 complete. Runtime pipeline operational.
+
+**Pipelines:**
+- `cnn_v3_test`: `GBufferEffect` → `GBufDeferredEffect`
+- `cnn_v3_debug`: `GBufferEffect` → `CNNv3Effect` → `GBufViewEffect`
+
+**Active:**
+- [ ] Restore full scene in `GBufferEffect::set_scene()` (20 cubes + 4 spheres, 2 lights)
+- [ ] Collect ≥50 training samples (currently 11) — see `cnn_v3/docs/HOWTO.md` §2
+- [ ] Retrain from scratch — see `cnn_v3/docs/HOWTO.md` §3
-### ✅ Fix FFT twiddle factor accumulation bug (`src/audio/fft.cc`) — DONE
+**Pending (lower priority):**
+- [ ] GBufferEffect: Pass 3 transparency (transp=0 placeholder)
+- [ ] GBufferEffect: `resize()` support
+- [ ] Web tool (`cnn_v3/tools/shaders.js`): `prev_cnn` hardcoded to 0 in both JS pack shaders
+ (lines ~313 / ~39). Fix: add `prev` texture binding and wire in `tester.js`.
-`fft_radix2` now computes `wr = cosf(angle*k); wi = sinf(angle*k);` directly per k.
-Tests A–E added to `test_fft.cc`. `arrays_match` default tolerance reverted to 5e-3.
+---
-## ✅ Audio Timing Drift — DONE
+## Priority 3: Test Infrastructure [ONGOING]
-Events triggered ~180ms early over 63 beats @ BPM=90. Root causes fixed:
-1. `chunk_frames` truncation accumulation replaced by accurate double-precision integration.
-2. `tracker` updated to double-precision time representations for exact sample-accurate scheduling.
+**Status:** 38/38 tests passing
-## ✅ Audio System Enhancements — DONE
+---
-1. **`synth.cc`: use `ola_decode()` from `src/audio/ola.h`** — `ola_decode_frame` extracted and used for per-frame OLA-IDCT synthesis, deduplicating the IDCT + overlap handling logic.
+## Priority 4: GPU-Accelerated PCM Synthesis
-2. **GPU-Accelerated PCM Synthesis:**
- - Compute shader for direct PCM generation (bypass spectrogram)
- - Write to compute buffer, readback to synth
+Compute shader for direct PCM generation (bypasses spectrogram decode).
+Write to compute buffer, readback to synth. No design doc yet.
 ---
-## Priority 4: 3D System Enhancements (Task #18)
+## Priority 5: 3D System Enhancements (Task #18)
 Pipeline for importing complex 3D scenes to replace hardcoded geometry.
@@ -45,76 +54,34 @@ Pipeline for importing complex 3D scenes to replace hardcoded geometry.
 ---
-## Priority 4: WGSL Modularization (Task #50) [RECURRENT]
+## Priority 5: WGSL Modularization (Task #50) [RECURRENT]
 Ongoing shader code hygiene for granular, reusable snippets.
 ---
-## Priority 4: Wine/Windows Black Screen
+## Priority 5: Wine/Windows Black Screen
-`demo64k.exe` runs under Wine (wgpu-native v27, Vulkan/MoltenVK) but shows a black window — no visuals rendered. Audio and timeline progress correctly. GPU device/adapter init succeeds.
+`demo64k.exe` opens under Wine but shows a black window. Audio runs correctly.
-**Likely causes to investigate:**
+**Likely causes:**
 - Swapchain format mismatch (Wine Vulkan may prefer BGRA8 over RGBA8)
-- Surface present failing silently (check `WGPUSurfaceGetCurrentTexture` status)
-- Render pass output not reaching the surface (missing present call or wrong texture view)
+- Surface present failing silently (`WGPUSurfaceGetCurrentTexture` status)
+- Render pass output not reaching the surface
-**To reproduce:** `./scripts/run_win.sh` — window opens, stays black.
+**To reproduce:** `./scripts/run_win.sh`
 ---
-## CNN v3 — U-Net + FiLM [IN PROGRESS]
+## Future
-**Design:** `cnn_v3/docs/CNN_V3.md` | All phases 1–9 complete. Runtime pipeline operational.
-
-**Current pipeline:** `GBufferEffect` → `GBufDeferredEffect` → `GBufViewEffect` → sink
-
-**Training bugs fixed (2026-03-27):**
-- ✅ dec0 ReLU removed: output now spans full [0,1] range (was stuck ≥0.5)
-- ✅ FiLM MLP loaded from `cnn_v3_film_mlp.bin` at runtime (was hardcoded heuristics)
-
-**Active work:**
-- [ ] Restore full scene in `GBufferEffect::set_scene()` (20 cubes + 4 spheres, 2 lights)
-- [ ] Collect ≥50 training samples (currently 11) — see `cnn_v3/docs/HOWTO.md` §2
-- [ ] Retrain from scratch — see `cnn_v3/docs/HOWTO.md` §3
-
-**Pending (lower priority):**
-- [ ] GBufferEffect: Pass 3 transparency (transp=0 placeholder)
-- [ ] GBufferEffect: `resize()` support
-- [ ] Web tool (`cnn_v3/tools/shaders.js`): `prev_cnn` always zero in both pack shaders
- (`FULL_PACK_SHADER` line ~313 and simple pack line ~39 hardcode `prev=0`).
- C++ `gbuf_pack.wgsl` reads a real `prev_cnn` texture (binding 6).
- Fix: add a `prev` texture binding to both JS pack shaders and wire it up in `tester.js`.
-
-## Future: CNN v3 "2D Mode" (G-buffer-free)
-
-Allow `CNNv3Effect` to run on a plain screen buffer / photo without a real G-buffer.
-Fake the missing feature vectors (normals, depth, material IDs, shadow, transp) from
-the RGB input alone:
-- normals: approximate from local luminance gradient (Sobel)
-- depth: constant (e.g. 0.5) or estimated from a simple heuristic
-- material IDs / shadow / transp: neutral defaults (e.g. 0)
-
-This would let the effect be applied to any rendered frame (post-NTSC, post-Scratch, etc.)
-without requiring a 3D G-buffer pass upstream, and enable training/inference on photos.
-
-Implementation sketch:
-- New `CNNv3Effect2D` subclass (or a mode flag) that synthesizes `feat_tex0`/`feat_tex1`
- internally from a single `rgba8unorm` input, then runs the same 5-pass U-Net.
-- Separate `gbuf_pack_2d.wgsl` compute shader that fills feat0/feat1 from a photo buffer.
-
-## Future: CNN v2 8-bit Quantization
-
-Reduce weights from f16 (~3.2 KB) to i8 (~1.6 KB).
-
-**Requirements:** Quantization-aware training (QAT)
-**Design:** `cnn_v2/docs/CNN_V2.md`
-
----
-
+### CNN v3 "2D Mode" (G-buffer-free)
+Run `CNNv3Effect` on a plain screen buffer / photo — fake normals via Sobel, constant depth, neutral material defaults. New `gbuf_pack_2d.wgsl` + `CNNv3Effect2D` subclass or mode flag.
-## Future: Size Optimization (64k Target)
+### CNN v2 8-bit Quantization
+Reduce weights f16 (~3.2 KB) → i8 (~1.6 KB). Requires QAT. See `cnn_v2/docs/CNN_V2.md`.
+### Size Optimization (64k Target)
 - Task #22: Windows Native Platform (Win32)
 - Task #28: Spectrogram Quantization
 - Task #34: Full STL Removal
diff --git a/cnn_v3/README.md b/cnn_v3/README.md
index a844b1b..bd54e50 100644
--- a/cnn_v3/README.md
+++ b/cnn_v3/README.md
@@ -31,7 +31,7 @@ Add images directly to these directories and commit them.
 ## Status
-**Phases 1–7 complete.** 36/36 tests pass.
+**Phases 1–9 complete.** 38/38 tests pass. Training bugs fixed (2026-03-27).
 | Phase | Status |
 |-------|--------|
@@ -42,6 +42,8 @@ Add images directly to these directories and commit them.
 | 5 — Parity validation | ✅ max_err=4.88e-4 |
 | 6 — Training script | ✅ train_cnn_v3.py |
 | 7 — Validation tools | ✅ GBufViewEffect + web sample loader |
+| 8 — Architecture upgrade [8,16] | ✅ enc_channels=[8,16], 16ch split into lo/hi pairs |
+| 9 — Training bug fixes | ✅ dec0 ReLU removed, FiLM MLP loaded from .bin |
 See `cnn_v3/docs/HOWTO.md` for the practical playbook (§9 covers validation tools).
 See `cnn_v3/docs/CNN_V3.md` for full design.
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 67f7931..e8fd0a5 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -235,7 +235,7 @@ channel-dropout training.
 python3 cnn_v3/training/pack_photo_sample.py \
 --photo input/photo1.jpg \
 --target target/photo1_styled.png \
- --output dataset/photos/sample_001/
+ --output dataset/simple/sample_001/
 ```
 `--target` is required and must be a stylized ground-truth image at the same
@@ -245,9 +245,9 @@ resolution as the photo. The script writes it as `target.png` in the sample dir.
 ```
 dataset/
- blender/
+ full/ # Blender G-buffer samples (--input-mode full)
 sample_0001/ sample_0002/ ...
- photos/
+ simple/ # Photo/stylized pairs (--input-mode simple)
 sample_001/ sample_002/ ...
 ```
@@ -399,14 +399,14 @@ Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch referen
 | Phase | Status | Notes |
 |-------|--------|-------|
-| 1 — G-buffer (raster + pack) | ✅ Done | Integrated, 36/36 tests pass |
+| 1 — G-buffer (raster + pack) | ✅ Done | Integrated, 38/38 tests pass |
 | 1 — G-buffer (SDF shadow pass) | ✅ Done | `gbuf_shadow.wgsl`, proxy-box SDF |
 | 2 — Training infrastructure | ✅ Done | blender_export.py, pack_*_sample.py |
 | 3 — WGSL U-Net shaders | ✅ Done | 5 compute shaders + cnn_v3/common snippet |
-| 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 36/36 tests pass |
+| 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 38/38 tests pass |
 | 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
 | 6 — FiLM MLP training | ✅ Done | train_cnn_v3.py + cnn_v3_utils.py written |
-| 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
+| 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 38/38 tests pass |
 | 8 — Architecture upgrade [8,16] | ✅ Done | enc_channels=[8,16], multi-scale loss, 16ch textures split into lo/hi pairs |
 | 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |
 | 9 — Training bug fixes | ✅ Done | dec0 ReLU removed (output unblocked); FiLM MLP loaded at runtime |
diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md
index 203c27a..233373e 100644
--- a/doc/COMPLETED.md
+++ b/doc/COMPLETED.md
@@ -34,6 +34,12 @@ Completed task archive. See `doc/archive/` for detailed historical documents.
 ---
+## March 2026 (continued)
+
+- [x] **FFT twiddle factor fix** — `fft_radix2` computes `wr/wi` directly per k via `cosf/sinf(angle*k)`. Tests A–E added to `test_fft.cc`. Tolerance reverted to 5e-3.
+- [x] **Audio timing drift fix** — Events were triggered ~180ms early over 63 beats. Fixed: `chunk_frames` truncation replaced by double-precision integration; tracker updated to double-precision time.
+- [x] **OLA decode refactor** — `ola_decode_frame` extracted into `src/audio/ola.h` and used in `synth.cc`, deduplicating IDCT + overlap handling logic.
+
 ## March 2026
 - [x] **CNN v3 training bug fixes (2026-03-27)** — Two bugs blocking convergence:
diff --git a/doc/SEQUENCE.md b/doc/SEQUENCE.md
index 3d7a6ce..bb1e8e8 100644
--- a/doc/SEQUENCE.md
+++ b/doc/SEQUENCE.md
@@ -307,7 +307,7 @@ params.aspect_ratio; // width/height
 - DAG validation, topological sort, ping-pong optimization
 - Multi-input/multi-output effects
 - Node aliasing (compile-time optimization)
-- 12 effects: Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch
+- 18 effects: Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch, Ntsc, NtscYiq, GBufferEffect, CNNv3Effect, GBufDeferredEffect, GBufViewEffect
 **Missing/Future:**
 - Flatten mode (`--flatten` generates same code as dev mode)
diff --git a/src/audio/audio_engine.cc b/src/audio/audio_engine.cc
index b4c4863..c184324 100644
--- a/src/audio/audio_engine.cc
+++ b/src/audio/audio_engine.cc
@@ -184,7 +184,7 @@ void AudioEngine::seek(float target_time) {
 tracker_update(t, 0.0f);
 }
- // 6. Final update at exact target time
+ // 5. Final update at exact target time
 tracker_update(target_time, 0.0f);
 current_time_ = target_time;
diff --git a/src/audio/synth.cc b/src/audio/synth.cc
index 9b56069..0161385 100644
--- a/src/audio/synth.cc
+++ b/src/audio/synth.cc
@@ -28,7 +28,7 @@ struct Voice {
 float overlap_buf[OLA_OVERLAP]; // OLA tail from previous frame (v2 only)
 bool ola_mode; // True for SPEC_VERSION_V2_OLA
 int buffer_pos;
- float fractional_pos; // Fractional sample position for tempo scaling
+ float fractional_pos; // Reserved
 int start_sample_offset; // Samples to wait before producing audio output
@@ -212,8 +212,7 @@ void synth_trigger_voice(int spectrogram_id, float volume, float pan,
 v.ola_mode ? OLA_HOP_SIZE : DCT_SIZE; // Force reload on first render
 if (v.ola_mode)
 memset(v.overlap_buf, 0, sizeof(v.overlap_buf));
- v.fractional_pos =
- 0.0f; // Initialize fractional position for tempo scaling
+ v.fractional_pos = 0.0f;
 v.start_sample_offset = start_offset_samples;
 v.active_spectral_data =
 g_synth_data.active_spectrogram_data[spectrogram_id];
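
The FFT twiddle-factor fix recorded in the `doc/COMPLETED.md` hunk above (evaluating `cosf`/`sinf(angle*k)` directly per `k`) could look roughly like the sketch below. This is not the code from `src/audio/fft.cc`, which this diff does not show. The function name `fft_radix2_sketch`, the split real/imaginary vectors, and the loop layout are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for fft_radix2 (the real signature is not in this diff).
// In-place forward FFT; re.size() == im.size() and must be a power of two.
void fft_radix2_sketch(std::vector<float>& re, std::vector<float>& im) {
  const std::size_t n = re.size();

  // Bit-reversal permutation so the butterflies can run in place.
  for (std::size_t i = 1, j = 0; i < n; ++i) {
    std::size_t bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      std::swap(re[i], re[j]);
      std::swap(im[i], im[j]);
    }
  }

  for (std::size_t len = 2; len <= n; len <<= 1) {
    const std::size_t half = len / 2;
    const float angle = -2.0f * 3.14159265358979f / static_cast<float>(len);
    for (std::size_t k = 0; k < half; ++k) {
      // Direct evaluation per k. An accumulated twiddle (w *= w_step on each
      // iteration) would compound float rounding error across the stage.
      const float wr = std::cos(angle * static_cast<float>(k));
      const float wi = std::sin(angle * static_cast<float>(k));
      for (std::size_t i = k; i < n; i += len) {
        const std::size_t j = i + half;
        const float tr = re[j] * wr - im[j] * wi;
        const float ti = re[j] * wi + im[j] * wr;
        re[j] = re[i] - tr;  // lower butterfly output
        im[j] = im[i] - ti;
        re[i] += tr;         // upper butterfly output
        im[i] += ti;
      }
    }
  }
}
```

Direct evaluation costs one `cos`/`sin` pair per twiddle index, but each twiddle carries only a single rounding error regardless of how large `k` gets. That is presumably what allowed the test tolerance to be reverted to 5e-3.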
