author     skal <pascal.massimino@gmail.com>  2026-03-29 10:15:38 +0200
committer  skal <pascal.massimino@gmail.com>  2026-03-29 10:15:38 +0200
commit     e22256e374694fd92cc55ba198d3f7b1911713fe
tree       8361b5d512551c5bf513c36f1abef4ecaf8454f7
parent     3be659d9f0a150f8a6527ad0edc31787b0d39994
docs: consolidate and sync docs with current codebase state

- PROJECT_CONTEXT.md: fix effect count (12→18), shader count (27→37), update CNN v3 pipeline description, tighten Next Up section
- TODO.md: fix priority numbering, restore GPU PCM synthesis as pending, streamline CNN v3 section, consolidate Future items
- doc/SEQUENCE.md: effect count 12→18
- cnn_v3/README.md: phases 1–7→1–9, test count 36→38, add phases 8–9
- cnn_v3/docs/HOWTO.md: fix dataset layout blender/photos→full/simple, update test counts 36→38 throughout
- doc/COMPLETED.md: archive FFT/timing/OLA fixes, remove false GPU PCM claim
- src/audio/audio_engine.cc: fix step comment numbering (6→5)
- src/audio/synth.cc: remove stale fractional_pos tempo-scaling comment

handoff(Gemini): docs now accurate: 18 effects, 37 shaders, 38/38 tests, GPU PCM synthesis back in TODO as pending, CNN v3 dataset layout corrected.
-rw-r--r--  PROJECT_CONTEXT.md          14
-rw-r--r--  TODO.md                    107
-rw-r--r--  cnn_v3/README.md             4
-rw-r--r--  cnn_v3/docs/HOWTO.md        12
-rw-r--r--  doc/COMPLETED.md             6
-rw-r--r--  doc/SEQUENCE.md              2
-rw-r--r--  src/audio/audio_engine.cc    2
-rw-r--r--  src/audio/synth.cc           5
8 files changed, 63 insertions, 89 deletions
diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md
index bffbeb8..d7fc771 100644
--- a/PROJECT_CONTEXT.md
+++ b/PROJECT_CONTEXT.md
@@ -34,21 +34,21 @@
 - **Timing System:** **Beat-based timelines** for musical synchronization. Sequences defined in beats, converted to seconds at runtime. Effects receive both physical time (constant) and beat time (musical). Variable tempo affects audio only. See `doc/BEAT_TIMING.md`.
 - **Workspace system:** Multi-workspace support. Easy switching with `-DDEMO_WORKSPACE=<name>`. Organized structure: `music/`, `weights/`, `obj/`, `shaders/`. Shared common shaders in `src/shaders/`. See `doc/WORKSPACE_SYSTEM.md`.
 - **Audio:** Sample-accurate sync. Zero heap allocations per frame. Variable tempo. OLA-IDCT synthesis (v2 .spec): Hann analysis window, rectangular synthesis, 50% overlap, click-free. V1 (raw DCT-512) preserved for generated notes. .spec files regenerated as v2.
-- **Shaders:** Parameterized effects (UniformHelper, .seq syntax). Beat-synchronized animation support (`beat_time`, `beat_phase`). Modular WGSL composition with ShaderComposer. 27 shared common shaders (math, render, compute). Reusable snippets: `render/scratch_lines`, `render/ntsc_common` (NTSC signal processing, RGB and YIQ input variants via `sample_ntsc_signal` hook), `math/color` (YIQ/NTSC), `math/color_c64` (C64 palette, Bayer dither, border animation).
+- **Shaders:** Parameterized effects (UniformHelper, .seq syntax). Beat-synchronized animation support (`beat_time`, `beat_phase`). Modular WGSL composition with ShaderComposer. 37 shared common shaders (math/, render/, compute/, debug/). Reusable snippets: `render/scratch_lines`, `render/ntsc_common` (NTSC signal processing, RGB and YIQ input variants via `sample_ntsc_signal` hook), `math/color` (YIQ/NTSC), `math/color_c64` (C64 palette, Bayer dither, border animation).
 - **3D:** Hybrid SDF/rasterization with BVH. Binary scene loader. Blender pipeline.
-- **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated. **CNN v3 Phases 1–9 complete** + runtime pipeline operational: `GBufferEffect` (MRT raster + sphere impostors + SDF shadow pass) → `GBufDeferredEffect` (albedo×diffuse debug view) wired in `cnn_v3_test` sequence. Two training bugs fixed: dec0 ReLU removed (full [0,1] output range), FiLM MLP loaded from `.bin` at runtime. Parity validated: max_err=4.88e-4. See `cnn_v3/docs/HOWTO.md`.
+- **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated. **CNN v3 Phases 1–9 complete** + runtime pipeline operational: `GBufferEffect` (MRT raster + sphere impostors + SDF shadow pass) → `GBufDeferredEffect` (albedo×diffuse) in `cnn_v3_test`; debug sequence adds `CNNv3Effect` → `GBufViewEffect`. Training bugs fixed: dec0 ReLU removed, FiLM MLP loaded from `.bin`. Parity validated: max_err=4.88e-4. See `cnn_v3/docs/HOWTO.md`.
 - **Tools:** CNN test tool operational. Texture readback utility functional. Timeline editor (web-based, beat-aligned, audio playback).
 - **Build:** Asset dependency tracking. Size measurement. Hot-reload (debug-only). WSL (Windows 10) supported: native Linux build and cross-compile to `.exe` via `mingw-w64`.
-- **Sequence:** DAG-based effect routing with explicit node system. Python compiler with topological sort and ping-pong optimization. 12 effects operational (Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch). Effect times are absolute (seq_compiler adds sequence start offset). See `doc/SEQUENCE.md`.
-- **Testing:** **36/36 passing**.
+- **Sequence:** DAG-based effect routing with explicit node system. Python compiler with topological sort and ping-pong optimization. 18 effects operational (Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch, Ntsc, NtscYiq, GBufferEffect, CNNv3Effect, GBufDeferredEffect, GBufViewEffect). Effect times are absolute (seq_compiler adds sequence start offset). See `doc/SEQUENCE.md`.
+- **Testing:** **38/38 passing**.
---
## Next Up
-**Active:** CNN v3 training bugs fixed ✅ — retrain from scratch with more data (≥50 samples). Spectral Brush Editor.
-**Ongoing:** Test infrastructure maintenance (38/38 passing)
-**Future:** CNN v3 training pass, size optimization (64k target)
+**Active:** Spectral Brush Editor (Task #5). CNN v3 data collection + retrain (≥50 samples needed, 11 collected).
+**Ongoing:** Test infrastructure (38/38 passing).
+**Future:** Size optimization (64k target), CNN v3 2D mode, CNN v2 8-bit quantization.
See `TODO.md` for details.
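The Timing System entry says sequences are authored in beats and converted to seconds at runtime, and that effects receive both physical time and beat time (`beat_time`, `beat_phase`). Under a constant-tempo assumption that mapping is linear. The sketch below uses a hypothetical `BeatClock` class (not the project's API) to show the conversion in both directions and a beat phase in [0, 1); variable tempo is not modeled.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not the project's API): beat <-> second conversion
// at a constant tempo. Variable tempo, which the project applies to audio
// only, is not modeled here.
class BeatClock {
 public:
  explicit BeatClock(double bpm) : bpm_(bpm) { assert(bpm > 0.0); }

  double beats_to_seconds(double beats) const { return beats * 60.0 / bpm_; }
  double seconds_to_beats(double seconds) const { return seconds * bpm_ / 60.0; }

  // Fractional position inside the current beat, in [0, 1).
  double beat_phase(double seconds) const {
    const double beats = seconds_to_beats(seconds);
    return beats - std::floor(beats);
  }

 private:
  double bpm_;
};
```

For example, at 90 BPM, 63 beats span exactly 42 s, and t = 1 s falls halfway through beat 1 (phase 0.5).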
diff --git a/TODO.md b/TODO.md
index ea48fc2..132be5d 100644
--- a/TODO.md
+++ b/TODO.md
@@ -12,32 +12,41 @@ Procedural spectrogram tool: 50-100× compression (5 KB .spec → ~100 bytes C++
---
-## Priority 2: Test Infrastructure Maintenance [ONGOING]
+## Priority 2: CNN v3 Training [IN PROGRESS]
-**Status:** 38/38 tests passing
+**Design:** `cnn_v3/docs/CNN_V3.md` | Phases 1–9 complete. Runtime pipeline operational.
+
+**Pipelines:**
+- `cnn_v3_test`: `GBufferEffect` → `GBufDeferredEffect`
+- `cnn_v3_debug`: `GBufferEffect` → `CNNv3Effect` → `GBufViewEffect`
+
+**Active:**
+- [ ] Restore full scene in `GBufferEffect::set_scene()` (20 cubes + 4 spheres, 2 lights)
+- [ ] Collect ≥50 training samples (currently 11) — see `cnn_v3/docs/HOWTO.md` §2
+- [ ] Retrain from scratch — see `cnn_v3/docs/HOWTO.md` §3
-### ✅ Fix FFT twiddle factor accumulation bug (`src/audio/fft.cc`) — DONE
+**Pending (lower priority):**
+- [ ] GBufferEffect: Pass 3 transparency (transp=0 placeholder)
+- [ ] GBufferEffect: `resize()` support
+- [ ] Web tool (`cnn_v3/tools/shaders.js`): `prev_cnn` hardcoded to 0 in both JS pack shaders
+ (lines ~313 / ~39). Fix: add `prev` texture binding and wire in `tester.js`.
-`fft_radix2` now computes `wr = cosf(angle*k); wi = sinf(angle*k);` directly per k.
-Tests A–E added to `test_fft.cc`. `arrays_match` default tolerance reverted to 5e-3.
+---
-## ✅ Audio Timing Drift — DONE
+## Priority 3: Test Infrastructure [ONGOING]
-Events triggered ~180ms early over 63 beats @ BPM=90. Root causes fixed:
-1. `chunk_frames` truncation accumulation replaced by accurate double-precision integration.
-2. `tracker` updated to double-precision time representations for exact sample-accurate scheduling.
+**Status:** 38/38 tests passing
-## ✅ Audio System Enhancements — DONE
+---
-1. **`synth.cc`: use `ola_decode()` from `src/audio/ola.h`** — `ola_decode_frame` extracted and used for per-frame OLA-IDCT synthesis, deduplicating the IDCT + overlap handling logic.
+## Priority 4: GPU-Accelerated PCM Synthesis
-2. **GPU-Accelerated PCM Synthesis:**
- - Compute shader for direct PCM generation (bypass spectrogram)
- - Write to compute buffer, readback to synth
+Compute shader for direct PCM generation (bypasses spectrogram decode).
+Write to compute buffer, readback to synth. No design doc yet.
---
-## Priority 4: 3D System Enhancements (Task #18)
+## Priority 5: 3D System Enhancements (Task #18)
Pipeline for importing complex 3D scenes to replace hardcoded geometry.
@@ -45,76 +54,34 @@ Pipeline for importing complex 3D scenes to replace hardcoded geometry.
---
-## Priority 4: WGSL Modularization (Task #50) [RECURRENT]
+## Priority 5: WGSL Modularization (Task #50) [RECURRENT]
Ongoing shader code hygiene for granular, reusable snippets.
---
-## Priority 4: Wine/Windows Black Screen
+## Priority 5: Wine/Windows Black Screen
-`demo64k.exe` runs under Wine (wgpu-native v27, Vulkan/MoltenVK) but shows a black window — no visuals rendered. Audio and timeline progress correctly. GPU device/adapter init succeeds.
+`demo64k.exe` opens under Wine but shows a black window. Audio runs correctly.
-**Likely causes to investigate:**
+**Likely causes:**
 - Swapchain format mismatch (Wine Vulkan may prefer BGRA8 over RGBA8)
-- Surface present failing silently (check `WGPUSurfaceGetCurrentTexture` status)
-- Render pass output not reaching the surface (missing present call or wrong texture view)
+- Surface present failing silently (`WGPUSurfaceGetCurrentTexture` status)
+- Render pass output not reaching the surface
-**To reproduce:** `./scripts/run_win.sh` — window opens, stays black.
+**To reproduce:** `./scripts/run_win.sh`
---
-## CNN v3 — U-Net + FiLM [IN PROGRESS]
+## Future
-**Design:** `cnn_v3/docs/CNN_V3.md` | All phases 1–9 complete. Runtime pipeline operational.
-
-**Current pipeline:** `GBufferEffect` → `GBufDeferredEffect` → `GBufViewEffect` → sink
-
-**Training bugs fixed (2026-03-27):**
-- ✅ dec0 ReLU removed: output now spans full [0,1] range (was stuck ≥0.5)
-- ✅ FiLM MLP loaded from `cnn_v3_film_mlp.bin` at runtime (was hardcoded heuristics)
-
-**Active work:**
-- [ ] Restore full scene in `GBufferEffect::set_scene()` (20 cubes + 4 spheres, 2 lights)
-- [ ] Collect ≥50 training samples (currently 11) — see `cnn_v3/docs/HOWTO.md` §2
-- [ ] Retrain from scratch — see `cnn_v3/docs/HOWTO.md` §3
-
-**Pending (lower priority):**
-- [ ] GBufferEffect: Pass 3 transparency (transp=0 placeholder)
-- [ ] GBufferEffect: `resize()` support
-- [ ] Web tool (`cnn_v3/tools/shaders.js`): `prev_cnn` always zero in both pack shaders
- (`FULL_PACK_SHADER` line ~313 and simple pack line ~39 hardcode `prev=0`).
- C++ `gbuf_pack.wgsl` reads a real `prev_cnn` texture (binding 6).
- Fix: add a `prev` texture binding to both JS pack shaders and wire it up in `tester.js`.
-
-## Future: CNN v3 "2D Mode" (G-buffer-free)
-
-Allow `CNNv3Effect` to run on a plain screen buffer / photo without a real G-buffer.
-Fake the missing feature vectors (normals, depth, material IDs, shadow, transp) from
-the RGB input alone:
-- normals: approximate from local luminance gradient (Sobel)
-- depth: constant (e.g. 0.5) or estimated from a simple heuristic
-- material IDs / shadow / transp: neutral defaults (e.g. 0)
-
-This would let the effect be applied to any rendered frame (post-NTSC, post-Scratch, etc.)
-without requiring a 3D G-buffer pass upstream, and enable training/inference on photos.
-
-Implementation sketch:
-- New `CNNv3Effect2D` subclass (or a mode flag) that synthesizes `feat_tex0`/`feat_tex1`
- internally from a single `rgba8unorm` input, then runs the same 5-pass U-Net.
-- Separate `gbuf_pack_2d.wgsl` compute shader that fills feat0/feat1 from a photo buffer.
-
-## Future: CNN v2 8-bit Quantization
-
-Reduce weights from f16 (~3.2 KB) to i8 (~1.6 KB).
-
-**Requirements:** Quantization-aware training (QAT)
-**Design:** `cnn_v2/docs/CNN_V2.md`
-
----
+### CNN v3 "2D Mode" (G-buffer-free)
+Run `CNNv3Effect` on a plain screen buffer / photo — fake normals via Sobel, constant depth, neutral material defaults. New `gbuf_pack_2d.wgsl` + `CNNv3Effect2D` subclass or mode flag.
-## Future: Size Optimization (64k Target)
+### CNN v2 8-bit Quantization
+Reduce weights f16 (~3.2 KB) → i8 (~1.6 KB). Requires QAT. See `cnn_v2/docs/CNN_V2.md`.
+### Size Optimization (64k Target)
 - Task #22: Windows Native Platform (Win32)
 - Task #28: Spectrogram Quantization
 - Task #34: Full STL Removal
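The FFT entry moved out of TODO.md says `fft_radix2` now computes `wr = cosf(angle*k); wi = sinf(angle*k);` directly per k. The diff does not show the old code; the sketch below assumes it used the usual rotation recurrence (multiply the running twiddle by a fixed step each k), which compounds float rounding over n/2 steps. Function names are hypothetical; errors are measured against a double-precision reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Deviation of a float twiddle (wr, wi) from exp(-2*pi*i*k/n),
// measured against a double-precision reference.
double twiddle_dev(float wr, float wi, int k, int n) {
  const double ref = -2.0 * kPi * k / n;
  return std::max(std::fabs(wr - std::cos(ref)), std::fabs(wi - std::sin(ref)));
}

// Accumulating pattern (assumed for the old code): rotate (wr, wi) by a
// fixed step each k, so float rounding compounds over n/2 steps.
double max_dev_recurrence(int n) {
  assert(n >= 2);
  const float angle = static_cast<float>(-2.0 * kPi / n);
  const float cr = std::cos(angle), ci = std::sin(angle);  // float overloads
  float wr = 1.0f, wi = 0.0f;
  double worst = 0.0;
  for (int k = 0; k < n / 2; ++k) {
    worst = std::max(worst, twiddle_dev(wr, wi, k, n));
    const float next_wr = wr * cr - wi * ci;  // complex multiply by the step
    wi = wr * ci + wi * cr;
    wr = next_wr;
  }
  return worst;
}

// Direct pattern, as described for the fixed fft_radix2: evaluate
// cos/sin of angle*k independently for every k.
double max_dev_direct(int n) {
  assert(n >= 2);
  const float angle = static_cast<float>(-2.0 * kPi / n);
  double worst = 0.0;
  for (int k = 0; k < n / 2; ++k) {
    const float wr = std::cos(angle * k);  // float overload, i.e. cosf(angle*k)
    const float wi = std::sin(angle * k);
    worst = std::max(worst, twiddle_dev(wr, wi, k, n));
  }
  return worst;
}
```

The direct form pays one cos/sin pair per k, but each twiddle's error comes from a single evaluation instead of compounding through every earlier step.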
diff --git a/cnn_v3/README.md b/cnn_v3/README.md
index a844b1b..bd54e50 100644
--- a/cnn_v3/README.md
+++ b/cnn_v3/README.md
@@ -31,7 +31,7 @@ Add images directly to these directories and commit them.
## Status
-**Phases 1–7 complete.** 36/36 tests pass.
+**Phases 1–9 complete.** 38/38 tests pass. Training bugs fixed (2026-03-27).
| Phase | Status |
|-------|--------|
@@ -42,6 +42,8 @@ Add images directly to these directories and commit them.
| 5 — Parity validation | ✅ max_err=4.88e-4 |
| 6 — Training script | ✅ train_cnn_v3.py |
| 7 — Validation tools | ✅ GBufViewEffect + web sample loader |
+| 8 — Architecture upgrade [8,16] | ✅ enc_channels=[8,16], 16ch split into lo/hi pairs |
+| 9 — Training bug fixes | ✅ dec0 ReLU removed, FiLM MLP loaded from .bin |
See `cnn_v3/docs/HOWTO.md` for the practical playbook (§9 covers validation tools).
See `cnn_v3/docs/CNN_V3.md` for full design.
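Phase 9 says removing the `dec0` ReLU unblocked the output, and the TODO text removed in this commit describes the output as previously stuck at ≥0.5. That is what you would expect if a sigmoid follows `dec0` (an assumption; the output activation is not shown in this diff): sigmoid(max(0, x)) can never drop below sigmoid(0) = 0.5. A minimal numeric check, with hypothetical function names:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }
double relu(double x) { return std::max(0.0, x); }

// Hypothetical output head before the fix: ReLU on dec0, then sigmoid.
double head_with_relu(double x) { return sigmoid(relu(x)); }
// Hypothetical output head after the fix: sigmoid on the raw dec0 value.
double head_without_relu(double x) { return sigmoid(x); }

// Smallest value a head produces over an evenly spaced sweep of inputs.
double min_over_sweep(double (*head)(double), double lo, double hi, int steps) {
  assert(steps >= 2 && hi > lo);
  double m = head(lo);
  for (int i = 1; i < steps; ++i)
    m = std::min(m, head(lo + (hi - lo) * i / (steps - 1)));
  return m;
}
```

Over inputs in [-10, 10], the ReLU head never goes below 0.5, while the head without ReLU reaches about 4.5e-5.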
diff --git a/cnn_v3/docs/HOWTO.md b/cnn_v3/docs/HOWTO.md
index 67f7931..e8fd0a5 100644
--- a/cnn_v3/docs/HOWTO.md
+++ b/cnn_v3/docs/HOWTO.md
@@ -235,7 +235,7 @@ channel-dropout training.
python3 cnn_v3/training/pack_photo_sample.py \
--photo input/photo1.jpg \
--target target/photo1_styled.png \
- --output dataset/photos/sample_001/
+ --output dataset/simple/sample_001/
```
`--target` is required and must be a stylized ground-truth image at the same
@@ -245,9 +245,9 @@ resolution as the photo. The script writes it as `target.png` in the sample dir.
```
dataset/
- blender/
+ full/ # Blender G-buffer samples (--input-mode full)
sample_0001/ sample_0002/ ...
- photos/
+ simple/ # Photo/stylized pairs (--input-mode simple)
sample_001/ sample_002/ ...
```
@@ -399,14 +399,14 @@ Test vectors generated by `cnn_v3/training/gen_test_vectors.py` (PyTorch referen
| Phase | Status | Notes |
|-------|--------|-------|
-| 1 — G-buffer (raster + pack) | ✅ Done | Integrated, 36/36 tests pass |
+| 1 — G-buffer (raster + pack) | ✅ Done | Integrated, 38/38 tests pass |
| 1 — G-buffer (SDF shadow pass) | ✅ Done | `gbuf_shadow.wgsl`, proxy-box SDF |
| 2 — Training infrastructure | ✅ Done | blender_export.py, pack_*_sample.py |
| 3 — WGSL U-Net shaders | ✅ Done | 5 compute shaders + cnn_v3/common snippet |
-| 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 36/36 tests pass |
+| 4 — C++ CNNv3Effect | ✅ Done | FiLM uniform upload, 38/38 tests pass |
| 5 — Parity validation | ✅ Done | test_cnn_v3_parity.cc, max_err=4.88e-4 |
| 6 — FiLM MLP training | ✅ Done | train_cnn_v3.py + cnn_v3_utils.py written |
-| 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 36/36 tests pass |
+| 7 — G-buffer visualizer (C++) | ✅ Done | GBufViewEffect, 38/38 tests pass |
| 8 — Architecture upgrade [8,16] | ✅ Done | enc_channels=[8,16], multi-scale loss, 16ch textures split into lo/hi pairs |
| 7 — Sample loader (web tool) | ✅ Done | "Load sample directory" in cnn_v3/tools/ |
| 9 — Training bug fixes | ✅ Done | dec0 ReLU removed (output unblocked); FiLM MLP loaded at runtime |
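Phases 4, 6 and 9 refer to FiLM (feature-wise linear modulation). In the general technique, a small MLP maps a conditioning vector to per-channel scale/shift pairs (γ, β), which are applied to each feature channel as γ·x + β. The Phase 9 fix loads that MLP from a `.bin` file instead of using hardcoded heuristics. The sketch below shows only the generic technique with a single linear layer. The layer sizes, row-major weight layout, and function names are hypothetical, and the `.bin` format is not described in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-layer "FiLM MLP": cond (c_in values) -> [gamma | beta].
// weights: row-major, (2 * channels) rows x c_in columns; bias: 2 * channels.
void film_params(const std::vector<float>& cond, const std::vector<float>& weights,
                 const std::vector<float>& bias, std::size_t channels,
                 std::vector<float>& gamma, std::vector<float>& beta) {
  const std::size_t c_in = cond.size();
  assert(weights.size() == 2 * channels * c_in && bias.size() == 2 * channels);
  gamma.assign(channels, 0.0f);
  beta.assign(channels, 0.0f);
  for (std::size_t o = 0; o < 2 * channels; ++o) {
    float acc = bias[o];
    for (std::size_t i = 0; i < c_in; ++i) acc += weights[o * c_in + i] * cond[i];
    if (o < channels) gamma[o] = acc; else beta[o - channels] = acc;
  }
}

// Apply FiLM to channel-major features: x[c][p] = gamma[c] * x[c][p] + beta[c].
void apply_film(std::vector<float>& features, std::size_t pixels,
                const std::vector<float>& gamma, const std::vector<float>& beta) {
  assert(gamma.size() == beta.size() && features.size() == gamma.size() * pixels);
  for (std::size_t c = 0; c < gamma.size(); ++c)
    for (std::size_t p = 0; p < pixels; ++p)
      features[c * pixels + p] = gamma[c] * features[c * pixels + p] + beta[c];
}
```

Because γ and β are produced per channel rather than per pixel, the conditioning cost is independent of resolution. Only the small MLP output has to be recomputed and uploaded when the conditioning input changes.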
diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md
index 203c27a..233373e 100644
--- a/doc/COMPLETED.md
+++ b/doc/COMPLETED.md
@@ -34,6 +34,12 @@ Completed task archive. See `doc/archive/` for detailed historical documents.
---
+## March 2026 (continued)
+
+- [x] **FFT twiddle factor fix** — `fft_radix2` computes `wr/wi` directly per k via `cosf/sinf(angle*k)`. Tests A–E added to `test_fft.cc`. Tolerance reverted to 5e-3.
+- [x] **Audio timing drift fix** — Events were triggered ~180ms early over 63 beats. Fixed: `chunk_frames` truncation replaced by double-precision integration; tracker updated to double-precision time.
+- [x] **OLA decode refactor** — `ola_decode_frame` extracted into `src/audio/ola.h` and used in `synth.cc`, deduplicating IDCT + overlap handling logic.
+
## March 2026
- [x] **CNN v3 training bug fixes (2026-03-27)** — Two bugs blocking convergence:
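The timing-drift entry archived in COMPLETED.md says `chunk_frames` truncation was replaced by double-precision integration. The sketch below shows why per-chunk truncation drifts: every chunk discards its fractional sample, so the loss grows linearly with the chunk count. The 60 Hz chunk rate, 32 kHz sample rate, and function names are hypothetical, so these numbers are not meant to reproduce the reported ~180 ms. The test duration of 42 s corresponds to 63 beats at 90 BPM.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr double kSampleRate = 32000.0;       // hypothetical
constexpr double kChunkSeconds = 1.0 / 60.0;  // hypothetical render-loop step

// Drifting pattern: each chunk size is truncated on its own, so about a
// third of a sample is discarded on every chunk at these rates.
std::int64_t total_frames_truncated(int chunks) {
  assert(chunks >= 0);
  std::int64_t total = 0;
  for (int i = 0; i < chunks; ++i)
    total += static_cast<std::int64_t>(kChunkSeconds * kSampleRate);
  return total;
}

// Non-drifting pattern: keep absolute time in double and size each chunk
// as the difference of two rounded absolute sample positions, so the
// per-chunk rounding never accumulates.
std::int64_t total_frames_integrated(int chunks) {
  assert(chunks >= 0);
  std::int64_t total = 0;
  for (int i = 0; i < chunks; ++i) {
    const std::int64_t start = std::llround(i * kChunkSeconds * kSampleRate);
    const std::int64_t end = std::llround((i + 1) * kChunkSeconds * kSampleRate);
    total += end - start;
  }
  return total;
}
```

With these rates, 2520 chunks (42 s) come out 840 samples (26.25 ms) short in the truncating version and exact in the integrating one.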
diff --git a/doc/SEQUENCE.md b/doc/SEQUENCE.md
index 3d7a6ce..bb1e8e8 100644
--- a/doc/SEQUENCE.md
+++ b/doc/SEQUENCE.md
@@ -307,7 +307,7 @@ params.aspect_ratio; // width/height
 - DAG validation, topological sort, ping-pong optimization
 - Multi-input/multi-output effects
 - Node aliasing (compile-time optimization)
-- 12 effects: Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch
+- 18 effects: Passthrough, Placeholder, GaussianBlur, Heptagon, Particles, RotatingCube, Hybrid3D, Flash, PeakMeter, Scene1, Scene2, Scratch, Ntsc, NtscYiq, GBufferEffect, CNNv3Effect, GBufDeferredEffect, GBufViewEffect
**Missing/Future:**
 - Flatten mode (`--flatten` generates same code as dev mode)
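SEQUENCE.md lists DAG validation and topological sort among the sequence compiler's features. The compiler itself is Python, so the following is only a C++ illustration. It uses Kahn's algorithm over a hypothetical adjacency list of effect nodes and returns an empty order when a cycle makes the graph invalid. The real compiler's data structures are not shown in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// edges[u] lists the nodes that consume u's output. Returns a valid
// execution order, or an empty vector if the graph contains a cycle.
std::vector<int> topo_sort(const std::vector<std::vector<int>>& edges) {
  const std::size_t n = edges.size();
  std::vector<int> indegree(n, 0);
  for (const auto& outs : edges)
    for (int v : outs) {
      assert(v >= 0 && static_cast<std::size_t>(v) < n);
      ++indegree[v];
    }
  std::queue<int> ready;  // nodes whose inputs are all produced
  for (std::size_t u = 0; u < n; ++u)
    if (indegree[u] == 0) ready.push(static_cast<int>(u));
  std::vector<int> order;
  while (!ready.empty()) {
    const int u = ready.front();
    ready.pop();
    order.push_back(u);
    for (int v : edges[u])
      if (--indegree[v] == 0) ready.push(v);
  }
  if (order.size() != n) order.clear();  // leftover nodes sit on a cycle
  return order;
}
```

For example, the `cnn_v3_debug` chain (0 = `GBufferEffect`, 1 = `CNNv3Effect`, 2 = `GBufViewEffect`) sorts to 0, 1, 2. A ping-pong pass, which typically alternates intermediate outputs between two buffers, would then run over this order. That pass is not shown here.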
diff --git a/src/audio/audio_engine.cc b/src/audio/audio_engine.cc
index b4c4863..c184324 100644
--- a/src/audio/audio_engine.cc
+++ b/src/audio/audio_engine.cc
@@ -184,7 +184,7 @@ void AudioEngine::seek(float target_time) {
tracker_update(t, 0.0f);
}
- // 6. Final update at exact target time
+ // 5. Final update at exact target time
tracker_update(target_time, 0.0f);
current_time_ = target_time;
diff --git a/src/audio/synth.cc b/src/audio/synth.cc
index 9b56069..0161385 100644
--- a/src/audio/synth.cc
+++ b/src/audio/synth.cc
@@ -28,7 +28,7 @@ struct Voice {
float overlap_buf[OLA_OVERLAP]; // OLA tail from previous frame (v2 only)
bool ola_mode; // True for SPEC_VERSION_V2_OLA
int buffer_pos;
- float fractional_pos; // Fractional sample position for tempo scaling
+ float fractional_pos; // Reserved
int start_sample_offset; // Samples to wait before producing audio output
@@ -212,8 +212,7 @@ void synth_trigger_voice(int spectrogram_id, float volume, float pan,
v.ola_mode ? OLA_HOP_SIZE : DCT_SIZE; // Force reload on first render
if (v.ola_mode)
memset(v.overlap_buf, 0, sizeof(v.overlap_buf));
- v.fractional_pos =
- 0.0f; // Initialize fractional position for tempo scaling
+ v.fractional_pos = 0.0f;
v.start_sample_offset = start_offset_samples;
v.active_spectral_data =
g_synth_data.active_spectrogram_data[spectrogram_id];
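The Audio entry describes v2 `.spec` synthesis as OLA-IDCT with a Hann analysis window, rectangular synthesis, and 50% overlap, and the `Voice` struct above keeps the OLA tail in `overlap_buf`. The sketch below isolates the overlap-add step. It treats the IDCT as already applied (each frame is time-domain), emits one hop per frame, and keeps the frame's second half as the tail for the next call. It also shows why the scheme is click-free: a periodic Hann window at 50% overlap sums to exactly 1, so rectangular synthesis reconstructs the signal. Frame size and function names are hypothetical and do not reflect `ola_decode_frame`'s actual signature.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Periodic Hann window of length n; w[i] + w[i + n/2] == 1 at 50% overlap.
std::vector<float> hann_periodic(std::size_t n) {
  std::vector<float> w(n);
  for (std::size_t i = 0; i < n; ++i)
    w[i] = static_cast<float>(0.5 - 0.5 * std::cos(2.0 * kPi * i / n));
  return w;
}

// One overlap-add step with a rectangular synthesis window.
// frame: n time-domain samples (the IDCT output); tail: n/2 samples carried
// between calls. Writes n/2 finished samples to out, then stores the
// frame's second half as the new tail.
void ola_add_frame(const float* frame, std::size_t n, float* tail, float* out) {
  assert(n % 2 == 0);
  const std::size_t hop = n / 2;
  for (std::size_t i = 0; i < hop; ++i) {
    out[i] = frame[i] + tail[i];
    tail[i] = frame[hop + i];
  }
}
```

Feeding Hann-windowed frames of a constant 1.0 signal through `ola_add_frame` yields 1.0 for every output hop after the first. The first hop has no preceding tail, which is one reason the tail is zeroed when a voice is triggered.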