| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | PROJECT_CONTEXT.md | 4 |
| -rw-r--r-- | TODO.md | 7 |
| -rw-r--r-- | cmake/DemoSourceLists.cmake | 1 |
| -rw-r--r-- | doc/COMPLETED.md | 6 |
| -rw-r--r-- | src/audio/ola.cc | 34 |
| -rw-r--r-- | src/audio/ola.h | 21 |
| -rw-r--r-- | src/audio/synth.cc | 10 |
| -rw-r--r-- | src/tests/audio/test_wav_roundtrip.cc | 60 |
| -rw-r--r-- | tools/spectool.cc | 66 |
9 files changed, 88 insertions, 121 deletions
diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md
index d9e37e8..690b589 100644
--- a/PROJECT_CONTEXT.md
+++ b/PROJECT_CONTEXT.md
@@ -12,7 +12,7 @@
 ## Audio
 - 32 kHz, 16-bit stereo
 - Procedurally generated samples
-- Real-time additive synthesis from spectrograms (OLA-IDCT, Hann window, 50% overlap)
+- Real-time additive synthesis from spectrograms (OLA-IDCT, Hann analysis window, 50% overlap, rectangular synthesis)
 - Variable tempo system with music time abstraction
 - Event-based pattern triggering for dynamic tempo scaling
 - Modifiable Loops and Patterns, w/ script to generate them (like a Tracker)
@@ -33,7 +33,7 @@
 
 - **Timing System:** **Beat-based timelines** for musical synchronization. Sequences defined in beats, converted to seconds at runtime. Effects receive both physical time (constant) and beat time (musical). Variable tempo affects audio only. See `doc/BEAT_TIMING.md`.
 - **Workspace system:** Multi-workspace support. Easy switching with `-DDEMO_WORKSPACE=<name>`. Organized structure: `music/`, `weights/`, `obj/`, `shaders/`. Shared common shaders in `src/shaders/`. See `doc/WORKSPACE_SYSTEM.md`.
-- **Audio:** Sample-accurate sync. Zero heap allocations per frame. Variable tempo. OLA-IDCT synthesis (v2 .spec): Hann window, 50% overlap, click-free. V1 (raw DCT-512) preserved for generated notes. .spec files regenerated as v2.
+- **Audio:** Sample-accurate sync. Zero heap allocations per frame. Variable tempo. OLA-IDCT synthesis (v2 .spec): Hann analysis window, rectangular synthesis, 50% overlap, click-free. V1 (raw DCT-512) preserved for generated notes. .spec files regenerated as v2.
 - **Shaders:** Parameterized effects (UniformHelper, .seq syntax). Beat-synchronized animation support (`beat_time`, `beat_phase`). Modular WGSL composition with ShaderComposer. 24 shared common shaders (math, render, compute).
 - **3D:** Hybrid SDF/rasterization with BVH. Binary scene loader. Blender pipeline.
 - **Effects:** CNN post-processing: CNNEffect (v1) and CNNv2Effect operational. CNN v2: sigmoid activation, storage buffer weights (~3.2 KB), 7D static features, dynamic layers. Training stable, convergence validated.
diff --git a/TODO.md b/TODO.md
--- a/TODO.md
+++ b/TODO.md
@@ -36,7 +36,12 @@ Reduce weights from f16 (~3.2 KB) to i8 (~1.6 KB).
 
 ## Priority 4: Audio System Enhancements [LOW PRIORITY]
 
-1. **GPU-Accelerated PCM Synthesis:**
+1. **`synth.cc`: use `ola_decode()` from `src/audio/ola.h`** — the OLA decode logic in
+   `synth_render()` is currently inlined for frame-by-frame lazy decoding. Refactor to
+   call `ola_decode()` for consistency with `spectool` and the test (requires decoupling
+   the per-frame lazy path, e.g. decode a full block on demand then serve samples).
+
+2. **GPU-Accelerated PCM Synthesis:**
    - Compute shader for direct PCM generation (bypass spectrogram)
    - Write to compute buffer, readback to synth
 
diff --git a/cmake/DemoSourceLists.cmake b/cmake/DemoSourceLists.cmake
index b31c482..0c57ada 100644
--- a/cmake/DemoSourceLists.cmake
+++ b/cmake/DemoSourceLists.cmake
@@ -11,6 +11,7 @@ set(AUDIO_SOURCES
   src/audio/gen.cc
   src/audio/fdct.cc
   src/audio/idct.cc
+  src/audio/ola.cc
   src/audio/fft.cc
   src/audio/window.cc
   src/audio/synth.cc
diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md
index 9b9f549..d72591f 100644
--- a/doc/COMPLETED.md
+++ b/doc/COMPLETED.md
@@ -33,9 +33,11 @@ Use `read @doc/archive/FILENAME.md` to access archived documents.
 
 - [x] **OLA-IDCT Synthesis — click-free .spec decoding**
   - **Goal**: Eliminate frame-boundary clicks in spectrogram→PCM synthesis.
-  - **Implementation**: Added v2 spectrogram format (`SPEC_VERSION_V2_OLA`). Synthesis uses Hann-windowed IDCT with 50% overlap-add (hop=256, overlap=256). Per-voice `overlap_buf[256]` accumulates the tail from the previous IDCT frame. V1 path (raw DCT-512) preserved for generated notes. Hann window precomputed at `synth_init()`. `SpecHeader.version` field propagated through `Spectrogram.version` to the voice's `ola_mode` flag at trigger time.
+  - **Implementation**: Added v2 spectrogram format (`SPEC_VERSION_V2_OLA`). Analysis uses Hann window (FDCT), synthesis uses IDCT-OLA with rectangular synthesis window (no synthesis window — Hann at 50% overlap satisfies w[n]+w[n+H]=1, giving perfect reconstruction). Hop=256, overlap=256. Per-voice `overlap_buf[256]` accumulates the tail from the previous IDCT frame. V1 path (raw DCT-512) preserved for generated notes. `SpecHeader.version` field propagated through `Spectrogram.version` to the voice's `ola_mode` flag at trigger time.
   - **Bug fix (2026-03-05)**: Two bugs prevented OLA from ever activating: (1) `SpectrogramResourceManager::load_asset()` never set `resource->spec.version` from `header->version` — all loaded assets got `version=0`, OLA path silently skipped. (2) `spectool analyze_audio()` used non-overlapping frames (`chunk_idx * DCT_SIZE` stride), `hamming_window_512`, and hardcoded `header.version = 1` — the OLA encoder was never implemented. Fixed both; `.spec` files regenerated.
-  - **Files**: `src/audio/spectrogram_resource_manager.cc`, `tools/spectool.cc`, `workspaces/main/music/*.spec`
+  - **Bug fix (2026-03-05)**: `synth.cc` incorrectly applied a Hann synthesis window post-IDCT (`tmp[j] *= g_hann[j]`), effectively squaring the window and preventing perfect reconstruction. Removed synthesis window and dead `g_hann` array. `spectool` and the test were already correct.
+  - **Refactor (2026-03-05)**: Extracted `ola_encode()` / `ola_decode()` / `ola_num_frames()` into `src/audio/ola.h` + `ola.cc`. `spectool` and `test_wav_roundtrip` now use the shared functions; `synth.cc` inline path unchanged (lazy frame decode; see TODO).
+  - **Files**: `src/audio/spectrogram_resource_manager.cc`, `tools/spectool.cc`, `src/audio/synth.cc`, `src/audio/ola.h`, `src/audio/ola.cc`, `workspaces/main/music/*.spec`
   - **Tests**: 34/34 passing
 
 ## Recently Completed (February 21, 2026)
diff --git a/src/audio/ola.cc b/src/audio/ola.cc
new file mode 100644
index 0000000..738df85
--- /dev/null
+++ b/src/audio/ola.cc
@@ -0,0 +1,34 @@
+// This file is part of the 64k demo project.
+// Implements batch OLA encode/decode shared by spectool and tests.
+// See ola.h for API documentation.
+
+#include "audio/ola.h"
+#include "audio/window.h"
+#include <string.h>
+
+void ola_encode(const float* pcm, int n_samples, float* spec, int num_frames) {
+  float win[DCT_SIZE];
+  hann_window_512(win);
+  float chunk[DCT_SIZE];
+  for (int f = 0; f < num_frames; ++f) {
+    const int start = f * OLA_HOP_SIZE;
+    const int avail =
+        (start + DCT_SIZE <= n_samples) ? DCT_SIZE : n_samples - start;
+    for (int i = 0; i < avail; ++i)
+      chunk[i] = pcm[start + i] * win[i];
+    memset(chunk + avail, 0, (DCT_SIZE - avail) * sizeof(float));
+    fdct_512(chunk, spec + f * DCT_SIZE);
+  }
+}
+
+void ola_decode(const float* spec, int num_frames, float* pcm) {
+  float overlap[OLA_OVERLAP] = {};
+  float tmp[DCT_SIZE];
+  for (int f = 0; f < num_frames; ++f) {
+    idct_512(spec + f * DCT_SIZE, tmp);
+    for (int j = 0; j < OLA_HOP_SIZE; ++j)
+      pcm[f * OLA_HOP_SIZE + j] = tmp[j] + overlap[j];
+    for (int j = 0; j < OLA_OVERLAP; ++j)
+      overlap[j] = tmp[OLA_HOP_SIZE + j];
+  }
+}
diff --git a/src/audio/ola.h b/src/audio/ola.h
new file mode 100644
index 0000000..3dbc368
--- /dev/null
+++ b/src/audio/ola.h
@@ -0,0 +1,21 @@
+// This file is part of the 64k demo project.
+// Shared OLA encode/decode helpers (Hann analysis, rectangular synthesis).
+// Used by spectool, tests, and any batch PCM<->spec conversion.
+
+#pragma once
+#include "audio/dct.h"
+
+// Returns number of OLA frames for n_samples PCM input.
+static inline int ola_num_frames(int n_samples) {
+  return (n_samples > DCT_SIZE) ? (n_samples - DCT_SIZE) / OLA_HOP_SIZE + 1
+                                : 1;
+}
+
+// Hann-windowed FDCT with 50% overlap (analysis).
+// spec must hold ola_num_frames(n_samples) * DCT_SIZE floats.
+void ola_encode(const float* pcm, int n_samples, float* spec, int num_frames);
+
+// IDCT-OLA with rectangular synthesis window (no synthesis window).
+// Hann at 50% overlap satisfies w[n]+w[n+H]=1 → perfect reconstruction.
+// pcm must hold num_frames * OLA_HOP_SIZE floats.
+void ola_decode(const float* spec, int num_frames, float* pcm);
diff --git a/src/audio/synth.cc b/src/audio/synth.cc
index a723404..3212e0b 100644
--- a/src/audio/synth.cc
+++ b/src/audio/synth.cc
@@ -4,9 +4,7 @@
 
 #include "synth.h"
 #include "audio/dct.h"
-#include "audio/window.h"
 #include "util/debug.h"
-#include <atomic>
 #include <math.h>
 #include <stdio.h> // For printf
 #include <string.h> // For memset
@@ -47,7 +45,6 @@ static Voice g_voices[MAX_VOICES];
 static volatile float g_current_output_peak =
     0.0f; // Global peak for visualization
 static float g_tempo_scale = 1.0f; // Playback speed multiplier
-static float g_hann[DCT_SIZE]; // Hann window for OLA synthesis (v2)
 
 #if !defined(STRIP_ALL)
 static float g_elapsed_time_sec = 0.0f; // Tracks elapsed time for event hooks
@@ -57,7 +54,6 @@ void synth_init() {
   memset(&g_synth_data, 0, sizeof(g_synth_data));
   memset(g_voices, 0, sizeof(g_voices));
   g_current_output_peak = 0.0f;
-  hann_window_512(g_hann);
 #if !defined(STRIP_ALL)
   g_elapsed_time_sec = 0.0f;
 #endif /* !defined(STRIP_ALL) */
@@ -266,11 +262,11 @@ void synth_render(float* output_buffer, int num_frames) {
               (v.current_spectral_frame * DCT_SIZE);
 
         if (v.ola_mode) {
-          // OLA-IDCT synthesis (v2): Hann window + overlap-add
+          // OLA-IDCT synthesis (v2): no synthesis window.
+          // Analysis used Hann; at 50% overlap w[n]+w[n+H]=1 so
+          // rectangular synthesis gives perfect reconstruction.
           float tmp[DCT_SIZE];
           idct_512(spectral_frame, tmp);
-          for (int j = 0; j < DCT_SIZE; ++j)
-            tmp[j] *= g_hann[j];
           // Add saved overlap from previous frame
           for (int j = 0; j < OLA_OVERLAP; ++j)
             tmp[j] += v.overlap_buf[j];
diff --git a/src/tests/audio/test_wav_roundtrip.cc b/src/tests/audio/test_wav_roundtrip.cc
index 6294d6d..79de6ad 100644
--- a/src/tests/audio/test_wav_roundtrip.cc
+++ b/src/tests/audio/test_wav_roundtrip.cc
@@ -1,9 +1,8 @@
 // Tests the wav->spec->wav roundtrip SNR.
-// Generates a sine wave, runs OLA-DCT analysis then IMDCT-OLA synthesis,
+// Generates a sine wave, runs OLA encode then OLA decode,
 // and asserts the reconstruction SNR exceeds the threshold.
 
-#include "audio/dct.h"
-#include "audio/window.h"
+#include "audio/ola.h"
 #include <assert.h>
 #include <cmath>
 #include <cstdio>
@@ -12,49 +11,6 @@
 static const int SAMPLE_RATE = 32000;
 static const float PI = 3.14159265358979323846f;
 
-// Replicate analyze_audio OLA pass (Hann + FDCT, hop = OLA_HOP_SIZE)
-static std::vector<float> ola_analyze(const std::vector<float>& pcm) {
-  float win[DCT_SIZE];
-  hann_window_512(win);
-
-  const int hop = OLA_HOP_SIZE;
-  const int n_pcm = (int)pcm.size();
-  const int num_frames = (n_pcm > DCT_SIZE) ? (n_pcm - DCT_SIZE) / hop + 1 : 1;
-
-  std::vector<float> spec(num_frames * DCT_SIZE);
-  float chunk[DCT_SIZE];
-
-  for (int f = 0; f < num_frames; ++f) {
-    const int start = f * hop;
-    const int avail = (start + DCT_SIZE <= n_pcm) ? DCT_SIZE : n_pcm - start;
-    for (int i = 0; i < avail; ++i) chunk[i] = pcm[start + i] * win[i];
-    for (int i = avail; i < DCT_SIZE; ++i) chunk[i] = 0.0f;
-
-    fdct_512(chunk, spec.data() + f * DCT_SIZE);
-  }
-  return spec;
-}
-
-// IDCT + OLA synthesis (no synthesis window) matching decode_to_wav.
-// Analysis used Hann; since Hann satisfies w[n]+w[n+H]=1 at 50% overlap,
-// skipping the synthesis window gives perfect reconstruction.
-static std::vector<float> ola_decode(const std::vector<float>& spec,
-                                     int num_frames) {
-  std::vector<float> pcm(num_frames * OLA_HOP_SIZE + OLA_OVERLAP, 0.0f);
-  float overlap[OLA_OVERLAP] = {};
-  float tmp[DCT_SIZE];
-
-  for (int f = 0; f < num_frames; ++f) {
-    idct_512(spec.data() + f * DCT_SIZE, tmp);
-    for (int j = 0; j < OLA_HOP_SIZE; ++j)
-      pcm[f * OLA_HOP_SIZE + j] = tmp[j] + overlap[j];
-    for (int j = 0; j < OLA_OVERLAP; ++j)
-      overlap[j] = tmp[OLA_HOP_SIZE + j];
-  }
-  pcm.resize(num_frames * OLA_HOP_SIZE);
-  return pcm;
-}
-
 static float compute_snr_db(const std::vector<float>& ref,
                             const std::vector<float>& out,
                             int skip_samples) {
@@ -78,12 +34,14 @@ int main() {
   for (int i = 0; i < n_samples; ++i)
     input[i] = 0.5f * sinf(2.0f * PI * 440.0f * i / SAMPLE_RATE);
 
-  // Analyze
-  std::vector<float> spec = ola_analyze(input);
-  const int num_frames = (int)(spec.size() / DCT_SIZE);
+  // Encode
+  const int num_frames = ola_num_frames(n_samples);
+  std::vector<float> spec(num_frames * DCT_SIZE);
+  ola_encode(input.data(), n_samples, spec.data(), num_frames);
 
-  // Decode with IDCT-OLA (no synthesis window)
-  std::vector<float> output = ola_decode(spec, num_frames);
+  // Decode
+  std::vector<float> output(num_frames * OLA_HOP_SIZE);
+  ola_decode(spec.data(), num_frames, output.data());
 
   // SNR — skip first DCT_SIZE samples (ramp-up transient)
   const float snr = compute_snr_db(input, output, DCT_SIZE);
diff --git a/tools/spectool.cc b/tools/spectool.cc
index a9d2bd1..93f8f9a 100644
--- a/tools/spectool.cc
+++ b/tools/spectool.cc
@@ -3,10 +3,9 @@
 // Provides both 'analyze' and 'play' modes for spectral data.
 
 #include "audio/audio.h"
-#include "audio/dct.h"
+#include "audio/ola.h"
 #include "audio/gen.h"
 #include "audio/synth.h"
-#include "audio/window.h"
 #include "platform/platform.h"
 #include <stdio.h>
 #include <string.h>
@@ -110,47 +109,15 @@ int analyze_audio(const char* in_path, const char* out_path, bool normalize,
     }
   }
 
-  // Second pass: Windowing + DCT (OLA v2: Hann window, 50% overlap)
-  std::vector<float> spec_data;
-  float window[WINDOW_SIZE];
-  hann_window_512(window);
-
-  // Process PCM data with OLA_HOP_SIZE stride (50% overlap)
-  const size_t hop = OLA_HOP_SIZE;
-  const size_t num_chunks = (pcm_data.size() > DCT_SIZE)
-                                ? (pcm_data.size() - DCT_SIZE) / hop + 1
-                                : 1;
-  for (size_t chunk_idx = 0; chunk_idx < num_chunks; ++chunk_idx) {
-    const size_t chunk_start = chunk_idx * hop;
-    const size_t chunk_end = (chunk_start + DCT_SIZE < pcm_data.size())
-                                 ? chunk_start + DCT_SIZE
-                                 : pcm_data.size();
-    const size_t chunk_size = chunk_end - chunk_start;
-
-    // Copy chunk (with zero-padding if needed)
-    memcpy(pcm_chunk, pcm_data.data() + chunk_start,
-           chunk_size * sizeof(float));
-    if (chunk_size < DCT_SIZE) {
-      memset(pcm_chunk + chunk_size, 0,
-             (DCT_SIZE - chunk_size) * sizeof(float));
-    }
-
-    // Apply window
-    for (int i = 0; i < DCT_SIZE; ++i) {
-      pcm_chunk[i] *= window[i];
-    }
-
-    // Apply FDCT
-    float dct_chunk[DCT_SIZE];
-    fdct_512(pcm_chunk, dct_chunk);
-
-    // Add to spectrogram data
-    spec_data.insert(spec_data.end(), dct_chunk, dct_chunk + DCT_SIZE);
-  }
+  // Second pass: OLA encode (Hann window, 50% overlap)
+  const int n_pcm = (int)pcm_data.size();
+  const int num_frames_enc = ola_num_frames(n_pcm);
+  std::vector<float> spec_data(num_frames_enc * DCT_SIZE);
+  ola_encode(pcm_data.data(), n_pcm, spec_data.data(), num_frames_enc);
 
   // --- Trim Silent Frames ---
   const float epsilon = 1e-6f;
-  int num_frames = spec_data.size() / DCT_SIZE;
+  int num_frames = num_frames_enc;
   int first_frame = 0;
   int last_frame = num_frames;
 
@@ -269,26 +236,9 @@ int decode_to_wav(const char* in_path, const char* out_path) {
 
   std::vector<float> pcm;
   if (ola_mode) {
-    // IDCT + OLA (no synthesis window).
-    // Analysis: Hann * FDCT. Since Hann at 50% overlap satisfies
-    // w[n] + w[n+HOP] = 1, a rectangular synthesis window gives
-    // perfect reconstruction: output[n] = IDCT(X_k)[j] + IDCT(X_{k-1})[j+HOP]
-    // = x[n]*w[j] + x[n]*w[j+HOP] = x[n].
     const uint32_t total_samples = (uint32_t)header.num_frames * OLA_HOP_SIZE;
-    pcm.assign(total_samples + OLA_OVERLAP, 0.0f);
-
-    float overlap[OLA_OVERLAP] = {};
-    for (int f = 0; f < header.num_frames; ++f) {
-      float tmp[DCT_SIZE];
-      idct_512(spec_data.data() + f * DCT_SIZE, tmp);
-      // First half: output samples for this frame
-      for (int j = 0; j < OLA_HOP_SIZE; ++j)
-        pcm[f * OLA_HOP_SIZE + j] = tmp[j] + overlap[j];
-      // Second half: save as overlap for next frame
-      for (int j = 0; j < OLA_OVERLAP; ++j)
-        overlap[j] = tmp[OLA_HOP_SIZE + j];
-    }
     pcm.resize(total_samples);
+    ola_decode(spec_data.data(), header.num_frames, pcm.data());
   } else {
     const uint32_t total_samples = (uint32_t)header.num_frames * DCT_SIZE;
     pcm.resize(total_samples);
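The notes in `doc/COMPLETED.md` and `ola.h` rest on the identity w[n]+w[n+H]=1 for a Hann window at 50% overlap, and describe the removed `tmp[j] *= g_hann[j]` as squaring the window. A minimal sketch of how both claims could be checked numerically follows. `periodic_hann` and `max_overlap_error` are hypothetical names, not project code, and the sketch assumes the periodic Hann form; the project's `hann_window_512()` may be defined differently (a symmetric `N-1` denominator would make the sum only approximately 1).

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the project's hann_window_512(). Assumes the
// periodic form w[n] = 0.5 * (1 - cos(2*pi*n / N)); the real helper may differ.
static std::vector<double> periodic_hann(int size) {
  const double pi = std::acos(-1.0);
  std::vector<double> w(size);
  for (int n = 0; n < size; ++n)
    w[n] = 0.5 * (1.0 - std::cos(2.0 * pi * n / size));
  return w;
}

// Largest deviation of w[n]^power + w[n+hop]^power from 1 over one hop.
// power = 1 models Hann analysis with rectangular synthesis;
// power = 2 models Hann applied at both analysis and synthesis.
static double max_overlap_error(const std::vector<double>& w, int hop,
                                int power) {
  double worst = 0.0;
  for (int n = 0; n < hop; ++n) {
    const double sum = std::pow(w[n], power) + std::pow(w[n + hop], power);
    worst = std::max(worst, std::fabs(sum - 1.0));
  }
  return worst;
}
```

With a 512-sample window and hop 256, `max_overlap_error(w, 256, 1)` stays at rounding-error level, while `power = 2` deviates by up to 0.5 (at n = 128, where both overlapping window values are 0.5). That gain dip in the middle of each hop is consistent with the reconstruction failure the bug-fix note describes.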

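The new `TODO.md` item proposes that `synth.cc` decode a block on demand and then serve samples. `ola_decode()` as committed zeroes its overlap buffer on every call, so one possible shape for that refactor is a decoder whose overlap tail is owned by the caller. The sketch below is illustrative only: `OlaState`, `ola_decode_block`, `toy_encode`, `toy_fdct`, `toy_idct` and the 8-sample frame size are all assumptions made so the example is self-contained, with a naive orthonormal DCT pair standing in for `fdct_512`/`idct_512`.

```cpp
#include <cmath>
#include <vector>

// Toy sizes for readability (the commit uses 512-sample frames, hop 256).
constexpr int kFrame = 8;
constexpr int kHop = kFrame / 2;
constexpr double kPi = 3.14159265358979323846;

static double dct_scale(int k) {
  return std::sqrt((k == 0 ? 1.0 : 2.0) / kFrame);
}

// Naive orthonormal DCT-II / DCT-III pair, standing in for fdct_512/idct_512.
static void toy_fdct(const double* x, double* X) {
  for (int k = 0; k < kFrame; ++k) {
    double acc = 0.0;
    for (int n = 0; n < kFrame; ++n)
      acc += x[n] * std::cos(kPi * (n + 0.5) * k / kFrame);
    X[k] = dct_scale(k) * acc;
  }
}

static void toy_idct(const double* X, double* x) {
  for (int n = 0; n < kFrame; ++n) {
    double acc = 0.0;
    for (int k = 0; k < kFrame; ++k)
      acc += dct_scale(k) * X[k] * std::cos(kPi * (n + 0.5) * k / kFrame);
    x[n] = acc;
  }
}

// Periodic-Hann analysis with 50% overlap. pcm must hold at least
// (num_frames - 1) * kHop + kFrame samples.
static std::vector<double> toy_encode(const std::vector<double>& pcm,
                                      int num_frames) {
  std::vector<double> spec(num_frames * kFrame);
  double chunk[kFrame];
  for (int f = 0; f < num_frames; ++f) {
    for (int i = 0; i < kFrame; ++i) {
      const double w = 0.5 * (1.0 - std::cos(2.0 * kPi * i / kFrame));
      chunk[i] = pcm[f * kHop + i] * w;
    }
    toy_fdct(chunk, spec.data() + f * kFrame);
  }
  return spec;
}

// Hypothetical stateful decoder: the overlap tail is kept by the caller, so a
// voice could decode a few frames at a time and resume later.
struct OlaState {
  double overlap[kHop] = {};
};

static void ola_decode_block(OlaState& st, const double* spec, int num_frames,
                             double* pcm) {
  double tmp[kFrame];
  for (int f = 0; f < num_frames; ++f) {
    toy_idct(spec + f * kFrame, tmp);
    for (int j = 0; j < kHop; ++j) {
      pcm[f * kHop + j] = tmp[j] + st.overlap[j];  // add previous frame's tail
      st.overlap[j] = tmp[kHop + j];               // save tail for next frame
    }
  }
}

// Decodes `spec` in two blocks split at frame `split`, sharing one OlaState.
static std::vector<double> decode_in_two_blocks(
    const std::vector<double>& spec, int num_frames, int split) {
  std::vector<double> pcm(num_frames * kHop);
  OlaState st;
  ola_decode_block(st, spec.data(), split, pcm.data());
  ola_decode_block(st, spec.data() + split * kFrame, num_frames - split,
                   pcm.data() + split * kHop);
  return pcm;
}
```

Because the tail survives between calls, splitting the decode at any frame boundary yields the same samples as a single pass. As in the committed test, the first hop of output is a ramp-up transient (only one windowed frame contributes to it), so a round-trip comparison against the input should start after it.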