| author | skal <pascal.massimino@gmail.com> | 2026-03-05 10:03:32 +0100 |
|---|---|---|
| committer | skal <pascal.massimino@gmail.com> | 2026-03-05 10:03:32 +0100 |
| commit | e2c3c3e95b6a9e53b4631b271640bb9914f8c95e (patch) | |
| tree | a0e52468bdfe53bf896d8a86fc5b147ac8afe5f3 /tools | |
| parent | f48562060413634b13706c3ffd01180da98b6049 (diff) | |
fix(audio): OLA encoder never ran; version never propagated to decoder
Two bugs kept the v2 OLA path permanently disabled:
1. SpectrogramResourceManager::load_asset() never set spec.version from
SpecHeader::version — all .spec assets loaded with version=0, so
ola_mode was always false in the voice.
2. spectool analyze_audio() used non-overlapping chunks (stride=DCT_SIZE),
hamming_window_512, and hardcoded header.version=1 — OLA analysis was
never implemented in the encoder.
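To illustrate bug 1, here is a minimal sketch of propagating the header version into the loaded asset. The `SpecHeader` field names come from the diff, but the struct layout, the `Spectrogram` type, the `load_header` and `uses_ola` helpers, and the numeric value of `SPEC_VERSION_V2_OLA` are assumptions made for illustration, not the project's actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical on-disk header: field names follow the diff, layout is assumed.
struct SpecHeader {
  char magic[4];
  int32_t version;
  int32_t dct_size;
  int32_t num_frames;
};

// Assumed value; the real constant is defined in the project headers.
constexpr int32_t SPEC_VERSION_V2_OLA = 2;

// Hypothetical in-memory record for a loaded .spec asset.
struct Spectrogram {
  int32_t version = 0;  // stays 0 if the loader never copies it (the bug)
  int32_t num_frames = 0;
};

// Parses the header from a byte buffer and fills `spec`.
// Returns false if the buffer is too small or the magic does not match.
bool load_header(const unsigned char* bytes, size_t size, Spectrogram* spec) {
  assert(spec != nullptr);
  if (bytes == nullptr || size < sizeof(SpecHeader)) return false;
  SpecHeader header;
  std::memcpy(&header, bytes, sizeof(header));
  if (std::memcmp(header.magic, "SPEC", 4) != 0) return false;
  spec->num_frames = header.num_frames;
  spec->version = header.version;  // the assignment the loader was missing
  return true;
}

// Assumed rule for how the voice derives ola_mode from the version.
bool uses_ola(const Spectrogram& spec) {
  return spec.version >= SPEC_VERSION_V2_OLA;
}
```

With the version assignment removed, `uses_ola` returns false for every asset regardless of what the file header says, which matches the symptom described in item 1.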
Fixes: propagate header->version in load_asset(); switch spectool to
OLA_HOP_SIZE stride, hann_window_512, and SPEC_VERSION_V2_OLA.
Regenerated all .spec files.
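As background for the Hann window at 50% overlap: shifted copies of a periodic Hann window spaced half a window apart sum to one, so overlapping frames can be added back together without amplitude ripple. The sketch below checks that property numerically. It assumes a 512-sample window and a 256-sample hop (inferred from `hann_window_512` and "50% overlap"), and `hann_window` and `max_ola_error` are hypothetical stand-ins, not the project's functions. Whether the decoder applies a further synthesis window is not stated in this commit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr size_t kWindowSize = 512;  // assumed from the name hann_window_512
constexpr size_t kHop = 256;         // assumed OLA_HOP_SIZE (50% overlap)

// Fills w[0..n) with a periodic Hann window: 0.5 - 0.5 * cos(2*pi*i/n).
void hann_window(float* w, size_t n) {
  assert(w != nullptr && n > 0);
  const double two_pi = 6.283185307179586;
  for (size_t i = 0; i < n; ++i) {
    w[i] = static_cast<float>(0.5 - 0.5 * std::cos(two_pi * i / n));
  }
}

// Returns the largest deviation from 1 of w[i] + w[i + hop],
// i.e. the sum of two adjacent frames' windows over their overlap.
float max_ola_error(const float* w, size_t n, size_t hop) {
  assert(w != nullptr && hop > 0 && hop < n);
  float worst = 0.0f;
  for (size_t i = 0; i + hop < n; ++i) {
    const float err = std::fabs(w[i] + w[i + hop] - 1.0f);
    if (err > worst) worst = err;
  }
  return worst;
}
```

Calling `hann_window` on a 512-element buffer and then `max_ola_error(w, kWindowSize, kHop)` should give a value at the level of float rounding error.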
handoff(Gemini): OLA enc/dec chain now correct end-to-end. .spec files
are v2 (50% overlap, Hann). No API changes; 33/34 tests pass
(WavDumpBackendTest pre-existing failure unrelated).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Diffstat (limited to 'tools')
| -rw-r--r-- | tools/spectool.cc | 15 |
1 files changed, 9 insertions, 6 deletions
diff --git a/tools/spectool.cc b/tools/spectool.cc
index c3aebb2..70bcae2 100644
--- a/tools/spectool.cc
+++ b/tools/spectool.cc
@@ -110,15 +110,18 @@ int analyze_audio(const char* in_path, const char* out_path, bool normalize,
     }
   }
 
-  // Second pass: Windowing + DCT
+  // Second pass: Windowing + DCT (OLA v2: Hann window, 50% overlap)
   std::vector<float> spec_data;
   float window[WINDOW_SIZE];
-  hamming_window_512(window);
+  hann_window_512(window);
 
-  // Process PCM data in DCT_SIZE chunks
-  const size_t num_chunks = (pcm_data.size() + DCT_SIZE - 1) / DCT_SIZE;
+  // Process PCM data with OLA_HOP_SIZE stride (50% overlap)
+  const size_t hop = OLA_HOP_SIZE;
+  const size_t num_chunks = (pcm_data.size() > DCT_SIZE)
+                                ? (pcm_data.size() - DCT_SIZE) / hop + 1
+                                : 1;
   for (size_t chunk_idx = 0; chunk_idx < num_chunks; ++chunk_idx) {
-    const size_t chunk_start = chunk_idx * DCT_SIZE;
+    const size_t chunk_start = chunk_idx * hop;
     const size_t chunk_end = (chunk_start + DCT_SIZE < pcm_data.size())
                                  ? chunk_start + DCT_SIZE
                                  : pcm_data.size();
@@ -202,7 +205,7 @@ int analyze_audio(const char* in_path, const char* out_path, bool normalize,
 
   SpecHeader header;
   memcpy(header.magic, "SPEC", 4);
-  header.version = 1;
+  header.version = SPEC_VERSION_V2_OLA;
   header.dct_size = DCT_SIZE;
   header.num_frames = trimmed_data.size() / DCT_SIZE;
 
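The change to the frame count in the first hunk can be checked in isolation. The sketch below copies the two `num_chunks` expressions from the diff into standalone functions. The function names are hypothetical, and the values 512 and 256 used in the note after the code are assumed for `DCT_SIZE` and `OLA_HOP_SIZE`, since this diff does not show them.

```cpp
#include <cassert>
#include <cstddef>

// Old behaviour: non-overlapping chunks, rounding up so that a trailing
// partial chunk is still counted.
size_t num_chunks_v1(size_t pcm_size, size_t dct_size) {
  assert(dct_size > 0);
  return (pcm_size + dct_size - 1) / dct_size;
}

// New behaviour: frames of dct_size samples that start every `hop` samples.
// Only frames lying fully inside the input are counted, except that an input
// no longer than one frame still yields a single (partial) frame.
size_t num_chunks_ola(size_t pcm_size, size_t dct_size, size_t hop) {
  assert(hop > 0);
  return (pcm_size > dct_size) ? (pcm_size - dct_size) / hop + 1 : 1;
}
```

Under those assumed sizes, a 2048-sample input goes from 4 chunks to 7 frames. As the formula is written, samples after the last full frame are not covered by any frame: for a 1000-sample input the two frames span samples 0 to 767, whereas the old expression also counted the partial chunk at the end.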
