From 42390a8a28377cd25021b1647abf9dbd43d4e2c8 Mon Sep 17 00:00:00 2001
From: skal
Date: Fri, 6 Feb 2026 18:08:06 +0100
Subject: fix(audio): Fix spectrogram amplification issue and add diagnostic tool
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Root Cause

.spec files were NOT regenerated after orthonormal DCT changes (commit d9e0da9).
They contained spectrograms from old non-orthonormal DCT (16x larger values),
but were played back with new orthonormal IDCT.

Result: 16x amplification → Peaks of 12-17x → Severe clipping/distortion

## Diagnosis Tool

Created specplay tool to analyze and play .spec/.wav files:
- Reports PCM peak and RMS values
- Detects clipping during playback
- Usage: ./build/specplay

## Fixes

1. Revert accidental window.h include in synth.cc (keep no-window state)
2. Adjust gen.cc scaling from 16x to 6.4x (16/2.5) for procedural notes
3. Regenerated ALL .spec files with ./scripts/gen_spectrograms.sh

## Verified Results

Before: Peak=16.571 (KICK_3), 12.902 (SNARE_2), 14.383 (SNARE_3)
After:  Peak=0.787 (BASS_GUITAR_FEEL), 0.759 (SNARE_909), 0.403 (KICK_606)

All peaks now < 1.0 (safe range)
---
 src/audio/gen.cc | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/audio/gen.cc b/src/audio/gen.cc
index 5604457..74b468c 100644
--- a/src/audio/gen.cc
+++ b/src/audio/gen.cc
@@ -72,7 +72,14 @@ std::vector generate_note_spectrogram(const NoteParams& params,
   // Scale up to compensate for orthonormal normalization
   // Old non-orthonormal DCT had no sqrt scaling, so output was ~sqrt(N/2) larger
   // Scale factor: sqrt(DCT_SIZE / 2) = sqrt(256) = 16
-  const float scale_factor = sqrtf(DCT_SIZE / 2.0f);
+  //
+  // HOWEVER: After removing synthesis windowing (commit f998bfc), audio is louder.
+  // The old synthesis incorrectly applied Hamming window to spectrum (reducing energy by 0.63x).
+  // New synthesis is correct (no window), but procedural notes with 16x scaling are too loud.
+  //
+  // Analysis applies Hamming window (0.63x energy). With 16x scaling: 0.63 × 16 ≈ 10x.
+  // Divide by 2.5 to match the relative loudness increase: 16 / 2.5 = 6.4
+  const float scale_factor = sqrtf(DCT_SIZE / 2.0f) / 2.5f;
 
   // Copy to buffer with scaling
   for (int i = 0; i < DCT_SIZE; ++i) {
-- 
cgit v1.2.3
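The root cause in this commit comes down to DCT normalization. The sketch below illustrates the effect and is not code from this repository. It uses a naive O(N²) DCT-II/DCT-III pair with hypothetical helper names (`dct2_unnormalized`, `idct_orthonormal`, `mismatch_gain`). It assumes `DCT_SIZE = 512`, which is what the diff comment `sqrt(DCT_SIZE / 2) = sqrt(256) = 16` implies. A bin-aligned cosine is encoded with the unnormalized transform and decoded with the orthonormal inverse. For any non-DC bin the output is the input scaled by sqrt(N/2), i.e. 16, which is the mismatch that stale .spec files hit on playback.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed transform size: sqrt(kDctSize / 2) == 16, matching the diff comment.
constexpr std::size_t kDctSize = 512;
constexpr double kPi = 3.14159265358979323846;

// Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k).
// Hypothetical stand-in for the "old non-orthonormal DCT".
std::vector<double> dct2_unnormalized(const std::vector<double>& x) {
  const std::size_t n_size = x.size();
  std::vector<double> out(n_size, 0.0);
  for (std::size_t k = 0; k < n_size; ++k) {
    double sum = 0.0;
    for (std::size_t n = 0; n < n_size; ++n) {
      sum += x[n] * std::cos(kPi / n_size * (n + 0.5) * k);
    }
    out[k] = sum;
  }
  return out;
}

// Orthonormal inverse (DCT-III): weight sqrt(1/N) for k = 0, sqrt(2/N) otherwise.
std::vector<double> idct_orthonormal(const std::vector<double>& coeffs) {
  const std::size_t n_size = coeffs.size();
  assert(n_size > 0);
  std::vector<double> out(n_size, 0.0);
  const double c0 = std::sqrt(1.0 / n_size);
  const double ck = std::sqrt(2.0 / n_size);
  for (std::size_t n = 0; n < n_size; ++n) {
    double sum = c0 * coeffs[0];
    for (std::size_t k = 1; k < n_size; ++k) {
      sum += ck * coeffs[k] * std::cos(kPi / n_size * (n + 0.5) * k);
    }
    out[n] = sum;
  }
  return out;
}

// Largest absolute sample value.
double peak(const std::vector<double>& x) {
  double p = 0.0;
  for (double v : x) p = std::fmax(p, std::fabs(v));
  return p;
}

// Gain observed when "old" (unnormalized) coefficients are decoded
// with the "new" (orthonormal) inverse, for a unit cosine at `bin`.
double mismatch_gain(std::size_t bin) {
  std::vector<double> x(kDctSize);
  for (std::size_t n = 0; n < kDctSize; ++n) {
    x[n] = std::cos(kPi / kDctSize * (n + 0.5) * bin);
  }
  const std::vector<double> y = idct_orthonormal(dct2_unnormalized(x));
  return peak(y) / peak(x);
}

// Scale factor used by the patched gen.cc: sqrt(DCT_SIZE / 2) / 2.5 = 16 / 2.5.
double procedural_scale_factor() { return std::sqrt(kDctSize / 2.0) / 2.5; }
```

For any bin from 1 to 511, `mismatch_gain` returns 16 up to floating-point rounding. The DC bin (k = 0) is scaled by sqrt(N) instead, because the two transforms weight it differently. `procedural_scale_factor()` reproduces the 6.4 from the diff.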
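This patch does not include the source of the specplay diagnostic tool. The following is therefore only a hypothetical sketch of the measurements its description lists: PCM peak, RMS, and clipping detection. `PcmStats` and `analyze_pcm` are illustrative names, not specplay's API. Treating |sample| > 1.0 as clipping assumes normalized float PCM, which is consistent with the "All peaks now < 1.0 (safe range)" criterion in the verified results.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary statistics for a float PCM buffer (hypothetical; not specplay's real API).
struct PcmStats {
  float peak = 0.0f;        // max |sample|
  float rms = 0.0f;         // sqrt(mean(sample^2))
  std::size_t clipped = 0;  // samples with |sample| > 1.0 (outside normalized range)
};

PcmStats analyze_pcm(const std::vector<float>& samples) {
  PcmStats stats;
  if (samples.empty()) return stats;
  double sum_sq = 0.0;  // accumulate in double to limit rounding error
  for (float s : samples) {
    const float a = std::fabs(s);
    if (a > stats.peak) stats.peak = a;
    if (a > 1.0f) ++stats.clipped;
    sum_sq += static_cast<double>(s) * s;
  }
  stats.rms = static_cast<float>(std::sqrt(sum_sq / samples.size()));
  return stats;
}
```

A buffer that reports a peak of 16.571, like KICK_3 before the fix, would show a large `clipped` count. Buffers that report peaks such as 0.787 or 0.403, like the regenerated files, would report zero.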