| author | skal <pascal.massimino@gmail.com> | 2026-02-06 18:08:06 +0100 |
|---|---|---|
| committer | skal <pascal.massimino@gmail.com> | 2026-02-06 18:08:06 +0100 |
| commit | 42390a8a28377cd25021b1647abf9dbd43d4e2c8 (patch) | |
| tree | 174f10bc635754b20764e764f1b9786f50f01f63 /src/audio/gen.cc | |
| parent | 8aba6d94871315eac0153134a6c740344964d31f (diff) | |
fix(audio): Fix spectrogram amplification issue and add diagnostic tool
## Root Cause
.spec files were NOT regenerated after orthonormal DCT changes (commit d9e0da9).
They contained spectrograms from old non-orthonormal DCT (16x larger values),
but were played back with new orthonormal IDCT.
Result: 16x amplification → Peaks of 12-17x → Severe clipping/distortion
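To illustrate the mismatch, here is a minimal sketch that assumes a textbook DCT-II/DCT-III pair. The names `dct_basis`, `dct_unnormalized`, `idct_orthonormal`, and `mismatch_gain` are hypothetical and not taken from the project. It encodes a DC-free test tone with an unnormalized forward transform and decodes it with an orthonormal inverse, which should give a gain of sqrt(N/2).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Basis function shared by both transforms: cos(pi/N * (n + 0.5) * k).
double dct_basis(std::size_t size, std::size_t n, std::size_t k) {
  return std::cos(kPi / static_cast<double>(size) *
                  (static_cast<double>(n) + 0.5) * static_cast<double>(k));
}

// Unnormalized forward DCT-II (assumed stand-in for the "old" analysis).
std::vector<double> dct_unnormalized(const std::vector<double>& x) {
  const std::size_t size = x.size();
  std::vector<double> out(size, 0.0);
  for (std::size_t k = 0; k < size; ++k) {
    for (std::size_t n = 0; n < size; ++n) {
      out[k] += x[n] * dct_basis(size, n, k);
    }
  }
  return out;
}

// Orthonormal inverse (DCT-III) with s_0 = sqrt(1/N), s_k = sqrt(2/N).
std::vector<double> idct_orthonormal(const std::vector<double>& coeffs) {
  const std::size_t size = coeffs.size();
  std::vector<double> out(size, 0.0);
  for (std::size_t n = 0; n < size; ++n) {
    for (std::size_t k = 0; k < size; ++k) {
      const double s =
          std::sqrt((k == 0 ? 1.0 : 2.0) / static_cast<double>(size));
      out[n] += s * coeffs[k] * dct_basis(size, n, k);
    }
  }
  return out;
}

// Peak gain of the mismatched pair on a single-bin, DC-free test tone.
// The DC bin (bin == 0) would scale by sqrt(N) instead, so it is excluded.
double mismatch_gain(std::size_t size, std::size_t bin) {
  assert(bin > 0 && bin < size);
  std::vector<double> x(size);
  for (std::size_t n = 0; n < size; ++n) {
    x[n] = dct_basis(size, n, bin);
  }
  const std::vector<double> y = idct_orthonormal(dct_unnormalized(x));
  double peak_in = 0.0;
  double peak_out = 0.0;
  for (std::size_t n = 0; n < size; ++n) {
    peak_in = std::max(peak_in, std::fabs(x[n]));
    peak_out = std::max(peak_out, std::fabs(y[n]));
  }
  return peak_out / peak_in;
}
```

Under these assumptions, `mismatch_gain(512, 5)` evaluates to about 16, the same factor the stale .spec files picked up on playback.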
## Diagnosis Tool
Created specplay tool to analyze and play .spec/.wav files:
- Reports PCM peak and RMS values
- Detects clipping during playback
- Usage: ./build/specplay <file.spec|file.wav>
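The measurements listed above can be sketched as follows. This is not the specplay source, which is not part of this diff; `PcmStats` and `analyze_pcm` are hypothetical names, and the sketch assumes float PCM samples where anything above 1.0 in magnitude counts as clipped.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of one decoded buffer (hypothetical structure).
struct PcmStats {
  float peak;           // largest absolute sample value
  float rms;            // root mean square over all samples
  std::size_t clipped;  // number of samples with |v| > 1.0
};

PcmStats analyze_pcm(const std::vector<float>& samples) {
  PcmStats stats{0.0f, 0.0f, 0};
  if (samples.empty()) {
    return stats;
  }
  double sum_sq = 0.0;  // accumulate in double to limit rounding error
  for (const float v : samples) {
    const float a = std::fabs(v);
    if (a > stats.peak) {
      stats.peak = a;
    }
    if (a > 1.0f) {
      ++stats.clipped;
    }
    sum_sq += static_cast<double>(v) * static_cast<double>(v);
  }
  stats.rms = static_cast<float>(
      std::sqrt(sum_sq / static_cast<double>(samples.size())));
  return stats;
}
```

A buffer is in the safe range when `peak` stays below 1.0 and `clipped` is zero.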
## Fixes
1. Revert accidental window.h include in synth.cc (keep no-window state)
2. Adjust gen.cc scaling from 16x to 6.4x (16/2.5) for procedural notes
3. Regenerate ALL .spec files with ./scripts/gen_spectrograms.sh
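The arithmetic behind fix 2 can be checked with a short sketch. `DCT_SIZE = 512` is inferred from the code comment `sqrt(DCT_SIZE / 2) = sqrt(256) = 16`, and the two function names are hypothetical; the 2.5 divisor is the empirical value chosen in this commit.

```cpp
#include <cassert>
#include <cmath>

// Assumed value, inferred from "sqrt(DCT_SIZE / 2) = sqrt(256) = 16".
constexpr int DCT_SIZE = 512;

// Compensation for the orthonormal DCT: sqrt(N / 2), which is 16 for N = 512.
float orthonormal_compensation() {
  return std::sqrt(DCT_SIZE / 2.0f);
}

// Scale for procedural notes after this commit: 16 / 2.5 = 6.4.
float procedural_scale_factor() {
  return orthonormal_compensation() / 2.5f;
}
```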
## Verified Results
Before: Peak=16.571 (KICK_3), 12.902 (SNARE_2), 14.383 (SNARE_3)
After: Peak=0.787 (BASS_GUITAR_FEEL), 0.759 (SNARE_909), 0.403 (KICK_606)
All peaks now < 1.0 (safe range)
Diffstat (limited to 'src/audio/gen.cc')
| -rw-r--r-- | src/audio/gen.cc | 9 |
1 files changed, 8 insertions, 1 deletions
diff --git a/src/audio/gen.cc b/src/audio/gen.cc
index 5604457..74b468c 100644
--- a/src/audio/gen.cc
+++ b/src/audio/gen.cc
@@ -72,7 +72,14 @@ std::vector<float> generate_note_spectrogram(const NoteParams& params,
     // Scale up to compensate for orthonormal normalization
     // Old non-orthonormal DCT had no sqrt scaling, so output was ~sqrt(N/2) larger
     // Scale factor: sqrt(DCT_SIZE / 2) = sqrt(256) = 16
-    const float scale_factor = sqrtf(DCT_SIZE / 2.0f);
+    //
+    // HOWEVER: After removing synthesis windowing (commit f998bfc), audio is louder.
+    // The old synthesis incorrectly applied Hamming window to spectrum (reducing energy by 0.63x).
+    // New synthesis is correct (no window), but procedural notes with 16x scaling are too loud.
+    //
+    // Analysis applies Hamming window (0.63x energy). With 16x scaling: 0.63 × 16 ≈ 10x.
+    // Divide by 2.5 to match the relative loudness increase: 16 / 2.5 = 6.4
+    const float scale_factor = sqrtf(DCT_SIZE / 2.0f) / 2.5f;
 
     // Copy to buffer with scaling
     for (int i = 0; i < DCT_SIZE; ++i) {
