| Field | Value | Date |
|---|---|---|
| author | skal <pascal.massimino@gmail.com> | 2026-05-14 19:09:39 +0200 |
| committer | skal <pascal.massimino@gmail.com> | 2026-05-14 19:11:28 +0200 |
| commit | 6ef8f578817ee0134fd5867ca3b80590e3eb2368 | |
| tree | 5550607e5c4a16ca237bfa4430ac1ef1f5d80c5d /doc | |
| parent | 4bcbe13dab5ffb64d93cc61956f07ee5168a84c9 | |
ans: order-0 rANS coder + WGSL asset compression

Adds src/util/ans.{h,cc}, a per-chunk-adaptive order-0 rANS entropy
coder. Decoder is always built; encoder is gated on ANS_ENABLE_ENCODER
(tools only). Both sides take an optional 256-entry initial_counts
table to seed the adaptive model.

The per-chunk initial state is (1 << kBits). Higher initial states
(e.g. with a signature packed into the upper bits) force a renorm-emit
at iter 0 that the decoder never consumes, corrupting multi-chunk
streams once stats become skewed.
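
The state invariant described above can be illustrated with a minimal sketch. It is not the code in src/util/ans.cc: it assumes a fixed (non-adaptive) model, a single chunk, and hypothetical names (`rans_sketch`, `Model`, `Chunk`, `EncodeChunk`, `DecodeChunk`). It only shows the encoder starting from `1 << kBits`, walking the input in reverse, and the decoder accepting the chunk only if its state returns to that same value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace rans_sketch {  // hypothetical namespace, not part of the commit

constexpr uint32_t kBits = 16;
constexpr uint32_t kInitState = 1u << kBits;  // per-chunk initial state
constexpr uint32_t kMask = kInitState - 1;

// Static model (assumption: the real coder adapts its counts instead).
// freq[] must sum to 1 << kBits; start[] holds the cumulative offsets.
struct Model {
  uint32_t freq[256];
  uint32_t start[256];
};

// Uniform model: 256 symbols x 256 slots each = 1 << 16.
inline Model UniformModel() {
  Model m{};
  for (uint32_t s = 0; s < 256; ++s) {
    m.freq[s] = 256;
    m.start[s] = s * 256;
  }
  return m;
}

struct Chunk {
  uint32_t final_state = kInitState;
  std::vector<uint16_t> words;  // stored in the order the decoder reads them
};

// Encode back to front, starting from kInitState.
inline Chunk EncodeChunk(const std::vector<uint8_t>& src, const Model& m) {
  uint32_t x = kInitState;
  std::vector<uint16_t> emitted;
  for (size_t i = src.size(); i-- > 0;) {
    const uint32_t f = m.freq[src[i]];
    assert(f > 0 && "symbol must have a nonzero frequency");
    const uint64_t x_max = static_cast<uint64_t>(f) << 16;
    while (x >= x_max) {  // renorm-emit: push the low 16 bits out
      emitted.push_back(static_cast<uint16_t>(x & 0xffffu));
      x >>= 16;
    }
    x = ((x / f) << kBits) + (x % f) + m.start[src[i]];
  }
  Chunk c;
  c.final_state = x;
  c.words.assign(emitted.rbegin(), emitted.rend());
  return c;
}

// Decode front to back; succeeds only if the state returns to kInitState.
inline bool DecodeChunk(const Chunk& c, const Model& m, size_t n,
                        std::vector<uint8_t>* out) {
  uint32_t x = c.final_state;
  size_t pos = 0;
  out->clear();
  for (size_t i = 0; i < n; ++i) {
    const uint32_t slot = x & kMask;
    uint32_t s = 0;  // linear search for the symbol owning this slot
    while (s < 255 && m.start[s] + m.freq[s] <= slot) ++s;
    x = m.freq[s] * (x >> kBits) + slot - m.start[s];
    if (x <= kMask) {  // state dropped below 1 << kBits: pull one word
      if (pos == c.words.size()) return false;  // truncated stream
      x = (x << 16) | c.words[pos++];
    }
    out->push_back(static_cast<uint8_t>(s));
  }
  return x == kInitState && pos == c.words.size();
}

}  // namespace rans_sketch
```

In this sketch every renorm-emit on the encoder side is mirrored by exactly one word pull on the decoder side, which is why the final check can require both the initial state and full consumption of the emitted words.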

Asset pipeline:
- AssetRecord gains 'compression' and 'uncompressed_size' fields.
- asset_packer scans every WGSL file to build a corpus-wide byte
  histogram, then ANS-encodes each shader using that histogram as the
  seed. Histogram and accessor are emitted alongside the asset table.
  Round-trip verification runs at pack time for every compressed
  asset; failures fall back to uncompressed storage.
- asset_manager decompresses on first GetAsset(), caches the
  heap-allocated buffer, and DropAsset / ReloadAssetsFromFile free it
  along with the procedural cache.
- Disk-load (dev) builds are unchanged: WGSL paths stay as filenames.
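
The pack-time decision in the asset_packer bullet can be sketched roughly as follows. This is an illustration under assumptions, not the packer's code: `pack_sketch`, `Packed`, `PackOne`, and the `EncodeFn` / `DecodeFn` callbacks are hypothetical stand-ins (the callbacks take the place of the seeded encoder and decoder), and the fallback to uncompressed storage follows the wording of this commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

namespace pack_sketch {  // hypothetical namespace, not part of the commit

using Bytes = std::vector<uint8_t>;
// Stand-ins for the seeded encoder/decoder; both return false on failure.
using EncodeFn = std::function<bool(const Bytes& in, Bytes* out)>;
using DecodeFn = std::function<bool(const Bytes& in, Bytes* out)>;

enum class Compression : uint8_t { NONE = 0, ANS_ASCII = 1 };

struct Packed {
  Compression compression;
  Bytes payload;             // bytes stored in the asset table
  size_t uncompressed_size;  // equals payload.size() when NONE
};

// Encode, keep the result only if it is smaller, then round-trip verify.
// Any failure falls back to storing the raw bytes.
inline Packed PackOne(const Bytes& raw, const EncodeFn& encode,
                      const DecodeFn& decode) {
  const Packed fallback{Compression::NONE, raw, raw.size()};

  Bytes encoded;
  if (!encode(raw, &encoded)) return fallback;       // encoder gave up
  if (encoded.size() >= raw.size()) return fallback; // no size win

  Bytes decoded;
  if (!decode(encoded, &decoded) || decoded != raw) return fallback;

  return Packed{Compression::ANS_ASCII, std::move(encoded), raw.size()};
}

}  // namespace pack_sketch
```

Keeping the verification inside the same function that marks the asset as compressed means a record can only carry the compressed flag if the exact bytes being stored have already decoded back to the input.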

Tests:
- src/tests/util/test_ans.cc: roundtrip variants (empty, single byte,
  single-symbol run, all-zeros, random uniform/skewed, repeated ASCII),
  seeded-vs-uniform compression, rejection of mismatched counts /
  corruption / truncation, PeekUncompressedSize.
- 37/37 dev, 36/36 STRIP_ALL.

Compression observed: WGSL shaders shrink to ~0.62-0.71x in the main
workspace (81 of 105 assets qualify).

Docs:
- doc/ANS.md (new): algorithm, bitstream, API, asset pipeline
  integration, compression numbers, limitations, tests.
- doc/ASSET_SYSTEM.md: new Compression section + updated technical
  guarantees for compressed assets.
- doc/COMPLETED.md: May 2026 entry.
- PROJECT_CONTEXT.md: Build status line mentions WGSL ANS compression.
- CLAUDE.md, GEMINI.md: tier-3 build doc list includes ANS.md.

Diffstat (limited to 'doc')
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | doc/ANS.md | 166 |
| -rw-r--r-- | doc/ASSET_SYSTEM.md | 14 |
| -rw-r--r-- | doc/COMPLETED.md | 4 |

3 files changed, 182 insertions, 2 deletions

diff --git a/doc/ANS.md b/doc/ANS.md
new file mode 100644
index 0000000..c93bf82
--- /dev/null
+++ b/doc/ANS.md
@@ -0,0 +1,166 @@
+# ANS Compression
+
+Order-0 rANS entropy coder used to compress shader assets at build time and
+decompress them on first access at runtime.
+
+**Source:** `src/util/ans.{h,cc}`.
+
+---
+
+## Algorithm
+
+Per-chunk adaptive order-0 byte coder.
+
+| Parameter | Value |
+|------------------|----------------------------------------|
+| Precision | 16 bits (`kBits = 16`) |
+| State range | `[1 << 16, 1 << 32)` (`uint32_t`) |
+| Renorm I/O width | 16 bits (big-endian) |
+| Chunk size | 1024 bytes |
+| Symbols | 256 (bytes) |
+| Initial state | `1 << 16` (`kInitState`) |
+
+The encoder iterates each chunk in reverse, the decoder forward. Symbol
+counts are mutated on the fly during encode/decode and re-normalized at
+each chunk boundary so the cumulative table sums to `1 << 16`.
+
+The chunk-end state always equals `kInitState`; the decoder rejects the
+stream if it doesn't. That single check catches both bit-level corruption
+and decoder/encoder model divergence (e.g. wrong initial histogram).
+
+The per-chunk initial state must be exactly `1 << kBits`. A higher value
+(e.g. with a "signature" packed into the upper bits) forces a renorm-emit
+at iter 0 that the decoder never consumes — harmless on a single chunk,
+but it corrupts any stream with two or more chunks once the per-chunk
+stats become skewed.
+
+---
+
+## Bitstream Format
+
+Big-endian throughout.
+
+```
+[u32 uncompressed_size] // 4 bytes, header
+per chunk (uncompressed_size > 0):
+  [u32 final_state] // 4 bytes
+  [u16 emitted_words]* // variable, in stream order
+```
+
+Number of emitted words per chunk is implicit — the decoder pulls a word
+whenever its state drops at or below `kMask = (1 << kBits) - 1`.
+
+---
+
+## API
+
+```cpp
+#include "util/ans.h"
+
+// Always built.
+bool ans::Decode(const uint8_t* src, size_t src_size,
+                 uint8_t* dst, size_t dst_capacity,
+                 size_t* out_size,
+                 const uint32_t* initial_counts = nullptr);
+
+uint32_t ans::PeekUncompressedSize(const uint8_t* src, size_t src_size);
+
+// Gated on ANS_ENABLE_ENCODER (tools only).
+bool ans::Encode(const uint8_t* src, size_t size,
+                 std::vector<uint8_t>* dst,
+                 const uint32_t* initial_counts = nullptr);
+
+void ans::Histogram(const uint8_t* src, size_t size, uint32_t* out_counts);
+```
+
+`initial_counts` is a 256-entry table that seeds the adaptive model. Both
+encoder and decoder must use the same seed — a mismatch trips the chunk-end
+state check immediately. Pass `nullptr` for a uniform default (all-ones).
+
+---
+
+## Asset Pipeline Integration
+
+`AssetRecord` carries two extra fields:
+
+```cpp
+enum class AssetCompression : uint8_t {
+  NONE = 0,
+  ANS_ASCII = 1, // seeded from GetAnsAsciiHistogram()
+};
+
+struct AssetRecord {
+  ...
+  AssetCompression compression;
+  size_t uncompressed_size; // == size if compression == NONE
+};
+```
+
+### Build time (`tools/asset_packer.cc`)
+
+Embedded (non-disk-load) builds only:
+
+1. Scan every `WGSL` asset to build a corpus-wide 256-entry byte histogram.
+2. Emit it as `static const uint32_t kAnsAsciiHistogram[256]` plus a
+   `GetAnsAsciiHistogram()` accessor in `assets_data.cc`.
+3. For each `WGSL` asset, call `TryAnsCompress()`:
+   `ans::Encode(...)` → reject if it's not smaller than the raw input →
+   round-trip verify with `ans::Decode(...)` → only then mark the asset
+   `ANS_ASCII`.
+4. Other asset types (SPEC, TEXTURE, MESH, BINARY, MP3, PROC*) pass
+   through uncompressed.
+
+Disk-load (dev) builds skip the encoder entirely: WGSL data is the file
+path, never the file contents.
+
+### Runtime (`src/util/asset_manager.cc`)
+
+`GetAsset()` checks `compression` on a cache miss:
+
+- `NONE` → return the static pointer (or hit the existing PROC / disk-load
+  branch).
+- `ANS_ASCII` → allocate `uncompressed_size + 1` bytes,
+  `ans::Decode(..., GetAnsAsciiHistogram())`, NUL-terminate, cache.
+
+`DropAsset()` and `ReloadAssetsFromFile()` free the heap-allocated buffer
+when `compression != NONE`, alongside the existing procedural cleanup.
+
+---
+
+## Observed Compression
+
+`workspaces/main`, STRIP_ALL build: WGSL shaders compress to **0.62×–0.71×**
+their raw size (81 of 105 assets qualify). Round-trip verification runs
+at pack time for every compressed asset; failures abort the build.
+
+---
+
+## Limitations
+
+The encoder returns `false` if it cannot produce a final state above
+`kMask` for some chunk. With the corpus-derived ASCII histogram this never
+trips on the demo's WGSL corpus, but inputs with a near-monolithic byte
+distribution can fail. Such assets fall back to uncompressed storage.
+
+---
+
+## Tests
+
+`src/tests/util/test_ans.cc` (run via `make run_util_tests` or
+`./build/test_ans`):
+
+- Roundtrip variants: empty, single byte, single-symbol run, all-zeros,
+  random uniform, random skewed, repeated ASCII.
+- Seeded-vs-uniform: a corpus-matched histogram compresses at least as
+  well as a uniform seed.
+- Rejection: mismatched seed model, payload bit-flip, truncated stream.
+- `PeekUncompressedSize` returns the header value.
+
+---
+
+## See Also
+
+- `doc/ASSET_SYSTEM.md` — overall asset pipeline.
+- `src/util/ans.h` — public API.
+- `tools/asset_packer.cc` — corpus scan and per-asset compression.
+- `src/util/asset_manager.cc` — runtime decompression.
diff --git a/doc/ASSET_SYSTEM.md b/doc/ASSET_SYSTEM.md
index a97886c..415342d 100644
--- a/doc/ASSET_SYSTEM.md
+++ b/doc/ASSET_SYSTEM.md
@@ -60,6 +60,16 @@ enum class AssetType : uint8_t {
 };
 ```
 
+## Compression
+
+Each `AssetRecord` carries an `AssetCompression` flag and an
+`uncompressed_size`. In **Release Mode** the asset packer runs an order-0
+rANS coder over every `WGSL` asset, seeded with a histogram derived from
+the full shader corpus. `AssetManager::GetAsset()` decompresses lazily on
+first access and caches the result. Other asset types and **Development
+Mode** are unaffected. See `doc/ANS.md` for the algorithm, bitstream, and
+runtime API.
+
 Query at runtime:
 ```cpp
 if (GetAssetType(AssetId::NEVER_MP3) == AssetType::MP3) { ... }
@@ -93,8 +103,8 @@ Tool: `tools/asset_packer.cc`
 ## Technical Guarantees
 
 - **Alignment**: All embedded data arrays are declared `alignas(16)` for safe `reinterpret_cast`.
-- **String Safety**: Embedded assets are null-terminated (safe as C-strings). In disk-load mode, the path itself is a null-terminated C-string.
-- **Size**: For embedded assets, `size` reflects the original file size (the buffer is `size + 1`). For disk-loaded assets, it reflects the file path's string length.
+- **String Safety**: Embedded assets are null-terminated (safe as C-strings). In disk-load mode, the path itself is a null-terminated C-string. ANS-compressed assets are NUL-terminated by the decompressor on first access.
+- **Size**: For uncompressed embedded assets, `size` is the original file size (the buffer is `size + 1`). For disk-loaded assets, it is the file path's string length. For ANS-compressed assets, `size` is the *compressed* byte count; query `uncompressed_size` for the decoded length.
 
 ## Developer Workflow
 
diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md
index 233373e..bf4c3ba 100644
--- a/doc/COMPLETED.md
+++ b/doc/COMPLETED.md
@@ -34,6 +34,10 @@ Completed task archive. See `doc/archive/` for detailed historical documents.
 
 ---
 
+## May 2026
+
+- [x] **ANS shader compression (2026-05-14)** — Order-0 rANS coder in `src/util/ans.{h,cc}` (decoder always built; encoder gated on `ANS_ENABLE_ENCODER`). `asset_packer` derives a corpus-wide byte histogram from every WGSL file, ANS-encodes each shader with that seed, and round-trip-verifies at pack time. `AssetRecord` gains `compression` + `uncompressed_size`; `asset_manager` decompresses lazily on first `GetAsset()` and frees in `DropAsset`/`ReloadAssetsFromFile`. WGSL assets shrink to ~0.62–0.71× in `workspaces/main` (81/105). See `doc/ANS.md`. Tests: 37/37 dev, 36/36 STRIP_ALL.
+
 ## March 2026 (continued)
 
 - [x] **FFT twiddle factor fix** — `fft_radix2` computes `wr/wi` directly per k via `cosf/sinf(angle*k)`. Tests A–E added to `test_fft.cc`. Tolerance reverted to 5e-3.
