| author | skal <pascal.massimino@gmail.com> | 2026-05-14 19:09:39 +0200 |
|---|---|---|
| committer | skal <pascal.massimino@gmail.com> | 2026-05-14 19:11:28 +0200 |
| commit | 6ef8f578817ee0134fd5867ca3b80590e3eb2368 (patch) | |
| tree | 5550607e5c4a16ca237bfa4430ac1ef1f5d80c5d /doc/ANS.md | |
| parent | 4bcbe13dab5ffb64d93cc61956f07ee5168a84c9 (diff) | |
ans: order-0 rANS coder + WGSL asset compression

Adds src/util/ans.{h,cc}, a per-chunk-adaptive order-0 rANS entropy
coder. Decoder is always built; encoder is gated on ANS_ENABLE_ENCODER
(tools only). Both sides take an optional 256-entry initial_counts
table to seed the adaptive model.

The per-chunk initial state is (1 << kBits). Higher initial states
(e.g. with a signature packed into the upper bits) force a renorm-emit
at iter 0 that the decoder never consumes, corrupting multi-chunk
streams once stats become skewed.
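
For readers unfamiliar with rANS, the role of the initial state can be sketched with a textbook, static-model coder. This is not the code in src/util/ans.cc: the model here is fixed rather than adaptive, all names (`sketch::Model`, `EncodeChunk`, `DecodeChunk`) are hypothetical, and it does not reproduce the multi-chunk corruption described above. It only shows the encoder starting from `1 << kBits`, walking the chunk in reverse, and the decoder ending on that same value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

constexpr uint32_t kBits = 16;
constexpr uint32_t kInitState = 1u << kBits;  // per-chunk initial state
constexpr uint32_t kMask = kInitState - 1;

// Static model (assumption: the real coder adapts its counts instead).
// freq[] sums to 1 << kBits; start[s] is the cumulative count before s.
struct Model {
  std::vector<uint32_t> freq;
  std::vector<uint32_t> start;
};

Model MakeModel(const std::vector<uint32_t>& freq) {
  Model m;
  m.freq = freq;
  uint32_t sum = 0;
  for (uint32_t f : freq) {
    assert(f > 0);
    m.start.push_back(sum);
    sum += f;
  }
  assert(sum == kInitState);
  return m;
}

struct Chunk {
  uint32_t final_state;
  std::vector<uint16_t> words;  // in the order the decoder reads them
};

// Encodes symbols last-to-first, emitting a 16-bit word whenever the
// state would otherwise leave [1 << 16, 1 << 32) after the next symbol.
Chunk EncodeChunk(const Model& m, const std::vector<uint8_t>& syms) {
  uint32_t x = kInitState;
  std::vector<uint16_t> emitted;
  for (size_t i = syms.size(); i-- > 0;) {
    const uint32_t f = m.freq[syms[i]];
    if (x >= (static_cast<uint64_t>(f) << 16)) {
      emitted.push_back(static_cast<uint16_t>(x & 0xffffu));
      x >>= 16;
    }
    x = ((x / f) << kBits) + (x % f) + m.start[syms[i]];
  }
  Chunk c;
  c.final_state = x;
  c.words.assign(emitted.rbegin(), emitted.rend());
  return c;
}

// Decodes first-to-last and accepts only if the state lands back on
// kInitState with every emitted word consumed.
bool DecodeChunk(const Model& m, const Chunk& c, size_t count,
                 std::vector<uint8_t>* out) {
  assert(out != nullptr);
  out->clear();
  uint32_t x = c.final_state;
  size_t pos = 0;
  for (size_t i = 0; i < count; ++i) {
    const uint32_t slot = x & kMask;
    size_t s = 0;
    while (s + 1 < m.freq.size() && slot >= m.start[s + 1]) ++s;
    out->push_back(static_cast<uint8_t>(s));
    x = m.freq[s] * (x >> kBits) + slot - m.start[s];
    if (x <= kMask) {  // state dropped below 1 << kBits: pull a word
      if (pos == c.words.size()) return false;
      x = (x << 16) | c.words[pos++];
    }
  }
  return x == kInitState && pos == c.words.size();
}

}  // namespace sketch
```

Because each decode step inverts one encode step, a decoder that ends anywhere other than `kInitState` was fed a stream or model that does not match what the encoder used.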

Asset pipeline:
- AssetRecord gains 'compression' and 'uncompressed_size' fields.
- asset_packer scans every WGSL file to build a corpus-wide byte
  histogram, then ANS-encodes each shader using that histogram as the
  seed. Histogram and accessor are emitted alongside the asset table.
  Round-trip verification runs at pack time for every compressed
  asset; failures fall back to uncompressed storage.
- asset_manager decompresses on first GetAsset(), caches the
  heap-allocated buffer, and DropAsset / ReloadAssetsFromFile free it
  along with the procedural cache.
- Disk-load (dev) builds are unchanged: WGSL paths stay as filenames.
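
The pack-time flow in the list above (corpus histogram, encode, size check, round-trip check, fallback) can be sketched roughly as follows. This is an illustration only, not the code in tools/asset_packer.cc: `CorpusHistogram`, `PackShader`, `PackedAsset`, and the `EncodeFn` / `DecodeFn` hooks are hypothetical names, and the hooks stand in for the real encoder and decoder, which would additionally be given the seed histogram.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

namespace sketch {

using Bytes = std::vector<uint8_t>;

// Hypothetical codec hooks standing in for the real encoder/decoder.
using EncodeFn = std::function<bool(const Bytes& raw, Bytes* packed)>;
using DecodeFn = std::function<bool(const Bytes& packed, Bytes* raw)>;

// Accumulates one 256-entry byte histogram over every shader source.
std::vector<uint32_t> CorpusHistogram(const std::vector<std::string>& shaders) {
  std::vector<uint32_t> counts(256, 0);
  for (const std::string& s : shaders) {
    for (unsigned char c : s) ++counts[c];
  }
  return counts;
}

enum class Compression : uint8_t { NONE = 0, ANS_ASCII = 1 };

struct PackedAsset {
  Compression compression;
  Bytes data;                // stored payload (raw or compressed)
  size_t uncompressed_size;  // == data.size() when compression == NONE
};

// Keeps the compressed form only if encoding succeeds, the result is
// strictly smaller, and decoding it reproduces the input byte for byte.
// Any failure falls back to storing the raw bytes.
PackedAsset PackShader(const Bytes& raw, const EncodeFn& encode,
                       const DecodeFn& decode) {
  assert(encode && decode);  // both hooks must be set
  PackedAsset out{Compression::NONE, raw, raw.size()};
  Bytes packed;
  Bytes check;
  if (!encode(raw, &packed)) return out;
  if (packed.size() >= raw.size()) return out;
  if (!decode(packed, &check) || check != raw) return out;
  out.compression = Compression::ANS_ASCII;
  out.data = packed;
  return out;
}

}  // namespace sketch
```

Running the decoder at pack time means a codec bug degrades to a larger binary rather than a shader that fails to load at runtime.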

Tests:
- src/tests/util/test_ans.cc: roundtrip variants (empty, single byte,
  single-symbol run, all-zeros, random uniform/skewed, repeated ASCII),
  seeded-vs-uniform compression, rejection of mismatched counts /
  corruption / truncation, PeekUncompressedSize.
- 37/37 dev, 36/36 STRIP_ALL.

Compression observed: WGSL shaders shrink to ~0.62-0.71x in the main
workspace (81 of 105 assets qualify).

Docs:
- doc/ANS.md (new): algorithm, bitstream, API, asset pipeline
  integration, compression numbers, limitations, tests.
- doc/ASSET_SYSTEM.md: new Compression section + updated technical
  guarantees for compressed assets.
- doc/COMPLETED.md: May 2026 entry.
- PROJECT_CONTEXT.md: Build status line mentions WGSL ANS compression.
- CLAUDE.md, GEMINI.md: tier-3 build doc list includes ANS.md.

Diffstat (limited to 'doc/ANS.md')
| -rw-r--r-- | doc/ANS.md | 166 |
1 files changed, 166 insertions, 0 deletions
diff --git a/doc/ANS.md b/doc/ANS.md
new file mode 100644
index 0000000..c93bf82
--- /dev/null
+++ b/doc/ANS.md
@@ -0,0 +1,166 @@
+# ANS Compression
+
+Order-0 rANS entropy coder used to compress shader assets at build time and
+decompress them on first access at runtime.
+
+**Source:** `src/util/ans.{h,cc}`.
+
+---
+
+## Algorithm
+
+Per-chunk adaptive order-0 byte coder.
+
+| Parameter        | Value                                  |
+|------------------|----------------------------------------|
+| Precision        | 16 bits (`kBits = 16`)                 |
+| State range      | `[1 << 16, 1 << 32)` (`uint32_t`)      |
+| Renorm I/O width | 16 bits (big-endian)                   |
+| Chunk size       | 1024 bytes                             |
+| Symbols          | 256 (bytes)                            |
+| Initial state    | `1 << 16` (`kInitState`)               |
+
+The encoder iterates each chunk in reverse, the decoder forward. Symbol
+counts are mutated on the fly during encode/decode and re-normalized at
+each chunk boundary so the cumulative table sums to `1 << 16`.
+
+The chunk-end state always equals `kInitState`; the decoder rejects the
+stream if it doesn't. That single check catches both bit-level corruption
+and decoder/encoder model divergence (e.g. wrong initial histogram).
+
+The per-chunk initial state must be exactly `1 << kBits`. A higher value
+(e.g. with a "signature" packed into the upper bits) forces a renorm-emit
+at iter 0 that the decoder never consumes — harmless on a single chunk,
+but it corrupts any stream with two or more chunks once the per-chunk
+stats become skewed.
+
+---
+
+## Bitstream Format
+
+Big-endian throughout.
+
+```
+[u32 uncompressed_size]        // 4 bytes, header
+per chunk (uncompressed_size > 0):
+  [u32 final_state]            // 4 bytes
+  [u16 emitted_words]*         // variable, in stream order
+```
+
+Number of emitted words per chunk is implicit — the decoder pulls a word
+whenever its state drops at or below `kMask = (1 << kBits) - 1`.
+
+---
+
+## API
+
+```cpp
+#include "util/ans.h"
+
+// Always built.
+bool ans::Decode(const uint8_t* src, size_t src_size,
+                 uint8_t* dst, size_t dst_capacity,
+                 size_t* out_size,
+                 const uint32_t* initial_counts = nullptr);
+
+uint32_t ans::PeekUncompressedSize(const uint8_t* src, size_t src_size);
+
+// Gated on ANS_ENABLE_ENCODER (tools only).
+bool ans::Encode(const uint8_t* src, size_t size,
+                 std::vector<uint8_t>* dst,
+                 const uint32_t* initial_counts = nullptr);
+
+void ans::Histogram(const uint8_t* src, size_t size, uint32_t* out_counts);
+```
+
+`initial_counts` is a 256-entry table that seeds the adaptive model. Both
+encoder and decoder must use the same seed — a mismatch trips the chunk-end
+state check immediately. Pass `nullptr` for a uniform default (all-ones).
+
+---
+
+## Asset Pipeline Integration
+
+`AssetRecord` carries two extra fields:
+
+```cpp
+enum class AssetCompression : uint8_t {
+  NONE = 0,
+  ANS_ASCII = 1,  // seeded from GetAnsAsciiHistogram()
+};
+
+struct AssetRecord {
+  ...
+  AssetCompression compression;
+  size_t uncompressed_size;  // == size if compression == NONE
+};
+```
+
+### Build time (`tools/asset_packer.cc`)
+
+Embedded (non-disk-load) builds only:
+
+1. Scan every `WGSL` asset to build a corpus-wide 256-entry byte histogram.
+2. Emit it as `static const uint32_t kAnsAsciiHistogram[256]` plus a
+   `GetAnsAsciiHistogram()` accessor in `assets_data.cc`.
+3. For each `WGSL` asset, call `TryAnsCompress()`:
+   `ans::Encode(...)` → reject if it's not smaller than the raw input →
+   round-trip verify with `ans::Decode(...)` → only then mark the asset
+   `ANS_ASCII`.
+4. Other asset types (SPEC, TEXTURE, MESH, BINARY, MP3, PROC*) pass
+   through uncompressed.
+
+Disk-load (dev) builds skip the encoder entirely: WGSL data is the file
+path, never the file contents.
+
+### Runtime (`src/util/asset_manager.cc`)
+
+`GetAsset()` checks `compression` on a cache miss:
+
+- `NONE` → return the static pointer (or hit the existing PROC / disk-load
+  branch).
+- `ANS_ASCII` → allocate `uncompressed_size + 1` bytes,
+  `ans::Decode(..., GetAnsAsciiHistogram())`, NUL-terminate, cache.
+
+`DropAsset()` and `ReloadAssetsFromFile()` free the heap-allocated buffer
+when `compression != NONE`, alongside the existing procedural cleanup.
+
+---
+
+## Observed Compression
+
+`workspaces/main`, STRIP_ALL build: WGSL shaders compress to **0.62×–0.71×**
+their raw size (81 of 105 assets qualify). Round-trip verification runs
+at pack time for every compressed asset; failures abort the build.
+
+---
+
+## Limitations
+
+The encoder returns `false` if it cannot produce a final state above
+`kMask` for some chunk. With the corpus-derived ASCII histogram this never
+trips on the demo's WGSL corpus, but inputs with a near-monolithic byte
+distribution can fail. Such assets fall back to uncompressed storage.
+
+---
+
+## Tests
+
+`src/tests/util/test_ans.cc` (run via `make run_util_tests` or
+`./build/test_ans`):
+
+- Roundtrip variants: empty, single byte, single-symbol run, all-zeros,
+  random uniform, random skewed, repeated ASCII.
+- Seeded-vs-uniform: a corpus-matched histogram compresses at least as
+  well as a uniform seed.
+- Rejection: mismatched seed model, payload bit-flip, truncated stream.
+- `PeekUncompressedSize` returns the header value.
+
+---
+
+## See Also
+
+- `doc/ASSET_SYSTEM.md` — overall asset pipeline.
+- `src/util/ans.h` — public API.
+- `tools/asset_packer.cc` — corpus scan and per-asset compression.
+- `src/util/asset_manager.cc` — runtime decompression.
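
The stream header described under "Bitstream Format" in the new doc is a single big-endian `u32`. As a rough illustration of what a header peek involves, the sketch below reads that field and derives how many chunks follow. The names (`sketch::PeekHeader`, `ReadU32BE`) are hypothetical and this is not the implementation of `ans::PeekUncompressedSize`. Two things are assumptions: the behaviour on a short buffer, and that the 1024-byte chunk size counts uncompressed bytes with a partial last chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

constexpr size_t kHeaderBytes = 4;     // [u32 uncompressed_size]
constexpr uint32_t kChunkSize = 1024;  // assumed to count uncompressed bytes

// Reads a big-endian u32; the caller guarantees 4 readable bytes at p.
inline uint32_t ReadU32BE(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) |
         static_cast<uint32_t>(p[3]);
}

// Returns false when the buffer is too short to hold the header
// (assumption: the real API may signal this differently).
bool PeekHeader(const uint8_t* src, size_t src_size,
                uint32_t* uncompressed_size, uint32_t* chunk_count) {
  assert(uncompressed_size != nullptr && chunk_count != nullptr);
  if (src == nullptr || src_size < kHeaderBytes) return false;
  const uint32_t n = ReadU32BE(src);
  *uncompressed_size = n;
  // Ceiling division written so it cannot overflow for n near 2^32.
  *chunk_count = n / kChunkSize + (n % kChunkSize != 0 ? 1u : 0u);
  return true;
}

}  // namespace sketch
```

A caller could use the peeked size to allocate the destination buffer before decoding, as the runtime path does with `uncompressed_size + 1` bytes.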
