| author | skal <pascal.massimino@gmail.com> | 2026-05-14 19:09:39 +0200 |
|---|---|---|
| committer | skal <pascal.massimino@gmail.com> | 2026-05-14 19:11:28 +0200 |
| commit | 6ef8f578817ee0134fd5867ca3b80590e3eb2368 (patch) | |
| tree | 5550607e5c4a16ca237bfa4430ac1ef1f5d80c5d /doc/ASSET_SYSTEM.md | |
| parent | 4bcbe13dab5ffb64d93cc61956f07ee5168a84c9 (diff) | |
ans: order-0 rANS coder + WGSL asset compression

Adds src/util/ans.{h,cc}, a per-chunk-adaptive order-0 rANS entropy
coder. Decoder is always built; encoder is gated on ANS_ENABLE_ENCODER
(tools only). Both sides take an optional 256-entry initial_counts
table to seed the adaptive model.

The per-chunk initial state is (1 << kBits). Higher initial states
(e.g. with a signature packed into the upper bits) force a renorm-emit
at iter 0 that the decoder never consumes, corrupting multi-chunk
streams once stats become skewed.
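
For readers unfamiliar with rANS, the role of the initial state can be seen in a minimal static order-0 coder. This is a generic textbook sketch, not the coder in src/util/ans.{h,cc}: it has no adaptive model, no chunking and no initial_counts seeding, and every name in it (kScaleBits, kLower, Model, BuildModel, Encode, Decode) is hypothetical. It does not reproduce the renorm-emit problem described above; it only shows that the encoder starts from the low end of the normalized range and that a matching decoder ends on that same value.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative names only; none are taken from src/util/ans.{h,cc}.
constexpr uint32_t kScaleBits = 12;            // frequencies sum to 1 << kScaleBits
constexpr uint32_t kTotal = 1u << kScaleBits;
constexpr uint32_t kLower = 1u << 23;          // low end of the normalized state range

struct Model {
  std::array<uint32_t, 256> freq{};
  std::array<uint32_t, 257> cum{};
};

// Static model: every byte gets frequency >= 1 and the total is exactly kTotal.
Model BuildModel(const std::vector<uint8_t>& data) {
  std::array<uint64_t, 256> count{};
  for (uint8_t b : data) ++count[b];
  const uint64_t n = std::max<uint64_t>(data.size(), 1);
  Model m;
  uint32_t sum = 0;
  for (int s = 0; s < 256; ++s) {
    m.freq[s] = 1 + static_cast<uint32_t>(count[s] * (kTotal - 256) / n);
    sum += m.freq[s];
  }
  // Rounding slack goes to the most frequent byte.
  const size_t top = static_cast<size_t>(
      std::max_element(count.begin(), count.end()) - count.begin());
  m.freq[top] += kTotal - sum;
  for (int s = 0; s < 256; ++s) m.cum[s + 1] = m.cum[s] + m.freq[s];
  return m;
}

// Encodes back to front so that the decoder can read front to back.
std::vector<uint8_t> Encode(const std::vector<uint8_t>& in, const Model& m) {
  std::vector<uint8_t> out;
  uint32_t x = kLower;  // initial state: low end of the normalized range
  for (size_t i = in.size(); i-- > 0;) {
    const uint32_t f = m.freq[in[i]], c = m.cum[in[i]];
    const uint32_t x_max = ((kLower >> kScaleBits) << 8) * f;
    while (x >= x_max) { out.push_back(x & 0xff); x >>= 8; }  // renormalize
    x = ((x / f) << kScaleBits) + (x % f) + c;
  }
  for (int k = 0; k < 4; ++k) { out.push_back(x & 0xff); x >>= 8; }  // flush state
  std::reverse(out.begin(), out.end());
  return out;
}

// Decodes n symbols; optionally reports the state left after the last one.
std::vector<uint8_t> Decode(const std::vector<uint8_t>& in, size_t n,
                            const Model& m, uint32_t* final_state = nullptr) {
  if (in.size() < 4) return {};
  size_t pos = 0;
  uint32_t x = 0;
  for (int k = 0; k < 4; ++k) x = (x << 8) | in[pos++];
  std::vector<uint8_t> out(n);
  for (size_t i = 0; i < n; ++i) {
    const uint32_t slot = x & (kTotal - 1);
    uint32_t s = 0;
    while (m.cum[s + 1] <= slot) ++s;  // linear search, fine for a sketch
    out[i] = static_cast<uint8_t>(s);
    x = m.freq[s] * (x >> kScaleBits) + slot - m.cum[s];
    while (x < kLower && pos < in.size()) x = (x << 8) | in[pos++];
  }
  if (final_state != nullptr) *final_state = x;
  return out;
}
```

In this sketch, decoding every symbol leaves the state equal to kLower again, which gives a decoder a cheap end-of-stream sanity check.
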

Asset pipeline:
- AssetRecord gains 'compression' and 'uncompressed_size' fields.
- asset_packer scans every WGSL file to build a corpus-wide byte
  histogram, then ANS-encodes each shader using that histogram as the
  seed. Histogram and accessor are emitted alongside the asset table.
  Round-trip verification runs at pack time for every compressed
  asset; failures fall back to uncompressed storage.
- asset_manager decompresses on first GetAsset(), caches the
  heap-allocated buffer, and DropAsset / ReloadAssetsFromFile free it
  along with the procedural cache.
- Disk-load (dev) builds are unchanged: WGSL paths stay as filenames.
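
The packer-side flow above (corpus histogram, seeded encode, round-trip check, fallback) can be sketched roughly as follows. This is not the code in tools/asset_packer.cc: the types and functions (Histogram, PackedAsset, EncodeFn, DecodeFn, BuildCorpusHistogram, PackWithFallback) are hypothetical, and the codec is passed in as callbacks because the real ANS API is not shown in this commit view.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical types; the real AssetRecord layout is not shown here.
using Bytes = std::vector<uint8_t>;
using Histogram = std::array<uint32_t, 256>;

enum class Compression : uint8_t { kNone, kAns };

struct PackedAsset {
  Compression compression;
  uint32_t uncompressed_size;
  Bytes payload;  // compressed bytes for kAns, raw bytes for kNone
};

// Assumed codec shape: both sides receive the same seed histogram.
using EncodeFn = std::function<Bytes(const Bytes&, const Histogram&)>;
using DecodeFn = std::function<bool(const Bytes&, const Histogram&, Bytes*)>;

// One histogram over every shader, so each file is seeded with corpus stats.
Histogram BuildCorpusHistogram(const std::vector<Bytes>& files) {
  Histogram h{};
  for (const Bytes& f : files)
    for (uint8_t b : f) ++h[b];
  return h;
}

// Encodes, decodes again, and keeps the compressed form only if the
// decoded bytes match the input exactly; otherwise stores the raw bytes.
PackedAsset PackWithFallback(const Bytes& raw, const Histogram& seed,
                             const EncodeFn& encode, const DecodeFn& decode) {
  const uint32_t raw_size = static_cast<uint32_t>(raw.size());
  Bytes packed = encode(raw, seed);
  Bytes check;
  if (decode(packed, seed, &check) && check == raw)
    return {Compression::kAns, raw_size, std::move(packed)};
  return {Compression::kNone, raw_size, raw};
}
```

Verifying at pack time means a coder bug costs only storage size: the affected asset ships uncompressed instead of failing at runtime.
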

Tests:
- src/tests/util/test_ans.cc: roundtrip variants (empty, single byte,
  single-symbol run, all-zeros, random uniform/skewed, repeated ASCII),
  seeded-vs-uniform compression, rejection of mismatched counts /
  corruption / truncation, PeekUncompressedSize.
- 37/37 dev, 36/36 STRIP_ALL.

Compression observed: WGSL shaders shrink to ~0.62-0.71x in the main
workspace (81 of 105 assets qualify).

Docs:
- doc/ANS.md (new): algorithm, bitstream, API, asset pipeline
  integration, compression numbers, limitations, tests.
- doc/ASSET_SYSTEM.md: new Compression section + updated technical
  guarantees for compressed assets.
- doc/COMPLETED.md: May 2026 entry.
- PROJECT_CONTEXT.md: Build status line mentions WGSL ANS compression.
- CLAUDE.md, GEMINI.md: tier-3 build doc list includes ANS.md.

Diffstat (limited to 'doc/ASSET_SYSTEM.md')
| -rw-r--r-- | doc/ASSET_SYSTEM.md | 14 |
1 files changed, 12 insertions, 2 deletions
diff --git a/doc/ASSET_SYSTEM.md b/doc/ASSET_SYSTEM.md
index a97886c..415342d 100644
--- a/doc/ASSET_SYSTEM.md
+++ b/doc/ASSET_SYSTEM.md
@@ -60,6 +60,16 @@ enum class AssetType : uint8_t {
 };
 ```
 
+## Compression
+
+Each `AssetRecord` carries an `AssetCompression` flag and an
+`uncompressed_size`. In **Release Mode** the asset packer runs an order-0
+rANS coder over every `WGSL` asset, seeded with a histogram derived from
+the full shader corpus. `AssetManager::GetAsset()` decompresses lazily on
+first access and caches the result. Other asset types and **Development
+Mode** are unaffected. See `doc/ANS.md` for the algorithm, bitstream, and
+runtime API.
+
 Query at runtime:
 ```cpp
 if (GetAssetType(AssetId::NEVER_MP3) == AssetType::MP3) { ... }
@@ -93,8 +103,8 @@ Tool: `tools/asset_packer.cc`
 ## Technical Guarantees
 
 - **Alignment**: All embedded data arrays are declared `alignas(16)` for safe `reinterpret_cast`.
-- **String Safety**: Embedded assets are null-terminated (safe as C-strings). In disk-load mode, the path itself is a null-terminated C-string.
-- **Size**: For embedded assets, `size` reflects the original file size (the buffer is `size + 1`). For disk-loaded assets, it reflects the file path's string length.
+- **String Safety**: Embedded assets are null-terminated (safe as C-strings). In disk-load mode, the path itself is a null-terminated C-string. ANS-compressed assets are NUL-terminated by the decompressor on first access.
+- **Size**: For uncompressed embedded assets, `size` is the original file size (the buffer is `size + 1`). For disk-loaded assets, it is the file path's string length. For ANS-compressed assets, `size` is the *compressed* byte count; query `uncompressed_size` for the decoded length.
 
 ## Developer Workflow
 
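
The runtime behaviour that the new Compression section and the updated guarantees describe (decompress on first access, cache the heap buffer, NUL-terminate it, keep `size` as the stored byte count) might look roughly like the sketch below. It is an assumption-laden illustration, not the AssetManager code: Record, LazyAsset, DecodeFn and StandInDecode are hypothetical names, and the decoder is a plain copy standing in for the real ANS decoder.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>

enum class Compression : uint8_t { kNone, kAns };

// Hypothetical mirror of the fields the diff mentions.
struct Record {
  const uint8_t* data;
  uint32_t size;               // stored byte count (compressed when kAns)
  uint32_t uncompressed_size;  // decoded length
  Compression compression;
};

// Assumed decoder shape: returns false on corrupt or truncated input.
using DecodeFn = bool (*)(const uint8_t* in, uint32_t in_size,
                          uint8_t* out, uint32_t out_size);

// Stand-in for the real decoder so the sketch runs: a plain copy.
bool StandInDecode(const uint8_t* in, uint32_t in_size,
                   uint8_t* out, uint32_t out_size) {
  if (in_size != out_size) return false;
  std::memcpy(out, in, in_size);
  return true;
}

class LazyAsset {
 public:
  LazyAsset(const Record& rec, DecodeFn decode) : rec_(rec), decode_(decode) {}

  // Returns the usable bytes and their length; decodes at most once
  // until Drop() is called. Returns nullptr if decoding fails.
  const uint8_t* Get(uint32_t* out_size) {
    if (rec_.compression == Compression::kNone) {
      *out_size = rec_.size;
      return rec_.data;
    }
    if (!cache_) {
      auto buf = std::make_unique<uint8_t[]>(rec_.uncompressed_size + 1);
      if (!decode_(rec_.data, rec_.size, buf.get(), rec_.uncompressed_size))
        return nullptr;
      buf[rec_.uncompressed_size] = 0;  // keep the C-string guarantee
      cache_ = std::move(buf);
      ++decode_calls_;
    }
    *out_size = rec_.uncompressed_size;
    return cache_.get();
  }

  // Frees the cached buffer; the next Get() decodes again.
  void Drop() { cache_.reset(); }

  int decode_calls() const { return decode_calls_; }

 private:
  Record rec_;
  DecodeFn decode_;
  std::unique_ptr<uint8_t[]> cache_;
  int decode_calls_ = 0;
};
```

Callers that need the decoded length should read it from `uncompressed_size` (here, the value written to `out_size`), since `size` on a compressed record is the compressed byte count.
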
