author: skal <pascal.massimino@gmail.com> 2026-05-14 19:09:39 +0200
committer: skal <pascal.massimino@gmail.com> 2026-05-14 19:11:28 +0200
commit: 6ef8f578817ee0134fd5867ca3b80590e3eb2368
tree: 5550607e5c4a16ca237bfa4430ac1ef1f5d80c5d /doc
parent: 4bcbe13dab5ffb64d93cc61956f07ee5168a84c9
ans: order-0 rANS coder + WGSL asset compression

Adds src/util/ans.{h,cc}, a per-chunk-adaptive order-0 rANS entropy coder.
Decoder is always built; encoder is gated on ANS_ENABLE_ENCODER (tools only).
Both sides take an optional 256-entry initial_counts table to seed the
adaptive model.

The per-chunk initial state is (1 << kBits). Higher initial states (e.g. with
a signature packed into the upper bits) force a renorm-emit at iter 0 that the
decoder never consumes, corrupting multi-chunk streams once stats become
skewed.

Asset pipeline:
- AssetRecord gains 'compression' and 'uncompressed_size' fields.
- asset_packer scans every WGSL file to build a corpus-wide byte histogram,
  then ANS-encodes each shader using that histogram as the seed. Histogram
  and accessor are emitted alongside the asset table. Round-trip verification
  runs at pack time for every compressed asset; failures fall back to
  uncompressed storage.
- asset_manager decompresses on first GetAsset(), caches the heap-allocated
  buffer, and DropAsset / ReloadAssetsFromFile free it along with the
  procedural cache.
- Disk-load (dev) builds are unchanged: WGSL paths stay as filenames.

Tests:
- src/tests/util/test_ans.cc: roundtrip variants (empty, single byte,
  single-symbol run, all-zeros, random uniform/skewed, repeated ASCII),
  seeded-vs-uniform compression, rejection of mismatched counts / corruption /
  truncation, PeekUncompressedSize.
- 37/37 dev, 36/36 STRIP_ALL.

Compression observed: WGSL shaders shrink to ~0.62-0.71x in the main
workspace (81 of 105 assets qualify).

Docs:
- doc/ANS.md (new): algorithm, bitstream, API, asset pipeline integration,
  compression numbers, limitations, tests.
- doc/ASSET_SYSTEM.md: new Compression section + updated technical guarantees
  for compressed assets.
- doc/COMPLETED.md: May 2026 entry.
- PROJECT_CONTEXT.md: Build status line mentions WGSL ANS compression.
- CLAUDE.md, GEMINI.md: tier-3 build doc list includes ANS.md.
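
Illustration only, not the code in src/util/ans.cc: a minimal static-model
rANS sketch using the same parameters (16-bit precision, 32-bit state, 16-bit
renorm words, initial state 1 << kBits). It may help show why a decoder can
verify a chunk by checking that its state returns to the initial value.
Everything in namespace `sketch` (Model, MakeTwoSymbolModel, Encoded, and this
Encode/Decode pair) is hypothetical. The real coder is adaptive and per-chunk,
and it serializes big-endian bytes rather than a word vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {  // hypothetical names, not the ans:: API

constexpr uint32_t kBits = 16;
constexpr uint32_t kInitState = 1u << kBits;
constexpr uint32_t kMask = kInitState - 1;

// Static order-0 model; frequencies must sum to 1 << kBits.
struct Model {
  uint32_t freq[256];
  uint32_t start[256];
};

// Assumed toy model: 'a' has probability 3/4, 'b' has 1/4, all others 0.
inline Model MakeTwoSymbolModel() {
  Model m = {};
  m.freq['a'] = 3u << (kBits - 2);
  m.freq['b'] = 1u << (kBits - 2);
  uint32_t acc = 0;
  for (int s = 0; s < 256; ++s) {
    m.start[s] = acc;
    acc += m.freq[s];
  }
  assert(acc == kInitState);
  return m;
}

struct Encoded {
  uint32_t final_state = 0;
  std::vector<uint16_t> words;  // in the order the decoder reads them
};

// Encodes src in reverse, starting from kInitState.
// Returns false if a symbol has zero frequency in the model.
inline bool Encode(const Model& m, const std::vector<uint8_t>& src,
                   Encoded* out) {
  uint32_t x = kInitState;
  std::vector<uint16_t> emitted;
  for (size_t i = src.size(); i-- > 0;) {
    const uint32_t f = m.freq[src[i]];
    if (f == 0) return false;
    // Renormalize: shift out 16 bits if the state is too large for f.
    if (x >= (static_cast<uint64_t>(f) << kBits)) {
      emitted.push_back(static_cast<uint16_t>(x & kMask));
      x >>= kBits;
    }
    x = ((x / f) << kBits) + (x % f) + m.start[src[i]];
  }
  out->final_state = x;
  out->words.assign(emitted.rbegin(), emitted.rend());
  return true;
}

// Decodes n symbols forward. Succeeds only if every word is consumed and
// the state ends at kInitState.
inline bool Decode(const Model& m, const Encoded& enc, size_t n,
                   std::vector<uint8_t>* out) {
  uint32_t x = enc.final_state;
  size_t pos = 0;
  out->clear();
  for (size_t i = 0; i < n; ++i) {
    const uint32_t slot = x & kMask;
    int s = 0;  // linear search for the symbol whose range holds slot
    while (s < 255 && slot >= m.start[s] + m.freq[s]) ++s;
    out->push_back(static_cast<uint8_t>(s));
    x = m.freq[s] * (x >> kBits) + slot - m.start[s];
    if (x <= kMask) {  // pull one word when the state drops to kMask or below
      if (pos >= enc.words.size()) return false;
      x = (x << kBits) | enc.words[pos++];
    }
  }
  return x == kInitState && pos == enc.words.size();
}

}  // namespace sketch
```

Each decode step undoes one encode step exactly, so an intact stream decoded
with the matching model walks the state back to kInitState. A decoder that
lands anywhere else has seen either a damaged stream or a different model.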
Diffstat (limited to 'doc')
-rw-r--r-- doc/ANS.md | 166
-rw-r--r-- doc/ASSET_SYSTEM.md | 14
-rw-r--r-- doc/COMPLETED.md | 4
3 files changed, 182 insertions, 2 deletions
diff --git a/doc/ANS.md b/doc/ANS.md
new file mode 100644
index 0000000..c93bf82
--- /dev/null
+++ b/doc/ANS.md
@@ -0,0 +1,166 @@
+# ANS Compression
+
+Order-0 rANS entropy coder used to compress shader assets at build time and
+decompress them on first access at runtime.
+
+**Source:** `src/util/ans.{h,cc}`.
+
+---
+
+## Algorithm
+
+Per-chunk adaptive order-0 byte coder.
+
+| Parameter | Value |
+|------------------|----------------------------------------|
+| Precision | 16 bits (`kBits = 16`) |
+| State range | `[1 << 16, 1 << 32)` (`uint32_t`) |
+| Renorm I/O width | 16 bits (big-endian) |
+| Chunk size | 1024 bytes |
+| Symbols | 256 (bytes) |
+| Initial state | `1 << 16` (`kInitState`) |
+
+The encoder iterates each chunk in reverse, the decoder forward. Symbol
+counts are mutated on the fly during encode/decode and re-normalized at
+each chunk boundary so the cumulative table sums to `1 << 16`.
+
+The chunk-end state always equals `kInitState`; the decoder rejects the
+stream if it doesn't. That single check catches both bit-level corruption
+and decoder/encoder model divergence (e.g. wrong initial histogram).
+
+The per-chunk initial state must be exactly `1 << kBits`. A higher value
+(e.g. with a "signature" packed into the upper bits) forces a renorm-emit
+at iter 0 that the decoder never consumes — harmless on a single chunk,
+but it corrupts any stream with two or more chunks once the per-chunk
+stats become skewed.
+
+---
+
+## Bitstream Format
+
+Big-endian throughout.
+
+```
+[u32 uncompressed_size] // 4 bytes, header
+per chunk (uncompressed_size > 0):
+ [u32 final_state] // 4 bytes
+ [u16 emitted_words]* // variable, in stream order
+```
+
+Number of emitted words per chunk is implicit — the decoder pulls a word
+whenever its state drops at or below `kMask = (1 << kBits) - 1`.
+
+---
+
+## API
+
+```cpp
+#include "util/ans.h"
+
+// Always built.
+bool ans::Decode(const uint8_t* src, size_t src_size,
+ uint8_t* dst, size_t dst_capacity,
+ size_t* out_size,
+ const uint32_t* initial_counts = nullptr);
+
+uint32_t ans::PeekUncompressedSize(const uint8_t* src, size_t src_size);
+
+// Gated on ANS_ENABLE_ENCODER (tools only).
+bool ans::Encode(const uint8_t* src, size_t size,
+ std::vector<uint8_t>* dst,
+ const uint32_t* initial_counts = nullptr);
+
+void ans::Histogram(const uint8_t* src, size_t size, uint32_t* out_counts);
+```
+
+`initial_counts` is a 256-entry table that seeds the adaptive model. Both
+encoder and decoder must use the same seed — a mismatch trips the chunk-end
+state check immediately. Pass `nullptr` for a uniform default (all-ones).
+
+---
+
+## Asset Pipeline Integration
+
+`AssetRecord` carries two extra fields:
+
+```cpp
+enum class AssetCompression : uint8_t {
+ NONE = 0,
+ ANS_ASCII = 1, // seeded from GetAnsAsciiHistogram()
+};
+
+struct AssetRecord {
+ ...
+ AssetCompression compression;
+ size_t uncompressed_size; // == size if compression == NONE
+};
+```
+
+### Build time (`tools/asset_packer.cc`)
+
+Embedded (non-disk-load) builds only:
+
+1. Scan every `WGSL` asset to build a corpus-wide 256-entry byte histogram.
+2. Emit it as `static const uint32_t kAnsAsciiHistogram[256]` plus a
+ `GetAnsAsciiHistogram()` accessor in `assets_data.cc`.
+3. For each `WGSL` asset, call `TryAnsCompress()`:
+ `ans::Encode(...)` → reject if it's not smaller than the raw input →
+ round-trip verify with `ans::Decode(...)` → only then mark the asset
+ `ANS_ASCII`.
+4. Other asset types (SPEC, TEXTURE, MESH, BINARY, MP3, PROC*) pass
+ through uncompressed.
+
+Disk-load (dev) builds skip the encoder entirely: WGSL data is the file
+path, never the file contents.
+
+### Runtime (`src/util/asset_manager.cc`)
+
+`GetAsset()` checks `compression` on a cache miss:
+
+- `NONE` → return the static pointer (or hit the existing PROC / disk-load
+ branch).
+- `ANS_ASCII` → allocate `uncompressed_size + 1` bytes,
+ `ans::Decode(..., GetAnsAsciiHistogram())`, NUL-terminate, cache.
+
+`DropAsset()` and `ReloadAssetsFromFile()` free the heap-allocated buffer
+when `compression != NONE`, alongside the existing procedural cleanup.
+
+---
+
+## Observed Compression
+
+`workspaces/main`, STRIP_ALL build: WGSL shaders compress to **0.62×–0.71×**
+their raw size (81 of 105 assets qualify). Round-trip verification runs
+at pack time for every compressed asset; failures fall back to uncompressed storage.
+
+---
+
+## Limitations
+
+The encoder returns `false` if it cannot produce a final state above
+`kMask` for some chunk. With the corpus-derived ASCII histogram this never
+trips on the demo's WGSL corpus, but inputs with a near-monolithic byte
+distribution can fail. Such assets fall back to uncompressed storage.
+
+---
+
+## Tests
+
+`src/tests/util/test_ans.cc` (run via `make run_util_tests` or
+`./build/test_ans`):
+
+- Roundtrip variants: empty, single byte, single-symbol run, all-zeros,
+ random uniform, random skewed, repeated ASCII.
+- Seeded-vs-uniform: a corpus-matched histogram compresses at least as
+ well as a uniform seed.
+- Rejection: mismatched seed model, payload bit-flip, truncated stream.
+- `PeekUncompressedSize` returns the header value.
+
+---
+
+## See Also
+
+- `doc/ASSET_SYSTEM.md` — overall asset pipeline.
+- `src/util/ans.h` — public API.
+- `tools/asset_packer.cc` — corpus scan and per-asset compression.
+- `src/util/asset_manager.cc` — runtime decompression.
diff --git a/doc/ASSET_SYSTEM.md b/doc/ASSET_SYSTEM.md
index a97886c..415342d 100644
--- a/doc/ASSET_SYSTEM.md
+++ b/doc/ASSET_SYSTEM.md
@@ -60,6 +60,16 @@ enum class AssetType : uint8_t {
};
```
+## Compression
+
+Each `AssetRecord` carries an `AssetCompression` flag and an
+`uncompressed_size`. In **Release Mode** the asset packer runs an order-0
+rANS coder over every `WGSL` asset, seeded with a histogram derived from
+the full shader corpus. `AssetManager::GetAsset()` decompresses lazily on
+first access and caches the result. Other asset types and **Development
+Mode** are unaffected. See `doc/ANS.md` for the algorithm, bitstream, and
+runtime API.
+
Query at runtime:
```cpp
if (GetAssetType(AssetId::NEVER_MP3) == AssetType::MP3) { ... }
@@ -93,8 +103,8 @@ Tool: `tools/asset_packer.cc`
## Technical Guarantees
- **Alignment**: All embedded data arrays are declared `alignas(16)` for safe `reinterpret_cast`.
-- **String Safety**: Embedded assets are null-terminated (safe as C-strings). In disk-load mode, the path itself is a null-terminated C-string.
-- **Size**: For embedded assets, `size` reflects the original file size (the buffer is `size + 1`). For disk-loaded assets, it reflects the file path's string length.
+- **String Safety**: Embedded assets are null-terminated (safe as C-strings). In disk-load mode, the path itself is a null-terminated C-string. ANS-compressed assets are NUL-terminated by the decompressor on first access.
+- **Size**: For uncompressed embedded assets, `size` is the original file size (the buffer is `size + 1`). For disk-loaded assets, it is the file path's string length. For ANS-compressed assets, `size` is the *compressed* byte count; query `uncompressed_size` for the decoded length.
## Developer Workflow
diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md
index 233373e..bf4c3ba 100644
--- a/doc/COMPLETED.md
+++ b/doc/COMPLETED.md
@@ -34,6 +34,10 @@ Completed task archive. See `doc/archive/` for detailed historical documents.
---
+## May 2026
+
+- [x] **ANS shader compression (2026-05-14)** — Order-0 rANS coder in `src/util/ans.{h,cc}` (decoder always built; encoder gated on `ANS_ENABLE_ENCODER`). `asset_packer` derives a corpus-wide byte histogram from every WGSL file, ANS-encodes each shader with that seed, and round-trip-verifies at pack time. `AssetRecord` gains `compression` + `uncompressed_size`; `asset_manager` decompresses lazily on first `GetAsset()` and frees in `DropAsset`/`ReloadAssetsFromFile`. WGSL assets shrink to ~0.62–0.71× in `workspaces/main` (81/105). See `doc/ANS.md`. Tests: 37/37 dev, 36/36 STRIP_ALL.
+
## March 2026 (continued)
- [x] **FFT twiddle factor fix** — `fft_radix2` computes `wr/wi` directly per k via `cosf/sinf(angle*k)`. Tests A–E added to `test_fft.cc`. Tolerance reverted to 5e-3.