doc/ANS.md



# ANS Compression

Order-0 rANS entropy coder used to compress shader assets at build time and
decompress them on first access at runtime.

**Source:** `src/util/ans.{h,cc}`.

---

## Algorithm

Per-chunk adaptive order-0 byte coder.

| Parameter        | Value                                  |
|------------------|----------------------------------------|
| Precision        | 16 bits (`kBits = 16`)                 |
| State range      | `[1 << 16, 1 << 32)` (`uint32_t`)      |
| Renorm I/O width | 16 bits (big-endian)                   |
| Chunk size       | 1024 bytes                             |
| Symbols          | 256 (bytes)                            |
| Initial state    | `1 << 16` (`kInitState`)               |

The encoder iterates each chunk in reverse, the decoder forward. Symbol
counts are mutated on the fly during encode/decode and re-normalized at
each chunk boundary so the cumulative table sums to `1 << 16`.
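The re-normalization step can be sketched as follows. This is a minimal illustration, not the code in `src/util/ans.cc`: the helper name `NormalizeCounts`, the "give the rounding error to the largest entry" policy, and the flat fallback for an empty table are all assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>

constexpr uint32_t kBits = 16;
constexpr uint32_t kTotal = 1u << kBits;  // target sum of the frequency table

// Hypothetical helper: scales raw counts so they sum to exactly kTotal while
// keeping every symbol with a nonzero count at frequency >= 1.
inline std::array<uint32_t, 256> NormalizeCounts(
    const std::array<uint32_t, 256>& counts) {
  std::array<uint32_t, 256> freq{};
  const uint64_t total =
      std::accumulate(counts.begin(), counts.end(), uint64_t{0});
  if (total == 0) {
    freq.fill(kTotal / 256);  // assumption: no data means a flat table
    return freq;
  }
  int64_t sum = 0;
  for (size_t s = 0; s < 256; ++s) {
    if (counts[s] == 0) continue;
    const uint64_t scaled = (static_cast<uint64_t>(counts[s]) * kTotal) / total;
    freq[s] = static_cast<uint32_t>(std::max<uint64_t>(scaled, 1));
    sum += freq[s];
  }
  // Flooring and the ">= 1" bump leave the sum within 255 of kTotal. The
  // largest entry is always >= 256, so it can absorb the difference.
  auto largest = std::max_element(freq.begin(), freq.end());
  *largest = static_cast<uint32_t>(static_cast<int64_t>(*largest) +
                                   (static_cast<int64_t>(kTotal) - sum));
  return freq;
}
```

With an all-ones table every symbol ends up at frequency 256, which matches the uniform default described for `initial_counts`.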

On a valid stream, the decoder's state at the end of every chunk equals
`kInitState`; the decoder rejects the stream if it doesn't. That single check
catches both bit-level corruption and encoder/decoder model divergence
(e.g. a wrong initial histogram).

The per-chunk initial state must be exactly `1 << kBits`. A higher value
(for example, one with a "signature" packed into the upper bits) forces a
renormalization emit at iteration 0 that the decoder never consumes. That is
harmless for a single chunk, but it corrupts any stream with two or more
chunks once the per-chunk statistics become skewed.
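To make the reverse-encode, forward-decode flow and the chunk-end check concrete, here is a sketch of a single-chunk rANS coder using the parameters from the table above. It deliberately uses a static frequency table rather than the adaptive model, and `Model`, `Chunk`, `EncodeChunk`, and `DecodeChunk` are hypothetical names, so treat it as an illustration of the technique rather than the project's implementation. It assumes the frequencies sum to `1 << kBits` and that every encoded symbol has a nonzero frequency.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kBits = 16;
constexpr uint32_t kInitState = 1u << kBits;
constexpr uint32_t kMask = kInitState - 1;

// Static model (assumption): sum(freq) == 1 << kBits.
struct Model {
  std::array<uint32_t, 256> freq{};
  std::array<uint32_t, 257> cum{};  // cum[s] = freq[0] + ... + freq[s - 1]
};

inline Model MakeModel(const std::array<uint32_t, 256>& freq) {
  Model m;
  m.freq = freq;
  for (size_t s = 0; s < 256; ++s) m.cum[s + 1] = m.cum[s] + freq[s];
  return m;
}

struct Chunk {
  uint32_t final_state;
  std::vector<uint16_t> words;  // in the order the decoder consumes them
};

inline Chunk EncodeChunk(const Model& m, const std::vector<uint8_t>& src) {
  uint32_t x = kInitState;
  std::vector<uint16_t> emitted;
  for (size_t i = src.size(); i-- > 0;) {  // last byte first
    const uint32_t f = m.freq[src[i]];
    const uint32_t c = m.cum[src[i]];
    // Renormalize so the state update below stays under 1 << 32.
    if (x >= (static_cast<uint64_t>(f) << 16)) {
      emitted.push_back(static_cast<uint16_t>(x & 0xFFFF));
      x >>= 16;
    }
    x = ((x / f) << kBits) + (x % f) + c;
  }
  // The decoder reads words in the reverse of emission order.
  return Chunk{x, std::vector<uint16_t>(emitted.rbegin(), emitted.rend())};
}

inline bool DecodeChunk(const Model& m, const Chunk& in, size_t n,
                        std::vector<uint8_t>* out) {
  uint32_t x = in.final_state;
  size_t next = 0;
  out->clear();
  for (size_t i = 0; i < n; ++i) {
    const uint32_t slot = x & kMask;
    uint32_t s = 0;
    while (m.cum[s + 1] <= slot) ++s;  // linear search, fine for a sketch
    out->push_back(static_cast<uint8_t>(s));
    x = m.freq[s] * (x >> kBits) + slot - m.cum[s];
    if (x <= kMask) {  // state dropped below 1 << kBits: pull one word
      if (next == in.words.size()) return false;  // truncated stream
      x = (x << 16) | in.words[next++];
    }
  }
  // Chunk-end check: the state must be back at the encoder's start value.
  return x == kInitState && next == in.words.size();
}
```

Because decoding exactly inverts each encode step, the decoder's last state is the encoder's first state, which is why comparing against `kInitState` works as an integrity check.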

---

## Bitstream Format

Big-endian throughout.

```
[u32  uncompressed_size]            // 4 bytes, header
per chunk (uncompressed_size > 0):
  [u32  final_state]                // 4 bytes
  [u16  emitted_words]*             // variable, in stream order
```

The number of emitted words per chunk is implicit: the decoder pulls a 16-bit
word whenever its state drops to `kMask = (1 << kBits) - 1` or below.
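Reading the header of the layout above might look like the sketch below. `ReadU32BE`, `ReadU16BE`, `StreamHeader`, and `ParseHeader` are hypothetical names, and the chunk count assumes fixed 1024-byte chunks with a possibly shorter final chunk, which the format description implies but does not state outright.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr uint64_t kChunkSize = 1024;

// Big-endian readers: most significant byte first.
inline uint32_t ReadU32BE(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}

inline uint16_t ReadU16BE(const uint8_t* p) {
  return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

struct StreamHeader {
  uint32_t uncompressed_size;
  uint64_t chunk_count;
};

// Returns nullopt when the buffer cannot hold the header plus the 4-byte
// final_state that every chunk must carry.
inline std::optional<StreamHeader> ParseHeader(const uint8_t* src,
                                               size_t src_size) {
  if (src == nullptr || src_size < 4) return std::nullopt;
  const uint32_t size = ReadU32BE(src);
  const uint64_t chunks = (size + kChunkSize - 1) / kChunkSize;
  if (src_size - 4 < chunks * 4) return std::nullopt;
  return StreamHeader{size, chunks};
}
```

A cheap bounds check like this lets a caller reject obviously truncated input before allocating the output buffer.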

---

## API

```cpp
#include "util/ans.h"

// Always built.
bool ans::Decode(const uint8_t* src, size_t src_size,
                 uint8_t* dst, size_t dst_capacity,
                 size_t* out_size,
                 const uint32_t* initial_counts = nullptr);

uint32_t ans::PeekUncompressedSize(const uint8_t* src, size_t src_size);

// Gated on ANS_ENABLE_ENCODER (tools only).
bool ans::Encode(const uint8_t* src, size_t size,
                 std::vector<uint8_t>* dst,
                 const uint32_t* initial_counts = nullptr);

void ans::Histogram(const uint8_t* src, size_t size, uint32_t* out_counts);
```

`initial_counts` is a 256-entry table that seeds the adaptive model. Both
encoder and decoder must use the same seed — a mismatch trips the chunk-end
state check immediately. Pass `nullptr` for a uniform default (all-ones).

---

## Asset Pipeline Integration

`AssetRecord` carries two extra fields:

```cpp
enum class AssetCompression : uint8_t {
  NONE = 0,
  ANS_ASCII = 1,  // seeded from GetAnsAsciiHistogram()
};

struct AssetRecord {
  ...
  AssetCompression compression;
  size_t uncompressed_size;  // == size if compression == NONE
};
```

### Build time (`tools/asset_packer.cc`)

Embedded (non-disk-load) builds only:

1. Scan every `WGSL` asset to build a corpus-wide 256-entry byte histogram.
2. Emit it as `static const uint32_t kAnsAsciiHistogram[256]` plus a
   `GetAnsAsciiHistogram()` accessor in `assets_data.cc`.
3. For each `WGSL` asset, call `TryAnsCompress()`:
   `ans::Encode(...)` → reject if it's not smaller than the raw input →
   round-trip verify with `ans::Decode(...)` → only then mark the asset
   `ANS_ASCII`.
4. Other asset types (SPEC, TEXTURE, MESH, BINARY, MP3, PROC*) pass
   through uncompressed.
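The corpus scan and the accept/reject logic in the steps above can be sketched as follows. So that the snippet compiles on its own, the codec is passed in as `std::function` stand-ins for `ans::Encode` and `ans::Decode`; those aliases, `BuildCorpusHistogram`, `PackResult`, and this `TryCompress` signature are assumptions, not the contents of `tools/asset_packer.cc`.

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<uint8_t>;
using Histogram = std::array<uint32_t, 256>;

// Stand-ins for the real codec entry points (hypothetical signatures).
using EncodeFn =
    std::function<bool(const Bytes& raw, const Histogram& seed, Bytes* packed)>;
using DecodeFn =
    std::function<bool(const Bytes& packed, const Histogram& seed, Bytes* raw)>;

// Step 1: one byte histogram over every asset in the corpus.
inline Histogram BuildCorpusHistogram(const std::vector<Bytes>& assets) {
  Histogram h{};
  for (const Bytes& asset : assets) {
    for (uint8_t b : asset) ++h[b];
  }
  return h;
}

enum class PackResult { kStoreRaw, kCompressed, kRoundTripFailed };

// Step 3: encode, reject if not smaller, then verify the round trip.
inline PackResult TryCompress(const Bytes& raw, const Histogram& seed,
                              const EncodeFn& encode, const DecodeFn& decode,
                              Bytes* packed) {
  Bytes candidate;
  if (!encode(raw, seed, &candidate)) return PackResult::kStoreRaw;
  if (candidate.size() >= raw.size()) return PackResult::kStoreRaw;
  Bytes check;
  if (!decode(candidate, seed, &check) || check != raw) {
    return PackResult::kRoundTripFailed;  // caller decides how to fail
  }
  *packed = candidate;
  return PackResult::kCompressed;
}
```

Keeping the round-trip failure as a distinct result lets the caller treat it as a hard error instead of silently storing the asset raw.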

Disk-load (dev) builds skip the encoder entirely: in those builds a WGSL
asset's data is the file path, never the file contents.

### Runtime (`src/util/asset_manager.cc`)

`GetAsset()` checks `compression` on a cache miss:

- `NONE` → return the static pointer (or hit the existing PROC / disk-load
  branch).
- `ANS_ASCII` → allocate `uncompressed_size + 1` bytes,
  `ans::Decode(..., GetAnsAsciiHistogram())`, NUL-terminate, cache.

`DropAsset()` and `ReloadAssetsFromFile()` free the heap-allocated buffer
when `compression != NONE`, alongside the existing procedural cleanup.
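The `ANS_ASCII` branch described above (allocate `uncompressed_size + 1`, decode, NUL-terminate) might look like this sketch. The decoder is passed as a function pointer with the same parameter list as `ans::Decode` so the snippet stands alone; `DecodeNulTerminated` is a hypothetical helper, and the use of `std::unique_ptr` is an assumption about ownership, not a description of `asset_manager.cc`.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Same parameter list as ans::Decode; in real code pass &ans::Decode.
using DecodeFn = bool (*)(const uint8_t* src, size_t src_size, uint8_t* dst,
                          size_t dst_capacity, size_t* out_size,
                          const uint32_t* initial_counts);

// Decodes into a fresh buffer with one extra byte for the terminator.
// Returns nullptr if decoding fails or yields an unexpected size.
inline std::unique_ptr<uint8_t[]> DecodeNulTerminated(
    DecodeFn decode, const uint8_t* src, size_t src_size,
    size_t uncompressed_size, const uint32_t* seed) {
  auto buf = std::make_unique<uint8_t[]>(uncompressed_size + 1);
  size_t written = 0;
  if (!decode(src, src_size, buf.get(), uncompressed_size, &written, seed) ||
      written != uncompressed_size) {
    return nullptr;
  }
  buf[uncompressed_size] = 0;  // lets callers treat the shader as a C string
  return buf;
}
```

Checking `written` against the record's `uncompressed_size` guards against a stream whose header disagrees with the asset table.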

---

## Observed Compression

`workspaces/main`, STRIP_ALL build: WGSL shaders compress to **0.62×–0.71×**
their raw size (81 of 105 assets qualify). Round-trip verification runs
at pack time for every compressed asset; failures abort the build.

---

## Limitations

The encoder returns `false` if it cannot produce a final state above
`kMask` for some chunk. With the corpus-derived ASCII histogram this never
trips on the demo's WGSL corpus, but inputs whose byte distribution is
dominated almost entirely by a single value can fail. Such assets fall back
to uncompressed storage.

---

## Tests

`src/tests/util/test_ans.cc` (run via `make run_util_tests` or
`./build/test_ans`):

- Roundtrip variants: empty, single byte, single-symbol run, all-zeros,
  random uniform, random skewed, repeated ASCII.
- Seeded-vs-uniform: a corpus-matched histogram compresses at least as
  well as a uniform seed.
- Rejection: mismatched seed model, payload bit-flip, truncated stream.
- `PeekUncompressedSize` returns the header value.

---

## See Also

- `doc/ASSET_SYSTEM.md` — overall asset pipeline.
- `src/util/ans.h` — public API.
- `tools/asset_packer.cc` — corpus scan and per-asset compression.
- `src/util/asset_manager.cc` — runtime decompression.