path: root/training/export_cnn_v2_weights.py
author	skal <pascal.massimino@gmail.com>	2026-02-12 12:11:53 +0100
committer	skal <pascal.massimino@gmail.com>	2026-02-12 12:11:53 +0100
commit	eaf0bd855306e70ca03f2d6579b4d6551aff6482 (patch)
tree	62316af1143db1e59e1ad62e70b9844e324cda55 /training/export_cnn_v2_weights.py
parent	e8344bc84ec0f571e5c5aafffe7c914abe226bd6 (diff)
TODO: 8-bit weight quantization for 2× size reduction
- Add QAT (quantization-aware training) notes
- Requires training with fake quantization
- Target: ~1.6 KB weights (vs 3.2 KB f16)
- Shader unpacking needs adaptation (4× u8 per u32)
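The quantization scheme sketched in the commit message can be illustrated as follows. This is a hypothetical sketch, not code from the repo: it assumes symmetric per-tensor 8-bit quantization with a single float scale, and little-endian byte order for the 4× u8 per u32 packing the shader would unpack. The helper names `quantize_weights_u8` and `pack_u8_quads_u32` are invented for illustration.

```python
import numpy as np

def quantize_weights_u8(weights):
    # Hypothetical helper: symmetric per-tensor 8-bit quantization.
    # QAT would run this fake-quant step in the training loop so the
    # network adapts to the rounding error before export.
    w = np.asarray(weights, dtype=np.float32)
    scale = max(float(np.abs(w).max()), 1e-8) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    # Offset into u8 range; the shader subtracts 128 and multiplies by scale.
    return (q.astype(np.int16) + 128).astype(np.uint8), scale

def pack_u8_quads_u32(q_u8):
    # Pack 4 bytes per u32 word, little-endian, matching the "4× u8 per u32"
    # unpacking the commit message says the shader would need.
    q = np.ascontiguousarray(q_u8, dtype=np.uint8)
    if q.size % 4:
        q = np.pad(q, (0, (-q.size) % 4))  # zero-pad to a whole word
    return q.view('<u4')
```

Dequantizing `(u8 - 128) * scale` on the shader side recovers the weights to within one quantization step, which is what halves the payload relative to f16.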
Diffstat (limited to 'training/export_cnn_v2_weights.py')
-rwxr-xr-x	training/export_cnn_v2_weights.py	2 ++
1 file changed, 2 insertions(+), 0 deletions(-)
diff --git a/training/export_cnn_v2_weights.py b/training/export_cnn_v2_weights.py
index e3d1724..723f572 100755
--- a/training/export_cnn_v2_weights.py
+++ b/training/export_cnn_v2_weights.py
@@ -94,6 +94,8 @@ def export_weights_binary(checkpoint_path, output_path):
     weight_offset += len(layer2_flat)

     # Convert to f16
+    # TODO: Use 8-bit quantization for 2× size reduction
+    # Requires quantization-aware training (QAT) to maintain accuracy
     all_weights_f16 = np.array(all_weights, dtype=np.float16)

     # Pack f16 pairs into u32 for storage buffer
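The "pack f16 pairs into u32" step that follows the changed lines can be sketched like this. A minimal sketch, assuming little-endian layout with the first f16 of each pair in the low 16 bits; the helper name `pack_f16_pairs_u32` is an assumption, not the function in the script.

```python
import numpy as np

def pack_f16_pairs_u32(weights_f16):
    # Pack consecutive f16 values two per u32 word so they can live in a
    # storage buffer; first value of each pair goes in the low 16 bits.
    w = np.asarray(weights_f16, dtype=np.float16).ravel()
    if w.size % 2:
        w = np.append(w, np.float16(0.0))  # zero-pad odd counts
    u16 = w.view(np.uint16).astype(np.uint32)  # raw f16 bit patterns
    return u16[0::2] | (u16[1::2] << 16)
```

On the GPU side, a WGSL shader can recover both values from one word with `unpack2x16float`, which is what makes this layout convenient for storage buffers.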