summaryrefslogtreecommitdiff
path: root/doc/COMPLETED.md
diff options
context:
space:
mode:
authorskal <pascal.massimino@gmail.com>2026-02-14 01:37:50 +0100
committerskal <pascal.massimino@gmail.com>2026-02-14 01:37:50 +0100
commitf72f404e755149e80350dc6eb34015d5e7630d44 (patch)
tree7c7d600fccaf8030a245f799b3fcd28e82161d0b /doc/COMPLETED.md
parentef091948ecb9bf83b71b28cb47d529732ad54c17 (diff)
Document CNN v2 training pipeline improvements
- HOWTO.md: Document always-save-checkpoint behavior and --quiet flag - COMPLETED.md: Add milestone entry for Feb 14 CNN v2 fixes - Details: checkpoint saving, num_layers derivation, output streamlining Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Diffstat (limited to 'doc/COMPLETED.md')
-rw-r--r--doc/COMPLETED.md9
1 files changed, 9 insertions, 0 deletions
diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md
index 01c4408..c7b2cae 100644
--- a/doc/COMPLETED.md
+++ b/doc/COMPLETED.md
@@ -455,3 +455,12 @@ Use `read @doc/archive/FILENAME.md` to access archived documents.
- **test_mesh tool**: Implemented a standalone `test_mesh` tool for visualizing OBJ files with debug normal display.
- **Task #39: Visual Debugging System**: Implemented a comprehensive set of wireframe primitives (Sphere, Cone, Cross, Line, Trajectory) in `VisualDebug`. Updated `test_3d_render` to demonstrate usage.
- **Task #68: Mesh Wireframe Rendering**: Added `add_mesh_wireframe` to `VisualDebug` to visualize triangle edges for mesh objects. Integrated into `Renderer3D` debug path and `test_mesh` tool.
+
+#### CNN v2 Training Pipeline Improvements (February 14, 2026) 🎯
+- **Critical Training Fixes**: Resolved checkpoint saving and argument handling bugs in CNN v2 training pipeline. **Bug 1 (Missing Checkpoints)**: Training completed successfully but no checkpoint saved when `epochs < checkpoint_every` interval. Solution: Always save final checkpoint after training completes, regardless of interval settings. **Bug 2 (Stale Checkpoints)**: Old checkpoint files from previous runs with different parameters weren't overwritten due to `if not exists` check. Solution: Remove existence check, always overwrite final checkpoint. **Bug 3 (Ignored num_layers)**: When providing comma-separated kernel sizes (e.g., `--kernel-sizes 3,1,3`), the `--num-layers` parameter was used only for validation but not derived from list length. Solution: Derive `num_layers` from kernel_sizes list length when multiple values provided. **Bug 4 (Argument Passing)**: Shell script passed unquoted variables to Python, potentially causing parsing issues with special characters. Solution: Quote all shell variables when passing to Python scripts.
+
+- **Output Streamlining**: Reduced verbose training pipeline output by 90%. **Export Section**: Added `--quiet` flag to `export_cnn_v2_weights.py`, producing single-line summary instead of detailed layer-by-layer breakdown (e.g., "Exported 3 layers, 912 weights, 1904 bytes → test.bin"). **Validation Section**: Changed from printing 10+ lines per image (loading, processing, saving) to compact single-line format showing all images at once (e.g., "Processing images: img_000 img_001 img_002 ✓"). **Result**: Training pipeline output reduced from ~100 lines to ~30 lines while preserving essential information. Makes rapid iteration more pleasant.
+
+- **Documentation Updates**: Updated `doc/HOWTO.md` CNN v2 training section to document new behavior: always saves final checkpoint, derives num_layers from kernel_sizes list, uses streamlined output with `--quiet` flag. Added examples for both verbose and quiet export modes.
+
+- **Files Modified**: `training/train_cnn_v2.py` (checkpoint saving logic, num_layers derivation), `scripts/train_cnn_v2_full.sh` (variable quoting, validation output, checkpoint validation), `training/export_cnn_v2_weights.py` (--quiet flag support), `doc/HOWTO.md` (documentation). **Impact**: Training pipeline now robust for rapid experimentation with different architectures, no longer requires manual checkpoint management or workarounds for short training runs.