-rw-r--r-- doc/COMPLETED.md | 9
-rw-r--r-- doc/HOWTO.md | 10
2 files changed, 19 insertions, 0 deletions
diff --git a/doc/COMPLETED.md b/doc/COMPLETED.md
index 01c4408..c7b2cae 100644
--- a/doc/COMPLETED.md
+++ b/doc/COMPLETED.md
@@ -455,3 +455,12 @@ Use `read @doc/archive/FILENAME.md` to access archived documents.
- **test_mesh tool**: Implemented a standalone `test_mesh` tool for visualizing OBJ files with debug normal display.
- **Task #39: Visual Debugging System**: Implemented a comprehensive set of wireframe primitives (Sphere, Cone, Cross, Line, Trajectory) in `VisualDebug`. Updated `test_3d_render` to demonstrate usage.
- **Task #68: Mesh Wireframe Rendering**: Added `add_mesh_wireframe` to `VisualDebug` to visualize triangle edges for mesh objects. Integrated into `Renderer3D` debug path and `test_mesh` tool.
+
+#### CNN v2 Training Pipeline Improvements (February 14, 2026) 🎯
+- **Critical Training Fixes**: Resolved checkpoint-saving and argument-handling bugs in the CNN v2 training pipeline. **Bug 1 (Missing Checkpoints)**: Training completed successfully, but no checkpoint was saved when `epochs` was smaller than the `checkpoint_every` interval. Solution: Always save a final checkpoint after training completes, regardless of interval settings. **Bug 2 (Stale Checkpoints)**: Checkpoint files left over from previous runs with different parameters were not overwritten because of an `if not exists` check. Solution: Remove the existence check and always overwrite the final checkpoint. **Bug 3 (Ignored num_layers)**: When comma-separated kernel sizes were provided (e.g., `--kernel-sizes 3,1,3`), `--num-layers` was used only to validate the list; the layer count was not derived from the list length. Solution: Derive `num_layers` from the length of the `kernel_sizes` list when multiple values are provided. **Bug 4 (Argument Passing)**: The shell script passed unquoted variables to Python, which could cause parsing issues with special characters. Solution: Quote all shell variables passed to Python scripts.
+
+- **Output Streamlining**: Reduced verbose training pipeline output. **Export Section**: Added a `--quiet` flag to `export_cnn_v2_weights.py` that prints a single-line summary instead of the detailed layer-by-layer breakdown (e.g., "Exported 3 layers, 912 weights, 1904 bytes → test.bin"). **Validation Section**: Replaced the 10+ lines printed per image (loading, processing, saving) with a compact single-line format listing all images at once (e.g., "Processing images: img_000 img_001 img_002 ✓"). **Result**: Training pipeline output drops from ~100 lines to ~30 lines (roughly 70%) while preserving essential information, which makes rapid iteration more pleasant.
+
+- **Documentation Updates**: Updated `doc/HOWTO.md` CNN v2 training section to document new behavior: always saves final checkpoint, derives num_layers from kernel_sizes list, uses streamlined output with `--quiet` flag. Added examples for both verbose and quiet export modes.
+
+- **Files Modified**: `training/train_cnn_v2.py` (checkpoint saving logic, num_layers derivation), `scripts/train_cnn_v2_full.sh` (variable quoting, validation output, checkpoint validation), `training/export_cnn_v2_weights.py` (--quiet flag support), `doc/HOWTO.md` (documentation). **Impact**: Training pipeline now robust for rapid experimentation with different architectures, no longer requires manual checkpoint management or workarounds for short training runs.
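The checkpoint and `num_layers` fixes described in this hunk can be illustrated with a minimal Python sketch. It is not the code from `training/train_cnn_v2.py`: the function names `resolve_num_layers` and `save_final_checkpoint` are hypothetical, raw bytes stand in for the real checkpoint payload, and repeating a single kernel size for every layer is an assumption about the single-value case.

```python
from pathlib import Path


def resolve_num_layers(kernel_sizes_arg: str, num_layers: int) -> tuple[list[int], int]:
    """Parse a comma-separated kernel-size string and settle the layer count.

    Hypothetical helper: with several sizes, the list length wins over
    the num_layers argument (the Bug 3 fix as described above).
    """
    sizes = [int(s) for s in kernel_sizes_arg.split(",")]
    if len(sizes) > 1:
        # Multiple values: the list length is authoritative.
        return sizes, len(sizes)
    # Single value (assumed behavior): reuse it for every layer.
    return sizes * num_layers, num_layers


def save_final_checkpoint(state: bytes, path: Path) -> Path:
    """Write the final checkpoint unconditionally.

    No interval check (Bug 1) and no existence check (Bug 2): any stale
    file at the same path is overwritten.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(state)  # stand-in for the real serialization call
    return path
```

Calling `resolve_num_layers("3,1,3", 5)` yields three layers even though five were requested, which is the behavior the changelog entry describes.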
diff --git a/doc/HOWTO.md b/doc/HOWTO.md
index c98f6ee..506bf0a 100644
--- a/doc/HOWTO.md
+++ b/doc/HOWTO.md
@@ -166,8 +166,11 @@ Enhanced CNN with parametric static features (7D input: RGBD + UV + sin encoding
**Defaults:** 200 epochs, 3×3 kernels, 8→4→4 channels, batch-size 16, patch-based (8×8, harris detector).
- Live progress with single-line update
+- Always saves final checkpoint (regardless of --checkpoint-every interval)
+- When multiple kernel sizes are provided (e.g., 3,5,3), num_layers is derived from the list length
- Validates all input images on final epoch
- Exports binary weights (storage buffer architecture)
+- Streamlined output: single-line export summary, compact validation
- All parameters configurable via command-line
**Validation Only** (skip training):
@@ -207,12 +210,19 @@ Enhanced CNN with parametric static features (7D input: RGBD + UV + sin encoding
**Export Binary Weights:**
```bash
+# Verbose output (shows all layer details)
./training/export_cnn_v2_weights.py checkpoints/checkpoint_epoch_100.pth \
--output-weights workspaces/main/cnn_v2_weights.bin
+
+# Quiet mode (single-line summary)
+./training/export_cnn_v2_weights.py checkpoints/checkpoint_epoch_100.pth \
+ --output-weights workspaces/main/cnn_v2_weights.bin \
+ --quiet
```
Generates binary format: header + layer info + f16 weights (~3.2 KB for 3-layer model).
Storage buffer architecture allows dynamic layer count.
+Use `--quiet` for streamlined output in scripts (used automatically by train_cnn_v2_full.sh).
**TODO:** 8-bit quantization for 2× size reduction (~1.6 KB). Requires quantization-aware training (QAT).
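How a `--quiet` switch can collapse the export report to one line is sketched below. This is an illustration only, not the contents of `export_cnn_v2_weights.py`: `build_parser` and `format_report` are hypothetical names, the verbose per-layer format is invented, and the per-layer weight counts in the usage note are made-up values chosen to sum to the 912 weights quoted in the changelog example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface mirroring the flags shown in the HOWTO example."""
    parser = argparse.ArgumentParser(description="Export weights (illustrative)")
    parser.add_argument("checkpoint")
    parser.add_argument("--output-weights", required=True)
    parser.add_argument("--quiet", action="store_true",
                        help="print a single-line summary only")
    return parser


def format_report(layer_weight_counts, total_bytes, out_path, quiet):
    """Return the lines to print for an export run."""
    total = sum(layer_weight_counts)
    summary = (f"Exported {len(layer_weight_counts)} layers, "
               f"{total} weights, {total_bytes} bytes → {out_path}")
    if quiet:
        # Quiet mode: the summary is the whole report.
        return [summary]
    # Verbose mode (hypothetical format): one line per layer, then the summary.
    lines = [f"Layer {i}: {n} weights" for i, n in enumerate(layer_weight_counts)]
    lines.append(summary)
    return lines
```

With the illustrative counts `[400, 256, 256]`, `format_report([400, 256, 256], 1904, "test.bin", quiet=True)` returns a single line, while `quiet=False` returns four.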