Diffstat (limited to 'cnn_v2/docs/CNN_V2_WEB_TOOL.md')
| -rw-r--r-- | cnn_v2/docs/CNN_V2_WEB_TOOL.md | 348 |
1 files changed, 348 insertions, 0 deletions
diff --git a/cnn_v2/docs/CNN_V2_WEB_TOOL.md b/cnn_v2/docs/CNN_V2_WEB_TOOL.md
new file mode 100644
index 0000000..b6f5b0b
--- /dev/null
+++ b/cnn_v2/docs/CNN_V2_WEB_TOOL.md
@@ -0,0 +1,348 @@
+# CNN v2 Web Testing Tool
+
+Browser-based WebGPU tool for validating CNN v2 inference with layer visualization and weight inspection.
+
+**Location:** `tools/cnn_v2_test/index.html`
+
+---
+
+## Status (2026-02-13)
+
+**Working:**
+- ✅ WebGPU initialization and device setup
+- ✅ Binary weight file parsing (v1 and v2 formats)
+- ✅ Automatic mip-level detection from binary format v2
+- ✅ Weight statistics (min/max per layer)
+- ✅ UI layout with collapsible panels
+- ✅ Mode switching (Activations/Weights tabs)
+- ✅ Canvas context management (2D for weights, WebGPU for activations)
+- ✅ Weight visualization infrastructure (layer selection, grid layout)
+- ✅ Layer naming matches codebase convention (Layer 0, Layer 1, Layer 2)
+- ✅ Static features split visualization (Static 0-3, Static 4-7)
+- ✅ All layers visible including output layer (Layer 2)
+- ✅ Video playback support (MP4, WebM) with frame-by-frame controls
+- ✅ Video looping (automatic continuous playback)
+- ✅ Mip level selection (p0-p3 features at different resolutions)
+
+**Recent Changes (Latest):**
+- Binary format v2 support: Reads mip_level from 20-byte header
+- Backward compatible: v1 (16-byte header) → mip_level=0
+- Auto-update UI dropdown when loading weights with mip_level
+- Display mip_level in metadata panel
+- Code refactoring: Extracted FULLSCREEN_QUAD_VS shader (reused 3× across pipelines)
+- Added helper methods: `getDimensions()`, `setVideoControlsEnabled()`
+- Improved code organization with section headers and comments
+- Moved Mip Level selector to bottom of left sidebar (removed "Features (p0-p3)" label)
+- Added `loop` attribute to video element for automatic continuous playback
+
+**Previous Fixes:**
+- Fixed Layer 2 not appearing (was excluded from layerOutputs due to isOutput check)
+- Fixed canvas context switching (force clear before recreation)
+- Added Static 0-3 / Static 4-7 buttons to view all 8 static feature channels
+- Aligned naming with train_cnn_v2.py/.wgsl: Layer 0, Layer 1, Layer 2 (not Layer 1, 2, 3)
+- Disabled Static buttons in weights mode (no learnable weights)
+
+**Known Issues:**
+- Layer activation visualization may show black if texture data not properly unpacked
+- Weight kernel display depends on correct 2D context creation after canvas recreation
+
+---
+
+## Architecture
+
+### File Structure
+- Single-file HTML tool (~1100 lines)
+- Embedded shaders: STATIC_SHADER, CNN_SHADER, DISPLAY_SHADER, LAYER_VIZ_SHADER
+- Shared WGSL component: FULLSCREEN_QUAD_VS (reused across render pipelines)
+- **Embedded default weights:** DEFAULT_WEIGHTS_B64 (base64-encoded binary v2)
+  - Current: 4 layers (3×3, 5×5, 3×3, 3×3), 2496 f16 weights, mip_level=2
+  - Source: `workspaces/main/weights/cnn_v2_weights.bin`
+  - Updates: Re-encode binary with `base64 -i <file>` and update constant
+- Pure WebGPU (no external dependencies)
+
+### Code Organization
+
+**Recent Refactoring (2026-02-13):**
+- Extracted `FULLSCREEN_QUAD_VS` constant: Reused fullscreen quad vertex shader (2 triangles covering NDC)
+- Added helper methods to CNNTester class:
+  - `getDimensions()`: Returns current source dimensions (video or image)
+  - `setVideoControlsEnabled(enabled)`: Centralized video control enable/disable
+- Consolidated duplicate vertex shader code (used in mipmap generation, display, layer visualization)
+- Added section headers in JavaScript for better navigation
+- Improved inline comments explaining shader architecture
+
+**Benefits:**
+- Reduced code duplication (~40 lines saved)
+- Easier maintenance (single source of truth for fullscreen quad)
+- Clearer separation of concerns
+
+### Key Components
+
+**1. Weight Parsing**
+- Reads binary format v2: header (20B) + layer info (20B×N) + f16 weights
+- Backward compatible with v1: header (16B), mip_level defaults to 0
+- Computes min/max per layer via f16 unpacking
+- Stores `{ layers[], weights[], mipLevel, fileSize }`
+- Auto-sets UI mip-level dropdown from loaded weights
+
+**2. CNN Pipeline**
+- Static features computation (RGBD + UV + sin + bias → 7D packed)
+- Layer-by-layer convolution with storage buffer weights
+- Ping-pong buffers for intermediate results
+- Copy to persistent textures for visualization
+
+**3. Visualization Modes**
+
+**Activations Mode:**
+- 4 grayscale views per layer (channels 0-3 of up to 8 total)
+- WebGPU compute → unpack f16 → scale → grayscale
+- Auto-scale: Static features = 1.0, CNN layers = 0.2
+- Static features: Shows R,G,B,D (first 4 of 8: RGBD+UV+sin+bias)
+- CNN layers: Shows first 4 output channels
+
+**Weights Mode:**
+- 2D canvas rendering per output channel
+- Shows all input kernels horizontally
+- Normalized by layer min/max → [0, 1] → grayscale
+- 20px cells, 2px padding between kernels
+
+### Texture Management
+
+**Persistent Storage (layerTextures[]):**
+- One texture per layer output (static + all CNN layers)
+- `rgba32uint` format (packed f16 data)
+- `COPY_DST` usage for storing results
+
+**Compute Buffers (computeTextures[]):**
+- 2 textures for ping-pong computation
+- Reused across all layers
+- `COPY_SRC` usage for copying to persistent storage
+
+**Pipeline:**
+```
+Static pass → copy to layerTextures[0]
+For each CNN layer i:
+  Compute (ping-pong) → copy to layerTextures[i+1]
+```
+
+### Layer Indexing
+
+**UI Layer Buttons:**
+- "Static" → layerOutputs[0] (7D input features)
+- "Layer 1" → layerOutputs[1] (CNN layer 1 output, uses weights.layers[0])
+- "Layer 2" → layerOutputs[2] (CNN layer 2 output, uses weights.layers[1])
+- "Layer N" → layerOutputs[N] (CNN layer N output, uses weights.layers[N-1])
+
+**Weights Table:**
+- "Layer 1" → weights.layers[0] (first CNN layer weights)
+- "Layer 2" → weights.layers[1] (second CNN layer weights)
+- "Layer N" → weights.layers[N-1]
+
+**Consistency:** Both UI and weights table use same numbering (1, 2, 3...) for CNN layers.
+
+---
+
+## Known Issues
+
+### Issue #1: Layer Activations Show Black
+
+**Symptom:**
+- All 4 channel canvases render black
+- UV gradient test (debug mode 10) works
+- Raw packed data test (mode 11) shows black
+- Unpacked f16 test (mode 12) shows black
+
+**Diagnosis:**
+- Texture access works (UV gradient visible)
+- Texture data is all zeros (packed.x = 0)
+- Textures being read are empty
+
+**Root Cause:**
+- `copyTextureToTexture` operations may not be executing
+- Possible ordering issue (copies not submitted before visualization)
+- Alternative: textures created with wrong usage flags
+
+**Investigation Steps Taken:**
+1. Added `onSubmittedWorkDone()` wait before visualization
+2. Verified texture creation with `COPY_SRC` and `COPY_DST` flags
+3. Confirmed separate texture allocation per layer (no aliasing)
+4. Added debug shader modes to isolate issue
+
+**Next Steps:**
+- Verify encoder contains copy commands (add debug logging)
+- Check if compute passes actually write data (add known-value test)
+- Test copyTextureToTexture in isolation
+- Consider CPU readback to verify texture contents
+
+### Issue #2: Weight Visualization Empty
+
+**Symptom:**
+- Canvases created with correct dimensions (logged)
+- No visual output (black canvases)
+- Console logs show method execution
+
+**Potential Causes:**
+1. Weight indexing calculation incorrect
+2. Canvas not properly attached to DOM when rendering
+3. 2D context operations not flushing
+4. Min/max normalization producing black (all values equal?)
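Potential cause 4 can be checked in isolation. The sketch below is a hypothetical helper (`normalizeToGray` is not a name taken from the tool) that maps weights to 8-bit grayscale using the layer min/max, assuming the pixels end up in a `Uint8ClampedArray` such as `ImageData.data`. When min equals max, the naive `(w - min) / (max - min)` evaluates to `0 / 0`, which is `NaN`, and a clamped byte array stores `NaN` as 0, which would render black. The guard falls back to mid-gray instead.

```javascript
// Hypothetical helper, not the tool's actual code: normalize weights by the
// layer's min/max to [0, 1], then scale to 8-bit grayscale.
function normalizeToGray(weights, min, max) {
  const range = max - min;
  const out = new Uint8ClampedArray(weights.length);
  for (let i = 0; i < weights.length; i++) {
    // Degenerate range (all weights equal): use mid-gray rather than 0/0 = NaN,
    // which a Uint8ClampedArray would silently store as 0 (black).
    const t = range > 0 ? (weights[i] - min) / range : 0.5;
    out[i] = Math.round(t * 255);
  }
  return out;
}
```

Running this on a few logged weight values alongside the logged min/max would show whether normalization alone can explain the black canvases.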
+
+**Debug Added:**
+- Comprehensive logging of dimensions, indices, ranges
+- Canvas context check before rendering
+
+**Next Steps:**
+- Add test rendering (fixed gradient) to verify 2D context works
+- Log sample weight values to verify data access
+- Check if canvas is visible in DOM inspector
+- Verify min/max calculation produces valid range
+
+---
+
+## UI Layout
+
+### Header
+- Controls: Blend slider, Depth input, View mode display
+- Drop zone for .bin weight files
+
+### Content Area
+
+**Left Sidebar (300px):**
+1. Drop zone for .bin weight files
+2. Weights Info panel (file size, layer table with min/max)
+3. Weights Visualization panel (per-layer kernel display)
+4. **Mip Level selector** (bottom) - Select p0/p1/p2 for static features
+
+**Main Canvas (center):**
+- CNN output display with video controls (Play/Pause, Frame ◄/►)
+- Supports both PNG images and video files (MP4, WebM)
+- Video loops automatically for continuous playback
+
+**Right Sidebar (panels):**
+1. **Layer Visualization Panel** (top, flex: 1)
+   - Layer selection buttons (Static 0-3, Static 4-7, Layer 0, Layer 1, ...)
+   - 2×2 grid of channel views (grayscale activations)
+   - 4× zoom view at bottom
+
+### Footer
+- Status line (GPU timing, dimensions, mode)
+- Console log (scrollable, color-coded)
+
+---
+
+## Shader Details
+
+### LAYER_VIZ_SHADER
+
+**Purpose:** Display single channel from packed layer texture
+
+**Inputs:**
+- `@binding(0) layer_tex: texture_2d<u32>` - Packed f16 layer data
+- `@binding(1) viz_params: vec2<f32>` - (channel_idx, scale)
+
+**Debug Modes:**
+- Channel 10: UV gradient (texture coordinate test)
+- Channel 11: Raw packed u32 data
+- Channel 12: First unpacked f16 value
+
+**Normal Operation:**
+- Unpack all 8 f16 channels from rgba32uint
+- Select channel by index (0-7)
+- Apply scale factor (1.0 for static, 0.2 for CNN)
+- Clamp to [0, 1] and output grayscale
+
+**Scale Rationale:**
+- Static features (RGBD, UV): already in [0, 1] range
+- CNN activations: post-ReLU [0, ~5], need scaling for visibility
+
+---
+
+## Binary Weight Format
+
+See `doc/CNN_V2_BINARY_FORMAT.md` for complete specification.
+
+**Quick Summary:**
+- Header: 20 bytes in v2 (magic, version, layer count, total weights, mip_level); 16 bytes in v1 (no mip_level)
+- Layer info: 20 bytes × N (kernel size, channels, offsets)
+- Weights: Packed f16 pairs as u32
+
+---
+
+## Testing Workflow
+
+### Load & Parse
+1. Drop PNG image → displays original
+2. Drop .bin weights → parses and shows info table
+3. Auto-runs CNN pipeline
+
+### Verify Pipeline
+1. Check console for "Running CNN pipeline"
+2. Verify "Completed in Xms"
+3. Check "Layer visualization ready: N layers"
+
+### Debug Activations
+1. Select "Activations" tab
+2. Click layer buttons to switch
+3. Check console for texture/canvas logs
+4. If black: note which debug modes work (UV vs data)
+
+### Debug Weights
+1. Select "Weights" tab
+2. Click Layer 1 or Layer 2 (Layer 0 has no weights)
+3. Check console for "Visualizing Layer N weights"
+4. Check canvas dimensions logged
+5. Verify weight range is non-trivial (not [0, 0])
+
+---
+
+## Integration with Main Project
+
+**Training Pipeline:**
+```bash
+# Generate weights
+./training/train_cnn_v2.py --export-binary
+
+# Test in browser
+open tools/cnn_v2_test/index.html
+# Drop: workspaces/main/cnn_v2_weights.bin
+# Drop: training/input/test.png
+```
+
+**Validation:**
+- Compare against demo CNNv2Effect (visual check)
+- Verify layer count matches binary file
+- Check weight ranges match training logs
+
+---
+
+## Future Enhancements
+
+- [ ] Fix layer activation visualization (black texture issue)
+- [ ] Fix weight kernel display (empty canvas issue)
+- [ ] Add per-channel auto-scaling (compute min/max from visible data)
+- [ ] Export rendered outputs (download PNG)
+- [ ] Side-by-side comparison with original
+- [ ] Heatmap mode (color-coded activations)
+- [ ] Weight statistics overlay (mean, std, sparsity)
+- [ ] Batch processing (multiple images in sequence)
+- [ ] Integration with Python training (live reload)
+
+---
+
+## Code Metrics
+
+- Total lines: ~1100
+- JavaScript: ~700 lines
+- WGSL shaders: ~300 lines
+- HTML/CSS: ~100 lines
+
+**Dependencies:** None (pure WebGPU + HTML5)
+
+---
+
+## Related Files
+
+- `doc/CNN_V2.md` - CNN v2 architecture and design
+- `doc/CNN_TEST_TOOL.md` - C++ offline testing tool (deprecated)
+- `training/train_cnn_v2.py` - Training script with binary export
+- `workspaces/main/cnn_v2_weights.bin` - Trained weights
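The weight-file layout described under "Weight Parsing" (16-byte v1 header, 20-byte v2 header with `mip_level`, 20-byte layer records, f16 weights packed in pairs as u32) can be sketched roughly as follows. This is a minimal illustration, not the tool's parser. The function names are hypothetical, and three things are assumed rather than taken from the doc: header fields are little-endian u32 values in the listed order (magic, version, layer count, total weights) with `mip_level` appended as the fifth field in v2, and the low 16 bits of each packed u32 hold the first weight, which is the convention WGSL's `unpack2x16float` uses. `doc/CNN_V2_BINARY_FORMAT.md` remains the authority on the real layout.

```javascript
// Decode one IEEE 754 half-precision value held in the low 16 bits of `h`.
function f16ToF32(h) {
  const sign = (h & 0x8000) ? -1 : 1;
  const exp = (h >> 10) & 0x1f;
  const frac = h & 0x3ff;
  if (exp === 0) return sign * Math.pow(2, -14) * (frac / 1024); // zero or subnormal
  if (exp === 31) return frac ? NaN : sign * Infinity;           // NaN or infinity
  return sign * Math.pow(2, exp - 15) * (1 + frac / 1024);       // normal number
}

// Assumption: low half-word first, matching WGSL unpack2x16float.
function unpackF16Pair(u32) {
  return [f16ToF32(u32 & 0xffff), f16ToF32(u32 >>> 16)];
}

// Assumptions: little-endian u32 fields; version at byte offset 4; mip_level is
// the fifth u32 and is present only when version >= 2.
function parseHeader(buffer) {
  const view = new DataView(buffer);
  const version = view.getUint32(4, true);
  const numLayers = view.getUint32(8, true);
  const totalWeights = view.getUint32(12, true);
  const headerSize = version >= 2 ? 20 : 16;
  const mipLevel = version >= 2 ? view.getUint32(16, true) : 0; // v1 defaults to 0
  // Layer records are 20 bytes each, so packed weights start right after them.
  const weightsOffset = headerSize + 20 * numLayers;
  return { version, numLayers, totalWeights, mipLevel, headerSize, weightsOffset };
}
```

With these helpers, per-layer min/max could be computed by walking the u32 words from `weightsOffset` and calling `unpackF16Pair` on each one.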
