Diffstat (limited to 'doc/CNN_V2_WEB_TOOL.md')
| -rw-r--r-- | doc/CNN_V2_WEB_TOOL.md | 348 |
1 files changed, 0 insertions, 348 deletions
diff --git a/doc/CNN_V2_WEB_TOOL.md b/doc/CNN_V2_WEB_TOOL.md
deleted file mode 100644
index b6f5b0b..0000000
--- a/doc/CNN_V2_WEB_TOOL.md
+++ /dev/null
@@ -1,348 +0,0 @@
-# CNN v2 Web Testing Tool
-
-Browser-based WebGPU tool for validating CNN v2 inference with layer visualization and weight inspection.
-
-**Location:** `tools/cnn_v2_test/index.html`
-
----
-
-## Status (2026-02-13)
-
-**Working:**
-- ✅ WebGPU initialization and device setup
-- ✅ Binary weight file parsing (v1 and v2 formats)
-- ✅ Automatic mip-level detection from binary format v2
-- ✅ Weight statistics (min/max per layer)
-- ✅ UI layout with collapsible panels
-- ✅ Mode switching (Activations/Weights tabs)
-- ✅ Canvas context management (2D for weights, WebGPU for activations)
-- ✅ Weight visualization infrastructure (layer selection, grid layout)
-- ✅ Layer naming matches codebase convention (Layer 0, Layer 1, Layer 2)
-- ✅ Static features split visualization (Static 0-3, Static 4-7)
-- ✅ All layers visible including output layer (Layer 2)
-- ✅ Video playback support (MP4, WebM) with frame-by-frame controls
-- ✅ Video looping (automatic continuous playback)
-- ✅ Mip level selection (p0-p3 features at different resolutions)
-
-**Recent Changes (Latest):**
-- Binary format v2 support: Reads mip_level from 20-byte header
-- Backward compatible: v1 (16-byte header) → mip_level=0
-- Auto-update UI dropdown when loading weights with mip_level
-- Display mip_level in metadata panel
-- Code refactoring: Extracted FULLSCREEN_QUAD_VS shader (reused 3× across pipelines)
-- Added helper methods: `getDimensions()`, `setVideoControlsEnabled()`
-- Improved code organization with section headers and comments
-- Moved Mip Level selector to bottom of left sidebar (removed "Features (p0-p3)" label)
-- Added `loop` attribute to video element for automatic continuous playback
-
-**Previous Fixes:**
-- Fixed Layer 2 not appearing (was excluded from layerOutputs due to isOutput check)
-- Fixed canvas context switching (force clear before recreation)
-- Added Static 0-3 / Static 4-7 buttons to view all 8 static feature channels
-- Aligned naming with train_cnn_v2.py/.wgsl: Layer 0, Layer 1, Layer 2 (not Layer 1, 2, 3)
-- Disabled Static buttons in weights mode (no learnable weights)
-
-**Known Issues:**
-- Layer activation visualization may show black if texture data not properly unpacked
-- Weight kernel display depends on correct 2D context creation after canvas recreation
-
----
-
-## Architecture
-
-### File Structure
-- Single-file HTML tool (~1100 lines)
-- Embedded shaders: STATIC_SHADER, CNN_SHADER, DISPLAY_SHADER, LAYER_VIZ_SHADER
-- Shared WGSL component: FULLSCREEN_QUAD_VS (reused across render pipelines)
-- **Embedded default weights:** DEFAULT_WEIGHTS_B64 (base64-encoded binary v2)
-  - Current: 4 layers (3×3, 5×5, 3×3, 3×3), 2496 f16 weights, mip_level=2
-  - Source: `workspaces/main/weights/cnn_v2_weights.bin`
-  - Updates: Re-encode binary with `base64 -i <file>` and update constant
-- Pure WebGPU (no external dependencies)
-
-### Code Organization
-
-**Recent Refactoring (2026-02-13):**
-- Extracted `FULLSCREEN_QUAD_VS` constant: Reused fullscreen quad vertex shader (2 triangles covering NDC)
-- Added helper methods to CNNTester class:
-  - `getDimensions()`: Returns current source dimensions (video or image)
-  - `setVideoControlsEnabled(enabled)`: Centralized video control enable/disable
-- Consolidated duplicate vertex shader code (used in mipmap generation, display, layer visualization)
-- Added section headers in JavaScript for better navigation
-- Improved inline comments explaining shader architecture
-
-**Benefits:**
-- Reduced code duplication (~40 lines saved)
-- Easier maintenance (single source of truth for fullscreen quad)
-- Clearer separation of concerns
-
-### Key Components
-
-**1. Weight Parsing**
-- Reads binary format v2: header (20B) + layer info (20B×N) + f16 weights
-- Backward compatible with v1: header (16B), mip_level defaults to 0
-- Computes min/max per layer via f16 unpacking
-- Stores `{ layers[], weights[], mipLevel, fileSize }`
-- Auto-sets UI mip-level dropdown from loaded weights
-
-**2. CNN Pipeline**
-- Static features computation (RGBD + UV + sin + bias → 7D packed)
-- Layer-by-layer convolution with storage buffer weights
-- Ping-pong buffers for intermediate results
-- Copy to persistent textures for visualization
-
-**3. Visualization Modes**
-
-**Activations Mode:**
-- 4 grayscale views per layer (channels 0-3 of up to 8 total)
-- WebGPU compute → unpack f16 → scale → grayscale
-- Auto-scale: Static features = 1.0, CNN layers = 0.2
-- Static features: Shows R,G,B,D (first 4 of 8: RGBD+UV+sin+bias)
-- CNN layers: Shows first 4 output channels
-
-**Weights Mode:**
-- 2D canvas rendering per output channel
-- Shows all input kernels horizontally
-- Normalized by layer min/max → [0, 1] → grayscale
-- 20px cells, 2px padding between kernels
-
-### Texture Management
-
-**Persistent Storage (layerTextures[]):**
-- One texture per layer output (static + all CNN layers)
-- `rgba32uint` format (packed f16 data)
-- `COPY_DST` usage for storing results
-
-**Compute Buffers (computeTextures[]):**
-- 2 textures for ping-pong computation
-- Reused across all layers
-- `COPY_SRC` usage for copying to persistent storage
-
-**Pipeline:**
-```
-Static pass → copy to layerTextures[0]
-For each CNN layer i:
-  Compute (ping-pong) → copy to layerTextures[i+1]
-```
-
-### Layer Indexing
-
-**UI Layer Buttons:**
-- "Static" → layerOutputs[0] (7D input features)
-- "Layer 1" → layerOutputs[1] (CNN layer 1 output, uses weights.layers[0])
-- "Layer 2" → layerOutputs[2] (CNN layer 2 output, uses weights.layers[1])
-- "Layer N" → layerOutputs[N] (CNN layer N output, uses weights.layers[N-1])
-
-**Weights Table:**
-- "Layer 1" → weights.layers[0] (first CNN layer weights)
-- "Layer 2" → weights.layers[1] (second CNN layer weights)
-- "Layer N" → weights.layers[N-1]
-
-**Consistency:** Both UI and weights table use same numbering (1, 2, 3...) for CNN layers.
-
----
-
-## Known Issues
-
-### Issue #1: Layer Activations Show Black
-
-**Symptom:**
-- All 4 channel canvases render black
-- UV gradient test (debug mode 10) works
-- Raw packed data test (mode 11) shows black
-- Unpacked f16 test (mode 12) shows black
-
-**Diagnosis:**
-- Texture access works (UV gradient visible)
-- Texture data is all zeros (packed.x = 0)
-- Textures being read are empty
-
-**Root Cause:**
-- `copyTextureToTexture` operations may not be executing
-- Possible ordering issue (copies not submitted before visualization)
-- Alternative: textures created with wrong usage flags
-
-**Investigation Steps Taken:**
-1. Added `onSubmittedWorkDone()` wait before visualization
-2. Verified texture creation with `COPY_SRC` and `COPY_DST` flags
-3. Confirmed separate texture allocation per layer (no aliasing)
-4. Added debug shader modes to isolate issue
-
-**Next Steps:**
-- Verify encoder contains copy commands (add debug logging)
-- Check if compute passes actually write data (add known-value test)
-- Test copyTextureToTexture in isolation
-- Consider CPU readback to verify texture contents
-
-### Issue #2: Weight Visualization Empty
-
-**Symptom:**
-- Canvases created with correct dimensions (logged)
-- No visual output (black canvases)
-- Console logs show method execution
-
-**Potential Causes:**
-1. Weight indexing calculation incorrect
-2. Canvas not properly attached to DOM when rendering
-3. 2D context operations not flushing
-4. Min/max normalization producing black (all values equal?)
-
-**Debug Added:**
-- Comprehensive logging of dimensions, indices, ranges
-- Canvas context check before rendering
-
-**Next Steps:**
-- Add test rendering (fixed gradient) to verify 2D context works
-- Log sample weight values to verify data access
-- Check if canvas is visible in DOM inspector
-- Verify min/max calculation produces valid range
-
----
-
-## UI Layout
-
-### Header
-- Controls: Blend slider, Depth input, View mode display
-- Drop zone for .bin weight files
-
-### Content Area
-
-**Left Sidebar (300px):**
-1. Drop zone for .bin weight files
-2. Weights Info panel (file size, layer table with min/max)
-3. Weights Visualization panel (per-layer kernel display)
-4. **Mip Level selector** (bottom) - Select p0/p1/p2 for static features
-
-**Main Canvas (center):**
-- CNN output display with video controls (Play/Pause, Frame ◄/►)
-- Supports both PNG images and video files (MP4, WebM)
-- Video loops automatically for continuous playback
-
-**Right Sidebar (panels):**
-1. **Layer Visualization Panel** (top, flex: 1)
-   - Layer selection buttons (Static 0-3, Static 4-7, Layer 0, Layer 1, ...)
-   - 2×2 grid of channel views (grayscale activations)
-   - 4× zoom view at bottom
-
-### Footer
-- Status line (GPU timing, dimensions, mode)
-- Console log (scrollable, color-coded)
-
----
-
-## Shader Details
-
-### LAYER_VIZ_SHADER
-
-**Purpose:** Display single channel from packed layer texture
-
-**Inputs:**
-- `@binding(0) layer_tex: texture_2d<u32>` - Packed f16 layer data
-- `@binding(1) viz_params: vec2<f32>` - (channel_idx, scale)
-
-**Debug Modes:**
-- Channel 10: UV gradient (texture coordinate test)
-- Channel 11: Raw packed u32 data
-- Channel 12: First unpacked f16 value
-
-**Normal Operation:**
-- Unpack all 8 f16 channels from rgba32uint
-- Select channel by index (0-7)
-- Apply scale factor (1.0 for static, 0.2 for CNN)
-- Clamp to [0, 1] and output grayscale
-
-**Scale Rationale:**
-- Static features (RGBD, UV): already in [0, 1] range
-- CNN activations: post-ReLU [0, ~5], need scaling for visibility
-
----
-
-## Binary Weight Format
-
-See `doc/CNN_V2_BINARY_FORMAT.md` for complete specification.
-
-**Quick Summary:**
-- Header: 16 bytes (magic, version, layer count, total weights)
-- Layer info: 20 bytes × N (kernel size, channels, offsets)
-- Weights: Packed f16 pairs as u32
-
----
-
-## Testing Workflow
-
-### Load & Parse
-1. Drop PNG image → displays original
-2. Drop .bin weights → parses and shows info table
-3. Auto-runs CNN pipeline
-
-### Verify Pipeline
-1. Check console for "Running CNN pipeline"
-2. Verify "Completed in Xms"
-3. Check "Layer visualization ready: N layers"
-
-### Debug Activations
-1. Select "Activations" tab
-2. Click layer buttons to switch
-3. Check console for texture/canvas logs
-4. If black: note which debug modes work (UV vs data)
-
-### Debug Weights
-1. Select "Weights" tab
-2. Click Layer 1 or Layer 2 (Layer 0 has no weights)
-3. Check console for "Visualizing Layer N weights"
-4. Check canvas dimensions logged
-5. Verify weight range is non-trivial (not [0, 0])
-
----
-
-## Integration with Main Project
-
-**Training Pipeline:**
-```bash
-# Generate weights
-./training/train_cnn_v2.py --export-binary
-
-# Test in browser
-open tools/cnn_v2_test/index.html
-# Drop: workspaces/main/cnn_v2_weights.bin
-# Drop: training/input/test.png
-```
-
-**Validation:**
-- Compare against demo CNNv2Effect (visual check)
-- Verify layer count matches binary file
-- Check weight ranges match training logs
-
----
-
-## Future Enhancements
-
-- [ ] Fix layer activation visualization (black texture issue)
-- [ ] Fix weight kernel display (empty canvas issue)
-- [ ] Add per-channel auto-scaling (compute min/max from visible data)
-- [ ] Export rendered outputs (download PNG)
-- [ ] Side-by-side comparison with original
-- [ ] Heatmap mode (color-coded activations)
-- [ ] Weight statistics overlay (mean, std, sparsity)
-- [ ] Batch processing (multiple images in sequence)
-- [ ] Integration with Python training (live reload)
-
----
-
-## Code Metrics
-
-- Total lines: ~1100
-- JavaScript: ~700 lines
-- WGSL shaders: ~300 lines
-- HTML/CSS: ~100 lines
-
-**Dependencies:** None (pure WebGPU + HTML5)
-
----
-
-## Related Files
-
-- `doc/CNN_V2.md` - CNN v2 architecture and design
-- `doc/CNN_TEST_TOOL.md` - C++ offline testing tool (deprecated)
-- `training/train_cnn_v2.py` - Training script with binary export
-- `workspaces/main/cnn_v2_weights.bin` - Trained weights
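Reviewer note: the removed document describes the weight header only in summary form (16 bytes for v1, 20 bytes for v2 with an extra mip_level, followed by 20 bytes of layer info per layer). A minimal sketch of that versioned header read might look like the following. The function name `parseWeightHeader`, the field order, the use of 32-bit little-endian fields, and the placement of mip_level at byte offset 16 are all assumptions for illustration; the authoritative layout is whatever `doc/CNN_V2_BINARY_FORMAT.md` specifies.

```javascript
// Hypothetical sketch, not the tool's actual parser.
// Assumed layout (little-endian u32 fields):
//   offset 0: magic, 4: version, 8: layer count, 12: total weights,
//   offset 16: mip_level (v2 only, assumed position).
const LAYER_INFO_SIZE = 20; // bytes per layer record, per the removed doc

function parseWeightHeader(buffer) {
  const view = new DataView(buffer);
  if (view.byteLength < 16) {
    throw new Error('buffer too small for a v1 header');
  }
  const magic = view.getUint32(0, true);
  const version = view.getUint32(4, true);
  const numLayers = view.getUint32(8, true);
  const totalWeights = view.getUint32(12, true);

  // v1 has a 16-byte header and no mip level; v2 appends one more field.
  const headerSize = version >= 2 ? 20 : 16;
  if (view.byteLength < headerSize) {
    throw new Error('buffer too small for a v2 header');
  }
  const mipLevel = version >= 2 ? view.getUint32(16, true) : 0;

  return {
    magic,
    version,
    numLayers,
    totalWeights,
    mipLevel,
    headerSize,
    // Byte offset where the packed f16 weights would begin.
    weightsOffset: headerSize + LAYER_INFO_SIZE * numLayers,
  };
}
```

Deriving `headerSize` from the version field keeps the v1 fallback (mip_level = 0) in one place, which matches the backward-compatibility behaviour the document describes.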

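Reviewer note: the removed document also mentions that weights are stored as f16 pairs packed into u32 words and that per-layer min/max is computed "via f16 unpacking". One way this could be done on the CPU is sketched below. The helper names `f16ToF32` and `weightRange` are hypothetical, and taking the low 16 bits as the first value of each pair is an assumption (it matches the convention of WGSL's `unpack2x16float`, but the removed document does not state the order).

```javascript
// Hypothetical sketch of CPU-side f16 unpacking and per-layer range.

// Convert one IEEE 754 half-precision bit pattern to a JS number.
function f16ToF32(h) {
  const sign = (h & 0x8000) ? -1 : 1;
  const exp = (h >> 10) & 0x1f;
  const frac = h & 0x3ff;
  if (exp === 0) {
    // Zero and subnormals.
    return sign * Math.pow(2, -14) * (frac / 1024);
  }
  if (exp === 31) {
    return frac ? NaN : sign * Infinity;
  }
  return sign * Math.pow(2, exp - 15) * (1 + frac / 1024);
}

// Min/max over `count` f16 weights starting at weight index `start`.
// `packed` is a Uint32Array; each word is assumed to hold two f16 values,
// low half first.
function weightRange(packed, start, count) {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < count; i++) {
    const idx = start + i;
    const word = packed[idx >> 1];
    const half = (idx & 1) ? (word >>> 16) : (word & 0xffff);
    const v = f16ToF32(half);
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return { min, max };
}
```

A range of `{ min: 0, max: 0 }` from a helper like this would point at the "all values equal" cause listed under Issue #2 rather than at the canvas code.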