doc/CNN_V2_WEB_TOOL.md



# CNN v2 Web Testing Tool

Browser-based WebGPU tool for validating CNN v2 inference with layer visualization and weight inspection.

**Location:** `tools/cnn_v2_test/index.html`

---

## Status (2026-02-13)

**Working:**
- ✅ WebGPU initialization and device setup
- ✅ Binary weight file parsing (v1 and v2 formats)
- ✅ Automatic mip-level detection from binary format v2
- ✅ Weight statistics (min/max per layer)
- ✅ UI layout with collapsible panels
- ✅ Mode switching (Activations/Weights tabs)
- ✅ Canvas context management (2D for weights, WebGPU for activations)
- ✅ Weight visualization infrastructure (layer selection, grid layout)
- ✅ Layer naming matches codebase convention (Layer 0, Layer 1, Layer 2)
- ✅ Static features split visualization (Static 0-3, Static 4-7)
- ✅ All layers visible including output layer (Layer 2)
- ✅ Video playback support (MP4, WebM) with frame-by-frame controls
- ✅ Video looping (automatic continuous playback)
- ✅ Mip level selection (p0-p3 features at different resolutions)

**Recent Changes (Latest):**
- Binary format v2 support: Reads mip_level from 20-byte header
- Backward compatible: v1 (16-byte header) → mip_level=0
- Auto-update UI dropdown when loading weights with mip_level
- Display mip_level in metadata panel
- Code refactoring: Extracted FULLSCREEN_QUAD_VS shader (reused 3× across pipelines)
- Added helper methods: `getDimensions()`, `setVideoControlsEnabled()`
- Improved code organization with section headers and comments
- Moved Mip Level selector to bottom of left sidebar (removed "Features (p0-p3)" label)
- Added `loop` attribute to video element for automatic continuous playback

**Previous Fixes:**
- Fixed Layer 2 not appearing (was excluded from layerOutputs due to isOutput check)
- Fixed canvas context switching (force clear before recreation)
- Added Static 0-3 / Static 4-7 buttons to view all 8 static feature channels
- Aligned naming with train_cnn_v2.py/.wgsl: Layer 0, Layer 1, Layer 2 (not Layer 1, 2, 3)
- Disabled Static buttons in weights mode (no learnable weights)

**Known Issues:**
- Layer activation visualization may show black if texture data not properly unpacked
- Weight kernel display depends on correct 2D context creation after canvas recreation

---

## Architecture

### File Structure
- Single-file HTML tool (~1100 lines)
- Embedded shaders: STATIC_SHADER, CNN_SHADER, DISPLAY_SHADER, LAYER_VIZ_SHADER
- Shared WGSL component: FULLSCREEN_QUAD_VS (reused across render pipelines)
- **Embedded default weights:** DEFAULT_WEIGHTS_B64 (base64-encoded binary v2)
  - Current: 4 layers (3×3, 5×5, 3×3, 3×3), 2496 f16 weights, mip_level=2
  - Source: `workspaces/main/weights/cnn_v2_weights.bin`
  - Updates: Re-encode binary with `base64 -i <file>` and update constant
- Pure WebGPU (no external dependencies)
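
Turning the embedded base64 constant back into bytes might look like the sketch below. It is not taken from the tool's source: `decodeBase64ToBytes` is a hypothetical helper name, and the only API it relies on is the standard `atob` global (available in browsers and in Node.js 16+).

```javascript
// Hypothetical helper: decode a base64 string (such as DEFAULT_WEIGHTS_B64)
// into a Uint8Array whose .buffer can be handed to a binary parser.
function decodeBase64ToBytes(b64) {
  const binary = atob(b64); // one character per decoded byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// "AQIDBA==" is the base64 encoding of the bytes 1, 2, 3, 4.
const demoBytes = decodeBase64ToBytes("AQIDBA==");
console.log(Array.from(demoBytes));
```

The resulting `demoBytes.buffer` is an `ArrayBuffer`, which is the same type a dropped `.bin` file yields via `File.arrayBuffer()`, so both sources can share one parsing path.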

### Code Organization

**Recent Refactoring (2026-02-13):**
- Extracted `FULLSCREEN_QUAD_VS` constant: Reused fullscreen quad vertex shader (2 triangles covering NDC)
- Added helper methods to CNNTester class:
  - `getDimensions()`: Returns current source dimensions (video or image)
  - `setVideoControlsEnabled(enabled)`: Centralized video control enable/disable
- Consolidated duplicate vertex shader code (used in mipmap generation, display, layer visualization)
- Added section headers in JavaScript for better navigation
- Improved inline comments explaining shader architecture

**Benefits:**
- Reduced code duplication (~40 lines saved)
- Easier maintenance (single source of truth for fullscreen quad)
- Clearer separation of concerns

### Key Components

**1. Weight Parsing**
- Reads binary format v2: header (20B) + layer info (20B×N) + f16 weights
- Backward compatible with v1: header (16B), mip_level defaults to 0
- Computes min/max per layer via f16 unpacking
- Stores `{ layers[], weights[], mipLevel, fileSize }`
- Auto-sets UI mip-level dropdown from loaded weights
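
The v1/v2 header handling described above could be sketched as follows. This is an illustration under stated assumptions, not the tool's parser: the field order (magic, version, layer count, total weights, then `mip_level` in v2), the little-endian u32 encoding, and the use of the version field to tell the formats apart are all assumptions, and `parseHeader` is a hypothetical name. `doc/CNN_V2_BINARY_FORMAT.md` remains the authoritative layout.

```javascript
// Sketch of a v1/v2 header reader.
// ASSUMPTIONS: all fields are little-endian u32, ordered as
// magic, version, layer count, total weights, and (v2 only) mip_level.
function parseHeader(buffer) {
  const view = new DataView(buffer);
  if (view.byteLength < 16) throw new Error("file too small for a header");
  const version = view.getUint32(4, true);
  const header = {
    magic: view.getUint32(0, true),
    version,
    numLayers: view.getUint32(8, true),
    totalWeights: view.getUint32(12, true),
    mipLevel: 0,    // v1 has no mip_level field, so default to 0
    headerSize: 16, // v1 header size in bytes
  };
  if (version >= 2) {
    if (view.byteLength < 20) throw new Error("truncated v2 header");
    header.mipLevel = view.getUint32(16, true);
    header.headerSize = 20;
  }
  return header;
}

// Build a synthetic v2 header to exercise the reader.
const demoBuf = new ArrayBuffer(20);
const demoView = new DataView(demoBuf);
demoView.setUint32(4, 2, true);  // version
demoView.setUint32(8, 3, true);  // layer count
demoView.setUint32(16, 2, true); // mip_level
console.log(parseHeader(demoBuf));
```

Layer info records would start at `headerSize`, which is why the sketch returns it alongside the parsed fields.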

**2. CNN Pipeline**
- Static features computation (RGBD + UV + sin + bias → 8 channels, packed)
- Layer-by-layer convolution with storage buffer weights
- Ping-pong buffers for intermediate results
- Copy to persistent textures for visualization

**3. Visualization Modes**

**Activations Mode:**
- 4 grayscale views per layer (channels 0-3 of up to 8 total)
- WebGPU compute → unpack f16 → scale → grayscale
- Auto-scale: Static features = 1.0, CNN layers = 0.2
- Static features: "Static 0-3" shows R, G, B, D; "Static 4-7" shows the remaining four channels (UV, sin, bias)
- CNN layers: Shows first 4 output channels

**Weights Mode:**
- 2D canvas rendering per output channel
- Shows all input kernels horizontally
- Normalized by layer min/max → [0, 1] → grayscale
- 20px cells, 2px padding between kernels
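
The min/max normalization used for kernel cells can be sketched as below. `weightsToGray` is a hypothetical helper, and mapping a flat layer (`max === min`) to mid-gray is a choice made for this sketch, not documented behavior of the tool.

```javascript
// Map weights to 8-bit gray using the layer's min/max.
// A flat layer (max === min) maps to mid-gray instead of dividing by zero.
function weightsToGray(weights, min, max) {
  const range = max - min;
  const out = new Uint8ClampedArray(weights.length);
  for (let i = 0; i < weights.length; i++) {
    const t = range > 0 ? (weights[i] - min) / range : 0.5; // normalized [0, 1]
    out[i] = Math.round(t * 255);
  }
  return out;
}

console.log(Array.from(weightsToGray([-1, 0, 1], -1, 1)));
```

Each gray value would then fill one 20px cell via `fillStyle` and `fillRect` on the 2D context.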

### Texture Management

**Persistent Storage (layerTextures[]):**
- One texture per layer output (static + all CNN layers)
- `rgba32uint` format (packed f16 data)
- `COPY_DST` usage for storing results

**Compute Buffers (computeTextures[]):**
- 2 textures for ping-pong computation
- Reused across all layers
- `COPY_SRC` usage for copying to persistent storage

**Pipeline:**
```
Static pass → copy to layerTextures[0]
For each CNN layer i:
  Compute (ping-pong) → copy to layerTextures[i+1]
```
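
The schedule above can be written out as a small planning function. This is a sketch only: `planPasses` is a hypothetical name, and the rule that CNN layer `i` writes to compute texture `i % 2` is an assumption about how the ping-pong alternates.

```javascript
// Build the pass schedule: one static pass, then one pass per CNN layer.
// ASSUMPTION: CNN layer i writes to computeTextures[i % 2], so consecutive
// layers never write to the texture holding the previous layer's output.
function planPasses(numCnnLayers) {
  const steps = [{ pass: "static", copyTo: 0 }]; // → layerTextures[0]
  for (let i = 0; i < numCnnLayers; i++) {
    steps.push({
      pass: `cnn${i}`,
      writeTo: i % 2, // index into computeTextures[]
      copyTo: i + 1,  // index into layerTextures[]
    });
  }
  return steps;
}

console.log(planPasses(3));
```

With three CNN layers this yields four persistent textures, which matches one texture per layer output (static plus all CNN layers).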

### Layer Indexing

**UI Layer Buttons:**
- "Static 0-3" / "Static 4-7" → layerOutputs[0] (8 static feature channels, shown 4 at a time)
- "Layer 0" → layerOutputs[1] (first CNN layer output, uses weights.layers[0])
- "Layer 1" → layerOutputs[2] (second CNN layer output, uses weights.layers[1])
- "Layer N" → layerOutputs[N+1] (CNN layer N output, uses weights.layers[N])

**Weights Table:**
- "Layer 0" → weights.layers[0] (first CNN layer weights)
- "Layer 1" → weights.layers[1] (second CNN layer weights)
- "Layer N" → weights.layers[N]

**Consistency:** Both the UI and the weights table number CNN layers from 0 (Layer 0, Layer 1, Layer 2, ...), matching `train_cnn_v2.py` and the WGSL shaders.

---

## Known Issues

### Issue #1: Layer Activations Show Black

**Symptom:**
- All 4 channel canvases render black
- UV gradient test (debug mode 10) works
- Raw packed data test (mode 11) shows black
- Unpacked f16 test (mode 12) shows black

**Diagnosis:**
- Texture access works (UV gradient visible)
- Texture data is all zeros (packed.x = 0)
- Textures being read are empty

**Suspected Root Cause (unconfirmed):**
- `copyTextureToTexture` operations may not be executing
- Copies may not be submitted before the visualization pass reads the textures (ordering issue)
- Alternatively, textures may have been created with the wrong usage flags

**Investigation Steps Taken:**
1. Added `onSubmittedWorkDone()` wait before visualization
2. Verified texture creation with `COPY_SRC` and `COPY_DST` flags
3. Confirmed separate texture allocation per layer (no aliasing)
4. Added debug shader modes to isolate issue

**Next Steps:**
- Verify encoder contains copy commands (add debug logging)
- Check if compute passes actually write data (add known-value test)
- Test copyTextureToTexture in isolation
- Consider CPU readback to verify texture contents
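
A CPU readback check along the lines of the last step might look like the sketch below. The helper names are hypothetical, and it assumes the texture being inspected was created with `GPUTextureUsage.COPY_SRC`. The 256-byte row alignment is a WebGPU requirement for texture-to-buffer copies.

```javascript
// WebGPU requires bytesPerRow to be a multiple of 256 for texture-to-buffer copies.
function paddedBytesPerRow(width, bytesPerTexel = 16) { // rgba32uint = 16 bytes
  return Math.ceil((width * bytesPerTexel) / 256) * 256;
}

// Count texels holding at least one non-zero u32 in a row-padded readback.
function countNonZeroTexels(words, width, height, wordsPerRow) {
  let count = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const base = y * wordsPerRow + x * 4;
      if (words[base] | words[base + 1] | words[base + 2] | words[base + 3]) count++;
    }
  }
  return count;
}

// Browser-only: copy a layer texture into a mappable buffer and inspect it.
// ASSUMPTION: `texture` was created with GPUTextureUsage.COPY_SRC.
async function readbackNonZeroCount(device, texture, width, height) {
  const bytesPerRow = paddedBytesPerRow(width);
  const buffer = device.createBuffer({
    size: bytesPerRow * height,
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
  });
  const encoder = device.createCommandEncoder();
  encoder.copyTextureToBuffer(
    { texture },
    { buffer, bytesPerRow, rowsPerImage: height },
    { width, height, depthOrArrayLayers: 1 },
  );
  device.queue.submit([encoder.finish()]);
  await buffer.mapAsync(GPUMapMode.READ);
  const words = new Uint32Array(buffer.getMappedRange());
  const count = countNonZeroTexels(words, width, height, bytesPerRow / 4);
  buffer.unmap();
  buffer.destroy();
  return count;
}
```

A count of zero right after the compute pass would point at the compute shader not writing, while a non-zero count there and zero in the persistent texture would point at the copy step.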

### Issue #2: Weight Visualization Empty

**Symptom:**
- Canvases created with correct dimensions (logged)
- No visual output (black canvases)
- Console logs show method execution

**Potential Causes:**
1. Weight indexing calculation incorrect
2. Canvas not properly attached to DOM when rendering
3. 2D context operations not flushing
4. Min/max normalization producing black (all values equal?)

**Debug Added:**
- Comprehensive logging of dimensions, indices, ranges
- Canvas context check before rendering

**Next Steps:**
- Add test rendering (fixed gradient) to verify 2D context works
- Log sample weight values to verify data access
- Check if canvas is visible in DOM inspector
- Verify min/max calculation produces valid range
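
The fixed-gradient test from the first step could be sketched as follows. Both function names are hypothetical. The pixel generation is plain JavaScript, while `drawTestGradient` needs a browser canvas.

```javascript
// Fill an RGBA pixel array with a horizontal black-to-white gradient.
function makeGradientPixels(width, height) {
  const data = new Uint8ClampedArray(width * height * 4);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const v = width > 1 ? Math.round((x / (width - 1)) * 255) : 255;
      const i = (y * width + x) * 4;
      data[i] = data[i + 1] = data[i + 2] = v; // gray: R = G = B
      data[i + 3] = 255;                       // fully opaque
    }
  }
  return data;
}

// Browser-only: draw the gradient to confirm the 2D context is usable.
function drawTestGradient(canvas) {
  const ctx = canvas.getContext("2d");
  if (!ctx) return false; // canvas is already bound to another context type
  const pixels = makeGradientPixels(canvas.width, canvas.height);
  ctx.putImageData(new ImageData(pixels, canvas.width, canvas.height), 0, 0);
  return true;
}

console.log(Array.from(makeGradientPixels(3, 1)));
```

`getContext("2d")` returns `null` when the canvas already holds a context of another type, so a `false` result here would suggest the canvas was not recreated after leaving activations mode.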

---

## UI Layout

### Header
- Controls: Blend slider, Depth input, View mode display
- Drop zone for .bin weight files

### Content Area

**Left Sidebar (300px):**
1. Drop zone for .bin weight files
2. Weights Info panel (file size, layer table with min/max)
3. Weights Visualization panel (per-layer kernel display)
4. **Mip Level selector** (bottom) - Select p0/p1/p2 for static features

**Main Canvas (center):**
- CNN output display with video controls (Play/Pause, Frame ◄/►)
- Supports both PNG images and video files (MP4, WebM)
- Video loops automatically for continuous playback

**Right Sidebar (panels):**
1. **Layer Visualization Panel** (top, flex: 1)
   - Layer selection buttons (Static 0-3, Static 4-7, Layer 0, Layer 1, ...)
   - 2×2 grid of channel views (grayscale activations)
   - 4× zoom view at bottom

### Footer
- Status line (GPU timing, dimensions, mode)
- Console log (scrollable, color-coded)

---

## Shader Details

### LAYER_VIZ_SHADER

**Purpose:** Display single channel from packed layer texture

**Inputs:**
- `@binding(0) layer_tex: texture_2d<u32>` - Packed f16 layer data
- `@binding(1) viz_params: vec2<f32>` - (channel_idx, scale)

**Debug Modes:**
- Channel 10: UV gradient (texture coordinate test)
- Channel 11: Raw packed u32 data
- Channel 12: First unpacked f16 value

**Normal Operation:**
- Unpack all 8 f16 channels from rgba32uint
- Select channel by index (0-7)
- Apply scale factor (1.0 for static, 0.2 for CNN)
- Clamp to [0, 1] and output grayscale

**Scale Rationale:**
- Static features (RGBD, UV): already in [0, 1] range
- CNN activations: post-ReLU [0, ~5], need scaling for visibility
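
A CPU-side reference of the normal-operation path can help when checking shader output against known texel values. The sketch below is not the shader itself: the function names are hypothetical, and it assumes channel `2k` sits in the low 16 bits of component `k`, which is the convention of WGSL's `unpack2x16float`.

```javascript
// Convert a 16-bit IEEE 754 half-float bit pattern to a JS number.
function f16ToNumber(h) {
  const sign = h & 0x8000 ? -1 : 1;
  const exp = (h >> 10) & 0x1f;
  const frac = h & 0x3ff;
  if (exp === 0) return sign * Math.pow(2, -14) * (frac / 1024); // zero / subnormal
  if (exp === 31) return frac ? NaN : sign * Infinity;
  return sign * Math.pow(2, exp - 15) * (1 + frac / 1024);
}

// Unpack one rgba32uint texel (4 × u32) into 8 channel values.
// ASSUMPTION: channel 2k is the low half of component k.
function unpackTexel(texel) {
  const channels = [];
  for (const word of texel) {
    channels.push(f16ToNumber(word & 0xffff), f16ToNumber(word >>> 16));
  }
  return channels;
}

// Select a channel, apply the scale factor, and clamp to [0, 1].
function channelToGray(texel, channelIdx, scale) {
  const v = unpackTexel(texel)[channelIdx] * scale;
  return Math.min(1, Math.max(0, v));
}

// 0x3C00 is 1.0 and 0xC000 is -2.0 in f16.
console.log(unpackTexel([0xC0003C00, 0, 0, 0]));
```

With a scale of 0.2, an activation of 2.5 lands at mid-gray and anything at or above 5.0 saturates to white.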

---

## Binary Weight Format

See `doc/CNN_V2_BINARY_FORMAT.md` for complete specification.

**Quick Summary:**
- Header: 20 bytes in v2 (magic, version, layer count, total weights, mip_level); 16 bytes in v1 (no mip_level)
- Layer info: 20 bytes × N (kernel size, channels, offsets)
- Weights: Packed f16 pairs as u32
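
The summary implies a simple size check that can catch truncated files before parsing. The sketch below uses the 16-byte (v1) and 20-byte (v2) header sizes stated earlier in this document. `expectedFileSize` is a hypothetical helper, and padding an odd weight count up to a whole u32 is an assumption.

```javascript
// Expected file size in bytes: header + layer info table + packed weights.
// ASSUMPTION: an odd weight count is padded up to a whole u32.
function expectedFileSize(version, numLayers, totalWeights) {
  const headerSize = version >= 2 ? 20 : 16;
  const layerInfoSize = 20 * numLayers;
  const weightBytes = Math.ceil(totalWeights / 2) * 4; // two f16 per u32
  return headerSize + layerInfoSize + weightBytes;
}

// For a v2 file with 4 layers and 2496 weights: 20 + 80 + 4992 = 5092.
console.log(expectedFileSize(2, 4, 2496));
```

Comparing this value against the `fileSize` reported in the Weights Info panel is a quick sanity check after loading.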

---

## Testing Workflow

### Load & Parse
1. Drop PNG image or video (MP4, WebM) → displays original
2. Drop .bin weights → parses and shows info table
3. Auto-runs CNN pipeline

### Verify Pipeline
1. Check console for "Running CNN pipeline"
2. Verify "Completed in Xms"
3. Check "Layer visualization ready: N layers"

### Debug Activations
1. Select "Activations" tab
2. Click layer buttons to switch
3. Check console for texture/canvas logs
4. If black: note which debug modes work (UV vs data)

### Debug Weights
1. Select "Weights" tab
2. Click a CNN layer button (Layer 0, Layer 1, ...); the Static buttons are disabled in this mode because static features have no learnable weights
3. Check console for "Visualizing Layer N weights"
4. Check canvas dimensions logged
5. Verify weight range is non-trivial (not [0, 0])

---

## Integration with Main Project

**Training Pipeline:**
```bash
# Generate weights
./training/train_cnn_v2.py --export-binary

# Test in browser
open tools/cnn_v2_test/index.html
# Drop: workspaces/main/cnn_v2_weights.bin
# Drop: training/input/test.png
```

**Validation:**
- Compare against demo CNNv2Effect (visual check)
- Verify layer count matches binary file
- Check weight ranges match training logs

---

## Future Enhancements

- [ ] Fix layer activation visualization (black texture issue)
- [ ] Fix weight kernel display (empty canvas issue)
- [ ] Add per-channel auto-scaling (compute min/max from visible data)
- [ ] Export rendered outputs (download PNG)
- [ ] Side-by-side comparison with original
- [ ] Heatmap mode (color-coded activations)
- [ ] Weight statistics overlay (mean, std, sparsity)
- [ ] Batch processing (multiple images in sequence)
- [ ] Integration with Python training (live reload)

---

## Code Metrics

- Total lines: ~1100
- JavaScript: ~700 lines
- WGSL shaders: ~300 lines
- HTML/CSS: ~100 lines

**Dependencies:** None (pure WebGPU + HTML5)

---

## Related Files

- `doc/CNN_V2.md` - CNN v2 architecture and design
- `doc/CNN_TEST_TOOL.md` - C++ offline testing tool (deprecated)
- `training/train_cnn_v2.py` - Training script with binary export
- `workspaces/main/cnn_v2_weights.bin` - Trained weights