Diffstat (limited to 'doc/CNN_V2.md')
 doc/CNN_V2.md | 39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)
diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
index 577cf9e..2d1d4c4 100644
--- a/doc/CNN_V2.md
+++ b/doc/CNN_V2.md
@@ -18,15 +18,15 @@ CNN v2 extends the original CNN post-processing effect with parametric static fe
- Bias integrated as static feature dimension
- Storage buffer architecture (dynamic layer count)
- Binary weight format v2 for runtime loading
+- Sigmoid activation for layer 0 and final layer (smooth [0,1] mapping)
-**Status:** ✅ Complete. Training pipeline functional, validation tools ready, mip-level support integrated.
+**Status:** ✅ Complete. Sigmoid activation, stable training, validation tools operational.
-**Known Issues:**
-- ⚠️ **cnn_test output differs from HTML validation tool** - Visual discrepancy remains after fixing uv_y inversion and Layer 0 activation. Root cause under investigation. Both tools should produce identical output given same weights/input.
+**Breaking Change:**
+- Models trained with `clamp()` are incompatible with the new sigmoid activation; retraining is required.
**TODO:**
- 8-bit quantization with QAT for 2× size reduction (~1.6 KB)
-- Debug cnn_test vs HTML tool output difference
---
@@ -106,6 +106,12 @@ Input RGBD → Static Features Compute → CNN Layers → Output RGBA
- All layers: uniform 12D input, 4D output (ping-pong buffer)
- Storage: `texture_storage_2d<rgba32uint>` (4 channels as 2×f16 pairs)
+**Activation Functions:**
+- Layer 0 & final layer: `sigmoid(x)` for smooth [0,1] mapping
+- Middle layers: `ReLU` (max(0, x))
+- Rationale: `clamp(x, 0, 1)` has zero gradient once a unit saturates, while sigmoid keeps a small nonzero gradient everywhere, so boundary pixels continue to learn and convergence improves
+- Breaking change: Models trained with `clamp(x, 0, 1)` are incompatible, retrain required
+
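The gradient-blocking argument above can be checked numerically. This is a minimal dependency-free sketch (the helper names are hypothetical, not part of the training code):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    # d/dx sigmoid(x) = s * (1 - s); strictly positive for all x
    s = sigmoid(x)
    return s * (1.0 - s)

def clamp_grad(x: float) -> float:
    # clamp(x, 0, 1) passes gradient only strictly inside (0, 1)
    return 1.0 if 0.0 < x < 1.0 else 0.0

# A pre-activation of 3.0 is saturated for clamp but not for sigmoid:
print(clamp_grad(3.0))    # 0.0 -> gradient blocked, weight stops updating
print(sigmoid_grad(3.0))  # ~0.045 -> small but nonzero, training continues
```

This is why retraining is mandatory: weights fitted against the hard clamp boundary behave differently under the soft sigmoid squash.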
---
## Static Features (7D + 1 bias)
@@ -136,6 +142,27 @@ let bias = 1.0; // Learned bias per output channel
// Packed storage: [p0, p1, p2, p3, uv.x, uv.y, sin(20*uv.y), 1.0]
```
+### Input Channel Mapping
+
+**Weight tensor layout (12 input channels per layer):**
+
+| Input Channel | Feature | Description |
+|--------------|---------|-------------|
+| 0-3 | Previous layer output | 4D RGBA from prior CNN layer (or input RGBD for Layer 0) |
+| 4-11 | Static features | 8D: p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias |
+
+**Static feature channel details:**
+- Channel 4 → p0 (RGB.r from mip level)
+- Channel 5 → p1 (RGB.g from mip level)
+- Channel 6 → p2 (RGB.b from mip level)
+- Channel 7 → p3 (depth or RGB channel from mip level)
+- Channel 8 → p4 (uv_x: normalized horizontal position)
+- Channel 9 → p5 (uv_y: normalized vertical position)
+- Channel 10 → p6 (sin(20*uv_y): periodic encoding)
+- Channel 11 → p7 (bias: constant 1.0)
+
+**Note:** When generating identity weights, p4-p7 correspond to input channels 8-11, not 4-7.
+
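The channel layout in the table above can be sketched for one pixel; all values here are hypothetical placeholders, and the point is only the index arithmetic (static features start at channel 4, so uv_y sits at channel 9, not 5):

```python
import math

# Hypothetical per-pixel values; names follow the mapping table.
p0, p1, p2, p3 = 0.5, 0.4, 0.3, 0.9   # mip-level samples (channels 4-7)
uv_x, uv_y = 0.25, 0.75               # normalized position (channels 8-9)
sin20_y = math.sin(20.0 * uv_y)       # periodic encoding (channel 10)
bias = 1.0                            # constant bias (channel 11)

prev_rgba = [0.1, 0.2, 0.3, 1.0]      # previous layer output (channels 0-3)
packed = prev_rgba + [p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias]

# An "identity" weight row that copies uv_y must read channel 9, not 5:
identity_uv_y = [0.0] * 12
identity_uv_y[9] = 1.0
out = sum(w * x for w, x in zip(identity_uv_y, packed))
assert out == uv_y
```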
### Feature Rationale
| Feature | Dimension | Purpose | Priority |
@@ -311,7 +338,7 @@ class CNNv2(nn.Module):
# Layer 0: input RGBD (4D) + static (8D) = 12D
x = torch.cat([input_rgbd, static_features], dim=1)
x = self.layers[0](x)
- x = torch.clamp(x, 0, 1) # Output layer 0 (4 channels)
+ x = torch.sigmoid(x) # Soft [0,1] for layer 0
# Layer 1+: previous output (4D) + static (8D) = 12D
for i in range(1, len(self.layers)):
@@ -320,7 +347,7 @@ class CNNv2(nn.Module):
if i < len(self.layers) - 1:
x = F.relu(x)
else:
- x = torch.clamp(x, 0, 1) # Final output [0,1]
+ x = torch.sigmoid(x) # Soft [0,1] for final layer
return x # RGBA output
```
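The activation schedule encoded in the forward pass above (sigmoid on layer 0 and the final layer, ReLU in between) can be captured in a tiny dependency-free sketch; `activation_for` is a hypothetical helper, not part of the model code:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def relu(x: float) -> float:
    return max(0.0, x)

def activation_for(layer_idx: int, num_layers: int):
    # Layer 0 and the final layer squash to (0, 1); middle layers use ReLU.
    if layer_idx == 0 or layer_idx == num_layers - 1:
        return sigmoid
    return relu

acts = [activation_for(i, 4).__name__ for i in range(4)]
print(acts)  # ['sigmoid', 'relu', 'relu', 'sigmoid']
```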