Diffstat (limited to 'doc/CNN_V2.md')
| -rw-r--r-- | doc/CNN_V2.md | 39 |
1 file changed, 33 insertions, 6 deletions
diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
index 577cf9e..2d1d4c4 100644
--- a/doc/CNN_V2.md
+++ b/doc/CNN_V2.md
@@ -18,15 +18,15 @@ CNN v2 extends the original CNN post-processing effect with parametric static fe
 - Bias integrated as static feature dimension
 - Storage buffer architecture (dynamic layer count)
 - Binary weight format v2 for runtime loading
+- Sigmoid activation for layer 0 and final layer (smooth [0,1] mapping)
 
-**Status:** ✅ Complete. Training pipeline functional, validation tools ready, mip-level support integrated.
+**Status:** ✅ Complete. Sigmoid activation, stable training, validation tools operational.
 
-**Known Issues:**
-- ⚠️ **cnn_test output differs from HTML validation tool** - Visual discrepancy remains after fixing uv_y inversion and Layer 0 activation. Root cause under investigation. Both tools should produce identical output given same weights/input.
+**Breaking Change:**
+- Models trained with `clamp()` incompatible. Retrain required.
 
 **TODO:**
 - 8-bit quantization with QAT for 2× size reduction (~1.6 KB)
-- Debug cnn_test vs HTML tool output difference
 
 ---
 
@@ -106,6 +106,12 @@ Input RGBD → Static Features Compute → CNN Layers → Output RGBA
 - All layers: uniform 12D input, 4D output (ping-pong buffer)
 - Storage: `texture_storage_2d<rgba32uint>` (4 channels as 2×f16 pairs)
 
+**Activation Functions:**
+- Layer 0 & final layer: `sigmoid(x)` for smooth [0,1] mapping
+- Middle layers: `ReLU` (max(0, x))
+- Rationale: Sigmoid prevents gradient blocking at boundaries, enabling better convergence
+- Breaking change: Models trained with `clamp(x, 0, 1)` are incompatible, retrain required
+
 ---
 
 ## Static Features (7D + 1 bias)
@@ -136,6 +142,27 @@ let bias = 1.0; // Learned bias per output channel
 // Packed storage: [p0, p1, p2, p3, uv.x, uv.y, sin(20*uv.y), 1.0]
 ```
 
+### Input Channel Mapping
+
+**Weight tensor layout (12 input channels per layer):**
+
+| Input Channel | Feature | Description |
+|--------------|---------|-------------|
+| 0-3 | Previous layer output | 4D RGBA from prior CNN layer (or input RGBD for Layer 0) |
+| 4-11 | Static features | 8D: p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias |
+
+**Static feature channel details:**
+- Channel 4 → p0 (RGB.r from mip level)
+- Channel 5 → p1 (RGB.g from mip level)
+- Channel 6 → p2 (RGB.b from mip level)
+- Channel 7 → p3 (depth or RGB channel from mip level)
+- Channel 8 → p4 (uv_x: normalized horizontal position)
+- Channel 9 → p5 (uv_y: normalized vertical position)
+- Channel 10 → p6 (sin(20*uv_y): periodic encoding)
+- Channel 11 → p7 (bias: constant 1.0)
+
+**Note:** When generating identity weights, p4-p7 correspond to input channels 8-11, not 4-7.
+
 ### Feature Rationale
 
 | Feature | Dimension | Purpose | Priority |
@@ -311,7 +338,7 @@ class CNNv2(nn.Module):
         # Layer 0: input RGBD (4D) + static (8D) = 12D
         x = torch.cat([input_rgbd, static_features], dim=1)
         x = self.layers[0](x)
-        x = torch.clamp(x, 0, 1)  # Output layer 0 (4 channels)
+        x = torch.sigmoid(x)  # Soft [0,1] for layer 0
 
         # Layer 1+: previous output (4D) + static (8D) = 12D
         for i in range(1, len(self.layers)):
@@ -320,7 +347,7 @@ class CNNv2(nn.Module):
             x = self.layers[i](x)
             if i < len(self.layers) - 1:
                 x = F.relu(x)
             else:
-                x = torch.clamp(x, 0, 1)  # Final output [0,1]
+                x = torch.sigmoid(x)  # Soft [0,1] for final layer
         return x  # RGBA output
 ```
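
The identity-weight note in the new "Input Channel Mapping" section can be sanity-checked with a small sketch. This is not code from the repository; the helper name and NumPy layout are illustrative, assuming only the channel order documented in the table (channels 0-3 previous layer output, channels 4-11 static features).

```python
import numpy as np

# Illustrative sketch (not repo code): pack one pixel's 12-channel input
# in the documented order, then build identity weights that copy the
# positional features p4..p7 straight to the 4 output channels.
def pack_input(prev_rgba, p0, p1, p2, p3, uv_x, uv_y):
    sin20_y = np.sin(20.0 * uv_y)
    bias = 1.0
    # Channels 0-3: previous layer output (or input RGBD for layer 0)
    # Channels 4-11: p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias
    return np.array([*prev_rgba, p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias],
                    dtype=np.float32)

# Identity weights for p4..p7 must read input channels 8-11, not 4-7.
W = np.zeros((4, 12), dtype=np.float32)
for out_ch, in_ch in enumerate(range(8, 12)):
    W[out_ch, in_ch] = 1.0

x = pack_input([0.1, 0.2, 0.3, 1.0], 0.5, 0.5, 0.5, 0.5, uv_x=0.25, uv_y=0.75)
print(W @ x)  # -> [uv_x, uv_y, sin(20*uv_y), 1.0]
```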

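The "gradient blocking" rationale behind the clamp → sigmoid change can be verified directly in PyTorch. A minimal, standalone check (not part of the training pipeline): for a pre-activation value outside [0, 1], `clamp()` back-propagates a zero gradient, while `sigmoid()` keeps a nonzero gradient flowing.

```python
import torch

# Pre-activation outside [0, 1]: clamp() has zero gradient there, so the
# training signal stops; sigmoid() still passes a nonzero gradient.
x = torch.tensor(2.0, requires_grad=True)

torch.clamp(x, 0, 1).backward()
print(x.grad)  # tensor(0.) -> gradient blocked at the boundary

x.grad = None
torch.sigmoid(x).backward()
print(x.grad)  # tensor(0.1050) -> gradient still flows
```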