Diffstat (limited to 'doc/CNN_V2.md')
-rw-r--r--  doc/CNN_V2.md  43
1 files changed, 40 insertions, 3 deletions
diff --git a/doc/CNN_V2.md b/doc/CNN_V2.md
index 78854ce..abef606 100644
--- a/doc/CNN_V2.md
+++ b/doc/CNN_V2.md
@@ -20,7 +20,13 @@ CNN v2 extends the original CNN post-processing effect with parametric static fe
- Binary weight format v2 for runtime loading
**Status:** ✅ Complete. Training pipeline functional, validation tools ready, mip-level support integrated.
-**TODO:** 8-bit quantization with QAT for 2× size reduction (~1.6 KB)
+
+**Known Issues:**
+- ⚠️ **cnn_test output differs from HTML validation tool** - A visual discrepancy remains after fixing the uv_y inversion and the Layer 0 activation; the root cause is under investigation. Both tools should produce identical output given the same weights and input.
+
+**TODO:**
+- 8-bit quantization with QAT for 2× size reduction (~1.6 KB)
+- Debug cnn_test vs HTML tool output difference
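+
+A minimal comparison sketch for the debugging TODO (this assumes both tools can dump the same frame as a PNG; the file names are illustrative, not actual tool outputs):
+
+```python
+# Quantify the cnn_test vs HTML tool discrepancy pixel-by-pixel.
+import numpy as np
+from PIL import Image
+
+a = np.asarray(Image.open("cnn_test_out.png"), dtype=np.int16)
+b = np.asarray(Image.open("html_tool_out.png"), dtype=np.int16)
+assert a.shape == b.shape, "tools must render at the same resolution"
+diff = np.abs(a - b)
+print("max abs diff:", diff.max(), "mean abs diff:", diff.mean())
+```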
---
@@ -130,6 +136,27 @@ let bias = 1.0; // Learned bias per output channel
// Packed storage: [p0, p1, p2, p3, uv.x, uv.y, sin(20*uv.y), 1.0]
```
+### Input Channel Mapping
+
+**Weight tensor layout (12 input channels per layer):**
+
+| Input Channel | Feature | Description |
+|--------------|---------|-------------|
+| 0-3 | Previous layer output | 4D RGBA from prior CNN layer (or input RGBD for Layer 0) |
+| 4-11 | Static features | 8D: p0, p1, p2, p3, uv_x, uv_y, sin20_y, bias |
+
+**Static feature channel details:**
+- Channel 4 → p0 (RGB.r from mip level)
+- Channel 5 → p1 (RGB.g from mip level)
+- Channel 6 → p2 (RGB.b from mip level)
+- Channel 7 → p3 (depth or RGB channel from mip level)
+- Channel 8 → p4 (uv_x: normalized horizontal position)
+- Channel 9 → p5 (uv_y: normalized vertical position)
+- Channel 10 → p6 (sin(20*uv_y): periodic encoding)
+- Channel 11 → p7 (bias: constant 1.0)
+
+**Note:** When generating identity weights, p4-p7 correspond to input channels 8-11, not 4-7.
+
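+To make the offset concrete, here is a minimal identity-weight sketch (an illustrative helper, not the project's actual generator; the [out_ch, in_ch, ky, kx] weight layout is assumed):
+
+```python
+import numpy as np
+
+def identity_weights(ksize=3, in_ch=12, out_ch=4):
+    """Pass prior-layer channels 0-3 through unchanged; zero everything else."""
+    w = np.zeros((out_ch, in_ch, ksize, ksize), dtype=np.float32)
+    c = ksize // 2  # center tap of the kernel
+    for o in range(out_ch):
+        w[o, o, c, c] = 1.0
+    # To tap a static feature instead, index channels 4-11:
+    # e.g. p4 (uv_x) is input channel 8 and p7 (bias) is channel 11 -- not 4-7.
+    return w
+```
+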
### Feature Rationale
| Feature | Dimension | Purpose | Priority |
@@ -326,12 +353,13 @@ class CNNv2(nn.Module):
kernel_sizes = [3, 3, 3] # Per-layer kernel sizes (e.g., [1,3,5])
num_layers = 3 # Number of CNN layers
mip_level = 0 # Mip level for p0-p3: 0=orig, 1=half, 2=quarter, 3=eighth
+grayscale_loss = False # Compute loss on grayscale (Y) instead of RGBA
learning_rate = 1e-3
batch_size = 16
epochs = 5000
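+
+# Illustrative sketch only (assumed shapes, not the actual CNNv2 definition):
+# per the channel-mapping table, each layer consumes 12 input channels
+# (4 prior-layer outputs + 8 static features) and emits 4 (RGBA).
+import torch.nn as nn
+convs = nn.ModuleList(
+    nn.Conv2d(12, 4, k, padding=k // 2) for k in kernel_sizes[:num_layers]
+)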
# Dataset: Input RGB, Target RGBA (preserves alpha channel from image)
-# Model outputs RGBA, loss compares all 4 channels
+# Model outputs RGBA, loss compares all 4 channels (or grayscale if --grayscale-loss)
# Training loop (standard PyTorch f32)
for epoch in range(epochs):
@@ -344,7 +372,15 @@ for epoch in range(epochs):
    # Forward pass
    output = model(input_rgbd, static_feat)
-    loss = criterion(output, target_batch)
+
+    # Loss computation (grayscale or RGBA)
+    if grayscale_loss:
+        # Convert RGBA to grayscale (BT.601 luma): Y = 0.299*R + 0.587*G + 0.114*B
+        # (alpha is ignored under grayscale loss)
+        output_gray = 0.299 * output[:, 0:1] + 0.587 * output[:, 1:2] + 0.114 * output[:, 2:3]
+        target_gray = 0.299 * target_batch[:, 0:1] + 0.587 * target_batch[:, 1:2] + 0.114 * target_batch[:, 2:3]
+        loss = criterion(output_gray, target_gray)
+    else:
+        loss = criterion(output, target_batch)

    # Backward pass
    optimizer.zero_grad()
@@ -361,6 +397,7 @@ torch.save({
        'kernel_sizes': [3, 3, 3],  # Per-layer kernel sizes
        'num_layers': 3,
        'mip_level': 0,  # Mip level used for p0-p3
+        'grayscale_loss': False,  # Whether grayscale loss was used
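+        # (presumably read back by validation tools to mirror the training loss)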
        'features': ['p0', 'p1', 'p2', 'p3', 'uv.x', 'uv.y', 'sin20_y', 'bias']
    },
    'epoch': epoch,