diff options
Diffstat (limited to 'cnn_v3/docs/HOW_TO_CNN.md')
| -rw-r--r-- | cnn_v3/docs/HOW_TO_CNN.md | 14 |
1 files changed, 7 insertions, 7 deletions
diff --git a/cnn_v3/docs/HOW_TO_CNN.md b/cnn_v3/docs/HOW_TO_CNN.md index 458b68f..4966a61 100644 --- a/cnn_v3/docs/HOW_TO_CNN.md +++ b/cnn_v3/docs/HOW_TO_CNN.md @@ -97,7 +97,7 @@ It calls `pack_photo_sample.py` with both `--photo` and `--target` in a single s | `normal.png` | (128, 128, 0) uint8 | Neutral "no normal" → reconstructed (0,0,1) | | `depth.png` | All zeros uint16 | No depth data | | `matid.png` | All zeros uint8 | No material IDs | -| `shadow.png` | 255 everywhere uint8 | Assume fully lit | +| `shadow.png` | 255 everywhere uint8 | Assume fully lit (used to compute dif) | | `transp.png` | 1 − alpha uint8 | 0 = opaque | | `target.png` | Stylized target RGBA | Ground truth for training | @@ -134,7 +134,7 @@ done ### 1b. From Blender (Full G-Buffer) -Produces all 20 feature channels including normals, depth, mat IDs, and shadow. +Produces all 20 feature channels including normals, depth, mat IDs, and dif (diffuse×shadow). #### Blender requirements @@ -420,7 +420,7 @@ Applied per-sample to make the model robust to missing channels: | Channel group | Channels | Drop probability | |---------------|----------|-----------------| | Geometric | normal.xy, depth, depth_grad.xy [3,4,5,6,7] | `channel_dropout_p` (default 0.3) | -| Context | mat_id, shadow, transp [8,18,19] | `channel_dropout_p × 0.67` (~0.2) | +| Context | mat_id, dif, transp [8,18,19] | `channel_dropout_p × 0.67` (~0.2) | | Temporal | prev.rgb [9,10,11] | 0.5 (always) | This is why a model trained on Blender data also works on photos (geometry zeroed). @@ -781,7 +781,7 @@ Both produced by `export_cnn_v3_weights.py` (§3). | Texture | Format | Size | |---------|--------|------| | `feat_tex0` | rgba32uint | W × H (8 f16: albedo, normal, depth, depth_grad) | -| `feat_tex1` | rgba32uint | W × H (12 u8: mat_id, prev, mip1, mip2, shadow, transp) | +| `feat_tex1` | rgba32uint | W × H (12 u8: mat_id, prev, mip1, mip2, dif, transp) | | `enc0_tex` | rgba16float | W × H | | `enc1_tex` | rgba32uint | W/2 × H/2 (8 f16 packed) | | `bn_tex` | rgba32uint | W/4 × H/4 | @@ -790,7 +790,7 @@ Both produced by `export_cnn_v3_weights.py` (§3). ### Simple mode (photo input) -Albedo = image RGB, mip1/mip2 from GPU mipmaps, shadow = 1.0, transp = 1 − alpha, +Albedo = image RGB, mip1/mip2 from GPU mipmaps, dif = 1.0 (fully lit assumed), transp = 1 − alpha, all geometric channels (normal, depth, depth_grad, mat_id, prev) = 0. ### Browser requirements @@ -843,7 +843,7 @@ all geometric channels (normal, depth, depth_grad, mat_id, prev) = 0. | 9–11 | prev.rgb | previous frame output | zero during training | | 12–14 | mip1.rgb | pyrdown(albedo) | f32 [0,1] | | 15–17 | mip2.rgb | pyrdown(mip1) | f32 [0,1] | -| 18 | shadow | `shadow.png` | f32 [0,1] (1=lit) | +| 18 | dif | computed | f32 [0,1] max(0,dot(normal,KEY_LIGHT))×shadow | | 19 | transp | `transp.png` | f32 [0,1] (0=opaque) | **Feature texture packing** (`feat_tex0` / `feat_tex1`, both `rgba32uint`): @@ -858,6 +858,6 @@ feat_tex0 (4×u32 = 8 f16 channels via pack2x16float): feat_tex1 (4×u32 = 12 u8 channels + padding via pack4x8unorm): .x = pack4x8unorm(mat_id, prev.r, prev.g, prev.b) .y = pack4x8unorm(mip1.r, mip1.g, mip1.b, mip2.r) - .z = pack4x8unorm(mip2.g, mip2.b, shadow, transp) + .z = pack4x8unorm(mip2.g, mip2.b, dif, transp) .w = 0 (unused, 8 reserved channels) ``` |
