# Geometry Buffer Design [IN PROGRESS]

**Status:** Ideation phase

**Goal:** Efficient G-buffer for deferred rendering in 64k demo

---

## Overview

Replace direct rendering with geometry buffer accumulation for advanced post-processing and lighting.

**Target:** 8-10 bytes/pixel, 16-bit precision

---

## Buffer Elements

### Core Attributes

| Attribute | Channels | Precision | Source |
|-----------|----------|-----------|--------|
| Albedo (RGB) | 3 | f16 | Material/procedural |
| Roughness | 1 | u8/u16 | PBR material property |
| Metallic | 1 | u8/u16 | PBR material property |
| Normal (XYZ) | 2 | f16 | Octahedral encoding |
| Depth | 1 | f16/f32 | 1/z for precision |
| Object/Material ID | 1 | u16 | Rasterization/SDF |
| Transparency | 1 | u8/u16 | Alpha channel |

### Optional/Derived

| Attribute | Storage | Notes |
|-----------|---------|-------|
| Depth gradient | On-demand | Compute from depth (Sobel) |
| Laplacian | On-demand | Second derivative of depth |
| Motion vectors | 2×f16 | Screen-space XY |
| AO | 1×f16 | Ambient occlusion |

**Key insight:** Depth derivatives cheaper to compute than store (2-4 bytes/pixel saved).

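
As a rough sanity check on the tables above, the sketch below sums the unpacked size of the core attributes. It assumes the narrowest precision listed for each attribute (u8 for roughness, metallic and transparency; f16 for depth) and f16 storage for the derivatives. The `Attribute` struct and all function and constant names are hypothetical, introduced only for this illustration.

```cpp
#include <vector>

// Hypothetical helper type for this sketch; not part of the design itself.
struct Attribute {
    const char* name;
    int channels;
    int bytes_per_channel;  // assumed storage width per channel
};

// Size of a single attribute in bytes per pixel.
int attribute_bytes(const Attribute& a) {
    return a.channels * a.bytes_per_channel;
}

// Unpacked size of a list of attributes in bytes per pixel.
int bytes_per_pixel(const std::vector<Attribute>& attrs) {
    int total = 0;
    for (const Attribute& a : attrs) total += attribute_bytes(a);
    return total;
}

// Core attributes from the table, using the narrowest listed precision.
std::vector<Attribute> core_attributes() {
    return {
        {"albedo", 3, 2},        // 3 x f16
        {"roughness", 1, 1},     // u8
        {"metallic", 1, 1},      // u8
        {"normal_oct", 2, 2},    // 2 x f16, octahedral
        {"depth", 1, 2},         // f16
        {"material_id", 1, 2},   // u16
        {"transparency", 1, 1},  // u8
    };
}

// Derived attributes, if they were stored instead of computed (assumed f16).
const Attribute kDepthGradient{"depth_gradient", 2, 2};
const Attribute kLaplacian{"laplacian", 1, 2};
```

Under those assumptions `bytes_per_pixel(core_attributes())` comes to 17 bytes/pixel before any packing, so the 8-10 bytes/pixel target depends on bit-packing. Storing the gradient or the Laplacian would add 4 or 2 bytes respectively, which is where the 2-4 bytes/pixel saving comes from.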

---

## Packing Strategies

### Traditional Multi-Render-Target (MRT)

```
RT0 (RGBA16): Albedo.rgb + Roughness (packed with metallic)
RT1 (RG16):   Octahedral normal (2 channels encode XYZ)
RT2 (R32F):   1/z depth (or use hardware depth buffer)
RT3 (RG16):   Motion vectors XY
RT4 (R16UI):  Object/Material ID
```

**Total:** 4-5 render targets. As listed this is 22 bytes/pixel (8 + 4 + 4 + 4 + 2), or 18 with the hardware depth buffer in place of RT2, so reaching the 8-10 bytes/pixel target needs narrower formats or fewer targets.

### Compute Shader + Storage Buffer (RECOMMENDED)

**Advantages:**
- Custom bit-packing (not bound to RGBA formats)
- Compute derivatives in-pass (depth gradient, Laplacian)
- Cache-optimized tiling (Morton order)
- No MRT limits (store 20+ attributes)

**Tradeoffs:**
- No hardware depth/early-Z during G-buffer generation
- Manual atomics needed when pixels overdraw
- Lose ROP hardware optimizations

**Struct Layout:**

```cpp
// u32/u16/f16 are shorthand for 32-bit/16-bit unsigned and half-float types.
struct GBufferPixel {
    u32 packed_normal;  // Octahedral 16+16
    u32 rgba_rough;     // RGB8 albedo + Roughness8 = 32 bits (metallic not yet packed)
    f16 inv_z;          // 1/z depth
    u16 material_id;    // Object/material
    // Total: 12 bytes/pixel
};

// Compressed variant (8 bytes):
struct CompactGBuffer {
    u32 normal_depth;  // Oct16 normal + u16 quantized depth
    u32 rgba_params;   // RGB565 + Rough4 + Metal4 + Flags4 (28 bits used)
};
```

**Access Pattern:**

```wgsl
@group(0) @binding(0) var<storage, read_write> g_buffer: array<GBufferPixel>;

fn write_gbuffer(pixel_id: u32, data: SurfaceData) {
    g_buffer[pixel_id].packed_normal = pack_octahedral(data.normal);
    // pack_rgba8 must leave the top byte zero; roughness is stored there.
    g_buffer[pixel_id].rgba_rough = pack_rgba8(data.albedo) | (u32(data.roughness * 255.0) << 24u);
    // f16 requires `enable f16;`. WGSL has no 16-bit integer type, so in practice
    // inv_z and material_id would likely share one u32.
    g_buffer[pixel_id].inv_z = f16(1.0 / data.depth);
    g_buffer[pixel_id].material_id = data.id;
}
```

---

## Normal Encoding

**Octahedral mapping** (most efficient for 2-channel storage):
- Encodes unit sphere normal to 2D square
- 16-bit per channel = good precision
- Fast encode/decode (no trig)

```cpp
// GLSL-style vector types (vec2/vec3) and swizzles are assumed.
vec2 octahedral_encode(vec3 n) {
    n /= (abs(n.x) + abs(n.y) + abs(n.z));  // project onto the octahedron
    // sign() returns 0 on the axes, which would collapse e.g. (0, 0, -1) to the
    // wrong point, so use a sign that is never zero.
    vec2 s = vec2(n.x >= 0.0 ? 1.0 : -1.0, n.y >= 0.0 ? 1.0 : -1.0);
    vec2 p = n.z >= 0.0 ? n.xy : (1.0 - abs(n.yx)) * s;  // fold lower hemisphere
    return p * 0.5 + 0.5;  // [0, 1]
}

vec3 octahedral_decode(vec2 p) {
    p = p * 2.0 - 1.0;  // [-1, 1]
    vec3 n = vec3(p.x, p.y, 1.0 - abs(p.x) - abs(p.y));
    float t = max(-n.z, 0.0);  // non-zero only for the folded lower hemisphere
    n.x += n.x >= 0.0 ? -t : t;
    n.y += n.y >= 0.0 ? -t : t;
    return normalize(n);
}
```

---

## Depth Storage

**1/z (inverse depth):**
- Better precision distribution (more bits near camera)
- Linear in screen space
- Matches perspective projection

**Alternatives:**
- Logarithmic depth (even better precision)
- Hardware depth buffer (32-bit float, free when rasterizing to render targets)

---

## Material Properties

**Roughness/Metallic are NOT geometry:**
- **Source:** Texture lookups, procedural noise, or constants
- **Not bump-mapping:** Bump/normal maps perturb normals (geometry)
- **PBR properties:** Control light interaction (0=smooth/dielectric, 1=rough/metal)

**Demoscene approach:** Procedural generation or baked constants (avoid textures).

---

## Post-Processing Derivatives

**Compute on-demand** (cheaper than storing):

```wgsl
// Assumes `depth` is bound as a texture_2d<f32> with 1/z in .r.
// Coordinates are integer texel positions; edges are not clamped here.

// Depth gradient (central differences, a cheap stand-in for a full Sobel kernel)
fn depth_gradient(coord: vec2i) -> vec2f {
    let dx = textureLoad(depth, coord + vec2i(1, 0), 0).r - textureLoad(depth, coord - vec2i(1, 0), 0).r;
    let dy = textureLoad(depth, coord + vec2i(0, 1), 0).r - textureLoad(depth, coord - vec2i(0, 1), 0).r;
    return vec2f(dx, dy) * 0.5;
}

// Laplacian (edge detection)
fn laplacian(coord: vec2i) -> f32 {
    let c = textureLoad(depth, coord, 0).r;
    let n = textureLoad(depth, coord + vec2i(0, 1), 0).r;
    let s = textureLoad(depth, coord - vec2i(0, 1), 0).r;
    let e = textureLoad(depth, coord + vec2i(1, 0), 0).r;
    let w = textureLoad(depth, coord - vec2i(1, 0), 0).r;
    return (n + s + e + w) - 4.0 * c;
}
```

---

## Integration with Hybrid Renderer

**Current:** Hybrid SDF raymarching + rasterized proxy geometry

**Future:** Both write to unified G-buffer

```cpp
// Rasterization pass
void rasterize_geometry() {
    // Vertex shader → fragment shader
    // Write to G-buffer (compute or MRT)
}

// SDF raymarching pass (compute)
void raymarch_sdf() {
    // Per-pixel ray march
    // Write to same G-buffer at hit points
}

// Deferred lighting pass
void deferred_lighting() {
    // Read G-buffer
    // Apply PBR lighting, shadows, etc.
}
```

**Atomics handling:** When the raster and SDF passes hit the same pixel, resolve the conflict with a depth test or tile-based sorting.

---

## Size Budget

**Target:** 1920×1080 @ 8 bytes/pixel ≈ **16.6 MB**

**Compressed:** 1920×1080 @ 6 bytes/pixel ≈ **12.4 MB**

**Acceptable for 64k demo:** The 64k limit applies to binary size, not runtime RAM.

---

## Next Steps

1. Prototype compute shader G-buffer writer
2. Implement octahedral normal encoding
3. Test SDF + raster unified writes
4. Add deferred lighting pass
5. Validate depth derivative quality (gradient/Laplacian)
6. Optimize packing (aim for 6-8 bytes/pixel)

---

## References

- Octahedral mapping: "A Survey of Efficient Representations for Independent Unit Vectors" (Cigolle et al., 2014)
- PBR theory: "Physically Based Rendering" (Pharr, Jakob, Humphreys)
- G-buffer design: "Deferred Rendering in Killzone 2" (Valient, 2007)