# Geometry Buffer Design [IN PROGRESS]

**Status:** Ideation phase
**Goal:** Efficient G-buffer for deferred rendering in 64k demo

---

## Overview

Replace direct rendering with geometry buffer accumulation for advanced post-processing and lighting.

**Target:** 8-10 bytes/pixel, 16-bit precision

---

## Buffer Elements

### Core Attributes

| Attribute | Channels | Precision | Source |
|-----------|----------|-----------|--------|
| Albedo (RGB) | 3 | f16 | Material/procedural |
| Roughness | 1 | u8/u16 | PBR material property |
| Metallic | 1 | u8/u16 | PBR material property |
| Normal (XYZ) | 2 | f16 | Octahedral encoding |
| Depth | 1 | f16/f32 | 1/z for precision |
| Object/Material ID | 1 | u16 | Rasterization/SDF |
| Transparency | 1 | u8/u16 | Alpha channel |

### Optional/Derived

| Attribute | Storage | Notes |
|-----------|---------|-------|
| Depth gradient | On-demand | Compute from depth (central differences or Sobel) |
| Laplacian | On-demand | Second derivative of depth |
| Motion vectors | 2×f16 | Screen-space XY |
| AO | 1×f16 | Ambient occlusion |

**Key insight:** Depth derivatives are cheaper to compute than to store (2-4 bytes/pixel saved per derivative).

---

## Packing Strategies

### Traditional Multi-Render-Target (MRT)

```
RT0 (RGBA16): Albedo.rgb + Roughness (packed with metallic)
RT1 (RG16):   Octahedral normal (2 channels encode XYZ)
RT2 (R32F):   1/z depth (or use hardware depth buffer)
RT3 (RG16):   Motion vectors XY
RT4 (R16UI):  Object/Material ID
```

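**Total:** 4-5 render targets = 18-22 bytes/pixel (8 + 4 + 4 + 4 + 2 with all five; 18 when the hardware depth buffer replaces RT2), well above the 8-10 byte target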

### Compute Shader + Storage Buffer (RECOMMENDED)

**Advantages:**
- Custom bit-packing (not bound to RGBA formats)
- Compute derivatives in-pass (depth gradient, Laplacian)
- Cache-optimized tiling (Morton order)
- No MRT limits (store 20+ attributes)

**Tradeoffs:**
- No hardware depth test or early-Z during G-buffer generation
- Manual atomics needed when several fragments land on the same pixel (overdraw)
- Lose ROP hardware optimizations

**Struct Layout:**
```cpp
struct GBufferPixel {
  u32 packed_normal;      // Octahedral 16+16
  u32 rgba_rough;         // RGB8 albedo (bits 0-23) + Roughness8 (bits 24-31); no bits left for Metallic8
  f16 inv_z;              // 1/z depth
  u16 material_id;        // Object/material
  // Total: 12 bytes/pixel
};

// Compressed variant (8 bytes):
struct CompactGBuffer {
  u32 normal_depth;       // Oct16 normal + u16 quantized depth
  u32 rgba_params;        // RGB565 + Rough4 + Metal4 + Flags4
};
```
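The `rgba_params` word of the compact variant could be packed along the following lines. This is a minimal sketch: the bit positions, the `quantize` helper, and the `MaterialParams` struct are assumptions made for illustration, since the layout above only fixes the field widths (16 + 4 + 4 + 4 bits, leaving 4 bits spare).

```cpp
#include <algorithm>
#include <cstdint>

// Assumed bit layout (not fixed by the design above):
// [31:16] RGB565 albedo, [15:12] roughness, [11:8] metallic, [7:4] flags, [3:0] unused.

// Clamp v to [0, 1], then round to the nearest code in [0, max_code].
inline uint32_t quantize(float v, uint32_t max_code) {
  float c = std::min(std::max(v, 0.0f), 1.0f);
  return static_cast<uint32_t>(c * static_cast<float>(max_code) + 0.5f);
}

inline uint32_t pack_rgba_params(float r, float g, float b,
                                 float roughness, float metallic, uint32_t flags) {
  uint32_t rgb565 = (quantize(r, 31) << 11) | (quantize(g, 63) << 5) | quantize(b, 31);
  return (rgb565 << 16) | (quantize(roughness, 15) << 12) |
         (quantize(metallic, 15) << 8) | ((flags & 0xFu) << 4);
}

// Hypothetical unpacked form, used only by this sketch.
struct MaterialParams { float r, g, b, roughness, metallic; uint32_t flags; };

inline MaterialParams unpack_rgba_params(uint32_t p) {
  MaterialParams m;
  m.r = static_cast<float>((p >> 27) & 31u) / 31.0f;
  m.g = static_cast<float>((p >> 21) & 63u) / 63.0f;
  m.b = static_cast<float>((p >> 16) & 31u) / 31.0f;
  m.roughness = static_cast<float>((p >> 12) & 15u) / 15.0f;
  m.metallic = static_cast<float>((p >> 8) & 15u) / 15.0f;
  m.flags = (p >> 4) & 15u;
  return m;
}
```

With only 16 levels each, roughness and metallic will band visibly on smooth gradients, so this variant probably suits constant or coarsely varying materials best.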

**Access Pattern:**
```wgsl
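// No u16 in WGSL: inv_z (f16) and material_id share one u32, keeping the 12-byte layout.
struct GBufferPixel {
  packed_normal: u32,  // octahedral 16+16
  rgba_rough: u32,     // RGB8 albedo + roughness in the top byte
  depth_id: u32,       // low 16 bits: f16 1/z, high 16 bits: material ID
}

@group(0) @binding(0) var<storage, read_write> g_buffer: array<GBufferPixel>;

// SurfaceData, pack_octahedral and pack_rgba8 are assumed to be defined elsewhere.
fn write_gbuffer(pixel_id: u32, data: SurfaceData) {
  let rough = u32(clamp(data.roughness, 0.0, 1.0) * 255.0 + 0.5);
  let inv_z = pack2x16float(vec2f(1.0 / data.depth, 0.0));  // f16 in the low half
  g_buffer[pixel_id].packed_normal = pack_octahedral(data.normal);
  g_buffer[pixel_id].rgba_rough = (pack_rgba8(data.albedo) & 0x00FFFFFFu) | (rough << 24u);
  g_buffer[pixel_id].depth_id = inv_z | ((data.id & 0xFFFFu) << 16u);
}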
```
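The Morton-order tiling listed among the advantages would determine how `pixel_id` is derived from screen coordinates. The sketch below is one possible scheme, not a settled design: it assumes 8×8 tiles stored row-major with Z-order inside each tile, and `tiled_pixel_id` is a hypothetical helper name. Tiling avoids padding the whole frame to a power-of-two square.

```cpp
#include <cstdint>

// Spread the low 16 bits of v so that bit i moves to bit 2*i.
inline uint32_t part1by1(uint32_t v) {
  v &= 0x0000FFFFu;
  v = (v | (v << 8)) & 0x00FF00FFu;
  v = (v | (v << 4)) & 0x0F0F0F0Fu;
  v = (v | (v << 2)) & 0x33333333u;
  v = (v | (v << 1)) & 0x55555555u;
  return v;
}

// Interleave x (even bits) and y (odd bits) into a Z-order index.
inline uint32_t morton2d(uint32_t x, uint32_t y) {
  return part1by1(x) | (part1by1(y) << 1);
}

// Hypothetical addressing: row-major 8x8 tiles, Morton order inside each tile.
// The buffer must hold a whole number of tiles in both directions.
inline uint32_t tiled_pixel_id(uint32_t x, uint32_t y, uint32_t width) {
  const uint32_t tiles_per_row = (width + 7u) / 8u;
  const uint32_t tile = (y / 8u) * tiles_per_row + (x / 8u);
  return tile * 64u + morton2d(x % 8u, y % 8u);
}
```

At 1920×1080 both dimensions are multiples of 8, so the tiled buffer has exactly as many entries as a linear one.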

---

## Normal Encoding

**Octahedral mapping** (most efficient for 2-channel storage):
- Encodes unit sphere normal to 2D square
- 16-bit per channel = good precision
- Fast encode/decode (no trig)

```cpp
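vec2 octahedral_encode(vec3 n) {
  n /= (abs(n.x) + abs(n.y) + abs(n.z));
  // sign() returns 0 for a zero component, which collapses normals such as (0, 0, -1);
  // use a sign that is never zero when folding the lower hemisphere.
  vec2 s = vec2(n.x >= 0.0 ? 1.0 : -1.0, n.y >= 0.0 ? 1.0 : -1.0);
  vec2 p = n.z >= 0.0 ? n.xy : (1.0 - abs(n.yx)) * s;
  return p * 0.5 + 0.5; // [0, 1]
}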

vec3 octahedral_decode(vec2 p) {
  p = p * 2.0 - 1.0; // [-1, 1]
  vec3 n = vec3(p.x, p.y, 1.0 - abs(p.x) - abs(p.y));
  float t = max(-n.z, 0.0);
  n.x += n.x >= 0.0 ? -t : t;
  n.y += n.y >= 0.0 ? -t : t;
  return normalize(n);
}
```
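For CPU-side testing of the 16+16-bit packing used by `packed_normal`, the same mapping can be written in plain C++. This is a sketch under assumptions: the `Vec3` type, the helper names, and the choice of putting x in the low half of the word are illustrative, not taken from the design.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Returns +1 for v >= 0 and -1 otherwise (never 0, unlike a plain sign()).
inline float sign_not_zero(float v) { return v >= 0.0f ? 1.0f : -1.0f; }

// Map a [-1, 1] coordinate to a 16-bit unsigned code with rounding.
inline uint32_t to_unorm16(float v) {
  float u = std::min(std::max(v * 0.5f + 0.5f, 0.0f), 1.0f);
  return static_cast<uint32_t>(u * 65535.0f + 0.5f);
}

// Encode a unit normal as two 16-bit octahedral coordinates (x in the low half).
inline uint32_t pack_octahedral(Vec3 n) {
  float inv_l1 = 1.0f / (std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z));
  float px = n.x * inv_l1, py = n.y * inv_l1;
  if (n.z < 0.0f) {  // fold the lower hemisphere over the diagonals
    float fx = (1.0f - std::fabs(py)) * sign_not_zero(px);
    float fy = (1.0f - std::fabs(px)) * sign_not_zero(py);
    px = fx;
    py = fy;
  }
  return to_unorm16(px) | (to_unorm16(py) << 16);
}

// Decode back to a unit normal; mirrors octahedral_decode above.
inline Vec3 unpack_octahedral(uint32_t packed) {
  float px = static_cast<float>(packed & 0xFFFFu) / 65535.0f * 2.0f - 1.0f;
  float py = static_cast<float>(packed >> 16) / 65535.0f * 2.0f - 1.0f;
  float nz = 1.0f - std::fabs(px) - std::fabs(py);
  float t = std::max(-nz, 0.0f);
  px += px >= 0.0f ? -t : t;
  py += py >= 0.0f ? -t : t;
  float inv_len = 1.0f / std::sqrt(px * px + py * py + nz * nz);
  return {px * inv_len, py * inv_len, nz * inv_len};
}
```

Normals with a zero component in the lower hemisphere, such as (0, 0, -1) or (0.6, 0, -0.8), are the useful round-trip cases, because they are the ones a zero-returning `sign()` would break.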

---

## Depth Storage

**1/z (inverse depth):**
- Better precision distribution (more bits near camera)
- Linear in screen space
- Matches perspective projection

**Alternatives:**
- Logarithmic depth (roughly constant relative precision across the range)
- Hardware depth buffer (32-bit float depth attachment, available for free when rasterizing)
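
To make the precision trade-off concrete, here is a minimal sketch of storing 1/z as a 16-bit code, as the compact layout's quantized depth might do. The normalization by a near-plane distance, the reserved code 0, and all function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Quantize view-space depth z (z >= near_plane > 0) as a 16-bit code of near/z.
// Code 65535 is the near plane; code 0 is reserved for "no hit" / infinitely far.
inline uint16_t quantize_inv_z(float z, float near_plane) {
  assert(near_plane > 0.0f && z >= near_plane);
  float ratio = near_plane / z;  // in (0, 1]
  return static_cast<uint16_t>(std::lround(ratio * 65535.0f));
}

// Recover view-space depth from a code; code 0 maps to infinity.
inline float dequantize_inv_z(uint16_t code, float near_plane) {
  if (code == 0) return std::numeric_limits<float>::infinity();
  return near_plane * 65535.0f / static_cast<float>(code);
}

// Approximate size of one quantization step at depth z: z^2 / (near * 65535).
inline float depth_step_at(float z, float near_plane) {
  return z * z / (near_plane * 65535.0f);
}
```

The step size grows with the square of the distance, which is the "more bits near camera" behaviour in numbers: with a near plane at 1 unit, one step is about 0.0015 units at z = 10 and about 0.15 units at z = 100.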

---

## Material Properties

**Roughness/Metallic are NOT geometry:**
- **Source:** Texture lookups, procedural noise, or constants
- **Not bump-mapping:** Bump/normal maps perturb normals (geometry)
- **PBR properties:** Control light interaction (roughness: 0 = smooth, 1 = rough; metallic: 0 = dielectric, 1 = metal)

**Demoscene approach:** Procedural generation or baked constants (avoid textures).

---

## Post-Processing Derivatives

**Compute on-demand** (cheaper than storing):

```wgsl
// Assumes depth is bound as a single-channel float texture (binding index is illustrative).
@group(0) @binding(1) var depth: texture_2d<f32>;

// textureLoad takes integer texel coordinates and a mip level.
fn load_depth(p: vec2i) -> f32 {
  return textureLoad(depth, p, 0).r;
}

// Depth gradient (central differences; a full Sobel kernel would also smooth)
fn depth_gradient(p: vec2i) -> vec2f {
  let dx = load_depth(p + vec2i(1, 0)) - load_depth(p - vec2i(1, 0));
  let dy = load_depth(p + vec2i(0, 1)) - load_depth(p - vec2i(0, 1));
  return vec2f(dx, dy) * 0.5;
}

// Laplacian (edge detection)
fn laplacian(p: vec2i) -> f32 {
  let c = load_depth(p);
  let n = load_depth(p + vec2i(0, 1));
  let s = load_depth(p - vec2i(0, 1));
  let e = load_depth(p + vec2i(1, 0));
  let w = load_depth(p - vec2i(1, 0));
  return (n + s + e + w) - 4.0 * c;
}
```
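A CPU reference for the same two filters can help when validating derivative quality against known depth fields. This is a sketch only: `DepthImage`, its clamp-to-edge fetch, and the `Vec2` type are assumptions introduced here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct DepthImage {
  int width, height;
  std::vector<float> data;  // row-major, width * height entries
  // Clamp-to-edge fetch so border pixels can be filtered too.
  float at(int x, int y) const {
    x = std::min(std::max(x, 0), width - 1);
    y = std::min(std::max(y, 0), height - 1);
    return data[static_cast<std::size_t>(y) * width + x];
  }
};

struct Vec2 { float x, y; };

// Central-difference gradient, matching the shader sketch.
inline Vec2 depth_gradient(const DepthImage& d, int x, int y) {
  return {0.5f * (d.at(x + 1, y) - d.at(x - 1, y)),
          0.5f * (d.at(x, y + 1) - d.at(x, y - 1))};
}

// 5-point Laplacian: zero on planar depth, large at depth discontinuities.
inline float laplacian(const DepthImage& d, int x, int y) {
  return d.at(x + 1, y) + d.at(x - 1, y) + d.at(x, y + 1) + d.at(x, y - 1)
         - 4.0f * d.at(x, y);
}
```

On a planar ramp such as depth = 2x + 3y, the gradient should come out as (2, 3) and the Laplacian as 0 at interior pixels, which gives a quick sanity check before trying real scenes.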

---

## Integration with Hybrid Renderer

**Current:** Hybrid SDF raymarching + rasterized proxy geometry
**Future:** Both write to unified G-buffer

```cpp
// Rasterization pass
void rasterize_geometry() {
  // Vertex shader → fragment shader
  // Write to G-buffer (compute or MRT)
}

// SDF raymarching pass (compute)
void raymarch_sdf() {
  // Per-pixel ray march
  // Write to same G-buffer at hit points
}

// Deferred lighting pass
void deferred_lighting() {
  // Read G-buffer
  // Apply PBR lighting, shadows, etc.
}
```

**Atomics handling:** When both passes can write the same pixel, resolve conflicts with a per-pixel depth comparison (nearest surface wins) or with tile-based sorting.
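
A depth-comparison resolve could look like the following CPU-side sketch. The key layout and the names `make_key` and `write_if_nearer` are assumptions for illustration. On the GPU, the loop would likely reduce to a single `atomicMax` on an `atomic<u32>`.

```cpp
#include <atomic>
#include <cstdint>

// Key layout (an assumption for this sketch): [31:16] quantized 1/z, [15:0] payload index.
// Larger 1/z means nearer, so keeping the maximum key keeps the nearest surface.
inline uint32_t make_key(uint16_t inv_z_code, uint16_t payload) {
  return (static_cast<uint32_t>(inv_z_code) << 16) | payload;
}

// Atomically store key if it is greater than the current value.
// Returns true when this write won the pixel.
inline bool write_if_nearer(std::atomic<uint32_t>& slot, uint32_t key) {
  uint32_t current = slot.load(std::memory_order_relaxed);
  while (key > current) {
    // On failure, compare_exchange_weak reloads current and the loop re-tests.
    if (slot.compare_exchange_weak(current, key, std::memory_order_relaxed)) {
      return true;
    }
  }
  return false;
}
```

Only 16 bits of payload ride along with the depth here, so the full pixel data would still need a second pass or a wider atomic. That cost is part of what the tile-based alternative avoids.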

---

## Size Budget

**Target:** 1920×1080 @ 8 bytes/pixel ≈ **16.6 MB**
**Compressed:** 1920×1080 @ 6 bytes/pixel ≈ **12.4 MB**

**Acceptable for 64k demo:** The 64 KB limit applies to binary size, not runtime memory, so this RAM usage is fine.
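
The budget arithmetic can be checked at compile time. This is a small sketch: `gbuffer_bytes` is a hypothetical helper, and the 12 bytes/pixel case corresponds to the uncompressed `GBufferPixel` struct.

```cpp
#include <cstdint>

// Bytes needed for a full-screen buffer at the given per-pixel size.
constexpr uint64_t gbuffer_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel) {
  return static_cast<uint64_t>(width) * height * bytes_per_pixel;
}

static_assert(gbuffer_bytes(1920, 1080, 8) == 16588800, "8 B/px at 1080p");
static_assert(gbuffer_bytes(1920, 1080, 6) == 12441600, "6 B/px at 1080p");
static_assert(gbuffer_bytes(1920, 1080, 12) == 24883200, "12 B/px at 1080p");
```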

---

## Next Steps

1. Prototype compute shader G-buffer writer
2. Implement octahedral normal encoding
3. Test SDF + raster unified writes
4. Add deferred lighting pass
5. Validate depth derivative quality (gradient/Laplacian)
6. Optimize packing (aim for 6-8 bytes/pixel)

---

## References

- Octahedral mapping: "A Survey of Efficient Representations for Independent Unit Vectors" (Cigolle et al., 2014)
- PBR theory: "Physically Based Rendering" (Pharr, Jakob, Humphreys)
- G-buffer design: "Deferred Rendering in Killzone 2" (Valient, 2007)