# CNN v2: Parametric Post-Processing Neural Network

- **Architecture:** 3-layer compute, storage buffer (~3.2 KB)
- **Features:** 7D static (RGBD + UV + sin + bias), sigmoid activation

## Quick Start

```bash
./cnn_v2/scripts/train_cnn_v2_full.sh
```

## Documentation

- [CNN_V2.md](docs/CNN_V2.md) - Architecture and implementation details
- [CNN_V2_BINARY_FORMAT.md](docs/CNN_V2_BINARY_FORMAT.md) - Weight format specification
- [CNN_V2_WEB_TOOL.md](docs/CNN_V2_WEB_TOOL.md) - Validation tool documentation
- [CNN_V2_DEBUG_TOOLS.md](docs/CNN_V2_DEBUG_TOOLS.md) - Debugging and analysis tools

## Integration

- **C++:** `cnn_v2/src/cnn_v2_effect.{h,cc}`
- **Assets:** `workspaces/main/assets.txt` (lines 47-49)
- **Test:** `src/tests/gpu/test_demo_effects.cc` (line 93)

## Directory Structure

```
cnn_v2/
├── README.md              # This file
├── src/
│   ├── cnn_v2_effect.h    # Effect header
│   └── cnn_v2_effect.cc   # Effect implementation
├── shaders/               # WGSL shaders (6 files)
├── weights/               # Binary weights (3 files)
├── training/              # Python training scripts (4 files)
├── scripts/               # Shell scripts (train_cnn_v2_full.sh)
├── tools/                 # Validation tools (HTML)
└── docs/                  # Documentation (4 markdown files)
```

## Training Pipeline

1. **Train model:** `./cnn_v2/scripts/train_cnn_v2_full.sh`
2. **Export weights:** Automatic, as part of the training script (binary format, ~3.2 KB)
3. **Validate:** HTML tool at `cnn_v2/tools/cnn_v2_test/index.html`

For detailed training options, run `./cnn_v2/scripts/train_cnn_v2_full.sh --help`.
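
After export, a quick size check can confirm that the weight file stays within the sub-10KB target listed under Key Features. The snippet below is a minimal sketch, not part of the repository: the helper name `within_budget` is hypothetical, the path to the exported file must be supplied by the caller, and "10KB" is read as 10 KiB (10240 bytes), which is an assumption.

```python
import os

# Assumption: the "sub-10KB" target is interpreted as strictly under 10 KiB.
SIZE_BUDGET_BYTES = 10 * 1024


def within_budget(path, budget=SIZE_BUDGET_BYTES):
    """Return (size_in_bytes, ok) for an exported weight file.

    `path` is whatever file the export step produced; no file name
    is assumed here.
    """
    size = os.path.getsize(path)
    return size, size < budget
```

Calling `within_budget` on an exported file of about 3.2 KB would return its byte size together with `True`.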

## Key Features

- **Parametric static features:** 7D input (RGBD + UV + sin encoding + bias)
- **Storage buffer architecture:** Dynamic layer count, compact binary format
- **Sigmoid activation:** Smooth gradients, prevents training collapse
- **Patch-based training:** Sample-efficient, focuses on salient regions
- **Sub-10 KB target:** Achieved with 3-layer model (~3.2 KB)
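
As a rough illustration of the static features and the sigmoid activation, the sketch below builds one per-pixel input vector and evaluates the activation and its gradient. Several details are assumptions made for illustration and are not taken from the shaders or training scripts: the channel order, the sine frequency `SIN_FREQ`, the use of `u` as the sine argument, and the reading of "7D" as RGBD + UV + one sine term with the bias as an extra constant channel.

```python
import math

SIN_FREQ = 10.0  # hypothetical frequency for the sine encoding


def static_features(r, g, b, d, u, v):
    """Return per-pixel static inputs: RGBD, UV, one sine term, constant bias."""
    return [r, g, b, d, u, v, math.sin(SIN_FREQ * u), 1.0]


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s), with maximum 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

Unlike a ReLU, whose gradient is exactly zero for all negative inputs, the sigmoid gradient varies smoothly with the input. That is one plausible reading of "prevents training collapse", though the training scripts remain the authority on the actual motivation.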

## Next Steps

- **8-bit quantization:** 2× size reduction (~1.6 KB) via quantization-aware training (QAT)
- **CNN v3:** U-Net architecture for enhanced quality (separate directory)
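
The ~1.6 KB estimate follows from halving the bytes stored per weight. The sketch below assumes the current format stores 16-bit weights (implied by the 2× figure, but not stated in this file) and ignores any header overhead. It uses a simple symmetric per-tensor int8 scheme purely for illustration; the scheme that quantization-aware training would actually use in this project may differ, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the shared scale."""
    return [v * scale for v in q]


def payload_bytes(num_weights, bits_per_weight):
    """Size of the raw weight payload, excluding any file header."""
    return num_weights * bits_per_weight // 8


# Illustrative count only: ~3.2 KB at 2 bytes per weight is about 1600 weights.
NUM_WEIGHTS = 1600
size_16bit = payload_bytes(NUM_WEIGHTS, 16)  # 3200 bytes
size_8bit = payload_bytes(NUM_WEIGHTS, 8)    # 1600 bytes
```

With this scheme the per-weight rounding error is at most half of `scale`. Training with quantization in the loop is meant to let the network compensate for exactly this kind of error.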