Parallelization and Scaling
Parallelization Strategies
Data Parallelism
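Each GPU keeps a full replica of the model and sees a different slice of the global batch; gradients are all-reduced after the backward pass. A minimal sketch with PyTorch DDP, assuming a `torchrun` launch and a placeholder model:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes launch via `torchrun --nproc-per-node=<gpus> train.py`,
# which sets RANK, WORLD_SIZE, and LOCAL_RANK for init_process_group.
dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = nn.Linear(1024, 1024).cuda()         # placeholder for a real model
model = DDP(model, device_ids=[local_rank])  # replicate; sync grads on backward
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024, device="cuda")      # each rank loads its own batch shard
loss = model(x).pow(2).mean()                # placeholder loss
loss.backward()                              # gradients are all-reduced here
opt.step()
dist.destroy_process_group()
```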
Pipeline Parallelism
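The model's layers are split into sequential stages on different GPUs, and each batch is cut into micro-batches so the stages overlap work instead of idling. A toy single-process sketch of a GPipe-style forward schedule (stages and shapes are made up):

```python
import torch
import torch.nn as nn

# In a real setup each stage lives on a different GPU and micro-batches
# flow through the stages concurrently; here both run in one process.
stage0 = nn.Linear(32, 64)       # "GPU 0"
stage1 = nn.Linear(64, 8)        # "GPU 1"

batch = torch.randn(16, 32)
micro_batches = batch.chunk(4)   # 4 micro-batches of 4 samples each

outputs = []
for mb in micro_batches:         # GPipe-style schedule, forward only
    h = stage0(mb)               # stage 0 output is sent to stage 1
    outputs.append(stage1(h))
out = torch.cat(outputs)
assert out.shape == (16, 8)
```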
Model Sharding
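Rather than replicating the whole model per GPU, parameters, gradients, and optimizer state are partitioned across ranks (ZeRO-style) and gathered on the fly. A minimal sketch with PyTorch FSDP, again assuming a `torchrun` launch:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes launch via torchrun, as in the DDP sketch above.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()
model = FSDP(model)  # FULL_SHARD by default: params, grads, optimizer state partitioned

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 1024, device="cuda")
model(x).pow(2).mean().backward()  # shards are gathered/freed around each submodule
opt.step()
dist.destroy_process_group()
```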
Tensor Parallelism
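Individual weight matrices are split across GPUs so that each rank computes a slice of the same layer. A single-process toy of a column-split linear layer, where `torch.cat` stands in for the all-gather of a real multi-GPU setup:

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 8)            # (batch, d_in)
w = torch.randn(8, 16)           # full weight (d_in, d_out)

w0, w1 = w.chunk(2, dim=1)       # column split: each shard is (8, 8)
y0 = x @ w0                      # computed on "GPU 0"
y1 = x @ w1                      # computed on "GPU 1"
y = torch.cat([y0, y1], dim=1)   # gather along the feature dimension

assert torch.allclose(y, x @ w)  # matches the unsharded computation
```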
Sequence Parallelism
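Megatron-style sequence parallelism shards the activations of pointwise operations (LayerNorm, dropout) along the sequence dimension to cut activation memory. A toy check that the split is exact for LayerNorm:

```python
import torch
import torch.nn as nn

# Pointwise ops act on each token independently, so each "rank" can
# normalize only its own chunk of the sequence without communication.
torch.manual_seed(0)
x = torch.randn(2, 1024, 64)      # (batch, seq, hidden)
ln = nn.LayerNorm(64)

chunks = x.chunk(4, dim=1)        # one chunk per "rank"
y = torch.cat([ln(c) for c in chunks], dim=1)

assert torch.allclose(y, ln(x), atol=1e-6)
```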
Context Parallelism
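For very long sequences the attention computation itself is distributed: each rank keeps its queries while key/value chunks rotate past it (ring attention). The enabling trick is online-softmax accumulation over KV blocks, sketched here on one process with made-up sizes (the 1/sqrt(d) scaling is omitted for brevity):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(8, 64)    # this rank's queries
k = torch.randn(32, 64)   # full keys/values, chunked across "ranks"
v = torch.randn(32, 64)

# Online softmax: running max m, running normalizer l, and running
# weighted sum acc are updated as each KV chunk arrives.
m = torch.full((8, 1), float("-inf"))
l = torch.zeros(8, 1)
acc = torch.zeros(8, 64)
for kc, vc in zip(k.chunk(4), v.chunk(4)):    # one chunk per "rank"
    s = q @ kc.T                              # local attention scores
    m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
    p = torch.exp(s - m_new)
    scale = torch.exp(m - m_new)              # rescale earlier partial results
    l = l * scale + p.sum(dim=-1, keepdim=True)
    acc = acc * scale + p @ vc
    m = m_new
out = acc / l

ref = F.softmax(q @ k.T, dim=-1) @ v          # unsharded reference
assert torch.allclose(out, ref, atol=1e-5)
```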
Expert Parallelism
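In mixture-of-experts layers each token is routed to a small subset of expert FFNs; expert parallelism places different experts on different GPUs and exchanges tokens via all-to-all. A single-process toy of top-1 routing with random placeholder experts:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_experts, d = 4, 16
experts = nn.ModuleList(nn.Linear(d, d) for _ in range(num_experts))
router = nn.Linear(d, num_experts)

tokens = torch.randn(32, d)
expert_idx = router(tokens).argmax(dim=-1)   # top-1 routing decision per token

out = torch.empty_like(tokens)
for e in range(num_experts):                 # "dispatch" tokens to each expert
    mask = expert_idx == e                   # in a real setup: all-to-all exchange
    if mask.any():
        out[mask] = experts[e](tokens[mask])
```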
Frameworks
Native PyTorch
Megatron-LM
NeMo
- NVIDIA-supported
- Requires
Modalities
Another One (TODO)
HuggingFace Accelerate, PyTorch Lightning
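Both wrap an otherwise plain PyTorch loop so that device placement and distributed setup are handled for you. A minimal Accelerate sketch on random data (run with `accelerate launch`):

```python
import torch
import torch.nn as nn
from accelerate import Accelerator

accelerator = Accelerator()  # reads the launch config (devices, precision, ...)
model = nn.Linear(128, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
data = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(256, 128), torch.randn(256, 1)),
    batch_size=32,
)
model, opt, data = accelerator.prepare(model, opt, data)  # wrap for the target setup

for x, y in data:
    loss = nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # handles grad scaling / distributed sync
    opt.step()
    opt.zero_grad()
```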
Other Existing Frameworks
- DeepSpeed (Microsoft's original ZeRO model sharding)
- GPT-NeoX :shrug:
- OSLO
Metrics:
- Hardware FLOP/s
  - Depends on implementation details, e.g. gradient checkpointing / activation recomputation, which add extra FLOPs.
  - Good for comparing against the theoretical/reference performance of the hardware.
- Tokens / GPU / second
  - Cannot be compared across models of different sizes.
  - Good for comparing a single model on different numbers of GPUs.
- Model FLOP/s Utilization (MFU)
  - Good for comparing frameworks and clusters (see the sketch after this list).
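As a back-of-the-envelope check, MFU can be estimated from token throughput using the common ~6·N·T approximation for the training FLOPs of a dense transformer with N parameters over T tokens; the peak-throughput figure and example numbers below are illustrative only:

```python
def mfu(n_params: float, tokens_per_sec: float, n_gpus: int,
        peak_flops_per_gpu: float) -> float:
    """Model FLOP/s utilization under the ~6*N*T training-FLOPs approximation."""
    model_flops_per_sec = 6 * n_params * tokens_per_sec
    return model_flops_per_sec / (n_gpus * peak_flops_per_gpu)

# Illustrative: 7B params, 100k tokens/s on 32 GPUs at 312 TFLOP/s (A100 bf16 peak)
print(f"MFU = {mfu(7e9, 1e5, 32, 312e12):.1%}")  # -> MFU = 42.1%
```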