An interactive calculator that estimates the largest transformer that fits in the pooled memory of a GPU cluster under mixed-precision training with Adam, accounting for weights, gradients, optimizer state, and forward activations.
How many parameters fit on your GPU cluster?
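The kind of estimate the calculator makes can be sketched in a few lines. This is a minimal sketch, not the calculator's actual formula, which the page does not state. It assumes the commonly used accounting for mixed-precision Adam of 16 bytes per parameter: 2 for half-precision weights, 2 for half-precision gradients, and 12 for optimizer state (an fp32 master copy of the weights plus Adam's two fp32 moment estimates). The function name `max_params` and the `activation_bytes` parameter are hypothetical. Activations are treated as a fixed reservation supplied by the caller, since their real size depends on batch size, sequence length, and architecture.

```python
# Assumed per-parameter cost for mixed-precision Adam (not taken from the page):
#   2 bytes  half-precision weights
#   2 bytes  half-precision gradients
#  12 bytes  optimizer state (fp32 master weights + Adam first and second moments)
BYTES_PER_PARAM = 2 + 2 + 12


def max_params(num_gpus: int, gpu_mem_gib: float, activation_bytes: int = 0) -> int:
    """Estimate how many parameters fit in the pooled memory of a cluster.

    num_gpus:         number of GPUs whose memory is pooled
    gpu_mem_gib:      memory per GPU, in GiB
    activation_bytes: bytes reserved for forward activations (hypothetical
                      input; a fixed reservation rather than a modelled value)
    """
    pooled_bytes = int(num_gpus * gpu_mem_gib * 2**30)
    available = pooled_bytes - activation_bytes
    if available <= 0:
        return 0  # activations alone exhaust the pool
    return available // BYTES_PER_PARAM


if __name__ == "__main__":
    # Eight 80 GiB GPUs with no memory reserved for activations.
    print(f"{max_params(8, 80):,} parameters")
```

Under these assumptions, eight 80 GiB GPUs hold roughly 42.9 billion parameters before any memory is set aside for activations. In practice the figure is lower, because activations, framework overhead, and memory fragmentation all consume part of the pool.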