Products · Sheet 06

Everything you need to run dedicated inference.

Five products, one promise: any open-source model on auto-optimized GPUs, priced like infrastructure should be.

Hardware · Schedule A

The GPU fleet, end to end

From $50 a month to frontier-scale serving. This is a sample of the fleet; the optimizer picks the right card for your model.

Full fleet
GPU     Tier          VRAM     Best for                            Price/hour
T4      Budget        16 GB    Small models, batch jobs            $0.59
L40S    Mid           48 GB    8B–13B production inference         $1.95
H100    Performance   80 GB    30B–70B, demanding latency SLAs     $3.95
H200    Performance   141 GB   Long-context 70B, MoE models        $4.54
B200    Top           192 GB   Frontier models, lowest latency     $6.25
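
For a rough sense of how the hourly rates above add up, here is a minimal Python sketch. It assumes cost is simply hourly rate times GPU-hours times number of GPUs; this page does not state how the flat monthly quote is actually calculated, so treat the result as a ballpark figure only. The function name estimate_monthly_cost and its parameters are hypothetical, not part of any product API.

```python
# Hourly rates copied from the fleet table above.
# Everything else in this sketch is an illustrative assumption.
HOURLY_RATE = {
    "T4": 0.59,
    "L40S": 1.95,
    "H100": 3.95,
    "H200": 4.54,
    "B200": 6.25,
}


def estimate_monthly_cost(gpu: str, hours_per_month: float, gpu_count: int = 1) -> float:
    """Hypothetical helper: hourly rate x hours x GPU count, rounded to cents.

    Assumes simple linear billing; the real monthly quote may differ.
    """
    if gpu not in HOURLY_RATE:
        raise KeyError(f"unknown GPU: {gpu}")
    if hours_per_month < 0 or gpu_count < 1:
        raise ValueError("hours_per_month must be >= 0 and gpu_count >= 1")
    return round(HOURLY_RATE[gpu] * hours_per_month * gpu_count, 2)


if __name__ == "__main__":
    # Compare every card in the table for the same assumed usage.
    for gpu in HOURLY_RATE:
        cost = estimate_monthly_cost(gpu, hours_per_month=100)
        print(f"{gpu}: ${cost:,.2f} for 100 GPU-hours")
```

Changing hours_per_month or gpu_count shows how quickly the tiers diverge as usage grows, which can help when comparing a quote across cards.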

Pick a model. We handle the rest.

Every product above ships with the same promise: a flat monthly quote before you confirm, and infrastructure that gets out of the way.