Pricing
Flat monthly pricing. No surprise bills.
Pay for the GPU and hours you use, plus a small honest margin. Lock in the price at deploy time — every deployment gets a precise monthly quote before you confirm.
Range: $50/mo to $6,250+/mo
GPU tiers
The optimizer selects from these GPUs based on your model size, precision, and priority. Prices shown are per GPU, per hour.
| GPU | Tier | VRAM | Best for | /hour |
|---|---|---|---|---|
| T4 | Budget | 16 GB | Small models, batch jobs | $0.59 |
| L4 | Budget | 24 GB | 7B–8B chat, moderate throughput | $0.80 |
| A10 | Budget | 24 GB | Latency-tolerant serving | $1.10 |
| L40S | Mid | 48 GB | 8B–13B production inference | $1.95 |
| A100 40 GB | Mid | 40 GB | Mid-size models, stable workloads | $2.10 |
| RTX Pro 6000 | Mid | 96 GB | Memory-heavy single-GPU serving | $3.05 |
| H100 | Performance | 80 GB | 30B–70B, demanding latency SLAs | $3.95 |
| H200 | Performance | 141 GB | Long-context 70B, MoE models | $4.54 |
| B200 (8 TB/s) | Top | 192 GB | Frontier models, lowest latency | $6.25 |
| A100 80 GB | Performance | 80 GB | Legacy workloads requiring A100 | $8.99 |
Example configurations
Representative monthly estimates. Your actual quote is calculated from the live optimizer — this is just a starting point.
8B chatbot · Llama 3 8B Instruct
- GPU: 1× L40S
- Hours/day: 24
- Estimated monthly: $140

Always-on single-GPU deployment for a conversational assistant.
70B assistant · Llama 3 70B Instruct
- GPU: 2× H100
- Hours/day: 24
- Estimated monthly: $3,100

Two-GPU tensor-parallel serving for demanding production loads.
MoE frontier · Mixtral 8×22B
- GPU: 4× H100
- Hours/day: 24
- Estimated monthly: $5,400

Four-GPU deployment for large mixture-of-experts models.
Estimate your monthly price
A rough approximation using our standard margin and a typical 70% active-time assumption. The real quote is calculated live in the dashboard from your model characteristics.
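The rough approximation can be sketched as a short formula: hourly GPU rate × GPU count × monthly hours, scaled by the 70% active-time assumption and the margin. This is an illustrative back-of-envelope sketch only — the margin value below is a placeholder (the actual margin is not published here), and the live dashboard quote is computed from your model characteristics, so figures will differ.

```python
# Back-of-envelope monthly estimate. ACTIVE_TIME comes from the page's
# stated 70% assumption; MARGIN is a placeholder, not the actual margin.
HOURS_PER_MONTH = 730        # average hours in a calendar month
ACTIVE_TIME = 0.70           # typical active-time assumption
MARGIN = 0.15                # illustrative placeholder only

def estimate_monthly(rate_per_gpu_hour: float, gpu_count: int,
                     hours_per_day: float = 24.0) -> float:
    """Rough monthly price estimate in dollars."""
    monthly_hours = HOURS_PER_MONTH * (hours_per_day / 24.0)
    base = rate_per_gpu_hour * gpu_count * monthly_hours * ACTIVE_TIME
    return base * (1 + MARGIN)

# e.g. a single L40S at $1.95/hr, always on:
print(round(estimate_monthly(1.95, 1)))
```

Scaling is linear in GPU count and hours: two GPUs running half the day cost the same as one GPU running all day.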