About
Deep Infra is an AI inference and machine learning infrastructure platform that provides API access to hosted models, GPU instances, and dedicated clusters. The service exposes model endpoints across text generation, embeddings, speech, image, and video categories, with pricing based on per-token, per-minute, per-image, or instance-hour usage depending on the workload.
The pricing page lists usage-based billing with no free allowance for the core hosted inference service. It also covers dedicated custom LLM deployments on A100, H100, H200, B200, and B300 GPUs, billed at per-minute granularity, plus dedicated instances and clusters available through sales contact.
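The billing math above reduces to simple unit arithmetic. The sketch below illustrates how per-token and per-minute charges compose; the rates used are hypothetical placeholders, not Deep Infra's published prices.

```python
# Usage-based billing sketch. All rates below are hypothetical examples,
# not actual Deep Infra pricing.

def token_cost(input_tokens, output_tokens, in_rate_per_m, out_rate_per_m):
    """Cost in USD for a per-token-billed text model.

    Rates are expressed per 1M tokens, the convention for token pricing.
    """
    return input_tokens / 1e6 * in_rate_per_m + output_tokens / 1e6 * out_rate_per_m

def dedicated_gpu_cost(minutes, rate_per_hour):
    """Per-minute-granularity billing for a dedicated GPU deployment."""
    return minutes / 60 * rate_per_hour

# 2M input + 0.5M output tokens at hypothetical $0.10 / $0.40 per 1M tokens
print(round(token_cost(2_000_000, 500_000, 0.10, 0.40), 4))   # 0.4

# 90 minutes on a dedicated GPU at a hypothetical $2.40/hour
print(round(dedicated_gpu_cost(90, 2.40), 2))                 # 3.6
```

The same pattern extends to per-image or instance-hour workloads: each is a unit count multiplied by a unit rate.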
- Hosted model APIs across multiple modalities
- Per-token, per-minute, per-image, and instance-hour billing
- GPU instances and dedicated clusters
- 256k to 1M token contexts on select models
- SOC 2 and ISO 27001 certified
- US-based data centers
- 200 concurrent request limit
What's included in the free tier
There is no free allowance for the core hosted inference service; see Deep Infra pricing for current limits.