About

Together AI is a cloud platform for running open-source models through serverless inference, batch inference, dedicated inference, and dedicated container inference. It also includes accelerated compute with GPU clusters, sandbox environments, managed storage, and fine-tuning and evaluation tools for model shaping.

The pricing page shows usage-based pricing rather than a free plan. Serverless inference is priced per 1M tokens. Dedicated inference starts at $3.99 per hour for 1x H100 80GB, and GPU clusters start at $3.49 per hour. Sandbox compute is billed per vCPU and per GiB of RAM, managed storage costs $0.16 per GiB/month, and fine-tuning is priced per 1M tokens with model-specific minimum charges.

  • Serverless inference APIs
  • Batch inference workloads
  • Dedicated GPU endpoints
  • GPU clusters on demand
  • Sandbox development environments
  • Managed storage for model data
  • Fine-tuning and evaluations
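
The starting rates quoted above can be combined into a rough monthly estimate. The following is a minimal sketch, not an official calculator: the function name `monthly_estimate` and its parameters are hypothetical. It also assumes the GPU cluster price is a flat hourly rate, since the text does not say whether it is per GPU. Only the three dollar figures come from the text above.

```python
# Starting rates quoted in the listing text; verify against the live pricing page.
DEDICATED_H100_PER_HOUR = 3.99  # USD, dedicated inference on 1x H100 80GB
GPU_CLUSTER_PER_HOUR = 3.49     # USD, GPU cluster starting price (assumed flat hourly rate)
STORAGE_PER_GIB_MONTH = 0.16    # USD per GiB per month of managed storage


def monthly_estimate(dedicated_hours: float,
                     cluster_hours: float,
                     storage_gib: float) -> float:
    """Return an estimated monthly cost in USD, rounded to cents.

    Hypothetical helper: it ignores serverless tokens, sandbox compute,
    and fine-tuning, whose per-unit rates are not given in the text.
    """
    total = (dedicated_hours * DEDICATED_H100_PER_HOUR
             + cluster_hours * GPU_CLUSTER_PER_HOUR
             + storage_gib * STORAGE_PER_GIB_MONTH)
    return round(total, 2)


if __name__ == "__main__":
    # Example: 100 hours of dedicated inference plus 50 GiB of storage.
    print(monthly_estimate(dedicated_hours=100, cluster_hours=0, storage_gib=50))
```

Because these are starting prices, the result is best read as a lower bound for the chosen hardware rather than a quote.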

Free Tier Value

FTV score: 32
Credit card: Required
Feature parity: 100%

The pricing page shows usage-based serverless inference priced per 1M tokens and several other offerings marked contact-sales or custom, but its text does not state a concrete free-credit or free-trial amount. This entry is classified as a free-credit listing, and because no credit amount is visible, the free-tier value is left unset rather than inferred. Feature parity is set to 100% because free-credit listings generally expose the same product surface as paid usage, and the page text shows no explicit feature gating.

What's included in the free tier

Details aren't itemized for this entry yet - check the pricing page below for the latest.

Paid plans

Serverless Inference

Usage-based
Price per 1M tokens
Tokens: priced per 1M tokens
Models: multiple model families listed
Cached input: discounted cached rates shown for some models
  • Chat
  • Vision
  • Image
  • Audio
  • Video
  • Transcribe
  • Embeddings
  • Rerank
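
Per-1M-token pricing with a discounted cached-input rate can be worked through as simple arithmetic. The sketch below is illustrative only. The function `serverless_cost`, the split into separate input and output rates, and every rate in the example are assumptions made for demonstration; the listing gives no per-model prices.

```python
TOKENS_PER_UNIT = 1_000_000  # rates are quoted per 1M tokens


def serverless_cost(input_tokens: int,
                    cached_input_tokens: int,
                    output_tokens: int,
                    input_rate: float,
                    cached_input_rate: float,
                    output_rate: float) -> float:
    """Return the cost in USD for one workload.

    All rates are USD per 1M tokens. Cached input tokens are billed at
    their own (typically lower) rate instead of the normal input rate.
    """
    total = (input_tokens * input_rate
             + cached_input_tokens * cached_input_rate
             + output_tokens * output_rate)
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Hypothetical rates, NOT taken from the pricing page.
    cost = serverless_cost(
        input_tokens=2_000_000,
        cached_input_tokens=1_000_000,
        output_tokens=500_000,
        input_rate=1.00,
        cached_input_rate=0.50,
        output_rate=2.00,
    )
    print(f"${cost:.2f}")
```

Substituting the actual per-model rates from the pricing page gives a usable estimate. For models without a cached rate, pass zero cached tokens.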

GPU Clusters

Contact sales
GPU types: GB300, GB200, B200, H200, H100
  • Reliable GPU clusters at scale
  • Accelerated compute
  • NVIDIA GPU options
  • Self-service clusters
  • Generally available

Sandbox

Contact sales
  • Developer environments for AI
  • Build development environments

Managed Storage

Contact sales
  • Store model weights and data securely

Fine-Tuning

Contact sales
  • Shape models with your data
  • Larger models
  • Longer contexts

Dedicated Model Inference

Contact sales
  • Inference on custom hardware

Pricing extracted from Together AI's pricing page. Always verify current pricing before committing.