About
Fireworks AI is a cloud platform for running, tuning, and deploying generative AI models. It provides serverless inference, on-demand GPU deployments, and fine-tuning workflows for open models, with model access exposed through an API and developer tooling.
The pricing page lists $1 in free credits for serverless inference, along with per-token pricing for text and vision models, per-audio-minute pricing for speech-to-text, per-step and per-image pricing for image generation, and per-GPU-hour pricing for on-demand deployments. The platform is aimed at developers building code assistants, chatbots, RAG systems, multimodal apps, and other model-backed applications.
- Serverless inference with postpaid billing
- $1 in free credits at signup
- Per-token text and vision pricing
- Speech-to-text billed per audio minute
- Image generation billed per step or image
- On-demand GPU deployments by hour
- Fine-tuning and reinforcement tuning support
- Open model library with API access
Free Tier Value
The pricing page states that Serverless Inference starts with "$1 in free credits," and those credits can be used across serverless text and vision, speech-to-text, image generation, and embeddings models. Counting a one-time credit at one third of its face value gives a conservative monthly value of $1 / 3 ≈ $0.33. Fireworks AI is a usage-based product, so the paid plans are the serverless token billing, fine-tuning token billing, and on-demand GPU-hour billing shown on the pricing page.
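To make the credit arithmetic concrete, here is a minimal Python sketch. It reproduces the divide-by-three monthly value and estimates how many input tokens $1 would buy at the low and high ends of the serverless input-token range quoted in this listing. The function names are hypothetical helpers, not part of any Fireworks AI tooling, and real usage would also consume output tokens.

```python
# Sketch: what a $1 credit is worth under the figures quoted in this listing.
FREE_CREDIT_USD = 1.00
AMORTIZATION_DIVISOR = 3  # the divide-by-three rule used in the text above


def monthly_value(credit_usd: float, divisor: int = AMORTIZATION_DIVISOR) -> float:
    """Conservative monthly value of a one-time credit, rounded to cents."""
    return round(credit_usd / divisor, 2)


def tokens_for_budget(budget_usd: float, price_per_million_usd: float) -> int:
    """Whole tokens a budget buys at a given price per 1M tokens."""
    return int(budget_usd * 1_000_000 / price_per_million_usd)


if __name__ == "__main__":
    print(f"Monthly value: ${monthly_value(FREE_CREDIT_USD):.2f}")
    # Input-token price range quoted under "Serverless Inference".
    for price in (0.07, 2.00):
        tokens = tokens_for_budget(FREE_CREDIT_USD, price)
        print(f"At ${price:.2f} per 1M input tokens: {tokens:,} tokens")
```

Under these figures the credit covers roughly 14 million input tokens on the cheapest listed model and 500,000 on the most expensive.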
What's included in the free tier
- Access to serverless inference with $1 in free credits.
- Use of text and vision models within the free credits.
- Use of speech-to-text models within the free credits.
- Use of image generation models within the free credits.
- Use of embeddings models within the free credits.
See Fireworks AI pricing for current limits.
Paid plans
Serverless Inference
- Input tokens: $0.07 to $2.00 per 1M tokens, depending on model
- Output tokens: $0.16 to $8.00 per 1M tokens, depending on model
- Embeddings input tokens: $0.008 to $0.10 per 1M tokens, depending on model
- Zero setup
- No cold starts
- High rate limits
- Postpaid billing
- Text and vision models
- Embeddings models
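As a rough illustration of per-token billing, the sketch below prices one workload at the low and high ends of the ranges listed above. The function and the two price profiles are hypothetical. Pairing the cheapest input price with the cheapest output price is an assumption, since the listing does not say they belong to the same model.

```python
def serverless_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Cost in USD for a workload billed separately on input and output tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Range endpoints from the listing; the pairings are assumed, not confirmed.
CHEAPEST = {"input_price_per_m": 0.07, "output_price_per_m": 0.16}
PRICIEST = {"input_price_per_m": 2.00, "output_price_per_m": 8.00}

if __name__ == "__main__":
    # Example workload: 1M input tokens and 250k output tokens.
    for label, prices in (("low end", CHEAPEST), ("high end", PRICIEST)):
        cost = serverless_cost(1_000_000, 250_000, **prices)
        print(f"{label}: ${cost:.2f}")
```

For this example workload the estimate spans about $0.11 to $4.00, which shows how strongly model choice drives the bill.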
Fine Tuning
- Training tokens: $0.50 to $40.00 per 1M tokens, depending on model and method
- Serve fine-tuned models for the same price as base models
- Supervised fine tuning
- Preference fine tuning
- Reinforcement fine tuning
- LoRA and full-parameter tuning
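A fine-tuning budget can be sketched the same way. The example below assumes that billed training tokens equal dataset tokens multiplied by the number of epochs. That is a common convention, but this listing does not confirm it for Fireworks AI, so treat the function as a hypothetical estimator.

```python
def fine_tuning_cost(
    dataset_tokens: int, epochs: int, price_per_million_usd: float
) -> float:
    """Estimated training cost in USD.

    Assumption: billed training tokens = dataset tokens * epochs.
    """
    training_tokens = dataset_tokens * epochs
    return training_tokens / 1_000_000 * price_per_million_usd


if __name__ == "__main__":
    # Example: a 10M-token dataset trained for 3 epochs,
    # priced at both ends of the listed range.
    for price in (0.50, 40.00):
        cost = fine_tuning_cost(10_000_000, 3, price)
        print(f"At ${price:.2f} per 1M training tokens: ${cost:,.2f}")
```

With these inputs the estimate runs from $15 to $1,200, so the choice of model and tuning method matters far more than dataset size alone.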
On-Demand Deployments
- GPU hour: $6.00 to $11.00 per hour, depending on GPU type
- No extra charges for start-up times
- Faster speeds
- Higher rate limits
- Lower costs at scale
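To put "lower costs at scale" in numbers, the sketch below estimates a month of continuous on-demand usage and the token volume at which serverless spend would match it. The helper names, the 30-day month, and the blended serverless price are all assumptions for illustration. The break-even figure also ignores whether a single GPU could actually serve that many tokens.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, running continuously


def on_demand_cost(gpu_count: int, hours: float, rate_per_gpu_hour: float) -> float:
    """Cost in USD for a deployment billed per GPU hour."""
    return gpu_count * hours * rate_per_gpu_hour


def break_even_tokens(gpu_cost_usd: float, serverless_price_per_m: float) -> float:
    """Tokens at which serverless spend equals a given GPU spend."""
    return gpu_cost_usd / serverless_price_per_m * 1_000_000


if __name__ == "__main__":
    # GPU-hour range from the listing.
    for rate in (6.00, 11.00):
        monthly = on_demand_cost(1, HOURS_PER_MONTH, rate)
        print(f"1 GPU at ${rate:.2f}/hour: ${monthly:,.2f} per month")

    # Hypothetical blended serverless price of $2.00 per 1M tokens.
    cheapest_gpu_month = on_demand_cost(1, HOURS_PER_MONTH, 6.00)
    tokens = break_even_tokens(cheapest_gpu_month, 2.00)
    print(f"Break-even volume: {tokens:,.0f} tokens per month")
```

Below the break-even volume serverless is the cheaper option under these assumptions, and above it a dedicated GPU starts to pay off, provided its throughput can keep up.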
Pricing extracted from Fireworks AI's pricing page. Always verify current pricing before committing.