About

Fireworks AI is a cloud platform for running, tuning, and deploying generative AI models. It provides serverless inference, on-demand GPU deployments, and fine-tuning workflows for open models, with model access exposed through an API and developer tooling.

The pricing page lists $1 in free credits for serverless inference, along with per-token pricing for text and vision models, per-second pricing for speech-to-text, per-step and per-image pricing for image generation, and per-GPU-hour pricing for on-demand deployments. The platform is aimed at developers building code assistants, chatbots, RAG systems, multimodal apps, and other model-backed applications.

  • Serverless inference with postpaid billing
  • $1 in free credits at signup
  • Per-token text and vision pricing
  • Speech-to-text billed per audio minute
  • Image generation billed per step or image
  • On-demand GPU deployments priced per GPU hour
  • Fine-tuning and reinforcement tuning support
  • Open model library with API access

Free Tier Value

FTV score: 25
Estimated value: $1 one-time credit
Signup credit: $1
Credit card: Required
Feature parity: 100%

The pricing page states that Serverless Inference starts with "$1 in free credits," and those credits can be used across text and vision, speech-to-text, image generation, and embeddings models. Under the free-credit rule used for this score, the conservative monthly value is the credit amount divided by three: $1 / 3 ≈ $0.33. Fireworks AI is a usage-based product, so the paid plans are the serverless per-token billing, fine-tuning per-training-token billing, and on-demand per-GPU-second billing shown on the pricing page.
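The arithmetic behind the conservative monthly value can be sketched in a few lines of Python. The function name and the `divisor` parameter are hypothetical, chosen for illustration; only the divide-by-three rule and the $1 credit come from the text above.

```python
def conservative_monthly_value(credit_usd: float, divisor: int = 3) -> float:
    """Convert a one-time credit into a conservative monthly value.

    Assumption: the divisor of 3 mirrors the free-credit rule quoted in
    the listing; it is not something Fireworks AI publishes.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return credit_usd / divisor


monthly = conservative_monthly_value(1.00)
print(f"${monthly:.2f} per month")  # prints "$0.33 per month"
```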

What's included in the free tier

  • Access to serverless inference with $1 in free credits.
  • Use of text and vision models within the free credits.
  • Use of speech-to-text models within the free credits.
  • Use of image generation models within the free credits.
  • Use of embeddings models within the free credits.

Paid plans

Serverless Inference

Usage-based
Pay per token, with high rate limits and postpaid billing
Input tokens: $0.07 to $2.00 per 1M tokens, depending on model
Output tokens: $0.16 to $8.00 per 1M tokens, depending on model
Embeddings input tokens: $0.008 to $0.10 per 1M tokens, depending on model
  • Zero setup
  • No cold starts
  • High rate limits
  • Postpaid billing
  • Text and vision models
  • Embeddings models
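As a rough illustration of per-token billing, the following Python sketch estimates a cost range for a text workload from the listed input and output price bounds. The function names are hypothetical, and the bounds are loose: the cheapest input price and cheapest output price may not belong to the same model, so the real cost depends on the specific model's rates.

```python
PER_MILLION = 1_000_000

# Price bounds in USD per 1M tokens, taken from the listing above.
INPUT_PRICE_RANGE = (0.07, 2.00)
OUTPUT_PRICE_RANGE = (0.16, 8.00)


def token_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD of `tokens` tokens at a given per-1M-token price."""
    return tokens / PER_MILLION * usd_per_million


def serverless_cost_range(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (lowest, highest) estimated cost across the listed price bounds."""
    low = token_cost(input_tokens, INPUT_PRICE_RANGE[0]) + token_cost(
        output_tokens, OUTPUT_PRICE_RANGE[0]
    )
    high = token_cost(input_tokens, INPUT_PRICE_RANGE[1]) + token_cost(
        output_tokens, OUTPUT_PRICE_RANGE[1]
    )
    return low, high


# Example workload: 2M input tokens and 500K output tokens.
low, high = serverless_cost_range(2_000_000, 500_000)
print(f"${low:.2f} to ${high:.2f}")  # prints "$0.22 to $8.00"
```

Under these bounds, the same workload can differ in cost by more than an order of magnitude depending on the model chosen, which is why the per-model rate matters more than the range.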

Fine Tuning

Usage-based
Priced per 1M training tokens
Training tokens: $0.50 to $40.00 per 1M tokens, depending on model and method
  • Serve fine-tuned models for the same price as base models
  • Supervised fine tuning
  • Preference fine tuning
  • Reinforcement fine tuning
  • LoRA and full-parameter tuning

On-Demand Deployments

Usage-based
Pay per GPU second
GPU hour: $6.00 to $11.00 per hour, depending on GPU type
  • No extra charges for start-up times
  • Faster speeds
  • Higher rate limits
  • Lower costs at scale
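Because on-demand deployments are billed per GPU second while rates are quoted per hour, a short Python sketch can show how the two relate. The function name and the `gpus` parameter are hypothetical; the sketch assumes the per-second rate is simply the hourly rate divided by 3,600, which the listing does not state explicitly.

```python
SECONDS_PER_HOUR = 3600


def gpu_cost(seconds: int, usd_per_hour: float, gpus: int = 1) -> float:
    """Cost in USD of running `gpus` GPUs for `seconds` seconds.

    Assumption: per-second billing is the quoted hourly rate prorated
    evenly, with no minimum charge or rounding.
    """
    return seconds * gpus * usd_per_hour / SECONDS_PER_HOUR


# Example: one GPU for 90 minutes at the lowest and highest listed rates.
ninety_minutes = 90 * 60
print(f"${gpu_cost(ninety_minutes, 6.00):.2f} to ${gpu_cost(ninety_minutes, 11.00):.2f}")
# prints "$9.00 to $16.50"
```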

Pricing extracted from Fireworks AI's pricing page. Always verify current pricing before committing.