About

AssemblyAI provides speech-to-text, streaming speech-to-text, speech understanding, guardrails, and an LLM gateway for voice and audio applications. The platform exposes APIs for prerecorded transcription, real-time transcription, speaker diarization, language detection, keyterms prompting, custom formatting, sentiment analysis, summarization, entity detection, PII redaction, and related audio intelligence features.

The pricing page lists pay-as-you-go usage for the hosted API and a free offer of $50 in credits with no credit card required. Free accounts can start 5 new streaming sessions per minute, while pay-as-you-go accounts start at a limit of 100 new streams per minute. The service is aimed at developers building transcription tools, voice agents, contact center workflows, medical transcription, and other speech-processing applications.

  • Speech-to-text and streaming transcription APIs
  • Speaker diarization and speaker identification
  • Language detection and code-switching support
  • Sentiment, summaries, topics, and chapters
  • PII redaction and profanity filtering
  • $50 signup credits
  • 5 new streams per minute on free accounts

Free Tier Value

FTV score 57
Est. value $50 one-time credit
Signup credit $50
Credit card Not required
Feature parity 100%

The pricing page states “Get started today with $50 in free credits” and “No credit card required,” so no_cc_required is true. Under the free_credit rule, the conservative monthly value is credit_amount / 3 = $50 / 3 ≈ $16.67. The paid offerings are usage-based, billed at hourly or per-minute rates, across Speech-to-Text, Streaming Speech-to-Text, Voice Agent API, Speech Understanding, Guardrails, and LLM Gateway. Because the free credits can be applied broadly to these APIs, feature parity is treated as 100% for the free-credit listing.
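
The free_credit arithmetic above can be reproduced with a short sketch. The function name and the `divisor` parameter are hypothetical, chosen only for illustration; the listing states the rule as credit_amount/3 and does not say what the 3 represents.

```python
def conservative_monthly_value(credit_amount: float, divisor: int = 3) -> float:
    """Divide a one-time credit by `divisor` and round to cents.

    The divisor of 3 comes from the listing's free_credit rule; the name
    and signature of this helper are assumptions made for illustration.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return round(credit_amount / divisor, 2)


monthly = conservative_monthly_value(50.0)
print(f"${monthly:.2f}")  # $16.67
```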

What's included in the free tier

  • $50 in free credits for AssemblyAI’s Speech-to-Text APIs.
  • Free users can start 5 new Universal-Streaming sessions per minute.
  • No credit card required to get started with the free offer.
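
A client that opens streaming sessions may want to stay under the free-tier limit of 5 new sessions per minute on its own side. The sketch below is one possible approach: a sliding-window limiter that makes no API calls. The class name, its parameters, and the sliding-window interpretation are all assumptions; the listing does not say how AssemblyAI measures the one-minute window.

```python
import time
from collections import deque
from typing import Callable, Deque


class SessionStartLimiter:
    """Hypothetical client-side limiter for starting new streaming sessions."""

    def __init__(
        self,
        max_starts: int = 5,            # free-tier figure from the listing
        window_seconds: float = 60.0,   # assumed sliding one-minute window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_starts = max_starts
        self.window_seconds = window_seconds
        self._clock = clock
        self._starts: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Return True and record the start if a new session is allowed now."""
        now = self._clock()
        # Forget session starts that have aged out of the window.
        while self._starts and now - self._starts[0] >= self.window_seconds:
            self._starts.popleft()
        if len(self._starts) < self.max_starts:
            self._starts.append(now)
            return True
        return False
```

Calling `try_acquire()` before opening each session, and waiting or queueing when it returns `False`, keeps the client within the assumed window. A pay-as-you-go account could pass `max_starts=100` to match the figure quoted for paid accounts.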

Paid plans

Universal-2

Usage-based
Base rate $0.15/hr
Keyterms prompting Included
Speaker diarization $0.02/hr
Medical mode $0.15/hr
  • Highly accurate speech-to-text
  • Supports 99 languages
  • Lower price than Universal-3 Pro
  • Keyterms prompting included
  • Speaker diarization available
  • Medical mode available
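
To get a rough sense of how far the $50 signup credit goes at the Universal-2 rates listed above, the sketch below divides the credit by an hourly rate. It assumes that feature rates such as speaker diarization are added on top of the base rate, which the listing does not state explicitly. The helper name is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN

CREDIT = Decimal("50")               # signup credit from the listing
BASE_RATE = Decimal("0.15")          # Universal-2, per hour
DIARIZATION_RATE = Decimal("0.02")   # per hour; ASSUMED to add to the base rate


def hours_covered(credit: Decimal, *hourly_rates: Decimal) -> Decimal:
    """Hours of audio a credit covers at the summed hourly rate, rounded down."""
    total = sum(hourly_rates, Decimal("0"))
    if total <= 0:
        raise ValueError("total hourly rate must be positive")
    return (credit / total).quantize(Decimal("0.01"), rounding=ROUND_DOWN)


print(hours_covered(CREDIT, BASE_RATE))                    # 333.33
print(hours_covered(CREDIT, BASE_RATE, DIARIZATION_RATE))  # 294.11
```

`Decimal` is used instead of floats so that the currency amounts are not affected by binary rounding.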

Universal-3 Pro

Usage-based
Base rate $0.21/hr
Keyterms prompting $0.05/hr
Prompting $0.05/hr
Speaker diarization $0.02/hr
Medical mode $0.15/hr
  • Most accurate speech-to-text model
  • Multilingual accuracy
  • Entity detection
  • Language support for English, Spanish, German, French, Italian, and Portuguese
  • Keyterms prompting available
  • Prompting available
  • Speaker diarization available
  • Medical mode available

Universal-Streaming

Usage-based
Base rate $0.15/hr
Speaker diarization $0.12/hr
Keyterms prompting $0.04/hr
  • Fastest model for real-time English transcription
  • Optimized for speed and cost-effectiveness
  • Speaker identification available
  • Entity detection available
  • Multilingual variant available

Universal-Streaming Multilingual

Usage-based
Base rate $0.15/hr
Speaker diarization $0.12/hr
  • Multilingual real-time transcription
  • Supports English, Spanish, German, French, Portuguese, and Italian
  • Speaker identification available
  • Entity detection available

Whisper-Streaming

Usage-based
Base rate $0.30/hr
  • Open-source Whisper model
  • Enhanced infrastructure
  • Unlimited scale
  • Supports 99+ languages
  • Speaker identification available

Voice Agent API

Usage-based
Base rate $4.50/hr ($0.075/min)
  • End-to-end voice agent stack
  • Built on streaming speech-to-text
  • Speaker identification available
  • Production voice agent use cases
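
The base rates listed for each model can be collected into a small lookup for quick cost comparisons. This is an illustrative sketch only: the dictionary and function names are hypothetical, the rates are copied from this listing and may be out of date, and feature rates such as diarization or prompting are left out.

```python
from decimal import Decimal

# Base hourly rates as shown in this listing (feature rates not included).
BASE_RATES_PER_HOUR = {
    "Universal-2": Decimal("0.15"),
    "Universal-3 Pro": Decimal("0.21"),
    "Universal-Streaming": Decimal("0.15"),
    "Universal-Streaming Multilingual": Decimal("0.15"),
    "Whisper-Streaming": Decimal("0.30"),
    "Voice Agent API": Decimal("4.50"),
}


def cost(model: str, hours: float) -> Decimal:
    """Estimated base cost in dollars for `hours` of audio, rounded to cents."""
    rate = BASE_RATES_PER_HOUR[model]
    return (rate * Decimal(str(hours))).quantize(Decimal("0.01"))


# The Voice Agent per-minute rate agrees with its hourly rate: 0.075 * 60 = 4.50.
assert Decimal("0.075") * 60 == BASE_RATES_PER_HOUR["Voice Agent API"]

print(cost("Universal-3 Pro", 100))  # 21.00
print(cost("Voice Agent API", 10))   # 45.00
```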

Pricing extracted from AssemblyAI's pricing page. Always verify current pricing before committing.