About
AssemblyAI provides speech-to-text, streaming speech-to-text, speech understanding, guardrails, and an LLM gateway for voice and audio applications. The platform exposes APIs for prerecorded transcription, real-time transcription, speaker diarization, language detection, keyterms prompting, custom formatting, sentiment analysis, summarization, entity detection, PII redaction, and related audio intelligence features.
The pricing page lists pay-as-you-go usage for the hosted API and a free offer of $50 in credits with no credit card required. Free users can start 5 new streaming sessions per minute, while pay-as-you-go accounts start with a limit of 100 new streams per minute. The service is aimed at developers building transcription, voice agents, contact center workflows, medical transcription, and other speech-processing applications.
- Speech-to-text and streaming transcription APIs
- Speaker diarization and speaker identification
- Language detection and code-switching support
- Sentiment, summaries, topics, and chapters
- PII redaction and profanity filtering
- $50 signup credits
- 5 new streams per minute on free accounts
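The prerecorded APIs listed above are exposed over REST. The sketch below shows how a transcription job could be submitted. It assumes AssemblyAI's v2 `/v2/transcript` endpoint, an `authorization` header carrying the API key, and the parameter names shown; check the API reference for current names and model-specific support. The helper `build_transcript_request` is hypothetical and only builds the request, so it runs offline:

```python
import json
import urllib.request

# Assumed endpoint of AssemblyAI's v2 REST API for prerecorded transcription.
TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a transcription job request.

    The feature flags are a sample; parameter names are assumptions to verify
    against the current API reference.
    """
    payload = {
        "audio_url": audio_url,       # publicly reachable audio file
        "speaker_labels": True,       # speaker diarization
        "language_detection": True,   # automatic language detection
        "sentiment_analysis": True,   # per-sentence sentiment
        "entity_detection": True,     # named entities
        "filter_profanity": True,     # mask profanity in the transcript text
    }
    return urllib.request.Request(
        TRANSCRIPT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


req = build_transcript_request("YOUR_API_KEY", "https://example.com/call.mp3")
# urllib.request.urlopen(req) would submit the job; the JSON response includes
# an "id" that can be polled at f"{TRANSCRIPT_URL}/{id}" until "status" is
# "completed" (or "error").
```

Building the request separately from sending it keeps the payload easy to inspect and test before any usage is billed.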
Free Tier Value
The page explicitly says “Get started today with $50 in free credits” and “No credit card required.” Spread conservatively over three months, the credit is worth $50 / 3 ≈ $16.67 per month. The paid offerings are usage-based hourly or per-minute rates across Speech-to-Text, Streaming Speech-to-Text, Voice Agent API, Speech Understanding, Guardrails, and LLM Gateway. Because the free credits can be applied broadly to these APIs, the free tier is treated as offering full feature parity with paid usage.
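The free-credit arithmetic in this section is shown below. It also gives an illustrative estimate of how many hours of audio the credit could cover at the Universal-2 base rate of $0.15/hr. That estimate assumes no add-ons and that the whole credit is spent on that one model:

```python
# Figures from the pricing page: $50 signup credit.
FREE_CREDIT_USD = 50.00
ASSUMED_MONTHS = 3  # conservative spread used for the monthly value

monthly_value = round(FREE_CREDIT_USD / ASSUMED_MONTHS, 2)

# Illustrative only: hours of Universal-2 audio at the listed base rate,
# assuming no add-ons (diarization, medical mode) are enabled.
UNIVERSAL_2_RATE_PER_HR = 0.15
hours_covered = FREE_CREDIT_USD / UNIVERSAL_2_RATE_PER_HR

print(f"${monthly_value:.2f}/month, about {hours_covered:.0f} hours of Universal-2 audio")
```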
What's included in the free tier
- $50 in free credits for AssemblyAI’s Speech-to-Text APIs.
- Free users can start 5 new Universal-Streaming sessions per minute.
- No credit card required to get started with the free offer.
See AssemblyAI pricing for current limits.
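The free tier caps how quickly new streaming sessions can start: 5 per minute, versus 100 on pay-as-you-go. A client that opens sessions in bursts may therefore want to throttle itself. The sketch below is a hypothetical client-side sliding-window limiter, not part of any AssemblyAI SDK. The service enforces the real limit, and a token bucket would work equally well:

```python
import time
from collections import deque
from typing import Callable, Deque


class SessionStartLimiter:
    """Hypothetical client-side guard: at most `max_starts` new sessions per `window_s` seconds."""

    def __init__(self, max_starts: int = 5, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_starts = max_starts
        self.window_s = window_s
        self.clock = clock
        self._starts: Deque[float] = deque()  # timestamps of recent session starts

    def try_start(self) -> bool:
        """Record a new session start and return True if under the limit, else False."""
        now = self.clock()
        # Drop starts that have aged out of the sliding window.
        while self._starts and now - self._starts[0] >= self.window_s:
            self._starts.popleft()
        if len(self._starts) >= self.max_starts:
            return False
        self._starts.append(now)
        return True


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
limiter = SessionStartLimiter(max_starts=5, window_s=60.0, clock=lambda: fake_now[0])
results = [limiter.try_start() for _ in range(6)]  # five allowed, sixth refused
fake_now[0] = 60.0
after_window = limiter.try_start()  # the earlier starts have aged out
```

For a pay-as-you-go account, `max_starts=100` would mirror the starting limit quoted on the pricing page.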
Paid plans
Universal-2
- Base rate: $0.15/hr
- Keyterms prompting: included
- Speaker diarization: $0.02/hr
- Medical mode: $0.15/hr
- Highly accurate speech-to-text
- Supports 99 languages
- Lower price than Universal-3 Pro
Universal-3 Pro
- Base rate: $0.21/hr
- Keyterms prompting: $0.05/hr
- Prompting: $0.05/hr
- Speaker diarization: $0.02/hr
- Medical mode: $0.15/hr
- Most accurate speech-to-text model
- Multilingual accuracy
- Entity detection
- Supported languages: English, Spanish, German, French, Italian, and Portuguese
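The Universal-2 and Universal-3 Pro rates listed above can be combined into a rough per-job estimate. The sketch below assumes each add-on is billed on top of the base hourly rate. The dictionary keys are illustrative labels, not API model identifiers, and `estimate_prerecorded_cost` is a hypothetical helper:

```python
from typing import Dict, Iterable, Optional

# Per-hour USD rates as listed on the pricing page. 0.0 means "included";
# None marks an add-on that is not listed for that model.
PRERECORDED_RATES: Dict[str, Dict[str, Optional[float]]] = {
    "universal-2": {"base": 0.15, "keyterms": 0.0, "prompting": None,
                    "diarization": 0.02, "medical": 0.15},
    "universal-3-pro": {"base": 0.21, "keyterms": 0.05, "prompting": 0.05,
                        "diarization": 0.02, "medical": 0.15},
}


def estimate_prerecorded_cost(model: str, hours: float, add_ons: Iterable[str] = ()) -> float:
    """Estimate USD cost, assuming add-ons are charged on top of the base rate."""
    rates = PRERECORDED_RATES[model]
    per_hour = rates["base"]
    for add_on in add_ons:
        rate = rates.get(add_on)
        if rate is None:  # unknown add-on, or not offered for this model
            raise ValueError(f"{add_on!r} is not a listed add-on for {model}")
        per_hour += rate
    return round(per_hour * hours, 2)


# 100 hours of Universal-2 with diarization, and of Universal-3 Pro with
# keyterms prompting plus diarization.
u2 = estimate_prerecorded_cost("universal-2", 100, ["diarization"])
u3 = estimate_prerecorded_cost("universal-3-pro", 100, ["keyterms", "diarization"])
```

Raising on an unlisted add-on, rather than treating it as free, keeps the estimate from silently under-counting.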
Universal-Streaming
- Base rate: $0.15/hr
- Speaker diarization: $0.12/hr
- Keyterms prompting: $0.04/hr
- Fastest model for real-time English transcription
- Optimized for speed and cost-effectiveness
- Speaker identification available
- Entity detection available
- Multilingual variant available
Universal-Streaming Multilingual
- Base rate: $0.15/hr
- Speaker diarization: $0.12/hr
- Multilingual real-time transcription
- Supports English, Spanish, German, French, Portuguese, and Italian
- Speaker identification available
- Entity detection available
Whisper-Streaming
- Base rate: $0.30/hr
- Open-source Whisper model
- Enhanced infrastructure
- Unlimited scale
- Supports 99+ languages
- Speaker identification available
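To compare the streaming options above, the following sketch picks the cheapest one that satisfies a language requirement and a set of priced add-ons. The `StreamingOption` class and `cheapest_streaming` helper are hypothetical. Only add-ons with a listed price are modeled, so Whisper-Streaming's speaker identification, which has no listed rate, is not. The multilingual flags come from the feature lists, and ties are broken alphabetically by name:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class StreamingOption:
    name: str
    base_per_hr: float
    add_ons_per_hr: Dict[str, float]  # add-ons with a listed hourly price
    multilingual: bool


# Rates and capabilities as listed on the pricing page.
STREAMING_OPTIONS: List[StreamingOption] = [
    StreamingOption("Universal-Streaming", 0.15, {"diarization": 0.12, "keyterms": 0.04}, False),
    StreamingOption("Universal-Streaming Multilingual", 0.15, {"diarization": 0.12}, True),
    StreamingOption("Whisper-Streaming", 0.30, {}, True),
]


def cheapest_streaming(needs_multilingual: bool,
                       add_ons: Iterable[str] = ()) -> Optional[Tuple[float, str]]:
    """Return (hourly rate, model name) of the cheapest matching option, or None."""
    wanted = list(add_ons)
    candidates: List[Tuple[float, str]] = []
    for opt in STREAMING_OPTIONS:
        if needs_multilingual and not opt.multilingual:
            continue
        if any(a not in opt.add_ons_per_hr for a in wanted):
            continue  # a required add-on has no listed price for this model
        rate = opt.base_per_hr + sum(opt.add_ons_per_hr[a] for a in wanted)
        candidates.append((round(rate, 2), opt.name))
    return min(candidates) if candidates else None


english = cheapest_streaming(needs_multilingual=False)
multi_diarized = cheapest_streaming(needs_multilingual=True, add_ons=["diarization"])
```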
Voice Agent API
- Hourly rate: $4.50/hr
- Per-minute rate: $0.075/min
- End-to-end voice agent stack
- Built on streaming speech-to-text
- Speaker identification available
- Production voice agent use cases
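The two Voice Agent API rates are the same price in different units ($4.50 / 60 = $0.075). Below is a rough call-cost estimate. It assumes billing is proportional to connected time, since the page does not state billing granularity, and `voice_agent_cost` is a hypothetical helper:

```python
VOICE_AGENT_PER_HR = 4.50
VOICE_AGENT_PER_MIN = 0.075

# The listed hourly and per-minute rates describe the same price.
assert abs(VOICE_AGENT_PER_HR / 60 - VOICE_AGENT_PER_MIN) < 1e-9


def voice_agent_cost(call_seconds: float, calls: int = 1) -> float:
    """Estimated USD cost, assuming billing is proportional to connected time."""
    return round(VOICE_AGENT_PER_MIN * (call_seconds / 60) * calls, 2)


# Example: 1,000 three-minute calls.
batch_cost = voice_agent_cost(call_seconds=180, calls=1000)
```

If billing were rounded up per call (for example, to whole minutes), short calls would cost more than this proportional estimate.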
Pricing extracted from AssemblyAI's pricing page. Always verify current pricing before committing.