TTS for telephony: 8kHz and 16kHz WAV export
Which TTS services export 8kHz or 16kHz WAV for PBX? We compare ElevenLabs, Amazon Polly, Azure, Google and VoiceLab formats.
TL;DR: Most TTS platforms default to 22.05kHz, 24kHz or 44.1kHz output. Telephony needs 8kHz (G.711) or 16kHz (wideband). ElevenLabs, Amazon Polly and Azure support native 8kHz output. Google Cloud TTS synthesises at its voices' native rate, so 8kHz means resampling. VoiceLab exports ready-to-upload WAV, MP3 and u-law files normalised for PBX systems.
You’ve built your IVR, configured your PBX, and now you need voice prompts. You fire up a TTS API, generate a beautiful-sounding WAV file, upload it to your Asterisk or 3CX — and it sounds terrible. Or it doesn’t play at all.
The problem is almost always the audio format. Phone systems don’t speak 44.1kHz stereo. They speak 8kHz mono, usually G.711 u-law or A-law. If your TTS service doesn’t output in that format natively, you’re stuck with FFmpeg scripts and sample rate conversion that can degrade quality.
This guide covers which TTS services actually support 8kHz and 16kHz WAV output, what codecs your PBX expects, and how to get from TTS output to a file your phone system will play without complaints.
What audio format does your PBX actually need?
Before comparing TTS services, here’s what phone systems expect.
Narrowband (8kHz): the universal baseline. G.711 u-law (North America, Japan) or G.711 A-law (Europe, rest of world). 8-bit companded samples, mono, 64 kbps (8,000 samples per second × 8 bits). Virtually every phone system supports this. If you're unsure what your PBX wants, start here.
Wideband (16kHz) — better quality, growing support. G.722 is the main codec. Most modern IP phones support it (Yealink, Polycom, Cisco 8800 series). The voice sounds noticeably fuller because you get frequencies up to 7kHz instead of 3.4kHz.
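The bitrate figures are simple arithmetic: sample rate × bits per sample × channels. A minimal sketch (the helper name `bitrate_kbps` is ours, purely for illustration) shows why a G.711 stream is 64 kbps and how much larger uncompressed PCM files are:

```python
def bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of a fixed-size-sample stream, in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


print(bitrate_kbps(8000, 8))       # G.711 u-law or A-law, mono
print(bitrate_kbps(8000, 16))      # 8kHz 16-bit PCM WAV, mono
print(bitrate_kbps(16000, 16))     # 16kHz 16-bit PCM WAV, mono
print(bitrate_kbps(44100, 16, 2))  # 44.1kHz 16-bit stereo
```

This prints 64.0, 128.0, 256.0 and 1411.2, which is why a stereo 44.1kHz file carries roughly 22 times more data than a G.711 prompt of the same length.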
Common file requirements by PBX:
| PBX / Platform | Accepted format | Sample rate |
|---|---|---|
| Asterisk / FreePBX | WAV PCM 16-bit mono, u-law, A-law, GSM | 8kHz (native), 16kHz (with transcoding) |
| 3CX | WAV 8kHz 16-bit mono, MP3 | 8kHz preferred |
| FreeSWITCH | WAV 8/16/32/48kHz, u-law, A-law | 8kHz or 16kHz depending on codec |
| Cisco CUCM | WAV G.711 u-law 8kHz 8-bit mono | 8kHz only |
| Avaya IP Office | WAV 8kHz 16-bit mono | 8kHz |
| Yeastar P-Series | WAV 8kHz 16-bit mono, MP3 | 8kHz |
| Microsoft Teams | MP3, WAV | 16kHz mono minimum |
| BroadWorks | WAV u-law 8kHz 8-bit mono | 8kHz |
The takeaway: if you’re targeting traditional PBX systems, you need 8kHz. If you’re on a modern VoIP platform that supports wideband, 16kHz is the sweet spot for better quality without compatibility issues.
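Before uploading a prompt, it helps to check what the file actually contains. The sketch below reads the `fmt ` chunk of a WAV file by hand, because Python's built-in `wave` module may reject u-law and A-law files. The function names and the "narrowband ready" rule are our own illustration based on the table above, not a check any PBX vendor publishes:

```python
import io
import struct
import wave

FORMAT_NAMES = {1: "PCM", 6: "A-law", 7: "u-law"}  # WAVE format tags


def read_wav_format(data: bytes) -> dict:
    """Return codec, channels, sample rate and bit depth from a WAV's fmt chunk."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, channels, rate, _, _, bits = struct.unpack_from("<HHIIHH", data, pos + 8)
            return {"codec": FORMAT_NAMES.get(tag, f"tag {tag}"),
                    "channels": channels, "sample_rate": rate, "bits": bits}
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no fmt chunk found")


def is_narrowband_ready(fmt: dict) -> bool:
    """Rule of thumb: 8kHz mono, as 16-bit PCM or 8-bit u-law/A-law."""
    if fmt["sample_rate"] != 8000 or fmt["channels"] != 1:
        return False
    return (fmt["codec"], fmt["bits"]) in {("PCM", 16), ("u-law", 8), ("A-law", 8)}


def make_test_wav(rate: int, channels: int) -> bytes:
    """Build a short silent 16-bit PCM WAV in memory, just for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * channels * 100)
    return buf.getvalue()


print(is_narrowband_ready(read_wav_format(make_test_wav(8000, 1))))   # 8kHz mono
print(is_narrowband_ready(read_wav_format(make_test_wav(44100, 2))))  # 44.1kHz stereo
```

For a real file, pass `open("prompt.wav", "rb").read()` to `read_wav_format` and compare the result with the row for your PBX.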
TTS services compared: telephony format support
Here’s how the major TTS platforms handle telephony-grade output.
ElevenLabs
ElevenLabs is the engine behind many current AI voice platforms (including VoiceLab). Native format support:
- PCM: 8kHz, 16kHz, 22.05kHz, 24kHz (all tiers), 44.1kHz and 48kHz (Pro+)
- u-law: 8kHz (all tiers)
- A-law: 8kHz (all tiers)
- MP3: 22.05kHz and 44.1kHz (various bitrates)
- WAV: 44.1kHz (Pro+ only)
For telephony, ElevenLabs is strong. You can get 8kHz u-law directly from the API without any post-processing. The catch: you need to use the API and set the output_format parameter (for example, ulaw_8000 or pcm_16000). The web interface doesn't let you pick 8kHz.
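As a rough sketch of what setting that parameter looks like, the snippet below builds (but does not send) a text-to-speech request asking for 8kHz u-law. The endpoint path, the `xi-api-key` header and the request body are assumptions based on the public API at the time of writing, and `build_elevenlabs_request` is a hypothetical helper; check the current ElevenLabs documentation before relying on any of it:

```python
import json
import urllib.parse
import urllib.request


def build_elevenlabs_request(voice_id: str, text: str, api_key: str,
                             output_format: str = "ulaw_8000") -> urllib.request.Request:
    """Prepare a POST request for 8kHz u-law audio (assumed endpoint and headers)."""
    query = urllib.parse.urlencode({"output_format": output_format})
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}?{query}"
    body = json.dumps({"text": text}).encode("utf-8")
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder voice ID and key, for illustration only.
req = build_elevenlabs_request("YOUR_VOICE_ID", "Thank you for calling.", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Sending it is a matter of `urllib.request.urlopen(req).read()`. Check whether the returned bytes include a WAV header before uploading them to a PBX that expects one.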
Amazon Polly
Amazon’s TTS service was built with telephony in mind:
- PCM: 8kHz, 16kHz
- MP3: 8kHz, 16kHz, 22.05kHz, 24kHz
- OGG Vorbis: 8kHz, 16kHz, 22.05kHz, 24kHz
Polly outputs 8kHz PCM and MP3 natively. The audio quality at 8kHz is decent with the neural voices (Joanna, Matthew, etc.), though the voice selection is more limited than ElevenLabs. Two caveats: the PCM output is raw, headerless 16-bit mono audio, so you need to wrap it in a WAV container yourself, and there is no direct u-law output, so you'll need to convert PCM to u-law if your PBX requires it.
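Polly's PCM format is raw signed 16-bit little-endian mono audio with no header, so most PBXs won't accept it as-is. A minimal sketch of wrapping those bytes in a WAV container with Python's standard library follows; `pcm_to_wav` is our own helper name, and `raw_pcm` stands in for the bytes you would read from Polly's audio stream:

```python
import io
import wave


def pcm_to_wav(raw_pcm: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap raw 16-bit mono PCM bytes in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)         # mono
        wav.setsampwidth(2)         # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(raw_pcm)
    return buf.getvalue()


# Stand-in for Polly output: one second of silence at 8kHz.
raw_pcm = b"\x00\x00" * 8000
wav_bytes = pcm_to_wav(raw_pcm, sample_rate=8000)
print(wav_bytes[:4], len(wav_bytes))
```

The sample rate passed here must match the one requested from Polly, otherwise the prompt plays back at the wrong speed.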
Microsoft Azure TTS
Azure offers telephony-specific output formats:
- riff-8khz-8bit-mono-mulaw: G.711 u-law, ready for PBX
- riff-8khz-8bit-mono-alaw: G.711 A-law
- riff-16khz-16bit-mono-pcm: wideband PCM
- riff-8khz-16bit-mono-pcm: narrowband PCM
- Plus MP3, OGG, raw PCM at various rates
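Azure is probably the best option for developers who need exact telephony formats. The riff-8khz-8bit-mono-mulaw output drops directly into Cisco CUCM, BroadWorks, or any system that takes G.711 u-law WAV, with zero conversion. For systems that expect 16-bit PCM instead (Avaya IP Office in the table above), use riff-8khz-16bit-mono-pcm.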
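With Azure's REST API, the output format is chosen through a request header. The sketch below builds (but does not send) such a request. The endpoint pattern, header names and the voice name `en-US-JennyNeural` reflect our understanding of the Speech service REST API and should be checked against Microsoft's documentation; `build_azure_request` is a hypothetical helper:

```python
import urllib.request
from xml.sax.saxutils import escape


def build_azure_request(region: str, key: str, text: str,
                        voice: str = "en-US-JennyNeural",
                        output_format: str = "riff-8khz-8bit-mono-mulaw") -> urllib.request.Request:
    """Prepare a POST request for G.711 u-law WAV (assumed endpoint and headers)."""
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice name='{voice}'>{escape(text)}</voice>"
        "</speak>"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,  # selects the telephony format
        "User-Agent": "pbx-prompt-sketch",
    }
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    return urllib.request.Request(url, data=ssml.encode("utf-8"), headers=headers, method="POST")


# Placeholder region and key, for illustration only.
req = build_azure_request("westeurope", "YOUR_KEY", "Press 1 for sales & support.")
print(req.full_url)
```

Switching to A-law or 16kHz PCM only means passing a different `output_format` string from the list above.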
Google Cloud TTS
Google supports:
- LINEAR16: 16-bit PCM at the model’s native rate (usually 22-24kHz)
- MP3: at the model’s native rate
- OGG_OPUS: variable bitrate
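Google's voices synthesise at their native rate, not at 8kHz or 16kHz. The API's audio config does accept a sampleRateHertz value, but the service then resamples from the native rate, which its documentation warns can reduce quality. For full control over the conversion, generate at the default sample rate and downsample yourself with FFmpeg or SoX: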
ffmpeg -i google_output.wav -ar 8000 -ac 1 -acodec pcm_mulaw output_ulaw.wav
It works, but it's an extra step in your pipeline, and resampled audio can sound slightly worse than audio produced for 8kHz in the first place.
VoiceLab
VoiceLab takes a different approach from raw TTS APIs. Instead of exposing format parameters to the user, it handles the conversion automatically:
- WAV: 8kHz mono (G.711 compatible)
- MP3: standard and telephony-optimised
- u-law WAV: ready for Asterisk, FreePBX, Cisco
The audio is normalised to -16 to -20 LUFS (a loudness range commonly used for telephony prompts) and exported in the format your PBX needs. No FFmpeg, no command-line conversion. You pick your voice, type your text, mix with background music if needed, and download a file that uploads directly to your phone system.
The tradeoff: VoiceLab is a finished product, not an API. If you’re building a programmatic IVR pipeline that generates prompts dynamically, you’ll want ElevenLabs, Azure or Polly directly. If you’re a business creating phone greetings, on-hold messages and IVR prompts manually, VoiceLab saves the format headache entirely.
Comparison table
| Feature | ElevenLabs | Amazon Polly | Azure TTS | Google TTS | VoiceLab |
|---|---|---|---|---|---|
| Native 8kHz output | Yes (API) | Yes | Yes | No | Yes |
| Native 16kHz output | Yes (API) | Yes | Yes | No | Yes |
| u-law / A-law | Yes | No (PCM only) | Yes | No | Yes |
| PBX-ready export | API only | API only | API only | No | Built-in |
| Background music mixing | No | No | No | No | Yes |
| LUFS normalisation | No | No | No | No | Yes (-16 to -20) |
| Voice quality (telephony) | Excellent | Good (neural) | Very good | Good | Excellent (ElevenLabs engine) |
| Ease of use (non-dev) | Medium | Low | Low | Low | High |
| Pricing model | Per character | Per character | Per character | Per character | Credit-based |
How to convert TTS output for your PBX
If your TTS service doesn’t output in the right format, here are the FFmpeg commands that work.
Convert any WAV to G.711 u-law (8kHz)
ffmpeg -i input.wav -ar 8000 -ac 1 -acodec pcm_mulaw -f wav output_ulaw.wav
Convert to G.711 A-law (8kHz)
ffmpeg -i input.wav -ar 8000 -ac 1 -acodec pcm_alaw -f wav output_alaw.wav
Convert to 16kHz PCM mono (wideband)
ffmpeg -i input.wav -ar 16000 -ac 1 -acodec pcm_s16le -f wav output_16k.wav
Convert MP3 to Asterisk-compatible WAV
ffmpeg -i input.mp3 -ar 8000 -ac 1 -acodec pcm_s16le output_asterisk.wav
Normalise loudness for telephony
ffmpeg -i input.wav -af loudnorm=I=-18:TP=-3:LRA=7 -ar 8000 -ac 1 output_normalised.wav
A note on quality: downsampling a 44.1kHz file to 8kHz works, but the result may not sound as good as audio produced for 8kHz by the service itself. An 8kHz file can only carry frequencies below 4kHz, so everything above that has to be filtered out. With native 8kHz output (ElevenLabs, Polly, Azure) the service handles that step; with downsampling, the upper frequencies are thrown away after the fact and the result depends on your resampler.
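If you convert prompts in bulk, the commands above can be generated from a small script. This is a sketch under our own naming (`TARGETS` and `ffmpeg_command` are hypothetical); it reuses the flags from the commands in this section, adds FFmpeg's `-y` flag to overwrite existing files, and only builds the argument list rather than running anything:

```python
import shlex

# Output options copied from the FFmpeg commands above.
TARGETS = {
    "ulaw": ["-ar", "8000", "-ac", "1", "-acodec", "pcm_mulaw", "-f", "wav"],
    "alaw": ["-ar", "8000", "-ac", "1", "-acodec", "pcm_alaw", "-f", "wav"],
    "pcm8k": ["-ar", "8000", "-ac", "1", "-acodec", "pcm_s16le", "-f", "wav"],
    "pcm16k": ["-ar", "16000", "-ac", "1", "-acodec", "pcm_s16le", "-f", "wav"],
}


def ffmpeg_command(src: str, dst: str, target: str = "ulaw", normalise: bool = False) -> list:
    """Build the FFmpeg argument list for one telephony conversion."""
    cmd = ["ffmpeg", "-y", "-i", src]
    if normalise:
        cmd += ["-af", "loudnorm=I=-18:TP=-3:LRA=7"]  # same filter as above
    return cmd + TARGETS[target] + [dst]


print(shlex.join(ffmpeg_command("input.wav", "output_ulaw.wav")))
print(shlex.join(ffmpeg_command("input.mp3", "output_16k.wav", "pcm16k", normalise=True)))
```

To actually run a conversion, pass the list to `subprocess.run(cmd, check=True)` on a machine with FFmpeg installed.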
Which TTS to pick for your use case
You’re a developer building an IVR pipeline: Use ElevenLabs (best voices, native 8kHz u-law) or Azure (widest format support, telephony-specific profiles). Both have solid APIs with predictable output.
You’re integrating TTS into Asterisk or FreeSWITCH: Azure or ElevenLabs for native u-law output. Amazon Polly if you’re already on AWS and want to stay in the ecosystem.
You run a business and need phone messages: VoiceLab handles format conversion, loudness normalisation and music mixing without any technical knowledge. Generate, download, upload to your PBX. Need help writing the script itself? Our phone greeting creation guide has ready-to-use templates.
You need wideband (16kHz) for Teams or modern VoIP: ElevenLabs or Azure both output 16kHz natively. VoiceLab also supports 16kHz export.
Frequently asked questions
Do I need 8kHz or 16kHz for my phone system?
If your PBX is more than five years old or uses PSTN trunks, use 8kHz G.711. If you're on a modern VoIP platform with wideband codec support (G.722), 16kHz will sound better. When in doubt, 8kHz works everywhere.
Will callers hear the difference between 8kHz and 16kHz?
On a desk phone with G.722 support, yes. The voice sounds fuller and clearer at 16kHz. On a mobile phone or over a PSTN leg, the call is typically transcoded to 8kHz anyway, so the benefit disappears.
Can I use a regular MP3 from a TTS service on my PBX?
Some modern systems (3CX, Yeastar) accept MP3 and convert internally. Most traditional PBXs (Cisco CUCM, Avaya, older Asterisk) require WAV in a specific format. Check your system’s documentation.
Why does my TTS audio sound distorted on the phone system?
Usually a sample rate or codec mismatch. A 44.1kHz stereo file played back on an 8kHz mono system will sound garbled. Convert to the correct format before uploading, or use a platform like VoiceLab that exports in the right format automatically.
What’s the difference between u-law and A-law?
Both are G.711 companding algorithms. u-law (mu-law) is used in North America and Japan. A-law is used in Europe and most other regions. Your PBX documentation will tell you which one it expects. If you’re in Europe, use A-law. In the US/Canada, use u-law.
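To make "companding" concrete, here is a sketch of u-law encoding and decoding for a single sample, following the widely used segment-based reference approach (bias of 132, clip at 32635). The function names are ours. Each 16-bit sample is squeezed into 8 bits, with finer steps for quiet sounds and coarser steps for loud ones:

```python
BIAS = 0x84   # 132, added before locating the segment
CLIP = 32635  # largest magnitude that still fits after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as one G.711 u-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):  # find the highest set bit
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # u-law bytes are stored inverted


def ulaw_to_linear(byte: int) -> int:
    """Decode one u-law byte back to an approximate 16-bit PCM sample."""
    byte = ~byte & 0xFF
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if byte & 0x80 else magnitude


for sample in (0, 1000, -1000, 32767):
    encoded = linear_to_ulaw(sample)
    print(sample, hex(encoded), ulaw_to_linear(encoded))
```

The round trip is lossy by design: the decoded value lands near the original rather than exactly on it. A-law follows the same idea with a different curve and bit layout, which is why the two are not interchangeable on a PBX.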