Can I upload my own recordings?

Yes. You can upload your own files (WAV/MP3) or record directly in the app, transcribe them (speech-to-text) and mix them with music. Everything exports in PBX/VoIP-ready formats.

Are the voices realistic?

Yes. We use premium TTS engines (ElevenLabs) for natural-sounding voices, with control over tone, pace and pronunciation. Human voices are available on request.

Do I own the generated files?

Yes. You retain full ownership of all exported files, and we do not use your content for commercial purposes. Hosting is in the EU and GDPR-compliant.

What languages are supported?

FR, EN, NL, DE by default (and others on request). You can combine multiple languages in a single message (useful for IVR).

Can I cancel at any time?

Yes, no commitment. Simply cancel before the next billing date; access remains active until the end of the current period.

How long does it really take?

Creation is nearly instant: a greeting, on-hold or IVR message is generated in seconds. Export time depends on the file length and is usually under a minute.

How do I manage multiple languages in a single message?

Create a multilingual project and add your FR/NL/EN/DE segments. The mixing console handles timings, volumes and fades for smooth IVR playback.

What formats should I export for my PBX/VoIP?

By default, WAV 8 kHz A-law or µ-law for telephony. WAV or MP3 at 16 kHz is also available, depending on your system. Presets are provided for 3CX, Yealink and Telavox, with a recommended loudness of -16 to -20 LUFS.

Can I use my own hold music?

Yes. Import your tracks or use our royalty-free catalogue. Test the voice + music balance with the preview before export.

What exactly does the mixing console do?

Fade in/out, volume adjustment to the nearest dB, customisable segments, standardised file names and PBX-ready export (WAV, A-law, µ-law).
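Fades and dB volume controls come down to simple per-sample arithmetic. The sketch below is a hypothetical illustration on signed 16-bit mono samples held in a Python list, not VoiceLab's actual implementation: a gain in dB becomes a linear factor of 10^(dB/20), and a fade multiplies the first or last N samples by a linear ramp. The helper names are made up for the example.

```python
def db_to_gain(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def clamp16(x: float) -> int:
    """Round and clip to the signed 16-bit range used by PCM WAV files."""
    return max(-32768, min(32767, round(x)))


def apply_gain(samples: list[int], db: float) -> list[int]:
    """Scale every sample by a dB gain, clipping instead of wrapping around."""
    g = db_to_gain(db)
    return [clamp16(s * g) for s in samples]


def apply_fades(samples: list[int], rate: int,
                fade_in_s: float, fade_out_s: float) -> list[int]:
    """Apply linear fade-in and fade-out ramps (hypothetical helper)."""
    n = len(samples)
    n_in = min(n, int(rate * fade_in_s))
    n_out = min(n, int(rate * fade_out_s))
    out = [float(s) for s in samples]
    for i in range(n_in):           # ramp 0 -> almost 1 at the start
        out[i] *= i / n_in
    for i in range(n_out):          # ramp 0 -> almost 1 counted from the end
        out[n - 1 - i] *= i / n_out
    return [clamp16(s) for s in out]
```

For example, `apply_gain([1000], -6.0)` roughly halves the amplitude, and clipping in `clamp16` keeps a boosted peak from wrapping around into a loud click.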

Are you compliant with European regulations?

Yes. EU hosting, GDPR-ready processing, and VAT included for full transparency.

Do you offer sample scripts?

Yes. The script assistant provides templates and adjusts texts (tone, length, multilingual) for greetings, on-hold and IVR.

Can I correct the pronunciation of a brand or proper name?

Yes, through the pronunciation glossary: you define the phonetics of brands, acronyms or first names once, and every message uses them consistently.
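A pronunciation glossary can be as simple as a whole-word substitution applied to the script before it reaches the TTS engine. The sketch below is a hypothetical illustration of that idea, not VoiceLab's glossary: the `GLOSSARY` entries and the respellings are assumptions chosen for the example. Engines that accept SSML can also take IPA through a `phoneme` element; this sketch sticks to plain respelling.

```python
import re

# Hypothetical glossary: written form -> respelling the voice should read.
GLOSSARY = {
    "VoiceLab": "Voice Lab",
    "IVR": "I V R",
    "3CX": "three C X",
}


def apply_glossary(text: str, glossary: dict[str, str]) -> str:
    """Replace whole-word glossary terms with their respellings."""
    if not glossary:
        return text
    # Longest terms first, so a longer entry wins over a shorter overlapping one.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    return pattern.sub(lambda m: glossary[m.group(1)], text)
```

The `\b` word boundaries keep the substitution from firing inside longer words, so "IVRs" is left untouched while "IVR" is respelled.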

What is the best tool for creating professional phone messages with AI?

VoiceLab is built specifically for professional phone messages: greetings, on-hold and IVR prompts. Unlike general TTS platforms like ElevenLabs or Murf, VoiceLab includes a mixing console, background music, PBX-ready export (WAV 8kHz u-law/A-law) and loudness normalisation. You type your text, pick a voice in FR/EN/NL/DE, and download a file your phone system accepts directly.

How does VoiceLab compare to ElevenLabs for phone greetings?

ElevenLabs is a general-purpose TTS API with excellent voice quality. VoiceLab uses the same ElevenLabs voices but adds everything a phone system needs: background music mixing, format conversion to 8kHz WAV or u-law, loudness normalisation (-16 to -20 LUFS), and presets for 3CX, Asterisk, Yealink and Cisco. No coding or FFmpeg required.

Why choose VoiceLab over a traditional recording studio?

A recording studio charges 39 to 500 EUR per message and takes days to deliver. VoiceLab generates the same result in under a minute for 5 to 13 EUR. You can edit the text and re-generate instantly, in 4 languages, without rebooking a session. The output is normalised and formatted for PBX systems out of the box.

TTS for telephony: 8kHz and 16kHz WAV export

TL;DR — Most TTS platforms output at 22kHz or 44kHz. Telephony needs 8kHz (G.711) or 16kHz (wideband). ElevenLabs, Amazon Polly and Azure support native 8kHz output. Google Cloud TTS requires post-processing. VoiceLab exports ready-to-upload WAV, MP3 and u-law files normalised for PBX systems.

You’ve built your IVR, configured your PBX, and now you need voice prompts. You fire up a TTS API, generate a beautiful-sounding WAV file, upload it to your Asterisk or 3CX — and it sounds terrible. Or it doesn’t play at all.

The problem is almost always the audio format. Phone systems don’t speak 44.1kHz stereo. They speak 8kHz mono, usually G.711 u-law or A-law. If your TTS service doesn’t output in that format natively, you’re stuck with FFmpeg scripts and sample rate conversion that can degrade quality.

This guide covers which TTS services actually support 8kHz and 16kHz WAV output, what codecs your PBX expects, and how to get from TTS output to a file your phone system will play without complaints.

What audio format does your PBX actually need?

Before comparing TTS services, here’s what phone systems expect.

Narrowband (8kHz) — the universal baseline. G.711 u-law (North America, Japan) or G.711 A-law (Europe, rest of world). 8-bit, mono, 64 kbps. Virtually every phone system supports it. If you're unsure what your PBX wants, start here.

Wideband (16kHz) — better quality, growing support. G.722 is the main codec. Most modern IP phones support it (Yealink, Polycom, Cisco 8800 series). The voice sounds noticeably fuller because you get frequencies up to 7kHz instead of 3.4kHz.
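The 64 kbps figure is just 8000 samples per second times 8 bits per sample. G.711 u-law gets each sample down to 8 bits by companding: it encodes the magnitude on a logarithmic curve, using 3 bits for a segment (exponent) and 4 bits for the position inside it. The sketch below follows the bias-and-segment formulation found in common reference implementations, operating on signed 16-bit PCM samples. It is meant to show the mechanics, not to replace a tested codec library.

```python
BIAS = 0x84   # 132, added to the magnitude before the segment lookup
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit G.711 u-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = (magnitude >> 7).bit_length() - 1      # segment 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F   # 4 quantisation bits
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # bits are inverted


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit G.711 u-law byte back to a signed 16-bit PCM sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


SAMPLE_RATE = 8000                      # narrowband telephony
BITS_PER_SAMPLE = 8                     # one u-law byte per sample
BITRATE = SAMPLE_RATE * BITS_PER_SAMPLE  # 64000 bit/s per direction
NYQUIST = SAMPLE_RATE // 2              # 4000 Hz is the highest representable frequency
```

Round-tripping shows the trade-off: quiet samples keep fine resolution, while loud ones are quantised more coarsely. A sample of 1000 comes back as 988, and the largest decodable magnitude is 32124 rather than 32767.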

Common file requirements by PBX:

PBX / Platform	Accepted format	Sample rate
Asterisk / FreePBX	WAV PCM 16-bit mono, u-law, A-law, GSM	8kHz (native), 16kHz (with transcoding)
3CX	WAV 8kHz 16-bit mono, MP3	8kHz preferred
FreeSWITCH	WAV 8/16/32/48kHz, u-law, A-law	8kHz or 16kHz depending on codec
Cisco CUCM	WAV G.711 u-law 8kHz 8-bit mono	8kHz only
Avaya IP Office	WAV 8kHz 16-bit mono	8kHz
Yeastar P-Series	WAV 8kHz 16-bit mono, MP3	8kHz
Microsoft Teams	MP3, WAV	16kHz mono minimum
BroadWorks	WAV u-law 8kHz 8-bit mono	8kHz

The takeaway: if you’re targeting traditional PBX systems, you need 8kHz. If you’re on a modern VoIP platform that supports wideband, 16kHz is the sweet spot for better quality without compatibility issues.
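Before uploading, it can help to check what a WAV file actually contains, since the extension says nothing about codec or sample rate. The sketch below reads the `fmt ` chunk of a RIFF/WAVE file with the standard `struct` module and compares it with a few rows of the table above. `PBX_REQUIREMENTS` is a hypothetical, simplified transcription of those rows, WAV-only (MP3 acceptance is ignored), and the function names are made up for the example.

```python
import struct

# WAVE format tags relevant to telephony files.
FORMAT_TAGS = {1: "PCM", 6: "A-law", 7: "u-law"}

# Hypothetical subset of the table above: accepted codecs and sample rates.
PBX_REQUIREMENTS = {
    "3CX": ({"PCM 16-bit"}, {8000}),
    "Cisco CUCM": ({"u-law"}, {8000}),
    "Avaya IP Office": ({"PCM 16-bit"}, {8000}),
}


def read_wav_format(data: bytes) -> dict:
    """Return codec, channel count and sample rate from a WAV file's 'fmt ' chunk."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, channels, rate, _, _, bits = struct.unpack_from("<HHIIHH", data, pos + 8)
            codec = FORMAT_TAGS.get(tag, f"format tag {tag}")
            if tag == 1:
                codec = f"PCM {bits}-bit"
            return {"codec": codec, "channels": channels, "rate": rate}
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no 'fmt ' chunk found")


def check_for_pbx(fmt: dict, pbx: str) -> list[str]:
    """List mismatches between a file and a PBX; an empty list means it looks compatible."""
    codecs, rates = PBX_REQUIREMENTS[pbx]
    problems = []
    if fmt["codec"] not in codecs:
        problems.append(f"codec {fmt['codec']} not in {sorted(codecs)}")
    if fmt["rate"] not in rates:
        problems.append(f"sample rate {fmt['rate']} Hz not in {sorted(rates)}")
    if fmt["channels"] != 1:
        problems.append(f"{fmt['channels']} channels, expected mono")
    return problems
```

Typical use is to read the file's bytes and call `check_for_pbx(read_wav_format(data), "Cisco CUCM")`. A 44.1kHz stereo export reports both a sample-rate and a channel mismatch, which is exactly the "garbled on the phone" case.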

TTS services compared: telephony format support

Here’s how the major TTS platforms handle telephony-grade output.

ElevenLabs

ElevenLabs is the engine behind many current AI voice platforms (including VoiceLab). Native format support:

PCM: 8kHz, 16kHz, 22.05kHz, 24kHz (all tiers), 44.1kHz and 48kHz (Pro+)
u-law: 8kHz (all tiers)
A-law: 8kHz (all tiers)
MP3: 22.05kHz and 44.1kHz (various bitrates)
WAV: 44.1kHz (Pro+ only)

For telephony, ElevenLabs is strong. You can get 8kHz u-law directly from the API without any post-processing. The catch: you need to use the API and set the output_format parameter. The web interface doesn’t let you pick 8kHz.
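A minimal sketch of such an API call with only the Python standard library. It prepares the request without sending it. Several details are assumptions to check against the current ElevenLabs API reference: the `model_id` value, the voice ID placeholder, and that the `ulaw_8000` response body is raw, headerless u-law samples. If that last assumption holds, a PBX that expects a `.wav` file needs a WAV header around the bytes, which `ulaw_to_wav` adds (format tag 7, 8kHz, mono, 8-bit).

```python
import json
import struct
import urllib.parse
import urllib.request


def build_tts_request(api_key: str, voice_id: str, text: str,
                      output_format: str = "ulaw_8000") -> urllib.request.Request:
    """Prepare (but do not send) an ElevenLabs text-to-speech POST request."""
    url = (f"https://api.elevenlabs.io/v1/text-to-speech/{urllib.parse.quote(voice_id)}"
           f"?{urllib.parse.urlencode({'output_format': output_format})}")
    # Assumption: model name; pick the one your account and languages need.
    body = json.dumps({"text": text, "model_id": "eleven_multilingual_v2"}).encode("utf-8")
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def ulaw_to_wav(raw: bytes, rate: int = 8000) -> bytes:
    """Wrap headerless 8-bit mono G.711 u-law samples in a RIFF/WAVE container."""
    pad = b"\x00" if len(raw) % 2 else b""  # RIFF chunks must have even length
    fmt = struct.pack("<4sIHHIIHHH", b"fmt ", 18,
                      7,     # WAVE_FORMAT_MULAW
                      1,     # mono
                      rate,  # samples per second
                      rate,  # bytes per second (1 byte per sample)
                      1,     # block align
                      8,     # bits per sample
                      0)     # no extra format bytes
    fact = struct.pack("<4sII", b"fact", 4, len(raw))  # sample count, expected for non-PCM
    data = struct.pack("<4sI", b"data", len(raw)) + raw + pad
    body = b"WAVE" + fmt + fact + data
    return struct.pack("<4sI", b"RIFF", len(body)) + body
```

Sending it is then a matter of `urllib.request.urlopen(req)` and passing `resp.read()` through `ulaw_to_wav` before writing the file.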

Amazon Polly

Amazon’s TTS service was built with telephony in mind:

PCM: 8kHz, 16kHz, 22.05kHz
MP3: 8kHz, 16kHz, 22.05kHz
OGG Vorbis: 8kHz, 16kHz, 22.05kHz

Polly outputs 8kHz PCM and MP3 natively. The audio quality at 8kHz is decent with the neural voices (Joanna, Matthew, etc.), though the voice selection is more limited than ElevenLabs. No direct u-law output — you’ll need to convert PCM to u-law yourself if your PBX requires it.
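Polly's `pcm` output is raw signed 16-bit little-endian mono samples with no WAV header, so it needs wrapping before most PBXs will accept it. The sketch below does that with Python's built-in `wave` module, producing the 8kHz 16-bit mono WAV listed above for 3CX, Avaya and Yeastar. The `synthesize_8k_pcm` helper assumes `boto3` is installed and AWS credentials are configured; the default voice and the neural engine are illustrative choices.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, rate: int = 8000) -> bytes:
    """Wrap raw signed 16-bit little-endian mono PCM in a WAV header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit
        w.setframerate(rate)
        w.writeframes(pcm)
    return buf.getvalue()


def synthesize_8k_pcm(text: str, voice_id: str = "Joanna") -> bytes:
    """Fetch 8kHz PCM from Amazon Polly (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so pcm_to_wav works without it

    polly = boto3.client("polly")
    resp = polly.synthesize_speech(Text=text, OutputFormat="pcm",
                                   SampleRate="8000", VoiceId=voice_id,
                                   Engine="neural")
    return resp["AudioStream"].read()
```

If your PBX insists on u-law instead of 16-bit PCM, the ffmpeg `pcm_mulaw` command in the conversion section below handles that last step.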

Microsoft Azure TTS

Azure offers several telephony-specific output formats:

riff-8khz-8bit-mono-mulaw: G.711 u-law, ready for PBX
riff-8khz-8bit-mono-alaw: G.711 A-law
riff-16khz-16bit-mono-pcm: wideband PCM
riff-8khz-16bit-mono-pcm: narrowband PCM
Plus MP3, OGG, raw PCM at various rates

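Azure is probably the best option for developers who need exact telephony formats. The riff-8khz-8bit-mono-mulaw output drops directly into Cisco CUCM, BroadWorks or any other system that expects G.711 u-law WAV files, with zero conversion. For systems that expect 16-bit PCM, such as Avaya IP Office, riff-8khz-16bit-mono-pcm is the matching format.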
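With the Azure REST endpoint, the output format is chosen with a single `X-Microsoft-OutputFormat` header and the text is sent as SSML. The sketch below prepares that request with the Python standard library but does not send it. The voice name, language and `User-Agent` string are illustrative assumptions; the key and region come from your Speech resource. Script text is XML-escaped so characters like `&` in a company name do not break the SSML.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_azure_tts_request(key: str, region: str, text: str,
                            voice: str = "en-US-JennyNeural",
                            lang: str = "en-US",
                            output_format: str = "riff-8khz-8bit-mono-mulaw"
                            ) -> urllib.request.Request:
    """Prepare (but do not send) an Azure TTS request for a telephony format."""
    ssml = (f"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' "
            f"xml:lang='{lang}'><voice name='{voice}'>{escape(text)}</voice></speak>")
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,  # e.g. riff-8khz-8bit-mono-alaw
        "User-Agent": "ivr-prompt-builder",         # assumption: any descriptive name
    }
    return urllib.request.Request(url, data=ssml.encode("utf-8"),
                                  headers=headers, method="POST")
```

Because the `riff-` formats already include a WAV header, the response body can be written straight to a `.wav` file.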

Google Cloud TTS

Google supports:

LINEAR16: 16-bit PCM at the model’s native rate (usually 22-24kHz)
MP3: at the model’s native rate
OGG_OPUS: variable bitrate

Google doesn’t offer native 8kHz or 16kHz output. You have to generate at the default sample rate and downsample yourself. For telephony use, this means an extra step with FFmpeg or SoX:

ffmpeg -i google_output.wav -ar 8000 -ac 1 -acodec pcm_mulaw output_ulaw.wav

It works, but it’s an extra step in your pipeline, and downsampling always loses some quality compared to native 8kHz generation.

VoiceLab

VoiceLab takes a different approach from raw TTS APIs. Instead of exposing format parameters to the user, it handles the conversion automatically:

WAV: 8kHz mono (G.711 compatible)
MP3: standard and telephony-optimised
u-law WAV: ready for Asterisk, FreePBX, Cisco

The audio is normalised to -16 to -20 LUFS (a loudness range commonly recommended for telephony prompts) and exported in the format your PBX needs. No FFmpeg, no command-line conversion. You pick your voice, type your text, mix with background music if needed, and download a file that uploads directly to your phone system.
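Normalising to a loudness target is, at its core, a gain offset: target loudness minus measured integrated loudness. The sketch below shows that arithmetic under two assumptions. The measured value already comes from an ITU-R BS.1770 meter (such as ffmpeg's `loudnorm` or `ebur128` filters), and -18 LUFS, the middle of the range above, is the target. It also shows why a peak limit matters: the same gain raises the peaks too.

```python
def normalisation_gain_db(measured_lufs: float, target_lufs: float = -18.0) -> float:
    """Gain in dB needed to move a measured loudness to the target."""
    return target_lufs - measured_lufs


def peak_after_gain(peak_dbfs: float, gain_db: float) -> float:
    """Peak level after applying a gain (dB values simply add)."""
    return peak_dbfs + gain_db


# Worked example: a prompt measured at -24 LUFS with a -4 dBFS peak.
gain = normalisation_gain_db(-24.0)    # +6.0 dB to reach -18 LUFS
factor = 10 ** (gain / 20)             # ~1.995x amplitude
new_peak = peak_after_gain(-4.0, gain)  # +2.0 dBFS: above full scale, so it would clip
```

A result above 0 dBFS means plain gain is not enough. This is why `loudnorm` takes a true-peak (`TP`) limit alongside the integrated target.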

The tradeoff: VoiceLab is a finished product, not an API. If you’re building a programmatic IVR pipeline that generates prompts dynamically, you’ll want ElevenLabs, Azure or Polly directly. If you’re a business creating phone greetings, on-hold messages and IVR prompts manually, VoiceLab saves the format headache entirely.

Comparison table

Feature	ElevenLabs	Amazon Polly	Azure TTS	Google TTS	VoiceLab
Native 8kHz output	Yes (API)	Yes	Yes	No	Yes
Native 16kHz output	Yes (API)	Yes	Yes	No	Yes
u-law / A-law	Yes	No (PCM only)	Yes	No	Yes
PBX-ready export	API only	API only	API only	No	Built-in
Background music mixing	No	No	No	No	Yes
LUFS normalisation	No	No	No	No	Yes (-16 to -20)
Voice quality (telephony)	Excellent	Good (neural)	Very good	Good	Excellent (ElevenLabs engine)
Ease of use (non-dev)	Medium	Low	Low	Low	High
Pricing model	Per character	Per character	Per character	Per character	Credit-based

How to convert TTS output for your PBX

If your TTS service doesn’t output in the right format, here are the FFmpeg commands that work.

Convert any WAV to G.711 u-law (8kHz)

ffmpeg -i input.wav -ar 8000 -ac 1 -acodec pcm_mulaw -f wav output_ulaw.wav

Convert to G.711 A-law (8kHz)

ffmpeg -i input.wav -ar 8000 -ac 1 -acodec pcm_alaw -f wav output_alaw.wav

Convert to 16kHz PCM mono (wideband)

ffmpeg -i input.wav -ar 16000 -ac 1 -acodec pcm_s16le -f wav output_16k.wav

Convert MP3 to Asterisk-compatible WAV

ffmpeg -i input.mp3 -ar 8000 -ac 1 -acodec pcm_s16le output_asterisk.wav

Normalise loudness for telephony

ffmpeg -i input.wav -af loudnorm=I=-18:TP=-3:LRA=7 -ar 8000 -ac 1 output_normalised.wav

A note on quality: downsampling a 44kHz file to 8kHz works, but the result won’t sound as good as audio generated natively at 8kHz. Native 8kHz generation (ElevenLabs, Polly, Azure) optimises the synthesis for that sample rate. Downsampled audio just throws away the upper frequencies after the fact.
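When you have a whole folder of prompts, the commands above can be scripted. The sketch below combines the loudness normalisation and one of the format conversions in a single ffmpeg pass per file: the `-af` filter runs first, and ffmpeg then resamples to the requested output rate. The preset names, the default A-law choice and the folder layout are assumptions for the example; the flags themselves are the ones used in the commands above.

```python
import shutil
import subprocess
from pathlib import Path

# Output presets mirroring the conversion commands above.
PRESETS = {
    "ulaw": ["-ar", "8000", "-ac", "1", "-acodec", "pcm_mulaw"],
    "alaw": ["-ar", "8000", "-ac", "1", "-acodec", "pcm_alaw"],
    "pcm8k": ["-ar", "8000", "-ac", "1", "-acodec", "pcm_s16le"],
    "pcm16k": ["-ar", "16000", "-ac", "1", "-acodec", "pcm_s16le"],
}
LOUDNORM = "loudnorm=I=-18:TP=-3:LRA=7"


def ffmpeg_args(src: Path, dst: Path, preset: str) -> list[str]:
    """Build one ffmpeg command: normalise loudness, then convert format."""
    return ["ffmpeg", "-y", "-i", str(src), "-af", LOUDNORM,
            *PRESETS[preset], "-f", "wav", str(dst)]


def convert_folder(src_dir: Path, dst_dir: Path, preset: str = "alaw") -> None:
    """Convert every WAV/MP3 in src_dir; files sharing a stem overwrite each other."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob("*.wav")) + sorted(src_dir.glob("*.mp3")):
        subprocess.run(ffmpeg_args(src, dst_dir / (src.stem + ".wav"), preset),
                       check=True)
```

Building the argument list separately from running it keeps the command easy to inspect or log, and `check=True` stops the batch on the first file ffmpeg rejects.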

Which TTS to pick for your use case

You’re a developer building an IVR pipeline: Use ElevenLabs (best voices, native 8kHz u-law) or Azure (widest format support, telephony-specific profiles). Both have solid APIs with predictable output.

You’re integrating TTS into Asterisk or FreeSWITCH: Azure or ElevenLabs for native u-law output. Amazon Polly if you’re already on AWS and want to stay in the ecosystem.

You run a business and need phone messages: VoiceLab handles format conversion, loudness normalisation and music mixing without any technical knowledge. Generate, download, upload to your PBX. Need help writing the script itself? Our phone greeting creation guide has ready-to-use templates.

You need wideband (16kHz) for Teams or modern VoIP: ElevenLabs or Azure both output 16kHz natively. VoiceLab also supports 16kHz export.

Frequently asked questions

Do I need 8kHz or 16kHz for my phone system?

If your PBX is older than 5 years or uses PSTN trunks, use 8kHz G.711. If you’re on a modern VoIP platform with wideband codec support (G.722), 16kHz will sound better. When in doubt, 8kHz works everywhere.

Will callers hear the difference between 8kHz and 16kHz?

On a desk phone with G.722 support, yes. The voice sounds fuller and clearer at 16kHz. On a mobile phone or over a PSTN leg, the call is typically transcoded to 8kHz anyway, so the benefit disappears.

Can I use a regular MP3 from a TTS service on my PBX?

Some modern systems (3CX, Yeastar) accept MP3 and convert internally. Most traditional PBXs (Cisco CUCM, Avaya, older Asterisk) require WAV in a specific format. Check your system’s documentation.

Why does my TTS audio sound distorted on the phone system?

Usually a sample rate or codec mismatch. A 44kHz stereo file played back on an 8kHz mono system will sound garbled. Convert to the correct format before uploading — or use a platform like VoiceLab that exports in the right format automatically.

What’s the difference between u-law and A-law?

Both are G.711 companding algorithms. u-law (mu-law) is used in North America and Japan. A-law is used in Europe and most other regions. Your PBX documentation will tell you which one it expects. If you’re in Europe, use A-law. In the US/Canada, use u-law.
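The two laws differ in the shape of their companding curves. A-law works on 13-bit magnitudes, has no exact zero code (silence decodes to a tiny positive or negative value), and inverts every other bit on the wire. A minimal sketch of the classic segment-table A-law encoder and decoder for signed 16-bit samples, following common reference implementations, makes those details concrete.

```python
SEG_END = [0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF]  # segment upper bounds


def linear_to_alaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit G.711 A-law byte."""
    pcm = sample >> 3             # 16-bit -> 13-bit range
    if pcm >= 0:
        mask = 0xD5               # sign bit set for positive values, even bits inverted
    else:
        mask = 0x55               # sign bit clear for negative values
        pcm = -pcm - 1
    seg = next((i for i, end in enumerate(SEG_END) if pcm <= end), 8)
    if seg >= 8:                  # out of range: clip to the largest code
        return 0x7F ^ mask
    aval = seg << 4
    aval |= ((pcm >> 1) if seg < 2 else (pcm >> seg)) & 0x0F
    return aval ^ mask


def alaw_to_linear(code: int) -> int:
    """Decode an 8-bit G.711 A-law byte back to a signed 16-bit PCM sample."""
    code ^= 0x55
    t = (code & 0x0F) << 4
    seg = (code & 0x70) >> 4
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    return t if code & 0x80 else -t
```

Silence (0) encodes to 0xD5 and decodes to 8 rather than 0, and a sample of 1000 comes back as 1008. Both values are small enough to be inaudible, but they are why a u-law file cannot simply be relabelled as A-law: the bytes mean different things.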