Generates natural-sounding speech
Speech synthesis parameters
The text input to convert to speech. The tags [pause short], [pause] and [long pause] can be added to the text to control pauses in the speech. These tags are only supported in the vivid model.
"Hello, how are you? [pause] I'm fine, thank you."
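As a minimal sketch of how text with pause tags might be assembled, the snippet below joins sentences with one of the tags listed above. The helper name `join_with_pause` is hypothetical and not part of the API; only the tag spellings come from the parameter description.

```python
# Tag spellings copied from the parameter description above.
PAUSE_TAGS = ("[pause short]", "[pause]", "[long pause]")


def join_with_pause(sentences, tag="[pause]"):
    """Join sentences with a pause tag between them (hypothetical helper)."""
    if tag not in PAUSE_TAGS:
        raise ValueError(f"unknown pause tag: {tag!r}")
    return f" {tag} ".join(s.strip() for s in sentences)


text = join_with_pause(["Hello, how are you?", "I'm fine, thank you."])
print(text)
```

Because the tags are only supported in the vivid model, a caller would presumably pair text built this way with that model.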
The voice id to use in speech synthesis.
"en-US-Phoenix-F-HD"
The model id to use in speech synthesis.
Allowed values: vivid, flow
The custom prompt to use in speech synthesis. Only supported in vivid model.
Speaking pitch, in the range [-20.0, 20.0]. A value of 20 raises the pitch by 20 semitones from the original; -20 lowers it by 20 semitones. Pitch is not fully supported in the vivid model, and results may be mixed.
Range: -20 <= x <= 20
Speaking rate, in the range [0.25, 4.0]. 1.0 is the normal native speed supported by the specific voice; 2.0 is twice as fast and 0.5 is half as fast. Values below 0.25 or above 4.0 return an error. Speaking rate is not fully supported in the vivid model, and results may be mixed.
Range: 0.25 <= x <= 4
The format of the audio byte stream. LINEAR16 (also known as WAV) gives the best audio quality.
Allowed values: LINEAR16, MP3, OGG_OPUS, MULAW
Successful response
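As a recap of the request parameters documented above, the sketch below shows one way a client might check values locally before sending a request. The function `build_payload`, every field name in the returned dictionary, and the default pitch of 0.0 are assumptions made for illustration; this page does not state the actual field names, so check them against the API schema. Only the allowed values and numeric ranges are taken from the descriptions above.

```python
import json

# Allowed values and ranges as listed in the parameter descriptions above.
MODELS = {"vivid", "flow"}
AUDIO_FORMATS = {"LINEAR16", "MP3", "OGG_OPUS", "MULAW"}


def build_payload(text, voice_id, model, pitch=0.0, speaking_rate=1.0,
                  audio_format="LINEAR16"):
    """Validate parameters and return a request body (hypothetical field names)."""
    if model not in MODELS:
        raise ValueError(f"model must be one of {sorted(MODELS)}")
    if not -20.0 <= pitch <= 20.0:
        raise ValueError("pitch must be in [-20.0, 20.0] semitones")
    if not 0.25 <= speaking_rate <= 4.0:
        raise ValueError("speaking rate must be in [0.25, 4.0]")
    if audio_format not in AUDIO_FORMATS:
        raise ValueError(f"audio format must be one of {sorted(AUDIO_FORMATS)}")
    # The keys below are placeholders, not confirmed API field names.
    return {
        "text": text,
        "voice_id": voice_id,
        "model": model,
        "pitch": pitch,
        "speaking_rate": speaking_rate,
        "audio_format": audio_format,
    }


payload = build_payload("Hello, how are you?", "en-US-Phoenix-F-HD", "flow",
                        speaking_rate=1.25)
print(json.dumps(payload, indent=2))
```

Validating on the client only saves a round trip; the service remains the authority and, per the description above, rejects speaking rates outside [0.25, 4.0] itself.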