Generates natural-sounding speech
Speech synthesis parameters
The text input to be converted to speech
The voice ID to use for speech synthesis
Speaking pitch in semitones, in the range [-20.0, 20.0]. 20 raises the pitch 20 semitones above the original pitch. -20 lowers it 20 semitones below the original pitch.
-20 <= x <= 20
Speaking rate (speed), in the range [0.25, 4.0]. 1.0 is the normal native speed supported by the specific voice. 2.0 is twice as fast, and 0.5 is half as fast. Values below 0.25 or above 4.0 return an error.
0.25 <= x <= 4
The format of the audio byte stream. LINEAR16 (a.k.a. WAV) gives the best audio quality.
LINEAR16, MP3, OGG_OPUS, MULAW
Successful response
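The documented ranges for the request parameters can be checked on the client before a request is sent. The following is a minimal sketch of such a check, not part of the API itself. The class name `SynthesisParams`, its field names, and the `validate` helper are all hypothetical, since this reference does not name the request fields. Only the numeric ranges and the list of audio formats come from the parameter descriptions.

```python
from dataclasses import dataclass
from typing import List

# Audio formats listed in the parameter reference.
AUDIO_FORMATS = ("LINEAR16", "MP3", "OGG_OPUS", "MULAW")


@dataclass
class SynthesisParams:
    # All field names here are hypothetical; the reference does not name them.
    text: str             # the text input to be converted to speech
    voice_id: str         # the voice to use
    pitch: float          # semitones, documented range [-20.0, 20.0]
    speaking_rate: float  # documented range [0.25, 4.0]
    audio_format: str     # one of AUDIO_FORMATS


def validate(params: SynthesisParams) -> List[str]:
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    if not -20.0 <= params.pitch <= 20.0:
        problems.append(f"pitch {params.pitch} is outside [-20.0, 20.0]")
    if not 0.25 <= params.speaking_rate <= 4.0:
        problems.append(f"speaking rate {params.speaking_rate} is outside [0.25, 4.0]")
    if params.audio_format not in AUDIO_FORMATS:
        problems.append(f"audio format {params.audio_format!r} is not one of {AUDIO_FORMATS}")
    return problems
```

This check only mirrors the documented limits and does not contact the service. The service remains the authority on what it accepts.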