POST /v1/generate_speech
cURL
curl --request POST \
  --url https://api.contextlm.ai/v1/generate_speech \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: <api-key>' \
  --data @- <<'EOF'
{
  "text": "Hello, how are you? [pause] I'm fine, thank you.",
  "voice_id": "en-US-Phoenix-F-HD",
  "model_id": "vivid",
  "custom_prompt": "<string>",
  "pitch": 0,
  "speaking_rate": 1,
  "output_format": "LINEAR16"
}
EOF
{
  "audiobytes": "<string>"
}
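
For readers who prefer a script to the cURL call, the same request can be sketched in Python using only the standard library. The function names `build_request` and `generate_speech` are illustrative and not part of the API; the URL, headers, and body fields are copied from the cURL example above, and the optional `custom_prompt` field is left out.

```python
import json
import urllib.request

# Endpoint taken from the cURL example.
API_URL = "https://api.contextlm.ai/v1/generate_speech"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def generate_speech(api_key: str, payload: dict) -> dict:
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(build_request(api_key, payload)) as resp:
        return json.load(resp)


# Same body as the cURL example, minus the optional custom_prompt.
payload = {
    "text": "Hello, how are you? [pause] I'm fine, thank you.",
    "voice_id": "en-US-Phoenix-F-HD",
    "model_id": "vivid",
    "pitch": 0,
    "speaking_rate": 1,
    "output_format": "LINEAR16",
}
request = build_request("<api-key>", payload)
```

Calling `generate_speech("<api-key>", payload)` with a real key sends the request; building the request separately makes it easy to inspect before anything goes over the network.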

Authorizations

X-API-Key
string
header
required
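
One way to supply the required `X-API-Key` header without hard-coding the key is to read it from the environment. This is only a sketch: the helper `auth_headers` and the variable name `CONTEXTLM_API_KEY` are assumptions made for illustration, not names defined by the API.

```python
import os


def auth_headers(env_var: str = "CONTEXTLM_API_KEY") -> dict:
    """Return request headers carrying the API key read from the environment.

    The variable name CONTEXTLM_API_KEY is hypothetical, not defined by the API.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Content-Type": "application/json", "X-API-Key": api_key}
```

Failing early when the variable is missing avoids sending an unauthenticated request.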

Body

application/json

Speech synthesis parameters

text
string
required

The text to convert to speech. The tags [pause short], [pause], and [long pause] can be inserted into the text to control pauses in the speech. Pause tags are supported only by the vivid model.
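
Because pause tags are limited to the vivid model, a client may want to strip them before sending text to another model. The sketch below assumes stripping is the desired fallback; this page does not say how other models treat the tags, and `prepare_text` is a hypothetical helper.

```python
import re

# The three tags named in the description of `text`, plus any whitespace before them.
PAUSE_TAG = re.compile(r"\s*\[(?:pause short|pause|long pause)\]")


def prepare_text(text: str, model_id: str = "vivid") -> str:
    """Keep pause tags for vivid; strip them for any other model (an assumption)."""
    if model_id == "vivid":
        return text
    return PAUSE_TAG.sub("", text).strip()
```

With the example text from this page, `prepare_text(..., "flow")` yields "Hello, how are you? I'm fine, thank you.", while the vivid call returns the text unchanged.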

Example:

"Hello, how are you? [pause] I'm fine, thank you."

voice_id
string
required

The ID of the voice to use for speech synthesis.

Example:

"en-US-Phoenix-F-HD"

model_id
enum<string>
default:vivid

The ID of the model to use for speech synthesis.

Available options: vivid, flow

custom_prompt
string

A custom prompt to use for speech synthesis. Supported only by the vivid model.
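
Since `custom_prompt` applies only to the vivid model, a request builder can leave it out for other models. The following is a minimal sketch under that reading; `build_payload` is a hypothetical helper, and whether the API rejects or ignores the field for flow is not stated here.

```python
MODEL_IDS = ("vivid", "flow")           # options listed for model_id
VIVID_ONLY_FIELDS = ("custom_prompt",)  # documented as vivid-only


def build_payload(text: str, voice_id: str, model_id: str = "vivid", **options) -> dict:
    """Assemble a request body, dropping unset and vivid-only fields as needed."""
    if model_id not in MODEL_IDS:
        raise ValueError(f"model_id must be one of {MODEL_IDS}, got {model_id!r}")
    payload = {"text": text, "voice_id": voice_id, "model_id": model_id}
    for name, value in options.items():
        if value is None:
            continue  # unset option: let the API apply its default
        if model_id != "vivid" and name in VIVID_ONLY_FIELDS:
            continue  # client-side choice: omit fields the model does not support
        payload[name] = value
    return payload
```

For example, `build_payload("Hi", "en-US-Phoenix-F-HD", "flow", custom_prompt="cheerful")` produces a body without `custom_prompt`.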

pitch
number
default:0

Speaking pitch in semitones, in the range [-20.0, 20.0]. A value of 20 raises the pitch 20 semitones above the voice's original pitch; -20 lowers it by 20 semitones. The vivid model does not fully support pitch, so results may vary.
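
To get a feel for what a semitone offset means, the sketch below checks the documented range client-side and converts semitones to a frequency ratio using the standard equal-temperament relation (12 semitones doubles the frequency). Both helper names are illustrative, and the conversion describes the musical interval only; it is not a statement about how the service shifts pitch internally.

```python
def validate_pitch(pitch: float) -> float:
    """Return pitch as a float, or raise if it is outside [-20.0, 20.0]."""
    if not -20.0 <= pitch <= 20.0:
        raise ValueError(f"pitch must be within [-20.0, 20.0], got {pitch}")
    return float(pitch)


def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)
```

Under this relation, a pitch of 12 corresponds to a ratio of 2.0 (one octave up) and -12 to 0.5; the maximum of 20 is a ratio of roughly 3.17.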

Required range: -20 <= x <= 20

speaking_rate
number
default:1

Speaking rate, in the range [0.25, 4.0]. 1.0 is the normal native speed of the selected voice; 2.0 is twice as fast and 0.5 is half as fast. Values below 0.25 or above 4.0 return an error. The vivid model does not fully support speaking rate, so results may vary.
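
Since out-of-range values return an error, it can help to check the rate before sending the request. The sketch below also estimates playback length from the "2.0 is twice as fast" rule; both helpers are hypothetical, and the estimate assumes the rate scales duration exactly, which may not hold for the vivid model.

```python
def validate_speaking_rate(rate: float) -> float:
    """Return rate as a float, or raise if it is outside [0.25, 4.0]."""
    if not 0.25 <= rate <= 4.0:
        raise ValueError(f"speaking_rate must be within [0.25, 4.0], got {rate}")
    return float(rate)


def estimated_duration(native_seconds: float, rate: float) -> float:
    """Estimate audio length at `rate`, given its length at the native rate 1.0."""
    return native_seconds / validate_speaking_rate(rate)
```

A clip that lasts 10 seconds at rate 1.0 would last about 5 seconds at 2.0 and about 20 seconds at 0.5.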

Required range: 0.25 <= x <= 4

output_format
enum<string>
default:LINEAR16

The format of the audio byte stream. LINEAR16 (a.k.a. WAV) gives the best audio quality.

Available options: LINEAR16, MP3, OGG_OPUS, MULAW
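
When saving the audio, the chosen `output_format` can drive the file name. The sketch below validates the value against the listed options and picks an extension. The extensions are conventional choices assumed for illustration (only the LINEAR16/WAV pairing comes from this page), and `output_filename` is a hypothetical helper.

```python
# Listed output_format options mapped to assumed file extensions.
OUTPUT_FORMATS = {
    "LINEAR16": ".wav",  # described on this page as "a.k.a. WAV"
    "MP3": ".mp3",       # assumption: conventional extension
    "OGG_OPUS": ".ogg",  # assumption: conventional extension
    "MULAW": ".ulaw",    # assumption: the container is not documented here
}
DEFAULT_OUTPUT_FORMAT = "LINEAR16"


def output_filename(stem: str, output_format: str = DEFAULT_OUTPUT_FORMAT) -> str:
    """Return stem plus the extension assumed for output_format."""
    try:
        extension = OUTPUT_FORMATS[output_format]
    except KeyError:
        options = ", ".join(OUTPUT_FORMATS)
        raise ValueError(f"output_format must be one of: {options}") from None
    return stem + extension
```

For instance, `output_filename("greeting", "MP3")` gives `greeting.mp3`, and an unlisted format raises `ValueError` before any request is made.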

Response

Successful response

audiobytes
file
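
The response example shows `audiobytes` as a JSON string. The sketch below assumes that string is base64-encoded audio, which is a common convention for binary data in JSON but is not confirmed on this page; `extract_audio` is a hypothetical helper.

```python
import base64
import json


def extract_audio(response_body: str) -> bytes:
    """Decode the `audiobytes` field, ASSUMING it holds base64 text."""
    encoded = json.loads(response_body)["audiobytes"]
    return base64.b64decode(encoded, validate=True)


# Stand-in response body built locally for illustration (not real API output).
sample_body = json.dumps({"audiobytes": base64.b64encode(b"RIFF").decode("ascii")})
audio = extract_audio(sample_body)
```

The decoded bytes can then be written to disk, for example with `pathlib.Path("speech.wav").write_bytes(audio)` when `output_format` is LINEAR16.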