Audio Generation Models - Parameters Reference
This page is auto-generated from model configurations. Last updated: 2026-03-13.
This reference lists all available audio generation models and their parameters. Use these parameter names when calling the Generation API.
Models
Academia
Beatoven
ElevenLabs
Google
Meta
MiniMax
Academia
Lux TTS
High-quality voice cloning TTS at 48kHz from text and a reference audio clip.
Model ID: model_lux-tts
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_lux-tts/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
prompt | string | Yes | - | - | - | - | Text to convert to speech. |
audio | file | Yes | - | - | - | - | Reference audio for voice cloning. |
guidanceScale | number | No | 3 | 0 | 10 | - | Higher values increase adherence to the reference voice. |
numInferenceSteps | number | No | 4 | 1 | 16 | - | Number of flow-matching inference steps. |
maxRefLength | number | No | 5 | 1 | 15 | - | Maximum reference audio duration used for voice encoding (seconds). |
seed | number | No | - | 0 | 2147483647 | - | Seed for reproducible outputs. |
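
The Default, Min, and Max columns can be checked on the client before a request is sent. The following is a minimal sketch using the Lux TTS limits listed above. `LUX_TTS_LIMITS` and `build_lux_params` are hypothetical helper names, not part of the Generation API, and the reference clip is represented by a placeholder string because this page does not specify how files are passed.

```python
# Numeric limits copied from the Lux TTS table: name -> (default, min, max).
LUX_TTS_LIMITS = {
    "guidanceScale": (3, 0, 10),
    "numInferenceSteps": (4, 1, 16),
    "maxRefLength": (5, 1, 15),
    "seed": (None, 0, 2147483647),
}


def build_lux_params(prompt, audio, **overrides):
    """Return a parameter dict, rejecting unknown names and out-of-range values."""
    if not prompt:
        raise ValueError("prompt is required")
    if not audio:
        raise ValueError("audio is required")
    params = {"prompt": prompt, "audio": audio}
    for name, value in overrides.items():
        if name not in LUX_TTS_LIMITS:
            raise ValueError(f"unknown parameter: {name}")
        _default, low, high = LUX_TTS_LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        params[name] = value
    # Fill in documented defaults for anything the caller left out.
    for name, (default, _low, _high) in LUX_TTS_LIMITS.items():
        if name not in params and default is not None:
            params[name] = default
    return params
```

The same pattern should carry over to the other models on this page by swapping in their own limits table.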
MM Audio 2 Text To Audio
MMAudio generates synchronized audio given text inputs. It can generate sounds described by a prompt.
Model ID: model_mm-audio-2-t2a
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_mm-audio-2-t2a/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
prompt | string | Yes | - | - | - | - | Text prompt for generated audio |
negativePrompt | string | No | - | - | - | - | Negative prompt to avoid certain sounds |
duration | number | No | 8 | 1 | 30 | - | Output duration in seconds. |
numSteps | number | No | 25 | 4 | 50 | - | Number of generation steps. |
cfgStrength | number | No | 4.5 | 1 | 20 | - | Higher values will keep output closer to the prompt |
maskAwayClip | boolean | No | false | - | - | - | Mask away certain sounds in the audio |
seed | number | No | - | 0 | 65535 | - | Random seed for reproducible generation |
Tada 1B Text to Speech
Lighter Tada voice cloning text-to-speech variant with multilingual support.
Model ID: model_tada-1b-text-to-speech
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_tada-1b-text-to-speech/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
audio | file | Yes | - | - | - | - | Reference audio for voice cloning. |
prompt | string | Yes | - | - | - | - | Text to synthesize with the reference voice. |
transcript | string | No | - | - | - | - | Transcript of the reference audio. Required for non-English references. |
language | string | No | en | - | - | en, ar, ch, de, es, fr, it, ja, pl, pt | Language used for text alignment. |
numInferenceSteps | number | No | 20 | 1 | 50 | - | Number of ODE solver steps for acoustic generation. |
speedUpFactor | number | No | 1 | 0.5 | 2 | - | Values > 1 speed up and values < 1 slow down speech. |
temperature | number | No | 0.6 | 0 | 2 | - | Sampling temperature for text token generation. |
topP | number | No | 0.9 | 0 | 1 | - | Top-p nucleus sampling value. |
repetitionPenalty | number | No | 1.1 | 1 | 2 | - | Penalty applied to repeated tokens. |
acousticCfgScale | number | No | 1.6 | 0 | 10 | - | Classifier-free guidance scale for acoustic generation. |
noiseTemperature | number | No | 0.9 | 0 | 2 | - | Temperature for diffusion noise during flow matching. |
numExtraSteps | number | No | 0 | 0 | 50 | - | Additional autoregressive steps for continuation. |
Tada 3B Text to Speech
Voice cloning text-to-speech with multilingual alignment and expressive controls.
Model ID: model_tada-3b-text-to-speech
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_tada-3b-text-to-speech/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
audio | file | Yes | - | - | - | - | Reference audio for voice cloning. |
prompt | string | Yes | - | - | - | - | Text to synthesize with the reference voice. |
transcript | string | No | - | - | - | - | Transcript of the reference audio. Required for non-English references. |
language | string | No | en | - | - | en, ar, ch, de, es, fr, it, ja, pl, pt | Language used for text alignment. |
numInferenceSteps | number | No | 20 | 1 | 50 | - | Number of ODE solver steps for acoustic generation. |
speedUpFactor | number | No | 1 | 0.5 | 2 | - | Values > 1 speed up and values < 1 slow down speech. |
temperature | number | No | 0.6 | 0 | 2 | - | Sampling temperature for text token generation. |
topP | number | No | 0.9 | 0 | 1 | - | Top-p nucleus sampling value. |
repetitionPenalty | number | No | 1.1 | 1 | 2 | - | Penalty applied to repeated tokens. |
acousticCfgScale | number | No | 1.6 | 0 | 10 | - | Classifier-free guidance scale for acoustic generation. |
noiseTemperature | number | No | 0.9 | 0 | 2 | - | Temperature for diffusion noise during flow matching. |
numExtraSteps | number | No | 0 | 0 | 50 | - | Additional autoregressive steps for continuation. |
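
Both Tada models state that `transcript` is required for non-English references. A minimal sketch of a client-side guard follows. It assumes that a `language` value other than `en` implies a non-English reference clip, which this page does not state explicitly. `build_tada_params` is a hypothetical helper name.

```python
# Allowed values copied from the Tada parameter tables.
TADA_LANGUAGES = {"en", "ar", "ch", "de", "es", "fr", "it", "ja", "pl", "pt"}


def build_tada_params(audio, prompt, language="en", transcript=None):
    """Build a Tada parameter dict and enforce the transcript rule."""
    if language not in TADA_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    # Assumption: a non-"en" language value means the reference is non-English.
    if language != "en" and not transcript:
        raise ValueError("transcript is required for non-English references")
    params = {"audio": audio, "prompt": prompt, "language": language}
    if transcript:
        params["transcript"] = transcript
    return params
```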
Beatoven
Beatoven Music Generation
Generate royalty-free instrumental music from electronic, hip hop, and indie rock to cinematic and classical genres. Perfect for games, films, social content, podcasts, and more.
Model ID: model_beatoven-music-generation
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_beatoven-music-generation/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
prompt | string | Yes | - | - | - | - | Describe the music you want to generate |
negativePrompt | string | No | - | - | - | - | Describe what you want to avoid in the music (instruments, styles, moods). |
duration | number | No | 90 | 5 | 150 | - | Length of the generated music in seconds |
refinement | number | No | 100 | 10 | 200 | - | Refinement level - Higher values may improve quality but take longer |
creativity | number | No | 16 | 1 | 20 | - | Creativity level - higher values allow more creative interpretation of the prompt |
seed | number | No | - | 0 | 2147483647 | - | Use a seed for reproducible results. Leave blank to use a random seed. |
Beatoven Sound Effect
Create professional-grade sound effects from animal and vehicle to nature, sci-fi, and otherworldly sounds. Perfect for films, games, and digital content.
Model ID: model_beatoven-sound-effect
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_beatoven-sound-effect/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
prompt | string | Yes | - | - | - | - | Describe the sound effect you want to generate |
negativePrompt | string | No | - | - | - | - | Describe the types of sounds you don't want to generate in the output |
duration | number | No | 7 | 1 | 35 | - | Length of the generated sound effect in seconds |
refinement | number | No | 40 | 10 | 200 | - | Refinement level - Higher values may improve quality but take longer |
creativity | number | No | 16 | 1 | 20 | - | Creativity level - higher values allow more creative interpretation of the prompt |
seed | number | No | - | 0 | 2147483647 | - | Use a seed for reproducible results. Leave blank to use a random seed. |
ElevenLabs
ElevenLabs Multilingual v2
Life-like, emotionally rich text-to-speech model supporting 29 languages.
Model ID: model_elevenlabs-multilingual-v2
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_elevenlabs-multilingual-v2/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
text | string | Yes | - | - | - | - | The text to convert to speech |
voice | string | No | Aria | - | - | Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill | The voice to use for speech generation |
stability | number | No | 0.5 | 0 | 1 | - | Voice stability. The upstream fal API currently returns an error if the value differs from 0.5. |
similarityBoost | number | No | 0.5 | 0 | 1 | - | Similarity boost |
styleExaggeration | number | No | 0 | 0 | 1 | - | Style exaggeration |
speed | number | No | 1 | 0.7 | 1.2 | - | Speech speed (0.7-1.2). Values below 1.0 slow down the speech, above 1.0 speed it up. Extreme values may affect quality. |
previousText | string | No | - | - | - | - | The text that came before the text of the current request. Can be used to improve the speech's continuity when concatenating together multiple generations or to influence the speech's continuity in the current generation. |
nextText | string | No | - | - | - | - | The text that comes after the text of the current request. Can be used to improve the speech's continuity when concatenating together multiple generations or to influence the speech's continuity in the current generation. |
languageCode | string | No | - | - | - | (empty string), en, ca, es, fr, de, it, ja, ko, zh, ru, ar, hi, bn, pa, ta, te, mr, ur, fa, tr, nl, sv, da, no, fi, el, ro, hu, cs, sk, sl, pt, id, th, vi, ms, tl, yo, ig, ha, am, az, be, bg, hr | Language code (ISO 639-1) used to enforce a language for the model. Currently only Turbo v2.5 and Flash v2.5 support language enforcement. |
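
When a long script is split across several requests, `previousText` and `nextText` can carry the neighbouring chunks so that each generation has context. The sketch below shows one way to build such a sequence of parameter dicts. `chain_requests` is a hypothetical helper, and how the script is split into chunks is left to the caller.

```python
def chain_requests(chunks, voice="Aria"):
    """Build one parameter dict per chunk, linking neighbours for continuity."""
    requests = []
    for i, chunk in enumerate(chunks):
        params = {"text": chunk, "voice": voice}
        if i > 0:
            # Text that came before the current request.
            params["previousText"] = chunks[i - 1]
        if i < len(chunks) - 1:
            # Text that comes after the current request.
            params["nextText"] = chunks[i + 1]
        requests.append(params)
    return requests
```

The first chunk has no `previousText` and the last has no `nextText`, so a single-chunk script produces a plain request with neither field.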
ElevenLabs Sound Effects v2
Professional sound effects generation for audio production and content creation.
Model ID: model_elevenlabs-sound-effects-v2
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_elevenlabs-sound-effects-v2/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
text | string | Yes | - | - | - | - | A textual description of the sound effect to generate. |
durationSeconds | number | No | 5 | 0.5 | 22 | - | Duration in seconds (0.5-22). If None, optimal duration will be determined from prompt. |
promptInfluence | number | No | 0.3 | 0 | 1 | - | How closely to follow the sound description. Higher values mean less variation. |
loop | boolean | No | false | - | - | - | Whether to loop the sound effect. |
outputFormat | string | No | mp3_44100_128 | - | - | mp3_22050_32, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128, mp3_44100_192 | Output format of the generated audio. Formatted as codec_sample_rate_bitrate. |
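
Because `outputFormat` values follow the `codec_sample_rate_bitrate` pattern, they can be split into their parts when a client needs the sample rate or bitrate separately. A minimal sketch, with `parse_output_format` as a hypothetical helper name; the bitrate is returned as the bare number from the string, since this page does not state its unit.

```python
# Allowed values copied from the ElevenLabs Sound Effects v2 table.
ALLOWED_OUTPUT_FORMATS = (
    "mp3_22050_32",
    "mp3_44100_32",
    "mp3_44100_64",
    "mp3_44100_96",
    "mp3_44100_128",
    "mp3_44100_192",
)


def parse_output_format(value):
    """Split an outputFormat string into codec, sample rate, and bitrate."""
    if value not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported outputFormat: {value}")
    codec, sample_rate, bitrate = value.split("_")
    return {
        "codec": codec,
        "sample_rate": int(sample_rate),
        "bitrate": int(bitrate),
    }
```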
ElevenLabs Turbo v2.5
High-quality, low-latency text-to-speech model in multiple languages.
Model ID: model_elevenlabs-turbo-v2-5
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_elevenlabs-turbo-v2-5/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
text | string | Yes | - | - | - | - | The text to convert to speech |
voice | string | No | Aria | - | - | Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill | The voice to use for speech generation |
stability | number | No | 0.5 | 0 | 1 | - | Voice stability |
similarityBoost | number | No | 0.5 | 0 | 1 | - | Similarity boost |
styleExaggeration | number | No | 0 | 0 | 1 | - | Style exaggeration |
speed | number | No | 1 | 0.7 | 1.2 | - | Speech speed (0.7-1.2). Values below 1.0 slow down the speech, above 1.0 speed it up. Extreme values may affect quality. |
previousText | string | No | - | - | - | - | The text that came before the text of the current request. Can be used to improve the speech's continuity when concatenating together multiple generations or to influence the speech's continuity in the current generation. |
nextText | string | No | - | - | - | - | The text that comes after the text of the current request. Can be used to improve the speech's continuity when concatenating together multiple generations or to influence the speech's continuity in the current generation. |
languageCode | string | No | - | - | - | (empty string), en, ca, es, fr, de, it, ja, ko, zh, ru, ar, hi, bn, pa, ta, te, mr, ur, fa, tr, nl, sv, da, no, fi, el, ro, hu, cs, sk, sl, pt, id, th, vi, ms, tl, yo, ig, ha, am, az, be, bg, hr | Language code (ISO 639-1) used to enforce a language for the model. Currently only Turbo v2.5 and Flash v2.5 support language enforcement. |
ElevenLabs v3
Next-generation text-to-speech model with advanced voice synthesis and enhanced naturalness.
Model ID: model_elevenlabs-tts-v3
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_elevenlabs-tts-v3/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
text | string | Yes | - | - | - | - | The text to convert to speech |
voice | string | No | Aria | - | - | Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill | The voice to use for speech generation |
similarityBoost | number | No | 0.75 | 0 | 1 | - | Similarity boost |
styleExaggeration | number | No | 0 | 0 | 1 | - | Style exaggeration |
speed | number | No | 1 | 0.7 | 1.2 | - | Speech speed (0.7-1.2). Values below 1.0 slow down the speech, above 1.0 speed it up. Extreme values may affect quality. |
languageCode | string | No | - | - | - | (empty string), en, ca, es, fr, de, it, ja, ko, zh, ru, ar, hi, bn, pa, ta, te, mr, ur, fa, tr, nl, sv, da, no, fi, el, ro, hu, cs, sk, sl, pt, id, th, vi, ms, tl, yo, ig, ha, am, az, be, bg, hr | Language code (ISO 639-1) used to enforce a language for the model. Currently only Turbo v2.5 and Flash v2.5 support language enforcement. |
Google
Gemini 2.5 Flash TTS
Convert text to natural-sounding speech using Google's Gemini 2.5 Flash model with multiple voice presets.
Model ID: model_google-gemini-2-5-flash-tts
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_google-gemini-2-5-flash-tts/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
text | string | Yes | - | - | - | - | Text to convert to speech |
voice | string | No | Puck | - | - | Achernar, Achird, Algenib, Algieba, Alnilam, Aoede, Autonoe, Callirrhoe, Charon, Despina, Enceladus, Erinome, Fenrir, Gacrux, Iapetus, Kore, Laomedeia, Leda, Orus, Pulcherrima, Puck, Rasalgethi, Sadachbia, Sadaltager, Schedar, Sulafat, Umbriel, Vindemiatrix, Zephyr, Zubenelgenubi | Voice preset to use for speech synthesis |
language | string | No | en-US | - | - | ar-EG, bn-BD, de-DE, en-IN, en-US, es-US, fr-FR, hi-IN, id-ID, it-IT, ja-JP, ko-KR, mr-IN, nl-NL, pl-PL, pt-BR, ro-RO, ru-RU, ta-IN, te-IN, th-TH, tr-TR, uk-UA, vi-VN | Language for speech synthesis (auto-detected if not specified) |
Gemini 2.5 Pro TTS
Convert text to natural-sounding speech using Google's Gemini 2.5 Pro model with multiple voice presets.
Model ID: model_google-gemini-2-5-pro-tts
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_google-gemini-2-5-pro-tts/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
text | string | Yes | - | - | - | - | Text to convert to speech |
voice | string | No | Puck | - | - | Achernar, Achird, Algenib, Algieba, Alnilam, Aoede, Autonoe, Callirrhoe, Charon, Despina, Enceladus, Erinome, Fenrir, Gacrux, Iapetus, Kore, Laomedeia, Leda, Orus, Pulcherrima, Puck, Rasalgethi, Sadachbia, Sadaltager, Schedar, Sulafat, Umbriel, Vindemiatrix, Zephyr, Zubenelgenubi | Voice preset to use for speech synthesis |
language | string | No | en-US | - | - | ar-EG, bn-BD, de-DE, en-IN, en-US, es-US, fr-FR, hi-IN, id-ID, it-IT, ja-JP, ko-KR, mr-IN, nl-NL, pl-PL, pt-BR, ro-RO, ru-RU, ta-IN, te-IN, th-TH, tr-TR, uk-UA, vi-VN | Language for speech synthesis (auto-detected if not specified) |
Google Lyria 2
Model ID: model_lyria-2
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_lyria-2/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
prompt | string | Yes | - | - | - | - | Text prompt for audio generation |
negativePrompt | string | No | - | - | - | - | Description of what to exclude from the generated audio |
seed | number | No | - | - | - | - | Use a seed for reproducible results. Leave blank to use a random seed. |
Meta
Meta MusicGen
Model ID: model_meta-musicgen
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_meta-musicgen/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
modelVersion | string | No | stereo-melody-large | - | - | stereo-melody-large, stereo-large, melody-large, large | Model to use for generation |
prompt | string | No | - | - | - | - | A description of the music you want to generate. |
inputAudio | file | No | - | - | - | - | An audio file that will influence the generated music. If continuation is True, the generated music will be a continuation of the audio file. Otherwise, the generated music will mimic the audio file's melody. |
duration | number | No | 8 | 1 | 30 | - | Duration of the generated audio in seconds. |
continuation | boolean | No | false | - | - | - | If True, generated music will continue from Input Audio. Otherwise, generated music will mimic Input Audio's melody. |
continuationStart | number | No | 0 | 0 | - | - | Start time of the audio file to use for continuation. |
continuationEnd | number | No | - | 0 | - | - | End time of the audio file to use for continuation. If None, will default to the end of the audio clip. |
multiBandDiffusion | boolean | No | false | - | - | - | If True, the EnCodec tokens will be decoded with MultiBand Diffusion. Only works with non-stereo models. |
normalizationStrategy | string | No | loudness | - | - | loudness, clip, peak, rms | Strategy for normalizing audio. |
temperature | number | No | 1 | - | - | - | Controls the 'conservativeness' of the sampling process. Higher temperature means more diversity. |
classifierFreeGuidance | number | No | 3 | 0 | 10 | - | Increases the influence of inputs on the output. Higher values produce lower-variance outputs that adhere more closely to inputs. |
seed | number | No | - | - | - | - | Seed for random number generator. If None or -1, a random seed will be used. |
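
Several MusicGen parameters interact with each other. The sketch below collects those interactions into a simple pre-flight check. `check_musicgen_params` is a hypothetical helper. The rule that `continuation` needs `inputAudio` and the rule that `continuationEnd` should not precede `continuationStart` are inferences from the descriptions above, not documented validation behaviour.

```python
def check_musicgen_params(params):
    """Return a list of problems found in a MusicGen parameter dict."""
    problems = []
    # Fall back to the documented default model version.
    version = params.get("modelVersion", "stereo-melody-large")
    if params.get("multiBandDiffusion", False) and version.startswith("stereo-"):
        problems.append("multiBandDiffusion only works with non-stereo models")
    # Inferred rule: continuing from Input Audio requires an input file.
    if params.get("continuation", False) and "inputAudio" not in params:
        problems.append("continuation needs inputAudio to continue from")
    start = params.get("continuationStart", 0)
    end = params.get("continuationEnd")
    # Inferred rule: the end time should not come before the start time.
    if end is not None and end < start:
        problems.append("continuationEnd is earlier than continuationStart")
    return problems
```

Since the default `modelVersion` is a stereo model, setting `multiBandDiffusion` to true without also choosing `melody-large` or `large` is flagged by this check.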
MiniMax
Minimax Music 2.0
Model ID: model_minimax-music-2-0
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-2-0/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
prompt | string | Yes | - | - | - | - | A description of the music, specifying style, mood, and scenario. |
lyrics | string | Yes | - | - | - | - | Lyrics of the song. Use \n to separate lines. You may add structure tags like [Intro], [Verse], [Chorus], [Bridge], [Outro] to enhance the arrangement. |
sampleRate | number | No | 44100 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated music |
bitrate | number | No | 256000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated music |
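
The `lyrics` string combines structure tags with individual lines. A minimal sketch of assembling it from structured data follows, assuming lines are separated by newline characters and each tag sits on its own line. `format_lyrics` is a hypothetical helper name.

```python
def format_lyrics(sections):
    """Join (tag, lines) pairs into one lyrics string.

    Example input: [("Verse", ["line one", "line two"]), ("Chorus", ["hook"])]
    """
    parts = []
    for tag, lines in sections:
        parts.append(f"[{tag}]")  # structure tag such as [Verse] or [Chorus]
        parts.extend(lines)
    return "\n".join(parts)
```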
Minimax Speech 2.6 HD
MiniMax Speech 2.6 HD delivers studio-quality multilingual text-to-audio with nuanced prosody, subtitle export, and premium voices.
Model ID: model_minimax-speech-2-6-hd
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-6-hd/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
text | string | Yes | - | - | - | - | Text to convert to speech. Use <#x#> between words to control pause duration (0.01-99.99s). |
voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
speed | number | No | 1 | 0.5 | 2 | - | Speech speed |
volume | number | No | 1 | 0 | 10 | - | Speech volume |
pitch | number | No | 0 | -12 | 12 | - | Speech pitch |
emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion |
sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech |
bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech |
channel | string | No | stereo | - | - | mono, stereo | Number of audio channels |
languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, Chinese,Yue, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects |
englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency) |
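
The `<#x#>` pause marker in `text` can be generated rather than typed by hand. The sketch below assumes `x` is a duration in seconds written as a decimal number with two decimal places. `pause` and `join_with_pauses` are hypothetical helper names.

```python
def pause(seconds):
    """Return a pause marker for a duration within the documented range."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause duration must be between 0.01 and 99.99 seconds")
    # Assumption: two decimal places are accepted in the marker.
    return f"<#{seconds:.2f}#>"


def join_with_pauses(phrases, seconds):
    """Join phrases with the same pause marker between each pair."""
    return f" {pause(seconds)} ".join(phrases)
```

For example, `join_with_pauses(["Hello", "world"], 0.5)` yields `Hello <#0.50#> world`, which can then be passed as the `text` parameter.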
Minimax Speech 2.6 Turbo
Low-latency MiniMax Speech 2.6 Turbo brings multilingual, emotional text-to-speech.
Model ID: model_minimax-speech-2-6-turbo
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-6-turbo/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
text | string | Yes | - | - | - | - | Text to convert to speech. Use <#x#> between words to control pause duration (0.01-99.99s). |
voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
speed | number | No | 1 | 0.5 | 2 | - | Speech speed |
volume | number | No | 1 | 0 | 10 | - | Speech volume |
pitch | number | No | 0 | -12 | 12 | - | Speech pitch |
emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion |
sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech |
bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech |
channel | string | No | stereo | - | - | mono, stereo | Number of audio channels |
languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, Chinese,Yue, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects |
englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency) |