Audio Generation Models - Parameters Reference

This page is auto-generated from model configurations. Last updated: 2026-03-13.

This reference lists all available audio generation models and their parameters. Use these parameter names when calling the Generation API.

Models

Academia

Beatoven

ElevenLabs

Google

Meta

MiniMax


Academia

Lux TTS

High-quality voice-cloning TTS at 48 kHz from text and a reference audio clip.

Model ID: model_lux-tts

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_lux-tts/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| prompt | string | Yes | - | - | - | - | Text to convert to speech. |
| audio | file | Yes | - | - | - | - | Reference audio for voice cloning. |
| guidanceScale | number | No | 3 | 0 | 10 | - | Higher values increase adherence to the reference voice. |
| numInferenceSteps | number | No | 4 | 1 | 16 | - | Number of flow-matching inference steps. |
| maxRefLength | number | No | 5 | 1 | 15 | - | Maximum reference audio duration used for voice encoding (seconds). |
| seed | number | No | - | 0 | 2147483647 | - | Seed for reproducible outputs. |
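Every table on this page has the same columns, so one way to catch bad requests before sending them is to transcribe a table as data and check parameters against it. The sketch below does this for the Lux TTS table. ParamSpec and validate are hypothetical helpers, not part of any Scenario SDK, and passing the audio file as a path string is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass(frozen=True)
class ParamSpec:
    """One row of a parameter table (hypothetical helper, not an official API)."""
    type: str                     # "string", "number", "boolean", or "file"
    required: bool = False
    default: Any = None           # None stands for a "-" Default cell
    min: Optional[float] = None
    max: Optional[float] = None
    allowed: Optional[Sequence[Any]] = None


# The Lux TTS table, transcribed as data.
LUX_TTS = {
    "prompt": ParamSpec("string", required=True),
    "audio": ParamSpec("file", required=True),  # assumed: a path or URL string
    "guidanceScale": ParamSpec("number", default=3, min=0, max=10),
    "numInferenceSteps": ParamSpec("number", default=4, min=1, max=16),
    "maxRefLength": ParamSpec("number", default=5, min=1, max=15),
    "seed": ParamSpec("number", min=0, max=2147483647),
}


def validate(params: dict, spec: dict) -> dict:
    """Check params against a table spec and fill in documented defaults."""
    unknown = set(params) - set(spec)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, rule in spec.items():
        value = params.get(name, rule.default)
        if value is None:
            if rule.required:
                raise ValueError(f"missing required parameter: {name}")
            continue  # optional and no default: leave it out of the request
        if rule.type == "string" and not isinstance(value, str):
            raise ValueError(f"{name} must be a string")
        if rule.type == "number":
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"{name} must be a number")
            if rule.min is not None and value < rule.min:
                raise ValueError(f"{name} is below the minimum {rule.min}")
            if rule.max is not None and value > rule.max:
                raise ValueError(f"{name} is above the maximum {rule.max}")
        if rule.allowed is not None and value not in rule.allowed:
            raise ValueError(f"{name} must be one of {list(rule.allowed)}")
        resolved[name] = value
    return resolved
```

For example, validate({"prompt": "Hello", "audio": "ref.wav"}, LUX_TTS) returns the two inputs plus guidanceScale 3, numInferenceSteps 4, and maxRefLength 5, while numInferenceSteps 20 is rejected because the table caps it at 16. Optional parameters without a default, such as seed, are simply omitted.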

MM Audio 2 Text To Audio

MMAudio generates audio from text input, producing the sounds described by a prompt.

Model ID: model_mm-audio-2-t2a

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_mm-audio-2-t2a/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| prompt | string | Yes | - | - | - | - | Text prompt describing the audio to generate. |
| negativePrompt | string | No | - | - | - | - | Negative prompt describing sounds to avoid. |
| duration | number | No | 8 | 1 | 30 | - | Output duration in seconds. |
| numSteps | number | No | 25 | 4 | 50 | - | Number of generation steps. |
| cfgStrength | number | No | 4.5 | 1 | 20 | - | Higher values keep the output closer to the prompt. |
| maskAwayClip | boolean | No | false | - | - | - | Mask away certain sounds in the audio. |
| seed | number | No | - | 0 | 65535 | - | Random seed for reproducible generation. |
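Seed ranges differ between models: MM Audio accepts seeds only up to 65535, while Lux TTS and the Beatoven models accept values up to 2147483647, so a seed copied from one model may be rejected by another. The sketch below draws a seed inside a model's documented range and returns it so it can be recorded for a later rerun. SEED_RANGES and pick_seed are hypothetical names; whether a recorded seed reproduces identical audio depends on the backend.

```python
import random
from typing import Optional

# Documented (min, max) seed ranges for some models on this page.
SEED_RANGES = {
    "model_lux-tts": (0, 2147483647),
    "model_mm-audio-2-t2a": (0, 65535),
    "model_beatoven-music-generation": (0, 2147483647),
    "model_beatoven-sound-effect": (0, 2147483647),
}


def pick_seed(model_id: str, rng: Optional[random.Random] = None) -> int:
    """Draw a seed inside the model's documented range (both ends inclusive)."""
    low, high = SEED_RANGES[model_id]
    rng = rng or random.Random()
    return rng.randint(low, high)  # randint includes both low and high
```

Passing a seeded random.Random makes the seed choice itself repeatable, which is convenient in tests; in production you would typically let it default and log the returned value.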

Tada 1B Text to Speech

A lighter variant of Tada voice-cloning text-to-speech, with multilingual support.

Model ID: model_tada-1b-text-to-speech

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_tada-1b-text-to-speech/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| audio | file | Yes | - | - | - | - | Reference audio for voice cloning. |
| prompt | string | Yes | - | - | - | - | Text to synthesize with the reference voice. |
| transcript | string | No | - | - | - | - | Transcript of the reference audio. Required for non-English references. |
| language | string | No | en | - | - | en, ar, ch, de, es, fr, it, ja, pl, pt | Language used for text alignment. |
| numInferenceSteps | number | No | 20 | 1 | 50 | - | Number of ODE solver steps for acoustic generation. |
| speedUpFactor | number | No | 1 | 0.5 | 2 | - | Values > 1 speed up and values < 1 slow down speech. |
| temperature | number | No | 0.6 | 0 | 2 | - | Sampling temperature for text token generation. |
| topP | number | No | 0.9 | 0 | 1 | - | Top-p nucleus sampling value. |
| repetitionPenalty | number | No | 1.1 | 1 | 2 | - | Penalty applied to repeated tokens. |
| acousticCfgScale | number | No | 1.6 | 0 | 10 | - | Classifier-free guidance scale for acoustic generation. |
| noiseTemperature | number | No | 0.9 | 0 | 2 | - | Temperature for diffusion noise during flow matching. |
| numExtraSteps | number | No | 0 | 0 | 50 | - | Additional autoregressive steps for continuation. |

Tada 3B Text to Speech

Voice cloning text-to-speech with multilingual alignment and expressive controls.

Model ID: model_tada-3b-text-to-speech

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_tada-3b-text-to-speech/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| audio | file | Yes | - | - | - | - | Reference audio for voice cloning. |
| prompt | string | Yes | - | - | - | - | Text to synthesize with the reference voice. |
| transcript | string | No | - | - | - | - | Transcript of the reference audio. Required for non-English references. |
| language | string | No | en | - | - | en, ar, ch, de, es, fr, it, ja, pl, pt | Language used for text alignment. |
| numInferenceSteps | number | No | 20 | 1 | 50 | - | Number of ODE solver steps for acoustic generation. |
| speedUpFactor | number | No | 1 | 0.5 | 2 | - | Values > 1 speed up and values < 1 slow down speech. |
| temperature | number | No | 0.6 | 0 | 2 | - | Sampling temperature for text token generation. |
| topP | number | No | 0.9 | 0 | 1 | - | Top-p nucleus sampling value. |
| repetitionPenalty | number | No | 1.1 | 1 | 2 | - | Penalty applied to repeated tokens. |
| acousticCfgScale | number | No | 1.6 | 0 | 10 | - | Classifier-free guidance scale for acoustic generation. |
| noiseTemperature | number | No | 0.9 | 0 | 2 | - | Temperature for diffusion noise during flow matching. |
| numExtraSteps | number | No | 0 | 0 | 50 | - | Additional autoregressive steps for continuation. |
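Both Tada tables mark transcript as optional but required for non-English references. The sketch below enforces that rule client-side. It assumes the language parameter matches the language spoken in the reference clip, which the tables do not state outright. tada_params is a hypothetical helper.

```python
from typing import Optional

# Allowed values for the Tada `language` parameter, from the tables above.
TADA_LANGUAGES = {"en", "ar", "ch", "de", "es", "fr", "it", "ja", "pl", "pt"}


def tada_params(audio: str, prompt: str, language: str = "en",
                transcript: Optional[str] = None, **extra) -> dict:
    """Build Tada parameters, requiring a transcript for non-English references.

    Assumption: `language` is also the language of the reference audio.
    """
    if language not in TADA_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if language != "en" and not transcript:
        raise ValueError("transcript is required for non-English reference audio")
    params = {"audio": audio, "prompt": prompt, "language": language, **extra}
    if transcript:
        params["transcript"] = transcript
    return params
```

Extra keyword arguments such as temperature or speedUpFactor pass through unchanged, so range checks for those would still need a table-driven validator.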

Beatoven

Beatoven Music Generation

Generate royalty-free instrumental music in genres ranging from electronic, hip hop, and indie rock to cinematic and classical. Perfect for games, films, social content, podcasts, and more.

Model ID: model_beatoven-music-generation

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_beatoven-music-generation/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| prompt | string | Yes | - | - | - | - | Describe the music you want to generate. |
| negativePrompt | string | No | - | - | - | - | Describe what you want to avoid in the music (instruments, styles, moods). |
| duration | number | No | 90 | 5 | 150 | - | Length of the generated music in seconds. |
| refinement | number | No | 100 | 10 | 200 | - | Refinement level. Higher values may improve quality but take longer. |
| creativity | number | No | 16 | 1 | 20 | - | Creativity level. Higher values allow a more creative interpretation of the prompt. |
| seed | number | No | - | 0 | 2147483647 | - | Use a seed for reproducible results. Leave blank to use a random seed. |

Beatoven Sound Effect

Create professional-grade sound effects, from animals and vehicles to nature, sci-fi, and otherworldly sounds. Perfect for films, games, and digital content.

Model ID: model_beatoven-sound-effect

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_beatoven-sound-effect/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| prompt | string | Yes | - | - | - | - | Describe the sound effect you want to generate. |
| negativePrompt | string | No | - | - | - | - | Describe the types of sounds you don't want in the output. |
| duration | number | No | 7 | 1 | 35 | - | Length of the generated sound effect in seconds. |
| refinement | number | No | 40 | 10 | 200 | - | Refinement level. Higher values may improve quality but take longer. |
| creativity | number | No | 16 | 1 | 20 | - | Creativity level. Higher values allow a more creative interpretation of the prompt. |
| seed | number | No | - | 0 | 2147483647 | - | Use a seed for reproducible results. Leave blank to use a random seed. |

ElevenLabs

ElevenLabs Multilingual v2

Life-like, emotionally rich text-to-speech model supporting 29 languages.

Model ID: model_elevenlabs-multilingual-v2

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_elevenlabs-multilingual-v2/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | The text to convert to speech. |
| voice | string | No | Aria | - | - | Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill | The voice to use for speech generation. |
| stability | number | No | 0.5 | 0 | 1 | - | Voice stability. The upstream fal API currently returns an error for any value other than 0.5. |
| similarityBoost | number | No | 0.5 | 0 | 1 | - | Similarity boost. |
| styleExaggeration | number | No | 0 | 0 | 1 | - | Style exaggeration. |
| speed | number | No | 1 | 0.7 | 1.2 | - | Speech speed (0.7-1.2). Values below 1.0 slow down the speech; values above 1.0 speed it up. Extreme values may affect quality. |
| previousText | string | No | - | - | - | - | The text that came before the text of the current request. Use it to improve continuity when concatenating multiple generations, or to influence the delivery of the current generation. |
| nextText | string | No | - | - | - | - | The text that comes after the text of the current request. Use it to improve continuity when concatenating multiple generations, or to influence the delivery of the current generation. |
| languageCode | string | No | - | - | - | "" (empty), en, ca, es, fr, de, it, ja, ko, zh, ru, ar, hi, bn, pa, ta, te, mr, ur, fa, tr, nl, sv, da, no, fi, el, ro, hu, cs, sk, sl, pt, id, th, vi, ms, tl, yo, ig, ha, am, az, be, bg, hr | Language code (ISO 639-1) used to enforce a language for the model. Currently only Turbo v2.5 and Flash v2.5 support language enforcement. |
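previousText and nextText exist to keep prosody consistent when a long script is generated in pieces and concatenated. The sketch below splits a script into sentences and gives each request its neighbouring sentences as context. The one-sentence-per-request granularity, the naive regex splitter, and the chunk_requests name are assumptions for illustration; stability is left at its 0.5 default because of the note above.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_requests(text: str, voice: str = "Aria") -> list[dict]:
    """One parameter dict per sentence, with neighbours as continuity context."""
    sentences = split_sentences(text)
    requests = []
    for i, sentence in enumerate(sentences):
        params = {"text": sentence, "voice": voice}
        if i > 0:
            params["previousText"] = sentences[i - 1]
        if i + 1 < len(sentences):
            params["nextText"] = sentences[i + 1]
        requests.append(params)
    return requests
```

Passing only the adjacent sentence keeps each request small; passing all preceding text instead is an equally valid choice if longer context helps the voice stay consistent.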

ElevenLabs Sound Effects v2

Professional sound effects generation for audio production and content creation.

Model ID: model_elevenlabs-sound-effects-v2

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_elevenlabs-sound-effects-v2/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | A textual description of the sound effect to generate. |
| durationSeconds | number | No | 5 | 0.5 | 22 | - | Duration in seconds (0.5-22). If null, the optimal duration is determined from the prompt. |
| promptInfluence | number | No | 0.3 | 0 | 1 | - | How closely to follow the sound description. Higher values mean less variation. |
| loop | boolean | No | false | - | - | - | Whether to loop the sound effect. |
| outputFormat | string | No | mp3_44100_128 | - | - | mp3_22050_32, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128, mp3_44100_192 | Output format of the generated audio, formatted as codec_sample_rate_bitrate. |
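The outputFormat values follow the codec_sample_rate_bitrate pattern, so they can be unpacked mechanically. The sketch below parses a format string and gives a rough size estimate, assuming the sample rate is in Hz and the bitrate in kbps (the usual reading of names like mp3_44100_128) and treating the MP3 as constant-bitrate. OutputFormat, parse_output_format, and approx_size_bytes are hypothetical helpers.

```python
from typing import NamedTuple


class OutputFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: int


def parse_output_format(value: str) -> OutputFormat:
    """Split a codec_sampleRate_bitrate string such as 'mp3_44100_128'."""
    codec, sample_rate, bitrate = value.split("_")
    return OutputFormat(codec, int(sample_rate), int(bitrate))


def approx_size_bytes(fmt: OutputFormat, seconds: float) -> float:
    """Rough file size for constant-bitrate audio: bits per second / 8 * duration."""
    return fmt.bitrate_kbps * 1000 / 8 * seconds
```

With the default mp3_44100_128 and the default 5-second duration, the estimate is 128000 / 8 * 5 = 80000 bytes, before any container or header overhead.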

ElevenLabs Turbo v2.5

High-quality, low-latency text-to-speech model in multiple languages.

Model ID: model_elevenlabs-turbo-v2-5

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_elevenlabs-turbo-v2-5/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | The text to convert to speech. |
| voice | string | No | Aria | - | - | Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill | The voice to use for speech generation. |
| stability | number | No | 0.5 | 0 | 1 | - | Voice stability. |
| similarityBoost | number | No | 0.5 | 0 | 1 | - | Similarity boost. |
| styleExaggeration | number | No | 0 | 0 | 1 | - | Style exaggeration. |
| speed | number | No | 1 | 0.7 | 1.2 | - | Speech speed (0.7-1.2). Values below 1.0 slow down the speech; values above 1.0 speed it up. Extreme values may affect quality. |
| previousText | string | No | - | - | - | - | The text that came before the text of the current request. Use it to improve continuity when concatenating multiple generations, or to influence the delivery of the current generation. |
| nextText | string | No | - | - | - | - | The text that comes after the text of the current request. Use it to improve continuity when concatenating multiple generations, or to influence the delivery of the current generation. |
| languageCode | string | No | - | - | - | "" (empty), en, ca, es, fr, de, it, ja, ko, zh, ru, ar, hi, bn, pa, ta, te, mr, ur, fa, tr, nl, sv, da, no, fi, el, ro, hu, cs, sk, sl, pt, id, th, vi, ms, tl, yo, ig, ha, am, az, be, bg, hr | Language code (ISO 639-1) used to enforce a language for the model. Currently only Turbo v2.5 and Flash v2.5 support language enforcement. |
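Although every ElevenLabs speech table lists languageCode, the description says only Turbo v2.5 and Flash v2.5 actually enforce it. The sketch below adds languageCode only for models that honour it, so a request does not silently carry an ignored setting. Flash v2.5 has no model ID on this page, so only the Turbo v2.5 ID is listed; with_language and LANGUAGE_ENFORCING_MODELS are hypothetical names.

```python
# Model IDs from this page whose languageCode is documented as enforced.
# Flash v2.5 also supports it but is not listed here, so its ID is omitted.
LANGUAGE_ENFORCING_MODELS = {"model_elevenlabs-turbo-v2-5"}


def with_language(model_id: str, params: dict, language_code: str) -> dict:
    """Return a copy of params with languageCode set only where it is enforced."""
    out = dict(params)  # never mutate the caller's dict
    if language_code and model_id in LANGUAGE_ENFORCING_MODELS:
        out["languageCode"] = language_code
    return out
```

Dropping the value silently is one design choice; raising an error instead would make it obvious to callers that the language will only be inferred from the text on other models.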

ElevenLabs v3

Next-generation text-to-speech model with advanced voice synthesis and enhanced naturalness.

Model ID: model_elevenlabs-tts-v3

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_elevenlabs-tts-v3/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | The text to convert to speech. |
| voice | string | No | Aria | - | - | Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill | The voice to use for speech generation. |
| similarityBoost | number | No | 0.75 | 0 | 1 | - | Similarity boost. |
| styleExaggeration | number | No | 0 | 0 | 1 | - | Style exaggeration. |
| speed | number | No | 1 | 0.7 | 1.2 | - | Speech speed (0.7-1.2). Values below 1.0 slow down the speech; values above 1.0 speed it up. Extreme values may affect quality. |
| languageCode | string | No | - | - | - | "" (empty), en, ca, es, fr, de, it, ja, ko, zh, ru, ar, hi, bn, pa, ta, te, mr, ur, fa, tr, nl, sv, da, no, fi, el, ro, hu, cs, sk, sl, pt, id, th, vi, ms, tl, yo, ig, ha, am, az, be, bg, hr | Language code (ISO 639-1) used to enforce a language for the model. Currently only Turbo v2.5 and Flash v2.5 support language enforcement. |

Google

Gemini 2.5 Flash TTS

Convert text to natural-sounding speech using Google's Gemini 2.5 Flash model with multiple voice presets.

Model ID: model_google-gemini-2-5-flash-tts

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_google-gemini-2-5-flash-tts/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | Text to convert to speech. |
| voice | string | No | Puck | - | - | Achernar, Achird, Algenib, Algieba, Alnilam, Aoede, Autonoe, Callirrhoe, Charon, Despina, Enceladus, Erinome, Fenrir, Gacrux, Iapetus, Kore, Laomedeia, Leda, Orus, Pulcherrima, Puck, Rasalgethi, Sadachbia, Sadaltager, Schedar, Sulafat, Umbriel, Vindemiatrix, Zephyr, Zubenelgenubi | Voice preset to use for speech synthesis. |
| language | string | No | en-US | - | - | ar-EG, bn-BD, de-DE, en-IN, en-US, es-US, fr-FR, hi-IN, id-ID, it-IT, ja-JP, ko-KR, mr-IN, nl-NL, pl-PL, pt-BR, ro-RO, ru-RU, ta-IN, te-IN, th-TH, tr-TR, uk-UA, vi-VN | Language for speech synthesis (auto-detected if not specified). |

Gemini 2.5 Pro TTS

Convert text to natural-sounding speech using Google's Gemini 2.5 Pro model with multiple voice presets.

Model ID: model_google-gemini-2-5-pro-tts

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_google-gemini-2-5-pro-tts/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | Text to convert to speech. |
| voice | string | No | Puck | - | - | Achernar, Achird, Algenib, Algieba, Alnilam, Aoede, Autonoe, Callirrhoe, Charon, Despina, Enceladus, Erinome, Fenrir, Gacrux, Iapetus, Kore, Laomedeia, Leda, Orus, Pulcherrima, Puck, Rasalgethi, Sadachbia, Sadaltager, Schedar, Sulafat, Umbriel, Vindemiatrix, Zephyr, Zubenelgenubi | Voice preset to use for speech synthesis. |
| language | string | No | en-US | - | - | ar-EG, bn-BD, de-DE, en-IN, en-US, es-US, fr-FR, hi-IN, id-ID, it-IT, ja-JP, ko-KR, mr-IN, nl-NL, pl-PL, pt-BR, ro-RO, ru-RU, ta-IN, te-IN, th-TH, tr-TR, uk-UA, vi-VN | Language for speech synthesis (auto-detected if not specified). |

Google Lyria 2

Model ID: model_lyria-2

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_lyria-2/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| prompt | string | Yes | - | - | - | - | Text prompt for audio generation. |
| negativePrompt | string | No | - | - | - | - | Description of what to exclude from the generated audio. |
| seed | number | No | - | - | - | - | Use a seed for reproducible results. Leave blank to use a random seed. |

Meta

Meta MusicGen

Model ID: model_meta-musicgen

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_meta-musicgen/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| modelVersion | string | No | stereo-melody-large | - | - | stereo-melody-large, stereo-large, melody-large, large | Model to use for generation. |
| prompt | string | No | - | - | - | - | A description of the music you want to generate. |
| inputAudio | file | No | - | - | - | - | An audio file that influences the generated music. If continuation is true, the generated music continues the audio file; otherwise, it mimics the audio file's melody. |
| duration | number | No | 8 | 1 | 30 | - | Duration of the generated audio in seconds. |
| continuation | boolean | No | false | - | - | - | If true, the generated music continues from inputAudio; otherwise, it mimics the melody of inputAudio. |
| continuationStart | number | No | 0 | 0 | - | - | Start time within the audio file to use for continuation. |
| continuationEnd | number | No | - | 0 | - | - | End time within the audio file to use for continuation. If omitted, defaults to the end of the audio clip. |
| multiBandDiffusion | boolean | No | false | - | - | - | If true, the EnCodec tokens are decoded with MultiBand Diffusion. Only works with non-stereo models. |
| normalizationStrategy | string | No | loudness | - | - | loudness, clip, peak, rms | Strategy for normalizing audio. |
| temperature | number | No | 1 | - | - | - | Controls the "conservativeness" of the sampling process. Higher temperature means more diversity. |
| classifierFreeGuidance | number | No | 3 | 0 | 10 | - | Increases the influence of inputs on the output. Higher values produce lower-variance outputs that adhere more closely to the inputs. |
| seed | number | No | - | - | - | - | Seed for the random number generator. If omitted or -1, a random seed is used. |
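Several MusicGen parameters only make sense together: MultiBand Diffusion works only with non-stereo model versions (and the default version is stereo), and continuation refers to an input audio file and a start/end window. The sketch below checks those combinations before a request is sent. Requiring inputAudio whenever continuation is true is an inference from the descriptions rather than a documented rule; check_musicgen is a hypothetical helper.

```python
from typing import Optional

# modelVersion values that produce stereo output (MultiBand Diffusion unsupported).
STEREO_VERSIONS = {"stereo-melody-large", "stereo-large"}


def check_musicgen(params: dict) -> None:
    """Raise ValueError for parameter combinations the table describes as invalid."""
    version = params.get("modelVersion", "stereo-melody-large")  # documented default
    if params.get("multiBandDiffusion") and version in STEREO_VERSIONS:
        raise ValueError("multiBandDiffusion only works with non-stereo models")
    if params.get("continuation") and "inputAudio" not in params:
        # Inferred: continuation continues from inputAudio, so it needs one.
        raise ValueError("continuation needs inputAudio to continue from")
    start = params.get("continuationStart", 0)
    end: Optional[float] = params.get("continuationEnd")
    if start < 0:
        raise ValueError("continuationStart must be >= 0")
    if end is not None and end <= start:
        raise ValueError("continuationEnd must be after continuationStart")
```

Note that setting multiBandDiffusion to true without also choosing melody-large or large fails, because the default modelVersion is a stereo model.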

MiniMax

Minimax Music 2.0

Model ID: model_minimax-music-2-0

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-2-0/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| prompt | string | Yes | - | - | - | - | A description of the music, specifying style, mood, and scenario. |
| lyrics | string | Yes | - | - | - | - | Lyrics of the song. Separate lines with a newline (\n). You may add structure tags such as [Intro], [Verse], [Chorus], [Bridge], and [Outro] to enhance the arrangement. |
| sampleRate | number | No | 44100 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated music. |
| bitrate | number | No | 256000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated music. |
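The lyrics field is a single string in which lines are separated by newlines and sections can be introduced with structure tags. The sketch below assembles that string from (tag, lines) pairs. build_lyrics is a hypothetical helper; since the table gives the tags only as examples ("such as"), it does not restrict which tag names are used.

```python
def build_lyrics(sections: list[tuple[str, list[str]]]) -> str:
    """Join (tag, lines) pairs into one lyrics string.

    Each section starts with its structure tag in brackets, e.g. "[Verse]",
    and every line is separated by a newline character.
    """
    parts = []
    for tag, lines in sections:
        parts.append(f"[{tag}]")
        parts.extend(lines)
    return "\n".join(parts)
```

For example, build_lyrics([("Verse", ["line one", "line two"]), ("Chorus", ["la la"])]) yields "[Verse]\nline one\nline two\n[Chorus]\nla la", which can be passed directly as the lyrics parameter.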

Minimax Speech 2.6 HD

MiniMax Speech 2.6 HD delivers studio-quality multilingual text-to-speech with nuanced prosody, subtitle export, and premium voices.

Model ID: model_minimax-speech-2-6-hd

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-6-hd/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | Text to convert to speech. Insert <#x#> between words to add a pause of x seconds (0.01-99.99). |
| voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
| speed | number | No | 1 | 0.5 | 2 | - | Speech speed. |
| volume | number | No | 1 | 0 | 10 | - | Speech volume. |
| pitch | number | No | 0 | -12 | 12 | - | Speech pitch. |
| emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion. |
| sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech. |
| bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech. |
| channel | string | No | stereo | - | - | mono, stereo | Number of audio channels. |
| languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, Chinese,Yue, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects. |
| englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency). |
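Both MiniMax Speech 2.6 models accept inline pause markers of the form <#x#> in text, where x is the pause length in seconds between 0.01 and 99.99. The sketch below joins phrases with a validated marker. Formatting x with two decimals mirrors the documented range but is an assumption about what the API accepts, and with_pauses is a hypothetical helper.

```python
def with_pauses(phrases: list[str], pause_seconds: float) -> str:
    """Join phrases with a <#x#> marker, where x is the pause in seconds."""
    if not 0.01 <= pause_seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    marker = f"<#{pause_seconds:.2f}#>"  # two decimals, matching the documented range
    return marker.join(phrases)
```

For example, with_pauses(["Hello", "world"], 0.5) produces "Hello<#0.50#>world". To vary the pause per gap, build the string manually with a different marker for each position.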

Minimax Speech 2.6 Turbo

Low-latency MiniMax Speech 2.6 Turbo brings multilingual, emotional text-to-speech.

Model ID: model_minimax-speech-2-6-turbo

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-6-turbo/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | Text to convert to speech. Insert <#x#> between words to add a pause of x seconds (0.01-99.99). |
| voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
| speed | number | No | 1 | 0.5 | 2 | - | Speech speed. |
| volume | number | No | 1 | 0 | 10 | - | Speech volume. |
| pitch | number | No | 0 | -12 | 12 | - | Speech pitch. |
| emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion. |
| sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech. |
| bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech. |
| channel | string | No | stereo | - | - | mono, stereo | Number of audio channels. |
| languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, Chinese,Yue, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects. |
| englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency). |