Academia
This page is auto-generated from model configurations. Last updated: 2026-03-13.
This reference lists all available Academia audio generation models and their parameters. Use these parameter names when calling the Generation API.
Lux TTS
High-quality voice-cloning TTS that produces 48 kHz audio from text and a reference audio clip.
Model ID: model_lux-tts
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_lux-tts/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| prompt | string | Yes | - | - | - | - | Text to convert to speech. |
| audio | file | Yes | - | - | - | - | Reference audio for voice cloning. |
| guidanceScale | number | No | 3 | 0 | 10 | - | Higher values increase adherence to the reference voice. |
| numInferenceSteps | number | No | 4 | 1 | 16 | - | Number of flow-matching inference steps. |
| maxRefLength | number | No | 5 | 1 | 15 | - | Maximum reference audio duration used for voice encoding (seconds). |
| seed | number | No | - | 0 | 2147483647 | - | Seed for reproducible outputs. |
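
The table above can be turned into a small client-side check before a request is sent. The following is a minimal sketch, not an official client: the helper name `build_lux_tts_params` is hypothetical, and because this page does not say how the `audio` file is supplied, the example treats it as an opaque placeholder string. Only the parameter names and ranges come from the table.

```python
# Ranges copied from the Lux TTS parameter table: name -> (min, max).
LUX_TTS_RANGES = {
    "guidanceScale": (0, 10),
    "numInferenceSteps": (1, 16),
    "maxRefLength": (1, 15),
    "seed": (0, 2147483647),
}


def build_lux_tts_params(prompt, audio, **optional):
    """Return a parameter dict for model_lux-tts, checking documented ranges.

    `audio` is an opaque reference string here (an assumption); this page
    does not describe how files are uploaded or referenced.
    """
    if not prompt:
        raise ValueError("prompt is required")
    if not audio:
        raise ValueError("audio is required")

    params = {"prompt": prompt, "audio": audio}
    for name, value in optional.items():
        if name not in LUX_TTS_RANGES:
            raise ValueError(f"unknown parameter: {name}")
        low, high = LUX_TTS_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        params[name] = value
    return params


params = build_lux_tts_params(
    "Welcome back to the show.",
    "reference-voice-placeholder",  # hypothetical file reference
    guidanceScale=4,
    numInferenceSteps=8,
    seed=1234,
)
print(params)
```

Optional parameters that are omitted are left out of the dict, on the assumption that the service then applies the defaults listed in the table.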
MM Audio 2 Text To Audio
MMAudio generates audio from text input, producing the sounds described by a prompt.
Model ID: model_mm-audio-2-t2a
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_mm-audio-2-t2a/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| prompt | string | Yes | - | - | - | - | Text prompt describing the audio to generate. |
| negativePrompt | string | No | - | - | - | - | Negative prompt describing sounds to avoid. |
| duration | number | No | 8 | 1 | 30 | - | Output duration in seconds. |
| numSteps | number | No | 25 | 4 | 50 | - | Number of generation steps. |
| cfgStrength | number | No | 4.5 | 1 | 20 | - | Higher values keep the output closer to the prompt. |
| maskAwayClip | boolean | No | false | - | - | - | Mask away certain sounds in the audio. |
| seed | number | No | - | 0 | 65535 | - | Random seed for reproducible generation. |
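
To see which settings a request would effectively use, the documented defaults can be merged with caller overrides locally. This is a sketch under assumptions: `resolve_mm_audio_params` is a hypothetical helper, and whether the service fills in defaults itself is not stated on this page, so merging them client-side is only a convenience for logging and validation. Note that `seed` is capped at 65535 for this model.

```python
# Defaults and ranges copied from the MM Audio 2 Text To Audio table.
MM_AUDIO_DEFAULTS = {
    "duration": 8,
    "numSteps": 25,
    "cfgStrength": 4.5,
    "maskAwayClip": False,
}
MM_AUDIO_RANGES = {
    "duration": (1, 30),
    "numSteps": (4, 50),
    "cfgStrength": (1, 20),
    "seed": (0, 65535),
}
MM_AUDIO_NAMES = {"prompt", "negativePrompt", "seed", *MM_AUDIO_DEFAULTS}


def resolve_mm_audio_params(overrides):
    """Merge caller overrides onto the documented defaults and validate them."""
    unknown = set(overrides) - MM_AUDIO_NAMES
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if not overrides.get("prompt"):
        raise ValueError("prompt is required")

    resolved = {**MM_AUDIO_DEFAULTS, **overrides}
    for name, (low, high) in MM_AUDIO_RANGES.items():
        if name in resolved and not low <= resolved[name] <= high:
            raise ValueError(f"{name}={resolved[name]} is outside [{low}, {high}]")
    return resolved


resolved = resolve_mm_audio_params(
    {"prompt": "rain on a tin roof", "negativePrompt": "speech", "duration": 12}
)
print(resolved)
```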
Tada 1B Text to Speech
Lighter Tada voice cloning text-to-speech variant with multilingual support.
Model ID: model_tada-1b-text-to-speech
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_tada-1b-text-to-speech/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| audio | file | Yes | - | - | - | - | Reference audio for voice cloning. |
| prompt | string | Yes | - | - | - | - | Text to synthesize with the reference voice. |
| transcript | string | No | - | - | - | - | Transcript of the reference audio. Required for non-English references. |
| language | string | No | en | - | - | en, ar, ch, de, es, fr, it, ja, pl, pt | Language used for text alignment. |
| numInferenceSteps | number | No | 20 | 1 | 50 | - | Number of ODE solver steps for acoustic generation. |
| speedUpFactor | number | No | 1 | 0.5 | 2 | - | Values > 1 speed up and values < 1 slow down speech. |
| temperature | number | No | 0.6 | 0 | 2 | - | Sampling temperature for text token generation. |
| topP | number | No | 0.9 | 0 | 1 | - | Top-p nucleus sampling value. |
| repetitionPenalty | number | No | 1.1 | 1 | 2 | - | Penalty applied to repeated tokens. |
| acousticCfgScale | number | No | 1.6 | 0 | 10 | - | Classifier-free guidance scale for acoustic generation. |
| noiseTemperature | number | No | 0.9 | 0 | 2 | - | Temperature for diffusion noise during flow matching. |
| numExtraSteps | number | No | 0 | 0 | 50 | - | Additional autoregressive steps for continuation. |
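
The `transcript` row is marked optional but becomes required for non-English references, a rule the table columns cannot express. The sketch below enforces it locally. It assumes the reference clip is spoken in the language passed as `language`, which this page does not state outright, and the helper name `tada_reference_fields` is hypothetical.

```python
# Allowed values copied from the `language` row of the table above.
TADA_LANGUAGES = ("en", "ar", "ch", "de", "es", "fr", "it", "ja", "pl", "pt")


def tada_reference_fields(language="en", transcript=None):
    """Return the language/transcript fields, enforcing the transcript rule.

    Assumption: the reference clip is spoken in `language`, so any value
    other than "en" is treated as a non-English reference.
    """
    if language not in TADA_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    has_transcript = bool(transcript and transcript.strip())
    if language != "en" and not has_transcript:
        raise ValueError("transcript is required for non-English references")

    fields = {"language": language}
    if has_transcript:
        fields["transcript"] = transcript.strip()
    return fields


english = tada_reference_fields()
french = tada_reference_fields("fr", "Bonjour à tous.")
print(english)
print(french)
```

The returned fields can be merged with `audio`, `prompt`, and any sampling parameters from the table.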
Tada 3B Text to Speech
Voice cloning text-to-speech with multilingual alignment and expressive controls.
Model ID: model_tada-3b-text-to-speech
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_tada-3b-text-to-speech/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| audio | file | Yes | - | - | - | - | Reference audio for voice cloning. |
| prompt | string | Yes | - | - | - | - | Text to synthesize with the reference voice. |
| transcript | string | No | - | - | - | - | Transcript of the reference audio. Required for non-English references. |
| language | string | No | en | - | - | en, ar, ch, de, es, fr, it, ja, pl, pt | Language used for text alignment. |
| numInferenceSteps | number | No | 20 | 1 | 50 | - | Number of ODE solver steps for acoustic generation. |
| speedUpFactor | number | No | 1 | 0.5 | 2 | - | Values > 1 speed up and values < 1 slow down speech. |
| temperature | number | No | 0.6 | 0 | 2 | - | Sampling temperature for text token generation. |
| topP | number | No | 0.9 | 0 | 1 | - | Top-p nucleus sampling value. |
| repetitionPenalty | number | No | 1.1 | 1 | 2 | - | Penalty applied to repeated tokens. |
| acousticCfgScale | number | No | 1.6 | 0 | 10 | - | Classifier-free guidance scale for acoustic generation. |
| noiseTemperature | number | No | 0.9 | 0 | 2 | - | Temperature for diffusion noise during flow matching. |
| numExtraSteps | number | No | 0 | 0 | 50 | - | Additional autoregressive steps for continuation. |
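
Because the Tada 1B and Tada 3B tables on this page list the same parameters with the same ranges, one validated parameter set can be reused to compare the two variants. The sketch below only prepares a local mapping from model ID to parameters. The helper `check_tada_ranges` and the placeholder `audio` value are hypothetical, and how each entry is submitted is outside what this page documents.

```python
# Model IDs as listed on this page.
TADA_MODEL_IDS = (
    "model_tada-1b-text-to-speech",
    "model_tada-3b-text-to-speech",
)

# Numeric ranges shared by both Tada tables: name -> (min, max).
TADA_RANGES = {
    "numInferenceSteps": (1, 50),
    "speedUpFactor": (0.5, 2),
    "temperature": (0, 2),
    "topP": (0, 1),
    "repetitionPenalty": (1, 2),
    "acousticCfgScale": (0, 10),
    "noiseTemperature": (0, 2),
    "numExtraSteps": (0, 50),
}


def check_tada_ranges(params):
    """Raise ValueError if any numeric parameter falls outside its range."""
    for name, (low, high) in TADA_RANGES.items():
        if name in params and not low <= params[name] <= high:
            raise ValueError(f"{name}={params[name]} is outside [{low}, {high}]")
    return params


shared = {
    "audio": "reference-voice-placeholder",  # hypothetical file reference
    "prompt": "The quick brown fox jumps over the lazy dog.",
    "speedUpFactor": 1.1,
    "temperature": 0.6,
    "acousticCfgScale": 1.6,
}

# One independent copy of the validated parameters per model variant.
jobs = {model_id: dict(check_tada_ranges(shared)) for model_id in TADA_MODEL_IDS}
for model_id, params in jobs.items():
    print(model_id, params)
```

Keeping every parameter identical across the two entries means any audible difference can be attributed to the model variant rather than to the sampling settings.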