---
title: Academia | Scenario Docs
---

> This page is auto-generated from model configurations. Last updated: 2026-04-09.

This reference lists all available **Academia** audio generation models and their parameters. Use these parameter names when calling the [Generation API](/api/postgeneratecustom/index.md).

- [Lux TTS](#lux-tts)
- [MM Audio 2 Text To Audio](#mm-audio-2-text-to-audio)
- [Tada 1B Text to Speech](#tada-1b-text-to-speech)
- [Tada 3B Text to Speech](#tada-3b-text-to-speech)

---

## Lux TTS

High-quality voice cloning TTS at 48 kHz from text and a reference audio clip.

**Model ID:** `model_lux-tts`

**Capabilities:** `txt2audio`

| Parameter           | Type   | Required | Default | Min | Max        | Allowed Values | Description                                                         |
| ------------------- | ------ | -------- | ------- | --- | ---------- | -------------- | ------------------------------------------------------------------- |
| `prompt`            | string | Yes      | -       | -   | -          | -              | Text to convert to speech.                                          |
| `audio`             | file   | Yes      | -       | -   | -          | -              | Reference audio for voice cloning.                                  |
| `guidanceScale`     | number | No       | `3`     | 0   | 10         | -              | Higher values increase adherence to the reference voice.            |
| `numInferenceSteps` | number | No       | `4`     | 1   | 16         | -              | Number of flow-matching inference steps.                            |
| `maxRefLength`      | number | No       | `5`     | 1   | 15         | -              | Maximum reference audio duration used for voice encoding (seconds). |
| `seed`              | number | No       | -       | 0   | 2147483647 | -              | Seed for reproducible outputs.                                      |

## MM Audio 2 Text To Audio

MMAudio generates synchronized audio from text input. It can generate sounds described by a prompt.

**Model ID:** `model_mm-audio-2-t2a`

**Capabilities:** `txt2audio`

| Parameter        | Type    | Required | Default | Min | Max   | Allowed Values | Description                                     |
| ---------------- | ------- | -------- | ------- | --- | ----- | -------------- | ----------------------------------------------- |
| `prompt`         | string  | Yes      | -       | -   | -     | -              | Text prompt describing the audio to generate.   |
| `negativePrompt` | string  | No       | -       | -   | -     | -              | Negative prompt to avoid certain sounds.        |
| `duration`       | number  | No       | `8`     | 1   | 30    | -              | Output duration in seconds.                     |
| `numSteps`       | number  | No       | `25`    | 4   | 50    | -              | Number of steps used to generate the audio.     |
| `cfgStrength`    | number  | No       | `4.5`   | 1   | 20    | -              | Higher values keep output closer to the prompt. |
| `maskAwayClip`   | boolean | No       | `false` | -   | -     | -              | Mask away certain sounds in the audio.          |
| `seed`           | number  | No       | -       | 0   | 65535 | -              | Random seed for reproducible generation.        |

## Tada 1B Text to Speech

Lighter Tada voice cloning text-to-speech variant with multilingual support.

**Model ID:** `model_tada-1b-text-to-speech`

**Capabilities:** `txt2audio`

| Parameter           | Type   | Required | Default | Min | Max | Allowed Values                                             | Description                                                             |
| ------------------- | ------ | -------- | ------- | --- | --- | ---------------------------------------------------------- | ----------------------------------------------------------------------- |
| `audio`             | file   | Yes      | -       | -   | -   | -                                                          | Reference audio for voice cloning.                                      |
| `prompt`            | string | Yes      | -       | -   | -   | -                                                          | Text to synthesize with the reference voice.                            |
| `transcript`        | string | No       | -       | -   | -   | -                                                          | Transcript of the reference audio. Required for non-English references. |
| `language`          | string | No       | `en`    | -   | -   | `en`, `ar`, `ch`, `de`, `es`, `fr`, `it`, `ja`, `pl`, `pt` | Language used for text alignment.                                       |
| `numInferenceSteps` | number | No       | `20`    | 1   | 50  | -                                                          | Number of ODE solver steps for acoustic generation.                     |
| `speedUpFactor`     | number | No       | `1`     | 0.5 | 2   | -                                                          | Values > 1 speed up and values < 1 slow down speech.                    |
| `temperature`       | number | No       | `0.6`   | 0   | 2   | -                                                          | Sampling temperature for text token generation.                         |
| `topP`              | number | No       | `0.9`   | 0   | 1   | -                                                          | Top-p nucleus sampling value.                                           |
| `repetitionPenalty` | number | No       | `1.1`   | 1   | 2   | -                                                          | Penalty applied to repeated tokens.                                     |
| `acousticCfgScale`  | number | No       | `1.6`   | 0   | 10  | -                                                          | Classifier-free guidance scale for acoustic generation.                 |
| `noiseTemperature`  | number | No       | `0.9`   | 0   | 2   | -                                                          | Temperature for diffusion noise during flow matching.                   |
| `numExtraSteps`     | number | No       | `0`     | 0   | 50  | -                                                          | Additional autoregressive steps for continuation.                       |

## Tada 3B Text to Speech

Voice cloning text-to-speech with multilingual alignment and expressive controls.

**Model ID:** `model_tada-3b-text-to-speech`

**Capabilities:** `txt2audio`

| Parameter           | Type   | Required | Default | Min | Max | Allowed Values                                             | Description                                                             |
| ------------------- | ------ | -------- | ------- | --- | --- | ---------------------------------------------------------- | ----------------------------------------------------------------------- |
| `audio`             | file   | Yes      | -       | -   | -   | -                                                          | Reference audio for voice cloning.                                      |
| `prompt`            | string | Yes      | -       | -   | -   | -                                                          | Text to synthesize with the reference voice.                            |
| `transcript`        | string | No       | -       | -   | -   | -                                                          | Transcript of the reference audio. Required for non-English references. |
| `language`          | string | No       | `en`    | -   | -   | `en`, `ar`, `ch`, `de`, `es`, `fr`, `it`, `ja`, `pl`, `pt` | Language used for text alignment.                                       |
| `numInferenceSteps` | number | No       | `20`    | 1   | 50  | -                                                          | Number of ODE solver steps for acoustic generation.                     |
| `speedUpFactor`     | number | No       | `1`     | 0.5 | 2   | -                                                          | Values > 1 speed up and values < 1 slow down speech.                    |
| `temperature`       | number | No       | `0.6`   | 0   | 2   | -                                                          | Sampling temperature for text token generation.                         |
| `topP`              | number | No       | `0.9`   | 0   | 1   | -                                                          | Top-p nucleus sampling value.                                           |
| `repetitionPenalty` | number | No       | `1.1`   | 1   | 2   | -                                                          | Penalty applied to repeated tokens.                                     |
| `acousticCfgScale`  | number | No       | `1.6`   | 0   | 10  | -                                                          | Classifier-free guidance scale for acoustic generation.                 |
| `noiseTemperature`  | number | No       | `0.9`   | 0   | 2   | -                                                          | Temperature for diffusion noise during flow matching.                   |
| `numExtraSteps`     | number | No       | `0`     | 0   | 50  | -                                                          | Additional autoregressive steps for continuation.                       |
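
The ranges in the Tada 3B table can be checked on the client before a request is sent. The following is a minimal sketch of that idea, not part of the Generation API: the function name `validate_tada_params`, the constant names, and the use of a plain string as a stand-in for the `audio` file are all assumptions made for illustration. Only the parameter names, ranges, and allowed values are taken from the table. The same tables apply to Tada 1B, which lists identical parameters.

```python
from typing import Any, Dict

# Documented model ID; everything else named here is hypothetical.
TADA_3B_MODEL_ID = "model_tada-3b-text-to-speech"

REQUIRED = ("audio", "prompt")
LANGUAGES = ("en", "ar", "ch", "de", "es", "fr", "it", "ja", "pl", "pt")

# (min, max) pairs copied from the Tada 3B parameter table.
NUMERIC_RANGES = {
    "numInferenceSteps": (1, 50),
    "speedUpFactor": (0.5, 2),
    "temperature": (0, 2),
    "topP": (0, 1),
    "repetitionPenalty": (1, 2),
    "acousticCfgScale": (0, 10),
    "noiseTemperature": (0, 2),
    "numExtraSteps": (0, 50),
}

ALLOWED_NAMES = set(REQUIRED) | set(NUMERIC_RANGES) | {"transcript", "language"}


def validate_tada_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params, or raise ValueError on the first violation."""
    unknown = sorted(set(params) - ALLOWED_NAMES)
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")

    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")

    language = params.get("language", "en")  # documented default is `en`
    if language not in LANGUAGES:
        raise ValueError(f"language must be one of {LANGUAGES}, got {language!r}")

    for name, (low, high) in NUMERIC_RANGES.items():
        if name not in params:
            continue  # omitted values fall back to the documented defaults
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number, got {value!r}")
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")

    return dict(params)


if __name__ == "__main__":
    # "reference.wav" is a placeholder; how the file is supplied is not
    # specified on this page.
    checked = validate_tada_params(
        {"audio": "reference.wav", "prompt": "Hello there.", "speedUpFactor": 1.25}
    )
    print(TADA_3B_MODEL_ID, checked)
```

Optional parameters that are left out are not filled in by this sketch, on the assumption that omitted values fall back to the defaults listed in the table.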