---
title: Academia | Scenario Docs
---

> This page is auto-generated from model configurations. Last updated: 2026-04-09.

This reference lists all available **Academia** audio generation models and their parameters. Use these parameter names when calling the [Generation API](/api/postgeneratecustom/index.md).

- [Lux TTS](#lux-tts)
- [MM Audio 2 Text To Audio](#mm-audio-2-text-to-audio)
- [Tada 1B Text to Speech](#tada-1b-text-to-speech)
- [Tada 3B Text to Speech](#tada-3b-text-to-speech)

---

## Lux TTS

High-quality voice-cloning TTS that generates 48 kHz speech from text and a reference audio clip.

**Model ID:** `model_lux-tts`

**Capabilities:** `txt2audio`

**LLM Markdown:** <https://app.scenario.com/api/models/model_lux-tts/markdown>

| Parameter           | Type   | Required | Default | Min | Max        | Allowed Values | Description                                                         |
| ------------------- | ------ | -------- | ------- | --- | ---------- | -------------- | ------------------------------------------------------------------- |
| `prompt`            | string | Yes      | -       | -   | -          | -              | Text to convert to speech.                                          |
| `audio`             | file   | Yes      | -       | -   | -          | -              | Reference audio for voice cloning.                                  |
| `guidanceScale`     | number | No       | `3`     | 0   | 10         | -              | Higher values increase adherence to the reference voice.            |
| `numInferenceSteps` | number | No       | `4`     | 1   | 16         | -              | Number of flow-matching inference steps.                            |
| `maxRefLength`      | number | No       | `5`     | 1   | 15         | -              | Maximum reference audio duration used for voice encoding (seconds). |
| `seed`              | number | No       | -       | 0   | 2147483647 | -              | Seed for reproducible outputs.                                      |
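
The ranges in the table above can be checked client-side before a request is sent. The following is a minimal sketch, not an official client: `build_lux_tts_params` is a hypothetical helper, and `"ref_audio_placeholder"` stands in for the reference clip, since this page does not specify how a `file` value is passed to the Generation API.

```python
# Documented ranges for the Lux TTS numeric parameters: name -> (min, max).
LUX_TTS_RANGES = {
    "guidanceScale": (0, 10),
    "numInferenceSteps": (1, 16),
    "maxRefLength": (1, 15),
    "seed": (0, 2147483647),
}


def build_lux_tts_params(prompt, audio, **optional):
    """Return a parameter dict for model_lux-tts, checking documented ranges.

    Hypothetical helper: it only validates values, it does not call any API.
    """
    if not prompt:
        raise ValueError("prompt is required")
    if not audio:
        raise ValueError("audio is required")
    params = {"prompt": prompt, "audio": audio}
    for name, value in optional.items():
        if name not in LUX_TTS_RANGES:
            raise ValueError(f"unknown parameter: {name}")
        low, high = LUX_TTS_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        params[name] = value
    return params


# "ref_audio_placeholder" is an assumed stand-in for the reference audio file.
lux_params = build_lux_tts_params(
    "Welcome to the show.",
    "ref_audio_placeholder",
    guidanceScale=3,
    numInferenceSteps=8,
)
```

Optional parameters that are left out are simply omitted from the dict, so the documented defaults would apply on the server side.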

## MM Audio 2 Text To Audio

MMAudio generates audio from text input, producing the sounds described in a prompt.

**Model ID:** `model_mm-audio-2-t2a`

**Capabilities:** `txt2audio`

**LLM Markdown:** <https://app.scenario.com/api/models/model_mm-audio-2-t2a/markdown>

| Parameter        | Type    | Required | Default | Min | Max   | Allowed Values | Description                                         |
| ---------------- | ------- | -------- | ------- | --- | ----- | -------------- | --------------------------------------------------- |
| `prompt`         | string  | Yes      | -       | -   | -     | -              | Text prompt describing the audio to generate.       |
| `negativePrompt` | string  | No       | -       | -   | -     | -              | Negative prompt describing sounds to avoid.         |
| `duration`       | number  | No       | `8`     | 1   | 30    | -              | Output duration in seconds.                         |
| `numSteps`       | number  | No       | `25`    | 4   | 50    | -              | Number of steps used to generate the audio.         |
| `cfgStrength`    | number  | No       | `4.5`   | 1   | 20    | -              | Higher values keep the output closer to the prompt. |
| `maskAwayClip`   | boolean | No       | `false` | -   | -     | -              | Mask away certain sounds in the audio.              |
| `seed`           | number  | No       | -       | 0   | 65535 | -              | Random seed for reproducible generation.            |
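
As an illustration of how the defaults and optional parameters combine, the sketch below merges caller overrides over the documented defaults and serializes the result as JSON. `mm_audio_params` is a hypothetical helper, and the exact request body expected by the Generation API is an assumption; only the parameter names, defaults, and ranges come from the table above. Only two of the range checks are shown.

```python
import json

# Defaults copied from the table above; parameters without a default are omitted.
MM_AUDIO_DEFAULTS = {
    "duration": 8,
    "numSteps": 25,
    "cfgStrength": 4.5,
    "maskAwayClip": False,
}


def mm_audio_params(prompt, **overrides):
    """Merge caller overrides over the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(MM_AUDIO_DEFAULTS) - {"negativePrompt", "seed"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {"prompt": prompt, **MM_AUDIO_DEFAULTS, **overrides}
    if not 1 <= params["duration"] <= 30:
        raise ValueError("duration must be between 1 and 30 seconds")
    # Note the smaller seed range here: 0 to 65535.
    if "seed" in params and not 0 <= params["seed"] <= 65535:
        raise ValueError("seed must be between 0 and 65535")
    return params


params = mm_audio_params(
    "Rain on a tin roof with distant thunder",
    negativePrompt="music, speech",
    duration=12,
    seed=42,
)
body = json.dumps(params, indent=2)
print(body)
```

Sending the defaults explicitly is a design choice in this sketch: it makes the request self-describing, at the cost of pinning values that the server would otherwise supply.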

## Tada 1B Text to Speech

A lighter variant of the Tada voice-cloning text-to-speech model, with multilingual support.

**Model ID:** `model_tada-1b-text-to-speech`

**Capabilities:** `txt2audio`

**LLM Markdown:** <https://app.scenario.com/api/models/model_tada-1b-text-to-speech/markdown>

| Parameter           | Type   | Required | Default | Min | Max | Allowed Values                                             | Description                                                             |
| ------------------- | ------ | -------- | ------- | --- | --- | ---------------------------------------------------------- | ----------------------------------------------------------------------- |
| `audio`             | file   | Yes      | -       | -   | -   | -                                                          | Reference audio for voice cloning.                                      |
| `prompt`            | string | Yes      | -       | -   | -   | -                                                          | Text to synthesize with the reference voice.                            |
| `transcript`        | string | No       | -       | -   | -   | -                                                          | Transcript of the reference audio. Required for non-English references. |
| `language`          | string | No       | `en`    | -   | -   | `en`, `ar`, `ch`, `de`, `es`, `fr`, `it`, `ja`, `pl`, `pt` | Language used for text alignment.                                       |
| `numInferenceSteps` | number | No       | `20`    | 1   | 50  | -                                                          | Number of ODE solver steps for acoustic generation.                     |
| `speedUpFactor`     | number | No       | `1`     | 0.5 | 2   | -                                                          | Values > 1 speed up and values < 1 slow down speech.                    |
| `temperature`       | number | No       | `0.6`   | 0   | 2   | -                                                          | Sampling temperature for text token generation.                         |
| `topP`              | number | No       | `0.9`   | 0   | 1   | -                                                          | Top-p nucleus sampling value.                                           |
| `repetitionPenalty` | number | No       | `1.1`   | 1   | 2   | -                                                          | Penalty applied to repeated tokens.                                     |
| `acousticCfgScale`  | number | No       | `1.6`   | 0   | 10  | -                                                          | Classifier-free guidance scale for acoustic generation.                 |
| `noiseTemperature`  | number | No       | `0.9`   | 0   | 2   | -                                                          | Temperature for diffusion noise during flow matching.                   |
| `numExtraSteps`     | number | No       | `0`     | 0   | 50  | -                                                          | Additional autoregressive steps for continuation.                       |
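
The `transcript` rule in the table above (required for non-English references) can be enforced before a request is built. This is a sketch under stated assumptions: `tada_1b_params` and its `reference_language` argument are hypothetical, client-side only, and not part of the API; `"ref_audio_placeholder"` stands in for the reference clip.

```python
# Allowed values for `language`, copied from the table above.
TADA_LANGUAGES = ("en", "ar", "ch", "de", "es", "fr", "it", "ja", "pl", "pt")


def tada_1b_params(prompt, audio, reference_language="en",
                   transcript=None, language="en"):
    """Build parameters for model_tada-1b-text-to-speech.

    reference_language is a hypothetical client-side hint (the language spoken
    in the reference clip). It drives the check below and is never sent.
    """
    if language not in TADA_LANGUAGES:
        raise ValueError(f"language must be one of {TADA_LANGUAGES}")
    if reference_language != "en" and not transcript:
        raise ValueError("transcript is required for non-English references")
    params = {"prompt": prompt, "audio": audio, "language": language}
    if transcript:
        params["transcript"] = transcript
    return params


# A French reference clip, so a transcript is supplied.
fr_params = tada_1b_params(
    prompt="Bonjour et bienvenue.",
    audio="ref_audio_placeholder",
    reference_language="fr",
    transcript="Ceci est un enregistrement de référence.",
    language="fr",
)
```

With an English reference, `transcript` can be left out and the key is omitted from the resulting dict.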

## Tada 3B Text to Speech

Voice cloning text-to-speech with multilingual alignment and expressive controls.

**Model ID:** `model_tada-3b-text-to-speech`

**Capabilities:** `txt2audio`

**LLM Markdown:** <https://app.scenario.com/api/models/model_tada-3b-text-to-speech/markdown>

| Parameter           | Type   | Required | Default | Min | Max | Allowed Values                                             | Description                                                             |
| ------------------- | ------ | -------- | ------- | --- | --- | ---------------------------------------------------------- | ----------------------------------------------------------------------- |
| `audio`             | file   | Yes      | -       | -   | -   | -                                                          | Reference audio for voice cloning.                                      |
| `prompt`            | string | Yes      | -       | -   | -   | -                                                          | Text to synthesize with the reference voice.                            |
| `transcript`        | string | No       | -       | -   | -   | -                                                          | Transcript of the reference audio. Required for non-English references. |
| `language`          | string | No       | `en`    | -   | -   | `en`, `ar`, `ch`, `de`, `es`, `fr`, `it`, `ja`, `pl`, `pt` | Language used for text alignment.                                       |
| `numInferenceSteps` | number | No       | `20`    | 1   | 50  | -                                                          | Number of ODE solver steps for acoustic generation.                     |
| `speedUpFactor`     | number | No       | `1`     | 0.5 | 2   | -                                                          | Values > 1 speed up and values < 1 slow down speech.                    |
| `temperature`       | number | No       | `0.6`   | 0   | 2   | -                                                          | Sampling temperature for text token generation.                         |
| `topP`              | number | No       | `0.9`   | 0   | 1   | -                                                          | Top-p nucleus sampling value.                                           |
| `repetitionPenalty` | number | No       | `1.1`   | 1   | 2   | -                                                          | Penalty applied to repeated tokens.                                     |
| `acousticCfgScale`  | number | No       | `1.6`   | 0   | 10  | -                                                          | Classifier-free guidance scale for acoustic generation.                 |
| `noiseTemperature`  | number | No       | `0.9`   | 0   | 2   | -                                                          | Temperature for diffusion noise during flow matching.                   |
| `numExtraSteps`     | number | No       | `0`     | 0   | 50  | -                                                          | Additional autoregressive steps for continuation.                       |
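
The Tada 1B and Tada 3B tables list the same parameters, defaults, and ranges, so one set of parameters can target either variant by changing only the model ID. The sketch below illustrates this; `tada_request` is a hypothetical helper that covers only the numeric controls, and `"ref_audio_placeholder"` is an assumed stand-in for the reference clip.

```python
# Model IDs as documented on this page.
TADA_MODEL_IDS = {
    "1b": "model_tada-1b-text-to-speech",
    "3b": "model_tada-3b-text-to-speech",
}

# Numeric defaults, identical in both Tada tables.
TADA_DEFAULTS = {
    "numInferenceSteps": 20,
    "speedUpFactor": 1,
    "temperature": 0.6,
    "topP": 0.9,
    "repetitionPenalty": 1.1,
    "acousticCfgScale": 1.6,
    "noiseTemperature": 0.9,
    "numExtraSteps": 0,
}


def tada_request(size, prompt, audio, **overrides):
    """Return (model_id, params) for the chosen Tada variant (hypothetical helper)."""
    unknown = set(overrides) - set(TADA_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {"prompt": prompt, "audio": audio, **TADA_DEFAULTS, **overrides}
    return TADA_MODEL_IDS[size], params


# speedUpFactor below 1 slows the speech down, per the table above.
model_id, tada_params = tada_request(
    "3b",
    "Take a deep breath.",
    "ref_audio_placeholder",
    speedUpFactor=0.8,
)
```

Calling `tada_request("1b", ...)` with the same arguments yields the same `params` and only a different model ID.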