MiniMax
This page is auto-generated from model configurations. Last updated: 2026-03-13.
This reference lists all available MiniMax audio generation models and their parameters. Use these parameter names when calling the Generation API.
MiniMax Music 2.0
Model ID: model_minimax-music-2-0
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-2-0/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| prompt | string | Yes | - | - | - | - | A description of the music, specifying style, mood, and scenario. |
| lyrics | string | Yes | - | - | - | - | Lyrics of the song. Use `\n` to separate lines. You may add structure tags such as [Intro], [Verse], [Chorus], [Bridge], [Outro] to enhance the arrangement. |
| sampleRate | number | No | 44100 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated music, in Hz. |
| bitrate | number | No | 256000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated music, in bits per second. |
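As a minimal sketch of how the parameters above fit together, the following Python helper assembles a Music 2.0 parameter object. The function name `build_music_params` and the `(tag, lines)` input shape are hypothetical, and the line separator is assumed to be the newline character `\n`. How the resulting object is wrapped in a Generation API request (endpoint, authentication, envelope) is not described on this page and is left out.

```python
# Allowed values copied from the parameter table above.
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100}
ALLOWED_BITRATES = {32000, 64000, 128000, 256000}


def build_music_params(prompt, sections, sample_rate=44100, bitrate=256000):
    """Build a parameter dict for model_minimax-music-2-0 (hypothetical helper).

    `sections` is a list of (tag, lines) pairs, for example
    [("Verse", ["first line", "second line"])].
    """
    if not prompt:
        raise ValueError("prompt is required")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sampleRate: {sample_rate}")
    if bitrate not in ALLOWED_BITRATES:
        raise ValueError(f"unsupported bitrate: {bitrate}")

    lyric_lines = []
    for tag, lines in sections:
        lyric_lines.append(f"[{tag}]")  # structure tag such as [Verse]
        lyric_lines.extend(lines)
    if not lyric_lines:
        raise ValueError("lyrics is required")

    return {
        "prompt": prompt,
        # Assumption: lines are separated by a newline character.
        "lyrics": "\n".join(lyric_lines),
        "sampleRate": sample_rate,
        "bitrate": bitrate,
    }


params = build_music_params(
    "Dreamy synth-pop, nostalgic mood, late-night drive",
    [("Verse", ["City lights", "fade out"]), ("Chorus", ["Hold on"])],
)
```

Validating `sampleRate` and `bitrate` locally is a design choice in this sketch: it surfaces an unsupported value before any request is made.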
MiniMax Speech 2.6 HD
MiniMax Speech 2.6 HD delivers studio-quality multilingual text-to-audio with nuanced prosody, subtitle export, and premium voices.
Model ID: model_minimax-speech-2-6-hd
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-6-hd/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | Text to convert to speech. Use `<#x#>` between words to insert a pause of x seconds (0.01-99.99). |
| voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
| speed | number | No | 1 | 0.5 | 2 | - | Speech speed. |
| volume | number | No | 1 | 0 | 10 | - | Speech volume. |
| pitch | number | No | 0 | -12 | 12 | - | Speech pitch. |
| emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion. |
| sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech, in Hz. |
| bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech, in bits per second. |
| channel | string | No | stereo | - | - | mono, stereo | Number of audio channels. |
| languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, `Chinese,Yue`, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects. |
| englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency). |
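The Min, Max, and Allowed Values columns above can be checked before a request is sent. The sketch below is one way to do that in Python. The function name `validate_speech_params` is hypothetical, and the Min and Max bounds are assumed to be inclusive, which this page does not state. The long `voiceId` and `languageBoost` lists are omitted for brevity.

```python
# Numeric ranges from the table above (assumed inclusive on both ends).
SPEECH_RANGES = {"speed": (0.5, 2), "volume": (0, 10), "pitch": (-12, 12)}

# Enumerated parameters from the table above.
SPEECH_CHOICES = {
    "emotion": {"auto", "happy", "sad", "angry", "fearful", "disgusted",
                "surprised", "calm", "fluent", "neutral"},
    "sampleRate": {8000, 16000, 22050, 24000, 32000, 44100},
    "bitrate": {32000, 64000, 128000, 256000},
    "channel": {"mono", "stereo"},
}


def validate_speech_params(params):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if not params.get("text"):
        errors.append("text is required")
    for name, (low, high) in SPEECH_RANGES.items():
        if name in params and not low <= params[name] <= high:
            errors.append(f"{name} must be between {low} and {high}")
    for name, allowed in SPEECH_CHOICES.items():
        if name in params and params[name] not in allowed:
            errors.append(f"{name} has unsupported value {params[name]!r}")
    return errors


ok = validate_speech_params({"text": "Hello", "speed": 1.5, "emotion": "calm"})
bad = validate_speech_params({"text": "Hello", "speed": 3, "channel": "5.1"})
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid parameter at once. Omitted optional parameters are skipped, on the assumption that the service applies the defaults listed in the table.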
MiniMax Speech 2.6 Turbo
Low-latency MiniMax Speech 2.6 Turbo brings multilingual, emotional text-to-speech.
Model ID: model_minimax-speech-2-6-turbo
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-6-turbo/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | Text to convert to speech. Use `<#x#>` between words to insert a pause of x seconds (0.01-99.99). |
| voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
| speed | number | No | 1 | 0.5 | 2 | - | Speech speed. |
| volume | number | No | 1 | 0 | 10 | - | Speech volume. |
| pitch | number | No | 0 | -12 | 12 | - | Speech pitch. |
| emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion. |
| sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech, in Hz. |
| bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech, in bits per second. |
| channel | string | No | stereo | - | - | mono, stereo | Number of audio channels. |
| languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, `Chinese,Yue`, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects. |
| englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency). |
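The `text` parameter accepts pause markers of the form `<#x#>`, where x is a duration between 0.01 and 99.99 seconds. The following Python sketch shows one way to build such text. The helper names `pause` and `with_pauses` are hypothetical. Formatting the duration with two decimals and surrounding each marker with single spaces are assumptions, not documented requirements.

```python
def pause(seconds):
    """Return a pause marker such as <#0.50#> for the given duration."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause duration must be between 0.01 and 99.99 seconds")
    # Assumption: two decimal places match the documented 0.01-99.99 range.
    return f"<#{seconds:.2f}#>"


def with_pauses(segments):
    """Join text segments and numeric pause durations into one string.

    Strings are kept as-is; numbers are converted to pause markers.
    """
    parts = []
    for segment in segments:
        if isinstance(segment, (int, float)):
            parts.append(pause(segment))
        else:
            parts.append(segment)
    return " ".join(parts)


text = with_pauses(["Welcome back.", 0.5, "Here is today's summary."])
# text == "Welcome back. <#0.50#> Here is today's summary."
```

The resulting string can be used as the `text` value for either Speech model, since both tables describe the same marker syntax.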