MiniMax
This page is auto-generated from model configurations. Last updated: 2026-04-15.
This reference lists all available MiniMax audio generation models and their parameters. Use these parameter names when calling the Generation API.
- Minimax Music 2.0
- Minimax Music 2.5
- Minimax Music 2.6
- Minimax Music Cover
- Minimax Speech 2.6 HD
- Minimax Speech 2.6 Turbo
- Minimax Speech 2.8 HD
- Minimax Speech 2.8 Turbo
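The model IDs and "LLM Markdown" links on this page follow one visible pattern, so a small lookup can keep them in one place. The sketch below copies the IDs from this page; `MODEL_IDS` and `llm_markdown_url` are hypothetical names chosen for illustration, not part of any SDK.

```python
# Model IDs copied from the sections of this page.
MODEL_IDS = {
    "Minimax Music 2.0": "model_minimax-music-2-0",
    "Minimax Music 2.5": "model_minimax-music-2-5",
    "Minimax Music 2.6": "model_minimax-music-2-6",
    "Minimax Music Cover": "model_minimax-music-cover",
    "Minimax Speech 2.6 HD": "model_minimax-speech-2-6-hd",
    "Minimax Speech 2.6 Turbo": "model_minimax-speech-2-6-turbo",
    "Minimax Speech 2.8 HD": "model_minimax-speech-2-8-hd",
    "Minimax Speech 2.8 Turbo": "model_minimax-speech-2-8-turbo",
}


def llm_markdown_url(model_name: str) -> str:
    """Build the "LLM Markdown" URL shown on this page for a model name."""
    model_id = MODEL_IDS[model_name]  # raises KeyError for unknown names
    return f"https://app.scenario.com/api/models/{model_id}/markdown"


print(llm_markdown_url("Minimax Music Cover"))
```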
Minimax Music 2.0
Model ID: model_minimax-music-2-0
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-2-0/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| prompt | string | Yes | - | - | - | - | A description of the music, specifying style, mood, and scenario. |
| lyrics | string | Yes | - | - | - | - | Lyrics of the song. Use \n to separate lines. You may add structure tags like [Intro], [Verse], [Chorus], [Bridge], [Outro] to enhance the arrangement. |
| sampleRate | number | No | 44100 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated music |
| bitrate | number | No | 256000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated music |
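As a minimal sketch, the parameters above could be assembled and checked client-side before a request is sent. The helper name `build_music_2_0_params` is hypothetical, and the code assumes lyric lines are separated with a newline character (`\n`), as the Music 2.5 and 2.6 tables state. It only builds the parameter dictionary; how that dictionary is sent to the Generation API is not covered here.

```python
# Allowed values copied from the Minimax Music 2.0 parameter table.
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100}
BITRATES = {32000, 64000, 128000, 256000}


def build_music_2_0_params(prompt, lyric_lines, sample_rate=44100, bitrate=256000):
    """Return a parameter dict for model_minimax-music-2-0 (hypothetical helper)."""
    if not prompt:
        raise ValueError("prompt is required")
    if not lyric_lines:
        raise ValueError("lyrics is required")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sampleRate: {sample_rate}")
    if bitrate not in BITRATES:
        raise ValueError(f"unsupported bitrate: {bitrate}")
    return {
        "prompt": prompt,
        # Assumption: one lyric line per newline, structure tags on their own line.
        "lyrics": "\n".join(lyric_lines),
        "sampleRate": sample_rate,
        "bitrate": bitrate,
    }


params = build_music_2_0_params(
    "Warm lo-fi hip hop for a rainy evening",
    ["[Verse]", "Streetlights hum along the avenue", "[Chorus]", "Stay a little longer"],
)
```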
Minimax Music 2.5
Full-length songs with natural vocals and rich instrumentation from lyrics and an optional style prompt. Control arrangement with 14+ section tags (e.g. [Intro], [Verse], [Chorus]).
Model ID: model_minimax-music-2-5
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-2-5/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| lyrics | string | Yes | - | - | - | - | Lyrics for the song (1–3,500 characters). Use \n between lines and \n\n for pauses. Structure tags include [Intro], [Verse], [Pre Chorus], [Chorus], [Hook], [Drop], [Bridge], [Solo], [Build Up], [Inst], [Interlude], [Break], [Transition], [Outro], and more. |
| prompt | string | No | - | - | - | - | Optional description of genre, mood, tempo, vocal style, and instrumentation (up to 2,000 characters). |
| sampleRate | number | No | 44100 | - | - | 16000, 24000, 32000, 44100 | Audio sample rate for the generated music |
| bitrate | number | No | 256000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated music |
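The `lyrics` formatting rules (structure tags, `\n` between lines, `\n\n` for pauses, 1 to 3,500 characters) can be applied with a small formatter. This is a sketch only: `format_lyrics` is a hypothetical helper, and placing the `\n\n` pause between sections is one reasonable reading of the description rather than a documented requirement.

```python
MAX_LYRICS_CHARS = 3500  # upper bound from the lyrics description


def format_lyrics(sections):
    """Join (tag, lines) pairs into a single lyrics string.

    Lines within a section are joined with "\n"; sections are separated
    with "\n\n", which the table describes as a pause.
    """
    blocks = ["\n".join([f"[{tag}]", *lines]) for tag, lines in sections]
    lyrics = "\n\n".join(blocks)
    if not 1 <= len(lyrics) <= MAX_LYRICS_CHARS:
        raise ValueError("lyrics must be between 1 and 3,500 characters")
    return lyrics


lyrics = format_lyrics([
    ("Verse", ["Morning breaks on the harbor", "Gulls above the tide"]),
    ("Chorus", ["Carry me home"]),
])
```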
Minimax Music 2.6
MiniMax Music 2.6 creates full-length songs with vocals and rich orchestration from lyrics and a style prompt, or instrumental-only tracks from a prompt alone. It accepts BPM and key hints in the prompt, can auto-generate lyrics, and produces up to about six minutes of audio per run.
Model ID: model_minimax-music-2-6
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-2-6/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| lyrics | string | No | - | - | - | - | Lyrics for vocal tracks (up to 3,500 characters). Omit for instrumental mode, or leave empty and enable 'Auto lyrics' to generate lyrics from the style prompt. Use \n between lines and \n\n for pauses. Structure tags include [Intro], [Verse], [Pre Chorus], [Chorus], [Hook], [Drop], [Bridge], [Solo], [Build Up], [Inst], [Interlude], [Break], [Transition], [Outro], and more. |
| prompt | string | Yes | - | - | - | - | Genre, mood, tempo (e.g. BPM), key, vocal style, and instrumentation (up to 2,000 characters). For instrumental mode, this is the main instruction. |
| isInstrumental | boolean | No | false | - | - | - | When true, generates music without vocals. Use the style prompt only; lyrics are not required. |
| lyricsOptimizer | boolean | No | false | - | - | - | When true and lyrics are empty, the model generates lyrics from the style prompt. |
| sampleRate | number | No | 44100 | - | - | 16000, 24000, 32000, 44100 | Audio sample rate for the generated music |
| bitrate | number | No | 256000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated music |
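The table describes three ways to drive this model: supplied lyrics, instrumental mode, and auto-generated lyrics. The sketch below picks the matching parameters for each. `build_music_2_6_params` is a hypothetical helper. Rejecting a request that sets none of the three is a client-side choice made here, because this page does not say how the API treats that case.

```python
def build_music_2_6_params(prompt, lyrics="", instrumental=False, auto_lyrics=False):
    """Return a parameter dict for model_minimax-music-2-6 (hypothetical helper)."""
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > 2000:
        raise ValueError("prompt exceeds 2,000 characters")
    if len(lyrics) > 3500:
        raise ValueError("lyrics exceed 3,500 characters")

    params = {"prompt": prompt}
    if instrumental:
        # Instrumental mode: the prompt is the main instruction; lyrics are left out.
        params["isInstrumental"] = True
    elif lyrics:
        params["lyrics"] = lyrics
    elif auto_lyrics:
        # Empty lyrics plus lyricsOptimizer: the model writes lyrics from the prompt.
        params["lyricsOptimizer"] = True
    else:
        # Assumption: behaviour without lyrics or a mode flag is undocumented here.
        raise ValueError("provide lyrics, or set instrumental or auto_lyrics")
    return params


instrumental = build_music_2_6_params("Ambient pads, 70 BPM, D minor", instrumental=True)
vocal = build_music_2_6_params("Upbeat synth-pop", lyrics="[Verse]\nCity lights")
auto = build_music_2_6_params("Acoustic folk ballad", auto_lyrics=True)
```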
Minimax Music Cover
Reimagine an existing song in a new style while preserving the original melody. The voice, instruments, genre, and arrangement can all change to match a target style prompt. Works best with clear vocals and melody.
Model ID: model_minimax-music-cover
Capabilities: audio2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-cover/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| audioUrl | file | Yes | - | - | - | - | Audio to cover (MP3 or WAV). Works best with music that has clear vocals and melody. |
| prompt | string | Yes | - | - | - | - | Target style for the cover: genre, vocal character, instruments, and production (up to 2,000 characters). |
| sampleRate | number | No | 44100 | - | - | 16000, 24000, 32000, 44100 | Audio sample rate for the output |
| bitrate | number | No | 256000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the output audio |
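A cover request needs both an audio source and a style prompt. The sketch below checks the file extension and prompt length before building the parameters. It assumes `audioUrl` can be given as a URL string; the table types it as `file`, so the API may instead expect an uploaded asset reference. `build_cover_params` and the example URL are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_SUFFIXES = {".mp3", ".wav"}  # formats named in the audioUrl description


def build_cover_params(audio_url, prompt):
    """Return a parameter dict for model_minimax-music-cover (hypothetical helper)."""
    # Look only at the URL path so query strings do not hide the extension.
    suffix = PurePosixPath(urlparse(audio_url).path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"expected an MP3 or WAV file, got {suffix or 'no extension'}")
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > 2000:
        raise ValueError("prompt exceeds 2,000 characters")
    return {"audioUrl": audio_url, "prompt": prompt}


cover = build_cover_params(
    "https://example.com/songs/demo.MP3?token=abc",  # placeholder URL
    "Bossa nova, soft female vocal, nylon guitar",
)
```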
Minimax Speech 2.6 HD
MiniMax Speech 2.6 HD delivers studio-quality multilingual text-to-audio with nuanced prosody, subtitle export, and premium voices.
Model ID: model_minimax-speech-2-6-hd
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-6-hd/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | Text to convert to speech. Use <#x#> between words to control pause duration (0.01-99.99s). |
| voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
| speed | number | No | 1 | 0.5 | 2 | - | Speech speed |
| volume | number | No | 1 | 0 | 10 | - | Speech volume |
| pitch | number | No | 0 | -12 | 12 | - | Speech pitch |
| emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion |
| sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech |
| bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech |
| channel | string | No | stereo | - | - | mono, stereo | Number of audio channels |
| languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, Chinese,Yue, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects |
| englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency) |
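The `<#x#>` pause syntax in the `text` parameter is easy to mistype, so a tiny helper can generate the markers. This is a sketch: `pause` is a hypothetical function, and formatting the duration with two decimal places is an assumption based on the precision of the documented 0.01 to 99.99 second range.

```python
def pause(seconds: float) -> str:
    """Return a <#x#> pause marker for a duration given in seconds."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    # Assumption: two decimal places, matching the documented range precision.
    return f"<#{seconds:.2f}#>"


text = f"Welcome back.{pause(0.5)}Today we cover three topics.{pause(1.2)}Let's begin."
params = {"text": text, "voiceId": "Calm_Woman"}
```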
Minimax Speech 2.6 Turbo
MiniMax Speech 2.6 Turbo provides low-latency, multilingual text-to-speech with emotion control.
Model ID: model_minimax-speech-2-6-turbo
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-6-turbo/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | Text to convert to speech. Use <#x#> between words to control pause duration (0.01-99.99s). |
| voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
| speed | number | No | 1 | 0.5 | 2 | - | Speech speed |
| volume | number | No | 1 | 0 | 10 | - | Speech volume |
| pitch | number | No | 0 | -12 | 12 | - | Speech pitch |
| emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion |
| sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech |
| bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech |
| channel | string | No | stereo | - | - | mono, stereo | Number of audio channels |
| languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, Chinese,Yue, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects |
| englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency) |
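The Min, Max, and Allowed Values columns lend themselves to a table-driven check that runs before a request is made. The sketch below covers `speed`, `volume`, `pitch`, `emotion`, and `channel`; `validate_speech_options` is a hypothetical helper and the limits are copied from the table above.

```python
# Limits copied from the Min/Max and Allowed Values columns.
NUMERIC_RANGES = {"speed": (0.5, 2), "volume": (0, 10), "pitch": (-12, 12)}
ALLOWED_VALUES = {
    "emotion": {"auto", "happy", "sad", "angry", "fearful", "disgusted",
                "surprised", "calm", "fluent", "neutral"},
    "channel": {"mono", "stereo"},
}


def validate_speech_options(options):
    """Return a list of problems found in options; an empty list means none."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in options and not low <= options[name] <= high:
            problems.append(f"{name} must be between {low} and {high}")
    for name, allowed in ALLOWED_VALUES.items():
        if name in options and options[name] not in allowed:
            problems.append(f"{name} has unsupported value {options[name]!r}")
    return problems


ok = validate_speech_options({"speed": 1.5, "pitch": -3, "emotion": "calm"})
bad = validate_speech_options({"speed": 3, "channel": "surround"})
```

Collecting every problem in one pass, rather than raising on the first, makes it easier to report all invalid fields to a caller at once.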
Minimax Speech 2.8 HD
MiniMax Speech 2.8 HD: studio-grade text-to-speech with 32+ languages, expressive emotion control, preset voices, and interjections like (laughs) or (applause). Suited for final production and broadcast-quality voiceovers.
Model ID: model_minimax-speech-2-8-hd
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-8-hd/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | Text to convert to speech. Use <#x#> between segments to control pause duration (0.01–99.99s). Optional interjections such as (laughs), (sighs), or (applause) for more natural delivery. |
| voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
| speed | number | No | 1 | 0.5 | 2 | - | Speech speed |
| volume | number | No | 1 | 0 | 10 | - | Speech volume |
| pitch | number | No | 0 | -12 | 12 | - | Speech pitch |
| emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion |
| sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech |
| bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech |
| channel | string | No | stereo | - | - | mono, stereo | Number of audio channels |
| languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, Chinese,Yue, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects |
| englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency) |
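This page gives only examples of interjections ("such as"), not a complete list, so it can help to flag parenthesised words in a script that are not among the documented examples. The sketch below does that with a regular expression. `unverified_interjections` is a hypothetical helper, and a flagged word is not necessarily unsupported, only not confirmed by this page.

```python
import re

# Examples named on this page: (laughs), (sighs), (applause) for 2.8 HD,
# plus (breath) from the 2.8 Turbo description.
DOCUMENTED_EXAMPLES = {"laughs", "sighs", "applause", "breath"}


def unverified_interjections(text):
    """Return parenthesised lowercase words that this page does not list."""
    found = re.findall(r"\(([a-z-]+)\)", text)
    return [name for name in found if name not in DOCUMENTED_EXAMPLES]


script = "Thank you all (applause) that was unexpected (giggles) truly."
flagged = unverified_interjections(script)
```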
Minimax Speech 2.8 Turbo
MiniMax Speech 2.8 Turbo: fast, natural text-to-speech with 40+ languages, voice presets, emotion control, and optional interjections like (laughs) or (sighs). Optimized for low-latency and real-time use.
Model ID: model_minimax-speech-2-8-turbo
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-8-turbo/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| text | string | Yes | - | - | - | - | Text to convert to speech. Use <#x#> between segments to control pause duration (0.01–99.99s). Optional interjections such as (laughs), (sighs), or (breath) for more natural delivery. |
| voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
| speed | number | No | 1 | 0.5 | 2 | - | Speech speed |
| volume | number | No | 1 | 0 | 10 | - | Speech volume |
| pitch | number | No | 0 | -12 | 12 | - | Speech pitch |
| emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion |
| sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech |
| bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech |
| channel | string | No | stereo | - | - | mono, stereo | Number of audio channels |
| languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, Chinese,Yue, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects |
| englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency) |
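Because `englishNormalization` improves number reading at the cost of slightly higher latency, one option for a low-latency model is to enable it only when the text actually contains digits. The sketch below illustrates that heuristic. `turbo_speech_params` is a hypothetical helper, and the digit check is an assumption about when normalization is worth its cost, not documented behaviour.

```python
def turbo_speech_params(text, voice_id="Wise_Woman"):
    """Return parameters for model_minimax-speech-2-8-turbo (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")
    # Heuristic: only pay the extra latency when there are digits to read out.
    has_digits = any(ch in "0123456789" for ch in text)
    return {
        "text": text,
        "voiceId": voice_id,
        "englishNormalization": has_digits,
    }


with_numbers = turbo_speech_params("Your order 4821 ships in 3 days.")
plain = turbo_speech_params("Thanks for calling, how can I help?")
```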