MiniMax

This page is auto-generated from model configurations. Last updated: 2026-04-15.

This reference lists all available MiniMax audio generation models and their parameters. Use these parameter names when calling the Generation API.


Minimax Music 2.0

Model ID: model_minimax-music-2-0

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-2-0/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| prompt | string | Yes | - | - | - | - | A description of the music, specifying style, mood, and scenario. |
| lyrics | string | Yes | - | - | - | - | Lyrics of the song. Use \n to separate lines. You may add structure tags like [Intro], [Verse], [Chorus], [Bridge], [Outro] to enhance the arrangement. |
| sampleRate | number | No | 44100 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated music |
| bitrate | number | No | 256000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated music |

Minimax Music 2.5

Full-length songs with natural vocals and rich instrumentation from lyrics and an optional style prompt. Control arrangement with 14+ section tags (e.g. [Intro], [Verse], [Chorus]).

Model ID: model_minimax-music-2-5

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-2-5/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| lyrics | string | Yes | - | - | - | - | Lyrics for the song (1–3,500 characters). Use \n between lines and \n\n for pauses. Structure tags include [Intro], [Verse], [Pre Chorus], [Chorus], [Hook], [Drop], [Bridge], [Solo], [Build Up], [Inst], [Interlude], [Break], [Transition], [Outro], and more. |
| prompt | string | No | - | - | - | - | Optional description of genre, mood, tempo, vocal style, and instrumentation (up to 2,000 characters). |
| sampleRate | number | No | 44100 | - | - | 16000, 24000, 32000, 44100 | Audio sample rate for the generated music |
| bitrate | number | No | 256000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated music |
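
The lyrics conventions described for this model (\n between lines, \n\n for pauses, bracketed structure tags, 1–3,500 characters) can be assembled programmatically. The following is a minimal sketch, not an official helper: the function name `format_lyrics` is hypothetical, and placing each tag on its own line and separating sections with \n\n are assumptions about layout rather than documented requirements.

```python
MAX_LYRICS_CHARS = 3500  # upper limit stated in the lyrics description


def format_lyrics(sections):
    # sections: iterable of (tag, lines) pairs, e.g. ("Verse", ["line 1", "line 2"]).
    # Assumption: each tag sits on its own line, directly above its lyrics.
    blocks = []
    for tag, lines in sections:
        blocks.append("\n".join([f"[{tag}]", *lines]))
    # A blank line ("\n\n") between sections is what the reference calls a pause.
    lyrics = "\n\n".join(blocks)
    if not 1 <= len(lyrics) <= MAX_LYRICS_CHARS:
        raise ValueError(
            f"lyrics must be 1-{MAX_LYRICS_CHARS} characters, got {len(lyrics)}"
        )
    return lyrics


lyrics = format_lyrics([
    ("Verse", ["City lights are fading", "I keep walking home"]),
    ("Chorus", ["Hold on to the morning"]),
])
```

The resulting string can be passed as the `lyrics` parameter. Whether structure tags count toward the character limit is not stated in this reference; the sketch counts them to stay on the safe side.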

Minimax Music 2.6

MiniMax Music 2.6 creates full-length songs with vocals and rich orchestration from lyrics and a style prompt, or instrumental-only tracks from a prompt alone. It supports BPM and key hints in the prompt, optional auto-generated lyrics, and up to about six minutes of audio per run.

Model ID: model_minimax-music-2-6

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-2-6/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| lyrics | string | No | - | - | - | - | Lyrics for vocal tracks (up to 3,500 characters). Omit for instrumental mode, or leave empty and enable 'Auto lyrics' to generate lyrics from the style prompt. Use \n between lines and \n\n for pauses. Structure tags include [Intro], [Verse], [Pre Chorus], [Chorus], [Hook], [Drop], [Bridge], [Solo], [Build Up], [Inst], [Interlude], [Break], [Transition], [Outro], and more. |
| prompt | string | Yes | - | - | - | - | Genre, mood, tempo (e.g. BPM), key, vocal style, and instrumentation (up to 2,000 characters). For instrumental mode, this is the main instruction. |
| isInstrumental | boolean | No | false | - | - | - | When true, generates music without vocals. Use the style prompt only; lyrics are not required. |
| lyricsOptimizer | boolean | No | false | - | - | - | When true and lyrics are empty, the model generates lyrics from the style prompt. |
| sampleRate | number | No | 44100 | - | - | 16000, 24000, 32000, 44100 | Audio sample rate for the generated music |
| bitrate | number | No | 256000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated music |
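
The parameters for this model imply three ways to run it: with your own lyrics, instrumental-only (`isInstrumental`), or with lyrics generated from the prompt (`lyricsOptimizer`). The sketch below builds a parameter dictionary for each mode. The function name `build_music_2_6_params` is hypothetical, and two behaviors are assumptions not stated in this reference: instrumental mode takes precedence when lyrics are also supplied, and a request with none of the three options is rejected. How the dictionary is sent to the Generation API is out of scope here.

```python
def build_music_2_6_params(prompt, lyrics=None, instrumental=False, auto_lyrics=False):
    # prompt is the only required parameter (up to 2,000 characters).
    if not prompt or len(prompt) > 2000:
        raise ValueError("prompt is required and limited to 2,000 characters")
    params = {"prompt": prompt}

    if instrumental:
        # Assumption: instrumental mode wins, and any lyrics are left out.
        params["isInstrumental"] = True
    elif lyrics:
        if len(lyrics) > 3500:
            raise ValueError("lyrics are limited to 3,500 characters")
        params["lyrics"] = lyrics
    elif auto_lyrics:
        # With empty lyrics, the model writes lyrics from the style prompt.
        params["lyricsOptimizer"] = True
    else:
        # Assumption: this sketch refuses to guess which mode was intended.
        raise ValueError("provide lyrics, or set instrumental or auto_lyrics")
    return params


instrumental = build_music_2_6_params("lo-fi hip hop, 80 BPM, A minor", instrumental=True)
generated = build_music_2_6_params("upbeat synth-pop, female vocals", auto_lyrics=True)
```

Optional `sampleRate` and `bitrate` values can be added to the returned dictionary as long as they come from the allowed values listed for this model.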

Minimax Music Cover

Reimagine an existing song in a new style while preserving the original melody. Voice, instruments, genre, and arrangement can all change to match a target style prompt. Works best with source audio that has clear vocals and melody.

Model ID: model_minimax-music-cover

Capabilities: audio2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-cover/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| audioUrl | file | Yes | - | - | - | - | Audio to cover (MP3 or WAV). Works best with music that has clear vocals and melody. |
| prompt | string | Yes | - | - | - | - | Target style for the cover: genre, vocal character, instruments, and production (up to 2,000 characters). |
| sampleRate | number | No | 44100 | - | - | 16000, 24000, 32000, 44100 | Audio sample rate for the output |
| bitrate | number | No | 256000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the output audio |

Minimax Speech 2.6 HD

MiniMax Speech 2.6 HD delivers studio-quality multilingual text-to-speech with nuanced prosody, subtitle export, and premium voices.

Model ID: model_minimax-speech-2-6-hd

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-6-hd/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| text | string | Yes | - | - | - | - | Text to convert to speech. Use <#x#> between words to control pause duration (0.01-99.99s). |
| voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
| speed | number | No | 1 | 0.5 | 2 | - | Speech speed |
| volume | number | No | 1 | 0 | 10 | - | Speech volume |
| pitch | number | No | 0 | -12 | 12 | - | Speech pitch |
| emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion |
| sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech |
| bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech |
| channel | string | No | stereo | - | - | mono, stereo | Number of audio channels |
| languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, Chinese,Yue, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects |
| englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency) |
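
The `text` parameter accepts pause markers of the form <#x#>, where x is a duration between 0.01 and 99.99 seconds. A small helper can keep those markers within range. This is an illustrative sketch: the function names are hypothetical, formatting the duration with two decimals is a choice made here, and inserting the marker without surrounding spaces is an assumption about what the model accepts.

```python
def pause_marker(seconds):
    # The documented range for x in <#x#> is 0.01 to 99.99 seconds.
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause duration must be between 0.01 and 99.99 seconds")
    # Two decimals is this sketch's formatting choice, not a documented rule.
    return f"<#{seconds:.2f}#>"


def join_with_pauses(segments, seconds):
    # Put the same pause between every pair of adjacent segments.
    return pause_marker(seconds).join(segments)


text = join_with_pauses(["Welcome back.", "Let's get started."], 0.5)
# text == "Welcome back.<#0.50#>Let's get started."
```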

Minimax Speech 2.6 Turbo

MiniMax Speech 2.6 Turbo provides low-latency, multilingual text-to-speech with emotion control.

Model ID: model_minimax-speech-2-6-turbo

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-6-turbo/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| text | string | Yes | - | - | - | - | Text to convert to speech. Use <#x#> between words to control pause duration (0.01-99.99s). |
| voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
| speed | number | No | 1 | 0.5 | 2 | - | Speech speed |
| volume | number | No | 1 | 0 | 10 | - | Speech volume |
| pitch | number | No | 0 | -12 | 12 | - | Speech pitch |
| emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion |
| sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech |
| bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech |
| channel | string | No | stereo | - | - | mono, stereo | Number of audio channels |
| languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, Chinese,Yue, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects |
| englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency) |

Minimax Speech 2.8 HD

MiniMax Speech 2.8 HD: studio-grade text-to-speech with 32+ languages, expressive emotion control, preset voices, and interjections like (laughs) or (applause). Suited for final production and broadcast-quality voiceovers.

Model ID: model_minimax-speech-2-8-hd

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-8-hd/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| text | string | Yes | - | - | - | - | Text to convert to speech. Use <#x#> between segments to control pause duration (0.01–99.99s). Optional interjections such as (laughs), (sighs), or (applause) for more natural delivery. |
| voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
| speed | number | No | 1 | 0.5 | 2 | - | Speech speed |
| volume | number | No | 1 | 0 | 10 | - | Speech volume |
| pitch | number | No | 0 | -12 | 12 | - | Speech pitch |
| emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion |
| sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech |
| bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech |
| channel | string | No | stereo | - | - | mono, stereo | Number of audio channels |
| languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, Chinese,Yue, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects |
| englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency) |

Minimax Speech 2.8 Turbo

MiniMax Speech 2.8 Turbo: fast, natural text-to-speech with 40+ languages, voice presets, emotion control, and optional interjections like (laughs) or (sighs). Optimized for low-latency and real-time use.

Model ID: model_minimax-speech-2-8-turbo

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-8-turbo/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
| --- | --- | --- | --- | --- | --- | --- | --- |
| text | string | Yes | - | - | - | - | Text to convert to speech. Use <#x#> between segments to control pause duration (0.01–99.99s). Optional interjections such as (laughs), (sighs), or (breath) for more natural delivery. |
| voiceId | string | No | Wise_Woman | - | - | Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl | Desired voice. |
| speed | number | No | 1 | 0.5 | 2 | - | Speech speed |
| volume | number | No | 1 | 0 | 10 | - | Speech volume |
| pitch | number | No | 0 | -12 | 12 | - | Speech pitch |
| emotion | string | No | auto | - | - | auto, happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral | Speech emotion |
| sampleRate | number | No | 32000 | - | - | 8000, 16000, 22050, 24000, 32000, 44100 | Sample rate for the generated speech |
| bitrate | number | No | 128000 | - | - | 32000, 64000, 128000, 256000 | Bitrate for the generated speech |
| channel | string | No | stereo | - | - | mono, stereo | Number of audio channels |
| languageBoost | string | No | Automatic | - | - | None, Automatic, Chinese, Chinese,Yue, Cantonese, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans | Enhance recognition of specific languages and dialects |
| englishNormalization | boolean | No | false | - | - | - | Enable English text normalization for better number reading (slightly increases latency) |
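
The four speech models share the same numeric ranges (speed 0.5 to 2, volume 0 to 10, pitch -12 to 12) and the same allowed values for emotion, sampleRate, bitrate, and channel. Checking a parameter dictionary against them before sending a request can catch mistakes early. The following is a rough sketch under stated assumptions: the function name `validate_speech_params` is hypothetical, range bounds are treated as inclusive, and `voiceId` and `languageBoost` are left unchecked to keep the example short.

```python
# Ranges and allowed values copied from the speech parameter tables.
NUMERIC_RANGES = {"speed": (0.5, 2), "volume": (0, 10), "pitch": (-12, 12)}
ALLOWED_VALUES = {
    "emotion": {"auto", "happy", "sad", "angry", "fearful", "disgusted",
                "surprised", "calm", "fluent", "neutral"},
    "sampleRate": {8000, 16000, 22050, 24000, 32000, 44100},
    "bitrate": {32000, 64000, 128000, 256000},
    "channel": {"mono", "stereo"},
}


def validate_speech_params(params):
    # Returns a list of problems; an empty list means no issue was found.
    errors = []
    if not params.get("text"):
        errors.append("text is required")
    for name, (low, high) in NUMERIC_RANGES.items():
        # Assumption: Min and Max are inclusive bounds.
        if name in params and not low <= params[name] <= high:
            errors.append(f"{name} must be between {low} and {high}")
    for name, allowed in ALLOWED_VALUES.items():
        if name in params and params[name] not in allowed:
            errors.append(f"{name} has unsupported value {params[name]!r}")
    return errors


ok = validate_speech_params({"text": "Hello there.", "speed": 1.2, "pitch": -3})
bad = validate_speech_params({"text": "Hello there.", "speed": 3, "channel": "5.1"})
# ok is an empty list; bad reports the speed and channel problems.
```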