MiniMax

This page is auto-generated from model configurations. Last updated: 2026-04-15.

This reference lists all available MiniMax audio generation models and their parameters. Use these parameter names when calling the Generation API.

Minimax Music 2.0
Minimax Music 2.5
Minimax Music 2.6
Minimax Music Cover
Minimax Speech 2.6 HD
Minimax Speech 2.6 Turbo
Minimax Speech 2.8 HD
Minimax Speech 2.8 Turbo

Minimax Music 2.0

Model ID: model_minimax-music-2-0

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-2-0/markdown

Parameter	Type	Required	Default	Min	Max	Allowed Values	Description
`prompt`	string	Yes	-	-	-	-	A description of the music, specifying style, mood, and scenario.
`lyrics`	string	Yes	-	-	-	-	Lyrics of the song. Use \n to separate lines. You may add structure tags such as [Intro], [Verse], [Chorus], [Bridge], and [Outro] to shape the arrangement.
`sampleRate`	number	No	`44100`	-	-	`8000`, `16000`, `22050`, `24000`, `32000`, `44100`	Sample rate for the generated music
`bitrate`	number	No	`256000`	-	-	`32000`, `64000`, `128000`, `256000`	Bitrate for the generated music
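
As a rough illustration of the `lyrics` format, the sketch below joins tagged sections into a single string, assuming the line separator is a newline character (`\n`). The helper name `build_lyrics` and the sample text are hypothetical; only the parameter names and values come from the table. How the resulting dictionary is sent to the Generation API is not covered here.

```python
def build_lyrics(sections):
    """Join (tag, lines) pairs into one lyrics string.

    Each section becomes "[Tag]" followed by its lines, with a newline
    character between every entry.
    """
    parts = []
    for tag, lines in sections:
        parts.append(f"[{tag}]")
        parts.extend(lines)
    return "\n".join(parts)


lyrics = build_lyrics([
    ("Verse", ["City lights are fading", "I am walking home"]),
    ("Chorus", ["Hold on to the night"]),
])

# Parameter names and allowed values are taken from the table above.
params = {
    "prompt": "Mellow indie pop, nostalgic, late-night drive",
    "lyrics": lyrics,
    "sampleRate": 44100,
    "bitrate": 256000,
}
```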

Minimax Music 2.5

Full-length songs with natural vocals and rich instrumentation from lyrics and an optional style prompt. Control arrangement with 14+ section tags (e.g. [Intro], [Verse], [Chorus]).

Model ID: model_minimax-music-2-5

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-2-5/markdown

Parameter	Type	Required	Default	Min	Max	Allowed Values	Description
`lyrics`	string	Yes	-	-	-	-	Lyrics for the song (1–3,500 characters). Use \n between lines and \n\n for pauses. Structure tags include [Intro], [Verse], [Pre Chorus], [Chorus], [Hook], [Drop], [Bridge], [Solo], [Build Up], [Inst], [Interlude], [Break], [Transition], [Outro], and more.
`prompt`	string	No	-	-	-	-	Optional description of genre, mood, tempo, vocal style, and instrumentation (up to 2,000 characters).
`sampleRate`	number	No	`44100`	-	-	`16000`, `24000`, `32000`, `44100`	Audio sample rate for the generated music
`bitrate`	number	No	`256000`	-	-	`32000`, `64000`, `128000`, `256000`	Bitrate for the generated music
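
The limits in this table can be checked on the client before a request is made. The following is a minimal sketch, not the service's own validation: `check_music_2_5` is a hypothetical helper, and it assumes the character limits match Python's `len()` count. Note that `22050` is accepted by Music 2.0 but is not listed for this model.

```python
ALLOWED_SAMPLE_RATES = {16000, 24000, 32000, 44100}
ALLOWED_BITRATES = {32000, 64000, 128000, 256000}


def check_music_2_5(params):
    """Return a list of problems found in a Music 2.5 parameter dict."""
    problems = []
    lyrics = params.get("lyrics", "")
    if not 1 <= len(lyrics) <= 3500:
        problems.append("lyrics must be 1-3,500 characters")
    if len(params.get("prompt", "")) > 2000:
        problems.append("prompt must be at most 2,000 characters")
    # Missing optional values fall back to the documented defaults.
    if params.get("sampleRate", 44100) not in ALLOWED_SAMPLE_RATES:
        problems.append("sampleRate is not an allowed value")
    if params.get("bitrate", 256000) not in ALLOWED_BITRATES:
        problems.append("bitrate is not an allowed value")
    return problems
```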

Minimax Music 2.6

MiniMax Music 2.6 creates full-length songs with vocals and rich orchestration from lyrics and a style prompt, or instrumental-only tracks from a prompt alone. It accepts BPM and key hints in the prompt, can auto-generate lyrics, and produces up to about six minutes of audio per run.

Model ID: model_minimax-music-2-6

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-2-6/markdown

Parameter	Type	Required	Default	Min	Max	Allowed Values	Description
`lyrics`	string	No	-	-	-	-	Lyrics for vocal tracks (up to 3,500 characters). Omit for instrumental mode, or leave empty and set `lyricsOptimizer` to true to generate lyrics from the style prompt. Use \n between lines and \n\n for pauses. Structure tags include [Intro], [Verse], [Pre Chorus], [Chorus], [Hook], [Drop], [Bridge], [Solo], [Build Up], [Inst], [Interlude], [Break], [Transition], [Outro], and more.
`prompt`	string	Yes	-	-	-	-	Genre, mood, tempo (e.g. BPM), key, vocal style, and instrumentation (up to 2,000 characters). For instrumental mode, this is the main instruction.
`isInstrumental`	boolean	No	`false`	-	-	-	When true, generates music without vocals. Use the style prompt only; lyrics are not required.
`lyricsOptimizer`	boolean	No	`false`	-	-	-	When true and lyrics are empty, the model generates lyrics from the style prompt.
`sampleRate`	number	No	`44100`	-	-	`16000`, `24000`, `32000`, `44100`	Audio sample rate for the generated music
`bitrate`	number	No	`256000`	-	-	`32000`, `64000`, `128000`, `256000`	Bitrate for the generated music
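
The table implies three ways to use this model: vocals from supplied lyrics, instrumental only, and vocals with auto-generated lyrics. The sketch below builds a parameter dictionary for each case. The function `music_2_6_params` and its arguments are hypothetical, and the choice to fall back to `lyricsOptimizer` when no lyrics are given is one possible policy rather than documented behavior.

```python
def music_2_6_params(prompt, lyrics=None, instrumental=False):
    """Build Music 2.6 parameters for vocal, instrumental, or auto-lyrics mode."""
    if len(prompt) > 2000:
        raise ValueError("prompt is limited to 2,000 characters")
    params = {"prompt": prompt}
    if instrumental:
        # Instrumental mode: the prompt is the main instruction, no lyrics.
        params["isInstrumental"] = True
    elif lyrics:
        if len(lyrics) > 3500:
            raise ValueError("lyrics are limited to 3,500 characters")
        params["lyrics"] = lyrics
    else:
        # No lyrics supplied: ask the model to write them from the prompt.
        params["lyricsOptimizer"] = True
    return params
```

For example, `music_2_6_params("Lo-fi, 80 BPM, A minor", instrumental=True)` yields a dictionary containing only `prompt` and `isInstrumental`.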

Minimax Music Cover

Reimagine an existing song in a new style while preserving the original melody: voice, instruments, genre, and arrangement can change from a target style prompt. Works best with clear vocals and melody.

Model ID: model_minimax-music-cover

Capabilities: audio2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-music-cover/markdown

Parameter	Type	Required	Default	Min	Max	Allowed Values	Description
`audioUrl`	file	Yes	-	-	-	-	Audio to cover (MP3 or WAV). Works best with music that has clear vocals and melody.
`prompt`	string	Yes	-	-	-	-	Target style for the cover: genre, vocal character, instruments, and production (up to 2,000 characters).
`sampleRate`	number	No	`44100`	-	-	`16000`, `24000`, `32000`, `44100`	Audio sample rate for the output
`bitrate`	number	No	`256000`	-	-	`32000`, `64000`, `128000`, `256000`	Bitrate for the output audio
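
A small pre-flight check for a cover request might look like the sketch below. It assumes the source audio is referenced by a URL string whose path ends in `.mp3` or `.wav`. The table lists the `audioUrl` type as `file`, so the exact way the audio is supplied may differ. `cover_params` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def cover_params(audio_url, prompt):
    """Build Music Cover parameters after basic format and length checks."""
    # Look only at the URL path so query strings do not hide the extension.
    suffix = PurePosixPath(urlparse(audio_url).path).suffix.lower()
    if suffix not in {".mp3", ".wav"}:
        raise ValueError("audio must be MP3 or WAV")
    if len(prompt) > 2000:
        raise ValueError("prompt is limited to 2,000 characters")
    return {"audioUrl": audio_url, "prompt": prompt}
```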

Minimax Speech 2.6 HD

MiniMax Speech 2.6 HD delivers studio-quality multilingual text-to-audio with nuanced prosody, subtitle export, and premium voices.

Model ID: model_minimax-speech-2-6-hd

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-6-hd/markdown

Parameter	Type	Required	Default	Min	Max	Allowed Values	Description
`text`	string	Yes	-	-	-	-	Text to convert to speech. Insert <#x#> between words to add a pause of x seconds (0.01–99.99).
`voiceId`	string	No	`Wise_Woman`	-	-	`Wise_Woman`, `Friendly_Person`, `Inspirational_girl`, `Deep_Voice_Man`, `Calm_Woman`, `Casual_Guy`, `Lively_Girl`, `Patient_Man`, `Young_Knight`, `Determined_Man`, `Lovely_Girl`, `Decent_Boy`, `Imposing_Manner`, `Elegant_Man`, `Abbess`, `Sweet_Girl_2`, `Exuberant_Girl`	Desired voice.
`speed`	number	No	`1`	0.5	2	-	Speech speed
`volume`	number	No	`1`	0	10	-	Speech volume
`pitch`	number	No	`0`	-12	12	-	Speech pitch
`emotion`	string	No	`auto`	-	-	`auto`, `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `calm`, `fluent`, `neutral`	Speech emotion
`sampleRate`	number	No	`32000`	-	-	`8000`, `16000`, `22050`, `24000`, `32000`, `44100`	Sample rate for the generated speech
`bitrate`	number	No	`128000`	-	-	`32000`, `64000`, `128000`, `256000`	Bitrate for the generated speech
`channel`	string	No	`stereo`	-	-	`mono`, `stereo`	Number of audio channels
`languageBoost`	string	No	`Automatic`	-	-	`None`, `Automatic`, `Chinese`, `Chinese,Yue`, `Cantonese`, `English`, `Arabic`, `Russian`, `Spanish`, `French`, `Portuguese`, `German`, `Turkish`, `Dutch`, `Ukrainian`, `Vietnamese`, `Indonesian`, `Japanese`, `Italian`, `Korean`, `Thai`, `Polish`, `Romanian`, `Greek`, `Czech`, `Finnish`, `Hindi`, `Bulgarian`, `Danish`, `Hebrew`, `Malay`, `Persian`, `Slovak`, `Swedish`, `Croatian`, `Filipino`, `Hungarian`, `Norwegian`, `Slovenian`, `Catalan`, `Nynorsk`, `Tamil`, `Afrikaans`	Enhance recognition of specific languages and dialects
`englishNormalization`	boolean	No	`false`	-	-	-	Enable English text normalization for better number reading (slightly increases latency)
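
The pause marker described for `text` can be generated rather than typed by hand. This sketch assumes `x` is written as a decimal number of seconds with two decimal places. The helper names `pause` and `join_with_pauses` are hypothetical, and the marker is placed directly between segments with no surrounding spaces.

```python
def pause(seconds):
    """Return a <#x#> pause marker for the given duration in seconds."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{seconds:.2f}#>"


def join_with_pauses(segments, seconds):
    """Join text segments with the same pause marker between each pair."""
    return pause(seconds).join(segments)


text = join_with_pauses(["Welcome back.", "Today we talk about audio."], 0.5)
```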

Minimax Speech 2.6 Turbo

Low-latency MiniMax Speech 2.6 Turbo brings multilingual, emotional text-to-speech.

Model ID: model_minimax-speech-2-6-turbo

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-6-turbo/markdown

Parameter	Type	Required	Default	Min	Max	Allowed Values	Description
`text`	string	Yes	-	-	-	-	Text to convert to speech. Insert <#x#> between words to add a pause of x seconds (0.01–99.99).
`voiceId`	string	No	`Wise_Woman`	-	-	`Wise_Woman`, `Friendly_Person`, `Inspirational_girl`, `Deep_Voice_Man`, `Calm_Woman`, `Casual_Guy`, `Lively_Girl`, `Patient_Man`, `Young_Knight`, `Determined_Man`, `Lovely_Girl`, `Decent_Boy`, `Imposing_Manner`, `Elegant_Man`, `Abbess`, `Sweet_Girl_2`, `Exuberant_Girl`	Desired voice.
`speed`	number	No	`1`	0.5	2	-	Speech speed
`volume`	number	No	`1`	0	10	-	Speech volume
`pitch`	number	No	`0`	-12	12	-	Speech pitch
`emotion`	string	No	`auto`	-	-	`auto`, `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `calm`, `fluent`, `neutral`	Speech emotion
`sampleRate`	number	No	`32000`	-	-	`8000`, `16000`, `22050`, `24000`, `32000`, `44100`	Sample rate for the generated speech
`bitrate`	number	No	`128000`	-	-	`32000`, `64000`, `128000`, `256000`	Bitrate for the generated speech
`channel`	string	No	`stereo`	-	-	`mono`, `stereo`	Number of audio channels
`languageBoost`	string	No	`Automatic`	-	-	`None`, `Automatic`, `Chinese`, `Chinese,Yue`, `Cantonese`, `English`, `Arabic`, `Russian`, `Spanish`, `French`, `Portuguese`, `German`, `Turkish`, `Dutch`, `Ukrainian`, `Vietnamese`, `Indonesian`, `Japanese`, `Italian`, `Korean`, `Thai`, `Polish`, `Romanian`, `Greek`, `Czech`, `Finnish`, `Hindi`, `Bulgarian`, `Danish`, `Hebrew`, `Malay`, `Persian`, `Slovak`, `Swedish`, `Croatian`, `Filipino`, `Hungarian`, `Norwegian`, `Slovenian`, `Catalan`, `Nynorsk`, `Tamil`, `Afrikaans`	Enhance recognition of specific languages and dialects
`englishNormalization`	boolean	No	`false`	-	-	-	Enable English text normalization for better number reading (slightly increases latency)
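
The Min and Max columns for `speed`, `volume`, and `pitch` lend themselves to a simple clamp before sending a request. The sketch below is one way to do that. `clamp_voice_settings` is a hypothetical helper, and it is an assumption that silently clamping is preferable to rejecting out-of-range input.

```python
# (min, max) pairs copied from the table above.
RANGES = {"speed": (0.5, 2), "volume": (0, 10), "pitch": (-12, 12)}


def clamp_voice_settings(settings):
    """Return a copy of settings with numeric values forced into range."""
    clamped = dict(settings)
    for name, (low, high) in RANGES.items():
        if name in clamped:
            clamped[name] = max(low, min(high, clamped[name]))
    return clamped
```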

Minimax Speech 2.8 HD

MiniMax Speech 2.8 HD: studio-grade text-to-speech with 32+ languages, expressive emotion control, preset voices, and interjections like (laughs) or (applause). Suited for final production and broadcast-quality voiceovers.

Model ID: model_minimax-speech-2-8-hd

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-8-hd/markdown

Parameter	Type	Required	Default	Min	Max	Allowed Values	Description
`text`	string	Yes	-	-	-	-	Text to convert to speech. Insert <#x#> between segments to add a pause of x seconds (0.01–99.99). Interjections such as (laughs), (sighs), or (applause) can be added for more natural delivery.
`voiceId`	string	No	`Wise_Woman`	-	-	`Wise_Woman`, `Friendly_Person`, `Inspirational_girl`, `Deep_Voice_Man`, `Calm_Woman`, `Casual_Guy`, `Lively_Girl`, `Patient_Man`, `Young_Knight`, `Determined_Man`, `Lovely_Girl`, `Decent_Boy`, `Imposing_Manner`, `Elegant_Man`, `Abbess`, `Sweet_Girl_2`, `Exuberant_Girl`	Desired voice.
`speed`	number	No	`1`	0.5	2	-	Speech speed
`volume`	number	No	`1`	0	10	-	Speech volume
`pitch`	number	No	`0`	-12	12	-	Speech pitch
`emotion`	string	No	`auto`	-	-	`auto`, `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `calm`, `fluent`, `neutral`	Speech emotion
`sampleRate`	number	No	`32000`	-	-	`8000`, `16000`, `22050`, `24000`, `32000`, `44100`	Sample rate for the generated speech
`bitrate`	number	No	`128000`	-	-	`32000`, `64000`, `128000`, `256000`	Bitrate for the generated speech
`channel`	string	No	`stereo`	-	-	`mono`, `stereo`	Number of audio channels
`languageBoost`	string	No	`Automatic`	-	-	`None`, `Automatic`, `Chinese`, `Chinese,Yue`, `Cantonese`, `English`, `Arabic`, `Russian`, `Spanish`, `French`, `Portuguese`, `German`, `Turkish`, `Dutch`, `Ukrainian`, `Vietnamese`, `Indonesian`, `Japanese`, `Italian`, `Korean`, `Thai`, `Polish`, `Romanian`, `Greek`, `Czech`, `Finnish`, `Hindi`, `Bulgarian`, `Danish`, `Hebrew`, `Malay`, `Persian`, `Slovak`, `Swedish`, `Croatian`, `Filipino`, `Hungarian`, `Norwegian`, `Slovenian`, `Catalan`, `Nynorsk`, `Tamil`, `Afrikaans`	Enhance recognition of specific languages and dialects
`englishNormalization`	boolean	No	`false`	-	-	-	Enable English text normalization for better number reading (slightly increases latency)
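
To keep the defaults from this table in one place, the parameters can be modeled as a small data class. This is a sketch under stated assumptions: the class name `SpeechParams` is hypothetical, only `emotion` and `channel` are validated here, and the sample text simply combines an interjection and a pause marker of the kind described for `text`.

```python
from dataclasses import asdict, dataclass

EMOTIONS = {"auto", "happy", "sad", "angry", "fearful", "disgusted",
            "surprised", "calm", "fluent", "neutral"}
CHANNELS = {"mono", "stereo"}


@dataclass
class SpeechParams:
    """Speech parameters with the defaults listed in the table."""
    text: str
    voiceId: str = "Wise_Woman"
    speed: float = 1
    volume: float = 1
    pitch: float = 0
    emotion: str = "auto"
    sampleRate: int = 32000
    bitrate: int = 128000
    channel: str = "stereo"
    languageBoost: str = "Automatic"
    englishNormalization: bool = False

    def __post_init__(self):
        # Reject values outside the documented allowed sets.
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion}")
        if self.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {self.channel}")


params = asdict(SpeechParams(
    text="Well (sighs) that was close.<#0.80#>Let's try again.",
    emotion="calm",
    channel="mono",
))
```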

Minimax Speech 2.8 Turbo

MiniMax Speech 2.8 Turbo: fast, natural text-to-speech with 40+ languages, voice presets, emotion control, and optional interjections like (laughs) or (sighs). Optimized for low-latency and real-time use.

Model ID: model_minimax-speech-2-8-turbo

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_minimax-speech-2-8-turbo/markdown

Parameter	Type	Required	Default	Min	Max	Allowed Values	Description
`text`	string	Yes	-	-	-	-	Text to convert to speech. Insert <#x#> between segments to add a pause of x seconds (0.01–99.99). Interjections such as (laughs), (sighs), or (breath) can be added for more natural delivery.
`voiceId`	string	No	`Wise_Woman`	-	-	`Wise_Woman`, `Friendly_Person`, `Inspirational_girl`, `Deep_Voice_Man`, `Calm_Woman`, `Casual_Guy`, `Lively_Girl`, `Patient_Man`, `Young_Knight`, `Determined_Man`, `Lovely_Girl`, `Decent_Boy`, `Imposing_Manner`, `Elegant_Man`, `Abbess`, `Sweet_Girl_2`, `Exuberant_Girl`	Desired voice.
`speed`	number	No	`1`	0.5	2	-	Speech speed
`volume`	number	No	`1`	0	10	-	Speech volume
`pitch`	number	No	`0`	-12	12	-	Speech pitch
`emotion`	string	No	`auto`	-	-	`auto`, `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `calm`, `fluent`, `neutral`	Speech emotion
`sampleRate`	number	No	`32000`	-	-	`8000`, `16000`, `22050`, `24000`, `32000`, `44100`	Sample rate for the generated speech
`bitrate`	number	No	`128000`	-	-	`32000`, `64000`, `128000`, `256000`	Bitrate for the generated speech
`channel`	string	No	`stereo`	-	-	`mono`, `stereo`	Number of audio channels
`languageBoost`	string	No	`Automatic`	-	-	`None`, `Automatic`, `Chinese`, `Chinese,Yue`, `Cantonese`, `English`, `Arabic`, `Russian`, `Spanish`, `French`, `Portuguese`, `German`, `Turkish`, `Dutch`, `Ukrainian`, `Vietnamese`, `Indonesian`, `Japanese`, `Italian`, `Korean`, `Thai`, `Polish`, `Romanian`, `Greek`, `Czech`, `Finnish`, `Hindi`, `Bulgarian`, `Danish`, `Hebrew`, `Malay`, `Persian`, `Slovak`, `Swedish`, `Croatian`, `Filipino`, `Hungarian`, `Norwegian`, `Slovenian`, `Catalan`, `Nynorsk`, `Tamil`, `Afrikaans`	Enhance recognition of specific languages and dialects
`englishNormalization`	boolean	No	`false`	-	-	-	Enable English text normalization for better number reading (slightly increases latency)
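
Since the four speech models share the same parameter names, switching between them mostly comes down to the model ID. The lookup below is a sketch of one way to pick an ID. The function `speech_model_id` and its `low_latency` flag are hypothetical, and mapping "low latency" to the Turbo variants follows the model descriptions on this page rather than any documented selection rule.

```python
# Model IDs as listed on this page, keyed by (version, tier).
SPEECH_MODELS = {
    ("2.6", "hd"): "model_minimax-speech-2-6-hd",
    ("2.6", "turbo"): "model_minimax-speech-2-6-turbo",
    ("2.8", "hd"): "model_minimax-speech-2-8-hd",
    ("2.8", "turbo"): "model_minimax-speech-2-8-turbo",
}


def speech_model_id(version, low_latency=False):
    """Pick the Turbo model for low-latency use, otherwise the HD model."""
    tier = "turbo" if low_latency else "hd"
    try:
        return SPEECH_MODELS[(version, tier)]
    except KeyError:
        raise ValueError(f"no speech model for version {version!r}") from None
```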