Bytedance

GENERATION API CALLS

Audio Generation

This page is auto-generated from model configurations. Last updated: 2026-07-01.

This reference lists all available Bytedance audio generation models and their parameters. Use these parameter names when calling the Generation API.

Seed Audio 1.0

BytePlus Seed Audio text-to-speech with audio or image references and fine-grained speech controls.

Model ID: model_byteplus-seed-audio-1-0

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_byteplus-seed-audio-1-0/markdown

Parameter	Type	Required	Default	Min	Max	Allowed Values	Description
`textPrompt`	string	Yes	-	-	-	-	The text you want spoken aloud, up to 2048 characters. When using audio references, point to each one in your text with @Audio1, @Audio2, or @Audio3.
`audioReferences`	file_array	No	-	-	-	-	Up to 3 audio samples to base the voice on. Refer to each one in your text as @Audio1, @Audio2, or @Audio3. Can’t be combined with an image reference. Max duration: 30 seconds; max size: 10 MB.
`imageReference`	file	No	-	-	-	-	A reference image to derive the voice from (JPEG, PNG, or WebP, up to 10 MB). Can’t be combined with audio references.
`sampleRate`	number	No	`24000`	-	-	`8000`, `16000`, `24000`, `32000`, `44100`, `48000`	The audio quality, in samples per second. Higher values sound clearer; 24 kHz is a good default for speech.
`speechRate`	number	No	`0`	-50	100	-	How fast the voice speaks. 0 is normal speed; higher is faster (100 is double speed), lower is slower (-50 is half speed).
`loudnessRate`	number	No	`0`	-50	100	-	How loud the voice is. 0 is normal; higher is louder (100 is double volume), lower is quieter (-50 is half volume).
`pitchRate`	number	No	`0`	-12	12	-	How high or low the voice sounds, in semitones (-12 to 12). Positive values raise the pitch; negative values lower it.
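
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch in Python, not an official client: the helper name `build_seed_audio_parameters` is hypothetical, the file values are placeholder strings because this page does not say how files are passed, and the check that every @AudioN tag matches a supplied sample is an assumption based on the parameter descriptions. Only the parameter names, limits, and defaults come from the table.

```python
import re

# Limits copied from the parameter table above.
MAX_PROMPT_CHARS = 2048
MAX_AUDIO_REFERENCES = 3
ALLOWED_SAMPLE_RATES = (8000, 16000, 24000, 32000, 44100, 48000)
RATE_RANGES = {
    "speechRate": (-50, 100),
    "loudnessRate": (-50, 100),
    "pitchRate": (-12, 12),
}


def build_seed_audio_parameters(text_prompt, audio_references=None,
                                image_reference=None, sample_rate=24000,
                                speech_rate=0, loudness_rate=0, pitch_rate=0):
    """Hypothetical helper: validate inputs and return a parameter dict."""
    audio_references = list(audio_references or [])

    if not text_prompt:
        raise ValueError("textPrompt is required")
    if len(text_prompt) > MAX_PROMPT_CHARS:
        raise ValueError("textPrompt exceeds 2048 characters")
    if len(audio_references) > MAX_AUDIO_REFERENCES:
        raise ValueError("audioReferences accepts at most 3 samples")
    if audio_references and image_reference is not None:
        raise ValueError("audioReferences and imageReference are mutually exclusive")

    # Assumption: each @AudioN tag should point at a supplied sample.
    for index in re.findall(r"@Audio(\d+)", text_prompt):
        if not 1 <= int(index) <= len(audio_references):
            raise ValueError(f"@Audio{index} has no matching audio reference")

    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"sampleRate must be one of {ALLOWED_SAMPLE_RATES}")

    params = {
        "textPrompt": text_prompt,
        "sampleRate": sample_rate,
        "speechRate": speech_rate,
        "loudnessRate": loudness_rate,
        "pitchRate": pitch_rate,
    }
    for name, (low, high) in RATE_RANGES.items():
        if not low <= params[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    # Optional file parameters are included only when provided.
    if audio_references:
        params["audioReferences"] = audio_references
    if image_reference is not None:
        params["imageReference"] = image_reference
    return params


# Placeholder file value; the real format is not specified on this page.
params = build_seed_audio_parameters(
    "Read this sentence in the voice of @Audio1.",
    audio_references=["<audio-file-reference>"],
    speech_rate=20,
)
```

The resulting dictionary uses the parameter names from the table and would accompany the model ID `model_byteplus-seed-audio-1-0` in a Generation API call. The request endpoint, authentication, and file upload format are not covered here.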
