Audio Generation

ByteDance

This page is auto-generated from model configurations. Last updated: 2026-07-01.

This reference lists all available ByteDance audio generation models and their parameters. Use these parameter names when calling the Generation API.


BytePlus Seed Audio: text-to-speech that can take audio or image references, with fine-grained controls over speed, volume, and pitch.

Model ID: model_byteplus-seed-audio-1-0

Capabilities: txt2audio

LLM Markdown: https://app.scenario.com/api/models/model_byteplus-seed-audio-1-0/markdown

| Parameter | Type | Required | Default | Min | Max | Allowed values | Description |
|---|---|---|---|---|---|---|---|
| textPrompt | string | Yes | - | - | - | - | The text to be spoken aloud, up to 2048 characters. When using audio references, refer to each one in the text as @Audio1, @Audio2, or @Audio3. |
| audioReferences | file_array | No | - | - | - | - | Up to 3 audio samples to base the voice on. Refer to each one in textPrompt as @Audio1, @Audio2, or @Audio3. Can't be combined with imageReference. Max duration: 30 seconds; max size: 10 MB. |
| imageReference | file | No | - | - | - | - | A reference image (JPEG, PNG, or WebP) to derive the voice from. Can't be combined with audioReferences. Max size: 10 MB. |
| sampleRate | number | No | 24000 | - | - | 8000, 16000, 24000, 32000, 44100, 48000 | Audio sample rate in Hz (samples per second). Higher values sound clearer; 24000 (24 kHz) is a good default for speech. |
| speechRate | number | No | 0 | -50 | 100 | - | Speaking speed. 0 is normal speed; higher is faster (100 is double speed), lower is slower (-50 is half speed). |
| loudnessRate | number | No | 0 | -50 | 100 | - | Volume. 0 is normal; higher is louder (100 is double volume), lower is quieter (-50 is half volume). |
| pitchRate | number | No | 0 | -12 | 12 | - | Pitch shift in semitones (-12 to 12). Positive values raise the pitch; negative values lower it. |
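The constraints in the table above can be checked on the client before a request is sent. The following is a minimal Python sketch of such a pre-flight check, assuming parameters are held in a plain dict keyed by the parameter names above. validate_params, rate_to_multiplier, and semitones_to_ratio are hypothetical helpers for illustration, not part of the Scenario API. The rate helper assumes the speechRate and loudnessRate scales are linear between the documented points (-50 is half, 0 is normal, 100 is double); the table does not state the curve.

```python
import re

# Constraints transcribed from the parameter table.
MAX_PROMPT_CHARS = 2048
MAX_AUDIO_REFERENCES = 3
ALLOWED_SAMPLE_RATES = {8000, 16000, 24000, 32000, 44100, 48000}
RANGES = {
    "speechRate": (-50, 100),
    "loudnessRate": (-50, 100),
    "pitchRate": (-12, 12),
}


def validate_params(params: dict) -> list[str]:
    """Hypothetical helper: return the problems found (empty list if none)."""
    errors = []

    prompt = params.get("textPrompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("textPrompt is required")
        prompt = ""
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"textPrompt exceeds {MAX_PROMPT_CHARS} characters")

    audio = params.get("audioReferences") or []
    if len(audio) > MAX_AUDIO_REFERENCES:
        errors.append(f"at most {MAX_AUDIO_REFERENCES} audioReferences allowed")
    if audio and params.get("imageReference"):
        errors.append("audioReferences and imageReference are mutually exclusive")

    # Each supplied audio reference should be mentioned in the text as @AudioN.
    for i in range(1, min(len(audio), MAX_AUDIO_REFERENCES) + 1):
        if not re.search(rf"@Audio{i}(?!\d)", prompt):
            errors.append(f"textPrompt does not mention @Audio{i}")

    # Omitted values fall back to the documented defaults.
    rate = params.get("sampleRate", 24000)
    if rate not in ALLOWED_SAMPLE_RATES:
        errors.append(f"sampleRate {rate} is not one of {sorted(ALLOWED_SAMPLE_RATES)}")

    for name, (lo, hi) in RANGES.items():
        value = params.get(name, 0)
        if not lo <= value <= hi:
            errors.append(f"{name} must be between {lo} and {hi}")

    return errors


def rate_to_multiplier(rate: float) -> float:
    """Approximate speed/volume multiplier for speechRate or loudnessRate.

    ASSUMPTION: linear between the documented points
    (-50 -> 0.5x, 0 -> 1x, 100 -> 2x).
    """
    return 1 + rate / 100


def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitchRate shift; each semitone scales by 2**(1/12)."""
    return 2 ** (semitones / 12)
```

An empty list means none of the documented constraints was violated. The service may still reject a request for reasons this sketch does not check, such as file format, audio duration, or file size.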