Bytedance
This page is auto-generated from model configurations. Last updated: 2026-07-01.
This reference lists all available Bytedance audio generation models and their parameters. Use these parameter names when calling the Generation API.
Seed Audio 1.0
BytePlus Seed Audio text-to-speech with audio or image references and fine-grained speech controls.
Model ID: model_byteplus-seed-audio-1-0
Capabilities: txt2audio
LLM Markdown: https://app.scenario.com/api/models/model_byteplus-seed-audio-1-0/markdown
| Parameter | Type | Required | Default | Min | Max | Allowed Values | Description |
|---|---|---|---|---|---|---|---|
| textPrompt | string | Yes | - | - | - | - | The text to be spoken aloud, up to 2048 characters. When using audio references, refer to each one in the text as @Audio1, @Audio2, or @Audio3. |
| audioReferences | file_array | No | - | - | - | - | Up to 3 audio samples to base the voice on. Refer to each one in textPrompt as @Audio1, @Audio2, or @Audio3. Can't be combined with imageReference. Max duration: 30 seconds; max size: 10 MB. |
| imageReference | file | No | - | - | - | - | A reference image to derive the voice from (JPEG, PNG, or WebP). Can't be combined with audioReferences. Max size: 10 MB. |
| sampleRate | number | No | 24000 | - | - | 8000, 16000, 24000, 32000, 44100, 48000 | The output sample rate, in Hz (samples per second). Higher values sound clearer; 24000 (24 kHz) is a good default for speech. |
| speechRate | number | No | 0 | -50 | 100 | - | How fast the voice speaks. 0 is normal speed; higher values are faster (100 is double speed) and lower values are slower (-50 is half speed). |
| loudnessRate | number | No | 0 | -50 | 100 | - | How loud the voice is. 0 is normal volume; higher values are louder (100 is double volume) and lower values are quieter (-50 is half volume). |
| pitchRate | number | No | 0 | -12 | 12 | - | Pitch shift in semitones (-12 to 12; 12 semitones is one octave). Positive values raise the pitch; negative values lower it. |
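The limits in the table can be checked client-side before a request is sent. The following is a minimal Python sketch, not part of any official SDK: the function names `validate_seed_audio_params`, `speech_rate_for_multiplier`, and `pitch_ratio` are hypothetical, it only validates a plain dictionary keyed by the parameter names above and does not call the Generation API, and the audio reference value is a placeholder because the expected `file_array` format is not specified on this page. `speech_rate_for_multiplier` assumes speed scales linearly through the documented points (-50 for half speed, 0 for normal, 100 for double speed); that linearity is an assumption, not documented behavior.

```python
import re

# Limits copied from the Seed Audio 1.0 parameter table.
MAX_PROMPT_CHARS = 2048
MAX_AUDIO_REFERENCES = 3
ALLOWED_SAMPLE_RATES = {8000, 16000, 24000, 32000, 44100, 48000}
NUMERIC_RANGES = {
    "speechRate": (-50, 100),
    "loudnessRate": (-50, 100),
    "pitchRate": (-12, 12),  # semitones
}


def validate_seed_audio_params(params: dict) -> list[str]:
    """Check a parameter dict against the documented limits; return a list of problems."""
    errors = []
    prompt = params.get("textPrompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("textPrompt is required")
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"textPrompt exceeds {MAX_PROMPT_CHARS} characters")

    audio_refs = params.get("audioReferences") or []
    if len(audio_refs) > MAX_AUDIO_REFERENCES:
        errors.append(f"audioReferences allows at most {MAX_AUDIO_REFERENCES} samples")
    if audio_refs and params.get("imageReference"):
        errors.append("audioReferences and imageReference cannot be combined")

    # Every @AudioN tag must point at a supplied sample, and (as the table asks)
    # every supplied sample should be referenced in the prompt.
    if isinstance(prompt, str):
        tagged = {int(n) for n in re.findall(r"@Audio(\d+)", prompt)}
        for n in sorted(tagged):
            if not 1 <= n <= len(audio_refs):
                errors.append(f"@Audio{n} has no matching audio reference")
        for n in range(1, len(audio_refs) + 1):
            if n not in tagged:
                errors.append(f"audio reference {n} is never referenced as @Audio{n}")

    sample_rate = params.get("sampleRate", 24000)
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        errors.append(f"sampleRate must be one of {sorted(ALLOWED_SAMPLE_RATES)}")

    # Optional numeric controls default to 0 when omitted.
    for name, (lo, hi) in NUMERIC_RANGES.items():
        value = params.get(name, 0)
        if not isinstance(value, (int, float)) or not lo <= value <= hi:
            errors.append(f"{name} must be a number between {lo} and {hi}")
    return errors


def speech_rate_for_multiplier(multiplier: float) -> int:
    """Convert a speed multiplier to speechRate.

    ASSUMPTION: linear scaling through the documented points
    (0.5x -> -50, 1.0x -> 0, 2.0x -> 100); the result is clamped to the allowed range.
    """
    lo, hi = NUMERIC_RANGES["speechRate"]
    return max(lo, min(hi, round((multiplier - 1.0) * 100)))


def pitch_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in semitones (12 semitones = one octave = 2x)."""
    return 2 ** (semitones / 12)


# Example: a voice based on one audio sample, spoken 25% faster and 2 semitones higher.
params = {
    "textPrompt": "Read this line in the voice of @Audio1.",
    "audioReferences": ["voice_sample.wav"],  # placeholder; real file format not specified here
    "sampleRate": 24000,
    "speechRate": speech_rate_for_multiplier(1.25),
    "pitchRate": 2,
}
print(validate_seed_audio_params(params))  # []
```

Keeping validation separate from the request itself means the same checks can be reused with whatever client sends the parameters; if the documented limits change, only the constants at the top need updating.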