HeyGen

HeyGen Avatar 4

Model ID: model_heygen-avatar4-i2v

Input	Label	Type	Default	Min	Max	Allowed Values	Notes
`image`	Avatar Image	`file`	None	None	None	None	Required. Image to animate. Should contain a clear face.
`prompt`	Script	`string`	None	None	None	None	Required. The text the avatar will speak.
`background`	Background	`inputs_array`	[]	None	1	None	Optional background configuration (color, image, or video).
`voice`	Voice	`string`	Melissa	None	None	Melissa, Warm Pro Narrator, Chill Brian, Ivy, etc.	Required. Name of the voice to use for the avatar.
`resolution`	Resolution	`string`	720p	None	None	360p, 480p, 540p, 720p, 1080p	Video resolution preset.
`talkingStyle`	Talking Style	`string`	stable	None	None	stable, expressive	'stable' for minimal movement, 'expressive' for more animation.
`caption`	Add Captions	`boolean`	false	None	None	None	Whether to add captions to the video.
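
The table above can be turned into a small input builder. This is a minimal sketch: the helper name `build_avatar4_inputs` is hypothetical, and how the resulting dict is submitted (endpoint, file upload for `image`) is not covered by this reference and is left out. Only the parameter names, defaults, and allowed values come from the table.

```python
# Allowed values copied from the model_heygen-avatar4-i2v table.
RESOLUTIONS = {"360p", "480p", "540p", "720p", "1080p"}
TALKING_STYLES = {"stable", "expressive"}


def build_avatar4_inputs(image, prompt, voice="Melissa", resolution="720p",
                         talking_style="stable", caption=False):
    """Hypothetical helper: validate and assemble inputs for Avatar 4."""
    if not image or not prompt:
        raise ValueError("image and prompt are required")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if talking_style not in TALKING_STYLES:
        raise ValueError(f"unsupported talkingStyle: {talking_style}")
    return {
        "image": image,            # assumed to be a file reference
        "prompt": prompt,          # the script the avatar speaks
        "background": [],          # table default: empty, at most 1 entry
        "voice": voice,
        "resolution": resolution,
        "talkingStyle": talking_style,
        "caption": caption,
    }
```

Validating the enumerated fields locally avoids a round trip for a request that would presumably be rejected anyway.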

Model ID: model_heygen-v2-video-agent

Input	Label	Type	Default	Min	Max	Allowed Values	Notes
`prompt`	Prompt	`string`	None	None	None	None	Required. Natural language prompt describing style, visual elements, and desired length.
`duration`	Duration	`number`	30	5	120	None	Approximate video duration in seconds. Suggested: 30, 60, or 90.
`orientation`	Orientation	`string`	portrait	None	None	portrait, landscape	Video orientation selection.
`avatar`	Avatar	`string`	Adriana SuitSofa Front	None	None	See full list in documentation	Optional avatar to use in the video; extensive list available.
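
As a sketch of the constraints listed above, the following hypothetical helper checks `duration` against the documented 5 to 120 second range and `orientation` against its two allowed values. Omitting `avatar` so that the service falls back to its default is an assumption; the table only states that the field is optional.

```python
ORIENTATIONS = {"portrait", "landscape"}
MIN_DURATION, MAX_DURATION = 5, 120  # seconds, from the table


def build_video_agent_inputs(prompt, duration=30, orientation="portrait",
                             avatar=None):
    """Hypothetical helper for model_heygen-v2-video-agent inputs."""
    if not prompt:
        raise ValueError("prompt is required")
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError(
            f"duration must be between {MIN_DURATION} and {MAX_DURATION}")
    if orientation not in ORIENTATIONS:
        raise ValueError(f"unsupported orientation: {orientation}")
    inputs = {"prompt": prompt, "duration": duration,
              "orientation": orientation}
    if avatar is not None:
        # Assumption: leaving the key out selects the default avatar.
        inputs["avatar"] = avatar
    return inputs
```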

Model ID: model_heygen-v2-translate-precision

Input	Label	Type	Default	Min	Max	Allowed Values	Notes
`video`	Video	`file`	None	None	None	None	Required. URL of the video to translate.
`outputLanguage`	Output Language	`string`	Spanish	None	None	English, Spanish, French, Hindi, Italian, German, Polish, Portuguese, Chinese, Japanese, Dutch, Turkish, Korean, and many others	The target language to translate the video into.
`translateAudioOnly`	Translate Audio Only	`boolean`	false	None	None	None	Translate only the voice track and ignore faces in the video.
`enableDynamicDuration`	Enable Dynamic Duration	`boolean`	true	None	None	None	Enhances conversational fluidity between languages with different speaking rates.
`speakerNum`	Speaker Num	`number`	None	None	None	None	Number of speakers in the video.

Model ID: model_heygen-v2-translate-speed

Input	Label	Type	Default	Min	Max	Allowed Values	Notes
`video`	Video	`file`	None	None	None	None	Required. URL of the video to translate.
`outputLanguage`	Output Language	`string`	Spanish	None	None	English, Spanish, French, Hindi, Italian, German, Polish, Portuguese, Chinese, Japanese, Dutch, Turkish, Korean, and many others	The target language to translate the video into.
`translateAudioOnly`	Translate Audio Only	`boolean`	false	None	None	None	Translate only the voice track and ignore faces in the video.
`enableDynamicDuration`	Enable Dynamic Duration	`boolean`	true	None	None	None	Enhances conversational fluidity between languages with different speaking rates.
`speakerNum`	Speaker Num	`number`	None	None	None	None	Number of speakers in the video.
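
The precision and speed translation models accept the same inputs, so one builder can serve both. The sketch below assumes a request envelope of the form `{"model": ..., "inputs": ...}`, which is hypothetical; only the model IDs, parameter names, and defaults are taken from the tables.

```python
TRANSLATE_MODELS = {
    "precision": "model_heygen-v2-translate-precision",
    "speed": "model_heygen-v2-translate-speed",
}


def build_translate_request(video, output_language="Spanish",
                            mode="precision", translate_audio_only=False,
                            enable_dynamic_duration=True, speaker_num=None):
    """Hypothetical helper: same inputs, model chosen by `mode`."""
    if mode not in TRANSLATE_MODELS:
        raise ValueError(f"mode must be one of {sorted(TRANSLATE_MODELS)}")
    if not video:
        raise ValueError("video is required")
    inputs = {
        "video": video,
        "outputLanguage": output_language,
        "translateAudioOnly": translate_audio_only,
        "enableDynamicDuration": enable_dynamic_duration,
    }
    if speaker_num is not None:
        # speakerNum has no documented default, so it is sent only when set.
        inputs["speakerNum"] = speaker_num
    # The envelope shape is an assumption, not part of this reference.
    return {"model": TRANSLATE_MODELS[mode], "inputs": inputs}
```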

Model ID: model_kling-lip-sync

Input	Label	Type	Default	Min	Max	Allowed Values	Notes
`videoUrl`	Video	`assetId`	–	–	–	–	2-10s, <100MB, 720p–1080p
`audioFile`	Lip Sync Audio File	`assetId`	–	–	–	–	required if text not provided
`text`	Lip Sync Text	`string`	–	–	–	–	required if audio not provided
`voiceId`	AI Voice List	`string`	en_AOT	–	–	(40+ voices available)	used when text provided
`voiceSpeed`	Voice Speed	`number`	1	0.8	2	–	–
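
The `audioFile` and `text` rows are each required when the other is missing. A minimal sketch of that rule follows; the helper name is hypothetical, and sending `voiceId` and `voiceSpeed` only alongside `text` is an assumption (the table states this for `voiceId` only).

```python
def build_kling_lipsync_inputs(video_url, audio_file=None, text=None,
                               voice_id="en_AOT", voice_speed=1.0):
    """Hypothetical helper for model_kling-lip-sync inputs."""
    if not video_url:
        raise ValueError("videoUrl is required")
    if audio_file is None and not text:
        raise ValueError("provide audioFile or text")
    inputs = {"videoUrl": video_url}
    if audio_file is not None:
        inputs["audioFile"] = audio_file
    if text:
        # Documented range for voiceSpeed is 0.8 to 2.
        if not 0.8 <= voice_speed <= 2:
            raise ValueError("voiceSpeed must be between 0.8 and 2")
        inputs["text"] = text
        inputs["voiceId"] = voice_id
        inputs["voiceSpeed"] = voice_speed  # assumption: text-only setting
    return inputs
```

The same either-or check applies to `audio` and `text` on `model_pixverse-lipsync`.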

Model ID: model_kling-video-ai-avatar-v2-pro

Input	Label	Type	Default	Notes
`image`	Image	`assetId`	–	required avatar image
`audio`	Audio	`assetId`	–	required
`text`	Add Description	`string`	–	optional cue

Model ID: model_bytedance-omni-human-1-5

Input	Label	Type	Default	Notes
`image`	Image	`assetId`	–	required
`audio`	Audio	`assetId`	–	required (recommended ≤15s for best quality)

Model ID: model_bytedance-omni-human

Input	Label	Type	Default	Notes
`image`	Image	`assetId`	–	required
`audio`	Audio	`assetId`	–	required (recommended ≤15s for best quality)

Model ID: model_pixverse-lipsync

Input	Label	Type	Default	Allowed Values	Notes
`video`	Video	`assetId`	–	–	required
`audio`	Audio	`assetId`	–	–	required if text not provided
`text`	Text To Speech	`string`	""	–	required if audio not provided
`voiceId`	Voice	`string`	Auto	Emily, James, Isabella, Liam, Chloe, Adrian, Harper, Ava, Sophia, Julia, Mason, Jack, Oliver, Ethan, Auto	used when text provided

Model ID: model_creatify-aurora

Input	Label	Type	Default	Min	Max	Allowed Values	Notes
`image`	Avatar Image	`assetId`	None	None	None	None	Required. Input avatar image
`audio`	Audio	`assetId`	None	None	None	None	Required. Input audio file
`prompt`	Prompt	`string`	None	None	2048	None	Optional text prompt to guide generation
`guidanceScale`	Prompt Guidance	`number`	1	0	5	None	Higher values follow the prompt more closely
`audioGuidanceScale`	Audio Guidance	`number`	2	0	5	None	Higher values follow the audio more closely
`resolution`	Resolution	`string`	720p	None	None	480p, 720p	Output video resolution
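
The numeric limits above lend themselves to a range check before submission. In this sketch the helper names are hypothetical, and reading the `prompt` Max of 2048 as a character count is an assumption, since the table does not state the unit.

```python
AURORA_RESOLUTIONS = {"480p", "720p"}
MAX_PROMPT_CHARS = 2048  # assumption: Max is a character count


def check_range(name, value, low, high):
    """Raise ValueError unless low <= value <= high."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}")
    return value


def build_aurora_inputs(image, audio, prompt=None, guidance_scale=1,
                        audio_guidance_scale=2, resolution="720p"):
    """Hypothetical helper for model_creatify-aurora inputs."""
    if not image or not audio:
        raise ValueError("image and audio are required")
    if resolution not in AURORA_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    inputs = {
        "image": image,
        "audio": audio,
        "guidanceScale": check_range("guidanceScale", guidance_scale, 0, 5),
        "audioGuidanceScale": check_range(
            "audioGuidanceScale", audio_guidance_scale, 0, 5),
        "resolution": resolution,
    }
    if prompt is not None:
        if len(prompt) > MAX_PROMPT_CHARS:
            raise ValueError("prompt is too long")
        inputs["prompt"] = prompt
    return inputs
```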

Model ID: model_sync-lipsync-react-1

Input	Label	Type	Default	Min	Max	Allowed Values	Notes
`video`	Video	`assetId`	None	None	None	None	Required. Input video file
`audio`	Audio	`assetId`	None	None	None	None	Required. Input audio file. Maximum duration is 15 seconds.
`emotion`	Emotion	`string`	neutral	None	None	happy, angry, sad, neutral, disgusted, surprised	Emotion prompt for generation
`modelMode`	Model Mode	`string`	face	None	None	lips, face, head	Controls edit region and movement scope
`lipsyncMode`	Lipsync Mode	`string`	bounce	None	None	loop, bounce, cut_off, silence, remap	Behavior when audio and video durations differ
`temperature`	Temperature	`number`	0.5	0	1	None	Controls expressiveness of lipsync

Model ID: model_sync-lipsync-v2-pro

Input	Label	Type	Default	Min	Max	Allowed Values	Notes
`video`	Video	`assetId`	–	–	–	–	required
`audio`	Audio	`assetId`	–	–	–	–	required
`syncMode`	Sync Mode	`string`	loop	–	–	loop, bounce, cut_off, silence, remap	lipsync behavior when lengths mismatch
`temperature`	Temperature	`number`	0.5	0	1	–	expressiveness
`activeSpeaker`	Active Speaker	`boolean`	false	–	–	–	detects active speaker
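
The two Sync models expose the same duration-mismatch options under different parameter names and defaults: `lipsyncMode` (default `bounce`) on `model_sync-lipsync-react-1` and `syncMode` (default `loop`) on `model_sync-lipsync-v2-pro`. The sketch below uses a hypothetical helper to pick the right key for each model.

```python
MISMATCH_MODES = {"loop", "bounce", "cut_off", "silence", "remap"}

# model ID -> (parameter name, documented default)
SYNC_MODELS = {
    "model_sync-lipsync-react-1": ("lipsyncMode", "bounce"),
    "model_sync-lipsync-v2-pro": ("syncMode", "loop"),
}


def mismatch_setting(model_id, mode=None):
    """Return the mismatch-mode input under the key the model expects."""
    if model_id not in SYNC_MODELS:
        raise ValueError(f"unknown model: {model_id}")
    key, default = SYNC_MODELS[model_id]
    mode = default if mode is None else mode
    if mode not in MISMATCH_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    return {key: mode}
```

For example, `mismatch_setting("model_sync-lipsync-v2-pro", "cut_off")` yields `{"syncMode": "cut_off"}`, which can be merged into the rest of the inputs.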