The complete, open-source guide to the Seedance 2.0 API by ByteDance on fal.ai — with working Python, Node.js (JavaScript/TypeScript), and cURL examples for every endpoint, pricing, schemas, prompt tips, and FAQs.
Seedance 2.0 is ByteDance's state-of-the-art video generation model available via the fal.ai API. It produces cinematic 720p video up to 15 seconds with native synchronized audio, lip-sync speech, multi-shot editing, real-world physics, and director-level camera control from text, image, or reference inputs.
This repo collects every Seedance 2.0 endpoint (standard and fast tiers), fully documented with copy-paste examples — the fastest way to start building with the Seedance 2.0 API today.
- What is Seedance 2.0?
- All Seedance 2.0 API Endpoints
- Pricing
- Quick Start
- Endpoint Reference
- Common Parameters
- Prompting Guide
- FAQ
- Deep-dive docs
- Resources
- Contributing
- License
Seedance 2.0 (by ByteDance, served on fal.ai) is a next-generation AI video generation model. Compared to first-generation text-to-video models, Seedance 2.0 delivers:
- Native synchronized audio — sound effects, ambient audio, and lip-synced speech generated together with video (not post-hoc dubbed).
- Multi-shot editing — a single prompt can produce multiple camera cuts within the same clip.
- Director-level camera control — pans, dollies, crane shots, dutch angles, and more, driven by natural language.
- Real-world physics — cloth, fluids, lighting, and object interactions behave plausibly.
- Multimodal inputs — combine text prompts with up to 9 reference images, 3 reference videos, and 3 audio clips (reference-to-video).
- Up to 15-second clips at 480p or 720p, in aspect ratios from 9:16 (vertical/Reels/Shorts/TikTok) to 21:9 (ultrawide cinematic).
Two tiers are available on fal.ai: the standard tier (highest quality) and the fast tier (lower latency and cost, same API surface).
| # | Endpoint | Model ID | Input | Audio | Playground |
|---|---|---|---|---|---|
| 1 | Text to Video | `bytedance/seedance-2.0/text-to-video` | text prompt | yes | Open |
| 2 | Image to Video | `bytedance/seedance-2.0/image-to-video` | text + start (and optional end) image | yes | Open |
| 3 | Reference to Video | `bytedance/seedance-2.0/reference-to-video` | text + up to 9 images / 3 videos / 3 audio | yes | Open |
| 4 | Fast Text to Video | `bytedance/seedance-2.0/fast/text-to-video` | text prompt | yes | Open |
| 5 | Fast Image to Video | `bytedance/seedance-2.0/fast/image-to-video` | text + start (and optional end) image | yes | Open |
| 6 | Fast Reference to Video | `bytedance/seedance-2.0/fast/reference-to-video` | text + up to 9 images / 3 videos / 3 audio | yes | Open |
Base HTTP endpoint: https://fal.run/<model-id> — e.g. https://fal.run/bytedance/seedance-2.0/fast/text-to-video.
Pricing is per second of generated video at 720p on fal.ai. Cost does not change whether you generate audio or not.
| Endpoint | 720p price / sec | Token price (per 1K tokens) |
|---|---|---|
| Text to Video | $0.3034 | $0.0140 |
| Image to Video | $0.3024 | $0.0140 |
| Reference to Video (images only) | $0.3024 | $0.0140 |
| Reference to Video (with video inputs) | $0.1814 | $0.0140 |
| Fast — Text to Video | $0.2419 | $0.0112 |
| Fast — Image to Video | $0.2419 | $0.0112 |
| Fast — Reference to Video (images only) | $0.2419 | $0.0112 |
| Fast — Reference to Video (with video inputs) | $0.14515 | $0.0112 |
Tokens are computed as (height * width * duration * 24) / 1024. For reference-to-video, input + output duration both count. See the official fal.ai pricing page for the latest numbers.
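As a sanity check on the token formula above, here is a small cost estimator of our own (not part of any fal.ai client). It assumes 720p at 16:9 means a 1280×720 frame; the exact frame dimensions fal.ai bills for are an assumption:

```python
def seedance_tokens(width, height, duration_s):
    """Token count per the documented formula:
    (height * width * duration * 24) / 1024."""
    return (height * width * duration_s * 24) / 1024

def estimate_cost_usd(width, height, duration_s, price_per_1k_tokens):
    """Estimated cost: tokens / 1000 * the per-1K-token price."""
    return seedance_tokens(width, height, duration_s) / 1000 * price_per_1k_tokens

# 6 s of 720p (assumed 1280x720) on the fast tier ($0.0112 / 1K tokens):
print(seedance_tokens(1280, 720, 6))                       # 129600.0 tokens
print(round(estimate_cost_usd(1280, 720, 6, 0.0112), 4))   # 1.4515
```

The result lines up with the per-second table: 6 s × $0.2419/s ≈ $1.4514, so the two pricing views are consistent up to rounding.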
Sign up at fal.ai, create a key at fal.ai/dashboard/keys, and export it:
```bash
export FAL_KEY="your_fal_api_key_here"
```

Or copy .env.example to .env and fill in the value.
```bash
pip install -r requirements.txt
python examples/python/text_to_video.py
```

```python
import fal_client

result = fal_client.subscribe(
    "bytedance/seedance-2.0/fast/text-to-video",
    arguments={
        "prompt": "A cinematic shot of a hummingbird drinking from a neon flower at dusk, 4k, slow motion.",
        "resolution": "720p",
        "duration": "6",
        "aspect_ratio": "16:9",
        "generate_audio": True,
    },
    with_logs=True,
)

print(result["video"]["url"])
```

```bash
cd examples/javascript
npm install
node text_to_video.mjs
```

```js
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("bytedance/seedance-2.0/fast/text-to-video", {
  input: {
    prompt: "A cinematic shot of a hummingbird drinking from a neon flower at dusk, 4k, slow motion.",
    resolution: "720p",
    duration: "6",
    aspect_ratio: "16:9",
    generate_audio: true,
  },
  logs: true,
});

console.log(result.data.video.url);
```

```bash
curl --request POST \
  --url https://fal.run/bytedance/seedance-2.0/fast/text-to-video \
  --header "Authorization: Key $FAL_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "A cinematic shot of a hummingbird drinking from a neon flower at dusk, 4k, slow motion.",
    "resolution": "720p",
    "duration": "6",
    "aspect_ratio": "16:9",
    "generate_audio": true
  }'
```

Full runnable scripts live in examples/.
Generate a video from a pure text prompt.
Model IDs
- Standard: `bytedance/seedance-2.0/text-to-video`
- Fast: `bytedance/seedance-2.0/fast/text-to-video`
Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | Your text prompt. |
| `resolution` | enum | no | `720p` | `480p` or `720p`. |
| `duration` | enum | no | `auto` | `auto` or `"4"`–`"15"` (seconds). |
| `aspect_ratio` | enum | no | `auto` | `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16`, `auto`. |
| `generate_audio` | bool | no | `true` | Native SFX/ambient/speech audio. |
| `seed` | int | no | — | For reproducibility. |
| `end_user_id` | string | no | — | Attribute usage to an end user. |
Examples: examples/python/text_to_video.py, examples/javascript/text_to_video.mjs, examples/curl/text_to_video.sh.
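The enum constraints in the table above can be checked client-side before a request is sent. The function below is a pre-flight sketch of our own (the name and error messages are not part of fal-client):

```python
ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_ASPECT_RATIOS = {"21:9", "16:9", "4:3", "1:1", "3:4", "9:16", "auto"}

def validate_t2v_args(args):
    """Return a list of problems with a text-to-video payload (empty list = OK)."""
    errors = []
    if not args.get("prompt"):
        errors.append("prompt is required")
    if args.get("resolution", "720p") not in ALLOWED_RESOLUTIONS:
        errors.append("resolution must be 480p or 720p")
    if args.get("aspect_ratio", "auto") not in ALLOWED_ASPECT_RATIOS:
        errors.append("aspect_ratio not recognised")
    duration = args.get("duration", "auto")
    if duration != "auto" and not (str(duration).isdigit() and 4 <= int(duration) <= 15):
        errors.append('duration must be "auto" or a string from "4" to "15"')
    return errors

print(validate_t2v_args({"prompt": "A calm ocean at dawn", "duration": "6"}))  # []
```

Catching a bad enum locally is cheaper than waiting for the API to reject the request.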
Animate a still image into a cinematic clip. Supports optional end-frame control for clean transitions.
Model IDs
- Standard: `bytedance/seedance-2.0/image-to-video`
- Fast: `bytedance/seedance-2.0/fast/image-to-video`
Additional Input
| Field | Type | Required | Notes |
|---|---|---|---|
| `image_url` | string | yes | Starting frame. JPEG/PNG/WebP, ≤ 30 MB. |
| `end_image_url` | string | no | Optional ending frame for smooth transitions. |
All other parameters match text-to-video. Examples: examples/python/image_to_video.py, examples/javascript/image_to_video.mjs, examples/curl/image_to_video.sh.
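Since `end_image_url` is optional, a small builder of our own (names are hypothetical, not part of fal-client) can ensure the field is only included when an end frame is actually supplied:

```python
def i2v_payload(prompt, image_url, end_image_url=None, **extra):
    """Build an image-to-video request body; the optional end frame is
    only included when one is actually supplied."""
    payload = {"prompt": prompt, "image_url": image_url, **extra}
    if end_image_url is not None:
        payload["end_image_url"] = end_image_url
    return payload

body = i2v_payload("The portrait slowly smiles", "https://example.com/start.png",
                   duration="5")
print("end_image_url" in body)  # False
```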
The most powerful mode. Provide up to 9 images, 3 videos, and 3 audio clips as references and mention them in the prompt as @Image1, @Video1, @Audio1, etc.
Model IDs
- Standard: `bytedance/seedance-2.0/reference-to-video`
- Fast: `bytedance/seedance-2.0/fast/reference-to-video`
Additional Input
| Field | Type | Required | Notes |
|---|---|---|---|
| `image_urls` | string[] | no | Up to 9. JPEG/PNG/WebP, ≤ 30 MB each. |
| `video_urls` | string[] | no | Up to 3. MP4/MOV, combined 2–15 s, ≤ 50 MB, ~480p–720p. |
| `audio_urls` | string[] | no | Up to 3. MP3/WAV, combined ≤ 15 s, ≤ 15 MB each. Requires at least one reference image or video. |
Total files across all modalities must not exceed 12. Examples: examples/python/reference_to_video.py, examples/javascript/reference_to_video.mjs, examples/curl/reference_to_video.sh.
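The per-modality and total limits above can be enforced before submitting. This validator is a sketch of our own, not part of the official client:

```python
def check_reference_inputs(image_urls=(), video_urls=(), audio_urls=()):
    """Return a list of violations of the documented limits (empty list = OK)."""
    errors = []
    if len(image_urls) > 9:
        errors.append("at most 9 reference images")
    if len(video_urls) > 3:
        errors.append("at most 3 reference videos")
    if len(audio_urls) > 3:
        errors.append("at most 3 reference audio clips")
    if audio_urls and not (image_urls or video_urls):
        errors.append("audio references require at least one image or video")
    if len(image_urls) + len(video_urls) + len(audio_urls) > 12:
        errors.append("at most 12 reference files in total")
    return errors

print(check_reference_inputs(image_urls=["https://example.com/a.png"]))  # []
```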
The fast tier has an identical API surface to the standard tier — just swap the model ID. Use it when you want lower latency and cost and can accept a small quality tradeoff.
| Instead of... | Use... |
|---|---|
| `bytedance/seedance-2.0/text-to-video` | `bytedance/seedance-2.0/fast/text-to-video` |
| `bytedance/seedance-2.0/image-to-video` | `bytedance/seedance-2.0/fast/image-to-video` |
| `bytedance/seedance-2.0/reference-to-video` | `bytedance/seedance-2.0/fast/reference-to-video` |
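Because the swap is purely mechanical, a tiny helper of our own (not part of fal-client) can derive the fast-tier ID at runtime:

```python
def to_fast(model_id):
    """Derive the fast-tier model ID by inserting the 'fast/' path segment
    after the 'bytedance/seedance-2.0/' prefix."""
    prefix = "bytedance/seedance-2.0/"
    if not model_id.startswith(prefix) or model_id.startswith(prefix + "fast/"):
        return model_id  # not a Seedance 2.0 ID, or already fast
    return prefix + "fast/" + model_id[len(prefix):]

print(to_fast("bytedance/seedance-2.0/text-to-video"))
# bytedance/seedance-2.0/fast/text-to-video
```

This is handy for a quality/cost toggle in your app: develop against the fast tier, then flip one flag for production renders.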
These parameters work identically across every Seedance 2.0 endpoint:
- `resolution` — `480p` (faster, cheaper) or `720p` (default, balanced).
- `duration` — `"auto"` or an integer-as-string from `"4"` to `"15"` seconds.
- `aspect_ratio` — `21:9`, `16:9`, `4:3`, `1:1`, `3:4`, `9:16`, or `auto`. Use `9:16` for TikTok/Reels/Shorts, `16:9` for YouTube/landscape, `21:9` for cinematic.
- `generate_audio` — `true` by default. Audio is free: the cost is the same whether it is on or off.
- `seed` — integer for reproducibility.
- `end_user_id` — optional string to attribute usage per end user.
Output is the same for every endpoint:
```json
{
  "video": { "url": "https://.../output.mp4" },
  "seed": 42
}
```

Seedance 2.0 responds very well to cinematographic language. Use this template:
[Shot type] of [subject] [action] in [environment], [lighting], [mood/style], [camera movement], [audio cue].
Examples
- Wide aerial tracking shot of a lone cyclist riding through a neon-lit Tokyo alley at night, volumetric rain, cinematic lighting, slow push-in, rain and distant thunder.
- Close-up of a chef flambéing a pan in a rustic kitchen, golden hour through the window, shallow depth of field, handheld camera, sizzling oil and jazz music.
- Multi-shot: establishing drone shot of an alpine lake at sunrise, cut to a hiker sipping coffee, cut to a hawk taking flight. Ambient birdsong and wind.
Tips
- Be specific about motion. "Slow dolly-in", "whip pan", "crane rise", "handheld follow" — Seedance 2.0 understands them.
- Mention audio explicitly if you want specific SFX ("footsteps on gravel", "distant thunder", "crowd cheering").
- For lip-sync, add the exact line of dialogue in quotes and describe the speaker's emotion.
- Reference inputs: in reference-to-video, refer to references as `@Image1`, `@Video2`, `@Audio1` inside the prompt to direct how they're used.
- Aspect ratio matters for composition. Vertical (`9:16`) naturally favours portraits; `21:9` favours landscapes and establishing shots.
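The template above can also be assembled programmatically, e.g. when generating many prompt variants. This helper is our own sketch of the template, not an official utility:

```python
def build_prompt(shot, subject, action, environment, lighting, mood,
                 camera=None, audio=None):
    """Assemble a prompt from the template:
    [Shot type] of [subject] [action] in [environment], [lighting],
    [mood/style], [camera movement], [audio cue]."""
    parts = [f"{shot} of {subject} {action} in {environment}", lighting, mood]
    if camera:
        parts.append(camera)
    if audio:
        parts.append(audio)
    return ", ".join(parts) + "."

prompt = build_prompt("Close-up", "a barista", "pouring latte art",
                      "a sunlit cafe", "soft morning light", "warm and cozy",
                      camera="slow dolly-in", audio="gentle cafe chatter")
print(prompt)
```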
**What is the Seedance 2.0 API?**
The Seedance 2.0 API is ByteDance's video generation API exposed on fal.ai. It generates up to 15-second 720p video with native synchronized audio from text prompts, still images, or multimodal references.

**How do I get started?**
Create a fal.ai account, grab an API key, and call any of the six Seedance 2.0 endpoints listed above via HTTP, the Python client (fal-client), or the JavaScript client (@fal-ai/client).

**What is the difference between the standard and fast tiers?**
Both tiers share the same parameters and features. The fast tier trades a small amount of quality for lower latency and about 20–40% lower cost.

**Does Seedance 2.0 generate audio?**
Yes — natively. `generate_audio` defaults to `true` and covers sound effects, ambient audio, and lip-synced speech. Audio generation is free (same cost whether on or off).

**What resolutions and durations are supported?**
480p or 720p, and 4 to 15 seconds (or `auto` to let the model pick).

**Which aspect ratios are supported?**
21:9, 16:9, 4:3, 1:1, 3:4, 9:16, or auto.

**How much does it cost?**
From $0.14515/sec (fast, reference-to-video with video inputs) up to $0.3034/sec (standard text-to-video). See the pricing table.

**Can I use the output commercially?**
Yes — fal.ai lists Seedance 2.0 as commercial-use. Always re-check the current terms on the model page.

**Is there a queue API for long-running jobs?**
Yes. Use `fal.queue.submit` / `fal.queue.status` / `fal.queue.result` (JS) or the equivalent Python queue helpers for long-running jobs and webhooks. See docs/queue.md.

**How do reference inputs work?**
Upload images, videos, and/or audio, then refer to them by name (`@Image1`, `@Video1`, `@Audio1`) inside your prompt. Seedance 2.0 will condition the generation on them.
- Official model pages on fal.ai
- fal.ai docs
- Python client: `fal-client`
- JavaScript client: `@fal-ai/client`
- fal.ai pricing
PRs welcome! Useful contributions:
- More prompt recipes (cinematic, product ads, anime, lip-sync dialogues).
- Examples in additional languages (Go, Rust, Ruby, PHP, Swift).
- Benchmarks comparing standard vs. fast tier.
- Integrations (Next.js, FastAPI, Cloudflare Workers).
Open an issue or pull request.
MIT — free to use, modify, and distribute. This repo is an unofficial community guide to the Seedance 2.0 API on fal.ai; it is not affiliated with ByteDance or fal.ai.
Keywords: seedance 2.0, seedance 2.0 api, seedance 2 api, bytedance seedance, bytedance video generation, fal.ai seedance, fal seedance 2.0, seedance text to video, seedance image to video, seedance reference to video, seedance fast api, AI video generation API, text to video API, image to video API, lip sync video API, synchronized audio video AI, 720p AI video generation, 15 second AI video, cinematic AI video API.