Early access · unified API

The AI Video Generation API reference

Every major video model behind one API surface: Veo, Kling, Runway, Seedance, Wan, Hailuo and more. Below: the comparison table, real per-second pricing, latency numbers and the async integration pattern they all share.

One key, every model · Per-second billing · Webhooks + polling

The comparison

Every major video generation model, one table

List pricing and latency compiled from provider docs and public aggregator rates, checked July 2026. Prices are per second of generated video; a "10s clip" column keeps the math honest.

Prices move fast in this market. Ranges reflect resolution tiers and direct-vs-aggregator differences. Latency = typical time to first downloadable clip for a 5 to 10 second generation.
| Model | $/second | 10s clip | Latency | Max res | Max length | Native audio | Image-to-video | Access |
|---|---|---|---|---|---|---|---|---|
| Google Veo 3.1 (cinematic leader) | $0.05 to 0.20 | $0.50 to 2.00 | 15 to 25s | 1080p | 8s (+extend) | Yes | Yes | Gemini API, Vertex, aggregators |
| Kling 3.0 Pro (speed + value) | $0.08 to 0.11 | $0.84 to 1.12 | 15 to 30s | 1080p | 15s | Yes (Omni) | Yes | Kling API, aggregators |
| Runway Gen-4.5 (quality benchmark #1) | $0.20 to 0.25 | $2.00 to 2.50 | 20 to 40s | 1080p | 10s (+extend) | No | Yes | Runway API, aggregators |
| Seedance 2.0 | $0.08 to 0.10 | $0.80 to 1.00 | ~45s+ | 1080p | 10s | Yes | Yes (9 ref images) | BytePlus, aggregators |
| Wan 2.7 (cheapest 1080p) | $0.02 to 0.10 | $0.20 to 1.00 | 30 to 60s | 1080p | 10s | No | Yes | Alibaba, aggregators, self-host (open) |
| Hailuo 2.3 Pro | ~$0.05 to 0.08 | ~$0.50 to 0.80 | 30 to 60s | 1080p | 10s | No | Yes | MiniMax API, aggregators |
| Luma Ray 3 | ~$0.06 to 0.12 | ~$0.60 to 1.20 | 20 to 40s | 1080p | 10s (+extend) | No | Yes | Luma API, aggregators |
| Amazon Nova Reel | ~$0.08 | ~$0.80 | 60s+ | 720p | 2min (multi-shot) | No | Yes | AWS Bedrock |
| PixVerse v6 | ~$0.04 to 0.08 | ~$0.40 to 0.80 | 20 to 40s | 1080p | 8s | No | Yes | PixVerse API, aggregators |
| Vidu Q3 | ~$0.04 to 0.08 | ~$0.40 to 0.80 | 30 to 60s | 1080p | 16s | No | Yes | Vidu API, aggregators |
| Hunyuan Video (open source) | Compute only | GPU cost | Hardware-bound | 720p+ | ~5s | No | Yes | Self-host, open weights |
| OpenAI Sora 2 API (retiring Sept 2026) | Credit-based | Varies | 60s+ | 1080p | 20s (+ext to 120s) | Yes | Yes | OpenAI API (deprecated), app |

Heads up: the Sora 2 video API is being retired

OpenAI has announced the Sora 2 Videos API shuts down on September 24, 2026. If you built on it, the practical migrations are Veo 3.1 (closest on quality plus native audio), Kling 3.0 (cheaper, faster) or an aggregator layer so the next deprecation is a config change instead of a rewrite. That last option is exactly why we're building this unified API.

Know what you're buying

Three kinds of "video API" (don't integrate the wrong one)

Half the frustration with this market is vendors using one term for three different products. Route yourself first:

Generative

Text/image to video models

Veo, Kling, Runway, Seedance, Wan, Hailuo. A prompt or a still image goes in, novel footage comes out. Priced per second of output. This page (and this API) is about these.

Avatar

Script to presenter

HeyGen, Synthesia, D-ID, Tavus. A script becomes a talking-head video with a licensed or cloned presenter. Priced per minute or credit. Right choice for training and personalized outreach video.

Render

Template assembly

Shotstack, Creatomate, JSON2Video. Programmatic editing: your existing clips, images and captions composited by template. Nothing is generated. Right choice for automated slideshows and data-driven video.

Integration

The async pattern every video API shares

Video generation takes 15 to 60+ seconds, so every serious API is asynchronous: submit a job, then poll or receive a webhook. Here is the same integration in three languages against our unified endpoint. Swap the model string to switch providers; nothing else changes.

curl
# 1. submit (the body must be valid JSON, so the prompt string stays on one line)
curl -X POST .../v1/generate \
 -H "Authorization: Bearer $KEY" \
 -H "Content-Type: application/json" \
 -d '{"model":"kling-3.0",
      "prompt":"product rotating on a marble pedestal, studio light",
      "duration":5}'

# 2. poll until done, using the job ID returned by step 1
curl .../v1/jobs/gen_8f2k1 \
 -H "Authorization: Bearer $KEY"
# {"status":"succeeded",
#  "video_url":"https://..."}
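python
import os, time, requests

# BASE, auth and prompt must be defined before the calls below;
# the environment variable names here are placeholders.
BASE = os.environ["VIDEO_API_BASE"]
auth = {"Authorization": f"Bearer {os.environ['KEY']}"}
prompt = "product rotating on a marble pedestal, studio light"

# 1. submit
job = requests.post(
  f"{BASE}/v1/generate",
  headers=auth,
  json={"model": "veo-3.1",
        "prompt": prompt,
        "duration": 8},
  timeout=30,
).json()

# 2. poll until the job leaves the "running" state
while True:
    j = requests.get(
      f"{BASE}/v1/jobs/{job['id']}",
      headers=auth, timeout=30).json()
    if j["status"] != "running":
        break
    time.sleep(3)

# surface failures instead of assuming success
if j["status"] != "succeeded":
    raise RuntimeError(f"job {job['id']}: {j['status']}")
print(j["video_url"])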
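node
// BASE, auth and prompt are assumed to be defined by the caller.
const res = await fetch(
  `${BASE}/v1/generate`, {
  method: "POST",
  headers: {
    ...auth,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "wan-2.7",
    prompt,
    duration: 5,
    webhook_url:
      "https://app.dev/hooks"
  })
});
if (!res.ok) throw new Error(`submit failed: ${res.status}`);
const job = await res.json();

// webhook fires on completion:
// { id, status: "succeeded",
//   video_url, seconds_billed }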

Two production notes the docs never lead with. First, always implement the webhook path even if you start with polling: long generations plus polling loops are how serverless bills explode. Second, store the provider job ID before you await anything; when a function times out mid-poll, that ID is the difference between resuming and re-billing.
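Both notes can be sketched in a few lines of Python. This is a minimal illustration, not our SDK: `JobStore`, `submit_and_record`, `poll_until_done` and `handle_webhook` are hypothetical names, the store is an in-memory stand-in for whatever durable table you use, and the `submit` and `fetch_status` callables are assumed to wrap the HTTP calls shown above. The only statuses it relies on are the two that appear on this page, "running" and "succeeded".

```python
import time


class JobStore:
    """In-memory stand-in for a durable table keyed by job ID (hypothetical)."""

    def __init__(self):
        self.jobs = {}

    def save(self, job_id, fields):
        # Merge, so a webhook and a poller writing the same job do not clobber each other.
        self.jobs.setdefault(job_id, {}).update(fields)


def submit_and_record(submit, store, payload):
    """Submit a job and persist its ID before any waiting happens."""
    job = submit(payload)
    store.save(job["id"], {"status": "running", "payload": payload})
    return job["id"]


def poll_until_done(job_id, fetch_status, store, *, interval=3.0,
                    timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Poll a known job ID until it leaves "running" or the deadline passes."""
    deadline = clock() + timeout
    while True:
        job = fetch_status(job_id)
        if job["status"] != "running":
            store.save(job_id, job)
            return job
        if clock() >= deadline:
            # The ID is already stored, so a later invocation can resume polling.
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        sleep(interval)


def handle_webhook(event, store):
    """Record a completion event; safe to receive more than once."""
    store.save(event["id"], {
        "status": event["status"],
        "video_url": event.get("video_url"),
        "seconds_billed": event.get("seconds_billed"),
    })
```

Because the ID is written before the first poll, a function that times out can be retried with `poll_until_done` alone, without calling `submit` (and paying) a second time. The webhook handler writes to the same record, so you can start with polling and switch paths later without a schema change.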

Cost at scale

What 1,000 ten-second clips actually cost

Per-second prices look interchangeable until you multiply. The same monthly workload, 1,000 clips of 10 seconds each, priced across the field:

| Model | 1,000 x 10s clips / month | Read |
|---|---|---|
| Wan 2.7 | $200 to 1,000 | The volume play. B-roll, drafts, high-iteration creative testing. |
| Hailuo 2.3 / PixVerse / Vidu | $400 to 800 | The value middle. Good quality per dollar for social content. |
| Seedance 2.0 | $800 to 1,000 | Pay for multimodal control: 9 reference images, audio input. |
| Kling 3.0 Pro | $840 to 1,120 | Fastest latency tier plus audio. The ads workhorse. |
| Veo 3.1 | $500 to 2,000 | Cinematic ceiling; the range is the resolution tier you pick. |
| Runway Gen-4.5 | $2,000 to 2,500 | Benchmark-topping quality. Hero shots, not volume. |

The strategy that falls out of this table: route by shot value. Draft and iterate on a cheap model, then re-render the winning prompt on a premium one. Teams doing this cut video spend 60 to 80 percent versus running everything on the flagship, and it is precisely the workflow a unified API reduces to a one-line change.
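The arithmetic is easy to check yourself. The sketch below multiplies a per-second rate out to a monthly workload, then compares a draft-cheap, finish-premium split against rendering everything on the premium model. The function names are illustrative, and the 800/200 split and the $0.05 draft rate are assumptions chosen for the example, not measured data; the Wan and Runway rates come from the comparison table.

```python
from decimal import Decimal


def workload_cost(rate_per_second, clips, seconds_per_clip):
    """Dollar cost of `clips` generations of `seconds_per_clip` seconds each.

    Pass the rate as a string (e.g. "0.02") so Decimal keeps it exact.
    """
    return Decimal(rate_per_second) * seconds_per_clip * clips


def routed_saving(draft_rate, final_rate, drafts, finals, seconds_per_clip=10):
    """Fraction saved by drafting cheap versus rendering every clip at `final_rate`."""
    routed = (workload_cost(draft_rate, drafts, seconds_per_clip)
              + workload_cost(final_rate, finals, seconds_per_clip))
    flagship = workload_cost(final_rate, drafts + finals, seconds_per_clip)
    return 1 - routed / flagship


# Wan 2.7 list range from the table: $0.02 to $0.10 per second.
low = workload_cost("0.02", clips=1000, seconds_per_clip=10)
high = workload_cost("0.10", clips=1000, seconds_per_clip=10)
print(f"Wan 2.7: ${low} to ${high} per month")

# Assumed split: 800 drafts at $0.05/s, 200 finals at Runway's top rate of $0.25/s.
saving = routed_saving("0.05", "0.25", drafts=800, finals=200)
print(f"saving vs all-flagship: {saving:.0%}")
```

With those assumed inputs the routed workload costs $900 against $2,500 for the all-flagship run, a 64 percent saving. Your own ratio of drafts to finals moves that number more than any single price change does.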

Use-case router

Which model for which job

| You're building | Use | Why |
|---|---|---|
| Ad creative testing at volume | Kling 3.0, Seedance 2.0 | Cheap enough to test 20 variants, fast enough to iterate same-day, native audio for sound-on placements. |
| Cinematic brand film | Veo 3.1, Runway Gen-4.5 | Best physics, coherence and camera language. Render few, render high. |
| Social b-roll pipelines | Wan 2.7, Hailuo 2.3 | Cost per clip low enough to generate daily at feed scale. |
| Product demos from stills | Kling 3.0, PixVerse v6 | Strong image-to-video: your real product photos become motion without a studio. |
| Avatar or training video | HeyGen, Synthesia, Tavus | Different category (see above). Script-to-presenter beats generative models for talking heads. |
| Full control / no per-second fees | Hunyuan, Wan (open weights) | Self-host on your GPUs. You trade ops burden for marginal cost. |

Direct integration vs an aggregator layer

Going direct to one provider gets you enterprise SLAs, the newest checkpoints first, and one vendor relationship. It also gets you their rate limits, their deprecation schedule (see Sora) and a rewrite every time the leaderboard flips, which in this market is roughly quarterly.

An aggregation layer (one key, one schema, model as a string parameter) trades a small markup for portability: reroute when prices drop, A/B models on real traffic, and survive deprecations with a config change. We are obviously in the second camp, since that is what we are building, but the honest rule is: go direct if you are certain of your model and are negotiating volume pricing with the vendor; aggregate if you value optionality. Most teams shipping product, not research, want optionality.
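Here is roughly what "model as a string parameter" looks like on the caller's side. This is a sketch under assumptions: the task names, the `ROUTES` table and both helper functions are hypothetical, and the request body mirrors the fields used in the integration examples above (`model`, `prompt`, `duration`).

```python
# Hypothetical routing table: task type -> ordered model preferences.
ROUTES = {
    "draft": ["wan-2.7"],
    "ad_variant": ["kling-3.0", "veo-3.1"],
    "hero_shot": ["veo-3.1"],
}


def pick_model(task, routes=ROUTES, retired=frozenset()):
    """Return the first preferred model for `task` that is not retired."""
    for model in routes[task]:
        if model not in retired:
            return model
    raise LookupError(f"no available model for task {task!r}")


def build_request(task, prompt, duration, retired=frozenset()):
    """Build the generate-call body; only `model` depends on routing."""
    return {
        "model": pick_model(task, retired=retired),
        "prompt": prompt,
        "duration": duration,
    }


# Normal routing, then the same call after a model is switched off in config.
print(build_request("ad_variant", "sneaker on wet asphalt", 5))
print(build_request("ad_variant", "sneaker on wet asphalt", 5,
                    retired={"kling-3.0"}))
```

Application code only ever names a task. A price drop or a deprecation notice becomes an edit to the table or the retired set, which is the whole argument for the aggregator camp.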

FAQ

Video generation API questions, answered

What is the cheapest AI video generation API?

Wan 2.7 is the cheapest mainstream 1080p option at roughly $0.02 to 0.10 per second depending on host and tier, which puts a 10-second clip at $0.20 to 1.00. If you can run your own GPUs, open-weight models (Wan, Hunyuan) drop marginal cost to pure compute. For hosted volume work, the Hailuo and PixVerse tier around $0.04 to 0.08 per second is the usual sweet spot.

What is the fastest video generation API?

Veo 3.1 and Kling 3.0 currently lead on latency, typically returning a 5 to 10 second clip in 15 to 30 seconds. Seedance and Wan trade speed for cost or control, usually landing at 30 to 60+ seconds. Latency varies with load, resolution and duration; benchmark with your own prompts at your usual hours before committing.
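A benchmark of your own does not need much code. The sketch below times any blocking `generate(prompt)` callable and reports a few summary figures; `benchmark` is an illustrative name, and `generate` is assumed to be your own wrapper that submits a job and returns once the clip is downloadable.

```python
import statistics
import time


def benchmark(generate, prompts, *, clock=time.perf_counter):
    """Time `generate(prompt)` for each prompt and summarise latencies in seconds."""
    latencies = []
    for prompt in prompts:
        start = clock()
        generate(prompt)  # should block until the clip is downloadable
        latencies.append(clock() - start)
    return {
        "runs": len(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }
```

Run it per model with the same prompt list, at the hours you actually ship, and compare medians and worst cases instead of trusting a single run.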

Which video APIs generate audio too?

Veo 3.1, Kling 3.0 (Omni), Seedance 2.0 and Sora 2 generate synchronized audio natively, including ambient sound and speech. Runway, Wan, Hailuo and Luma output silent video; you add sound in post or via a TTS pass. For sound-on placements like TikTok, native audio saves a pipeline step and usually sounds more coherent.

Is the Sora API still available?

OpenAI has announced the Sora 2 Videos API shuts down on September 24, 2026. Existing integrations keep working until then. If you are choosing today, build against Veo 3.1 or Kling 3.0, or integrate through an abstraction layer so the next provider change is configuration, not code.

How is video API pricing billed: per second or credits?

Most providers bill per second of generated output, with the rate depending on model and resolution tier. Some (Runway, Sora) use credit systems that abstract the same thing. Aggregators normalize everything to per-second or per-clip pricing. Watch for the failed-generation policy: good APIs don't bill failed or moderation-blocked jobs, but not all are good.
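One way to check a provider's failed-generation policy is to audit the completion events you already receive. The sketch below assumes each event looks like the webhook payload shown earlier (`id`, `status`, `seconds_billed`); `audit_billing` is a hypothetical helper, and the "failed" status in the usage lines is an assumed value, since only "running" and "succeeded" appear on this page.

```python
def audit_billing(events):
    """Sum billed seconds and list non-succeeded jobs that were billed anyway."""
    total = 0
    suspicious = []
    for event in events:
        billed = event.get("seconds_billed") or 0
        total += billed
        if event["status"] != "succeeded" and billed > 0:
            suspicious.append(event["id"])
    return total, suspicious


events = [
    {"id": "gen_a", "status": "succeeded", "seconds_billed": 5},
    {"id": "gen_b", "status": "failed", "seconds_billed": 0},
    {"id": "gen_c", "status": "failed", "seconds_billed": 8},
]
total, suspicious = audit_billing(events)
print(total, suspicious)
```

Any ID that lands in the second list is a job you paid for without getting a clip, which is the evidence to bring to a billing dispute.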

Can I use generated videos commercially?

Hosted providers (Google, Kling, Runway, MiniMax) grant commercial usage rights on paid tiers, and none of the majors watermark paid API output. Open-weight models vary: Wan's license permits commercial use, others restrict above certain scale. Whatever you ship, keep the generation metadata; provenance requirements (C2PA labeling) are tightening across ad platforms.

What resolution and length can these APIs produce?

The current hosted standard is 1080p at 5 to 15 seconds per generation, with extension endpoints chaining clips toward 30 to 120 seconds. Nova Reel does multi-shot two-minute videos at 720p. True 4K generation is not commercially standard yet; teams upscale 1080p output when they need it.

Can I self-host a video generation model?

Yes: Hunyuan Video and Wan publish open weights that run on a single high-memory GPU (think H100 or a rented A100-class card). You give up managed scaling, safety filtering and the newest checkpoints, and gain unlimited generation at fixed hardware cost. Break-even versus API pricing typically lands around several thousand clips per month.
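The break-even point is a single division, sketched below. `break_even_clips` is an illustrative helper, and the $2,000 monthly hardware figure in the example is an assumption for the arithmetic, not a quote; the model also ignores power, engineering time and idle capacity, all of which push the real break-even higher.

```python
import math
from decimal import Decimal


def break_even_clips(monthly_hardware_cost, api_rate_per_second, seconds_per_clip=10):
    """Clips per month at which fixed hardware cost equals hosted API spend.

    Pass money values as strings (e.g. "0.05") so Decimal keeps them exact.
    """
    per_clip = Decimal(api_rate_per_second) * seconds_per_clip
    return math.ceil(Decimal(monthly_hardware_cost) / per_clip)


# Assumed $2,000/month of GPU versus a hosted rate of $0.05 per second.
print(break_even_clips("2000", "0.05"))
```

At $0.05 per second a 10-second clip costs $0.50 hosted, so $2,000 of hardware pays for itself at 4,000 clips a month. Below that volume the hosted API is the cheaper option under these assumptions.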

Should I integrate one provider directly or use an aggregator?

Direct if you have negotiated volume pricing with one vendor and stability matters more than flexibility. Aggregator if you want to route between models, hedge deprecations and test quality-per-dollar continuously. The market leaderboard has flipped roughly every quarter since 2024, which is the strongest argument for keeping the model name a string in your config.

What is the best AI video generation API overall?

Runway Gen-4.5 tops quality benchmarks, Veo 3.1 leads the cinematic-plus-audio combination, Kling 3.0 wins price-to-performance, and Wan 2.7 wins pure price. There is no single best, which is the point of this page: match the table above to your use case, or use a unified API and stop betting your roadmap on one vendor.

One key. Every video model.

We're onboarding early-access developers now. Unified schema, per-second billing, webhooks, and model routing across everything in the table above.