How to Write AI Video Prompts | Action, Camera Movement, Duration & Stability

AI video prompts need four things on top of an image prompt: action (what moves), camera (how the camera moves), duration (how long) and stability (how steady the frame stays). This guide gives four shot skeletons and compares Runway, Pika, Kling, Seedance, Sora, Hailuo and Veo.

In this guide

Why video prompts are harder than image prompts
4 shot skeletons
Video prompt structure diagram
Wrong vs. right examples
5 real samples
6 common pitfalls
Model comparison: Runway, Pika, Kling, Sora, Seedance, Hailuo, Veo

Why video prompts are harder than image prompts

An image needs a single frame. A video needs every frame to make physical sense in sequence. Beginners usually hit one of three failure modes:

Abstract action: the model cannot tell whether the subject or the camera should move, so the shot drifts.

Action overload: two or three actions crammed into five seconds. Pick one, or all of them fail.

Missing stability: if you never write "camera holds static", most models default to a slow push-in that ruins the mood you wanted.

So every video prompt should state, explicitly: subject action (one, concrete), camera motion (static / push / pull / track / tilt / orbit), duration (3, 5 or 8 seconds) and stability (no shake / steady gimbal feel).
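A keyword lint is a quick way to enforce this checklist before you submit a prompt. The following is a minimal Python sketch. The keyword lists (`CAMERA_RE`, `STABILITY_WORDS`, `DURATION_RE`) are hypothetical and hand-picked, not any model's official vocabulary. The sketch checks camera, duration and stability only. It leaves the action to you, because a keyword match cannot judge whether an action is concrete.

```python
import re

# Hypothetical vocabularies for this sketch; extend them to match how you write prompts.
CAMERA_RE = re.compile(
    r"\b(static|dolly|push(?:es|ing)?|pull(?:s|ing)?|track(?:s|ing)?"
    r"|tilt(?:s|ing)?|orbit(?:s|ing)?|pan(?:s|ning)?)\b"
)
STABILITY_WORDS = ("no shake", "no camera shake", "no jitter", "steady", "gimbal")
# Matches phrases such as "5-second", "over 8 seconds", "3–5 seconds", "6 s".
DURATION_RE = re.compile(r"\b\d+(?:\s*[-–]\s*\d+)?[\s-]*(?:seconds?|s)\b")


def missing_elements(prompt: str) -> list[str]:
    """Return which of camera, duration and stability the prompt never states."""
    text = prompt.lower()
    missing = []
    if not CAMERA_RE.search(text):
        missing.append("camera")
    if not DURATION_RE.search(text):
        missing.append("duration")
    if not any(word in text for word in STABILITY_WORDS):
        missing.append("stability")
    return missing


print(missing_elements("a girl walking in a forest, dreamy, amazing 4k"))
# ['camera', 'duration', 'stability']
```

An empty list means the three checkable elements are present. It does not mean the prompt will succeed.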

4 shot skeletons

Atmospheric still (highest success rate)

[scene + subject] + camera holds static + gentle [micro motion: drift, sway, ripple] + [time of day + light] + no shake

Example: a single raindrop falling on a glass window at night, camera holds static, neon lights blurred in the background, gentle vertical impact ripple, no camera shake. At 3–5 seconds, almost any model renders this cleanly.

Slow dolly in

[scene] + slow steady dolly in toward [subject] + [motion direction] + over 5 seconds + cinematic pacing + no jitter

"Slow", "steady", "over X seconds" are the load-bearing words. Skipping duration leaves the model to invent a default push speed, which is usually too fast.

Tracking shot

[subject walking/moving] + camera tracks horizontally to the right at the same pace + [environment scrolling past] + medium shot + steady gimbal feel

The phrase "at the same pace" fixes the most common tracking failure: the subject and the camera moving at different speeds, so one outruns the other.

Action close-up

close-up of [subject performing single action] + [single specific motion verb: pours, lifts, turns] + slow motion 120fps look + shallow depth of field + camera holds static

One verb only: "pours and stirs and lifts" breaks almost every model.
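You can also lint the one-verb rule. This is a rough heuristic sketch. `ACTION_STEMS` is a short hypothetical list. Matching on word stems will miss verbs the list does not contain, and it can misfire on unrelated words that happen to share a stem.

```python
import re

# Hypothetical stem list for this sketch; a real lint would need a far larger verb list.
ACTION_STEMS = ("pour", "stir", "lift", "turn", "sit", "pick", "drink", "walk", "drip", "slid")
STEM_RE = re.compile(r"\b(" + "|".join(ACTION_STEMS) + r")\w*", re.IGNORECASE)


def action_stems(prompt: str) -> list[str]:
    """Distinct action stems found in the prompt, in order of first appearance."""
    found: list[str] = []
    for match in STEM_RE.finditer(prompt):
        stem = match.group(1).lower()
        if stem not in found:
            found.append(stem)
    return found


def is_single_action(prompt: str) -> bool:
    """True when at most one known action verb appears."""
    return len(action_stems(prompt)) <= 1


print(action_stems("She walks in, sits down, picks up the cup, drinks"))
# ['walk', 'sit', 'pick', 'drink']
```

Repeating the same verb ("pouring ... pouring only") counts once, so emphasis phrases like "single action of pouring only" still pass.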

Video prompt structure diagram

action: a barista slowly pours espresso into a glass cup

camera: camera holds static / slow dolly in / tracks left

duration: over 5 seconds / 8-second clip

speed: slow motion 120fps look / real time

stability: no camera shake / steady gimbal feel

scene: warm window light · cafe interior

style: cinematic, shallow depth of field
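The diagram maps directly onto a slot-based builder. The sketch below uses a hypothetical helper that is not part of any model's API. It joins the filled slots in the diagram's order and refuses to build a prompt when action, camera, duration or stability is empty. The order is only a readability convention. Nothing here assumes that models weight slots by position.

```python
# Slot names follow the structure diagram; the order is a readability convention.
SLOT_ORDER = ("action", "camera", "duration", "speed", "stability", "scene", "style")
REQUIRED = ("action", "camera", "duration", "stability")


def build_prompt(slots: dict[str, str]) -> str:
    """Join filled slots in diagram order; fail loudly if a required slot is empty."""
    missing = [name for name in REQUIRED if not slots.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing required slots: {', '.join(missing)}")
    parts = [slots[name].strip() for name in SLOT_ORDER if slots.get(name, "").strip()]
    return ", ".join(parts)


print(build_prompt({
    "action": "a barista slowly pours espresso into a glass cup",
    "camera": "camera holds static",
    "duration": "over 5 seconds",
    "stability": "no camera shake",
    "scene": "warm window light, cafe interior",
    "style": "cinematic, shallow depth of field",
}))
```

Optional slots such as speed are simply skipped when left out, so the builder covers both the minimal four-element prompt and the fully specified one.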

Wrong vs. right examples

✗ Wrong

a beautiful cinematic video of a girl walking in a forest, magical, dreamy, stunning, amazing 4k

No camera motion, no duration, no stability cue, and the only action is the vague "walking". Ten runs give ten different shots.

✓ Right

a young woman in a wool coat walks slowly forward through a misty pine forest, camera tracks horizontally to the right at the same walking pace, 5-second shot, soft morning backlight, cinematic shallow depth of field, steady gimbal feel, no camera shake

Walking speed (slowly), camera motion (tracks horizontally), sync (same pace), duration (5-second), light (backlight) and stability (no shake) are all present. Ten runs give far more consistent shots.

5 real samples

Sample 1 · Rain-night still (Runway Gen-3 / Seedance)

a single raindrop slides down a foggy window at night, camera holds static, neon city lights blurred in the background, slow motion 120fps look, gentle vertical motion only, shallow depth of field, no camera shake, 5-second clip

The easiest reliable shot: static camera + single micro motion. Works on essentially every model.

Sample 2 · Coffee pour (Kling / Pika)

close-up of a barista's hands slowly pouring espresso into a glass cup, warm cafe interior blurred behind, single action of pouring only, camera holds static, soft side light from the right, real-time pacing, 4-second clip

"Single action of pouring only" prevents the model from adding stirs, lifts and set-downs.

Sample 3 · Tokyo tracking shot (Runway Gen-3)

a young woman in a black trench coat walks forward through a rain-soaked Tokyo alley at night, camera tracks horizontally to the right at the same walking pace, neon reflections on wet ground, shallow depth of field, steady gimbal feel, 6-second cinematic shot

"At the same walking pace" stops the camera from outpacing the subject.

Sample 4 · Food macro (Seedance / Hailuo)

extreme close-up of melted chocolate slowly dripping onto a glossy croissant, camera holds completely static, single dripping motion, warm side light, shallow depth of field, slow motion 120fps look, 3-second clip

Macro food shots work best as a static camera plus a single slow action. Slow motion helps the viewer read the texture of the material.

Sample 5 · Drone push (Kling / Sora)

misty mountain valley at sunrise, slow steady drone dolly forward over the treetops, sunlight breaking through clouds, very gentle pacing over 8 seconds, cinematic wide shot, no jitter, smooth motion

"Slow steady drone dolly forward" tells the model the rig and direction; "over 8 seconds" controls speed.

6 common pitfalls

Pitfall 1 · Vague action verb

"Moves", "walks", "interacts" are too soft. Use concrete verbs: "slowly pours", "lifts the cup", "turns the head to the left".

Pitfall 2 · Multiple actions stuffed in

"She walks in, sits down, picks up the cup, drinks" in 5 seconds always fails. One core action per shot.

Pitfall 3 · No duration or speed

State the clip length ("3-second", "5-second" or "8-second clip") plus a pacing word ("slowly", "steady", "real-time").

Pitfall 4 · No stability cue

Without "no shake" or "steady", many models drift or add handheld-style jitter, which destroys a still-life mood.

Pitfall 5 · Subject and camera both moving fast

Heavy subject motion + heavy camera motion almost always collapses. Lock one, move the other.

Pitfall 6 · Reusing image quality keywords

"Masterpiece, 8k, best quality" do almost nothing for video. "Cinematic, shallow depth of field, color grading" is enough.

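If you are porting prompts from an image workflow, a small filter can drop these boosters before they reach a video model. This is a minimal sketch. It assumes the boosters appear as their own comma-separated phrases. `IMAGE_BOOSTERS` is an illustrative set holding only the three terms named above, so a phrase like "amazing 4k" passes through untouched.

```python
# Illustrative set: the three image-era boosters named in the pitfall above.
IMAGE_BOOSTERS = {"masterpiece", "8k", "best quality"}


def strip_image_boosters(prompt: str) -> str:
    """Drop comma-separated phrases that exactly match an image quality booster."""
    parts = [part.strip() for part in prompt.split(",")]
    kept = [part for part in parts if part and part.lower() not in IMAGE_BOOSTERS]
    return ", ".join(kept)


print(strip_image_boosters("misty forest at dawn, Masterpiece, 8k, best quality, cinematic"))
# misty forest at dawn, cinematic
```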
Model comparison

Model	Typical duration	Strengths	Notes
Runway Gen-3	5–10 s	Cinematic, tracking shots	Strong action continuity, sensitive to camera-motion words
Pika 2.x	3–5 s	Short atmospheric clips	Likes short, simple action descriptions
Kling 2.x	5–10 s	People performance, ads	Excellent at non-English prompts
Seedance 2.0	5–8 s	Widescreen cinematic	See the Seedance page on this site
Sora	10–60 s	Long narrative shots	Prompts read more like a natural description
Hailuo / MiniMax	5–6 s	People, landscapes	Very friendly to long descriptive sentences
Veo 2 / Veo 3	5–8 s	Cinematic quality	Google ecosystem
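To keep the duration phrase within what each tool typically produces, you can clamp the requested length to the ranges in the table. The dictionary below simply copies the table's typical values. These are not official API limits, and the model names are plain labels rather than API identifiers.

```python
# Typical clip lengths in seconds, copied from the comparison table.
# These are typical values from this guide, not official API limits.
TYPICAL_SECONDS = {
    "Runway Gen-3": (5, 10),
    "Pika 2.x": (3, 5),
    "Kling 2.x": (5, 10),
    "Seedance 2.0": (5, 8),
    "Sora": (10, 60),
    "Hailuo / MiniMax": (5, 6),
    "Veo 2 / Veo 3": (5, 8),
}


def duration_phrase(model: str, wanted_seconds: int) -> str:
    """Clamp the requested length into the model's typical range and phrase it for a prompt."""
    low, high = TYPICAL_SECONDS[model]
    seconds = max(low, min(high, wanted_seconds))
    return f"{seconds}-second clip"


print(duration_phrase("Pika 2.x", 8))  # 5-second clip
```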

Reality check: Even the best video models miss roughly half the time. Plan for 3–5 attempts per shot, keep prompts short, actions singular, and stability explicit.

Frequently asked questions

How long can AI videos be?

3–10 seconds is the reliable range across Runway, Pika, Kling, Seedance, Hailuo and Veo. Sora goes up to 60 seconds but prompt control gets harder as duration grows.

Do video models need negative prompts?

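Support varies. Some tools offer a separate negative-prompt field, but it often has little effect. The more reliable approach is to write the constraint into the main prompt itself: "no camera shake", "single action only", "no extra motion".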
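In practice, this means folding your list of unwanted elements into the main prompt as explicit "no ..." clauses. Here is a minimal sketch. `fold_constraints` is a hypothetical helper name, not a feature of any model's API.

```python
def fold_constraints(prompt: str, avoid: list[str]) -> str:
    """Append each unwanted element to the main prompt as an explicit "no ..." clause."""
    clauses = [f"no {item.strip()}" for item in avoid if item.strip()]
    return ", ".join([prompt.strip(), *clauses])


print(fold_constraints("a raindrop slides down a foggy window, camera holds static",
                       ["camera shake", "extra motion"]))
# a raindrop slides down a foggy window, camera holds static, no camera shake, no extra motion
```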

How do I keep a character consistent across multiple shots?

Today the safest path is to drive each shot from the same reference image (image-to-video) and lock the seed when supported. Pure text consistency across shots is unreliable.

Can I write video prompts in languages other than English?

Kling and Hailuo handle Chinese exceptionally well; Runway, Pika and Sora prefer English.
