2026-06-16

How to Write Seedance 2.0 Prompts: Shot Structure + the 8-Element Formula

Writing Seedance 2.0 prompts is not about flowery words — it is about structure. The model is best understood as a "multimodal AI director" that splits a shot into a space layer (what is in frame) and a time layer (how it changes over time). So a good prompt is an engineering instruction — who, where, doing what, shot how, in what order. This piece lays out how to write the text prompt along the officially recommended structure, and how VideoLens generates that structure from a reference video. (Text prompts only — multimodal reference-to-video is out of scope here.)

I. The 8-element formula

The recommended formula: precise subject + action detail + scene/environment + light & color + camera move + visual style + image quality + constraints. Lock "who is doing what" first, then "where and what mood," then "how it is shot," and finally tighten the result with style, quality and constraints.

Element	What to write	Example
Subject	2–3 stable static traits (wardrobe/hair/look/class)	a woman in a red dress and straw hat
Action	down to limbs + range/speed/force	slowly raises a hand, dips her head
Scene	the setting / position / spatial relation	a dorm corridor at dusk
Light & color	the light and color tone of the frame	warm sunlight through the window, soft light
Camera move	standard terms, one move per shot	steady medium tracking, slow push-in
Visual style	art style and overall tone	cinematic documentary / fresh anime / 3D
Quality	sharpness, detail, texture	HD, cinematic, soft light
Constraints	bound the result, avoid artifacts	no subtitles, no logo/watermark, no face warp
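
As a rough illustration of the formula, the sketch below joins the eight elements in the recommended order. The `ShotElements` class and `build_prompt` function are hypothetical names made up for this example, not part of any Seedance or VideoLens API, and the comma-joined output is only one plausible way to lay the elements out.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotElements:
    """The eight elements, declared in the order the formula recommends."""
    subject: str
    action: str
    scene: str
    light_color: str
    camera_move: str
    visual_style: str
    quality: str
    constraints: str


def build_prompt(elements: ShotElements) -> str:
    """Join the non-empty elements in formula order, separated by commas."""
    parts = [getattr(elements, f.name).strip() for f in fields(elements)]
    return ", ".join(p for p in parts if p)


# Values taken from the example column of the table above.
example = ShotElements(
    subject="a woman in a red dress and straw hat",
    action="slowly raises a hand, dips her head",
    scene="a dorm corridor at dusk",
    light_color="warm sunlight through the window, soft light",
    camera_move="slow push-in",
    visual_style="cinematic documentary",
    quality="HD, rich detail",
    constraints="no subtitles, no logo/watermark",
)

print(build_prompt(example))
```

Keeping the fields in a fixed order means the subject and action always come first and the constraints always close the prompt, which mirrors the "lock who is doing what first, tighten last" advice.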

II. Space × time: why shot sequencing

The model decouples space and time internally, so the ideal prompt for a complex video is a timeline of shots: split it into several shots and describe each in event order. A vague one-liner such as "a man runs nervously down the street, very cinematic" is far weaker than the same scene broken into Shot 1 / Shot 2 / Shot 3.

Organize each shot as: ① camera move or cut → ② subject action & expression → ③ position / spatial change → ④ audio (SFX / voice / BGM).

Per the official guide, the model handles exact timings (e.g. 0–3s) unreliably. Do not force per-shot durations; order the shots as "Shot 1 / 2 / 3" and let the pacing emerge.
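
The shot ordering described above can be sketched as a small helper that numbers shots and keeps the four slots in a fixed order. The `Shot` type and `render_shots` function are hypothetical, and the three example shots are an illustrative breakdown written for this sketch, not text from the official guide. Note that no durations are emitted, only "Shot N" labels.

```python
from typing import List, NamedTuple


class Shot(NamedTuple):
    camera: str      # 1. camera move or cut
    action: str      # 2. subject action and expression
    position: str    # 3. position / spatial change
    audio: str = ""  # 4. audio (SFX / voice / BGM), optional


def render_shots(shots: List[Shot]) -> str:
    """Number the shots in event order; skip any empty slot."""
    lines = []
    for number, shot in enumerate(shots, start=1):
        parts = [part for part in shot if part]
        lines.append(f"Shot {number}: " + ", ".join(parts))
    return "\n".join(lines)


# Illustrative breakdown of the vague one-line street scene.
street_scene = [
    Shot("wide shot, locked-off",
         "a man walks quickly and glances over his shoulder",
         "moving from the far end of the street toward the camera",
         "<hurried footsteps on pavement>"),
    Shot("cut to close-up",
         "his jaw tightens, eyes darting left and right",
         "he stops at the corner"),
    Shot("medium shot, slow push-in",
         "he lets out a long exhale, shoulders loosening",
         "leaning against the wall"),
]

print(render_shots(street_scene))
```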

III. Writing action (four rules)

Rule	How	Example
Specific + quantified	name the limb + range/speed/force	slowly raise hand, quick head turn
Prefer slow, small moves	avoid sprinting/leaping/violent rolls	walk slowly, sit down naturally
Add transitions	state the momentum that links one move to the next	as she turns, her arm rises with the momentum of the turn
Externalize emotion	use body detail, not "very sad/angry"	see the table below

Emotion externalization — translating abstract feeling into filmable detail:

Feeling	Externalized as action & detail
Sadness	head down, shoulders trembling, reddened eyes, fingers gripping the hem, tears welling but not falling
Joy	an irrepressible smile, relaxed brow, light steps, a little spin
Anxiety	checking the watch, drumming fingers, quick breath, darting eyes, nail-biting
Anger	clenched fists, tight jaw, heaving chest, knife-like stare, words forced through teeth
Relief	a long exhale, shoulders loosening, a faint smile, gaze lifting to the distance
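
One way to apply the table while drafting is a simple lookup that swaps an abstract feeling for a few filmable details. This is a minimal sketch: the `EXTERNALIZED` dictionary just copies the rows above, and `externalize` and its `max_details` parameter are hypothetical names chosen for this example.

```python
# Rows copied from the emotion table above.
EXTERNALIZED = {
    "sadness": ["head down", "shoulders trembling", "reddened eyes",
                "fingers gripping the hem", "tears welling but not falling"],
    "joy": ["an irrepressible smile", "relaxed brow", "light steps",
            "a little spin"],
    "anxiety": ["checking the watch", "drumming fingers", "quick breath",
                "darting eyes", "nail-biting"],
    "anger": ["clenched fists", "tight jaw", "heaving chest",
              "knife-like stare", "words forced through teeth"],
    "relief": ["a long exhale", "shoulders loosening", "a faint smile",
               "gaze lifting to the distance"],
}


def externalize(feeling: str, max_details: int = 2) -> str:
    """Return the first few body details for a feeling, comma-separated."""
    details = EXTERNALIZED[feeling.strip().lower()]
    return ", ".join(details[:max_details])


print(externalize("Relief"))    # a long exhale, shoulders loosening
print(externalize("anger", 3))  # clenched fists, tight jaw, heaving chest
```

Capping the number of details keeps each shot's action short, in line with the "slow, small moves" rule.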

IV. Camera, quality and constraints

Use standard camera terms directly — the model reads them well: medium, close-up, wide, slow push-in, steady pan, locked-off. Note: keep to one move per shot; combining push/pull/pan/tilt destabilizes the image.
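
The one-move-per-shot rule is easy to check mechanically before submitting a prompt. The sketch below is a rough lint under stated assumptions: `CAMERA_MOVES` is a hypothetical keyword list that you would extend with your own vocabulary, and simple word matching will miss paraphrased moves.

```python
import re
from typing import List

# Hypothetical keyword list of camera moves; framing terms such as
# "medium" or "close-up" are deliberately left out.
CAMERA_MOVES = ("push-in", "pull-out", "pan", "tilt", "tracking")


def camera_moves_in(shot_text: str) -> List[str]:
    """Return the known camera moves that appear as whole words."""
    text = shot_text.lower()
    return [move for move in CAMERA_MOVES
            if re.search(rf"\b{re.escape(move)}\b", text)]


def has_single_move(shot_text: str) -> bool:
    """True when the shot names at most one camera move."""
    return len(camera_moves_in(shot_text)) <= 1


print(has_single_move("medium shot, slow push-in"))
print(camera_moves_in("slow push-in, then pan left and tilt up"))
```

A shot that fails the check is usually best split into two shots, each with its own single move.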

The closing trio — quality, style and constraints — tightens the output:

Type	Purpose	Template / example
Quality	sets sharpness & texture	HD, rich detail, cinematic, soft light
Style	unifies the art direction	cyberpunk teal-purple, retro film, fresh anime, 3D
Constraints	avoids artifacts & leftover marks	no subtitles / no text / no logo / no watermark

V. Audio & dialogue symbols

Seedance 2.0 natively co-generates audio and video; fixed symbols mark the type of information so the model parses it correctly:

Type	Symbol	Example
Music	（）	（upbeat rock plays in the background）
SFX	<>	<a dog barks in the distance>
Dialogue	{}	{hello world}; for languages other than Chinese or English, name the language, e.g. in Japanese say {こんにちは}
Caption	【】	【Chapter 1: Departure】

A few dialogue tips: keep one language per line (proper nouns aside); the model misreads rare/polyphonic Chinese characters — swap in a common homophone (e.g. 螭龙山 → 吃龙山); and add a "no subtitles" constraint if you do not want captions.
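
To avoid mistyping the symbols, the convention can be wrapped in small helpers. This is a minimal sketch: the function names are hypothetical, and the "in <language> say {...}" phrasing simply follows the Japanese example in the table.

```python
from typing import Optional


def music(description: str) -> str:
    """Wrap background music in full-width parentheses."""
    return f"（{description}）"


def sfx(description: str) -> str:
    """Wrap a sound effect in angle brackets."""
    return f"<{description}>"


def dialogue(line: str, language: Optional[str] = None) -> str:
    """Wrap spoken dialogue in braces; name the language when given."""
    tagged = "{" + line + "}"
    return f"in {language} say {tagged}" if language else tagged


def caption(text: str) -> str:
    """Wrap an on-screen caption in lenticular brackets."""
    return f"【{text}】"


print(music("upbeat rock plays in the background"))
print(sfx("a dog barks in the distance"))
print(dialogue("こんにちは", language="Japanese"))
print(caption("Chapter 1: Departure"))
```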

VI. Generate this structure with VideoLens

Writing this by hand means typing every shot. VideoLens runs it in reverse — give it a reference video and its Creation Assistant breaks it down shot by shot and outputs Seedance 2.0 prompts in exactly this structure:

· anchors the recurring characters, scenes and props as reusable entities;
· generates a per-shot prompt in shot order (camera move + subject action + scene & light);
· closes with a style tail that unifies quality and tone, defaulting to "no subtitles, no logo/watermark";
· separates dialogue, SFX and BGM and maps each onto its shot.

In short: you do not start from a blank page — VideoLens hands you a shot list you can tweak directly.

The prompt methodology in this article is compiled from ByteDance's official Doubao Seedance 2.0 prompt guide; specs, phrasing and terminology follow the official documentation.

A Seedance 2.0 prompt is essentially a shot-level "director instruction." Once you internalize the space + time layers, the 8 elements, shot sequencing and the symbol convention, you can write prompts that generate reliably — and when you want to skip the manual work, hand a reference video to VideoLens for a ready-to-tweak shot list.