Cinematic AI Video Prompting: 13-Layer Framework

Layer Breakdown

Every Layer Explained

Each layer is its own instruction set. Here is what was written for this generation and why each layer matters.

Layer 1

Style

The style layer is the global art direction. It tells the model what kind of visual world to render before anything else is described. Every subsequent layer is interpreted through this lens.

Ultra realistic live action cinematic style. Dark comedy with horror undertones. Multi shot Hollywood action montage with expensive cinematic coverage, not a single take. Photorealistic, 45mm film quality, ARRI camera, sharp focus, high detail texture, natural film grain, strong depth of field control, professional color grading. The tone should balance absurd humor with grotesque body horror. Grounded, gritty, cinematic, and visually premium.

Why it matters

Without the style layer, the model defaults to its own interpretation. Specifying "ARRI camera," "45mm film quality," and "multi shot montage" tells it this is not an animation, not a single clip, and not stylized. It is grounded live-action cinema. The genre tags (dark comedy, body horror) set the tonal register for every frame.

Layer 2

Environment

The environment is the set. Not just a location name, but textures, spatial depth, time of day, and atmospheric conditions. The model builds the physical world from these details.

A white pickup truck parked under a concrete overpass at dusk. A shallow river channel stretches behind it. Power lines and distant bridges frame the fading golden sky. The space should feel urban, open, and slightly empty, with long shadows, concrete textures, dry riverbed ground, and a calm evening atmosphere before the chaos erupts. The final commercial style burger shot happens in the same environment, with destroyed fast food packaging left behind after the attack.

Why it matters

"Under a concrete overpass at dusk" gives the model architecture, lighting conditions, and mood in one phrase. "Long shadows, concrete textures, dry riverbed ground" are rendering instructions. Every adjective is a texture the model has to produce. The more specific your environment, the less the model invents on its own.

Layer 3

Character

The character layer defines appearance, wardrobe, posture, and emotional energy. It covers not just what the character looks like, but how they behave.

Actor Reference

Upload Your Own Face

Inside Enhancor, you can upload your own face as a reference in the Influencers upload slot. The model uses your photo to guide the look of the generated character, matching skin tone, hair, and facial features. This is how the actor in this video was directed.

A dark haired bearded man in a black crewneck and jeans. He is relaxed, casual, and completely unfazed by danger. He sits cross legged on the hood of the truck, lazily eating a burger like he does not care about anything happening around him. His energy is mildly annoyed at worst, never panicked. After the attack, he returns to the exact same calm attitude, picks the burger back up, and keeps eating as if nothing happened.

Why it matters

"Mildly annoyed at worst, never panicked" is a performance direction. It constrains the model from generating dramatic facial expressions. "Returns to the exact same calm attitude" tells the model the emotional arc: flat line, not a curve. This is how you get deadpan comedy from an AI model.

Layer 4

Creature Form

When your scene includes transformation or non-human elements, this layer defines the visual design, scale, and physicality of the creature.

A massive pale tusked creature with elongated limbs, clawed hands, distorted proportions, and a grotesque expanding body. The transformation should feel violent, fleshy, physical, and cinematic. No glowing magic, no fantasy energy effects. Pure body horror, bone cracking, stretching, tearing, and monstrous scale.

Why it matters

"No glowing magic, no fantasy energy effects" is a negative constraint. It tells the model what NOT to do, which is just as important as what to do. Without this, the model might add magical particle effects that break the grounded realism established in the style layer.

Layer 5

Threat

If there is an antagonist or opposing force, describe it in its own layer, separate from the main character, so the model treats the two as distinct entities.

A pale zombie with wet dark hair, bruised eyes, and a blood stained white shirt. It sprints out from the shadows of the river channel with jerky unnatural movement, aggressive and feral.

Why it matters

"Jerky unnatural movement" is a physics instruction disguised as a character description. It tells the model the zombie does not move like a normal human. This is how you get uncanny movement in AI video: describe the motion quality inside the character definition.

Layer 6

Core Action

The action layer is the plot summary in one sentence. It chains every beat of the scene with arrows to show sequence and flow.

Casually eating a burger on the truck hood → noticing the zombie with mild annoyance → setting the burger down → dropping from the hood → violently transforming into a massive tusked creature → devouring the zombie whole → shrinking back into human form → calmly returning to the burger → ending on a polished fast food commercial style zoom in

Why it matters

The arrow chain is a storyboard compressed into text. The model is meant to read it as a sequence of events that must happen in order. Each arrow marks a beat boundary, and many of those boundaries become cut points in the camera and timeline layers. This is the spine of the entire generation, and the timeline layer expands on it with precise timing.
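If you keep the core action as a plain string, a few lines of code can split it into numbered beats, which makes it easy to check the chain against the camera and timeline layers. This is a minimal sketch; `split_beats` is a hypothetical helper, not part of Enhancor or any video model's API. It accepts both the `→` arrow used here and the `>` separator used in the full prompt.

```python
import re

# The Layer 6 core action, copied verbatim.
CORE_ACTION = (
    "Casually eating a burger on the truck hood → noticing the zombie with mild annoyance → "
    "setting the burger down → dropping from the hood → "
    "violently transforming into a massive tusked creature → devouring the zombie whole → "
    "shrinking back into human form → calmly returning to the burger → "
    "ending on a polished fast food commercial style zoom in"
)


def split_beats(chain: str) -> list[str]:
    """Split an arrow-chained action string into ordered beats.

    Accepts either the Unicode arrow (→) or a plain '>' as the separator.
    """
    parts = re.split(r"\s*(?:→|>)\s*", chain)
    # Drop empty fragments left by leading or trailing separators.
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    for i, beat in enumerate(split_beats(CORE_ACTION), start=1):
        print(f"{i}. {beat}")
```

Numbering the beats this way shows at a glance that the chain has more beats than the camera layer has shots, so some beats share a shot.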

Layer 7

Energy

Energy is the emotional rhythm of the scene. Not what happens, but how it feels. This is your tone director.

Deadpan calm interrupted by explosive grotesque violence. The humor comes from how little the man cares. The horror comes from the sudden brutality and unnatural creature transformation. The final beat should feel absurdly polished and commercial after all the chaos.

Why it matters

Energy is the most abstract layer but one of the most powerful. "Deadpan calm interrupted by explosive grotesque violence" is a rhythm instruction. It tells the model the pacing: slow, slow, slow, EXPLOSION, slow again. Without this, the model might generate consistent energy throughout.

Layer 8

Camera

The camera layer is your cinematographer. Shot types, angles, movement quality, and how the camera behaves at each story beat.

Multi shot montage. Never one camera angle, never one continuous cut. Handheld shake throughout. Shot language:
- medium hood shot for calm setup
- wide threat reveal
- close up reaction shot
- medium transformation shot
- wide low angle monster attack
- medium return to human form
- slow commercial push in on burger

The camera should sway gently at first, then become more unstable and reactive during the zombie sprint and transformation, then become smooth and polished in the final burger hero shot.

Why it matters

Listing specific shot types for each beat is like handing the model a shot list. "Wide low angle monster attack" is not generic. It specifies the angle (low), the width (wide), and the content (monster attack). The camera behavior description ("sway gently at first, then become unstable") creates a dynamic arc within the cinematography itself.

Layer 9

Lighting

Lighting is what makes the image feel real. This layer defines the quality, direction, color, and changes in light throughout the scene.

Cinematic dusk lighting with a soft golden evening sky filtered through concrete shadow. Naturalistic live action lighting only.
- golden dusk highlights on skin, truck, and burger
- deep concrete shadows under the overpass
- dark channel shadows where the zombie emerges
- high contrast texture during the creature sequence
- final burger shot with polished appetizing highlights and shallow depth of field
- destroyed packaging softly blurred in the background

Why it matters

"Naturalistic live action lighting only" is a guardrail. It prevents the model from adding dramatic fantasy lighting during the creature sequence. The bullet points map specific light qualities to specific objects, giving the model per-element lighting instructions rather than a generic "warm sunset" direction.

Layer 10

Physics

Physics tells the model how objects and bodies interact with the world. Weight, momentum, texture, and physical behavior.

The man chews naturally and lazily. The burger has realistic weight and texture. The zombie runs with unstable jerky momentum. The transformation is violent and physical: spine cracking, limbs stretching, torso expanding, jaw splitting open, body mass rising upward. The creature should feel heavy and real when it lands and lunges. The zombie should be grabbed with force and swallowed in one grotesque motion. After the kill, the creature rapidly contracts back into human form with disturbing fluidity. The burger hero shot should feel pristine and absurdly composed in contrast with the destruction behind it.

Why it matters

AI video models often struggle with physics. Describing specific physical sensations like "spine cracking, limbs stretching, torso expanding" gives the model concrete, step-by-step guidance for rendering the transformation. "Heavy and real when it lands" guards against the common AI issue of objects feeling weightless.

Layer 11

Audio

Audio tells the model what the scene should sound like. Even in video models without explicit audio generation, this layer can influence the visual pacing and implied sound design.

Ambient dusk city atmosphere at first. Soft chewing and wrapper movement. Distant environmental hum under the overpass. Sudden zombie footsteps and feral sprint sounds. Violent bone cracking and flesh stretching during transformation. Heavy creature impact and body movement. Grotesque devouring sound at the swallow. Silence or near silence after the kill for comedic contrast. Subtle wrapper handling and chewing again. Final polished commercial tone with slogan delivery: "McDonald's, I'm lovin' it."

Why it matters

"Silence or near silence after the kill for comedic contrast" is a pacing instruction delivered through audio language. The model can read it as a visual beat: stillness, a pause, nothing happening. Audio descriptions can shape the implied rhythm of the video even when the model does not generate actual sound.

Layer 12

Timeline

The timeline maps every action to a specific timestamp. This is the editor's cut list. It controls pacing, duration per beat, and the exact sequence of events.

0 - 3s

Calm Setup

Medium shot of the man sitting cross legged on the hood of the white pickup truck, casually chewing a burger. Golden dusk light catches his face and beard. Camera sways gently.

3 - 5s

Threat Reveal

Wide shot of the concrete channel as the zombie bursts from the shadows, sprinting with jerky unnatural strides. Camera shakes while tracking.

5 - 7s

Reaction

Close up on the man's face. Chewing slows. Eyebrows rise with mild annoyance. He calmly sets the burger down on the hood beside him.

7 - 10s

Transformation

The man drops off the hood and his body violently expands into the massive pale tusked creature. Spine cracking, limbs stretching, jaws splitting open. Camera jolts with each bone snap.

10 - 12s

Attack

Wide low angle as the creature lunges and catches the charging zombie, lifts it, and swallows it whole. Jaw unhinges unnaturally. Camera shudders with impact.

12 - 14s

Return to Calm

The creature rapidly shrinks back into the man. He hops back on the hood, picks up the burger, takes another bite, and keeps chewing as if nothing happened.

14 - 15s

Commercial Punchline

Slow zoom into the burger. Dramatic food commercial hero shot. Sharp focus, shallow depth of field, glossy highlights. Destroyed packaging blurred in background. Text: "McDonald's, I'm lovin' it."

Why it matters

Timestamps are the single most powerful control mechanism. 3 seconds for setup, 2 seconds for the threat reveal, 3 seconds for the transformation. Each beat is assigned the screen time it needs. Without timestamps, the model decides the pacing on its own, and key moments can get cut short or dragged out.
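Beat durations are easy to get wrong when you edit a timeline by hand. A short script can confirm that the beats are contiguous, have positive length, and add up to the clip length. This is an illustrative sketch: `Beat` and `check_timeline` are hypothetical helpers, and the 15 second target comes from this example's timeline, not from any model limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beat:
    start: float  # seconds
    end: float    # seconds
    name: str

    @property
    def duration(self) -> float:
        return self.end - self.start


# Beats copied from the Layer 12 timeline above.
TIMELINE = [
    Beat(0, 3, "Calm Setup"),
    Beat(3, 5, "Threat Reveal"),
    Beat(5, 7, "Reaction"),
    Beat(7, 10, "Transformation"),
    Beat(10, 12, "Attack"),
    Beat(12, 14, "Return to Calm"),
    Beat(14, 15, "Commercial Punchline"),
]


def check_timeline(beats: list[Beat], total: float) -> None:
    """Raise ValueError if the beats leave gaps, overlap, or miss the target length."""
    if not beats or beats[0].start != 0:
        raise ValueError("timeline must start at 0s")
    for beat in beats:
        if beat.duration <= 0:
            raise ValueError(f"'{beat.name}' has no screen time")
    # Each beat must start exactly where the previous one ended.
    for prev, cur in zip(beats, beats[1:]):
        if cur.start != prev.end:
            raise ValueError(f"gap or overlap between '{prev.name}' and '{cur.name}'")
    if beats[-1].end != total:
        raise ValueError(f"timeline ends at {beats[-1].end}s, expected {total}s")


if __name__ == "__main__":
    check_timeline(TIMELINE, total=15)
    for beat in TIMELINE:
        print(f"{beat.start:>4}-{beat.end:<4} {beat.duration}s  {beat.name}")
```

Running the check before pasting a prompt catches the typical hand-editing mistake of stretching one beat without shifting the ones after it.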

Layer 13

Style Boosters

Style boosters are keyword tags that reinforce the overall aesthetic. They act as a final reminder to the model of the visual and tonal priorities.

multi shot montage, handheld cinematic chaos, dark comedy timing, grotesque body horror, deadpan performance, golden dusk underpass mood, urban emptiness, 35mm film texture, premium Hollywood coverage, commercial style burger ending, absurd contrast between horror and advertising

Why it matters

Boosters are the TL;DR for the model. If it loses track of the detailed instructions in earlier layers, the boosters pull it back to the core aesthetic. Think of them as weighted tags that reinforce priority keywords across the entire prompt.

Full Prompt

Copy the Entire Prompt

The complete 13-layer prompt in one block, ready to paste into your video model.

STYLE Ultra realistic live action cinematic style. Dark comedy with horror undertones. Multi shot Hollywood action montage with expensive cinematic coverage, not a single take. Photorealistic, 45mm film quality, ARRI camera, sharp focus, high detail texture, natural film grain, strong depth of field control, professional color grading. The tone should balance absurd humor with grotesque body horror. Grounded, gritty, cinematic, and visually premium.

ENVIRONMENT A white pickup truck parked under a concrete overpass at dusk. A shallow river channel stretches behind it. Power lines and distant bridges frame the fading golden sky. The space should feel urban, open, and slightly empty, with long shadows, concrete textures, dry riverbed ground, and a calm evening atmosphere before the chaos erupts. The final commercial style burger shot happens in the same environment, with destroyed fast food packaging left behind after the attack.

CHARACTER A dark haired bearded man in a black crewneck and jeans. He is relaxed, casual, and completely unfazed by danger. He sits cross legged on the hood of the truck, lazily eating a burger like he does not care about anything happening around him. His energy is mildly annoyed at worst, never panicked. After the attack, he returns to the exact same calm attitude, picks the burger back up, and keeps eating as if nothing happened.

CREATURE FORM A massive pale tusked creature with elongated limbs, clawed hands, distorted proportions, and a grotesque expanding body. The transformation should feel violent, fleshy, physical, and cinematic. No glowing magic, no fantasy energy effects. Pure body horror, bone cracking, stretching, tearing, and monstrous scale.

THREAT A pale zombie with wet dark hair, bruised eyes, and a blood stained white shirt. It sprints out from the shadows of the river channel with jerky unnatural movement, aggressive and feral.

CORE ACTION Casually eating a burger on the truck hood > noticing the zombie with mild annoyance > setting the burger down > dropping from the hood > violently transforming into a massive tusked creature > devouring the zombie whole > shrinking back into human form > calmly returning to the burger > ending on a polished fast food commercial style zoom in

ENERGY Deadpan calm interrupted by explosive grotesque violence. The humor comes from how little the man cares. The horror comes from the sudden brutality and unnatural creature transformation. The final beat should feel absurdly polished and commercial after all the chaos.

CAMERA Multi shot montage. Never one camera angle, never one continuous cut. Handheld shake throughout. Shot language: medium hood shot for calm setup, wide threat reveal, close up reaction shot, medium transformation shot, wide low angle monster attack, medium return to human form, slow commercial push in on burger. The camera should sway gently at first, then become more unstable and reactive during the zombie sprint and transformation, then become smooth and polished in the final burger hero shot.

LIGHTING Cinematic dusk lighting with a soft golden evening sky filtered through concrete shadow. Naturalistic live action lighting only. Golden dusk highlights on skin, truck, and burger. Deep concrete shadows under the overpass. Dark channel shadows where the zombie emerges. High contrast texture during the creature sequence. Final burger shot with polished appetizing highlights and shallow depth of field. Destroyed packaging softly blurred in the background.

PHYSICS The man chews naturally and lazily. The burger has realistic weight and texture. The zombie runs with unstable jerky momentum. The transformation is violent and physical: spine cracking, limbs stretching, torso expanding, jaw splitting open, body mass rising upward. The creature should feel heavy and real when it lands and lunges. The zombie should be grabbed with force and swallowed in one grotesque motion. After the kill, the creature rapidly contracts back into human form with disturbing fluidity. The burger hero shot should feel pristine and absurdly composed in contrast with the destruction behind it.

AUDIO Ambient dusk city atmosphere at first. Soft chewing and wrapper movement. Distant environmental hum under the overpass. Sudden zombie footsteps and feral sprint sounds. Violent bone cracking and flesh stretching during transformation. Heavy creature impact and body movement. Grotesque devouring sound at the swallow. Silence or near silence after the kill for comedic contrast. Subtle wrapper handling and chewing again. Final polished commercial tone with slogan delivery.

TIMELINE
0 to 3 seconds: Medium shot of the man sitting cross legged on the hood of the white pickup truck, casually chewing a burger. Golden dusk light catches his face and beard. Camera sways gently. The mood is ambient, calm, and almost too relaxed.
3 to 5 seconds: Wide shot of the concrete channel as the zombie bursts from the shadows under the bridge, sprinting with jerky unnatural strides across the dry riverbed toward the truck. Camera shakes while tracking the approaching threat.
5 to 7 seconds: Close up on the man's face as he notices the zombie. Chewing slows. His eyebrows rise with mild annoyance rather than fear. He calmly sets the burger down on the hood beside him.
7 to 10 seconds: Medium shot as the man drops off the hood and his body violently expands and twists upward into the massive pale tusked creature. Spine cracking, limbs stretching, jaws splitting open wide, towering over the truck. Camera jolts with each bone snap of the transformation.
10 to 12 seconds: Wide low angle as the creature lunges forward and catches the charging zombie in its enormous clawed hand, lifts it off the ground, and swallows it whole in one grotesque bite. Jaw unhinges unnaturally. Camera shudders with the impact.
12 to 14 seconds: Medium shot as the creature rapidly shrinks back into the man, standing calmly beside the truck. He hops back onto the hood, picks up the burger, takes another bite, and keeps chewing as if nothing happened.
14 to 15 seconds: The camera slowly zooms in on the burger as it is placed down and becomes the clear center of the frame. The tone shifts into a dramatic fast food commercial hero shot. Sharp focus on the burger, shallow depth of field, rich cinematic texture, glossy highlights on the bun and ingredients. Destroyed packaging sits blurred in the background.

STYLE BOOSTERS multi shot montage, handheld cinematic chaos, dark comedy timing, grotesque body horror, deadpan performance, golden dusk underpass mood, urban emptiness, 35mm film texture, premium Hollywood coverage, commercial style burger ending, absurd contrast between horror and advertising
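The full prompt is the 13 layers concatenated in order, each prefixed with its uppercase label. If you build prompts like this regularly, a small helper can do the assembly so that layers are written and edited separately. This is a minimal sketch; `LAYER_ORDER` and `assemble_prompt` are hypothetical names, not part of Enhancor or any video model's API, and the blank-line separator is a readability choice rather than a known model requirement.

```python
# Layer labels in the order used by the full prompt.
LAYER_ORDER = (
    "STYLE", "ENVIRONMENT", "CHARACTER", "CREATURE FORM", "THREAT",
    "CORE ACTION", "ENERGY", "CAMERA", "LIGHTING", "PHYSICS",
    "AUDIO", "TIMELINE", "STYLE BOOSTERS",
)


def assemble_prompt(layers: dict[str, str], order: tuple[str, ...] = LAYER_ORDER,
                    sep: str = "\n\n") -> str:
    """Join layer texts into one prompt, each prefixed with its uppercase label.

    Layers absent from `layers` are skipped, so optional ones such as
    CREATURE FORM can be dropped for scenes that don't need them.
    """
    unknown = set(layers) - set(order)
    if unknown:
        raise KeyError(f"unknown layer(s): {sorted(unknown)}")
    # Follow the canonical order, not the dict's insertion order.
    sections = [f"{name} {layers[name].strip()}" for name in order if name in layers]
    return sep.join(sections)


if __name__ == "__main__":
    layers = {
        "THREAT": "A pale zombie with wet dark hair, bruised eyes, and a blood stained white shirt.",
        "STYLE": "Ultra realistic live action cinematic style. Dark comedy with horror undertones.",
    }
    # STYLE is emitted before THREAT regardless of insertion order.
    print(assemble_prompt(layers))
```

Passing `sep=" "` produces a single run-on block instead, in case your tool's input field collapses line breaks.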
