What happens when you type "a photo of a cat" into an AI image generator? You get a different cat every time — different breed, background, lighting, and pose. That's because AI fills in everything you didn't specify.
Now try: "A snow leopard with one paw raised, walking toward the camera on a mountain trail where the snow is just beginning to melt. Purple and yellow wildflowers are starting to appear through the snow. A sun dog appears in the sky. Behind the leopard, a sharp rocky peak rises high. Warm light catches the rock's edge. The leopard's eyes are a radiant blue. Direct eye contact."
The result is a completely different class of image. The described scene actually appears — sun dog, wildflowers, blue eyes, raised paw, all of it. That's the difference a prompt makes.
In AI image generation, the prompt isn't a search query. It's a blueprint — the same kind of planning a painter does before ever touching a canvas.
How AI Actually Makes Images
To use prompts effectively, it helps to understand what the model does with them.
Diffusion models like Stable Diffusion compress images into a latent space, a compact numerical representation in which a megapixel image shrinks to a small grid of numbers. The text prompt is encoded in parallel: a text encoder (CLIP, in Stable Diffusion's case) converts the words into embedding vectors the model can condition on.
Generation works in reverse. Starting from pure random noise, the model removes noise step by step, and at every step the prompt embeddings steer the emerging image toward your description.
The critical insight: anything you don't specify gets filled in probabilistically, based on the model's training data. Every detail you leave out is a decision you hand over to the model.
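The loop itself can be sketched in a few lines. This toy version (plain Python, no trained network) only illustrates the geometry of the idea: start from pure noise, then repeatedly nudge the sample toward the target your prompt encodes. A real diffusion model predicts the noise to remove with a learned network instead of moving in a straight line.

```python
import random

def toy_denoise(prompt_vec, steps=50, rate=0.1, seed=0):
    """Geometric sketch of text-conditioned denoising.

    A real diffusion model uses a trained network to predict noise;
    here we simply move a fraction of the remaining distance toward
    the prompt's embedding at each step.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in prompt_vec]   # start from pure noise
    for _ in range(steps):
        # nudge the sample toward the description
        x = [xi + rate * (pi - xi) for xi, pi in zip(x, prompt_vec)]
    return x

target = [1.0] * 8            # stand-in for a prompt embedding
result = toy_denoise(target)
print(all(abs(r - t) < 0.1 for r, t in zip(result, target)))  # True
```

After 50 steps, only (1 - 0.1)^50 of the original noise remains, so the sample has all but converged on the target: the parts of the "prompt vector" you pin down are reproduced, and in a real model the parts you leave out are sampled from the training distribution.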
Modern Models Understand More
Models like Flux, Imagen 3, and Kling go further. Instead of handling text and image separately, they process both as a unified token sequence through Transformer architectures. The result is significantly better comprehension of:
Spatial relationships: "A is in front of B, with C in the background"
Attribute binding: "blue-eyed snow leopard" vs. "white snow" — assigning attributes to the right subjects
Lighting direction and quality
Camera angle and compositional intent
As models become more capable of following detailed instructions, the return on investing in detailed prompts goes up.
The Prompt as a Canvas Layout
A skilled painter plans the whole composition before lifting a brush. Where the subject sits, what values the background holds, where the light source is, where the viewer's eye should travel. The more deliberate that planning, the closer the result is to the intended image.
Writing a prompt is that same planning process in words. Every area of the canvas you don't describe is an area AI will fill with its best guess. The more precisely you describe each area, the less the final result is left to chance.
Detailed prompts reduce uncertainty. Instead of generating and regenerating dozens of times hoping for a good result, you can increase accuracy from the first attempt by specifying what you actually want.
Example 1: The Snow Leopard — Narrative Scene Prompting
The following prompt and image are from @NanoBanana, posted in the AI prompt gallery LocalBanana.

Compare the two approaches:
Simple prompt:
"A snow leopard in the mountains."
Detailed prompt:
"A full body portrait photo of a snow leopard. It has one paw raised as it is walking towards us. The snow on the ground is melting, and small purple and yellow flowers are showing with some grass. In the sky there is a sun dog. Behind, a sharp rock protrudes high into the sky. The warm light is catching the rock edge. The snow leopard's eyes are a radiant blue. Direct eye contact."
Breaking down what this prompt specifies:
Subject action: One paw raised, walking toward the camera
Ground: Melting snow, purple and yellow wildflowers, grass
Sky: A sun dog (atmospheric optical phenomenon)
Background: A sharp rock peak rising high
Light: Warm, catching the rock's edge
Eyes: Radiant blue
Gaze: Direct eye contact with the camera
Look at the resulting image — every one of these elements appears. The sun dog is there. The wildflowers are there. The blue eyes. The raised paw. The backlit rock peak. These would all have been left to chance with a simple prompt.
Example 2: Strawberry Staircase Fashion — Structured Prompting
This example from @Strength04_X on the same gallery takes prompt design a step further — structuring it like a brief rather than prose.

"quality": "ultra_photorealistic, raw style, 8k"
"camera": "iPhone 15 Pro Max"
"lighting": "bright natural daylight filtering in through the arched window, creating a warm glow"
"style": "cinematic low-angle portrait, environmental fashion focus"
Scene: A detailed strawberry-themed pink entrance hall —
pink carpeted steps, white balustrades, arched doorway
with strawberry-patterned curtains, crystal chandelier...
Subject: Young woman, blue eyes, white-blonde hair in a high messy bun.
Confident, playful expression looking back and down.
Outfit: White camisole with small red bow accents + red-and-white
gingham plaid pleated mini-skirt. Pearl pendant necklace. Barefoot.
Pose: Seated on the 4th–5th steps, body twisted to look back and down
at the low-angle camera. Left hand touching the bun.
Right hand resting on the balustrade.
Composition: Dramatic low-angle vertical shot (9:16) from the very bottom
of the staircase, looking up.

The result: four images, all featuring the same character in the same space with the same costume, generated consistently. That's not coincidence. The prompt has narrowed the model's decision space so precisely that little is left to variation.
Notice that composition is specified explicitly. "Low-angle vertical shot (9:16) from the bottom of the staircase, looking up" tells the model exactly where the camera is and how the frame is oriented. This is the kind of detail most people never think to include — and it's one of the highest-leverage things you can specify.
Five Dimensions of Prompt Design
Both examples point to the same underlying structure. Effective prompts tend to cover five dimensions:
1. Subject and Action
What is doing what. Not just "a woman" but "a woman with blue eyes and white-blonde hair styled in a high messy bun, looking back over her shoulder with a confident expression." Actions matter: "walking toward us with one paw raised" produces a completely different image than "standing."
2. Environment and Background
What surrounds the subject. Unspecified backgrounds are filled randomly. "A strawberry-themed pink entrance hall with carpeted stairs, a crystal chandelier, and an arched doorway" places the subject precisely in a consistent space.
3. Lighting
Light is the single most powerful determinant of mood. "Warm light catching the rock's edge" and "bright natural daylight filtering through an arched window" create entirely different atmospheres. Direction, quality, and temperature all matter.
4. Camera and Composition
Where is the camera? What angle? What frame ratio? "Cinematic low-angle portrait, 9:16, from the bottom of the staircase looking up" — specifying this transforms a vague portrait into a deliberate editorial shot.
5. Quality and Style
"Ultra_photorealistic, raw style, 8k" set as the opening declaration tells the model what rendering register the whole image should operate in. This top-level framing influences everything else.
Video Generation Adds a Time Dimension
The same principles apply to video generation — with one addition: time.
In video prompts, you're also specifying:
What moves and how: "Petals sway gently, butterflies flutter their wings"
Camera movement: "Slow zoom in," "pan left"
Start and end states: What the scene looks like at the beginning vs. end
Leave these unspecified and the model generates arbitrary motion. The result may be interesting — or it may be nothing like what you wanted.
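The same structured approach extends to video by adding time-based fields. Again a sketch with illustrative field names of our own, not any particular model's API:

```python
def build_video_prompt(scene, motion, camera_move, start_state, end_state):
    """Flatten spatial and temporal fields into one prompt string.

    Illustrative helper: the point is that motion, camera movement,
    and start/end states are specified rather than left to the model.
    """
    fields = [
        ("Scene", scene),
        ("Motion", motion),                  # what moves and how
        ("Camera movement", camera_move),
        ("Start", start_state),              # opening frame
        ("End", end_state),                  # closing frame
    ]
    return "\n".join(f"{name}: {value}" for name, value in fields)

print(build_video_prompt(
    scene="a meadow of wildflowers at golden hour",
    motion="petals sway gently, butterflies flutter their wings",
    camera_move="slow zoom in",
    start_state="wide shot of the whole meadow",
    end_state="close-up on a single flower",
))
```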
Don't Command the AI. Design the Image.
When most people start with AI image generation, they write short keyword strings. "Cat." "Sunset." "Fantasy warrior." When the result isn't right, they click generate again.
Understanding how models work changes this instinct. The model fills in what you don't specify. So specifying more — precisely — is far more efficient than regenerating repeatedly.
A prompt isn't an instruction to the AI. It's the act of designing an image in words before the AI renders it. Every area of the canvas you leave blank is an area you're handing over to the model's probability distribution.
The painters who produce the most precise, reproducible results aren't the ones who click generate more. They're the ones who arrive at the canvas — the prompt — with the most complete picture already in mind.