
In AI Image Generation, the Prompt Is the Brush

How AI image prompts work and why wording matters. Scene-narrative, structured JSON, and reference-based styles across 5 prompt axes, with real XBRUSH examples.
Byoul Oh
Mar 21, 2026

An AI image prompt is a text instruction that tells AI what image to generate. The more specifically you describe the subject, environment, lighting, composition, and style — rather than just listing keywords — the closer the result is to what you intended.

Written by XBRUSH Content Team · Last updated: 2026-03-21

Type "cat photo" into the same AI model and you get a different cat every time. That's because the model decides everything you did not specify (breed, pose, background, lighting) on its own.

In AI image generation, a prompt isn't just a search query. It's the blueprint the painter draws before sitting down at the canvas.


How AI Generates Images

Stable Diffusion-based models encode the prompt with a text encoder (CLIP) and use that encoding to guide denoising in a compressed latent space. Anything the prompt leaves unspecified is filled in probabilistically. Newer unified Transformer models like Flux, Imagen 4, and Kling follow spatial relationships and lighting instructions with greater accuracy.

Stable Diffusion: compresses images into latent space → encodes the prompt with CLIP → generates from noise via denoising guided by that text encoding. Unspecified areas are filled probabilistically.

Newer models (Flux, Imagen 4, Kling): process text and images through a unified Transformer → understand spatial relationships, attribute combinations, and lighting with greater accuracy.
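As a toy analogy for "unspecified areas are filled probabilistically", the sketch below keeps every attribute the prompt fixes and samples the rest at random. It is not how a diffusion model works internally; the `DEFAULTS` pools and the `fill_unspecified` helper are invented for illustration.

```python
import random

# Hypothetical attribute pools standing in for what a model might choose from.
DEFAULTS = {
    "subject": ["cat"],
    "environment": ["sofa", "garden", "street", "windowsill"],
    "lighting": ["daylight", "lamp light", "overcast"],
    "composition": ["close-up", "full body", "top-down"],
}


def fill_unspecified(spec, rng):
    """Keep what the prompt fixed; sample everything else at random."""
    scene = {}
    for key, options in DEFAULTS.items():
        if key in spec:
            scene[key] = spec[key]            # fixed by the prompt
        else:
            scene[key] = rng.choice(options)  # left to chance
    return scene


rng = random.Random()
vague = fill_unspecified({}, rng)
specific = fill_unspecified(
    {"subject": "black cat", "environment": "brick wall", "lighting": "moonlight"},
    rng,
)
print(vague)
print(specific)
```

Running it several times changes most of `vague` but only the `composition` of `specific`, which is the effect a more detailed prompt aims for.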

As models improve, their ability to interpret prompts becomes more precise — but the responsibility to design "what to express and how" still rests with the user.


Real Example 1: Snow Leopard — Scene-Narrative Prompt

A scene-narrative prompt describes every element of the scene in natural language, like a shooting script. The more you pack into the sentence — subject action, background elements, light quality, color tone — the less room AI has to fill in "the rest" arbitrarily.

[Image: snow leopard generated from a detailed prompt, by @NanoBanana / localbanana.io]

The detailed prompt specified a walking pose with a lifted paw, melting snow, purple and yellow flowers, a sun halo, a sharp rock face, warm light, blue eyes, and direct eye contact. Every element appears in the generated image.

This result is possible because the subject ("snow leopard with lifted paw"), background ("melting snow, purple/yellow flowers"), lighting ("sun halo, warm light"), and gaze direction ("direct eye contact") were each specified explicitly.
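One way to keep those four elements explicit is to assemble the sentence from named parts. This is a minimal sketch: the `scene_narrative` helper is hypothetical, and the wording is a reconstruction from the elements listed above, not the original prompt.

```python
def scene_narrative(subject, background, lighting, gaze):
    """Combine the four explicitly specified elements into one sentence."""
    return f"A {subject}, set against {background}, lit by {lighting}, {gaze}."


prompt = scene_narrative(
    subject="snow leopard walking with one paw lifted",
    background="melting snow, purple and yellow flowers, and a sharp rock face",
    lighting="a sun halo and warm light",
    gaze="blue eyes making direct eye contact with the viewer",
)
print(prompt)
```

Because every argument is required, leaving one out raises an error instead of silently handing that decision to the model.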


Real Example 2: Strawberry Staircase Fashion — Structured Prompt

A structured JSON prompt separates each image element into key-value pairs. This approach makes it easier to maintain consistent results when generating multiple images from the same prompt.

[Image: strawberry staircase fashion images generated from a detailed structured prompt, by @Strength04_X / localbanana.io]

The structured JSON prompt used the keys quality, camera, lighting, style, scene, subject, outfit, pose, and composition. Results were consistent across all 4 images.

Because each element sits under its own key, you can change only the outfit value and regenerate. The new image gets a different outfit while the instructions for everything else (lighting, composition, background) stay exactly the same, so the results tend to stay close to the original.
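The single-key edit can be sketched in Python. The keys follow the example above; every value, and the helper names `with_changes` and `to_prompt_text`, are placeholders made up for this illustration rather than the original prompt.

```python
import copy
import json

# Keys follow the example above; every value is a made-up placeholder.
BASE_PROMPT = {
    "quality": "ultra-detailed, photorealistic",
    "camera": "50mm lens, eye level",
    "lighting": "soft daylight",
    "style": "editorial fashion",
    "scene": "staircase decorated with strawberries",
    "subject": "a fashion model",
    "outfit": "red knit dress",
    "pose": "sitting on the steps",
    "composition": "centered, full body",
}


def with_changes(base, **changes):
    """Return a copy of base with only the named keys replaced."""
    unknown = set(changes) - set(base)
    if unknown:
        raise KeyError(f"unknown prompt keys: {sorted(unknown)}")
    variant = copy.deepcopy(base)
    variant.update(changes)
    return variant


def to_prompt_text(prompt):
    """Serialize the dict as the JSON text that goes into the prompt field."""
    return json.dumps(prompt, indent=2, ensure_ascii=False)


variant = with_changes(BASE_PROMPT, outfit="white linen suit")
print(to_prompt_text(variant))
```

Rejecting unknown keys catches typos such as `outfits`, which would otherwise add a new key and leave the original outfit in place.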


Real Example 3: Subway Headphone Ad — Reference Image-Based Structured Prompt

A reference image-based prompt fixes a specific person's face, style, or composition as a reference point and specifies the remaining elements in text. Best for product ads or work where character consistency matters.

[Image: subway headphone ad generated from a reference-based structured prompt]

A structured product ad prompt that includes conditions for carrying over the face from the reference image exactly.

JSON structure:

  • reference: match facial structure, proportions, and identity
  • scene: subway car, fluorescent lighting
  • subject: headphone adjustment pose, side gaze
  • product: premium wireless headphones
  • style: luxury campaign, ultra-photorealistic
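The JSON structure above could be assembled and checked as follows. This is a sketch under assumptions: the section names and values come from the description above, while the `build_reference_prompt` helper and its validation rule are hypothetical. The reference image itself is uploaded separately and is not part of this text.

```python
import json

REQUIRED_SECTIONS = ("reference", "scene", "subject", "product", "style")


def build_reference_prompt(sections):
    """Check that all five sections are filled in and return JSON text."""
    missing = [name for name in REQUIRED_SECTIONS if not sections.get(name)]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    ordered = {name: sections[name] for name in REQUIRED_SECTIONS}
    return json.dumps(ordered, indent=2)


ad_prompt = build_reference_prompt({
    "reference": "match facial structure, proportions, and identity",
    "scene": "subway car, fluorescent lighting",
    "subject": "headphone adjustment pose, side gaze",
    "product": "premium wireless headphones",
    "style": "luxury campaign, ultra-photorealistic",
})
print(ad_prompt)
```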


Comparing the Three Prompt Approaches

The three approaches are not mutually exclusive. Choose or combine them depending on the nature of your work.

| Approach | Strengths | Weaknesses | Best For |
| --- | --- | --- | --- |
| Natural language narrative | Fast to write, flexible results | Low consistency, hard to reproduce | Idea exploration, one-off images |
| Structured JSON | Easy per-element edits, high consistency | Initial setup takes time | Image series, product photo variations |
| Reference image-based | Fixed face/style, character consistency | Reference image prep required | Product ads, character-based content |

The 5 Axes of Prompt Design

A good prompt covers all 5 axes: subject and action, environment and background, lighting, camera and composition, quality and style. The more specifically you fill each axis, the less room AI has to decide arbitrarily — and the closer you get to your intended image.
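A small checklist can show which axes a draft still leaves to the model. The sketch below is illustrative only: the `PromptAxes` class and its field names are assumptions, not part of any tool.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptAxes:
    """One field per axis; an empty string means the axis is unspecified."""
    subject_action: str = ""
    environment: str = ""
    lighting: str = ""
    camera: str = ""
    style: str = ""

    def missing(self):
        """Names of the axes that are still empty, in definition order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self):
        """Join the filled axes into a single comma-separated prompt."""
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(part for part in parts if part)


draft = PromptAxes(
    subject_action="A black cat sitting on a brick wall, gazing into the distance",
    lighting="moonlight",
)
print(draft.missing())  # axes the model would have to decide on its own
print(draft.render())
```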

1. Subject and Action — Who Is Doing What, Specifically

| Weak Example | Strong Example |
| --- | --- |
| cat | A black cat sitting on a brick wall under moonlight, gazing into the distance |
| woman | A woman in her thirties at a café window, holding a coffee cup in both hands, looking outside |
| food photo | A strawberry crepe on a white ceramic plate with whipped cream dripping down the side |

2. Environment and Background — The Space Around the Subject

| Weak Example | Strong Example |
| --- | --- |
| forest background | Misty dawn coniferous forest, moss-covered rocks, wet fallen leaves |
| city | Rainy night Tokyo alley, neon reflections, wet asphalt |
cityRainy night Tokyo alley, neon reflections, wet asphalt

3. Lighting — Direction, Texture, Temperature

| Weak Example | Strong Example |
| --- | --- |
| bright photo | 3pm side natural light, soft shadows, warm yellow tone |
| dramatic lighting | Single spotlight, bottom-up angle, cold blue light, strong contrast |

4. Camera and Composition — Angle, Lens, Ratio

| Weak Example | Strong Example |
| --- | --- |
| close-up | 85mm lens, shallow depth of field (f/1.8), focus on eyes, background bokeh |
| full-body shot | 20mm wide angle, high angle, full body, rule-of-thirds composition |

5. Quality and Style — Declaring the Rendering Direction

  • Realistic photography: photorealistic, 8K, shot on Sony A7R V, RAW
  • Illustration: digital illustration, soft watercolor, studio ghibli style
  • Commercial ad: luxury campaign, ultra-photoreal, commercial photography
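The three style lines above can be stored as reusable presets and appended to any scene description. A minimal sketch: the preset names and the `apply_style` helper are hypothetical, and the keyword strings are taken from the list above.

```python
# Keyword strings copied from the list above; the preset names are made up.
STYLE_PRESETS = {
    "realistic": "photorealistic, 8K, shot on Sony A7R V, RAW",
    "illustration": "digital illustration, soft watercolor, studio ghibli style",
    "commercial": "luxury campaign, ultra-photoreal, commercial photography",
}


def apply_style(prompt, preset):
    """Append the chosen style keywords to a scene description."""
    if preset not in STYLE_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    # Drop trailing punctuation so the keywords join cleanly with a comma.
    return f"{prompt.rstrip(' .,')}, {STYLE_PRESETS[preset]}"


print(apply_style("A strawberry crepe on a white ceramic plate.", "realistic"))
```

Keeping the style separate from the scene makes it easy to render the same subject in all three directions and compare.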

Frequently Asked Questions

Q: Where do I start to write better AI image prompts?

Start by being specific about the subject and action. Instead of "cat," write "a black cat sitting on a wall looking into the distance" — adding action and state immediately improves precision. Lighting and camera angle can be added as a next step.

Q: Do Korean or English prompts work better?

Newer models like Flux and Imagen 4 handle Korean prompts well. However, since models are trained primarily on English data, technical style descriptors (photorealistic, bokeh, studio lighting, etc.) tend to produce more stable results in English. A recommended approach: describe the full context in Korean and add technical modifiers in English.
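The mixed approach amounts to a Korean scene description followed by English technical modifiers. The sketch below assumes a simple comma-joined format; the `mixed_prompt` helper and the example sentence are illustrative only.

```python
def mixed_prompt(context_ko, modifiers_en):
    """Append English technical modifiers to a Korean scene description."""
    modifiers = ", ".join(m.strip() for m in modifiers_en if m.strip())
    context = context_ko.strip().rstrip(".,")
    return f"{context}, {modifiers}" if modifiers else context


# Korean context: "a black cat sitting on a brick wall under moonlight"
prompt = mixed_prompt(
    "달빛 아래 벽돌 담 위에 앉아 있는 검은 고양이",
    ["photorealistic", "bokeh", "studio lighting"],
)
print(prompt)
```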

Q: Do longer prompts give better results?

Not necessarily. What matters is specificity, not length. A 30-word prompt that precisely covers the 5 axes (subject, environment, lighting, composition, style) will usually outperform a 100-word list of vague keywords. Specifying too many elements at once can also backfire: the model struggles to balance them, and none is rendered well.

Q: Why do I get different results every time with the same prompt?

AI image generation is a probability-based process. Even with the same prompt, different seed values used in each generation produce different results. To reproduce a specific result, fix the seed value, or use a structured prompt to narrow the range of probabilistic variation by specifying more elements.
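The role of the seed can be illustrated without any image model. In this toy sketch, `initial_noise` is a hypothetical stand-in for the random noise a generator starts from: the same seed always yields the same starting point, and a different seed yields a different one.

```python
import random


def initial_noise(seed, size=4):
    """Toy stand-in for the starting noise of one generation run."""
    rng = random.Random(seed)  # a private generator, so runs do not interfere
    return [round(rng.gauss(0.0, 1.0), 3) for _ in range(size)]


print(initial_noise(42))
print(initial_noise(42))  # identical to the line above: same seed, same noise
print(initial_noise(43))  # different seed, different starting point
```

Whether a fixed seed reproduces an image exactly also depends on the tool keeping the model, resolution, and other settings unchanged.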

Q: How do I enter a prompt in XBRUSH?

Type your prompt into the text input field on the XBRUSH image generation screen. To use a reference image together, upload the image first and then add the prompt.


Related Posts

  • AI Image & Video Generation: Workflows vs. Step-by-Step Prompting
  • Spring Marketing Visuals in 10 Minutes with AI
  • How to Replace Product Photo Props Without Reshooting