An AI image prompt is a text instruction that tells an AI model what image to generate. The more specifically you describe the subject, environment, lighting, composition, and style, rather than just listing keywords, the closer the result will be to what you intended.
Written by XBRUSH Content Team · Last updated: 2026-03-21
Type "cat photo" into the same AI model and you get a different cat every time. That's because AI fills in "the rest" on its own.
In AI image generation, a prompt isn't just a search query. It's the blueprint the painter draws before sitting down at the canvas.
How AI Generates Images
Stable Diffusion-based models generate an image by denoising in a compressed latent space, guided by a text encoding of the prompt, and fill in anything the prompt leaves unspecified probabilistically. Newer unified Transformer models such as Flux, Imagen 4, and Kling interpret spatial relationships and lighting more accurately.
Stable Diffusion: compresses images into a latent space → encodes the prompt text with CLIP to condition generation → generates the image from noise via denoising. Unspecified areas are filled probabilistically.
Newer models (Flux, Imagen 4, Kling): process text and images through a unified Transformer → understand spatial relationships, attribute combinations, and lighting with greater accuracy.
As models improve, their ability to interpret prompts becomes more precise — but the responsibility to design "what to express and how" still rests with the user.
Real Example 1: Snow Leopard — Scene-Narrative Prompt
A scene-narrative prompt describes every element of the scene in natural language, like a shooting script. The more you pack into the sentence — subject action, background elements, light quality, color tone — the less room AI has to fill in "the rest" arbitrarily.
by @NanoBanana / localbanana.io
Detailed prompt: lifted paw walking, melting snow, purple/yellow flowers, sun halo, sharp rock face, warm light, blue eyes, direct eye contact. Every element realized in the actual image.
This result is possible because the subject ("snow leopard with lifted paw"), background ("melting snow, purple/yellow flowers"), lighting ("sun halo, warm light"), and gaze direction ("direct eye contact") were each specified explicitly.
Real Example 2: Strawberry Staircase Fashion — Structured Prompt
A structured JSON prompt separates each image element into key-value pairs. This approach makes it easier to maintain consistent results when generating multiple images from the same prompt.
by @Strength04_X / localbanana.io
Structured JSON prompt: quality, camera, lighting, style, scene, subject, outfit, pose, composition. Consistent results across all 4 images.
Because each element lives under its own key, you can change only outfit and generate a new image with a different outfit while the other keys (lighting, composition, background) stay exactly as written, which helps keep the rest of the image the same.
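The "change one key, keep the rest" workflow can be sketched in a few lines of Python. This is a minimal illustration, not the prompt used for the image above: the key names follow the list in the caption, while every value and the `with_override` helper are hypothetical.

```python
import copy
import json

# Hypothetical base prompt: keys follow the structure listed above,
# values are illustrative placeholders, not the original prompt.
base_prompt = {
    "quality": "ultra-photorealistic, high detail",
    "camera": "50mm lens, eye level",
    "lighting": "soft daylight",
    "style": "editorial fashion",
    "scene": "staircase decorated with strawberries",
    "subject": "fashion model",
    "outfit": "red dress",
    "pose": "standing on the stairs, one hand on the railing",
    "composition": "full body, centered",
}


def with_override(prompt: dict, **changes: str) -> dict:
    """Return a copy of the prompt with only the named keys replaced."""
    unknown = set(changes) - set(prompt)
    if unknown:
        raise KeyError(f"unknown prompt keys: {sorted(unknown)}")
    variant = copy.deepcopy(prompt)
    variant.update(changes)
    return variant


# Swap only the outfit; every other key is carried over untouched.
variant = with_override(base_prompt, outfit="white linen suit")
changed_keys = [k for k in base_prompt if base_prompt[k] != variant[k]]

print(json.dumps(variant, indent=2))  # text to paste into the prompt field
print(changed_keys)
```

Rejecting unknown keys is a deliberate choice here: a typo such as `outfitt` would otherwise add a new key silently instead of changing the intended one.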
Real Example 3: Subway Headphone Ad — Reference Image-Based Structured Prompt
A reference image-based prompt fixes a specific person's face, style, or composition as a reference point and specifies the remaining elements in text. Best for product ads or work where character consistency matters.
A structured product ad prompt that includes conditions for carrying over the face from the reference image exactly.
JSON structure: reference (match facial structure, proportions, and identity), scene (subway car, fluorescent lighting), subject (headphone adjustment pose, side gaze), product (premium wireless headphones), style (luxury campaign, ultra-photorealistic).
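As a rough sketch, the structure described above could be assembled like this. The five key names and most values come from the description; the `build_ad_prompt` helper and the exact wording of each value are assumptions for illustration, and the reference image itself is uploaded separately rather than embedded in the JSON.

```python
import json


def build_ad_prompt(product: str, scene: str) -> str:
    """Build a reference-based ad prompt as a JSON string (hypothetical helper)."""
    prompt = {
        # Tells the model what to carry over from the uploaded reference image.
        "reference": "match facial structure, proportions, and identity",
        "scene": scene,
        "subject": "adjusting headphones, gaze to the side",
        "product": product,
        "style": "luxury campaign, ultra-photorealistic",
    }
    return json.dumps(prompt, indent=2)


ad_prompt = build_ad_prompt(
    product="premium wireless headphones",
    scene="subway car, fluorescent lighting",
)
print(ad_prompt)
```

Keeping `product` and `scene` as parameters means the same face-matching condition can be reused across several ads while only the product or location changes.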
Comparing the Three Prompt Approaches
The three approaches are not mutually exclusive. Choose or combine them depending on the nature of your work.
| Approach | Strengths | Weaknesses | Best For |
|---|---|---|---|
| Natural language narrative | Fast to write, flexible results | Low consistency, hard to reproduce | Idea exploration, one-off images |
| Structured JSON | Easy per-element edits, high consistency | Initial setup takes time | Image series, product photo variations |
| Reference image-based | Fixed face/style, character consistency | Reference image prep required | Product ads, character-based content |
The 5 Axes of Prompt Design
A good prompt covers all 5 axes: subject and action, environment and background, lighting, camera and composition, quality and style. The more specifically you fill each axis, the less room AI has to decide arbitrarily — and the closer you get to your intended image.
1. Subject and Action — Who Is Doing What, Specifically
| Weak Example | Strong Example |
|---|---|
| cat | A black cat sitting on a brick wall under moonlight, gazing into the distance |
| woman | A woman in her thirties at a café window, holding a coffee cup in both hands, looking outside |
| food photo | A strawberry crepe on a white ceramic plate with whipped cream dripping down the side |
2. Environment and Background — The Space Around the Subject
| Weak Example | Strong Example |
|---|---|
| forest background | Misty dawn coniferous forest, moss-covered rocks, wet fallen leaves |
| city | Rainy night Tokyo alley, neon reflections, wet asphalt |
3. Lighting — Direction, Texture, Temperature
| Weak Example | Strong Example |
|---|---|
| bright photo | 3pm side natural light, soft shadows, warm yellow tone |
| dramatic lighting | Single spotlight, bottom-up angle, cold blue light, strong contrast |
4. Camera and Composition — Angle, Lens, Ratio
| Weak Example | Strong Example |
|---|---|
| close-up | 85mm lens, shallow depth of field (f/1.8), focus on eyes, background bokeh |
| full-body shot | 20mm wide angle, high angle, full body, rule-of-thirds composition |
5. Quality and Style — Declaring the Rendering Direction
- Realistic photography: photorealistic, 8K, shot on Sony A7R V, RAW
- Illustration: digital illustration, soft watercolor, studio ghibli style
- Commercial ad: luxury campaign, ultra-photoreal, commercial photography
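The five axes above can also be treated as a checklist. The sketch below is one possible way to do that, assuming a hypothetical `PromptAxes` class: it reports which axes are still empty and joins the filled ones into a single prompt. The example values are taken from the "Strong Example" columns above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PromptAxes:
    """One field per axis; an empty string means the axis is unspecified."""
    subject: str
    environment: str
    lighting: str
    camera: str
    style: str

    def missing(self) -> List[str]:
        """Names of the axes that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Join the filled axes into one comma-separated prompt."""
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(p for p in parts if p)


weak = PromptAxes(subject="cat", environment="", lighting="", camera="", style="")
strong = PromptAxes(
    subject="A black cat sitting on a brick wall under moonlight, gazing into the distance",
    environment="rainy night Tokyo alley, neon reflections, wet asphalt",
    lighting="cold blue light, strong contrast",
    camera="85mm lens, shallow depth of field (f/1.8), focus on eyes, background bokeh",
    style="photorealistic, 8K",
)

print(weak.missing())   # axes the model would have to fill in on its own
print(strong.render())  # full prompt covering all five axes
```

The check only tells you whether an axis was filled in, not whether it was filled in well, so it complements the weak/strong comparisons rather than replacing them.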
Frequently Asked Questions
Q: Where do I start to write better AI image prompts?
Start by being specific about the subject and action. Instead of "cat," write "a black cat sitting on a wall looking into the distance" — adding action and state immediately improves precision. Lighting and camera angle can be added as a next step.
Q: Do Korean or English prompts work better?
Newer models like Flux and Imagen 4 handle Korean prompts well. However, since models are trained primarily on English data, technical style descriptors (photorealistic, bokeh, studio lighting, etc.) tend to produce more stable results in English. A recommended approach: describe the full context in Korean and add technical modifiers in English.
Q: Do longer prompts give better results?
Not necessarily. What matters is specificity, not length. A 30-word prompt that precisely covers the 5 axes (subject, environment, lighting, composition, style) will usually outperform 100 vague words strung together. Specifying too many elements at once can also leave the model struggling to balance them, so that none is properly realized.
Q: Why do I get different results every time with the same prompt?
AI image generation is a probability-based process. Even with the same prompt, different seed values used in each generation produce different results. To reproduce a specific result, fix the seed value, or use a structured prompt to narrow the range of probabilistic variation by specifying more elements.
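The effect of the seed can be illustrated with a toy example. This is not an image generator: `initial_noise` is a hypothetical stand-in that uses Python's standard `random` module to mimic the random starting noise a diffusion model denoises. The point is only that the same seed reproduces the same starting point, while a different seed does not.

```python
import random
from typing import List, Optional


def initial_noise(seed: Optional[int], size: int = 4) -> List[float]:
    """Toy stand-in for the starting noise an image generator works from."""
    rng = random.Random(seed)  # seed=None picks a fresh seed on every call
    return [round(rng.gauss(0.0, 1.0), 4) for _ in range(size)]


fixed_a = initial_noise(seed=42)
fixed_b = initial_noise(seed=42)
other = initial_noise(seed=43)

print(fixed_a == fixed_b)  # same seed, same starting noise
print(fixed_a == other)    # different seed, different starting noise
```

In a real tool the seed is usually a setting next to the prompt field; whether and where it is exposed depends on the product.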
Q: How do I enter a prompt in XBRUSH?
Type your prompt into the text input field on the XBRUSH image generation screen. To use a reference image together, upload the image first and then add the prompt.