GPT-4o Image: Native Multimodal Creation & Editing
Overview
GPT-4o Image Generation, powered by OpenAI’s GPT-4o model, integrates text and image processing into a single, natively multimodal system, which gives it considerable flexibility in creating and editing visuals. ImagenCraft provides a platform for using GPT-4o’s capabilities, which include accurate text rendering, precise prompt following, and in-context learning from uploaded images. It offers distinct Generate and Edit modes that cover a wide range of creative and practical applications.
Natively Multimodal
Accurate Text Rendering
Precise Prompt Following
In-Context Learning
Modes of Operation
GPT-4o Image offers two primary modes to suit your creative workflow:
Generate Mode: Create Images from Text
In Generate mode, you can create entirely new images from scratch by providing a text prompt.
- Prompt: Your text description of the desired image (max. 1000 characters).
- Quality: Control the level of detail and generation time. Possible values: low (faster), medium, high (slower), auto.
- Size: Select the output resolution and aspect ratio. Possible values: 1024x1024 (square), 1024x1536 (portrait), 1536x1024 (landscape), auto.
- Background: Choose the background type. Possible values: opaque, transparent (availability depends on the Output Format).
- Moderation: Adjust content-filtering sensitivity. Possible values: auto (standard), low (less restrictive).
- Output Format: File format (PNG, JPEG, or WebP).
- Compression Quality: Compression level for JPEG or WebP output (0-100).
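The constraints in this list can be checked before a request is sent. The following Python sketch is illustrative only. The function name `validate_generate_settings` and the settings keys (`quality`, `size`, `background`, `moderation`, `output_format`, `compression`) are hypothetical and are not part of ImagenCraft's documented interface. The page says only that transparency depends on the output format, so the sketch assumes, as with OpenAI's gpt-image-1 Images API, that a transparent background requires PNG or WebP.

```python
# Hypothetical validator for Generate-mode settings; names are illustrative only.
QUALITY = {"low", "medium", "high", "auto"}
SIZES = {"1024x1024", "1024x1536", "1536x1024", "auto"}
BACKGROUNDS = {"opaque", "transparent"}
MODERATION = {"auto", "low"}
FORMATS = {"png", "jpeg", "webp"}
MAX_PROMPT_CHARS = 1000


def validate_generate_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    errors = []
    prompt = settings.get("prompt", "")
    if not prompt.strip():
        errors.append("prompt is required")
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")

    # Each enumerated option, if present, must be one of the documented values.
    for key, allowed in (("quality", QUALITY), ("size", SIZES),
                         ("background", BACKGROUNDS), ("moderation", MODERATION),
                         ("output_format", FORMATS)):
        if key in settings and settings[key] not in allowed:
            errors.append(f"{key} must be one of {sorted(allowed)}")

    fmt = settings.get("output_format", "png")  # assumed default format
    # Assumption: transparency needs a format with an alpha channel (PNG/WebP).
    if settings.get("background") == "transparent" and fmt == "jpeg":
        errors.append("transparent background requires png or webp output")

    # Compression quality only applies to JPEG/WebP and must lie in 0-100.
    if "compression" in settings:
        if fmt not in {"jpeg", "webp"}:
            errors.append("compression applies only to jpeg or webp output")
        elif not 0 <= settings["compression"] <= 100:
            errors.append("compression must be between 0 and 100")
    return errors
```

Returning a list of problems, rather than raising on the first one, lets an interface show every issue at once.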
Generate Mode Examples
Explore the diverse range of images you can generate from text prompts:
Examples: Sustainable Campaign, Skincare Thumbnail, Infographic, Single Page Comic.
Edit Mode: Maskless Image Modification
Edit mode lets you modify an existing image with written instructions. It offers several edit types, described below. Every edit type except Reference Images requires an uploaded image.
Draw Mask
- Inputs: Prompt (describing changes in masked area), Uploaded Image, Mask Image (drawn), Quality, Size, Background, Moderation.
Maskless Editing
- Inputs: Prompt (describing changes), Uploaded Image, Quality, Size, Background, Moderation.
Reference Images
- Inputs: Prompt (describing how to use references), Reference Images (up to 4), Quality, Size, Background, Moderation.
- Prompt: Describes the desired changes (Maskless, Draw Mask) or how to use references (Reference Images).
- Quality: Control output quality (Same options as Generate).
- Size: Select output size (Same options as Generate).
- Background: Choose background type (Same options as Generate).
- Moderation: Adjust content filtering (Same options as Generate).
- Uploaded Image: The base image to edit (for Draw Mask, Maskless).
- Mask Image: The mask drawn on the uploaded image (for Draw Mask).
- Reference Images: Images to use as visual inspiration (for Reference Images).
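The input requirements for each edit type can be expressed as a small pre-flight check. In this Python sketch, the `EditRequest` class, the edit-type strings, and `missing_inputs` are hypothetical names chosen for illustration. Only the rules come from the list above: Draw Mask and Maskless need a base image, Draw Mask also needs a mask, and Reference Images accepts up to four images.

```python
# Hypothetical check of the inputs each Edit-mode type needs, per the list above.
from dataclasses import dataclass, field
from typing import Optional

MAX_REFERENCE_IMAGES = 4


@dataclass
class EditRequest:
    edit_type: str                    # "draw_mask", "maskless", or "reference_images"
    prompt: str
    image: Optional[bytes] = None     # base image (Draw Mask, Maskless)
    mask: Optional[bytes] = None      # drawn mask (Draw Mask only)
    references: list[bytes] = field(default_factory=list)  # up to 4 images


def missing_inputs(req: EditRequest) -> list[str]:
    """Return the inputs that are missing or invalid (empty list if complete)."""
    problems = []
    if not req.prompt.strip():
        problems.append("prompt")
    if req.edit_type in ("draw_mask", "maskless"):
        if req.image is None:
            problems.append("uploaded image")
        if req.edit_type == "draw_mask" and req.mask is None:
            problems.append("mask image")
    elif req.edit_type == "reference_images":
        if not req.references:
            problems.append("at least one reference image")
        elif len(req.references) > MAX_REFERENCE_IMAGES:
            problems.append(f"no more than {MAX_REFERENCE_IMAGES} reference images")
    else:
        problems.append(f"unknown edit type {req.edit_type!r}")
    return problems
```

For example, `missing_inputs(EditRequest("draw_mask", "Add a hat", image=b"..."))` reports only the missing mask.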
Edit Mode Examples
See how GPT-4o Image can modify existing images using its editing capabilities:
Maskless Editing Example
Modify an image based on a text prompt without drawing a mask:
Images: Input Image, Result.
Reference Images Example
Generate a new image by blending concepts from multiple reference images:
Images: Reference Image 1, Reference Image 2, Reference Image 3, Result.
Mastering Prompts for GPT-4o Image
Prompting is crucial in all modes, but the focus shifts depending on whether you are creating, editing, or customizing. GPT-4o models are known for strong prompt interpretation and respond well to descriptive, clear language.
Prompt Writing Basics: Subject, Context, and Style
A good starting point for any prompt is to define the core elements:
- Subject
- Context/Background
- Style
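One way to apply this structure is to treat the three elements as separate fields and join them. The hypothetical `compose_prompt` helper below is a minimal Python sketch. The subject-context-style order mirrors this section, but it is only a convention chosen for the sketch.

```python
# Illustrative helper: join the core prompt elements into a single prompt string.
def compose_prompt(subject: str, context: str = "", style: str = "") -> str:
    """Combine subject, optional context/background, and optional style."""
    if not subject.strip():
        raise ValueError("subject is required")
    # Skip empty elements so optional fields do not leave stray commas.
    parts = [p.strip() for p in (subject, context, style) if p.strip()]
    return ", ".join(parts)
```

For example, `compose_prompt("a red fox", "in a snowy birch forest at dawn", "watercolor illustration")` returns `"a red fox, in a snowy birch forest at dawn, watercolor illustration"`.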
General Prompting Principles:
- Be Specific and Detailed
- Use Natural Language
- Consider Negative Prompts
- Prompt Enhancement: For some models (e.g., imagen-3.0-generate-002), a shorter prompt can be automatically expanded, which may produce better results. This is enabled by default.
Prompting for Specific Modes/Tools:
Create Mode
Edit Mode (Draw Mask)
Edit Mode (Maskless Editing)
Edit Mode (Reference Images)
Edit Mode (Outpainting)
Edit Mode (Product Image)
Customize Mode (Subject)
Use [referenceId] in the prompt to refer to the subject image(s).
Example: (Input: Subject Image with referenceId 1) Prompt: “Generate an image of the person [1] as a knight in shining armor.” (The subject is referenced with [1].)
Customize Mode (Style)
Use [referenceId] in the prompt to refer to the style image.
Example: (Input: Style Image with referenceId 1) Prompt: “Generate an image of a cat sitting on a chair in the style of image [1].”
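Because Customize-mode prompts refer to inputs by `[referenceId]`, a prompt can mention an id that was never supplied. This Python sketch assumes integer ids written as `[1]`, `[2]`, and so on, as in the examples above. `referenced_ids` and `unknown_references` are hypothetical helper names.

```python
import re

# Matches markers such as [1] or [12]; assumes reference ids are integers.
REF_PATTERN = re.compile(r"\[(\d+)\]")


def referenced_ids(prompt: str) -> set[int]:
    """Collect every [n] marker used in the prompt."""
    return {int(m) for m in REF_PATTERN.findall(prompt)}


def unknown_references(prompt: str, supplied_ids: set[int]) -> set[int]:
    """Return ids mentioned in the prompt that have no matching uploaded image."""
    return referenced_ids(prompt) - supplied_ids
```

An empty result from `unknown_references` means every marker in the prompt points at an image that was actually provided.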
Customize Mode (ControlNet)
Advanced Prompting Techniques:
Generating Text in Images
- Iterate: You may need multiple attempts to get the desired text appearance and placement.
- Keep it Short: Limit text to 25 characters or fewer.
- Multiple Phrases: Experiment with 2-3 distinct phrases.
- Guide Placement: While placement can vary, you can attempt to guide it in the prompt.
- Inspire Font Style/Size: Specify general font styles or size indications (small, medium, large). Example: “A poster with the text ‘Summerland’ in bold font as a title, underneath this text is the slogan ‘Summer never felt so good’.”
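The length and phrase-count tips can be enforced when a prompt is assembled. In this Python sketch, `text_prompt` is a hypothetical helper. The 25-character limit comes from the list above. Capping the number of phrases at three is an assumption based on the "2-3 distinct phrases" tip.

```python
# Hypothetical helper applying the text-rendering tips above.
MAX_TEXT_CHARS = 25
MAX_PHRASES = 3  # assumption drawn from the "2-3 distinct phrases" tip


def text_prompt(scene: str, phrases: list[str]) -> str:
    """Quote each phrase in the prompt after checking length and count."""
    if not 1 <= len(phrases) <= MAX_PHRASES:
        raise ValueError(f"use between 1 and {MAX_PHRASES} phrases")
    for phrase in phrases:
        if len(phrase) > MAX_TEXT_CHARS:
            raise ValueError(f"{phrase!r} is longer than {MAX_TEXT_CHARS} characters")
    # Quoting makes the exact text to render unambiguous in the prompt.
    quoted = " and ".join(f"the text '{p}'" for p in phrases)
    return f"{scene} with {quoted}"
```

The slogan from the example above, "Summer never felt so good", is exactly 25 characters, so it passes the check.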
Prompt Parameterization
Use placeholders such as {logo_style} that are filled in from user inputs in an interface.
Example Template: “A logo for a company on a solid color background. Include the text .”
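Parameterized prompts map naturally onto Python's `str.format` placeholders. The sketch below uses `string.Formatter().parse` to find the placeholders, so a missing user input raises an error instead of producing a half-filled prompt. The template text and the `fill_template` name are hypothetical. Only the `{logo_style}` placeholder appears on this page.

```python
# Sketch of prompt parameterization; the template and helper are hypothetical.
import string

TEMPLATE = "A {logo_style} logo for a company on a solid color background."


def fill_template(template: str, values: dict[str, str]) -> str:
    """Substitute user inputs into {placeholders}; fail loudly on missing fields."""
    # parse() yields (literal_text, field_name, format_spec, conversion) tuples.
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = fields - set(values)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return template.format(**values)
```

Extra keys in `values` are ignored, so one dictionary of form inputs can serve several templates.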
Using Specific Styles
Photography Modifiers
- Camera Proximity: “Close up,” “taken from far away,” “zoomed out.”
- Camera Position: “aerial,” “from below.”
- Lighting: “natural lighting,” “dramatic lighting,” “warm lighting,” “cold lighting,” “studio photo.”
- Camera Settings: “motion blur,” “soft focus,” “bokeh,” “portrait.”
- Lens Types: “35mm,” “50mm,” “fisheye,” “wide angle,” “macro,” “telephoto zoom.”
- Film Types: “black and white,” “polaroid.”
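The modifier categories above can be stored as data so that an interface can offer them as choices. The dictionary keys and the `add_photo_modifiers` helper in this Python sketch are hypothetical. The values are copied from the list.

```python
# Illustrative only: values mirror the list above; keys and helper are hypothetical.
PHOTO_MODIFIERS = {
    "proximity": ["close up", "taken from far away", "zoomed out"],
    "position": ["aerial", "from below"],
    "lighting": ["natural lighting", "dramatic lighting", "warm lighting",
                 "cold lighting", "studio photo"],
    "settings": ["motion blur", "soft focus", "bokeh", "portrait"],
    "lens": ["35mm", "50mm", "fisheye", "wide angle", "macro", "telephoto zoom"],
    "film": ["black and white", "polaroid"],
}


def add_photo_modifiers(prompt: str, **choices: str) -> str:
    """Append one modifier per category, rejecting values not in the list above."""
    picked = []
    for category, value in choices.items():
        if value not in PHOTO_MODIFIERS.get(category, []):
            raise ValueError(f"unknown {category} modifier: {value!r}")
        picked.append(value)
    return ", ".join([prompt, *picked])
```

Rejecting unlisted values is a deliberate simplification. The lists above are examples rather than a complete vocabulary, so a real interface would probably accept free text as well.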
Shapes and Materials
Image Quality Modifiers
- General: “high-quality,” “beautiful,” “stylized.”
- Photos: “4K,” “HDR,” “Studio Photo.”
- Art/Illustration: “by a professional,” “detailed.” Example: “4k HDR beautiful photo of a corn stalk taken by a professional photographer.”
Aspect Ratios and Use Cases
- Square (1:1): General use, social media posts.
- Fullscreen (4:3): TV, media, and film; captures more of the scene horizontally.
- Portrait full screen (3:4): Fullscreen rotated; captures more of the scene vertically.
- Widescreen (16:9): TVs, monitors, mobile (landscape), good for landscapes.
- Portrait (9:16): Widescreen rotated, for tall objects, mobile (portrait), short-form video.
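The sizes offered in Generate mode do not match these ratios exactly: 1024x1024 reduces to 1:1, 1024x1536 to 2:3, and 1536x1024 to 3:2. This Python sketch reduces a size string using the greatest common divisor and picks the listed ratio closest to it. The helper names and the nearest-ratio rule are illustrative assumptions. `auto` is not handled because it has no fixed dimensions.

```python
from math import gcd

# Ratios from the list above (width, height) with their suggested uses.
USE_CASES = {
    (1, 1): "Square: general use, social media posts",
    (4, 3): "Fullscreen: TV, media, film",
    (3, 4): "Portrait full screen",
    (16, 9): "Widescreen: TVs, monitors, landscapes",
    (9, 16): "Portrait: tall objects, mobile, short-form video",
}


def reduce_size(size: str) -> tuple[int, int]:
    """Turn a size string like '1024x1536' into its reduced ratio, e.g. (2, 3)."""
    width, height = (int(v) for v in size.split("x"))
    d = gcd(width, height)
    return width // d, height // d


def closest_use_case(size: str) -> str:
    """Pick the listed ratio whose width/height value is nearest to the size's."""
    w, h = reduce_size(size)
    target = w / h
    best = min(USE_CASES, key=lambda r: abs(r[0] / r[1] - target))
    return USE_CASES[best]
```

With this rule, 1024x1536 (about 0.67) lands nearest 3:4 (0.75), and 1536x1024 (1.5) lands nearest 4:3 (about 1.33).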
Photorealistic Images
- Portraits: Prime/zoom lens (24-35mm), “black and white film,” “Film noir,” “Depth of field,” “duotone.”
- Objects (Still Life): Macro lens (60-105mm), “High detail,” “precise focusing,” “controlled lighting.”
- Motion: Telephoto zoom (100-400mm), “Fast shutter speed,” “Action or movement tracking.”
- Wide-angle (Landscape/Astronomical): Wide-angle lens (10-24mm), “Long exposure times,” “sharp focus,” “smooth water or clouds.”
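The recommendations above can be stored as presets. In this Python sketch, the preset keys, the "A photo of" prefix, and the `photoreal_prompt` helper are hypothetical. The focal-length ranges and keywords are transcribed from the list, using a subset of the keywords for each category.

```python
# Presets transcribed from the list above; the structure and helper are hypothetical.
PHOTO_PRESETS = {
    "portrait": ("24-35mm prime or zoom lens",
                 ["depth of field", "black and white film"]),
    "still_life": ("60-105mm macro lens",
                   ["high detail", "precise focusing", "controlled lighting"]),
    "motion": ("100-400mm telephoto zoom lens",
               ["fast shutter speed", "action or movement tracking"]),
    "landscape": ("10-24mm wide-angle lens",
                  ["long exposure times", "sharp focus", "smooth water or clouds"]),
}


def photoreal_prompt(subject: str, kind: str) -> str:
    """Start with 'A photo of <subject>' and append the lens and keywords for its kind."""
    lens, keywords = PHOTO_PRESETS[kind]  # KeyError for an unknown kind
    return ", ".join([f"A photo of {subject}", lens, *keywords])
```

For example, `photoreal_prompt("a hummingbird in flight", "motion")` adds the telephoto range and the motion keywords after the subject.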
How to Use GPT-4o Image
Navigate through the different modes and tools with this general workflow:
1. Select Your Mode
2. Select Your Edit Type (Edit Mode)
3. Upload Image(s) (If Applicable)
4. Provide Your Prompt
5. Adjust Settings
6. Generate Image(s)
7. Review and Refine
Input Parameters and Options
GPT-4o Image offers a range of input parameters, which vary with the selected Mode and Edit Type.
Common Parameters (available across multiple Modes/Edit Types):
- Quality: low, medium, high, auto.
- Size: 1024x1024, 1024x1536, 1536x1024, auto.
- Background: opaque, transparent (transparency availability depends on the Output Format).
- Moderation: auto (Standard), low (Less Restrictive).
Mode/Edit Type Specific Parameters:
Generate Mode
Edit Mode (Draw Mask, Maskless Editing)
Edit Mode (Draw Mask)
Edit Mode (Reference Images)