Advanced AI image generation and editing powered by OpenAI GPT-4o
Experience the power of GPT-4o for image generation and editing! Create stunning visuals, refine images with maskless edits, and leverage reference images, all within a natively multimodal system.
GPT-4o Image Generation, powered by OpenAI’s GPT-4o model, is a groundbreaking tool that integrates text and image processing into a single, natively multimodal system. This allows for unprecedented flexibility in creating and editing visuals. ImagenCraft provides a platform to utilize GPT-4o’s capabilities, excelling at accurate text rendering, precise prompt following, and leveraging in-context learning from uploaded images. It offers distinct Generate and Edit modes to cover a wide range of creative and practical applications.
Integrates text and image processing in one system for unprecedented flexibility.
Seamlessly integrates clear and precise text into generated images.
Excels at adhering to detailed instructions and handling multiple objects.
Leverages uploaded images as visual context and inspiration.
GPT-4o Image offers two primary modes to suit your creative workflow:
In Generate mode, you can create entirely new images from scratch by providing a text prompt.
Inputs & Options (Generate Mode):
Quality: low (Faster), medium, high (Slower), auto.
Size: 1024x1024 (Square), 1024x1536 (Portrait), 1536x1024 (Landscape), auto.
Background: opaque, transparent (Note: Availability depends on Output Format).
Moderation: auto (Standard), low (Less Restrictive).
Explore the diverse range of images you can generate from text prompts:
Edit mode allows you to modify an existing image by providing written instructions. It offers different edit types to suit your needs. Draw Mask and Maskless Editing require a base image upload; Reference Images takes uploaded reference images instead.
Edit Types Available:
Draw Mask
Modify specific areas by drawing a mask on the uploaded image.
Maskless Editing
Modify the image based on a prompt without drawing a mask. The AI determines where to apply changes.
Reference Images
Generate a new image based on a prompt and up to 4 uploaded reference images. The AI blends concepts from the references.
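The input requirements of the three edit types can be sketched as a small pre-flight check. This is only an illustration: the identifiers draw_mask, maskless, and reference_images and the function check_edit_inputs are hypothetical names, not part of any documented API, and the minimum of one reference image is an assumption (the text above only states the maximum of 4).

```python
# Hypothetical identifiers for the three edit types described above.
MAX_REFERENCE_IMAGES = 4  # "up to 4 uploaded reference images"


def check_edit_inputs(edit_type, base_image=None, mask=None, references=()):
    """Return a list of problems; an empty list means the inputs fit the edit type."""
    problems = []
    if edit_type == "draw_mask":
        if base_image is None:
            problems.append("Draw Mask needs a base image")
        if mask is None:
            problems.append("Draw Mask needs a mask")
    elif edit_type == "maskless":
        if base_image is None:
            problems.append("Maskless Editing needs a base image")
        if mask is not None:
            problems.append("Maskless Editing does not use a mask")
    elif edit_type == "reference_images":
        # Assumption: at least one reference image is required.
        if not 1 <= len(references) <= MAX_REFERENCE_IMAGES:
            problems.append("Reference Images needs 1 to 4 reference images")
    else:
        problems.append(f"unknown edit type: {edit_type!r}")
    return problems


print(check_edit_inputs("draw_mask", base_image="car.png"))
```

Returning a list of problems rather than raising on the first one lets an interface show every missing input at once.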
Common Inputs & Options (Edit Mode):
See how GPT-4o Image can modify existing images using its editing capabilities:
Modify an image based on a text prompt without drawing a mask:
Generate a new image by blending concepts from multiple reference images:
Prompting is crucial in all modes, but the focus shifts depending on whether you are creating, editing, or customizing. GPT-4o models are known for strong prompt interpretation and respond well to descriptive, clear language.
A good starting point for any prompt is to define the core elements:
The main object, person, animal, or scenery.
The environment or setting for the subject.
The artistic style (e.g., painting, photograph, sketch, or more specific styles).
After establishing these basics, refine your prompt by adding more details through iteration until the generated image aligns with your vision.
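As a rough sketch of this subject, context, and style approach, the helper below joins the three core elements and any later refinements into one prompt string. The function name compose_prompt and the comma-separated layout are illustrative assumptions; the model does not require this particular structure.

```python
def compose_prompt(subject, context, style, details=()):
    """Join subject, context, style, and optional refinements into one prompt."""
    parts = [f"{style} of {subject}", context, *details]
    # Drop empty pieces so an omitted context or detail leaves no stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


first_try = compose_prompt("a park next to a lake", "in the spring", "A photograph")
refined = compose_prompt(
    "a park next to a lake",
    "in the spring",
    "A photograph",
    details=["the sun sets across the lake", "golden hour", "red wildflowers"],
)
print(first_try)
print(refined)
```

Keeping the refinements in a separate list makes each iteration a matter of appending or swapping one detail while the core elements stay fixed.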
Be Specific and Detailed
Employ descriptive language, detailed adjectives, and adverbs to paint a clear picture for the model. Example: Instead of “a park,” try “A park in the spring next to a lake, the sun sets across the lake, golden hour, red wildflowers.”
Use Natural Language
Formulate prompts using descriptive sentences, as you would describe an image to another person.
Consider Negative Prompts
Use negative prompts to steer away from unwanted elements (Note: support varies by model).
Prompt Enhancement
For models supporting prompt enhancement (like imagen-3.0-generate-002), a shorter prompt can be automatically expanded for potentially better results. This is enabled by default.
Generate Mode
Focus on fully describing the desired image from scratch, combining subject, context, and style with rich details. Example: “A futuristic cityscape at sunset, high angle view, digital painting, vibrant colors.”
Edit Mode (Draw Mask)
Describe what you want to appear within the masked area. Focus on the object or scene you want to generate and how it should blend with the existing image. For best results, use a description of the masked area. Avoid single-word prompts. Example: (Mask over a blank wall) Prompt: “A vibrant graffiti mural covering the wall.”
Edit Mode (Maskless Editing)
Describe the desired changes to the image in natural language. The AI will interpret your instructions and apply the edits without a mask. Example: (Input Image: Photo of a car) Prompt: “Change the car to red.”
Edit Mode (Reference Images)
Describe the desired image, indicating how the AI should draw inspiration from the reference image(s). You can reference specific images if using multiple. Example: (Reference Image 1: A character, Reference Image 2: A scene) Prompt: “Generate a portrait of the person from image1 standing in the setting from image2.”
Edit Mode (Outpainting)
Describe the content you want to appear in the expanded areas around the original image. You can provide an empty string to create the edited images, but a description of the masked area is recommended for best results. Example: (Outpainting around a portrait) Prompt: “A lush forest extending around the person.”
Edit Mode (Product Image)
Describe the desired background or environment for the product. Example: (Input: Product image) Prompt: “Place the product on a wooden table in a sunny cafe.”
Customize Mode (Subject)
Describe the desired image, referencing the subject from your input image. Use the format [referenceId] to refer to the subject image(s).
Example: (Input: Subject Image with referenceId 1) Prompt: “Generate an image of the person [1] as a knight in shining armor.” (Referencing the subject with [1]).
Customize Mode (Style)
Describe the desired image content, indicating it should be in the style of your input image. Use the format [referenceId] to refer to the style image.
Example: (Input: Style Image with referenceId 1) Prompt: “Generate an image of a cat sitting on a chair in the style of image [1].”
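Because the [referenceId] markers must line up with the images you actually supply, a quick consistency check can catch mismatches before submitting. The sketch below assumes reference ids are plain integers written as [n], as in the examples above; the function names are hypothetical.

```python
import re


def referenced_ids(prompt):
    """Return the set of integer ids written as [n] in the prompt."""
    return {int(m) for m in re.findall(r"\[(\d+)\]", prompt)}


def check_references(prompt, supplied_ids):
    """Report ids used in the prompt but not supplied, and supplied but never used."""
    used = referenced_ids(prompt)
    supplied = set(supplied_ids)
    return {"missing": sorted(used - supplied), "unused": sorted(supplied - used)}


prompt = "Generate an image of the person [1] as a knight in shining armor."
print(check_references(prompt, supplied_ids=[1]))
```

An id listed under "missing" means the prompt points at an image that was not uploaded; an id under "unused" means an uploaded image is never mentioned in the prompt.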
Customize Mode (ControlNet)
Your prompt describes the content and style of the image, while the ControlNet image provides the structural guide. Ensure your prompt aligns with the structural guidance (e.g., edges, pose) provided by the ControlNet image. Example: (Input: Control Image - Canny edges of a building) Prompt: “A beautiful watercolor painting of an ancient castle at sunset.”
Generating Text in Images
GPT-4o can add text to images.
Prompt Parameterization
For API/SDK use, you can parameterize prompts with placeholders like {logo_style} to be filled by user inputs in an interface.
Example Template: “A logo for a company on a solid color background. Include the text .”
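A minimal sketch of prompt parameterization using Python's standard string formatting is shown here. Only the {logo_style} placeholder comes from the text above; the {company_name} placeholder and the position of both placeholders in the template are hypothetical, chosen for illustration.

```python
from string import Formatter

# Hypothetical template: {logo_style} is named in the text, {company_name} is assumed.
TEMPLATE = (
    "A {logo_style} logo for a company on a solid color background. "
    "Include the text {company_name}."
)


def placeholders(template):
    """List the placeholder names that appear in a template, in order."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def fill(template, **values):
    """Fill a template, failing loudly if any placeholder has no value."""
    missing = [n for n in placeholders(template) if n not in values]
    if missing:
        raise KeyError(f"missing values for: {', '.join(missing)}")
    return template.format(**values)


print(fill(TEMPLATE, logo_style="minimalist", company_name="Acme"))
```

Checking for missing values up front gives a clearer error than a bare format failure when a form field is left empty.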
Using Specific Styles
Specify artistic styles (e.g., “photography,” “illustration,” “digital art”) or reference historical art movements (“impressionism,” “renaissance,” “pop art”) or specific artists. Example: “An [art style or creation technique] of an angular sporty electric sedan with skyscrapers in the background.”
Photography Modifiers
Use keywords to influence camera settings and style:
Shapes and Materials
Describe objects made of unusual materials or shapes. Example: “a duffle bag made of cheese,” “neon tubes in the shape of a bird,” “an armchair made of paper, studio photo, origami style.”
Image Quality Modifiers
Use keywords to indicate desired quality level:
Aspect Ratios and Use Cases
Choose the aspect ratio that best suits your content:
Photorealistic Images
Combine keywords for specific photorealistic subjects:
Navigate through the different modes and tools with this general workflow:
Select Your Mode
Choose Generate to create a new image or Edit to modify an existing one.
Select Your Edit Type (Edit Mode)
If in Edit mode, select the type of editing you want to perform: Draw Mask, Maskless Editing, or Reference Images.
Upload Image(s) (If Applicable)
If in Edit mode, upload the necessary base image (Draw Mask, Maskless Editing) or reference images (Reference Images). If using Draw Mask, you will also need to draw a mask on the uploaded image.
Provide Your Prompt
Enter your text description. The prompt’s focus depends on your selected mode and edit type.
Adjust Settings
Configure settings like Quality, Size, Background, and Moderation.
Generate Image(s)
Click the “Generate” button.
Review and Refine
Examine the generated image(s). Iterate by adjusting prompts, settings, or inputs if needed.
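For the Adjust Settings step, the option values listed on this page can be validated before a request is submitted. The sketch below is not a documented request schema: the field names simply mirror the setting labels, the output_format field is hypothetical, and the rule that a transparent background needs png or webp output is an assumption about how "availability depends on Output Format" plays out.

```python
# Hypothetical helper: field names mirror the setting labels on this page.
QUALITY = {"low", "medium", "high", "auto"}
SIZE = {"1024x1024", "1024x1536", "1536x1024", "auto"}
BACKGROUND = {"opaque", "transparent"}
MODERATION = {"auto", "low"}
# Assumption: only these output formats can carry transparency.
TRANSPARENT_FORMATS = {"png", "webp"}


def build_generate_options(prompt, quality="auto", size="auto",
                           background="opaque", moderation="auto",
                           output_format="png"):
    """Validate Generate-mode settings and return them as a plain dict."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    checks = [("quality", quality, QUALITY), ("size", size, SIZE),
              ("background", background, BACKGROUND),
              ("moderation", moderation, MODERATION)]
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    if background == "transparent" and output_format not in TRANSPARENT_FORMATS:
        raise ValueError("transparent background needs png or webp output")
    return {"prompt": prompt, "quality": quality, "size": size,
            "background": background, "moderation": moderation,
            "output_format": output_format}


print(build_generate_options("A red fox in snow", quality="high", size="1024x1536"))
```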
GPT-4o Image offers a range of input parameters, varying based on the selected Mode and Edit Type.
Prompt
Your text description guiding the image creation or modification. (Max 1000 characters).
Quality
Controls the level of detail and generation time.
Options: low, medium, high, auto.
Size
The desired output resolution and aspect ratio.
Options: 1024x1024, 1024x1536, 1536x1024, auto.
Background
Choose the background type for the generated image.
Options: opaque, transparent. (Note: Transparency availability depends on Output Format).
Moderation
Adjusts content filtering sensitivity.
Options: auto (Standard), low (Less Restrictive).
Generate Mode
Edit Mode (Draw Mask, Maskless Editing)
The base image file to be edited. (Max 10MB).
Edit Mode (Draw Mask)
The mask image (Base64 string or URL) defining the area to be edited. Created using the built-in mask editor.
Edit Mode (Reference Images)
Up to 4 image files to use as visual references. (Max 1MB each recommended).
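The limits stated above (1000 prompt characters, 10MB for the base image, up to 4 reference images at a recommended 1MB each) can be checked locally before uploading. This is a sketch under assumptions: sizes are passed in as byte counts, 1MB is taken to mean 1024 * 1024 bytes, and the recommended reference size is treated as a warning, not a hard error.

```python
MAX_PROMPT_CHARS = 1000
MAX_REFERENCE_IMAGES = 4
# Assumption: 1 MB is interpreted as 1024 * 1024 bytes.
MAX_BASE_IMAGE_BYTES = 10 * 1024 * 1024
RECOMMENDED_REFERENCE_BYTES = 1 * 1024 * 1024


def check_limits(prompt, base_image_size=None, reference_sizes=()):
    """Return (errors, warnings) for the stated prompt and file-size limits."""
    errors, warnings = [], []
    if len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"prompt is {len(prompt)} characters, limit is {MAX_PROMPT_CHARS}")
    if base_image_size is not None and base_image_size > MAX_BASE_IMAGE_BYTES:
        errors.append("base image exceeds 10MB")
    if len(reference_sizes) > MAX_REFERENCE_IMAGES:
        errors.append("more than 4 reference images")
    for index, size in enumerate(reference_sizes, start=1):
        if size > RECOMMENDED_REFERENCE_BYTES:
            warnings.append(f"reference image {index} is larger than the recommended 1MB")
    return errors, warnings


print(check_limits("Change the car to red.", base_image_size=2_500_000))
```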
Select the mode and edit type that precisely match your desired outcome (generation, specific edit type, or customization method).
Tailor your prompt content and detail based on the selected mode and tool’s requirements. Be specific and use natural language.
For Edit mode, start with clear, high-resolution images for the best results.
Use multi-turn conversations to adjust outputs until you achieve the desired result.
GPT-4o Image provides a powerful and versatile platform for AI image generation and editing, leveraging the advanced capabilities of OpenAI’s GPT-4o model. With its distinct modes, intuitive editing types, and strong prompt interpretation, it empowers creators to achieve stunning visual results and communicate effectively through imagery.