GPT Image 2 Is Here: What Changed, What It Costs, and What Builders Should Know

On 21 April 2026, OpenAI launched GPT Image 2: the first image generation model with native reasoning. It hit number one on Image Arena within 12 hours of going live. It replaces the entire DALL-E product line, which is being retired on 12 May 2026.

I build AI systems for a living. Most of them are text-based pipelines, not image generation workflows. But this release matters beyond the image space because it signals where the entire industry is heading: models that think before they generate. Here is what actually shipped, what it costs, and what it means for builders.

  • 99% text rendering accuracy
  • 4K max resolution
  • 2x faster than DALL-E 3
  • #1 on Image Arena within 12 hours

What is GPT Image 2

OpenAI calls it "GPT for images." The architecture is undisclosed, but the approach is clear: this is not a standalone diffusion model like DALL-E was. GPT Image 2 integrates image generation into the same reasoning pipeline that powers GPT's text capabilities. It does not just generate. It thinks about what it is generating.

The model replaces the entire DALL-E line. DALL-E 2 and DALL-E 3 are both being retired on 12 May 2026. If you have production systems calling those models, you have roughly three weeks to migrate.

The practical difference is visible immediately. Ask DALL-E 3 for a poster with three paragraphs of text, a logo, and a specific layout hierarchy, and you get something approximate. Ask GPT Image 2 for the same thing and it reasons about typography, spatial placement, reading order, and visual hierarchy before generating the output.


What actually improved

Text rendering

This is the headline feature. GPT Image 2 renders text at 99% accuracy. Previous models (DALL-E 3, Midjourney, Stable Diffusion) all struggled with text, producing misspellings, letter swaps, and garbled words. That problem is essentially solved.

The implication is significant: marketing materials, infographics, signage, product mockups, and any visual asset requiring readable text are now viable outputs from an image generation API. Print-ready assets from a single prompt.

Dense layouts and composition

GPT Image 2 handles typographic hierarchy in ways previous models could not. Headers, subheaders, body text, captions, all at different sizes and weights, all correctly positioned. It understands that a poster is not just "text on an image" but a designed composition with visual structure.

Photorealism and camera references

The model responds to camera and film references. Ask for "shot on Portra 400" or "35mm f/1.4 shallow depth of field" and the output reflects those characteristics. The photorealistic mode produces images that are increasingly difficult to distinguish from photographs, particularly for product shots, environmental scenes, and portraits.

Multilingual text

CJK characters (Chinese, Japanese, Korean), Hindi, Bengali, and other non-Latin scripts render correctly and integrate into design compositions. This is a meaningful improvement for global marketing teams that need localised visual assets.

Spatial reasoning, lighting, and materials

GPT Image 2 has a better understanding of physical space, light behaviour, and material properties. Reflections, transparency, subsurface scattering, and complex material interactions (glass on wood, metal under directional light) are noticeably more accurate.

Resolution and speed

Outputs go up to 4K resolution. Aspect ratios are flexible from 3:1 to 1:3, covering everything from social media stories to ultrawide banners. Generation speed is approximately 2x faster than DALL-E 3.
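
Since the aspect-ratio bounds are hard limits, it is worth rejecting out-of-range sizes client-side before sending a request. A minimal sketch; the 3:1 to 1:3 bounds come from the release notes above, but the helper itself is mine, not part of any SDK:

```python
def aspect_ratio_ok(width: int, height: int) -> bool:
    """Check that width:height falls within the supported 3:1 to 1:3 range."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    ratio = width / height
    return 1 / 3 <= ratio <= 3
```

A 3000x1000 ultrawide banner passes; a 4000x1000 one would need to be cropped or split before generation.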

Official OpenAI announcement covering GPT Image 2 capabilities.


Thinking mode: what makes it different

The real differentiator is not any single feature. It is that GPT Image 2 reasons about images before generating them. OpenAI calls this "thinking mode," and it fundamentally changes what the model can do.

In practice, this means the model can:

  • Search the web during generation. Ask it to create an infographic about a current topic and it pulls accurate data before composing the visual.
  • Generate from data. Feed it a dataset and it produces charts, graphs, and infographics that actually reflect the numbers.
  • Reason about visual consistency. Request 8 images from a single prompt and get consistent characters, branding, and style across all of them.
  • Handle complex constraints. "Create a 4-panel comic with the same character, consistent lighting, and text bubbles that read left to right." Previous models would produce four unrelated images. GPT Image 2 maintains coherence across the set.
  • Solve visual problems. "Show the mathematical proof for the Pythagorean theorem as a visual diagram with labelled steps." It reasons about the math before rendering the diagram.

The gap between "image generator" and "visual reasoning engine" is exactly where this model sits. It does not always get it right (more on that below), but the fact that it attempts to reason about composition, accuracy, and coherence before generating pixels is a genuine architectural shift.


API and pricing

GPT Image 2 is available via the OpenAI API under the model name gpt-image-2. Pricing is token-based, not per-image, which is a departure from DALL-E's flat-rate pricing.

Token pricing

  • Input tokens: $8 per 1M
  • Output tokens: $32 per 1M

Approximate per-image costs

  • Low quality, 1024x1024: ~$0.011
  • Medium quality, 1024x1024: ~$0.02
  • High quality, 1024x1024: ~$0.04
  • High quality, 2048x2048: ~$0.17
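
Token-based pricing makes cost forecasting slightly more work than DALL-E's flat rates, but the arithmetic is simple. A sketch of both views, using the $8/$32 token rates and the approximate per-image figures above; the exact token counts behind each image are not published here, so treat the lookup as an estimate, not billing-grade math:

```python
# Approximate per-image costs (USD) at each quality/size combination,
# taken from the figures above.
APPROX_COST = {
    ("low", "1024x1024"): 0.011,
    ("medium", "1024x1024"): 0.02,
    ("high", "1024x1024"): 0.04,
    ("high", "2048x2048"): 0.17,
}

def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Exact cost from token counts: $8/1M input, $32/1M output."""
    return input_tokens / 1_000_000 * 8 + output_tokens / 1_000_000 * 32

def estimate_batch_cost(quality: str, size: str, n_images: int) -> float:
    """Rough spend estimate for a batch of same-quality, same-size images."""
    return round(APPROX_COST[(quality, size)] * n_images, 4)
```

For example, a campaign of 100 high-quality 1024x1024 assets lands around $4, which is the kind of number worth checking before wiring this into an automated pipeline.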

API parameters

  • size: Flexible aspect ratios from 3:1 to 1:3, up to 4K. Common presets: 1024x1024, 1536x1024, 2048x2048.
  • quality: low, medium, high. Controls detail level and token usage.
  • n: Number of images per request, up to 4.
  • output_format: png, jpeg, webp.
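
Because requests cost real money, it is worth validating parameters client-side before any tokens are spent. This sketch assumes the parameter names and limits exactly as listed above; the `build_image_request` helper is hypothetical, not part of the official SDK:

```python
VALID_QUALITIES = {"low", "medium", "high"}
VALID_FORMATS = {"png", "jpeg", "webp"}

def build_image_request(prompt: str, size: str = "1024x1024",
                        quality: str = "medium", n: int = 1,
                        output_format: str = "png") -> dict:
    """Assemble and sanity-check a gpt-image-2 request payload."""
    if quality not in VALID_QUALITIES:
        raise ValueError(f"quality must be one of {sorted(VALID_QUALITIES)}")
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    if output_format not in VALID_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(VALID_FORMATS)}")
    return {"model": "gpt-image-2", "prompt": prompt, "size": size,
            "quality": quality, "n": n, "output_format": output_format}
```

The resulting dict can then be unpacked into the image generation call of whichever SDK you use, keeping validation in one place.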

Rate limit is 250 images per minute (IPM) at the standard tier. The model is also integrated into OpenAI's Codex environment for programmatic generation workflows.
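
Staying under that cap in a batch pipeline is easiest with simple client-side pacing. The 250 IPM figure is from the announcement; the pacer below is an illustrative sketch, not an official client feature:

```python
import time

IMAGES_PER_MINUTE = 250                   # standard-tier limit
MIN_INTERVAL = 60.0 / IMAGES_PER_MINUTE   # 0.24 s between requests

class RequestPacer:
    """Sleep just long enough between calls to stay under the IPM cap."""

    def __init__(self, min_interval: float = MIN_INTERVAL):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> float:
        """Block until the next request may be sent; return seconds slept."""
        now = time.monotonic()
        slept = max(0.0, self._last + self.min_interval - now)
        if slept:
            time.sleep(slept)
        self._last = time.monotonic()
        return slept
```

Call `pacer.wait()` before each request. At 0.24 s per image this saturates the limit without ever bursting past it; a production version would also back off on 429 responses.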

ElevenLabs breakdown of GPT Image 2 capabilities and comparisons.


Where it still struggles

GPT Image 2 is a significant leap, but it is not flawless. Early testing from the community has surfaced consistent weak points.

Lighting inconsistencies

Scenes with multiple light sources or strong directional light can produce contradictions. The widely shared "sun behind the car" example shows the model placing the light source behind the subject while casting shadows that suggest front lighting.

Dense texture issues

Fine-grained textures (fur, fabric weave, dense foliage) can appear mushy or over-smoothed at close inspection. The model prioritises overall composition over micro-detail.

Darker image tendency

Outputs tend to skew darker than expected. Scenes that should be bright and airy often come out with heavier contrast and deeper shadows. This is correctable in post-processing, but it adds a step.

Multi-language in the same prompt

While individual language support is strong, mixing multiple languages in a single prompt (e.g., English headline with Japanese subtitle and Arabic body text) can produce inconsistent results. Single-language compositions are reliable. Multi-language in one image is still hit-or-miss.

Logo accuracy

Reproducing existing logos accurately remains difficult. The model can generate logos that are stylistically close, but exact reproduction of complex brand marks (intricate wordmarks, multi-element logos) is unreliable.

Thinking mode latency

When the model engages its reasoning capabilities (thinking mode), generation takes 15 to 30 seconds. Standard generation is fast. Complex prompts that trigger deep reasoning add noticeable latency.

None of these are dealbreakers. They are the kind of limitations that matter when you are evaluating whether to put this model into a production pipeline versus using it for one-off creative work. For production use, plan for post-processing on lighting and brightness. For creative use, the limitations are manageable with prompt refinement.
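
The brightness correction mentioned above can be as simple as a gamma lift on pixel values. A minimal stdlib sketch of the idea; a real pipeline would apply this per-channel with Pillow or OpenCV rather than on a flat list:

```python
def gamma_lift(pixels: list[int], gamma: float = 0.8) -> list[int]:
    """Brighten 8-bit pixel values; gamma < 1 lifts shadows and midtones."""
    return [round(255 * (p / 255) ** gamma) for p in pixels]
```

Pure black and pure white stay fixed while midtones rise, which counteracts the model's tendency toward heavy shadows without blowing out highlights.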


What this means if you build things

Three things stand out from a builder's perspective.

DALL-E is gone. DALL-E 2 and DALL-E 3 retire on 12 May 2026. If you have API integrations calling those models, migration is not optional. The model name changes to gpt-image-2, and the pricing model shifts from flat-rate per image to token-based. Audit your image generation costs before and after migration.
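
A migration shim keeps the change contained to one function. The DALL-E 3 parameter names below are the real Images API ones; the quality and size mappings are my own judgment calls, not an official migration table:

```python
# Hypothetical dall-e-3 -> gpt-image-2 translation. "standard"/"hd" map
# to the new low/medium/high scale; sizes snap to the nearest preset.
QUALITY_MAP = {"standard": "medium", "hd": "high"}
SIZE_MAP = {
    "1024x1024": "1024x1024",
    "1792x1024": "1536x1024",  # nearest landscape preset
    "1024x1792": "1024x1536",  # nearest portrait preset (assumed)
}

def migrate_dalle3_params(params: dict) -> dict:
    """Translate a dall-e-3 request dict into gpt-image-2 terms."""
    out = dict(params)
    out["model"] = "gpt-image-2"
    out["quality"] = QUALITY_MAP.get(params.get("quality", "standard"), "medium")
    out["size"] = SIZE_MAP.get(params.get("size", "1024x1024"), "1024x1024")
    out.pop("style", None)  # dall-e-3-only parameter with no equivalent listed
    return out
```

Run your existing request payloads through a shim like this, then compare per-image costs under token pricing before flipping production traffic over.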

Image generation entered the reasoning era. This is the bigger signal. GPT Image 2 reasons about what it generates. It searches the web for context. It maintains consistency across multi-image sets. These are not image generation features. They are intelligence features applied to image generation. Expect every major model provider to follow this pattern.

Marketing assets are now API-viable. With 99% text accuracy, the gap between "AI-generated" and "production-ready" just closed for a specific category of visual work: social media graphics, email banners, product mockups, event posters, infographics. The category of work that previously required a designer to open Figma or Canva and type text onto a template can now come from an API call. That does not eliminate the designer. It eliminates the repetitive production layer.

This is the same pattern I see across every AI tool release. The production layer gets automated. The strategy, the creative direction, the taste, the business context: those remain human decisions. I wrote about this dynamic in detail when comparing Claude vs ChatGPT for business automation. The principle holds regardless of the model or the modality.

If you are building AI systems and want to understand how image generation fits into a broader automation stack, or if you need help migrating from DALL-E before the 12 May deadline, the AI Consulting and Roadmapping service is where I help teams make these decisions.

Hands-on tutorial covering GPT Image 2 features and use cases.

Frequently asked questions

What is GPT Image 2?

GPT Image 2 is OpenAI's latest image generation model, launched on 21 April 2026. It replaces the DALL-E line and is the first image model with native reasoning capabilities. It generates images with 99% text accuracy, up to 4K resolution, and supports flexible aspect ratios from 3:1 to 1:3.

Is GPT Image 2 free?

GPT Image 2 is available for free to ChatGPT Free users with daily limits. Plus, Pro, and Team subscribers get higher rate limits. API access is usage-based, starting at roughly $0.02 per medium-quality image at 1024x1024.

How does GPT Image 2 compare to DALL-E 3?

GPT Image 2 is a significant upgrade over DALL-E 3. Text rendering accuracy jumped from roughly 70% to 99%. It supports native reasoning during generation, handles dense layouts and typographic hierarchy, generates up to 4K resolution, runs approximately 2x faster, and supports flexible aspect ratios. DALL-E 2 and DALL-E 3 are being retired on 12 May 2026.

Can GPT Image 2 render text accurately?

Yes. GPT Image 2 achieves 99% text rendering accuracy, a major improvement over previous models. This makes it viable for print-ready marketing materials, infographics, signage, and any visual asset that requires readable, correctly spelled text.

Is GPT Image 2 available via API?

Yes. GPT Image 2 is available via the OpenAI API under the model name gpt-image-2. It supports parameters for size, quality (low, medium, high), number of images (up to 4), and output format (PNG, JPEG, WebP). Pricing is token-based at $8 per million input tokens and $32 per million output tokens.



