GPT-5.4 Launches: OpenAI’s New AI Model Targets Real Work, Coding, and Agents

Key Highlights:

  • GPT-5.4 benchmarks show major gains in knowledge work, web browsing, and agent tasks.
  • The model introduces native computer-use capabilities and a context window up to 1 million tokens.
  • The new model improves coding, reasoning, and tool use for complex workflows.

OpenAI has officially introduced GPT-5.4, its latest frontier AI model designed for professional and real-world work. The new model is rolling out across ChatGPT, the OpenAI API, and Codex, alongside a higher-performance version called GPT-5.4 Pro.

The company says the new model combines improvements in reasoning, coding, tool use, and agent workflows. The goal is simple: help AI systems complete complex tasks with fewer prompts and less back-and-forth.

What Is GPT-5.4 and Why Is It Important?

GPT-5.4 is OpenAI’s newest reasoning model focused on professional productivity. It builds on earlier models such as GPT-5.2 and GPT-5.3-Codex but merges their capabilities into a single system.

The model can work across spreadsheets, presentations, documents, and coding tasks. It can also use external tools and software more efficiently.

According to OpenAI, the new model aims to reduce friction when people use AI for work. Instead of requiring multiple prompts, the model is designed to produce complete results in fewer steps.

In ChatGPT, GPT-5.4 appears as GPT-5.4 Thinking, which can outline a plan before generating its final response, letting users adjust instructions while the model is still working.

Performance Compared to Earlier Models

OpenAI claims significant improvements across several benchmarks.

On GDPval, which evaluates real-world knowledge work across 44 occupations, GPT-5.4 matches or exceeds industry professionals in 83 percent of comparisons. GPT-5.2 previously achieved about 70.9 percent.

The model also improves coding benchmarks. On SWE-Bench Pro, GPT-5.4 achieves 57.7 percent accuracy, slightly ahead of GPT-5.3-Codex and GPT-5.2.

Agent performance also shows notable gains.

For example:

  • BrowseComp: 82.7 percent accuracy
  • OSWorld-Verified: 75 percent success rate in desktop navigation
  • Toolathlon: 54.6 percent accuracy for multi-step tool workflows

These improvements suggest the model handles complex tasks, such as research or multi-tool automation, more reliably.

What New Features Does It Introduce?

One of the most notable additions is native computer-use capability.

This allows the model to interact with software environments using keyboard, mouse, and screenshots. Developers can build AI agents that operate computers, browse websites, and execute tasks across applications.
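In outline, such an agent runs an observe-act loop: capture the screen, ask the model for the next action, execute it, and repeat until the task is done. The sketch below illustrates that loop only; the model call is a scripted stub, and a real integration would send an actual screenshot to the GPT-5.4 API (whose computer-use request schema is not documented in this article).

```python
# Minimal sketch of a computer-use agent loop, matching the article's
# description (keyboard, mouse, screenshots). The "model" here is a
# scripted stand-in, and screen states are plain strings; a real agent
# would capture pixels and call the GPT-5.4 API (an assumption here).
from dataclasses import dataclass


@dataclass
class Action:
    kind: str            # "click", "type", or "done"
    payload: str = ""


def fake_model(screenshot: str, goal: str) -> Action:
    """Stand-in for the model: returns scripted actions for the demo."""
    if "login page" in screenshot:
        return Action("type", "user@example.com")
    if "form filled" in screenshot:
        return Action("done")
    return Action("click", "login button")


def run_agent(goal: str, max_steps: int = 10) -> list[Action]:
    screenshot = "home page"   # stand-in for a real screen capture
    history: list[Action] = []
    # Toy environment: each action advances the "screen" one state.
    transitions = {"home page": "login page", "login page": "form filled"}
    for _ in range(max_steps):
        action = fake_model(screenshot, goal)
        history.append(action)
        if action.kind == "done":
            break
        screenshot = transitions.get(screenshot, screenshot)
    return history


steps = run_agent("log in to the dashboard")
print([a.kind for a in steps])  # ['click', 'type', 'done']
```

The `max_steps` cap is the one design choice worth copying into any real agent loop: without it, a model that never emits a terminal action loops forever.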

The model also supports up to 1 million tokens of context. This allows it to process very large datasets, documents, or workflows in a single session.
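A practical consequence of a fixed window is that callers should budget it before sending a request. The check below uses the common 4-characters-per-token rule of thumb for English text; it is only an estimate, and production code would use a real tokenizer for exact counts.

```python
# Rough pre-flight check that a document fits in a 1M-token context
# window. The 4-chars-per-token ratio is a heuristic for English text,
# not an exact tokenizer count.
CONTEXT_LIMIT = 1_000_000


def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)


def fits_in_context(text: str, reserve_for_output: int = 16_000) -> bool:
    """Leave headroom for the model's own response tokens."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_LIMIT


doc = "word " * 500_000        # ~2.5M characters of input
print(estimate_tokens(doc))    # 625000
print(fits_in_context(doc))    # True
```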

Another key feature is tool search.

Previously, when developers connected tools to an AI model, all tool definitions were included in the prompt. That approach increased token usage and slowed responses.

With tool search, GPT-5.4 receives a lightweight list of tools and retrieves detailed definitions only when needed. In testing, this approach reduced token usage by about 47 percent.
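The mechanism described above can be sketched in a few lines: the prompt carries only tool names and one-line summaries, and a full JSON schema is retrieved only for the tool the model actually selects. The tool names and schemas below are invented for illustration and are not part of any real API.

```python
# Toy illustration of "tool search": keep a lightweight index in the
# prompt and fetch full tool definitions on demand. All tool names and
# schemas here are made up for the demo.
import json

FULL_DEFS = {
    "get_weather": {
        "name": "get_weather",
        "description": "Fetch current weather for a city",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}}},
    },
    "send_email": {
        "name": "send_email",
        "description": "Send an email",
        "parameters": {"type": "object",
                       "properties": {"to": {"type": "string"},
                                      "body": {"type": "string"}}},
    },
}


def lightweight_index() -> str:
    """What goes in the prompt: names plus one-line summaries only."""
    return "\n".join(f"{n}: {d['description']}" for n, d in FULL_DEFS.items())


def retrieve_definition(name: str) -> str:
    """Fetched only when the model decides it needs this tool."""
    return json.dumps(FULL_DEFS[name])


index = lightweight_index()
picked = retrieve_definition("get_weather")  # the one tool the model chose

# The prompt pays for the small index plus one schema, not every schema.
all_defs = "".join(json.dumps(d) for d in FULL_DEFS.values())
print(len(index) + len(picked) < len(all_defs))  # True
```

With two tools the savings are trivial, but the gap widens with the tool count, which is consistent with the roughly 47 percent token reduction OpenAI reports for realistic tool sets.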

How Does It Improve Coding and Development?

GPT-5.4 integrates the coding strengths of GPT-5.3-Codex while expanding its reasoning and tool capabilities.

The model performs well on long development tasks that require planning, debugging, and iteration. OpenAI says GPT-5.4 produces stronger front-end code with better visual design and functionality.

Developers using Codex can also enable /fast mode, which increases token generation speed by about 1.5 times without reducing model intelligence.

Another experimental feature allows Codex to visually debug web apps using Playwright. This enables the model to test an application while building it.

How Accurate Is It in Knowledge Work?

OpenAI focused heavily on improving document-based work.

In spreadsheet modeling tasks similar to those performed by junior investment banking analysts, GPT-5.4 achieved a mean score of 87.3 percent. GPT-5.2 scored 68.4 percent in the same test.

Human evaluators also preferred presentations generated by GPT-5.4 68 percent of the time, citing better visual structure and design.

The model also shows improvements in factual accuracy. According to OpenAI, GPT-5.4 responses contain 33 percent fewer false claims and 18 percent fewer overall errors compared to GPT-5.2.

How Does It Improve AI Agents and Web Research?

The model introduces stronger agentic workflows.

GPT-5.4 performs better when browsing the web to locate hard-to-find information. On the BrowseComp benchmark, it improved performance by 17 percentage points compared with GPT-5.2.

This means the model can persistently search across multiple sources and synthesize results more effectively.

For developers building AI agents, GPT-5.4 also improves tool calling. The model decides when to use tools more accurately and completes workflows in fewer steps.

Availability

OpenAI says the new model is rolling out gradually starting today.

Availability includes:

  • ChatGPT: GPT-5.4 Thinking for Plus, Team, and Pro users
  • API: Available as gpt-5.4
  • Codex: Integrated with new developer features

GPT-5.4 Pro is available for Pro and Enterprise users who need maximum performance.

OpenAI also confirmed that GPT-5.2 Thinking will remain available as a legacy model for three months before being retired in June 2026.

Conclusion

The launch of the new model marks OpenAI’s next step in building AI systems capable of real professional work. By combining stronger reasoning, coding, agent capabilities, and tool integration, the company aims to move AI from simple chat interactions to full task execution.

As the rollout expands across ChatGPT, the API, and developer tools, GPT-5.4 could reshape how organizations use AI for research, automation, and software development.
