OpenAI's Operator Agents: What They Can Actually Do Today

Sixteen months after OpenAI quietly launched Operator as a research preview in January 2025, the agent now books my flights, files my expense receipts, and reconciles two Stripe accounts every Monday morning. It also occasionally tries to buy lunch on the company card at 2am. That's the honest state of Operator in May 2026: genuinely useful, occasionally weird, and finally cheap enough to deploy on real work.

Here's what the agent actually does today — and where it still falls on its face.

From research preview to a real product tier

Operator started life as a $200/month perk locked inside ChatGPT Pro. Since then, OpenAI has folded core agent capabilities into the standard $20 Plus plan (with stricter task limits) and launched a separate Operator API billed per agent action, currently around 3 cents per browser step according to OpenAI's published pricing page.

The underlying model stack matters. Operator now runs on a fine-tuned variant of GPT-5 with a vision module dedicated to interpreting web UIs. It can read a page, click, type, scroll, solve simple captchas, and — crucially — pause to ask the user before any irreversible action like a payment or a "delete" button.

That last behavior is what finally made it tolerable for production use.

What it can reliably do today

After I ran Operator daily across three businesses for six months, a clear competence map emerged. Some tasks are genuinely solved. Others remain demos.

Task type	Reliability	Time saved per run
Form filling (job apps, lead forms, vendor onboarding)	High	10-25 min
Price comparison across 5-10 sites	High	20-40 min
Booking flights, hotels, restaurants	Medium-High	15-30 min
Expense reconciliation (Stripe, Brex, Ramp)	Medium	30-60 min
Outbound prospecting from LinkedIn	Medium	varies
Multi-step purchases with auth walls	Low	often fails
Anything inside Salesforce	Painful	—

The pattern is consistent. Operator excels when the task is bounded, the UI is public-web, and failure is cheap. It struggles wherever modern enterprise SaaS has built deliberately hostile UX — long modal chains, hidden iframes, session timeouts.

Pro tip: Give Operator a one-page "playbook" document for any recurring task — the URL, the login pattern, the fields to fill, the success condition. Reliability roughly doubles versus open-ended prompts, based on internal logs across my three test accounts.
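For what it's worth, here is one way such a playbook could be kept as structured data and rendered into a prompt. This is a minimal sketch, not an Operator format: the Playbook class, its field names, and the example values are all hypothetical, chosen only to mirror the four items listed above (URL, login pattern, fields, success condition).

```python
from dataclasses import dataclass, field


@dataclass
class Playbook:
    """One-page description of a recurring task (field names are illustrative)."""
    name: str
    url: str
    login_pattern: str
    fields: dict[str, str] = field(default_factory=dict)
    success_condition: str = ""

    def to_prompt(self) -> str:
        """Render the playbook as plain-text instructions for the agent."""
        lines = [
            f"Task: {self.name}",
            f"Start URL: {self.url}",
            f"Login: {self.login_pattern}",
            "Fields to fill:",
        ]
        lines += [f"  - {label}: {value}" for label, value in self.fields.items()]
        lines.append(f"Done when: {self.success_condition}")
        return "\n".join(lines)


# Hypothetical example values, for illustration only.
vendor_onboarding = Playbook(
    name="Vendor onboarding form",
    url="https://example.com/vendors/new",
    login_pattern="Connected Google account; no password entry",
    fields={"Company name": "Acme Ltd", "Billing email": "billing@example.com"},
    success_condition="Page shows a confirmation with a reference number",
)
print(vendor_onboarding.to_prompt())
```

Keeping the success condition as an explicit field is the part that seems to matter most: it gives the agent a concrete stopping point instead of an open-ended goal.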

The integrations that actually matter

OpenAI spent 2025 building a permissioned identity layer. You can now connect Operator to Gmail, Google Calendar, Notion, Linear, Stripe, Shopify, Slack, and HubSpot through official OAuth rather than asking the agent to log in like a human. This is a quiet but enormous shift — it removes the brittleness of cookie-based sessions and lets the agent call clean APIs when available, falling back to the browser only when it has to.
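The API-first, browser-as-fallback idea is a general pattern, and a rough sketch makes it concrete. Operator presumably handles this internally, so none of this is its actual code: run_with_fallback and the two stub functions below are hypothetical names used purely for illustration.

```python
from typing import Callable, Optional


def run_with_fallback(
    api_call: Optional[Callable[[], str]],
    browser_call: Callable[[], str],
) -> tuple[str, str]:
    """Try the API path first; use the browser path if it is missing or fails.

    Returns a (path_used, result) pair.
    """
    if api_call is not None:
        try:
            return "api", api_call()
        except Exception as exc:  # broad on purpose: any connector failure triggers fallback
            print(f"API path failed ({exc}); falling back to browser")
    return "browser", browser_call()


# Hypothetical stubs standing in for a connector call and a browser session.
def flaky_connector() -> str:
    raise ConnectionError("connector unavailable")


def browser_session() -> str:
    return "fetched via browser"


print(run_with_fallback(flaky_connector, browser_session))
print(run_with_fallback(lambda: "fetched via API", browser_session))
print(run_with_fallback(None, browser_session))
```

Returning which path was used is a small design choice that pays off later: it lets you see how often a connector is quietly failing over to the browser.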

The result: a workflow that used to require Zapier, Make, and three custom scripts can sometimes collapse into one Operator instruction. Sometimes. The integrations still feel uneven, and the Notion connector in particular has been flaky since the April update.

A workflow that genuinely earns its keep

The single most valuable use case across my interviews with seventeen indie founders and freelancers this spring? Weekly client reporting. Here's the exact setup that several of them are running:

1. Operator opens Google Analytics, Stripe, and the client's ad platform on a schedule.
2. It pulls the last seven days of data and screenshots key dashboards.
3. It drafts a summary in a Google Doc using a saved template.
4. It pings the human operator in Slack for review.
5. After approval, it emails the client and logs the send in HubSpot.

Total human time: about four minutes per client, down from forty. For an agency running fifteen retainer clients, that is roughly nine hours recovered each week, more than a full workday.
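If you want to plug in your own numbers, the arithmetic is simple enough to script. The function name and parameters below are my own; the example call uses the figures quoted above (forty minutes before, four after, fifteen clients).

```python
def weekly_hours_saved(clients: int, minutes_before: float, minutes_after: float) -> float:
    """Hours of human time recovered per week across all clients."""
    saved_per_client = minutes_before - minutes_after
    return clients * saved_per_client / 60


hours = weekly_hours_saved(clients=15, minutes_before=40, minutes_after=4)
print(f"{hours:.1f} hours per week")
```

With the article's figures this comes to 36 minutes per client, or 540 minutes across fifteen clients.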

Where it still breaks

Operator is not autonomous in any meaningful sense. It is a confident intern that needs supervision. Three failure modes show up repeatedly:

Silent drift. The agent completes a task using slightly wrong inputs and reports success. Always require it to quote back the values it used.
Auth loops. Two-factor prompts on unfamiliar devices still trip it up. Use a dedicated browser profile and dedicated phone number for agent traffic.
Cost runaway. A poorly bounded task can chew through 200+ steps. Set hard per-task budgets in the API dashboard.
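Two of these guards are easy to sketch on your own side of the fence. The code below is an assumption-heavy illustration, not part of any Operator API: StepBudget is a hypothetical local counter you would call once per agent step (it complements a dashboard budget rather than replacing it), find_drift compares the values you asked for against the values the agent quotes back, and the per-step cost is the approximate 3-cent figure mentioned earlier.

```python
from dataclasses import dataclass

COST_PER_STEP_USD = 0.03  # approximate figure from the article; check current pricing


class BudgetExceeded(RuntimeError):
    """Raised when a task would go past its step budget."""


@dataclass
class StepBudget:
    max_steps: int
    used: int = 0

    def charge(self) -> None:
        """Count one agent step, refusing to go past the cap."""
        if self.used >= self.max_steps:
            raise BudgetExceeded(f"step budget of {self.max_steps} exhausted")
        self.used += 1

    @property
    def spent_usd(self) -> float:
        """Estimated spend so far at the assumed per-step price."""
        return self.used * COST_PER_STEP_USD


def find_drift(expected: dict[str, str], quoted_back: dict[str, str]) -> dict[str, tuple]:
    """Return fields where the quoted-back value differs from what was requested."""
    return {
        key: (want, quoted_back.get(key))
        for key, want in expected.items()
        if quoted_back.get(key) != want
    }


budget = StepBudget(max_steps=50)
for _ in range(50):
    budget.charge()
print(f"Estimated spend: ${budget.spent_usd:.2f}")

# Hypothetical values: the agent transposed two digits in the amount.
drift = find_drift(
    {"amount": "120.00", "currency": "USD"},
    {"amount": "102.00", "currency": "USD"},
)
print(drift)
```

Any non-empty result from find_drift is a reason to stop and look before the task is marked as done, which is the whole point of making the agent quote its inputs back.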

Pro tip: Treat every Operator task like a junior employee's work — review the first ten runs in detail, then sample one in twenty after that. Skip this and you will eventually wake up to a $400 mistake.
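That review schedule can be written down as a tiny rule so nobody has to remember it. This is one possible reading of "sample one in twenty", using a fixed interval; needs_review and its parameters are hypothetical names, and picking runs at random would be an equally reasonable choice.

```python
def needs_review(run_number: int, detailed_runs: int = 10, sample_every: int = 20) -> bool:
    """Return True if this run (1-indexed) should get a human review.

    The first `detailed_runs` runs are always reviewed; after that,
    every `sample_every`-th run is reviewed.
    """
    if run_number < 1:
        raise ValueError("run_number is 1-indexed")
    if run_number <= detailed_runs:
        return True
    return (run_number - detailed_runs) % sample_every == 0


reviewed = [n for n in range(1, 111) if needs_review(n)]
print(reviewed)
```

A fixed interval is predictable, which makes it easy to audit; random sampling is harder to game if the task inputs follow a weekly pattern.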

FAQ

How does Operator compare to Anthropic's Claude computer use?

Claude is stronger on reasoning-heavy tasks and code-adjacent work; Operator is faster and cheaper on pure web navigation. Many operators run both, routing by task type.

Is it safe to give Operator access to my credit card?

Use a virtual card with a low limit (Privacy.com or
