Sixteen months after OpenAI quietly launched Operator as a research preview in January 2025, the agent now books my flights, files my expense receipts, and reconciles two Stripe accounts every Monday morning. It also occasionally tries to buy lunch on the company card at 2am. That's the honest state of Operator in May 2026: genuinely useful, occasionally weird, and finally cheap enough to deploy on real work.
Here's what the agent actually does today — and where it still falls on its face.
From research preview to a real product tier
Operator started life as a perk locked inside the $200/month ChatGPT Pro plan. Since then, OpenAI has folded core agent capabilities into the standard $20 Plus plan (with stricter task limits) and launched a separate Operator API billed per agent action, currently around 3 cents per browser step according to OpenAI's published pricing page.
The underlying model stack matters. Operator now runs on a fine-tuned variant of GPT-5 with a vision module dedicated to interpreting web UIs. It can read a page, click, type, scroll, solve simple captchas, and — crucially — pause to ask the user before any irreversible action like a payment or a "delete" button.
That last behavior is what finally made it tolerable for production use.
What it can reliably do today
After running Operator daily across three businesses for six months, I have a clear competence map. Some tasks are genuinely solved. Others remain demos.
| Task type | Reliability | Time saved per run |
|---|---|---|
| Form filling (job apps, lead forms, vendor onboarding) | High | 10-25 min |
| Price comparison across 5-10 sites | High | 20-40 min |
| Booking flights, hotels, restaurants | Medium-High | 15-30 min |
| Expense reconciliation (Stripe, Brex, Ramp) | Medium | 30-60 min |
| Outbound prospecting from LinkedIn | Medium | varies |
| Multi-step purchases with auth walls | Low | often fails |
| Anything inside Salesforce | Painful | — |
The pattern is consistent. Operator excels when the task is bounded, the UI is public-web, and failure is cheap. It struggles wherever modern enterprise SaaS has built deliberately hostile UX — long modal chains, hidden iframes, session timeouts.
The integrations that actually matter
OpenAI spent 2025 building a permissioned identity layer. You can now connect Operator to Gmail, Google Calendar, Notion, Linear, Stripe, Shopify, Slack, and HubSpot through official OAuth rather than asking the agent to log in like a human. This is a quiet but enormous shift — it removes the brittleness of cookie-based sessions and lets the agent call clean APIs when available, falling back to the browser only when it has to.
The result: a workflow that used to require Zapier, Make, and three custom scripts can sometimes collapse into one Operator instruction. Sometimes. The integrations still feel uneven, and the Notion connector in particular has been flaky since the April update.
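The "API first, browser only when it has to" behavior can be pictured with a minimal sketch. Everything here is hypothetical: `fetch`, `api_call`, and `browser_call` are illustrative names, not part of any Operator interface, and the real routing logic is not public as far as I know.

```python
from typing import Callable, Optional


def fetch(
    task: str,
    api_call: Optional[Callable[[str], str]],
    browser_call: Callable[[str], str],
) -> tuple[str, str]:
    """Return (channel, result), trying the API connector before the browser.

    `api_call` is None when no official connector exists for the service.
    """
    if api_call is not None:
        try:
            return "api", api_call(task)
        except Exception:
            # Connector is flaky or unavailable: fall through to the browser.
            pass
    return "browser", browser_call(task)
```

The design point is that a flaky connector degrades to the slower browser path instead of failing the whole task.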
A workflow that genuinely earns its keep
The single most valuable use case across my interviews with seventeen indie founders and freelancers this spring? Weekly client reporting. Here's the setup that several of them are running:
- Operator opens Google Analytics, Stripe, and the client's ad platform on a schedule.
- It pulls the last seven days of data and screenshots key dashboards.
- It drafts a summary in a Google Doc using a saved template.
- It pings the human operator in Slack for review.
- After approval, it emails the client and logs the send in HubSpot.
Total human time: about four minutes per client, down from forty. For an agency running fifteen retainer clients, that is 36 minutes saved fifteen times over, or roughly nine hours a week: more than a full workday recovered.
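The shape of that reporting workflow, with its human approval gate, can be sketched in a few lines. This is an illustration under assumptions rather than anyone's production setup: `Report`, `weekly_report`, `pull_metrics`, and `approve` are hypothetical names, and the real data pulls, Slack ping, email, and HubSpot logging are reduced to log entries.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Report:
    client: str
    metrics: dict[str, float]
    summary: str = ""
    log: list[str] = field(default_factory=list)


def weekly_report(
    client: str,
    pull_metrics: Callable[[str], dict[str, float]],
    approve: Callable[[Report], bool],
) -> Report:
    """Draft a report, then send it only if a human approves."""
    # Pull the last seven days of data (stand-in for the dashboard pulls).
    report = Report(client=client, metrics=pull_metrics(client))
    # Draft the summary (stand-in for the templated Google Doc).
    report.summary = ", ".join(
        f"{name}: {value}" for name, value in sorted(report.metrics.items())
    )
    report.log.append("drafted")
    # Human review gate (stand-in for the Slack ping).
    if not approve(report):
        report.log.append("held for edits")
        return report
    # Only reached after approval.
    report.log.append("emailed client")
    report.log.append("logged in CRM")
    return report
```

Nothing leaves the building unless `approve` returns true, which is the part of the workflow that keeps the four minutes of human time well spent.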
Where it still breaks
Operator is not autonomous in any meaningful sense. It is a confident intern that needs supervision. Three failure modes show up repeatedly:
- Silent drift. The agent completes a task using slightly wrong inputs and reports success. Always require it to quote back the values it used.
- Auth loops. Two-factor prompts on unfamiliar devices still trip it up. Use a dedicated browser profile and dedicated phone number for agent traffic.
- Cost runaway. A poorly-bounded task can chew through 200+ steps. Set hard per-task budgets in the API dashboard.
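Two of these failure modes lend themselves to cheap guards in your own wrapper code. The sketch below is a client-side illustration, not a replacement for dashboard budgets: `StepBudget`, `BudgetExceeded`, and `mismatched_inputs` are hypothetical names, and the default of 3 cents per step is just the approximate figure quoted earlier. At that rate, a 200-step runaway costs about $6.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the next step would push a task past its budget."""


class StepBudget:
    def __init__(self, max_cents: int, cents_per_step: int = 3):
        # cents_per_step defaults to the approximate price cited above.
        self.max_cents = max_cents
        self.cents_per_step = cents_per_step
        self.steps = 0

    @property
    def spent_cents(self) -> int:
        return self.steps * self.cents_per_step

    def charge(self) -> None:
        """Call once before each agent step; raises instead of overspending."""
        if self.spent_cents + self.cents_per_step > self.max_cents:
            raise BudgetExceeded(
                f"step {self.steps + 1} would exceed {self.max_cents} cents"
            )
        self.steps += 1


def mismatched_inputs(expected: dict, quoted: dict) -> dict:
    """Compare the values you supplied with the values the agent quotes back.

    Returns {key: (expected, quoted)} for every wrong or missing value.
    """
    return {
        key: (value, quoted.get(key))
        for key, value in expected.items()
        if quoted.get(key) != value
    }
```

An empty result from `mismatched_inputs` is the only outcome that should count as success; anything else is silent drift caught before it ships.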
FAQ
How does Operator compare to Anthropic's Claude computer use?
Claude is stronger on reasoning-heavy tasks and code-adjacent work; Operator is faster and cheaper on pure web navigation. Many teams run both, routing by task type.
Is it safe to give Operator access to my credit card?
Use a virtual card with a low limit (Privacy.com or