Last quarter, a mid-size apparel brand based in Austin processed 38% fewer support emails than the quarter before. Headcount stayed flat. Sales went up.
The lever? A quiet rebuild of their customer service stack around Claude, Anthropic's reasoning model. No chatbot widget shouting from the corner of the screen. No replacement of human agents. Just a thoughtful re-routing of where written language gets generated, summarized, and resolved.
Here's how the workflow actually runs — and what any operator with a Shopify store and a Gorgias account can copy this week.
The 40% wasn't from a chatbot
Most "AI cut our tickets" stories assume a front-end bot intercepting customers. This brand did the opposite. They left the contact form alone and instead attacked the reasons customers wrote in.
An internal audit using Claude 3.5 Sonnet (now Claude 4 Sonnet as of early 2026) classified six months of support emails into 14 intent categories. The top three — order status, sizing questions, and return eligibility — accounted for 61% of volume. None required human empathy. All required information the company already had, just buried in places customers couldn't reach.
The fix wasn't smarter replies. It was eliminating the need to reply at all.
The four-part rebuild
The team ran the project over 11 weeks with a two-person crew: one ops lead and one contract developer. Total tooling cost landed under $400/month.
- Email triage with Claude API. Every inbound email gets tagged in under two seconds by intent, urgency, and sentiment. Claude returns structured JSON that Gorgias ingests as macros. Cost: roughly $0.003 per email at current Anthropic pricing.
- Pre-emptive order status pages. Claude generated 1,200 unique, plain-English shipping FAQ variants tied to carrier exceptions (delays, address issues, customs holds). These now auto-populate the order tracking page based on real-time Shippo data — answering the question before it gets asked.
- Sizing recommender on PDPs. A Claude-powered widget reads the product page, the customer's past returns, and a two-question quiz. Sizing-related emails dropped 54% on products where it's deployed.
- Agent copilot in Gorgias. For tickets that do reach humans, Claude drafts a response using brand voice guidelines and order context. Agents edit and send. Average handle time fell from 6.2 minutes to 2.4.
Claude vs. GPT-4o vs. Gemini for this job
The team tested all three. Claude won on two specific dimensions that matter for retail support: instruction-following on long brand-voice prompts, and refusing to hallucinate order details when context was missing. Here's how they scored it during the pilot.
| Capability | Claude 3.5 Sonnet | GPT-4o | Gemini 1.5 Pro |
|---|---|---|---|
| Brand voice adherence (200-line style guide) | Strong | Moderate | Moderate |
| Refuses to invent missing order data | Consistently | Occasionally drifts | Inconsistent |
| Cost per 1M output tokens (early 2026) | $15 | $10 | $10.50 |
| Latency on classification (avg) | 1.4s | 0.9s | 1.1s |
GPT-4o was cheaper and faster. Claude was more accurate where accuracy mattered most — wording sent to paying customers. For a brand averaging a $94 order value, one hallucinated refund promise costs more than a month of API fees.
What it actually cost — and returned
According to Anthropic's published API pricing and the brand's internal numbers shared with their fulfillment partner, the math worked out roughly like this for a store doing about 18,000 orders/month:
- Claude API spend: ~$280/month
- Developer contract (one-time build): ~$9,400
- Gorgias plan upgrade: ~$110/month delta
- Estimated agent-hour savings: ~140 hours/month
Payback hit somewhere around month four. More importantly, the support team stopped quitting. CSAT scores rose four points because the humans who did respond had time to actually think.
What to copy if you run a small store
You don't need the full build to capture most of the upside. The single highest-leverage move is the intent classifier feeding a smarter help center. Spend a weekend exporting six months of tickets, run them through Claude with a clustering prompt, and rewrite your top 10 help articles to answer the actual top 10 questions in customer language.
That alone, the brand's ops lead estimates, accounted for roughly 60% of the 40% reduction.