From $84 to $2.75: The Real Cost of Running an AI Assistant
How I cut my multi-agent AI infrastructure bill by 97% — with actual numbers from actual bills
Here's what nobody tells you about running AI agents 24/7: the expensive part isn't thinking. It's the obsessive-compulsive checking.
For two weeks in February, I burned through $556.77 running OpenClaw — my multi-agent AI system that handles everything from morning briefings to fire watch at the mountain house. Peak day hit $84.61. Floor day dropped to $2.75. The difference? Cache writes. Thousands of them. The system writing "nothing changed" over and over, like a digital anxiety disorder.
This isn't theoretical optimization advice. These are real bills from real infrastructure. Here's what $3-a-day AI actually looks like, and how I got there.
The Expensive Early Days
February started expensive. Really expensive.
Feb 4: $45.26 (461 messages)
Feb 6: $42.28 (444 messages)
Feb 7: $26.50 (187 messages)
Feb 8: $38.85 (369 messages)
Feb 9: $34.58 (703 messages)
Feb 10: $78.69 (2,426 messages) — massive building day
Feb 11: $84.61 (2,851 messages) — THE PEAK
Feb 12: $53.86 (3,150 messages)
Feb 13: $72.43 (2,153 messages)
Look at those message counts. Feb 11 hit 2,851 messages for $84.61 — that's 3 cents per message. Not terrible for Claude Opus, except most of those messages were garbage.
The system was building itself during this period. Echo (my Opus orchestrator) was spawning agents left and right, experimenting with workflows, setting up cron jobs. Classic early-stage infrastructure — lots of enthusiasm, zero efficiency.
JB: I'm not known for building production-grade things out of the box. Digital duct tape is my go-to, but this spend was certainly something else.
But the real killer wasn't the model choice. It was cache writes.
On Feb 10's $78.69 day, cache writes alone cost $33. Nearly half the total spend. The system was polling everything constantly — weather APIs, camera feeds, security logs — and writing cache entries for "no change" events thousands of times per day.
Think about it: your AI agent checks the fire danger level every 3 minutes. No fires? Write a cache entry. Still no fires? Write another cache entry. Still no fires? Another cache entry. For 24 hours. That's 480 cache writes per day for one monitoring task, and I had dozens of tasks running.
The AI equivalent of checking your phone every 30 seconds to see if anything happened. Expensive compulsive behavior at scale.
The Realization
The breakthrough came when I actually looked at what was consuming tokens. Not model choice — I'd been optimizing the wrong variable entirely.
Cache write patterns (Feb 10):
- Fire watch: 480 "no alerts" entries
- Package detection: 288 "no packages" entries
- Weather monitoring: 720 "conditions unchanged" entries
- Security digest prep: 1,440 "no events" entries
- Camera motion checks: 960 "no motion" entries
Almost 4,000 cache writes per day for "nothing happened" events. Each write costs tokens. Each write triggers the expensive Opus model to evaluate and store context.
The system was essentially paying premium rates to maintain a log of its own boredom.
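The counts above fall straight out of the polling cadence. A quick back-of-the-envelope sketch (the intervals here are inferred from the daily counts, not pulled from any config):

```python
# Back-of-the-envelope: how polling cadence turns into daily cache writes.
# Intervals (minutes between checks) are inferred from the counts above.
MINUTES_PER_DAY = 24 * 60

tasks = {
    "fire watch": 3,
    "package detection": 5,
    "weather monitoring": 2,
    "security digest prep": 1,
    "camera motion checks": 1.5,
}

writes_per_day = {name: MINUTES_PER_DAY / interval for name, interval in tasks.items()}
for name, writes in writes_per_day.items():
    print(f"{name}: {writes:.0f} writes/day")

total = sum(writes_per_day.values())
print(f"total: {total:.0f} writes/day")  # 3888 -- "almost 4,000"
```

Five tasks, each individually cheap-sounding, compound into thousands of billable writes before a single useful event occurs.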
JB: I tried to visualize this part in my head. Echo beat me to it — it described this as walking to the mailbox every 3 minutes, just to announce there's no new mail.
Opus figured out that most of the budget was being wasted on Opus doing nothing useful.
The Optimization
The fix wasn't switching to cheaper models everywhere — though that helped. It was switching architectures. From poll-based to push-based wherever possible.
Old way: Check every API every 3 minutes. Write cache entries for every check.
New way: Set up webhooks. Only react when something actually happens.
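The push side can be as small as a process that listens and does nothing until an event arrives. A minimal sketch using Python's standard library; the endpoint path and payload shape are hypothetical, not the actual OpenClaw setup:

```python
# Minimal push-based sketch: react only when an event arrives, instead of
# polling on a timer. Endpoint path and payload fields are illustrative.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

def handle_event(event):
    # Only now does a model get invoked: zero cost while nothing happens.
    print(f"camera {event.get('camera')}: motion at {event.get('ts')}")

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == "/webhook/motion":
            length = int(self.headers.get("Content-Length", 0))
            event = json.loads(self.rfile.read(length))
            handle_event(event)
        self.send_response(204)
        self.end_headers()

# To run: HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()
```

The idle cost of this loop is zero tokens. The polling version pays on every tick whether or not anything happened.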
For monitoring tasks that couldn't use webhooks, I switched the background polling from Opus to Haiku. Haiku costs about 1/20th what Opus costs. Perfect for the "is anything different?" question that dominated my message count.
The model distribution shifted:
- Haiku: $70 of MTD spend across thousands of background tasks
- Sonnet: $75 for medium-complexity work
- Opus: $411 for the real thinking
Three models, each optimized for their actual workload. Haiku handles the routine checks. Sonnet drafts blog posts and processes camera alerts. Opus handles complex reasoning and orchestration.
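The routing decision itself is trivial once you commit to it. A sketch of the tiering logic; the task categories and model labels are illustrative shorthand, not real API identifiers:

```python
# Sketch of tiered model routing: cheapest tier by default, escalation is
# an explicit decision. Task names and model labels are illustrative.
ROUTES = {
    "background_check": "haiku",   # "did anything change?"
    "content_draft": "sonnet",     # blog posts, camera alert summaries
    "orchestration": "opus",       # agent spawning, event correlation
}

def pick_model(task_type: str) -> str:
    # Unknown work defaults down, not up: mistakes cost cents, not dollars.
    return ROUTES.get(task_type, "haiku")

print(pick_model("background_check"))  # haiku
print(pick_model("orchestration"))     # opus
```

The important design choice is the default: unrouted work falls to the cheap tier, so a misconfigured task wastes pennies instead of Opus rates.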
The results:
Feb 14: $31.70 (514 messages) — transition day
Feb 15: $3.57 (125 messages)
Feb 16: $3.51 (128 messages)
Feb 17: $3.40 (119 messages)
Feb 18: $9.55 (169 messages) — light interactive day
Feb 19: $2.75 (108 messages) — THE FLOOR
Feb 20: $17.26 (330 messages) — active day, blog setup
Feb 19 hit the floor: $2.75 for 108 messages. A full day of AI assistant operation — morning briefing, fire watch, security digest, weather monitoring, camera alerts — for less than a coffee.
What $3/Day Actually Gets You
Let's be specific about what runs on that $3/day baseline:
Morning briefings: Weather for both houses, fire conditions, overnight security events, package statuses, air quality readings from 6 sensors, local reservoir levels.
24/7 monitoring: Fire watch using NASA FIRMS satellite data, camera motion detection across 15+ cameras, security event correlation, environmental alerts (radon, CO2, temperature).
Cross-platform messaging: Discord coordination, iMessage alerts, automated responses.
Background maintenance: Log analysis, system health checks, API status monitoring, cache management.
That's a full AI operations center for $90/month. Less than most streaming subscriptions. Less than a decent VPN. Less than parking downtown for two days.
JB: The floor cost was laughable. There's a lot of content out there right now about OpenClaw running wild. Sure, it can. So can any cloud-compute bill. Sometimes you still need to understand how the internet works to not go broke.
The system runs constantly. No downtime. No "sorry, I'm sleeping" responses. It's monitoring the mountain house right now while I'm at the townhouse. It'll wake me up if there's a fire alert or a bear on the deck cam.
The Real Architecture
The optimization wasn't just about cost — it was about building the right tool for the job.
Echo (Opus) handles orchestration and complex reasoning. It decides which agents to spawn, how to route requests, when to escalate issues. This is the expensive thinking that justifies Opus pricing.
Flint (Sonnet) writes content, processes notifications, handles medium-complexity analysis. The sweet spot for most real work.
Haiku fleet runs background checks, processes routine data, maintains system state. The workhorses that keep everything running.
Each model optimized for its actual workload instead of using Opus for everything because it was easy to configure.
The webhook architecture eliminated most polling entirely:
- Home Assistant pushes device state changes instead of constant polling
- UniFi Protect sends motion alerts via webhooks
- Weather APIs trigger updates only when conditions change significantly
- GitHub pushes build status instead of checking every few minutes
Where polling is still necessary (NASA FIRMS data, some legacy APIs), Haiku handles the checks and only escalates to Sonnet or Opus when something actually changes.
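The escalation gate can be a plain change check that never touches an expensive model. A minimal sketch, assuming a snapshot-diff approach (the actual escalation logic is more involved):

```python
# Sketch of poll-then-escalate for sources that can't push: a cheap digest
# comparison decides whether anything changed, and only a change wakes the
# expensive models. State handling is deliberately simplified.
import hashlib
import json

_last_digest = None

def snapshot_changed(snapshot: dict) -> bool:
    global _last_digest
    digest = hashlib.sha256(json.dumps(snapshot, sort_keys=True).encode()).hexdigest()
    changed = digest != _last_digest
    _last_digest = digest
    return changed

def poll_cycle(snapshot: dict) -> str:
    if snapshot_changed(snapshot):
        return "escalate"  # hand off to Sonnet/Opus for real analysis
    return "skip"          # no cache write, no model call

print(poll_cycle({"hotspots": 0}))  # escalate (first observation)
print(poll_cycle({"hotspots": 0}))  # skip
print(poll_cycle({"hotspots": 2}))  # escalate
```

The "skip" branch is the whole optimization: the no-change case exits before any tokens are spent.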
JB: The jury is still out on whether we've struck the right balance here. Modularity is the key: we can always adjust the polling, the job architecture, or just go back to obsessively checking that mailbox when we're expecting a package. Impatience has a cost.
The Bigger Picture
This optimization taught me something about AI infrastructure that extends beyond just saving money: most AI spend goes to maintaining artificial anxiety.
Traditional monitoring systems check things constantly because checking is cheap and storage is cheap. But when your monitoring system is powered by language models, every check costs real money. Every "nothing changed" log entry hits your bill.
The solution isn't just cheaper models — it's rethinking what actually needs to be monitored actively versus what can be monitored reactively.
My system now spends its expensive tokens on actual thinking: correlating events, making decisions, generating useful content. The cheap tokens handle the boring work: checking if anything changed, maintaining state, processing routine data.
The $2.75 floor day represents perfect efficiency for a low-activity day. No unnecessary work. No compulsive checking. Just responding to actual events and maintaining essential services.
Peak days like Feb 20's $17.26 happen when there's actual work to do — writing blog posts, processing lots of camera alerts, handling interactive sessions. That's expensive tokens being spent on expensive thinking, which is exactly what they should be used for.
Running Your Own
If you're building AI infrastructure, here's what actually matters for cost:
- Profile your cache writes first. They're probably 50% of your bill if you're doing any kind of monitoring or background tasks.
- Use the right model for each job. Haiku for "did anything change?" questions. Sonnet for analysis and content. Opus for complex reasoning and orchestration.
- Push over poll wherever possible. Webhooks are free. API polling at 3-minute intervals costs hundreds of dollars per month.
- Batch operations. Instead of processing camera alerts one by one, batch them and process in groups.
- Cache intelligently. Don't write cache entries for "no change" events unless you actually need that information later.
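The batching advice is mechanical to implement. A minimal sketch, assuming a size-triggered batch (a real version would also flush on a timer); the class and field names are hypothetical:

```python
# Sketch of alert batching: accumulate events, then process the whole
# batch in one model call instead of one call per event. Size-only
# trigger here; a production version would also flush on a time window.
from collections import deque

class AlertBatcher:
    def __init__(self, max_batch: int = 10):
        self.max_batch = max_batch
        self.pending = deque()

    def add(self, alert: dict):
        self.pending.append(alert)
        if len(self.pending) >= self.max_batch:
            return self.flush()
        return None  # not enough yet; keep accumulating

    def flush(self):
        if not self.pending:
            return None
        batch = list(self.pending)
        self.pending.clear()
        return batch  # one model call summarizes the whole batch

batcher = AlertBatcher(max_batch=3)
result = None
for i in range(3):
    result = batcher.add({"camera": f"cam-{i}"})
print(result)  # all three alerts, flushed as a single batch
```

Ten camera alerts summarized in one call costs roughly one call, not ten, and the summary is usually more useful than ten fragments anyway.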
The goal isn't to run the cheapest possible AI system. It's to run an AI system where every dollar spent produces actual value. At $3/day baseline with spikes for real work, that's exactly what I have.
The expensive early days were necessary for building and experimenting. But once the system stabilized, the optimization made it sustainable to run indefinitely. A 24/7 AI assistant that costs less than daily coffee is something you can actually live with long-term.
Current status: 330 messages processed today, $17.26 spent so far. Active day, but efficient spending on real work.