case study · architecture · Claude Code · OpenClaw

How I Rebuilt My AI Dev Stack After Anthropic Killed OAuth

By Sviatoslav · 2026-04-09 · 12 min read

I run 9 AI agents, a CRM, a voice agent, a website, and auto-posting workflows. So when Anthropic disabled OAuth for third-party harnesses in January 2026, this wasn't a minor inconvenience. It broke a real production workflow.

My Claude Code setup through OpenClaw stopped working the way I had been using it. If I wanted to keep using Claude Code in that setup, I now needed an API key. The subscription alone was no longer enough.

At the same time, Codex CLI with GPT-5.4 kept working through ChatGPT OAuth. So I didn't try to force one tool to do everything. I rebuilt the stack around a different principle:

Don't depend on one model or one vendor — depend on infrastructure that survives model changes.

What Broke

Before this change, Claude Code ran inside OpenClaw as one part of a larger multi-agent system. Then Anthropic disabled OAuth for third-party tools.

That meant:

Claude Code through OpenClaw no longer worked off subscription auth alone
Using it in practice now required API-key-based access
The old “one subscription covers the workflow” assumption was gone

This matters a lot more if you are not doing isolated coding sessions, but running agents daily across products, memory, automation, and coordination. A broken auth layer is not just a tooling problem. It becomes an operations problem.

What I Did Instead

I didn't “switch from Claude to Codex.” I built a stack where each tool covers the weaknesses of the others.

1. Claude Code (Native CLI, Opus 4.6)

This is what I use for the heavy work — migrations, architecture, deploy, code review, harder reasoning-heavy implementation. I run it natively now, with an API key.

But I also structured it properly instead of keeping everything in one giant instruction file. I had a 300-line CLAUDE.md. Claude follows it worse the longer it gets. So I split it into 5 path-scoped rule files: agent-crm, agentforgeai, luna-voice, security, infrastructure. CRM rules load when I touch CRM code. Voice agent rules load for voice agent code. The rest stays out of context.
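To make the path-scoping idea concrete, here is a minimal sketch of how a touched file path could be matched to rule files. The five rule names come from the setup above; the glob patterns and the `rules_for` helper are hypothetical, and this is not how Claude Code itself resolves rules, only an illustration of the principle that rules load when a path matches and stay out of context otherwise.

```python
from fnmatch import fnmatchcase

# Rule names are from the article. The glob patterns are illustrative
# assumptions about repository layout, not the author's real paths.
RULE_SCOPES = {
    "agent-crm": ["agent-crm/*"],
    "agentforgeai": ["agentforgeai/*"],
    "luna-voice": ["luna-voice/*"],
    "security": ["*secrets/*", "*.env"],
    "infrastructure": ["deploy/*", "*Dockerfile"],
}


def rules_for(path: str) -> list[str]:
    """Return the rule files whose globs match the touched path.

    Note that fnmatch-style '*' also matches '/' characters, so
    'agent-crm/*' covers every file below that directory.
    """
    return [
        name
        for name, globs in RULE_SCOPES.items()
        if any(fnmatchcase(path, pattern) for pattern in globs)
    ]


if __name__ == "__main__":
    for touched in ("agent-crm/api/main.py", "luna-voice/secrets/keys.py", "README.md"):
        print(touched, "->", rules_for(touched))
```

A file under `agent-crm/` pulls in only the CRM rules, while a file that also sits under a `secrets/` directory pulls in the security rules as well. Anything that matches nothing loads no scoped rules at all.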

On top of that:

Deny rules at the harness level: rm -rf, .env reads, and access to secrets/ are blocked in settings.json. I'm not relying on the model to “be careful.” I'm taking the option away
Custom slash commands — /review (diff vs main), /deploy-check (pre-deploy checklist), /crm-status (health check CRM instances), /start (load session context from memory)
Ruff + Biome as mandatory code gates — every Python file gets ruff check --fix + ruff format before I see it. Every JS/TS/CSS gets biome check --write. Baked into global instructions, not optional
Session-end hooks — a Python script auto-captures every session (requests, tools used, files touched) and indexes it into ChromaDB. Next session starts with context, not a blank slate
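The deny list and the code gates from the list above can be sketched together. This is a hedged illustration, not the author's actual configuration: the `permissions.deny` shape follows Claude Code's settings format as far as I know it, but the exact rule patterns are assumptions to check against the current documentation, and `gate_commands` is a hypothetical helper. The Ruff and Biome invocations are the ones named in the article.

```python
import json
from pathlib import Path


def deny_settings() -> dict:
    """Build a settings.json fragment that blocks risky operations.

    The rule strings are assumed patterns; verify them against the
    Claude Code settings documentation before relying on them.
    """
    return {
        "permissions": {
            "deny": [
                "Bash(rm -rf:*)",      # block destructive deletes
                "Read(./.env)",        # block reading the env file
                "Read(./secrets/**)",  # block reading anything under secrets/
            ]
        }
    }


PYTHON_EXTS = {".py"}
BIOME_EXTS = {".js", ".jsx", ".ts", ".tsx", ".css"}


def gate_commands(path: str) -> list[list[str]]:
    """Return the formatter and linter commands to run for one file."""
    ext = Path(path).suffix.lower()
    if ext in PYTHON_EXTS:
        return [["ruff", "check", "--fix", path], ["ruff", "format", path]]
    if ext in BIOME_EXTS:
        return [["biome", "check", "--write", path]]
    return []  # no gate for other file types


if __name__ == "__main__":
    print(json.dumps(deny_settings(), indent=2))
    print(gate_commands("app/main.py"))
```

Returning argument lists instead of shell strings means each command can be handed to `subprocess.run` without shell quoting concerns.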

Claude Code Pain Points

Claude Code is excellent, but not frictionless:

Limits burn fast. One command can trigger 10-20 internal model calls. Run multiple agents on Opus simultaneously and your $100/mo quota disappears in 30 minutes. This happened to me twice in one week
No OAuth for third-party harnesses. That door is shut. The subscription doesn't cover API-key usage in external tools
Mobile is a workaround. The path runs from my phone to the Claude Code app, then to a GitHub repo, then to a pull on the MacBook. The Telegram bridge I wrote helps (it wraps claude --resume SESSION_ID in a bot), but it's still running through a server
MCP ecosystem is immature. Tried connecting EdgeLab MCP — 73 tools, looked promising. OAuth broken on their side. Wrote a health check script, moved on
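For the Telegram bridge mentioned in the list above, the core step is turning a chat message into a `claude --resume SESSION_ID` call. The sketch below shows only that step, under stated assumptions: the function names are hypothetical, the use of `-p` (print mode) for non-interactive output is my assumption about how such a bridge would run, and the Telegram side is left out entirely.

```python
import subprocess


def build_resume_command(session_id: str, prompt: str) -> list[str]:
    """Build the argv for continuing an existing Claude Code session.

    Assumes non-interactive print mode (-p); the article itself only
    names `claude --resume SESSION_ID`.
    """
    if not session_id.strip():
        raise ValueError("session_id must not be empty")
    return ["claude", "-p", "--resume", session_id, prompt]


def run_in_session(session_id: str, prompt: str, timeout: int = 600) -> str:
    """Run the command and return stdout, raising on a non-zero exit."""
    result = subprocess.run(
        build_resume_command(session_id, prompt),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(build_resume_command("SESSION_ID", "summarize the last change"))
```

Passing the chat text as a single list element, not through a shell string, keeps a message like `; rm -rf ~` from being interpreted as a command.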

So Claude Code became my premium tool, not my only tool.

2. OpenClaw + Codex CLI (GPT-5.4)

This is my daily driver. Codex kept working through ChatGPT OAuth, which immediately made it more practical for routine work inside my broader system. In real workflows, reliability often beats elegance.

7 agents on MagicBox (a $24/mo DigitalOcean server):

Caramel — coordinator, daily briefs, nightly audits
Sixteen — dev agent, architecture, shipping
Vibe — content, trends, social
Rex — business analysis
Mira — marketing
Luna — voice agent (ElevenLabs + Twilio)

Each has a profile, workspace, and memory files. They push data to my own CRM — FastAPI + PostgreSQL + Telegram Mini App, open-sourced at github.com/kossvat/agent-crm.

Codex is fine for 80% of routine tasks. It falls apart on complex multi-file refactors or subtle production bugs where Opus shines. That's fine — that's what the dual stack is for.

3. Model Strategy Instead of Model Loyalty

Codex / GPT-5.4 = daily work
Opus = premium reasoning
Qwen = budget tasks

I stopped thinking in terms of “best model.” I started thinking in terms of role-based model allocation. Each model has a job.
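Role-based allocation boils down to a small routing table. A minimal sketch, assuming hypothetical task labels and placeholder model identifiers (the three roles and their models are from the list above; everything else is illustrative):

```python
from enum import Enum


class Role(Enum):
    DAILY = "daily"
    PREMIUM = "premium"
    BUDGET = "budget"


# Role-to-model mapping from the article. The strings are placeholder
# identifiers, not real API model names.
MODEL_FOR_ROLE = {
    Role.DAILY: "gpt-5.4",
    Role.PREMIUM: "opus",
    Role.BUDGET: "qwen",
}

# Hypothetical task labels. The premium ones echo the heavy work named
# earlier; the budget ones are invented for illustration.
ROLE_FOR_TASK = {
    "migration": Role.PREMIUM,
    "architecture": Role.PREMIUM,
    "code-review": Role.PREMIUM,
    "log-triage": Role.BUDGET,
    "tagging": Role.BUDGET,
}


def pick_model(task: str) -> str:
    """Route a task to a model; unknown tasks fall back to the daily driver."""
    role = ROLE_FOR_TASK.get(task, Role.DAILY)
    return MODEL_FOR_ROLE[role]


if __name__ == "__main__":
    for task in ("migration", "log-triage", "write-changelog"):
        print(task, "->", pick_model(task))
```

Because tasks map to roles and roles map to models, swapping a vendor means editing one entry in `MODEL_FOR_ROLE` and leaving the task routing untouched.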

The Shared Layer That Matters Most

The most important part of the stack is not Claude Code or Codex. It's the shared infrastructure underneath.

Obsidian vault = shared memory. My OpenClaw agents write to it. Claude Code reads from it. Protocol enforced — when to write, what format, mandatory git push after every change. Mac syncs every 5 minutes via Obsidian Git. Context survives tool switches
Memory MCP server = semantic search. Custom-built, runs on ChromaDB. Indexes the Obsidian vault, Claude memory files, and rules — 83+ documents searchable by meaning, not keywords. One agent writes a decision at 2pm, another finds it semantically at 6pm
Git = bridge. Transport layer between environments, including mobile workflows and coding environments
OpenClaw = orchestration layer. Multi-agent runtime, server execution, persistent coordination. This is the part I do not want tied to one model vendor
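The vault protocol described above (fixed format, mandatory git push after every change) can be sketched as two small helpers. The note layout, field names, and function names are all hypothetical; the article says a format is enforced but does not show it. Only the requirement to push after each write comes from the source.

```python
from datetime import datetime, timezone


def format_note(agent: str, title: str, body: str, when: datetime) -> str:
    """Render a vault note with a small front-matter header.

    The header fields are assumptions. `when` should be timezone-aware,
    since a naive datetime would be interpreted as local time.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header = f"---\nagent: {agent}\ntitle: {title}\ncreated: {stamp}\n---\n"
    return f"{header}\n{body.strip()}\n"


def sync_commands(note_path: str, agent: str, title: str) -> list[list[str]]:
    """Git commands to run after every write, ending with the mandatory push."""
    return [
        ["git", "add", note_path],
        ["git", "commit", "-m", f"{agent}: {title}"],
        ["git", "push"],
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(format_note("Caramel", "Nightly audit", "No failed jobs.", now))
    for cmd in sync_commands("decisions/nightly-audit.md", "Caramel", "Nightly audit"):
        print(" ".join(cmd))
```

Keeping the write and the push in one protocol step is what lets another tool pick up the same note a few minutes later, whichever model produced it.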

What This Shipped in 48 Hours

Full CRM migration: SQLite to PostgreSQL, domain switch via Cloudflare Tunnel, bot security hardening
Open-sourced the CRM: 77 files, 14K+ lines, MIT license, clean git history with zero leaked secrets
Semantic memory system: ChromaDB + Obsidian + MCP server + CLI
Telegram bridge for Claude Code via chat
Complete Claude Code config: modular rules, deny list, session hooks, Ruff + Biome gates, 4 custom commands

One person. Two AI stacks. Real output.

The Lesson

Once your stack touches real operations, every weak assumption gets exposed — auth, model lock-in, memory fragmentation, tool incompatibilities, mobile vs server gaps.

Most people are still thinking about AI tooling too narrowly. They ask: which model is best?

The better question is: what infrastructure keeps working when one model breaks?

The real win is not choosing Claude Code or Codex. The real win is building a system where changing one model does not destroy your workflow. That is the difference between an AI demo and an AI operating system.

One repo. Multiple models. Shared context.

And that is a much more durable way to work.

I build these systems for clients using the same stack and protocols that run my own business. Discovery starts at $500, and most single-agent setups are deployed within two weeks.
