OpenAI just dropped OpenAI Codex, and naturally I had to try it out. I compared it with Claude Code, Cursor, and, of course, a local LLM through Ollama (IBM Granite 3.3).
I crafted a challenge prompt that tested three things in one go:

Call a public API
Load an image
Build a one-shot GUI app
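To make the challenge concrete, here's a minimal sketch of the kind of app the prompt asks for, in Python with tkinter. The specific endpoints (`official-joke-api.appspot.com`, `cataas.com`), the widget layout, and the nose hit-test are my own assumptions for illustration, not any tool's actual output.

```python
# Hypothetical sketch of the challenge app: fetch a joke from a public API,
# show a cat image, and reveal the joke when the cat's nose is clicked.
# All URLs and coordinates below are assumptions, not from the original repo.
import io
import json
import urllib.request

JOKE_API = "https://official-joke-api.appspot.com/random_joke"  # assumed public API
CAT_IMAGE = "https://cataas.com/cat"  # assumed public cat-image endpoint


def fetch_joke(url: str = JOKE_API) -> str:
    """Call the public joke API and return setup + punchline as one line."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    return f"{data['setup']} {data['punchline']}"


def nose_hit(x: int, y: int, cx: int, cy: int, r: int) -> bool:
    """Return True if a click at (x, y) lands inside a circular 'nose'
    of radius r centered at (cx, cy)."""
    return (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2


def main() -> None:
    # Imported here so the pure helpers above work even without a GUI toolkit.
    import tkinter as tk

    root = tk.Tk()
    root.title("Cat Joke App")

    raw = urllib.request.urlopen(CAT_IMAGE, timeout=10).read()
    # Note: tk.PhotoImage only decodes PNG/GIF; a JPEG response would need
    # Pillow instead. This is one place a one-shot generation can stumble.
    img = tk.PhotoImage(data=raw)

    canvas = tk.Canvas(root, width=img.width(), height=img.height())
    canvas.create_image(0, 0, anchor="nw", image=img)
    canvas.pack()

    label = tk.Label(root, wraplength=300)
    label.pack()

    # Guessed nose position: image center. A real solution would locate it.
    cx, cy, r = img.width() // 2, img.height() // 2, 20
    canvas.bind(
        "<Button-1>",
        lambda e: label.config(text=fetch_joke())
        if nose_hit(e.x, e.y, cx, cy, r)
        else None,
    )
    root.mainloop()


if __name__ == "__main__":
    main()
```

The nose hit-test is the part every tool had to get right; it's a plain point-in-circle check, which is why a hidden or mispositioned nose button is an easy failure mode.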
The GitHub repo for all examples is here, but here's the breakdown!
Claude Code
The first big AI-vendor CLI agent to go mainstream, it got traction for being fast, vibe-friendly, and having codebase-wide visibility.
Getting Started:
(screenshot/code snippet placeholder)
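For reference, getting Claude Code running locally looks roughly like this (assuming Node.js/npm is installed; the project folder name is hypothetical):

```shell
# Install the Claude Code CLI globally, then launch it from your project
# directory so it gets codebase-wide visibility.
npm install -g @anthropic-ai/claude-code
cd my-project   # hypothetical project folder
claude
```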
Result:
✔️ Worked smoothly
✅ Clean interface
🟰 Moderate code complexity
OpenAI Codex

Result:

🖼️ It generated a cat image!
❌ But the nose-click didn't work on the first try.
↩️ It did fix itself after a prompt nudge. Nice!
⚙️ Most concise code of the bunch, which is promising for long-term efficiency.
✍️ Note: the repo had updates 9 minutes ago; this team is working fast!
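For reference, getting the Codex CLI running locally looks roughly like this (again assuming npm; the folder name is hypothetical):

```shell
# Install the OpenAI Codex CLI globally, then start it in the project folder.
npm install -g @openai/codex
cd my-project   # hypothetical project folder
codex
```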
Cursor
Still my go-to tool for daily coding; Cursor just feels natural. I let it auto-choose the model.
Getting started:

Result:

✔️ Functional app
👻 The cat-nose button was hidden (oops on UI matching)
📏 Longest code of the bunch, at over 70 lines!
Local AI Model (Granite 3.3)
I couldn't resist grabbing the latest IBM Granite 3.3 (8B) model and trying to have it create something too! Of course, I had to copy and paste the code to get it to run, but…
Getting Started:
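Pulling and running the model through Ollama is a two-liner (the `granite3.3` tag defaults to the 8B variant on Ollama; adjust the tag if you want a different size):

```shell
# Download IBM Granite 3.3 and open an interactive chat session with it.
ollama pull granite3.3
ollama run granite3.3
```

Unlike the agentic tools above, this just chats: you paste the generated code into a file yourself and run it by hand.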
Result:

Sadly, probably because it's not an agentic coding system, it wasn't able to get the cat image in there, nor was it able to retrieve a joke. I have had better luck
Conclusion
I love the idea of CLI coding systems, but for most things I still prefer Cursor, where I can see the changes and better learn how it all works. As agents get more complex, though, the CLI will probably become a very efficient way to go as well. Excited to see what's up ahead.