MO§ES™ × Artificial Analysis Coding Agent Benchmarks

Operator-augmented Claude Code + Opus 4.7 vs. 13 published combinations · Field: artificialanalysis.ai/agents/coding-agents · 2026-05-14
MO§ES™ leads all 5 measured economic categories — 7-day window (2026-05-08 → 2026-05-14)
Cache hit rate: 96.88% · #1, above the field's best 96.2% · SRC: Token Dashboard 7d
Output : Input: 31.7× (30d 42.5× · 90d 22.1×) · #1, 83× the field leader (0.38) · SRC: 3.90M out ÷ 123K in
Tokens / task: 767K · #1, 3.6× fewer tokens than the best field entry (2.74M) · SRC: 1.12B ÷ 1,465 tasks
Time / task: 1.84 min · #1, 3.2× faster than the best field entry (5.8 min) · SRC: ~45 hr ÷ 1,465 tasks
Cost / LOC: $0.0007 plan basis · API: $0.044 · ccusage: $0.018 · #1, < 1¢ per line · SRC: $23.33 ÷ 35,242 LOC
Testing methodology: the two sides do not measure the same thing.
AA field: per-task isolated runs on SWE-Bench-Pro-Hard-AA, Terminal-Bench v2, and SWE-Atlas-QnA. Each task is one bug or issue.
MO§ES™: sustained operator work on App Hub (a real product) over the 7-day measurement window (2026-05-08 → 2026-05-14): 21 sessions, 7,327 turns. App Hub accounts for 5 build days within the window (5/10–5/14) and 35,242 LOC shipped.
Cost/task is computed the same way but is not shown on the cards: $23.33 ÷ 1,465 ≈ $0.016 subscription · $1,564.47 ÷ 1,465 ≈ $1.07 API equivalent.
$/LOC: AA = cost_per_task ÷ 20 LOC (industry convention). MO§ES = $23.33/wk subscription ÷ 35,242 actual LOC.

1. Cache Hit Rate

Cache reuse % · higher better · MO§ES sustained multi-project
Rank  Agent        Model                   Cache hit
1     MO§ES™       CC+Opus 4.7+op          96.9%
2     Cursor CLI   Opus 4.7 (Medium)       96.2%
3     Claude Code  Opus 4.7 (Medium)       96.2%
4     Claude Code  Kimi K2.6               96.1%
5     Codex        GPT-5.5 (Medium)        94.9%
6     Claude Code  Sonnet 4.6 (Medium)     94.5%
7     Claude Code  Opus 4.6 (Medium)       93.7%
8     Codex        GPT-5.4 (Medium)        92.8%
9     Cursor CLI   Composer 2              91.7%
10    Cursor CLI   GPT-5.5 (Medium)        87.8%
11    Gemini CLI   Gemini 3.1 Pro (High)   86.1%
12    Cursor CLI   GPT-5.4 (Medium)        85.3%
13    Claude Code  GLM-5.1                 83.7%
14    Claude Code  DeepSeek V4 Pro (High)  79.8%
SRC: cache_read 1.084B ÷ (1.084B + 34.83M cache_create + 123K input) = 96.88%
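
The SRC formula can be reproduced from the raw token counts reported later in this document. The sketch below is a minimal Python illustration, assuming the hit rate is cache reads divided by all input-side tokens (cache read + cache create + fresh input), as the SRC line states. The function and variable names are illustrative and do not come from any dashboard API.

```python
# Token counts copied from the raw-data section of this report.
CACHE_READ = 1_084_399_183
CACHE_CREATE = 34_826_779
FRESH_INPUT = 123_246


def cache_hit_rate(cache_read: int, cache_create: int, fresh_input: int) -> float:
    """Share of input-side tokens served from cache (definition assumed from the SRC line)."""
    input_side_total = cache_read + cache_create + fresh_input
    if input_side_total == 0:
        raise ValueError("no input-side tokens recorded")
    return cache_read / input_side_total


rate = cache_hit_rate(CACHE_READ, CACHE_CREATE, FRESH_INPUT)
print(f"cache hit rate: {rate:.2%}")
```

Note that output tokens are deliberately excluded from the denominator, since they are never candidates for cache reuse under this definition.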

2. Output : Fresh Input (log scale)

Output tokens per fresh-input token · higher = denser signal
Rank  Agent        Model                   Output : fresh input
1     MO§ES™       CC+Opus 4.7+op          31.7×
2     Cursor CLI   Opus 4.7 (Medium)       0.38
3     Claude Code  Kimi K2.6               0.25
4     Claude Code  Sonnet 4.6 (Medium)     0.24
5     Claude Code  Opus 4.7 (Medium)       0.24
6     Codex        GPT-5.5 (Medium)        0.17
7     Cursor CLI   Composer 2              0.16
8     Claude Code  Opus 4.6 (Medium)       0.15
9     Codex        GPT-5.4 (Medium)        0.15
10    Gemini CLI   Gemini 3.1 Pro (High)   0.14
11    Cursor CLI   GPT-5.5 (Medium)        0.07
12    Cursor CLI   GPT-5.4 (Medium)        0.06
13    Claude Code  GLM-5.1                 0.05
14    Claude Code  DeepSeek V4 Pro (High)  0.04
SRC: 3,902,803 output ÷ 123,246 fresh input = 31.7× · 30d 42.5× · 90d/all-time 22.1×
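
As a quick check on both the 31.7× ratio and the "83× field leader" claim, the following Python sketch divides output tokens by fresh (uncached) input tokens and then compares the result with the best field value of 0.38. It assumes "fresh input" means only the uncached input count; the names are hypothetical.

```python
# Figures taken from the SRC line and the chart above.
OUTPUT_TOKENS = 3_902_803
FRESH_INPUT_TOKENS = 123_246
FIELD_LEADER_RATIO = 0.38  # best published field value in the chart


def output_to_input(output_tokens: int, fresh_input_tokens: int) -> float:
    """Output tokens generated per fresh-input token."""
    return output_tokens / fresh_input_tokens


ratio = output_to_input(OUTPUT_TOKENS, FRESH_INPUT_TOKENS)
multiple_of_leader = ratio / FIELD_LEADER_RATIO

print(f"output : fresh input = {ratio:.1f}x")
print(f"multiple of field leader = {multiple_of_leader:.0f}x")
```

Because cached tokens are left out of the denominator, this ratio grows with the cache hit rate as well as with output volume, which is worth keeping in mind when comparing it against isolated per-task runs.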

3. Tokens per Task

Total tokens · lower better
Rank  Agent        Model                   Tokens / task
1     MO§ES™       CC+Opus 4.7+op          767K
2     Cursor CLI   GPT-5.5 (Medium)        2.74M
3     Cursor CLI   Opus 4.7 (Medium)       2.93M
4     Gemini CLI   Gemini 3.1 Pro (High)   3.24M
5     Cursor CLI   Composer 2              3.33M
6     Claude Code  Opus 4.7 (Medium)       3.33M
7     Cursor CLI   GPT-5.4 (Medium)        3.75M
8     Claude Code  Opus 4.6 (Medium)       4.27M
9     Claude Code  Sonnet 4.6 (Medium)     4.41M
10    Codex        GPT-5.4 (Medium)        4.92M
11    Codex        GPT-5.5 (Medium)        5.42M
12    Claude Code  DeepSeek V4 Pro (High)  6.20M
13    Claude Code  Kimi K2.6               7.28M
14    Claude Code  GLM-5.1                 8.88M
SRC: 1.123B 7d total ÷ 1,465 task-equivs (7,327 turns ÷ 5) = 767K
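
The task count here is derived rather than observed: turns are converted to task-equivalents at 5 turns per task. A short Python sketch of that conversion follows. It assumes the division is floored (7,327 ÷ 5 = 1,465.4, reported as 1,465), which is an inference from the SRC line, and the constant names are illustrative.

```python
# Inputs from the SRC line and the raw-data section.
TOTAL_TOKENS_7D = 1_123_252_011
TURNS = 7_327
TURNS_PER_TASK = 5                       # conversion factor stated in the SRC line
BEST_FIELD_TOKENS_PER_TASK = 2_740_000   # 2.74M, lowest field value in the chart

# Assumption: the task-equivalent count is floored to a whole number.
task_equivalents = TURNS // TURNS_PER_TASK
tokens_per_task = TOTAL_TOKENS_7D / task_equivalents
advantage = BEST_FIELD_TOKENS_PER_TASK / tokens_per_task

print(f"task equivalents: {task_equivalents:,}")
print(f"tokens per task:  {tokens_per_task / 1_000:.0f}K")
print(f"vs best field:    {advantage:.1f}x fewer tokens")
```

The result is directly proportional to the turns-per-task factor, so a different conversion (for example 10 turns per task) would double the tokens-per-task figure.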

4. Time per Task

Wall time · lower better
Rank  Agent        Model                   Time / task
1     MO§ES™       CC+Opus 4.7+op          1.8m
2     Claude Code  Opus 4.7 (Medium)       5.8m
3     Cursor CLI   GPT-5.5 (Medium)        6.2m
4     Codex        GPT-5.4 (Medium)        6.9m
5     Claude Code  Opus 4.6 (Medium)       7.0m
6     Codex        GPT-5.5 (Medium)        7.1m
7     Cursor CLI   GPT-5.4 (Medium)        7.6m
8     Gemini CLI   Gemini 3.1 Pro (High)   7.6m
9     Cursor CLI   Opus 4.7 (Medium)       7.8m
10    Cursor CLI   Composer 2              8.7m
11    Claude Code  Sonnet 4.6 (Medium)     9.2m
12    Claude Code  DeepSeek V4 Pro (High)  18.0m
13    Claude Code  GLM-5.1                 21.6m
14    Claude Code  Kimi K2.6               41.5m
SRC: ~45 hr active ÷ 1,465 = 1.84 min
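
Since the active-time figure is approximate ("~45 hr"), it can help to see how sensitive the per-task time is to that estimate. The Python sketch below recomputes minutes per task for 45 hours and for a hypothetical ±1 hour band. The band is an illustration only and is not a figure from the source.

```python
TASK_EQUIVALENTS = 1_465
ACTIVE_HOURS = 45        # approximate value from the SRC line
HOURS_UNCERTAINTY = 1    # hypothetical band, chosen only for illustration


def minutes_per_task(active_hours: float, tasks: int) -> float:
    """Average wall-clock minutes of active time per task-equivalent."""
    return active_hours * 60 / tasks


central = minutes_per_task(ACTIVE_HOURS, TASK_EQUIVALENTS)
low = minutes_per_task(ACTIVE_HOURS - HOURS_UNCERTAINTY, TASK_EQUIVALENTS)
high = minutes_per_task(ACTIVE_HOURS + HOURS_UNCERTAINTY, TASK_EQUIVALENTS)

print(f"central: {central:.2f} min/task")
print(f"range:   {low:.2f} to {high:.2f} min/task")
```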

5. Cost per LOC — all field models + MO§ES™ (log scale)

USD per line shipped · lower better · MO§ES plan basis $0.0007 leads field
Rank  Agent        Model / basis               Cost / LOC
1     MO§ES™       Plan basis · subscription   $0.0007
2     Cursor CLI   Composer 2                  $0.0035
3     Claude Code  DeepSeek V4 Pro (High)      $0.018
4     Claude Code  Kimi K2.6                   $0.038
5     MO§ES™       API equiv basis             $0.044
6     Cursor       Industry low est.           $0.050
7     Claude Code  Sonnet 4.6 (Medium)         $0.051
8     Claude Code  Opus 4.7 (Medium)           $0.062
9     Claude Code  Opus 4.6 (Medium)           $0.063
10    Cursor CLI   Opus 4.7 (Medium)           $0.073
11    Cursor CLI   GPT-5.4 (Medium)            $0.076
12    Gemini CLI   Gemini 3.1 Pro (High)       $0.080
13    Cursor CLI   GPT-5.5 (Medium)            $0.080
14    Codex        GPT-5.4 (Medium)            $0.10
15    Codex        GPT-5.5 (Medium)            $0.11
16    Claude Code  GLM-5.1                     $0.11
17    Cursor       Industry high est.          $0.20
18    Devin        Industry low est.           $0.26
19    Devin        Industry high est.          $3.30
AA models: cost_per_task ÷ 20 LOC · Cursor/Devin: industry estimates · MO§ES plan: $23.33/wk ÷ 35,242 = $0.000662 · MO§ES API: $1,564.47 ÷ 35,242 = $0.0444
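
The two sides of this chart use different denominators, which the following Python sketch makes explicit: the AA entries divide a per-task cost by a fixed 20 LOC convention, while the MO§ES entries divide a total spend by the LOC actually shipped. The function names and the $1.00 per-task example are hypothetical and are used only to show the conversion.

```python
AA_LOC_PER_TASK = 20          # industry convention stated in the source
MOSES_LOC_SHIPPED = 35_242
MOSES_PLAN_COST_WEEK = 23.33  # USD, subscription basis
MOSES_API_EQUIV_COST = 1_564.47  # USD, API-equivalent basis


def aa_cost_per_loc(cost_per_task: float, loc_per_task: int = AA_LOC_PER_TASK) -> float:
    """AA-side normalization: per-task cost spread over an assumed LOC count."""
    return cost_per_task / loc_per_task


def measured_cost_per_loc(total_cost: float, loc_shipped: int) -> float:
    """MO§ES-side normalization: total spend over lines actually shipped."""
    return total_cost / loc_shipped


plan = measured_cost_per_loc(MOSES_PLAN_COST_WEEK, MOSES_LOC_SHIPPED)
api = measured_cost_per_loc(MOSES_API_EQUIV_COST, MOSES_LOC_SHIPPED)
example_aa = aa_cost_per_loc(1.00)  # hypothetical $1.00 task, not a source figure

print(f"MO§ES plan basis: ${plan:.6f} per LOC")
print(f"MO§ES API basis:  ${api:.4f} per LOC")
print(f"AA example:       ${example_aa:.3f} per LOC for a $1.00 task")
```

Keeping the two normalizations as separate functions is a reminder that the ranking compares a convention-based estimate against a measured one.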
RAW DATA — 7-DAY MEASUREMENT WINDOW (2026-05-08 → 2026-05-14)
Metric                   Rounded   Exact count      Drives
Input (fresh tokens in)  123K      123,246          cache hit · out:in
Output                   3.90 M    3,902,803        out:in
Cache create             34.83 M   34,826,779       cache hit
Cache read               1.084 B   1,084,399,183    cache hit
Total tokens             1.123 B   1,123,252,011    tokens/task · time/task

Ancillary inputs: 21 sessions · 7,327 turns · 1,465 tasks · ~45 hr active · 35,242 LOC · $23.33/wk plan · $1,564.47 API equiv
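
A simple consistency check on the raw data: the four token categories should sum to the reported total. The Python sketch below performs that check and prints each category's share of the total. The dictionary keys are illustrative labels, not field names from any tool.

```python
# Exact counts from the raw-data section.
RAW_TOKENS = {
    "input": 123_246,
    "output": 3_902_803,
    "cache_create": 34_826_779,
    "cache_read": 1_084_399_183,
}
REPORTED_TOTAL = 1_123_252_011

computed_total = sum(RAW_TOKENS.values())
totals_match = computed_total == REPORTED_TOTAL

print(f"computed total: {computed_total:,} (matches reported: {totals_match})")
for name, count in RAW_TOKENS.items():
    # Share of all tokens in the 7-day window attributable to this category.
    print(f"{name:>12}: {count / computed_total:.3%}")
```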