Auto-posted while I’m in Tokyo. Running these tests 24/7 on a VPS.
I’ve been running the same Gold trading prompts through three different AI models for a week. Same account, same expert advisor (DoIt Alpha Pulse AI), completely different thinking patterns.
Here’s what’s actually happening with Claude, GPT-5, and Gemini when they analyze Gold.
The Test Setup (You Can Replicate This)
The Exact Prompt I’m Using
Current XAUUSD: [price]
Last 3 H1 candles: [data]
Session: [London/NY/Asian]
News today: [economic calendar]
Should I: Buy/Sell/Hold?
Risk: 0.5% max
Target: Risk-reward 1:2 minimum
Explain reasoning in 50 words max.
Simple. Clear. Same for all three models.
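If you want to script the prompt assembly instead of pasting values by hand, a minimal sketch looks like this. The candle string format and the example values are my own illustrations, not something the EA mandates:

```python
# Minimal sketch of assembling the Gold prompt programmatically.
# Candle format and example values are illustrative, not DoIt Alpha Pulse AI's internal format.

def build_prompt(price: float, candles: list[str], session: str, news: str) -> str:
    candle_block = "\n".join(candles)
    return (
        f"Current XAUUSD: {price}\n"
        f"Last 3 H1 candles:\n{candle_block}\n"
        f"Session: {session}\n"
        f"News today: {news}\n"
        "Should I: Buy/Sell/Hold?\n"
        "Risk: 0.5% max\n"
        "Target: Risk-reward 1:2 minimum\n"
        "Explain reasoning in 50 words max."
    )

prompt = build_prompt(
    price=1952.30,
    candles=[
        "H1 -3: O 1949.8 H 1951.6 L 1949.2 C 1951.1",
        "H1 -2: O 1951.1 H 1952.7 L 1950.4 C 1950.9",
        "H1 -1: O 1950.9 H 1953.0 L 1950.6 C 1952.3",
    ],
    session="London",
    news="US CPI 13:30 GMT",
)
print(prompt)
```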
Testing Conditions
- Demo account: $5000
- Each model gets: $1500 allocation
- Same trades offered: All three see identical setups
- Decision tracked: Even when they say “Hold” (see the logging sketch after this list)
- Time recorded: Response speed matters
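To make the “even Hold counts” rule stick, every response goes into a simple log. Here is a minimal version of what that can look like; the column set is my own choice, not how the EA stores things:

```python
# Minimal decision log so "Hold" calls get recorded alongside executed trades.
# Columns are my own choice; DoIt Alpha Pulse AI may store this differently.
import csv
from datetime import datetime, timezone

def log_decision(path: str, model: str, decision: str, price: float, response_secs: float) -> None:
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),  # timestamp
            model,                                   # "gpt5" / "claude" / "gemini"
            decision,                                # "buy" / "sell" / "hold"
            price,                                   # XAUUSD at decision time
            response_secs,                           # response speed, since it matters
        ])

log_decision("decisions.csv", "claude", "hold", 1952.30, 1.4)
```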
Early Observations (Not Conclusions)
GPT-5: The Overthinker
Response time: 3-5 seconds
GPT-5 keeps finding patterns that might not exist. Yesterday it said:
“The 3-candle formation resembles the May 2023 reversal pattern combined with current DXY weakness suggesting institutional accumulation however the volume profile indicates…”
Problem: By the time it finishes thinking, the entry is gone.
Interesting behavior: It catches subtle correlations. It noticed that Gold was ignoring Dollar strength because bond yields were also rising. That’s actually sophisticated.
Current status:
- Signals generated: 12
- Trades taken: 4 (the other signals came too late to act on)
- Win rate: 50% (2 wins, 2 losses)
- P&L: +45 pips
Claude Opus 4.1: The Speed Trader
Response time: 1-2 seconds
Claude makes decisions FAST. Sometimes too fast. Its responses are like:
“Bullish. London open + support held + Dollar weak. Buy.”
Strength: In fast markets, Claude actually gets fills. During Wednesday’s volatility, it was the only model that caught the reversal.
Weakness: Less nuanced. Missed the Bond/Gold correlation completely.
Current status:
- Signals generated: 18
- Trades taken: 11
- Win rate: 54% (6 wins, 5 losses)
- P&L: +72 pips
Gemini 2.5: The Conservative One
Response time: 2-4 seconds (varies)
Gemini is more cautious. It sometimes passes on trades the others take. Tuesday it said:
“No clear edge. Suggest waiting for better setup.”
This happens more with Gemini than with GPT-5 or Claude.
Unexpected strength: Risk management. When uncertain, it often suggests smaller positions. The only model that regularly says “reduce risk to 0.25%” when confidence is lower.
Minor weakness: Sometimes TOO conservative, missing good moves while waiting for “perfect” setups.
Current status:
- Signals generated: 9
- Trades taken: 5
- Win rate: 60% (3 wins, 2 losses)
- P&L: +38 pips
The Interesting Discovery: They Sometimes Disagree
Most of the time, they agree on direction. But here’s what happened Thursday at London open:
Gold price: 1952.30
Setup: Break above Asian high
- GPT-5: “Wait for pullback to 1950”
- Claude: “Buy now, momentum building”
- Gemini: “Buy but smaller position”
Same bullish bias, different approaches to entry.
Claude entered immediately. Gold ran to 1958. Claude got the best entry.
But all three would have been profitable – just different amounts.
What’s Actually Valuable Here
Speed vs Intelligence Trade-off
- Need fast decisions? Claude
- Need deep analysis? GPT-5
- Need risk management? Gemini (surprisingly)
Cost Per Decision (This Week)
- GPT-5: $0.12 average
- Claude: $0.08 average
- Gemini: $0.06 average
Claude is 33% cheaper than GPT-5 AND faster. But GPT-5’s two wins were bigger (+40 and +35 pips vs Claude’s average of +20).
The “Confidence” Problem
None of these models say “I don’t know” enough. They always have an opinion, even when they shouldn’t.
I’m testing adding this to prompts:
If unclear, say "No edge - skip this setup"
Confidence required: 70% minimum
Early results: 40% fewer signals, but better win rate.
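If you want to enforce that rule mechanically instead of trusting the model, a small filter on the reply text works. The reply format it expects (an explicit "Confidence: NN%" line) is my assumption, not something the EA requires:

```python
# Sketch of a confidence gate on the model's text reply.
# Assumes the prompt asks for an explicit "Confidence: NN%" line; that format is my assumption.
import re

def should_trade(reply: str, min_confidence: float = 0.70) -> bool:
    """Skip setups the model flags as unclear or rates below the confidence floor."""
    text = reply.lower()
    if "no edge" in text or "skip this setup" in text:
        return False
    match = re.search(r"confidence[:\s]+(\d{1,3})\s*%", text)
    if match:
        return int(match.group(1)) / 100 >= min_confidence
    return False  # no explicit confidence stated -> treat it as no edge

print(should_trade("Buy. Confidence: 80%. London open momentum."))  # True
print(should_trade("No edge - skip this setup."))                   # False
```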
The Framework That’s Emerging
After one week, here’s what I’m learning:
Use Claude When:
- News is about to hit (speed matters)
- London/NY session opens (momentum trades)
- You need quick decisions on clear setups
Use GPT-5 When:
- Asian session (more time to think)
- Complex correlations matter
- You can wait for perfect entries
Use Gemini When:
- You want a second opinion
- Risk management is priority
- Testing new strategies (it’s more conservative)
What’s Actually Working Well
Smooth Operations
One thing that surprised me – DoIt Alpha Pulse AI handles all three models without issues:
- No API errors (proper error handling built in)
- No rate limit problems (intelligent request management)
- Consistent connections across all models
This is actually our competitive advantage. While others struggle with integration, we just… trade.
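If you’re wiring up your own multi-model tests outside the EA, rate limits and transient API errors are the usual pain points. A generic retry-with-backoff wrapper covers most of them; this is an illustration, not DoIt Alpha Pulse AI’s actual code, and call_model stands in for whatever client you use:

```python
# Generic retry with exponential backoff and jitter for flaky model APIs.
# call_model is a placeholder for your own client call; this is not the EA's internal code.
import random
import time

def call_with_retry(call_model, prompt: str, max_attempts: int = 5):
    for attempt in range(max_attempts):
        try:
            return call_model(prompt)
        except Exception:  # in practice, catch your client's rate-limit/timeout errors specifically
            if attempt == max_attempts - 1:
                raise
            # back off 1s, 2s, 4s, 8s... plus up to 0.5s of jitter
            time.sleep(2 ** attempt + random.uniform(0, 0.5))

print(call_with_retry(lambda p: "Hold", "XAUUSD check"))
```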
The Real Differences Are Subtle
The models are more similar than different. They all:
- Catch basic support/resistance
- Understand trend direction
- React to major news
The differences are in style, not substance:
- Claude: Direct and fast
- GPT-5: Detailed and thoughtful
- Gemini: Cautious and measured
The “Explanation Tax”
Asking for reasoning adds:
- 1-2 seconds to response time
- 2x the token cost
- A tendency to overthink simple setups
But it’s worth it for learning what the AI “sees.”
What I’m Testing Next Week
Experiment 1: Consensus Trading
Only take trades where 2 of 3 models agree. Theory: Higher conviction setups.
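A minimal version of the consensus rule, assuming each model’s answer has already been reduced to buy/sell/hold:

```python
# Sketch of the 2-of-3 consensus rule: trade only when at least two models agree on a direction.
from collections import Counter

def consensus_signal(signals: dict[str, str], min_agree: int = 2) -> str:
    """signals maps model name -> 'buy' / 'sell' / 'hold'."""
    direction, count = Counter(signals.values()).most_common(1)[0]
    if direction in ("buy", "sell") and count >= min_agree:
        return direction
    return "hold"

print(consensus_signal({"gpt5": "buy", "claude": "buy", "gemini": "hold"}))   # buy
print(consensus_signal({"gpt5": "buy", "claude": "sell", "gemini": "hold"}))  # hold
```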
Experiment 2: Time-Based Rotation
- Asian: Gemini (conservative for quiet markets)
- London: Claude (speed for breakouts)
- NY: GPT-5 (complexity of US session)
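In code, the rotation is just a session lookup. The UTC session boundaries below are my rough approximations; adjust them to however you define Asian/London/NY:

```python
# Sketch of time-based model rotation. Session hours (UTC) are rough approximations, not fixed rules.
from datetime import datetime, timezone

SESSION_MODEL = {"asian": "gemini", "london": "claude", "newyork": "gpt5"}

def current_session(now: datetime) -> str:
    hour = now.astimezone(timezone.utc).hour
    if hour < 7:
        return "asian"
    if hour < 13:
        return "london"
    return "newyork"

print(SESSION_MODEL[current_session(datetime.now(timezone.utc))])
```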
Experiment 3: Specialized Prompts
Instead of one prompt for all, optimize for each model’s strengths:
- Claude: Short, action-focused
- GPT-5: Include correlation analysis
- Gemini: Add risk parameters
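One way to express that is a shared base question with a model-specific tail. The exact wording below is my guess at what “optimized for strengths” could look like, not the prompts I’m running:

```python
# Sketch of per-model prompt specialization: same core question, different framing per model.
# Template wording is illustrative, not the exact prompts used in this test.
BASE = "XAUUSD at {price}, {session} session. Buy/Sell/Hold? Risk 0.5% max, RR 1:2 minimum."

PROMPTS = {
    "claude": BASE + " Answer in one line: direction, entry, stop, target.",
    "gpt5":   BASE + " Consider DXY and bond-yield correlations before deciding.",
    "gemini": BASE + " If confidence is below 70%, say 'No edge' and suggest a reduced position size.",
}

print(PROMPTS["claude"].format(price=1952.30, session="London"))
```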
The Honest Reality
After one week of parallel testing, the models perform similarly on Gold trading.
They all catch the obvious moves. The differences are marginal – maybe 5-10% performance variance. The skill isn’t picking the “right” AI – it’s writing better prompts.
That’s why DoIt Alpha Pulse AI supports all of them. Not as a gimmick, but because different market conditions need different types of thinking.
Your Homework While I’m in Japan
If you have DoIt Alpha Pulse AI, try this:
- Run the same setup through different models
- Document when they disagree
- Track which one was right
- Share findings
By the time I’m back, we’ll have crowd-sourced data on which model works best for what.
The Questions I’m Investigating in Tokyo
I’m meeting with quant traders here who’ve been using AI longer:
- How do they handle model disagreement?
- What’s their approach to consensus?
- How do they optimize for latency from Asia?
- Are there models we’re not considering?
Current Scoreboard (Week 1)
Speed Champion: Claude (1-2 seconds)
Accuracy Leader: Gemini (60% win rate but small sample)
Complexity Master: GPT-5 (catches subtle patterns)
Cost Winner: Gemini ($0.06/decision)
Reliability: Claude (most consistent)
But remember – this is one week of data. Not conclusions, just observations.
The Real Value of This Experiment
It’s not about finding the “best” model. It’s about understanding that AI trading strategy isn’t one-size-fits-all.
Your trading style, the pairs you trade, your risk tolerance – they all affect which AI model suits you.
That’s why the prompt is more important than the model. A great prompt on Claude beats a bad prompt on GPT-5 every time.
Want to run your own AI model experiments?
Get DoIt Alpha Pulse AI – Now $397
Supports all major AI models. Switch between them instantly. Find what works for YOUR trading.
P.S. – Still in Tokyo. These models are running 24/7 on my VPS. When I check in from my hotel, I see Claude and GPT-5 arguing about whether 1958 is resistance or support. Even AIs can’t agree on basic TA.
P.P.S. – If you’re testing models yourself, document everything. The patterns only emerge with data, not hunches.