Auto-posted while I’m in Tokyo. Running these tests 24/7 on a VPS.
I’ve been running the same Gold trading prompts through three different AI models for a week. Same account, same expert advisor (DoIt Alpha Pulse AI), completely different thinking patterns.
Here’s what’s actually happening with Claude, GPT-5, and Gemini when they analyze Gold.
The Test Setup (You Can Replicate This)
The Exact Prompt I’m Using
Current XAUUSD: [price]
Last 3 H1 candles: [data]
Session: [London/NY/Asian]
News today: [economic calendar]
Should I: Buy/Sell/Hold?
Risk: 0.5% max
Target: Risk-reward 1:2 minimum
Explain reasoning in 50 words max.
Simple. Clear. Same for all three models.
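If you want to script the prompt assembly instead of pasting values by hand, a minimal sketch looks like this. The candle string format and the example values are my own illustrations, not something the EA mandates:

```python
# Minimal sketch of assembling the Gold prompt programmatically.
# Candle format and example values are illustrative, not DoIt Alpha Pulse AI's internal format.

def build_prompt(price: float, candles: list[str], session: str, news: str) -> str:
    candle_block = "\n".join(candles)
    return (
        f"Current XAUUSD: {price}\n"
        f"Last 3 H1 candles:\n{candle_block}\n"
        f"Session: {session}\n"
        f"News today: {news}\n"
        "Should I: Buy/Sell/Hold?\n"
        "Risk: 0.5% max\n"
        "Target: Risk-reward 1:2 minimum\n"
        "Explain reasoning in 50 words max."
    )

prompt = build_prompt(
    price=1952.30,
    candles=[
        "H1 -3: O 1949.8 H 1951.6 L 1949.2 C 1951.1",
        "H1 -2: O 1951.1 H 1952.7 L 1950.4 C 1950.9",
        "H1 -1: O 1950.9 H 1953.0 L 1950.6 C 1952.3",
    ],
    session="London",
    news="US CPI 13:30 GMT",
)
print(prompt)
```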
Testing Conditions
- Demo account: $5000
- Each model gets: $1500 allocation
- Same trades offered: All three see identical setups
- Decision tracked: Even when they say “Hold” (see the logging sketch after this list)
- Time recorded: Response speed matters
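To make the “even Hold counts” rule stick, every response goes into a simple log. Here is a minimal version of what that can look like; the column set is my own choice, not how the EA stores things:

```python
# Minimal decision log so "Hold" calls get recorded alongside executed trades.
# Columns are my own choice; DoIt Alpha Pulse AI may store this differently.
import csv
from datetime import datetime, timezone

def log_decision(path: str, model: str, decision: str, price: float, response_secs: float) -> None:
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),  # timestamp
            model,                                   # "gpt5" / "claude" / "gemini"
            decision,                                # "buy" / "sell" / "hold"
            price,                                   # XAUUSD at decision time
            response_secs,                           # response speed, since it matters
        ])

log_decision("decisions.csv", "claude", "hold", 1952.30, 1.4)
```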
Early Observations (Not Conclusions)
GPT-5: The Overthinker
Response time: 3-5 seconds
GPT-5 keeps finding patterns that might not exist. Yesterday it said:
“The 3-candle formation resembles the May 2023 reversal pattern combined with current DXY weakness suggesting institutional accumulation however the volume profile indicates…”
Problem: By the time it finishes thinking, the entry is gone.
Interesting behavior: It catches subtle correlations. It noticed that Gold was ignoring Dollar strength because bond yields were also rising. That’s actually sophisticated.
Current status:
- Signals generated: 12
- Trades taken: 4 (the other signals came too late to act on)
- Win rate: 50% (2 wins, 2 losses)
- P&L: +45 pips
Claude Opus 4.1: The Speed Trader
Response time: 1-2 seconds
Claude makes decisions FAST. Sometimes too fast. Its responses are like:
“Bullish. London open + support held + Dollar weak. Buy.”
Strength: In fast markets, Claude actually gets fills. During Wednesday’s volatility, it was the only model that caught the reversal.
Weakness: Less nuanced. Missed the Bond/Gold correlation completely.
Current status:
- Signals generated: 18
- Trades taken: 11
- Win rate: 54% (6 wins, 5 losses)
- P&L: +72 pips
Gemini 2.5: The Conservative One
Response time: 2-4 seconds (varies)
Gemini is more cautious. It sometimes passes on trades the others take. Tuesday it said:
“No clear edge. Suggest waiting for better setup.”
This happens more with Gemini than with GPT-5 or Claude.
Unexpected strength: Risk management. When uncertain, it often suggests smaller positions. The only model that regularly says “reduce risk to 0.25%” when confidence is lower.
Minor weakness: Sometimes TOO conservative, missing good moves while waiting for “perfect” setups.
Current status:
- Signals generated: 9
- Trades taken: 5
- Win rate: 60% (3 wins, 2 losses)
- P&L: +38 pips
The Interesting Discovery: They Sometimes Disagree
Most of the time, they agree on direction. But here’s what happened Thursday at London open:
Gold price: 1952.30
Setup: Break above Asian high
- GPT-5: “Wait for pullback to 1950”
- Claude: “Buy now, momentum building”
- Gemini: “Buy but smaller position”
Same bullish bias, different approaches to entry.
Claude entered immediately. Gold ran to 1958. Claude got the best entry.
But all three would have been profitable – just different amounts.
What’s Actually Valuable Here
Speed vs Intelligence Trade-off
- Need fast decisions? Claude
- Need deep analysis? GPT-5
- Need risk management? Gemini (surprisingly)
Cost Per Decision (This Week)
- GPT-5: $0.12 average
- Claude: $0.08 average
- Gemini: $0.06 average
Claude is 33% cheaper than GPT-5 AND faster. But GPT-5’s two wins were bigger (+40 and +35 pips vs Claude’s average of +20).
The “Confidence” Problem
None of these models say “I don’t know” enough. They always have an opinion, even when they shouldn’t.
I’m testing adding this to prompts:
If unclear, say "No edge - skip this setup"
Confidence required: 70% minimum
Early results: 40% fewer signals, but better win rate.
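If you want to enforce that rule mechanically instead of trusting the model, a small filter on the reply text works. The reply format it expects (an explicit "Confidence: NN%" line) is my assumption, not something the EA requires:

```python
# Sketch of a confidence gate on the model's text reply.
# Assumes the prompt asks for an explicit "Confidence: NN%" line; that format is my assumption.
import re

def should_trade(reply: str, min_confidence: float = 0.70) -> bool:
    """Skip setups the model flags as unclear or rates below the confidence floor."""
    text = reply.lower()
    if "no edge" in text or "skip this setup" in text:
        return False
    match = re.search(r"confidence[:\s]+(\d{1,3})\s*%", text)
    if match:
        return int(match.group(1)) / 100 >= min_confidence
    return False  # no explicit confidence stated -> treat it as no edge

print(should_trade("Buy. Confidence: 80%. London open momentum."))  # True
print(should_trade("No edge - skip this setup."))                   # False
```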
The Framework That’s Emerging
After one week, here’s what I’m learning:
Use Claude When:
- News is about to hit (speed matters)
- London/NY session opens (momentum trades)
- You need quick decisions on clear setups
Use GPT-5 When:
- Asian session (more time to think)
- Complex correlations matter
- You can wait for perfect entries
Use Gemini When:
- You want a second opinion
- Risk management is priority
- Testing new strategies (it’s more conservative)
What’s Actually Working Well
Smooth Operations
One thing that surprised me – DoIt Alpha Pulse AI handles all three models without issues:
- No API errors (proper error handling built in)
- No rate limit problems (intelligent request management)
- Consistent connections across all models
This is actually our competitive advantage. While others struggle with integration, we just… trade.
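If you’re wiring up your own multi-model tests outside the EA, rate limits and transient API errors are the usual pain points. A generic retry-with-backoff wrapper covers most of them; this is an illustration, not DoIt Alpha Pulse AI’s actual code, and call_model stands in for whatever client you use:

```python
# Generic retry with exponential backoff and jitter for flaky model APIs.
# call_model is a placeholder for your own client call; this is not the EA's internal code.
import random
import time

def call_with_retry(call_model, prompt: str, max_attempts: int = 5):
    for attempt in range(max_attempts):
        try:
            return call_model(prompt)
        except Exception:  # in practice, catch your client's rate-limit/timeout errors specifically
            if attempt == max_attempts - 1:
                raise
            # back off 1s, 2s, 4s, 8s... plus up to 0.5s of jitter
            time.sleep(2 ** attempt + random.uniform(0, 0.5))

print(call_with_retry(lambda p: "Hold", "XAUUSD check"))
```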
The Real Differences Are Subtle
The models are more similar than different. They all:
- Catch basic support/resistance
- Understand trend direction
- React to major news
The differences are in style, not substance:
- Claude: Direct and fast
- GPT-5: Detailed and thoughtful
- Gemini: Cautious and measured
The “Explanation Tax”
Asking for reasoning adds:
- 1-2 seconds to response time
- 2x the token cost
- A tendency to overthink simple setups
But it’s worth it for learning what the AI “sees.”
What I’m Testing Next Week
Experiment 1: Consensus Trading
Only take trades where 2 of 3 models agree. Theory: Higher conviction setups.
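A minimal version of the consensus rule, assuming each model’s answer has already been reduced to buy/sell/hold:

```python
# Sketch of the 2-of-3 consensus rule: trade only when at least two models agree on a direction.
from collections import Counter

def consensus_signal(signals: dict[str, str], min_agree: int = 2) -> str:
    """signals maps model name -> 'buy' / 'sell' / 'hold'."""
    direction, count = Counter(signals.values()).most_common(1)[0]
    if direction in ("buy", "sell") and count >= min_agree:
        return direction
    return "hold"

print(consensus_signal({"gpt5": "buy", "claude": "buy", "gemini": "hold"}))   # buy
print(consensus_signal({"gpt5": "buy", "claude": "sell", "gemini": "hold"}))  # hold
```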
Experiment 2: Time-Based Rotation
- Asian: Gemini (conservative for quiet markets)
- London: Claude (speed for breakouts)
- NY: GPT-5 (complexity of US session)
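In code, the rotation is just a session lookup. The UTC session boundaries below are my rough approximations; adjust them to however you define Asian/London/NY:

```python
# Sketch of time-based model rotation. Session hours (UTC) are rough approximations, not fixed rules.
from datetime import datetime, timezone

SESSION_MODEL = {"asian": "gemini", "london": "claude", "newyork": "gpt5"}

def current_session(now: datetime) -> str:
    hour = now.astimezone(timezone.utc).hour
    if hour < 7:
        return "asian"
    if hour < 13:
        return "london"
    return "newyork"

print(SESSION_MODEL[current_session(datetime.now(timezone.utc))])
```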
Experiment 3: Specialized Prompts
Instead of one prompt for all, optimize for each model’s strengths:
- Claude: Short, action-focused
- GPT-5: Include correlation analysis
- Gemini: Add risk parameters
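One way to express that is a shared base question with a model-specific tail. The exact wording below is my guess at what “optimized for strengths” could look like, not the prompts I’m running:

```python
# Sketch of per-model prompt specialization: same core question, different framing per model.
# Template wording is illustrative, not the exact prompts used in this test.
BASE = "XAUUSD at {price}, {session} session. Buy/Sell/Hold? Risk 0.5% max, RR 1:2 minimum."

PROMPTS = {
    "claude": BASE + " Answer in one line: direction, entry, stop, target.",
    "gpt5":   BASE + " Consider DXY and bond-yield correlations before deciding.",
    "gemini": BASE + " If confidence is below 70%, say 'No edge' and suggest a reduced position size.",
}

print(PROMPTS["claude"].format(price=1952.30, session="London"))
```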
The Honest Reality
After one week of parallel testing, the models perform similarly on Gold trading.
They all catch the obvious moves. The differences are marginal – maybe 5-10% performance variance. The skill isn’t picking the “right” AI – it’s writing better prompts.
That’s why DoIt Alpha Pulse AI supports all of them. Not as a gimmick, but because different market conditions need different types of thinking.
Your Homework While I’m in Japan
If you have DoIt Alpha Pulse AI, try this:
- Run the same setup through different models
- Document when they disagree
- Track which one was right
- Share findings
By the time I’m back, we’ll have crowd-sourced data on which model works best for what.
The Questions I’m Investigating in Tokyo
I’m meeting with quant traders here who’ve been using AI longer:
- How do they handle model disagreement?
- What’s their approach to consensus?
- How do they optimize for latency from Asia?
- Are there models we’re not considering?
Current Scoreboard (Week 1)
Speed Champion: Claude (1-2 seconds)
Accuracy Leader: Gemini (60% win rate but small sample)
Complexity Master: GPT-5 (catches subtle patterns)
Cost Winner: Gemini ($0.06/decision)
Reliability: Claude (most consistent)
But remember – this is one week of data. Not conclusions, just observations.
The Real Value of This Experiment
It’s not about finding the “best” model. It’s about understanding that AI trading strategy isn’t one-size-fits-all.
Your trading style, the pairs you trade, your risk tolerance – they all affect which AI model suits you.
That’s why the prompt is more important than the model. A great prompt on Claude beats a bad prompt on GPT-5 every time.
Want to run your own AI model experiments?
Get DoIt Alpha Pulse AI – Now $397
Supports all major AI models. Switch between them instantly. Find what works for YOUR trading.
P.S. – Still in Tokyo. These models are running 24/7 on my VPS. When I check in from my hotel, I see Claude and GPT-5 arguing about whether 1958 is resistance or support. Even AIs can’t agree on basic TA.
P.P.S. – If you’re testing models yourself, document everything. The patterns only emerge with data, not hunches.