The largest cryptocurrency exchange in the United States has quietly switched to Chinese AI models, cutting its AI spending in half
Author: AI Hands-on Notes
A Data Point That Shook Silicon Valley
Brian Armstrong, CEO of Coinbase, the largest cryptocurrency exchange in the U.S., recently made a statement that caused quite a stir in the tech community:
"We switched our AI models to China's GLM 5.2 and Kimi 2.7, and our AI spending was cut in half."
Cut in half? Does that mean usage has decreased as well?
On the contrary. The token usage at Coinbase has been increasing.
Using more while saving money is what truly unsettles OpenAI and Anthropic.
How Did They Do It? Three Money-Saving Strategies
Coinbase didn't just switch to a cheaper model and call it a day. They built a complete "cost-saving system":
First Strategy: Don't Lock Into One Model, Let the System Choose
Coinbase set up an automatic routing system. Each time a request comes in, the system automatically selects the most suitable model based on task type, price, and cache status.
Not all tasks require the most expensive model. Use the cheaper one for simple translations and the better one for complex reasoning—just like you wouldn't drive a sports car to the grocery store.
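The routing idea can be sketched in a few lines. This is only an illustration, not Coinbase's actual system: the model names, prices, tiers, and the task-to-tier mapping below are all made-up placeholders, and a real router would likely weigh many more signals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    price_per_m_output: float  # USD per million output tokens (illustrative)
    tier: int                  # 1 = basic, 2 = strong reasoning


# Hypothetical catalogue; names and prices are placeholders.
MODELS = [
    Model("cheap-model", 4.40, tier=1),
    Model("premium-model", 25.00, tier=2),
]

# Hypothetical mapping from task type to the minimum tier it needs.
REQUIRED_TIER = {"translation": 1, "summary": 1, "coding": 2, "reasoning": 2}


def pick_model(task_type: str, cached_on: Optional[str] = None) -> Model:
    """Pick the cheapest model that meets the task's tier.

    If a qualifying model already holds the prompt in its cache
    (cached_on), prefer it, since a cache hit is assumed to be cheaper.
    """
    need = REQUIRED_TIER.get(task_type, 2)  # unknown tasks go to the strong tier
    candidates = [m for m in MODELS if m.tier >= need]
    for m in candidates:
        if m.name == cached_on:
            return m
    return min(candidates, key=lambda m: m.price_per_m_output)


print(pick_model("translation").name)  # cheap-model
print(pick_model("reasoning").name)    # premium-model
```

The point of the design is that the caller never names a model; it only describes the task, so swapping or repricing models means editing the catalogue, not the application code.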
Second Strategy: Increase Cache Hit Rate from 5% to 60%
This is the most aggressive strategy. By optimizing their caching strategy, Coinbase increased the cache hit rate from 5% to 60%.
In simple terms, 60% of requests can reuse previous computation results, significantly reducing the actual cost of each call. This optimization alone saved a substantial amount of money.
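To see why the hit rate matters so much, here is a rough back-of-the-envelope calculation. It rests on two assumptions that are not from Coinbase: that the hit rate applies to input tokens, and that a cached token is billed at 10% of the normal price. Actual discounts differ by provider.

```python
def relative_input_cost(hit_rate: float, cached_discount: float = 0.1) -> float:
    """Input cost relative to a no-cache baseline of 1.0.

    hit_rate: fraction of input tokens served from cache (0 to 1).
    cached_discount: assumed price of a cached token as a fraction of the
    normal price; 0.1 is an illustrative guess, not a quoted rate.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * cached_discount


before = relative_input_cost(0.05)  # 5% hit rate
after = relative_input_cost(0.60)   # 60% hit rate
print(f"before: {before:.3f}, after: {after:.3f}, saving: {1 - after / before:.1%}")
```

Under these assumed numbers, the input bill drops by roughly half from caching alone, before any model switch is counted.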
Third Strategy: Context Engineering
Coinbase requires developers to streamline context, starting new sessions for new tasks and not cramming too much into one conversation.
This isn't laziness; it's a new discipline—known in the industry as Context Engineering. Anthropic explicitly pointed out in a technical blog that when managing AI agents, context engineering is more effective than prompt engineering.
In simple terms: it's not about making AI smarter, but providing AI with more precise information.
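As a toy illustration of "less but more precise," the sketch below keeps only the background notes that share words with the task and fit within a word budget. The function name, the word-overlap scoring, and the budget are all invented for this example; real systems typically use token counts and better relevance measures.

```python
from typing import List


def build_context(task: str, notes: List[str], max_words: int = 50) -> List[str]:
    """Return the notes that overlap with the task, most relevant first,
    skipping any note that would push the total past the word budget."""
    task_words = set(task.lower().split())

    def overlap(note: str) -> int:
        # Number of distinct words the note shares with the task.
        return len(task_words & set(note.lower().split()))

    selected: List[str] = []
    used = 0
    for note in sorted(notes, key=overlap, reverse=True):
        size = len(note.split())
        if overlap(note) == 0 or used + size > max_words:
            continue  # irrelevant, or does not fit in the remaining budget
        selected.append(note)
        used += size
    return selected


notes = [
    "refund policy for card payments",
    "office party photos",
    "card payments fail on retry",
]
print(build_context("fix card payments retry bug", notes))
```

The unrelated note never reaches the model, which is the whole idea: the model reads fewer tokens and has less to get distracted by.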

▲ More and more companies are starting to be frugal with AI models
Not Just Coinbase, This is a Trend
Coinbase is not the first to take this leap.
Lindy, a 25-person AI startup, has completely switched from Claude to DeepSeek. CEO Flo Crivello told CNBC: "AI costs have surpassed labor costs, which is unsustainable." After changing models, costs "plummeted," saving millions of dollars.
Snowflake's CEO Sridhar Ramaswamy conducted a real-world comparison: on 103 coding tasks, GLM-5.2 solved 66%, while Claude Opus 4.7 solved 67%. The difference? Almost negligible.
But the price difference is substantial:
Price Comparison (per million tokens)
- GLM-5.2: Input $1.40 / Output $4.40
- Claude Opus 4.7: Input $5 / Output $25
- GPT-5.5: Input $5 / Output $30
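On output tokens, Claude Opus 4.7 costs about 5.7 times as much as GLM-5.2, and GPT-5.5 about 6.8 times as much. On input tokens, both cost about 3.6 times as much.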
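To turn the price list above into a bill, the sketch below prices one workload on each model. The prices are the ones listed above; the workload size (10 million input tokens, 2 million output tokens) is a made-up example, and caching and volume discounts are ignored.

```python
# Prices per million tokens as listed above (USD): (input, output).
PRICES = {
    "GLM-5.2": (1.40, 4.40),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Bill for one workload at list price, ignoring caching and discounts."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 10M input tokens and 2M output tokens.
for name in PRICES:
    ratio_out = PRICES[name][1] / PRICES["GLM-5.2"][1]
    print(f"{name}: ${cost_usd(name, 10_000_000, 2_000_000):.2f}, "
          f"output price {ratio_out:.1f}x GLM-5.2")
```

For this example workload, the bill comes to $22.80 on GLM-5.2, $100.00 on Claude Opus 4.7, and $110.00 on GPT-5.5. The overall gap is smaller than the output-price gap because input tokens, where the difference is narrower, make up most of the volume.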
Cheap Doesn't Mean Bad? Don't Jump to Conclusions
At this point, you might ask: with such a low price, can the quality be the same?
To be honest, it's not completely the same, but the gap is smaller than you might think.
Snowflake's tests show that GLM-5.2 is indeed less stable on certain tasks—success rate on the first attempt is 47.6%, lower than Opus's 53.7%. Moreover, GLM sometimes "stubbornly" pursues the wrong direction: on one task, it took 24 minutes and made 411 calls, yet still failed. Opus completed it in 9 minutes with 49 calls.
However, on most tasks, the final success rates of both are nearly equal. The key is: are you willing to pay 5 times the price for a few percentage points of stability?
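One way to frame that question is cost per solved task: price times tokens, divided by the success rate. The sketch below uses the output prices and the 66% and 67% final success rates quoted in this article; the tokens-per-task figure is a made-up number, and counting only output tokens is a simplification.

```python
def cost_per_solved_task(price_per_m: float, tokens_per_task: float,
                         success_rate: float) -> float:
    """Expected spend to get one solved task, treating failed attempts as waste."""
    return price_per_m * tokens_per_task / 1_000_000 / success_rate


TOKENS = 200_000  # hypothetical output tokens per task, same for both models

glm = cost_per_solved_task(4.40, TOKENS, 0.66)
opus = cost_per_solved_task(25.00, TOKENS, 0.67)

# How many times more tokens GLM could use per task before it costs as much as Opus.
break_even = opus / glm
print(f"GLM ${glm:.2f}, Opus ${opus:.2f}, break-even token multiplier {break_even:.1f}x")
```

Under these assumptions, GLM stays cheaper as long as it uses less than about 5.6 times the tokens Opus does. The runaway case mentioned above (411 calls versus 49, about 8.4 times as many) would cross that line if tokens scaled with calls, which is why the instability matters even when the final success rates look the same.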
For many companies, the answer is becoming increasingly clear: no.

▲ The price gap between Eastern and Western AI models is reshaping the industry landscape
What Does This Mean for Ordinary People?
You might say: I'm not Coinbase, what does this have to do with me?
In fact, this trend has three direct implications for how you use AI:
1. Don't Just Stick to One Model
Many people rely on only one AI, either ChatGPT or Claude. But professional users don't do that. Using different models for different tasks is the most cost-effective approach.
Use the cheaper one for daily Q&A and the better one for coding and analysis. Just like you wouldn't go to a Michelin restaurant for every meal.
2. Caching and Reuse are Key to Saving Money
If you frequently use AI for similar tasks (like writing weekly reports or organizing notes daily), learning to utilize caching and templates can significantly reduce consumption.
3. Streamlining Context = Better Results
Many people try to cram all background information into their conversations with AI. But it turns out that providing AI with less but more precise information yields better results. For new tasks, start new conversations. Don't make AI sift through a pile of historical records for answers.
Deeper Changes: AI Pricing Models are Being Reshaped
The wave of "model migration" reflects a shift in the pricing logic of the entire AI industry.
The high valuations of OpenAI and Anthropic are based on the assumption of "sustained high revenue growth." However, if more and more companies like Coinbase and Lindy turn to cheaper alternatives, this assumption becomes untenable.
Reports indicate that a price war has already begun between OpenAI and Anthropic. In the newly released GPT-5.6 series, the Terra model is half the price of GPT-5.5, while Luna is positioned as the lowest-priced option.
For users, this is a good thing. The more intense the competition, the lower the prices and the more choices available.
When American giants start saving money with Chinese models, it indicates that the competition in AI is no longer just a benchmarking contest in laboratories, but a real cost battle. Being able to do the same thing for less money is the real skill.













