Every time a new Chinese open-weight model lands, the same argument starts over. One side says western AI labs are bleeding money on every API call - they cite the burn rates, point to Anthropic and OpenAI raising prices, tightening Claude Code limits, switching enterprise plans to usage-based billing. Token prices, in this telling, are deeply subsidized - propped up by VC money that's just buying market share until the bubble pops. The other side says this is mostly fiction. Inference is profitable, training is the cost center, and frontier labs charge what they charge because they can. The "we sell at a loss" story does PR work - it justifies price hikes and frames the labs as gracious benefactors instead of companies running normal high-margin businesses.
Then DeepSeek drops a 671-billion-parameter frontier-tier model and prices it at $3.48 per million output tokens, and the argument starts over. I've watched this fight play out a dozen times by now, and the answer matters for more than settling a comment-thread debate - it shapes how you should think about pricing, model lock-in, and what tomorrow's API costs will look like.
What "subsidized" actually means
Half the disagreement comes from people meaning different things by the word. If "subsidized" means the price you pay per million tokens is below the marginal cost of running a GPU for that inference call - energy plus hardware depreciation - then no, that's almost certainly not happening at the major frontier labs. Independent providers serve open-weight models at lower prices and stay in business. Microsoft sells Claude on Azure at the same price as Anthropic and isn't an Anthropic investor. If raw inference were below cost, none of this would work.
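To pin down what "marginal cost" means here, a back-of-envelope sketch helps. Every number in it is an assumption chosen for illustration - real throughput and hardware economics vary enormously - but the shape of the calculation is the point:

```python
# Back-of-envelope marginal cost per million output tokens.
# Every number below is an illustrative assumption, not a measured figure.

GPU_HOURLY_COST = 2.00     # assumed all-in $/GPU-hour (energy + depreciation)
GPUS_PER_REPLICA = 8       # assumed GPUs needed to serve one model replica
TOKENS_PER_SECOND = 2_500  # assumed aggregate output throughput, batched

tokens_per_hour = TOKENS_PER_SECOND * 3600                   # 9,000,000
replica_cost_per_hour = GPU_HOURLY_COST * GPUS_PER_REPLICA   # $16/hour

cost_per_million = replica_cost_per_hour / (tokens_per_hour / 1_000_000)
print(f"~${cost_per_million:.2f} per million output tokens")  # ~$1.78
```

The exact output doesn't matter; what matters is that plausible inputs land well below frontier list prices, which is consistent with independent hosts surviving on thin margins.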
If "subsidized" means the all-in cost - training, salaries, R&D, the next model being built right now - exceeds what tokens bring in, then sure, obviously. Anthropic and OpenAI are cashflow negative because they're training while they sell, and the next run is bigger and more expensive than the last. Investors are funding that growth on the promise of future returns. These are not the same claim, but people keep collapsing them into one sentence and ending up talking past each other.
There's a useful framing I keep coming back to. Treat each model as its own company - "Claude 3 Inc" gets charged its pro-rata training cost, its inference costs, and the salaries that went into building it, and it collects the revenue from every API call ever made against it. Add it all up and ask whether that company is profitable. The answer, judging by what Dario and Sam have both said in slightly different words, is yes: each model, fully loaded, has paid for itself. The parent companies are cashflow negative because the next model is being trained while the current one is still selling, and each run is some multiple bigger than the last. That's a different story than "we lose money on every prompt," and it's a story that gets harder to tell when an open-weight competitor shows you what the floor actually looks like.
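Here's that framing as a minimal sketch. The numbers are hypothetical, picked only to show how the ledger works - "Claude 3 Inc" is the post's thought experiment, not real accounting:

```python
from dataclasses import dataclass

@dataclass
class ModelAsCompany:
    """Treat one model generation as a standalone business ('Claude 3 Inc')."""
    training_cost: float   # pro-rata cost of this model's training run, $
    staff_cost: float      # salaries/R&D attributed to building it, $
    inference_cost: float  # lifetime cost of serving it, $
    api_revenue: float     # lifetime revenue from calls against it, $

    def lifetime_profit(self) -> float:
        return self.api_revenue - (
            self.training_cost + self.staff_cost + self.inference_cost
        )

# Hypothetical numbers purely for illustration.
claude3_inc = ModelAsCompany(
    training_cost=150e6, staff_cost=200e6,
    inference_cost=400e6, api_revenue=1_000e6,
)
print(claude3_inc.lifetime_profit())  # 250e6: the model "paid for itself"
# The parent can still be cashflow negative if the *next* training run
# costs a multiple of this while this model is still selling.
```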
A serious lab just dropped a frontier-tier model and charged $3.48 per million output tokens. Opus 4.7 sits closer to $75 per million output - roughly 21x the price for output that's somewhere between modestly better and noticeably worse depending on the task. DeepSeek V4-Pro outpaces Opus on competitive coding benchmarks and trails it on parts of software engineering and pure reasoning. You can't explain a 21x gap with "DeepSeek is light years more efficient." There are real efficiency wins there - mixture-of-experts architecture, their own kernel work, an inference team that's been cooking on this for a while - but 21x isn't architecture, it's pricing power.
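To make the gap concrete, here's what it does to a hypothetical monthly bill. The token prices are the ones above; the 500M-token volume is an assumed example workload:

```python
# Token prices from the post; the monthly volume is an assumed example.
DEEPSEEK_PRICE = 3.48  # $/M output tokens
OPUS_PRICE = 75.00     # $/M output tokens

monthly_output_m = 500  # assumed: 500M output tokens per month

print(f"DeepSeek: ${monthly_output_m * DEEPSEEK_PRICE:,.0f}/mo")  # $1,740/mo
print(f"Opus:     ${monthly_output_m * OPUS_PRICE:,.0f}/mo")      # $37,500/mo
print(f"ratio:    {OPUS_PRICE / DEEPSEEK_PRICE:.1f}x")            # 21.6x
```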
The cleanest way to see it is to look at the independent providers serving open-weight models. They host Kimi K2.6, GLM 5.1, and DeepSeek V3 variants at prices that let them stay in business. Margins aren't huge, but they exist - and once you have a price floor set by companies whose entire business is serving someone else's weights, you can reason about what the labs above them are doing. The frontier labs charge a lot more than that floor, and the gap is the markup.
The strongest pushback against this take is fair, and worth quoting in full:
Yes, tokens for random no-name firms serving Kimi K2 probably do make money, although even there it's unclear because so many datacenters and GPU purchases have been made on credit etc. And if we assume that's sustainable forever then you can assume training/staffing costs should be subsidized to zero and say sure, token serving is profitable in that situation. But we were discussing the top labs.
This is right as far as it goes. The top labs are training while they sell, and the cost of inference cards, the loans against future revenue, the bet that something resembling AGI sits at the end of all this - those are real things that aren't on the per-token line item. But the original claim wasn't about company-wide profitability. It was that the prices we pay are below cost. And every time a Chinese lab ships a near-frontier model at a tenth of the western price, that specific claim gets harder to defend.
What this means for developers
If you're shipping products on top of these APIs, the answer to this question changes how you architect. Assume the subsidy story is true and you'll over-provision - lock in long-term contracts, hedge against a future where everything costs 5x more, design around expected scarcity, treat today's prices as a temporary gift. Assume the subsidy story is mostly false and you'll architect around competition instead: build for portability, expect the gap between the top tier and the open-weight tier to keep closing, treat lock-in as the real risk rather than future price hikes. The second view ages better, and the DeepSeek release is another reason why.
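What "build for portability" looks like in practice is keeping vendor SDKs behind the narrowest interface your product actually needs. A minimal sketch - the names here are placeholders for illustration, not any real library's API:

```python
from typing import Protocol

class ChatModel(Protocol):
    """The narrowest interface the product actually needs from any provider."""
    def complete(self, system: str, user: str, max_tokens: int) -> str: ...

def draft_release_notes(model: ChatModel, diff: str) -> str:
    # Business logic depends only on the protocol, never on a vendor SDK,
    # so swapping Opus for an open-weight host is a config change, not a rewrite.
    return model.complete(
        system="You write terse release notes.",
        user=diff,
        max_tokens=400,
    )

class AnthropicAdapter:
    """Thin wrapper around the vendor SDK (body omitted - sketch only)."""
    def complete(self, system: str, user: str, max_tokens: int) -> str:
        raise NotImplementedError  # wrap the Anthropic SDK call here

class OpenWeightAdapter:
    """Same interface over any OpenAI-compatible open-weight host."""
    def complete(self, system: str, user: str, max_tokens: int) -> str:
        raise NotImplementedError  # wrap the open-weight provider's API here
```

The adapter layer costs a few dozen lines per provider and turns "which model is cheapest this quarter" into a routing decision instead of a migration project.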
The release isn't important because it's the smartest model available. The benchmarks are mixed, Opus 4.7 still beats it on most software engineering tasks, and Kimi K2.6 might be a more interesting open-weight pick depending on the workload. The release is important because of what its price tag implies - another data point in a steady, accumulating case that the floor for serving a frontier-class model is much lower than the prices the top labs are charging.
This doesn't mean the top labs are scamming you. They're charging what the market will bear, the same way every premium product does. Opus is genuinely better at certain tasks and some workloads are worth the markup; plenty aren't, and the test for which is which gets easier as the open-weight tier keeps narrowing the quality gap from below. What it does mean is that the "we'd love to charge less but we'd lose money" framing should stop being treated as gospel. It's marketing - partially true at the company level since training the next model isn't free, almost certainly not true at the per-token level. The argument will start again with the next release, but the trajectory has been pointing one direction for a while, and pretending otherwise gets harder every time.