Jon Durbin Explains Chutes’ Next Phase as the GPU Market Tightens

Jon Durbin, a core contributor and backend developer at Chutes (SN64), has published one of the company’s most substantive operational updates since launch. 

Chutes: A Glance at Jon’s Thesis on SN64

The piece covers where the platform stands today, the shifts in the GPU market that have changed the playbook, and a new decentralized training method the team has been developing. 

The headline thesis is that Chutes has shifted from optimizing for raw throughput to closing the gap between revenue and miner emissions, and that the work happening on the training side may end up being the more consequential bet.

Where Chutes Stands

Figure: revenue vs. tokens consumed (chart via Jon Durbin)

Chutes proved its scale by serving roughly 160 billion tokens in a single day on permissionless compute, at a time when miner emissions made H200 nodes accessible at around $0.77 per hour.

That market has largely disappeared: Hopper and Blackwell GPUs are now scarce, prices are materially higher, and longer commitment windows are becoming standard. Inventory has been cut by roughly two-thirds since December, yet revenue has held steady and even grown in recent months.

Figure: Chutes’ revenue per LLM token

Token volume is down, but dollar-per-token revenue, which is the metric that actually matters, is trending up.
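
To make that metric concrete, here is a minimal sketch of revenue per million tokens; every figure in it is illustrative and none is taken from the update.

# Illustrative only: hypothetical figures, not Chutes' actual numbers.
def revenue_per_million_tokens(revenue_usd: float, tokens: float) -> float:
    """Dollar revenue earned per one million tokens served."""
    return revenue_usd / (tokens / 1_000_000)

# Peak-throughput day: huge volume at heavily subsidized prices.
peak = revenue_per_million_tokens(revenue_usd=20_000, tokens=160e9)

# Later day: far fewer tokens, but total revenue holds roughly steady.
now = revenue_per_million_tokens(revenue_usd=20_000, tokens=50e9)

print(f"peak: ${peak:.4f}/M tokens, now: ${now:.4f}/M tokens")
# Volume fell by ~3x while revenue held, so revenue per token rose by ~3x.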

The Revenue Levers

The plan for closing the supply-demand gap runs across five fronts:

a. New compute providers being onboarded to expand network capacity.

b. A mining pool feature in development to stabilize collective compute capacity.

c. Price increases to align with prevailing GPU market rates.

d. Model purging to remove the long tail of unprofitable models and consolidate supply around what users actually demand.

e. Private Chutes with hourly rather than per-token pricing, offering a more predictable revenue path for consistent workloads (a rough comparison of the two billing models is sketched below).
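
As a rough illustration of the hourly-versus-per-token point, the sketch below compares what a steady workload yields under each billing model over a month; the rates and volumes are hypothetical and are not Chutes' published pricing.

# Hypothetical rates and volumes -- not Chutes' published pricing.
PER_MILLION_TOKEN_RATE = 0.30   # $ per 1M tokens on the shared, metered tier
HOURLY_RATE = 2.50              # $ per GPU-hour for a private, dedicated chute

def per_token_revenue(tokens_per_day: float, days: int = 30) -> float:
    """Metered revenue: rises and falls with whatever traffic shows up."""
    return tokens_per_day / 1e6 * PER_MILLION_TOKEN_RATE * days

def hourly_revenue(gpus: int, days: int = 30) -> float:
    """Reserved revenue: fixed for the month regardless of traffic."""
    return gpus * 24 * days * HOURLY_RATE

print(f"per-token, busy month:  ${per_token_revenue(200e6):,.0f}")
print(f"per-token, quiet month: ${per_token_revenue(40e6):,.0f}")
print(f"hourly, any month:      ${hourly_revenue(1):,.0f}")
# The hourly line is flat month to month, which is the predictability argument.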

Parallax

Jon has been personally developing Parallax, a new decentralized training method built specifically for Mixture-of-Experts models. 

Figure: how a Mixture-of-Experts architecture operates (source: Servify Sphere Solutions)
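
For readers new to the architecture, here is a minimal, generic sketch of the top-k routing at the core of a Mixture-of-Experts layer; it is not Parallax code, and every name and number in it is illustrative.

import numpy as np

# Generic MoE routing sketch -- illustrative, not Parallax.
# A small gating network scores each token against every expert;
# only the top-k experts actually run for that token.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

hidden = rng.standard_normal((4, d_model))              # 4 tokens
gate_w = rng.standard_normal((d_model, n_experts))      # gating weights
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

logits = hidden @ gate_w                                # (tokens, experts)
chosen = np.argsort(logits, axis=-1)[:, -top_k:]        # expert ids per token
weights = np.exp(logits - logits.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)

out = np.zeros_like(hidden)
for t in range(hidden.shape[0]):
    for e in chosen[t]:                                 # only k experts fire per token
        out[t] += weights[t, e] * (hidden[t] @ experts[e])

print(f"each token activated {top_k} of {n_experts} experts")

Because only a few experts run per token, compute cost scales with the active experts rather than the full parameter count, which is what makes distributing experts across separate machines attractive.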

The intent is not for Chutes to pivot into training, but to produce efficient models that reduce global compute pressure rather than add to it. Early experimental results are notable:

a. A 20 billion parameter run over the public internet, using a single GPU per composer, showed a roughly 1.5% performance gap against traditional end-to-end training, a gap expected to close with additional training steps.

b. The algorithm cuts per-token FLOPs per island by 3.4x or more while maintaining over 99% forward accuracy.

c. An extreme mode allows routed experts to run on commodity GPUs or even consumer hardware in exchange for moderate bandwidth.

The end goal is matching the quality of frontier MoE models like GLM-5.1 or Kimi-K2.6 on a single H200. A whitepaper and preliminary results are expected to follow in the coming weeks.

Inference and the TEE (Trusted Execution Environment) Migration

In parallel, Chutes is pushing on inference performance along three lines: secure prefix cache optimizations, a nearly complete research collaboration with Professor Juncheng Yang at Harvard on cache hit rates and routing methodologies, and continuous updates to the SGLang and vLLM engines.
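
As a rough sketch of what a prefix cache buys (a generic illustration, not the actual Chutes, vLLM, or SGLang implementation): requests that share a prompt prefix can reuse previously computed KV blocks, and the hit rate is simply the share of blocks that get reused.

# Generic prefix-cache sketch -- illustrative, not Chutes/vLLM/SGLang code.
# Prompts are split into fixed-size token blocks; a block is a hit if an earlier
# request already computed KV state for the identical prefix up to that block.
BLOCK = 4  # tokens per cache block (real engines use larger blocks)

cache: set[tuple[str, ...]] = set()
hits = misses = 0

def serve(prompt_tokens: list[str]) -> None:
    global hits, misses
    for end in range(BLOCK, len(prompt_tokens) + 1, BLOCK):
        prefix = tuple(prompt_tokens[:end])      # key = the entire prefix so far
        if prefix in cache:
            hits += 1                            # KV for this block is reused
        else:
            misses += 1                          # KV must be computed, then cached
            cache.add(prefix)

system = "you are a helpful assistant . answer briefly .".split()
serve(system + "what is bittensor ?".split())
serve(system + "what is a subnet ?".split())     # shared system prompt -> cache hits

print(f"prefix cache hit rate: {hits / (hits + misses):.0%}")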

Figure: vLLM and SGLang architecture comparison (source: DEV Community)

Underneath the optimization work, the team is accelerating the migration to a fully TEE-only infrastructure stack. 

Figure: what a TEE is (source: IQ.Wiki)

Some short-term instability is expected as TEE-incompatible models drop off the platform, but the privacy, security, and validation benefits justify the trade-off.
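
To give a feel for why a TEE-only stack helps with validation (a toy sketch with made-up values, not Chutes' actual attestation flow): a TEE can report a cryptographic measurement of the code it loaded, and the platform only schedules work onto nodes whose measurement it recognizes.

import hashlib

# Toy attestation check -- illustrative only, not Chutes' real TEE flow.
APPROVED_MEASUREMENTS = {
    hashlib.sha256(b"chute-image-v42 + approved-config").hexdigest(),
}

def node_is_trusted(reported_measurement: str) -> bool:
    """Accept a node only if its attested code measurement is on the allow-list."""
    return reported_measurement in APPROVED_MEASUREMENTS

honest = hashlib.sha256(b"chute-image-v42 + approved-config").hexdigest()
tampered = hashlib.sha256(b"chute-image-v42 + injected-backdoor").hexdigest()

print(node_is_trusted(honest))     # True  -> workload gets scheduled
print(node_is_trusted(tampered))   # False -> node is dropped from the pool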

Conclusion

The update is one of the cleaner operational reads on what running a serious Bittensor subnet looks like in a tightening compute market. Chutes has already proven it can deliver scale, and the work now is converting that into a sustainable business that funds the longer-term research, particularly Parallax, which may end up mattering more than the inference platform itself.
