Jon Durbin Explains Chutes’ Next Phase as the GPU Market Tightens

Jon Durbin, a core contributor and backend developer at Chutes (SN64), has published one of the company’s most substantive operational updates since launch. 

Chutes: A Glance at Jon’s Thesis on SN64

The piece covers where the platform stands today, the shifts in the GPU market that have changed the playbook, and a new decentralized training method the team has been developing. 

The headline thesis is that Chutes has shifted from optimizing for raw throughput to closing the gap between revenue and miner emissions, and that the work happening on the training side may end up being the more consequential bet.

Where Chutes Stands

Figure: revenue vs. tokens consumed (chart via Jon Durbin)

Chutes proved its scale by serving roughly 160 billion tokens in a single day on permissionless compute, at a time when miner emissions made H200 nodes accessible at around $0.77 per hour.

That market has largely disappeared: Hopper and Blackwell GPUs are now scarce, prices are materially higher, and longer commitment windows are becoming standard. Inventory has been cut by roughly two-thirds since December, yet revenue has held steady and even grown in recent months.

Figure: Chutes’ revenue per LLM token

Token volume is down, but dollar-per-token revenue, which is the metric that actually matters, is trending up.
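
To make that metric concrete, here is a minimal sketch of revenue per million tokens; every figure in it is illustrative and none is taken from the update.

# Illustrative only: hypothetical figures, not Chutes' actual numbers.
def revenue_per_million_tokens(revenue_usd: float, tokens: float) -> float:
    """Dollar revenue earned per one million tokens served."""
    return revenue_usd / (tokens / 1_000_000)

# Peak-throughput day: huge volume at heavily subsidized prices.
peak = revenue_per_million_tokens(revenue_usd=20_000, tokens=160e9)

# Later day: far fewer tokens, but total revenue holds roughly steady.
now = revenue_per_million_tokens(revenue_usd=20_000, tokens=50e9)

print(f"peak: ${peak:.4f}/M tokens, now: ${now:.4f}/M tokens")
# Volume fell by ~3x while revenue held, so revenue per token rose by ~3x.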

The Revenue Levers

The plan for closing the supply-demand gap runs across five fronts:

a. New compute providers being onboarded to expand network capacity.

b. A mining pool feature in development to stabilize collective compute capacity.

c. Price increases to align with prevailing GPU market rates.

d. Model purging to remove the long tail of unprofitable models and consolidate supply around what users actually demand.

e. Private Chutes with hourly rather than per-token pricing, offering a more predictable revenue path for consistent workloads (a rough comparison of the two billing models is sketched below).
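
As a rough illustration of the hourly-versus-per-token point, the sketch below compares what a steady workload yields under each billing model over a month; the rates and volumes are hypothetical and are not Chutes' published pricing.

# Hypothetical rates and volumes -- not Chutes' published pricing.
PER_MILLION_TOKEN_RATE = 0.30   # $ per 1M tokens on the shared, metered tier
HOURLY_RATE = 2.50              # $ per GPU-hour for a private, dedicated chute

def per_token_revenue(tokens_per_day: float, days: int = 30) -> float:
    """Metered revenue: rises and falls with whatever traffic shows up."""
    return tokens_per_day / 1e6 * PER_MILLION_TOKEN_RATE * days

def hourly_revenue(gpus: int, days: int = 30) -> float:
    """Reserved revenue: fixed for the month regardless of traffic."""
    return gpus * 24 * days * HOURLY_RATE

print(f"per-token, busy month:  ${per_token_revenue(200e6):,.0f}")
print(f"per-token, quiet month: ${per_token_revenue(40e6):,.0f}")
print(f"hourly, any month:      ${hourly_revenue(1):,.0f}")
# The hourly line is flat month to month, which is the predictability argument.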

Parallax

Jon has been personally developing Parallax, a new decentralized training method built specifically for Mixture-of-Experts models. 

Figure: how a Mixture-of-Experts architecture operates (source: Servify Sphere Solutions)
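
For readers new to the architecture, here is a minimal, generic sketch of the top-k routing at the core of a Mixture-of-Experts layer; it is not Parallax code, and every name and number in it is illustrative.

import numpy as np

# Generic MoE routing sketch -- illustrative, not Parallax.
# A small gating network scores each token against every expert;
# only the top-k experts actually run for that token.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

hidden = rng.standard_normal((4, d_model))              # 4 tokens
gate_w = rng.standard_normal((d_model, n_experts))      # gating weights
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

logits = hidden @ gate_w                                # (tokens, experts)
chosen = np.argsort(logits, axis=-1)[:, -top_k:]        # expert ids per token
weights = np.exp(logits - logits.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)

out = np.zeros_like(hidden)
for t in range(hidden.shape[0]):
    for e in chosen[t]:                                 # only k experts fire per token
        out[t] += weights[t, e] * (hidden[t] @ experts[e])

print(f"each token activated {top_k} of {n_experts} experts")

Because only a few experts run per token, compute cost scales with the active experts rather than the full parameter count, which is what makes distributing experts across separate machines attractive.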

The intent is not for Chutes to pivot into training, but to produce efficient models that reduce global compute pressure rather than add to it. Early experimental results are notable:

a. A 20 billion parameter run over the public internet, using a single GPU per composer, showed a roughly 1.5% performance gap against traditional end-to-end training, a gap expected to close with additional training steps.

b. The algorithm cuts per-token FLOPs per island by 3.4x or more while maintaining over 99% forward accuracy.

c. An extreme mode allows routed experts to run on commodity GPUs or even consumer hardware in exchange for moderate bandwidth.

The end goal is matching the quality of frontier MoE models like GLM-5.1 or Kimi-K2.6 on a single H200. A whitepaper and preliminary results are expected to follow in the coming weeks.

Inference and the TEE (Trusted Execution Environment) Migration

In parallel, Chutes is pushing on inference performance along three lines: secure prefix cache optimizations, a nearly complete research collaboration with Professor Juncheng Yang at Harvard on cache hit rates and routing methodologies, and continuous updates to the SGLang and vLLM engines.
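
As a rough sketch of what a prefix cache buys (a generic illustration, not the actual Chutes, vLLM, or SGLang implementation): requests that share a prompt prefix can reuse previously computed KV blocks, and the hit rate is simply the share of blocks that get reused.

# Generic prefix-cache sketch -- illustrative, not Chutes/vLLM/SGLang code.
# Prompts are split into fixed-size token blocks; a block is a hit if an earlier
# request already computed KV state for the identical prefix up to that block.
BLOCK = 4  # tokens per cache block (real engines use larger blocks)

cache: set[tuple[str, ...]] = set()
hits = misses = 0

def serve(prompt_tokens: list[str]) -> None:
    global hits, misses
    for end in range(BLOCK, len(prompt_tokens) + 1, BLOCK):
        prefix = tuple(prompt_tokens[:end])      # key = the entire prefix so far
        if prefix in cache:
            hits += 1                            # KV for this block is reused
        else:
            misses += 1                          # KV must be computed, then cached
            cache.add(prefix)

system = "you are a helpful assistant . answer briefly .".split()
serve(system + "what is bittensor ?".split())
serve(system + "what is a subnet ?".split())     # shared system prompt -> cache hits

print(f"prefix cache hit rate: {hits / (hits + misses):.0%}")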

Figure: vLLM and SGLang architecture comparison (source: DEV Community)

Underneath the optimization work, the team is accelerating the migration to a fully TEE-only infrastructure stack. 

Figure: what a TEE is (source: IQ.Wiki)

Some short-term instability is expected as TEE-incompatible models drop off the platform, but the privacy, security, and validation benefits justify the trade-off.
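
To give a feel for why a TEE-only stack helps with validation (a toy sketch with made-up values, not Chutes' actual attestation flow): a TEE can report a cryptographic measurement of the code it loaded, and the platform only schedules work onto nodes whose measurement it recognizes.

import hashlib

# Toy attestation check -- illustrative only, not Chutes' real TEE flow.
APPROVED_MEASUREMENTS = {
    hashlib.sha256(b"chute-image-v42 + approved-config").hexdigest(),
}

def node_is_trusted(reported_measurement: str) -> bool:
    """Accept a node only if its attested code measurement is on the allow-list."""
    return reported_measurement in APPROVED_MEASUREMENTS

honest = hashlib.sha256(b"chute-image-v42 + approved-config").hexdigest()
tampered = hashlib.sha256(b"chute-image-v42 + injected-backdoor").hexdigest()

print(node_is_trusted(honest))     # True  -> workload gets scheduled
print(node_is_trusted(tampered))   # False -> node is dropped from the pool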

Conclusion

The update is one of the cleaner operational reads on what running a serious Bittensor subnet looks like in a tightening compute market. Chutes has already proven it can deliver scale, and the work now is converting that into a sustainable business that funds the longer-term research, particularly Parallax, which may end up mattering more than the inference platform itself.
