Covenant AI’s PULSE: Making Decentralized RL for LLMs as Fast as Centralized Training


Covenant AI just released PULSE (Patch Updates via Lossless Sparse Encoding), a technique that slashes bandwidth for weight synchronization in decentralized reinforcement learning (RL) by 100×+, while staying completely lossless (bit-identical reconstruction, SHA-256 verified on every sync).

Media credit: Erfan Miahi

The biggest bottleneck in decentralized RL: training can happen on fast interconnects, but inference nodes are spread globally over ordinary internet connections. Syncing full model weights becomes a bandwidth nightmare. A 7B checkpoint is ~14 GB, and repeatedly shipping that to every node stalls training.

PULSE fixes this by exploiting a powerful (and honestly surprising) property: RL weight updates are extremely sparse, around 99% sparse per step.

What makes this solid is how they measured it: bitwise sparsity step-by-step, not just a comparison of initial vs. final checkpoints. They tested across model families, sizes, and even async delays up to 32 steps, and sparsity consistently stays above 98% under realistic training conditions.
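Measuring sparsity at the bit level is simple to sketch. The snippet below is illustrative, not the paper's code: float16 stands in for BF16 (NumPy has no native bfloat16), and it counts parameters whose stored bits are unchanged between two consecutive checkpoints.

```python
import numpy as np

def bitwise_sparsity(prev: np.ndarray, curr: np.ndarray) -> float:
    """Fraction of parameters whose stored bits are identical between two checkpoints."""
    assert prev.dtype == curr.dtype and prev.shape == curr.shape
    # Compare raw bit patterns, not float values (so NaNs and -0.0 are handled exactly)
    unchanged = np.count_nonzero(prev.view(np.uint16) == curr.view(np.uint16))
    return unchanged / prev.size

# Toy example: a "step" that visibly changes only 1% of the weights
rng = np.random.default_rng(0)
prev = rng.standard_normal(10_000).astype(np.float16)
curr = prev.copy()
curr[:100] = curr[:100] + np.float16(0.5)
print(bitwise_sparsity(prev, curr))  # 0.99
```

Comparing bit patterns rather than float values is what makes the measurement exact: any value-level comparison would need a tolerance, which is precisely what a lossless scheme must avoid.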

The reason lies in how updates land in the weights. Gradients are dense, but after Adam optimization and BF16 precision at RL learning rates around 1–3×10⁻⁶, most per-weight updates are so small they round to zero. So you get sparsity "for free" with standard RL hyperparameters.

Learning rate matters a lot: lower LR → smaller deltas → more values fall below BF16 precision → higher sparsity.
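A quick way to convince yourself: round a weight to BF16, apply an update on the order of the learning rate, and round again. The snippet below implements BF16 round-to-nearest-even via the float32 bit pattern; the weight and LR values are illustrative, not from the paper.

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float to bfloat16 precision (keep top 16 bits of the float32 pattern,
    round-to-nearest-even)."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    rounded = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack('>f', struct.pack('>I', rounded))[0]

w = to_bf16(0.4714)        # a typical weight magnitude, already BF16
print(to_bf16(w + 2e-6) == w)   # True: an LR-sized delta vanishes at BF16 precision
print(to_bf16(w + 0.01) == w)   # False: a larger delta survives
```

At |w| ≈ 0.47, the BF16 spacing is about 2⁻¹⁰ ≈ 0.001, so any delta under half that spacing rounds straight back to the original weight, which is exactly the regime RL learning rates of ~10⁻⁶ put you in.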

From there, they built the full compression pipeline:

  • Bitwise checkpoint diff: extract changed indices + values
  • Delta encoding for indices
  • Type downscaling (uint8/16 for small deltas)
  • zstd compression
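The four stages above can be sketched end to end. This is a simplified illustration, not the actual PULSE code: float16 stands in for BF16 (NumPy lacks native bfloat16), the uint8/16 downscaling of index gaps is omitted, and zlib from the standard library stands in for zstd.

```python
import zlib
import numpy as np

def pulse_encode(old: np.ndarray, new: np.ndarray) -> bytes:
    # 1. Bitwise checkpoint diff: positions whose stored bits actually changed
    changed = np.flatnonzero(old.view(np.uint16) != new.view(np.uint16))
    values = new[changed]
    # 2. Delta-encode the sorted indices so most gaps are small integers
    deltas = np.diff(changed, prepend=0).astype(np.uint32)
    # 3. (Type downscaling of small gaps to uint8/16 omitted for brevity)
    # 4. Entropy-code the payload (zlib stands in for zstd here)
    return zlib.compress(deltas.tobytes() + values.tobytes())

def pulse_decode(old: np.ndarray, blob: bytes) -> np.ndarray:
    payload = zlib.decompress(blob)
    n = len(payload) // 6  # 4 bytes per index delta + 2 bytes per value
    deltas = np.frombuffer(payload[:4 * n], dtype=np.uint32)
    values = np.frombuffer(payload[4 * n:], dtype=np.float16)
    new = old.copy()
    new[np.cumsum(deltas)] = values  # cumulative sum undoes the delta encoding
    return new
```

Because the diff is taken on bit patterns and the changed values are stored verbatim, the reconstruction is bit-identical by construction; the compression ratio then follows directly from the ~99% sparsity.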

Average compression lands at ~79×, often reaching 100×+.

And this isn't theoretical. PULSE is already deployed live on Grail AI, Covenant's decentralized, incentivized RL network.

In real deployment:

  • Patch sizes stabilize around ~108 MB (~130× smaller than 14 GB full syncs)
  • Bandwidth drops from 20 Gbit/s → 0.2 Gbit/s
  • GPU utilization stays around 90%
  • Every sync is SHA-256 verified, meaning zero information loss and no drift
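The integrity check in the last bullet is easy to picture: the subscriber hashes the bytes of the reconstructed checkpoint and compares against the digest published alongside the patch. The function and variable names here are hypothetical, not from the Grail codebase.

```python
import hashlib

def verify_sync(reconstructed_bytes: bytes, expected_digest: str) -> bool:
    """Accept a patch only if the rebuilt checkpoint hashes to the published digest."""
    return hashlib.sha256(reconstructed_bytes).hexdigest() == expected_digest

# Publisher side: hash the post-update checkpoint bytes
checkpoint = b"...serialized weight bytes..."
digest = hashlib.sha256(checkpoint).hexdigest()

# Subscriber side: reject any reconstruction that differs by even one bit
print(verify_sync(checkpoint, digest))         # True
print(verify_sync(checkpoint + b"!", digest))  # False
```

Since the encoding is lossless, a digest mismatch can only mean transport corruption or a desynced base checkpoint, so a failed check signals "re-fetch", not "tolerate drift".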

That bandwidth reduction is the real unlock. It takes what used to require datacenter-grade networking and makes it viable over normal consumer connections. Decentralized RL starts matching centralized performance.


Learn more about the PULSE update:

Paper: https://arxiv.org/pdf/2602.03839

Code: https://github.com/one-covenant/grail
