Covenant AI’s TGIF Recap: Templar Pushes Bigger Training, While Grail Breaks the RL Bandwidth Wall


Covenant AI’s latest TGIF session covered major progress across its three core efforts: Templar, Basilica, and Grail. The session’s overall takeaway was that decentralized AI training is moving faster than most people expected, and the team is actively redesigning incentive systems to keep miners innovating.


Templar: Bigger Model Training, But Incentives Hit a Wall

The team confirmed that Templar has successfully trained the largest model ever trained over the internet, marking another milestone for decentralized pre-training.

But the last run also exposed a serious issue. Once token prices and hardware costs (B200 GPUs) were factored in, miners had little room to optimize. Performance wasn’t bad, but it stopped improving compared to earlier smaller runs.

To address this, Covenant is shifting its approach.

They are now exposing more parts of the protocol as competitions, with the long-term goal of steering miners toward breakthroughs like:

  • Neural architecture search
  • New training algorithms
  • Optimizers that improve internet-scale training

They also revealed they are preparing a new run using Heter Loco, a new optimizer designed to combine data-parallel and model-parallel training.

Crusades: Incentives Worked Too Well and Miners “Broke It”

One of the more interesting updates was around Crusades, Covenant’s competition-style incentive system.

They admitted the first version didn’t last long.

Miners “broke it” within a day, forcing the team to launch a fresh iteration immediately. Covenant framed this as proof of their thesis: scarcity breeds innovation, and miners will aggressively exploit any edge available.

Crusades is now live again, with active monitoring through Discord.

Basilica: Sacred Compute and the Push Toward “Agent Infrastructure”

Basilica continues to position itself as Covenant’s compute layer, and the team is clearly leaning into a bigger narrative shift: the idea that the old SaaS model is being disrupted by agents and distributed infrastructure.

The bigger focus is what the team described as their next scaling step, expected next week, aimed at:

  • scaling dynamically with demand
  • pushing miners toward providing cheaper compute
  • keeping the network competitive without overpaying

Their message was basically: the market is changing fast, and Basilica is being designed to match that shift.

Grail: The Missing Piece for Decentralized Post-Training

The most important part of the session was the update on Grail, Covenant’s decentralized reinforcement learning and post-training network.

Covenant said their early work was heavily focused on pre-training, but the landscape has changed. Post-training is now dominating modern model development, and Grail is meant to solve the hard part: decentralizing reinforcement learning.

They framed it as the final piece needed for a full decentralized training pipeline.

Why RL Is Harder Than Pre-Training

A researcher on the call broke it down clearly:

Pre-training is straightforward. You train on a dataset.

Reinforcement learning is more complex because it has two moving parts:

  • Inference workers generating rollouts
  • A trainer updating the model weights

After every update, the trainer has to send the new weights back to the inference workers, and the cycle repeats.

Covenant’s approach is to push inference onto miners, since rollout generation accounts for 80%+ of RL compute cost, while the team keeps running the training loop itself.
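
As a rough mental model, the loop looks something like the sketch below. The class and function names are made up for illustration; this is not Grail’s actual API, just the shape of the cycle: miners generate rollouts, the trainer updates weights, and the new weights must be shipped back to every miner before the next step.

```python
# Toy sketch of the RL cycle described above; MinerWorker/Trainer are stand-ins, not Grail's API.
import random

class MinerWorker:
    def generate_rollouts(self, weights):
        # Stand-in for rollout generation, the expensive part pushed onto miners.
        return [random.random() for _ in range(4)]

class Trainer:
    def update(self, weights, all_rollouts):
        # Stand-in for the centrally run training step.
        reward = sum(sum(r) for r in all_rollouts)
        return [w + 1e-6 * reward for w in weights]

miners = [MinerWorker() for _ in range(3)]
trainer = Trainer()
weights = [0.0] * 8

for step in range(5):
    rollouts = [m.generate_rollouts(weights) for m in miners]  # miners: inference
    weights = trainer.update(weights, rollouts)                # trainer: weight update
    # Shipping `weights` back to every miner here is the sync step that PULSE
    # (discussed later in this recap) is designed to compress.
```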

The Biggest Bottleneck: Weight Sync Latency

They explained the real killer problem in decentralized RL: sending updated model weights across the internet.

In datacenters, weights can be synced in seconds using high-speed interconnects.

Across public internet, it can take minutes.

They referenced a prior case where syncing weights reportedly took 14 minutes, which makes decentralized RL 10–20× slower than centralized training.

That latency destroys throughput.
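
For rough scale (the session did not give exact link speeds, so these figures are illustrative): a 14 GB checkpoint is about 112 Gbit. A 100 Gbit/s-class datacenter interconnect moves that in roughly a second, while a 200 Mbit/s internet link needs around 560 seconds, close to ten minutes, which is the same order of magnitude as the 14-minute figure above.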

PULSE: 100× Less Bandwidth for Weight Synchronization

The major breakthrough discussed was PULSE, Covenant’s new technique for cutting weight sync bandwidth dramatically.

The key insight is that RL weight updates are extremely sparse.

They tested model families including:

  • Gemma
  • Llama
  • Qwen

Instead of comparing model weights before training vs after hours of training, they measured step-by-step updates.

Their conclusion: only about 1% of weights change per step, meaning ~99% sparsity.

And that opens the door to a simple question: why send a full model if almost nothing changes?
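
The measurement itself is conceptually simple. A minimal sketch, assuming PyTorch state dicts and placeholder checkpoint names (not Covenant’s actual code), looks like this:

```python
# Minimal sketch of the per-step sparsity measurement (placeholder checkpoint names).
import torch

def fraction_changed(state_before: dict, state_after: dict) -> float:
    changed, total = 0, 0
    for name, before in state_before.items():
        after = state_after[name]
        changed += (before != after).sum().item()  # exact element-wise comparison
        total += before.numel()
    return changed / total

# Load two consecutive RL checkpoints (step N and step N+1) and compare:
# before = torch.load("checkpoint_step_100.pt")
# after  = torch.load("checkpoint_step_101.pt")
# print(f"{fraction_changed(before, after):.2%} of weights changed this step")
```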

Why Sparsity Happens in RL

They explained the cause as a combination of:

  • Adam optimization
  • BF16 precision
  • very low RL learning rates (around 1e-6)

Updates do happen, but many are too small to survive BF16 rounding, so the stored weight ends up unchanged and the per-step delta is exactly zero. Sparsity becomes a built-in feature of standard RL settings.
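
A quick way to see this effect (a standalone illustration, not from the session): add an update of roughly learning-rate size to a typically-sized BF16 weight and watch it round away.

```python
# BF16 keeps only ~8 bits of mantissa, so a ~1e-6 update to a weight like 0.25
# is far below the representable spacing (about 0.002 near 0.25) and rounds away.
import torch

weight = torch.tensor(0.25, dtype=torch.bfloat16)
update = torch.tensor(1e-6, dtype=torch.float32)   # roughly lr * optimizer step

new_weight = (weight.float() + update).to(torch.bfloat16)
print(weight.item(), new_weight.item(), (weight == new_weight).item())
# Prints 0.25 0.25 True: the stored weight is unchanged, so the per-step delta is zero.
```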

What PULSE Actually Does

Instead of shipping the entire checkpoint, PULSE sends only the patch (a rough sketch of the idea follows the list):

  • identify weight differences at the bit level
  • extract the changed indices and values
  • compress the indices using delta encoding
  • downscale small deltas into smaller integer types
  • compress everything with zstd
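
A minimal sketch of that pipeline is below. It is hypothetical code, not Covenant’s PULSE implementation: it uses fp16 in place of BF16 (NumPy has no native bfloat16), skips the integer-downscaling step, and assumes the `numpy` and `zstandard` packages.

```python
# Hypothetical sparse-patch encoder: changed entries + delta-encoded indices + zstd.
import numpy as np
import zstandard as zstd

def make_patch(before: np.ndarray, after: np.ndarray) -> bytes:
    # 1. Identify changed entries by exact bit-level comparison.
    changed = np.flatnonzero(before.view(np.uint16) != after.view(np.uint16))
    # 2. Extract the new values at those positions.
    values = after[changed]
    # 3. Delta-encode the indices (gaps compress far better than raw positions).
    gaps = np.diff(changed, prepend=0).astype(np.uint32)
    # 4. Compress the whole patch with zstd.
    payload = gaps.tobytes() + values.tobytes()
    return zstd.ZstdCompressor(level=3).compress(payload)

# Example: a 1M-entry fp16 tensor where only ~1% of entries change per step.
rng = np.random.default_rng(0)
before = rng.standard_normal(1_000_000).astype(np.float16)
after = before.copy()
idx = rng.choice(before.size, size=10_000, replace=False)
after[idx] += np.float16(0.01)

patch = make_patch(before, after)
print(f"full tensor: {before.nbytes/1e6:.1f} MB, patch: {len(patch)/1e6:.3f} MB")
```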

The result is an average compression of around 79×, with frequent 100×+ reductions.

Live Deployment Results on Grail

They emphasized this is not theoretical. PULSE is already deployed live on Grail AI, and the results were strong:

  • patch sizes stabilized around 108 MB
  • compared to 14 GB full syncs
  • bandwidth requirements dropped from 20 Gbit/s to 0.2 Gbit/s
  • GPU utilization stayed around 90%
  • every sync is SHA-256 verified
  • fully lossless, no drift, no approximation
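
Taken at face value, those numbers are consistent with each other: 14 GB down to roughly 108 MB is about a 130× reduction in sync size, and 20 Gbit/s down to 0.2 Gbit/s is exactly 100×, in line with the “frequent 100×+ reductions” claim above.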

Their claim is that this turns decentralized RL from a datacenter-only problem into something that can run over normal internet connections.

Lossless vs Lossy: PULSE Differs From Templar’s Approach

They also compared PULSE to approaches used in pre-training, pointing out that some prior optimizers rely on lossy approximations.

PULSE is positioned differently: bit-identical reconstruction every time.

That means inference nodes receive the exact same weights the trainer produced.
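
Mechanically, that kind of guarantee is easy for a receiving node to enforce. A minimal sketch of one way to do it (an assumption about the mechanics, not necessarily how Grail’s check is implemented):

```python
# Hash the raw bytes of the reconstructed weights and compare against the digest
# published by the trainer; any mismatch means the patch was applied incorrectly.
import hashlib
import numpy as np

def weights_digest(tensor: np.ndarray) -> str:
    return hashlib.sha256(np.ascontiguousarray(tensor).tobytes()).hexdigest()

# trainer side: publish weights_digest(updated_weights) alongside the patch
# miner side:   apply the patch, then check weights_digest(reconstructed) matches
```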

What Comes Next: A Model That Generates Optimized GPU Kernels

The Grail team said their next goal is to prove the system can produce a model that matters.

The model they want to post-train next is focused on generating optimized GPU kernels, a valuable capability with real-world demand.

If successful, it would be a strong demonstration that decentralized RL can produce competitive, usable models.

The Bigger Point: Incentives Are Funding Research

Covenant closed the conversation by reinforcing their broader thesis: decentralized networks force teams to innovate because they cannot brute-force problems with unlimited infrastructure.

Their argument is that constraint creates breakthroughs, and systems like PULSE are exactly what that environment produces.

They ended by acknowledging market volatility but emphasized the team is operating lean to stay antifragile.
