
For years, the narrative around artificial intelligence has been dominated by a simple assumption: only hyperscalers can train large AI models. The logic seemed airtight: massive GPU clusters, proprietary infrastructure, and tens of billions of dollars in compute.
In other words, if you wanted to build frontier AI, you needed to play the hyperscaler game.
But according to Sam Dare, the founder of Covenant AI, that assumption is exactly what decentralized AI is trying to break.
Speaking alongside Jacob ‘Const’ Steeves, a co-founder at Bittensor ($TAO), during Covenant AI’s TGIF Episode 28, Dare introduced Covenant-72B, a new large language model (LLM) trained through decentralized infrastructure on Bittensor ($TAO).
The model was trained permissionlessly across distributed compute on Templar (Bittensor Subnet 3), marking what the team believes is the largest decentralized LLM pre-training run ever attempted.
If their trajectory continues, the implications could extend far beyond a single model.
A Different Vision for AI Training
At the heart of the conversation was a familiar claim from centralized AI labs: that training modern models requires increasingly powerful hardware, like NVIDIA’s latest chips, and tightly controlled data centers.
Dare sees it differently.
Instead of relying on hyperscale infrastructure, he argued, the goal is to connect compute resources across the internet and coordinate them through incentives. In his words, decentralized training is about proving that:
a. AI models can be trained permissionlessly,
b. Global compute can be coordinated through incentive systems, and
c. Innovation can emerge outside centralized infrastructure.
This vision is exactly what the Bittensor ecosystem is designed to enable, and Covenant-72B represents one of the clearest demonstrations yet of that philosophy in practice.
From 1 Billion (1B) to 72 Billion (72B) Parameters
One of the most striking aspects of the project is the speed of its development. According to Dare, the team was able to scale from a 1B parameter model to a 72B parameter model in just nine months.
That progression highlights a core advantage of Bittensor’s ecosystem: developers can leverage decentralized infrastructure to rapidly experiment and scale models.
Const emphasized this point during the discussion. While a 70B model might not rival the latest frontier systems yet, he noted that achieving this scale through decentralized training is historically significant.
Training a 70B model over the open internet, coordinated purely through incentives, had simply never been done before.
The milestone is less about competing with today’s largest labs and more about proving that decentralized training can actually work at a meaningful scale.
Don’t Fight Your Miners: The Early Days of Templar
The story of Covenant-72B begins with Templar (Subnet 3), a Bittensor subnet dedicated to decentralized model training.
Dare recalled the early development phase, when Const built the first version of the system. At the time, the architecture resembled early decentralized training experiments like Exo Sparta. The team quickly realized that training models permissionlessly was far more chaotic than traditional machine learning pipelines.
During this fragile early phase, one challenge emerged immediately: miners could behave unpredictably. Because the network was permissionless, participants were free to experiment, attack, or optimize in unexpected ways.
Rather than trying to control the network, the team eventually adopted a different mindset: the system had to be designed so that incentives naturally guided behavior toward useful training contributions.
Dare summarized the lesson simply: don’t fight your miners.
This philosophy would shape the next phase of development.
Solving the Hard Problems of Decentralized Training
Training AI models across distributed nodes introduces technical challenges that centralized systems rarely face. The team tackled these problems in stages:
1. Permissionless Coordination
The first milestone came with the introduction of Gauntlets, a framework designed to coordinate untrusted miners. Gauntlets allowed the system to:
a. assign tasks across participants
b. verify contributions
c. align incentives toward useful computation
The key objective was eliminating human control loops, allowing the system to function autonomously. Dare explained that true decentralization requires removing subjective decision-making wherever possible.
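The article does not publish Gauntlets’ internals, so the following is only a toy sketch of the assign–verify–reward pattern described above. Every name here (`Miner`, `assign_tasks`, the scoring rule) is an illustrative assumption, not the real framework:

```python
import random
from dataclasses import dataclass

@dataclass
class Miner:
    uid: str
    score: float = 0.0  # running incentive weight

def assign_tasks(miners, seed):
    """Deterministically assign a data-shard id to each miner for this round."""
    rng = random.Random(seed)
    return {m.uid: rng.randrange(10_000) for m in miners}

def verify(submission, expected_shard):
    """Placeholder check: work must come from the shard the miner was assigned."""
    return submission is not None and submission.get("shard") == expected_shard

def score_round(miners, submissions, seed):
    """Reward verified contributions; decay scores of non-contributors."""
    tasks = assign_tasks(miners, seed)
    for m in miners:
        if verify(submissions.get(m.uid), tasks[m.uid]):
            m.score += 1.0   # incentive flows toward useful computation
        else:
            m.score *= 0.5   # automatic decay, no human operator needed
    return tasks
```

The point is the shape, not the details: task assignment, verification, and rewards are all deterministic functions of public state, so no human has to sit in the control loop adjudicating disputes.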
2. Communication Complexity
Even once miners were cooperating, another challenge appeared. Distributed training requires large amounts of communication between nodes, and without optimization, decentralized training could take years.
To address this, the team developed SparseLoco, an algorithm designed to dramatically reduce communication overhead.
SparseLoco enabled the network to:
a. Perform longer local computation steps,
b. Minimize synchronization requirements, and
c. Maintain training performance while reducing communication costs.
According to the team, this algorithm remains state-of-the-art for decentralized training environments.
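The article does not specify SparseLoco’s algorithm, but the pattern it describes, long runs of local computation punctuated by infrequent synchronization, resembles local-SGD-style training. A toy single-machine sketch under that assumption (all numbers illustrative):

```python
def local_steps(w, data, lr, steps):
    """Run several local SGD steps on a 1-D least-squares objective (y ≈ w*x)."""
    for i in range(steps):
        x, y = data[i % len(data)]
        grad = 2 * (w * x - y) * x
        w -= lr * grad
    return w

def train(nodes_data, rounds=50, local=8, lr=0.05):
    """Each round: nodes compute independently, then one cheap sync (an average)."""
    w = 0.0  # shared model parameter, broadcast at the start of each round
    for _ in range(rounds):
        # Communication-free phase: every node takes `local` steps on its own data.
        local_models = [local_steps(w, d, lr, local) for d in nodes_data]
        # A single synchronization per round, not one per gradient step.
        w = sum(local_models) / len(local_models)
    return w
```

With `local=8`, nodes exchange parameters eight times less often than lock-step training would; the real system reportedly also sparsifies what gets sent, which this sketch omits.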
The Covenant-72B Training Run
Using these innovations, the team launched the Covenant-72B training run, which was a 72-billion parameter model trained across decentralized infrastructure.
Key details from the training run include:
a. Training performed across 20 compute nodes,
b. Each node running high-performance GPU clusters, and
c. Approximately 1.1 billion training tokens.
While the token count remains far below frontier training runs, the experiment demonstrated that large-scale decentralized pre-training is feasible. Const framed it in historical terms, noting that centralized AI labs currently train models using tens of billions of tokens, but that if decentralized systems can scale rapidly, the gap could shrink much faster than many expect.
The Next Frontier: Heterogeneous Compute
One of the most interesting discussions during the TGIF session focused on what comes next. Current decentralized training setups still rely on relatively powerful GPUs, which limits how broadly the network can scale.
The team’s next research direction aims to remove that constraint entirely. Their proposed solution is an architecture called HeteroLoco (Heterogeneous SparseLoco).
The idea is that instead of requiring identical GPUs across the network, the system would allow heterogeneous clusters of compute. This means the training network could include:
a. Data center GPUs like B200 or H100,
b. Mid-tier hardware such as A100,
c. Consumer GPUs like RTX 3090 or 4090, and
d. Large clusters of smaller machines.
In this model, compute resources are grouped into “islands of compute”, where each cluster performs local operations before synchronizing with the broader training network.
The result is a training topology that resembles a massive distributed Lego system, where different pieces of hardware contribute to a single model.
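One way to picture the grouping step: partition mixed hardware into islands of roughly equal aggregate throughput, so no island stalls the global synchronization. The device weights below are made-up relative numbers, and the greedy heuristic is an illustration, not HeteroLoco’s actual method:

```python
# Illustrative relative throughput per device class (arbitrary units,
# NOT real benchmark figures).
RELATIVE_THROUGHPUT = {"B200": 20, "H100": 10, "A100": 3, "RTX4090": 1, "RTX3090": 1}

def build_islands(devices, n_islands):
    """Greedy longest-processing-time partition into balanced islands."""
    islands = [{"devices": [], "load": 0} for _ in range(n_islands)]
    # Place the fastest devices first, always into the least-loaded island.
    for dev in sorted(devices, key=lambda d: -RELATIVE_THROUGHPUT[d]):
        target = min(islands, key=lambda isl: isl["load"])
        target["devices"].append(dev)
        target["load"] += RELATIVE_THROUGHPUT[dev]
    return islands
```

Each resulting island would perform its local operations at its own pace before synchronizing with the broader network, matching the Lego-like topology described above.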
Turning the Internet Into a Training Cluster
Const highlighted the broader implications of this approach: if heterogeneous compute becomes viable, decentralized training networks could tap into vast pools of unused hardware around the world.
Instead of relying solely on expensive data center GPUs, the network could incorporate:
a. Idle research cluster GPUs,
b. Underutilized enterprise hardware,
c. Home mining rigs, and
d. Enthusiast gaming PCs.
The team is also exploring ways to pool smaller compute contributions together. For example, individual GPUs could combine into mining pools, clusters could represent single network participants, and rewards could be distributed proportionally across contributors.
In other words, training AI models could eventually resemble cryptocurrency mining.
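The pooling idea maps naturally onto the payout math familiar from mining pools. A minimal sketch, where the function name and figures are illustrative rather than anything from the actual system:

```python
def split_rewards(pool_payout, contributions):
    """Split a pool's payout in proportion to each member's verified work.

    contributions: mapping of contributor id -> verified compute units.
    """
    total = sum(contributions.values())
    if total == 0:
        return {c: 0.0 for c in contributions}  # nothing verified, nothing paid
    return {c: pool_payout * units / total for c, units in contributions.items()}
```

For example, `split_rewards(100.0, {"rig_a": 30, "rig_b": 70})` pays 30 and 70 units respectively, so a gaming PC contributing a sliver of compute still earns its exact share.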
Closing the Compute Gap
Despite the progress, the speakers acknowledged that decentralized training still faces a large gap compared to centralized AI labs.
According to industry estimates, decentralized compute currently represents a fraction of the total training power used by leading AI companies. But Dare sees scarcity as an advantage rather than a limitation.
Instead of competing directly with hyperscalers by buying the same hardware, decentralized systems can innovate through better algorithms and more efficient compute coordination.
As he put it, decentralized AI does not win by copying centralized labs; it wins by doing something entirely different.
The Road Ahead for Templar
Looking forward, the team plans to push the system through several major milestones. Upcoming goals include:
a. Improving training efficiency and hardware utilization,
b. Scaling the number of nodes participating in training,
c. Integrating heterogeneous compute clusters, and
d. Dramatically increasing token counts in future runs.
Eventually, the architecture could support trillion-parameter-scale training across decentralized networks, and if that happens, the implications would be profound.
Instead of a handful of AI labs controlling the future of machine intelligence, anyone with compute could participate in building it.
A Small Step for Decentralized AI
Covenant-72B may not rival today’s largest frontier models yet, but its significance lies elsewhere.
For the first time, a model of this scale has been trained permissionlessly across decentralized infrastructure, and according to Dare and Steeves, this is only the beginning.
If decentralized training continues to evolve at its current pace, the next few years could fundamentally reshape how AI models are built: not in hyperscale data centers, but across a global network of distributed machines, coordinated through incentives and open participation.
If Covenant-72B is any indication, that future might arrive faster than many people expect.
