Covenant AI: The Holy Trinity of Decentralized AI Training


Contributor: School of Crypto

It wasn’t supposed to be possible to pre-train a foundational model over the open internet alone. The consensus in the AI industry was clear: you needed massive datacenters with expensive cooling systems, enterprise-grade machines, and thousands of GPUs clustered together to create a large language model (LLM) with any real capability.

The technical challenges seemed insurmountable. Bandwidth limitations, synchronization issues across distributed nodes, and vast computational overhead made decentralized training appear economically infeasible.

Sam Dare, the CEO of Templar, thought otherwise. He believed that by coordinating enough compute through miners on Bittensor’s decentralized network, it would be possible to train a foundational model competitive with centralized alternatives.

While it had never been done before, Sam and the Templar team decided to attempt the impossible. Over the course of the following year, they would achieve something that even surprised Sam Dare himself.

The Breakthrough: Covenant72B and Templar-1B

In May 2025, Templar announced a watershed moment in decentralized AI: they had successfully pre-trained foundational models in a fully decentralized manner, starting with Templar-1B, a 1.2B parameter model trained over 20,000 training cycles. This proof of concept paved the way for their flagship achievement.

Then came Covenant72B, a 72-billion-parameter model trained on over 1.2 trillion tokens, the largest model ever trained using distributed internet compute. Templar is the pre-training engine, but it is part of a larger ecosystem called Covenant AI, a complete decentralized AI stack that we’ll explore throughout this article.

The results were remarkable. According to benchmark comparisons, Covenant72B outperformed centralized checkpoints on certain tasks, such as the ARC-C and ARC-E reasoning benchmarks, while tracking slightly behind on others, like HellaSwag and MMLU.

The model proved competitive with Meta’s Llama family of models, which was an extraordinary achievement for a decentralized approach.

Const, the founder of Bittensor, publicly acknowledged the accomplishment.

While Covenant72B’s 72 billion parameters make it comparable to larger Llama variants, the specific comparison to Llama 2 7B highlights the token efficiency achieved: Templar successfully trained a competitive model with a fraction of the tokens typically required.

According to Dare, Covenant72B represented proof that the assumed limitations of decentralized training were surmountable. The question was no longer whether it could be done, but how far this approach could scale.

The Economics: Why Decentralization Matters

During an appearance on the Hashrate podcast with Mark Jeffrey, Sam Dare broke down the substantial costs associated with pre-training large language models in traditional settings. Industry estimates suggest thorough pre-training of a production-ready model typically requires $500,000 to $5 million in compute costs alone, and that’s before factoring in infrastructure.

Beyond the direct training expenses, there are significant barriers to entry. Massive datacenters with high-powered machines, specialized GPUs, and industrial cooling systems are required to generate sufficient compute. Access to these resources is largely limited to well-funded entities like Meta, OpenAI, and Anthropic.

Traditional AI companies face additional constraints. They’re dependent on local power grids and the physical limitations of their datacenter infrastructure. Sam mentioned that even major players like Microsoft struggle with the operational costs and logistical challenges of maintaining high-performance datacenters, including difficulties with machine colocation that lead to thermal management and efficiency issues.

Templar’s approach offers a fundamentally different economic model. By tapping into globally distributed compute resources through Bittensor’s incentivized network, they’ve demonstrated the ability to pre-train competitive models at dramatically reduced costs, potentially 10-20x cheaper than traditional approaches.

Rather than requiring billions in datacenter infrastructure investment, Templar coordinates existing compute worldwide, effectively turning the internet itself into a distributed supercomputer.

Technical Deep Dive: How Covenant AI Makes Decentralized Training Work

The technical innovation behind Covenant AI is sophisticated, overcoming challenges that made many researchers skeptical of decentralized training’s viability.

The holy trinity of the AI model development stack consists of Templar for pre-training, Grail for post-training, and Basilica for GPU inference.

Here is the technical breakdown of each component:

Templar: Distributed Pre-Training Architecture

Templar operates on Bittensor’s subnet network, where miners and validators collaborate in a carefully orchestrated process:

Miners sync with the latest model checkpoint, receive consistent data batches, and perform local forward and backward passes to compute gradients, the mathematical updates that improve model performance. 

To address bandwidth constraints, these gradients are compressed using techniques like Discrete Cosine Transform (DCT), reducing data transmission requirements by orders of magnitude while preserving the essential information needed for model updates.
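
To make the bandwidth savings concrete, here is a minimal sketch of DCT-based gradient compression in Python, using NumPy and SciPy. The keep ratio and function names are illustrative assumptions, not Templar’s actual code:

```python
import numpy as np
from scipy.fft import dct, idct

def compress_gradient(grad: np.ndarray, keep_ratio: float = 0.01):
    """Transform the gradient with a DCT and keep only the largest coefficients."""
    coeffs = dct(grad.ravel(), norm="ortho")        # frequency-domain view of the gradient
    k = max(1, int(keep_ratio * coeffs.size))       # e.g. transmit just 1% of the values
    idx = np.argpartition(np.abs(coeffs), -k)[-k:]  # indices of the largest magnitudes
    return idx, coeffs[idx], grad.shape

def decompress_gradient(idx, values, shape):
    """Rebuild an approximate gradient from the transmitted coefficients."""
    coeffs = np.zeros(int(np.prod(shape)))
    coeffs[idx] = values
    return idct(coeffs, norm="ortho").reshape(shape)

grad = np.random.randn(1024, 1024)
idx, vals, shape = compress_gradient(grad)
approx = decompress_gradient(idx, vals, shape)
print(f"transmitted {idx.size:,} of {grad.size:,} values ({idx.size / grad.size:.1%})")
```

To the extent that the gradient’s energy concentrates in a small number of transform coefficients, the reconstruction preserves the update’s direction while the transmitted payload shrinks dramatically.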

Validators assess the quality of miner contributions by testing whether the submitted gradients actually improve model performance. They score submissions based on error reduction metrics, then allocate rewards through blockchain-recorded weights and TAO token emissions. 
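
The scoring step can be pictured with a hedged sketch like the one below; `model`, `loss_fn`, and the learning rate are placeholders rather than the subnet’s real validator API:

```python
import copy
import torch

def score_gradient(model, loss_fn, batch, miner_gradient, lr=1e-4):
    """Score a miner's gradient by how much it reduces loss on a held-out batch."""
    inputs, targets = batch
    with torch.no_grad():
        loss_before = loss_fn(model(inputs), targets).item()

        # Apply the update to a copy so a bad gradient can't corrupt the checkpoint.
        candidate = copy.deepcopy(model)
        for param, grad in zip(candidate.parameters(), miner_gradient):
            param -= lr * grad

        loss_after = loss_fn(candidate(inputs), targets).item()

    # A positive score means the gradient helped; harmful gradients earn nothing.
    return max(0.0, loss_before - loss_after)
```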

The elegance of this system lies in its ability to aggregate learning signals from geographically distributed hardware without requiring the low-latency, high-bandwidth interconnects that traditional datacenter training demands. 

Grail: Specialized Post-Training

Grail handles the post-training phase, which Sam Dare analogizes as “grad school” compared to pre-training’s “university education.” This stage refines base models to create stronger connections across knowledge domains and adapts them for specialized applications.

The process works by taking Templar’s pre-trained foundation and fine-tuning it on domain-specific datasets. For instance, a pharmaceutical company could input proprietary drug research data, and Grail would adapt the model to excel at chemistry and pharmacology tasks while maintaining general reasoning capabilities. This skill transfer happens through continued training on curated datasets.
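
In outline, post-training is simply continued training at a gentler learning rate. The sketch below assumes a Hugging-Face-style model interface and an already-tokenized dataset; both are illustrative assumptions, not Grail’s actual pipeline:

```python
import torch
from torch.utils.data import DataLoader

def post_train(base_model, domain_dataset, epochs=1, lr=5e-6):
    """Fine-tune a pre-trained base model on curated, domain-specific data."""
    loader = DataLoader(domain_dataset, batch_size=8, shuffle=True)
    # A small learning rate strengthens the target domain without
    # erasing the general capabilities learned during pre-training.
    optimizer = torch.optim.AdamW(base_model.parameters(), lr=lr)

    base_model.train()
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = base_model(batch["input_ids"], labels=batch["labels"]).loss
            loss.backward()
            optimizer.step()
    return base_model
```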

Miners providing compute for post-training are rewarded based on the quality and efficiency of their contributions, maintaining the incentive structure that makes the entire ecosystem function.

The Grail validators ensure fairness and security. They create random puzzles to challenge miners, check the miners’ responses for accuracy and originality, score their performance over short periods, and update rewards accordingly. 

They also handle data storage, share results, and connect with the blockchain to keep the system transparent and balanced.
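
That challenge-and-verify loop might look roughly like the sketch below, where the puzzle itself is abstracted away as `solve(seed)`; nothing here is Grail’s actual protocol:

```python
import secrets

def issue_challenge() -> str:
    """Generate an unpredictable seed so miners can't cache or copy answers."""
    return secrets.token_hex(16)

def score_miner(solve, reference_solve, rounds: int = 10) -> float:
    """Challenge a miner repeatedly; score the fraction answered correctly."""
    correct = 0
    for _ in range(rounds):
        seed = issue_challenge()
        answer = solve(seed)                 # the miner's response
        if answer == reference_solve(seed):  # validator re-derives the expected result
            correct += 1
    return correct / rounds

# Toy deterministic "puzzle": an honest miner matches the reference every time.
toy = lambda seed: seed[::-1]
print(score_miner(solve=toy, reference_solve=toy))  # 1.0
```

Because each seed is fresh and unpredictable, miners cannot replay cached answers, which is what keeps scoring honest over short evaluation windows.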

Basilica: GPU Compute Infrastructure

Basilica completes the stack as the GPU compute layer, providing the hardware environment where models are tested, trained, and deployed. It functions as a marketplace for verified GPU compute, where AI engineers can deploy artifacts and access distributed hardware resources.

The subnet implements quality controls: idle GPUs or hardware that doesn’t meet performance thresholds receive minimal or no emissions. This ensures that only active, capable hardware contributes to the network, maintaining the reliability required for production workloads.
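
As a rough illustration of that rule, emissions can be zeroed out below a quality threshold and split pro rata above it; the scores and threshold here are invented for illustration:

```python
def allocate_emissions(scores, total_emission, threshold=0.2):
    """Zero out sub-threshold hardware, then split emissions in proportion to score."""
    eligible = {m: s for m, s in scores.items() if s >= threshold}
    total_score = sum(eligible.values())
    if total_score == 0:
        return {m: 0.0 for m in scores}  # nobody qualifies, nothing is paid out
    return {m: total_emission * eligible.get(m, 0.0) / total_score for m in scores}

# An idle GPU (score 0.05) falls below the threshold and earns nothing:
print(allocate_emissions({"gpu_a": 0.9, "gpu_b": 0.6, "gpu_c": 0.05}, 100.0))
```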

By integrating seamlessly with Templar and Grail, Basilica transforms globally distributed GPUs into on-demand infrastructure accessible at a fraction of traditional cloud compute costs.

The Basilica validators verify each miner’s hardware capabilities while maintaining GPU profiles and scoring the miner’s overall performance.

Training as a Service (TaaS): The Business Model

The combination of Templar’s pre-training, Grail’s post-training, and Basilica’s compute infrastructure positions Covenant AI as an end-to-end AI development platform, effectively offering “Training as a Service.”

Sam Dare is currently engaging with potential enterprise clients, including institutions, universities, and government agencies. The value proposition is compelling for specific use cases:

Consider a pharmaceutical company wanting to train a model on proprietary drug discovery data. Sending this information to OpenAI’s or Anthropic’s servers creates IP and confidentiality risks. 

Or imagine a government intelligence agency needing a specialized model trained on classified information. Traditional AI providers can’t accommodate these requirements.

This raises an important question: why pre-train a model from scratch rather than fine-tune an existing model like Llama or GPT? After all, the costs of pre-training far exceed those of post-training alone.

According to Sam Dare, the answer lies in transparency, control, and optimization. With off-the-shelf models, “you don’t know how the sausage was made,” as he puts it. Pre-training from scratch gives an organization full visibility into the training data and full control over the resulting model.

The challenge lies in pricing this emerging market. As a novel service category, Covenant AI is still calibrating how to value comprehensive model development, from initial pre-training through post-training to deployment infrastructure. Early enterprise partnerships will likely establish pricing benchmarks for these new industry segments.

Current Limitations and Challenges

The current Bittensor network supporting Covenant AI operates with a few hundred miners. While this is sufficient for proof-of-concept, scaling to thousands or tens of thousands of participants will be necessary to match the computational capacity of major AI labs.

Sam Dare candidly admits they’re “60% of the way there” in competing with frontier models like Claude, GPT-4, Grok, and Gemini. Bridging that final gap, what he describes as going from “90 to 100,” requires substantial additional development. The difference between a competent model and a truly exceptional one involves exponentially more resources and refinement.

Furthermore, the Training as a Service model remains largely theoretical until Covenant AI secures and publicizes major enterprise clients. The true test will be whether organizations are willing to trust decentralized training for production-critical applications.

Long-Term Vision and Timeline

Looking ahead, Sam Dare and the Templar team have an ambitious roadmap: develop a state-of-the-art model that competes directly with Claude, ChatGPT, Grok, and Gemini on all benchmarks.

The path forward depends heavily on network growth. As more miners join Bittensor and computational capacity scales, the probability of Covenant AI matching or exceeding centralized competitors increases correspondingly. The economics favor this trajectory: distributed compute becomes more cost-effective at scale, while traditional datacenters face physical and economic constraints.

This creates a compelling David vs. Goliath narrative. Google, xAI, Meta, OpenAI, and Anthropic have collectively raised tens of billions of dollars to power their respective AI initiatives. They operate some of the world’s largest datacenters and employ thousands of researchers.

Covenant AI, by contrast, operates through a few hundred independent miners coordinated by elegant incentive mechanisms. Yet in just over a year, they’ve progressed from theoretical possibility to functional 72B parameter models.

The question isn’t whether decentralized training works; Covenant72B and Templar-1B prove it does. The question is how far this approach can scale, and whether the economic and architectural advantages of decentralization can ultimately overcome the head start and resources of Big Tech.

If Covenant AI successfully attracts enterprise clients and continues expanding the Bittensor network, 2025-2026 could see decentralized AI move from impressive experiment to legitimate competitor in the foundation model market.

Conclusion: A New Paradigm

Sam Dare himself was surprised by the rapid progress Covenant AI achieved in just over a year. Covenant72B made history in a remarkably short period, and Sam Dare and his team are just getting started.

The implications extend beyond Templar specifically. Covenant AI demonstrates that the most sophisticated AI development, previously the exclusive domain of highly capitalized tech giants, can be achieved through coordinated decentralized networks. The barriers to entry haven’t been eliminated, but they’ve been dramatically lowered.

Whether this decentralized approach can truly go toe-to-toe with billions in Big Tech investment remains to be seen. The next 12 to 24 months will be telling. Will enterprise clients embrace Training as a Service? Can the Bittensor network scale by an order of magnitude? Will Covenant AI’s models reach parity with frontier systems?

One thing is clear: the assumption that advanced AI development requires massive centralized infrastructure has been fundamentally challenged.

The next frontier in AI training and inference may well be decentralized, transparent, and accessible in ways previously thought impossible. Perhaps by 2026, Sam Dare will surprise himself and the industry once again.
