
Quasar releases first public proof that the subnet architecture works at scale


Ige A

June 9, 2026 · 2 min read

Quasar (SN24) released Quasar-Preview yesterday, an 18B-parameter Mixture-of-Experts (MoE) model with just 2B active parameters and experimental support for a 5-million-token context window.

The team bills it as the “first public proof” that its custom Quasar architecture works at real scale.

Today we’re releasing Quasar-Preview!

Our first public proof that the Quasar architecture works at real scale.

[ 18B MoE – 2B active / 5M context ]

Built with Loop Transformer + Quasar attention
Trained on Bittensor through decentralized infrastructure 👇 pic.twitter.com/TN4QZsqCNJ
— Quasar (@QuasarModels) June 8, 2026


The bottleneck it’s attacking

Long-context processing is still one of AI’s most stubborn engineering problems. Leading 2026 models advertise context windows of up to 10 million tokens (Google’s Gemini 3 Pro and Meta’s Llama 4 Scout among them), but real-world performance often degrades sharply past a few hundred thousand tokens.

Most AI models run on a standard Transformer, the same foundation under GPT, Claude and Gemini. It is powerful, but it has a hard limit: self-attention compares every token with every other token, so doubling the context roughly quadruples the attention compute. That quadratic wall is why long-context AI is still a bottleneck everywhere.
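To make the “double the context, quadruple the compute” arithmetic concrete, here is a back-of-the-envelope count of query-key scores in one full self-attention layer versus per-token state updates in a recurrent or linear-attention layer. It is a simplified illustration, not a cost model: it ignores head count, hidden size and the causal mask (which roughly halves the attention count but keeps it quadratic).

```python
def attention_scores(n_tokens: int) -> int:
    """Query-key dot products in one full self-attention layer: every token scores every token."""
    return n_tokens * n_tokens


def recurrent_updates(n_tokens: int) -> int:
    """State updates in a recurrent or linear-attention layer: one fixed-size update per token."""
    return n_tokens


for n in (128_000, 256_000, 512_000, 5_000_000):
    print(
        f"{n:>9,} tokens: {attention_scores(n):>18,} attention scores "
        f"vs {recurrent_updates(n):>9,} state updates"
    )

# Doubling the context quadruples the attention count but only doubles the recurrent count.
print(attention_scores(256_000) / attention_scores(128_000))  # 4.0
```

At 5M tokens the count reaches 25 trillion attention scores per layer, which illustrates why hybrid designs lean on recurrent or linear-attention layers whose cost grows linearly with context.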

The architecture

Quasar’s bet is a hybrid recurrent/attention stack rather than dense scaling or pure sparse-attention tricks:

Loop Transformer: A scaffold that reuses decoder layers across multiple passes, raising effective compute depth without inflating parameter count. The Preview runs a single loop with looped anchor injection disabled.
Quasar Hybrid Attention: Layers cycle through three branch types, namely dominant Quasar branches, Raven (slot-routed recurrent attention with Mamba-2-style decay), and GLA (gated linear attention, built on Flash Linear Attention, for fast sequence mixing). The Preview uses 20 layers, with hybrid layers active from layer 4 through layer 19.
Sparse MoE routing: 256 experts, 8 selected per token plus one shared expert, keeping active params at ~2B while the full checkpoint sits at ~18B.
Experimental context extension: A “Safe NoPE” setting (no positional encoding past the first 512 tokens) combined with a RoPE configuration enables the 5M-token setting, though the model card flags this path as immature.
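The sparse routing entry above can be sketched as a generic top-k Mixture-of-Experts layer. This is not Quasar’s code: the toy hidden size, the random weights, the per-dimension “experts” and the softmax-over-selected-experts gating are all assumptions for illustration. Only the shape (256 routed experts, 8 picked per token, plus one always-on shared expert) comes from the model description.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts, per the model description
TOP_K = 8          # routed experts selected per token
HIDDEN = 16        # hypothetical toy hidden size; the real size is not stated

random.seed(0)

# Hypothetical toy weights: a routing vector per expert, and each "expert" is a per-dimension scale.
router = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]
experts = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]
shared_expert = [random.gauss(0, 1) for _ in range(HIDDEN)]


def apply_expert(scale, x):
    """Toy expert: elementwise scaling of the token vector."""
    return [s * xi for s, xi in zip(scale, x)]


def moe_layer(x):
    # Router logit per expert: dot product of its routing vector with the token.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    # Keep only the TOP_K highest-scoring experts.
    top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    # Softmax over the selected experts only (max-subtracted for numerical stability).
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    gates = [e / total for e in exps]
    # The shared expert always runs; routed experts are mixed in by gate weight.
    out = apply_expert(shared_expert, x)
    for g, i in zip(gates, top):
        y = apply_expert(experts[i], x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top, gates


token = [random.gauss(0, 1) for _ in range(HIDDEN)]
out, chosen, gates = moe_layer(token)
print(len(chosen), "of", NUM_EXPERTS, "routed experts active, plus 1 shared")
```

Only 9 of the 257 expert blocks run for each token. In MoE models generally, attention layers, embeddings and the router are dense and always active, which is one likely reason the active share (~2B of ~18B) is larger than 9/257 alone would suggest.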

The design targets specialized massive-context workloads. The team has hinted that this release is an experiment and an early look at the long-context architecture, with a tech report and a 10T-token version still to come.
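The Loop Transformer entry in the list above rests on weight reuse: running the same layer stack several times adds effective depth without adding parameters. The toy sketch below illustrates only that idea; the tanh residual “layer” and the helper names are hypothetical stand-ins, not Quasar’s implementation, and the 20-layer stack simply mirrors the Preview’s layer count.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(weight: float) -> Layer:
    """Hypothetical toy decoder layer: a residual update x + tanh(weight * x)."""
    def layer(x: Vector) -> Vector:
        return [xi + math.tanh(weight * xi) for xi in x]
    return layer


def looped_forward(layers: List[Layer], x: Vector, n_loops: int) -> Tuple[Vector, int]:
    """Run the same layer stack n_loops times; return the output and the count of layer applications."""
    applications = 0
    for _ in range(n_loops):
        for layer in layers:
            x = layer(x)
            applications += 1
    return x, applications


stack = [make_layer(0.01 * (i + 1)) for i in range(20)]  # 20 layers, mirroring the Preview
n_weights = len(stack)  # one toy weight per layer, reused on every loop

for loops in (1, 2, 4):
    _, depth = looped_forward(stack, [0.5, -0.25, 1.0], loops)
    print(f"loops={loops}: effective depth {depth}, distinct layer weights {n_weights}")
```

With one loop, as in the Preview, effective depth equals the 20 stored layers; each extra loop adds another 20 layer applications while the parameter count stays fixed.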

Benchmarks and early pushback

The release includes a table comparing Quasar-Alpha (the lineage leading to Preview) against Covenant-72B, Youtu-LLM-2B, Qwen3-4B-Base, and Gemma-3-4B-PT across MMLU, ARC, PIQA, HellaSwag, OpenBookQA, and MATH-500. Quasar-Alpha looks competitive or better in several categories despite fewer active params and limited training.

The X community flagged a few caveats quickly. The comparators are older or differently sized models; long-context-specific evals (such as needle-in-a-haystack tests at scale) are absent; and every Quasar result is bolded regardless of whether it wins or loses.

Critics also pressed on VRAM requirements for actually running the full 5M context, and on tool-calling and agentic capability. The team’s response: the release “is not about evals or beating other models” but about proving the architecture scales. A fuller tech report is promised.

Learn more about Quasar:

Quasar (SN24) Just Solved the AI Memory Problem, and Anyone Can Use It


Ige A

Senior Editor
